What are zero-width characters and why do they cause problems?
Zero-width characters are Unicode code points that render as nothing — they have no visible glyph. You cannot see them, but they are very much present in the text. Copying from websites, PDFs, Word documents, Slack messages, and APIs routinely introduces them.
The most common offenders are:
- Zero-width space (U+200B) — the most common. Appears after words and inside URLs on certain websites as a line-break hint. Looks exactly like no character at all.
- Zero-width joiner (U+200D) — used in emoji sequences and complex scripts. Appears in text copied from mobile apps, Twitter, and messaging platforms.
- Zero-width non-joiner (U+200C) — used in Arabic and Persian typography. Appears in text copied from multilingual documents and CMS platforms.
- Byte order mark (U+FEFF) — prepended by Microsoft Word, Windows Notepad, and Excel. Invisible but causes parse failures in JSON, CSV, and shell scripts.
- Word joiner (U+2060) — a typographic no-break hint that word processors insert around certain character sequences.
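One way to see these characters is to swap each one for a visible tag. The sketch below is only an illustration: the `ZERO_WIDTH` table and the `reveal` helper are hypothetical names, and the table covers just the five code points listed above.

```python
import unicodedata

# The five code points listed above (assumption: this list is enough for your text).
ZERO_WIDTH = {
    "\u200b": "U+200B",
    "\u200c": "U+200C",
    "\u200d": "U+200D",
    "\u2060": "U+2060",
    "\ufeff": "U+FEFF",
}


def reveal(text: str) -> str:
    """Replace each zero-width character with a visible <U+XXXX NAME> tag."""
    out = []
    for ch in text:
        if ch in ZERO_WIDTH:
            out.append(f"<{ZERO_WIDTH[ch]} {unicodedata.name(ch)}>")
        else:
            out.append(ch)
    return "".join(out)


print(reveal("hello\u200bworld"))
# hello<U+200B ZERO WIDTH SPACE>world
```

Note that Unicode names U+FEFF "ZERO WIDTH NO-BREAK SPACE"; it is called a byte order mark when it appears at the start of a file.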
Why they are so dangerous
Because they are invisible, zero-width characters cause the most frustrating category of bugs — the kind where the code looks correct, the text looks identical, but everything fails. Common symptoms:
- String comparisons return false on text that looks identical. Two strings that appear the same on screen differ in their underlying byte sequences because one contains U+200B and the other does not.
- Regex patterns fail to match text you can clearly see. A pattern like /hello world/ won't match hello world when a zero-width space hides between the words, even though you see the same two words.
- Database deduplication breaks. Two rows with the same visible text are stored as distinct values because one has a zero-width space.
- Variable names silently corrupt when you paste code from a website. myVariable with a hidden character looks like myVariable but is a different identifier, or not a valid one at all, causing a SyntaxError, a NameError, or an undefined reference.
- Search functionality fails. A user searching for a term will not find a record that contains the same term with a hidden zero-width character in it.
- Character counts are wrong. Form validation fails — a field that looks empty may have a non-zero length, or a field that looks 10 characters long may count as 12.
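Several of these symptoms can be reproduced in a few lines. This is a minimal sketch with made-up sample strings; `clean` and `dirty` are illustrative names.

```python
import re

clean = "hello world"
dirty = "hello\u200b world"  # zero-width space hidden after "hello"

print(clean == dirty)                    # False: comparison fails
print(len(clean), len(dirty))            # 11 12: character count is off by one
print(re.search(r"hello world", dirty))  # None: the regex does not match
print(len({clean, dirty}))               # 2: set-based deduplication keeps both
print("hello world" in dirty)            # False: substring search misses it
```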
Where do zero-width characters come from?
Websites and CMS platforms are the most common source. Medium, WordPress, Notion, and Confluence all insert zero-width spaces at word boundaries for typographic control. When you copy-paste an article or documentation, every one of those invisible characters comes with it.
PDF files frequently contain zero-width joiners and spaces, especially in documents with complex typography, ligatures, or multilingual content.
Messaging apps — Slack, WhatsApp, iMessage, and Discord — add zero-width characters around mentions, hashtags, and link previews. Copying a code snippet that was shared without code-block formatting brings these along.
Copy-paste from browsers: some browser extensions and websites inject zero-width spaces for tracking or fingerprinting purposes, embedding them in copied text without your knowledge.
APIs and data feeds that return JSON or plain text from third-party providers sometimes include zero-width characters in field values, especially if the upstream data originated in a word processor or CMS.
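For data arriving from an API, one option is to clean string values right after parsing. The following is a rough sketch, not a recommendation for every pipeline: `clean_values` is a hypothetical helper, the payload is invented, and object keys are deliberately left untouched.

```python
import json
import re

# Assumption: only the five code points discussed in this article are targeted.
ZERO_WIDTH_RE = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")


def clean_values(value):
    """Recursively strip zero-width characters from string values in parsed JSON."""
    if isinstance(value, str):
        return ZERO_WIDTH_RE.sub("", value)
    if isinstance(value, list):
        return [clean_values(item) for item in value]
    if isinstance(value, dict):
        return {key: clean_values(item) for key, item in value.items()}
    return value  # numbers, booleans, and None pass through unchanged


# Invented payload: the \u escapes decode to real zero-width characters.
payload = json.loads('{"sku": "AB\\u200b-123", "tags": ["new\\u200d", "sale"]}')
print(clean_values(payload))
# {'sku': 'AB-123', 'tags': ['new', 'sale']}
```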
How Unformat removes zero-width characters
Unformat strips zero-width characters in both Standard and Developer modes. Paste your text, and U+200B, U+200C, U+200D, U+2060, and U+FEFF are removed immediately — no configuration needed.
The stats toast tells you exactly how many invisible characters were removed. If you paste text that looks clean and the count is non-zero, that's the confirmation you needed that something was hiding in there.
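The strip-and-count behavior can be approximated as follows. This is not Unformat's actual code (the tool runs JavaScript in the browser); it is a Python sketch of the same idea, and `strip_zero_width` is a hypothetical name.

```python
import re
from typing import Tuple

ZERO_WIDTH_RE = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")


def strip_zero_width(text: str) -> Tuple[str, int]:
    """Return the cleaned text and the number of characters removed."""
    # re.subn returns (new_string, number_of_substitutions).
    return ZERO_WIDTH_RE.subn("", text)


cleaned, removed = strip_zero_width("\ufeffhttps://exam\u200bple.com/docs\u200b")
print(repr(cleaned), removed)
# 'https://example.com/docs' 3
```

A non-zero count on text that looked clean is the same signal the stats toast gives.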
Developer mode additionally handles BOM markers (U+FEFF) and soft hyphens (U+00AD). Use this mode when cleaning text destined for code editors, terminals, JSON files, or API request bodies.
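To see why a BOM matters for JSON, consider the sketch below. The sample string is invented, and the cleanup shown is one simple approach rather than what Developer mode does internally.

```python
import json

# Invented input: a BOM in front and a soft hyphen (U+00AD) inside the value.
raw = "\ufeff" + '{"name": "co\u00adoperate"}'

try:
    json.loads(raw)
except json.JSONDecodeError as exc:
    print("parse failed:", exc.msg)

# Drop the leading BOM and any soft hyphens, then parse again.
fixed = raw.lstrip("\ufeff").replace("\u00ad", "")
print(json.loads(fixed))
# {'name': 'cooperate'}
```

When reading a file from disk, opening it with `encoding="utf-8-sig"` is another way to discard a leading BOM.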
All processing happens in your browser with JavaScript. Your text never reaches a server. You can verify this by opening your browser's Network tab — zero outbound requests are made during cleaning.
How to clean your text
- Copy the text you suspect contains zero-width characters.
- Paste it into the text area above (Ctrl+V or Cmd+V).
- Unformat removes all zero-width characters instantly — check the stats toast for the count.
- If cleaning code or config, switch to Developer mode for BOM removal and additional fixes.
- Click "Copy Clean Text" or press Ctrl+K to copy the cleaned output.
- Paste the clean text into your destination — code editor, database, search field, or API.
Frequently Asked Questions
How can I tell if my text has zero-width characters?
You can't tell by looking at it. That's what makes them dangerous. The clearest signs are string comparisons that fail on text that looks identical, regex patterns that don't match visible text, and character counts that are higher than you expect. Paste your text into Unformat and check the stats toast: if the zero-width character count is non-zero, they were there.
Will removing zero-width characters break emoji?
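Some emoji sequences use zero-width joiners (U+200D) to combine multiple emoji into one (e.g., family emoji, profession emoji, and the rainbow flag; country flags use a different mechanism and are not affected). Removing U+200D splits these combined emoji back into their component parts. If your text contains complex emoji that must be preserved exactly, review the output carefully. For code, data, and plain-text use cases, removing U+200D is almost always what you want; the main exception is text in scripts that rely on joiners for correct shaping, such as Persian and several Indic scripts.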
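The effect on a joined emoji is easy to check by counting code points. The sketch below also shows one possible workaround, a `keep_zwj` switch; both `strip_hidden` and that flag are hypothetical and not an Unformat feature.

```python
import re

STRICT_RE = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")
EMOJI_SAFE_RE = re.compile("[\u200b\u200c\u2060\ufeff]")  # leaves U+200D alone


def strip_hidden(text: str, keep_zwj: bool = False) -> str:
    """Strip zero-width characters, optionally preserving zero-width joiners."""
    pattern = EMOJI_SAFE_RE if keep_zwj else STRICT_RE
    return pattern.sub("", text)


# Family emoji: man + ZWJ + woman + ZWJ + girl (five code points, one glyph).
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

print(len(family), len(strip_hidden(family)))      # 5 3: now three separate emoji
print(strip_hidden(family, keep_zwj=True) == family)  # True: sequence preserved
```

Keeping every U+200D also keeps stray joiners that are not part of an emoji, so this is a trade-off rather than a fix.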
Do zero-width spaces affect SEO?
Yes. Search engine crawlers parse the underlying text, not what you see on screen. Zero-width spaces inside keywords can split words that search engines would otherwise treat as a single token. For CMS content and blog posts, cleaning text before publishing prevents these invisible characters from affecting how your content is indexed.
Can I detect zero-width characters programmatically?
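Yes. In Python, text.count("\u200b") counts zero-width spaces; a result above zero means they are present. In JavaScript, /[\u200B\u200C\u200D\u2060\uFEFF]/.test(text) tests for the full set. Leave the g flag off when calling test(): a global regex stored in a variable remembers its lastIndex between calls and can report false for text that does contain a match. To remove them in JavaScript: text.replace(/[\u200B\u200C\u200D\u2060\uFEFF]/g, ""). Unformat applies this logic plus additional cleanup rules in a single operation.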
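When you need to know where the characters are, not just whether they exist, a scan that reports line and column can help. This is a minimal Python sketch; `locate` is a hypothetical helper and the sample text is invented.

```python
import re

ZERO_WIDTH_RE = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")


def locate(text: str):
    """Yield (line, column, code point) for each zero-width character, 1-based."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in ZERO_WIDTH_RE.finditer(line):
            yield line_no, match.start() + 1, f"U+{ord(match.group()):04X}"


sample = "\ufeffname = 1\nprint(na\u200bme)\n"
print(list(locate(sample)))
# [(1, 1, 'U+FEFF'), (2, 9, 'U+200B')]
```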
Does this tool also remove non-breaking spaces?
Yes. Non-breaking spaces (U+00A0) are handled too: they are converted to regular spaces rather than deleted. While not zero-width, a non-breaking space is visually indistinguishable from a regular space and causes the same category of bugs. Unformat handles both categories together.
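The conversion itself is a one-line replacement. The sketch below assumes you only want to map U+00A0 to an ordinary space; `normalize_nbsp` is a hypothetical name and the sample value is invented.

```python
def normalize_nbsp(text: str) -> str:
    """Convert each non-breaking space (U+00A0) to a regular space (U+0020)."""
    return text.replace("\u00a0", " ")


price = "100\u00a0USD"  # looks like "100 USD" on screen
print(price == "100 USD")                  # False
print(normalize_nbsp(price) == "100 USD")  # True
```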