What invisible characters cause problems?
Several Unicode characters have no visible glyph, or look exactly like an ordinary space, yet still affect text length, parsing, and string comparisons. They sneak in through text copied from websites, PDFs, Word documents, and messaging apps, and through API responses.
- Zero-width space (U+200B): appears inside URLs and text on many websites as a line-break hint. Breaks string comparisons and regex matches.
- Zero-width joiner (U+200D): used in emoji sequences. Appears in text copied from mobile apps and social platforms.
- Zero-width non-joiner (U+200C): used in Persian and other Arabic-script typography to stop adjacent letters from joining. Common in multilingual CMS content.
- Byte order mark (U+FEFF): often prepended to files saved by Microsoft Word, Windows Notepad, and Excel. Causes parse failures in JSON, CSV, and shell scripts.
- Non-breaking space (U+00A0): visually identical to a regular space but a different character. Breaks split() on a literal space and regex patterns that expect U+0020; some trim() implementations (Java's String.trim(), for example) leave it in place.
- Soft hyphen (U+00AD): an invisible line-break hint inserted by typesetting tools. Can surface as unexpected hyphens, or break searches and comparisons, when text is processed programmatically.
- Word joiner (U+2060): a typographic no-break hint inserted by word processors.
Because they are invisible, these characters cause the most frustrating bugs — the kind where the code looks correct and the text looks identical, but string comparisons return false, database deduplication fails, and regex patterns refuse to match.
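To make the failure mode concrete, here is a minimal Python sketch. The strings are made-up examples, and the invisible characters are written as escape sequences so they can be seen in the source.

```python
# Two strings that render identically; the second hides a zero-width space (U+200B).
clean = "example.com"
dirty = "example\u200b.com"

print(clean == dirty)          # False: the comparison fails
print(len(clean), len(dirty))  # 11 12: one extra code point
print(len({clean, dirty}))     # 2: a set keeps both, so deduplication fails

# A non-breaking space (U+00A0) is not matched by a split on a literal space.
words = "hello\u00a0world".split(" ")
print(len(words))              # 1: the text was not split into two words
```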
How Unformat removes invisible characters
Unformat removes invisible characters in a single pass. Standard mode covers: the zero-width space, non-joiner, and joiner (U+200B, U+200C, U+200D), U+FEFF when it appears as a zero-width character (zero-width no-break space), the word joiner (U+2060), and non-breaking spaces (U+00A0, which are converted to regular spaces).
Enabling Sanitize code (Developer mode) adds: BOM removal at the start of the document, soft hyphen removal (U+00AD), and additional cleanup for code-hostile characters.
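For readers who want to reproduce this kind of cleanup in their own scripts, the two modes could be approximated as follows. This is a sketch under assumptions, not Unformat's actual code: the function name `remove_invisible` and the `sanitize_code` flag are hypothetical, the "additional cleanup for code-hostile characters" is not modeled, and for simplicity U+FEFF is stripped everywhere in both modes.

```python
import re

# Characters removed outright in the assumed "standard" mode:
# zero-width space, non-joiner, joiner, word joiner, and U+FEFF.
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")

def remove_invisible(text: str, sanitize_code: bool = False) -> str:
    """Hypothetical helper approximating the two modes described above."""
    text = ZERO_WIDTH.sub("", text)
    # Non-breaking spaces are converted to regular spaces, not removed.
    text = text.replace("\u00a0", " ")
    if sanitize_code:
        # Assumed extra step for the developer mode: drop soft hyphens.
        text = text.replace("\u00ad", "")
    return text

print(remove_invisible("a\u200bb\u00a0c") == "ab c")                  # True
print(remove_invisible("co\u00adde", sanitize_code=True) == "code")  # True
```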
The stats toast shows exactly how many of each character type were found and removed. A non-zero count confirms that invisible characters were present.
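A per-type count like the one in the toast can be approximated in a few lines. The sketch below is illustrative only: the `LABELS` table and the `count_invisible` function are hypothetical names, and the labels are not necessarily the ones the tool displays.

```python
from collections import Counter

# Hypothetical label table for the characters discussed on this page.
LABELS = {
    "\u200b": "zero-width space",
    "\u200c": "zero-width non-joiner",
    "\u200d": "zero-width joiner",
    "\u2060": "word joiner",
    "\ufeff": "byte order mark",
    "\u00a0": "non-breaking space",
    "\u00ad": "soft hyphen",
}

def count_invisible(text: str) -> Counter:
    """Count how often each tracked invisible character occurs in text."""
    return Counter(LABELS[ch] for ch in text if ch in LABELS)

counts = count_invisible("a\u200bb\u200b\u00a0c")
print(counts["zero-width space"])    # 2
print(counts["non-breaking space"])  # 1
```

An empty result means none of the tracked characters were present.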
All processing happens in your browser. Your text never reaches a server.
How to clean your text
- Copy the text you suspect contains invisible characters.
- Paste it into the text area above (Ctrl+V or Cmd+V).
- Invisible characters are removed instantly — check the stats toast for the count.
- For code or config files, enable Sanitize code in options for additional cleanup.
- Click "Copy Clean Text" or press Ctrl+K to copy the cleaned result.
Frequently Asked Questions
How do I know if my text has invisible characters?
You cannot tell by looking — that's the problem. Signs include: string comparisons that return false for text that looks identical, character counts higher than expected, regex patterns that won't match visible text, and trim() not reducing empty-looking strings to zero length. Paste your text here: if the stats toast shows a non-zero count, invisible characters were present.
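If you prefer to check from a script, a small helper can report where the hidden characters sit. This is a rough sketch: `find_invisible` and the `SUSPECTS` string are hypothetical names, and the list covers only the characters discussed on this page.

```python
import unicodedata

# The characters discussed on this page, written as escapes.
SUSPECTS = "\u200b\u200c\u200d\u2060\ufeff\u00a0\u00ad"

def find_invisible(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for each suspect character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch in SUSPECTS
    ]

print(find_invisible("price:\u00a010"))  # [(6, 'U+00A0', 'NO-BREAK SPACE')]

# The trim symptom: a zero-width space is not whitespace, so strip() keeps it.
print(len(" \u200b ".strip()))           # 1
```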
Will this remove all whitespace, including real spaces?
No. Regular spaces (U+0020), tabs, and line breaks are not removed by default. Only invisible non-printing characters are stripped. Non-breaking spaces (U+00A0) are converted to regular spaces (U+0020), not removed.
Does this also handle emoji?
Some emoji use zero-width joiners (U+200D) to combine multiple characters into a single glyph (e.g., family emoji and the rainbow flag). Removing U+200D splits these into their component emoji. For plain text and code use cases this is usually the desired result; if preserving complex emoji is important, review the output before using it.
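The effect on a joined emoji can be seen in a short Python sketch. The family emoji below is written as escapes (man, ZWJ, woman, ZWJ, girl) so the joiners are visible in the source.

```python
# Man + ZWJ + woman + ZWJ + girl renders as a single family glyph.
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"

# Removing the joiners leaves three separate emoji.
stripped = family.replace("\u200d", "")

print(len(family), len(stripped))             # 5 3
print([f"U+{ord(c):04X}" for c in stripped])  # ['U+1F468', 'U+1F469', 'U+1F467']
```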