Why Word text is never truly plain text
Microsoft Word is the single most common source of invisible formatting problems. Even when you copy text that appears to be plain and unformatted, Word embeds a dense layer of Unicode characters that cause real issues downstream.
The most pervasive issue is the non-breaking space (U+00A0). Word uses it to keep words together across a line break: authors insert it with Ctrl+Shift+Space, and under some proofing languages (French, for example) Word adds it automatically next to certain punctuation marks. Non-breaking spaces look identical to regular spaces but are a different Unicode code point. When pasted into code editors, databases, search fields, or APIs, they cause subtle bugs: string comparisons fail, searches miss text that is visibly there, and data deduplication breaks.
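To make the comparison failure concrete, here is a minimal Python sketch. The sample strings and the helper name normalize_spaces are illustrative assumptions, not part of Unformat.

```python
NBSP = "\u00a0"  # non-breaking space, visually identical to " "

def normalize_spaces(text: str) -> str:
    """Replace every non-breaking space with a regular ASCII space."""
    return text.replace(NBSP, " ")

pasted = "order" + NBSP + "status"  # hypothetical text copied from Word
typed = "order status"              # the same words typed by hand

print(pasted == typed)                    # False
print(normalize_spaces(pasted) == typed)  # True
print(pasted.split(" "))                  # ['order\xa0status']
```

The last line shows why search and tokenizing code is affected too: splitting on an ASCII space leaves the two words glued together.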
Smart quotes are enabled by default in every Word installation. When you type " or ', Word replaces them with Unicode curly quotes (“ ” ‘ ’). These break JSON files, SQL queries, YAML configuration, Python scripts, shell commands, and virtually every programming language and data format.
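As a rough illustration of the JSON case, the following Python sketch feeds a curly-quoted object to the standard json module, then retries after mapping the four quote characters back to ASCII. The sample payload and the name straighten_quotes are assumptions made for this example.

```python
import json

# Curly quotes mapped to their straight ASCII counterparts.
QUOTE_MAP = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})

def straighten_quotes(text: str) -> str:
    """Return text with curly quotes replaced by straight quotes."""
    return text.translate(QUOTE_MAP)

pasted = "{\u201cname\u201d: \u201cWord\u201d}"  # hypothetical pasted JSON

try:
    json.loads(pasted)
except json.JSONDecodeError as err:
    print("rejected:", err.msg)

print(json.loads(straighten_quotes(pasted)))  # {'name': 'Word'}
```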
Soft hyphens (U+00AD) are another Word specialty. These invisible characters can sit at syllable boundaries throughout the text, where they tell the rendering engine that a word may be hyphenated at a line break. When you copy the text, the soft hyphens come along. They're invisible but present, inflating character counts, breaking string comparisons, and causing unexpected behavior in text processing pipelines.
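The effect on lengths and substring searches can be sketched in a few lines of Python. The hyphenation points in the sample word are made up for the example, and strip_soft_hyphens is a hypothetical helper name.

```python
SHY = "\u00ad"  # soft hyphen, normally invisible unless a line breaks there

def strip_soft_hyphens(text: str) -> str:
    """Remove every soft hyphen from text."""
    return text.replace(SHY, "")

# "configuration" with soft hyphens at assumed syllable boundaries
pasted = SHY.join(["con", "fig", "u", "ra", "tion"])

print(len("configuration"))               # 13
print(len(pasted))                        # 17
print("configuration" in pasted)          # False
print(strip_soft_hyphens(pasted) == "configuration")  # True
```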
The Byte Order Mark problem
Word often prepends a Byte Order Mark (BOM, U+FEFF) at the beginning of text. This invisible character was designed to indicate byte ordering in Unicode files, but Word includes it unnecessarily. When you paste Word text into a terminal, API request body, or configuration file, the BOM can cause:
- JSON parse errors on the very first character
- Shell scripts failing with “command not found”
- CSV files with a phantom empty column in the first row
- HTTP headers being rejected by strict parsers
- XML documents failing validation at the declaration line
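The first item in the list can be reproduced with Python's standard json module, which refuses a string that begins with U+FEFF. This is a small sketch; the payload and the helper name strip_bom are assumptions for illustration.

```python
import json

BOM = "\ufeff"  # U+FEFF, the byte order mark

def strip_bom(text: str) -> str:
    """Drop one leading BOM if present; leave all other text untouched."""
    return text[len(BOM):] if text.startswith(BOM) else text

pasted = BOM + '{"ok": true}'  # hypothetical pasted request body

try:
    json.loads(pasted)
except json.JSONDecodeError as err:
    print("rejected:", err.msg)

print(json.loads(strip_bom(pasted)))  # {'ok': True}
```

When the text arrives as bytes from a file rather than from the clipboard, decoding with the "utf-8-sig" codec removes a leading BOM in the same step.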
Em-dashes, en-dashes, and special punctuation
Word automatically converts double hyphens (--) into em-dashes (—) and number ranges into en-dashes (–). While typographically correct, these characters are problematic in technical contexts. Command-line flags that use --verbose become —verbose, and numeric ranges like 1-10 contain a character that isn't a minus sign or hyphen.
Word also replaces ... with a single ellipsis character (…), and converts fractions and ordinals into special Unicode forms. Each of these substitutions creates potential issues when the text leaves Word's ecosystem.
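A reverse mapping for these substitutions might look like the Python sketch below. Mapping an em dash back to two hyphens is an assumption made here so that a flag such as --verbose round-trips; a single hyphen is an equally defensible choice, and fractions and ordinals are left out entirely.

```python
# Assumed reverse mapping; the em dash choice is a judgment call (see above).
PUNCT_MAP = str.maketrans({
    "\u2014": "--",   # em dash -> double hyphen
    "\u2013": "-",    # en dash -> hyphen-minus
    "\u2026": "...",  # horizontal ellipsis -> three periods
})

def restore_ascii_punctuation(text: str) -> str:
    """Undo the dash and ellipsis substitutions listed in PUNCT_MAP."""
    return text.translate(PUNCT_MAP)

pasted = "run \u2014verbose on pages 1\u201310\u2026"
print(restore_ascii_punctuation(pasted))  # run --verbose on pages 1-10...
```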
Why the Notepad method falls short
The traditional advice for stripping Word formatting is: “Paste into Notepad, then copy from Notepad.” This is a popular workaround, but it's incomplete. Here's what Notepad does and doesn't handle:
What Notepad strips:          What Notepad keeps:
─────────────────────         ─────────────────────
✓ Bold, italic, colors        ✗ Smart quotes (“ ” ‘ ’)
✓ Font sizes and families     ✗ Non-breaking spaces (U+00A0)
✓ Hyperlinks (visual)         ✗ Soft hyphens (U+00AD)
✓ Bullet styling              ✗ Zero-width characters
✓ Table formatting            ✗ Em-dashes and en-dashes
                              ✗ BOM markers (U+FEFF)
                              ✗ Windows line endings (CRLF)

Notepad strips the visual formatting (bold, font size, colors) but preserves all the character-level Unicode substitutions. The text looks plain in Notepad, but it still contains smart quotes, non-breaking spaces, soft hyphens, and every other hidden character Word inserted.
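One way to check what survived a trip through Notepad is to list every non-ASCII character still in the text. The Python sketch below uses the standard unicodedata module; the function name find_hidden and the sample string are assumptions for illustration.

```python
import unicodedata

def find_hidden(text: str) -> list:
    """Return (index, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]

sample = "\u201cHi\u201d\u00a0there"  # looks like: "Hi" there
for hit in find_hidden(sample):
    print(hit)
# (0, 'U+201C', 'LEFT DOUBLE QUOTATION MARK')
# (3, 'U+201D', 'RIGHT DOUBLE QUOTATION MARK')
# (4, 'U+00A0', 'NO-BREAK SPACE')
```

This is an inspection aid rather than a cleaner: it also reports legitimate non-ASCII text such as accented letters, so the output needs a human eye.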
Unformat strips everything. Visual formatting is already gone (since you're pasting into a plain text area), and Unformat handles all the character-level artifacts that Notepad misses. It's the “paste into Notepad” method, done properly.
How Unformat strips Word formatting
Unformat is designed to handle the full spectrum of Word formatting artifacts. It targets every Unicode substitution that Word makes and converts it back to the ASCII-compatible equivalent.
Standard mode is suitable for most Word text cleaning: it replaces smart quotes, removes zero-width characters, converts non-breaking spaces, normalizes line endings, trims trailing whitespace, and collapses excessive blank lines. Use this when copying Word content into emails, messaging apps, or other documents.
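The Standard mode steps can be approximated in Python as follows. This is a sketch of the listed behavior, not Unformat's actual code: the function name, the exact set of zero-width characters, and the rule that runs of blank lines collapse to a single blank line are all assumptions.

```python
import re

QUOTES = str.maketrans({
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u2018": "'", "\u2019": "'",  # curly single quotes
})
# Assumed set: zero-width space, non-joiner, joiner, word joiner.
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060]")

def clean_standard(text: str) -> str:
    """Apply the Standard mode steps in the order the text lists them."""
    text = text.translate(QUOTES)                            # smart quotes
    text = ZERO_WIDTH.sub("", text)                          # zero-width chars
    text = text.replace("\u00a0", " ")                       # non-breaking spaces
    text = text.replace("\r\n", "\n").replace("\r", "\n")    # line endings
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)  # trailing whitespace
    text = re.sub(r"\n{3,}", "\n\n", text)                   # extra blank lines
    return text

dirty = "\u201cHi\u201d\u00a0there \r\n\r\n\r\n\r\nbye\u200b"
print(repr(clean_standard(dirty)))  # '"Hi" there\n\nbye'
```

Removing the zero-width joiner and non-joiner is safe for English prose but can alter emoji sequences and some scripts, so a real tool may want to be more selective.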
Developer mode goes further for technical use cases. In addition to everything Standard mode does, it removes soft hyphens, strips BOM markers, replaces em-dashes and en-dashes with regular hyphens, and converts tabs to your preferred indentation. Use this when pasting Word content into code editors, terminals, config files, or API request bodies.
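The additional Developer mode steps could be sketched in Python like this. The function name developer_extras and its indent parameter are hypothetical; replacing each dash with a single hyphen follows the wording above, and passing indent=None stands in for a "keep tabs" setting. In a full pipeline these steps would run on top of the Standard mode cleaning.

```python
from typing import Optional

DASHES = str.maketrans({"\u2014": "-", "\u2013": "-"})  # em and en dash

def developer_extras(text: str, indent: Optional[int] = 4) -> str:
    """Apply only the steps Developer mode adds on top of Standard mode."""
    text = text.replace("\ufeff", "")            # BOM markers
    text = text.replace("\u00ad", "")            # soft hyphens
    text = text.translate(DASHES)                # dashes -> regular hyphens
    if indent is not None:                       # None means keep tabs
        text = text.replace("\t", " " * indent)  # tabs -> spaces
    return text

dirty = "\ufeff\tuse\u00adr \u2014 1\u20132"
print(repr(developer_extras(dirty, indent=2)))     # '  user - 1-2'
print(repr(developer_extras(dirty, indent=None)))  # '\tuser - 1-2'
```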
Unformat vs. Notepad vs. online converters
Feature                Notepad   Online tools   Unformat
────────────────────   ───────   ────────────   ────────
Smart quotes           ✗         ~              ✓
Non-breaking spaces    ✗         ~              ✓
Soft hyphens           ✗         ✗              ✓
Zero-width chars       ✗         ✗              ✓
BOM removal            ✗         ✗              ✓
Em/en-dash fix         ✗         ✗              ✓
Line ending fix        ✗         ✗              ✓
Privacy (local only)   ✓         ✗              ✓
Stats/breakdown        ✗         ✗              ✓

Unlike online converters that upload your text to a server, Unformat processes everything in your browser using JavaScript. This is critical for business documents, contracts, NDAs, and proprietary content: your text never leaves your machine.
Settings are persisted in your browser's localStorage. Your preferred mode, indentation size, and auto-copy preference are remembered between sessions without any account or login.
How to clean your text
- Select and copy the text from your Word document (Ctrl+C or Cmd+C).
- Choose Standard mode for general text, or Developer mode for code and config files.
- Paste into the text area above (Ctrl+V or Cmd+V) — cleaning happens instantly.
- Review the stats toast to see exactly what was removed: smart quotes, soft hyphens, non-breaking spaces, etc.
- For Developer mode, click the gear icon to set your preferred indentation (2 spaces, 4 spaces, or keep tabs).
- Click "Copy Clean Text" or press Ctrl+K to copy the cleaned output.
- Paste the genuinely plain text into your destination application.
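The kind of breakdown the stats toast reports in step 4 can be imitated with a simple counter. The Python sketch below is an assumption about how such a tally might work, with illustrative category labels; it is not Unformat's implementation.

```python
from collections import Counter

# Assumed categories; labels are illustrative only.
LABELS = {
    "\u201c": "smart quotes", "\u201d": "smart quotes",
    "\u2018": "smart quotes", "\u2019": "smart quotes",
    "\u00a0": "non-breaking spaces",
    "\u00ad": "soft hyphens",
    "\u200b": "zero-width characters",
    "\ufeff": "BOM markers",
    "\u2014": "dashes", "\u2013": "dashes",
}

def cleaning_stats(text: str) -> Counter:
    """Count, per category, the characters a cleaner would touch."""
    return Counter(LABELS[ch] for ch in text if ch in LABELS)

sample = "\u201cA\u201d\u00a0B\u00adC"
for label, count in cleaning_stats(sample).items():
    print(f"{label}: {count}")
# smart quotes: 2
# non-breaking spaces: 1
# soft hyphens: 1
```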
Frequently Asked Questions
Why can't I just paste into Notepad to get plain text?
Notepad strips visual formatting (bold, fonts, colors) but preserves all character-level Unicode substitutions that Word makes. After pasting into Notepad, your text still contains smart quotes (“ ”), non-breaking spaces (U+00A0), soft hyphens (U+00AD), zero-width characters, and BOM markers. These leftover characters cause the same bugs as before. Unformat removes all of them; soft hyphens and BOM markers are handled in Developer mode.
What about using 'Paste as Plain Text' (Ctrl+Shift+V)?
Paste as Plain Text strips rich formatting (HTML tags, styles) from the clipboard, but it does NOT convert Unicode characters back to ASCII. Smart quotes remain smart quotes, non-breaking spaces remain non-breaking spaces. Unformat goes deeper by replacing these character-level substitutions with their ASCII equivalents.
Can I disable smart quotes in Word?
Yes. In Word, go to File → Options → Proofing → AutoCorrect Options → AutoFormat As You Type, then uncheck 'Straight quotes with smart quotes'. However, this only prevents future smart quotes — it doesn't fix existing ones in your document, and it doesn't prevent non-breaking spaces, soft hyphens, or other hidden characters. Unformat handles all of these regardless of your Word settings.
Is this safe for confidential or legal documents?
Yes. Unformat runs entirely in your browser using JavaScript. Your document text is never sent to any server, never stored, and never logged. There are no cookies, no analytics on your content, and no third-party scripts that access the text area. You can verify this by checking the Network tab in your browser's Developer Tools — zero requests are made during cleaning.
Will Unformat change the actual content of my text?
No. Unformat only replaces typographic substitutions and invisible formatting characters with their standard equivalents. Smart quotes become straight quotes, non-breaking spaces become regular spaces, and invisible characters are removed; in Developer mode, em-dashes and en-dashes also become regular hyphens. Your words, sentences, paragraphs, and meaning are preserved. The stats toast shows you exactly what was changed and how many characters were affected.