What is a BOM and why does it break everything?
A Byte Order Mark (BOM) is a specific Unicode character, U+FEFF, that was originally designed to signal byte ordering in UTF-16 encoded files. In UTF-8, where it is stored as the three bytes EF BB BF, a BOM is unnecessary (UTF-8 has no byte ordering ambiguity), but Microsoft applications such as Word, Notepad, and Excel still prepend it to text files and clipboard data as a legacy compatibility measure.
The BOM is completely invisible. You cannot see it in a text editor, and most fonts render it as nothing. But it is the very first character in your text, and parsers that don't explicitly handle it will fail — often with confusing errors that point to the first line of your file, not the actual problem.
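Because the BOM cannot be seen on screen, inspecting the raw bytes is more reliable than reading the text. The following is a minimal Python sketch, assuming the data is UTF-8 encoded; the helper name `has_utf8_bom` is hypothetical and exists only for this illustration.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

clean = '{"name": "Alice"}'.encode("utf-8")
with_bom = codecs.BOM_UTF8 + clean

print(has_utf8_bom(clean))     # False
print(has_utf8_bom(with_bom))  # True

# Both versions look the same when printed, but repr() and len() expose the BOM.
text = with_bom.decode("utf-8")
print(repr(text[:2]))          # '\ufeff{'
print(len(text) - len(clean))  # 1
```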
How a BOM breaks JSON
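JSON is one of the most common casualties of an unexpected BOM. A JSON document must begin with optional whitespace followed by a value: an object ({), an array ([), a string ("), a number, or one of the literals true, false, and null. U+FEFF is not JSON whitespace, so a BOM at position 0 means the first character a JSON parser sees is not the start of any valid value. RFC 8259 says implementations must not add a BOM to JSON sent over a network, and parsers may ignore one but are not required to.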
# What you think your file starts with:
{"name": "Alice", "role": "admin"}
# What the parser actually sees (BOM shown as FEFF):
FEFF{"name": "Alice", "role": "admin"}
#^ invalid: parser fails here
The error message is usually SyntaxError: Unexpected token or JSONDecodeError: Expecting value at line 1 column 1. The real cause, a single invisible character, is almost impossible to spot without a hex editor or a tool like Unformat.
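The failure is easy to reproduce in Python. This sketch assumes the file content arrives as UTF-8 bytes with a leading BOM; the sample document is the one used above. Decoding with the standard `utf-8-sig` codec, which drops a leading BOM when present, is one way to make the parse succeed.

```python
import json

raw = b'\xef\xbb\xbf{"name": "Alice", "role": "admin"}'  # bytes as saved with a BOM

# Decoding as plain UTF-8 keeps U+FEFF as the first character, so parsing fails.
try:
    json.loads(raw.decode("utf-8"))
    parsed_with_bom = True
except json.JSONDecodeError:
    parsed_with_bom = False

# The "utf-8-sig" codec removes a leading BOM if one is present.
doc = json.loads(raw.decode("utf-8-sig"))

print(parsed_with_bom)  # False
print(doc["role"])      # admin
```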
How a BOM breaks shell scripts
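Shell scripts must begin with a shebang line: #!/bin/bash or #!/usr/bin/env python3. The kernel recognizes a shebang only when the first two bytes of the file are #!. If a BOM is prepended, the file starts with the bytes EF BB BF instead, so the shebang is not recognized. The calling shell then typically falls back to interpreting the file itself and treats the BOM-prefixed first line as a command, failing with something like:
./script.sh: line 1: #!/bin/bash: No such file or directory
The similar-looking error /bin/bash^M: bad interpreter points to CRLF line endings rather than a BOM, although the two often arrive together in files from Windows. The error cannot execute binary file is more typical of a script that was saved as UTF-16.
How a BOM breaks CSV imports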
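CSV files with a BOM carry the invisible character into the first header cell. The column header that should be id becomes U+FEFF followed by id (written \ufeffid in most programming languages), which looks identical on screen but is a different string. When a data pipeline, database import, or spreadsheet application reads the CSV, it sees an unfamiliar column name and either rejects the file, maps the column incorrectly, or silently drops the first field of every row because no column matches the expected name.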
This is especially common with CSV files exported from Microsoft Excel on Windows, which always includes a BOM in its UTF-8 CSV exports.
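The effect on the first header can be shown with Python's standard `csv` module. This is a sketch under the assumption that the CSV is UTF-8 with a leading BOM; `first_header` is a hypothetical helper written for this example.

```python
import csv
import io

raw = b'\xef\xbb\xbfid,name\r\n1,Alice\r\n'  # UTF-8 CSV bytes with a leading BOM

def first_header(data: bytes, encoding: str) -> str:
    """Decode CSV bytes with the given encoding and return the first column name."""
    reader = csv.DictReader(io.StringIO(data.decode(encoding), newline=""))
    return reader.fieldnames[0]

print(repr(first_header(raw, "utf-8")))      # '\ufeffid'
print(repr(first_header(raw, "utf-8-sig")))  # 'id'
```

Reading with `utf-8-sig` is safe for files without a BOM as well, since the codec only removes the mark when it is present.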
How a BOM breaks HTTP headers and API requests
When you copy a configuration value, token, or header value from a Word document or Windows Notepad and paste it into an API request or config file, the BOM may travel with it. HTTP header values with a leading BOM are rejected by strict HTTP parsers. Bearer tokens with a BOM fail authentication with no visible cause: the server receives a different string than the one you intended, even though the two look identical on screen.
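A short Python sketch of why a pasted token stops matching. The token value is hypothetical. The last check relies on the fact that Python's `http.client` encodes header values as Latin-1, an encoding that cannot represent U+FEFF.

```python
expected = "abc123"           # hypothetical token as stored on the server
pasted = "\ufeff" + expected  # same token with a BOM picked up during copy and paste

print(pasted == expected)                   # False
print(pasted.lstrip("\ufeff") == expected)  # True

# U+FEFF has no Latin-1 representation, so encoding the header value fails.
try:
    pasted.encode("latin-1")
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)  # False
```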
Where BOMs come from
- Microsoft Word — includes a BOM when copying text to the clipboard and when saving as UTF-8 text
- Windows Notepad — has historically saved UTF-8 files with BOM by default (Windows 10 1903 changed the default, but legacy files are common)
- Microsoft Excel — always prepends a BOM to UTF-8 CSV exports
- Windows text editors — many older Windows editors (Wordpad, older Notepad++) default to UTF-8 with BOM
- Some IDEs and tools — certain configurations of Visual Studio and older Microsoft tools save source files with BOM
How Unformat strips BOM markers
Switch to Developer mode before pasting. In Developer mode, Unformat scans for the BOM character (U+FEFF) at any position in the text — not just at the start — and removes it. The stats toast reports how many BOM characters were stripped.
In addition to BOM removal, Developer mode simultaneously handles the other artifacts that Windows-origin text commonly carries: smart quotes, non-breaking spaces, zero-width characters, soft hyphens, em-dashes, en-dashes, and CRLF line endings. This means one paste cleans everything — you don't need to run multiple tools.
Standard mode does not remove BOM markers, because in some non-developer contexts a BOM at the start of a document is intentional. Developer mode is explicit — when you switch to it, BOM removal is always active.
All processing is local. Your text never leaves your browser. Unformat uses JavaScript regex to match U+FEFF and removes it in a single pass, along with all other artifacts.
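The behavior described above (match U+FEFF anywhere, remove it, report the count) can be approximated in a few lines. This is a Python sketch of the idea, not Unformat's actual code, which according to the description above is JavaScript; `strip_bom` and `BOM_PATTERN` are hypothetical names.

```python
import re

BOM_PATTERN = re.compile("\ufeff")

def strip_bom(text: str) -> tuple[str, int]:
    """Remove every U+FEFF in the text and return (cleaned_text, count_removed)."""
    return BOM_PATTERN.subn("", text)

cleaned, removed = strip_bom('\ufeff{"a": 1}\ufeff')
print(cleaned)  # {"a": 1}
print(removed)  # 2
```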
How to clean your text
- Copy the text that may contain a BOM — from Word, Excel, Notepad, or a CSV file.
- Switch to Developer mode using the toggle above the text area.
- Paste into the text area above (Ctrl+V or Cmd+V) — the BOM is removed immediately.
- Check the stats toast — if the BOM count is 1 or more, it confirms the BOM was there and has been stripped.
- Click "Copy Clean Text" or press Ctrl+K to copy the BOM-free output.
- Paste into your JSON file, shell script, CSV, or API request body.
Frequently Asked Questions
How do I know if my text has a BOM?
You usually can't tell by looking at it. The most reliable signs are: JSON parse errors on line 1 column 1 with no visible cause, shell scripts whose first line fails with No such file or directory even though the interpreter exists, CSV imports that report an unknown first column, or string comparisons failing on text that looks identical. Paste your text into Unformat in Developer mode and check the stats toast, which reports the BOM count directly.
Does the BOM affect text in the middle of a file?
A BOM is intended for the start of a file, but U+FEFF can appear anywhere in text, especially when multiple documents are merged or text is concatenated from different sources. Unformat removes U+FEFF at any position in the text, not just at the beginning.
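A mid-text BOM typically appears when two BOM-prefixed files are joined. In this Python sketch the file contents are made up for illustration; it shows that a BOM-aware decoder removes only the leading mark and leaves the second one embedded in the text.

```python
import codecs

part_a = codecs.BOM_UTF8 + "id,name\n".encode("utf-8")
part_b = codecs.BOM_UTF8 + "1,Alice\n".encode("utf-8")

# Join the bytes of both files, then decode; "utf-8-sig" drops only the leading BOM.
merged = (part_a + part_b).decode("utf-8-sig")

print(merged.count("\ufeff"))  # 1
print(merged.index("\ufeff"))  # 8
```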
Why does Excel always add a BOM to CSV files?
Excel adds a BOM to UTF-8 CSV exports as a signal to other Microsoft applications that the file is UTF-8 encoded (rather than the system's default ANSI encoding). This is a legacy compatibility decision. While it works fine in Microsoft's own ecosystem, it causes failures in many other tools that process CSV: databases, Python, Node.js, and Linux command-line utilities typically treat the BOM as part of the first field unless they are explicitly told to expect it.
Can I prevent Word from adding a BOM?
Not reliably. Word's clipboard copy behavior includes the BOM as part of how it encodes Unicode text. The most practical approach is to clean the text after copying, which is what Unformat does. If you're generating files programmatically, choose the BOM-free encoding in your file writing code; in Python, for example, that means specifying 'utf-8' (without BOM) rather than 'utf-8-sig' (with BOM).
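The difference between the two Python codecs is visible in the first bytes of the written file. This sketch writes to a temporary file; `write_and_read_prefix` is a hypothetical helper used only here.

```python
import os
import tempfile

def write_and_read_prefix(encoding: str) -> bytes:
    """Write a short CSV header with the given encoding and return the file's first 3 bytes."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding, newline="") as f:
            f.write("id,name\n")
        with open(path, "rb") as f:
            return f.read(3)
    finally:
        os.remove(path)

print(write_and_read_prefix("utf-8"))      # b'id,'
print(write_and_read_prefix("utf-8-sig"))  # b'\xef\xbb\xbf'
```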
Is a BOM the same as a CRLF problem?
No, they are different issues. A BOM (U+FEFF) is a single invisible character at the start of text. CRLF (\r\n) is a Windows-style line ending at the end of each line. However, both come from the same source (Windows applications and Microsoft Office), and text with a BOM often also has CRLF line endings. Unformat's Developer mode removes both in a single pass.