Why smart quotes break Python code
You've copied a Python snippet from a blog post, Slack message, Word document, or email. It looks perfectly fine. You paste it into your editor, run it, and get:
  File "script.py", line 3
    print(“Hello, world!”)
          ^
SyntaxError: invalid character '“' (U+201C)
The culprit is smart quotes (also called curly quotes or typographic quotes). These are Unicode characters that word processors, email clients, and messaging apps automatically substitute for straight ASCII quotes.
Before vs. After: what your code actually looks like
Here's the problem visualized. This is what you think you pasted:
# This looks correct but won't run
message = "Hello, world!"
name = 'Alice'
print(f"{name} says: {message}")
But this is what your editor actually received (Unicode code points shown):
# Hidden curly quotes: causes SyntaxError
message = “Hello, world!” # U+201C and U+201D
name = ‘Alice’ # U+2018 and U+2019
print(f“{name} says: {message}”)
After cleaning with Unformat (Developer mode), you get valid Python:
# Clean ASCII quotes: runs correctly
message = "Hello, world!"
name = 'Alice'
print(f"{name} says: {message}")
Python expects ASCII quotes
Python's lexer only recognizes two quote characters for string delimiters:
- Double quote: " (U+0022)
- Single quote: ' (U+0027)
But word processors and email clients silently replace these with:
- Left double quote: “ (U+201C), looks like " in most fonts
- Right double quote: ” (U+201D)
- Left single quote: ‘ (U+2018), looks like ' in most fonts
- Right single quote: ’ (U+2019)
These look nearly identical in most fonts, making the bug extremely hard to spot visually. The Python interpreter sees completely different characters and rejects them immediately.
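To confirm that these really are distinct characters even when a font renders them alike, one option is to print each character's code point and official Unicode name. This is a minimal sketch using only the standard library's unicodedata module; the helper name describe is just for illustration.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# The two ASCII quotes Python accepts, followed by the four curly variants.
for ch in ['"', "'", "\u201c", "\u201d", "\u2018", "\u2019"]:
    print(describe(ch))
```

The first two lines of output name the ASCII characters (QUOTATION MARK and APOSTROPHE); the remaining four are the LEFT/RIGHT DOUBLE and SINGLE QUOTATION MARK characters that the lexer rejects.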
Where smart quotes come from
Microsoft Word and Google Docs have “smart quotes” enabled by default. Any code snippet typed or pasted into these editors gets its quotes silently replaced.
Slack, Teams, and Discord apply the same transformation outside code blocks. When a colleague pastes code into a Slack message without wrapping it in backticks, the quotes are converted to Unicode curly quotes.
Blog posts and tutorials copied from web pages frequently carry smart quotes, especially if the content was authored in a CMS like WordPress or Medium that applies typographic transformations. Even Stack Overflow answers can contain them if the author composed their answer in a word processor first.
PDF documents frequently use smart quotes, even in code listings. Copying code from a textbook, research paper, or documentation PDF is one of the most common sources of this issue.
The problem goes beyond quotes
Smart quotes rarely travel alone. The same sources that produce curly quotes also inject:
- Non-breaking spaces (U+00A0) that look like regular spaces but are not valid indentation, so Python rejects the line (recent Python 3 versions usually report SyntaxError: invalid non-printable character U+00A0)
- Zero-width spaces (U+200B) that hide inside variable names and other tokens, which Python likewise rejects as invalid characters
- Em-dashes (U+2014) that replace minus signs and hyphens, breaking arithmetic expressions
- BOM markers (U+FEFF) at the start of the pasted text, which cause a SyntaxError once they end up anywhere but the very beginning of a file
Unformat's Developer mode catches all of these simultaneously.
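If you want to locate these characters yourself before cleaning, a short scanner can report where each one sits. The sketch below is an illustration, not part of Unformat: the SUSPECTS set and the find_suspects name are assumptions, and the set only covers the characters named in this article.

```python
import unicodedata

# Characters this article names as common paste artifacts (not exhaustive).
SUSPECTS = {
    "\u201c", "\u201d", "\u2018", "\u2019",  # curly quotes
    "\u00a0",                                # non-breaking space
    "\u200b",                                # zero-width space
    "\u2014", "\u2013",                      # em and en dashes
    "\ufeff",                                # byte order mark
}


def find_suspects(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, description) for each suspect character, 1-based."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SUSPECTS:
                name = unicodedata.name(ch, "UNNAMED")
                hits.append((line_no, col, f"U+{ord(ch):04X} {name}"))
    return hits


sample = "x\u00a0= \u201chi\u201d\ny = 1 \u2014 2\n"
for line_no, col, desc in find_suspects(sample):
    print(f"line {line_no}, col {col}: {desc}")
```

Reporting line and column is a deliberate choice here: the characters are invisible or near-invisible, so a position is more useful than a simple yes/no answer.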
How to sanitize Python code with Unformat
Use Developer mode for code cleaning. It replaces all Unicode quote variants with their ASCII equivalents and simultaneously fixes every other invisible character that breaks Python.
Developer mode handles the full set of problematic characters:
- “ ” → " and ‘ ’ → ' (smart quotes → straight ASCII quotes)
- U+00A0 → regular space (fixes IndentationError)
- U+200B U+200C U+200D U+FEFF → removed entirely (fixes phantom characters)
- U+2014 U+2013 → - (em/en-dashes → regular hyphens)
- Tab characters → spaces (configurable: 2 or 4 spaces, or keep tabs)
- BOM markers → removed (fixes first-line SyntaxError)
- \r\n → \n (CRLF → LF line endings)
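The replacements listed above can be approximated in a script with a single translation table. This is only a sketch of the same mapping, not Unformat's actual implementation; the names CLEANUP_TABLE and clean_code and the tab_width parameter are hypothetical, and tabs are replaced one-for-one rather than aligned to tab stops.

```python
from typing import Optional

# Map each problem code point to its replacement; None deletes the character.
CLEANUP_TABLE = str.maketrans({
    "\u201c": '"', "\u201d": '"',    # curly double quotes
    "\u2018": "'", "\u2019": "'",    # curly single quotes
    "\u00a0": " ",                   # non-breaking space
    "\u2014": "-", "\u2013": "-",    # em and en dashes
    "\u200b": None, "\u200c": None,  # zero-width space, zero-width non-joiner
    "\u200d": None, "\ufeff": None,  # zero-width joiner, byte order mark
})


def clean_code(text: str, tab_width: Optional[int] = 4) -> str:
    """Apply the table, normalize CRLF to LF, and optionally replace tabs."""
    text = text.translate(CLEANUP_TABLE).replace("\r\n", "\n")
    if tab_width is not None:  # pass None to keep tab characters
        text = text.replace("\t", " " * tab_width)
    return text


dirty = "\ufeffname\u00a0= \u2018Alice\u2019\r\nprint(f\u201c{na\u200bme}\u201d)\r\n"
print(clean_code(dirty))
```

Note that a table like this also rewrites curly quotes that appear inside string literals, which is usually what you want for pasted code but will change any text that used them on purpose.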
Fixing it programmatically in Python
If you need to fix smart quotes in a Python script rather than manually, here's a quick approach:
def fix_smart_quotes(text: str) -> str:
    """Replace curly quotes, non-breaking spaces, and dashes with ASCII."""
    replacements = {
        '\u201c': '"', '\u201d': '"',  # double quotes
        '\u2018': "'", '\u2019': "'",  # single quotes
        '\u00a0': ' ',                 # non-breaking space
        '\u2014': '-', '\u2013': '-',  # em and en dashes
    }
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text
But when you just need to quickly clean a pasted snippet, Unformat is faster than writing a script. Paste, clean, copy: done in under a second.
All processing runs in your browser. Your code never touches a server, which matters when you're working with proprietary code or sensitive data.
How to clean your text
- Copy the Python code that has smart quote errors.
- Switch to Developer mode using the toggle above the text area.
- Paste your code into the text area (Ctrl+V or Cmd+V).
- The code is instantly cleaned: curly quotes become straight quotes and invisible characters are removed.
- Verify the indentation setting (2 or 4 spaces) matches your project via the gear icon.
- Click "Copy Clean Text" or press Ctrl+K, then paste into your editor.
- Run your Python code. The SyntaxError should be gone.
Frequently Asked Questions
Why does Python show SyntaxError: invalid character?
Python's lexer only accepts ASCII quote characters (U+0022 for double quotes, U+0027 for single quotes) as string delimiters. When you paste code that contains Unicode curly quotes (U+201C, U+201D, U+2018, U+2019), Python sees them as invalid characters because they are not recognized quote delimiters. The error message usually includes the Unicode code point, like U+201C, which confirms the issue is a smart quote.
Which Python versions are affected by smart quotes?
All Python versions are affected, including Python 2.7, 3.x, and the latest releases. The Python lexer has never accepted Unicode curly quotes as string delimiters and likely never will, since doing so would be ambiguous (these characters are valid in Unicode string content, just not as delimiters).
Can non-breaking spaces cause IndentationError in Python?
Yes, they break indented code. Python uses indentation for block structure, and it expects regular ASCII spaces (U+0020) or tabs. A non-breaking space (U+00A0) looks identical to a regular space but is a different character, and Python does not treat it as whitespace. Recent Python 3 releases usually report this as SyntaxError: invalid non-printable character U+00A0 rather than as an IndentationError, but the cause and the fix are the same. Unformat converts all non-breaking spaces to regular spaces.
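To see how your own interpreter reports this, you can compile a snippet whose indentation uses U+00A0 and print the exception. This is a small sketch with an illustrative helper name, try_compile. The error class and wording vary by Python version, so the first result is not shown here.

```python
def try_compile(source: str) -> str:
    """Compile source and return 'ok' or the exception type and message."""
    try:
        compile(source, "<pasted>", "exec")
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return f"{type(exc).__name__}: {exc.msg}"
    return "ok"


nbsp_indented = "if True:\n\u00a0\u00a0\u00a0\u00a0print('hi')\n"
space_indented = nbsp_indented.replace("\u00a0", " ")

print(try_compile(nbsp_indented))   # an error; wording depends on the version
print(try_compile(space_indented))  # the same code with regular spaces
```

Catching SyntaxError covers both cases, since IndentationError inherits from it.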
How do I prevent smart quotes when copying code from Slack?
In Slack, always share code inside code blocks (wrap with backticks ` for inline or ``` for multi-line). Slack preserves straight quotes inside code blocks but converts them to smart quotes in regular message text. If someone has already sent code without backticks, copy it and paste it into Unformat to fix the quotes.
Does this tool handle Python f-strings and triple-quoted strings?
Yes. Unformat replaces all Unicode curly quotes with ASCII straight quotes regardless of their position in the text. This correctly handles f-strings (f"..."), triple-quoted strings ("""..."""), raw strings (r"..."), and byte strings (b"..."). The tool processes at the character level, so string type prefixes are preserved.
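A character-level replacement like the one described can be checked against these string forms in a few lines. The sketch below is not Unformat's code: straighten_quotes is a hypothetical stand-in that handles only the four curly quote characters, and ast.parse is used to confirm that the result is valid Python.

```python
import ast

QUOTE_TABLE = str.maketrans({
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u2018": "'", "\u2019": "'",  # curly single quotes
})


def straighten_quotes(text: str) -> str:
    """Replace curly quotes with ASCII quotes; every other character is kept."""
    return text.translate(QUOTE_TABLE)


# A pasted snippet with plain, f-, triple-quoted, raw, and byte strings.
pasted = (
    "name = \u2018Alice\u2019\n"
    "greeting = f\u201cHello, {name}\u201d\n"
    "doc = \u201c\u201c\u201cmulti\nline\u201d\u201d\u201d\n"
    "pattern = r\u201c\\d+\u201d\n"
    "data = b\u201craw\u201d\n"
)

cleaned = straighten_quotes(pasted)
ast.parse(cleaned)  # raises SyntaxError if the cleaned code is still invalid
print(cleaned)
```

Because only the quote characters are touched, the f, r, and b prefixes pass through unchanged and three consecutive curly quotes become an ordinary triple quote.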