Fixing Encoding Issues
Last updated: 2026-02-20 · Troubleshooting
What Are Encoding Issues?
Character encoding defines how text characters are stored as bytes. When the wrong encoding is used to read a file, special characters — accented letters (é, ñ, ü), CJK characters, or symbols — appear as garbled text (sometimes called "mojibake").
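A short Python sketch can make this concrete. It uses only standard-library codecs; the variable names are illustrative. It encodes a place name as UTF-8, decodes the bytes with the wrong encoding to produce mojibake, then reverses the mistake.

```python
# Text as it should appear.
original = "München"

# Stored as UTF-8, "ü" becomes the two bytes 0xC3 0xBC.
raw = original.encode("utf-8")

# Decoding those bytes as Windows-1252 maps each byte to its own character.
garbled = raw.decode("windows-1252")
print(garbled)  # MÃ¼nchen

# Reversing the wrong step recovers the text, provided no bytes were lost
# or replaced along the way.
repaired = garbled.encode("windows-1252").decode("utf-8")
print(repaired)  # München
```

The round trip only works while the garbled text still contains every original byte. Once a character has been replaced by � or a question mark, the information is gone and the source file has to be re-read with the correct encoding.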
Common Symptoms
- Place names with accented characters appear as MÃ¼nchen instead of München
- Attribute values show � (the replacement character) or stray sequences such as â€"
- Chinese, Japanese, or Korean text appears as question marks or boxes
How ConvertGeoData Handles Encoding
- Auto-detection: We check for a .cpg file (Shapefiles) and for BOM markers, and we use heuristic detection.
- Manual override: In the conversion wizard, you can specify the source encoding if auto-detection gets it wrong.
- UTF-8 output: All output files are written in UTF-8 (the universal standard).
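The three auto-detection checks can be sketched roughly as follows. This is not ConvertGeoData's actual implementation. The function name guess_encoding, the order of the checks, the fallback value, and the "strict UTF-8 decode" heuristic are all assumptions chosen for illustration.

```python
import codecs
from pathlib import Path

# BOMs are checked longest-first so that UTF-32 LE (FF FE 00 00)
# is not mistaken for UTF-16 LE (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data_path, fallback="windows-1252"):
    """Hypothetical helper: guess a Python codec name for a text-based file."""
    data_path = Path(data_path)

    # 1. Sidecar .cpg file with the same base name (Shapefile convention).
    cpg_path = data_path.with_suffix(".cpg")
    if cpg_path.exists():
        declared = cpg_path.read_text(encoding="ascii", errors="ignore").strip()
        try:
            return codecs.lookup(declared).name
        except LookupError:
            pass  # Empty or unrecognized name: fall through to the other checks.

    raw = data_path.read_bytes()

    # 2. Byte order mark at the start of the file.
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name

    # 3. Simple heuristic (an assumption): bytes that decode cleanly as strict
    #    UTF-8 are treated as UTF-8; anything else gets the legacy fallback.
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return fallback
```

Steps 2 and 3 only make sense for text formats such as CSV or GeoJSON. A Shapefile's .dbf file is binary, so a real detector would need to inspect only the field values, not the whole file.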
Common Encodings by Region
| Region | Likely Encoding | IANA Name |
|---|---|---|
| Western Europe | Windows-1252 or ISO-8859-1 | windows-1252 |
| Central/Eastern Europe | Windows-1250 or ISO-8859-2 | windows-1250 |
| Japan | Shift_JIS or EUC-JP | shift_jis |
| China | GBK or GB2312 | gbk |
| Korea | EUC-KR | euc-kr |
| Universal (modern) | UTF-8 | utf-8 |
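When the region of origin is uncertain, one way to narrow things down is to decode a small sample of the bytes with each candidate from the table and compare the results by eye. The sketch below assumes Python's standard codecs, which accept the IANA names listed above. The helper name preview_decodings and the sample text are illustrative.

```python
# Candidate codec names taken from the "IANA Name" column of the table.
CANDIDATES = ["utf-8", "windows-1252", "windows-1250", "shift_jis", "gbk", "euc-kr"]


def preview_decodings(sample, candidates=CANDIDATES):
    """Hypothetical helper: decode sample bytes with each candidate codec.

    Returns a dict mapping codec name to the decoded text, or to None
    when the bytes are not valid in that encoding.
    """
    results = {}
    for name in candidates:
        try:
            results[name] = sample.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Example: bytes that were actually written as Shift_JIS.
sample = "東京".encode("shift_jis")
for name, text in preview_decodings(sample).items():
    print(f"{name:14} {text!r}")
```

A decode that succeeds is not proof of the right encoding. Single-byte encodings such as Windows-1252 accept almost any byte sequence and simply produce the wrong characters, so the output still needs a visual check against known place names.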
Tips
- If you create the data yourself, save it in UTF-8 whenever possible.
- For Shapefiles, include a .cpg file containing just the encoding name (e.g., UTF-8).
- If auto-detection fails, try Windows-1252 first; it's the most common legacy encoding for Western European data.
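The first two tips can be scripted before upload. The following is a minimal sketch using only the Python standard library. The helper names write_cpg and reencode_to_utf8 and the file names in the usage example are hypothetical.

```python
from pathlib import Path


def write_cpg(shapefile_path, encoding="UTF-8"):
    """Hypothetical helper: write a .cpg sidecar next to a Shapefile.

    The .cpg only declares the encoding; it does not convert the .dbf,
    so the name written here must match how the attributes are stored.
    """
    cpg_path = Path(shapefile_path).with_suffix(".cpg")
    cpg_path.write_text(encoding, encoding="ascii")
    return cpg_path


def reencode_to_utf8(src, dst, source_encoding="windows-1252"):
    """Hypothetical helper: convert a text file (e.g. CSV) to UTF-8."""
    # Working on bytes keeps the original line endings untouched.
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
```

For example, write_cpg("places.shp") creates places.cpg containing UTF-8, and reencode_to_utf8("places_legacy.csv", "places_utf8.csv") rewrites a Windows-1252 CSV as UTF-8. If source_encoding is wrong, the conversion may still succeed and bake the garbled characters into the new file, so check a few known values afterwards.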