Fixing Encoding Issues

Last updated: 2026-02-20 · Troubleshooting

What Are Encoding Issues?

Character encoding defines how text characters are stored as bytes. When the wrong encoding is used to read a file, special characters — accented letters (é, ñ, ü), CJK characters, or symbols — appear as garbled text (sometimes called "mojibake").
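To make the mechanism concrete, here is a minimal Python sketch of how mojibake arises. It assumes the text was saved as UTF-8 and then read as Windows-1252 (Python codec name cp1252), which is only one of many possible mismatches.

```python
# The same bytes yield different text depending on the codec used to read them.
original = "München"
utf8_bytes = original.encode("utf-8")    # "ü" is stored as two bytes: 0xC3 0xBC

garbled = utf8_bytes.decode("cp1252")    # wrong codec: each byte becomes its own character
correct = utf8_bytes.decode("utf-8")     # right codec: the two bytes become "ü" again

print(len(utf8_bytes), "bytes")
print("read as cp1252:", garbled)
print("read as utf-8: ", correct)
```

The bytes on disk never change; only the interpretation does, which is why choosing the correct source encoding is usually enough to fix the display.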

Common Symptoms

  • Place names with accented characters appear as MÃ¼nchen instead of München
  • Attribute values show the replacement character (�) or stray sequences such as â€" where punctuation should be
  • Chinese, Japanese, or Korean text appears as question marks or boxes
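If you already have text showing the first kind of symptom, it can sometimes be repaired by reversing the bad decode. The sketch below assumes a single round of UTF-8 read as Windows-1252; `repair_mojibake` is a hypothetical helper name, not part of ConvertGeoData.

```python
def repair_mojibake(text: str, wrong: str = "cp1252", right: str = "utf-8") -> str:
    """Undo one round of mis-decoding; return the input unchanged if that fails."""
    try:
        # Re-create the original bytes, then decode them with the intended codec.
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not produced by this particular mismatch, so leave it alone.
        return text


print(repair_mojibake("M\u00c3\u00bcnchen"))  # garbled input, written with escapes
print(repair_mojibake("München"))             # already correct, returned as-is
```

Text that already contains � or question marks generally cannot be recovered this way, because the original bytes were discarded. In that case, go back to the source file and convert it again with the right encoding.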

How ConvertGeoData Handles Encoding

  1. Auto-detection: We check for a .cpg file (Shapefiles) and BOM markers, and we use heuristic detection.
  2. Manual override: In the conversion wizard, you can specify the source encoding if auto-detection gets it wrong.
  3. UTF-8 output: All output files are written in UTF-8 (the universal standard).
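The detection step above can be sketched roughly as follows. This is an illustration only, not ConvertGeoData's actual implementation: the function name `guess_encoding`, the order of the checks, and the heuristic (try strict UTF-8, otherwise fall back to Windows-1252) are all assumptions.

```python
import codecs
from pathlib import Path
from typing import Optional

# BOM signatures, longest first so UTF-32 LE is not mistaken for UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes, cpg_path: Optional[Path] = None,
                   fallback: str = "cp1252") -> str:
    """Return a Python codec name for `data` (hypothetical helper)."""
    # 1. A .cpg sidecar file, if present, names the encoding explicitly.
    if cpg_path is not None and cpg_path.is_file():
        name = cpg_path.read_text(encoding="ascii", errors="ignore").strip()
        if name:
            try:
                return codecs.lookup(name).name
            except LookupError:
                pass  # unrecognized name: continue with the other checks
    # 2. A byte order mark at the start of the data identifies a Unicode encoding.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 3. Simple heuristic: bytes that decode as strict UTF-8 are very likely UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return fallback


print(guess_encoding("München".encode("utf-8")))
print(guess_encoding("München".encode("cp1252")))
```

A heuristic like this cannot tell single-byte legacy encodings apart (Windows-1250 bytes are also valid Windows-1252), which is why a manual override remains useful.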

Common Encodings by Region

Region                  Likely Encoding              IANA Name
Western Europe          Windows-1252 or ISO-8859-1   windows-1252
Central/Eastern Europe  Windows-1250 or ISO-8859-2   windows-1250
Japan                   Shift_JIS or EUC-JP          shift_jis
China                   GBK or GB2312                gbk
Korea                   EUC-KR                       euc-kr
Universal (modern)      UTF-8                        utf-8
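If you script your own checks, the IANA names in the table can be used directly as codec names in Python, which accepts them as aliases. The sketch below assumes that workflow; the dictionary and the `decode_for_region` helper are illustrative names.

```python
import codecs

# IANA names from the table above, keyed by region.
REGION_ENCODINGS = {
    "Western Europe": "windows-1252",
    "Central/Eastern Europe": "windows-1250",
    "Japan": "shift_jis",
    "China": "gbk",
    "Korea": "euc-kr",
    "Universal (modern)": "utf-8",
}


def decode_for_region(data: bytes, region: str) -> str:
    """Decode bytes with the encoding most likely for the given region."""
    return data.decode(REGION_ENCODINGS[region])


# Confirm every name is known to Python, then decode a legacy sample.
for iana_name in REGION_ENCODINGS.values():
    codecs.lookup(iana_name)  # raises LookupError for an unknown name

print(decode_for_region(b"M\xfcnchen", "Western Europe"))
```

The region is only a starting guess. If the result still looks wrong, try the alternative listed in the "Likely Encoding" column.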

Tips

  • If you created the data, always save in UTF-8 when possible.
  • For Shapefiles, include a .cpg file containing just the encoding name (e.g., UTF-8).
  • If auto-detection fails, try Windows-1252 first — it's the most common legacy encoding for Western European data.
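The first two tips can be scripted. The sketch below assumes a plain-text source such as a CSV for the conversion step (a Shapefile's .dbf is binary and should be re-exported from GIS software instead). Both helper names are hypothetical.

```python
from pathlib import Path


def write_cpg(shp_path: str, encoding: str = "UTF-8") -> Path:
    """Create a .cpg sidecar next to a Shapefile, containing just the encoding name."""
    cpg = Path(shp_path).with_suffix(".cpg")
    cpg.write_text(encoding, encoding="ascii")
    return cpg


def transcode_to_utf8(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Rewrite a plain-text file as UTF-8, assuming `source_encoding` is correct."""
    # Work on raw bytes so line endings are preserved exactly.
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
```

For example, `write_cpg("roads.shp")` would create `roads.cpg` containing `UTF-8`. Only declare UTF-8 in the .cpg file if the attribute data really is stored as UTF-8; a wrong declaration produces the same garbled text as a missing one.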
