Unicode Angst in Japan and East Asia


The site Unicode in Japan tracks the history of character encoding in Japan and explains the technical and not-so-technical objections raised by Unicode detractors. Norman Goundry wrote an even harsher criticism in 2001.

One problem for the East Asian languages is that different countries (China, Taiwan, Japan) may use different shapes to draw the "same" character. But since Chinese writing is made up of thousands of characters, the question then becomes how many variations need to be encoded.

The Unicode Consortium proposed Han Character Unification to avoid designating too many characters, but this has its quirks. One potential problem is that the same "character" could look very different if you are using a Japanese font vs. a Chinese font. Thus you are back to specifying fonts again.
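To make the unification issue concrete, here is a minimal Python sketch. It assumes U+76F4, a character often cited as having different preferred shapes in Japanese and Chinese typography, and the helper `tag_language` is a hypothetical name of my own. The point is only that the encoded text is identical either way, so the language (and therefore the font choice) has to be signaled outside the character data, for example with an HTML `lang` attribute.

```python
import unicodedata

# U+76F4 is often cited as a unified ideograph whose preferred glyph
# differs between Japanese and Chinese fonts (an illustrative choice).
CHAR = "\u76f4"


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def tag_language(text: str, lang: str) -> str:
    """Hypothetical helper: wrap text in an HTML span with a language tag."""
    return f'<span lang="{lang}">{text}</span>'


# The character data is the same in both cases; only the markup differs.
japanese = tag_language(CHAR, "ja")
chinese = tag_language(CHAR, "zh-Hans")

print(describe(CHAR))
print(japanese)
print(chinese)
```

Whether the two spans actually render differently depends on the browser and the fonts installed, which is exactly the "back to specifying fonts" complaint.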

Issues like this are one reason national encodings like Shift-JIS for Japanese persist. They have also encouraged alternatives: the Mojikyo character set, for instance, has been developed apart from Unicode specifically to support archaic Japanese characters and other variants.
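For readers who have not handled a legacy encoding directly, the sketch below shows what coexisting with Shift-JIS looks like in practice. It uses only Python's built-in codecs; the sample word and the helper `encoded_sizes` are my own illustrative choices, not anything taken from the sites mentioned above.

```python
# The word for "Japanese language", written as three kanji.
TEXT = "\u65e5\u672c\u8a9e"


def encoded_sizes(text, codecs=("shift_jis", "euc_jp", "utf-8")):
    """Return the byte length of text under each named codec."""
    return {name: len(text.encode(name)) for name in codecs}


sizes = encoded_sizes(TEXT)

# Converting to the legacy encoding and back is lossless for characters
# that the legacy character set actually contains.
legacy_bytes = TEXT.encode("shift_jis")
round_trip = legacy_bytes.decode("shift_jis")

# Reading the same bytes with the wrong codec yields garbage, not the text.
misread = legacy_bytes.decode("utf-8", errors="replace")

print(sizes)
print(round_trip == TEXT)
print(misread == TEXT)
```

The same bytes mean nothing without knowing which encoding produced them, so every file or page in a national encoding has to be labeled correctly. Characters outside the legacy set cannot be encoded in it at all.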

Is it hopeless? Probably not. For one thing, Unicode has been evolving rapidly, so Unicode in 2006 is quite different from Unicode in 2001. Every version from Unicode 3.1 through Unicode 5.0 has added characters and specifications to resolve older issues with Asian encoding.

Another plus is that the Unicode Consortium seems to be changing its policy on unifying every script: all sorts of historical variations are popping up, even in the Western European Latin blocks. My favorites have been the encoding of German Fraktur letters and Gaelic alphabetic variants.

About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage (ejp10@psu.edu) for a profile.


