Got Unicode?
Japanese Notes
Japanese is an East Asian script, but differs significantly from the Chinese script because it uses three phonetic scripts in addition to the Chinese kanji characters.
Basic How-Tos
If you just want to set up Japanese on your Windows or Mac computer, see the Penn State Japanese Set Up Page.
Multiple Scripts
The Japanese script is considered one of the most complex because it combines four writing systems in one. Fortunately, three of them are phonetic, but you cannot be considered educated until you can also read Chinese Kanji. The scripts are:
- Katakana - Based on Chinese, but each symbol is a syllable. Used for foreign words or technical vocabulary.
- Hiragana - Also based on Chinese, but rounder. Each symbol is also a syllable. Often used for grammatical endings.
- Rōmaji - Roman (English) alphabet, often mixed in with other scripts in modern Japan
- Kanji - the set of Chinese characters used in Japanese. However, not all Japanese characters are the same as the characters used for Chinese (hanzi) (Japan Reference)
Phonetic scripts developed in Japan partly as a way to write Japanese inflectional endings (okurigana) not found in Chinese.
Still more
In addition to the forms found on the Web, there are a few more variants:
- Furigana - Kanji characters with miniature Katakana or Hiragana above or below to show the phonetic pronunciation. Technically
- Hentaigana - an archaic syllabary found in soba noodle shops, diplomas, invitations and other times when a formal script might be used. Can also refer to a style of Japanese calligraphy.
- Manyogana - Another syllabary with Chinese Kanji used only for their phonetic value (not their meaning). These were used in ancient poetry.
Information about these additional scripts can be found at these sites:
- Wikipedia Japanese Writing System (links in right menu)
- Japanese Studies Com Hentaigana
As of September 2006, neither Hentaigana nor Manyogana blocks had been developed in Unicode, but there may be fonts that display either Hiragana or Katakana in one of the older styles.
Unicode Angst in Japan and Asia
If you ever subscribe to any Unicode list, you will quickly discover that many non-English speakers have strong negative feelings about Unicode, and Japan is no exception.
The site Unicode in Japan tracks the history of encoding in Japan and explains the technical and not-so-technical issues for Unicode detractors. An even harsher criticism was written by Norman Goundry (dated 2001).
One problem for the East Asian languages is that different countries (China, Taiwan, Japan) may use different shapes to draw the "same" character. But since Chinese writing is made up of thousands of characters, the question then becomes how many variations are needed.
The Unicode Consortium proposed Han Character Unification to avoid designating too many characters, but this has its quirks. One potential problem is that the same "character" could look very different if you are using a Japanese font vs. a Chinese font. Thus you are back to specifying fonts again.
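One way to see the quirk: a unified character is a single code point no matter which language it appears in, so the text itself carries no hint about which regional shape to draw. On the Web, the usual workaround is to tag the text with a language so the browser can choose a suitable font. The Python sketch below builds such tagged HTML. The helper name `tag_language` is hypothetical, and 直 (U+76F4) is used only as a commonly cited example of a character drawn differently in Japanese and Chinese fonts; what actually appears depends on the fonts installed.

```python
from html import escape


def tag_language(text: str, lang: str) -> str:
    """Wrap text in a <span> carrying an HTML lang attribute."""
    return '<span lang="{}">{}</span>'.format(escape(lang, quote=True), escape(text))


ch = "\u76f4"  # one unified code point shared by Japanese and Chinese

japanese = tag_language(ch, "ja")       # hint: prefer a Japanese font
chinese = tag_language(ch, "zh-Hans")   # hint: prefer a Simplified Chinese font

if __name__ == "__main__":
    print(hex(ord(ch)))  # the same code point in both cases
    print(japanese)
    print(chinese)
```

The two strings differ only in the `lang` value; the character data is identical, which is exactly why the choice of glyph falls back on fonts.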
Issues like this are one reason national character sets like Shift-JIS for Japanese persist. For instance, the Mojikyo Character set has been developed apart from Unicode specifically to support archaic Japanese characters and other variants.
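To make the difference between a national character set and Unicode concrete, here is a short Python sketch that encodes the same Japanese word both ways. It relies on Python's built-in `shift_jis` and `utf-8` codecs; the helper `decodes_as` is a hypothetical name used only in this example.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if data is valid in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


word = "日本語"  # "Japanese language", three kanji

sjis_bytes = word.encode("shift_jis")  # two bytes per kanji
utf8_bytes = word.encode("utf-8")      # three bytes per kanji

if __name__ == "__main__":
    print(len(sjis_bytes), len(utf8_bytes))
    # Each byte string round-trips only through its own encoding.
    print(sjis_bytes.decode("shift_jis") == word)
    print(decodes_as(sjis_bytes, "utf-8"))
```

The same text produces different bytes in each encoding, so a page saved as Shift-JIS but read as UTF-8 (or the reverse) will display garbled characters unless the encoding is declared correctly.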
Is it hopeless? Probably not. For one thing, Unicode has been evolving rapidly, so that 2006 Unicode is quite different from 2001 Unicode. Every version from Unicode 3.1 through Unicode 5.0 has added characters and specifications to resolve older issues with Asian encoding.
Another plus is that the Unicode Consortium seems to be changing its policy on unifying every script...all sorts of historical variations are popping up in even the Western European Latin blocks. My favorite has been the encoding of German Fraktur letters and Gaelic alphabetic variants.
RUBY Vertical Text for Furigana
Did you want vertical text or furigana support on the Web? Well maybe you'll get it some day, but not right now (unless you go the PDF route).
But check in with the W3C RUBY Annotation Specification page for more details and tests. Currently, CSS3 is scheduled to include RUBY formatting attributes.
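For reference, ruby markup pairs a base text with its reading, plus optional parentheses for browsers that cannot display the annotation. The Python sketch below generates that markup using the `ruby`, `rb`, `rt` and `rp` elements described in the W3C specification; the function name `ruby_markup` is an assumption for this example, and how (or whether) the result is displayed depends on the browser.

```python
from html import escape


def ruby_markup(base: str, reading: str) -> str:
    """Build simple ruby markup: base text, reading, and fallback parentheses."""
    return (
        "<ruby><rb>{}</rb><rp>(</rp><rt>{}</rt><rp>)</rp></ruby>"
        .format(escape(base), escape(reading))
    )


if __name__ == "__main__":
    # Kanji base text with its hiragana reading as furigana.
    print(ruby_markup("漢字", "かんじ"))
```

A browser without ruby support would ignore the unknown tags and show the reading in parentheses after the kanji, which is the purpose of the `rp` elements.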
CSS3 is also scheduled to include a "writing-mode" attribute for other types of vertical writing, but these must first be incorporated into the various browsers and text devices. But I'm positive... Some year, "someday" may be today!