Recently in South Asian Category
What better way to celebrate the 100th entry in this blog than with...a correction. It's a humble reminder that just because you know a lot about Unicode doesn't mean you can't mess up a crucial detail.
Way back in 2007, I posted an entry about generating Arabic (calligraphic) numbers in Microsoft Office (i.e. "١,٢,٣" vs. "1,2,3"). The entry noted that in Arabic, the term "Arabic number" actually refers to the Western digits (1,2,3), which Unicode calls DIGIT ONE, DIGIT TWO, and so on. The Arabic term for numbers like ١,٢,٣ is "Hindi number" (ARABIC-INDIC DIGIT ONE, ARABIC-INDIC DIGIT TWO... in Unicode).
But the numbers I displayed as "Hindi/Arabic" were actually the Devanagari numbers as used in India (e.g. १,२,३). In Unicode these are called DEVANAGARI DIGIT ONE, DEVANAGARI DIGIT TWO, and so on. Fortunately, Eric Verlind pointed out the flaw, so I was able to correct the forms. Eric also pointed me to a Microsoft Digit Support page where I learned there are variations for Arabic, Persian and Urdu.
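To make the distinction concrete, here is a minimal Python sketch that looks up the Unicode names of the digit "one" in each set using the standard `unicodedata` module. The `samples` table and the `to_western` helper are my own illustrative names, and the "Extended Arabic-Indic" row is my assumption about which digits the Persian and Urdu variations refer to.

```python
import unicodedata

# The digit "one" from each digit set discussed above.
# The Extended Arabic-Indic entry is assumed to be the Persian/Urdu variant.
samples = {
    "Western": "\u0031",
    "Arabic-Indic": "\u0661",
    "Extended Arabic-Indic": "\u06F1",
    "Devanagari": "\u0967",
}

for label, ch in samples.items():
    # unicodedata.name() returns the official character name;
    # unicodedata.decimal() returns the numeric value of a decimal digit.
    print(f"{label}: U+{ord(ch):04X} {unicodedata.name(ch)} "
          f"(value {unicodedata.decimal(ch)})")


def to_western(text):
    """Replace every Unicode decimal digit in text with its ASCII digit."""
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)  # None if ch is not a decimal digit
        out.append(ch if value is None else str(value))
    return "".join(out)
```

Checking the name this way would have caught my original mistake: a character whose name starts with DEVANAGARI is not an Arabic-Indic digit, however similar the role it plays.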
The learning never stops in the Unicode world.
A great resource from my library is the South Asia Language Resource Center out of the University of Chicago. They include information about the major scripts of India and neighboring countries including font information (with samples).
Address is http://salrc.uchicago.edu/.
Unicode version 5.1 was recently released, and it includes some new code blocks as well as new specifications. As with all new versions of Unicode, there will be a time lag until the new items can be incorporated into fonts and utilities, but a partial list of the new items follows.
If you're interested in the new characters, the best place to view them is at http://www.unicode.org/charts/
New Plane 0 Scripts
- Cham (Cambodia/Vietnam)
- Kayah Li (Thailand/Myanmar)
- Lepcha (India)
- Ol Chiki/Santali (India)
- Rejang (Indonesia)
- Saurashtra (India)
- Sundanese (Indonesia)
- Vai (Liberia)
Script Extensions
These blocks add characters to previously encoded scripts.
- Cyrillic Extended-A
- Cyrillic Extended-B
- Arabic - characters for math, 4 Qur'anic characters and multiple characters for different languages
- Indic - Malayalam, Tamil character sequences, Devanagari chandra a, Sanskrit sounds in Gurmukhi, Oriya, Telugu
- Latin - characters for minority languages and capital German sharp S (rare)
- Math Symbols
- Medievalist Punctuation - for research
- Myanmar Additions
New Plane 1 Ancient Scripts and Miscellaneous Symbols
- Carian (Anatolia/Turkey)
- Lycian (Anatolia/Turkey)
- Lydian (Anatolia/Turkey)
- Phaistos Disc (Crete)
- Domino Tile Symbols
- Mahjong Tile Symbols
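
For readers wondering what the Plane 0 / Plane 1 split means in practice, here is a small Python sketch. The plane is just the code point divided by 0x10000, and Plane 1 characters need two UTF-16 code units (a surrogate pair), which is one reason older software can struggle with them. The block starting points below are what I believe the code charts show; treat them as assumptions and verify them against the charts. The helper names are my own.

```python
def plane(code_point):
    """Return the Unicode plane number (0 = BMP, 1 = SMP, ...)."""
    return code_point >> 16


def utf16_units(code_point):
    """Return how many 16-bit code units UTF-16 needs for this code point."""
    return len(chr(code_point).encode("utf-16-le")) // 2


# Assumed first code points of a few blocks from the lists above.
block_starts = {
    "Vai": 0xA500,
    "Cham": 0xAA00,
    "Lycian": 0x10280,
    "Mahjong Tiles": 0x1F000,
}

for name, start in block_starts.items():
    print(f"{name}: U+{start:04X} plane {plane(start)}, "
          f"{utf16_units(start)} UTF-16 code unit(s)")
```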