November 2007 Archives

Formatting Arabic Numbers

Someone posted an interesting question elsewhere that I hadn't pondered yet - why can you switch a keyboard to Arabic, Hindi or Japanese, but still end up with Western numbers?

Example Numbers
* Western (Latin, or "Arabic" in Arabic) - 0,1,2,3,4,5,6,7,8,9
* Arabic (or "Hindi" in Arabic) - ٠,١,٢,٣,٤,٥,٦,٧,٨,٩ (added March 2009)
* Hindi (actually Devanagari) - ०,१,२,३,४,५,६,७,८,९

Part of the answer is that Western numbers have become a true global standard. According to this Arabeyse.Org forum post from Arfeen Serajul, many western Arabic-speaking countries such as Morocco use ONLY Western numbers and are unfamiliar with what we call "Arabic numbers" (Arabic speakers call them "Hindi numbers").

But the other part of the answer is that the numbers really are numbers. If you input numbers into a spreadsheet like Excel, you want all the calculations to be accurate. From a computing point of view, you have one number, but a variety of options for how to display it (with Western as the default in the U.S.).
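To make the "one number, many displays" idea concrete, here is a minimal Python sketch. The names `display_number`, `WESTERN`, `ARABIC_INDIC` and `DEVANAGARI` are made up for this illustration; real operating systems and applications handle this through their own region settings, not through code like this.

```python
# Digit sets copied from the example list above (hypothetical constant names).
WESTERN = "0123456789"
ARABIC_INDIC = "٠١٢٣٤٥٦٧٨٩"
DEVANAGARI = "०१२३४५६७८९"


def display_number(value: int, digits: str = WESTERN) -> str:
    """Render an integer using the given set of ten digit characters."""
    # Map each Western digit to the digit at the same position in the target set.
    table = str.maketrans(WESTERN, digits)
    return str(value).translate(table)


# The arithmetic is done on the number itself, not on its display form.
total = 120 + 345

print(display_number(total))                # Western digits
print(display_number(total, ARABIC_INDIC))  # Arabic ("Hindi") digits
print(display_number(total, DEVANAGARI))    # Devanagari digits

# Python's int() accepts any Unicode decimal digits, so the displayed
# form converts back to the same underlying number.
print(int(display_number(total, ARABIC_INDIC)) == total)
```

The stored value never changes; only the characters chosen to draw it do, which is why the setting tends to live with the region options and not with the keyboard.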

So, to get non-Western numbers, you typically have to go into the Region settings, not the keyboard settings. Here are some sample instructions for Microsoft Windows. The big gotcha (and it's a doozy) is that you often change the number display setting ACROSS THE ENTIRE OPERATING SYSTEM.

I did experiment with displaying Arabic (Hindi) numbers, but ended up seeing them everywhere, even in English Web sites. Just a tad disorienting.

If you do need to display non-Western numbers, I would recommend doing it only in Word (there are some options). It's still tricky though - I had to do an AutoCorrect hack in one case (e.g. \1 = १). I think I missed a step somewhere....

Promoting UTF-8 over ASCII

At the last Unicode Conference in October, Computer Science professor Jiangping Wang gave a good talk about how to train new programmers (especially those in the U.S.) to write software which can easily handle Unicode.

One issue Dr. Wang mentioned is that when encoding is taught in traditional computer science programs, the coverage is very brief and sticks to ASCII only. This is obviously problematic since encoding has extended beyond ASCII since the 1980s. Another problem is that ASCII encoding isn't as complex as Unicode encoding.

Unicode isn't just about expanding the character set, but also about understanding additional typographic issues. For instance, Unicode contains characters which control text direction (left-to-right or right-to-left), which are not found in ASCII. In addition, Unicode can be presented in "several flavors" such as UTF-8, UTF-16 and so forth. ASCII also had a few national variants, but it was never dependent on byte order the way some Unicode encodings (such as UTF-16) are.
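For readers who want to see the "flavors" and the byte order issue for themselves, here is a small Python sketch. The helper name `encode_all` is made up for this example; the codec names and the `unicodedata` calls are from the Python standard library.

```python
import unicodedata


def encode_all(text: str) -> dict:
    """Encode the same text in several Unicode 'flavors' for comparison."""
    return {name: text.encode(name) for name in ("utf-8", "utf-16-le", "utf-16-be")}


# The letter "A" is one byte in ASCII and in UTF-8, but two bytes in UTF-16,
# and the order of those two bytes depends on the variant chosen.
print("A".encode("ascii"))
for name, data in encode_all("A").items():
    print(name, data)

# An accented letter is outside ASCII and takes two bytes in UTF-8.
print(encode_all("\u00e9")["utf-8"])

# Unicode also includes invisible characters that affect text direction,
# such as U+200F. ASCII has nothing comparable.
rlm = "\u200f"
print(unicodedata.name(rlm), unicodedata.bidirectional(rlm))
```

The UTF-16 variants contain the same two bytes for "A" in opposite order, which is exactly the kind of detail an ASCII-only lesson never has to mention.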

Of course Dr. Wang was "preaching to the choir" at Unicode 31 - WE all know how important proper Unicode support is. The real challenge is convincing others that Unicode is really the true wave of the future.

Will this ever happen? Actually, one thing that will probably accelerate adoption of Unicode is the development of online Web 2.0 technologies. Companies that want their tools to reach a global audience (e.g. Google, Yahoo, del.icio.us, Twitter) are building in Unicode support from the start. That way, anyone from Japan to Russia can tag their custom maps with their native characters.

I don't know about you, but nothing makes me feel more connected to the world wide web than seeing a Twitter posting in Cyrillic.

Some Ancient Script Mega Fonts

If you've achieved total script geekdom, then you especially want fonts which support ancient scripts as well as the modern ones. Unicode has been expanding its ancient script coverage, and fonts have been catching up in the past year or two.

Some of my favorites include:

MPH 2B Damase - available free from Gallery of Unicode Fonts. It covers many scripts, including the Aegean scripts, Phoenician, many cuneiform scripts, Glagolitic, and more.

Aegean, Akkadian, Unicode Symbols - These fonts and others are freeware fonts from George Douros. Just pick the ones you want. Note - he has a hieroglyphic font if you need it, but it's not Unicode compliant (Unicode is still working on this script).

Alphabetum Unicode - This font from Juan José Marcos comes highly recommended, but it does cost $15.

Code2001 - From James Kass. Technically it's still in beta, but that doesn't appear to concern anyone much.

Multilingual Mac Leopard Updates

I'm not on Leopard yet (I suspect our tech support staff would like the kinks to be shaken out first), but Multilingual Mac is making good notes on the new Unicode features which are available.

About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage (ejp10@psu.edu) for a profile.

Comments

The standard commenting utility has been disabled due to hungry spam. If you have a comment, please feel free to drop me a line at (ejp10@psu.edu).
