Recently in News Category

List of Old Church Slavonic Fonts

AATSEEL (American Association of Teachers of Slavic and East European Languages) has just posted a set of links to "Medieval Slavic Fonts" for Old Church Slavonic, Glagolitic and Blackletter.

See http://www.aatseel.org/medieval_slavic_font for more information.

The list includes both Unicode fonts and older non-Unicode fonts.

Unicode 31: Lessons from the "Front Line"

I had a hard time deciding which sessions to attend at the last Unicode conference, but I did end up at "Unicode at the Front Lines", which was a series of mini-presentations from scholars working with lesser-known languages and scripts. This is a place where the Unicode rubber really hits the road, and I learned some interesting "life-lessons".

1. The problem with "reforming" a script is that new readers may not be able to read the older texts. This came up in the context of the Tai Viet script (apparently the reform was so unpopular that they ditched it), but it also occurs in Chinese (Traditional vs. Simplified), Korean (new texts use only Hangul, but older ones included Chinese characters) and even in cases where spelling reform is enacted (as in the Netherlands and Germany).

BTW - I'm not against spelling/script reform, but we do have to admit that there will be some "loss" (enough to keep a few scholars in archaic languages in business).

2. Try not to invent a new letter for new languages. In the earlier part of the 20th century, linguists were fond of inventing quirky new symbols for languages they were documenting. A classic case is Igbo, which has a lot of vowels with dots beneath them, as in Ị, ị, Ọ, ọ, Ụ, ụ. There is no objection to the dots per se, but they are unusual compared with what Western alphabets do. Because these characters are outside the norm, Igbo internationalization has to play continual catch-up, because even programs which can handle Western European languages may not know what to do with the dots.

If your lesser-known language already includes letters that are common to the major languages, implementation of utilities in your language is much easier. Of course, I think Unicode is better for including dotted letters.

For now though...if you have a choice between "v" or "vh" in your language, the latter is (unfortunately) a little more Unicode ready.

3. H ≠ Η ≠ Н - For the record the first is English H /h/, the second is Greek capital Eta /ē/ and the last is Cyrillic En /n/. I knew that many capital letters are triple encoded (e.g. A/alpha/Cyrillic Ah), but this is the first time I realized that the phonetic values can be so different. Normally this isn't an issue unless you have linguists from all over Europe trying to use their native script for phonetic spellings. When do you have the right H?
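One way to tell which H you actually have is to ask for its code point and official character name. The following is only a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` is made up for this illustration.

```python
import unicodedata

# Three capital letters that look alike but are separate code points:
# Latin H, Greek Eta and Cyrillic En (written as escapes to avoid mix-ups).
look_alikes = ["H", "\u0397", "\u041D"]


def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in look_alikes:
    print(describe(ch))
```

The names make the script of each letter explicit, which the glyph alone does not.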

4. ŵ ≠ ŵ (it matters when you type the accent). Unicode supports "combining accents" (that is, an accent which can float over any letter), and in theory the combination of circumflex + w (to make ŵ) should be the same as w + circumflex (to make ŵ)... but it's not. A linguistic archive database has these accented letters but can't "merge" the two string combinations as one letter.

This wouldn't be too critical except that sometimes a linguist puts the accent before the w, and sometimes they put the w before the accent. These are the same worldwide linguists who gave us the problem of the three H's.

A member of the audience did suggest that it was a "training issue", but who are we kidding...these are FACULTY. Faculty are great scholars, but few are well-trained data entry operators.
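To make the ŵ problem concrete, here is a small sketch using Python's standard `unicodedata.normalize`. It is not how the archive database in the talk works (that wasn't described); the helper `same_letter` is a hypothetical name, and the choice of NFC normalization is my assumption about how one might compare such strings.

```python
import unicodedata

precomposed = "\u0175"    # ŵ stored as a single code point
decomposed = "w\u0302"    # w followed by COMBINING CIRCUMFLEX ACCENT
accent_first = "\u0302w"  # accent typed before the w


def same_letter(a, b):
    """Compare two strings after normalizing both to NFC."""
    nfc_a = unicodedata.normalize("NFC", a)
    nfc_b = unicodedata.normalize("NFC", b)
    return nfc_a == nfc_b


print(precomposed == decomposed)               # False: different code points
print(same_letter(precomposed, decomposed))    # True: NFC merges w + accent
print(same_letter(precomposed, accent_first))  # False: accent came first
```

Normalization merges the single-code-point ŵ with w + accent, but it does not rescue the accent-first sequence, because a combining accent attaches to whatever comes before it. That last case still needs cleanup at data entry.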

Is the IUC Conference Worth it? Absolutely!

My university was kind enough to send me to IUC 31 (http://www.unicodeconference.org) this year, and I can honestly say that it was one of the best conferences I've been to.

For one thing, almost all the major players (Microsoft, Apple, Sun, IBM, Adobe, Google, Yahoo, W3C) sent representatives, so I got to hear a lot of great information straight from the source. I've been hacking away at this for seven years, but I learned quite a bit of new information, especially about some of the more technical aspects.

The Unicode conference is also very good at offering how-tos ranging from absolute monolingual beginner to cutting-edge tools for the experienced Unicoder. Even the basics gave me some pointers that I had forgotten or hadn't considered. I obviously couldn't make all the sessions (no cloning yet), but the PDFs that attendees can access are fairly detailed and can help you track down what you missed.

I have to confess that my favorite track was probably "Unicode on the Front Lines", in which linguists described encoding issues for minority languages and scripts. From a language geek perspective, it's fascinating what new issues come up. More importantly, I saw that there was a lot of support for outreach in the Unicode community. I heard the members of the Unicode Org point some users to resources they hadn't known about before.

I myself gave a presentation about Unicode at Penn State, and I have to say most of the feedback was very positive, and I got a few tips myself.

So all in all, I have to say thanks to the organizers of the conference for putting on a great event.

Unicode 31 Presentation

I'm actually presenting at IUC 31 (http://www.unicodeconference.org) in San José about supporting international technology at Penn State. You can download the PowerPoint if you want to read more.

Download PowerPoint

North Korea Applies for Internet Domain Code

Although the country code KP (Democratic People's Republic of Korea/North Korea) has been available for some time, there had been no agency in North Korea to administer any .kp Web sites. This has been seen as a sign of the official policy to isolate North Korea from outside influences.

In August 2007, ICANN reported that it had received a request to "delegate this domain," but they said no decision had been reached as of Aug 14, 2007.

According to Prof. Kim Young-Soo, North Korea does have some access to the Internet, but it is probably available to only a few of the highest-level government officials, including Kim Jong-il.

Unicode 5 Released

The latest specs for Unicode 5.0 are out.
http://www.unicode.org/versions/Unicode5.0.0/

As with every new version, new characters have been added, including minor additions to older blocks like Latin, Math, Hebrew, Greek, Cyrillic and others.

In addition, five new blocks were added: Phoenician, Sumero-Akkadian Cuneiform, Balinese, Phags-pa and N'Ko.
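If you are curious whether your own software's character data already knows about these blocks, a quick check is possible in Python. This is only a rough sketch: the block starting points below are taken from the Unicode code charts, the names `NEW_BLOCKS` and `first_char_name` are invented for this example, and a recognized character name says nothing about whether a font or keyboard is installed.

```python
import unicodedata

# First code point of each new block (from the Unicode code charts).
NEW_BLOCKS = {
    "NKo": 0x07C0,
    "Balinese": 0x1B00,
    "Phags-pa": 0xA840,
    "Phoenician": 0x10900,
    "Cuneiform": 0x12000,
}


def first_char_name(code_point):
    """Return the character name, or a placeholder if it is unknown."""
    return unicodedata.name(chr(code_point), "<unassigned>")


print("Unicode data version:", unicodedata.unidata_version)
for block, start in NEW_BLOCKS.items():
    print(f"{block}: U+{start:04X} {first_char_name(start)}")
```

On any Python 3 release the character database is newer than Unicode 5.0, so all five lookups should return real names rather than the placeholder.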

What Does This Mean... Implementation-Wise?

I did want to add a cautionary note here: just because Unicode 5.0 has come out does not mean you'll be typing Phoenician in Windows Vista next week.

After the release, several steps must take place before a new script block is fully implemented. Roughly, they are:

1. New fonts must be created or old ones retooled. The fonts must be Unicode compliant so that the right glyphs are matched with the right code points. This is a fairly rapid, but critical step. You can post material online with just a font, but it's harder unless...

2. Someone develops a keyboard utility for that script. These allow you to type the characters directly from your keyboard instead of using escape codes or cutting and pasting from the Character Map (Win) or Character Palette (Mac).

The first ones are usually third-party tools. Sometimes they work perfectly within the operating system and sometimes they don't, depending on the quirks of the script.

3. In my opinion, true prime time acceptance occurs when the major vendors (Microsoft/Apple/Firefox/Adobe) build support for a script into their products. This may take several years and a certain amount of wrangling.

Also, if you're a font purist (and good designers must be), it should be noted that the first fonts are almost always "underdeveloped" and ligatures may not be as pretty as they could be. Fortunately, most first fonts are now being developed in OpenType, so the initial quality is a little better than with the old TrueType fonts.