ELIZABETH J PYATT: June 2007 Archives

Visual Basic 6.0 in Unicode


This tutorial discusses using Unicode within Visual Basic 6.0 for Windows.



Migrating PHP To Unicode


Here's a site documenting how a PHP programmer migrated to UTF-8 (with some of the gotchas).



Getting the ř of Dvořák


If you've been visiting accent code pages looking for the hachek R (ř) found in the famous Czech composer's name...chances are it's not there. That's because the tables only cover the accented letters found in Western European languages, or in Unicode terms, accented characters with code points 0-255. You can see the Penn State Encoding Tutorial if you want the full details.
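To make the 255 cutoff concrete, here is a minimal Python sketch that lists the code point of each letter in the name. The `describe` helper is a hypothetical name used only for illustration; it relies on the standard `unicodedata` module.

```python
import unicodedata

def describe(text):
    """Return (char, code point label, over-255 flag, Unicode name) per character."""
    return [
        (ch, f"U+{ord(ch):04X}", ord(ch) > 255, unicodedata.name(ch))
        for ch in text
    ]

# "Dvořák" written with escapes so the precomposed letters are unambiguous.
for row in describe("Dvo\u0159\u00e1k"):
    print(row)
```

The á (U+00E1, decimal 225) falls inside the 0-255 range, while the ř (U+0159, decimal 345) does not, which is why the first shows up in Western European accent tables and the second does not.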

If the code point is over 255, you have to switch to a new method of inputting things. As it happens the "exotic" accented letters in Central European languages like Czech, Polish, Hungarian, Croatian, Serbian and Slovenian are all over 255. So this means...


If you're just typing in a few names like Dvořák, then the Windows Character Map will let you insert the characters above 255. The ř character is in the Latin Extended-A range. Note: There are numeric ALT codes, but they don't work in all applications.

On the other hand, if you're typing text in Czech or another Central European language, then it's probably better to activate the appropriate language keyboard, which lets you type the accented letters directly from the keyboard.


On the Mac, I personally recommend the Extended Keyboard because you can type a wider range of accented letters. I wish Windows had one of these... (maybe on Vista?)

However, you could also activate the Character Palette or the specific language keyboard depending on your needs.

As you can see, not all Unicode code points are equal (especially in the U.S. market).


How the Swastika Got into Unicode


Script standards seem like a relatively innocent topic, but you do encounter the strangest questions. For instance, a few weeks ago the Unicode list was discussing how to encode Swastika variants. Before you faint with shock, consider:

1) The swastika is encoded as a symbol used in historical documents, so we’re stuck with it. It’s actually encoded in the Unihan (East Asian) block at code point U+534D.

2) The swastika is actually a positive sign in the East (India/East Asia). It was even used in a non-political manner in the West...before the rise of the Nazi party. I even found a 1920s handkerchief from a great aunt with a charming cross-stitch swastika motif (in blue).

Gentle Swastika

The actual discussion is whether you need swastika variants (e.g. rotated or with different arms). But it also brings up an interesting cultural question - How long should we let the atrocities of the Nazi party hold a symbol hostage?

Right now the display of the swastika is banned in Germany...which has a logic to it. But it has some odd consequences: you can’t sell any WWII comics in Germany, even if they’re from the Allied point of view! You also can’t sell any manga comics that display a Buddhist, non-violent swastika.

Similarly, someone in the discussion section of the Swastika Wikipedia page did NOT want this page to ever be a "Featured" page, even though the goal was to discuss the positive Eastern uses of the symbol. It can head into censorship territory.

I reject using the symbol in a Neo-Nazi context, but in the spirit of trying to undo a crime of culture, I made some Swastika symbol samples from various Asian fonts. These aren't nearly as toxic.

Curved Swastika and Chinese Stroke Unihan Swastika


Unicode 5 Released


The latest specs for Unicode 5.0 are out.

As with every new version, new characters have been added including minor additions to older blocks like Latin, Math, Hebrew, Greek, Cyrillic and others.

In addition, five new blocks were added - Phoenician, Sumero-Akkadian Cuneiform, Balinese, Phags-pa and N'Ko.
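As a rough illustration of where these scripts live, the Python sketch below maps a code point to one of the five blocks. The `UNICODE_5_BLOCKS` table and the `block_of` helper are names made up for this example; the ranges are the block ranges published in the Unicode code charts, and only the main Cuneiform block is listed.

```python
# Block ranges (inclusive) for the scripts added in Unicode 5.0.
UNICODE_5_BLOCKS = {
    "NKo": (0x07C0, 0x07FF),
    "Balinese": (0x1B00, 0x1B7F),
    "Phags-pa": (0xA840, 0xA87F),
    "Phoenician": (0x10900, 0x1091F),
    "Cuneiform": (0x12000, 0x123FF),
}

def block_of(code_point):
    """Return the name of the Unicode 5.0 block containing code_point, or None."""
    for name, (start, end) in UNICODE_5_BLOCKS.items():
        if start <= code_point <= end:
            return name
    return None

print(block_of(0x10900))  # first code point of the Phoenician block
print(block_of(0x0159))   # ř is in none of the new blocks
```

Note that Phoenician and Cuneiform sit above U+FFFF, so software that assumes every character fits in 16 bits needs extra care with them.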

What Does This Mean...Implementation-Wise?

I did want to add a cautionary note here - just because Unicode 5.0 has come out does not mean you'll be typing Phoenician in Windows Vista next week.

After the release, several steps must take place before a new script block is fully implemented. Roughly, they are:

1. New fonts must be created or old ones retooled. The fonts must be Unicode compliant so that the right glyphs are matched with the right code points. This is a fairly rapid, but critical step. You can post material online with just a font, but it's harder unless...

2. Someone develops a keyboard utility for that script. These allow you to type the characters directly from your keyboard instead of using escape codes or cutting and pasting from the Character Map (Win) or Character Palette (Mac).

The first ones are usually third-party tools. Sometimes they work perfectly within the operating system and sometimes not, depending on the quirks of the script.

3. In my opinion, true prime time acceptance occurs when the major vendors (Microsoft/Apple/Firefox/Adobe) build in support for a script into their products. This may take several years and a certain amount of wrangling.
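For readers wondering what the "escape codes" in step 2 look like in practice, here is a small Python sketch using HTML numeric character references as one common kind of escape. The helper names `to_numeric_escapes` and `from_escapes` are hypothetical; the calls themselves (`str.encode` with the `xmlcharrefreplace` error handler, and `html.unescape`) are part of the standard library.

```python
import html

def to_numeric_escapes(text):
    """Replace every non-ASCII character with a decimal HTML escape like &#345;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

def from_escapes(markup):
    """Turn HTML escapes (numeric or named) back into the actual characters."""
    return html.unescape(markup)

name = "Dvo\u0159\u00e1k"  # Dvořák
escaped = to_numeric_escapes(name)
print(escaped)
print(from_escapes(escaped) == name)
```

Escapes like these work without a keyboard utility, but typing a whole document this way is tedious, which is why the keyboard step matters.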

Also, if you're a font purist (and good designers must be), it should be noted that the first fonts are almost always "underdeveloped" and ligatures may not be as pretty as they could be. Fortunately most first fonts are now being developed in OpenType, so the initial quality is a little better than with the old TrueType fonts.


About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage (ejp10@psu.edu) for a profile.


The standard commenting utility has been disabled due to hungry spam. If you have a comment, please feel free to drop me a line at (ejp10@psu.edu).
