December 2008 Archives

HKSCS (Hong Kong Supplementary Character Set) Links


A while ago, I wrote about the complexity of specifying a language code for Cantonese, the form of Chinese spoken in Hong Kong. As many East Asian specialists know, Cantonese is so distinct from standard Mandarin Chinese (Beijing) that Western universities offer separate Cantonese language classes.

To complicate matters further, I recently learned that there is also HKSCS, the "Hong Kong Supplementary Character Set," a block of Chinese hanzi characters used only in Hong Kong. I decided to gather a few links for myself in case the topic ever comes up. Here is what I found.

Some Basic Notes

1. Microsoft does incorporate HKSCS support into Windows in principle, but you may need to download the appropriate plugins, especially for XP and earlier versions of Windows. See the first few links above for details. Full support may also depend on implementation in other software packages.

2. Recent versions of Mac OS X include Cangjie and Jianyi options in the Traditional Chinese input utilities. See the Yale Chinese Mac page above for details. Full support may also depend on implementation in other software packages.

3. HKSCS comes in a 2001 and a 2004 version. It is tied to both Unicode (UCS) and the Big5 encoding (Traditional Chinese, Taiwan), even though the rest of China mostly uses Simplified Chinese.

4. Some recent discussions on the Unicode list (ca. Nov 2008) seemed to indicate that HKSCS support was not as widespread as it could be, but it does appear that the major vendors are taking initial steps.
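As a rough illustration of note 3, here is a minimal Python sketch that checks whether a character can be encoded in plain Big5 versus the HKSCS extension of Big5. It relies on Python's built-in `big5` and `big5hkscs` codecs; the helper names `can_encode` and `report` and the sample characters are my own choices for illustration, and I am not asserting here which of the samples fall on which side.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character in text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def report(chars, codecs=("big5", "big5hkscs")):
    """Build one row per character: code point, character, support per codec."""
    rows = []
    for ch in chars:
        support = {codec: can_encode(ch, codec) for codec in codecs}
        rows.append((f"U+{ord(ch):04X}", ch, support))
    return rows


if __name__ == "__main__":
    # Sample characters are illustrative only: one common hanzi followed
    # by a few characters often seen in written Cantonese.
    for code_point, ch, support in report("中嘅啲嘢"):
        print(code_point, ch, support)
```

Any character that reports `False` for `big5` but `True` for `big5hkscs` is one that depends on the Hong Kong supplement when the text is stored in a Big5-family encoding.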

While I am not an expert on the technical aspects of HKSCS, I do think it's interesting that there continues to be a "Hong Kong" issue even though it's been a part of China for over 10 years. Roughly a century and a half of separate colonial heritage has allowed a Cantonese written standard to emerge more fully than it otherwise might have.


Got Coptic?


I was trying to learn more about how the Coptic alphabet interacts with Unicode, and although each and every script has its own story, I was surprised at how tricky Coptic is with respect to Unicode. Coptic is a left-to-right alphabet with minimal spacing issues, much like the Latin, Cyrillic and Greek alphabets. If you can get Gothic online, Coptic should be easy, right? Not necessarily...

Some Things You Should Know About Coptic and Unicode

Such as...

1. There is an old Coptic block and a new Coptic block

You may already know that the Coptic alphabet is an adaptation of the Greek alphabet as used in late Ptolemaic and Roman Egypt. I think most Unicode aficionados know that there is an old Coptic block (containing just the letters adapted from the Demotic Egyptian script) and a new Coptic block (everything else).

At one point the Unicode community was treating Coptic as a variant of the Greek alphabet with a few extra letters, but later it was decided to separate Greek and Coptic completely, so the new block was created in just the past few years.

2. The old Coptic block didn't go away

As far as I can tell, the Demotic characters were not assigned new code points, but were left as part of the Greek block. A complete Coptic alphabet therefore draws on both blocks.

This was kind of a surprise since many Coptic charts just show the new block and miss the Demotic letters altogether.
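To see the split for yourself, here is a small Python sketch that lists the characters whose Unicode names begin with "COPTIC" in each of the two blocks. The block ranges (U+0370 to U+03FF for Greek and Coptic, U+2C80 to U+2CFF for Coptic) come from the Unicode charts; the function name `coptic_letters` is just something I made up for this example.

```python
import unicodedata

# Block ranges from the Unicode code charts.
GREEK_AND_COPTIC = range(0x0370, 0x0400)  # old home of the Demotic-derived letters
COPTIC = range(0x2C80, 0x2D00)            # newer, separate Coptic block


def coptic_letters(block):
    """Map code point -> character name for names that start with 'COPTIC'."""
    found = {}
    for cp in block:
        # Unassigned code points have no name; default to an empty string.
        name = unicodedata.name(chr(cp), "")
        if name.startswith("COPTIC"):
            found[cp] = name
    return found


if __name__ == "__main__":
    for label, block in (("Greek and Coptic", GREEK_AND_COPTIC), ("Coptic", COPTIC)):
        letters = coptic_letters(block)
        print(f"{label}: {len(letters)} characters named COPTIC")
        for cp, name in letters.items():
            print(f"  U+{cp:04X} {chr(cp)} {name}")
```

The Demotic-derived letters such as SHEI and FEI show up only in the first list, which is why a chart built from the new block alone looks incomplete.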

3. New Coptic fonts and utilities are available

The new Coptic block has been around long enough for academic and other developers to catch up. Here's my current list. By the way, I also recommend Quivira and MPH 2B Damase as general-purpose linguistic fonts, since they cover a lot of blocks.

Coptic Fonts

The following freeware fonts are available for both Windows and Mac:

Coptic Computing and Keyboards

4. Browsers generally choke on Coptic (except Firefox 3 and Safari)

I uploaded my fonts and checked my new chart in Safari (which is fine, but not always a Unicode superstar in my opinion). Everything worked there, but when I checked my chart in Firefox 2, all I got were the Unicode question marks of death. (Whoa.)

The same also happened in IE 7 and Opera. Not a pleasant surprise. For the record, I was able to get Opera and Firefox 2 (Windows) to display Coptic if I made a font with Coptic the generic default (hence my recommendation for Quivira). I was not able to get either Firefox 2 for Mac or IE 7 to display Coptic (and I did see some other forum messages indicating similar issues).

The good news is that the Coptic did encourage me to upgrade to Firefox 3, and there everything is fine - no font tweaks needed.

As I said earlier I am mystified by this because Coptic is not particularly unusual as far as Unicode blocks go. But it is working in some browsers now.

So that was my adventure with Coptic. Someday I hope I may get to use it in a real textual or linguistic application, but at least I know that I was able to update to Firefox 3 and not lose all of my other plugins!


Funky Fraction Glitch


It's been a long week, and I was catching up on my celebrity news when I saw the following in my RSS headline reader.

O.J. Simpson Sentenced to 171-2 Years

I'm not a big O.J. Simpson fan, but a 171-2 year sentence seemed a little excessive for robbery. It was actually a Unicode glitch. The headline was supposed to read 17½, but that part of the reader was having a problem with the fraction character.


The lesson learned: always leave a space between the whole number and its fractional component. TGIF!!
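I don't know exactly what the RSS reader did internally, but one plausible mechanism is that the single character ½ (U+00BD) was expanded into separate digits. Unicode compatibility normalization does something very similar, so here is a small Python sketch that reproduces the effect and applies the "leave a space" fix. The function names are hypothetical and only for illustration.

```python
import unicodedata


def is_vulgar_fraction(ch: str) -> bool:
    """True for single-character fractions such as ½, ¼ and ¾."""
    # These characters carry a compatibility decomposition tagged <fraction>.
    return unicodedata.decomposition(ch).startswith("<fraction>")


def space_before_fractions(text: str) -> str:
    """Insert a space between an ASCII digit and a following fraction character."""
    out = []
    for i, ch in enumerate(text):
        if i > 0 and "0" <= text[i - 1] <= "9" and is_vulgar_fraction(ch):
            out.append(" ")
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    headline = "Sentenced to 17½ Years"
    # NFKC turns ½ into 1, FRACTION SLASH (U+2044), 2, which fuses with the 17.
    print(unicodedata.normalize("NFKC", headline))
    # With the space added, the expanded fraction stays separate from the 17.
    print(unicodedata.normalize("NFKC", space_before_fractions(headline)))
```

With the space in place, even a reader that expands the fraction produces "17 1⁄2" rather than something that looks like 171.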


About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage (ejp10@psu.edu) for a profile.

Comments

The standard commenting utility has been disabled due to hungry spam. If you have a comment, please feel free to drop me a line at (ejp10@psu.edu).
