Macintosh: August 2009 Archives

Announced i18n Enhancements for Mac Snow Leopard (10.6)


New operating systems often mean new i18n toys to play with, and even though the upgrade from Apple 10.5 (Leopard) to 10.6 (Snow Leopard) is not supposed to be full of new features, there are, in fact, new features scheduled for the upgrade.

According to the Apple Snow Leopard Enhancement page, 10.6 will include:

  • Redesign of Pinyin Chinese input with faster speed and enhanced dictionary
  • Improvements to handwritten Chinese input
  • New Asian fonts - Heiti SC, Heiti TC, Hiragino Sans GB.
  • New generic monospace font Menlo to be used in applications such as Terminal
  • Enhanced RTL support including split cursor option to show text direction in documents with bidirectional text
  • General Text substitution (e.g. (c) to ©) across applications. Could be handy for a lot of situations when you need to enter an unusual symbol. This already exists in Microsoft Office (Mac/PC).

But I almost missed the big one - the International pane in the System Preferences has been redesigned and will now be the Language and Text pane, presumably with more features. There may be other enhancements in the works that are too minor to be announced (or at least too minor for most people), but there may be more things to find out.

How will they work? Alas, no details from Apple yet. I guess we won't know until we know....


Enter Plane 1 (Phoenician/Linear B...) on Mac Unicode Hex Keyboard


A useful utility on the Mac is the Unicode Hex keyboard, which lets you hold down Option and type any four-digit Unicode code point to get that character.

For instance, if you need to enter the rarely seen archaic Roman numeral symbol for 5,000 (ↁ), you could look up its Unicode code point (U+2181), activate this keyboard, then type Option+2181 to generate the character (assuming a font that contains it is installed).

But a lot of ancient scripts are in Plane 1, meaning they have Unicode values with five digits (i.e. U+10000 or higher). In the Unicode world, adding the fifth digit means that some processes go slightly awry, and the Unicode Hex keyboard is one of them. Suppose I want to input the Phoenician character Alf (Aleph) (𐤀 or an A on its side), which is U+10900. If I enter Option+10900 on the Unicode Hex keyboard, I will not get Alf, but ႐ instead.

Note: Characters U+0000 to U+FFFF are in Plane 0 or the BMP (Basic Multilingual Plane). A lot of systems are set up to deal with the BMP only and need special support for codes beyond U+FFFF. The four-digit restriction corresponds to 16 bits, which was a constraint in older systems. If you're not a programmer, let's just say it's a long story and leave it at that.
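For the programmers who do want a bit of the long story, here is a minimal Python sketch of the BMP boundary. The helper names `plane` and `utf16_units` are made up for this illustration; the only assumption is that the plane number is the code point divided by 0x10000.

```python
def plane(code_point: int) -> int:
    """Return the Unicode plane number (0 = BMP, 1 = Plane 1, ...)."""
    return code_point >> 16  # same as code_point // 0x10000


def utf16_units(ch: str) -> int:
    """Count how many 16-bit code units UTF-16 needs for one character."""
    return len(ch.encode("utf-16-be")) // 2


for cp in (0x2181, 0x10900):
    ch = chr(cp)
    print(f"U+{cp:04X}: plane {plane(cp)}, {utf16_units(ch)} UTF-16 unit(s)")
# U+2181: plane 0, 1 UTF-16 unit(s)
# U+10900: plane 1, 2 UTF-16 unit(s)
```

Anything in Plane 0 fits in a single 16-bit unit; anything above U+FFFF needs two.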

It turns out that the Unicode Hex keyboard has a four-digit limit. To get around it, you can break U+10900 into two 16-bit (i.e. 4-digit) sequences, also known as a UTF-16 surrogate pair. For U+10900, the surrogate pair is D802+DD00. So in the Unicode Hex utility, you can now do this.

  1. Hold down the Option key.
  2. Type D802+DD00, where the + means type the Plus sign.
  3. Release the Option key.

I bet you're asking - how did she get from U+10900 to D802+DD00? There is an algorithm, but in this case I got it by opening the Character Palette, finding the character I wanted and mousing over it. When you do that, the Unicode code point appears along with its surrogate pair in parentheses.
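If you would rather compute the pair than look it up, the standard UTF-16 arithmetic is short. The following is a sketch in Python; the function names `to_surrogate_pair` and `from_surrogate_pair` are my own for illustration and not part of any Apple tool.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF have surrogate pairs")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x10900)   # Phoenician Alf
print(f"{high:04X} {low:04X}")           # D802 DD00
print(f"U+{from_surrogate_pair(high, low):04X}")  # U+10900
```

The second function is handy for checking a pair before typing it: feed it the two halves and confirm you get back the code point you expected.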

Of course, you could also directly Insert the character with the palette, but actually there are times when the Insert doesn't quite work (at some points in the careers of my laptops, I have corrupted my Character Palette so badly, it refused to play with me anymore).

Although this utility seems a little limited at the moment, if there's one thing I have learned, it is that no Unicode trick has ever gone to waste.


About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage (ejp10@psu.edu) for a profile.

Comments

The standard commenting utility has been disabled due to hungry spam. If you have a comment, please feel free to drop me a line at (ejp10@psu.edu).
