Enter Plane 1 (Phoenician/Linear B...) on Mac Unicode Hex Keyboard


A useful utility on the Mac is the Unicode Hex keyboard, which allows you to hold Option and type any four-digit Unicode hex code to get that character.

For instance, if you need to enter the rarely seen archaic Roman numeral symbol for 5,000 (ↁ), you could look up its Unicode character number (U+2181), activate this keyboard, then type Option+2181 to generate the character (assuming the correct font is loaded).

But a lot of ancient scripts are in Plane 1, meaning they have Unicode values with five hex digits (i.e. U+10000 or higher). In the Unicode world, adding the fifth digit means that some processes go slightly awry, and the Unicode Hex keyboard is one of them. Suppose I want to input the Phoenician character Alf (Aleph) (𐤀, or an A on its side), which is U+10900. If I enter Option+10900 on the Unicode Hex keyboard, I will not get Alf, but ႐ (U+1090) instead, because the keyboard stops reading after the first four digits.

Note: Characters U+0000 to U+FFFF are in Plane 0, or the BMP (Basic Multilingual Plane). A lot of systems are set up to deal with the BMP only and need special support for codes beyond U+FFFF. The four-digit restriction corresponds to 16 bits, which was a constraint in older systems. If you're not a programmer, let's just say it's a long story and leave it at that.
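For programmers who do want the short version, here is a minimal Python sketch of the 16-bit issue. The helper name `utf16_units` is made up for illustration; it simply counts how many 16-bit units a character needs when encoded as UTF-16.

```python
def utf16_units(ch: str) -> int:
    """Return how many 16-bit code units UTF-16 needs for one character."""
    # Each UTF-16 code unit is 2 bytes; "utf-16-be" adds no byte-order mark.
    return len(ch.encode("utf-16-be")) // 2

print(utf16_units("\u2181"))      # Roman numeral 5,000 (BMP)      -> 1
print(utf16_units("\U00010900"))  # Phoenician Alf (Plane 1)       -> 2
```

A BMP character fits in one 16-bit unit, while a Plane 1 character needs two, which is why four hex digits are not enough for it.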

It turns out that the Unicode Hex keyboard has a four-digit limit. To get around it, you can break U+10900 into two 16-bit (i.e. 4-digit) sequences, also known as a UTF-16 surrogate pair. For U+10900, the surrogate pair is D802+DD00. So in the Unicode Hex utility, you can now do this.

  1. Hold down the Option key.
  2. Type D802+DD00, where the + means type the Plus sign.
  3. Release the Option key.

I bet you're asking - how did she get from U+10900 to D802+DD00? There is an algorithm, but in this case I got it by opening the Character Palette, finding the character I wanted and mousing over it. When you do that, the Unicode code point appears along with its surrogate pair in parentheses.
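For the curious, the algorithm is the standard UTF-16 one: subtract 0x10000 from the code point, then split the remaining 20 bits into a top half (added to D800) and a bottom half (added to DC00). Here is a minimal Python sketch; the function name `to_surrogate_pair` is just an illustrative choice, not part of any Mac utility.

```python
def to_surrogate_pair(code_point: int) -> tuple:
    """Split a code point above U+FFFF into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF have a surrogate pair")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low

high, low = to_surrogate_pair(0x10900)   # Phoenician Alf
print(f"{high:04X} {low:04X}")           # D802 DD00
```

The two four-digit values it prints are what you would type with Option held down.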

Of course, you could also insert the character directly with the palette, but there are times when Insert doesn't quite work (at some points in the careers of my laptops, I have corrupted my Character Palette so badly that it refused to play with me anymore).

Although this utility seems a little limited at the moment, if there's one thing I have learned, it is that no Unicode trick has ever gone to waste.

About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage (ejp10@psu.edu) for a profile.

