March 2009 Archives

Unicode Hexadecimal Alt Code Entry in Microsoft Office


In Windows you can use the ALT key to enter numeric codes for decimal values of Unicode code points (e.g. ALT+065 = A). The limit is 255, unless you're in Microsoft Office, where you can input larger values (e.g. ALT+1046 = Cyrillic Ж (Zhe)).

This is handy, but I found out through the grapevine (specifically blogger John D. Cook) that there's also a way to enter the hexadecimal value of the Unicode code point. For instance, Ж is actually U+0416 in the Unicode spec, where characters are listed by hexadecimal value. Unfortunately, it's still restricted to Microsoft Office, but it can be useful.

It's a little tricky, so here's how it goes:

  1. Open Microsoft Word or other Office app.
  2. Type a four-digit hex code point (e.g. "0416").
  3. Next type Alt+X. The numeric code will be replaced by the correct character.

I wish this trick worked in every Windows app, but it's still useful if you are using Word and need a code and can only get access to a list of codes in hexadecimal format. At least you can bypass the hexadecimal to decimal conversion.
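
For apps where Alt+X is not available, the hexadecimal to decimal conversion is easy to script. Here is a minimal Python sketch; the function name alt_code_for is made up for illustration.

```python
def alt_code_for(hex_code_point):
    """Return (decimal value, character) for a hex code point such as "0416"."""
    decimal_value = int(hex_code_point, 16)  # parse the digits as base 16
    return decimal_value, chr(decimal_value)


decimal_value, character = alt_code_for("0416")
print(decimal_value, character)  # 1046 Ж
```

The decimal value is the number you would type after ALT in Office.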


Pinyin Joe's Chinese Computing Help Desk for Vista/XP


Webmaster "Pinyin Joe" has a site that might clear up some of the mysteries of Chinese support in Windows (including Vista).

He goes through setup, the possible input utilities you can activate, and even some font samples for the Microsoft Chinese fonts. There's good coverage of Windows XP as well.

FYI - Mac users should check Yale's Chinese Mac site.


Entity Code or Raw Text in HTML?


If you need to enter a non-English character like /ə/ (schwa) on a Web page, you actually have two choices: you can use an entity code like &#601; or you can use an input utility to enter raw data (i.e. ə). You can see how the code looks below:

Bahama = /bəhamə/

Bahama = <b>/b&#601;ham&#601;/</b>

Bahama = <b>/bəhamə/</b>
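
To confirm that the two versions really are the same text once the entity codes are resolved, here is a small Python sketch. It uses the standard library's html.unescape as a stand-in for what a browser does; the variable names are just for illustration.

```python
import html

entity_markup = "<b>/b&#601;ham&#601;/</b>"
raw_markup = "<b>/bəhamə/</b>"

# html.unescape resolves character references such as &#601;
decoded = html.unescape(entity_markup)

print(decoded == raw_markup)    # True
print(ord("ə"), hex(ord("ə")))  # 601 0x259
```

The number in the entity code (601) is simply the decimal value of the schwa's code point, U+0259.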

Which is Better?

The entity code was a solution from before browsers could reliably recognize true UTF-8 text or before a server or Web tool could "serve" it up properly. In the year 2000, it was the safest solution by far.

These days, though, the tide is shifting, so I would recommend avoiding entity codes unless your tech is not quite up to speed. For instance, this blog (served by Movable Type) hardly ever uses an entity code. I type or copy Unicode text and out it goes. The same is true for Facebook, Twitter and most Web 1.0 sites hosted at Penn State.

The advantage of NOT using entity codes is that it is easier to port content between file formats, including RSS and other XML formats. RSS, unlike HTML, does not recognize the HTML entity codes. An entity code such as &#601; will be displayed as... &#601; (not schwa). Only ə is displayed as ə. The same is true if you want to include Unicode on your Facebook profile page.

The other advantage is debugging and proof. Which do you want to spell check? Русский or &#x0420;&#x0443;&#x0441;&#x0441;&#x043A;&#x0438;&#x0439;?

However, there are cases where you need to use escape codes just to be safe. Often the problem is that you are using a server which can't deliver UTF-8 encoded text for whatever reason. One of these, unfortunately, has been our course management system. Fortunately, its WYSIWYG editor converts non-English text to escape codes for you.
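
If you have to produce escape codes yourself, the conversion can be scripted. This is only a sketch of the general idea in Python (I am not claiming any particular editor works this way), and the function name to_numeric_entities is made up.

```python
def to_numeric_entities(text):
    """Replace every non-ASCII character with a decimal reference like &#601;."""
    # The "xmlcharrefreplace" error handler swaps each character that
    # cannot be encoded as ASCII for its numeric character reference.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_numeric_entities("/bəhamə/"))  # /b&#601;ham&#601;/
```

Plain ASCII characters pass through unchanged, so the result is safe to send through a system that cannot handle UTF-8.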

If you are working with a static page, there really should be no roadblock as long as your page has the correct UTF-8 meta tag. Cases where this isn't working are likely due to an underconfigured Apache setup.
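
To see what a mismatch looks like, here is a short Python sketch that decodes the same UTF-8 bytes two ways. It assumes the wrong guess is Windows-1252 (cp1252); the actual fallback depends on the server and browser.

```python
# The bytes of a page saved as UTF-8
page_bytes = "Bahama = /bəhamə/".encode("utf-8")

# Read with the right encoding, the schwa survives
print(page_bytes.decode("utf-8"))   # Bahama = /bəhamə/

# Read as Windows-1252, each two-byte schwa turns into two wrong characters
print(page_bytes.decode("cp1252"))  # Bahama = /bÉ™hamÉ™/
```

If a page shows this kind of garbage, the text itself is usually fine; it is the declared encoding that is off.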

Ironically, though, I seem to see more underconfigured Apache issues than I used to... One step forward, one step back?


Native American & Indigenous Australian Keyboard Layouts


Languages from the Americas and Australia are usually written in the Latin alphabet, but often contain characters (e.g. ʉ, ʔ, ɬ/ł, ō) or combinations of characters not found in other language orthographies. So, a keyboard utility which consolidates them in one keyboard layout is very handy.

Chris Harvey of Language has a page of keyboard layout downloads (both Windows and Mac). His site also includes keyboard layouts for Cherokee and the languages which use the Aboriginal Syllabics, as well as several freeware fonts covering all these languages. Well worth a visit.


Handout for CALICO 2009 Conference


Below is a link to the handout and materials for a CALICO 2009 presentation on Unicode text entry for the Macintosh (with a supplemental Windows handout).

Download CALICO Unicode Handouts .zip


Entry #100 - Still More about Middle Eastern Numbers


What better way to celebrate the 100th entry in this blog than with...a correction. It's a humble reminder that just because you know a lot about Unicode doesn't mean you can't mess up a crucial detail.

Way back in 2007, I posted an entry about generating Arabic (calligraphic) numbers in Microsoft Office (i.e. "١,٢,٣" vs. "1,2,3"). The entry noted that in Arabic, "Arabic number" actually means Western (1,2,3) (called DIGIT ONE, DIGIT TWO... in Unicode). The term for numbers like ١,٢,٣ is actually "Hindi number" in Arabic (or ARABIC-INDIC DIGIT ONE, ARABIC-INDIC DIGIT TWO... in Unicode).

But the numbers I displayed as "Hindi/Arabic" were actually the Devanagari numbers as used in India (e.g. १,२,३). In Unicode these are called DEVANAGARI DIGIT ONE, DEVANAGARI DIGIT TWO... Fortunately Eric Verlind pointed out the flaw, so I was able to correct the forms. Eric also pointed me to a Microsoft Digit Support page where I learned there are variations for Arabic, Persian and Urdu.
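
A quick way to avoid this kind of mix-up is to ask for the official Unicode name of a character before posting it. Here is a short Python sketch using the standard unicodedata module. The choice of U+06F1 for the Persian and Urdu style digit is my own addition, not something taken from the Microsoft page.

```python
import unicodedata

# Western, Arabic-Indic, Extended Arabic-Indic and Devanagari "one"
for ch in ("1", "\u0661", "\u06F1", "\u0967"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.digit(ch))

# U+0031 DIGIT ONE 1
# U+0661 ARABIC-INDIC DIGIT ONE 1
# U+06F1 EXTENDED ARABIC-INDIC DIGIT ONE 1
# U+0967 DEVANAGARI DIGIT ONE 1
```

All four have the numeric value 1, but they are separate code points with separate names.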

The learning never stops in Unicode world.


About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage for a profile.


The standard commenting utility has been disabled due to hungry spam. If you have a comment, please feel free to drop me a line at (
