March 2008 Archives

Microsoft Word ∧ Logic: Inserting the Right Code Point


The Insert Symbol Tool in Word

As I said last entry, I'm working on a symbolic logic course and am learning new quirks for dealing with Unicode logic symbols...and one of them apparently is the Microsoft Word Insert Symbol tool (found by going to Insert » Symbol in most versions of Word).

Like the Windows Character Map and Mac Character Palette, the Insert Symbol tool lets you insert single characters into a document so you can change "P implies Q" to the logical formulation P ⊃ Q or P → Q depending on your symbolism (and you can also switch between "P and Q," P & Q or P ∧ Q).

But...unlike the Windows Character Map/Mac Character Palette, the Insert Symbol tool can take you on a little detour out of standard Unicode and into the Private Use Area block - the block where vendors such as Microsoft can define their own characters. For instance, when I tried to insert the character ∩ (intersection) into a document, I noticed that the Insert Symbol palette gave a code point of U+F0C7 instead of the expected U+2229, and yes, a code point beginning with U+F0 is a sign that you are in the Private Use Area.
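If you want to check a character without opening the Insert Symbol palette, a short Python sketch can do it. This is only an illustration; the function names `is_private_use` and `describe` are made up for this example and are not part of Word or any other tool.

```python
import unicodedata

def is_private_use(ch: str) -> bool:
    """Return True if the single character ch is a Private Use code point."""
    # General category "Co" covers the main Private Use Area (U+E000-U+F8FF)
    # as well as the two supplementary private use planes.
    return unicodedata.category(ch) == "Co"

def describe(ch: str) -> str:
    """Format a character as 'U+XXXX NAME', flagging Private Use code points."""
    label = "PRIVATE USE" if is_private_use(ch) else unicodedata.name(ch, "UNNAMED")
    return f"U+{ord(ch):04X} {label}"

print(describe("\uf0c7"))  # the code point Insert Symbol reported
print(describe("\u2229"))  # the standard Unicode code point
```

The first line is flagged as Private Use and the second comes back with its standard Unicode name.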


First I should say that there is a rationale for this. You'll notice that the font in the graphic is set to "Symbol", which is an older pre-Unicode font that was used to insert lots of special mathematical symbols. The Private Use set-up undoubtedly prevents a lot of older documents from breaking.

So What?

If all you're doing is working in Word, the Insert Symbol tool may still be working for you. But these days, more and more documents are actually destined for the Web or some other format...and not all tools recognize the Microsoft Private Use codes.

The way I first noticed that the logic symbols weren't standard Unicode was that some logic symbols did not "convert" well to HTML in Course Genie but mysteriously became things like "(". The ones I had inserted properly converted, but not the ones inserted with the Word Symbol tool. Ugh.

The use of proper Unicode versus an older format does have a real world impact.


To avoid the Private Use function in new Word documents, just always use the Windows Character Map or Mac Character Palette. On Windows, you may need to switch the font to Arial Unicode.
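For text that already contains Symbol-font characters, one possible cleanup is to remap the Private Use code points to their standard equivalents after exporting the text. The sketch below is a minimal illustration, not a complete converter. Only the U+F0C7 row comes from the example above; the other rows are my assumptions based on the Symbol font layout, and the names `SYMBOL_PUA_TO_UNICODE`, `normalize_symbols` and `leftover_private_use` are hypothetical.

```python
# Partial, illustrative table: Symbol-font Private Use code point -> standard Unicode.
# Only U+F0C7 is taken from the post; the other rows are assumptions and
# should be checked against your own documents before you rely on them.
SYMBOL_PUA_TO_UNICODE = {
    0xF022: 0x2200,  # for all
    0xF024: 0x2203,  # there exists
    0xF0AE: 0x2192,  # rightwards arrow
    0xF0C7: 0x2229,  # intersection
    0xF0C8: 0x222A,  # union
    0xF0C9: 0x2283,  # superset of
    0xF0D9: 0x2227,  # logical and
    0xF0DA: 0x2228,  # logical or
}

def normalize_symbols(text: str) -> str:
    """Replace known Symbol-font Private Use characters with standard ones."""
    return text.translate(SYMBOL_PUA_TO_UNICODE)

def leftover_private_use(text: str) -> list:
    """List any Private Use code points (U+E000-U+F8FF) still in the text."""
    return [f"U+{ord(c):04X}" for c in text if 0xE000 <= ord(c) <= 0xF8FF]

cleaned = normalize_symbols("P \uf0d9 Q")
print(cleaned)                        # P followed by the standard "logical and" sign, then Q
print(leftover_private_use(cleaned))  # empty list when everything was mapped
```

Anything reported by `leftover_private_use` is a character the table does not know about and needs to be fixed by hand.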

Or if you're especially insane, you can develop your own logic symbol keyboard utility.


Course Genie and Unicode: A–


Since my day job is online course developer, I get to work with a lot of academic tools, including my newest tool the Course Genie (or Wimba Create) Word plugin.

This is a tool which takes a Word file "injected" with the right styles and converts a long Word manuscript into a set of well-formed HTML documents complete with table of contents page and page navigation. Even if you don't insert any self-test quizzes, this is a major time saver. But...can it do Unicode?

For once, this is a real issue since the course I'm working on is symbolic logic and uses plenty of specialized symbols like ∪, ∩, ∃x, ∀x and so forth. So far I've been pleasantly surprised to discover that the Course Genie planners did think ahead and implemented decent Unicode strategies.

The good news is that if your instructor (aka subject matter expert) hands you a Word file including these symbols, you may not have to do much other than make sure that the symbols are inserted from the Character Map and not from an old custom font. Course Genie by default will convert these to numeric codes...or, if you select a special UTF-8 theme, it will even include the UTF-8 meta tag.

For most modern browsers this is sufficient. The only gotcha is that it sets everything to Verdana text (even the symbols) and IE 5/6 acts a little strange when fonts for special characters are pre-specified for Arial Unicode.

The other complaint is that most theme settings insert the ISO-8859-1 (Latin-1) encoding meta tag instead of UTF-8...EVEN THOUGH the base XML file is UTF-8. Unless you know to select a UTF-8 theme, you won't get the UTF-8 meta tag. Not only does this make me nervous on principle, but it means that you have to be extra careful if you ever edit the files in another program like Dreamweaver.
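Before opening a generated page in another editor, it can help to check which encoding the page claims to use. The following is a rough sketch, assuming you have the page as raw bytes; `declared_charset` and `declaration_matches_utf8` are names invented for this example, and the regular expression only handles the common meta tag forms.

```python
import re

# Matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
CHARSET_RE = re.compile(rb"<meta[^>]*charset\s*=\s*[\"']?([\w-]+)", re.IGNORECASE)

def declared_charset(html_bytes: bytes):
    """Return the charset named in the first matching meta tag, or None."""
    match = CHARSET_RE.search(html_bytes)
    return match.group(1).decode("ascii").lower() if match else None

def declaration_matches_utf8(html_bytes: bytes) -> bool:
    """True only if the bytes decode as UTF-8 and the page declares UTF-8."""
    try:
        html_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return declared_charset(html_bytes) in ("utf-8", "utf8")

# A page saved as UTF-8 but labeled Latin-1, like the default themes described above.
page = (
    '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
    "<p>A \u2229 B</p>"
).encode("utf-8")

print(declared_charset(page))
print(declaration_matches_utf8(page))
```

A mismatch like this one is the case to watch for: an editor that trusts the Latin-1 label will read the three UTF-8 bytes of ∩ as three separate characters.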


Igbo in Facebook - It Can Be Done (But Numeric Code Breaks)


How does Facebook handle accents? Pretty well actually - but you can't use the numeric code. Instead you have to directly insert the character either by typing it in an Igbo Keyboard or via the Windows Character Map or Mac Character Palette.

For Web 1.0, the safest way to display accented letters was with numeric entity codes. For instance, if I wanted to display Ụwa, I might write &#7908;wa within the HTML document. The codes were safer because they would work even if a developer forgot to include the UTF-8 meta tag.
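If you don't want to look the numbers up by hand, Python's standard library can produce and decode numeric codes. This is just a small sketch; the helper name `to_numeric_refs` is my own.

```python
import html

def to_numeric_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal code like &#7908;."""
    # The "xmlcharrefreplace" error handler writes &#NNNN; for any
    # character that cannot be encoded as plain ASCII.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

word = "\u1ee4wa"  # U+1EE4 is the capital U with dot below

print(to_numeric_refs(word))       # &#7908;wa
print(html.unescape("&#7908;wa"))  # back to the accented word
```

The number 7908 is simply the decimal form of the hexadecimal code point 1EE4.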

In a Web based form, the rules may differ depending on how the developer configured the service. In some forms, you MUST enter the numeric code (often because the UTF-8 tag is missing). In other cases you CANNOT use the numeric code - this is true when you are entering data into a text field which will not go through any HTML formatting schemes. As long as the output has the UTF-8 meta tag (and Facebook does), you can avoid a numeric code (i.e. enter a "raw" accented letter) and still be OK.
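The difference between the two kinds of forms can be sketched in a few lines. This is a simplified model, not how Facebook or any particular service is actually implemented; `render_as_plain_text` and `render_as_markup` are hypothetical names.

```python
import html

def render_as_plain_text(user_input: str) -> str:
    """Model a service that treats input as text and escapes it for output."""
    return html.escape(user_input)

def render_as_markup(user_input: str) -> str:
    """Model a service that passes input straight through as HTML."""
    return user_input

typed_code = "&#7908;wa"  # the numeric code typed into the form
typed_raw = "\u1ee4wa"    # the accented letter typed directly

# Escaping turns "&" into "&amp;", so the browser shows the code literally.
print(render_as_plain_text(typed_code))  # &amp;#7908;wa

# The raw letter has nothing to escape and passes through unchanged.
print(render_as_plain_text(typed_raw) == typed_raw)  # True

# A pass-through service leaves the code intact for the browser to decode.
print(html.unescape(render_as_markup(typed_code)) == typed_raw)  # True
```

In the first case the reader sees the literal characters of the code instead of the accented letter, which is the breakage described above.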

How can you tell? Unfortunately, you have to test each application one by one. As I've commented before, applications which truly expect to support a global audience are generally UTF-8 ready and you can skip the numeric code. This includes Facebook, MovableType, iTunes, GoogleMaps, Twitter and so forth.

Being able to skip the numeric code is a positive sign (why memorize numbers when you can type?), but as with all change, there will be some old habits to break.


About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage for a profile.


The standard commenting utility has been disabled due to hungry spam. If you have a comment, please feel free to drop me a line at (
