March 2010 Archives

✔: Twitter Keys Does Work Outside of Twitter


Twitter Keys is a little bookmarklet you can install so you can insert basic Western emoticons into your Twitter messages. As they point out, "☁ out, better bring ☂" takes up many fewer characters than "Cloudy out, better bring umbrella."

It is basically an online version of the Windows Character Map and Macintosh Character Palette, but marketed for Twitter (very clever). It does work in any tool which works with UTF-8 text, including this blog, Facebook, Google services, etc. It technically works with offline tools like Microsoft Word, and it probably does have the most important emoticons handy. I may end up ♥ing it or at least ❥ing (liking) it.

FYI - If Twitter Keys is missing a desperately important character (not a lot of foreign language support IMHO), you can use the native accent tools on your Mac or PC on Twitter and many other tools.


Dealing with x-bar (x̄) and p-hat (p̂) in Statistics


Revised Oct 23, 2012

A common question I get (at least common in Unicode terms) is what the code is for the p-hat (p̂) and x-bar (x̄) symbols in statistics. Although these are common symbols, they haven't made it into Unicode as single characters (much like the thermodynamic dot symbols, which are half missing unless they are also in Old Irish or another foreign language's spelling system).

The good news is that they can be created in Unicode, but it's quirky. The trick here is to forget math and think phonetics. There is a mechanism to place any diacritic/accent mark over any letter using one of the combining diacritics. These are accents, but with a spacing specification that basically says to go backwards over the previous letter.

There's a list of various combining diacritics for HTML at our sister site, the Penn State Computing with Accents Diacritics page, but I'll explain how it works for x̄ and p̂.


For HTML, I recommend inputting the base letter (x or p) then the appropriate numeric escape code for the combining diacritic. See code examples below:

View the HTML Code

x-bar = x&#772; or x&#x0304; (hex)

p-hat = p&#770; or p&#x0302; (hex)

View Results

x-bar = x̄ or x̄ (hex)

p-hat = p̂ or p̂ (hex)
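If you generate your HTML from a script, the same idea can be sketched in Python. This is only a minimal illustration, not part of the original post: the helper names `with_diacritic` and `html_escapes` are made up for this example, and the code points are the combining macron (U+0304, decimal 772) and combining circumflex (U+0302, decimal 770).

```python
import unicodedata

# Combining diacritics attach to the letter typed just before them.
COMBINING_MACRON = "\u0304"      # decimal 772, the "bar" in x-bar
COMBINING_CIRCUMFLEX = "\u0302"  # decimal 770, the "hat" in p-hat


def with_diacritic(base, mark):
    """Return the base letter followed by the combining mark (hypothetical helper)."""
    return base + mark


def html_escapes(base, mark):
    """Return (decimal, hex) HTML numeric escapes for base + mark (hypothetical helper)."""
    code = ord(mark)
    return f"{base}&#{code};", f"{base}&#x{code:04X};"


x_bar = with_diacritic("x", COMBINING_MACRON)
p_hat = with_diacritic("p", COMBINING_CIRCUMFLEX)

print(x_bar, html_escapes("x", COMBINING_MACRON))
print(p_hat, html_escapes("p", COMBINING_CIRCUMFLEX))
print(unicodedata.name(COMBINING_MACRON))
print(unicodedata.name(COMBINING_CIRCUMFLEX))
```

Whether the result looks right on screen still depends on the font, so treat the printed characters as a sanity check rather than a guarantee.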

Font Notes

If the result of your code is something like p^ or ^p rather than p̂, the problem is usually the font. Relatively few fonts support combining diacritics well (not even math fonts support diacritics well). Those that do are phonetics oriented, and these include Arial Unicode and Lucida Grande along with several serif phonetics fonts.

The first two fonts come from Microsoft/Apple, are commonly installed, and also include math symbols...but they are sans-serif fonts. If you want a serif font, you may want to specify one of the other serif phonetics fonts, then point users to it (but specify Arial Unicode/Lucida Grande as backups). Not a very pretty solution at the moment.

Word, etc

And now the fun really begins. You can input these characters in other programs, but editing them will be odd (see below). Here's the procedure:

  1. Switch to a font which supports combining diacritics and type the base letter (x or p in this case).
  2. Type a space and move your cursor back (you'll thank me for this tip).
  3. Then you can insert the combining diacritic:
    1. With the Character Map (Windows) or Character Palette (Mac) OR:
    2. With an Alt code in Word for Windows or an Option code on the Mac Hex keyboard. The Word/Windows ALT code for x̄ is x, then ALT+0772. The Option Hex code on the Mac is x, then Option+0304.

When you edit, you will discover that sometimes you will delete the accent, and sometimes you will delete the letter beneath the accent (very entertaining). You may need to undo your delete and move your cursor with your arrow keys when that happens. It doesn't always look like the cursor is moving, but it is in Unicode text land. When you are trying to type a nasalized open-o /ɔ̃/, you do get some practice....
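The odd editing behavior makes more sense once you see that each of these "letters" is really two code points in a row. Here is a small Python sketch (my own illustration; the `describe` helper is hypothetical) that lists the pieces and shows what is left when one of them is deleted:

```python
import unicodedata


def describe(text):
    """List each code point in text with its Unicode name (hypothetical helper)."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


x_bar = "x\u0304"           # x + combining macron
nasal_open_o = "\u0254\u0303"  # open-o + combining tilde

print(len(x_bar))            # two code points, even though it looks like one letter
print(describe(x_bar))
print(describe(nasal_open_o))

# Deleting the last code point removes only the accent.
print(x_bar[:-1])

# Deleting the first code point removes the letter and leaves an orphaned accent.
orphan = x_bar[1:]
print(unicodedata.combining(orphan) != 0)
```

Editors differ in which of the two pieces a single Backspace or Delete removes, which is why the cursor sometimes seems to stand still.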


And now a from...


Ever since I learned that .tv sites are actually from domains registered in the Pacific Island nation of Tuvalu, I keep an eye out for unusual domain suffixes. One of my former favorites was (using the rare .us domain suffix for United States). I'm sorry it's now officially

My new favorite may be the addresses used for short URL aliases (similar to aliases). But at some point, I finally had to ask...where is .ly? Answer: It's Libya. You can look it up at (out of Belgium).

Of course, there are many more opportunities out there to explore - like .al (Albania), .an (Netherlands Antilles), .er (Eritrea), .es (Spain), .it (Italy), .in (India) and even .um (US Minor Outlying Islands). Spanish Web services may find .ar (Argentina), .er (Eritrea) and .ir (Iran) interesting since these are all verb infinitive endings. You can see even more options at this blog post. As you can see, the only barrier is our imagination and a nation's willingness to participate in these pun schemes.

This is nothing new, but always fun to observe and ponder...who are these people who provide us our popular online services? I was interested to note that has apparently branched to where .mp are the Northern Mariana Islands.

P.S. The .st suffix is São Tomé and Príncipe.


African Localization Resources


African languages were in the news Unicode-wise because data from a number of African languages was added to the latest version of the Unicode Common Locale Data Repository (CLDR). The hope is that this repository will make it easier to provide African-language software implementations (e.g. a Windows implementation in Hausa) and data presentations (e.g. a calendar in Afar).

If you are interested in implementing content in African languages, there are a number of resources available depending on what your goals are:

African Computing

These pages cover internationalization of African languages.

African Languages

And lastly, the Penn State Computing with Accents site maintains several informational pages on working with text in multiple languages from Africa.


New U.S. "Happy People" Umlaut/Dots


The "inauthentic" use of umlauts in American culture has been well-known for several decades now. Common uses include the metal umlaut (e.g. Mötley Crüe, Motörhead) and what I would call the "fake Germanic umlaut" (e.g. Häagen-Dazs and the movie title Brüno). They are "inauthentic" because somehow we're supposed to think these words come from a language where they are originally spelled with umlauts...even though the words either don't exist or are spelled without the umlaut.

But recently I am seeing new umlauts (or dots above letters) used in English product or company names. Unlike the other umlauts, there is no Germanic connotation whatsoever. Instead they are meant to be heads of happy people uniting under the product line. Specific examples include Udutu (Üdütü?), a new online content development platform, Intuit (InṪuiṪ?), a small-business Web hosting service with a Quicken plugin, and my personal favorite, Unum (Uṅüṁ), a provider of long term disability insurance. Here are their logos below:

Udutu logo with double dots over each U

Intuit logo with T's in small caps with single dot on top

Unum logo with single dot above n 2nd u and m

As you can see, this usage of the dots differs from the metal/fake Germanic use in that 1) there are both double dots and single dots used and 2) they are meant to represent the concept of people, not another language. Tweaking a phonetic writing system to add a logographic element is interesting, but not new (e.g. I ♥ NY).

To be honest, I think what struck me more is how strong a reaction I had to it (and not a good one). My favorite is Unum, because my mother was a former Unum client and all was well until she had to collect on her disability payments. Then it was a multi-year process for her to be actually approved (after which, my mother said they actually had excellent relations).

Unfortunately, I don't think she suffered alone, as this LA Times article attests. Not surprisingly, Unum is rebranding itself, but honestly I would much rather read about their improved performance than see a cutesy logo with a cheerful TV campaign.

I should add that the focus of this post isn't to bash Unum - it appears they are trying to improve their record. And truthfully, getting disability payments out of Social Security was just as long and arduous (that's a complaint for another time).

My actual point is that I wonder if these logos are really increasing trust. Maybe they have tested well on the market, but to me, they remind me of the falsely cheerful restaurant server who says "Hi, my name is ..." Does knowing my server's name in the local chain build a rapport? If they don't get the order right, then no. I'd rather see a server who may be a little world weary but exudes a sense of experience gained through many days of "being in the weeds."

I feel the same principle applies here. But that's just my opinion.


A List Apart Accent Folding Article (with Review)


Accent Folding (or as A List Apart writes "Áçčềñṭ-Ḟøłðǐṅg") is the process of programming your search algorithms so that if a user types "cafe", results will be displayed for entries containing either "cafe" or the more Francophonically correct "café". Author Carlos Bueno also mentions its importance in Auto-Complete so that an English user can type something like "Lo" and pull up "López" from the system.
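To make the idea concrete, here is a minimal Python sketch of accent folding. It is not the code from Bueno's article; it assumes the simplest approach of decomposing each letter (Unicode NFD) and dropping the combining marks, and the function names `fold_accents`, `matches` and `autocomplete` are invented for this example.

```python
import unicodedata


def fold_accents(text):
    """Decompose letters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query, entry):
    """True if the folded, case-folded query appears in the folded entry."""
    return fold_accents(query).casefold() in fold_accents(entry).casefold()


def autocomplete(prefix, names):
    """Return the names whose folded form starts with the folded prefix."""
    key = fold_accents(prefix).casefold()
    return [n for n in names if fold_accents(n).casefold().startswith(key)]


print(fold_accents("café"))
print(matches("cafe", "Le café du coin"))
print(autocomplete("Lo", ["López", "Lowe", "Martin"]))
```

One caveat: letters such as ø, ł and ð have no decomposition, so this approach leaves them alone. Folding those would need an explicit lookup table.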

As with case folding (merging capital "A" and lowercase "a" as the 'same' letter), the strategy is to list variants and assign them to one "archiletter". This is generally recommended for English because the writing system has absolutely no accents at all, so English monolinguals really do tend to get flummoxed by typing accent codes. If the audience is not English, the question could be trickier (really, you don't want to merge ñ and n in Spanish) and may depend on whether the user needs to search data from an English language source or a source where accents are dropped (e.g. chat/e-mail).

The Accent Folding article by Carlos Bueno is a general introduction but does include snippets of code and links to additional samples and information about pre-existing code libraries. He also points out the various pitfalls (e.g. Should French thé 'tea' also pull up English the? - it will depend on the audience and database).

Another concept that Bueno brings up is that some languages like German have conversion schemes in place such that ü becomes "ue", not just "u" (e.g. Tübingen ~ Tuebingen). On the other hand, if your text is NOT from a source who knows that, you may be dealing with plain "u" as well (e.g. Müller ~ Muller). Bueno then mentions the transliteration issue and points out that there are usually multiple transliteration schemes (the German ü > ue or u is a simple example). Sensibly, he puts that issue to the side.
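One way to cope with both German conventions is to index each word under every folded spelling it might appear as. The Python sketch below is my own illustration of that idea, not Bueno's code; the digraph table and the names `strip_marks`, `search_keys` and `same_word` are assumptions made for this example.

```python
import unicodedata

# Illustrative table only: German umlaut letters and their digraph spellings.
GERMAN_DIGRAPHS = {
    "\u00e4": "ae",  # ä
    "\u00f6": "oe",  # ö
    "\u00fc": "ue",  # ü
}


def strip_marks(text):
    """Drop accents entirely, so ü becomes plain u."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_keys(text):
    """Return every folded spelling the text could be searched under."""
    lowered = unicodedata.normalize("NFC", text).lower()
    with_digraphs = "".join(GERMAN_DIGRAPHS.get(ch, ch) for ch in lowered)
    return {strip_marks(lowered), strip_marks(with_digraphs)}


def same_word(a, b):
    """True if the two spellings share at least one folded key."""
    return not search_keys(a).isdisjoint(search_keys(b))


print(sorted(search_keys("Tübingen")))
print(same_word("Tuebingen", "Tübingen"))
print(same_word("Muller", "Müller"))
print(same_word("Muller", "Mueller"))
```

Note that in this sketch "Muller" and "Mueller" each match "Müller" but do not match each other, since neither spelling carries enough information to recover the umlaut. Whether that is the right behavior depends on the audience and the data.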


About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage for a profile.


The standard commenting utility has been disabled due to hungry spam. If you have a comment, please feel free to drop me a line at (
