August 2013 Archives

Explaining UTF-8


The UTF-8 encoding is not a straight encoding of Unicode code points, but rather a "compromise character encoding" that lets files containing only ASCII characters stay the same size as in ASCII, while still being able to represent any Unicode code point by using between one and four bytes per character.

If this sounds a bit confusing, you may want to try this Game Dev article on UTF-8. It's still under review, but it does step through some parts of the conversion from a Unicode code point to a UTF-8 representation.
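To make the conversion a little more concrete, here is a minimal Python sketch of how a single code point can be turned into UTF-8 bytes by hand. The function name `utf8_encode` is my own invention for illustration (Python's built-in `str.encode("utf-8")` already does this for you), and the sketch only covers encoding, not decoding.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative sketch only)."""
    # Surrogates (U+D800..U+DFFF) and values above U+10FFFF are not valid.
    if code_point < 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {code_point:#x}")

    if code_point <= 0x7F:
        # ASCII range: one byte, identical to the ASCII value.
        return bytes([code_point])
    if code_point <= 0x7FF:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


if __name__ == "__main__":
    for ch in ["A", "ə", "♥"]:
        encoded = utf8_encode(ord(ch))
        # Show the code point, the byte count and the bytes in hex.
        print(f"U+{ord(ch):04X}", len(encoded), encoded.hex(" "))
```

Here "A" takes one byte (the same as in ASCII), the schwa "ə" takes two, and the heart "♥" takes three, which is the compromise described above.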


Text Expander & Breevy: Great Unicode Input Tools


A Unicode issue many people have is how to enter stray symbols not found within the range of normal entry utilities. For instance, as a linguist, I may be entering phonetic symbols, logic symbols or random characters from a variety of languages. These go way beyond the range of "everyday" accented letters.

Lately, though, I've been introduced to a class of text expander tools such as TextExpander (Mac) and Breevy (Windows) that have truly been lifesavers.

What both these tools do is allow you to create abbreviation codes for symbols, words, phrases and even entire paragraphs. In my case, I created abbreviations for many phonetic symbols (e.g. "\e" = ə (Schwa) and "\n" = ŋ (Engma)). You can also create codes for math symbols (";all" = ∀ (upside down A)), emoticons and icons (";hrt" = ♥ (heart)) and even common words or phrases that you use a lot but don't want to type out (";dvrk" = Dvořák or ";rdetr" = raison d'être). Actually, I mostly use this tool for full phrases I use a lot in e-mail (e.g. ";lmk" = "Let me know what you think."). This is truly a multipurpose linguistic tool!

I've tried lots of input tools in the past, and text expanders have some very nice advantages. One is that the codes work everywhere from e-mail and Facebook to Microsoft Word and Illustrator. Also, since you define the abbreviation, you will be more likely to remember it. Typing numeric codes only goes so far. Finally, I don't have to switch keyboards or open a program just to input one word or phrase. If you are truly working with two languages, then switching is practical, but for sporadic symbols and words/phrases, it's a pain.

There is one drawback: you have to design your codes so that you won't type them by accident as part of something else. You'd be surprised how a code like "urpr" can misfire when you type "surprised."
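The misfire problem is easy to see in a toy model. The Python sketch below is not how TextExpander or Breevy actually work internally (they watch your keystrokes, and I am only guessing at the details); it simply replaces every occurrence of a code in a finished string. The `expand` function and the meaning I assign to "urpr" are made up for illustration.

```python
import re

# Codes borrowed from the examples in this post.
ABBREVIATIONS = {
    ";hrt": "♥",
    ";dvrk": "Dvořák",
    ";lmk": "Let me know what you think.",
}


def expand(text: str, abbreviations: dict[str, str]) -> str:
    """Replace every abbreviation code found in text with its expansion."""
    if not abbreviations:
        return text
    # Try longer codes first so a short code never shadows a longer one.
    codes = sorted(abbreviations, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(code) for code in codes))
    return pattern.sub(lambda match: abbreviations[match.group(0)], text)


if __name__ == "__main__":
    # A code starting with ";" is unlikely to occur in ordinary words.
    print(expand("I ;hrt ;dvrk. ;lmk", ABBREVIATIONS))

    # A hypothetical bare code "urpr" fires inside the word "surprised".
    print(expand("I was surprised", {"urpr": "your project"}))
```

The second call turns "surprised" into "syour projectised", which is why starting each code with a character like ";" or "\" that rarely begins a normal word is a sensible habit.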

Windows 8 & Windows 8.1 Ancient Script and Asian Fonts


Scholars interested in ancient scripts such as Glagolitic, Gothic and Old Hangul may be interested in the new fonts packaged with Windows 8, in particular the updated Segoe UI Symbol font.

Or you could wait for Windows 8.1, when support for Coptic and various scripts of South and Southeast Asia will be added.


About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage for a profile.


The standard commenting utility has been disabled due to hungry spam. If you have a comment, please feel free to drop me a line.
