Recently in Accents & Punctuation Category
What these are
The superscript a/o (sometimes underlined) are abbreviations for ordinal numbers used in Spanish, Italian and Portuguese, similar to English -th (as in "4th, 5th, 6th.."). The use of "o" vs "a" depends on the gender of the noun. For instance, the "1st American woman" would be 1ª americana in Spanish and the "1st American man" would be 1º americano. The 5th American woman and man would be 5ª americana/5º americano.
The Codes
I got a request for putting codes for these on the Penn State Web Computing with Accents Web site in various locations, so I thought I would summarize the codes here.
| | Feminine Ordinal (ª) | Masculine Ordinal (º) |
|---|---|---|
| Unicode Code Point | U+00AA (170) | U+00BA (186) |
| Windows Alt Code | ALT+0170 | ALT+0186 |
| Mac Option Code | Option+9 | Option+0 |
| HTML Entity Code | &ordf; | &ordm; |
But Wait There's More
But in the land of Unicode, there's always more to know...such as that in Spanish primero '1st.masc' (or '1º') may be shortened to primer, which can be abbreviated as '1er'...or that you may write octavo 'eighth.masc' as 8º or 8.º or possibly 8vo...although Google tends to have more instances of 8º.
What's important though is that only º and ª have their own code points in Unicode. For English -th, -nd, -rd or Spanish -vo,-er you have to rely on the old fashioned SUP (superscript) tag or its equivalent in CSS.
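As a rough illustration of the difference, here is a minimal Python sketch. The function names `ordinal` and `superscript_suffix` are made up for this example; only the two code points and the `&ordf;`/`&ordm;` entities come from the table above.

```python
import html

FEMININE = "\u00AA"   # ª FEMININE ORDINAL INDICATOR
MASCULINE = "\u00BA"  # º MASCULINE ORDINAL INDICATOR


def ordinal(number, feminine=False):
    """Build an ordinal such as 1º or 1ª from the dedicated code points."""
    return f"{number}{FEMININE if feminine else MASCULINE}"


def superscript_suffix(number, suffix):
    """Suffixes without their own code point fall back to SUP markup."""
    return f"{number}<sup>{html.escape(suffix)}</sup>"


print(ordinal(1), ordinal(5, feminine=True))
print(f"U+{ord(FEMININE):04X}", f"U+{ord(MASCULINE):04X}")
print(html.unescape("&ordf;") == FEMININE)
print(superscript_suffix(4, "th"), superscript_suffix(8, "vo"))
```

The same SUP fallback could be written in CSS instead (for example with `vertical-align: super`); the point is only that it is markup, not a character.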
Defining Emoji
There were lots of interesting sessions at last week's Unicode conference, but the one that I think non-experts can relate to the most was the one about Emoji or those little tiny icons popular in Japanese e-mail messages.
A rough translation of emoji might be emoticon, but the range of images goes way beyond smiley faces to include weather symbols, hearts, beer steins, sports icons, high heels, fast food, astrological signs, warnings, hand gestures and bikinis.
Why Unicode?
It's good to catalog and standardize any symbol set, but in this case economic necessity is driving the campaign. Specifically, Google and Apple (with its iPhone) want to expand further into the Japanese market.
According to our presenters, the three major Japanese cell phone carriers all support emoji, and these images are popular with most adults (even the ones over 30). It's an important enough feature that iPhone (and iChat), Gmail and even Twitter support emoji.
But really it would be good to support one encoded set of emoji, not a hack of three emoji encodings from the Japanese cell phone carriers...hence the need for a unified encoding which combines those items already encoded (e.g. zodiac symbols) with symbols not currently in Unicode.
Remaining Issues
Because no Unicode script block is free of quirks, here are the issues I overheard at the conference and on the Web. Namely:
Color - Real emoji have colors (really bright ones), but the spec is in black and white. This makes sense because the rest of Unicode is also in black and white. Plus you will have more options to add the colors you want!
5-Digit Code Points - Or more technically, the new glyphs will be assigned a number above U+FFFF (i.e. not in the BMP or Plane 0). Not surprisingly, many mobile devices are limited to U+FFFF and below. The committee's comment was that they expected that mobile developers would learn to overcome this restriction...because they really are running out of room in the U+0000-FFFF range. That may be good news for anyone wanting to transmit the ancient scripts over cell phones. You never know when you need to access a Mycenaean Greek text away from the office or when the next Linear B revival may happen.
There's a Jailbreak App for that - When researching this article I encountered articles about tricks for enabling emoji on non-Japanese iPhones, not all of which were legit. For a while, Apple was discouraging use of emoji outside of Japan so it was hiding the emoji. Fortunately, there is a legal way to enable emoji now (both a trick and an app).
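To see why code points above U+FFFF trip up software built around 16-bit characters, here is a small Python sketch. The helper names are hypothetical; the two sample characters are a zodiac sign already inside the BMP and the first Linear B syllable, which sits just outside it. In UTF-16, anything beyond the BMP has to be stored as a pair of "surrogate" code units.

```python
def in_bmp(ch):
    """True if the character fits in Plane 0 (U+0000 to U+FFFF)."""
    return ord(ch) <= 0xFFFF


def utf16_code_units(ch):
    """Return the 16-bit code units needed to store the character in UTF-16."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


samples = {
    "ARIES (zodiac, inside the BMP)": "\u2648",
    "LINEAR B SYLLABLE B008 A (outside the BMP)": "\U00010000",
}

for label, ch in samples.items():
    units = " ".join(f"{u:04X}" for u in utf16_code_units(ch))
    print(f"{label}: U+{ord(ch):04X}, in BMP: {in_bmp(ch)}, UTF-16 units: {units}")
```

A device that assumes one 16-bit unit per character sees the second sample as two separate, meaningless values, which is the restriction developers would need to overcome.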
So there you have it - thanks to the great folks at Google and Apple, we will all be able to standardize the addition of cute icons in our online communication...or at least we will have a documented explanation of what they were for future generations. Trust me, in about 500 years, we will need it.
Languages from the Americas and Australia are usually written in the Latin alphabet, but often contain characters (e.g. ʉ, ʔ, ɬ/ł, ō) or combinations of characters not found in other language orthographies. So, a keyboard utility which consolidates them in one keyboard layout is very handy.
Chris Harvey of Language Geek.com has a page of keyboard layout downloads (both Windows and Mac). His site also includes keyboard layouts for Cherokee and the languages which use the Aboriginal Syllabics, as well as several freeware fonts covering all these languages. Well worth a visit.
The Windows International keyboard is a Windows utility from Microsoft which allows users to enter a variety of accent codes with combinations of keys like '+e (for é) instead of memorizing a list of numeric ALT codes. If you are typing a lot of accented characters on a Windows machine, it's a godsend.
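The general idea of such "dead key" combinations can be sketched in a few lines of Python. This is a simplified model for illustration only, not how the Microsoft layout is actually implemented: the `DEAD_KEYS` table and the function name are assumptions, and the real keyboard has extra rules this sketch ignores.

```python
import unicodedata

# Hypothetical subset of dead keys, each mapped to a combining accent.
DEAD_KEYS = {
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
    '"': "\u0308",  # umlaut/diaeresis
    "~": "\u0303",  # tilde
}


def type_with_dead_key(dead_key, letter):
    """Combine a dead key and a letter into one accented character if possible."""
    mark = DEAD_KEYS.get(dead_key)
    if mark is None:
        return dead_key + letter
    composed = unicodedata.normalize("NFC", letter + mark)
    # If Unicode has no single precomposed character, keep both keystrokes.
    return composed if len(composed) == 1 else dead_key + letter


print(type_with_dead_key("'", "e"))
print(type_with_dead_key("~", "n"))
print(type_with_dead_key("'", "t"))
```

The last call returns the two plain keystrokes because there is no precomposed "t with acute" to produce.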
The interesting thing is that you can now download a Mac version of the Windows International keyboard. As a longtime Mac addict, I find it amusing because I am so used to the Apple Option keys. To me it's an interesting redundancy.
But I can imagine that if you are a long-time Windows user, you may not want to re-learn a new set of Option codes. I can relate, because I've been struggling with my new phonetics keyboard which is very different from my old one. There's some serious retraining needed before I could use it.
What's really important is that there are utilities out there which allow users to customize their keyboards just the way they want them. Vive la différence.
I was checking the font repositories and found some new fonts that might be of interest to the linguistics/medieval/math crowd. But before that, I would like to define a new term: LGC = Latin/Greek/Cyrillic font, which refers to any font which includes the Latin, Latin-A, Cyrillic and Greek blocks and a few math symbols. So many fonts include all three scripts that it's a handy acronym for me.
One caveat is that Basic LGC fonts don't necessarily include ALL LGC characters. For instance a font like Verdana may be missing IPA extensions, Cyrillic extensions and Greek extensions. The good news is that more fonts including the special characters are becoming available, and we're getting freeware large fonts to fill in typographical needs like small caps and narrow characters.
- Arev Sans - A sans serif font with excellent LGC coverage including Latin/Greek/Cyrillic extensions, a good inventory of math symbols and other symbols/punctuation.
- Linux Libertine - A family of OTF fonts with separate fonts for bold, italics, small caps. Good LGC coverage. It's also good to have a small caps font for Greek and Cyrillic, but it seems to be missing some of the phonetic characters.
- Marin Font - This font is notable for being a little narrower than others, which is a nice change, and has glyphs for the Cherokee block and the Canadian Aboriginal Syllabics. It also includes a separate Small Caps font.
- Roman Cyrillic Std, BukyVede, KlimentStd from Kodeks German Medieval Slavicists Server - BukyVede in particular includes a lot of historical Cyrillic characters and includes the Glagolitic characters. Kliment and Roman Cyrillic are LGC fonts which include other variations of the Glagolitic block. Latin and Greek are also included.
- Quivira - I discussed this a few entries ago, but to repeat: Big font. Lots of scripts including LGC, Coptic, Armenian, Hebrew, Georgian, Thai, Baybayin, Runic, Braille, some Indic...
- Sophia Nubian - a new Coptic and Nubian script font from SIL with Keyman keyboard utility (Windows). A Mac Coptic Unicode Keyboard is also available.
I should mention that SIL is an excellent source of freeware fonts for undersupported scripts. Here's a list of the SIL fonts.
There are always more fonts out there, so I recommend periodically checking the Gallery of Unicode Fonts and Alan Wood's Font list. You never know what you might find.
A set of words which will generate some puzzled looks are those for the accent marks above and below letters in non-English spelling systems. So here is a quick list.
| Accent | Sample | Notes |
|---|---|---|
| Grave | ò | Italian, Scottish Gaelic, French among others. |
| Double Grave | ȍ | May mark some tones in Slovenian, Serbian, Croatian, Bosnian |
| Acute | ó | Spanish, French, Irish among others. Some languages like Dutch use both acute and grave accents. |
| Double Acute | ő | Used in Hungarian |
| Circumflex | ô | Used in French, Welsh among others. |
| Umlaut/Diaeresis | ö | Used in German, Welsh, Hungarian, among others |
| Tilde | õ | Used in Spanish, Portuguese, Breton, Tagalog among others. |
| Cedilla/Cedille | ç | Used in French, Turkish among others |
| Ogonek (Backwards Cedilla) | ǫ | Used in Polish and some native American languages (for nasal vowels). |
| Macron/Long | ō | Used in Maori, Hawaiian among others as well as some Latin texts |
| Breve/Short | ŏ | Used in Romanian |
| Caron/Hachek | ǒ | Letters č, š, ž used in Czech and other Central European languages |
| Ring | å | Used in some Scandinavian languages and Czech |
| Dot Above | ȯ | Used in Old Irish among others. |
| Dot Below | ọ | Used in Igbo (Africa) among others. Also common in English transliterations of languages of India. |
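If you ever forget which name goes with which mark, the Unicode character database can tell you. Here is a minimal Python sketch (the `describe` helper is my own name for it) that splits each sample letter from the table into its base letter and accent, then looks up the official name of the accent.

```python
import unicodedata

# The sample letters from the table above.
SAMPLES = "òȍóőôöõçǫōŏǒåȯọ"


def describe(ch):
    """Split a letter into its base and the Unicode names of its accents."""
    base, *marks = unicodedata.normalize("NFD", ch)
    return base, [unicodedata.name(mark) for mark in marks]


for ch in SAMPLES:
    base, names = describe(ch)
    print(ch, "=", base, "+", ", ".join(names))
```

Note that the official names sometimes differ from the everyday ones: Unicode says DIAERESIS rather than umlaut and CARON rather than hachek.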
There are two types of hook accents which can appear under letters - the cedilla/cedille (French and other languages) and the ogonek (Polish and other languages). From a distance they appear similar, but the direction the hook faces is opposite in each case.
| Cedilla | Ogonek |
|---|---|
| Ç | Ǫ |
I'm writing about these two accents now because I got myself tripped up this week when I was looking up a character code. I think the best tip to remember is that ogoneks mostly appear on vowels (see below) and cedillas tend to appear on consonants.
Origin of Ogonek
The term ogonek is a Polish term meaning "little tail" (actually a diminutive). It's used to indicate a nasalized vowel in Polish and other Central European languages, but now the use of ogonek has spread to Native American spelling systems where the ogonek also indicates a nasal vowel.
FYI - nasal vowels are also found in languages like French (français = /frãse/) and Portuguese (as in São), but their writing systems implemented different solutions.
Cedilla or Cedille?
Both are valid depending on context. The Unicode standard uses the term cedilla (as in "LATIN SMALL LETTER C WITH CEDILLA" (U+00E7)). On the other hand, I first encountered this accent in French, so I tend to use the French term cedille. In French it's used with the letter "c" to indicate that the letter should be pronounced as /s/ and not /k/ (despite normal spelling convention).
The cedilla is used with other letters in other languages, such as Turkish, where S-cedilla ş = /ʃ/ or the sound "sh", and Latvian, where a consonant with a cedilla is palatalized. For whatever reason, though, cedillas are mostly used with consonants.
Entering Ogonek and Cedilla into Text
If you're on a Mac, you can use the U.S. Extended keyboard and enter cedilla by pressing Option+C plus the letter. Ogonek is Option+M plus the letter.
If you're on Windows, you can either activate a keyboard for the target language (e.g. French, Polish, Turkish) or you can use the Character Map.
Some Entity Codes
Ogonek with Vowel
Below are the HTML entity codes for vowels with ogonek. The first code in each cell is the decimal version. The second is the hexadecimal version, which also corresponds to the Unicode code point.
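| | A | E | I | O | U | Combine |
|---|---|---|---|---|---|---|
| Uppercase | Ą &#260; &#x0104; | Ę &#280; &#x0118; | Į &#302; &#x012E; | Ǫ &#490; &#x01EA; | Ų &#370; &#x0172; | &#808; &#x0328; |
| Lowercase | ą &#261; &#x0105; | ę &#281; &#x0119; | į &#303; &#x012F; | ǫ &#491; &#x01EB; | ų &#371; &#x0173; | -- |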
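Both forms of the code can be derived from the character itself, so there is no need to look them up one by one. Here is a minimal Python sketch; `numeric_entities` is a hypothetical helper name, and the four-digit padding of the hex form is just a formatting choice.

```python
import html


def numeric_entities(ch):
    """Return the (decimal, hexadecimal) HTML entity codes for one character."""
    code_point = ord(ch)
    return f"&#{code_point};", f"&#x{code_point:04X};"


for ch in "ĄĘĮǪŲąęįǫų":
    decimal, hexadecimal = numeric_entities(ch)
    # Both codes should decode back to the original character.
    assert html.unescape(decimal) == html.unescape(hexadecimal) == ch
    print(ch, decimal, hexadecimal)
```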
Cedilla with Consonants
Note that letter E with cedilla does exist. Therefore some vowels can have either a cedilla or an ogonek.
| Character Name | Character | Entity | Hex Entity |
|---|---|---|---|
| CEDILLA | ¸ | &#184; | &#x00B8; |
| LATIN CAPITAL LETTER C WITH CEDILLA | Ç | &#199; | &#x00C7; |
| LATIN SMALL LETTER C WITH CEDILLA | ç | &#231; | &#x00E7; |
| LATIN CAPITAL LETTER D WITH CEDILLA | Ḑ | &#7696; | &#x1E10; |
| LATIN SMALL LETTER D WITH CEDILLA | ḑ | &#7697; | &#x1E11; |
| LATIN CAPITAL LETTER E WITH CEDILLA | Ȩ | &#552; | &#x0228; |
| LATIN SMALL LETTER E WITH CEDILLA | ȩ | &#553; | &#x0229; |
| LATIN CAPITAL LETTER G WITH CEDILLA | Ģ | &#290; | &#x0122; |
| LATIN SMALL LETTER G WITH CEDILLA | ģ | &#291; | &#x0123; |
| LATIN CAPITAL LETTER H WITH CEDILLA | Ḩ | &#7720; | &#x1E28; |
| LATIN SMALL LETTER H WITH CEDILLA | ḩ | &#7721; | &#x1E29; |
| LATIN CAPITAL LETTER K WITH CEDILLA | Ķ | &#310; | &#x0136; |
| LATIN SMALL LETTER K WITH CEDILLA | ķ | &#311; | &#x0137; |
| LATIN CAPITAL LETTER L WITH CEDILLA | Ļ | &#315; | &#x013B; |
| LATIN SMALL LETTER L WITH CEDILLA | ļ | &#316; | &#x013C; |
| LATIN CAPITAL LETTER N WITH CEDILLA | Ņ | &#325; | &#x0145; |
| LATIN SMALL LETTER N WITH CEDILLA | ņ | &#326; | &#x0146; |
| LATIN CAPITAL LETTER R WITH CEDILLA | Ŗ | &#342; | &#x0156; |
| LATIN SMALL LETTER R WITH CEDILLA | ŗ | &#343; | &#x0157; |
| LATIN CAPITAL LETTER S WITH CEDILLA | Ş | &#350; | &#x015E; |
| LATIN SMALL LETTER S WITH CEDILLA | ş | &#351; | &#x015F; |
| LATIN CAPITAL LETTER T WITH CEDILLA | Ţ | &#354; | &#x0162; |
| LATIN SMALL LETTER T WITH CEDILLA | ţ | &#355; | &#x0163; |
| COMBINING CEDILLA | ̧ | &#807; | &#x0327; |
I just discovered a new large TrueType Unicode font called Quivira from a German developer. It is based somewhat on Garamond, and includes a lot of useful characters such as Latin, Phonetics, Math, Greek, Coptic, Cyrillic, Cherokee, Currency, Box/Geometrics/Arrows, Old Italic, Gothic, Braille, Armenian, Hebrew and so forth.
The site is in German, but there's enough information for a user to get by using "Internet German", and as the author says "Quivira ist Freeware."
Download: http://www.grinningbit.com/quivira.php
List of Characters (PDF): http://www.grinningbit.com/files/Quivira.pdf

This week, my Glyph du Jour is one that does NOT exist in Unicode. It's a capital Q with a dot above representing "heat transfer per unit time" (or rate of heat transfer). Similar thermodynamic symbols are Ẇ (rate of work produced) and ṁ (rate of mass transfer)...and interestingly these DO exist in Unicode.
Why W-dot and m-dot, but no Q-dot? It's because these particular symbols probably have a use somewhere beyond thermodynamics. For instance, ṁ was sometimes used in older Classical Irish spelling (today's mh). Therefore the community was able to lobby for the inclusion of this letter within Unicode in order to transcribe historic Classical Irish texts (lucky for my thermodynamics course).
The irony here is that within Unicode Classical Irish actually has better resources than the engineering community (or the statistics community which could use p-hat or p̂). I don't think it's an evil conspiracy, but rather the fact that many engineers probably think of their notation quirks as a "font/layout" issue rather than as a "foreign language" issue.
The next step could be that someone proposes the inclusion of Q-dot or Q̇ (and its sibling q-dot or q̇, which is the rate of heat transfer per unit mass). This could raise the issue of whether we can get by with just combining Q plus a "combining diacritic" dot - that is, manually combining a letter and a diacritic.
Based on what I've seen, I would say no. First, few everyday fonts support "combining accents" well. They would much rather work with precomposed characters with accents built in, partly because it is difficult to place a dot consistently for each letter without building it ahead of time. I can fudge a q̇, but if I try Q̇, the dot often disappears into the taller capital Q. At best I'm stuck with Q ̇ (Q with upper-right dot).
Another lesson is that someone WILL always find some new combination of the Latin alphabet to mess around with.
Dotted Letters & Combining Diacritic Test
Below is a table showing a test of the combining dot for the Q-dots and the existing dotted letters. As you can see, there are only a few dotted letters missing.
Note: Q-dots best viewed with Arial Unicode MS, Gentium or other specialized Unicode font.
| Character Name | Character | Hex Entity Code | Decimal Entity Code |
|---|---|---|---|
| Lower Q dot (Fudged) | q̇ | q+&#x0307; | q+&#775; |
| Capital Q dot (Fudged) | Q̇ | Q+&#x0307; | Q+&#775; |
| Lower A with dot above | ȧ | &#x0227; | &#551; |
| Capital A with dot above | Ȧ | &#x0226; | &#550; |
| Lower B with dot above | ḃ | &#x1E03; | &#7683; |
| Capital B with dot above | Ḃ | &#x1E02; | &#7682; |
| Lower C with dot above | ċ | &#x010B; | &#267; |
| Capital C with dot above | Ċ | &#x010A; | &#266; |
| Lower D with dot above | ḋ | &#x1E0B; | &#7691; |
| Capital D with dot above | Ḋ | &#x1E0A; | &#7690; |
| Lower E with dot above | ė | &#x0117; | &#279; |
| Capital E with dot above | Ė | &#x0116; | &#278; |
| Lower F with dot above | ḟ | &#x1E1F; | &#7711; |
| Capital F with dot above | Ḟ | &#x1E1E; | &#7710; |
| Lower G with dot above | ġ | &#x0121; | &#289; |
| Capital G with dot above | Ġ | &#x0120; | &#288; |
| Lower H with dot above | ḣ | &#x1E23; | &#7715; |
| Capital H with dot above | Ḣ | &#x1E22; | &#7714; |
| Capital I with dot above | İ | &#x0130; | &#304; |
| Lower M with dot above | ṁ | &#x1E41; | &#7745; |
| Capital M with dot above | Ṁ | &#x1E40; | &#7744; |
| Lower N with dot above | ṅ | &#x1E45; | &#7749; |
| Capital N with dot above | Ṅ | &#x1E44; | &#7748; |
| Lower O with dot above | ȯ | &#x022F; | &#559; |
| Capital O with dot above | Ȯ | &#x022E; | &#558; |
| Lower P with dot above | ṗ | &#x1E57; | &#7767; |
| Capital P with dot above | Ṗ | &#x1E56; | &#7766; |
| Lower R with dot above | ṙ | &#x1E59; | &#7769; |
| Capital R with dot above | Ṙ | &#x1E58; | &#7768; |
| Lower S with dot above | ṡ | &#x1E61; | &#7777; |
| Capital S with dot above | Ṡ | &#x1E60; | &#7776; |
| Lower T with dot above | ṫ | &#x1E6B; | &#7787; |
| Capital T with dot above | Ṫ | &#x1E6A; | &#7786; |
| Lower W with dot above | ẇ | &#x1E87; | &#7815; |
| Capital W with dot above | Ẇ | &#x1E86; | &#7814; |
| Lower X with dot above | ẋ | &#x1E8B; | &#7819; |
| Capital X with dot above | Ẋ | &#x1E8A; | &#7818; |
| Lower Y with dot above | ẏ | &#x1E8F; | &#7823; |
| Capital Y with dot above | Ẏ | &#x1E8E; | &#7822; |
| Lower Z with dot above | ż | &#x017C; | &#380; |
| Capital Z with dot above | Ż | &#x017B; | &#379; |
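For the curious, here is one way to check which letters have a precomposed dotted form without eyeballing a table. This is a minimal Python sketch (the helper name `has_precomposed_dot` is my own): it attaches the combining dot above (U+0307) to each letter and asks Unicode normalization to compose the pair. If the result is still two characters, no precomposed letter exists and you are stuck with the "fudged" combination.

```python
import string
import unicodedata

DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE


def has_precomposed_dot(letter):
    """True if letter + combining dot above composes into a single character."""
    return len(unicodedata.normalize("NFC", letter + DOT_ABOVE)) == 1


missing = [c for c in string.ascii_letters if not has_precomposed_dot(c)]

print("W-dot precomposed:", has_precomposed_dot("W"))
print("Q-dot precomposed:", has_precomposed_dot("Q"))
print("Letters that need the combining dot:", " ".join(missing))
```

The exact list depends on the Unicode version bundled with your Python, but Q and q should show up among the letters that need the combining dot.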
Below is the Braille Symbol "1-4-5" which is encoded in Unicode. As you can see it is named for the configuration of the dots, not for a letter.
About 1-4-5
As it happens, this pattern is used for English letter D as well as for delta (Δ) in Greek, dalet (ד) in Hebrew, dal (د) in Arabic and de (Д) in Russian. When Braille was first developed, each cell had only six dots, but in Unicode dots 7 and 8 were added to expand the repertoire of possible characters.
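Because the characters are named for their dots, the code point can be worked out from the dot numbers alone: the Braille Patterns block starts at U+2800, and each of the eight dots switches on one bit. Here is a minimal Python sketch of that arithmetic (`braille_cell` is a hypothetical helper name).

```python
import unicodedata

BRAILLE_BASE = 0x2800  # BRAILLE PATTERN BLANK, start of the block


def braille_cell(*dots):
    """Return the Braille pattern character for the given dot numbers (1-8)."""
    code_point = BRAILLE_BASE
    for dot in dots:
        if not 1 <= dot <= 8:
            raise ValueError(f"Braille dots are numbered 1 to 8, got {dot}")
        code_point |= 1 << (dot - 1)  # dot 1 is bit 0, dot 8 is bit 7
    return chr(code_point)


cell = braille_cell(1, 4, 5)
print(cell, f"U+{ord(cell):04X}", unicodedata.name(cell))
```

Dots 1, 4 and 5 add up to 1 + 8 + 16 = 25, or hexadecimal 19, so the pattern lands at U+2819.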
Most Braille software and hardware are devoted to visually impaired users, but there are fonts for sighted users. Below is a list of links to Braille charts by language and freeware Braille fonts.
Braille Charts
- U.S. English (Braille Bug)
- Scripts for the Blind (Multiple languages)
- Nemeth Braille (Braille Codes for Mathematics)
- Nemeth Code Math Symbols for Braille
- Phonetic Symbol Braille
Braille Unicode Fonts
- gh ASCII and Unicode Braille (from MathSpeak, includes other fonts from Nemeth and other uses)
- Braille (UBraille.ttf)
- Apple Braille (free on Macintosh, starting with OS X 10.5, Leopard)
- Apple Symbols (free on Macintosh)
- Deja Vu Sans (a large font with lots of scripts)
- Gallery of Unicode Fonts: Braille