January 2009 Archives
If your eyes are becoming glazed trying to determine if that glyph is = or ≡ or ≅ or something else remarkably similar...then you may want to check your vision with this helpful Unicode Eye Chart.
It comes with a useful key at the bottom. Isn't it amazing what you can find on the Web?
One set of words that tends to draw puzzled looks is the names of the accent marks that appear above and below letters in non-English spelling systems. So here is a quick list.
|Grave||ò||Italian, Scottish Gaelic, French among others.|
|Double Grave||ȍ||May mark some tones in Slovenian, Serbian, Croatian, Bosnian|
|Acute||ó||Spanish, French, Irish among others. Some languages like Dutch use both acute and grave accents.|
|Double Acute||ő||Used in Hungarian|
|Circumflex||ô||Used in French, Welsh among others.|
|Umlaut/Diaeresis||ö||Used in German, Welsh, Hungarian, among others|
|Tilde||õ||Used in Spanish, Portuguese, Breton, Tagalog among others.|
|Cedilla||ç||Used in French, Turkish among others|
|Ogonek||ǫ||Used in Polish and some native American languages (for nasal vowels).|
|Macron||ō||Used in Maori, Hawaiian among others as well as some Latin texts|
|Breve/Short||ŏ||Used in Romanian|
|Caron/Hachek||ǒ||Letters č, š, ž used in Czech and other Central European languages|
|Ring||å||Used in some Scandinavian languages and Czech|
|Dot Above||ȯ||Used in Old Irish among others.|
|Dot Below||ọ||Used in Igbo (Africa) among others. Also common in English transliterations of languages of India.|
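If you can't remember what an accent is called, you can also ask Unicode. The sketch below is my own illustration, not part of the original list: it uses Python's standard `unicodedata` module to split a precomposed letter into its base letter plus combining marks and report the official name of each mark. The helper name `describe_accents` is made up for this example. Note that Unicode uses the name "caron" rather than "hachek".

```python
import unicodedata

def describe_accents(char):
    """Return the base letter and the Unicode names of its combining marks."""
    # NFD normalization splits a precomposed letter into base + combining marks.
    decomposed = unicodedata.normalize("NFD", char)
    base = decomposed[0]
    marks = [unicodedata.name(c) for c in decomposed[1:] if unicodedata.combining(c)]
    return base, marks

# Escapes are used so the sample letters are guaranteed to be precomposed:
# o-grave, o-double-acute, c-cedilla, o-ogonek, o-caron, o-dot-below
samples = "\u00f2\u0151\u00e7\u01eb\u01d2\u1ecd"

for ch in samples:
    base, marks = describe_accents(ch)
    print(ch, base, ", ".join(marks))
```

This only works for letters that Unicode decomposes. A letter such as ø, which has no canonical decomposition, comes back with an empty list of marks.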
There are two types of hook accents which can appear under letters - the cedilla/cedille (French and other languages) and the ogonek (Polish and other languages). From a distance they appear similar, but the direction the hook faces is opposite in each case.
I'm writing about these two accents now because I tripped myself up this week when I was looking up a character code. I think the best tip to remember is that ogoneks mostly appear on vowels (see below) and cedillas tend to appear on consonants.
Origin of Ogonek
The term ogonek is a Polish term meaning "little tail" (actually a diminutive). It's used to indicate a nasalized vowel in Polish and other Central European languages, but the use of ogonek has now spread to Native American spelling systems, where it also indicates a nasal vowel.
FYI - nasal vowels are also found in languages like French (français = /frãse/) and Portuguese (as in São), but their writing systems implemented different solutions.
Cedilla or Cedille?
Both are valid depending on context. The Unicode standard uses the term cedilla (as in "LATIN SMALL LETTER C WITH CEDILLA" (U+00E7)). On the other hand, I first encountered this accent in French, so I tend to use the French term cedille. In French it's used with the letter "c" to indicate that the letter should be pronounced as /s/ and not /k/ (despite normal spelling convention).
The cedilla is used with other letters in other languages, such as Turkish, where S-cedilla ş = /ʃ/ (the sound "sh"), and Latvian, where a consonant with a cedilla is palatalized. For whatever reason, though, cedillas are mostly used with consonants.
Entering Ogonek and Cedilla into Text
If you're on a Mac, you can use the U.S. Extended keyboard and enter a cedilla by pressing Option+C, then the letter. Ogonek is Option+M, then the letter.
Some Entity Codes
Ogonek with Vowel
Below are the HTML entity codes for vowels with ogonek. The first code in each cell is the decimal version. The second is the hexadecimal version, which also corresponds to the Unicode code point.
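As a rough sketch (my own illustration, assuming Python 3 and its standard `unicodedata` module), both forms of the numeric entity can be generated from the official Unicode character names. The helper name `numeric_entities` is hypothetical. The rows are printed in the same cell layout as the cedilla table in this post.

```python
import unicodedata

def numeric_entities(char):
    """Return the (decimal, hexadecimal) HTML numeric entities for one character."""
    cp = ord(char)  # the Unicode code point as an integer
    return f"&#{cp};", f"&#x{cp:X};"

for vowel in "AEIOU":
    for case in ("CAPITAL", "SMALL"):
        name = f"LATIN {case} LETTER {vowel} WITH OGONEK"
        ch = unicodedata.lookup(name)  # raises KeyError if the name does not exist
        dec, hexa = numeric_entities(ch)
        print(f"|{name}||{ch}||{dec}||{hexa}|")
```

The hexadecimal entity is just the code point with `&#x` in front and `;` behind, which is why it matches the U+ number you see in Unicode charts.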
Cedilla with Consonants
Note that the letter E with cedilla does exist. Therefore some vowels can have either a cedilla or an ogonek.
|Character Name||Character||Entity||Hex Entity|
|LATIN CAPITAL LETTER C WITH CEDILLA||Ç||&#199;||&#xC7;|
|LATIN SMALL LETTER C WITH CEDILLA||ç||&#231;||&#xE7;|
|LATIN CAPITAL LETTER D WITH CEDILLA||Ḑ||&#7696;||&#x1E10;|
|LATIN SMALL LETTER D WITH CEDILLA||ḑ||&#7697;||&#x1E11;|
|LATIN CAPITAL LETTER E WITH CEDILLA||Ȩ||&#552;||&#x228;|
|LATIN SMALL LETTER E WITH CEDILLA||ȩ||&#553;||&#x229;|
|LATIN CAPITAL LETTER G WITH CEDILLA||Ģ||&#290;||&#x122;|
|LATIN SMALL LETTER G WITH CEDILLA||ģ||&#291;||&#x123;|
|LATIN CAPITAL LETTER H WITH CEDILLA||Ḩ||&#7720;||&#x1E28;|
|LATIN SMALL LETTER H WITH CEDILLA||ḩ||&#7721;||&#x1E29;|
|LATIN CAPITAL LETTER K WITH CEDILLA||Ķ||&#310;||&#x136;|
|LATIN SMALL LETTER K WITH CEDILLA||ķ||&#311;||&#x137;|
|LATIN CAPITAL LETTER L WITH CEDILLA||Ļ||&#315;||&#x13B;|
|LATIN SMALL LETTER L WITH CEDILLA||ļ||&#316;||&#x13C;|
|LATIN CAPITAL LETTER N WITH CEDILLA||Ņ||&#325;||&#x145;|
|LATIN SMALL LETTER N WITH CEDILLA||ņ||&#326;||&#x146;|
|LATIN CAPITAL LETTER R WITH CEDILLA||Ŗ||&#342;||&#x156;|
|LATIN SMALL LETTER R WITH CEDILLA||ŗ||&#343;||&#x157;|
|LATIN CAPITAL LETTER S WITH CEDILLA||Ş||&#350;||&#x15E;|
|LATIN SMALL LETTER S WITH CEDILLA||ş||&#351;||&#x15F;|
|LATIN CAPITAL LETTER T WITH CEDILLA||Ţ||&#354;||&#x162;|
|LATIN SMALL LETTER T WITH CEDILLA||ţ||&#355;||&#x163;|
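The "ogoneks on vowels, cedillas on consonants" tip can be checked against the Unicode character names themselves. This is a minimal sketch of my own, assuming Python 3. The function name `base_letters_with` and the choice of code point ranges (Latin-1 Supplement through Latin Extended-B, plus Latin Extended Additional) are my assumptions, and the exact results depend on the Unicode version bundled with your Python.

```python
import unicodedata

def base_letters_with(mark):
    """Collect base letters of precomposed Latin characters named '... WITH <mark>'."""
    suffix = f" WITH {mark}"
    letters = set()
    # Latin-1 Supplement through Latin Extended-B, plus Latin Extended Additional
    for start, end in ((0x00C0, 0x024F), (0x1E00, 0x1EFF)):
        for cp in range(start, end + 1):
            name = unicodedata.name(chr(cp), "")  # "" for unassigned code points
            if name.startswith("LATIN") and name.endswith(suffix):
                # e.g. "LATIN SMALL LETTER C WITH CEDILLA" -> "C"
                letters.add(name.split()[3])
    return letters

print("Cedilla:", sorted(base_letters_with("CEDILLA")))
print("Ogonek: ", sorted(base_letters_with("OGONEK")))
```

Characters that carry a second accent as well (for example, C with cedilla and acute) are skipped on purpose, because their names do not end in the mark being searched for.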
The tutorial is technically titled Creating SVG Tiny Pages in Arabic, Hebrew and other Right-to-Left Scripts, but it actually provides an excellent explanation of how Unicode specifies text direction, and of how you need to handle both RTL (right-to-left) and LTR (left-to-right) runs, known as BIDI (bidirectional) text, when a Middle Eastern text includes European words.
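To get a feel for how Unicode records text direction, you can look up the bidirectional class that each character carries. The sketch below is my own illustration, not taken from the tutorial. It uses the `bidirectional()` function from Python's standard `unicodedata` module, and the helper name `bidi_classes` and the sample string are made up for the example.

```python
import unicodedata

def bidi_classes(text):
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# A Hebrew word, the Latin-script word "SVG", and an Arabic word,
# written with escapes so the logical (typing) order is unambiguous.
sample = "\u05e9\u05dc\u05d5\u05dd SVG \u0633\u0644\u0627\u0645"

for ch, cls in bidi_classes(sample):
    print(f"U+{ord(ch):04X} {cls}")
```

Hebrew letters report class `R`, Arabic letters `AL`, Latin letters `L`, and spaces `WS`. A rendering engine uses these classes to decide the visual order of each run, which is why the characters can be stored in typing order even when the display direction changes mid-line.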