January 2009 Archives

A Unicode Eye Chart


If your eyes are becoming glazed trying to determine if that glyph is = or ≡ or ≅ or something else remarkably similar...then you may want to check your vision with this helpful Unicode Eye Chart.

It comes with a useful key at the bottom. Isn't it amazing what you can find on the Web?


More Accent Terminology


The names of the accent marks that appear above and below letters in non-English spelling systems often draw puzzled looks, so here is a quick list.

Accent                      Sample  Notes
Grave                       ò       Used in Italian, Scottish Gaelic, French, among others.
Double Grave                ȍ       May mark some tones in Slovenian, Serbian, Croatian, Bosnian.
Acute                       ó       Used in Spanish, French, Irish, among others. Some languages like Dutch use both acute and grave accents.
Double Acute                ő       Used in Hungarian.
Circumflex                  ô       Used in French, Welsh, among others.
Umlaut/Diaeresis            ö       Used in German, Welsh, Hungarian, among others.
Tilde                       õ       Used in Spanish, Portuguese, Breton, Tagalog, among others.
Cedilla/Cedille             ç       Used in French, Turkish, among others.
Ogonek (Backwards Cedilla)  ǫ       Used in Polish and some Native American languages (for nasal vowels).
Macron/Long                 ō       Used in Maori, Hawaiian, among others, as well as some Latin texts.
Breve/Short                 ŏ       Used in Romanian.
Caron/Hachek                ǒ       Letters č, š, ž used in Czech and other Central European languages.
Ring                        å       Used in some Scandinavian languages and Czech.
Dot Above                   ȯ       Used in Old Irish, among others.
Dot Below                   ọ       Used in Igbo (Africa), among others. Also common in English transliterations of languages of India.
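
As a rough illustration of how these accents relate to Unicode, the sketch below attaches a combining mark to a base letter and normalizes the result with Python's standard unicodedata module. The COMBINING_MARKS dictionary and the add_accent helper are names made up for this example; the code points are the standard Unicode combining marks. Not every combination has a single precomposed character (o with cedilla, for instance), so some results stay as two code points.

```python
import unicodedata

# Standard Unicode combining marks for the accents listed above.
# The dictionary name and keys are illustrative, not from any library.
COMBINING_MARKS = {
    "grave": "\u0300",
    "acute": "\u0301",
    "circumflex": "\u0302",
    "tilde": "\u0303",
    "macron": "\u0304",
    "breve": "\u0306",
    "dot above": "\u0307",
    "diaeresis": "\u0308",
    "ring": "\u030A",
    "double acute": "\u030B",
    "caron": "\u030C",
    "double grave": "\u030F",
    "dot below": "\u0323",
    "cedilla": "\u0327",
    "ogonek": "\u0328",
}


def add_accent(base: str, accent: str) -> str:
    """Append the combining mark, then compose (NFC) where Unicode defines a single character."""
    return unicodedata.normalize("NFC", base + COMBINING_MARKS[accent])


if __name__ == "__main__":
    for accent in COMBINING_MARKS:
        composed = add_accent("o", accent)
        # List the name of every code point so uncomposed pairs are visible.
        names = " + ".join(unicodedata.name(ch) for ch in composed)
        print(f"{accent:13} {composed}  {names}")
```

For example, add_accent("o", "ogonek") gives the single character ǫ, while add_accent("o", "cedilla") remains the letter o followed by a combining cedilla.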


Ogonek vs. Cedilla Accent


There are two types of hook accents which can appear under letters - the cedilla/cedille (French and other languages) and the ogonek (Polish and other languages). From a distance they appear similar, but the direction the hook faces is opposite in each case.

Cedilla Ogonek
Ç Ǫ

I'm writing about these two accents now because I got myself tripped up this week when I was looking up a character code. I think the best tip to remember is that ogoneks mostly appear on vowels (see below) and cedillas tend to appear on consonants.
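
When the glyphs are too similar to tell apart by eye, one option is to ask Unicode directly. The following is a minimal sketch using Python's standard unicodedata module; the hook_accent function name is an invention for this example. It decomposes a character (NFD) and checks whether the combining cedilla (U+0327) or combining ogonek (U+0328) is present.

```python
import unicodedata
from typing import Optional

COMBINING_CEDILLA = "\u0327"
COMBINING_OGONEK = "\u0328"


def hook_accent(char: str) -> Optional[str]:
    """Return 'cedilla', 'ogonek', or None depending on the marks found in the decomposition."""
    decomposed = unicodedata.normalize("NFD", char)
    if COMBINING_CEDILLA in decomposed:
        return "cedilla"
    if COMBINING_OGONEK in decomposed:
        return "ogonek"
    return None


if __name__ == "__main__":
    for ch in "ÇǪęşe":
        print(ch, unicodedata.name(ch), "->", hook_accent(ch))
```

The character name printed by unicodedata.name also spells out the accent, which is often enough to settle the question on its own.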

Origin of Ogonek

The term ogonek is a Polish word meaning "little tail" (it is actually a diminutive). It's used to indicate a nasalized vowel in Polish and other Central European languages, but the use of the ogonek has now spread to Native American spelling systems, where it also indicates a nasal vowel.

FYI - nasal vowels are also found in languages like French (français = /frãse/) and Portuguese (as in São), but their writing systems implemented different solutions.

Cedilla or Cedille?

Both are valid depending on context. The Unicode standard uses the term cedilla (as in "LATIN SMALL LETTER C WITH CEDILLA" (U+00E7)). On the other hand, I first encountered this accent in French, so I tend to use the French term cedille. In French it's used with the letter "c" to indicate that the letter should be pronounced as /s/ and not /k/ (despite normal spelling convention).

The cedilla is used in other languages/letters such as Turkish, where S-cedilla ş = /ʃ/ or the sound "sh", and Latvian, where a consonant with a cedilla is palatalized. For whatever reason, though, cedillas are generally used with consonants.

Entering Ogonek and Cedilla into Text

If you're on a Mac, you can use the U.S. Extended keyboard and enter a cedilla by pressing Option+C, then the letter. Ogonek is Option+M, then the letter.

If you're on Windows, you can either activate a keyboard for the target language (e.g. French, Polish, Turkish) or you can use the Character Map.

Some Entity Codes

Ogonek with Vowel

Below are the HTML entity codes for vowels with ogonek. Each cell shows the character, then the decimal entity code, then the hexadecimal version, which also corresponds to the Unicode code point.

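           A                 E                 I                 O                 U                 Combine
Uppercase  Ą &#260; &#x104;  Ę &#280; &#x118;  Į &#302; &#x12E;  Ǫ &#490; &#x1EA;  Ų &#370; &#x172;  ◌̨ &#808; &#x328;
Lowercase  ą &#261; &#x105;  ę &#281; &#x119;  į &#303; &#x12F;  ǫ &#491; &#x1EB;  ų &#371; &#x173;  --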
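
Since the decimal and hexadecimal entity codes are both derived from the Unicode code point, they can be generated rather than looked up. Here is a small sketch in Python; the numeric_entities function is a hypothetical helper written for this example, while ord and html.unescape are standard.

```python
import html
from typing import Tuple


def numeric_entities(char: str) -> Tuple[str, str]:
    """Return the (decimal, hexadecimal) HTML numeric entities for one character."""
    code_point = ord(char)
    return f"&#{code_point};", f"&#x{code_point:X};"


if __name__ == "__main__":
    for ch in "ĄĘĮǪŲąęįǫų":
        decimal, hexadecimal = numeric_entities(ch)
        # html.unescape turns either entity back into the original character.
        assert html.unescape(decimal) == html.unescape(hexadecimal) == ch
        print(ch, decimal, hexadecimal)
```

For Ą this yields &#260; and &#x104;, matching the code point U+0104.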

Cedilla with Consonants

Note that the letter E with cedilla does exist. Therefore some vowels can have either a cedilla or an ogonek.

Character Name                       Character  Decimal Entity  Hex Entity
CEDILLA                              ¸          &#184;          &#xB8;
LATIN CAPITAL LETTER C WITH CEDILLA  Ç          &#199;          &#xC7;
LATIN SMALL LETTER C WITH CEDILLA    ç          &#231;          &#xE7;
LATIN CAPITAL LETTER D WITH CEDILLA  Ḑ          &#7696;         &#x1E10;
LATIN SMALL LETTER D WITH CEDILLA    ḑ          &#7697;         &#x1E11;
LATIN CAPITAL LETTER E WITH CEDILLA  Ȩ          &#552;          &#x228;
LATIN SMALL LETTER E WITH CEDILLA    ȩ          &#553;          &#x229;
LATIN CAPITAL LETTER G WITH CEDILLA  Ģ          &#290;          &#x122;
LATIN SMALL LETTER G WITH CEDILLA    ģ          &#291;          &#x123;
LATIN CAPITAL LETTER H WITH CEDILLA  Ḩ          &#7720;         &#x1E28;
LATIN SMALL LETTER H WITH CEDILLA    ḩ          &#7721;         &#x1E29;
LATIN CAPITAL LETTER K WITH CEDILLA  Ķ          &#310;          &#x136;
LATIN SMALL LETTER K WITH CEDILLA    ķ          &#311;          &#x137;
LATIN CAPITAL LETTER L WITH CEDILLA  Ļ          &#315;          &#x13B;
LATIN SMALL LETTER L WITH CEDILLA    ļ          &#316;          &#x13C;
LATIN CAPITAL LETTER N WITH CEDILLA  Ņ          &#325;          &#x145;
LATIN SMALL LETTER N WITH CEDILLA    ņ          &#326;          &#x146;
LATIN CAPITAL LETTER R WITH CEDILLA  Ŗ          &#342;          &#x156;
LATIN SMALL LETTER R WITH CEDILLA    ŗ          &#343;          &#x157;
LATIN CAPITAL LETTER S WITH CEDILLA  Ş          &#350;          &#x15E;
LATIN SMALL LETTER S WITH CEDILLA    ş          &#351;          &#x15F;
LATIN CAPITAL LETTER T WITH CEDILLA  Ţ          &#354;          &#x162;
LATIN SMALL LETTER T WITH CEDILLA    ţ          &#355;          &#x163;
COMBINING CEDILLA                    ◌̧          &#807;          &#x327;
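
A list like this can also be cross-checked against the Unicode character database. The sketch below, again using Python's unicodedata module, scans character names for Latin letters whose name ends in "WITH CEDILLA" or "WITH OGONEK" and reports which base letters take either accent. The latin_letters_with function is a name invented for this example, limiting the scan to the Basic Multilingual Plane is an assumption made to keep it quick, and the exact results depend on the Unicode version bundled with your Python.

```python
import unicodedata


def latin_letters_with(mark):
    """Map base letter -> characters named 'LATIN ... LETTER <base> WITH <mark>'."""
    suffix = f" WITH {mark}"
    marker = " LETTER "
    found = {}
    for code_point in range(0x10000):  # Basic Multilingual Plane only (an assumption)
        char = chr(code_point)
        name = unicodedata.name(char, "")  # unnamed code points yield ""
        if name.startswith("LATIN") and marker in name and name.endswith(suffix):
            base = name[name.index(marker) + len(marker):-len(suffix)]
            found.setdefault(base, []).append(char)
    return found


if __name__ == "__main__":
    cedilla = latin_letters_with("CEDILLA")
    ogonek = latin_letters_with("OGONEK")
    print("Cedilla bases:", sorted(cedilla))
    print("Ogonek bases: ", sorted(ogonek))
    print("Either accent:", sorted(set(cedilla) & set(ogonek)))
```

The letter E shows up in both groups, which is the overlap noted at the start of this section.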


Tutorial on RTL/LTR & BIDI in Arabic/Hebrew


The tutorial is technically titled Creating SVG Tiny Pages in Arabic, Hebrew and other Right-to-Left Scripts, but it actually provides an excellent explanation of how Unicode specifies text direction and how you need to encode both RTL (right to left) and LTR (left to right) runs as BIDI (bidirectional) text when a Middle Eastern text includes European words.
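
To get a feel for what "Unicode specifies text direction" means in practice, the sketch below looks up the bidirectional class that Unicode assigns to each character, using Python's standard unicodedata.bidirectional function. The function names bidi_classes and has_mixed_direction are made up for this example, and the sample string is just an illustration. This only inspects character properties; it does not implement the full bidirectional algorithm.

```python
import unicodedata

LTR_CLASSES = {"L"}        # strong left-to-right (e.g. Latin letters)
RTL_CLASSES = {"R", "AL"}  # strong right-to-left (Hebrew; Arabic letters)


def bidi_classes(text):
    """Return (character, bidirectional class) pairs for each character in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


def has_mixed_direction(text):
    """True if text contains both strong LTR and strong RTL characters."""
    classes = {unicodedata.bidirectional(ch) for ch in text}
    return bool(classes & LTR_CLASSES) and bool(classes & RTL_CLASSES)


if __name__ == "__main__":
    # "Unicode" followed by a Hebrew word and an Arabic word.
    sample = "Unicode \u05E9\u05DC\u05D5\u05DD \u0633\u0644\u0627\u0645"
    for ch, cls in bidi_classes(sample):
        print(f"U+{ord(ch):04X} {cls}")
    print("Mixed direction:", has_mixed_direction(sample))
```

Latin letters report class L, Hebrew letters R, Arabic letters AL, and spaces WS, which is the information a renderer uses to decide how runs of text are ordered.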


About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage (ejp10@psu.edu) for a profile.

Comments

The standard commenting utility has been disabled due to hungry spam. If you have a comment, please feel free to drop me a line at (ejp10@psu.edu).
