ELIZABETH J PYATT: November 2008 Archives

Quivira Unicode Font


I just discovered a large new TrueType Unicode font called Quivira, made by a German developer. It is loosely based on Garamond and covers many useful character ranges, such as Latin, Phonetics, Math, Greek, Coptic, Cyrillic, Cherokee, Currency, Box/Geometrics/Arrows, Old Italic, Gothic, Braille, Armenian, Hebrew and so forth.

The site is in German, but there's enough information for a user to get by using "Internet German", and as the author says "Quivira ist Freeware."

Download: http://www.grinningbit.com/quivira.php
List of Characters (PDF): http://www.grinningbit.com/files/Quivira.pdf


Glyph du Jour: Thermodynamic Q-dot

[Image: Q with dot above in multiple fonts]

This week, my Glyph du Jour is one that does NOT exist in Unicode: a capital Q with a dot above, representing "heat transfer per unit time" (the rate of heat transfer). Similar thermodynamic symbols are Ẇ (rate of work produced) and ṁ (rate of mass transfer)...and interestingly, these DO exist in Unicode.

Why W-dot and m-dot, but no Q-dot? It's because these particular symbols probably have a use somewhere beyond thermodynamics. For instance, ṁ was sometimes used in older Classical Irish spelling (today's mh). Therefore the community was able to lobby for the inclusion of this letter in Unicode in order to transcribe historic Classical Irish texts (lucky for my thermodynamics course).

The irony here is that within Unicode, Classical Irish actually has better resources than the engineering community (or the statistics community, which could use p-hat or p̂). I don't think it's an evil conspiracy, but rather a result of the fact that many engineers probably think of their notation quirks as a "font/layout" issue rather than as a "foreign language" issue.

The next step could be that someone proposes the inclusion of Q-dot or Q̇ (and its sibling q-dot or q̇, which is the rate of heat transfer per unit mass). This could raise the question of whether we can get by with just Q plus a "combining diacritic" dot, that is, manually combining a letter and a diacritic.

Based on what I've seen, I would say no. First, few everyday fonts support "combining accents" well. They work much better with precomposed characters that have the accent built in, partly because it is difficult to place a dot consistently over each letter unless the combination is designed ahead of time. I can fudge a q̇, but if I try Q̇, the dot often disappears into the taller capital Q. At best I'm stuck with Q ̇ (Q with upper-right dot).
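
One way to see the difference between a precomposed letter and a letter-plus-combining-dot sequence is to run Unicode normalization (NFC) on a letter followed by U+0307 COMBINING DOT ABOVE. The minimal Python sketch below uses only the standard unicodedata module; the helper names compose and describe are my own. Where a precomposed character exists (Ẇ, ṁ), NFC collapses the pair into that single code point; for Q and q there is nothing to collapse into, so the sequence stays two code points and the font has to position the dot on its own. Normalization only shows how the text is encoded; it does not fix how a given font renders the dot.

```python
import unicodedata

COMBINING_DOT_ABOVE = "\u0307"

def compose(base: str, mark: str = COMBINING_DOT_ABOVE) -> str:
    """Apply NFC to a base letter followed by a combining mark.

    If Unicode has a precomposed character for the pair, NFC returns that
    single code point; otherwise the two code points are left as they are.
    """
    return unicodedata.normalize("NFC", base + mark)

def describe(s: str) -> str:
    # List each code point as U+XXXX plus its official Unicode character name
    return " + ".join(f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s)

for letter in ["W", "m", "Q", "q"]:
    result = compose(letter)
    print(f"{letter} + dot -> {len(result)} code point(s): {describe(result)}")
```

For W and m this prints one code point (LATIN CAPITAL LETTER W WITH DOT ABOVE, LATIN SMALL LETTER M WITH DOT ABOVE); for Q and q it prints the base letter plus COMBINING DOT ABOVE.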

Another lesson: someone WILL always find some new combination of the Latin alphabet to mess around with.

Dotted Letters & Combining Diacritic Test

The table below tests the combining dot for the Q-dots alongside the existing precomposed dotted letters. As you can see, only a few dotted letters are missing.
Note: The Q-dots are best viewed with Arial Unicode MS, Gentium or another specialized Unicode font.

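| Character Name | Character | Hex Entity Code | Decimal Entity Code |
|---|---|---|---|
| Lower Q dot (Fudged) | q̇ | q&#x0307; | q&#775; |
| Capital Q dot (Fudged) | Q̇ | Q&#x0307; | Q&#775; |
| Lower A with dot above | ȧ | &#x0227; | &#551; |
| Capital A with dot above | Ȧ | &#x0226; | &#550; |
| Lower B with dot above | ḃ | &#x1E03; | &#7683; |
| Capital B with dot above | Ḃ | &#x1E02; | &#7682; |
| Lower C with dot above | ċ | &#x010B; | &#267; |
| Capital C with dot above | Ċ | &#x010A; | &#266; |
| Lower D with dot above | ḋ | &#x1E0B; | &#7691; |
| Capital D with dot above | Ḋ | &#x1E0A; | &#7690; |
| Lower E with dot above | ė | &#x0117; | &#279; |
| Capital E with dot above | Ė | &#x0116; | &#278; |
| Lower F with dot above | ḟ | &#x1E1F; | &#7711; |
| Capital F with dot above | Ḟ | &#x1E1E; | &#7710; |
| Lower G with dot above | ġ | &#x0121; | &#289; |
| Capital G with dot above | Ġ | &#x0120; | &#288; |
| Lower H with dot above | ḣ | &#x1E23; | &#7715; |
| Capital H with dot above | Ḣ | &#x1E22; | &#7714; |
| Capital I with dot above | İ | &#x0130; | &#304; |
| Lower M with dot above | ṁ | &#x1E41; | &#7745; |
| Capital M with dot above | Ṁ | &#x1E40; | &#7744; |
| Lower N with dot above | ṅ | &#x1E45; | &#7749; |
| Capital N with dot above | Ṅ | &#x1E44; | &#7748; |
| Lower O with dot above | ȯ | &#x022F; | &#559; |
| Capital O with dot above | Ȯ | &#x022E; | &#558; |
| Lower P with dot above | ṗ | &#x1E57; | &#7767; |
| Capital P with dot above | Ṗ | &#x1E56; | &#7766; |
| Lower R with dot above | ṙ | &#x1E59; | &#7769; |
| Capital R with dot above | Ṙ | &#x1E58; | &#7768; |
| Lower S with dot above | ṡ | &#x1E61; | &#7777; |
| Capital S with dot above | Ṡ | &#x1E60; | &#7776; |
| Lower T with dot above | ṫ | &#x1E6B; | &#7787; |
| Capital T with dot above | Ṫ | &#x1E6A; | &#7786; |
| Lower W with dot above | ẇ | &#x1E87; | &#7815; |
| Capital W with dot above | Ẇ | &#x1E86; | &#7814; |
| Lower X with dot above | ẋ | &#x1E8B; | &#7819; |
| Capital X with dot above | Ẋ | &#x1E8A; | &#7818; |
| Lower Y with dot above | ẏ | &#x1E8F; | &#7823; |
| Capital Y with dot above | Ẏ | &#x1E8E; | &#7822; |
| Lower Z with dot above | ż | &#x017C; | &#380; |
| Capital Z with dot above | Ż | &#x017B; | &#379; |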
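
The entity codes in the table are HTML numeric character references built from each character's Unicode code point: "&#x" plus the hexadecimal value, or "&#" plus the decimal value, followed by a semicolon. The short Python sketch below (the helper name entity_codes is my own) generates both columns from a character. It encodes every code point, so the fudged Q rows come out as &#x0071;&#x0307; rather than the literal q used in the table; a browser treats the two the same way.

```python
import html

def entity_codes(text: str) -> tuple[str, str]:
    """Return (hex, decimal) HTML numeric character references for every code point in text."""
    hex_ref = "".join(f"&#x{ord(c):04X};" for c in text)
    dec_ref = "".join(f"&#{ord(c)};" for c in text)
    return hex_ref, dec_ref

samples = {
    "Lower Q dot (Fudged)": "q\u0307",
    "Capital M with dot above": "\u1E40",
    "Capital Z with dot above": "\u017B",
}
for name, chars in samples.items():
    hex_ref, dec_ref = entity_codes(chars)
    # html.unescape turns a reference back into the character, a quick check on hand-typed codes
    print(f"{name}: {chars}  {hex_ref}  {dec_ref}  round trip ok: {html.unescape(hex_ref) == chars}")
```

Running the same check over the whole table is a cheap way to catch a typo such as a missing "&" or a wrong digit in a hand-typed code.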


Language Tag "mo" for Moldovan Deprecated


As of November 3, 2008, both the ISO 639-1 language code mo (Moldovan) and the ISO 639-2 code mol (Moldovan) were deprecated in favor of Romanian.

In other words, the encoding standards authorities have embodied the notion that Moldovan, as spoken in the Republic of Moldova, is so closely related to Romanian that the two are really dialects of a single language. This has long been the position of the linguistic community and of many people in both the Romanian and Moldovan communities.

From now on, the code ro (Romanian) will refer to the language forms used in both Romania and Moldova. The tags to distinguish the linguistic forms of Romania from those of Moldova will be ro-RO (Romanian of Romania) and ro-MD (Romanian of Moldova).
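
Software that already stored Moldovan content as mo or mo-MD will need to map those tags to ro. The minimal Python sketch below does that for the primary language subtag only. The DEPRECATED_LANGUAGE table and the function name are hypothetical; the table holds just the mo to ro mapping described above, and a real tool would read deprecated subtags and their preferred values from the IANA registry instead.

```python
# Hypothetical excerpt of deprecated primary language subtags and their
# replacements; a real tool should load these from the IANA registry.
DEPRECATED_LANGUAGE = {"mo": "ro"}

def replace_deprecated_language(tag: str) -> str:
    """Swap a deprecated primary language subtag for its replacement,
    keeping any region or other subtags that follow unchanged."""
    language, *rest = tag.split("-")
    preferred = DEPRECATED_LANGUAGE.get(language.lower(), language)
    return "-".join([preferred, *rest])

print(replace_deprecated_language("mo"))     # ro
print(replace_deprecated_language("mo-MD"))  # ro-MD
print(replace_deprecated_language("ro-RO"))  # ro-RO
```

Keeping the region subtag is what preserves the Romania/Moldova distinction after the language code itself is merged.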

This may seem to be a trivial change, but it's heartening from my point of view. In recent years, there had been a trend in language code assignments to favor political expedience over linguistic reality.

The most similar case was the elimination of the code sh for Serbo-Croatian, as spoken in the former Yugoslavia, in favor of three "separate" language codes for Serbian (sr), Croatian (hr) and Bosnian (bs). Although there are genuine regional differences between the forms (especially for Croatian), linguists still debate whether these forms are separate languages or dialects.

Although I do not expect the three codes for Serbian, Croatian and Bosnian to be eliminated anytime soon, I do think it's a good sign that speakers in Moldova and Romania were willing to re-evaluate their linguistic identity.


Some Recent Language Tagging News (incl Pinyin/Wade-Giles)


Codes for language varieties are constantly being updated, but here is a list of some important changes that have happened in recent months.

The most up-to-date list is available at:
http://www.iana.org/assignments/language-subtag-registry
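
The registry at that address is a plain-text file in "record-jar" layout: records are separated by lines containing "%%", each field is written as "Field-Name: value", a field such as Description may repeat, and a line that begins with whitespace continues the previous field. The Python sketch below is a minimal parser under those assumptions; parse_registry is a hypothetical name, and SAMPLE is a made-up two-record excerpt in the registry's layout, not text copied from the live file.

```python
def parse_registry(text: str) -> list[dict[str, list[str]]]:
    """Split registry text into records mapping field name -> list of values."""
    records: list[dict[str, list[str]]] = []
    current: dict[str, list[str]] = {}
    last_field = None
    for line in text.splitlines():
        if line.strip() == "%%":
            # A '%%' line closes the current record
            if current:
                records.append(current)
            current, last_field = {}, None
        elif line[:1].isspace() and last_field is not None:
            # Continuation line: append to the previous value of the same field
            current[last_field][-1] += " " + line.strip()
        elif ":" in line:
            field, value = line.split(":", 1)
            last_field = field.strip()
            current.setdefault(last_field, []).append(value.strip())
    if current:
        records.append(current)
    return records

# Made-up excerpt for illustration only
SAMPLE = """\
%%
Type: language
Subtag: mo
Description: Moldovan
Preferred-Value: ro
%%
Type: variant
Subtag: 1959acad
Description: Academic variant of Belarusian
  as codified in 1959
Prefix: be
"""

for record in parse_registry(SAMPLE):
    print(record["Subtag"][0], record.get("Preferred-Value"))
```

Values are kept as lists because fields such as Description and Prefix can appear more than once in a record. The real file also begins with a File-Date line, which this sketch simply returns as a record of its own.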

Chinese Romanizations

  • zh-Latn-pinyin for Pinyin Latin romanization (Mandarin)
  • zh-Latn-wadegile for Wade-Giles romanization (Mandarin)

Note that here the assumption is that zh is Mandarin Chinese. From the discussion, it appears that more precise codes for Mandarin could not be used because they had not yet been fully approved (sigh). If you are working with a "dialect", you may need to include an appropriate dialect/language extension.
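
Tags like zh-Latn-pinyin are built from subtags in a fixed order: a language (zh), an optional four-letter script (Latn), an optional region (such as MD), then any variants (pinyin, wadegile). The Python sketch below classifies the subtags of such simple tags with one regular expression. It is a simplified grammar, not a full RFC 5646 validator: it ignores extended language subtags, extensions and private-use subtags, and split_tag is a hypothetical name.

```python
import re

# language, optional script, optional region, then zero or more variants
TAG = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)$",
    re.IGNORECASE,
)

def split_tag(tag: str) -> dict:
    """Classify the subtags of a simple language tag, normalizing their case."""
    m = TAG.match(tag)
    if m is None:
        raise ValueError(f"not a simple language tag: {tag!r}")
    return {
        "language": m["language"].lower(),
        "script": m["script"].title() if m["script"] else None,
        "region": m["region"].upper() if m["region"] else None,
        "variants": [v.lower() for v in m["variants"].split("-") if v],
    }

for tag in ["zh-Latn-pinyin", "zh-Latn-wadegile", "ro-MD", "be-1959acad"]:
    print(tag, split_tag(tag))
```

A variant is either five to eight letters or digits, or four characters starting with a digit, which is why 1959acad qualifies as a variant rather than being mistaken for a region.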

Cornish Spelling

It's hard to believe that a language just being revived already has multiple competing spelling systems, but that's how it goes sometimes. The codes are:

  • kw for Cornish
  • kw-kkcor for Cornish, Common Cornish orthography
  • kw-uccor for Cornish, Unified Cornish orthography
  • kw-ucrcor for Cornish, Unified Cornish Revised orthography
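
Because all three orthography tags start with kw, a request for plain Cornish should still find text tagged with any of the spellings. The Python sketch below implements the prefix-matching scheme that RFC 4647 calls "basic filtering": a tag matches a language range if it equals the range or begins with the range followed by a hyphen, compared case-insensitively, and the range "*" matches everything. The function name and the tag list are illustrative.

```python
def basic_filter(language_range: str, tags: list[str]) -> list[str]:
    """Return the tags matched by a language range using basic filtering."""
    r = language_range.lower()
    if r == "*":
        return list(tags)
    # The trailing hyphen keeps 'kw-uccor' from matching 'kw-ucrcor'
    return [t for t in tags if t.lower() == r or t.lower().startswith(r + "-")]

TAGS = ["kw", "kw-kkcor", "kw-uccor", "kw-ucrcor", "ca-valencia"]
print(basic_filter("kw", TAGS))        # ['kw', 'kw-kkcor', 'kw-uccor', 'kw-ucrcor']
print(basic_filter("kw-uccor", TAGS))  # ['kw-uccor']
```

Matching on whole subtags rather than raw string prefixes is the important design choice here: a naive startswith check would let a range like kw-uc pick up subtags it was never meant to cover.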

Valencian

Valencian (Spain) is considered a regional dialect of Catalan, so its code is ca-valencia.

Belarusian, 1959 spelling

The code be-1959acad is for the "Academic" ("governmental") variant of Belarusian as codified in 1959.


About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage (ejp10@psu.edu) for a profile.

Comments

The standard commenting utility has been disabled due to hungry spam. If you have a comment, please feel free to drop me a line at (ejp10@psu.edu).
