Recently in News Category

Unicode 6.1 Additions

The Unicode standard was just updated to version 6.1, and that means new blocks and characters.

New Blocks

Blocks added include Miao (a script developed for Hmong/Miao languages), Meroitic Hieroglyphs & Meroitic Cursive (an adaptation of Egyptian hieroglyphs from ancient Meroë, in what is now northern Sudan) and multiple scripts from India (Sora Sompeng, Chakma, Sharada, Takri).

Two new blocks for the Arabic script were also added: Arabic Mathematical Alphabetic Symbols and Arabic Extended-A. Extensions for the Sundanese and Meetei Mayek scripts were added as well.

New Characters

The Unicode Consortium has an index of which new characters have been added to different scripts.

Unicode 6.0 Released

The revised Unicode Standard, version 6.0.0, has been officially released by the Unicode Consortium. In addition to changes in the specification, new characters have been added, including a block of emoji (emoticon) symbols, the new rupee sign of India, and new blocks for Mandaic (Iran), Batak (Indonesia) and Brahmi (ancestral form of most scripts of India). Additional characters have also been added for alchemical symbols and playing cards, and there have been additions for CJK ideographs, Ethiopic, Tifinagh and Bamum (Cameroon) scripts.

Hopefully mainstream fonts and operating systems will catch up with these additions soon.

STIX Math Font Formally Released

One of the more pleasant surprises in computer technology is when a long-standing project under development comes to fruition as a usable product. Such is the case with the STIX Fonts, a set of OTF fonts released under the SIL Open Font License, which is an open source license.

The focus of the STIX fonts is math and technical symbols, and there are plenty of those in the font set (at the appropriate Unicode points, of course). However, the fonts also include a variety of Latin, Greek and Cyrillic characters and even some phonetic characters (which is nice for linguistic publications referencing logic and set notation). The typographic design is based on Times New Roman, a common font used in many technical publications. STIX is also designed to be used in MathML.

There are other freeware math fonts, but the STIX family is notable for 1) including a variety of predesigned fonts for bold, italic and multiple font sizes and 2) being sponsored by a consortium of scientific societies and publishers including the American Institute of Physics, American Chemical Society, American Mathematical Society, IEEE (Institute of Electrical and Electronics Engineers), American Physical Society and Elsevier.

Version 1.0 includes basic support, but future versions are scheduled to include additional support for Microsoft Word and LaTeX.

Farewell Chryʃanþi (Chrysanthi) Font?

I'm getting help cleaning up my links, and I was sad to discover that the link to the Chrysanthi font (formerly at http://everywitchway.net/linguistics/fonts/chrysuni.html) was no longer working. This was a nice multilingual font which included lots of Latin, Greek, Cyrillic and phonetic characters along with other scripts like Armenian, Runes, math symbols, spiritual symbols and others. I never knew much about the creator, but I appreciated that it was free and fairly elegant.

I wasn't alone in my appreciation, since links have appeared on Greek Unicode sites, Armenian Unicode sites, the Gallery of Unicode Fonts, Wikipedia and elsewhere. However, no one has an updated link (I'm not counting mirroring sites).

The site http://everywitchway.net/ now features what appears to be a Gothic ABC book (very cute), so the site is not entirely dead, but the font page was last cached by the Wayback Machine back in July 2008. As I said, I do need to update some links, but then again, so do a lot of us. In the meantime, I did want to say a belated thanks to the creator. I enjoyed the range of symbols available at a time when few fonts were providing them (some symbols are still hard to find in fonts).

If anyone has news of this, you can always contact me at ejp10@psu.edu

Almost Half the Web in Unicode

An entry from the Google Blog has a graph showing that Unicode is rapidly gaining dominance as the de facto encoding standard on the Web.

The good news is that Unicode is now the number 1 encoding standard used, but it's not quite 50% yet (more like around 48%). Only 6 years ago (ca. 2004), the percentage was about 5-10%, so acceptance of Unicode and ability to implement it has increased geometrically.

As of 2010, though, about 40% of the Web was still in either Latin-1/Win-1252 (ca. 19%) or ... US-ASCII (ca. 20%). Fortunately, that percentage is also dropping geometrically (from 55%+ ASCII in 2001). Some of us may be lagging behind, but it looks like we're all going to catch up sooner or later.

Ancient Egyptian & Other Additions in Unicode 5.2

The latest Unicode Standard, Version 5.2, was released at the beginning of October 2009. A lot is added in each version, but I confess that the most noteworthy addition for me was the Egyptian Hieroglyphs block (U+13000 to U+1342E). It was certainly the largest block added, at 1071 code points.
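As a quick sanity check on that count, here is a minimal Python sketch (the variable names are my own, purely for illustration) that derives the size of the range from its two endpoints and looks up the name of the first character in Python's built-in Unicode database.

```python
import unicodedata

# Endpoints of the range quoted above (inclusive).
FIRST = 0x13000
LAST = 0x1342E

# An inclusive range holds (last - first + 1) code points.
block_size = LAST - FIRST + 1
print(block_size)  # 1071

# Name of the first code point, as recorded in Python's Unicode database.
print(unicodedata.name(chr(FIRST)))  # EGYPTIAN HIEROGLYPH A001
```

The "+ 1" is the easy part to forget: both endpoints belong to the range.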

Additional code points included blocks for Avestan, Old South Arabian, Samaritan, Imperial Aramaic, Inscriptional Parthian and Old Turkic. In addition, supporting characters were added for the Coptic, Devanagari (esp. Vedic support), Hangul (Old Korean), Phoenician and other ancient script blocks.

In South and Southeast Asia, support was added for Javanese, Tai Tham, Lisu, Kaithi, Meetei Mayek, Myanmar (new points), New Tai Lue (new points) and others. In other regions, a new Canadian Aboriginal Syllabics Extended block was created with 80 additional code points. Some African scripts were also encoded, including the Bamum script and Rumi numerals. Additions were also made to various math and symbol blocks.

For a complete list of changes, see the information on the DerivedAge.txt file (scroll to end) and Revised Unicode 5.2 charts. In terms of support, there may be freeware (or commercial) fonts available, but time will be needed to develop the input utilities and then for these glyphs to be incorporated into major operating systems.
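If you would rather count the additions than scroll, the file is easy to process. The sketch below is only an illustration under stated assumptions: the function name `count_by_version` is hypothetical, and `SAMPLE` is a couple of hand-written lines in the file's `range ; version # comment` layout, not a copy of the real file, so check any result against the actual DerivedAge.txt.

```python
# Hand-written sample in the layout of DerivedAge.txt (an assumption for
# illustration; the real file is much longer).
SAMPLE = """\
# comment lines start with a hash mark
13000..1342E  ; 5.2 # EGYPTIAN HIEROGLYPH A001..EGYPTIAN HIEROGLYPH AA032
20B9          ; 6.0 # INDIAN RUPEE SIGN
"""

def count_by_version(text):
    """Return {version: number of code points first assigned in it}."""
    counts = {}
    for line in text.splitlines():
        # Drop the trailing comment, then skip blank/comment-only lines.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        code_range, version = (part.strip() for part in data.split(";"))
        # A field is either a single code point or "first..last".
        start, _, end = code_range.partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        counts[version] = counts.get(version, 0) + (last - first + 1)
    return counts

print(count_by_version(SAMPLE))  # {'5.2': 1071, '6.0': 1}
```

To run it on the real data, read the downloaded file into a string and pass that in place of `SAMPLE`.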

Until then...there's always Unicode 6.0.

Chinese Olympic Pictograms

One of the more interesting "color" pieces on the U.S. Olympics coverage on NBC was a piece on how the icons for the different sports were inspired by early Chinese pictograms, which were the precursors of the modern Chinese characters.

You can read a bit more about the design process in this article from the People's Daily Online.

The use of ancient art for modern Olympic pictograms is not new (see the entries from Athens, Sydney, Lillehammer and Salt Lake City), but I think this was the first time it made it to television.

List of Old Church Slavonic Fonts

AATSEEL (American Association of Teachers of Slavic and East European Languages) has just posted a set of links to "Medieval Slavic Fonts" for Old Church Slavonic, Glagolitic and Blackletter.

See http://www.aatseel.org/medieval_slavic_font for more information.

The list includes both Unicode fonts and older non-Unicode fonts.

Unicode 31: Lessons from the "Front Line"

I had a hard time deciding which sessions to attend at the last Unicode conference, but I did end up at "Unicode at the Front Lines", which was a series of mini-presentations from scholars working with lesser-known languages and scripts. This is a place where the Unicode rubber really hits the road, and I learned some interesting "life-lessons".

1. The problem with "reforming" a script is that new readers may not be able to read the older texts. This was in context of the Tai Viet script (apparently the reform was so unpopular, they ditched it), but occurs in Chinese (Traditional vs. Simplified), Korean (new texts use only Hangul, but older ones included Chinese) and even in cases where spelling reform is enacted (as in the Netherlands and Germany).

BTW - I'm not against spelling/script reform, but we do have to admit that there will be some "loss" (enough to keep a few scholars in archaic languages in business).

2. Try not to invent a new letter for new languages. In the earlier part of the 20th century, linguists were fond of inventing quirky new symbols for languages they were documenting. A classic case is Igbo, which has lots of vowels with dots beneath them, as in Ị, ị, Ọ, ọ, Ụ, ụ. There is no objection to the dots per se, but they are unusual compared to what Western alphabets do. Because these characters are outside the norm, Igbo internationalization has to play continual catch-up: even programs which can handle Western European languages may not know what to do with the dots.

If your lesser-known language already includes letters that are common to the major languages, implementation of utilities in your language is much easier. Of course, I think Unicode is better for including dotted letters.

For now though...if you have a choice between "v" or "vh" in your language, the latter is (unfortunately) a little more Unicode ready.

3. H ≠ Η ≠ Н - For the record the first is English H /h/, the second is Greek capital Eta /ē/ and the last is Cyrillic En /n/. I knew that many capital letters are triple encoded (e.g. A/alpha/Cyrillic Ah), but this is the first time I realized that the phonetic values can be so different. Normally this isn't an issue unless you have linguists from all over Europe trying to use their native script for phonetic spellings. When do you have the right H?
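One way to find out which H you actually have is to ask for the character's Unicode name. Here is a minimal Python sketch using the standard `unicodedata` module; the helper name `describe` is my own invention for illustration.

```python
import unicodedata

def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# Three capitals that look alike but are encoded separately.
for ch in ["H", "\u0397", "\u041D"]:
    print(describe(ch))
# U+0048 LATIN CAPITAL LETTER H
# U+0397 GREEK CAPITAL LETTER ETA
# U+041D CYRILLIC CAPITAL LETTER EN
```

The glyphs may be identical on screen, but the names (and code points) give the script away.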

4. ŵ ≠ ŵ (it matters when you type the accent). Unicode supports "combining accents" (that is, an accent which can float over any letter), and in theory the combination of accent + w (to make ŵ) should be the same as w + accent (to make ŵ) ...but it's not. A linguistic archive database has these accented letters but can't "merge" the two string combinations as one letter.

Again, this wouldn't be too critical except that sometimes a linguist puts the accent before the w, and sometimes they put the w before the accent. Again these are the same world-wide linguists who gave us the problem of the three H's.

A member in the audience did suggest that it was a "training issue", but who are we kidding...these are FACULTY. Faculty are great scholars, but few are well-trained data entry operators.
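For the ŵ problem in point 4, part of the mismatch can be handled in software rather than by training. The sketch below, which assumes Python's standard `unicodedata` module and uses a helper name (`same_letter`) of my own, shows that Unicode normalization treats the single-code-point ŵ and the sequence w + combining circumflex as the same letter. It also shows the limit: an accent typed before its letter is a different sequence, and normalization does not repair it.

```python
import unicodedata

precomposed = "\u0175"     # ŵ stored as one code point
decomposed = "w\u0302"     # w followed by COMBINING CIRCUMFLEX ACCENT
accent_first = "\u0302w"   # accent typed before the letter

def same_letter(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(precomposed == decomposed)              # False: raw strings differ
print(same_letter(precomposed, decomposed))   # True: NFC merges them
print(same_letter(accent_first, decomposed))  # False: wrong order stays wrong
```

So a database that normalizes on input could merge the first two forms, but entries with the accent in front of the letter would still need to be cleaned up by hand or by a custom script.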

Is the IUC Conference Worth it? Absolutely!

My university was kind enough to send me to IUC 31 (http://www.unicodeconference.org) this year, and I can honestly say that it was one of the best conferences I've been to.

For one thing, almost all the major players (Microsoft, Apple, Sun, IBM, Adobe, Google, Yahoo, W3C) sent representatives, so I got to hear a lot of great information straight from the source. I've been hacking away at this for seven years, but I learned quite a bit of new information, especially about some of the more technical aspects.

The Unicode conference is also very good at providing a range of how-tos, from absolute monolingual beginner to cutting-edge tools for the experienced Unicoder. Even the basics gave me some pointers that I had forgotten or hadn't considered. I obviously couldn't make all the sessions (no cloning yet), but the PDFs that attendees can access are fairly detailed and can help you catch up on what you missed.

I have to confess that my favorite track was probably "Unicode on the Front Lines", in which linguists described encoding issues for minority languages and scripts. From a language geek perspective, it's fascinating what new issues come up. More importantly, I saw that there was a lot of support for outreach in the Unicode community. I heard members of the Unicode Org point some users to resources they hadn't known about before.

I myself gave a presentation about Unicode at Penn State, and I have to say most of the feedback was very positive, and I got a few tips myself.

So all in all, I have to say thanks to the organizers of the conference for putting on a great event.

About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage (ejp10@psu.edu) for a profile.

Comments

The standard commenting utility has been disabled due to hungry spam. If you have a comment, please feel free to drop me a line at (ejp10@psu.edu).
