New Home Page

Looking for updated entries? Go to our new blog site at:

http://sites.psu.edu/gotunicode

A Unicode TED Talk

Johannes Bergerhausen recently gave a TED Talk in Vienna on Unicode. It's a good summary of key issues for those who don't know about bits or bytes.

Looking forward to the day when you can "send text messages in Cuneiform".

Russian Ruble Symbol Coming to Unicode

A new Russian ruble sign has just been approved by the Central Bank of Russia. The sign is a traditional Cyrillic R (Р) with a crossed line below. You can also see other recent design candidates if interested.

The next discussion, of course, will be where in Unicode this will appear. Some have proposed U+0554, but that is the Armenian letter keh (Ք). Although the appearance is similar, there is already a discussion online about whether it is a good idea to use the Armenian letter for /k/ as a currency symbol.

Based on previous patterns, I predict that a new code point will be assigned, perhaps in the Currency Symbols block (U+20BB?) or possibly the Cyrillic block. If it's in the Currency Symbols block, it would join the recently added Indian Rupee sign (U+20B9/₹) and Turkish Lira sign (U+20BA/₺).
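For anyone who wants to check what a given code point currently holds, here is a minimal Python sketch using the standard unicodedata module. The helper name describe is made up for illustration, and the names printed depend on the Unicode version bundled with your Python. The predicted U+20BB is left out because it is only a guess in this post.

```python
import unicodedata

# Code points mentioned in the post (U+20BB is only a prediction, so it is skipped).
CODE_POINTS = [0x0554, 0x20B9, 0x20BA]


def describe(cp: int) -> str:
    """Return 'U+XXXX <char> <official name>' for one code point."""
    ch = chr(cp)
    # Fall back to a placeholder if this Python's Unicode data has no name for it.
    name = unicodedata.name(ch, "<no name in this Unicode version>")
    return f"U+{cp:04X} {ch} {name}"


for cp in CODE_POINTS:
    print(describe(cp))
```

Running this shows that U+0554 is already taken by an Armenian capital letter, which is the heart of the objection above.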

From a sociolinguistic perspective, I am finding the creation of new currency symbols interesting, especially for currencies which have existed as long as the ruble and the rupee. To me this is a clear extension of the idea that a language doesn't socially exist as a "real language" unless it has its own spelling/writing system.

Apparently it's now equally important for governments to establish a unique currency sign to be counted as a "major" currency.

Postscript: 16 Dec 2013

The debate about whether the Russian Ruble sign should be in the Armenian block has reached the Armenpress news wire. Many are recommending against it.

Embedded Fonts htaccess Update

htaccess Fix

Since Penn State just rolled out its WordPress service, I thought I'd experiment with font embedding using SIL webfont kits I uploaded into one of my directories on another server. The good news is that it works, but I had to adjust my .htaccess file to allow Firefox to process the files.

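The code that worked for me was:

# Let pages on other domains load these font files (cross-origin requests).
# The header is only added when mod_headers is available.
<FilesMatch "\.(ttf|otf|eot|woff)$">
  <IfModule mod_headers.c>
    Header set Access-Control-Allow-Origin "*"
  </IfModule>
</FilesMatch>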

Thanks to Stack Overflow and the Sites Team for their help.
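If you want to confirm that the server is really sending the header before blaming the browser, a small Python check along these lines may help. This is only a sketch: the function names and the example URLs in the comments are hypothetical, and it assumes the server answers HEAD requests.

```python
from typing import Optional
from urllib.request import Request, urlopen


def fetch_cors_header(font_url: str, origin: str) -> Optional[str]:
    """Ask the server for a font file and return its Access-Control-Allow-Origin value."""
    request = Request(font_url, method="HEAD", headers={"Origin": origin})
    with urlopen(request, timeout=10) as response:
        return response.headers.get("Access-Control-Allow-Origin")


def allows_origin(header_value: Optional[str], origin: str) -> bool:
    """True if the header value permits the given origin ('*' permits everyone)."""
    if header_value is None:
        return False
    return header_value.strip() in ("*", origin)


# Example use (hypothetical URLs, replace with your own):
#   value = fetch_cors_header("https://fonts.example.org/kit/font.woff",
#                             "https://blog.example.org")
#   print(allows_origin(value, "https://blog.example.org"))
```

If the check passes but one browser still refuses the font, the problem is probably somewhere other than the .htaccess file.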

Correction - Not on Firefox for Mac

I thought it was working on Firefox for Mac, but something broke. Oy. It is OK on Firefox for PC.

Google Fonts not Just English

I was also happy to find that the Google Fonts options have expanded to Greek, Cyrillic and extended Latin. Just use the Script filter to find the appropriate fonts. These are good choices if you don't want to mess with your .htaccess file.

Explaining UTF-8

The UTF-8 encoding does not store Unicode code points directly. It is rather a "compromise character encoding": files with just ASCII characters stay the same size as in ASCII (one byte per character), yet a file can still include any Unicode code point, with larger code points taking two to four bytes each.

If this is sounding a bit confusing, you may want to try this Game Dev article on UTF-8. It's still under review, but it does step through some parts of the conversion from a Unicode code point to a UTF-8 representation.
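To make the conversion concrete, here is a hand-rolled Python sketch of the code point to UTF-8 step. In real code you would simply call str.encode("utf-8"); the function utf8_encode below is written out only for illustration, and the sample characters are my own picks.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 bytes, by hand."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if cp < 0x80:
        # ASCII range: one byte, identical to plain ASCII.
        return bytes([cp])
    if cp < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# One sample per byte length: Latin A, e-acute, rupee sign, a Cuneiform sign.
for ch in ["A", "\u00E9", "\u20B9", "\U00012000"]:
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")  # agrees with Python's built-in encoder
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')}")
```

Notice that the plain "A" still costs a single byte, while the Cuneiform sign needs four.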

Text Expander & Breevy: Great Unicode Input Tools

A Unicode issue many people have is how to enter stray symbols not found within the range of normal entry utilities. For instance, as a linguist, I may be entering phonetic symbols, logic symbols or random characters from a variety of languages. These go way beyond the range of "everyday" accented letters.

Lately though, I've been introduced to a class of text expander tools such as TextExpander (Mac) and Breevy (Windows) that have truly been lifesavers.

What both these tools do is allow you to create abbreviation codes for symbols, words, phrases and even entire paragraphs. In my case, I created abbreviations for many phonetic symbols (e.g. "\e" = ə (schwa) and "\n" = ŋ (engma)). You can also create codes for math symbols (";all" = ∀ (upside down A)), emoticons and icons (";hrt" = ♥ (heart)) and even common words or phrases that you use a lot but don't want to type out (";dvrk" = Dvořák or ";rdetr" = raison d'être). Actually, I mostly use this tool for full phrases I use a lot in e-mail (e.g. ";lmk" = "Let me know what you think."). This is truly a multipurpose linguistic tool!

I've tried lots of input tools in the past, and text expanders have some very nice advantages. One is that the codes work everywhere from e-mail and Facebook to Microsoft Word and Illustrator. Also, since you define the abbreviation, you will be more likely to remember it. Typing numeric codes only goes so far. Finally, I don't have to switch keyboards or open a program just to input one word or phrase. If you are truly working with two languages, then switching is practical, but for sporadic symbols and words/phrases, it's a pain.

There is one drawback: you have to design your codes so that you won't type them by accident elsewhere. You'd be surprised how a code like "urpr" might misfire when you type "surprised."
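The misfire problem is easy to see in a toy model. The Python sketch below is not how TextExpander or Breevy actually work (they watch your keystrokes as you type); it just replaces abbreviations in a finished string. The function name expand and the replacement text for "urpr" are made up for the demonstration.

```python
import re

# Abbreviations taken from the examples above.
SNIPPETS = {
    ";lmk": "Let me know what you think.",
    ";hrt": "\u2665",
    ";dvrk": "Dvo\u0159\u00E1k",
}


def expand(text: str, table: dict) -> str:
    """Replace every abbreviation found in text with its expansion (single pass)."""
    if not table:
        return text
    # Try longer abbreviations first so a short code cannot pre-empt a longer one.
    codes = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(code) for code in codes))
    return pattern.sub(lambda match: table[match.group(0)], text)


print(expand("Draft attached. ;lmk", SNIPPETS))

# A code with no unusual prefix fires inside an ordinary word:
print(expand("I was surprised.", {"urpr": "[EXPANSION]"}))

# The same code with a leading semicolon stays quiet:
print(expand("I was surprised.", {";urpr": "[EXPANSION]"}))
```

This is why starting every code with a character you rarely type before a letter, such as ";" or "\", is a sensible habit.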

Windows 8 & Windows 8.1 Ancient Script and Asian Fonts

Scholars interested in ancient scripts such as Glagolitic, Gothic and Old Hangul may be interested in the new fonts packaged with Windows 8, in particular the updated Segoe UI Symbol font.

Or you could wait for Windows 8.1, when support for Coptic and various scripts of South and Southeast Asia will be added.

Icon Fonts + Unicode and Accessibility

Remember the old Symbol font in which you could type S and get the Greek sigma (Σ) symbol... without activating any Greek keyboard? Or the Wingdings font, which produced all manner of cool symbols? Unicode gurus have been trying to get rid of these fonts for a while now, since different symbols could appear depending on which fonts a user had installed.

For instance, the following passage will either be a series of astrological symbols or the letters a-e depending on whether you have the Wingdings font installed or not (and whether your browser will let you get away with displaying Wingdings).

a b c d e

Web Fonts to the Rescue....Sort Of

This situation forced Web developers to retire the use of Symbol and Wingdings for the Web... until the advent of the embedded Web font, which allows a Web site to send a font file to any user. Now everyone will have your font.

This has led to the rise of new "icon fonts" which can be embedded on a Website. Some of my favorites are

In theory, you can get rid of a lot of icon images and use more lightweight text, and since the font will download to everyone, you no longer have to worry about display issues or even encoding. You can even use modern CSS to perform all sorts of formatting tricks. Great news .... right?

Encoding Side Note

Generally speaking, icon fonts are taking two approaches. One is the 1990s "Unicode...what Unicode?" approach. If you download and use the StateFace font, typing w will output the outline of West Virginia.

The second approach is to try to conform to Unicode. Glyphs that do have a code (e.g. a smiling face) are put in that slot and others are put into the Private Use Area. Obviously, this approach has many advantages over the original approach in that some text integrity can be maintained.
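If you are curious which approach a given icon font takes, you can inspect the characters in your page text. The Python sketch below flags characters in the Private Use Area, which have Unicode general category "Co". The function names and the sample string are hypothetical, and U+E001 is just an arbitrary Private Use code point, not one taken from any particular font.

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    """True if the character sits in a Private Use Area (general category 'Co')."""
    return unicodedata.category(ch) == "Co"


def private_use_chars(text: str) -> list:
    """List the code points in text that have no standard Unicode meaning."""
    return [f"U+{ord(ch):04X}" for ch in text if is_private_use(ch)]


# Hypothetical page text: one Private Use icon and one standard smiling face (U+263A).
sample = "Home \uE001 Favorites \u263A"
print(private_use_chars(sample))
```

Anything this reports has no agreed meaning outside the font itself. A letter remapped to an icon, as in the StateFace example, will not be caught, since to Unicode it is still just a letter.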

Accessibility and Other

As you might expect, there is trouble in paradise. A major potential gotcha is screen reader access because screen readers don't do fonts, only underlying text. VoiceOver recognizes many Unicode symbols, but if a font is using a symbol not in Unicode, then you're out of luck. Similarly, if a font is matching a symbol to a letter a la Wingdings, then VoiceOver will read the letter and not the symbol.

Even worse is JAWS, which has very limited symbol recognition by default. If you replace letters with symbols, JAWS will read letters. But even if you use a properly encoded font, JAWS may still not read your symbol, especially one in the Private Use Area (whereas any image with ALT text is recognized). In this case, you may need to provide a JAWS symbol text file...unless you are crafty in your use of icons.

But don't forget mobile devices. Some phones may be able to recognize embedded fonts, but some older phones may not. As recent reports are showing, more and more people are accessing the Web primarily through a phone or tablet device.

Best Practices

I think there are some tricks that can help you have your icons and still have accessible content. The key is to avoid using a non-standard icon alone in text.

  1. Try to keep icon use decorative by adding a text equivalent. Icons used as a decorative element really don't need an ALT tag anyway, so being skipped by a screen reader will not be too disruptive. This also lets you work around bad encoding issues.

  2. Use the font to generate an image. If an icon is being used alone, then an image with an ALT tag may still work. BUT an icon font can give you a head start by providing a nice set of vector-based images to work with. Sweet!

  3. OR...you can re-add a text equivalent, but hide it from sighted users. This is a sneaky way to add back in ALT text for screen readers.

Nunavut Official Languages Act and Canadian Aboriginal Syllabics

A script that may not be well-known to U.S. citizens is the Canadian Aboriginal Syllabic script, a syllabary used to write certain indigenous languages, including the Inuit languages spoken in the Nunavut territory of Canada.

This script is about to appear in many more documents and signs because Nunavut's Official Languages Act is coming into force, making the Inuit languages official languages alongside French and English.

In addition to the languages of Nunavut, this script is used in Canada to write a number of indigenous languages including Ojibwe, Blackfoot, Cree and others. In contrast, most indigenous languages in the U.S. are written in the Latin alphabet with the notable exception of Cherokee.

I'm curious if indigenous communities in the U.S. would consider adopting this script to further differentiate themselves from the U.S. If that happens, the Nunavut law should ensure that proper Unicode support is available.

MathML Test on MovableType

If you're on Firefox 4+, Safari 5+ or Internet Explorer 9 with MathPlayer 3, the text below will be a MathML representation of Planck's Law.

If you want to replicate this you have to:

  1. Paste the XML in the HTML code (i.e. NOT the WYSIWYG editor)
  2. Make sure that the first line of the XML includes a link to the MathML namespace as follows:
    <math xmlns="http://www.w3.org/1998/Math/MathML">
  3. I also like to use CSS to bump the font size - those super/subscripts can get very tiny.

All I can say is - Wow. I wish all my CMSs played this well with MathML.

$$E_{\lambda b} = \frac{2 \pi h c^{2}}{\lambda^{5} \left( e^{hc / (\lambda k_{b} T)} - 1 \right)}$$

P.S. If you send me comments on this blog, please note that this text may possibly depart from standard academic English. Linguists can do that, especially in a blog.

About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage (ejp10@psu.edu) for a profile.

Comments

The standard commenting utility has been disabled due to hungry spam. If you have a comment, please feel free to drop me a line at (ejp10@psu.edu).