ELIZABETH J PYATT: October 2008 Archives

7 Things You Should Know About Unicode


If you know about the Educause "7 Things You Should Know About..." series, then you know that it identifies seven important elements of any technology.

So here is my spin on what "you should know" about Unicode (or what someone not familiar with Unicode might need to know).

1. What is it?

Unicode is an encoding standard. Each character in each script has a number (because computers track everything by number). Unicode has room for more than a million characters, allowing virtually any character from any script to be assigned a number. Unicode does this by assigning a block of numbers to each script (http://www.unicode.org/charts).
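To make the "every character has a number" idea concrete, here is a minimal Python sketch. The helper name `describe` is my own invention for illustration; `ord` and the `unicodedata` module are part of Python's standard library.

```python
import unicodedata

def describe(ch):
    """Return the character's number in U+XXXX notation and its official name."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)

# One character each from the Latin, Cyrillic and Greek blocks
for ch in "Aяα":
    print(ch, *describe(ch))
```

The three characters land in three different ranges of numbers, which is the block-per-script layout shown in the charts.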

The first version of the Unicode Standard was published in 1991 and focused on the most commonly used scripts first, such as the Latin alphabet, Cyrillic, Chinese, Japanese, Arabic, Greek, Hebrew, Devanagari and others. All major world scripts are now covered, as well as many minority and ancient scripts.

2. Who's doing it?

Unicode encoding has been incorporated into Windows (since Windows NT), Macintosh (since OS X) and new versions of Linux/Unix. Applications supporting Unicode include newer versions of Adobe applications, Microsoft Office, the Apple iLife/iWork series, FileMaker, EndNote, Google, GoogleDocs, Twitter, Zotero, blogs, Facebook and many others.

3. How does it work?

To read Unicode text, a user needs to have the correct Unicode font installed. Both Apple and Microsoft provide well-stocked fonts for free, but not every character is covered. Fortunately many freeware fonts are available.

To enter Unicode text, users must activate keyboard utilities or use special escape codes to enter characters for the appropriate script. Again Microsoft and Apple provide a lot of built-in utilities, but additional ones are also available online, many as freeware.
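What an "escape code" looks like depends on the software. As one illustration (Python string escapes and HTML numeric entities are my choice of example here, not the only options), each line below is a different way of typing the Greek letter alpha without a Greek keyboard:

```python
import html

alpha = "α"                                   # typed directly with a Greek keyboard utility
by_code_point = "\u03b1"                      # Python escape using the hexadecimal number
by_name = "\N{GREEK SMALL LETTER ALPHA}"      # Python escape using the official name
from_decimal_entity = html.unescape("&#945;")   # HTML entity, decimal number
from_hex_entity = html.unescape("&#x3B1;")      # HTML entity, hexadecimal number

# All five spellings produce the same single character
print(alpha == by_code_point == by_name == from_decimal_entity == from_hex_entity)
```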

4. Why is it significant?

Consistent encoding allows users to exchange text reliably and lets font developers create new fonts covering a wide range of characters in a consistent manner. When properly implemented, a Mac user can read a Greek text file created on a Windows machine with minimal adjustment.
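A rough Python sketch of why this matters, under some assumptions of mine: the function name `roundtrip` is hypothetical, `cp1253` stands in for an older Windows Greek encoding, and `latin-1` stands in for whatever a mismatched reader might assume. The text survives when both sides agree on a Unicode encoding (UTF-8 here) and is garbled when they do not.

```python
text = "Ελληνικά"  # the word "Greek", written in Greek

def roundtrip(s, write_as, read_as):
    """Encode as the 'writing' machine would, decode as the 'reading' machine would."""
    return s.encode(write_as).decode(read_as, errors="replace")

same = roundtrip(text, "utf-8", "utf-8")        # both sides use Unicode
garbled = roundtrip(text, "cp1253", "latin-1")  # legacy encodings that disagree

print(same)     # the original Greek word
print(garbled)  # accented Latin letters instead of Greek
```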

5. What are the downsides?

One is that older programs developed before Unicode may need to be retrofitted if they are meant to be used by a global audience. Programmers need to learn new techniques in order to take advantage of Unicode encoding.

The other remaining problem is that Unicode implementation on the user end is still confusing. Users working with languages other than English need to either activate/install special utilities or memorize a series of special codes. Methods to input text also vary from software to software. A lot of tech-savviness is required in order to maximize Unicode compatibility.

6. Where is it going?

The goal is for every script, even those for ancient languages, to be encoded within Unicode. This will not only enable new technologies to be used in any language, but will allow texts from around the world to be digitized in a common format. Unicode support for major languages has arrived, but support for many lesser-known scripts and quirky cases in major scripts still needs to be implemented.

7. What are the implications for teaching and learning?

Unicode will

  • Simplify the display of non-English texts in foreign language courses and courses taught in non-English speaking areas
  • Standardize the display of mathematical and technical symbols
  • Allow non-English speaking communities to write in their native scripts instead of transliterating text into the Roman alphabet
  • Expand the typographical repertoire of font designers
  • And...if you're a pioneer...Unicode will introduce you to the joys of converting between decimal and hexadecimal values
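For pioneers who would rather not do that last conversion by hand, here is a small Python sketch. The function names `to_hex` and `to_decimal` are made up for this example; Unicode charts list characters in hexadecimal, while some escape codes expect decimal.

```python
def to_hex(decimal_value):
    """Decimal number -> chart-style notation, e.g. 945 -> 'U+03B1'."""
    return f"U+{decimal_value:04X}"

def to_decimal(code_point):
    """Chart-style notation -> decimal number, e.g. 'U+03B1' -> 945."""
    return int(code_point.upper().replace("U+", ""), 16)

print(to_hex(945))           # U+03B1
print(to_decimal("U+03B1"))  # 945
print(chr(945))              # α
```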


W3C Japanese Layout Task Force


The latest reports from the W3C Japanese Layout Task Force are posted at
http://www.w3.org/2007/02/japanese-layout/. The working language is Japanese, but key documents are translated into English.

The page also includes a basic layout primer which discusses issues for vertical layout in Japanese, Ruby Annotation (not Ruby on Rails), switching to the Roman alphabet, Japanese punctuation and more.


My Page of Rune Resources


I've created a quick Runes on the Web tutorial on this blog at http://www.personal.psu.edu/ejp10/blogs/gotunicode/charts/runes.html



About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage (ejp10@psu.edu) for a profile.


The standard commenting utility has been disabled due to hungry spam. If you have a comment, please feel free to drop me a line at (ejp10@psu.edu).
