Unfortunately, Unicode works a little differently in different software packages and online technologies. What works for one technology won't necessarily translate to another. Here are some of my experiments and traps I've had to stumble out of.
Table of Contents
- Best Browsers (March, 2006)
- CSS Files and Unicode
- Flash: Inserting the Union (∪) and Intersection (∩) Symbol into a Text File for Flash 8
- RSS and Unicode
- Server Side Includes (SSI) and Unicode
Once a year I test the current browsers (Mac and Windows) on different scripts, so I get to see a wide array of quirks. No single browser is perfect, but a few are great for everyday use. My recommended browsers for intense Unicode browsing are, in order:
1 - Opera
Opera is an excellent Unicode browser all around. I especially like it because it allows you to change fonts via the style sheet and it has the best Indic font support on the Mac (although not perfect - this is probably an Apple thing).
Its only drawback is that it doesn't support Plane 1 fonts (e.g. Gothic, Old Italic). But how many people need Plane 1 font support now?
2 - Firefox/Mozilla
Mozilla/Firefox are also excellent Unicode browsers (and I use them every day). They let you set the font for most scripts and they support Plane 1 fonts (unlike Opera). And then there are all the cool plugins.
The only reason it's not first is that the Mac version doesn't have perfect Indic support. Vowel marks are placed a little funny.
3 - Safari (Mac)
Free from Apple. This is fast and works great so long as you don't care about font display. As long as you have a font with the right characters loaded, Safari will work. But I wish it didn't always default to Lucida Unicode for phonetics. I have other fonts installed - I want to pick them.
4 - Internet Explorer (Win)
Free with Windows. This is almost tied with Safari. Its advantage is that it supports East Asian vertical text when no other major browser does.
But I find that it's a little quirkier than I like. For phonetic scripts, my CSS must manually specify the correct font, or else the default Times New Roman will generate the question mark of death. And it's missing Plane 1 font support.
5 - Netscape 4.7 (Win)
Netscape 4.7 for Windows had many problems with CSS, but its Unicode support is pretty decent. Still, I would suggest moving on.
The one bad quirk is that Netscape 4.7 does not recognize new entity codes like
5 - Internet Explorer (Mac)
Older Mac browsers like IE 5.5 coded display script by script and they only got around to the major scripts, meaning only Russian, Chinese, Japanese, Korean, Central European, Baltic and Turkish are properly supported. Everyone else is out of luck.
You can have a valid font for another script, but IE 5 will still give you the question mark of death. Time to take the plunge to Firefox, Safari or Opera.
6 - Netscape 4.7 (Mac)
This has all the problems of Internet Explorer, but it has the added quirk of using characters from Japanese fonts for Russian and Greek. The spacing is truly hideous.
Note for System 9 users - You may need to stay with System 9 for economic reasons, but your Unicode experience will never be as good as it needs to be. And many vendors have switched to OS X, so I strongly recommend an upgrade to OS X (and not only for Unicode).
In Dreamweaver, I set the default encoding for my files to UTF-8 (Unicode) so that all my HTML files will be encoded as Unicode by default. This has improved my life quite a bit, but CSS did take a hit.
I generally code my CSS by hand, but occasionally I would add a style attribute that would not appear in either the Dreamweaver WYSIWYG view or in the browser. I would literally copy and paste correct code from another document into my CSS file and NOTHING would happen. Text remained plain and black, even if it was supposed to be red and italic.
The culprit was that the .css file was also in UTF-8 format, which adds a few invisible bytes. Occasionally, the browser would encounter one of these and not be able to parse the attribute. Very weird.
So... I now have to create CSS files in BBEdit, and make sure that the encoding is set to plain vanilla MacRoman (or ANSI in the Windows world). Now if I don't see styles, it's usually because of a syntax error.
So the future is coming, but remember that alternate transportation may still need to be used to get there.
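My best guess is that the "invisible bytes" in the CSS story above are a UTF-8 byte order mark (BOM), the three bytes EF BB BF that some editors write at the start of a UTF-8 file. That is an assumption on my part, not something I have confirmed for every Dreamweaver version. If it holds, a small Python sketch like this one can check a stylesheet and strip the mark instead of re-saving the file as MacRoman. The function names are made up for illustration.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):  # the three bytes EF BB BF
        return data[len(codecs.BOM_UTF8):]
    return data


def clean_css_file(path: str) -> bool:
    """Rewrite the file in place without a BOM. Return True if one was removed."""
    css_path = Path(path)
    original = css_path.read_bytes()
    cleaned = strip_utf8_bom(original)
    if cleaned != original:
        css_path.write_bytes(cleaned)
        return True
    return False
```

Calling `clean_css_file("styles.css")` (a hypothetical file name) leaves a file without a BOM untouched, so it should be safe to run on a whole folder of stylesheets.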
I'm working on a math quiz in Flash for set theory where questions are pulled in from a text file. The instructor wants to include the union (∪) and intersection (∩) symbols in his problems, so what to do?
The good news is that if you can create a UTF-8 text file and insert the symbols, it will import into Flash (at least in Flash 8). For math, your best bet is usually to use the Windows Character Map utility and insert the symbols into a Notepad text file, or use the Macintosh Character Palette with a Text Edit text file. Unfortunately, the process is still a little clunky on both platforms, but it's better than in 2005.
You have to open both Notepad (Start » Accessories) and Character Map (Start » Accessories » System Tools).
For the Windows Character Map, it's a semi-clunky process. You have to switch the font to "Arial Unicode MS" (because it has all the math symbols), then scroll down the window until you see the math section. Then you have to select, copy and paste each symbol into Notepad.
In Notepad, when you save the file, you have to make sure the encoding menu under the file name is changed from "ANSI" to "UTF-8". Fortunately, it will warn you.
In Text Edit for the Mac, you go to Edit » Special Characters to bring up the Character Palette. Click the Math option and hunt for the symbol. Highlight and click Insert to place it in Text Edit.
Once you insert the symbols, you have to make sure your encoding is set to UTF-8 during the save process. Go to the Format menu and select "Make Plain Text." Then, when you save the file you have to make sure the encoding menu under the file name is changed from "MacRoman" to "UTF-8".
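If hunting through a character palette gets old, the same UTF-8 text file can be produced by a script. This is only a sketch in Python: the function names and the question text are invented for illustration, and I have not tested the resulting file against any particular Flash version. The symbols are written by their Unicode code points (U+222A for union, U+2229 for intersection) so the script itself stays plain ASCII.

```python
from pathlib import Path

UNION = "\u222a"         # the union symbol
INTERSECTION = "\u2229"  # the intersection symbol


def write_quiz_file(path: str, questions: list) -> None:
    """Save one question per line, explicitly encoded as UTF-8."""
    Path(path).write_text("\n".join(questions) + "\n", encoding="utf-8")


def read_quiz_file(path: str) -> list:
    """Read the questions back, decoding the file as UTF-8."""
    return Path(path).read_text(encoding="utf-8").splitlines()


# Hypothetical quiz questions built from the two symbols.
SAMPLE_QUESTIONS = [
    "What is A " + UNION + " B?",
    "What is A " + INTERSECTION + " B?",
]
```

A call such as `write_quiz_file("quiz.txt", SAMPLE_QUESTIONS)` would write the file. The important part is the explicit `encoding="utf-8"` on both the write and the read, which does the same job as the encoding menu in the save dialog.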
Reopening UTF-8 Files in Mac Text Edit
In Text Edit, if you reopen a UTF-8 file it may be magically transformed to MacRoman (you'll see things like Á& instead of your intended character). Very annoying (grr!!) To prevent this, you must go into the Text Edit Preferences, then click the Open and Save panel. Make sure that the Plain Text Encoding options for opening and saving are set for "UTF-8." Or you can spring for a license for BBEdit or Mellel which are better about warning you.
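The garbage characters come from reading UTF-8 bytes as if each byte were one MacRoman character. Here is a small Python sketch of that mix-up and of how to undo it, using Python's built-in `mac_roman` codec. The function names are my own, and the repair only works if the garbled text has not been edited or re-saved in a way that lost bytes.

```python
def misread_as_macroman(text: str) -> str:
    """Simulate opening a UTF-8 file with the encoding wrongly set to MacRoman."""
    return text.encode("utf-8").decode("mac_roman")


def repair_macroman_mojibake(garbled: str) -> str:
    """Reverse the mistake: recover the original bytes, then decode them as UTF-8."""
    return garbled.encode("mac_roman").decode("utf-8")
```

The union symbol takes three bytes in UTF-8, so after the misreading it shows up as three unrelated MacRoman characters instead of one symbol.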
As for Flash - fonts are still a little tricky within Flash, but at least it's playing well with text files.
I'm sure you're wondering if the iTunes music application supports Unicode, and the answer is ... YES! (at least on the Mac version).
I found this out when I imported two Japanese instrumental tracks from the Kill Bill (Vol 1) soundtrack and discovered that the Japanese artists' names were written in Japanese script. Then I added some Unicode phonetic transcription using the IPA-SIL keyboard as a test.
So...if you have iTunes playlists for non-English musicians, you will be able to represent titles and names in their native format.
Last entry, I used a long a with a macron (ā) in the title to see how Unicode support was. RSS is an XML format designed for news readers. As with other XML formats, Unicode is a supported encoding.
For once, this is actually true in implementation, and even more unusually you pretty much have to use native Unicode (RSS and other XML files do not define the named HTML entity codes; only numeric character references are available). So... I took a deep breath, activated my Macintosh Extended keyboard, typed the macron into the XML and posted it. You can also insert a character via the Windows Character Map or the Mac Character Palette.
The long a (ā) appeared in NetNewsWire (Mac), Safari (Mac) and Firefox (with CSS). It did not appear in FeedDemon (Windows)... still not bad though. There is a lot of work to be done with Unicode, but the future is coming.
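For anyone generating a feed from a script rather than typing it by hand, here is a rough Python sketch of the same idea using the standard `xml.etree.ElementTree` module. The feed title, link and description are placeholders, and `build_feed` is a name I made up. The point is only that the ā goes into the XML as a native UTF-8 character rather than as an HTML entity.

```python
import xml.etree.ElementTree as ET


def build_feed(item_title: str) -> bytes:
    """Return a minimal RSS 2.0 document, serialized as UTF-8 bytes."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # Placeholder channel details, not a real feed.
    ET.SubElement(channel, "title").text = "Example feed"
    ET.SubElement(channel, "link").text = "http://example.com/"
    ET.SubElement(channel, "description").text = "Unicode test feed"
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = item_title
    # With a UTF-8 target encoding, non-ASCII text is written as raw UTF-8 bytes.
    return ET.tostring(rss, encoding="utf-8")
```

Something like `build_feed("Long \u0101 test")` gives bytes that can be written straight to the feed file in binary mode.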
The Penn State server delivers UTF-8 Unicode pages. Dreamweaver creates Unicode pages. They appear fine in all my browsers without the entity code translation. So I should be able to include Unicode characters in server side includes - right?
Not exactly. Any .inc file must be encoded as ASCII and only include ASCII characters. If you want to include a Unicode character (like the £ symbol), you have to use an entity code like &pound; (all characters in the entity code are ASCII). If a non-ASCII character is included, then users will see a question mark, indicating that the character is "not available in the font" (yeah right).
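Converting an include file by hand gets tedious, so here is a minimal Python sketch that turns every non-ASCII character into a numeric entity code, leaving the ASCII text alone. It uses the numeric form (such as `&#163;` for the £ symbol) rather than named entities, and the function name is hypothetical.

```python
def to_ascii_entities(text: str) -> str:
    """Replace each non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

For example, `to_ascii_entities("Price: £5")` returns `Price: &#163;5`, which is pure ASCII and should survive in an .inc file.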
It's little glitches like these that make Unicode development still an entertaining adventure even in 2006.