I was so excited to get some time to test new tools that I tested three for basic Unicode support. My updates:
Yammer - Pass
Yammer is a service similar to Twitter but with more tools suitable to a corporate environment. I posted some text with obscure phonetic characters and some Devanagari, and results were generally good.
This was done on a Mac via the Web site interface and via the desktop client. It seemed low fuss enough that I suspect support is good in most configurations. Note that third party clients are always an unknown. For instance, although Twitter also has excellent Unicode support, some of the third party viewers were pretty bad.
iBooks Author - Pass
Most apps from Apple have good Unicode support and this is no different. My only concern here is font control. It looks like you can define new styles based on pre-existing formatted text, but can't really edit existing ones.
One non-Unicode gripe is that some styles had small caps and I was not able to disable that. It may not be a show stopper in most docs, but not all scripts include small caps (or even distinguish capital/lower case).
I gather that the format generated is a form of XML (per Alan Quarterman) with HTML features and CSS... but the CSS is hard to edit directly. Whenever you leave the Western alphabet with few controls over font presentation, it's time to be nervous.
Google Earth - Pass, but slightly Temperamental
For the record, I am in love with Google Earth as a teaching tool. However, entering data was tricky.
The keyboard methods seem to work fine to enter text for items such as new locations, and so forth. However, I had problems with using the Character Viewer (OS X 10.7). I would double click the symbol and nothing would happen ;(. Then again, it could be the new Character Viewer, although it seems to be OK with Yammer.
In any case, this could be an issue if a user is trying to use a cute emoji symbol. Cut and paste from another document did appear to work.
For Unicode fans, one of the bigger and more useful changes in the Mac 10.7 Lion operating system is the updated Character Viewer. Despite improvements though, it's different enough that I think some documentation is worth posting.
How to Access
As with previous versions, Character Viewer is activated in the Language & Text section of the System Preferences menu. See http://tlt.its.psu.edu/suggestions/international/keyboards/charpalosx.html for details.
How to Find Characters
By default, the Viewer only gives options for Symbols (e.g. "Arrows, Punctuation, Currency Symbols, Emoji"). I actually like how the symbols have been organized into semantic groups rather than by numeric block. However, the list does not include all the scripts I need to access.
Fear not though - the other blocks are available. To view other blocks, click the Gear icon, then select Customize List. This opens a pop-up window which provides a list of all available symbol lists, including Unicode, which is the entire list organized by Unicode block.

How to Insert
The basic mechanism is to highlight the character you want to insert. In previous versions of this tool, there was an Insert button, but this has disappeared.
In this version, you need to do the following.
- Place your cursor at an appropriate insertion point in your document.
- Open the Character Viewer.
- Find and highlight your character.
- Double click the character. It will be inserted into your document.
Note: You can also drag and drop characters into your document.
Favorites
A feature that I am now using is the Favorites list, a place to list commonly used characters to insert. This puts everything in one list which you can order as needed.
You can add Favorites by selecting a character and either clicking the Add to Favorites button or dragging it into the Favorites list.
Conclusion
Although I was a little confused by the new Lion Character Viewer, I actually think it was a good overhaul. It suits the average user who needs access to common symbols and emoji, but allows us more dedicated users access to what we need.
A final improvement worth noting is that the Font Variation feature is much more stable than in previous versions. This is perfect for times when you need to debug a weird character/font combination.
If you've ever wanted Unicode support for snowflakes, decorative arrows, crosses and stars, then you may be interested in the free Erler Dingbats font from the Font Shop. The fonts even ship with keyboard layouts to make data entry easier.
The image below shows roughly the glyphs covered (generally in black and white). There are more characters covered in the for-fee font DD Dingbats 2.0, but even these provide some interesting possibilities in terms of documentation and even fancy bullet lists (especially if combined with font embedding).
Unicode hasn't been part of my life enough recently, but it did emerge in a very unexpected way this week during a recent calendar upgrade.
One of the conversion tasks was for us to add group e-mail addresses so we could share calendars with each other efficiently. But when I tried to copy and paste, I got a "not found" error. Here is one of these addresses (altered for security reasons):
umg-sc.foo.staff@fuyu.ucal.psu.edu
Can you spot the problem? (HINT: Try cutting and pasting into a text file.)
Given up? The problem is the hyphen. In the right font, you will see that it's not just a hyphen (U+002D or ASCII #45), but actually the more elegant and slightly longer en dash, which is U+2013 (not in ASCII). As many of you know, many databases are still sensitive to such differences, so a hyphen is just not the same as an en dash. This means searching is a FAIL.
How did the en-dash get in there if it's outside of ASCII? My guess is that it's a result of an auto-correct feature from Word which makes some formatting tweaks to enhance visual appeal. One is to change plain hyphens into a slightly longer en-dash (more favored by typographers).
Another common change is to convert plain straight quotes (" at U+0022 or ASCII #34) to "Smart Quotes" like (“ at U+201C) and (” at U+201D). Copying HTML code attributes from Word can be similarly dangerous since HTML recognizes plain quotes, but NOT fancy double quotes. Most of the time, the change does nothing, but when it comes to interacting with some systems, the reformatting makes a difference in a very annoying way.
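As a rough illustration of undoing these auto-corrections before text goes into a system that expects plain ASCII, here is a minimal Python sketch. The mapping table and the function name `to_plain_punctuation` are my own hypothetical choices, and the table only covers the characters discussed in this post (en dash and curly quotes).

```python
# Hypothetical mapping of "smart" punctuation back to ASCII equivalents.
SMART_TO_ASCII = {
    "\u2013": "-",   # EN DASH -> HYPHEN-MINUS (U+002D)
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK -> APOSTROPHE (U+0027)
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK -> APOSTROPHE (U+0027)
    "\u201C": '"',   # LEFT DOUBLE QUOTATION MARK -> QUOTATION MARK (U+0022)
    "\u201D": '"',   # RIGHT DOUBLE QUOTATION MARK -> QUOTATION MARK (U+0022)
}

# str.maketrans accepts a dict whose keys are single characters.
_TABLE = str.maketrans(SMART_TO_ASCII)


def to_plain_punctuation(text):
    """Replace the smart punctuation listed above with plain ASCII."""
    return text.translate(_TABLE)


# An HTML attribute as it might look after a word processor "improved" it.
pasted = "<a href=\u201Cpage.html\u201D>"
print(to_plain_punctuation(pasted))
```

This is only safe when you know the plain character is what the target system wants; in ordinary prose the curly quotes and en dash are perfectly legitimate.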
How to catch it? In some cases, you can change the font, but many fonts make the hyphen and en dash appear identical (Arggh!). Which leaves the old standby (test, test, test) plus some Unicode awareness (which is increasing among programmers).
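When the eye can't tell the two dashes apart, a script can. Below is a minimal Python sketch that lists every character outside ASCII along with its code point and official Unicode name. The function name `find_non_ascii` and the sample address are hypothetical; the en dash is written as an explicit escape so it can't be lost in copying.

```python
import unicodedata


def find_non_ascii(text):
    """Return (index, character, code point, Unicode name) for each non-ASCII character."""
    hits = []
    for index, char in enumerate(text):
        if ord(char) > 127:
            # Fall back to a placeholder if the character has no name.
            name = unicodedata.name(char, "UNKNOWN")
            hits.append((index, char, f"U+{ord(char):04X}", name))
    return hits


# Hypothetical address containing an en dash (U+2013) instead of a hyphen.
suspect = "umg\u2013sc.foo.staff@example.edu"
for hit in find_non_ascii(suspect):
    print(hit)  # reports U+2013 EN DASH at index 3
```

An empty result means the string is pure ASCII, which is what most older directory and e-mail systems expect.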
The Unicode Consortium announced that they had created a Unicode "Pipeline Table" page of characters scheduled for future versions of Unicode.
The table is organized by projected UCS code point number, but the characters are in various stages of the proposal process. Although dates of acceptance to a particular stage are posted, the target future version is not listed. Although many specifications look complete, the Unicode Consortium does warn that they are subject to change.
If you are interested in entire script blocks (particularly Ancient and lesser-known Indian scripts) coming to Unicode, you can go to the Proposed New Script page. The caveat that "things are subject to change" also applies here.
They're kind of scattered, but it looks like the next version of Mac OSX will be bringing lots of good enhancements for those working outside of English.
Asian Fonts and Text Input
Support for many scripts from South Asia has been lagging behind Windows, so I am personally pleased to see fonts for Bengali, Kannada, Malayalam, Oriya, Telugu and Sinhala being added (especially since I took 12 credits of Sinhala back in the day). New fonts for Tamil, Devanagari, Gujarati and Urdu are also scheduled to be added as well as for Lao, Khmer and Myanmar.
Those working with East Asian languages should be able to access improved utilities for Chinese (filtering by tones, ordering by radical/stroke), Japanese Kotoeri and Vietnamese (old and new orthography). The Chinese handwriting recognition software is also scheduled to include more support for Simplified Chinese and Roman characters. Finally, Apple announced that Lion will support vertical text (typing and display).
Everyone will also be able to use a new color emoji font.
In Safari
Improvements for Safari include:
- MathML support in Safari
- Improved CSS3 support including vertical text, East Asian emphasis, auto hyphenation
Non-English Accessibility
Accessibility options for those not using 100% English are now available, including VoiceOver speech in 23 languages and expanded Braille options.
The latest draft of the CSS3 Writing Modes module came out recently, and it includes revised specifications for how to handle vertical East Asian CJK text as well as specifications for RTL (right-to-left) text.
Although minimal support for RTL text has been around in recent years, vertical text remains a hurdle, partly because it's not clear which standards the vendors will agree to. The only browser I know supporting a vertical text spec is Internet Explorer, but its layout specification was developed by Microsoft, and it does not appear that it is being adopted as is for CSS 3 (see proposed CSS 3 vertical properties for details). It also looks like a vertical text scheme for SVG is being deprecated.
Will vertical text be possible across platforms? Only time will tell.
Over at Language Log, linguist Victor Mair has written a nice article about how a manual "typewriter" in Chinese/Japanese works, complete with two YouTube video demonstrations.
The video shows how time intensive it was, and I would also say that it's not so much a typewriter as a miniature printing press. Blocks for characters are stored in a tray and an operator moves the imprinting device over the correct character to make an impression on the paper (hunt and punch?). One tray holds about 2,000-3,000 characters, but extra trays are available with additional, rarer characters. Wow.
Computers have changed the process because many Japanese and Chinese typists can enter a Romanized syllable equivalent (e.g. "MA") and then select from a list of appropriate characters. In Chinese, characters are further organized by radical and stroke in many input methods. In any case, these methods allow users to use a smaller, Roman alphabet type keyboard, but there's still an amazing amount of computer and human processing.
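To make the "type a syllable, pick a character" idea concrete, here is a toy Python sketch. It is not how any real input method is implemented; the one-entry table and the function names are hypothetical, and a real system would hold thousands of syllables and rank candidates by frequency and context.

```python
# Toy candidate table: one Romanized syllable mapped to a few Simplified
# Chinese characters pronounced "ma" (with different tones).
CANDIDATES = {
    "ma": ["妈", "麻", "马", "骂", "吗"],
}


def candidates_for(syllable):
    """Return the candidate characters for a Romanized syllable (case-insensitive)."""
    return CANDIDATES.get(syllable.lower(), [])


def choose(syllable, position):
    """Pick a candidate by its 1-based position in the list, as a typist would."""
    options = candidates_for(syllable)
    if 1 <= position <= len(options):
        return options[position - 1]
    return None


print(candidates_for("MA"))  # the list the typist chooses from
print(choose("MA", 3))       # the third candidate in the list
```

Even in this tiny form you can see the two-step processing: the computer narrows the field, and the human makes the final choice.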
I've heard some buzz about newer methods of font embedding, but hadn't had a chance to test them until now. The good news is that you CAN embed fonts across multiple browsers (including Internet Explorer, Safari, Firefox and Google Chrome). The silly news is that it looks like each browser wants a different font format (or pretty darned close). But it's surprisingly robust for all that.
I'll describe the process, but I strongly recommend getting help from a Web font repository like Font Squirrel, Webfonts.info or Kernest which will generate some code for you. I will be documenting with Font Squirrel so I won't have to rely on remote hosting.
@font-face Theory
The magic of modern font embedding happens via a @font-face CSS style declaration. This declaration names the font, then provides the URL so it can be embedded. Because each browser supports only some of the formats, you actually need links to four different uploaded versions of the font.
The font versions in play are the following:
- Embedded OpenType/EOT (.eot) from Microsoft for Internet Explorer - this has actually been around since the late 90s but is only now living up to its potential.
- TTF and OTF - These are the usual TrueType and OpenType font formats, and embedding these is supported in Firefox (3.5+), Safari (3.1+) and Opera (10+).
- SVG - The Scalable Vector Graphics format. This is the format that Google Chrome and many mobile phones support, including iPhone 3.1 (although Droid apparently supports TTF).
- WOFF - This is a new format that is supported in Firefox 3.6+, Internet Explorer 9+ and Chrome 5+.
I'll talk about how to get the different versions of the fonts in a future blog post, but it looks like at some point the key format will be WOFF, which is a compressed version of a TTF/OTF font. Since embedding requires the viewer to download a font, smaller font files are better.
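With several versions of the same font sitting in one folder, it can help to confirm which file is really which. As a minimal sketch, the Python below guesses a font's format from its first four bytes. The function name `sniff_font_format` is hypothetical, and the table covers only WOFF and TTF/OTF; EOT and SVG fonts are not recognized by this simple check.

```python
# Leading four-byte signatures for the formats this sketch recognizes.
SIGNATURES = {
    b"wOFF": "WOFF",
    b"OTTO": "OpenType (CFF outlines)",
    b"\x00\x01\x00\x00": "TrueType",
    b"true": "TrueType (Apple)",
}


def sniff_font_format(first_bytes):
    """Guess a font format from the first bytes of the file."""
    return SIGNATURES.get(first_bytes[:4], "unknown")


# In practice you would read the bytes from disk, for example:
#   with open("GalSILB-webfont.woff", "rb") as handle:
#       print(sniff_font_format(handle.read(4)))
print(sniff_font_format(b"wOFF\x00\x01\x00\x00"))  # a WOFF header start
```

A result of "unknown" does not mean the file is broken, only that it is not one of the few signatures listed here.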
Simple Download with Character Range Tip
Another piece of helpful news is that some common open source fonts have been converted for you, including Galatia SIL (Greek and Latin) and Gentium (Phonetics, Extended Latin plus Greek). (Thanks Font Squirrel!)
Warning - there is a download catch. Font Squirrel assumes that you are writing in English only, so the default download gives you ONLY ENGLISH LANGUAGE characters in order to make the file size smaller.
Since you're at a Unicode blog, I will assume you want these fonts for their non-English characters. So when you download Galatia SIL and Gentium, make sure you do the following:
- At Font Squirrel, select the font you wish to embed.
- Change the Choose a Subset menu from English to Don't Subset
- Click Download @font-face Kit
Planning Supported Ranges in Fonts
Speaking of character ranges, you should plan your embedded font selections carefully so that viewers download a font with only the characters needed to view the Web page. That is, you probably want to avoid full versions of the mega fonts and use specialized fonts or slimmed-down versions of a mega font. Indeed, if your script is well supported (e.g. Chinese, Japanese), you can probably skip font embedding except for some extremely rare characters.
For instance, on the Penn State Computing with Accents language pages, I will be including custom @font-face declarations for the specific scripts used on a page. One of these is the Greek Unicode page in which Galatia SIL is embedded. (FYI - I embedded Galatia SIL because it includes some of the rarer Greek characters and is a serif font, which I do like for reference.)
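One way to plan which characters a page really needs is to scan its text and list the code points beyond ASCII. The Python below is a minimal sketch of that idea; the function names are hypothetical, and the output is written in the same U+XXXX-YYYY style that CSS uses for Unicode ranges, so it can be compared against a font's coverage by hand.

```python
def used_code_point_ranges(text):
    """Collapse the non-ASCII code points in text into sorted (start, end) ranges."""
    points = sorted({ord(char) for char in text if ord(char) > 127})
    ranges = []
    for point in points:
        if ranges and point == ranges[-1][1] + 1:
            # Extend the current run of consecutive code points.
            ranges[-1] = (ranges[-1][0], point)
        else:
            ranges.append((point, point))
    return ranges


def format_ranges(ranges):
    """Format ranges as 'U+03B1-03B3, U+03C9' style text."""
    parts = []
    for start, end in ranges:
        if start == end:
            parts.append(f"U+{start:04X}")
        else:
            parts.append(f"U+{start:04X}-{end:04X}")
    return ", ".join(parts)


# Sample page text: alpha, beta, gamma and omega, written as escapes.
sample = "Greek: \u03b1\u03b2\u03b3 and \u03c9"
print(format_ranges(used_code_point_ranges(sample)))  # U+03B1-03B3, U+03C9
```

If the list is short and falls inside one script, a specialized font or a subset is probably enough; if it sprawls across many blocks, a larger font may be unavoidable.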
Some Embedding Code
Let's talk about embedding Galatia SIL on a Greek page. The @font-face file that is downloaded from Font Squirrel contains the different versions of each font as well as sample code and CSS declarations to copy and paste.
Once the file is downloaded, you can test locally, then upload the fonts and your new pages to your Web site. I put any fonts I will embed into a fonts directory (along with licenses in case anyone pokes around). I also put each font into its own folder.
The next step is to add a @font-face declaration in CSS. Here is mine, based on the stylesheet.css file from Font Squirrel:
<style type="text/css">
<!--
@import url("../int.css");
/*** @font-face code adapted from stylesheet.css file from Font Squirrel. Thanks again! ***/
@font-face {
  font-family: 'GalatiaSILBold'; /*** Name of Font ***/
  src: url('/fonts/Galatia/GalSILB-webfont.eot'); /*** Link to IE EOT file first ***/
  src: url('/fonts/Galatia/GalSILB-webfont.eot?iefix') format('eot'), /*** EOT again with IE version control ***/
       url('/fonts/Galatia/GalSILB-webfont.woff') format('woff'),
       url('/fonts/Galatia/GalSILB-webfont.ttf') format('truetype'),
       url('/fonts/Galatia/GalSILB-webfont.svg#webfontJEXBBlW4') format('svg');
  font-weight: normal;
  font-style: normal;
}
-->
</style>
This embeds the font on a single page, but if you need to embed a font on multiple pages, add the @font-face declaration to the site-wide .css file.
At that point, the font named in the @font-face declaration can be used as part of the font-family or font properties in later declarations. Here is my .bigbluegreek class and then the reference to the class used in HTML:
.bigbluegreek {
  font-family: 'GalatiaSILBold', 'Arial Unicode MS', sans-serif;
  font-size: 24px; /* no space between the number and the unit */
  color: #006;
  text-align: center;
}
<!-- Table Cell -->
<td class="bigbluegreek">μ</td>
Note that the font-family declaration still includes alternate fonts...just in case the font-embedding doesn't work on a particular browser.
Font Copyright
I'm going to end the entry here and talk about font conversion another time, but if you do want to embed a font, make sure the license lets you do it. Many open-source fonts include the option to modify the font, so creating alternate versions is OK. Commercial foundries are also offering @font-face kits for their fonts... for a fee.