Recently in Tool Tests Category

JAWS 13 and Phonetic Symbols


As a linguist, I work with lots of exotic symbols, but only a small percentage of them are recognized by the standard U.S. version of JAWS. If you work with phonetic symbols like /ə, ʃ, ʒ, ɰ/, you will need to tweak your pronunciation files.

I wrote about this in an earlier post on JAWS 6, but today I was able to document and implement a fix, so I thought I would share the procedure.

The fix I am using expands the symbol set within JAWS so that a character like /ə/ is read as "schwa" (but not with its phonetic value of "uh"). Ideally, there would be a word pronunciation engine that emulates phonetic values, but let's take this one problem at a time.

SBL Files

JAWS includes a set of symbol (.sbl) files which match punctuation and symbol characters with a "word" (e.g., ? = "question mark"). The key is to add each character and its reading to your working files.

Luckily, there is a phonetic symbol .sbl file from Robert Englebretson. There's also a math symbol .sbl file from Carroll Tech.

Add Characters to Symbol File

This procedure assumes that JAWS is using the Eloquence engine, in which case the key file to change is eloq.sbl. You will also need an administrator account to implement the changes.

Note: SBL files can be opened in any text editor such as Notepad.

  1. Open or download the phonetic symbol .sbl file.
  2. Find the location of your eloq.sbl file. Mine was in the following path on my C: drive:
    C:\Users\All Users\Freedom Scientific\Jaws\13.0\Settings\enu\eloq.sbl
  3. Make a copy of this file and rename it eloqOld.sbl. This is your backup in case something goes wrong.
  4. Make a second copy and rename it eloqNew.sbl. This is a temporary file to edit, since you may not be able to edit eloq.sbl directly.
  5. Open eloqNew.sbl in a text editor such as Notepad. This file contains pronunciation values for multiple languages. Scroll to the language you normally use (e.g., "[American English]").
  6. Scroll to the end of the symbol list for that language.
  7. Copy and paste the list of symbols from one of the other .sbl files immediately after the final line in the list. Each symbol is on its own line in the format U+0001=character name.
    Note: Don't worry if the format does not match the rest of the symbol list.

  8. Repeat the last step for each language you want to support. You can translate character names as needed for each language. Save and close the file.
  9. Exit JAWS if it is open.
  10. Delete eloq.sbl. You may be asked for an admin password at this point.
  11. Rename eloqNew.sbl as eloq.sbl.
  12. Restart JAWS and test on a page such as IPA Characters based on Letter A with Numeric Codes
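If you have to repeat steps 5 through 8 on several machines, the editing can be scripted. The Python sketch below assumes (I have not checked this against Freedom Scientific documentation) that eloq.sbl is laid out like an INI file: each language section opens with a bracketed header such as [American English] on its own line and runs until the next bracketed header. The function names are hypothetical, and the script works on text only, so reading and writing the file (and choosing its encoding) is left to you. Run it against eloqNew.sbl, never the original.

```python
def is_section_header(line: str) -> bool:
    """True for lines like "[American English]" (assumed INI-style header)."""
    s = line.strip()
    # A symbol entry such as "[=left bracket" contains "=", so it is not a header.
    return s.startswith("[") and s.endswith("]") and "=" not in s


def add_symbols_to_section(sbl_text: str, section: str, new_lines: list[str]) -> str:
    """Return sbl_text with new_lines added at the end of the [section] list.

    Entries whose key (the part before "=") already exists in that section
    are skipped, so running the script twice does not duplicate symbols.
    """
    lines = sbl_text.splitlines()
    header = f"[{section}]"
    start = next((i for i, line in enumerate(lines) if line.strip() == header), None)
    if start is None:
        raise ValueError(f"section {header} not found")

    # The section ends at the next header or at the end of the file.
    end = next(
        (i for i in range(start + 1, len(lines)) if is_section_header(lines[i])),
        len(lines),
    )

    # Step back over trailing blank lines so new entries follow the last symbol.
    insert_at = end
    while insert_at > start + 1 and not lines[insert_at - 1].strip():
        insert_at -= 1

    existing = {
        line.split("=", 1)[0].strip().upper()
        for line in lines[start + 1:end]
        if "=" in line
    }
    to_add = [
        entry for entry in new_lines
        if entry.split("=", 1)[0].strip().upper() not in existing
    ]

    lines[insert_at:insert_at] = to_add
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "[American English]\n?=question mark\n\n[Spanish]\n?=signo\n"
    print(add_symbols_to_section(sample, "American English", ["U+0259=schwa"]))
```

In the sample, the schwa entry lands right after "?=question mark" and before the blank line that separates the Spanish section.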

Look Up Additional Codes

Each line in the SBL file has this format:

U+Codepoint=Character Name (no quotes)

For instance, if I wanted to expand the repertoire of currency symbols to include the new rupee symbol of India (₹), I would add the following line to my .sbl file:

U+20B9=Rupee symbol of India
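Looking up code points by hand gets tedious for a long list. A small Python helper (the name sbl_line is hypothetical) can build lines in this format from the characters themselves. When no reading is supplied, it falls back to the official Unicode character name, which JAWS would then read aloud word for word, so you may still prefer shorter readings like "schwa".

```python
import unicodedata


def sbl_line(char: str, name: str = "") -> str:
    """Build one U+Codepoint=Character Name line for a single character.

    If no name is given, use the official Unicode character name
    (e.g. "LATIN SMALL LETTER SCHWA") in sentence case.
    """
    if len(char) != 1:
        raise ValueError("expected exactly one code point")
    if not name:
        name = unicodedata.name(char).capitalize()
    return f"U+{ord(char):04X}={name}"


print(sbl_line("₹", "Rupee symbol of India"))  # U+20B9=Rupee symbol of India
for ch in "əʃʒɰ":
    print(sbl_line(ch))
```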

A list of Unicode charts with code points is available at http://www.unicode.org/charts


Unicode 3Play - Yammer, Google Earth, iBooks Author


I was so excited to get some time to test new tools that I tested three for basic Unicode support. My updates:

Yammer - Pass

Yammer is a service similar to Twitter but with more tools suitable to a corporate environment. I posted some text with obscure phonetic characters and some Devanagari, and results were generally good.

This was done on a Mac via the Web site interface and via the desktop client. It seemed low-fuss enough that I suspect support is good in most configurations. Note that third-party clients are always an unknown. For instance, although Twitter itself has excellent Unicode support, some of the third-party viewers were pretty bad.

iBooks Author - Pass

Most apps from Apple have good Unicode support, and this is no different. My only concern here is font control. It looks like you can define new styles based on pre-existing formatted text, but you can't really edit existing ones.

One non-Unicode gripe is that some styles had small caps and I was not able to disable that. It may not be a show stopper in most docs, but not all scripts include small caps (or even distinguish capital/lower case).

I gather that the generated format is a form of XML (per Alan Quarterman) with HTML features and CSS...but the CSS is hard to edit directly. Whenever you leave the Western alphabet and have few controls over font presentation, it's time to be nervous.

Google Earth - Pass, but Slightly Temperamental

For the record, I am in love with Google Earth as a teaching tool. However, entering data was tricky.

The keyboard methods seem to work fine for entering text for items such as new locations. However, I had problems using the Character Viewer (OS X 10.7). I would double-click the symbol and nothing would happen ;(. Then again, the problem could be the new Character Viewer itself, although it seems to be OK with Yammer.

In any case, this could be an issue if a user is trying to use a cute emoji symbol. Cut and paste from another document did appear to work.


The Twitter Unicode Test: A


Just for kicks, I decided to run Twitter through some Unicode tests, and I give it an A. For the record, I pretty much knew from Twittervision that it supported a lot of encodings, but I threw in a few more exotic tests...just to see.

The first was my standard phonetic character test...from a Mac. As far as I'm concerned, you have to pass this to be a serious global Unicode contender. I also threw in a long vowel (ā) and the one Hebrew word I can type (שבלת), or "shibboleth," to confirm right-to-left support.

What impressed me, oddly, was the support for entity codes like &eacute; (é) and &#x0909; (उ). Twitter can accept either a raw é or it can take &eacute; and convert it to é. This differs from other modern tools like Facebook or XML, which can only accept raw Unicode input (entity codes break).
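To make "convert &eacute; to é" concrete, Python's standard html module performs the same kind of conversion. This is only an illustration of the idea, not a claim about how Twitter actually implements it; the wrapper name decode_entities is hypothetical.

```python
import html


def decode_entities(text: str) -> str:
    """Replace named (&eacute;) and numeric (&#x0909;, &#233;) entity codes
    with the raw Unicode characters they stand for."""
    return html.unescape(text)


print(decode_entities("&eacute; &#x0909; &#233;"))  # é उ é
```

Text that already contains raw characters passes through unchanged, which is what lets a service accept either format.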

Accepting either format is probably a pain to program, but very nice for the user. Having to remember when to enter entity codes and when to enter raw Unicode is confusing, but still an all-too-common reality. I appreciate Twitter for making the transition a little easier...even if it's only for 140 characters.

Screen capture of Twitter messages with Unicode Characters in test messages


Arial Unicode on OS X (Leopard)


I was able to upgrade to Leopard recently on my Mac which means I'm able to manipulate a working version of Arial Unicode MS for the Mac...yeah.

Web Display

My blog actually switched to Arial Unicode because of the way I had coded the CSS. It was very legible, but the x-height seemed smaller in comparison to the Apple Lucida Grande - so I reordered the priority. I will have to see if I can download Lucida Grande onto Windows via the Windows Safari download.

Back to the Logic Symbols in Word

Most of my recent Unicode adventures have been about inserting logic symbols like (∨,∧,⊃) into Word (and later Excel). My main struggle has been that if I insert them from the Character Palette, the font switches to Symbol... which is OK until I start typing English. At that point I will stop outputting the English alphabet and σταρτ ουτπυτιν τηε γρεεκ αλπηαβετ. Greek is great...unless you're typing English text. I was using the left arrow key quite a bit.

Now that Microsoft has developed a working version of Arial Unicode MS for the Mac, I can input the symbols without switching over to Greek. The only gotcha is that I have to shift old logic symbols out of their pre-Arial Unicode fonts (thank goodness for keyboard shortcuts). What I'm hoping is that I can bypass the big font switch in Windows Word too.

So I'm happy to say that we're adding another small step towards Unicode compatibility. Finally I can have logic symbols in a non-Greek, non-Japanese, non-Chinese font!


Microsoft Word ∧ Logic: Inserting the Right Code Point


The Insert Symbol Tool in Word

As I said in my last entry, I'm working on a symbolic logic course and am learning new quirks for dealing with Unicode logic symbols...and one of them apparently is the Microsoft Word Insert Symbol tool (found by going to Insert » Symbol in most versions of Word).

Like the Windows Character Map and Mac Character Palette, the Insert Symbol tool lets you insert single characters into a document so you can change "P implies Q" to the logical formulation P ⊃ Q or P → Q depending on your symbolism (and you can also switch between "P and Q," P & Q or P ∧ Q).

But...unlike the Windows Character Map/Mac Character Palette, the Insert Symbol tool can take you on a little detour out of standard Unicode and into the Private Use Area block, the block where vendors can define their own characters. For instance, when I tried to insert the character ∩ (intersection) into a document, I noticed that the Insert Symbol palette gave a code point of U+F0C7 instead of the expected U+2229, and yes, a U+F0xx code is a sign that you are in the Private Use Area.
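If you have the text of a document in hand (for instance from a plain-text or HTML export; getting it out of the .doc/.docx file is not covered here), you can check for stray Private Use characters with a few lines of Python. Unicode assigns every Private Use character the general category "Co", so the check does not need to hard-code the U+E000 to U+F8FF range. The function name private_use_chars is hypothetical.

```python
import unicodedata


def private_use_chars(text: str) -> list[tuple[int, str]]:
    """Return (position, "U+XXXX") for every Private Use Area character.

    Unicode gives these characters the general category "Co"; this covers
    the U+F0xx codes that Word's Insert Symbol tool uses for the Symbol font.
    """
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"
    ]


print(private_use_chars("P \uf0c7 Q"))  # [(2, 'U+F0C7')]
print(private_use_chars("P \u2229 Q"))  # []
```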

Screen capture of the Word Insert Symbol palette on a Mac

First I should say that there is a rationale for this. You'll notice that the font in the graphic is set to "Symbol," an older pre-Unicode font which was used to insert lots of special mathematical symbols. The Private Use set-up undoubtedly prevents a lot of older documents from breaking.

So What?

If all you're doing is working in Word, the Insert Symbol tool may still work for you. But these days, more and more documents are destined for the Web or some other format...and not all tools recognize the Microsoft Private Use codes.

The way I first noticed that the logic symbols weren't standard Unicode was that some of them did not "convert" well to HTML in Course Genie but mysteriously became things like "(". The symbols I had inserted the standard Unicode way converted properly, but not the ones inserted with the Word Insert Symbol tool. Ugh.

The use of proper Unicode versus an older format does have a real world impact.
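For a document that already contains Symbol-font Private Use codes, one possible cleanup is a find-and-replace table. The sketch below maps a handful of logic symbols from their U+F0xx positions to standard Unicode. The table is partial and is based on my reading of the Adobe Symbol font encoding (where, for example, position C7 is intersection), so verify it against a full Symbol-to-Unicode table before relying on it. The names SYMBOL_PUA_TO_UNICODE and fix_symbol_pua are hypothetical.

```python
# Partial, illustrative map from Word's Symbol-font Private Use code points
# (U+F0xx) to standard Unicode, following the Adobe Symbol encoding.
SYMBOL_PUA_TO_UNICODE = {
    0xF022: "\u2200",  # ∀ for all
    0xF024: "\u2203",  # ∃ there exists
    0xF0AE: "\u2192",  # → right arrow
    0xF0C7: "\u2229",  # ∩ intersection
    0xF0C8: "\u222A",  # ∪ union
    0xF0C9: "\u2283",  # ⊃ superset (horseshoe)
    0xF0D8: "\u00AC",  # ¬ not
    0xF0D9: "\u2227",  # ∧ and
    0xF0DA: "\u2228",  # ∨ or
}


def fix_symbol_pua(text: str) -> str:
    """Swap known Symbol-font PUA characters for their Unicode equivalents.

    Characters not in the table (including other PUA codes) are left as-is.
    """
    return text.translate(SYMBOL_PUA_TO_UNICODE)


print(fix_symbol_pua("P \uf0c9 Q"))  # P ⊃ Q
```

Leaving unknown codes untouched is deliberate: it is safer to flag them for a human than to guess.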

Summary

To avoid the Private Use codes in new Word documents, always use the Windows Character Map or Mac Character Palette. On Windows, you may need to switch the font to Arial Unicode MS.

Or if you're especially insane, you can develop your own logic symbol keyboard utility.


Course Genie and Unicode: A–


Since my day job is online course development, I get to work with a lot of academic tools, including my newest one, the Course Genie (or Wimba Create) Word plugin.

This is a tool which takes a Word file "injected" with the right styles and converts a long Word manuscript into a set of well-formed HTML documents complete with table of contents page and page navigation. Even if you don't insert any self-test quizzes, this is a major time saver. But...can it do Unicode?

For once, this is a real issue, since the course I'm working on is symbolic logic and uses plenty of specialized symbols like ∪, ∩, ∃x, ∀x and so forth. So far I've been pleasantly surprised to discover that the Course Genie planners did think ahead and implemented decent Unicode strategies.

The good news is that if your instructor (aka subject matter expert) hands you a Word file including these symbols, you may not have to do much other than make sure that the symbols were inserted from the Character Map and not from an old custom font. By default, Course Genie will either convert these to numeric codes...or, if you select a special UTF-8 theme, even include the UTF-8 meta tag.

For most modern browsers this is sufficient. The only gotcha is that it sets everything to Verdana text (even the symbols) and IE 5/6 acts a little strange when fonts for special characters are pre-specified for Arial Unicode.

The other complaint is that most theme settings insert the ISO-8859-1 (Latin-1) encoding meta tag instead of UTF-8...EVEN THOUGH the base XML file is UTF-8. Unless you know to select a UTF-8 theme, you won't get the UTF-8 meta tag. Not only does this make me nervous on principle, but it means that you have to be extra careful if you ever edit the files in another program like Dreamweaver.
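Before opening exported pages in another editor, it can help to check which encoding each page actually declares. The Python sketch below pulls the charset out of the first meta tag with a regular expression. It is a rough check under stated assumptions: it is not a full HTML parser, and browsers also weigh HTTP headers and byte-order marks. The function name declared_charset is hypothetical.

```python
import re
from typing import Optional

# Matches both <meta charset="utf-8"> and
# <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">.
_CHARSET_RE = re.compile(r"""<meta[^>]*charset\s*=\s*["']?([\w.:-]+)""", re.IGNORECASE)


def declared_charset(html_text: str) -> Optional[str]:
    """Return the charset named in the first matching meta tag, lowercased,
    or None if the page declares no charset."""
    match = _CHARSET_RE.search(html_text)
    return match.group(1).lower() if match else None


page = '<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">'
if declared_charset(page) not in ("utf-8", "utf8"):
    print("Not declared as UTF-8:", declared_charset(page))
```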


Igbo in Facebook - It Can Be Done (But Numeric Code Breaks)


How does Facebook handle accents? Pretty well, actually, but you can't use the numeric code. Instead, you have to insert the character directly, either by typing it with an Igbo keyboard or via the Windows Character Map or Mac Character Palette.

For Web 1.0, the safest way to display accented letters was with numeric entity codes. For instance, if I wanted to display Ụwa, I might write &#7908;wa within the HTML document. The codes were safer because they would work even if a developer forgot to include the UTF-8 meta tag.
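Generating those numeric codes by hand means looking up each code point. Python's built-in "xmlcharrefreplace" error handler does the conversion for you: encoding to ASCII replaces every non-ASCII character with a decimal numeric entity. The wrapper name to_numeric_entities is hypothetical.

```python
def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric entity
    (&#NNNN;) and leave plain ASCII text unchanged."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_numeric_entities("\u1ee4wa"))  # &#7908;wa
```

Note that this does not escape characters like < or &, so it is for converting text content, not for sanitizing markup.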

In a Web-based form, the rules may differ depending on how the developer configured the service. In some forms, you MUST enter the numeric code (often because the UTF-8 tag is missing). In other cases you CANNOT use the numeric code; this is true when you are entering data into a text field which will not go through any HTML formatting schemes. As long as the output has the UTF-8 meta tag (and Facebook's does), you can skip the numeric code (i.e., enter a "raw" accented letter) and still be OK.

How can you tell? Unfortunately, you have to test each application one by one. As I've commented before, applications which truly expect to support a global audience are generally UTF-8 ready and you can skip the numeric code. This includes Facebook, MovableType, iTunes, Google Maps, Twitter and so forth.

Being able to skip the numeric code is a positive sign (why memorize numbers when you can type?), but as with all change, there will be some old habits to break.


EndNote and Unicode Input


EndNote has supported Unicode since version 8, but I'm just now getting around to testing it. As long as you activate the appropriate keyboard, Unicode input appears to be fairly straightforward, at least on the Mac (and I assume the PC).

Just a font note: I ended up changing my default in the preferences from Helvetica to Lucida Grande. EndNote does change fonts with script, but Lucida Grande is legible and has a large set of Unicode characters. The equivalent font on the PC is Arial Unicode MS.

FYI - there may be some quirks for export. See the University of Sydney's EndNote documentation for details.

One nice feature they point out is the "Translated Author" and "Translated Title" fields.


Google Maps: Officially Unicode Friendly


I've been playing around with adding markers to Google Maps and I can report that so far it appears to be pretty Unicode friendly. At least I was able to input phonetic characters and I've seen multiple character sets in use for different maps.

I wish all apps were this easy...


About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage (ejp10@psu.edu) for a profile.

Comments

The standard commenting utility has been disabled due to hungry spam. If you have a comment, please feel free to drop me a line at (ejp10@psu.edu).
