JAWS 13 and Phonetic Symbols

As a linguist, I work with lots of exotic symbols, but only a small percentage of them are recognized by the standard U.S. version of JAWS. If you work with phonetic symbols like /ə, ʃ, ʒ, ɰ/, you will need to tweak your pronunciation files.

I wrote about this in an earlier post on JAWS 6, but today I was able to implement and document the fix, so I thought I would share the procedure.

The fix I am using expands the symbol set within JAWS so that a character like /ə/ will be read as "schwa" (but not as its phonetic value of "uh"). Ideally, it would be nice to have a pronunciation engine that emulates phonetic values, but let's take this one problem at a time.

SBL Files

JAWS includes a set of symbol (.sbl) files that match punctuation and symbol characters with a "word" (e.g., ? = "question mark"). The key is to add each new character and its reading to your working files.

Luckily, there is a phonetic symbol .sbl file from Robert Englebretson. There's also a math symbol .sbl file from Carroll Tech.

Add Characters to Symbol File

This procedure assumes that JAWS is using the Eloquence engine, in which case the key file to change is eloq.sbl. You will also need to have an Admin account to implement the changes.

Note: SBL files can be opened in any text editor such as Notepad.

  1. Open or download the phonetic symbol .sbl file.
  2. Find the location of your eloq.sbl file. Mine was in the following path on my C hard drive:
    C:\Users\All Users\Freedom Scientific\Jaws\13.0\Settings\enu\eloq.sbl
  3. Make a copy of this file and rename it eloqOld.sbl. This is your backup in case something goes wrong.
  4. Make a second copy and rename it eloqNew.sbl. This is a temporary file to edit, since you may not be able to edit eloq.sbl directly.
  5. Open eloqNew.sbl in a text editor such as Notepad. This file contains pronunciation values for multiple languages. Scroll to the language you normally use (e.g., "[American English]").
  6. Scroll to the end of the symbol list for that language.
  7. Copy the list of symbols from one of the other .sbl files and paste it immediately after the final line in the list. Each symbol will be on a single line and have the format U+0001=character name
    Note: Don't worry if the format does not match the rest of the symbol list.

  8. Repeat the last step for each language you want to support. You can translate character names as needed for each language. Save and close the file.
  9. Exit JAWS if it is open.
  10. Delete eloq.sbl. You may be asked for an admin password at this point.
  11. Rename eloqNew.sbl as eloq.sbl.
  12. Restart JAWS and test it on a page such as IPA Characters based on Letter A with Numeric Codes.
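If you maintain several machines, steps 5 through 8 could also be scripted. The following is a minimal Python sketch, not part of the procedure above, and it rests on an assumption I have not verified: that eloq.sbl is laid out like an INI file, where each language section starts with a bracketed header such as "[American English]" and runs until the next line beginning with "[". The function name add_symbols_to_section is hypothetical. Compare the result against your eloqOld.sbl backup before relying on it.

```python
def add_symbols_to_section(sbl_text: str, section: str, new_lines: list[str]) -> str:
    """Insert new_lines after the last entry of the [section] block.

    Assumption (not verified against the real eloq.sbl): a section begins at a
    line reading "[section]" and ends at the next line that starts with "[".
    """
    lines = sbl_text.splitlines()
    header = f"[{section}]"
    start = next((i for i, line in enumerate(lines) if line.strip() == header), None)
    if start is None:
        raise ValueError(f"section {header} not found")

    # The section ends at the next "[...]" header or at the end of the file.
    end = start + 1
    while end < len(lines) and not lines[end].lstrip().startswith("["):
        end += 1

    # Back up over trailing blank lines so new entries follow the last symbol line.
    insert_at = end
    while insert_at > start + 1 and not lines[insert_at - 1].strip():
        insert_at -= 1

    # Skip entries already in the section so running the script twice is harmless.
    existing = {line.strip() for line in lines[start + 1:end]}
    additions = [line for line in new_lines if line.strip() not in existing]
    lines[insert_at:insert_at] = additions
    return "\n".join(lines) + "\n"


sample = "[American English]\nU+0021=bang\n\n[British English]\nU+0021=exclamation\n"
print(add_symbols_to_section(sample, "American English", ["U+0259=schwa"]))
```

To use it on eloqNew.sbl, read the file with whatever text encoding it already uses (check this first, since I don't know what Freedom Scientific uses), pass the text through the function once per language, and write it back with the same encoding. The function returns "\n" line endings; writing in Python text mode on Windows turns these back into CRLF.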

Look Up Additional Codes

Each line in the SBL file has this format:

U+Codepoint=Character Name (no quotes)

For instance, if I wanted to expand the repertoire of currency symbols to include the new rupee symbol of India (₹), I would add the following line to my .sbl file:

U+20B9=Rupee symbol of India
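If you would rather not look up code points in a chart, Python's standard unicodedata module can build these lines for you. The sketch below is a hypothetical helper (sbl_line is my name, not part of JAWS). When no spoken name is supplied, it falls back to the official Unicode character name in lowercase, which may or may not read well in your speech engine.

```python
import unicodedata
from typing import Optional


def sbl_line(char: str, spoken_name: Optional[str] = None) -> str:
    """Build a "U+Codepoint=Character Name" line for a single character."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    # Fall back to the official Unicode name (e.g. "LATIN SMALL LETTER SCHWA"),
    # lowercased because it will be spoken. unicodedata.name raises ValueError
    # for characters without a name, such as most control characters.
    name = spoken_name or unicodedata.name(char).lower()
    # Code points are written as at least four uppercase hex digits.
    return f"U+{ord(char):04X}={name}"


print(sbl_line("\u20b9", "Rupee symbol of India"))
print(sbl_line("\u0259"))
```

The first call reproduces the rupee line above; the second produces a line for schwa using its Unicode name.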

A list of Unicode charts with code points is available at http://www.unicode.org/charts

Testing Some MP3 Sites with Halfaxa Titles

Unicode is such an esoteric subject that you sometimes wonder who's seeing the possibilities. One artist who does appreciate it is Canadian electronic musician Grimes, whose album Halfaxa contains song titles such as "ΔΔΔΔRasikΔΔΔΔ", "Sagrad Прекрасный" and "† River †", along with the charmingly titled "World♡Princess" and the mathematically complex "≈Ω≈ω≈ω≈ω≈ω≈ω≈ω≈ω≈" (Αlmost Omega?).

That makes this album a great test case for checking how well your MP3 or streaming service handles Unicode. As you can see below, iTunes and Rhapsody do well, but for some reason Amazon is giving me the Unicode question mark of death (my guess is that it's because the page specifies Verdana, which doesn't have all the characters).

I haven't tested every music site, but you get the idea...

iTunes Halfaxa List

Halfaxa album list on iTunes with correct symbols

Rhapsody Halfaxa List

Halfaxa album list on Rhapsody with correct symbols

Amazon Halfaxa List (Verdana Type)

Halfaxa album list on Amazon with ?? for symbols
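To know exactly what to look for when testing a site, it can help to list the characters in a title that fall outside Latin-1, since those are the ones most likely to turn into question marks. This is a minimal Python sketch using the standard unicodedata module; the helper name beyond_latin1 is hypothetical, and the titles are written with escape codes only to avoid copy-and-paste ambiguity.

```python
import unicodedata


def beyond_latin1(title: str) -> list[str]:
    """Return "U+XXXX NAME" for every character above U+00FF in the title."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in title
        if ord(ch) > 0xFF
    ]


for title in ("World\u2661Princess", "\u2020 River \u2020"):
    print(title, beyond_latin1(title))
```

Whether a given site shows these characters still depends on its fonts and encoding, so the list is only a checklist for eyeballing the page.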

Converting Numeric Entity Codes Back to Text

I got a technical question recently that I thought I would share.

Not so long ago in the history of Web development, the safest way to display non-Western text was to use numeric entity codes. For instance, one course management system would convert Cyrillic text like Україна (Ukraine) to a series of numeric codes like:

&#x423;&#x43A;&#x440;&#x430;&#x457;&#x43D;&#x430;

This is fine for single words and short phrases, but it is unwieldy for an entire page, especially if you want to edit it.

Fortunately, there is a quasi-fix for this if you need to replace numeric codes with real text. That is:

  1. Open your page in a browser that renders the entity codes as the correct text.
  2. Copy the displayed text and paste it into another file. It will be pasted as real text.
  3. Put the text back into your HTML source.

It's a little tedious, but since I couldn't quickly find a better tool for this, it is a decent stopgap. At least you won't have to retype everything...
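If you are comfortable with a little scripting, the same conversion can be done directly on the HTML source. This is a minimal Python sketch, not a tool from the original question; decode_numeric_entities is a hypothetical name. It deliberately converts only non-ASCII numeric references, so entities such as &#60; (<) or &amp; that protect the markup itself are left alone.

```python
import re

# Decimal (&#1059;) or hexadecimal (&#x423;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def decode_numeric_entities(source: str) -> str:
    """Turn non-ASCII numeric entity codes in HTML source back into literal text."""

    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Keep ASCII references such as &#60; (<) as they are, so decoding never
        # changes the markup, and skip surrogates or out-of-range code points.
        if cp < 0x80 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return match.group(0)
        return chr(cp)

    return NUMERIC_REF.sub(replace, source)


print(decode_numeric_entities("&#x423;&#x43A;&#x440;&#x430;&#x457;&#x43D;&#x430;"))
```

Read and write the HTML file as UTF-8, and make sure the page declares UTF-8 (for example with <meta charset="utf-8">); otherwise the decoded characters will not display correctly.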

Unicode and WCAG 2.0 (Accessibility)

Unicode is incorporated into multiple standards such as RSS (newsfeeds) and MathML. It is also incorporated into the newest WCAG 2.0 (Web Content Accessibility Guidelines) standard in some interesting ways.

Text not Image

One guideline of particular interest is Guideline 1.4.5:

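WCAG Guideline 1.4.5 Images of Text: If the technologies being used can achieve the visual presentation, text is used to convey information rather than images of text except for the following: Customizable (the image of text can be visually customized to the user's requirements) and Essential (a particular presentation of text is essential to the information being conveyed). (Level AA)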

In other words, it is generally better to use CSS plus actual text to present textual information, even when it is stylized. Unicode is what makes this possible for characters beyond ASCII or Latin-1.

There are two reasons for this guideline. First, if a screen reader has text available, the developer does not need to include any additional information such as image ALT text. Second, text tends to be more flexible across devices. In particular, it can be zoomed without becoming pixelated (appearing jagged at large sizes), and its format can be changed without information loss (say, flipping from black text on white to white text on black, a format preferred by some users).

Right-to-Left Mark

A second relevant guideline is:

WCAG Guideline 1.3.2: When the sequence in which content is presented affects its meaning, a correct reading sequence can be programmatically determined. (Level A)

An important concept for RTL (right-to-left) languages is ensuring that text is stored in logical order, so that characters stay in their correct reading sequence even though they are displayed "backwards" from the more common LTR order. WCAG therefore also recommends logical order and mentions the Unicode RLM (right-to-left mark) and LRM (left-to-right mark) characters.
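The marks are invisible characters with a strong direction, which is what lets them nudge the Unicode bidirectional algorithm. The classic case is "C++" inside a Hebrew sentence: the "++" sits between the Latin "C" and the following Hebrew text, so it takes the paragraph's right-to-left direction and shows up as "++C". A minimal Python sketch of the fix, where anchor_ltr is a hypothetical helper name:

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK: invisible, strongly left-to-right
RLM = "\u200f"  # RIGHT-TO-LEFT MARK: invisible, strongly right-to-left


def anchor_ltr(fragment: str) -> str:
    """Append an LRM so trailing neutral characters stay with an LTR fragment.

    With the LRM after "C++", the "++" has strong left-to-right characters on
    both sides, so it is displayed as part of the Latin run.
    """
    return fragment + LRM


word = "\u05e9\u05dc\u05d5\u05dd"  # Hebrew "shalom", stored in logical (reading) order
for ch in (word[0], LRM, RLM):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: bidi class {unicodedata.bidirectional(ch)}")
```

Note that the first stored character of the Hebrew word is its first letter in reading order (shin), even though it is displayed rightmost. In HTML, the dir attribute is an alternative worth considering before reaching for invisible marks.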

Language Tags

A final i18n technology mandate of WCAG 2.0 is the use of language tags.

WCAG Guideline 3.1.1: Language of Page: The default human language of each Web page can be programmatically determined.

In other words, use language tags to identify the page language. This is especially important for screen readers, which need to switch pronunciation engines between languages.
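As a quick audit for this guideline, the sketch below uses Python's standard html.parser module to report the lang attribute on a page's html element. The names are hypothetical, and it checks only the page default; language changes inside the page (marked with lang on other elements) are not examined.

```python
from html.parser import HTMLParser
from typing import Optional


class _LangFinder(HTMLParser):
    """Remember the lang attribute of the first <html> start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag == "html" and self.lang is None:
            # Treat a missing or empty lang attribute as "no language".
            self.lang = dict(attrs).get("lang") or None


def page_language(source: str) -> Optional[str]:
    finder = _LangFinder()
    finder.feed(source)
    finder.close()
    return finder.lang


print(page_language('<!DOCTYPE html><html lang="uk"><head></head><body></body></html>'))
```

A result of None means a screen reader has to guess the language of the page.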

There have been several debates about the utility of WCAG 2.0, but I am reassured that at least the needs of multilingual users have been considered.

Math+HTML 5 in 3 Browsers

As you can see, I haven't been posting here regularly. That's because I've been tied up with a11y (accessibility) work, including MathML.

However, I am happy to report that I was able to create an HTML5+MathML file that works in Internet Explorer AND Firefox/Safari (with some Unicode thrown in).

As a reward, I think I will write a Unicode post today.

Unicode 6.1 Additions

The Unicode standard was just updated to version 6.1, and that means new blocks and characters.

New Blocks

Blocks added include Miao (a script developed for the Hmong/Miao languages), Meroitic Hieroglyphs and Meroitic Cursive (adaptations of Egyptian hieroglyphs from ancient Meroë in what is now northern Sudan) and multiple scripts from India (Sora Sompeng, Chakma, Sharada, Takri).

Two new blocks for the Arabic script were also added: Arabic Mathematical Alphabetic Symbols and Arabic Extended-A. Extensions for the Sundanese and Meetei Mayek scripts were also added.

New Characters

The Unicode Consortium has an index of which new characters have been added to different scripts.

Unicode 3Play - Yammer, Google Earth, iBooks Author

I was so excited to get some time to test new tools that I tested three for basic Unicode support. My updates:

Yammer - Pass

Yammer is a service similar to Twitter but with more tools suitable to a corporate environment. I posted some text with obscure phonetic characters and some Devanagari, and results were generally good.

This was done on a Mac via both the Web site interface and the desktop client. It seemed low-fuss enough that I suspect support is good in most configurations. Note that third-party clients are always an unknown. For instance, although Twitter also has excellent Unicode support, some of the third-party viewers were pretty bad.

iBooks Author - Pass

Most apps from Apple have good Unicode support, and this one is no different. My only concern here is font control. It looks like you can define new styles based on pre-existing formatted text, but you can't really edit existing ones.

One non-Unicode gripe is that some styles had small caps and I was not able to disable them. It may not be a showstopper in most docs, but not all scripts include small caps (or even distinguish capital from lowercase).

I gather that the format generated is a form of XML (per Alan Quarterman) with HTML features and CSS, but the CSS is hard to edit directly. Whenever you leave the Western alphabet and have few controls over font presentation, it's time to be nervous.

Google Earth - Pass, but Slightly Temperamental

For the record, I am in love with Google Earth as a teaching tool. However, entering data was tricky.

The keyboard methods seem to work fine for entering text for items such as new locations. However, I had problems using the Character Viewer (OS X 10.7). I would double-click the symbol and nothing would happen ;(. Then again, the problem could be the new Character Viewer, although it seems to work with Yammer.

In any case, this could be an issue if a user is trying to use a cute emoji symbol. Cut and paste from another document did appear to work.

Understanding the Character Viewer in OS X 10.7, Lion

For Unicode fans, one of the bigger and more useful changes in the Mac OS X 10.7 Lion operating system is the updated Character Viewer. Despite the improvements, though, it's different enough that I think some documentation is worth posting.

How to Access

As with previous versions, the Character Viewer is activated in the Language & Text section of the System Preferences menu. See http://tlt.its.psu.edu/suggestions/international/keyboards/charpalosx.html for details.

How to Find Characters

By default, the Viewer only gives options for Symbols (e.g., "Arrows, Punctuation, Currency Symbols, Emoji"). I actually like how the symbols have been organized into semantic groups rather than by numeric block. However, the list does not include all the scripts I need to access.

Fear not, though: the other blocks are available. To view them, click the Gear icon, then select Customize List. This opens a pop-up window listing all available symbol lists, including Unicode, which is the entire list organized by Unicode block.

Character Viewer with Customize options open and multiple scripts checked

How to Insert

The basic mechanism is to highlight the character you want to insert. Previous versions of this tool had an Insert button, but it has disappeared.

In this version, you need to do the following.

  1. Place your cursor at an appropriate insertion point in your document.
  2. Open the Character Viewer.
  3. Find and highlight your character.
  4. Double click the character. It will be inserted into your document.
    Note: You can also drag and drop characters into your document.

Favorites

A feature that I am now using is the Favorites list, a place to list commonly used characters to insert. This puts everything in one list which you can order as needed.

You can add Favorites by selecting a character and either clicking the Add to Favorites button or dragging it into the Favorites list.

Conclusion

Although I was a little confused by the new Lion Character Viewer, I actually think it was a good overhaul. It suits the average user who needs access to common symbols and emoji, but also gives more dedicated users like us access to what we need.

A final improvement worth noting is that the Font Variation feature is much more stable than in previous versions. This is perfect for times when you need to debug a weird character/font combination.

Free ErlerDingbats Unicode Font for 2700 Block

If you've ever wanted Unicode support for snowflakes, decorative arrows, crosses and stars, then you may be interested in the free Erler Dingbats font from the Font Shop. The fonts even ship with keyboard layouts to make data entry easier.

The image below shows roughly the glyphs covered (generally in black and white). More characters are covered in the for-fee font DD Dingbats 2.0, but even these provide some interesting possibilities in terms of documentation and even fancy bullet lists (especially if combined with font embedding).

Unicode Block U+2700-U+27BF
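If you are hunting for bullet characters in this block, Python's standard unicodedata module can search it by character name. This is a minimal sketch; find_dingbats is a hypothetical helper, and whether a character actually displays still depends on having a font, such as Erler Dingbats, that covers it.

```python
import unicodedata

DINGBATS = range(0x2700, 0x27C0)  # Dingbats block, U+2700..U+27BF


def find_dingbats(keyword: str) -> list[str]:
    """Return assigned Dingbats-block characters whose Unicode name contains keyword."""
    keyword = keyword.upper()
    # unicodedata.name(..., "") returns "" for unassigned code points, so they never match.
    return [chr(cp) for cp in DINGBATS if keyword in unicodedata.name(chr(cp), "")]


for ch in find_dingbats("snowflake"):
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")
```

The same search with "cross", "star" or "arrow" covers the other shapes mentioned above.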

Got Double Hyphens from Word?

Unicode hasn't been part of my life enough recently, but it did emerge in a very unexpected way this week during a calendar upgrade.

One of the conversion tasks was to add group e-mail addresses so we could share calendars with each other efficiently. But when I tried to copy and paste one, I got a "not found" error. Here is one of these addresses (altered for security reasons):

umg-sc.foo.staff@fuyu.ucal.psu.edu

Can you spot the problem (HINT: Try cutting and pasting into a text file).

Given up? The problem is the hyphen. In the right font, you will see that it's not a plain hyphen (U+002D or ASCII #45), but the more elegant and slightly longer en dash, which is U+2013 (not in ASCII). As many of you know, many databases are still sensitive to such differences, so a hyphen is just not the same as an en dash. This means searching is a FAIL.

How did the en-dash get in there if it's outside of ASCII? My guess is that it's a result of an auto-correct feature from Word which makes some formatting tweaks to enhance visual appeal. One is to change plain hyphens into a slightly longer en-dash (more favored by typographers).

Another common change is to convert plain straight quotes (" at U+0022 or ASCII #34) to "Smart Quotes" like (“ at U+201C) and (” at U+201D). Copying HTML code attributes from Word can be similarly dangerous since HTML recognizes plain quotes, but NOT fancy double quotes. Most of the time, the change does nothing, but when it comes to interacting with some systems, the reformatting makes a difference in a very annoying way.

How to catch it? In some cases, you can change the font, but many fonts make the hyphen and en dash appear identical (Arggh!). That leaves the old standby (test, test, test) plus some Unicode awareness (which is increasing among programmers).
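For the "Unicode awareness" part, a small script can flag characters that look like ASCII but are not. This is a minimal Python sketch using the standard unicodedata module; the helper names are hypothetical, and the ASCII_FALLBACK mapping is my own illustrative choice covering the Word substitutions described here plus a few relatives (em dash, smart single quotes, no-break space). Apply the fallback only where plain ASCII is actually required, such as e-mail addresses or HTML attribute quotes, not to ordinary prose.

```python
import unicodedata

# Word-style "smart" characters mapped back to plain ASCII.
ASCII_FALLBACK = {
    "\u2013": "-",  # EN DASH
    "\u2014": "-",  # EM DASH
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
    "\u00a0": " ",  # NO-BREAK SPACE
}


def report_non_ascii(text: str) -> list[tuple[int, str, str]]:
    """List (position, code point, name) for every non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) > 0x7F
    ]


def to_plain_ascii_punctuation(text: str) -> str:
    """Replace the smart punctuation in ASCII_FALLBACK with ASCII equivalents."""
    return text.translate(str.maketrans(ASCII_FALLBACK))


address = "umg\u2013sc.foo.staff@fuyu.ucal.psu.edu"  # en dash pasted from Word
print(report_non_ascii(address))
print(to_plain_ascii_punctuation(address))
```

The report makes an invisible difference visible even when the font draws the hyphen and en dash identically.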

About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage (ejp10@psu.edu) for a profile.

Comments

The standard commenting utility has been disabled due to hungry spam. If you have a comment, please feel free to drop me a line at (ejp10@psu.edu).
