Recently in South Asian Category

JAWS 13 and Phonetic Symbols


As a linguist, I work with lots of exotic symbols, but only a small percentage of them are recognized by the standard U.S. version of JAWS. If you work with phonetic symbols like /ə, ʃ, ʒ, ɰ/, you will need to tweak your pronunciation files.

I wrote about this in an earlier post on JAWS 6, but today I was able to document and implement a fix, so I thought I would share the procedure.

The fix I am using expands the symbol set within JAWS so that a character like /ə/ will be read as "schwa" (but not as its phonetic value of "uh"). Ideally, there would be a pronunciation engine that emulates phonetic values, but let's take this one problem at a time.

SBL Files

JAWS includes a set of symbol (.sbl) files which match punctuation and symbol characters with a "word" (e.g., ? = "question mark"). The key is to add the character and its reading to your working files.

Luckily, there is a phonetic symbol .sbl file from Robert Englebretson. There's also a math symbol .sbl file from Carroll Tech.

Add Characters to Symbol File

This procedure assumes that JAWS is using the Eloquence speech engine, in which case the key file to change is eloq.sbl. You will also need an administrator account to implement the changes.

Note: SBL files can be opened in any text editor such as Notepad.

  1. Open or download the phonetic symbol .sbl file.
  2. Find the location of your eloq.sbl file. Mine was in the following path on my C hard drive:
    C:\Users\All Users\Freedom Scientific\Jaws\13.0\Settings\enu\eloq.sbl
  3. Make a (second) copy of this file and rename it eloqOld.sbl. This is your backup in case something goes wrong.
  4. Make a third copy and rename it eloqNew.sbl. This is a temporary file to edit, since you may not be able to edit eloq.sbl directly.
  5. Open eloqNew.sbl in a text editor such as Notepad. This file contains pronunciation values for multiple languages. Scroll to the language you normally use (e.g. "[American English]").
  6. Scroll to the end of the symbol list for that language.
  7. Copy and paste the list of symbols from one of the other .sbl files immediately after the final line in the list. Each symbol is on its own line, in the format: U+0001=character name
    Note: Don't worry if the format does not match the rest of the symbol list.

  8. Repeat the last step for each language you want to support. You can translate character names as needed for each language. Save and close the file.
  9. Exit JAWS if it is open.
  10. Delete eloq.sbl. You may be asked for an admin password at this point.
  11. Rename eloqNew.sbl as eloq.sbl.
  12. Restart JAWS and test on a page such as IPA Characters based on Letter A with Numeric Codes.
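For anyone who would rather script steps 5 through 8 than paste by hand, here is a minimal Python sketch. It assumes, based only on the layout described above, that eloq.sbl is an INI-like text file in which each language section starts with a bracketed header such as "[American English]" and runs until the next bracketed header. The function name append_to_section is hypothetical, and I have not checked the file's actual encoding, so read and write eloqNew.sbl with whatever encoding your copy really uses.

```python
import re
from typing import List

# Assumption: a section header is a whole line like "[American English]".
HEADER = re.compile(r"^\[[^\]=]+\]$")


def append_to_section(text: str, section: str, new_lines: List[str]) -> str:
    """Insert new_lines right after the last non-blank line of [section]."""
    lines = text.splitlines()
    header = f"[{section}]"
    starts = [i for i, line in enumerate(lines) if line.strip() == header]
    if not starts:
        raise ValueError(f"section {header} not found")
    start = starts[0]

    # The section runs until the next header line (or the end of the file).
    end = start + 1
    while end < len(lines) and not HEADER.match(lines[end].strip()):
        end += 1

    # Back up over trailing blank lines so new entries follow the last symbol.
    insert_at = end
    while insert_at > start + 1 and not lines[insert_at - 1].strip():
        insert_at -= 1

    # Skip entries that are already present so a rerun adds no duplicates.
    existing = {line.strip() for line in lines[start + 1:end]}
    additions = [line for line in new_lines if line.strip() not in existing]

    return "\n".join(lines[:insert_at] + additions + lines[insert_at:]) + "\n"
```

Working on a string rather than on the file itself leaves eloq.sbl untouched until you do the rename in step 11, and skipping existing lines means running the script twice won't duplicate entries.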

Look Up Additional Codes

Each line in the SBL file has this format:

U+Codepoint=Character Name (no quotes)

For instance, if I wanted to expand the repertoire of currency symbols to include the new rupee symbol of India (₹), I would add the following line to my .sbl file:

U+20B9=Rupee symbol of India

A list of Unicode charts with code points is available at http://www.unicode.org/charts.
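The U+Codepoint=Character Name lines can also be generated rather than typed. This Python sketch uses the standard unicodedata module to build them, falling back on the official Unicode character name when you don't supply a friendlier spoken name. The function names sbl_line and sbl_lines are hypothetical, and writing every code point as at least four uppercase hex digits (as in U+20B9) is an assumption based on the examples above.

```python
import unicodedata
from typing import List, Optional


def sbl_line(char: str, spoken: Optional[str] = None) -> str:
    """Build one symbol-file entry in the U+Codepoint=Character Name form."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    # Fall back on the official Unicode name, e.g. "LATIN SMALL LETTER SCHWA".
    name = spoken if spoken else unicodedata.name(char).capitalize()
    # At least four uppercase hex digits, matching entries like U+20B9.
    return f"U+{ord(char):04X}={name}"


def sbl_lines(symbols: str) -> List[str]:
    """One entry per non-space character, using the default names."""
    return [sbl_line(c) for c in symbols if not c.isspace()]


if __name__ == "__main__":
    print(sbl_line("\u20b9", "Rupee symbol of India"))
    for line in sbl_lines("ə ʃ ʒ ɰ"):
        print(line)
```

Running it prints the rupee line shown above, followed by entries such as U+0259=Latin small letter schwa. You would probably still shorten the defaults to "schwa", "esh" and so on for a quicker reading.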


Unicode 6.1 Additions


The Unicode standard was just updated to version 6.1, and that means new blocks and characters.

New Blocks

Blocks added include Miao (a script developed for the Hmong/Miao languages), Meroitic Hieroglyphs and Meroitic Cursive (adaptations of Egyptian hieroglyphs from ancient Meroë, in what is now northern Sudan) and multiple scripts from India (Sora Sompeng, Chakma, Sharada, Takri).

Two new blocks for the Arabic script were also added: Arabic Mathematical Alphabetic Symbols and Arabic Extended-A. The Sundanese and Meetei Mayek scripts received extension blocks as well.
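To check whether a given character falls into one of these new blocks, a small lookup table is enough; Python's unicodedata module has no block lookup of its own. The ranges in this sketch are my transcription of the Unicode Blocks data for the 6.1 additions and are worth verifying against the current Blocks.txt. The function names new_6_1_block and describe are hypothetical.

```python
import unicodedata
from typing import Optional

# (first, last, official block name) for blocks introduced in Unicode 6.1.
NEW_BLOCKS_6_1 = [
    (0x08A0, 0x08FF, "Arabic Extended-A"),
    (0x1CC0, 0x1CCF, "Sundanese Supplement"),
    (0xAAE0, 0xAAFF, "Meetei Mayek Extensions"),
    (0x10980, 0x1099F, "Meroitic Hieroglyphs"),
    (0x109A0, 0x109FF, "Meroitic Cursive"),
    (0x110D0, 0x110FF, "Sora Sompeng"),
    (0x11100, 0x1114F, "Chakma"),
    (0x11180, 0x111DF, "Sharada"),
    (0x11680, 0x116CF, "Takri"),
    (0x16F00, 0x16F9F, "Miao"),
    (0x1EE00, 0x1EEFF, "Arabic Mathematical Alphabetic Symbols"),
]


def new_6_1_block(ch: str) -> Optional[str]:
    """Return the 6.1 block containing ch, or None if it lies elsewhere."""
    cp = ord(ch)
    for first, last, block in NEW_BLOCKS_6_1:
        if first <= cp <= last:
            return block
    return None


def describe(ch: str) -> str:
    # A default avoids ValueError for unassigned or unnamed code points.
    name = unicodedata.name(ch, "<no name>")
    return f"U+{ord(ch):04X} {name}: {new_6_1_block(ch) or 'not a 6.1 block'}"


if __name__ == "__main__":
    for ch in ("\U00016F00", "\U00011183", "A"):
        print(describe(ch))
```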

New Characters

The Unicode Consortium has an index of which new characters have been added to different scripts.


i18n Enhancements Announced for Mac OSX 10.7 (Lion)


They're kind of scattered, but it looks like the next version of Mac OSX will be bringing lots of good enhancements for those working outside of English.

Asian Fonts and Text Input

Support for many scripts from South Asia has been lagging behind Windows, so I am personally pleased to see fonts for Bengali, Kannada, Malayalam, Oriya, Telugu and Sinhala being added (especially since I took 12 credits of Sinhala back in the day). New fonts for Tamil, Devanagari, Gujarati and Urdu are also scheduled, as are fonts for Lao, Khmer and Myanmar.

Those working with East Asian languages should be able to access improved utilities for Chinese (filtering by tone, ordering by radical/stroke), Japanese (Kotoeri) and Vietnamese (old and new orthography). The Chinese handwriting recognition software is also scheduled to include more support for Simplified Chinese and Roman characters. Finally, Apple announced that Lion will support vertical text (typing and display).

Everyone will also be able to use a new color emoji font.

In Safari

Improvements for Safari included:

  • MathML support
  • Improved CSS3 support including vertical text, East Asian emphasis, auto hyphenation

Non-English Accessibility

Accessibility options now available for those not working 100% in English include VoiceOver speech in 23 languages and expanded Braille options.


New Rupee Symbol May Now Be U+20B9


As mentioned in this blog's previous entry, the government of India has designated a new rupee symbol. One of the more interesting questions is how quickly it could be integrated into Unicode and other standards.

According to Live Mint.com, the symbol has been voted into Unicode at code point U+20B9. That would place it in the currency block right after the Tenge sign (U+20B8). I do not see any official confirmation from Unicode, but it was on the agenda for that meeting.
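Python's standard unicodedata module, whose Unicode data is much newer than this post, can show where the sign ended up: in current Unicode data, U+20B9 is named INDIAN RUPEE SIGN and sits directly after U+20B8 TENGE SIGN. A tiny sketch; the helper name currency_neighbors is hypothetical.

```python
import unicodedata
from typing import List, Tuple


def currency_neighbors(cp: int, radius: int = 1) -> List[Tuple[str, str]]:
    """Return (U+XXXX label, character name) pairs for code points around cp."""
    pairs = []
    for c in range(cp - radius, cp + radius + 1):
        # A default avoids ValueError if a neighbor is unassigned.
        pairs.append((f"U+{c:04X}", unicodedata.name(chr(c), "<unassigned>")))
    return pairs


if __name__ == "__main__":
    for label, name in currency_neighbors(0x20B9):
        print(label, name)
```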

You can read some of the discussion on the issue from late July which includes information about multiple proposals, the source of the new rupee design and comparisons to the design of the euro (€) symbol. Fascinating if somewhat heated.


A New Rupee Symbol


In case you've ever wondered whether the Unicode standard will ever be "complete", the answer is probably not. This was highlighted by the fact that India adopted a new rupee currency symbol just last month (July 2010).

[Image: the Indian rupee symbol, shaped like the front part of a capital R with two horizontal bars near the top]

Winning design by Shri D Udaya Kumar. Image from Wikimedia Commons.

The government of India sponsored a contest and got some interesting entries, which you can see in the linked slideshow.

Design History

There had already been a rupee symbol (₨) in Unicode at code point U+20A8, but if you look at the character, you'll see that it's a rather boring ligature of Western capital R plus s. The new symbol melds Western R and Devanagari र ("Ra") and adds a currency bar to boot. Very clever.
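The "ligature" nature of the old sign is visible in the Unicode data itself: U+20A8 RUPEE SIGN carries a compatibility decomposition to the two letters R and s, so compatibility normalization (NFKC) turns it back into plain "Rs". A short check using only Python's standard unicodedata module:

```python
import unicodedata

OLD_RUPEE = "\u20a8"  # RUPEE SIGN, the older R+s ligature

# The decomposition field lists the component code points (R = 0052, s = 0073).
print(unicodedata.name(OLD_RUPEE))           # RUPEE SIGN
print(unicodedata.decomposition(OLD_RUPEE))  # <compat> 0052 0073
# NFKC replaces compatibility characters with their decompositions.
print(unicodedata.normalize("NFKC", OLD_RUPEE))  # Rs
```

One side effect: software that applies NFKC before searching or comparing text may treat ₨ and "Rs" as the same string.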

The rupee story gets even more complex, because there are "rupee" signs for different scripts and countries.

[Image: rupee signs in other scripts: Bengali, Tamil, Gujarati]

The "Bengali" rupee is actually the "Taka" sign of Bangladesh, but I am perplexed by the Tamil and Gujarati versions, since those scripts are used in regions of India and/or Sri Lanka. I am guessing that they are regional "informal" characters, but in enough use to be included in Unicode.
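For reference, the script-specific signs pictured above each have their own code point. This Python sketch just prints their official names and general category (Sc, "currency symbol") from the standard unicodedata module; the list is my own selection and not exhaustive.

```python
import unicodedata

# Script-specific rupee/taka signs discussed above.
SCRIPT_RUPEES = ["\u09f3", "\u0af1", "\u0bf9"]

for ch in SCRIPT_RUPEES:
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)} ({unicodedata.category(ch)})")
```

Note that the Unicode name of the Bengali sign (BENGALI RUPEE SIGN) still says "rupee", even though in Bangladesh it is used as the taka sign.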

What Now

Even though the Government of India has signed off on the symbol, there's a long road ahead. Fonts will have to be retooled, but the rupee sign won't be at its own Unicode code point yet, because one hasn't been assigned (although they're working on it...). That means that even though this sign was born in the era of Unicode, a "legacy" pre-Unicode system will be in place for a while and will have to be corrected later. Ah well.

Other systems that will have to be retrofitted include currency databases, Excel formatting options, and probably cash registers (at least what prints out on the receipt). And that's no doubt the tip of the iceberg. Interestingly, there are no plans to put the symbol on bills and coins, but as this Times of India article notes, most bills and coins don't have a currency symbol anyway. Americans can pull out a dollar bill to check: no $ in sight.

A final question is how speakers in non-Devanagari areas will react. The crossed-bar shape works for many northern Indian scripts such as Devanagari (र) and Gujarati, but R looks very different in a lot of scripts, including Tamil (Tamil R with 2 vertical lines and 1 horizontal) and others. I occasionally run into comments from Tamil writers about not assuming that Devanagari is a universal script in India. I wonder what the impact here will be.

Pictures instead of Text?

Some of you may be interested to note that the Tamil/Gujarati/Bengali characters are actually images. For some reason the Movable Type (MT) CSS is insisting on font selections, and I haven't been able to override it yet, not even with !important! Not sure how to troubleshoot, but this does not happen to me in Web 1.0...


Entry #100 - Still More about Middle Eastern Numbers


What better way to celebrate the 100th entry in this blog than with...a correction. It's a humble reminder that just because you know a lot about Unicode doesn't mean you can't mess up a crucial detail.

Way back in 2007, I posted an entry about generating Arabic (calligraphic) numbers in Microsoft Office (i.e. "١,٢,٣" vs. "1,2,3"). The entry noted that in Arabic, "Arabic number" actually means Western digits (1,2,3), which Unicode calls DIGIT ONE, DIGIT TWO, and so on. The Arabic term for numbers like ١,٢,٣ is "Hindi number" (ARABIC-INDIC DIGIT ONE, ARABIC-INDIC DIGIT TWO... in Unicode).

But the numbers I displayed as "Hindi/Arabic" were actually the Devanagari numbers used in India (e.g. १,२,३), which Unicode calls DEVANAGARI DIGIT ONE, DEVANAGARI DIGIT TWO, and so on. Fortunately Eric Verlind pointed out the flaw, so I was able to correct the forms. Eric also pointed me to a Microsoft Digit Support page, where I learned there are variations for Arabic, Persian and Urdu.
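Since each of these digit sets is a contiguous run of ten code points, converting between them is a simple character-by-character mapping. Below is a minimal Python sketch, assuming you only need to convert the Western digits 0 to 9 and leave everything else alone; the dictionary keys and the function name to_digits are hypothetical. Persian and Urdu both use the EXTENDED ARABIC-INDIC digits (U+06F0 to U+06F9), so the differences between them are largely a matter of glyph shapes chosen by the font.

```python
import unicodedata

# Code point of DIGIT ZERO in each set; the other nine digits follow in order.
ZEROS = {
    "western": 0x0030,                # DIGIT ZERO
    "arabic-indic": 0x0660,           # ARABIC-INDIC DIGIT ZERO
    "extended-arabic-indic": 0x06F0,  # used for Persian and Urdu
    "devanagari": 0x0966,             # DEVANAGARI DIGIT ZERO
}


def to_digits(text: str, system: str) -> str:
    """Replace Western digits 0-9 in text with the chosen digit set."""
    zero = ZEROS[system]
    table = {0x30 + i: chr(zero + i) for i in range(10)}
    return text.translate(table)


if __name__ == "__main__":
    for system in ("arabic-indic", "devanagari"):
        converted = to_digits("123", system)
        # int() accepts any Unicode decimal digits, so this round-trips to 123.
        print(system, converted, unicodedata.name(converted[0]), int(converted))
```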

The learning never stops in Unicode world.


SALRC - South Asia Language Resource Center


A great resource from my library is the South Asia Language Resource Center out of the University of Chicago. They include information about the major scripts of India and neighboring countries including font information (with samples).

Address is http://salrc.uchicago.edu/.


What's New in Unicode 5.1?


Unicode version 5.1 was recently released, and includes some new code blocks as well as new specifications. As with all new versions of Unicode, there will be a time lag until the new items can be incorporated into fonts and utilities, but here is a partial list of new items:

If you're interested in the new characters, the best place to view them is at http://www.unicode.org/charts/

New Plane 0 Scripts

  • Cham (Cambodia/Vietnam)
  • Kayah Li (Thailand/Myanmar)
  • Lepcha (India)
  • Ol Chiki/Santali (India)
  • Rejang (Indonesia)
  • Saurashtra (India)
  • Sundanese (Indonesia)
  • Vai (Liberia)

Script Extensions

These blocks add characters to previously encoded scripts.

  • Cyrillic Extended-A
  • Cyrillic Extended-B
  • Arabic - characters for math, 4 Qur'anic characters and multiple characters for different languages
  • Indic - Malayalam and Tamil character sequences, Devanagari chandra a, Sanskrit sounds in Gurmukhi, Oriya and Telugu
  • Latin - characters for minority languages and capital German sharp S (rare)
  • Math Symbols
  • Medievalist Punctuation - for research
  • Myanmar Additions
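The capital sharp S is a nice example of why a new character doesn't change existing behavior overnight. It was encoded as U+1E9E LATIN CAPITAL LETTER SHARP S, but the default Unicode uppercase mapping for ß is still "SS", so software has to choose deliberately to produce ẞ. A quick check with Python's built-in string methods and the standard unicodedata module:

```python
import unicodedata

small, capital = "\u00df", "\u1e9e"

print(unicodedata.name(capital))  # LATIN CAPITAL LETTER SHARP S
print(small.upper())              # SS, the default uppercase mapping
print(capital.lower())            # ß
# Case folding maps both forms to "ss" for caseless matching.
print(capital.casefold() == small.casefold() == "ss")  # True
```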

New Plane 1 Ancient Scripts and Miscellaneous Symbols

  • Carian (Anatolia/Turkey)
  • Lycian (Anatolia/Turkey)
  • Lydian (Anatolia/Turkey)
  • Phaistos Disk (Crete)
  • Domino Tile Symbols
  • Mahjong Tile Symbols


About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage (ejp10@psu.edu) for a profile.

Comments

The standard commenting utility has been disabled due to hungry spam. If you have a comment, please feel free to drop me a line at (ejp10@psu.edu).

