Encoding Theory: January 2010 Archives

Language Tagging and JAWS: How to return to English?



I am not seeing other reports of the JAWS quirk described in this entry. It is based on hearsay from a JAWS user, although one who is fairly tech-literate. Hopefully the point is moot, but since information is so spotty, I am leaving this entry up for now.

Original Article

Unicode and accessibility should be natural partners, but sometimes the tools get a little confused. Take language tagging, for instance.

Language tagging identifies the language of a text to search engines, databases and, significantly, the screen readers used by people with severe visual impairments. Newer screen readers can switch pronunciation dictionaries when they encounter a language tag. The language tagging syntax recommended by the W3C for HTML 4 works as follows:

  1. Include the primary language tag for the document in the opening <html> tag. For example, an English document would be tagged as <html lang="en">.
  2. Tag any passages in a second language individually. For instance, a paragraph in French would be <p lang="fr">, while a word or phrase would be <span lang="fr">.

The idea, though, is that once you exit the passage tagged with the second language code, the language reverts to the primary language. Unfortunately, a comment I heard from a JAWS user was something like "The lang tag works, but developers forget to switch back to English." When I asked him for details, he indicated that an English text containing a Spanish word switches pronunciation engines at that word, but then remains in Spanish mode for the rest of the passage.

What I interpret from this is that the JAWS developers are assuming that there should be a SECOND LANG tag to return the document to the primary language. So we have two syntax schemes:

What W3C Expects

Text: The French name for "The United States" is Les États Unis, not Le United States.

Code: <p>The French name for "The United States" is <i lang="fr">Les États Unis</i>, not <i>Le United States</i>.</p>

Note that the only LANG tag is the one for the French Les États Unis, with the assumption that the document contains an <html lang="en"> specification which applies to the entire document.

What JAWS Wants

As I indicated earlier, it appears that if the code above is parsed by the JAWS screen reader, JAWS would remain in French mode even after Les États Unis was read. I am not sure what the syntax would be, but I'm guessing something like this:

Code: <p>The French name for "The United States" is <i lang="fr">Les États Unis</i>, <span lang="en">not <i>Le United States</i>.</span></p>

Now there is a second, English LANG tag whose domain is the rest of the sentence. I am assuming that JAWS would remain set to English thereafter. In this scenario, I am also guessing that the JAWS programmers set the switch in pronunciation engines to be triggered ONLY by a language tag, which would explain why it didn't switch back to English in the previous code.

What the W3C expects, though, is that tools track the domain of each language tag and switch back to English when the matching end tag is encountered. It's more difficult to program, but it CAN be done.
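
To make the "domain" idea concrete, here is a minimal sketch in Python of how a tool could track language scope with a stack, so that leaving a tagged element restores the outer language. It uses the standard library's html.parser module; the names LangScopeParser and language_runs, and the default language of "en", are my own assumptions for illustration. It says nothing about how JAWS or any real screen reader is implemented.

```python
from html.parser import HTMLParser

# Elements with no end tag; they never open a scope, so they are not pushed.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class LangScopeParser(HTMLParser):
    """Hypothetical helper: pair each text run with the language in effect."""

    def __init__(self, default_lang="en"):
        super().__init__()
        # Each entry is (tag name, language in effect inside that element).
        self.stack = [("#document", default_lang)]
        self.runs = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        inherited = self.stack[-1][1]
        # An element without its own lang attribute inherits from its parent.
        lang = dict(attrs).get("lang") or inherited
        self.stack.append((tag, lang))

    def handle_endtag(self, tag):
        # Pop back to the matching open element; whatever is now on top of
        # the stack is the language that applies again.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((self.stack[-1][1], text))


def language_runs(html, default_lang="en"):
    """Return a list of (language, text) pairs in document order."""
    parser = LangScopeParser(default_lang)
    parser.feed(html)
    parser.close()
    return parser.runs


# Markup modeled on the W3C-style example: only the French phrase is tagged.
sample = ('<p>The French name for "The United States" is '
          '<i lang="fr">Les États Unis</i>, not <i>Le United States</i>.</p>')

for lang, text in language_runs(sample):
    print(lang, repr(text))
```

With only one LANG tag in the markup, the French phrase comes out tagged "fr" and everything after the closing </i> is back to "en", which is the behavior the W3C syntax relies on.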

The Coding Dilemma

So here's the coding dilemma developers face: Do they code to the declared and accepted W3C standard or do they code for JAWS? Of course, the JAWS community would like developers to code for JAWS (after all, the person I was speaking with was convinced the problem was developer cluelessness, not a faulty JAWS implementation of the standard).

The problem is that this approach perpetuates the bloated code that standards were supposed to streamline. Essentially, you are coding for one specific user agent, just like those developers who code only for Internet Explorer. It's an appealing short-term solution, but counterproductive in the long run. This is why even WebAIM (the Web accessibility group at Utah State University) recommends NOT coding for the quirks in JAWS or other user agents.

Besides, we can always hope this quirk will be fixed in a future release of JAWS.

Did I Mention Unicode Above 255?

I've also heard rumors that JAWS may read some Unicode characters above 255 as just the Unicode code point. Thus ∀ ("for all" or the upside-down A symbol) might be read as "2200" or "U+2200". There are special .sbl symbol files you can install in JAWS, but it would be nice if the process were a little more transparent. I feel it's the equivalent of Apple or Microsoft not providing any default fonts for non-Western European languages.
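
If you want to check which characters in a passage fall above 255, and what a reader would ideally hear instead of the bare code point, a few lines of Python will do it. This is only a sketch: describe and above_255 are hypothetical helper names of my own, and the names come from Python's built-in unicodedata module, not from any JAWS symbol file.

```python
import unicodedata


def describe(ch):
    """Return (code point label, Unicode character name) for one character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")


def above_255(text):
    """List the distinct characters in text whose code point exceeds 255."""
    seen = []
    for ch in text:
        if ord(ch) > 255 and ch not in seen:
            seen.append(ch)
    return seen


# \u2200 is the "for all" symbol; \u00c9 is capital E with acute accent,
# which sits below 255 and so is not flagged.
sample = "\u2200x: Les \u00c9tats Unis"

for ch in above_255(sample):
    print(ch, *describe(ch))

print(describe("\u2200"))  # ('U+2200', 'FOR ALL')
```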


About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage (ejp10@psu.edu) for a profile.


The standard commenting utility has been disabled due to hungry spam. If you have a comment, please feel free to drop me a line at (ejp10@psu.edu).
