When Accessibility and Linguistics Collide


A fairly old accessibility recommendation is that you insert a tag to indicate a language change. For instance, if I were to write a Welsh sentence Hanner paint o gwrw os gwelwch yn dda (or 'Half a pint of beer please'), it would be tagged something like the code below.

Sample Lang Tag

<cite lang="cy">Hanner paint o gwrw os gwelwch yn dda</cite>

Note that this assumes you've included an <html lang="en-us"> tag in your code...which many systems do these days!

But someone asked an interesting question: do pop culture phrases like ¡Hasta la vista....baby! have to be tagged as "Spanish"? Hmm. This isn't really an accessibility question so much as a linguistics question: when does a word stop being a "borrowing" and become part of the English language? It actually happens in stages. We all realize that taco, señor and jalapeño are Spanish words, but so are lasso, canyon and rodeo, not to mention Arizona and Colorado.

But we don't realize that lasso (cognate with lace from French) is Spanish partly because we have "nativized" the pronunciation (there is no "short a" in Spanish). The more recent borrowings tend to resemble Spanish a little bit more.

Is it in the Dictionary Yet?

In theory, one reason to tag a switch in language is to switch pronunciation dictionaries. Clearly this would be ridiculous for "lasso" (or "lahso" in Spanish), and might be overkill for "taco", which does have an entry in the English dictionary (I mean the official ones published as reference materials)...and is therefore likely to be in the screen reader's list of words.

Unfortunately, I don't think Hasta la vista is in the official dictionary...partly because dictionaries generally don't include phrases. The only word that will be in the English dictionary is "vista", but with its own English pronunciation which is different from Spanish. But since the English word "baby" has now intruded, you can have embedded LANG tags as in:

<cite lang="es">¡Hasta la vista....<span lang="en">baby!</span></cite>

Isn't compliance fun?
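To see what a tool actually has to do with that nesting, here is a minimal Python sketch (my own illustration, not how any particular screen reader works). It uses the standard library's html.parser to walk the markup and record which language each run of text inherits. The names LangRunCollector, VOID_TAGS and language_runs are hypothetical, and the default language of "en" stands in for whatever the page's <html lang> says.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not push onto the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta", "wbr"}


class LangRunCollector(HTMLParser):
    """Collect (language, text) runs; text inherits lang from its nearest ancestor."""

    def __init__(self, default_lang="en"):
        super().__init__()
        self.lang_stack = [default_lang]  # innermost language is last
        self.runs = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # An explicit lang attribute overrides the inherited one for this element.
        lang = dict(attrs).get("lang")
        self.lang_stack.append(lang or self.lang_stack[-1])

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and len(self.lang_stack) > 1:
            self.lang_stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((self.lang_stack[-1], text))


def language_runs(html, default_lang="en"):
    """Return the list of (language, text) runs found in an HTML fragment."""
    parser = LangRunCollector(default_lang)
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end
    return parser.runs


sample = '<cite lang="es">¡Hasta la vista....<span lang="en">baby!</span></cite>'
print(language_runs(sample))
```

For the sample above this yields one Spanish run and one English run, which is the point of the nested tag: each stretch of text gets handed to the right pronunciation rules. The sketch assumes reasonably well-formed markup; badly nested tags would throw the stack off.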

In Practice

But let's go on a cynical side trip here and ask...if you tag it, will the tool recognize it? Screen readers...sort of...if you want them to (and know how to enable automatic language detection). Some search engines may have a better record (or they may be relying on the ISP). The most "robust" use is in the Word spell checker. If you have an extended text in Spanish, every word will be marked as a spelling/grammatical error until you "mark" the text as Spanish (so that Microsoft can switch checkers). It's under the Tools » Language » Set Language menu (except for Office 2007, where it's under the Review tab).

Muy bien, but...which languages really count? We all have access to Spanish and French spelling dictionaries/pronunciation files (and German, Dutch, Italian....), but what about Welsh and Basque? There may be dictionaries, but they do not come standard. You have to hunt them down and install them. Still, at least they exist.

However, there are those languages without any dictionaries (the ones with about 10,000 speakers or fewer). Then the tagging here is really just metadata, but is it good metadata? From a linguistic point of view, the standard language codes are not very useful for detailed linguistic description.
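The "is there a dictionary for this tag?" question can be sketched in a few lines of Python. This is only an illustration under my own assumptions: pick_dictionary and the installed list are hypothetical names, and the fallback simply drops subtags from the right until something matches (similar in spirit to the lookup matching described for language tags in RFC 4647). Real spell checkers and screen readers may well choose differently.

```python
def pick_dictionary(tag, installed):
    """Return the most specific installed dictionary for a tag, or None.

    Subtags are dropped from the right ("es-MX" -> "es") until a match is found.
    None means there is nothing to switch to, so the tag is only metadata.
    """
    by_lower = {name.lower(): name for name in installed}
    parts = tag.lower().split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in by_lower:
            return by_lower[candidate]
        parts.pop()  # drop the last subtag and try a more general tag
    return None


# Hypothetical set of dictionaries that shipped with the tool.
installed = ["en-US", "en-GB", "es", "fr"]

for tag in ["en-US", "es-MX", "cy", "en-PR"]:
    print(tag, "->", pick_dictionary(tag, installed))
```

With this made-up list, "es-MX" falls back to the generic Spanish dictionary, while "cy" (Welsh) finds nothing until someone installs it. Note that "en-PR" also finds nothing here, because only regional English dictionaries are installed and there is no plain "en" to fall back on.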

The ISO-639 codes are normally NOT really language codes in a linguistic sense, but just codes correlating to a spelling system. It matters whether it is "en-US" (USA) or "en-GB" (Britain) because the two countries spell differently (and we do have minor grammatical differences). The distinction between "en-US" (USA) and "en-PR" (English spelling in Puerto Rico) is technically there, but in practice non-existent. Most English writing in Puerto Rico is probably aimed at the U.S. standard.

As the last example shows, the original method of specifying codes just by country has some problems. Fortunately, the standards groups are working on it. Still, the new codes, which may be better, are rarely recognized by the vendors (or at least there is a MAJOR time lag).
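For the curious, a tag like "en-US" is really two pieces glued together: a language subtag plus a region subtag, and the newer-style tags can also carry a four-letter script subtag in between (as in "zh-Hant-TW"). Here is a rough Python sketch of pulling those pieces apart. The function name split_language_tag is my own, and this is a deliberately simplified reading of the tag syntax, not a full parser: variants, extensions and private-use subtags are just ignored.

```python
def split_language_tag(tag):
    """Split a tag such as "en-US" or "zh-Hant-TW" into its main subtags.

    Simplified: only language, script (4 letters) and region
    (2 letters or 3 digits) are recognized; anything else is skipped.
    """
    parts = tag.split("-")
    result = {"language": parts[0].lower(), "script": None, "region": None}
    for part in parts[1:]:
        is_script = len(part) == 4 and part.isalpha()
        is_region = (len(part) == 2 and part.isalpha()) or (
            len(part) == 3 and part.isdigit()
        )
        # A script subtag is only accepted before any region subtag.
        if is_script and result["script"] is None and result["region"] is None:
            result["script"] = part.title()
        elif is_region and result["region"] is None:
            result["region"] = part.upper()
    return result


for tag in ["en-US", "en-gb", "cy", "zh-Hant-TW"]:
    print(tag, split_language_tag(tag))
```

Seen this way, "en-US" and "en-PR" share the same language subtag and differ only in region, which is why the distinction can exist on paper without any tool behaving differently.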

So what are we tagging and does it matter? For "major languages" yes it does matter. For lots of other uses...probably not. I could tag (and often I do), but at the end of the day, it's the visible text identifying the language/dialect that matters the most.

1 Comment

TK Lee said:

(I posted my comment on my own blog for further discussion because it heads in a slightly different direction.) Here's the copy:

Very insightful! The examples cited are wonderful and inspiring. Your post got me thinking, too. Here's my 2 cents:

Referencing other languages in written text used to exist only in special domains, e.g. articles about languages, literary works about multilingual aspects of life, or cross-cultural, cross-linguistic discussion. On the spoken language side, however, languages and their users have always interacted with each other, both historically and synchronically, so I think the reason multilingual written text used to be unusual is just that written text by nature evolves at a more conservative rate.

We are already very used to encountering multilingual situations in the spoken domain: native bilingual/trilingual speakers interacting among themselves, communication/negotiation between native and non-native speakers, or language learning situations. In recent years, I have begun to see more and more bilingual writing. Email among writers whose native languages are not alphabetic, such as Chinese, has a lot of English, for the sake of easier input. The same goes for chatting and blogs. Of course, code switching is not considered formal yet. I haven't seen any resume mixing two languages freely, except for non-translation-friendly proper nouns (such as company names, product names, chemical substances, etc.).

More and more spoken text is transcribed or annotated (for searching, archiving, etc.). It means we will surely encounter more and more code switching in written form.

Sometimes, metadata may still be linguistically valuable beyond just spell checkers. Machine translation, for one, will rely heavily on such metadata. Even for "non-major" languages for which tools have not been developed yet (and may not be in the near future), the tags can at least tell the machine to skip parsing instead of spitting out errors or taking wild guesses.
