(X)HTML Markup: March 2012 Archives

Converting Numeric Entity Codes Back to Text


I got a technical question recently that I thought I would share.

Not so long ago in the history of Web development, the safest way to display non-Western text was to use numeric entity codes (numeric character references). For instance, one course management system would convert Cyrillic text like Україна (Ukraine) into a series of numeric codes like:

&#x423;&#x43A;&#x440;&#x430;&#x457;&#x43D;&#x430;
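For illustration, here is a minimal Python 3 sketch of how such codes can be produced. The function name `to_numeric_entities` is hypothetical, and the course management system's actual method is unknown. Each non-ASCII character is replaced by `&#x`, its Unicode code point in hexadecimal, and `;`.

```python
def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a hexadecimal numeric character reference."""
    # ord() gives the Unicode code point; :X formats it as uppercase hexadecimal.
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)


print(to_numeric_entities("Україна"))  # &#x423;&#x43A;&#x440;&#x430;&#x457;&#x43D;&#x430;
```

Seven letters become seven references, which shows why a whole page in this form is hard to read or edit.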

This is fine for single words and short phrases, but it is a problem for an entire page, especially if you want to edit it, because the source is no longer human-readable.

Fortunately, there is a quasi-fix if you need to replace numeric codes with real text:

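  1. Open your page in a browser that renders the entity codes as the correct text.
  2. Copy the displayed text and paste it into another file, such as a plain-text editor. It will be pasted as real characters rather than codes.
  3. Put the text back into your HTML source, and save the file in a Unicode encoding such as UTF-8.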

It's a little tedious, but since I couldn't quickly find a better tool for this, it is a decent stopgap. At least you won't have to retype everything.
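If a scripting language is handy, the trip through the browser can be skipped. Below is a minimal Python 3 sketch, assuming the page source is available as a string; `decode_numeric_refs` is a hypothetical helper, not an existing tool. It converts only numeric references (decimal or hexadecimal) and deliberately leaves named entities such as `&lt;` and `&amp;`, plus references to markup characters, untouched, because decoding those would change the HTML itself. Python's built-in `html.unescape()` decodes named entities too, which is why it is not applied to a whole page here.

```python
import re

# Decimal (&#1059;) or hexadecimal (&#x423;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

# Characters that must stay escaped, or the HTML markup itself would change.
MARKUP_CHARS = set("<>&\"'")


def decode_numeric_refs(source: str) -> str:
    """Replace numeric character references with real characters.

    Named entities such as &lt; and &amp; are not matched at all; references
    to markup characters and invalid code points are kept as-is.
    """
    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Skip NUL, UTF-16 surrogates, and values beyond the Unicode range.
        if code_point == 0 or 0xD800 <= code_point <= 0xDFFF or code_point > 0x10FFFF:
            return match.group(0)
        char = chr(code_point)
        if char in MARKUP_CHARS:
            return match.group(0)
        return char

    return NUMERIC_REF.sub(replace, source)


page = "<p>&#x423;&#x43A;&#x440;&#x430;&#x457;&#x43D;&#x430; &amp; &#1050;&#1080;&#1111;&#1074;</p>"
print(decode_numeric_refs(page))  # <p>Україна &amp; Київ</p>
```

As with the manual method, the converted file then needs to be saved as UTF-8 (and declared as such in the page) so the real characters survive.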


Unicode and WCAG 2.0 (Accessibility)


Unicode is incorporated into multiple standards such as RSS (newsfeeds) and MathML. It is also incorporated into the newest WCAG 2.0 (Web Content Accessibility Guidelines) standard in some interesting ways.

Text not Image

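One success criterion of particular interest is 1.4.5, under Guideline 1.4 (Distinguishable):

WCAG Success Criterion 1.4.5 Images of Text: If the technologies being used can achieve the visual presentation, text is used to convey information rather than images of text except for the following: (Level AA)

  - Customizable: The image of text can be visually customized to the user's requirements;
  - Essential: A particular presentation of text is essential to the information being conveyed.

Note: Logotypes (text that is part of a logo or brand name) are considered essential.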

In other words, it is generally better to use CSS and actual text to present textual information, even when it is stylized. Unicode is especially important for doing this with characters beyond ASCII or Latin-1.

There are two reasons for this requirement. First, if a screen reader has text available, the developer does not need to supply any additional information such as an image's alt attribute. Second, text tends to be more flexible across devices. In particular, it can be zoomed without pixelation (the jagged edges an image shows at large sizes), and its format can be changed without information loss (say, flipping from black text on white to white text on black, a format preferred by some users).

Right-to-Left Mark

A second relevant success criterion is:

WCAG Success Criterion 1.3.2 Meaningful Sequence: When the sequence in which content is presented affects its meaning, a correct reading sequence can be programmatically determined. (Level A)

An important concept for RTL (right-to-left) languages is logical order: characters are stored in their correct reading sequence, even when they are displayed "backwards" relative to the more common LTR order. WCAG therefore also recommends logical order and mentions the Unicode RLM (right-to-left mark, U+200F) and LRM (left-to-right mark, U+200E) characters.
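To make "logical order" concrete, here is a small Python 3 sketch using the standard `unicodedata` module. The helper `bidi_classes` is hypothetical, written only for this illustration. It stores the Hebrew word שלום (shalom) in reading order and shows the Unicode bidirectional class of each character: Hebrew letters are strong right-to-left (`R`), while `!` is a neutral (`ON`) that takes its direction from its neighbors. Appending an RLM gives the `!` a strong RTL character on both sides, so when the phrase sits inside left-to-right text, the `!` is treated as part of the Hebrew run (displayed to its left) instead of drifting to the right edge.

```python
import unicodedata
from typing import List

RLM = "\u200F"  # U+200F RIGHT-TO-LEFT MARK (invisible, strong RTL)
LRM = "\u200E"  # U+200E LEFT-TO-RIGHT MARK (invisible, strong LTR)


def bidi_classes(text: str) -> List[str]:
    """Return the Unicode bidirectional class of each character, in stored order."""
    return [unicodedata.bidirectional(ch) for ch in text]


# "Shalom" stored in logical (reading) order: the first letter read is stored first.
hebrew = "\u05E9\u05DC\u05D5\u05DD"  # shin, lamed, vav, final mem
# The neutral '!' now sits between two strong RTL characters.
phrase = hebrew + "!" + RLM

print(bidi_classes(phrase))  # ['R', 'R', 'R', 'R', 'ON', 'R']
print(unicodedata.name(RLM), "/", unicodedata.name(LRM))
```

The browser, not the author, reverses the Hebrew for display; the stored sequence stays in reading order, which is what lets a screen reader read it correctly.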

Language Tags

A final i18n-related requirement of WCAG 2.0 is the use of language tags.

WCAG Success Criterion 3.1.1 Language of Page: The default human language of each Web page can be programmatically determined. (Level A)

In other words, use language tags to identify the page language. This is especially important for screen readers, which need to switch pronunciation engines between languages.
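In HTML the language tag goes in the `lang` attribute, using a BCP 47 code such as `en` or `uk`. Below is a minimal Python 3 sketch that emits such markup; `html_page` and `lang_span` are hypothetical helpers for illustration only. The `lang` on the `<html>` element is the page-level declaration that 3.1.1 asks for, and the `<span>` shows the same attribute on a single foreign-language phrase.

```python
from html import escape


def html_page(body_html: str, lang: str, title: str) -> str:
    """Wrap body HTML in a minimal page whose <html lang> declares the default language."""
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="{escape(lang)}">\n'
        f'<head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        f"<body>{body_html}</body>\n"
        "</html>"
    )


def lang_span(text: str, lang: str) -> str:
    """Mark a phrase in another language so a screen reader can switch pronunciation."""
    return f'<span lang="{escape(lang)}">{escape(text)}</span>'


body = "<p>The Ukrainian name of Ukraine is " + lang_span("Україна", "uk") + ".</p>"
print(html_page(body, "en", "Language tags"))
```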

There have been several debates about the utility of WCAG 2.0, but I can rest assured that at least the needs of multilingual users have been considered.


About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage (ejp10@psu.edu) for a profile.

Comments

The standard commenting utility has been disabled due to hungry spam. If you have a comment, please feel free to drop me a line at ejp10@psu.edu.
