ELIZABETH J PYATT: March 2012 Archives

Testing Some MP3 Sites with Halfaxa Titles


Unicode is such an esoteric subject that you sometimes wonder who's seeing the possibilities. One artist who does appreciate them is Canadian electronic musician Grimes, whose album Halfaxa contains song titles such as "ΔΔΔΔRasikΔΔΔΔ", "Sagrad Прекрасный", "† River †", along with the charmingly titled "World♡Princess" and the mathematically complex "≈Ω≈ω≈ω≈ω≈ω≈ω≈ω≈ω≈" (Almost Omega?).

That makes this album a great test case for checking how well your MP3 or streaming service handles Unicode. As you can see below, iTunes and Rhapsody do well, but for some reason Amazon is giving me the Unicode question mark of death (my guess is that it's because the page specifies Verdana, which doesn't have all the characters).

I haven't tested every music site, but you get the idea...

iTunes Halfaxa List

Halfaxa album list on iTunes with correct symbols

Rhapsody Halfaxa List

Halfaxa album list on Rhapsody with correct symbols

Amazon Halfaxa List (Verdana Type)

Halfaxa album list on Amazon with ?? for symbols
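
If you want to repeat this test with other titles, it can help to know exactly which characters are involved. Here is a minimal Python sketch (the function name `non_latin1_chars` is made up for illustration) that lists each character in a title beyond the Latin-1 range, along with its code point and Unicode name. It doesn't diagnose what any particular site is doing; it only identifies the characters a font or page would need to support.

```python
import unicodedata


def non_latin1_chars(title):
    """Return (character, code point, Unicode name) for each distinct
    character in the title that lies above U+00FF (outside Latin-1)."""
    seen = []
    for ch in title:
        if ord(ch) > 0xFF and ch not in seen:
            seen.append(ch)
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in seen]


# "\u2661" is the heart in "World♡Princess"
for row in non_latin1_chars("World\u2661Princess"):
    print(row)
```

A title made only of ASCII or Latin-1 characters returns an empty list, so anything that does show up is a character worth checking on the site you are testing.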


Converting Numeric Entity Codes Back to Text


I recently got a technical question that I thought I'd share.

Not so long ago in the history of Web development, the safest way to display non-Western text was to use numeric entity codes. For instance, one course management system would convert Cyrillic text like Україна (Ukraine) to a series of numeric codes like:

&#1059;&#1082;&#1088;&#1072;&#1111;&#1085;&#1072;

This is fine for single words and small phrases, but it's bad for an entire page...especially if you want to edit it.

Fortunately, there is a quasi-fix for this if you need to replace numeric codes with real text. That is:

  1. Open your page in a browser which does render the entity codes as the correct text.
  2. Copy the displayed text and paste it into another file. It will be pasted as real text rather than entity codes.
  3. Put the text back into your HTML source.

It's a little tedious, but since I couldn't quickly find a better tool for this, it is a decent stopgap. At least you won't have to re-type everything...
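
If you are comfortable with a little scripting, the same conversion can be sketched in Python. This is a minimal example under some assumptions, not a polished tool: the function name `decode_numeric_refs` is made up for illustration, and the sample string is just the decimal codes for Україна worked out from its Unicode code points. It deliberately leaves ASCII-range codes (such as `&#60;` for "<") and named entities (such as `&amp;`) untouched so that the surrounding HTML markup isn't broken. Python's standard `html.unescape` would decode those too, which is usually not what you want for a whole page.

```python
import re

# Matches decimal (&#1059;) and hexadecimal (&#x423;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def decode_numeric_refs(source):
    """Replace non-ASCII numeric character references with real characters."""

    def replace(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Keep ASCII references, surrogates, and out-of-range values as written.
        if code_point < 128 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)

    return NUMERIC_REF.sub(replace, source)


encoded = "&#1059;&#1082;&#1088;&#1072;&#1111;&#1085;&#1072;"
print(decode_numeric_refs(encoded))
```

When you save the result, make sure the file is written as UTF-8 and that the page declares that encoding, or the real text may not display correctly.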


Unicode and WCAG 2.0 (Accessibility)


Unicode is incorporated into multiple standards, such as RSS (newsfeeds) and MathML. It is also incorporated into the newest WCAG 2.0 (Web Content Accessibility Guidelines) standard in some interesting ways.

Text not Image

One guideline of particular interest is Guideline 1.4.5:

WCAG Guideline 1.4.5 Images of Text: If the technologies being used can achieve the visual presentation, text is used to convey information rather than images of text except for the following:

In other words, it is generally better to use CSS plus actual text to present textual information, even when it is stylized. Unicode is especially important for doing this with characters beyond ASCII or Latin-1.

There are two reasons for this guideline. The first is that if a screen reader has text available, the developer does not need to include any additional information such as an image ALT attribute. The other is that text tends to be more flexible across devices. In particular, it can be zoomed without becoming pixelated (appearing jagged at large sizes) and it can have its format changed without information loss (say flipping from black text on white to white text on black - a format preferred by some users).

Right to Left Marker

A second relevant guideline is:

WCAG Guideline 1.3.2: When the sequence in which content is presented affects its meaning, a correct reading sequence can be programmatically determined. (Level A)

An important concept for RTL (right-to-left) languages is ensuring that text remains in logical order, so that characters are stored in their correct linear order even if they are presented "backwards" from the more common LTR order. WCAG therefore also recommends logical order and mentions the Unicode RLM (right-to-left mark) and LRM (left-to-right mark) characters.
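
To make "logical order" and the two marks a little more concrete, here is a minimal Python sketch using the standard `unicodedata` module. The helper name `describe` and the sample Hebrew word are my own choices for illustration. It shows that RLM and LRM are ordinary (invisible) Unicode characters with a strong direction, and that an RTL word is stored first letter first, even though that letter is displayed at the right edge.

```python
import unicodedata

RLM = "\u200f"  # right-to-left mark
LRM = "\u200e"  # left-to-right mark


def describe(ch):
    """Return (code point, Unicode name, bidirectional class) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))


for mark in (RLM, LRM):
    print(describe(mark))

# The Hebrew word "shalom" in logical order: shin, lamed, vav, final mem.
shalom = "\u05e9\u05dc\u05d5\u05dd"
for ch in shalom:
    print(describe(ch))
```

The bidirectional class is "R" for RLM and the Hebrew letters and "L" for LRM. That is what lets software work out the display direction while the stored order stays the reading order.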

Language Tags

A final i18n mandate of WCAG 2.0 is the use of language tags.

WCAG Guideline 3.1.1: Language of Page: The default human language of each Web page can be programmatically determined.

In other words, use language tags to identify page language. This is especially important for screen readers which need to switch pronunciation engines between languages.
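
As a rough illustration of what "programmatically determined" can mean, here is a minimal Python sketch that reads the `lang` attribute from a page using the standard `html.parser` module. The class name `LangFinder`, the function `find_langs`, and the sample markup are all made up for this example. It is not how any particular screen reader works, only a demonstration that the information is machine-readable once the tags are in place.

```python
from html.parser import HTMLParser


class LangFinder(HTMLParser):
    """Collect the page language and any elements that declare their own language."""

    def __init__(self):
        super().__init__()
        self.page_lang = None   # lang attribute on the <html> element
        self.part_langs = []    # (tag, lang) for other elements with a lang attribute

    def handle_starttag(self, tag, attrs):
        lang = dict(attrs).get("lang")
        if lang is None:
            return
        if tag == "html":
            self.page_lang = lang
        else:
            self.part_langs.append((tag, lang))


def find_langs(markup):
    finder = LangFinder()
    finder.feed(markup)
    return finder.page_lang, finder.part_langs


sample = '<html lang="en"><body><p>Hello, <span lang="uk">\u0423\u043a\u0440\u0430\u0457\u043d\u0430</span></p></body></html>'
print(find_langs(sample))
```

If the page language comes back as `None`, the page has no default language tag. In that case a screen reader has to guess which pronunciation engine to use.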

There have been several debates about the utility of WCAG 2.0, but I can rest assured that at least the needs of multilingual users have been considered.


Math+HTML 5 in 3 Browsers


As you can see I haven't been posting here regularly. It's because I've been tied up with a11y (accessibility) including MathML.

However, I am happy to report that I was able to create an HTML5+MathML file that works in Internet Explorer AND Firefox/Safari (with some Unicode thrown in).

As a reward, I think I will write a Unicode post today.


About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage (ejp10@psu.edu) for a profile.


The standard commenting utility has been disabled due to hungry spam. If you have a comment, please feel free to drop me a line at (ejp10@psu.edu).
