A script that may not be well known to U.S. readers is Canadian Aboriginal Syllabics, a syllabary used to write certain indigenous languages, including the Inuit languages spoken in the Nunavut territory of Canada.
This script is about to appear in many more documents and signs because Nunavut's Official Languages Act is coming into force, promoting the Inuit languages to official status alongside English and French.
In addition to the languages of Nunavut, this script is used in Canada to write a number of indigenous languages including Ojibwe, Blackfoot, Cree and others. In contrast, most indigenous languages in the U.S. are written in the Latin alphabet with the notable exception of Cherokee.
I'm curious whether indigenous communities in the U.S. would consider adopting this script to further differentiate themselves. If that happens, the usage driven by the Nunavut law should help ensure that proper Unicode support is already available.
If you want to replicate this you have to:
- Paste the XML in the HTML code (i.e. NOT the WYSIWYG editor)
- Make sure that the first line of the XML declares the MathML namespace
- I also like to use CSS to bump the font size - those super/subscripts can get very tiny.
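For reference, the namespace declaration and a CSS size bump might look like this (the superscript markup and the 130% figure are just illustrations, not the exact values from my CMS):

```html
<style>
  math { font-size: 130%; } /* bump size so super/subscripts stay legible */
</style>
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <msup><mi>x</mi><mn>2</mn></msup>
</math>
```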
All I can say is - Wow. I wish all my CMS systems played this well with MathML.
P.S. If you send me comments on this blog, please note that this text may possibly depart from standard academic English. Linguists can do that, especially in a blog.
The "Tweed" column from the Chronicle of Higher Education had an amusing story about a Blackletter glyph-variant glitch on the new University of Idaho diplomas (specifically, "Congrabulations on Your Grabuation!").
As with many U.S. diplomas, the university name was rendered in a Blackletter (aka "Old English" or "Gothic") calligraphic font. This font, though, had a particularly high flourish on the lowercase "v", high enough that recipients wondered if a "b" had been printed instead of a "v" (and who wants a diploma from the Unibersity of Idaho?).
According to the Chronicle, the administration reassured them that it was an archaic "v", but this case does highlight the legibility issues of some older manuscript fonts and the need to balance historical font authenticity with modern needs.
The Arabic computing industry has worked with a number of encoding schemes since the 1960s. The History of Arabic on Computers page lists a number of historic encodings, from NCR-64 to ASMO 708 and Windows-1256.
My favorite might be an early 7-bit set which replaced the lowercase English letters with Arabic letters (but kept the capital letters). As the article notes, this worked because "Some printers were not even capable of printing lower case English letters."
It's a good thing we've moved beyond that.
As a linguist, I work with lots of exotic symbols, but only a small percentage of them are recognized by the standard U.S. version of JAWS. If you work with phonetic symbols like /ə, ʃ, ʒ, ɰ/, you will need to tweak your pronunciation files.
I wrote about this in an earlier post on JAWS 6, but today I was able to document and implement a fix, so I thought I would share the procedure.
The fix I am using will expand the symbol set within JAWS so that a character like /ə/ will be read as "schwa" (though not as its phonetic value of "uh"). Ideally, it would be nice to have a word pronunciation engine that emulates the phonetic values, but let's take this one problem at a time.
JAWS includes a set of symbol (.sbl) files which match punctuation and symbol characters with a "word" (e.g., ? = "question mark"). The key is to add the character and its reading to your working files.
Add Characters to Symbol File
This procedure assumes that JAWS is using the Eloquence engine, in which case the key file to change is eloq.sbl. You will also need to have an Admin account to implement the changes.
Note: SBL files can be opened in any text editor such as Notepad.
- Open or download a phonetic symbol .sbl file.
- Find the location of your eloq.sbl file. Mine was at the following path on my C: drive:
C:\Users\All Users\Freedom Scientific\Jaws\13.0\Settings\enu\eloq.sbl
- Make a second copy of this file and rename it eloqOld.sbl. This is your backup in case something goes wrong.
- Make a third copy and rename it eloqNew.sbl. This is a temporary file to edit, since you may not be able to edit eloq.sbl directly.
- Open eloqNew.sbl in a text editor such as Notepad. This file contains pronunciation values for multiple languages. Scroll to the language you normally use (e.g. "[American English]").
- Scroll to the end of the symbol list for that language.
- Copy and paste the list of symbols from one of the other .sbl files immediately after the final line in the list. Each symbol is on a single line and has the format:
U+0001=character name
Note: Don't worry if the format does not match the rest of the symbol list.
- Repeat the last step for each language you want to support. You can translate the character names as needed for each language. Save and close the file.
- Exit JAWS if it is open.
- Delete eloq.sbl. You may be asked for an admin password at this point.
- Rename eloqNew.sbl as eloq.sbl.
- Restart JAWS and test on a page such as IPA Characters based on Letter A with Numeric Codes
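The copy-and-append step above could in principle be scripted. Here is a hypothetical Python sketch (the section header and the IPA symbol names are illustrations, not JAWS defaults, and you should still work on a copy such as eloqNew.sbl):

```python
# Hypothetical sketch (not part of JAWS): append symbol entries at the
# end of one language section in an .sbl-style file.

NEW_SYMBOLS = [
    "U+0259=schwa",  # ə
    "U+0283=esh",    # ʃ
    "U+0292=ezh",    # ʒ
]

def add_symbols(text, section="[American English]", symbols=NEW_SYMBOLS):
    """Return text with symbol lines inserted at the end of `section`."""
    out, in_section = [], False
    for line in text.splitlines():
        if in_section and line.startswith("["):  # next language section begins
            out.extend(symbols)
            in_section = False
        out.append(line)
        if line.strip() == section:
            in_section = True
    if in_section:  # section ran to the end of the file
        out.extend(symbols)
    return "\n".join(out)

sample = ("[American English]\nU+003F=question mark\n"
          "[Spanish]\nU+00BF=inverted question mark")
print(add_symbols(sample))
```

Running this on the sample prints the American English section with the three IPA lines appended just before the "[Spanish]" header.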
Look Up Additional Codes
Each line in the SBL file has this format:
U+Codepoint=Character Name (no quotes)
For instance, if I wanted to expand the repertoire of currency symbols to include the new rupee symbol of India (₹), I would add the following line to my .sbl file:
U+20B9=Rupee symbol of India
A list of Unicode charts with code points is available at http://www.unicode.org/charts
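If you have Python handy, you can also look up a character's code point and formal Unicode name directly, which saves a trip to the charts (a quick sketch):

```python
import unicodedata

ch = "₹"
print(f"U+{ord(ch):04X}")    # U+20B9
print(unicodedata.name(ch))  # INDIAN RUPEE SIGN

# A candidate .sbl line built from the formal name:
print(f"U+{ord(ch):04X}={unicodedata.name(ch).title()}")
```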
Unicode is such an esoteric subject that you sometimes wonder who's seeing the possibilities. One artist who does appreciate them is Canadian electronic musician Grimes, whose album Halfaxa contains song titles such as "ΔΔΔΔRasikΔΔΔΔ", "Sagrad Прекрасный", and "† River †", along with the charmingly titled "World♡Princess" and the mathematically complex "≈Ω≈ω≈ω≈ω≈ω≈ω≈ω≈ω≈" (Almost Omega?).
That makes this album a great test case for checking how well your MP3 or streaming service does with Unicode. As you can see below, iTunes and Rhapsody do well, but for some reason Amazon is giving me the Unicode question mark of death (my guess is that the page specifies Verdana, which doesn't have all the characters).
I haven't tested every music site, but you get the idea...
iTunes Halfaxa List
Rhapsody Halfaxa List
Amazon Halfaxa List (Verdana Type)
I got a technical question recently which I thought I would share.
Not so long ago in the history of Web Development, the safest way to display non-Western text was the use of numeric entity codes. For instance, one course management system would convert Cyrillic text like Україна (Ukraine) to a series of numeric codes like:
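The numeric codes in question can be reconstructed from the Cyrillic code points; a quick Python sketch, for illustration:

```python
# Each letter of "Україна" becomes a decimal numeric character
# reference of the form &#<code point>;
word = "Україна"
encoded = "".join(f"&#{ord(ch)};" for ch in word)
print(encoded)  # &#1059;&#1082;&#1088;&#1072;&#1111;&#1085;&#1072;
```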
This is fine for single words and small phrases, but it's bad for an entire page...especially if you want to edit it.
Fortunately, there is a quasi-fix for this if you need to replace numeric codes with real text. That is:
- Open your page in a browser which does render the entity codes as the correct text.
- Copy displayed text and paste it in another file. It will be rendered as text.
- Put the text back into your HTML source.
It's a little tedious, but since I couldn't quickly find a better tool for this, it's a decent stopgap. At least you won't have to re-type everything...
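If you would rather script it than round-trip through a browser, Python's standard html module can do the same decoding (a minimal sketch):

```python
import html

# html.unescape turns numeric entity codes back into real text
encoded = "&#1059;&#1082;&#1088;&#1072;&#1111;&#1085;&#1072;"
print(html.unescape(encoded))  # Україна
```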
Unicode is incorporated into multiple standards such as RSS (newsfeeds) and MathML. It is also incorporated into the newest WCAG 2.0 (Web Content Accessibility Guidelines) standard in some interesting ways.
Text not Image
One guideline of particular interest is Guideline 1.4.5:
WCAG Guideline 1.4.5 Images of Text: If the technologies being used can achieve the visual presentation, text is used to convey information rather than images of text except for the following:
In other words, it is generally better to use CSS plus actual text to present textual information, even when it is stylized. Unicode is especially important for doing this with characters beyond ASCII or Latin-1.
There are two reasons for this guideline. The first is that if a screen reader has text available, the developer does not need to include any additional information such as an image ALT tag. The other is that text tends to be more flexible across devices. In particular, it can be zoomed without being rasterized (appearing jagged at large sizes), and its format can be changed without information loss (say, flipping from black text on white to white text on black - a format preferred by some users).
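As a minimal illustration (the class name and font stack are invented for the example), a stylized heading kept as real text rather than shipped as an image:

```html
<style>
  /* Styled with CSS, but still real, selectable, zoomable text */
  .fancy-heading {
    font-family: "Old English Text MT", serif;
    font-size: 200%;
  }
</style>
<h1 class="fancy-heading">Department of Linguistics</h1>
<!-- The image alternative would need ALT text and rasterizes when zoomed:
     <img src="heading.png" alt="Department of Linguistics"> -->
```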
Right-to-Left Mark
A second relevant guideline is:
WCAG Guideline 1.3.2: When the sequence in which content is presented affects its meaning, a correct reading sequence can be programmatically determined. (Level A)
An important concept for RTL (right-to-left) languages is ensuring that text remains in logical order, so that characters are stored in their correct linear sequence even if they are presented "backwards" relative to the more common LTR order. WCAG therefore also recommends logical order and mentions the Unicode RLM (right-to-left mark) and LRM (left-to-right mark) characters.
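For reference, here are the two marks with their HTML character references (where exactly to place them depends on the surrounding text, so this is just the inventory):

```html
<!-- Two invisible directional marks from the Unicode standard -->
<!-- Right-to-Left Mark (RLM): U+200F, written &rlm; or &#x200F; -->
<!-- Left-to-Right Mark (LRM): U+200E, written &lrm; or &#x200E; -->
```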
A final i18n technology mandate of WCAG 2.0 is the use of language tags.
WCAG Guideline 3.1.1: Language of Page: The default human language of each Web page can be programmatically determined.
In other words, use language tags to identify page language. This is especially important for screen readers which need to switch pronunciation engines between languages.
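A minimal sketch of both page-level and inline tagging (the Ukrainian example page is hypothetical):

```html
<!-- Page-level default language -->
<html lang="uk">
<head><meta charset="utf-8"><title>Україна</title></head>
<body>
  <!-- Inline switch tells a screen reader to change pronunciation engines -->
  <p>Україна (<span lang="en">Ukraine</span>)</p>
</body>
</html>
```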
There have been several debates about the utility of WCAG 2.0, but I am reassured that at least the needs of multilingual users have been considered.
As you can see I haven't been posting here regularly. It's because I've been tied up with a11y (accessibility) including MathML.
However, I am happy to report that I was able to create an HTML5+MathML file that works in Internet Explorer AND Firefox/Safari (with some Unicode thrown in).
As a reward, I think I will write a Unicode post today.
The Unicode standard was just updated to version 6.1, and that means new blocks and characters.
Blocks added include Miao (a script developed for Hmong/Miao languages), Meroitic Hieroglyphs and Meroitic Cursive (adaptations of Egyptian hieroglyphs from ancient Meroë in what is now northern Sudan), and multiple scripts from India (Sora Sompeng, Chakma, Sharada, Takri).
Two new blocks for the Arabic script were also added: Arabic Mathematical Alphabetic Symbols and Arabic Extended-A. Extensions for the Sundanese and Meetei Mayek scripts were also added.
The Unicode Consortium has an index of which new characters have been added to different scripts.