Recently in Arabic Script Category
The Arabic computing industry has worked with a number of encoding schemes since the 1960s. The History of Arabic on Computers page lists a number of historic encodings from NCR-64 to ASMO 708 and Windows 1256.
My favorite might be an early 7-bit set which replaced the lower case English letters with Arabic letters (but kept the capital letters). As the article notes, this worked because "Some printers were not even capable of printing lower case English letters."
It's a good thing we've moved beyond that.
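To make the difference between two of the encodings named above concrete, here is a minimal Python sketch. It assumes Python's `iso8859_6` codec is a fair stand-in for ASMO 708 (ISO 8859-6 is based on it, as far as I know) and uses `cp1256` for Windows 1256. The sample word is my own choice.

```python
# Encode the same Arabic word with two legacy single-byte encodings.
# "iso8859_6" and "cp1256" are Python's standard codec names.
word = "سلام"

iso_bytes = word.encode("iso8859_6")  # ISO 8859-6 (assumed stand-in for ASMO 708)
win_bytes = word.encode("cp1256")     # Windows 1256

# Show the raw bytes side by side; some letters land on different values.
print(iso_bytes.hex(" "))
print(win_bytes.hex(" "))

# Decoding Windows 1256 bytes with the ISO table silently yields other letters.
misread = win_bytes.decode("iso8859_6", errors="replace")
print(misread == word)
```

The point of the sketch is only that the same text maps to different bytes under each scheme, which is why a file had to travel with knowledge of its encoding before Unicode.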
The Unicode standard was just updated to version 6.1, and that means new blocks and characters.
Blocks added included Miao (script developed for Hmong/Miao languages), Meroitic Hieroglyphs & Meroitic Cursive (adaptations of Egyptian hieroglyphs from ancient Meroë in what is now Northern Sudan) and multiple scripts from India (Sora Sompeng, Chakma, Sharada, Takri).
Two new blocks for the Arabic script were also added: Arabic Mathematical Alphabetic Symbols and Arabic Extended-A. Extensions for the Sundanese and Meetei Mayek scripts were also added.
The Unicode Consortium has an index of which new characters have been added to different scripts.
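If you want to check whether your own environment already knows the new characters, a quick sketch with Python's `unicodedata` module can help. The three sample code points are ones I picked from the new blocks for illustration, and the result depends on which Unicode version your Python build ships with.

```python
import unicodedata

# A few code points from blocks added in Unicode 6.1 (chosen for illustration).
NEW_BLOCK_SAMPLES = {
    "Arabic Extended-A": 0x08A0,
    "Arabic Mathematical Alphabetic Symbols": 0x1EE00,
    "Miao": 0x16F00,
}

def describe(code_point):
    """Return the character name, or None if this Python does not know it."""
    return unicodedata.name(chr(code_point), None)

# unidata_version reports the Unicode version of the bundled character database.
print("Unicode data version:", unicodedata.unidata_version)
for block, cp in NEW_BLOCK_SAMPLES.items():
    print(f"U+{cp:04X} ({block}): {describe(cp)}")
```

A `None` result would suggest the bundled character database predates 6.1. It says nothing about whether any installed font has a glyph for the character.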
The latest draft of the CSS3 Writing Modes module came out recently, and it includes revised specifications for how to handle vertical East Asian (CJK) text as well as specifications for RTL (right-to-left) text.
Although minimal support for RTL text has been around in recent years, vertical text remains a hurdle, partly because it's not clear which standards the vendors will agree to. The only browser I know of that supports a vertical text spec is Internet Explorer, but its layout specification was developed by Microsoft, and it does not appear that it is being adopted as is for CSS 3 (see proposed CSS 3 vertical properties for details). It also looks like a vertical text scheme for SVG is being deprecated.
Will vertical text be possible across platforms? Only time will tell.
African languages were in the news Unicode-wise because data for a number of African languages was added to the latest version of the Unicode Common Locale Data Repository (CLDR). The hope is that this repository will make it easier to provide African-language software implementations (e.g. a Windows implementation in Hausa) and data presentations (e.g. a calendar in Afar).
If you are interested in implementing content in African languages, there are a number of resources available depending on what your goals are:
These pages cover internationalization of African languages.
- Open Road African Language Guide
- African Network for Localisation - technical information and efforts to provide utilities for different languages
- PanAfrican Localisation project
- Bisharat Net - very technical information
- UCLA AFLANG Directory
- Languages of South Africa
- Cornell African Writing Systems
- Penn Languages of Africa Resources
A while back I wrote a blog entry asking how to define the boundary between glyph variant and calligraphic art. Today I ran into a case that I really think highlights how complex it is.
The Bismillah character is in the Arabic Presentation Forms-A block, and it visually jumped out at me because it was so complex in comparison to every other symbol. You can see the character below at 288 point, 36 point, and 14 point. My initial reaction? Wow, what a beauty.
[Images: the character at 36 point and 14 point]
The full name of the sign in the Unicode spec is "ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM" and it is assigned to code point U+FDFD. According to my research, this phrase translates to "In the name of God (Allah), Most Gracious, Most Merciful/Compassionate" (translations vary). It begins every chapter of the Koran (Qur'an) except one, is used in prayers, and apparently appears in other contexts, including the preambles of several constitutions in the Islamic world. Wikipedia has a good overview of the Bismillah/Basmala.
It has a deep spiritual meaning and this phrase has become the basis of many pieces of Arabic calligraphy. Since the phrase is so common in Islamic religion, it makes sense that a special sign may be needed.
Having said that, there are several technical challenges to consider. One is the complexity of the sign itself. As you can see in the images above, at 14 points, it looks almost like a piece of lace with none of the characters distinguishable. The structure is not really visible until the point size is in the 30s (headline size), and even then the size should be larger to gain full appreciation of its design. It is clearly not meant to be a simple logogram incorporated into a text.
More interestingly, there are many variations in what a calligraphic Bismillah looks like. You can see examples from Flickr User Said Bak, Islam 101 and elsewhere. Some look like circles, others like birds or fruits, and many are artistic lines. Based on the examples I've seen, the creation of new forms of Bismillah is a vibrant art form.
So the question is...with so many variations, which variation do you select for your font? It does look like there are standard forms (past masterpieces I am assuming). The font I used is PakType Naskh (from Pakistan) and the designers selected a semi-circular form.
At one level the technical challenge has been overcome, but it still does not address the question of information versus art. The symbol in the PakType font is beautiful, but will future generations think that the Bismillah should have a set form, or will the calligraphic tradition survive? And the tricky question: will there be variants encoded for archival purposes, or will there be one Unicode point with an infinite number of variations? I know I do not have the answer to that one.
I was comparing notes for the Arabic block and noticed some new additions for which I was getting the Unicode box of death (i.e. none of my fonts have those symbols).
Some of them are actually Arabic math symbols which were recently added. You can read about them in the proposal at http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3086-1.pdf. But of course I MUST find fonts to cover these extra symbols. Some of this can be handled by using different symbols when working with Arabic math text, but it's good to have a reference glyph.
It looks like the latest Unicode Symbols font has the Outlined White Star (5 points, rounded corners = U+269D).
An interesting conundrum is posed by arrows which are designated "LEFTWARDS" or "RIGHTWARDS". If I understand the proposal correctly, it appears that the conventions for which arrow is forwards or backwards would be reversed in Arabic, so mirroring conventions are needed when using mathematical arrows in an RTL language.
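A short Python sketch shows why arrows need separate handling. Unicode's bidi "mirrored" property covers characters like parentheses but not the basic arrows, so any swap has to be done explicitly. The `ARROW_SWAP` table and `mirror_arrows` function below are hypothetical names of my own and cover only two arrows.

```python
import unicodedata

# Hypothetical swap table: only the two basic arrows are covered here.
ARROW_SWAP = str.maketrans({"\u2190": "\u2192", "\u2192": "\u2190"})

def mirror_arrows(text):
    """Swap LEFTWARDS ARROW and RIGHTWARDS ARROW in the given text."""
    return text.translate(ARROW_SWAP)

# unicodedata.mirrored returns 1 for characters that bidi rendering flips
# automatically, and 0 for those it leaves alone.
print(unicodedata.mirrored("("))
print(unicodedata.mirrored("\u2192"))

# Because arrows are not mirrored automatically, swap them by hand.
print(mirror_arrows("a \u2192 b"))
```

Whether a given math renderer does this swap for you is something I have not verified; treat this as an illustration of the convention, not of any particular tool.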
Postscript - April 16, 2009
Still hunting down Arabic fonts for some of the Unicode 5 characters, but I did find a W3C page describing Arabic mathematical typesetting:
http://www.w3.org/TR/arabic-math/. Note that some of the code is still theoretical.
What better way to celebrate the 100th entry in this blog than with...a correction. It's a humble reminder that just because you know a lot about Unicode doesn't mean you can't mess up a crucial detail.
Way back in 2007, I posted an entry about generating Arabic (calligraphic) numbers in Microsoft Office (i.e. "١,٢,٣" vs. "1,2,3"). The entry noted that in Arabic, "Arabic number" actually means the Western digits 1,2,3 (called DIGIT ONE, DIGIT TWO,... in Unicode). The term for numbers like ١,٢,٣ is "Hindi number" in Arabic (or ARABIC-INDIC DIGIT ONE, ARABIC-INDIC DIGIT TWO... in Unicode).
But the numbers I displayed as "Hindi/Arabic" were actually the Devanagari numbers as used in India (e.g. १,२,३). In Unicode these are called DEVANAGARI DIGIT ONE, DEVANAGARI DIGIT TWO, and so on. Fortunately Eric Verlind pointed out the flaw, so I was able to correct the forms. Eric also pointed me to a Microsoft Digit Support page where I learned there are variations for Arabic, Persian and Urdu.
The learning never stops in Unicode world.
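For anyone who wants to avoid the same mix-up, the digit sets are easy to tell apart programmatically. The sketch below uses Python's `unicodedata` module. The `to_digits` helper is a hypothetical name of my own, and I am assuming the Persian/Urdu variants correspond to Unicode's EXTENDED ARABIC-INDIC digits.

```python
import unicodedata

WESTERN = "0123456789"
ARABIC_INDIC = "".join(chr(0x0660 + i) for i in range(10))
# Assumption: these are the digit forms used for Persian and Urdu.
EXTENDED_ARABIC_INDIC = "".join(chr(0x06F0 + i) for i in range(10))
DEVANAGARI = "".join(chr(0x0966 + i) for i in range(10))

def to_digits(text, target):
    """Replace Western digits in text with the matching digits from target."""
    return text.translate(str.maketrans(WESTERN, target))

# Print the digit "one" of each set with its Unicode name, plus the value
# Python reads from the digits one-two-three in that set.
for digits in (WESTERN, ARABIC_INDIC, EXTENDED_ARABIC_INDIC, DEVANAGARI):
    one = digits[1]
    print(one, unicodedata.name(one), int(digits[1:4]))
```

Checking the character name before publishing a table of digits would have caught my original error.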
One set of fonts I didn't have a chance to include in last week's font wrap-up is Arabic script fonts. When discussing Arabic script fonts, it's important to note which language or region a font has been designed for, because the requirements for writing Arabic versus Persian versus Urdu and so forth can vary (just as they do for English vs. German).
I should also comment that I am not an Arabic script typography expert, but these fonts are recommended by reliable sources. So with that warning:
- Arabeyes.Org Fonts - Many designs, primarily for the Arabic language. See Gallery of Unicode Fonts for font screenshots
- Microsoft True Type Open Pack - Windows only, but many designs and support for multiple languages
Urdu & Sindhi (Pakistan/India)
- Arabic SIL Fonts - Windows users download OTF; Mac Users download AAT
- Nafees Riqa and Nafees Nastaleeq (Center for Research in Urdu Language Processing)
- Urdu Nastaliq Unicode (Windows)
- Urdu Word Processing for Windows
- Free Urdu Unicode Fonts and Keyboards
Uighur and Central Asian Turkic
I also generally recommend the Gallery of Unicode Fonts, which is a Unicode font directory. The webmaster very helpfully splits many of the fonts into separate language pages.
The tutorial is technically titled Creating SVG Tiny Pages in Arabic, Hebrew and other Right-to-Left Scripts, but it actually provides an excellent explanation of how Unicode specifies text direction and how you need to encode both RTL (right to left) and LTR (left to right) runs in bidirectional (BIDI) text, such as a Middle Eastern text which includes European words.
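The direction information Unicode assigns to each character can be inspected directly. This is a minimal Python sketch using the standard `unicodedata.bidirectional` function. The sample string (Latin, Arabic, Hebrew and digits) and the `bidi_classes` helper are my own, and the sketch only lists character classes; it does not run the full bidirectional algorithm.

```python
import unicodedata

def bidi_classes(text):
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# Latin letters, an Arabic word, a Hebrew word, then Western digits.
sample = "SVG \u0639\u0631\u0628\u064A \u05E2\u05D1\u05E8\u05D9\u05EA 123"

# L = left-to-right, AL = Arabic letter, R = right-to-left,
# EN = European number, WS = whitespace.
for ch, cls in bidi_classes(sample):
    print(f"U+{ord(ch):04X} {cls}")
```

A renderer uses these classes to split mixed text into LTR and RTL runs, which is why explicit direction markup is only needed when the defaults give the wrong result.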
Unicode version 5.1 was recently released, and includes some new code blocks as well as new specifications. As with all new versions of Unicode, there will be a time lag until the new items can be incorporated into fonts and utilities, but here is a partial list of new items.
If you're interested in the new characters, the best place to view them is at http://www.unicode.org/charts/
New Plane 0 Scripts
- Cham (Cambodia/Vietnam)
- Kayah Li (Thailand/Myanmar)
- Lepcha (India)
- Ol Chiki/Santali (India)
- Rejang (Indonesia)
- Saurashtra (India)
- Sundanese (Indonesia)
- Vai (Liberia)
These blocks add characters to previously encoded scripts.
- Cyrillic Extended-A
- Cyrillic Extended-B
- Arabic - characters for math, 4 Qur'anic and multiple characters for different languages
- Indic - Malayalam, Tamil character sequences, Devanagari chandra a, Sanskrit sounds in Gurmukhi, Oriya, Telugu
- Latin - characters for minority languages and capital German sharp S (rare)
- Math Symbols
- Medievalist Punctuation - for research
- Myanmar Additions
New Plane 1 Ancient Scripts and Miscellaneous Symbols
- Carian (Anatolia/Turkey)
- Lycian (Anatolia/Turkey)
- Lydian (Anatolia/Turkey)
- Phaistos Disk (Crete)
- Domino Tile Symbols
- Mahjong Tile Symbols
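The Plane 0 versus Plane 1 distinction in the lists above is simple arithmetic on the code point. Here is a small Python sketch. The `plane` helper is a hypothetical name of my own, and the sample code points are the starting points of four of the listed blocks as I understand them.

```python
def plane(code_point):
    """Return the Unicode plane number (each plane holds 0x10000 code points)."""
    return code_point >> 16

# First code point of a few blocks from the lists above (for illustration).
PLANE_SAMPLES = {
    "Vai": 0xA500,
    "Cham": 0xAA00,
    "Carian": 0x102A0,
    "Mahjong Tiles": 0x1F000,
}

for name, cp in PLANE_SAMPLES.items():
    # Characters beyond Plane 0 need two 16-bit units (a surrogate pair) in UTF-16.
    utf16_units = len(chr(cp).encode("utf-16-be")) // 2
    print(f"{name}: U+{cp:04X} plane {plane(cp)}, {utf16_units} UTF-16 unit(s)")
```

This matters in practice because older software that assumes one 16-bit unit per character tends to handle the Plane 0 additions long before it handles the Plane 1 ones.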