Recently in Arabic Script Category
A while back I wrote a blog entry asking how to define the boundary between glyph variant and calligraphic art. Today I ran into a case that I really think highlights how complex it is.
Bismallah character
The Bismallah character is in the Arabic Presentation Forms-A block, and it visually jumped out at me because it was so complex in comparison to every other symbol. You can see the character below at 288 point, 36 point, and 14 point. My initial reaction? Wow, what a beauty.
[Images: the character shown at 288 point, 36 point, and 14 point]
Meaning
The full name of the sign in the Unicode spec is "ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM" and it is assigned to code point U+FDFD. According to my research, this phrase translates to "In the name of God (Allah), Most Gracious, Most Merciful/Compassionate" (translations vary). It begins every chapter of the Koran (Qur'an) except one, is used in prayers, and apparently appears in other contexts, including the preambles of several constitutions in the Islamic world. Wikipedia has a good overview of the Bismallah/Basmala.
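If you want to check the name and code point yourself, here is a minimal sketch using Python's standard `unicodedata` module. Python is just my choice for illustration; any tool that exposes the Unicode character database should report the same thing.

```python
import unicodedata

bismillah = "\uFDFD"  # the single code point discussed above

# Look up the official character name and format the code point as U+XXXX.
name = unicodedata.name(bismillah)
code_point = f"U+{ord(bismillah):04X}"

print(code_point, name)
# The whole phrase is stored as one code point, however elaborate the glyph.
print(len(bismillah))
```

The point of the last line is that the "ligature" is one character as far as the text is concerned; all of the complexity lives in the font.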
It has a deep spiritual meaning and this phrase has become the basis of many pieces of Arabic calligraphy. Since the phrase is so common in Islamic religion, it makes sense that a special sign may be needed.
Technical Challenges
Having said that, there are several technical challenges to consider. One is the complexity of the sign itself. As you can see in the images above, at 14 points it looks almost like a piece of lace with none of the characters distinguishable. The structure is not really visible until the point size is in the 30s (headline size), and even then the size should be larger to gain full appreciation of its design. It is clearly not meant to be a simple logogram incorporated into a text.
More interestingly, there are many variations in what a calligraphic Bismallah looks like. You can see examples from Flickr User Said Bak, Islam 101 and elsewhere. Some look like circles, others like birds or fruits, and many are artistic lines. Based on the examples I've seen, the creation of new forms of Bismallah is a vibrant art form.
So the question is...with so many variations, which variation do you select for your font? It does look like there are standard forms (past masterpieces I am assuming). The font I used is PakType Naskh (from Pakistan) and the designers selected a semi-circular form.
At one level the technical challenge has been overcome, but it still does not address the question of information versus art. The symbol in the PakType font is beautiful, but will future generations think that the Bismallah should have a set form, or will the calligraphic tradition survive? And the tricky question - will there be variants encoded for archival purposes, or will there be one Unicode point with an infinite number of variations? I know I do not have the answer to that one.
I was comparing notes for the Arabic block and noticed some new additions for which I was getting the Unicode box of death (i.e. none of my fonts have those symbols).
Some of them are actually Arabic math symbols which were recently added. You can read about them in the proposal at http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3086-1.pdf. But of course I MUST find fonts to cover these extra symbols. Some of this can be handled by using different symbols when working with Arabic math text, but it's good to have a reference glyph.
It looks like the latest Unicode Symbols font has the Outlined White Star (5 points, rounded corners = U+269D).
An interesting conundrum is the set of arrows designated "LEFTWARDS" or "RIGHTWARDS". If I understand the proposal correctly, the conventions for which arrow is forwards or backwards are reversed in Arabic, so mirroring conventions are needed when using mathematical arrows in an RTL language.
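Part of why this needs a convention: Unicode marks some characters (like parentheses) as "mirrored" so that they flip automatically in right-to-left text, but the basic arrows do not carry that property. Here is a small sketch with Python's standard `unicodedata` module to see the difference; the sample characters are my own picks for illustration.

```python
import unicodedata

# unicodedata.mirrored() returns 1 if the character has the Bidi_Mirrored
# property (its glyph is flipped automatically in RTL text), otherwise 0.
samples = {
    "LEFT PARENTHESIS": "(",
    "LESS-THAN SIGN": "<",
    "LEFTWARDS ARROW": "\u2190",
    "RIGHTWARDS ARROW": "\u2192",
}

mirrored = {label: unicodedata.mirrored(ch) for label, ch in samples.items()}

for label, flag in mirrored.items():
    print(f"{label}: {'mirrored' if flag else 'not mirrored'}")
```

Since the arrows are not flipped for you, an Arabic math document has to pick the arrow that points the right way explicitly.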
Postscript - April 16, 2009
Still hunting down Arabic fonts for some of the Unicode 5 characters, but I did find a W3C page describing Arabic mathematical typesetting.
http://www.w3.org/TR/arabic-math/. Note that some of the code is still theoretical.
What better way to celebrate the 100th entry in this blog than with...a correction. It's a humble reminder that just because you know a lot about Unicode doesn't mean you can't mess up a crucial detail.
Way back in 2007, I posted an entry about generating Arabic (calligraphic) numbers in Microsoft Office (i.e. "١,٢,٣" vs. "1,2,3"). The entry noted that in Arabic, "Arabic number" actually means the Western forms (1,2,3), which Unicode calls DIGIT ONE, DIGIT TWO, and so on. The term for numbers like ١,٢,٣ is "Hindi number" in Arabic (or ARABIC-INDIC DIGIT ONE, ARABIC-INDIC DIGIT TWO... in Unicode).
But the numbers I displayed as "Hindi/Arabic" were actually the Devanagari numbers as used in India (e.g. १,२,३). In Unicode these are called DEVANAGARI DIGIT ONE, DEVANAGARI DIGIT TWO, and so on. Fortunately Eric Verlind pointed out the flaw, so I was able to correct the forms. Eric also pointed me to a Microsoft Digit Support page where I learned there are variations for Arabic, Persian and Urdu.
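A quick way to avoid repeating my mistake is to ask the Unicode database what a digit is actually called. This is a minimal sketch using Python's standard `unicodedata` module; the fourth set (EXTENDED ARABIC-INDIC) is the one used for Persian and Urdu, which I am including as an illustration of the regional variation.

```python
import unicodedata

# The value "one" in four different digit sets.
ones = ["1", "\u0661", "\u06F1", "\u0967"]

for ch in ones:
    # Print the code point, the official name, and the numeric value.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.digit(ch))

names = [unicodedata.name(ch) for ch in ones]
values = [unicodedata.digit(ch) for ch in ones]
```

All four have the numeric value 1; only the names (and glyphs) tell you which script you are really looking at.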
The learning never stops in Unicode world.
One set of fonts I didn't have a chance to include in last week's font wrap is Arabic script fonts. When discussing Arabic script fonts, it's important to note which language or region a font has been designed for, because the requirements for writing Arabic versus Persian versus Urdu and so forth can vary (just as they do for English vs. German).
I should also comment that I am not an Arabic script typography expert, but these fonts are recommended by reliable sources. So with that warning:
Arabic
- Arabeyes.Org Fonts - Many designs, primarily for the Arabic language. See Gallery of Unicode Fonts for font screenshots
- Microsoft True Type Open Pack - Windows only, but many designs and support for multiple languages
Persian (Iran)
Urdu & Sindhi (Pakistan/India)
- Arabic SIL Fonts - Windows users download OTF; Mac Users download AAT
- Nafees Riqa and Nafees Nastaleeq (Center for Research in Urdu Language Processing)
- Urdu Nastaliq Unicode (Windows)
- Urdu Word Processing for Windows
- Free Urdu Unicode Fonts and Keyboards
Pashto (Afghanistan)
Uighur and Central Asian Turkic
I also generally recommend Gallery of Unicode fonts which is a Unicode font directory. The Web master very helpfully splits many of the fonts into separate language pages.
The tutorial is technically titled Creating SVG Tiny Pages in Arabic, Hebrew and other Right-to-Left Scripts, but it actually provides an excellent explanation of how Unicode specifies text direction, and of how a Middle Eastern text which includes European words must encode both RTL (right to left) and LTR (left to right) runs, i.e. BIDI (bidirectional) text.
Unicode version 5.1 was recently released, and includes some new code blocks as well as new specifications. As with all new versions of Unicode there will be a time lag until the new items can be incorporated into fonts and utilities, but here is a partial list of new items.
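To make the direction idea a little more concrete: every character carries a bidirectional class in the Unicode database, and the layout engine uses those classes to decide which way each run flows. Here is a minimal sketch using Python's standard `unicodedata` module; the sample characters are my own choices, not taken from the tutorial.

```python
import unicodedata

samples = {
    "Latin a": "a",
    "Hebrew alef": "\u05D0",
    "Arabic alef": "\u0627",
    "Western digit 1": "1",
    "Arabic-Indic digit 1": "\u0661",
}

# L = left-to-right, R = right-to-left, AL = Arabic letter (right-to-left),
# EN = European number, AN = Arabic number.
classes = {label: unicodedata.bidirectional(ch) for label, ch in samples.items()}

for label, bidi_class in classes.items():
    print(f"{label}: {bidi_class}")
```

Note that digits get their own classes, which is how numbers can still read left to right inside a right-to-left sentence.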
If you're interested in the new characters, the best place to view them is at http://www.unicode.org/charts/
New Plane 0 Scripts
- Cham (Cambodia/Vietnam)
- Kayah Li (Thailand/Myanmar)
- Lepcha (India)
- Ol Chiki/Santali (India)
- Rejang (Indonesia)
- Saurashtra (India)
- Sundanese (Indonesia)
- Vai (Liberia)
Script Extensions
These blocks add characters to previously encoded scripts.
- Cyrillic Extended-A
- Cyrillic Extended-B
- Arabic - characters for math, 4 Qur'anic and multiple characters for different languages
- Indic - Malayalam, Tamil character sequences, Devanagari chandra a, Sanskrit sounds in Gurmukhi, Oriya, Telugu
- Latin - characters for minority languages and capital German sharp S (rare)
- Math Symbols
- Medievalist Punctuation - for research
- Myanmar Additions
New Plane 1 Ancient Scripts and Miscellaneous Symbols
- Carian (Anatolia/Turkey)
- Lycian (Anatolia/Turkey)
- Lydian (Anatolia/Turkey)
- Phaistos Disk (Crete)
- Domino Tile Symbols
- Mahjong Tile Symbols
Note: The number forms were corrected on March 9, 2009
As I mentioned in my previous entry "Formatting Arabic Numbers", most Arabic documents include Western style "straight" numbers like 1,2,3 by default instead of "curly" Middle Eastern numbers like ١,٢,٣, but you can configure Word to generate the correct numbers.
FYI - The curly (or calligraphic) forms are actually called "Hindi numbers" in Arabic, while "Arabic" numbers refer to the straight Western style (vs. older Roman numbers like I,II,III).
Note: The "Hindi" numbers used in India (i.e. Devanagari) do not match the "Hindi" forms used in Arabic writing. (Thanks to Eric Verlind for pointing this out).
Word 2007 (thanks to Katia Zakharia for details)
- Make sure you have activated an appropriate Arabic, Persian or other regional keyboard in the Windows Control Panel
- Open Word 2007, then click the circular Office icon in the upper left.

- In the new window, click the Word Options button in the lower right corner.
- Click Advanced in the left menu.
- Scroll to the Show document content section then look for the Numeral menu.
- Choose Context in the Numeral menu, then close the window.
Note: Do not choose "Hindi" as your option unless you want this style in all documents (including English).
- In the Word document, when you switch to an Arabic keyboard, numbers will be in the Hindi style.
Word 2003 for Windows
Instructions are available from http://www.uga.edu/islam/arabic_windows.html. Scroll to section 8c.
Macintosh NeoOffice (from their support forum)
A similar option is available in the free open source NeoOffice package.
- Open NeoOffice, then click Preferences in the NeoOffice menu.
- In the Preferences panel, click the arrow to the left of Language Settings to view additional options. Click the Languages link.
- Check the option for Enabled for complex text layout. A new link called Complex Text Layout appears on the left.
- Click the new Complex Text Layout link on the left.
- In the Numerals menu, select Hindi.
Macintosh Word 2004
I am not aware of a similar tool in Word 2004 for the Mac. I was able to create some AutoCorrect text which replaces "\3\" with ٣. The only other option is to tweak the Region settings in System Preferences, but that affects every application.
Someone elsewhere posted an interesting question I hadn't pondered yet - why can you switch a keyboard to Arabic, Hindi or Japanese, but still end up with Western numbers?
Example Numbers
* Western (Latin or "Arabic" in Arabic) - 0,1,2,3,4,5,6,7,8,9
* Arabic (or "Hindi" in Arabic) - ٠,١,٢,٣,٤,٥,٦,٧,٨,٩ (added March 2009)
* Hindi (actually Devanagari) - ०,१,२,३,४,५,६,७,८,९
Part of the answer is that Western numbers have become a true global standard. According to this Arabeyes.Org forum post from Arfeen Serajul, many western Arabic-speaking countries like Morocco ONLY use Western numbers and are unfamiliar with what we call "Arabic numbers" (Arabic speakers call them "Hindi numbers").
But...the other part of the answer is that the numbers are really numbers. If you input numbers into a spreadsheet like Excel, you want all the calculations to be accurate. From a computing point of view, you have one number, but a variety of options for how you want to display it (with Western as the default in the U.S.).
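You can see the "one number, many displays" idea directly in code. This is a minimal Python sketch (Python is my choice here, not something Excel or Word uses): the built-in `int()` accepts decimal digits from any script and returns the same underlying value.

```python
western = "123"
arabic_indic = "\u0661\u0662\u0663"  # ١٢٣
devanagari = "\u0967\u0968\u0969"    # १२३

# int() understands Unicode decimal digits, so all three parse to one value.
values = [int(s) for s in (western, arabic_indic, devanagari)]
print(values)

# Arithmetic works on the values, regardless of how they were written.
total = int(arabic_indic) + int(devanagari)
print(total)
```

The glyphs differ, but once parsed there is only the number 123, and how it is shown afterwards is a display decision.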
So, to get non-Western numbers, you typically have to go into the Region settings, not the keyboard settings. Here are some sample instructions for Microsoft Windows. The big gotcha (and it's a doozy) is that you often change the number display setting ACROSS THE ENTIRE OPERATING SYSTEM.
I did experiment with displaying Arabic (Hindi) numbers, but ended up seeing them everywhere, even in English Web sites. Just a tad disorienting.
If you do need to display non-Western numbers, I would recommend doing it in Word only (there are some options). It's still tricky though - I had to do an AutoCorrect hack in one case (e.g. \1 = १). I think I missed a step somewhere....
I just found this 2004 presentation on Persian language support from Behdad Esfahbod at
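Outside of Word, the same kind of per-document swap can be done with a simple substitution table instead of changing the whole operating system. Here is a rough Python sketch of the idea; the function name `display_arabic_indic` is hypothetical, something I made up for this example.

```python
WESTERN = "0123456789"
# Build ٠ through ٩ from the Arabic-Indic digit range U+0660..U+0669.
ARABIC_INDIC = "".join(chr(0x0660 + i) for i in range(10))

# Translation table mapping each Western digit to its Arabic-Indic form.
TO_ARABIC_INDIC = str.maketrans(WESTERN, ARABIC_INDIC)


def display_arabic_indic(text: str) -> str:
    """Return text with Western digits swapped for Arabic-Indic digits."""
    return text.translate(TO_ARABIC_INDIC)


sample = display_arabic_indic("Page 42")
print(sample)
```

This only changes how the digits look in that one string, which is the opposite of the Region setting approach that changes every application at once.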
http://behdad.org/download/Publications/persiancomputing/a007.pdf
Interestingly, they seemed to have surrendered to the generic Tahoma font (although maybe things have improved since then).
Arabic is enough of a challenge to work with on a computing level because it's right to left and has special ligature forms for when certain letters come together (plus consonant forms change depending on whether the letter is at the beginning, middle or end of a word).
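The positional forms are easy to see in the Unicode database, which lists the isolated, final, initial and medial shapes of a letter as compatibility characters. Here is a minimal sketch with Python's standard `unicodedata` module, using the letter BEH (U+0628) as my example; normally you type only U+0628 and the font picks the shape.

```python
import unicodedata

# Presentation forms of ARABIC LETTER BEH (U+0628), U+FE8F through U+FE92.
forms = {}
for cp in range(0xFE8F, 0xFE93):
    ch = chr(cp)
    # The decomposition names the position and points back to the base letter.
    forms[unicodedata.name(ch)] = unicodedata.decomposition(ch)
    print(f"U+{cp:04X}", unicodedata.name(ch), unicodedata.decomposition(ch))

# Compatibility normalization folds a positional form back to the base letter.
base = unicodedata.normalize("NFKC", "\uFE91")
print(f"U+{ord(base):04X}")
```

Four shapes, one letter: the text stores the letter, and the shaping is left to the rendering software.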
But wait until you hit Urdu and Persian! Now you have to work with letters not found in Arabic and a different form of calligraphy. Although modern Arabic text is based on Naskh writing, Persian and Urdu prefer Nastaliq writing.
There's actually a nice picture from Wikipedia at
http://en.wikipedia.org/wiki/Naskh_%28script%29
The lesson for me was that every language seems to need its own special support even if its script is already "covered."
