Software and Unicode: December 2010 Archives

Formatting Ordered Lists

A topic receiving some attention in the CSS specs is how to format ordered lists across different numbering systems. Not all of them are supported in every browser, but a wide range are, so I thought I would present some test data.

If your browser does not support a particular list type, you will see something like "1,2,3" as bullets for the list items. If your browser supports a list type but is missing a font, you may see some Unicode "question marks of death", a sign that you need to go find a font for that glyph.

Note: The test data is not complete, so an untested type may still be supported in some browsers.
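For readers who want to run the same kind of test locally, here is a minimal Python sketch that writes out a test page with one ordered list per style. The function name `build_test_page` and the `STYLES` tuple are my own illustration, not part of any tool mentioned here; the page starts with a DOCTYPE because some browsers need one before they honor the newer styles.

```python
from html import escape

# A few list-style-type values named in this post; extend the tuple to test more.
STYLES = ("decimal", "upper-alpha", "lower-roman", "georgian", "hebrew", "thai")


def build_test_page(styles=STYLES, items=3):
    """Return an HTML page with one ordered list per list-style-type value."""
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        "<title>list-style-type test</title></head><body>",
    ]
    for style in styles:
        name = escape(style)
        parts.append(f"<h2>list-style-type:{name}</h2>")
        parts.append(f'<ol style="list-style-type:{name}">')
        parts.extend(f"  <li>Item {i}</li>" for i in range(1, items + 1))
        parts.append("</ol>")
    parts.append("</body></html>")
    return "\n".join(parts)


if __name__ == "__main__":
    # Redirect the output to a file and open it in each browser under test.
    print(build_test_page())
```

If a browser does not know a style, its list in the generated page should fall back to plain decimal numbers, which makes unsupported styles easy to spot.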

Supported in all browsers

Numeric
list-style-type:decimal
Capital Alphabetical
list-style-type:upper-alpha
Lower Alphabetical
list-style-type:lower-alpha
  1. Item 1
  2. Item 2
  3. Item 3
  1. Item 1
  2. Item 2
  3. Item 3
  1. Item 1
  2. Item 2
  3. Item 3

Capital Roman
list-style-type:upper-roman
Lower Roman
list-style-type:lower-roman
  1. Item 1
  2. Item 2
  3. Item 3
  1. Item 1
  2. Item 2
  3. Item 3
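As a rough illustration of what the alphabetic and Roman styles above are expected to produce, here is a small Python sketch of the marker arithmetic. The function names are hypothetical, and the Roman version only covers 1 to 3999; this is an approximation for checking test output, not a browser's actual implementation.

```python
def alpha_marker(n, upper=True):
    """Marker for upper-alpha / lower-alpha: A..Z, then AA, AB, and so on."""
    if n < 1:
        raise ValueError("alphabetic markers start at 1")
    letters = []
    while n > 0:
        # Bijective base 26: there is no zero digit, so shift by one each round.
        n, rem = divmod(n - 1, 26)
        letters.append(chr(ord("A") + rem))
    text = "".join(reversed(letters))
    return text if upper else text.lower()


ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def roman_marker(n, upper=True):
    """Marker for upper-roman / lower-roman, limited here to 1..3999."""
    if not 1 <= n <= 3999:
        raise ValueError("this sketch only handles 1..3999")
    out = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)
        out.append(symbol * count)
    text = "".join(out)
    return text if upper else text.lower()


if __name__ == "__main__":
    for i in (1, 2, 3, 4, 26, 27):
        print(i, alpha_marker(i), roman_marker(i, upper=False))
```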

CSS 2: Firefox, Safari, Opera, Internet Explorer 8

  • These are supported in Firefox/Safari.
  • They are also supported in Internet Explorer 8, but a DOCTYPE statement must be included.
Leading Zero
list-style-type:decimal-leading-zero
Armenian
list-style-type:armenian
Georgian
list-style-type:georgian
Lower Greek
list-style-type:lower-greek
  1. Item 1
  2. Item 2
  3. Item 3
  1. Item 1
  2. Item 2
  3. Item 3
  1. Item 1
  2. Item 2
  3. Item 3
  1. Item 1
  2. Item 2
  3. Item 3

Only in Firefox, Safari

Other list styles are being proposed as well. The following styles are found in Dreamweaver CS5 and are supported in recent versions of Firefox and Safari.

Hebrew
list-style-type:hebrew
Katakana
list-style-type:katakana
Hiragana
list-style-type:hiragana
Hiragana-Iroha
list-style-type:hiragana-iroha
CJK Numbers
list-style-type:cjk-ideographic
  1. Item 1
  2. Item 2
  3. Item 3
  1. Item 1
  2. Item 2
  3. Item 3
  1. Item 1
  2. Item 2
  3. Item 3
  1. Item 1
  2. Item 2
  3. Item 3
  1. Item 1
  2. Item 2
  3. Item 3

 

Only in Safari

CSS 3 includes many more specifications, particularly for Asian languages. Some, like the ones below, are supported in recent versions of Safari. For a complete list of proposed specifications, see the W3C Specification for CSS 3 Lists.

Arabic-Indic
list-style-type:arabic-indic
Devanagari
list-style-type:devanagari
Thai
list-style-type:thai
Bengali
list-style-type:bengali
Gujarati
list-style-type:gujarati
  1. Item 1
  2. Item 2
  3. Item 3
  1. Item 1
  2. Item 2
  3. Item 3
  1. Item 1
  2. Item 2
  3. Item 3
  1. Item 1
  2. Item 2
  3. Item 3
  1. Item 1
  2. Item 2
  3. Item 3

Gurmukhi
list-style-type:gurmukhi
Kannada
list-style-type:kannada
Lao
list-style-type:lao
Malayalam
list-style-type:malayalam
Mongolian
list-style-type:mongolian

  1. Item 1
  2. Item 2
  3. Item 3
  1. Item 1
  2. Item 2
  3. Item 3
  1. Item 1
  2. Item 2
  3. Item 3
  1. Item 1
  2. Item 2
  3. Item 3
  1. Item 1
  2. Item 2
  3. Item 3

Myanmar
list-style-type:myanmar
Persian
list-style-type:persian
Telugu
list-style-type:telugu
Tibetan
list-style-type:tibetan
  1. Item 1
  2. Item 2
  3. Item 3
  1. Item 1
  2. Item 2
  3. Item 3
  1. Item 1
  2. Item 2
  3. Item 3
  1. Item 1
  2. Item 2
  3. Item 3
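As far as I can tell, the styles in this section all work the same way: the number is written in ordinary base 10, but each digit is swapped for the matching digit of the script. Here is a minimal Python sketch of that idea, useful for checking what a supporting browser should display. The `ZERO` table and the function name `digit_marker` are my own; the code points are the Unicode digit zero of each script.

```python
import unicodedata

# Code point of the digit zero in each script; digits 1-9 follow it in order.
ZERO = {
    "arabic-indic": 0x0660,
    "persian": 0x06F0,
    "devanagari": 0x0966,
    "bengali": 0x09E6,
    "gurmukhi": 0x0A66,
    "gujarati": 0x0AE6,
    "telugu": 0x0C66,
    "kannada": 0x0CE6,
    "malayalam": 0x0D66,
    "thai": 0x0E50,
    "lao": 0x0ED0,
    "tibetan": 0x0F20,
    "myanmar": 0x1040,
    "mongolian": 0x1810,
}


def digit_marker(n, style):
    """Write a non-negative integer using the decimal digits of the given style."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative numbers")
    zero = ZERO[style]
    return "".join(chr(zero + int(d)) for d in str(n))


if __name__ == "__main__":
    for style in sorted(ZERO):
        markers = ", ".join(digit_marker(i, style) for i in (1, 2, 3, 10))
        # unicodedata.name confirms which script the zero digit belongs to.
        print(f"{style:13} {markers}   ({unicodedata.name(chr(ZERO[style]))})")
```

If the printed digits show up as boxes or question marks, that points to a missing font rather than a problem with the numbers themselves.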

Turnitin, Plagiarism and spotting Cyrillic Е for E


An interesting Unicode tidbit came up when I was reviewing some literature from Turnitin.com. One article from Turnitin discusses attempted tricks to circumvent Turnitin and their countermeasures. It won't surprise Unicode experts that one trick is to replace the Latin alphabet E (i.e. our E) with a Cyrillic Е or a Greek Ε. The technical term for this is visual spoofing.

In theory, the instructor will see "thе" as "the", but the underlying text will be different enough that it is NOT flagged. Apparently Turnitin has seen this and offers its countermeasure.

One trick is to replace a common character like "e" throughout the text of their paper with a foreign language character that looks like an "e" but is actually different (for example, a Cyrillic "e"). This method does not work because our algorithms replace such characters with the corresponding standard English character. The special character will still appear in the Originality Report; however, the word it is in will have been matched against words containing every character that looks like that character. This allows us to show you matches to words with both the special character and the standard character.
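The general idea in that quote, replacing lookalike characters with their standard English counterparts before matching, can be sketched in a few lines of Python. This is not Turnitin's actual algorithm, which I have not seen; the `LOOKALIKES` table is a tiny hand-picked sample of my own, and a real system would need a far larger one.

```python
# Hypothetical, hand-picked lookalike table: spoofing character -> Latin letter.
LOOKALIKES = {
    "\u0415": "E",  # CYRILLIC CAPITAL LETTER IE
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u0395": "E",  # GREEK CAPITAL LETTER EPSILON
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u043E": "o",  # CYRILLIC SMALL LETTER O
    "\u03BF": "o",  # GREEK SMALL LETTER OMICRON
}

_TABLE = str.maketrans(LOOKALIKES)


def fold_lookalikes(text):
    """Return text with known lookalike characters replaced by Latin letters."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    spoofed = "th\u0435 qu\u0435\u0435n"  # Cyrillic ie in place of Latin e
    print(spoofed == "the queen")                   # False: the strings differ
    print(fold_lookalikes(spoofed) == "the queen")  # True after folding
```

Matching would then run on the folded text, while the original text stays available for display.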

Checking Outside of Turnitin

I'm relieved that Turnitin is on top of this, but instructors who aren't using the service can also use some tricks to spot Cyrillic/Greek letters masquerading as letters of the English alphabet. Namely:

  1. Use spell checking to find errors in words that look correctly spelled.
  2. Switch to a decorative font which does NOT contain Greek and Cyrillic letters.

One trick is to turn on the visual spell check feature in Word and other tools. This is the one that puts red wavy lines under a misspelling. For instance, the image below shows three versions of "Elizabeth" with Latin, Cyrillic and Greek E's. They look alike, but only one is free from the red squiggly underline: the one with the English E. The others are hiding Greek and Cyrillic E's and thus triggering notifications from the Microsoft Word spell check.

3 versions of Elizabeth, only center one is NOT underlined

Another trick is to switch to an unusual font. Common fonts like Times New Roman, Arial and Verdana contain Greek and Cyrillic characters, but a lot of decorative fonts are missing them. In some cases the Greek/Cyrillic E's visible in one font will be converted to a box/question mark or other weird symbol indicating that the system isn't processing the character like an English letter.

Elizabeth with Russian E with E replaced by box

In some cases though, the non-English E may be rendered in a similar font which contains that character. That's what's happening in this Comic-Sans example. If you look closely though, you will see that only the center capital E is rounded like the other letters. The non-English E's in the other two are straight up and down because they are in another font.

3 versions of Elizabeth in Comic Sans, only center one has a rounded E

There are other similar tests you can perform but the upshot is that most technologies are not really set up to integrate Cyrillic/Greek text with English text, and for the savvy instructor looking for spoofing, this is a good thing.
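One such test can be automated: flag any word that mixes Latin letters with letters from another script. The Python sketch below is a rough heuristic of my own, not a tool described above. It uses the first word of each character's Unicode name ("LATIN", "CYRILLIC", "GREEK") as a stand-in for the script, which is an approximation rather than the real Unicode Script property.

```python
import re
import unicodedata


def scripts_in(word):
    """Return the set of script labels for the letters in a word.

    The label is the first word of the Unicode character name,
    e.g. 'LATIN' for 'E' and 'CYRILLIC' for U+0415.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in word
        if ch.isalpha()
    }


def suspicious_words(text):
    """List (word, scripts) pairs for words mixing Latin with another script."""
    found = []
    for word in re.findall(r"\w+", text):
        scripts = scripts_in(word)
        if "LATIN" in scripts and len(scripts) > 1:
            found.append((word, sorted(scripts)))
    return found


if __name__ == "__main__":
    sample = "Elizabeth \u0415lizabeth \u0395lizabeth"
    for word, scripts in suspicious_words(sample):
        print(word, scripts)
```

A word written entirely in Cyrillic or Greek is left alone, so legitimate Russian or Greek quotations in a paper should not be flagged.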

About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage (ejp10@psu.edu) for a profile.

Comments

The standard commenting utility has been disabled due to hungry spam. If you have a comment, please feel free to drop me a line at (ejp10@psu.edu).
