Turnitin, Plagiarism and spotting Cyrillic Е for E


An interesting Unicode tidbit came up when I was reviewing some literature from Turnitin.com. One article from Turnitin discusses tricks that students attempt in order to circumvent Turnitin, along with the company's countermeasures. It won't surprise Unicode experts that one trick is to replace the Latin alphabet E (i.e. our E) with a Cyrillic Е or a Greek Ε. The technical term for this is visual spoofing.
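To make the distinction concrete, here is a minimal Python sketch that prints the code point and Unicode name of each of the three look-alike capital E's. The helper name `describe` is made up for this illustration; `unicodedata` is part of the Python standard library.

```python
import unicodedata

def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# Latin E, Cyrillic Е and Greek Ε, written as escapes so they cannot be confused
for ch in ["E", "\u0415", "\u0395"]:
    print(describe(ch))
# U+0045 LATIN CAPITAL LETTER E
# U+0415 CYRILLIC CAPITAL LETTER IE
# U+0395 GREEK CAPITAL LETTER EPSILON
```

The three characters may be drawn identically on screen, but to the computer they are three unrelated code points, which is why a plain text comparison treats the spoofed word as a different word.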

In theory, the instructor will see "thе" as "the", but the spoofed word is actually different enough that it will NOT be flagged as a match. Apparently Turnitin has seen this trick and describes its countermeasure:

One trick is to replace a common character like "e" throughout the text of their paper with a foreign language character that looks like an "e" but is actually different (for example, a Cyrillic "e"). This method does not work because our algorithms replace such characters with the corresponding standard English character. The special character will still appear in the Originality Report; however, the word it is in will have been matched against words containing every character that looks like that character. This allows us to show you matches to words with both the special character and the standard character.
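Turnitin does not publish its algorithm, so the following is only a rough sketch of the general idea described in the quote: fold look-alike characters to their standard English counterparts before comparing words, while leaving the original text untouched for display. The mapping table and the names `CONFUSABLES`, `fold_lookalikes` and `same_word` are assumptions made for illustration; a real system would need a much larger table.

```python
# Hypothetical, hand-picked table of look-alikes (NOT Turnitin's actual data)
CONFUSABLES = {
    "\u0415": "E",  # Cyrillic capital IE
    "\u0435": "e",  # Cyrillic small IE
    "\u0395": "E",  # Greek capital epsilon
}
FOLD_TABLE = str.maketrans(CONFUSABLES)

def fold_lookalikes(text):
    """Return a copy of text with known look-alikes replaced by Latin letters."""
    return text.translate(FOLD_TABLE)

def same_word(a, b):
    """Compare two words after folding, so spoofed spellings still match."""
    return fold_lookalikes(a) == fold_lookalikes(b)

spoofed = "th\u0435"              # "the" typed with a Cyrillic е
print(spoofed == "the")           # False: the raw strings differ
print(same_word(spoofed, "the"))  # True: they match once folded
```

Because the folding is applied only for the comparison, the original spoofed character can still be shown in a report, which is consistent with what the quote describes.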

Checking Outside of Turnitin

I'm relieved that Turnitin is on top of this, but there are some tricks for instructors who aren't using the service to spot Cyrillic/Greek letters masquerading as the English alphabet. Namely:

  1. Use a spell checker to find errors in words that look correctly spelled.
  2. Switch to a decorative font which does NOT contain Greek and Cyrillic letters.

One trick is to turn on the visual spell check feature in Word and other tools. This is the one that puts red wavy lines under a misspelling. For instance, the image below shows three versions of "Elizabeth" with Latin, Cyrillic and Greek E's. They look alike, but only one is free from the red squiggly underline: the one with the English E. The others are hiding Greek and Cyrillic E's and thus trigger notifications from the Microsoft Word spell check.

3 versions of Elizabeth, only center one is NOT underlined

Another trick is to switch to an unusual font. Common fonts like Times New Roman, Arial and Verdana contain Greek and Cyrillic characters, but a lot of decorative fonts are missing them. In some cases the Greek/Cyrillic E's visible in one font will be converted to a box/question mark or other weird symbol indicating that the system isn't processing the character like an English letter.

Elizabeth with a Russian E, where the E is replaced by a box

In some cases, though, the non-English E may be rendered in a similar font which does contain that character. That's what's happening in this Comic Sans example. If you look closely, you will see that only the center capital E is rounded like the other letters. The non-English E's in the other two are straight up and down because they are in another font.

3 versions of Elizabeth in Comic Sans, only center one has a rounded E

There are other similar tests you can perform but the upshot is that most technologies are not really set up to integrate Cyrillic/Greek text with English text, and for the savvy instructor looking for spoofing, this is a good thing.
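One such test can be scripted. The sketch below flags any word whose letters come from more than one script, which is roughly what the spell checker is catching by accident. It guesses the script from the first word of each character's Unicode name, which is only a heuristic (the Python standard library does not expose the real Unicode script property), and the function names are invented for this example.

```python
import re
import unicodedata

def script_of(ch):
    """Rough script label: the first word of the Unicode character name."""
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"

def mixed_script_words(text):
    """Return (word, scripts) pairs for words that mix letters from several scripts."""
    flagged = []
    for word in re.findall(r"\w+", text):
        scripts = {script_of(ch) for ch in word if ch.isalpha()}
        if len(scripts) > 1:
            flagged.append((word, sorted(scripts)))
    return flagged

# Latin, Cyrillic and Greek capital E in turn (escapes keep them distinguishable)
sample = "Elizabeth \u0415lizabeth \u0395lizabeth"
for word, scripts in mixed_script_words(sample):
    print(word, scripts)
```

A word written entirely in Russian or Greek is not flagged, since all of its letters belong to one script; only the mixed words that spoofing produces are reported.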

About The Blog

I am a Penn State technology specialist with a degree in linguistics and have maintained the Penn State Computing with Accents page since 2000.

See Elizabeth Pyatt's Homepage (ejp10@psu.edu) for a profile.

Comments

The standard commenting utility has been disabled due to hungry spam. If you have a comment, please feel free to drop me a line at (ejp10@psu.edu).
