Penn State University
Eberly College of Science
University Park, PA 16802
Office: 403 McAllister
Phone: (814) 865-3329
Fax: (814) 865-3735
Shannon's Experiment to Calculate the Entropy of English
by Adriano M. Garsia
Claude Shannon, the inventor of information theory, devised an experiment aimed at determining the entropy of an English letter (the amount of information, in bits, that we obtain on average when we learn one letter of English). The experiment is carried out as follows. The user faces a sequence of dashes, each representing one of the 26 English letters or a space. The user is to guess the successive letters of the sentence using only the knowledge of the letters previously guessed. At each step we record the total number of guesses it took to reach the correct letter for that position. Shannon views the resulting sequence of guess counts as an encoding of the original sentence. He uses the entropy of the resulting random variable as an estimate for the entropy of an English letter.
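The encoding step can be sketched in a few lines of Python. Since a program has no human intuition, this sketch substitutes a crude stand-in guesser that always tries characters in a fixed English-frequency order (space first); the order string and function name are illustrative assumptions, not part of Shannon's setup, where a human uses context to guess.

```python
# Hypothetical fixed guessing order: space, then letters by rough
# English frequency. A human guesser would instead use context.
GUESS_ORDER = " etaoinshrdlcumwfgypbvkjxqz"

def encode_as_guess_counts(sentence):
    """Replace each character of the sentence by the number of
    guesses needed to reach it under the fixed guessing order."""
    return [GUESS_ORDER.index(ch) + 1 for ch in sentence.lower()]
```

For example, `encode_as_guess_counts("a cat")` yields `[4, 1, 13, 4, 3]`: the space is reached on the first guess, while the rarer `c` takes thirteen. A good guesser produces mostly small numbers, which is exactly why the guess-count encoding compresses well.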
The following applet allows you to simulate the Shannon experiment. Start typing on the keyboard your guesses for the first letter of the sentence. When you type the correct letter, the letter will appear in that position and the number of guesses will be displayed underneath it. Go on to guess the next letter in the same manner. You should press the space bar when you believe that a word has ended. Note that at the bottom of the Applet the letters of the alphabet and space appear in blue. When you guess a letter incorrectly, the computer helps you keep track of your guesses by erasing those letters from the alphabet.
When the sentence is complete or when you tire of guessing
letters and click on Entropy, the computer will output the estimate
of the entropy that results from this particular experiment.
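The estimate the applet reports can be computed from the recorded guess counts as the empirical entropy of the guess-count distribution. A minimal sketch (the function name is illustrative; the applet's own source may organize this differently):

```python
import math
from collections import Counter

def entropy_estimate(guess_counts):
    """Empirical entropy, in bits, of the sequence of guess counts:
    H = -sum over k of p_k * log2(p_k), where p_k is the observed
    frequency of needing exactly k guesses."""
    n = len(guess_counts)
    freq = Counter(guess_counts)
    return -sum((c / n) * math.log2(c / n) for c in freq.values())
```

If every letter is guessed on the first try, the estimate is 0 bits; if first and second guesses occur equally often, it is 1 bit, matching the intuition that harder-to-predict text carries more information per letter.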
Perhaps we should mention that large-scale experiments of this kind yield an estimate of approximately 1.1 bits for the amount of information we gain when we learn a letter of English within a sentence. We should also mention that in a classroom of about 60 students, with everyone venturing guesses for each successive letter, we consistently obtained an entropy estimate of about 1.6 bits.
Download this applet for off-line viewing (includes source code)