Shannon's Experiment to Calculate the Entropy of English
by Adriano M. Garsia

Claude Shannon, the inventor of information theory, devised an experiment aimed at determining the entropy of an English letter (the amount of information, in bits, that we obtain on average when we learn one letter of English). The experiment is carried out as follows. The user faces a sequence of dashes, each representing one of the 26 English letters or a space. The user is to guess the successive letters of the sentence using only the knowledge of the letters previously guessed. At each step we record the total number of guesses needed to reach the correct letter for that position. Shannon views the resulting sequence of guess counts as an encoding of the original sentence, and he uses the entropy of the corresponding random variable as an estimate of the entropy of an English letter.

The following applet allows you to simulate Shannon's experiment. Start typing your guesses for the first letter of the sentence. When you type the correct letter, it will appear in that position and the number of guesses will be displayed underneath it. Go on to guess the next letter in the same manner, pressing the space bar when you believe a word has ended. Note that at the bottom of the applet the letters of the alphabet and the space appear in blue; when you guess a letter incorrectly, the computer helps you keep track of your guesses by erasing that letter from the alphabet. When the sentence is complete, or when you tire of guessing letters and click on Entropy, the computer will output the estimate of the entropy that results from this particular experiment.
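The entropy computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the applet's actual source code: we treat the recorded guess counts as samples of a random variable and compute the Shannon entropy of their empirical distribution. The function name and the example counts are hypothetical.

```python
import math
from collections import Counter

def entropy_estimate(guess_counts):
    """Estimate the entropy (bits per letter) from a run of the experiment.

    guess_counts[i] is the number of guesses it took the user to
    identify letter i of the sentence. The empirical distribution of
    these counts is treated as a random variable, and its Shannon
    entropy, -sum(p * log2(p)), is returned.
    """
    n = len(guess_counts)
    freq = Counter(guess_counts)  # how often each guess count occurred
    return -sum((c / n) * math.log2(c / n) for c in freq.values())

# A hypothetical run: most letters guessed on the first try,
# a few requiring more attempts.
counts = [1, 1, 1, 2, 1, 3, 1, 1, 2, 1, 1, 1, 5, 1, 1, 2, 1, 1]
print(round(entropy_estimate(counts), 3))
```

Note that the more predictable the text, the more the guess counts cluster at 1, and the lower the resulting entropy estimate.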
Perhaps we should mention that large-scale experiments of this kind indicate that the amount of information we gain when we learn a letter of English within a sentence is approximately 1.1 bits. We should also mention that in a classroom of about 60 students, with everybody venturing guesses for each next letter, we consistently obtained a value of about 1.6 bits for the estimate of the entropy. Download this applet for off-line viewing (includes source code)