RULES FOR CONSTRUCTING ESSAY QUESTIONS

The construction of clear, unambiguous essay questions that call forth the desired responses is a much more difficult task than is commonly presumed. The following rules will not make the task any easier, but their application will result in essay items of higher quality.

1. Use essay questions to measure complex learning outcomes only. Most knowledge outcomes profit little from being measured by essay questions. These outcomes can usually be measured more effectively by objective items, which lack the sampling and scoring problems that essay questions introduce. There may be a few exceptions, as when supplying the answer is a basic part of the learning outcome, but for most knowledge outcomes essay questions simply provide a less reliable measure with no compensating benefits.

At the comprehension, application, and analysis levels of learning, both objective tests and essay tests are useful. Even here, though, the objective test would seem to have priority, the essay test being reserved for those situations that require the student to give reasons, explain relationships, describe data, formulate conclusions, or in some other way produce the appropriate answer. Where supplying the answer is vital, a properly constructed restricted-response question is likely to be most appropriate.

At the synthesis and evaluation levels of learning, both the objective test and the restricted-response test have only limited value. These tests may be used to measure some specific aspects of the total process, but the production of a complete work (such as a plan of operation) or an overall evaluation of a work (for instance, an evaluation of a novel or an experiment) requires the use of extended-response questions. It is at this level that the essay form contributes most uniquely.

2. Relate the questions as directly as possible to the learning outcomes being measured. Essay questions do not measure complex learning outcomes unless they are carefully constructed to do so. Each question should be specifically designed to measure one or more well-defined outcomes. Thus, the place to start, as is the case with objective items, is with a precise description of the performance to be measured. This will help determine both the content and form of the item and aid in the phrasing of it.

The restricted-response item is related quite easily to a specific learning outcome because it is so highly structured. The limited response expected from the student also makes it possible for the test maker to phrase the question so that its intent is communicated clearly to the student. The extended-response item, however, requires greater freedom of response and typically involves a number of learning outcomes. This makes it more difficult to relate the question to the intended outcomes and to indicate the nature of the desired answer through the phrasing of the question. If the task is prescribed too rigidly in the question, the students' freedom to select, organize, and present the answer is apt to be infringed upon. One practical solution is to indicate to the students the criteria to be used in evaluating the answer. For example, a parenthetical statement such as the following might be added: "Your answer will be evaluated in terms of its comprehensiveness, the relevance of its arguments, the appropriateness of its examples, and the skill with which it is organized." This clarifies the task to the students without limiting their freedom, and makes the item easier to relate to clearly defined learning outcomes.

3. Formulate questions that present a clear task to be performed. Phrasing an essay question so that the desired response is obtained is no simple matter. Selecting precise terms and carefully phrasing and rephrasing the question with the desired response in mind will help clarify the task to the student. Since essay questions are to be used as a measure of complex learning outcomes, avoid starting such questions with "who," "what," "when," "where," "name," and "list." These terms tend to limit the response to knowledge outcomes. Complex achievement is most apt to be called forth by such words as "why," "describe," "explain," "compare," "relate," "contrast," "interpret," "analyze," "criticize," and "evaluate." The specific terminology to be used will, of course, be determined largely by the specific behavior described in the learning outcome to be measured.

There is no better way to check on the phrasing of an essay question than to write a model answer, or at least to formulate a mental answer, to the question. This helps the test maker detect any ambiguity in the question, aids in determining the approximate time needed by the student to develop a satisfactory answer, and provides a rough check on the mental processes required. This procedure is most feasible with the restricted-response item, the answer to which is more limited and more closely prescribed. With the extended-response form it may be necessary to ask one or more colleagues to read the question to determine if the form and scope of the desired answer are clear.

4. Do not permit a choice of questions unless the learning outcome requires it. In most tests of achievement, it is best to have all students answer the same questions. If they are permitted to write on only a fraction of the questions, such as three out of five, their answers cannot be evaluated on a comparative basis. Also, since the students will tend to choose those questions they are best prepared to answer, their responses will provide a sample of their achievement that is less representative than that obtained without optional questions. As we noted earlier, one of the major limitations of the essay test is the limited and unrepresentative sampling it provides. Giving students a choice among questions simply complicates the sampling problem further and introduces greater distortion into the test results.

In some situations the use of optional questions might be defensible. For example, if the essay is to be used as a measure of writing skill only, some choice of topics on which to write may be desirable. This might also be the case if the essay is used to measure some aspects of creativity, or if the students have pursued individual interests through independent study. Even for these special uses, however, great caution must be exercised in the use of optional questions. The ability to organize, integrate, and express ideas is determined in part by the complexity of the content involved. Thus, an indeterminate amount of contamination can be expected when optional questions are used. [Not all instructors agree on this rule; however, if you do allow a choice of questions, try to make the questions similar in length and difficulty and have them test the same type of information.]

5. Provide ample time for answering and suggest a time limit in each question. Since essay questions are designed most frequently to measure intellectual skills and abilities, time must be allowed for thinking as well as for writing. Thus, generous time limits should be provided. For example, rather than expecting students to write on several essay questions during one class period, it might be better to have them focus on one or two. There seems to be a tendency for teachers to include so many questions in a single essay test that a high score is as much a measure of writing speed as of achievement. This is probably an attempt to overcome the problem of limited sampling, but it tends to be an undesirable solution. In measuring complex achievement, it would seem better to use fewer questions and to improve the sample by more frequent testing.

Informing students of the appropriate amount of time they should spend on each question will help them use their time more efficiently; ideally, it will also provide a more adequate sample of their achievement. If the length of the answer is not clearly defined by the problem, as in some extended-response questions, it might also be desirable to indicate page limits. Anything that will clarify the form and scope of the task without interfering with the measurement of the intended outcomes is likely to contribute to more effective measurement.

 

RULES FOR SCORING ESSAY TESTS

As we noted earlier, one of the major limitations of the essay test is the subjectivity of the scoring. That is, the feelings of the scorers are likely to enter into the judgments they make concerning the quality of the answers. This may be a personal bias toward the writer of the essay, toward certain areas of content or styles of writing, or toward shortcomings in such extraneous areas as legibility, spelling, and grammar. These biases, of course, distort the results of a measure of achievement and tend to lower their reliability.

The following rules are designed to minimize the subjectivity of the scoring and to provide as uniform a standard of scoring from one student to another as possible. These rules will be most effective, of course, when the questions have been carefully prepared in accordance with the rules for construction.

1. Evaluate answers to essay questions in terms of the learning outcomes being measured. The essay test, like the objective test, is used to obtain evidence concerning the extent to which clearly defined learning outcomes have been achieved. Thus, the desired student performance specified in these outcomes should serve as a guide both for constructing the questions and for evaluating the answers. If a question is designed to measure "the ability to explain cause-effect relations," for example, the answer should be evaluated in terms of how adequately the student explains the particular cause-effect relations presented in the question. All other factors, such as interesting but extraneous factual information, style of writing, and errors in spelling and grammar, should be ignored (to the extent possible) during the evaluation. In some cases separate scores may be given for spelling or writing ability, but these should not be allowed to contaminate the scores that represent the degree of achievement of the intended learning outcomes.

2. Score restricted-response answers by the point method, using a model answer as a guide. Scoring with the aid of a previously prepared scoring key is possible with the restricted-response item because of the limitations placed on the answer. The procedure involves writing a model answer to each question and determining the number of points to be assigned to it and to the parts within it. The distribution of points within an answer must, of course, take into account all scoreable units indicated in the learning outcomes being measured. For example, points may be assigned to the relevance of the examples used and to the organization of the answer, as well as to the content of the answer, if these are legitimate aspects of the learning outcome. As indicated earlier, it is usually desirable to make clear to the student at the time of testing the bases on which each answer will be judged (content, organization, and so on).
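A minimal sketch of the point method's bookkeeping follows, in Python. It assumes the model answer has already been broken into scoreable units with point values; the unit names, the point values, the partial-credit fractions, and the helper `score_by_points` are all hypothetical. The code only adds up credit; judging whether an answer actually contains a unit remains the grader's task.

```python
# Minimal sketch of point-method scoring for a restricted-response item.
# The scoring key (units and point values) is hypothetical; in practice it
# comes from the model answer and the learning outcome being measured.

SCORING_KEY = {
    "states the cause correctly": 2,
    "states the effect correctly": 2,
    "explains the link between them": 3,
    "gives a relevant example": 2,
    "answer is clearly organized": 1,
}


def score_by_points(units_credited, key=SCORING_KEY):
    """Sum the points for each scoreable unit the grader judged present.

    `units_credited` maps a unit name to the fraction of its points awarded
    (1.0 = full credit, 0.5 = half credit). Units not listed earn nothing.
    Unknown unit names are rejected so that the key, prepared in advance,
    fixes what can earn credit.
    """
    total = 0.0
    for unit, fraction in units_credited.items():
        if unit not in key:
            raise ValueError(f"unit not in scoring key: {unit!r}")
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"fraction out of range for {unit!r}: {fraction}")
        total += key[unit] * fraction
    return total


max_points = sum(SCORING_KEY.values())  # 10
student_score = score_by_points({
    "states the cause correctly": 1.0,
    "states the effect correctly": 1.0,
    "explains the link between them": 0.5,
    "gives a relevant example": 1.0,
})
print(f"{student_score} / {max_points}")  # 7.5 / 10
```

Here organization is a separate unit in the key, which is one way to give it credit only when it is a legitimate part of the learning outcome.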

3. Grade extended-response answers by the rating method, using defined criteria as a guide. Extended-response items allow so much freedom in answering that the preparation of a model answer is frequently impossible. Thus, the test maker usually grades each answer by judging its quality in terms of a previously determined set of criteria, rather than scoring it point by point with a scoring key. The criteria for judging the quality of an answer are determined by the nature of the question and thus by the learning outcomes being measured. If students were asked to "describe a complete plan for preparing an achievement test," for example, the criteria would include such things as (1) the completeness of the plan (for example, whether it included a statement of objectives, a set of specifications, and the appropriate types of items), (2) the clarity and accuracy with which each step was described, (3) the adequacy of the justification for each step, and (4) the degree to which the various parts of the plan were properly integrated.

Typically the criteria for evaluating an answer are used to establish about five levels of quality. Then as the answer to a question is read, it is assigned a letter grade or a number from one to five, which designates the reader's rating. One grade may be assigned on the basis of the overall quality of the answer, or a separate judgment may be made on the basis of each criterion. The latter procedure provides the most useful information for diagnosing and improving learning and should be used wherever possible.

More uniform standards of grading can usually be obtained by reading the answers to each question twice. During the first reading the papers should be tentatively sorted into five piles, ranging from high to low in quality. The second reading can then serve the purpose of checking the uniformity of the answers in each pile and making any necessary shifts in rating.
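The rating method can likewise be sketched as a small record-keeping routine. This is an illustration only, assuming integer ratings from 1 to 5 on each of the four criteria in the test-planning example; the ratings shown, the function `rate_answer`, and the choice to report the mean as an overall figure are assumptions, not part of the procedure described above.

```python
# Minimal sketch of the rating method: one 1-5 rating per criterion.
# Criteria follow the test-planning example; the ratings are hypothetical.

CRITERIA = (
    "completeness",
    "clarity and accuracy",
    "justification",
    "integration",
)


def rate_answer(ratings):
    """Check that every criterion has an integer rating from 1 to 5.

    Returns the per-criterion profile (useful for diagnosis) and the mean
    rating (one possible overall figure).
    """
    for criterion in CRITERIA:
        if criterion not in ratings:
            raise ValueError(f"no rating for {criterion!r}")
    for criterion, value in ratings.items():
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion {criterion!r}")
        if not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"rating for {criterion!r} must be 1-5, got {value!r}")
    overall = sum(ratings[c] for c in CRITERIA) / len(CRITERIA)
    return dict(ratings), overall


profile, overall = rate_answer({
    "completeness": 4,
    "clarity and accuracy": 3,
    "justification": 2,
    "integration": 3,
})
for criterion in CRITERIA:
    print(f"{criterion:22s} {profile[criterion]}")
print(f"{'mean rating':22s} {overall:.1f}")
```

The separate ratings show that justification is the weakest part of this answer, information that a single overall grade would hide.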

4. Evaluate all of the students' answers to one question before proceeding to the next question. Scoring or grading essay tests question by question, rather than student by student, makes it possible to maintain a more uniform standard for judging the answers to each question. This procedure also helps offset the halo effect in grading. When all of the answers on one paper are read together, the grader's impression of the paper as a whole is apt to influence the grades he assigns to the individual answers. Grading question by question, of course, prevents the formation of this overall impression of a student's paper. Each answer is more apt to be judged on its own merits when it is read and compared with other answers to the same question, than when it is read and compared with other answers by the same student.
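If answers are collected electronically, regrouping them for question-by-question grading is a simple transposition. The sketch below assumes each paper is stored as a list of answer texts in question order, keyed by a paper identifier; the data layout and the function `answers_by_question` are hypothetical.

```python
# Minimal sketch: regroup papers so that all answers to one question are
# read together. Paper IDs and answer texts are hypothetical placeholders.


def answers_by_question(papers):
    """Turn {paper_id: [answer_q1, answer_q2, ...]} into per-question batches.

    Returns a list whose item q is a list of (paper_id, answer) pairs for
    question q + 1, in the order the papers were supplied.
    """
    if not papers:
        return []
    lengths = {len(answers) for answers in papers.values()}
    if len(lengths) != 1:
        raise ValueError("every paper must contain the same number of answers")
    (n_questions,) = lengths
    return [
        [(paper_id, answers[q]) for paper_id, answers in papers.items()]
        for q in range(n_questions)
    ]


papers = {
    "paper-01": ["Q1 answer from paper 01", "Q2 answer from paper 01"],
    "paper-02": ["Q1 answer from paper 02", "Q2 answer from paper 02"],
    "paper-03": ["Q1 answer from paper 03", "Q2 answer from paper 03"],
}

for number, batch in enumerate(answers_by_question(papers), start=1):
    print(f"Question {number}:")
    for paper_id, answer in batch:
        print(f"  {paper_id}: {answer}")
```

Each batch holds only answers to the same question, so the grader never sees one student's answers side by side.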

5. Evaluate answers to essay questions without knowing the identity of the writer. This is another attempt to control personal bias during scoring. Answers to essay questions should be evaluated in terms of what is written, not in terms of what is known about the writers from other contacts with them. The best way to prevent our prior knowledge from biasing our judgment is to evaluate each answer without knowing the identity of the writer. This can be done by having the students write their names on the back of the paper or by using code numbers in place of names.
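Replacing names with code numbers is easy to automate. The sketch below assumes random three-digit codes and a key that is kept away from the grader until scoring is finished; the function name `assign_code_numbers`, the code range, and the student names are hypothetical. It uses the standard library's `random.Random.sample` to draw distinct codes.

```python
import random


def assign_code_numbers(names, seed=None):
    """Give each student a distinct random three-digit code.

    Returns {code: name}. The grader sees only the codes; the instructor
    keeps this key until all answers have been scored. The range 100-999
    supports up to 900 students.
    """
    rng = random.Random(seed)
    codes = rng.sample(range(100, 1000), k=len(names))  # distinct codes
    return dict(zip(codes, names))


key = assign_code_numbers(["Avery", "Blake", "Casey", "Devon"])
for code in sorted(key):
    print(code)  # what the grader sees: codes only
```

Keying the dictionary by code rather than by name means two students with the same name still receive separate codes.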

6. Whenever possible, have two or more persons grade each answer. The best way to check on the reliability of the scoring of essay answers is to obtain two or more independent judgments. Although this may not be a feasible practice for routine classroom testing, it might be done periodically with a fellow teacher (one who is equally competent in the area). Obtaining two or more independent ratings becomes especially vital where the results are to be used for important and irreversible decisions, such as in the selection of students for further training or for special awards. Here the pooled ratings of several competent persons may be needed to attain a level of reliability that is commensurate with the significance of the decision being made.
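Pooling independent ratings and checking how often raters agree takes only a few lines of arithmetic. This sketch assumes two raters who each gave the same six answers a rating from 1 to 5; the ratings are invented, and exact and within-one-point agreement are simple indices chosen here for illustration, not ones prescribed above. The function names are hypothetical.

```python
def pooled_ratings(*raters):
    """Average the independent ratings each answer received."""
    if not raters or len({len(r) for r in raters}) != 1:
        raise ValueError("need at least one rater and equal-length rating lists")
    return [sum(scores) / len(scores) for scores in zip(*raters)]


def agreement(ratings_a, ratings_b, tolerance=0):
    """Share of answers on which two raters differ by at most `tolerance`."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    hits = sum(abs(a - b) <= tolerance for a, b in zip(ratings_a, ratings_b))
    return hits / len(ratings_a)


rater_a = [5, 4, 3, 2, 4, 1]
rater_b = [4, 4, 3, 3, 5, 1]

print(pooled_ratings(rater_a, rater_b))          # [4.5, 4.0, 3.0, 2.5, 4.5, 1.0]
print(agreement(rater_a, rater_b))               # 0.5
print(agreement(rater_a, rater_b, tolerance=1))  # 1.0
```

Low exact agreement alongside complete within-one agreement, as here, may indicate that the raters share a standard but draw the boundaries between adjacent levels differently. A correlation coefficient (for example, `statistics.correlation`, available since Python 3.10) is another common index.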

Gronlund, Norman. Constructing Achievement Tests. Englewood Cliffs, NJ: Prentice-Hall, 1982.

ESSAY QUESTIONS

You are certainly familiar with essay questions. They consist of a statement, often several sentences long, that describes a situation and/or poses a problem. The student's task is to write an essay to answer the problem posed. This answer may be as short as a paragraph or as long as several pages.

The difference between short-answer and essay questions is more than just the length of response required. Essay questions place more emphasis on the organization and integration of the material, such as when marshaling arguments to support a point of view or method. Then, too, there is usually more than one correct answer to an essay question; that is, the problem can be approached in various ways.

Essay questions can be used to measure attainment of a variety of objectives. Stecklein (1955) has listed 14 types of abilities that can be measured by essay items:

1. Comparisons between two or more things

2. The development and defense of an opinion

3. Questions of cause and effect

4. Explanations of meanings

5. Summarizing of information in a designated area

6. Analysis

7. Knowledge of relationships

8. Illustrations of rules, principles, procedures, and applications

9. Applications of rules, laws, and principles to new situations

10. Criticisms of the adequacy, relevance, or correctness of a concept, idea, or information

11. Formulation of new questions and problems

12. Reorganization of facts

13. Discriminations between objects, concepts, or events

14. Inferential thinking

Note that all these involve the higher-level skills mentioned in Bloom's Taxonomy.

One other aspect of essay questions should be considered: the role of writing skills. There is no question that good writing skills are helpful when answering essay questions. The question is what role they should play in evaluating responses, that is, what relative emphasis should be given to the content of the answer versus the quality of the writing exhibited. Some teachers think that primary emphasis should be given to the information and thought process shown in the answer, with writing style and skill being important only when they detract from the clarity or correctness of the answer. Other teachers think that quality of expression is more important, that a tightly organized and well-written response merits a higher grade than a response that contains the same information but is not so well written. Which approach you take depends on your own preferences and goals. In either case you should tell your students which approach you take. My philosophy is that writing skills, though extremely important, are better developed in situations other than examinations. Thus, when grading essay exams, I give primary emphasis to the correctness of the knowledge and reasoning displayed and only a minor emphasis (never more than 10% to 20%) to writing style and quality.
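The weighting described above is simple arithmetic. As a sketch, assume the content and the writing are each scored out of 100 and the writing weight is a whole-number percentage capped at the 20% ceiling the author mentions; the function name `combined_grade` and the sample scores are hypothetical.

```python
def combined_grade(content_score, writing_score, writing_percent=15):
    """Weight content and writing scores (each 0-100) into one grade.

    writing_percent is the share given to writing style and quality; the
    cap of 20 reflects the author's stated ceiling of 10% to 20%.
    """
    if not 0 <= writing_percent <= 20:
        raise ValueError("writing_percent should be between 0 and 20")
    content_percent = 100 - writing_percent
    return (content_score * content_percent + writing_score * writing_percent) / 100


# Same strong content (80), weaker writing (60), at three weightings.
for percent in (10, 15, 20):
    print(percent, combined_grade(80, 60, percent))
# 10 78.0
# 15 77.0
# 20 76.0
```

Under this scheme, weak writing costs only a few points when the content is sound, which matches the primary emphasis on knowledge and reasoning.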

 

Guidelines for Writing Essay Questions

Since few essay questions can be given on any test, each item must be a good item. And because essay questions are designed to measure higher-level cognitive skills, we must be sure that the items do, in fact, tap these skills rather than just call for a series of semirelated facts. To accomplish these goals, certain guidelines should direct the writing of essay questions.

1. The question should clearly define the task. A common error is to state essay questions so broadly or ambiguously that the student does not know where to begin. One of my all-time favorite items is:

Summarize the Vietnam War.

This was a 10-point, 5-minute item on a ninth-grade history test. How can anyone summarize the Vietnam War in an hour, let alone in five minutes? A less extreme example is:

Describe the steps in constructing norms for a test.

At first glance, this might seem like a reasonable essay question. However, contrast it with the following item.

Suppose that you were assigned the task of developing the norms for a reading comprehension test to be used in grades 4-6 in schools throughout the United States. Describe the steps that you would take in developing the norms and the procedures used in each step. Indicate what type(s) of scores you would use to express performance and why you selected this type of score.

By phrasing the question more specifically, we have outlined the students' task more clearly. In addition, the second version requires students to apply general principles to a specific situation and asks them both to describe the procedures that they would use and to justify their use. On the other hand, we must avoid making the question so detailed that it either gives away part of the answer or converts the question into nothing more than a series of short-answer items. For example:

Compare essay and multiple-choice items in terms of their: a) ease of construction, b) scoring problems, c) reliability, d) validity, e) advantages and limitations, f) appropriateness for various subjects, and g) types of cognitive skills measured.

By listing so many specific dimensions, we have told students how essay and multiple-choice items differ, rather than measuring whether they know the important dimensions along which the items differ. Moreover, we have essentially asked a number of short-answer questions that do not require the student to organize or integrate the material.

On essay tests, we usually want to measure students' ability to organize, integrate, apply, and evaluate material. In most cases we also want to observe their reasoning processes as well as their knowledge of the material and conclusions. Thus, good phrases to introduce essay questions include: Compare and contrast ...; Give the arguments for and against ...; Give the reasons for ...; Explain how (or why) ...; Evaluate ...; Illustrate how (X) applies to ...; and so on. And because we want to measure students' ability to apply and use their knowledge, whenever possible essay questions should require application of knowledge, principles, and methods to new situations and examples.

2. Indicate the scope and direction of the answer required. One way to accomplish this goal is to state the questions in such a manner that the type of answer required is apparent. This, of course, is part of the reason for guideline 1. In some cases we may tell students what length answer is required.

What is the value of studying science fiction? (Give your answer in complete, correct sentences. Write at least five sentences.)

Or we could tell students how many points their answers should cover.

Describe four advantages of multiple-choice items over essay questions.

Most commonly, however, the scope and length of the answer required are indicated by the time (or points) devoted to the question. That is, a teacher might ask students to answer four essay questions and to spend an equal amount of time on each question. Or the test directions could indicate the relative value of each essay question.

3. Use questions that have correct answers. This guideline does not imply that an essay question should have only one correct answer; many, if not most, essay questions have more than one possible correct answer. What it does mean is that essay questions should measure knowledge and reasoning ability, not opinions or attitudes. Thus, instead of asking students:

How do you think inflation should be controlled?

we would ask them:

Describe one method that has been proposed for controlling inflation. Give the reasons why you think this method would be effective or ineffective.

The revised question not only allows students to express an opinion but also requires them to defend it. Scoring would be based primarily on the correctness of the information and the quality of the reasoning displayed, not on the stand the student took on the issue.

4. Allow for "think time." When we write an essay question, we generally have the desired answer in mind. Although we allow time for students to recall the answer and write it down, we often forget that they need time to organize and integrate the material before writing. Thus we should always allow students enough time, not only to write their answer to an essay question, but also to think about and plan their answer.

5. Use a larger number of shorter essay items rather than fewer longer ones. There are several reasons for this suggestion. First, shorter questions are generally more specific and thus better define the task. Second, scoring problems are simplified. And third, by including more questions, you can obtain better coverage of the content domain. On the other hand, the shorter and narrower the questions, the less opportunity students will have to demonstrate their ability to integrate the material or explain their answers.

6. Use optional questions sparingly. While optional questions allow students to select the topics that they know more about, they present some scoring problems. It is unlikely that all questions are equally difficult or that you can grade consistently from item to item. Thus, some students may obtain higher scores, not because they knew more, but because they happened to select an easier question or one that you graded more leniently. Then, too, one could argue that if a question is important enough to include on a test, all students should have to demonstrate their knowledge of the material; that is, all students should answer the same questions.

7. Develop a scoring key before administering the test. The reasons for this guideline are the same as those for short-answer items: to prepare for scoring and to identify potentially confusing items.

Variations

Many types of class assignments (term papers, themes, written reports, book reviews) share many features of essay questions. They differ from essay questions given on exams in that they generally require a longer response, the choice of topics is often more flexible, and students prepare the paper outside class and thus can use various types of reference materials. However, the basic task is the same: to write an organized, coherent essay. Consequently, the construction (and evaluation) of these assignments should follow essentially the same guidelines as essay questions given on exams.

Written reports of lab experiments and projects are also similar to essay questions. However, they usually contain another dimension, the analysis of data. Yet the written presentation requires many of the same skills needed in writing an answer to an essay question, particularly organization, analysis, and evaluation.

Advantages and Limitations

The major advantage of essay questions is that they can tap certain abilities that cannot be effectively measured by other types of items, such as the ability to integrate and organize knowledge, to marshal arguments in support of a position, to evaluate, and to develop new approaches to a problem. Because essay questions require relatively long answers, a topic can be probed in some depth. And most teachers find essay questions quite easy to construct, although, as we have seen, writing good essay items is not as easy as it might seem.

Essay questions also have several limitations. They are an inefficient method for testing command of knowledge. They also provide a limited sampling of the content domain, since only a few items can be included on a test. While the areas covered by these items can be probed in depth, other areas will not be covered as well or at all. Asking very broad questions only partially solves the problem, because students will necessarily focus on certain parts of the material in their answers.

The other major problem with essay questions is scoring. Here there are several considerations. One is time; essay questions require a long time to score, especially if you do a conscientious job. Because students take varying approaches when answering an essay question, scoring may not be comparable from paper to paper. Then, too, there is ample evidence that grading is influenced by factors other than the quality of reasoning and information presented. For example, the length of the responses, the quality of the writing, and even legibility of handwriting are known to influence the scores assigned. All these factors combine to make grades assigned to essay responses quite unreliable. In the following chapter we discuss ways of making the scoring of essay items more objective and reliable.

Brown, Frederick Gramm. Measuring Classroom Achievement. New York: Holt, Rinehart and Winston, 1981.