Performance Assessment for Science Teachers

In elementary and some secondary classes such as Spanish 1, algebra, U.S. history, or biology, there is a great amount of material to be memorized. In order to make sense out of their world and to learn more advanced subject matter, students must memorize names, dates, formulae, relationships, etc. However, most science teachers want their students to do more than memorize and recall information. Though it is very important in almost all classes for students to learn information well enough to remember it for future use, this is not the only important goal.
This chapter works under the assumption that information memorized and recalled will be used by students in their schooling and later as adults in several important higher-level thinking activities. The emphasis here will be on developing test items which are objective in nature, which can be reliably scored, and which will measure students' higher-level thinking abilities.
When you finish this lesson you should be able to:
Higher-level thought processes are classified in several different ways. Benjamin Bloom's taxonomy is an example. What follows is a first attempt of the author at a more inclusive list of thought processes that are used in learning and doing science. You should certainly be able to improve on this attempt. It will, however, give you a list from which you can choose kinds of items which would be appropriate for the subject matter and the skills that you want the students to master in the specific classes that you teach.
Good Thinkers do most of these well:
Analyzing
Synthesizing
Interpreting
Elaborating
Simplifying
Evaluating
Questioning
Integrating
Seeing Implications
Relating to
Inferring
Comparing
Inventing
Generating alternatives
Generalizing
Making formulas
Making models
Thinking metaphorically
Identifying patterns
Decoding
Classifying
Critical Listening
Observing
Summarizing
This kind of test item consists of several objective questions about a common set of data. See example 1 below:
Example 1:
One of the methods formerly used by geologists to determine the age of the earth was a
calculation based on the amount of salt (NaCl) in the ocean, and the amount added to ocean
waters each year by the rivers that empty into the ocean. If this method of age
determination is used, certain assumptions must be made. Items 56-62 consist of a number
of assumptions. Classify each assumption as:
56. The salt concentration of the oceans is gradually increasing. (a)
An interpretive exercise consists of a set of data or information, which we'll call a display, followed by a series of problems or questions having answers that are dependent upon the information given.
Prose paragraphs
Numerical data
Charts, graphs, diagrams, or maps
Pictures, drawings, or photographs
Cartoons or caricatures
Lists of words or symbols
Mathematical formulas
Musical scores or excerpts
Audio or video recordings
Poems, short stories, or essays
Articles from newspapers, magazines, or journals
Quotations, adages, or scriptures
Specimens (rocks, plants, animals, chemicals, art, etc.)
The questions or problems which accompany the display may be presented in one or more of the following formats:
Short answer or completion items
Alternative response questions
Matching exercises
Multiple-choice questions
Essay questions
Example 1 above uses questions in a matching format; multiple-choice questions are used in the following example.
| I | Physics | Standard: 02 |
|---|---|---|
| Objective: .01 Analyze the motion of, and with, a system | | |
| ILO: 2a. Identify variables and describe relationships between them. 2e. Analyze data and draw warranted inferences. 4d. Recognize the personal relevance of science in daily life. 6a. Use the language and concepts of science as a means of thinking and communicating. | | |
One of the easier rides at a local amusement park is the steam-engine-driven train. The train follows the track shown in the diagram below. The total length of the track is 1.4 miles. It takes the train 12.0 minutes to cover the length of the track. The train takes 4 sec from a stop to reach its speed, which it then maintains through the entire course until it takes 6 sec to stop back at the station.

Answer the following questions as they relate to the above:
Correct Answers: 1) d, 2) a, 3) c, 4) c.
There are two major tasks in constructing interpretive exercises: (1) the selection of appropriate introductory material, and (2) the construction of a series of dependent test items. Special care must be taken to construct test items which require an analysis of the introductory material. The following suggestions will aid in constructing interpretive exercises of high quality:
An essay question may be defined as an item which requires an original thoughtful response composed by the examinee, in the form of several sentences.
To test your understanding of what a good essay item is, mark the test items below which satisfy the criteria listed above for an essay item.
| YES | NO | Item |
|---|---|---|
| | | 1. List the three ways of testing for a starch. |
| | | 2. What is meant by the statement "all physical and chemical changes are accompanied by changes in energy?" |
| | | 3. Tell what you know about active transport. |
| | | 4. How do you feel about the position that the US Government seems to be taking on environmental issues? |
| | | 5. Why is the following assumption so important to our understanding of geologic history? |
Here is what we think of the five test questions above.
Item one. NO. It requires the student to remember three procedures. "The" in this item suggests that this item is a recall item. You would expect the answer key to list three procedures; anyone could score this item.
Items two and five. YES. These seem to meet all of the criteria for an essay item.
Item three. NO. "Tell what you know" allows the student to write almost anything for full credit.
Item four. NO. This one asks for an opinion. This is not well written. It could become a good essay task if the respondent were asked to take a position on the environment and defend it.
The following are some objectives for which essay items are appropriate.
Essay questions have two advantages over objectively scored items:
Along with these advantages are several limitations or misconceptions:
A well-formulated essay question should first delimit the scope of the content to be covered. Having said that, you need to understand that English teachers sometimes use essay questions to test students' ability to write. When this is the case, the content doesn't need to be narrow or limited in any way. A teacher could, in fact, give a student an assignment to write on any one of several topics and let the student choose the topic, but only if the teacher's purpose is to measure ability to write well. If the teacher's purpose is to measure any of the abilities listed earlier in this lesson, then the content should be limited.
Some years ago a Peanuts cartoon showed Linus sitting at a desk holding a document which read, "History Test: Explain World War II, use both sides of the paper if necessary." This is so obviously impossible that it became a joke. Many teachers, however, do exactly the same thing in the questions they give students.
When a teacher says in an essay item, "Contrast erosion with vulcanism," the teacher has given the student a task so broad that it's going to be impossible to score correctly and it will be impossible for students to respond the way the teacher intended. Think of some ways that the question "Describe erosion." could be limited in scope. One way to limit the question might be to ask, "Explain the effects of erosion on newly formed mountains." The question would be even better if it were worded, "Explain how glaciation helps form U-shaped canyons."
Do you see that by limiting the scope, you can still test students' understanding of erosion, and that by limiting it, you make it a doable task for the student? In addition, and this may become more clear later in this lesson, by limiting the scope of the question, you are better able to prepare a scoring guide. When the scope is broad, the teacher invites students to write on any one of a myriad of subtopics, and the teacher should give full credit if the job is well done. If you want the student to speak about glaciation as a form of erosion, then say so. Don't leave that up to the student to guess.
Another example:
* Write something. Ridiculous, of course.
* Write something about cell reproduction. Just as ridiculous.
* Describe the process of mitosis. Better.
* Describe the process of mitosis from interphase to telophase. Better still.
* Describe the function of the centromeres in mitosis. Much better.
Put yourself in the place of a student. Don't you think that the last two identify the topic more clearly?
Another guideline for establishing well-formulated essay questions is to "define the students' task as clearly, specifically, and completely as possible." Words or phrases such as analyze, classify, evaluate, interpret, explain why, justify the use of, cite examples of, give reasons to support, and predict what would happen if are appropriate to help the student identify the task you want him or her to address. Avoid using such verbs as "discuss," "comment on," and "elaborate on." The dictionary meaning of discuss, for example, suggests a verbal discourse with another person. You certainly don't want that to occur as a result of your essay item. Instead of telling students to discuss, or to comment, or to elaborate, tell them more specifically what you want them to do. The following definitions of key verbs may help.
The next guide to preparing essay questions is a rule found in most textbooks on item writing which is violated by most public school teachers. Avoid using essays to measure learning outcomes that can be better measured by objectively scored items. Use essays to measure ability to synthesize, integrate, speculate, and perform other "high level" thinking tasks. The reason we don't use essays to measure recall is that essays are so difficult to score. If your purpose is to find out what students have remembered, then use a multiple-choice, matching, or short-answer question and phrase it so the student knows that the task is to recall certain information.
Another rule for using essay items is to use several relatively short essay questions rather than one long one. This rule exists to help you who must score students' responses. It will be easier to prepare scoring guides for several short items and your scoring will be much more reliable than on one long item. Additionally, using several short questions allows you to measure more high level thinking abilities and more of the subject matter covered in your instruction.
For each essay item, you should tell the student the point value possible and the approximate time limit students should observe in responding. This, of course, will help students know the relative importance you place on each item and will allow them to apportion their energies and their time accordingly.
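One way to arrive at the approximate time limits is to divide the testing period in proportion to the point values. The sketch below is only an illustration under that assumption; the lesson does not prescribe proportional timing, and the function name `suggest_minutes`, the item labels, and the 40-minute period are hypothetical.

```python
def suggest_minutes(point_values, total_minutes):
    """Split the available time across items in proportion to their points.

    Proportional timing is an assumption made for this sketch, not a rule
    stated in the lesson.
    """
    total_points = sum(point_values.values())
    return {
        item: round(total_minutes * points / total_points)
        for item, points in point_values.items()
    }


# Hypothetical test: three essay items worth 5, 10, and 5 points, 40 minutes.
suggested = suggest_minutes({"Q1": 5, "Q2": 10, "Q3": 5}, 40)
print(suggested)
```

You would still adjust the suggested minutes by judgment, since a short item that demands careful reasoning may need more time than its point value implies.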
Three other rules apply. Before administering the question, always write a model answer or a scoring guide--an outline or list of important elements that should be included in an ideal answer. Before administering any of the questions you have written, ask at least one of your colleagues to review critically each question and your proposed scoring guide in light of the objective you're trying to measure. Revise each question and proposed answer on the basis of suggestions obtained. Finally, after administering the test, carefully review the range of answers you receive and the manner in which students appear to have interpreted your question. Make whatever revisions are necessary to improve the question for future use.
Consider doing any or all of the following to help students succeed on essay exams:
There are two methods commonly used to score, or grade, essays. The first is the holistic method.
EXAMPLE:
To use this approach, rapidly read each essay and sort them into ordered piles or categories representing different degrees of quality before assigning any points to individual papers.
Before any points are assigned, each essay should be read a second time to verify that all of the essays within each pile are of similar quality. If necessary, the categories can be subdivided.
All essays within each category are then assigned the same points.
| GOOD | Very good, Good, Not as good |
| MEDIOCRE | Better than acceptable, Acceptable, Not as acceptable |
| POOR | Not as poor, Poor, Very poor |
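The sorting procedure above can be sketched in a few lines of code. This is a minimal sketch, assuming a hypothetical 9-to-1 point scale for the nine sub-categories in the table; the point values, the name `assign_points`, and the essay identifiers are illustrative and do not come from the lesson.

```python
# Hypothetical point values for the nine holistic sub-categories (assumption).
CATEGORY_POINTS = {
    "very good": 9,
    "good": 8,
    "not as good": 7,
    "better than acceptable": 6,
    "acceptable": 5,
    "not as acceptable": 4,
    "not as poor": 3,
    "poor": 2,
    "very poor": 1,
}


def assign_points(piles):
    """Give every essay in a pile the same points.

    `piles` maps a category name to the list of essay identifiers that the
    reader sorted into that pile.
    """
    scores = {}
    for category, essays in piles.items():
        points = CATEGORY_POINTS[category.lower()]
        for essay in essays:
            scores[essay] = points
    return scores


# Illustrative piles after the second reading.
piles = {
    "Very good": ["essay_03"],
    "Acceptable": ["essay_01", "essay_07"],
    "Poor": ["essay_05"],
}
scores = assign_points(piles)
print(scores)
```

The point of the sketch is that points attach to the pile, not to the individual paper: two essays in the same pile always receive the same score.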
The second is the analytical method, the one preferred by your instructor.
as--
Accuracy of factual information
Pertinent examples
Relevant reasons
Unified organization
Etc.
Note:
This method may produce more reliable results than holistic grading, but it is generally more difficult and time consuming to use.
However, it permits the teacher to give each student feedback which pinpoints specific merits and shortcomings in the student's essay.
There are two ways to score items analytically. The first method uses a rating scale. See the "Scoring Guide For A Persuasive Letter" which follows. The elements that the teacher wants to find in the persuasive letter are listed, and there is room for a rating of 1 to 5 for each element. In the example before you, the teacher has also left room for formative comments to be given back to the student. In using the scoring guide, you may decide that not all of the criteria should receive the same point value. In this case, indicate on the guide the relative value which can be assigned to each of the criteria. For instance, on "writer's position," the top score might be 5. On "main ideas clearly stated and relevant," you could award scores from 1 to 10, this being, in your mind, much more important than other criteria. "Spelling and grammar," the last criterion listed, could be given no points or more points depending on your intention.
1 = Unsatisfactory and 5 = Exemplary
Name: _________________________
| Criteria | Score | Comment |
|---|---|---|
| The writer's position is evident | | |
| The main ideas are clearly stated and relevant | | |
| There is supporting detail for each main idea | | |
| Arguments are based on scholarship rather than emotion | | |
| Arguments are based on logical assumptions | | |
| Arguments are to the point and do not digress | | |
| The closing statement is very strong | | |
| There is a clear sense of the specific audience for this writing | | |
| Vocabulary is well chosen for this argument and audience | | |
| Mechanics such as spelling, grammar, etc., are correct | | |
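When criteria carry different maximum values, as described above, the total is simply the sum of the points awarded, checked against each criterion's own maximum. The following is a minimal sketch, not the instructor's procedure: only the maxima of 5 for the writer's position and 10 for the main ideas come from the text, while the other maxima, the shortened criterion names, and the sample scores are assumptions made for illustration.

```python
# Maximum points per criterion. The 5 and 10 follow the text; the other
# values are illustrative assumptions.
MAX_POINTS = {
    "writer's position": 5,
    "main ideas": 10,
    "supporting detail": 5,
    "mechanics": 0,  # assumption: this teacher chooses not to award points here
}


def total_score(scores, max_points=MAX_POINTS):
    """Return (points earned, points possible) for one student's letter.

    A score of 0 is accepted so that an ungraded criterion can be recorded.
    """
    for criterion, score in scores.items():
        limit = max_points[criterion]
        if not 0 <= score <= limit:
            raise ValueError(f"{criterion!r} must be between 0 and {limit}")
    return sum(scores.values()), sum(max_points.values())


# Hypothetical ratings for one student.
earned, possible = total_score({
    "writer's position": 4,
    "main ideas": 8,
    "supporting detail": 3,
    "mechanics": 0,
})
print(f"{earned} / {possible}")
```

Writing the maxima down before scoring, as the dictionary does here, keeps the relative weights fixed from one paper to the next.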
The second kind of analytical approach is called a scoring rubric; see "Organization of a Student Response" which follows. This one includes four levels of student response. Notice that one point is given for an answer that is so disorganized that you cannot understand most of the message. Total credit (4 points) is given if the organization is superior in meeting the requirements. A scoring rubric guides the eyes of the person scoring an essay response to look for the specific things that the instructor or the writer of the item had in mind. It is extremely helpful. You will quickly note, however, that preparing a scoring rubric for an item takes time and requires careful thought. Nonetheless, if the item is worth asking and if it is to be given a substantial number of points, then you should commit to preparing the item and the scoring guide in a manner that serves your purpose.
| ORGANIZATION OF A STUDENT RESPONSE | |
|---|---|
| The organization rating focuses on how the content of the message is structured. It is concerned with sequence and the relationships among the ideas in the message. | |
| 1 = | The organization is inadequate in meeting the requirements of the task. An example is: |
| 2 = | The organization is minimal in meeting the requirements of the task. Examples are: |
| 3 = | The organization is adequate in meeting the requirements of the task. Examples are: |
| 4 = | The organization is superior in meeting the requirements of the task. Examples are: |
One other skill that a teacher needs in scoring essay responses is the ability to identify "fluff" and "bluff." Consider the following question:
The human heart, as part of the circulatory system, is a double organ, each part with separate functions. During prenatal life, the heart has an opening between the two auricle chambers. Occasionally this opening persists into adulthood. Predict what effect this would have when exercising strenuously? (5 points possible)
Now look at the following "possible answers" and score each response. Give a score ranging from 0 to 5. A five would be a perfect answer, a zero would be a very inadequate answer. (You don't need to know human physiology to understand what I want you to do with this question. It's a question that asks the student to "predict what effect this would have when exercising strenuously.")
Number one, though brief, is a pretty good response to the question. You probably should have given it a score of 4 or 5.
Number two misses the point. Though the student has written a couple of sentences, the student has said nothing in response to the question and probably should have a 0 for a score.
Number three. Notice here that the student starts out by conning the teacher. Entirely fluff. The second part of the response is speaking to the question and could be given a 3 or a 4.
Response four. Notice what this student has done. The student does some name dropping, which shouldn't count for any points, and then restates the question. In all of these words, the student hasn't answered the intent of the question. A person without a scoring guide or without good eyes might miss this and give the student part or maybe full credit. Don't do it. The student hasn't answered the question; give this response a 0.
Response five is a pretty good response. It speaks to the question, and it is fairly well written and fairly complete. You could give this one full credit.
In summary, in scoring student responses, you will want to watch very carefully for "fluff" or "bluff." You may choose to punish the student by subtracting points or you may choose just to ignore the fluff and look for that which speaks to the question. In any case, don't reward it. If you do, you encourage it and teach students not to think carefully and to con teachers when they don't know the answer.
This final list may serve you well as you score student responses.