Performance Assessment for Science Teachers


Background Reading


This background section covers several topics that you must understand in order to accomplish the goals of the course.


General Principles of Evaluation


Measurement vs. Evaluation


Why Evaluate Students?


Self Test on Principles and Purposes of Evaluation


Directions: Choose the best answer and write it on a separate sheet of paper. Do not turn in this test.

1.

Which of the following is a primary use of evaluation for public school teachers?
  1. Evaluating ongoing programs being used in a school or district.
  2. Research on teaching methods so as to teach a future class.
  3. Improvement of learning of students.
  4. Determining student growth so as to rank them for grading purposes.
  5. Determining student growth so as to place them in special groups.

2.

A school should administer many different types of tests because
  1. Developmentally, children change as they grow and mature.
  2. Parents will object if not enough tests are given.
  3. No one test can measure all the varied facets of a child's ability.
  4. Of educational and administrative needs.
  5. The more tests given, the more students will learn.

3.

I decide to use a criterion-referenced test in my mathematics class. This means my students' scores will be compared with
  1. The number of items I have set as the minimum acceptable number correct.
  2. The average score made by other students in their class.
  3. The distribution of scores made by other students to establish a student's rank.
  4. Their previous performance on similar tests.
  5. The average score made by all students who took the test.

4.

Which of the following is NOT a primary use of evaluation for public school teachers?
  1. As a learning activity.
  2. As a basis for assigning grades to students.
  3. To improve instruction.
  4. To determine content mastery.
  5. To establish criteria or standards for a course.

5.

Which one of the following is NOT a general principle of evaluation?
  1. Determining what is to be evaluated has priority.
  2. Evaluation techniques should relate to purpose served.
  3. Comprehensive evaluation requires a variety of techniques.
  4. Evaluation is no better than the learning activities provided.
  5. Evaluation is a means, not an end.

6.

Which of the following would be considered measurement rather than evaluation?
  1. Bill finally made first chair in the school orchestra.
  2. John's study habits are ineffective.
  3. Mary got 70% correct on the spelling test.
  4. Sam earned a "C" grade with his average of 89%.
  5. Joe is the only one in the class who reads eighth grade books.

7.

Collecting data on pupils is best defined as:
  1. Evaluation.
  2. Reliability.
  3. Norm-referenced.
  4. Measurement.
  5. Validity.

8.

What do we mean by the statement, "Measurement should not imply judgment concerning the worth or value of the behavior being tested"?

Answers (letters a through e correspond to options 1 through 5): 1: c, 2: c, 3: a, 4: b, 5: d, 6: c, 7: d, 8: We are suggesting that measures provide raw data, and that a judgment involves a comparison of the data with something: criteria, other pupils, or past performance.
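
The distinction drawn in answer 8 can be sketched in code. What follows is a minimal illustration under stated assumptions, not part of the course materials: the score alone is the measurement, and a judgment appears only when the score is compared with a criterion, with other pupils, or with past performance. The function name `evaluate`, the cutoff of 60, and all of the scores are hypothetical.

```python
from statistics import mean


def evaluate(score, criterion, class_scores, past_scores):
    """Turn one raw measurement into three judgments by comparison.

    Every threshold and data set passed in is a hypothetical illustration.
    """
    return {
        # Criterion-referenced: compare with a preset minimum.
        "meets_criterion": score >= criterion,
        # Norm-referenced: compare with the scores of other pupils.
        "above_class_average": score > mean(class_scores),
        # Compare with the pupil's own past performance.
        "improved": score > mean(past_scores),
    }


# Measurement: Mary got 70% correct on the spelling test.
mary_score = 70

# Evaluation: the same number judged against three different references.
judgments = evaluate(
    mary_score,
    criterion=60,                        # assumed minimum acceptable score
    class_scores=[55, 70, 82, 91, 64],   # assumed scores of classmates
    past_scores=[58, 62, 66],            # assumed earlier scores for Mary
)
print(judgments)
```

With these assumed numbers, the single measurement of 70 meets the criterion and shows improvement, yet falls below the class average. The raw score did not change; only the basis of comparison did.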

Reliability vs. Validity


What is Validity?

The validity of a test may be defined as the degree to which the test measures what it is supposed to measure. Since validity is a matter of degree, it is incorrect to say that a test is either valid or invalid. All tests have some degree of validity for any purpose for which they are used; however, some are much more valid than others.

Although there are several types of validity, classroom teachers are most concerned with the type known as content validity. Content validity is the extent to which the test or test items are an accurate sample of the total subject-matter content. Content validity relies heavily on the preparation of good instructional objectives to define the subject-matter to be learned. Properly written objectives can serve as a guide to the construction of valid test items.

If a test looks like it measures what it claims to measure, the test is said to have face validity. A test with face validity may or may not actually produce data that correspond to the learning objectives. For example, an objective may call for a student to classify four types of leaves. At first, the item may seem valid; it has high face validity. Closer attention will show that the objective requires higher levels of thinking to classify and describe each leaf. The test then does not in fact measure what it purports to measure, and can be said to have a low degree of content validity. No chemistry teacher would think of measuring knowledge of analytical chemistry with a test on acid rain. Nor would a biology teacher think seriously about measuring microscope skills with an essay test.

The following illustration may help you to understand validity.

If your test were 40% valid, we could represent the test like this:

One part of the test would be measuring your course content and one part would be measuring something else. Ideally, the two would overlap completely, like this.

If teacher-made tests are 80% valid, they can be used to help students increase their learning.

There are many factors which may make test results invalid for their intended use.

What is Reliability?

Reliability refers to the consistency of test results. If a test gives the same results when measuring an individual or group on two different occasions, then the scores are reliable. If different teachers rate the same essay on the same criteria and obtain the same score, then we say the scores are reliable from one rater to another. In both cases we are interested in consistency or trustworthiness. As a simple example, consider the task of measuring the length of a room.

There are several instruments you could use. You could step off the length on two different occasions, or two people could step it off. Another way would be to use a large rubber band marked at one-foot intervals. Again, you could measure several times, or several people could use the rubber band. A third choice might be to use a steel measuring tape. The tape will obviously give the more consistent, more trustworthy results, from time to time and from measurer to measurer. Unless a measurement can be shown to be reasonably consistent over different occasions or over different samples of the same behavior, little confidence can be placed in the results; they are not reliable.

Reliability is an important consideration. It would be helpful if several teachers, each reading the same book report, gave it the same score; otherwise, how can the score or the feedback notes to the student be trusted? It would also be desirable to be sure that a test provided reliable scores for several samples of the same behavior, or for a class's behavior over a given period of time.
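
One common way to put a number on this kind of consistency is to correlate the scores that two teachers give to the same set of reports. This is offered as an illustrative assumption, not as a method this reading prescribes. In the sketch below, the helper `pearson_r`, the teacher names, and all of the scores are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores given by three teachers to the same five book reports.
teacher_a = [8, 6, 9, 5, 7]
teacher_b = [7, 6, 9, 4, 8]   # largely agrees with teacher A
teacher_c = [5, 9, 6, 8, 7]   # ranks the reports very differently

r_ab = pearson_r(teacher_a, teacher_b)
r_ac = pearson_r(teacher_a, teacher_c)
print(f"A vs. B: {r_ab:.2f}")
print(f"A vs. C: {r_ac:.2f}")
```

With these made-up scores, teachers A and B correlate at about 0.90, while A and C correlate at -0.80. A value near 1 suggests that two raters order the reports consistently. A low or negative value suggests that a score depends heavily on who happened to read the report.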

Although reliability is a desired quality, it provides no assurance that an evaluation will yield the information we want. Little is gained if measures, or tests, consistently give the wrong information. Recall our example of the steel tape measure: as reliable as it is, it is not a valid measure of room temperature. Of the two qualities, reliability and validity, validity is the more important.

There are three factors which influence the reliability of a test. They are:

Self Test on Validity and Reliability


Answers: 1: b, 2: c, 3: b, 4: a, 5: a (in this case the test would be invalid), 6a: ambiguous objectives, few items, wrong kind of items, 6b: not enough items, items too easy or too difficult, ambiguous objectives, wrong kind of items, 7: train the scorers or prepare more objectively scored items, increase the range of difficulty of items, add more valid items.

Norm vs. Criterion Evaluation


Objectives

Self-test on norm- and criterion-referenced evaluation

Objective

Self-Test on Formative vs. Summative Evaluation

               

The role of "objectives" and "intended learning outcomes"


Before we can determine the validity of a measure, or for that matter of a unit of instruction, we must know what the objectives are for the unit. Instructional objectives specify what students will be able to do when they have completed instruction. The following is an introduction to instructional objectives. When you finish you should be able to write objectives for instruction in your subject major which tell three things.

The role of inference in evaluation


Types of evaluation items


Evaluation items can be looked at in five ways: (1) characteristics, (2) uses, (3) advantages, (4) limitations, and (5) rules for construction. In this section you will study the first four. You should be able to select the kind of item that best fits your programs.

Self-Test on Test-Item Determination

Constructing objectively scored items


Objectives: You should be able to --

  1. Identify violations of item writing rules when given sample objectively scored test items.
  2. Write objectively scored items which comply with the following rules.

Self-test on Objective Item Construction

Completion