Notes - Assessment and Evaluations

Category : Teaching

Assessment and Evaluations

Assessment is a process of obtaining information about students' learning and making value judgments about their progress. Information about students' progress can be obtained from a variety of sources including projects, assignments, performances/observations, and tests. Students' learning is often assigned specific numbers or grades, and this involves measurement.

Measurement takes into account such questions as, "How much?" and is used most commonly when the teacher scores a test or product and assigns numbers (e.g. 28/30 on an English test; 90/100 on the environmental science project).
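Raw scores on different scales are easier to compare once they are expressed on a common scale. The short Python sketch below converts the two example scores to percentages; the function name percent_score is illustrative and not part of any grading system.

```python
def percent_score(points_earned, points_possible):
    """Express a raw score as a percentage of the points available."""
    if points_possible <= 0:
        raise ValueError("points_possible must be positive")
    return 100.0 * points_earned / points_possible


# The two example scores from the text above.
english_test = percent_score(28, 30)
science_project = percent_score(90, 100)

print(f"English test: {english_test:.1f}%")
print(f"Environmental science project: {science_project:.1f}%")
```

The percentage is still only a measurement. Deciding what it means for the student is the evaluation step.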

Evaluation is the process of making judgments about the assessment information. These judgments may be about individual students (e.g. should Rahul's course grade take into account his significant improvement over the grading period?), the assessment method used (e.g. is the essay-type test a useful way to obtain information about problem solving?), or one's own teaching (e.g. most of the students this year did much better on the essay assignment than last year's students, so my new teaching methods seem effective).

Assessment for learning is often formative assessment, i.e. it takes place during the course of instruction and provides information that teachers can use to revise their teaching and students can use to improve their learning. Formative assessment includes both informal assessment involving spontaneous, unsystematic observations of students' behaviours (e.g. during a question and answer session or while the students are working on an assignment) and formal assessment involving pre-planned, systematic gathering of data.

Assessment of learning involves assessing students in order to certify their competence and fulfill accountability mandates. It is the primary focus of the next chapter on standardized tests but is also considered in this chapter. Assessment of learning is typically summative, that is, administered after the instruction is completed (e.g. a final examination in an educational psychology course). Summative assessments provide information about how well students mastered the material, whether students are ready for the next unit, and what grades should be given.

To be of high quality, an assessment needs good validity and reliability and must be free from bias.

Validity

Validity of assessment means that the assessment measures what it is supposed to measure. For example, how appropriate is it to conclude that the results of a mathematics test on fractions given to recent immigrants accurately represent their understanding of fractions? Is it all right for the teacher to conclude, based on the observation of her student, Ruhi, that she has Attention Deficit Disorder because she does not follow the teacher's verbal instructions? Obviously, in each situation other interpretations are possible: the immigrant students may have poor English skills rather than poor mathematics skills, and Ruhi may be hearing impaired.

Reliability

Reliability refers to the consistency of the measurement. Suppose Mr. Kapur is teaching a unit on food chemistry in his tenth grade class and gives an assessment at the end of the unit using test items from the teachers' guide. Reliability is related to questions such as: How similar would the scores of the students be if they had taken the assessment on a Friday or Monday? Would the scores have varied if Mr. Kapur had selected different test items, or if a different teacher had graded the test? An assessment provides information about students by using a specific measure of performance at one particular time. Unless the results from the assessment are reasonably consistent over different occasions, different raters, or different tasks (in the same content domain), confidence in the results will be low and so the results cannot be useful in improving student learning.
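One common way to put a number on consistency across occasions is to correlate the scores from two administrations of the same assessment (often called test-retest reliability). The Python sketch below is a minimal illustration, not a procedure prescribed by these notes: the Friday and Monday scores are invented, and pearson_r is a hypothetical helper written out in full so the arithmetic is visible.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same six students on two occasions.
friday_scores = [12, 15, 9, 18, 14, 11]
monday_scores = [13, 14, 10, 17, 15, 10]

r = pearson_r(friday_scores, monday_scores)
print(f"Test-retest correlation: {r:.2f}")
```

A value close to 1 suggests students are ranked similarly on both occasions. The same calculation could be applied to two raters' marks for the same set of papers to look at consistency across raters.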

STANDARDIZED TESTS

Standardized tests are created by a team (usually test experts from a commercial testing company who consult classroom teachers and university faculty) and are administered in standardized ways. Students not only respond to the same questions but also receive the same directions and have the same time limits. Explicit scoring criteria are used. Standardized tests are designed to be taken by many students within a state, province, or nation, and sometimes across nations. Teachers help administer some standardized tests, and test manuals are provided that contain explicit details about the administration and scoring. For example, teachers may have to remove all the posters and charts from the classroom walls, read directions out loud to students using a script, and respond to student questions in a specific manner.

Use of standardized tests for diagnosing students' strengths and weaknesses

Standardized tests, along with interviews, classroom observations, medical examinations, and school records, are used to help diagnose students' strengths and weaknesses. Often the standardized tests used for this purpose are administered individually to determine if the child has a disability. For example, if a kindergarten child is having trouble with oral communication, a standardized language development test could be administered to determine if there are difficulties with understanding the meaning of words or sentence structures, noticing sound differences in similar words, or articulating words correctly. It would also be important to determine if the child was a recent immigrant or had a hearing impairment or an intellectual disability. The diagnosis of learning disabilities typically involves the administration of at least two types of standardized tests: an aptitude test to assess general cognitive functioning and an achievement test to assess knowledge of specific content areas. We discuss the difference between aptitude and achievement tests later in this chapter.

Selecting students for specific programs

Standardized tests are often used to select students for specific programs. For example, the SAT (Scholastic Assessment Test) is a norm-referenced test used to help determine if high school students are admitted to selective colleges. Norm-referenced standardized tests are also used, among other criteria, to determine if students are eligible for special education or gifted and talented programs. Criterion-referenced tests are used to determine which students are eligible for promotion to the next grade or graduation from high school. Schools that place students in ability groups, including high school college preparation, academic, or vocational programs, may also use norm-referenced or criterion-referenced standardized tests. When standardized tests are used as an essential criterion for placement, they are obviously high stakes for students.

Assisting teachers' planning

Norm-referenced and criterion-referenced standardized tests, among other sources of information about students, can help teachers make decisions about their instruction. For example, if a social studies teacher learns that most of the students did very well on a norm-referenced reading test administered early in the school year, he may adapt his instruction and use additional primary sources. A reading teacher, after reviewing the poor end-of-the-year criterion-referenced standardized reading test results, may decide that next year she will modify the techniques she uses. A biology teacher may decide that she needs to spend more time on genetics as her students scored poorly on that section of the standardized criterion-referenced science test. These are examples of assessment for learning, which involves data-based decision making. It can be difficult for beginning teachers to learn to use standardized test information appropriately, understanding that test scores are important information but also remembering that there are multiple reasons for students' performance on a test.

TYPES OF STANDARDIZED TESTS

Achievement tests

K-12 achievement tests are designed to assess what students have learned in a specific content area. They include tests specifically designed to assess mastery of the academic content standards as well as general achievement tests that provide normative information. Achievement tests are also used as one criterion for obtaining a license in a variety of professions. Their use in teacher education is recent: teacher education students are required to take achievement tests in order to obtain a teaching license. These tests include constructed-response and multiple-choice items, and the score needed to pass each test varies.

Diagnostic tests

Profiling skills and abilities: Some standardized tests are designed to diagnose strengths and weaknesses in skills, typically reading or mathematics skills. For example, an elementary school child may have difficulty in reading, and one or more diagnostic tests would provide detailed information about three components: (1) word recognition, which includes phonological awareness (pronunciation), decoding, and spelling; (2) comprehension, which includes vocabulary as well as reading and listening comprehension; and (3) fluency. Diagnostic tests are often administered individually by school psychologists, following standardized procedures. The examiner typically records not only the results on each question but also observations of the child's behaviour such as distractibility or frustration. The results from the diagnostic standardized tests are used in conjunction with classroom observations, school and medical records, as well as interviews with teachers, parents and students to produce a profile of the student's skills and abilities and, where appropriate, to diagnose a learning disability.

Aptitude tests

Predicting the future: Aptitude tests, like achievement tests, measure what students have learned, but rather than focusing on specific subject matter learned in school (e.g. math, science, English or social studies), the test items focus on verbal, quantitative, and problem-solving abilities that are learned in school or in the general culture. These tests are typically shorter than achievement tests and can be useful in predicting general school achievement. If the purpose of using a test is to predict success in a specific subject (e.g. language arts), the best predictor is past achievement in language arts, and so scores on a language arts achievement test would be useful. However, when the predictions are more general (e.g. success in college), aptitude tests are often used. According to the test developers, SAT Reasoning tests, used to predict success in college, assess general educational development, reasoning, analysis and problem solving, and include questions on mathematics, reading and writing. The SAT Subject Tests, which focus on mastery of specific subjects like English, history, mathematics, science, and language, are used by some colleges as entrance criteria and are more appropriately classified as achievement tests than aptitude tests, even though they are used to predict the future.

Tests designed to assess general learning ability have traditionally been called Intelligence Tests but are now often called learning ability tests, cognitive ability tests, scholastic aptitude tests, or school ability tests. The shift in terminology reflects the extensive controversy over the meaning of the term intelligence and that its traditional use was associated with inherited capacity. The more current terms emphasize that tests measure developed ability in learning not innate capacity.

CLASSROOM ASSESSMENT

Formulating appropriate questions

When to test

Frequent testing encourages the retention of information and appears to be more effective than a comparable amount of time spent reviewing and studying the material.
Tests are especially effective in promoting learning if you give students a test on the material soon after they learn it, then retest on the material later.
The use of cumulative questions on tests is a key to effective learning. Cumulative questions ask students to apply information learned in previous units to solve a new problem.

Some researchers argue that students will learn more if we teach them less, that is, if the curriculum includes fewer topics but explores those topics in greater depth and allows more time for review, practice, testing, and feedback.

Textbook tests

Most elementary and secondary school texts today come complete with supplemental materials such as teaching manuals, handout masters, and ready-made tests. Using these tests can save time, but is this a good teaching practice? It depends on your objectives for students, the way you taught the material, and the quality of the tests provided. If the textbook test matches your testing plan and the instruction you actually provided your students, then it may be the right test to use.

It is important to keep these key points in mind while evaluating textbook tests:
The decision to use a textbook test must come after a teacher identifies the objectives that he or she taught and now wants to assess.
Textbook tests are designed for the typical classroom, but since few classrooms are typical, most teachers deviate somewhat from the text in order to accommodate their pupils' needs.
The more classroom instruction deviates from the textbook objectives and lesson plans, the less valid the textbook tests are likely to be.
The main consideration in judging the adequacy of a textbook test is the match between its test questions and what pupils were taught in their classes:

1. Are questions similar to the teacher's objectives and instructional emphases?

2. Do questions require pupils to perform the behaviours they were taught?

3. Do questions cover all or most of the important objectives taught?

4. Is the language level and terminology appropriate for pupils?

5. Does the number of items for each objective provide a sufficient sample of pupil performance?

Objective testing

Multiple-choice questions, matching exercises, T/F statements, and short answer or fill-in items are all types of objective testing. The word "objective" in testing means "not open to many interpretations," or "not subjective". The scoring of these types of items is relatively straightforward compared to the scoring of essay questions because the answers are more clear-cut than essay answers. These are easy to score fairly and can cover many topics.

Using multiple-choice tests

People often assume that multiple-choice items are appropriate only for asking factual questions. But multiple-choice items can test higher level objectives as well, although writing higher-level items is difficult. A multiple-choice item can assess more than recall and recognition if it requires the student to deal with new material by applying or analyzing the concept or principle being tested.

Writing Multiple-choice Questions

All test items require skillful construction, but good multiple-choice items are a real challenge. The stem of a multiple-choice item is the part that asks the question or poses the problem. The choices that follow are called alternatives. The wrong answers are called distractors because their purpose is to distract students who have only a partial understanding of the material.

GUIDELINES FOR OBJECTIVE-TYPE TESTS

The stem should be clear and simple, and present only a single problem. Unessential details should be left out.

Poor form

There are several different kinds of standard or derived scores. An IQ score is especially useful because...

Better form

An advantage of an IQ score is...

Do not expect students to make extremely fine discriminations among answer choices.

Poor form

The % of area in a normal curve falling between +1 and -1 standard deviation is about:

(a) 66% (b) 67% (c) 68%

Better form

The % of area in a normal curve falling between +1 and -1 standard deviation is about:

(a) 14% (b) 34% (c) 68%
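The value this item asks about can be checked numerically. The Python sketch below uses the standard-library error function to compute the share of a normal distribution lying within a given number of standard deviations of the mean; area_within is an illustrative name, not a library function.

```python
from math import erf, sqrt


def area_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))


within_one = area_within(1)                            # between -1 and +1 SD, about 68%
within_two = area_within(2)                            # between -2 and +2 SD
mean_to_plus_one = within_one / 2                      # mean to +1 SD, about 34%
plus_one_to_plus_two = (within_two - within_one) / 2   # +1 to +2 SD, about 14%

print(f"Between -1 and +1 SD: {within_one:.1%}")
print(f"Between the mean and +1 SD: {mean_to_plus_one:.1%}")
print(f"Between +1 and +2 SD: {plus_one_to_plus_two:.1%}")
```

The area between -1 and +1 standard deviation is about 68%. The figures 34% and 14% are areas of other regions of the same curve, which is why they work as plausible but clearly distinguishable alternatives, whereas 66% and 67% differ from the correct value only by a hair.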

As much wording as possible should be included in the stem so that phrases will not have to be repeated in each alternative.

Poor form

A percentile score

(a) Indicates the % of items answered correctly

(b) Indicates the % of correct answers divided by the % of wrong answers

(c) Indicates the % of people who scored at or above a given raw score

(d) Indicates the % of people who scored at or below a given raw score

Better form

A percentile score indicates the % of

(a) items answered correctly

(b) correct answers divided by the % of wrong answers

(c) people who scored at or above a given raw score

(d) people who scored at or below a given raw score
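The distinction this item tests, between a percentile score and the percentage of items answered correctly, is easy to see with numbers. The Python sketch below uses invented class scores on a 30-point test; percentile_rank is a hypothetical helper that follows the "at or below" definition used in the item.

```python
def percentile_rank(scores, raw_score):
    """Percentage of people who scored at or below raw_score."""
    if not scores:
        raise ValueError("scores must not be empty")
    at_or_below = sum(1 for s in scores if s <= raw_score)
    return 100.0 * at_or_below / len(scores)


# Hypothetical raw scores for ten students on a 30-point test.
class_scores = [12, 15, 18, 18, 20, 22, 25, 27, 28, 30]

rank_of_22 = percentile_rank(class_scores, 22)   # 6 of the 10 scores are 22 or lower
percent_correct_of_22 = 100.0 * 22 / 30          # share of items answered correctly

print(f"Percentile score for a raw score of 22: {rank_of_22:.0f}")
print(f"Percent correct for a raw score of 22: {percent_correct_of_22:.1f}%")
```

With these scores, a raw score of 22 sits at the 60th percentile even though the student answered about 73% of the items correctly, so the two numbers answer different questions.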

Each alternative answer should fit the grammatical form of the stem, so that no answers are obviously wrong.

Poor form

The Stanford-Binet test yields an

(a) IQ score

(b) Reading level

(c) Mechanical aptitude

Better form

The Stanford-Binet is a test of

(a) Intelligence

(b) Reading level

(c) Mechanical aptitude

You should also avoid including two distractors that have the same meaning. If only one answer can be right and two answers are the same, then both of them must be wrong. This narrows down the choices considerably.

Poor form

The most frequently occurring score in the distribution is called the

(a) Mode (b) Arithmetical average (c) Mean

Better form

The most frequently occurring score in the distribution is called the

(a) Mode (b) Standard deviation (c) Mean
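The statistics named in this item measure different things, which a quick calculation makes concrete. The Python sketch below uses the standard-library statistics module on an invented set of scores.

```python
import statistics

# Hypothetical scores for seven students.
scores = [70, 75, 75, 80, 85, 75, 90]

most_frequent = statistics.mode(scores)   # most frequently occurring score
average = statistics.mean(scores)         # arithmetical average (mean)
spread = statistics.pstdev(scores)        # population standard deviation

print(f"Mode: {most_frequent}")
print(f"Mean: {average:.1f}")
print(f"Standard deviation: {spread:.1f}")
```

The arithmetical average and the mean are the same quantity under two names, so offering both as distractors lets a test-wise student eliminate both at once. The standard deviation describes spread rather than a typical score, so it is a genuinely different alternative.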

Avoid using the exact wording found in the textbook.

Poor students may recognize the answers without knowing what they mean.

Avoid overuse of "all of the above" and "none of the above".

Such choices may be helpful to students who are simply guessing. In addition, using "all of the above" may trick a quick student who sees that the first alternative is correct and does not read on to discover that the others are correct too.

ESSAY TESTING

The best way to measure some learning objectives is to require students to create answers on their own. An essay question is appropriate in these cases. The most difficult part of essay testing is judging the quality of answers, but writing good, clear questions is not particularly easy, either.

Constructing Essay Tests

Because answering takes time, true essay tests cover less material than objective tests. Thus, for efficiency, essay tests should be limited to the assessment of more complex learning outcomes.

An essay question should give students a clear and precise task and should indicate the elements to be covered in the answer. The students should know how extensive their answer should be and about how much time they should spend on each question.

Students should be given ample time for answering. If more than one essay is being completed in the same class period, you may want to suggest time limits for each. Remember that time pressure increases anxiety and may prevent accurate assessment of some students. Do not make up for the limited amount of material an essay test can cover by including a large number of essay questions. It would be better to plan on more frequent testing than to include more than two or three essay questions in a single class period.

Advantages and Disadvantages of Different Kinds of Test Items

Type	Advantages	Disadvantages
Short answer	Can test many facts in a short time. Fairly easy to score. Excellent format for math. Tests recall.	Difficult to measure complex learning. Often ambiguous.
Essay	Can test complex learning. Can assess thinking process and creativity.	Difficult to score objectively. Uses a great deal of testing time.
True/False	Tests the most facts in the shortest time. Easy to score. Tests recognition. Objective.	Difficult to measure complex learning. Difficult to write reliable items. Subject to guessing.
Matching	Excellent for testing associations and recognition of facts. Although terse, can test complex learning (especially concepts). Objective.	Difficult to write effective items. Subject to process of elimination.
Multiple-choice	Can assess learning at all levels of complexity. Can be highly reliable and objective. Tests a fairly large knowledge base in a short time. Easy to score.	Difficult to write. Somewhat subject to guessing.

CONTINUOUS AND COMPREHENSIVE EVALUATIONS

Continuous and Comprehensive Evaluation refers to a system of school-based assessment that covers all aspects of a student's development.

It emphasizes twofold objectives:

(a) Continuity in evaluation and assessment of broad-based learning

(b) Behavioural outcomes

Comprehensive evaluation includes assessment of holistic development including cognitive, affective and psychomotor domains.
Scholastic refers to those aspects which are related to intellect. It includes assessment of learners in curricular subjects, assignments, project work, practical and oral work, etc.
Co-scholastic includes psychomotor skills, physical development, life skills, attitudes, values, interests, and participation in co-curricular activities.
It also envisages improving on-going teaching-learning processes by diagnosing the learning gaps and offering corrective and enrichment input.
In India, the Continuous and Comprehensive Evaluation has been introduced at the Secondary level in CBSE schools. Each term has two formative assessments & one summative assessment in each subject.
Formative assessment is used by the teacher to continuously monitor student progress in a supportive environment and comprises:

(a) Homework

(b) Oral questions

(c) Quizzes

(d) Projects

(e) Assignments/Tests

Summative assessment is carried out at the end of a course of instruction and measures how much a student has learnt.
