We evaluated the score stability of the Framework for Teaching (FFT), a prominent observation instrument used for teacher evaluation. Three raters each scored 200 reading and mathematics lessons taught by 20 kindergarten teachers. Using generalizability theory analyses, we decomposed the FFT’s Classroom Environment, Instruction, and Total scores into potential sources of variation (teachers, lessons, raters, and their interactions). The proportions of score variance attributable to differences among teachers were 71% and 76% for Classroom Environment, 49% and 37% for Instruction, and 69% and 66% for the Total score, for reading and mathematics, respectively. Reliability estimates (G) ranged from 0.92 to 0.96 for Classroom Environment and Total scores; they were 0.87 and 0.79 for reading and mathematics Instruction, respectively. Decision studies indicated that two raters, each scoring three reading lessons or four mathematics lessons, are necessary to achieve sufficiently reliable Total scores. For Instruction scores, three raters each scoring seven reading lessons are needed; more than four raters each scoring eight lessons are needed for mathematics.
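The decision-study logic summarized above can be illustrated with a minimal Python sketch. It assumes a design in which lessons are nested within teachers and crossed with raters, and it uses the standard relative-error formula for that design. The variance components below are hypothetical placeholders chosen only for illustration; they are not the study's estimates, and the names `VarianceComponents`, `g_coefficient`, and `HYPOTHETICAL` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarianceComponents:
    """Variance components for an assumed (lessons:teachers) x raters design."""
    teacher: float            # true differences among teachers (the object of measurement)
    lesson_in_teacher: float  # lesson-to-lesson variation within a teacher
    teacher_by_rater: float   # raters ranking teachers differently
    residual: float           # rater-by-lesson interaction confounded with error


def g_coefficient(vc: VarianceComponents, n_lessons: int, n_raters: int) -> float:
    """Relative G coefficient for a mean over n_lessons lessons and n_raters raters.

    The rater main effect is omitted because it does not change the relative
    standing of teachers when every rater scores every lesson.
    """
    relative_error = (
        vc.lesson_in_teacher / n_lessons
        + vc.teacher_by_rater / n_raters
        + vc.residual / (n_lessons * n_raters)
    )
    return vc.teacher / (vc.teacher + relative_error)


# HYPOTHETICAL values for illustration only; not taken from the study.
HYPOTHETICAL = VarianceComponents(
    teacher=0.50,
    lesson_in_teacher=0.20,
    teacher_by_rater=0.05,
    residual=0.25,
)

if __name__ == "__main__":
    # Tabulate projected reliability over a small grid of measurement designs.
    for n_raters in (1, 2, 3, 4):
        for n_lessons in (1, 2, 4, 8):
            g = g_coefficient(HYPOTHETICAL, n_lessons, n_raters)
            print(f"raters={n_raters} lessons={n_lessons} G={g:.3f}")
```

Under these assumptions, adding lessons shrinks the lesson and residual terms, while adding raters shrinks the teacher-by-rater and residual terms. This is why a score with a smaller share of teacher variance, such as Instruction, requires more raters and lessons to reach the same reliability.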
               