Monday, June 2, 2014

Teacher Evaluations and Standardized Test Scores

The Idaho Department of Education recently released its newest application for a waiver from provisions of the No Child Left Behind law for public comment. The waiver application includes a timeline for the implementation of a statewide teacher evaluation system (pages 290-299).

Idaho’s proposed teacher evaluation system is based upon a controversial plan from the state of New Mexico, which recently provided teachers with their evaluation rankings, prompting reaction from teachers across that state. In the New Mexico system, 50% of a teacher’s rating is based on student growth and achievement on the New Mexico Standards-Based Assessment (to be replaced by the PARCC assessment in 2015) and/or other assessments, such as end-of-course and reading assessments; 25% on teacher observations; and 25% on multiple measures, such as teacher attendance, preparation and planning, and professionalism. Teacher evaluation ratings were not made public in New Mexico in 2014. Under Idaho’s waiver application, the first statewide teacher evaluation ratings would occur in summer 2015.
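
To make the weighting arithmetic concrete, here is a minimal sketch of how such a 50/25/25 composite could be combined. The function, the 0-100 scales, and the sample scores are assumptions for illustration only; they are not taken from New Mexico’s published rubric.

    # Hypothetical illustration only: the 0-100 component scales and sample
    # scores below are assumptions, not New Mexico's actual rubric.
    def composite_rating(growth, observation, multiple_measures):
        """Combine three component scores using a 50/25/25 weighting."""
        return 0.50 * growth + 0.25 * observation + 0.25 * multiple_measures

    # A teacher with strong observations but weaker test-score growth:
    print(composite_rating(growth=60, observation=90, multiple_measures=85))  # 73.75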

Idaho’s system currently calls for 33% of the evaluation to be based on achievement and growth on standardized tests. Legislation passed in 2012 would have increased that percentage to 50%, but it was rendered null and void when Propositions 1, 2, and 3 were overturned. House Bill 557, held in committee in 2014, would also have raised the percentage to 50% over a period of years, comporting with the new NCLB waiver application.

The Technical Advisory Committee (TAC), a subcommittee of the Tiered Licensure Committee that is reviewing Idaho’s licensure requirements as part of planning for Idaho’s Career Ladder, recommended last week that the SBAC (or another statewide assessment) be a mandatory part of teacher evaluation at the applicable grades. (The SBAC is currently scheduled to be administered in Idaho in grades 3-11 in spring 2015.)

These test-based teacher effectiveness measures are called value-added measures (VAMs). They purportedly measure an individual teacher’s contribution to student achievement by comparing each student’s expected growth against that student’s actual growth while in the teacher’s classroom (source: New Mexico Department of Education).
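
For readers who want the intuition behind that expected-versus-actual comparison, here is a minimal sketch in Python under heavy simplifying assumptions: a single prior-year score as the predictor, a plain linear fit over a reference population, and no controls for demographics or classroom composition. The function name and inputs are illustrative, not a description of New Mexico’s or Idaho’s actual model.

    import numpy as np

    def class_value_added(all_prior, all_actual, class_prior, class_actual):
        """Estimate a class's 'value added' as the average gap between actual
        and expected scores, where expected scores come from a simple linear
        fit of current-year on prior-year scores across a reference population."""
        slope, intercept = np.polyfit(all_prior, all_actual, 1)  # reference-population fit
        expected = slope * np.asarray(class_prior) + intercept   # predicted current scores
        return float(np.mean(np.asarray(class_actual) - expected))

A positive result would be read as the class outperforming its predicted scores, and a negative result as the reverse; real VAM models layer many more statistical adjustments on top of this basic idea.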

However, two recent publications, one from a leading statistical association and one from university researchers, question the relationship between student growth on standardized tests and teacher quality.

First, the American Statistical Association produced a statement regarding value-added measures and teacher evaluation, making these points:

  • VAMs are generally based on standardized test scores, and do not directly measure potential teacher contributions toward other student outcomes.
  • VAMs typically measure correlation, not causation: effects (positive or negative) attributed to a teacher may actually be caused by other factors that are not captured in the model.
  • VAMs should be viewed within the context of quality improvement, which distinguishes aspects of quality that can be attributed to the system from those that can be attributed to individual teachers, teacher preparation programs, or schools. Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores, and that the majority of opportunities for quality improvement are found in the system-level conditions. Ranking teachers by their VAM scores can have unintended consequences that reduce quality.

Here is an article from Valerie Strauss of the Washington Post detailing the ASA findings.

In early May, Morgan Polikoff of USC and Andrew Porter of the University of Pennsylvania published another study, “Instructional Alignment as a Measure of Teacher Quality,” in the peer-reviewed journal Educational Evaluation and Policy Analysis. Using results from the Gates Foundation’s Measures of Effective Teaching (MET) study, they found “no association between value-added results and other widely-accepted measures of teaching quality” (see the article in Education Week).

The seeming disconnect between the work of the Technical Advisory Committee and the research on value-added measures and teacher evaluation hearkens back to the introduction of Proposition 2, Pay for Performance, in 2011. Research showing the ineffectiveness of pay-for-performance implementations in other states was plentiful then, just as research questioning the use of value-added measurement for teacher evaluation is emerging now. The first distribution of pay-for-performance funds highlighted the same inequities that an evaluation system based on standardized test scores would show.

The work teachers do with their students during the school year can be taken into account at the local level, in evaluations conducted by principals together with teachers, without using standardized test scores and value-added measurement. Progress of orchestra students on musical pieces, of Pre-Algebra students on a statistics unit, of U.S. History students in understanding the causes of the Civil War: all of it should be handled at the local level through judicious use of content-specific assessment data, observations of instruction, and analysis of classroom management and preparation, as well as other variables from effective teaching frameworks such as the Charlotte Danielson model.