02/11/2014 11:42 am ET Updated Apr 13, 2014

Bad Science

There's an old joke that goes something like this: a scientist - of dubious reputation - sets out to discover how far a frog will jump under adverse conditions. He sets up a measured course, places the frog at the zero mark and begins his trials. His research plan is to shout, "JUMP FROG," to trigger the frog's jumping impulse, introduce the adverse conditions and record the results.

The experiment unfolds.

• With four fully functioning legs and a hearty "JUMP FROG," the frog leaps 3 feet. The investigator dutifully records the result.
Frog with four legs . . . jumps 3 feet
• For the second trial, the experimenter cuts off one of the frog's legs. "JUMP FROG."
Frog with three legs . . . jumps 2 feet, 4 inches
• Third try - this time with two legs working. "JUMP FROG."
Frog with two legs . . . jumps 16 inches
• Next attempt with one leg. "JUMP FROG."
Frog with one leg . . . jumps 3 inches
• One last trial. This time with not a leg to stand on. "JUMP FROG."
No movement from the frog.
"JUMP FROG" (louder).
No movement from the frog.
"JUMP FROG" (louder still).
No movement from the frog. The scientist records the finding.
Frog with no legs . . . deaf.

What does a story about a clueless researcher performing a sadistic stunt and drawing an erroneous conclusion have to do with today's ed reform initiatives? Too much, I'm afraid.

In addition to all kinds of political chicanery that we see operating regarding education policy at the federal and state level today, we are witnessing an initiative that might be the grand prize winner for bad science. What is happening today with teacher evaluations in relation to student test scores resembles the work of the frog scientist: a preposterous plan yielding predictably misleading results. Campbell's Law reminds us that a quantitative social indicator used for social decision-making is subject to corruption pressures and the more it is used the more apt it will be to distort and corrupt the social processes it is intended to monitor. With this adage in mind, we should be, at the very least, skeptical about the validity of the results of the student testing/teacher evaluation systems which are used prominently in educational decision-making throughout the country.

In New York, 40% of a teacher's rating for the year depends on test scores. The 40% figure is deceptive, because a failing score on the test portion of the rating two years in a row can lead to a teacher's dismissal. In this system, it is entirely possible for a teacher who focuses on test prep to the exclusion of all other forms of inquiry to receive a strong rating. Conversely, it is entirely possible for a teacher who does not focus on test prep but instead spends classroom time on open-ended, experiential learning to receive a poor rating.

New York's annual professional performance review (or APPR), as it is called, has four rating categories: highly effective, effective, developing, and ineffective. This is a high-stakes calculation in that a teacher's career hangs in the balance. This approach, brokered in a deal New York State made with the United States Department of Education to win Race to the Top money, was designed by politicians and policy makers as an experiment in school accountability.

The experiment unfolds.

• Teacher A - someone with poor human relations skills, very little compassion or empathy; a compulsive scheduler of test prep; has no time for creative endeavors or open-ended class conversations that do not in some way involve improving standardized test results. Teacher with no interest in children's curiosity or developmentally appropriate lessons. Generally disliked by students, but students are too afraid not to be good soldiers in this teacher's test prep regimens. Teacher has consistently strong standardized testing results.

Teacher A rating . . . HIGHLY EFFECTIVE

• Teacher B - someone with outstanding human relations skills, beloved by the students, develops creative endeavors in designing instruction. Teacher with focus on children's natural wonder about the world and a love of learning, one who casually pays attention to the tests, believing that there is more important work to be done with the valuable time he has with his students. Takes students on field trips, spends extra hours each day planning for instructionally unique experiences; cares deeply about the welfare of his students. Scores on state tests are usually poor.

Teacher B rating . . . INEFFECTIVE

While these examples may be caricatures of the problem, they nonetheless point up the trends that have developed since test score results have been exalted in the media and in the public perceptions of school success. The tendency in all parts of the country now is to reward those who do well on tests and punish those who do not.

And as if the insult of this comparison were not enough, there are other parallels we can draw between the sadistic frog experimenter and those who run our educational policy machines.

Teachers are being crippled by more and more handicaps - their legs are being cut out from under them - and then they are measured to see how far they can jump. Greater class size, poorly resourced schools, less professional development, more job insecurity, attacks on tenure, the crushing effects of childhood poverty on learning, the challenges of students learning English as a second language, the special needs of students: all these pressures notwithstanding, teachers are still told to "jump." The measurement police are watching.

Something is terribly wrong when common sense is ignored in favor of a glaringly inappropriate policy affecting an entire generation of children and their teachers. An evaluation system based on a metric that reduces learning to obedience training, and that leads to dire consequences, is so obviously flawed that one wonders who is in charge of the experiment.

Bad science.

Bad education.

No joke.