The relationship between educational policies and educational research is both fascinating and disturbing. Sometimes policy makers, including those who piously invoke the idea of "data-driven" practice, pursue initiatives that they favor regardless of the fact that there is no empirical support for them (e.g., high-stakes testing) or even when the research suggests the policy in question is counterproductive (e.g., forcing struggling students to repeat a grade).
Sometimes insufficient attention is paid to the limits of what a study has actually found, such as when a certain practice is said to have been proved "effective," even though that turns out to mean only that it's associated with higher scores on bad tests.
Sometimes research is cited in ways that are disingenuous: anyone who takes the time to track down those studies often finds that they actually offer little or no support for the claims in question. (Elsewhere, I've offered examples of this phenomenon in the context of assertions about the supposed benefits of homework -- along with details about some of the other ways in which research is under-, over-, or misused.)
Then there's the question of what happens when the press gets involved. It's no secret that the reporting of research is often, shall we say, disappointing: A single experiment's results may be overstated or a broad conclusion may be vaguely attributed to what "studies show," despite the fact that multiple qualifications are warranted. Possible explanations aren't hard to adduce: tight deadlines, lack of expertise, or a reporter's hunger for more column-inches or prominent placement (hint: "The results are mixed at best" isn't a sentence that advances journalistic careers).
Whether ideology may also play a role -- a tendency to play up certain results more than others -- is hard to prove. But last week I found myself wondering whether the New York Times would have prominently featured a study, had there been one, showing that taking tests is basically a waste of time for students. After all, the Times, like just about every other mainstream media outlet, has been celebrating test-based "school reform" for some time now, and, in its news coverage of education, routinely refers to "achievement," teacher "effectiveness," exemplary school "performance," and positive "results," when all that's really meant is higher scores on standardized tests. The media have a lot invested in the idea that testing students is useful and meaningful.
So we probably shouldn't have been surprised to discover that last week the Times ran a lengthy (30-something-inch) story on the second page of its national news section under the headline "Take a Test to Really Learn, Research Suggests." And it should be equally unsurprising that the study on which the story was based didn't really support that conclusion at all.
(I'm picking on the New York Times because of its prominence, but many other news organizations also featured this article and described the study in similar terms. Other headlines included: "Taking a Test Helps Learning More Than Studying, Report Shows," "Learning Science Better the Old-Fashioned Way," and "Beyond Rote Learning.")
We should begin by noticing that the study itself, which was published online in the January 20 issue of Science, had nothing to do with -- and therefore offered not the slightest support for -- standardized tests. Moreover, its subjects were undergraduates, so there's no way of knowing whether any of its findings would apply to students in K-12 schools.
The real problem with the news coverage, though, is twofold: On closer inspection there are issues with how both the independent variable ("Take a Test") and the dependent variable ("Really Learn") are described.
What interested the two Purdue University researchers, Jeffrey D. Karpicke and Jannell R. Blunt, was the idea that trying to remember something that has been taught can aid learning at least as much as the earlier process of encoding or storing that information. Their study consisted of two experiments in which college students either practiced retrieving information they'd learned or engaged in other forms of studying. The former proved more effective.
The type of retrieval practice used in the study was an exercise in which students recalled "as much of the information as they could on a free recall test." But the idea of retrieval practice didn't need to involve testing at all. "The NY Times article emphasized 'testing,' which is unfortunate, because that's really irrelevant to our central point," Karpicke told me in an email message. "Students could engage in active retrieval of knowledge in a whole variety of ways that aren't 'testing,' per se." For example (as he explained in a subsequent message), they might put the book aside to see how much of it they can recall, try to answer questions about it, or just talk about the topic with someone.
In other words, the experiments didn't show -- and never attempted to show -- that taking a test works better than studying. They were really comparing one form of studying to another.
Then there's the question of outcome. When I said a moment ago that the study showed retrieval practice was more "effective," the most appropriate response would have been to ask what that word meant in this particular context: more effective at what?
In the first experiment, students were asked both verbatim questions and inference questions that drew on concepts from the text they had been given. In the second experiment, they either took a short-answer test of the material or were asked to create concept maps of that material from memory.
The researchers seemed impressed that practice retrieving facts worked better than making concept maps (with the text in front of them) at preparing students for a closed-book test even when the test itself involved making concept maps. But the students were tested mostly on their ability to recall the material, so it may not be surprising that recall practice proved more useful.
I would argue that this result says less about how impressive the method was than about how unimpressive the goal was. Karpicke and Blunt weren't investigating whether students could construct meaning, apply or generalize concepts to new domains, solve ill-defined problems, draw novel connections or distinctions, or do anything else that could be called creative or higher-order thinking. Now if testing -- or any other form of retrieval practice -- were shown to enhance those capabilities, that would certainly deserve prominent media attention. But this study showed nothing of the sort. Indeed, I know of no reason to believe that tests have any useful role to play in the promotion of truly meaningful learning.
The main contribution of the articles that were published about this study is to remind us of the importance of reading the actual studies being described. To understand why the description of this one was misleading, try to imagine a newspaper running a more accurate account -- one with a headline such as "Practice Recalling Facts Helps Students Recall Facts."
I would add that the self-fulfilling circles of "defining what's on the test, preparing for the test, taking the test and declaring victory with slight improvement on the test" are more than delusional. There is a growing body of neurobiological evidence that these practices actually reduce cognitive development.
Most interesting to me are the kinds of comments that the original story received (both here on huffingtonpost and on the NYT site). A lot of comments saying things like "testing works," when, as you point out, "testing" is not at all the focus of this paper. Which says something about how stories are *read*, not just how they are written!
http://www.palmbeachpost.com/opinion/letters/letters-fcat-administrator-has-trail-of-blunders-ties-759223.html
I read the article in the New York Times. I thought it made a very good case for in-class closed-book quizzes as a learning strategy. Mind you, I am somewhat prejudiced myself. My 16-year-old son has a teacher who gives him Concept Map assignments all the time, and I have never noticed that he learns much from them. What it was not, though, was any kind of a case, for or against, the kind of high-stakes testing, with little prior information about what material will be on the test, that we use these days to judge the fitness of our schools and their staff.
i recall an AP negating concerns i had about class size with: "the district's research demonstrates class size has no impact on student success."
duh! this outcome suited their agenda: consultants are highly paid--they appease customer demands for return business. a colleague scoffed at what the AP clearly believed.
he knows better, as do i and any teacher juggling like 40 students when 20, maybe 25 fit in a classroom. it may be cost-effective but it is not effective.
i work in low performing schools--where i am needed most. we have el, iep, eq issues, remediation, corrupt/incompetent leadership, limited resources, inequitable circumstances (like an ELA teacher at 200 students with 4 preps and another with 75 and 2 classes) and accommodations not addressed by testing.
from what i see, teachers are competing, as are students, on a socio-economic playing field that is profoundly uneven if these scores generated data holistically. how can i be judged for a child's low rank if she comes into the course 3-5 years behind the grade level standards i am responsible for? if she and others show growth, why isn't this a consideration in my evaluation instead of pitting the performance of a colleague working in San Marino against mine?
the test itself has not been closely considered as a viable tool for anything but its ability to generate big bucks.
Amen, this has been my experience working in low income schools as well. But what is an AP?
Few districts have implemented any sort of system for using the information gleaned from tests.
Where I live, the Iowa test is given to children ad nauseam. My wife received a letter from the school expressing its concern that her son was far below grade level in reading and math. A week earlier, she had received her son's Iowa test scores stating that he was in the 80th percentile for both reading and math.
When I asked the counselor about the discrepancy, she replied that nobody trusted the results of the Iowa test.
Why the heck is my state spending millions on this test if nobody is using it?
BTW...I agree with the counselor. My stepson's Math and Language skills are dismal.
That being said, I do have a quibble with your comment: "This is sort of a straw argument against testing."
It isn't, actually. The article doesn't take a stand for or against testing; it says that the New York Times headlined a story as being about testing, when it was actually about how to make studying more effective.
It is a self perpetuating information system.
Mr. Kohn's count is a bit short. I've been working on a review of the research literature on the effects of testing. To this point, I have reviewed a few thousand studies, several hundred of which are included in two meta-analyses--of quantitative and survey studies--and a research summary of the qualitative studies. The text for this study is currently under review by a scholarly journal.
I've posted the quantitative and survey studies' source lists and effect sizes here:
http://www.npe.ednews.org/Review/Resources/QuantitativeList.htm
http://www.npe.ednews.org/Review/Resources/SurveyList.htm
...and hope to post the same for the qualitative studies soon.
No matter what kind of test is used, the mean effects are moderately to strongly positive (for the effect of testing on student achievement). Indeed, the type of studies Mr. Kohn would appreciate most--the qualitative--provide the strongest evidence of all: over 95% positive.
Prominent researchers in education and economics continue to claim that there has been no research (either ever or prior to theirs) on the effects of testing on achievement. And, they continue to get away with it. The easiest way to win a debate is to declare the opposition nonexistent.
Just curious....
- something other than a test (e.g., GPA, improved attitude, curricular alignment) is used as the outcome measure
- one group is tested more often than the other prior to a common, final test
- one group is tested with higher stakes than the other prior to a final test
- one group is tested with feedback/awareness of results and the other is tested but told nothing at all prior to a final test
- one group is tested and the other is not, and their relative gains on a different, common monitoring test are compared
- one group is told there's a test at the end of a period of time and the other group is told there is not, and then the two are compared on the final exam
- two groups are told there's a test at the end of a period of time, but one group is told the test will count and the other group is told it will not
- in survey studies, respondents are simply asked what they think based on their own, or their children's experience
We need more of an action research orientation to teacher training as opposed to the same old same old from the last 50+ years. The press may be irresponsible, but if those who write the research aren't even responsible in making sure their research is practical and useful, then what's the purpose of the research to begin with?
Honestly, I was surprised by this. I certainly never expected this when I began my Ph.D. journey. But it is true. I only learned it when the requirements of my degree forced me into the research world. If that is the case for me, I shudder when I see how the general public is regularly and consistently misled by the reporting on education research. Oftentimes, the reports about a particular piece of education research come not from the substantiated findings, but from the opinions about what to study next.
But, don't construe this as a blanket indictment of all education research. Researching living human beings is tough. The techniques are difficult. The timeframes for study can be long. Obtaining large samples can be costly or practically impossible. And then finally, the review process for educational research can be onerous (and that is a generous phrasing of it), the politics contentious, and the parental approvals improbable. It is a wonder we have what we have, but it still needs to be better.
There is so much attention to testable/measurable results and rating and ranking for accountability and none to meaningful learning. There seems to be a total disregard for the unfolding of people’s human potential that can only be realized through a joy of learning.
All children enter the educational system with an inherent thirst for learning and yet exit the system with it squelched. So no matter how much we’ve managed to cram into their short-term memory and no matter what our measures and rankings show, we are left with individuals who haven’t the curiosity of mind and critical thinking processes reflective of people who love to learn anew. Learning is essential to the viability of humankind -- we are not instinctually regulated -- and so when we destroy the inherent thirst for it, we contribute to our very own destruction.
http://www.forprogressnotgrowth.com/2010/11/23/getting-education-right/
http://www.forprogressnotgrowth.com/2010/11/04/enfold-and-unfold/
http://www.youtube.com/watch?v=zDZFcDGpL4U