10/06/2014 03:49 pm ET Updated Dec 06, 2014

State Tests Help to Make What Works in Education an Open Book

Todd Grindal & Beth Boulay

Over the next few weeks, states across the country will make public the results of their standardized tests. Mandated state tests have been maligned by many teachers, parents and students almost since their inception. Critics argue that these exams represent a waste of instructional time at best, and at worst an insidious effort to narrow the curriculum, destabilize the teaching profession, and privatize public education. Lost in this frenzy of concern is recognition of how the information collected through these tests contributes to our understanding of the effectiveness of educational practices and programs.

There are understandable concerns about the appropriateness of using these data to determine whether a school is sanctioned, a teacher receives a bonus, or a student advances from grade to grade. These important policy debates aside, state test data have contributed to a dramatic expansion of our knowledge about what works in U.S. education.

Conducting large-scale education research is expensive. Recruiting students, developing measures, and -- most of all -- collecting information on outcomes for students all represent substantial investments of time and money. Before the introduction of large-scale state testing, this expense and complexity meant that education research was dominated by smaller-scale studies of the experiences of single schools, classrooms and students. Although the rich detail this type of research provides has invaluably deepened our understanding of the educational process, it has been less useful in answering the broader questions about what programs and practices are effective for students across multiple settings.

State tests provide data that help to address these questions. First, state tests allow researchers to observe an entire population of students, using a common measure. This allows the results of analyses of these data to be more generalizable than is the case with smaller-scale data collection efforts. These large samples also allow researchers to understand whether or not differences observed between students or schools were the result of chance. Second, these tests provide solid measures of children's skills and abilities. Although all measures have their strengths and weaknesses, and the purpose of individual test questions may sometimes seem confusing to parents, the questions featured on state tests typically are subjected to their own rigorous testing during their initial development phase to ensure that they are both reliable and valid indicators of specific knowledge and skills that children, on average, should learn in school. Finally, these tests provide the opportunity to observe individual students, schools and districts over time, making it possible to identify whether a change in some aspect of schooling was followed by improved performance.

The availability of state test data has made it simpler and more cost-effective to conduct the sort of research that can produce strong (what researchers refer to as internally valid) and generalizable evidence. This is demonstrated by the enormous amount of credible evidence that has been generated since No Child Left Behind (NCLB) required testing in more grades and subjects. A recent search of studies that met the strict standards of evidence for inclusion in the Institute of Education Sciences' What Works Clearinghouse (WWC) database indicated that nearly 1 out of every 6 individual studies of K-12 education used data collected through state tests to help estimate the impacts of a specific education practice or program. Using these data, researchers have provided sound and reliable evidence on important questions of policy and practice. As a result, the 4th grade teacher who wants to know which mathematics curriculum will help her students understand geometric concepts can quickly see how students across an entire state performed after using a computer-based mathematics curriculum. The parents trying to decide whether to enroll their child in a bilingual reading program can examine the WWC intervention reports and see how similar children fared after being exposed to this sort of curriculum. The policymaker weighing the merits of expanding charter schools can look at the performance of thousands of students who did and did not attend charter schools.

State test data also play a large role in evaluating the effectiveness of programs funded by the Investing in Innovation (i3) grants, the Department of Education's laboratory for generating innovative solutions to common education challenges. Of the 92 projects being studied through the first three years of i3, 55 are evaluating their effectiveness using state test data to assess students' outcomes. Here, state data will be used to help us understand the effectiveness of innovative solutions to common challenges such as how best to train teachers, support new school leaders, avoid summer slumps in learning, engage struggling students, and involve parents in school, just to name a few.

The introduction of the Common Core standards and the subsequent redesign of many state tests have created an opportunity to rethink the purpose of large-scale assessments. Indeed, there has been a great deal of productive discussion and analysis of exactly how well students' responses to test items reflect important knowledge and skills. That said, some policymakers have reacted to the tumult regarding Common Core implementation by calling for an end to annual state testing, and some parents have responded by unilaterally opting their children out of the test administration. These actions are misguided, and confuse concerns regarding the policy implications of testing with the regular collection of reliable information on children's academic skills. Concerns about how current testing and accountability policies influence teacher practice and school priorities are important. But let's not let differences over these issues lead us to discard this critical tool for the efficient evaluation of the effectiveness of educational programs and practices.

Disclosure: Abt Associates conducts research that informs program and policy decisions at the U.S. Department of Education, the Administration for Children and Families at HHS, the National Science Foundation, NASA, and other agencies and organizations at the federal, state and local levels.

Todd Grindal is an associate with Abt Associates, where he studies the impact of public policies on young children and children with disabilities. Todd conducted his doctoral work at the Harvard Graduate School of Education, where the Harvard Center on the Developing Child awarded him a Julius B. Richmond Fellowship in support of his dissertation research on the unionization of home childcare providers. Before starting his doctoral studies, Todd worked for six years as a teacher and school administrator at the high school level in Florida, and at the elementary school and preschool levels in the Washington, D.C. metropolitan area.

Beth Boulay is a principal associate in Abt Associates' Social and Economic Policy Division. Her research focuses on the rigorous evaluation of a range of educational interventions aimed at solving our nation's most persistent problems in education. She currently directs the National Evaluation of the Investing in Innovation (i3) Program. The i3 Program funds the implementation and evaluation of more than 100 educational interventions; the National Evaluation provides technical assistance to improve the strength of the evidence generated by the evaluations supported by i3, and will provide comprehensive reports of the results. Beth often presents at conferences on overcoming practical barriers to rigorous research, and on learning more from evaluation through high-quality measurement of implementation fidelity.

Research support from Krista Olson & Rebecca Gotlieb