07/15/2014 03:33 pm ET Updated Sep 14, 2014

Beyond the Bubble Test: Why We Need Performance Assessments

Note: This post originally appeared as a guest blog in Matthew Lynch's Education Week column, "Education Futures: Emerging Trends and Technologies in K-12."

Last spring, while millions of American students were bubbling in answers to multiple-choice questions on the ubiquitous tests that determine school and teacher ratings, student promotions, graduation, and college admissions, some students were meeting a higher standard. At the Urban Academy, a second-chance high school in New York City that is part of the New York Performance Standards Consortium, Gemma Venuti completed the set of research papers that were part of her graduation portfolio -- and defended them before a committee of teachers, students, and experts from outside the school.

Schools in the Consortium require students to complete a well-developed literary analysis, a mathematical model, a scientific investigation, and a social science research paper. Students must provide evidence of competence in oral and written communication, critical thinking, and technology use, among other 21st-century skills. Some schools add demonstrations of competence in the arts, world language, or other fields. Teachers are trained to score the tasks reliably using common rubrics. Students who do not meet a graduation standard revise their work until it meets the criteria. Based on the high rates of graduation, college-going, and college success of Consortium students, most of whom are low-income students of color, New York State allows these performance tasks to substitute for the Regents Examinations.

In science, Gemma conducted and wrote up an experiment about the Effect of Stress on Memory. In mathematics, she calculated the distance of the Empire State Building from her school and explained how this and other problems could be solved using trigonometry. Gemma's literary analysis evaluated the religious views of the protagonist of Life of Pi, who claimed three different religions. In social science, which requires the use of primary source materials to write a research paper, Gemma wrote about Hobbes' and Kropotkin's views on the right form of government. Gemma noted, "Although I started off convinced by Hobbes, I re-read Kropotkin's writing a few times, read background material, and found that I actually agreed with many of his opinions. I revised my opinion, and argued that neither one of them was completely correct." When presenting these projects to panels of judges, Gemma answered tough questions about the meaning of her findings, how she derived them, and how they could be extended.

It is easy to see how Gemma's work prepared her for college. The tasks she undertook exemplify the "college and career-ready" standards recently adopted by most states across the country. These standards urge an emphasis on deeply understanding and applying content knowledge to real-world situations, critical thinking, complex problem solving, inquiry, communication, collaboration, and uses of technology to research and create.

Despite the widespread adoption of new standards, few states yet have ways to measure these complex skills, since the performance assessments many developed during the 1990s -- writing and research tasks, science investigations, and real-world mathematical challenges -- were largely eliminated in the No Child Left Behind (NCLB) era.

NCLB's increased testing requirements led states to abandon performances requiring human scoring in favor of inexpensive, multiple-choice tests that could be scored by machine. The result, according to a recent study by the RAND Corporation, was that higher-order skills were also left behind. RAND found that less than 2 percent of mathematics items and only about 20 percent of English language arts items on current state tests measure the kind of critical thinking, analysis, and evaluation skills today's careers require.

Perhaps this is why Laszlo Bock, Google's senior vice president in charge of hiring, told Thomas Friedman of the New York Times that "Test scores are worthless. ... We found that they don't predict anything." Bock noted that the number one skill Google looks for is "learning ability" -- the ability to seek out and make sense of disparate information and apply it to solve problems. Unfortunately, because of the high stakes attached to U.S. tests, instruction has increasingly been narrowed to resemble both the content and the format of the tests -- squeezing out time for teachers to develop higher-order thinking skills. As one teacher noted in a national survey:

"Before [our state test] I was a better teacher. I was exposing my children to a wide range of science and social studies experiences. I taught using themes that really immersed the children into learning about a topic using their reading, writing, math, and technology skills. Now I'm basically afraid to not teach to the test. I know that the way I was teaching was building a better foundation for my kids as well as a love of learning."

The contrast with other countries is stark. High-achieving nations typically employ open-ended essays and problems, in addition to research papers and projects that students complete in the classroom to demonstrate their critical reasoning, communication, and ability to apply their knowledge to real-world situations. Testing is less frequent, usually occurring at no more than one grade level before secondary school, followed by high school examinations that students choose in subjects where they want to demonstrate their qualifications to colleges or employers. These are used to stimulate high-quality instruction, but not to rank or close schools or to fire teachers.

In places like Singapore, Hong Kong, and Australia, for example, science assessments require students to design, conduct, analyze, and write up an independent investigation over the course of several weeks. International Baccalaureate schools in more than 100 countries ask students to design and conduct a collaborative inquiry and present it orally and in writing to others, defending their process and conclusions. In Denmark, students have full access to the internet as they write their open-ended examinations. The goal is to be able to research, synthesize, and apply, not just to remember. Teachers score these tasks as part of the examination system, which helps them learn about the standards and improve their instruction, while students learn transferable skills they can take with them into college and careers.

These kinds of assessments are desperately needed in the United States, not only to evaluate the new standards, but to ensure that our children will have the kinds of learning experiences that will allow them to succeed in the knowledge-based society they are entering. Students abroad who engage in these experiences are learning to become the scientists, researchers, and innovators of the future. U.S. students will fall behind if they continue to spend their time perfecting the art of choosing one answer out of five -- a skill they will never use in the real world -- rather than engaging in more useful work.

A modest step in the direction of more performance-based assessment will be taken by two multi-state consortia created to assess the Common Core State Standards -- the Partnership for Assessment of Readiness for College and Careers (PARCC) and the Smarter Balanced Assessment Consortium (SBAC). PARCC and SBAC assessments, to be launched in 2015, will include some open-ended items and at least one short (one- to two-period) performance task in each subject. If these tests are used as feedback to inform teaching, rather than misused as arbiters of sanctions for schools, teachers, and students, they can help move learning forward.

Going still further, some states and districts, such as those belonging to the Innovation Lab Network (ILN), coordinated by the Council of Chief State School Officers, plan to introduce even more extensive performance assessments, so that they can measure the full range of the new standards -- and develop critical traits such as resourcefulness, collaboration, and perseverance. These will include longer-term tasks that result in a range of products -- social science research reports, math analyses, experimental logs, engineering designs, spreadsheets, literary and artistic products -- presented in a variety of forms, including oral, written, graphic, and multimedia presentations. Together the states are creating a Performance Assessment Bank that will provide tasks for schools and teachers to draw upon, along with resources for developing, scoring, and training.

If efforts to re-introduce performance assessments on a wide scale are to succeed, however, it will be important to address the concerns that will emerge -- including means for ensuring validity and reliability, training teachers to develop and score tasks consistently, and managing systems in ways that are feasible and affordable. This includes taking advantage of new technologies for both assessment and scoring, building educators' assessment literacy, and addressing the needs of a wide range of diverse learners. As a set of leading researchers describes in the recently released Beyond the Bubble Test, we can learn valuable lessons about how to address such challenges from a growing number of nations that have successfully implemented performance assessments, as well as from prior state experiences.

This body of research suggests that performance assessments can pay significant dividends to students and teachers, as well as colleges and employers, as they improve the quality of information about learning. When students develop and revise projects and exhibitions evaluated with rigorous criteria, they internalize standards of quality and develop college- and career-ready skills of planning, resourcefulness, perseverance, a capacity to use feedback productively, a wide range of communication skills, and a growth mindset for learning -- all of which extend beyond the individual assignments to shape their ability to learn to learn in new contexts. Meanwhile, students, teachers, parents, states, and postsecondary systems learn more about what students actually know, think, and can do -- thus helping them make better decisions and support stronger improvements in learning.

Gemma will be well prepared for the expectations she will encounter at the College of the Atlantic this fall, where she will study human ecology. Clearly, all students need these kinds of opportunities if we are to achieve our goals of college- and career-readiness for all. It is time to get beyond the bubble test. Creating new systems that include performance assessments that are used to support student learning and higher-order teaching is a critical step in this process.