Robo-Readers: Can They Help U.S. Teachers Grade And Improve How High School Students Write?
By Stephanie Simon
(Reuters) - American high school students are terrible writers, and one education reform group thinks it has an answer: robots.
Or, more accurately, robo-readers - computers programmed to scan student essays and spit out a grade.
The theory is that teachers would assign more writing if they didn't have to read it. And the more writing students do, the better at it they'll become - even if the primary audience for their prose is a string of algorithms.
That sounds logical to Mark Shermis, dean of the College of Education at the University of Akron. He's helping to supervise a contest, set up by the William and Flora Hewlett Foundation, that promises $100,000 in prize money to programmers who write the best automated grading software.
"If you're a high school teacher and you give a writing assignment, you're walking home with 150 essays," Shermis said. "You're going to need some help."
But help from a robo-reader?
"Wow," said Thomas Jehn, director of the Harvard College Writing Program. He paused a moment.
"It's horrifying," he said at last.
Automated essay grading was first proposed in the 1960s, but computers back then were not up to the task. In the late 1990s, as technology improved, several textbook and testing companies jumped into the field.
Today, computers are used to grade essays on South Dakota's student writing assessments and a handful of other high-stakes exams, including the TOEFL test of English fluency, taken by foreign students.
But machines do not grade essays on either the SAT or the ACT, the two primary college entrance exams. And American teachers by and large have been reluctant to turn their students' homework assignments over to robo-graders.
The Hewlett contest aims to change that by demonstrating that computers can grade as perceptively as English teachers - only much more quickly and without all that depressing red ink.
Automated essay scoring is "nonjudgmental," Shermis said. "And it can be done 24/7. If students finish an essay at 10 p.m., they get feedback at 10:01."
Take, for instance, the Intelligent Essay Assessor, a web-based tool marketed by Pearson Education, Inc. Within seconds, it can analyze an essay for spelling, grammar, organization and other traits and prompt students to make revisions. The program scans for key words and analyzes semantic patterns, and Pearson boasts it "can 'understand' the meaning of text much the same as a human reader."
Jehn, the Harvard writing instructor, isn't so sure.
He argues that the best way to teach good writing is to help students wrestle with ideas; misspellings and syntax errors in early drafts should be ignored in favor of talking through the thesis. "Try to find the idea that's percolating," he said. "Then start looking for whether the commas are in the right place." No computer, he said, can do that.
What's more, Jehn said he worries that students will give up striving to craft a beautiful metaphor or insightful analogy if they know their essays will not be read, but scanned for a split second by a computer program.
"I like to know I'm writing for a real flesh-and-blood reader who is excited by the words on the page," Jehn said. "I'm sure children feel the same way."
Even supporters of robo-grading acknowledge its limitations.
A prankster could outwit many scoring programs by jumbling key phrases in a nonsensical order. An essay about Christopher Columbus might ramble on about Queen Isabella sailing with 1492 soldiers to the Island of Ferdinand - and still be rated as solidly on topic, Shermis said.
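Why jumbling can go unnoticed is easy to illustrate. The sketch below assumes a hypothetical scorer that only counts topic words and ignores their order; the vocabulary, function names and sample sentence are invented for illustration and do not describe how any commercial product works.

```python
import re
from collections import Counter

# Hypothetical topic vocabulary for a Columbus prompt (an assumption, not from any real scorer).
TOPIC_WORDS = {"columbus", "isabella", "ferdinand", "1492", "sailed", "queen"}


def tokenize(text):
    """Lowercase the text and split it into word and number tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def on_topic_score(text):
    """Return the fraction of tokens found in the topic vocabulary; order is ignored."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in TOPIC_WORDS)
    return hits / len(tokens)


sensible = "Columbus sailed in 1492 with support from Queen Isabella and King Ferdinand"
# Reverse the word order to produce nonsense built from exactly the same words.
jumbled = " ".join(reversed(sensible.split()))

# Both texts contain the same bag of words, so this kind of scorer cannot tell them apart.
same_bag = Counter(tokenize(sensible)) == Counter(tokenize(jumbled))
same_score = on_topic_score(sensible) == on_topic_score(jumbled)
```

Because the scorer sees only which words appear and how often, the sensible sentence and its reversed version receive identical scores. Catching the prank would take features that depend on word order.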
Computers also have a hard time dealing with experimental prose. They favor conformity over creativity.
"They hate poetry," said David Williamson, senior research director at the nonprofit Educational Testing Service, which received a patent in late 2010 for an Automatic Essay Scoring System.
But Williamson argues that automated graders aren't meant to identify the next James Joyce. They don't judge artistic merit; they measure how effectively a writer communicates basic ideas. That's a skill many U.S. students lack. Just one in four high-school seniors was rated proficient on the most recent national writing assessment.
The Hewlett Foundation kicked off its robo-grading contest by testing several programs already on the market. Results won't be released for several weeks, but Hewlett officials said the programs did very well.
Hewlett then challenged amateurs to come up with their own algorithms.
The contest, hosted on the data science website Kaggle.com, has drawn hundreds of competitors from all walks of life. They have until April 30 to write programs that will judge essays studded with awkward phrases such as, "I slouch my bag on to my shoulder" or "When I got my stitches some parts hurted."
The goal is to get the computer to give each essay the same score a human grader would.
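One simple way to check how closely a program matches human graders is to count how often the two scores agree. The sketch below is an illustration only: the function names are hypothetical, and the article does not say which agreement measure the contest itself uses.

```python
def exact_agreement(machine_scores, human_scores):
    """Return the fraction of essays on which the machine and human scores match exactly."""
    if len(machine_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")
    if not machine_scores:
        raise ValueError("score lists must not be empty")
    matches = sum(1 for m, h in zip(machine_scores, human_scores) if m == h)
    return matches / len(machine_scores)


def mean_absolute_difference(machine_scores, human_scores):
    """Return the average gap, in score points, between machine and human scores."""
    if len(machine_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")
    if not machine_scores:
        raise ValueError("score lists must not be empty")
    total_gap = sum(abs(m - h) for m, h in zip(machine_scores, human_scores))
    return total_gap / len(machine_scores)
```

For example, `exact_agreement([1, 2, 3, 4], [1, 2, 3, 3])` reports agreement on three of four essays. The mean absolute difference shows how far apart the scores are on average when they do not match.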
Martin O'Leary, a glacier scientist at the University of Michigan, has been working on the contest for weeks.
Poring over thousands of sample essays, he discovered that human graders generally don't give students extra points for using sophisticated vocabulary. So he scrapped plans to have his computer scan the essays for rare words.
Instead, he has his robo-grader count punctuation marks. "The number of commas is a very strong predictor of score," O'Leary said. "It's kind of weird. But the more, the better."
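The idea of using comma counts as a predictor can be sketched in a few lines. The code below is not O'Leary's program. The toy essays and scores are invented purely for illustration, and the function names are hypothetical. It counts commas in each essay and measures how strongly that count tracks the scores using a Pearson correlation.

```python
import math


def comma_count(essay):
    """Count the commas in an essay."""
    return essay.count(",")


def pearson(xs, ys):
    """Return the Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Toy data invented for illustration only; these are not contest essays or real scores.
toy_essays = [
    "I went home.",
    "After school, I went home.",
    "After school, tired and hungry, I went home.",
    "After school, tired, hungry, and late, I went home.",
]
toy_scores = [1, 2, 3, 4]

comma_counts = [comma_count(essay) for essay in toy_essays]
correlation = pearson(comma_counts, toy_scores)
```

A correlation near 1 would mean that essays with more commas tend to receive higher scores. On real data the relationship would have to be measured on thousands of graded essays, as O'Leary describes.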
As he digs into the data, O'Leary has run into a dismaying truth: The human graders he's trying to match are inconsistent. They disagree with one another on the merits of a given essay. They award scores that seem random. Indeed, studies have shown that human readers are influenced by factors that should be irrelevant, such as how neatly a student writes.
"The reality is, humans are not very good at doing this," said Steve Graham, a Vanderbilt University professor who has researched essay grading techniques. "It's inevitable," he said, that robo-graders will soon take over.
O'Leary won't mind when that day comes. He tests his program against student prose that has already been graded by a teacher. When the scores diverge, O'Leary reads the essay to find out why.
"More often than not," he said, "I agree with the computer."
(Editing by Jonathan Weber and Philip Barbara)