What the Monster Wrote: How Will the College Board Grade a Well-Written Endorsement of Evil?

01/31/2012 05:06 pm ET | Updated Apr 01, 2012

I wrote the following after I pretended to be a teenager and took the SAT. My experiment was to write an essay endorsing evil and see how it got scored. The result was written up by the New York Times.

I work as a private tutor, and every once in a while I take the test to remember what it's like. This time, however, I went in with a question of my own: What would the College Board do with an essay that gave it exactly what it asked for as far as structure and writing quality, but took as its topic something truly monstrous?

Before we could begin, the test proctor checked our identification. Everyone else used their high school IDs, but I pulled out a Florida driver's license. "Florida!" he said, "what is this, your fake?" My response, in the desperately calm tones of someone going through customs: "no, sir." I didn't say I'd kept it to vote in a swing state.

The essay is the first section of the SAT. I had twenty-five minutes to decide whether "compromise [is] always the best way to resolve a conflict." I started broad: The notion that everyone has the right to the same opportunities, and the right to exist at all, has led to world societies in which the middle ground is viewed as the highest ground. You could almost miss it, that "right to exist at all" buried in the middle.

My first body paragraph praised the Nazis: One of their most important decisions was to seek a means to evaluate intelligence, and also seek to exterminate those who didn't meet their high standards.

As I began writing the essay, the skin at the back of my neck pricked. I train most of my students to write solid four-paragraph essays with upbeat theses: lots of The Diary of Anne Frank, lots of Martin Luther King, Jr. Meanwhile, if my new essay persona had his way, Miss Frank's death was a stroke of genius and Mr. King should have been picking cotton.

The 13th, 14th, and 15th amendments enfranchised blacks and effectively ended an economically effective two-tier society, in which blacks did what they were most skilled at - genetically superior musculature made them great laborers - and whites controlled intellectual product.

... the reflexive urge to compromise comes from a fear that strongly held views are unkind; it is essentially a push towards dilution. Only by resisting that tendency, for example by safeguarding racial stratification and genetic superiority, can true and ambitious progress be made.

My SAT essay was scanned into a database and sent to two graders somewhere in the country. Those graders, who make roughly $20 an hour and must finish each essay within three minutes, assigned it a score from one to six. I had to wait three weeks to know whether my evil essay would sail through with a 6, crash on a 1, or win me a visit from FBI agents. At the start of those three weeks, I predicted I'd get a high score. There is no standard morality, so a test that is standardized can't evaluate your ethics. To do so would be entirely beyond the SAT's mission. Right?

Whether a twenty-five-minute essay is a valid means to evaluate students has been under debate since its inclusion in the SAT in 2005. The Educational Testing Service, which writes the SAT, determined in one of its own 1990s studies that no short writing task can accurately measure a student's writing ability. (The ETS-produced Advanced Placement exams, for example, require two hours of essays.) But when, in 2001, the University of California system threatened to cut the SAT as a requirement, ETS bent to their reform requests, among them the short essay. An MIT education professor, Les Perelman, evaluated ETS's grading criteria and observed that the highest scores seemed bound to go to the longest essays. ETS demurred, so he did his own informal study, scoring essays based solely on length by pinning them to the wall across the room. His scores correlated with the real scores 90% of the time. "I have never found a quantifiable predictor in 25 years of grading that was anywhere near as strong as this one," he told The New York Times.

So. Length = good. And, we can still presume, skillful writing = good.

But does goodly = good?

As far as evil goes, I chose a pair of pretty obvious targets. The Nazis live on, after all, in the binders of debate teams across the country. It wasn't that hard to come up with perfectly coherent arguments for genocide and apartheid. Some cruel ideas are like that - they refine logic to such a pure form that it's immune to counterarguments based in pansy concepts like compassion. Eugenics and slavery actually have a long history of respected celebrity proponents. For eugenics: Plato, Margaret Sanger. For slavery: James Madison, Thomas Jefferson. Unlike arguing for random killings or animal abuse, say, an essay supporting these ideas has a rationale that, at least in its own narrow context, holds a couple pints of water.

My friends and students raised doubts about my high score assumptions, though. I could easily get a grader who would gladly nail an immoral essay. After all, try explaining that morality can't be standardized to any kid who actually attends a Catholic high school. Or anyone who was educated between 1000 BC and 1960 or so AD. Today, our goal for secular students is that they gain knowledge and critical thinking skills. But for most of our history, moral education was prime. Our most famous universities were founded as seminaries. In the old schoolhouses, producing proper young gentlemen and ladies was more important than producing scholars. If we can imagine an 1800s SAT, my essay would have shown little knowledge of ethics or moral reasoning and likely received a low score.

Unless, of course, it was graded by a slaveholder. Or a budding eugenicist.

As promised, my score became available online exactly three weeks after I took the test. Along with over a million teens across the world, I clicked through the SAT website, fingers sweaty.

A grader gave me a 6 out of 6, reserved for those essays that exhibit, by the College Board's guidelines, "outstanding critical thinking, using clearly appropriate examples."

That score raised the question: Just what makes an example "clearly appropriate" enough to earn a top score? Appropriate to the essay task, or appropriate to the culture as a whole? The term "appropriate," like most institutional language, manages to give the appearance of clarity while retaining its ability to shift its focus as needed. (Much like the SAT itself, which went from the Scholastic Achievement Test to the Scholastic Aptitude Test to the Scholastic Assessment Test until, in 1994, it was officially declared to stand for nothing so as to avoid legal challenges over accountability.) How can a national test possibly set a uniform standard for appropriateness? The relativist response, of course, is that eugenics and slavery are appropriate just by the fact that they further the argument within which they're presented.

My results weren't as clear as all that, though. The other grader didn't assign my evil essay a top score. It got a 5. Since graders don't provide comments, I turned to the College Board's official standards. While a 6 "effectively and insightfully develops a point of view," a 5 only "effectively develops a point of view." What's missing is the insight. My essay had its problems - the word choice was showy, the slavery example didn't really prove its point about compromise - but I also wonder whether my second grader looked into the soul of my evil alter ego and saw something lacking, something the grading criteria could name insight. I've worked as an SAT tutor for six years, and I've seen plenty of essays far more rhetorically flawed get top scores. But if she took a point off for my subject matter, not wanting to inflate the sails of a racist neo-Nazi, can I blame her? It would take a truly dispassionate person to stick a gold star on that essay of mine and say "great work, young man."

I'll admit, though, that I was really bummed by that 5. Not because I'd wanted her to give me a 6 - I'd have been far more excited by a 1. If my objectionable morality was worth one point off, wasn't it worth them all? I wanted to know: had she read my essay passionlessly, or had she gotten angry? Did she see a wink where a wink was given, and decide I was taking the piss out of her? Or, more soberingly, is my essay topic not all that uncommon? I'm struck by how distanced my score report is from the real human reactions that generated it. I'll never know where my graders were sitting, or whether they told their families back home about the alarming paragraphs they'd read from 4:51 to 4:54 pm that advocated exterminating the weak and enslaving blacks. No one's proposing that the College Board provide extensive written feedback on the two million essays it has to grade in three weeks. But still I'm dissatisfied. This good boy gone bad wanted to see the reaction he got. Maybe that's how to defuse evil: don't pay it any special attention, just treat it like any other essay in the stack. Don't allow it the pleasure of witnessing your shock.

But maybe that's actually how evil is perpetuated. By evaluating its arguments we accept its precepts. Imagine: You're an SAT grader at the end of a long shift, eager to get home and start dinner. You're paid a buck to assign a grade to four hundred words within three minutes. Give the gifted monster a high score for "outstanding critical thinking, using clearly appropriate examples" and turn to the next essay for more Mother Teresa and Anne Frank.