Last time, we were looking at what goes on inside our heads as we read a text, in hopes it might shed some light on how to emulate human reading proficiency in an automated knowledge-acquisition capability. More specifically, we looked at whether, and to what extent, readers actively construct and manipulate mental models of the entities and events they encounter in the text, versus whether they are content to simply skim over, without reflecting on, the words on the page.
If the latter alternative were true, reading would certainly be far simpler to automate. But is it true?
Judging by the pitched battles between "constructivists" and "minimalists" raging in reading theory throughout the 1990s, that question turned out to be more controversial than you might think. But when the dust settled, one experimental result, due to Leo Noordman and Wietske Vonk, seemed fairly well established -- namely, that the harder the text, the less inferencing and/or mental modeling occurs while actually reading it. Rather, any such active (re)construction of a difficult text's meaning only takes place after the fact, and even then only if readers are interrogated regarding the material.
More significant for present purposes, though, is the converse finding that people do deploy inferencing and modeling when reading "texts dealing with familiar topics".
Why should familiarity with a topic enable readers to engage in inferencing on the spot? A review of Noordman and Vonk's experimental design offers a clue: It turns out that, in those cases which gave rise to deferred, only-on-demand inferencing, "[t]he texts were chosen so that readers did not have the background knowledge underlying the inferences" [Noordman and Vonk 1993].
Crawling a bit further out on this limb, I'm going to interpret the above as implying that the reason the more difficult texts failed to evoke any inferencing on the part of their readers was that, in the absence of the relevant background knowledge, those readers had no basis on which to infer anything -- at least not without considerable conscious, after-the-fact cogitation.
What Noordman and Vonk appear to have demonstrated, then, is that having a sufficient store of readily accessible background knowledge -- a.k.a. common sense -- on the subject matter at hand is a prerequisite for performing the sort of immediate inferencing that enables a reader to actually understand a text as he or she is reading it.
What does all this say about knowledge-acquisition systems, though? Are they going to require background knowledge too? And, if so, --
How Much is Enough?
Even granting the assumption -- which, as we've seen, is by no means rock-solid -- that we have a relatively firm fix on how humans read and understand texts, would we necessarily need or want a computer program to do things the same way?
Take, for instance, the question of motivation: Would we want a machine that would read for the same reasons that we humans do? While it's certainly not inconceivable that some post-singularity artificial intelligence might decide to curl up with a good book purely for purposes of personal enrichment or enjoyment, it's hard to see this as a high-priority design goal on the part of its human architects. If anything, our current concept of a knowledge-acquisition machine seems closer to the state of affairs described by Ashwin Ram [1999, p. 258]:
People read newspaper stories for a reason: to learn more about what they are interested in. Computers, on the other hand, do not. In fact, computers do not even have interests; there is nothing in particular that they are trying to learn when they read.
Lacking, at least at the present stage of AI development, any innate goals of its own, all of a computer's motivations for reading, as for any other task, must originate from the outside -- from us. Machines read, if they do so at all, to serve our purposes, whether those purposes are answering questions about the text, or summarizing news articles, or collating reports for analysis, or, in the most generic case, producing some output in response to what has been input.
Note, however, that this wholly instrumental nature of machines' reading has the effect of placing an even greater premium on their ability to accurately represent and reason about what it is they have read. After all, absent any internal imperatives, a tool's only reason for being is that it performs its assigned task well.
(Not to put too fine a point on it, consider that a person who knew both Russian and English might enjoy reading War and Peace in the original, yet draw the line at writing it all back out in translation. Now contrast this case with that of a machine-translation system that likewise inputs War and Peace in Russian, and likewise produces no output. Clearly, although the person's experience is entirely plausible and readily justifiable, a machine-translation program that stops halfway through the job like this would have no conceivable raison d'être whatsoever.)
This, by extension, implies that, while knowledge-modeling and inferencing may or may not be optional for human readers, they are mandatory for any computer system that purports to read and understand stories. As we saw, humans encountering the description of a fatal fall -- from either a fourteen-story building or the Bridge of San Luis Rey -- might be forgiven for not intuitively making the connection to grievous injury or death. A story-understanding program that failed to do so, however, would be hard put to justify its very existence.
As an example of a system that would seem to flunk that test, we have James Meehan's account of his travails in trying to give an early story-telling program called TALE-SPIN an understanding of this self-same phenomenon of falling [Meehan 1981, p. 218]:
Here are some rules that were in TALE-SPIN when the next horror occurred:
... If you're in a river, you want to get out, because you'll drown if you don't. If you have legs you might be able to swim out. With wings, you might be able to fly away. With friends, you can ask for help.
These sound reasonable. However, when I presented "X FELL" as "GRAVITY MOVED X," I got this story:
HENRY ANT WAS THIRSTY. HE WALKED OVER TO THE RIVER BANK WHERE HIS GOOD FRIEND BILL BIRD WAS SITTING. HENRY SLIPPED AND FELL IN THE RIVER. GRAVITY DROWNED.
Poor gravity had neither legs, wings, nor friends. ...
As far as TALE-SPIN was concerned, gravity was lacking something much more important than just appendages and acquaintances: it was lacking access to any overarching framework that could integrate this fundamental force of nature meaningfully into the rest of TALE-SPIN's microworld.
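To make the failure concrete, here is a minimal Python sketch of the river rules quoted above. It is not Meehan's code: the `Entity` class, the `kind` label, and the attributes assigned to Henry are all hypothetical, and the rules' "might be able to" is flattened into certainty. The sketch also starts from a world state in which GRAVITY has already been recorded as being in the river; how TALE-SPIN arrived at that state is elided in the excerpt and is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    kind: str  # hypothetical category label: "animal" or "force"
    has_legs: bool = False
    has_wings: bool = False
    friends: list = field(default_factory=list)


def escape_plan(entity):
    """Return how the entity gets out of the river, or None if it cannot."""
    if entity.has_legs:
        return "swim"
    if entity.has_wings:
        return "fly"
    if entity.friends:
        return "ask for help"
    return None


def naive_outcome(entity):
    """Apply the quoted rules to anything located in the river."""
    plan = escape_plan(entity)
    if plan:
        return f"{entity.name} ESCAPED ({plan})"
    return f"{entity.name} DROWNED"


def schema_outcome(entity):
    """Same rules, guarded by a check on what kind of thing the entity is."""
    if entity.kind != "animal":
        return f"{entity.name} UNAFFECTED"
    return naive_outcome(entity)


# Assumed attributes, for illustration only.
henry = Entity("HENRY ANT", "animal", has_legs=True, friends=["BILL BIRD"])
gravity = Entity("GRAVITY", "force")

in_river = [henry, gravity]
print([naive_outcome(e) for e in in_river])
# ['HENRY ANT ESCAPED (swim)', 'GRAVITY DROWNED']
print([schema_outcome(e) for e in in_river])
# ['HENRY ANT ESCAPED (swim)', 'GRAVITY UNAFFECTED']
```

The rules themselves never change between the two versions. The only difference is that the second one consults a (very crude) notion of what sort of thing is in the river before applying them. That `kind` check is a toy stand-in for the sort of overarching framework TALE-SPIN lacked.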
Such frameworks, or schemata, are in turn said to be (analogues of) the foundational constructs by which we humans come to grips with our natural and social environments. In their totality, they comprise that faculty of "common sense" which, as we saw above, is key to our ability to comprehend information on the fly.
As we have seen, it is this commonsensical or background knowledge, built up over the course of a lifetime's experience in the real world, that story-tellers like Laurence Sterne and Thornton Wilder take for granted on the part of their audiences -- and that their silicon-based audiences have so far lacked. If endowing a computerized text-understanding system with the requisite constructive reasoning ability is truly what is needed for accurate knowledge acquisition, then this will involve explicitly encoding thousands (millions?) of propositions into a knowledge base, and providing mechanisms for rapidly matching them against the situation under analysis.
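As a rough illustration of what "encoding propositions and matching them against the situation" might look like at toy scale, the Python sketch below stores facts as tuples, indexes them by predicate for quick lookup, and forward-chains over if-then rules until nothing new can be derived. Everything in it is an assumption made for illustration: the `KnowledgeBase` class, the predicate names, and the single rule about falling are hypothetical, and none of it describes any particular system's design.

```python
from collections import defaultdict


class KnowledgeBase:
    def __init__(self):
        self.facts = set()
        self.by_predicate = defaultdict(set)  # index: predicate -> facts
        self.rules = []  # list of (premises, conclusion) pairs

    def add_fact(self, fact):
        """Store a fact tuple; return True only if it was new."""
        if fact in self.facts:
            return False
        self.facts.add(fact)
        self.by_predicate[fact[0]].add(fact)
        return True

    def add_rule(self, premises, conclusion):
        self.rules.append((premises, conclusion))

    def _match(self, pattern, fact, bindings):
        """Unify one pattern with one fact; terms starting with '?' are variables."""
        if len(pattern) != len(fact):
            return None
        bindings = dict(bindings)
        for p, f in zip(pattern, fact):
            if isinstance(p, str) and p.startswith("?"):
                if bindings.setdefault(p, f) != f:
                    return None
            elif p != f:
                return None
        return bindings

    def _solutions(self, premises, bindings):
        """Yield every variable binding that satisfies all premises."""
        if not premises:
            yield bindings
            return
        first, rest = premises[0], premises[1:]
        for fact in list(self.by_predicate[first[0]]):
            extended = self._match(first, fact, bindings)
            if extended is not None:
                yield from self._solutions(rest, extended)

    def infer(self):
        """Forward-chain until no rule produces a new fact."""
        changed = True
        while changed:
            changed = False
            for premises, conclusion in self.rules:
                for b in list(self._solutions(premises, {})):
                    new_fact = tuple(b.get(term, term) for term in conclusion)
                    if self.add_fact(new_fact):
                        changed = True


kb = KnowledgeBase()
kb.add_fact(("fell_from", "traveler", "bridge"))
kb.add_fact(("high_place", "bridge"))
kb.add_rule(
    [("fell_from", "?x", "?y"), ("high_place", "?y")],
    ("gravely_injured", "?x"),
)
kb.infer()
print(("gravely_injured", "traveler") in kb.facts)  # True
```

Even this toy hints at the scale of the problem: the one inference a human reader makes without thinking needed a hand-written fact about bridges and a hand-written rule about falls, and every further inference would need its own.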
And even then the effort may prove unavailing. But that's a story for next time.
Meehan, James (1981), "TALE-SPIN," in Roger C. Schank and Christopher K. Riesbeck, eds., Inside Computer Understanding: Five Programs Plus Miniatures, Hillsdale NJ: Erlbaum.
Noordman, Leo G. M. and Wietske Vonk (1993), "A More Parsimonious Version of Minimalism in Inferences," Psycoloquy: 4(08) Reading Inference (9). http://www.cogsci.ecs.soton.ac.uk/cgi/psyc/newpsy?4.08.
Ram, Ashwin (1999), "A Theory of Questions and Question Asking," in Ashwin Ram and Kenneth Moorman, eds., Understanding Language Understanding: Computational Models of Reading, Cambridge MA: MIT Press, pp. 253-298, at: http://www.cc.gatech.edu/faculty/ashwin/papers/git-cc-92-02.pdf.