05/05/2014 10:45 am ET | Updated Jul 05, 2014

Building a Conversational Agent from the Ground Up, Part II

In the first installment of this three(?)-part blog on creating an AI that can hold up its end of a free-form conversation, we took a look at what doesn't work (ELIZA) and what would work (full sentence parsing), were it not for the problems raised by ambiguity. And we left off with the thought that perhaps overcoming those problems would require going beyond the bounds of the individual sentence.

The first step in doing so is ...

Discourse Analysis and Management, or What's Going On Around Here?

The NLP technology that focuses on units of language larger than a sentence is discourse analysis and management (DAM). At this level, the equivalent of a parser's search for the sentence's nouns and verbs is a discourse analyzer's quest for the topic of the conversation as a whole.

It should be clear how such topicality tracking might apply to our previous parsing problem: knowing whether the theme of a discussion to date was "temporality" or "little-known fly subspecies of the world" can make a big difference when we go to disambiguate "time flies like an arrow." This aspect of discourse management falls under the heading of "semantic priming" or "spreading activation." Arduous to implement, it's simple enough in concept: if you've been discussing airplanes and someone says "pilot," it's a safe bet they're not talking about the light in your stove.
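To make the "pilot" example concrete, here is a minimal sketch of spreading activation in Python. Everything in it is an assumption made for illustration: the toy association network, the link strengths, the sense labels (`pilot_aviator`, `pilot_light`), and the `decay` and `steps` parameters are all invented, and a real system would need a far larger network than this.

```python
from collections import defaultdict

# Hypothetical toy association network: each concept maps to related
# concepts with a link strength between 0 and 1. All entries are invented.
ASSOCIATIONS = {
    "airplane": {"pilot_aviator": 0.9, "airport": 0.8, "wing": 0.7},
    "airport": {"airplane": 0.8, "pilot_aviator": 0.5},
    "stove": {"pilot_light": 0.9, "kitchen": 0.8},
    "kitchen": {"stove": 0.8},
}

# Hypothetical sense inventory for ambiguous words.
SENSES = {"pilot": ["pilot_aviator", "pilot_light"]}


def spread_activation(topic_words, decay=0.5, steps=2):
    """Push activation outward from the words discussed so far."""
    activation = defaultdict(float)
    frontier = {word: 1.0 for word in topic_words}
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            activation[node] += energy
            # Pass a weakened share of this node's energy to its neighbors.
            for neighbor, strength in ASSOCIATIONS.get(node, {}).items():
                next_frontier[neighbor] += energy * strength * decay
        frontier = next_frontier
    # Bank whatever energy is left on the final frontier.
    for node, energy in frontier.items():
        activation[node] += energy
    return activation


def disambiguate(word, topic_words):
    """Pick the sense of `word` most strongly primed by the topic."""
    activation = spread_activation(topic_words)
    return max(SENSES[word], key=lambda sense: activation[sense])


print(disambiguate("pilot", ["airplane", "airport"]))  # pilot_aviator
print(disambiguate("pilot", ["stove", "kitchen"]))     # pilot_light
```

The point of the sketch is only that the same word resolves differently depending on what has been primed. Building and tuning the network is where the arduous part comes in.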

Another reason to track the topic is that knowing which information is old and which is new can enable a conversational agent to place emphasis on the novel items in its response. This ability to guide the prosody of an utterance can make, among other things, for more natural-sounding speech synthesis.
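As a rough illustration of old-versus-new tracking, the Python sketch below marks the words in a response that have not yet come up in the conversation. The stop-word list, the function names, and the use of asterisks as a stand-in for whatever emphasis markup a speech synthesizer might accept are all assumptions. It also does no stemming, so "leave" and "leaves" would count as different words.

```python
import re

# A tiny, hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "was", "to", "of", "in", "on",
              "and", "it", "at", "will", "does", "when"}


def content_words(text):
    """Lower-cased words of `text`, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOP_WORDS]


def mark_new_information(history, response):
    """Wrap words not yet mentioned in the conversation in *asterisks*."""
    given = set()
    for utterance in history:
        given.update(content_words(utterance))
    marked = []
    for token in response.split():
        stripped = token.rstrip(".,!?")
        trailing = token[len(stripped):]   # punctuation stays outside the marks
        core = stripped.lower()
        if core and core not in STOP_WORDS and core not in given:
            marked.append("*" + stripped + "*" + trailing)
        else:
            marked.append(token)
    return " ".join(marked)


history = ["When does the flight to Boston leave?"]
print(mark_new_information(history, "The flight to Boston will leave at noon."))
# The flight to Boston will leave at *noon*.
```

Only "noon" is new to the exchange, so only "noon" gets the stress.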

One DAM Thing After Another

But perhaps the single most noticeable contribution a discourse analysis and management system can make to our slowly-coalescing conversational agent lies in the area of linking pronouns back to their antecedents.

No matter how accurate a conversational system's parser, much of its credibility seems to hinge instead on how well it can mimic the small linguistic grace notes -- like pronoun resolution -- that we humans take utterly for granted. Precisely because we are so good at handling such focus-of-discourse issues ourselves, an AI agent's failure to take them in stride becomes glaringly obvious. I refer to this unspoken expectation of competence in our conversational partners as the "Heidegger's Hammer" syndrome -- it having been philosopher Martin Heidegger who observed that a carpenter notices the hammer he or she is using only when it breaks.

If finding the referent for a pronoun is one of the key tasks for any DAM system, it is also one of the best windows into the nature of discourse analysis as a whole. For, whereas (symbolic) parsing may at least aspire to the algorithmic rigor of a context-free grammar, all discourse analysis has to work with is heuristics -- many rules of thumb instead of a single rule of law.
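What "many rules of thumb" might look like in practice can be sketched as a handful of small scoring functions whose votes are simply added up. The Python below is a hedged illustration, not an established algorithm: the `Mention` record, the three heuristics, and all of their weights are invented for the example.

```python
from typing import NamedTuple


class Mention(NamedTuple):
    text: str
    position: int   # word index in the discourse so far
    role: str       # "subject", "object", or "other"
    number: str     # "singular" or "plural"


def agrees_in_number(pronoun_number, mention):
    # A mismatch is heavily penalized rather than ruled out entirely.
    return 1.0 if mention.number == pronoun_number else -5.0


def prefers_recent(pronoun_position, mention):
    # Closer antecedents score higher; the scale is arbitrary.
    return 1.0 / (pronoun_position - mention.position)


def prefers_subject(mention):
    return 0.5 if mention.role == "subject" else 0.0


def resolve(pronoun_position, pronoun_number, mentions):
    """Return the earlier mention with the highest combined score."""
    candidates = [m for m in mentions if m.position < pronoun_position]

    def score(m):
        return (agrees_in_number(pronoun_number, m)
                + prefers_recent(pronoun_position, m)
                + prefers_subject(m))

    return max(candidates, key=score)


# "The carpenters dropped the hammer. It broke."
mentions = [
    Mention("the carpenters", 1, "subject", "plural"),
    Mention("the hammer", 4, "object", "singular"),
]
print(resolve(5, "singular", mentions).text)  # the hammer
```

No single rule settles the matter here. The answer falls out of the sum, and a different choice of weights could easily produce a different one.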

That Doggie in the Window

To see what's involved, consider the following sentence:

The puppy pressed its nose so hard against the window pane that it broke it.

The question is: what do the three "it"s (including the possessive form "its") in this sentence refer to? Here are the issues the discourse analyzer must grapple with as it gropes toward an answer:

1. The "it" in "its nose" can be resolved mainly on a syntactic level: here, it almost certainly refers to the puppy, which is the only literal antecedent in sight.

That said, however, it should be mentioned that there is also an outside chance of a forward reference (a cataphor) to the window pane. For instance, what if the sentence had read:

The puppy pressed against its glass so hard that the window pane broke.

This implies we may need to apply some knowledge-based constraints to our analysis of the original sentence (e.g., window panes may have glass, but they don't have noses).

2. The first "it" in "it broke it" also looks pretty simple -- it's obviously "the puppy," right? Well, maybe... but it could be "its nose" (a case-grammar analysis could probably resolve this one, since, with its built-in preference for agent over instrument, it mirrors what a human would do).

2a. ...But, more troublesome, what if the sentence had been:

The puppy pressed its nose so hard against the window pane that it broke.

Now that second "it" seems less likely to resolve to "the puppy" (after all, it takes more than a window pane to break a whole dog!), and more likely to refer to the window pane... or possibly "the puppy's nose."

3. ...and that's also the problem with the final "it" in our original sentence: as a result of the puppy's pressing, something ("it") gets broken. But is "it" the "window pane" or the "puppy's nose"? Before you answer that one, what if the sentence had been:

The puppy pressed its nose so hard against the cornerstone of the Empire State Building that it broke it.

Syntax and semantics can't save us now. To figure out what actually happened here, we need real-world data on the relative stress resistance and material strength of puppies' noses vs. window panes vs. cornerstones.
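The knowledge-based constraints from the walk-through can be sketched as simple table lookups. In the Python below, the `HAS_PART` and `FRAGILITY` tables are hypothetical stand-ins for real-world knowledge, and every entry and number in them is invented for illustration rather than taken from any materials data.

```python
# Hypothetical world knowledge; every entry and number here is invented
# for illustration and is not real materials data.
HAS_PART = {
    "puppy": {"nose", "tail", "paw"},
    "window pane": {"glass"},
    "cornerstone": set(),
}
FRAGILITY = {            # 0 = nearly unbreakable, 1 = breaks easily
    "window pane": 0.9,
    "nose": 0.4,
    "puppy": 0.2,
    "cornerstone": 0.01,
}


def possessors(part, candidates):
    """Keep only the candidates that can plausibly own `part`."""
    return [c for c in candidates if part in HAS_PART.get(c, set())]


def likeliest_broken(candidates):
    """Guess which candidate broke: the most fragile one on record."""
    return max(candidates, key=lambda c: FRAGILITY.get(c, 0.0))


# "its nose" vs. "its glass": which antecedent can own the part?
print(possessors("nose", ["puppy", "window pane"]))    # ['puppy']
print(possessors("glass", ["puppy", "window pane"]))   # ['window pane']

# "it broke it": which object is the likelier casualty?
print(likeliest_broken(["nose", "window pane"]))       # window pane
print(likeliest_broken(["nose", "cornerstone"]))       # nose
```

Swapping the window pane for a cornerstone flips the answer, and it does so on the strength of the lookup table alone. Nothing in the syntax of the sentence has changed.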

In all fairness, this seemingly simple sentence is, in the final analysis, no less ambiguous to a human language processor than to an automated one. It's a nice example of the sorts of quagmires that await the dabbler in discourse analysis.

It also illustrates why even discourse analysis and management is never enough. Factual information about such things as the relative fragility of window panes, cornerstones, and puppies' noses will forever remain beyond its ken.

No, to take such factors into consideration, we need a modicum of knowledge about the real world too. Next time we'll take a look at how we might go about getting it ...