05/05/2014 08:42 pm ET Updated Jul 05, 2014

Building a Conversational Agent from the Ground Up, Part III

In the second installment of this three-part blog on creating an AI that can hold up its end of a free-form conversation, we took a look at how discourse analysis and management (DAM) can help redress some of the challenges that ambiguity presents to both symbolic and statistical parsing technology. But we also saw that DAM runs up against limitations of its own.

Maybe it's time to apply a little common sense?

Knowledge Representation: The Critique Of Pure Reason

It might seem to go without saying that, in order to use language with something approaching human competence, it's helpful to have an idea of what it is you're talking about. Curiously enough, this seemingly self-evident insight flies in the face of the last half century or so of linguistic theory -- theory which has been at pains to maintain a strict separation between syntax and semantics. On the other hand, our conversational agent doesn't have to conform to the dictates of the theoreticians -- it just has to work!

And making our agent work does turn out to involve endowing it with some sense of the meaning underlying the language it uses, and of the world that language is intended to describe. Not only is this essential in order for the agent to interpret correctly what we say to it, it is just as important for maintaining the coherence and consistency of what the agent itself says in response. At this juncture, however, the pursuit of conversational competence for our agent leads us out of the domain of computational linguistics altogether, and into that of knowledge representation and reasoning (KR&R).

As the name suggests, KR&R focuses on how reality as a whole might best be represented in an internal "mental model," and what operations on that model are needed in order to draw valid conclusions from known facts. All of which is to say that knowledge representation is heavy-duty, general-purpose technology, with concerns and applications ranging far beyond the construction of simulated conversationalists. This is to be expected given the nature of the challenge, of course, but it can give rise to a sort of "overkill" phenomenon. All we are trying to do, after all, is to approximate human reasoning powers, whereas mainstream KR&R research often seems to be trying to exceed them.

Perhaps the best example of this phenomenon is also the best-known KR&R project in the world: Cyc. Begun back in the mid-eighties, Cyc (short for Encyclopedia) has called forth the sort of generational effort that earlier ages reserved for cathedral building. Its knowledge base now consists of several hundred thousand concepts, and several million propositions about those concepts, but the system as a whole is still hovering on the brink of commercial applicability.

Be that as it may, Cyc's creator Doug Lenat did at least furnish an epigram for Part I of my new science thriller Dualism:

"Absolutely none of my work is based on a desire to understand how human cognition works. I don't understand, and I don't care to understand. It doesn't matter to me how people think; the important thing is what we know, not how do we know it."

B. Jack Copeland is less sanguine about Cyc: in his view, Lenat suffers from "a failure to appreciate the sheer difficulty of the ontological, logical and epistemological problems that he has taken on."

But again, encompassing all of what human beings can potentially know may be overkill for our purposes. There may, in fact, be no point in starting up the same asymptotic learning curve that Cyc has been scaling for the past thirty years. As opposed to Doug Lenat's focus on universal Knowledge with a capital "K," all our conversational agent really needs is just enough in the way of memories, beliefs, expertises, and reasoning ability to hold its own in normal conversation about topics relevant to the character it's intended to portray.

One way to think about what's required is to draw a distinction between the amount of medical knowledge it takes to actually function as a physician and how much it takes to impersonate that same physician in a General Hospital soap opera. (Think: "I'm not a doctor, but I play one on TV.") So what if ER's George Clooney couldn't tell an esophagus from a duodenum? "Dr. Doug Ross" wasn't going to be slicing anybody open looking for one or the other -- and neither will our conversational characters.

Even these somewhat scaled-down expectations involve more in the way of implementation issues than I can hope to address in the allotted space-time, but here is a "One-Minute Epistemologist" version:

  • Our knowledge model (which, for convenience, I will refer to here as a "mindset") begins with an ontology -- that is, a catalogue of all the things in the world that we want our agent to be able to converse about (its "universe of discourse");
  • Next we specify how the things (more properly, the "concepts") in the ontology are related to one another. We do so by embedding them in a so-called "IS-A" hierarchy, not unlike the class hierarchies used in object-oriented programming. IS-A linkages will capture the fact, for instance, that an elephant is-a mammal, that a mammal is-a vertebrate, that a vertebrate is-a[n] animal, etc.;
  • The rationale for the IS-A hierarchy in KR&R is the same as for the class hierarchy in C++: inheritance of properties and functionalities. If we install the proposition "mammals breathe oxygen" into our embryonic mindset, then elephants, being mammals, will automatically inherit this attribute, freeing us from the necessity of further asserting that elephants -- not to mention dogs, cats, etc. -- "breathe oxygen," and so on.
  • The kind of knowledge we are capturing and encoding so efficiently here is what philosopher Rudolf Carnap called intensional knowledge -- i.e., propositions that are true of a class (and its subclasses) in general. And the kind of reasoning such knowledge affords is captured by a logical formalism called First Order Predicate Calculus (FOPC). Much of the inferencing (the drawing of conclusions from a body of known fact) which our agent will be called upon to perform in imitation of human reasoning boils down to theorem-proving operations in this predicate calculus.
  • The underlying inheritance scheme gets into trouble, though (and FOPC along with it), when we try to extend it from classes of things to the individuals belonging to those classes. If an agent's mindset posits both that "elephants have four legs" and that "Clyde is an elephant," then when asked "How many legs does Clyde have?", the agent should answer "Four!" Except that (oops!) I forgot to mention that poor Clyde, victim of a freak gardening accident, only has three legs.
  • Such extensional knowledge -- as the propositions specifying the properties and behaviors of individuals (as opposed to classes) are collectively known -- represents a speed bump in the otherwise smooth top-downward propagation path of inheritance logic. Particular assertions made at this level can, as the Clyde example demonstrates, override universal intensional truths handed down from on high. And, unlike C++, FOPC does not look kindly on "method overriding."
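For the programmers in the audience, here is a minimal Python sketch of the scheme just outlined: an IS-A chain, property inheritance down that chain, and an individual (poor Clyde) whose own facts override what his class hands down. The Concept class, its lookup method, and the property names are all my own inventions for illustration, not anything drawn from Cyc or another real KR&R system. Note too that "most specific fact wins" is a default-inheritance rule of thumb, exactly the kind of overriding that FOPC itself does not sanction.

```python
class Concept:
    """A node in a toy IS-A hierarchy (hypothetical class, for illustration)."""

    def __init__(self, name, parent=None, **properties):
        self.name = name
        self.parent = parent                # the IS-A link, or None at the root
        self.properties = dict(properties)  # facts asserted directly on this node

    def lookup(self, prop):
        """Return the most specific known value for prop, walking up IS-A links."""
        node = self
        while node is not None:
            if prop in node.properties:
                return node.properties[prop]
            node = node.parent
        raise KeyError(f"{self.name} has no known value for {prop!r}")


# Intensional knowledge: facts about classes.
animal = Concept("animal")
vertebrate = Concept("vertebrate", parent=animal)
mammal = Concept("mammal", parent=vertebrate, breathes="oxygen")
elephant = Concept("elephant", parent=mammal, legs=4)

# Extensional knowledge: facts about individuals.
clyde = Concept("Clyde", parent=elephant, legs=3)  # local fact overrides the default
dumbo = Concept("Dumbo", parent=elephant)          # nothing local, so inherits

print(clyde.lookup("legs"))
print(dumbo.lookup("legs"))
print(clyde.lookup("breathes"))
```

Asked about Clyde's legs, the lookup stops at Clyde himself and never reaches the elephant default; asked what he breathes, it climbs all the way up to mammal.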

The fact that extensional knowledge does not play nicely with others has led more than a few ontologists to exclude it from their representations entirely. This is, to say the least, unfortunate, since it is at the level of individual entities and incidents that all the really interesting things there are to talk about must be encoded. Lamentable as it may be to the logicians among us, the truth of the matter is that cocktail-party conversation made up entirely of statements like "a mallard is a kind of duck" or "bread is made of flour" gets old pretty quick, at least by comparison with discourse on such hot topics as whom George Clooney is engaged to this week.

An ability to subvert the otherwise monotonous validity of first-order truth-testing knowledge representations becomes even more crucial when we consider the most likely venues for early-adopter conversational agents: portraying interactive characters in the games-and-entertainment space. Bereft of an "extensional-override" capability, our agents would, among other things, be incapable of lying. Characters in stories, on the other hand, seldom if ever tell the whole truth and nothing but the truth: even the simplest murder mystery presupposes that at least some of the suspects aren't telling us everything they know.

Even those knowledge representation systems that give extensional knowledge its due may still fall short of what we need in a conversational agent. The point about knowledge of specifics is that it's, well, specific -- every conversational character in an interactive fiction, while sharing a baseline knowledge of common-sense generalities with the rest of the dramatis personae, is going to need its own individual mindset as well. Lacking at least some personalized facts, memories, and beliefs, it's hard to see how an agent could portray a personality. So, the final challenge to be overcome in our quest for a conversational agent able to enliven our interactive fictions is authorability -- not just mindsets, but mindsets that can be tailored to order as part and parcel of the content-creation process.
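One way to picture a personal mindset riding on top of shared common sense is as two layers of facts, with the character's own layer consulted first. The sketch below uses Python's standard collections.ChainMap for the layering. The fact keys, the BASELINE table, the make_mindset helper, and the two characters are hypothetical, chosen only to make the idea concrete; a real mindset would need far richer structure than a dictionary.

```python
from collections import ChainMap

# Shared common-sense baseline, known to every character (hypothetical facts).
BASELINE = {
    ("bread", "made_of"): "flour",
    ("mallard", "is_a"): "duck",
}


def make_mindset(personal_facts):
    """Layer a character's own facts over the shared baseline.

    Lookups try the personal layer first and fall back to BASELINE;
    new assertions are written to the personal layer only.
    """
    return ChainMap(dict(personal_facts), BASELINE)


butler = make_mindset({("butler", "whereabouts_at_midnight"): "pantry"})
heiress = make_mindset({("heiress", "whereabouts_at_midnight"): "library"})

# The butler holds his own belief about the heiress; her mindset is untouched.
butler[("heiress", "whereabouts_at_midnight")] = "conservatory"

print(butler[("bread", "made_of")])
print(butler[("heiress", "whereabouts_at_midnight")])
print(heiress[("heiress", "whereabouts_at_midnight")])
```

Because writes land only in the personal layer, two characters can disagree about the same fact without either of them corrupting the common-sense baseline they share.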

How do we build such a mindset? How do we feed in the facts, memories, and opinions that undergird a unique conversational characterization?

Simple (in principle if not in fact): we leverage the agent's inbuilt capabilities and -- having first ratcheted its gullibility level up to the point where it'll believe anything we tell it -- we hold a conversation with it.
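To give a flavor of what authoring by conversation might look like at its very crudest, here is a toy Python sketch. It is emphatically not the parsing technology discussed in the earlier installments: the tell function, the two sentence patterns it recognizes, and the dictionary mindset are all assumptions of mine, made up for illustration. The gullibility is modeled in the simplest way possible, with no consistency checking whatsoever, so whatever was said last is what the agent believes.

```python
import re

# Two toy sentence patterns (hypothetical, far short of a real parser):
#   "<X> is a/an <Y>"   and   "<X> has <value> <attribute>"
IS_A = re.compile(r"^(?:an?\s+)?(\w+) is an? (\w+)\.?$", re.IGNORECASE)
HAS = re.compile(r"^(\w+) has (\w+) (\w+)\.?$", re.IGNORECASE)


def tell(mindset, sentence):
    """Store the fact stated in sentence; return True if it was understood.

    Maximal gullibility: a new assertion overwrites any earlier one.
    """
    text = sentence.strip()
    match = IS_A.match(text)
    if match:
        subject, category = match.groups()
        mindset[(subject.lower(), "is_a")] = category.lower()
        return True
    match = HAS.match(text)
    if match:
        subject, value, attribute = match.groups()
        mindset[(subject.lower(), attribute.lower())] = value.lower()
        return True
    return False  # not one of the two patterns this sketch knows


mindset = {}
tell(mindset, "Clyde is an elephant.")
tell(mindset, "Clyde has four legs")
tell(mindset, "Clyde has three legs")  # the later claim simply wins

print(mindset)
```

A less credulous agent would check each new assertion against what it already believes before accepting it, which is where the reasoning machinery described above would come back into play.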

* * *

If you'd like to see some of this conversational technology in action, come on out to my Tech Talk being hosted by Huffington Post tomorrow (Wednesday) evening, May 7th, from 6:30 pm on. Check out the Event Page for details and be sure to click the "Join and RSVP" button on the top right.

Hope to see you there!