Showing posts with label tutorial. Show all posts

Saturday, March 28, 2009

Bot Training I: Linguistic Targets

Constructing a chat robot resembles playing a video game. The botmaster writes some AIML, then reads the logs of conversations with the resulting bot. Spotting places in the dialog where the bot breaks down, the botmaster writes new AIML to improve the broken dialog, and then repeats this cycle again and again. When we find an opportunity in the conversation log to improve the bot's response, we call that a “target”. Authoring the chat bot thus becomes a process of identifying targets and “filling them up” with new AIML content. The more AIML categories you create, the higher you “score” in this video game metaphor.

Pandorabots has taken the game metaphor a step further and automated the process of finding targets. How do we know when the bot gives an incorrect, vague, or imprecise response? In AIML, the answer is simple: whenever the input pattern of the matching category contains a wildcard (the * or _ character). If the input pattern contains no wildcards, then the match was exact, and in most cases the bot can formulate an exact reply. If there are wildcards, then the bot by definition recognized only part of what the client said.

(The above describes the situation only to a first-order approximation. Strictly speaking, we should consider the input <pattern>, <that> and <topic>. Only if all three contain no wildcards is there truly an exact match. If, for example, the <pattern> is YES and the <that> pattern is the wildcard *, it is a potential target, because we can make a more exact response by taking into account a <that> value. But as happens many times with AIML, it is simpler to explain a principle of the language by thinking about the input <pattern> in isolation and ignoring the details of <that> and <topic> until later. The extension of the principle to <that> and <topic> is a matter of bookkeeping.)
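As a rough illustration of the exact-match test described above, the following Python sketch checks the three parts of a category for wildcards. The function names `has_wildcard` and `is_exact_match` are hypothetical and are not part of AIML or of the Pandorabots software. The sketch assumes patterns are already normalized, space-separated strings.

```python
# Hypothetical helpers; not part of AIML or Pandorabots.
WILDCARDS = {"*", "_"}

def has_wildcard(pattern):
    """Return True if any word of the pattern is a wildcard (* or _)."""
    return any(word in WILDCARDS for word in pattern.split())

def is_exact_match(pattern, that, topic):
    """A match is exact only if none of the three parts contains a wildcard."""
    return not any(has_wildcard(part) for part in (pattern, that, topic))

# The YES example from the text: an exact <pattern> with a wildcard <that>
# is still a potential target.
print(is_exact_match("YES", "*", "*"))
print(is_exact_match("YES", "DO YOU LIKE MOVIES", "MOVIES"))
```

Under the first-order approximation used in the rest of this post, only `has_wildcard(pattern)` matters; the other two arguments cover the bookkeeping for <that> and <topic>.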

Strictly speaking, a Target consists of two things: an input, and an AIML category that it matches. For example, the input HE IS STRONG together with the category

<category>
<pattern>HE IS *</pattern>
<template>I did not know he is.</template>
</category>

form a Target. Let's call the input the “Target input” and the category the “Target category”. The Target input “He is strong” together with the Target category above give the botmaster an opportunity to create a new, more specific category:

<category>
<pattern>HE IS STRONG</pattern>
<template>Does he work out?</template>
</category>

The Pandorabots Targeting algorithm scans the conversation logs, re-classifies the inputs with the AIML Graphmaster, and finds matches. When a match contains a wildcard, the algorithm saves the input and the matched category on a list of matches. As we might expect, there is a Zipf distribution over the Target categories. Usually there is one category, typically the ultimate default category with <pattern>*</pattern>, associated with more Target inputs than any other category. Then there is a second most activated category, and a third, and so on, down to a long tail of Target categories with only one Target input each.
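The scan described above can be sketched in Python. This is only an illustration under stated assumptions, not the actual Pandorabots implementation. The names `normalize`, `matches`, `best_pattern` and `find_targets` are hypothetical. The matcher supports only the * wildcard, ignores <that> and <topic>, and picks the matching pattern with the most literal words as a stand-in for the real Graphmaster priority rules.

```python
import re
from collections import defaultdict

def normalize(text):
    """Uppercase the input and drop punctuation (a rough stand-in for AIML normalization)."""
    return re.sub(r"[^A-Z0-9 ]", " ", text.upper()).split()

def matches(pattern, words):
    """Return True if the pattern words match the input words; * binds one or more words."""
    if not pattern:
        return not words
    head, rest = pattern[0], pattern[1:]
    if head == "*":
        return any(matches(rest, words[i:]) for i in range(1, len(words) + 1))
    return bool(words) and words[0] == head and matches(rest, words[1:])

def best_pattern(patterns, words):
    """Pick the matching pattern with the most literal words (assumed priority rule)."""
    candidates = [p for p in patterns if matches(p.split(), words)]
    if not candidates:
        return None
    return max(candidates, key=lambda p: sum(w != "*" for w in p.split()))

def find_targets(patterns, log_inputs):
    """Group logged inputs by the wildcard category they matched, most inputs first."""
    targets = defaultdict(list)
    for text in log_inputs:
        pattern = best_pattern(patterns, normalize(text))
        if pattern is not None and "*" in pattern.split():
            targets[pattern].append(text)
    return sorted(targets.items(), key=lambda item: len(item[1]), reverse=True)

# Made-up sample data for illustration only.
sample_patterns = ["*", "HE IS *", "HE IS STRONG", "HELLO"]
sample_log = ["Hello", "He is tall", "He is strong", "What time is it?",
              "Tell me a joke", "He is funny", "Why?"]
for pattern, inputs in find_targets(sample_patterns, sample_log):
    print(pattern, len(inputs), inputs)
```

In this toy log the default category * collects the most Target inputs, followed by HE IS *, which mirrors the ranking described above on a very small scale. Inputs that match an exact pattern, such as "He is strong", never appear in the list.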

Using the Targeting algorithm, the botmaster can have quite an enjoyable afternoon building his bot by “filling up linguistic targets” and accumulating category-count points. To use the Targeting algorithm, the botmaster first selects the conversation logs for analysis. Then, by choosing the “Find Targets” option, the botmaster generates a list of Target categories, each with a link to the associated Target inputs. The program provides a direct link from the Target inputs to the Pandorabots Training section, so that the botmaster can efficiently move from Targets to writing new categories.

Friday, March 27, 2009

<srai>: The basics: Table of Common <srai> forms

In writing AIML response templates, certain forms of <srai> responses occur over and over again. They are common enough to be worth identifying and naming. The following table summarizes the four basic types of common <srai> templates: the Synonym, Simple Wildcard, Multiple Wildcard, and Divide and Conquer forms.


Synonym form:

<srai>PHRASE</srai>

Simple Wildcard forms:

<srai><star/></srai>

<srai>PHRASE <star/></srai>

<srai><star index="2"/></srai>

Multiple Wildcard forms:

<srai><star/> <star index="2"/></srai>

<srai><star/> PHRASE <star index="2"/></srai>

Divide and Conquer forms:

<srai><star/></srai>. <srai><star index="2"/></srai>

<srai>PHRASE1 <star/></srai>. <srai>PHRASE2 <star index="2"/></srai>

Thursday, March 26, 2009

<srai>: The basics II: Simple wildcard reductions

AIML uses a wildcard * (the star character) to stand for one or more words. An AIML pattern such as <pattern>I AM *</pattern>, taken by itself, matches a wide range of inputs such as "I am tired", "I am reading a book", "I am waiting for a reply" and so on. If the input is "I am tired", the wildcard is said to be bound to the word "tired" (1 word). If the input is "I am reading a book", the wildcard is bound to "reading a book" (3 words), and similarly, if the input is "I am waiting for a reply", the wildcard is bound to "waiting for a reply" (4 words).
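The bindings described above can be reproduced with a small Python sketch. It is a simplified matcher, not a real AIML interpreter: the names `bind` and `star_bindings` are hypothetical, only the * wildcard is supported, punctuation is simply dropped, and matching is case-insensitive while the bound words keep their original case.

```python
import re

def bind(pattern, words):
    """Return the list of wildcard bindings, or None if the pattern does not match."""
    if not pattern:
        return [] if not words else None
    head, rest = pattern[0], pattern[1:]
    if head == "*":
        # The wildcard must bind at least one word; try the shortest binding first.
        for i in range(1, len(words) + 1):
            tail = bind(rest, words[i:])
            if tail is not None:
                return [" ".join(words[:i])] + tail
        return None
    if words and words[0].upper() == head:
        return bind(rest, words[1:])
    return None

def star_bindings(pattern, text):
    """Strip punctuation from the input, then bind it against an uppercase pattern."""
    words = re.sub(r"[^\w\s]", "", text).split()
    return bind(pattern.split(), words)

for sentence in ["I am tired", "I am reading a book", "I am waiting for a reply"]:
    bound = star_bindings("I AM *", sentence)
    print(sentence, "->", bound, len(bound[0].split()), "word(s)")
```

An input such as "I am", which leaves nothing for the wildcard to bind, returns `None` in this sketch because * stands for one or more words.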

Inside the template, AIML uses the <star/> tag to access wildcard bindings.

Trivially, the category

<category>
<pattern>*</pattern>
<template><star/></template>
</category>

will just echo the client's input:

Human: Hello!
Robot: Hello!
Human: Who are you?
Robot: Who are you?

The simplest form of AIML reduction using <srai> together with <star/> involves reducing or simplifying the input by one or a few words:

<category>
<pattern>I AM ESPECIALLY *</pattern>
<template><srai>I AM <star/></srai></template>
</category>

If someone says "I am especially tired" or "I am especially interested in this book", it is really no different logically from saying "I am tired" or "I am interested in this book". A philosopher might say that the word "especially" plays no logical role in the sentence. More practically, the bot may have a reply for "I am tired" and "I am interested in something", so reducing the input by removing the word "especially" will link these inputs to appropriate responses.

The categories:

<category>
<pattern>I AM TIRED</pattern>
<template>Maybe you should take a nap?</template>
</category>

<category>
<pattern>I AM ESPECIALLY *</pattern>
<template><srai>I AM <star/></srai></template>
</category>

produce the dialog:

Human: I am especially tired.
Robot: Maybe you should take a nap?

As a bonus, these types of reduction categories can reduce a sentence with "I am" followed by any number of occurrences of "especially":

Human: I am especially especially especially tired.
Robot: Maybe you should take a nap?
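The repeated reduction shown in this dialog can be traced with a toy Python interpreter. It is a minimal sketch under several assumptions, not how any real AIML engine is implemented: the names `CATEGORIES`, `bind` and `respond` are hypothetical, only * and the first <star/> are supported, the most literal matching pattern wins, and the fallback reply and the recursion limit are invented for the example.

```python
import re

# The two categories from the text, as pattern -> template.
CATEGORIES = {
    "I AM TIRED": "Maybe you should take a nap?",
    "I AM ESPECIALLY *": "<srai>I AM <star/></srai>",
}

def bind(pattern, words):
    """Return the list of wildcard bindings, or None if the pattern does not match."""
    if not pattern:
        return [] if not words else None
    head, rest = pattern[0], pattern[1:]
    if head == "*":
        for i in range(1, len(words) + 1):  # * binds one or more words
            tail = bind(rest, words[i:])
            if tail is not None:
                return [" ".join(words[:i])] + tail
        return None
    if words and words[0] == head:
        return bind(rest, words[1:])
    return None

def respond(text, categories=CATEGORIES, depth=0):
    """Match the input, fill in <star/>, and feed each <srai> body back into the matcher."""
    if depth > 20:  # assumed guard against runaway recursion
        return ""
    words = re.sub(r"[^\w\s]", "", text).upper().split()
    best, stars = None, None
    for pattern in categories:
        bound = bind(pattern.split(), words)
        if bound is None:
            continue
        literals = sum(w != "*" for w in pattern.split())
        if best is None or literals > sum(w != "*" for w in best.split()):
            best, stars = pattern, bound
    if best is None:
        return "I have no answer for that."  # assumed fallback reply
    template = categories[best]
    if stars:
        template = template.replace("<star/>", stars[0])
    return re.sub(r"<srai>(.*?)</srai>",
                  lambda m: respond(m.group(1), categories, depth + 1),
                  template)

print(respond("I am especially especially especially tired."))
```

Each pass through I AM ESPECIALLY * strips one "especially" and resubmits the shorter input, so the sketch recurses until the input has been reduced to I AM TIRED.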

A related and also very common form of reduction removes sequences of words (clauses) that can be dropped from the input without significantly changing its meaning. Such clauses are decoration, added by the human personality, perhaps as social conventions, but again, the philosopher might say they have no logical purpose. "I will state that I warned you about his condition" is really the same as "I warned you about his condition", at least as far as the robot is concerned. The robot may already have a response to "I warned you about something", so by reducing the input, the bot stands a better chance of making an intelligent-sounding reply.

In these cases we use a <pattern> with a wildcard, but end up throwing away all the words in the pattern except for the wildcard:

<category>
<pattern>I WILL STATE THAT *</pattern>
<template><srai><star/></srai></template>
</category>

Another example is a category for inputs that start with "At any rate...":

<category>
<pattern>AT ANY RATE *</pattern>
<template><srai><star/></srai></template>
</category>

A last example handles sentences that begin "I assure you that...":

<category>
<pattern>I ASSURE YOU THAT *</pattern>
<template><srai><star/></srai></template>
</category>