Scripting for Conceptual Representation

Post date: Oct 22, 2014 1:21:48 AM

Knowledge representation

Knowledge representation is crucial. One of the clearest results of artificial intelligence research so far is that solving even apparently simple problems requires lots of knowledge. Really understanding a single sentence requires extensive knowledge both of language and of the context. For example, today's (4th Nov) headline "It's President Clinton" can only be interpreted reasonably if you know it's the day after the American elections. (Yes, these notes are a bit out of date.) Really understanding a visual scene similarly requires knowledge of the kinds of objects in the scene. Solving problems in a particular domain generally requires knowledge of the objects in the domain and knowledge of how to reason in that domain. Both these types of knowledge must be represented.

Knowledge must be represented efficiently, and in a meaningful way. Efficiency is important, as it would be impossible (or at least impractical) to explicitly represent every fact that you might ever need. There are just so many potentially useful facts, most of which you would never even think of. You have to be able to infer new facts from your existing knowledge, as and when needed, and capture general abstractions which represent general features of sets of objects in the world.

Knowledge must be meaningfully represented so that we know how it relates back to the real world. A knowledge representation scheme provides a mapping from features of the world to a formal language. (The formal language will only capture certain aspects of the world, which we believe are important to our problem. We may of course miss out crucial aspects and so fail to really solve our problem, like ignoring friction in a mechanics problem.) When we manipulate that formal language using a computer, we want to make sure that we still have meaningful expressions, which can be mapped back to the real world. This is what we mean when we talk about the semantics of representation languages.

Scripts are used by humans, in a sense.

Imagine you hear this story: "Bob went to the shops. Ten minutes later, he walked out with his shopping and went home."

You make a few assumptions - that Bob bought the shopping, that Bob was short of a few items etc.

You know this because you unconsciously follow a script in your head. You know the basic outline of shopping (from experience), so you can fill in the details and make assumptions about the rest.

Let's look at another story: "Bob went to the gardeners. He asked the waiter for a BMW and left." Now, this story makes no sense whatsoever to the normal person! This is because it does not follow the "gardeners-script". Gardeners don't have waiters, nor do they sell BMWs!
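As a rough illustration of the two stories above, a script can be sketched as an ordered list of default events plus the roles and goods it expects. Everything here (the `Script` class, the event names, the role names) is invented for this example and is not taken from any real script-based system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Script:
    """A stereotyped sequence of events plus the roles and goods it expects."""
    name: str
    events: tuple      # default events, in order
    roles: frozenset   # people we expect to meet
    goods: frozenset   # things that can be obtained here


# Hypothetical scripts: the contents are assumptions made for illustration.
SHOPPING = Script(
    name="shopping",
    events=("enter shop", "pick items", "pay", "leave with shopping"),
    roles=frozenset({"customer", "cashier"}),
    goods=frozenset({"groceries"}),
)

GARDENERS = Script(
    name="gardeners",
    events=("enter", "choose plants", "pay", "leave"),
    roles=frozenset({"customer", "gardener"}),
    goods=frozenset({"plants", "seeds", "tools"}),
)


def fill_in(script, mentioned):
    """Return the script events the story left unstated (assumed to have happened)."""
    return [event for event in script.events if event not in mentioned]


def violations(script, roles_seen, goods_seen):
    """Return the story elements that the script has no place for."""
    bad_roles = sorted(set(roles_seen) - script.roles)
    bad_goods = sorted(set(goods_seen) - script.goods)
    return bad_roles + bad_goods


# Bob's shopping trip: only the first and last events were stated.
print(fill_in(SHOPPING, {"enter shop", "leave with shopping"}))

# Bob at the gardeners: a waiter and a BMW do not fit the script.
print(violations(GARDENERS, ["customer", "waiter"], ["BMW"]))
```

The first call returns the events a listener silently assumes (picking items and paying); the second returns the elements that make the gardeners story sound wrong.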

A concept can become part of multiple trains of thought and have many connections. These connections are often made via "linking verbs":

is a, has the color, eats, is the size of, ...
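One way to picture concepts joined by linking verbs is a small network of (subject, link, object) facts, where facts stored on a general concept are inherited along "is a" links instead of being stored again for every instance. The sketch below is only an illustration; the `ConceptNet` class and the example facts are assumptions made for this page.

```python
class ConceptNet:
    """Stores (subject, link, object) facts and inherits them along "is a" links."""

    def __init__(self):
        self.facts = {}  # (subject, link) -> set of objects

    def add(self, subject, link, obj):
        self.facts.setdefault((subject, link), set()).add(obj)

    def ancestors(self, concept):
        """All concepts reachable by following "is a" links upward."""
        seen, stack = [], [concept]
        while stack:
            current = stack.pop()
            for parent in sorted(self.facts.get((current, "is a"), ())):
                if parent not in seen:
                    seen.append(parent)
                    stack.append(parent)
        return seen

    def query(self, subject, link):
        """Objects linked to the subject directly or through an ancestor."""
        found = set(self.facts.get((subject, link), ()))
        for parent in self.ancestors(subject):
            found |= self.facts.get((parent, link), set())
        return found


# Hypothetical facts, using the linking verbs listed above.
net = ConceptNet()
net.add("Tweety", "is a", "canary")
net.add("canary", "is a", "bird")
net.add("canary", "has the color", "yellow")
net.add("bird", "eats", "seeds")

# "Tweety eats seeds" was never stored; it is inferred when needed.
print(net.query("Tweety", "eats"))
print(net.query("Tweety", "has the color"))
```

Storing "eats seeds" once on "bird" is the kind of general abstraction that saves representing the same fact for every individual bird.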

Conceptual representation (CR) programs are incredibly difficult to write, but that doesn't mean such programs don't exist.

All have been demonstration, proof-of-concept programs. Such models can be created using the object-oriented programming paradigm.

Perhaps one of the most famous AI programs, SAM (Script Applier Mechanism), was developed in 1975 by Richard Cullingford, Wendy Lehnert, Anatole Gershman and Jaime Carbonell. It was designed to read stories that followed basic scripts, output summaries in several languages, and answer questions about the text.

SAM had four basic modules: a parser and generator based on a previous program; the main module, the Script Applier (by Cullingford); the question-answering module (Lehnert); and the Russian and Spanish generators (Gershman and Carbonell, respectively).

Here is some sample output from SAM:

Input:
Friday evening a car swerved off Route 69. The vehicle struck a tree. The passenger, a New Jersey man, was killed. David Hall, 27, was pronounced dead at the scene by Dr. Dana Blanchard, medical examiner. Frank Miller, 32, of 592 Foxon Rd., the driver, was taken to Milford Hospital by Flanagan Ambulance. He was treated and released. No charges were made. Patrolman Robert Onofrio investigated the accident.

English Summary:
AN AUTOMOBILE HIT A TREE NEAR HIGHWAY 69 FOUR DAYS AGO. DAVID HALL, AGE 27, RESIDENCE IN NEW JERSEY, THE PASSENGER, DIED. FRANK MILLER, AGE 32, RESIDENCE AT 593 FOXON ROAD IN NEW HAVEN, CONNECTICUT, THE DRIVER, WAS SLIGHTLY INJURED. THE POLICE DEPARTMENT DID NOT FILE CHARGES.

Spanish Summary:
UN AUTO CHOCO CONTRA UN ARBOL CERCA DE LA CARRETERA 69 EL VIERNES AL ANOCHECER. DAVID HALL, DE 27 ANOS, QUE VIVIA EN NEW JERSEY, EL PASAGERO, SE MURIO. FRANK MILLER, DE 32 ANOS, QUE VIVIA EN 593 CALLE FOXON EN NEW HAVEN, CONNECTICUT, EL CHOFER, QUEDO UN POCO HERIDO. EL DEPARTAMENTO DE POLICA NO ACUSO A NADIE.

Question-Answering Output:
Q1: Was anyone killed?
A1: YES, DAVID HALL DIED
Q2: Was anyone hurt?
A2: YES, FRANK MILLER WAS SLIGHTLY INJURED.
Q3: Why was Miller hurt?
A3: BECAUSE THE AUTOMOBILE HIT THE TREE.
Q4: Did Miller go to the hospital?
A4: YES, HE WAS IN THE MILFORD HOSPITAL.
Q5: How did Miller get to the hospital?
A5: THE FLANAGAN AMBULANCE COMPANY TOOK HIM TO THE MILFORD HOSPITAL.

SAM had a few shortcomings, though. If a story digressed from a script, SAM would have a hard time. A program that handled stories with more complicated plots and characters would need more complicated structures.

Recall a famous story, perhaps the birth of Jesus, "Boy Cries Wolf", "Little Red Riding Hood" or any story that you know. Now, tell that story to a friend. After that, tell that story to another friend. Did you tell the story in the exact same way each time? It is highly likely you didn't. Why did you do this? The answer lies in the way you remember the story - you do not remember the story word for word, you store the ideas and the concepts of the story in your head. In Artificial Intelligence, this is called conceptual representation (CR).

What does this have to do with Artificial Intelligence? Well, imagine the potential of a program that could parse information and store it in a string of concepts.

  • Translation: Translation programs are notorious for their incredibly sketchy translations, because they often take the literal, or most common, meaning of a word when translating. For example, a program might take the English phrase, "Mum, please don't hassle me, I've gotta fly to school now, I woke up late." and translate the word "fly" as "to take a plane". If CR were used, a parser would first parse the sentence, then a conceptual representer would create the necessary data structures, and finally a translator would render those concepts in the target language, without the complications that arose before.

  • Paraphrasing: When you paraphrase something, you take the information you were given and then recreate your own shorter version. The Microsoft Word paraphraser uses a mathematical approach to paraphrasing. Here is how it does it:

  • How does Auto Summarize determine what the key points are? Auto Summarize analyzes the document and assigns a score to each sentence. (For example, it gives a higher score to sentences that contain words used frequently in the document.) You then choose a percentage of the highest-scoring sentences to display in the summary.

(Extract from Microsoft Word Online Help)

Whilst this is very effective when you would like to take a large document and only read the most important parts, such a mathematical method produces output that does not always make sense as a whole.
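The scoring idea in the extract can be sketched in a few lines. This is not Microsoft's actual algorithm, only a minimal guess at the general approach it describes: score each sentence by how frequent its words are in the document, then keep a chosen fraction of the best ones. The function names and the naive sentence splitter are assumptions.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase words only; punctuation and digits are ignored."""
    return re.findall(r"[a-z']+", sentence.lower())


def summarize(text, fraction=0.5):
    """Keep the top-scoring fraction of sentences, in their original order."""
    sentences = split_sentences(text)
    freq = Counter(word for s in sentences for word in tokenize(s))
    # A sentence's score is the summed document frequency of its words.
    scores = [sum(freq[word] for word in tokenize(s)) for s in sentences]
    keep = max(1, round(len(sentences) * fraction))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:keep])]


text = "The cat sat. The cat ate the fish. Dogs bark."
print(summarize(text, fraction=0.34))
```

Because sentences are picked independently by score, nothing guarantees that the selected sentences still read well together, which is the weakness noted above. Summing the scores also favours long sentences; a real system would presumably normalise for length.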

  • If a CR approach is used, the overall summary will not only highlight the most important parts, it will also make grammatical sense as a piece of text. Such a program would be great for radio stations and other networks that receive information from press networks. Information could be paraphrased by the very computers receiving it, making it a lot easier to turn incoming reports into presentable news.

  • Story creation: Apart from the field of computer arts, other applications of story creation could perhaps be in gaming, where the story is altered and reconstructed dynamically according to how the player changes the game world - imagine the long-term playability of that!
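To make the translation example above more concrete, here is a toy sketch of the concept step and the generation step, assuming a parser has already reduced the sentence to an actor, a verb and a destination. The `Movement` frame, the list of air destinations and the two-language lexicon are all hypothetical; the point is only that the output verb is chosen from the concept, not from the source word.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Movement:
    """A language-neutral concept: someone moves to a place in some manner."""
    actor: str
    destination: str
    manner: str  # "HURRY" or "TRAVEL_BY_AIR"


# Hypothetical world knowledge: places one plausibly reaches by plane.
AIR_DESTINATIONS = {"paris", "new york", "tokyo"}

# Hypothetical lexicon: the verb for each concept in each output language.
VERBS = {
    "HURRY": {"en": "hurry", "es": "apresurarse"},
    "TRAVEL_BY_AIR": {"en": "fly", "es": "volar"},
}


def conceptualize(actor, verb, destination):
    """Pick the sense of an ambiguous verb from context, not word frequency."""
    if verb != "fly":
        raise ValueError(f"no concept known for verb {verb!r}")
    if destination.lower() in AIR_DESTINATIONS:
        manner = "TRAVEL_BY_AIR"
    else:
        manner = "HURRY"
    return Movement(actor, destination, manner)


def render_verb(concept, language):
    """Choose the target-language verb from the concept, never the source word."""
    return VERBS[concept.manner][language]


to_school = conceptualize("I", "fly", "school")
print(to_school.manner, render_verb(to_school, "es"))

to_tokyo = conceptualize("I", "fly", "Tokyo")
print(to_tokyo.manner, render_verb(to_tokyo, "es"))
```

"Fly to school" becomes the HURRY concept and is rendered with a verb meaning "to hurry", while "fly to Tokyo" keeps the air-travel sense.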

MARGIE

(Meaning Analysis, Response Generation and Inference on English) -- a model of natural language understanding.

SAM

(Script Applier Mechanism) -- Scripts to understand stories. See next section.

PAM

(Plan Applier Mechanism) -- Plans to understand stories.