Tutorial Overview

Natural Language Generation (NLG) has advanced significantly in recent years, and NLG systems are now used for both data-to-text tasks (e.g., generating financial reports from tables, generating weather reports) and text-to-text tasks (e.g., summarizing news reports, text-style transfer).

Structured data, knowledge bases (KBs), and knowledge graphs (KGs) are key machine representations used across a wide variety of domains to capture domain-specific knowledge. For example, 1) the financial performance of companies and industries in the financial domain, 2) the chemical composition of drugs, patient records, etc. in the healthcare domain, and 3) inventory records of products and their features in the retail domain are all captured with domain-specific KGs/KBs. For AI-driven interactive applications, it is often important to communicate the content represented in such knowledge bases in natural language (such as English).

Consider a question-answering setting in the financial domain, where a question such as:

How did XYZ corp. perform compared to its competitors in North America in last 2 quarters?

would query a DB/KG and retrieve a result-set table containing the relevant financial performance numbers: revenues, profit margins, competitors, technology segments, quarterly breakdown, etc. However, it is not sufficient for an AI system to simply display such a table of numbers. It should go one step further and explain, in plain natural language, the key message that addresses the user's question, for example:

In the N.A. region, XYZ Corp's revenues in the Cloud segment increased by 11% to $8.9B in the last 2 quarters as compared to its key competitor ABC. However, in the Analytics segment its revenues declined by 3% while ABC's revenues grew by 4% and that of other smaller players in Analytics increased much more (around 8%).
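
To make the table-to-text step concrete, here is a minimal sketch of a heuristic, template-based generator, the simplest of the paradigms this tutorial covers. It is an illustration under stated assumptions, not the method of any particular system: the `SegmentRow` schema, the helper names, and the sentence template are all hypothetical, and the two rows simply reuse the percentages from the example above.

```python
from dataclasses import dataclass


@dataclass
class SegmentRow:
    """One row of a hypothetical result-set table (schema is an assumption)."""
    company: str
    segment: str
    revenue_change_pct: float  # signed percent change over the period


def describe_change(pct: float) -> str:
    """Map a signed percentage to a verb phrase."""
    if pct > 0:
        return f"increased by {pct:g}%"
    if pct < 0:
        return f"declined by {abs(pct):g}%"
    return "remained flat"


def realize(row: SegmentRow, region: str) -> str:
    """Fill a fixed sentence template from a single table row."""
    return (
        f"In the {region} region, {row.company}'s revenues in the "
        f"{row.segment} segment {describe_change(row.revenue_change_pct)}."
    )


# Hypothetical rows mirroring the figures quoted in the example above.
rows = [
    SegmentRow("XYZ Corp", "Cloud", 11.0),
    SegmentRow("XYZ Corp", "Analytics", -3.0),
]
sentences = [realize(row, "N.A.") for row in rows]
print(" ".join(sentences))
```

A template like this produces one rigid sentence per row. It does not decide which facts matter, compare against competitors, or merge related facts into a fluent paragraph, and those gaps are what the data-driven and neural approaches discussed in this tutorial aim to close.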

Another important use case is storytelling from data, such as report generation -- for example, in the weather domain (localized weather reports), finance (company performance reports), or healthcare (patient reports).

Motivated by the above, this “first-of-its-kind” tutorial aims to provide the conceptual underpinnings of natural language generation (NLG) from a variety of structured representations. We will discuss NLG paradigms ranging from heuristics to modern data-driven techniques, including end-to-end neural architectures. We will also give a brief overview of evaluation methods and output quality estimation techniques.