November 13–15, 2014
Westin Arlington Gateway, Arlington, Virginia (adjacent to Washington, DC)

Motivation and Scope:

Today’s enterprises need to make decisions based on analyzing massive and heterogeneous data sources. More and more aspects of decision making are driven by data, and as a result, more and more business users need access to data. Offering easy access to the right data to diverse business users is therefore of growing importance. Several challenges must be overcome to meet this goal. The first is sheer volume: enterprise data are predicted to grow by 800 percent in the next five years. The second is lack of structure: the largest share (80 percent) is stored in unstructured documents, most of which lack informative metadata or semantic tags (beyond date, size, and author) that might help in accessing them. A third challenge comes from the need to offer access to these data to different types of users, most of whom are not familiar with the underlying syntax or semantics of the data.

Natural language interfaces and question answering systems, such as Watson, SmartWeb, Siri, START, or Evi, have been successfully implemented in various domains: for example, in encyclopedic knowledge bases (e.g., IBM's Jeopardy Challenge), in the field of energy (e.g., DGRC), and in mathematics (e.g., Wolfram Alpha). Following up on prior work in natural language interfaces to databases (NLIDB) and question answering (QA) systems, this symposium brings together experts from academia and industry to present their most recent work on problems that leverage natural language in the context of big data. Participants will share their latest investigations and exchange ideas in order to push the research frontier towards new technologies for natural language access to large-scale and heterogeneous data.

Call for Papers:

We welcome the submission of research papers on all aspects of natural language access to, and question answering over, large-scale structured and unstructured data. The following topics are of particular interest:

 Natural language interaction technologies (e.g., for knowledge navigation or personal assistants)
 Speech interfaces and interactive question answering
 Automatic question answering based on structured data sources
 Natural language access to the Semantic Web 
 Question answering and natural language interfaces to Linked Data
 Formalization of structured information and queries (RDF, OWL, SPARQL)
 Machine learning techniques (e.g., large-scale hierarchical classification) for translating the users' information needs into formal queries 
 Information extraction at web scale that supports natural language access 
 Web mining and social network analysis
 Social media analysis and opinion mining
 Text summarization (e.g., question-focused summarization)
 Natural language processing for document analysis including information extraction, semantic role labeling and co-reference resolution
 Architectures for natural language access to big data
 UIMA modules
 Applications and projects

Invited Talks:

Jim Spohrer, Director, Global University Programs, IBM
Frank Stein, Director, Analytics Solution Center, IBM

Title: An Industry Perspective on Watson and Cognitive Assistants for Leveraging Big Data

Recent advances in cognitive computing are leading to commercially viable cognitive assistance systems. In this age of zettabytes and yottabytes, users will increasingly rely on these cognitive assistants to help them use this data in their professions. This talk will discuss some of the current R&D work at IBM and elsewhere on providing access to big data via cognitive assistants, some thoughts on further research needed, and our recent effort to build a coalition of academia, industry, and government to advance the state of the art.

Jason Eisner, Johns Hopkins University

Title: Weighted Deduction for Analyzing Natural Language and Other Data

Dyna is a very concise declarative notation for specifying computations over heterogeneous data.  Dyna can naturally express NLP and AI algorithms, as well as other computations that analyze and aggregate semi-structured data of any sort.  Thus, it can serve as a bridge between NLP and big data.  

A Dyna program specifies a dynamic deductive database.  Such a "dynabase" is a finite or infinite network of data items that depend on one another and perhaps on other dynabases.  Querying and updating these data items has a well-defined semantics.  An implementation of the language has the freedom (and the responsibility) to select strategies for managing queries, updates, materialization, and indexing.  

This talk will illustrate how Dyna makes it easy to specify the combinatorial structure of typical computations needed in NLP, ML, and AI, and how it allows easy integration of multiple data sources and analytics.  We will also briefly discuss implementational issues and key research directions.




Thursday, November 13

09:15 – 09:45

Welcome and goals of the symposium: Natural Language Access to Big Data

09:45 – 10:30

James Mayfield, Paul McNamee, Craig Harman, Tim Finin and Dawn Lawrie: Kelvin: extracting knowledge from large text collections

10:30 – 11:00

Coffee break

11:00 – 12:30

Invited talk
Frank Stein, Jim Spohrer:
An Industry Perspective on Watson and Cognitive Assistants for Leveraging Big Data

12:30 – 02:00

Lunch break

02:00 – 02:45

Nitish Aggarwal and Paul Buitelaar: Wikipedia-based Distributional Semantics for Entity Relatedness

02:45 – 03:30

Charles Jochim, Bogdan Sacaleanu and Lea Deleris:
Risk Event and Probability Extraction for Modeling Medical Risks

03:30 – 04:00

Coffee break

04:00 – 04:45

Peter Yeh and Adwait Ratnaparkhi:
Mining Large-Scale Knowledge Graphs to Discover Inference Paths for Query Expansion in NLIDB

04:45 – 05:30

Malte Sander, Ulli Waltinger, Mikhail Roshchin and Thomas Runkler:
Ontology-based translation of natural language queries to SPARQL


Friday, November 14

09:00 – 10:30

Invited talk: 
Jason Eisner:
Weighted Deduction for Analyzing Natural Language and Other Data

10:30 – 11:00

Coffee break

11:00 – 12:30

Joint Session with HUMAC
Invited Talk: Ronald Arkin:
Lethal Autonomous Systems and the Plight of the Noncombatant

12:30 – 02:00

Lunch break

02:00 – 02:45

Sunandan Chakraborty and Lakshminarayanan Subramanian: Extraction of (Key, Value) Pairs from Unstructured Ads

02:45 – 03:30

Jennifer Sleeman and Tim Finin: Contextualizing Entity Type Mappings

03:30 – 04:00

Coffee break

04:00 – 04:45

Bonnie Dorr, Milenko Petrovic, James Allen, Choh Man Teng and Adam Dalton: Discovering and Characterizing Emerging Events in Big Data (DISCERN)

04:45 – 05:30

Gyula Vörös, Péter Rabi, Balázs Pintér, András Sárkány, Daniel Sonntag and Andras Lorincz:
Recommending Missing Symbols of Augmentative and Alternative Communication by Means of Explicit Semantic Analysis


Saturday, November 15

09:00 – 12:00

Working group meeting to draft roadmap for Natural Language Access to Big Data / Cognitive Computing



Format of the Symposium

The symposium will include invited talks, presentations of accepted papers, a demo/poster session showcasing new advances in natural language access and interfaces, and, at the end, an open panel discussion of key issues.

Accepted papers can be accompanied by a poster or demo in order to encourage a more interactive presentation of the content.

Submission Details

Regular papers: the length of regular papers (including figures and bibliography) should not exceed 8 pages.
Poster/summary papers: the length of poster/summary papers (including all figures and bibliography) should not exceed 4 pages.

All papers must be submitted in PDF format; camera-ready versions must be formatted according to the AAAI guidelines.

Important Dates

 Papers due: July 6, 2014 (extended)

 Author notifications: July 31, 2014

 Accepted camera-ready copy due to AAAI: September 10, 2014

 Symposium: November 13–15, 2014