Speaker: Dave Lewis, Brainspace, A Cyxtera Business
Abstract: In December 2006, a change to the US Federal Rules of Civil Procedure made “electronically stored information” – effectively every bit of storage in an enterprise – fair game for discovery requests in civil litigation. The result was a multi-billion-dollar electronic discovery industry, a remarkable embrace by lawyers and judges of the artifacts of experimental machine learning (learning curves, effectiveness estimates, active learning, ...), and a torrent of technical challenges for machine learning, natural language processing, information retrieval, and statistics. I will discuss the state of e-discovery science and technology, and its spread to new applications such as internal investigation and breach response.
Biography: David D. Lewis, Ph.D. is Chief Data Scientist at Brainspace, a Cyxtera business, where he leads their research efforts as well as the machine learning software development team. Prior to joining Brainspace, he was variously a freelance consultant, corporate researcher (Bell Labs, AT&T Labs), research professor, and software company co-founder. Dave has published more than 40 peer-reviewed scientific publications and holds 9 patents. He was elected a Fellow of the American Association for the Advancement of Science in 2006 for foundational work in text categorization, and won a Test of Time Award from ACM SIGIR in 2017 for his paper with Gale introducing uncertainty sampling.
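The uncertainty sampling idea from the Lewis & Gale paper mentioned above can be sketched in a few lines: rather than labeling documents at random, an active learner queries the unlabeled example its current classifier is least certain about, i.e. whose predicted positive-class probability is closest to 0.5. The document names and scores below are hypothetical stand-ins for a trained classifier's output, not from the talk itself.

```python
# Minimal sketch of uncertainty sampling for binary classification:
# query the pool item whose predicted probability of the positive
# class is nearest to 0.5 (maximum uncertainty).

def uncertainty_sample(pool, predict_proba):
    """Return the item in `pool` with positive-class probability closest to 0.5."""
    return min(pool, key=lambda x: abs(predict_proba(x) - 0.5))

# Hypothetical classifier scores over an unlabeled document pool.
scores = {"doc_a": 0.95, "doc_b": 0.52, "doc_c": 0.10}
queried = uncertainty_sample(scores, scores.get)
# doc_b (p = 0.52) is the least certain, so it is queried for labeling next.
```

In a real active-learning loop, the queried document would be labeled by a reviewer, added to the training set, and the classifier retrained before the next query.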
Generalizing Representations of Language for Document Analysis across Different Domains
Speaker: Ndapa Nakashole, University of California, San Diego
Abstract: Labeled data for tasks such as information extraction, question answering, text classification, and other types of document analysis are often drawn from a limited set of document types and genres because of availability and cost. At test time, we would like to apply the trained models to different document types and genres. However, a model trained on one dataset often fails to generalize to data drawn from distributions other than that of the training data. In this talk, I will discuss our work on generalizing representations of language and describe some of the document types we are studying.
Biography: Ndapa Nakashole is an Assistant Professor at the University of California, San Diego, where she teaches and carries out research on Statistical Natural Language Processing. Before that she was a postdoctoral scholar at Carnegie Mellon University. She obtained her PhD from Saarland University and the Max Planck Institute for Informatics. She completed undergraduate studies in Computer Science at the University of Cape Town, South Africa.
Speaker: Rajasekar Krishnamurthy, IBM Research
Abstract: Enterprise applications and business processes rely heavily on experts and knowledge workers reading, searching and analyzing business documents to perform their daily tasks. For instance, legal professionals read contracts to identify non-standard clauses, risks and exposures. Loan officers analyze borrower business documents to understand income, expenses and contractual commitments before making lending decisions.
Document Intelligence is the ability of a system to read, understand and interpret business documents through the application of AI-based technologies. It has the potential to significantly improve an employee's productivity and an organization's effectiveness by augmenting experts in their daily tasks. Several challenges arise in this context, such as variability in document authoring, the need to contextually understand textual and tabular content, and organization- and role-specific variations in semantic interpretation. Furthermore, as experts come to rely on document intelligence, they expect the system to exhibit key properties such as explainability, consistent model evolution, and the ability to enhance the system's knowledge from a few examples.
In this talk, using real-world enterprise application examples, I first describe how document intelligence can play a key role in augmenting enterprise AI applications. I then outline key challenges that arise in business document understanding and the desiderata that enterprise AI applications and users expect. I conclude with a set of open research challenges spanning language understanding, knowledge representation and reasoning, deep learning, and systems research.
Biography: Rajasekar Krishnamurthy is a Principal Research Staff Member and Senior Manager leading the Watson Discovery team in the Watson AI organization. Prior to this role, he was a Principal Research Staff Member at IBM Research - Almaden leading the NLP, Entity Resolution and Discovery department. Rajasekar's technical interests focus on helping enterprises derive business insights from a variety of unstructured content sources, ranging from public and third-party data sources to governing business documents within an enterprise. Rajasekar has expertise in building scalable and usable analytics tools for individual stages in analyzing unstructured documents, such as text analytics, document structure analysis and entity resolution. He is a member of the IBM Academy of Technology. He received a B.Tech in Computer Science and Engineering from the Indian Institute of Technology-Madras, and a Ph.D. in Computer Science from the University of Wisconsin-Madison.
Speaker: Asli Celikyilmaz, Microsoft Research
Abstract: Automatic text generation enables computers to summarize text, describe pictures to the visually impaired, write stories or articles about an event, hold customer-service conversations, chit-chat with individuals, and customize content based on the characteristics and goals of the human interlocutor. Neural text generation (NLG) – using neural network models to generate coherent text – has seen a paradigm shift in recent years, driven by advances in deep contextual language modeling (e.g., LSTMs, GPT, GPT-2) and transfer learning (e.g., ELMo, BERT). While these tools have dramatically improved the state of NLG, particularly for low-resource tasks, state-of-the-art NLG models still face many challenges: a lack of diversity in generated text, commonsense violations in depicted situations, difficulties in making use of factual information, and difficulties in designing reliable evaluation metrics. In this talk I will discuss existing work on text-only transformers that generate long text with better discourse structure and narrative flow, generate multi-document summaries, and build knowledge graphs automatically with commonsense transformers as text generators. I will conclude the talk with a discussion of current challenges and shortcomings of neural text generation, pointing to avenues for future research.
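The core mechanic behind the neural generators discussed above is autoregressive decoding: the model repeatedly predicts a distribution over next tokens, and a decoding strategy (here, greedy) picks one. The toy bigram table below is a hypothetical stand-in for a neural language model's conditional distribution; it is only meant to make the generation loop concrete.

```python
# Toy sketch of autoregressive text generation with greedy decoding.
# The bigram table stands in for a neural model's next-token distribution.

bigram_probs = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3},
    "cat": {"sat": 0.7, "</s>": 0.3},
    "sat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
    "a":   {"dog": 1.0},
}

def greedy_decode(start="<s>", max_len=10):
    """Repeatedly pick the most probable next token until </s> or max_len."""
    tokens, current = [], start
    for _ in range(max_len):
        current = max(bigram_probs[current], key=bigram_probs[current].get)
        if current == "</s>":
            break
        tokens.append(current)
    return " ".join(tokens)

# Greedy path here: <s> -> the -> cat -> sat -> </s>, i.e. "the cat sat".
```

Greedy decoding always takes the single most probable continuation, which is one source of the diversity problems the abstract mentions; sampling-based strategies trade some fluency for variety.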
Biography: Asli Celikyilmaz is a Principal Researcher at Microsoft Research in Redmond, Washington. She is also an Affiliate Professor at the University of Washington. Her research interests are mainly in deep learning and natural language, specifically language generation with long-term coherence, language understanding, language grounding with vision, and building intelligent agents for human-computer interaction. She has received several “best of” awards, including at NAFIPS 2007, Semantic Computing 2009, and CVPR 2019.