April 19, 4:45 p.m.
The periodic table of data structures and the path toward self-designing data systems
Data structures are everywhere. They define the behavior of modern data systems and data-driven algorithms. For example, with data systems that use the right data structure design for the problem at hand, we can reduce the monthly bill of large-scale data system applications on the cloud by hundreds of thousands of dollars. We can accelerate data science tasks by dramatically speeding up the computation of statistics over large amounts of data. We can train drastically more neural networks within a given time budget, improving accuracy.
However, knowing the right data structure and system design for any given scenario is a notoriously hard problem: the space of possible designs is massive, and no single design is perfect across all data, queries, and hardware scenarios. We will discuss our quest for the first principles of data structures and data system design. We will show signs that it is possible to reason about this massive design space, and we will show early results from a prototype self-designing data system that can take drastically different shapes to optimize for the workload, hardware, and available cloud budget. These shapes include data structure and system designs that are discovered automatically and do not exist in the literature or industry.
Stratos Idreos is an associate professor of Computer Science at Harvard University, where he leads the Data Systems Laboratory. His research focuses on making it easy and even automatic to design workload- and hardware-conscious data structures and data systems, with applications to relational, NoSQL, and data science problems. For his PhD thesis on adaptive indexing, Stratos was awarded the 2011 ACM SIGMOD Jim Gray Doctoral Dissertation Award and the 2011 ERCIM Cor Baayen Award from the European Research Consortium for Informatics and Mathematics. In 2015 he was awarded the IEEE TCDE Rising Star Award from the IEEE Technical Committee on Data Engineering for his work on adaptive data systems, and in 2020 he received the ACM SIGMOD Contributions Award. Stratos is also a recipient of the National Science Foundation CAREER Award and the Department of Energy Early Career Award.
June 21, 4:30 p.m.
Explainable Opinion Summarization
Abstract: Subjective data refers to data that contains opinions and experiences. Such data is ubiquitous in product reviews, tweets, and discussion forums on social media. We present an abstractive opinion summarization framework that does not rely on gold-standard summaries for training. The opinion summarizer extracts opinion phrases from reviews and trains a Transformer model to reconstruct the original reviews from these extractions. Automatic evaluation on Yelp data shows that our summarizer outperforms competitive baselines. Human studies on two corpora verify that our opinion summarizer produces informative summaries and shows promising customization capabilities. We show how the idea of reconstructing summaries from extracted opinions also allows us to generate explanations for the generated summaries.
This is joint work with Yoshihiko Suhara, Xiaolan Wang, Stefanos Angelidis, Zhengjie Miao, and Yuliang Li.
Wang-Chiew is a research scientist at Facebook AI. Prior to joining Facebook AI, she led the research efforts at Megagon Labs with the goal of building advanced technologies to enhance search by experience. Her team there conducted research on data integration, information extraction, text mining and summarization, knowledge base construction and commonsense reasoning, and data visualization. Prior to that, she was a Professor of Computer Science at the University of California, Santa Cruz. She also spent two years at IBM Research - Almaden. Her research interests include data integration and exchange, data provenance, and natural language processing. She is a co-recipient of the 2014 ACM PODS Alberto O. Mendelzon Test-of-Time Award, the 2018 ICDT Test-of-Time Award, and the 2020 Alonzo Church Award. She received the 2019 VLDB Women in Database Research Award. She was on the VLDB Board of Trustees (2014-2019) and she is a Fellow of the ACM.