DOLAP 2022: 24th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data




Co-located with EDBT 2022, Edinburgh, UK

March 29, 2022



Keynotes

H. V. Jagadish

Data Equity: A Core Requirement for Responsible Analytical Processing of Big Data

Until recently, we regularly heard statements like “Let the data speak for themselves”. Today, we instead hear worries about the fairness of data-driven systems and AI. However, much of the recent work to address these concerns focuses narrowly on model building, and often uses a single, specific mathematical formulation of fairness. Responsible data analysis requires that we pay attention to the whole process. We need to address inequitable representation in the data record, inequities due to the data scientist’s world view being reflected in the model, inequities in the resulting outcomes, and inequities in access to the fruits of the analysis. In this talk, I will lay out a research agenda in this direction, and invite you to join me.

Short Bio: H. V. Jagadish is Bernard A Galler Collegiate Professor of Electrical Engineering and Computer Science at the University of Michigan in Ann Arbor, and Director of the Michigan Institute for Data Science. Prior to 1999, he was Head of the Database Research Department at AT&T Labs, Florham Park, NJ. Professor Jagadish is well known for his broad-ranging research on information management, and has approximately 200 major papers and 37 patents, with an H-index of 94. He is a fellow of the ACM, "The First Society in Computing" (since 2003) and of AAAS (since 2018). He served on the board of the Computing Research Association (2009-2018). He has been an Associate Editor for the ACM Transactions on Database Systems (1992-1995), Program Chair of the ACM SIGMOD annual conference (1996), Program Chair of the ISMB conference (2005), a trustee of the VLDB (Very Large DataBase) foundation (2004-2009), Founding Editor-in-Chief of the Proceedings of the VLDB Endowment (2008-2014), and Program Chair of the VLDB Conference (2014). Since 2016, he has been Editor of the Morgan & Claypool Synthesis Lecture Series on Data Management. Among his many awards, he won the David E Liddle Research Excellence Award (at the University of Michigan) in 2008, the ACM SIGMOD Contributions Award in 2013, and the Distinguished Faculty Achievement Award (at the University of Michigan) in 2019. His popular MOOC on Data Science Ethics is available on edX, Coursera, and FutureLearn.

Juliana Freire

Dataset Search for Data Discovery, Augmentation and Explanation

Recent years have seen an explosion in our ability to collect and catalog immense amounts of data about our environment, society, and populace. Moreover, with the push towards transparency and open data, scientists, governments, and organizations are increasingly making structured data available on the Web and in data lakes. Combined with advances in analytics and machine learning, the availability of such data should in theory allow us to make progress on many of our most important scientific and societal questions. However, this opportunity is often missed due to a central technical barrier: it is currently nearly impossible for domain experts to weed through the vast amount of available information to discover datasets that are needed for their specific application. While search engines have addressed the discovery problem for Web documents, there are many new challenges involved in supporting the discovery of structured data, from crawling the Web in search of datasets, to the need for dataset-oriented queries and new strategies to rank and display results. I will discuss these challenges and present our recent work in this area, including the Auctus dataset search engine and scalable techniques to support dataset discovery queries.

Short Bio: Juliana Freire is a Professor of Computer Science and Data Science at New York University. She was the elected chair of the ACM Special Interest Group on Management of Data (SIGMOD), served as a council member of the Computing Research Association’s Computing Community Consortium (CCC), and was the NYU lead investigator for the Moore-Sloan Data Science Environment, a $32.8 million grant awarded jointly to UW, NYU, and UC Berkeley. She develops methods and systems that enable a wide range of users to obtain trustworthy insights from data. This spans topics in large-scale data analysis and integration, visualization, machine learning, provenance management, and web information discovery, as well as different application areas, including urban analytics, predictive modeling, and computational reproducibility. Freire has co-authored over 200 technical papers (including 11 award-winning publications) and several open-source systems, and is an inventor of 12 U.S. patents. According to Google Scholar, her h-index is 61 and her work has received over 16,000 citations. She is an ACM Fellow, a AAAS Fellow, and a recipient of an NSF CAREER award, two IBM Faculty awards, and a Google Faculty Research award. She was awarded the ACM SIGMOD Contributions Award in 2020. Her research has been funded by the National Science Foundation, DARPA, Department of Energy, National Institutes of Health, Sloan Foundation, Gordon and Betty Moore Foundation, W. M. Keck Foundation, Google, Amazon, AT&T Research, Microsoft Research, Yahoo! and IBM. She received a B.S. degree in computer science from the Federal University of Ceara (Brazil), and M.Sc. and Ph.D. degrees in computer science from the State University of New York at Stony Brook.