Data Systems meet Data Science Workshop
Montréal, June 7 2023
Collocated with CS-CAN/INFO-CAN
Overview
The second edition of the Data Systems meet Data Science (DSDS) workshop brings together the research community working at the intersection of data/software systems, software engineering, and Data Science/AI/ML, whether by building next-generation Data Science platforms or by using AI/ML techniques to improve systems. Technical presentations from guest speakers in industry and academia will be complemented by poster and demo sessions from students, as well as time and space for discussions.
Have a look at the first edition of the workshop here; it focused on the Montréal research community.
The workshop is co-organized by Bettina Kemme, Essam Mansour, and Oana Balmau.
Important Information
Where: McGill University, Room Trottier 1080
[UPDATE] The lunch and the poster session will be held in the Strathcona Dentistry Building.
When: June 7, 2023, 9am – 5pm
CS-CAN Conference registration link.
All attendees must be registered for CS-CAN for the workshop day.
Speakers will be automatically registered.
Poster Registration
Are you a graduate or an undergraduate student? Are you interested in Systems for Data Science and ML? Then consider submitting a poster to DSDS!
Poster registration/submission: https://dsds23.hotcrp.com/
Poster registration deadline: June 1st, midnight, AoE (extended from May 10, May 15, and May 25). Only the poster title and a 100-word abstract are needed to register.
Author notification: first notification on May 26; notification for posters submitted after May 25 on June 2nd.
Poster submission deadline: June 1st, midnight, AoE.
Important: All presenters of accepted poster submissions must be registered for the CS-CAN/INFO-CAN conference.
Register for the workshop here.
Keynote Speakers
Morning Keynote: Indexing Data Lakes
Renée Miller, Northeastern University
Abstract: In data science, data sets are often stored in large data lakes. In this talk, we consider how data lakes have been indexed to support fast data set search or discovery of tabular data. We discuss the state-of-the-art and important challenges that remain open.
Speaker Bio: Renée J. Miller is a University Distinguished Professor of Computer Science at Northeastern University. She is a Fellow of the Royal Society of Canada, Canada’s National Academy of Science, Engineering and the Humanities. She received the US Presidential Early Career Award for Scientists and Engineers (PECASE), the highest honor bestowed by the United States government on outstanding scientists and engineers beginning their careers. She received an NSF CAREER Award, the Ontario Premier’s Research Excellence Award, and an IBM Faculty Award. She formerly held the Bell Canada Chair of Information Systems at the University of Toronto and is a Fellow of the ACM. Her work has focused on the long-standing open problem of data integration and has achieved the goal of building practical data integration systems. She and her colleagues received the ICDT Test-of-Time Award and the 2020 Alonzo Church Award for Outstanding Contributions to Logic and Computation for their influential work establishing the foundations of data exchange. Professor Miller is an Editor-in-Chief of the VLDB Journal and a former president of the non-profit Very Large Data Base (VLDB) Foundation. She received her PhD in Computer Science from the University of Wisconsin, Madison and bachelor’s degrees in Mathematics and Cognitive Science from MIT.
Afternoon Keynote: A Systematic View of Data Science
M. Tamer Özsu, University of Waterloo
Abstract: There is a data-driven revolution underway in science and society, disrupting every form of enterprise. We are collecting and storing data more rapidly than ever before. There is an increasing recognition that data science can help turn this data, and the insights obtained from it, into products, systems, and policies. This has led academia to form data science research centres, institutes, and even academic units, and every major industrial organization to establish major initiatives. However, our understanding of data science is vague and highly varied and, in many cases, is squeezed to fit the available openings within an institution. There is a need to approach this field systematically to define its scope and its boundaries. The objective of this talk is to provide such a consistent and systematic study of the scope of data science.
Speaker Bio: M. Tamer Özsu is a University Professor at the Cheriton School of Computer Science at the University of Waterloo. Previously, he was the Director of the Cheriton School and Associate Dean (Research) of the Faculty of Mathematics. His research is on data engineering aspects of data science, focusing on distributed data management and the management of non-conventional data. He is a Fellow of the Royal Society of Canada, the American Association for the Advancement of Science, the Association for Computing Machinery, the Institute of Electrical and Electronics Engineers, the Asia-Pacific Artificial Intelligence Association, and the Balsillie School of International Affairs, an elected member of the Science Academy, Turkey, and a member of Sigma Xi. Dr. Özsu is the recipient of the IEEE Innovation in Societal Infrastructure Award (2022), the CS-Can/Info-Can Lifetime Achievement Award (2018), the ACM SIGMOD Test-of-Time Award (2015), the ACM SIGMOD Contributions Award (2006), and The Ohio State University College of Engineering Distinguished Alumnus Award (2008). He is the Founding Editor-in-Chief of ACM Books (2014-2020) and the Founding Series Editor of Synthesis Lectures on Data Management (2009-2014). He serves on the editorial boards of three journals and one book series.
Speakers
Arno Jacobsen, University of Toronto
Christophe Dubach, McGill University
Essam Mansour, Concordia University
Eyal de Lara, University of Toronto
Fei Chiang, McMaster University
Khaled Ammar, Borealis AI
Khuzaima Daudjee, University of Waterloo
Oana Balmau, McGill University
Panos Kalnis, KAUST
Sujaya Maiyya, University of Waterloo
Tianzheng Wang, Simon Fraser University
Verena Kantere, University of Ottawa
Panel Discussion: Graph Data Science, Today and Tomorrow
Panel moderator: Bettina Kemme, McGill University
Panelists:
Fei Chiang, McMaster University
Khaled Ammar, Borealis AI
Panos Kalnis, KAUST
Agenda
8:45 - 9:00 Doors open & Welcome
Talk session 1
9:00 - 9:15 Eyal de Lara | Systems Research for the Hierarchical Cloud
9:15 - 9:30 Panos Kalnis | Scaling Large Language Models to a Thousand GPUs
9:30 - 9:45 Arno Jacobsen | Our Data Science and Systems Research: A Brave Selection
9:45 - 10:00 Christophe Dubach | Automatic Synthesis of AI Accelerators for FPGAs
10:00 - 10:30 Coffee break
Morning Keynote
10:30 - 11:00 Renée Miller | Keynote: Indexing Data Lakes
11:00 - 11:15 Break
Talk session 2
11:15 - 11:30 Khaled Ammar | Managing Data in Research Organizations: Challenges and Opportunities
11:30 - 11:45 Essam Mansour | A GML-Enabled Knowledge Graph Platform: Challenges and Opportunities
11:45 - 12:00 Khuzaima Daudjee | Distributed DNN Training on Serverless Resources
12:00 - 13:00 Lunch in the Strathcona Dentistry Building
Afternoon Keynote
13:00 - 13:30 M. Tamer Özsu | Keynote: A Systematic View of Data Science
13:30 - 13:40 Break
Talk session 3
13:40 - 13:55 Fei Chiang | A Glimpse into Data Currency Estimation
13:55 - 14:10 Oana Balmau | Towards Practical Learned Indexes
14:10 - 14:25 Tianzheng Wang | Asynchronous Data Movement for Modern Transactional Data Systems
14:25 - 14:40 Verena Kantere | Workload-driven Query Planning and Optimization Using Machine Learning
14:40 - 14:55 Sujaya Maiyya | Ensuring Data Fault Tolerance in Oblivious Datastores
15:00 - 16:00 Poster session / Networking break - in the Strathcona Dentistry Building
16:00 - 16:55 Panel discussion
16:55 - 17:00 Closing remarks
Need help?
Please contact oana.balmau@mcgill.ca for any technical issues or questions about the event.