Data Systems meet Data Science Workshop 

Montréal, June 7 2023 

Collocated with CS-CAN/INFO-CAN 


Overview

The second edition of the Data Systems meet Data Science (DSDS) workshop brings together the research community that works at the intersection of data/software systems, software engineering, and Data Science/AI/ML, either by building the next-generation Data Science platforms or by using AI/ML techniques to improve systems. Technical presentations from guest speakers in industry and academia will be complemented by poster and demo sessions from students, as well as time and space for discussions.

Have a look at the first edition of the workshop here, which was focused on the Montreal research community.

The workshop is co-organized by Bettina Kemme, Essam Mansour, and Oana Balmau.

Important Information

Poster Registration

Are you a graduate or an undergraduate student? Are you interested in Systems for Data Science for ML?  Then, consider submitting a poster to DSDS!

Poster registration/submission: https://dsds23.hotcrp.com/ 

Poster registration deadline: June 1st, midnight, AoE (extended from May 10, May 15, and May 25). Only the poster title and a 100-word abstract are needed to register.

Author notification: May 26 for the first round; June 2nd for posters submitted after May 25.

Poster submission deadline: June 1st, midnight, AoE.

Important: All presenters of accepted poster submissions must be registered for the CS-CAN/INFO-CAN conference. 

Register for the workshop here.


Keynote Speakers

Morning Keynote: Indexing Data Lakes


Renée Miller, Northeastern University

Abstract: In data science, data sets are often stored in large data lakes.  In this talk, we consider how data lakes have been indexed to support fast data set search or discovery of tabular data.  We discuss the state-of-the-art and important challenges that remain open.

Speaker Bio: Renée J. Miller is a University Distinguished Professor of Computer Science at Northeastern University. She is a Fellow of the Royal Society of Canada, Canada’s National Academy of Science, Engineering and the Humanities. She received the US Presidential Early Career Award for Scientists and Engineers (PECASE), the highest honor bestowed by the United States government on outstanding scientists and engineers beginning their careers. She received an NSF CAREER Award, the Ontario Premier’s Research Excellence Award, and an IBM Faculty Award. She formerly held the Bell Canada Chair of Information Systems at the University of Toronto and is a Fellow of the ACM. Her work has focused on the long-standing open problem of data integration and has achieved the goal of building practical data integration systems. She and her colleagues received the ICDT Test-of-Time Award and the 2020 Alonzo Church Award for Outstanding Contributions to Logic and Computation for their influential work establishing the foundations of data exchange. Professor Miller is an Editor-in-Chief of the VLDB Journal and former president of the non-profit Very Large Data Base (VLDB) Foundation. She received her PhD in Computer Science from the University of Wisconsin, Madison and bachelor’s degrees in Mathematics and Cognitive Science from MIT.

Afternoon Keynote: A Systematic View of Data Science 


M. Tamer Özsu, University of Waterloo 


Abstract: There is a data-driven revolution underway in science and society, disrupting every form of enterprise. We are collecting and storing data more rapidly than ever before. There is an increasing recognition that data science can assist in leveraging this data and the insights obtained from it into products, systems, and policies. This has resulted in the formation within academia of data science research centres, institutes and even academic units, and the establishment of major initiatives within every major industrial organization. However, our understanding of data science is vague and highly varied and, in many cases, is squeezed to fit the available openings within an institution. There is a need to approach this field systematically to define its scope and its boundaries. The objective of this talk is to provide such a consistent and systematic study of the scoping of data science.


Speaker Bio: M. Tamer Özsu is a University Professor at the Cheriton School of Computer Science at the University of Waterloo. Previously, he was the Director of the Cheriton School and Associate Dean (Research) of the Faculty of Mathematics. His research is on data engineering aspects of data science, focusing on distributed data management and the management of non-conventional data. He is a Fellow of the Royal Society of Canada, American Association for the Advancement of Science, Association for Computing Machinery, Institute of Electrical and Electronics Engineers, Asia-Pacific Artificial Intelligence Association and Balsillie School of International Affairs, an elected member of Science Academy, Turkey and a member of Sigma Xi. Dr. Özsu is the recipient of the IEEE Innovation in Societal Infrastructure Award (2022), CS-Can/Info-Can Lifetime Achievement Award (2018), ACM SIGMOD Test-of-Time Award (2015), the ACM SIGMOD Contributions Award (2006), and The Ohio State University College of Engineering Distinguished Alumnus Award (2008). He is the Founding Editor-in-Chief of ACM Books (2014-2020) and the Founding Series Editor of Synthesis Lectures on Data Management (2009-2014). He serves on the editorial boards of three journals and one book series.

Speakers


Panel Discussion: Graph Data Science, Today and Tomorrow.


Agenda


8:45  - 9:00       Doors open & Welcome


Talk session 1

  9:00 - 9:15     Eyal de Lara | Systems Research for the Hierarchical Cloud

  9:15 - 9:30     Panos Kalnis | Scaling Large Language Models to a Thousand GPUs

  9:30 - 9:45     Arno Jacobsen | Our Data Science and Systems Research: A Brave Selection

  9:45 - 10:00     Christophe Dubach | Automatic Synthesis of AI Accelerators for FPGAs


10:00 - 10:30       Coffee break 


Morning Keynote

10:30 - 11:00     Renée Miller | Keynote: Indexing Data Lakes


11:00 - 11:15            Break    


Talk session 2

  11:15 - 11:30     Khaled Ammar | Managing Data in Research Organizations: Challenges and Opportunities

  11:30 - 11:45     Essam Mansour | A GML-Enabled Knowledge Graph Platform: Challenges and Opportunities

  11:45 - 12:00     Khuzaima Daudjee | Distributed DNN Training on Serverless Resources


12:00 - 13:00        Lunch in the Strathcona Dentistry Building


Afternoon Keynote 

13:00 - 13:30        M. Tamer Özsu | Keynote: A Systematic View of Data Science

    

13:30 - 13:40           Break


Talk session 3

13:40 - 13:55           Fei Chiang | A Glimpse into Data Currency Estimation

13:55 - 14:10           Oana Balmau |  Towards Practical Learned Indexes

14:10 - 14:25           Tianzheng Wang | Asynchronous Data Movement for Modern Transactional Data Systems

14:25 - 14:40           Verena Kantere | Workload-driven Query Planning and Optimization Using Machine Learning

14:40 - 14:55           Sujaya Maiyya | Ensuring Data Fault Tolerance in Oblivious Datastores



15:00 - 16:00       Poster session / Networking break - in the Strathcona Dentistry Building

16:00 - 16:55       Panel discussion

16:55 - 17:00        Closing remarks

Need help?


Please contact oana.balmau@mcgill.ca for any technical issues or questions about the event.