McGill DISCS Lab

Data-Intensive Storage & Computer Systems

DISCS centers around efficient storage systems. 

Our goal is to understand how new storage technologies (persistent memory, NVMe drives, RDMA) will impact the future of computer systems and shape data-intensive applications such as machine learning, data science, and edge computing.

An important part of the DISCS vision is training researchers who can recognize the full-system impact of new technologies. We are a diverse group and are always looking to recruit talented and motivated students.

📰  News

See the latest events in our group

See all news.

Research

Data powers everything we do, and we are collecting it at unprecedented rates. The driver for research at DISCS is to create a storage infrastructure that lets us gain insights from this data in a fast and energy-conscious manner. See details on our three main research directions below.

Systems for Data Science & Machine Learning

Explore how storage can support ML & data science workloads in real time, on TB-scale datasets.

This research is done in collaboration with MLCommons.

Data Science and ML workloads are ubiquitous. From taking care of our health to running businesses to managing our energy systems and transport planning, we leverage learning to make more informed decisions. We obtain these insights through a combination of algorithms and vast amounts of data. The way data is stored and accessed strongly influences how fast the algorithms can provide us with useful insights. Inefficient data management can unfortunately slow down the entire pipeline. 

The needs of ML and Data Science workloads are poorly met by current general-purpose storage systems. To obtain fast results, existing systems rely on heuristics or use stale information. At DISCS, we are designing new tools and storage systems that (1) scale to the TB-scale datasets used by Data Science and ML, (2) ingest and clean incoming data at high throughput, and (3) serve data with low latency.

This challenging goal entails many research directions, such as identifying opportunities to reduce data movement, designing adaptable data structures that harmonize with Data Science workloads, and making creative use of new storage resources (e.g., NVRAM and fast SSDs).

Storage Building Blocks for Fast Devices

Redesign caching, file systems, and indexes for new hardware and real systems.

Emerging storage technologies are challenging fundamental assumptions in computer systems design. One major assumption is the significant performance gap between memory and persistent storage access. This gap is now bridged by byte-addressable persistent memory. Another assumption is that I/O bandwidth is the main bottleneck in storage systems. This too has changed with the development of new fast drives (e.g., Intel Optane NVMe SSDs), which shift the bottleneck to the CPU. In addition, the storage stack is getting deeper and more heterogeneous. It is likely that a typical server that developers and system administrators will have to manage will contain RAM, persistent memory, different types of SSDs, and hard disks.

These hardware advances provide an opportunity to redesign the basic storage building blocks, such as file systems, caching policies, key-value stores, and relational databases, and to reconsider the appropriate level of support that the operating system should provide.

Ultimately, given that the hardware and the workloads keep evolving, our long-term vision is to create a framework that automatically generates storage systems which meet the desired performance requirements, given the workload profile and a set of generic hardware characteristics as inputs.

Efficient Data Management for Edge Computing

Shape data management for IoT devices, which will be the world’s largest data producers by 2025.

This research is done as part of a DND IDEaS micro-net, in collaboration with Profs. Eyal de Lara and David Lie from the University of Toronto, Prof. Aastha Mehta from UBC, and Prof. Julien Gascon-Samson from ETS Montreal.

The Internet of Things (IoT) is a fast-growing field that produces vast amounts of data. In fact, it is estimated that the data produced by IoT workloads alone in 2025 will be larger than all of the data we will produce in 2020. Naturally, this is an excellent opportunity for storage research.

IoT poses serious challenges in terms of resource management. Numerous IoT settings make use of battery-powered devices with limited energy, low storage and data processing capacities, and unreliable connectivity. An interesting direction is determining at what granularity such systems should store data at the sensor, edge, and cloud levels, while developing energy-efficient schemes for data filtering and data movement between these layers. The nature of the collected data raises compelling questions as well. One possible avenue is designing data layouts that are suitable for storing vast amounts of noisy data, which may also contain high levels of redundancy (e.g., in video surveillance systems).

Meet the Team

Dr. Oana Balmau 

Assistant Professor

Focus: Computer Systems and Storage Technologies

Dr. Stella Bitchebe

Postdoctoral Researcher

Focus: Hardware Virtualization

Nelson Bore

PhD Researcher

Efficient Data Management in Edge Computing

Jiaxuan Chen

PhD Researcher

Virtualized Processing-in-Memory – Co-advised with Prof. Xue Liu

Shubham Vashisth

PhD Researcher

Systems for ML – Co-advised with Prof. Bettina Kemme

Rahma Nouaji

PhD Researcher

Data Pre-processing in ML

Pritish Mishra

PhD Researcher

Stream Processing Frameworks for Edge Computing – At the University of Toronto, co-advised with Prof. Eyal de Lara

Ruben Adao

PhD Researcher

Optimizations for New Storage and Memory – At INESC TEC, co-advised with Dr. Ricardo Macedo

Zachary Doucet

MSc. Thesis

Systems optimizations for decentralized learning

Aidan Goldfarb

MSc. Thesis

Characterization of ML framework compilers – Co-advised with Prof. Christophe Dubach

Ruoyu Deng

MSc. Thesis

File Systems for Edge Computing

Aayush Kapur

MSc. Project

Applications of ML algorithms in edge systems

DISCS Alumni


DISCS Lab could not exist without the generous support of our sponsors. If you are interested in sponsoring our research, please contact us at oana.balmau@cs.mcgill.ca. DISCS Lab is funded by: