Data Science

Data science (DS) is a new field that builds on computer science, statistics, mathematics, e-science, and other disciplines to develop principles, algorithms, and best practices for generating, processing, and using data. Virtually all parts of society are becoming data driven: large volumes of data are being collected, stored, and analyzed to glean insights. Data science emerged in response to this need.

There is no commonly accepted definition of DS yet. For our purposes, we define DS as focusing on three goals:
  • extracting actionable insights from raw data (a.k.a., building the "infamous" raw-data-to-insight pipeline). 
  • developing data-intensive artifacts, e.g., knowledge bases/graphs and recommender systems.
  • designing data-intensive experiments to answer questions, e.g., A/B testing. 
We believe that DS will become increasingly critical, that the database community has much to contribute, and that our community should seek to play a leadership role in pushing this field forward. Toward these goals, our group is currently exploring the following directions: 
  • Designing and teaching DS classes at both undergrad and grad levels. 
  • Setting up DS programs at both undergrad and grad levels (we focus for now on setting up an MS program).
  • Developing DS services for UW-Madison scientists (more details to come soon). 
  • Conducting research in DS: see our work in developing a DI system agenda, Magellan.
This is a broad DS agenda that requires extensive collaboration with other groups in the Department of Computer Sciences, with many groups/centers/institutions across the campus, and with organizations/companies outside the campus. 


This project is supported by a UW2020 award from the UW-Madison Office of the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation.