Workshop on Data Science for Data Marketplaces

In conjunction with the 48th International Conference on Very Large Data Bases (VLDB)

September 5, 2022

Sydney, Australia

News

  • (August 24, 2022) Workshop program posted. The workshop will run in a hybrid format, with both in-person and remote participation.

  • (August 20, 2022) The workshop will be held on September 5, 9:00AM-12:30PM Sydney time. A more detailed schedule will be published soon. Stay tuned!

  • (August 20, 2022) Three excellent papers have been accepted to the workshop, and their final versions are now posted. We look forward to the talks.

  • (March 2, 2022) Professor Yiling Chen from Harvard University and Professor Raul Castro Fernandez from the University of Chicago will give keynote talks at the workshop.

  • (February 14, 2022) The workshop website is up!

Workshop Program


All times are in Sydney time. Both in-person and remote participation are supported.

  • 9:00AM-9:45AM Keynote 1: Yiling Chen (Remote)

  • 9:45AM-10:30AM Keynote 2: Raul Castro Fernandez (In-Person)

  • 10:30AM-11:00AM Coffee Break

  • 11:00AM-11:25AM Invited Talk: Fatemeh Nargesian (In-Person)

  • 11:25AM-12:25PM Paper Presentations

  1. Towards Data Economy: Are Products and Marketplaces Ready? (Remote)

  2. Monetary Incentive Scheme for Sequential Collaboration in Data Sharing (Remote)

  3. Establishing the Enterprise Data Marketplace: Characteristics, Architecture, and Challenges (In-Person)

  • 12:25PM-12:30PM Discussion and Wrap-up

Workshop Overview

We are entering a brave new world ushered in by the global digital transformation. With an estimated 2.5 quintillion bytes of data produced every day, data is becoming an independent and strategic asset. Data marketplaces of various forms have emerged, aiming to make access to data a commodity. These marketplaces facilitate interaction between data providers (e.g., individuals or organizations that possess data in diverse domains and wish to offer it to other interested parties) and data consumers who want to obtain data to accomplish certain tasks, such as training new machine learning models, improving the accuracy of existing ones, and conducting statistical estimation. Because such platforms adopt the characteristics of a marketplace, data exchange carries an underlying cost (e.g., a monetary value). The advent of such marketplaces can be viewed as an initial step toward enabling efficient data trading, with enormous social and economic benefits. The design of these marketplaces' operating principles, mechanisms, and trading strategies, to name a few topics, constitutes an open research direction that involves multiple research communities, such as economics and data science. The theme of this workshop is to address the challenges and opportunities of data management and data science in a data marketplace environment.

We welcome submissions presenting interesting early-stage ideas that address fundamental research and technical issues in this challenging area, and we especially encourage reports on system-level research and interdisciplinary practice related to data management and data science in data marketplaces. We also welcome new visions and critical reviews of data marketplaces.

Topics of Interest

Topics of interest include, but are not limited to:

  • Data valuation

  • Data pricing

  • Data acquisition

  • Data quality measurement

  • Data utility

  • Arbitrage and its prevention

  • Game-theoretic approaches to data markets

  • Privacy issues in data trading

  • System support for data trading

Keynote Speakers

Yiling Chen (Harvard University)

Talk Title: Economic Considerations for Pricing Information and Data

Abstract: Information and data, when used appropriately, may create incredible value for their holders. It is hence natural to think that they can and should become the subject of transactions, allowing data and information to be allocated to those who value them more. But how should information and data be priced when they can be partially revealed, may have unverifiable quality, and may be fabricated? In this talk, I will discuss some economic considerations for pricing information and data, including designing the "right format" of information as products for sale and using payments to ensure the integrity of the acquired data. I will take a Bayesian view of information throughout the discussion.

Bio: Yiling Chen is a Gordon McKay Professor of Computer Science at Harvard University. She received her Ph.D. in Information Sciences and Technology from the Pennsylvania State University. Before joining Harvard, she spent two years at Yahoo! Research in New York City. Her research lies at the intersection of computer science, economics, and other social sciences, with a focus on the social aspects of computational systems. She received the Penn State Alumni Association Early Career Award and was selected by IEEE Intelligent Systems as one of "AI's 10 to Watch" early in her career. Her work has received best paper awards at the ACM EC, AAMAS, ACM FAT* (now ACM FAccT), and ACM CSCW conferences. She has co-chaired the 2013 Conference on Web and Internet Economics (WINE'13), the 2016 ACM Conference on Economics and Computation (EC'16), the 2018 AAAI Conference on Human Computation and Crowdsourcing (HCOMP'18), and the 2023 AAAI Conference on Artificial Intelligence (AAAI'23), and has served as an associate editor for several journals.

Raul Castro Fernandez (University of Chicago)

Talk Title: The Value of Data and the Design of Data Markets

Abstract: While data and artificial intelligence are driving many changes to our economic, social, political, financial, and legal systems, we know surprisingly little about their foundations and governing dynamics. In this talk, I will argue that the value of data arises from data markets, environments where agents exchange data. I will illustrate the importance of studying data markets and show several examples where today's data markets fall short and harm welfare and privacy. Then, I will suggest that data markets can be deliberately designed to avoid those pitfalls and present a two-step proposal for advancing the field: data market design and data market platform implementation. I will illustrate these points with examples of existing and future data markets. Finally, I will argue that there is a connection between the study of the value and economics of data and the discipline of data science. I will convey the importance of studying the data economy and highlight the many tools and approaches the data management community can contribute to this area.

Bio: I am interested in understanding the economics and value of data, including the potential of data markets to unlock that value. The goal of my research is to understand how to make the best possible use of data. To that end, I often build systems to share, discover, prepare, integrate, and process data, drawing on techniques from data management, statistics, and machine learning. I am an assistant professor in the Computer Science department at the University of Chicago. Before UChicago, I did a postdoc at MIT with Sam Madden and Mike Stonebraker. Before that, I completed my PhD at Imperial College London with Peter Pietzuch.

Invited Talk

Fatemeh Nargesian (University of Rochester)

Title: A Unified Framework for Distribution-aware Query Answering in Data Markets

Abstract: The increasing demand for data exchange has led to the development of data markets that facilitate transactional interactions between data buyers and data sellers. This talk discusses cost-effective and distribution-aware query answering in data market environments. In a passive-provider data market, the burden falls on consumers to discover data in the sources that providers offer and to find a way to integrate the data they need. In a passive-consumer model, the query-answering burden falls on data providers, who must explore their locally owned data lakes or other data publishers to answer a consumer’s query. While distinguishing between these types of data markets, this talk discusses a unified query-answering framework and how it can be instantiated for each market type. We will describe how the functionalities of this framework enable discovering and integrating data from different sources into a dataset that meets user-provided schema and distribution requirements in a cost-effective manner.

This is joint work with Abolfazl Asudeh from the University of Illinois at Chicago.

Bio: Fatemeh Nargesian is an assistant professor in the Department of Computer Science at the University of Rochester. She received her PhD from the University of Toronto and was a research intern at IBM Watson in 2014 and 2016. Before the University of Toronto, she worked at the Clinical Health and Informatics Group at McGill University. Her primary research interests are in data intelligence, with a focus on data integration, data for ML, and (climate) time-series management.

Accepted Papers

  • Towards Data Economy: Are Products and Marketplaces Ready?
    Perry Chen (Macquarie University), Jian Yang (Macquarie University), Amin Beheshti (Macquarie University), Jianwen Su (UC Santa Barbara)
    (PDF)

  • Monetary Incentive Scheme for Sequential Collaboration in Data Sharing
    Dimong Chea (Kyoto University), Masatoshi Yoshikawa (Kyoto University), Yang Cao (Kyoto University)
    (PDF)

  • Establishing the Enterprise Data Marketplace: Characteristics, Architecture, and Challenges
    Rebecca Eichler (University of Stuttgart), Christoph Gröger (Robert Bosch GmbH), Eva Hoos (Robert Bosch GmbH), Christoph Stach (University of Stuttgart), Holger Schwarz (University of Stuttgart)
    (Please contact the authors for the manuscript.)

Organization

Workshop Co-Chairs

Program Committee

  • Anish Agarwal (MIT)

  • Yang Cao (Kyoto University)

  • Raul Castro Fernandez (UChicago)

  • Rubén Cuevas Rumín (Universidad Carlos III de Madrid)

  • Ruoxi Jia (Virginia Tech)

  • Yuqing Kong (Peking University)

  • Jinfei Liu (Zhejiang University)

  • Yang Liu (UC Santa Cruz)

  • Ce Zhang (ETH)

Important Dates

  • Paper submission: June 17, 2022 (extended from May 16, 2022)

  • Notification of acceptance: July 15, 2022 (extended from June 14, 2022)

  • Camera-ready copies: August 1, 2022

  • Workshop: September 5, 2022

Submission

Submission Site

https://cmt3.research.microsoft.com/DSDM2022

Submission Instructions

We welcome submissions in one of two categories, each with its own page limit: (1) vision papers (up to 4 pages) and (2) technical papers, including research papers and application papers (8-12 pages).

Submissions are to be formatted following the standard VLDB template available at:

http://vldb.org/pvldb/vol15-formatting/

The review process is single-blind. Authors must include their names and affiliations on the first page of the manuscript. We use CMT’s conflict management system, through which authors should flag conflicts with members of the program committee.

To encourage submissions of ongoing work, papers presented at the workshop are not considered formal publications, so they can still be submitted to other venues.