Interrogating Data Science


CSCW 2020 Workshop

Welcome

Data science provides powerful tools and methods. CSCW researchers have contributed insightful studies of conventional work-practices in data science, and particularly machine learning. However, recent research has shown that human skills and collaborative decision-making play important roles in defining data, acquiring data, curating data, designing data, and creating data. This virtual workshop gathers researchers and practitioners together to take a collective and critical look at data science work-practices, and at how those work-practices make crucial and often invisible impacts on the formal work of data science. When we understand the human and social contributions to data science pipelines, we can constructively redesign both work and technologies for new insights, theories, and challenges.

To accommodate people from multiple timezones, and to support limited commitments away from families and caregiving responsibilities, we plan to hold the workshop for four hours on one day. We will choose the time based on the timezones of the organizers and the submitters to the workshop.

Call for Participation

Data science provides powerful tools and methods. Recent research has shown that human skills and collaborative decision-making play important roles in defining data, acquiring data, curating data, designing data, and creating data. This workshop gathers researchers and practitioners together to take a collective and critical look at data science work-practices, and at how those work-practices make crucial and often invisible impacts on the formal work of data science. When we understand the human and social contributions to data science pipelines, we can constructively redesign both work and technologies for new insights, theories, and challenges. We invite submissions that describe ways in which humans (as individuals or groups) intervene in or shape datasets, models, pipelines, and other aspects of data science work. Critical perspectives are also welcome.

Feinberg and Seidelin separately examined how humans design data and how data may become a medium of design (see the text of the full workshop proposal for citations). In their views, the data in data science are made by humans, and reflect a series of individual and social decisions, such as “what constitutes the ‘data’?”, “how do we define an outlier?”, and “how can we measure what we want to know?” Tanweer extended this approach to consider data as having materiality, which could be subject to breakdown and human repair, especially if data are combined from multiple sources. Bergman et al. remind us that data require ongoing human correction during the lifecycle of data usage. Muller et al. showed a series of human interventions between "the data" and "the model" in multiple data science projects. These studies show that data are not simple or "objective." Rather, humans are actively shaping the data through a series of careful decisions. These decisions often go unrecorded.

Feinberg, as well as Passi and Jackson, focused on matters of vision in data science. Passi and Jackson described the ability to engage in data vision when approaching data science work, i.e., applying the rules of algorithms flexibly and interpretively to meet the situated challenges of the work. Feinberg analyzed humans’ ability to shift their focus from data as a fully-formed component of a dataset, to data as a raw material for re-use in novel and unexpected combinations, through humans’ material vision to see and create new intuitive possibilities in a static dataset and/or a set of procedures. Thus, a concern for how people perceive and act on data and algorithms (individually or collaboratively) returns us to the theme of data as a medium or material for design.

These papers afford multiple critical views on “the data” in data science, and on “the work” of data science. The common thread through these papers begins with human-centered data science (HCDS) and leads toward refutations of straightforward accounts of data science as “objective” and “data driven.” These projects and analyses argue instead that data science is a necessarily human and social endeavor, in which algorithmic work depends crucially on the individual and collective discernments and aspirations of humans.

The full workshop proposal is available at https://drive.google.com/file/d/185Hhy8ByoylppZG8Lu1XqWWK6vszHiIg/view?usp=sharing

Submissions

We invite your brilliant submissions to the workshop. This section describes starting-ideas for topics, and the format for your submission. The deadline for your submission is 27 September 2020. We will send notifications during the week of 28 September.

We hope to publish the workshop's Proceedings through CEUR. Each author would have an option to include, or not include, their submission in those Proceedings. Working with CEUR involves a great deal of online paperwork, so we will postpone those arrangements until after the workshop.

Topics

  • Human roles in data science

  • Human interventions in data science

  • Data as a site of design

  • Social definitions of data

  • Role-specific views onto data

  • Data as boundary object within and beyond data science teams

  • In Explainable AI, who speaks, and who reads, the explanations?

  • Rhetorics of data and of data science

  • "Do data have politics?"

  • Can there be "value sensitive data science"?

  • Human-centered algorithms

Format

We invite submissions of up to four pages (references may appear on additional pages) in the "Legacy SIGCHI Extended Abstracts Format (Word / LaTeX)." If these formats cause difficulties, please communicate with us. We will work with you to find alternate submission formats.

Our Address

Please send your submission to us at interrogatingdatascience@gmail.com.

Workshop Program

Disciplines and Disciplinarity

  • Building a data science mindset. Brian Keegan (University of Colorado)

  • Intersectional human-centered data science: Raising consciousness among the data-driven. Alicia Boyd (DePaul University) and Brian C. Keegan (University of Colorado)

  • Analyzing challenges in data science team collaboration. Rohith Sothilingam and Eric Yu (University of Toronto)

  • The lineage of human-centered data science. Andrea Figueroa (University of Washington)

Acting on Data

  • Logical force in the giving and taking of data. Elliott Hauser (University of Texas at Austin)

  • Work practices at the intersection of data annotation and ML engineering. Milagros Miceli, Martin Schuessler, and Tianling Yang (Technische Universität Berlin)

  • Towards a folded ecology of interoperability work in translational biomedical research. Andrew S. Hoffman (Radboud University)

Problematizing Data

  • Defining data: Questions of numerical "facts." Malinda Dietrich, Morgan Scheuerman, and Katy Weathington (University of Colorado)

  • Visualizations as data prototypes. Peter Kun (Aalborg University) and Gerd Kortuem (Delft University of Technology)

  • On the shoulders of pull requests: The organization of data work and design for interferometric data. Will Sutherland (University of Washington)

  • Outliers: More than numbers? Dilruba Showkat and Eric P.S. Baumer (Lehigh University)

Values and Ethics in and of AI

  • Accounts, accountability and agency for safe and ethical AI. Rob Procter (Warwick University), Mark Rouncefield (Lancaster University), and Peter Tolmie (University of Siegen)

  • Harm and data science: Framing negative human outcomes as a data science problem. Katherine Weathington and Morgan Klaus Scheuerman (University of Colorado)

  • Multiple virtues in data work: CDIS' practices of generating healthcare data. Kathleen H. Pine (Arizona State University) and Claus Bossen (Aarhus University)

  • Data science for all. Jácome Cunha (1), José Dias (1), Paula Pereira (1), João P. Fernandes (2), and Rui Pereira (1). 1: University of Minho. 2: University of Coimbra

Organizers

Michael Muller studies work-practices of data science workers at IBM Research (Cambridge MA USA). With colleagues, he has analyzed how humans intervene (individually and collaboratively) between "the data" and "the model" as aspects of responsible and accountable data science work.

Cecilia Aragon is a Professor in the Department of Human Centered Design Engineering and Director of the Human-Centered Data Science Lab at the University of Washington. Her research focuses on enabling humans to explore and gain insight from vast data sets. In 2008, she received the Presidential Early Career Award for Scientists and Engineers (PECASE).

Shion Guha is an Assistant Professor in the Department of Computer Science at Marquette University. His research primarily focuses on understanding how algorithms are designed, deployed and implemented in public services such as child welfare or criminal justice systems. His methodological research focuses on developing methods that bridge computational and human-based analyses.

Marina Kogan is an Assistant Professor in the School of Computing at the University of Utah. Her research interests are in Crisis Informatics, Social Computing, and Network Science. She studies how social media platforms — as both complex and sociotechnical systems — affect and are affected by social behavior.

Gina Neff is a Senior Research Fellow and Associate Professor at the Oxford Internet Institute and the Department of Sociology at the University of Oxford. She leads a new multinational comparative research project on the effects of the adoption of AI across multiple industries, and is the author of the award-winning Venture Labor (MIT Press, 2012).

Cathrine Seidelin is a PostDoc at the Computer Science Department at the IT University of Copenhagen. She studies data-related work practices in multistakeholder environments and explores how data may become a design medium.

Katie Shilton is an associate professor in the College of Information Studies at the University of Maryland, College Park. She is the PI of the PERVADE project, a multi-campus collaboration focused on data science research ethics.

Anissa Tanweer is a research scientist at the University of Washington’s eScience Institute, where she focuses on studying and advancing human-centered data science practices. She leads UW’s Data Science for Social Good program.