Configurations of Data Science Teams
In our investigations into Human Centered Data Science (HCDS), we described the practices of data scientists and data science workers (Mao et al., 2019; Muller, Lange, et al., 2019; Muller et al., 2021; Zhang et al., 2020), and we offered constructive critiques of those practices (Muller, 2024; Muller & Strohmayer, 2022). We wrote the first textbook on HCDS (Aragon et al., 2022).
Through a survey, we provided early evidence that data science is a "team sport" (Zhang et al., 2020), as contrasted with the then-prevalent idea of an individual data scientist working alone. We examined the roles, tools, and collaborative relationships of data science workers - i.e., a broader set of concepts than "data scientist." We showed how collaborations made use of tools, and were sometimes structured by available tools. Our analysis contributed to studies by others about the team dynamics of data science projects.
Data Practices in Data Science
We then examined how data science workers work with their data, through a series of qualitative interviews (Muller, Lange, et al., 2019). We explored the diversity of their practices, and we began to question how data science workers specify, revise, and re-specify their data collection methods, thereby re-specifying what constitutes data in their projects. These interviews led us to the critical data studies question of how data become "data."
We examined data science practices in detail. We focused on labeling (i.e., annotating) data with "ground-truth" values (Muller et al., 2021). We described three patterns of labeling: (a) pre-planned practices with definitive a priori labels; (b) adaptive practices that anticipated and allowed for modification and revision of labels; and (c) improvisational practices that were subject to sudden, unplanned revisions and redefinitions of both labels and labeling practices. These studies advanced our understanding of "ground truth" as a human-constructed type of data that reflected human expectations, project constraints, and client-imposed requirements that were sometimes applied after a complete set of labels had been created. These studies also informed our related projects to apply some labels algorithmically, emphasizing the need for human inspection and revision of algorithmic outcomes (Ashktorab et al., 2021; Desmond et al., 2021) - i.e., practice-informed trade-offs of productivity vs. over-reliance on algorithmically-derived labels.
Critical Analyses of Data Practices
We used these learnings to inform a series of workshops that explored (Kogan et al., 2020; Muller, Feinberg, et al., 2019) and challenged (Muller et al., 2020; Tanweer et al., 2022) how humans work with data. As Bowker (2008) and others have argued (e.g., Gitelman, 2013), data do not exist in "raw" form, and must be refined and revised to make those data fit-for-purpose. Møller et al. described these practices as data work (Møller et al., 2020; Møller, 2024), and Sambasivan et al. (2021) showed that data science workers prefer to focus on the model work in their projects, rather than on the data work. Using those analyses and the results of our workshops, we wrote a critical analysis of the many under-studied steps in data work, and the many opportunities for human biases to enter into those steps (Muller & Strohmayer, 2022). Along with the workshop outcomes, our critical studies of data science practices (Muller, Lange, et al., 2019) and labeling practices (Muller et al., 2021) contributed to that analysis.
Using Sambasivan et al.'s insights, we described data work as a series of steps in which each step k potentially inserts assumptions and biases into the data, and each step k+1 tends to overlay the weaknesses of the preceding steps with a belief in "objective" and "bias-free" operations. We called this series of steps a "forgettance stack," because each operation on the stack tends to cause humans to forget the weaknesses of the previous steps. We proposed initial counter-measures and counter-metrics that could help data science workers to resist the temptation to forget the weaknesses of their data work, and to aid in revising their data work when needed. Inspired by Onuoha's concept of data silences (Onuoha, 2016), we described a taxonomy of data silences in machine learning. Our 2022 paper (Muller & Strohmayer, 2022) won a best paper honorable mention award. Two years later, our revision of that taxonomy won a best paper award (Muller, 2024).
We began to think about data as a verb - i.e., while we model the data, we usually need to data the model. That is, we use data to configure and parameterize the model. However, as noted above, we also change the data to accommodate the practices for creating the model, and thus data and model exist in a process of mutual co-construction. In an exploratory article, we asked about the practices that are associated with "data-ing," and what other practices are associated with the inverse, "un-data-ing" (Strohmayer & Muller, 2023). We wanted to question the status of "data" as a medium of design (e.g., Feinberg, 2017) that was necessarily transformed before it could be put to use (e.g., Bowker, 2008). Data began to look less and less like an objective "given," and more like a human construction (Aragon et al., 2022; Feinberg, 2022). The role of human decision-making in selecting data (Muller, Lange, et al., 2019) and characterizing data (Muller et al., 2021) became more and more salient. Earlier, Seager (2016) had concluded that "what gets counted, counts." Guiliano and Estill (2023) added that "what gets categorized, counts." We began to realize that who does the counting counts, as well.
Data Practices in the Sciences
In 2025, we expanded these themes into a set of "parables" for data work and data practices (Muller & Morison, under revision). We noted that, as Bowker had argued, good science inevitably requires data work practices that make "raw" data fit-for-analysis. In each parable, we used examples from the history of science to illustrate a data work practice and its potential problems, and we proposed constructive modifications to practices to help researchers to (re)examine their data work and their data. We summarized these problems and proposals in the concept of data attention. Our proposals became data practices for more consciously applying human data attention to understand "how data become 'data'." Reviewers asked us to strengthen our data attention analysis, and we will be resubmitting this work during 2026.
Conclusion
Human data practices require further investigation. If data are designed (Feinberg, 2017, 2022), constructed (Muller, Lange, et al., 2019; Muller & Strohmayer, 2022; Muller et al., 2021), and altered (Bowker, 2008; Gitelman, 2013), we need to analyze how this happens. As Bowker, Feinberg, and Gitelman have argued, many alterations in data are done with good intent (see also Muller, Lange, et al., 2019; Muller et al., 2021). However, as argued by Proctor and Schiebinger (2025) and Brevini et al. (2024), some data are altered deliberately, to achieve social or political goals (see also Aragon et al., 2022; Muller, 2024; Muller & Strohmayer, 2022). Data science workers of good intent will want to understand "how data become data," and how their own actions interact with bias, oppression, and human flourishing (e.g., Ahmetoglu et al., 2025).
References
Ahmetoglu, Y., Somanath, S., Lallemand, C., Solovey, E. T., Brumby, D. P., & Cox, A. L. (2025, June). Paving the Way for AI that Supports Flourishing at Work. In Adjunct Proceedings of the 4th Annual Symposium on Human-Computer Interaction for Work (pp. 1-3).
Aragon, C., Guha, S., Kogan, M., Muller, M., & Neff, G. (2022). Human-centered data science: an introduction. MIT Press.
Ashktorab, Z., Desmond, M., Andres, J., Muller, M., Joshi, N. N., Brachman, M., ... & Reimer, D. (2021). AI-assisted human labeling: Batching for efficiency without overreliance. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW1), 1-27.
Bowker, G. C. (2008). Memory practices in the sciences. MIT Press.
Brevini, B., Fubara-Manuel, I., Le Ludec, C., Jensen, J. L., Jimenez, A., & Bates, J. (2024). Critiques of data colonialism. In Dialogues in Data Power (pp. 120-137). Bristol University Press.
Desmond, M., Muller, M., Ashktorab, Z., Dugan, C., Duesterwald, E., Brimijoin, K., ... & Pan, Q. (2021, April). Increasing the speed and accuracy of data labeling through an AI-assisted interface. In Proceedings of the 26th International Conference on Intelligent User Interfaces (pp. 392-401).
Feinberg, M. (2017). A design perspective on data. In Proceedings of the 2017 CHI conference on human factors in computing systems (pp. 2952-2963).
Feinberg, M. (2022). Everyday adventures with unruly data. MIT Press.
Gitelman, L. (Ed.). (2013). Raw data is an oxymoron. MIT Press.
Guiliano, J., & Estill, L. (2023). What gets categorized counts: Controlled vocabularies, digital affordances, and the international digital humanities conference. Digital Scholarship in the Humanities, 38(3), 1088-1100.
Kogan, M., Halfaker, A., Guha, S., Aragon, C., Muller, M., & Geiger, S. (2020, January). Mapping out human-centered data science: Methods, approaches, and best practices. In Companion proceedings of the 2020 ACM international conference on supporting group work (pp. 151-156).
Mao, Y., Wang, D., Muller, M., Varshney, K. R., Baldini, I., Dugan, C., & Mojsilović, A. (2019). How data scientists work together with domain experts in scientific collaborations: To find the right answer or to ask the right question? Proceedings of the ACM on Human-Computer Interaction, 3(GROUP), 1-23.
Muller, M. (2024). Data silences: How to unsilence the uncertainties in data science. In Proceedings of the 2024 SIAM International Conference on Data Mining (SDM) (pp. 388-391). Society for Industrial and Applied Mathematics.
Muller, M., Aragon, C., Guha, S., Kogan, M., Neff, G., Seidelin, C., ... & Tanweer, A. (2020). Interrogating data science. In Companion Publication of the 2020 Conference on Computer Supported Cooperative Work and Social Computing (pp. 467-473).
Muller, M., Feinberg, M., George, T., Jackson, S. J., John, B. E., Kery, M. B., & Passi, S. (2019). Human-centered study of data science work practices. In Extended abstracts of the 2019 CHI conference on human factors in computing systems (pp. 1-8).
Muller, M., Lange, I., Wang, D., Piorkowski, D., Tsay, J., Liao, Q. V., ... & Erickson, T. (2019). How data science workers work with data: Discovery, capture, curation, design, creation. In Proceedings of the 2019 CHI conference on human factors in computing systems (pp. 1-15).
Muller, M., & Morison, K. (under revision). Parables for data work in human centered AI: Constructing our data attention.
Muller, M., & Strohmayer, A. (2022). Forgetting practices in the data sciences. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (pp. 1-19).
Muller, M., Wolf, C. T., Andres, J., Desmond, M., Joshi, N. N., Ashktorab, Z., ... & Dugan, C. (2021). Designing ground truth and the social life of labels. In Proceedings of the 2021 CHI conference on human factors in computing systems (pp. 1-16).
Møller, N. H. (2024). Uniting Workers and Citizens in IS Design With and Through Data. Scandinavian Journal of Information Systems, 36(2), 16.
Møller, N. H., Bossen, C., Pine, K. H., Nielsen, T. R., & Neff, G. (2020). Who does the work of data?. Interactions, 27(3), 52-55.
Onuoha, M. (2016). The Library of Missing Datasets. https://mimionuoha.com/the-library-of-missing-datasets
Proctor, R., & Schiebinger, L. (2025). Ignorance unmasked: The new science of agnotology. Stanford University Press.
Sambasivan, N., Kapania, S., Highfill, H., Akrong, D., Paritosh, P., & Aroyo, L. M. (2021). "Everyone wants to do the model work, not the data work": Data cascades in high-stakes AI. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (pp. 1-15).
Seager, J. (2016). Missing women, blank maps, and data voids: What gets counted counts. Talk at the Boston Public Library, March 22.
Strohmayer, A., & Muller, M. (2023). Data-ing and Un-Data-ing. Interactions, 30(3), 38-42.
Tanweer, A., Aragon, C. R., Muller, M., Guha, M. S., Neff, G., & Kogan, M. (2022). Interrogating Human-centered Data Science: Taking Stock of Opportunities and Limitations. In CHI Conference on Human Factors in Computing Systems Extended Abstracts.
Zhang, A. X., Muller, M., & Wang, D. (2020). How do data science workers collaborate? roles, workflows, and tools. Proceedings of the ACM on Human-Computer Interaction, 4(CSCW1), 1-23.