This workshop is a joint initiative by the Data Ethics Initiative at LMU's Department of Statistics and the Munich Center for Machine Learning (MCML).
Data ethics is concerned with the ethical questions that may emerge in any type of engagement with data – whether designing, creating, collecting, processing, analyzing, sharing, linking, commissioning, or otherwise 'using' data or data products.
Our understanding of data ethics aligns closely with the perspective set out in the Ethical Data Initiative’s statement. For us, data ethics addresses multiple issues, such as:
privacy;
openness, data sharing, and accessibility;
consent and transparency;
ownership – both (1) of individuals over their own personal data, and (2) rightful claims and credit for data producers over data they have created;
tracing data provenance, uses, and effects on groups and individuals;
accountability of data users and data producers;
potential pitfalls and limitations of measurement and quantification in general.
Yet, data ethics is decidedly not a static checklist to tick off, nor something to be 'solved in the abstract'. Rather, data ethics only becomes manifest in a specific context and through the consequences that our choices (may) have.
These choices occur along the entire data journey: from our initial intentions (which information we would like to obtain, and the choice to collect data in the first place) and design decisions, through data collection (or selection of external sources if we don’t collect our own data), processing, and analysis, to the dissemination of data products or results and the deployment of, for instance, trained prediction models. Context matters on multiple levels. For instance, in data collection, context may refer to the micro level (the individual from whom data are collected), the meso level (e.g., timing, location, and instruments used), and the macro level (e.g., political, economic, or scientific environments).
Likewise, when considering the consequences, context should be taken into account at different levels. In assessing the consequences of data and data use – both harms and benefits – we must ask questions such as: Who is affected, how, and why? How can better choices improve the consequences? What characterizes situations in which we can’t find a moral way to collect or use data? How can we ensure ethical uses of data – or why can’t we? Harms and benefits arise for individuals, societies, and beyond – e.g., through environmental impacts and material consequences. Harms and benefits to humans can be highly diverse, including financial, social, participatory, and epistemic ones.
Data ethics addresses anyone engaging with and affected by data: individuals, groups, organizations, and societies. As 'moral agency' is most often attributed to individuals, the responsibility for data ethics is ultimately anchored in the individual statistics practitioner, data scientist, researcher, or decision-maker (see ISI declaration). At the same time, many external factors – such as frameworks, institutional design, demands on data and knowledge production, power, selection of personnel, and technology – influence the ethical environment for the individual and for society.