Day 26
Today
In-class discussion on ethics in data science
For Next Time
Remember, we'll be having our data science Final Expo on May 3rd (Tuesday) from 12pm-2pm in AC326.
All project deliverables besides the poster are due by the end of the day Wednesday May 4th.
Ethics of Data Science Discussion
Small Group Discussions
Sit in a group of 3 or 4. To the extent possible, please sit with people that you haven't had a chance to work with and that you don't normally sit with. The point of this is to encourage each table to represent as diverse a set of perspectives as possible.
Before beginning, pick one laptop to be the scribe laptop. This laptop should rotate around the group to allow each person to have a turn taking notes. Notes should be captured in a Google doc and shared with me (paullundyruvolo@gmail.com).
Note your individual positionality [5 minutes]
Do you have previous thoughts/biases on these topics? Are you one of the stakeholders mentioned in the article? For example, if you’ve already done work on image recognition and know the technology intimately, this may influence the discussion. Perhaps you’ve been thinking deeply about diversity in the tech industry; again, this may be useful context for others in your group to know before we start discussing. If you don't yet have an individual positionality, that is okay too (that is one of the points of this activity)! As a high-level guiding principle, we are seeking to promote an atmosphere where everyone feels comfortable sharing their opinion, even if this is their first chance to explore their thoughts / opinions in these areas. Noting individual background is a good way to make sure that everyone knows where everyone else is coming from.
Discuss the Case Studies [40 minutes]
By case studies I mean the following articles:
Police Programs Aim to Predict those Most Likely to Commit Crimes
Google Apologizes For Tagging Photos Of Black People As ‘Gorillas’
We have a total of 40 minutes for discussion of these case studies. It is possible that you will be able to get through all of them, but if you only get through some that is totally fine. For each case study, here are some questions / activities to guide your discussion.
1. Who are the stakeholders involved in the situation (individuals, companies, communities, in addition to the data scientist)? What did they do and what values (e.g. loyalty, transparency, fairness, harm reduction, etc.) did they act upon? Pretend you are the CEO of the data science company in each of these case studies: what steps might you take to minimize the severity or likelihood of an ethical pitfall?
2. What is the role of the data scientist in the midst of all these stakeholder interactions and values conflicts? If you were the data scientist in these case studies, what would you have done? (e.g. suggest stakeholder engagement methods so members of the relevant population are included in the R&D process early and often, put in extra effort to be transparent about your research method with the audience and participants, etc.) Knowing the stakeholders’ context, would you still want to work on this project?
3. Given the list of goals determined in (2), as a data scientist what strategies could you employ to realize these goals? When strategizing, remember to consider the interpersonal dynamics and office politics involved.
Extract a Set of Principles [20 minutes]
Generate a set of principles for responsible data science. These principles could be centered around the types of problems that we should be addressing with data science, the way in which data scientists interact with other stakeholders, or specifics of the technical methods data scientists apply to problems. For each principle, mark it as a personal principle (P), a principle that you would like any company/organization (O) you work for to share, or a universal principle (U) that all data scientists should aspire to (indicate which it is by writing the letter U, P, or O in the upper right corner of the sticky). Work through this activity in the following phases:
On your own generate your set of principles [8 minutes]
Aggregate the group’s responses. Put everyone’s individual responses on the whiteboard (not trying to get to a consensus) [2 minutes]
Take a look at your combined responses. As a group, discuss any interesting similarities or differences. Add the set of observations to your notes [10 minutes]
Class-wide Activity
Take a tour [15 minutes]
Take a stroll around the room. Examine the values and guidelines you see on the post-its.
Relate Your Guidelines to the Guidelines Articles [10 minutes]
We read two articles that were higher-level (above the level of a single case study) about data science ethics. These articles are:
The Ethical Data Scientist -- People have too much trust in numbers to be intrinsically objective.
Data Science Association - DATA SCIENCE CODE OF PROFESSIONAL CONDUCT
The goal of this class-wide discussion is to relate the guidelines that you all generated to these articles. To get us started, here are some possible discussion questions:
Do they say similar things to what the class groups wrote? Different? What’s missing?
Is it possible to have a “universal” list of values that the Data Science community should strive towards? Or is this ultimately a personal activity and trying to make a universal list would be a fruitless effort?
Give Some Feedback
I would really appreciate it if you would fill out this survey to help me get a sense of both how effective this activity was and what deltas you have for this activity or for the integration of ethics-related topics into this course more generally.