Tools

This toolkit is intended to serve as a planning, reflection, and evaluation guide for research teams who use data science methods and tools in their work. Developed by the Data SRI: Ethics and Social Justice Committee, the toolkit provides guidance on developing data science research projects that move toward more ethical and socially just processes and outcomes, while generating new knowledge, opportunities, and ideas for the field at large.

The toolkit includes a set of questions team members should ask and reflect on throughout the research development process: ideation, proposal writing, sampling/data collection, instrumentation, analysis, implementation/interpretation of outcomes and products, etc. The toolkit does not provide advice on analytical tools, as these will vary depending on the subject of the research.

The Data SRI may also use this toolkit as a guide when evaluating proposals for funding.

Data for Good

Explore research objectives in relation to human rights. Build inclusive and diverse teams. Collect data transparently and allow for appropriate replicability. Curate data with attention to privacy and data sovereignty. Analyze with tools that mitigate harmful bias and produce just outcomes and processes.

Research Objectives

How does the proposed research plan include objectives of improving the well-being of individuals, communities, and/or the global landscape?
How does this research address social challenges and social inequities?
Resources
- Who is helped and who is harmed? (Gebru, 2021)
- Resistance AI Workshop (NeurIPS, 2020)

Policy Implications

How might the results from this research be used: to make or guide public policy, to distribute resources and programs (equally or unequally), or to offer products and/or services to the public or to private individuals and organizations?
Resources
- Algorithmic Justice League
- From Evidence to Policy (Jameel Poverty Action Lab)
- Beyond prediction: Using big data for policy problems (Athey, Science, 2017)

Project Selection and Scope of Work

Is data science (applied or methodological) the right tool for the job?
What are alternative ways the problem might be conceptualized and why is one of them preferred?
Why is collecting or analyzing data an appropriate solution to this problem?

Building the Team

Who is involved in determining the research problem, methodology, and applications of the answers?
How does the team include and/or consider individuals and institutions/communities who will ultimately be affected by the tool?
What might be team members’ connections to ways data science research can be applied toward social good?
Resources

Data Curation

What measures will be taken to ensure privacy, de-identification of data, secure storage of images and data, and (affirmative) consent from participants?
What precautions are in place for "small" samples and vulnerable populations (especially for data that reveal racial or ethnic origins, political opinions, sexual orientation, or religious beliefs)?
Resources
- Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons Learned (Kenthapadi, 2018)
- The Science of Socially Aware Algorithm Design (Kearns & Roth, 2019)
- Differential Privacy: A Primer for a Non-technical Audience (Nissim et al., 2019)
- U.S. Indigenous Data Sovereignty Network
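
One common precaution for small samples is to mask any published count that falls below a minimum cell size. The following is a minimal sketch of that idea, not a method prescribed by this toolkit: the function name `suppressed_counts`, the threshold of 5, and the example responses are all assumptions made for illustration, and the appropriate threshold depends on the data and the applicable policy.

```python
from collections import Counter

def suppressed_counts(values, min_cell_size=5):
    """Count each value, replacing counts below min_cell_size with None.

    min_cell_size=5 is an illustrative default, not a recommended standard.
    """
    counts = Counter(values)
    return {
        group: (n if n >= min_cell_size else None)
        for group, n in counts.items()
    }

# Hypothetical responses grouped by a sensitive attribute.
responses = ["A"] * 12 + ["B"] * 7 + ["C"] * 2
table = suppressed_counts(responses)

for group, n in sorted(table.items()):
    print(group, n if n is not None else "suppressed")
```

Masking a single cell is not sufficient when row or column totals are also published, because the hidden value can then be recovered by subtraction; in that case additional cells or the totals need to be masked as well.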

Methodology

Does the methodology potentially perpetuate or address statistical or AI/ML biases that can cause unintended harm to populations, locales, or organizations?
Such biases may include, but are not limited to, sampling bias, algorithmic bias, interaction bias, selection bias, and labeling bias. See the resource sheet for how these terms are operationalized.
Resources
- Defining bias in NLP (Blodgett, 2021)
- Bias on Search and Recommender Systems (Baeza-Yates, 2021)
- Biased Programmers? Or Biased Data? A Field Experiment in Operationalizing AI Ethics (Cowgill et al., 2020)
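
As one illustration of how a team might screen for sampling bias, the sketch below compares each group's share of a sample with its share of a reference population. It is a rough first check under stated assumptions, not a complete bias audit: the function name `representation_gaps`, the group labels, and the population shares are hypothetical, and the check presumes that trustworthy reference shares are available.

```python
from collections import Counter

def representation_gaps(sample_groups, population_shares):
    """Return (sample share - population share) for each reference group.

    A positive gap means the group is over-represented in the sample;
    a negative gap means it is under-represented.
    """
    total = len(sample_groups)
    if total == 0:
        raise ValueError("sample_groups must not be empty")
    counts = Counter(sample_groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in population_shares.items()
    }

# Hypothetical sample and reference population shares.
sample = ["urban"] * 80 + ["rural"] * 20
population = {"urban": 0.6, "rural": 0.4}
gaps = representation_gaps(sample, population)

for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

Large gaps do not prove that results are biased, but they flag groups whose experiences the data may misrepresent and that merit a closer look at how the sample was collected.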

Transparency and Replication

How will the data collection process, including annotation and labeling of data, be documented?
If applicable, does your study have a pre-analysis plan?
Does the study follow any guidelines for algorithmic transparency (open-source documentation) and accountability?
If applicable, does your study include Datasheets for Datasets that describe each dataset's operating characteristics, test results, recommended uses, and other information?
Resources
- NeurIPS 2021 Datasets and Benchmarks Track
- Data Statements worksheets, for example in natural language processing (NLP) work.
- Examples of pre-analysis plans in the social sciences.

Data for All

What populations are prioritized and centered?

What subgroups and entities may stand to be neglected, marginalized, and further harmed?

The Research Team

Does the research team (including partner organizations and collaborators) include expertise (this can include lived and research experiences, connections, values, priorities) related to the subgroups in the community?

Improving Livelihoods

In what ways do you anticipate your research processes, knowledge, outcomes, and tools generated will improve the livelihoods (i.e., greater access to power and resources) of the subgroups in the community?

Subgroups to Consider

Broadly, how does your research relate to these subgroups, and in what ways will its processes and outcomes not only mitigate harm but also improve lives and our relationship to the environment and urban/lived spaces (as applicable)? The groups listed are not all-inclusive.

Historically Underrepresented Minorities

To include an intersectional lens that redresses systemic inequities (e.g., racism, sexism, ableism, and other harmful forms of bias and discrimination) that have had adverse intentional and unintentional impacts on the livelihoods of individuals and subgroups and on their opportunities and access to power and resources.
- AI Colonialism (Series from MIT Tech Review, 2022)

Local Context / Community

To include locals and community members (for studies within and outside of the US).

Indigenous Populations

To include indigenous peoples (for studies within and outside the US): those who self-identify as indigenous peoples and participate in their community. Such communities may have historical continuity with pre-colonial and/or pre-settler societies; form non-dominant groups of society; and maintain a distinct language, culture, and beliefs, as well as distinct social, economic, or political systems.
- U.S. Indigenous Data Sovereignty Network
- State of Open Data: Indigenous Data Sovereignty

Cal Poly Community

To include the faculty, staff, students, and alumni of the California Polytechnic State University system.

Environment

To include studies related to land, water, air, and living entities (e.g., plant and animal life). To include our relationships with the environment (e.g., to consider how socio-ecological decisions made by individuals, communities, organizations, and institutions impact the natural and built worlds, with the goal of creating and sustaining socially and environmentally just decisions through data science).

Urban and/or Lived Space

To include access to urban spaces and developments, their construction and upkeep, and consideration for both the communities they serve and the environment they inhabit.
- Beyond Fairness: Big data, racial justice, and housing (MIT)
