This toolkit is intended to serve as a planning, reflection, and evaluation guide for research teams who use data science methods and tools in their work. Developed by the Data SRI: Ethics and Social Justice Committee, the toolkit provides guidance on developing data science research projects that move toward more ethically and socially just processes and outcomes, while generating new knowledge, opportunities, and ideas for the field at large.
The toolkit includes a set of questions that team members should ask and reflect on throughout the research development process, including ideation, proposal writing, sampling and data collection, instrumentation, analysis, and implementation and interpretation of outcomes and products. The toolkit does not provide advice on analytical tools, as these will vary depending on the research topic.
The Data SRI may also use this toolkit as a guide when evaluating proposals for funding.
How does the proposed research plan include objectives of improving the well-being of individuals, communities, and/or the global landscape?
How does this research address social challenges and social inequities?
Resources
How might the results from this research be used: to make or guide public policy, to distribute resources and programs (equally or unequally), or to offer products and/or services to the public or to private individuals and organizations?
Resources
Is data science (applied or methodological) the right tool for the job?
What are alternative ways the problem might be conceptualized, and why is one of them preferred?
Why is collecting or analyzing data an appropriate solution to this problem?
Who is involved in determining the research problem, methodology, and applications of the answers?
How does the team include and/or consider individuals and institutions/communities who will ultimately be affected by the tool?
What connections do team members have to the ways data science research can be applied toward social good?
Resources
What measures will be taken to ensure privacy, de-identification of data, secure storage of images and data, and (affirmative) consent from participants?
What precautions are in place for "small" samples and vulnerable populations (especially data that reveal racial or ethnic origins, political opinions, sexual orientation, or religious beliefs)?
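As one illustration of a precaution for "small" samples, a team might suppress counts for groups that are too small to report safely. The sketch below is only an example under stated assumptions: the threshold of 5, the function name `suppress_small_cells`, and the records are all hypothetical, and the appropriate threshold and method will depend on the study and any governing policy.

```python
from collections import Counter

# Hypothetical threshold (an assumption, not a recommendation):
# cells with fewer records than this are masked.
MIN_CELL_SIZE = 5

def suppress_small_cells(records, keys, min_cell_size=MIN_CELL_SIZE):
    """Count records per combination of `keys`; mask small counts as None."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {
        cell: (n if n >= min_cell_size else None)
        for cell, n in counts.items()
    }

# Made-up example records, not real data.
records = (
    [{"region": "north", "group": "A"}] * 12
    + [{"region": "north", "group": "B"}] * 3
    + [{"region": "south", "group": "A"}] * 7
)

table = suppress_small_cells(records, keys=("region", "group"))
for cell, n in sorted(table.items()):
    print(cell, "suppressed" if n is None else n)
```

Suppression alone does not guarantee privacy, since masked values can sometimes be recovered from row or column totals, so it is best treated as one measure among several.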
Resources
Does the methodology potentially perpetuate or address statistical or AI/ML biases that can cause unintended harm to populations, locales, or organizations?
Such biases may include, but are not limited to, sampling bias, algorithmic bias, interaction bias, selection bias, and labeling bias. See the resource sheet for how these terms are operationalized.
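As a hedged example of how a team might begin checking for sampling bias, the sketch below compares each group's share of a sample with its share of a reference population. The function name `representation_gaps`, the group labels, and all numbers are hypothetical, and this simple comparison is only one of many possible checks, not the toolkit's prescribed method.

```python
def representation_gaps(sample_counts, reference_shares):
    """Return sample share minus reference share for each reference group.

    Positive values suggest over-representation in the sample,
    negative values suggest under-representation.
    """
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample_counts must contain at least one record")
    return {
        group: sample_counts.get(group, 0) / total - ref_share
        for group, ref_share in reference_shares.items()
    }

# Made-up counts and reference shares, not real data.
sample_counts = {"A": 80, "B": 15, "C": 5}
reference_shares = {"A": 0.60, "B": 0.25, "C": 0.15}

gaps = representation_gaps(sample_counts, reference_shares)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

How large a gap matters, and which reference population is appropriate, are judgment calls the team would need to make and document.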
Resources
How will the data collection process, including annotation and labeling of data, be documented?
If applicable, does your study have a pre-analysis plan?
Does the study follow any guidelines for algorithmic transparency (open-source documentation) and accountability?
If applicable, does your study allow for Datasheets for Datasets that describe the dataset's operating characteristics, test results, recommended uses, and other information?
Resources
Does the research team (including partner organizations and collaborators) include expertise (this can include lived and research experiences, connections, values, priorities) related to the subgroups in the community?
In what ways do you anticipate your research processes, knowledge, outcomes, and tools generated will improve the livelihoods (i.e., greater access to power and resources) of the subgroups in the community?
Broadly, how does your research relate to these subgroups, and in what ways will your processes and outcomes not only mitigate harm but also improve lives and our relationship to the environment and urban or lived spaces (as applicable)? The groups listed are not all-inclusive.
To include an intersectional lens that redresses systemic inequities (e.g., racism, sexism, ableism, and other harmful forms of bias and discrimination) that have had adverse impacts, intentional and unintentional, on the livelihoods of individuals and subgroups and on their opportunities and access to power and resources.
AI Colonialism (Series from MIT Tech Review, 2022)
To include locals and community members (for studies within and outside of the US).
To include indigenous peoples (for studies within and outside of the US): those who self-identify as indigenous peoples and participate in their community. Such communities may have historical continuity with pre-colonial and/or pre-settler societies; form non-dominant groups of society; and may maintain distinct languages, cultures, beliefs, and social, economic, or political systems.
To include the faculty, staff, students, and alumni of the California Polytechnic State University system.
To include studies related to land, water, air, and living entities (e.g., plant and animal life). To include our relationships with the environment (e.g., to consider how socio-ecological decisions made by individuals, communities, organizations, and institutions impact the natural and built worlds, with the goal of creating and sustaining socially and environmentally just decisions through data science).
To include access to urban spaces and developments, their construction and upkeep, and consideration for both the communities they serve and the environment they inhabit.