AI and Indigenous Data Sovereignty:
Responsible Use for Biodiversity Monitoring
What does ‘AI for Tribal biodiversity monitoring’ actually mean?
In developing technology with Tribal nations, how can we tangibly and practically incorporate Traditional Ecological Knowledge and Indigenous Data Sovereignty principles?
Our goal: provide tangible use cases of AI for biodiversity monitoring, tradeoffs and benefits especially in the context of TEK and IDS, and open questions we're thinking about.
Use Cases
1: Species Classification from Audio and Image Data
Current process: Once a network of camera and audio traps is set up, images and sounds are hand labeled by those with intimate knowledge of the ancestral land being monitored.
AI Use Case: Use image and audio classifiers to identify species from camera images or acoustic monitor sound clips. This can help land managers understand wildlife population sizes and track the presence of endangered and/or invasive species.
Benefits: This would reduce the time and effort needed to label all of the image and audio data. Without having to label each data point by hand, there is substantially more bandwidth for other activities.
Considerations:
Training an AI model requires lots of data. Collecting this data is often labor intensive. Who is collecting this data? How are they being compensated?
Who owns the model? Image classifiers for wildlife data are often trained on large open datasets, but if the model is further refined by the Tribe, where does ownership lie?
How is the workflow for using the AI implemented? Locally? In the cloud?
AI models are imperfect and will always make some level of mistakes. It is important to recognize and stay aware of this - the model does not output guaranteed truth.
Human out of the loop: Identifying species in the images by hand is not only a way to ensure quality in the identification process; it also gives Tribal scientists a direct connection with the animals, allowing the team to see patterns in the data and develop hypotheses for future work. Going through the images can be a joyful aspect of this work, and that could be lost.
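One common way to balance automation with human review is to auto-accept only high-confidence model labels and route everything else to a person. The sketch below is illustrative, not a prescribed workflow: the data shapes, field names, and threshold are all hypothetical assumptions.

```python
# Hypothetical triage step for classifier output: predictions below a
# confidence threshold are sent to human reviewers instead of being
# accepted automatically.

CONFIDENCE_THRESHOLD = 0.90  # tune to the team's tolerance for errors

def triage_predictions(predictions, threshold=CONFIDENCE_THRESHOLD):
    """Split model predictions into auto-accepted and needs-review lists.

    Each prediction is a dict like:
        {"file": "cam01_0412.jpg", "species": "black bear", "confidence": 0.97}
    """
    accepted, needs_review = [], []
    for pred in predictions:
        if pred["confidence"] >= threshold:
            accepted.append(pred)
        else:
            needs_review.append(pred)
    return accepted, needs_review

preds = [
    {"file": "cam01_0412.jpg", "species": "black bear", "confidence": 0.97},
    {"file": "cam01_0413.jpg", "species": "elk", "confidence": 0.62},
]
accepted, review = triage_predictions(preds)
```

Routing low-confidence images to reviewers preserves some of the hands-on engagement with the data described above, while still cutting down the total labeling burden.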
2: Using ML to Model Population Estimates of Wildlife or Vegetation
Current process: Labeled species observations from camera and audio traps are compiled and interpreted by hand to gauge wildlife populations and vegetation patterns.
AI Use Case: Train ML models on these labeled observations to estimate population sizes of wildlife or the extent of vegetation, and to track how those estimates change over time.
Benefits: Knowing the patterns of biodiversity in a region is incredibly helpful for making land management decisions.
Considerations:
Species locality information can be extremely sensitive. Knowing the location of species can put the environment or the species at risk of destruction, including poaching and exploitation, land use conflicts, and tourism of rare or endangered species. In the context of Indigenous Data Sovereignty, a Tribe may also decide this data should remain private for reasons of cultural importance.
Population estimates are frequently oversimplified, and the intricate, ever-changing character of real ecological systems makes the resulting predictions flawed. However, over-complicated models trained on limited datasets can overfit, which also produces flawed predictions. Modeling is therefore an exercise in balancing simplicity and complexity for the best possible predictions.
Models may fail to properly account for changing environmental conditions, which can greatly influence population dynamics.
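To make the simplicity/complexity tradeoff concrete, here is one of the simplest population estimators in wildlife ecology, the Lincoln-Petersen mark-recapture estimate (shown in Chapman's bias-corrected form). The document does not specify a modeling method, so this is purely an illustration; note that the estimator itself embodies the simplifying assumptions warned about above (a closed population and equal detectability for every animal).

```python
def lincoln_petersen(marked_first, caught_second, recaptured):
    """Chapman's bias-corrected Lincoln-Petersen population estimate.

    marked_first: animals marked (or individually identified, e.g. on
        camera) in the first survey.
    caught_second: animals observed in the second survey.
    recaptured: animals in the second survey already seen in the first.

    Assumes a closed population (no births, deaths, or migration between
    surveys) and equal detection probability for all individuals.
    """
    if recaptured < 0 or recaptured > min(marked_first, caught_second):
        raise ValueError("recaptured must be between 0 and both sample sizes")
    return (marked_first + 1) * (caught_second + 1) / (recaptured + 1) - 1

# Example: 50 individuals identified, 60 observed later, 15 re-sighted
estimate = lincoln_petersen(50, 60, 15)  # about 193 individuals
```

When those assumptions fail, as they often do under changing environmental conditions, the estimate can be badly biased; more complex models relax the assumptions but demand more data.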
3: Plain Text Processing for Data Analysis and Visualization
Current process: Once labeled image and sound data are acquired, code must be written to filter and visualize the data.
AI Use Case: Fine-tune an LLM to convert plain-text requests into code that filters and visualizes the data.
Benefits:
Technical capacity to perform data analysis is often limited. Implementing AI for these tasks would reduce the time and effort required to produce visualizations of the data, which are an important and helpful part of data analysis.
This could also increase Tribal agency by reducing dependence on outside collaborators to produce the visualizations.
This would lower the technical barrier to engaging with the data, allowing a broader set of people in the Tribe to use and understand it.
Considerations:
AI models are imperfect and will always make some level of mistakes. It is important to recognize and stay aware of this - the model does not always output what is intended.
Many LLMs were trained on large amounts of data that may have been used without consent or compensation.
The entire lifecycle of LLMs (and many other deep learning models) has high environmental costs, from hardware manufacturing, to training, to deployment and use. General-purpose LLMs have notoriously high energy costs - much higher than models trained for a specific task.
Even if you host an LLM locally, training the model you're using may have carried high energy, carbon, and water costs. Running the model locally may also be resource intensive, depending on the model.
Due to the high environmental toll of data centers, it's also important to consider the specific land that servers are built on, and which lands and people are affected by the environmental cost.
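The "plain-text request to data-filtering code" idea can be sketched without an LLM at all. The stand-in below uses a simple pattern matcher where a fine-tuned LLM would sit, just to make the workflow (and its failure mode, a request the system can't handle) concrete. All request formats, field names, and example records here are hypothetical.

```python
import re

def parse_request(text):
    """Parse a request like 'show elk after 2023-05-01' into filter params.

    A fine-tuned LLM would handle far more varied phrasing; this rule-based
    stand-in shows the interface. Returns None when the request is not
    understood, so a human can step in rather than trusting bad output.
    """
    match = re.match(r"show (\w+(?: \w+)*) after (\d{4}-\d{2}-\d{2})", text)
    if match is None:
        return None
    return {"species": match.group(1), "after": match.group(2)}

def apply_filter(records, params):
    """Keep records of the requested species seen after the given date."""
    return [r for r in records
            if r["species"] == params["species"] and r["date"] > params["after"]]

records = [
    {"species": "elk", "date": "2023-06-10"},
    {"species": "elk", "date": "2023-04-01"},
    {"species": "gray wolf", "date": "2023-07-02"},
]
params = parse_request("show elk after 2023-05-01")
filtered = apply_filter(records, params)  # one matching record
```

The explicit "request not understood" path matters for the consideration above: whether the translator is a regex or an LLM, its output should be checkable rather than silently trusted.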
Open Questions
What would help our team’s wildlife conservation work?
When are we comfortable using the Cloud for technical workflows?
How do we approach tradeoffs between security concerns and access to crucial technical elements?
How can we build both understanding and consensus among different stakeholder groups within a Tribal entity?
Can we sustainably build AI systems without using the cloud?
Which parts of the technology we build can we make available to other Tribes? Which parts must remain confidential?
Reach out to us to chat, with questions, or to read the white paper!
contact: ccmartinez@berkeley.edu, debruyn@berkeley.edu