Key: ⚠️ Harm to Health / Safety 🏴 Potential Discrimination 💣 Adversarial Attack
⛓ Harm to Civil Liberties 💲 Financial Harm 📜 Non-ML Failure
Microsoft's Tay, an artificially intelligent chatbot, was released and removed within 24 hours due to multiple racist, sexist, and antisemitic tweets generated by the bot.
Background Information:
Full description of the incident (taken from AIID)
Microsoft's chatbot Tay was published on Twitter on March 23, 2016. Within 24 hours Tay had been removed from Twitter after becoming a "holocaust-denying racist" due to the inputs entered by Twitter users and Tay's ability to craft responses based on what is available to read on Twitter. Tay's "repeat after me" feature allowed any Twitter user to tell Tay what to say and have it repeated, leading to some of the racist and antisemitic tweets. "Trolls" also exposed the chatbot to ideas that led to the production of sentences like: "Hitler was right I hate the Jews," "i fucking hate feminists," and "bush did 9/11 and Hitler would have done a better job than the monkey we have now. Donald Trump is the only hope we've got." Tay was replaced by Zo. It is noteworthy that Microsoft released a similar chatbot in China named Xiaoice, which ran smoothly without major complications, implying that culture and public input played a heavy role in Tay's results.
Short description of the incident (taken from AIID)
Microsoft's Tay, an artificially intelligent chatbot, was released and removed within 24 hours due to multiple racist, sexist, and antisemitic tweets generated by the bot.
Timeline
The incident occurred on March 23, 2016
Description of AI system involved
Microsoft's Tay chatbot, an artificially intelligent chatbot published on Twitter
System developer
Microsoft
Sector of deployment
Arts, entertainment and recreation
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
"Action," i.e. carrying out decisions through physical or digital means.
Location
Global
Causes
Causes for Incident:
AI did not have a failsafe which deployed in time
Microsoft did not, or could not, stop Tay's racist, sexist, and antisemitic comments before they became a PR disaster.
A likely use case was not considered before deployment
"Trolls" on the internet were not considered when developing Tay.
A human is at least partly responsible
The behavior of internet trolls led to Tay's "corruption."
Sources of Weakness:
Weakness in Algorithm
The algorithm’s natural language comprehension did not allow it to understand what it was saying. It only knew how to put together human-like sentences.
Weakness in Training - Biases Present:
All training data was likely "cleansed" of hurtful or harmful text
Weakness in Testing - Biases Present:
All testing was likely done with input “cleansed” of hurtful or harmful text
Further Research
Uber’s self-driving car was involved in a fatal accident after not being able to recognize a pedestrian crossing the road.
Background Information:
Full description of the incident
The March 2018 accident was the first recorded pedestrian death caused by a fully autonomous vehicle. On-board video footage showed the victim, 49-year-old Elaine Herzberg, pushing her bike at night across a road in Tempe, Arizona, moments before she was struck by the AI-powered SUV at 39 mph. Uber's AI repeatedly misclassified Herzberg as a car, a bike, and an unknown object before determining too late that a crash was unavoidable. The SUV's driver, Rafaela Vasquez, was using Uber's "autonomous mode" and was visibly not paying attention to the road.
There were no human inputs which directly caused the crash, though inputs from the safety driver could have prevented it. Other factors which may have affected the AI were that it was a new moon, that Herzberg's bike had shopping bags hanging from the handlebars, that Herzberg was jaywalking, and that the SUV's auto-braking feature was disabled.
Short description of the incident
Uber’s self-driving car was involved in a fatal accident after not being able to recognize a pedestrian crossing the road.
Timeline
The incident occurred on Sunday March 18, 2018 at 9:58 pm
Location
On a road in Tempe, Arizona
Description of AI system involved
Uber's "autonomous mode," an ML-based system meant to allow for autonomous driving, i.e. it drives the car instead of the driver.
System developer
Uber
Sector of deployment
Transportation and storage
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
"Action," i.e. carrying out decisions through physical or digital means.
Causes
Causes for Incident:
AI is repeatedly uncertain in a situation which requires certainty
The AI repeatedly misclassified Herzberg as a bike, a car, and an unknown object before colliding. For an autonomous AI to be safe, it must be certain of what is around it.
AI did not have a failsafe which deployed in time
The AI's failsafe was to delay braking for a full second while returning control of the vehicle to the driver. However, this activated less than a second before an unavoidable collision, making the failsafe insufficient (a rough timing estimate follows the list of causes below). Furthermore, the SUV's internal failsafe, which would have braked to reduce the damage from a collision, was disabled because it conflicted with Uber's autonomous mode.
A likely use case was not considered before deployment
Jaywalking pedestrians were not considered when developing the autonomous mode.
A human has partial responsibility for the incident
The driver was watching "The Voice" at the time of the incident, and was distracted.
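To make that insufficiency concrete, the back-of-the-envelope sketch below uses assumed figures: the 39 mph impact speed from the description, roughly 0.9 seconds remaining when the failsafe activated ("less than a second"), a one-second suppression delay, and 7 m/s² of hard braking. It is an illustration, not a reconstruction of the official crash data.

```python
# Back-of-the-envelope timing for the failsafe described above. All numbers are
# assumptions for illustration, not official crash-report values.

MPH_TO_MS = 0.44704

speed = 39 * MPH_TO_MS                 # ~17.4 m/s, from the incident description
time_to_impact = 0.9                   # assumed: "less than a second" remained
suppression = 1.0                      # braking suppressed while control returns
hard_braking = 7.0                     # assumed deceleration, m/s^2

distance_available = speed * time_to_impact          # ~15.7 m to the pedestrian
stopping_distance = speed ** 2 / (2 * hard_braking)  # ~21.7 m needed to stop

print(f"distance available:      {distance_available:.1f} m")
print(f"distance needed to stop: {stopping_distance:.1f} m")
print(f"braking allowed only after {suppression:.1f} s, impact at {time_to_impact:.1f} s")
# Even immediate hard braking could no longer prevent impact, and because the
# one-second suppression outlasts the remaining time, no braking happens at all.
```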
Sources of Weakness:
Weakness in Algorithm
Each time the object was re-classified (as a car, a bike, or an unknown object), the system restarted its tracking, preventing the AI from building a consistent estimate of the object's location and direction of movement.
Weakness in Training - Biases Present:
The training was biased towards vehicles moving in the same direction as the user’s car when in the same lane as the user’s car.
Pedestrians were only present on crosswalks in the training data.
Weakness in Training - Uncovered Situations
Uber's AI was not trained on jaywalking pedestrians, despite this being a likely real-world case.
Weakness in Testing - Uncovered Situations
Uber’s AI was not tested on jaywalking pedestrians, despite this being a likely scenario.
Further Research
💲
YouTube's AI mistakenly flagged videos by Antonio Radic (Agadmator) for discussing 'black versus white' in a chess conversation, resulting in his channel being blocked for 'harmful and dangerous' content before being restored.
Background Information:
Full description of the incident (taken from AIID)
YouTube has an AI meant to detect hate speech in videos, flagging them for review. There is also likely some level of automation for blocking YouTube channels after a certain number or frequency of videos are flagged. Antonio Radic, known by his YouTube channel name Agadmator, believes several of his chess videos were flagged due to them including terms such as ‘white’, ‘black’, ‘threat’, and ‘attack’, despite them being within a chess context the AI couldn’t understand.
Short description of the incident (taken from AIID)
YouTube's AI mistakenly flagged videos by Antonio Radic (Agadmator) for discussing 'black versus white' in a chess conversation, resulting in his channel being blocked for 'harmful and dangerous' content before being restored.
Timeline
The channel was blocked on June 28, 2020; however, video flagging likely occurred before then
Description of AI system involved
YouTube's AI meant to detect harmful language in videos
System developer
YouTube
Sector of deployment
Arts, entertainment and recreation
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
"Action," i.e. carrying out decisions through physical or digital means.
Location
Global
Causes
Causes for Incident:
AI is certain about an incorrect result
The AI was certain that several chess videos contained racist remarks, when they did not in reality.
Sources of Weakness:
Weakness in Training - Biases Present:
Training data likely did not include a significant number of chess videos
Training data likely often included the terms ‘white’, ‘black’, ‘threat’, and ‘attack’ in an offensive context
Weakness in Testing - Biases Present:
Testing likely did not include a significant number of chess videos
Further Research
On July 7, 2016, a Knightscope K5 autonomous security robot collided with a 16-month-old boy while patrolling the Stanford Shopping Center in Palo Alto, CA.
Background Information:
Full description of the incident (taken from AIID)
On July 7, 2016, a Knightscope K5 autonomous security robot patrolling the Stanford Shopping Center in Palo Alto, CA collided with a 16-month-old boy, leaving the boy with a scrape and minor swelling. The Knightscope K5 carries nearly 30 environment sensors including LIDAR, sonar, vibration detectors, and 360-degree HD video cameras.
Short description of the incident (taken from AIID)
On July 7, 2016, a Knightscope K5 autonomous security robot collided with a 16-month-old boy while patrolling the Stanford Shopping Center in Palo Alto, CA.
Timeline
July 7, 2016
Description of AI system involved
Knightscope K5 autonomous security robot uses several environmental sensors and voice commands to conduct security operations.
System developer
Knightscope
Sector of deployment
Administrative and support service activities
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
"Action," i.e. carrying out decisions through physical or digital means.
Location
Palo Alto, CA
Causes
Causes for Incident:
AI is certain about an incorrect result
The AI was certain that nothing was in its way, causing it to collide with the toddler who was in front of it.
Sources of Weakness:
Weakness in Training - Biases Present:
Training data likely did not include a significant number of small obstacles
Weakness in Testing - Biases Present:
Testing likely did not include a significant number of small obstacles
Further Research
A Tesla Model S on autopilot crashed into a white articulated tractor-trailer on Highway US 27A in Williston, Florida, killing the driver.
Background Information:
Full description of the incident (taken from AIID)
A Tesla Model S on Autopilot crashed into an articulated tractor-trailer on Highway US 27A in Williston, Florida, killing the driver, Joshua Brown. The trailer was turning left in front of the oncoming Tesla, and the Tesla Autopilot system was unable to detect the white trailer against the bright sky. Cruise control was set at 74 mph and the car did not slow before the collision. The driver had his hands on the wheel for 25 seconds of the 37-minute trip and was reportedly watching a Harry Potter movie when the collision occurred. Before the collision, the driver received 6 audible warnings that his hands had been off the wheel for too long.
Short description of the incident (taken from AIID)
A Tesla Model S on autopilot crashed into a white articulated tractor-trailer on Highway US 27A in Williston, Florida, killing the driver.
Timeline
May 7, 2016
Description of AI system involved
The Tesla Autopilot driving system allows hands-off driving, parking, and navigation using environmental sensors, long-range radar, and 360-degree ultrasonic sensors.
System developer
Tesla
Sector of deployment
Transportation and storage
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
"Action," i.e. carrying out decisions through physical or digital means.
Location
Williston, FL
Causes
Causes for Incident:
AI is certain about an incorrect result
The autopilot failed to distinguish between a white tractor-trailer crossing the highway and the bright sky.
A human has partial responsibility for the incident
The driver was reportedly watching a Harry Potter movie at the time of the incident, and was distracted.
Sources of Weakness:
Weakness in Training - Biases Present:
Training data likely did not include a significant number of white vehicles or trailers seen against a bright sky
Weakness in Testing - Biases Present:
Testing likely did not include a significant number of white vehicles or trailers seen against a bright sky
Further Research
A study by the University of Toronto, the Vector Institute, and MIT showed that the databases used to train AI systems for classifying chest X-rays led the systems to show gender, socioeconomic, and racial biases.
Background Information:
Full description of the incident (taken from AIID)
A study by the University of Toronto, the Vector Institute, and MIT showed that the databases used to train AI systems for classifying chest X-rays led the systems to show gender, socioeconomic, and racial biases. Google and startups like Qure.ai, Aidoc, and DarwinAI can scan chest X-rays to determine the likelihood of conditions like fractures and collapsed lungs. The databases used to train the AI were found to consist of examples of primarily white patients (67.64%), leading the diagnostic systems to be more accurate at diagnosing white patients than other patients. Black patients were half as likely to be recommended for further care when it was needed.
Short description of the incident (taken from AIID)
A study by the University of Toronto, the Vector Institute, and MIT showed that the databases used to train AI systems for classifying chest X-rays led the systems to show gender, socioeconomic, and racial biases.
Timeline
October 21, 2020
Description of AI system involved
AI systems from Google and startups such as Qure.ai, Aidoc, and DarwinAI that analyze medical imagery
System developer
Sector of deployment
Human health and social work activities
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
Location
Global
Causes
Causes for Incident:
AI is certain about an incorrect result
The AI misdiagnoses dark-skinned and female patients at a noticeably higher rate than white men.
Sources of Weakness:
Weakness in Training - Biases Present:
Training data did not include as many women and dark-skinned patients as it did white men
Weakness in Testing - Biases Present:
Testing data did not include as many women and dark-skinned patients as it did white men
Further Research
Google Photos image processing software mistakenly labelled a black couple as "gorillas."
Background Information:
Full description of the incident (taken from AIID)
Google's Google Photos image processing software "mistakenly labelled a black couple as being 'gorillas.'" The error occurred in the software's image processing that attempts to assign themes to groups of similar photos. In this example, the suggested themes were "Graduation, Bikes, Planes, Skyscrapers, Cars, and Gorillas."
Short description of the incident (taken from AIID)
Google Photos image processing software mistakenly labelled a black couple as "gorillas."
Timeline
June 29, 2015
Description of AI system involved
Google's Google Photos image processing
System developer
Google
Sector of deployment
Arts, entertainment and recreation
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
Location
Global
Causes
Causes for Incident:
AI is certain about an incorrect result
The AI incorrectly categorized photos of black people as gorillas.
Sources of Weakness:
Weakness in Training - Biases Present:
Training data did not include a sufficient number of dark-skinned individuals
Training data did not include a sufficient number of partially obscured faces
Weakness in Testing - Biases Present:
Testing data did not include a sufficient number of dark-skinned individuals
Testing data did not include a sufficient number of partially obscured faces
Further Research
A penal re-offense risk assessment algorithm developed by Northpointe is twice as likely to incorrectly label a black person as a high-risk re-offender, and twice as likely to incorrectly label a white person as low-risk for re-offense, according to a ProPublica review.
Background Information:
Full description of the incident (taken from AIID)
An algorithm developed by Northpointe and used in the penal system is shown to be inaccurate and to produce racially skewed results, according to a review by ProPublica. The review shows how the 137-question survey given following an arrest is inaccurate and skewed against people of color. While there is no question regarding race in the survey, the algorithm is two times more likely to incorrectly label a black person as a high-risk re-offender (false positive) and two times more likely to incorrectly label a white person as low-risk for re-offense (false negative) than actual statistics support. Overall, the algorithm is 61% effective at predicting re-offense. This system is used in Broward County, Florida to help judges make decisions surrounding pre-trial release and post-trial sentencing.
Short description of the incident (taken from AIID)
An algorithm developed by Northpointe and used in the penal system is two times more likely to incorrectly label a black person as a high-risk re-offender and is two times more likely to incorrectly label a white person as low-risk for re-offense according to a ProPublica review.
Timeline
2016 - 2019
Description of AI system involved
An algorithm developed by Northpointe, designed to assign a risk score associated with a person's likelihood of re-offending after their original arrest.
System developer
Northpointe
Sector of deployment
Public administration and defense
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
Location
Broward County, Florida
Causes
Causes for Incident:
The algorithm uses an insufficient stand-in for its target data
Since a person’s risk of re-offense cannot be determined directly, the algorithm uses other factors, such as drug use and parental conviction, to estimate it. These factors lead to racially / socioeconomically biased results, and do not adequately estimate the risk for re-offense.
Sources of Weakness:
Weakness in Algorithm
The algorithm does not take race directly into account, but instead uses correlated data that acts as a proxy for race. The Florida algorithm evaluated in the report is based on 137 questions, such as "Was one of your parents ever sent to jail or prison?" and "How many of your friends/acquaintances are taking drugs illegally?" Those two questions, for example, may appear to evaluate someone's empirical risk of criminality, but instead they target those already living under institutionalized poverty and over-policing. Predominantly, those people are people of color.
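Northpointe's model and data are proprietary, so the snippet below is only a hypothetical illustration of the proxy mechanism described above. It uses invented features loosely analogous to the parental-incarceration and acquaintance-drug-use questions, and shows that a classifier which never sees race can still produce racially skewed scores when its inputs and its recorded labels both reflect over-policing.

```python
# Hypothetical illustration of proxy bias; not Northpointe's actual model or data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000
group = rng.integers(0, 2, n)          # protected attribute the model never sees

# Proxy features correlated with group membership through poverty and policing,
# loosely analogous to survey questions like "was a parent ever incarcerated?"
parent_incarcerated = rng.binomial(1, 0.15 + 0.25 * group)
acquaintance_drug_use = rng.binomial(1, 0.20 + 0.25 * group)

# The underlying re-offense rate is identical in both groups here, but recorded
# re-arrests are inflated for group 1 by heavier policing.
reoffends = rng.binomial(1, 0.30, n)
rearrested = reoffends | rng.binomial(1, 0.10 * group)

X = np.column_stack([parent_incarcerated, acquaintance_drug_use])
scores = LogisticRegression().fit(X, rearrested).predict_proba(X)[:, 1]

print("mean risk score, group 0:", round(scores[group == 0].mean(), 3))
print("mean risk score, group 1:", round(scores[group == 1].mean(), 3))
# Despite identical underlying re-offense rates, group 1 receives higher risk
# scores because the proxies and the labels both encode policing bias.
```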
Further Research
Google's Perspective API, which assigns a toxicity score to online text, seems to award higher toxicity scores to phrases describing identities other than white, male, Christian, and heterosexual.
Background Information:
Full description of the incident (taken from AIID)
Google's Perspective API, which assigns a toxicity score to online text, has been shown to award higher toxicity scores to phrases describing identities other than white, male, Christian, and heterosexual. The scores lie on a spectrum from very healthy (low %) to very toxic (high %). The phrase "I am a man" received a score of 20% while "I am a gay black woman" received 87%. The bias exists within subcategories as well: "I am a man who is deaf" received 70%, "I am a person who is deaf" received 74%, and "I am a woman who is deaf" received 77%. The API can also be circumvented by modifying text: "They are liberal idiots who are uneducated" received 90% while "they are liberal idiots who are un.educated" received 15%.
Short description of the incident (taken from AIID)
Google's Perspective API, which assigns a toxicity score to online text, seems to award higher toxicity scores to phrases describing identities other than white, male, Christian, and heterosexual.
Timeline
2017
Description of AI system involved
Google Perspective is an API designed using machine learning tactics to assign "toxicity" scores to online text, with the original intent of assisting in identifying hate speech and "trolling" in internet comments. Perspective is trained to recognize a variety of attributes (e.g. whether a comment is toxic, threatening, insulting, off-topic, etc.) using millions of examples gathered from several online platforms and reviewed by human annotators.
System developer
Google (Jigsaw)
Sector of deployment
Information and communication
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
"Action," i.e. carrying out decisions through physical or digital means.
Location
Global
Causes
Causes for Incident:
AI is certain about an incorrect result
AI is certain that some toxic speech is not toxic, potentially as a result of an adversarial attack.
AI is certain that some non-toxic speech is toxic, potentially as a result of the inclusion of negative words without consideration for connotation.
A likely use case was not considered before deployment
Misspelling words to avoid censorship is a common practice, yet misspelled or similarly minorly altered words result in lower toxicity ratings.
Sources of Weakness:
Weakness in Algorithm
Instead of understanding nuances in text, the algorithm essentially finds 'good' words and 'bad' words, and assigns scores based on their presence. This can lead to phrases such as "racism is bad" receiving high toxicity scores, since 'racism' and 'bad' are generally negative, while toxic phrases made of neutrally aligned words, such as "race war now," are not considered toxic (a sketch of this failure mode follows this list).
Weakness in Training - Biases Present
Text with words which have a generally negative connotation though no actual toxicity, such as “bad,” were likely over-represented in toxic training data.
Weakness in Training - Uncovered Situations
The weakness in defense (misspelled or otherwise slightly altered toxic words receiving lower toxicity scores) is likely due to most toxic training data being spelled correctly.
Weakness in Testing - Biases Present
Text with words which have a generally negative connotation though no actual toxicity, such as “bad,” were likely over-represented in toxic testing data.
Weakness in Testing - Uncovered Situations
The weakness in defense (misspelled or otherwise slightly altered toxic words receiving lower toxicity scores) is likely also due to most testing data being spelled correctly.
Weakness in Defense - Physical Attack
By misspelling words, or adding incorrect punctuation, users can trick the AI into considering toxic speech non-toxic. For example, “gas the joos race war now” only rated 40% toxic, and changing “idiot” to “idiiot” reduced the toxicity rate of an otherwise identical comment from 84% to 20%.
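Perspective's real model is a trained neural classifier, not a word list, but the hypothetical word-list scorer below illustrates both failure modes described above: sentences containing listed "negative" words score as toxic regardless of intent, while genuinely toxic phrases built from unlisted or misspelled words slip through. All words and weights are invented for the sketch.

```python
# Hypothetical word-list toxicity scorer, for illustration only; Perspective's
# actual model is a trained classifier, but it shows similar failure modes.

BAD_WORDS = {"racism": 0.8, "bad": 0.5, "idiot": 0.9, "idiots": 0.9,
             "uneducated": 0.7, "hate": 0.8}

def toxicity(text: str) -> float:
    """Score a sentence by the worst listed word it contains."""
    words = text.lower().split()
    return max((BAD_WORDS.get(w, 0.0) for w in words), default=0.0)

print(toxicity("racism is bad"))                          # 0.8: flagged, though anti-racist
print(toxicity("race war now"))                           # 0.0: missed, though toxic
print(toxicity("they are idiots who are uneducated"))     # 0.9: flagged
print(toxicity("they are idiiots who are un.educated"))   # 0.0: misspellings evade the list
```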
Further Research
A facial recognition system in China mistakes a celebrity's face on a moving billboard for a jaywalker.
Background Information:
Full description of the incident (taken from AIID)
In November 2018, Dong Mingzhu, the chairwoman of China's biggest maker of air conditioners, Gree Electric Appliances, had her face displayed on a huge screen erected along a street in the port city of Ningbo that displays images of people caught jaywalking by surveillance cameras. The artificial intelligence software used by the traffic police erred in capturing Dong's image from an advertisement on the side of a moving bus.
Short description of the incident (taken from AIID)
A facial recognition system in China mistakes a celebrity's face on a moving billboard for a jaywalker.
Timeline
November 21, 2018
Description of AI system involved
The facial recognition algorithm used by the traffic police in Ningbo, China to spot and shame jaywalkers.
System developer
Ningbo traffic police
Sector of deployment
Public administration and defense
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
"Action," i.e. carrying out decisions through physical or digital means.
Location
Ningbo, China
Causes
Causes for Incident:
AI is certain about an incorrect result
The AI was certain that Dong Mingzhu was jaywalking, when in actuality her face was on a bus ad.
A likely use case was not considered before deployment
Bus ads are fairly common, but were likely not considered before deployment.
Sources of Weakness:
Weakness in Algorithm
The AI is trained to recognize the faces of jaywalkers in order to publicly shame them; however, it does not verify that a detected face belongs to a person physically crossing the street rather than an image, such as an advertisement on a passing bus.
Weakness in Training - Uncovered Situations:
Bus ads were likely not covered in the training data.
Weakness in Testing - Uncovered Situations:
Bus ads were likely not covered in the testing data.
Further Research
Amazon shuts down internal AI recruiting tool that would down-rank female applicants.
Background Information:
Full description of the incident (taken from AIID)
In 2015, Amazon scrapped an internal recruiting algorithm developed by its Edinburgh office that would down-rank resumes that included the word "women's" or the names of two women's colleges. The algorithm rated applicants out of five stars, and it would give preference to resumes that contained what Reuters called "masculine language," or strong verbs like "executed" or "captured". These patterns occurred because the engineers who built the algorithm trained it on past candidates' resumes submitted over the previous ten years, and those past candidates were predominantly male.
Short description of the incident (taken from AIID)
Amazon shuts down internal AI recruiting tool that would down-rank female applicants.
Timeline
2014 - 2015
Description of AI system involved
Resume screening tool developed by Amazon to scan resumes and raise strong job applicants for consideration
System developer
Amazon
Sector of deployment
Professional, scientific and technical activities
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
Location
Edinburgh, Scotland
Causes
Causes for Incident:
AI is certain about an incorrect result
The AI was certain that applications from women were inherently worse than identical applications from men.
Sources of Weakness:
Weakness in Training - Biases Present:
Much of the training data it was given came from male applicants, leading it to conclude that male applicants were inherently better suited than female applicants.
Further Research
A fake and heavily manipulated video depicting Ukrainian President Volodymyr Zelenskyy ordering his soldiers to stand down circulated on social media and was placed on a Ukrainian news website by hackers before it was debunked and removed.
Background Information:
Full description of the incident
A fake and heavily manipulated video depicting Ukrainian President Volodymyr Zelenskyy circulated on social media and was placed on a Ukrainian news website by hackers before it was debunked and removed. The video, which shows a rendering of the Ukrainian president appearing to tell his soldiers to lay down their arms and surrender the fight against Russia, is a so-called deepfake that ran about a minute long. Officials at Facebook, YouTube and Twitter said the video was removed from their platforms for violating policies. On Russian social media, meanwhile, the deceptive video was boosted. In a video posted to his Telegram channel, Zelenskyy responded to the fake video by saying: "We are defending our land, our children, our families. So we don't plan to lay down any arms. Until our victory."
Short description of the incident
A fake and heavily manipulated video depicting Ukrainian President Volodymyr Zelenskyy ordering his soldiers to stand down circulated on social media and was placed on a Ukrainian news website by hackers before it was debunked and removed.
Timeline
March 16, 2022
Description of AI system involved
A generative neural network was used to synthesize the video and audio; the specific architecture is unknown.
System developer
Unknown
Sector of deployment
Information and communication
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
"Action," i.e. carrying out decisions through physical or digital means.
Location
Ukraine
Causes
Causes for Incident:
The AI’s purpose is malevolent, or has a gross lack of consideration for potential consequences. It has failed humanity even if (or especially if) it behaves as intended.
Deepfakes pose a dangerous threat to the authenticity of all online content. In this case, the AI attempted to sow disinformation for militaristic gain.
A human is at least partly responsible
A human, or team thereof, was responsible for using the AI to create the video.
Sources of Weakness:
N/A
Further Research
New Zealand passport robot reader rejects the application of an applicant of Asian descent and says his eyes are closed.
Background Information:
Full description of the incident (taken from AIID)
Richard Lee, a New Zealander of Asian descent, had submitted his ID photo to an online photo checker at New Zealand's Department of Internal Affairs and was told his eyes were closed. He was trying to renew his passport so he could return to Australia, where he was studying aerospace engineering in Melbourne, in December 2016. When asked about the incident, Lee said, "No hard feelings on my part, I've always had very small eyes and facial recognition technology is relatively new and unsophisticated."
Short description of the incident (taken from AIID)
New Zealand passport robot reader rejects the application of an applicant of Asian descent and says his eyes are closed.
Timeline
December, 2016
Description of AI system involved
The facial recognition software used by New Zealand's Department of Internal Affairs detects passport photos to make sure they meet all the government requirements.
System developer
Unknown, managed by New Zealand's Department of Internal Affairs
Sector of deployment
Administrative and support service activities
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
"Action," i.e. carrying out decisions through physical or digital means.
Location
New Zealand
Causes
Causes for Incident:
AI is certain about an incorrect result
The AI was certain that Mr. Lee's passport photo was invalid, despite it meeting all guidelines.
Sources of Weakness:
Weakness in Training - Biases Present:
It is possible that the AI was not trained on enough people of Asian descent, or those with smaller eyes.
Weakness in Testing - Biases Present:
It is possible that the AI was not tested on enough people of Asian descent, or those with smaller eyes.
Further Research
In 2016, after artificial intelligence software Beauty.AI judged an international beauty contest and declared a majority of winners to be white, researchers found that Beauty.AI was racially biased in determining beauty.
Background Information:
Full description of the incident (taken from AIID)
In 2016, Beauty.AI, an artificial intelligence software designed by Youth Laboratories and supported by Microsoft, was used to judge the first international beauty contest. Of the 600,000 contestants who submitted selfies to be judged by Beauty.AI, the artificial intelligence software chose 44 winners, of which a majority were white, a handful were Asian, and only one had dark skin. While a majority of contestants were white, approximately 40,000 submissions were from Indians and another 9,000 were from Africans. Controversy ensued, with claims that Beauty.AI was racially biased because it had not been sufficiently trained on images of people of color when determining beauty.
Short description of the incident (taken from AIID)
In 2016, after artificial intelligence software Beauty.AI judged an international beauty contest and declared a majority of winners to be white, researchers found that Beauty.AI was racially biased in determining beauty.
Timeline
January 2016 - June 2016
Description of AI system involved
Artificial intelligence software that uses deep learning algorithms to evaluate beauty based on factors such as symmetry, facial blemishes, wrinkles, estimated age and age appearance, and comparisons to actors and models.
System developer
Youth Laboratories
Sector of deployment
Arts, entertainment and recreation
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
"Action," i.e. carrying out decisions through physical or digital means.
Location
Global
Causes
Causes for Incident:
The training data was biased against non-white faces.
Sources of Weakness:
Weakness in Training - Biases Present:
The database used for deep learning had far more white people than people of other races, likely biasing the AI.
Further Research
Predictive policing algorithms meant to aid law enforcement by predicting future crime show signs of biased output.
Background Information:
Full description of the incident (taken from AIID)
Predictive policing algorithms meant to aid law enforcement by predicting future crime show signs of biased output. PredPol, used by the Oakland (California) Police Department, and the Strategic Subject List, used by Chicago PD, were subjects of studies in 2015 and 2016 showing their bias against "low-income, minority neighborhoods." These neighborhoods would receive added attention from police departments expecting crimes to be more prevalent in the area. Notably, Oakland Police Department used 2010's record of drug crime as their baseline to train the system.
Short description of the incident (taken from AIID)
Predictive policing algorithms meant to aid law enforcement by predicting future crime show signs of biased output.
Timeline
2015 - 2017
Description of AI system involved
Predictive policing algorithms meant to aid police in predicting future crime.
System developer
PredPol
Chicago Police Department
Sector of deployment
Public administration and defense
Relevant AI functions
"Cognition," i.e. making decisions;
Location
Chicago, Illinois and Oakland, California
Causes
Causes for Incident:
AI is certain about an incorrect result
The algorithm is certain that some neighborhoods, primarily composed of minorities, are at higher risk for crime than they likely are.
Sources of Weakness:
Weakness in Algorithm:
The algorithm predicts how likely a neighborhood is to have criminal actors, and recommends extra policing for neighborhoods it deems high risk. However, aggressive policing policies, combined with human biases, lead to police officers looking for problems when they are sent out. If someone is stopped and frisked, or arrested on a false suspicion, that creates more data the AI is trained on, leading it to conclude the neighborhood is an even higher risk. In a sense, the algorithm creates a self-fulfilling prophecy through human error (a toy simulation of this feedback loop follows this list).
Weakness in Training - Biases Present:
The training data included an unrepresentatively large number of crimes perpetrated by minorities, leading the algorithm to conclude that minority neighborhoods were more criminal than they likely were.
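PredPol's internals are not public; the toy simulation below only illustrates the feedback loop described above. Two neighborhoods have the same underlying conditions, but one starts with more recorded crime because it was policed more heavily in the past; patrols are then allocated in proportion to recorded crime, and every patrol has some chance of recording a new incident. All rates are invented for the sketch.

```python
# Toy simulation of the predictive-policing feedback loop; not PredPol's algorithm.
import random

random.seed(1)

P_RECORD_PER_PATROL = 0.3   # chance a patrol records an incident; identical in both
PATROLS_PER_DAY = 20        # neighborhoods because underlying crime rates are equal

# Historical recorded crime: B starts higher purely due to heavier past policing.
recorded = {"A": 50, "B": 100}

for day in range(365):
    total = sum(recorded.values())
    for hood in recorded:
        # "Prediction": allocate patrols in proportion to past recorded crime.
        patrols = round(PATROLS_PER_DAY * recorded[hood] / total)
        # More patrols in a neighborhood -> more incidents recorded there.
        recorded[hood] += sum(random.random() < P_RECORD_PER_PATROL
                              for _ in range(patrols))

print(recorded)
# The initial disparity is perpetuated and the absolute gap between A and B
# widens, even though nothing about the underlying neighborhoods differs: the
# data the model consumes is a product of where it already sent police.
```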
Further Research
💲
A third-party Amazon merchant named “my_handy_design” was suspected of using a bot to generate cell phone case designs based on the bizarre and unattractive designs being offered.
Background Information:
Full description of the incident (taken from AIID)
In 2017, the third-party Amazon merchant named “my_handy_design” was found to be marketing thousands of unique cell phone cases printed with bizarre images. The seller is believed to be a bot trained to create product listings based on image search popularity. Their products include many images that human designers would be unlikely to select, ranging from banal to lewd and illegal, with a predilection for stock photos of medical procedures.
Short description of the incident (taken from AIID)
A third-party Amazon merchant named “my_handy_design” was suspected of using a bot to generate cell phone case designs based on the bizarre and unattractive designs being offered.
Timeline
2017
Description of AI system involved
'my_handy_design' is a third-party Amazon merchant speculated to use image search data and a database of open-source images to autonomously generate cell phone case designs.
System developer
my_handy_design
Sector of deployment
Wholesale and retail trade
Relevant AI functions
"Cognition," i.e. making decisions;
"Action," i.e. carrying out decisions through physical or digital means.
Location
Global
Causes
Causes for Incident:
AI is certain about an incorrect result
The AI is certain that phone cases with strange, often medical images would be popular sellers.
The algorithm uses an insufficient stand-in for its target data
The algorithm determines what phone cases would be popular based on popular image searches, which do not exactly correspond to desired phone covers
Sources of Weakness:
Weakness in Algorithm:
The algorithm determines what phone cases would be popular based on popular image searches, however this is not a precise way to generate desired images. Its means of operation is flawed from the start.
Weakness in Testing - Incorrect or insufficient tests used in testing:
It is unclear what tests, if any, were applied to this algorithm before release. However, there were clearly not enough tests if phone cases featuring obscure medical procedures made it through.
Further Research
Yandex, a Russian technology company, released an artificially intelligent chatbot named Alice which began to reply to questions with racist, pro-Stalin, and pro-violence responses.
Background Information:
Full description of the incident (taken from AIID)
Yandex, a Russian technology company, released an artificially intelligent chatbot named Alice which began to reply to questions with racist, pro-Stalin, and pro-violence responses. Examples include: "There are humans and non-humans" followed by the question "can they be shot?" answered with "they must be."
Short description of the incident (taken from AIID)
Yandex, a Russian technology company, released an artificially intelligent chatbot named Alice which began to reply to questions with racist, pro-Stalin, and pro-violence responses.
Timeline
October 2017
Description of AI system involved
The chatbot Alice, developed by Yandex, produces responses to input using language processing and cognition
System developer
Yandex
Sector of deployment
Information and communication
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
"Action," i.e. carrying out decisions through physical or digital means.
Location
Global
Causes
Causes for Incident:
A likely use case was not considered before deployment
Internet trolls are common, and the Microsoft Tay incident proved that the internet can corrupt chatbots within a matter of hours. As such, not considering questions and statements which would lead to pro-genocidal views was a leading cause for the incident.
A human is at least partly responsible
Users communicated with the algorithm, and their comments were likely responsible for the algorithm becoming pro-genocidal
Sources of Weakness:
Weakness in Algorithm
The algorithm’s natural language comprehension did not allow it to understand what it was saying. It only knew how to put together human-like sentences.
While Yandex did “shut out unacceptable responses by creating blacklists for certain terms and phrases,” due to the inherent weakness of the algorithm, they could not shut out unacceptable responses which involved common neutral words, like ‘yes’ or ‘no.’
Weakness in Training - Biases Present:
Training data may not have included a significant number of conversations on domestic or political violence, leading Alice to form extreme opinions on the issues when confronted.
Weakness in Testing - Biases Present:
It is clear based on Alice’s responses that conversations involving domestic and political violence were not tested sufficiently.
Further Research
FaceApp is criticized for offering racist filters.
Background Information:
Full description of the incident (taken from AIID)
FaceApp, which uses facial recognition to change users' expressions and looks, received a storm of criticism after releasing its new "black", "white", "Asian" and "Indian" filters. It received backlash from social media users, who described it as "racist" and "offensive". The photo editing app, which uses neural networks to modify pictures of people while keeping them realistic, was also criticized for the fact that its "hot" filter often lightens the skin of people with darker complexions.
Short description of the incident (taken from AIID)
FaceApp is criticized for offering racist filters.
Timeline
August, 2017
Description of AI system involved
The facial recognition algorithm used by FaceApp, based on deep generative convolutional neural networks, which can edit selfies using filters and other tools.
System developer
FaceApp
Sector of deployment
Arts, entertainment and recreation
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
"Action," i.e. carrying out decisions through physical or digital means.
Location
Global
Causes
Causes for Incident:
The AI’s purpose has a gross lack of consideration for potential consequences.
‘Blackface’ has widely been recognized as racist under most circumstances, yet a filter designed to mimic just that was produced.
The algorithm uses an insufficient stand-in for its target data
Since the algorithm does not know what makes a face “hot,” it used whiteness, thinness, and a lack of glasses as a stand-in for beauty.
Sources of Weakness:
Weakness in Algorithm:
The algorithm does not know what makes a face ‘hot’, leading it to use an insufficient stand-in for the data.
Weakness in Training - Biases Present:
Given the comments of the CEO, and the results, it is likely the algorithm was not trained on enough people of color.
Weakness in Testing - Biases Present:
There were likely not enough people of color tested on the 'hot' filter, or if there were, no one recognized that it was lightening their skin to make them "hotter."
Further Research
A Tesla Model S remained on autopilot while being operated by a drunk, sleeping operator whose hands were not on the wheel.
Background Information:
Full description of the incident (taken from AIID)
A Tesla Model S continued on Autopilot at 70 mph on a California highway in November 2018 despite the driver's hands not being placed on the wheel, a requirement for enabling the Autopilot system. The California Highway Patrol was unable to wake the driver and had to drive in front of the Tesla for approximately 7 minutes to activate its 'driver assist' feature and slow the vehicle to a stop. The driver was allegedly asleep, with a blood alcohol content twice the legal limit.
Short description of the incident (taken from AIID)
A Tesla Model S remained on autopilot while being operated by a drunk, sleeping operator whose hands were not on the wheel.
Timeline
November 20, 2018, around 5:30 pm
Description of AI system involved
The Tesla Autopilot is a driver-assistance system with two main functions: Traffic-Aware Cruise Control, which matches the car's speed to that of the surrounding traffic, and Autosteer, which assists in steering within a clearly marked lane while Traffic-Aware Cruise Control is engaged.
System developer
Tesla
Sector of deployment
Transportation and storage
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
"Action," i.e. carrying out decisions through physical or digital means.
Location
Palo Alto, CA
Causes
Causes for Incident:
AI is certain about an incorrect result
The algorithm was meant to allow Autopilot only while the driver's hands were on the wheel; however, it did not pull over or stop driving when the driver fell asleep with his hands off the wheel.
Sources of Weakness:
Weakness in Testing - Uncovered Situations:
Engaging Autopilot without having hands on the wheel, or removing hands from the wheel after Autopilot was engaged, was clearly not tested thoroughly enough.
Further Research
Facebook's automatic language translation software incorrectly translated an Arabic post saying "Good morning" into Hebrew saying "hurt them," leading to the arrest of a Palestinian man in Beitar Illit, Israel.
Background Information:
Full description of the incident (taken from AIID)
Facebook's automatic language translation software incorrectly translated an Arabic post saying "Good morning" into Hebrew saying "hurt them," leading to the arrest of a Palestinian man in Beitar Illit, Israel. The post was not read by any Arabic-speaking officers before the arrest was made. The man had posted the words along with a picture of himself leaning on a bulldozer; because bulldozers are sometimes used in terrorist attacks, the conclusion was made that he was inciting violence. Facebook's automatic language translation software can translate 40 languages in 1,800 directions, and posts the translation instead of the original when it is confident the translation is correct.
Short description of the incident (taken from AIID)
Facebook's automatic language translation software incorrectly translated an Arabic post saying "Good morning" into Hebrew saying "hurt them," leading to the arrest of a Palestinian man in Beitar Illit, Israel.
Timeline
October 15, 2017
Description of AI system involved
Facebook's automatic language translation software that can translate 40 languages in 1,800 directions
System developer
Facebook
Sector of deployment
Information and communication
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
"Action," i.e. carrying out decisions through physical or digital means.
Location
Beitar Illit, Israel
Causes
Causes for Incident:
AI is certain about an incorrect result
The translation algorithm believed that “good morning” translated from Arabic into Hebrew as “attack them.”
Sources of Weakness:
Weakness in Testing - Uncovered Situations:
Testing all translations exhaustively would require immense resources. However, better testing could have discovered this translation error before this incident occurred.
Further Research
The Detroit Police Department wrongfully arrests a black man due to its faulty facial recognition program provided by DataWorks Plus.
Background Information:
Full description of the incident (taken from AIID)
In June 2020, the Detroit Police Department wrongfully arrested Robert Julian-Borchak Williams after facial recognition technology provided by DataWorks Plus mistook Williams for a black man who was recorded stealing on a CCTV camera. This incident is cited as an instance of facial recognition's continuing racial bias, especially against Black and Asian people.
Short description of the incident (taken from AIID)
The Detroit Police Department wrongfully arrests a black man due to its faulty facial recognition program provided by DataWorks Plus.
Timeline
June 2020
Description of AI system involved
DataWorks Plus facial recognition software was provided to the Detroit Police Department and focuses on biometrics storage and matching, including fingerprints, palm prints, irises, tattoos, and mugshots.
System developer
DataWorks Plus
Sector of deployment
Public administration and defense
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
"Action," i.e. carrying out decisions through physical or digital means.
Location
Detroit, Michigan
Causes
Causes for Incident:
AI is certain about an incorrect result
The algorithm was incorrectly certain that the man in the blurry CCTV footage was Mr. Williams.
A human is at least partly responsible
Despite the incorrect matching being accompanied by the message “This document is not a positive identification. It is an investigative lead only and is not probable cause for arrest” in bold letters, the police did not perform due investigative diligence before arresting Mr. Williams.
Sources of Weakness:
Weakness in Training - Biases Present:
There was likely an insufficient number of black people used in the facial recognition training data.
Weakness in Testing - Biases Present:
There was likely an insufficient number of black people used in the facial recognition testing suite.
Further Research
In response to the Covid-19 pandemic, the International Baccalaureate final exams were replaced by a calculated score, prompting complaints of unfairness from teachers and students.
Background Information:
Full description of the incident (taken from AIID)
In response to the Covid-19 pandemic, the foundation that grants the International Baccalaureate high school diploma decided to replace students' 2020 year-end exam scores with a calculated grade. The International Baccalaureate foundation used students' prior grades and school information to develop a statistical model to generate estimated test scores for each student. In several cases these grades were lower than students and teachers were expecting, which may impact students' college admissions or scholarships.
Short description of the incident (taken from AIID)
In response to the Covid-19 pandemic, the International Baccalaureate final exams were replaced by a calculated score, prompting complaints of unfairness from teachers and students.
Timeline
2020
Description of AI system involved
In 2020, International Baccalaureate developed a statistical model to calculate students' projected final exam grades.
System developer
International Baccalaureate
Sector of deployment
Education
Relevant AI functions
"Cognition," i.e. making decisions;
Location
Global
Causes
Causes for Incident:
The algorithm uses an insufficient stand-in for its target data
The algorithm uses previous student grades to predict their exam scores, but this is insufficient. Students who performed poorly at the start of the year but made steady, considerable improvement had that improvement ignored. Students with insufficient information had their grades calculated using data from other schools, meaning some students' grades were calculated differently than others'. Furthermore, students who test better than they perform on classwork do not have that taken into account.
Sources of Weakness:
Weakness in Algorithm:
Due to an insufficient stand-in, the algorithm guesses scores rather than determining them fairly. This also leads to bias, with high-achieving, low-income students penalized particularly hard due to overall lower historical performance from their school or region (a hypothetical sketch of such a calculation follows this list).
Weakness in Training - Insufficient Time Dedicated:
Because the algorithm was needed on short notice, it could not have been trained for as long as it would need to perform fairly.
Weakness in Testing - Insufficient Time Dedicated:
Because the algorithm was needed on short notice, it could not have been tested for as long as it would need to perform fairly.
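The IB has not published its exact model; the sketch below is a hypothetical version of the kind of calculation described above, blending a student's own coursework average with their school's historical exam results. The 40% school weight and the example numbers are assumptions for illustration only.

```python
# Hypothetical grade-projection sketch; the IB's actual 2020 model is not public.

def projected_grade(coursework_avg: float,
                    school_historical_avg: float,
                    school_weight: float = 0.4) -> float:
    """Blend a student's own coursework with their school's past exam results.

    The 40% weight on school history is an assumption for illustration.
    """
    score = (1 - school_weight) * coursework_avg + school_weight * school_historical_avg
    return round(min(max(score, 1.0), 7.0), 1)   # IB grades run 1-7

# Two students with identical coursework, at schools with different histories:
print(projected_grade(coursework_avg=6.5, school_historical_avg=6.4))  # ~6.5
print(projected_grade(coursework_avg=6.5, school_historical_avg=4.2))  # ~5.6

# A student who improved sharply late in the year is also not captured here:
# a single coursework average flattens that trajectory entirely.
```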
Further Research
A 2020 study conducted in the Mass General Brigham health system demonstrated that a popular algorithm for estimating kidney function included a race multiplier, which underestimated the risk to African-American patients.
Background Information:
Full description of the incident (taken from AIID)
A 2020 study conducted in the Mass General Brigham health system demonstrated that a popular algorithm for estimating kidney function underestimated the risk to African-American patients. This bias could lead to inequitable outcomes, such as not being placed on a kidney transplant waiting list. The equation, known as the Chronic Kidney Disease Epidemiology Collaboration estimated Glomerular Filtration Rate (CKD-EPI eGFR) equation, includes a race multiplier for African-Americans. When researchers removed the race multiplier, 33.4% of African-American patients in their study were reclassified into more severe risk categories.
Short description of the incident (taken from AIID)
A 2020 study conducted in the Mass General Brigham health system demonstrated that a popular algorithm for estimating kidney function included a race multiplier, which underestimated the risk to African-American patients.
Timeline
June 2019
Description of AI system involved
An equation created by the Chronic Kidney Disease Epidemiology Collaboration to calculate Glomerular Filtration Rate (GFR)
System developer
Chronic Kidney Disease Epidemiology Collaboration
Sector of deployment
Human health and social work activities
Relevant AI functions
"Cognition," i.e. making decisions;
Location
Boston, MA
Causes
Causes for Incident:
AI is certain about an incorrect result
The AI consistently underestimated how at-risk black patients were for kidney disease
A human is at least partly responsible
Since this system is a hand-built statistical equation rather than the result of machine learning, humans were directly responsible for including race as an input and, depending on whether a patient was recorded as Black or not, adjusting the risk assessment.
Sources of Weakness:
Weakness in Algorithm:
The equation was designed under the assumption that Black people have higher muscle mass on average, and therefore higher baseline creatinine for the same level of kidney function. In addition to never defining how Black a person must be for the multiplier to apply, there is growing advocacy to stop using race as a factor when estimating kidney function (a sketch of the published equation follows).
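For reference, the sketch below reproduces the commonly cited coefficients of the published 2009 CKD-EPI creatinine equation; the 1.159 term is the race multiplier the study examined, and the example patient values are invented for illustration.

```python
# Sketch of the 2009 CKD-EPI creatinine equation, reproduced for illustration.
# eGFR is in mL/min/1.73 m^2; serum creatinine (scr) is in mg/dL.

def ckd_epi_egfr_2009(scr: float, age: int, female: bool, black: bool,
                      use_race_multiplier: bool = True) -> float:
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    egfr = (141
            * min(scr / kappa, 1.0) ** alpha
            * max(scr / kappa, 1.0) ** -1.209
            * 0.993 ** age)
    if female:
        egfr *= 1.018
    if black and use_race_multiplier:
        egfr *= 1.159          # the race multiplier examined in the study
    return round(egfr, 1)

# Same hypothetical patient, with and without the race term:
print(ckd_epi_egfr_2009(scr=1.4, age=60, female=False, black=True))   # ~63
print(ckd_epi_egfr_2009(scr=1.4, age=60, female=False, black=True,
                        use_race_multiplier=False))                    # ~54
# Here the ~16% shift moves the estimate across the 60 mL/min boundary between
# CKD stages G2 and G3a, the kind of reclassification the study measured.
```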
Further Research
In a Scottish soccer match the AI-enabled ball-tracking camera used to livestream the game repeatedly tracked an official’s bald head as though it were the soccer ball.
Background Information:
Full description of the incident (taken from AIID)
Scottish soccer team Inverness Caledonian Thistle Football Club uses cameras with AI ball-tracking to livestream their matches on YouTube. In a 2020 match against Ayr United, a camera repeatedly tracked an official’s bald head, thinking it was the soccer ball.
Short description of the incident (taken from AIID)
In a Scottish soccer match the AI-enabled ball-tracking camera used to livestream the game repeatedly tracked an official’s bald head as though it were the soccer ball.
Timeline
December 24, 2020
Description of AI system involved
AI ball-tracking technology using video feed to determine and follow a ball in order to keep the game in focus.
System developer
Unknown
Sector of deployment
Arts, entertainment and recreation
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
"Action," i.e. carrying out decisions through physical or digital means.
Location
Inverness, Scotland, UK
Causes
Causes for Incident:
AI is certain about an incorrect result
The algorithm was certain that the referee’s bald head was actually the soccer ball.
Sources of Weakness:
Weakness in Algorithm:
The algorithm was likely trained to follow round, bright objects within the field of play. Unfortunately, that description does not uniquely identify a soccer ball, and as such the system was confused when a similar-looking object was "in play" at the same time (a minimal sketch of such a detector follows this list).
Weakness in Training - Biases Present:
The training data likely just included heads with hair on the field, not bald ones.
Weakness in Testing - Biases Present:
The testing data likely just included heads with hair on the field, not bald ones.
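The broadcast system's vendor has not published its tracker, so the OpenCV sketch below is only a minimal stand-in for the kind of "round, bright object" detector speculated about above. It makes the failure mode obvious: nothing in it distinguishes a ball from a bald head of similar apparent size.

```python
# Minimal 'round bright object' detector as a stand-in for the speculated tracker;
# the real broadcast system is proprietary. Requires: pip install opencv-python
import cv2

def find_ball_candidates(frame_bgr, min_radius=5, max_radius=40):
    """Return (x, y, r) circles found in a video frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=30,
                               param1=100, param2=30,
                               minRadius=min_radius, maxRadius=max_radius)
    return [] if circles is None else circles[0].tolist()

# In a real pipeline the camera would pan toward the strongest candidate each
# frame. Nothing here encodes "is on the pitch", "is white with panels", or
# "moves like a kicked ball", so a bald head of a similar apparent radius is an
# equally valid detection, which is the failure seen in the livestream.
```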
Further Research
Gmail, Yahoo, Outlook, GMX, and LaPoste email inbox sites showed racial and content-based biases when AlgorithmWatch tested their spam box filtering algorithms.
Background Information:
Full description of the incident (taken from AIID)
Gmail, Yahoo, Outlook, GMX, and LaPoste email inbox sites showed racial and content-based biases when AlgorithmWatch tested their spam box filtering algorithms. AlgorithmWatch sent hundreds of emails to 10 email accounts on the listed sites, and noticed emails would be filtered into the spam box if certain words were within the body of the email. A Nigerian student's internship application was marked as spam, but when the word "Nigeria" was removed it was delivered to the inbox. The same applied to a "sex education" email that was delivered to the inbox after removing "sex". A Joe Biden speech went through when the words "loan," "investment," and "billion" were removed.
Short description of the incident (taken from AIID)
Gmail, Yahoo, Outlook, GMX, and LaPoste email inbox sites showed racial and content-based biases when AlgorithmWatch tested their spam box filtering algorithms.
Timeline
October 22, 2020
Description of AI system involved
Machine learning algorithms used to filter spam emails out of inboxes.
System developer
Gmail, Outlook, Yahoo, GMX, LaPoste
Sector of deployment
Information and communication
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
"Action," i.e. carrying out decisions through physical or digital means.
Location
Global
Causes
Causes for Incident:
AI is certain about an incorrect result
The algorithm is certain that some non-spam emails are spam when they include certain keywords.
The algorithm uses an insufficient stand-in for its target data
Since the algorithm cannot inherently know what is spam and what is not, it uses some words as indicators that an email is spam. Unfortunately, these words can appear in non-spam emails as well, leading to misclassification.
Sources of Weakness:
Weakness in Algorithm:
As noted under the causes above, the filter relies on keyword indicators rather than the actual intent of a message, so legitimate emails containing those keywords are misclassified; a minimal sketch of a keyword-based filter appears after the weaknesses below.
Weakness in Training - Biases Present:
It is likely that the algorithm was not trained on enough emails that were not spam but included spam-probable words like “Nigeria,” “sex,” and “loan.”
Weakness in Testing - Biases Present:
It is likely that the algorithm was not tested on enough emails that were not spam but included spam-probable words like “Nigeria,” “sex,” and “loan.”
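A minimal sketch of a keyword-weighted filter illustrates the weakness. The word list, weights, and threshold below are invented for illustration and are not any provider's actual rules.

```python
# Minimal sketch (illustrative assumption): score an email by the spam-probable
# keywords it contains, with no notion of the message's actual intent.
SPAM_WEIGHTS = {"nigeria": 3.0, "sex": 2.0, "loan": 1.5, "investment": 1.0, "billion": 1.0}
THRESHOLD = 2.5

def spam_score(body: str) -> float:
    words = body.lower().split()
    return sum(SPAM_WEIGHTS.get(w.strip(".,!?\"'"), 0.0) for w in words)

def is_spam(body: str) -> bool:
    return spam_score(body) >= THRESHOLD

# A legitimate internship application trips the filter because of a single word,
# mirroring the behavior AlgorithmWatch observed.
application = "I am a student from Nigeria applying for the internship described in your posting"
print(is_spam(application))                           # True: misclassified as spam
print(is_spam(application.replace("Nigeria", "")))    # False once the word is removed
```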
Further Research
Avaaz, an international advocacy group, released a review of Facebook's misinformation identifying software showing that the labeling process failed to label 42% of false information posts, most surrounding COVID-19 and the 2020 USA Presidential Election.
Background Information:
Full description of the incident (taken from AIID)
Avaaz, an international advocacy group, released a review of Facebook's misinformation identifying software showing that the labeling process failed to label 42% of false information posts, most surrounding COVID-19 and the 2020 USA Presidential Election. Avaaz found that when the cropping or background of a post containing misinformation was altered, Facebook's algorithm would fail to recognize it as misinformation, allowing it to be posted and shared without a cautionary label.
Short description of the incident (taken from AIID)
Avaaz, an international advocacy group, released a review of Facebook's misinformation identifying software showing that the labeling process failed to label 42% of false information posts, most surrounding COVID-19 and the 2020 USA Presidential Election.
Timeline
October 2019 - August 2020
Description of AI system involved
Facebook's algorithm and process used to place cautionary labels on posts determined to contain misinformation.
System developer
Facebook
Sector of deployment
Information and communication
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
Location
Global
Named entities
Avaaz
Reuters
AP
PolitiFact
Causes
Causes for Incident:
AI is certain about an incorrect result
The algorithm is certain that some misinformation, after being tweaked slightly, is no longer misinformation.
A human is at least partly responsible
Humans are primarily responsible for making and propagating misinformation over Facebook and across the internet.
Sources of Weakness:
Weakness in Defenses - Physical Attack:
Users could circumvent the algorithm by adding or changing image borders, changing fonts, or cropping images, leaving the underlying claim intact; a sketch of how such small edits could defeat an image-matching check appears after the weaknesses below.
Weakness in Defenses - Out of Distribution Attack:
If the algorithm was trained primarily on images, then users writing out the text of the images to circumvent the algorithm’s detection would qualify as an OOD attack.
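As a concrete illustration, here is a minimal sketch that assumes known-misinformation images are matched by a compact perceptual fingerprint. Facebook's real matcher is not public, so the average-hash approach below is an assumption about how such matching might work, not the actual system.

```python
# Minimal sketch (assumption): match new posts against fingerprints of
# already-labeled misinformation images using a 64-bit average hash.
from PIL import Image

def average_hash(img: Image.Image, size: int = 8) -> int:
    """64-bit perceptual hash: one bit per pixel above the mean grey level."""
    small = img.convert("L").resize((size, size))
    pixels = list(small.getdata())
    mean = sum(pixels) / len(pixels)
    return sum((1 << i) for i, p in enumerate(pixels) if p > mean)

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

KNOWN_MISINFO_HASHES = set()   # fingerprints of posts already labeled by fact-checkers

def should_label(img: Image.Image, max_distance: int = 5) -> bool:
    h = average_hash(img)
    return any(hamming(h, known) <= max_distance for known in KNOWN_MISINFO_HASHES)
```

Cropping away part of the image, changing its border, or pasting it onto a new background flips enough fingerprint bits that the lookup misses, even though a human still reads the same claim, which is the evasion Avaaz documented.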
Further Research
UK passport photo checker shows bias against dark-skinned women.
Background Information:
Full description of the incident (taken from AIID)
Women with darker skin are more than twice as likely to be told their photos fail UK passport rules when they submit them online, compared with lighter-skinned men, according to a BBC investigation. Elaine Owusu, a Black student, said she was wrongly told her mouth looked open each time she uploaded five different photos to the government website, an example of how "systemic racism" can spread through automated systems. The facial recognition software was used by the Home Office of the British government to help users get their passports more quickly. Additionally, Cat Hallam, who describes her complexion as dark-skinned, told BBC reporters that her photos were judged to be of poor quality, with rejection reasons including "there are reflections on your face" and "your image and the background are difficult to tell apart."
Short description of the incident (taken from AIID)
UK passport photo checker shows bias against dark-skinned women.
Timeline
October 2020
Description of AI system involved
The facial recognition algorithm used by the Home Office of the UK Government to check applicants' passport photos against the official photo rules.
System developer
The Home Office of the UK Government
Sector of deployment
Administrative and support service activities
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
Location
United Kingdom
Causes
Causes for Incident:
AI is certain about an incorrect result
The AI was certain that applicants' passport photos broke the rules, even though the photos followed all guidelines.
Sources of Weakness:
Weakness in Training - Biases Present:
It is likely that the AI was not trained on enough dark-skinned people.
Weakness in Testing - Biases Present:
It is likely that the AI was not tested on enough dark-skinned people.
Weakness in Training and Testing - Insufficient Time Dedicated
Documents released as part of a freedom of information request in 2019 had previously revealed that the Home Office was aware of this problem but decided "overall performance" was good enough to launch the online checker. The Home Office did not dedicate enough time to properly training and testing its algorithm.
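A disparity audit of the kind sketched below, run on compliant photos before launch, would have quantified the problem those documents describe. The group labels and data here are illustrative assumptions, not the Home Office's evaluation procedure.

```python
# Minimal sketch (illustrative assumption): compare automated-check failure
# rates across demographic groups on photos known to meet the rules.
from collections import defaultdict

def failure_rates(results):
    """results: iterable of (group, passed_check) pairs for compliant photos."""
    totals, failures = defaultdict(int), defaultdict(int)
    for group, passed in results:
        totals[group] += 1
        if not passed:
            failures[group] += 1
    return {g: failures[g] / totals[g] for g in totals}

# Toy data standing in for a pre-launch evaluation set.
results = [("darker-skinned women", False), ("darker-skinned women", True),
           ("lighter-skinned men", True), ("lighter-skinned men", True)]
print(failure_rates(results))
# {'darker-skinned women': 0.5, 'lighter-skinned men': 0.0}
```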
Further Research
In October 2020, results for "Jewish baby stroller" on Google Images showed antisemitic images as a result of organized online targeting by antisemitic online groups.
Background Information:
Full description of the incident (taken from AIID)
In October 2020, Google Images showed antisemitic images of a portable oven when users searched "Jewish baby stroller," because antisemitic online groups had tagged these images with the phrase "Jewish baby stroller." Google claims this is a result of 'voids of information': the algorithm is unable to decipher the image itself and relies on the images' tags.
Short description of the incident (taken from AIID)
In October 2020, results for "Jewish baby stroller" on Google Images showed antisemitic images as a result of organized online targeting by antisemitic online groups.
Timeline
2017 - October 2020
Description of AI system involved
The Google image algorithm uses image recognition and appended text or tags to classify images.
System developer
Google
Sector of deployment
Information and communication
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
"Action," i.e. carrying out decisions through physical or digital means.
Location
Global
Causes
Causes for Incident:
AI is certain about an incorrect result
The AI was certain that images of portable ovens were Jewish baby strollers.
A human is at least partly responsible
Antisemitic users are responsible for the deliberate miscategorization of images, which led to this result.
Sources of Weakness:
Weakness in Training - Incorrect Data:
As a result of interference by antisemitic groups, training data for images of portable ovens was corrupted so that they were classified as Jewish baby strollers instead.
Weakness in Defense - User Interference
Users are able to misclassify images, resulting in training data being corrupted. The lack of safeguards against this constitutes a weakness in the algorithm’s defenses.
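A minimal sketch shows why a "data void" query is easy to poison when ranking leans on user-supplied tags. The tag-matching index below is an illustrative assumption, not Google's ranking system.

```python
# Minimal sketch (assumption): for rare queries, rank images purely by how
# well user-supplied tags match the query text.
from collections import Counter

def top_results(query: str, index: dict, k: int = 5):
    """index maps image_id -> list of tags; rank purely by tag matches."""
    scores = Counter()
    for image_id, tags in index.items():
        scores[image_id] = sum(query.lower() in t.lower() for t in tags)
    return [img for img, s in scores.most_common(k) if s > 0]

# Legitimate images carrying this phrase are vanishingly rare, so a small
# coordinated group tagging offensive images with it dominates the results.
index = {
    "offensive_img_1": ["jewish baby stroller", "meme"],
    "offensive_img_2": ["jewish baby stroller"],
    "ordinary_stroller": ["baby stroller", "pram"],
}
print(top_results("jewish baby stroller", index))
# ['offensive_img_1', 'offensive_img_2']
```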
Further Research
Tesla’s self-driving system was fooled by two large flags displaying circular letters in red and orange, which it mistook for two stop lights.
Background Information:
Full description of the incident
Redditor cyntrex posted a video to the subreddit r/teslamotors showing the view from inside a stationary Model 3. The car's Autopilot system registers traffic lights changing intermittently from red to yellow and back to red. The person recording then shifts the camera to show what is just in front of the car: two vertical flags with the word "coop" written in bold red and orange letters, which are confusing the Autopilot system. As the flags wave in the wind, the system switches between red and yellow lights as it reads the different colors of the looped "o"s on the flags. The car clearly thinks the round letters on the flags, which belong to a service station, are red and yellow traffic lights.
Short description of the incident
Tesla’s self-driving system was fooled by two large flags displaying circular letters in red and orange, which it mistook for two stop lights.
Timeline
October 22, 2020
Description of AI system involved
The Tesla Autopilot driving system allows hands-off driving, parking, and navigation using environmental sensors, long-range radar, and 360-degree ultrasonic sensors.
System developer
Tesla
Sector of deployment
Transportation
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
"Action," i.e. carrying out decisions through physical or digital means.
Location
near Zurich, Switzerland
Causes
Causes for Incident:
AI is certain about an incorrect result
The algorithm is confident that the flags are actually stoplights.
Sources of Weakness:
Weakness in Training - Biases Present:
The training data likely did not include enough non-traffic-light objects, such as flags or signs with round, brightly colored lettering, that resemble stop lights, so the model never learned to reject them; a sketch of a naive detector with this blind spot appears after the weaknesses below.
Weakness in Testing - Biases Present:
The testing data likewise likely did not include such look-alike objects, so the confusion went undetected before deployment.
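Here is the naive detector referred to above: it looks only for small, round, red-or-amber blobs, criteria the circular letters on the flags satisfy. Tesla's perception stack is proprietary, so the OpenCV pipeline and thresholds are illustrative assumptions, not the real system.

```python
# Minimal sketch (assumption): flag small, round, red/amber blobs as traffic lights.
import cv2
import numpy as np

def detect_traffic_lights(frame_bgr):
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Red/amber hue range; thresholds are purely illustrative.
    mask = cv2.inRange(hsv, (0, 120, 120), (25, 255, 255))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    lights = []
    for c in contours:
        area = cv2.contourArea(c)
        if area < 20:
            continue
        (x, y), r = cv2.minEnclosingCircle(c)
        circularity = area / (np.pi * r * r + 1e-6)
        # "Round and red/amber" is the entire test; nothing asks whether the
        # blob sits on a pole, has a light housing, or belongs to a flag.
        if circularity > 0.7:
            lights.append((int(x), int(y), int(r)))
    return lights
```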
Further Research
A chess-playing robot, apparently unsettled by the quick responses of a seven-year-old boy, grabbed and broke his finger during a match at the Moscow Open.
Background Information:
Full description of the incident
A chess-playing robot, apparently unsettled by the quick responses of a seven-year-old boy, grabbed and broke his finger during a match at the Moscow Open. Sergey Smagin, vice-president of the Russian Chess Federation, told Baza the robot appeared to pounce after it took one of the boy’s pieces. Rather than waiting for the machine to complete its move, the boy opted for a quick riposte, he said. The machine, which can play multiple matches at a time and had reportedly already played three on the day it encountered Christopher, was “unique”, Smagin said. “It has performed at many opens.”
Short description of the incident
A chess-playing robot, apparently unsettled by the quick responses of a seven-year-old boy, grabbed and broke his finger during a match at the Moscow Open.
Timeline
July 19, 2022
Description of AI system involved
The Russian model was trained to play chess and move the pieces using a robotic arm.
System developer
Unknown
Sector of deployment
Arts, entertainment and recreation
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
"Action," i.e. carrying out decisions through physical or digital means.
Location
Moscow, Russia
Causes
Causes for Incident:
The AI did not have a failsafe
When the robotic arm was gripping the boy's finger, it could not be removed before a fracture occurred.
A likely use case was not considered before deployment
A player moving too quickly in response to the model's move should have been considered, especially in games such as chess, where time limits often encourage moving quickly.
A human shares responsibility for the incident
The child did not follow safety protocols regarding waiting after the model made its move.
Sources of Weakness:
Weakness in Algorithm:
The chess model apparently had no means of distinguishing a finger from a chess piece, and the required waiting period after each move indicates that this risk was considered during development. That it remained in real-world operation is a clear weakness; a minimal sketch of the kind of grip failsafe that was missing appears after the weaknesses below.
Weakness in Training - Biases Present:
The model was likely not trained on users who did not follow the safety protocols.
Weakness in Testing - Biases Present:
The model was likely not tested on users who did not follow the safety protocols.
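A minimal sketch of the missing failsafe: abort the grip when the measured object width or resistance exceeds what a chess piece could produce. The sensor interface, limits, and gripper object below are hypothetical; the real robot's control software and hardware API are unknown.

```python
# Minimal sketch (hypothetical limits and interface, not the real robot).
PIECE_MAX_WIDTH_MM = 40.0        # nothing on the board should be wider than this
PIECE_MAX_GRIP_FORCE_N = 8.0     # a piece should never push back this hard

def safe_to_close_gripper(measured_width_mm: float, measured_force_n: float) -> bool:
    """Refuse to keep squeezing anything wider or stiffer than a chess piece."""
    if measured_width_mm > PIECE_MAX_WIDTH_MM:
        return False             # object too large: likely a hand, not a piece
    if measured_force_n > PIECE_MAX_GRIP_FORCE_N:
        return False             # resistance too high: release immediately
    return True

def grip_step(gripper, width_mm: float, force_n: float) -> None:
    # `gripper` is a hypothetical hardware interface used only for illustration.
    if not safe_to_close_gripper(width_mm, force_n):
        gripper.open()            # fail safe: release whatever is being held
        gripper.halt()
```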
Further Research
A Tesla vehicle crashes into a $3.5 million private jet after being summoned by its owner using the automatic parking feature.
Background Information:
Full description of the incident
A Tesla vehicle crashes into a $3.5 million Cirrus Vision Jet after being summoned by its owner using the automatic parking feature. The “Smart Summon” feature enables a Tesla vehicle to leave a parking space and navigate around obstacles to its owner. Smartphone video appears to capture security camera footage of the Tesla slowly crashing into and then actually pushing the jet across the tarmac.
Short description of the incident
A Tesla vehicle crashes into a $3.5 million Cirrus Vision Jet after being summoned by its owner using the automatic parking feature.
Timeline
April 22, 2022
Description of AI system involved
Tesla's “Smart Summon” feature enables a Tesla vehicle to leave a parking space and navigate around obstacles to its owner.
System developer
Tesla
Sector of deployment
Transportation and storage
Relevant AI functions
"Perception," i.e. sensing and understanding the environment;
"Cognition," i.e. making decisions;
"Action," i.e. carrying out decisions through physical or digital means.
Location
Felts Field in Spokane, Washington
Causes
Causes for Incident:
The AI is certain about an incorrect result
The model did not detect that it had collided with a plane, and instead continued trying to move forward.
The AI did not have a failsafe
When the crash occurred, it went undetected, resulting in further damage; a minimal stall-detection failsafe is sketched after the weaknesses below.
Sources of Weakness:
Weakness in Algorithm:
Tesla acknowledges that there are several weaknesses in the model. For example, "Smart Summon may not stop for all objects (especially very low objects such as some curbs, or very high objects such as a shelf) and may not react to all traffic." Furthermore, "Smart Summon does not recognize the direction of traffic, does not navigate around empty parking spaces, and may not anticipate crossing traffic."
Weakness in Training - Uncovered Situation:
The model was likely not trained on avoiding parked aircraft specifically.
Weakness in Testing - Uncovered Situation:
The model was likely not tested on avoiding parked aircraft specifically.
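A minimal sketch of a stall-and-impact failsafe of the kind this incident calls for: stop when commanded motion is not matched by measured motion, or when deceleration spikes. The data fields and thresholds below are illustrative assumptions, not Tesla's implementation.

```python
# Minimal sketch (illustrative assumptions): detect that the car is pushing
# against an obstacle or has struck something, then require an emergency stop.
from dataclasses import dataclass

@dataclass
class MotionSample:
    commanded_speed_mps: float   # what the summon controller asked for
    measured_speed_mps: float    # what wheel odometry reports
    accel_mps2: float            # longitudinal acceleration

def should_emergency_stop(sample: MotionSample,
                          stall_ratio: float = 0.3,
                          impact_decel: float = 4.0) -> bool:
    # Contact with a large object shows up as a sudden deceleration...
    if sample.accel_mps2 < -impact_decel:
        return True
    # ...or as the car "pushing" against something: commanded speed is well
    # above what the vehicle actually achieves.
    if (sample.commanded_speed_mps > 0.5 and
            sample.measured_speed_mps < stall_ratio * sample.commanded_speed_mps):
        return True
    return False

print(should_emergency_stop(MotionSample(1.5, 0.2, -0.5)))  # True: stalled against an obstacle
```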
Further Research