Abstracts

List of the 18 accepted abstracts:

Title: Phrase Detectives

Authors: Jon Chamberlain

Abstract: The online game-with-a-purpose Phrase Detectives has been collecting decisions about anaphoric coreference in human language for over 10 years (4 million judgements from 40,000 players). The game was originally designed to collect multiple valid solutions for a single task, which complicated aggregation but created a very rich (and noisy) dataset. Analysis of the ambiguous player decisions highlights the need for understanding and resolving disagreement that is inherent in language interpretation. This talk will present some of the interesting cases of ambiguity found by the players of Phrase Detectives (a dataset that will be made available to the research community later this year) and discuss the statistical methods we have been working on to harness crowds that disagree with each other.
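
Since the game deliberately collects multiple valid interpretations per markable, one simple way to surface the ambiguous cases is to look at the entropy of each item's label distribution. The sketch below is an illustrative assumption, not the statistical aggregation used in Phrase Detectives:

```python
from collections import Counter
from math import log2

def label_entropy(judgements):
    """Shannon entropy (in bits) of one markable's label distribution.

    `judgements` is a list of the interpretations chosen by players;
    higher entropy means players disagree more on this item.
    """
    counts = Counter(judgements)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Hypothetical item: four players choose antecedent "A", two choose "B".
print(label_entropy(["A", "A", "A", "A", "B", "B"]))  # ~0.92 bits
```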

Slides: https://drive.google.com/open?id=1xV2xCpwGp_0N6tF8vPNZF1eLqDWJUOSm

Title: Capturing Cognitive Bias in Human Agents Performing Music-related Tasks

Authors: Ioannis Petros Samiotis, Christoph Lofi, and Alessandro Bozzon

Abstract: Human computation has been used extensively on music data, for a wide variety of applications. Having human agents listen to parts of music tracks invokes certain “aesthetic emotions” and stimulates their recognition memory, as neuroscience and cognitive psychology studies have found. These emotions and memories are affected by the experiences of the individual, which are in turn connected to their cultural exposure. Therefore, human agents performing music-related crowd computing tasks introduce types of bias such as cultural bias due to enculturation, or confirmation bias due to expectations based on their past experiences. In this work, we conduct a survey on the most common forms of bias associated with music consumption and preferences. These biases are then related to types of music crowdsourcing tasks, based on their design and their use of musical elements. After establishing how human agents can exhibit these types of cognitive bias, we introduce them as additional parameters in a model of the user performing those tasks. Finally, we propose a methodology to capture crowdworkers’ biases from the agents’ ethnicity, age and music taste, through the use of a questionnaire. The captured bias parameters are later used in the task assignment step.

Slides: https://drive.google.com/open?id=1DoQ9OmTiSsS_j3p96VPbiuOyqP0oiWEs

Title: Exploring Bias in Crowd-Powered Social Robotics for Stress Mitigation Tasks

Authors: Tahir Abbas, Vassilis-Javed Khan, Ujwal Gadiraju and Panos Markopoulos

Abstract: Real-time crowdsourcing (RTC) is an area of Human Computation research in which online workers carry out tasks under real-time constraints. RTC has been used in a wide variety of domains, including tele-operating a robot’s locomotion and supporting or facilitating information retrieval tasks. When it comes to handling complex social conversational tasks with RTC, the crowd’s collective input is vulnerable to bias due to a variety of factors, ranging from the cultural background of workers to their personal interests or viewpoints. In a recent study, we investigated how effectively a crowd of workers, recruited in real time, can act as a coach for assisting stressed students via a robot. The crowd’s input includes interesting data on how the workers’ cultural background biases their recommended strategies for mitigating the stress of students. More specifically, we developed a system where Softbank’s Pepper robot broadcasts a live audio-video (AV) feed of a stressed student to crowd workers. Workers are then asked to converse with the students by typing messages, which are in turn spoken aloud by Pepper, in an attempt to alleviate the students’ stress and propose coping strategies. Our results highlight the cultural bias among workers. We found that Indian workers asked more personal questions (e.g. about the age or partner of the student), suggested solutions influenced by their spiritual beliefs (e.g. to pray, or to believe in God) and proposed unique solutions (e.g. practicing yoga). Conversely, US workers let the students express themselves by asking open questions, explored students’ strengths and coping skills, and suggested practical exercises (e.g. walking) as a main strategy to alleviate stress. Based on this study, we reflect on the potential challenges of using real-time crowdsourcing to support conversational tasks, and envision opportunities to reduce bias in such complex social conversational tasks. Our proposed solutions include training workers and building intelligent workflows and UIs.

Slides: https://drive.google.com/open?id=1cr3Sixn6Fghvlks-QTZfe_6vHVkKoLKx

Title: Fairness in Algorithmic and Crowd-Generated Descriptions of People Images

Authors: Styliani Kleanthous, Jahna Otterbacher, Pinar Barlas and Kyriakos Kyriakou

Abstract: Crowdsourcing plays a key role in developing algorithms for image recognition or captioning. Image analysis algorithms have become indispensable in the modern information ecosystem. Beyond their early use in restricted domains (e.g., military, medical), they are now widely used in consumer applications and social media, with the consumers taking the output of these applications for granted.

With the rise of the "Algorithm Economy", image analysis algorithms are increasingly being commercialized as Cognitive Services. This practice is proving to be a boon to the development of applications where user modeling, personalization and adaptation are required. From e-stores, where image recognition is used to curate a "personal style" for a given shopper based on previously viewed items, to dating apps, which can now act as "visual matchmakers", the technology has gained increasing influence in our digital interactions and experiences.

However, proprietary image tagging services are black boxes and there are numerous social and ethical issues surrounding their use in contexts where people can be harmed. In this work, we will provide an overview of recent and planned future work in analyzing proprietary image tagging services (e.g., Clarifai, Google Vision, Amazon Rekognition) for their gender and racial biases when tagging images depicting people. Our work focused primarily on discrimination discovery in this domain, as well as on understanding user perceptions of fairness. Finally, we will explore the sources of such biases, by comparing human versus machine descriptions of the same people images.

Slides: https://drive.google.com/open?id=1crQxvUo-8DZQMJa8H4JRIoaWow1AYFA6

Title: Biases in Human Computation and Crowdsourcing - Current Legal Perspectives

Authors: Stefanie Pletz

Abstract: “Justice outweighs all other values.”

C. Perelman [1]

Human-computer interaction inevitably consists of value transfer processes [2].

Specifically, the transfer of these values is subject to human biases and discrimination in algorithms, machine learning and computation at large. This particular discriminatory bias appears topically manifold and wide-ranging due to the complexities of data processing. Demographic information, such as attributes [3] referring to age, national origin and race, has been identified as a source of negative social discriminatory bias in computation, which opposes legal analytical logic and methodology that statutorily [4] enshrine fundamental human values and standards such as fairness, justice and equality. The paper seeks to identify and discuss human bias and human bias research methodologies to further explore the ‘efficiency vs. equality trade-off’ caused by discriminatory biases in human computation, including institutional and legal perspectives. It contends that existing legislative frameworks are insufficient to support the detection and identification of human biases in computation. The paper then examines the rationale for legal intervention and illustrates how recent legislative developments and proposals, such as the use of super-soft law or the Algorithmic Accountability Act, could mitigate discriminatory human biases. The paper closes by discussing and contextualising specific current practices, such as testing and evaluating algorithms, and proposes to follow the approach of dynamic legislation to reduce human bias in computation.

[1] Chaim Perelman, Justice, Law and Argument – Essays on Moral and Legal Reasoning (first published 1980) 1

[2] ‘Value transfer processes’, German: Wertübertragungsprozesse, in Thilo Hagendorff, ‘Maschinelles Lernen und Diskriminierung: Probleme und Lösungsansätze’ [2019] ÖZS 53

[3] Yanai Elazar and Yoav Goldberg, ‘Adversarial Removal of Demographic Attributes from Text Data’ [2018] Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 11-21

[4] Protocol No. 12 to the Convention for the Protection of Human Rights and Fundamental Freedoms, an additional protocol to the European Convention on Human Rights (ECHR) provides for a general prohibition of discrimination states in Art. 1 that: "The enjoyment of the rights and freedoms set forth in this Convention shall be secured without discrimination on any ground such as sex, race, colour, language, religion, political or other opinion, national or social origin, association with a national minority, property, birth or other status." And in Art. 2 respectively: “No one shall be discriminated against by any public authority on any grounds such as those mentioned above.”

Slides: Author/s do not authorize to publish

Title: Combining humans and machines towards drug data quality assessment

Authors: Amrapali Zaveri

Abstract: The economic and medical benefits of targeting drug treatments to patients on a personalised basis depend on data about those drugs that are complete, consistent, and accurate. However, current efforts to create such high-quality data are unreliable, expensive, and unscalable. For example, automated methods to improve the quality of data, particularly to catalogue FDA-approved drug uses and indications, have yielded large datasets of poor quality. Human (medical) experts can help to label and curate data manually to increase their quality. However, involving experts becomes expensive at a large scale. A critical need remains for efficient and cost-effective solutions for assessing and improving the quality of drug data. In this talk, I will present our work towards a novel quality assessment framework that combines humans, via crowdsourcing, and machines, via machine learning, for drug data quality assessment. I will discuss our initial experiments and results, and in particular the insights gathered from the crowdsourcing experiments. We aim to generate a high-quality dataset of drug indications that also contains medical context, provenance and the supporting literature. Such a dataset is urgently needed to ensure the accuracy of machine learning methods for drug discovery.

Slides: https://drive.google.com/open?id=1v9V4yatey6C1szwtA5v8cXm_X0aJA_OX

Title: A human in the loop approach to detect episodic and thematic framing in news videos

Authors: Panagiotis Mavridis, Markus De Jong, Alessandro Bozzon, Tobias Kuhn and Lora Aroyo

Abstract: A vast majority of people prefer news videos as a way to get informed. However, news videos, like any kind of news information, are prone to bias that leads to miscommunication. This bias comes from differences between groups of news consumers. Citizens who interpret the news have different political orientations and thus understand news differently. Also, media scholars interpret and study news differently from the broader public. While automated methods exist for detecting different types of bias, their capabilities and accuracy are limited, or specific parameters need to be fine-tuned before they work. For instance, episodic and thematic framing is very difficult to capture with existing automatic techniques when applied to news videos. Instead, we propose the use of humans in the loop to detect framing. We propose the design of crowdsourcing tasks that help people identify, through a set of targeted questions, the presence of thematic vs. episodic framing. We then evaluate the results of the crowdsourcing experiment and compare them with our expert ground truth. Moreover, for a particular news event, the word patterns and annotations produced by the crowd can help determine framing automatically. After having evaluated the ability of the crowd to determine framing, we finally evaluate our semi-automatic method with the use of humans in the loop.
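
As a hypothetical illustration of the comparison against expert ground truth (the items, votes and labels below are invented, not the actual experiment), crowd votes per item can be aggregated by majority and scored against the experts:

```python
from collections import Counter
from sklearn.metrics import cohen_kappa_score

def majority_label(votes):
    """Aggregate one item's crowd votes ('episodic'/'thematic') by majority."""
    return Counter(votes).most_common(1)[0][0]

# Hypothetical crowd votes and expert ground truth for three news items.
crowd_votes = {
    "item1": ["episodic", "episodic", "thematic"],
    "item2": ["thematic", "thematic", "thematic"],
    "item3": ["episodic", "thematic", "thematic"],
}
expert = {"item1": "episodic", "item2": "thematic", "item3": "thematic"}

items = sorted(crowd_votes)
crowd = [majority_label(crowd_votes[i]) for i in items]
gold = [expert[i] for i in items]
print("Agreement with experts (Cohen's kappa):",
      cohen_kappa_score(crowd, gold))
```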

Slides: Author/s do not authorize to publish

Title: On the Quality of Crowdsourced Information Quality Assessments

Authors: Davide Ceolin

Abstract: Human judgment is a fundamental aspect of information quality assessment. Information quality is a highly subjective and contextual matter. Different individuals in different contexts can judge quality differently. However, when these humans share a similar background and the task at hand is clearly constrained and defined, the judgments they produce are quite uniform. Crowdsourcing could be, in principle, a means to reach out to a large number of humans to collect large sets of information quality assessments. However, in many crowdsourcing platforms the identity and profiles of contributors are not public. Thus, when aggregating such assessments, the challenge is to determine how uniform they can be, and how to handle them properly.

To address this issue, we are working towards a metric of the quality of information quality assessments. The goal of this metric is to assess the ability of the user to motivate their assessments, and we use it as a proxy to determine the similarity of the contributors' backgrounds. The score is meant to link the strength of the rationales provided by the contributors with their variance, possibly with respect to the experts (when such experts exist). In other words, the higher the score, the stronger the rationale and the closer the judgment of the contributor is to that of experts. Vice versa, a lower score indicates a low probability that the contributor's judgment is close to that of experts, as well as a high variance among the judgments sharing the same score.

I will illustrate this score with examples based on a corpus of articles from the vaccination debate.
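
As a purely hypothetical sketch of how such a score could be inspected (not the metric itself), assessments can be grouped by rationale-strength score and each group summarised by its variance and its distance from an expert judgment:

```python
import statistics

def score_diagnostics(assessments, expert_mean):
    """For each rationale-strength score, report the variance of the
    contributors' quality ratings and their mean distance from the expert
    judgment. Hypothetical proxy diagnostics, not the actual metric.

    `assessments` maps a rationale-strength score to the list of quality
    ratings given by contributors who received that score.
    """
    report = {}
    for strength, ratings in assessments.items():
        report[strength] = {
            "variance": statistics.pvariance(ratings),
            "distance_to_experts": abs(statistics.mean(ratings) - expert_mean),
        }
    return report

# Hypothetical data: stronger rationales -> lower variance, closer to experts.
assessments = {1: [2, 5, 1, 4], 3: [3, 4, 3, 4], 5: [4, 4, 5, 4]}
print(score_diagnostics(assessments, expert_mean=4.5))
```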

Slides: https://drive.google.com/open?id=1AJmFmRqEhdhSIZLwhXT_1bzStXfV-hVf

Title: Comparing measurement of perceptions of energy density and carbon footprint using Citizen Science and a UK representative sample

Authors: Christian Reynolds

Abstract: There is a food knowledge disconnect between food researchers and the general population. Indeed, researchers cannot easily measure what citizens understand, or perceive they know, about food. Standard surveys take time and are costly.

One digital engagement and data collection methodology that would allow the measurement of perceptions around food, as well as educate participants and measure changes in knowledge, is citizen science. The citizen science method invites members of the public to participate in scientific thinking and data collection around a set theme or experiment, and to collaboratively become researchers.

This paper provides a comparison of two methods, 1) the Zooniverse citizen science platform (n≈516 participants) and 2) a UK representative sample (n=400) deployed over the Qualtrics Panel, to measure perceptions of the energy density (kcal) and carbon footprint (g of CO2) of foods. Ten foods were selected with a range of energy and carbon densities.
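
An illustrative way to compare the two samples' perception estimates for a single food is a non-parametric test; the numbers below are simulated placeholders, not the study's data:

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

# Hypothetical perceived kcal estimates for one food, from the two samples.
zooniverse = rng.normal(loc=250, scale=80, size=516)   # citizen science
qualtrics = rng.normal(loc=300, scale=90, size=400)    # representative panel

# Non-parametric comparison of the two perception distributions.
stat, p = mannwhitneyu(zooniverse, qualtrics, alternative="two-sided")
print(f"Mann-Whitney U = {stat:.0f}, p = {p:.3g}")
```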

Slides: https://drive.google.com/open?id=11qlS6CdW7twtv__aD1hQXFSAx1TQw3M1

Title: Understanding and Mitigating Worker Biases in the Crowdsourced Collection of Subjective Judgments

Authors: Ujwal Gadiraju

Abstract: Crowdsourced data acquired from tasks that comprise a subjective component (e.g. opinion detection, sentiment analysis) is potentially affected by the inherent bias of crowd workers who contribute to the tasks. This can lead to biased and noisy ground-truth data, propagating the undesirable bias and noise when used in turn to train machine learning models or evaluate systems. In this work, we aim to understand the influence of workers’ own opinions on their performance in the subjective task of bias detection. We analyze the influence of workers’ opinions on their annotations corresponding to different topics. Our findings reveal that workers with strong opinions tend to produce biased annotations. We show that such bias can be mitigated to improve the overall quality of the data collected. Experienced crowd workers also fail to distance themselves from their own opinions to provide unbiased annotations.

Slides: https://drive.google.com/open?id=1LS6TuBYtwand5aUR23BeBX2BIXAkfQMk

Title: Understanding Bias and Subjectivity in Conversational Microtask Crowdsourcing Using Conversational Style Estimation

Authors: Sihang Qiu, Ujwal Gadiraju and Alessandro Bozzon

Abstract: Crowdsourcing platforms such as Amazon Mechanical Turk and Figure Eight have created marketplaces for millions of people to gain profit or make a living. As crowdsourcing has grown to be an important source of income for these people (i.e. online crowd workers), the study of human factors has recently become a popular direction in the crowdsourcing field. With the rise of chatbots, previous work has shown that text-based conversational agents can effectively support crowdsourcing task execution with high user satisfaction. However, bias and subjectivity can still affect crowdsourcing results as long as humans are involved.

During the last few decades, linguists and psychologists have found that conversational style can reflect the personality, thinking style, and growth environment of a person, which inspires us to investigate whether we can better understand bias in crowdsourcing by estimating the conversational style of the worker. To this end, we design and implement a web-based conversational agent, which can be easily embedded in any web-based crowdsourcing platform, to assist workers in microtask execution. Furthermore, we propose a coding scheme to manually classify conversational styles into two categories (involvement vs. considerateness) according to relevant linguistic work. We use this coding scheme to produce a ground-truth dataset, and then train a classification model to automatically measure conversational style. We intend to conduct a crowdsourcing experiment where workers need to collect and understand information acquired from the Internet (about a controversial issue) and then give a brief summary. Finally, we plan to evaluate the quality of the answers given by workers, and to investigate whether bias correlates significantly with conversational style.
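
A minimal baseline for the automatic style-measurement step might look like the following; the example utterances, labels and model choice are illustrative assumptions rather than the actual classifier:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical manually coded examples (the real ground truth comes from
# applying the coding scheme to worker-agent conversations).
texts = [
    "yeah totally!! I think so too, tell me more",
    "I would prefer to wait until you have finished before I respond.",
    "no way, that's exactly what happened to me last week",
    "Perhaps we could consider the question once the facts are clear.",
]
styles = ["involvement", "considerateness", "involvement", "considerateness"]

# A simple baseline classifier for conversational style estimation.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, styles)
print(clf.predict(["wow yes, me too, go on!"]))
```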

Slides: https://drive.google.com/open?id=1jWk4PHnwvPbHMI2Hjq674Pcl3KYvXtig

Title: AI-Assisted Peer Review: Opportunities and Biases

Authors: Lorenzo Bracciale, Alessandro Checco, Pierpaolo Loreti and Giuseppe Bianchi

Abstract: The peer-review workflow of the scientific literature is under strain because of the constant growth of submission volumes and retraction rates, which in turn increases the need for strict scrutiny of low-quality submissions. Reducing screening and review time would save millions of working hours and potentially boost academic productivity.

Many platforms have already started to use automated screening tools to detect plagiarism and failures to respect format requirements. Recent tools even attempt to flag the quality of a study or summarise its content, to reduce reviewers’ load.

The recent advances in Artificial Intelligence (AI) open the door to (semi) automated peer review systems, where potentially low-quality or controversial studies could be flagged, and reviewer-document matching could be performed in an automated manner.

In this work, we further explore this opportunity by using AI to analyse paper content and correlate it with reviewers’ evaluations. This has the potential to reduce desk rejections and to expose latent motivations or biases behind evaluations.

To this aim, we analysed 3,400 papers submitted to three different venues, together with their reviews.

Using a dense neural network (DNN), we markedly improved the prediction of paper acceptance (+25% accuracy) with respect to a random classifier.

We also use regression techniques to predict the average score a paper received from its reviewers; using a DNN, we improved the MAE of a naive regressor by 21%.

We also investigate the explainability of the AI choices, to assist human decision-making and increase the transparency of such automated systems.
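
A minimal sketch of this kind of pipeline is shown below, using TF-IDF features and small dense (fully connected) networks; the paper texts, labels and architecture are placeholders and not the model described above:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier, MLPRegressor
from sklearn.pipeline import make_pipeline

# Hypothetical stand-ins for the paper texts and their review outcomes.
papers = [
    "We propose a novel attention mechanism for sequence labelling.",
    "This short paper reports a negative result on transfer learning.",
    "A survey of crowdsourcing quality control techniques.",
]
accepted = [1, 0, 0]          # acceptance decision per paper
avg_scores = [4.2, 2.5, 3.0]  # average reviewer score per paper

# Dense (fully connected) networks over TF-IDF features of the paper text:
# one classifier for acceptance, one regressor for the average review score.
accept_model = make_pipeline(
    TfidfVectorizer(max_features=5000),
    MLPClassifier(hidden_layer_sizes=(256, 64), max_iter=500),
)
score_model = make_pipeline(
    TfidfVectorizer(max_features=5000),
    MLPRegressor(hidden_layer_sizes=(256, 64), max_iter=500),
)
accept_model.fit(papers, accepted)
score_model.fit(papers, avg_scores)
```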

Slides: https://drive.google.com/open?id=11DsJxTeiqOE53ljMZFG8xGZ2QL2ZFNds

Title: Correcting dataset biases for machine learning applications with human-in-the-loop systems

Authors: Agathe Balayn

Abstract: Real-life applications of machine learning (ML) suffer from unfairness due to biases emerging along the ML pipeline. Research mostly focuses on solving algorithmic aspects of unfairness and bias within high-stakes applications using structured data (e.g. characteristics of individuals used to predict their probability of repaying a loan). Conversely, here we investigate data biases in classification tasks on unstructured data, assuming that these applications have different important challenges to address.

Specifically, we explore the creation of unbiased datasets for fairer ML applications, assuming that dataset representativeness is the primary source of unfairness. Through two use cases (profession prediction from images, and toxicity prediction based on textual sentences), we identify data biases stemming from the content of the samples (latent features of images that give away protected attributes) and from the labels (aggregated crowd-provided labels representing majority opinions) that contribute to the unfairness of current systems. We then propose human-in-the-loop approaches to mitigate these biases and the consequent unfairness, relying on the analysis of the outputs of ML models. First, we show that training ML models on the dataset to be corrected and evaluating their fairness provides an indicative bias measure. Then, we propose a visualization tool to help ML experts understand the causes of label bias and identify the sentences in the dataset and the types of crowd workers that could provide labels correcting unfairness. We also propose a methodology to pinpoint the causes of sample biases in images, combining crowdsourced knowledge of stereotypes with information from methods that explain ML models. We present initial experiments that show the potential of these methodologies and discuss future work.
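
A rough, synthetic-data illustration of the first step (training a model on the dataset to be corrected and reading off a fairness gap as an indicative bias measure) could look like this; it is not the authors' pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical dataset: features, labels, and a protected attribute
# (e.g. perceived gender in the profession-prediction use case).
X = rng.normal(size=(500, 5))
protected = rng.integers(0, 2, size=500)
y = (X[:, 0] + 0.8 * protected + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Train a model on the (possibly biased) dataset, then use the gap in its
# positive prediction rates across groups as an indicative bias measure.
model = LogisticRegression().fit(np.c_[X, protected], y)
pred = model.predict(np.c_[X, protected])
gap = abs(pred[protected == 1].mean() - pred[protected == 0].mean())
print(f"Demographic parity difference: {gap:.2f}")
```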

Slides: https://drive.google.com/open?id=1ftiYlSQV_H44FER0Hxf5eIVySHuczKmz

Title: Population-level amplification and suppression of individual biases

Authors: Mathew Hardy, Bill Thompson, Peter M. Krafft and Thomas L. Griffiths

Abstract: When solving information processing problems as individuals, numerous cognitive biases shape people’s behavior and choices. However, people often make decisions in social contexts, surrounded by others facing similar problems. Does social interaction counteract our individual biases or entrench them? To examine this question, we developed a model of decision making in crowds of biased individuals. Our analysis shows that this process shares formal properties with a class of stochastic approximation algorithms known as Sequential Monte Carlo methods. This relationship allows us to characterize the computation performed by the group and formalize the effects of individual biases on population-level computation. We tested the predictions of our analysis in a series of online behavioural experiments in which networked participants made perceptual judgements in the context of biased incentives. Participants were organized into discrete generations. Individuals in generation t observed the judgements made by individuals in generation t-1, allowing us to capture a simple form of the temporal dependencies relevant to many real-world examples of human computation. These experiments focused on motivated reasoning, a form of biased decision making where people overestimate the probability of high-utility events. Our results suggest that social decision making increased participants’ perceptual accuracy relative to an asocial baseline, but also amplified their bias. We also explore potential ways to recalibrate this population-level process. Our model suggests that access to social metadata revealing the incentives and choice distributions of earlier generations can do so, enabling individuals in social networks to increase their perceptual accuracy without increasing their bias.
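
The generational dynamic can be illustrated with a toy resampling simulation in the spirit of a particle filter; the parameter values and the exponential utility weighting are invented for illustration and do not reproduce the authors' model or experiments:

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 0.3          # quantity each generation tries to judge
bias_weight = 2.0         # motivated reasoning: high-utility (large) answers favoured
n_agents, n_generations = 50, 10

# Generation 0: noisy individual perceptions of the true value.
judgements = true_value + rng.normal(scale=0.2, size=n_agents)

for _ in range(n_generations):
    # Social step (resampling): agents copy earlier judgements,
    # preferring those that look more rewarding (the bias).
    weights = np.exp(bias_weight * judgements)
    weights /= weights.sum()
    copied = rng.choice(judgements, size=n_agents, p=weights)
    # Individual step (perturbation): each agent adds fresh perceptual noise.
    judgements = copied + rng.normal(scale=0.05, size=n_agents)

print(f"true value: {true_value:.2f}, population mean: {judgements.mean():.2f}")
```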

Slides: Author/s do not authorize to publish

Title: Wisdom of the crowds vs expert knowledge: experiments in genre annotation

Authors: Serge Sharoff

Abstract: This talk will report differences between annotation experiments aimed at the same task, i.e. genre annotation, while involving different annotation setups. More specifically, this study compares crowd-sourced annotation of commonly recognised genres, such as news or editorial [asheghi16], vs expert annotation of text functions, such as argumentative or reporting texts [sharoff18genres].

Most of the previous genre annotation studies have not been tested for inter-annotator reliability. When they have been tested [sharoff10lrec], they were found to exhibit low inter-annotator agreement. With precise and consistent annotation guidelines for well-defined and well-recognized categories, it is possible to use crowdsourcing (via Amazon Mechanical Turk) to obtain reasonably good agreement. However, this applies to a small portion of the training corpus. Many documents in the crowd-sourced annotation experiment received the non-informative 'Other' label.

One of the reasons for this is genre hybridism. Even with strict editorial control, authors may choose to combine distinct styles of writing in a single text, such as reportage and expressions of opinion in a newspaper article. This leads to a possible proliferation of genre labels, e.g., editorial, column, opinion, analytic, feature article. On the Web there are far fewer explicit gate-keepers, and far more authors with varying levels of expertise or willingness to express themselves according to traditionally accepted ways which are recognised as genres. This leads to such phenomena as citizen journalism or research blogs. From the annotation perspective, different annotators can interpret a hybrid Web text in different ways, thus producing different annotations for stylistically similar texts.

Expert annotation can instead be based on recognition of communicative functions, in which the texts are described in terms of their similarity to prototype genres. The suggested set of 15 functions is designed to be applicable to any text on the Web and to be reliable in annotation practice. Inter-annotator agreement results show that the suggested categories produce Krippendorff's α above 0.76.
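
For reference, Krippendorff's α over such expert annotations can be computed with, for example, NLTK's agreement module; the coders, documents and labels below are hypothetical:

```python
from nltk.metrics.agreement import AnnotationTask

# Hypothetical expert annotations: (coder, document, assigned function label).
data = [
    ("coder1", "doc1", "argumentative"), ("coder2", "doc1", "argumentative"),
    ("coder1", "doc2", "reporting"),     ("coder2", "doc2", "reporting"),
    ("coder1", "doc3", "reporting"),     ("coder2", "doc3", "argumentative"),
]

task = AnnotationTask(data=data)
print("Krippendorff's alpha:", task.alpha())
```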

Slides: https://drive.google.com/open?id=1uZ-ex6FGWm_E9kjZiuclPAYKnuNgRNtn

Title: An Analysis of the Cognitive Load of A Skill Taxonomy Hypertree Interface

Authors: Haoyu Xie

Abstract: Crowdsourcing is rapidly gaining attention and utilisation in creative design, online business and research, producing output with high quality and diversity (Olenski, 2016). Both service requesters and providers need user-friendly interfaces for skill or job selection to get the best matches between them (Kittur et al., 2008). With skills and jobs spread across various categories (Mavridis et al., 2016), people struggle to search for, and find, the most desired ones in standard drop-down boxes. Therefore, a user-friendly skill-picking interface on which it is easy to browse and search for the required skills could significantly boost the efficiency and accuracy of matching the requested skills with suitable workers for each task.

The main focus of this study is the form of information presentation. Interviews were conducted to evaluate the cognitive load users experienced during tasks on both the official web interface and the hypertree interface. Analysis of the interview feedback and the subjective ratings of cognitive load revealed that the hypertree interface placed a relatively lower cognitive load on participants than the official interface. One potential interpretation is that presenting the connections between elements of the skill taxonomy as a hypertree, rather than as textual instructions, could enhance participants’ cognitive performance. It was also found that presenting excessive information on either interface causes information overload for users. Finally, users’ learning of the skill taxonomy could be improved by optimising the search and recommendation functions on both interfaces.

Slides: https://drive.google.com/open?id=14CHxQ9y1SIHoHMgmVBlSfpTqseb67sjY

Title: Measuring social biases in human annotators using counterfactual queries in Crowdsourcing

Authors: Bhavya Ghai, Q. Vera Liao, Yunfeng Zhang and Klaus Mueller

Abstract: Algorithmic bias has been termed an imminent AI danger faced by our society. Recent studies have shown that machine learning (ML) algorithms are capable of exhibiting social biases related to gender, race, etc. A major source of bias in the ML pipeline is the training dataset. Crowdsourcing is a popular way to gather labeled data for different ML tasks. As crowdsourcing tasks might involve a subjective component, it’s important to gauge the implicit social biases of human annotators and prevent them from spreading into the curated dataset.

We propose a novel way to measure social biases in crowdworkers using counterfactual queries. Here, we consider a supervised learning scenario with numeric or categorical input features. A counterfactual to a given query is the most similar query in an alternate world where its sensitive attribute (e.g. race or gender) is flipped. Counterfactual queries can be generated using causal inference by measuring the impact of flipping the sensitive attribute on the other input features. During the training phase of the user study, we can ask human annotators to label a set of counterfactual queries and measure the deviation in their responses. Zero deviation characterizes perfectly unbiased behaviour, and higher values indicate more biased behaviour. If the deviation is beyond a specific threshold, we can consider such annotators unfit for the study and terminate the study for those labelers. This methodology doesn’t need unbiased labels, and biased labelers are disqualified in the training phase itself. Hence, this can serve as a cost-effective way to tackle social biases in crowdsourcing.
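
A minimal sketch of the training-phase screening, under the simplifying assumption that deviation is the fraction of original/counterfactual pairs on which an annotator's label changes (the exact measure is not specified here):

```python
def bias_deviation(labels_original, labels_counterfactual):
    """Fraction of query pairs on which an annotator changes their label
    when only the sensitive attribute is flipped. 0 = perfectly consistent
    (unbiased in this sense); higher values = more biased behaviour."""
    assert len(labels_original) == len(labels_counterfactual)
    flips = sum(a != b for a, b in zip(labels_original, labels_counterfactual))
    return flips / len(labels_original)

# Hypothetical training-phase screening with a deviation threshold.
THRESHOLD = 0.2
original = ["hire", "hire", "reject", "hire", "reject"]
counterfactual = ["hire", "reject", "reject", "reject", "reject"]  # gender flipped
deviation = bias_deviation(original, counterfactual)
print(deviation, "-> disqualify" if deviation > THRESHOLD else "-> keep")
```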

Slides: https://drive.google.com/open?id=17KQwUByRPBWzCCrLRf1BnovMxLo69avp

Title: The Use of Algorithms in Predictive Policing – Durham Constabulary Case Study

Authors: Jomanah Alhamidi and Hend Alhudaib

Abstract: Background: machine learning algorithms have been increasingly adopted by governments for law enforcement, to reduce crime rates and victimisation. This is apparent in the emergence of predictive policing tools, which reshape existing power structures within society and transform systems dramatically. Nevertheless, such tools are generally evaluated by performance and can be enhanced depending on data quality and quantity and on the selected algorithms.

Aims: to investigate the use of predictive policing algorithmic tools within law enforcement, powered by data from various sources; to detect types of bias and examine their potential harm to society; and to provide solutions for the use of algorithms in which data bias can be eliminated.

Methods: the research is conducted using a qualitative approach based on the Durham Constabulary case study, in which a predictive policing tool is used to help make custodial decisions. Furthermore, an interview was recorded to highlight the social implications of using predictive policing on individuals from the perspectives of human rights and data protection.

Results: types of bias were identified through the investigation, notably racial bias, whereby the algorithm tends to falsely predict a high level of reoffending for black defendants more often than for white defendants. The accuracy of the system for high-risk cases is 88%.

Social implications: false crime prediction creates controversial and inaccurate outcomes due to the heavy reliance on the assumption that machines think like humans and can decide on racial and social-class equity. Although a number of predictive policing algorithms have been tuned and enhanced to accommodate fairness, in the sense of predicting with the same accuracy for both white and black defendants, reducing racial bias remains the responsibility of the human in the loop, who is the main factor in making judgements about particular individuals.

Slides: Author/s do not authorize to publish