Accepted Papers

Healthcare, Epidemics, Climate and Pandemics

Localization of Malaria Parasites and White Blood Cells in Thick Blood Smears | [pdf]

Authors: Rose Nakasi (Makerere University); Ernest Mwebaze (Google); Aminah Zawedde (Makerere University); Jeremy Tusubira (Makerere University); Gilbert Maiga (Makerere University)

Abstract: Effectively determining malaria parasitemia is a critical aspect in assisting clinicians to accurately determine the severity of the disease and provide quality treatment. Microscopy applied to thick smear blood smears is the de facto method for malaria parasitemia determination. However, manual quantification of parasitemia is time consuming, laborious and requires considerable trained expertise which is particularly inadequate in highly endemic and low resourced areas. This study presents an end-to-end approach for localisation and count of malaria parasites and white blood cells (WBCs) which aid in the effective determination of parasitemia; the quantitative content of parasites in the blood. On a dataset of slices of images of thick blood smears, we build models to analyse the obtained digital images. To improve model performance due to the limited size of the dataset, data augmentation was applied. Our preliminary results show that our deep learning approach reliably detects and returns a count of malaria parasites and WBCs with a competitive precision and recall. We also evaluate our system against human experts and results indicate a strong correlation between our deep learning model counts and the manual expert counts (p=0.998 for parasites, p=0.987 for WBCs). This approach could potentially be applied to support malaria parasitemia determination especially in settings that lack sufficient Microscopists.

2. Detection of Malaria Vector Breeding Habitats using Topographic Models [pdf]

Authors: Aishwarya N Jadhav (Veermata Jijabai Technological Institute)

Abstract: Treatment of stagnant water bodies that act as a breeding site for malarial vectors is a fundamental step in most malaria elimination campaigns. However, identification of such water bodies over large areas is expensive, labour- intensive and time-consuming and hence, challenging in countries with limited resources. Practical models that can efficiently locate water bodies can target the limited resources by greatly reducing the area that needs to be scanned by the field workers. To this end, we propose a practical topographic model based on easily available, global, high-resolution DEM data to predict locations of potential vector-breeding water sites. We surveyed the Obuasi region of Ghana to assess the impact of various topographic features on different types of water bodies and uncover the features that significantly influence the formation of aquatic habitats. We further evaluate the effectiveness of multiple models. Our best model outperforms earlier attempts that employ topographic variables for detection of small water sites, even the ones that utilize additional satellite imagery data, by 9.8% and demonstrates robustness across different settings

3. Unsupervised learning for economic risk evaluation in the context of Covid-19 pandemic [pdf]

Authors: Santiago E Cortes (Factored.ai); Andrea Quintero (EAFIT university)

Abstract: Justifying draconian measures during the Covid-19 pandemic was difficult notonly because of the restriction of individual rights, but also because of its eco-nomic impact. The objective of this work is to present a machine learning ap-proach to identify regions that should implement similar health policies. For that end, we successfully developed a system that gives a notion of economic impact given the prediction of new incidental cases through unsupervised learningand time series forecasting. This system was built taking into account computational restrictions and low maintenance requirements in order to improve thesystem’s resilience. Finally this system was deployed as part of a web applica-tion for simulation and data analysis of COVID-19, in Colombia, available at(https://epidemiologia-matematica.org).

4. Learning Explainable Interventions to Mitigate HIV Transmission in Sex Workers Across Five States in India[pdf]

Authors: Raghav Awasthi (Indraprastha Institute of Information Technology); Prachi Patel (Catalyst Management Services); Vineet Joshi (Indraprastha Institute of Information Technology Delhi (IIITD)); Shama Karkal (Swasti.org); Tavpritesh Sethi (Indraprastha Institute of Information Technology Delhi (IIITD))

Abstract: Female sex workers(FSWs) are one of the most vulnerable and stigmatized groups in society. As a result, they often suffer from a lack of quality access to care. Grassroot organizations engaged in improving health services are often faced with the challenge of improving the effectiveness of interventions due to complex influences. This work combines structure learning, discriminative modeling, and grass-root level expertise of designing interventions across five different Indian states to discover the influence of non-obvious factors for improving safe-sex practices in FSWs. A bootstrapped, ensemble-averaged Bayesian Network structure was learned to quantify the factors that could maximize condom usage as revealed from the model. A discriminative model was then constructed using XgBoost and random forest in order to predict condom use behavior The best model achieved 83% sensitivity, 99% specificity, and 99% area under the precision-recall curve for the prediction. Both generative and discriminative modeling approaches revealed that financial literacy training and confidence was the primary influence and predictor of condom use in FSWs. These insights have led to a currently ongoing field trial for assessing the real-world utility of this approach. Our work highlights the potential of explainable models for transparent discovery and prioritization of anti-HIV interventions in female sex workers in a resource-limited setting.

5. Deep Learning Towards Efficiency Malaria Dataset Creation [pdf]

Authors: Nyamos S Waigama (The University of Dodoma); Martha Stephen Shaka (The University of Dodoma); Frederick R Apina (University of Dodoma); Emilian A Ngatunga (University of Dodoma); Saidi Mmaka (University of Dodoma); Halidi S Maneno (University of Dodoma)

Abstract: The exponential growth in digital technologies characterizing the fourth industrial revolution, such as artificial intelligence, offers an exciting opportunity to save lives threatened by malaria across sub-Saharan Africa. According to UNICEF, malaria kills a child every 2 minutes, especially in marginalized communities. Malaria mortality can be drastically reduced by ensuring prompt access to diagnosis and treatment. However, using the microscope, the recommended diagnosis tool, is expensive, expert dependent, time-consuming for a single diagnosis and becomes impractical in areas with a high disease burden. Traditional data labeling processes use tools such as Labelmg to manually annotate every object and this makes it an expensive process and time consuming thus leading to limited dataset especially in a marginalized community. In this project, we present findings from research using deep learning techniques to facilitate a fast and effective creation of ground truth datasets to be used in developing relevant malaria diagnosis tools, drawing on data from Tanzania. Our results demonstrate that it took one third less time with high efficiency to annotate the dataset compared to traditional methods. This annotation technique provides the assurance of the availability of high-quality labeled malaria datasets that can be used to develop machine-learning based malaria diagnosis tools. We will release the created ground truth dataset along with the annotation tool for the research community.

6. Incorporating Healthcare Motivated Constraints in Restless Bandit Based Resource Allocation [pdf]

Authors: Aviva Prins (University of Maryland, College Park); Aditya S Mate (Harvard University); Jackson Killian (Harvard University); Rediet Abebe (Harvard University); Milind Tambe (Harvard University)

Abstract: As reinforcement learning is increasingly being considered in the healthcare space, it is important to consider how best to incorporate practitioner expertise. One notable case is in improving tuberculosis drug adherence, where a health worker must simultaneously monitor and provide services to many patients. We find that---without considering domain expertise---state of the art algorithms allocate all resources to a small number of patients, neglecting most of the population. To avoid this undesirable behavior, we propose a human-in-the-loop model, where constraints are imposed by domain experts to improve equitability of resource allocations. Our framework enforces these constraints on the distribution of actions without significant loss of utility on simulations derived from real-world data. This research opens a new line of research inquiry on human-machine interactions in restless multi-armed bandits.

7. Learning drivers of climate-induced human migrations with Gaussian processes [pdf]

Authors: Gustau Camps-Valls (Universitat de València); Maria Piles Guillem (Universitat de València); Jose Maria JM Tarraga (Universidad de Valencia)

Abstract: In the current context of climate change, extreme heat waves, droughts and floods are not only impacting the biosphere and atmosphere but the anthroposphere too. Human populations are forcibly displaced, which are now referred to as climate-induced migrants. In this work we investigate which climate and structural factors forced major human displacements in the presence of floods and storms during years 2017-2019. We built, curated and harmonized a database of meteorological and remote sensing indicators along with structural factors of 27 developing countries world-wide. We show how we can use Gaussian Processes to learn what variables can explain the impact of floods and storms in a context of forced displacements and to develop models that reproduce migration flows. Our results at regional, global and disaster-specific scales show the importance of structural factors in the determination of the magnitude of displacements. The study may have both societal, political and economical implications.

Diacritics and Languages

8. The Challenge of Diacritics in Yoruba Embeddings [pdf]

Authors: Oluwatosin Adewumi (Luleå University of Technology)

Abstract: The major contributions of this work include the empirical establishment of a better performance for Yoruba embeddings from undiacritized (normalized) dataset and provision of new analogy sets for evaluation. The Yoruba language, being a tonal language, utilizes diacritics (tonal marks) in written form. We show that this affects embedding performance by creating embeddings from exactly the same Wikipedia dataset but with the second one normalized to be undiacritized. We further compare average intrinsic performance with two other work (using analogy test set; WordSim) and we obtain the best performance in WordSim; corresponding Spearman correlation.

9. Automated and interpretable m-health discrimination of vocal cord pathology enabled by machine learning [pdf]

Authors: Nabeel Seedat (University of the Witwatersrand); Vered Aharonson (University of the Witwatersrand); Yaniv Hamzany (Rabin Medical Center)

Abstract: Clinical methods that assess voice pathologies are based on laryngeal endoscopy or audio-perceptual assessment, which limits access in low-resourced healthcare settings. M-health systems can provide a quantitative assessment and improve early detection in a patient centered care. We apply automated methods for voice pathology assessment using machine learning methods to discriminate between vocal cord polyp and vocal cord paralysis. Acoustic and spectral features are extracted from audio recording obtained from a low-cost device and multiple classifiers compared using batched cross validation, with the best classifier being the Extra Trees classifier. Explainable AI (XAI) and feature interpretability analysis is carried out to allow clinicians to use the features marked as important in clinical care and planning. The most important features are octave-based spectral contrast and MFCCs 0 to 3. The results indicate a feasibility of machine learning to accurately discriminate between different types of vocal cord pathologies.

10. Accurate and Scalable Matching of Translators to Displaced Persons for Overcoming Language Barriers [pdf]

Authors: Divyansh Agarwal (Delta Analytics), Yuta Baba (Delta Analytics), Pratik Sachdeva (University of California, Berkeley), Tanya Tandon (Delta Analytics), Thomas Vetterli (Delta Analytics), Aziz Alghunaim (Tarjimly)

Abstract: Residents of developing countries are disproportionately susceptible to displacement as a result of humanitarian crises. During such crises, language barriers impede aid workers in providing services to those displaced. To build resilience, such services must be flexible and robust to a host of possible languages. Anonymous(1) aims to overcome these barriers by providing a platform capable of matching bilingual volunteers to displaced persons or aid workers in need of translating. However, Anonymous’ large pool of translators comes with the challenge of selecting the right translator per request. In this paper, we describe a machine learning system capable of matching translator requests to volunteers at scale. We demonstrate that a simple logistic regression, operating on easily computable features, can accurately predict and rank translator response. In deployment, this lightweight system matches 82% of requests with a median response time of 59 seconds, allowing aid workers to accelerate their services supporting displaced persons.

Mapping Using Non-Traditional Data Sources

11. Assessing the Quality of Gridded Population Data for Quantifying the Population Living in Deprived Communities [pdf]

Authors: Agatha Mattos (University College Dublin); Gavin McArdle (University College Dublin); Michela Bertolotto (University College Dublin)

Abstract: Over a billion people live in slums in settlements that are often located in ecologically sensitive areas and hence highly vulnerable. This is a problem in many parts of the world, but it is more poignant in low income countries, where in 2014 on average 65% of the urban population lived in slums. As a result, building resilient communities requires quantifying the population living in these deprived areas and improving their living conditions. However, most of the data about slums comes from census data which is only available at aggregate levels and often excludes these settlements. Consequently, researchers have looked at alternative approaches. These approaches, however, commonly rely on expensive high-resolution satellite imagery and field-surveys which hinders their large scale applicability. In this paper we investigate a cost-effective methodology to estimate the slum population by assessing the quality of gridded population data for this task. We evaluate the accuracy of the WorldPOP and LandScan population layers against ground-truth data composed of 1703 georeferenced polygons that were mapped as deprived areas and which had their population surveyed during the 2010 Brazilian census. While the LandScan did not produce satisfactory results for most polygons, the WorldPOP estimates were less than 20% off for 67% of the polygons and the overall error for the totality of the studied area was only -5.9%, which suggests that population layers with a resolution of at least a 100m, such as WorldPOP’s, can be useful tools to estimate the population living in slums.

12. Interpretable Poverty Mapping using Social Media Data, Satellite Images, and Geospatial Information [pdf]

Authors: Chiara Ledesma (Thinking Machines Data Science); Oshean Garonita (Thinking Machines Data Science); Lorenzo Jaime Flores (Thinking Machines Data Science); Isabelle B Tingzon (Thinking Machines Data Science); Danielle Dalisay (Thinking Machines Data Science)

Abstract: Access to accurate, granular, and up-to-date poverty data is essential for humanitarian organizations to identify vulnerable areas for poverty alleviation efforts. Recent works have shown success in combining computer vision and satellite imagery for poverty estimation; however, the cost of acquiring high-resolution images coupled with black box models can be a barrier to adoption for many development organizations. In this study, we present an interpretable and cost-efficient approach to poverty estimation using machine learning and readily accessible data sources including social media data, low-resolution satellite images, and volunteered geographic information. Using our method, we achieve an R2 of 0.66 for wealth estimation in the Philippines, compared to 0.63 using satellite imagery. Finally, we use feature importance analysis to identify the highest contributing features both globally and locally to help decision makers gain deeper insights into poverty.

13. Combining Twitter and Earth Observation Data for Local Poverty Mapping [pdf]

Authors: Lukas Kondmann (German Aerospace Center); Matthias Häberle (German Aerospace Center); Xiaoxiang Zhu (Technical University of Munich (TUM); German Aerospace Center (DLR))

Abstract: Accurate and timely data on economic development is essential for policy-makers in low- and middle-income countries where such data is often unavailable. To fill this gap, existing approaches have used alternative data sources to proxy for levels of local development such as satellite imagery or mobile phone data. In this paper, we underline the power of an underrated data source for poverty mapping: Geolocated tweets. We show that the number of tweets in a region as singular input can already explain 55 % of the variation in local wealth in Sub-Saharan Africa with a simple Random Forest model. When nighttime light and Twitter usage information are combined as inputs to a Random Forest model they already explain 65% of the variation in local wealth which is in the range of state-of-the-art neural network architectures based on satellite images. Our results show that the naive combination of these data sources in a random forest is already competitive in performance and more elaborate fusion approaches are a promising direction to advance the accuracy of poverty mapping.

14. Crowd-Sourced Road Quality Mapping in the Developing World [pdf]

Authors: Benjamin Choi (Stanford University); John Kamalu (Stanford University)

Abstract: Road networks are among the most essential components of a country’s infrastructure. By facilitating the movement and exchange of goods, people, and ideas, they support economic and cultural activity both within and across borders. Up-to-date mapping of the the geographical distribution of roads and their quality is essential in high-impact applications ranging from land use planning to wilderness conservation. Mapping presents a particularly pressing challenge in developing countries, where documentation is poor and disproportionate amounts of road construction are expected to occur in the coming decades. We present a new crowd-sourced approach capable of assessing road quality and identify key challenges and opportunities in the transferability of deep learning based methods across domains.

Satallite Imagery and Satallite Measurements as a Tool for Improving Resilience

15. I Spy With My Electricity Eye: Predicting levels of electricity consumption for residential buildings in Kenya from satellite imagery. [pdf]

Authors: Simone Fobi (Columbia University); Jay Taneja (University of Massachusetts); Vijay Modi (Columbia University)

Abstract: Electricity access planning requires an accurate estimate of electricity consumption levels across different locations. Electricity consumption can be inferred from various sources including direct but scarce sources like survey data and indirect but widely available sources like nighttime lights and satellite images. Leveraging Convolutional Neural Networks (CNNs), this work presents a novel approach for predicting levels of stable residential electricity consumption for buildings using daytime satellite images. We train our model on a dataset of residential electricity bills and corresponding satellite images of the buildings. Our results show that learning from images outperforms a baseline model using VIIRS nighttime lights when predicting individual consumption levels for residential buildings in Kenya.

16. Hi-UCD: A Large-scale Dataset for Urban Semantic Change Detection in Remote Sensing Imagery [pdf]

Authors: Shiqi Tian (Wuhan University); Zhuo Zheng (Wuhan University); Ailong Ma (Wuhan University); Yanfei Zhong (Wuhan University)

Abstract: With the acceleration of the urban expansion, urban change detection (UCD), as a significant and effective approach, can provide the change information with respect to geospatial objects for dynamical urban analysis. However, existing datasets suffer from three bottlenecks: (1) lack of high spatial resolution images; (2) lack of semantic annotation; (3) lack of long-range multi-temporal images. In this paper, we propose a large scale benchmark dataset, termed Hi-UCD. This dataset uses aerial images with a spatial resolution of 0.1 m provided by the Estonia Land Board, including three-time phases, and semantically annotated with nine classes of land cover to obtain the direction of ground objects change. It can be used for detecting and analyzing refined urban changes. We benchmark our dataset using some classic methods in binary and multi-class change detection. Experimental results show that Hi-UCD is challenging yet useful. We hope the Hi-UCD can become a strong benchmark accelerating future research.

17. Inferring High Spatiotemporal Air Quality Index - A Study in Bangkok [pdf]

Authors: Muhammad Rizal Khaefi (Pulse Lab Jakarta)

Abstract: Robust estimates of human exposure to inhaled air pollutants are necessary for a realistic appraisal of the risks these pollutants pose, and for the design and implementation of strategies to control and limit those risks. However, these tasks are challenging in Bangkok due to few number of official air quality sensors and huge city administrative areas. This research couples data from government official air quality sensors with multiple data sources from ride-hailing, satellite measurement, transportation, official statistics, and meteorological information to infers daily air quality index in three months sample of high, normal, and low seasons for whole Bangkok city at 1km x 1km spatial resolution. The best model shows 0.6 r2 performance using a Land Use Regression (LUR) approach.

18. Application of Convolutional Neural Networks in Food Resource Assessment [pdf]

Authors: Muhammad Shakaib Iqbal (COMSATS University); Talha Iqbal (National University of Ireland); Hazrat Ali (Umea University, Umea)

Abstract: The Kingdom of Tonga faces frequent natural disasters that directly cause damages to natural food resources such as coconut trees. This work presents a deep learning approach to segment coconut trees within a given satellite imagery. This can provide support is assessment of damages to food resource. The pipeline uses mask R-CNN with ResNet-101 backbone architecture. The aerial imagery is provided through the AI competition and organized by The World Bank in collaboration with OpenAerialMap and WeRobotics. We perform several experiments with different configuration parameters and report the best configuration for detection of coconut trees with more than 90% confidence factor. For evaluation purpose, we use the Microsoft COCO dataset evaluation metric namely mean average precision (mAP). We achieve an overall 91% mean average precision for coconut trees detection.

19. Enhancing Poaching Predictions for Under-Resourced Wildlife Conservation Parks Using Remote Sensing Imagery [pdf]

Authors: Rachel Guo (Harvard University); Lily Xu (Harvard University); Milind Tambe (Harvard University)

Abstract: Illegal wildlife poaching is driving the loss of biodiversity. To combat poaching, rangers patrol expansive protected areas for illegal poaching activity. However, rangers often cannot comprehensively search such large parks. Thus, the Protection Assistant for Wildlife Security (PAWS) was introduced as a machine learning approach to help identify the areas with highest poaching risk. As PAWS is deployed to parks around the world, we recognized that many parks have limited resources for data collection and therefore have scarce feature sets. To ensure under-resourced parks have access to meaningful poaching predictions, we introduce the use of publicly available remote sensing data to extract features for parks. By employing this data from Google Earth Engine, we also incorporate previously unavailable dynamic data to enrich predictions with seasonal trends. We automate the entire data-to-deployment pipeline and find that, with only using publicly available data, we recuperate prediction performance comparable to predictions made using features manually computed by park specialists. We conclude that the inclusion of satellite imagery creates a robust system through which parks of any resource level can benefit from poaching risks for years to come.

Economic Resilience

20. Assessing the use of transaction and location based insights derived from Automatic Teller Machines (ATM’s) as near real time “sensing” systems of economic shocks [pdf]

Authors: Dharani Dhar Burra (United Nations Global Pulse ); Sriganesh Lokanathan (United Nations Global Pulse)

Abstract: Big data sources provide a significant opportunity for governments and development stakeholders to “sense” and identify in near real time, economic impacts of shocks on populations at high spatial and temporal resolutions. In this study, we assess the potential of transaction and location based measures obtained from automatic teller machine (ATM) terminals, belonging to a major private sector bank in Indonesia, to “sense” in near real time, the impacts of shocks across income groups. For every customer and separately for years 2014 and 2015, we model the relationship between aggregate measures of cash withdrawals for each year, total inter-terminal distance traversed by the customer for the specific year and reported customer income group. Results reveal that the model was able to predict the corresponding income groups with 80% accuracy, with high precision and recall values in comparison to the baseline model, across both the years. Shapley values further show that the total inter-terminal distance traversed by a customer across multiple ATM terminals in each year had significantly high resolving power. Descriptive statistics and Kruskal-Wallis testing further confirm these observations, and reveal that customers in the lower-middle class income group, have significantly high median values of inter-terminal distances traversed (7.21 Kms for 2014 and 2015) in comparison to high (2.55 Kms and 0.66 Kms for years 2014 and 2015), and low (6.47 Kms for 2014 and 2015) income groups. Although no shocks were noted in 2014 and 2015, results show that lower-middle class income group customers, exhibit relatively high mobility in comparison to customers in low- and high-income groups. Additional work is required to further realize the potential of the sensing capabilities of this data source in the event of a shock, however its ability to provide high-resolution insights on, who, where and by how much is the population impacted by a shock, can facilitate targeted responses.

21. Who is more ready to get back in shape? [pdf]

Authors: Rajius Idzalika (Pulse Lab Jakarta)

Abstract: This empirical study estimates resilience (adaptive capacity) around the periods of the 2013 heavy flood in Cambodia. We use nearly 1.2 million microfinance institution (MFI) customer data and implement the unsupervised learning method. Our results highlight the opportunity to develop resilience by having a better understanding of which areas are likely to be more or less resilient based on the characteristics of the MFI customers, and the individual choices or situations that support stronger adaptiveness. We also discuss the limitation of this approach.

22. Bandit Data-driven Optimization: AI for Social Good and Beyond [pdf]

Authors: Zheyuan Ryan Shi (Carnegie Mellon University); Steven Wu (Carnegie Mellon University); Rayid Ghani (Carnegie Mellon University); Fei Fang (Carnegie Mellon University)

Abstract: The use of machine learning (ML) systems in real-world applications entails more than just a prediction algorithm. AI for social good (AI4SG) applications, and many real-world ML tasks in general, feature an iterative process that joins prediction, optimization, and data acquisition in a loop. We introduce bandit data-driven optimization, the first iterative prediction-prescription framework to combine the advantages of online bandit learning and offline predictive analytics. It offers a flexible setup to reason about unmodeled policy objectives and unforeseen consequences. We propose PROOF, the first algorithm for this framework and show that it achieves no-regret. PROOF achieves superior performance over existing baseline in numerical simulations.

Google Sites

Report abuse