Schedule

Recordings for each talk are linked in the schedule below and on our YouTube channel.


9:30-9:45 A.M.

Check-In & Welcome

9:30-9:45 a.m.

Bistra Dilkina and Eric Rice, USC CAIS Co-Directors

9:45-10:15 A.M.

Using Natural Language Processing and Discourse Analysis to Understand LGBTQ+ Marginalization

9:45-9:55 a.m.

Abstract: People who identify as lesbian, gay, bisexual, transgender, or with other minoritized sexual and gender identities (LGBT+) experience pronounced disparities in social adversity, behavioral risks, and health and well-being. Most studies of LGBT+ health disparities stem from self-reported federal health surveys, the majority of which have only begun collecting self-reported sexual orientation and gender identity (SOGI) data within the last 5 years. Although inclusion of SOGI in surveys has propelled health equity research for LGBT+ populations, studies still suffer from hampered statistical power due to small sample sizes and limited measures collected in these surveys. Recent efforts to identify LGBT+ Veterans have been underway at the Veterans Health Administration, capitalizing on large data repositories of electronic health records (EHR) containing both structured (e.g., diagnosis coding) and unstructured (e.g., clinical progress notes) data. This presentation will feature, first, a summary of selected efforts, some of which include machine learning algorithms and natural language processing. These include studies corroborating use of International Classification of Disease (ICD) codes as a proxy for identifying transgender and gender diverse patients; identifying LGBT-related terminology based on natural language processing of clinical progress notes; and cross-referencing structured EHR data on sex with data from official death records to reveal misclassification for transgender and gender diverse patients. Second, there will be a discussion of quandaries related to ethical use and interpretation of, as well as systemic biases in, EHR data for finding marginalized populations, such as concerns around activating passive research into patient recruitment and where, in the clinical ecosystem, mentions of unstructured SOGI data are most likely to appear.
Third, there will be consideration of the compromises around including SOGI information in administrative data in contexts that are hostile to LGBT+ individuals. For example, in states that have criminalized or are currently criminalizing gender-affirming care, will we see intentionally evasive or “guerilla” coding that assures patients can get care but would disrupt research efforts reliant on structured and unstructured data? Would structured SOGI fields be feasible in health care contexts given overt and covert discrimination against LGBT+ people, which could lead to patients refusing to answer such questions?

9:55-10:05 a.m.

Abstract: In June of 2020, the names of George Floyd, Breonna Taylor, and Ahmaud Arbery became rallying cries for the Black Lives Matter (BLM) movement and its stand against police brutality and racial injustice. Amidst those cries were the names of other lesser-known victims — e.g., Dominique Fells, Riah Milton, Tony McDade — who represented just a few of the at least 302 violent deaths of transgender and gender-nonconforming people reported since 2013, a majority of whom have been Black.

Bearing witness to this “epidemic” of violence, largely through social media, is the Black Trans Lives Matter (BTLM) movement. BTLM is a grassroots response to the oppression of Black transgender people and the perceived failures of LGBTQ+ rights and Black Lives Matter movements and other institutions to adequately prioritize their intersectional struggles. In this presentation, I will discuss a new line of research that endeavors to bring computational and human-centered methods to bear on the interrogation of contemporary media framings of this issue. Specifically, the study aims to characterize social media discourse about lost lives in the Black trans community and compare that to the discourse of mainstream media on the same subject with the purpose of unpacking discrepancies between grassroots (community-led) and institutional framings of this critical issue and the implications of these discrepancies for public understanding and structural reform.

To these ends, we draw on Twitter data collected March - July 2021 using the Twitter Streaming API and a series of keywords and phrases representative of the BTLM movement itself (e.g., “btlm,” “black trans lives matter”) and those related to Black trans collective action more broadly (e.g., “black trans activists,” “black trans rights”). This larger corpus was further filtered using terms for the racial and sexual orientation and gender identity (SOGI) demographics of interest combined with terms about loss of life (e.g., “African American” + “transgender” + “murder”). Tweets mentioning names of known Black transgender people who were victims of fatal violence in 2020 and 2021 were also included. This yielded a deduplicated dataset of 11,035 tweets about loss of life. To characterize these tweets, we inductively developed a thematic coding scheme that we then applied to a random 10% sample of the dataset (n = 1,104). In total, 14 themes were identified; examples include documenting cases, perpetrators, calls for action, systemic causes, and bridging the struggles of Black trans people with those of other minority groups.

The presentation will be organized in three parts. First, I will highlight the research infrastructure that supports the project, with an emphasis on its mixed methodological approach and community-based orientation. Second, I will preview the data and preliminary findings, focusing primarily on the themes we discovered in the human-centered coding of tweets about loss of life in the Black trans community. Finally, I will conclude with an overview of next steps and thoughts on the viability of using publicly available social media data to fill information gaps in victim case files, to inform policy reform, and to intervene in high risk locations.

10:05-10:10 a.m.

Abstract: The presence of non-binary gender individuals in social networks is increasing; however, the relationship between gender and activity within online communities is not well understood and limited by the failures of automated gender recognition algorithms to recognize non-binary individuals. We use natural language processing to investigate individual identity on the Twitter platform, focusing on gender expression as represented by users' chosen pronouns from among 14 different pronoun groups. We find that non-binary groups tend to be more active on the platform, preferring to post messages rather than liking others' messages, compared to binary groups. Additionally, non-binary groups receive more replies, but their messages are reshared and liked less. We also find significant variation in the emotional expressions within non-binary groups. The study highlights the importance of considering gender as a spectrum, rather than a binary, in understanding online interactions and expression.

10:10-10:15 a.m.

Q&A

10:15-10:45 A.M.

Algorithmic Fairness and Robustness: Methodological and Public Policy Implications

10:15-10:25 a.m.

Abstract: In consequential domains, we ultimately require a human in the loop to oversee decisions. Hence, algorithmic allocations, at best, are only encouragements to the human decision-maker towards one decision or another. When these decisions have causal effects on the outcome, such as recommendations for different levels of sanction or support, this relates to the study of (optimal) encouragement designs in causal inference. Under this point of view, algorithmic recommendations are ultimately only encouragements into treatments that have causal effects, but algorithmic recommendations by themselves do not have causal effects. We study optimal encouragement designs in consequential settings and implications for racial disparities. Our motivating example is the case of supervised release/electronic monitoring in pretrial risk assessments. Currently, supervised release programs are expanding, leading to concerns about net-widening or undue surveillance. Despite this, there remains little research on who, if anyone, ought to receive supervised release. Judges have wide discretion in whom to release. Some previously used decision-making matrices do make recommendations regarding electronic monitoring, but it is not clear on what grounds. Modeling the two-stage decision-making process can help audit whether racial disparities in outcomes arise from differential causal effects or differential compliance from humans-in-the-loop, which are policy-relevant distinctions.

Our contributions in this work are as follows: we study algorithmic allocation in optimal encouragement designs, when algorithms are only encouragements for the human-in-the-loop to assign treatment. We consider a setting with additive costs for the firm/state for final outcomes and treatment take-up. Motivated by practical resource constraints that limit the total expected cost of treatment, we first study comparative statics and fairness considerations of optimal encouragement design allocations. We study demographic parity constraints on total expected treatment take-up and welfare implications relative to unconstrained optimal allocations. This motivates novel formulations of optimal disparity reduction subject to constraints on utility loss reduction relative to the unconstrained optimal solution. With these population characterizations in hand, we move on to estimation. We develop multiply robust estimators that improve statistical estimation based on estimated outcome models (probabilities of new criminal activity given supervised release, for example) and estimated treatment assignment models (probabilities of a judge assigning supervised release given algorithmic recommendation). Leveraging a stochastic optimization formulation of the problem, we develop algorithms for policy learning and profiled inference.

Lastly, we conduct extensive analysis of a case study of the Arnold Public Safety Assessment Decision-Making Framework and investigate supervised release in Cook County. The given PSA DMF at the time recommended supervised release for defendants in an intermediate range. The evidence on beneficial treatment effects of supervised release (i.e. electronic monitoring rather than pretrial detention or unconditional release) is mixed. We observe wide variation in judges assigning supervised release beyond the recommendation. We apply our methods and analysis on a dataset of judicial decisions regarding detention, bail, electronic monitoring and release.

10:25-10:30 a.m.

Abstract: We study the problem of allocating scarce societal resources of different types (e.g., permanent housing, deceased donor kidneys for transplantation, ventilators) to heterogeneous allocatees on a waitlist (e.g., people experiencing homelessness, individuals suffering from end-stage renal disease, Covid-19 patients) based on their observed covariates. We leverage administrative data collected in deployment to design a counterfactual online policy that maximizes expected outcomes while satisfying budget and fairness constraints in the long run. Our proposed policy waitlists each individual for the resource maximizing the difference between their estimated mean treatment outcome and the estimated resource dual-price or, roughly, the opportunity cost of using the resource. Resources are then allocated as they arrive, in a first-come, first-served fashion. We demonstrate that our data-driven policy converges to the optimal out-of-sample policy under mild technical assumptions. We evaluate the performance of our approach on the problem of designing policies for allocating scarce housing resources to people experiencing homelessness in Los Angeles, based on data from the homeless management information system. In particular, we show that using our policies improves rates of exit from homelessness by 1.2% and that policies that are fair in either allocation or outcomes by race come at a very low price of fairness.
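As a minimal sketch of the waitlisting rule described above (not the authors' implementation), each person is pointed to the resource maximizing estimated mean outcome minus the resource's estimated dual price; all names and numbers below are invented for illustration:

```python
# Illustrative sketch of the dual-price waitlisting rule; outcome estimates
# and dual prices are made-up numbers, not fitted values from the paper.

outcome = {  # estimated mean treatment outcome per (person, resource)
    "p1": {"rapid_rehousing": 0.40, "permanent_housing": 0.85},
    "p2": {"rapid_rehousing": 0.55, "permanent_housing": 0.60},
}
dual_price = {"rapid_rehousing": 0.10, "permanent_housing": 0.45}  # opportunity costs

def waitlist_choice(person):
    """Waitlist the person for the resource maximizing outcome minus dual price."""
    return max(outcome[person], key=lambda r: outcome[person][r] - dual_price[r])

for p in outcome:
    print(p, waitlist_choice(p))  # resources are then assigned first-come, first-served
```

The dual prices would in practice come from the budget constraints of the underlying optimization; here they simply encode that permanent housing is the scarcer resource.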

10:30-10:35 a.m.

Abstract: While training fair machine learning models has been studied extensively in recent years, the majority of the developed methods rely on the assumption that the training and test data have similar distributions. In the presence of distribution shifts, fair models may behave unfairly on test data. Further, the proposed mitigation solutions are either designed based on the assumption of having access to the causal graph describing the interaction of different features, or on knowing the exact type of the distribution shift a priori.

In this talk, we propose the first distribution-shift-agnostic fairness framework with convergence guarantees for both full-batch and stochastic first-order optimization methods. 

More specifically, we formulate fair inference in the presence of distribution shift as a distributionally robust optimization problem under L_p norm uncertainty sets, with the Exponential Renyi Mutual Information (ERMI) as the measure of fairness violation. We demonstrate the superiority of the presented framework in terms of performance and efficiency through extensive experiments on real datasets exhibiting distribution shifts.
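Schematically, and only as a sketch (the exact objective, constraint placement, and regularization in the talk may differ), a distributionally robust fair learning problem of this kind can be written as:

```latex
\min_{\theta} \; \max_{Q \in \mathcal{B}_{\varepsilon}(\hat{P})}
  \; \mathbb{E}_{Q}\!\left[\ell\big(f_{\theta}(x), y\big)\right]
  \;+\; \lambda \, \mathrm{ERMI}_{Q}(f_{\theta}),
\qquad
\mathcal{B}_{\varepsilon}(\hat{P}) = \bigl\{\, Q : \|Q - \hat{P}\|_{p} \le \varepsilon \,\bigr\}
```

where \hat{P} is the empirical training distribution, \mathcal{B}_{\varepsilon}(\hat{P}) is the L_p-norm uncertainty set, and \lambda trades off accuracy against the ERMI fairness-violation penalty.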

10:35-10:40 a.m.

Abstract: We consider the problem of learning classification trees that are robust to distribution shifts between training and testing/deployment data. This problem arises frequently in high stakes settings such as public health and social work where data is often collected using self-reported surveys which are highly sensitive to e.g., the framing of the questions, the time when and place where the survey is conducted, and the level of comfort the interviewee has in sharing information with the interviewer. We propose a method for learning optimal robust classification trees based on mixed-integer robust optimization technology. In particular, we demonstrate that the problem of learning an optimal robust tree can be cast as a single-stage mixed-integer robust optimization problem with a highly nonlinear and discontinuous objective. We reformulate this problem equivalently as a two-stage linear robust optimization problem for which we devise a tailored solution procedure based on constraint generation. We evaluate the performance of our approach on numerous publicly available datasets, and compare the performance to a regularized, non-robust optimal tree. We show an increase of up to 14.16% in worst-case accuracy and of up to 4.72% in average-case accuracy across several datasets and distribution shifts from using our robust solution in comparison to the non-robust one.

10:40-10:45 a.m.

Q&A

10:45-11:00 A.M.

Break

11:00-11:30 A.M.

AI, Conservation, and Disaster Resilience

11:00-11:10 a.m.

Bistra Dilkina

11:10-11:15 a.m.

Abstract: Natural disasters such as earthquakes and floods cause widespread disruptions to critical infrastructures, individual lives, and community well-being. In particular, the damage and disabling of critical infrastructures such as water systems can have extremely detrimental effects. Therefore, it is vital to have informed, effective and efficient disaster mitigation, preparedness and response that make these systems more resilient and minimize negative impacts when failures occur. In this research, we study an important piece of infrastructure: city water pipe networks. We work closely with the Los Angeles Department of Water and Power (LADWP). Los Angeles has high seismic hazard exposure that can result in water pipe breakage and disruption of water availability to customers such as hospitals, evacuation centers and fire stations during disasters. However, given the complexity of the water network, the spatially varied seismic risk and locations of customers, as well as the limited resources available, the planning problem becomes complex. In this work, we make several contributions. First, we develop optimization approaches based on integer linear programming to strategically identify where seismically resilient pipes should be installed. The total pipe length to be upgraded is more than a hundred miles, but LADWP's budget covers only about 10-30 miles per year. As the second contribution, we investigate the problem of planning partial network installments to maximize efficiency over the years and develop an effective sequential planning algorithm. Third, we apply the methods to this problem in the context of Los Angeles. Our approach finds optimal plans within just 18 minutes for an entire Service Zone in LA that includes 34,462 pipes and 300 critical customers. Our close collaboration with LADWP and their strong interest in using our methods resulted in us creating a first usable prototype of the algorithms.
We have delivered the first version to LADWP, and they were in fact able to run the prototype in-house on their own systems and with their own data. We received preliminary feedback on usability and usefulness that will guide expanding the preliminary tool into a fully functional prototype, scoped beyond the needs of this single client with an outlook to the broader set of potential customers and their needs.

11:15-11:20 a.m.

Abstract: Wildlife trafficking (WT), the illegal trade of wild fauna, flora, and their parts, directly threatens biodiversity and conservation of trafficked species, while also negatively impacting human health, national security, and economic development. 

Wildlife traffickers obfuscate their activities in plain sight, leveraging legal, large, and globally linked transportation networks.

To complicate matters, defensive interdiction resources are limited, datasets are fragmented and rarely interoperable, and interventions like setting checkpoints place a burden on legal transportation. As a result, interpretable predictions of which routes wildlife traffickers are likely to take can help target defensive efforts and understand what wildlife traffickers may be considering when selecting routes.

We propose a data-driven model for predicting trafficking routes on the global commercial flight network, a transportation network for which we have some historical seizure data and a specification of the possible routes that traffickers may take. While seizure data has limitations such as data bias and dependence on the deployed defensive resources, this is a first step towards predicting wildlife trafficking routes on real-world data. Our seizure data documents the planned commercial flight itinerary of trafficked and successfully interdicted wildlife. We aim to provide predictions of highly-trafficked flight paths for known origin-destination pairs with plausible explanations that illuminate how traffickers make decisions based on the presence of criminal actors, markets, and resilience systems. 

We propose a model that first predicts the likelihood of each commercial flight being taken out of a given airport from input features, and then finds the highest-likelihood flight path from origin to destination using a differentiable shortest path solver. This allows us to automatically align our model's loss with the overall goal of correctly predicting the full flight itinerary from a given source to a destination.

We evaluate the proposed model's predictions and interpretations both quantitatively and qualitatively, showing that the predicted paths are aligned with observed held-out seizures, and can be interpreted by policy-makers.
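As a toy illustration of the two-stage idea above (predict per-flight likelihoods, then recover the highest-likelihood path), the inference step can be sketched as a standard shortest-path search over negative log-likelihood edge costs. The airports and probabilities below are invented; the actual system uses a differentiable solver so the path computation can be trained end to end, which this sketch does not show:

```python
import math, heapq

# Hypothetical edge "likelihoods" (as if produced by a learned model) on a
# toy flight network; airport codes and values are illustrative only.
edge_prob = {
    ("JFK", "LHR"): 0.6, ("JFK", "DXB"): 0.3,
    ("LHR", "NBO"): 0.5, ("DXB", "NBO"): 0.7,
}

def most_likely_path(probs, src, dst):
    """Max-probability path = shortest path under -log(p) edge costs."""
    graph = {}
    for (u, v), p in probs.items():
        graph.setdefault(u, []).append((v, -math.log(p)))
    # Standard Dijkstra on the transformed costs.
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return path[::-1]

print(most_likely_path(edge_prob, "JFK", "NBO"))
```

Maximizing the product of edge probabilities is equivalent to minimizing the sum of their negative logs, which is why an ordinary shortest-path routine suffices for inference.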

11:20-11:25 a.m.

Abstract: The devastating effects of wildfires in California have led researchers to explore the use of deep learning models to detect and predict their spread. However, current wildfire spread prediction models require fire perimeter data, which must be manually collected due to occlusion from smoke. To address this challenge, we propose a new model that can predict wildfire spread by mapping out the probability of a fire spreading to a certain location over a given number of days, without using fire perimeter data.

The proposed model utilizes meteorological, topographical, and vegetation imagery data from northern California to predict wildfire spread, as these three factors were found to be the most impactful in wildfire propagation. Meteorological data, including humidity, wind speed, wind direction, and temperature, from the Environmental Protection Agency (EPA) will be fed into a Long Short-Term Memory (LSTM) model, which has been shown to achieve higher accuracy in wildfire prediction than other sequential models such as recurrent neural networks.

Topographical data from the United States Geological Survey’s (USGS) Lidar datasets will be passed through a Convolutional Neural Network (CNN) model to extract terrain features related to elevation. Meanwhile, satellite imagery showing vegetation will be processed through another CNN model to gain information about the most likely direction of fire spread. The outputs of all these models will be combined to produce a heat map corresponding to the probability of wildfire spreading around a particular location.

This proposed model has the potential to improve wildfire prediction accuracy and could be trained on more accessible data. This can help authorities to better prepare for and mitigate the impact of wildfires in California.

Overall, through this project, we hope to demonstrate the value of combining multiple data sources and deep learning models to address complex environmental challenges. The proposed model could be applied to other regions and disasters, contributing to the development of more effective and data-driven approaches to disaster response and management.

11:25-11:30 a.m.

Q&A

11:30 A.M.-12 P.M.

Messages from the Deans

11:30 a.m.-12:00 p.m.

Yannis Yortsos

Dean of USC Viterbi School of Engineering

Vassilios Papadopoulos

Interim Dean of USC Suzanne Dworak-Peck School of Social Work

Dean of USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences

Suzanne Wenzel

Richard M. and Ann L. Thor Professor in Urban Social Development

Associate Dean for Research, USC Suzanne Dworak-Peck School of Social Work

Ishwar K. Puri

Senior Vice President of Research and Innovation, USC Office of Research

12:00-1:30 P.M.

Lunch & Poster Session

12:00-12:30 p.m.

Lunch

12:30-1:30 p.m.

Lunch, continued + Poster Session

1:30-2:00 P.M.

COVID-19 and Diverse Computational Strategies

1:30-1:36 p.m.

Introduction:

The COVID-19 pandemic has greatly impacted the lives of people all over the world. Social media has become a primary source of information and communication during this time, and it has been shown to play a significant role in shaping public health-related attitudes and behaviors. We aim to explore the causal relationship between social media discourse and real-world public health behavior, such as lockdowns and vaccinations, using a combination of sentiment analysis and graph neural network (GNN) driven influence modeling. The outcome of this work is a model that can predict public health-related behavior early. For example, how much adoption of a new vaccine can we expect in a population?

Objectives:

Methods: (i) Data Collection and Pre-processing – We have collected a representative sample of tweets using Twitter's API, considering factors such as the geographic location, language, and age of the users. The collected data was cleaned by removing irrelevant or duplicate tweets. (ii) Sentiment Analysis – We are currently working on generating a representation of the tweets as a vector of sentiment using a sentiment analysis algorithm, which is a combination of a rule-based method and a machine learning algorithm. (iii) Graph Representation – The relationships between different users will be represented as a graph, and the sentiment over time will form the dynamic attributes. (iv) Graph Neural Network Modeling – A graph neural network will be trained on this graph representation to predict the evolution of sentiment over time. We will identify causal relationships that drive the changes in sentiment. Further, we will find causal relationships between the evolving sentiment and real-world health behavior such as vaccine adoption. (v) Model Evaluation – The performance of the model will be evaluated using metrics such as mean absolute error in retrospectively predicting changes in contact rates and vaccine adoption based on social media sentiment on policy announcements.
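As a toy sketch of the graph representation and modeling steps above, users can be treated as graph nodes carrying sentiment attributes, and one round of neighborhood averaging mimics the kind of message passing a trained GNN would learn; the users, edges, and sentiment scores below are invented:

```python
# Toy sketch: users as graph nodes with sentiment attributes, plus one round
# of neighborhood averaging (a hand-written stand-in for a learned GNN layer).
# Users, edges, and sentiment scores are fabricated for illustration.

sentiment = {"u1": 0.8, "u2": -0.4, "u3": 0.2}   # per-user tweet sentiment
edges = [("u1", "u2"), ("u2", "u3")]             # follower/interaction graph

neighbors = {u: set() for u in sentiment}
for a, b in edges:
    neighbors[a].add(b)
    neighbors[b].add(a)

def propagate(s):
    """Average each user's sentiment with their neighbors' (one GNN-style hop)."""
    return {u: (s[u] + sum(s[v] for v in neighbors[u])) / (1 + len(neighbors[u]))
            for u in s}

print(propagate(sentiment))
```

A real GNN would replace the fixed averaging with learned weights and stack several such hops, then read off predictions of future sentiment from the node states.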

1:36-1:42 p.m.

Purpose

San Francisco County (SFC) shifted many non-emergency healthcare resources to COVID-19 and placed a Shelter in Place (SIP) order that limited nonessential social activities, which reduced HIV care initiation and retention. We quantify the COVID-19 effects on HIV burden among men who have sex with men (MSM) as SFC returns to normal service levels and progresses towards the HIV Ending the Epidemic (EHE) goals.

Methods

We use an individual level, discrete time, Markovian simulation of MSM in SFC that tracks HIV disease progression and treatment while considering COVID-19-related reductions in testing, viral load suppression (VLS), PrEP uptake and retention, reduction in sexual partners, and COVID-19 deaths. To identify the importance of rapid return to normalcy, we consider scenarios where COVID-19 effects end in 2022 or 2025 and compare outcomes to the counterfactual where COVID-19 never occurred. We also examine scenarios where resources are prioritized to new or existing patients from 2023-2025 before all services return to normal. 

Results

The annual number of MSM prescribed PrEP, VLS, new diagnoses, incidence, and knowledge of HIV status rebounds quickly after HIV care returns to normal. The model suggests that if COVID-19 effects stop in 2022, they will reduce PrEP use by 12% (95% uncertainty interval: 11.8%-12.2%) person-years from 2020-2035 and VLS by 1.4% (1.1%-1.6%) person-years, and increase incidence by 3.1% (1.3%-4.9%) cases and deaths by 1.3% (1%-1.5%). The cumulative burden is larger if these effects end in 2025, with 23.3% (23.2%-23.5%) reduced PrEP person-years, 2.5% (2.3%-2.8%) reduced VLS person-years, and 9.1% (7.4%-10.8%) more new cumulative cases. Prioritizing care to existing versus new patients will not substantially change cumulative incidence, although it will result in more person-years of PrEP but fewer VLS person-years and more deaths. All EHE goals besides PrEP are unchanged by COVID-19 effects, with incidence and VLS goals unmet even without COVID-19 effects.

Conclusions

The sooner HIV care returns to pre-COVID-19 service levels, the lighter the cumulative burden. For example, the cumulative difference from a non-COVID-19 counterfactual in 2035 is 6% higher if COVID-19 ends in 2025 than in 2022. Whether care is prioritized to new or existing patients does not make a large difference in incidence but may influence PrEP and VLS counts. However, COVID-19 effects do not substantially alter the likelihoods of reaching EHE goals in SFC.

1:42-1:48 p.m.

Abstract: Since its early emergence, the COVID-19 pandemic led to the implementation of various public health policies, such as school closures and restrictions on travel, to stem its spread. While such policies were intended as beneficial health interventions, some affected public sentiment more than others, especially in the early period (first two quarters of 2020) when the virus was not well-understood, vaccines and cures were far from being developed, and healthcare systems were overburdened. Compliance with policies was high at the time, and data is available from that period on the stringency with which these policies were implemented in different states. 

To prepare for future pandemics from a public health standpoint, understanding how different policies (implemented with varying stringencies) impacted the public sentiment or subjective well-being (SWB) of different sociodemographic groups, is critically important. For example, did some policies disproportionately impact the SWB of women more than men, or of lower-income people compared to higher-income people? By rigorously combining individual-level SWB data from Gallup (administered to a representative sample of U.S.-based respondents), and the policy data publicly available in the Oxford Covid-19 Government Response Tracker, it is now possible to study such questions. 

We conducted such a study by using a combination of more traditional statistical methods such as controlled correlations and fixed effects regressions, and a novel conditional inference tree (CIT) model that has previously only been applied either in the medical setting or (in a rare exception) to study climate skepticism. We used this model to quantify and visualize the interaction and conditional effects of relevant socio-demographic variables such as income, gender, occupation and ethnicity, known to be important correlates of SWB. 

Our model showed that strict stay-at-home requirements were negatively associated with SWB and that an individual's specific job area was a significant determinant of SWB but depended strongly on the policy. Furthermore, the model confirmed statistically that income status, gender and the workplace closure policies are significant predictors of SWB. Low-income female individuals in particular were found to have significantly lower well-being, although it cannot be ascertained whether this emerged only during the pandemic. This finding confirms what has since been reported in the press: that the pandemic and the hypothesized ‘K-shaped’ economic recovery had disproportionately negative impacts on women, and possibly minorities.

In subsequent research, we also used the model to understand vaccine acceptance in later phases of the pandemic. Again, the model confirmed statistically that socio-demographic variables, such as age, education, and level of household income, have a significant association with vaccine acceptance, and that there are key points of disagreement with a global survey that was conducted earlier. The model also revealed that trust in government, age, and ethnicity are the important covariates for predicting vaccine hesitancy.

Overall, our study offers a valuable methodological framework to conduct computational social science analyses that can help inform public policy. We hope that the insights from our model assist in advancing the science of pandemic preparedness from a public health perspective.

1:48-1:54 p.m.

Abstract: This paper explores how people’s beliefs changed as a result of the COVID-19 health crisis, using the so-called “mismatch” index in India as a case study. The mismatch index measures the difference between one's own beliefs and one's beliefs about others regarding the importance of taking precautions during COVID-19: it is the average, across respondents, of the disagreement between an individual's personal beliefs and their beliefs about others. It is based on the responses to the questions “How important is it for you to take actions to prevent the spread of COVID-19 in your community?” (Q-Individual) and “How important do other people in your community think it is to take actions to prevent the spread of COVID-19?” (Q-Societal). Responses are first discretized on a scale of 20 to 100 (with 20 meaning “not important at all” and 100 meaning “extremely important”), and the mismatch index is calculated as the average difference across respondents, defined as Q-Individual – Q-Societal. We explore how the mismatch index varies with COVID-19 case rates in India across geographical and temporal zones and the factors causing these variations.
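The index definition above reduces to simple arithmetic; the survey responses below are fabricated for illustration:

```python
# Minimal sketch of the mismatch index as defined above: the average over
# respondents of (Q-Individual - Q-Societal), each on a 20-100 scale.
# The survey responses here are made up for illustration.

responses = [
    {"q_individual": 100, "q_societal": 60},
    {"q_individual": 80,  "q_societal": 80},
    {"q_individual": 60,  "q_societal": 40},
]

def mismatch_index(rows):
    return sum(r["q_individual"] - r["q_societal"] for r in rows) / len(rows)

print(mismatch_index(responses))  # (40 + 0 + 20) / 3 = 20.0
```

A positive value, as here, means respondents on average rate precautions as more important to themselves than they believe others in their community do.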

Using the MIT Covid Surveys as the primary data set, we support our conclusions with regression analysis, through which we find that the mismatch index increases in periods of high case-rate growth. We also provide some qualitative evidence that the development level of a state or territory in India (determined by its average income and education level) plays an important role in shaping the mismatch index. The general trend was that the mismatch index increased whenever the pandemic was at its ‘strongest’. We also explore other characteristics, such as behaviors that help explain the index, as well as immutable time-invariant characteristics such as geography or cultural demographics. The purpose of the state fixed effects was to control for any time-invariant characteristics such as geography or immutable socio-demographic state characteristics. The purpose of introducing the time fixed effect was to check whether the coefficient remains significant once the variation across different waves is taken away. If the value decreased, it would indicate that the time dimension did indeed play a role in establishing the relation between the mismatch index and case rates.

Currently, we are working to understand how changes in beliefs have affected behavior across the different periods of the survey. We then aim to use these beliefs as early predictors of health behaviors.

1:54 - 2:00 p.m.

Q&A

2:00-2:40 P.M.

Public Health, Nutrition, and Pharmacology

2:00-2:10 p.m.

Abstract: Poor diets, including those high in fast food, are a leading cause of morbidity and mortality. Exposure to low-quality food environments, such as ‘food swamps’ saturated with fast food outlets (FFO), is hypothesized to negatively impact diet and related diseases. However, research linking such exposure to diet and health outcomes has generated mixed findings and led to unsuccessful policy interventions. A major limitation of prior research has been its predominant focus on static food environments around the home, such as food deserts and food swamps, and the sparse availability of information on the mobile food environments people are exposed to and the food outlets they visit as they move throughout the day. In this work, we leverage population-scale mobility data to examine people’s visits to food outlets and FFO in and beyond their home neighborhoods, and to evaluate how food choice in the US is influenced by features of the food environments people encounter in their daily routines versus individual preference. Using a semi-causal framework and various natural experiments, we find that a 10% higher prevalence of FFO (as a share of all food outlets) in an area increases the odds that people moving within it visit an FFO by approximately 20%. This strong influence of the food environment is similar on weekends and weekdays and largely independent of individual income. Using our results, we investigate multiple strategies for intervening in food environments to reduce FFO visits. We find that optimal locations for intervention combine areas where i) the prevalence of FFO is highest, ii) most decisions about food outlet visits are made, and, most importantly, iii) visitors’ food decisions are most susceptible to the environment. Multi-level interventions at the individual-behavior and food-environment levels that target areas combining these features could have 1.7x to 4x larger effects than traditional interventions that alter food swamps or food deserts.

2:10-2:15 p.m.

Background. Unintentional drug overdoses (or poisonings) are a leading cause of morbidity and mortality in the U.S. Despite the growing use of polypharmacy, or the concurrent use of multiple medications, evidence on the underlying causes of these overdoses has been limited to specific medications and medication combinations, including opioid analgesics, benzodiazepines, and stimulants, often associated with drug abuse and misuse. Many commonly used medications and medication combinations, however, have drug effects and interactions that may increase overdose risk, including β-adrenergic agonists (e.g., albuterol), β-blockers (e.g., metoprolol), antihistamines (e.g., hydroxyzine), muscle relaxants (e.g., tizanidine), and anticonvulsants (e.g., gabapentin).

Objectives. To characterize common patterns of medications and medication combinations used among individuals with unintentional drug overdoses in the U.S.

Methods. We identified individuals within IQVIA Dx/Hx medical and institutional claims data who experienced an incident, unintentional drug overdose in 2019. Using a self-controlled study design, we analyzed the most common medications and medication combinations (or drug pairs) during the 60 days preceding the index date for the 'case' and 'control' periods. The index date for the 'case' period is the date of the incident unintentional drug overdose, while the index date for the 'control' period is 12 months prior to the event. We used machine learning approaches such as FP-growth to identify medication patterns, overall and by subgroup, including sex, substance use disorder, and depression diagnosis.
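
FP-growth is a frequent-pattern mining algorithm; the drug-pair counting it performs can be illustrated with a brute-force pass over each patient's medication "basket". This sketch is an illustration only, not the authors' pipeline, and the medication lists are hypothetical:

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_support):
    """Count co-occurring medication pairs across patients' 60-day windows
    and keep pairs meeting a minimum support (fraction of patients)."""
    counts = Counter()
    for meds in baskets:
        # Deduplicate and sort so each pair has one canonical ordering
        for pair in combinations(sorted(set(meds)), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# Hypothetical medication lists, one per patient
baskets = [
    ["albuterol", "gabapentin", "sertraline"],
    ["albuterol", "gabapentin"],
    ["gabapentin", "oxycodone"],
    ["albuterol", "gabapentin", "oxycodone"],
]
print(frequent_pairs(baskets, min_support=0.5))
```

FP-growth proper avoids this quadratic enumeration by building a compressed prefix tree, which matters at the scale of ~30,000 patients, but the support computation it produces is the same.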

Results: We identified a total of 29,977 individuals aged 12 years or older with an incident, unintentional drug overdose in the U.S.; 42.9% were male and 68.9% were 40 years or older. The majority (~88%) of individuals used five or more medications concurrently within the 60 days prior to the overdose. More than one-third (38.5%) had previously been diagnosed with depression, and 30% had a history of substance use disorder. Overall, albuterol (2.0%) and gabapentin (1.8%) were among the five most common medications; oxycodone and hydrocodone, both opioid analgesics, were also in the top five, used by 1.5% of individuals. These patterns varied by gender and among those with depression and substance use disorder: levothyroxine and sertraline were more commonly used in combinations among women, whereas amphetamines and oxycodone were more common among men. Albuterol and gabapentin were common in both men and women and among those with substance use disorder.

Conclusions: Although efforts to address drug overdose have focused on opioids, benzodiazepines, and stimulants, our findings indicate that other commonly used medications, particularly those for the treatment of depression, asthma, and hypertension, are frequently used alone or in combination preceding an unintentional overdose among individuals in the U.S.

2:15-2:25 p.m.

Abstract: The healthcare needs of society are constantly changing, and professional schools have a responsibility to prepare students to meet those needs. Use of artificial intelligence (AI) may help align education with societal needs while also maximizing students’ chances of success, including matching for postgraduate training or finding a full-time job in their desired field.

We have started to build a model called AI-SiPS (Success in Pharmacy School) to identify activities that lead to success upon graduation. The model includes data collected at multiple stages of the PharmD program, with the purpose of identifying key factors for success. These factors include personality traits, background prior to admission, academic and extracurricular activities, professional motivation, mentoring, co-curricular experiences, curricular organization, and confidence. 

The current model is based on data from students in the classes of 2019 to 2022. Data include course grades (n=745 students), a year 3 (P3) survey on perceived readiness for advanced pharmacy practice experience (APPE) rotations (n=261), student rotation assignments (n=740), perceived experience in the APPE year (n=564), and the initial professional step each student took after graduation (also n=564). Outcomes are divided into the broad categories of “Residency”, “Industry”, and “Community/Hospital” (RICH).

AI-SiPS is implemented on the KNIME (ver. 4.5.1) platform. Using this platform, we examined the relationships of 1) APPE rotation order with residency matching, 2) performance in didactic courses with subsequent pre-APPE confidence, and 3) pre-APPE confidence with RICH outcomes. Decision tree analysis was used to find data breakpoints in each relationship.
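
As an illustration of how a single decision-tree split finds a data breakpoint, here is a minimal CART-style stump in Python. The grades and outcomes are hypothetical, and this is a sketch of the general technique, not the authors' KNIME workflow:

```python
def best_breakpoint(xs, ys):
    """Find the threshold on a numeric predictor that best separates a
    binary outcome (e.g., residency match), as a single decision-tree
    split minimizing weighted Gini impurity (the CART criterion)."""
    def gini(labels):
        if not labels:
            return 0.0
        p = sum(labels) / len(labels)
        return 2 * p * (1 - p)

    pairs = sorted(zip(xs, ys))
    best_threshold, best_impurity = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold

# Hypothetical course grades vs. residency match (1 = matched)
grades = [60, 65, 70, 85, 90, 95]
matched = [0, 0, 1, 1, 1, 1]
print(best_breakpoint(grades, matched))  # 67.5: the split between 65 and 70
```

A full decision tree applies this search recursively to each resulting subgroup; the breakpoint reported at the root is what distinguishes, say, students who "performed well" in a course from those who did not.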

This analysis indicated that: 1) an early Acute Care APPE was associated with a higher residency match rate than a late Acute Care APPE (70.2% vs. 58.0%), 2) students who performed well in certain therapeutic didactic courses, such as cardiology, had a higher chance of matching for residency than those who did not (77.8% vs. 51.7%), and 3) students who felt confident about starting their APPEs had a higher chance of matching for residency than those who did not (77.1% vs. 66.7%). Earlier didactic courses focused on practical application of knowledge also influenced self-perceived confidence. Lastly, APPE confidence may also relate to pursuit of a career in industry.

These results have been used to implement changes in our processes in the current academic year, including earlier identification of students with a desire to pursue residency. This information permits appropriate reorganization of their APPE rotation order, which might increase our successful residency match rate. Similar changes may be possible to support other career goals.

The AI-SiPS model represents our conceptual approach, based on preliminary analyses of the data most relevant to RICH post-graduation outcomes. Future plans are to provide KNIME workflows to other institutions interested in using AI to support student success, including student recruitment and advisement consistent with the changing needs of society.

2:30-2:35 p.m.

Abstract: Approximately 50 million people worldwide are diagnosed with dementia, and an estimated 6.2 million Americans, one in nine people 65 and older, were living with Alzheimer's Disease as of 2021. Most affected people do not obtain early screening toward a timely diagnosis. Consequently, there is a substantial and rapidly growing need for low-cost, non-invasive, and accessible dementia-screening tools that alert families and caregivers and encourage them to pursue medical evaluation for potential patients. This work aims to develop such a screening system. An iPad app takes as input a person's speech and front-camera video responses to standard ADRD (Alzheimer's Disease and related dementias) screening queries and returns an assessment based on machine learning models trained on a large relevant dataset. The app is intended for family members and caregivers and is designed to be easy to use, encouraging regular screening. We have already developed a preliminary prototype using existing data for a single screening test and an associated machine learning model.

2:35 - 2:40 p.m.

Q&A

2:40-3:00 P.M.

Machine Learning in Substance Use and Treatment Contexts

2:40-2:47 p.m.

Abstract: Military veterans experience very high rates of hazardous drinking. Biopsychosocial models of alcohol use suggest that many factors (psychological, environmental, and military-specific) may interact to influence hazardous drinking over time, yet it remains unknown which factors are most important in determining its course among veterans. The present study involves a sample of veterans who met criteria for hazardous drinking (n = 1047) and were surveyed six times over two years. Participants’ zip codes were used to link environment-level data from several sources. Structural equation modeling (SEM) forests, which combine SEM with computational tree-based approaches, will be employed to elucidate the most important predictors of hazardous drinking trajectories. Prototypical growth curve models will visually depict interactions among the most important predictors. Results will clarify risk and protective factors for clinicians targeting hazardous drinking among veterans and can inform future hypothesis-driven research.

2:47-2:54 p.m.

Background:  In the United States, nearly half a million young adults aged 18 to 25 receive substance use treatment each year. Young adults in substance use treatment may be at greater risk for experiencing homelessness, but a comprehensive investigation of risk and protective factors for homelessness in this population has not been conducted. The current study uses traditional logistic regression and machine learning classification models to address this important public health question.

Methods: Data come from 40,758 young adults (mean age = 21.4 years, SD = 2.4; 34.9% female; 62.6% non-Hispanic White) receiving substance use treatment in the United States who completed a Global Appraisal of Individual Needs intake assessment at treatment entry. Risk and protective factors from previous literature were selected as predictor variables, and stepwise logistic regression, penalized (lasso) logistic regression, and random forest classification models were used to identify significant correlates of homelessness in the year prior to treatment.

Results: Models correctly classified the past-year housing status (homeless or housed) of about two-thirds of all individuals (64.4% to 66.5%). Models correctly classified a greater proportion of homeless individuals than housed individuals (sensitivity: 73.7% to 76.4%; specificity: 60.8% to 64.4%). Demographic, familial, mental health, and behavioral variables were identified as important correlates of past-year homelessness, with some differences in variable importance across models.
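
The sensitivity and specificity figures above come from a standard confusion-matrix computation. A minimal sketch with hypothetical labels (1 = homeless in the past year, the positive class; 0 = housed):

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity: fraction of positive (homeless) cases correctly flagged.
    Specificity: fraction of negative (housed) cases correctly classified."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical true labels and model predictions
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)  # 0.75, 0.666...
```

Reporting both metrics matters here because the classes are imbalanced: a model that labeled everyone "housed" could score high accuracy while having zero sensitivity for the group the study cares about.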

Conclusions: Commonly used classification models identified correlates of homelessness that are consistent with previous literature. Implications for the use of predictive modeling to identify individuals at greater risk of homelessness in clinical settings are discussed.

2:54 - 3:00 p.m.

Q&A

Break 3:00-3:10 p.m.

3:10-3:40 P.M.

Roundtable Discussion on Interdisciplinary AI at USC

3:10-3:40 p.m.

Roundtable Discussion on Interdisciplinary AI at USC

Eric Rice, Emilio Ferrara, Yolanda Gil, Jordan Davis, and Nathan Justin (moderator)

3:40-4:30 P.M.

Closing and Social

3:40 - 3:55 p.m.

Awards and Closing Remarks

Bistra Dilkina and Eric Rice

3:55 - 4:30 p.m.

Social