Held in conjunction with IEEE BigData 2024
Dec 18, 2024, Washington D.C., USA
Introduction
In recent years, machine learning and artificial intelligence have made remarkable strides in revolutionizing various industries, and healthcare stands as one of their most promising frontiers. However, the integration of AI into healthcare systems raises a myriad of challenges, particularly regarding trust, reliability, and ethical considerations. Ensuring the trustworthiness of AI systems in healthcare is paramount to their widespread acceptance and effective implementation. Patients, healthcare providers, and policymakers must have confidence in the AI-driven decisions that influence critical aspects of diagnosis, treatment, and patient care. The purpose of this workshop is to convene experts, researchers, practitioners, and stakeholders from the fields of AI and healthcare to explore strategies for developing, evaluating, and deploying trustworthy machine learning solutions in healthcare settings.
Through collaborative discussions, presentations, and interactive sessions, we aim to identify key challenges and opportunities in ensuring the trustworthiness of AI in healthcare, share insights and best practices, discuss ethical, regulatory, and societal implications, propose frameworks for risk assessment and mitigation, and foster interdisciplinary collaborations to promote responsible AI innovation in healthcare. This workshop will be of interest to researchers, practitioners, policymakers, healthcare professionals, ethicists, and industry representatives who are invested in harnessing the potential of AI to improve healthcare outcomes while addressing the ethical and societal implications of its deployment. Join us in this endeavor to shape the future of healthcare innovation while upholding the principles of trust, transparency, and accountability.
The workshop will be held online via the link provided below on December 18, 2024, at 1:00 PM U.S. Eastern Time.
https://baylor.zoom.us/j/87184999172?pwd=PcZoupD9zma1kCN3K1iEBhbUecvtNa.1
Meeting ID: 871 8499 9172
Passcode: 214066
Call for Papers
Important Dates:
The important dates for the workshop are as follows. All deadlines are 11:59 pm Anywhere on Earth (AoE).
Paper submission: October 8, 2024
Notification of decision: November 15, 2024
Camera-ready due: November 23, 2024
Topics of Interest:
We encourage submissions at various stages of progress, such as new results, visions, techniques, innovative application papers, and progress reports, on topics that include, but are not limited to, the following broad categories:
Interpretable AI Methods for Healthcare
Robustness of Clinical AI Methods
Medical Knowledge-Grounded AI
Physician-in-the-Loop AI
Causal Machine Learning for Clinical Trials
Security and Privacy in Medical AI
Fairness in AI for Healthcare
Interpretable Natural Language Processing for Healthcare
AI-Aided Clinical Decision Support
Healthcare under Large Language Models
We particularly encourage, but do not limit, submissions on the following applications:
Personalized Treatment Recommendations
Early Disease Detection and Diagnosis
Clinical Trial Design Optimization
Healthcare Fraud Detection
Patient Monitoring and Predictive Analytics
Healthcare Resource Allocation
Submission Guidelines:
Submissions are limited to a total of 5 pages, including all content and references. There will be no page limit for supplemental materials. All submissions must be in PDF format and formatted to IEEE Computer Society Proceedings Manuscript Formatting Guidelines (two-column format).
Template guidelines are here: https://www.ieee.org/conferences/publishing/templates.html.
Following the IEEE BigData conference submission policy, reviews are single-blind. Submitted papers will be assessed based on their novelty, technical quality, potential impact, and clarity of writing. For papers that rely heavily on empirical evaluations, the experimental methods and results should be clear, well-executed, and repeatable. Authors are strongly encouraged to make data and code publicly available whenever possible. The accepted papers will be posted on the workshop website but will not be included in the IEEE BigData proceedings.
Submit your papers through the workshop's submission website.
Upon notification, we ask that authors of accepted works make any final changes and then submit a camera-ready version to the submission site. The workshop website will then be updated with links to accepted papers. Note that accepted works will not be formally published. This means that:
Authors can retain full copyright of their works.
Acceptance by this workshop does not preclude publication of the same work in other research venues.
Submitted papers may have significant overlap with previously published or currently submitted work; previously published papers are also welcome.
Any questions regarding submissions can be directed to xiao_shou@baylor.edu.
Accepted Papers
Adopting Trustworthy AI for Sleep Disorder Prediction: Deep Time Series Analysis with Temporal Attention Mechanism and Counterfactual Explanations
Pegah Ahadian, Wei Xu, Sherry Wang, and Qiang Guan
Abstract: Sleep disorders have a major impact on both lifestyle and health. Effective sleep disorder prediction from lifestyle and physiological data can provide essential details for early intervention. This research utilizes three deep time series models and augments them with explainability approaches for sleep disorder prediction. Specifically, our approach adopts Temporal Convolutional Networks (TCN), Long Short-Term Memory (LSTM) networks, and the Temporal Fusion Transformer (TFT) for time series data analysis. Meanwhile, a temporal attention mechanism and counterfactual explanations with the SHapley Additive exPlanations (SHAP) approach are employed to ensure dependable, accurate, and interpretable predictions. Finally, using a large dataset of sleep health measures, our evaluation demonstrates the effectiveness of our method in predicting sleep disorders.
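The counterfactual explanations mentioned in this abstract ask how an input would need to change for a model's prediction to flip. The following is a minimal, library-free sketch of that idea, where the rule-based predictor, feature names, and thresholds are all invented for illustration and stand in for the paper's trained deep models:

```python
# Toy counterfactual search: find the smallest change to one feature
# that flips a 'sleep disorder' prediction from positive to negative.
# The rule below is an invented stand-in for a trained model.
def predict(sleep_hours, stress_level):
    """Predict 1 (disorder) if sleep is short or stress is high, else 0."""
    return 1 if sleep_hours < 6 or stress_level > 7 else 0

def counterfactual_sleep(sleep_hours, stress_level, step=0.5, max_hours=12):
    """Increase sleep_hours until the positive prediction flips to 0.

    Returns the counterfactual sleep duration, or None if no amount of
    extra sleep (up to max_hours) flips the prediction.
    """
    hours = sleep_hours
    while predict(hours, stress_level) == 1 and hours < max_hours:
        hours += step
    return hours if predict(hours, stress_level) == 0 else None

# A patient predicted positive (5h sleep, moderate stress): how much
# more sleep would flip the prediction?
print(counterfactual_sleep(5.0, 4))   # flips once sleep reaches the threshold
print(counterfactual_sleep(5.0, 9))   # high stress: no sleep change suffices
```

In the paper's setting, SHAP attributions and counterfactuals of this kind are computed against the learned TCN/LSTM/TFT models rather than a hand-written rule.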
Evaluating Fairness of Mask R-CNN for Kidney Infection Detection based on Renal Scintigraphy
Jiayi Wang, Mingyan Wu, Yuhang Guo, Ha Wu, and Zhiyu Wan
Abstract: 99mTc-DMSA renal scan plays a crucial role in assessing functional abnormalities in the kidneys. A deep learning model, Mask R-CNN, showed much promise in diagnosing acute pyelonephritis, a type of kidney infection. This study evaluated the diagnostic performance and fairness of Mask R-CNN and Faster R-CNN using a 99mTc-DMSA renal dataset. The classification results showed that Mask R-CNN achieved an accuracy of 0.89, while Faster R-CNN reached an accuracy of 0.88. Both models demonstrated strong classification capabilities for kidney conditions. Furthermore, the analysis of fairness across sex and age groups indicated that neither model exhibited significant bias, thereby supporting their suitability for clinical applications. Future research should consider integrating more patient data to further enhance the diagnostic capabilities and fairness assessments of the models.
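The fairness analysis this abstract describes, comparing model performance across sex and age groups, can be sketched as a group-wise accuracy check. This is a minimal illustration with synthetic placeholder records, not the study's actual data or evaluation pipeline:

```python
# Minimal sketch: per-group accuracy and the gap between groups.
# The records below are synthetic placeholders, not study data.
def group_accuracy(records):
    """Compute accuracy per group from (group, y_true, y_pred) records."""
    totals, correct = {}, {}
    for group, y_true, y_pred in records:
        totals[group] = totals.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + (y_true == y_pred)
    return {g: correct[g] / totals[g] for g in totals}

records = [
    ("male", 1, 1), ("male", 0, 0), ("male", 1, 0), ("male", 0, 0),
    ("female", 1, 1), ("female", 0, 0), ("female", 1, 1), ("female", 0, 1),
]
acc = group_accuracy(records)
# A large gap between the best- and worst-served group would suggest bias.
gap = max(acc.values()) - min(acc.values())
print(acc, gap)
```

A near-zero gap, as the paper reports for both Mask R-CNN and Faster R-CNN across sex and age groups, is the kind of evidence used to argue the models do not exhibit significant bias.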
A Machine Learning and GIS-Based Approach for Mobile Health Clinic Placement: Identifying Factors for Successful Deployments in South Carolina
Shakhawat Tanim, MinJae Woo, and Lior Rennert
Abstract: Access to quality healthcare in medically underserved and rural areas remains a significant challenge, contributing to persistent health disparities and adverse health outcomes. Mobile Health Clinics (MHCs) offer a viable solution by delivering essential medical services directly to these hard-to-reach communities, including during pandemic or epidemic situations. However, the success of MHCs depends significantly on their strategic placement and utilization. In this study, we describe a machine learning prediction model to optimize the placement of COVID-19 MHCs, aiming to identify key predictors that should be retained for improved prediction outcomes. Using an Extreme Gradient Boosting (XGBoost) regression model, we built a prediction model based on statewide datasets incorporating socio-demographic, geographic, and temporal variables relevant to MHC operations. The effectiveness of placements was evaluated by the number of vaccinations administered during a single visit. Key predictors included the number of households receiving public assistance, operational days and specific weekdays, site characteristics (e.g., locations within schools), regional distinctions (particularly the Upstate region), metropolitan status, weekend operations, proximity to highways and hospitals, and total population in the area. Findings reveal that the prediction model heavily relies on factors relevant to economically disadvantaged areas, accessible locations within metropolitan regions, strategic scheduling on weekends and specific weekdays, and leveraging trusted community institutions like schools. These insights provide valuable guidance for developers and healthcare providers on which key predictors to retain when maintaining and improving prediction models for MHC effectiveness and success in underserved communities.
Furthermore, the methodology developed in this research may offer a scalable framework for optimizing MHC placements in other regions facing similar healthcare delivery challenges. This study exemplifies the use of explainable AI (XAI) in evidence-driven model development for optimized MHC deployment.
Predictive Modeling for Early Alzheimer's Disease Using Natural Language Processing
Azra Emekci
Abstract: Alzheimer’s disease (AD) is a progressive neurodegenerative disorder that affects millions worldwide, leading to cognitive decline and placing a substantial burden on families and healthcare systems. Early detection is crucial for managing the disease and improving patient outcomes, yet current diagnostic methods are often invasive, expensive, or insufficiently sensitive in the early stages. Utilizing data from the DementiaBank Pitt Corpus, we developed a machine learning model capable of differentiating between individuals with AD and healthy controls based on their text. The Multilayer Perceptron (MLP) model achieved an accuracy of 86.93%, with a precision of 0.87 for detecting AD cases and 0.84 for identifying non-AD cases. The recall rates were 0.83 for AD and 0.88 for non-AD, indicating the model’s robustness in correctly identifying both conditions. These results highlight the potential of NLP-based models as effective diagnostic tools, offering a non-invasive and cost-effective alternative to traditional methods. The implementation of such technology could revolutionize the early detection of AD, enabling timely interventions that may slow disease progression and improve the quality of life for patients. By integrating this model into clinical settings or remote health applications, we can enhance accessibility to diagnostic services, particularly in underserved communities.
Examining Trustworthiness of LLM-as-a-Judge Systems in a Clinical Trial Design Benchmark
Corey Curran, Nafis Neehal, Keerthiram Murugesan, and Kristin P. Bennett
Abstract: Manual evaluation of Large Language Model (LLM) applications at scale presents significant resource challenges, making LLM-as-Judge (LaaJ) an attractive alternative. This study examines the reliability of LaaJ evaluation within CTBench, a benchmark for assessing LLMs’ capabilities in recommending clinical trial baseline features. LaaJ-alpha, our GPT-4o based prototype, semantically matches LLM-recommended features against reference features from clinical trials, accounting for semantic equivalence (e.g., ‘BMI’ and ‘Body Mass Index’). The system generates matched pairs and unmatched features from both sources to calculate precision, recall, and F1 scores. LaaJ-alpha evaluates baseline feature recommendations across the CTBench CT-Pub (100 trials) and CT-Repo (1,690 trials) sets, comparing results for GPT-4o and Llama-3-70B-Instruct under zero-shot and three-shot settings. Coherence checking revealed hallucinations in LaaJ-alpha’s evaluation, necessitating a post-processing correction step that yielded lower but more accurate performance metrics. Three different types of hallucination were observed. The hallucination rate provides a quantifiable coherence metric that can be systematically used to improve LaaJ reliability. Our findings underscore the challenges in developing reliable LLM evaluation methods in healthcare applications and demonstrate a potential framework for improving LaaJ systems.
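The metric computation this abstract describes can be sketched as follows: matched pairs act as true positives, unmatched recommended features as false positives, and unmatched reference features as false negatives. The feature names below are illustrative, not drawn from CTBench, and the actual pair matching is done semantically by the LLM judge rather than by exact string comparison:

```python
# Sketch: scoring LLM-recommended features against reference features,
# given matched pairs and leftover unmatched features from each side.
def match_scores(matched_pairs, unmatched_recommended, unmatched_reference):
    """Return (precision, recall, F1) from matched/unmatched feature sets."""
    tp = len(matched_pairs)          # features matched on both sides
    fp = len(unmatched_recommended)  # recommended but not in the reference
    fn = len(unmatched_reference)    # in the reference but not recommended
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: 3 semantic matches (e.g., "BMI" ~ "Body Mass Index"),
# 1 extra recommendation, 1 missed reference feature.
p, r, f1 = match_scores(
    matched_pairs=[("BMI", "Body Mass Index"), ("Age", "Age"), ("Sex", "Sex")],
    unmatched_recommended=["Smoking Status"],
    unmatched_reference=["HbA1c"],
)
print(p, r, f1)
```

Hallucinations in the judge's matching (e.g., a fabricated pair) would corrupt these counts, which is why the paper's post-processing coherence check lowers but corrects the reported metrics.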
Workshop Schedule
December 18, 2024, 1:00-5:30 PM, U.S. Eastern Time.
The workshop will be held online via Zoom:
https://baylor.zoom.us/j/87184999172?pwd=PcZoupD9zma1kCN3K1iEBhbUecvtNa.1
Meeting ID: 871 8499 9172
Passcode: 214066
Invited Speakers
MinJae Woo, Clemson University
Short Bio: Dr. MinJae Woo is a multidisciplinary researcher whose interests lie in Artificial Intelligence (AI) in healthcare, focusing on its application in predictive analytics and public health intelligence. He has been leading a multi-institutional team of health scientists and computer scientists toward the shared goal of revolutionizing health interventions through technology. He received a B.S. in Mathematics and Economics from the University of California, Los Angeles (UCLA) and his Ph.D. in Biomedical Data Science and Informatics from the joint Clemson University - Medical University of South Carolina program. He is currently an Assistant Professor of Health Informatics in the Department of Public Health at Clemson University and an associated research faculty member of the Healthcare Innovations and Translational Informatics Laboratory at the Emory School of Medicine.
Jinbo Bi, University of Connecticut
Short Bio: Dr. Jinbo Bi is the Frederick H. Leonhardt Professor of Computer Science and associate head of the Department of Computer Science & Engineering at the University of Connecticut. In her more than ten years of service at UConn, Dr. Bi has established a research program in the areas of machine learning and artificial intelligence and their application to medical diagnosis and treatment that has gained national and international recognition. In 2017, she was awarded the highly competitive MidCareer Independent Scientist Award from the National Institute on Drug Abuse and the National Institute on Alcohol Abuse and Alcoholism. Dr. Bi’s many professional leadership roles include serving as the General Chair of the 2019 IEEE International Conference on Bioinformatics and Biomedicine, advisor to the National Institute on Alcohol Abuse and Alcoholism (NIAAA) on strategic planning for innovation in machine learning and big data analytics, and a highlighted speaker at the International Behavioral and Neural Genetics Society. She also received the 2019 Women Innovators and Leaders Award from the Connecticut Technology Council and the 2019 Distinguished Woman in STEM Award from Bay Path University.
Eric Strobl, University of Pittsburgh
Short Bio: Dr. Eric Strobl is an Assistant Professor of Biomedical Informatics at the University of Pittsburgh. He is also a physician-scientist practicing child and adolescent psychiatry. His research focuses on the development of causal discovery algorithms that solve fundamental problems in medicine. He has developed methods that (1) infer causal graphs from observational data under realistic assumptions, (2) enable causal discovery with feedback loops, (3) generalize clinical trials to the broader population, and, more recently, (4) identify root causes of disease.
Kush R. Varshney, IBM Research
Short Bio: Dr. Kush R. Varshney was born in Syracuse, New York in 1982. He received the B.S. degree (magna cum laude) in electrical and computer engineering with honors from Cornell University, Ithaca, New York, in 2004. He received the S.M. degree in 2006 and the Ph.D. degree in 2010, both in electrical engineering and computer science at the Massachusetts Institute of Technology (MIT), Cambridge. While at MIT, he was a National Science Foundation Graduate Research Fellow.
Dr. Varshney is an IBM Fellow, based at the Thomas J. Watson Research Center, Yorktown Heights, NY, where he heads the Human-Centered Trustworthy AI team. He was a visiting scientist at IBM Research - Africa, Nairobi, Kenya in 2019. He applies data science and predictive analytics to human capital management, healthcare, olfaction, computational creativity, public affairs, international development, and algorithmic fairness, which has led to the Extraordinary IBM Research Technical Accomplishment for contributions to workforce innovation and enterprise transformation, IBM Corporate Technical Awards for Trustworthy AI and for AI-Powered Employee Journey, and the IEEE Signal Processing Society’s 2023 Industrial Innovation Award.
He and his team created several well-known open-source toolkits, including AI Fairness 360, AI Explainability 360, Uncertainty Quantification 360, and AI FactSheets 360. AI Fairness 360 has been recognized by the Harvard Kennedy School's Belfer Center as a tech spotlight runner-up and by the Falling Walls Science Symposium as a winning science and innovation management breakthrough.
He conducts academic research on the theory and methods of trustworthy machine learning. His work has been recognized through paper awards at the Fusion 2009, SOLI 2013, KDD 2014, and SDM 2015 conferences and the 2019 Computing Community Consortium / Schmidt Futures Computer Science for Social Good White Paper Competition. He independently published a book entitled 'Trustworthy Machine Learning' in 2022, available at http://www.trustworthymachinelearning.com. He is a fellow of the IEEE.
Organizers
Xiao Shou, Baylor University
Chen Zhao, Baylor University
Kristin P. Bennett, RPI
Program Committee (Reviewers)
Sarat Chandra (GE Aerospace)
Shaif Chowdhury (Baylor University)
Saswata Paul (GE Aerospace)
Linlin Yu (University of Texas at Dallas)