Publications

Leveraging Expert Consistency to Improve Algorithmic Decision Support

Published in Management Science, 2025

Machine learning (ML) is increasingly being used to support high-stakes decisions, a trend owed in part to its promise of superior predictive power relative to human assessment. However, there is frequently a gap between decision objectives and what is captured in the observed outcomes used as labels to train ML models. As a result, machine learning models may fail to capture important dimensions of decision criteria, hampering their utility for decision support. In this work, we explore the use of historical expert decisions as a rich – yet imperfect – source of information that is commonly available in organizational information systems, and show that it can be leveraged to bridge the gap between decision objectives and algorithm objectives. We consider the problem of estimating expert consistency indirectly when each case in the data is assessed by a single expert, and propose influence function-based methodology as a solution to this problem. We then incorporate the estimated expert consistency into a predictive model through a training-time label amalgamation approach. This approach allows ML models to learn from experts when there is inferred expert consistency, and from observed labels otherwise. We also propose alternative ways of leveraging inferred consistency via hybrid and deferral models. In our empirical evaluation, focused on the context of child maltreatment hotline screenings, we show that (1) there are high-risk cases whose risk is considered by the experts but not wholly captured in the target labels used to train a deployed model, and (2) the proposed approach significantly improves precision for these cases.
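
As a rough illustration of the label amalgamation idea described above, the following minimal sketch builds training labels from expert decisions where inferred consistency is high and from observed outcomes otherwise. The function name `amalgamate_labels`, the per-case consistency scores, and the 0.8 threshold are hypothetical choices made for illustration; the paper's actual estimation and amalgamation rule may differ.

```python
from typing import List, Sequence


def amalgamate_labels(
    observed: Sequence[int],
    expert: Sequence[int],
    consistency: Sequence[float],
    threshold: float = 0.8,  # hypothetical cut-off, not taken from the paper
) -> List[int]:
    """Use the expert decision when inferred consistency is high,
    and fall back to the observed outcome label otherwise."""
    if not (len(observed) == len(expert) == len(consistency)):
        raise ValueError("all inputs must have the same length")
    return [
        d if c >= threshold else y
        for y, d, c in zip(observed, expert, consistency)
    ]


# Toy data: four cases with observed labels, expert decisions,
# and (assumed, already estimated) consistency scores.
observed = [0, 1, 0, 0]
expert = [1, 1, 0, 1]
consistency = [0.95, 0.40, 0.90, 0.30]

labels = amalgamate_labels(observed, expert, consistency)
print(labels)
```

In this toy example, only the first and third cases take the expert decision; the other two keep the observed label because their inferred consistency is below the assumed threshold.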

Code available on GitHub.

Recommended citation: De-Arteaga, M., Jeanselme, V., Dubrawski, A., Chouldechova, A. (2025). Leveraging Expert Consistency to Improve Algorithmic Decision Support. In Management Science. https://pubsonline.informs.org/doi/full/10.1287/mnsc.2022.01576

riSCC: A personalized risk model for the development of poor outcomes in cutaneous squamous cell carcinoma

Published in Journal of the American Academy of Dermatology, 2025

This paper introduces a novel prognostic model for cutaneous squamous cell carcinoma and compares its performance to commonly used staging systems. To help medical practitioners quantify risk, we release an app to compute riSCC.

Background Cutaneous squamous cell carcinoma (CSCC) is a prevalent disease for which improved risk stratification strategies are needed.

Objective To develop a novel prognostic model (herein “riSCC”) for CSCC and compare riSCC performance to Brigham and Women’s Hospital (BWH) and American Joint Committee on Cancer Staging 8th Edition (AJCC8) T staging systems.

Methods Retrospective 12-center, multinational cohort study of CSCCs from 1991 to 2023. Clinical and pathologic risk factors, treatments, and outcomes were collected. A Fine-Gray model with inverse probability of treatment weighting was fitted for each outcome. A final model was trained for prospective use and estimation of hazard ratios.

Results 23,166 localized CSCC tumors were included. The riSCC prognostic model outperformed the American Joint Committee on Cancer 8th edition and Brigham and Women’s Hospital T staging systems for all outcomes. At five years, the C-index for riSCC ranged from 0.74 for local recurrence (LR) to 0.87 for disease-specific death (DSD).

Limitations Retrospective study design.

Conclusions riSCC prognostic model offers fine-grained risk estimates and improved stratification for important CSCC outcomes compared to T staging systems.

Recommended citation: Jambusaria-Pahlajani, A.*, Jeanselme, V.*, Wang, D., Ran, N., Granger, E., Cañueto, J., Brodland, D., Carr, D., Carter, J., Carucci, J., Hirotsu, K., Karn, E., Koyfman, S., Mangold, A., Girardi, F., Shahwan, K., Srivastava, D., Vidimos, A., Willenbrink, T., Wysong, A., Lotter, W., Ruiz, E. (2025). riSCC: A personalized risk model for the development of poor outcomes in cutaneous squamous cell carcinoma. In Journal of the American Academy of Dermatology. https://www.jaad.org/article/S0190-9622(25)00373-1/abstract

Neural Fine-Gray

Published in Conference on Health, Inference, and Learning (CHIL), 2023

Time-to-event modelling, known as survival analysis, differs from standard regression as it addresses censoring in patients who do not experience the event of interest. Despite competitive performances in tackling this problem, machine learning methods often ignore other competing risks that preclude the event of interest. This practice biases the survival estimation. Extensions to address this challenge often rely on parametric assumptions or numerical estimations leading to sub-optimal survival approximations. This paper leverages constrained monotonic neural networks to model each competing survival distribution. This modelling choice ensures the exact likelihood maximisation at a reduced computational cost by using automatic differentiation. The effectiveness of the solution is demonstrated on one synthetic and three medical datasets. Finally, we discuss the implications of considering competing risks when developing risk scores for medical practice.
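
To make the idea of a constrained monotonic network concrete, here is a minimal sketch in plain Python of a one-hidden-layer network that is non-decreasing in time. Squaring the weights to keep them non-negative and using `tanh` as the activation are assumptions chosen for illustration, not a description of the paper's exact architecture. In an automatic differentiation framework, the time derivative of such a monotone output could be obtained directly rather than approximated numerically.

```python
import math
import random


def monotone_net(t, w1, b1, w2, b2):
    """One-hidden-layer network that is non-decreasing in the scalar input t.

    Raw weights are squared, so every effective weight is non-negative.
    Combined with an increasing activation (tanh), the output cannot
    decrease as t grows. Biases are left unconstrained.
    """
    hidden = [math.tanh((w * w) * t + b) for w, b in zip(w1, b1)]
    return sum((v * v) * h for v, h in zip(w2, hidden)) + b2


# Randomly initialised (untrained) parameters; the constraint holds by
# construction, whatever the raw parameter values are.
random.seed(0)
n_hidden = 8
w1 = [random.gauss(0.0, 1.0) for _ in range(n_hidden)]
b1 = [random.gauss(0.0, 1.0) for _ in range(n_hidden)]
w2 = [random.gauss(0.0, 1.0) for _ in range(n_hidden)]
b2 = random.gauss(0.0, 1.0)

times = [0.0, 0.5, 1.0, 2.0, 5.0]
outputs = [monotone_net(t, w1, b1, w2, b2) for t in times]
print(outputs)
```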

Code available on GitHub.

Recommended citation: Jeanselme, V., Yoon, C. H., Tom, B., Barrett, J. (2023, June). Neural Fine-Gray: Monotonic neural networks for competing risks. In Conference on Health, Inference, and Learning (pp. 379-392). PMLR. https://arxiv.org/abs/2305.06703

DeepJoint: Robust Survival Modelling Under Clinical Presence Shift

Published in NeurIPS Workshop TS4H, 2022

Observational data in medicine arise as a result of the complex interaction between patients and the healthcare system. The sampling process is often highly irregular and itself constitutes an informative process. When using such data to develop prediction models, this phenomenon is often ignored, leading to sub-optimal performance and generalisability of models when practices evolve. We propose a multi-task recurrent neural network which models three clinical presence dimensions – namely the longitudinal, the inter-observation and the missingness processes – in parallel to the survival outcome. On a prediction task using MIMIC III laboratory tests, explicit modelling of these three processes showed improved performance in comparison to state-of-the-art predictive models (C-index at 1 day horizon: 0.878). More importantly, the proposed approach was more robust to change in the clinical presence setting, demonstrated by performance comparison between patients admitted on weekdays and weekends. This analysis demonstrates the importance of studying and leveraging clinical presence to improve performance and create more transportable clinical models.

Code available on GitHub.

Recommended citation: Jeanselme, V., Martin, G., Peek, N., Sperrin, M., Tom, B., Barrett, J. (2022). DeepJoint: Robust Survival Modelling Under Clinical Presence Shift. In NeurIPS 2022 Workshop on Learning from Time Series for Health. https://arxiv.org/abs/2205.13481

Imputation Strategies Under Clinical Presence: Impact on Algorithmic Fairness

Published in Machine Learning for Health (ML4H), 2022

Biases have marked medical history, leading to unequal care affecting marginalised groups. The patterns of missingness in observational data often reflect these group discrepancies, but the algorithmic fairness implications of group-specific missingness are not well understood. Despite its potential impact, imputation is too often an overlooked preprocessing step. When explicitly considered, attention is placed on overall performance, ignoring how this preprocessing can reinforce group-specific inequities. Our work questions this choice by studying how imputation affects downstream algorithmic fairness. First, we provide a structured view of the relationship between clinical presence mechanisms and group-specific missingness patterns. Then, through simulations and real-world experiments, we demonstrate that the imputation choice influences marginalised group performance and that no imputation strategy consistently reduces disparities. Importantly, our results show that current practices may endanger health equity as similarly performing imputation strategies at the population level can affect marginalised groups differently. Finally, we propose recommendations for mitigating inequities that may stem from a neglected step of the machine learning pipeline.

Code available on GitHub.

Recommended citation: Jeanselme, V., De-Arteaga, M., Zhang, Z., Barrett, J., Tom, B. (2022, November). Imputation Strategies Under Clinical Presence: Impact on Algorithmic Fairness. In Machine Learning for Health (pp. 12-34). PMLR. https://proceedings.mlr.press/v193/jeanselme22a/jeanselme22a.pdf

Constrained clustering and multiple kernel learning without pairwise constraint relaxation

Published in Springer - Advances in Data Analysis and Classification, 2022

Clustering under pairwise constraints is an important knowledge discovery tool that enables the learning of appropriate kernels or distance metrics to improve clustering performance. These pairwise constraints, which come in the form of must-link and cannot-link pairs, arise naturally in many applications and are intuitive for users to provide. However, the common practice of relaxing discrete constraints to a continuous domain to ease optimization when learning kernels or metrics can harm generalization, as information which only encodes linkage is transformed to informing distances. We introduce a new constrained clustering algorithm that jointly clusters data and learns a kernel in accordance with the available pairwise constraints. To generalize well, our method is designed to maximize constraint satisfaction without relaxing pairwise constraints to a continuous domain where they inform distances. We show that the proposed method outperforms existing approaches on a large number of diverse publicly available datasets, and we discuss how our method can scale to handling large data.

Code available on GitHub.

Recommended citation: Boecking, B., Jeanselme, V., Dubrawski, A. (2022). Constrained clustering and multiple kernel learning without pairwise constraint relaxation. In Advances in Data Analysis and Classification, 1-16. https://doi.org/10.1007/s11634-022-00507-5

Neural Survival Clustering: Non-parametric mixture of neural networks for survival clustering

Published in Conference on Health, Inference, and Learning (CHIL), 2022

Survival analysis involves the modelling of times to event. Proposed neural network approaches maximise the predictive performance of traditional survival models at the cost of their interpretability. This impairs their applicability in high-stakes domains such as medicine. Providing insights into the survival distributions would tackle this issue and advance the medical understanding of diseases. This paper approaches survival analysis as a mixture of neural baselines whereby different baseline cumulative hazard functions are modelled using positive and monotone neural networks. The efficiency of the solution is demonstrated on three datasets while enabling the discovery of new survival phenotypes.

Code available on GitHub.

Recommended citation: Jeanselme, V., Tom, B., Barrett, J. (2022, April). Neural Survival Clustering: Non-parametric mixture of neural networks for survival clustering. In Conference on Health, Inference, and Learning (pp. 92-102). PMLR. https://proceedings.mlr.press/v174/jeanselme22a/jeanselme22a.pdf

Sex differences in post cardiac arrest discharge locations

Published in Resuscitation Plus, 2021

Background We explored sex-based differences in discharge location after resuscitation from cardiac arrest.

Methods We performed a single-center retrospective cohort study including patients hospitalized after resuscitation from cardiac arrest from January 2010 to May 2020. We identified patients from a prospective registry, from which we extracted standard demographic and clinical variables. We explored favorable discharge location, defined as discharge to home or acute rehabilitation for survivors to hospital discharge. We tested the association of sex with the residuals of a multivariable logistic regression built using bidirectional selection to control for clinically relevant covariates.

Results We included 2,278 patients. Mean age was 59 (SD 16), 40% were women, and 77% were admitted after out-of-hospital cardiac arrest. A total of 970 patients (43%) survived to discharge; of those, 607 (63% of survivors) had a favorable discharge location. Female sex showed a weak independent association with unfavorable discharge location (adjusted OR 0.94 (95%CI 0.89–0.99)).

Conclusions Our results suggest a possible sex-based disparity in discharge location after cardiac arrest.

Recommended citation: Jeanselme, V., De-Arteaga, M., Elmer, J., Perman, S. M., Dubrawski, A. (2021). Sex differences in post cardiac arrest discharge locations. In Resuscitation Plus, 8, 100185. https://www.sciencedirect.com/science/article/pii/S2666520421001107

Deep Parametric Time-to-Event Regression with Time-Varying Covariates

Published in AAAI Spring Symposium on Survival Analysis, 2021

Time-to-event regression in healthcare and other domains, such as predictive maintenance, requires working with time-series (or time-varying) data such as continuously monitored vital signs, electronic health records, or sensor readings. In such scenarios, the event-time distribution may have temporal dependencies at different time scales that are not easily captured by classical survival models that assume training data points to be independent. In this paper, we describe a fully parametric approach to model censored time-to-event outcomes with time-varying covariates. It involves learning representations of the input temporal data using Recurrent Neural Networks such as LSTMs and GRUs, followed by describing the conditional event distribution as a fixed mixture of parametric distributions. The use of recurrent neural networks allows the learned representations to model long-term dependencies in the input data while jointly estimating the time-to-event. We benchmark our approach on MIMIC III, a large, publicly available dataset collected from Intensive Care Unit (ICU) patients, focusing on predicting the duration of their ICU stays and their short-term life expectancy, and we demonstrate competitive performance of the proposed approach compared to established time-to-event regression models.
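
The following minimal sketch illustrates how a mixture of parametric distributions can handle censored outcomes: an observed event contributes the mixture density, while a censored observation contributes the mixture survival probability. The Weibull family and the function names are assumptions made for illustration. In the approach described above, the mixture parameters would depend on a learned recurrent representation, whereas here they are plain numbers.

```python
import math
from typing import Sequence


def weibull_survival(t: float, shape: float, scale: float) -> float:
    """P(T > t) for a Weibull distribution."""
    return math.exp(-((t / scale) ** shape))


def weibull_density(t: float, shape: float, scale: float) -> float:
    """Probability density of a Weibull distribution at t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1) * weibull_survival(t, shape, scale)


def mixture_log_likelihood(
    t: float,
    event: bool,
    weights: Sequence[float],
    shapes: Sequence[float],
    scales: Sequence[float],
) -> float:
    """Log-likelihood of one observation under a Weibull mixture.

    event=True  -> the event was observed at t (use the density).
    event=False -> the observation was censored at t (use the survival).
    """
    component = weibull_density if event else weibull_survival
    value = sum(w * component(t, k, s) for w, k, s in zip(weights, shapes, scales))
    return math.log(value)


# Two-component mixture with illustrative parameters.
weights, shapes, scales = [0.3, 0.7], [1.5, 0.8], [2.0, 10.0]
ll_event = mixture_log_likelihood(3.0, True, weights, shapes, scales)
ll_censored = mixture_log_likelihood(3.0, False, weights, shapes, scales)
print(ll_event, ll_censored)
```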

Code available on GitHub.

Recommended citation: Nagpal, C.*, Jeanselme, V.*, Dubrawski, A. (2021, May). Deep parametric time-to-event regression with time-varying covariates. In Survival Prediction-Algorithms, Challenges and Applications (pp. 184-193). PMLR. https://proceedings.mlr.press/v146/nagpal21a.html

Prediction of Hypotension Events with Physiologic Vital Sign Signatures in The Intensive Care Unit

Published in Critical Care, 2020

Background Even brief hypotension is associated with increased morbidity and mortality. We developed a machine learning model to predict the initial hypotension event among intensive care unit (ICU) patients and designed an alert system for bedside implementation.

Materials and methods From the Medical Information Mart for Intensive Care III (MIMIC-3) dataset, minute-by-minute vital signs were extracted. A hypotension event was defined as at least five measurements within a 10-min period of systolic blood pressure ≤ 90 mmHg and mean arterial pressure ≤ 60 mmHg. Using time series data from 30-min overlapping time windows, a random forest (RF) classifier was used to predict risk of hypotension every minute. Chronologically, the first half of extracted data was used to train the model, and the second half was used to validate the trained model. The model’s performance was measured with area under the receiver operating characteristic curve (AUROC) and area under the precision recall curve (AUPRC). Hypotension alerts were generated from the risk score time series using a stacked RF model. A lockout time was applied for real-life implementation.
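
The event definition above can be sketched as a sliding-window count over minute-by-minute readings. This is a minimal illustration that assumes complete, regularly sampled data and returns the minute at which the criterion is first met; the function name and the handling of onset time are assumptions, not details taken from the paper.

```python
from typing import Optional, Sequence


def first_hypotension_minute(
    sbp: Sequence[float],
    mean_ap: Sequence[float],
    window: int = 10,      # minutes in the look-back period
    min_count: int = 5,    # low readings required within the window
    sbp_max: float = 90.0,  # mmHg
    map_max: float = 60.0,  # mmHg
) -> Optional[int]:
    """Return the first minute index at which at least `min_count` of the
    last `window` readings have SBP <= sbp_max and MAP <= map_max,
    or None if the criterion is never met."""
    low = [s <= sbp_max and m <= map_max for s, m in zip(sbp, mean_ap)]
    count = 0
    for i, flag in enumerate(low):
        count += flag
        if i >= window:
            # Drop the reading that just left the window.
            count -= low[i - window]
        if count >= min_count:
            return i
    return None


# Toy minute-by-minute series (mmHg): normal readings, then a drop.
sbp = [110] * 5 + [88, 85, 120, 89, 87, 86] + [115] * 3
mean_ap = [75] * 5 + [58, 55, 80, 59, 57, 56] + [72] * 3

onset = first_hypotension_minute(sbp, mean_ap)
print(onset)
```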

Results We identified 1307 subjects (1580 ICU stays) as the hypotension group and 1619 subjects (2279 ICU stays) as the non-hypotension group. The RF model showed AUROC of 0.93 and 0.88 at 15 and 60 min, respectively, before hypotension, and AUPRC of 0.77 at 60 min before. Risk score trajectories revealed 80% and > 60% of hypotension predicted at 15 and 60 min before the hypotension, respectively. The stacked model with 15-min lockout produced on average 0.79 alerts/subject/hour (sensitivity 92.4%).

Conclusion Clinically significant hypotension events in the ICU can be predicted at least 1 h before the initial hypotension episode. With a highly sensitive and reliable practical alert system, a vast majority of future hypotension could be captured, suggesting potential real-life utility.

Recommended citation: Yoon, J. H.*, Jeanselme, V.*, Dubrawski, A., Hravnak, M., Pinsky, M. R., Clermont, G. (2020). Prediction of Hypotension Events with Physiologic Vital Sign Signatures in The Intensive Care Unit. In Critical Care, 24(1), 1-9. https://ccforum.biomedcentral.com/articles/10.1186/s13054-020-03379-3