About me
This is a page not in the main menu.
Published:
This post will show up by default. To disable scheduling of future posts, edit config.yml and set future: false.
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Short description of portfolio item number 1
Short description of portfolio item number 2
Published in Critical Care, 2020
Background Even brief hypotension is associated with increased morbidity and mortality. We developed a machine learning model to predict the initial hypotension event among intensive care unit (ICU) patients and designed an alert system for bedside implementation.
Materials and methods From the Medical Information Mart for Intensive Care III (MIMIC-3) dataset, minute-by-minute vital signs were extracted. A hypotension event was defined as at least five measurements within a 10-min period of systolic blood pressure ≤ 90 mmHg and mean arterial pressure ≤ 60 mmHg. Using time series data from 30-min overlapping time windows, a random forest (RF) classifier was used to predict the risk of hypotension every minute. Chronologically, the first half of the extracted data was used to train the model, and the second half was used to validate the trained model. The model’s performance was measured with the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Hypotension alerts were generated from the risk score time series using a stacked RF model. A lockout time was applied for real-life implementation.
Results We identified 1307 subjects (1580 ICU stays) as the hypotension group and 1619 subjects (2279 ICU stays) as the non-hypotension group. The RF model showed an AUROC of 0.93 and 0.88 at 15 and 60 min before hypotension, respectively, and an AUPRC of 0.77 at 60 min before. Risk score trajectories revealed that 80% and > 60% of hypotension events were predicted 15 and 60 min before onset, respectively. The stacked model with a 15-min lockout produced on average 0.79 alerts/subject/hour (sensitivity 92.4%).
Conclusion Clinically significant hypotension events in the ICU can be predicted at least 1 h before the initial hypotension episode. With a highly sensitive and reliable practical alert system, the vast majority of future hypotension events could be captured, suggesting potential real-life utility.
Recommended citation: Yoon, J. H.*, Jeanselme, V.*, Dubrawski, A., Hravnak, M., Pinsky, M. R., Clermont, G. (2020). Prediction of Hypotension Events with Physiologic Vital Sign Signatures in The Intensive Care Unit. In Critical Care, 24(1), 1-9. https://ccforum.biomedcentral.com/articles/10.1186/s13054-020-03379-3
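A minimal sketch of the sliding-window setup described in the methods, on synthetic data. The 30-minute window and chronological split follow the abstract; all variable names, sizes, and data below are illustrative assumptions, not the study's code.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)

# Fake minute-by-minute vitals for one stay: e.g. systolic BP, MAP, heart rate.
minutes, n_vitals, window = 600, 3, 30
vitals = rng.normal(size=(minutes, n_vitals))
labels = rng.integers(0, 2, size=minutes)   # 1 if hypotension follows this minute

# One feature vector per minute from the preceding 30-min overlapping window.
X = np.stack([vitals[t - window:t].ravel() for t in range(window, minutes)])
y = labels[window:]

# Chronological split: first half trains the model, second half validates it.
split = len(X) // 2
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X[:split], y[:split])
risk = clf.predict_proba(X[split:])[:, 1]   # minute-by-minute risk score

print("AUROC:", roc_auc_score(y[split:], risk))
print("AUPRC:", average_precision_score(y[split:], risk))
```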
Published in AAAI Spring Symposium on Survival Analysis, 2021
Time-to-event regression in healthcare and other domains, such as predictive maintenance, requires working with time-series (or time-varying) data such as continuously monitored vital signs, electronic health records, or sensor readings. In such scenarios, the event-time distribution may have temporal dependencies at different time scales that are not easily captured by classical survival models that assume training data points to be independent. In this paper, we describe a fully parametric approach to model censored time-to-event outcomes with time-varying covariates. It involves learning representations of the input temporal data using recurrent neural networks such as LSTMs and GRUs, followed by describing the conditional event distribution as a fixed mixture of parametric distributions. The use of recurrent neural networks allows the learned representations to model long-term dependencies in the input data while jointly estimating the time-to-event distribution. We benchmark our approach on MIMIC III, a large, publicly available dataset collected from Intensive Care Unit (ICU) patients, focusing on predicting the duration of their ICU stays and their short-term life expectancy, and we demonstrate competitive performance of the proposed approach compared to established time-to-event regression models.
Code available on GitHub.
Recommended citation: Nagpal, C.*, Jeanselme, V.*, Dubrawski, A. (2021, May). Deep parametric time-to-event regression with time-varying covariates. In Survival Prediction-Algorithms, Challenges and Applications (pp. 184-193). PMLR. https://proceedings.mlr.press/v146/nagpal21a.html
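The core of the approach can be sketched in a few lines of PyTorch: a GRU encodes the temporal covariates, and its final state parameterises a fixed mixture of Weibull distributions whose censored log-likelihood is maximised. Layer sizes, the Weibull choice, and all names below are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class RecurrentWeibullMixture(nn.Module):
    def __init__(self, n_features, hidden=32, k=3):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.shape = nn.Linear(hidden, k)    # Weibull shape per component
        self.scale = nn.Linear(hidden, k)    # Weibull scale per component
        self.logits = nn.Linear(hidden, k)   # mixture weights

    def forward(self, x):
        _, h = self.gru(x)                   # h: (1, batch, hidden)
        h = h.squeeze(0)
        # softplus keeps shapes and scales strictly positive
        return (nn.functional.softplus(self.shape(h)) + 1e-3,
                nn.functional.softplus(self.scale(h)) + 1e-3,
                torch.log_softmax(self.logits(h), dim=-1))

def neg_log_likelihood(model, x, t, event):
    shape, scale, log_pi = model(x)
    z = (t.unsqueeze(-1) / scale).clamp(min=1e-8)
    log_surv = -z ** shape                   # log S(t) for each component
    log_pdf = torch.log(shape / scale) + (shape - 1) * torch.log(z) + log_surv
    ll = torch.where(event.unsqueeze(-1).bool(),
                     log_pi + log_pdf,       # observed events use the density
                     log_pi + log_surv)      # censored cases use the survival
    return -torch.logsumexp(ll, dim=-1).mean()

x = torch.randn(8, 20, 5)                    # (batch, time, features)
t, e = torch.rand(8), torch.randint(0, 2, (8,))
print(neg_log_likelihood(RecurrentWeibullMixture(n_features=5), x, t, e))
```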
Published in Resuscitation Plus, 2021
Background We explored sex-based differences in discharge location after resuscitation from cardiac arrest.
Methods We performed a single-center retrospective cohort study including patients hospitalized after resuscitation from cardiac arrest from January 2010 to May 2020. We identified patients from a prospective registry, from which we extracted standard demographic and clinical variables. We explored favorable discharge location, defined as discharge to home or acute rehabilitation for survivors to hospital discharge. We tested the association of sex with the residuals of a multivariable logistic regression built using bidirectional selection to control for clinically relevant covariates.
Results We included 2,278 patients. Mean age was 59 (SD 16), 40% were women, and 77% were admitted after out-of-hospital cardiac arrest. A total of 970 patients (43%) survived to discharge; of those, 607 (63% of survivors) had a favorable discharge location. Female sex showed a weak independent association with unfavorable discharge location (adjusted OR 0.94, 95% CI 0.89–0.99).
Conclusions Our results suggest a possible sex-based disparity in discharge location after cardiac arrest.
Recommended citation: Jeanselme, V., De-Arteaga, M., Elmer, J., Perman, S. M., Dubrawski, A. (2021). Sex differences in post cardiac arrest discharge locations. In Resuscitation plus, 8, 100185. https://www.sciencedirect.com/science/article/pii/S2666520421001107
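A rough sketch of our reading of the residual-association test described in the methods: fit a multivariable logistic model without sex, then test whether its residuals differ by sex. The synthetic data, covariates, and the use of a Mann-Whitney test are assumptions for illustration, not the study's code.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)
n = 1000
X = rng.normal(size=(n, 3))                  # stand-ins for clinical covariates
female = rng.integers(0, 2, n)
y = (X[:, 0] + 0.1 * female + rng.normal(size=n) > 0).astype(int)

# Multivariable logistic model for favorable discharge, fitted without sex.
fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
resid = y - fit.predict(sm.add_constant(X))  # response residuals

# Is what the model leaves unexplained associated with sex?
print(mannwhitneyu(resid[female == 1], resid[female == 0]))
```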
Published in Conference on Health, Inference, and Learning (CHIL), 2022
Survival analysis involves modelling the time to an event of interest. Neural network approaches improve on the predictive performance of traditional survival models, but at the cost of interpretability, which impairs their applicability in high-stakes domains such as medicine. Providing insights into the survival distributions would tackle this issue and advance the medical understanding of diseases. This paper approaches survival analysis as a mixture of neural baselines, whereby different baseline cumulative hazard functions are modelled using positive and monotone neural networks. The efficiency of the solution is demonstrated on three datasets while enabling the discovery of new survival phenotypes.
Code available on GitHub.
Recommended citation: Jeanselme, V., Tom, B., Barrett, J. (2022, April). Neural Survival Clustering: Non-parametric mixture of neural networks for survival clustering. In Conference on Health, Inference, and Learning (pp. 92-102). PMLR. https://proceedings.mlr.press/v174/jeanselme22a/jeanselme22a.pdf
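A toy version of the building block named above, a positive, monotone network for a baseline cumulative hazard: exponentiated (hence positive) weights combined with increasing activations guarantee that the output is non-decreasing in time, and anchoring at t = 0 keeps it non-negative. The sizes and parameterisation are our illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MonotoneCumulativeHazard(nn.Module):
    """A cumulative hazard increasing in t by construction:
    positive weights (via exp) and increasing activations."""
    def __init__(self, hidden=16):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(1, hidden))
        self.b1 = nn.Parameter(torch.zeros(hidden))
        self.w2 = nn.Parameter(torch.randn(hidden, 1))

    def forward(self, t):                     # t: (batch, 1), non-negative times
        def g(u):
            h = torch.tanh(u @ self.w1.exp() + self.b1)
            return (h @ self.w2.exp()).squeeze(-1)
        return g(t) - g(torch.zeros_like(t))  # anchors the hazard at zero for t = 0

t = torch.linspace(0, 5, 6).unsqueeze(-1)
print(MonotoneCumulativeHazard()(t))          # non-negative and non-decreasing
```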
Published in Springer - Advances in Data Analysis and Classification, 2022
Clustering under pairwise constraints is an important knowledge discovery tool that enables the learning of appropriate kernels or distance metrics to improve clustering performance. These pairwise constraints, which come in the form of must-link and cannot-link pairs, arise naturally in many applications and are intuitive for users to provide. However, the common practice of relaxing discrete constraints to a continuous domain to ease optimization when learning kernels or metrics can harm generalization, as information that only encodes linkage is transformed into information about distances. We introduce a new constrained clustering algorithm that jointly clusters data and learns a kernel in accordance with the available pairwise constraints. To generalize well, our method is designed to maximize constraint satisfaction without relaxing pairwise constraints to a continuous domain where they inform distances. We show that the proposed method outperforms existing approaches on a large number of diverse publicly available datasets, and we discuss how our method can scale to large datasets.
Code available on GitHub.
Recommended citation: Boecking, B., Jeanselme, V., Dubrawski, A (2022). Constrained clustering and multiple kernel learning without pairwise constraint relaxation. In Advances in Data Analysis and Classification, 1-16. https://doi.org/10.1007/s11634-022-00507-5
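The quantity the method maximises, constraint satisfaction over discrete cluster assignments, is simple to state in code. This tiny helper is ours, not the paper's, but it makes the contrast with continuous relaxation concrete.

```python
import numpy as np

def constraint_satisfaction(assignments, must_link, cannot_link):
    """Fraction of must-link pairs placed in the same cluster and
    cannot-link pairs placed in different clusters."""
    a = np.asarray(assignments)
    ml = [a[i] == a[j] for i, j in must_link]
    cl = [a[i] != a[j] for i, j in cannot_link]
    return float(np.mean(ml + cl))

labels = [0, 0, 1, 1, 2]
print(constraint_satisfaction(labels, must_link=[(0, 1)], cannot_link=[(1, 2)]))
# 1.0 — both discrete constraints satisfied, no continuous relaxation involved
```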
Published in Machine Learning for Health (ML4H), 2022
Biases have marked medical history, leading to unequal care affecting marginalised groups. The patterns of missingness in observational data often reflect these group discrepancies, but the algorithmic fairness implications of group-specific missingness are not well understood. Despite its potential impact, imputation is too often an overlooked preprocessing step. When explicitly considered, attention is placed on overall performance, ignoring how this preprocessing can reinforce group-specific inequities. Our work questions this choice by studying how imputation affects downstream algorithmic fairness. First, we provide a structured view of the relationship between clinical presence mechanisms and group-specific missingness patterns. Then, through simulations and real-world experiments, we demonstrate that the imputation choice influences marginalised group performance and that no imputation strategy consistently reduces disparities. Importantly, our results show that current practices may endanger health equity as similarly performing imputation strategies at the population level can affect marginalised groups differently. Finally, we propose recommendations for mitigating inequities that may stem from a neglected step of the machine learning pipeline.
Code available on GitHub.
Recommended citation: Jeanselme, V., De-Arteaga, M., Zhang, Z., Barrett, J., Tom, B, (2022, November). Imputation Strategies Under Clinical Presence: Impact on Algorithmic Fairness. In Machine Learning for Health (pp. 12-34). PMLR. https://proceedings.mlr.press/v193/jeanselme22a/jeanselme22a.pdf
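A compact sketch of the kind of experiment described above: inject group-specific missingness, impute with two common strategies, and compare downstream discrimination per group. The data and model choices are synthetic placeholders, not the paper's pipeline.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
n = 2000
group = rng.integers(0, 2, n)                 # marginalised-group indicator
X = rng.normal(size=(n, 4)) + group[:, None]  # group-shifted covariates
y = (X.sum(axis=1) + rng.normal(size=n) > 1).astype(int)

# Group-specific missingness: one group loses a covariate far more often.
miss = rng.random(n) < np.where(group == 1, 0.5, 0.1)
X_miss = X.copy()
X_miss[miss, 0] = np.nan

for imputer in (SimpleImputer(strategy="mean"), IterativeImputer(random_state=0)):
    X_imp = imputer.fit_transform(X_miss)
    pred = LogisticRegression().fit(X_imp, y).predict_proba(X_imp)[:, 1]
    aucs = [roc_auc_score(y[group == g], pred[group == g]) for g in (0, 1)]
    print(type(imputer).__name__, "per-group AUROC:", np.round(aucs, 3))
```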
Published in NeurIPS Workshop TS4H, 2022
Observational data in medicine arise as a result of the complex interaction between patients and the healthcare system. The sampling process is often highly irregular and itself constitutes an informative process. When using such data to develop prediction models, this phenomenon is often ignored, leading to sub-optimal performance and generalisability of models when practices evolve. We propose a multi-task recurrent neural network which models three clinical presence dimensions – namely the longitudinal, the inter-observation and the missingness processes – in parallel to the survival outcome. On a prediction task using MIMIC III laboratory tests, explicit modelling of these three processes showed improved performance in comparison to state-of-the-art predictive models (C-index at 1 day horizon: 0.878). More importantly, the proposed approach was more robust to change in the clinical presence setting, demonstrated by performance comparison between patients admitted on weekdays and weekends. This analysis demonstrates the importance of studying and leveraging clinical presence to improve performance and create more transportable clinical models.
Code available on GitHub.
Recommended citation: Jeanselme, V., Martin, G., Peek, N., Sperrin, M., Tom, B., Barrett, J. (2022). DeepJoint: Robust Survival Modelling Under Clinical Presence Shift. In NeurIPS 2022 Workshop on Learning from Time Series for Health. https://arxiv.org/abs/2205.13481
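An illustrative PyTorch skeleton of the multi-task architecture described above, with one head per clinical-presence process (longitudinal values, inter-observation time, missingness) alongside the survival head. Layer sizes and head parameterisations are our assumptions, not the released code.

```python
import torch
import torch.nn as nn

class ClinicalPresenceRNN(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(n_features, hidden, batch_first=True)
        self.longitudinal = nn.Linear(hidden, n_features)  # next observed values
        self.inter_obs = nn.Linear(hidden, 1)              # time to next observation
        self.missingness = nn.Linear(hidden, n_features)   # which tests will be taken
        self.survival = nn.Linear(hidden, 1)               # outcome of interest

    def forward(self, x):                                  # x: (batch, time, features)
        out, _ = self.encoder(x)
        h = out[:, -1]                                     # shared representation
        return (self.longitudinal(h),
                nn.functional.softplus(self.inter_obs(h)), # positive waiting time
                torch.sigmoid(self.missingness(h)),        # observation probabilities
                self.survival(h))
```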
Published in Conference on Health, Inference, and Learning (CHIL), 2023
Time-to-event modelling, known as survival analysis, differs from standard regression as it addresses censoring in patients who do not experience the event of interest. Despite competitive performance in tackling this problem, machine learning methods often ignore competing risks that preclude the event of interest, a practice that biases the survival estimation. Extensions that address this challenge often rely on parametric assumptions or numerical estimations, leading to sub-optimal survival approximations. This paper leverages constrained monotonic neural networks to model each competing survival distribution. This modelling choice ensures exact likelihood maximisation at a reduced computational cost by using automatic differentiation. The effectiveness of the solution is demonstrated on one synthetic and three medical datasets. Finally, we discuss the implications of considering competing risks when developing risk scores for medical practice.
Code available on GitHub.
Recommended citation: Jeanselme, V., Yoon, C. H., Tom, B., Barrett, J. (2023, June). Neural Fine-Gray: Monotonic neural networks for competing risks. In Conference on Health, Inference, and Learning (pp. 379-392). PMLR. https://arxiv.org/abs/2305.06703
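The key computational trick can be shown in a few lines: if a monotone network models the cumulative incidence function F(t|x), automatic differentiation yields the density f(t|x) = ∂F/∂t exactly, so the likelihood needs neither parametric assumptions nor numerical integration. The toy CIF below stands in for the paper's constrained monotonic network.

```python
import torch

def density_from_cif(cif, x, t):
    """f(t|x) = dF(t|x)/dt obtained exactly by automatic differentiation,
    so the likelihood needs no parametric form or numerical integration."""
    t = t.clone().requires_grad_(True)
    F = cif(x, t)
    return torch.autograd.grad(F.sum(), t, create_graph=True)[0]

# Toy monotone CIF standing in for the paper's constrained monotonic network.
cif = lambda x, t: torch.sigmoid(x.sum(-1, keepdim=True)) * (1 - torch.exp(-t))
x, t = torch.randn(5, 3), torch.rand(5, 1)
print(density_from_cif(cif, x, t))
```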
Published in Journal of the American Academy of Dermatology, 2025
This paper introduces a novel prognostic model for cutaneous squamous cell carcinoma and compares its performance to commonly used staging systems. To help medical practitioners quantify risk, we release an app to compute riSCC.
Background Cutaneous squamous cell carcinoma (CSCC) is a prevalent disease for which improved risk stratification strategies are needed.
Objective To develop a novel prognostic model (herein “riSCC”) for CSCC and compare riSCC performance to Brigham and Women’s Hospital (BWH) and American Joint Committee on Cancer Staging 8th Edition (AJCC8) T staging systems.
Methods Retrospective 12-center, multinational cohort study of CSCCs from 1991 to 2023. Clinical and pathologic risk factors, treatments, and outcomes were collected. Fine-Gray model was employed for each outcome with inverse probability of treatment weighting. A final model was trained for prospective use and estimation of hazard ratios.
Results 23,166 localized CSCC tumors were included. The riSCC prognostic model performed superiorly to AJCC8 and BWH T staging for all outcomes. At five years, the C-index for riSCC ranged from 0.74 for local recurrence (LR) to 0.87 for disease-specific death (DSD).
Limitations Retrospective study design.
Conclusions The riSCC prognostic model offers fine-grained risk estimates and improved stratification for important CSCC outcomes compared to T staging systems.
Recommended citation: Jambusaria-Pahlajani, A.*, Jeanselme, V.*, Wang, D., Ran, N., Granger, E., Cañueto, J., Brodland, D., Carr, D., Carter, J., Carucci, J., Hirotsu, K., Karn, E., Koyfman, S., Mangold, A., Girardi, F., Shahwan, K., Srivastava, D., Vidimos, A., Willenbrink, T., Wysong, A., Lotter, W., Ruiz, E. (2025). riSCC: A personalized risk model for the development of poor outcomes in cutaneous squamous cell carcinoma. In Journal of the American Academy of Dermatology. https://www.jaad.org/article/S0190-9622(25)00373-1/abstract
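For readers unfamiliar with inverse probability of treatment weighting, a minimal sketch of how such weights are typically computed before entering an outcome model (here, the Fine-Gray fit) as case weights. The data and propensity model below are illustrative, not the study's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 5))               # stand-ins for clinical covariates
treated = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)

# Propensity of receiving the treatment given covariates.
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# Standard IPTW weights: 1/ps for treated patients, 1/(1 - ps) for the rest.
weights = np.where(treated == 1, 1 / ps, 1 / (1 - ps))
# These weights would then enter the outcome (e.g. Fine-Gray) fit as case weights.
print(weights[:5])
```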
Published in Management Science, 2025
Machine learning (ML) is increasingly being used to support high-stakes decisions, a trend owed in part to its promise of superior predictive power relative to human assessment. However, there is frequently a gap between decision objectives and what is captured in the observed outcomes used as labels to train ML models. As a result, machine learning models may fail to capture important dimensions of decision criteria, hampering their utility for decision support. In this work, we explore the use of historical expert decisions as a rich – yet imperfect – source of information that is commonly available in organizational information systems, and show that it can be leveraged to bridge the gap between decision objectives and algorithm objectives. We consider the problem of estimating expert consistency indirectly when each case in the data is assessed by a single expert, and propose influence function-based methodology as a solution to this problem. We then incorporate the estimated expert consistency into a predictive model through a training-time label amalgamation approach. This approach allows ML models to learn from experts when there is inferred expert consistency, and from observed labels otherwise. We also propose alternative ways of leveraging inferred consistency via hybrid and deferral models. In our empirical evaluation, focused on the context of child maltreatment hotline screenings, we show that (1) there are high-risk cases whose risk is considered by the experts but not wholly captured in the target labels used to train a deployed model, and (2) the proposed approach significantly improves precision for these cases.
Code available on GitHub.
Recommended citation: De-Arteaga, M., Jeanselme, V., Dubrawski, A., Chouldechova, A. (2025). Leveraging Expert Consistency to Improve Algorithmic Decision Support. In Management Science. https://pubsonline.informs.org/doi/full/10.1287/mnsc.2022.01576
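A stylised view of training-time label amalgamation: blend the expert decision and the observed label according to inferred consistency, producing soft training targets. The convex-combination rule below is our simplification for illustration, not the paper's exact estimator.

```python
import numpy as np

def amalgamate(expert_decision, observed_label, consistency):
    """Convex blend weighted by inferred expert consistency in [0, 1]:
    learn from experts where they are consistent, from outcomes otherwise."""
    return consistency * expert_decision + (1 - consistency) * observed_label

expert = np.array([1, 1, 0, 1])
observed = np.array([0, 1, 0, 0])
consistency = np.array([0.9, 0.5, 0.2, 0.8])  # e.g. estimated via influence functions
print(amalgamate(expert, observed, consistency))  # soft training targets
```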
Under review at Management Science (Reject and Resubmit)
Machine learning risks reinforcing biases present in data, and, as we argue in this work, in what is absent from data. In healthcare, biases have marked medical history, leading to unequal care affecting marginalised groups. Patterns in missing data often reflect these group discrepancies, but the algorithmic fairness implications of group-specific missingness are not well understood. Despite its potential impact, imputation is often an overlooked preprocessing step, with attention placed on the reduction of reconstruction error and overall performance, ignoring how imputation can affect groups differently. Our work studies how imputation choices affect reconstruction errors across groups and algorithmic fairness properties of downstream predictions. First, we provide a structured view of the relationship between clinical presence mechanisms and group-specific missingness patterns. Then, we theoretically demonstrate that the optimal choice between two common imputation strategies is under-determined, both in terms of group-specific imputation quality and of the gap in quality across groups. Particularly, the use of group-specific imputation strategies may counter-intuitively reduce data quality for marginalised groups. We complement these theoretical results with simulations and real-world empirical evidence showing that imputation choices influence group-specific data quality and downstream algorithmic fairness, and that no imputation strategy consistently reduces group disparities in reconstruction error or predictions. Importantly, our results show that current practices may be detrimental to health equity as similarly performing imputation strategies at the population level can affect marginalised groups differently. Finally, we propose recommendations for mitigating inequities that may stem from an overlooked step of the machine learning pipeline.
Recommended citation: Jeanselme, V., De-Arteaga, M., Zhang, Z., Barrett, J., and Tom, B. Imputation Strategies Under Clinical Presence: Impact on Algorithmic Fairness. https://arxiv.org/abs/2208.06648
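A toy computation echoing the under-determination result above: with group-specific, value-dependent missingness, one can measure per-group reconstruction error under pooled versus group-specific mean imputation; which strategy wins for which group depends on the missingness mechanism. The mechanism below is one illustrative choice, not the paper's simulation design.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
group = rng.integers(0, 2, n)
x = rng.normal(loc=group, size=n)                    # group-shifted feature
# MNAR for the marginalised group: its highest values go unobserved.
missing = ((x > 0.8) & (group == 1)) | (rng.random(n) < 0.05)

observed = ~missing
pooled_mean = x[observed].mean()
group_means = {g: x[observed & (group == g)].mean() for g in (0, 1)}

for g in (0, 1):
    m = missing & (group == g)
    mse_pooled = np.mean((x[m] - pooled_mean) ** 2)
    mse_group = np.mean((x[m] - group_means[g]) ** 2)
    print(f"group {g}: pooled-mean MSE {mse_pooled:.2f}, "
          f"group-mean MSE {mse_group:.2f}")
```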
To be submitted to Management Science
Recommended citation: Jeanselme, V., Barrett, J., and Tom, B. Ignoring Competing Risks: Impact on Algorithmic Fairness.
Under review at NeurIPS
Identifying patient subgroups with different treatment responses is an important task to inform medical recommendations, guidelines, and the design of future clinical trials. Existing approaches for subgroup analysis primarily focus on Randomised Controlled Trials (RCTs), in which treatment assignment is randomised. Furthermore, the patient cohort of an RCT is often constrained by cost, and is not representative of the heterogeneity of patients likely to receive treatment in real-world clinical practice. Therefore, when applied to observational studies, such approaches suffer from significant statistical biases because of the non-randomisation of treatment. Our work introduces a novel, outcome-guided method for identifying treatment response subgroups in observational studies. Our approach assigns each patient to a subgroup associated with two time-to-event distributions: one under treatment and one under control regime. It hence positions itself between individualised and average treatment effect estimation. The assumptions of our model result in a simple correction of the statistical bias from treatment non-randomisation through inverse propensity weighting. In experiments, our approach significantly outperforms the current state-of-the-art method for outcome-guided subgroup analysis in both randomised and observational treatment regimes.
Recommended citation: Jeanselme, V., Yoon, C., Falck, F., Tom, B., and Barrett, J. Identifying treatment response subgroups in observational time-to-event data. https://www.arxiv.org/abs/2408.03463
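To make the estimand concrete: within a candidate subgroup, the treatment response can be summarised by contrasting inverse-propensity-weighted survival curves under treatment and control. The lifelines-based helper below is an illustrative sketch under our assumptions, not the paper's method.

```python
import numpy as np
from lifelines import KaplanMeierFitter

def subgroup_contrast(t, event, treated, weights, horizon):
    """Difference in IPW-weighted survival at `horizon` within one subgroup."""
    km_t, km_c = KaplanMeierFitter(), KaplanMeierFitter()
    km_t.fit(t[treated == 1], event[treated == 1], weights=weights[treated == 1])
    km_c.fit(t[treated == 0], event[treated == 0], weights=weights[treated == 0])
    return km_t.predict(horizon) - km_c.predict(horizon)

rng = np.random.default_rng(6)
t = rng.exponential(10, 200)
event, treated = rng.integers(0, 2, 200), rng.integers(0, 2, 200)
print(subgroup_contrast(t, event, treated, np.ones(200), horizon=5.0))
```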
Under review at MLHC
Survival analysis is a fundamental tool for modeling time-to-event outcomes in healthcare. Recent advances have introduced flexible neural network approaches for improved predictive performance. However, these models do not provide interpretable insights into the association between exposures and the modeled outcomes, a critical requirement for decision-making in clinical practice. To address this limitation, we propose Additive Deep Hazard Analysis Mixtures (ADHAM), an interpretable additive survival model. ADHAM assumes a conditional latent subpopulation structure that characterizes an individual, combined with covariate-specific hazard functions. To select the number of subpopulations, we introduce a post-training group refinement-based model-selection procedure, i.e., an efficient approach to merging similar clusters to reduce the number of repetitive latent subpopulations identified by the model. We perform comprehensive studies to demonstrate ADHAM’s interpretability at the population, subpopulation, and individual levels. Extensive experiments on real-world datasets show that ADHAM provides novel insights into the association between exposures and outcomes. Further, ADHAM remains on par with existing state-of-the-art survival baselines, offering a scalable and interpretable approach to time-to-event prediction in healthcare.
Recommended citation: Ketenci, M., Jeanselme, V., Nieva, H., Joshi, S., and Elhadad, N. ADHAM: Additive deep hazard analysis mixtures for interpretable survival regression.
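A toy version of the post-training refinement step mentioned above: greedily retain only latent subpopulations whose hazard curves are mutually distinct, folding near-duplicates together. The distance and threshold are our illustrative choices, not the paper's procedure.

```python
import numpy as np

def merge_similar(hazards, tol=0.05):
    """Retain subpopulations whose hazard curves are mutually distinct.

    hazards: (k, T) array of per-cluster hazard curves on a shared time grid.
    Returns indices of retained clusters; near-duplicates are folded away."""
    keep = []
    for i, h in enumerate(hazards):
        if all(np.abs(h - hazards[j]).mean() >= tol for j in keep):
            keep.append(i)
    return keep

grid = np.linspace(0, 1, 50)
hazards = np.stack([grid, 1.01 * grid, grid ** 2])  # first two nearly identical
print(merge_similar(hazards))  # [0, 2]
```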
Under review at BMC Medical Research Methodology
The linear mixed model (LMM) is a common statistical approach to model the relation between exposure and outcome while capturing individual variability through random effects. However, this model assumes homogeneity of the error term’s variance, known as homoscedasticity. Breaking this assumption can bias estimates and, consequently, may change a study’s conclusions. If this assumption is unmet, the mixed-effect location-scale model (MELSM) offers a solution to account for within-individual variability. Our work explores how LMMs and MELSMs behave when the homoscedasticity assumption is not met. Further, we study how misspecification affects inference for MELSMs. To this end, we propose a simulation study with longitudinal data and evaluate the bias and coverage of the estimates. Our simulations show that neglecting heteroscedasticity in LMMs leads to a loss of coverage for the estimated coefficients and biases the estimates of the standard deviations of the random effects. In MELSMs, scale misspecification does not bias the location model, but location misspecification alters the scale estimates. Our simulation study illustrates the importance of modelling heteroscedasticity, with potential implications beyond mixed-effect models, for generalised linear mixed models for non-normal outcomes and joint models with survival data.
Recommended citation: Jeanselme, V., Palma, M., and Barrett, J. Assessing the impact of variance heterogeneity and mis-specification in mixed-effects location-scale models. https://arxiv.org/abs/2505.18038
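A minimal version of the simulation design above: generate longitudinal data with subject-specific error variances (heteroscedasticity) and fit a standard LMM that assumes one common variance, using statsmodels. Effect sizes and the variance model are illustrative assumptions, not the paper's settings.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n_subj, n_obs = 100, 10
subject = np.repeat(np.arange(n_subj), n_obs)
time = np.tile(np.arange(n_obs), n_subj)

b = rng.normal(0, 1, n_subj)                 # random intercepts
sigma = np.exp(rng.normal(0, 0.5, n_subj))   # subject-specific error SDs
y = 1 + 0.5 * time + b[subject] + rng.normal(0, sigma[subject])

df = pd.DataFrame({"y": y, "time": time, "subject": subject})
lmm = smf.mixedlm("y ~ time", df, groups=df["subject"]).fit()
print(lmm.summary())  # an LMM that wrongly assumes a common error variance
```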
Under review at NeurIPS
Foundation models hold significant promise in healthcare, given their capacity to extract meaningful representations independent of downstream tasks. This property has enabled state-of-the-art performance across several clinical applications trained on structured electronic health record (EHR) data, even in settings with limited labeled data, a prevalent challenge in healthcare. However, there is little consensus on these models’ potential for clinical utility due to the lack of comprehensive and meaningful tasks and of sufficiently diverse evaluations to characterize their benefit over conventional supervised learning. To address this gap, we propose a suite of clinically meaningful tasks spanning patient outcomes and early prediction of acute and chronic conditions, together with desiderata for robust evaluations. We evaluate state-of-the-art foundation models on EHR data consisting of 5 million patients from Columbia University Irving Medical Center (CUMC), a large urban academic medical center in New York City, across 14 clinically relevant tasks. We measure overall accuracy, calibration, and subpopulation performance to surface tradeoffs based on the choice of pre-training, tokenization, and data representation strategies. Our study aims to advance the empirical evaluation of structured EHR foundation models and guide the development of future healthcare foundation models.
Recommended citation: Pang*, C., Jeanselme*, V., Choi, Y., Jiang, X., Jing, Z., Kashyap, A., Kobayashi, Y., Li, Y., Pollet, F., Natarajan, K., and Joshi, S. FoMoH: A clinically meaningful foundation model evaluation for structured electronic health records. https://arxiv.org/abs/2505.16941
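A small sketch of the evaluation dimensions listed above: discrimination (AUROC) and calibration (Brier score) reported per subpopulation rather than only overall. Predictions and subgroup labels below are synthetic placeholders, not the benchmark's data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, brier_score_loss

rng = np.random.default_rng(7)
y = rng.integers(0, 2, 1000)
pred = np.clip(y * 0.6 + rng.normal(0.2, 0.2, 1000), 0, 1)
subgroup = rng.integers(0, 3, 1000)           # e.g. site, age band, ethnicity

for g in np.unique(subgroup):
    m = subgroup == g
    print(f"subgroup {g}: AUROC={roc_auc_score(y[m], pred[m]):.3f}, "
          f"Brier={brier_score_loss(y[m], pred[m]):.3f}")
```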
Under review at Statistical Analysis and Data Mining
Accurate prediction of graft failure is critical to enhance patient care following transplantation. Traditional predictive models often focus on graft failure, overlooking other potential outcomes, known as competing risks, such as death with a functioning graft. This oversight theoretically biases risk estimates, yet the literature presents conflicting evidence on the gain associated with incorporating competing risks and leveraging flexible survival models. Our work compares a traditional Cox proportional hazards model with the Fine-Gray model, which accounts for competing risks, utilising simulation studies and real-world kidney transplant data from the United Network for Organ Sharing (UNOS). Additionally, we extend traditional methodologies with neural networks to assess the predictive gain associated with more flexible models while maintaining the same modeling assumptions. Our contributions include a detailed performance assessment between traditional and competing risks models, measuring predictive gains associated with neural networks, and developing a Python implementation for these models and associated evaluation metrics. Our findings demonstrate the importance of accounting for competing risks to improve risk estimation. These insights have substantial implications for improving patient prioritisation and transplantation management practices.
Recommended citation: Jeanselme, V., Defor, E., Bandyopadhyay, D., and Gupta, G. From Cox to Neural Networks: Flexible Modeling Improves Modeling of Post-Kidney Transplant Survival.
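To illustrate the baseline being compared above: a cause-specific Cox model for graft failure treats competing deaths as censoring, the practice the paper argues biases risk estimates relative to a competing-risks (Fine-Gray) fit. The data below are synthetic, and lifelines is used only as a standard Cox implementation in Python, not as the paper's code.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(8)
n = 500
df = pd.DataFrame({
    "age": rng.normal(50, 10, n),
    "time": rng.exponential(5, n),
    "event": rng.integers(0, 3, n),  # 0 censored, 1 graft failure, 2 death
})

# Cause-specific Cox for graft failure: competing deaths become "censored".
cs = df.assign(graft_failure=(df["event"] == 1).astype(int))
cph = CoxPHFitter().fit(cs[["age", "time", "graft_failure"]],
                        duration_col="time", event_col="graft_failure")
cph.print_summary()
```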