Working Papers

Prediction of Survival Outcomes under Clinical Presence Shift: A Joint Neural Network Architecture

Under review at Lifetime Data Analysis

Electronic health records arise from the complex interaction between patients and the healthcare system. This process of interaction, referred to as clinical presence, often shapes the observed outcomes. When using electronic health records to develop clinical prediction models, it is standard practice to overlook clinical presence, which degrades performance and limits the transportability of models when this interaction evolves. We propose a multi-task recurrent neural network that jointly models the inter-observation time and missingness processes characterising this interaction, in parallel to the survival outcome of interest. Our work formalises the concept of clinical presence shift, which occurs when a prediction model is deployed in new settings (e.g. different hospitals, regions or countries), and we theoretically justify why the proposed joint modelling can improve transportability under changes in clinical presence. We demonstrate, on a real-world mortality prediction task using the MIMIC-III dataset, how the proposed strategy improves performance and transportability compared to state-of-the-art prediction models that do not incorporate the observation process. These results emphasise the importance of leveraging clinical presence to improve performance and create more transportable clinical prediction models.

Recommended citation: Jeanselme, V., Martin, G., Peek, N., Sperrin, M., Tom, B., Barrett, J. Prediction of Survival Outcomes under Clinical Presence Shift: A Joint Neural Network Architecture. https://arxiv.org/abs/2508.05472
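As a toy illustration of the two processes the abstract refers to, the sketch below constructs the missingness indicators and inter-observation times that such a joint model could consume alongside the survival outcome. The patient data, variable names, and feature construction are hypothetical, not the paper's architecture:

```python
import numpy as np

# Hypothetical irregular observations for one patient: measurement times (days)
# and lab values, with NaN marking visits where the lab was not ordered.
times = np.array([0.0, 1.5, 2.0, 6.0, 6.5])
values = np.array([4.1, np.nan, 3.8, np.nan, 5.0])

# The two facets of clinical presence modelled jointly with the outcome:
mask = (~np.isnan(values)).astype(int)    # missingness process
delta = np.diff(times, prepend=times[0])  # inter-observation times
```

Feeding `mask` and `delta` to the network, rather than discarding them during imputation, is what lets the model exploit when and what clinicians chose to measure.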

From Cox to Neural Networks: Flexible Modeling Improves Modeling of Post-Kidney Transplant Survival

Under review at Statistical Analysis and Data Mining

Accurate prediction of graft failure is critical to enhance patient care following transplantation. Traditional predictive models often focus on graft failure, overlooking other potential outcomes, known as competing risks, such as death with a functioning graft. This oversight theoretically biases risk estimates, yet the literature presents conflicting evidence on the gain associated with incorporating competing risks and leveraging flexible survival models. Our work compares a traditional Cox proportional hazards model with the Fine-Gray model, which accounts for competing risks, utilising simulation studies and real-world kidney transplant data from the United Network for Organ Sharing (UNOS). Additionally, we extend traditional methodologies with neural networks to assess the predictive gain associated with more flexible models while maintaining the same modeling assumptions. Our contributions include a detailed performance assessment between traditional and competing risks models, measuring predictive gains associated with neural networks, and developing a Python implementation for these models and associated evaluation metrics. Our findings demonstrate the importance of accounting for competing risks to improve risk estimation. These insights have substantial implications for improving patient prioritisation and transplantation management practices.

Recommended citation: Jeanselme, V., Defor, E., Bandyopadhyay, D., and Gupta, G. From Cox to Neural Networks: Flexible Modeling Improves Modeling of Post-Kidney Transplant Survival.

FoMoH: A clinically meaningful foundation model evaluation for structured electronic health records

Under review at NeurIPS

Foundation models hold significant promise in healthcare, given their capacity to extract meaningful representations independent of downstream tasks. This property has enabled state-of-the-art performance across several clinical applications trained on structured electronic health record (EHR) data, even in settings with limited labeled data, a prevalent challenge in healthcare. However, there is little consensus on these models' potential for clinical utility, owing to the lack of comprehensive, clinically meaningful tasks and of sufficiently diverse evaluations to characterize the benefit over conventional supervised learning. To address this gap, we propose a suite of clinically meaningful tasks spanning patient outcomes and early prediction of acute and chronic conditions, together with desiderata for robust evaluation. We evaluate state-of-the-art foundation models on EHR data covering 5 million patients from Columbia University Irving Medical Center (CUMC), a large urban academic medical center in New York City, across 14 clinically relevant tasks. We measure overall accuracy, calibration, and subpopulation performance to surface tradeoffs among pre-training, tokenization, and data representation strategies. Our study aims to advance the empirical evaluation of structured EHR foundation models and guide the development of future healthcare foundation models.

Recommended citation: Pang*, C., Jeanselme*, V., Choi, Y., Jiang, X., Jing, Z., Kashyap, A., Kobayashi, Y., Li, Y., Pollet, F., Natarajan, K., and Joshi, S. FoMoH: A clinically meaningful foundation model evaluation for structured electronic health records. https://arxiv.org/abs/2505.16941

Assessing the impact of variance heterogeneity and mis-specification in mixed-effects location-scale models

Under review at BMC Medical Research Methodology

The linear mixed model (LMM) is a common statistical approach to model the relation between exposure and outcome while capturing individual variability through random effects. However, this model assumes that the error term's variance is homogeneous, an assumption known as homoscedasticity. Violating this assumption can bias estimates and, consequently, may change a study's conclusions. When the assumption is unmet, the mixed-effects location-scale model (MELSM) offers a solution that accounts for within-individual variability. Our work explores how LMMs and MELSMs behave when the homoscedasticity assumption is not met. Further, we study how misspecification affects inference for the MELSM. To this end, we propose a simulation study with longitudinal data and evaluate the bias and coverage of the estimates. Our simulations show that neglecting heteroscedasticity in LMMs leads to a loss of coverage for the estimated coefficients and biases the estimates of the standard deviations of the random effects. In MELSMs, scale misspecification does not bias the location model, but location misspecification alters the scale estimates. Our simulation study illustrates the importance of modelling heteroscedasticity, with potential implications beyond mixed-effects models, for generalised linear mixed models for non-normal outcomes and joint models with survival data.

Recommended citation: Jeanselme, V., Palma, M., and Barrett, J. Assessing the impact of variance heterogeneity and mis-specification in mixed-effects location-scale models. https://arxiv.org/abs/2505.18038
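The coverage loss reported above can be reproduced in miniature. The sketch below uses a deliberately simplified setting (ordinary least squares with heteroscedastic errors, not the paper's longitudinal MELSM with random effects); all parameter values are illustrative. Classical homoscedastic confidence intervals fall well short of their nominal 95% level when the error scale depends on the covariate:

```python
import numpy as np

rng = np.random.default_rng(3)
reps, n, b1 = 2000, 100, 1.0
cover = 0
for _ in range(reps):
    x = rng.uniform(-1, 1, n)
    sd = np.exp(1.2 * x)                      # error scale grows with x
    y = b1 * x + rng.normal(0.0, sd)
    X = np.column_stack([np.ones(n), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    s2 = resid @ resid / (n - 2)              # pooled (homoscedastic) variance
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    cover += abs(beta[1] - b1) < 1.96 * se    # does the 95% CI contain b1?
coverage = cover / reps
```

Because the high-variance observations also have high leverage, the pooled variance understates the slope's sampling variability and `coverage` lands noticeably below 0.95, mirroring the phenomenon the simulation study documents for LMMs.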

Identifying treatment response subgroups in observational time-to-event data

Under review at NeurIPS

Identifying patient subgroups with different treatment responses is an important task to inform medical recommendations, guidelines, and the design of future clinical trials. Existing approaches for subgroup analysis primarily focus on Randomised Controlled Trials (RCTs), in which treatment assignment is randomised. Furthermore, the patient cohort of an RCT is often constrained by cost and is not representative of the heterogeneity of patients likely to receive treatment in real-world clinical practice. Therefore, when applied to observational studies, such approaches suffer from significant statistical biases because treatment is not randomised. Our work introduces a novel, outcome-guided method for identifying treatment response subgroups in observational studies. Our approach assigns each patient to a subgroup associated with two time-to-event distributions: one under the treatment regime and one under control. The approach thus sits between individualised and average treatment effect estimation. The assumptions of our model yield a simple correction of the statistical bias from treatment non-randomisation through inverse propensity weighting. In experiments, our approach significantly outperforms the current state-of-the-art method for outcome-guided subgroup analysis in both randomised and observational treatment regimes.

Recommended citation: Jeanselme, V., Yoon, C., Falck, F., Tom, B., and Barrett, J. Identifying treatment response subgroups in observational time-to-event data. https://arxiv.org/abs/2408.03463
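A minimal sketch of the inverse propensity weighting correction mentioned above, on hypothetical simulated data: weighting each patient by the inverse of a fitted propensity score removes the confounder imbalance between treated and control arms. This is a generic IPW illustration, not the paper's subgroup model:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)                        # confounder
p_true = 1.0 / (1.0 + np.exp(-0.8 * x))       # treatment depends on x
a = rng.binomial(1, p_true)                   # non-randomised treatment

# Fit a logistic propensity model by Newton-Raphson
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):
    e = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (a - e)
    hess = (X * (e * (1 - e))[:, None]).T @ X
    beta += np.linalg.solve(hess, grad)

e = 1.0 / (1.0 + np.exp(-X @ beta))
w = np.where(a == 1, 1.0 / e, 1.0 / (1.0 - e))  # inverse propensity weights

# Confounder imbalance before and after weighting
raw_gap = x[a == 1].mean() - x[a == 0].mean()
weighted_gap = (np.average(x[a == 1], weights=w[a == 1])
                - np.average(x[a == 0], weights=w[a == 0]))
```

The raw difference in means is large because treated patients systematically differ on `x`; after weighting, both arms approximate the full population and the gap shrinks towards zero, which is the correction the subgroup model exploits.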

Competing Risks: Impact on Risk Estimation and Algorithmic Fairness

To be submitted to Management Science

Accurate time-to-event prediction is integral to decision-making, informing medical guidelines, hiring decisions, and resource allocation. Survival analysis, the quantitative framework used to model time-to-event data, accounts for patients who do not experience the event of interest during the study period, known as censored patients. However, many patients experience events that prevent the observation of the outcome of interest. These competing risks are often treated as censoring, a common practice whose consequences are frequently overlooked. Our work theoretically demonstrates why treating competing risks as censoring introduces substantial bias in survival estimates, leading to systematic overestimation of risk and, critically, amplifying disparities. First, we formalize the problem of misclassifying competing risks as censoring and quantify the resulting error in survival estimates. Specifically, we develop a framework to estimate this error and demonstrate the associated implications for predictive performance and algorithmic fairness. Furthermore, we examine how differing risk profiles across demographic groups lead to group-specific errors, potentially exacerbating existing disparities. Our findings, supported by an empirical analysis of cardiovascular management, demonstrate that ignoring competing risks disproportionately impacts the individuals most at risk of these events, potentially accentuating inequity. By quantifying the error and highlighting the fairness implications of the common practice of considering competing risks as censoring, our work provides a critical insight into the development of survival models: practitioners must account for competing risks to improve accuracy, reduce disparities in risk assessment, and better inform downstream decisions.

Recommended citation: Jeanselme, V., Barrett, J., and Tom, B. Competing Risks: Impact on Risk Estimation and Algorithmic Fairness. https://arxiv.org/abs/2508.05435
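The overestimation described above can be seen in a small sketch: on hypothetical simulated data with two competing exponential event times, the naive one-minus-Kaplan-Meier estimate (competing events treated as censoring) exceeds the Aalen-Johansen cumulative incidence, the appropriate estimand for the probability of the event of interest:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
# Hypothetical latent times: event of interest (cause 1) vs competing event (cause 2)
t1 = rng.exponential(10.0, n)
t2 = rng.exponential(8.0, n)
time = np.minimum(t1, t2)
cause = np.where(t1 <= t2, 1, 2)

order = np.argsort(time)
cause = cause[order]
at_risk = np.arange(n, 0, -1)            # continuous times, so no ties

d1 = (cause == 1).astype(float)
h1 = d1 / at_risk                        # hazard increments for cause 1

# Naive: treat competing events as censoring and report 1 - Kaplan-Meier
naive_risk = 1 - np.cumprod(1 - h1)

# Aalen-Johansen cumulative incidence: weight by overall survival just before t
S_all = np.cumprod(1 - 1.0 / at_risk)    # every subject fails from some cause here
S_prev = np.concatenate(([1.0], S_all[:-1]))
cif = np.cumsum(S_prev * h1)
```

With no censoring, the Aalen-Johansen estimate at the end of follow-up equals the empirical fraction of cause-1 events, while the naive estimate targets the hypothetical risk had the competing event been removed, hence the systematic overestimation the paper quantifies.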

Imputation Strategies Under Clinical Presence: Impact on Algorithmic Fairness

Under review at Management Information Systems Quarterly

Machine learning risks reinforcing biases present in data, and, as we argue in this work, in what is absent from data. In healthcare, biases have marked medical history, leading to unequal care affecting marginalised groups. Patterns in missing data often reflect these group discrepancies, but the algorithmic fairness implications of group-specific missingness are not well understood. Despite its potential impact, imputation is often an overlooked preprocessing step, with attention placed on the reduction of reconstruction error and overall performance, ignoring how imputation can affect groups differently. Our work studies how imputation choices affect reconstruction errors across groups and algorithmic fairness properties of downstream predictions. First, we provide a structured view of the relationship between clinical presence mechanisms and group-specific missingness patterns. Then, we theoretically demonstrate that the optimal choice between two common imputation strategies is under-determined, both in terms of group-specific imputation quality and of the gap in quality across groups. Particularly, the use of group-specific imputation strategies may counter-intuitively reduce data quality for marginalised groups. We complement these theoretical results with simulations and real-world empirical evidence showing that imputation choices influence group-specific data quality and downstream algorithmic fairness, and that no imputation strategy consistently reduces group disparities in reconstruction error or predictions. Importantly, our results show that current practices may be detrimental to health equity, as similarly performing imputation strategies at the population level can affect marginalised groups differently. Finally, we propose recommendations for mitigating inequities that may stem from an overlooked step of the machine learning pipeline.

Recommended citation: Jeanselme, V., De-Arteaga, M., Zhang, Z., Barrett, J., and Tom, B. Imputation Strategies Under Clinical Presence: Impact on Algorithmic Fairness. https://arxiv.org/abs/2208.06648
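A small sketch of the kind of comparison studied above, on a hypothetical group-dependent missingness mechanism: population-mean and group-specific mean imputation yield different reconstruction errors for the smaller group. In this particular setting the group-specific strategy helps; the paper's point is precisely that no such ordering holds in general, so the direction can reverse under other mechanisms:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000
group = rng.binomial(1, 0.3, n)                        # 1 = smaller group
x = rng.normal(loc=np.where(group == 1, 2.0, 0.0))     # group-shifted feature
miss = rng.random(n) < np.where(group == 1, 0.5, 0.1)  # group-specific missingness

x_obs = np.where(miss, np.nan, x)

# Strategy 1: population mean imputation
imp_pop = np.where(miss, np.nanmean(x_obs), x)

# Strategy 2: group-specific mean imputation
imp_grp = x.copy()                                     # observed entries equal x
for g in (0, 1):
    m = miss & (group == g)
    imp_grp[m] = np.nanmean(x_obs[group == g])

def rmse(imp, sel):
    return np.sqrt(np.mean((imp[sel] - x[sel]) ** 2))

# Reconstruction error on the smaller group's missing entries
err_pop_g1 = rmse(imp_pop, miss & (group == 1))
err_grp_g1 = rmse(imp_grp, miss & (group == 1))
```

Here the population mean is pulled towards the majority group, so it badly reconstructs the minority group's missing values; comparing such group-specific errors, rather than a single population-level score, is the evaluation the abstract advocates.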