About me
This is a page not in the main menu.
Published:
This post will show up by default. To disable scheduling of future posts, edit config.yml and set future: false.
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Short description of portfolio item number 1
Short description of portfolio item number 2
Published in Critical Care, 2020
Background Even brief hypotension is associated with increased morbidity and mortality. We developed a machine learning model to predict the initial hypotension event among intensive care unit (ICU) patients and designed an alert system for bedside implementation.
Materials and methods From the Medical Information Mart for Intensive Care III (MIMIC-3) dataset, minute-by-minute vital signs were extracted. A hypotension event was defined as at least five measurements within a 10-min period of systolic blood pressure ≤ 90 mmHg and mean arterial pressure ≤ 60 mmHg. Using time series data from 30-min overlapping time windows, a random forest (RF) classifier was used to predict the risk of hypotension every minute. Chronologically, the first half of the extracted data was used to train the model, and the second half was used to validate the trained model. The model’s performance was measured with the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Hypotension alerts were generated from the risk score time series using a stacked RF model. A lockout time was applied for real-life implementation.
Results We identified 1307 subjects (1580 ICU stays) as the hypotension group and 1619 subjects (2279 ICU stays) as the non-hypotension group. The RF model showed an AUROC of 0.93 and 0.88 at 15 and 60 min before hypotension, respectively, and an AUPRC of 0.77 at 60 min before. Risk score trajectories revealed that 80% and > 60% of hypotension events were predicted 15 and 60 min before onset, respectively. The stacked model with a 15-min lockout produced on average 0.79 alerts/subject/hour (sensitivity 92.4%).
Conclusion Clinically significant hypotension events in the ICU can be predicted at least 1 h before the initial hypotension episode. With a highly sensitive and reliable practical alert system, the vast majority of future hypotension events could be captured, suggesting potential real-life utility.
Recommended citation: Yoon, J. H.*, Jeanselme, V.*, Dubrawski, A., Hravnak, M., Pinsky, M. R., Clermont, G. (2020). Prediction of Hypotension Events with Physiologic Vital Sign Signatures in The Intensive Care Unit. In Critical Care, 24(1), 1-9. https://ccforum.biomedcentral.com/articles/10.1186/s13054-020-03379-3
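A minimal sketch of the sliding-window setup described in the methods, on synthetic data. The 30-minute window and chronological split follow the abstract; all variable names, sizes, and data below are illustrative assumptions, not the study's code.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)

# Fake minute-by-minute vitals for one stay: e.g. systolic BP, MAP, heart rate.
minutes, n_vitals, window = 600, 3, 30
vitals = rng.normal(size=(minutes, n_vitals))
labels = rng.integers(0, 2, size=minutes)   # 1 if hypotension follows this minute

# One feature vector per minute from the preceding 30-min overlapping window.
X = np.stack([vitals[t - window:t].ravel() for t in range(window, minutes)])
y = labels[window:]

# Chronological split: first half trains the model, second half validates it.
split = len(X) // 2
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X[:split], y[:split])
risk = clf.predict_proba(X[split:])[:, 1]   # minute-by-minute risk score

print("AUROC:", roc_auc_score(y[split:], risk))
print("AUPRC:", average_precision_score(y[split:], risk))
```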
Published in AAAI Spring Symposium on Survival Analysis, 2021
Time-to-event regression in healthcare and other domains, such as predictive maintenance, requires working with time-series (or time-varying) data such as continuously monitored vital signs, electronic health records, or sensor readings. In such scenarios, the event-time distribution may have temporal dependencies at different time scales that are not easily captured by classical survival models that assume training data points to be independent. In this paper, we describe a fully parametric approach to model censored time-to-event outcomes with time-varying covariates. It involves learning representations of the input temporal data using recurrent neural networks such as LSTMs and GRUs, followed by describing the conditional event distribution as a fixed mixture of parametric distributions. The use of recurrent neural networks allows the learned representations to model long-term dependencies in the input data while jointly estimating the time-to-event distribution. We benchmark our approach on MIMIC III, a large, publicly available dataset collected from Intensive Care Unit (ICU) patients, focusing on predicting the duration of their ICU stays and their short-term life expectancy, and we demonstrate competitive performance of the proposed approach compared to established time-to-event regression models.
Code available on GitHub.
Recommended citation: Nagpal, C.*, Jeanselme, V.*, Dubrawski, A. (2021, May). Deep parametric time-to-event regression with time-varying covariates. In Survival Prediction-Algorithms, Challenges and Applications (pp. 184-193). PMLR. https://proceedings.mlr.press/v146/nagpal21a.html
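The core of the approach can be sketched in a few lines of PyTorch: a GRU encodes the temporal covariates, and its final state parameterises a fixed mixture of Weibull distributions whose censored log-likelihood is maximised. Layer sizes, the Weibull choice, and all names below are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class RecurrentWeibullMixture(nn.Module):
    def __init__(self, n_features, hidden=32, k=3):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.shape = nn.Linear(hidden, k)    # Weibull shape per component
        self.scale = nn.Linear(hidden, k)    # Weibull scale per component
        self.logits = nn.Linear(hidden, k)   # mixture weights

    def forward(self, x):
        _, h = self.gru(x)                   # h: (1, batch, hidden)
        h = h.squeeze(0)
        # softplus keeps shapes and scales strictly positive
        return (nn.functional.softplus(self.shape(h)) + 1e-3,
                nn.functional.softplus(self.scale(h)) + 1e-3,
                torch.log_softmax(self.logits(h), dim=-1))

def neg_log_likelihood(model, x, t, event):
    shape, scale, log_pi = model(x)
    z = (t.unsqueeze(-1) / scale).clamp(min=1e-8)
    log_surv = -z ** shape                   # log S(t) for each component
    log_pdf = torch.log(shape / scale) + (shape - 1) * torch.log(z) + log_surv
    ll = torch.where(event.unsqueeze(-1).bool(),
                     log_pi + log_pdf,       # observed events use the density
                     log_pi + log_surv)      # censored cases use the survival
    return -torch.logsumexp(ll, dim=-1).mean()

x = torch.randn(8, 20, 5)                    # (batch, time, features)
t, e = torch.rand(8), torch.randint(0, 2, (8,))
print(neg_log_likelihood(RecurrentWeibullMixture(n_features=5), x, t, e))
```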
Published in Resuscitation Plus, 2021
Background We explored sex-based differences in discharge location after resuscitation from cardiac arrest.
Methods We performed a single-center retrospective cohort study including patients hospitalized after resuscitation from cardiac arrest from January 2010 to May 2020. We identified patients from a prospective registry, from which we extracted standard demographic and clinical variables. We explored favorable discharge location, defined as discharge to home or acute rehabilitation for survivors to hospital discharge. We tested the association of sex with the residuals of a multivariable logistic regression built using bidirectional selection to control for clinically relevant covariates.
Results We included 2,278 patients. Mean age was 59 (SD 16), 40% were women, and 77% were admitted after out-of-hospital cardiac arrest. A total of 970 patients (43%) survived to discharge; of those, 607 (63% of survivors) had a favorable discharge location. Female sex showed a weak independent association with unfavorable discharge location (adjusted OR 0.94, 95% CI 0.89–0.99).
Conclusions Our results suggest a possible sex-based disparity in discharge location after cardiac arrest.
Recommended citation: Jeanselme, V., De-Arteaga, M., Elmer, J., Perman, S. M., Dubrawski, A. (2021). Sex differences in post cardiac arrest discharge locations. In Resuscitation plus, 8, 100185. https://www.sciencedirect.com/science/article/pii/S2666520421001107
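A rough sketch of our reading of the residual-association test described in the methods: fit a multivariable logistic model without sex, then test whether its residuals differ by sex. The synthetic data, covariates, and the use of a Mann-Whitney test are assumptions for illustration, not the study's code.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)
n = 1000
X = rng.normal(size=(n, 3))                  # stand-ins for clinical covariates
female = rng.integers(0, 2, n)
y = (X[:, 0] + 0.1 * female + rng.normal(size=n) > 0).astype(int)

# Multivariable logistic model for favorable discharge, fitted without sex.
fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
resid = y - fit.predict(sm.add_constant(X))  # response residuals

# Is what the model leaves unexplained associated with sex?
print(mannwhitneyu(resid[female == 1], resid[female == 0]))
```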
Published in Conference on Health, Inference, and Learning (CHIL), 2022
Survival analysis involves modelling the time to an event of interest. Neural network approaches improve on the predictive performance of traditional survival models, but at the cost of interpretability, which impairs their applicability in high-stakes domains such as medicine. Providing insights into the survival distributions would tackle this issue and advance the medical understanding of diseases. This paper approaches survival analysis as a mixture of neural baselines, whereby different baseline cumulative hazard functions are modelled using positive and monotone neural networks. The efficiency of the solution is demonstrated on three datasets while enabling the discovery of new survival phenotypes.
Code available on GitHub.
Recommended citation: Jeanselme, V., Tom, B., Barrett, J. (2022, April). Neural Survival Clustering: Non-parametric mixture of neural networks for survival clustering. In Conference on Health, Inference, and Learning (pp. 92-102). PMLR. https://proceedings.mlr.press/v174/jeanselme22a/jeanselme22a.pdf
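A toy version of the building block named above, a positive, monotone network for a baseline cumulative hazard: exponentiated (hence positive) weights combined with increasing activations guarantee that the output is non-decreasing in time, and anchoring at t = 0 keeps it non-negative. The sizes and parameterisation are our illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MonotoneCumulativeHazard(nn.Module):
    """A cumulative hazard increasing in t by construction:
    positive weights (via exp) and increasing activations."""
    def __init__(self, hidden=16):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(1, hidden))
        self.b1 = nn.Parameter(torch.zeros(hidden))
        self.w2 = nn.Parameter(torch.randn(hidden, 1))

    def forward(self, t):                     # t: (batch, 1), non-negative times
        def g(u):
            h = torch.tanh(u @ self.w1.exp() + self.b1)
            return (h @ self.w2.exp()).squeeze(-1)
        return g(t) - g(torch.zeros_like(t))  # anchors the hazard at zero for t = 0

t = torch.linspace(0, 5, 6).unsqueeze(-1)
print(MonotoneCumulativeHazard()(t))          # non-negative and non-decreasing
```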
Published in Springer - Advances in Data Analysis and Classification, 2022
Clustering under pairwise constraints is an important knowledge discovery tool that enables the learning of appropriate kernels or distance metrics to improve clustering performance. These pairwise constraints, which come in the form of must-link and cannot-link pairs, arise naturally in many applications and are intuitive for users to provide. However, the common practice of relaxing discrete constraints to a continuous domain to ease optimization when learning kernels or metrics can harm generalization, as information that only encodes linkage is transformed into information about distances. We introduce a new constrained clustering algorithm that jointly clusters data and learns a kernel in accordance with the available pairwise constraints. To generalize well, our method is designed to maximize constraint satisfaction without relaxing pairwise constraints to a continuous domain where they inform distances. We show that the proposed method outperforms existing approaches on a large number of diverse publicly available datasets, and we discuss how our method can scale to large datasets.
Code available on GitHub.
Recommended citation: Boecking, B., Jeanselme, V., Dubrawski, A (2022). Constrained clustering and multiple kernel learning without pairwise constraint relaxation. In Advances in Data Analysis and Classification, 1-16. https://doi.org/10.1007/s11634-022-00507-5
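The quantity the method maximises, constraint satisfaction over discrete cluster assignments, is simple to state in code. This tiny helper is ours, not the paper's, but it makes the contrast with continuous relaxation concrete.

```python
import numpy as np

def constraint_satisfaction(assignments, must_link, cannot_link):
    """Fraction of must-link pairs placed in the same cluster and
    cannot-link pairs placed in different clusters."""
    a = np.asarray(assignments)
    ml = [a[i] == a[j] for i, j in must_link]
    cl = [a[i] != a[j] for i, j in cannot_link]
    return float(np.mean(ml + cl))

labels = [0, 0, 1, 1, 2]
print(constraint_satisfaction(labels, must_link=[(0, 1)], cannot_link=[(1, 2)]))
# 1.0 — both discrete constraints satisfied, no continuous relaxation involved
```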
Published in Machine Learning for Health (ML4H), 2022
Biases have marked medical history, leading to unequal care affecting marginalised groups. The patterns of missingness in observational data often reflect these group discrepancies, but the algorithmic fairness implications of group-specific missingness are not well understood. Despite its potential impact, imputation is too often an overlooked preprocessing step. When explicitly considered, attention is placed on overall performance, ignoring how this preprocessing can reinforce group-specific inequities. Our work questions this choice by studying how imputation affects downstream algorithmic fairness. First, we provide a structured view of the relationship between clinical presence mechanisms and group-specific missingness patterns. Then, through simulations and real-world experiments, we demonstrate that the imputation choice influences marginalised group performance and that no imputation strategy consistently reduces disparities. Importantly, our results show that current practices may endanger health equity as similarly performing imputation strategies at the population level can affect marginalised groups differently. Finally, we propose recommendations for mitigating inequities that may stem from a neglected step of the machine learning pipeline.
Code available on GitHub.
Recommended citation: Jeanselme, V., De-Arteaga, M., Zhang, Z., Barrett, J., Tom, B, (2022, November). Imputation Strategies Under Clinical Presence: Impact on Algorithmic Fairness. In Machine Learning for Health (pp. 12-34). PMLR. https://proceedings.mlr.press/v193/jeanselme22a/jeanselme22a.pdf
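A compact sketch of the kind of experiment described above: inject group-specific missingness, impute with two common strategies, and compare downstream discrimination per group. The data and model choices are synthetic placeholders, not the paper's pipeline.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
n = 2000
group = rng.integers(0, 2, n)                 # marginalised-group indicator
X = rng.normal(size=(n, 4)) + group[:, None]  # group-shifted covariates
y = (X.sum(axis=1) + rng.normal(size=n) > 1).astype(int)

# Group-specific missingness: one group loses a covariate far more often.
miss = rng.random(n) < np.where(group == 1, 0.5, 0.1)
X_miss = X.copy()
X_miss[miss, 0] = np.nan

for imputer in (SimpleImputer(strategy="mean"), IterativeImputer(random_state=0)):
    X_imp = imputer.fit_transform(X_miss)
    pred = LogisticRegression().fit(X_imp, y).predict_proba(X_imp)[:, 1]
    aucs = [roc_auc_score(y[group == g], pred[group == g]) for g in (0, 1)]
    print(type(imputer).__name__, "per-group AUROC:", np.round(aucs, 3))
```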
Published in NeurIPS Workshop TS4H, 2022
Observational data in medicine arise as a result of the complex interaction between patients and the healthcare system. The sampling process is often highly irregular and itself constitutes an informative process. When using such data to develop prediction models, this phenomenon is often ignored, leading to sub-optimal performance and generalisability of models when practices evolve. We propose a multi-task recurrent neural network which models three clinical presence dimensions – namely the longitudinal, the inter-observation and the missingness processes – in parallel to the survival outcome. On a prediction task using MIMIC III laboratory tests, explicit modelling of these three processes showed improved performance in comparison to state-of-the-art predictive models (C-index at 1 day horizon: 0.878). More importantly, the proposed approach was more robust to change in the clinical presence setting, demonstrated by performance comparison between patients admitted on weekdays and weekends. This analysis demonstrates the importance of studying and leveraging clinical presence to improve performance and create more transportable clinical models.
Code available on GitHub.
Recommended citation: Jeanselme, V., Martin, G., Peek, N., Sperrin, M., Tom, B., Barrett, J. (2022). DeepJoint: Robust Survival Modelling Under Clinical Presence Shift. In NeurIPS 2022 Workshop on Learning from Time Series for Health. https://arxiv.org/abs/2205.13481
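An illustrative PyTorch skeleton of the multi-task architecture described above, with one head per clinical-presence process (longitudinal values, inter-observation time, missingness) alongside the survival head. Layer sizes and head parameterisations are our assumptions, not the released code.

```python
import torch
import torch.nn as nn

class ClinicalPresenceRNN(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(n_features, hidden, batch_first=True)
        self.longitudinal = nn.Linear(hidden, n_features)  # next observed values
        self.inter_obs = nn.Linear(hidden, 1)              # time to next observation
        self.missingness = nn.Linear(hidden, n_features)   # which tests will be taken
        self.survival = nn.Linear(hidden, 1)               # outcome of interest

    def forward(self, x):                                  # x: (batch, time, features)
        out, _ = self.encoder(x)
        h = out[:, -1]                                     # shared representation
        return (self.longitudinal(h),
                nn.functional.softplus(self.inter_obs(h)), # positive waiting time
                torch.sigmoid(self.missingness(h)),        # observation probabilities
                self.survival(h))
```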
Published in Conference on Health, Inference, and Learning (CHIL), 2023
Time-to-event modelling, known as survival analysis, differs from standard regression as it addresses censoring in patients who do not experience the event of interest. Despite competitive performance in tackling this problem, machine learning methods often ignore competing risks that preclude the event of interest, a practice that biases the survival estimation. Extensions that address this challenge often rely on parametric assumptions or numerical estimations, leading to sub-optimal survival approximations. This paper leverages constrained monotonic neural networks to model each competing survival distribution. This modelling choice ensures exact likelihood maximisation at a reduced computational cost by using automatic differentiation. The effectiveness of the solution is demonstrated on one synthetic and three medical datasets. Finally, we discuss the implications of considering competing risks when developing risk scores for medical practice.
Code available on GitHub.
Recommended citation: Jeanselme, V., Yoon, C. H., Tom, B., Barrett, J. (2023, June). Neural Fine-Gray: Monotonic neural networks for competing risks. In Conference on Health, Inference, and Learning (pp. 379-392). PMLR. https://arxiv.org/abs/2305.06703
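The key computational trick can be shown in a few lines: if a monotone network models the cumulative incidence function F(t|x), automatic differentiation yields the density f(t|x) = ∂F/∂t exactly, so the likelihood needs neither parametric assumptions nor numerical integration. The toy CIF below stands in for the paper's constrained monotonic network.

```python
import torch

def density_from_cif(cif, x, t):
    """f(t|x) = dF(t|x)/dt obtained exactly by automatic differentiation,
    so the likelihood needs no parametric form or numerical integration."""
    t = t.clone().requires_grad_(True)
    F = cif(x, t)
    return torch.autograd.grad(F.sum(), t, create_graph=True)[0]

# Toy monotone CIF standing in for the paper's constrained monotonic network.
cif = lambda x, t: torch.sigmoid(x.sum(-1, keepdim=True)) * (1 - torch.exp(-t))
x, t = torch.randn(5, 3), torch.rand(5, 1)
print(density_from_cif(cif, x, t))
```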
Published in Journal of the American Academy of Dermatology, 2025
This paper introduces a novel prognostic model for cutaneous squamous cell carcinoma and compares its performance to commonly used staging systems. To help medical practitioners quantify risk, we release an app to compute riSCC.
Background Cutaneous squamous cell carcinoma (CSCC) is a prevalent disease for which improved risk stratification strategies are needed.
Objective To develop a novel prognostic model (herein “riSCC”) for CSCC and compare riSCC performance to Brigham and Women’s Hospital (BWH) and American Joint Committee on Cancer Staging 8th Edition (AJCC8) T staging systems.
Methods Retrospective 12-center, multinational cohort study of CSCCs from 1991 to 2023. Clinical and pathologic risk factors, treatments, and outcomes were collected. Fine-Gray model was employed for each outcome with inverse probability of treatment weighting. A final model was trained for prospective use and estimation of hazard ratios.
Results 23,166 localized CSCC tumors were included. The riSCC prognostic model performed superiorly to AJCC8 and BWH T staging for all outcomes. At five years, the C-index for riSCC ranged from 0.74 for local recurrence (LR) to 0.87 for disease-specific death (DSD).
Limitations Retrospective study design.
Conclusions The riSCC prognostic model offers fine-grained risk estimates and improved stratification for important CSCC outcomes compared to T staging systems.
Recommended citation: Jambusaria-Pahlajani, A.*, Jeanselme, V.*, Wang, D., Ran, N., Granger, E., Cañueto, J., Brodland, D., Carr, D., Carter, J., Carucci, J., Hirotsu, K., Karn, E., Koyfman, S., Mangold, A., Girardi, F., Shahwan, K., Srivastava, D., Vidimos, A., Willenbrink, T., Wysong, A., Lotter, W., Ruiz, E. (2025). riSCC: A personalized risk model for the development of poor outcomes in cutaneous squamous cell carcinoma. In Journal of the American Academy of Dermatology. https://www.jaad.org/article/S0190-9622(25)00373-1/abstract
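For readers unfamiliar with inverse probability of treatment weighting, a minimal sketch of how such weights are typically computed before entering an outcome model (here, the Fine-Gray fit) as case weights. The data and propensity model below are illustrative, not the study's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 5))               # stand-ins for clinical covariates
treated = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)

# Propensity of receiving the treatment given covariates.
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# Standard IPTW weights: 1/ps for treated patients, 1/(1 - ps) for the rest.
weights = np.where(treated == 1, 1 / ps, 1 / (1 - ps))
# These weights would then enter the outcome (e.g. Fine-Gray) fit as case weights.
print(weights[:5])
```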
Published in Management Science, 2025
Machine learning (ML) is increasingly being used to support high-stakes decisions, a trend owed in part to its promise of superior predictive power relative to human assessment. However, there is frequently a gap between decision objectives and what is captured in the observed outcomes used as labels to train ML models. As a result, machine learning models may fail to capture important dimensions of decision criteria, hampering their utility for decision support. In this work, we explore the use of historical expert decisions as a rich – yet imperfect – source of information that is commonly available in organizational information systems, and show that it can be leveraged to bridge the gap between decision objectives and algorithm objectives. We consider the problem of estimating expert consistency indirectly when each case in the data is assessed by a single expert, and propose influence function-based methodology as a solution to this problem. We then incorporate the estimated expert consistency into a predictive model through a training-time label amalgamation approach. This approach allows ML models to learn from experts when there is inferred expert consistency, and from observed labels otherwise. We also propose alternative ways of leveraging inferred consistency via hybrid and deferral models. In our empirical evaluation, focused on the context of child maltreatment hotline screenings, we show that (1) there are high-risk cases whose risk is considered by the experts but not wholly captured in the target labels used to train a deployed model, and (2) the proposed approach significantly improves precision for these cases.
Code available on GitHub.
Recommended citation: De-Arteaga, M., Jeanselme, V., Dubrawski, A., Chouldechova, A. (2025). Leveraging Expert Consistency to Improve Algorithmic Decision Support. In Management Science. https://pubsonline.informs.org/doi/full/10.1287/mnsc.2022.01576
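A stylised view of training-time label amalgamation: blend the expert decision and the observed label according to inferred consistency, producing soft training targets. The convex-combination rule below is our simplification for illustration, not the paper's exact estimator.

```python
import numpy as np

def amalgamate(expert_decision, observed_label, consistency):
    """Convex blend weighted by inferred expert consistency in [0, 1]:
    learn from experts where they are consistent, from outcomes otherwise."""
    return consistency * expert_decision + (1 - consistency) * observed_label

expert = np.array([1, 1, 0, 1])
observed = np.array([0, 1, 0, 0])
consistency = np.array([0.9, 0.5, 0.2, 0.8])  # e.g. estimated via influence functions
print(amalgamate(expert, observed, consistency))  # soft training targets
```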
Under review at Management Science (Reject and Resubmit)
Machine learning risks reinforcing biases present in data, and, as we argue in this work, in what is absent from data. In healthcare, biases have marked medical history, leading to unequal care affecting marginalised groups. Patterns in missing data often reflect these group discrepancies, but the algorithmic fairness implications of group-specific missingness are not well understood. Despite its potential impact, imputation is often an overlooked preprocessing step, with attention placed on the reduction of reconstruction error and overall performance, ignoring how imputation can affect groups differently. Our work studies how imputation choices affect reconstruction errors across groups and algorithmic fairness properties of downstream predictions. First, we provide a structured view of the relationship between clinical presence mechanisms and group-specific missingness patterns. Then, we theoretically demonstrate that the optimal choice between two common imputation strategies is under-determined, both in terms of group-specific imputation quality and of the gap in quality across groups. Particularly, the use of group-specific imputation strategies may counter-intuitively reduce data quality for marginalised groups. We complement these theoretical results with simulations and real-world empirical evidence showing that imputation choices influence group-specific data quality and downstream algorithmic fairness, and that no imputation strategy consistently reduces group disparities in reconstruction error or predictions. Importantly, our results show that current practices may be detrimental to health equity as similarly performing imputation strategies at the population level can affect marginalised groups differently. Finally, we propose recommendations for mitigating inequities that may stem from an overlooked step of the machine learning pipeline.
Recommended citation: Jeanselme, V., De-Arteaga, M., Zhang, Z., Barrett, J., and Tom, B. Imputation Strategies Under Clinical Presence: Impact on Algorithmic Fairness. https://arxiv.org/abs/2208.06648
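A toy computation echoing the under-determination result above: with group-specific, value-dependent missingness, one can measure per-group reconstruction error under pooled versus group-specific mean imputation; which strategy wins for which group depends on the missingness mechanism. The mechanism below is one illustrative choice, not the paper's simulation design.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
group = rng.integers(0, 2, n)
x = rng.normal(loc=group, size=n)                    # group-shifted feature
# MNAR for the marginalised group: its highest values go unobserved.
missing = ((x > 0.8) & (group == 1)) | (rng.random(n) < 0.05)

observed = ~missing
pooled_mean = x[observed].mean()
group_means = {g: x[observed & (group == g)].mean() for g in (0, 1)}

for g in (0, 1):
    m = missing & (group == g)
    mse_pooled = np.mean((x[m] - pooled_mean) ** 2)
    mse_group = np.mean((x[m] - group_means[g]) ** 2)
    print(f"group {g}: pooled-mean MSE {mse_pooled:.2f}, "
          f"group-mean MSE {mse_group:.2f}")
```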
To be submitted to Management Science
Recommended citation: Jeanselme, V., Barrett, J., and Tom, B. Ignoring Competing Risks: Impact on Algorithmic Fairness.
Under review at NeurIPS
Identifying patient subgroups with different treatment responses is an important task to inform medical recommendations, guidelines, and the design of future clinical trials. Existing approaches for subgroup analysis primarily focus on Randomised Controlled Trials (RCTs), in which treatment assignment is randomised. Furthermore, the patient cohort of an RCT is often constrained by cost, and is not representative of the heterogeneity of patients likely to receive treatment in real-world clinical practice. Therefore, when applied to observational studies, such approaches suffer from significant statistical biases because of the non-randomisation of treatment. Our work introduces a novel, outcome-guided method for identifying treatment response subgroups in observational studies. Our approach assigns each patient to a subgroup associated with two time-to-event distributions: one under treatment and one under control regime. It hence positions itself between individualised and average treatment effect estimation. The assumptions of our model result in a simple correction of the statistical bias from treatment non-randomisation through inverse propensity weighting. In experiments, our approach significantly outperforms the current state-of-the-art method for outcome-guided subgroup analysis in both randomised and observational treatment regimes.
Recommended citation: Jeanselme, V., Yoon, C., Falck, F., Tom, B., and Barrett, J. Identifying treatment response subgroups in observational time-to-event data. https://www.arxiv.org/abs/2408.03463
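To make the estimand concrete: within a candidate subgroup, the treatment response can be summarised by contrasting inverse-propensity-weighted survival curves under treatment and control. The lifelines-based helper below is an illustrative sketch under our assumptions, not the paper's method.

```python
import numpy as np
from lifelines import KaplanMeierFitter

def subgroup_contrast(t, event, treated, weights, horizon):
    """Difference in IPW-weighted survival at `horizon` within one subgroup."""
    km_t, km_c = KaplanMeierFitter(), KaplanMeierFitter()
    km_t.fit(t[treated == 1], event[treated == 1], weights=weights[treated == 1])
    km_c.fit(t[treated == 0], event[treated == 0], weights=weights[treated == 0])
    return km_t.predict(horizon) - km_c.predict(horizon)

rng = np.random.default_rng(6)
t = rng.exponential(10, 200)
event, treated = rng.integers(0, 2, 200), rng.integers(0, 2, 200)
print(subgroup_contrast(t, event, treated, np.ones(200), horizon=5.0))
```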
Under review at MLHC
Survival analysis is a fundamental tool for modeling time-to-event outcomes in healthcare. Recent advances have introduced flexible neural network approaches for improved predictive performance. However, these models do not provide interpretable insights into the association between exposures and the modeled outcomes, a critical requirement for decision-making in clinical practice. To address this limitation, we propose Additive Deep Hazard Analysis Mixtures (ADHAM), an interpretable additive survival model. ADHAM assumes a conditional latent subpopulation structure that characterizes an individual, combined with covariate-specific hazard functions. To select the number of subpopulations, we introduce a post-training group refinement-based model-selection procedure, i.e., an efficient approach to merging similar clusters to reduce the number of repetitive latent subpopulations identified by the model. We perform comprehensive studies to demonstrate ADHAM’s interpretability at the population, subpopulation, and individual levels. Extensive experiments on real-world datasets show that ADHAM provides novel insights into the association between exposures and outcomes. Further, ADHAM remains on par with existing state-of-the-art survival baselines, offering a scalable and interpretable approach to time-to-event prediction in healthcare.
Recommended citation: Ketenci, M., Jeanselme, V., Nieva, H., Joshi, S., and Elhadad, N. ADHAM: Additive deep hazard analysis mixtures for interpretable survival regression.
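A toy version of the post-training refinement step mentioned above: greedily retain only latent subpopulations whose hazard curves are mutually distinct, folding near-duplicates together. The distance and threshold are our illustrative choices, not the paper's procedure.

```python
import numpy as np

def merge_similar(hazards, tol=0.05):
    """Retain subpopulations whose hazard curves are mutually distinct.

    hazards: (k, T) array of per-cluster hazard curves on a shared time grid.
    Returns indices of retained clusters; near-duplicates are folded away."""
    keep = []
    for i, h in enumerate(hazards):
        if all(np.abs(h - hazards[j]).mean() >= tol for j in keep):
            keep.append(i)
    return keep

grid = np.linspace(0, 1, 50)
hazards = np.stack([grid, 1.01 * grid, grid ** 2])  # first two nearly identical
print(merge_similar(hazards))  # [0, 2]
```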
Under review at BMC Medical Research Methodology
The linear mixed model (LMM) is a common statistical approach to model the relation between exposure and outcome while capturing individual variability through random effects. However, this model assumes homogeneity of the error term’s variance, known as homoscedasticity. Breaking this assumption can bias estimates and, consequently, may change a study’s conclusions. If this assumption is unmet, the mixed-effect location-scale model (MELSM) offers a solution to account for within-individual variability. Our work explores how LMMs and MELSMs behave when the homoscedasticity assumption is not met. Further, we study how misspecification affects inference for MELSMs. To this end, we propose a simulation study with longitudinal data and evaluate the bias and coverage of the estimates. Our simulations show that neglecting heteroscedasticity in LMMs leads to a loss of coverage for the estimated coefficients and biases the estimates of the standard deviations of the random effects. In MELSMs, scale misspecification does not bias the location model, but location misspecification alters the scale estimates. Our simulation study illustrates the importance of modelling heteroscedasticity, with potential implications beyond mixed-effect models, for generalised linear mixed models for non-normal outcomes and joint models with survival data.
Recommended citation: Jeanselme, V., Palma, M., and Barrett, J. Assessing the impact of variance heterogeneity and mis-specification in mixed-effects location-scale models. https://arxiv.org/abs/2505.18038
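A minimal version of the simulation design above: generate longitudinal data with subject-specific error variances (heteroscedasticity) and fit a standard LMM that assumes one common variance, using statsmodels. Effect sizes and the variance model are illustrative assumptions, not the paper's settings.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n_subj, n_obs = 100, 10
subject = np.repeat(np.arange(n_subj), n_obs)
time = np.tile(np.arange(n_obs), n_subj)

b = rng.normal(0, 1, n_subj)                 # random intercepts
sigma = np.exp(rng.normal(0, 0.5, n_subj))   # subject-specific error SDs
y = 1 + 0.5 * time + b[subject] + rng.normal(0, sigma[subject])

df = pd.DataFrame({"y": y, "time": time, "subject": subject})
lmm = smf.mixedlm("y ~ time", df, groups=df["subject"]).fit()
print(lmm.summary())  # an LMM that wrongly assumes a common error variance
```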
Under review at NeurIPS
Foundation models hold significant promise in healthcare, given their capacity to extract meaningful representations independent of downstream tasks. This property has enabled state-of-the-art performance across several clinical applications trained on structured electronic health record (EHR) data, even in settings with limited labeled data, a prevalent challenge in healthcare. However, there is little consensus on these models’ potential for clinical utility due to the lack of comprehensive and meaningful tasks and of sufficiently diverse evaluations to characterize their benefit over conventional supervised learning. To address this gap, we propose a suite of clinically meaningful tasks spanning patient outcomes and early prediction of acute and chronic conditions, together with desiderata for robust evaluations. We evaluate state-of-the-art foundation models on EHR data consisting of 5 million patients from Columbia University Irving Medical Center (CUMC), a large urban academic medical center in New York City, across 14 clinically relevant tasks. We measure overall accuracy, calibration, and subpopulation performance to surface tradeoffs based on the choice of pre-training, tokenization, and data representation strategies. Our study aims to advance the empirical evaluation of structured EHR foundation models and guide the development of future healthcare foundation models.
Recommended citation: Pang*, C., Jeanselme*, V., Choi, Y., Jiang, X., Jing, Z., Kashyap, A., Kobayashi, Y., Li, Y., Pollet, F., Natarajan, K., and Joshi, S. FoMoH: A clinically meaningful foundation model evaluation for structured electronic health records. https://arxiv.org/abs/2505.16941
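A small sketch of the evaluation dimensions listed above: discrimination (AUROC) and calibration (Brier score) reported per subpopulation rather than only overall. Predictions and subgroup labels below are synthetic placeholders, not the benchmark's data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, brier_score_loss

rng = np.random.default_rng(7)
y = rng.integers(0, 2, 1000)
pred = np.clip(y * 0.6 + rng.normal(0.2, 0.2, 1000), 0, 1)
subgroup = rng.integers(0, 3, 1000)           # e.g. site, age band, ethnicity

for g in np.unique(subgroup):
    m = subgroup == g
    print(f"subgroup {g}: AUROC={roc_auc_score(y[m], pred[m]):.3f}, "
          f"Brier={brier_score_loss(y[m], pred[m]):.3f}")
```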
Under review at Statistical Analysis and Data Mining
Accurate prediction of graft failure is critical to enhance patient care following transplantation. Traditional predictive models often focus on graft failure, overlooking other potential outcomes, known as competing risks, such as death with a functioning graft. This oversight theoretically biases risk estimates, yet the literature presents conflicting evidence on the gain associated with incorporating competing risks and leveraging flexible survival models. Our work compares a traditional Cox proportional hazards model with the Fine-Gray model, which accounts for competing risks, utilising simulation studies and real-world kidney transplant data from the United Network for Organ Sharing (UNOS). Additionally, we extend traditional methodologies with neural networks to assess the predictive gain associated with more flexible models while maintaining the same modeling assumptions. Our contributions include a detailed performance assessment between traditional and competing risks models, measuring predictive gains associated with neural networks, and developing a Python implementation for these models and associated evaluation metrics. Our findings demonstrate the importance of accounting for competing risks to improve risk estimation. These insights have substantial implications for improving patient prioritisation and transplantation management practices.
Recommended citation: Jeanselme, V., Defor, E., Bandyopadhyay, D., and Gupta, G. From Cox to Neural Networks: Flexible Modeling Improves Modeling of Post-Kidney Transplant Survival.
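To illustrate the baseline being compared above: a cause-specific Cox model for graft failure treats competing deaths as censoring, the practice the paper argues biases risk estimates relative to a competing-risks (Fine-Gray) fit. The data below are synthetic, and lifelines is used only as a standard Cox implementation in Python, not as the paper's code.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(8)
n = 500
df = pd.DataFrame({
    "age": rng.normal(50, 10, n),
    "time": rng.exponential(5, n),
    "event": rng.integers(0, 3, n),  # 0 censored, 1 graft failure, 2 death
})

# Cause-specific Cox for graft failure: competing deaths become "censored".
cs = df.assign(graft_failure=(df["event"] == 1).astype(int))
cph = CoxPHFitter().fit(cs[["age", "time", "graft_failure"]],
                        duration_col="time", event_col="graft_failure")
cph.print_summary()
```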