Search | arXiv e-print repository

Decision from Suboptimal Classifiers: Excess Risk Pre- and Post-Calibration

Authors: Alexandre Perez-Lebel, Gael Varoquaux, Sanmi Koyejo, Matthieu Doutreligne, Marine Le Morvan

Abstract: Probabilistic classifiers are central for making informed decisions under uncertainty. Based on the maximum expected utility principle, optimal decision rules can be derived using the posterior class probabilities and misclassification costs. Yet, in practice only learned approximations of the oracle posterior probabilities are available. In this work, we quantify the excess risk (a.k.a. regret) i… ▽ More Probabilistic classifiers are central for making informed decisions under uncertainty. Based on the maximum expected utility principle, optimal decision rules can be derived using the posterior class probabilities and misclassification costs. Yet, in practice only learned approximations of the oracle posterior probabilities are available. In this work, we quantify the excess risk (a.k.a. regret) incurred using approximate posterior probabilities in batch binary decision-making. We provide analytical expressions for miscalibration-induced regret ($R^{\mathrm{CL}}$), as well as tight and informative upper and lower bounds on the regret of calibrated classifiers ($R^{\mathrm{GL}}$). These expressions allow us to identify regimes where recalibration alone addresses most of the regret, and regimes where the regret is dominated by the grouping loss, which calls for post-training beyond recalibration. Crucially, both $R^{\mathrm{CL}}$ and $R^{\mathrm{GL}}$ can be estimated in practice using a calibration curve and a recent grouping loss estimator. On NLP experiments, we show that these quantities identify when the expected gain of more advanced post-training is worth the operational cost. Finally, we highlight the potential of multicalibration approaches as efficient alternatives to costlier fine-tuning approaches. △ Less

Submitted 23 March, 2025; originally announced March 2025.

Journal ref: Proceedings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS) 2025, Mai Khao, Thailand. PMLR: Volume 258

arXiv:2308.01605 [pdf, other]

Causal thinking for decision making on Electronic Health Records: why and how

Authors: Matthieu Doutreligne, Tristan Struja, Judith Abecassis, Claire Morgand, Leo Anthony Celi, Gaël Varoquaux

Abstract: Accurate predictions, as with machine learning, may not suffice to provide optimal healthcare for every patient. Indeed, prediction can be driven by shortcuts in the data, such as racial biases. Causal thinking is needed for data-driven decisions. Here, we give an introduction to the key elements, focusing on routinely-collected data, electronic health records (EHRs) and claims data. Using such da… ▽ More Accurate predictions, as with machine learning, may not suffice to provide optimal healthcare for every patient. Indeed, prediction can be driven by shortcuts in the data, such as racial biases. Causal thinking is needed for data-driven decisions. Here, we give an introduction to the key elements, focusing on routinely-collected data, electronic health records (EHRs) and claims data. Using such data to assess the value of an intervention requires care: temporal dependencies and existing practices easily confound the causal effect. We present a step-by-step framework to help build valid decision making from real-life patient records by emulating a randomized trial before individualizing decisions, eg with machine learning. Our framework highlights the most important pitfalls and considerations in analysing EHRs or claims data to draw causal conclusions. We illustrate the various choices in studying the effect of albumin on sepsis mortality in the Medical Information Mart for Intensive Care database (MIMIC-IV). We study the impact of various choices at every step, from feature extraction to causal-estimator selection. In a tutorial spirit, the code and the data are openly available. △ Less

Submitted 11 December, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

arXiv:2302.07074 [pdf, other]

Good practices for clinical data warehouse implementation: a case study in France

Authors: Matthieu Doutreligne, Adeline Degremont, Pierre-Alain Jachiet, Antoine Lamer, Xavier Tannier

Abstract: Real World Data (RWD) bears great promises to improve the quality of care. However, specific infrastructures and methodologies are required to derive robust knowledge and brings innovations to the patient. Drawing upon the national case study of the 32 French regional and university hospitals governance, we highlight key aspects of modern Clinical Data Warehouses (CDWs): governance, transparency,… ▽ More Real World Data (RWD) bears great promises to improve the quality of care. However, specific infrastructures and methodologies are required to derive robust knowledge and brings innovations to the patient. Drawing upon the national case study of the 32 French regional and university hospitals governance, we highlight key aspects of modern Clinical Data Warehouses (CDWs): governance, transparency, types of data, data reuse, technical tools, documentation and data quality control processes. Semi-structured interviews as well as a review of reported studies on French CDWs were conducted in a semi-structured manner from March to November 2022. Out of 32 regional and university hospitals in France, 14 have a CDW in production, 5 are experimenting, 5 have a prospective CDW project, 8 did not have any CDW project at the time of writing. The implementation of CDW in France dates from 2011 and accelerated in the late 2020. From this case study, we draw some general guidelines for CDWs. The actual orientation of CDWs towards research requires efforts in governance stabilization, standardization of data schema and development in data quality and data documentation. Particular attention must be paid to the sustainability of the warehouse teams and to the multi-level governance. The transparency of the studies and the tools of transformation of the data must improve to allow successful multi-centric data reuses as well as innovations in routine care. △ Less

Submitted 7 March, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

Comments: 16 pages

arXiv:2302.00370 [pdf, other]

How to select predictive models for causal inference?

Authors: Matthieu Doutreligne, Gaël Varoquaux

Abstract: As predictive models -- e.g., from machine learning -- give likely outcomes, they may be used to reason on the effect of an intervention, a causal-inference task. The increasing complexity of health data has opened the door to a plethora of models, but also the Pandora box of model selection: which of these models yield the most valid causal estimates? Here we highlight that classic machine-learni… ▽ More As predictive models -- e.g., from machine learning -- give likely outcomes, they may be used to reason on the effect of an intervention, a causal-inference task. The increasing complexity of health data has opened the door to a plethora of models, but also the Pandora box of model selection: which of these models yield the most valid causal estimates? Here we highlight that classic machine-learning model selection does not select the best outcome models for causal inference. Indeed, causal model selection should control both outcome errors for each individual, treated or not treated, whereas only one outcome is observed. Theoretically, simple risks used in machine learning do not control causal effects when treated and non-treated population differ too much. More elaborate risks build proxies of the causal error using ``nuisance'' re-weighting to compute it on the observed data. But does computing these nuisance adds noise to model selection? Drawing from an extensive empirical study, we outline a good causal model-selection procedure: using the so-called $R\text{-risk}$; using flexible estimators to compute the nuisance models on the train set; and splitting out 10\% of the data to compute risks. △ Less

Submitted 16 May, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

Comments: 35 pages

arXiv:1903.07879 [pdf]

Hybrid Approaches for our Participation to the n2c2 Challenge on Cohort Selection for Clinical Trials

Authors: Xavier Tannier, Nicolas Paris, Hugo Cisneros, Christel Daniel, Matthieu Doutreligne, Catherine Duclos, Nicolas Griffon, Claire Hassen-Khodja, Ivan Lerner, Adrien Parrot, Éric Sadou, Cyrina Saussol, Pascal Vaillant

Abstract: Objective: Natural language processing can help minimize human intervention in identifying patients meeting eligibility criteria for clinical trials, but there is still a long way to go to obtain a general and systematic approach that is useful for researchers. We describe two methods taking a step in this direction and present their results obtained during the n2c2 challenge on cohort selection f… ▽ More Objective: Natural language processing can help minimize human intervention in identifying patients meeting eligibility criteria for clinical trials, but there is still a long way to go to obtain a general and systematic approach that is useful for researchers. We describe two methods taking a step in this direction and present their results obtained during the n2c2 challenge on cohort selection for clinical trials. Materials and Methods: The first method is a weakly supervised method using an unlabeled corpus (MIMIC) to build a silver standard, by producing semi-automatically a small and very precise set of rules to detect some samples of positive and negative patients. This silver standard is then used to train a traditional supervised model. The second method is a terminology-based approach where a medical expert selects the appropriate concepts, and a procedure is defined to search the terms and check the structural or temporal constraints. Results: On the n2c2 dataset containing annotated data about 13 selection criteria on 288 patients, we obtained an overall F1-measure of 0.8969, which is the third best result out of 45 participant teams, with no statistically significant difference with the best-ranked team. Discussion: Both approaches obtained very encouraging results and apply to different types of criteria. The weakly supervised method requires explicit descriptions of positive and negative examples in some reports. The terminology-based method is very efficient when medical concepts carry most of the relevant information. Conclusion: It is unlikely that much more annotated data will be soon available for the task of identifying a wide range of patient phenotypes. One must focus on weakly or non-supervised learning methods using both structured and unstructured data and relying on a comprehensive representation of the patients. △ Less

Submitted 9 December, 2020; v1 submitted 19 March, 2019; originally announced March 2019.

Comments: 15 pages

Showing 1–5 of 5 results for author: Doutreligne, M