Showing 1–2 of 2 results for author: Dockès, J

Search v0.5.6 released 2020-02-24

arXiv:2107.09947 [pdf, other]

cs.LG math.ST q-bio.QM

Preventing dataset shift from breaking machine-learning biomarkers

Authors: Jéroôme Dockès, Gaël Varoquaux, Jean-Baptiste Poline

Abstract: Machine learning brings the hope of finding new biomarkers extracted from cohorts with rich biomedical measurements. A good biomarker is one that gives reliable detection of the corresponding condition. However, biomarkers are often extracted from a cohort that differs from the target population. Such a mismatch, known as a dataset shift, can undermine the application of the biomarker to new indiv… ▽ More Machine learning brings the hope of finding new biomarkers extracted from cohorts with rich biomedical measurements. A good biomarker is one that gives reliable detection of the corresponding condition. However, biomarkers are often extracted from a cohort that differs from the target population. Such a mismatch, known as a dataset shift, can undermine the application of the biomarker to new individuals. Dataset shifts are frequent in biomedical research, e.g. because of recruitment biases. When a dataset shift occurs, standard machine-learning techniques do not suffice to extract and validate biomarkers. This article provides an overview of when and how dataset shifts breaks machine-learning extracted biomarkers, as well as detection and correction strategies. △ Less

Submitted 21 July, 2021; originally announced July 2021.

Comments: GigaScience, BioMed Central, In press
arXiv:1806.01139 [pdf, other]

stat.ME cs.IR cs.LG

Text to brain: predicting the spatial distribution of neuroimaging observations from text reports

Authors: Jérôme Dockès, Demian Wassermann, Russell Poldrack, Fabian Suchanek, Bertrand Thirion, Gaël Varoquaux

Abstract: Despite the digital nature of magnetic resonance imaging, the resulting observations are most frequently reported and stored in text documents. There is a trove of information untapped in medical health records, case reports, and medical publications. In this paper, we propose to mine brain medical publications to learn the spatial distribution associated with anatomical terms. The problem is form… ▽ More Despite the digital nature of magnetic resonance imaging, the resulting observations are most frequently reported and stored in text documents. There is a trove of information untapped in medical health records, case reports, and medical publications. In this paper, we propose to mine brain medical publications to learn the spatial distribution associated with anatomical terms. The problem is formulated in terms of minimization of a risk on distributions which leads to a least-deviation cost function. An efficient algorithm in the dual then learns the mapping from documents to brain structures. Empirical results using coordinates extracted from the brain-imaging literature show that i) models must adapt to semantic variation in the terms used to describe a given anatomical structure, ii) voxel-wise parameterization leads to higher likelihood of locations reported in unseen documents, iii) least-deviation cost outperforms least-square. As a proof of concept for our method, we use our model of spatial distributions to predict the distribution of specific neurological conditions from text-only reports. △ Less

Submitted 28 June, 2018; v1 submitted 4 June, 2018; originally announced June 2018.

Journal ref: MICCAI 2018 - 21st International Conference on Medical Image Computing and Computer Assisted Intervention, Sep 2018, Granada, Spain. pp.1-18, 2018

Search v0.5.6 released 2020-02-24