Search | arXiv e-print repository

arXiv:2503.10448

Estimating relapse time distribution from longitudinal biomarker trajectories using iterative regression and continuous time Markov processes

Authors: Alice Cleynen, Benoîte de Saporta, Amélie Vernay

Abstract: Biomarker measurements obtained by blood sampling are often used as a non-invasive means of monitoring tumour progression in cancer patients. Diseases evolve dynamically over time, and studying longitudinal observations of specific biomarkers can help to understand patients' response to treatment and predict disease progression. We propose a novel iterative regression-based method to estimate changes in patients' status within a cohort that includes censored patients, and illustrate it on clinical data from myeloma cases. We formulate the relapse time estimation problem in the framework of Piecewise Deterministic Markov processes (PDMP), where the Euclidean component is a surrogate biomarker for patient state. This approach enables continuous-time estimation of the status-change dates, which in turn allows for accurate inference of the relapse time distribution. A key challenge lies in the partial observability of the process, a complexity that has rarely been addressed in previous studies. We evaluate the performance of our procedure through a simulation study and compare it with different approaches. This work is a proof of concept on biomarker trajectories with simple behaviour, but our method can easily be extended to more complex dynamics.

Submitted 13 March, 2025; originally announced March 2025.

arXiv:2501.04120

Bridging Impulse Control of Piecewise Deterministic Markov Processes and Markov Decision Processes: Frameworks, Extensions, and Open Challenges

Authors: Alice Cleynen, Benoîte de Saporta, Orlane Rossini, Régis Sabbadin, Amélie Vernay

Abstract: Control theory plays a pivotal role in understanding and optimizing the behavior of complex dynamical systems across various scientific and engineering disciplines. Two key frameworks that have emerged for modeling and solving control problems in stochastic systems are piecewise deterministic Markov processes (PDMPs) and Markov decision processes (MDPs). Each framework has its unique strengths, and their intersection offers promising opportunities for tackling a broad class of problems, particularly in the context of impulse controls and decision-making in complex systems. The relationship between PDMPs and MDPs is a natural subject of exploration, as embedding impulse control problems for PDMPs into the MDP framework could open new avenues for their analysis and resolution. Specifically, this integration would allow leveraging the computational and theoretical tools developed for MDPs to address the challenges inherent in PDMPs. On the other hand, PDMPs can offer a versatile and simple paradigm to model continuous-time problems that are often described as discrete-time MDPs parametrized by complex transition kernels. This transformation has the potential to bridge the gap between the two frameworks, enabling solutions to previously intractable problems and expanding the scope of both fields. This paper presents a comprehensive review of two research domains, illustrated through a recurring medical example. The example is revisited and progressively formalized within the framework of the various concepts and objects introduced.

Submitted 14 April, 2025; v1 submitted 7 January, 2025; originally announced January 2025.

arXiv:1905.11779

Evaluation of mineralogy per geological layers by Approximate Bayesian Computation

Authors: Vianney Bruned, Alice Cleynen, André Mas, Sylvain Wlodarczyck

Abstract: We propose a new methodology to perform mineralogic inversion from wellbore logs based on a Bayesian linear regression model. Our method essentially relies on three steps. The first step makes use of Approximate Bayesian Computation (ABC) and selects from the Bayesian generator a set of candidates-volumes corresponding closely to the wellbore data responses. The second step gathers these candidates through a density-based clustering algorithm. A mineral scenario is assigned to each cluster through direct mineralogical inversion, and we provide a confidence estimate for each lithological hypothesis. The advantage of this approach is to explore all possible mineralogy hypotheses that match the wellbore data. This pipeline is tested on both synthetic and real datasets.

Submitted 28 May, 2019; originally announced May 2019.

arXiv:1307.3146

Comparing change-point locations of independent profiles with application to gene annotation

Authors: Alice Cleynen, Stéphane Robin

Abstract: We are interested in the comparison of transcript boundaries from cells which originated in different environments. The goal is to assess whether this phenomenon, called differential splicing, is used to modify the transcription of the genome in response to stress factors. We address this question by comparing the change-point locations in the individual segmentation of each profile, which correspond to the RNA-Seq data for a gene in one growth condition. This requires the ability to evaluate the uncertainty of the change-point positions, and the work of Rigaill et al. (2011) provides an appropriate framework in such a case. Building on their approach, we propose two methods for the comparison of change-points, and illustrate our results on a dataset from the yeast species. We show that the UTR boundaries are subject to differential splicing, while the intron boundaries are conserved in all profiles. Our approach is implemented in an R package called EBS which is available on CRAN.

Submitted 11 July, 2013; originally announced July 2013.

MSC Class: 62F15; 62F25; 62P10; 92D20

arXiv:1306.4657

Finite state space non parametric Hidden Markov Models are in general identifiable

Authors: Elisabeth Gassiat, Alice Cleynen, Stéphane Robin

Abstract: In this paper, we prove that finite state space non parametric hidden Markov models are identifiable as soon as the transition matrix of the latent Markov chain has full rank and the emission probability distributions are linearly independent. We then propose several non parametric likelihood based estimation methods, which we apply to models used in applications. We finally show on examples that the use of non parametric modeling and estimation may improve the classification performance.

Submitted 19 June, 2013; originally announced June 2013.

arXiv:1301.2534

Segmentation of the Poisson and negative binomial rate models: a penalized estimator

Authors: Alice Cleynen, Emilie Lebarbier

Abstract: We consider the segmentation problem of Poisson and negative binomial (i.e. overdispersed Poisson) rate distributions. In segmentation, an important issue remains the choice of the number of segments. To this end, we propose a penalized log-likelihood estimator where the penalty function is constructed in a non-asymptotic context following the works of L. Birgé and P. Massart. The resulting estimator is proved to satisfy an oracle inequality. The performance of our criterion is assessed using simulated and real datasets in the RNA-seq data analysis context.

Submitted 17 March, 2013; v1 submitted 11 January, 2013; originally announced January 2013.

MSC Class: 62G05; 62G07; 62P10

arXiv:1211.3210

Fast estimation of the ICL criterion for change-point detection problems with applications to Next-Generation Sequencing data

Authors: Alice Cleynen, The Minh Luong, Guillem Rigaill, Gregory Nuel

Abstract: In this paper, we consider the Integrated Completed Likelihood (ICL) as a useful criterion for estimating the number of changes in the underlying distribution of data in problems where detecting the precise location of these changes is the main goal. The exact computation of the ICL requires O(Kn^2) operations (with K the number of segments and n the number of data-points) which is prohibitive in many practical situations with large sequences of data. We describe a framework to estimate the ICL with O(Kn) complexity. Our approach is general in the sense that it can accommodate any given model distribution. We check the run-time and validity of our approach on simulated data and demonstrate its good performance when analyzing real Next-Generation Sequencing (NGS) data using a negative binomial model.

Submitted 1 July, 2013; v1 submitted 14 November, 2012; originally announced November 2012.

Comments: 15 pages, 8 figures

arXiv:1204.5564

Segmentor3IsBack: an R package for the fast and exact segmentation of Seq-data

Authors: Alice Cleynen, Michel Koskas, Emilie Lebarbier, Guillem Rigaill, Stephane Robin

Abstract: Genome annotation is an important issue in biology which has long been addressed with gene prediction methods and manual experiments requiring biological expertise. The expanding Next Generation Sequencing technologies and their enhanced precision allow a new approach to the domain: the segmentation of RNA-Seq data to determine gene boundaries. Because of its almost linear complexity, we propose to use the Pruned Dynamic Programming Algorithm, whose performance has been acknowledged for CGH arrays, for Seq-experiment outputs. This requires the adaptation of the algorithm to the negative binomial distribution with which we model the data. We show that if the dispersion in the signal is known, the PDP algorithm can be used and we provide an estimator for this dispersion. We then propose to estimate the number of segments, which can be associated with coding or non-coding regions of the genome, using an oracle penalty. We illustrate the results of our approach on a real data-set and show its good performance. Our algorithm is available as an R package on the CRAN repository.

Submitted 1 July, 2013; v1 submitted 25 April, 2012; originally announced April 2012.

Showing 1–8 of 8 results for author: Cleynen, A