Probabilistic Causal Analysis of Social Influence

Authors: Francesco Bonchi, Francesco Gullo, Bud Mishra, Daniele Ramazzotti

Abstract: Mastering the dynamics of social influence requires separating, in a database of information propagation traces, the genuine causal processes from temporal correlation, i.e., homophily and other spurious causes. However, most studies characterizing social influence and, more generally, most data-science analyses focus on correlations, statistical independence, or conditional independence. Only recently has there been a resurgence of interest in "causal data science", e.g., grounded on causality theories. In this paper we adopt a principled causal approach to the analysis of social influence from information-propagation data, rooted in the theory of probabilistic causation. Our approach consists of two phases. In the first one, in order to avoid the pitfalls of misinterpreting causation when the data spans a mixture of several subtypes ("Simpson's paradox"), we partition the set of propagation traces into groups, in such a way that each group is as little contradictory as possible in terms of the hierarchical structure of information propagation. To achieve this goal, we borrow the notion of "agony" and define the Agony-bounded Partitioning problem, which we prove to be hard, and for which we develop two efficient algorithms with approximation guarantees. In the second phase, for each group from the first phase, we apply a constrained MLE approach to ultimately learn a minimal causal topology. Experiments on synthetic data show that our method is able to retrieve the genuine causal arcs w.r.t. a ground-truth generative model. Experiments on real data show that, by focusing only on the extracted causal structures instead of the whole social graph, the effectiveness of predicting influence spread is significantly improved.

Submitted 29 August, 2018; v1 submitted 6 August, 2018; originally announced August 2018.

Journal ref: CIKM 18, October 22-26, 2018, Torino, Italy

arXiv:1808.02017 [pdf]

doi 10.1371/journal.pone.0212439

Withholding or withdrawing invasive interventions may not accelerate time to death among dying ICU patients

Authors: Daniele Ramazzotti, Peter Clardy, Leo Anthony Celi, David J. Stone, Robert S. Rudin

Abstract: We considered observational data available from the MIMIC-III open-access ICU database, collected within a study period from 2002 to 2011. If a patient had multiple admissions to the ICU during the 30 days before death, only the first stay was analyzed, leading to a final set of 6,436 unique ICU admissions during the study period. We tested two hypotheses: (i) administration of invasive intervention during the ICU stay immediately preceding end-of-life would decrease over the study time period and (ii) time-to-death from ICU admission would also decrease, due to the decrease in invasive intervention administration. To investigate the latter hypothesis, we performed a subgroup analysis by considering patients with lowest and highest severity. To do so, we stratified the patients based on their SAPS I scores, and we considered patients within the first and the third tertiles of the score. We then assessed differences in trends within these groups between years 2002-05 vs. 2008-11. Comparing the period 2002-2005 vs. 2008-2011, we found a reduction in endotracheal ventilation among patients who died within 30 days of ICU admission (120.8 vs. 68.5 hours for the lowest severity patients, p<0.001; 47.7 vs. 46.0 hours for the highest severity patients, p=0.004). This is explained in part by an increase in the use of non-invasive ventilation. Comparing the period 2002-2005 vs. 2008-2011, we found a reduction in the use of vasopressors and inotropes among patients with the lowest severity who died within 30 days of ICU admission (41.8 vs. 36.2 hours, p<0.001) but not among those with the highest severity. Despite a reduction in the use of invasive interventions, we did not find a reduction in the time to death between 2002-2005 vs. 2008-2011 (7.8 days vs. 8.2 days for the lowest severity patients, p=0.32; 2.1 days vs. 2.0 days for the highest severity patients, p=0.74).

Submitted 29 January, 2019; v1 submitted 4 August, 2018; originally announced August 2018.

arXiv:1808.01345 [pdf, other]

Investigating the performance of multi-objective optimization when learning Bayesian Networks

Authors: Paolo Cazzaniga, Marco S. Nobile, Daniele Ramazzotti

Abstract: Bayesian Networks have been widely used in the last decades in many fields, to describe statistical dependencies among random variables. In general, learning the structure of such models is a problem with considerable theoretical interest that poses many challenges. On the one hand, it is a well-known NP-complete problem, made harder in practice by the huge search space of possible solutions. On the other hand, the phenomenon of I-equivalence, i.e., different graphical structures underpinning the same set of statistical dependencies, may lead to multimodal fitness landscapes further hindering maximum likelihood approaches to solve the task. In particular, we exploit the NSGA-II multi-objective optimization procedure in order to explicitly account for both the likelihood of a solution and the number of selected arcs, by setting these as the two objective functions of the method. The aim of this work is to investigate the behavior of NSGA-II and analyse the quality of its solutions. We thus thoroughly examined the optimization results obtained on a wide set of simulated data, both by considering the goodness of the inferred solutions in terms of the objective function values achieved, and by comparing the retrieved structures with the ground truth, i.e., the networks used to generate the target data. Our results show that NSGA-II can converge to solutions characterized by better likelihood and fewer arcs than classic approaches, although paradoxically characterized in many cases by a lower similarity with the target network.

Submitted 20 July, 2021; v1 submitted 3 August, 2018; originally announced August 2018.

arXiv:1709.01076 [pdf, other]

Learning mutational graphs of individual tumour evolution from single-cell and multi-region sequencing data

Authors: Daniele Ramazzotti, Alex Graudenzi, Luca De Sano, Marco Antoniotti, Giulio Caravagna

Abstract: Background. A large number of algorithms are being developed to reconstruct evolutionary models of individual tumours from genome sequencing data. Most methods can analyze multiple samples collected either through bulk multi-region sequencing experiments or the sequencing of individual cancer cells. However, the same method can rarely support both data types. Results. We introduce TRaIT, a computational framework to infer mutational graphs that model the accumulation of multiple types of somatic alterations driving tumour evolution. Compared to other tools, TRaIT supports multi-region and single-cell sequencing data within the same statistical framework, and delivers expressive models that capture many complex evolutionary phenomena. TRaIT improves accuracy, robustness to data-specific errors and computational complexity compared to competing methods. Conclusions. We show that the application of TRaIT to single-cell and multi-region cancer datasets can produce accurate and reliable models of single-tumour evolution, quantify the extent of intra-tumour heterogeneity and generate new testable experimental hypotheses.

Submitted 22 March, 2019; v1 submitted 4 September, 2017; originally announced September 2017.

arXiv:1706.02386 [pdf, other]

Learning the structure of Bayesian Networks via the bootstrap

Authors: Giulio Caravagna, Daniele Ramazzotti

Abstract: Learning the structure of dependencies among multiple random variables is a problem of considerable theoretical and practical interest. Within the context of Bayesian Networks, a practical and surprisingly successful solution to this learning problem is achieved by adopting score-function optimisation schemes, augmented with multiple restarts to avoid local optima. Yet, the conditions under which such strategies work well are poorly understood, and there are also some intrinsic limitations to learning the directionality of the interaction among the variables. Following an early intuition of Friedman and Koller, we propose to decouple the learning problem into two steps: first, we identify a partial ordering among input variables which constrains the structural learning problem, and then propose an effective bootstrap-based algorithm to simulate augmented data sets, and select the most important dependencies among the variables. By using several synthetic data sets, we show that our algorithm yields better recovery performance than the state of the art, increasing the chances of identifying a globally-optimal solution to the learning problem, and also solving well-known identifiability issues that affect the standard approach. We use our new algorithm to infer statistical dependencies between cancer driver somatic mutations detected by high-throughput genome sequencing data of multiple colorectal cancer patients. In this way, we also show how the proposed methods can provide new insights into cancer initiation and progression.
Code: https://github.com/caravagn/Bootstrap-based-Learning

Submitted 19 January, 2021; v1 submitted 7 June, 2017; originally announced June 2017.

arXiv:1705.03067 [pdf, other]

cyTRON and cyTRON/JS: two Cytoscape-based applications for the inference of cancer evolution models

Authors: Lucrezia Patruno, Edoardo Galimberti, Daniele Ramazzotti, Giulio Caravagna, Luca De Sano, Marco Antoniotti, Alex Graudenzi

Abstract: The increasing availability of sequencing data of cancer samples is fueling the development of algorithmic strategies to investigate tumor heterogeneity and infer reliable models of cancer evolution. Here we build upon previous work on cancer progression inference from genomic alteration data to deliver two distinct Cytoscape-based applications, which allow users to produce, visualize and manipulate cancer evolution models, also by interacting with public genomic and proteomics databases. In particular, we introduce cyTRON, a stand-alone Cytoscape app, and cyTRON/JS, a web application which employs the functionalities of Cytoscape/JS. cyTRON was developed in Java; the code is available at https://github.com/BIMIB-DISCo/cyTRON and on the Cytoscape App Store http://apps.cytoscape.org/apps/cytron. cyTRON/JS was developed in JavaScript and R; the source code of the tool is available at https://github.com/BIMIB-DISCo/cyTRON-js and the tool is accessible from https://bimib.disco.unimib.it/cytronjs/welcome.

Submitted 20 July, 2019; v1 submitted 8 May, 2017; originally announced May 2017.

arXiv:1704.08676 [pdf, other]

Learning the structure of Bayesian Networks: A quantitative assessment of the effect of different algorithmic schemes

Authors: Stefano Beretta, Mauro Castelli, Ivo Goncalves, Roberto Henriques, Daniele Ramazzotti

Abstract: One of the most challenging tasks when adopting Bayesian Networks (BNs) is learning their structure from data. This task is complicated by the huge search space of possible solutions, and by the fact that the problem is NP-hard. Hence, full enumeration of all the possible solutions is not always feasible and approximations are often required. However, to the best of our knowledge, a quantitative analysis of the performance and characteristics of the different heuristics to solve this problem has never been done before. For this reason, in this work, we provide a detailed comparison of many different state-of-the-art methods for structural learning on simulated data, considering BNs with both discrete and continuous variables, and with different rates of noise in the data. In particular, we investigate the performance of different widespread scores and algorithmic approaches proposed for the inference and the statistical pitfalls within them.

Submitted 3 August, 2018; v1 submitted 27 April, 2017; originally announced April 2017.

arXiv:1703.07844 [pdf, other]

doi 10.1002/pmic.201700232

SIMLR: A Tool for Large-Scale Genomic Analyses by Multi-Kernel Learning

Authors: Bo Wang, Daniele Ramazzotti, Luca De Sano, Junjie Zhu, Emma Pierson, Serafim Batzoglou

Abstract: We here present SIMLR (Single-cell Interpretation via Multi-kernel LeaRning), an open-source tool that implements a novel framework to learn a sample-to-sample similarity measure from expression data observed for heterogeneous samples. SIMLR can be effectively used to perform tasks such as dimension reduction, clustering, and visualization of heterogeneous populations of samples. SIMLR was benchmarked against state-of-the-art methods for these three tasks on several public datasets, showing it to be scalable and capable of greatly improving clustering performance, as well as providing valuable insights by making the data more interpretable via a better visualization. Availability and Implementation: SIMLR is available on GitHub in both R and MATLAB implementations. Furthermore, it is also available as an R package on http://bioconductor.org.

Submitted 18 January, 2018; v1 submitted 21 March, 2017; originally announced March 2017.

arXiv:1703.03076 [pdf, other]

doi 10.1016/j.jocs.2018.04.003

Causal Data Science for Financial Stress Testing

Authors: Gelin Gao, Bud Mishra, Daniele Ramazzotti

Abstract: The most recent financial upheavals have cast doubt on the adequacy of some of the conventional quantitative risk management strategies, such as VaR (Value at Risk), in many common situations. Consequently, there has been an increasing need for verisimilar financial stress testing, namely simulating and analyzing financial portfolios in extreme, albeit rare, scenarios. Unlike conventional risk management, which exploits statistical correlations among financial instruments, here we focus our analysis on the notion of probabilistic causation, which is embodied by Suppes-Bayes Causal Networks (SBCNs); SBCNs are probabilistic graphical models that have many attractive features in terms of more accurate causal analysis for generating financial stress scenarios. In this paper, we present a novel approach for conducting stress testing of financial portfolios based on SBCNs in combination with classical machine learning classification tools. The resulting method is shown to be capable of correctly discovering the causal relationships among financial factors that affect the portfolios and thus of simulating stress testing scenarios with a higher accuracy and lower computational complexity than conventional Monte Carlo Simulations.

Submitted 14 April, 2018; v1 submitted 8 March, 2017; originally announced March 2017.

arXiv:1703.03074 [pdf, other]

Efficient computational strategies to learn the structure of probabilistic graphical models of cumulative phenomena

Authors: Daniele Ramazzotti, Marco S. Nobile, Marco Antoniotti, Alex Graudenzi

Abstract: Structural learning of Bayesian Networks (BNs) is an NP-hard problem, which is further complicated by many theoretical issues, such as the I-equivalence among different structures. In this work, we focus on a specific subclass of BNs, named Suppes-Bayes Causal Networks (SBCNs), which include specific structural constraints based on Suppes' probabilistic causation to efficiently model cumulative phenomena. Here we compare the performance, via extensive simulations, of various state-of-the-art search strategies, such as local search techniques and Genetic Algorithms, as well as of distinct regularization methods. The assessment is performed on a large number of simulated datasets from topologies with distinct levels of complexity, various sample sizes and different rates of errors in the data. Among the main results, we show that the introduction of Suppes' constraints dramatically improves the inference accuracy, by reducing the solution space and providing a temporal ordering on the variables. We also report on trade-offs among different search techniques that can be efficiently employed in distinct experimental settings. This manuscript is an extended version of the paper "Structural Learning of Probabilistic Graphical Models of Cumulative Phenomena" presented at the 2018 International Conference on Computational Science.

Submitted 23 October, 2018; v1 submitted 8 March, 2017; originally announced March 2017.

arXiv:1703.03041 [pdf, other]

doi 10.5220/0006064102170224

Combining Bayesian Approaches and Evolutionary Techniques for the Inference of Breast Cancer Networks

Authors: Stefano Beretta, Mauro Castelli, Ivo Goncalves, Ivan Merelli, Daniele Ramazzotti

Abstract: Gene and protein networks are very important to model complex large-scale systems in molecular biology. Inferring or reverse-engineering such networks can be defined as the process of identifying gene/protein interactions from experimental data through computational analysis. However, this task is typically complicated by the enormously large scale of the unknowns in a rather small sample size. Furthermore, when the goal is to study causal relationships within the network, tools capable of overcoming the limitations of correlation networks are required. In this work, we make use of Bayesian Graphical Models to attack this problem and, specifically, we perform a comparative study of different state-of-the-art heuristics, analyzing their performance in inferring the structure of the Bayesian Network from breast cancer data.

Submitted 8 March, 2017; originally announced March 2017.

arXiv:1703.03038 [pdf, other]

doi 10.1109/CIBCB.2016.7758109

Parallel Implementation of Efficient Search Schemes for the Inference of Cancer Progression Models

Authors: Daniele Ramazzotti, Marco S. Nobile, Paolo Cazzaniga, Giancarlo Mauri, Marco Antoniotti

Abstract: The emergence and development of cancer is a consequence of the accumulation over time of genomic mutations involving a specific set of genes, which provides the cancer clones with a functional selective advantage. In this work, we model the order of accumulation of such mutations during the progression, which eventually leads to the disease, by means of probabilistic graphical models, i.e., Bayesian Networks (BNs). We investigate how to perform the task of learning the structure of such BNs, according to experimental evidence, adopting a global optimization meta-heuristic. In particular, in this work we rely on Genetic Algorithms, and to strongly reduce the execution time of the inference -- which can also involve multiple repetitions to collect statistically significant assessments of the data -- we distribute the calculations using both multi-threading and a multi-node architecture. The results show that our approach is characterized by good accuracy and specificity; we also demonstrate its feasibility, thanks to an 84x reduction of the overall execution time with respect to a traditional sequential implementation.

Submitted 8 March, 2017; originally announced March 2017.

arXiv:1602.07857 [pdf, other]

doi 10.1177/1176934318785167

Modeling cumulative biological phenomena with Suppes-Bayes Causal Networks

Authors: Daniele Ramazzotti, Alex Graudenzi, Giulio Caravagna, Marco Antoniotti

Abstract: Several diseases related to cell proliferation are characterized by the accumulation of somatic DNA changes, with respect to wildtype conditions. Cancer and HIV are two common examples of such diseases, where the mutational load in the cancerous/viral population increases over time. In these cases, selective pressures are often observed along with competition, cooperation and parasitism among distinct cellular clones. Recently, we presented a mathematical framework to model these phenomena, based on a combination of Bayesian inference and Suppes' theory of probabilistic causation, depicted in graphical structures dubbed Suppes-Bayes Causal Networks (SBCNs). SBCNs are generative probabilistic graphical models that recapitulate the potential ordering of accumulation of such DNA changes during the progression of the disease. Such models can be inferred from data by exploiting likelihood-based model-selection strategies with regularization. In this paper we discuss the theoretical foundations of our approach and we investigate in depth the influence on the model-selection task of: (i) the poset based on Suppes' theory and (ii) different regularization strategies. Furthermore, we provide an example of application of our framework to HIV genetic data, highlighting the valuable insights provided by the inferred model.

Submitted 4 July, 2018; v1 submitted 25 February, 2016; originally announced February 2016.

arXiv:1602.07614 [pdf]

A Model of Selective Advantage for the Efficient Inference of Cancer Clonal Evolution

Authors: Daniele Ramazzotti

Abstract: Recently, there has been a resurgence of interest in rigorous algorithms for the inference of cancer progression from genomic data. The motivations are manifold: (i) growing NGS and single cell data from cancer patients, (ii) need for novel Data Science and Machine Learning algorithms to infer models of cancer progression, and (iii) a desire to understand the temporal and heterogeneous structure of tumors to tame their progression by efficacious therapeutic intervention. This thesis presents a multi-disciplinary effort to model tumor progression involving successive accumulation of genetic alterations, each resulting population manifesting itself in a cancer phenotype. The framework presented in this work, along with algorithms derived from it, represents a novel approach for inferring cancer progression, whose accuracy and convergence rates surpass the existing techniques. The approach derives its power from several fields including algorithms in machine learning, theory of causality and cancer biology. Furthermore, a modular pipeline to extract ensemble-level progression models from sequenced cancer genomes is proposed. The pipeline combines state-of-the-art techniques for sample stratification, driver selection, identification of fitness-equivalent exclusive alterations and progression model inference. Furthermore, the results are validated by synthetic data with realistic generative models, and empirically interpreted in the context of real cancer datasets; in the latter case, biologically significant conclusions are also highlighted. Specifically, it demonstrates the pipeline's ability to reproduce much of the knowledge on colorectal cancer, as well as to suggest novel hypotheses. Lastly, it also proves that the proposed framework can be applied to reconstruct the evolutionary history of cancer clones in single patients, as illustrated by an example from clear cell renal carcinomas.

Submitted 15 February, 2016; originally announced February 2016.

Comments: Doctoral thesis, University of Milan

arXiv:1510.00552 [pdf, other]

doi 10.1007/s41060-016-0040-z

Exposing the Probabilistic Causal Structure of Discrimination

Authors: Francesco Bonchi, Sara Hajian, Bud Mishra, Daniele Ramazzotti

Abstract: Discrimination discovery from data is an important task aiming at identifying patterns of illegal and unethical discriminatory activities against protected-by-law groups, e.g., ethnic minorities. While any legally-valid proof of discrimination requires evidence of causality, the state-of-the-art methods are essentially correlation-based, albeit, as it is well known, correlation does not imply causation. In this paper we take a principled causal approach to the data mining problem of discrimination detection in databases. Following Suppes' probabilistic causation theory, we define a method to extract, from a dataset of historical decision records, the causal structures existing among the attributes in the data. The result is a type of constrained Bayesian network, which we dub Suppes-Bayes Causal Network (SBCN). Next, we develop a toolkit of methods based on random walks on top of the SBCN, addressing different anti-discrimination legal concepts, such as direct and indirect discrimination, group and individual discrimination, genuine requirement, and favoritism. Our experiments on real-world datasets confirm the inferential power of our approach in all these different tasks.

Submitted 8 March, 2017; v1 submitted 2 October, 2015; originally announced October 2015.

arXiv:1408.6032 [pdf, ps, other]

PMCE: efficient inference of expressive models of cancer evolution with high prognostic power

Authors: Fabrizio Angaroni, Kevin Chen, Chiara Damiani, Giulio Caravagna, Alex Graudenzi, Daniele Ramazzotti

Abstract: Motivation: Driver (epi)genomic alterations underlie the positive selection of cancer subpopulations, which promotes drug resistance and relapse. Even though substantial heterogeneity is witnessed in most cancer types, mutation accumulation patterns can be regularly found and can be exploited to reconstruct predictive models of cancer evolution. Yet, available methods cannot infer logical formulas connecting events to represent alternative evolutionary routes or convergent evolution. Results: We introduce PMCE, an expressive framework that leverages mutational profiles from cross-sectional sequencing data to infer probabilistic graphical models of cancer evolution including arbitrary logical formulas, and which outperforms the state-of-the-art in terms of accuracy and robustness to noise, on simulations. The application of PMCE to 7866 samples from the TCGA database allows us to identify a highly significant correlation between the predicted evolutionary paths and the overall survival in 7 tumor types, proving that our approach can effectively stratify cancer patients in reliable risk groups. Availability: PMCE is freely available at https://github.com/BIMIB-DISCo/PMCE, in addition to the code to replicate all the analyses presented in the manuscript. Contacts: [email protected], [email protected].

Submitted 1 October, 2021; v1 submitted 26 August, 2014; originally announced August 2014.

arXiv:1309.7692 [pdf, other]

doi 10.4204/EPTCS.130.11

A Model of Colonic Crypts using SBML Spatial

Authors: Daniele Ramazzotti, Carlo Maj, Marco Antoniotti

Abstract: The Spatial Processes package enables an explicit definition of a spatial environment on top of the normal dynamic modeling SBML capabilities. The possibility of an explicit representation of spatial dynamics increases the representation power of SBML. In this work we used those new SBML features to define an extensive model of colonic crypts composed of the main cellular types (from stem cells to fully differentiated cells), alongside their spatial dynamics.

Submitted 29 September, 2013; originally announced September 2013.

Comments: In Proceedings Wivace 2013, arXiv:1309.7122

Journal ref: EPTCS 130, 2013, pp. 74-78

Showing 1–17 of 17 results for author: Ramazzotti, D