-
Genomics Data Analysis via Spectral Shape and Topology
Authors:
Erik J. Amézquita,
Farzana Nasrin,
Kathleen M. Storey,
Masato Yoshizawa
Abstract:
Mapper, a topological algorithm, is frequently used as an exploratory tool to build a graphical representation of data. This representation can help to gain a better understanding of the intrinsic shape of high-dimensional genomic data and to retain information that may be lost using standard dimension-reduction algorithms. We propose a novel workflow to process and analyze RNA-seq data from tumor and healthy subjects, integrating Mapper and differential gene expression. Specifically, we show that a Gaussian mixture approximation method can be used to produce graphical structures that successfully separate tumor and healthy subjects and that produce two subgroups of tumor subjects. A further analysis using DESeq2, a popular tool for the detection of differentially expressed genes, shows that these two subgroups of tumor cells bear two distinct gene regulations, suggesting two discrete paths for forming lung cancer, which could not be highlighted by other popular clustering methods, including t-SNE. Although Mapper shows promise in analyzing high-dimensional data, tools for statistically analyzing Mapper graphical structures are limited in the existing literature. In this paper, we develop a scoring method using heat kernel signatures that provides an empirical setting for statistical inferences such as hypothesis testing, sensitivity analysis, and correlation analysis.
Submitted 2 November, 2022;
originally announced November 2022.
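The abstract above mentions heat kernel signatures on Mapper graphs without giving details. As a minimal sketch only, and not the paper's scoring method, the following computes the standard heat kernel signature of each node of a small graph, HKS(v, t) = [exp(-tL)]_vv with L = D - A the combinatorial graph Laplacian. The function names, the choice of Laplacian, and the pure-Python matrix exponential (Taylor series with scaling and squaring) are all assumptions made for illustration.

```python
import math


def _mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def heat_kernel_signature(adjacency, t, terms=20):
    """Return [HKS(v, t) for each node v], i.e. the diagonal of exp(-t L).

    `adjacency` is a symmetric weight matrix (list of rows) and t >= 0 is
    the diffusion time. L = D - A is assumed here; a normalized Laplacian
    would be an equally reasonable choice.
    """
    n = len(adjacency)
    lap = [[(sum(adjacency[i]) if i == j else 0.0) - adjacency[i][j]
            for j in range(n)] for i in range(n)]

    # Scaling and squaring: shrink t*L until its norm is at most 0.5,
    # so that a short Taylor series is accurate.
    norm = t * max(sum(abs(x) for x in row) for row in lap)
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    step = t / 2 ** squarings
    m = [[-step * x for x in row] for row in lap]

    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    kernel = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms + 1):
        # term becomes m^k / k!
        term = [[x / k for x in row] for row in _mat_mul(term, m)]
        kernel = [[a + b for a, b in zip(ra, rb)]
                  for ra, rb in zip(kernel, term)]

    # Undo the scaling: exp(-t L) = (exp(-step L)) ** (2 ** squarings).
    for _ in range(squarings):
        kernel = _mat_mul(kernel, kernel)
    return [kernel[v][v] for v in range(n)]


# Path graph on three nodes: 0 - 1 - 2
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
signature = heat_kernel_signature(path, 1.0)
```

On this path graph the two endpoints receive the same value and the middle node a different one, which is the kind of node-level descriptor that could feed a downstream score. For small t the signature reflects local structure; for large t on a connected graph every node tends to 1/n.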
-
Utilizing gradient approximations to optimize data selection protocols for tumor growth model calibration
Authors:
Allison L. Lewis,
Kathleen M. Storey,
Heyrim Cho,
Anna C. Zittle
Abstract:
The use of mathematical models to make predictions about tumor growth and response to treatment has become increasingly prevalent in the clinical setting. The level of complexity within these models ranges broadly, and the calibration of more complex models correspondingly requires more detailed clinical data. This raises questions about how much data should be collected and when, in order to minimize the total amount of data used and the time until a model can be calibrated accurately. To address these questions, we propose a Bayesian information-theoretic procedure, using a gradient-based score function to determine the optimal data collection times for model calibration. The novel score function introduced in this work eliminates the need for a weight parameter used in a previous study's score function, while still yielding accurate and efficient model calibration using even fewer scans on a sample set of synthetic data, simulating tumors of varying levels of radiosensitivity. We also conduct a robust analysis of the calibration accuracy and certainty, using both error and uncertainty metrics. Unlike the error analysis of the previous study, the inclusion of uncertainty analysis in this work, as a means for deciding when the algorithm can be terminated, provides a more realistic option for clinical decision-making, since it does not rely on data that will be collected later in time.
Submitted 25 December, 2021;
originally announced December 2021.
-
Bayesian information-theoretic calibration of patient-specific radiotherapy sensitivity parameters for informing effective scanning protocols in cancer
Authors:
Heyrim Cho,
Allison L. Lewis,
Kathleen M. Storey
Abstract:
With new advancements in technology, it is now possible to collect data for a variety of metrics describing tumor growth, including tumor volume, composition, and vascularity, among others. For any proposed model of tumor growth and treatment, we observe large variability among individual patients' parameter values, particularly those relating to treatment response; thus, using these various metrics for model calibration can help to infer such patient-specific parameters both accurately and early, so that treatment protocols can be adjusted mid-course for maximum efficacy. However, taking measurements can be costly and invasive, limiting clinicians to a sparse collection schedule. As such, the determination of optimal times and metrics for which to collect data in order to best inform proper treatment protocols could be of great assistance to clinicians. In this investigation, we employ a Bayesian information-theoretic calibration protocol for experimental design in order to identify the optimal times at which to collect data for informing treatment parameters. Within this procedure, data collection times are chosen sequentially to maximize the reduction in parameter uncertainty with each added measurement, ensuring that a budget of $n$ high-fidelity experimental measurements results in maximum information gain about the low-fidelity model parameter values. In addition to investigating the optimal temporal pattern for data collection, we also develop a framework for deciding which metrics should be utilized at each data collection point. We illustrate this framework with a variety of toy examples, each utilizing a radiotherapy treatment regimen. For each scenario, we analyze the dependence of the predictive power of the low-fidelity model upon the measurement budget.
Submitted 5 September, 2020;
originally announced September 2020.
-
Spread of premalignant mutant clones and cancer initiation in multilayered tissue
Authors:
Jasmine Foo,
Einar Bjarki Gunnarsson,
Kevin Leder,
Kathleen Storey
Abstract:
Over 80% of human cancers originate from the epithelium, which covers the outer and inner surfaces of organs and blood vessels. In stratified epithelium, the bottom layers are occupied by stem and stem-like cells that continually divide and replenish the upper layers. In this work, we study the spread of premalignant mutant clones and cancer initiation in stratified epithelium using the biased voter model on stacked two-dimensional lattices. Our main result is an estimate of the propagation speed of a premalignant mutant clone, which is asymptotically precise in the cancer-relevant weak-selection limit. We use our main result to study cancer initiation under a two-step mutational model of cancer, which includes computing the distributions of the time of cancer initiation and the size of the premalignant clone giving rise to cancer. Our work quantifies the effect of epithelial tissue thickness on the process of carcinogenesis, thereby contributing to an emerging understanding of the spatial evolutionary dynamics of cancer.
Submitted 26 March, 2022; v1 submitted 7 July, 2020;
originally announced July 2020.
-
Analyzing Collective Motion with Machine Learning and Topology
Authors:
Dhananjay Bhaskar,
Angelika Manhart,
Jesse Milzman,
John T. Nardini,
Kathleen Storey,
Chad M. Topaz,
Lori Ziegelmeier
Abstract:
We use topological data analysis and machine learning to study a seminal model of collective motion in biology [D'Orsogna et al., Phys. Rev. Lett. 96 (2006)]. This model describes agents interacting nonlinearly via attractive-repulsive social forces and gives rise to collective behaviors such as flocking and milling. To classify the emergent collective motion in a large library of numerical simulations and to recover model parameters from the simulation data, we apply machine learning techniques to two different types of input. First, we input time series of order parameters traditionally used in studies of collective motion. Second, we input measures based in topology that summarize the time-varying persistent homology of simulation data over multiple scales. This topological approach does not require prior knowledge of the expected patterns. For both unsupervised and supervised machine learning methods, the topological approach outperforms the one that is based on traditional order parameters.
Submitted 3 February, 2020; v1 submitted 23 August, 2019;
originally announced August 2019.
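The abstract above contrasts topological summaries with "order parameters traditionally used in studies of collective motion" but does not list them. As an illustrative sketch, two order parameters commonly used for such agent-based models are group polarization and normalized angular momentum; whether these exact definitions match the paper's is an assumption, and the function names are hypothetical.

```python
import math


def polarization(velocities):
    """|sum of v_i| / sum of |v_i|: near 1 for an aligned flock, near 0 for a mill."""
    sx = sum(vx for vx, _ in velocities)
    sy = sum(vy for _, vy in velocities)
    total_speed = sum(math.hypot(vx, vy) for vx, vy in velocities)
    return math.hypot(sx, sy) / total_speed if total_speed > 0 else 0.0


def angular_momentum(positions, velocities):
    """|sum of r_i x v_i| / sum of |r_i||v_i|, with r_i measured from the centroid.

    Near 1 when all agents rotate the same way around the group center.
    """
    n = len(positions)
    cx = sum(x for x, _ in positions) / n
    cy = sum(y for _, y in positions) / n
    cross_sum = 0.0
    norm_sum = 0.0
    for (x, y), (vx, vy) in zip(positions, velocities):
        rx, ry = x - cx, y - cy
        cross_sum += rx * vy - ry * vx  # z-component of the 2D cross product
        norm_sum += math.hypot(rx, ry) * math.hypot(vx, vy)
    return abs(cross_sum) / norm_sum if norm_sum > 0 else 0.0


# Four agents on the unit circle.
positions = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
flock_velocities = [(1.0, 0.0)] * 4                                  # all moving right
mill_velocities = [(0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (1.0, 0.0)]  # counterclockwise
```

Evaluating both functions at each time step of a simulation yields the kind of order-parameter time series the abstract describes as the first type of machine learning input. For the flock configuration the polarization is 1 and the angular momentum is 0; for the mill configuration the values are reversed.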