-
Statistical Analysis of Quantitative Cancer Imaging Data
Authors:
Shariq Mohammed,
Maria Masotti,
Nathaniel Osher,
Satwik Acharyya,
Veerabhadran Baladandayuthapani
Abstract:
Recent advances in the types and extent of medical imaging technologies have led to a proliferation of multimodal quantitative imaging data in cancer. Quantitative medical imaging data refer to numerical representations derived from medical imaging technologies, such as radiology and pathology imaging, that can be used to assess and quantify characteristics of diseases, especially cancer. The use of such data in both clinical and research settings enables precise quantification and analysis of tumor characteristics that can facilitate objective evaluation of disease progression, response to therapy, and prognosis. The scale and size of these imaging biomarkers are vast and present several analytical and computational challenges that range from high dimensionality to complex structural correlation patterns. In this review article, we summarize some state-of-the-art statistical methods developed for quantitative medical imaging data, ranging from topological, functional and shape data analyses to spatial process models. We delve into common imaging biomarkers with a focus on radiology and pathology imaging in cancer, address the analytical questions and challenges they present, and highlight the innovative statistical and machine learning models that have been developed to answer relevant scientific and clinical questions. We also outline some emerging and open problems in this area for future exploration.
Submitted 13 September, 2024;
originally announced September 2024.
-
Bayesian Hierarchical Modeling on Covariance Valued Data
Authors:
Satwik Acharyya,
Zhengwu Zhang,
Anirban Bhattacharya,
Debdeep Pati
Abstract:
Analysis of structural and functional connectivity (FC) of human brains is of pivotal importance for diagnosis of cognitive ability. The Human Connectome Project (HCP) provides an excellent source of neural data across different regions of interest (ROIs) of the living human brain. Individual-specific data were available from an existing analysis (Dai et al., 2017) in the form of time-varying covariance matrices representing the brain activity as the subjects perform a specific task. As a preliminary objective of studying the heterogeneity of brain connectomics across the population, we develop a probabilistic model for a sample of covariance matrices using a scaled Wishart distribution. We stress here that our data units are available in the form of covariance matrices, and we use the Wishart distribution to create our likelihood function rather than its more common usage as a prior on covariance matrices. Based on empirical explorations suggesting that the data matrices have low effective rank, we further model the center of the Wishart distribution using an orthogonal factor-model-type decomposition. We encourage shrinkage towards a low-rank structure through a novel shrinkage prior and discuss strategies to sample from the posterior distribution using a combination of Gibbs and slice sampling. We extend our modeling framework to a dynamic setting to detect change points. The efficacy of the approach is explored in various simulation settings and exemplified on several case studies including our motivating HCP data.
Submitted 9 July, 2020; v1 submitted 1 November, 2018;
originally announced November 2018.
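For reference, the likelihood described in the abstract above can be written out under one common parametrization. This is a sketch, not the paper's exact model: the scaling $S_i \sim W_p(ν, Σ/ν)$ (so that $E[S_i] = Σ$) and the symbols $S_i$, $Σ$, $ν$, $p$ and $n$ are assumptions introduced here. With $n$ observed $p \times p$ covariance matrices $S_1, \dots, S_n$, the log-likelihood is

$$
\ell(Σ, ν) = \sum_{i=1}^{n} \left[ \frac{ν - p - 1}{2} \log |S_i| - \frac{ν}{2} \operatorname{tr}\left(Σ^{-1} S_i\right) \right] + n \left[ \frac{ν p}{2} \log \frac{ν}{2} - \frac{ν}{2} \log |Σ| - \log Γ_p\left(\frac{ν}{2}\right) \right],
$$

where $Γ_p$ denotes the multivariate gamma function. Under this parametrization $Σ$ plays the role of the "center" of the distribution, which is the quantity the abstract describes modeling with a low-rank factor-type decomposition.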
-
Clustered Monotone Transforms for Rating Factorization
Authors:
Gaurush Hiranandani,
Raghav Somani,
Oluwasanmi Koyejo,
Sreangsu Acharyya
Abstract:
Exploiting low-rank structure of the user-item rating matrix has been the crux of many recommendation engines. However, existing recommendation engines force raters with heterogeneous behavior profiles to map their intrinsic rating scales to a common rating scale (e.g. 1-5). This non-linear transformation of the rating scale shatters the low-rank structure of the rating matrix, resulting in a poor fit and, consequently, poor recommendations. In this paper, we propose Clustered Monotone Transforms for Rating Factorization (CMTRF), a novel approach to perform regression up to unknown monotonic transforms over unknown population segments. Essentially, for recommendation systems, the technique searches for monotonic transformations of the rating scales resulting in a better fit. This is combined with an underlying matrix factorization regression model that couples the user-wise ratings to exploit shared low-dimensional structure. The rating scale transformations can be generated for each user, for a cluster of users, or for all the users at once, forming the basis of three simple and efficient algorithms proposed in this paper, all of which alternate between transformation of the rating scales and matrix factorization regression. Despite the non-convexity, CMTRF is theoretically shown to recover a unique solution under mild conditions. Experimental results on two synthetic and seven real-world datasets show that CMTRF outperforms other state-of-the-art baselines.
Submitted 31 October, 2018;
originally announced November 2018.
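The CMTRF abstract above describes alternating between a monotone transformation of the rating scale and a factorization step. As a rough illustration of the first half only, the sketch below fits a least-squares non-decreasing transform with the pool-adjacent-violators procedure. This is a generic isotonic regression, not necessarily the update used in the paper, and the names `pav` and `retarget` are hypothetical.

```python
from typing import List, Optional, Sequence


def pav(values: Sequence[float], weights: Optional[Sequence[float]] = None) -> List[float]:
    """Least-squares non-decreasing fit to `values` (pool adjacent violators)."""
    w = list(weights) if weights is not None else [1.0] * len(values)
    blocks = []  # each block holds [weighted mean, total weight, item count]
    for v, wi in zip(values, w):
        blocks.append([float(v), float(wi), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fit: List[float] = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


def retarget(ratings: Sequence[float], predictions: Sequence[float]) -> List[float]:
    """Monotone transform of one user's ratings that is closest to `predictions`.

    Assumption: ratings are distinct; tied ratings are not forced to share a value.
    """
    order = sorted(range(len(ratings)), key=lambda i: ratings[i])
    fitted = pav([predictions[i] for i in order])
    out = [0.0] * len(ratings)
    for rank, i in enumerate(order):
        out[i] = fitted[rank]
    return out
```

In an alternating scheme, `retarget` would be called per user (or per cluster of users) with the current model predictions, and the factorization would then be refit to the transformed targets.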
-
A case study of Empirical Bayes in User-Movie Recommendation system
Authors:
Arabin Kumar Dey,
Raghav Somani,
Sreangsu Acharyya
Abstract:
In this article we provide a formulation of the empirical Bayes approach described by Atchade (2011) to tune the hyperparameters of priors used in the Bayesian setup of a collaborative filter. We implement it on the MovieLens small dataset. We see that it can be used to get a good initial choice for the parameters. It can also be used to guess an initial choice for the hyperparameters in a grid search procedure, even for datasets where MCMC oscillates around the true value or takes a long time to converge.
Submitted 7 July, 2017;
originally announced July 2017.
-
Surrogacy of progression free survival for overall survival in metastatic breast cancer studies: meta-analyses of published studies
Authors:
Madan G. Kundu,
Suddhasatta Acharyya
Abstract:
Purpose: PFS is often used as a surrogate endpoint for OS in metastatic breast cancer studies. We have evaluated the association of treatment effect on PFS with significant HR$_{OS}$ (and how this association is affected by other factors) in published prospective metastatic breast cancer studies.
Methods: A systematic literature search in PubMed identified prospective metastatic breast cancer studies. Treatment effects on PFS were determined using the hazard ratio (HR$_{PFS}$), the increase in median PFS ($Δ$MED$_{PFS}$) and the % increase in median PFS (%$Δ$MED$_{PFS}$). Diagnostic accuracy of the PFS measures (HR$_{PFS}$, $Δ$MED$_{PFS}$ and %$Δ$MED$_{PFS}$) in predicting significant HR$_{OS}$ was assessed using receiver operating characteristic (ROC) curves and a classification tree approach.
Results: Seventy-three cases (i.e., treatment to control comparisons) from 64 individual publications were identified for the analyses. Of these, 16 cases reported a significant treatment effect on HR$_{OS}$ at the 5% level of significance. The median number of deaths reported in these cases was 156. The areas under the ROC curve (AUC) for the diagnostic measures HR$_{PFS}$, $Δ$MED$_{PFS}$ and %$Δ$MED$_{PFS}$ were 0.69, 0.70 and 0.75, respectively. Classification tree results identified %$Δ$MED$_{PFS}$ and number of deaths as diagnostic measures for significant HR$_{OS}$. Only 7.9\% (3/39) of cases with $Δ$MED$_{PFS}$ shorter than 48.27\% reported significant HR$_{OS}$. There were 7 cases with $Δ$MED$_{PFS}$ of 48.27\% or more and number of deaths reported as 227 or more; of these, 5 cases reported significant HR$_{OS}$.
Conclusion: %$Δ$MED$_{PFS}$ was found to be a better diagnostic measure for significant HR$_{OS}$. Our results also suggest that considering the total number of deaths may further improve its diagnostic performance.
Submitted 1 December, 2016; v1 submitted 15 August, 2016;
originally announced August 2016.
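The ROC analysis in the abstract above summarizes each PFS measure by its AUC. As a minimal sketch of what that number means, the AUC equals the probability that a randomly chosen positive case (here, a comparison with significant HR$_{OS}$) receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auc` and the toy inputs in the usage line are illustrative assumptions, not data or code from the study.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count a win when the positive outranks the negative; a tie counts one half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up scores and labels, for illustration only.
example = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

For a measure where smaller values indicate a larger benefit, such as a hazard ratio, the scores would be negated before the call so that higher still means "more likely positive".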
-
Learning to Rank With Bregman Divergences and Monotone Retargeting
Authors:
Sreangsu Acharyya,
Oluwasanmi Koyejo,
Joydeep Ghosh
Abstract:
This paper introduces a novel approach for learning to rank (LETOR) based on the notion of monotone retargeting. It involves minimizing a divergence between all monotonically increasing transformations of the training scores and a parameterized prediction function. The minimization is over both the transformations and the parameters. It is applied to Bregman divergences, a large class of "distance like" functions that were recently shown to be the unique class that is statistically consistent with the normalized discounted cumulative gain (NDCG) criterion [19]. The algorithm uses alternating projection style updates, in which one set of simultaneous projections can be computed independently of the Bregman divergence and the other reduces to parameter estimation of a generalized linear model. This results in an easily implemented, efficiently parallelizable algorithm for the LETOR task that enjoys global optimum guarantees under mild conditions. We present empirical results on benchmark datasets showing that this approach can outperform state-of-the-art NDCG-consistent techniques.
Submitted 16 October, 2012;
originally announced October 2012.
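The monotone retargeting idea in the abstract above rests on the fact that a ranking metric such as NDCG depends only on the order induced by the scores, so any strictly increasing transformation of the scores leaves it unchanged. The sketch below shows this with one common NDCG variant (gain $2^{rel} - 1$, discount $\log_2$ of the position plus one). The gain and discount choices and the function names are assumptions made here, not taken from the paper.

```python
import math
from typing import Optional, Sequence


def dcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """Discounted cumulative gain of relevance labels listed in ranked order."""
    top = relevances[:k] if k is not None else relevances
    return sum((2.0 ** rel - 1.0) / math.log2(pos + 2) for pos, rel in enumerate(top))


def ndcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """DCG divided by the DCG of the ideal (descending relevance) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


def ndcg_from_scores(scores: Sequence[float], relevances: Sequence[float],
                     k: Optional[int] = None) -> float:
    """Rank items by descending score, then evaluate NDCG on their relevance labels."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ndcg([relevances[i] for i in order], k)
```

Applying `math.exp` (or any other strictly increasing function) to every score yields the same value from `ndcg_from_scores`, which is why the training targets can be monotonically transformed without changing what the metric rewards.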