-
Capturing functional connectomics using Riemannian partial least squares
Authors:
Matt Ryan,
Gary Glonek,
Jono Tuke,
Melissa Humphries
Abstract:
For neurological disorders and diseases, functional and anatomical connectomes of the human brain can be used to better inform targeted interventions and treatment strategies. Functional magnetic resonance imaging (fMRI) is a non-invasive neuroimaging technique that captures spatio-temporal brain function through blood flow over time. FMRI can be used to study the functional connectome through the functional connectivity matrix; that is, Pearson's correlation matrix between time series from the regions of interest of an fMRI image. One approach to analysing functional connectivity is partial least squares (PLS), a multivariate regression technique designed for high-dimensional predictor data. However, analysing functional connectivity with PLS ignores a key property of the functional connectivity matrix: these matrices are positive definite. To account for this, we introduce a generalisation of PLS to Riemannian manifolds, called R-PLS, and apply it to symmetric positive definite matrices with the affine invariant geometry. We apply R-PLS to two functional imaging datasets: COBRE, which investigates functional differences between schizophrenic patients and healthy controls, and ABIDE, which compares people with autism spectrum disorder and neurotypical controls. Using the variable importance in the projection statistic on the results of R-PLS, we identify key functional connections in each dataset that are well represented in the literature. Given the generality of R-PLS, this method has the potential to open up new avenues for multi-modal imaging analysis linking structural and functional connectomics.
Submitted 29 June, 2023;
originally announced June 2023.
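The abstract relies on two concrete ingredients: the functional connectivity matrix (Pearson correlations between region-of-interest time series) and the affine invariant geometry on symmetric positive definite (SPD) matrices. The sketch below shows only those ingredients, assuming NumPy: the Riemannian log and exp maps at a base point P and the geodesic distance ||log(A^{-1/2} B A^{-1/2})||_F. It does not implement R-PLS itself, and all function names are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def connectivity_matrix(ts):
    """Pearson correlation matrix of ROI time series; ts has shape (n_timepoints, n_regions)."""
    return np.corrcoef(ts, rowvar=False)


def _sym_apply(S, fn):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    S = 0.5 * (S + S.T)  # remove round-off asymmetry before calling eigh
    w, V = np.linalg.eigh(S)
    return (V * fn(w)) @ V.T  # V diag(fn(w)) V^T


def _sqrt_and_inv_sqrt(P):
    """Return P^{1/2} and P^{-1/2} for an SPD matrix P."""
    return _sym_apply(P, np.sqrt), _sym_apply(P, lambda w: 1.0 / np.sqrt(w))


def log_map(P, X):
    """Affine-invariant log map: tangent vector at P pointing towards X."""
    P_half, P_ihalf = _sqrt_and_inv_sqrt(P)
    return P_half @ _sym_apply(P_ihalf @ X @ P_ihalf, np.log) @ P_half


def exp_map(P, V):
    """Affine-invariant exp map: follow the geodesic from P in tangent direction V."""
    P_half, P_ihalf = _sqrt_and_inv_sqrt(P)
    return P_half @ _sym_apply(P_ihalf @ V @ P_ihalf, np.exp) @ P_half


def riemannian_distance(A, B):
    """Geodesic distance ||log(A^{-1/2} B A^{-1/2})||_F under the affine-invariant metric."""
    _, A_ihalf = _sqrt_and_inv_sqrt(A)
    M = A_ihalf @ B @ A_ihalf
    w = np.linalg.eigvalsh(0.5 * (M + M.T))
    return float(np.sqrt(np.sum(np.log(w) ** 2)))
```

A correlation matrix is SPD (and so lies on this manifold) when there are more time points than regions and no region's series is an exact linear combination of the others. A simpler tangent-space approximation, which is not R-PLS, would map every subject's matrix with `log_map` at a common reference point and run ordinary PLS on the vectorised results.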
-
Multivariate distance matrix regression for a manifold-valued response variable
Authors:
Matt Ryan,
Gary Glonek,
Melissa Humphries,
Jono Tuke
Abstract:
In this paper, we propose the use of geodesic distances in conjunction with multivariate distance matrix regression, called geometric-MDMR, as a powerful first-step analysis method for manifold-valued data. Manifold-valued data are appearing more frequently in the literature, in applications ranging from the analysis of earthquakes to the analysis of brain patterns. Accounting for the structure of these data increases the complexity of the analysis, but yields results that are much more interpretable in terms of the data. To test geometric-MDMR, we develop a method to simulate functional connectivity matrices for fMRI data and use it in a simulation study, which shows that our method outperforms the current standards in fMRI analysis.
Submitted 10 February, 2022;
originally announced February 2022.
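To make the pairing of geodesic distances with MDMR concrete, the sketch below computes an affine-invariant distance matrix between SPD matrices and feeds it into the standard MDMR pseudo-F statistic with a permutation p-value. It assumes NumPy, an intercept plus the supplied predictors as the design matrix, and row permutation of the design for the null distribution. The function names are hypothetical, and this is an illustrative sketch rather than the authors' implementation or simulation method.

```python
import numpy as np


def spd_distance(A, B):
    """Affine-invariant geodesic distance between SPD matrices A and B."""
    w, V = np.linalg.eigh(A)
    A_ihalf = (V / np.sqrt(w)) @ V.T           # A^{-1/2}
    M = A_ihalf @ B @ A_ihalf
    lam = np.linalg.eigvalsh(0.5 * (M + M.T))
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))


def distance_matrix(mats, dist=spd_distance):
    """Symmetric matrix of pairwise distances with a zero diagonal."""
    n = len(mats)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = dist(mats[i], mats[j])
    return D


def mdmr(D, X, n_perm=999, seed=0):
    """MDMR pseudo-F statistic and permutation p-value for predictors X."""
    n = D.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n
    G = C @ (-0.5 * D ** 2) @ C                # Gower-centred matrix
    Xd = np.column_stack([np.ones(n), X])      # intercept plus predictors
    m = Xd.shape[1]

    def pseudo_f(Xmat):
        H = Xmat @ np.linalg.pinv(Xmat)        # hat (projection) matrix
        R = np.eye(n) - H
        return (np.trace(H @ G @ H) / (m - 1)) / (np.trace(R @ G @ R) / (n - m))

    f_obs = pseudo_f(Xd)
    rng = np.random.default_rng(seed)
    hits = sum(pseudo_f(Xd[rng.permutation(n)]) >= f_obs for _ in range(n_perm))
    return f_obs, (hits + 1) / (n_perm + 1)
```

Swapping `spd_distance` for a Euclidean distance on vectorised matrices would give the conventional MDMR analysis, which is one way to see what the geodesic choice changes.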
-
Symptom extraction from the narratives of personal experiences with COVID-19 on Reddit
Authors:
Curtis Murray,
Lewis Mitchell,
Jonathan Tuke,
Mark Mackay
Abstract:
Social media discussion of COVID-19 provides a rich source of information on how the virus affects people's lives, one that is qualitatively different from traditional public health datasets. In particular, when individuals self-report their experiences over the course of the virus on social media, it can allow for identification of the emotions each stage of symptoms engenders in the patient. Posts to the Reddit forum r/COVID19Positive contain first-hand accounts from COVID-19-positive patients, giving insight into personal struggles with the virus. These posts often have a temporal structure, indicating the number of days after symptom onset to which the text refers. Using topic modelling and sentiment analysis, we quantify the change in discussion of COVID-19 throughout individuals' experiences over the first 14 days after symptom onset. Discourse on early symptoms such as fever, cough, and sore throat was concentrated towards the beginning of the posts, while language indicating breathing issues peaked around ten days. Some conversation around critical cases was also identified and appeared at a roughly constant rate. We identified two clear clusters of positive and negative emotions associated with the evolution of these symptoms and mapped their relationships. Our results provide a perspective on the patient experience of COVID-19 that complements other medical data streams and can potentially reveal when mental health issues might appear.
Submitted 20 May, 2020;
originally announced May 2020.
-
A framework for streamlined statistical prediction using topic models
Authors:
Vanessa Glenny,
Jonathan Tuke,
Nigel Bean,
Lewis Mitchell
Abstract:
In the Humanities and Social Sciences, there is increasing interest in approaches to information extraction, prediction, intelligent linkage, and dimension reduction applicable to large text corpora. With approaches in these fields being grounded in traditional statistical techniques, the need arises for frameworks whereby advanced NLP techniques such as topic modelling may be incorporated within classical methodologies. This paper provides a classical, supervised, statistical learning framework for prediction from text, using topic models as a data reduction method and the topics themselves as predictors, alongside typical statistical tools for predictive modelling. We apply this framework in a Social Sciences context (applied animal behaviour) as well as a Humanities context (narrative analysis) as examples of this framework. The results show that topic regression models perform comparably to their much less efficient equivalents that use individual words as predictors.
Submitted 15 April, 2019;
originally announced April 2019.
-
The one comparing narrative social network extraction techniques
Authors:
Michelle Edwards,
Lewis Mitchell,
Jonathan Tuke,
Matthew Roughan
Abstract:
Analysing narratives through their social networks is an expanding field in quantitative literary studies. Manually extracting a social network from any narrative can be time consuming, so automatic extraction methods of varying complexity have been developed. However, the effect of different extraction methods on the analysis is unknown. Here we model and compare three extraction methods for social networks in narratives: manual extraction, co-occurrence automated extraction and automated extraction using machine learning. Although manual extraction produces more precise results in the network analysis, it is much more time consuming, and the automatic extraction methods yield comparable conclusions for density, centrality measures and edge weights. Our results provide evidence that social networks extracted automatically are reliable for many analyses. We also describe which aspects of analysis are not reliable with such a social network. We anticipate that our findings will make it easier to analyse more narratives, which will help us improve our understanding of how stories are written and evolve, and of how people interact with each other.
Submitted 4 November, 2018;
originally announced November 2018.
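Co-occurrence extraction, the simplest of the three automated approaches compared above, can be sketched in a few lines of standard-library Python. This version assumes that sentences are the co-occurrence unit, that characters are recognised by exact whole-word name matches (no aliases or coreference), and that edge weight is the number of sentences mentioning both characters. These choices and the function names are illustrative assumptions, not the paper's procedure.

```python
import re
from collections import Counter
from itertools import combinations
from math import comb


def cooccurrence_network(text, characters):
    """Weighted edges: number of sentences that mention both characters."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    edges = Counter()
    for sentence in sentences:
        # Characters named in this sentence (exact whole-word matches only).
        present = sorted(
            c for c in set(characters)
            if re.search(r"\b" + re.escape(c) + r"\b", sentence)
        )
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


def density(edges, n_characters):
    """Fraction of possible character pairs joined by at least one co-occurrence."""
    return len(edges) / comb(n_characters, 2)
```

For example, "Alice met Bob. Bob and Carol argued. Alice laughed. Alice, Bob and Carol ate." gives weights of 2 for Alice and Bob, 2 for Bob and Carol, and 1 for Alice and Carol, so the density is 1.0. Because the sketch matches names literally, it misses references such as pronouns or nicknames, which is where manual or machine-learning extraction would differ.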
-
Pachinko Prediction: A Bayesian method for event prediction from social media data
Authors:
Jonathan Tuke,
Andrew Nguyen,
Mehwish Nasim,
Drew Mellor,
Asanga Wickramasinghe,
Nigel Bean,
Lewis Mitchell
Abstract:
The combination of large open data sources with machine learning approaches presents a potentially powerful way to predict events such as protest or social unrest. However, accounting for uncertainty in such models, particularly when using diverse, unstructured datasets such as social media, is essential to guarantee the appropriate use of such methods. Here we develop a Bayesian method for predicting social unrest events in Australia using social media data. This method uses machine learning methods to classify individual postings to social media as being relevant, and an empirical Bayesian approach to calculate posterior event probabilities. We use the method to predict events in Australian cities over a period in 2017/18.
Submitted 22 September, 2018;
originally announced September 2018.
-
Rigorous statistical analysis of HTTPS reachability
Authors:
George Michaelson,
Matthew Roughan,
Jonathan Tuke,
Matt P. Wand,
Randy Bush
Abstract:
The use of secure connections using HTTPS as the default means, or even the only means, to connect to web servers is increasing. It is being pushed from both sides: from the bottom up by client distributions and plugins, and from the top down by organisations such as Google. However, there are potential technical hurdles that might lock some clients out of the modern web. This paper seeks to measure and precisely quantify those hurdles in the wild. More than three million measurements provide statistically significant evidence of degradation. We show this through a variety of statistical techniques. Various factors are shown to influence the problem, ranging from the client's browser, to the locale from which they connect.
Submitted 8 June, 2017;
originally announced June 2017.
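The abstract does not name the specific statistical techniques used. As a generic illustration of how one factor, such as the client's browser, might be shown to influence failure rates, the sketch below applies a two-sided two-proportion z-test to counts of failed HTTPS fetches in two groups. The test is a standard textbook method swapped in for illustration, the counts in the usage note are hypothetical, and nothing here reproduces the paper's analysis.

```python
import math


def two_proportion_z_test(fail_a, n_a, fail_b, n_b):
    """Two-sided z-test for a difference in failure proportions between groups A and B."""
    p_a, p_b = fail_a / n_a, fail_b / n_b
    pooled = (fail_a + fail_b) / (n_a + n_b)       # common proportion under H0: p_a == p_b
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))     # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

With hypothetical counts of 30 failures in 1000 fetches for one browser and 10 in 1000 for another, z is about 3.19 and the two-sided p-value is below 0.01. With millions of measurements, even very small differences become significant, so effect sizes matter as much as p-values.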
-
An Induced Natural Selection Heuristic for Finding Optimal Bayesian Experimental Designs
Authors:
David J. Price,
Nigel G. Bean,
Joshua V. Ross,
Jonathan Tuke
Abstract:
Bayesian optimal experimental design has immense potential to inform the collection of data so as to subsequently enhance our understanding of a variety of processes. However, a major impediment is the difficulty in evaluating optimal designs for problems with large, or high-dimensional, design spaces. We propose an efficient search heuristic suitable for general optimisation problems, with a particular focus on optimal Bayesian experimental design problems. The heuristic evaluates the objective (utility) function at an initial, randomly generated set of input values. At each generation of the algorithm, input values are "accepted" if their corresponding objective (utility) function values satisfy an acceptance criterion, and new inputs are sampled about these accepted points. We demonstrate the new algorithm by evaluating the optimal Bayesian experimental designs for the previously considered death, pharmacokinetic and logistic regression models. Comparisons to the current "gold-standard" method are given to demonstrate that the proposed algorithm is a computationally efficient alternative for moderately large design problems (i.e., up to approximately 40 dimensions).
Submitted 13 March, 2018; v1 submitted 16 March, 2017;
originally announced March 2017.
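The generation loop described above (evaluate the utility at random inputs, accept some, sample new inputs about the accepted points) can be sketched generically with NumPy. The acceptance criterion here is an assumption: keep the `n_keep` highest-utility inputs. New inputs are drawn from a Gaussian around each accepted point and clipped to the design box. All names and parameter values are hypothetical, and the paper's acceptance rule and sampling scheme may differ. In a real Bayesian design problem the utility would itself be a noisy Monte Carlo estimate.

```python
import numpy as np


def ins_search(utility, lower, upper, n_init=200, n_keep=20, n_children=10,
               n_gen=30, step=0.1, seed=0):
    """Generic accept-and-resample search over the box [lower, upper] (maximisation)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # Initial generation: random inputs spread over the whole design space.
    pop = rng.uniform(lower, upper, size=(n_init, lower.size))
    best_x, best_u = None, -np.inf
    for _ in range(n_gen):
        u = np.array([utility(x) for x in pop])
        order = np.argsort(u)[::-1]               # highest utility first
        if u[order[0]] > best_u:
            best_u, best_x = u[order[0]], pop[order[0]].copy()
        # Assumed acceptance criterion: keep the n_keep highest-utility inputs.
        accepted = pop[order[:n_keep]]
        # Sample new inputs about each accepted point, staying inside the box.
        noise = rng.normal(0.0, step * (upper - lower),
                           size=(n_keep * n_children, lower.size))
        pop = np.clip(np.repeat(accepted, n_children, axis=0) + noise, lower, upper)
    return best_x, best_u
```

On a toy utility such as -||x - 0.3||^2 over the unit square, the search concentrates around (0.3, 0.3) within a few generations. A shrinking `step` schedule is a natural variant when finer resolution is needed.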
-
All networks look the same to me: Testing for homogeneity in networks
Authors:
Jonathan Tuke,
Matthew Roughan
Abstract:
How can researchers test for heterogeneity in the local structure of a network? In this paper, we present a framework that uses random sampling to obtain subgraphs, which are then used in a goodness-of-fit test for heterogeneity. We illustrate how to use the goodness-of-fit test with an analytically derived distribution as well as with an empirical distribution. To demonstrate our framework, we consider the simple case of testing for edge probability heterogeneity. We examine the significance level, power and computation time for this case with appropriate examples. Finally, we outline how to apply our framework to other heterogeneity problems.
Submitted 2 December, 2015;
originally announced December 2015.
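For the edge-probability case, one simple version of the idea is to sample random k-node induced subgraphs, count their edges, and compare the counts with what a homogeneous graph would give. The sketch below assumes a Binomial(C(k,2), p̂) reference distribution, where p̂ is the overall edge density; this is an approximation chosen for illustration, not necessarily the distribution derived in the paper. It also pools sparse categories before computing a chi-square statistic, and it ignores the dependence between overlapping samples. The graph is a dict mapping each node to its set of neighbours, and the function name is hypothetical. A p-value could then be obtained with, for example, `scipy.stats.chi2.sf(chi2, df)`.

```python
import math
import random
from itertools import combinations


def edge_count_gof(adj, k, n_samples=500, seed=0):
    """Chi-square statistic comparing edge counts in random k-node induced
    subgraphs with Binomial(C(k, 2), p_hat); adj maps node -> set of neighbours."""
    rng = random.Random(seed)
    nodes = list(adj)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    p_hat = n_edges / math.comb(len(nodes), 2)    # overall edge density
    pairs = math.comb(k, 2)

    observed = [0] * (pairs + 1)
    for _ in range(n_samples):
        sub = rng.sample(nodes, k)
        e = sum(1 for a, b in combinations(sub, 2) if b in adj[a])
        observed[e] += 1

    expected = [n_samples * math.comb(pairs, j) * p_hat ** j * (1 - p_hat) ** (pairs - j)
                for j in range(pairs + 1)]

    # Pool sparse categories (expected count < 5) into a single bin.
    obs_bins, exp_bins, pool_o, pool_e = [], [], 0, 0.0
    for o, e in zip(observed, expected):
        if e >= 5:
            obs_bins.append(o)
            exp_bins.append(e)
        else:
            pool_o += o
            pool_e += e
    if pool_e > 0:
        obs_bins.append(pool_o)
        exp_bins.append(pool_e)

    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_bins, exp_bins))
    df = len(exp_bins) - 2    # minus 1 for the fixed total, minus 1 for estimating p_hat
    return chi2, df
```

A graph made of two disjoint cliques is an extreme heterogeneous case. Its sampled subgraphs take only a handful of edge counts, so its statistic is far larger than that of a comparable Erdős–Rényi graph.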
-
P-values, q-values and posterior probabilities for equivalence in genomics studies
Authors:
J. Tuke,
G. F. V. Glonek,
P. J. Solomon
Abstract:
Equivalence testing is of emerging importance in genomics studies but has hitherto been little studied in this context. In this paper, we define the notion of equivalence of gene expression and determine a `strength of evidence' measure for gene equivalence. It is common practice in genome-wide studies to rank genes according to observed gene-specific P-values or adjusted P-values, which are assumed to measure the strength of evidence against the null hypothesis of no differential gene expression. We show here, both empirically and formally, that the equivalence P-value does not satisfy the basic consistency requirements for a valid strength of evidence measure for equivalence. This means that the widely used q-value (Storey, 2002), defined for each gene as the minimum positive false discovery rate that would result in the inclusion of the corresponding P-value in the discovery set, cannot be translated to the equivalence testing framework. However, when represented as a posterior probability, we find that the q-value does satisfy some basic consistency requirements needed to be a credible measure of evidence for equivalence. We propose a simple estimate for the q-value from posterior probabilities of equivalence, and analyse data from a mouse stem cell microarray experiment that demonstrate the theory and methods presented here.
Submitted 31 January, 2012;
originally announced February 2012.
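One common way to turn per-gene posterior probabilities into q-values follows the Bayesian false discovery rate idea. Rank genes by their posterior probability of equivalence. The q-value of the gene at rank r is then the average posterior probability of non-equivalence among the top r genes, which estimates the expected proportion of false discoveries if those r genes are declared equivalent. This construction is swapped in for illustration and is not necessarily the estimator proposed in the paper. The function name and its input (a list of posterior probabilities of equivalence, one per gene) are hypothetical.

```python
def qvalues_from_posterior(post_equiv):
    """q-value for each gene: mean posterior probability of NON-equivalence among all
    genes ranked at least as highly by posterior probability of equivalence."""
    order = sorted(range(len(post_equiv)), key=lambda i: post_equiv[i], reverse=True)
    q = [0.0] * len(post_equiv)
    expected_false = 0.0
    for rank, i in enumerate(order, start=1):
        expected_false += 1.0 - post_equiv[i]   # expected false discoveries so far
        q[i] = expected_false / rank
    return q
```

For posterior probabilities [0.9, 0.5, 0.99], the q-values are approximately [0.055, 0.203, 0.01]. Because genes are added in decreasing order of posterior probability, the q-values are automatically non-decreasing down the ranking.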
-
Gene profiling for determining pluripotent genes in a time course microarray experiment
Authors:
J. Tuke,
G. F. V. Glonek,
P. J. Solomon
Abstract:
In microarray experiments, it is often of interest to identify genes which have a pre-specified gene expression profile with respect to time. Methods available in the literature are, however, typically not stringent enough in identifying such genes, particularly when the profile requires equivalence of gene expression levels at certain time points. In this paper, the authors introduce a new methodology, called gene profiling, that uses simultaneous differential and equivalent gene expression level testing to rank genes according to a pre-specified gene expression profile. Gene profiling treats the vector of true gene expression levels as a linear combination of appropriate vectors, i.e., vectors that give the required criteria for the profile. This gene-profile model is fitted to the data and the resultant parameter estimates are summarized in a single test statistic that is then used to rank the genes. The theoretical underpinnings of gene profiling (equivalence testing, intersection-union tests) are discussed in this paper, and the gene profiling methodology is applied to our motivating stem cell experiment.
Submitted 23 July, 2008; v1 submitted 23 May, 2008;
originally announced May 2008.
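The two theoretical underpinnings named above can be illustrated on a toy profile: expression changes between times 1 and 2, then stays level between times 2 and 3. Equivalence is tested with two one-sided tests (TOST) against an equivalence margin `delta`. The profile is declared only if every component null is rejected, so the intersection-union test (IUT) p-value is the maximum of the component p-values. The sketch assumes normally distributed estimated differences with known standard errors, and all names and numbers are hypothetical. It omits the paper's linear-model fit and single ranking statistic.

```python
import math


def upper_tail(x):
    """1 - Phi(x) for the standard normal, computed via erfc for accuracy."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def p_different(diff, se):
    """Two-sided p-value for H0: true difference == 0 (normal approximation)."""
    return 2 * upper_tail(abs(diff) / se)


def p_equivalent(diff, se, delta):
    """TOST p-value for H0: |true difference| >= delta vs H1: |difference| < delta."""
    p_low = upper_tail((diff + delta) / se)     # tests H0: difference <= -delta
    p_high = upper_tail((delta - diff) / se)    # tests H0: difference >= +delta
    return max(p_low, p_high)


def profile_p_value(component_p_values):
    """IUT p-value: the profile holds only if every component null is rejected."""
    return max(component_p_values)
```

For a hypothetical gene with an estimated change of 2.0 (standard error 0.3) between times 1 and 2 and a difference of 0.05 (standard error 0.1) between times 2 and 3, with `delta = 0.5`, the profile p-value is about 3.4e-6. If the second difference were 0.6 instead, the equivalence component alone would push the profile p-value above 0.8.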