-
A Virtual Solar Wind Monitor at Mars with Uncertainty Quantification using Gaussian Processes
Authors:
A. R. Azari,
E. Abrahams,
F. Sapienza,
J. Halekas,
J. Biersteker,
D. L. Mitchell,
F. Pérez,
M. Marquette,
M. J. Rutala,
C. F. Bowers,
C. M. Jackman,
S. M. Curry
Abstract:
Single spacecraft missions do not measure the pristine solar wind continuously because of the spacecraft's orbital trajectory. The infrequent spatiotemporal cadence of measurement fundamentally limits conclusions about solar wind-magnetosphere coupling throughout the solar system. At Mars, such single spacecraft missions result in limitations for assessing the solar wind's role in causing lower altitude observations such as auroral dynamics or atmospheric loss. In this work, we detail the development of a virtual solar wind monitor from the Mars Atmosphere and Volatile Evolution (MAVEN) mission, a single spacecraft. This virtual solar wind monitor provides a continuous estimate of the solar wind upstream from Mars with uncertainties. We specifically employ Gaussian process regression to estimate the upstream solar wind and uncertainty estimations that scale with the data sparsity of our real observations. This proxy enables continuous solar wind estimation at Mars with representative uncertainties for the majority of the time since late 2014. We conclude by discussing suggested uses of this virtual solar wind monitor for statistical studies of the Mars space environment and heliosphere.
Submitted 14 July, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
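The abstract above describes Gaussian process regression whose uncertainty grows where observations are sparse. The following is a minimal sketch of that general idea, not the authors' model: the squared-exponential kernel, the hyperparameter values, the synthetic sine-wave data, and the function names `rbf_kernel` and `gp_predict` are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(t_obs, y_obs, t_new, noise=0.1, length_scale=1.0, variance=1.0):
    """Posterior mean and standard deviation of a zero-mean GP at t_new."""
    K = rbf_kernel(t_obs, t_obs, length_scale, variance)
    K += noise ** 2 * np.eye(len(t_obs))           # observation noise on the diagonal
    K_s = rbf_kernel(t_obs, t_new, length_scale, variance)
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v ** 2, axis=0)        # latent-function variance
    return mean, np.sqrt(np.maximum(var, 0.0))

# Synthetic (assumed) data with a gap between t = 3 and t = 7,
# standing in for intervals without upstream measurements.
rng = np.random.default_rng(0)
t_obs = np.concatenate([np.linspace(0, 3, 15), np.linspace(7, 10, 15)])
y_obs = np.sin(t_obs) + 0.1 * rng.standard_normal(t_obs.size)

t_new = np.array([1.5, 5.0])   # one point inside the data, one inside the gap
mean, std = gp_predict(t_obs, y_obs, t_new)
```

In this sketch the predictive standard deviation at `t = 5.0` (inside the gap) is larger than at `t = 1.5` (surrounded by data), which is the qualitative behaviour the abstract refers to.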
-
Magnetic Field Draping in Induced Magnetospheres: Evidence from the MAVEN Mission to Mars
Authors:
A. R. Azari,
E. Abrahams,
F. Sapienza,
D. L. Mitchell,
J. Biersteker,
S. Xu,
C. Bowers,
F. Pérez,
G. A. DiBraccio,
Y. Dong,
S. Curry
Abstract:
The Mars Atmosphere and Volatile EvolutioN (MAVEN) mission has been orbiting Mars since 2014 and now has over 10,000 orbits which we use to characterize Mars' dynamic space environment. Through global field line tracing with MAVEN magnetic field data we find an altitude dependent draping morphology that differs from expectations of induced magnetospheres in the vertical ($\hat Z$ Mars Sun-state, MSO) direction. We quantify this difference from the classical picture of induced magnetospheres with a Bayesian multiple linear regression model to predict the draped field as a function of the upstream interplanetary magnetic field (IMF), remanent crustal fields, and a previously underestimated induced effect. From our model we conclude that unexpected twists in high altitude dayside draping ($>$800 km) are a result of the IMF component in the $\pm \hat X$ MSO direction. We propose that this is a natural outcome of current theories of induced magnetospheres but has been underestimated due to approximations of the IMF as solely $\pm \hat Y$ directed. We additionally estimate that distortions in low altitude ($<$800 km) dayside draping along $\hat Z$ are directly related to remanent crustal fields. We show dayside draping traces down tail and previously reported inner magnetotail twists are likely caused by the crustal field of Mars, while the outer tail morphology is governed by an induced response to the IMF direction. We conclude with an updated understanding of induced magnetospheres which details dayside draping for multiple directions of the incoming IMF and discuss the repercussions of this draping for magnetotail morphology.
Submitted 20 October, 2023; v1 submitted 4 August, 2023;
originally announced August 2023.
-
Six Years of Shiny in Research -- Collaborative Development of Web Tools in R
Authors:
Peter Kasprzak,
Lachlan Mitchell,
Olena Kravchuk,
Andy Timmins
Abstract:
The use of Shiny in research publications is investigated. From the appearance of this popular web application framework for R through to 2018, it has been utilised in many diverse research areas. While it can be shown that the complexity of Shiny applications is limited by the background architecture, and real security concerns exist for novice app developers, the collaborative benefits are worth attention from the wider research community. Shiny simplifies the use of complex methodologies for users of different specialities, at the level of proficiency appropriate for the end user. This enables a diverse community of users to interact efficiently, utilising cutting-edge methodologies. The literature reviewed demonstrates that complex methodologies can be put into practice without the necessity for investment in professional training. It would appear that Shiny opens up concurrent benefits in communication between those who analyse data and those in other disciplines, thereby potentially enriching research through this technology.
Submitted 10 January, 2021;
originally announced January 2021.
-
Popularity and Centrality in Spotify Networks: Critical transitions in eigenvector centrality
Authors:
Tobin South,
Matthew Roughan,
Lewis Mitchell
Abstract:
The modern age of digital music access has increased the availability of data about music consumption and creation, facilitating the large-scale analysis of the complex networks that connect music together. Data about user streaming behaviour, and the musical collaboration networks are particularly important with new data-driven recommendation systems. Without thorough analysis, such collaboration graphs can lead to false or misleading conclusions. Here we present a new collaboration network of artists from the online music streaming service Spotify, and demonstrate a critical change in the eigenvector centrality of artists, as low popularity artists are removed. The critical change in centrality, from classical artists to rap artists, demonstrates deeper structural properties of the network. A Social Group Centrality model is presented to simulate this critical transition behaviour, and switching between dominant eigenvectors is observed. This model presents a novel investigation of the effect of popularity bias on how centrality and importance are measured, and provides a new tool for examining such flaws in networks.
Submitted 29 August, 2021; v1 submitted 26 August, 2020;
originally announced August 2020.
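The abstract above relies on eigenvector centrality, which scores each node by the corresponding entry of the leading eigenvector of the adjacency matrix. A minimal sketch using power iteration follows; the five-node toy graph and the function name `eigenvector_centrality` are assumptions for illustration and have no connection to the Spotify data or to the Social Group Centrality model.

```python
import numpy as np

def eigenvector_centrality(A, iters=1000, tol=1e-10):
    """Leading eigenvector of a symmetric adjacency matrix via power iteration.

    Assumes the graph is connected and not bipartite, so the iteration converges.
    """
    x = np.ones(A.shape[0]) / np.sqrt(A.shape[0])
    for _ in range(iters):
        x_new = A @ x
        x_new /= np.linalg.norm(x_new)      # keep the vector at unit length
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x_new

# Toy undirected graph (assumed): a triangle 0-1-2 with a tail 0-3-4.
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (3, 4)]
A = np.zeros((5, 5))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

centrality = eigenvector_centrality(A)
most_central = int(np.argmax(centrality))
```

Recomputing `centrality` after deleting rows and columns of `A` is one way to explore how node removal shifts the ranking, which is the kind of experiment the abstract describes at much larger scale.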
-
Analyzing the Effects of Observation Function Selection in Ensemble Kalman Filtering for Epidemic Models
Authors:
Leah Mitchell,
Andrea Arnold
Abstract:
The Ensemble Kalman Filter (EnKF) is a popular sequential data assimilation method that has been increasingly used for parameter estimation and forecast prediction in epidemiological studies. The observation function plays a critical role in the EnKF framework, connecting the unknown system variables with the observed data. Key differences in observed data and modeling assumptions have led to the use of different observation functions in the epidemic modeling literature. In this work, we present a novel computational analysis demonstrating the effects of observation function selection when using the EnKF for state and parameter estimation in this setting. In examining the use of four epidemiologically-inspired observation functions of different forms in connection with the classic Susceptible-Infectious-Recovered (SIR) model, we show how incorrect observation modeling assumptions (i.e., fitting incidence data with a prevalence model, or neglecting under-reporting) can lead to inaccurate filtering estimates and forecast predictions. Results demonstrate the importance of choosing an observation function that well interprets the available data on the corresponding EnKF estimates in several filtering scenarios, including state estimation with known parameters, and combined state and parameter estimation with both constant and time-varying parameters. Numerical experiments further illustrate how modifying the observation noise covariance matrix in the filter can help to account for uncertainty in the observation function in certain cases.
Submitted 17 July, 2021; v1 submitted 9 July, 2020;
originally announced July 2020.
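The role of the observation function in the EnKF can be seen in a single analysis step. The sketch below is a generic stochastic (perturbed-observation) EnKF update with a linear observation operator `H`; it does not reproduce any of the four observation functions studied in the paper, and the two-component state, the numerical values, and the function name `enkf_update` are assumptions for illustration.

```python
import numpy as np

def enkf_update(ensemble, y, H, R, rng):
    """One stochastic EnKF analysis step.

    ensemble: (n_state, n_members) forecast ensemble
    y:        (n_obs,) observation vector
    H:        (n_obs, n_state) linear observation operator
    R:        (n_obs, n_obs) observation noise covariance
    """
    n_members = ensemble.shape[1]
    X = ensemble - ensemble.mean(axis=1, keepdims=True)   # ensemble anomalies
    P = X @ X.T / (n_members - 1)                         # sample covariance
    S = H @ P @ H.T + R                                   # innovation covariance
    K = np.linalg.solve(S, H @ P).T                       # Kalman gain P H^T S^{-1}
    # Each member assimilates its own noisy copy of the observation.
    noise = rng.multivariate_normal(np.zeros(len(y)), R, size=n_members).T
    return ensemble + K @ (y[:, None] + noise - H @ ensemble)

# Assumed toy setting: state = (susceptible, infectious) fractions,
# and only the infectious fraction is observed.
rng = np.random.default_rng(1)
prior = np.array([[0.9], [0.1]]) + 0.05 * rng.standard_normal((2, 200))
H = np.array([[0.0, 1.0]])
R = np.array([[1e-4]])
y = np.array([0.2])

post = enkf_update(prior, y, H, R, rng)
```

Changing `H` (for example, observing a scaled infectious fraction to mimic under-reporting) changes both the innovation `y - H x` and the gain, which is why the choice of observation function matters for the resulting estimates.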
-
Symptom extraction from the narratives of personal experiences with COVID-19 on Reddit
Authors:
Curtis Murray,
Lewis Mitchell,
Jonathan Tuke,
Mark Mackay
Abstract:
Social media discussion of COVID-19 provides a rich source of information into how the virus affects people's lives that is qualitatively different from traditional public health datasets. In particular, when individuals self-report their experiences over the course of the virus on social media, it can allow for identification of the emotions each stage of symptoms engenders in the patient. Posts to the Reddit forum r/COVID19Positive contain first-hand accounts from COVID-19 positive patients, giving insight into personal struggles with the virus. These posts often feature a temporal structure indicating the number of days after developing symptoms the text refers to. Using topic modelling and sentiment analysis, we quantify the change in discussion of COVID-19 throughout individuals' experiences for the first 14 days since symptom onset. Discourse on early symptoms such as fever, cough, and sore throat was concentrated towards the beginning of the posts, while language indicating breathing issues peaked around ten days. Some conversation around critical cases was also identified and appeared at a roughly constant rate. We identified two clear clusters of positive and negative emotions associated with the evolution of these symptoms and mapped their relationships. Our results provide a perspective on the patient experience of COVID-19 that complements other medical data streams and can potentially reveal when mental health issues might appear.
Submitted 20 May, 2020;
originally announced May 2020.
-
Improving inference for nonlinear state-space models of animal population dynamics given biased sequential life stage data
Authors:
Leo Polansky,
Ken B. Newman,
Lara Mitchell
Abstract:
State-space models (SSMs) are a popular tool for modeling animal abundances. Inference difficulties for simple linear SSMs are well known, particularly in relation to simultaneous estimation of process and observation variances. Several remedies to overcome estimation problems have been studied for relatively simple SSMs, but whether these challenges and proposed remedies apply for nonlinear stage-structured SSMs, an important class of ecological models, is less well understood. Here we identify improvements for inference about nonlinear stage-structured SSMs fit with biased sequential life stage data. Theoretical analyses indicate parameter identifiability requires covariates in the state processes. Simulation studies show that plugging in externally estimated observation variances, as opposed to jointly estimating them with other parameters, reduces bias and standard error of estimates. In contrast to previous results for simple linear SSMs, strong confounding between jointly estimated process and observation variance parameters was not found in the models explored here. However, when observation variance was also estimated in the motivating case study, the resulting process variance estimates were implausibly low (near-zero). As SSMs are used in increasingly complex ways, understanding when inference can be expected to be successful, and what aids it, becomes more important. Our study illustrates (i) the need for relevant process covariates and (ii) the benefits of using externally estimated observation variances for inference for nonlinear stage-structured SSMs.
Submitted 19 September, 2019;
originally announced September 2019.
-
A framework for streamlined statistical prediction using topic models
Authors:
Vanessa Glenny,
Jonathan Tuke,
Nigel Bean,
Lewis Mitchell
Abstract:
In the Humanities and Social Sciences, there is increasing interest in approaches to information extraction, prediction, intelligent linkage, and dimension reduction applicable to large text corpora. With approaches in these fields being grounded in traditional statistical techniques, the need arises for frameworks whereby advanced NLP techniques such as topic modelling may be incorporated within classical methodologies. This paper provides a classical, supervised, statistical learning framework for prediction from text, using topic models as a data reduction method and the topics themselves as predictors, alongside typical statistical tools for predictive modelling. We apply this framework in a Social Sciences context (applied animal behaviour) as well as a Humanities context (narrative analysis) as examples of this framework. The results show that topic regression models perform comparably to their much less efficient equivalents that use individual words as predictors.
Submitted 15 April, 2019;
originally announced April 2019.
-
SMERC: Social media event response clustering using textual and temporal information
Authors:
Peter Mathews,
Caitlin Gray,
Lewis Mitchell,
Giang T. Nguyen,
Nigel G. Bean
Abstract:
Tweet clustering for event detection is a powerful modern method to automate the real-time detection of events. In this work we present a new tweet clustering approach, using a probabilistic approach to incorporate temporal information. By analysing the distribution of time gaps between tweets we show that the gaps between pairs of related tweets exhibit exponential decay, whereas the gaps between unrelated tweets are approximately uniform. Guided by this insight, we use probabilistic arguments to estimate the likelihood that a pair of tweets are related, and build an improved clustering method. Our method Social Media Event Response Clustering (SMERC) creates clusters of tweets based on their tendency to be related to a single event. We evaluate our method at three levels: through traditional event prediction from tweet clustering, by measuring the improvement in quality of clusters created, and also comparing the clustering precision and recall with other methods. By applying SMERC to tweets collected during a number of sporting events, we demonstrate that incorporating temporal information leads to state of the art clustering performance.
Submitted 12 November, 2018;
originally announced November 2018.
-
The one comparing narrative social network extraction techniques
Authors:
Michelle Edwards,
Lewis Mitchell,
Jonathan Tuke,
Matthew Roughan
Abstract:
Analysing narratives through their social networks is an expanding field in quantitative literary studies. Manually extracting a social network from any narrative can be time consuming, so automatic extraction methods of varying complexity have been developed. However, the effect of different extraction methods on the analysis is unknown. Here we model and compare three extraction methods for social networks in narratives: manual extraction, co-occurrence automated extraction and automated extraction using machine learning. Although the manual extraction method produces more precise results in the network analysis, it is much more time consuming and the automatic extraction methods yield comparable conclusions for density, centrality measures and edge weights. Our results provide evidence that social networks extracted automatically are reliable for many analyses. We also describe which aspects of analysis are not reliable with such a social network. We anticipate that our findings will make it easier to analyse more narratives, which help us improve our understanding of how stories are written and evolve, and how people interact with each other.
Submitted 4 November, 2018;
originally announced November 2018.
-
Pachinko Prediction: A Bayesian method for event prediction from social media data
Authors:
Jonathan Tuke,
Andrew Nguyen,
Mehwish Nasim,
Drew Mellor,
Asanga Wickramasinghe,
Nigel Bean,
Lewis Mitchell
Abstract:
The combination of large open data sources with machine learning approaches presents a potentially powerful way to predict events such as protest or social unrest. However, accounting for uncertainty in such models, particularly when using diverse, unstructured datasets such as social media, is essential to guarantee the appropriate use of such methods. Here we develop a Bayesian method for predicting social unrest events in Australia using social media data. This method uses machine learning methods to classify individual postings to social media as being relevant, and an empirical Bayesian approach to calculate posterior event probabilities. We use the method to predict events in Australian cities over a period in 2017/18.
Submitted 22 September, 2018;
originally announced September 2018.
-
Generating Connected Random Graphs
Authors:
Caitlin Gray,
Lewis Mitchell,
Matthew Roughan
Abstract:
Sampling random graphs is essential in many applications, and often algorithms use Markov chain Monte Carlo methods to sample uniformly from the space of graphs. However, often there is a need to sample graphs with some property that we are unable, or it is too inefficient, to sample using standard approaches. In this paper, we are interested in sampling graphs from a conditional ensemble of the underlying graph model. We present an algorithm to generate samples from an ensemble of connected random graphs using a Metropolis-Hastings framework. The algorithm extends to a general framework for sampling from a known distribution of graphs, conditioned on a desired property. We demonstrate the method to generate connected spatially embedded random graphs, specifically the well known Waxman network, and illustrate the convergence and practicalities of the algorithm.
Submitted 25 October, 2018; v1 submitted 29 June, 2018;
originally announced June 2018.
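The idea of sampling from a graph ensemble conditioned on connectivity can be sketched with a simple Metropolis-Hastings chain. The example below substitutes the Erdős-Rényi G(n, p) model for the Waxman model named in the abstract, and uses single-edge toggles as the proposal; both choices, along with the function names `is_connected` and `sample_connected_gnp`, are assumptions for illustration rather than the authors' algorithm.

```python
import itertools
import random

def is_connected(n, edges):
    """Depth-first search connectivity check on nodes 0..n-1."""
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

def sample_connected_gnp(n, p, steps, seed=0):
    """Metropolis-Hastings sample from G(n, p) conditioned on connectivity.

    Target: pi(G) proportional to p^|E| (1-p)^(N-|E|) if G is connected, else 0,
    where N is the number of node pairs. The proposal toggles one uniformly
    chosen pair, so it is symmetric and the acceptance ratio is pi(G') / pi(G).
    """
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(n), 2))
    edges = set(pairs)                      # complete graph: a connected start
    for _ in range(steps):
        e = rng.choice(pairs)
        if e in edges:
            proposal = edges - {e}          # remove an edge
            ratio = (1 - p) / p
        else:
            proposal = edges | {e}          # add an edge
            ratio = p / (1 - p)
        # Disconnected proposals have zero target probability and are rejected.
        if is_connected(n, proposal) and rng.random() < min(1.0, ratio):
            edges = proposal
    return edges

graph = sample_connected_gnp(n=12, p=0.2, steps=2000)
```

Every state visited by the chain is connected by construction; how many steps are needed before the sample is close to the target distribution is a separate convergence question, which the abstract says the paper examines.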
-
The nature and origin of heavy tails in retweet activity
Authors:
Peter Mathews,
Lewis Mitchell,
Giang T. Nguyen,
Nigel G. Bean
Abstract:
Modern social media platforms facilitate the rapid spread of information online. Modelling phenomena such as social contagion and information diffusion is contingent upon a detailed understanding of the information-sharing processes. In Twitter, an important aspect of this occurs with retweets, where users rebroadcast the tweets of other users. To improve our understanding of how these distributions arise, we analyse the distribution of retweet times. We show that a power law with exponential cutoff provides a better fit than the power laws previously suggested. We explain this fit through the burstiness of human behaviour and the priorities individuals place on different tasks.
Submitted 16 March, 2017;
originally announced March 2017.
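For reference, the two candidate forms contrasted in the abstract above are commonly written as follows. The symbols $t$ (retweet time), $\alpha$ (power-law exponent), $\lambda$ (cutoff rate), and $t_{\min}$ (lower bound of the fitted range) are assumed notation, not taken from the paper.

```latex
% Pure power law
p(t) \propto t^{-\alpha}, \qquad t \ge t_{\min}

% Power law with exponential cutoff
p(t) \propto t^{-\alpha} e^{-\lambda t}, \qquad t \ge t_{\min}
```

The second form reduces to the first as $\lambda \to 0$; for $\lambda > 0$ the tail decays faster than any power law once $t$ is large compared with $1/\lambda$.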
-
Recommending Learning Algorithms and Their Associated Hyperparameters
Authors:
Michael R. Smith,
Logan Mitchell,
Christophe Giraud-Carrier,
Tony Martinez
Abstract:
The success of machine learning on a given task depends on, among other things, which learning algorithm is selected and its associated hyperparameters. Selecting an appropriate learning algorithm and setting its hyperparameters for a given data set can be a challenging task, especially for users who are not experts in machine learning. Previous work has examined using meta-features to predict which learning algorithm and hyperparameters should be used. However, choosing a set of meta-features that are predictive of algorithm performance is difficult. Here, we propose to apply collaborative filtering techniques to learning algorithm and hyperparameter selection, and find that doing so avoids determining which meta-features to use and outperforms traditional meta-learning approaches in many cases.
Submitted 7 July, 2014;
originally announced July 2014.