-
Statistical Design and Analysis for Robust Machine Learning: A Case Study from COVID-19
Authors:
Davide Pigoli,
Kieran Baker,
Jobie Budd,
Lorraine Butler,
Harry Coppock,
Sabrina Egglestone,
Steven G. Gilmour,
Chris Holmes,
David Hurley,
Radka Jersakova,
Ivan Kiskin,
Vasiliki Koutra,
Jonathon Mellor,
George Nicholson,
Joe Packham,
Selina Patel,
Richard Payne,
Stephen J. Roberts,
Björn W. Schuller,
Ana Tendero-Cañadas,
Tracey Thornley,
Alexander Titcomb
Abstract:
Since early in the coronavirus disease 2019 (COVID-19) pandemic, there has been interest in using artificial intelligence methods to predict COVID-19 infection status based on vocal audio signals, for example cough recordings. However, existing studies have limitations in how the data were collected and in how the performance of the proposed predictive models was assessed. This paper rigorously assesses state-of-the-art machine learning techniques used to predict COVID-19 infection status based on vocal audio signals, using a dataset collected by the UK Health Security Agency. This dataset includes acoustic recordings and extensive study participant metadata. We provide guidelines on testing the performance of methods to classify COVID-19 infection status based on acoustic features, and we discuss how these can be extended more generally to the development and assessment of predictive methods based on public health datasets.
Submitted 27 February, 2023; v1 submitted 15 December, 2022;
originally announced December 2022.
-
Democratizing Aviation Emissions Estimation: Development of an Open-Source, Data-Driven Methodology
Authors:
Andy Eskenazi,
Landon Butler,
Arnav Joshi,
Megan Ryerson
Abstract:
Through an aviation emissions estimation tool that is both publicly accessible and comprehensive, researchers, planners, and community advocates can help shape a more sustainable and equitable U.S. air transportation system. To this end, we develop an open-source, data-driven methodology to calculate the system-wide emissions of the U.S. domestic civil aviation industry. This process integrates six different public datasets provided by the Bureau of Transportation Statistics (BTS), the Federal Aviation Administration (FAA), EUROCONTROL, and the International Civil Aviation Organization (ICAO). At the individual flight level, our approach examines the specific aircraft type, equipped engine, and time in stage of flight to produce a more granular estimate than competing approaches. Enabled by our methodology, we then calculate system-wide emissions, considering four different greenhouse gases (CO2, NOx, CO, HC) during the Landing and Take-off (LTO) and Climb, Cruise, and Descent (CCD) flight cycles. Our results show that emissions on a particular route can vary significantly due to aircraft and engine choice, and that emission rates differ significantly from airline to airline. We also find that CO2 alone is not a sufficient proxy for emissions, as NOx, when converted to its CO2-equivalency, exceeds CO2 during both LTO and CCD.
Submitted 5 May, 2022; v1 submitted 13 February, 2022;
originally announced February 2022.
-
The synthesis of data from instrumented structures and physics-based models via Gaussian processes
Authors:
Alastair Gregory,
Din-Houn Lau,
Mark Girolami,
Liam Butler,
Mohammed Elshafie
Abstract:
A recent development which is poised to disrupt current structural engineering practice is the use of data obtained from physical structures such as bridges, viaducts and buildings. These data can represent how the structure responds to various stimuli over time when in operation, providing engineers with a unique insight into how their designs are performing. With the advent of advanced sensing technologies and the Internet of Things, the efficient interpretation of structural health monitoring data has become a big data challenge. Many models have been proposed in the literature to represent such data, such as linear statistical models. Based upon these models, the health of the structure is reasoned about, e.g. through damage indices, changes in likelihood and statistical parameter estimates. On the other hand, physics-based models are typically used when designing structures to predict how the structure will respond to operational stimuli. What remains unclear in the literature is how to combine the observed data with information from the idealised physics-based model into a model that describes the responses of the operational structure. This paper introduces a new approach which fuses observed data from a physical structure during operation with information from a mathematical model. The observed data are combined with data simulated from the physics-based model using a multi-output Gaussian process formulation. The novelty of this method is how the information from observed data and the physics-based model is balanced to obtain a representative model of the structure's response to stimuli. We present our method using data obtained from a fibre-optic sensor network installed on experimental railway sleepers. We discuss how this approach can be used to reason about changes in the structure's behaviour over time using simulations and experimental data.
Submitted 29 April, 2019; v1 submitted 27 November, 2018;
originally announced November 2018.
-
A Quantile-Based Approach to Modelling Recovery Time in Structural Health Monitoring
Authors:
Alastair Gregory,
F. Din-Houn Lau,
Liam Butler
Abstract:
Statistical techniques play a large role in the structural health monitoring of instrumented infrastructure, such as a railway bridge constructed with an integrated network of fibre optic sensors. One possible way to reason about the structural health of such a railway bridge is to model the time it takes to recover to a no-load (baseline) state after a train passes over. Inherently, this recovery time is random and should be modelled statistically. This paper uses a non-parametric model, based on empirical quantile approximations, to construct a space-memory efficient baseline distribution for the streaming data from these sensors. A fast statistical test is implemented to detect deviations away from, and recovery back to, this distribution when trains pass over the bridge, yielding a recovery time. Our method assumes that there are no temporal variations in the data, so a median-based detrending scheme is used to remove the temporal variations that are likely due to temperature changes. This allows for the continuous recording of sensor data with a space-memory constraint.
Submitted 22 March, 2018;
originally announced March 2018.
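The recovery-time idea described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses exact quantiles of a stored baseline sample rather than streaming quantile approximations, and a simple band check rather than the paper's statistical test. The function names, the trailing-window median, and the 5%/95% band are all assumptions made for illustration.

```python
from statistics import median


def detrend(signal, window):
    """Subtract a trailing-window median from each sample.

    A simple stand-in for median-based detrending (assumed form).
    """
    out = []
    for i, value in enumerate(signal):
        start = max(0, i - window + 1)
        out.append(value - median(signal[start:i + 1]))
    return out


def empirical_quantile(sorted_values, q):
    """Return an order statistic of an already sorted sample as the q-quantile."""
    index = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[index]


def recovery_time(signal, baseline, lower_q=0.05, upper_q=0.95):
    """Count samples between leaving and re-entering the baseline quantile band.

    Returns None if the signal never leaves the band or never returns to it.
    """
    ordered = sorted(baseline)
    low = empirical_quantile(ordered, lower_q)
    high = empirical_quantile(ordered, upper_q)
    start = None
    for i, value in enumerate(signal):
        inside = low <= value <= high
        if start is None and not inside:
            start = i  # first sample outside the no-load band
        elif start is not None and inside:
            return i - start  # samples elapsed until back in the band
    return None


# Hypothetical no-load readings and a hypothetical train-passage event.
baseline = [0.0, 0.1, -0.1, 0.05, -0.05, 0.02, -0.02, 0.0, 0.08, -0.08]
event = [0.0, 0.05, 2.0, 1.5, 0.6, 0.05, 0.0]
print(recovery_time(event, baseline))
```

In this sketch the recovery time is measured in samples; converting it to seconds would require the sensor sampling rate, which the abstract does not state.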
-
A Bayesian approach to the estimation of maps between Riemannian manifolds, II: examples
Authors:
Leo T. Butler,
Boris Levit
Abstract:
Let M be a smooth compact oriented manifold without boundary, embedded in a Euclidean space E, and let f be a smooth map of M into a Riemannian manifold N. An unknown state x in M is observed via X=x+su, where s>0 is a small parameter and u is a white Gaussian noise. For a given smooth prior on M and smooth estimators g of the map f, we have derived a second-order asymptotic expansion for the related Bayesian risk (see arXiv:0705.2540). In this paper, we apply this technique to a variety of examples.
The second part examines the first-order conditions for equality-constrained regression problems. The geometric tools that are utilised in our earlier paper are naturally applicable to these regression problems.
Submitted 18 August, 2009;
originally announced August 2009.