-
An information theoretic limit to data amplification
Authors:
S. J. Watts,
L. Crow
Abstract:
In recent years, generative artificial intelligence has been used to create data to support science analysis. For example, Generative Adversarial Networks (GANs) have been trained using Monte Carlo simulated input and then used to generate data for the same problem. This has the advantage that a GAN creates data in a significantly reduced computing time. N training events for a GAN can result in GN generated events, with the gain factor, G, being more than one. This appears to violate the principle that one cannot get information for free. This is not the only way to amplify data, so the process is referred to as data amplification and is studied using information-theoretic concepts. It is shown that a gain of greater than one is possible whilst keeping the information content of the data unchanged. This leads to a mathematical bound which depends only on the number of generated and training events. This study determines conditions on both the underlying and reconstructed probability distributions to ensure this bound. In particular, the resolution of variables in amplified data is not improved by the process, but the increase in sample size can still improve statistical significance. The bound is confirmed using computer simulation and analysis of GAN-generated data from the literature.
Submitted 23 December, 2024;
originally announced December 2024.
-
Limits to classification performance by relating Kullback-Leibler divergence to Cohen's Kappa
Authors:
L. Crow,
S. J. Watts
Abstract:
The performance of machine learning classification algorithms is evaluated by estimating metrics, often from the confusion matrix, using training data and cross-validation. However, these do not prove that the best possible performance has been achieved. Fundamental limits to error rates can be estimated using information distance measures. To this end, the confusion matrix has been formulated to comply with the Chernoff-Stein Lemma. This links the error rates to the Kullback-Leibler divergences between the probability density functions describing the two classes. This leads to a key result that relates Cohen's Kappa to the Resistor Average Distance, which is the parallel resistor combination of the two Kullback-Leibler divergences. The Resistor Average Distance has units of bits and is estimated from the same training data used by the classification algorithm, using kNN estimates of the Kullback-Leibler divergences. The classification algorithm gives the confusion matrix and Kappa. Theory and methods are discussed in detail and then applied to Monte Carlo data and real datasets. Four very different real datasets (Breast Cancer, Coronary Heart Disease, Bankruptcy, and Particle Identification) are analysed, with both continuous and discrete values, and their classification performance is compared to the expected theoretical limit. In all cases this analysis shows that the algorithms could not have performed any better, owing to the underlying probability density functions for the two classes. Important lessons are learnt on how to predict the performance of algorithms for imbalanced data using training datasets that are approximately balanced. Machine learning is very powerful, but classification performance ultimately depends on the quality of the data and the relevance of the variables to the problem.
Submitted 3 March, 2024;
originally announced March 2024.
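The abstract above describes the Resistor Average Distance as the parallel resistor combination of the two Kullback-Leibler divergences, measured in bits. The following is a minimal sketch of that combination only. It assumes discrete distributions given as probability lists with `q > 0` wherever `p > 0`, whereas the paper estimates the divergences with kNN methods; the function names are hypothetical and not taken from the paper.

```python
import math


def kl_divergence_bits(p, q):
    """Kullback-Leibler divergence D(p||q) in bits for discrete distributions.

    Assumes q[i] > 0 wherever p[i] > 0 (illustrative simplification).
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def resistor_average_distance(p, q):
    """Parallel-resistor combination of D(p||q) and D(q||p), in bits.

    1/R = 1/D(p||q) + 1/D(q||p), i.e. R = D1 * D2 / (D1 + D2).
    """
    d_pq = kl_divergence_bits(p, q)
    d_qp = kl_divergence_bits(q, p)
    if d_pq == 0 or d_qp == 0:
        # Identical distributions: both divergences vanish, so R is zero.
        return 0.0
    return d_pq * d_qp / (d_pq + d_qp)


if __name__ == "__main__":
    p = [0.5, 0.5]
    q = [0.9, 0.1]
    print(resistor_average_distance(p, q))
```

Like resistors in parallel, the combined value is never larger than the smaller of the two divergences, and it is symmetric in the two classes even though each divergence on its own is not.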
-
The Shannon Entropy of a Histogram
Authors:
Stephen Watts,
Lisa Crow
Abstract:
The histogram is a key method for visualizing data and estimating the underlying probability distribution. Incorrect conclusions about the data result from over- or under-binning. A new method based on the Shannon entropy of the histogram uses a simple formula based on the differential entropy estimated from nearest-neighbour distances. Links are made between the new method and other algorithms such as Scott's formula, and cost and risk function methods. A parameter is found that predicts over- and under-binning, and it can be estimated for any histogram. The new algorithm is shown to be robust by application to real data.
Submitted 6 October, 2022;
originally announced October 2022.
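To make the quantities named in the abstract above concrete, here is a minimal sketch that bins data with Scott's formula (bin width 3.49 σ n^(-1/3)) and computes the Shannon entropy of the resulting histogram in bits. It does not implement the paper's new binning method, which the abstract does not specify; the function names and the Gaussian test data are assumptions for illustration, and the data are assumed to be non-degenerate (not all identical).

```python
import math
import random
import statistics


def scott_bin_width(data):
    """Scott's formula: h = 3.49 * sigma * n^(-1/3)."""
    return 3.49 * statistics.stdev(data) * len(data) ** (-1.0 / 3.0)


def histogram_counts(data, width):
    """Count entries in equal-width bins starting at min(data)."""
    lo = min(data)
    n_bins = max(1, math.ceil((max(data) - lo) / width))
    counts = [0] * n_bins
    for x in data:
        # Clamp so the maximum value falls in the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def shannon_entropy_bits(counts):
    """Shannon entropy H = -sum p_i log2 p_i with p_i = n_i / N."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


if __name__ == "__main__":
    random.seed(0)
    sample = [random.gauss(0.0, 1.0) for _ in range(1000)]
    counts = histogram_counts(sample, scott_bin_width(sample))
    print(len(counts), shannon_entropy_bits(counts))
```

The histogram entropy depends on the bin width: finer bins raise it towards log2 of the number of bins, which is why a binning rule and an entropy measure can be related at all.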
-
New Search for Mirror Neutrons at HFIR
Authors:
L. J. Broussard,
K. M. Bailey,
W. B. Bailey,
J. L. Barrow,
B. Chance,
C. Crawford,
L. Crow,
L. DeBeer-Schmitt,
N. Fomin,
M. Frost,
A. Galindo-Uribarri,
F. X. Gallmeier,
L. Heilbronn,
E. B. Iverson,
Y. Kamyshkov,
C. -Y. Liu,
I. Novikov,
S. I. Pentillä,
A. Ruggles,
B. Rybolt,
M. Snow,
L. Townsend,
L. J. Varriano,
S. Vavra,
A. R. Young
Abstract:
The theory of mirror matter predicts a hidden sector made up of a copy of the Standard Model particles and interactions but with opposite parity. If mirror matter interacts with ordinary matter, there could be experimentally accessible implications in the form of neutral particle oscillations. Direct searches for neutron oscillations into mirror neutrons in a controlled magnetic field have previously been performed using ultracold neutrons in storage/disappearance measurements, with some inconclusive results consistent with a characteristic oscillation time of $\tau \sim 10$ s. Here we describe a proposed disappearance and regeneration experiment in which the neutron oscillates to and from a mirror neutron state. An experiment performed using the existing General Purpose-Small Angle Neutron Scattering instrument at the High Flux Isotope Reactor at Oak Ridge National Laboratory could have the sensitivity to exclude up to $\tau < 15$ s in 1 week of beamtime and at low cost.
Submitted 25 October, 2017; v1 submitted 2 October, 2017;
originally announced October 2017.
-
Demonstration of a novel focusing small-angle neutron scattering instrument equipped with axisymmetric mirrors
Authors:
Dazhi Liu,
Boris Khaykovich,
Mikhail V. Gubarev,
J Lee Robertson,
Lowell Crow,
Brian D. Ramsey,
David E. Moncton
Abstract:
Small-angle neutron scattering (SANS) is the most significant neutron technique in terms of impact on science and engineering. However, the basic concept of SANS facilities has not changed since the technique's inception about 40 years ago, as all SANS instruments, save a few, are still designed as pinhole cameras. Here we demonstrate a novel concept for a SANS instrument, based on axisymmetric focusing mirrors. We build and test a small prototype, which shows a performance comparable to that of conventional large SANS facilities. By using a detector with 50-micron pixels, we build the most compact SANS instrument in the world. This work, together with the recent demonstration that such mirrors could increase the signal rate at least 50-fold, while improving resolution, paves the way to novel SANS instruments, thus affecting a broad community of scientists and engineers.
Submitted 4 October, 2013;
originally announced October 2013.
-
Tests of Modulated Intensity Small Angle Scattering in time of flight mode
Authors:
G. Brandl,
J. Lal,
J. Carpenter,
L. Crow,
L. Robertson,
R. Georgii,
P. Böni,
M. Bleuel
Abstract:
We report results of tests of the MISANS technique at the CG-1D beamline at the High Flux Isotope Reactor (HFIR), Oak Ridge National Laboratory (ORNL). A chopper at 40 Hz simulated a pulsed neutron source at the beamline. A compact turn-key MISANS module operating with the pulsed beam was installed and a well-characterised MnSi sample was tested. The feasibility of applying high magnetic fields at the sample position was also explored. These tests demonstrate the great potential of this technique, in particular for examining magnetic and depolarizing samples under extreme sample environments at pulsed sources, such as the Spallation Neutron Source (SNS) or the planned European Spallation Source (ESS).
Submitted 13 December, 2011;
originally announced December 2011.