-
Topological Obstructions to Autoencoding
Authors:
Joshua Batson,
C. Grace Haaf,
Yonatan Kahn,
Daniel A. Roberts
Abstract:
Autoencoders have been proposed as a powerful tool for model-independent anomaly detection in high-energy physics. The operating principle is that events which do not belong to the space of training data will be reconstructed poorly, thus flagging them as anomalies. We point out that in a variety of examples of interest, the connection between large reconstruction error and anomalies is not so clear. In particular, for data sets with nontrivial topology, there will always be points that erroneously seem anomalous due to global issues. Conversely, neural networks typically have an inductive bias or prior to locally interpolate such that undersampled or rare events may be reconstructed with small error, despite actually being the desired anomalies. Taken together, these facts are in tension with the simple picture of the autoencoder as an anomaly detector. Using a series of illustrative low-dimensional examples, we show explicitly how the intrinsic and extrinsic topology of the dataset affects the behavior of an autoencoder and how this topology is manifested in the latent space representation during training. We ground this analysis in the discussion of a mock "bump hunt" in which the autoencoder fails to identify an anomalous "signal" for reasons tied to the intrinsic topology of $n$-particle phase space.
Submitted 3 May, 2021; v1 submitted 16 February, 2021;
originally announced February 2021.
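The topological point in this abstract can be illustrated with a toy example that is not taken from the paper: points on a unit circle compressed to a single latent number. The functions `encode_continuous`, `decode`, and `reconstruction_error`, and the ramp width `w`, are hypothetical names chosen for this sketch. Any continuous one-dimensional code for a circle has to glue the two ends of the angle range together somewhere, so some perfectly ordinary points are reconstructed badly.

```python
import math

def encode_continuous(phi, w=0.2):
    """Hypothetical continuous encoder: circle angle phi in [-pi, pi] -> one latent number.

    Away from the cut at phi = +/-pi the code is just the angle. Within a
    ramp of width w on either side of the cut, the code is squeezed linearly
    to 0 so that the map is continuous around the whole circle.
    """
    edge = math.pi - w
    if abs(phi) <= edge:
        return phi
    if phi > 0:
        return edge * (math.pi - phi) / w
    return -edge * (math.pi + phi) / w

def decode(z):
    """Map a latent number back to a point on the unit circle."""
    return (math.cos(z), math.sin(z))

def reconstruction_error(phi, w=0.2):
    """Euclidean distance between a circle point and its reconstruction."""
    point = (math.cos(phi), math.sin(phi))
    return math.dist(point, decode(encode_continuous(phi, w)))

# Points far from the cut are reconstructed exactly; the point on the cut
# is sent to the opposite side of the circle.
err_typical = reconstruction_error(1.0)
err_at_cut = reconstruction_error(math.pi)
```

In this sketch the point at angle pi lies on the data manifold, yet it has the largest possible reconstruction error, which is the kind of false anomaly the abstract describes. Shrinking `w` narrows the badly reconstructed region but cannot remove it while the encoder stays continuous.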
-
A comparison of group testing architectures for COVID-19 testing
Authors:
J. Batson,
N. Bottman,
Y. Cooper,
F. Janda
Abstract:
An important component of every country's COVID-19 response is fast and efficient testing - to identify and isolate cases, as well as for early detection of local hotspots. For many countries, producing a sufficient number of tests has been a serious limiting factor in their efforts to control COVID-19 infections. Group testing is a well-established mathematical tool, which can provide a substantial and inexpensive expansion of testing capacity. In this note, we compare several popular group testing schemes in the context of qPCR testing for COVID-19. We find that in practical settings, for identification of individuals with COVID-19, Dorfman testing is the best choice at prevalences up to 30%, while for estimation of COVID-19 prevalence rates in the total population, Gibbs-Gower testing is the best choice at prevalences up to 30% given a fixed and relatively small number of tests. For instance, at a prevalence of up to 2%, Dorfman testing gives an efficiency gain of 3.5--8; at 1% prevalence, Gibbs-Gower testing gives an efficiency gain of 18, even when capping the pool size at a feasible number.
This note is intended as a helpful handbook for labs implementing group testing methods.
Submitted 23 October, 2020; v1 submitted 6 May, 2020;
originally announced May 2020.
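As a rough illustration of where the Dorfman efficiency figures come from, the sketch below uses the textbook two-stage Dorfman formula under idealized assumptions that are not stated in the abstract: a perfect test, independent infections, and no dilution effect. With pools of size k at prevalence p, each person costs 1/k pooled tests plus one retest whenever their pool is positive, which happens with probability 1 - (1 - p)^k. The function names and the `max_k` cap are hypothetical.

```python
def dorfman_tests_per_person(p, k):
    """Expected tests per person for two-stage Dorfman pooling.

    One pooled test is shared by k people; if the pool is positive
    (probability 1 - (1 - p)**k), every member is retested individually.
    Assumes a perfect test and independent infections.
    """
    return 1.0 / k + 1.0 - (1.0 - p) ** k

def best_pool_size(p, max_k=32):
    """Pool size in 2..max_k that minimizes expected tests per person."""
    return min(range(2, max_k + 1), key=lambda k: dorfman_tests_per_person(p, k))

prevalence = 0.02
pool_size = best_pool_size(prevalence)
# Efficiency gain: people screened per test, relative to individual testing.
gain = 1.0 / dorfman_tests_per_person(prevalence, pool_size)
print(f"pool size {pool_size}, efficiency gain {gain:.2f}")
```

Under these assumptions the best pool size at 2% prevalence is 8 and the gain is roughly 3.6, consistent with the lower end of the 3.5--8 range quoted for prevalences up to 2%. The note's own comparison may account for practical constraints that this sketch ignores.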
-
Noise2Self: Blind Denoising by Self-Supervision
Authors:
Joshua Batson,
Loic Royer
Abstract:
We propose a general framework for denoising high-dimensional measurements which requires no prior on the signal, no estimate of the noise, and no clean training data. The only assumption is that the noise exhibits statistical independence across different dimensions of the measurement, while the true signal exhibits some correlation. For a broad class of functions ("$\mathcal{J}$-invariant"), it is then possible to estimate the performance of a denoiser from noisy data alone. This allows us to calibrate $\mathcal{J}$-invariant versions of any parameterised denoising algorithm, from the single hyperparameter of a median filter to the millions of weights of a deep neural network. We demonstrate this on natural image and microscopy data, where we exploit noise independence between pixels, and on single-cell gene expression data, where we exploit independence between detections of individual molecules. This framework generalizes recent work on training neural nets from noisy images and on cross-validation for matrix factorization.
Submitted 8 June, 2019; v1 submitted 30 January, 2019;
originally announced January 2019.
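The calibration idea can be sketched on a one-dimensional toy signal. This is an illustration under stated assumptions, not the authors' code: the "donut" moving average `donut_mean`, the sine-wave signal, and the noise level `SIGMA` are all hypothetical choices. The smoother never uses the noisy value at position i to predict position i, which makes it J-invariant with each position as its own partition element. With independent zero-mean noise, its error against the noisy data should then equal its error against the clean signal plus the noise variance, so the two losses rank the hyperparameter (here the radius) in nearly the same way.

```python
import math
import random

def donut_mean(x, r):
    """Average of the r neighbours on each side, excluding x[i] itself (circular)."""
    n = len(x)
    out = []
    for i in range(n):
        total = 0.0
        for d in range(1, r + 1):
            total += x[(i - d) % n] + x[(i + d) % n]
        out.append(total / (2 * r))
    return out

def mse(a, b):
    """Mean squared difference between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

random.seed(0)
N = 5000
SIGMA = 0.5  # assumed noise standard deviation
clean = [math.sin(2 * math.pi * 50 * i / N) for i in range(N)]
noisy = [s + random.gauss(0.0, SIGMA) for s in clean]

radii = list(range(1, 31))
self_loss = {}  # computable from noisy data alone
true_loss = {}  # needs the clean signal; shown only for comparison
for r in radii:
    denoised = donut_mean(noisy, r)
    self_loss[r] = mse(denoised, noisy)
    true_loss[r] = mse(denoised, clean)

best_self = min(radii, key=self_loss.get)
best_true = min(radii, key=true_loss.get)
```

Choosing the radius with `self_loss` requires no clean data, and in this sketch the radius it selects should have a true error close to the best achievable over the same grid.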