-
Extremum Encoding for Joint Baseband Signal Compression and Time-Delay Estimation for Distributed Systems
Authors:
Amir Weiss,
Yuval Kochman,
Gregory W. Wornell
Abstract:
The ubiquitous time-delay estimation (TDE) problem becomes nontrivial when sensors are non-co-located and communication between them is limited. Building on the recently proposed "extremum encoding" compression-estimation scheme, we address the critical extension to complex-valued signals, suitable for radio-frequency (RF) baseband processing. This extension introduces new challenges, e.g., due to the unknown phase of the signal of interest and the random phase of the noise, which render a naïve application of the original scheme inapplicable. In the face of these challenges, we propose a judiciously adapted, though natural, extension of the scheme, paving its way to RF applications. While our extension leads to a different statistical analysis, including extremes of non-Gaussian distributions, we show that, ultimately, its asymptotic behavior is akin to that of the original scheme. We derive an exponentially tight upper bound on its error probability, corroborate our results via simulation experiments, and demonstrate its superior performance compared with two benchmark approaches.
Submitted 24 December, 2024;
originally announced December 2024.
-
Estimating the Number and Locations of Boundaries in Reverberant Environments with Deep Learning
Authors:
Toros Arikan,
Luca M. Chackalackal,
Fatima Ahsan,
Konrad Tittel,
Andrew C. Singer,
Gregory W. Wornell,
Richard G. Baraniuk
Abstract:
Underwater acoustic environment estimation is a challenging but important task for remote sensing scenarios. Current estimation methods require high signal strength and a solution to the fragile echo labeling problem to be effective. In previous publications, we proposed a general deep learning-based method for two-dimensional environment estimation which outperformed the state-of-the-art, both in simulation and in real-life experimental settings. A limitation of this method was that the user had to provide some prior information on the number and locations of the reflective boundaries, and that its neural networks had to be re-trained accordingly for different environments. Utilizing more advanced neural network and time-delay estimation techniques, the proposed improved method no longer requires prior knowledge of the number of boundaries or their locations, and is able to estimate two-dimensional environments with one or two boundaries. Future work will extend the proposed method to more boundaries and larger-scale environments.
Submitted 4 November, 2024;
originally announced November 2024.
-
A Unified View on Learning Unnormalized Distributions via Noise-Contrastive Estimation
Authors:
J. Jon Ryu,
Abhin Shah,
Gregory W. Wornell
Abstract:
This paper studies a family of estimators based on noise-contrastive estimation (NCE) for learning unnormalized distributions. The main contribution of this work is to provide a unified perspective on various methods for learning unnormalized distributions, which have been independently proposed and studied in separate research communities, through the lens of NCE. This unified view offers new insights into existing estimators. Specifically, for exponential families, we establish the finite-sample convergence rates of the proposed estimators under a set of regularity assumptions, most of which are new.
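To make the NCE principle concrete, here is a minimal sketch of the classical noise-contrastive estimator (Gutmann and Hyvärinen), not the family of estimators studied in the paper. It assumes a one-dimensional unnormalized Gaussian model $\log\phi_\theta(x) = -a x^2 + b x + c$ whose log-normalizer $c$ is treated as a free parameter, a Gaussian noise density $q = N(0, 4)$, and equal numbers of data and noise samples; all of these choices are illustrative. Fitting reduces to logistic regression of "data vs. noise" with logit $\log\phi_\theta(x) - \log q(x)$, solved here with Newton's method.

```python
import numpy as np

# Classical NCE for a 1-D unnormalized model: log phi(x) = -a*x**2 + b*x + c,
# where c is a learned stand-in for the (unknown) log-normalizer.
rng = np.random.default_rng(0)
n = 20_000
data = rng.normal(1.0, 1.0, n)           # samples from the "unknown" target N(1, 1)
noise_std = 2.0
noise = rng.normal(0.0, noise_std, n)    # samples from the known noise density q = N(0, 4)


def log_q(x):
    """Log-density of the noise distribution N(0, noise_std**2)."""
    return -0.5 * (x / noise_std) ** 2 - np.log(noise_std * np.sqrt(2 * np.pi))


def features(x):
    """Gradient of log phi with respect to (a, b, c)."""
    return np.stack([-x**2, x, np.ones_like(x)], axis=1)


X = np.concatenate([features(data), features(noise)])
offset = -np.concatenate([log_q(data), log_q(noise)])
labels = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = data, 0 = noise

# Start from phi = q (logit 0 everywhere) and run Newton's method on the
# concave logistic log-likelihood with logit G(x) = log phi(x) - log q(x).
theta = np.array([1 / (2 * noise_std**2), 0.0, -np.log(noise_std * np.sqrt(2 * np.pi))])
for _ in range(30):
    G = X @ theta + offset
    p = 0.5 * (1.0 + np.tanh(0.5 * G))   # numerically stable sigmoid
    grad = X.T @ (labels - p)
    hess = -(X * (p * (1 - p))[:, None]).T @ X
    theta = theta - np.linalg.solve(hess, grad)

a, b, c = theta
est_mean, est_var = b / (2 * a), 1 / (2 * a)
print(f"mean ~ {est_mean:.3f}, variance ~ {est_var:.3f}, c ~ {c:.3f}")
```

Because the logit is linear in $(a, b, c)$, the objective is concave, so a few Newton steps from the noise density suffice in this toy case; the learned $c$ should land near the true log-normalizer $-\tfrac12 - \tfrac12\log 2\pi \approx -1.419$ without that constant ever being computed during fitting.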
Submitted 26 September, 2024;
originally announced September 2024.
-
RF Challenge: The Data-Driven Radio Frequency Signal Separation Challenge
Authors:
Alejandro Lancho,
Amir Weiss,
Gary C. F. Lee,
Tejas Jayashankar,
Binoy Kurien,
Yury Polyanskiy,
Gregory W. Wornell
Abstract:
We address the critical problem of interference rejection in radio-frequency (RF) signals using a data-driven approach that leverages deep-learning methods. A primary contribution of this paper is the introduction of the RF Challenge, which is a publicly available, diverse RF signal dataset for data-driven analyses of RF signal problems. Specifically, we adopt a simplified signal model for developing and analyzing interference rejection algorithms. For this signal model, we introduce a set of carefully chosen deep learning architectures, incorporating key domain-informed modifications alongside traditional benchmark solutions to establish baseline performance metrics for this intricate, ubiquitous problem. Through extensive simulations involving eight different signal mixture types, we demonstrate the superior performance (in some cases, by two orders of magnitude) of architectures such as UNet and WaveNet over traditional methods like matched filtering and linear minimum mean square error estimation. Our findings suggest that the data-driven approach can yield scalable solutions, in the sense that the same architectures may be similarly trained and deployed for different types of signals. Moreover, these findings further corroborate the promising potential of deep learning algorithms for enhancing communication systems, particularly via interference mitigation. This work also includes results from an open competition based on the RF Challenge, hosted at the 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'24).
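For reference, the linear MMSE baseline named above has a closed form when second-order statistics are known. The sketch below assumes a hypothetical model $y = s + b$ with a zero-mean signal of interest $s$ and interference $b$ whose covariances $C_s$ and $C_b$ are known (an AR(1)-like $C_s$ and a narrowband-plus-white $C_b$, both invented for illustration). The estimator is $\hat{s} = C_s (C_s + C_b)^{-1} y$, with error covariance $C_s - C_s (C_s + C_b)^{-1} C_s$. It does not reproduce the RF Challenge signal model or data.

```python
import numpy as np

# Hypothetical second-order model y = s + b with known covariances.
n = 64
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
C_s = 0.9 ** lags                                           # AR(1)-like signal-of-interest covariance
C_b = 2.0 * np.cos(0.3 * np.pi * lags) + 0.1 * np.eye(n)    # narrowband interference plus white noise

# LMMSE filter W = C_s (C_s + C_b)^{-1}; both matrices are symmetric, so solve instead of inverting.
W = np.linalg.solve(C_s + C_b, C_s).T
C_err = C_s - W @ C_s                                       # error covariance of s_hat = W y

mse_lmmse = np.trace(C_err) / n
mse_raw = np.trace(C_b) / n      # using y itself as the estimate (W = I)
mse_zero = np.trace(C_s) / n     # ignoring y altogether (W = 0)
print(f"LMMSE: {mse_lmmse:.3f}, W = I: {mse_raw:.3f}, W = 0: {mse_zero:.3f}")


def lmmse_separate(y):
    """Apply the LMMSE filter to one received block, or to each row of a batch."""
    return y @ W.T
```

Since $W = 0$ and $W = I$ are themselves linear estimators, the LMMSE error can never exceed either the signal power or the interference-plus-noise power; this optimality holds only within the class of linear estimators, which leaves room for nonlinear, learned separators to do better.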
Submitted 27 March, 2025; v1 submitted 13 September, 2024;
originally announced September 2024.
-
FaceFolds: Meshed Radiance Manifolds for Efficient Volumetric Rendering of Dynamic Faces
Authors:
Safa C. Medin,
Gengyan Li,
Ruofei Du,
Stephan Garbin,
Philip Davidson,
Gregory W. Wornell,
Thabo Beeler,
Abhimitra Meka
Abstract:
3D rendering of dynamic face captures is a challenging problem, and it demands improvements on several fronts: photorealism, efficiency, compatibility, and configurability. We present a novel representation that enables high-quality volumetric rendering of an actor's dynamic facial performances with minimal compute and memory footprint. It runs natively on commodity graphics software and hardware, and allows for a graceful trade-off between quality and efficiency. Our method utilizes recent advances in neural rendering, particularly learning discrete radiance manifolds to sparsely sample the scene to model volumetric effects. We achieve efficient modeling by learning a single set of manifolds for the entire dynamic sequence, while implicitly modeling appearance changes as a temporal canonical texture. We export a single layered mesh and a view-independent RGBA texture video that is compatible with legacy graphics renderers without additional ML integration. We demonstrate our method by rendering dynamic face captures of real actors in a game engine, with photorealism comparable to state-of-the-art neural rendering techniques at previously unseen frame rates.
Submitted 21 April, 2024;
originally announced April 2024.
-
A Joint Data Compression and Time-Delay Estimation Method For Distributed Systems via Extremum Encoding
Authors:
Amir Weiss,
Yuval Kochman,
Gregory W. Wornell
Abstract:
Motivated by the proliferation of mobile devices, we consider a basic form of the ubiquitous problem of time-delay estimation (TDE), but with communication constraints between two non-co-located sensors. In this setting, when joint processing of the received signals is not possible, a compression technique that is tailored to TDE is desirable. For our basic TDE formulation, we develop such a joint compression-estimation strategy based on the notion of what we term "extremum encoding", whereby we send the index of the maximum of a finite-length time-series from one sensor to another. Subsequent joint processing of the encoded message with locally observed data gives rise to our proposed time-delay "maximum-index"-based estimator. We derive an exponentially tight upper bound on its error probability, establishing its consistency with respect to the number of transmitted bits. We further validate our analysis via simulations, and comment on potential extensions and generalizations of the basic methodology.
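A toy simulation can illustrate the idea of extremum encoding. In this sketch, sensor A sends only $\arg\max_n x[n]$ (about $\log_2 N$ bits), and sensor B estimates an integer delay by choosing the candidate $d$ for which its own sample $y[k + d]$ is largest. The white Gaussian source, the noise level, the delay range, and this particular decision rule are our own assumptions for illustration; the paper's estimator and analysis may differ in detail.

```python
import numpy as np


def extremum_encode(x):
    """Sensor A: transmit only the index of the maximum sample (about log2(len(x)) bits)."""
    return int(np.argmax(x))


def max_index_delay_estimate(k, y, max_delay):
    """Sensor B (hypothetical rule): pick d in {0, ..., max_delay} maximizing y[k + d]."""
    return int(np.argmax(y[k : k + max_delay + 1]))


def simulate_trial(rng, n=1024, max_delay=16, sigma=0.1):
    """One trial with a random integer delay; returns True if the delay is recovered."""
    tau = int(rng.integers(0, max_delay + 1))            # true delay
    s = rng.standard_normal(n + max_delay)               # white Gaussian source (assumption)
    x = s[:n] + sigma * rng.standard_normal(n)           # sensor A observes s[m] + noise
    # Sensor B observes y[m] = s[m - tau] + noise; samples before arrival are unrelated.
    y = np.concatenate([rng.standard_normal(tau), s[: n + max_delay - tau]])
    y = y + sigma * rng.standard_normal(n + max_delay)
    k = extremum_encode(x)
    return max_index_delay_estimate(k, y, max_delay) == tau


rng = np.random.default_rng(0)
success_rate = np.mean([simulate_trial(rng) for _ in range(200)])
print(f"empirical probability of correct delay: {success_rate:.3f}")
```

The intuition is that the maximum of a long white sequence is a single, highly distinctive sample, so sensor B only needs to find where that sample reappears within the delay window; only the index, not the sample values, crosses the link.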
Submitted 4 December, 2024; v1 submitted 14 April, 2024;
originally announced April 2024.
-
Are Uncertainty Quantification Capabilities of Evidential Deep Learning a Mirage?
Authors:
Maohao Shen,
J. Jon Ryu,
Soumya Ghosh,
Yuheng Bu,
Prasanna Sattigeri,
Subhro Das,
Gregory W. Wornell
Abstract:
This paper questions the effectiveness of a modern predictive uncertainty quantification approach, called \emph{evidential deep learning} (EDL), in which a single neural network model is trained to learn a meta distribution over the predictive distribution by minimizing a specific objective function. Despite the perceived strong empirical performance of EDL methods on downstream tasks, a line of recent studies by Bengs et al. identifies limitations of the existing methods and concludes that their learned epistemic uncertainties are unreliable, e.g., in that they are non-vanishing even with infinite data. Building on and sharpening such analysis, we 1) provide a sharper understanding of the asymptotic behavior of a wide class of EDL methods by unifying various objective functions; 2) reveal that the EDL methods can be better interpreted as an out-of-distribution detection algorithm based on energy-based models; and 3) conduct extensive ablation studies to better assess their empirical effectiveness with real-world datasets. Through all these analyses, we conclude that even when EDL methods are empirically effective on downstream tasks, this occurs despite their poor uncertainty quantification capabilities. Our investigation suggests that incorporating model uncertainty can help EDL methods faithfully quantify uncertainties and further improve performance on representative downstream tasks, albeit at the cost of additional computational complexity.
Submitted 31 October, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
Gambling-Based Confidence Sequences for Bounded Random Vectors
Authors:
J. Jon Ryu,
Gregory W. Wornell
Abstract:
A confidence sequence (CS) is a sequence of confidence sets that, with high probability, contains a target parameter of an underlying stochastic process at every time step simultaneously. This paper proposes a new approach to constructing CSs for means of bounded multivariate stochastic processes using a general gambling framework, extending the recently established coin toss framework for bounded random processes. The proposed gambling framework provides a general recipe for constructing CSs for categorical and probability-vector-valued observations, as well as for general bounded multidimensional observations through a simple reduction. This paper specifically explores the use of the mixture portfolio, akin to Cover's universal portfolio, in the proposed framework and investigates the properties of the resulting CSs. Simulations demonstrate the tightness of these confidence sequences compared to existing methods. When applied to the sampling-without-replacement setting for finite categorical data, it is shown that the resulting CS based on a universal gambling strategy is provably tighter than that of the posterior-prior ratio martingale proposed by Waudby-Smith and Ramdas.
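For intuition about the gambling view, here is a minimal scalar sketch of a coin-betting confidence sequence for the mean of $[0,1]$-valued data; it is not the paper's multivariate mixture-portfolio construction. For each candidate mean $m$ on a grid, a gambler bets a constant fraction on $X_t - m$. The equal-weight average of a few such constant bets (upward and downward) is a nonnegative martingale with initial value 1 whenever the true mean is $m$, so by Ville's inequality $m$ can be discarded once that mixture wealth reaches $1/\alpha$. The grid, the bet fractions, and the discrete mixture are illustrative assumptions.

```python
import numpy as np


def betting_cs(x, alpha=0.05, grid=np.linspace(0.01, 0.99, 99), fractions=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Scalar coin-betting confidence sequence for the mean of [0, 1]-valued data.

    For each candidate mean m, constant bets on (x_t - m) are combined into an
    equal-weight mixture. Under E[x_t | past] = m the mixture wealth is a nonnegative
    martingale starting at 1, so by Ville's inequality m is discarded as soon as the
    wealth reaches 1 / alpha. Returns the smallest and largest surviving grid points.
    """
    x = np.asarray(x, dtype=float)
    kept = []
    for m in grid:
        # Bet sizes keep every wealth factor 1 + lam * (x - m) >= 0.5 for x in [0, 1].
        lams = [u / m for u in fractions] + [-u / (1 - m) for u in fractions]
        log_wealth = np.array([np.cumsum(np.log1p(lam * (x - m))) for lam in lams])
        log_mix = np.logaddexp.reduce(log_wealth, axis=0) - np.log(len(lams))
        if log_mix.max() < np.log(1 / alpha):   # wealth never reached 1/alpha: m survives
            kept.append(m)
    return min(kept), max(kept)


rng = np.random.default_rng(1)
x = rng.binomial(1, 0.3, size=2000)             # Bernoulli(0.3) stream
lo, hi = betting_cs(x, alpha=0.01)
print(f"confidence set after {len(x)} steps: [{lo:.2f}, {hi:.2f}]")
```

Because the rejection rule uses the running maximum of the wealth, the surviving set is valid simultaneously over all time steps, which is what distinguishes a confidence sequence from a fixed-sample confidence interval.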
Submitted 21 August, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Operator SVD with Neural Networks via Nested Low-Rank Approximation
Authors:
J. Jon Ryu,
Xiangxiang Xu,
H. S. Melihcan Erol,
Yuheng Bu,
Lizhong Zheng,
Gregory W. Wornell
Abstract:
Computing the eigenvalue decomposition (EVD) of a given linear operator, or finding its leading eigenvalues and eigenfunctions, is a fundamental task in many machine learning and scientific computing problems. For high-dimensional eigenvalue problems, training neural networks to parameterize the eigenfunctions is considered a promising alternative to the classical numerical linear algebra techniques. This paper proposes a new optimization framework based on the low-rank approximation characterization of a truncated singular value decomposition, accompanied by new techniques called \emph{nesting} for learning the top-$L$ singular values and singular functions in the correct order. The proposed method promotes the desired orthogonality in the learned functions implicitly and efficiently via an unconstrained optimization formulation, which is easy to solve with off-the-shelf gradient-based optimization algorithms. We demonstrate the effectiveness of the proposed optimization framework for use cases in computational physics and machine learning.
Submitted 21 August, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
On Computationally Efficient Learning of Exponential Family Distributions
Authors:
Abhin Shah,
Devavrat Shah,
Gregory W. Wornell
Abstract:
We consider the classical problem of learning, with arbitrary accuracy, the natural parameters of a $k$-parameter truncated \textit{minimal} exponential family from i.i.d. samples in a computationally and statistically efficient manner. We focus on the setting where the support as well as the natural parameters are appropriately bounded. While the traditional maximum likelihood estimator for this class of exponential family is consistent, asymptotically normal, and asymptotically efficient, evaluating it is computationally hard. In this work, we propose a novel loss function and a computationally efficient estimator that is consistent as well as asymptotically normal under mild conditions. We show that, at the population level, our method can be viewed as the maximum likelihood estimation of a re-parameterized distribution belonging to the same class of exponential family. Further, we show that our estimator can be interpreted as a solution to minimizing a particular Bregman score as well as an instance of minimizing the \textit{surrogate} likelihood. We also provide finite sample guarantees to achieve an error (in $\ell_2$-norm) of $α$ in the parameter estimation with sample complexity $O({\sf poly}(k)/α^2)$. Our method achieves the order-optimal sample complexity of $O({\sf log}(k)/α^2)$ when tailored for node-wise-sparse Markov random fields. Finally, we demonstrate the performance of our estimator via numerical experiments.
Submitted 12 September, 2023;
originally announced September 2023.
-
Score-based Source Separation with Applications to Digital Communication Signals
Authors:
Tejas Jayashankar,
Gary C. F. Lee,
Alejandro Lancho,
Amir Weiss,
Yury Polyanskiy,
Gregory W. Wornell
Abstract:
We propose a new method for separating superimposed sources using diffusion-based generative models. Our method relies only on separately trained statistical priors of independent sources to establish a new objective function guided by maximum a posteriori estimation with an $α$-posterior, across multiple levels of Gaussian smoothing. Motivated by applications in radio-frequency (RF) systems, we are interested in sources with underlying discrete nature and the recovery of encoded bits from a signal of interest, as measured by the bit error rate (BER). Experimental results with RF mixtures demonstrate that our method reduces the BER by 95% relative to classical and existing learning-based methods. Our analysis demonstrates that our proposed method yields solutions that asymptotically approach the modes of an underlying discrete distribution. Furthermore, our method can be viewed as a multi-source extension to the recently proposed score distillation sampling scheme, shedding additional light on its use beyond conditional sampling. The project webpage is available at https://alpha-rgs.github.io
Submitted 17 January, 2024; v1 submitted 26 June, 2023;
originally announced June 2023.
-
Gibbs-Based Information Criteria and the Over-Parameterized Regime
Authors:
Haobo Chen,
Yuheng Bu,
Gregory W. Wornell
Abstract:
Double-descent refers to the unexpected drop in test loss of a learning algorithm beyond an interpolating threshold with over-parameterization, which is not predicted by information criteria in their classical forms due to the limitations in the standard asymptotic approach. We update these analyses using the information risk minimization framework and provide Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) for models learned by the Gibbs algorithm. Notably, the penalty terms for the Gibbs-based AIC and BIC correspond to specific information measures, i.e., symmetrized KL information and KL divergence. We extend this information-theoretic analysis to over-parameterized models by providing two different Gibbs-based BICs to compute the marginal likelihood of random feature models in the regime where the number of parameters $p$ and the number of samples $n$ tend to infinity, with $p/n$ fixed. Our experiments demonstrate that the Gibbs-based BIC can select the high-dimensional model and reveal the mismatch between marginal likelihood and population risk in the over-parameterized regime, providing new insights to understand double-descent.
Submitted 13 November, 2023; v1 submitted 8 June, 2023;
originally announced June 2023.
-
Online Segmented Recursive Least-Squares for Multipath Doppler Tracking
Authors:
Jae Won Choi,
Girish Chowdhary,
Andrew C. Singer,
Hari Vishnu,
Amir Weiss,
Gregory W. Wornell,
Grant Deane
Abstract:
Underwater communication signals typically suffer from distortion due to motion-induced Doppler. Especially in shallow water environments, recovering the signal is challenging due to the time-varying Doppler effects distorting each path differently. However, conventional Doppler estimation algorithms typically model uniform Doppler across all paths and often fail to provide robust Doppler tracking in multipath environments. In this paper, we propose a dynamic programming-inspired method, called online segmented recursive least-squares (OSRLS), to sequentially estimate the time-varying non-uniform Doppler across different multipath arrivals. By approximating the nonlinear time distortion as a piecewise-linear Markov model, we formulate the problem in a dynamic programming framework known as segmented least-squares (SLS). To circumvent an ill-conditioned formulation, perturbations are added to the Doppler model during the linearization process. The successful operation of the algorithm is demonstrated in a simulation on a synthetic channel with time-varying non-uniform Doppler.
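The segmented least-squares framework mentioned above is the classic offline dynamic program that trades off fit quality against the number of linear pieces. A minimal sketch follows; it is the textbook batch version with a constant per-segment penalty, not the online, recursive OSRLS algorithm with Doppler-model perturbations proposed in the paper, and the test signal is made up.

```python
import numpy as np


def segmented_least_squares(x, y, penalty):
    """Textbook O(n^3) segmented least-squares dynamic program.

    Minimizes (sum of per-segment squared errors of a least-squares line)
    + penalty * (number of segments). Returns (optimal cost, list of
    (start, end) index pairs, inclusive).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    err = np.zeros((n, n))  # err[i, j]: SSE of the best single line through points i..j
    for i in range(n):
        for j in range(i + 2, n):  # segments with <= 2 distinct points fit exactly (error 0)
            A = np.column_stack([x[i : j + 1], np.ones(j - i + 1)])
            coef, *_ = np.linalg.lstsq(A, y[i : j + 1], rcond=None)
            err[i, j] = np.sum((A @ coef - y[i : j + 1]) ** 2)

    opt = np.zeros(n + 1)   # opt[j]: best cost for the first j points
    prev = [0] * (n + 1)    # start index of the last segment in that optimal solution
    for j in range(1, n + 1):
        costs = [opt[i] + err[i, j - 1] + penalty for i in range(j)]
        prev[j] = int(np.argmin(costs))
        opt[j] = costs[prev[j]]

    segments, j = [], n     # backtrack from the last point
    while j > 0:
        segments.append((prev[j], j - 1))
        j = prev[j]
    return opt[n], segments[::-1]


x = np.arange(20.0)
y = np.where(x < 10, x, 10 + 3 * (x - 10))   # slope changes from 1 to 3 at x = 10
cost, segments = segmented_least_squares(x, y, penalty=1.0)
print(cost, segments)
```

The cost of a segmentation is the total squared error plus `penalty` times the number of segments, so a larger penalty yields fewer, longer linear pieces; a piecewise-linear time-distortion model fits this template directly.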
Submitted 30 May, 2023;
originally announced May 2023.
-
Towards Robust Data-Driven Underwater Acoustic Localization: A Deep CNN Solution with Performance Guarantees for Model Mismatch
Authors:
Amir Weiss,
Andrew C. Singer,
Gregory W. Wornell
Abstract:
Key challenges in developing underwater acoustic localization methods are related to the combined effects of high reverberation in intricate environments. To address such challenges, recent studies have shown that with a properly designed architecture, neural networks can lead to unprecedented localization capabilities and enhanced accuracy. However, the robustness of such methods to environmental mismatch is typically hard to characterize, and is usually assessed only empirically. In this work, we consider the recently proposed data-driven method [19] based on a deep convolutional neural network, and demonstrate that it can learn to localize in complex and mismatched environments. To explain this robustness, we provide an upper bound on the localization mean squared error (MSE) in the "true" environment, in terms of the MSE in a "presumed" environment and an additional penalty term related to the environmental discrepancy. Our theoretical results are corroborated via simulation results in a rich, highly reverberant, and mismatched channel.
Submitted 29 May, 2023;
originally announced May 2023.
-
A Bilateral Bound on the Mean-Square Error for Estimation in Model Mismatch
Authors:
Amir Weiss,
Alejandro Lancho,
Yuheng Bu,
Gregory W. Wornell
Abstract:
A bilateral (i.e., upper and lower) bound on the mean-square error under a general model mismatch is developed. The bound, which is derived from the variational representation of the chi-square divergence, is applicable in the Bayesian and non-Bayesian frameworks to biased and unbiased estimators. Unlike other classical MSE bounds that depend only on the model, our bound is also estimator-dependent. Thus, it is applicable as a tool for characterizing the MSE of a specific estimator. The proposed bounding technique has a variety of applications, one of which is a tool for proving the consistency of estimators for a class of models. Furthermore, it provides insight into why certain estimators work well under general model mismatch conditions.
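One standard way to obtain a two-sided MSE bound of this type is the Cauchy-Schwarz form of the variational representation of the chi-square divergence, $(\mathbb{E}_P[g] - \mathbb{E}_Q[g])^2 \le \chi^2(P\|Q)\,\mathrm{Var}_Q[g]$, applied to the squared error $g = (\hat\theta - \theta)^2$. This yields $|\mathrm{MSE}_P - \mathrm{MSE}_Q| \le \sqrt{\chi^2(P\|Q)\,\mathrm{Var}_Q[(\hat\theta-\theta)^2]}$, where $P$ is the true model and $Q$ the presumed one. The numerical check below uses a hypothetical finite observation space and an arbitrary biased estimator; it illustrates this generic inequality and is not necessarily the exact form of the bound in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 10
obs = np.arange(k)                                 # finite observation space {0, ..., 9}
Q = rng.dirichlet(np.ones(k))                      # presumed model (strictly positive pmf)
P = 0.7 * Q + 0.3 * rng.dirichlet(np.ones(k))      # true, mismatched model
theta = 3.0                                        # deterministic parameter (non-Bayesian view)
estimate = 0.8 * obs + 0.5                         # an arbitrary, biased estimator theta_hat(y)

g = (estimate - theta) ** 2                        # squared error for each observation
mse_P, mse_Q = P @ g, Q @ g
chi2 = np.sum((P - Q) ** 2 / Q)                    # chi^2(P || Q)
var_Q = Q @ (g - mse_Q) ** 2                       # Var_Q of the squared error
half_width = np.sqrt(chi2 * var_Q)
lower, upper = mse_Q - half_width, mse_Q + half_width
print(f"{lower:.3f} <= MSE_P = {mse_P:.3f} <= {upper:.3f}")
```

Note that the bound depends on the estimator through $\mathrm{Var}_Q[(\hat\theta-\theta)^2]$, which is what makes it estimator-specific rather than a property of the model alone.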
Submitted 14 May, 2023;
originally announced May 2023.
-
On Neural Architectures for Deep Learning-based Source Separation of Co-Channel OFDM Signals
Authors:
Gary C. F. Lee,
Amir Weiss,
Alejandro Lancho,
Yury Polyanskiy,
Gregory W. Wornell
Abstract:
We study the single-channel source separation problem involving orthogonal frequency-division multiplexing (OFDM) signals, which are ubiquitous in many modern-day digital communication systems. Related efforts have been pursued in monaural source separation, where state-of-the-art neural architectures have been adopted to train an end-to-end separator for audio signals (as 1-dimensional time series). In this work, through a prototype problem based on the OFDM source model, we assess, and question, the efficacy of using audio-oriented neural architectures in separating signals based on features pertinent to communication waveforms. Perhaps surprisingly, we demonstrate that in some configurations, where perfect separation is theoretically attainable, these audio-oriented neural architectures perform poorly in separating co-channel OFDM waveforms. Yet, we propose critical domain-informed modifications to the network parameterization, based on insights from OFDM structures, that can yield a performance improvement of about 30 dB.
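For readers less familiar with the OFDM source model, the sketch below shows the standard block structure: each block is an inverse DFT of frequency-domain symbols followed by a cyclic prefix, and a channel shorter than the prefix acts as a single complex gain per subcarrier after prefix removal and a DFT. The number of subcarriers, the prefix length, the QPSK alphabet, and the three-tap channel are illustrative assumptions, not the paper's configuration.

```python
import numpy as np


def ofdm_modulate(symbols, cp_len):
    """Inverse DFT of each row of subcarrier symbols, then prepend a cyclic prefix."""
    blocks = np.fft.ifft(symbols, axis=-1)
    return np.concatenate([blocks[..., -cp_len:], blocks], axis=-1)


def ofdm_demodulate(rx, cp_len):
    """Discard the cyclic prefix and return each block to the frequency domain."""
    return np.fft.fft(rx[..., cp_len:], axis=-1)


rng = np.random.default_rng(0)
n_sub, cp_len, n_blocks = 64, 16, 10
bits_i = rng.choice([-1.0, 1.0], size=(n_blocks, n_sub))
bits_q = rng.choice([-1.0, 1.0], size=(n_blocks, n_sub))
symbols = (bits_i + 1j * bits_q) / np.sqrt(2)        # unit-energy QPSK

tx = ofdm_modulate(symbols, cp_len)                  # shape (n_blocks, n_sub + cp_len)

# A channel shorter than the cyclic prefix (hypothetical taps), applied block by block.
h = np.array([1.0, 0.5, 0.2j])
rx = np.array([np.convolve(block, h)[: n_sub + cp_len] for block in tx])

# After CP removal the linear convolution looks circular, so each subcarrier sees one gain.
H = np.fft.fft(h, n_sub)
Y = ofdm_demodulate(rx, cp_len)
print(np.allclose(Y, H * symbols))
```

The cyclic prefix and the per-subcarrier structure are exactly the kind of features that generic audio-oriented architectures are not designed around.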
Submitted 15 March, 2023; v1 submitted 11 March, 2023;
originally announced March 2023.
-
Group Fairness with Uncertainty in Sensitive Attributes
Authors:
Abhin Shah,
Maohao Shen,
Jongha Jon Ryu,
Subhro Das,
Prasanna Sattigeri,
Yuheng Bu,
Gregory W. Wornell
Abstract:
Learning a fair predictive model is crucial to mitigate biased decisions against minority groups in high-stakes applications. A common approach for learning such a model involves solving an optimization problem that maximizes the predictive power of the model under an appropriate group fairness constraint. However, in practice, sensitive attributes are often missing or noisy, resulting in uncertainty. We demonstrate that solely enforcing fairness constraints on uncertain sensitive attributes can fall significantly short in achieving the level of fairness of models trained without uncertainty. To overcome this limitation, we propose a bootstrap-based algorithm that achieves the target level of fairness despite the uncertainty in sensitive attributes. The algorithm is guided by a Gaussian analysis for the independence notion of fairness where we propose a robust quadratically constrained quadratic program to ensure a strict fairness guarantee with uncertain sensitive attributes. Our algorithm is applicable to both discrete and continuous sensitive attributes and is effective in real-world classification and regression tasks for various group fairness notions, e.g., independence and separation.
Submitted 7 June, 2023; v1 submitted 15 February, 2023;
originally announced February 2023.
-
On counterfactual inference with unobserved confounding
Authors:
Abhin Shah,
Raaz Dwivedi,
Devavrat Shah,
Gregory W. Wornell
Abstract:
Given an observational study with $n$ independent but heterogeneous units, our goal is to learn the counterfactual distribution for each unit using only one $p$-dimensional sample per unit containing covariates, interventions, and outcomes. Specifically, we allow for unobserved confounding that introduces statistical biases between interventions and outcomes and exacerbates the heterogeneity across units. Modeling the conditional distribution of the outcomes as an exponential family, we reduce learning the unit-level counterfactual distributions to learning $n$ exponential family distributions with heterogeneous parameters and only one sample per distribution. We introduce a convex objective that pools all $n$ samples to jointly learn all $n$ parameter vectors, and provide a unit-wise mean squared error bound that scales linearly with the metric entropy of the parameter space. For example, when the parameters are $s$-sparse linear combinations of $k$ known vectors, the error is $O(s\log k/p)$. En route, we derive sufficient conditions for compactly supported distributions to satisfy the logarithmic Sobolev inequality. As an application of the framework, our results enable consistent imputation of sparsely missing covariates.
Submitted 14 September, 2023; v1 submitted 13 November, 2022;
originally announced November 2022.
-
Information-theoretic Characterizations of Generalization Error for the Gibbs Algorithm
Authors:
Gholamali Aminian,
Yuheng Bu,
Laura Toni,
Miguel R. D. Rodrigues,
Gregory W. Wornell
Abstract:
Various approaches have been developed to upper bound the generalization error of a supervised learning algorithm. However, existing bounds are often loose and even vacuous when evaluated in practice. As a result, they may fail to characterize the exact generalization ability of a learning algorithm. Our main contributions are exact characterizations of the expected generalization error of the well-known Gibbs algorithm (a.k.a. Gibbs posterior) using different information measures, in particular, the symmetrized KL information between the input training samples and the output hypothesis. Our result can be applied to tighten existing expected generalization error and PAC-Bayesian bounds. Our information-theoretic approach is versatile, as it also characterizes the generalization error of the Gibbs algorithm with a data-dependent regularizer and that of the Gibbs algorithm in the asymptotic regime, where it converges to the standard empirical risk minimization algorithm. Of particular relevance, our results highlight the role the symmetrized KL information plays in controlling the generalization error of the Gibbs algorithm.
Submitted 18 October, 2022;
originally announced October 2022.
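The Gibbs algorithm draws a hypothesis $w$ with probability proportional to $\pi(w)\exp(-\gamma L_S(w))$. For this form, one exact identity of the kind described above can be derived in a few lines: in $I_{\mathrm{SKL}}(W;S)$ the log-prior and log-partition terms cancel between the joint and product distributions, which leaves expected generalization error $= I_{\mathrm{SKL}}(W;S)/\gamma$. The sketch below checks that identity by brute-force enumeration on a toy problem. The data distribution, loss, hypothesis grid, $n$ and $\gamma$ are arbitrary illustrative assumptions, not settings from the paper.

```python
import itertools
import math

# Toy setup (all values are illustrative assumptions).
P_Z1 = 0.3                # data: Z ~ Bernoulli(0.3)
HYPS = [0.1, 0.5, 0.9]    # finite hypothesis set
PRIOR = [1 / 3] * 3       # uniform prior pi(w)
N = 3                     # training-set size
GAMMA = 2.0               # inverse temperature


def loss(w, z):
    return (w - z) ** 2   # bounded squared loss


def pop_risk(w):
    return P_Z1 * loss(w, 1) + (1 - P_Z1) * loss(w, 0)


def emp_risk(w, s):
    return sum(loss(w, z) for z in s) / len(s)


def gibbs_posterior(s):
    # P(w | s) is proportional to pi(w) * exp(-gamma * L_s(w)).
    weights = [p * math.exp(-GAMMA * emp_risk(w, s)) for w, p in zip(HYPS, PRIOR)]
    total = sum(weights)
    return [x / total for x in weights]


datasets = list(itertools.product([0, 1], repeat=N))
p_s = [math.prod(P_Z1 if z else 1 - P_Z1 for z in s) for s in datasets]
post = [gibbs_posterior(s) for s in datasets]
p_w = [sum(ps * q[k] for ps, q in zip(p_s, post)) for k in range(len(HYPS))]

# Expected generalization error E[L_mu(W) - L_S(W)] under the joint P_{W,S}.
gen = sum(
    ps * q[k] * (pop_risk(w) - emp_risk(w, s))
    for s, ps, q in zip(datasets, p_s, post)
    for k, w in enumerate(HYPS)
)

# Symmetrized KL information: sum of (P_joint - P_W P_S) * log(P(w|s) / P(w)).
i_skl = sum(
    (ps * q[k] - ps * p_w[k]) * math.log(q[k] / p_w[k])
    for ps, q in zip(p_s, post)
    for k in range(len(HYPS))
)

print(gen, i_skl / GAMMA)  # the two numbers agree up to floating-point rounding
```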
-
Can Shadows Reveal Biometric Information?
Authors:
Safa C. Medin,
Amir Weiss,
Frédo Durand,
William T. Freeman,
Gregory W. Wornell
Abstract:
We study the problem of extracting biometric information of individuals by looking at shadows of objects cast on diffuse surfaces. We show that the biometric information leakage from shadows can be sufficient for reliable identity inference under representative scenarios via a maximum likelihood analysis. We then develop a learning-based method that demonstrates this phenomenon in real settings, exploiting the subtle cues in the shadows that are the source of the leakage without requiring any labeled real data. In particular, our approach relies on building synthetic scenes composed of 3D face models obtained from a single photograph of each identity. We transfer what we learn from the synthetic data to the real data using domain adaptation in a completely unsupervised way. Our model is able to generalize well to the real domain and is robust to several variations in the scenes. We report high classification accuracies in an identity classification task that takes place in a scene with unknown geometry and occluding objects.
Submitted 4 October, 2022; v1 submitted 20 September, 2022;
originally announced September 2022.
-
Data-Driven Blind Synchronization and Interference Rejection for Digital Communication Signals
Authors:
Alejandro Lancho,
Amir Weiss,
Gary C. F. Lee,
Jennifer Tang,
Yuheng Bu,
Yury Polyanskiy,
Gregory W. Wornell
Abstract:
We study the potential of data-driven deep learning methods for separation of two communication signals from an observation of their mixture. In particular, we assume knowledge of the generation process of one of the signals, dubbed signal of interest (SOI), and no knowledge of the generation process of the second signal, referred to as interference. This form of the single-channel source separation problem is also referred to as interference rejection. We show that capturing high-resolution temporal structures (nonstationarities), which enables accurate synchronization to both the SOI and the interference, leads to substantial performance gains. With this key insight, we propose a domain-informed neural network (NN) design that is able to improve upon both "off-the-shelf" NNs and classical detection and interference rejection methods, as demonstrated in our simulations. Our findings highlight the key role communication-specific domain knowledge plays in the development of data-driven approaches that hold the promise of unprecedented gains.
Submitted 11 September, 2022;
originally announced September 2022.
-
Exploiting Temporal Structures of Cyclostationary Signals for Data-Driven Single-Channel Source Separation
Authors:
Gary C. F. Lee,
Amir Weiss,
Alejandro Lancho,
Jennifer Tang,
Yuheng Bu,
Yury Polyanskiy,
Gregory W. Wornell
Abstract:
We study the problem of single-channel source separation (SCSS), and focus on cyclostationary signals, which are particularly suitable in a variety of application domains. Unlike classical SCSS approaches, we consider a setting where only examples of the sources are available rather than their models, inspiring a data-driven approach. For source models with underlying cyclostationary Gaussian constituents, we establish a lower bound on the attainable mean squared error (MSE) for any separation method, model-based or data-driven. Our analysis further reveals the operation for optimal separation and the associated implementation challenges. As a computationally attractive alternative, we propose a deep learning approach using a U-Net architecture, which is competitive with the minimum MSE estimator. We demonstrate in simulation that, with suitable domain-informed architectural choices, our U-Net method can approach the optimal performance with substantially reduced computational burden.
Submitted 22 August, 2022;
originally announced August 2022.
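For jointly Gaussian sources the MMSE separator is linear. In the simplest cyclostationary case, the MMSE estimate reduces to a per-sample Wiener gain with a closed-form MSE. This sketch assumes that case: both sources are white Gaussian with known, periodically varying variances and a known cycle phase. That assumption removes the timing uncertainty that makes the paper's setting hard, so the sketch does not reproduce its bound. The variance profiles are hypothetical. A stationary Wiener filter that ignores the cyclic structure is included for comparison.

```python
import random

# Hypothetical periodic variance profiles (period 4) for the SOI s and interference b.
VAR_S = [4.0, 1.0, 0.25, 1.0]
VAR_B = [0.5, 2.0, 2.0, 0.5]


def wiener_separate(y):
    """Per-sample MMSE (Wiener) estimate of s from y = s + b.

    Assumes s and b are independent, zero-mean, white Gaussian with known
    periodic variances and a known phase of the cycle.
    """
    out = []
    for n, yn in enumerate(y):
        vs, vb = VAR_S[n % 4], VAR_B[n % 4]
        out.append(vs / (vs + vb) * yn)
    return out


def mmse_theory(n_samples):
    # Time average of the per-sample MMSE vs * vb / (vs + vb).
    return sum(
        VAR_S[n % 4] * VAR_B[n % 4] / (VAR_S[n % 4] + VAR_B[n % 4])
        for n in range(n_samples)
    ) / n_samples


rng = random.Random(1)
N = 100_000
s = [rng.gauss(0.0, VAR_S[n % 4] ** 0.5) for n in range(N)]
b = [rng.gauss(0.0, VAR_B[n % 4] ** 0.5) for n in range(N)]
y = [sn + bn for sn, bn in zip(s, b)]

s_hat = wiener_separate(y)
mse = sum((a - c) ** 2 for a, c in zip(s, s_hat)) / N

# Baseline that ignores the cyclic structure: one gain from the average variances.
g_stat = (sum(VAR_S) / 4) / (sum(VAR_S) / 4 + sum(VAR_B) / 4)
mse_stat = sum((a - g_stat * yn) ** 2 for a, yn in zip(s, y)) / N

print(mse, mmse_theory(N), mse_stat)  # cyclic Wiener MSE is near theory and below the baseline
```

In this toy case, exploiting the known temporal structure lowers the MSE from about 0.69 (stationary gain) to about 0.42.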
-
Direct Localization in Underwater Acoustics via Convolutional Neural Networks: A Data-Driven Approach
Authors:
Amir Weiss,
Toros Arikan,
Gregory W. Wornell
Abstract:
Direct localization (DLOC) methods, which use the observed data to localize a source at an unknown position in a one-step procedure, generally outperform their indirect two-step counterparts (e.g., using time-difference of arrivals). However, underwater acoustic DLOC methods require prior knowledge of the environment, and are computationally costly, hence slow. We propose what is, to the best of our knowledge, the first data-driven DLOC method. Inspired by classical and contemporary optimal model-based DLOC solutions, and leveraging the capabilities of convolutional neural networks (CNNs), we devise a holistic CNN-based solution. Our method includes a specifically-tailored input structure, architecture, loss function, and a progressive training procedure, which are of independent interest in the broader context of machine learning. We demonstrate that our method outperforms attractive alternatives, and asymptotically matches the performance of an oracle optimal model-based solution.
Submitted 20 July, 2022;
originally announced July 2022.
-
Selective Regression Under Fairness Criteria
Authors:
Abhin Shah,
Yuheng Bu,
Joshua Ka-Wing Lee,
Subhro Das,
Rameswar Panda,
Prasanna Sattigeri,
Gregory W. Wornell
Abstract:
Selective regression allows abstention from prediction if the confidence to make an accurate prediction is not sufficient. In general, by allowing a reject option, one expects the performance of a regression model to increase at the cost of reducing coverage (i.e., by predicting on fewer samples). However, as we show, in some cases, the performance of a minority subgroup can decrease while we reduce the coverage, and thus selective regression can magnify disparities between different sensitive subgroups. Motivated by these disparities, we propose new fairness criteria for selective regression requiring the performance of every subgroup to improve with a decrease in coverage. We prove that if a feature representation satisfies the sufficiency criterion or is calibrated for mean and variance, then the proposed fairness criteria are met. Further, we introduce two approaches to mitigate the performance disparity across subgroups: (a) by regularizing an upper bound of conditional mutual information under a Gaussian assumption and (b) by regularizing a contrastive loss for conditional mean and conditional variance prediction. The effectiveness of these approaches is demonstrated on synthetic and real-world datasets.
Submitted 14 July, 2022; v1 submitted 28 October, 2021;
originally announced October 2021.
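The disparity described above can be reproduced on toy numbers. The sketch assumes per-sample squared errors and confidence scores are already available. A hypothetical `selective_mse` helper (not the paper's method) abstains on the least confident samples first. The data are made up: the minority group has a poorly calibrated confidence score. Reducing coverage then lowers the overall and majority MSE but raises the minority MSE.

```python
import math


def selective_mse(samples, coverage):
    """Overall and per-group MSE when predicting only on the most confident samples.

    samples: list of (group, squared_error, confidence) tuples.
    coverage: fraction in (0, 1] of samples the model predicts on; the rest
    are rejected (abstained on), lowest confidence first.
    """
    k = math.ceil(coverage * len(samples))
    kept = sorted(samples, key=lambda t: t[2], reverse=True)[:k]
    overall = sum(e for _, e, _ in kept) / len(kept)
    per_group = {}
    for g in sorted({g for g, _, _ in kept}):
        errs = [e for gg, e, _ in kept if gg == g]
        per_group[g] = sum(errs) / len(errs)
    return overall, per_group


# Hypothetical data: group 0 is the majority; group 1 is a minority whose
# confidence score is poorly calibrated (high confidence, large error).
data = [
    (0, 0.1, 0.90), (0, 0.1, 0.80), (0, 4.0, 0.20), (0, 4.0, 0.10),
    (1, 2.0, 0.95), (1, 0.0, 0.30),
]
print(selective_mse(data, 1.0))  # overall about 1.7; group 0 about 2.05; group 1 1.0
print(selective_mse(data, 0.5))  # overall about 0.73; group 0 about 0.1; group 1 2.0
```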
-
A Computationally Efficient Method for Learning Exponential Family Distributions
Authors:
Abhin Shah,
Devavrat Shah,
Gregory W. Wornell
Abstract:
We consider the question of learning the natural parameters of a $k$-parameter minimal exponential family from i.i.d. samples in a computationally and statistically efficient manner. We focus on the setting where the support as well as the natural parameters are appropriately bounded. While the traditional maximum likelihood estimator for this class of exponential family is consistent, asymptotically normal, and asymptotically efficient, evaluating it is computationally hard. In this work, we propose a computationally efficient estimator that is consistent as well as asymptotically normal under mild conditions. We provide finite sample guarantees to achieve an ($\ell_2$) error of $α$ in the parameter estimation with sample complexity $O(\mathrm{poly}(k/α))$ and computational complexity ${O}(\mathrm{poly}(k/α))$. To establish these results, we show that, at the population level, our method can be viewed as the maximum likelihood estimation of a re-parameterized distribution belonging to the same class of exponential family.
Submitted 28 October, 2021;
originally announced October 2021.
-
A Semi-Blind Method for Localization of Underwater Acoustic Sources
Authors:
Amir Weiss,
Toros Arikan,
Hari Vishnu,
Grant B. Deane,
Andrew C. Singer,
Gregory W. Wornell
Abstract:
Underwater acoustic localization has traditionally been challenging due to the presence of unknown environmental structure and dynamic conditions. The problem is richer still when such structure includes occlusion, which causes the loss of line-of-sight (LOS) between the acoustic source and the receivers, on which many of the existing localization algorithms rely. We develop a semi-blind passive localization method capable of accurately estimating the source's position even in the possible absence of LOS between the source and all receivers. Based on typically-available prior knowledge of the water surface and bottom, we derive a closed-form expression for the optimal estimator under a multi-ray propagation model, which is suitable for shallow-water environments and high-frequency signals. By exploiting a computationally efficient form of this estimator, our methodology makes comparatively high-resolution localization feasible. We also derive the Cramér-Rao bound for this model, which can be used to guide the placement of collections of receivers so as to optimize localization accuracy. The method improves a balance of accuracy and robustness to environmental model mismatch, relative to existing localization methods that are useful in similar settings. The method is validated with simulations and water tank experiments.
Submitted 2 February, 2023; v1 submitted 27 October, 2021;
originally announced October 2021.
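Multi-ray propagation models of this kind are commonly built with the image-source method: each boundary reflection is treated as a straight path from a mirrored copy of the source. This is a minimal sketch of the ray delays only, not the paper's estimator. It assumes a 2-D geometry, a flat surface and bottom, a constant sound speed, and single-bounce rays. The function name and the 1500 m/s default are illustrative assumptions.

```python
import math


def multiray_delays(src, rcv, depth, c=1500.0):
    """Arrival times (s) of the direct, surface-reflected and bottom-reflected rays.

    Assumes a 2-D waveguide with a flat surface at z = 0, a flat bottom at
    z = depth, constant sound speed c (m/s), and positions given as
    (horizontal range, depth) in meters. Single-bounce paths only.
    """
    (xs, zs), (xr, zr) = src, rcv
    images = {
        "direct": (xs, zs),
        "surface": (xs, -zs),              # mirror of the source in the surface
        "bottom": (xs, 2.0 * depth - zs),  # mirror of the source in the bottom
    }
    return {name: math.hypot(xr - xi, zr - zi) / c for name, (xi, zi) in images.items()}


delays = multiray_delays((0.0, 10.0), (40.0, 20.0), depth=50.0)
print(delays)  # the surface path is 50 m long here (a 30-40-50 triangle)
```

Even when the direct path is occluded, the reflected arrivals still depend on the source position. This is why a model that includes them can support localization without LOS.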
-
Blind Modulo Analog-to-Digital Conversion of Vector Processes
Authors:
Amir Weiss,
Everest Huang,
Or Ordentlich,
Gregory W. Wornell
Abstract:
In a growing number of applications, there is a need to digitize a (possibly high) number of correlated signals whose spectral characteristics are challenging for traditional analog-to-digital converters (ADCs). Examples, among others, include multiple-input multiple-output systems where the ADCs must acquire at once several signals at a very wide but sparsely and dynamically occupied bandwidth supporting diverse services. In such scenarios, the resolution requirements can be prohibitively high. As an alternative, the recently proposed modulo-ADC architecture can in principle require dramatically fewer bits in the conversion to obtain the target fidelity, but requires that spatiotemporal information be known and explicitly taken into account by the analog and digital processing in the converter, which is frequently impractical. Building on our recent work, we address this limitation and develop a blind version of the architecture that requires no such knowledge in the converter. In particular, it features an automatic modulo-level adjustment and a fully adaptive modulo-decoding mechanism, allowing it to asymptotically match the characteristics of the unknown input signal. Simulation results demonstrate the successful operation of the proposed algorithm.
Submitted 12 October, 2021;
originally announced October 2021.
-
What You Can Learn by Staring at a Blank Wall
Authors:
Prafull Sharma,
Miika Aittala,
Yoav Y. Schechner,
Antonio Torralba,
Gregory W. Wornell,
William T. Freeman,
Fredo Durand
Abstract:
We present a passive non-line-of-sight method that infers the number of people or activity of a person from the observation of a blank wall in an unknown room. Our technique analyzes complex imperceptible changes in indirect illumination in a video of the wall to reveal a signal that is correlated with motion in the hidden part of a scene. We use this signal to classify between zero, one, or two moving people, or the activity of a person in the hidden scene. We train two convolutional neural networks using data collected from 20 different scenes, and achieve an accuracy of $\approx94\%$ for both tasks in unseen test environments and real-time online settings. Unlike other passive non-line-of-sight methods, the technique does not rely on known occluders or controllable light sources, and generalizes to unknown rooms with no re-calibration. We analyze the generalization and robustness of our method with both real and synthetic data, and study the effect of the scene parameters on the signal quality.
Submitted 30 August, 2021;
originally announced August 2021.
-
Blind Modulo Analog-to-Digital Conversion
Authors:
Amir Weiss,
Everest Huang,
Or Ordentlich,
Gregory W. Wornell
Abstract:
In a growing number of applications, there is a need to digitize signals whose spectral characteristics are challenging for traditional Analog-to-Digital Converters (ADCs). Examples, among others, include systems where the ADC must acquire at once a very wide but sparsely and dynamically occupied bandwidth supporting diverse services, as well as systems where the signal of interest is subject to strong narrowband co-channel interference. In such scenarios, the resolution requirements can be prohibitively high. As an alternative, the recently proposed modulo-ADC architecture can in principle require dramatically fewer bits in the conversion to obtain the target fidelity, but requires that information about the spectrum be known and explicitly taken into account by the analog and digital processing in the converter, which is frequently impractical. To address this limitation, we develop a blind version of the architecture that requires no such knowledge in the converter, without sacrificing performance. In particular, it features an automatic modulo-level adjustment and a fully adaptive modulo unwrapping mechanism, allowing it to asymptotically match the characteristics of the unknown input signal. In addition to detailed analysis, simulations demonstrate the attractive performance characteristics in representative settings.
Submitted 19 August, 2021;
originally announced August 2021.
-
One-Bit Direct Position Determination of Narrowband Gaussian Signals
Authors:
Amir Weiss,
Gregory W. Wornell
Abstract:
One of the main drawbacks of the well-known Direct Position Determination (DPD) method is the requirement that raw signal data be transferred to a common processor. It would therefore be of high practical value if DPD, or a modified version thereof, could be successfully applied to a coarsely quantized version of the raw data, thus alleviating the requirements on the communication links between the different base stations. Motivated by the above, and inspired by recent work in the rejuvenated one-bit array processing field, we present One-Bit DPD: a method for direct localization based on one-bit quantized measurements. We show that despite the coarse quantization, the proposed method nonetheless yields an estimate for the unknown emitter position with appealing asymptotic properties. We further establish the underlying identifiability conditions of this model, which rely only on second-order statistics. Empirical simulation results corroborate our analytical derivations, demonstrating that much of the information regarding the unknown emitter position is preserved under this crude form of quantization.
Submitted 25 May, 2021; v1 submitted 29 October, 2020;
originally announced October 2020.
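One reason second-order information survives one-bit quantization is the classical arcsine law. For zero-mean jointly Gaussian $x$ and $y$ with correlation coefficient $\rho$, $E[\mathrm{sign}(x)\,\mathrm{sign}(y)] = \frac{2}{\pi}\arcsin\rho$. So normalized correlations can be recovered from signs alone, although absolute power cannot. The sketch illustrates only this ingredient, assuming real-valued Gaussian samples. It is not the One-Bit DPD estimator, and the helper names are hypothetical.

```python
import math
import random


def _sign(v):
    return 1.0 if v >= 0 else -1.0


def one_bit_correlation(x, y):
    """Estimate the correlation of zero-mean jointly Gaussian x, y from signs only.

    Inverts the arcsine law E[sign(x) sign(y)] = (2 / pi) * arcsin(rho).
    """
    r = sum(_sign(a) * _sign(b) for a, b in zip(x, y)) / len(x)
    return math.sin(math.pi * r / 2.0)


rng = random.Random(0)
rho = 0.6
n = 200_000
x, y = [], []
for _ in range(n):
    u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x.append(u)
    y.append(rho * u + math.sqrt(1 - rho ** 2) * v)  # corr(x, y) = rho by construction

est = one_bit_correlation(x, y)
print(est)  # close to 0.6, computed from one-bit samples only
```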
-
On Learning Continuous Pairwise Markov Random Fields
Authors:
Abhin Shah,
Devavrat Shah,
Gregory W. Wornell
Abstract:
We consider learning a sparse pairwise Markov Random Field (MRF) with continuous-valued variables from i.i.d. samples. We adapt the algorithm of Vuffray et al. (2019) to this setting and provide finite-sample analysis revealing sample complexity scaling logarithmically with the number of variables, as in the discrete and Gaussian settings. Our approach is applicable to a large class of pairwise MRFs with continuous variables and also has desirable asymptotic properties, including consistency and normality under mild conditions. Further, we establish that the population version of the optimization criterion employed in Vuffray et al. (2019) can be interpreted as local maximum likelihood estimation (MLE). As part of our analysis, we introduce a robust variation of sparse linear regression à la Lasso, which may be of interest in its own right.
Submitted 28 October, 2020;
originally announced October 2020.
-
Computational Mirrors: Blind Inverse Light Transport by Deep Matrix Factorization
Authors:
Miika Aittala,
Prafull Sharma,
Lukas Murmann,
Adam B. Yedidia,
Gregory W. Wornell,
William T. Freeman,
Fredo Durand
Abstract:
We recover a video of the motion taking place in a hidden scene by observing changes in indirect illumination in a nearby uncalibrated visible region. We solve this problem by factoring the observed video into a matrix product between the unknown hidden scene video and an unknown light transport matrix. This task is extremely ill-posed, as any non-negative factorization will satisfy the data. Inspired by recent work on the Deep Image Prior, we parameterize the factor matrices using randomly initialized convolutional neural networks trained in a one-off manner, and show that this results in decompositions that reflect the true motion in the hidden scene.
Submitted 4 December, 2019;
originally announced December 2019.
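A few lines are enough to show why the factorization is ill-posed. Under the assumed convention that the observed video is $Z = TL$, with $T$ the light transport matrix and $L$ the hidden-scene video, any positive diagonal $D$ gives another nonnegative factorization $(TD)(D^{-1}L)$ that fits the data exactly. The matrices below are made-up toy values. The sketch only exhibits the ambiguity that the paper's deep-prior parameterization is meant to resolve; it does not implement that parameterization.

```python
def matmul(a, b):
    """Plain-Python matrix product of lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Hypothetical nonnegative light transport T (pixels x hidden cells) and
# hidden-scene video L (hidden cells x frames); the observation is Z = T L.
T = [[1.0, 0.5], [0.2, 1.0], [0.7, 0.3]]
L = [[0.0, 2.0, 1.0], [1.0, 0.0, 3.0]]
Z = matmul(T, L)

# A positive diagonal rescaling yields a different nonnegative factorization.
D = [[2.0, 0.0], [0.0, 0.5]]
D_inv = [[0.5, 0.0], [0.0, 2.0]]
T2, L2 = matmul(T, D), matmul(D_inv, L)
Z2 = matmul(T2, L2)
print(Z2 == Z)  # True: the observed video cannot distinguish the two factorizations
```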
-
On Universal Features for High-Dimensional Learning and Inference
Authors:
Shao-Lun Huang,
Anuran Makur,
Gregory W. Wornell,
Lizhong Zheng
Abstract:
We consider the problem of identifying universal low-dimensional features from high-dimensional data for inference tasks in settings involving learning. For such problems, we introduce natural notions of universality and we show a local equivalence among them. Our analysis is naturally expressed via information geometry, and represents a conceptually and computationally useful analysis. The development reveals the complementary roles of the singular value decomposition, Hirschfeld-Gebelein-Rényi maximal correlation, the canonical correlation and principal component analyses of Hotelling and Pearson, Tishby's information bottleneck, Wyner's common information, Ky Fan $k$-norms, and Breiman and Friedman's alternating conditional expectations algorithm. We further illustrate how this framework facilitates understanding and optimizing aspects of learning systems, including multinomial logistic (softmax) regression and the associated neural network architecture, matrix factorization methods for collaborative filtering and other applications, rank-constrained multivariate linear regression, and forms of semi-supervised learning.
Submitted 20 November, 2019;
originally announced November 2019.
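A concrete instance links the singular value decomposition to Hirschfeld-Gebelein-Rényi maximal correlation. For finite alphabets, take the matrix with entries $B(x,y) = P(x,y)/\sqrt{P(x)P(y)}$. Its largest singular value is 1, and its second singular value equals the HGR maximal correlation. For binary $X$ and $Y$ every function is affine, so the HGR maximal correlation reduces to the absolute Pearson correlation. The sketch checks this on hypothetical 2×2 joint pmfs using only the standard library; the helper names are illustrative.

```python
import math


def dtm_singular_values(p):
    """Singular values of B[x][y] = P(x,y) / sqrt(P(x) P(y)) for a 2x2 joint pmf p."""
    px = [p[0][0] + p[0][1], p[1][0] + p[1][1]]
    py = [p[0][0] + p[1][0], p[0][1] + p[1][1]]
    b = [[p[x][y] / math.sqrt(px[x] * py[y]) for y in range(2)] for x in range(2)]
    # Eigenvalues of B^T B from its trace (squared Frobenius norm) and determinant.
    tr = sum(b[i][j] ** 2 for i in range(2) for j in range(2))
    det = (b[0][0] * b[1][1] - b[0][1] * b[1][0]) ** 2
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    return math.sqrt(tr / 2 + disc), math.sqrt(max(tr / 2 - disc, 0.0))


def pearson_binary(p):
    """Pearson correlation of binary X, Y with joint pmf p (rows: x, columns: y)."""
    px1, py1 = p[1][0] + p[1][1], p[0][1] + p[1][1]
    cov = p[1][1] - px1 * py1
    return cov / math.sqrt(px1 * (1 - px1) * py1 * (1 - py1))


p = [[0.4, 0.1], [0.2, 0.3]]  # hypothetical joint pmf of (X, Y)
s1, s2 = dtm_singular_values(p)
print(s1, s2, abs(pearson_binary(p)))  # s1 is 1; s2 matches |Pearson correlation|
```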
-
Super-Nyquist Rateless Coding for Intersymbol Interference Channels
Authors:
Uri Erez,
Gregory W. Wornell
Abstract:
A rateless transmission architecture is developed for communication over Gaussian intersymbol interference channels, based on the concept of super-Nyquist (SNQ) signaling. In such systems, the signaling rate is chosen significantly higher than the Nyquist rate of the system. We show that such signaling, when used in conjunction with good "off-the-shelf" base codes, simple linear redundancy, and minimum mean-square error decision feedback equalization, results in capacity-approaching, low-complexity rateless codes for the time-varying intersymbol-interference channel. Constructions for both single-input / single-output (SISO) and multi-input / multi-output (MIMO) ISI channels are developed.
Submitted 18 November, 2019;
originally announced November 2019.
-
An Information Theoretic Interpretation to Deep Neural Networks
Authors:
Shao-Lun Huang,
Xiangxiang Xu,
Lizhong Zheng,
Gregory W. Wornell
Abstract:
It is commonly believed that the hidden layers of deep neural networks (DNNs) attempt to extract informative features for learning tasks. In this paper, we formalize this intuition by showing that the features extracted by DNNs coincide with the result of an optimization problem, which we call the "universal feature selection" problem, in a local analysis regime. We interpret weight training in DNNs as the projection of feature functions between feature spaces, specified by the network structure. Our formulation has direct operational meaning in terms of the performance for inference tasks, and gives interpretations of the internal computation results of DNNs. Results of numerical experiments are provided to support the analysis.
Submitted 16 May, 2019;
originally announced May 2019.
-
Bregman Divergence Bounds and Universality Properties of the Logarithmic Loss
Authors:
Amichai Painsky,
Gregory W. Wornell
Abstract:
A loss function measures the discrepancy between the true values and their estimated fits, for a given instance of data. In classification problems, a loss function is said to be proper if a minimizer of the expected loss is the true underlying probability. We show that for binary classification, the divergence associated with smooth, proper, and convex loss functions is upper bounded by the Kullback-Leibler (KL) divergence, to within a normalization constant. This implies that by minimizing the logarithmic loss associated with the KL divergence, we minimize an upper bound to any choice of loss from this set. As such the logarithmic loss is universal in the sense of providing performance guarantees with respect to a broad class of accuracy measures. Importantly, this notion of universality is not problem-specific, enabling its use in diverse applications, including predictive modeling, data clustering and sample complexity analysis. Generalizations to arbitrary finite alphabets are also developed. The derived inequalities extend several well-known $f$-divergence results.
Submitted 2 January, 2020; v1 submitted 14 October, 2018;
originally announced October 2018.
-
A Modulo-Based Architecture for Analog-to-Digital Conversion
Authors:
Or Ordentlich,
Gizem Tabak,
Pavan Kumar Hanumolu,
Andrew C. Singer,
Gregory W. Wornell
Abstract:
Systems that capture and process analog signals must first acquire them through an analog-to-digital converter. While subsequent digital processing can remove statistical correlations present in the acquired data, the dynamic range of the converter is typically scaled to match that of the input analog signal. The present paper develops an approach for analog-to-digital conversion that aims at minimizing the number of bits per sample at the output of the converter. This is attained by reducing the dynamic range of the analog signal by performing a modulo operation on its amplitude, and then quantizing the result. While the converter itself is universal and agnostic of the statistics of the signal, the decoder operation on the output of the quantizer can exploit the statistical structure in order to unwrap the modulo folding. The performance of this method is shown to approach information-theoretic limits, as captured by the rate-distortion function, in various settings. An architecture for modulo analog-to-digital conversion via ring oscillators is suggested, and its merits are numerically demonstrated.
Submitted 23 June, 2018;
originally announced June 2018.
-
On the Universality of the Logistic Loss Function
Authors:
Amichai Painsky,
Gregory W. Wornell
Abstract:
A loss function measures the discrepancy between the true values (observations) and their estimated fits, for a given instance of data. A loss function is said to be proper (unbiased, Fisher consistent) if the fits are defined over a unit simplex, and the minimizer of the expected loss is the true underlying probability of the data. Typical examples are the zero-one loss, the quadratic loss and the Bernoulli log-likelihood loss (log-loss). In this work we show that for binary classification problems, the divergence associated with smooth, proper and convex loss functions is bounded from above by the Kullback-Leibler (KL) divergence, up to a multiplicative normalization constant. It implies that by minimizing the log-loss (associated with the KL divergence), we minimize an upper bound to any choice of loss functions from this set. This property justifies the broad use of log-loss in regression, decision trees, deep neural networks and many other applications. In addition, we show that the KL divergence bounds from above any separable Bregman divergence that is convex in its second argument (up to a multiplicative normalization constant). This result introduces a new set of divergence inequalities, similar to the well-known Pinsker inequality.
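The simplest instance of the bound described above can be checked numerically. For the quadratic (Brier) loss, the divergence between the truth Bernoulli(p) and the fit Bernoulli(q) works out to 2(p - q)^2, and Pinsker's inequality gives KL(p || q) >= 2(p - q)^2 in nats, so for this loss the normalization constant is 1. The sketch below, with illustrative function names, only evaluates this one special case on a grid; it is not a proof of the general result.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence D(Bernoulli(p) || Bernoulli(q)) in nats; requires 0 < q < 1."""
    total = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        if a > 0:  # the term 0 * log(0 / b) is taken as 0
            total += a * math.log(a / b)
    return total

def brier_divergence(p, q):
    """Expected excess quadratic loss of predicting (q, 1-q) when the truth is (p, 1-p)."""
    return 2 * (p - q) ** 2

# Smallest value of KL minus the Brier divergence over a grid of (p, q) pairs.
grid = [i / 100 for i in range(1, 100)]
worst_gap = min(kl_bernoulli(p, q) - brier_divergence(p, q) for p in grid for q in grid)
```

The ratio of the two divergences approaches 1 as q approaches p = 1/2, so for the quadratic loss the constant 1 cannot be reduced.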
Submitted 10 May, 2018;
originally announced May 2018.
-
Revealing hidden scenes by photon-efficient occlusion-based opportunistic active imaging
Authors:
Feihu Xu,
Gal Shulkind,
Christos Thrampoulidis,
Jeffrey H. Shapiro,
Antonio Torralba,
Franco N. C. Wong,
Gregory W. Wornell
Abstract:
The ability to see around corners, i.e., recover details of a hidden scene from its reflections in the surrounding environment, is of considerable interest in a wide range of applications. However, the diffuse nature of light reflected from typical surfaces leads to mixing of spatial information in the collected light, precluding useful scene reconstruction. Here, we employ a computational imaging technique that opportunistically exploits the presence of occluding objects, which obstruct probe-light propagation in the hidden scene, to undo the mixing and greatly improve scene recovery. Importantly, our technique obviates the need for the ultrafast time-of-flight measurements employed by most previous approaches to hidden-scene imaging. Moreover, it does so in a photon-efficient manner based on an accurate forward model and a computational algorithm that, together, respect the physics of three-bounce light propagation and single-photon detection. Using our methodology, we demonstrate reconstruction of hidden-surface reflectivity patterns in a meter-scale environment from non-time-resolved measurements. Ultimately, our technique represents an instance of a rich and promising new imaging modality with important potential implications for imaging science.
Submitted 10 February, 2018;
originally announced February 2018.
-
Exploiting Occlusion in Non-Line-of-Sight Active Imaging
Authors:
Christos Thrampoulidis,
Gal Shulkind,
Feihu Xu,
William T. Freeman,
Jeffrey H. Shapiro,
Antonio Torralba,
Franco N. C. Wong,
Gregory W. Wornell
Abstract:
Active non-line-of-sight imaging systems are of growing interest for diverse applications. The most commonly proposed approaches to date rely on exploiting time-resolved measurements, i.e., measuring the time it takes for short light pulses to transit the scene. This typically requires expensive, specialized, ultrafast lasers and detectors that must be carefully calibrated. We develop an alternative approach that exploits the valuable role that natural occluders in a scene play in enabling accurate and practical image formation in such settings without such hardware complexity. In particular, we demonstrate that the presence of occluders in the hidden scene can obviate the need for collecting time-resolved measurements, and develop an accompanying analysis for such systems and their generalizations. Ultimately, the results suggest the potential to develop increasingly sophisticated future systems that are able to identify and exploit diverse structural features of the environment to reconstruct scenes hidden from view.
Submitted 16 November, 2017;
originally announced November 2017.
-
Covert Communication with Channel-State Information at the Transmitter
Authors:
Si-Hyeon Lee,
Ligong Wang,
Ashish Khisti,
Gregory W. Wornell
Abstract:
We consider the problem of covert communication over a state-dependent channel, where the transmitter has causal or noncausal knowledge of the channel states. Here, "covert" means that a warden on the channel should observe similar statistics when the transmitter is sending a message and when it is not. When a sufficiently long secret key is shared between the transmitter and the receiver, we derive closed-form formulas for the maximum achievable covert communication rate ("covert capacity") for discrete memoryless channels and, when the transmitter's channel-state information (CSI) is noncausal, for additive white Gaussian noise (AWGN) channels. For certain channel models, including the AWGN channel, we show that the covert capacity is positive with CSI at the transmitter, but is zero without CSI. We also derive lower bounds on the rate of the secret key that is needed for the transmitter and the receiver to achieve the covert capacity.
Submitted 8 August, 2017;
originally announced August 2017.
-
Asynchronous Massive Access and Neighbor Discovery Using OFDMA
Authors:
Xu Chen,
Lina Liu,
Dongning Guo,
Gregory W. Wornell
Abstract:
The fundamental communication problem in the wireless Internet of Things (IoT) is to discover a massive number of devices and to allow them reliable access to shared channels. Oftentimes these devices transmit short messages randomly and sporadically. This paper proposes a novel signaling scheme for grant-free massive access, where each device encodes its identity and/or information in a sparse set of tones. Such transmissions are implemented in the form of orthogonal frequency-division multiple access (OFDMA). Under some mild conditions and assuming device delays to be bounded unknown multiples of symbol intervals, sparse OFDMA is proved to enable arbitrarily reliable asynchronous device identification and message decoding with a codelength that is $O(K(\log K + \log S + \log N))$, where $N$ denotes the device population, $K$ denotes the actual number of active devices, and $\log S$ is essentially equal to the number of bits a device can send (including its identity). By exploiting the Fast Fourier Transform (FFT), the computational complexity for discovery and decoding can be made to be sub-linear in the total device population. To prove the concept, a specific design is proposed to identify up to 100 active devices out of $2^{38}$ possible devices with up to 20 symbols of delay and moderate signal-to-noise ratios and fading. The codelength compares much more favorably with those of standard slotted ALOHA and carrier-sensing multiple access (CSMA) schemes.
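The core idea of encoding an identity in a sparse set of tones can be illustrated with a toy, synchronous, noiseless model. Each hypothetical device owns three of 16 subcarriers, chosen by the illustrative `tones` map so that any two devices share at most one tone; the union of any two devices' tones therefore cannot cover a third device's tones, and up to two simultaneously active devices can be identified from which tones carry energy. This is a sketch under stated assumptions, not the paper's design: a direct DFT stands in for the FFT, the decoder scans all identities rather than achieving sub-linear complexity, and unknown delays, fading, and noise are not modeled.

```python
import cmath

N = 16  # hypothetical number of subcarriers (DFT size)

def tones(device_id):
    """Hypothetical identity-to-tone map for ids 0..15; two devices share at most one tone."""
    a, b = device_id % 4, device_id // 4
    return {a, 4 + b, 8 + (a + b) % 4}

def transmit(active_ids):
    """Superpose unit-amplitude tones of all active devices (synchronous, noiseless)."""
    signal = [0j] * N
    for d in active_ids:
        for t in tones(d):
            for n in range(N):
                signal[n] += cmath.exp(2j * cmath.pi * t * n / N)
    return signal

def dft(x):
    """Direct O(N^2) DFT; a practical receiver would use an FFT."""
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def detect(signal, num_ids=16):
    """Declare a device active if every one of its tones carries energy."""
    spectrum = dft(signal)
    on = {k for k, v in enumerate(spectrum) if abs(v) > N / 2}
    return sorted(d for d in range(num_ids) if tones(d) <= on)

active = detect(transmit([3, 10]))
```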
Submitted 19 November, 2021; v1 submitted 28 June, 2017;
originally announced June 2017.
-
Sensor Array Design Through Submodular Optimization
Authors:
Gal Shulkind,
Stefanie Jegelka,
Gregory W. Wornell
Abstract:
We consider the problem of far-field sensing by means of a sensor array. Traditional array geometry design techniques are agnostic to prior information about the far-field scene. However, in many applications such priors are available and may be utilized to design more efficient array topologies. We formulate the problem of array geometry design with scene prior as one of finding a sampling configuration that enables efficient inference, which turns out to be a combinatorial optimization problem. While generic combinatorial optimization problems are NP-hard and resist efficient solvers, we show how for array design problems the theory of submodular optimization may be utilized to obtain efficient algorithms that are guaranteed to achieve solutions within a constant approximation factor of the optimum. We leverage the connection between array design problems and submodular optimization and port several results of interest. We demonstrate efficient methods for designing arrays with constraints on the sensing aperture, as well as arrays respecting combinatorial placement constraints. This novel connection between array design and submodularity suggests the possibility for utilizing other insights and techniques from the growing body of literature on submodular optimization in the field of array design.
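As a sketch of the kind of algorithm the submodular connection enables, the code below runs the classic greedy rule for maximizing a monotone submodular function under a cardinality constraint (choose k sensor positions), for which the standard guarantee is a (1 - 1/e) fraction of the optimum. The coverage objective and the candidate positions in `COVERS` are hypothetical stand-ins for the scene-prior inference objective in the paper, and aperture or placement constraints are not modeled.

```python
import math
from itertools import combinations

# Hypothetical candidate sensor positions, each "covering" a set of scene directions.
COVERS = {
    0: {1, 2, 3},
    1: {3, 4},
    2: {4, 5, 6},
    3: {1, 6},
    4: {2, 5},
    5: {7},
}

def coverage(selection):
    """Monotone submodular objective: number of distinct directions covered."""
    covered = set()
    for s in selection:
        covered |= COVERS[s]
    return len(covered)

def greedy(k, f=coverage, candidates=tuple(COVERS)):
    """Add, k times, the candidate with the largest marginal gain in f."""
    chosen = []
    for _ in range(k):
        remaining = [c for c in candidates if c not in chosen]
        best = max(remaining, key=lambda c: f(chosen + [c]) - f(chosen))
        chosen.append(best)
    return chosen

def brute_force(k, f=coverage, candidates=tuple(COVERS)):
    """Exhaustive optimum over all size-k subsets; feasible only for tiny examples."""
    return max(f(list(c)) for c in combinations(candidates, k))

selection = greedy(2)
```

Greedy evaluates the objective O(nk) times for n candidates, whereas the exhaustive search grows combinatorially, which is the gap the approximation guarantee makes worthwhile.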
Submitted 28 December, 2017; v1 submitted 18 May, 2017;
originally announced May 2017.
-
Photon-efficient quantum cryptography with pulse-position modulation
Authors:
Tian Zhong,
Feihu Xu,
Zheshen Zhang,
Hongchao Zhou,
Alessandro Restelli,
Joshua C. Bienfang,
Ligong Wang,
Gregory W. Wornell,
Jeffrey H. Shapiro,
Franco N. C. Wong
Abstract:
The binary (one-bit-per-photon) encoding that most existing quantum key distribution (QKD) protocols employ puts a fundamental limit on their achievable key rates, especially under high channel loss conditions associated with long-distance fiber-optic or satellite-to-ground links. Inspired by the pulse-position-modulation (PPM) approach to photon-starved classical communications, we design and demonstrate the first PPM-QKD, whose security against collective attacks is established through continuous-variable entanglement measurements that also enable a novel decoy-state protocol performed conveniently in post processing. We achieve a throughput of 8.0 Mbit/s (2.5 Mbit/s for loss equivalent to 25 km of fiber) and secret-key capacity up to 4.0 bits per detected photon, thus demonstrating the significant enhancement afforded by high-dimensional encoding. These results point to a new avenue for realizing high-throughput satellite-based or long-haul fiber-optic quantum communications beyond their photon-reception-rate limits.
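The photon-efficiency argument rests on the classical PPM mapping: with frames of 2^m time slots and one pulse per frame, the position of a single detected photon conveys m bits, whereas binary encoding conveys at most one. The sketch below shows only that mapping, with illustrative function names; it ignores loss, dark counts, and timing jitter, and does not model the entanglement-based security checks or the decoy-state protocol of the demonstrated PPM-QKD system.

```python
def ppm_encode(bits, m):
    """Map each group of m bits to one frame of 2**m slots containing a single pulse."""
    if len(bits) % m:
        raise ValueError("number of bits must be a multiple of m")
    frames = []
    for start in range(0, len(bits), m):
        slot = int("".join(str(b) for b in bits[start:start + m]), 2)
        frame = [0] * (1 << m)
        frame[slot] = 1
        frames.append(frame)
    return frames

def ppm_decode(frames, m):
    """Recover m bits per frame from the position of its single pulse."""
    bits = []
    for frame in frames:
        slot = frame.index(1)
        bits.extend(int(c) for c in format(slot, f"0{m}b"))
    return bits

message = [1, 0, 1, 1, 0, 0, 1, 0]
frames = ppm_encode(message, 4)  # two 16-slot frames, one pulse each
```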
Submitted 20 October, 2015;
originally announced October 2015.
-
Toward Photon-Efficient Key Distribution over Optical Channels
Authors:
Yuval Kochman,
Ligong Wang,
Gregory W. Wornell
Abstract:
This work considers the distribution of a secret key over an optical (bosonic) channel in the regime of high photon efficiency, i.e., when the number of secret key bits generated per detected photon is high. While in principle the photon efficiency is unbounded, there is an inherent tradeoff between this efficiency and the key generation rate (with respect to the channel bandwidth). We derive asymptotic expressions for the optimal generation rates in the photon-efficient limit, and propose schemes that approach these limits up to certain approximations. The schemes are practical, in the sense that they use coherent or temporally-entangled optical states and direct photodetection, all of which are reasonably easy to realize in practice, in conjunction with off-the-shelf classical codes.
Submitted 22 July, 2014;
originally announced July 2014.
-
A refined analysis of the Poisson channel in the high-photon-efficiency regime
Authors:
Ligong Wang,
Gregory W. Wornell
Abstract:
We study the discrete-time Poisson channel under the constraint that its average input power (in photons per channel use) must not exceed some constant E. We consider the wideband, high-photon-efficiency extreme where E approaches zero, and where the channel's "dark current" approaches zero proportionally with E. Improving over a previously obtained first-order capacity approximation, we derive a refined approximation, which includes the exact characterization of the second-order term, as well as an asymptotic characterization of the third-order term with respect to the dark current. We also show that pulse-position modulation is nearly optimal in this regime.
Submitted 23 April, 2014; v1 submitted 22 January, 2014;
originally announced January 2014.
-
Geometric Relationships Between Gaussian and Modulo-Lattice Error Exponents
Authors:
Charles H. Swannack,
Uri Erez,
Gregory W. Wornell
Abstract:
Lattice coding and decoding have been shown to achieve the capacity of the additive white Gaussian noise (AWGN) channel. This was accomplished using a minimum mean-square error scaling and randomization to transform the AWGN channel into a modulo-lattice additive noise channel of the same capacity. It has been further shown that when operating at rates below capacity but above the critical rate of the channel, there exists a rate-dependent scaling such that the associated modulo-lattice channel attains the error exponent of the AWGN channel. A geometric explanation for this result is developed. In particular, it is shown how the geometry of typical error events for the modulo-lattice channel coincides with that of a spherical code for the AWGN channel.
Submitted 7 August, 2013;
originally announced August 2013.
-
Update-Efficiency and Local Repairability Limits for Capacity Approaching Codes
Authors:
Arya Mazumdar,
Venkat Chandar,
Gregory W. Wornell
Abstract:
Motivated by distributed storage applications, we investigate the degree to which capacity-achieving encodings can be efficiently updated when a single information bit changes, and the degree to which such encodings can be efficiently (i.e., locally) repaired when a single encoded bit is lost.
Specifically, we first develop conditions under which optimum error-correction and update-efficiency are possible, and establish that the number of encoded bits that must change in response to a change in a single information bit must scale logarithmically in the block-length of the code if we are to achieve any nontrivial rate with vanishing probability of error over the binary erasure or binary symmetric channels. Moreover, we show there exist capacity-achieving codes with this scaling.
With respect to local repairability, we develop tight upper and lower bounds on the number of remaining encoded bits that are needed to recover a single lost bit of the encoding. In particular, we show that if the code rate is $ε$ less than the capacity, then for optimal codes, the maximum number of codeword symbols required to recover one lost symbol must scale as $\log(1/ε)$.
Several variations on, and extensions of, these results are also developed.
Submitted 5 October, 2013; v1 submitted 14 May, 2013;
originally announced May 2013.
-
Private-Capacity Bounds for Bosonic Wiretap Channels
Authors:
Ligong Wang,
Jeffrey H. Shapiro,
Nivedita Chandrasekaran,
Gregory W. Wornell
Abstract:
We prove an upper bound on the private capacity of the single-mode noiseless bosonic wiretap channel. Combined with a previous lower bound, we obtain the low photon-number asymptotic expression for the private capacity. We then show that the multiple-mode noiseless bosonic wiretap channel is equivalent to parallel single-mode channels, hence the single-mode bounds can be applied. Finally, we consider multiple-spatial-mode propagation through atmospheric turbulence, and derive a private-capacity lower bound that only requires second moments of the channel matrix.
Submitted 6 February, 2012;
originally announced February 2012.
-
A Simple Message-Passing Algorithm for Compressed Sensing
Authors:
Venkat Chandar,
Devavrat Shah,
Gregory W. Wornell
Abstract:
We consider the recovery of a nonnegative vector $x$ from measurements $y = Ax$, where $A$ is an $m \times n$ matrix whose entries are in $\{0, 1\}$. We establish that when $A$ corresponds to the adjacency matrix of a bipartite graph with sufficient expansion, a simple message-passing algorithm produces an estimate $\hat{x}$ of $x$ satisfying $\|x-\hat{x}\|_1 \leq O(n/k)\,\|x-x(k)\|_1$, where $x(k)$ is the best $k$-sparse approximation of $x$. The algorithm performs $O(n (\log(n/k))^2 \log k)$ computation in total, and the number of measurements required is $m = O(k \log(n/k))$. In the special case when $x$ is $k$-sparse, the algorithm recovers $x$ exactly in time $O(n \log(n/k) \log k)$. Ultimately, this work is a further step in the direction of more formally developing the broader role of message-passing algorithms in solving compressed sensing problems.
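To make the message-passing idea concrete, the sketch below propagates interval bounds for a nonnegative x over the bipartite graph defined by a 0/1 matrix A: because every entry of x is nonnegative, each measurement caps each of its variables from above, and subtracting the other variables' upper (or lower) bounds from a measurement yields lower (or upper) bounds. It is a simplified illustration in the spirit of the abstract, written with the hypothetical name `bound_propagation`, and not the paper's exact algorithm; the stated guarantees require A to have sufficient expansion, which a toy matrix cannot be relied on to provide.

```python
def bound_propagation(A, y, max_iters=50):
    """Refine lower/upper bounds on a nonnegative x from y = A x, with A a 0/1 matrix.

    Returns (lower, upper); where they coincide, that entry of x is recovered exactly.
    Assumes every column of A has at least one nonzero entry.
    """
    m, n = len(A), len(A[0])
    checks = [[i for i in range(n) if A[j][i]] for j in range(m)]  # variables in each measurement
    nbrs = [[j for j in range(m) if A[j][i]] for i in range(n)]    # measurements touching each variable
    lower = [0.0] * n
    upper = [min(y[j] for j in nbrs[i]) for i in range(n)]  # x_i <= y_j because x >= 0
    for _ in range(max_iters):
        # x_i >= y_j minus the upper bounds of the other variables in measurement j
        new_lower = [max([lower[i]] + [y[j] - sum(upper[k] for k in checks[j] if k != i)
                                       for j in nbrs[i]]) for i in range(n)]
        # x_i <= y_j minus the lower bounds of the other variables in measurement j
        new_upper = [min([upper[i]] + [y[j] - sum(new_lower[k] for k in checks[j] if k != i)
                                       for j in nbrs[i]]) for i in range(n)]
        if new_lower == lower and new_upper == upper:
            break
        lower, upper = new_lower, new_upper
    return lower, upper

# Toy 4 x 6 matrix: each column has two ones, and no two columns share both rows.
A = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
x = [0, 3, 0, 0, 0, 0]
y = [sum(a * b for a, b in zip(row, x)) for row in A]
lower, upper = bound_propagation(A, y)  # here the bounds meet at x, i.e., exact recovery
```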
Submitted 22 January, 2010;
originally announced January 2010.