Search | arXiv e-print repository

arXiv:2504.19773 [pdf, ps, other]

Sliding Window Adversarial Channels

Authors: Bikash Kumar Dey, Sidharth Jaggi, Michael Langberg, Anand D. Sarwate, Yihan Zhang

Abstract: In an arbitrarily varying channel (AVC), the channel has a state which is under the control of an adversarial jammer and the corresponding capacities are often functions of the "power" constraints on the transmitter and jammer. In this paper we propose a model in which the constraints must hold almost surely over contiguous subsequences of the codeword and state, which we call a sliding window constraint. We study oblivious jammers and codes with stochastic encoding under maximum probability of error. We show that this extra limitation on the jammer is beneficial for the transmitter: in some cases, the capacity for unique decoding with a sliding window constraint is equal to the capacity for list decoding in the standard model without sliding windows, roughly implying that the addition of window constraints reduces list decoding to unique decoding. The list decoding capacity in the standard model can be strictly larger than the unique decoding capacity.

Submitted 28 April, 2025; originally announced April 2025.

Comments: Submitted manuscript accepted to ISIT 2025

MSC Class: 94A40
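
As an illustration of the sliding window constraint described in the abstract of arXiv:2504.19773, the following minimal Python sketch compares a global average-cost constraint with a windowed one. The function names, the window length, and the cost sequences are hypothetical, and the paper's formal definition may differ (for example, in how the window length scales with the block length).

```python
def satisfies_global(costs, budget):
    """True if the average cost over the whole sequence is at most `budget`."""
    return sum(costs) <= budget * len(costs)


def satisfies_sliding_window(costs, window, budget):
    """True if every contiguous run of `window` symbols has average cost <= `budget`."""
    if not 0 < window <= len(costs):
        raise ValueError("window must be between 1 and len(costs)")
    running = sum(costs[:window])  # total cost of the first window
    if running > budget * window:
        return False
    for i in range(window, len(costs)):
        # Slide the window one step: add the new symbol, drop the oldest.
        running += costs[i] - costs[i - window]
        if running > budget * window:
            return False
    return True


# Hypothetical per-symbol jamming costs (1 = jam, 0 = idle).
bursty = [1, 1, 1, 1, 0, 0, 0, 0]
spread = [1, 0, 1, 0, 1, 0, 1, 0]

print(satisfies_global(bursty, 0.5), satisfies_sliding_window(bursty, 4, 0.5))
print(satisfies_global(spread, 0.5), satisfies_sliding_window(spread, 4, 0.5))
```

Both sequences meet the global budget, but only the spread-out one meets the windowed budget. This is the sense in which a window constraint rules out bursty strategies that a global constraint allows.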

arXiv:2502.13577 [pdf, other]

Unraveling the Localized Latents: Learning Stratified Manifold Structures in LLM Embedding Space with Sparse Mixture-of-Experts

Authors: Xin Li, Anand Sarwate

Abstract: However, real-world data often exhibit complex local structures that can be challenging for single-model approaches with a smooth global manifold in the embedding space to unravel. In this work, we conjecture that in the latent space of these large language models, the embeddings live in a local manifold structure with different dimensions depending on the perplexities and domains of the input data, commonly referred to as a Stratified Manifold structure, which in combination form a structured space known as a Stratified Space. To investigate the validity of this structural claim, we propose an analysis framework based on a Mixture-of-Experts (MoE) model where each expert is implemented with a simple dictionary learning algorithm at varying sparsity levels. By incorporating an attention-based soft-gating network, we verify that our model learns specialized sub-manifolds for an ensemble of input data sources, reflecting the semantic stratification in LLM embedding space. We further analyze the intrinsic dimensions of these stratified sub-manifolds and present extensive statistics on expert assignments, gating entropy, and inter-expert distances. Our experimental results demonstrate that our method not only validates the claim of a stratified manifold structure in the LLM embedding space, but also provides interpretable clusters that align with the intrinsic semantic variations of the input data.

Submitted 19 February, 2025; originally announced February 2025.

arXiv:2502.13568 [pdf, other]

LSR-Adapt: Ultra-Efficient Parameter Tuning with Matrix Low Separation Rank Kernel Adaptation

Authors: Xin Li, Anand Sarwate

Abstract: Imposing an effective structural assumption on neural network weight matrices has been the major paradigm for designing Parameter-Efficient Fine-Tuning (PEFT) systems for adapting modern large pre-trained models to various downstream tasks. However, low rank based adaptation has become increasingly challenging due to the sheer scale of modern large language models. In this paper, we propose an effective kernelization to further reduce the number of parameters required for adaptation tasks. Specifically, from the classical idea in numerical analysis regarding matrix Low-Separation-Rank (LSR) representations, we develop a kernel using this representation for the low rank adapter matrices of the linear layers from large networks, named the Low Separation Rank Adaptation (LSR-Adapt) kernel. With the ultra-efficient kernel representation of the low rank adapter matrices, we manage to achieve state-of-the-art performance with even higher accuracy with almost half the number of parameters as compared to conventional low rank based methods. This structural assumption also opens the door to further GPU-side optimizations due to the highly parallelizable nature of Kronecker computations.

Submitted 19 February, 2025; originally announced February 2025.

arXiv:2501.13810 [pdf, other]

Learning to Help in Multi-Class Settings

Authors: Yu Wu, Yansong Li, Zeyu Dong, Nitya Sathyavageeswaran, Anand D. Sarwate

Abstract: Deploying complex machine learning models on resource-constrained devices is challenging due to limited computational power, memory, and model retrainability. To address these limitations, a hybrid system can be established by augmenting the local model with a server-side model, where samples are selectively deferred by a rejector and then sent to the server for processing. The hybrid system enables efficient use of computational resources while minimizing the overhead associated with server usage. The recently proposed Learning to Help (L2H) model trains a server model given a fixed local (client) model, differing from the Learning to Defer (L2D) framework, which trains the client for a fixed (expert) server. In both L2D and L2H, the training includes learning a rejector at the client to determine when to query the server. In this work, we extend the L2H model from binary to multi-class classification problems and demonstrate its applicability in a number of different scenarios of practical interest in which access to the server may be limited by cost, availability, or policy. We derive a stage-switching surrogate loss function that is differentiable, convex, and consistent with the Bayes rule corresponding to the 0-1 loss for the L2H model. Experiments show that our proposed methods offer an efficient and practical solution for multi-class classification in resource-constrained environments.

Submitted 16 April, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

Comments: 30 pages, 7 figures, conference, ICLR 2025

arXiv:2501.06620 [pdf, other]

Differentially Private Distribution Estimation Using Functional Approximation

Authors: Ye Tao, Anand D. Sarwate

Abstract: The cumulative distribution function (CDF) is fundamental due to its ability to reveal information about random variables, making it essential in studies that require privacy-preserving methods to protect sensitive data. This paper introduces a novel privacy-preserving CDF method inspired by the functional analysis and functional mechanism. Our approach projects the empirical CDF into a predefined space, approximating it using specific functions, and protects the coefficients to achieve a differentially private empirical CDF. Compared to existing methods like histogram queries and adaptive quantiles, our method is preferable in decentralized settings and scenarios where CDFs must be updated with newly collected data.

Submitted 11 January, 2025; originally announced January 2025.

Comments: 11 pages, 8 figures

arXiv:2409.16253 [pdf, other]

Learning To Help: Training Models to Assist Legacy Devices

Authors: Yu Wu, Anand Sarwate

Abstract: Machine learning models implemented in hardware on physical devices may be deployed for a long time. The computational abilities of the device may be limited and become outdated with respect to newer improvements. Because of the size of ML models, offloading some computation (e.g. to an edge cloud) can help such legacy devices. We cast this problem in the framework of learning with abstention (LWA) in which the expert (edge) must be trained to assist the client (device). Prior work on LWA trains the client assuming the edge is either an oracle or a human expert. In this work, we formalize the reverse problem of training the expert for a fixed (legacy) client. As in LWA, the client uses a rejection rule to decide when to offload inference to the expert (at a cost). We find the Bayes-optimal rule, prove a generalization bound, and find a consistent surrogate loss function. Empirical results show that our framework outperforms confidence-based rejection rules.

Submitted 24 September, 2024; originally announced September 2024.

Comments: 12 pages, 4 figures

ACM Class: I.2.6; I.2.11
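
The client/expert inference flow described in the abstract of arXiv:2409.16253 can be sketched as follows. This minimal Python example shows only the confidence-based rejection baseline that the abstract compares against, not the learned rejector or the surrogate loss from the paper. The function names, the threshold, and the toy client and server models are all hypothetical.

```python
def confidence_rejector(probs, threshold):
    """Baseline rule: defer when the client's top class probability is below `threshold`."""
    return max(probs) < threshold


def hybrid_predict(x, client, server, threshold=0.8):
    """Return (label, deferred) for a single input.

    `client` maps an input to a list of class probabilities and
    `server` maps an input to a label; both are hypothetical callables.
    """
    probs = client(x)
    if confidence_rejector(probs, threshold):
        return server(x), True  # offload inference to the expert
    return probs.index(max(probs)), False  # answer locally


# Toy stand-ins for a legacy client and a stronger server (assumptions).
def toy_client(x):
    p = min(max(x, 0.0), 1.0)
    return [1.0 - p, p]


def toy_server(x):
    return int(x >= 0.5)


print(hybrid_predict(0.95, toy_client, toy_server))  # confident: answered locally
print(hybrid_predict(0.55, toy_client, toy_server))  # uncertain: deferred to server
```

A fixed threshold uses only the client's own confidence. The learning-with-abstention setting in the abstract instead learns the rejection rule, accounting for the cost of each server query.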

arXiv:2409.11184 [pdf, other]

LASERS: LAtent Space Encoding for Representations with Sparsity for Generative Modeling

Authors: Xin Li, Anand Sarwate

Abstract: Learning compact and meaningful latent space representations has been shown to be very useful in generative modeling tasks for visual data. One particular example is applying Vector Quantization (VQ) in variational autoencoders (VQ-VAEs, VQ-GANs, etc.), which has demonstrated state-of-the-art performance in many modern generative modeling applications. Quantizing the latent space has been justified by the assumption that the data themselves are inherently discrete in the latent space (like pixel values). In this paper, we propose an alternative representation of the latent space by using a more relaxed structural assumption than the VQ formulation. Specifically, we assume that the latent space can be approximated by a union of subspaces model corresponding to a dictionary-based representation under a sparsity constraint. The dictionary is learned/updated during the training process. We apply this approach to look at two models: Dictionary Learning Variational Autoencoders (DL-VAEs) and DL-VAEs with Generative Adversarial Networks (DL-GANs). We show empirically that our latent space is more expressive and leads to better representations than the VQ approach in terms of reconstruction quality, at the expense of a small computational overhead for the latent space computation. Our results thus suggest that the true benefit of the VQ approach might not be from discretization of the latent space, but rather the lossy compression of the latent space. We confirm this hypothesis by showing that our sparse representations also address the codebook collapse issue commonly found in VQ-family models.

Submitted 16 September, 2024; originally announced September 2024.

Comments: Preprint, under review. Submitted to 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

arXiv:2408.10437 [pdf, other]

Understanding Generative AI Content with Embedding Models

Authors: Max Vargas, Reilly Cannon, Andrew Engel, Anand D. Sarwate, Tony Chiang

Abstract: Constructing high-quality features is critical to any quantitative data analysis. While feature engineering was historically addressed by carefully hand-crafting data representations based on domain expertise, deep neural networks (DNNs) now offer a radically different approach. DNNs implicitly engineer features by transforming their input data into hidden feature vectors called embeddings. For embedding vectors produced by foundation models -- which are trained to be useful across many contexts -- we demonstrate that simple and well-studied dimensionality-reduction techniques such as Principal Component Analysis uncover inherent heterogeneity in input data concordant with human-understandable explanations. Of the many applications for this framework, we find empirical evidence that there is intrinsic separability between real samples and those generated by artificial intelligence (AI).

Submitted 22 February, 2025; v1 submitted 19 August, 2024; originally announced August 2024.

arXiv:2406.08307 [pdf, other]

Measuring training variability from stochastic optimization using robust nonparametric testing

Authors: Sinjini Banerjee, Tim Marrinan, Reilly Cannon, Tony Chiang, Anand D. Sarwate

Abstract: Deep neural network training often involves stochastic optimization, meaning each run will produce a different model. This implies that hyperparameters of the training process, such as the random seed itself, can potentially have significant influence on the variability in the trained models. Measuring model quality by summary statistics, such as test accuracy, can obscure this dependence. We propose a robust hypothesis testing framework and a novel summary statistic, the $α$-trimming level, to measure model similarity. Applying hypothesis testing directly with the $α$-trimming level is challenging because we cannot accurately describe the distribution under the null hypothesis. Our framework addresses this issue by determining how closely an approximate distribution resembles the expected distribution of a group of individually trained models and using this approximation as our reference. We then use the $α$-trimming level to suggest how many training runs should be sampled to ensure that an ensemble is a reliable representative of the true model performance. We also show how to use the $α$-trimming level to measure model variability and demonstrate experimentally that it is more expressive than performance metrics like validation accuracy, churn, or expected calibration error when taken alone. An application of fine-tuning over random seed in transfer learning illustrates the advantage of our new metric.

Submitted 15 April, 2025; v1 submitted 12 June, 2024; originally announced June 2024.

arXiv:2310.00541 [pdf, other]

Robust Nonparametric Hypothesis Testing to Understand Variability in Training Neural Networks

Authors: Sinjini Banerjee, Reilly Cannon, Tim Marrinan, Tony Chiang, Anand D. Sarwate

Abstract: Training a deep neural network (DNN) often involves stochastic optimization, which means each run will produce a different model. Several works suggest this variability is negligible when models have the same performance, which in the case of classification is test accuracy. However, models with similar test accuracy may not be computing the same function. We propose a new measure of closeness between classification models based on the output of the network before thresholding. Our measure is based on a robust hypothesis-testing framework and can be adapted to other quantities derived from trained models.

Submitted 30 September, 2023; originally announced October 2023.

arXiv:2308.02922 [pdf, other]

Structured Low-Rank Tensors for Generalized Linear Models

Authors: Batoul Taki, Anand D. Sarwate, Waheed U. Bajwa

Abstract: Recent works have shown that imposing tensor structures on the coefficient tensor in regression problems can lead to more reliable parameter estimation and lower sample complexity compared to vector-based methods. This work investigates a new low-rank tensor model, called Low Separation Rank (LSR), in Generalized Linear Model (GLM) problems. The LSR model -- which generalizes the well-known Tucker and CANDECOMP/PARAFAC (CP) models, and is a special case of the Block Tensor Decomposition (BTD) model -- is imposed onto the coefficient tensor in the GLM model. This work proposes a block coordinate descent algorithm for parameter estimation in LSR-structured tensor GLMs. Most importantly, it derives a minimax lower bound on the error threshold on estimating the coefficient tensor in LSR tensor GLM problems. The minimax bound is proportional to the intrinsic degrees of freedom in the LSR tensor GLM problem, suggesting that its sample complexity may be significantly lower than that of vectorized GLMs. This result can also be specialised to lower bound the estimation error in CP and Tucker-structured GLMs. The derived bounds are comparable to tight bounds in the literature for Tucker linear regression, and the tightness of the minimax lower bound is further assessed numerically. Finally, numerical experiments on synthetic datasets demonstrate the efficacy of the proposed LSR tensor model for three regression types (linear, logistic and Poisson). Experiments on a collection of medical imaging datasets demonstrate the usefulness of the LSR model over other tensor models (Tucker and CP) on real, imbalanced data with limited available samples.

Submitted 5 August, 2023; originally announced August 2023.

Comments: 43 pages; published in Transactions on Machine Learning Research (08/2023)

Journal ref: Transactions on Machine Learning Research, Aug. 2023 (https://openreview.net/forum?id=qUxBs3Ln41)

arXiv:2307.11684 [pdf, other]

Minibatching Offers Improved Generalization Performance for Second Order Optimizers

Authors: Eric Silk, Swarnita Chakraborty, Nairanjana Dasgupta, Anand D. Sarwate, Andrew Lumsdaine, Tony Chiang

Abstract: Training deep neural networks (DNNs) used in modern machine learning is computationally expensive. Machine learning scientists, therefore, rely on stochastic first-order methods for training, coupled with significant hand-tuning, to obtain good performance. To better understand performance variability of different stochastic algorithms, including second-order methods, we conduct an empirical study that treats performance as a response variable across multiple training sessions of the same model. Using 2-factor Analysis of Variance (ANOVA) with interactions, we show that batch size used during training has a statistically significant effect on the peak accuracy of the methods, and that full batch largely performed the worst. In addition, we found that second-order optimizers (SOOs) generally exhibited significantly lower variance at specific batch sizes, suggesting they may require less hyperparameter tuning, leading to a reduced overall time to solution for model training.

Submitted 25 May, 2023; originally announced July 2023.

Comments: 14 pages, 6 figures, 5 tables

arXiv:2305.14585 [pdf, other]

Faithful and Efficient Explanations for Neural Networks via Neural Tangent Kernel Surrogate Models

Authors: Andrew Engel, Zhichao Wang, Natalie S. Frank, Ioana Dumitriu, Sutanay Choudhury, Anand Sarwate, Tony Chiang

Abstract: A recent trend in explainable AI research has focused on surrogate modeling, where neural networks are approximated as simpler ML algorithms such as kernel machines. A second trend has been to utilize kernel functions in various explain-by-example or data attribution tasks. In this work, we combine these two trends to analyze approximate empirical neural tangent kernels (eNTK) for data attribution. Approximation is critical for eNTK analysis due to the high computational cost to compute the eNTK. We define new approximate eNTK and perform novel analysis on how well the resulting kernel machine surrogate models correlate with the underlying neural network. We introduce two new random projection variants of approximate eNTK which allow users to tune the time and memory complexity of their calculation. We conclude that kernel machines using approximate neural tangent kernel as the kernel function are effective surrogate models, with the introduced trace NTK the most consistent performer. Open source software allowing users to efficiently calculate kernel functions in the PyTorch framework is available (https://github.com/pnnl/projection_ntk).

Submitted 11 March, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: 9 pages, 2 figures, 3 tables. Updated 3/11/2024 with various additions/clarifications after ICLR review. Accepted as a Spotlight paper at ICLR 2024

arXiv:2211.06506 [pdf, other]

Spectral Evolution and Invariance in Linear-width Neural Networks

Authors: Zhichao Wang, Andrew Engel, Anand Sarwate, Ioana Dumitriu, Tony Chiang

Abstract: We investigate the spectral properties of linear-width feed-forward neural networks, where the sample size is asymptotically proportional to network width. Empirically, we show that the spectra of weight in this high dimensional regime are invariant when trained by gradient descent for small constant learning rates; we provide a theoretical justification for this observation and prove the invariance of the bulk spectra for both conjugate and neural tangent kernels. We demonstrate similar characteristics when training with stochastic gradient descent with small learning rates. When the learning rate is large, we exhibit the emergence of an outlier whose corresponding eigenvector is aligned with the training data structure. We also show that after adaptive gradient training, where a lower test error and feature learning emerge, both weight and kernel matrices exhibit heavy tail behavior. Simple examples are provided to explain when heavy tails can have better generalizations. We exhibit different spectral properties such as invariant bulk, spike, and heavy-tailed distribution from a two-layer neural network using different training strategies, and then correlate them to the feature learning. Analogous phenomena also appear when we train conventional neural networks with real-world data. We conclude that monitoring the evolution of the spectra during training is an essential step toward understanding the training dynamics and feature learning.

Submitted 7 November, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

Comments: Accepted by NeurIPS 2023

arXiv:2205.12372 [pdf, other]

TorchNTK: A Library for Calculation of Neural Tangent Kernels of PyTorch Models

Authors: Andrew Engel, Zhichao Wang, Anand D. Sarwate, Sutanay Choudhury, Tony Chiang

Abstract: We introduce torchNTK, a python library to calculate the empirical neural tangent kernel (NTK) of neural network models in the PyTorch framework. We provide an efficient method to calculate the NTK of multilayer perceptrons. We compare the explicit differentiation implementation against autodifferentiation implementations, which have the benefit of extending the utility of the library to any architecture supported by PyTorch, such as convolutional networks. A feature of the library is that we expose the user to layerwise NTK components, and show that in some regimes a layerwise calculation is more memory efficient. We conduct preliminary experiments to demonstrate use cases for the software and probe the NTK.

Submitted 24 May, 2022; originally announced May 2022.

Comments: 19 pages, 5 figures
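
For readers unfamiliar with the empirical NTK named in the abstract of arXiv:2205.12372, the sketch below computes it by explicit differentiation for a tiny scalar-output network, using the standard definition K(x, x') = <grad_theta f(x), grad_theta f(x')>. It does not use or reproduce the torchNTK API. The network form f(x) = sum_j v[j] * tanh(W[j] . x), the function names, and the sample values are assumptions chosen so the example runs with the Python standard library only.

```python
import math


def param_gradient(x, W, v):
    """Flattened gradient of f(x) = sum_j v[j] * tanh(W[j] . x) with respect to (W, v)."""
    hidden = [math.tanh(sum(w_k * x_k for w_k, x_k in zip(row, x))) for row in W]
    grad = []
    for h, v_j in zip(hidden, v):
        # d f / d W[j][k] = v[j] * (1 - tanh^2(W[j] . x)) * x[k]
        grad.extend(v_j * (1.0 - h * h) * x_k for x_k in x)
    # d f / d v[j] = tanh(W[j] . x)
    grad.extend(hidden)
    return grad


def empirical_ntk(xs, W, v):
    """Gram matrix of parameter gradients: K[a][b] = <grad f(xs[a]), grad f(xs[b])>."""
    grads = [param_gradient(x, W, v) for x in xs]
    return [[sum(p * q for p, q in zip(g1, g2)) for g2 in grads] for g1 in grads]


# Hypothetical weights and inputs: 2 hidden units, 2-dimensional inputs.
W = [[0.5, -0.3], [0.1, 0.8]]
v = [1.0, -2.0]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

for row in empirical_ntk(xs, W, v):
    print([round(entry, 4) for entry in row])
```

The per-layer terms (the `W` block and the `v` block of each gradient) could be summed separately to obtain layerwise components of the kind the abstract mentions.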

arXiv:2205.06708 [pdf, ps, other]

The Capacity of Causal Adversarial Channels

Authors: Yihan Zhang, Sidharth Jaggi, Michael Langberg, Anand D. Sarwate

Abstract: We characterize the capacity for the discrete-time arbitrarily varying channel with discrete inputs, outputs, and states when (a) the encoder and decoder do not share common randomness, (b) the input and state are subject to cost constraints, (c) the transition matrix of the channel is deterministic given the state, and (d) at each time step the adversary can only observe the current and past channel inputs when choosing the state at that time. The achievable strategy involves stochastic encoding together with list decoding and a disambiguation step. The converse uses a two-phase "babble-and-push" strategy where the adversary chooses the state randomly in the first phase, list decodes the output, and then chooses state inputs to symmetrize the channel in the second phase. These results generalize prior work on specific channels models (additive, erasure) to general discrete alphabets and models.

Submitted 13 May, 2022; originally announced May 2022.

arXiv:2202.08260 [pdf, other]

Low-Rank Phase Retrieval with Structured Tensor Models

Authors: Soo Min Kwon, Xin Li, Anand D. Sarwate

Abstract: We study the low-rank phase retrieval problem, where the objective is to recover a sequence of signals (typically images) given the magnitude of linear measurements of those signals. Existing solutions involve recovering a matrix constructed by vectorizing and stacking each image. These algorithms model this matrix to be low-rank and leverage the low-rank property to decrease the sample complexity required for accurate recovery. However, when the number of available measurements is more limited, these low-rank matrix models can often fail. We propose an algorithm called Tucker-Structured Phase Retrieval (TSPR) that models the sequence of images as a tensor rather than a matrix that we factorize using the Tucker decomposition. This factorization reduces the number of parameters that need to be estimated, allowing for a more accurate reconstruction in the under-sampled regime. Interestingly, we observe that this structure also has improved performance in the over-determined setting when the Tucker ranks are chosen appropriately. We demonstrate the effectiveness of our approach on real video datasets under several different measurement models.

Submitted 15 February, 2022; originally announced February 2022.

Comments: A shorter version of this paper is in 2022 International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
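
The claim in the abstract of arXiv:2202.08260 that a Tucker factorization reduces the number of parameters to estimate can be made concrete with a rough count. The sketch below compares raw parameter counts for a low-rank model of the stacked, vectorized images against a Tucker model of the image tensor. The image size, number of frames, and ranks are hypothetical, and the counts ignore the rotational redundancy in both factorizations.

```python
def low_rank_matrix_params(height, width, frames, rank):
    """Entries in a rank-`rank` factorization of the (height*width) x frames matrix."""
    return rank * (height * width + frames)


def tucker_params(dims, ranks):
    """Core tensor entries plus one (dimension x rank) factor matrix per mode."""
    core = 1
    for r in ranks:
        core *= r
    return core + sum(n * r for n, r in zip(dims, ranks))


# Hypothetical video: 50 frames of 64 x 64 pixels.
matrix_count = low_rank_matrix_params(64, 64, 50, rank=5)
tucker_count = tucker_params((64, 64, 50), (10, 10, 5))

print(matrix_count)  # 5 * (4096 + 50) = 20730
print(tucker_count)  # 10*10*5 + 64*10 + 64*10 + 50*5 = 2030
```

Under these assumed ranks the Tucker model has roughly a tenth as many free parameters, which is consistent with the abstract's point about the under-sampled regime. The actual savings depend on the ranks chosen.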

arXiv:2111.14992 [pdf, other]

Network Traffic Shaping for Enhancing Privacy in IoT Systems

Authors: Sijie Xiong, Anand D. Sarwate, Narayan B. Mandayam

Abstract: Motivated by privacy issues caused by inference attacks on user activities in the packet sizes and timing information of Internet of Things (IoT) network traffic, we establish a rigorous event-level differential privacy (DP) model on infinite packet streams. We propose a memoryless traffic shaping mechanism satisfying a first-come-first-served queuing discipline that outputs traffic dependent on the input using a DP mechanism. We show that in special cases the proposed mechanism recovers existing shapers which standardize the output independently from the input. To find the optimal shapers for given levels of privacy and transmission efficiency, we formulate the constrained problem of minimizing the expected delay per packet and propose using the expected queue size across time as a proxy. We further show that the constrained minimization is a convex program. We demonstrate the effect of shapers on both synthetic data and packet traces from actual IoT devices. The experimental results reveal inherent privacy-overhead tradeoffs: more shaping overhead provides better privacy protection. Under the same privacy level, there naturally exists a tradeoff between dummy traffic and delay. When dealing with heavier or less bursty input traffic, all shapers become more overhead-efficient. We also show that increased traffic from a larger number of IoT devices makes guaranteeing event-level privacy easier. The DP shaper offers tunable privacy that is invariant with the change in the input traffic distribution and has an advantage in handling burstiness over traffic-independent shapers. This approach well accommodates heterogeneous network conditions and enables users to adapt to their privacy/overhead demands.

Submitted 29 November, 2021; originally announced November 2021.

Comments: 18 pages, 10 figures, submitted to IEEE Transactions on Networking

arXiv:2106.12083 [pdf, other]

Privid: Practical, Privacy-Preserving Video Analytics Queries

Authors: Frank Cangialosi, Neil Agarwal, Venkat Arun, Junchen Jiang, Srinivas Narayana, Anand Sarwate, Ravi Netravali

Abstract: Analytics on video recorded by cameras in public areas have the potential to fuel many exciting applications, but also pose the risk of intruding on individuals' privacy. Unfortunately, existing solutions fail to practically resolve this tension between utility and privacy, relying on perfect detection of all private information in each video frame--an elusive requirement. This paper presents: (1) a new notion of differential privacy (DP) for video analytics, $(ρ,K,ε)$-event-duration privacy, which protects all private information visible for less than a particular duration, rather than relying on perfect detections of that information, and (2) a practical system called Privid that enforces duration-based privacy even with the (untrusted) analyst-provided deep neural networks that are commonplace for video analytics today. Across a variety of videos and queries, we show that Privid achieves accuracies within 79-99% of a non-private system.

Submitted 22 June, 2021; originally announced June 2021.

arXiv:2105.14673 [pdf, ps, other]

doi 10.1109/IEEECONF53345.2021.9723149

A Minimax Lower Bound for Low-Rank Matrix-Variate Logistic Regression

Authors: Batoul Taki, Mohsen Ghassemi, Anand D. Sarwate, Waheed U. Bajwa

Abstract: This paper considers the problem of matrix-variate logistic regression. It derives the fundamental error threshold on estimating low-rank coefficient matrices in the logistic regression problem by obtaining a lower bound on the minimax risk. The bound depends explicitly on the dimension and distribution of the covariates, the rank and energy of the coefficient matrix, and the number of samples. The resulting bound is proportional to the intrinsic degrees of freedom in the problem, which suggests the sample complexity of the low-rank matrix logistic regression problem can be lower than that for vectorized logistic regression. The proof techniques utilized in this work also set the stage for development of minimax lower bounds for tensor-variate logistic regression problems.

Submitted 28 January, 2022; v1 submitted 30 May, 2021; originally announced May 2021.

Comments: 8 pages; published in Proc. 55th Asilomar Conf. Signals, Systems, and Computers, Pacific Grove, CA, Oct. 31-Nov. 3, 2021

arXiv:2012.11877 [pdf, other]

Influencers and the Giant Component: the Fundamental Hardness in Privacy Protection for Socially Contagious Attributes

Authors: Aria Rezaei, Jie Gao, Anand D. Sarwate

Abstract: The presence of correlation is known to make privacy protection more difficult. We investigate the privacy of socially contagious attributes on a network of individuals, where each individual possessing that attribute may influence a number of others into adopting it. We show that for contagions following the Independent Cascade model there exists a giant connected component of infected nodes, containing a constant fraction of all the nodes who all receive the contagion from the same set of sources. We further show that it is extremely hard to hide the existence of this giant connected component if we want to obtain an estimate of the activated users at an acceptable level. Moreover, an adversary possessing this knowledge can predict the real status ("active" or "inactive") with decent probability for many of the individuals regardless of the privacy (perturbation) mechanism used. As a case study, we show that the Wasserstein mechanism, a state-of-the-art privacy mechanism designed specifically for correlated data, introduces a noise with magnitude of order $Ω(n)$ in the count estimation in our setting. We provide theoretical guarantees for two classes of random networks: Erdős-Rényi graphs and Chung-Lu power-law graphs under the Independent Cascade model. Experiments demonstrate that a giant connected component of infected nodes can and does appear in real-world networks and that a simple inference attack can reveal the status of a good fraction of nodes.

Submitted 22 December, 2020; originally announced December 2020.

Comments: SIAM SDM 2021, privacy, social contagions, social networks

arXiv:2006.06792 [pdf, other]

Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a Differentially Private Scheme

Authors: Konstantinos E. Nikolakakis, Dionysios S. Kalogerias, Or Sheffet, Anand D. Sarwate

Abstract: We study the best-arm identification problem in multi-armed bandits with stochastic, potentially private rewards, when the goal is to identify the arm with the highest quantile at a fixed, prescribed level. First, we propose a (non-private) successive elimination algorithm for strictly optimal best-arm identification, we show that our algorithm is $δ$-PAC and we characterize its sample complexity. Further, we provide a lower bound on the expected number of pulls, showing that the proposed algorithm is essentially optimal up to logarithmic factors. Both upper and lower complexity bounds depend on a special definition of the associated suboptimality gap, designed in particular for the quantile bandit problem, as we show when the gap approaches zero, best-arm identification is impossible. Second, motivated by applications where the rewards are private, we provide a differentially private successive elimination algorithm whose sample complexity is finite even for distributions with infinite support-size, and we characterize its sample complexity. Our algorithms do not require prior knowledge of either the suboptimality gap or other statistical information related to the bandit problem at hand.

Submitted 4 December, 2022; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: 18 pages, 4 figures

arXiv:1910.12913 [pdf, other]

Improved Differentially Private Decentralized Source Separation for fMRI Data

Authors: Hafiz Imtiaz, Jafar Mohammadi, Rogers Silva, Bradley Baker, Sergey M. Plis, Anand D. Sarwate, Vince Calhoun

Abstract: Blind source separation algorithms such as independent component analysis (ICA) are widely used in the analysis of neuroimaging data. In order to leverage larger sample sizes, different data holders/sites may wish to collaboratively learn feature representations. However, such datasets are often privacy-sensitive, precluding centralized analyses that pool the data at a single site. In this work, we propose a differentially private algorithm for performing ICA in a decentralized data setting. Conventional approaches to decentralized differentially private algorithms may introduce too much noise due to the typically small sample sizes at each site. We propose a novel protocol that uses correlated noise to remedy this problem. We show that our algorithm outperforms existing approaches on synthetic and real neuroimaging datasets and demonstrate that it can sometimes reach the same level of utility as the corresponding non-private algorithm. This indicates that it is possible to have meaningful utility while preserving privacy.

Submitted 22 February, 2021; v1 submitted 28 October, 2019; originally announced October 2019.

Comments: © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. arXiv admin note: text overlap with arXiv:1904.10059

arXiv:1909.09596 [pdf, other]

Optimal Rates for Learning Hidden Tree Structures

Authors: Konstantinos E. Nikolakakis, Dionysios S. Kalogerias, Anand D. Sarwate

Abstract: We provide high probability finite sample complexity guarantees for hidden non-parametric structure learning of tree-shaped graphical models, whose hidden and observable nodes are discrete random variables with either finite or countable alphabets. We study a fundamental quantity called the (noisy) information threshold, which arises naturally from the error analysis of the Chow-Liu algorithm and, as we discuss, provides explicit necessary and sufficient conditions on sample complexity, by effectively summarizing the difficulty of the tree-structure learning problem. Specifically, we show that the finite sample complexity of the Chow-Liu algorithm for ensuring exact structure recovery from noisy data is inversely proportional to the information threshold squared (provided it is positive), and scales almost logarithmically relative to the number of nodes over a given probability of failure. Conversely, we show that, if the number of samples is less than an absolute constant times the inverse of information threshold squared, then no algorithm can recover the hidden tree structure with probability greater than one half. As a consequence, our upper and lower bounds match with respect to the information threshold, indicating that it is a fundamental quantity for the problem of learning hidden tree-structured models. Further, the Chow-Liu algorithm with noisy data as input achieves the optimal rate with respect to the information threshold. Lastly, as a byproduct of our analysis, we resolve the problem of tree structure learning in the presence of non-identically distributed observation noise, providing conditions for convergence of the Chow-Liu algorithm under this setting, as well.

Submitted 31 March, 2021; v1 submitted 20 September, 2019; originally announced September 2019.

Comments: 33 pages, 4 figures

arXiv:1908.08407 [pdf, other]

doi 10.1109/TIT.2021.3091604

Coordination Through Shared Randomness

Authors: Gowtham R. Kurri, Vinod M. Prabhakaran, Anand D. Sarwate

Abstract: We study a distributed sampling problem where a set of processors want to output (approximately) independent and identically distributed samples from a joint distribution with the help of a common message from a coordinator. Each processor has access to a subset of sources from a set of independent sources of "shared" randomness. We consider two cases -- in the "omniscient coordinator setting", the coordinator has access to all these sources of shared randomness, while in the "oblivious coordinator setting", it has access to none. All processors and the coordinator may privately randomize. In the omniscient coordinator setting, when the subsets at the processors are disjoint (individually shared randomness model), we characterize the rate of communication required from the coordinator to the processors over a multicast link. For the two-processor case, the optimal rate matches a special case of relaxed Wyner's common information proposed by Gastpar and Sula (2019), thereby providing an operational meaning to the latter. We also give an upper bound on the communication rate for the "randomness-on-the-forehead" model where each processor observes all but one source of randomness and we give an achievable strategy for the general case where the processors have access to arbitrary subsets of sources of randomness. Also, we consider a more general model where the processors observe components of correlated sources (with the coordinator observing all the components), where we characterize the communication rate when all the processors wish to output the same random sequence. In the oblivious coordinator setting, we completely characterize the trade-off region between the communication and shared randomness rates for the general case where the processors have access to arbitrary subsets of sources of randomness.

Submitted 17 June, 2021; v1 submitted 22 August, 2019; originally announced August 2019.

Comments: 27 pages, 7 figures. Some results of this paper were presented at ISIT 2018 and ITW 2019. This paper subsumes arXiv:1805.03193

arXiv:1904.10059 [pdf, other]

Distributed Differentially Private Computation of Functions with Correlated Noise

Authors: Hafiz Imtiaz, Jafar Mohammadi, Anand D. Sarwate

Abstract: Many applications of machine learning, such as human health research, involve processing private or sensitive information. Privacy concerns may impose significant hurdles to collaboration in scenarios where there are multiple sites holding data and the goal is to estimate properties jointly across all datasets. Differentially private decentralized algorithms can provide strong privacy guarantees. However, the accuracy of the joint estimates may be poor when the datasets at each site are small. This paper proposes a new framework, Correlation Assisted Private Estimation (CAPE), for designing privacy-preserving decentralized algorithms with better accuracy guarantees in an honest-but-curious model. CAPE can be used in conjunction with the functional mechanism for statistical and machine learning optimization problems. A tighter characterization of the functional mechanism is provided that allows CAPE to achieve the same performance as a centralized algorithm in the decentralized setting using all datasets. Empirical results on regression and neural network problems for both synthetic and real datasets show that differentially private methods can be competitive with non-private algorithms in many scenarios of interest.

Submitted 22 February, 2021; v1 submitted 22 April, 2019; originally announced April 2019.

Comments: The manuscript is partially subsumed by arXiv:1910.12913

arXiv:1903.09284 [pdf, other]

doi 10.1109/TSP.2019.2952046

Learning Mixtures of Separable Dictionaries for Tensor Data: Analysis and Algorithms

Authors: Mohsen Ghassemi, Zahra Shakeri, Anand D. Sarwate, Waheed U. Bajwa

Abstract: This work addresses the problem of learning sparse representations of tensor data using structured dictionary learning. It proposes learning a mixture of separable dictionaries to better capture the structure of tensor data by generalizing the separable dictionary learning model. Two different approaches for learning mixture of separable dictionaries are explored and sufficient conditions for local identifiability of the underlying dictionary are derived in each case. Moreover, computational algorithms are developed to solve the problem of learning mixture of separable dictionaries in both batch and online settings. Numerical experiments are used to show the usefulness of the proposed model and the efficacy of the developed algorithms.

Submitted 13 June, 2020; v1 submitted 21 March, 2019; originally announced March 2019.

Comments: 18 pages, 4 figures, 3 tables; Published in IEEE Trans. Signal Processing

Journal ref: IEEE Trans. Signal Processing, vol. 68, pp. 33-48, 2020

arXiv:1812.04700 [pdf, other]

Predictive Learning on Hidden Tree-Structured Ising Models

Authors: Konstantinos E. Nikolakakis, Dionysios S. Kalogerias, Anand D. Sarwate

Abstract: We provide high-probability sample complexity guarantees for exact structure recovery and accurate predictive learning using noise-corrupted samples from an acyclic (tree-shaped) graphical model. The hidden variables follow a tree-structured Ising model distribution, whereas the observable variables are generated by a binary symmetric channel taking the hidden variables as its input (flipping each bit independently with some constant probability $q\in [0,1/2)$). In the absence of noise, predictive learning on Ising models was recently studied by Bresler and Karzand (2020); this paper quantifies how noise in the hidden model impacts the tasks of structure recovery and marginal distribution estimation by proving upper and lower bounds on the sample complexity. Our results generalize state-of-the-art bounds reported in prior work, and they exactly recover the noiseless case ($q=0$). In fact, for any tree with $p$ vertices and probability of incorrect recovery $δ>0$, the sufficient number of samples remains logarithmic as in the noiseless case, i.e., $\mathcal{O}(\log(p/δ))$, while the dependence on $q$ is $\mathcal{O}\big( 1/(1-2q)^{4} \big)$, for both aforementioned tasks. We also present a new equivalent of Isserlis' Theorem for sign-valued tree-structured distributions, yielding a new low-complexity algorithm for higher-order moment estimation.

Submitted 16 February, 2021; v1 submitted 11 December, 2018; originally announced December 2018.

Comments: 82 pages, 8 figures

arXiv:1805.03319 [pdf, other]

Quadratically Constrained Channels with Causal Adversaries

Authors: Tongxin Li, Bikash Kumar Dey, Sidharth Jaggi, Michael Langberg, Anand D. Sarwate

Abstract: We consider the problem of communication over a channel with a causal jamming adversary subject to quadratic constraints. A sender Alice wishes to communicate a message to a receiver Bob by transmitting a real-valued length-$n$ codeword $\mathbf{x}=x_1,...,x_n$ through a communication channel. Alice and Bob do not share common randomness. Knowing Alice's encoding strategy, an adversarial jammer James chooses a real-valued length-$n$ noise sequence $\mathbf{s}=s_1,...,s_n$ in a causal manner, i.e., each $s_t$ ($1 \le t \le n$) can only depend on $x_1,...,x_t$. Bob receives $\mathbf{y}$, the sum of Alice's transmission $\mathbf{x}$ and James' jamming vector $\mathbf{s}$, and is required to reliably estimate Alice's message from this sum. In addition, Alice and James's transmission powers are restricted by quadratic constraints $P>0$ and $N>0$. In this work, we characterize the channel capacity for such a channel as the limit superior of the optimal values of a series of optimizations. Upper and lower bounds on the optimal values are provided both analytically and numerically. Interestingly, unlike many communication problems, in this causal setting Alice's optimal codebook may not have a uniform power allocation - for certain SNR, a codebook with a two-level uniform power allocation results in a strictly higher rate than a codebook with a uniform power allocation would.

Submitted 8 May, 2018; originally announced May 2018.

Comments: 80 pages, ISIT 2018

arXiv:1805.03193 [pdf, other]

Coordination Using Individually Shared Randomness

Authors: Gowtham R. Kurri, Vinod M. Prabhakaran, Anand D. Sarwate

Abstract: Two processors output correlated sequences using the help of a coordinator with whom they individually share independent randomness. For the case of unlimited shared randomness, we characterize the rate of communication required from the coordinator to the processors over a broadcast link. We also give an achievable trade-off between the communication and shared randomness rates.

Submitted 8 May, 2018; originally announced May 2018.

Comments: Extended version of a paper accepted for presentation at ISIT 2018. 8 pages, 3 figures

arXiv:1804.10299 [pdf, other]

doi 10.1109/JSTSP.2018.2877842

Distributed Differentially-Private Algorithms for Matrix and Tensor Factorization

Authors: Hafiz Imtiaz, Anand D. Sarwate

Abstract: In many signal processing and machine learning applications, datasets containing private information are held at different locations, requiring the development of distributed privacy-preserving algorithms. Tensor and matrix factorizations are key components of many processing pipelines. In the distributed setting, differentially private algorithms suffer because they introduce noise to guarantee privacy. This paper designs new and improved distributed and differentially private algorithms for two popular matrix and tensor factorization methods: principal component analysis (PCA) and orthogonal tensor decomposition (OTD). The new algorithms employ a correlated noise design scheme to alleviate the effects of noise and can achieve the same noise level as the centralized scenario. Experiments on synthetic and real data illustrate the regimes in which the correlated noise allows performance matching with the centralized setting, outperforming previous methods and demonstrating that meaningful utility is possible while guaranteeing differential privacy.

Submitted 26 April, 2018; originally announced April 2018.

Comments: 39 pages, in review for publication

Journal ref: IEEE Journal of Selected Topics in Signal Processing 2018

arXiv:1801.05951 [pdf, other]

Quadratically Constrained Myopic Adversarial Channels

Authors: Yihan Zhang, Shashank Vatedka, Sidharth Jaggi, Anand Sarwate

Abstract: We study communication in the presence of a jamming adversary where quadratic power constraints are imposed on the transmitter and the jammer. The jamming signal is allowed to be a function of the codebook, and a noncausal but noisy observation of the transmitted codeword. For a certain range of the noise-to-signal ratios (NSRs) of the transmitter and the jammer, we are able to characterize the capacity of this channel under deterministic encoding or stochastic encoding, i.e., with no common randomness between the encoder/decoder pair. For the remaining NSR regimes, we determine the capacity under the assumption of a small amount of common randomness (at most $2\log(n)$ bits in one sub-regime, and at most $Ω(n)$ bits in the other sub-regime) available to the encoder-decoder pair. Our proof techniques involve a novel myopic list-decoding result for achievability, and a Plotkin-type push attack for the converse in a subregion of the NSRs, both of which may be of independent interest. We also give bounds on the secrecy capacity of this channel assuming that the jammer is simultaneously eavesdropping.

Submitted 10 August, 2020; v1 submitted 18 January, 2018; originally announced January 2018.

Comments: Improved z-aware symmetrization bound is added, subsuming those given by z-agnostic symmetrization and the old z-aware symmetrization in the previous version

arXiv:1712.03471 [pdf, other]

doi 10.1109/JSTSP.2018.2838092

Identifiability of Kronecker-structured Dictionaries for Tensor Data

Authors: Zahra Shakeri, Anand D. Sarwate, Waheed U. Bajwa

Abstract: This paper derives sufficient conditions for local recovery of coordinate dictionaries comprising a Kronecker-structured dictionary that is used for representing $K$th-order tensor data. Tensor observations are assumed to be generated from a Kronecker-structured dictionary multiplied by sparse coefficient tensors that follow the separable sparsity model. This work provides sufficient conditions on the underlying coordinate dictionaries, coefficient and noise distributions, and number of samples that guarantee recovery of the individual coordinate dictionaries up to a specified error, as a local minimum of the objective function, with high probability. In particular, the sample complexity to recover $K$ coordinate dictionaries with dimensions $m_k \times p_k$ up to estimation error $\varepsilon_k$ is shown to be $\max_{k \in [K]}\mathcal{O}(m_kp_k^3\varepsilon_k^{-2})$.

Submitted 25 May, 2018; v1 submitted 10 December, 2017; originally announced December 2017.

Comments: 16 pages, to appear in IEEE Journal of Selected Topics in Signal Processing

Journal ref: IEEE J. Sel. Topics Signal Processing, vol. 12, no. 5, pp. 1047-1062, Oct. 2018

arXiv:1711.04887 [pdf, other]

STARK: Structured Dictionary Learning Through Rank-one Tensor Recovery

Authors: Mohsen Ghassemi, Zahra Shakeri, Anand D. Sarwate, Waheed U. Bajwa

Abstract: In recent years, a class of dictionaries have been proposed for multidimensional (tensor) data representation that exploit the structure of tensor data by imposing a Kronecker structure on the dictionary underlying the data. In this work, a novel algorithm called "STARK" is provided to learn Kronecker structured dictionaries that can represent tensors of any order. By establishing that the Kronecker product of any number of matrices can be rearranged to form a rank-1 tensor, we show that Kronecker structure can be enforced on the dictionary by solving a rank-1 tensor recovery problem. Because rank-1 tensor recovery is a challenging nonconvex problem, we resort to solving a convex relaxation of this problem. Empirical experiments on synthetic and real data show promising results for our proposed algorithm.

Submitted 13 November, 2017; originally announced November 2017.

arXiv:1705.09905 [pdf, other]

doi 10.1109/CLUSTER.2017.75

A Unified Optimization Approach for Sparse Tensor Operations on GPUs

Authors: Bangtian Liu, Chengyao Wen, Anand D. Sarwate, Maryam Mehri Dehnavi

Abstract: Sparse tensors appear in many large-scale applications with multidimensional and sparse data. While multidimensional sparse data often need to be processed on manycore processors, attempts to develop highly-optimized GPU-based implementations of sparse tensor operations are rare. The irregular computation patterns and sparsity structures as well as the large memory footprints of sparse tensor operations make such implementations challenging. We leverage the fact that sparse tensor operations share similar computation patterns to propose a unified tensor representation called F-COO. Combined with GPU-specific optimizations, F-COO provides highly-optimized implementations of sparse tensor computations on GPUs. The performance of the proposed unified approach is demonstrated for tensor-based kernels such as the Sparse Matricized Tensor-Times-Khatri-Rao Product (SpMTTKRP) and the Sparse Tensor-Times-Matrix Multiply (SpTTM) and is used in tensor decomposition algorithms. Compared to state-of-the-art work we improve the performance of SpTTM and SpMTTKRP up to 3.7 and 30.6 times respectively on NVIDIA Titan-X GPUs. We implement a CANDECOMP/PARAFAC (CP) decomposition and achieve up to 14.9 times speedup using the unified method over state-of-the-art libraries on NVIDIA Titan-X GPUs.

Submitted 28 May, 2017; originally announced May 2017.

arXiv:1608.02792 [pdf, other]

doi 10.1109/TIT.2018.2799931

Minimax Lower Bounds on Dictionary Learning for Tensor Data

Authors: Zahra Shakeri, Waheed U. Bajwa, Anand D. Sarwate

Abstract: This paper provides fundamental limits on the sample complexity of estimating dictionaries for tensor data. The specific focus of this work is on $K$th-order tensor data and the case where the underlying dictionary can be expressed in terms of $K$ smaller dictionaries. It is assumed the data are generated by linear combinations of these structured dictionary atoms and observed through white Gaussian noise. This work first provides a general lower bound on the minimax risk of dictionary learning for such tensor data and then adapts the proof techniques for specialized results in the case of sparse and sparse-Gaussian linear combinations. The results suggest the sample complexity of dictionary learning for tensor data can be significantly lower than that for unstructured data: for unstructured data it scales linearly with the product of the dictionary dimensions, whereas for tensor-structured data the bound scales linearly with the sum of the product of the dimensions of the (smaller) component dictionaries. A partial converse is provided for the case of 2nd-order tensor data to show that the bounds in this paper can be tight. This involves developing an algorithm for learning highly-structured dictionaries from noisy tensor data. Finally, numerical experiments highlight the advantages associated with explicitly accounting for tensor data structure during dictionary learning.

Submitted 18 February, 2018; v1 submitted 9 August, 2016; originally announced August 2016.

Comments: In IEEE Transactions on Information Theory

Journal ref: IEEE Trans. Inform. Theory, vol. 64, no. 4, pp. 2706-2726, Apr. 2018

arXiv:1605.05284 [pdf, other]

doi 10.1109/ISIT.2016.7541479

Minimax Lower Bounds for Kronecker-Structured Dictionary Learning

Authors: Zahra Shakeri, Waheed U. Bajwa, Anand D. Sarwate

Abstract: Dictionary learning is the problem of estimating the collection of atomic elements that provide a sparse representation of measured/collected signals or data. This paper finds fundamental limits on the sample complexity of estimating dictionaries for tensor data by proving a lower bound on the minimax risk. This lower bound depends on the dimensions of the tensor and parameters of the generative model. The focus of this paper is on second-order tensor data, with the underlying dictionaries constructed by taking the Kronecker product of two smaller dictionaries and the observed data generated by sparse linear combinations of dictionary atoms observed through white Gaussian noise. In this regard, the paper provides a general lower bound on the minimax risk and also adapts the proof techniques for equivalent results using sparse and Gaussian coefficient models. The reported results suggest that the sample complexity of dictionary learning for tensor data can be significantly lower than that for unstructured data.

Submitted 17 May, 2016; originally announced May 2016.

Comments: 5 pages, 1 figure. To appear in 2016 IEEE International Symposium on Information Theory

Journal ref: Proc. IEEE Intl. Symp. Information Theory, Barcelona, Spain, Jul. 10-15, 2016, pp. 1148-1152

arXiv:1602.03571 [pdf, other]

High Dimensional Inference with Random Maximum A-Posteriori Perturbations

Authors: Tamir Hazan, Francesco Orabona, Anand D. Sarwate, Subhransu Maji, Tommi Jaakkola

Abstract: This paper presents a new approach, called perturb-max, for high-dimensional statistical inference that is based on applying random perturbations followed by optimization. This framework injects randomness to maximum a-posteriori (MAP) predictors by randomly perturbing the potential function for the input. A classic result from extreme value statistics asserts that perturb-max operations generate unbiased samples from the Gibbs distribution using high-dimensional perturbations. Unfortunately, the computational cost of generating so many high-dimensional random variables can be prohibitive. However, when the perturbations are of low dimension, sampling the perturb-max prediction is as efficient as MAP optimization. This paper shows that the expected value of perturb-max inference with low dimensional perturbations can be used sequentially to generate unbiased samples from the Gibbs distribution. Furthermore the expected value of the maximal perturbations is a natural bound on the entropy of such perturb-max models. A measure concentration result for perturb-max values shows that the deviation of their sampled average from its expectation decays exponentially in the number of samples, allowing effective approximation of the expectation.

Submitted 30 May, 2017; v1 submitted 10 February, 2016; originally announced February 2016.

Comments: 47 pages, 10 figures, under review

arXiv:1602.02384 [pdf, other]

The benefit of a 1-bit jump-start, and the necessity of stochastic encoding, in jamming channels

Authors: Bikash Kumar Dey, Sidharth Jaggi, Michael Langberg, Anand D. Sarwate

Abstract: We consider the problem of communicating a message $m$ in the presence of a malicious jamming adversary (Calvin), who can erase an arbitrary set of up to $pn$ bits, out of $n$ transmitted bits $(x_1,\ldots,x_n)$. The capacity of such a channel when Calvin is exactly causal, i.e. Calvin's decision of whether or not to erase bit $x_i$ depends on his observations $(x_1,\ldots,x_i)$ was recently characterized to be $1-2p$. In this work we show two (perhaps) surprising phenomena. Firstly, we demonstrate via a novel code construction that if Calvin is delayed by even a single bit, i.e. Calvin's decision of whether or not to erase bit $x_i$ depends only on $(x_1,\ldots,x_{i-1})$ (and is independent of the "current bit" $x_i$) then the capacity increases to $1-p$ when the encoder is allowed to be stochastic. Secondly, we show via a novel jamming strategy for Calvin that, in the single-bit-delay setting, if the encoding is deterministic (i.e. the transmitted codeword is a deterministic function of the message $m$) then no rate asymptotically larger than $1-2p$ is possible with vanishing probability of error, hence stochastic encoding (using private randomness at the encoder) is essential to achieve the capacity of $1-p$ against a one-bit-delayed Calvin.

Submitted 7 February, 2016; originally announced February 2016.

Comments: 21 pages, 4 figures, extended draft of submission to ISIT 2016

arXiv:1601.06426 [pdf, other]

doi 10.1109/TIFS.2018.2831619

Robust Privacy-Utility Tradeoffs under Differential Privacy and Hamming Distortion

Authors: Kousha Kalantari, Lalitha Sankar, Anand Sarwate

Abstract: A privacy-utility tradeoff is developed for an arbitrary set of finite-alphabet source distributions. Privacy is quantified using differential privacy (DP), and utility is quantified using expected Hamming distortion maximized over the set of distributions. The family of source distribution sets (source sets) is categorized into three classes, based on different levels of prior knowledge they capture. For source sets whose convex hull includes the uniform distribution, symmetric DP mechanisms are optimal. For source sets whose probability values have a fixed monotonic ordering, asymmetric DP mechanisms are optimal. For all other source sets, general upper and lower bounds on the optimal privacy leakage are developed and a necessary and sufficient condition for tightness is established. Differentially private leakage is an upper bound on mutual information (MI) leakage: the two criteria are compared analytically and numerically to illustrate the effect of adopting a stronger privacy criterion.

Submitted 1 August, 2018; v1 submitted 24 January, 2016; originally announced January 2016.

Comments: Extended abstract of ISIT 2016 submission

Journal ref: K. Kalantari, L. Sankar and A. D. Sarwate, "Robust Privacy-Utility Tradeoffs Under Differential Privacy and Hamming Distortion," in IEEE Transactions on Information Forensics and Security, vol. 13, no. 11, pp. 2816-2830, Nov. 2018

arXiv:1508.01818 [pdf, other]

Designing Incentive Schemes For Privacy-Sensitive Users

Authors: Chong Huang, Lalitha Sankar, Anand D. Sarwate

Abstract: Businesses (retailers) often wish to offer personalized advertisements (coupons) to individuals (consumers), but run the risk of strong reactions from consumers who want a customized shopping experience but feel their privacy has been violated. Existing models for privacy such as differential privacy or information theory try to quantify privacy risk but do not capture the subjective experience and heterogeneous expression of privacy-sensitivity. We propose a Markov decision process (MDP) model to capture (i) different consumer privacy sensitivities via a time-varying state; (ii) different coupon types (action set) for the retailer; and (iii) the action-and-state-dependent cost for perceived privacy violations. For the simple case with two states ("Normal" and "Alerted"), two coupons (targeted and untargeted) model, and consumer behavior statistics known to the retailer, we show that a stationary threshold-based policy is the optimal coupon-offering strategy for a retailer that wishes to minimize its expected discounted cost. The threshold is a function of all model parameters; the retailer offers a targeted coupon if their belief that the consumer is in the "Alerted" state is below the threshold. We extend this two-state model to consumers with multiple privacy-sensitivity states as well as coupon-dependent state transition probabilities. Furthermore, we study the case with imperfect (noisy) cost feedback from consumers and uncertain initial belief state.

Submitted 23 September, 2015; v1 submitted 7 August, 2015; originally announced August 2015.

Comments: 25 pages, 10 figures, submitted to journal of privacy and confidentiality

arXiv:1412.5617 [pdf, other]

Learning from Data with Heterogeneous Noise using SGD

Authors: Shuang Song, Kamalika Chaudhuri, Anand D. Sarwate

Abstract: We consider learning from data of variable quality that may be obtained from different heterogeneous sources. Addressing learning from heterogeneous data in its full generality is a challenging problem. In this paper, we adopt instead a model in which data is observed through heterogeneous noise, where the noise level reflects the quality of the data source. We study how to use stochastic gradient algorithms to learn in this model. Our study is motivated by two concrete examples where this problem arises naturally: learning with local differential privacy based on data from multiple sources with different privacy requirements, and learning from data with labels of variable quality. The main contribution of this paper is to identify how heterogeneous noise impacts performance. We show that given two datasets with heterogeneous noise, the order in which to use them in standard SGD depends on the learning rate. We propose a method for changing the learning rate as a function of the heterogeneity, and prove new regret bounds for our method in two cases of interest. Experiments on real data show that our method performs better than using a single learning rate and using only the less noisy of the two datasets when the noise level is low to moderate.

Submitted 17 December, 2014; originally announced December 2014.

arXiv:1410.4307 [pdf, other]

Social Learning and Distributed Hypothesis Testing

Authors: Anusha Lalitha, Tara Javidi, Anand Sarwate

Abstract: This paper considers a problem of distributed hypothesis testing and social learning. Individual nodes in a network receive noisy local (private) observations whose distribution is parameterized by a discrete parameter (hypotheses). The conditional distributions are known locally at the nodes, but the true parameter/hypothesis is not known. An update rule is analyzed in which nodes first perform a Bayesian update of their belief (distribution estimate) of the parameter based on their local observation, communicate these updates to their neighbors, and then perform a "non-Bayesian" linear consensus using the log-beliefs of their neighbors. In this paper we show that under mild assumptions, the belief of any node in any incorrect hypothesis converges to zero exponentially fast, and we characterize the exponential rate of learning which is given in terms of the network structure and the divergences between the observations' distributions. Our main result is the concentration property established on the rate of convergence.

Submitted 16 May, 2016; v1 submitted 16 October, 2014; originally announced October 2014.

arXiv:1409.7614 [pdf, other]

Generalized Opinion Dynamics from Local Optimization Rules

Authors: Avhishek Chatterjee, Anand D. Sarwate, Sriram Vishwanath

Abstract: We study generalizations of the Hegselmann-Krause (HK) model for opinion dynamics, incorporating features and parameters that are natural components of observed social systems. The first generalization is one where the strength of influence depends on the distance of the agents' opinions. Under this setup, we identify conditions under which the opinions converge in finite time, and provide a qualitative characterization of the equilibrium. We interpret the HK model opinion update rule as a quadratic cost-minimization rule. This enables a second generalization: a family of update rules which possess different equilibrium properties. Subsequently, we investigate models in which an external force can behave strategically to modulate/influence user updates. We consider cases where this external force can introduce additional agents and cases where they can modify the cost structures for other agents. We describe and analyze some strategies through which such modulation may be possible in an order-optimal manner. Our simulations demonstrate that generalized dynamics differ qualitatively and quantitatively from traditional HK dynamics.

Submitted 25 September, 2014; originally announced September 2014.

Comments: 20 pages, under review

arXiv:1407.5383 [pdf, other]

doi 10.3390/e16105339

Redundancy of Exchangeable Estimators

Authors: Narayana P. Santhanam, Anand D. Sarwate, Jae Oh Woo

Abstract: Exchangeable random partition processes are the basis for Bayesian approaches to statistical inference in large alphabet settings. On the other hand, the notion of the pattern of a sequence provides an information-theoretic framework for data compression in large alphabet scenarios. Because data compression and parameter estimation are intimately related, we study the redundancy of Bayes estimators coming from Poisson-Dirichlet priors (or "Chinese restaurant processes") and the Pitman-Yor prior. This provides an understanding of these estimators in the setting of unknown discrete alphabets from the perspective of universal compression. In particular, we identify relations between alphabet sizes and sample sizes where the redundancy is small, thereby characterizing useful regimes for these estimators.

Submitted 20 October, 2014; v1 submitted 21 July, 2014; originally announced July 2014.

Comments: 18 pages

arXiv:1310.4227 [pdf, other]

On Measure Concentration of Random Maximum A-Posteriori Perturbations

Authors: Francesco Orabona, Tamir Hazan, Anand D. Sarwate, Tommi Jaakkola

Abstract: The maximum a-posteriori (MAP) perturbation framework has emerged as a useful approach for inference and learning in high dimensional complex models. By maximizing a randomly perturbed potential function, MAP perturbations generate unbiased samples from the Gibbs distribution. Unfortunately, the computational cost of generating so many high-dimensional random variables can be prohibitive. More efficient algorithms use sequential sampling strategies based on the expected value of low dimensional MAP perturbations. This paper develops new measure concentration inequalities that bound the number of samples needed to estimate such expected values. Applying the general result to MAP perturbations can yield a more efficient algorithm to approximate sampling from the Gibbs distribution. The measure concentration result is of general interest and may be applicable to other areas involving expected estimations.

Submitted 15 October, 2013; originally announced October 2013.

arXiv:1306.2347 [pdf, other]

Auditing: Active Learning with Outcome-Dependent Query Costs

Authors: Sivan Sabato, Anand D. Sarwate, Nathan Srebro

Abstract: We propose a learning setting in which unlabeled data is free, and the cost of a label depends on its value, which is not known in advance. We study binary classification in an extreme case, where the algorithm only pays for negative labels. Our motivation comes from applications such as fraud detection, in which investigating an honest transaction should be avoided if possible. We term the setting auditing, and consider the auditing complexity of an algorithm: the number of negative labels the algorithm requires in order to learn a hypothesis with low relative error. We design auditing algorithms for simple hypothesis classes (thresholds and rectangles), and show that with these algorithms, the auditing complexity can be significantly lower than the active label complexity. We also discuss a general competitive approach for auditing and possible modifications to the framework.

Submitted 12 July, 2015; v1 submitted 10 June, 2013; originally announced June 2013.

Comments: Corrections in section 5

Journal ref: Neural Information Processing Systems 26 (NIPS), 512-520, 2013

arXiv:1305.4548 [pdf, other]

Distributed Learning of Distributions via Social Sampling

Authors: Anand D. Sarwate, Tara Javidi

Abstract: A protocol for distributed estimation of discrete distributions is proposed. Each agent begins with a single sample from the distribution, and the goal is to learn the empirical distribution of the samples. The protocol is based on a simple message-passing model motivated by communication in social networks. Agents sample a message randomly from their current estimates of the distribution, resulting in a protocol with quantized messages. Using tools from stochastic approximation, the algorithm is shown to converge almost surely. Examples illustrate three regimes with different consensus phenomena. Simulations demonstrate this convergence and give some insight into the effect of network topology.

Submitted 5 June, 2014; v1 submitted 20 May, 2013; originally announced May 2013.

Comments: 17 pages, accepted to IEEE Transactions on Automatic Control

arXiv:1209.2755 [pdf, ps, other]

Relaxing the Gaussian AVC

Authors: Anand D. Sarwate, Michael Gastpar

Abstract: The arbitrarily varying channel (AVC) is a conservative way of modeling an unknown interference, and the corresponding capacity results are pessimistic. We reconsider the Gaussian AVC by relaxing the classical model and thereby weakening the adversarial nature of the interference. We examine three different relaxations. First, we show how a very small amount of common randomness between transmitter and receiver is sufficient to achieve the rates of fully randomized codes. Second, akin to the dirty paper coding problem, we study the impact of an additional interference known to the transmitter. We provide partial capacity results that differ significantly from the standard AVC. Third, we revisit a Gaussian MIMO AVC in which the interference is arbitrary but of limited dimension.

Submitted 12 September, 2012; originally announced September 2012.

Comments: Submitted to the IEEE Transactions on Information Theory

arXiv:1207.2812 [pdf, other]

Near-Optimal Algorithms for Differentially-Private Principal Components

Authors: Kamalika Chaudhuri, Anand D. Sarwate, Kaushik Sinha

Abstract: Principal components analysis (PCA) is a standard tool for identifying good low-dimensional approximations to data in high dimension. Many data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We show that the sample complexity of the proposed method differs from the existing procedure in the scaling with the data dimension, and that our method is nearly optimal in terms of this scaling. We furthermore illustrate our results, showing that on real data there is a large performance gap between the existing method and our method.

Submitted 7 August, 2013; v1 submitted 11 July, 2012; originally announced July 2012.

Comments: 37 pages, 8 figures; final version to appear in the Journal of Machine Learning Research, preliminary version was at NIPS 2012

Showing 1–50 of 59 results for author: Sarwate, A