Showing 1–25 of 25 results for author: Thakkar, O

  1. arXiv:2410.01948  [pdf, other]

    cs.CR

    Differentially Private Parameter-Efficient Fine-tuning for Large ASR Models

    Authors: Hongbin Liu, Lun Wang, Om Thakkar, Abhradeep Thakurta, Arun Narayanan

    Abstract: Large ASR models can inadvertently leak sensitive information, which can be mitigated by formal privacy measures like differential privacy (DP). However, traditional DP training is computationally expensive, and can hurt model performance. Our study explores DP parameter-efficient fine-tuning as a way to mitigate privacy risks with smaller computation and performance costs for ASR models. Through…

    Submitted 2 October, 2024; originally announced October 2024.

  2. arXiv:2409.13953  [pdf, other]

    cs.SD cs.CR cs.LG eess.AS

    Training Large ASR Encoders with Differential Privacy

    Authors: Geeticka Chauhan, Steve Chien, Om Thakkar, Abhradeep Thakurta, Arun Narayanan

    Abstract: Self-supervised learning (SSL) methods for large speech models have proven to be highly effective at ASR. With the interest in public deployment of large pre-trained models, there is a rising concern for unintended memorization and leakage of sensitive data points from the training data. In this paper, we apply differentially private (DP) pre-training to a SOTA Conformer-based encoder, and study i…

    Submitted 20 September, 2024; originally announced September 2024.

    Comments: In proceedings of the IEEE Spoken Language Technologies Workshop, 2024

  3. arXiv:2406.02004  [pdf, ps, other]

    cs.CR cs.CL cs.SD eess.AS

    Efficiently Train ASR Models that Memorize Less and Perform Better with Per-core Clipping

    Authors: Lun Wang, Om Thakkar, Zhong Meng, Nicole Rafidi, Rohit Prabhavalkar, Arun Narayanan

    Abstract: Gradient clipping plays a vital role in training large-scale automatic speech recognition (ASR) models. It is typically applied to minibatch gradients to prevent gradient explosion, and to the individual sample gradients to mitigate unintended memorization. This work systematically investigates the impact of a specific granularity of gradient clipping, namely per-core clipping (PCC), across train…

    Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech'24

  4. arXiv:2404.02052  [pdf, other]

    cs.LG

    Noise Masking Attacks and Defenses for Pretrained Speech Models

    Authors: Matthew Jagielski, Om Thakkar, Lun Wang

    Abstract: Speech models are often trained on sensitive data in order to improve model performance, leading to potential privacy leakage. Our work considers noise masking attacks, introduced by Amid et al. 2022, which attack automatic speech recognition (ASR) models by requesting a transcript of an utterance which is partially replaced with noise. They show that when a record has been seen at training time,…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: accepted to ICASSP 2024

  5. arXiv:2310.11739  [pdf, other]

    cs.LG cs.SD eess.AS

    Unintended Memorization in Large ASR Models, and How to Mitigate It

    Authors: Lun Wang, Om Thakkar, Rajiv Mathews

    Abstract: It is well-known that neural networks can unintentionally memorize their training examples, causing privacy concerns. However, auditing memorization in large non-auto-regressive automatic speech recognition (ASR) models has been challenging due to the high compute cost of existing methods such as hardness calibration. In this work, we design a simple auditing method to measure memorization in larg…

    Submitted 18 October, 2023; originally announced October 2023.

  6. arXiv:2302.09483  [pdf, other]

    cs.LG

    Why Is Public Pretraining Necessary for Private Model Training?

    Authors: Arun Ganesh, Mahdi Haghifam, Milad Nasr, Sewoong Oh, Thomas Steinke, Om Thakkar, Abhradeep Thakurta, Lun Wang

    Abstract: In the privacy-utility tradeoff of a model trained on benchmark language and vision tasks, remarkable improvements have been widely reported with the use of pretraining on publicly available data. This is in part due to the benefits of transfer learning, which is the standard motivation for pretraining in non-private settings. However, the stark contrast in the improvement achieved through pretrai…

    Submitted 19 February, 2023; originally announced February 2023.

  7. arXiv:2210.01864  [pdf, other]

    cs.LG cs.CR

    Recycling Scraps: Improving Private Learning by Leveraging Intermediate Checkpoints

    Authors: Virat Shejwalkar, Arun Ganesh, Rajiv Mathews, Yarong Mu, Shuang Song, Om Thakkar, Abhradeep Thakurta, Xinyi Zheng

    Abstract: In this work, we focus on improving the accuracy-variance trade-off for state-of-the-art differentially private machine learning (DP ML) methods. First, we design a general framework that uses aggregates of intermediate checkpoints during training to increase the accuracy of DP ML techniques. Specifically, we demonstrate that training over aggregates can provide significant gains in predict…

    Submitted 17 September, 2024; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: New results on pCVR task

  8. arXiv:2207.00099  [pdf, other]

    cs.LG

    Measuring Forgetting of Memorized Training Examples

    Authors: Matthew Jagielski, Om Thakkar, Florian Tramèr, Daphne Ippolito, Katherine Lee, Nicholas Carlini, Eric Wallace, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, Chiyuan Zhang

    Abstract: Machine learning models exhibit two seemingly contradictory phenomena: training data memorization, and various forms of forgetting. In memorization, models overfit specific training examples and become susceptible to privacy attacks. In forgetting, examples which appeared early in training are forgotten by the end. In this work, we connect these phenomena. We propose a technique to measure to what…

    Submitted 9 May, 2023; v1 submitted 30 June, 2022; originally announced July 2022.

    Comments: Appeared at ICLR '23, 22 pages, 12 figures

  9. arXiv:2204.09606  [pdf, other]

    cs.CL cs.CR cs.LG cs.SD eess.AS

    Detecting Unintended Memorization in Language-Model-Fused ASR

    Authors: W. Ronny Huang, Steve Chien, Om Thakkar, Rajiv Mathews

    Abstract: End-to-end (E2E) models are often being accompanied by language models (LMs) via shallow fusion for boosting their overall quality as well as recognition of rare words. At the same time, several prior works show that LMs are susceptible to unintentionally memorizing rare or unique sequences in the training data. In this work, we design a framework for detecting memorization of random textual seque…

    Submitted 28 June, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

    Comments: Interspeech 2022

  10. arXiv:2204.08345  [pdf, other]

    cs.SD cs.CR cs.LG eess.AS

    Extracting Targeted Training Data from ASR Models, and How to Mitigate It

    Authors: Ehsan Amid, Om Thakkar, Arun Narayanan, Rajiv Mathews, Françoise Beaufays

    Abstract: Recent work has designed methods to demonstrate that model updates in ASR training can leak potentially sensitive attributes of the utterances used in computing the updates. In this work, we design the first method to demonstrate information leakage about training data from trained ASR models. We design Noise Masking, a fill-in-the-blank style method for extracting targeted parts of training data…

    Submitted 27 June, 2022; v1 submitted 18 April, 2022; originally announced April 2022.

    Comments: Accepted to appear at Interspeech'22

  11. arXiv:2112.00193  [pdf, other]

    cs.LG cs.CR

    Public Data-Assisted Mirror Descent for Private Model Training

    Authors: Ehsan Amid, Arun Ganesh, Rajiv Mathews, Swaroop Ramaswamy, Shuang Song, Thomas Steinke, Vinith M. Suriyakumar, Om Thakkar, Abhradeep Thakurta

    Abstract: In this paper, we revisit the problem of using in-distribution public data to improve the privacy/utility trade-offs for differentially private (DP) model training. (Here, public data refers to auxiliary data sets that have no privacy concerns.) We design a natural variant of DP mirror descent, where the DP gradients of the private/sensitive data act as the linear term, and the loss generated by t…

    Submitted 27 March, 2022; v1 submitted 30 November, 2021; originally announced December 2021.

    Comments: 20 pages, 8 figures, 3 tables

  12. arXiv:2111.04906  [pdf, other]

    stat.ML cs.CR cs.LG

    The Role of Adaptive Optimizers for Honest Private Hyperparameter Selection

    Authors: Shubhankar Mohapatra, Sajin Sasy, Xi He, Gautam Kamath, Om Thakkar

    Abstract: Hyperparameter optimization is a ubiquitous challenge in machine learning, and the performance of a trained model depends crucially upon their effective selection. While a rich set of tools exist for this purpose, there are currently no practical hyperparameter selection methods under the constraint of differential privacy (DP). We study honest hyperparameter selection for differentially private m…

    Submitted 8 November, 2021; originally announced November 2021.

  13. arXiv:2111.00556  [pdf, other]

    cs.LG cs.CL cs.CR

    Revealing and Protecting Labels in Distributed Training

    Authors: Trung Dang, Om Thakkar, Swaroop Ramaswamy, Rajiv Mathews, Peter Chin, Françoise Beaufays

    Abstract: Distributed learning paradigms such as federated learning often involve transmission of model updates, or gradients, over a network, thereby avoiding transmission of private data. However, it is possible for sensitive information about the training data to be revealed from such gradients. Prior works have demonstrated that labels can be revealed analytically from the last layer of certain models (…

    Submitted 31 October, 2021; originally announced November 2021.

  14. arXiv:2104.07815  [pdf, other]

    cs.CL cs.CR cs.LG

    A Method to Reveal Speaker Identity in Distributed ASR Training, and How to Counter It

    Authors: Trung Dang, Om Thakkar, Swaroop Ramaswamy, Rajiv Mathews, Peter Chin, Françoise Beaufays

    Abstract: End-to-end Automatic Speech Recognition (ASR) models are commonly trained over spoken utterances using optimization methods like Stochastic Gradient Descent (SGD). In distributed settings like Federated Learning, model training requires transmission of gradients over a network. In this work, we design the first method for revealing the identity of the speaker of a training utterance with access on…

    Submitted 15 April, 2021; originally announced April 2021.

  15. arXiv:2103.00039  [pdf, other]

    cs.CR cs.LG

    Practical and Private (Deep) Learning without Sampling or Shuffling

    Authors: Peter Kairouz, Brendan McMahan, Shuang Song, Om Thakkar, Abhradeep Thakurta, Zheng Xu

    Abstract: We consider training models with differential privacy (DP) using mini-batch gradients. The existing state-of-the-art, Differentially Private Stochastic Gradient Descent (DP-SGD), requires privacy amplification by sampling or shuffling to obtain the best privacy/accuracy/computation trade-offs. Unfortunately, the precise requirements on exact sampling and shuffling can be hard to obtain in importan…

    Submitted 10 December, 2021; v1 submitted 26 February, 2021; originally announced March 2021.

  16. arXiv:2009.10031  [pdf, other]

    cs.LG cs.CR stat.ML

    Training Production Language Models without Memorizing User Data

    Authors: Swaroop Ramaswamy, Om Thakkar, Rajiv Mathews, Galen Andrew, H. Brendan McMahan, Françoise Beaufays

    Abstract: This paper presents the first consumer-scale next-word prediction (NWP) model trained with Federated Learning (FL) while leveraging the Differentially Private Federated Averaging (DP-FedAvg) technique. There has been prior work on building practical FL infrastructure, including work demonstrating the feasibility of training language models on mobile devices using such infrastructure. It has also b…

    Submitted 21 September, 2020; originally announced September 2020.

  17. arXiv:2007.06605  [pdf, other]

    cs.LG cs.CR stat.ML

    Privacy Amplification via Random Check-Ins

    Authors: Borja Balle, Peter Kairouz, H. Brendan McMahan, Om Thakkar, Abhradeep Thakurta

    Abstract: Differentially Private Stochastic Gradient Descent (DP-SGD) forms a fundamental building block in many applications for learning over sensitive data. Two standard approaches, privacy amplification by subsampling, and privacy amplification by shuffling, permit adding lower noise in DP-SGD than via naïve schemes. A key assumption in both these approaches is that the elements in the data set can be u…

    Submitted 30 July, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

    Comments: Updated proof for $(ε_0, δ_0)$-DP local randomizers

  18. arXiv:2006.07490  [pdf, other]

    cs.LG cs.CL stat.ML

    Understanding Unintended Memorization in Federated Learning

    Authors: Om Thakkar, Swaroop Ramaswamy, Rajiv Mathews, Françoise Beaufays

    Abstract: Recent works have shown that generative sequence models (e.g., language models) have a tendency to memorize rare or unique sequences in the training data. Since useful models are often trained on sensitive data, to ensure the privacy of the training data it is critical to identify and mitigate such unintended memorization. Federated Learning (FL) has emerged as a novel framework for large-scale di…

    Submitted 12 June, 2020; originally announced June 2020.

  19. arXiv:2006.06783  [pdf, other]

    cs.CR cs.LG math.OC stat.ML

    Evading Curse of Dimensionality in Unconstrained Private GLMs via Private Gradient Descent

    Authors: Shuang Song, Thomas Steinke, Om Thakkar, Abhradeep Thakurta

    Abstract: We revisit the well-studied problem of differentially private empirical risk minimization (ERM). We show that for unconstrained convex generalized linear models (GLMs), one can obtain an excess empirical risk of $\tilde O\left(\sqrt{\texttt{rank}}/εn\right)$, where ${\texttt{rank}}$ is the rank of the feature matrix in the GLM problem, $n$ is the number of data samples, and $ε$ is the privacy para…

    Submitted 2 March, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

  20. arXiv:1906.09231  [pdf, other]

    cs.LG math.ST stat.ML

    Guaranteed Validity for Empirical Approaches to Adaptive Data Analysis

    Authors: Ryan Rogers, Aaron Roth, Adam Smith, Nathan Srebro, Om Thakkar, Blake Woodworth

    Abstract: We design a general framework for answering adaptive statistical queries that focuses on providing explicit confidence intervals along with point estimates. Prior work in this area has either focused on providing tight confidence intervals for specific analyses, or providing general worst-case bounds for point estimates. Unfortunately, as we observe, these worst-case bounds are loose in many setti…

    Submitted 9 March, 2020; v1 submitted 21 June, 2019; originally announced June 2019.

    Comments: Accepted to appear in the proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) 2020

  21. arXiv:1905.03871  [pdf, other]

    cs.LG stat.ML

    Differentially Private Learning with Adaptive Clipping

    Authors: Galen Andrew, Om Thakkar, H. Brendan McMahan, Swaroop Ramaswamy

    Abstract: Existing approaches for training neural networks with user-level differential privacy (e.g., DP Federated Averaging) in federated learning (FL) settings involve bounding the contribution of each user's model update by clipping it to some constant value. However there is no good a priori setting of the clipping norm across tasks and learning settings: the update norm distribution depends on the mod…

    Submitted 9 May, 2022; v1 submitted 9 May, 2019; originally announced May 2019.

    Comments: Accepted to NeurIPS, 2021

  22. arXiv:1803.05101  [pdf, ps, other]

    cs.LG

    Model-Agnostic Private Learning via Stability

    Authors: Raef Bassily, Om Thakkar, Abhradeep Thakurta

    Abstract: We design differentially private learning algorithms that are agnostic to the learning model. Our algorithms are interactive in nature, i.e., instead of outputting a model based on the training data, they provide predictions for a set of $m$ feature vectors that arrive online. We show that, for the feature vectors on which an ensemble of models (trained on random disjoint subsets of a dataset) mak…

    Submitted 13 March, 2018; originally announced March 2018.

  23. arXiv:1712.09765  [pdf, other]

    cs.LG

    Differentially Private Matrix Completion Revisited

    Authors: Prateek Jain, Om Thakkar, Abhradeep Thakurta

    Abstract: We provide the first provably joint differentially private algorithm with formal utility guarantees for the problem of user-level privacy-preserving collaborative filtering. Our algorithm is based on the Frank-Wolfe method, and it consistently estimates the underlying preference matrix as long as the number of users $m$ is $ω(n^{5/4})$, where $n$ is the number of items, and each user provides her…

    Submitted 11 June, 2018; v1 submitted 28 December, 2017; originally announced December 2017.

    Comments: Updated version. Accepted for presentation at International Conference on Machine Learning (ICML) 2018

  24. arXiv:1604.03924  [pdf, other]

    cs.LG

    Max-Information, Differential Privacy, and Post-Selection Hypothesis Testing

    Authors: Ryan Rogers, Aaron Roth, Adam Smith, Om Thakkar

    Abstract: In this paper, we initiate a principled study of how the generalization properties of approximate differential privacy can be used to perform adaptive hypothesis testing, while giving statistically valid $p$-value corrections. We do this by observing that the guarantees of algorithms with bounded approximate max-information are sufficient to correct the $p$-values of adaptively chosen hypotheses,…

    Submitted 9 September, 2016; v1 submitted 13 April, 2016; originally announced April 2016.

  25. arXiv:1507.01818  [pdf, other]

    math.CO cs.DM

    Improved Upper Bounds on $a'(G\Box H)$

    Authors: Punit Mehta, Rahul Muthu, Gaurav Patel, Om Thakkar, Devanshi Vyas

    Abstract: The acyclic edge colouring problem is extensively studied in graph theory. The cornerstone of this field is a conjecture of Alon et al. \cite{alonacyclic} that $a'(G)\le Δ(G)+2$. In that and subsequent work, $a'(G)$ is typically bounded in terms of $Δ(G)$. Motivated by this we introduce a term $gap(G)$ defined as $gap(G)=a'(G)-Δ(G)$. Alon's conjecture can be rephrased as $gap(G)\le2$ for all grap…

    Submitted 7 July, 2015; originally announced July 2015.

    Comments: 10 pages, 5 figures