Showing 1–22 of 22 results for author: Yekhanin, S

  1. arXiv:2502.05505  [pdf, other]

    cs.LG cs.CR cs.CV stat.ML

    Differentially Private Synthetic Data via APIs 3: Using Simulators Instead of Foundation Models

    Authors: Zinan Lin, Tadas Baltrusaitis, Wenyu Wang, Sergey Yekhanin

    Abstract: Differentially private (DP) synthetic data, which closely resembles the original private data while maintaining strong privacy guarantees, has become a key tool for unlocking the value of private data without compromising privacy. Recently, Private Evolution (PE) has emerged as a promising method for generating DP synthetic data. Unlike other training-based approaches, PE only requires access to i…

    Submitted 20 May, 2025; v1 submitted 8 February, 2025; originally announced February 2025.

    Comments: Published in: (1) ICLR 2025 Workshop on Data Problems, (2) ICLR 2025 Workshop on Synthetic Data

  2. arXiv:2404.02241  [pdf, other]

    cs.CV

    Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better

    Authors: Enshu Liu, Junyi Zhu, Zinan Lin, Xuefei Ning, Shuaiqi Wang, Matthew B. Blaschko, Sergey Yekhanin, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: Diffusion Models (DM) and Consistency Models (CM) are two types of popular generative models with good generation quality on various tasks. When training DM and CM, intermediate weight checkpoints are not fully utilized and only the last converged checkpoint is used. In this work, we find that high-quality model weights often lie in a basin which cannot be reached by SGD but can be obtained by pro…

    Submitted 26 February, 2025; v1 submitted 2 April, 2024; originally announced April 2024.

  3. arXiv:2403.01749  [pdf, other]

    cs.CL

    Differentially Private Synthetic Data via Foundation Model APIs 2: Text

    Authors: Chulin Xie, Zinan Lin, Arturs Backurs, Sivakanth Gopi, Da Yu, Huseyin A Inan, Harsha Nori, Haotian Jiang, Huishuai Zhang, Yin Tat Lee, Bo Li, Sergey Yekhanin

    Abstract: Text data has become extremely valuable due to the emergence of machine learning algorithms that learn from it. A lot of high-quality text data generated in the real world is private and therefore cannot be shared or used freely due to privacy concerns. Generating synthetic replicas of private text data with a formal privacy guarantee, i.e., differential privacy (DP), offers a promising and scalab…

    Submitted 23 July, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: ICML'24 Spotlight

  4. arXiv:2305.15560  [pdf, other]

    cs.CV cs.CR cs.LG

    Differentially Private Synthetic Data via Foundation Model APIs 1: Images

    Authors: Zinan Lin, Sivakanth Gopi, Janardhan Kulkarni, Harsha Nori, Sergey Yekhanin

    Abstract: Generating differentially private (DP) synthetic data that closely resembles the original private data is a scalable way to mitigate privacy concerns in the current data-driven world. In contrast to current practices that train customized models for this task, we aim to generate DP Synthetic Data via APIs (DPSDA), where we treat foundation models as blackboxes and only utilize their inference APIs…

    Submitted 17 May, 2025; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Published in ICLR 2024

  5. arXiv:2304.06929  [pdf]

    cs.CR

    Advancing Differential Privacy: Where We Are Now and Future Directions for Real-World Deployment

    Authors: Rachel Cummings, Damien Desfontaines, David Evans, Roxana Geambasu, Yangsibo Huang, Matthew Jagielski, Peter Kairouz, Gautam Kamath, Sewoong Oh, Olga Ohrimenko, Nicolas Papernot, Ryan Rogers, Milan Shen, Shuang Song, Weijie Su, Andreas Terzis, Abhradeep Thakurta, Sergei Vassilvitskii, Yu-Xiang Wang, Li Xiong, Sergey Yekhanin, Da Yu, Huanyu Zhang, Wanrong Zhang

    Abstract: In this article, we present a detailed review of current practices and state-of-the-art methodologies in the field of differential privacy (DP), with a focus on advancing DP's deployment in real-world applications. Key points and high-level contents of the article originated from the discussions at "Differential Privacy (DP): Challenges Towards the Next Frontier," a workshop held in July 20…

    Submitted 12 March, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

  6. arXiv:2110.06500  [pdf, other]

    cs.LG cs.CL cs.CR stat.ML

    Differentially Private Fine-tuning of Language Models

    Authors: Da Yu, Saurabh Naik, Arturs Backurs, Sivakanth Gopi, Huseyin A. Inan, Gautam Kamath, Janardhan Kulkarni, Yin Tat Lee, Andre Manoel, Lukas Wutschitz, Sergey Yekhanin, Huishuai Zhang

    Abstract: We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pre-trained language models, which achieve the state-of-the-art privacy versus utility tradeoffs on many standard NLP tasks. We propose a meta-framework for this problem, inspired by the recent success of highly parameter-efficient methods for fine-tuning. Our experiments show that differentially…

    Submitted 14 July, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: ICLR 2022. Code available at https://github.com/huseyinatahaninan/Differentially-Private-Fine-tuning-of-Language-Models

  7. arXiv:2108.02831  [pdf, other]

    cs.LG cs.CR cs.DS

    Differentially Private n-gram Extraction

    Authors: Kunho Kim, Sivakanth Gopi, Janardhan Kulkarni, Sergey Yekhanin

    Abstract: We revisit the problem of $n$-gram extraction in the differential privacy setting. In this problem, given a corpus of private text data, the goal is to release as many $n$-grams as possible while preserving user level privacy. Extracting $n$-grams is a fundamental subroutine in many NLP applications such as sentence completion, response generation for emails etc. The problem also arises in other a…

    Submitted 5 August, 2021; originally announced August 2021.

  8. arXiv:2107.06440  [pdf, other]

    cs.IT

    Trellis BMA: Coded Trace Reconstruction on IDS Channels for DNA Storage

    Authors: Sundara Rajan Srinivasavaradhan, Sivakanth Gopi, Henry D. Pfister, Sergey Yekhanin

    Abstract: Sequencing a DNA strand, as part of the read process in DNA storage, produces multiple noisy copies which can be combined to produce better estimates of the original strand; this is called trace reconstruction. One can reduce the error rate further by introducing redundancy in the write sequence and this is called coded trace reconstruction. In this paper, we model the DNA storage channel as an in…

    Submitted 20 August, 2024; v1 submitted 13 July, 2021; originally announced July 2021.

    Comments: Extended version of paper presented at ISIT 2021. On 8/20/2024, in Section III, added a note regarding the dataset of traces released with the paper

  9. arXiv:2011.14532  [pdf, other]

    cs.DS cs.IT math.CO math.PR q-bio.QM

    Batch Optimization for DNA Synthesis

    Authors: Konstantin Makarychev, Miklos Z. Racz, Cyrus Rashtchian, Sergey Yekhanin

    Abstract: Large pools of synthetic DNA molecules have been recently used to reliably store significant volumes of digital data. While DNA as a storage medium has enormous potential because of its high storage density, its practical use is currently severely limited because of the high cost and low throughput of available DNA synthesis technologies. We study the role of batch optimization in reducing the cos…

    Submitted 23 February, 2021; v1 submitted 29 November, 2020; originally announced November 2020.

    Comments: Improved Theorem 1.2 and its proof

  10. arXiv:2002.09745  [pdf, other]

    cs.CR cs.DS cs.LG stat.ML

    Differentially Private Set Union

    Authors: Sivakanth Gopi, Pankaj Gulhane, Janardhan Kulkarni, Judy Hanwen Shen, Milad Shokouhi, Sergey Yekhanin

    Abstract: We study the basic operation of set union in the global model of differential privacy. In this problem, we are given a universe $U$ of items, possibly of infinite size, and a database $D$ of users. Each user $i$ contributes a subset $W_i \subseteq U$ of items. We want an ($ε$,$δ$)-differentially private algorithm which outputs a subset $S \subset \cup_i W_i$ such that the size of $S$ is as large a…

    Submitted 6 April, 2022; v1 submitted 22 February, 2020; originally announced February 2020.

    Comments: 23 pages, 7 figures

  11. arXiv:1807.00736  [pdf, other]

    cs.CR cs.DS

    An Algorithmic Framework For Differentially Private Data Analysis on Trusted Processors

    Authors: Joshua Allen, Bolin Ding, Janardhan Kulkarni, Harsha Nori, Olga Ohrimenko, Sergey Yekhanin

    Abstract: Differential privacy has emerged as the main definition for private data analysis and machine learning. The {\em global} model of differential privacy, which assumes that users trust the data collector, provides strong privacy guarantees and introduces small errors in the output. In contrast, applications of differential privacy in commercial systems by Apple, Google, and Microsoft, use the {\em l…

    Submitted 26 October, 2019; v1 submitted 2 July, 2018; originally announced July 2018.

    Comments: Accepted at NeurIPS 2019

  12. arXiv:1712.01524  [pdf, other]

    cs.CR cs.DS

    Collecting Telemetry Data Privately

    Authors: Bolin Ding, Janardhan Kulkarni, Sergey Yekhanin

    Abstract: The collection and analysis of telemetry data from users' devices is routinely performed by many software companies. Telemetry collection leads to improved user experience but poses significant risks to users' privacy. Locally differentially private (LDP) algorithms have recently emerged as the main tool that allows data collectors to estimate various population statistics, while preserving privac…

    Submitted 5 December, 2017; originally announced December 2017.

    Comments: To appear in NIPS 2017

  13. arXiv:1710.10322  [pdf, other]

    cs.IT cs.CC

    Maximally Recoverable LRCs: A field size lower bound and constructions for few heavy parities

    Authors: Sivakanth Gopi, Venkatesan Guruswami, Sergey Yekhanin

    Abstract: The explosion in the volumes of data being stored online has resulted in distributed storage systems transitioning to erasure coding based schemes. Local Reconstruction Codes (LRCs) have emerged as the codes of choice for these applications. These codes can correct a small number of erasures by accessing only a small number of remaining coordinates. An $(n,r,h,a,q)$-LRC is a linear code over…

    Submitted 15 November, 2018; v1 submitted 27 October, 2017; originally announced October 2017.

    Comments: Conference version to appear in Symposium on Discrete Algorithms (SODA) 2018

  14. arXiv:1605.05412  [pdf, other]

    cs.IT

    Maximally Recoverable Codes for Grid-like Topologies

    Authors: Parikshit Gopalan, Guangda Hu, Swastik Kopparty, Shubhangi Saraf, Carol Wang, Sergey Yekhanin

    Abstract: The explosion in the volumes of data being stored online has resulted in distributed storage systems transitioning to erasure coding based schemes. Yet, the codes being deployed in practice are fairly short. In this work, we address what we view as the main coding theoretic barrier to deploying longer codes in storage: at large lengths, failures are not independent and correlated failures are inev…

    Submitted 20 September, 2016; v1 submitted 17 May, 2016; originally announced May 2016.

  15. arXiv:1605.02290  [pdf, ps, other]

    cs.IT

    New Constructions of SD and MR Codes over Small Finite Fields

    Authors: Guangda Hu, Sergey Yekhanin

    Abstract: Data storage applications require erasure-correcting codes with prescribed sets of dependencies between data symbols and redundant symbols. The most common arrangement is to have $k$ data symbols and $h$ redundant symbols (that each depends on all data symbols) be partitioned into a number of disjoint groups, where for each group one allocates an additional (local) redundant symbol storing the par…

    Submitted 8 May, 2016; originally announced May 2016.

  16. arXiv:1307.4150  [pdf, ps, other]

    cs.IT

    Explicit Maximally Recoverable Codes with Locality

    Authors: Parikshit Gopalan, Cheng Huang, Bob Jenkins, Sergey Yekhanin

    Abstract: Consider a systematic linear code where some (local) parity symbols depend on few prescribed symbols, while other (heavy) parity symbols may depend on all data symbols. Local parities allow to quickly recover any single symbol when it is erased, while heavy parities provide tolerance to a large number of simultaneous erasures. A code as above is maximally-recoverable if it corrects all erasure pat…

    Submitted 19 July, 2013; v1 submitted 15 July, 2013; originally announced July 2013.

    MSC Class: 94B05

  17. arXiv:1303.3921  [pdf, ps, other]

    cs.IT cs.DM

    On the Locality of Codeword Symbols in Non-Linear Codes

    Authors: Michael Forbes, Sergey Yekhanin

    Abstract: Consider a possibly non-linear (n,K,d)_q code. Coordinate i has locality r if its value is determined by some r other coordinates. A recent line of work obtained an optimal trade-off between information locality of codes and their redundancy. Further, for linear codes meeting this trade-off, structure theorems were derived. In this work we give a new proof of the locality / redundancy trade-off an…

    Submitted 15 March, 2013; originally announced March 2013.

  18. arXiv:1106.3625  [pdf, ps, other]

    cs.IT cs.CC cs.DM

    On the Locality of Codeword Symbols

    Authors: Parikshit Gopalan, Cheng Huang, Huseyin Simitci, Sergey Yekhanin

    Abstract: Consider a linear [n,k,d]_q code C. We say that the i-th coordinate of C has locality r, if the value at this coordinate can be recovered from accessing some other r coordinates of C. Data storage applications require codes with small redundancy, low locality for information coordinates, large distance, and low locality for parity coordinates. In this paper we carry out an in-depth study of the r…

    Submitted 18 June, 2011; originally announced June 2011.

  19. arXiv:1004.2294  [pdf, ps, other]

    math.CO

    Sets with large additive energy and symmetric sets

    Authors: Ilya Shkredov, Sergey Yekhanin

    Abstract: We show that for any set A in a finite Abelian group G that has at least c |A|^3 solutions to a_1 + a_2 = a_3 + a_4, where a_i belong to A, there exist sets A' in A and L in G, |L| \ll c^{-1} log |A|, such that A' is contained in Span of L and A' has approximately c |A|^3 solutions to a'_1 + a'_2 = a'_3 + a'_4, where a'_i belong to A'. We also study so-called symmetric sets or, in other words, sets of la…

    Submitted 13 April, 2010; originally announced April 2010.

  20. arXiv:0704.1694  [pdf, ps, other]

    cs.CC math.NT

    Locally Decodable Codes From Nice Subsets of Finite Fields and Prime Factors of Mersenne Numbers

    Authors: Kiran S. Kedlaya, Sergey Yekhanin

    Abstract: A k-query Locally Decodable Code (LDC) encodes an n-bit message x as an N-bit codeword C(x), such that one can probabilistically recover any bit x_i of the message by querying only k bits of the codeword C(x), even after some constant fraction of codeword bits has been corrupted. The major goal of LDC related research is to establish the optimal trade-off between length and query complexity of s…

    Submitted 13 April, 2007; originally announced April 2007.

    Comments: 18 pages

  21. arXiv:cs/0408017  [pdf, ps, other]

    cs.IT

    Improved Upper Bound for the Redundancy of Fix-Free Codes

    Authors: Sergey Yekhanin

    Abstract: A variable-length code is a fix-free code if no codeword is a prefix or a suffix of any other codeword. In a fix-free code any finite sequence of codewords can be decoded in both directions, which can improve the robustness to channel noise and speed up the decoding process. In this paper we prove a new sufficient condition of the existence of fix-free codes and improve the upper bound on the re…

    Submitted 5 August, 2004; originally announced August 2004.

  22. arXiv:cs/0406039  [pdf, ps, other]

    cs.IT

    Long Nonbinary Codes Exceeding the Gilbert - Varshamov Bound for any Fixed Distance

    Authors: Sergey Yekhanin, Ilya Dumer

    Abstract: Let A(q,n,d) denote the maximum size of a q-ary code of length n and distance d. We study the minimum asymptotic redundancy ρ(q,n,d)=n-log_q A(q,n,d) as n grows while q and d are fixed. For any d and q<=d-1, long algebraic codes are designed that improve on the BCH codes and have the lowest asymptotic redundancy ρ(q,n,d) <= ((d-3)+1/(d-2)) log_q n known to date. Prior to this work, codes of fixe…

    Submitted 23 June, 2004; v1 submitted 21 June, 2004; originally announced June 2004.

    Comments: Submitted to IEEE Trans. on Info. Theory