
Showing 1–50 of 53 results for author: Wu, Z S

Searching in archive stat.
  1. arXiv:2506.06488 [pdf, ps, other]

    cs.LG cs.CR stat.ML

    Membership Inference Attacks for Unseen Classes

    Authors: Pratiksha Thaker, Neil Kale, Zhiwei Steven Wu, Virginia Smith

    Abstract: Shadow model attacks are the state-of-the-art approach for membership inference attacks on machine learning models. However, these attacks typically assume an adversary has access to a background (nonmember) data distribution that matches the distribution the target model was trained on. We initiate a study of membership inference attacks where the adversary or auditor cannot access an entire subc…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: Preprint

  2. arXiv:2504.21199 [pdf, ps, other]

    stat.ML cs.CR cs.LG

    Generate-then-Verify: Reconstructing Data from Limited Published Statistics

    Authors: Terrance Liu, Eileen Xiao, Adam Smith, Pratiksha Thaker, Zhiwei Steven Wu

    Abstract: We study the problem of reconstructing tabular data from aggregate statistics, in which the attacker aims to identify interesting claims about the sensitive data that can be verified with 100% certainty given the aggregates. Successful attempts in prior work have conducted studies in settings where the set of published statistics is rich enough that entire datasets can be reconstructed with certai…

    Submitted 11 June, 2025; v1 submitted 29 April, 2025; originally announced April 2025.

    Comments: First two authors contributed equally. Remaining authors are ordered alphabetically

  3. arXiv:2504.15615 [pdf, ps, other]

    cs.LG stat.ML

    Dimension-Free Decision Calibration for Nonlinear Loss Functions

    Authors: Jingwu Tang, Jiayun Wu, Zhiwei Steven Wu, Jiahao Zhang

    Abstract: When model predictions inform downstream decision making, a natural question is under what conditions decision-makers can simply respond to the predictions as if they were the true outcomes. Calibration suffices to guarantee that simple best-response to predictions is optimal. However, calibration for high-dimensional prediction outcome spaces requires exponential computational and statistical…

    Submitted 22 April, 2025; originally announced April 2025.

  4. arXiv:2502.17264 [pdf, other]

    cs.LG stat.ML

    Kandinsky Conformal Prediction: Beyond Class- and Covariate-Conditional Coverage

    Authors: Konstantina Bairaktari, Jiayun Wu, Zhiwei Steven Wu

    Abstract: Conformal prediction is a powerful distribution-free framework for constructing prediction sets with coverage guarantees. Classical methods, such as split conformal prediction, provide marginal coverage, ensuring that the prediction set contains the label of a random test point with a target probability. However, these guarantees may not hold uniformly across different subpopulations, leading to d…

    Submitted 24 February, 2025; originally announced February 2025.

  5. arXiv:2406.01933 [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    Orthogonal Causal Calibration

    Authors: Justin Whitehouse, Christopher Jung, Vasilis Syrgkanis, Bryan Wilder, Zhiwei Steven Wu

    Abstract: Estimates of heterogeneous treatment effects such as conditional average treatment effects (CATEs) and conditional quantile treatment effects (CQTEs) play an important role in real-world decision making. Given this importance, one should ensure these estimates are calibrated. While there is a rich literature on calibrating estimators of non-causal parameters, very few methods have been derived for…

    Submitted 30 April, 2025; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: 47 pages, 2 figures

  6. arXiv:2404.00848 [pdf, other]

    cs.LG cs.CY stat.ME

    Predictive Performance Comparison of Decision Policies Under Confounding

    Authors: Luke Guerdan, Amanda Coston, Kenneth Holstein, Zhiwei Steven Wu

    Abstract: Predictive models are often introduced to decision-making tasks under the rationale that they improve performance over an existing decision-making policy. However, it is challenging to compare predictive performance against an existing decision-making policy that is generally under-specified and dependent on unobservable factors. These sources of uncertainty are often addressed in practice by maki…

    Submitted 11 June, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

    Comments: ICML 2024

  7. arXiv:2403.05006 [pdf, ps, other]

    cs.LG cs.AI stat.ME stat.ML

    Provable Multi-Party Reinforcement Learning with Diverse Human Feedback

    Authors: Huiying Zhong, Zhun Deng, Weijie J. Su, Zhiwei Steven Wu, Linjun Zhang

    Abstract: Reinforcement learning with human feedback (RLHF) is an emerging paradigm to align models with human preferences. Typically, RLHF aggregates preferences from multiple individuals who have diverse viewpoints that may conflict with each other. Our work initiates the theoretical study of multi-party RLHF that explicitly models the diverse preferences of multiple individuals. We show how trad…

    Submitted 7 March, 2024; originally announced March 2024.

  8. arXiv:2312.16307 [pdf, other]

    econ.EM cs.GT cs.LG stat.ME

    Incentive-Aware Synthetic Control: Accurate Counterfactual Estimation via Incentivized Exploration

    Authors: Daniel Ngo, Keegan Harris, Anish Agarwal, Vasilis Syrgkanis, Zhiwei Steven Wu

    Abstract: We consider the setting of synthetic control methods (SCMs), a canonical approach used to estimate the treatment effect on the treated in a panel data setting. We shed light on a frequently overlooked but ubiquitous assumption made in SCMs of "overlap": a treated unit can be written as some combination (typically a convex or linear combination) of the units that remain under control. We show th…

    Submitted 13 February, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

  9. arXiv:2312.15551 [pdf, other]

    cs.LG cs.CR stat.ML

    On the Benefits of Public Representations for Private Transfer Learning under Distribution Shift

    Authors: Pratiksha Thaker, Amrith Setlur, Zhiwei Steven Wu, Virginia Smith

    Abstract: Public pretraining is a promising approach to improve differentially private model training. However, recent work has noted that many positive research results studying this paradigm only consider in-distribution tasks, and may not apply to settings where there is distribution shift between the pretraining and finetuning data, a scenario that is likely when finetuning private tasks due to the se…

    Submitted 1 September, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

  10. arXiv:2310.09100 [pdf, other]

    math.PR math.ST stat.ME

    Time-Uniform Self-Normalized Concentration for Vector-Valued Processes

    Authors: Justin Whitehouse, Zhiwei Steven Wu, Aaditya Ramdas

    Abstract: Self-normalized processes arise naturally in many learning-related tasks. While self-normalized concentration has been extensively studied for scalar-valued processes, there are few results for multidimensional processes outside of the sub-Gaussian setting. In this work, we construct a general, self-normalized inequality for multivariate processes that satisfy a simple yet broad sub-$\psi$ tail condi…

    Submitted 30 April, 2025; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: 49 pages, 4 figures

  11. arXiv:2307.07539 [pdf, ps, other]

    cs.LG math.ST stat.ML

    On the Sublinear Regret of GP-UCB

    Authors: Justin Whitehouse, Zhiwei Steven Wu, Aaditya Ramdas

    Abstract: In the kernelized bandit problem, a learner aims to sequentially compute the optimum of a function lying in a reproducing kernel Hilbert space given only noisy evaluations at sequentially chosen points. In particular, the learner aims to minimize regret, which is a measure of the suboptimality of the choices made. Arguably the most popular algorithm is the Gaussian Process Upper Confidence Bound (…

    Submitted 14 August, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

    Comments: 20 pages, 0 figures

  12. arXiv:2307.02295 [pdf, other]

    cs.LG cs.AI stat.ML

    Meta-Learning Adversarial Bandit Algorithms

    Authors: Mikhail Khodak, Ilya Osadchiy, Keegan Harris, Maria-Florina Balcan, Kfir Y. Levy, Ron Meir, Zhiwei Steven Wu

    Abstract: We study online meta-learning with bandit feedback, with the goal of improving performance across multiple tasks if they are similar according to some natural similarity measure. As the first to target the adversarial online-within-online partial-information setting, we design meta-algorithms that combine outer learners to simultaneously tune the initialization and other hyperparameters of an inne…

    Submitted 1 November, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: Merger of arXiv:2205.14128 and arXiv:2205.15921, with some additional improvements; to appear in NeurIPS 2023

  13. arXiv:2307.01357 [pdf, other]

    cs.LG econ.EM stat.ME stat.ML

    Adaptive Principal Component Regression with Applications to Panel Data

    Authors: Anish Agarwal, Keegan Harris, Justin Whitehouse, Zhiwei Steven Wu

    Abstract: Principal component regression (PCR) is a popular technique for fixed-design error-in-variables regression, a generalization of the linear regression setting in which the observed covariates are corrupted with random noise. We provide the first time-uniform finite sample guarantees for (regularized) PCR whenever data is collected adaptively. Since the proof techniques for analyzing PCR in the fixe…

    Submitted 4 August, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

  14. arXiv:2303.01256 [pdf, other]

    stat.ML cs.CR cs.CV cs.DS cs.LG

    Choosing Public Datasets for Private Machine Learning via Gradient Subspace Distance

    Authors: Xin Gu, Gautam Kamath, Zhiwei Steven Wu

    Abstract: Differentially private stochastic gradient descent privatizes model training by injecting noise into each iteration, where the noise magnitude increases with the number of model parameters. Recent works suggest that we can reduce the noise by leveraging public data for private machine learning, by projecting gradients onto a subspace prescribed by the public data. However, given a choice of public…

    Submitted 2 March, 2023; originally announced March 2023.

  15. arXiv:2302.11121 [pdf, other]

    cs.LG cs.CY cs.HC stat.ME

    Counterfactual Prediction Under Outcome Measurement Error

    Authors: Luke Guerdan, Amanda Coston, Kenneth Holstein, Zhiwei Steven Wu

    Abstract: Across domains such as medicine, employment, and criminal justice, predictive models often target labels that imperfectly reflect the outcomes of interest to experts and policymakers. For example, clinical risk assessments deployed to inform physician decision-making often predict measures of healthcare utilization (e.g., costs, hospitalization) as a proxy for patient medical need. These proxies c…

    Submitted 17 May, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

    Comments: FAccT 2023

  16. arXiv:2206.07902 [pdf, other]

    cs.LG cs.CR stat.ML

    On Privacy and Personalization in Cross-Silo Federated Learning

    Authors: Ziyu Liu, Shengyuan Hu, Zhiwei Steven Wu, Virginia Smith

    Abstract: While the application of differential privacy (DP) has been well-studied in cross-device federated learning (FL), there is a lack of work considering DP and its implications for cross-silo FL, a setting characterized by a limited number of clients each containing many data subjects. In cross-silo FL, usual notions of client-level DP are less suitable as real-world privacy regulations typically con…

    Submitted 17 October, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022, 37 pages

  17. arXiv:2206.07234 [pdf, other]

    cs.LG cs.CR cs.DS stat.ML

    Brownian Noise Reduction: Maximizing Privacy Subject to Accuracy Constraints

    Authors: Justin Whitehouse, Zhiwei Steven Wu, Aaditya Ramdas, Ryan Rogers

    Abstract: There is a disconnect between how researchers and practitioners handle privacy-utility tradeoffs. Researchers primarily operate from a privacy first perspective, setting strict privacy requirements and minimizing risk subject to these constraints. Practitioners often desire an accuracy first perspective, possibly satisfied with the greatest privacy they can get subject to obtaining sufficiently sm…

    Submitted 10 November, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: 26 pages, 4 figures

  18. arXiv:2205.15397 [pdf, other]

    cs.LG stat.ML

    Minimax Optimal Online Imitation Learning via Replay Estimation

    Authors: Gokul Swamy, Nived Rajaraman, Matthew Peng, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu, Jiantao Jiao, Kannan Ramchandran

    Abstract: Online imitation learning is the problem of how best to mimic expert demonstrations, given access to the environment or an accurate simulator. Prior work has shown that in the infinite sample regime, exact moment matching achieves value equivalence to the expert policy. However, in the finite sample regime, even if one has no optimization error, empirical variance can lead to a performance gap tha…

    Submitted 14 January, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

  19. arXiv:2205.14128 [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Meta-Learning Adversarial Bandits

    Authors: Maria-Florina Balcan, Keegan Harris, Mikhail Khodak, Zhiwei Steven Wu

    Abstract: We study online learning with bandit feedback across multiple tasks, with the goal of improving average performance across tasks if they are similar according to some natural task-similarity measure. As the first to target the adversarial setting, we design a unified meta-algorithm that yields setting-specific guarantees for two important cases: multi-armed bandits (MAB) and bandit linear optimiza…

    Submitted 27 May, 2022; originally announced May 2022.

    Comments: 19 pages

  20. arXiv:2203.05481 [pdf, other]

    cs.LG cs.CR cs.DS stat.ML

    Fully Adaptive Composition in Differential Privacy

    Authors: Justin Whitehouse, Aaditya Ramdas, Ryan Rogers, Zhiwei Steven Wu

    Abstract: Composition is a key feature of differential privacy. Well-known advanced composition theorems allow one to query a private database quadratically more times than basic privacy composition would permit. However, these results require that the privacy parameters of all algorithms be fixed before interacting with the data. To address this, Rogers et al. introduced fully adaptive composition, wherein…

    Submitted 24 October, 2023; v1 submitted 10 March, 2022; originally announced March 2022.

    Comments: 23 pages, 3 figures

  21. arXiv:2202.08728 [pdf, other]

    stat.ME cs.CR math.ST stat.ML

    Nonparametric extensions of randomized response for private confidence sets

    Authors: Ian Waudby-Smith, Zhiwei Steven Wu, Aaditya Ramdas

    Abstract: This work derives methods for performing nonparametric, nonasymptotic statistical inference for population means under the constraint of local differential privacy (LDP). Given bounded observations $(X_1, \dots, X_n)$ with mean $\mu^\star$ that are privatized into $(Z_1, \dots, Z_n)$, we present confidence intervals (CI) and time-uniform confidence sequences (CS) for $\mu^\star$ when only given access…

    Submitted 24 July, 2024; v1 submitted 17 February, 2022; originally announced February 2022.

    Comments: 50 pages, 7 figures, to appear in the 2023 International Conference on Machine Learning with an Oral Presentation

  22. arXiv:2202.05318 [pdf, other]

    stat.ML cs.CR cs.LG math.OC

    Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning

    Authors: Alberto Bietti, Chen-Yu Wei, Miroslav Dudík, John Langford, Zhiwei Steven Wu

    Abstract: Large-scale machine learning systems often involve data distributed across a collection of users. Federated learning algorithms leverage this structure by communicating model updates to a central server, rather than entire datasets. In this paper, we study stochastic optimization algorithms for a personalized federated learning setting involving local and global models subject to user-level (joint…

    Submitted 15 July, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

    Comments: ICML

  23. arXiv:2103.03236 [pdf, other]

    cs.LG cs.RO stat.ML

    Of Moments and Matching: A Game-Theoretic Framework for Closing the Imitation Gap

    Authors: Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu

    Abstract: We provide a unifying view of a large family of previous imitation learning algorithms through the lens of moment matching. At its core, our classification scheme is based on whether the learner attempts to match (1) reward or (2) action-value moments of the expert's behavior, with each option leading to differing algorithmic approaches. By considering adversarially chosen divergences between lear…

    Submitted 10 June, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

  24. arXiv:2009.09052 [pdf, ps, other]

    cs.LG cs.CR stat.ML

    Private Reinforcement Learning with PAC and Regret Guarantees

    Authors: Giuseppe Vietri, Borja Balle, Akshay Krishnamurthy, Zhiwei Steven Wu

    Abstract: Motivated by high-stakes decision-making domains like personalized medicine where user information is inherently sensitive, we design privacy preserving exploration policies for episodic reinforcement learning (RL). We first provide a meaningful privacy formulation using the notion of joint differential privacy (JDP), a strong variant of differential privacy for settings where each user receives t…

    Submitted 18 September, 2020; originally announced September 2020.

  25. arXiv:2008.11707 [pdf, other]

    cs.LG cs.AI cs.CY stat.ML

    Bandit Data-Driven Optimization

    Authors: Zheyuan Ryan Shi, Zhiwei Steven Wu, Rayid Ghani, Fei Fang

    Abstract: Applications of machine learning in the non-profit and public sectors often feature an iterative workflow of data acquisition, prediction, and optimization of interventions. There are four major pain points that a machine learning pipeline must overcome in order to be actually useful in these settings: small data, data collected only under the default intervention, unmodeled objectives due to comm…

    Submitted 14 January, 2022; v1 submitted 26 August, 2020; originally announced August 2020.

    Comments: This is the complete version of the paper. A version of this paper is also published at AAAI-22

  26. arXiv:2007.11934 [pdf, other]

    cs.LG cs.CR cs.CY stat.ML

    Private Post-GAN Boosting

    Authors: Marcel Neunhoeffer, Zhiwei Steven Wu, Cynthia Dwork

    Abstract: Differentially private GANs have proven to be a promising approach for generating realistic synthetic data without compromising the privacy of individuals. Due to the privacy-protective noise introduced in the training, the convergence of GANs becomes even more elusive, which often leads to poor utility in the output generator at the end of training. We propose Private post-GAN boosting (Private P…

    Submitted 25 March, 2021; v1 submitted 23 July, 2020; originally announced July 2020.

    Journal ref: International Conference on Learning Representations, 2021

  27. arXiv:2007.05453 [pdf, other]

    cs.LG cs.DS stat.ML

    New Oracle-Efficient Algorithms for Private Synthetic Data Release

    Authors: Giuseppe Vietri, Grace Tian, Mark Bun, Thomas Steinke, Zhiwei Steven Wu

    Abstract: We present three new algorithms for constructing differentially private synthetic data, a sanitized version of a sensitive dataset that approximately preserves the answers to a large collection of statistical queries. All three algorithms are oracle-efficient in the sense that they are computationally efficient when given access to an optimization oracle. Such an oracle can be implemented…

    Submitted 10 July, 2020; originally announced July 2020.

  28. arXiv:2007.03813 [pdf, other]

    cs.LG cs.CR stat.ML

    Bypassing the Ambient Dimension: Private SGD with Gradient Subspace Identification

    Authors: Yingxue Zhou, Zhiwei Steven Wu, Arindam Banerjee

    Abstract: Differentially private SGD (DP-SGD) is one of the most popular methods for solving differentially private empirical risk minimization (ERM). Due to its noisy perturbation on each gradient update, the error rate of DP-SGD scales with the ambient dimension $p$, the number of parameters in the model. Such dependence can be problematic for over-parameterized models where $p \gg n$, the number of train…

    Submitted 23 April, 2021; v1 submitted 7 July, 2020; originally announced July 2020.

  29. arXiv:2006.15429 [pdf, other]

    cs.LG cs.CR math.OC stat.ML

    Understanding Gradient Clipping in Private SGD: A Geometric Perspective

    Authors: Xiangyi Chen, Zhiwei Steven Wu, Mingyi Hong

    Abstract: Deep learning models are increasingly popular in many machine learning applications where the training data may contain sensitive information. To provide formal and rigorous privacy guarantees, many learning systems now incorporate differential privacy by training their models with (differentially) private SGD. A key step in each private SGD update is gradient clipping that shrinks the gradient of…

    Submitted 17 March, 2021; v1 submitted 27 June, 2020; originally announced June 2020.

  30. arXiv:2006.13501 [pdf, other]

    cs.LG cs.CR stat.ML

    Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds

    Authors: Yingxue Zhou, Xiangyi Chen, Mingyi Hong, Zhiwei Steven Wu, Arindam Banerjee

    Abstract: We study differentially private (DP) algorithms for stochastic non-convex optimization. In this problem, the goal is to minimize the population loss over a $p$-dimensional space given $n$ i.i.d. samples drawn from a distribution. We improve upon the population gradient bound of $\sqrt{p}/\sqrt{n}$ from prior work and obtain a sharper rate of $\sqrt[4]{p}/\sqrt{n}$. We obtain this rate by provi…

    Submitted 10 August, 2020; v1 submitted 24 June, 2020; originally announced June 2020.

    Comments: In the current version, we drop the experimental results on CIFAR-10 dataset due to an implementation error

  31. arXiv:2005.10624 [pdf, ps, other]

    cs.LG stat.ML

    Greedy Algorithm almost Dominates in Smoothed Contextual Bandits

    Authors: Manish Raghavan, Aleksandrs Slivkins, Jennifer Wortman Vaughan, Zhiwei Steven Wu

    Abstract: Online learning algorithms, widely used to power search and content optimization on the web, must balance exploration and exploitation, potentially sacrificing the experience of current users in order to gain information that will lead to better decisions in the future. While necessary in the worst case, explicit exploration has a number of disadvantages compared to the greedy algorithm that alway…

    Submitted 27 December, 2021; v1 submitted 19 May, 2020; originally announced May 2020.

    Comments: Results in this paper, without any proofs, have been announced in an extended abstract (Raghavan et al., 2018a), and fleshed out in the technical report (Raghavan et al., 2018b [arXiv:1806.00543]). This manuscript covers a subset of results from Raghavan et al. (2018a,b), focusing on the greedy algorithm, and is streamlined accordingly

  32. arXiv:2004.10941 [pdf, other]

    cs.LG stat.ML

    Private Query Release Assisted by Public Data

    Authors: Raef Bassily, Albert Cheu, Shay Moran, Aleksandar Nikolov, Jonathan Ullman, Zhiwei Steven Wu

    Abstract: We study the problem of differentially private query release assisted by access to public data. In this problem, the goal is to answer a large class $\mathcal{H}$ of statistical queries with error no more than $\alpha$ using a combination of public and private samples. The algorithm is required to satisfy differential privacy only with respect to the private samples. We study the limits of this task in…

    Submitted 22 April, 2020; originally announced April 2020.

  33. arXiv:2002.11332 [pdf, ps, other]

    cs.LG math.ST stat.ML

    Structured Linear Contextual Bandits: A Sharp and Geometric Smoothed Analysis

    Authors: Vidyashankar Sivakumar, Zhiwei Steven Wu, Arindam Banerjee

    Abstract: Bandit learning algorithms typically involve the balance of exploration and exploitation. However, in many practical applications, worst-case scenarios needing systematic exploration are seldom encountered. In this work, we consider a smoothed setting for structured linear contextual bandits where the adversarial contexts are perturbed by Gaussian noise and the unknown parameter $\theta^*$ has structur…

    Submitted 26 February, 2020; originally announced February 2020.

  34. arXiv:2002.09465 [pdf, other]

    cs.DS cs.CR cs.IT cs.LG stat.ML

    Locally Private Hypothesis Selection

    Authors: Sivakanth Gopi, Gautam Kamath, Janardhan Kulkarni, Aleksandar Nikolov, Zhiwei Steven Wu, Huanyu Zhang

    Abstract: We initiate the study of hypothesis selection under local differential privacy. Given samples from an unknown probability distribution $p$ and a set of $k$ probability distributions $\mathcal{Q}$, we aim to output, under the constraints of $\varepsilon$-local differential privacy, a distribution from $\mathcal{Q}$ whose total variation distance to $p$ is comparable to the best such distribution. T…

    Submitted 19 June, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

    Comments: To appear in COLT 2020

  35. arXiv:2002.09463 [pdf, ps, other]

    cs.DS cs.CR cs.LG stat.ML

    Privately Learning Markov Random Fields

    Authors: Huanyu Zhang, Gautam Kamath, Janardhan Kulkarni, Zhiwei Steven Wu

    Abstract: We consider the problem of learning Markov Random Fields (including the prototypical example, the Ising model) under the constraint of differential privacy. Our learning goals include both structure learning, where we try to estimate the underlying graph structure of the model, as well as the harder goal of parameter learning, in which we additionally estimate the parameter on each edge. We provid…

    Submitted 14 August, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

  36. arXiv:2002.07024 [pdf, ps, other]

    cs.LG cs.GT stat.ML

    Gaming Helps! Learning from Strategic Interactions in Natural Dynamics

    Authors: Yahav Bechavod, Katrina Ligett, Zhiwei Steven Wu, Juba Ziani

    Abstract: We consider an online regression setting in which individuals adapt to the regression model: arriving individuals are aware of the current model, and invest strategically in modifying their own features so as to improve the predicted score that the current model assigns to them. Such feature manipulation has been observed in various scenarios (from credit assessment to school admissions), posin…

    Submitted 28 February, 2021; v1 submitted 17 February, 2020; originally announced February 2020.

    Comments: The conference version of this paper is to appear in the Proceedings of AISTATS 2021. 27 pages

  37. arXiv:2002.05660 [pdf, other]

    cs.LG stat.ML

    Learn to Expect the Unexpected: Probably Approximately Correct Domain Generalization

    Authors: Vikas K. Garg, Adam Kalai, Katrina Ligett, Zhiwei Steven Wu

    Abstract: Domain generalization is the problem of machine learning when the training data and the test data come from different data domains. We present a simple theoretical model of learning to generalize across domains in which there is a meta-distribution over data distributions, and those data distributions may even have different supports. In our model, the training data given to a learning algorithm c…

    Submitted 13 February, 2020; originally announced February 2020.

  38. arXiv:2002.05474 [pdf, ps, other]

    cs.LG stat.ML

    Metric-Free Individual Fairness in Online Learning

    Authors: Yahav Bechavod, Christopher Jung, Zhiwei Steven Wu

    Abstract: We study an online learning problem subject to the constraint of individual fairness, which requires that similar individuals are treated similarly. Unlike prior work on individual fairness, we do not assume the similarity measure among individuals is known, nor do we assume that such measure takes a certain parametric form. Instead, we leverage the existence of an auditor who detects fairness vio…

    Submitted 23 April, 2022; v1 submitted 13 February, 2020; originally announced February 2020.

  39. arXiv:1910.04930 [pdf, other]

    cs.LG math.ST stat.ML

    Random Quadratic Forms with Dependence: Applications to Restricted Isometry and Beyond

    Authors: Arindam Banerjee, Qilong Gu, Vidyashankar Sivakumar, Zhiwei Steven Wu

    Abstract: Several important families of computational and statistical results in machine learning and randomized algorithms rely on uniform bounds on quadratic forms of random vectors or matrices. Such results include the Johnson-Lindenstrauss (J-L) Lemma, the Restricted Isometry Property (RIP), randomized sketching algorithms, and approximate linear algebra. The existing results critically depend on statis…

    Submitted 5 December, 2019; v1 submitted 10 October, 2019; originally announced October 2019.

  40. arXiv:1909.01783 [pdf, ps, other]

    cs.LG cs.CR stat.ML

    Oracle Efficient Private Non-Convex Optimization

    Authors: Seth Neel, Aaron Roth, Giuseppe Vietri, Zhiwei Steven Wu

    Abstract: One of the most effective algorithms for differentially private learning and optimization is objective perturbation. This technique augments a given optimization problem (e.g. deriving from an ERM problem) with a random linear term, and then exactly solves it. However, to date, analyses of this approach crucially rely on the convexity and smoothness of the objective function, limiting its generali…

    Submitted 29 December, 2020; v1 submitted 3 September, 2019; originally announced September 2019.

  41. arXiv:1906.01736 [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms

    Authors: Xiangyi Chen, Tiancong Chen, Haoran Sun, Zhiwei Steven Wu, Mingyi Hong

    Abstract: Recently, there has been growing interest in the study of median-based algorithms for distributed non-convex optimization. Two prominent such algorithms are signSGD with majority vote, an effective approach for communication reduction via 1-bit compression on the local gradients, and medianSGD, an algorithm recently proposed to ensure robustness against Byzantine workers. The convergence analyses…

    Submitted 6 June, 2019; v1 submitted 4 June, 2019; originally announced June 2019.

  42. arXiv:1905.13229 [pdf, ps, other]

    cs.DS cs.CR cs.LG stat.ML

    Private Hypothesis Selection

    Authors: Mark Bun, Gautam Kamath, Thomas Steinke, Zhiwei Steven Wu

    Abstract: We provide a differentially private algorithm for hypothesis selection. Given samples from an unknown probability distribution $P$ and a set of $m$ probability distributions $\mathcal{H}$, the goal is to output, in an $\varepsilon$-differentially private manner, a distribution from $\mathcal{H}$ whose total variation distance to $P$ is comparable to that of the best such distribution (which we deno…

    Submitted 4 January, 2021; v1 submitted 30 May, 2019; originally announced May 2019.

    Comments: Appeared in NeurIPS 2019. Final version to appear in IEEE Transactions on Information Theory

  43. arXiv:1905.12843

    cs.LG stat.ML

    Fair Regression: Quantitative Definitions and Reduction-based Algorithms

    Authors: Alekh Agarwal, Miroslav Dudík, Zhiwei Steven Wu

    Abstract: In this paper, we study the prediction of a real-valued target, such as a risk score or recidivism rate, while guaranteeing a quantitative notion of fairness with respect to a protected attribute such as gender or race. We call this class of problems fair regression. We propose general schemes for fair regression under two notions of fairness: (1) statistical parity, which asks that the pre…

    Submitted 29 May, 2019; originally announced May 2019.

  44. arXiv:1905.10660

    cs.LG stat.ML

    An Algorithmic Framework for Fairness Elicitation

    Authors: Christopher Jung, Michael Kearns, Seth Neel, Aaron Roth, Logan Stapleton, Zhiwei Steven Wu

    Abstract: We consider settings in which the right notion of fairness is not captured by simple mathematical definitions (such as equality of error rates across groups), but might be more complex and nuanced and thus require elicitation from individual or collective stakeholders. We introduce a framework in which pairs of individuals can be identified as requiring (approximately) equal treatment under a lear…

    Submitted 14 October, 2020; v1 submitted 25 May, 2019; originally announced May 2019.

  45. arXiv:1902.02242

    cs.LG stat.ML

    Equal Opportunity in Online Classification with Partial Feedback

    Authors: Yahav Bechavod, Katrina Ligett, Aaron Roth, Bo Waggoner, Zhiwei Steven Wu

    Abstract: We study an online classification problem with partial feedback in which individuals arrive one at a time from a fixed but unknown distribution, and must be classified as positive or negative. Our algorithm only observes the true label of an individual if they are given a positive classification. This setting captures many classification problems for which fairness is a concern: for example, in cr…

    Submitted 16 April, 2020; v1 submitted 6 February, 2019; originally announced February 2019.

    Comments: The conference version of this paper appears in the Proceedings of NeurIPS 2019. 29 pages

  46. arXiv:1812.01484

    cs.LG stat.ML

    Privacy-Preserving Distributed Deep Learning for Clinical Data

    Authors: Brett K. Beaulieu-Jones, William Yuan, Samuel G. Finlayson, Zhiwei Steven Wu

    Abstract: Deep learning with medical data often requires larger sample sizes than are available at single providers. While data sharing among institutions is desirable to train more accurate and sophisticated models, it can lead to severe privacy concerns due to the sensitive nature of the data. This problem has motivated a number of studies on distributed training of neural networks that do not require direc…

    Submitted 4 December, 2018; originally announced December 2018.

    Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

  47. arXiv:1811.08382

    cs.LG stat.ML

    Locally Private Gaussian Estimation

    Authors: Matthew Joseph, Janardhan Kulkarni, Jieming Mao, Zhiwei Steven Wu

    Abstract: We study a basic private estimation problem: each of $n$ users draws a single i.i.d. sample from an unknown Gaussian distribution, and the goal is to estimate the mean of this Gaussian distribution while satisfying local differential privacy for each user. Informally, local differential privacy requires that each data point is individually and independently privatized before it is passed to a lear…

    Submitted 27 October, 2019; v1 submitted 20 November, 2018; originally announced November 2018.

  48. arXiv:1811.07765

    cs.LG cs.CR cs.DS stat.ML

    How to Use Heuristics for Differential Privacy

    Authors: Seth Neel, Aaron Roth, Zhiwei Steven Wu

    Abstract: We develop theory for using heuristics to solve computationally hard problems in differential privacy. Heuristic approaches have enjoyed tremendous success in machine learning, for which performance can be empirically evaluated. However, privacy guarantees cannot be evaluated empirically, and must be proven without making heuristic assumptions. We show that learning problems over broad classes…

    Submitted 19 November, 2018; originally announced November 2018.

  49. arXiv:1808.08166

    cs.LG stat.ML

    An Empirical Study of Rich Subgroup Fairness for Machine Learning

    Authors: Michael Kearns, Seth Neel, Aaron Roth, Zhiwei Steven Wu

    Abstract: Kearns et al. [2018] recently proposed a notion of rich subgroup fairness intended to bridge the gap between statistical and individual notions of fairness. Rich subgroup fairness picks a statistical fairness constraint (say, equalizing false positive rates across protected groups), but then asks that this constraint hold over an exponentially or infinitely large collection of subgroups defined by…

    Submitted 24 August, 2018; originally announced August 2018.

  50. arXiv:1806.03467

    cs.LG econ.EM math.ST stat.ML

    Orthogonal Random Forest for Causal Inference

    Authors: Miruna Oprescu, Vasilis Syrgkanis, Zhiwei Steven Wu

    Abstract: We propose the orthogonal random forest, an algorithm that combines Neyman-orthogonality to reduce sensitivity with respect to estimation error of nuisance parameters with generalized random forests (Athey et al., 2017), a flexible non-parametric method for statistical estimation of conditional moment models using random forests. We provide a consistency rate and establish asymptotic normality for…

    Submitted 25 September, 2019; v1 submitted 9 June, 2018; originally announced June 2018.

    Comments: This paper appeared in the Proceedings of the 36th International Conference on Machine Learning
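    The abstract of "Oracle Efficient Private Non-Convex Optimization" describes objective perturbation as adding a random linear term to an optimization problem and then solving it exactly. A minimal sketch of that idea follows, assuming a one-dimensional ridge-regression objective chosen only because its minimizer has a closed form. The function name and the `noise_scale` parameter are hypothetical, and the noise is not calibrated to any privacy level (that calibration is what the convexity and smoothness analyses mentioned in the abstract supply). This is not the paper's oracle-efficient non-convex method.

```python
import random


def objective_perturbation_ridge_1d(xs, ys, lam, noise_scale, rng=None):
    """Hypothetical sketch of objective perturbation for 1-D ridge regression.

    Minimizes  (1/n) * sum((y_i - theta * x_i)^2) + lam * theta^2 + (b / n) * theta,
    where b is a random draw (Laplace here). noise_scale is an illustrative
    parameter and is NOT calibrated to any privacy guarantee.
    """
    rng = rng or random.Random()
    n = len(xs)
    if noise_scale > 0:
        # Laplace(0, noise_scale) as the difference of two exponentials with mean noise_scale.
        b = rng.expovariate(1 / noise_scale) - rng.expovariate(1 / noise_scale)
    else:
        b = 0.0
    sxx = sum(x * x for x in xs) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) / n
    # The perturbed objective is quadratic in theta; setting its derivative
    # 2 * (theta * sxx - sxy) + 2 * lam * theta + b / n to zero gives the exact minimizer.
    # lam > 0 keeps the denominator positive even if every x_i is zero.
    return (2 * sxy - b / n) / (2 * sxx + 2 * lam)


if __name__ == "__main__":
    xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
    print(objective_perturbation_ridge_1d(xs, ys, lam=0.1, noise_scale=1.0, rng=random.Random(0)))
```

    Setting `noise_scale=0` recovers ordinary ridge regression, which is a quick sanity check on the closed form before any noise is added.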
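    The abstract of "Distributed Training with Heterogeneous Data" names two median-based aggregation rules: signSGD with majority vote, where each worker sends 1-bit signs of its local gradient, and medianSGD. A toy sketch contrasting them with plain averaging follows, assuming the server aggregates coordinate by coordinate. The function names and the example gradients (including an outlying third worker) are hypothetical, and this is not the algorithm proposed in the paper.

```python
from statistics import median


def sign(v):
    """Return 1, -1, or 0 (the 1-bit message, plus 0 for an exact zero)."""
    return (v > 0) - (v < 0)


def majority_vote_sign(grads):
    """signSGD-style aggregation: each worker sends sign(g_j); the server
    returns the sign of the summed votes per coordinate (0 on a tie)."""
    dim = len(grads[0])
    return [sign(sum(sign(g[j]) for g in grads)) for j in range(dim)]


def coordinate_median(grads):
    """medianSGD-style aggregation: coordinate-wise median across workers."""
    dim = len(grads[0])
    return [median(g[j] for g in grads) for j in range(dim)]


def mean_aggregate(grads):
    """Baseline: coordinate-wise average across workers."""
    dim = len(grads[0])
    return [sum(g[j] for g in grads) / len(grads) for j in range(dim)]


def sgd_step(params, direction, lr):
    """Move the parameters against the aggregated direction."""
    return [p - lr * d for p, d in zip(params, direction)]
```

    For gradients `[[1.0, -2.0], [0.5, -1.0], [100.0, 50.0]]`, the mean's second coordinate turns positive because of the third worker, while the majority vote returns `[1, -1]` and the coordinate-wise median returns `[1.0, -1.0]`.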
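    The abstract of "Locally Private Gaussian Estimation" describes local differential privacy as privatizing each data point individually before it reaches the learner. The simplest illustration of that model is sketched below, assuming each sample is clipped to a hypothetical range `[-bound, bound]` and perturbed with the Laplace mechanism. This clip-and-add-noise baseline is a standard mechanism swapped in for illustration, not the estimator from the paper; `bound`, `epsilon`, and the function names are assumptions.

```python
import random


def laplace(rng, scale):
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def privatize(x, bound, epsilon, rng):
    """Local randomizer run by one user on their own sample (hypothetical sketch)."""
    clipped = max(-bound, min(bound, x))
    # Any two clipped inputs differ by at most 2 * bound, so Laplace noise with
    # scale 2 * bound / epsilon makes this single report epsilon-differentially private.
    return clipped + laplace(rng, 2 * bound / epsilon)


def ldp_mean(samples, bound, epsilon, seed=0):
    """Server side: average the already-privatized reports."""
    # One shared, seeded RNG is used only for reproducibility; in deployment
    # each user would draw their own randomness locally.
    rng = random.Random(seed)
    reports = [privatize(x, bound, epsilon, rng) for x in samples]
    return sum(reports) / len(reports)


if __name__ == "__main__":
    data_rng = random.Random(1)
    samples = [data_rng.gauss(1.0, 1.0) for _ in range(100_000)]
    print(ldp_mean(samples, bound=5.0, epsilon=1.0))
```

    Clipping biases the estimate whenever a noticeable share of samples falls outside `[-bound, bound]`, while the noise variance grows with `bound`, so the choice of bound trades bias against variance.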