
Showing 1–27 of 27 results for author: Smith, V

Searching in archive stat.
  1. arXiv:2505.20254  [pdf, ps, other]

    cs.LG cs.AI cs.CL stat.ML

    Position: Mechanistic Interpretability Should Prioritize Feature Consistency in SAEs

    Authors: Xiangchen Song, Aashiq Muhamed, Yujia Zheng, Lingjing Kong, Zeyu Tang, Mona T. Diab, Virginia Smith, Kun Zhang

    Abstract: Sparse Autoencoders (SAEs) are a prominent tool in mechanistic interpretability (MI) for decomposing neural network activations into interpretable features. However, the aspiration to identify a canonical set of features is challenged by the observed inconsistency of learned SAE features across different training runs, undermining the reliability and efficiency of MI research. This position paper…

    Submitted 26 May, 2025; originally announced May 2025.

  2. arXiv:2312.15551  [pdf, other]

    cs.LG cs.CR stat.ML

    On the Benefits of Public Representations for Private Transfer Learning under Distribution Shift

    Authors: Pratiksha Thaker, Amrith Setlur, Zhiwei Steven Wu, Virginia Smith

    Abstract: Public pretraining is a promising approach to improve differentially private model training. However, recent work has noted that many positive research results studying this paradigm only consider in-distribution tasks, and may not apply to settings where there is distribution shift between the pretraining and finetuning data -- a scenario that is likely when finetuning private tasks due to the se…

    Submitted 1 September, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

  3. arXiv:2312.03318  [pdf, other]

    cs.LG cs.CV stat.ML

    Complementary Benefits of Contrastive Learning and Self-Training Under Distribution Shift

    Authors: Saurabh Garg, Amrith Setlur, Zachary Chase Lipton, Sivaraman Balakrishnan, Virginia Smith, Aditi Raghunathan

    Abstract: Self-training and contrastive learning have emerged as leading techniques for incorporating unlabeled data, both under distribution shift (unsupervised domain adaptation) and when it is absent (semi-supervised learning). However, despite the popularity and compatibility of these techniques, their efficacy in combination remains unexplored. In this paper, we undertake a systematic empirical investi…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023

  4. arXiv:2206.07902  [pdf, other]

    cs.LG cs.CR stat.ML

    On Privacy and Personalization in Cross-Silo Federated Learning

    Authors: Ziyu Liu, Shengyuan Hu, Zhiwei Steven Wu, Virginia Smith

    Abstract: While the application of differential privacy (DP) has been well-studied in cross-device federated learning (FL), there is a lack of work considering DP and its implications for cross-silo FL, a setting characterized by a limited number of clients each containing many data subjects. In cross-silo FL, usual notions of client-level DP are less suitable as real-world privacy regulations typically con…

    Submitted 17 October, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022, 37 pages

  5. arXiv:2202.05963  [pdf, other]

    cs.LG cs.CR stat.ML

    Private Adaptive Optimization with Side Information

    Authors: Tian Li, Manzil Zaheer, Sashank J. Reddi, Virginia Smith

    Abstract: Adaptive optimization methods have become the default solvers for many machine learning tasks. Unfortunately, the benefits of adaptivity may degrade when training with differential privacy, as the noise added to ensure privacy reduces the effectiveness of the adaptive preconditioner. To this end, we propose AdaDPS, a general framework that uses non-sensitive side information to precondition the gr…

    Submitted 24 June, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

    Comments: ICML 2022

  6. arXiv:2109.06141  [pdf, other]

    cs.LG cs.IT math.OC stat.ML

    On Tilted Losses in Machine Learning: Theory and Applications

    Authors: Tian Li, Ahmad Beirami, Maziar Sanjabi, Virginia Smith

    Abstract: Exponential tilting is a technique commonly used in fields such as statistics, probability, information theory, and optimization to create parametric distribution shifts. Despite its prevalence in related fields, tilting has not seen widespread use in machine learning. In this work, we aim to bridge this gap by exploring the use of tilting in risk minimization. We study a simple extension to ERM -…

    Submitted 1 June, 2023; v1 submitted 13 September, 2021; originally announced September 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2007.01162

  7. arXiv:2106.04502  [pdf, other]

    cs.LG cs.AI cs.DC stat.ML

    Federated Hyperparameter Tuning: Challenges, Baselines, and Connections to Weight-Sharing

    Authors: Mikhail Khodak, Renbo Tu, Tian Li, Liam Li, Maria-Florina Balcan, Virginia Smith, Ameet Talwalkar

    Abstract: Tuning hyperparameters is a crucial but arduous part of the machine learning pipeline. Hyperparameter optimization is even more challenging in federated learning, where models are learned over a distributed network of heterogeneous devices; here, the need to keep data on device and perform local training makes it difficult to efficiently train and evaluate configurations. In this work, we investig…

    Submitted 4 November, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021

  8. arXiv:2012.04221  [pdf, other]

    cs.LG stat.ML

    Ditto: Fair and Robust Federated Learning Through Personalization

    Authors: Tian Li, Shengyuan Hu, Ahmad Beirami, Virginia Smith

    Abstract: Fairness and robustness are two important concerns for federated learning systems. In this work, we identify that robustness to data and model poisoning attacks and fairness, measured as the uniformity of performance across devices, are competing constraints in statistically heterogeneous networks. To address these constraints, we propose employing a simple, general framework for personalized fede…

    Submitted 15 June, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

    Comments: Accepted by ICML 2021

  9. arXiv:2011.14048  [pdf, other]

    cs.LG stat.ML

    Is Support Set Diversity Necessary for Meta-Learning?

    Authors: Amrith Setlur, Oscar Li, Virginia Smith

    Abstract: Meta-learning is a popular framework for learning with limited data in which an algorithm is produced by training over multiple few-shot learning tasks. For classification problems, these tasks are typically constructed by sampling a small number of support and query examples from a subset of the classes. While conventional wisdom is that task diversity should improve the performance of meta-learn…

    Submitted 7 October, 2021; v1 submitted 27 November, 2020; originally announced November 2020.

    Journal ref: NeurIPS 2020 Workshop on Meta-learning

  10. arXiv:2008.03230  [pdf, other]

    cs.LG cs.CV cs.DB cs.IT eess.SP stat.ML

    ESPRESSO: Entropy and ShaPe awaRe timE-Series SegmentatiOn for processing heterogeneous sensor data

    Authors: Shohreh Deldari, Daniel V. Smith, Amin Sadri, Flora D. Salim

    Abstract: Extracting informative and meaningful temporal segments from high-dimensional wearable sensor data, smart devices, or IoT data is a vital preprocessing step in applications such as Human Activity Recognition (HAR), trajectory prediction, gesture recognition, and lifelogging. In this paper, we propose ESPRESSO (Entropy and ShaPe awaRe timE-Series SegmentatiOn), a hybrid segmentation model for multi…

    Submitted 24 July, 2020; originally announced August 2020.

    Comments: 23 pages, 11 figures, accepted at IMWUT Volume(4) issue(3)

  11. arXiv:2007.01162  [pdf, other]

    cs.LG cs.IT stat.ML

    Tilted Empirical Risk Minimization

    Authors: Tian Li, Ahmad Beirami, Maziar Sanjabi, Virginia Smith

    Abstract: Empirical risk minimization (ERM) is typically designed to perform well on the average loss, which can result in estimators that are sensitive to outliers, generalize poorly, or treat subgroups unfairly. While many methods aim to address these problems individually, in this work, we explore them through a unified framework -- tilted empirical risk minimization (TERM). In particular, we show that i…

    Submitted 17 March, 2021; v1 submitted 2 July, 2020; originally announced July 2020.

    Comments: Accepted by ICLR 2021

  12. arXiv:2001.01920  [pdf, other]

    cs.LG stat.ML

    FedDANE: A Federated Newton-Type Method

    Authors: Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, Virginia Smith

    Abstract: Federated learning aims to jointly learn statistical models over massively distributed remote devices. In this work, we propose FedDANE, an optimization method that we adapt from DANE, a method for classical distributed optimization, to handle the practical constraints of federated learning. We provide convergence guarantees for this method when learning over both convex and non-convex functions.…

    Submitted 7 January, 2020; originally announced January 2020.

    Comments: Asilomar Conference on Signals, Systems, and Computers 2019

  13. arXiv:1911.01812  [pdf, other]

    cs.LG cs.CR cs.NI stat.ML

    Enhancing the Privacy of Federated Learning with Sketching

    Authors: Zaoxing Liu, Tian Li, Virginia Smith, Vyas Sekar

    Abstract: In response to growing concerns about user privacy, federated learning has emerged as a promising tool to train statistical models over networks of devices while keeping data localized. Federated learning methods run training tasks directly on user devices and do not share the raw user data with third parties. However, current methods still share model updates, which may contain private informatio…

    Submitted 5 November, 2019; originally announced November 2019.

  14. arXiv:1911.00972  [pdf, other]

    cs.LG cs.CR stat.ML

    Privacy for Free: Communication-Efficient Learning with Differential Privacy Using Sketches

    Authors: Tian Li, Zaoxing Liu, Vyas Sekar, Virginia Smith

    Abstract: Communication and privacy are two critical concerns in distributed learning. Many existing works treat these concerns separately. In this work, we argue that a natural connection exists between methods for communication reduction and privacy preservation in the context of distributed machine learning. In particular, we prove that Count Sketch, a simple method for data stream summarization, has inh…

    Submitted 6 December, 2019; v1 submitted 3 November, 2019; originally announced November 2019.

  15. arXiv:1911.00472  [pdf, other]

    cs.LG stat.ML

    Progressive Compressed Records: Taking a Byte out of Deep Learning Data

    Authors: Michael Kuchnik, George Amvrosiadis, Virginia Smith

    Abstract: Deep learning accelerators efficiently train over vast and growing amounts of data, placing a newfound burden on commodity networks and storage devices. A common approach to conserve bandwidth involves resizing or compressing data prior to training. We introduce Progressive Compressed Records (PCRs), a data format that uses compression to reduce the overhead of fetching and transporting data, effe…

    Submitted 11 August, 2021; v1 submitted 1 November, 2019; originally announced November 2019.

  16. arXiv:1908.07873  [pdf, other]

    cs.LG cs.DC stat.ML

    Federated Learning: Challenges, Methods, and Future Directions

    Authors: Tian Li, Anit Kumar Sahu, Ameet Talwalkar, Virginia Smith

    Abstract: Federated learning involves training statistical models over remote devices or siloed data centers, such as mobile phones or hospitals, while keeping data localized. Training in heterogeneous and potentially massive networks introduces novel challenges that require a fundamental departure from standard approaches for large-scale machine learning, distributed optimization, and privacy-preserving da…

    Submitted 21 August, 2019; originally announced August 2019.

  17. arXiv:1905.10497  [pdf, other]

    cs.LG stat.ML

    Fair Resource Allocation in Federated Learning

    Authors: Tian Li, Maziar Sanjabi, Ahmad Beirami, Virginia Smith

    Abstract: Federated learning involves training statistical models in massive, heterogeneous networks. Naively minimizing an aggregate loss function in such a network may disproportionately advantage or disadvantage some of the devices. In this work, we propose q-Fair Federated Learning (q-FFL), a novel optimization objective inspired by fair resource allocation in wireless networks that encourages a more fa…

    Submitted 14 February, 2020; v1 submitted 24 May, 2019; originally announced May 2019.

    Comments: ICLR 2020

  18. arXiv:1904.03257  [pdf, ps, other]

    cs.LG cs.DB cs.DC cs.SE stat.ML

    MLSys: The New Frontier of Machine Learning Systems

    Authors: Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, et al. (44 additional authors not shown)

    Abstract: Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a ne…

    Submitted 1 December, 2019; v1 submitted 29 March, 2019; originally announced April 2019.

  19. arXiv:1902.11175  [pdf, other]

    cs.LG stat.ML

    One-Shot Federated Learning

    Authors: Neel Guha, Ameet Talwalkar, Virginia Smith

    Abstract: We present one-shot federated learning, where a central server learns a global model over a network of federated devices in a single round of communication. Our approach - drawing on ensemble learning and knowledge aggregation - achieves an average relative gain of 51.5% in AUC over local baselines and comes within 90.1% of the (unattainable) global ideal. We discuss these methods and identify sev…

    Submitted 5 March, 2019; v1 submitted 28 February, 2019; originally announced February 2019.

    Comments: 5 pages, 3 figures, 1 table. 2nd Workshop on Machine Learning on the Phone and other Consumer Devices, NeurIPS 2018

  20. arXiv:1812.06127  [pdf, other]

    cs.LG stat.ML

    Federated Optimization in Heterogeneous Networks

    Authors: Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, Virginia Smith

    Abstract: Federated Learning is a distributed learning paradigm with two key challenges that differentiate it from traditional distributed optimization: (1) significant variability in terms of the systems characteristics on each device in the network (systems heterogeneity), and (2) non-identically distributed data across the network (statistical heterogeneity). In this work, we introduce a framework, FedPr…

    Submitted 21 April, 2020; v1 submitted 14 December, 2018; originally announced December 2018.

    Comments: MLSys 2020

  21. arXiv:1812.01097  [pdf, other]

    cs.LG stat.ML

    LEAF: A Benchmark for Federated Settings

    Authors: Sebastian Caldas, Sai Meher Karthik Duddu, Peter Wu, Tian Li, Jakub Konečný, H. Brendan McMahan, Virginia Smith, Ameet Talwalkar

    Abstract: Modern federated networks, such as those comprised of wearable devices, mobile phones, or autonomous vehicles, generate massive amounts of data each day. This wealth of data can help to learn models that can improve the user experience on each device. However, the scale and heterogeneity of federated data presents new challenges in research areas such as federated learning, meta-learning, and mult…

    Submitted 9 December, 2019; v1 submitted 3 December, 2018; originally announced December 2018.

  22. arXiv:1810.05222  [pdf, other]

    cs.LG stat.ML

    Efficient Augmentation via Data Subsampling

    Authors: Michael Kuchnik, Virginia Smith

    Abstract: Data augmentation is commonly used to encode invariances in learning methods. However, this process is often performed in an inefficient manner, as artificial examples are created by applying a number of transformations to all points in the training set. The resulting explosion of the dataset size can be an issue in terms of storage and training costs, as well as in selecting and tuning the optima…

    Submitted 1 March, 2019; v1 submitted 11 October, 2018; originally announced October 2018.

  23. arXiv:1805.07782  [pdf, other]

    cs.LG cs.AI stat.ML

    Model Aggregation via Good-Enough Model Spaces

    Authors: Neel Guha, Virginia Smith

    Abstract: In many applications, the training data for a machine learning task is partitioned across multiple nodes, and aggregating this data may be infeasible due to communication, privacy, or storage constraints. Existing distributed optimization methods for learning global models in these settings typically aggregate local updates from each node in an iterative fashion. However, these approaches require…

    Submitted 4 June, 2019; v1 submitted 20 May, 2018; originally announced May 2018.

    Comments: 21 pages, 6 figures, 8 tables

  24. arXiv:1803.06084  [pdf, other]

    cs.LG stat.ML

    A Kernel Theory of Modern Data Augmentation

    Authors: Tri Dao, Albert Gu, Alexander J. Ratner, Virginia Smith, Christopher De Sa, Christopher Ré

    Abstract: Data augmentation, a technique in which a training set is expanded with class-preserving transformations, is ubiquitous in modern machine learning pipelines. In this paper, we seek to establish a theoretical framework for understanding data augmentation. We approach this from two directions: First, we provide a general model of augmentation as a Markov process, and show that kernels appear natural…

    Submitted 20 March, 2019; v1 submitted 16 March, 2018; originally announced March 2018.

  25. arXiv:1705.10467  [pdf, other]

    cs.LG stat.ML

    Federated Multi-Task Learning

    Authors: Virginia Smith, Chao-Kai Chiang, Maziar Sanjabi, Ameet Talwalkar

    Abstract: Federated learning poses new statistical and systems challenges in training machine learning models over distributed networks of devices. In this work, we show that multi-task learning is naturally suited to handle the statistical challenges of this setting, and propose a novel systems-aware optimization method, MOCHA, that is robust to practical systems issues. Our method and theory for the first…

    Submitted 27 February, 2018; v1 submitted 30 May, 2017; originally announced May 2017.

  26. arXiv:1409.1458  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Communication-Efficient Distributed Dual Coordinate Ascent

    Authors: Martin Jaggi, Virginia Smith, Martin Takáč, Jonathan Terhorst, Sanjay Krishnan, Thomas Hofmann, Michael I. Jordan

    Abstract: Communication remains the most significant bottleneck in the performance of distributed optimization algorithms for large-scale machine learning. In this paper, we propose a communication-efficient framework, CoCoA, that uses local computation in a primal-dual setting to dramatically reduce the amount of necessary communication. We provide a strong convergence rate analysis for this class of algor…

    Submitted 29 September, 2014; v1 submitted 4 September, 2014; originally announced September 2014.

    Comments: NIPS 2014 version, including proofs. Published in Advances in Neural Information Processing Systems 27 (NIPS 2014)

    MSC Class: 90C25; 68W15 ACM Class: G.1.6; C.1.4

  27. arXiv:1310.5426  [pdf, other]

    cs.LG cs.DC stat.ML

    MLI: An API for Distributed Machine Learning

    Authors: Evan R. Sparks, Ameet Talwalkar, Virginia Smith, Jey Kottalam, Xinghao Pan, Joseph Gonzalez, Michael J. Franklin, Michael I. Jordan, Tim Kraska

    Abstract: MLI is an Application Programming Interface designed to address the challenges of building Machine Learning algorithms in a distributed setting based on data-centric computing. Its primary goal is to simplify the development of high-performance, scalable, distributed algorithms. Our initial results show that, relative to existing systems, this interface can be used to build distributed implement…

    Submitted 25 October, 2013; v1 submitted 21 October, 2013; originally announced October 2013.