Showing 1–50 of 95 results for author: Braverman, V

  1. arXiv:2506.01951

    cs.CL cs.LG

    Self-ensemble: Mitigating Confidence Distortion for Large Language Models

    Authors: Zicheng Xu, Guanchu Wang, Guangyao Zheng, Yu-Neng Chuang, Alexander Szalay, Xia Hu, Vladimir Braverman

    Abstract: Although Large Language Models (LLMs) perform well in general fields, they exhibit a confidence distortion problem on multi-choice question-answering (MCQA), particularly as the number of answer choices increases. Specifically, on MCQA with many choices, LLMs suffer from under-confidence in correct predictions and over-confidence in incorrect ones, leading to a substantially degraded performance.…

    Submitted 2 June, 2025; originally announced June 2025.

  2. arXiv:2505.22662

    cs.CL cs.LG

    AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models

    Authors: Feng Luo, Yu-Neng Chuang, Guanchu Wang, Hoang Anh Duy Le, Shaochen Zhong, Hongyi Liu, Jiayi Yuan, Yang Sui, Vladimir Braverman, Vipin Chaudhary, Xia Hu

    Abstract: Reasoning-capable large language models (LLMs) demonstrate strong performance on complex reasoning tasks but often suffer from overthinking, generating unnecessarily long chain-of-thought (CoT) reasoning paths for easy reasoning questions, thereby increasing inference cost and latency. Recent approaches attempt to address this challenge by manually deciding when to apply long or short reasonin…

    Submitted 28 May, 2025; originally announced May 2025.

  3. arXiv:2504.04039

    cs.LG cs.AI stat.ML

    Memory-Statistics Tradeoff in Continual Learning with Structural Regularization

    Authors: Haoran Li, Jingfeng Wu, Vladimir Braverman

    Abstract: We study the statistical performance of a continual learning problem with two linear regression tasks in a well-specified random design setting. We consider a structural regularization algorithm that incorporates a generalized $\ell_2$-regularization tailored to the Hessian of the previous task for mitigating catastrophic forgetting. We establish upper and lower bounds on the joint excess risk for…

    Submitted 4 April, 2025; originally announced April 2025.

  4. arXiv:2502.05790

    cs.LG

    I3S: Importance Sampling Subspace Selection for Low-Rank Optimization in LLM Pretraining

    Authors: Haochen Zhang, Junze Yin, Guanchu Wang, Zirui Liu, Tianyi Zhang, Anshumali Shrivastava, Lin Yang, Vladimir Braverman

    Abstract: Low-rank optimization has emerged as a promising approach to enabling memory-efficient training of large language models (LLMs). Existing low-rank optimization methods typically project gradients onto a low-rank subspace, reducing the memory cost of storing optimizer states. A key challenge in these methods is identifying suitable subspaces to ensure an effective optimization trajectory. Most exis…

    Submitted 9 February, 2025; originally announced February 2025.

  5. arXiv:2502.04428

    cs.CL cs.AI cs.LG

    Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization

    Authors: Yu-Neng Chuang, Leisheng Yu, Guanchu Wang, Lizhe Zhang, Zirui Liu, Xuanting Cai, Yang Sui, Vladimir Braverman, Xia Hu

    Abstract: Large language models (LLMs) are increasingly deployed and democratized on edge devices. To improve the efficiency of on-device deployment, small language models (SLMs) are often adopted due to their efficient decoding latency and reduced energy consumption. However, these SLMs often generate inaccurate responses when handling complex queries. One promising solution is uncertainty-based SLM routin…

    Submitted 6 February, 2025; originally announced February 2025.

  6. arXiv:2502.04386

    cs.CV cs.AI cs.LG

    Towards Fair Medical AI: Adversarial Debiasing of 3D CT Foundation Embeddings

    Authors: Guangyao Zheng, Michael A. Jacobs, Vladimir Braverman, Vishwa S. Parekh

    Abstract: Self-supervised learning has revolutionized medical imaging by enabling efficient and generalizable feature extraction from large-scale unlabeled datasets. Recently, self-supervised foundation models have been extended to three-dimensional (3D) computed tomography (CT) data, generating compact, information-rich embeddings with 1408 features that achieve state-of-the-art performance on downstream t…

    Submitted 5 February, 2025; originally announced February 2025.

  7. arXiv:2411.09979

    cs.DS cs.LG

    Fully Dynamic Adversarially Robust Correlation Clustering in Polylogarithmic Update Time

    Authors: Vladimir Braverman, Prathamesh Dharangutte, Shreyas Pai, Vihan Shah, Chen Wang

    Abstract: We study the dynamic correlation clustering problem with adaptive edge label flips. In correlation clustering, we are given an $n$-vertex complete graph whose edges are labeled either $(+)$ or $(-)$, and the goal is to minimize the total number of $(+)$ edges between clusters and the number of $(-)$ edges within clusters. We consider the dynamic setting with adversarial robustness, in wh…

    Submitted 15 November, 2024; originally announced November 2024.

  8. arXiv:2411.01580

    cs.LG cs.CR

    Federated Learning Clients Clustering with Adaptation to Data Drifts

    Authors: Minghao Li, Dmitrii Avdiukhin, Rana Shahout, Nikita Ivkin, Vladimir Braverman, Minlan Yu

    Abstract: Federated Learning (FL) enables deep learning model training across edge devices and protects user privacy by retaining raw data locally. Data heterogeneity in client distributions slows model convergence and leads to plateauing with reduced precision. Clustered FL solutions address this by grouping clients with statistically similar data and training models for each cluster. However, maintaining…

    Submitted 3 November, 2024; originally announced November 2024.

    Comments: 16 pages, 10 figures

  9. arXiv:2408.08422

    cs.CE cs.AI

    Assessing and Enhancing Large Language Models in Rare Disease Question-answering

    Authors: Guanchu Wang, Junhao Ran, Ruixiang Tang, Chia-Yuan Chang, Yu-Neng Chuang, Zirui Liu, Vladimir Braverman, Zhandong Liu, Xia Hu

    Abstract: Despite the impressive capabilities of Large Language Models (LLMs) in general medical domains, questions remain about their performance in diagnosing rare diseases. To answer this question, we aim to assess the diagnostic performance of LLMs in rare diseases, and explore methods to enhance their effectiveness in this area. In this work, we introduce a rare disease question-answering (ReDis-QA) da…

    Submitted 15 August, 2024; originally announced August 2024.

  10. arXiv:2407.11364

    cs.DS cs.LG

    Learning-augmented Maximum Independent Set

    Authors: Vladimir Braverman, Prathamesh Dharangutte, Vihan Shah, Chen Wang

    Abstract: We study the Maximum Independent Set (MIS) problem on general graphs within the framework of learning-augmented algorithms. The MIS problem is known to be NP-hard and is also NP-hard to approximate to within a factor of $n^{1-δ}$ for any $δ>0$. We show that we can break this barrier in the presence of an oracle obtained through predictions from a machine learning model that answers vertex membersh…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: APPROX 2024

  11. KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

    Authors: Zirui Liu, Jiayi Yuan, Hongye Jin, Shaochen Zhong, Zhaozhuo Xu, Vladimir Braverman, Beidi Chen, Xia Hu

    Abstract: Efficiently serving large language models (LLMs) requires batching of many requests to reduce the cost per request. Yet, with larger batch sizes and longer context lengths, the key-value (KV) cache, which stores attention keys and values to avoid re-computations, significantly increases memory demands and becomes the new bottleneck in speed and memory usage. Additionally, the loading of the KV cac…

    Submitted 25 July, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: ICML2024

  12. arXiv:2312.13385

    cs.RO cs.LG

    ORBSLAM3-Enhanced Autonomous Toy Drones: Pioneering Indoor Exploration

    Authors: Murad Tukan, Fares Fares, Yotam Grufinkle, Ido Talmor, Loay Mualem, Vladimir Braverman, Dan Feldman

    Abstract: Navigating toy drones through uncharted GPS-denied indoor spaces poses significant difficulties due to their reliance on GPS for location determination. In such circumstances, the necessity for achieving proper navigation is a primary concern. In response to this formidable challenge, we introduce a real-time autonomous indoor exploration system tailored for drones equipped with a monocular …

    Submitted 20 December, 2023; originally announced December 2023.

  13. arXiv:2310.08391

    stat.ML cs.LG

    How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?

    Authors: Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Peter L. Bartlett

    Abstract: Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities, enabling them to solve unseen tasks solely based on input contexts without adjusting model parameters. In this paper, we study ICL in one of its simplest setups: pretraining a linearly parameterized single-layer linear attention model for linear regression with a Gaussian prior. We establish a stati…

    Submitted 14 March, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: ICLR 2024 Camera Ready

  14. arXiv:2307.05834

    cs.LG cs.AI

    Scaling Distributed Multi-task Reinforcement Learning with Experience Sharing

    Authors: Sanae Amani, Khushbu Pahwa, Vladimir Braverman, Lin F. Yang

    Abstract: Recently, DARPA launched the ShELL program, which aims to explore how experience sharing can benefit distributed lifelong learning agents in adapting to new challenges. In this paper, we address this issue by conducting both theoretical and empirical research on distributed multi-task reinforcement learning (RL), where a group of $N$ agents collaboratively solves $M$ tasks without prior knowledge…

    Submitted 11 July, 2023; originally announced July 2023.

  15. arXiv:2307.04249

    cs.DS

    Private Data Stream Analysis for Universal Symmetric Norm Estimation

    Authors: Vladimir Braverman, Joel Manning, Zhiwei Steven Wu, Samson Zhou

    Abstract: We study how to release summary statistics on a data stream subject to the constraint of differential privacy. In particular, we focus on releasing the family of symmetric norms, which are invariant under sign-flips and coordinate-wise permutations on an input data stream and include $L_p$ norms, $k$-support norms, top-$k$ norms, and the box norm as special cases. Although it may be possible to de…

    Submitted 9 July, 2023; originally announced July 2023.

  16. arXiv:2306.09396

    cs.DS cs.LG

    Private Federated Frequency Estimation: Adapting to the Hardness of the Instance

    Authors: Jingfeng Wu, Wennan Zhu, Peter Kairouz, Vladimir Braverman

    Abstract: In federated frequency estimation (FFE), multiple clients work together to estimate the frequencies of their collective data by communicating with a server that respects the privacy constraints of Secure Summation (SecSum), a cryptographic multi-party computation protocol that ensures that the server can only access the sum of client-held vectors. For single-round FFE, it is known that count sketc…

    Submitted 2 December, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 camera ready version

  17. arXiv:2306.05310

    cs.LG

    A framework for dynamically training and adapting deep reinforcement learning models to different, low-compute, and continuously changing radiology deployment environments

    Authors: Guangyao Zheng, Shuhao Lai, Vladimir Braverman, Michael A. Jacobs, Vishwa S. Parekh

    Abstract: While Deep Reinforcement Learning has been widely researched in medical imaging, the training and deployment of these models usually require powerful GPUs. Since imaging environments evolve rapidly and can be generated by edge devices, the algorithm is required to continually learn and adapt to changing environments, and adjust to low-compute devices. To this end, we developed three image coreset…

    Submitted 8 June, 2023; originally announced June 2023.

  18. arXiv:2306.00188

    cs.LG cs.CV eess.IV

    Multi-environment lifelong deep reinforcement learning for medical imaging

    Authors: Guangyao Zheng, Shuhao Lai, Vladimir Braverman, Michael A. Jacobs, Vishwa S. Parekh

    Abstract: Deep reinforcement learning (DRL) is increasingly being explored in medical imaging. However, the environments for medical imaging tasks are constantly evolving in terms of imaging orientations, imaging sequences, and pathologies. To that end, we developed a Lifelong DRL framework, SERIL, to continually learn new tasks in changing imaging environments without catastrophic forgetting. SERIL was devel…

    Submitted 31 May, 2023; originally announced June 2023.

  19. arXiv:2305.11980

    cs.LG

    AutoCoreset: An Automatic Practical Coreset Construction Framework

    Authors: Alaa Maalouf, Murad Tukan, Vladimir Braverman, Daniela Rus

    Abstract: A coreset is a tiny weighted subset of an input set that closely resembles the loss function with respect to a certain set of queries. Coresets have become prevalent in machine learning as they have been shown to be advantageous for many applications. While coreset research is an active research area, unfortunately, coresets are constructed in a problem-dependent manner, where for each problem, a new core…

    Submitted 19 May, 2023; originally announced May 2023.

  20. arXiv:2305.11788

    cs.LG stat.ML

    Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability

    Authors: Jingfeng Wu, Vladimir Braverman, Jason D. Lee

    Abstract: Recent research has observed that in machine learning optimization, gradient descent (GD) often operates at the edge of stability (EoS) [Cohen et al., 2021], where the stepsizes are set to be large, resulting in non-monotonic losses induced by the GD iterates. This paper studies the convergence and implicit bias of constant-stepsize GD for logistic regression on linearly separable data in the EoS…

    Submitted 15 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023 camera ready version

  21. arXiv:2303.16287

    cs.DS

    Lower Bounds for Pseudo-Deterministic Counting in a Stream

    Authors: Vladimir Braverman, Robert Krauthgamer, Aditya Krishnan, Shay Sapir

    Abstract: Many streaming algorithms provide only a high-probability relative approximation. These two relaxations, of allowing approximation and randomization, seem necessary -- for many streaming problems, both relaxations must be employed simultaneously, to avoid an exponentially larger (and often trivial) space complexity. A common drawback of these randomized approximate algorithms is that independent e…

    Submitted 15 May, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

    Comments: 14 pages, ICALP2023

  22. arXiv:2303.10263

    cs.LG

    Fixed Design Analysis of Regularization-Based Continual Learning

    Authors: Haoran Li, Jingfeng Wu, Vladimir Braverman

    Abstract: We consider a continual learning (CL) problem with two linear regression tasks in the fixed design setting, where the feature vectors are assumed fixed and the labels are assumed to be random variables. We consider an $\ell_2$-regularized CL algorithm, which computes an Ordinary Least Squares parameter to fit the first dataset, then computes another parameter that fits the second dataset under an…

    Submitted 18 June, 2024; v1 submitted 17 March, 2023; originally announced March 2023.

    Comments: CoLLAs 2023 camera-ready version

  23. arXiv:2303.06783

    cs.LG cs.CV eess.IV

    Asynchronous Decentralized Federated Lifelong Learning for Landmark Localization in Medical Imaging

    Authors: Guangyao Zheng, Michael A. Jacobs, Vladimir Braverman, Vishwa S. Parekh

    Abstract: Federated learning is a recent development in the machine learning area that allows a system of devices to train on one or more tasks without sharing their data to a single location or device. However, this framework still requires a centralized global model to consolidate individual models into one, and the devices train synchronously, both of which can be potential bottlenecks for using federated l…

    Submitted 10 January, 2024; v1 submitted 12 March, 2023; originally announced March 2023.

  24. arXiv:2303.05151

    cs.LG cs.AI

    Provable Data Subset Selection For Efficient Neural Network Training

    Authors: Murad Tukan, Samson Zhou, Alaa Maalouf, Daniela Rus, Vladimir Braverman, Dan Feldman

    Abstract: Radial basis function neural networks (RBFNN) are well-known for their capability to approximate any continuous function on a closed bounded set with arbitrary precision given enough hidden neurons. In this paper, we introduce the first algorithm to construct coresets for RBFNNs, i.e., small weighted subsets that approximate the loss of the input data on any radial basis function n…

    Submitted 9 March, 2023; originally announced March 2023.

  25. arXiv:2303.02255

    cs.LG math.OC stat.ML

    Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron

    Authors: Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

    Abstract: This paper considers the problem of learning a single ReLU neuron with squared loss (a.k.a., ReLU regression) in the overparameterized regime, where the input dimension can exceed the number of samples. We analyze a Perceptron-type algorithm called GLM-tron (Kakade et al., 2011) and provide its dimension-free risk upper bounds for high-dimensional ReLU regression in both well-specified and misspec…

    Submitted 26 June, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

    Comments: ICML 2023 camera ready

  26. arXiv:2302.11510

    cs.LG cs.CV

    Selective experience replay compression using coresets for lifelong deep reinforcement learning in medical imaging

    Authors: Guangyao Zheng, Samson Zhou, Vladimir Braverman, Michael A. Jacobs, Vishwa S. Parekh

    Abstract: Selective experience replay is a popular strategy for integrating lifelong learning with deep reinforcement learning. Selective experience replay aims to recount selected experiences from previous tasks to avoid catastrophic forgetting. Furthermore, selective experience replay based techniques are model agnostic and allow experiences to be shared across different models. However, storing experienc…

    Submitted 9 January, 2024; v1 submitted 22 February, 2023; originally announced February 2023.

  27. arXiv:2209.12054

    stat.ML cs.LG

    From Local to Global: Spectral-Inspired Graph Neural Networks

    Authors: Ningyuan Huang, Soledad Villar, Carey E. Priebe, Da Zheng, Chengyue Huang, Lin Yang, Vladimir Braverman

    Abstract: Graph Neural Networks (GNNs) are powerful deep learning methods for Non-Euclidean data. Popular GNNs are message-passing algorithms (MPNNs) that aggregate and combine signals in a local graph neighborhood. However, shallow MPNNs tend to miss long-range signals and perform poorly on some heterophilous graphs, while deep MPNNs can suffer from issues like over-smoothing or over-squashing. To mitigate…

    Submitted 4 November, 2022; v1 submitted 24 September, 2022; originally announced September 2022.

    Comments: Accepted for publication at the NeurIPS 2022 GLFrontiers Workshop

  28. arXiv:2209.01901

    cs.DS

    The Power of Uniform Sampling for Coresets

    Authors: Vladimir Braverman, Vincent Cohen-Addad, Shaofeng H. -C. Jiang, Robert Krauthgamer, Chris Schwiegelshohn, Mads Bech Toftrup, Xuan Wu

    Abstract: Motivated by practical generalizations of the classic $k$-median and $k$-means objectives, such as clustering with size constraints, fair clustering, and Wasserstein barycenter, we introduce a meta-theorem for designing coresets for constrained-clustering problems. The meta-theorem reduces the task of coreset construction to one on a bounded number of ring instances with a much-relaxed additive er…

    Submitted 17 September, 2022; v1 submitted 5 September, 2022; originally announced September 2022.

  29. arXiv:2208.01857

    cs.LG math.OC stat.ML

    The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift

    Authors: Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

    Abstract: We study linear regression under covariate shift, where the marginal distribution over the input covariates differs in the source and the target domains, while the conditional distribution of the output given the input covariates is similar across the two domains. We investigate a transfer learning approach with pretraining on the source data and finetuning based on the target data (both conducted…

    Submitted 3 August, 2022; originally announced August 2022.

    Comments: 32 pages, 1 figure, 1 table

  30. arXiv:2206.02291

    cs.CL

    Pretrained Models for Multilingual Federated Learning

    Authors: Orion Weller, Marc Marone, Vladimir Braverman, Dawn Lawrie, Benjamin Van Durme

    Abstract: Since the advent of Federated Learning (FL), research has applied these methods to natural language processing (NLP) tasks. Despite a plethora of papers in FL for NLP, no previous works have studied how multilingual text impacts FL algorithms. Furthermore, multilingual text provides an interesting avenue to examine the impact of non-IID text (e.g. different languages) on FL in naturally occurring…

    Submitted 5 June, 2022; originally announced June 2022.

    Comments: NAACL 2022

  31. arXiv:2204.09136

    cs.DS

    The White-Box Adversarial Data Stream Model

    Authors: Miklos Ajtai, Vladimir Braverman, T. S. Jayram, Sandeep Silwal, Alec Sun, David P. Woodruff, Samson Zhou

    Abstract: We study streaming algorithms in the white-box adversarial model, where the stream is chosen adaptively by an adversary who observes the entire internal state of the algorithm at each time step. We show that nontrivial algorithms are still possible. We first give a randomized algorithm for the $L_1$-heavy hitters problem that outperforms the optimal deterministic Misra-Gries algorithm on long stre…

    Submitted 23 July, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

    Comments: PODS 2022

  32. arXiv:2203.06514

    cs.LG cs.AI cs.CV

    Sparsity and Heterogeneous Dropout for Continual Learning in the Null Space of Neural Activations

    Authors: Ali Abbasi, Parsa Nooralinejad, Vladimir Braverman, Hamed Pirsiavash, Soheil Kolouri

    Abstract: Continual/lifelong learning from a non-stationary input data stream is a cornerstone of intelligence. Despite their phenomenal performance in a wide variety of applications, deep neural networks are prone to forgetting their previously learned information upon learning new ones. This phenomenon is called "catastrophic forgetting" and is deeply rooted in the stability-plasticity dilemma. Overcoming…

    Submitted 8 July, 2022; v1 submitted 12 March, 2022; originally announced March 2022.

  33. arXiv:2203.04370

    cs.LG

    New Coresets for Projective Clustering and Applications

    Authors: Murad Tukan, Xuan Wu, Samson Zhou, Vladimir Braverman, Dan Feldman

    Abstract: $(j,k)$-projective clustering is the natural generalization of the family of $k$-clustering and $j$-subspace clustering problems. Given a set of points $P$ in $\mathbb{R}^d$, the goal is to find $k$ flats of dimension $j$, i.e., affine subspaces, that best fit $P$ under a given distance measure. In this paper, we propose the first algorithm that returns an $L_\infty…

    Submitted 8 March, 2022; originally announced March 2022.

  34. arXiv:2203.03159

    cs.LG math.OC stat.ML

    Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime

    Authors: Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

    Abstract: Stochastic gradient descent (SGD) has achieved great success due to its superior performance in both optimization and generalization. Most existing generalization analyses are made for single-pass SGD, which is a less practical variant compared to the commonly-used multi-pass SGD. Besides, theoretical analyses for multi-pass SGD often concern a worst-case instance in a class of problems, which…

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: 28 pages, 2 figures

  35. arXiv:2112.10001

    eess.IV cs.AI cs.CV cs.LG

    Cross-Domain Federated Learning in Medical Imaging

    Authors: Vishwa S Parekh, Shuhao Lai, Vladimir Braverman, Jeff Leal, Steven Rowe, Jay J Pillai, Michael A Jacobs

    Abstract: Federated learning is increasingly being explored in the field of medical imaging to train deep learning models on large scale datasets distributed across different data centers while preserving privacy by avoiding the need to transfer sensitive patient information. In this manuscript, we explore federated learning in a multi-domain, multi-task setting wherein different participating nodes may con…

    Submitted 18 December, 2021; originally announced December 2021.

    Comments: Under Review for MIDL 2022

  36. arXiv:2110.06198

    cs.LG math.OC stat.ML

    Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression

    Authors: Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

    Abstract: Stochastic gradient descent (SGD) has been shown to generalize well in many deep learning applications. In practice, one often runs SGD with a geometrically decaying stepsize, i.e., a constant initial stepsize followed by multiple geometric stepsize decays, and uses the last iterate as the output. This kind of SGD is known to be nearly minimax optimal for classical finite-dimensional linear regress…

    Submitted 11 July, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: 35 pages, 2 figures, 1 table. In ICML 2022

  37. arXiv:2109.01635

    cs.DS

    Symmetric Norm Estimation and Regression on Sliding Windows

    Authors: Vladimir Braverman, Viska Wei, Samson Zhou

    Abstract: The sliding window model generalizes the standard streaming model and often performs better in applications where recent data is more important or more accurate than data that arrived prior to a certain time. We study the problem of approximating symmetric norms (a norm on $\mathbb{R}^n$ that is invariant under sign-flips and coordinate-wise permutations) in the sliding window model, where only th…

    Submitted 3 September, 2021; originally announced September 2021.

    Comments: COCOON 2021

  38. arXiv:2108.05439

    cs.LG

    Gap-Dependent Unsupervised Exploration for Reinforcement Learning

    Authors: Jingfeng Wu, Vladimir Braverman, Lin F. Yang

    Abstract: For the problem of task-agnostic reinforcement learning (RL), an agent first collects samples from an unknown environment without the supervision of reward signals, and is then given a reward and asked to compute a corresponding near-optimal policy. Existing approaches mainly concern the worst-case scenarios, in which no structural information of the reward/transition-dynamics is utilized.…

    Submitted 14 March, 2022; v1 submitted 11 August, 2021; originally announced August 2021.

    Comments: AISTATS 2022 camera ready version

  39. arXiv:2108.04552

    cs.LG math.OC stat.ML

    The Benefits of Implicit Regularization from SGD in Least Squares Problems

    Authors: Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Dean P. Foster, Sham M. Kakade

    Abstract: Stochastic gradient descent (SGD) exhibits strong algorithmic regularization effects in practice, which has been hypothesized to play an important role in the generalization of modern machine learning approaches. In this work, we seek to understand these issues in the simpler setting of linear regression (including both underparameterized and overparameterized regimes), where our goal is to make s…

    Submitted 10 July, 2022; v1 submitted 10 August, 2021; originally announced August 2021.

    Comments: 33 pages, 1 figure. In NeurIPS 2021

  40. arXiv:2106.16112

    cs.DS

    Coresets for Clustering with Missing Values

    Authors: Vladimir Braverman, Shaofeng H. -C. Jiang, Robert Krauthgamer, Xuan Wu

    Abstract: We provide the first coreset for clustering points in $\mathbb{R}^d$ that have multiple missing values (coordinates). Previous coreset constructions only allow one missing coordinate. The challenge in this setting is that objective functions, like $k$-Means, are evaluated only on the set of available (non-missing) coordinates, which varies across points. Recall that an $ε$-coreset of a large datas…

    Submitted 11 November, 2021; v1 submitted 30 June, 2021; originally announced June 2021.

  41. arXiv:2106.14952

    cs.LG cs.DS

    Adversarial Robustness of Streaming Algorithms through Importance Sampling

    Authors: Vladimir Braverman, Avinatan Hassidim, Yossi Matias, Mariano Schain, Sandeep Silwal, Samson Zhou

    Abstract: In this paper, we introduce adversarially robust streaming algorithms for central machine learning and algorithmic tasks, such as regression and clustering, as well as their more general counterparts, subspace embedding, low-rank approximation, and coreset construction. For regression and other numerical linear algebra related tasks, we consider the row arrival streaming model. Our results are bas…

    Submitted 25 October, 2021; v1 submitted 28 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021

  42. arXiv:2104.08604

    cs.LG

    Lifelong Learning with Sketched Structural Regularization

    Authors: Haoran Li, Aditya Krishnan, Jingfeng Wu, Soheil Kolouri, Praveen K. Pilly, Vladimir Braverman

    Abstract: Preventing catastrophic forgetting while continually learning new tasks is an essential problem in lifelong learning. Structural regularization (SR) refers to a family of algorithms that mitigate catastrophic forgetting by penalizing the network for changing its "critical parameters" from previous tasks while learning a new one. The penalty is often induced via a quadratic regularizer defined by a…

    Submitted 17 April, 2021; originally announced April 2021.

  43. Sublinear Time Spectral Density Estimation

    Authors: Vladimir Braverman, Aditya Krishnan, Christopher Musco

    Abstract: We present a new sublinear time algorithm for approximating the spectral density (eigenvalue distribution) of an $n\times n$ normalized graph adjacency or Laplacian matrix. The algorithm recovers the spectrum up to $ε$ accuracy in the Wasserstein-1 distance in $O(n\cdot \text{poly}(1/ε))$ time given sample access to the graph. This result complements recent work by David Cohen-Steiner, Weihao Kong…

    Submitted 14 April, 2022; v1 submitted 7 April, 2021; originally announced April 2021.

    Comments: Accepted to STOC'22

  44. arXiv:2103.12692  [pdf, other]

    cs.LG math.OC stat.ML

    Benign Overfitting of Constant-Stepsize SGD for Linear Regression

    Authors: Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

    Abstract: There is an increasing realization that algorithmic inductive biases are central in preventing overfitting; empirically, we often see a benign overfitting phenomenon in overparameterized settings for natural learning algorithms, such as stochastic gradient descent (SGD), where little to no explicit regularization has been employed. This work considers this issue in arguably the most basic setting:…

    Submitted 12 October, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

    Comments: 56 pages, 2 figures. A short version is accepted at the 34th Annual Conference on Learning Theory (COLT 2021)

  45. arXiv:2011.13034  [pdf, other]

    cs.LG stat.ML

    Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning

    Authors: Jingfeng Wu, Vladimir Braverman, Lin F. Yang

    Abstract: In this paper we consider multi-objective reinforcement learning where the objectives are balanced using preferences. In practice, the preferences are often given in an adversarial manner, e.g., customers can be picky in many applications. We formalize this problem as an episodic learning problem on a Markov decision process, where transitions are unknown and a reward function is the inner product…

    Submitted 27 October, 2021; v1 submitted 25 November, 2020; originally announced November 2020.

    Comments: NeurIPS 2021 Camera Ready Version

  46. arXiv:2011.06103  [pdf, other]

    cs.DC astro-ph.SR cs.LG

    Sketch and Scale: Geo-distributed tSNE and UMAP

    Authors: Viska Wei, Nikita Ivkin, Vladimir Braverman, Alexander Szalay

    Abstract: Running machine learning analytics over geographically distributed datasets is a rapidly arising problem in the world of data management policies ensuring privacy and data security. Visualizing high dimensional data using tools such as t-distributed Stochastic Neighbor Embedding (tSNE) and Uniform Manifold Approximation and Projection (UMAP) became common practice for data scientists. Both tools s…

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: IEEE BigData2020 conference

  47. arXiv:2011.02538  [pdf, other]

    cs.LG stat.ML

    Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate

    Authors: Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu

    Abstract: Understanding the algorithmic bias of \emph{stochastic gradient descent} (SGD) is one of the key challenges in modern machine learning and deep learning theory. Most of the existing works, however, focus on the \emph{very small or even infinitesimal} learning rate regime, and fail to cover practical scenarios where the learning rate is \emph{moderate and annealing}. In this paper, we make an initial a…

    Submitted 29 March, 2021; v1 submitted 4 November, 2020; originally announced November 2020.

    Comments: ICLR 2021 Camera Ready

  48. arXiv:2011.01777  [pdf, ps, other]

    cs.DS

    Near-Optimal Entrywise Sampling of Numerically Sparse Matrices

    Authors: Vladimir Braverman, Robert Krauthgamer, Aditya Krishnan, Shay Sapir

    Abstract: Many real-world data sets are sparse or almost sparse. One method to measure this for a matrix $A\in \mathbb{R}^{n\times n}$ is the \emph{numerical sparsity}, denoted $\mathsf{ns}(A)$, defined as the minimum $k\geq 1$ such that $\|a\|_1/\|a\|_2 \leq \sqrt{k}$ for every row and every column $a$ of $A$. This measure of $a$ is smooth and is clearly only smaller than the number of non-zeros in the row…

    Submitted 5 July, 2021; v1 submitted 3 November, 2020; originally announced November 2020.

    Comments: 20 pages. To appear in COLT 2021

  49. arXiv:2008.08316  [pdf, other]

    cs.LG cs.AI stat.ML

    Data-Independent Structured Pruning of Neural Networks via Coresets

    Authors: Ben Mussay, Daniel Feldman, Samson Zhou, Vladimir Braverman, Margarita Osadchy

    Abstract: Model compression is crucial for deployment of neural networks on devices with limited computational and memory resources. Many different methods show comparable accuracy of the compressed model and similar compression rates. However, the majority of the compression methods are based on heuristics and offer no worst-case guarantees on the trade-off between the compression rate and the approximatio…

    Submitted 19 August, 2020; originally announced August 2020.

  50. arXiv:2008.06736  [pdf, other]

    cs.LG stat.ML

    Obtaining Adjustable Regularization for Free via Iterate Averaging

    Authors: Jingfeng Wu, Vladimir Braverman, Lin F. Yang

    Abstract: Regularization for optimization is a crucial technique to avoid overfitting in machine learning. In order to obtain the best performance, we usually train a model by tuning the regularization parameters. This becomes costly, however, when a single round of training takes a significant amount of time. Very recently, Neu and Rosasco showed that if we run stochastic gradient descent (SGD) on linear regress…

    Submitted 15 August, 2020; originally announced August 2020.

    Comments: ICML 2020 camera ready