
Showing 1–31 of 31 results for author: Malladi, S

Searching in archive cs.
  1. arXiv:2504.19761  [pdf, other]

    econ.TH cs.GT

    Rediscovery

    Authors: Martino Banchio, Suraj Malladi

    Abstract: We model search in settings where decision makers know what can be found but not where to find it. A searcher faces a set of choices arranged by an observable attribute. Each period, she either selects a choice and pays a cost to learn about its quality, or she concludes search to take her best discovery to date. She knows that similar choices have similar qualities and uses this to guide her sear…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: Extended Abstract in EC'24

  2. arXiv:2503.19206  [pdf, other]

    cs.CL cs.AI

    Overtrained Language Models Are Harder to Fine-Tune

    Authors: Jacob Mitchell Springer, Sachin Goyal, Kaiyue Wen, Tanishq Kumar, Xiang Yue, Sadhika Malladi, Graham Neubig, Aditi Raghunathan

    Abstract: Large language models are pre-trained on ever-growing token budgets under the assumption that better pre-training performance translates to improved downstream models. In this work, we challenge this assumption and show that extended pre-training can make models harder to fine-tune, leading to degraded final performance. We term this phenomenon catastrophic overtraining. For example, the instructi…

    Submitted 27 March, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

    Comments: 72 pages, 65 figures, 6 tables

  3. arXiv:2503.10915  [pdf, other]

    cs.HC

    Usable Privacy in Virtual Worlds: Design Implications for Data Collection Awareness and Control Interfaces in Virtual Reality

    Authors: Viktorija Paneva, Verena Winterhalter, Naga Sai Surya Vamsy Malladi, Marvin Strauss, Stefan Schneegass, Florian Alt

    Abstract: Extended reality (XR) devices have become ubiquitous. They are equipped with arrays of sensors, collecting extensive user and environmental data, allowing inferences about sensitive user information users may not realize they are sharing. Current VR privacy notices largely replicate mechanisms from 2D interfaces, failing to leverage the unique affordances of virtual 3D environments. To address thi…

    Submitted 13 March, 2025; originally announced March 2025.

  4. arXiv:2501.06314  [pdf, other]

    cs.AI cs.MA

    BioAgents: Democratizing Bioinformatics Analysis with Multi-Agent Systems

    Authors: Nikita Mehandru, Amanda K. Hall, Olesya Melnichenko, Yulia Dubinina, Daniel Tsirulnikov, David Bamman, Ahmed Alaa, Scott Saponas, Venkat S. Malladi

    Abstract: Creating end-to-end bioinformatics workflows requires diverse domain expertise, which poses challenges for both junior and senior researchers as it demands a deep understanding of both genomics concepts and computational techniques. While large language models (LLMs) provide some assistance, they often fall short in providing the nuanced guidance needed to execute complex bioinformatics tasks, and…

    Submitted 10 January, 2025; originally announced January 2025.

  5. arXiv:2501.01956  [pdf, ps, other]

    cs.CL

    Metadata Conditioning Accelerates Language Model Pre-training

    Authors: Tianyu Gao, Alexander Wettig, Luxi He, Yihe Dong, Sadhika Malladi, Danqi Chen

    Abstract: The vast diversity of styles, domains, and quality levels present in language model pre-training corpora is essential in developing general model capabilities, but efficiently learning and deploying the correct behaviors exemplified in each of these heterogeneous data sources is challenging. To address this, we propose a new method, termed Metadata Conditioning then Cooldown (MeCo), to incorporate…

    Submitted 27 June, 2025; v1 submitted 3 January, 2025; originally announced January 2025.

    Comments: Accepted to ICML 2025. Code available at https://github.com/princeton-pli/MeCo

  6. arXiv:2411.12600  [pdf, ps, other]

    cs.LG cs.AI

    Provable unlearning in topic modeling and downstream tasks

    Authors: Stanley Wei, Sadhika Malladi, Sanjeev Arora, Amartya Sanyal

    Abstract: Machine unlearning algorithms are increasingly important as legal concerns arise around the provenance of training data, but verifying the success of unlearning is often difficult. Provable guarantees for unlearning are often limited to supervised learning settings. In this paper, we provide the first theoretical guarantees for unlearning in the pre-training and fine-tuning paradigm by studying to…

    Submitted 20 April, 2025; v1 submitted 19 November, 2024; originally announced November 2024.

  7. arXiv:2411.02283  [pdf]

    cs.SE cs.CE

    Continuous Analysis: Evolution of Software Engineering and Reproducibility for Science

    Authors: Venkat S. Malladi, Maria Yazykova, Olesya Melnichenko, Yulia Dubinina

    Abstract: Reproducibility in research remains hindered by complex systems involving data, models, tools, and algorithms. Studies highlight a reproducibility crisis due to a lack of standardized reporting, code and data sharing, and rigorous evaluation. This paper introduces the concept of Continuous Analysis to address the reproducibility challenges in scientific research, extending the DevOps lifecycle. Co…

    Submitted 4 November, 2024; originally announced November 2024.

    Comments: 11 pages, 3 figures

  8. arXiv:2410.11820  [pdf, other]

    cs.LG

    Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws

    Authors: Yiding Jiang, Allan Zhou, Zhili Feng, Sadhika Malladi, J. Zico Kolter

    Abstract: The composition of pretraining data is a key determinant of foundation models' performance, but there is no standard guideline for allocating a limited computational budget across different data sources. Most current approaches either rely on extensive experiments with smaller models or dynamic data adjustments that also require proxy models, both of which significantly increase the workflow compl…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: 21 pages, 10 figures

  9. arXiv:2410.08847  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization

    Authors: Noam Razin, Sadhika Malladi, Adithya Bhaskar, Danqi Chen, Sanjeev Arora, Boris Hanin

    Abstract: Direct Preference Optimization (DPO) and its variants are increasingly used for aligning language models with human preferences. Although these methods are designed to teach a model to generate preferred responses more frequently relative to dispreferred responses, prior work has observed that the likelihood of preferred responses often decreases during training. The current work sheds light on th…

    Submitted 27 April, 2025; v1 submitted 11 October, 2024; originally announced October 2024.

    Comments: Accepted to ICLR 2025; Code available at https://github.com/princeton-nlp/unintentional-unalignment

  10. arXiv:2410.05464  [pdf, other]

    cs.LG

    Progressive distillation induces an implicit curriculum

    Authors: Abhishek Panigrahi, Bingbin Liu, Sadhika Malladi, Andrej Risteski, Surbhi Goel

    Abstract: Knowledge distillation leverages a teacher model to improve the training of a student model. A persistent challenge is that a better teacher does not always yield a better student, to which a common mitigation is to use additional supervision from several "intermediate" teachers. One empirically validated variant of this principle is progressive distillation, where the student learns from succes…

    Submitted 7 October, 2024; originally announced October 2024.

  11. arXiv:2407.06460  [pdf, other]

    cs.CL cs.AI

    MUSE: Machine Unlearning Six-Way Evaluation for Language Models

    Authors: Weijia Shi, Jaechan Lee, Yangsibo Huang, Sadhika Malladi, Jieyu Zhao, Ari Holtzman, Daogao Liu, Luke Zettlemoyer, Noah A. Smith, Chiyuan Zhang

    Abstract: Language models (LMs) are trained on vast amounts of text data, which may include private and copyrighted content. Data owners may request the removal of their data from a trained model due to privacy or copyright concerns. However, exactly unlearning only these datapoints (i.e., retraining with the data removed) is intractable in modern-day models. This has led to the development of many approxim…

    Submitted 14 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  12. arXiv:2406.18521  [pdf, other]

    cs.CL cs.CV

    CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

    Authors: Zirui Wang, Mengzhou Xia, Luxi He, Howard Chen, Yitao Liu, Richard Zhu, Kaiqu Liang, Xindi Wu, Haotian Liu, Sadhika Malladi, Alexis Chevalier, Sanjeev Arora, Danqi Chen

    Abstract: Chart understanding plays a pivotal role when applying Multimodal Large Language Models (MLLMs) to real-world tasks such as analyzing scientific papers or financial reports. However, existing datasets often focus on oversimplified and homogeneous charts with template-based questions, leading to an over-optimistic measure of progress. We demonstrate that although open-source models can appear to ou…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 121 pages, 90 figures

  13. arXiv:2405.19534  [pdf, other]

    cs.LG cs.AI cs.CL

    Preference Learning Algorithms Do Not Learn Preference Rankings

    Authors: Angelica Chen, Sadhika Malladi, Lily H. Zhang, Xinyi Chen, Qiuyi Zhang, Rajesh Ranganath, Kyunghyun Cho

    Abstract: Preference learning algorithms (e.g., RLHF and DPO) are frequently used to steer LLMs to produce generations that are more preferred by humans, but our understanding of their inner workings is still limited. In this work, we study the conventional wisdom that preference learning trains models to assign higher likelihoods to more preferred outputs than less preferred outputs, measured via ranking a…

    Submitted 31 October, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: NeurIPS 2024 camera-ready

  14. arXiv:2405.00013  [pdf, other]

    cs.DC

    The GA4GH Task Execution API: Enabling Easy Multi Cloud Task Execution

    Authors: Alexander Kanitz, Matthew H. McLoughlin, Liam Beckman, Venkat S. Malladi, Kyle P. Ellrott

    Abstract: The Global Alliance for Genomics and Health (GA4GH) Task Execution Service (TES) API is a standardized schema and API for describing and executing batch execution tasks. It provides a common way to submit and manage tasks to a variety of compute environments, including on premise High Performance Compute and High Throughput Computing (HPC/HTC) systems, Cloud computing platforms, and hybrid environ…

    Submitted 8 February, 2024; originally announced May 2024.

  15. arXiv:2402.04333  [pdf, other]

    cs.CL cs.AI cs.LG

    LESS: Selecting Influential Data for Targeted Instruction Tuning

    Authors: Mengzhou Xia, Sadhika Malladi, Suchin Gururangan, Sanjeev Arora, Danqi Chen

    Abstract: Instruction tuning has unlocked powerful capabilities in large language models (LLMs), effectively using combined datasets to develop general-purpose chatbots. However, real-world applications often require a specialized suite of skills (e.g., reasoning). The challenge lies in identifying the most relevant data from these extensive datasets to effectively develop specific capabilities, a setting we…

    Submitted 12 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: ICML 2024; Code and data are available at https://github.com/princeton-nlp/LESS

  16. Feel the Breeze: Promoting Relaxation in Virtual Reality using Mid-Air Haptics

    Authors: Naga Sai Surya Vamsy Malladi, Viktorija Paneva, Jörg Müller

    Abstract: Mid-air haptic interfaces employ focused ultrasound waves to generate touchless haptic sensations on the skin. Prior studies have demonstrated the potential positive impact of mid-air haptic feedback on virtual experiences, enhancing aspects such as enjoyment, immersion, and sense of agency. As a highly immersive environment, Virtual Reality (VR) is being explored as a tool for stress management a…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: 5 pages, 6 figures. This is the author's version. The final version of record is to appear in 2023 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct)

  17. arXiv:2307.15196  [pdf, other]

    cs.LG math.OC

    The Marginal Value of Momentum for Small Learning Rate SGD

    Authors: Runzhe Wang, Sadhika Malladi, Tianhao Wang, Kaifeng Lyu, Zhiyuan Li

    Abstract: Momentum is known to accelerate the convergence of gradient descent in strongly convex settings without stochastic gradient noise. In stochastic optimization, such as training neural networks, folklore suggests that momentum may help deep learning optimization by reducing the variance of the stochastic gradient update, but previous theoretical analyses do not find momentum to offer any provable ac…

    Submitted 15 April, 2024; v1 submitted 27 July, 2023; originally announced July 2023.

  18. arXiv:2307.01189  [pdf, other]

    cs.CL cs.LG

    Trainable Transformer in Transformer

    Authors: Abhishek Panigrahi, Sadhika Malladi, Mengzhou Xia, Sanjeev Arora

    Abstract: Recent works attribute the capability of in-context learning (ICL) in large pre-trained language models to implicitly simulating and fine-tuning an internal model (e.g., linear or 2-layer MLP) during inference. However, such constructions require large memory overhead, which makes simulation of more sophisticated internal models intractable. In this work, we propose an efficient construction, Tran…

    Submitted 8 February, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

    Comments: Code base: https://github.com/abhishekpanigrahi1996/transformer_in_transformer

  19. arXiv:2305.17333  [pdf, other]

    cs.LG cs.CL

    Fine-Tuning Language Models with Just Forward Passes

    Authors: Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alex Damian, Jason D. Lee, Danqi Chen, Sanjeev Arora

    Abstract: Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but as LMs grow in size, backpropagation requires a prohibitively large amount of memory. Zeroth-order (ZO) methods can in principle estimate gradients using only two forward passes but are theorized to be catastrophically slow for optimizing large models. In this work, we propose a memory-efficient zeroth-order opti…

    Submitted 11 January, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted by NeurIPS 2023 (oral). Code available at https://github.com/princeton-nlp/MeZO

  20. arXiv:2210.05643  [pdf, other]

    cs.LG cs.CL

    A Kernel-Based View of Language Model Fine-Tuning

    Authors: Sadhika Malladi, Alexander Wettig, Dingli Yu, Danqi Chen, Sanjeev Arora

    Abstract: It has become standard to solve NLP tasks by fine-tuning pre-trained language models (LMs), especially in low-data settings. There is minimal theoretical understanding of empirical success, e.g., why fine-tuning a model with $10^8$ or more parameters on a couple dozen training points does not result in overfitting. We investigate whether the Neural Tangent Kernel (NTK) - which originated as a mode…

    Submitted 6 June, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted at ICML 2023. Code and pre-computed kernels are publicly available at https://github.com/princeton-nlp/LM-Kernel-FT

  21. arXiv:2205.10287  [pdf, other]

    cs.LG

    On the SDEs and Scaling Rules for Adaptive Gradient Algorithms

    Authors: Sadhika Malladi, Kaifeng Lyu, Abhishek Panigrahi, Sanjeev Arora

    Abstract: Approximating Stochastic Gradient Descent (SGD) as a Stochastic Differential Equation (SDE) has allowed researchers to enjoy the benefits of studying a continuous optimization trajectory while carefully preserving the stochasticity of SGD. Analogous study of adaptive gradient methods, such as RMSprop and Adam, has been challenging because there were no rigorously proven SDE approximations for thes…

    Submitted 31 October, 2024; v1 submitted 20 May, 2022; originally announced May 2022.

    Comments: revised for correcting errors in some figures

  22. arXiv:2109.03575  [pdf, other]

    cs.CV cs.LG

    Deriving Explanation of Deep Visual Saliency Models

    Authors: Sai Phani Kumar Malladi, Jayanta Mukhopadhyay, Chaker Larabi, Santanu Chaudhury

    Abstract: Deep neural networks have shown their profound impact on achieving human level performance in visual saliency prediction. However, it is still unclear how they learn the task and what it means in terms of understanding human visual system. In this work, we develop a technique to derive explainable saliency models from their corresponding deep neural architecture based saliency models by applying h…

    Submitted 8 September, 2021; originally announced September 2021.

  23. arXiv:2102.12470  [pdf, other]

    cs.LG stat.ML

    On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)

    Authors: Zhiyuan Li, Sadhika Malladi, Sanjeev Arora

    Abstract: It is generally recognized that finite learning rate (LR), in contrast to infinitesimal LR, is important for good generalization in real-life deep nets. Most attempted explanations propose approximating finite-LR SGD with Ito Stochastic Differential Equations (SDEs), but formal justification for this approximation (e.g., (Li et al., 2019)) only applies to SGD with tiny LR. Experimental verificatio…

    Submitted 16 June, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

    Comments: 36 pages, 20 figures

  24. arXiv:2010.03648  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks

    Authors: Nikunj Saunshi, Sadhika Malladi, Sanjeev Arora

    Abstract: Autoregressive language models, pretrained using large text corpora to do well on next word prediction, have been successful at solving many downstream tasks, even with zero-shot usage. However, there is little theoretical understanding of this success. This paper initiates a mathematical study of this phenomenon for the downstream task of text classification by considering the following questions…

    Submitted 14 April, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: This version is the camera-ready version for ICLR 2021. Main changes include a detailed discussion about natural tasks, more detailed proof sketch and updated experimental evaluations

  25. Estimating Dispersion Curves from Frequency Response Functions via Vector-Fitting

    Authors: Mohammad I. Albakri, Vijaya V. N. Sriram Malladi, Serkan Gugercin, Pablo A. Tarazaga

    Abstract: Driven by the need for describing and understanding wave propagation in structural materials and components, several analytical, numerical, and experimental techniques have been developed to obtain dispersion curves. Accurate characterization of the structure (waveguide) under test is needed for analytical and numerical approaches. Experimental approaches, on the other hand, rely on analyzing wave…

    Submitted 27 December, 2019; originally announced December 2019.

    Comments: Accepted to appear in Mechanical Systems and Signal Processing, in press

  26. arXiv:1812.03354  [pdf, other]

    physics.soc-ph cs.SI

    Learning through the Grapevine: The Impact of Noise and the Breadth and Depth of Social Networks

    Authors: Matthew O. Jackson, Suraj Malladi, David McAdams

    Abstract: We examine how well people learn when information is noisily relayed from person to person; and we study how communication platforms can improve learning without censoring or fact-checking messages. We analyze learning as a function of social network depth (how many times information is relayed) and breadth (the number of relay chains accessed). Noise builds up as depth increases, so learning requ…

    Submitted 26 June, 2020; v1 submitted 8 December, 2018; originally announced December 2018.

  27. arXiv:1004.2522  [pdf, ps, other]

    cs.CR cs.LO

    How to prevent type-flaw and multi-protocol attacks on cryptographic protocols under Exclusive-OR

    Authors: Sreekanth Malladi

    Abstract: Type-flaw attacks and multi-protocol attacks on security protocols have been frequently reported in the literature. Heather et al. and Guttman et al. have proven that these could be prevented by tagging encrypted components with distinct constants in a standard protocol model with free message algebra and perfect encryption. However, most "real-world" protocols such as SSL 3.0 are designed with th…

    Submitted 19 June, 2010; v1 submitted 14 April, 2010; originally announced April 2010.

    Comments: 37 pages plus 14 pages in the Appendix

    Report number: DSU-BIS-IA-Mall2010A

  28. arXiv:1003.5406  [pdf, ps, other]

    cs.CR cs.DM cs.LO cs.SC

    Disabling equational theories in unification for cryptographic protocol analysis through tagging

    Authors: Sreekanth Malladi

    Abstract: In this paper, we show a new tagging scheme for cryptographic protocol messages. Under this tagging, equational theories of operators such as exclusive-or, binary addition etc. are effectively disabled, when terms are unified. We believe that this result has a significant impact on protocol analysis and security, since unification is at the heart of symbolic protocol analysis. Hence, disabling equ…

    Submitted 9 April, 2010; v1 submitted 28 March, 2010; originally announced March 2010.

    Comments: 8 pages, submitted for publication

    Report number: DSU-BIS-MSIA-Mall2010C

  29. arXiv:1003.5385  [pdf, ps, other]

    cs.CR

    How to prevent type-flaw attacks on security protocols under algebraic properties

    Authors: Sreekanth Malladi, Pascal Lafourcade

    Abstract: Type-flaw attacks upon security protocols wherein agents are led to misinterpret message types have been reported frequently in the literature. Preventing them is crucial for protocol security and verification. Heather et al. proved that tagging every message field with its type prevents all type-flaw attacks under a free message algebra and perfect encryption system. In this paper, we prove that…

    Submitted 28 March, 2010; originally announced March 2010.

    Comments: 16 pages, Appeared in proceedings of Security with Rewriting Techniques (SecRet09), affiliated with CSF Symposium 2009, Port Jefferson, NY.

    Report number: ML09

  30. arXiv:1003.5384  [pdf, other]

    cs.CR cs.SC

    Protocol independence through disjoint encryption under Exclusive-OR

    Authors: Sreekanth Malladi

    Abstract: Multi-protocol attacks due to protocol interaction have been a notorious problem for security. Guttman and Thayer proved that they can be prevented by ensuring that encrypted messages are distinguishable across protocols, under a free algebra. In this paper, we prove that a similar suggestion prevents these attacks under commonly used operators such as Exclusive-OR, that induce equational theories, brea…

    Submitted 9 May, 2010; v1 submitted 28 March, 2010; originally announced March 2010.

    Comments: 22 pages, In Proceedings, Foundations of Security and Privacy (FCS-PrivMod 2010).

    Report number: DSU-BIS-IA-Mallb2010

  31. arXiv:1003.5383  [pdf, other]

    cs.CR cs.NI

    Automatic analysis of distance bounding protocols

    Authors: Sreekanth Malladi, Bezawada Bruhadeshwar, Kishore Kothapalli

    Abstract: Distance bounding protocols are used by nodes in wireless networks to calculate upper bounds on their distances to other nodes. However, dishonest nodes in the network can turn the calculations both illegitimate and inaccurate when they participate in protocol executions. It is important to analyze protocols for the possibility of such violations. Past efforts to analyze distance bounding protocol…

    Submitted 28 March, 2010; originally announced March 2010.

    Comments: 22 pages, Appeared in Foundations of Computer Security (affiliated workshop of LICS 2009, Los Angeles, CA).

    Report number: FCS09