Showing 1–28 of 28 results for author: Agarwala, A

Searching in archive cs.
  1. arXiv:2504.09991  [pdf, ps, other]

    cs.CC cs.DS

    Bipartite Matching is in Catalytic Logspace

    Authors: Aryan Agarwala, Ian Mertz

    Abstract: Matching is a central problem in theoretical computer science, with a large body of work spanning the last five decades. However, understanding matching in the time-space bounded setting remains a longstanding open question, even in the presence of additional resources such as randomness or non-determinism. In this work we study space-bounded machines with access to catalytic space, which is add…

    Submitted 14 April, 2025; originally announced April 2025.

  2. arXiv:2502.02407  [pdf, other]

    cs.LG cs.CL stat.ML

    Avoiding spurious sharpness minimization broadens applicability of SAM

    Authors: Sidak Pal Singh, Hossein Mobahi, Atish Agarwala, Yann Dauphin

    Abstract: Curvature regularization techniques like Sharpness Aware Minimization (SAM) have shown great promise in improving generalization on vision tasks. However, we find that SAM performs poorly in domains like natural language processing (NLP), often degrading performance -- even with twice the compute budget. We investigate the discrepancy across domains and find that in the NLP setting, SAM is dominat…

    Submitted 4 February, 2025; originally announced February 2025.

  3. arXiv:2412.19249  [pdf, ps, other]

    cs.DS

    A Space Lower Bound for Approximate Membership with Duplicate Insertions or Deletions of Nonelements

    Authors: Aryan Agarwala, Guy Even

    Abstract: Designs of data structures for approximate membership queries with false-positive errors that support both insertions and deletions stipulate the following two conditions: (1) Duplicate insertions are prohibited, i.e., it is prohibited to insert an element $x$ if $x$ is currently a member of the dataset. (2) Deletions of nonelements are prohibited, i.e., it is prohibited to delete $x$ if $x$ is no…

    Submitted 26 December, 2024; originally announced December 2024.

    ACM Class: E.1; E.2

  4. arXiv:2411.12135  [pdf, other]

    stat.ML cs.LG

    Exact Risk Curves of signSGD in High-Dimensions: Quantifying Preconditioning and Noise-Compression Effects

    Authors: Ke Liang Xiao, Noah Marshall, Atish Agarwala, Elliot Paquette

    Abstract: In recent years, signSGD has garnered interest as both a practical optimizer as well as a simple model to understand adaptive optimizers like Adam. Though there is a general consensus that signSGD acts to precondition optimization and reshapes noise, quantitatively understanding these effects in theoretically solvable settings remains difficult. We present an analysis of signSGD in a high dimensio…

    Submitted 21 February, 2025; v1 submitted 18 November, 2024; originally announced November 2024.

  5. arXiv:2407.20685  [pdf, other]

    cs.ET cs.CL

    CultureVo: The Serious Game of Utilizing Gen AI for Enhancing Cultural Intelligence

    Authors: Ajita Agarwala, Anupam Purwar, Viswanadhasai Rao

    Abstract: CultureVo, Inc. has developed the Integrated Culture Learning Suite (ICLS) to deliver foundational knowledge of world cultures through a combination of interactive lessons and gamified experiences. This paper explores how Generative AI powered by open source Large Language Models is utilized within the ICLS to enhance cultural intelligence. The suite employs Generative AI techniques to automate t…

    Submitted 1 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

    Comments: Fourth International Conference on AI-ML Systems, 8-11 October, 2024, Louisiana, USA

  6. arXiv:2407.06183  [pdf, other]

    cs.LG

    Stepping on the Edge: Curvature Aware Learning Rate Tuners

    Authors: Vincent Roulet, Atish Agarwala, Jean-Bastien Grill, Grzegorz Swirszcz, Mathieu Blondel, Fabian Pedregosa

    Abstract: Curvature information -- particularly, the largest eigenvalue of the loss Hessian, known as the sharpness -- often forms the basis for learning rate tuners. However, recent work has shown that the curvature information undergoes complex dynamics during training, going from a phase of increasing sharpness to eventual stabilization. We analyze the closed-loop feedback effect between learning rate tu…

    Submitted 8 July, 2024; originally announced July 2024.

  7. arXiv:2406.11733  [pdf, other]

    stat.ML cs.LG

    To Clip or not to Clip: the Dynamics of SGD with Gradient Clipping in High-Dimensions

    Authors: Noah Marshall, Ke Liang Xiao, Atish Agarwala, Elliot Paquette

    Abstract: The success of modern machine learning is due in part to the adaptive optimization methods that have been developed to deal with the difficulties of training large models over complex datasets. One such method is gradient clipping: a practical procedure with limited theoretical underpinnings. In this work, we study clipping in a least squares problem under streaming SGD. We develop a theoretical a…

    Submitted 6 October, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  8. arXiv:2404.19261  [pdf, other]

    cs.LG math.OC math.ST physics.data-an

    High dimensional analysis reveals conservative sharpening and a stochastic edge of stability

    Authors: Atish Agarwala, Jeffrey Pennington

    Abstract: Recent empirical and theoretical work has shown that the dynamics of the large eigenvalues of the training loss Hessian have some remarkably robust features across models and datasets in the full batch regime. There is often an early period of progressive sharpening where the large eigenvalues increase, followed by stabilization at a predictable value known as the edge of stability. Previous work…

    Submitted 31 January, 2025; v1 submitted 30 April, 2024; originally announced April 2024.

  9. arXiv:2402.05271  [pdf, other]

    stat.ML cs.AI cs.LG

    Feature learning as alignment: a structural property of gradient descent in non-linear neural networks

    Authors: Daniel Beaglehole, Ioannis Mitliagkas, Atish Agarwala

    Abstract: Understanding the mechanisms through which neural networks extract statistics from input-label pairs through feature learning is one of the most important unsolved problems in supervised learning. Prior works demonstrated that the gram matrices of the weights (the neural feature matrices, NFM) and the average gradient outer products (AGOP) become correlated during training, in a statement known as…

    Submitted 17 November, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  10. arXiv:2401.10809  [pdf, other]

    cs.LG

    Neglected Hessian component explains mysteries in Sharpness regularization

    Authors: Yann N. Dauphin, Atish Agarwala, Hossein Mobahi

    Abstract: Recent work has shown that methods like SAM which either explicitly or implicitly penalize second order information can improve generalization in deep learning. Seemingly similar methods like weight noise and gradient penalties often fail to provide such benefits. We show that these differences can be explained by the structure of the Hessian of the loss. First, we show that a common decomposition…

    Submitted 24 January, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

  11. arXiv:2312.00209  [pdf, other]

    cs.LG cs.AI math.OC

    On the Interplay Between Stepsize Tuning and Progressive Sharpening

    Authors: Vincent Roulet, Atish Agarwala, Fabian Pedregosa

    Abstract: Recent empirical work has revealed an intriguing property of deep learning models by which the sharpness (largest eigenvalue of the Hessian) increases throughout optimization until it stabilizes around a critical value at which the optimizer operates at the edge of stability, given a fixed stepsize (Cohen et al, 2022). We investigate empirically how the sharpness evolves when using stepsize-tuners…

    Submitted 29 December, 2023; v1 submitted 30 November, 2023; originally announced December 2023.

    Comments: Presented at the NeurIPS 2023 OPT Workshop

  12. arXiv:2308.01976  [pdf, other]

    cs.LG cs.AI cs.CL cs.IR

    Domain specificity and data efficiency in typo tolerant spell checkers: the case of search in online marketplaces

    Authors: Dayananda Ubrangala, Juhi Sharma, Ravi Prasad Kondapalli, Kiran R, Amit Agarwala, Laurent Boué

    Abstract: Typographical errors are a major source of frustration for visitors of online marketplaces. Because of the domain-specific nature of these marketplaces and the very short queries users tend to search for, traditional spell checking solutions do not perform well in correcting typos. We present a data augmentation method to address the lack of annotated typo data and train a recurrent neural network…

    Submitted 3 August, 2023; originally announced August 2023.

    Journal ref: Microsoft Journal of Applied Research, Volume 19, 2023

  13. arXiv:2302.08692  [pdf, other]

    cs.LG

    SAM operates far from home: eigenvalue regularization as a dynamical phenomenon

    Authors: Atish Agarwala, Yann N. Dauphin

    Abstract: The Sharpness Aware Minimization (SAM) optimization algorithm has been shown to control large eigenvalues of the loss Hessian and provide generalization benefits in a variety of settings. The original motivation for SAM was a modified loss function which penalized sharp minima; subsequent analyses have also focused on the behavior near minima. However, our work reveals that SAM provides a strong r…

    Submitted 16 February, 2023; originally announced February 2023.

  14. arXiv:2210.04860  [pdf, other]

    cs.LG cs.AI math.OC

    Second-order regression models exhibit progressive sharpening to the edge of stability

    Authors: Atish Agarwala, Fabian Pedregosa, Jeffrey Pennington

    Abstract: Recent studies of gradient descent with large step sizes have shown that there is often a regime with an initial increase in the largest eigenvalue of the loss Hessian (progressive sharpening), followed by a stabilization of the eigenvalue near the maximum value which allows convergence (edge of stability). These phenomena are intrinsically non-linear and do not happen for models in the constant N…

    Submitted 10 October, 2022; originally announced October 2022.

  15. arXiv:2207.09432  [pdf, other]

    cs.LG

    Deep equilibrium networks are sensitive to initialization statistics

    Authors: Atish Agarwala, Samuel S. Schoenholz

    Abstract: Deep equilibrium networks (DEQs) are a promising way to construct models which trade off memory for compute. However, theoretical understanding of these models is still lacking compared to traditional networks, in part because of the repeated application of a single set of weights. We show that DEQs are sensitive to the higher order statistics of the matrix families from which they are initialized…

    Submitted 19 July, 2022; originally announced July 2022.

  16. arXiv:2205.14929  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Neural Volumetric Object Selection

    Authors: Zhongzheng Ren, Aseem Agarwala, Bryan Russell, Alexander G. Schwing, Oliver Wang

    Abstract: We introduce an approach for selecting objects in neural volumetric 3D representations, such as multi-plane images (MPI) and neural radiance fields (NeRF). Our approach takes a set of foreground and background 2D user scribbles in one view and automatically estimates a 3D segmentation of the desired object, which can be rendered into novel views. To achieve this result, we propose a novel voxel fe…

    Submitted 30 May, 2022; originally announced May 2022.

    Comments: CVPR 2022 camera ready

  17. arXiv:2103.15261  [pdf, other]

    cs.LG cs.AI stat.ML

    One Network Fits All? Modular versus Monolithic Task Formulations in Neural Networks

    Authors: Atish Agarwala, Abhimanyu Das, Brendan Juba, Rina Panigrahy, Vatsal Sharan, Xin Wang, Qiuyi Zhang

    Abstract: Can deep learning solve multiple tasks simultaneously, even when they are unrelated and very different? We investigate how the representations of the underlying tasks affect the ability of a single neural network to learn them jointly. We present theoretical and empirical findings that a single neural network is capable of simultaneously learning multiple tasks from a combined data set, for a vari…

    Submitted 28 March, 2021; originally announced March 2021.

    Comments: 30 pages, 6 figures

  18. arXiv:2010.07344  [pdf, other]

    cs.LG cs.AI

    Temperature check: theory and practice for training models with softmax-cross-entropy losses

    Authors: Atish Agarwala, Jeffrey Pennington, Yann Dauphin, Sam Schoenholz

    Abstract: The softmax function combined with a cross-entropy loss is a principled approach to modeling probability distributions that has become ubiquitous in deep learning. The softmax function is defined by a lone hyperparameter, the temperature, that is commonly set to one or regarded as a way to tune model confidence after training; however, less is known about how the temperature impacts training dynam…

    Submitted 14 October, 2020; originally announced October 2020.

  19. arXiv:2005.07724  [pdf, other]

    cs.LG stat.ML

    Learning the gravitational force law and other analytic functions

    Authors: Atish Agarwala, Abhimanyu Das, Rina Panigrahy, Qiuyi Zhang

    Abstract: Large neural network models have been successful in learning functions of importance in many branches of science, including physics, chemistry and biology. Recent theoretical work has shown explicit learning bounds for wide networks and kernel methods on some simple classes of functions, but not on more complex functions which arise in practice. We extend these techniques to provide learning bound…

    Submitted 15 May, 2020; originally announced May 2020.

  20. arXiv:2004.02132  [pdf, other]

    cs.CV

    Deep Homography Estimation for Dynamic Scenes

    Authors: Hoang Le, Feng Liu, Shu Zhang, Aseem Agarwala

    Abstract: Homography estimation is an important step in many computer vision problems. Recently, deep neural network methods have been shown to be favorable for this problem when compared to traditional methods. However, these new methods do not consider dynamic content in input images. They train neural networks with only image pairs that can be perfectly aligned using homographies. This paper investigates and…

    Submitted 5 April, 2020; originally announced April 2020.

    Comments: CVPR 2020, https://github.com/lcmhoang/hmg-dynamics

  21. arXiv:1811.11283  [pdf, other]

    cs.CV cs.AI

    A Compact Embedding for Facial Expression Similarity

    Authors: Raviteja Vemulapalli, Aseem Agarwala

    Abstract: Most of the existing work on automatic facial expression analysis focuses on discrete emotion recognition, or facial action unit detection. However, facial expressions do not always fall neatly into pre-defined semantic categories. Also, the similarity between expressions measured in the action unit space need not correspond to how humans perceive expression similarity. Different from previous wor…

    Submitted 9 January, 2019; v1 submitted 27 November, 2018; originally announced November 2018.

  22. arXiv:1702.02463  [pdf, other]

    cs.CV cs.GR cs.LG

    Video Frame Synthesis using Deep Voxel Flow

    Authors: Ziwei Liu, Raymond A. Yeh, Xiaoou Tang, Yiming Liu, Aseem Agarwala

    Abstract: We address the problem of synthesizing new video frames in an existing video, either in-between existing frames (interpolation), or subsequent to them (extrapolation). This problem is challenging because video appearance and motion can be highly complex. Traditional optical-flow-based solutions often fail where flow estimation is challenging, while newer neural-network-based methods that hallucina…

    Submitted 5 August, 2017; v1 submitted 8 February, 2017; originally announced February 2017.

    Comments: To appear in ICCV 2017 as an oral paper. More details at the project page: https://liuziwei7.github.io/projects/VoxelFlow.html

  23. arXiv:1611.09961  [pdf, other]

    cs.CV

    Semantic Facial Expression Editing using Autoencoded Flow

    Authors: Raymond Yeh, Ziwei Liu, Dan B Goldman, Aseem Agarwala

    Abstract: High-level manipulation of facial expressions in images --- such as changing a smile to a neutral expression --- is challenging because facial expression changes are highly non-linear, and vary depending on the appearance of the face. We present a fully automatic approach to editing faces that combines the advantages of flow-based face manipulation with the more recent generative capabilities of V…

    Submitted 29 November, 2016; originally announced November 2016.

  24. arXiv:1507.07068  [pdf]

    physics.comp-ph cond-mat.mtrl-sci cs.DC

    Performance metrics in a hybrid MPI-OpenMP based molecular dynamics simulation with short-range interactions

    Authors: Anirban Pal, Abhishek Agarwala, Soumyendu Raha, Baidurya Bhattacharya

    Abstract: We discuss the computational bottlenecks in molecular dynamics (MD) and describe the challenges in parallelizing the computation intensive tasks. We present a hybrid algorithm using MPI (Message Passing Interface) with OpenMP threads for parallelizing a generalized MD computation scheme for systems with short range interatomic interactions. The algorithm is discussed in the context of nanoindentat…

    Submitted 25 July, 2015; originally announced July 2015.

    Journal ref: Journal of Parallel and Distributed Computing, Elsevier, vol. 74, no. 3, pp. 2203-2214, 2014

  25. arXiv:1507.03196  [pdf, other]

    cs.CV

    DeepFont: Identify Your Font from An Image

    Authors: Zhangyang Wang, Jianchao Yang, Hailin Jin, Eli Shechtman, Aseem Agarwala, Jonathan Brandt, Thomas S. Huang

    Abstract: As font is one of the core design concepts, automatic font identification and similar font suggestion from an image or photo has been on the wish list of many designers. We study the Visual Font Recognition (VFR) problem, and advance the state-of-the-art remarkably by developing the DeepFont system. First of all, we build up the first available large-scale VFR dataset, named AdobeVFR, consisting o…

    Submitted 12 July, 2015; originally announced July 2015.

    Comments: To Appear in ACM Multimedia as a full paper

  26. arXiv:1504.00028  [pdf, other]

    cs.CV cs.LG

    Real-World Font Recognition Using Deep Network and Domain Adaptation

    Authors: Zhangyang Wang, Jianchao Yang, Hailin Jin, Eli Shechtman, Aseem Agarwala, Jonathan Brandt, Thomas S. Huang

    Abstract: We address a challenging fine-grain classification problem: recognizing a font style from an image of text. In this task, it is very easy to generate lots of rendered font examples but very hard to obtain real-world labeled images. This real-to-synthetic domain gap caused poor generalization to new real data in previous methods (Chen et al. (2014)). In this paper, we refer to Convolutional Neural…

    Submitted 31 March, 2015; originally announced April 2015.

  27. arXiv:1412.5758

    cs.CV

    Decomposition-Based Domain Adaptation for Real-World Font Recognition

    Authors: Zhangyang Wang, Jianchao Yang, Hailin Jin, Eli Shechtman, Aseem Agarwala, Jonathan Brandt, Thomas S. Huang

    Abstract: We present a domain adaption framework to address a domain mismatch between synthetic training and real-world testing data. We demonstrate our method on a challenging fine-grain classification problem: recognizing a font style from an image of text. In this task, it is very easy to generate lots of rendered font examples but very hard to obtain real-world labeled images. This real-to-synthetic dom…

    Submitted 1 April, 2015; v1 submitted 18 December, 2014; originally announced December 2014.

    Comments: This paper has been withdrawn by the author due to project concerns

  28. Recognizing Image Style

    Authors: Sergey Karayev, Matthew Trentacoste, Helen Han, Aseem Agarwala, Trevor Darrell, Aaron Hertzmann, Holger Winnemoeller

    Abstract: The style of an image plays a significant role in how it is viewed, but style has received little attention in computer vision research. We describe an approach to predicting style of images, and perform a thorough evaluation of different image features for these tasks. We find that features learned in a multi-layer network generally perform best -- even when trained with object class (not style)…

    Submitted 23 July, 2014; v1 submitted 14 November, 2013; originally announced November 2013.

    Journal ref: Proc. British Machine Vision Conference (BMVC) 2014