Showing 1–19 of 19 results for author: Mahadevan, V

  1. arXiv:2503.12333  [pdf, other]

    cs.RO cs.MA

    GameChat: Multi-LLM Dialogue for Safe, Agile, and Socially Optimal Multi-Agent Navigation in Constrained Environments

    Authors: Vagul Mahadevan, Shangtong Zhang, Rohan Chandra

    Abstract: Safe, agile, and socially compliant multi-robot navigation in cluttered and constrained environments remains a critical challenge. This is especially difficult with self-interested agents in decentralized settings, where there is no central authority to resolve conflicts induced by spatial symmetry. We address this challenge by proposing a novel approach, GameChat, which facilitates safe, agile, a…

    Submitted 15 March, 2025; originally announced March 2025.

  2. arXiv:2410.03061  [pdf, other]

    cs.CV cs.CL

    DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models

    Authors: Sungnyun Kim, Haofu Liao, Srikar Appalaraju, Peng Tang, Zhuowen Tu, Ravi Kumar Satzoda, R. Manmatha, Vijay Mahadevan, Stefano Soatto

    Abstract: Visual document understanding (VDU) is a challenging task that involves understanding documents across various modalities (text and image) and layouts (forms, tables, etc.). This study aims to enhance generalizability of small VDU models by distilling knowledge from LLMs. We identify that directly prompting LLMs often fails to generate informative and useful data. In response, we present a new fra…

    Submitted 3 October, 2024; originally announced October 2024.

    Comments: Accepted to EMNLP 2024

  3. arXiv:2410.02160  [pdf, other]

    cs.CR cs.AI cs.LG

    RiskSEA : A Scalable Graph Embedding for Detecting On-chain Fraudulent Activities on the Ethereum Blockchain

    Authors: Ayush Agarwal, Lv Lu, Arjun Maheswaran, Varsha Mahadevan, Bhaskar Krishnamachari

    Abstract: Like any other useful technology, cryptocurrencies are sometimes used for criminal activities. While transactions are recorded on the blockchain, there exists a need for a more rapid and scalable method to detect addresses associated with fraudulent activities. We present RiskSEA, a scalable risk scoring system capable of effectively handling the dynamic nature of large-scale blockchain transactio…

    Submitted 2 October, 2024; originally announced October 2024.

    Comments: arXiv admin note: text overlap with arXiv:2203.12363 by other authors

  4. arXiv:2403.03346  [pdf, other]

    cs.CV

    Enhancing Vision-Language Pre-training with Rich Supervisions

    Authors: Yuan Gao, Kunyu Shi, Pengkai Zhu, Edouard Belval, Oren Nuriel, Srikar Appalaraju, Shabnam Ghadar, Vijay Mahadevan, Zhuowen Tu, Stefano Soatto

    Abstract: We propose Strongly Supervised pre-training with ScreenShots (S4) - a novel pre-training paradigm for Vision-Language Models using data from large-scale web screenshot rendering. Using web screenshots unlocks a treasure trove of visual and textual cues that are not present in using image-text pairs. In S4, we leverage the inherent tree-structured hierarchy of HTML elements and the spatial localiza…

    Submitted 12 March, 2025; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  5. arXiv:2311.08623  [pdf, other]

    cs.CV cs.CL cs.LG

    DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models

    Authors: Peng Tang, Pengkai Zhu, Tian Li, Srikar Appalaraju, Vijay Mahadevan, R. Manmatha

    Abstract: Encoder-decoder transformer models have achieved great success on various vision-language (VL) tasks, but they suffer from high inference latency. Typically, the decoder takes up most of the latency because of the auto-regressive decoding. To accelerate the inference, we propose an approach of performing Dynamic Early Exit on Decoder (DEED). We build a multi-exit encoder-decoder transformer model…

    Submitted 14 November, 2023; originally announced November 2023.

  6. arXiv:2311.08622  [pdf, other]

    cs.CV cs.CL cs.LG

    Multiple-Question Multiple-Answer Text-VQA

    Authors: Peng Tang, Srikar Appalaraju, R. Manmatha, Yusheng Xie, Vijay Mahadevan

    Abstract: We present Multiple-Question Multiple-Answer (MQMA), a novel approach to do text-VQA in encoder-decoder transformer models. The text-VQA task requires a model to answer a question by understanding multi-modal content: text (typically from OCR) and an associated image. To the best of our knowledge, almost all previous approaches for text-VQA process a single question and its associated content to p…

    Submitted 14 November, 2023; originally announced November 2023.

  7. arXiv:2307.07929  [pdf, other]

    cs.CV

    DocTr: Document Transformer for Structured Information Extraction in Documents

    Authors: Haofu Liao, Aruni RoyChowdhury, Weijian Li, Ankan Bansal, Yuting Zhang, Zhuowen Tu, Ravi Kumar Satzoda, R. Manmatha, Vijay Mahadevan

    Abstract: We present a new formulation for structured information extraction (SIE) from visually rich documents. It aims to address the limitations of existing IOB tagging or graph-based formulations, which are either overly reliant on the correct ordering of input text or struggle with decoding a complex graph. Instead, motivated by anchor-based object detectors in vision, we represent an entity as an anch…

    Submitted 15 July, 2023; originally announced July 2023.

  8. arXiv:2302.07387  [pdf, other]

    cs.CV

    PolyFormer: Referring Image Segmentation as Sequential Polygon Generation

    Authors: Jiang Liu, Hui Ding, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha

    Abstract: In this work, instead of directly predicting the pixel-level segmentation masks, the problem of referring image segmentation is formulated as sequential polygon generation, and the predicted polygons can be later converted into segmentation masks. This is enabled by a new sequence-to-sequence framework, Polygon Transformer (PolyFormer), which takes a sequence of image patches and text query tokens…

    Submitted 27 March, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

    Comments: CVPR 2023. Project Page: https://polyformer.github.io/

  9. arXiv:2301.01209  [pdf, other]

    math.NA

    Customizable Adaptive Regularization Techniques for B-Spline Modeling

    Authors: David Lenz, Raine Yeh, Vijay Mahadevan, Iulian Grindeanu, Tom Peterka

    Abstract: B-spline models are a powerful way to represent scientific data sets with a functional approximation. However, these models can suffer from spurious oscillations when the data to be approximated are not uniformly distributed. Model regularization (i.e., smoothing) has traditionally been used to minimize these oscillations; unfortunately, it is sometimes impossible to sufficiently remove unwanted a…

    Submitted 3 January, 2023; originally announced January 2023.

    Comments: 11 pages, 10 figures. Extension of the 2022 International Conference on Computational Science (ICCS) proceedings paper arXiv:2203.12730

    Report number: ANL/MCS-P9682-1222

  10. arXiv:2210.06528  [pdf, other]

    math.NA cs.DC

    Parallel Domain Decomposition techniques applied to Multivariate Functional Approximation of discrete data

    Authors: Vijay S. Mahadevan, David Lenz, Iulian Grindeanu, Thomas Peterka

    Abstract: Compactly expressing large-scale datasets through Multivariate Functional Approximations (MFA) can be critically important for analysis and visualization to drive scientific discovery. Tackling such problems requires scalable data partitioning approaches to compute MFA representations in amenable wall clock times. We introduce a fully parallel scheme to reduce the total work per task in combinatio…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: Submitted to SIAM Journal of Scientific Computing

    MSC Class: 65D05; 65D15; 65Y05

  11. arXiv:2205.08094  [pdf, other]

    cs.CV

    MATrIX -- Modality-Aware Transformer for Information eXtraction

    Authors: Thomas Delteil, Edouard Belval, Lei Chen, Luis Goncalves, Vijay Mahadevan

    Abstract: We present MATrIX - a Modality-Aware Transformer for Information eXtraction in the Visual Document Understanding (VDU) domain. VDU covers information extraction from visually rich documents such as forms, invoices, receipts, tables, graphs, presentations, or advertisements. In these, text semantics and visual information supplement each other to provide a global understanding of the document. MATr…

    Submitted 17 May, 2022; originally announced May 2022.

  12. arXiv:2203.16701  [pdf, other]

    cs.LG cs.CR stat.ML

    Towards Differential Relational Privacy and its use in Question Answering

    Authors: Simone Bombari, Alessandro Achille, Zijian Wang, Yu-Xiang Wang, Yusheng Xie, Kunwar Yashraj Singh, Srikar Appalaraju, Vijay Mahadevan, Stefano Soatto

    Abstract: Memorization of the relation between entities in a dataset can lead to privacy issues when using a trained model for question answering. We introduce Relational Memorization (RM) to understand, quantify and control this phenomenon. While bounding general memorization can have detrimental effects on the performance of a trained model, bounding RM does not prevent effective learning. The difference…

    Submitted 30 March, 2022; originally announced March 2022.

  13. arXiv:2203.12730  [pdf, other]

    math.NA

    Adaptive Regularization of B-Spline Models for Scientific Data

    Authors: David Lenz, Raine Yeh, Vijay Mahadevan, Iulian Grindeanu, Tom Peterka

    Abstract: B-spline models are a powerful way to represent scientific data sets with a functional approximation. However, these models can suffer from spurious oscillations when the data to be approximated are not uniformly distributed. Model regularization (i.e., smoothing) has traditionally been used to minimize these oscillations; unfortunately, it is sometimes impossible to sufficiently remove unwanted a…

    Submitted 23 March, 2022; originally announced March 2022.

    Comments: To appear in Proceedings of the International Conference on Computational Science, 2022. 15 pages, 5 figures

    ACM Class: G.1.2

  14. arXiv:2201.01922  [pdf, other]

    cs.LG cs.CV

    Contrastive Neighborhood Alignment

    Authors: Pengkai Zhu, Zhaowei Cai, Yuanjun Xiong, Zhuowen Tu, Luis Goncalves, Vijay Mahadevan, Stefano Soatto

    Abstract: We present Contrastive Neighborhood Alignment (CNA), a manifold learning approach to maintain the topology of learned features whereby data points that are mapped to nearby representations by the source (teacher) model are also mapped to neighbors by the target (student) model. The target model aims to mimic the local structure of the source representation space using a contrastive loss. CNA is an…

    Submitted 5 January, 2022; originally announced January 2022.

    Comments: 10 pages, 7 tables, 3 figures

  15. arXiv:2105.02170  [pdf, other]

    cs.CV

    Visual Relationship Detection Using Part-and-Sum Transformers with Composite Queries

    Authors: Qi Dong, Zhuowen Tu, Haofu Liao, Yuting Zhang, Vijay Mahadevan, Stefano Soatto

    Abstract: Computer vision applications such as visual relationship detection and human object interaction can be formulated as a composite (structured) set detection problem in which both the parts (subject, object, and predicate) and the sum (triplet as a whole) are to be detected in a hierarchical fashion. In this paper, we present a new approach, denoted Part-and-Sum detection Transformer (PST), to perfo…

    Submitted 19 August, 2021; v1 submitted 5 May, 2021; originally announced May 2021.

    Comments: Accepted by ICCV2021

  16. arXiv:2012.04123  [pdf, other]

    math.NA

    Fourier-Informed Knot Placement Schemes for B-Spline Approximation

    Authors: David Lenz, Oana Marin, Vijay Mahadevan, Raine Yeh, Tom Peterka

    Abstract: Fitting B-splines to discrete data is especially challenging when the given data contain noise, jumps, or corners. Here, we describe how periodic data sets with these features can be efficiently and robustly approximated with B-splines by analyzing the Fourier spectrum of the data. Our method uses a collection of spectral filters to produce different indicator functions that guide effective knot p…

    Submitted 7 December, 2020; originally announced December 2020.

    Report number: ANL/MCS-P9391-1020

    MSC Class: 41A15; 65D10; 65D15

  17. arXiv:2006.14615  [pdf, other]

    cs.CV cs.LG

    LayoutTransformer: Layout Generation and Completion with Self-attention

    Authors: Kamal Gupta, Justin Lazarow, Alessandro Achille, Larry Davis, Vijay Mahadevan, Abhinav Shrivastava

    Abstract: We address the problem of scene layout generation for diverse domains such as images, mobile applications, documents, and 3D objects. Most complex scenes, natural or human-designed, can be expressed as a meaningful arrangement of simpler compositional graphical primitives. Generating a new layout or extending an existing layout requires understanding the relationships between these primitives. To…

    Submitted 30 September, 2021; v1 submitted 25 June, 2020; originally announced June 2020.

    Comments: To appear at ICCV 2021

  18. arXiv:1908.01091  [pdf, other]

    cs.LG cs.CV stat.ML

    Toward Understanding Catastrophic Forgetting in Continual Learning

    Authors: Cuong V. Nguyen, Alessandro Achille, Michael Lam, Tal Hassner, Vijay Mahadevan, Stefano Soatto

    Abstract: We study the relationship between catastrophic forgetting and properties of task sequences. In particular, given a sequence of tasks, we would like to understand which properties of this sequence influence the error rates of continual learning algorithms trained on the sequence. To this end, we propose a new procedure that makes use of recent developments in task space modeling as well as correlat…

    Submitted 2 August, 2019; originally announced August 2019.

  19. arXiv:1801.05733  [pdf, ps, other]

    physics.comp-ph cond-mat.dis-nn

    Deep Convolutional Neural Networks for Eigenvalue Problems in Mechanics

    Authors: David Finol, Yan Lu, Vijay Mahadevan, Ankit Srivastava

    Abstract: We show that deep convolutional neural networks (CNN) can massively outperform traditional densely-connected neural networks (both deep or shallow) in predicting eigenvalue problems in mechanics. In this sense, we strike out in a new direction in mechanics computations with strongly predictive NNs whose success depends not only on architectures being deep, but also being fundamentally different fr…

    Submitted 17 July, 2018; v1 submitted 17 January, 2018; originally announced January 2018.