Showing 1–13 of 13 results for author: G, V K B

  1. arXiv:2503.19355  [pdf, other]

    cs.CV

    ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models

    Authors: Dohwan Ko, Sihyeon Kim, Yumin Suh, Vijay Kumar B. G, Minseo Yoon, Manmohan Chandraker, Hyunwoo J. Kim

    Abstract: Spatio-temporal reasoning is essential in understanding real-world environments in various fields, e.g., autonomous driving and sports analytics. Recent advances have improved the spatial reasoning ability of Vision-Language Models (VLMs) by introducing large-scale data, but these models still struggle to analyze kinematic elements like traveled distance and speed of moving objects. To bridge this g…

    Submitted 26 March, 2025; v1 submitted 25 March, 2025; originally announced March 2025.

  2. arXiv:2503.19263  [pdf, other]

    cs.CV

    DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning

    Authors: Fucai Ke, Vijay Kumar B G, Xingjian Leng, Zhixi Cai, Zaid Khan, Weiqing Wang, Pari Delir Haghighi, Hamid Rezatofighi, Manmohan Chandraker

    Abstract: Visual reasoning (VR), which is crucial in many fields for enabling human-like visual understanding, remains highly challenging. Recently, compositional visual reasoning approaches, which leverage the reasoning abilities of large language models (LLMs) with integrated tools to solve problems, have shown promise as more effective strategies than end-to-end VR methods. However, these approaches face…

    Submitted 24 March, 2025; originally announced March 2025.

  3. arXiv:2401.00125  [pdf, other]

    cs.AI cs.CV

    LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning

    Authors: S P Sharan, Francesco Pittaluga, Vijay Kumar B G, Manmohan Chandraker

    Abstract: Although planning is a crucial component of the autonomous driving stack, researchers have yet to develop robust planning algorithms that are capable of safely handling the diverse range of possible driving scenarios. Learning-based planners suffer from overfitting and poor long-tail performance. On the other hand, rule-based planners generalize well, but might fail to handle scenarios that requir…

    Submitted 29 December, 2023; originally announced January 2024.

    Comments: 15 pages, 8 figures, 7 tables

  4. arXiv:2401.00094  [pdf, other]

    cs.CV

    Generating Enhanced Negatives for Training Language-Based Object Detectors

    Authors: Shiyu Zhao, Long Zhao, Vijay Kumar B. G, Yumin Suh, Dimitris N. Metaxas, Manmohan Chandraker, Samuel Schulter

    Abstract: The recent progress in language-based open-vocabulary object detection can be largely attributed to finding better ways of leveraging large-scale data with free-form text annotations. Training such models with a discriminative objective function has proven successful, but requires good positive and negative samples. However, the free-form nature and the open vocabulary of object descriptions make…

    Submitted 12 April, 2024; v1 submitted 29 December, 2023; originally announced January 2024.

    Comments: Accepted to CVPR 2024. The supplementary document included

  5. arXiv:2311.01295  [pdf, ps, other]

    cs.LG cs.CR cs.CV

    DP-Mix: Mixup-based Data Augmentation for Differentially Private Learning

    Authors: Wenxuan Bao, Francesco Pittaluga, Vijay Kumar B G, Vincent Bindschaedler

    Abstract: Data augmentation techniques, such as simple image transformations and combinations, are highly effective at improving the generalization of computer vision models, especially when training data is limited. However, such techniques are fundamentally incompatible with differentially private learning approaches, due to the latter's built-in assumption that each training image's contribution to the l…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: 17 pages, 2 figures, to be published in Neural Information Processing Systems 2023

  6. arXiv:2308.06412  [pdf, other]

    cs.CV

    Taming Self-Training for Open-Vocabulary Object Detection

    Authors: Shiyu Zhao, Samuel Schulter, Long Zhao, Zhixing Zhang, Vijay Kumar B. G, Yumin Suh, Manmohan Chandraker, Dimitris N. Metaxas

    Abstract: Recent studies have shown promising performance in open-vocabulary object detection (OVD) by utilizing pseudo labels (PLs) from pretrained vision and language models (VLMs). However, teacher-student self-training, a powerful and widely used paradigm to leverage PLs, is rarely explored for OVD. This work identifies two challenges of using self-training in OVD: noisy PLs from VLMs and frequent distr…

    Submitted 12 April, 2024; v1 submitted 11 August, 2023; originally announced August 2023.

    Comments: Accepted to CVPR 2024. The supplementary document included

  7. arXiv:2304.11463  [pdf, other]

    cs.CV

    OmniLabel: A Challenging Benchmark for Language-Based Object Detection

    Authors: Samuel Schulter, Vijay Kumar B G, Yumin Suh, Konstantinos M. Dafnis, Zhixing Zhang, Shiyu Zhao, Dimitris Metaxas

    Abstract: Language-based object detection is a promising direction towards building a natural interface to describe objects in images that goes far beyond plain category names. While recent methods show great progress in that direction, proper evaluation is lacking. With OmniLabel, we propose a novel task definition, dataset, and evaluation metric. The task subsumes standard- and open-vocabulary detection a…

    Submitted 14 August, 2023; v1 submitted 22 April, 2023; originally announced April 2023.

    Comments: ICCV 2023 Oral - Visit our project website at https://www.omnilabel.org

  8. arXiv:2207.08954  [pdf, other]

    cs.CV

    Exploiting Unlabeled Data with Vision and Language Models for Object Detection

    Authors: Shiyu Zhao, Zhixing Zhang, Samuel Schulter, Long Zhao, Vijay Kumar B. G, Anastasis Stathopoulos, Manmohan Chandraker, Dimitris Metaxas

    Abstract: Building robust and generic object detection frameworks requires scaling to larger label spaces and bigger training datasets. However, it is prohibitively costly to acquire annotations for thousands of categories at a large scale. We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images, effectiv…

    Submitted 18 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022 (with the supplementary document)

  9. arXiv:2109.02762  [pdf, other]

    cs.CV

    STRIVE: Scene Text Replacement In Videos

    Authors: Vijay Kumar B G, Jeyasri Subramanian, Varnith Chordia, Eugene Bart, Shaobo Fang, Kelly Guan, Raja Bala

    Abstract: We propose replacing scene text in videos using deep style transfer and learned photometric transformations. Building on recent progress on still image text replacement, we present extensions that alter text while preserving the appearance and motion characteristics of the original video. Compared to the problem of still image text replacement, our method addresses additional challenges introduced by…

    Submitted 6 September, 2021; originally announced September 2021.

    Comments: ICCV 2021, Project Page: https://striveiccv2021.github.io/STRIVE-ICCV2021/

  10. arXiv:1806.00911  [pdf, other]

    cs.CV

    Bayesian Semantic Instance Segmentation in Open Set World

    Authors: Trung Pham, Vijay Kumar B G, Thanh-Toan Do, Gustavo Carneiro, Ian Reid

    Abstract: This paper addresses the semantic instance segmentation task in the open-set conditions, where input images can contain known and unknown object classes. The training process of existing semantic instance segmentation methods requires annotation masks for all object instances, which is expensive to acquire or even infeasible in some realistic scenarios, where the number of categories may increase…

    Submitted 29 July, 2018; v1 submitted 3 June, 2018; originally announced June 2018.

    Comments: Accepted to ECCV 2018

  11. arXiv:1704.01285  [pdf, other]

    cs.CV

    Smart Mining for Deep Metric Learning

    Authors: Ben Harwood, Vijay Kumar B G, Gustavo Carneiro, Ian Reid, Tom Drummond

    Abstract: To solve deep metric learning problems and producing feature embeddings, current methodologies will commonly use a triplet model to minimise the relative distance between samples from the same class and maximise the relative distance between samples from different classes. Though successful, the training convergence of this triplet model can be compromised by the fact that the vast majority of the…

    Submitted 27 July, 2017; v1 submitted 5 April, 2017; originally announced April 2017.

    Comments: *Vijay Kumar B G and Ben Harwood contributed equally to this work. Accepted in IEEE International Conference on Computer Vision, ICCV 2017

  12. arXiv:1611.08998  [pdf, other]

    cs.CV cs.AI cs.LG

    DeepSetNet: Predicting Sets with Deep Neural Networks

    Authors: S. Hamid Rezatofighi, Vijay Kumar B G, Anton Milan, Ehsan Abbasnejad, Anthony Dick, Ian Reid

    Abstract: This paper addresses the task of set prediction using deep learning. This is important because the output of many computer vision tasks, including image tagging and object detection, are naturally expressed as sets of entities rather than vectors. As opposed to a vector, the size of a set is not fixed in advance, and it is invariant to the ordering of entities within it. We define a likelihood for…

    Submitted 10 August, 2017; v1 submitted 28 November, 2016; originally announced November 2016.

    Comments: Accepted in IEEE International Conference on Computer Vision (ICCV), Venice, 2017, (Spotlight)

  13. arXiv:1512.09272  [pdf, other]

    cs.CV

    Learning Local Image Descriptors with Deep Siamese and Triplet Convolutional Networks by Minimising Global Loss Functions

    Authors: Vijay Kumar B G, Gustavo Carneiro, Ian Reid

    Abstract: Recent innovations in training deep convolutional neural network (ConvNet) models have motivated the design of new methods to automatically learn local image descriptors. The latest deep ConvNets proposed for this task consist of a siamese network that is trained by penalising misclassification of pairs of local image patches. Current results from machine learning show that replacing this siamese…

    Submitted 1 August, 2016; v1 submitted 31 December, 2015; originally announced December 2015.

    Comments: IEEE Conference on Computer Vision and Pattern Recognition 2016 (CVPR 2016)