Showing 1–4 of 4 results for author: Modi, R

  1. arXiv:2410.20535  [pdf, other]

    cs.CV cs.AI

    Asynchronous Perception Machine For Efficient Test-Time-Training

    Authors: Rajat Modi, Yogesh Singh Rawat

    Abstract: In this work, we propose Asynchronous Perception Machine (APM), a computationally-efficient architecture for test-time-training (TTT). APM can process patches of an image one at a time in any order asymmetrically and still encode semantic-awareness in the net. We demonstrate APM's ability to recognize out-of-distribution images without dataset-specific pre-training, augmentation or any-pretext tas…

    Submitted 5 November, 2024; v1 submitted 27 October, 2024; originally announced October 2024.

    Comments: Accepted to NeurIPS 2024 Main Track. APM is a step to getting Geoffrey Hinton's GLOM working

  2. arXiv:2410.19553  [pdf, other]

    cs.CV cs.AI cs.CY

    On Occlusions in Video Action Detection: Benchmark Datasets And Training Recipes

    Authors: Rajat Modi, Vibhav Vineet, Yogesh Singh Rawat

    Abstract: This paper explores the impact of occlusions in video action detection. We facilitate this study by introducing five new benchmark datasets namely O-UCF and O-JHMDB consisting of synthetically controlled static/dynamic occlusions, OVIS-UCF and OVIS-JHMDB consisting of occlusions with realistic motions and Real-OUCF for occlusions in realistic-world scenarios. We formally confirm an intuitive expec…

    Submitted 25 October, 2024; originally announced October 2024.

    Comments: This paper was accepted to NeurIPS 2023 Dataset And Benchmark Track. It also showcases: Hinton's Islands of Agreement on realistic datasets which were previously hypothesized in his GLOM paper

  3. arXiv:2405.03770  [pdf, other]

    cs.CV

    Foundation Models for Video Understanding: A Survey

    Authors: Neelu Madan, Andreas Moegelmose, Rajat Modi, Yogesh S. Rawat, Thomas B. Moeslund

    Abstract: Video Foundation Models (ViFMs) aim to learn a general-purpose representation for various video understanding tasks. Leveraging large-scale datasets and powerful models, ViFMs achieve this by capturing robust and generic features from video data. This survey analyzes over 200 video foundational models, offering a comprehensive overview of benchmarks and evaluation metrics across 14 distinct video…

    Submitted 6 May, 2024; originally announced May 2024.

  4. arXiv:2204.07892  [pdf, other]

    cs.CV

    Video Action Detection: Analysing Limitations and Challenges

    Authors: Rajat Modi, Aayush Jung Rana, Akash Kumar, Praveen Tirupattur, Shruti Vyas, Yogesh Singh Rawat, Mubarak Shah

    Abstract: Beyond possessing large enough size to feed data hungry machines (eg, transformers), what attributes measure the quality of a dataset? Assuming that the definitions of such attributes do exist, how do we quantify among their relative existences? Our work attempts to explore these questions for video action detection. The task aims to spatio-temporally localize an actor and assign a relevant action…

    Submitted 16 April, 2022; originally announced April 2022.

    Comments: CVPRW'22