Showing 1–13 of 13 results for author: Omama, M

Searching in archive cs.

  1. arXiv:2505.24265  [pdf, ps, other]

    cs.MA

    R3DM: Enabling Role Discovery and Diversity Through Dynamics Models in Multi-agent Reinforcement Learning

    Authors: Harsh Goel, Mohammad Omama, Behdad Chalaki, Vaishnav Tadiparthi, Ehsan Moradi Pari, Sandeep Chinchali

    Abstract: Multi-agent reinforcement learning (MARL) has achieved significant progress in large-scale traffic control, autonomous vehicles, and robotics. Drawing inspiration from biological systems where roles naturally emerge to enable coordination, role-based MARL methods have been proposed to enhance cooperation learning for complex tasks. However, existing methods exclusively derive roles from an agent's…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: 21 pages. To appear in the International Conference on Machine Learning (ICML 2025)

  2. arXiv:2503.23465  [pdf, other]

    cs.RO

    SparseLoc: Sparse Open-Set Landmark-based Global Localization for Autonomous Navigation

    Authors: Pranjal Paul, Vineeth Bhat, Tejas Salian, Mohammad Omama, Krishna Murthy Jatavallabhula, Naveen Arulselvan, K. Madhava Krishna

    Abstract: Global localization is a critical problem in autonomous navigation, enabling precise positioning without reliance on GPS. Modern global localization techniques often depend on dense LiDAR maps, which, while precise, require extensive storage and computational resources. Recent approaches have explored alternative methods, such as sparse maps and learned features, but they suffer from poor robustne…

    Submitted 30 March, 2025; originally announced March 2025.

  3. arXiv:2411.16718  [pdf, other]

    cs.CV cs.AI

    Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification

    Authors: S P Sharan, Minkyu Choi, Sahil Shah, Harsh Goel, Mohammad Omama, Sandeep Chinchali

    Abstract: Recent advancements in text-to-video models such as Sora, Gen-3, MovieGen, and CogVideoX are pushing the boundaries of synthetic video generation, with adoption seen in fields like robotics, autonomous driving, and entertainment. As these models become prevalent, various metrics and benchmarks have emerged to evaluate the quality of the generated videos. However, these metrics emphasize visual qua…

    Submitted 24 April, 2025; v1 submitted 22 November, 2024; originally announced November 2024.

    Journal ref: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR) 2025

  4. arXiv:2411.10513  [pdf, other]

    cs.CV cs.IR cs.MM

    Any2Any: Incomplete Multimodal Retrieval with Conformal Prediction

    Authors: Po-han Li, Yunhao Yang, Mohammad Omama, Sandeep Chinchali, Ufuk Topcu

    Abstract: Autonomous agents perceive and interpret their surroundings by integrating multimodal inputs, such as vision, audio, and LiDAR. These perceptual modalities support retrieval tasks, such as place recognition in robotics. However, current multimodal retrieval systems encounter difficulties when parts of the data are missing due to sensor failures or inaccessibility, such as silent videos or LiDAR sc…

    Submitted 25 November, 2024; v1 submitted 15 November, 2024; originally announced November 2024.

  5. arXiv:2410.07022  [pdf, other]

    cs.IR

    Exploiting Distribution Constraints for Scalable and Efficient Image Retrieval

    Authors: Mohammad Omama, Po-han Li, Sandeep P. Chinchali

    Abstract: Image retrieval is crucial in robotics and computer vision, with downstream applications in robot place recognition and vision-based product recommendations. Modern retrieval systems face two key challenges: scalability and efficiency. State-of-the-art image retrieval systems train specific neural networks for each dataset, an approach that lacks scalability. Furthermore, since retrieval speed is…

    Submitted 1 April, 2025; v1 submitted 9 October, 2024; originally announced October 2024.

  6. Towards Neuro-Symbolic Video Understanding

    Authors: Minkyu Choi, Harsh Goel, Mohammad Omama, Yunhao Yang, Sahil Shah, Sandeep Chinchali

    Abstract: The unprecedented surge in video data production in recent years necessitates efficient tools to extract meaningful frames from videos for downstream tasks. Long-term temporal reasoning is a key desideratum for frame retrieval systems. While state-of-the-art foundation models, like VideoLLaMA and ViCLIP, are proficient in short-term semantic understanding, they surprisingly fail at long-term reaso…

    Submitted 3 December, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

    Comments: Accepted by The European Conference on Computer Vision (ECCV) 2024

  7. arXiv:2312.16648  [pdf, other]

    cs.RO cs.CV

    LIP-Loc: LiDAR Image Pretraining for Cross-Modal Localization

    Authors: Sai Shubodh Puligilla, Mohammad Omama, Husain Zaidi, Udit Singh Parihar, Madhava Krishna

    Abstract: Global visual localization in LiDAR-maps, crucial for autonomous driving applications, remains largely unexplored due to the challenging issue of bridging the cross-modal heterogeneity gap. Popular multi-modal learning approach Contrastive Language-Image Pre-Training (CLIP) has popularized contrastive symmetric loss using batch construction technique by applying it to multi-modal domains of text a…

    Submitted 27 December, 2023; originally announced December 2023.

    Comments: To be presented at WACV-W 2024. Project page: https://shubodhs.ai/liploc

  8. NeuroSMPC: A Neural Network guided Sampling Based MPC for On-Road Autonomous Driving

    Authors: Kaustab Pal, Aditya Sharma, Mohd Omama, Parth N. Shah, K. Madhava Krishna

    Abstract: In this paper we show an effective means of integrating data driven frameworks to sampling based optimal control to vastly reduce the compute time for easy adoption and adaptation to real time applications such as on-road autonomous driving in the presence of dynamic actors. Presented with training examples, a spatio-temporal CNN learns to predict the optimal mean control over a finite horizon tha…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: Published in 2023 IEEE 19th International Conference on Automation Science and Engineering (CASE)

  9. arXiv:2310.02324  [pdf, other]

    cs.RO

    ALT-Pilot: Autonomous navigation with Language augmented Topometric maps

    Authors: Mohammad Omama, Pranav Inani, Pranjal Paul, Sarat Chandra Yellapragada, Krishna Murthy Jatavallabhula, Sandeep Chinchali, Madhava Krishna

    Abstract: We present an autonomous navigation system that operates without assuming HD LiDAR maps of the environment. Our system, ALT-Pilot, relies only on publicly available road network information and a sparse (and noisy) set of crowdsourced language landmarks. With the help of onboard sensors and a language-augmented topometric map, ALT-Pilot autonomously pilots the vehicle to any destination on the roa…

    Submitted 3 October, 2023; originally announced October 2023.

  10. arXiv:2302.07241  [pdf, other]

    cs.CV cs.AI cs.RO

    ConceptFusion: Open-set Multimodal 3D Mapping

    Authors: Krishna Murthy Jatavallabhula, Alihusein Kuwajerwala, Qiao Gu, Mohd Omama, Tao Chen, Alaa Maalouf, Shuang Li, Ganesh Iyer, Soroush Saryazdi, Nikhil Keetha, Ayush Tewari, Joshua B. Tenenbaum, Celso Miguel de Melo, Madhava Krishna, Liam Paull, Florian Shkurti, Antonio Torralba

    Abstract: Building 3D maps of the environment is central to robot navigation, planning, and interaction with objects in a scene. Most existing approaches that integrate semantic concepts with 3D maps largely remain confined to the closed-set setting: they can only reason about a finite set of concepts, pre-defined at training time. Further, these maps can only be queried using class labels, or in recent wor…

    Submitted 23 October, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

    Comments: RSS 2023. Project page: https://concept-fusion.github.io. Explainer video: https://www.youtube.com/watch?v=rkXgws8fiDs. Code: https://github.com/concept-fusion/concept-fusion

  11. arXiv:2203.06897  [pdf, other]

    cs.RO

    Drift Reduced Navigation with Deep Explainable Features

    Authors: Mohd Omama, Sundar Sripada Venugopalaswamy Sriraman, Sandeep Chinchali, Arun Kumar Singh, K. Madhava Krishna

    Abstract: Modern autonomous vehicles (AVs) often rely on vision, LIDAR, and even radar-based simultaneous localization and mapping (SLAM) frameworks for precise localization and navigation. However, modern SLAM frameworks often lead to unacceptably high levels of drift (i.e., localization error) when AVs observe few visually distinct features or encounter occlusions due to dynamic obstacles. This paper argu…

    Submitted 25 November, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

    Comments: Accepted in IROS 2022

  12. arXiv:2110.14928  [pdf, other]

    cs.RO

    Learning Actions for Drift-Free Navigation in Highly Dynamic Scenes

    Authors: Mohd Omama, Sundar Sripada V. S., Sandeep Chinchali, K. Madhava Krishna

    Abstract: We embark on a hitherto unreported problem of an autonomous robot (self-driving car) navigating in dynamic scenes in a manner that reduces its localization error and eventual cumulative drift or Absolute Trajectory Error, which is pronounced in such dynamic scenes. With the hugely popular Velodyne-16 3D LIDAR as the main sensing modality, and the accurate LIDAR-based Localization and Mapping algor…

    Submitted 31 March, 2022; v1 submitted 28 October, 2021; originally announced October 2021.

    Comments: Accepted in American Control Conference 2022

  13. arXiv:1905.11922  [pdf, other]

    cs.CV

    FireNet: A Specialized Lightweight Fire & Smoke Detection Model for Real-Time IoT Applications

    Authors: Arpit Jadon, Mohd. Omama, Akshay Varshney, Mohammad Samar Ansari, Rishabh Sharma

    Abstract: Fire disasters typically result in a lot of loss to life and property. It is therefore imperative that precise, fast, and possibly portable solutions to detect fire be made readily available to the masses at reasonable prices. There have been several research attempts to design effective and appropriately priced fire detection systems with varying degrees of success. However, most of them demonstrat…

    Submitted 4 September, 2019; v1 submitted 28 May, 2019; originally announced May 2019.

    Comments: To be submitted to a conference in the future