Showing 1–7 of 7 results for author: Gang, H

  1. arXiv:2503.13111

    cs.CV cs.CL cs.LG

    MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs

    Authors: Erik Daxberger, Nina Wenzel, David Griffiths, Haiming Gang, Justin Lazarow, Gefen Kohavi, Kai Kang, Marcin Eichner, Yinfei Yang, Afshin Dehghan, Peter Grasch

    Abstract: Multimodal large language models (MLLMs) excel at 2D visual understanding but remain limited in their ability to reason about 3D space. In this work, we leverage large-scale high-quality 3D scene data with open-set annotations to introduce 1) a novel supervised fine-tuning dataset and 2) a new evaluation benchmark, focused on indoor scenes. Our Cubify Anything VQA (CA-VQA) data covers diverse spat…

    Submitted 17 March, 2025; originally announced March 2025.

  2. arXiv:2407.15841

    cs.CV

    SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

    Authors: Mingze Xu, Mingfei Gao, Zhe Gan, Hong-You Chen, Zhengfeng Lai, Haiming Gang, Kai Kang, Afshin Dehghan

    Abstract: We propose SlowFast-LLaVA (or SF-LLaVA for short), a training-free video large language model (LLM) that can jointly capture detailed spatial semantics and long-range temporal context without exceeding the token budget of commonly used LLMs. This is realized by using a two-stream SlowFast design of inputs for Video LLMs to aggregate features from sampled frames in an effective way. Specifically, t…

    Submitted 15 September, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: Technical report

  3. arXiv:2308.00530

    cs.CV cs.AI

    Tolerating Annotation Displacement in Dense Object Counting via Point Annotation Probability Map

    Authors: Yuehai Chen, Jing Yang, Badong Chen, Hua Gang, Shaoyi Du

    Abstract: Counting objects in crowded scenes remains a challenge to computer vision. Current deep learning based approaches often formulate it as a Gaussian density regression problem. Such a brute-force regression, though effective, may not consider the annotation displacement properly which arises from the human annotation process and may lead to different distributions. We conjecture that it would be b…

    Submitted 8 November, 2023; v1 submitted 29 July, 2023; originally announced August 2023.

  4. arXiv:2203.02634

    cs.CV cs.AI cs.LG cs.RO

    Important Object Identification with Semi-Supervised Learning for Autonomous Driving

    Authors: Jiachen Li, Haiming Gang, Hengbo Ma, Masayoshi Tomizuka, Chiho Choi

    Abstract: Accurate identification of important objects in the scene is a prerequisite for safe and high-quality decision making and motion planning of intelligent agents (e.g., autonomous vehicles) that navigate in complex and dynamic environments. Most existing approaches attempt to employ attention mechanisms to learn importance weights associated with each object indirectly via various tasks (e.g., traje…

    Submitted 4 March, 2022; originally announced March 2022.

    Comments: ICRA 2022

  5. arXiv:2202.00182

    cs.CV cs.AI

    Semi-supervised 3D Object Detection via Temporal Graph Neural Networks

    Authors: Jianren Wang, Haiming Gang, Siddharth Ancha, Yi-Ting Chen, David Held

    Abstract: 3D object detection plays an important role in autonomous driving and other robotics applications. However, these detectors usually require training on large amounts of annotated data that is expensive and time-consuming to collect. Instead, we propose leveraging large amounts of unlabeled point cloud videos by semi-supervised learning of 3D object detectors via temporal graph neural networks. Our…

    Submitted 6 March, 2023; v1 submitted 31 January, 2022; originally announced February 2022.

    Comments: 3DV 2021

  6. arXiv:2108.08236

    cs.CV cs.AI cs.LG cs.MA cs.RO

    LOKI: Long Term and Key Intentions for Trajectory Prediction

    Authors: Harshayu Girase, Haiming Gang, Srikanth Malla, Jiachen Li, Akira Kanehara, Karttikeya Mangalam, Chiho Choi

    Abstract: Recent advances in trajectory prediction have shown that explicit reasoning about agents' intent is important to accurately forecast their motion. However, the current research activities are not directly applicable to intelligent and safety critical systems. This is mainly because very few public datasets are available, and they only consider pedestrian-specific intents for a short temporal horiz…

    Submitted 17 September, 2021; v1 submitted 18 August, 2021; originally announced August 2021.

    Comments: ICCV 2021 (The dataset is available at https://usa.honda-ri.com/loki)

  7. arXiv:1903.01568

    cs.CV cs.RO

    The H3D Dataset for Full-Surround 3D Multi-Object Detection and Tracking in Crowded Urban Scenes

    Authors: Abhishek Patil, Srikanth Malla, Haiming Gang, Yi-Ting Chen

    Abstract: 3D multi-object detection and tracking are crucial for traffic scene understanding. However, the community pays less attention to these areas due to the lack of a standardized benchmark dataset to advance the field. Moreover, existing datasets (e.g., KITTI) do not provide sufficient data and labels to tackle challenging scenes where highly interactive and occluded traffic participants are present.…

    Submitted 4 March, 2019; originally announced March 2019.

    Comments: The dataset is available at https://usa.honda-ri.com/H3D

    Journal ref: IEEE International Conference on Robotics and Automation (ICRA), 2019