Showing 1–2 of 2 results for author: Fateh, F J
  1. arXiv:2412.02930  [pdf, other]

    cs.CV

    Video LLMs for Temporal Reasoning in Long Videos

    Authors: Fawad Javed Fateh, Umer Ahmed, Hamza Khan, M. Zeeshan Zia, Quoc-Huy Tran

    Abstract: This paper introduces TemporalVLM, a video large language model (video LLM) capable of effective temporal reasoning and fine-grained understanding in long videos. At the core, our approach includes a visual encoder for mapping a long-term input video into features which are time-aware and contain both local and global cues. In particular, it first divides the input video into short-term clips, whi…

    Submitted 9 March, 2025; v1 submitted 3 December, 2024; originally announced December 2024.

  2. arXiv:2309.06462  [pdf, other]

    cs.CV

    Action Segmentation Using 2D Skeleton Heatmaps and Multi-Modality Fusion

    Authors: Syed Waleed Hyder, Muhammad Usama, Anas Zafar, Muhammad Naufil, Fawad Javed Fateh, Andrey Konin, M. Zeeshan Zia, Quoc-Huy Tran

    Abstract: This paper presents a 2D skeleton-based action segmentation method with applications in fine-grained human activity recognition. In contrast with state-of-the-art methods which directly take sequences of 3D skeleton coordinates as inputs and apply Graph Convolutional Networks (GCNs) for spatiotemporal feature learning, our main idea is to use sequences of 2D skeleton heatmaps as inputs and employ…

    Submitted 25 April, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: Accepted to ICRA 2024