Showing 1–3 of 3 results for author: Melvin, I

  1. arXiv:2408.13243  [pdf, other]

    cs.CV

    MCTR: Multi Camera Tracking Transformer

    Authors: Alexandru Niculescu-Mizil, Deep Patel, Iain Melvin

    Abstract: Multi-camera tracking plays a pivotal role in various real-world applications. While end-to-end methods have gained significant interest in single-camera tracking, multi-camera tracking remains predominantly reliant on heuristic techniques. In response to this gap, this paper introduces Multi-Camera Tracking tRansformer (MCTR), a novel end-to-end approach tailored for multi-object detection and tr…

    Submitted 11 September, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

  2. arXiv:1711.06354  [pdf, other]

    cs.CV

    Grounded Objects and Interactions for Video Captioning

    Authors: Chih-Yao Ma, Asim Kadav, Iain Melvin, Zsolt Kira, Ghassan AlRegib, Hans Peter Graf

    Abstract: We address the problem of video captioning by grounding language generation on object interactions in the video. Existing work mostly focuses on overall scene understanding with often limited or no emphasis on object interactions to address the problem of video understanding. In this paper, we propose SINet-Caption that learns to generate captions grounded over higher-order interactions between ar…

    Submitted 16 November, 2017; originally announced November 2017.

    Comments: arXiv admin note: substantial text overlap with arXiv:1711.06330

  3. arXiv:1711.06330  [pdf, other]

    cs.CV

    Attend and Interact: Higher-Order Object Interactions for Video Understanding

    Authors: Chih-Yao Ma, Asim Kadav, Iain Melvin, Zsolt Kira, Ghassan AlRegib, Hans Peter Graf

    Abstract: Human actions often involve complex interactions across several inter-related objects in the scene. However, existing approaches to fine-grained video understanding or visual relationship detection often rely on single object representation or pairwise object relationships. Furthermore, learning interactions across multiple objects in hundreds of frames for video is computationally infeasible and…

    Submitted 20 March, 2018; v1 submitted 16 November, 2017; originally announced November 2017.

    Comments: CVPR 2018