Showing 1–34 of 34 results for author: Heilbron, F C

  1. arXiv:2506.10182  [pdf, ps, other]

    cs.CV

    Improving Personalized Search with Regularized Low-Rank Parameter Updates

    Authors: Fiona Ryan, Josef Sivic, Fabian Caba Heilbron, Judy Hoffman, James M. Rehg, Bryan Russell

    Abstract: Personalized vision-language retrieval seeks to recognize new concepts (e.g. "my dog Fido") from only a few examples. This task is challenging because it requires not only learning a new concept from a few images, but also integrating the personal and general knowledge together to recognize the concept in different contexts. In this paper, we show how to effectively adapt the internal representati…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: CVPR 2025 Highlight. Code: http://github.com/adobe-research/polar-vl

  2. arXiv:2411.12293  [pdf, other]

    cs.CV cs.HC cs.MM

    Generative Timelines for Instructed Visual Assembly

    Authors: Alejandro Pardo, Jui-Hsien Wang, Bernard Ghanem, Josef Sivic, Bryan Russell, Fabian Caba Heilbron

    Abstract: The objective of this work is to manipulate visual timelines (e.g. a video) through natural language instructions, making complex timeline editing tasks accessible to non-expert or potentially even disabled users. We call this task Instructed visual assembly. This task is challenging as it requires (i) identifying relevant visual content in the input timeline as well as retrieving relevant visual…

    Submitted 19 November, 2024; originally announced November 2024.

  3. arXiv:2409.01445  [pdf, other]

    cs.CV cs.IR cs.LG

    Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets

    Authors: Ishan Rajendrakumar Dave, Fabian Caba Heilbron, Mubarak Shah, Simon Jenni

    Abstract: Temporal video alignment aims to synchronize the key events like object interactions or action phase transitions in two videos. Such methods could benefit various video editing, processing, and understanding tasks. However, existing approaches operate under the restrictive assumption that a suitable video pair for alignment is given, significantly limiting their broader applicability. To address t…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: ECCV 2024 Oral

  4. arXiv:2405.03190  [pdf, other]

    cs.CV

    Adapting Dual-encoder Vision-language Models for Paraphrased Retrieval

    Authors: Jiacheng Cheng, Hijung Valentina Shin, Nuno Vasconcelos, Bryan Russell, Fabian Caba Heilbron

    Abstract: In recent years, dual-encoder vision-language models (e.g., CLIP) have achieved remarkable text-to-image retrieval performance. However, we discover that these models usually return very different retrievals for a pair of paraphrased queries. Such behavior might render the retrieval system less predictable and lead to user frustration. In this work, we consider the task of paraphrased te…

    Submitted 6 May, 2024; originally announced May 2024.

  5. arXiv:2404.03913  [pdf, other]

    cs.CV cs.AI cs.LG

    Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models

    Authors: Gihyun Kwon, Simon Jenni, Dingzeyu Li, Joon-Young Lee, Jong Chul Ye, Fabian Caba Heilbron

    Abstract: While there has been significant progress in customizing text-to-image generation models, generating images that combine multiple personalized concepts remains challenging. In this work, we introduce Concept Weaver, a method for composing customized text-to-image diffusion models at inference time. Specifically, the method breaks the process into two steps: creating a template image aligned with t…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  6. arXiv:2404.03477  [pdf, other]

    cs.CV

    Towards Automated Movie Trailer Generation

    Authors: Dawit Mureja Argaw, Mattia Soldan, Alejandro Pardo, Chen Zhao, Fabian Caba Heilbron, Joon Son Chung, Bernard Ghanem

    Abstract: Movie trailers are an essential tool for promoting films and attracting audiences. However, the process of creating trailers can be time-consuming and expensive. To streamline this process, we propose an automatic trailer generation framework that generates plausible trailers from a full movie by automating shot selection and composition. Our approach draws inspiration from machine translation tec…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024

  7. arXiv:2404.03398  [pdf, other]

    cs.CV

    Scaling Up Video Summarization Pretraining with Large Language Models

    Authors: Dawit Mureja Argaw, Seunghyun Yoon, Fabian Caba Heilbron, Hanieh Deilamsalehy, Trung Bui, Zhaowen Wang, Franck Dernoncourt, Joon Son Chung

    Abstract: Long-form video content constitutes a significant portion of internet traffic, making automated video summarization an essential research problem. However, existing video summarization datasets are notably limited in their size, constraining the effectiveness of state-of-the-art methods for generalization. Our work aims to overcome this limitation by capitalizing on the abundance of long-form vide…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024

  8. arXiv:2308.09775  [pdf, other]

    cs.CV

    Long-range Multimodal Pretraining for Movie Understanding

    Authors: Dawit Mureja Argaw, Joon-Young Lee, Markus Woodson, In So Kweon, Fabian Caba Heilbron

    Abstract: Learning computer vision models from (and for) movies has a long-standing history. While great progress has been attained, there is still a need for a pretrained multimodal model that can perform well in the ever-growing set of movie understanding tasks the community has been establishing. In this work, we introduce Long-range Multimodal Pretraining, a strategy, and a model that leverages movie da…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV 2023

  9. arXiv:2306.10169  [pdf, other]

    cs.CV cs.CL cs.LG

    Meta-Personalizing Vision-Language Models to Find Named Instances in Video

    Authors: Chun-Hsiao Yeh, Bryan Russell, Josef Sivic, Fabian Caba Heilbron, Simon Jenni

    Abstract: Large-scale vision-language models (VLM) have shown impressive results for language-guided search applications. While these models allow category-level queries, they currently struggle with personalized searches for moments in a video where a specific object instance such as "My dog Biscuit" appears. We present the following three contributions to address this problem. First, we describe a metho…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: Accepted to CVPR 2023. Project webpage: https://danielchyeh.github.io/metaper/

  10. arXiv:2302.13372  [pdf, other]

    cs.CV cs.AI cs.LG

    Localizing Moments in Long Video Via Multimodal Guidance

    Authors: Wayner Barrios, Mattia Soldan, Alberto Mario Ceballos-Arroyo, Fabian Caba Heilbron, Bernard Ghanem

    Abstract: The recent introduction of the large-scale, long-form MAD and Ego4D datasets has enabled researchers to investigate the performance of current state-of-the-art methods for video grounding in the long-form setup, with interesting findings: current grounding methods alone fail at tackling this challenging task and setup due to their inability to process long video sequences. In this paper, we propos…

    Submitted 15 October, 2023; v1 submitted 26 February, 2023; originally announced February 2023.

    Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) 2023

  11. arXiv:2212.04842  [pdf, other]

    cs.CV cs.AI

    PIVOT: Prompting for Video Continual Learning

    Authors: Andrés Villa, Juan León Alcázar, Motasem Alfarra, Kumail Alhamoud, Julio Hurtado, Fabian Caba Heilbron, Alvaro Soto, Bernard Ghanem

    Abstract: Modern machine learning pipelines are limited due to data availability, storage quotas, privacy regulations, and expensive annotation processes. These constraints make it difficult or impossible to train and update large-scale models on such dynamic annotated sets. Continual learning directly approaches this problem, with the ultimate goal of devising methods where a deep neural network effectivel…

    Submitted 4 April, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

    Comments: CVPR 2023

  12. arXiv:2211.12493  [pdf, other]

    cs.CV cs.HC cs.MM

    Videogenic: Identifying Highlight Moments in Videos with Professional Photographs as a Prior

    Authors: David Chuan-En Lin, Fabian Caba Heilbron, Joon-Young Lee, Oliver Wang, Nikolas Martelaro

    Abstract: This paper investigates the challenge of extracting highlight moments from videos. To perform this task, we need to understand what constitutes a highlight for arbitrary video domains while at the same time being able to scale across different domains. Our key insight is that photographs taken by photographers tend to capture the most remarkable or photogenic moments of an activity. Drawing on thi…

    Submitted 25 June, 2024; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: https://humanvideointeraction.github.io/videogenic

  13. arXiv:2211.12492  [pdf, other]

    cs.HC cs.CV cs.MM

    VideoMap: Supporting Video Editing Exploration, Brainstorming, and Prototyping in the Latent Space

    Authors: David Chuan-En Lin, Fabian Caba Heilbron, Joon-Young Lee, Oliver Wang, Nikolas Martelaro

    Abstract: Video editing is a creative and complex endeavor and we believe that there is potential for reimagining a new video editing interface to better support the creative and exploratory nature of video editing. We take inspiration from latent space exploration tools that help users find patterns and connections within complex datasets. We present VideoMap, a proof-of-concept video editing interface tha…

    Submitted 25 June, 2024; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: https://humanvideointeraction.github.io/videomap

  14. arXiv:2207.09812  [pdf, other]

    cs.CV

    The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing

    Authors: Dawit Mureja Argaw, Fabian Caba Heilbron, Joon-Young Lee, Markus Woodson, In So Kweon

    Abstract: Machine learning is transforming the video editing industry. Recent advances in computer vision have leveled-up video editing tasks such as intelligent reframing, rotoscoping, color grading, or applying digital makeups. However, most of the solutions have focused on video manipulation and VFX. This work introduces the Anatomy of Video Editing, a dataset, and benchmark, to foster research in AI-ass…

    Submitted 21 July, 2022; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: Code is available at: https://github.com/dawitmureja/AVE.git

  15. arXiv:2205.05609  [pdf, other]

    cs.CV

    Video-ReTime: Learning Temporally Varying Speediness for Time Remapping

    Authors: Simon Jenni, Markus Woodson, Fabian Caba Heilbron

    Abstract: We propose a method for generating a temporally remapped video that matches the desired target duration while maximally preserving natural video dynamics. Our approach trains a neural network through self-supervision to recognize and accurately localize temporally varying changes in the video playback speed. To re-time videos, we 1. use the model to infer the slowness of individual video frames, a…

    Submitted 11 May, 2022; originally announced May 2022.

    Comments: Accepted at the AI for Content Creation (AICC) workshop at CVPR 2022

  16. arXiv:2203.13371  [pdf, other]

    cs.CV

    FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks

    Authors: Santiago Castro, Fabian Caba Heilbron

    Abstract: Large-scale pretrained image-text models have shown incredible zero-shot performance in a handful of tasks, including video ones such as action recognition and text-to-video retrieval. However, these models have not been adapted to video, mainly because they do not account for the time dimension but also because video frames are different from the typical images (e.g., containing motion blur, and…

    Submitted 5 October, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: Accepted at BMVC 2022. It includes the supplementary material. The margins and page size were modified to fit the arXiv ID stamp on the left side

  17. arXiv:2202.04947  [pdf, other]

    cs.CV cs.SD eess.AS

    OWL (Observe, Watch, Listen): Audiovisual Temporal Context for Localizing Actions in Egocentric Videos

    Authors: Merey Ramazanova, Victor Escorcia, Fabian Caba Heilbron, Chen Zhao, Bernard Ghanem

    Abstract: Egocentric videos capture sequences of human activities from a first-person perspective and can provide rich multimodal signals. However, most current localization methods use third-person videos and only incorporate visual information. In this work, we take a deep look into the effectiveness of audiovisual context in detecting actions in egocentric videos and introduce a simple-yet-effective appr…

    Submitted 26 October, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

  18. arXiv:2201.09381  [pdf, other]

    cs.CV

    vCLIMB: A Novel Video Class Incremental Learning Benchmark

    Authors: Andrés Villa, Kumail Alhamoud, Juan León Alcázar, Fabian Caba Heilbron, Victor Escorcia, Bernard Ghanem

    Abstract: Continual learning (CL) is under-explored in the video domain. The few existing works contain splits with imbalanced class distributions over the tasks, or study the problem in unsuitable datasets. We introduce vCLIMB, a novel video continual learning benchmark. vCLIMB is a standardized test-bed to analyze catastrophic forgetting of deep models in video continual learning. In contrast to previous…

    Submitted 6 April, 2022; v1 submitted 23 January, 2022; originally announced January 2022.

    Comments: An updated version of our CVPR 2022 paper (oral); v2 adds minor text changes. The code of our benchmark can be found at: https://vclimb.netlify.app/

  19. arXiv:2112.00431  [pdf, other]

    cs.CV cs.AI

    MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions

    Authors: Mattia Soldan, Alejandro Pardo, Juan León Alcázar, Fabian Caba Heilbron, Chen Zhao, Silvio Giancola, Bernard Ghanem

    Abstract: The recent and increasing interest in video-language research has driven the development of large-scale datasets that enable data-intensive machine learning techniques. In comparison, limited effort has been made at assessing the fitness of these datasets for the video-language grounding task. Recent works have begun to discover significant limitations in these datasets, suggesting that state-of-t…

    Submitted 28 March, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

    Comments: 12 Pages, 6 Figures, 7 Tables

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition CVPR 2022

  20. arXiv:2109.05569  [pdf, other]

    cs.CV

    MovieCuts: A New Dataset and Benchmark for Cut Type Recognition

    Authors: Alejandro Pardo, Fabian Caba Heilbron, Juan León Alcázar, Ali Thabet, Bernard Ghanem

    Abstract: Understanding movies and their structural patterns is a crucial task in decoding the craft of video editing. While previous works have developed tools for general analysis, such as detecting characters or recognizing cinematography properties at the shot level, less effort has been devoted to understanding the most basic video edit, the Cut. This paper introduces the Cut type recognition task, whi…

    Submitted 24 October, 2022; v1 submitted 12 September, 2021; originally announced September 2021.

    Comments: Paper's website: https://www.alejandropardo.net/publication/moviecuts/

    Journal ref: ECCV 2022

  21. arXiv:2108.04294  [pdf, other]

    cs.CV cs.MM

    Learning to Cut by Watching Movies

    Authors: Alejandro Pardo, Fabian Caba Heilbron, Juan León Alcázar, Ali Thabet, Bernard Ghanem

    Abstract: Video content creation keeps growing at an incredible pace; yet, creating engaging stories remains challenging and requires non-trivial video editing expertise. Many video editing components are astonishingly hard to automate primarily due to the lack of raw video materials. This paper focuses on a new task for computational video editing, namely the task of ranking cut plausibility. Our key idea i…

    Submitted 29 September, 2021; v1 submitted 9 August, 2021; originally announced August 2021.

    Comments: Accepted at ICCV2021. Paper website: https://alejandropardo.net/publication/learning-to-cut/

  22. arXiv:2107.11851  [pdf, other]

    cs.CV

    Transcript to Video: Efficient Clip Sequencing from Texts

    Authors: Yu Xiong, Fabian Caba Heilbron, Dahua Lin

    Abstract: Among numerous videos shared on the web, well-edited ones always attract more attention. However, it is difficult for inexperienced users to make well-edited videos because it requires professional expertise and immense manual labor. To meet the demands for non-experts, we present Transcript-to-Video -- a weakly-supervised framework that uses texts as input to automatically create video sequences…

    Submitted 19 November, 2023; v1 submitted 25 July, 2021; originally announced July 2021.

    Comments: Tech Report; Demo and project page at http://www.xiongyu.me/projects/transcript2video/

  23. arXiv:2106.01667  [pdf, other]

    cs.CV

    APES: Audiovisual Person Search in Untrimmed Video

    Authors: Juan Leon Alcazar, Long Mai, Federico Perazzi, Joon-Young Lee, Pablo Arbelaez, Bernard Ghanem, Fabian Caba Heilbron

    Abstract: Humans are arguably one of the most important subjects in video streams; many real-world applications such as video summarization or video editing workflows often require the automatic search and retrieval of a person of interest. Despite tremendous efforts in the person reidentification and retrieval domains, few works have developed audiovisual search strategies. In this paper, we present the Au…

    Submitted 3 June, 2021; originally announced June 2021.

  24. arXiv:2101.03682  [pdf, other]

    cs.CV

    MAAS: Multi-modal Assignation for Active Speaker Detection

    Authors: Juan León-Alcázar, Fabian Caba Heilbron, Ali Thabet, Bernard Ghanem

    Abstract: Active speaker detection requires a solid integration of multi-modal cues. While individual modalities can approximate a solution, accurate predictions can only be achieved by explicitly fusing the audio and visual features and modeling their temporal progression. Despite its inherent multi-modal nature, current methods still focus on modeling and fusing short-term audiovisual features for individu…

    Submitted 5 October, 2021; v1 submitted 10 January, 2021; originally announced January 2021.

  25. arXiv:2007.03815  [pdf, other]

    cs.CV cs.MM cs.RO

    Real-time Semantic Segmentation with Fast Attention

    Authors: Ping Hu, Federico Perazzi, Fabian Caba Heilbron, Oliver Wang, Zhe Lin, Kate Saenko, Stan Sclaroff

    Abstract: In deep CNN based models for semantic segmentation, high accuracy relies on rich spatial context (large receptive fields) and fine spatial details (high resolution), both of which incur high computational costs. In this paper, we propose a novel architecture that addresses both challenges and achieves state-of-the-art performance for semantic segmentation of high-resolution images and videos in re…

    Submitted 9 July, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

    Comments: project page: https://cs-people.bu.edu/pinghu/FANet.html

  26. arXiv:2005.09812  [pdf, other]

    cs.CV cs.SD eess.AS

    Active Speakers in Context

    Authors: Juan Leon Alcazar, Fabian Caba Heilbron, Long Mai, Federico Perazzi, Joon-Young Lee, Pablo Arbelaez, Bernard Ghanem

    Abstract: Current methods for active speaker detection focus on modeling short-term audiovisual information from a single speaker. Although this strategy can be enough for addressing single-speaker scenarios, it prevents accurate detection when the task is to identify which of many candidate speakers is talking. This paper introduces the Active Speaker Context, a novel representation that models relationshi…

    Submitted 19 May, 2020; originally announced May 2020.

  27. arXiv:2004.01800  [pdf, other]

    cs.CV cs.LG cs.MM eess.IV

    Temporally Distributed Networks for Fast Video Semantic Segmentation

    Authors: Ping Hu, Fabian Caba Heilbron, Oliver Wang, Zhe Lin, Stan Sclaroff, Federico Perazzi

    Abstract: We present TDNet, a temporally distributed network designed for fast and accurate video semantic segmentation. We observe that features extracted from a certain high-level layer of a deep CNN can be approximated by composing features extracted from several shallower sub-networks. Leveraging the inherent temporal continuity in videos, we distribute these sub-networks over sequential frames. Therefo…

    Submitted 6 April, 2020; v1 submitted 3 April, 2020; originally announced April 2020.

    Comments: [CVPR2020] Project: https://github.com/feinanshan/TDNet

  28. arXiv:2003.12041  [pdf, other]

    cs.CV

    Rethinking Online Action Detection in Untrimmed Videos: A Novel Online Evaluation Protocol

    Authors: Marcos Baptista Rios, Roberto J. López-Sastre, Fabian Caba Heilbron, Jan van Gemert, F. Javier Acevedo-Rodríguez, S. Maldonado-Bascón

    Abstract: The Online Action Detection (OAD) problem needs to be revisited. Unlike traditional offline action detection approaches, where the evaluation metrics are clear and well established, in the OAD setting we find very few works and no consensus on the evaluation protocols to be used. In this work we propose to rethink the OAD scenario, clearly defining the problem itself and the main characteristics t…

    Submitted 26 March, 2020; originally announced March 2020.

    Comments: Published at IEEE Access journal

  29. arXiv:2003.09970  [pdf, other]

    cs.CV

    The Instantaneous Accuracy: a Novel Metric for the Problem of Online Human Behaviour Recognition in Untrimmed Videos

    Authors: Marcos Baptista Rios, Roberto J. López-Sastre, Fabian Caba Heilbron, Jan van Gemert, Francisco Javier Acevedo-Rodríguez, Saturnino Maldonado-Bascón

    Abstract: The problem of Online Human Behaviour Recognition in untrimmed videos, aka Online Action Detection (OAD), needs to be revisited. Unlike traditional offline action detection approaches, where the evaluation metrics are clear and well established, in the OAD setting we find few works and no consensus on the evaluation protocols to be used. In this paper we introduce a novel online metric, the Instan…

    Submitted 25 March, 2020; v1 submitted 22 March, 2020; originally announced March 2020.

    Comments: Published at ICCV 2019 workshop: Human Behaviour Understanding

  30. arXiv:1904.00227  [pdf, other]

    cs.CV

    RefineLoc: Iterative Refinement for Weakly-Supervised Action Localization

    Authors: Alejandro Pardo, Humam Alwassel, Fabian Caba Heilbron, Ali Thabet, Bernard Ghanem

    Abstract: Video action detectors are usually trained using datasets with fully-supervised temporal annotations. Building such datasets is an expensive task. To alleviate this problem, recent methods have tried to leverage weak labeling, where videos are untrimmed and only a video-level label is available. In this paper, we propose RefineLoc, a novel weakly-supervised temporal action localization method. Ref…

    Submitted 8 November, 2020; v1 submitted 30 March, 2019; originally announced April 2019.

    Comments: Accepted to WACV 2021. Project website: http://humamalwassel.com/publication/refineloc

  31. arXiv:1808.03766  [pdf, ps, other]

    cs.CV

    The ActivityNet Large-Scale Activity Recognition Challenge 2018 Summary

    Authors: Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Victor Escorcia, Ranjay Krishna, Shyamal Buch, Cuong Duc Dao

    Abstract: The 3rd annual installment of the ActivityNet Large-Scale Activity Recognition Challenge, held as a full-day workshop in CVPR 2018, focused on the recognition of daily life, high-level, goal-oriented activities from user-generated videos such as those found in internet video portals. The 2018 challenge hosted six diverse tasks which aimed to push the limits of semantic visual understanding of videos a…

    Submitted 23 August, 2018; v1 submitted 11 August, 2018; originally announced August 2018.

    Comments: CVPR Workshop 2018 challenge summary

  32. arXiv:1807.10706  [pdf, other]

    cs.CV

    Diagnosing Error in Temporal Action Detectors

    Authors: Humam Alwassel, Fabian Caba Heilbron, Victor Escorcia, Bernard Ghanem

    Abstract: Despite the recent progress in video understanding and the continuous rate of improvement in temporal action localization throughout the years, it is still unclear how far (or close?) we are to solving the problem. To this end, we introduce a new diagnostic tool to analyze the performance of temporal action detectors in videos and compare different methods beyond a single scalar metric. We exempli…

    Submitted 27 July, 2018; originally announced July 2018.

    Comments: Accepted to ECCV 2018

  33. arXiv:1710.08011  [pdf, other]

    cs.CV

    ActivityNet Challenge 2017 Summary

    Authors: Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Ranjay Krishna, Victor Escorcia, Kenji Hata, Shyamal Buch

    Abstract: The ActivityNet Large Scale Activity Recognition Challenge 2017 Summary: results and challenge participants papers.

    Submitted 22 October, 2017; originally announced October 2017.

    Comments: 76 pages

  34. arXiv:1706.04269  [pdf, other]

    cs.CV

    Action Search: Spotting Actions in Videos and Its Application to Temporal Action Localization

    Authors: Humam Alwassel, Fabian Caba Heilbron, Bernard Ghanem

    Abstract: State-of-the-art temporal action detectors inefficiently search the entire video for specific actions. Despite the encouraging progress these methods achieve, it is crucial to design automated approaches that only explore parts of the video which are the most relevant to the actions being searched for. To address this need, we propose the new problem of action spotting in video, which we define as…

    Submitted 27 July, 2018; v1 submitted 13 June, 2017; originally announced June 2017.

    Comments: Accepted to ECCV 2018