Showing 1–38 of 38 results for author: Thabet, A

  1. arXiv:2504.11295  [pdf, other]

    cs.CV

    Autoregressive Distillation of Diffusion Transformers

    Authors: Yeongmin Kim, Sotiris Anagnostidis, Yuming Du, Edgar Schönfeld, Jonas Kohler, Markos Georgopoulos, Albert Pumarola, Ali Thabet, Artsiom Sanakoyeu

    Abstract: Diffusion models with transformer architectures have demonstrated promising capabilities in generating high-fidelity images and scalability for high resolution. However, the iterative sampling process required for synthesis is very resource-intensive. A line of work has focused on distilling solutions to probability flow ODEs into few-step student models. Nevertheless, existing methods have been limit… ▽ More

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: CVPR 2025 Oral

  2. arXiv:2502.20126  [pdf, other]

    cs.LG cs.CV

    FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute

    Authors: Sotiris Anagnostidis, Gregor Bachmann, Yeongmin Kim, Jonas Kohler, Markos Georgopoulos, Artsiom Sanakoyeu, Yuming Du, Albert Pumarola, Ali Thabet, Edgar Schönfeld

    Abstract: Despite their remarkable performance, modern Diffusion Transformers are hindered by substantial resource requirements during inference, stemming from the fixed and large amount of compute needed for each denoising step. In this work, we revisit the conventional static paradigm that allocates a fixed compute budget per denoising iteration and propose a dynamic strategy instead. Our simple and sampl… ▽ More

    Submitted 27 February, 2025; originally announced February 2025.

  3. arXiv:2501.19309  [pdf, other]

    cs.LG cs.CL

    Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment

    Authors: Gregor Bachmann, Sotiris Anagnostidis, Albert Pumarola, Markos Georgopoulos, Artsiom Sanakoyeu, Yuming Du, Edgar Schönfeld, Ali Thabet, Jonas Kohler

    Abstract: The performance of large language models (LLMs) is closely linked to their underlying size, leading to ever-growing networks and hence slower inference. Speculative decoding has been proposed as a technique to accelerate autoregressive generation, leveraging a fast draft model to propose candidate tokens, which are then verified in parallel based on their likelihood under the target model. While t… ▽ More

    Submitted 31 January, 2025; originally announced January 2025.

  4. arXiv:2410.13720  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Movie Gen: A Cast of Media Foundation Models

    Authors: Adam Polyak, Amit Zohar, Andrew Brown, Andros Tjandra, Animesh Sinha, Ann Lee, Apoorv Vyas, Bowen Shi, Chih-Yao Ma, Ching-Yao Chuang, David Yan, Dhruv Choudhary, Dingkang Wang, Geet Sethi, Guan Pang, Haoyu Ma, Ishan Misra, Ji Hou, Jialiang Wang, Kiran Jagadeesh, Kunpeng Li, Luxin Zhang, Mannat Singh, Mary Williamson, Matt Le, et al. (63 additional authors not shown)

    Abstract: We present Movie Gen, a cast of foundation models that generates high-quality, 1080p HD videos with different aspect ratios and synchronized audio. We also show additional capabilities such as precise instruction-based video editing and generation of personalized videos based on a user's image. Our models set a new state-of-the-art on multiple tasks: text-to-video synthesis, video personalization,… ▽ More

    Submitted 26 February, 2025; v1 submitted 17 October, 2024; originally announced October 2024.

  5. arXiv:2405.05224  [pdf, other]

    cs.CV

    Imagine Flash: Accelerating Emu Diffusion Models with Backward Distillation

    Authors: Jonas Kohler, Albert Pumarola, Edgar Schönfeld, Artsiom Sanakoyeu, Roshan Sumbaly, Peter Vajda, Ali Thabet

    Abstract: Diffusion models are a powerful generative framework, but come with expensive inference. Existing acceleration methods often compromise image quality or fail under complex conditioning when operating in an extremely low-step regime. In this work, we propose a novel distillation framework tailored to enable high-fidelity, diverse sample generation using just one to three steps. Our approach compris… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

  6. arXiv:2403.01329  [pdf, other]

    cs.LG cs.AI cs.CV

    Bespoke Non-Stationary Solvers for Fast Sampling of Diffusion and Flow Models

    Authors: Neta Shaul, Uriel Singer, Ricky T. Q. Chen, Matthew Le, Ali Thabet, Albert Pumarola, Yaron Lipman

    Abstract: This paper introduces Bespoke Non-Stationary (BNS) Solvers, a solver distillation approach to improve sample efficiency of Diffusion and Flow models. BNS solvers are based on a family of non-stationary solvers that provably subsumes existing numerical ODE solvers and consequently demonstrate considerable improvement in sample approximation (PSNR) over these baselines. Compared to model distillatio… ▽ More

    Submitted 2 March, 2024; originally announced March 2024.

  7. arXiv:2402.06088  [pdf, other]

    cs.CV

    Animated Stickers: Bringing Stickers to Life with Video Diffusion

    Authors: David Yan, Winnie Zhang, Luxin Zhang, Anmol Kalia, Dingkang Wang, Ankit Ramchandani, Miao Liu, Albert Pumarola, Edgar Schoenfeld, Elliot Blanchard, Krishna Narni, Yaqiao Luo, Lawrence Chen, Guan Pang, Ali Thabet, Peter Vajda, Amy Bearman, Licheng Yu

    Abstract: We introduce animated stickers, a video diffusion model which generates an animation conditioned on a text prompt and static sticker image. Our model is built on top of the state-of-the-art Emu text-to-image model, with the addition of temporal layers to model motion. Due to the domain gap, i.e. differences in visual and motion style, a model which performed well on generating natural videos can n… ▽ More

    Submitted 8 February, 2024; originally announced February 2024.

  8. arXiv:2312.16109  [pdf, other]

    cs.CV cs.LG

    fMPI: Fast Novel View Synthesis in the Wild with Layered Scene Representations

    Authors: Jonas Kohler, Nicolas Griffiths Sanchez, Luca Cavalli, Catherine Herold, Albert Pumarola, Alberto Garcia Garcia, Ali Thabet

    Abstract: In this study, we propose two novel input processing paradigms for novel view synthesis (NVS) methods based on layered scene representations that significantly improve their runtime without compromising quality. Our approach identifies and mitigates the two most time-consuming aspects of traditional pipelines: building and processing the so-called plane sweep volume (PSV), which is a high-dimensio… ▽ More

    Submitted 26 December, 2023; originally announced December 2023.

  9. arXiv:2312.12487  [pdf, other]

    cs.LG cs.AI

    Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models

    Authors: Angela Castillo, Jonas Kohler, Juan C. Pérez, Juan Pablo Pérez, Albert Pumarola, Bernard Ghanem, Pablo Arbeláez, Ali Thabet

    Abstract: This paper presents a comprehensive study on the role of Classifier-Free Guidance (CFG) in text-conditioned diffusion models from the perspective of inference efficiency. In particular, we relax the default choice of applying CFG in all diffusion steps and instead search for efficient guidance policies. We formulate the discovery of such policies in the differentiable Neural Architecture Search fr… ▽ More

    Submitted 19 December, 2023; originally announced December 2023.

  10. arXiv:2310.19075  [pdf, other]

    cs.LG cs.AI cs.CV

    Bespoke Solvers for Generative Flow Models

    Authors: Neta Shaul, Juan Perez, Ricky T. Q. Chen, Ali Thabet, Albert Pumarola, Yaron Lipman

    Abstract: Diffusion or flow-based models are powerful generative paradigms that are notoriously hard to sample as samples are defined as solutions to high-dimensional Ordinary or Stochastic Differential Equations (ODEs/SDEs) which require a large Number of Function Evaluations (NFE) to approximate well. Existing methods to alleviate the costly sampling process include model distillation and designing dedica… ▽ More

    Submitted 29 October, 2023; originally announced October 2023.

  11. arXiv:2304.11118  [pdf, other]

    cs.CV cs.AI

    BoDiffusion: Diffusing Sparse Observations for Full-Body Human Motion Synthesis

    Authors: Angela Castillo, Maria Escobar, Guillaume Jeanneret, Albert Pumarola, Pablo Arbeláez, Ali Thabet, Artsiom Sanakoyeu

    Abstract: Mixed reality applications require tracking the user's full-body motion to enable an immersive experience. However, typical head-mounted devices can only track head and hand movements, leading to a limited reconstruction of full-body motion due to variability in lower body configurations. We propose BoDiffusion -- a generative diffusion model for motion synthesis to tackle this under-constrained r… ▽ More

    Submitted 21 April, 2023; originally announced April 2023.

  12. arXiv:2304.08577  [pdf, other]

    cs.CV

    Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model

    Authors: Yuming Du, Robin Kips, Albert Pumarola, Sebastian Starke, Ali Thabet, Artsiom Sanakoyeu

    Abstract: With the recent surge in popularity of AR/VR applications, realistic and accurate control of 3D full-body avatars has become a highly demanded feature. A particular challenge is that only a sparse tracking signal is available from standalone HMDs (Head Mounted Devices), often limited to tracking the user's head and wrists. While this signal is resourceful for reconstructing the upper body motion,… ▽ More

    Submitted 17 April, 2023; originally announced April 2023.

    Comments: CVPR 2023, project page: https://dulucas.github.io/agrol/

  13. arXiv:2303.14569  [pdf, other]

    cs.CV

    VisCo Grids: Surface Reconstruction with Viscosity and Coarea Grids

    Authors: Albert Pumarola, Artsiom Sanakoyeu, Lior Yariv, Ali Thabet, Yaron Lipman

    Abstract: Surface reconstruction has been seeing a lot of progress lately by utilizing Implicit Neural Representations (INRs). Despite their success, INRs often introduce hard to control inductive bias (i.e., the solution surface can exhibit unexplainable behaviours), have costly inference, and are slow to train. The goal of this work is to show that replacing neural networks with simple grid functions, alo… ▽ More

    Submitted 25 March, 2023; originally announced March 2023.

    Comments: Published in NeurIPS 2022

  14. arXiv:2303.08717  [pdf, other]

    cs.CV cs.GR

    Re-ReND: Real-time Rendering of NeRFs across Devices

    Authors: Sara Rojas, Jesus Zarzar, Juan Camilo Perez, Artsiom Sanakoyeu, Ali Thabet, Albert Pumarola, Bernard Ghanem

    Abstract: This paper proposes a novel approach for rendering a pre-trained Neural Radiance Field (NeRF) in real-time on resource-constrained devices. We introduce Re-ReND, a method enabling Real-time Rendering of NeRFs across Devices. Re-ReND is designed to achieve real-time performance by converting the NeRF into a representation that can be efficiently processed by standard graphics pipelines. The propose… ▽ More

    Submitted 15 March, 2023; originally announced March 2023.

  15. arXiv:2202.04978  [pdf, other]

    cs.CV

    Towards Assessing and Characterizing the Semantic Robustness of Face Recognition

    Authors: Juan C. Pérez, Motasem Alfarra, Ali Thabet, Pablo Arbeláez, Bernard Ghanem

    Abstract: Deep Neural Networks (DNNs) lack robustness against imperceptible perturbations to their input. Face Recognition Models (FRMs) based on DNNs inherit this vulnerability. We propose a methodology for assessing and characterizing the robustness of FRMs against semantic perturbations to their input. Our methodology causes FRMs to malfunction by designing adversarial attacks that search for identity-pr… ▽ More

    Submitted 10 February, 2022; originally announced February 2022.

    Comments: 26 pages, 18 figures

  16. arXiv:2112.02522  [pdf, other]

    eess.IV cs.CV

    Snapshot HDR Video Construction Using Coded Mask

    Authors: Masheal Alghamdi, Qiang Fu, Ali Thabet, Wolfgang Heidrich

    Abstract: This paper studies the reconstruction of High Dynamic Range (HDR) video from snapshot-coded LDR video. Constructing an HDR video requires restoring the HDR values for each frame and maintaining the consistency between successive frames. HDR image acquisition from single image capture, also known as snapshot HDR imaging, can be achieved in several ways. For example, the reconfigurable snapshot HDR ca… ▽ More

    Submitted 5 December, 2021; originally announced December 2021.

    Comments: 13 pages, 7 figures

  17. arXiv:2110.10538  [pdf, other]

    cs.CV cs.LG

    ASSANet: An Anisotropic Separable Set Abstraction for Efficient Point Cloud Representation Learning

    Authors: Guocheng Qian, Hasan Abed Al Kader Hammoud, Guohao Li, Ali Thabet, Bernard Ghanem

    Abstract: Access to 3D point cloud representations has been widely facilitated by LiDAR sensors embedded in various mobile devices. This has led to an emerging need for fast and accurate point cloud processing techniques. In this paper, we revisit and dive deeper into PointNet++, one of the most influential yet under-explored networks, and develop faster and more accurate variants of the model. We first pre… ▽ More

    Submitted 24 October, 2021; v1 submitted 20 October, 2021; originally announced October 2021.

    Comments: ASSANet was accepted to NeurIPS'21 as a Spotlight paper. Code available at https://github.com/guochengqian/ASSANet

  18. arXiv:2109.05569  [pdf, other]

    cs.CV

    MovieCuts: A New Dataset and Benchmark for Cut Type Recognition

    Authors: Alejandro Pardo, Fabian Caba Heilbron, Juan León Alcázar, Ali Thabet, Bernard Ghanem

    Abstract: Understanding movies and their structural patterns is a crucial task in decoding the craft of video editing. While previous works have developed tools for general analysis, such as detecting characters or recognizing cinematography properties at the shot level, less effort has been devoted to understanding the most basic video edit, the Cut. This paper introduces the Cut type recognition task, whi… ▽ More

    Submitted 24 October, 2022; v1 submitted 12 September, 2021; originally announced September 2021.

    Comments: Paper's website: https://www.alejandropardo.net/publication/moviecuts/

    Journal ref: ECCV 2022

  19. arXiv:2108.04294  [pdf, other]

    cs.CV cs.MM

    Learning to Cut by Watching Movies

    Authors: Alejandro Pardo, Fabian Caba Heilbron, Juan León Alcázar, Ali Thabet, Bernard Ghanem

    Abstract: Video content creation keeps growing at an incredible pace; yet, creating engaging stories remains challenging and requires non-trivial video editing expertise. Many video editing components are astonishingly hard to automate primarily due to the lack of raw video materials. This paper focuses on a new task for computational video editing, namely the task of ranking cut plausibility. Our key idea i… ▽ More

    Submitted 29 September, 2021; v1 submitted 9 August, 2021; originally announced August 2021.

    Comments: Accepted at ICCV2021. Paper website: https://alejandropardo.net/publication/learning-to-cut/

  20. arXiv:2107.14110  [pdf, other]

    cs.LG cs.CR cs.CV

    Enhancing Adversarial Robustness via Test-time Transformation Ensembling

    Authors: Juan C. Pérez, Motasem Alfarra, Guillaume Jeanneret, Laura Rueda, Ali Thabet, Bernard Ghanem, Pablo Arbeláez

    Abstract: Deep learning models are prone to being fooled by imperceptible perturbations known as adversarial attacks. In this work, we study how equipping models with Test-time Transformation Ensembling (TTE) can work as a reliable defense against such attacks. While transforming the input data, both at train and test times, is known to enhance model performance, its effects on adversarial robustness have n… ▽ More

    Submitted 29 July, 2021; originally announced July 2021.

  21. arXiv:2103.14347  [pdf, other]

    cs.LG cs.CV

    Combating Adversaries with Anti-Adversaries

    Authors: Motasem Alfarra, Juan C. Pérez, Ali Thabet, Adel Bibi, Philip H. S. Torr, Bernard Ghanem

    Abstract: Deep neural networks are vulnerable to small input perturbations known as adversarial attacks. Inspired by the fact that these adversaries are constructed by iteratively minimizing the confidence of a network for the true class label, we propose the anti-adversary layer, aimed at countering this effect. In particular, our layer generates an input perturbation in the opposite direction of the adver… ▽ More

    Submitted 16 December, 2021; v1 submitted 26 March, 2021; originally announced March 2021.

    Comments: Accepted to AAAI Conference on Artificial Intelligence (AAAI'22)

  22. arXiv:2101.03682  [pdf, other]

    cs.CV

    MAAS: Multi-modal Assignation for Active Speaker Detection

    Authors: Juan León-Alcázar, Fabian Caba Heilbron, Ali Thabet, Bernard Ghanem

    Abstract: Active speaker detection requires a solid integration of multi-modal cues. While individual modalities can approximate a solution, accurate predictions can only be achieved by explicitly fusing the audio and visual features and modeling their temporal progression. Despite its inherent multi-modal nature, current methods still focus on modeling and fusing short-term audiovisual features for individu… ▽ More

    Submitted 5 October, 2021; v1 submitted 10 January, 2021; originally announced January 2021.

  23. arXiv:2012.14929  [pdf, other]

    cs.CV

    SALA: Soft Assignment Local Aggregation for Parameter Efficient 3D Semantic Segmentation

    Authors: Hani Itani, Silvio Giancola, Ali Thabet, Bernard Ghanem

    Abstract: In this work, we focus on designing a point local aggregation function that yields parameter efficient networks for 3D point cloud semantic segmentation. We explore the idea of using learnable neighbor-to-grid soft assignment in grid-based aggregation functions. Previous methods in literature operate on a predefined geometric grid such as local volume partitions or irregular kernel points. A more… ▽ More

    Submitted 5 April, 2021; v1 submitted 29 December, 2020; originally announced December 2020.

  24. arXiv:2011.14598  [pdf, other]

    cs.CV

    Video Self-Stitching Graph Network for Temporal Action Localization

    Authors: Chen Zhao, Ali Thabet, Bernard Ghanem

    Abstract: Temporal action localization (TAL) in videos is a challenging task, especially due to the large variation in action temporal scales. Short actions usually occupy a major proportion in the datasets, but tend to have the lowest performance. In this paper, we confront the challenge of short actions and propose a multi-level cross-scale solution dubbed as video self-stitching graph network (VSGN). We… ▽ More

    Submitted 30 March, 2024; v1 submitted 30 November, 2020; originally announced November 2020.

    Journal ref: Proceedings of the IEEE International Conference on Computer Vision (ICCV) 2021

  25. arXiv:2008.10309  [pdf, other]

    cs.CV cs.AI

    LC-NAS: Latency Constrained Neural Architecture Search for Point Cloud Networks

    Authors: Guohao Li, Mengmeng Xu, Silvio Giancola, Ali Thabet, Bernard Ghanem

    Abstract: Point cloud architecture design has become a crucial problem for 3D deep learning. Several efforts exist to manually design architectures with high accuracy in point cloud tasks such as classification, segmentation, and detection. Recent progress in automatic Neural Architecture Search (NAS) minimizes the human effort in network design and optimizes high performing architectures. However, these ef… ▽ More

    Submitted 24 August, 2020; originally announced August 2020.

    Comments: Originally submitted to ECCV'2020 but rejected. This work was filed with the United States Patent and Trademark Office (USPTO) on May 19, 2020 and assigned Serial No. 63/027,241

  26. arXiv:2006.07739  [pdf, other]

    cs.LG cs.CV stat.ML

    DeeperGCN: All You Need to Train Deeper GCNs

    Authors: Guohao Li, Chenxin Xiong, Ali Thabet, Bernard Ghanem

    Abstract: Graph Convolutional Networks (GCNs) have been drawing significant attention with the power of representation learning on graphs. Unlike Convolutional Neural Networks (CNNs), which are able to take advantage of stacking very deep layers, GCNs suffer from vanishing gradient, over-smoothing and over-fitting issues when going deeper. These challenges limit the representation power of GCNs on large-sca… ▽ More

    Submitted 13 June, 2020; originally announced June 2020.

    Comments: This work is still in progress. More results will be added in a future version. Project website: https://www.deepgcns.org

  27. arXiv:2006.07682  [pdf, other]

    cs.LG stat.ML

    Rethinking Clustering for Robustness

    Authors: Motasem Alfarra, Juan C. Pérez, Adel Bibi, Ali Thabet, Pablo Arbeláez, Bernard Ghanem

    Abstract: This paper studies how encouraging semantically-aligned features during deep neural network training can increase network robustness. Recent works observed that Adversarial Training leads to robust models, whose learnt features appear to correlate with human perception. Inspired by this connection from robustness to semantics, we study the complementary connection: from semantics to robustness. To… ▽ More

    Submitted 19 November, 2021; v1 submitted 13 June, 2020; originally announced June 2020.

    Comments: Accepted to the 32nd British Machine Vision Conference (BMVC'21)

  28. arXiv:1912.05661  [pdf, other]

    cs.CV

    Gabor Layers Enhance Network Robustness

    Authors: Juan C. Pérez, Motasem Alfarra, Guillaume Jeanneret, Adel Bibi, Ali Thabet, Bernard Ghanem, Pablo Arbeláez

    Abstract: We revisit the benefits of merging classical vision concepts with deep learning models. In particular, we explore the effect on robustness against adversarial attacks of replacing the first layers of various deep architectures with Gabor layers, i.e. convolutional layers with filters that are based on learnable Gabor parameters. We observe that architectures enhanced with Gabor layers gain a consi… ▽ More

    Submitted 27 March, 2020; v1 submitted 11 December, 2019; originally announced December 2019.

    Comments: 32 pages, 23 figures, 14 tables

  29. arXiv:1912.03264  [pdf, other]

    cs.CV cs.CG cs.LG

    PU-GCN: Point Cloud Upsampling using Graph Convolutional Networks

    Authors: Guocheng Qian, Abdulellah Abualshour, Guohao Li, Ali Thabet, Bernard Ghanem

    Abstract: The effectiveness of learning-based point cloud upsampling pipelines heavily relies on the upsampling modules and feature extractors used therein. For the point upsampling module, we propose a novel model called NodeShuffle, which uses a Graph Convolutional Network (GCN) to better encode local point information from point neighborhoods. NodeShuffle is versatile and can be incorporated into any poi… ▽ More

    Submitted 29 March, 2021; v1 submitted 30 November, 2019; originally announced December 2019.

    Comments: Accepted to CVPR 2021. The source code of this work is available at https://github.com/guochengqian/PU-GCN

  30. AdvPC: Transferable Adversarial Perturbations on 3D Point Clouds

    Authors: Abdullah Hamdi, Sara Rojas, Ali Thabet, Bernard Ghanem

    Abstract: Deep neural networks are vulnerable to adversarial attacks, in which imperceptible perturbations to their input lead to erroneous network predictions. This phenomenon has been extensively studied in the image domain, and has only recently been extended to 3D point clouds. In this work, we present novel data-driven adversarial attacks against 3D point cloud networks. We aim to address the following… ▽ More

    Submitted 16 July, 2020; v1 submitted 1 December, 2019; originally announced December 2019.

    Comments: Presented at European conference on computer vision (ECCV), 2020. The code is available at https://github.com/ajhamdi/AdvPC

    MSC Class: 68T45

    Journal ref: ECCV 2020

  31. arXiv:1912.00195  [pdf, other]

    cs.LG cs.CV stat.ML

    SGAS: Sequential Greedy Architecture Search

    Authors: Guohao Li, Guocheng Qian, Itzel C. Delgadillo, Matthias Müller, Ali Thabet, Bernard Ghanem

    Abstract: Architecture design has become a crucial component of successful deep learning. Recent progress in automatic neural architecture search (NAS) shows a lot of promise. However, discovered architectures often fail to generalize in the final evaluation. Architectures with a higher validation accuracy during the search phase may perform worse in the evaluation. Aiming to alleviate this common issue, we… ▽ More

    Submitted 2 April, 2020; v1 submitted 30 November, 2019; originally announced December 2019.

    Comments: Accepted at CVPR'2020. Project website: https://www.deepgcns.org/auto/sgas

  32. arXiv:1911.11462  [pdf, other]

    cs.CV

    G-TAD: Sub-Graph Localization for Temporal Action Detection

    Authors: Mengmeng Xu, Chen Zhao, David S. Rojas, Ali Thabet, Bernard Ghanem

    Abstract: Temporal action detection is a fundamental yet challenging task in video understanding. Video context is a critical cue to effectively detect actions, but current works mainly focus on temporal context, while neglecting semantic context as well as other important context properties. In this work, we propose a graph convolutional network (GCN) model to adaptively incorporate multi-level semantic co… ▽ More

    Submitted 2 April, 2020; v1 submitted 26 November, 2019; originally announced November 2019.

    Comments: Accepted by CVPR2020. 8 pages, 9 figures, 2 pages appendix

  33. arXiv:1910.06849  [pdf, other]

    cs.CV cs.LG eess.IV

    DeepGCNs: Making GCNs Go as Deep as CNNs

    Authors: Guohao Li, Matthias Müller, Guocheng Qian, Itzel C. Delgadillo, Abdulellah Abualshour, Ali Thabet, Bernard Ghanem

    Abstract: Convolutional Neural Networks (CNNs) have been very successful at solving a variety of computer vision tasks such as object classification and detection, semantic segmentation, activity understanding, to name just a few. One key enabling factor for their great performance has been the ability to train very deep networks. Despite their huge success in many tasks, CNNs do not work well with non-Eucl… ▽ More

    Submitted 14 May, 2021; v1 submitted 15 October, 2019; originally announced October 2019.

    Comments: Accepted at TPAMI. This work is a journal extension of our ICCV'19 paper arXiv:1904.03751. The first three authors contributed equally

  34. arXiv:1904.05847  [pdf, other]

    cs.CV

    MAIN: Multi-Attention Instance Network for Video Segmentation

    Authors: Juan Leon Alcazar, Maria A. Bravo, Ali K. Thabet, Guillaume Jeanneret, Thomas Brox, Pablo Arbelaez, Bernard Ghanem

    Abstract: Instance-level video segmentation requires a solid integration of spatial and temporal information. However, current methods rely mostly on domain-specific information (online learning) to produce accurate instance-level segmentations. We propose a novel approach that relies exclusively on the integration of generic spatio-temporal attention cues. Our strategy, named Multi-Attention Instance Netwo… ▽ More

    Submitted 11 April, 2019; originally announced April 2019.

  35. arXiv:1904.05443  [pdf, other]

    cs.CV

    BAOD: Budget-Aware Object Detection

    Authors: Alejandro Pardo, Mengmeng Xu, Ali Thabet, Pablo Arbelaez, Bernard Ghanem

    Abstract: We study the problem of object detection from a novel perspective in which annotation budget constraints are taken into consideration, appropriately coined Budget Aware Object Detection (BAOD). When provided with a fixed budget, we propose a strategy for building a diverse and informative dataset that can be used to optimally train a robust detector. We investigate both optimization and learning-b… ▽ More

    Submitted 9 August, 2021; v1 submitted 10 April, 2019; originally announced April 2019.

  36. arXiv:1904.03751  [pdf, other]

    cs.CV cs.LG

    DeepGCNs: Can GCNs Go as Deep as CNNs?

    Authors: Guohao Li, Matthias Müller, Ali Thabet, Bernard Ghanem

    Abstract: Convolutional Neural Networks (CNNs) achieve impressive performance in a wide variety of fields. Their success benefited from a massive boost when very deep CNN models were able to be reliably trained. Despite their merits, CNNs fail to properly address problems with non-Euclidean data. To overcome this challenge, Graph Convolutional Networks (GCNs) build graphs to represent non-Euclidean data, bo… ▽ More

    Submitted 19 August, 2019; v1 submitted 7 April, 2019; originally announced April 2019.

    Comments: First two authors contributed equally. Accepted to ICCV'19 as oral presentation

  37. arXiv:1904.00230  [pdf, other]

    cs.CV

    MortonNet: Self-Supervised Learning of Local Features in 3D Point Clouds

    Authors: Ali Thabet, Humam Alwassel, Bernard Ghanem

    Abstract: We present a self-supervised task on point clouds, in order to learn meaningful point-wise features that encode local structure around each point. Our self-supervised network, named MortonNet, operates directly on unstructured/unordered point clouds. Using a multi-layer RNN, MortonNet predicts the next point in a point sequence created by a popular and fast Space Filling Curve, the Morton-order cu… ▽ More

    Submitted 30 March, 2019; originally announced April 2019.

  38. arXiv:1904.00227  [pdf, other]

    cs.CV

    RefineLoc: Iterative Refinement for Weakly-Supervised Action Localization

    Authors: Alejandro Pardo, Humam Alwassel, Fabian Caba Heilbron, Ali Thabet, Bernard Ghanem

    Abstract: Video action detectors are usually trained using datasets with fully-supervised temporal annotations. Building such datasets is an expensive task. To alleviate this problem, recent methods have tried to leverage weak labeling, where videos are untrimmed and only a video-level label is available. In this paper, we propose RefineLoc, a novel weakly-supervised temporal action localization method. Ref… ▽ More

    Submitted 8 November, 2020; v1 submitted 30 March, 2019; originally announced April 2019.

    Comments: Accepted to WACV 2021. Project website: http://humamalwassel.com/publication/refineloc