Showing 1–30 of 30 results for author: Boussaid, F

  1. arXiv:2505.22046   

    cs.CV

    LatentMove: Towards Complex Human Movement Video Generation

    Authors: Ashkan Taghipour, Morteza Ghahremani, Mohammed Bennamoun, Farid Boussaid, Aref Miri Rekavandi, Zinuo Li, Qiuhong Ke, Hamid Laga

    Abstract: Image-to-video (I2V) generation seeks to produce realistic motion sequences from a single reference image. Although recent methods exhibit strong temporal consistency, they often struggle when dealing with complex, non-repetitive human movements, leading to unnatural deformations. To tackle this issue, we present LatentMove, a DiT-based framework specifically tailored for highly dynamic human anim…

    Submitted 27 June, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

    Comments: The authors are withdrawing this paper due to major issues in the experiments and methodology. To prevent citation of this outdated and flawed version, we have decided to remove it while we work on a substantial revision. Thank you

  2. arXiv:2505.18110

    cs.CL

    Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM

    Authors: Zinuo Li, Xian Zhang, Yongxin Guo, Mohammed Bennamoun, Farid Boussaid, Girish Dwivedi, Luqi Gong, Qiuhong Ke

    Abstract: Humans naturally understand moments in a video by integrating visual and auditory cues. For example, localizing a scene in the video like "A scientist passionately speaks on wildlife conservation as dramatic orchestral music plays, with the audience nodding and applauding" requires simultaneous processing of visual, audio, and speech signals. However, existing models often struggle to effectively…

    Submitted 20 June, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  3. arXiv:2503.03132

    cs.CV

    Dynamic Neural Surfaces for Elastic 4D Shape Representation and Analysis

    Authors: Awais Nizamani, Hamid Laga, Guanjin Wang, Farid Boussaid, Mohammed Bennamoun, Anuj Srivastava

    Abstract: We propose a novel framework for the statistical analysis of genus-zero 4D surfaces, i.e., 3D surfaces that deform and evolve over time. This problem is particularly challenging due to the arbitrary parameterizations of these surfaces and their varying deformation speeds, necessitating effective spatiotemporal registration. Traditionally, 4D surfaces are discretized, in space and time, before comp…

    Submitted 4 March, 2025; originally announced March 2025.

    Comments: 22 pages, 23 figures, conference paper

    Journal ref: CVPR 2025

  4. arXiv:2411.08569

    cs.CV

    UIFormer: A Unified Transformer-based Framework for Incremental Few-Shot Object Detection and Instance Segmentation

    Authors: Chengyuan Zhang, Yilin Zhang, Lei Zhu, Deyin Liu, Lin Wu, Bo Li, Shichao Zhang, Mohammed Bennamoun, Farid Boussaid

    Abstract: This paper introduces a novel framework for unified incremental few-shot object detection (iFSOD) and instance segmentation (iFSIS) using the Transformer architecture. Our goal is to create an optimal solution for situations where only a few examples of novel object classes are available, with no access to training data for base or old classes, while maintaining high performance across both base a…

    Submitted 13 November, 2024; originally announced November 2024.

    Comments: 11 pages, 3 figures

  5. arXiv:2408.12443

    cs.CV cs.AI cs.GR

    A Riemannian Approach for Spatiotemporal Analysis and Generation of 4D Tree-shaped Structures

    Authors: Tahmina Khanam, Hamid Laga, Mohammed Bennamoun, Guanjin Wang, Ferdous Sohel, Farid Boussaid, Guan Wang, Anuj Srivastava

    Abstract: We propose the first comprehensive approach for modeling and analyzing the spatiotemporal shape variability in tree-like 4D objects, i.e., 3D objects whose shapes bend, stretch, and change in their branching structure over time as they deform, grow, and interact with their environment. Our key contribution is the representation of tree-like 3D shapes using Square Root Velocity Function Trees (SRVF…

    Submitted 22 August, 2024; originally announced August 2024.

  6. arXiv:2407.19205

    cs.CV cs.AI

    Faster Image2Video Generation: A Closer Look at CLIP Image Embedding's Impact on Spatio-Temporal Cross-Attentions

    Authors: Ashkan Taghipour, Morteza Ghahremani, Mohammed Bennamoun, Aref Miri Rekavandi, Zinuo Li, Hamid Laga, Farid Boussaid

    Abstract: This paper investigates the role of CLIP image embeddings within the Stable Video Diffusion (SVD) framework, focusing on their impact on video generation quality and computational efficiency. Our findings indicate that CLIP embeddings, while crucial for aesthetic quality, do not significantly contribute towards the subject and background consistency of video outputs. Moreover, the computationally…

    Submitted 27 July, 2024; originally announced July 2024.

  7. arXiv:2403.01156

    cs.CV

    Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation

    Authors: Lian Xu, Mohammed Bennamoun, Farid Boussaid, Wanli Ouyang, Ferdous Sohel, Dan Xu

    Abstract: Most existing weakly supervised semantic segmentation (WSSS) methods rely on Class Activation Mapping (CAM) to extract coarse class-specific localization maps using image-level labels. Prior works have commonly used an off-line heuristic thresholding process that combines the CAM maps with off-the-shelf saliency maps produced by a general pre-trained saliency model to produce more accurate pseudo-…

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: Accepted at IEEE Transactions on Neural Networks and Learning Systems. arXiv admin note: substantial text overlap with arXiv:2107.11787

  8. arXiv:2402.17910

    cs.CV

    Box It to Bind It: Unified Layout Control and Attribute Binding in T2I Diffusion Models

    Authors: Ashkan Taghipour, Morteza Ghahremani, Mohammed Bennamoun, Aref Miri Rekavandi, Hamid Laga, Farid Boussaid

    Abstract: While latent diffusion models (LDMs) excel at creating imaginative images, they often lack precision in semantic fidelity and spatial control over where objects are generated. To address these deficiencies, we introduce the Box-it-to-Bind-it (B2B) module - a novel, training-free approach for improving spatial control and semantic accuracy in text-to-image (T2I) diffusion models. B2B targets three…

    Submitted 27 February, 2024; originally announced February 2024.

  9. arXiv:2309.04902

    cs.CV

    Transformers in Small Object Detection: A Benchmark and Survey of State-of-the-Art

    Authors: Aref Miri Rekavandi, Shima Rashidi, Farid Boussaid, Stephen Hoefs, Emre Akbas, Mohammed Bennamoun

    Abstract: Transformers have rapidly gained popularity in computer vision, especially in the field of object recognition and detection. Upon examining the outcomes of state-of-the-art object detection methods, we noticed that transformers consistently outperformed well-established CNN-based detectors in almost every video or image dataset. While transformer-based approaches remain at the forefront of small o…

    Submitted 9 September, 2023; originally announced September 2023.

  10. arXiv:2308.03005

    cs.CV

    MCTformer+: Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation

    Authors: Lian Xu, Mohammed Bennamoun, Farid Boussaid, Hamid Laga, Wanli Ouyang, Dan Xu

    Abstract: This paper proposes a novel transformer-based framework that aims to enhance weakly supervised semantic segmentation (WSSS) by generating accurate class-specific object localization maps as pseudo labels. Building upon the observation that the attended regions of the one-class token in the standard vision transformer can contribute to a class-agnostic localization map, we explore the potential of…

    Submitted 5 August, 2023; originally announced August 2023.

    Comments: Journal extension for MCTformer

  11. arXiv:2303.06052

    cs.LG cs.AI cs.CY q-bio.NC

    Analysis and Evaluation of Explainable Artificial Intelligence on Suicide Risk Assessment

    Authors: Hao Tang, Aref Miri Rekavandi, Dharjinder Rooprai, Girish Dwivedi, Frank Sanfilippo, Farid Boussaid, Mohammed Bennamoun

    Abstract: This study investigates the effectiveness of Explainable Artificial Intelligence (XAI) techniques in predicting suicide risks and identifying the dominant causes for such behaviours. Data augmentation techniques and ML models are utilized to predict the associated risk. Furthermore, SHapley Additive exPlanations (SHAP) and correlation analysis are used to rank the importance of variables in predic…

    Submitted 9 March, 2023; originally announced March 2023.

  12. arXiv:2210.05952

    eess.IV cs.CV

    3D Brain and Heart Volume Generative Models: A Survey

    Authors: Yanbin Liu, Girish Dwivedi, Farid Boussaid, Mohammed Bennamoun

    Abstract: Generative models such as generative adversarial networks and autoencoders have gained a great deal of attention in the medical field due to their excellent data generation capability. This paper provides a comprehensive survey of generative models for three-dimensional (3D) volumes, focusing on the brain and heart. A new and elaborate taxonomy of unconditional and conditional generative models is…

    Submitted 5 December, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: Accepted at ACM Computing Surveys (CSUR) 2023

    MSC Class: 92C55 (Primary); 68U10 (Secondary)

    ACM Class: I.4; J.3

  13. arXiv:2209.08305

    cs.CV

    Active-Passive SimStereo -- Benchmarking the Cross-Generalization Capabilities of Deep Learning-based Stereo Methods

    Authors: Laurent Jospin, Allen Antony, Lian Xu, Hamid Laga, Farid Boussaid, Mohammed Bennamoun

    Abstract: In stereo vision, self-similar or bland regions can make it difficult to match patches between two images. Active stereo-based methods mitigate this problem by projecting a pseudo-random pattern on the scene so that each patch of an image pair can be identified without ambiguity. However, the projected pattern significantly alters the appearance of the image. If this pattern acts as a form of adve…

    Submitted 17 September, 2022; originally announced September 2022.

    Comments: 22 pages, 12 figures, accepted in NeurIPS 2022 Datasets and Benchmarks Track

  14. arXiv:2209.05082

    cs.CV

    Bayesian Learning for Disparity Map Refinement for Semi-Dense Active Stereo Vision

    Authors: Laurent Valentin Jospin, Hamid Laga, Farid Boussaid, Mohammed Bennamoun

    Abstract: A major focus of recent developments in stereo vision has been on how to obtain accurate dense disparity maps in passive stereo vision. Active vision systems enable more accurate estimations of dense disparity compared to passive stereo. However, subpixel-accurate disparity estimation remains an open problem that has received little attention. In this paper, we propose a new learning strategy to t…

    Submitted 12 September, 2022; originally announced September 2022.

    Comments: 15 pages, 15 figures

  15. Inflating 2D Convolution Weights for Efficient Generation of 3D Medical Images

    Authors: Yanbin Liu, Girish Dwivedi, Farid Boussaid, Frank Sanfilippo, Makoto Yamada, Mohammed Bennamoun

    Abstract: The generation of three-dimensional (3D) medical images has great application potential since it takes into account the 3D anatomical structure. Two problems prevent effective training of a 3D medical generative model: (1) 3D medical images are expensive to acquire and annotate, resulting in an insufficient number of training images, and (2) a large number of parameters are involved in 3D convolut…

    Submitted 5 December, 2023; v1 submitted 8 August, 2022; originally announced August 2022.

    Comments: Published at Computer Methods and Programs in Biomedicine (CMPB) 2023

    ACM Class: I.4; J.3

    Journal ref: Computer Methods and Programs in Biomedicine (2023): 107685

  16. arXiv:2207.13037

    cs.CV

    Learning Resolution-Adaptive Representations for Cross-Resolution Person Re-Identification

    Authors: Lin Wu, Lingqiao Liu, Yang Wang, Zheng Zhang, Farid Boussaid, Mohammed Bennamoun

    Abstract: The cross-resolution person re-identification (CRReID) problem aims to match low-resolution (LR) query identity images against high-resolution (HR) gallery images. It is a challenging and practical problem since the query images often suffer from resolution degradation due to the different capturing conditions from real-world cameras. To address this problem, state-of-the-art (SOTA) solutions eith…

    Submitted 8 July, 2022; originally announced July 2022.

    Comments: Under review

  17. arXiv:2207.13036

    cs.LG cs.AI cs.CR

    Jacobian Norm with Selective Input Gradient Regularization for Improved and Interpretable Adversarial Defense

    Authors: Deyin Liu, Lin Wu, Haifeng Zhao, Farid Boussaid, Mohammed Bennamoun, Xianghua Xie

    Abstract: Deep neural networks (DNNs) are known to be vulnerable to adversarial examples that are crafted with imperceptible perturbations, i.e., a small change in an input image can induce a misclassification and thus threaten the reliability of deep learning-based deployment systems. Adversarial training (AT) is often adopted to improve robustness by training on a mixture of corrupted and clean data.…

    Submitted 14 November, 2022; v1 submitted 8 July, 2022; originally announced July 2022.

    Comments: Under review

  18. arXiv:2207.13035

    cs.CV

    Pseudo-Pair based Self-Similarity Learning for Unsupervised Person Re-identification

    Authors: Lin Wu, Deyin Liu, Wenying Zhang, Dapeng Chen, Zongyuan Ge, Farid Boussaid, Mohammed Bennamoun, Jialie Shen

    Abstract: Person re-identification (re-ID) is of great importance to video surveillance systems by estimating the similarity between a pair of cross-camera person shots. Current methods for estimating such similarity require a large number of labeled samples for supervised training. In this paper, we present a pseudo-pair based self-similarity learning approach for unsupervised person re-ID without human a…

    Submitted 9 July, 2022; originally announced July 2022.

    Comments: Under review

    Journal ref: IEEE Transactions on Image Processing 2022

  19. arXiv:2207.12926

    cs.CV cs.LG

    A Guide to Image and Video based Small Object Detection using Deep Learning : Case Study of Maritime Surveillance

    Authors: Aref Miri Rekavandi, Lian Xu, Farid Boussaid, Abd-Krim Seghouane, Stephen Hoefs, Mohammed Bennamoun

    Abstract: Small object detection (SOD) in optical images and videos is a challenging problem: even state-of-the-art generic object detection methods fail to accurately localize and identify such objects. Typically, small objects appear in real-world scenes due to a large camera-object distance. Because small objects occupy only a small area in the input image (e.g., less than 10%), the information extracted from…

    Submitted 26 July, 2022; originally announced July 2022.

  20. arXiv:2203.13387

    cs.CV

    CrossFormer: Cross Spatio-Temporal Transformer for 3D Human Pose Estimation

    Authors: Mohammed Hassanin, Abdelwahed Khamiss, Mohammed Bennamoun, Farid Boussaid, Ibrahim Radwan

    Abstract: 3D human pose estimation can be handled by encoding the geometric dependencies between the body parts and enforcing the kinematic constraints. Recently, Transformers have been adopted to encode the long-range dependencies between the joints in the spatial and temporal domains. While they have shown excellence in long-range dependencies, studies have noted the need for improving the locality of vision…

    Submitted 24 March, 2022; originally announced March 2022.

  21. arXiv:2203.02891

    cs.CV

    Multi-class Token Transformer for Weakly Supervised Semantic Segmentation

    Authors: Lian Xu, Wanli Ouyang, Mohammed Bennamoun, Farid Boussaid, Dan Xu

    Abstract: This paper proposes a new transformer-based framework to learn class-specific object localization maps as pseudo labels for weakly supervised semantic segmentation (WSSS). Inspired by the fact that the attended regions of the one-class token in the standard vision transformer can be leveraged to form a class-agnostic localization map, we investigate if the transformer model can also effectively ca…

    Submitted 6 March, 2022; originally announced March 2022.

    Comments: Accepted at CVPR 2022

  22. arXiv:2112.00941

    cs.CV

    Generalized Closed-form Formulae for Feature-based Subpixel Alignment in Patch-based Matching

    Authors: Laurent Valentin Jospin, Farid Boussaid, Hamid Laga, Mohammed Bennamoun

    Abstract: Cost-based image patch matching is at the core of various techniques in computer vision, photogrammetry and remote sensing. When the subpixel disparity between the reference patch in the source and target images is required, either the cost function or the target image has to be interpolated. While cost-based interpolation is the easiest to implement, multiple works have shown that image based in…

    Submitted 9 December, 2024; v1 submitted 1 December, 2021; originally announced December 2021.

    Comments: 29 pages, 10 figures

    ACM Class: I.4.8

  23. arXiv:2107.11787

    cs.CV

    Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation

    Authors: Lian Xu, Wanli Ouyang, Mohammed Bennamoun, Farid Boussaid, Ferdous Sohel, Dan Xu

    Abstract: Semantic segmentation is a challenging task in the absence of densely labelled data. Only relying on class activation maps (CAM) with image-level labels provides deficient segmentation supervision. Prior works thus consider pre-trained models to produce coarse saliency maps to guide the generation of pseudo segmentation labels. However, the commonly used off-line heuristic generation process canno…

    Submitted 26 July, 2021; v1 submitted 25 July, 2021; originally announced July 2021.

    Comments: Accepted at ICCV 2021

  24. Hands-on Bayesian Neural Networks -- a Tutorial for Deep Learning Users

    Authors: Laurent Valentin Jospin, Wray Buntine, Farid Boussaid, Hamid Laga, Mohammed Bennamoun

    Abstract: Modern deep learning methods constitute incredibly powerful tools to tackle a myriad of challenging problems. However, since deep learning methods operate as black boxes, the uncertainty associated with their predictions is often challenging to quantify. Bayesian statistics offer a formalism to understand and quantify the uncertainty associated with deep neural network predictions. This tutorial p…

    Submitted 3 January, 2022; v1 submitted 14 July, 2020; originally announced July 2020.

    Comments: 20 pages, 13 figures

    MSC Class: 62-02 (Primary)

    ACM Class: G.3; I.2.6

    Journal ref: IEEE Computational Intelligence Magazine (Volume: 17, Issue: 2, May 2022)

  25. A Survey on Deep Learning Techniques for Stereo-based Depth Estimation

    Authors: Hamid Laga, Laurent Valentin Jospin, Farid Boussaid, Mohammed Bennamoun

    Abstract: Estimating depth from RGB images is a long-standing ill-posed problem, which has been explored for decades by the computer vision, graphics, and machine learning communities. Among the existing techniques, stereo matching remains one of the most widely used in the literature due to its strong connection to the human binocular system. Traditionally, stereo-based depth estimation has been addressed…

    Submitted 1 June, 2020; originally announced June 2020.

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020

  26. Automatic Hierarchical Classification of Kelps using Deep Residual Features

    Authors: Ammar Mahmood, Ana Giraldo Ospina, Mohammed Bennamoun, Senjian An, Ferdous Sohel, Farid Boussaid, Renae Hovey, Robert B. Fisher, Gary Kendrick

    Abstract: Across the globe, remote image data is rapidly being collected for the assessment of benthic communities from shallow to extremely deep waters on continental slopes to the abyssal seas. Exploiting this data is presently limited by the time it takes for experts to identify organisms found in these images. With this limitation in mind, a large effort has been made globally to introduce automation an…

    Submitted 23 January, 2020; v1 submitted 26 June, 2019; originally announced June 2019.

    Comments: MDPI Sensors

    Journal ref: Sensors 2020, 20, 447

  27. arXiv:1711.05627

    cs.LG cs.AI

    Exploiting Layerwise Convexity of Rectifier Networks with Sign Constrained Weights

    Authors: Senjian An, Farid Boussaid, Mohammed Bennamoun, Ferdous Sohel

    Abstract: By introducing sign constraints on the weights, this paper proposes sign constrained rectifier networks (SCRNs), whose training can be solved efficiently by the well known majorization-minimization (MM) algorithms. We prove that the proposed two-hidden-layer SCRNs, which exhibit negative weights in the second hidden layer and negative weights in the output layer, are capable of separating any two…

    Submitted 14 November, 2017; originally announced November 2017.

    Comments: 11 pages

  28. arXiv:1708.07244

    cs.LG cs.AI stat.ML

    On the Compressive Power of Deep Rectifier Networks for High Resolution Representation of Class Boundaries

    Authors: Senjian An, Mohammed Bennamoun, Farid Boussaid

    Abstract: This paper provides a theoretical justification of the superior classification performance of deep rectifier networks over shallow rectifier networks from the geometrical perspective of piecewise linear (PWL) classifier boundaries. We show that, for a given threshold on the approximation error, the required number of boundary facets to approximate a general smooth boundary grows exponentially with…

    Submitted 23 August, 2017; originally announced August 2017.

    Comments: 18 pages

  29. arXiv:1703.10355

    cs.LG stat.ML

    From Deep to Shallow: Transformations of Deep Rectifier Networks

    Authors: Senjian An, Farid Boussaid, Mohammed Bennamoun, Jiankun Hu

    Abstract: In this paper, we introduce transformations of deep rectifier networks, enabling the conversion of deep rectifier networks into shallow rectifier networks. We subsequently prove that any rectifier net of any depth can be represented by a maximum of a number of functions that can be realized by a shallow network with a single hidden layer. The transformations of both deep rectifier nets and deep re…

    Submitted 30 March, 2017; originally announced March 2017.

    Comments: Technical Report

  30. A New Representation of Skeleton Sequences for 3D Action Recognition

    Authors: Qiuhong Ke, Mohammed Bennamoun, Senjian An, Ferdous Sohel, Farid Boussaid

    Abstract: This paper presents a new method for 3D action recognition with skeleton sequences (i.e., 3D trajectories of human skeleton joints). The proposed method first transforms each skeleton sequence into three clips each consisting of several frames for spatial temporal feature learning using deep neural networks. Each clip is generated from one channel of the cylindrical coordinates of the skeleton seq…

    Submitted 4 June, 2017; v1 submitted 9 March, 2017; originally announced March 2017.

    Comments: CVPR 2017