
Showing 1–21 of 21 results for author: Habibian, A

  1. arXiv:2506.21446  [pdf, other]

    cs.CV

    Controllable 3D Placement of Objects with Scene-Aware Diffusion Models

    Authors: Mohamed Omran, Dimitris Kalatzis, Jens Petersen, Amirhossein Habibian, Auke Wiggers

    Abstract: Image editing approaches have become more powerful and flexible with the advent of powerful text-conditioned generative models. However, placing objects in an environment with a precise location and orientation still remains a challenge, as this typically requires carefully crafted inpainting masks or prompts. In this work, we show that a carefully designed visual map, combined with coarse object…

    Submitted 26 June, 2025; originally announced June 2025.

  2. arXiv:2504.17076  [pdf, other]

    cs.CV

    Scene-Aware Location Modeling for Data Augmentation in Automotive Object Detection

    Authors: Jens Petersen, Davide Abati, Amirhossein Habibian, Auke Wiggers

    Abstract: Generative image models are increasingly being used for training data augmentation in vision tasks. In the context of automotive object detection, methods usually focus on producing augmented frames that look as realistic as possible, for example by replacing real objects with generated ones. Others try to maximize the diversity of augmented frames, for example by pasting lots of generated objects…

    Submitted 23 April, 2025; originally announced April 2025.

  3. arXiv:2504.16740  [pdf, other]

    cs.CV

    Gaussian Splatting is an Effective Data Generator for 3D Object Detection

    Authors: Farhad G. Zanjani, Davide Abati, Auke Wiggers, Dimitris Kalatzis, Jens Petersen, Hong Cai, Amirhossein Habibian

    Abstract: We investigate data augmentation for 3D object detection in autonomous driving. We utilize recent advancements in 3D reconstruction based on Gaussian Splatting for 3D object placement in driving scenes. Unlike existing diffusion-based methods that synthesize images conditioned on BEV layouts, our approach places 3D objects directly in the reconstructed 3D space with explicitly imposed geometric tr…

    Submitted 23 April, 2025; originally announced April 2025.

  4. arXiv:2412.07583  [pdf, other]

    cs.CV cs.AI

    Mobile Video Diffusion

    Authors: Haitam Ben Yahia, Denis Korzhenkov, Ioannis Lelekas, Amir Ghodrati, Amirhossein Habibian

    Abstract: Video diffusion models have achieved impressive realism and controllability but are limited by high computational demands, restricting their use on mobile devices. This paper introduces the first mobile-optimized video diffusion model. Starting from a spatio-temporal UNet from Stable Video Diffusion (SVD), we reduce memory and computational cost by reducing the frame resolution, incorporating mult…

    Submitted 10 December, 2024; originally announced December 2024.

  5. arXiv:2412.06578  [pdf, other]

    cs.CV

    MoViE: Mobile Diffusion for Video Editing

    Authors: Adil Karjauv, Noor Fathima, Ioannis Lelekas, Fatih Porikli, Amir Ghodrati, Amirhossein Habibian

    Abstract: Recent progress in diffusion-based video editing has shown remarkable potential for practical applications. However, these methods remain prohibitively expensive and challenging to deploy on mobile devices. In this study, we introduce a series of optimizations that render mobile video editing feasible. Building upon the existing image editing model, we first optimize its architecture and incorpora…

    Submitted 9 December, 2024; originally announced December 2024.

    Comments: 8 pages

  6. arXiv:2410.13564  [pdf, other]

    cs.CV

    Generative Location Modeling for Spatially Aware Object Insertion

    Authors: Jooyeol Yun, Davide Abati, Mohamed Omran, Jaegul Choo, Amirhossein Habibian, Auke Wiggers

    Abstract: Generative models have become a powerful tool for image editing tasks, including object insertion. However, these methods often lack spatial awareness, generating objects with unrealistic locations and scales, or unintentionally altering the scene background. A key challenge lies in maintaining visual coherence, which requires both a geometrically suitable object location and a high-quality image…

    Submitted 17 October, 2024; originally announced October 2024.

  7. arXiv:2401.05735  [pdf, other]

    cs.CV cs.LG

    Object-Centric Diffusion for Efficient Video Editing

    Authors: Kumara Kahatapitiya, Adil Karjauv, Davide Abati, Fatih Porikli, Yuki M. Asano, Amirhossein Habibian

    Abstract: Diffusion-based video editing has reached impressive quality and can transform the global style, local structure, and attributes of given video inputs, following textual edit prompts. However, such solutions typically incur heavy memory and computational costs to generate temporally-coherent frames, either in the form of diffusion inversion and/or cross-frame attention. In this paper, we c…

    Submitted 30 August, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: ECCV24

  8. arXiv:2312.08892  [pdf, other]

    cs.CV

    VaLID: Variable-Length Input Diffusion for Novel View Synthesis

    Authors: Shijie Li, Farhad G. Zanjani, Haitam Ben Yahia, Yuki M. Asano, Juergen Gall, Amirhossein Habibian

    Abstract: Novel View Synthesis (NVS), which tries to produce a realistic image at the target view given source view images and their corresponding poses, is a fundamental problem in 3D Vision. As this task is heavily under-constrained, some recent work, like Zero123, tries to solve this problem with generative modeling, specifically using pre-trained diffusion models. Although this strategy generalizes well…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: paper and supplementary material

  9. arXiv:2312.08128  [pdf, other]

    cs.CV

    Clockwork Diffusion: Efficient Generation With Model-Step Distillation

    Authors: Amirhossein Habibian, Amir Ghodrati, Noor Fathima, Guillaume Sautiere, Risheek Garrepalli, Fatih Porikli, Jens Petersen

    Abstract: This work aims to improve the efficiency of text-to-image diffusion models. While diffusion models use computationally expensive UNet-based denoising operations in every generation step, we identify that not all operations are equally relevant for the final output quality. In particular, we observe that UNet layers operating on high-res feature maps are relatively sensitive to small perturbations.…

    Submitted 20 February, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

  10. arXiv:2308.09511  [pdf, other]

    cs.CV

    ResQ: Residual Quantization for Video Perception

    Authors: Davide Abati, Haitam Ben Yahia, Markus Nagel, Amirhossein Habibian

    Abstract: This paper accelerates video perception, such as semantic segmentation and human pose estimation, by leveraging cross-frame redundancies. Unlike the existing approaches, which avoid redundant computations by warping the past features using optical-flow or by performing sparse convolutions on frame differences, we approach the problem from a new perspective: low-bit quantization. We observe that resi…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  11. arXiv:2301.02240  [pdf, other]

    cs.CV

    Skip-Attention: Improving Vision Transformers by Paying Less Attention

    Authors: Shashanka Venkataramanan, Amir Ghodrati, Yuki M. Asano, Fatih Porikli, Amirhossein Habibian

    Abstract: This work aims to improve the efficiency of vision transformers (ViT). While ViTs use computationally expensive self-attention operations in every layer, we identify that these operations are highly correlated across layers -- a key redundancy that causes unnecessary computations. Based on this observation, we propose SkipAt, a method to reuse self-attention computation from preceding layers to ap…

    Submitted 17 January, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

  12. arXiv:2206.08236  [pdf, other]

    cs.CV cs.LG eess.IV

    Simple and Efficient Architectures for Semantic Segmentation

    Authors: Dushyant Mehta, Andrii Skliar, Haitam Ben Yahia, Shubhankar Borse, Fatih Porikli, Amirhossein Habibian, Tijmen Blankevoort

    Abstract: Though the state-of-the-art architectures for semantic segmentation, such as HRNet, demonstrate impressive accuracy, the complexity arising from their salient design choices hinders a range of model acceleration tools, and further they make use of operations that are inefficient on current hardware. This paper demonstrates that a simple encoder-decoder architecture with a ResNet-like backbone and a sm…

    Submitted 16 June, 2022; originally announced June 2022.

    Comments: To be presented at Efficient Deep Learning for Computer Vision Workshop at CVPR 2022

  13. arXiv:2204.02397  [pdf, other]

    cs.CV

    SALISA: Saliency-based Input Sampling for Efficient Video Object Detection

    Authors: Babak Ehteshami Bejnordi, Amirhossein Habibian, Fatih Porikli, Amir Ghodrati

    Abstract: High-resolution images are widely adopted for high-performance object detection in videos. However, processing high-resolution inputs comes with high computation costs, and naive down-sampling of the input to reduce the computation costs quickly degrades the detection performance. In this paper, we propose SALISA, a novel non-uniform SALiency-based Input SAmpling technique for video object detecti…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: 20 pages, 7 figures

  14. arXiv:2203.09594  [pdf, other]

    cs.CV cs.LG

    Delta Distillation for Efficient Video Processing

    Authors: Amirhossein Habibian, Haitam Ben Yahia, Davide Abati, Efstratios Gavves, Fatih Porikli

    Abstract: This paper aims to accelerate video stream processing, such as object detection and semantic segmentation, by leveraging the temporal redundancies that exist between video frames. Instead of propagating and warping features using motion alignment, such as optical flow, we propose a novel knowledge distillation schema coined as Delta Distillation. In our proposal, the student learns the variations…

    Submitted 17 March, 2022; originally announced March 2022.

  15. arXiv:2203.01978  [pdf, other]

    eess.IV cs.CV cs.LG

    Region-of-Interest Based Neural Video Compression

    Authors: Yura Perugachi-Diaz, Guillaume Sautière, Davide Abati, Yang Yang, Amirhossein Habibian, Taco S Cohen

    Abstract: Humans do not perceive all parts of a scene with the same resolution, but rather focus on a few regions of interest (ROIs). Traditional Object-Based codecs take advantage of this biological intuition, and are capable of non-uniform allocation of bits in favor of salient regions, at the expense of increased distortion in the remaining areas: such a strategy allows a boost in perceptual quality under low…

    Submitted 2 November, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: Updated arxiv version to the camera-ready version after acceptance at British Machine Vision Conference (BMVC) 2022

  16. arXiv:2104.13400  [pdf, other]

    cs.CV cs.LG

    FrameExit: Conditional Early Exiting for Efficient Video Recognition

    Authors: Amir Ghodrati, Babak Ehteshami Bejnordi, Amirhossein Habibian

    Abstract: In this paper, we propose a conditional early exiting framework for efficient video recognition. While existing works focus on selecting a subset of salient frames to reduce the computation costs, we propose to use a simple sampling strategy combined with conditional early exiting to enable efficient recognition. Our model automatically learns to process fewer frames for simpler videos and more fr…

    Submitted 27 April, 2021; originally announced April 2021.

    Comments: CVPR 2021 | Oral paper

  17. arXiv:2104.11487  [pdf, other]

    cs.CV cs.LG

    Skip-Convolutions for Efficient Video Processing

    Authors: Amirhossein Habibian, Davide Abati, Taco S. Cohen, Babak Ehteshami Bejnordi

    Abstract: We propose Skip-Convolutions to leverage the large amount of redundancies in video streams and save computations. Each video is represented as a series of changes across frames and network activations, denoted as residuals. We reformulate standard convolution to be efficiently computed on residual frames: each layer is coupled with a binary gate deciding whether a residual is important to the mode…

    Submitted 23 April, 2021; originally announced April 2021.

    Comments: CVPR 2021

  18. arXiv:2004.09508  [pdf, other]

    eess.IV cs.CV cs.LG stat.ML

    Adversarial Distortion for Learned Video Compression

    Authors: Vijay Veerabadran, Reza Pourreza, Amirhossein Habibian, Taco Cohen

    Abstract: In this paper, we present a novel adversarial lossy video compression model. At extremely low bit-rates, standard video coding schemes suffer from unpleasant reconstruction artifacts such as blocking, ringing etc. Existing learned neural approaches to video compression have achieved reasonable success on reducing the bit-rate for efficient transmission and reduce the impact of artifacts to an exte…

    Submitted 18 June, 2021; v1 submitted 20 April, 2020; originally announced April 2020.

    Comments: CVPR Workshops, 2020

  19. arXiv:1908.05717  [pdf, other]

    eess.IV cs.LG stat.ML

    Video Compression With Rate-Distortion Autoencoders

    Authors: Amirhossein Habibian, Ties van Rozendaal, Jakub M. Tomczak, Taco S. Cohen

    Abstract: In this paper we present a deep generative model for lossy video compression. We employ a model that consists of a 3D autoencoder with a discrete latent space and an autoregressive prior used for entropy coding. Both autoencoder and prior are trained jointly to minimize a rate-distortion loss, which is closely related to the ELBO used in variational autoencoders. Despite its simplicity, we find…

    Submitted 13 November, 2019; v1 submitted 14 August, 2019; originally announced August 2019.

    Comments: Accepted to ICCV 2019

  20. arXiv:1908.00733  [pdf, other]

    cs.LG cs.CV stat.ML

    Learning Variations in Human Motion via Mix-and-Match Perturbation

    Authors: Mohammad Sadegh Aliakbarian, Fatemeh Sadat Saleh, Mathieu Salzmann, Lars Petersson, Stephen Gould, Amirhossein Habibian

    Abstract: Human motion prediction is a stochastic process: Given an observed sequence of poses, multiple future motions are plausible. Existing approaches to modeling this stochasticity typically combine a random noise vector with information about the previous poses. This combination, however, is done in a deterministic manner, which gives the network the flexibility to learn to ignore the random noise. In…

    Submitted 24 February, 2020; v1 submitted 2 August, 2019; originally announced August 2019.

  21. arXiv:1511.02492  [pdf, other]

    cs.CV cs.MM

    VideoStory Embeddings Recognize Events when Examples are Scarce

    Authors: Amirhossein Habibian, Thomas Mensink, Cees G. M. Snoek

    Abstract: This paper aims for event recognition when video examples are scarce or even completely absent. The key in such a challenging setting is a semantic video representation. Rather than building the representation from individual attribute detectors and their annotations, we propose to learn the entire representation from freely available web videos and their descriptions using an embedding between vi…

    Submitted 8 November, 2015; originally announced November 2015.