
Showing 1–14 of 14 results for author: Hadji, I

Searching in archive cs.
  1. arXiv:2506.04990  [pdf, ps, other]

    cs.CV

    Multi-scale Image Super Resolution with a Single Auto-Regressive Model

    Authors: Enrique Sanchez, Isma Hadji, Adrian Bulat, Christos Tzelepis, Brais Martinez, Georgios Tzimiropoulos

    Abstract: In this paper we tackle Image Super Resolution (ISR), using recent advances in Visual Auto-Regressive (VAR) modeling. VAR iteratively estimates the residual in latent space between gradually increasing image scales, a process referred to as next-scale prediction. Thus, the strong priors learned during pre-training align well with the downstream task (ISR). To our knowledge, only VARSR has exploite…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: Enrique Sanchez and Isma Hadji equally contributed to this work. Project site https://github.com/saic-fi/ms_sr_var

  2. arXiv:2412.06978  [pdf, other]

    cs.CV

    Edge-SD-SR: Low Latency and Parameter Efficient On-device Super-Resolution with Stable Diffusion via Bidirectional Conditioning

    Authors: Mehdi Noroozi, Isma Hadji, Victor Escorcia, Anestis Zaganidis, Brais Martinez, Georgios Tzimiropoulos

    Abstract: There has been immense progress recently in the visual quality of Stable Diffusion-based Super Resolution (SD-SR). However, deploying large diffusion models on computationally restricted devices such as mobile phones remains impractical due to the large model size and high latency. This is compounded for SR as it often operates at high res (e.g. 4Kx3K). In this work, we introduce Edge-SD-SR, the f…

    Submitted 4 April, 2025; v1 submitted 9 December, 2024; originally announced December 2024.

    Comments: Accepted to CVPR 2025

  3. arXiv:2411.18552  [pdf, other]

    cs.CV

    FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion

    Authors: Haosen Yang, Adrian Bulat, Isma Hadji, Hai X. Pham, Xiatian Zhu, Georgios Tzimiropoulos, Brais Martinez

    Abstract: Diffusion models are proficient at generating high-quality images. They are however effective only when operating at the resolution used during training. Inference at a scaled resolution leads to repetitive patterns and structural distortions. Retraining at higher resolutions quickly becomes prohibitive. Thus, methods enabling pre-existing diffusion models to operate at flexible test-time resoluti…

    Submitted 27 November, 2024; originally announced November 2024.

  4. arXiv:2401.17258  [pdf, other]

    cs.CV

    You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation

    Authors: Mehdi Noroozi, Isma Hadji, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos

    Abstract: In this paper, we introduce YONOS-SR, a novel stable diffusion-based approach for image super-resolution that yields state-of-the-art results using only a single DDIM step. We propose a novel scale distillation approach to train our SR model. Instead of directly training our SR model on the scale factor of interest, we start by training a teacher model on a smaller magnification scale, thereby mak…

    Submitted 30 January, 2024; originally announced January 2024.

  5. arXiv:2401.13594  [pdf, other]

    cs.CL cs.AI

    Graph Guided Question Answer Generation for Procedural Question-Answering

    Authors: Hai X. Pham, Isma Hadji, Xinnuo Xu, Ziedune Degutyte, Jay Rainey, Evangelos Kazakos, Afsaneh Fazly, Georgios Tzimiropoulos, Brais Martinez

    Abstract: In this paper, we focus on task-specific question answering (QA). To this end, we introduce a method for generating exhaustive and high-quality training data, which allows us to train compact (e.g., run on a mobile device), task-specific QA models that are competitive against GPT variants. The key technological enabler is a novel mechanism for automatic question-answer generation from procedural t…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted to EACL 2024 as long paper. 25 pages including appendix

    MSC Class: I.2.7

  6. arXiv:2310.08312  [pdf, other]

    cs.CV cs.LG

    GePSAn: Generative Procedure Step Anticipation in Cooking Videos

    Authors: Mohamed Ashraf Abdelsalam, Samrudhdhi B. Rangrej, Isma Hadji, Nikita Dvornik, Konstantinos G. Derpanis, Afsaneh Fazly

    Abstract: We study the problem of future step anticipation in procedural videos. Given a video of an ongoing procedural activity, we predict a plausible next procedure step described in rich natural language. While most previous work focus on the problem of data scarcity in procedural video datasets, another core challenge of future anticipation is how to account for multiple plausible future realizations i…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: published at ICCV 2023

  7. arXiv:2304.13265  [pdf, other]

    cs.CV

    StepFormer: Self-supervised Step Discovery and Localization in Instructional Videos

    Authors: Nikita Dvornik, Isma Hadji, Ran Zhang, Konstantinos G. Derpanis, Animesh Garg, Richard P. Wildes, Allan D. Jepson

    Abstract: Instructional videos are an important resource to learn procedural tasks from human demonstrations. However, the instruction steps in such videos are typically short and sparse, with most of the video being irrelevant to the procedure. This motivates the need to temporally localize the instruction steps in such videos, i.e. the task called key-step localization. Traditional methods for key-step lo…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: CVPR'23

  8. arXiv:2210.04996  [pdf, other]

    cs.CV cs.AI

    Graph2Vid: Flow graph to Video Grounding for Weakly-supervised Multi-Step Localization

    Authors: Nikita Dvornik, Isma Hadji, Hai Pham, Dhaivat Bhatt, Brais Martinez, Afsaneh Fazly, Allan D. Jepson

    Abstract: In this work, we consider the problem of weakly-supervised multi-step localization in instructional videos. An established approach to this problem is to rely on a given list of steps. However, in reality, there is often more than one way to execute a procedure successfully, by following the set of steps in slightly varying orders. Thus, for successful localization in a given video, recent works r…

    Submitted 31 October, 2022; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: ECCV'22, oral

    Journal ref: ECCV 2022

  9. arXiv:2205.02300  [pdf, other]

    cs.CV

    P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision

    Authors: He Zhao, Isma Hadji, Nikita Dvornik, Konstantinos G. Derpanis, Richard P. Wildes, Allan D. Jepson

    Abstract: In this paper, we study the problem of procedure planning in instructional videos. Here, an agent must produce a plausible sequence of actions that can transform the environment from a given start to a desired goal state. When learning procedure planning from instructional videos, most recent work leverages intermediate visual observations as supervision, which requires expensive annotation effort…

    Submitted 4 May, 2022; originally announced May 2022.

    Comments: Accepted as an oral paper at CVPR 2022

  10. arXiv:2108.11996  [pdf, other]

    cs.CV

    Drop-DTW: Aligning Common Signal Between Sequences While Dropping Outliers

    Authors: Nikita Dvornik, Isma Hadji, Konstantinos G. Derpanis, Animesh Garg, Allan D. Jepson

    Abstract: In this work, we consider the problem of sequence-to-sequence alignment for signals containing outliers. Assuming the absence of outliers, the standard Dynamic Time Warping (DTW) algorithm efficiently computes the optimal alignment between two (generally) variable-length sequences. While DTW is robust to temporal shifts and dilations of the signal, it fails to align sequences in a meaningful way i…

    Submitted 26 August, 2021; originally announced August 2021.

  11. arXiv:2105.05217  [pdf, other]

    cs.CV

    Representation Learning via Global Temporal Alignment and Cycle-Consistency

    Authors: Isma Hadji, Konstantinos G. Derpanis, Allan D. Jepson

    Abstract: We introduce a weakly supervised method for representation learning based on aligning temporal sequences (e.g., videos) of the same process (e.g., human action). The main idea is to use the global temporal ordering of latent correspondences across sequence pairs as a supervisory signal. In particular, we propose a loss based on scoring the optimal sequence alignment to train an embedding network.…

    Submitted 11 May, 2021; originally announced May 2021.

    Comments: accepted to CVPR 2021

  12. arXiv:2011.14665  [pdf, other]

    cs.CV

    Why Convolutional Networks Learn Oriented Bandpass Filters: Theory and Empirical Support

    Authors: Isma Hadji, Richard P. Wildes

    Abstract: It has been repeatedly observed that convolutional architectures when applied to image understanding tasks learn oriented bandpass filters. A standard explanation of this result is that these filters reflect the structure of the images that they have been exposed to during training: Natural images typically are locally composed of oriented contours at various scales and oriented bandpass filters a…

    Submitted 30 November, 2020; originally announced November 2020.

  13. arXiv:1803.08834  [pdf, other]

    cs.CV

    What Do We Understand About Convolutional Networks?

    Authors: Isma Hadji, Richard P. Wildes

    Abstract: This document will review the most prominent proposals using multilayer convolutional architectures. Importantly, the various components of a typical convolutional network will be discussed through a review of different approaches that base their design decisions on biological findings and/or sound theoretical bases. In addition, the different attempts at understanding ConvNets via visualizations…

    Submitted 23 March, 2018; originally announced March 2018.

  14. arXiv:1708.06690  [pdf, other]

    cs.CV

    A Spatiotemporal Oriented Energy Network for Dynamic Texture Recognition

    Authors: Isma Hadji, Richard P. Wildes

    Abstract: This paper presents a novel hierarchical spatiotemporal orientation representation for spacetime image analysis. It is designed to combine the benefits of the multilayer architecture of ConvNets and a more controlled approach to spacetime analysis. A distinguishing aspect of the approach is that unlike most contemporary convolutional networks no learning is involved; rather, all design decisions a…

    Submitted 22 August, 2017; originally announced August 2017.

    Comments: accepted at ICCV 2017