Showing 1–15 of 15 results for author: Stefanov, K

Searching in archive cs.
  1. arXiv:2410.15068  [pdf, ps, other]

    cs.CV cs.AI cs.GR cs.LG cs.RO

    LLM-HDR: Bridging LLM-based Perception and Self-Supervision for Unpaired LDR-to-HDR Image Reconstruction

    Authors: Hrishav Bakul Barua, Kalin Stefanov, Lemuel Lai En Che, Abhinav Dhall, KokSheik Wong, Ganesh Krishnasamy

    Abstract: The translation of Low Dynamic Range (LDR) to High Dynamic Range (HDR) images is an important computer vision task. There is a significant amount of research utilizing both conventional non-learning methods and modern data-driven approaches, focusing on using both single-exposed and multi-exposed LDR for HDR image reconstruction. However, most current state-of-the-art methods require high-quality…

    Submitted 8 June, 2025; v1 submitted 19 October, 2024; originally announced October 2024.

    MSC Class: Artificial intelligence; Computer vision; Machine learning; Deep learning

    ACM Class: I.3.3; I.4.5

  2. arXiv:2409.06991  [pdf, other]

    cs.CV

    1M-Deepfakes Detection Challenge

    Authors: Zhixi Cai, Abhinav Dhall, Shreya Ghosh, Munawar Hayat, Dimitrios Kollias, Kalin Stefanov, Usman Tariq

    Abstract: The detection and localization of deepfake content, particularly when small fake segments are seamlessly mixed with real videos, remains a significant challenge in the field of digital media security. Based on the recently released AV-Deepfake1M dataset, which contains more than 1 million manipulated videos across more than 2,000 subjects, we introduce the 1M-Deepfakes Detection Challenge. This ch…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: ACM MM 2024. Challenge webpage: https://deepfakes1m.github.io/

  3. arXiv:2403.17837  [pdf, other]

    cs.CV cs.GR cs.LG cs.MM eess.IV

    GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction

    Authors: Hrishav Bakul Barua, Kalin Stefanov, KokSheik Wong, Abhinav Dhall, Ganesh Krishnasamy

    Abstract: High Dynamic Range (HDR) content (i.e., images and videos) has a broad range of applications. However, capturing HDR content from real-world scenes is expensive and time-consuming. Therefore, the challenging task of reconstructing visually accurate HDR images from their Low Dynamic Range (LDR) counterparts is gaining attention in the vision research community. A major challenge in this research pr…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Submitted to IEEE

    MSC Class: Artificial intelligence; Computer vision; Machine learning; Deep learning

    ACM Class: I.3.3; I.4.5

  4. arXiv:2402.14982  [pdf, other]

    cs.SD cs.LG eess.AS q-bio.NC

    Human Brain Exhibits Distinct Patterns When Listening to Fake Versus Real Audio: Preliminary Evidence

    Authors: Mahsa Salehi, Kalin Stefanov, Ehsan Shareghi

    Abstract: In this paper we study the variations in human brain activity when listening to real and fake audio. Our preliminary results suggest that the representations learned by a state-of-the-art deepfake audio detection algorithm do not exhibit clear distinct patterns between real and fake audio. In contrast, human brain activity, as measured by EEG, displays distinct patterns when individuals are expos…

    Submitted 8 July, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: 9 pages, 4 figures, 3 tables

  5. arXiv:2402.06692  [pdf, other]

    eess.IV cs.AI cs.CV cs.GR cs.LG cs.MM

    HistoHDR-Net: Histogram Equalization for Single LDR to HDR Image Translation

    Authors: Hrishav Bakul Barua, Ganesh Krishnasamy, KokSheik Wong, Abhinav Dhall, Kalin Stefanov

    Abstract: High Dynamic Range (HDR) imaging aims to replicate the high visual quality and clarity of real-world scenes. Due to the high costs associated with HDR imaging, the literature offers various data-driven methods for HDR image reconstruction from Low Dynamic Range (LDR) counterparts. A common limitation of these approaches is missing details in regions of the reconstructed HDR images, which are over-…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: Submitted to IEEE

    MSC Class: Artificial intelligence; Computer vision; Machine learning; Deep learning

    ACM Class: I.3.3; I.4.5

  6. arXiv:2311.15308  [pdf, other]

    cs.CV

    AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset

    Authors: Zhixi Cai, Shreya Ghosh, Aman Pankaj Adatia, Munawar Hayat, Abhinav Dhall, Tom Gedeon, Kalin Stefanov

    Abstract: The detection and localization of highly realistic deepfake audio-visual content are challenging even for the most advanced state-of-the-art methods. While most of the research efforts in this domain are focused on detecting high-quality deepfake images and videos, only a few works address the problem of the localization of small segments of audio-visual manipulations embedded in real videos. In t…

    Submitted 29 July, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

    Comments: Accepted by ACM MM 2024

  7. arXiv:2309.03827  [pdf]

    cs.CV cs.GR cs.LG cs.MM eess.IV

    ArtHDR-Net: Perceptually Realistic and Accurate HDR Content Creation

    Authors: Hrishav Bakul Barua, Ganesh Krishnasamy, KokSheik Wong, Kalin Stefanov, Abhinav Dhall

    Abstract: High Dynamic Range (HDR) content creation has become an important topic for modern media and entertainment sectors, gaming and Augmented/Virtual Reality industries. Many methods have been proposed to recreate the HDR counterparts of input Low Dynamic Range (LDR) images/videos given a single exposure or multi-exposure LDRs. The state-of-the-art methods focus primarily on the preservation of the rec…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: Accepted in Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Taipei, Taiwan

    ACM Class: I.2.10; I.4.5; I.3.3; I.4.3

  8. arXiv:2307.06701  [pdf, other]

    cs.CV cs.AI cs.LG

    S-HR-VQVAE: Sequential Hierarchical Residual Learning Vector Quantized Variational Autoencoder for Video Prediction

    Authors: Mohammad Adiban, Kalin Stefanov, Sabato Marco Siniscalchi, Giampiero Salvi

    Abstract: We address the video prediction task by putting forth a novel model that combines (i) a novel hierarchical residual learning vector quantized variational autoencoder (HR-VQVAE), and (ii) a novel autoregressive spatiotemporal predictive model (AST-PM). We refer to this approach as a sequential hierarchical residual learning vector quantized variational autoencoder (S-HR-VQVAE). By leveraging the in…

    Submitted 19 November, 2024; v1 submitted 13 July, 2023; originally announced July 2023.

    Comments: 12 pages, 6 figures, 5 tables. Accepted for publication in IEEE Transactions on Multimedia on 2024-11-19

    ACM Class: I.2.10; I.4.10; I.4.5; I.4.2; I.2.6

  9. arXiv:2305.01979  [pdf, other]

    cs.CV

    Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization

    Authors: Zhixi Cai, Shreya Ghosh, Abhinav Dhall, Tom Gedeon, Kalin Stefanov, Munawar Hayat

    Abstract: Most deepfake detection methods focus on detecting spatial and/or spatio-temporal changes in facial attributes and are centered around the binary classification task of detecting whether a video is real or fake. This is because available benchmark datasets contain mostly visual-only modifications present in the entirety of the video. However, a sophisticated deepfake may include small segments of…

    Submitted 16 July, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: The paper is under consideration/review at Computer Vision and Image Understanding Journal

  10. arXiv:2211.06627  [pdf, other]

    cs.CV

    MARLIN: Masked Autoencoder for facial video Representation LearnINg

    Authors: Zhixi Cai, Shreya Ghosh, Kalin Stefanov, Abhinav Dhall, Jianfei Cai, Hamid Rezatofighi, Reza Haffari, Munawar Hayat

    Abstract: This paper proposes a self-supervised approach to learn universal facial representations from videos that can transfer across a variety of facial analysis tasks such as Facial Attribute Recognition (FAR), Facial Expression Recognition (FER), DeepFake Detection (DFD), and Lip Synchronization (LS). Our proposed framework, named MARLIN, is a facial video masked autoencoder that learns highly robust…

    Submitted 22 March, 2023; v1 submitted 12 November, 2022; originally announced November 2022.

    Comments: CVPR 2023

  11. arXiv:2208.04554  [pdf, other]

    cs.CV cs.LG

    Hierarchical Residual Learning Based Vector Quantized Variational Autoencoder for Image Reconstruction and Generation

    Authors: Mohammad Adiban, Kalin Stefanov, Sabato Marco Siniscalchi, Giampiero Salvi

    Abstract: We propose a multi-layer variational autoencoder method, which we call HR-VQVAE, that learns hierarchical discrete representations of the data. By utilizing a novel objective function, each layer in HR-VQVAE learns a discrete representation of the residual from previous layers through a vector quantized encoder. Furthermore, the representations at each layer are hierarchically linked to those at previou…

    Submitted 9 August, 2022; originally announced August 2022.

    Comments: 12 pages plus supplementary material. Submitted to BMVC 2022

    ACM Class: I.4; I.2

  12. arXiv:2207.08380  [pdf, other]

    cs.CV

    Visual Representations of Physiological Signals for Fake Video Detection

    Authors: Kalin Stefanov, Bhawna Paliwal, Abhinav Dhall

    Abstract: Realistic fake videos are a potential tool for spreading harmful misinformation given our increasing online presence and information intake. This paper presents a multimodal learning-based method for detection of real and fake videos. The method combines information from three modalities - audio, video, and physiology. We investigate two strategies for combining the video and physiology modalities…

    Submitted 18 July, 2022; originally announced July 2022.

  13. arXiv:2204.06228  [pdf, other]

    cs.CV

    Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization

    Authors: Zhixi Cai, Kalin Stefanov, Abhinav Dhall, Munawar Hayat

    Abstract: Due to its high societal impact, deepfake detection is getting active attention in the computer vision community. Most deepfake detection methods rely on identity, facial attributes, and adversarial perturbation-based spatio-temporal modifications at the whole video or random locations while keeping the meaning of the content intact. However, a sophisticated deepfake may contain only a small segme…

    Submitted 3 May, 2023; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: DICTA 2022

  14. arXiv:1803.11088  [pdf, other]

    cs.CV cs.HC

    Webcam-based Eye Gaze Tracking under Natural Head Movement

    Authors: Kalin Stefanov

    Abstract: This manuscript investigates and proposes a visual gaze tracker that tackles the problem using only an ordinary web camera and no prior knowledge in any sense (scene set-up, camera intrinsic and/or extrinsic parameters). The tracker we propose is based on the observation that our desire to grant the freedom of natural head movement to the user requires 3D modeling of the scene set-up. Although, us…

    Submitted 29 March, 2018; originally announced March 2018.

    Comments: MSc Thesis in Artificial Intelligence

  15. arXiv:1711.08992  [pdf, other]

    cs.CV cs.CL cs.HC cs.LG stat.ML

    Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition

    Authors: Kalin Stefanov, Jonas Beskow, Giampiero Salvi

    Abstract: This paper presents a self-supervised method for visual detection of the active speaker in a multi-person spoken interaction scenario. Active speaker detection is a fundamental prerequisite for any artificial cognitive system attempting to acquire language in social settings. The proposed method is intended to complement the acoustic detection of the active speaker, thus improving the system robus…

    Submitted 18 July, 2019; v1 submitted 24 November, 2017; originally announced November 2017.

    Comments: 10 pages, IEEE Transactions on Cognitive and Developmental Systems

    ACM Class: I.2; I.4; I.5