Showing 1–7 of 7 results for author: Bhagtani, K

  1. arXiv:2411.17349  [pdf, other]

    cs.SD eess.AS

    Comparative Analysis of ASR Methods for Speech Deepfake Detection

    Authors: Davide Salvi, Amit Kumar Singh Yadav, Kratika Bhagtani, Viola Negroni, Paolo Bestagini, Edward J. Delp

    Abstract: Recent techniques for speech deepfake detection often rely on pre-trained self-supervised models. These systems, initially developed for Automatic Speech Recognition (ASR), have proved their ability to offer a meaningful representation of speech signals, which can benefit various tasks, including deepfake detection. In this context, pre-trained models serve as feature extractors and are used to ex…

    Submitted 26 November, 2024; originally announced November 2024.

    Comments: Published at Asilomar Conference on Signals, Systems, and Computers 2024

  2. arXiv:2409.13049  [pdf, other]

    eess.AS cs.CV cs.MM cs.SD

    DiffSSD: A Diffusion-Based Dataset For Speech Forensics

    Authors: Kratika Bhagtani, Amit Kumar Singh Yadav, Paolo Bestagini, Edward J. Delp

    Abstract: Diffusion-based speech generators are ubiquitous. These methods can generate very high quality synthetic speech and several recent incidents report their malicious use. To counter such misuse, synthetic speech detectors have been developed. Many of these detectors are trained on datasets which do not include diffusion-based synthesizers. In this paper, we demonstrate that existing detectors traine…

    Submitted 2 October, 2024; v1 submitted 19 September, 2024; originally announced September 2024.

    Comments: Submitted to IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2025

  3. arXiv:2404.10989  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS

    FairSSD: Understanding Bias in Synthetic Speech Detectors

    Authors: Amit Kumar Singh Yadav, Kratika Bhagtani, Davide Salvi, Paolo Bestagini, Edward J. Delp

    Abstract: Methods that can generate synthetic speech which is perceptually indistinguishable from speech recorded by a human speaker, are easily available. Several incidents report misuse of synthetic speech generated from these methods to commit fraud. To counter such misuse, many methods have been proposed to detect synthetic speech. Some of these detectors are more interpretable, can generalize to detect…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024 (WMF)

  4. arXiv:2402.14205  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS eess.SP

    Compression Robust Synthetic Speech Detection Using Patched Spectrogram Transformer

    Authors: Amit Kumar Singh Yadav, Ziyue Xiang, Kratika Bhagtani, Paolo Bestagini, Stefano Tubaro, Edward J. Delp

    Abstract: Many deep learning synthetic speech generation tools are readily available. The use of synthetic speech has caused financial fraud, impersonation of people, and misinformation to spread. For this reason forensic methods that can detect synthetic speech have been proposed. Existing methods often overfit on one dataset and their performance reduces substantially in practical scenarios such as detect…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Accepted as long oral paper at ICMLA 2023

  5. arXiv:2304.03323  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    DSVAE: Interpretable Disentangled Representation for Synthetic Speech Detection

    Authors: Amit Kumar Singh Yadav, Kratika Bhagtani, Ziyue Xiang, Paolo Bestagini, Stefano Tubaro, Edward J. Delp

    Abstract: Tools to generate high quality synthetic speech signal that is perceptually indistinguishable from speech recorded from human speakers are easily available. Several approaches have been proposed for detecting synthetic speech. Many of these approaches use deep learning methods as a black box without providing reasoning for the decisions they make. This limits the interpretability of these approach…

    Submitted 28 July, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

  6. arXiv:2211.14609  [pdf, other]

    cs.HC

    BEAMERS: Brain-Engaged, Active Music-based Emotion Regulation System

    Authors: Jiyang Li, Wei Wang, Kratika Bhagtani, Yincheng Jin, Zhanpeng Jin

    Abstract: With the increasing demands of emotion comprehension and regulation in our daily life, a customized music-based emotion regulation system is introduced by employing current EEG information and song features, which predicts users' emotion variation in the valence-arousal model before recommending music. The work shows that: (1) a novel music-based emotion regulation system with a commercial EEG dev…

    Submitted 26 November, 2022; originally announced November 2022.

  7. arXiv:2204.12067  [pdf, other]

    cs.CV cs.MM

    An Overview of Recent Work in Media Forensics: Methods and Threats

    Authors: Kratika Bhagtani, Amit Kumar Singh Yadav, Emily R. Bartusiak, Ziyue Xiang, Ruiting Shao, Sriram Baireddy, Edward J. Delp

    Abstract: In this paper, we review recent work in media forensics for digital images, video, audio (specifically speech), and documents. For each data modality, we discuss synthesis and manipulation techniques that can be used to create and modify digital media. We then review technological advancements for detecting and quantifying such manipulations. Finally, we consider open issues and suggest directions…

    Submitted 12 May, 2022; v1 submitted 26 April, 2022; originally announced April 2022.

    Comments: This is a longer version of a paper accepted to the 2022 IEEE International Conference on Multimedia Information Processing and Retrieval, entitled "An Overview of Recent Work in Multimedia Forensics"