Showing 1–6 of 6 results for author: Koppisetti, S

Searching in archive cs.
  1. arXiv:2506.03425

    eess.AS cs.AI cs.LG

    A Data-Driven Diffusion-based Approach for Audio Deepfake Explanations

    Authors: Petr Grinberg, Ankur Kumar, Surya Koppisetti, Gaurav Bharaj

    Abstract: Evaluating explainability techniques, such as SHAP and LRP, in the context of audio deepfake detection is challenging due to lack of clear ground truth annotations. In the cases when we are able to obtain the ground truth, we find that these methods struggle to provide accurate explanations. In this work, we propose a novel data-driven approach to identify artifact regions in deepfake audio. We co…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 5 pages, 3 figures, accepted at Interspeech 2025

  2. What Does an Audio Deepfake Detector Focus on? A Study in the Time Domain

    Authors: Petr Grinberg, Ankur Kumar, Surya Koppisetti, Gaurav Bharaj

    Abstract: Adding explanations to audio deepfake detection (ADD) models will boost their real-world application by providing insight on the decision making process. In this paper, we propose a relevancy-based explainable AI (XAI) method to analyze the predictions of transformer-based ADD models. We compare against standard Grad-CAM and SHAP-based methods, using quantitative faithfulness metrics as well as a…

    Submitted 27 January, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

    Comments: Accepted to ICASSP 2025

    Journal ref: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2025, pp. 1-5

  3. arXiv:2410.07379

    eess.AS cs.AI cs.CL

    Learn from Real: Reality Defender's Submission to ASVspoof5 Challenge

    Authors: Yi Zhu, Chirag Goel, Surya Koppisetti, Trang Tran, Ankur Kumar, Gaurav Bharaj

    Abstract: Audio deepfake detection is crucial to combat the malicious use of AI-synthesized speech. Among many efforts undertaken by the community, the ASVspoof challenge has become one of the benchmarks to evaluate the generalizability and robustness of detection models. In this paper, we present Reality Defender's submission to the ASVspoof5 challenge, highlighting a novel pretraining strategy which signi…

    Submitted 9 October, 2024; originally announced October 2024.

    Comments: Accepted into ASVspoof5 workshop

  4. arXiv:2407.18517

    cs.SD cs.AI eess.AS

    SLIM: Style-Linguistics Mismatch Model for Generalized Audio Deepfake Detection

    Authors: Yi Zhu, Surya Koppisetti, Trang Tran, Gaurav Bharaj

    Abstract: Audio deepfake detection (ADD) is crucial to combat the misuse of speech synthesized from generative AI models. Existing ADD models suffer from generalization issues, with a large performance discrepancy between in-domain and out-of-domain data. Moreover, the black-box nature of existing models limits their use in real-world scenarios, where explanations are required for model decisions. To allevi…

    Submitted 26 July, 2024; originally announced July 2024.

  5. Towards Attention-based Contrastive Learning for Audio Spoof Detection

    Authors: Chirag Goel, Surya Koppisetti, Ben Colman, Ali Shahriyari, Gaurav Bharaj

    Abstract: Vision transformers (ViT) have made substantial progress for classification tasks in computer vision. Recently, Gong et al. '21 introduced attention-based modeling for several audio tasks. However, relatively unexplored is the use of a ViT for audio spoof detection task. We bridge this gap and introduce ViTs for this task. A vanilla baseline built on fine-tuning the SSAST (Gong et al. '22) audi…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Proc. INTERSPEECH 2023

  6. arXiv:2406.02951

    cs.CV cs.MM cs.SD eess.AS

    AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection

    Authors: Trevine Oorloff, Surya Koppisetti, Nicolò Bonettini, Divyaraj Solanki, Ben Colman, Yaser Yacoob, Ali Shahriyari, Gaurav Bharaj

    Abstract: With the rapid growth in deepfake video content, we require improved and generalizable methods to detect them. Most existing detection methods either use uni-modal cues or rely on supervised training to capture the dissonance between the audio and visual modalities. While the former disregards the audio-visual correspondences entirely, the latter predominantly focuses on discerning audio-visual cu…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR 2024