Search | arXiv e-print repository

Visual and audio scene classification for detecting discrepancies in video: a baseline method and experimental protocol

Authors: Konstantinos Apostolidis, Jakob Abesser, Luca Cuccovillo, Vasileios Mezaris

Abstract: This paper presents a baseline approach and an experimental protocol for a specific content verification problem: detecting discrepancies between the audio and video modalities in multimedia content. We first design and optimize an audio-visual scene classifier, to compare with existing classification baselines that use both modalities. Then, by applying this classifier separately to the audio and… ▽ More This paper presents a baseline approach and an experimental protocol for a specific content verification problem: detecting discrepancies between the audio and video modalities in multimedia content. We first design and optimize an audio-visual scene classifier, to compare with existing classification baselines that use both modalities. Then, by applying this classifier separately to the audio and the visual modality, we can detect scene-class inconsistencies between them. To facilitate further research and provide a common evaluation platform, we introduce an experimental protocol and a benchmark dataset simulating such inconsistencies. Our approach achieves state-of-the-art results in scene classification and promising outcomes in audio-visual discrepancies detection, highlighting its potential in content verification applications. △ Less

Submitted 1 May, 2024; originally announced May 2024.

Comments: Accepted for publication, 3rd ACM Int. Workshop on Multimedia AI against Disinformation (MAD'24) at ACM ICMR'24, June 10, 2024, Phuket, Thailand. This is the "accepted version"

arXiv:2209.07196 [pdf, other]

doi 10.1109/WIFS55849.2022.9975411

Environment Classification via Blind Roomprints Estimation

Authors: Malte Baum, Luca Cuccovillo, Artem Yaroshchuk, Patrick Aichroth

Abstract: In this paper we present a novel approach for environment classification for speech recordings, which does not require the selection of decaying reverberation tails. It is based on a multi-band RT60 analysis of blind channel estimates and achieves an accuracy of up to 93.6% on test recordings derived from the ACE corpus. In this paper we present a novel approach for environment classification for speech recordings, which does not require the selection of decaying reverberation tails. It is based on a multi-band RT60 analysis of blind channel estimates and achieves an accuracy of up to 93.6% on test recordings derived from the ACE corpus. △ Less

Submitted 26 January, 2023; v1 submitted 15 September, 2022; originally announced September 2022.

Journal ref: in IEEE International Workshop on Information Forensics and Security (WIFS), December 12-16, 2022, Shanghai, China, pp.1-6

arXiv:2209.07180 [pdf, ps, other]

doi 10.1109/WIFS55849.2022.9975433

Open Challenges in Synthetic Speech Detection

Authors: Luca Cuccovillo, Christoforos Papastergiopoulos, Anastasios Vafeiadis, Artem Yaroshchuk, Patrick Aichroth, Konstantinos Votis, Dimitrios Tzovaras

Abstract: In this paper the current status and open challenges of synthetic speech detection are addressed. The work comprises an initial analysis of available open datasets and of existing detection methods, a description of the requirements for new research datasets compliant with regulations and better representing real-case scenarios, and a discussion of the desired characteristics of future trustworthy… ▽ More In this paper the current status and open challenges of synthetic speech detection are addressed. The work comprises an initial analysis of available open datasets and of existing detection methods, a description of the requirements for new research datasets compliant with regulations and better representing real-case scenarios, and a discussion of the desired characteristics of future trustworthy detection methods in terms of both functional and non-functional requirements. Compared to other works, based on specific detection solutions or presenting single dataset of synthetic speeches, our paper is meant to orient future state-of-the-art research in the domain, to quickly lessen the current gap between synthesis and detection approaches. △ Less

Submitted 26 January, 2023; v1 submitted 15 September, 2022; originally announced September 2022.

Journal ref: in IEEE International Workshop on Information Forensics and Security (WIFS), December 12-16, 2022, Shanghai, China, pp.1-6

arXiv:2206.11640 [pdf, other]

doi 10.23919/EUSIPCO55093.2022.9909800

Speaker-Independent Microphone Identification in Noisy Conditions

Authors: Antonio Giganti, Luca Cuccovillo, Paolo Bestagini, Patrick Aichroth, Stefano Tubaro

Abstract: This work proposes a method for source device identification from speech recordings that applies neural-network-based denoising, to mitigate the impact of counter-forensics attacks using noise injection. The method is evaluated by comparing the impact of denoising on three state-of-the-art features for microphone classification, determining their discriminating power with and without denoising bei… ▽ More This work proposes a method for source device identification from speech recordings that applies neural-network-based denoising, to mitigate the impact of counter-forensics attacks using noise injection. The method is evaluated by comparing the impact of denoising on three state-of-the-art features for microphone classification, determining their discriminating power with and without denoising being applied. The proposed framework achieves a significant performance increase for noisy material, and more generally, validates the usefulness of applying denoising prior to device identification for noisy recordings. △ Less

Submitted 26 January, 2023; v1 submitted 23 June, 2022; originally announced June 2022.

Journal ref: in European Signal Processing Conference (EUSIPCO), Belgrade, Serbia, 2022, pp. 1047-1051

arXiv:2204.02841 [pdf, other]

doi 10.1145/3512732.3533586

Spectral Denoising for Microphone Classification

Authors: L. Cuccovillo, A. Giganti, P. Bestagini, P. Aichroth, S. Tubaro

Abstract: In this paper, we propose the use of denoising for microphone classification, to enable its usage for several key application domains that involve noisy conditions. We describe the proposed analysis pipeline and the baseline algorithm for microphone classification, and discuss various denoising approaches which can be applied to it within the time or spectral domain; finally, we determine the best… ▽ More In this paper, we propose the use of denoising for microphone classification, to enable its usage for several key application domains that involve noisy conditions. We describe the proposed analysis pipeline and the baseline algorithm for microphone classification, and discuss various denoising approaches which can be applied to it within the time or spectral domain; finally, we determine the best-performing denoising procedure, and evaluate the performance of the overall, integrated approach with several SNR levels of additive input noise. As a result, the proposed method achieves an average accuracy increase of about 25% on denoised content over the reference baseline. △ Less

Submitted 1 July, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

Journal ref: in ACM International Workshop on Multimedia AI against Disinformation (MAD), Newark, NJ, USA, 2022, pp. 10-17

Showing 1–5 of 5 results for author: Cuccovillo, L