Search | arXiv e-print repository

Visual and audio scene classification for detecting discrepancies in video: a baseline method and experimental protocol

Authors: Konstantinos Apostolidis, Jakob Abesser, Luca Cuccovillo, Vasileios Mezaris

Abstract: This paper presents a baseline approach and an experimental protocol for a specific content verification problem: detecting discrepancies between the audio and video modalities in multimedia content. We first design and optimize an audio-visual scene classifier, to compare with existing classification baselines that use both modalities. Then, by applying this classifier separately to the audio and the visual modality, we can detect scene-class inconsistencies between them. To facilitate further research and provide a common evaluation platform, we introduce an experimental protocol and a benchmark dataset simulating such inconsistencies. Our approach achieves state-of-the-art results in scene classification and promising outcomes in audio-visual discrepancies detection, highlighting its potential in content verification applications.

Submitted 1 May, 2024; originally announced May 2024.

Comments: Accepted for publication, 3rd ACM Int. Workshop on Multimedia AI against Disinformation (MAD'24) at ACM ICMR'24, June 10, 2024, Phuket, Thailand. This is the "accepted version"

arXiv:2312.02616 [pdf, other]

Facilitating the Production of Well-tailored Video Summaries for Sharing on Social Media

Authors: Evlampios Apostolidis, Konstantinos Apostolidis, Vasileios Mezaris

Abstract: This paper presents a web-based tool that facilitates the production of tailored summaries for online sharing on social media. Through an interactive user interface, it supports a "one-click" video summarization process. Based on the integrated AI models for video summarization and aspect ratio transformation, it facilitates the generation of multiple summaries of a full-length video according to the needs of target platforms with regard to the video's length and aspect ratio.

Submitted 5 December, 2023; originally announced December 2023.

Comments: Accepted for publication, 30th Int. Conf. on MultiMedia Modeling (MMM 2024), Amsterdam, NL, Jan.-Feb. 2024. This is the "submitted manuscript" version

arXiv:2101.10141 [pdf]

Performance Evaluation of Convolutional Neural Networks for Gait Recognition

Authors: K. D. Apostolidis, P. S. Amanatidis, G. A. Papakostas

Abstract: In this paper, a performance evaluation of well-known deep learning models in gait recognition is presented. For this purpose, the transfer learning scheme is adopted to pre-trained models in order to fit the models to the CASIA-B dataset for solving a gait recognition task. In this context, 18 popular Convolutional Neural Networks (CNNs), were re-trained using Gait Energy Images (GEIs) of CASIA-B containing almost 14000 images of 124 classes under various conditions, and their performance was studied in terms of accuracy. Moreover, the performance of the studied models is managed to be explained by examining the parts of the images being considered by the models towards providing their decisions. The experimental results are very promising since almost all the models achieved a high accuracy of over 90%, which is robust to the increasing number of classes. Furthermore, an important outcome of this study is the fact that a recognition problem can be effectively solved by using CNNs pre-trained to different problems, thus eliminating the need for customized model design.

Submitted 25 January, 2021; originally announced January 2021.

Comments: 6 pages, 15 figures, to be published in proceedings of the 24th Pan-Hellenic Conference on Informatics (PCI)

ACM Class: I.4; I.2.10

arXiv:1608.06770 [pdf, other]

Automatic Synchronization of Multi-User Photo Galleries

Authors: E. Sansone, K. Apostolidis, N. Conci, G. Boato, V. Mezaris, F. G. B. De Natale

Abstract: In this paper we address the issue of photo galleries synchronization, where pictures related to the same event are collected by different users. Existing solutions to address the problem are usually based on unrealistic assumptions, like time consistency across photo galleries, and often heavily rely on heuristics, limiting therefore the applicability to real-world scenarios. We propose a solution that achieves better generalization performance for the synchronization task compared to the available literature. The method is characterized by three stages: at first, deep convolutional neural network features are used to assess the visual similarity among the photos; then, pairs of similar photos are detected across different galleries and used to construct a graph; eventually, a probabilistic graphical model is used to estimate the temporal offset of each pair of galleries, by traversing the minimum spanning tree extracted from this graph. The experimental evaluation is conducted on four publicly available datasets covering different types of events, demonstrating the strength of our proposed method. A thorough discussion of the obtained results is provided for a critical assessment of the quality in synchronization.

Submitted 16 January, 2017; v1 submitted 24 August, 2016; originally announced August 2016.

Comments: Accepted to IEEE Transactions on Multimedia

Showing 1–4 of 4 results for author: Apostolidis, K