-
Spotting tell-tale visual artifacts in face swapping videos: strengths and pitfalls of CNN detectors
Authors:
Riccardo Ziglio,
Cecilia Pasquini,
Silvio Ranise
Abstract:
Face swapping manipulations in video streams represent an increasing threat in remote video communications, due to advances
in automated and real-time tools. Recent literature proposes to characterize and exploit visual artifacts introduced in video frames
by swapping algorithms when dealing with challenging physical scenes, such as face occlusions. This paper investigates the
effectiveness of this approach by benchmarking CNN-based data-driven models on two data corpora (including a newly collected
one) and analyzing generalization capabilities with respect to different acquisition sources and swapping algorithms. The results
confirm excellent performance of general-purpose CNN architectures when operating within the same data source, but a significant
difficulty in robustly characterizing occlusion-based visual cues across datasets. This highlights the need for specialized detection
strategies to deal with such artifacts.
Submitted 19 June, 2025;
originally announced June 2025.
-
GPU-accelerated SIFT-aided source identification of stabilized videos
Authors:
Andrea Montibeller,
Cecilia Pasquini,
Giulia Boato,
Stefano Dell'Anna,
Fernando Pérez-González
Abstract:
Video stabilization is an in-camera processing step commonly applied by modern acquisition devices. While it significantly improves the visual quality of the resulting videos, it has been shown that such an operation typically hinders the forensic analysis of video signals. In fact, the correct identification of the acquisition source, usually based on Photo Response Non-Uniformity (PRNU), is subject to the estimation of the transformation applied to each frame in the stabilization phase. A number of techniques have been proposed for dealing with this problem, which however typically suffer from a high computational burden due to the grid search in the space of inversion parameters. Our work attempts to alleviate these shortcomings by exploiting the parallelization capabilities of Graphics Processing Units (GPUs), typically used for deep learning applications, in the framework of stabilized frame inversion. Moreover, we propose to exploit SIFT features to estimate the camera momentum and to identify less stabilized temporal segments, thus enabling a more accurate identification analysis, and to efficiently initialize the frame-wise parameter search of consecutive frames. Experiments on a consolidated benchmark dataset confirm the effectiveness of the proposed approach in reducing the required computational time and improving the source identification accuracy. The code is available at https://github.com/AMontiB/GPU-PRNU-SIFT.
Submitted 29 July, 2022;
originally announced July 2022.
-
Multi-clue reconstruction of sharing chains for social media images
Authors:
Sebastiano Verde,
Cecilia Pasquini,
Federica Lago,
Alessandro Goller,
Francesco GB De Natale,
Alessandro Piva,
Giulia Boato
Abstract:
The amount of multimedia content shared every day, combined with the level of realism reached by recent fake-generating technologies, threatens to impair the trustworthiness of online information sources. The process of uploading and sharing data tends to hinder standard media forensic analyses, since multiple re-sharing steps progressively hide the traces of past manipulations. At the same time, though, new traces are introduced by the platforms themselves, enabling the reconstruction of the sharing history of digital objects, with possible applications in information flow monitoring and source identification. In this work, we propose a supervised framework for the reconstruction of image sharing chains on social media platforms. The system is structured as a cascade of backtracking blocks, each of them tracing back one step of the sharing chain at a time. Blocks are designed as ensembles of classifiers trained to analyse the input image independently from one another by leveraging different feature representations that describe both content and container of the media object. Individual decisions are then combined by a late fusion strategy. Results highlight the advantages of employing multiple clues, which allow accurately tracing back up to three steps along the sharing chain.
Submitted 5 August, 2021;
originally announced August 2021.
-
More Real than Real: A Study on Human Visual Perception of Synthetic Faces
Authors:
Federica Lago,
Cecilia Pasquini,
Rainer Böhme,
Hélène Dumont,
Valérie Goffaux,
Giulia Boato
Abstract:
Deep fakes have become extremely popular in recent years, also thanks to their increasing realism. Therefore, there is a need to measure humans' ability to distinguish between real and synthetic face images when confronted with cutting-edge creation technologies. We describe the design and results of a perceptual experiment we have conducted, where a wide and diverse group of volunteers has been exposed to synthetic face images produced by state-of-the-art Generative Adversarial Networks (namely, PG-GAN, StyleGAN, StyleGAN2). The experiment outcomes reveal how strongly we should call into question our human ability to discriminate real faces from synthetic ones generated through modern AI.
Submitted 20 October, 2021; v1 submitted 14 June, 2021;
originally announced June 2021.
-
Dynamic texture analysis for detecting fake faces in video sequences
Authors:
Mattia Bonomi,
Cecilia Pasquini,
Giulia Boato
Abstract:
The creation of manipulated multimedia content involving human characters has reached unprecedented realism in recent years, calling for automated techniques to expose synthetically generated faces in images and videos. This work explores the analysis of spatio-temporal texture dynamics of the video signal, with the goal of characterizing and distinguishing real and fake sequences. We propose to build a binary decision on the joint analysis of multiple temporal segments and, in contrast to previous approaches, to exploit the textural dynamics of both the spatial and temporal dimensions. This is achieved through the use of Local Derivative Patterns on Three Orthogonal Planes (LDP-TOP), a compact feature representation known to be an important asset for the detection of face spoofing attacks. Experimental analyses on state-of-the-art datasets of manipulated videos show the discriminative power of such descriptors in separating real and fake sequences, and also in identifying the creation method used. Linear Support Vector Machines (SVMs) are used which, despite their lower complexity, yield performance comparable to previously proposed deep models for fake content detection.
Submitted 30 July, 2020;
originally announced July 2020.
-
Detecting Adversarial Examples - A Lesson from Multimedia Forensics
Authors:
Pascal Schöttle,
Alexander Schlögl,
Cecilia Pasquini,
Rainer Böhme
Abstract:
Adversarial classification is the task of performing robust classification in the presence of a strategic attacker. Originating from information hiding and multimedia forensics, adversarial classification recently received a lot of attention in a broader security context. In the domain of machine learning-based image classification, adversarial classification can be interpreted as detecting so-called adversarial examples, which are slightly altered versions of benign images. They are specifically crafted to be misclassified with a very high probability by the classifier under attack. Neural networks, which dominate among modern image classifiers, have been shown to be especially vulnerable to these adversarial examples.
However, detecting subtle changes in digital images has always been the goal of multimedia forensics and steganalysis. In this paper, we highlight the parallels between these two fields and secure machine learning.
Furthermore, we adapt a linear filter, similar to early steganalysis methods, to detect adversarial examples that are generated with the projected gradient descent (PGD) method, the state-of-the-art algorithm for this task. We test our method on the MNIST database and show for several parameter combinations of PGD that our method can reliably detect adversarial examples.
Additionally, the combination of adversarial re-training and our detection method effectively reduces the attack surface of attacks against neural networks. Thus, we conclude that adversarial examples for image classification possibly do not withstand detection methods from steganalysis, and future work should explore the effectiveness of known techniques from multimedia forensics in other adversarial settings.
Submitted 9 March, 2018;
originally announced March 2018.