Autonomous and Self-Adapting System for Synthetic Media Detection and Attribution

Authors: Aref Azizpour, Tai D. Nguyen, Matthew C. Stamm

Abstract: Rapid advances in generative AI have enabled the creation of highly realistic synthetic images, which, while beneficial in many domains, also pose serious risks in terms of disinformation, fraud, and other malicious applications. Current synthetic image identification systems are typically static, relying on feature representations learned from known generators; as new generative models emerge, these systems suffer from severe performance degradation. In this paper, we introduce the concept of an autonomous self-adaptive synthetic media identification system -- one that not only detects synthetic images and attributes them to known sources but also autonomously identifies and incorporates novel generators without human intervention. Our approach leverages an open-set identification strategy with an evolvable embedding space that distinguishes between known and unknown sources. By employing an unsupervised clustering method to aggregate unknown samples into high-confidence clusters and continuously refining its decision boundaries, our system maintains robust detection and attribution performance even as the generative landscape evolves. Extensive experiments demonstrate that our method significantly outperforms existing approaches, marking a crucial step toward universal, adaptable forensic systems in the era of rapidly advancing generative models.

Submitted 4 April, 2025; originally announced April 2025.

arXiv:2503.21003

Forensic Self-Descriptions Are All You Need for Zero-Shot Detection, Open-Set Source Attribution, and Clustering of AI-generated Images

Authors: Tai D. Nguyen, Aref Azizpour, Matthew C. Stamm

Abstract: The emergence of advanced AI-based tools to generate realistic images poses significant challenges for forensic detection and source attribution, especially as new generative techniques appear rapidly. Traditional methods often fail to generalize to unseen generators due to reliance on features specific to known sources during training. To address this problem, we propose a novel approach that explicitly models forensic microstructures - subtle, pixel-level patterns unique to the image creation process. Using only real images in a self-supervised manner, we learn a set of diverse predictive filters to extract residuals that capture different aspects of these microstructures. By jointly modeling these residuals across multiple scales, we obtain a compact model whose parameters constitute a unique forensic self-description for each image. This self-description enables us to perform zero-shot detection of synthetic images, open-set source attribution of images, and clustering based on source without prior knowledge. Extensive experiments demonstrate that our method achieves superior accuracy and adaptability compared to competing techniques, advancing the state of the art in synthetic media forensics.

Submitted 26 March, 2025; originally announced March 2025.

Comments: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025

arXiv:2503.20991

MVFNet: Multipurpose Video Forensics Network using Multiple Forms of Forensic Evidence

Authors: Tai D. Nguyen, Matthew C. Stamm

Abstract: While videos can be falsified in many different ways, most existing forensic networks are specialized to detect only a single manipulation type (e.g. deepfake, inpainting). This poses a significant issue as the manipulation used to falsify a video is not known a priori. To address this problem, we propose MVFNet - a multipurpose video forensics network capable of detecting multiple types of manipulations including inpainting, deepfakes, splicing, and editing. Our network does this by extracting and jointly analyzing a broad set of forensic feature modalities that capture both spatial and temporal anomalies in falsified videos. To reliably detect and localize fake content of all shapes and sizes, our network employs a novel Multi-Scale Hierarchical Transformer module to identify forensic inconsistencies across multiple spatial scales. Experimental results show that our network obtains state-of-the-art performance in general scenarios where multiple different manipulations are possible, and rivals specialized detectors in targeted scenarios.

Submitted 26 March, 2025; originally announced March 2025.

Comments: Proceedings of the Winter Conference on Applications of Computer Vision (WACV) 2025

arXiv:2404.15955

Beyond Deepfake Images: Detecting AI-Generated Videos

Authors: Danial Samadi Vahdati, Tai D. Nguyen, Aref Azizpour, Matthew C. Stamm

Abstract: Recent advances in generative AI have led to the development of techniques to generate visually realistic synthetic video. While a number of techniques have been developed to detect AI-generated synthetic images, in this paper we show that synthetic image detectors are unable to detect synthetic videos. We demonstrate that this is because synthetic video generators introduce substantially different traces than those left by image generators. Despite this, we show that synthetic video traces can be learned, and used to perform reliable synthetic video detection or generator source attribution even after H.264 re-compression. Furthermore, we demonstrate that while detecting videos from new generators through zero-shot transferability is challenging, accurate detection of videos from a new generator can be achieved through few-shot learning.

Submitted 24 April, 2024; originally announced April 2024.

Comments: To be published in CVPRW24

arXiv:2404.08814

E3: Ensemble of Expert Embedders for Adapting Synthetic Image Detectors to New Generators Using Limited Data

Authors: Aref Azizpour, Tai D. Nguyen, Manil Shrestha, Kaidi Xu, Edward Kim, Matthew C. Stamm

Abstract: As generative AI progresses rapidly, new synthetic image generators continue to emerge at a swift pace. Traditional detection methods face two main challenges in adapting to these generators: the forensic traces of synthetic images from new techniques can vastly differ from those learned during training, and access to data for these new generators is often limited. To address these issues, we introduce the Ensemble of Expert Embedders (E3), a novel continual learning framework for updating synthetic image detectors. E3 enables the accurate detection of images from newly emerged generators using minimal training data. Our approach does this by first employing transfer learning to develop a suite of expert embedders, each specializing in the forensic traces of a specific generator. Then, all embeddings are jointly analyzed by an Expert Knowledge Fusion Network to produce accurate and reliable detection decisions. Our experiments demonstrate that E3 outperforms existing continual learning methods, including those developed specifically for synthetic image detection.

Submitted 16 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

Comments: 11 pages, 4 figures, To be published in CVPRWMF24

arXiv:2308.11557

Open Set Synthetic Image Source Attribution

Authors: Shengbang Fang, Tai D. Nguyen, Matthew C. Stamm

Abstract: AI-generated images have become increasingly realistic and have garnered significant public attention. While synthetic images are intriguing due to their realism, they also pose an important misinformation threat. To address this new threat, researchers have developed multiple algorithms to detect synthetic images and identify their source generators. However, most existing source attribution techniques are designed to operate in a closed-set scenario, i.e. they can only be used to discriminate between known image generators. By contrast, new image-generation techniques are rapidly emerging. To contend with this, there is a great need for open-set source attribution techniques that can identify when synthetic images have originated from new, unseen generators. To address this problem, we propose a new metric learning-based approach. Our technique works by learning transferrable embeddings capable of discriminating between generators, even when they are not seen during training. An image is first assigned to a candidate generator, then is accepted or rejected based on its distance in the embedding space from known generators' learned reference points. Importantly, we identify that initializing our source attribution embedding network by pretraining it on image camera identification can improve our embeddings' transferability. Through a series of experiments, we demonstrate our approach's ability to attribute the source of synthetic images in open-set scenarios.

Submitted 22 August, 2023; originally announced August 2023.
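
Illustrative note: the accept-or-reject step described in the abstract above can be sketched roughly as follows. This is a minimal sketch, not the authors' implementation. It assumes Euclidean distance and a single global threshold, neither of which the abstract confirms, and the function name `attribute_open_set`, the generator names, and the 2-D embeddings are hypothetical.

```python
import math


def attribute_open_set(embedding, reference_points, threshold):
    """Assign an embedding to the nearest known generator, or reject it.

    embedding: sequence of floats (hypothetical learned embedding).
    reference_points: dict mapping generator name -> reference embedding.
    threshold: assumed maximum distance for accepting the candidate.
    Returns (name, distance); name is None when the sample is rejected
    as coming from an unknown generator.
    """
    best_name, best_dist = None, math.inf
    # Step 1: pick the candidate generator (nearest reference point).
    for name, ref in reference_points.items():
        dist = math.dist(embedding, ref)  # Euclidean distance (assumption)
        if dist < best_dist:
            best_name, best_dist = name, dist
    # Step 2: accept the candidate only if it is close enough.
    if best_dist <= threshold:
        return best_name, best_dist
    return None, best_dist


# Hypothetical reference points for two known generators.
references = {"generator_a": (0.0, 0.0), "generator_b": (3.0, 4.0)}
print(attribute_open_set((0.1, 0.0), references, threshold=1.0))
print(attribute_open_set((10.0, 10.0), references, threshold=1.0))
```

In this sketch a rejected sample is simply reported as unknown; how the threshold is chosen is left open because the abstract does not say.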

arXiv:2305.05784

Comprehensive Dataset of Synthetic and Manipulated Overhead Imagery for Development and Evaluation of Forensic Tools

Authors: Brandon B. May, Kirill Trapeznikov, Shengbang Fang, Matthew C. Stamm

Abstract: We present a first-of-its-kind dataset of overhead imagery for development and evaluation of forensic tools. Our dataset consists of real, fully synthetic and partially manipulated overhead imagery generated from a custom diffusion model trained on two sets of different zoom levels and on two sources of pristine data. We developed our model to support controllable generation of multiple manipulation categories including fully synthetic imagery conditioned on real and generated base maps, and location. We also support partially in-painted imagery with the same conditioning options and with several types of manipulated content. The data consist of raw images and ground truth annotations describing the manipulation parameters. We also report benchmark performance on several tasks supported by our dataset including detection of fully and partially manipulated imagery, manipulation localization and classification.

Submitted 9 May, 2023; originally announced May 2023.

arXiv:2211.15775

VideoFACT: Detecting Video Forgeries Using Attention, Scene Context, and Forensic Traces

Authors: Tai D. Nguyen, Shengbang Fang, Matthew C. Stamm

Abstract: Fake videos represent an important misinformation threat. While existing forensic networks have demonstrated strong performance on image forgeries, recent results reported on the Adobe VideoSham dataset show that these networks fail to identify fake content in videos. In this paper, we show that this is due to video coding, which introduces local variation into forensic traces. In response, we propose VideoFACT - a new network that is able to detect and localize a wide variety of video forgeries and manipulations. To overcome challenges that existing networks face when analyzing videos, our network utilizes forensic embeddings to capture traces left by manipulation, context embeddings to control for variation in forensic traces introduced by video coding, and a deep self-attention mechanism to estimate the quality and relative importance of local forensic embeddings. We create several new video forgery datasets and use these, along with publicly available data, to experimentally evaluate our network's performance. These results show that our proposed network is able to identify a diverse set of video forgeries, including those not encountered during training. Furthermore, we show that our network can be fine-tuned to achieve even stronger performance on challenging AI-based manipulations.

Submitted 10 March, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

arXiv:2211.12314

Attacking Image Splicing Detection and Localization Algorithms Using Synthetic Traces

Authors: Shengbang Fang, Matthew C. Stamm

Abstract: Recent advances in deep learning have enabled forensics researchers to develop a new class of image splicing detection and localization algorithms. These algorithms identify spliced content by detecting localized inconsistencies in forensic traces using Siamese neural networks, either explicitly during analysis or implicitly during training. At the same time, deep learning has enabled new forms of anti-forensic attacks, such as adversarial examples and generative adversarial network (GAN) based attacks. Thus far, however, no anti-forensic attack has been demonstrated against image splicing detection and localization algorithms. In this paper, we propose a new GAN-based anti-forensic attack that is able to fool state-of-the-art splicing detection and localization algorithms such as EXIF-Net, Noiseprint, and Forensic Similarity Graphs. This attack operates by adversarially training an anti-forensic generator against a set of Siamese neural networks so that it is able to create synthetic forensic traces. Under analysis, these synthetic traces appear authentic and are self-consistent throughout an image. Through a series of experiments, we demonstrate that our attack is capable of fooling forensic splicing detection and localization algorithms without introducing visually detectable artifacts into an attacked image. Additionally, we demonstrate that our attack outperforms existing alternative attack approaches.

Submitted 22 November, 2022; originally announced November 2022.

arXiv:2209.08000

TIMIT-TTS: a Text-to-Speech Dataset for Multimodal Synthetic Media Detection

Authors: Davide Salvi, Brian Hosler, Paolo Bestagini, Matthew C. Stamm, Stefano Tubaro

Abstract: With the rapid development of deep learning techniques, the generation and counterfeiting of multimedia material are becoming increasingly straightforward to perform. At the same time, sharing fake content on the web has become so simple that malicious users can create unpleasant situations with minimal effort. Also, forged media are getting more and more complex, with manipulated videos that are taking the scene over still images. The multimedia forensic community has addressed the possible threats that this situation could imply by developing detectors that verify the authenticity of multimedia objects. However, the vast majority of these tools only analyze one modality at a time. This was not a problem as long as still images were considered the most widely edited media, but now, since manipulated videos are becoming customary, performing monomodal analyses could be reductive. Nonetheless, there is a lack in the literature regarding multimodal detectors, mainly due to the scarcity of datasets containing forged multimodal data to train and test the designed algorithms. In this paper we focus on the generation of an audio-visual deepfake dataset. First, we present a general pipeline for synthesizing speech deepfake content from a given real or fake video, facilitating the creation of counterfeit multimodal material. The proposed method uses Text-to-Speech (TTS) and Dynamic Time Warping techniques to achieve realistic speech tracks. Then, we use the pipeline to generate and release TIMIT-TTS, a synthetic speech dataset containing the most cutting-edge methods in the TTS field. This can be used as a standalone audio dataset, or combined with other state-of-the-art sets to perform multimodal research. Finally, we present numerous experiments to benchmark the proposed dataset in both mono and multimodal conditions, showing the need for multimodal forensic detectors and more suitable data.

Submitted 16 September, 2022; originally announced September 2022.

arXiv:2104.12069

Making Generated Images Hard To Spot: A Transferable Attack On Synthetic Image Detectors

Authors: Xinwei Zhao, Matthew C. Stamm

Abstract: Visually realistic GAN-generated images have recently emerged as an important misinformation threat. Research has shown that these synthetic images contain forensic traces that are readily identifiable by forensic detectors. Unfortunately, these detectors are built upon neural networks, which are vulnerable to recently developed adversarial attacks. In this paper, we propose a new anti-forensic attack capable of fooling GAN-generated image detectors. Our attack uses an adversarially trained generator to synthesize traces that these detectors associate with real images. Furthermore, we propose a technique to train our attack so that it can achieve transferability, i.e. it can fool unknown CNNs that it was not explicitly trained against. We evaluate our attack through an extensive set of experiments, where we show that our attack can fool eight state-of-the-art detection CNNs with synthetic images created using seven different GANs, and outperform other alternative attacks.

Submitted 22 June, 2022; v1 submitted 25 April, 2021; originally announced April 2021.

Journal ref: International Conference on Pattern Recognition, August 2022, Montréal Québec

arXiv:2101.11081

The Effect of Class Definitions on the Transferability of Adversarial Attacks Against Forensic CNNs

Authors: Xinwei Zhao, Matthew C. Stamm

Abstract: In recent years, convolutional neural networks (CNNs) have been widely used by researchers to perform forensic tasks such as image tampering detection. At the same time, adversarial attacks have been developed that are capable of fooling CNN-based classifiers. Understanding the transferability of adversarial attacks, i.e. an attack's ability to attack a different CNN than the one it was trained against, has important implications for designing CNNs that are resistant to attacks. While attacks on object recognition CNNs are believed to be transferrable, recent work by Barni et al. has shown that attacks on forensic CNNs have difficulty transferring to other CNN architectures or CNNs trained using different datasets. In this paper, we demonstrate that adversarial attacks on forensic CNNs are even less transferrable than previously thought, even between virtually identical CNN architectures! We show that several common adversarial attacks against CNNs trained to identify image manipulation fail to transfer to CNNs whose only difference is in the class definitions (i.e. the same CNN architectures trained using the same data). We note that all formulations of class definitions contain the unaltered class. This has important implications for the future design of forensic CNNs that are robust to adversarial and anti-forensic attacks.

Submitted 26 January, 2021; originally announced January 2021.

Journal ref: Published at Electronic Imaging, Media Watermarking, Security, and Forensics 2020, pp. 119-1-119-7(7)

arXiv:2101.11060

Defenses Against Multi-Sticker Physical Domain Attacks on Classifiers

Authors: Xinwei Zhao, Matthew C. Stamm

Abstract: Recently, physical domain adversarial attacks have drawn significant attention from the machine learning community. One important attack proposed by Eykholt et al. can fool a classifier by placing black and white stickers on an object such as a road sign. While this attack may pose a significant threat to visual classifiers, there are currently no defenses designed to protect against this attack. In this paper, we propose new defenses that can protect against multi-sticker attacks. We present defensive strategies capable of operating when the defender has full, partial, and no prior information about the attack. By conducting extensive experiments, we show that our proposed defenses can outperform existing defenses against physical attacks when presented with a multi-sticker attack.

Submitted 26 January, 2021; originally announced January 2021.

Journal ref: Published in European Conference on Computer Vision 2020, pages 202-219, Springer

arXiv:2101.09568

A Transferable Anti-Forensic Attack on Forensic CNNs Using A Generative Adversarial Network

Authors: Xinwei Zhao, Chen Chen, Matthew C. Stamm

Abstract: With the development of deep learning, convolutional neural networks (CNNs) have become widely used in multimedia forensics for tasks such as detecting and identifying image forgeries. Meanwhile, anti-forensic attacks have been developed to fool these CNN-based forensic algorithms. Previous anti-forensic attacks often were designed to remove forgery traces left by a single manipulation operation as opposed to a set of manipulations. Additionally, recent research has shown that existing anti-forensic attacks against forensic CNNs have poor transferability, i.e. they are unable to fool other forensic CNNs that were not explicitly used during training. In this paper, we propose a new anti-forensic attack framework designed to remove forensic traces left by a variety of manipulation operations. This attack is transferable, i.e. it can be used to attack forensic CNNs that are unknown to the attacker, and it introduces only minimal distortions that are imperceptible to human eyes. Our proposed attack utilizes a generative adversarial network (GAN) to build a generator that can attack color images of any size. We achieve attack transferability through the use of a new training strategy and loss function. We conduct extensive experiments to demonstrate that our attack can fool many state-of-the-art forensic CNNs with varying levels of knowledge available to the attacker.

Submitted 23 January, 2021; originally announced January 2021.

Comments: Submitted to IEEE Transactions on Information Forensics and Security

arXiv:1912.02861

doi 10.1109/JSTSP.2020.3001516

Exposing Fake Images with Forensic Similarity Graphs

Authors: Owen Mayer, Matthew C. Stamm

Abstract: We propose new image forgery detection and localization algorithms by recasting these problems as graph-based community detection problems. To do this, we introduce a novel abstract, graph-based representation of an image, which we call the Forensic Similarity Graph, that captures key forensic relationships among regions in the image. In this representation, small image patches are represented by graph vertices with edges assigned according to the forensic similarity between patches. Localized tampering introduces unique structure into this graph, which aligns with a concept called "community structure" in graph-theory literature. In the Forensic Similarity Graph, communities correspond to the tampered and unaltered regions in the image. As a result, forgery detection is performed by identifying whether multiple communities exist, and forgery localization is performed by partitioning these communities. We present two community detection techniques, adapted from literature, to detect and localize image forgeries. We experimentally show that our proposed community detection methods outperform existing state-of-the-art forgery detection and localization methods, which do not capture such community structure.

Submitted 20 April, 2020; v1 submitted 5 December, 2019; originally announced December 2019.

Comments: 16 pages, under review at IEEE Journal of Selected Topics in Signal Processing
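
Illustrative note: the graph idea in the abstract above (patches as vertices, edges from forensic similarity, forgery indicated by more than one community) can be sketched roughly as follows. This is a simplified stand-in, not the paper's method: it thresholds a similarity matrix and uses connected components in place of the community detection techniques, which the abstract does not specify. The function names, the threshold, and the similarity values are all hypothetical.

```python
def build_graph(similarity, edge_threshold):
    """Build an adjacency map: one vertex per patch, with an edge wherever
    the pairwise similarity score meets the (assumed) threshold."""
    n = len(similarity)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] >= edge_threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def connected_components(adjacency):
    """Group vertices with a depth-first search. Used here as a crude
    stand-in for community detection."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            vertex = stack.pop()
            if vertex in seen:
                continue
            seen.add(vertex)
            component.add(vertex)
            stack.extend(adjacency[vertex] - seen)
        components.append(component)
    return components


def looks_forged(similarity, edge_threshold=0.5):
    """Flag an image when its patches split into more than one group."""
    groups = connected_components(build_graph(similarity, edge_threshold))
    return len(groups) > 1, groups


# Hypothetical scores for four patches: 0 and 1 agree, 2 and 3 agree.
scores = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.15, 0.1],
    [0.1, 0.15, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
print(looks_forged(scores))
```

In this sketch the returned groups play the role of localization: each group is a set of patch indices that share a consistent trace.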

arXiv:1902.04684

doi 10.1109/TIFS.2019.2924552

Forensic Similarity for Digital Images

Authors: Owen Mayer, Matthew C. Stamm

Abstract: In this paper we introduce a new digital image forensics approach called forensic similarity, which determines whether two image patches contain the same forensic trace or different forensic traces. One benefit of this approach is that prior knowledge of a forensic trace, e.g. training samples, is not required to make a forensic similarity decision on it in the future. To do this, we propose a two-part deep-learning system composed of a CNN-based feature extractor and a three-layer neural network, called the similarity network. This system maps pairs of image patches to a score indicating whether they contain the same or different forensic traces. We evaluated system accuracy of determining whether two image patches were 1) captured by the same or different camera model, 2) manipulated by the same or different editing operation, and 3) manipulated by the same or different manipulation parameter, given a particular editing operation. Experiments demonstrate applicability to a variety of forensic traces, and importantly show efficacy on "unknown" forensic traces that were not used to train the system. Experiments also show that the proposed system significantly improves upon prior art, reducing error rates by more than half. Furthermore, we demonstrated the utility of the forensic similarity approach in two practical applications: forgery detection and localization, and database consistency verification.

Submitted 19 July, 2019; v1 submitted 12 February, 2019; originally announced February 2019.

Comments: 16 pages, Accepted for publication with IEEE T-IFS (IEEE Transactions on Information Forensics and Security, 2019)

Journal ref: IEEE Transactions on Information Forensics and Security (2019) Volume: 15, Pages: 1331 - 1346

Showing 1–16 of 16 results for author: Stamm, M C