-
Trident: Detecting Face Forgeries with Adversarial Triplet Learning
Authors:
Mustafa Hakan Kara,
Aysegul Dundar,
Uğur Güdükbay
Abstract:
As face forgeries generated by deep neural networks become increasingly sophisticated, detecting face manipulations in digital media poses a significant challenge, underscoring the importance of maintaining digital media integrity and combating visual disinformation. Current detection models, predominantly based on supervised training with domain-specific data, often falter against forgeries generated by previously unseen techniques. In response to this challenge, we introduce Trident, a face forgery detection framework that employs triplet learning with a Siamese network architecture for enhanced adaptability across diverse forgery methods. Trident is trained on curated triplets to isolate the nuanced differences introduced by forgeries, capturing fine-grained features that distinguish pristine samples from manipulated ones while controlling for other variables. To further enhance generalizability, we incorporate domain-adversarial training with a forgery discriminator. This adversarial component guides our embedding model towards forgery-agnostic representations, improving its robustness to unseen manipulations. In addition, we prevent gradient flow from the classifier head to the embedding model, avoiding overfitting induced by artifacts peculiar to certain forgeries. Comprehensive evaluations across multiple benchmarks and ablation studies demonstrate the effectiveness of our framework. We will release our code in a GitHub repository.
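For readers unfamiliar with triplet learning, a minimal sketch of a triplet objective follows. It assumes the standard triplet margin loss, max(0, d(a, p) - d(a, n) + m), with Euclidean distance; the abstract does not state Trident's exact loss, distance, margin, or triplet composition, so these choices and the function names are illustrative assumptions. The forgery discriminator and the stop-gradient on the classifier head require an autodiff framework and are not modeled here (in PyTorch, a stop-gradient of this kind is typically written by passing `embedding.detach()` to the head).

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]


def triplet_margin_loss(anchor: Vector, positive: Vector, negative: Vector,
                        margin: float = 1.0) -> float:
    """Standard triplet margin loss (assumed form, not confirmed for Trident).

    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive; otherwise it penalizes the shortfall.
    """
    d_pos = math.dist(anchor, positive)  # distance anchor -> same-class sample
    d_neg = math.dist(anchor, negative)  # distance anchor -> other-class sample
    return max(0.0, d_pos - d_neg + margin)


def batch_triplet_loss(triplets: Sequence[Tuple[Vector, Vector, Vector]],
                       margin: float = 1.0) -> float:
    """Mean triplet loss over a batch of (anchor, positive, negative) embeddings."""
    losses = [triplet_margin_loss(a, p, n, margin) for a, p, n in triplets]
    return sum(losses) / len(losses)
```

In this sketch, a hypothetical triplet might pair a pristine anchor and pristine positive with a manipulated negative of the same face, which is one way to "control for other variables"; the paper's actual curation rule is not given in the abstract.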
Submitted 29 June, 2025;
originally announced June 2025.
-
When and How Unlabeled Data Provably Improve In-Context Learning
Authors:
Yingcong Li,
Xiangyu Chang,
Muti Kara,
Xiaofeng Liu,
Amit Roy-Chowdhury,
Samet Oymak
Abstract:
Recent research shows that in-context learning (ICL) can be effective even when demonstrations have missing or incorrect labels. To shed light on this capability, we examine a canonical setting where the demonstrations are drawn according to a binary Gaussian mixture model (GMM) and a certain fraction of the demonstrations have missing labels. We provide a comprehensive theoretical study to show that: (1) the loss landscape of one-layer linear attention models recovers the optimal fully-supervised estimator but completely fails to exploit unlabeled data; (2) in contrast, multilayer or looped transformers can effectively leverage unlabeled data by implicitly constructing estimators of the form $\sum_{i\ge 0} a_i (X^\top X)^iX^\top y$ with $X$ and $y$ denoting features and partially-observed labels (with missing entries set to zero). We characterize the class of polynomials that can be expressed as a function of depth and draw connections to Expectation Maximization, an iterative pseudo-labeling algorithm commonly used in semi-supervised learning. Importantly, the leading polynomial power is exponential in depth, so a mild amount of depth/looping suffices. As an application of this theory, we propose looping off-the-shelf tabular foundation models to enhance their semi-supervision capabilities. Extensive evaluations on real-world datasets show that our method significantly improves semi-supervised tabular learning performance over standard single-pass inference.
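The estimator family named above can be evaluated directly, which makes it easier to see where unlabeled data enters. The plain-Python sketch below computes $w = \sum_i a_i (X^\top X)^i X^\top y$ for user-chosen coefficients; the coefficient values and helper names are illustrative assumptions rather than values from the paper, and the transformer itself is not modeled. A second helper shows only why repeated squaring reaches power $2^L$ after $L$ steps, one intuitive reading of "exponential in depth", not the paper's construction.

```python
from typing import List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def transpose(A: Sequence[Sequence[float]]) -> Matrix:
    return [list(col) for col in zip(*A)]


def matmul(A: Sequence[Sequence[float]], B: Sequence[Sequence[float]]) -> Matrix:
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def matvec(A: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def polynomial_estimator(X: Sequence[Sequence[float]], y: Sequence[float],
                         coeffs: Sequence[float]) -> Vector:
    """Return sum_i coeffs[i] * (X^T X)^i X^T y.

    Labels are +1/-1 and missing labels are encoded as 0, so X^T y only sees
    labeled rows, while the Gram matrix X^T X uses every row, labeled or not.
    """
    Xt = transpose(X)
    gram = matmul(Xt, X)        # d x d, includes unlabeled samples
    term = matvec(Xt, y)        # (X^T X)^0 X^T y, labeled samples only
    w = [0.0] * len(term)
    for a in coeffs:
        w = [wi + a * ti for wi, ti in zip(w, term)]
        term = matvec(gram, term)  # advance to the next power of X^T X
    return w


def gram_power_by_squaring(gram: Sequence[Sequence[float]], depth: int) -> Matrix:
    """Square `gram` `depth` times, giving gram ** (2 ** depth)."""
    P = [list(row) for row in gram]
    for _ in range(depth):
        P = matmul(P, P)
    return P
```

With `X = [[1, 0], [0, 1], [1, 1]]` and `y = [1, 0, 0]` (only the first sample labeled), `coeffs = [1.0]` yields `[1.0, 0.0]`, whereas `coeffs = [1.0, 0.5]` yields `[2.0, 0.5]`: the nonzero second coordinate comes entirely from the unlabeled third row acting through $X^\top X$.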
Submitted 18 June, 2025;
originally announced June 2025.
-
Meta-Entity Driven Triplet Mining for Aligning Medical Vision-Language Models
Authors:
Saban Ozturk,
Melih B. Yilmaz,
Muti Kara,
M. Talat Yavuz,
Aykut Koç,
Tolga Çukur
Abstract:
Diagnostic imaging relies on interpreting both images and radiology reports, but growing data volumes place significant pressure on medical experts, yielding increased errors and workflow backlogs. Medical vision-language models (med-VLMs) have emerged as a powerful framework to efficiently process multimodal imaging data, particularly in chest X-ray (CXR) evaluations, although their performance hinges on how well image and text representations are aligned. Existing alignment methods, predominantly based on contrastive learning, prioritize separation between disease classes over segregation of fine-grained pathology attributes such as location, size, or severity, leading to suboptimal representations. Here, we propose MedTrim (Meta-entity-driven Triplet mining), a novel method that enhances image-text alignment through multimodal triplet learning synergistically guided by disease class as well as adjectival and directional pathology descriptors. Unlike common alignment methods that separate broad disease classes, MedTrim leverages structured meta-entity information to preserve subtle but clinically significant intra-class variations. For this purpose, we first introduce an ontology-based entity recognition module that extracts pathology-specific meta-entities from CXR reports, as annotations on pathology attributes are rare in public datasets. For refined sample selection in triplet mining, we then introduce a novel score function that captures an aggregate measure of inter-sample similarity based on disease classes and adjectival/directional descriptors. Lastly, we introduce a multimodal triplet alignment objective for explicit within- and cross-modal alignment between samples sharing detailed pathology characteristics. Our demonstrations indicate that MedTrim improves performance in downstream retrieval and classification tasks compared to state-of-the-art alignment methods.
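To make the idea of score-driven triplet mining concrete, the sketch below ranks candidate samples by an aggregate similarity over disease classes and descriptors, then picks the most similar as the positive and the least similar as the negative. The abstract does not define MedTrim's score function, so the weighted sum of Jaccard overlaps, the equal weights, the `Sample` record, and all function names are hypothetical stand-ins chosen only for illustration.

```python
from typing import FrozenSet, NamedTuple, Sequence, Tuple


class Sample(NamedTuple):
    """Hypothetical record of meta-entities extracted from one CXR report."""
    name: str
    diseases: FrozenSet[str]     # disease classes, e.g. {"effusion"}
    descriptors: FrozenSet[str]  # adjectival/directional terms, e.g. {"left", "small"}


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Set overlap in [0, 1]; two empty sets are treated as identical (an assumption)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_score(s: Sample, t: Sample,
                     w_disease: float = 0.5, w_desc: float = 0.5) -> float:
    """Hypothetical aggregate score: weighted Jaccard over diseases and descriptors."""
    return (w_disease * jaccard(s.diseases, t.diseases)
            + w_desc * jaccard(s.descriptors, t.descriptors))


def mine_triplet(anchor: Sample, pool: Sequence[Sample]) -> Tuple[Sample, Sample, Sample]:
    """Return (anchor, most similar candidate, least similar candidate)."""
    candidates = [s for s in pool if s.name != anchor.name]
    ranked = sorted(candidates, key=lambda s: similarity_score(anchor, s))
    return anchor, ranked[-1], ranked[0]
```

Because descriptors contribute to the score, a sample with the same disease but a different location or size (for instance, a right-sided large effusion versus a left-sided small one) ranks below an exact match, which is the kind of intra-class distinction the abstract says class-only contrastive methods miss.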
Submitted 23 April, 2025; v1 submitted 22 April, 2025;
originally announced April 2025.
-
Provable Benefits of Task-Specific Prompts for In-context Learning
Authors:
Xiangyu Chang,
Yingcong Li,
Muti Kara,
Samet Oymak,
Amit K. Roy-Chowdhury
Abstract:
The in-context learning capabilities of modern language models have motivated a deeper mathematical understanding of sequence models. A line of recent work has shown that linear attention models can emulate projected gradient descent iterations to implicitly learn the task vector from the data provided in the context window. In this work, we consider a novel setting where the global task distribution can be partitioned into a union of conditional task distributions. We then examine the use of task-specific prompts and prediction heads for learning the prior information associated with the conditional task distribution using a one-layer attention model. Our results on the loss landscape show that task-specific prompts facilitate a covariance-mean decoupling, where prompt-tuning explains the conditional mean of the distribution whereas the variance is learned/explained through in-context learning. Incorporating a task-specific head further aids this process by entirely decoupling the estimation of the mean and variance components. This covariance-mean perspective similarly explains how jointly training prompt and attention weights can provably help over fine-tuning after pretraining.
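The covariance-mean decoupling can be illustrated with linear regression tasks. In the sketch below, a task-specific prior mean `mu_k` stands in for what the prompt is said to encode, and a single gradient step on the in-context examples, started from `mu_k`, stands in for in-context learning of the remaining deviation. This is an assumption-laden simplification: it uses one plain (not projected or preconditioned) gradient step with a hypothetical step size `eta`, and it does not model attention weights or the paper's loss-landscape analysis.

```python
from typing import Sequence

Vector = Sequence[float]


def mean_plus_in_context_step(X: Sequence[Vector], y: Vector, mu_k: Vector,
                              eta: float = 1.0) -> list:
    """beta_hat = mu_k + (eta / n) * X^T (y - X mu_k).

    mu_k plays the role of the task-specific mean (the prompt's job);
    the gradient step on the context examples estimates the deviation
    from that mean (the in-context learner's job).
    """
    n = len(X)
    d = len(mu_k)
    # Residuals of the context labels after removing the prior-mean prediction.
    residual = [yi - sum(xij * mj for xij, mj in zip(row, mu_k))
                for row, yi in zip(X, y)]
    # X^T residual: one gradient direction per coordinate.
    grad = [sum(X[i][j] * residual[i] for i in range(n)) for j in range(d)]
    return [m + eta / n * g for m, g in zip(mu_k, grad)]


def predict(x_query: Vector, beta_hat: Vector) -> float:
    """Linear prediction for a query point."""
    return sum(a * b for a, b in zip(x_query, beta_hat))
```

If the context labels are exactly explained by `mu_k`, the residual is zero and the estimate stays at the prior mean; the in-context step only has to account for what the mean does not explain, which mirrors the division of labor described in the abstract.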
Submitted 5 March, 2025; v1 submitted 3 March, 2025;
originally announced March 2025.
-
Proposal for a distributed, community-driven academic publishing system
Authors:
Matteo Barbone,
Mustafa Gündoğan,
Dhiren M. Kara,
Benjamin Pingault,
Alejandro Rodriguez-Pardo Montblanch,
Lucio Stefan,
Anthony K. C. Tan
Abstract:
We propose an academic publishing system where research papers are stored in a network of data centres owned by university libraries and research institutions, and are interfaced with the academic community through a website. In our system, the editor is replaced by an initial adjusted community-wide evaluation; standard peer review is accompanied by a post-publication, open-ended, community-wide review process, aiming at a more objective and longer-term evaluation; publishing costs are reduced to the running costs of the servers; and access is fully open. Our proposal addresses the fundamental problems of the current system: it reduces publishing costs, allowing easier access by less well-funded institutions (especially from developing countries); it makes the editorial evaluation distributed and more transparent; it speeds up the peer review process by eliminating the need for multiple resubmissions; and it introduces a long-term, community-wide evaluation of papers, ensuring their continued relevance and accuracy; while maximising its main goals, i.e. ensuring the highest quality of peer review and giving the best referees, the most visibility and the most credit to the best papers. Our scheme is time-efficient, financially sustainable, ethically fair and represents a significant improvement over the current system.
Submitted 23 April, 2023;
originally announced April 2023.