-
Emergent Temporal Correspondences from Video Diffusion Transformers
Authors:
Jisu Nam,
Soowon Son,
Dahyun Chung,
Jiyoung Kim,
Siyoon Jin,
Junhwa Hur,
Seungryong Kim
Abstract:
Recent advancements in video diffusion models based on Diffusion Transformers (DiTs) have achieved remarkable success in generating temporally coherent videos. Yet, a fundamental question persists: how do these models internally establish and represent temporal correspondences across frames? We introduce DiffTrack, the first quantitative analysis framework designed to answer this question. DiffTrack constructs a dataset of prompt-generated video with pseudo ground-truth tracking annotations and proposes novel evaluation metrics to systematically analyze how each component within the full 3D attention mechanism of DiTs (e.g., representations, layers, and timesteps) contributes to establishing temporal correspondences. Our analysis reveals that query-key similarities in specific, but not all, layers play a critical role in temporal matching, and that this matching becomes increasingly prominent during the denoising process. We demonstrate practical applications of DiffTrack in zero-shot point tracking, where it achieves state-of-the-art performance compared to existing vision foundation and self-supervised video models. Further, we extend our findings to motion-enhanced video generation with a novel guidance method that improves temporal consistency of generated videos without additional training. We believe our work offers crucial insights into the inner workings of video DiTs and establishes a foundation for further research and applications leveraging their temporal understanding.
Submitted 20 June, 2025;
originally announced June 2025.
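As a rough illustration of the query-key matching idea in the DiffTrack abstract above, the sketch below matches each query descriptor from one frame to the key descriptor in another frame with the highest dot-product similarity. The function name `match_by_query_key`, the toy descriptors, and the plain argmax rule are assumptions made for illustration; the abstract does not say how the framework extracts or normalizes these similarities.

```python
def match_by_query_key(queries, keys):
    """For each query vector, return the index of the most similar key vector.

    Similarity is a plain dot product, standing in for the query-key
    similarity of an attention layer (illustrative assumption).
    """
    matches = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
        # Index of the key with the highest similarity score.
        best = max(range(len(keys)), key=scores.__getitem__)
        matches.append(best)
    return matches


# Toy descriptors: two tracked points in a source frame,
# three candidate locations in a target frame.
source_queries = [[1.0, 0.0], [0.0, 1.0]]
target_keys = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
matches = match_by_query_key(source_queries, target_keys)
```

Here the first point is matched to target location 1 and the second to target location 0, because those keys have the largest dot product with the respective queries.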
-
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens
Authors:
Zeyuan Yang,
Xueyang Yu,
Delin Chen,
Maohao Shen,
Chuang Gan
Abstract:
Vision-language models (VLMs) excel at multimodal understanding, yet their text-only decoding forces them to verbalize visual reasoning, limiting performance on tasks that demand visual imagination. Recent attempts train VLMs to render explicit images, but the heavy image-generation pre-training often hinders the reasoning ability. Inspired by the way humans reason with mental imagery (the internal construction and manipulation of visual cues), we investigate whether VLMs can reason through interleaved multimodal trajectories without producing explicit images. To this end, we present a Machine Mental Imagery framework, dubbed Mirage, which augments VLM decoding with latent visual tokens alongside ordinary text. Concretely, whenever the model chooses to "think visually", it recasts its hidden states as next tokens, thereby continuing a multimodal trajectory without generating pixel-level images. Beginning by supervising the latent tokens through distillation from ground-truth image embeddings, we then switch to text-only supervision to make the latent trajectory align tightly with the task objective. A subsequent reinforcement learning stage further enhances the multimodal reasoning capability. Experiments on diverse benchmarks demonstrate that Mirage unlocks stronger multimodal reasoning without explicit image generation.
Submitted 20 June, 2025;
originally announced June 2025.
-
Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning
Authors:
Guozheng Ma,
Lu Li,
Zilin Wang,
Li Shen,
Pierre-Luc Bacon,
Dacheng Tao
Abstract:
Effectively scaling up deep reinforcement learning models has proven notoriously difficult due to network pathologies during training, motivating various targeted interventions such as periodic reset and architectural advances such as layer normalization. Instead of pursuing more complex modifications, we show that introducing static network sparsity alone can unlock further scaling potential beyond their dense counterparts with state-of-the-art architectures. This is achieved through simple one-shot random pruning, where a predetermined percentage of network weights are randomly removed once before training. Our analysis reveals that, in contrast to naively scaling up dense DRL networks, such sparse networks achieve both higher parameter efficiency for network expressivity and stronger resistance to optimization challenges like plasticity loss and gradient interference. We further extend our evaluation to visual and streaming RL scenarios, demonstrating the consistent benefits of network sparsity.
Submitted 20 June, 2025;
originally announced June 2025.
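The one-shot random pruning described in the sparsity abstract above (a fixed fraction of weights removed at random, once, before training) can be sketched as a fixed binary mask. This is a minimal standard-library illustration; the helper names `one_shot_random_mask` and `apply_mask`, the nested-list weight layout, and the seed handling are assumptions, not the authors' implementation.

```python
import random


def one_shot_random_mask(rows, cols, sparsity, seed=0):
    """Build a fixed 0/1 mask that zeroes a `sparsity` fraction of entries."""
    total = rows * cols
    n_pruned = int(round(sparsity * total))
    rng = random.Random(seed)
    # Choose the pruned positions once; the mask never changes afterwards.
    pruned = set(rng.sample(range(total), n_pruned))
    return [
        [0.0 if r * cols + c in pruned else 1.0 for c in range(cols)]
        for r in range(rows)
    ]


def apply_mask(weights, mask):
    """Zero out pruned weights; reapply after each update to keep them at zero."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weights, mask)]


mask = one_shot_random_mask(rows=4, cols=5, sparsity=0.5)
weights = [[1.0] * 5 for _ in range(4)]
sparse_weights = apply_mask(weights, mask)
kept = sum(sum(row) for row in sparse_weights)
```

In a training loop, the same mask would be reapplied after every optimizer step so that pruned weights stay at zero, which is what makes the sparsity static.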
-
Detecting LLM-Generated Short Answers and Effects on Learner Performance
Authors:
Shambhavi Bhushan,
Danielle R Thomas,
Conrad Borchers,
Isha Raghuvanshi,
Ralph Abboud,
Erin Gatz,
Shivang Gupta,
Kenneth Koedinger
Abstract:
The increasing availability of large language models (LLMs) has raised concerns about their potential misuse in online learning. While tools for detecting LLM-generated text exist and are widely used by researchers and educators, their reliability varies. Few studies have compared the accuracy of detection methods, defined criteria to identify content generated by LLM, or evaluated the effect on learner performance from LLM misuse within learning. In this study, we define LLM-generated text within open responses as those produced by any LLM without paraphrasing or refinement, as evaluated by human coders. We then fine-tune GPT-4o to detect LLM-generated responses and assess the impact on learning from LLM misuse. We find that our fine-tuned LLM outperforms the existing AI detection tool GPTZero, achieving an accuracy of 80% and an F1 score of 0.78, compared to GPTZero's accuracy of 70% and macro F1 score of 0.50, demonstrating superior performance in detecting LLM-generated responses. We also find that learners suspected of LLM misuse in the open response question were more than twice as likely to correctly answer the corresponding posttest MCQ, suggesting potential misuse across both question types and indicating a bypass of the learning process. We pave the way for future work by demonstrating a structured, code-based approach to improve LLM-generated response detection and propose using auxiliary statistical indicators such as unusually high assessment scores on related tasks, readability scores, and response duration. In support of open science, we contribute data and code to support the fine-tuning of similar models for similar use cases.
Submitted 20 June, 2025;
originally announced June 2025.
-
Towards AI Search Paradigm
Authors:
Yuchen Li,
Hengyi Cai,
Rui Kong,
Xinran Chen,
Jiamin Chen,
Jun Yang,
Haojie Zhang,
Jiayi Li,
Jiayi Wu,
Yiqun Chen,
Changle Qu,
Keyi Kong,
Wenwen Ye,
Lixin Su,
Xinyu Ma,
Long Xia,
Daiting Shi,
Jiashu Zhao,
Haoyi Xiong,
Shuaiqiang Wang,
Dawei Yin
Abstract:
In this paper, we introduce the AI Search Paradigm, a comprehensive blueprint for next-generation search systems capable of emulating human information processing and decision-making. The paradigm employs a modular architecture of four LLM-powered agents (Master, Planner, Executor and Writer) that dynamically adapt to the full spectrum of information needs, from simple factual queries to complex multi-stage reasoning tasks. These agents collaborate dynamically through coordinated workflows to evaluate query complexity, decompose problems into executable plans, and orchestrate tool usage, task execution, and content synthesis. We systematically present key methodologies for realizing this paradigm, including task planning and tool integration, execution strategies, aligned and robust retrieval-augmented generation, and efficient LLM inference, spanning both algorithmic techniques and infrastructure-level optimizations. By providing an in-depth guide to these foundational components, this work aims to inform the development of trustworthy, adaptive, and scalable AI search systems.
Submitted 20 June, 2025;
originally announced June 2025.
-
Judo: A User-Friendly Open-Source Package for Sampling-Based Model Predictive Control
Authors:
Albert H. Li,
Brandon Hung,
Aaron D. Ames,
Jiuguang Wang,
Simon Le Cleac'h,
Preston Culbertson
Abstract:
Recent advancements in parallel simulation and successful robotic applications are spurring a resurgence in sampling-based model predictive control. To build on this progress, however, the robotics community needs common tooling for prototyping, evaluating, and deploying sampling-based controllers. We introduce Judo, a software package designed to address this need. To facilitate rapid prototyping and evaluation, Judo provides robust implementations of common sampling-based MPC algorithms and standardized benchmark tasks. It further emphasizes usability with simple but extensible interfaces for controller and task definitions, asynchronous execution for straightforward simulation-to-hardware transfer, and a highly customizable interactive GUI for tuning controllers interactively. While written in Python, the software leverages MuJoCo as its physics backend to achieve real-time performance, which we validate across both consumer and server-grade hardware. Code at https://github.com/bdaiinstitute/judo.
Submitted 20 June, 2025;
originally announced June 2025.
-
Variational Learning of Disentangled Representations
Authors:
Yuli Slavutsky,
Ozgur Beker,
David Blei,
Bianca Dumitrascu
Abstract:
Disentangled representations enable models to separate factors of variation that are shared across experimental conditions from those that are condition-specific. This separation is essential in domains such as biomedical data analysis, where generalization to new treatments, patients, or species depends on isolating stable biological signals from context-dependent effects. While extensions of the variational autoencoder (VAE) framework have been proposed to address this problem, they frequently suffer from leakage between latent representations, limiting their ability to generalize to unseen conditions. Here, we introduce DISCoVeR, a new variational framework that explicitly separates condition-invariant and condition-specific factors. DISCoVeR integrates three key components: (i) a dual-latent architecture that models shared and specific factors separately; (ii) two parallel reconstructions that ensure both representations remain informative; and (iii) a novel max-min objective that encourages clean separation without relying on handcrafted priors, while making only minimal assumptions. Theoretically, we show that this objective maximizes data likelihood while promoting disentanglement, and that it admits a unique equilibrium. Empirically, we demonstrate that DISCoVeR achieves improved disentanglement on synthetic datasets, natural images, and single-cell RNA-seq data. Together, these results establish DISCoVeR as a principled approach for learning disentangled representations in multi-condition settings.
Submitted 20 June, 2025;
originally announced June 2025.
-
Continual Learning with Columnar Spiking Neural Networks
Authors:
Denis Larionov,
Nikolay Bazenkov,
Mikhail Kiselev
Abstract:
This study investigates columnar-organized spiking neural networks (SNNs) for continual learning and catastrophic forgetting. Using CoLaNET (Columnar Layered Network), we show that microcolumns adapt most efficiently to new tasks when they lack shared structure with prior learning. We demonstrate how CoLaNET hyperparameters govern the trade-off between retaining old knowledge (stability) and acquiring new information (plasticity). Our optimal configuration learns ten sequential MNIST tasks effectively, maintaining 92% accuracy on each. It shows low forgetting, with only 4% performance degradation on the first task after training on nine subsequent tasks.
Submitted 20 June, 2025;
originally announced June 2025.
-
Codeword-Segmentation Rate-Splitting Multiple Access and Evaluation under Suboptimal Decoding
Authors:
Sibo Zhang,
Bruno Clerckx,
David Vargas
Abstract:
Rate-Splitting Multiple Access (RSMA) has been recognized as a promising multiple access technique. We propose a novel architecture for downlink RSMA, namely Codeword-Segmentation RSMA (CS-RSMA). Unlike conventional RSMA, which splits users' messages into common and private parts before encoding, CS-RSMA encodes the users' messages directly, segments the codewords into common and private parts, and transmits the codeword segments using common and private streams. In addition to the principle of CS-RSMA, a novel performance analysis framework is proposed. This framework utilizes a recent discovery in mismatched decoding under finite-alphabet input and interference, and can better capture the receiver's complexity limits. Precoder optimization under finite alphabets and suboptimal decoders for conventional RSMA and CS-RSMA to maximize the Sum-Rate (SR) and the Max-Min Fairness (MMF) is also addressed. The numerical results reveal the theoretical performance of conventional RSMA and CS-RSMA. We observe that CS-RSMA leads to better performance than conventional RSMA in SR, and similar performance in MMF. Furthermore, a physical-layer implementation of CS-RSMA is proposed and evaluated through link-level simulations. Aside from performance benefits, we also demonstrate that CS-RSMA brings significant benefits for the encoding/decoding, control signaling, and retransmission processes compared to conventional RSMA.
Submitted 20 June, 2025;
originally announced June 2025.
-
Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity
Authors:
Samin Yeasar Arnob,
Scott Fujimoto,
Doina Precup
Abstract:
In this paper, we investigate the use of small datasets in the context of offline reinforcement learning (RL). While many common offline RL benchmarks employ datasets with over a million data points, many offline RL applications rely on considerably smaller datasets. We show that offline RL algorithms can overfit on small datasets, resulting in poor performance. To address this challenge, we introduce "Sparse-Reg": a regularization technique based on sparsity to mitigate overfitting in offline reinforcement learning, enabling effective learning in limited data settings and outperforming state-of-the-art baselines in continuous control.
Submitted 20 June, 2025;
originally announced June 2025.
-
MeDi: Metadata-Guided Diffusion Models for Mitigating Biases in Tumor Classification
Authors:
David Jacob Drexlin,
Jonas Dippel,
Julius Hense,
Niklas Prenißl,
Grégoire Montavon,
Frederick Klauschen,
Klaus-Robert Müller
Abstract:
Deep learning models have made significant advances in histological prediction tasks in recent years. However, for adaptation in clinical practice, their lack of robustness to varying conditions such as staining, scanner, hospital, and demographics is still a limiting factor: if trained on overrepresented subpopulations, models regularly struggle with less frequent patterns, leading to shortcut learning and biased predictions. Large-scale foundation models have not fully eliminated this issue. Therefore, we propose a novel approach explicitly modeling such metadata into a Metadata-guided generative Diffusion model framework (MeDi). MeDi allows for a targeted augmentation of underrepresented subpopulations with synthetic data, which balances limited training data and mitigates biases in downstream models. We experimentally show that MeDi generates high-quality histopathology images for unseen subpopulations in TCGA, boosts the overall fidelity of the generated images, and enables improvements in performance for downstream classifiers on datasets with subpopulation shifts. Our work is a proof-of-concept towards better mitigating data biases with generative models.
Submitted 20 June, 2025;
originally announced June 2025.
-
On the Theory of Conditional Feature Alignment for Unsupervised Domain-Adaptive Counting
Authors:
Zhuonan Liang,
Dongnan Liu,
Jianan Fan,
Yaxuan Song,
Qiang Qu,
Yu Yao,
Peng Fu,
Weidong Cai
Abstract:
Object counting models suffer when deployed across domains with differing density variety, since density shifts are inherently task-relevant and violate standard domain adaptation assumptions. To address this, we propose a theoretical framework of conditional feature alignment. We first formalize the notion of conditional divergence by partitioning each domain into subsets (e.g., object vs. background) and measuring divergences per condition. We then derive a joint error bound showing that, under discrete label spaces treated as condition sets, aligning distributions conditionally leads to tighter bounds on the combined source-target decision error than unconditional alignment. These insights motivate a general conditional adaptation principle: by preserving task-relevant variations while filtering out nuisance shifts, one can achieve superior cross-domain generalization for counting. We provide both a theoretical contribution, defining conditional divergence and proving its benefit in lowering joint error, and a practical adaptation strategy that preserves task-relevant information in unsupervised domain-adaptive counting. We demonstrate the effectiveness of our approach through extensive experiments on multiple counting datasets with varying density distributions. The results show that our method outperforms existing unsupervised domain adaptation methods, empirically validating the theoretical insights on conditional feature alignment.
Submitted 20 June, 2025;
originally announced June 2025.
-
Semi-Supervised Multi-Modal Medical Image Segmentation for Complex Situations
Authors:
Dongdong Meng,
Sheng Li,
Hao Wu,
Guoping Wang,
Xueqing Yan
Abstract:
Semi-supervised learning addresses the issue of limited annotations in medical images effectively, but its performance is often inadequate for complex backgrounds and challenging tasks. Multi-modal fusion methods can significantly improve the accuracy of medical image segmentation by providing complementary information. However, they face challenges in achieving significant improvements under semi-supervised conditions due to the challenge of effectively leveraging unlabeled data. There is a significant need to create an effective and reliable multi-modal learning strategy for leveraging unlabeled data in semi-supervised segmentation. To address these issues, we propose a novel semi-supervised multi-modal medical image segmentation approach, which leverages complementary multi-modal information to enhance performance with limited labeled data. Our approach employs a multi-stage multi-modal fusion and enhancement strategy to fully utilize complementary multi-modal information, while reducing feature discrepancies and enhancing feature sharing and alignment. Furthermore, we effectively introduce contrastive mutual learning to constrain prediction consistency across modalities, thereby facilitating the robustness of segmentation results in semi-supervised tasks. Experimental results on two multi-modal datasets demonstrate the superior performance and robustness of the proposed framework, establishing its valuable potential for solving medical image segmentation tasks in complex scenarios.
Submitted 20 June, 2025;
originally announced June 2025.
-
Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs?
Authors:
Adithya Bhaskar,
Alexander Wettig,
Tianyu Gao,
Yihe Dong,
Danqi Chen
Abstract:
Language models handle increasingly long contexts for tasks such as book summarization, but this leads to growing memory costs for the key-value (KV) cache. Many prior works have proposed ways of discarding KVs from memory, but their approaches are tailored to favorable settings, obscuring caveats like high peak memory and performance degradation, and a fair comparison between methods is difficult. In this paper, we propose the *KV footprint* as a unified metric, which accounts for both the amount of KV entries stored and their lifespan in memory. We evaluate methods based on the smallest footprint they attain while preserving performance in both long-context understanding and generation, with context lengths of up to 128K tokens. This metric reveals the high peak memory of prior KV eviction methods. One class of methods -- *post-fill eviction* -- has a high footprint due to being incompatible with eviction during pre-filling. We adapt these methods to be able to evict KVs during pre-filling, achieving substantially lower KV footprints. We then turn to *recency eviction* methods, wherein we propose PruLong, an end-to-end optimization method for learning which attention heads need to retain the full KV cache and which do not. PruLong saves memory while preserving long-context performance, achieving 12% smaller KV footprint than prior methods while retaining performance in challenging recall tasks. Our paper clarifies the complex tangle of long-context inference methods and paves the way for future development to minimize the KV footprint.
Submitted 20 June, 2025;
originally announced June 2025.
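The abstract above describes the KV footprint as accounting for both how many KV entries are stored and how long they stay in memory. One plausible reading, assumed here and not confirmed by the abstract, is the number of entries held at each step summed over all steps, normalized by the same sum for a full (no-eviction) cache. The function name `kv_footprint` and the toy numbers are hypothetical.

```python
def kv_footprint(kept_per_step, full_per_step):
    """Ratio of summed KV occupancy to the occupancy of a full cache.

    kept_per_step[i]: KV entries resident in memory at step i under eviction.
    full_per_step[i]: KV entries a full cache would hold at step i.
    """
    if len(kept_per_step) != len(full_per_step):
        raise ValueError("both sequences must cover the same steps")
    total_full = sum(full_per_step)
    if total_full == 0:
        raise ValueError("full cache occupancy must be positive")
    return sum(kept_per_step) / total_full


# Toy example: a full cache grows by one entry per step,
# while an eviction policy caps residency at two entries.
full = [1, 2, 3, 4]
kept = [1, 2, 2, 2]
footprint = kv_footprint(kept, full)
```

Under this reading, a method that only evicts after the whole context has been processed still pays the full occupancy during pre-filling, which is consistent with the abstract's remark that post-fill eviction has a high footprint.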
-
Mathematical Proof as a Litmus Test: Revealing Failure Modes of Advanced Large Reasoning Models
Authors:
Dadi Guo,
Jiayu Liu,
Zhiyuan Fan,
Zhitao He,
Haoran Li,
Yumeng Wang,
Yi R. Fung
Abstract:
Large reasoning models (e.g., R1, o3) have demonstrated remarkable mathematical problem-solving abilities. However, the high reported accuracy of these advanced models on popular datasets, together with reliance on purely numerical evaluation and potential benchmark leakage, often masks their true reasoning shortcomings. To address this, we propose leveraging the inherent rigor and methodological complexity of mathematical proofs as a diagnostic tool to expose these hidden failures. Specifically, we introduce the RFMDataset (Reveal Failure Modes), a collection of 200 diverse mathematical proof problems, and thoroughly evaluate advanced models' performance on it. Our in-depth analysis of their failures uncovers 10 fine-grained error types, which show fundamental limitations in current large reasoning models: 1) large reasoning models grapple profoundly with mathematical proofs, with some generating entirely correct proofs for less than 20% of problems and failing even on basic ones; 2) models exhibit a diverse spectrum of reasoning failures, prominently demonstrating the lack of guarantees for the correctness and rigor of single-step reasoning; and 3) models show hallucination and incompleteness during the reasoning process. Our findings reveal that models' self-reflection is insufficient to resolve the current logical dilemmas, necessitating formalized and fine-grained logical training.
Submitted 20 June, 2025;
originally announced June 2025.
-
Software Fairness Testing in Practice
Authors:
Ronnie de Souza Santos,
Matheus de Morais Leca,
Reydne Santos,
Cleyton Magalhaes
Abstract:
Software testing ensures that a system functions correctly, meets specified requirements, and maintains high quality. As artificial intelligence and machine learning (ML) technologies become integral to software systems, testing has evolved to address their unique complexities. A critical advancement in this space is fairness testing, which identifies and mitigates biases in AI applications to promote ethical and equitable outcomes. Despite extensive academic research on fairness testing, including test input generation, test oracle identification, and component testing, practical adoption remains limited. Industry practitioners often lack clear guidelines and effective tools to integrate fairness testing into real-world AI development. This study investigates how software professionals test AI-powered systems for fairness through interviews with 22 practitioners working on AI and ML projects. Our findings highlight a significant gap between theoretical fairness concepts and industry practice. While fairness definitions continue to evolve, they remain difficult for practitioners to interpret and apply. The absence of industry-aligned fairness testing tools further complicates adoption, necessitating research into practical, accessible solutions. Key challenges include data quality and diversity, time constraints, defining effective metrics, and ensuring model interoperability. These insights emphasize the need to bridge academic advancements with actionable strategies and tools, enabling practitioners to systematically address fairness in AI systems.
Submitted 20 June, 2025;
originally announced June 2025.
-
Simultaneous Translation with Offline Speech and LLM Models in CUNI Submission to IWSLT 2025
Authors:
Dominik Macháček,
Peter Polák
Abstract:
This paper describes the Charles University submission to the Simultaneous Speech Translation Task of IWSLT 2025. We cover all four language pairs with a direct or cascade approach. The backbone of our systems is the offline Whisper speech model, which we use for both translation and transcription in simultaneous mode with the state-of-the-art simultaneous policy AlignAtt. We further improve the performance by prompting to inject in-domain terminology, and we accommodate context. Our cascaded systems further use EuroLLM for unbounded simultaneous translation. Compared to the organizers' baseline, our systems improve by 2 BLEU points on Czech to English and 13-22 BLEU points on English to German, Chinese and Japanese on the development sets. Additionally, we propose a new enhanced measure of speech recognition latency.
Submitted 20 June, 2025;
originally announced June 2025.
-
Neural Polar Decoders for DNA Data Storage
Authors:
Ziv Aharoni,
Henry D. Pfister
Abstract:
Synchronization errors, such as insertions and deletions, present a fundamental challenge in DNA-based data storage systems, arising from both synthesis and sequencing noise. These channels are often modeled as insertion-deletion-substitution (IDS) channels, for which designing maximum-likelihood decoders is computationally expensive. In this work, we propose a data-driven approach based on neural polar decoders (NPDs) to design low-complexity decoders for channels with synchronization errors. The proposed architecture enables decoding over IDS channels with reduced complexity $O(AN \log N)$, where $A$ is a tunable parameter independent of the channel. NPDs require only sample access to the channel and can be trained without an explicit channel model. Additionally, NPDs provide mutual information (MI) estimates that can be used to optimize input distributions and code design. We demonstrate the effectiveness of NPDs on both synthetic deletion and IDS channels. For deletion channels, we show that NPDs achieve near-optimal decoding performance and accurate MI estimation, with significantly lower complexity than trellis-based decoders. We also provide numerical estimates of the channel capacity for the deletion channel. We extend our evaluation to realistic DNA storage settings, including channels with multiple noisy reads and real-world Nanopore sequencing data. Our results show that NPDs match or surpass the performance of existing methods while using significantly fewer parameters than the state-of-the-art. These findings highlight the promise of NPDs for robust and efficient decoding in DNA data storage systems.
Submitted 20 June, 2025;
originally announced June 2025.
-
Quantum k-SAT Related Hypergraph Problems
Authors:
Simon-Luca Kremer,
Dorian Rudolph,
Sevag Gharibian
Abstract:
The Quantum k-SAT problem is the quantum generalization of the k-SAT problem. It asks whether a given local Hamiltonian is frustration-free, meaning that the ground state of the k-local Hamiltonian minimizes the energy of every local interaction term simultaneously. This is a central question in quantum physics and a canonical QMA_1-complete problem. The Quantum k-SAT problem is not as well studied as the classical k-SAT problem in terms of special tractable cases, approximation algorithms and parameterized complexity. In this paper, we give a graph-theoretic study of the Quantum k-SAT problem based on two hypergraph structures, the core and the radius, which are important for solving the problem. We can solve a Quantum k-SAT instance in polynomial time if the derived hypergraph has a core of size n-m+a, where a is a constant, and the radius is at most logarithmic. If it exists, we can find a core of size n-m+a with the best possible radius in polynomial time, whereas finding a general minimum core with minimal radius is NP-hard.
Submitted 20 June, 2025;
originally announced June 2025.
-
Generative Modeling of Full-Atom Protein Conformations using Latent Diffusion on Graph Embeddings
Authors:
Aditya Sengar,
Ali Hariri,
Daniel Probst,
Patrick Barth,
Pierre Vandergheynst
Abstract:
Generating diverse, all-atom conformational ensembles of dynamic proteins such as G-protein-coupled receptors (GPCRs) is critical for understanding their function, yet most generative models simplify atomic detail or ignore conformational diversity altogether. We present latent diffusion for full protein generation (LD-FPG), a framework that constructs complete all-atom protein structures, including every side-chain heavy atom, directly from molecular dynamics (MD) trajectories. LD-FPG employs a Chebyshev graph neural network (ChebNet) to obtain low-dimensional latent embeddings of protein conformations, which are processed using three pooling strategies: blind, sequential and residue-based. A diffusion model trained on these latent representations generates new samples that a decoder, optionally regularized by dihedral-angle losses, maps back to Cartesian coordinates. Using D2R-MD, a 2-microsecond MD trajectory (12 000 frames) of the human dopamine D2 receptor in a membrane environment, the sequential and residue-based pooling strategies reproduce the reference ensemble with high structural fidelity (all-atom lDDT of approximately 0.7; C-alpha-lDDT of approximately 0.8) and recover backbone and side-chain dihedral-angle distributions with a Jensen-Shannon divergence of less than 0.03 compared to the MD data. LD-FPG thereby offers a practical route to system-specific, all-atom ensemble generation for large proteins, providing a promising tool for structure-based therapeutic design on complex, dynamic targets. The D2R-MD dataset and our implementation are freely available to facilitate further research.
Submitted 20 June, 2025;
originally announced June 2025.
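The LD-FPG abstract above compares generated and reference dihedral-angle distributions by Jensen-Shannon divergence. As a minimal sketch of that metric (not the authors' implementation), the snippet below computes it for two histograms over the same bins. The function names, the base-2 logarithm, and the histogram inputs are assumptions made for illustration; the abstract does not state the logarithm base or the binning used.

```python
import math


def _normalize(counts):
    """Turn non-negative histogram counts into a probability vector."""
    total = float(sum(counts))
    if total <= 0:
        raise ValueError("histogram must have positive total mass")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) in bits; bins where p is zero contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(counts_p, counts_q):
    """Jensen-Shannon divergence (base 2, so the value lies in [0, 1])."""
    if len(counts_p) != len(counts_q):
        raise ValueError("histograms must share the same bins")
    p = _normalize(counts_p)
    q = _normalize(counts_q)
    # The mixture m is positive wherever p or q is, so the KL terms are finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

With this convention, identical histograms give 0 and histograms with disjoint support give 1, so a value below 0.03 indicates closely matching distributions.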
-
Relaxed syntax modeling in Transformers for future-proof license plate recognition
Authors:
Florent Meyer,
Laurent Guichard,
Denis Coquenet,
Guillaume Gravier,
Yann Soullard,
Bertrand Coüasnon
Abstract:
Effective license plate recognition systems are required to be resilient to constant change, as new license plates are released into traffic daily. While Transformer-based networks excel in their recognition at first sight, we observe a significant performance drop over time, which proves them unsuitable for tense production environments. Indeed, such systems obtain state-of-the-art results on plates whose syntax is seen during training. Yet, we show they perform similarly to random guessing on future plates where legible characters are wrongly recognized due to a shift in their syntax. After highlighting the flows of positional and contextual information in Transformer encoder-decoders, we identify several causes for their over-reliance on past syntax. We then devise architectural cut-offs and replacements which we integrate into SaLT, an attempt at a Syntax-Less Transformer for syntax-agnostic modeling of license plate representations. Experiments on both real and synthetic datasets show that our approach reaches top accuracy on past syntax and, most importantly, nearly maintains performance on future license plates. We further demonstrate the robustness of our architecture enhancements by way of various ablations.
Submitted 20 June, 2025;
originally announced June 2025.
-
Stretching Beyond the Obvious: A Gradient-Free Framework to Unveil the Hidden Landscape of Visual Invariance
Authors:
Lorenzo Tausani,
Paolo Muratore,
Morgan B. Talbot,
Giacomo Amerio,
Gabriel Kreiman,
Davide Zoccolan
Abstract:
Uncovering which feature combinations high-level visual units encode is critical to understanding how images are transformed into representations that support recognition. While existing feature visualization approaches typically infer a unit's most exciting images, this is insufficient to reveal the manifold of transformations under which responses remain invariant, which is key to generalization in vision. Here we introduce Stretch-and-Squeeze (SnS), an unbiased, model-agnostic, and gradient-free framework to systematically characterize a unit's invariance landscape and its vulnerability to adversarial perturbations in both biological and artificial visual systems. SnS frames these transformations as bi-objective optimization problems. To probe invariance, SnS seeks image perturbations that maximally alter the representation of a reference stimulus in a given processing stage while preserving unit activation. To probe adversarial sensitivity, SnS seeks perturbations that minimally alter the stimulus while suppressing unit activation. Applied to convolutional neural networks (CNNs), SnS revealed image variations that were further from a reference image in pixel-space than those produced by affine transformations, while more strongly preserving the target unit's response. The discovered invariant images differed dramatically depending on the choice of image representation used for optimization: pixel-level changes primarily affected luminance and contrast, while stretching mid- and late-layer CNN representations altered texture and pose, respectively. Notably, the invariant images from robust networks were more recognizable by human subjects than those from standard networks, supporting the higher fidelity of robust CNNs as models of the visual system.
Submitted 20 June, 2025;
originally announced June 2025.
-
Bayesian Joint Model of Multi-Sensor and Failure Event Data for Multi-Mode Failure Prediction
Authors:
Sina Aghaee Dabaghan Fard,
Minhee Kim,
Akash Deep,
Jaesung Lee
Abstract:
Modern industrial systems are often subject to multiple failure modes, and their conditions are monitored by multiple sensors, generating multiple time-series signals. Additionally, time-to-failure data are commonly available. Accurately predicting a system's remaining useful life (RUL) requires effectively leveraging multi-sensor time-series data alongside multi-mode failure event data. In most existing models, failure modes and RUL prediction are performed independently, ignoring the inherent relationship between these two tasks. Some models integrate multiple failure modes and event prediction using black-box machine learning approaches, which lack statistical rigor and cannot characterize the inherent uncertainty in the model and data. This paper introduces a unified approach to jointly model the multi-sensor time-series data and failure time concerning multiple failure modes. The proposed model integrates a Cox proportional hazards model, a Convolved Multi-output Gaussian Process, and multinomial failure mode distributions in a hierarchical Bayesian framework with corresponding priors, enabling accurate prediction with robust uncertainty quantification. Posterior distributions are effectively obtained by Variational Bayes, and prediction is performed with Monte Carlo sampling. The advantages of the proposed model are validated through extensive numerical and case studies with a jet-engine dataset.
Submitted 20 June, 2025;
originally announced June 2025.
-
Critical Appraisal of Fairness Metrics in Clinical Predictive AI
Authors:
João Matos,
Ben Van Calster,
Leo Anthony Celi,
Paula Dhiman,
Judy Wawira Gichoya,
Richard D. Riley,
Chris Russell,
Sara Khalid,
Gary S. Collins
Abstract:
Predictive artificial intelligence (AI) offers an opportunity to improve clinical practice and patient outcomes, but risks perpetuating biases if fairness is inadequately addressed. However, the definition of "fairness" remains unclear. We conducted a scoping review to identify and critically appraise fairness metrics for clinical predictive AI. We defined a "fairness metric" as a measure quantifying whether a model discriminates (societally) against individuals or groups defined by sensitive attributes. We searched five databases (2014-2024), screening 820 records, to include 41 studies, and extracted 62 fairness metrics. Metrics were classified by performance-dependency, model output level, and base performance metric, revealing a fragmented landscape with limited clinical validation and overreliance on threshold-dependent measures. Eighteen metrics were explicitly developed for healthcare, including only one clinical utility metric. Our findings highlight conceptual challenges in defining and quantifying fairness and identify gaps in uncertainty quantification, intersectionality, and real-world applicability. Future work should prioritise clinically meaningful metrics.
Submitted 20 June, 2025;
originally announced June 2025.
-
A Quantile Regression Approach for Remaining Useful Life Estimation with State Space Models
Authors:
Davide Frizzo,
Francesco Borsatti,
Gian Antonio Susto
Abstract:
Predictive Maintenance (PdM) is pivotal in Industry 4.0 and 5.0, proactively enhancing efficiency through accurate equipment Remaining Useful Life (RUL) prediction, thus optimizing maintenance scheduling and reducing unexpected failures and premature interventions. This paper introduces a novel RUL estimation approach leveraging State Space Models (SSM) for efficient long-term sequence modeling. To handle model uncertainty, Simultaneous Quantile Regression (SQR) is integrated into the SSM, enabling multiple quantile estimations. The proposed method is benchmarked against traditional sequence modelling techniques (LSTM, Transformer, Informer) using the C-MAPSS dataset. Results demonstrate superior accuracy and computational efficiency of SSM models, underscoring their potential for high-stakes industrial applications.
Submitted 20 June, 2025;
originally announced June 2025.
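The entry above estimates several RUL quantiles at once. Quantile regression is commonly trained with the pinball (quantile) loss, so the sketch below shows that loss for a single quantile level and its average over several levels. This is an illustrative assumption about the objective, not the paper's code: the function names and the dictionary layout of predictions are hypothetical, and the abstract does not specify how the quantile levels are sampled or combined.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball loss for one quantile level tau in (0, 1)."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction (diff >= 0) costs tau * diff;
        # over-prediction costs (1 - tau) * |diff|.
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def simultaneous_pinball_loss(y_true, preds_by_tau):
    """Average the pinball loss over several quantile levels.

    preds_by_tau maps each quantile level to that level's predictions.
    """
    losses = [pinball_loss(y_true, preds, tau)
              for tau, preds in preds_by_tau.items()]
    return sum(losses) / len(losses)
```

For a true RUL of 10 cycles and a prediction of 8, the 0.9-quantile loss is 0.9 × 2 = 1.8, while a prediction of 12 costs only 0.1 × 2 = 0.2. This asymmetry is what pushes a high-quantile estimate above most observed values.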
-
The Hidden Cost of an Image: Quantifying the Energy Consumption of AI Image Generation
Authors:
Giulia Bertazzini,
Chiara Albisani,
Daniele Baracchi,
Dasara Shullani,
Roberto Verdecchia
Abstract:
With the growing adoption of AI image generation, in conjunction with the ever-increasing environmental resources demanded by AI, we are urged to answer a fundamental question: What is the environmental impact hidden behind each image we generate? In this research, we present a comprehensive empirical experiment designed to assess the energy consumption of AI image generation. Our experiment compares 17 state-of-the-art image generation models by considering multiple factors that could affect their energy consumption, such as model quantization, image resolution, and prompt length. Additionally, we consider established image quality metrics to study potential trade-offs between energy consumption and generated image quality. Results show that image generation models vary drastically in terms of the energy they consume, with up to a 46x difference. Image resolution affects energy consumption inconsistently, ranging from a 1.3x to 4.7x increase when doubling resolution. U-Net-based models tend to consume less than Transformer-based ones. Model quantization instead deteriorates the energy efficiency of most models, while prompt length and content have no statistically significant impact. Improving image quality does not always come at the cost of higher energy consumption, with some of the models producing the highest quality images also being among the most energy efficient ones.
Submitted 20 June, 2025;
originally announced June 2025.
-
Simulating Correlated Electrons with Symmetry-Enforced Normalizing Flows
Authors:
Dominic Schuh,
Janik Kreit,
Evan Berkowitz,
Lena Funcke,
Thomas Luu,
Kim A. Nicoli,
Marcel Rodekamp
Abstract:
We present the first proof of principle that normalizing flows can accurately learn the Boltzmann distribution of the fermionic Hubbard model, a key framework for describing the electronic structure of graphene and related materials. State-of-the-art methods like Hybrid Monte Carlo often suffer from ergodicity issues near the time-continuum limit, leading to biased estimates. Leveraging symmetry-aware architectures as well as independent and identically distributed sampling, our approach resolves these issues and achieves significant speed-ups over traditional methods.
Submitted 20 June, 2025;
originally announced June 2025.
-
Robust Reinforcement Learning for Discrete Compositional Generation via General Soft Operators
Authors:
Marco Jiralerspong,
Esther Derman,
Danilo Vucetic,
Nikolay Malkin,
Bilun Sun,
Tianyu Zhang,
Pierre-Luc Bacon,
Gauthier Gidel
Abstract:
A major bottleneck in scientific discovery involves narrowing a large combinatorial set of objects, such as proteins or molecules, to a small set of promising candidates. While this process largely relies on expert knowledge, recent methods leverage reinforcement learning (RL) to enhance this filtering. They achieve this by estimating proxy reward functions from available datasets and using regularization to generate more diverse candidates. These reward functions are inherently uncertain, raising a particularly salient challenge for scientific discovery. In this work, we show that existing methods, often framed as sampling proportional to a reward function, are inadequate and yield suboptimal candidates, especially in large search spaces. To remedy this issue, we take a robust RL approach and introduce a unified operator that seeks robustness to the uncertainty of the proxy reward function. This general operator targets peakier sampling distributions while encompassing known soft RL operators. It also leads us to a novel algorithm that identifies higher-quality, diverse candidates in both synthetic and real-world tasks. Ultimately, our work offers a new, flexible perspective on discrete compositional generation tasks. Code: https://github.com/marcojira/tgm.
Submitted 20 June, 2025;
originally announced June 2025.
-
LLM-Generated Feedback Supports Learning If Learners Choose to Use It
Authors:
Danielle R. Thomas,
Conrad Borchers,
Shambhavi Bhushan,
Erin Gatz,
Shivang Gupta,
Kenneth R. Koedinger
Abstract:
Large language models (LLMs) are increasingly used to generate feedback, yet their impact on learning remains underexplored, especially compared to existing feedback methods. This study investigates how on-demand LLM-generated explanatory feedback influences learning in seven scenario-based tutor training lessons. Analyzing over 2,600 lesson completions from 885 tutor learners, we compare posttest performance among learners across three groups: learners who received feedback generated by gpt-3.5-turbo, those who declined it, and those without access. All groups received non-LLM corrective feedback. To address potential selection bias, where higher-performing learners may be more inclined to use LLM feedback, we applied propensity scoring. Learners with a higher predicted likelihood of engaging with LLM feedback scored significantly higher at posttest than those with lower propensity. After adjusting for this effect, two out of seven lessons showed statistically significant learning benefits from LLM feedback with standardized effect sizes of 0.28 and 0.33. These moderate effects suggest that the effectiveness of LLM feedback depends on the learners' tendency to seek support. Importantly, LLM feedback did not significantly increase completion time, and learners overwhelmingly rated it as helpful. These findings highlight LLM feedback's potential as a low-cost and scalable way to improve learning on open-ended tasks, particularly in existing systems already providing feedback without LLMs. This work contributes open datasets, LLM prompts, and rubrics to support reproducibility.
Submitted 20 June, 2025;
originally announced June 2025.
-
PersonalAI: Towards digital twins in the graph form
Authors:
Mikhail Menschikov,
Dmitry Evseev,
Ruslan Kostoev,
Ilya Perepechkin,
Ilnaz Salimov,
Victoria Dochkina,
Petr Anokhin,
Evgeny Burnaev,
Nikita Semenov
Abstract:
The challenge of personalizing language models, specifically the ability to account for a user's history during interactions, is of significant interest. Despite recent advancements in large language models (LLMs) and Retrieval Augmented Generation that have enhanced the factual base of LLMs, the task of retaining extensive personal information and using it to generate personalized responses remains pertinent. To address this, we propose utilizing external memory in the form of knowledge graphs, which are constructed and updated by the LLM itself. We have expanded upon the ideas of the AriGraph architecture and for the first time introduced a combined graph featuring both standard edges and two types of hyperedges. Experiments conducted on the TriviaQA, HotpotQA and DiaASQ benchmarks indicate that this approach aids in making the process of graph construction and knowledge extraction unified and robust. Furthermore, we augmented the DiaASQ benchmark by incorporating parameters such as time into dialogues and introducing contradictory statements made by the same speaker at different times. Despite these modifications, the performance of the question-answering system remained robust, demonstrating the proposed architecture's ability to maintain and utilize temporal dependencies.
Submitted 20 June, 2025;
originally announced June 2025.
-
Prmpt2Adpt: Prompt-Based Zero-Shot Domain Adaptation for Resource-Constrained Environments
Authors:
Yasir Ali Farrukh,
Syed Wali,
Irfan Khan,
Nathaniel D. Bastian
Abstract:
Unsupervised Domain Adaptation (UDA) is a critical challenge in real-world vision systems, especially in resource-constrained environments like drones, where memory and computation are limited. Existing prompt-driven UDA methods typically rely on large vision-language models and require full access to source-domain data during adaptation, limiting their applicability. In this work, we propose Prmpt2Adpt, a lightweight and efficient zero-shot domain adaptation framework built around a teacher-student paradigm guided by prompt-based feature alignment. At the core of our method is a distilled and fine-tuned CLIP model, used as the frozen backbone of a Faster R-CNN teacher. A small set of low-level source features is aligned to the target domain semantics (specified only through a natural language prompt) via Prompt-driven Instance Normalization (PIN). These semantically steered features are used to briefly fine-tune the detection head of the teacher model. The adapted teacher then generates high-quality pseudo-labels, which guide the on-the-fly adaptation of a compact student model. Experiments on the MDS-A dataset demonstrate that Prmpt2Adpt achieves competitive detection performance compared to state-of-the-art methods, while delivering up to 7x faster adaptation and 5x faster inference speed using few source images, making it a practical and scalable solution for real-time adaptation in low-resource domains.
Submitted 20 June, 2025;
originally announced June 2025.
-
Language Bottleneck Models: A Framework for Interpretable Knowledge Tracing and Beyond
Authors:
Antonin Berthon,
Mihaela van der Schaar
Abstract:
Accurately assessing student knowledge is critical for effective education, yet traditional Knowledge Tracing (KT) methods rely on opaque latent embeddings, limiting interpretability. Even LLM-based approaches generate direct predictions or summaries that may hallucinate without any accuracy guarantees. We recast KT as an inverse problem: learning the minimum natural-language summary that makes past answers explainable and future answers predictable. Our Language Bottleneck Model (LBM) consists of an encoder LLM that writes an interpretable knowledge summary and a frozen decoder LLM that must reconstruct and predict student responses using only that summary text. By constraining all predictive information to pass through a short natural-language bottleneck, LBMs ensure that the summary contains accurate information while remaining human-interpretable. Experiments on synthetic arithmetic benchmarks and the large-scale Eedi dataset show that LBMs rival the accuracy of state-of-the-art KT and direct LLM methods while requiring orders-of-magnitude fewer student trajectories. We demonstrate that training the encoder with group-relative policy optimization, using downstream decoding accuracy as a reward signal, effectively improves summary quality.
Submitted 20 June, 2025;
originally announced June 2025.
-
Reversing Flow for Image Restoration
Authors:
Haina Qin,
Wenyang Luo,
Libin Wang,
Dandan Zheng,
Jingdong Chen,
Ming Yang,
Bing Li,
Weiming Hu
Abstract:
Image restoration aims to recover high-quality (HQ) images from degraded low-quality (LQ) ones by reversing the effects of degradation. Existing generative models for image restoration, including diffusion and score-based models, often treat the degradation process as a stochastic transformation, which introduces inefficiency and complexity. In this work, we propose ResFlow, a novel image restoration framework that models the degradation process as a deterministic path using continuous normalizing flows. ResFlow augments the degradation process with an auxiliary process that disambiguates the uncertainty in HQ prediction to enable reversible modeling of the degradation process. ResFlow adopts entropy-preserving flow paths and learns the augmented degradation flow by matching the velocity field. ResFlow significantly improves the performance and speed of image restoration, completing the task in fewer than four sampling steps. Extensive experiments demonstrate that ResFlow achieves state-of-the-art results across various image restoration benchmarks, offering a practical and efficient solution for real-world applications.
Submitted 20 June, 2025;
originally announced June 2025.
-
Visual-Instructed Degradation Diffusion for All-in-One Image Restoration
Authors:
Wenyang Luo,
Haina Qin,
Zewen Chen,
Libin Wang,
Dandan Zheng,
Yuming Li,
Yufan Liu,
Bing Li,
Weiming Hu
Abstract:
Image restoration tasks like deblurring, denoising, and dehazing usually need distinct models for each degradation type, restricting their generalization in real-world scenarios with mixed or unknown degradations. In this work, we propose Defusion, a novel all-in-one image restoration framework that utilizes visual instruction-guided degradation diffusion. Unlike existing methods that rely on task-specific models or ambiguous text-based priors, Defusion constructs explicit visual instructions that align with the visual degradation patterns. These instructions are grounded by applying degradations to standardized visual elements, capturing intrinsic degradation features while remaining agnostic to image semantics. Defusion then uses these visual instructions to guide a diffusion-based model that operates directly in the degradation space, where it reconstructs high-quality images by denoising the degradation effects with enhanced stability and generalizability. Comprehensive experiments demonstrate that Defusion outperforms state-of-the-art methods across diverse image restoration tasks, including complex and real-world degradations.
Submitted 20 June, 2025;
originally announced June 2025.
-
The Proof Analysis Problem
Authors:
Noel Arteche,
Albert Atserias,
Susanna F. de Rezende,
Erfan Khaniki
Abstract:
Atserias and Müller (JACM, 2020) proved that for every unsatisfiable CNF formula $\varphi$, the formula $\operatorname{Ref}(\varphi)$, stating "$\varphi$ has small Resolution refutations", does not have subexponential-size Resolution refutations. Conversely, when $\varphi$ is satisfiable, Pudlák (TCS, 2003) showed how to construct a polynomial-size Resolution refutation of $\operatorname{Ref}(\varphi)$ given a satisfying assignment of $\varphi$. A question that remained open is: do all short Resolution refutations of $\operatorname{Ref}(\varphi)$ explicitly leak a satisfying assignment of $\varphi$?
We answer this question affirmatively by giving a polynomial-time algorithm that extracts a satisfying assignment for $\varphi$ given any short Resolution refutation of $\operatorname{Ref}(\varphi)$. The algorithm follows from a new feasibly constructive proof of the Atserias-Müller lower bound, formalizable in Cook's theory $\mathsf{PV_1}$ of bounded arithmetic.
Motivated by this, we introduce a computational problem concerning Resolution lower bounds: the Proof Analysis Problem (PAP). For a proof system $Q$, the Proof Analysis Problem for $Q$ asks, given a CNF formula $\varphi$ and a $Q$-proof of a Resolution lower bound for $\varphi$, encoded as $\neg \operatorname{Ref}(\varphi)$, whether $\varphi$ is satisfiable. In contrast to PAP for Resolution, we prove that PAP for Extended Frege (EF) is NP-complete.
Our results yield new insights into proof complexity: (i) every proof system simulating EF is (weakly) automatable if and only if it is (weakly) automatable on formulas stating Resolution lower bounds; (ii) we provide Ref formulas exponentially hard for bounded-depth Frege systems; and (iii) for every strong enough theory of arithmetic $T$ we construct unsatisfiable CNF formulas exponentially hard for Resolution but for which $T$ cannot prove even a quadratic lower bound.
Submitted 20 June, 2025;
originally announced June 2025.
-
LunarLoc: Segment-Based Global Localization on the Moon
Authors:
Annika Thomas,
Robaire Galliath,
Aleksander Garbuz,
Luke Anger,
Cormac O'Neill,
Trevor Johst,
Dami Thomas,
George Lordos,
Jonathan P. How
Abstract:
Global localization is necessary for autonomous operations on the lunar surface where traditional Earth-based navigation infrastructure, such as GPS, is unavailable. As NASA advances toward sustained lunar presence under the Artemis program, autonomous operations will be an essential component of tasks such as robotic exploration and infrastructure deployment. Tasks such as excavation and transport of regolith require precise pose estimation, but proposed approaches such as visual-inertial odometry (VIO) accumulate odometry drift over long traverses. Precise pose estimation is particularly important for upcoming missions such as the ISRU Pilot Excavator (IPEx) that rely on autonomous agents to operate over extended timescales and varied terrain. To help overcome odometry drift over long traverses, we propose LunarLoc, an approach to global localization that leverages instance segmentation for zero-shot extraction of boulder landmarks from onboard stereo imagery. Segment detections are used to construct a graph-based representation of the terrain, which is then aligned with a reference map of the environment captured during a previous session using graph-theoretic data association. This method enables accurate and drift-free global localization in visually ambiguous settings. LunarLoc achieves sub-cm level accuracy in multi-session global localization experiments, significantly outperforming the state of the art in lunar global localization. To encourage the development of further methods for global localization on the Moon, we release our datasets publicly with a playback module: https://github.com/mit-acl/lunarloc-data.
Submitted 20 June, 2025;
originally announced June 2025.
-
Enhancing Expressivity of Quantum Neural Networks Based on the SWAP test
Authors:
Sebastian Nagies,
Emiliano Tolotti,
Davide Pastorello,
Enrico Blanzieri
Abstract:
Parameterized quantum circuits represent promising architectures for machine learning applications, yet many lack clear connections to classical models, potentially limiting their ability to translate the wide success of classical neural networks to the quantum realm. We examine a specific type of quantum neural network (QNN) built exclusively from SWAP test circuits, and discuss its mathematical equivalence to a classical two-layer feedforward network with quadratic activation functions under amplitude encoding. Our analysis across classical real-world and synthetic datasets reveals that while this architecture can successfully learn many practical tasks, it exhibits fundamental expressivity limitations due to violating the universal approximation theorem, particularly failing on harder problems like the parity check function. To address this limitation, we introduce a circuit modification using generalized SWAP test circuits that effectively implements classical neural networks with product layers. This enhancement enables successful learning of parity check functions in arbitrary dimensions which we analytically argue to be impossible for the original architecture beyond two dimensions regardless of network size. Our results establish a framework for enhancing QNN expressivity through classical task analysis and demonstrate that our SWAP test-based architecture offers broad representational capacity, suggesting potential promise also for quantum learning tasks.
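The link between a SWAP test and a quadratic-activation neuron can be sketched numerically. The snippet below is a minimal illustration, not the authors' architecture: it uses the standard SWAP test identity, in which the ancilla is measured in |0> with probability 1/2 + 1/2 |<x|w>|^2, and assumes real-valued inputs under amplitude encoding. The function names are hypothetical.

```python
import math


def amplitude_encode(v):
    """Normalize a real vector to unit length (amplitude encoding)."""
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0:
        raise ValueError("cannot amplitude-encode the zero vector")
    return [a / norm for a in v]


def swap_test_p0(x, w):
    """Probability of measuring the SWAP test ancilla in |0>.

    Standard identity: P(0) = 1/2 + 1/2 * |<x|w>|^2.
    """
    xs, ws = amplitude_encode(x), amplitude_encode(w)
    overlap = sum(a * b for a, b in zip(xs, ws))
    return 0.5 + 0.5 * overlap ** 2


def quadratic_neuron(x, w):
    """Classical neuron on normalized inputs with activation z -> z^2."""
    xs, ws = amplitude_encode(x), amplitude_encode(w)
    pre_activation = sum(a * b for a, b in zip(xs, ws))
    return pre_activation ** 2


x, w = [3.0, 4.0], [1.0, 2.0]
# The SWAP test output is an affine function of the quadratic neuron output.
print(swap_test_p0(x, w), 0.5 + 0.5 * quadratic_neuron(x, w))
```

Identical states give P(0) = 1 and orthogonal states give P(0) = 1/2, so each SWAP test effectively reads out a squared inner product, which is the quadratic unit the abstract refers to.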
Submitted 20 June, 2025;
originally announced June 2025.
-
A deep learning and machine learning approach to predict neonatal death in the context of São Paulo
Authors:
Mohon Raihan,
Plabon Kumar Saha,
Rajan Das Gupta,
A Z M Tahmidul Kabir,
Afia Anjum Tamanna,
Md. Harun-Ur-Rashid,
Adnan Bin Abdus Salam,
Md Tanvir Anjum,
A Z M Ahteshamul Kabir
Abstract:
Neonatal death is still a concerning reality for underdeveloped and even some developed countries. Worldwide data indicate that 26.693 babies out of 1,000 births die, according to Macro Trades. To reduce this number, early prediction of endangered babies is crucial. Such prediction enables the opportunity to take ample care of the child and mother so that early child death can be avoided. In this context, machine learning was used to determine whether a newborn baby is at risk. To train the predictive model, historical data of 1.4 million newborns was used. Machine learning and deep learning techniques such as logistic regression, K-nearest neighbor, random forest classifier, extreme gradient boosting (XGBoost), convolutional neural network, and long short-term memory (LSTM) were implemented using the dataset to identify the most accurate model for predicting neonatal mortality. Among the machine learning algorithms, XGBoost and random forest classifier achieved the best accuracy with 94%, while among the deep learning models, LSTM delivered the highest accuracy with 99%. Therefore, using LSTM appears to be the most suitable approach to predict whether precautionary measures for a child are necessary.
Submitted 20 June, 2025;
originally announced June 2025.
-
Advancing Fact Attribution for Query Answering: Aggregate Queries and Novel Algorithms
Authors:
Omer Abramovich,
Daniel Deutch,
Nave Frost,
Ahmet Kara,
Dan Olteanu
Abstract:
In this paper, we introduce a novel approach to computing the contribution of input tuples to the result of the query, quantified by the Banzhaf and Shapley values. In contrast to prior algorithmic work that focuses on Select-Project-Join-Union queries, ours is the first practical approach for queries with aggregates. It relies on two novel optimizations that are essential for its practicality and significantly improve the runtime performance already for queries without aggregates. The first optimization exploits the observation that many input tuples have the same contribution to the query result, so it is enough to compute the contribution of one of them. The second optimization uses the gradient of the query lineage to compute the contributions of all tuples with the same complexity as for one of them. Experiments with a million instances over 3 databases show that our approach achieves up to 3 orders of magnitude runtime improvements over the state-of-the-art for queries without aggregates, and that it is practical for aggregate queries.
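To make the two attribution measures concrete, here is a brute-force sketch over a tiny Boolean lineage. It is not the paper's algorithm, which is designed to avoid this exponential enumeration. The lineage (a AND b) OR c, the tuple names, and the function names are hypothetical, and the Banzhaf value is reported as an unnormalized sum of marginal contributions, which is one common convention.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def lineage(present):
    """Hypothetical query lineage: (a AND b) OR c over input tuples a, b, c."""
    return int(("a" in present and "b" in present) or "c" in present)


def contributions(tuples, f):
    """Brute-force Banzhaf (unnormalized) and Shapley values of each tuple."""
    n = len(tuples)
    banzhaf, shapley = {}, {}
    for t in tuples:
        others = [u for u in tuples if u != t]
        b, s = 0, Fraction(0)
        for k in range(n):  # k = size of the subset of the other tuples
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for subset in combinations(others, k):
                base = frozenset(subset)
                marginal = f(base | {t}) - f(base)  # effect of adding t
                b += marginal
                s += weight * marginal
        banzhaf[t], shapley[t] = b, s
    return banzhaf, shapley


banzhaf, shapley = contributions(["a", "b", "c"], lineage)
print(banzhaf)
print(shapley)
```

In this toy lineage the tuples a and b are interchangeable and receive identical values, which is the kind of symmetry the first optimization in the abstract exploits: it suffices to compute the contribution of one of them.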
Submitted 20 June, 2025;
originally announced June 2025.
-
A Neural Operator based Hybrid Microscale Model for Multiscale Simulation of Rate-Dependent Materials
Authors:
Dhananjeyan Jeyaraj,
Hamidreza Eivazi,
Jendrik-Alexander Tröger,
Stefan Wittek,
Stefan Hartmann,
Andreas Rausch
Abstract:
The behavior of materials is influenced by a wide range of phenomena occurring across various time and length scales. To better understand the impact of microstructure on macroscopic response, multiscale modeling strategies are essential. Numerical methods, such as the $\text{FE}^2$ approach, account for micro-macro interactions to predict the global response in a concurrent manner. However, these methods are computationally intensive due to the repeated evaluations of the microscale. This challenge has led to the integration of deep learning techniques into computational homogenization frameworks to accelerate multiscale simulations. In this work, we employ neural operators to predict the microscale physics, resulting in a hybrid model that combines data-driven and physics-based approaches. This allows for physics-guided learning and provides flexibility for different materials and spatial discretizations. We apply this method to time-dependent solid mechanics problems involving viscoelastic material behavior, where the state is represented by internal variables only at the microscale. The constitutive relations of the microscale are incorporated into the model architecture and the internal variables are computed based on established physical principles. The results for homogenized stresses ($<6\%$ error) show that the approach is computationally efficient ($\sim 100 \times$ faster).
Submitted 20 June, 2025;
originally announced June 2025.
-
From Data to Knowledge: Evaluating How Efficiently Language Models Learn Facts
Authors:
Daniel Christoph,
Max Ploner,
Patrick Haller,
Alan Akbik
Abstract:
Sample efficiency is a crucial property of language models with practical implications for training efficiency. In real-world text, information follows a long-tailed distribution. Yet, we expect models to learn and recall frequent and infrequent facts. Sample-efficient models are better equipped to handle this challenge of learning and retaining rare information without requiring excessive exposure. This study analyzes multiple models of varying architectures and sizes, all trained on the same pre-training data. By annotating relational facts with their frequencies in the training corpus, we examine how model performance varies with fact frequency. Our findings show that most models perform similarly on high-frequency facts but differ notably on low-frequency facts. This analysis provides new insights into the relationship between model architecture, size, and factual learning efficiency.
Submitted 20 June, 2025;
originally announced June 2025.
-
Tracker Installations Are Not Created Equal: Understanding Tracker Configuration of Form Data Collection
Authors:
Julia B. Kieserman,
Athanasios Andreou,
Chris Geeng,
Tobias Lauinger,
Damon McCoy
Abstract:
Targeted advertising is fueled by the comprehensive tracking of users' online activity. As a result, advertising companies, such as Google and Meta, encourage website administrators to not only install tracking scripts on their websites but configure them to automatically collect users' Personally Identifying Information (PII). In this study, we aim to characterize how Google and Meta's trackers can be configured to collect PII data from web forms. We first perform a qualitative analysis of how third parties present form data collection to website administrators in the documentation and user interface. We then perform a measurement study of 40,150 websites to quantify the prevalence and configuration of Google and Meta trackers.
Our results reveal that both Meta and Google encourage the use of form data collection and include inaccurate statements about hashing PII as a privacy-preserving method. Additionally, we find that Meta includes configuring form data collection as part of the basic setup flow. Our large-scale measurement study reveals that while Google trackers are more prevalent than Meta trackers (72.6% vs. 28.2% of websites), Meta trackers are configured to collect form data more frequently (11.6% vs. 62.3%). Finally, we identify sensitive finance and health websites that have installed trackers that are likely configured to collect form data PII in violation of Meta and Google policies. Our study highlights how tracker documentation and interfaces can potentially play a role in users' privacy through the configuration choices made by the website administrators who install trackers.
Submitted 20 June, 2025;
originally announced June 2025.
-
Camera Calibration via Circular Patterns: A Comprehensive Framework with Measurement Uncertainty and Unbiased Projection Model
Authors:
Chaehyeon Song,
Dongjae Lee,
Jongwoo Lim,
Ayoung Kim
Abstract:
Camera calibration using planar targets has been widely favored, and two types of control points have been mainly considered as measurements: the corners of the checkerboard and the centroid of circles. Since a centroid is derived from numerous pixels, the circular pattern provides more precise measurements than the checkerboard. However, the existing projection model of circle centroids is biased under lens distortion, resulting in low performance. To surmount this limitation, we propose an unbiased projection model of the circular pattern and demonstrate its superior accuracy compared to the checkerboard. Complementing this, we introduce uncertainty into circular patterns to enhance calibration robustness and completeness. Defining centroid uncertainty improves the performance of calibration components, including pattern detection, optimization, and evaluation metrics. We also provide guidelines for performing good camera calibration based on the evaluation metric. The core concept of this approach is to model the boundary points of a two-dimensional shape as a Markov random field, considering its connectivity. The shape distribution is propagated to the centroid uncertainty through an appropriate shape representation based on the Green theorem. Consequently, the resulting framework achieves marked gains in calibration accuracy and robustness. The complete source code and demonstration video are available at https://github.com/chaehyeonsong/discocal.
Submitted 20 June, 2025;
originally announced June 2025.
-
Accountability of Robust and Reliable AI-Enabled Systems: A Preliminary Study and Roadmap
Authors:
Filippo Scaramuzza,
Damian A. Tamburri,
Willem-Jan van den Heuvel
Abstract:
This vision paper presents initial research on assessing the robustness and reliability of AI-enabled systems, and key factors in ensuring their safety and effectiveness in practical applications, including a focus on accountability. By exploring evolving definitions of these concepts and reviewing current literature, the study highlights major challenges and approaches in the field. A case study is used to illustrate real-world applications, emphasizing the need for innovative testing solutions. The incorporation of accountability is crucial for building trust and ensuring responsible AI development. The paper outlines potential future research directions and identifies existing gaps, positioning robustness, reliability, and accountability as vital areas for the development of trustworthy AI systems of the future.
Submitted 20 June, 2025;
originally announced June 2025.
-
Learning Dexterous Object Handover
Authors:
Daniel Frau-Alfaro,
Julio Castaño-Amoros,
Santiago Puente,
Pablo Gil,
Roberto Calandra
Abstract:
Object handover is an important skill that we use daily when interacting with other humans. To deploy robots in collaborative settings, such as houses, being able to receive and hand over objects safely and efficiently becomes a crucial skill. In this work, we demonstrate the use of Reinforcement Learning (RL) for dexterous object handover between two multi-finger hands. Key to this task is the use of a novel reward function based on dual quaternions to minimize the rotation distance, which outperforms other rotation representations such as Euler angles and rotation matrices. The robustness of the trained policy is experimentally evaluated by testing it on objects that are not included in the training distribution and under perturbations during the handover process. The results demonstrate that the trained policy successfully performs this task, achieving a total success rate of 94% in the best-case scenario after 100 experiments, thereby showing the robustness of our policy with novel objects. In addition, the best-case performance of the policy decreases by only 13.8% when the other robot moves during the handover, proving that our policy is also robust to this type of perturbation, which is common in real-world object handovers.
Submitted 20 June, 2025;
originally announced June 2025.
-
Self-supervised Feature Extraction for Enhanced Ball Detection on Soccer Robots
Authors:
Can Lin,
Daniele Affinita,
Marco E. P. Zimmatore,
Daniele Nardi,
Domenico D. Bloisi,
Vincenzo Suriani
Abstract:
Robust and accurate ball detection is a critical component for autonomous humanoid soccer robots, particularly in dynamic and challenging environments such as RoboCup outdoor fields. However, traditional supervised approaches require extensive manual annotation, which is costly and time-intensive. To overcome this problem, we present a self-supervised learning framework for domain-adaptive feature extraction to enhance ball detection performance. The proposed approach leverages a general-purpose pretrained model to generate pseudo-labels, which are then used in a suite of self-supervised pretext tasks -- including colorization, edge detection, and triplet loss -- to learn robust visual features without relying on manual annotations. Additionally, a model-agnostic meta-learning (MAML) strategy is incorporated to ensure rapid adaptation to new deployment scenarios with minimal supervision. A new dataset comprising 10,000 labeled images from outdoor RoboCup SPL matches is introduced, used to validate the method, and made available to the community. Experimental results demonstrate that the proposed pipeline outperforms baseline models in terms of accuracy, F1 score, and IoU, while also exhibiting faster convergence.
Submitted 20 June, 2025;
originally announced June 2025.
-
Zero-Knowledge Proof-of-Location Protocols for Vehicle Subsidies and Taxation Compliance
Authors:
Dan Bogdanov,
Eduardo Brito,
Annika Jaakson,
Peeter Laud,
Raul-Martin Rebane
Abstract:
This paper introduces a new set of privacy-preserving mechanisms for verifying compliance with location-based policies for vehicle taxation, or for (electric) vehicle (EV) subsidies, using Zero-Knowledge Proofs (ZKPs). We present the design and evaluation of a Zero-Knowledge Proof-of-Location (ZK-PoL) system that ensures a vehicle's adherence to territorial driving requirements without disclosing specific location data, hence maintaining user privacy. Our findings suggest a promising approach to apply ZK-PoL protocols in large-scale governmental subsidy or taxation programs.
Submitted 20 June, 2025;
originally announced June 2025.
-
Seeing What Matters: Generalizable AI-generated Video Detection with Forensic-Oriented Augmentation
Authors:
Riccardo Corvi,
Davide Cozzolino,
Ekta Prashnani,
Shalini De Mello,
Koki Nagano,
Luisa Verdoliva
Abstract:
Synthetic video generation is progressing very rapidly. The latest models can produce very realistic high-resolution videos that are virtually indistinguishable from real ones. Although several video forensic detectors have been recently proposed, they often exhibit poor generalization, which limits their applicability in a real-world scenario. Our key insight to overcome this issue is to guide the detector towards seeing what really matters. In fact, a well-designed forensic classifier should focus on identifying intrinsic low-level artifacts introduced by a generative architecture rather than relying on high-level semantic flaws that characterize a specific model. In this work, first, we study different generative architectures, searching for and identifying discriminative features that are unbiased, robust to impairments, and shared across models. Then, we introduce a novel forensic-oriented data augmentation strategy, based on wavelet decomposition, that replaces specific frequency-related bands to drive the model to exploit more relevant forensic cues. Our novel training paradigm improves the generalizability of AI-generated video detectors, without the need for complex algorithms and large datasets that include multiple synthetic generators. To evaluate our approach, we train the detector using data from a single generative model and test it against videos produced by a wide range of other models. Despite its simplicity, our method achieves a significant accuracy improvement over state-of-the-art detectors and obtains excellent results even on very recent generative models, such as NOVA and FLUX. Code and data will be made publicly available.
Submitted 20 June, 2025;
originally announced June 2025.
-
A Generic Construction of $q$-ary Near-MDS Codes Supporting 2-Designs with Lengths Beyond $q+1$
Authors:
Hengfeng Liu,
Chunming Tang,
Zhengchun Zhou,
Dongchun Han,
Hao Chen
Abstract:
A linear code with parameters $[n, k, n - k + 1]$ is called maximum distance separable (MDS), and one with parameters $[n, k, n - k]$ is called almost MDS (AMDS). A code is near-MDS (NMDS) if both it and its dual are AMDS. NMDS codes supporting combinatorial $t$-designs have attracted growing interest, yet constructing such codes remains highly challenging. In 2020, Ding and Tang initiated the study of NMDS codes supporting 2-designs by constructing the first infinite family, followed by several other constructions for $t > 2$, all with length at most $q + 1$. Although NMDS codes can, in principle, exceed this length, known examples supporting 2-designs and having length greater than $q + 1$ are extremely rare and limited to a few sporadic binary and ternary cases. In this paper, we present the first generic construction of $q$-ary NMDS codes supporting 2-designs with lengths exceeding $q + 1$. Our method leverages new connections between elliptic curve codes, finite abelian groups, subset sums, and combinatorial designs, resulting in an infinite family of such codes along with their weight distributions.
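The MDS, AMDS, and NMDS definitions above reduce to simple arithmetic on code parameters, which the following sketch checks. It only classifies given parameters and does not construct any code; the helper names are hypothetical, and the two examples (the ternary Golay code and a Reed-Solomon code over GF(8)) are well-known codes chosen for illustration, not taken from the paper.

```python
def singleton_defect(n, k, d):
    """Gap to the Singleton bound d <= n - k + 1 for an [n, k, d] code."""
    defect = n - k + 1 - d
    if defect < 0:
        raise ValueError("parameters violate the Singleton bound")
    return defect


def classify(n, k, d, d_dual):
    """Classify a linear [n, k, d] code whose dual is [n, n - k, d_dual]."""
    s = singleton_defect(n, k, d)
    s_dual = singleton_defect(n, n - k, d_dual)
    if s == 0:
        return "MDS"  # d = n - k + 1
    if s == 1 and s_dual == 1:
        return "NMDS"  # both the code and its dual are AMDS
    if s == 1:
        return "AMDS"  # d = n - k, but the dual is not AMDS
    return "neither"


# Ternary Golay code [11, 6, 5] with dual [11, 5, 6].
print(classify(11, 6, 5, 6))
# Reed-Solomon code [7, 3, 5] over GF(8) with dual [7, 4, 4].
print(classify(7, 3, 5, 4))
```

Equivalently, an $[n, k, d]$ code is NMDS exactly when $d = n - k$ and its dual has minimum distance $k$.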
Submitted 20 June, 2025;
originally announced June 2025.
-
TabArena: A Living Benchmark for Machine Learning on Tabular Data
Authors:
Nick Erickson,
Lennart Purucker,
Andrej Tschalzev,
David Holzmüller,
Prateek Mutalik Desai,
David Salinas,
Frank Hutter
Abstract:
With the growing popularity of deep learning and foundation models for tabular data, the need for standardized and reliable benchmarks is higher than ever. However, current benchmarks are static. Their design is not updated even if flaws are discovered, model versions are updated, or new models are released. To address this, we introduce TabArena, the first continuously maintained living tabular benchmarking system. To launch TabArena, we manually curate a representative collection of datasets and well-implemented models, conduct a large-scale benchmarking study to initialize a public leaderboard, and assemble a team of experienced maintainers. Our results highlight the influence of validation method and ensembling of hyperparameter configurations to benchmark models at their full potential. While gradient-boosted trees are still strong contenders on practical tabular datasets, we observe that deep learning methods have caught up under larger time budgets with ensembling. At the same time, foundation models excel on smaller datasets. Finally, we show that ensembles across models advance the state-of-the-art in tabular machine learning and investigate the contributions of individual models. We launch TabArena with a public leaderboard, reproducible code, and maintenance protocols to create a living benchmark available at https://tabarena.ai.
Submitted 20 June, 2025;
originally announced June 2025.