-
Whole-Body Conditioned Egocentric Video Prediction
Authors:
Yutong Bai,
Danny Tran,
Amir Bar,
Yann LeCun,
Trevor Darrell,
Jitendra Malik
Abstract:
We train models to Predict Ego-centric Video from human Actions (PEVA), given the past video and an action represented by the relative 3D body pose. By conditioning on kinematic pose trajectories, structured by the joint hierarchy of the body, our model learns to simulate how physical human actions shape the environment from a first-person point of view. We train an auto-regressive conditional dif…
▽ More
We train models to Predict Ego-centric Video from human Actions (PEVA), given the past video and an action represented by the relative 3D body pose. By conditioning on kinematic pose trajectories, structured by the joint hierarchy of the body, our model learns to simulate how physical human actions shape the environment from a first-person point of view. We train an auto-regressive conditional diffusion transformer on Nymeria, a large-scale dataset of real-world egocentric video and body pose capture. We further design a hierarchical evaluation protocol with increasingly challenging tasks, enabling a comprehensive analysis of the model's embodied prediction and control abilities. Our work represents an initial attempt to tackle the challenges of modeling complex real-world environments and embodied agent behaviors with video prediction from the perspective of a human.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
SiM3D: Single-instance Multiview Multimodal and Multisetup 3D Anomaly Detection Benchmark
Authors:
Alex Costanzino,
Pierluigi Zama Ramirez,
Luigi Lella,
Matteo Ragaglia,
Alessandro Oliva,
Giuseppe Lisanti,
Luigi Di Stefano
Abstract:
We propose SiM3D, the first benchmark considering the integration of multiview and multimodal information for comprehensive 3D anomaly detection and segmentation (ADS), where the task is to produce a voxel-based Anomaly Volume. Moreover, SiM3D focuses on a scenario of high interest in manufacturing: single-instance anomaly detection, where only one object, either real or synthetic, is available fo…
▽ More
We propose SiM3D, the first benchmark considering the integration of multiview and multimodal information for comprehensive 3D anomaly detection and segmentation (ADS), where the task is to produce a voxel-based Anomaly Volume. Moreover, SiM3D focuses on a scenario of high interest in manufacturing: single-instance anomaly detection, where only one object, either real or synthetic, is available for training. In this respect, SiM3D stands out as the first ADS benchmark that addresses the challenge of generalising from synthetic training data to real test data. SiM3D includes a novel multimodal multiview dataset acquired using top-tier industrial sensors and robots. The dataset features multiview high-resolution images (12 Mpx) and point clouds (7M points) for 333 instances of eight types of objects, alongside a CAD model for each type. We also provide manually annotated 3D segmentation GTs for anomalous test samples. To establish reference baselines for the proposed multiview 3D ADS task, we adapt prominent singleview methods and assess their performance using novel metrics that operate on Anomaly Volumes.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation
Authors:
Xinzhuo Li,
Adheesh Juvekar,
Xingyou Liu,
Muntasir Wahed,
Kiet A. Nguyen,
Ismini Lourentzou
Abstract:
Recent progress in vision-language segmentation has significantly advanced grounded visual understanding. However, these models often exhibit hallucinations by producing segmentation masks for objects not grounded in the image content or by incorrectly labeling irrelevant regions. Existing evaluation protocols for segmentation hallucination primarily focus on label or textual hallucinations withou…
▽ More
Recent progress in vision-language segmentation has significantly advanced grounded visual understanding. However, these models often exhibit hallucinations by producing segmentation masks for objects not grounded in the image content or by incorrectly labeling irrelevant regions. Existing evaluation protocols for segmentation hallucination primarily focus on label or textual hallucinations without manipulating the visual context, limiting their capacity to diagnose critical failures. In response, we introduce HalluSegBench, the first benchmark specifically designed to evaluate hallucinations in visual grounding through the lens of counterfactual visual reasoning. Our benchmark consists of a novel dataset of 1340 counterfactual instance pairs spanning 281 unique object classes, and a set of newly introduced metrics that quantify hallucination sensitivity under visually coherent scene edits. Experiments on HalluSegBench with state-of-the-art vision-language segmentation models reveal that vision-driven hallucinations are significantly more prevalent than label-driven ones, with models often persisting in false segmentation, highlighting the need for counterfactual reasoning to diagnose grounding fidelity.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Detecting weighted hidden cliques
Authors:
Urmisha Chatterjee,
Karissa Huang,
Ritabrata Karmakar,
B. R. Vinay Kumar,
Gábor Lugosi,
Nandan Malhotra,
Anirban Mandal,
Maruf Alam Tarafdar
Abstract:
We study a generalization of the classical hidden clique problem to graphs with real-valued edge weights. Formally, we define a hypothesis testing problem. Under the null hypothesis, edges of a complete graph on $n$ vertices are associated with independent and identically distributed edge weights from a distribution $P$. Under the alternate hypothesis, $k$ vertices are chosen at random and the edg…
▽ More
We study a generalization of the classical hidden clique problem to graphs with real-valued edge weights. Formally, we define a hypothesis testing problem. Under the null hypothesis, edges of a complete graph on $n$ vertices are associated with independent and identically distributed edge weights from a distribution $P$. Under the alternate hypothesis, $k$ vertices are chosen at random and the edge weights between them are drawn from a distribution $Q$, while the remaining are sampled from $P$. The goal is to decide, upon observing the edge weights, which of the two hypotheses they were generated from. We investigate the problem under two different scenarios: (1) when $P$ and $Q$ are completely known, and (2) when there is only partial information of $P$ and $Q$. In the first scenario, we obtain statistical limits on $k$ when the two hypotheses are distinguishable, and when they are not. Additionally, in each of the scenarios, we provide bounds on the minimal risk of the hypothesis testing problem when $Q$ is not absolutely continuous with respect to $P$. We also provide computationally efficient spectral tests that can distinguish the two hypotheses as long as $k=Ω(\sqrt{n})$ in both the scenarios.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval
Authors:
Hani Alomari,
Anushka Sivakumar,
Andrew Zhang,
Chris Thomas
Abstract:
Cross-modal image-text retrieval is challenging because of the diverse possible associations between content from different modalities. Traditional methods learn a single-vector embedding to represent semantics of each sample, but struggle to capture nuanced and diverse relationships that can exist across modalities. Set-based approaches, which represent each sample with multiple embeddings, offer…
▽ More
Cross-modal image-text retrieval is challenging because of the diverse possible associations between content from different modalities. Traditional methods learn a single-vector embedding to represent semantics of each sample, but struggle to capture nuanced and diverse relationships that can exist across modalities. Set-based approaches, which represent each sample with multiple embeddings, offer a promising alternative, as they can capture richer and more diverse relationships. In this paper, we show that, despite their promise, these set-based representations continue to face issues including sparse supervision and set collapse, which limits their effectiveness. To address these challenges, we propose Maximal Pair Assignment Similarity to optimize one-to-one matching between embedding sets which preserve semantic diversity within the set. We also introduce two loss functions to further enhance the representations: Global Discriminative Loss to enhance distinction among embeddings, and Intra-Set Divergence Loss to prevent collapse within each set. Our method achieves state-of-the-art performance on MS-COCO and Flickr30k without relying on external data.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Exploring the Design Space of 3D MLLMs for CT Report Generation
Authors:
Mohammed Baharoon,
Jun Ma,
Congyu Fang,
Augustin Toma,
Bo Wang
Abstract:
Multimodal Large Language Models (MLLMs) have emerged as a promising way to automate Radiology Report Generation (RRG). In this work, we systematically investigate the design space of 3D MLLMs, including visual input representation, projectors, Large Language Models (LLMs), and fine-tuning techniques for 3D CT report generation. We also introduce two knowledge-based report augmentation methods tha…
▽ More
Multimodal Large Language Models (MLLMs) have emerged as a promising way to automate Radiology Report Generation (RRG). In this work, we systematically investigate the design space of 3D MLLMs, including visual input representation, projectors, Large Language Models (LLMs), and fine-tuning techniques for 3D CT report generation. We also introduce two knowledge-based report augmentation methods that improve performance on the GREEN score by up to 10\%, achieving the 2nd place on the MICCAI 2024 AMOS-MM challenge. Our results on the 1,687 cases from the AMOS-MM dataset show that RRG is largely independent of the size of LLM under the same training protocol. We also show that larger volume size does not always improve performance if the original ViT was pre-trained on a smaller volume size. Lastly, we show that using a segmentation mask along with the CT volume improves performance. The code is publicly available at https://github.com/bowang-lab/AMOS-MM-Solution
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
"What's Up, Doc?": Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets
Authors:
Akshay Paruchuri,
Maryam Aziz,
Rohit Vartak,
Ayman Ali,
Best Uchehara,
Xin Liu,
Ishan Chatterjee,
Monica Agrawal
Abstract:
People are increasingly seeking healthcare information from large language models (LLMs) via interactive chatbots, yet the nature and inherent risks of these conversations remain largely unexplored. In this paper, we filter large-scale conversational AI datasets to achieve HealthChat-11K, a curated dataset of 11K real-world conversations composed of 25K user messages. We use HealthChat-11K and a c…
▽ More
People are increasingly seeking healthcare information from large language models (LLMs) via interactive chatbots, yet the nature and inherent risks of these conversations remain largely unexplored. In this paper, we filter large-scale conversational AI datasets to achieve HealthChat-11K, a curated dataset of 11K real-world conversations composed of 25K user messages. We use HealthChat-11K and a clinician-driven taxonomy for how users interact with LLMs when seeking healthcare information in order to systematically study user interactions across 21 distinct health specialties. Our analysis reveals insights into the nature of how and why users seek health information, such as common interactions, instances of incomplete context, affective behaviors, and interactions (e.g., leading questions) that can induce sycophancy, underscoring the need for improvements in the healthcare support capabilities of LLMs deployed as conversational AI. Code and artifacts to retrieve our analyses and combine them into a curated dataset can be found here: https://github.com/yahskapar/HealthChat
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
G$^{2}$D: Boosting Multimodal Learning with Gradient-Guided Distillation
Authors:
Mohammed Rakib,
Arunkumar Bagavathi
Abstract:
Multimodal learning aims to leverage information from diverse data modalities to achieve more comprehensive performance. However, conventional multimodal models often suffer from modality imbalance, where one or a few modalities dominate model optimization, leading to suboptimal feature representation and underutilization of weak modalities. To address this challenge, we introduce Gradient-Guided…
▽ More
Multimodal learning aims to leverage information from diverse data modalities to achieve more comprehensive performance. However, conventional multimodal models often suffer from modality imbalance, where one or a few modalities dominate model optimization, leading to suboptimal feature representation and underutilization of weak modalities. To address this challenge, we introduce Gradient-Guided Distillation (G$^{2}$D), a knowledge distillation framework that optimizes the multimodal model with a custom-built loss function that fuses both unimodal and multimodal objectives. G$^{2}$D further incorporates a dynamic sequential modality prioritization (SMP) technique in the learning process to ensure each modality leads the learning process, avoiding the pitfall of stronger modalities overshadowing weaker ones. We validate G$^{2}$D on multiple real-world datasets and show that G$^{2}$D amplifies the significance of weak modalities while training and outperforms state-of-the-art methods in classification and regression tasks. Our code is available at https://github.com/rAIson-Lab/G2D.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Gaussian Invariant Markov Chain Monte Carlo
Authors:
Michalis K. Titsias,
Angelos Alexopoulos,
Siran Liu,
Petros Dellaportas
Abstract:
We develop sampling methods, which consist of Gaussian invariant versions of random walk Metropolis (RWM), Metropolis adjusted Langevin algorithm (MALA) and second order Hessian or Manifold MALA. Unlike standard RWM and MALA we show that Gaussian invariant sampling can lead to ergodic estimators with improved statistical efficiency. This is due to a remarkable property of Gaussian invariance that…
▽ More
We develop sampling methods, which consist of Gaussian invariant versions of random walk Metropolis (RWM), Metropolis adjusted Langevin algorithm (MALA) and second order Hessian or Manifold MALA. Unlike standard RWM and MALA we show that Gaussian invariant sampling can lead to ergodic estimators with improved statistical efficiency. This is due to a remarkable property of Gaussian invariance that allows us to obtain exact analytical solutions to the Poisson equation for Gaussian targets. These solutions can be used to construct efficient and easy to use control variates for variance reduction of estimators under any intractable target. We demonstrate the new samplers and estimators in several examples, including high dimensional targets in latent Gaussian models where we compare against several advanced methods and obtain state-of-the-art results. We also provide theoretical results regarding geometric ergodicity, and an optimal scaling analysis that shows the dependence of the optimal acceptance rate on the Gaussianity of the target.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
skLEP: A Slovak General Language Understanding Benchmark
Authors:
Marek Šuppa,
Andrej Ridzik,
Daniel Hládek,
Tomáš Javůrek,
Viktória Ondrejová,
Kristína Sásiková,
Martin Tamajka,
Marián Šimko
Abstract:
In this work, we introduce skLEP, the first comprehensive benchmark specifically designed for evaluating Slovak natural language understanding (NLU) models. We have compiled skLEP to encompass nine diverse tasks that span token-level, sentence-pair, and document-level challenges, thereby offering a thorough assessment of model capabilities. To create this benchmark, we curated new, original datase…
▽ More
In this work, we introduce skLEP, the first comprehensive benchmark specifically designed for evaluating Slovak natural language understanding (NLU) models. We have compiled skLEP to encompass nine diverse tasks that span token-level, sentence-pair, and document-level challenges, thereby offering a thorough assessment of model capabilities. To create this benchmark, we curated new, original datasets tailored for Slovak and meticulously translated established English NLU resources. Within this paper, we also present the first systematic and extensive evaluation of a wide array of Slovak-specific, multilingual, and English pre-trained language models using the skLEP tasks. Finally, we also release the complete benchmark data, an open-source toolkit facilitating both fine-tuning and evaluation of models, and a public leaderboard at https://github.com/slovak-nlp/sklep in the hopes of fostering reproducibility and drive future research in Slovak NLU.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
Authors:
Boyu Gou,
Zanming Huang,
Yuting Ning,
Yu Gu,
Michael Lin,
Weijian Qi,
Andrei Kopanev,
Botao Yu,
Bernal Jiménez Gutiérrez,
Yiheng Shu,
Chan Hee Song,
Jiaman Wu,
Shijie Chen,
Hanane Nour Moussa,
Tianshu Zhang,
Jian Xie,
Yifei Li,
Tianci Xue,
Zeyi Liao,
Kai Zhang,
Boyuan Zheng,
Zhaowei Cai,
Viktor Rozgic,
Morteza Ziyadi,
Huan Sun
, et al. (1 additional authors not shown)
Abstract:
Agentic search such as Deep Research systems, where large language models autonomously browse the web, synthesize information, and return comprehensive citation-backed answers, represents a major shift in how users interact with web-scale information. While promising greater efficiency and cognitive offloading, the growing complexity and open-endedness of agentic search have outpaced existing eval…
▽ More
Agentic search such as Deep Research systems, where large language models autonomously browse the web, synthesize information, and return comprehensive citation-backed answers, represents a major shift in how users interact with web-scale information. While promising greater efficiency and cognitive offloading, the growing complexity and open-endedness of agentic search have outpaced existing evaluation benchmarks and methodologies, which largely assume short search horizons and static answers. In this paper, we introduce Mind2Web 2, a benchmark of 130 realistic, high-quality, and long-horizon tasks that require real-time web browsing and extensive information synthesis, constructed with over 1,000 hours of human labor. To address the challenge of evaluating time-varying and complex answers, we propose a novel Agent-as-a-Judge framework. Our method constructs task-specific judge agents based on a tree-structured rubric design to automatically assess both answer correctness and source attribution. We conduct a comprehensive evaluation of nine frontier agentic search systems and human performance, along with a detailed error analysis to draw insights for future development. The best-performing system, OpenAI Deep Research, can already achieve 50-70% of human performance while spending half the time, showing a great potential. Altogether, Mind2Web 2 provides a rigorous foundation for developing and benchmarking the next generation of agentic search systems.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Bridging Offline and Online Reinforcement Learning for LLMs
Authors:
Jack Lanchantin,
Angelica Chen,
Janice Lan,
Xian Li,
Swarnadeep Saha,
Tianlu Wang,
Jing Xu,
Ping Yu,
Weizhe Yuan,
Jason E Weston,
Sainbayar Sukhbaatar,
Ilia Kulikov
Abstract:
We investigate the effectiveness of reinforcement learning methods for finetuning large language models when transitioning from offline to semi-online to fully online regimes for both verifiable and non-verifiable tasks. Our experiments cover training on verifiable math as well as non-verifiable instruction following with a set of benchmark evaluations for both. Across these settings, we extensive…
▽ More
We investigate the effectiveness of reinforcement learning methods for finetuning large language models when transitioning from offline to semi-online to fully online regimes for both verifiable and non-verifiable tasks. Our experiments cover training on verifiable math as well as non-verifiable instruction following with a set of benchmark evaluations for both. Across these settings, we extensively compare online and semi-online Direct Preference Optimization and Group Reward Policy Optimization objectives, and surprisingly find similar performance and convergence between these variants, which all strongly outperform offline methods. We provide a detailed analysis of the training dynamics and hyperparameter selection strategies to achieve optimal results. Finally, we show that multi-tasking with verifiable and non-verifiable rewards jointly yields improved performance across both task types.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Ad-Hoc Human-AI Coordination Challenge
Authors:
Tin Dizdarević,
Ravi Hammond,
Tobias Gessler,
Anisoara Calinescu,
Jonathan Cook,
Matteo Gallici,
Andrei Lupu,
Jakob Nicolaus Foerster
Abstract:
Achieving seamless coordination between AI agents and humans is crucial for real-world applications, yet it remains a significant open challenge. Hanabi is a cooperative card game featuring imperfect information, constrained communication, theory of mind requirements, and coordinated action -- making it an ideal testbed for human-AI coordination. However, its use for human-AI interaction has been…
▽ More
Achieving seamless coordination between AI agents and humans is crucial for real-world applications, yet it remains a significant open challenge. Hanabi is a cooperative card game featuring imperfect information, constrained communication, theory of mind requirements, and coordinated action -- making it an ideal testbed for human-AI coordination. However, its use for human-AI interaction has been limited by the challenges of human evaluation. In this work, we introduce the Ad-Hoc Human-AI Coordination Challenge (AH2AC2) to overcome the constraints of costly and difficult-to-reproduce human evaluations. We develop \textit{human proxy agents} on a large-scale human dataset that serve as robust, cheap, and reproducible human-like evaluation partners in AH2AC2. To encourage the development of data-efficient methods, we open-source a dataset of 3,079 games, deliberately limiting the amount of available human gameplay data. We present baseline results for both two- and three- player Hanabi scenarios. To ensure fair evaluation, we host the proxy agents through a controlled evaluation system rather than releasing them publicly. The code is available at \href{https://github.com/FLAIROx/ah2ac2}{https://github.com/FLAIROx/ah2ac2}.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
OptGM: An Optimized Gate Merging Method to Mitigate NBTI in Digital Circuits
Authors:
Maryam Ghane,
Amir M. Hajisadeghi,
Hamid R. Zarandi
Abstract:
This paper presents OptGM, an optimized gate merging method designed to mitigate negative bias temperature instability (NBTI) in digital circuits. First, the proposed approach effectively identifies NBTI-critical internal nodes, defined as those with a signal probability exceeding a predefined threshold. Next, based on the proposed optimized algorithm, the sensitizer gate (which drives the critica…
▽ More
This paper presents OptGM, an optimized gate merging method designed to mitigate negative bias temperature instability (NBTI) in digital circuits. First, the proposed approach effectively identifies NBTI-critical internal nodes, defined as those with a signal probability exceeding a predefined threshold. Next, based on the proposed optimized algorithm, the sensitizer gate (which drives the critical node) and the sensitive gate (which is fed by it) are merged into a new complex gate. This complex gate preserves the original logic while eliminating NBTI-critical nodes. Finally, to evaluate the effectiveness of OptGM, we assess it on several combinational and sequential benchmark circuits. Simulation results demonstrate that, on average, the number of NBTI-critical transistors (i.e., PMOS transistors connected to critical nodes), NBTI-induced delay degradation, and the total transistor count are reduced by 89.29%, 23.87%, and 6.47%, respectively. Furthermore, OptGM enhances performance per cost (PPC) by 12.8% on average, with minimal area overhead.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Global and Local Entailment Learning for Natural World Imagery
Authors:
Srikumar Sastry,
Aayush Dhakal,
Eric Xing,
Subash Khanal,
Nathan Jacobs
Abstract:
Learning the hierarchical structure of data in vision-language models is a significant challenge. Previous works have attempted to address this challenge by employing entailment learning. However, these approaches fail to model the transitive nature of entailment explicitly, which establishes the relationship between order and semantics within a representation space. In this work, we introduce Rad…
▽ More
Learning the hierarchical structure of data in vision-language models is a significant challenge. Previous works have attempted to address this challenge by employing entailment learning. However, these approaches fail to model the transitive nature of entailment explicitly, which establishes the relationship between order and semantics within a representation space. In this work, we introduce Radial Cross-Modal Embeddings (RCME), a framework that enables the explicit modeling of transitivity-enforced entailment. Our proposed framework optimizes for the partial order of concepts within vision-language models. By leveraging our framework, we develop a hierarchical vision-language foundation model capable of representing the hierarchy in the Tree of Life. Our experiments on hierarchical species classification and hierarchical retrieval tasks demonstrate the enhanced performance of our models compared to the existing state-of-the-art models. Our code and models are open-sourced at https://vishu26.github.io/RCME/index.html.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Efficient and Reuseable Cloud Configuration Search Using Discovery Spaces
Authors:
Michael Johnston,
Burkhard Ringlein,
Christoph Hagleitner,
Alessandro Pomponio,
Vassilis Vassiliadis,
Christian Pinto,
Srikumar Venugopal
Abstract:
Finding the optimal set of cloud resources to deploy a given workload at minimal cost while meeting a defined service level agreement is an active area of research. Combining tens of parameters applicable across a large selection of compute, storage, and services offered by cloud providers with similar numbers of application-specific parameters leads to configuration spaces with millions of deploy…
▽ More
Finding the optimal set of cloud resources to deploy a given workload at minimal cost while meeting a defined service level agreement is an active area of research. Combining tens of parameters applicable across a large selection of compute, storage, and services offered by cloud providers with similar numbers of application-specific parameters leads to configuration spaces with millions of deployment options.
In this paper, we propose Discovery Space, an abstraction that formalizes the description of workload configuration problems, and exhibits a set of characteristics required for structured, robust and distributed investigations of large search spaces. We describe a concrete implementation of the Discovery Space abstraction and show that it is generalizable across a diverse set of workloads such as Large Language Model inference and Big Data Analytics.
We demonstrate that our approach enables safe, transparent sharing of data between executions of best-of-breed optimizers increasing the efficiency of optimal configuration detection in large search spaces. We also demonstrate how Discovery Spaces enable transfer and reuse of knowledge across similar search spaces, enabling configuration search speed-ups of over 90%.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Aligning Spoken Dialogue Models from User Interactions
Authors:
Anne Wu,
Laurent Mazaré,
Neil Zeghidour,
Alexandre Défossez
Abstract:
We propose a novel preference alignment framework for improving spoken dialogue models on real-time conversations from user interactions. Current preference learning methods primarily focus on text-based language models, and are not directly suited to the complexities of real-time speech interactions, with richer dynamics (e.g. interruption, interjection) and no explicit segmentation between speak…
▽ More
We propose a novel preference alignment framework for improving spoken dialogue models on real-time conversations from user interactions. Current preference learning methods primarily focus on text-based language models, and are not directly suited to the complexities of real-time speech interactions, with richer dynamics (e.g. interruption, interjection) and no explicit segmentation between speaker turns.We create a large-scale dataset of more than 150,000 preference pairs from raw multi-turn speech conversations, annotated with AI feedback, to cover preferences over both linguistic content and temporal context variations. We leverage offline alignment methods to finetune a full-duplex autoregressive speech-to-speech model. Extensive experiments demonstrate that feedback on generic conversations can be consistently effective in improving spoken dialogue models to produce more factual, safer and more contextually aligned interactions. We deploy the finetuned model and conduct holistic human evaluations to assess the impact beyond single-turn conversations. Our findings shed light on the importance of a well-calibrated balance among various dynamics, crucial for natural real-time speech dialogue systems.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
A Keyword-Based Technique to Evaluate Broad Question Answer Script
Authors:
Tamim Al Mahmud,
Md Gulzar Hussain,
Sumaiya Kabir,
Hasnain Ahmad,
Mahmudus Sobhan
Abstract:
Evaluation is the method of assessing and determining the educational system through various techniques such as verbal or viva-voice test, subjective or objective written test. This paper presents an efficient solution to evaluate the subjective answer script electronically. In this paper, we proposed and implemented an integrated system that examines and evaluates the written answer script. This…
▽ More
Evaluation is the method of assessing and determining the educational system through various techniques such as verbal or viva-voice test, subjective or objective written test. This paper presents an efficient solution to evaluate the subjective answer script electronically. In this paper, we proposed and implemented an integrated system that examines and evaluates the written answer script. This article focuses on finding the keywords from the answer script and then compares them with the keywords that have been parsed from both open and closed domain. The system also checks the grammatical and spelling errors in the answer script. Our proposed system tested with answer scripts of 100 students and gives precision score 0.91.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Towards an Optimal Control Perspective of ResNet Training
Authors:
Jens Püttschneider,
Simon Heilig,
Asja Fischer,
Timm Faulwasser
Abstract:
We propose a training formulation for ResNets reflecting an optimal control problem that is applicable for standard architectures and general loss functions. We suggest bridging both worlds via penalizing intermediate outputs of hidden states corresponding to stage cost terms in optimal control. For standard ResNets, we obtain intermediate outputs by propagating the state through the subsequent sk…
▽ More
We propose a training formulation for ResNets reflecting an optimal control problem that is applicable for standard architectures and general loss functions. We suggest bridging both worlds via penalizing intermediate outputs of hidden states corresponding to stage cost terms in optimal control. For standard ResNets, we obtain intermediate outputs by propagating the state through the subsequent skip connections and the output layer. We demonstrate that our training dynamic biases the weights of the unnecessary deeper residual layers to vanish. This indicates the potential for a theory-grounded layer pruning strategy.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
A Comprehensive Dataset for Underground Miner Detection in Diverse Scenario
Authors:
Cyrus Addy,
Ajay Kumar Gurumadaiah,
Yixiang Gao,
Kwame Awuah-Offei
Abstract:
Underground mining operations face significant safety challenges that make emergency response capabilities crucial. While robots have shown promise in assisting with search and rescue operations, their effectiveness depends on reliable miner detection capabilities. Deep learning algorithms offer potential solutions for automated miner detection, but require comprehensive training datasets, which a…
▽ More
Underground mining operations face significant safety challenges that make emergency response capabilities crucial. While robots have shown promise in assisting with search and rescue operations, their effectiveness depends on reliable miner detection capabilities. Deep learning algorithms offer potential solutions for automated miner detection, but require comprehensive training datasets, which are currently lacking for underground mining environments. This paper presents a novel thermal imaging dataset specifically designed to enable the development and validation of miner detection systems for potential emergency applications. We systematically captured thermal imagery of various mining activities and scenarios to create a robust foundation for detection algorithms. To establish baseline performance metrics, we evaluated several state-of-the-art object detection algorithms including YOLOv8, YOLOv10, YOLO11, and RT-DETR on our dataset. While not exhaustive of all possible emergency situations, this dataset serves as a crucial first step toward developing reliable thermal-based miner detection systems that could eventually be deployed in real emergency scenarios. This work demonstrates the feasibility of using thermal imaging for miner detection and establishes a foundation for future research in this critical safety application.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Controllable 3D Placement of Objects with Scene-Aware Diffusion Models
Authors:
Mohamed Omran,
Dimitris Kalatzis,
Jens Petersen,
Amirhossein Habibian,
Auke Wiggers
Abstract:
Image editing approaches have become more powerful and flexible with the advent of powerful text-conditioned generative models. However, placing objects in an environment with a precise location and orientation still remains a challenge, as this typically requires carefully crafted inpainting masks or prompts. In this work, we show that a carefully designed visual map, combined with coarse object…
▽ More
Image editing approaches have become more powerful and flexible with the advent of powerful text-conditioned generative models. However, placing objects in an environment with a precise location and orientation still remains a challenge, as this typically requires carefully crafted inpainting masks or prompts. In this work, we show that a carefully designed visual map, combined with coarse object masks, is sufficient for high quality object placement. We design a conditioning signal that resolves ambiguities, while being flexible enough to allow for changing of shapes or object orientations. By building on an inpainting model, we leave the background intact by design, in contrast to methods that model objects and background jointly. We demonstrate the effectiveness of our method in the automotive setting, where we compare different conditioning signals in novel object placement tasks. These tasks are designed to measure edit quality not only in terms of appearance, but also in terms of pose and location accuracy, including cases that require non-trivial shape changes. Lastly, we show that fine location control can be combined with appearance control to place existing objects in precise locations in a scene.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Benchmarking Deep Learning and Vision Foundation Models for Atypical vs. Normal Mitosis Classification with Cross-Dataset Evaluation
Authors:
Sweta Banerjee,
Viktoria Weiss,
Taryn A. Donovan,
Rutger A. Fick,
Thomas Conrad,
Jonas Ammeling,
Nils Porsche,
Robert Klopfleisch,
Christopher Kaltenecker,
Katharina Breininger,
Marc Aubreville,
Christof A. Bertram
Abstract:
Atypical mitoses mark a deviation in the cell division process that can be an independent prognostically relevant marker for tumor malignancy. However, their identification remains challenging due to low prevalence, at times subtle morphological differences from normal mitoses, low inter-rater agreement among pathologists, and class imbalance in datasets. Building on the Atypical Mitosis dataset f…
▽ More
Atypical mitoses mark a deviation in the cell division process that can be an independent prognostically relevant marker for tumor malignancy. However, their identification remains challenging due to low prevalence, at times subtle morphological differences from normal mitoses, low inter-rater agreement among pathologists, and class imbalance in datasets. Building on the Atypical Mitosis dataset for Breast Cancer (AMi-Br), this study presents a comprehensive benchmark comparing deep learning approaches for automated atypical mitotic figure (AMF) classification, including baseline models, foundation models with linear probing, and foundation models fine-tuned with low-rank adaptation (LoRA). For rigorous evaluation, we further introduce two new hold-out AMF datasets - AtNorM-Br, a dataset of mitoses from the The TCGA breast cancer cohort, and AtNorM-MD, a multi-domain dataset of mitoses from the MIDOG++ training set. We found average balanced accuracy values of up to 0.8135, 0.7696, and 0.7705 on the in-domain AMi-Br and the out-of-domain AtNorm-Br and AtNorM-MD datasets, respectively, with the results being particularly good for LoRA-based adaptation of the Virchow-line of foundation models. Our work shows that atypical mitosis classification, while being a challenging problem, can be effectively addressed through the use of recent advances in transfer learning and model fine-tuning techniques. We make available all code and data used in this paper in this github repository: https://github.com/DeepMicroscopy/AMi-Br_Benchmark.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Domain Knowledge-Enhanced LLMs for Fraud and Concept Drift Detection
Authors:
Ali Şenol,
Garima Agrawal,
Huan Liu
Abstract:
Detecting deceptive conversations on dynamic platforms is increasingly difficult due to evolving language patterns and Concept Drift (CD)-i.e., semantic or topical shifts that alter the context or intent of interactions over time. These shifts can obscure malicious intent or mimic normal dialogue, making accurate classification challenging. While Large Language Models (LLMs) show strong performanc…
▽ More
Detecting deceptive conversations on dynamic platforms is increasingly difficult due to evolving language patterns and Concept Drift (CD)-i.e., semantic or topical shifts that alter the context or intent of interactions over time. These shifts can obscure malicious intent or mimic normal dialogue, making accurate classification challenging. While Large Language Models (LLMs) show strong performance in natural language tasks, they often struggle with contextual ambiguity and hallucinations in risk-sensitive scenarios. To address these challenges, we present a Domain Knowledge (DK)-Enhanced LLM framework that integrates pretrained LLMs with structured, task-specific insights to perform fraud and concept drift detection. The proposed architecture consists of three main components: (1) a DK-LLM module to detect fake or deceptive conversations; (2) a drift detection unit (OCDD) to determine whether a semantic shift has occurred; and (3) a second DK-LLM module to classify the drift as either benign or fraudulent. We first validate the value of domain knowledge using a fake review dataset and then apply our full framework to SEConvo, a multiturn dialogue dataset that includes various types of fraud and spam attacks. Results show that our system detects fake conversations with high accuracy and effectively classifies the nature of drift. Guided by structured prompts, the LLaMA-based implementation achieves 98% classification accuracy. Comparative studies against zero-shot baselines demonstrate that incorporating domain knowledge and drift awareness significantly improves performance, interpretability, and robustness in high-stakes NLP applications.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Learnable Adaptive Time-Frequency Representation via Differentiable Short-Time Fourier Transform
Authors:
Maxime Leiber,
Yosra Marnissi,
Axel Barrau,
Sylvain Meignen,
Laurent Massoulié
Abstract:
The short-time Fourier transform (STFT) is widely used for analyzing non-stationary signals. However, its performance is highly sensitive to its parameters, and manual or heuristic tuning often yields suboptimal results. To overcome this limitation, we propose a unified differentiable formulation of the STFT that enables gradient-based optimization of its parameters. This approach addresses the li…
▽ More
The short-time Fourier transform (STFT) is widely used for analyzing non-stationary signals. However, its performance is highly sensitive to its parameters, and manual or heuristic tuning often yields suboptimal results. To overcome this limitation, we propose a unified differentiable formulation of the STFT that enables gradient-based optimization of its parameters. This approach addresses the limitations of traditional STFT parameter tuning methods, which often rely on computationally intensive discrete searches. It enables fine-tuning of the time-frequency representation (TFR) based on any desired criterion. Moreover, our approach integrates seamlessly with neural networks, allowing joint optimization of the STFT parameters and network weights. The efficacy of the proposed differentiable STFT in enhancing TFRs and improving performance in downstream tasks is demonstrated through experiments on both simulated and real-world data.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Evolution and determinants of firm-level systemic risk in local production networks
Authors:
Anna Mancini,
Balázs Lengyel,
Riccardo Di Clemente,
Giulio Cimini
Abstract:
Recent crises like the COVID-19 pandemic and geopolitical tensions have exposed vulnerabilities and caused disruptions of supply chains, leading to product shortages, increased costs, and economic instability. This has prompted increasing efforts to assess systemic risk, namely the effects of firm disruptions on entire economies. However, the ability of firms to react to crises by rewiring their s…
▽ More
Recent crises like the COVID-19 pandemic and geopolitical tensions have exposed vulnerabilities and caused disruptions of supply chains, leading to product shortages, increased costs, and economic instability. This has prompted increasing efforts to assess systemic risk, namely the effects of firm disruptions on entire economies. However, the ability of firms to react to crises by rewiring their supply links has been largely overlooked, limiting our understanding of production networks resilience. Here we study dynamics and determinants of firm-level systemic risk in the Hungarian production network from 2015 to 2022. We use as benchmark a heuristic maximum entropy null model that generates an ensemble of production networks at equilibrium, by preserving the total input (demand) and output (supply) of each firm at the sector level. We show that the fairly stable set of firms with highest systemic risk undergoes a structural change during COVID-19, as those enabling economic exchanges become key players in the economy -- a result which is not reproduced by the null model. Although the empirical systemic risk aligns well with the null value until the onset of the pandemic, it becomes significantly smaller afterwards as the adaptive behavior of firms leads to a more resilient economy. Furthermore, firms' international trade volume (being a subject of disruption) becomes a significant predictor of their systemic risk. However, international links cannot provide an unequivocal explanation for the observed trends, as imports and exports have opposing effects on local systemic risk through the supply and demand channels.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Distributed Cross-Channel Hierarchical Aggregation for Foundation Models
Authors:
Aristeidis Tsaris,
Isaac Lyngaas,
John Lagregren,
Mohamed Wahib,
Larry York,
Prasanna Balaprakash,
Dan Lu,
Feiyi Wang,
Xiao Wang
Abstract:
Vision-based scientific foundation models hold significant promise for advancing scientific discovery and innovation. This potential stems from their ability to aggregate images from diverse sources such as varying physical groundings or data acquisition systems and to learn spatio-temporal correlations using transformer architectures. However, tokenizing and aggregating images can be compute-inte…
▽ More
Vision-based scientific foundation models hold significant promise for advancing scientific discovery and innovation. This potential stems from their ability to aggregate images from diverse sources such as varying physical groundings or data acquisition systems and to learn spatio-temporal correlations using transformer architectures. However, tokenizing and aggregating images can be compute-intensive, a challenge not fully addressed by current distributed methods. In this work, we introduce the Distributed Cross-Channel Hierarchical Aggregation (D-CHAG) approach designed for datasets with a large number of channels across image modalities. Our method is compatible with any model-parallel strategy and any type of vision transformer architecture, significantly improving computational efficiency. We evaluated D-CHAG on hyperspectral imaging and weather forecasting tasks. When integrated with tensor parallelism and model sharding, our approach achieved up to a 75% reduction in memory usage and more than doubled sustained throughput on up to 1,024 AMD GPUs on the Frontier Supercomputer.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference
Authors:
Colin Samplawski,
Adam D. Cobb,
Manoj Acharya,
Ramneet Kaur,
Susmit Jha
Abstract:
Despite their widespread use, large language models (LLMs) are known to hallucinate incorrect information and be poorly calibrated. This makes the uncertainty quantification of these models of critical importance, especially in high-stakes domains, such as autonomy and healthcare. Prior work has made Bayesian deep learning-based approaches to this problem more tractable by performing inference ove…
▽ More
Despite their widespread use, large language models (LLMs) are known to hallucinate incorrect information and be poorly calibrated. This makes the uncertainty quantification of these models of critical importance, especially in high-stakes domains, such as autonomy and healthcare. Prior work has made Bayesian deep learning-based approaches to this problem more tractable by performing inference over the low-rank adaptation (LoRA) parameters of a fine-tuned model. While effective, these approaches struggle to scale to larger LLMs due to requiring further additional parameters compared to LoRA. In this work we present $\textbf{Scala}$ble $\textbf{B}$ayesian $\textbf{L}$ow-Rank Adaptation via Stochastic Variational Subspace Inference (ScalaBL). We perform Bayesian inference in an $r$-dimensional subspace, for LoRA rank $r$. By repurposing the LoRA parameters as projection matrices, we are able to map samples from this subspace into the full weight space of the LLM. This allows us to learn all the parameters of our approach using stochastic variational inference. Despite the low dimensionality of our subspace, we are able to achieve competitive performance with state-of-the-art approaches while only requiring ${\sim}1000$ additional parameters. Furthermore, it allows us to scale up to the largest Bayesian LLM to date, with four times as a many base parameters as prior work.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Flowcut Switching: High-Performance Adaptive Routing with In-Order Delivery Guarantees
Authors:
Tommaso Bonato,
Daniele De Sensi,
Salvatore Di Girolamo,
Abdulla Bataineh,
David Hewson,
Duncan Roweth,
Torsten Hoefler
Abstract:
Network latency severely impacts the performance of applications running on supercomputers. Adaptive routing algorithms route packets over different available paths to reduce latency and improve network utilization. However, if a switch routes packets belonging to the same network flow on different paths, they might arrive at the destination out-of-order due to differences in the latency of these…
▽ More
Network latency severely impacts the performance of applications running on supercomputers. Adaptive routing algorithms route packets over different available paths to reduce latency and improve network utilization. However, if a switch routes packets belonging to the same network flow on different paths, they might arrive at the destination out-of-order due to differences in the latency of these paths. For some transport protocols like TCP, QUIC, and RoCE, out-of-order (OOO) packets might cause large performance drops or significantly increase CPU utilization. In this work, we propose flowcut switching, a new adaptive routing algorithm that provides high-performance in-order packet delivery. Differently from existing solutions like flowlet switching, which are based on the assumption of bursty traffic and that might still reorder packets, flowcut switching guarantees in-order delivery under any network conditions, and is effective also for non-bursty traffic, as it is often the case for RDMA.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Pay Attention to Small Weights
Authors:
Chao Zhou,
Tom Jacobs,
Advait Gadhikar,
Rebekka Burkholz
Abstract:
Finetuning large pretrained neural networks is known to be resource-intensive, both in terms of memory and computational cost. To mitigate this, a common approach is to restrict training to a subset of the model parameters. By analyzing the relationship between gradients and weights during finetuning, we observe a notable pattern: large gradients are often associated with small-magnitude weights.…
▽ More
Finetuning large pretrained neural networks is known to be resource-intensive, both in terms of memory and computational cost. To mitigate this, a common approach is to restrict training to a subset of the model parameters. By analyzing the relationship between gradients and weights during finetuning, we observe a notable pattern: large gradients are often associated with small-magnitude weights. This correlation is more pronounced in finetuning settings than in training from scratch. Motivated by this observation, we propose NANOADAM, which dynamically updates only the small-magnitude weights during finetuning and offers several practical advantages: first, this criterion is gradient-free -- the parameter subset can be determined without gradient computation; second, it preserves large-magnitude weights, which are likely to encode critical features learned during pretraining, thereby reducing the risk of catastrophic forgetting; thirdly, it permits the use of larger learning rates and consistently leads to better generalization performance in experiments. We demonstrate this for both NLP and vision tasks.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
ToosiCubix: Monocular 3D Cuboid Labeling via Vehicle Part Annotations
Authors:
Behrooz Nasihatkon,
Hossein Resani,
Amirreza Mehrzadian
Abstract:
Many existing methods for 3D cuboid annotation of vehicles rely on expensive and carefully calibrated camera-LiDAR or stereo setups, limiting their accessibility for large-scale data collection. We introduce ToosiCubix, a simple yet powerful approach for annotating ground-truth cuboids using only monocular images and intrinsic camera parameters. Our method requires only about 10 user clicks per ve…
▽ More
Many existing methods for 3D cuboid annotation of vehicles rely on expensive and carefully calibrated camera-LiDAR or stereo setups, limiting their accessibility for large-scale data collection. We introduce ToosiCubix, a simple yet powerful approach for annotating ground-truth cuboids using only monocular images and intrinsic camera parameters. Our method requires only about 10 user clicks per vehicle, making it highly practical for adding 3D annotations to existing datasets originally collected without specialized equipment. By annotating specific features (e.g., wheels, car badge, symmetries) across different vehicle parts, we accurately estimate each vehicle's position, orientation, and dimensions up to a scale ambiguity (8 DoF). The geometric constraints are formulated as an optimization problem, which we solve using a coordinate descent strategy, alternating between Perspective-n-Points (PnP) and least-squares subproblems. To handle common ambiguities such as scale and unobserved dimensions, we incorporate probabilistic size priors, enabling 9 DoF cuboid placements. We validate our annotations against the KITTI and Cityscapes3D datasets, demonstrating that our method offers a cost-effective and scalable solution for high-quality 3D cuboid annotation.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
CoPa-SG: Dense Scene Graphs with Parametric and Proto-Relations
Authors:
Julian Lorenz,
Mrunmai Phatak,
Robin Schön,
Katja Ludwig,
Nico Hörmann,
Annemarie Friedrich,
Rainer Lienhart
Abstract:
2D scene graphs provide a structural and explainable framework for scene understanding. However, current work still struggles with the lack of accurate scene graph data. To overcome this data bottleneck, we present CoPa-SG, a synthetic scene graph dataset with highly precise ground truth and exhaustive relation annotations between all objects. Moreover, we introduce parametric and proto-relations,…
▽ More
2D scene graphs provide a structural and explainable framework for scene understanding. However, current work still struggles with the lack of accurate scene graph data. To overcome this data bottleneck, we present CoPa-SG, a synthetic scene graph dataset with highly precise ground truth and exhaustive relation annotations between all objects. Moreover, we introduce parametric and proto-relations, two new fundamental concepts for scene graphs. The former provides a much more fine-grained representation than its traditional counterpart by enriching relations with additional parameters such as angles or distances. The latter encodes hypothetical relations in a scene graph and describes how relations would form if new objects are placed in the scene. Using CoPa-SG, we compare the performance of various scene graph generation models. We demonstrate how our new relation types can be integrated in downstream applications to enhance planning and reasoning capabilities.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Real-time Terrain Analysis for Off-road Autonomous Vehicles
Authors:
Edwina Lewis,
Aditya Parameshwaran,
Laura Redmond,
Yue Wang
Abstract:
This research addresses critical autonomous vehicle control challenges arising from road roughness variation, which induces course deviations and potential loss of road contact during steering operations. We present a novel real-time road roughness estimation system employing Bayesian calibration methodology that processes axle accelerations to predict terrain roughness with quantifiable confidenc…
▽ More
This research addresses critical autonomous vehicle control challenges arising from road roughness variation, which induces course deviations and potential loss of road contact during steering operations. We present a novel real-time road roughness estimation system employing Bayesian calibration methodology that processes axle accelerations to predict terrain roughness with quantifiable confidence measures. The technical framework integrates a Gaussian process surrogate model with a simulated half-vehicle model, systematically processing vehicle velocity and road surface roughness parameters to generate corresponding axle acceleration responses. The Bayesian calibration routine performs inverse estimation of road roughness from observed accelerations and velocities, yielding posterior distributions that quantify prediction uncertainty for adaptive risk management. Training data generation utilizes Latin Hypercube sampling across comprehensive velocity and roughness parameter spaces, while the calibrated model integrates seamlessly with a Simplex controller architecture to dynamically adjust velocity limits based on real-time roughness predictions. Experimental validation on stochastically generated surfaces featuring varying roughness regions demonstrates robust real-time characterization capabilities, with the integrated Simplex control strategy effectively enhancing autonomous vehicle operational safety through proactive surface condition response. This innovative Bayesian framework establishes a comprehensive foundation for mitigating roughness-related operational risks while simultaneously improving efficiency and safety margins in autonomous vehicle systems.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
AGTCNet: A Graph-Temporal Approach for Principled Motor Imagery EEG Classification
Authors:
Galvin Brice S. Lim,
Brian Godwin S. Lim,
Argel A. Bandala,
John Anthony C. Jose,
Timothy Scott C. Chu,
Edwin Sybingco
Abstract:
Brain-computer interface (BCI) technology utilizing electroencephalography (EEG) marks a transformative innovation, empowering motor-impaired individuals to engage with their environment on equal footing. Despite its promising potential, developing subject-invariant and session-invariant BCI systems remains a significant challenge due to the inherent complexity and variability of neural activity a…
▽ More
Brain-computer interface (BCI) technology utilizing electroencephalography (EEG) marks a transformative innovation, empowering motor-impaired individuals to engage with their environment on equal footing. Despite its promising potential, developing subject-invariant and session-invariant BCI systems remains a significant challenge due to the inherent complexity and variability of neural activity across individuals and over time, compounded by EEG hardware constraints. While prior studies have sought to develop robust BCI systems, existing approaches remain ineffective in capturing the intricate spatiotemporal dependencies within multichannel EEG signals. This study addresses this gap by introducing the attentive graph-temporal convolutional network (AGTCNet), a novel graph-temporal model for motor imagery EEG (MI-EEG) classification. Specifically, AGTCNet leverages the topographic configuration of EEG electrodes as an inductive bias and integrates graph convolutional attention network (GCAT) to jointly learn expressive spatiotemporal EEG representations. The proposed model significantly outperformed existing MI-EEG classifiers, achieving state-of-the-art performance while utilizing a compact architecture, underscoring its effectiveness and practicality for BCI deployment. With a 49.87% reduction in model size, 64.65% faster inference time, and shorter input EEG signal, AGTCNet achieved a moving average accuracy of 66.82% for subject-independent classification on the BCI Competition IV Dataset 2a, which further improved to 82.88% when fine-tuned for subject-specific classification. On the EEG Motor Movement/Imagery Dataset, AGTCNet achieved moving average accuracies of 64.14% and 85.22% for 4-class and 2-class subject-independent classifications, respectively, with further improvements to 72.13% and 90.54% for subject-specific classifications.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Symmetry classes of Hamiltonian cycles
Authors:
Julia Baligacs,
Sofia Brenner,
Annette Lutz,
Lena Volk
Abstract:
We initiate the study of Hamiltonian cycles up to symmetries of the underlying graph. Our focus lies on the extremal case of Hamiltonian-transitive graphs, i.e., Hamiltonian graphs where, for every pair of Hamiltonian cycles, there is a graph automorphism mapping one cycle to the other. This generalizes the extensively studied uniquely Hamiltonian graphs. In this paper, we show that Cayley graphs…
▽ More
We initiate the study of Hamiltonian cycles up to symmetries of the underlying graph. Our focus lies on the extremal case of Hamiltonian-transitive graphs, i.e., Hamiltonian graphs where, for every pair of Hamiltonian cycles, there is a graph automorphism mapping one cycle to the other. This generalizes the extensively studied uniquely Hamiltonian graphs. In this paper, we show that Cayley graphs of abelian groups are not Hamiltonian-transitive (under some mild conditions and some non-surprising exceptions), i.e., they contain at least two structurally different Hamiltonian cycles. To show this, we reduce Hamiltonian-transitivity to properties of the prime factors of a Cartesian product decomposition, which we believe is interesting in its own right. We complement our results by constructing infinite families of regular Hamiltonian-transitive graphs and take a look at the opposite extremal case by constructing a family with many different Hamiltonian cycles up to symmetry.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Automatic Reviewers Assignment to a Research Paper Based on Allied References and Publications Weight
Authors:
Tamim Al Mahmud,
B M Mainul Hossain,
Dilshad Ara
Abstract:
Everyday, a vast stream of research documents is submitted to conferences, anthologies, journals, newsletters, annual reports, daily papers, and various periodicals. Many such publications use independent external specialists to review submissions. This process is called peer review, and the reviewers are called referees. However, it is not always possible to pick the best referee for reviewing. M…
▽ More
Everyday, a vast stream of research documents is submitted to conferences, anthologies, journals, newsletters, annual reports, daily papers, and various periodicals. Many such publications use independent external specialists to review submissions. This process is called peer review, and the reviewers are called referees. However, it is not always possible to pick the best referee for reviewing. Moreover, new research fields are emerging in every sector, and the number of research papers is increasing dramatically. To review all these papers, every journal assigns a small team of referees who may not be experts in all areas. For example, a research paper in communication technology should be reviewed by an expert from the same field. Thus, efficiently selecting the best reviewer or referee for a research paper is a big challenge.
In this research, we propose and implement program that uses a new strategy to automatically select the best reviewers for a research paper. Every research paper contains references at the end, usually from the same area. First, we collect the references and count authors who have at least one paper in the references. Then, we automatically browse the web to extract research topic keywords. Next, we search for top researchers in the specific topic and count their h-index, i10-index, and citations for the first n authors. Afterward, we rank the top n authors based on a score and automatically browse their homepages to retrieve email addresses. We also check their co-authors and colleagues online and discard them from the list. The remaining top n authors, generally professors, are likely the best referees for reviewing the research paper.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Holistic Surgical Phase Recognition with Hierarchical Input Dependent State Space Models
Authors:
Haoyang Wu,
Tsun-Hsuan Wang,
Mathias Lechner,
Ramin Hasani,
Jennifer A. Eckhoff,
Paul Pak,
Ozanan R. Meireles,
Guy Rosman,
Yutong Ban,
Daniela Rus
Abstract:
Surgical workflow analysis is essential in robot-assisted surgeries, yet the long duration of such procedures poses significant challenges for comprehensive video analysis. Recent approaches have predominantly relied on transformer models; however, their quadratic attention mechanism restricts efficient processing of lengthy surgical videos. In this paper, we propose a novel hierarchical input-dep…
▽ More
Surgical workflow analysis is essential in robot-assisted surgeries, yet the long duration of such procedures poses significant challenges for comprehensive video analysis. Recent approaches have predominantly relied on transformer models; however, their quadratic attention mechanism restricts efficient processing of lengthy surgical videos. In this paper, we propose a novel hierarchical input-dependent state space model that leverages the linear scaling property of state space models to enable decision making on full-length videos while capturing both local and global dynamics. Our framework incorporates a temporally consistent visual feature extractor, which appends a state space model head to a visual feature extractor to propagate temporal information. The proposed model consists of two key modules: a local-aggregation state space model block that effectively captures intricate local dynamics, and a global-relation state space model block that models temporal dependencies across the entire video. The model is trained using a hybrid discrete-continuous supervision strategy, where both signals of discrete phase labels and continuous phase progresses are propagated through the network. Experiments have shown that our method outperforms the current state-of-the-art methods by a large margin (+2.8% on Cholec80, +4.3% on MICCAI2016, and +12.9% on Heichole datasets). Code will be publicly available after paper acceptance.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Enabling Bitcoin Smart Contracts on the Internet Computer
Authors:
Ryan Croote,
Islam El-Ashi,
Thomas Locher,
Yvonne-Anne Pignolet
Abstract:
There is growing interest in providing programmatic access to the value locked in Bitcoin, which famously offers limited programmability itself. Various approaches have been put forth in recent years, with the vast majority of proposed mechanisms either building new functionality on top of Bitcoin or leveraging a bridging mechanism to enable smart contracts that make use of ``wrapped'' bitcoins on…
▽ More
There is growing interest in providing programmatic access to the value locked in Bitcoin, which famously offers limited programmability itself. Various approaches have been put forth in recent years, with the vast majority of proposed mechanisms either building new functionality on top of Bitcoin or leveraging a bridging mechanism to enable smart contracts that make use of ``wrapped'' bitcoins on entirely different platforms.
In this work, an architecture is presented that follows a different approach. The architecture enables the execution of Turing-complete Bitcoin smart contracts on the Internet Computer (IC), a blockchain platform for hosting and executing decentralized applications. Instead of using a bridge, IC and Bitcoin nodes interact directly, eliminating potential security risks that the use of a bridge entails. This integration requires novel concepts, in particular to reconcile the probabilistic nature of Bitcoin with the irreversibility of finalized state changes on the IC, which may be of independent interest.
In addition to the presentation of the architecture, we provide evaluation results based on measurements of the Bitcoin integration running on mainnet. The evaluation results demonstrate that, with finalization in a few seconds and low execution costs, this integration enables complex Bitcoin-based decentralized applications that were not practically feasible or economically viable before.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
An object-centric core metamodel for IoT-enhanced event logs
Authors:
Yannis Bertrand,
Christian Imenkamp,
Lukas Malburg,
Matthias Ehrendorfer,
Marco Franceschetti,
Joscha Grüger,
Francesco Leotta,
Jürgen Mangler,
Ronny Seiger,
Agnes Koschmider,
Stefanie Rinderle-Ma,
Barbara Weber,
Estefania Serral
Abstract:
Advances in Internet-of-Things (IoT) technologies have prompted the integration of IoT devices with business processes (BPs) in many organizations across various sectors, such as manufacturing, healthcare and smart spaces. The proliferation of IoT devices leads to the generation of large amounts of IoT data providing a window on the physical context of BPs, which facilitates the discovery of new i…
▽ More
Advances in Internet-of-Things (IoT) technologies have prompted the integration of IoT devices with business processes (BPs) in many organizations across various sectors, such as manufacturing, healthcare and smart spaces. The proliferation of IoT devices leads to the generation of large amounts of IoT data providing a window on the physical context of BPs, which facilitates the discovery of new insights about BPs using process mining (PM) techniques. However, to achieve these benefits, IoT data need to be combined with traditional process (event) data, which is challenging due to the very different characteristics of IoT and process data, for instance in terms of granularity levels. Recently, several data models were proposed to integrate IoT data with process data, each focusing on different aspects of data integration based on different assumptions and requirements. This fragmentation hampers data exchange and collaboration in the field of PM, e.g., making it tedious for researchers to share data. In this paper, we present a core model synthesizing the most important features of existing data models. As the core model is based on common requirements, it greatly facilitates data sharing and collaboration in the field. A prototypical Python implementation is used to evaluate the model against various use cases and demonstrate that it satisfies these common requirements.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Exploring Adapter Design Tradeoffs for Low Resource Music Generation
Authors:
Atharva Mehta,
Shivam Chauhan,
Monojit Choudhury
Abstract:
Fine-tuning large-scale music generation models, such as MusicGen and Mustango, is a computationally expensive process, often requiring updates to billions of parameters and, therefore, significant hardware resources. Parameter-Efficient Fine-Tuning (PEFT) techniques, particularly adapter-based methods, have emerged as a promising alternative, enabling adaptation with minimal trainable parameters…
▽ More
Fine-tuning large-scale music generation models, such as MusicGen and Mustango, is a computationally expensive process, often requiring updates to billions of parameters and, therefore, significant hardware resources. Parameter-Efficient Fine-Tuning (PEFT) techniques, particularly adapter-based methods, have emerged as a promising alternative, enabling adaptation with minimal trainable parameters while preserving model performance. However, the design choices for adapters, including their architecture, placement, and size, are numerous, and it is unclear which of these combinations would produce optimal adapters and why, for a given case of low-resource music genre. In this paper, we attempt to answer this question by studying various adapter configurations for two AI music models, MusicGen and Mustango, on two genres: Hindustani Classical and Turkish Makam music.
Our findings reveal distinct trade-offs: convolution-based adapters excel in capturing fine-grained local musical details such as ornamentations and short melodic phrases, while transformer-based adapters better preserve long-range dependencies crucial for structured improvisation. Additionally, we analyze computational resource requirements across different adapter scales, demonstrating how mid-sized adapters (40M parameters) achieve an optimal balance between expressivity and quality. Furthermore, we find that Mustango, a diffusion-based model, generates more diverse outputs with better adherence to the description in the input prompt while lacking in providing stability in notes, rhythm alignment, and aesthetics. Also, it is computationally intensive and requires significantly more time to train. In contrast, autoregressive models like MusicGen offer faster training and are more efficient, and can produce better quality output in comparison, but have slightly higher redundancy in their generations.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Exploring Micro Frontends: A Case Study Application in E-Commerce
Authors:
Ricardo Hideki Hangai Kojo,
Luiz Fernando Corte Real,
Renato Cordeiro Ferreira,
Thatiane de Oliveira Rosa,
Alfredo Goldman
Abstract:
In the micro frontends architectural style, the frontend is divided into smaller components, which can range from a simple button to an entire page. The goal is to improve scalability, resilience, and team independence, albeit at the cost of increased complexity and infrastructure demands. This paper seeks to understand when it is worth adopting micro frontends, particularly in the context of indu…
▽ More
In the micro frontends architectural style, the frontend is divided into smaller components, which can range from a simple button to an entire page. The goal is to improve scalability, resilience, and team independence, albeit at the cost of increased complexity and infrastructure demands. This paper seeks to understand when it is worth adopting micro frontends, particularly in the context of industry. To achieve this, we conducted an investigation into the state of the art of micro frontends, based on both academic and gray literature. We then implemented this architectural style in a marketplace for handcrafted products, which already used microservices. Finally, we evaluated the implementation through a semi-open questionnaire with the developers. At the studied marketplace company, the need for architectural change arose due to the tight coupling between their main system (a Java monolith) and a dedicated frontend system. Additionally, there were deprecated technologies and poor developer experience. To address these issues, the micro frontends architecture was adopted, along with the API Gateway and Backend for Frontend patterns, and technologies such as Svelte and Fastify. Although the adoption of Micro Frontends was successful, it was not strictly necessary to meet the company's needs. According to the analysis of the mixed questionnaire responses, other alternatives, such as a monolithic frontend, could have achieved comparable results. What made adopting micro frontends the most convenient choice in the company's context was the monolith strangulation and microservices adoption, which facilitated implementation through infrastructure reuse and knowledge sharing between teams.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Small Encoders Can Rival Large Decoders in Detecting Groundedness
Authors:
Istabrak Abbes,
Gabriele Prato,
Quentin Fournier,
Fernando Rodriguez,
Alaa Boukhary,
Adam Elwood,
Sarath Chandar
Abstract:
Augmenting large language models (LLMs) with external context significantly improves their performance in natural language processing (NLP) tasks. However, LLMs struggle to answer queries reliably when the provided context lacks information, often resorting to ungrounded speculation or internal knowledge. Groundedness - generating responses strictly supported by the context - is essential for ensu…
▽ More
Augmenting large language models (LLMs) with external context significantly improves their performance in natural language processing (NLP) tasks. However, LLMs struggle to answer queries reliably when the provided context lacks information, often resorting to ungrounded speculation or internal knowledge. Groundedness - generating responses strictly supported by the context - is essential for ensuring factual consistency and trustworthiness. This study focuses on detecting whether a given query is grounded in a document provided in context before the costly answer generation by LLMs. Such a detection mechanism can significantly reduce both inference time and resource consumption. We show that lightweight, task specific encoder models such as RoBERTa and NomicBERT, fine-tuned on curated datasets, can achieve accuracy comparable to state-of-the-art LLMs, such as Llama3 8B and GPT4o, in groundedness detection while reducing inference latency by orders of magnitude. The code is available at : https://github.com/chandarlab/Hallucinate-less
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
HieraSurg: Hierarchy-Aware Diffusion Model for Surgical Video Generation
Authors:
Diego Biagini,
Nassir Navab,
Azade Farshad
Abstract:
Surgical Video Synthesis has emerged as a promising research direction following the success of diffusion models in general-domain video generation. Although existing approaches achieve high-quality video generation, most are unconditional and fail to maintain consistency with surgical actions and phases, lacking the surgical understanding and fine-grained guidance necessary for factual simulation…
▽ More
Surgical Video Synthesis has emerged as a promising research direction following the success of diffusion models in general-domain video generation. Although existing approaches achieve high-quality video generation, most are unconditional and fail to maintain consistency with surgical actions and phases, lacking the surgical understanding and fine-grained guidance necessary for factual simulation. We address these challenges by proposing HieraSurg, a hierarchy-aware surgical video generation framework consisting of two specialized diffusion models. Given a surgical phase and an initial frame, HieraSurg first predicts future coarse-grained semantic changes through a segmentation prediction model. The final video is then generated by a second-stage model that augments these temporal segmentation maps with fine-grained visual features, leading to effective texture rendering and integration of semantic information in the video space. Our approach leverages surgical information at multiple levels of abstraction, including surgical phase, action triplets, and panoptic segmentation maps. The experimental results on Cholecystectomy Surgical Video Generation demonstrate that the model significantly outperforms prior work both quantitatively and qualitatively, showing strong generalization capabilities and the ability to generate higher frame-rate videos. The model exhibits particularly fine-grained adherence when provided with existing segmentation maps, suggesting its potential for practical surgical applications.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Double-Checker: Enhancing Reasoning of Slow-Thinking LLMs via Self-Critical Fine-Tuning
Authors:
Xin Xu,
Tianhao Chen,
Fan Zhang,
Wanlong Liu,
Pengxiang Li,
Ajay Kumar Jaiswal,
Yuchen Yan,
Jishan Hu,
Yang Wang,
Hao Chen,
Shiwei Liu,
Shizhe Diao,
Can Yang,
Lu Yin
Abstract:
While slow-thinking large language models (LLMs) exhibit reflection-like reasoning, commonly referred to as the "aha moment:, their ability to generate informative critiques and refine prior solutions remains limited. In this paper, we introduce Double-Checker, a principled framework designed to enhance the reasoning capabilities of slow-thinking LLMs by fostering explicit self-critique and iterat…
▽ More
While slow-thinking large language models (LLMs) exhibit reflection-like reasoning, commonly referred to as the "aha moment:, their ability to generate informative critiques and refine prior solutions remains limited. In this paper, we introduce Double-Checker, a principled framework designed to enhance the reasoning capabilities of slow-thinking LLMs by fostering explicit self-critique and iterative refinement of their previous solutions. By fine-tuning on our curated 1,730 self-critical instances, Double-Checker empowers long-CoT LLMs to iteratively critique and refine their outputs during inference until they evaluate their solutions as correct under self-generated critiques. We validate the efficacy of Double-Checker across a comprehensive suite of reasoning benchmarks, demonstrating that iterative self-critique significantly enhances the reasoning capabilities of long-CoT LLMs. Notably, our Double-Checker increases the pass@1 performance on challenging AIME benchmarks from 4.4% to 18.2% compared to the original long-CoT LLMs. These results highlight a promising direction for developing more trustworthy and effective LLMs capable of structured self-critique.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Playing Snake on a Graph
Authors:
Denise Graafsma,
Bodo Manthey,
Alexander Skopalik
Abstract:
Snake is a classic computer game, which has been around for decades. Based on this game, we study the game of Snake on arbitrary undirected graphs. A snake forms a simple path that has to move to an apple while avoiding colliding with itself. When the snake reaches the apple, it grows longer, and a new apple appears. A graph on which the snake has a strategy to keep eating apples until it covers a…
▽ More
Snake is a classic computer game, which has been around for decades. Based on this game, we study the game of Snake on arbitrary undirected graphs. A snake forms a simple path that has to move to an apple while avoiding colliding with itself. When the snake reaches the apple, it grows longer, and a new apple appears. A graph on which the snake has a strategy to keep eating apples until it covers all the vertices of the graph is called snake-winnable. We prove that determining whether a graph is snake-winnable is NP-hard, even when restricted to grid graphs. We fully characterize snake-winnable graphs for odd-sized bipartite graphs and graphs with vertex-connectivity 1. While Hamiltonian graphs are always snake-winnable, we show that non-Hamiltonian snake-winnable graphs have a girth of at most 6 and that this bound is tight.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Cat and Mouse -- Can Fake Text Generation Outpace Detector Systems?
Authors:
Andrea McGlinchey,
Peter J Barclay
Abstract:
Large language models can produce convincing "fake text" in domains such as academic writing, product reviews, and political news. Many approaches have been investigated for the detection of artificially generated text. While this may seem to presage an endless "arms race", we note that newer LLMs use ever more parameters, training data, and energy, while relatively simple classifiers demonstrate…
▽ More
Large language models can produce convincing "fake text" in domains such as academic writing, product reviews, and political news. Many approaches have been investigated for the detection of artificially generated text. While this may seem to presage an endless "arms race", we note that newer LLMs use ever more parameters, training data, and energy, while relatively simple classifiers demonstrate a good level of detection accuracy with modest resources. To approach the question of whether the models' ability to beat the detectors may therefore reach a plateau, we examine the ability of statistical classifiers to identify "fake text" in the style of classical detective fiction. Over a 0.5 version increase, we found that Gemini showed an increased ability to generate deceptive text, while GPT did not. This suggests that reliable detection of fake text may remain feasible even for ever-larger models, though new model architectures may improve their deceptiveness
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
KOALA: a Configurable Tool for Collecting IDE Data When Solving Programming Tasks
Authors:
Daniil Karol,
Elizaveta Artser,
Ilya Vlasov,
Yaroslav Golubev,
Hieke Keuning,
Anastasiia Birillo
Abstract:
Collecting data of students solving programming tasks is incredibly valuable for researchers and educators. It allows verifying that the students correctly apply the features and concepts they are taught, or finding students' misconceptions. However, existing data collection tools have limitations, e.g., no control over the granularity of the collected code, not collecting the specific events of t…
▽ More
Collecting data of students solving programming tasks is incredibly valuable for researchers and educators. It allows verifying that the students correctly apply the features and concepts they are taught, or finding students' misconceptions. However, existing data collection tools have limitations, e.g., no control over the granularity of the collected code, not collecting the specific events of the programming environment used, and overall being hard to configure.
To overcome these limitations, we propose KOALA, a convenient and highly configurable tool for collecting code snapshots and feature usage from students solving programming tasks in JetBrains IDEs. The plugin can be installed in IDEs and configured to provide the students with the necessary tasks, enable or disable certain IDE features like code completion, and run surveys. During problem solving, the plugin collects code snapshots at the configured granularity, all IDE actions like running and debugging, as well as some data not collected in prior works, like employed hotkeys and switching focus between files. The collected data is sent to the server that comes with the tool, where it is stored and can be converted to the standardized ProgSnap2 format. To showcase the tool, we collected data from 28 students solving tasks in two courses within the IDE, highlighting some insights from this data.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Zero-Shot Learning for Obsolescence Risk Forecasting
Authors:
Elie Saad,
Aya Mrabah,
Mariem Besbes,
Marc Zolghadri,
Victor Czmil,
Claude Baron,
Vincent Bourgeois
Abstract:
Component obsolescence poses significant challenges in industries reliant on electronic components, causing increased costs and disruptions in the security and availability of systems. Accurate obsolescence risk prediction is essential but hindered by a lack of reliable data. This paper proposes a novel approach to forecasting obsolescence risk using zero-shot learning (ZSL) with large language mo…
▽ More
Component obsolescence poses significant challenges in industries reliant on electronic components, causing increased costs and disruptions in the security and availability of systems. Accurate obsolescence risk prediction is essential but hindered by a lack of reliable data. This paper proposes a novel approach to forecasting obsolescence risk using zero-shot learning (ZSL) with large language models (LLMs) to address data limitations by leveraging domain-specific knowledge from tabular datasets. Applied to two real-world datasets, the method demonstrates effective risk prediction. A comparative evaluation of four LLMs underscores the importance of selecting the right model for specific forecasting tasks.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Complexity-aware fine-tuning
Authors:
Andrey Goncharov,
Daniil Vyazhev,
Petr Sychev,
Edvard Khalafyan,
Alexey Zaytsev
Abstract:
General-purpose Large Language Models (LLMs) are frequently fine-tuned through supervised fine-tuning (SFT) to enhance performance in specific domains. Better results can be achieved by distilling the chain-of-thought of a larger model at the cost of numerous expensive calls and a much greater amount of data. We propose a novel blueprint for efficient fine-tuning that uses reasoning only for compl…
▽ More
General-purpose Large Language Models (LLMs) are frequently fine-tuned through supervised fine-tuning (SFT) to enhance performance in specific domains. Better results can be achieved by distilling the chain-of-thought of a larger model at the cost of numerous expensive calls and a much greater amount of data. We propose a novel blueprint for efficient fine-tuning that uses reasoning only for complex data identified by entropy. Specifically, across two small open models ($\approx 3B$) we split the training data into complexity categories by a single token answer entropy (ROC AUC $0.73$), fine-tune large language models (LLMs) via SFT and distillation, and show that our pipeline significantly outperforms the standard SFT approach ($0.55$ vs $0.43$ average accuracy) and provides comparable with distillation performance while using $62\%$ less data ($0.55$ average accuracy for both). We publish our code and data to facilitate further research in this direction.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Edge Clique Partition and Cover Beyond Independence
Authors:
Fedor V. Fomin,
Petr A. Golovach,
Danil Sagunov,
Kirill Simonov
Abstract:
Covering and partitioning the edges of a graph into cliques are classical problems at the intersection of combinatorial optimization and graph theory, having been studied through a range of algorithmic and complexity-theoretic lenses. Despite the well-known fixed-parameter tractability of these problems when parameterized by the total number of cliques, such a parameterization often fails to be me…
▽ More
Covering and partitioning the edges of a graph into cliques are classical problems at the intersection of combinatorial optimization and graph theory, having been studied through a range of algorithmic and complexity-theoretic lenses. Despite the well-known fixed-parameter tractability of these problems when parameterized by the total number of cliques, such a parameterization often fails to be meaningful for sparse graphs. In many real-world instances, on the other hand, the minimum number of cliques in an edge cover or partition can be very close to the size of a maximum independent set α(G).
Motivated by this observation, we investigate above αparameterizations of the edge clique cover and partition problems. Concretely, we introduce and study Edge Clique Cover Above Independent Set (ECC/α) and Edge Clique Partition Above Independent Set (ECP/α), where the goal is to cover or partition all edges of a graph using at most α(G) + k cliques, and k is the parameter. Our main results reveal a distinct complexity landscape for the two variants. We show that ECP/αis fixed-parameter tractable, whereas ECC/αis NP-complete for all k \geq 2, yet can be solved in polynomial time for k \in {0,1}. These findings highlight intriguing differences between the two problems when viewed through the lens of parameterization above a natural lower bound.
Finally, we demonstrate that ECC/αbecomes fixed-parameter tractable when parameterized by k + ω(G), where ω(G) is the size of a maximum clique of the graph G. This result is particularly relevant for sparse graphs, in which ωis typically small. For H-minor free graphs, we design a subexponential algorithm of running time f(H)^{\sqrt{k}}n^{O(1)}.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
BitMark for Infinity: Watermarking Bitwise Autoregressive Image Generative Models
Authors:
Louis Kerner,
Michel Meintz,
Bihe Zhao,
Franziska Boenisch,
Adam Dziedzic
Abstract:
State-of-the-art text-to-image models like Infinity generate photorealistic images at an unprecedented speed. These models operate in a bitwise autoregressive manner over a discrete set of tokens that is practically infinite in size. However, their impressive generative power comes with a growing risk: as their outputs increasingly populate the Internet, they are likely to be scraped and reused as…
▽ More
State-of-the-art text-to-image models like Infinity generate photorealistic images at an unprecedented speed. These models operate in a bitwise autoregressive manner over a discrete set of tokens that is practically infinite in size. However, their impressive generative power comes with a growing risk: as their outputs increasingly populate the Internet, they are likely to be scraped and reused as training data-potentially by the very same models. This phenomenon has been shown to lead to model collapse, where repeated training on generated content, especially from the models' own previous versions, causes a gradual degradation in performance. A promising mitigation strategy is watermarking, which embeds human-imperceptible yet detectable signals into generated images-enabling the identification of generated content. In this work, we introduce BitMark, a robust bitwise watermarking framework for Infinity. Our method embeds a watermark directly at the bit level of the token stream across multiple scales (also referred to as resolutions) during Infinity's image generation process. Our bitwise watermark subtly influences the bits to preserve visual fidelity and generation speed while remaining robust against a spectrum of removal techniques. Furthermore, it exhibits high radioactivity, i.e., when watermarked generated images are used to train another image generative model, this second model's outputs will also carry the watermark. The radioactive traces remain detectable even when only fine-tuning diffusion or image autoregressive models on images watermarked with our BitMark. Overall, our approach provides a principled step toward preventing model collapse in image generative models by enabling reliable detection of generated outputs.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.