Computer Science

Showing new listings for Thursday, 4 September 2025

Total of 697 entries

New submissions (showing 320 of 320 entries)

[1] arXiv:2509.02572 [pdf, other]
Title: Use of Physicochemical Modification Methods for Producing Traditional and Nanomodified Polymeric Composites with Improved Operational Properties
Aleksandr E. Kolosov, Volodymyr I. Sivetskii, Elena P. Kolosova, Volodymyr V. Vanin, Aleksandr V. Gondlyakh, Dmytro E. Sidorov, Igor I. Ivitskiy, Volodymyr P. Symoniuk
Comments: 18 pages
Journal-ref: International Journal of Polymer Science, Volume 2019, Article ID 1258727
Subjects: Computational Engineering, Finance, and Science (cs.CE); Materials Science (cond-mat.mtrl-sci); Soft Condensed Matter (cond-mat.soft)

Various aspects of the methods of physical and physicochemical modification of components of filled thermoplastic composite materials are analyzed, aimed at improving the surface properties of the fillers and the technological properties of the polymer matrix during their interaction. It is noted that the improvement of the interfacial interaction of the components of polymer reactoplastic composites, including adhesive strength, is a key factor for improving the reliability of the cured filled composite. As a promising area of research, a modification of the surface of the reinforcing fibrous filler and the technological characteristics of the liquid polymer binder, aimed at increasing their contact properties in the composite, was chosen. The effectiveness of the physical method of modifying the components of composites in the form of low-frequency ultrasonic processing is described. The peculiarities of cluster formation and physicochemical modification of epoxy polymers filled with dispersed fillers are analyzed. Attention is focused on the effectiveness of ultrasonic processing in the cavitation mode for deagglomeration and uniform distribution of nanoparticles in a liquid medium during the creation of nanocomposites. Experiments confirm improved technological properties of ultrasonically modified liquid epoxy polymers used to impregnate oriented fibrous fillers, as well as improved physicomechanical properties of the sonicated epoxy matrices. Some issues of biological modifications of components of polymers for functional application are briefly reviewed.

[2] arXiv:2509.02575 [pdf, html, other]
Title: The Lifecycle Principle: Stabilizing Dynamic Neural Networks with State Memory
Zichuan Yang
Comments: 8 pages, 1 figure
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

I investigate a stronger form of regularization by deactivating neurons for extended periods, a departure from the temporary changes of methods like Dropout. However, this long-term dynamism introduces a critical challenge: severe training instability when neurons are revived with random weights. To solve this, I propose the Lifecycle (LC) principle, a regularization mechanism centered on a key innovation: state memory. Instead of re-initializing a revived neuron, my method restores its parameters to their last known effective state. This process preserves learned knowledge and avoids destructive optimization shocks. My theoretical analysis reveals that the LC principle smooths the loss landscape, guiding optimization towards flatter minima associated with better generalization. Experiments on image classification benchmarks demonstrate that my method improves generalization and robustness. Crucially, ablation studies confirm that state memory is essential for achieving these gains.
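
The state-memory idea in this abstract can be illustrated with a minimal sketch. This is not the author's implementation: the class `LifecycleLayer`, its methods, and the choice to snapshot weights at the moment of deactivation are all assumptions made for illustration, and the abstract does not say how the "last known effective state" is chosen.

```python
import random


class LifecycleLayer:
    """Toy layer whose neurons can be switched off and later revived.

    Hypothetical illustration of state memory; not the paper's code.
    """

    def __init__(self, n_neurons, n_inputs, seed=0):
        rng = random.Random(seed)
        self.weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
            for _ in range(n_neurons)
        ]
        self.active = [True] * n_neurons
        self.memory = {}  # neuron index -> weights saved at deactivation

    def deactivate(self, i):
        # Assumption: the state worth remembering is the one held
        # at the moment the neuron is switched off.
        self.memory[i] = list(self.weights[i])
        self.active[i] = False

    def revive(self, i):
        # Restore the remembered weights instead of drawing new random ones.
        if i in self.memory:
            self.weights[i] = self.memory.pop(i)
        self.active[i] = True

    def forward(self, x):
        # Inactive neurons contribute a zero output.
        return [
            sum(w * v for w, v in zip(ws, x)) if on else 0.0
            for ws, on in zip(self.weights, self.active)
        ]
```

In this sketch a revived neuron resumes with exactly the weights it had when it was deactivated, which is the contrast with random re-initialization that the abstract draws.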

[3] arXiv:2509.02578 [pdf, other]
Title: Secure Password Generator Based on Secure Pseudo-Random Number Generator
Abel C. H. Chen
Comments: in Chinese language
Subjects: Cryptography and Security (cs.CR); Performance (cs.PF)

In recent years, numerous incidents involving the leakage of website accounts and text passwords (referred to as passwords) have raised significant concerns regarding the potential exposure of personal information. These events underscore the critical importance of both information security and password protection. While many of these breaches are attributable to vulnerabilities within website infrastructure, the strength and security of the passwords themselves also play a crucial role. Consequently, the creation of secure passwords constitutes a fundamental aspect of enhancing overall system security and protecting personal data. In response to these challenges, this study presents a secure password generation approach utilizing a cryptographically secure Pseudo-Random Number Generator (PRNG). The generator is implemented using a range of Message Authentication Code (MAC) algorithms, including the Keyed-Hash Message Authentication Code (HMAC), Cipher-based Message Authentication Code (CMAC), and KECCAK Message Authentication Code (KMAC), to produce robust random values suitable for password generation. To evaluate the proposed method, empirical assessments were conducted in accordance with the guidelines provided in the National Institute of Standards and Technology (NIST) Special Publication (SP) 800-90B. The evaluation focused on two primary aspects: entropy estimation and verification of independent and identically distributed (IID) properties. Experimental results indicate that the proposed method satisfies both entropy and IID requirements, thereby demonstrating its ability to generate passwords with a high degree of randomness and security.
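
As a rough illustration of MAC-based password generation, the sketch below runs HMAC-SHA256 in counter mode and maps the output bytes to characters by rejection sampling. It is an assumption-laden example, not the paper's generator and not the NIST HMAC_DRBG construction: the names `hmac_stream`, `generate_password`, and `ALPHABET`, the counter-mode layout, and the character set are all hypothetical.

```python
import hashlib
import hmac
import secrets
import string

# Hypothetical character set; the abstract does not specify one.
ALPHABET = string.ascii_letters + string.digits + "!@#$%^&*"


def hmac_stream(key: bytes, nonce: bytes):
    """Yield pseudo-random bytes from HMAC-SHA256 applied to nonce || counter."""
    counter = 0
    while True:
        message = nonce + counter.to_bytes(8, "big")
        yield from hmac.new(key, message, hashlib.sha256).digest()
        counter += 1


def generate_password(length: int, key: bytes, nonce: bytes) -> str:
    """Build a password of the given length from the HMAC byte stream."""
    if length <= 0:
        return ""
    # Rejection sampling: drop bytes that would bias the modulo reduction.
    limit = 256 - (256 % len(ALPHABET))
    chars = []
    for byte in hmac_stream(key, nonce):
        if byte < limit:
            chars.append(ALPHABET[byte % len(ALPHABET)])
            if len(chars) == length:
                break
    return "".join(chars)


if __name__ == "__main__":
    key = secrets.token_bytes(32)    # secret key from the OS random source
    nonce = secrets.token_bytes(16)  # fresh per password
    print(generate_password(16, key, nonce))
```

The output is deterministic for a fixed key and nonce, so the security of such a scheme rests on keeping the key secret and never reusing a nonce. Evaluating the entropy and IID claims made in the abstract would still require the SP 800-90B test procedures.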

[4] arXiv:2509.02579 [pdf, html, other]
Title: Latent Variable Modeling in Multi-Agent Reinforcement Learning via Expectation-Maximization for UAV-Based Wildlife Protection
Mazyar Taghavi, Rahman Farnoosh
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Protecting endangered wildlife from illegal poaching presents a critical challenge, particularly in vast and partially observable environments where real-time response is essential. This paper introduces a novel Expectation-Maximization (EM) based latent variable modeling approach in the context of Multi-Agent Reinforcement Learning (MARL) for Unmanned Aerial Vehicle (UAV) coordination in wildlife protection. By modeling hidden environmental factors and inter-agent dynamics through latent variables, our method enhances exploration and coordination under [...]. We implement and evaluate our EM-MARL framework using a custom simulation involving 10 UAVs tasked with patrolling protected habitats of the endangered Iranian leopard. Extensive experimental results demonstrate superior performance in detection accuracy, adaptability, and policy convergence when compared to standard algorithms such as Proximal Policy Optimization (PPO) and Deep Deterministic Policy Gradient (DDPG). Our findings underscore the potential of combining EM inference with MARL to improve decentralized decision-making in complex, high-stakes conservation scenarios. The full implementation, simulation environment, and training scripts are publicly available on GitHub.

[5] arXiv:2509.02581 [pdf, html, other]
Title: Charting the Future of Scholarly Knowledge with AI: A Community Perspective
Azanzi Jiomekong, Hande Küçük McGinty, Keith G. Mills, Allard Oelen, Enayat Rajabi, Harry McElroy, Antrea Christou, Anmol Saini, Janice Anta Zebaze, Hannah Kim, Anna M. Jacyszyn, Sören Auer
Comments: 39 pages, 3 figures
Subjects: Digital Libraries (cs.DL); Artificial Intelligence (cs.AI)

Despite the growing availability of tools designed to support scholarly knowledge extraction and organization, many researchers still rely on manual methods, sometimes due to unfamiliarity with existing technologies or limited access to domain-adapted solutions. Meanwhile, the rapid increase in scholarly publications across disciplines has made it increasingly difficult to stay current, further underscoring the need for scalable, AI-enabled approaches to structuring and synthesizing scholarly knowledge. Various research communities have begun addressing this challenge independently, developing tools and frameworks aimed at building reliable, dynamic, and queryable scholarly knowledge bases. However, limited interaction across these communities has hindered the exchange of methods, models, and best practices, slowing progress toward more integrated solutions. This manuscript identifies ways to foster cross-disciplinary dialogue, identify shared challenges, categorize new collaboration and shape future research directions in scholarly knowledge and organization.

[6] arXiv:2509.02590 [pdf, html, other]
Title: On the Optimization of Methods for Establishing Well-Connected Communities
Mohammad Dindoost, Oliver Alvarado Rodriguez, Bartosz Bryg, Minhyuk Park, George Chacko, Tandy Warnow, David A. Bader
Comments: 12 pages
Subjects: Social and Information Networks (cs.SI); Distributed, Parallel, and Cluster Computing (cs.DC)

Community detection plays a central role in uncovering mesoscale structures in networks. However, existing methods often suffer from disconnected or weakly connected clusters, undermining interpretability and robustness. Well-Connected Clusters (WCC) and Connectivity Modifier (CM) algorithms are post-processing techniques that improve the accuracy of many clustering methods. However, they are computationally prohibitive on massive graphs. In this work, we present optimized parallel implementations of WCC and CM using the HPE Chapel programming language. First, we design fast and efficient parallel algorithms that leverage Chapel's parallel constructs to achieve substantial performance improvements and scalability on modern multicore architectures. Second, we integrate this software into Arkouda/Arachne, an open-source, high-performance framework for large-scale graph analytics. Our implementations uniquely enable well-connected community detection on massive graphs with more than 2 billion edges, providing a practical solution for connectivity-preserving clustering at web scale. For example, our implementations of WCC and CM enable community detection on the OpenAlex dataset, which has over 2 billion edges, in minutes using 128 cores, a result that was previously infeasible to compute.

[7] arXiv:2509.02592 [pdf, html, other]
Title: Beyond Synthetic Augmentation: Group-Aware Threshold Calibration for Robust Balanced Accuracy in Imbalanced Learning
Hunter Gittlin
Comments: Accepted to the AIDEM'25 conference at ECML; to be published in Springer (LNCS)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Class imbalance remains a fundamental challenge in machine learning, with traditional solutions often creating as many problems as they solve. We demonstrate that group-aware threshold calibration--setting different decision thresholds for different demographic groups--provides superior robustness compared to synthetic data generation methods. Through extensive experiments, we show that group-specific thresholds achieve 1.5-4% higher balanced accuracy than SMOTE and CT-GAN augmented models while improving worst-group balanced accuracy. Unlike single-threshold approaches that apply one cutoff across all groups, our group-aware method optimizes the Pareto frontier between balanced accuracy and worst-group balanced accuracy, enabling fine-grained control over group-level performance. Critically, we find that applying group thresholds to synthetically augmented data yields minimal additional benefit, suggesting these approaches are fundamentally redundant. Our results span seven model families including linear, tree-based, instance-based, and boosting methods, confirming that group-aware threshold calibration offers a simpler, more interpretable, and more effective solution to class imbalance.
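
A minimal sketch of group-aware threshold calibration follows, assuming binary labels, real-valued scores, and a group label per example. It simply picks, for each group, the threshold that maximizes that group's balanced accuracy. The function names and the exhaustive search over observed scores are illustrative assumptions, and the Pareto-frontier optimization described in the abstract is not reproduced here.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of the true-positive rate and the true-negative rate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    tpr = tp / pos if pos else 0.0
    tnr = tn / neg if neg else 0.0
    return (tpr + tnr) / 2


def best_threshold(scores, labels):
    """Try every observed score as a cutoff; keep the first one with the best BA."""
    best_t, best_ba = 0.5, -1.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        ba = balanced_accuracy(labels, preds)
        if ba > best_ba:
            best_t, best_ba = t, ba
    return best_t


def group_thresholds(scores, labels, groups):
    """Calibrate one decision threshold per group."""
    by_group = defaultdict(lambda: ([], []))
    for s, y, g in zip(scores, labels, groups):
        by_group[g][0].append(s)
        by_group[g][1].append(y)
    return {g: best_threshold(s, y) for g, (s, y) in by_group.items()}


if __name__ == "__main__":
    scores = [0.2, 0.4, 0.6, 0.8, 0.1, 0.2, 0.3, 0.4]
    labels = [0, 0, 1, 1, 0, 0, 1, 1]
    groups = ["a"] * 4 + ["b"] * 4
    print(group_thresholds(scores, labels, groups))  # {'a': 0.6, 'b': 0.3}
```

In the toy data, a single cutoff of 0.5 would predict every example in group "b" as negative, while the per-group cutoffs separate both groups perfectly. In practice the thresholds would be calibrated on held-out data rather than on the evaluation set.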

[8] arXiv:2509.02605 [pdf, html, other]
Title: Synthetic Founders: AI-Generated Social Simulations for Startup Validation Research in Computational Social Science
Jorn K. Teutloff
Comments: Manuscript submitted to the Journal of Artificial Societies and Social Simulation (JASSS). 21 pages, 1 table
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

We present a comparative docking experiment that aligns human-subject interview data with large language model (LLM)-driven synthetic personas to evaluate fidelity, divergence, and blind spots in AI-enabled simulation. Fifteen early-stage startup founders were interviewed about their hopes and concerns regarding AI-powered validation, and the same protocol was replicated with AI-generated founder and investor personas. A structured thematic synthesis revealed four categories of outcomes: (1) Convergent themes - commitment-based demand signals, black-box trust barriers, and efficiency gains were consistently emphasized across both datasets; (2) Partial overlaps - founders worried about outliers being averaged away and the stress of real customer validation, while synthetic personas highlighted irrational blind spots and framed AI as a psychological buffer; (3) Human-only themes - relational and advocacy value from early customer engagement and skepticism toward moonshot markets; and (4) Synthetic-only themes - amplified false positives and trauma blind spots, where AI may overstate adoption potential by missing negative historical experiences.
We interpret this comparative framework as evidence that LLM-driven personas constitute a form of hybrid social simulation: more linguistically expressive and adaptable than traditional rule-based agents, yet bounded by the absence of lived history and relational consequence. Rather than replacing empirical studies, we argue they function as a complementary simulation category - capable of extending hypothesis space, accelerating exploratory validation, and clarifying the boundaries of cognitive realism in computational social science.

[9] arXiv:2509.02609 [pdf, html, other]
Title: Contrastive clustering based on regular equivalence for influential node identification in complex networks
Yanmei Hu, Yihang Wu, Bing Sun, Xue Yue, Biao Cai, Xiangtao Li, Yang Chen
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)

Identifying influential nodes in complex networks is a fundamental task in network analysis with wide-ranging applications across domains. While deep learning has advanced node influence detection, existing supervised approaches remain constrained by their reliance on labeled data, limiting their applicability in real-world scenarios where labels are scarce or unavailable. While contrastive learning demonstrates significant potential for performance enhancement, existing approaches predominantly rely on multiple-embedding generation to construct positive/negative sample pairs. To overcome these limitations, we propose ReCC (regular equivalence-based contrastive clustering), a novel deep unsupervised framework for influential node identification. We first reformalize influential node identification as a label-free deep clustering problem, then develop a contrastive learning mechanism that leverages regular equivalence-based similarity, which captures structural similarities between nodes beyond local neighborhoods, to generate positive and negative samples. This mechanism is integrated into a graph convolutional network to learn node embeddings that are used to differentiate influential from non-influential nodes. ReCC is pre-trained using network reconstruction loss and fine-tuned with a combined contrastive and clustering loss, with both phases being independent of labeled data. Additionally, ReCC enhances node representations by combining structural metrics with regular equivalence-based similarities. Extensive experiments demonstrate that ReCC outperforms state-of-the-art approaches across several benchmarks.

[10] arXiv:2509.02611 [pdf, html, other]
Title: Chatbot Deployment Considerations for Application-Agnostic Human-Machine Dialogues
Pablo Rivas, Chelsi Chelsi, Nishit Nishit, Laharika Ravula
Comments: The Third Workshop on Reasoning and Learning for Human-Machine Dialogues at the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)
Subjects: Computers and Society (cs.CY)

Automatic conversation systems based on natural language responses are becoming ubiquitous, in part due to major advances in computational linguistics and machine learning. Easy access to robust and affordable platforms is driving an unprecedented rush among companies to adopt chatbot technologies for customer service and support. However, this rush has caused lapses in judgment when releasing chatbot technologies into production systems. This paper aims to shed light on basic, elemental considerations that technologists must weigh before deploying a chatbot. Our approach takes one particular case to draw lessons for those considering the implementation of chatbots. By looking at this case study, we aim to call for consideration of societal values as a paramount factor before deploying a chatbot and to consider the societal implications of releasing these types of systems.

[11] arXiv:2509.02616 [pdf, html, other]
Title: Sorting with constraints
A. Manas
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)

In this work, we study the generalized sorting problem, where we are given a set of $n$ elements to be sorted, but only a subset of all possible pairwise element comparisons is allowed. We look at the problem from the perspective of the graph formed by the ``forbidden'' pairs, and we parameterize algorithms using the clique number and the chromatic number of this graph. We also extend these results to the class of problems where the input graph is not necessarily sortable, and one is only interested in discovering the partial order. We use our results to develop a simple algorithm that always determines the underlying partial order in $O(n^{3/2} \log n)$ probes, when the input graph is an Erdős--Rényi graph.

[12] arXiv:2509.02619 [pdf, html, other]
Title: Towards Performatively Stable Equilibria in Decision-Dependent Games for Arbitrary Data Distribution Maps
Guangzheng Zhong, Yang Liu, Jiming Liu
Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)

In decision-dependent games, multiple players optimize their decisions under a data distribution that shifts with their joint actions, creating complex dynamics in applications like market pricing. A practical consequence of these dynamics is the \textit{performatively stable equilibrium}, where each player's strategy is a best response under the induced distribution. Prior work relies on $\beta$-smoothness, assuming Lipschitz continuity of loss function gradients with respect to the data distribution, which is impractical as the data distribution maps, i.e., the relationship between joint decision and the resulting distribution shifts, are typically unknown, rendering $\beta$ unobtainable. To overcome this limitation, we propose a gradient-based sensitivity measure that directly quantifies the impact of decision-induced distribution shifts. Leveraging this measure, we derive convergence guarantees for performatively stable equilibria under a practically feasible assumption of strong monotonicity. Accordingly, we develop a sensitivity-informed repeated retraining algorithm that adjusts players' loss functions based on the sensitivity measure, guaranteeing convergence to performatively stable equilibria for arbitrary data distribution maps. Experiments on prediction error minimization game, Cournot competition, and revenue maximization game show that our approach outperforms state-of-the-art baselines, achieving lower losses and faster convergence.

[13] arXiv:2509.02624 [pdf, html, other]
Title: Who Owns The Robot?: Four Ethical and Socio-technical Questions about Wellbeing Robots in the Real World through Community Engagement
Minja Axelsson, Jiaee Cheong, Rune Nyrup, Hatice Gunes
Comments: Accepted at the 8th AAAI/ACM Conference on AI, Ethics, and Society. 23 pages, 1 figure
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Robotics (cs.RO)

Recent studies indicate that robotic coaches can play a crucial role in promoting wellbeing. However, the real-world deployment of wellbeing robots raises numerous ethical and socio-technical questions and concerns. To explore these questions, we undertake a community-centered investigation to examine three different communities' perspectives on using robotic wellbeing coaches in real-world environments. We frame our work as an anticipatory ethical investigation, which we undertake to better inform the development of robotic technologies with communities' opinions, with the ultimate goal of aligning robot development with public interest. We conducted workshops with three communities who are under-represented in robotics development: 1) members of the public at a science festival, 2) women computer scientists at a conference, and 3) humanities researchers interested in history and philosophy of science. In the workshops, we collected qualitative data using the Social Robot Co-Design Canvas on Ethics. We analysed the collected qualitative data with Thematic Analysis, informed by notes taken during workshops. Through our analysis, we identify four themes regarding key ethical and socio-technical questions about the real-world use of wellbeing robots. We group participants' insights and discussions around these broad thematic questions, discuss them in light of state-of-the-art literature, and highlight areas for future investigation. Finally, we provide the four questions as a broad framework that roboticists can and should use during robotic development and deployment, in order to reflect on the ethics and socio-technical dimensions of their robotic applications, and to engage in dialogue with communities of robot users. The four questions are: 1) Is the robot safe and how can we know that?, 2) Who is the robot built for and with?, 3) Who owns the robot and the data?, and 4) Why a robot?.

[14] arXiv:2509.02638 [pdf, other]
Title: Exploring the interplay between Planetary Boundaries and Sustainable Development Goals using Large Language Models
Lamyae Rhomrasi, Pilar Manchón, Ricardo Vinuesa, Francesco Fuso-Nerini, J. Alberto Conejero, Javier García-Martínez, Sergio Hoyas
Subjects: Computers and Society (cs.CY)

By analyzing 40,037 climate articles using Large Language Models (LLMs), we identified interactions between Planetary Boundaries (PBs) and Sustainable Development Goals (SDGs). An automated reasoner distinguished true trade-offs (SDG progress harming PBs) and synergies (mutual reinforcement) from double positives and negatives (shared drivers). Results show 21.1% true trade-offs, 28.3% synergies, and 19.5% neutral interactions, with the remainder being double positive or negative. Key findings include conflicts between land-use goals (SDG2/SDG6) and land system boundaries (PB6), together with the underrepresentation of social SDGs in the climate literature. Our study highlights the need for integrated policies that align development goals with planetary limits to reduce systemic conflicts. We propose three steps: (1) integrated socio-ecological metrics, (2) governance ensuring that SDG progress respects Earth system limits, and (3) equity measures protecting marginalized groups from boundary compliance costs.

[15] arXiv:2509.02650 [pdf, html, other]
Title: Can Media Act as a Soft Regulator of Safe AI Development? A Game Theoretical Analysis
Henrique Correia da Fonseca, António Fernandes, Zhao Song, Theodor Cimpeanu, Nataliya Balabanova, Adeela Bashir, Paolo Bova, Alessio Buscemi, Alessandro Di Stefano, Manh Hong Duong, Elias Fernandez Domingos, Ndidi Bianca Ogbo, Simon T. Powers, Daniele Proverbio, Zia Ush Shamszaman, Fernando P. Santos, The Anh Han, Marcus Krellner
Comments: 10 Pages, 7 Figures, accepted in the ALIFE 2025 Conference
Subjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Populations and Evolution (q-bio.PE)

When developers of artificial intelligence (AI) products need to decide between profit and safety for the users, they likely choose profit. Untrustworthy AI technology must come packaged with tangible negative consequences. Here, we envisage those consequences as the loss of reputation caused by media coverage of their misdeeds, disseminated to the public. We explore whether media coverage has the potential to push AI creators into the production of safe products, enabling widespread adoption of AI technology. We created artificial populations of self-interested creators and users and studied them through the lens of evolutionary game theory. Our results reveal that media is indeed able to foster cooperation between creators and users, but not always. Cooperation does not evolve if the quality of the information provided by the media is not reliable enough, or if the costs of either accessing media or ensuring safety are too high. By shaping public perception and holding developers accountable, media emerges as a powerful soft regulator -- guiding AI safety even in the absence of formal government oversight.

[16] arXiv:2509.02655 [pdf, html, other]
Title: BioBlue: Notable runaway-optimiser-like LLM failure modes on biologically and economically aligned AI safety benchmarks for LLMs with simplified observation format
Roland Pihlakas, Sruthi Kuriakose
Comments: 13 pages, 8 tables
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

Relatively many past AI safety discussions have centered around the dangers of unbounded utility maximisation by RL agents, illustrated by scenarios like the "paperclip maximiser" or by specification gaming in general. Unbounded maximisation is problematic for many reasons. We wanted to verify whether these RL runaway optimisation problems are still relevant with LLMs as well. Turns out, strangely, this is indeed clearly the case. The problem is not that the LLMs just lose context or become incoherent. The problem is that in various scenarios, LLMs lose context in very specific ways, which systematically resemble runaway optimisers in the following distinct ways: 1) Ignoring homeostatic targets and "defaulting" to unbounded maximisation instead. 2) It is equally concerning that the "default" meant also reverting back to single-objective optimisation. Our findings also suggest that long-running scenarios are important. Systematic failures emerge after periods of initially successful behaviour. In some trials the LLMs were successful until the end. This means, while current LLMs do conceptually grasp biological and economic alignment, they exhibit randomly triggered problematic behavioural tendencies under sustained long-running conditions, particularly involving multiple or competing objectives. Once they flip, they usually do not recover. Even though LLMs look multi-objective and bounded on the surface, the underlying mechanisms seem to be actually still biased towards being single-objective and unbounded.

[17] arXiv:2509.02659 [pdf, html, other]
Title: 2nd Place Solution for CVPR2024 E2E Challenge: End-to-End Autonomous Driving Using Vision Language Model
Zilong Guo, Yi Luo, Long Sha, Dongxu Wang, Panqu Wang, Chenyang Xu, Yi Yang
Comments: 2nd place in CVPR 2024 End-to-End Driving at Scale Challenge
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

End-to-end autonomous driving has drawn tremendous attention recently. Many works focus on using modular deep neural networks to construct the end-to-end architecture. However, whether using powerful large language models (LLMs), especially multi-modality Vision Language Models (VLMs), could benefit end-to-end driving tasks remains an open question. In our work, we demonstrate that combining end-to-end architectural design with knowledgeable VLMs yields impressive performance on the driving tasks. It is worth noting that our method uses only a single camera and is the best camera-only solution on the leaderboard, demonstrating the effectiveness of the vision-based driving approach and its potential for end-to-end driving tasks.

[18] arXiv:2509.02661 [pdf, html, other]
Title: The Future of Artificial Intelligence and the Mathematical and Physical Sciences (AI+MPS)
Andrew Ferguson, Marisa LaFleur, Lars Ruthotto, Jesse Thaler, Yuan-Sen Ting, Pratyush Tiwary, Soledad Villar, E. Paulo Alves, Jeremy Avigad, Simon Billinge, Camille Bilodeau, Keith Brown, Emmanuel Candes, Arghya Chattopadhyay, Bingqing Cheng, Jonathan Clausen, Connor Coley, Andrew Connolly, Fred Daum, Sijia Dong, Chrisy Xiyu Du, Cora Dvorkin, Cristiano Fanelli, Eric B. Ford, Luis Manuel Frutos, Nicolás García Trillos, Cecilia Garraffo, Robert Ghrist, Rafael Gomez-Bombarelli, Gianluca Guadagni, Sreelekha Guggilam, Sergei Gukov, Juan B. Gutiérrez, Salman Habib, Johannes Hachmann, Boris Hanin, Philip Harris, Murray Holland, Elizabeth Holm, Hsin-Yuan Huang, Shih-Chieh Hsu, Nick Jackson, Olexandr Isayev, Heng Ji, Aggelos Katsaggelos, Jeremy Kepner, Yannis Kevrekidis, Michelle Kuchera, J. Nathan Kutz, Branislava Lalic, Ann Lee, Matt LeBlanc, Josiah Lim, Rebecca Lindsey, Yongmin Liu, Peter Y. Lu, Sudhir Malik, Vuk Mandic, Vidya Manian, Emeka P. Mazi, Pankaj Mehta, Peter Melchior, Brice Ménard, Jennifer Ngadiuba, Stella Offner, Elsa Olivetti, Shyue Ping Ong, Christopher Rackauckas, Philippe Rigollet, Chad Risko, Philip Romero, Grant Rotskoff, Brett Savoie, Uros Seljak, David Shih, Gary Shiu, Dima Shlyakhtenko, Eva Silverstein, Taylor Sparks, Thomas Strohmer, Christopher Stubbs, Stephen Thomas, Suriyanarayanan Vaikuntanathan, Rene Vidal, Francisco Villaescusa-Navarro, Gregory Voth, Benjamin Wandelt, Rachel Ward, Melanie Weber, Risa Wechsler, Stephen Whitelam, Olaf Wiest, Mike Williams, Zhuoran Yang, Yaroslava G. Yingling, Bin Yu, Shuwen Yue, Ann Zabludoff, Huimin Zhao, Tong Zhang
Comments: Community Paper from the Future of NSF AI+MPS Workshop, Cambridge, Massachusetts, March 24-26, 2025, supported by NSF Award Number 2512945
Subjects: Artificial Intelligence (cs.AI); Instrumentation and Methods for Astrophysics (astro-ph.IM); Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)

This community paper developed out of the NSF Workshop on the Future of Artificial Intelligence (AI) and the Mathematical and Physical Sciences (MPS), which was held in March 2025 with the goal of understanding how the MPS domains (Astronomy, Chemistry, Materials Research, Mathematical Sciences, and Physics) can best capitalize on, and contribute to, the future of AI. We present here a summary and snapshot of the MPS community's perspective, as of Spring/Summer 2025, in a rapidly developing field. The link between AI and MPS is becoming increasingly inextricable; now is a crucial moment to strengthen the link between AI and Science by pursuing a strategy that proactively and thoughtfully leverages the potential of AI for scientific discovery and optimizes opportunities to impact the development of AI by applying concepts from fundamental science. To achieve this, we propose activities and strategic priorities that: (1) enable AI+MPS research in both directions; (2) build up an interdisciplinary community of AI+MPS researchers; and (3) foster education and workforce development in AI for MPS researchers and students. We conclude with a summary of suggested priorities for funding agencies, educational institutions, and individual researchers to help position the MPS community to be a leader in, and take full advantage of, the transformative potential of AI+MPS.

[19] arXiv:2509.02709 [pdf, html, other]
Title: Preference Robustness for DPO with Applications to Public Health
Cheol Woo Kim, Shresth Verma, Mauricio Tec, Milind Tambe
Subjects: Machine Learning (cs.LG)

We study an LLM fine-tuning task for designing reward functions for sequential resource allocation problems in public health, guided by human preferences expressed in natural language. This setting presents a challenging testbed for alignment due to complex and ambiguous objectives and limited data availability. We propose DPO-PRO, a robust fine-tuning algorithm based on Direct Preference Optimization (DPO), which accounts for uncertainty in the preference distribution using a lightweight Distributionally Robust Optimization (DRO) formulation. Unlike prior DRO-based DPO methods, DPO-PRO is significantly less conservative. We evaluate DPO-PRO on a real-world maternal mobile health program operated by the non-profit organization ARMMAN, as well as on standard alignment benchmarks. Experimental results demonstrate that our method consistently improves robustness to noisy preference signals compared to existing DPO variants. Moreover, DPO-PRO achieves performance comparable to a prior self-reflection-based baseline for reward function design, while requiring significantly lower inference-time cost.

[20] arXiv:2509.02718 [pdf, html, other]
Title: Efficient Training-Free Online Routing for High-Volume Multi-LLM Serving
Fangzhou Wu, Sandeep Silwal
Comments: 31 pages
Subjects: Databases (cs.DB)

Increasing demand for Large Language Model (LLM) services imposes substantial deployment and computation costs on providers. LLM routing offers a cost-efficient solution by directing queries to the optimal LLM based on model and query features. However, existing works primarily focus on offline scenarios and struggle to adapt to online settings with high query volume and constrained token budgets. In this work, we introduce the first training-free algorithm for online routing scenarios. Our algorithm leverages approximate nearest neighbor search to efficiently estimate query features and performs a one-time optimization over a small set of initial queries to learn a routing strategy that guides future routing. We provide theoretical guarantees demonstrating that our algorithm achieves a competitive ratio of $1 - o(1)$ under natural assumptions, which is further validated by extensive experiments across 3 benchmark datasets and 8 baselines, showing an average improvement of 3.55$\times$ in overall performance, 1.85$\times$ in cost efficiency, and nearly 4.25$\times$ in throughput.

[21] arXiv:2509.02720 [pdf, html, other]
Title: Spacetime Wavelet Method for Linear Boundary-Value Problems in Sylvester Matrix Equation Form
Cody D. Cochran, Karel Matous
Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)

We present a high-order spacetime numerical method for discretizing and solving linear initial-boundary value problems using wavelet-based techniques with user-prescribed error estimates. The spacetime wavelet discretization yields a system of algebraic equations resulting in a Sylvester matrix equation. We solve this system with a Global Generalized Minimal Residual (GMRES) method in conjunction with a wavelet-based recursive algorithm to improve convergence. We perform rigorous verification studies using linear partial differential equations (PDEs) with both convective and diffusive terms. The results of these simulations show the high-order convergence rates for the solution and derivative approximations predicted by wavelet theory. We demonstrate the utility of solving the Sylvester equation through comparisons to the commonly-used Kronecker product formulation. We show that our recursive wavelet-based algorithm that generates initial guesses for the iterative Global GMRES method improves the performance of the solver.
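For orientation, the textbook relation between a Sylvester matrix equation and its Kronecker product formulation is sketched below. The symbols $A \in \mathbb{R}^{m \times m}$, $B \in \mathbb{R}^{n \times n}$, and $X, C \in \mathbb{R}^{m \times n}$ are illustrative assumptions and are not taken from the paper, whose spacetime operators may be arranged differently; $\operatorname{vec}$ denotes column stacking.
$$ AX + XB = C \quad \Longleftrightarrow \quad \left( I_n \otimes A + B^{\top} \otimes I_m \right) \operatorname{vec}(X) = \operatorname{vec}(C). $$
In this generic setting the Kronecker system is of size $mn \times mn$, whereas the Sylvester form works only with the $m \times m$ and $n \times n$ factors.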

[22] arXiv:2509.02722 [pdf, html, other]
Title: Planning with Reasoning using Vision Language World Model
Delong Chen, Theo Moutakanni, Willy Chung, Yejin Bang, Ziwei Ji, Allen Bolourchi, Pascale Fung
Subjects: Artificial Intelligence (cs.AI)

Effective planning requires strong world models, but high-level world models that can understand and reason about actions with semantic and temporal abstraction remain largely underdeveloped. We introduce the Vision Language World Model (VLWM), a foundation model trained for language-based world modeling on natural videos. Given visual observations, the VLWM first infers the overall goal achievements and then predicts a trajectory composed of interleaved actions and world state changes. Those targets are extracted by iterative LLM Self-Refine conditioned on compressed future observations represented by Tree of Captions. The VLWM learns both an action policy and a dynamics model, which respectively facilitate reactive system-1 plan decoding and reflective system-2 planning via cost minimization. The cost evaluates the semantic distance between the hypothetical future states given by VLWM roll-outs and the expected goal state, and is measured by a critic model that we trained in a self-supervised manner. The VLWM achieves state-of-the-art Visual Planning for Assistance (VPA) performance on both benchmark evaluations and our proposed PlannerArena human evaluations, where system-2 improves the Elo score by +27% over system-1. The VLWM models also outperform strong VLM baselines on the RoboVQA and WorldPrediction benchmarks.

[23] arXiv:2509.02727 [pdf, html, other]
Title: Acrobotics: A Generalist Approach To Quadrupedal Robots' Parkour
Guillaume Gagné-Labelle, Vassil Atanassov, Ioannis Havoutis
Comments: Supplementary material can be found here: this https URL
Journal-ref: LNCS, volume 16045, 2025, p.124-138
Subjects: Robotics (cs.RO)

Climbing, crouching, bridging gaps, and walking up stairs are just a few of the advantages that quadruped robots have over wheeled robots, making them more suitable for navigating rough and unstructured terrain. However, executing such manoeuvres requires precise temporal coordination and complex agent-environment interactions. Moreover, legged locomotion is inherently more prone to slippage and tripping, and the classical approach of modeling such cases to design a robust controller thus quickly becomes impractical. In contrast, reinforcement learning offers a compelling solution by enabling optimal control through trial and error. We present a generalist reinforcement learning algorithm for quadrupedal agents in dynamic motion scenarios. The learned policy rivals state-of-the-art specialist policies trained using a mixture of experts approach, while using only 25% as many agents during training. Our experiments also highlight the key components of the generalist locomotion policy and the primary factors contributing to its success.

[24] arXiv:2509.02730 [pdf, html, other]
Title: Lower Bounds for Linear Operators
Young Kun Ko
Comments: 27 pages
Subjects: Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)

We consider a static data structure problem of computing a linear operator under the cell-probe model. Given a linear operator $M \in \mathbb{F}_2^{m \times n}$, the goal is to pre-process a vector $X \in \mathbb{F}_2^n$ into a data structure of size $s$ to answer any query $\langle M_i , X \rangle$ in time $t$. We prove that for a random operator $M$, any such data structure requires:
$$ t \geq \Omega ( \min \{ \log (m/s) , n / \log s \} ).$$ This result overcomes the well-known logarithmic barrier in static data structures [MNSW98, Sie04, PD06, PTW08, Pat11, DGW19] by using a random linear operator. Furthermore, it provides the first significant progress toward confirming a decades-old folklore conjecture: that non-linear pre-processing does not substantially help in computing most linear operators.
A straightforward modification of our proof also yields a wire lower bound of $\Omega(n \cdot \log^{1/d}(n))$ for depth-$d$ circuits with arbitrary gates that compute a random linear operator $M \in \mathbb{F}_2^{O(n) \times n}$. This bound holds even for circuits with only a small constant advantage over random guessing, improving upon longstanding results [RS03, Che08a, Che08b, GHK+13] for a random operator.
Finally, our work partially resolves the communication form of the Multiphase Conjecture [Pat10] and makes progress on Jukna-Schnitger's Conjecture [JS11, Juk12]. We address the former by considering the Inner Product (mod 2) problem (instead of Set Disjointness) when the number of queries $m$ is super-polynomial (e.g., $2^{n^{1/3}}$), and the total update time is $m^{0.99}$. Our result for the latter also applies to cases with super-polynomial $m$.

[25] arXiv:2509.02732 [pdf, html, other]
Title: STRive: An association rule-based system for the exploration of spatiotemporal categorical data
Mauro Diaz, Luis Sante, Joel Perca, João Victor da Silva, Nivan Ferreira, Jorge Poco
Journal-ref: Computers & Graphics, 2025
Subjects: Human-Computer Interaction (cs.HC)

Effectively analyzing spatiotemporal data plays a central role in understanding real-world phenomena and informing decision-making. Capturing the interaction between spatial and temporal dimensions also helps explain the underlying structure of the data. However, most datasets do not reveal attribute relationships, requiring additional algorithms to extract meaningful patterns. Existing visualization tools often focus either on attribute relationships or spatiotemporal analysis, but rarely support both simultaneously. In this paper, we present STRive (SpatioTemporal Rule Interactive Visual Explorer), a visual analytics system that enables users to uncover and explore spatial and temporal patterns in data. At the core of STRive lies Association Rule Mining (ARM), which we apply to spatiotemporal datasets to generate interpretable and actionable insights. We combine ARM with multiple interactive mechanisms to analyze the extracted relationships. Association rules serve as interpretable guidance mechanisms for visual analytics by highlighting the meaningful aspects of the data that users should investigate. Our methodology includes three key steps: rule generation, rule clustering, and interactive visualization. STRive offers two modes of analysis. The first operates at the rule cluster level and includes four coordinated views, each showing a different facet of a cluster, including its temporal and spatial behavior. The second mode mirrors the first but focuses on individual rules within a selected cluster. We evaluate the effectiveness of STRive through two case studies involving real-world datasets -- fatal vehicle accidents and urban crime. Results demonstrate the system's ability to support the discovery and analysis of interpretable patterns in complex spatiotemporal contexts.

[26] arXiv:2509.02737 [pdf, html, other]
Title: Imitate Optimal Policy: Prevail and Induce Action Collapse in Policy Gradient
Zhongzhu Zhou, Yibo Yang, Ziyan Chen, Fengxiang Bie, Haojun Xia, Xiaoxia Wu, Robert Wu, Ben Athiwaratkun, Bernard Ghanem, Shuaiwen Leon Song
Comments: 18 pages, 4 figures, 2 tables; includes supplementary material; preprint
Subjects: Machine Learning (cs.LG)

Policy gradient (PG) methods in reinforcement learning frequently utilize deep neural networks (DNNs) to learn a shared backbone of feature representations used to compute likelihoods in an action selection layer. Numerous studies have been conducted on the convergence and global optima of policy networks, but few have analyzed the representational structures of those underlying networks. While training an optimal policy DNN, we observed that under certain constraints, a gentle structure resembling neural collapse, which we refer to as Action Collapse (AC), emerges. This suggests that 1) the state-action activations (i.e., last-layer features) sharing the same optimal actions collapse towards those optimal actions' respective mean activations; 2) the variability of activations sharing the same optimal actions converges to zero; 3) the weights of the action selection layer and the mean activations collapse to a simplex equiangular tight frame (ETF). Our early work showed the aforementioned constraints to be necessary for these observations. Since the collapsed ETF of optimal policy DNNs maximally separates the pair-wise angles of all actions in the state-action space, we naturally raise a question: can we learn an optimal policy using an ETF structure as a (fixed) target configuration in the action selection layer? Our analytical proof shows that learning activations with a fixed ETF as the action selection layer naturally leads to AC. We thus propose the Action Collapse Policy Gradient (ACPG) method, which accordingly affixes a synthetic ETF as our action selection layer. ACPG induces the policy DNN to produce such an ideal configuration in the action selection layer while remaining optimal. Our experiments across various OpenAI Gym environments demonstrate that our technique can be integrated into any discrete PG method and lead to favorable reward improvements more quickly and robustly.
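As a rough illustration of the simplex ETF geometry mentioned above, the following sketch builds one standard construction for K actions and checks that every pair of vectors meets at the same angle. It is not the authors' ACPG code: the function names `simplex_etf` and `cosine`, and the choice of K-dimensional vectors built from the identity, are assumptions made for this example only.

```python
import math


def simplex_etf(k):
    """Return k unit vectors (rows, each in R^k) forming a simplex ETF.

    Standard construction: sqrt(k / (k - 1)) * (I - (1/k) * ones),
    i.e. the centered and rescaled standard basis vectors.
    """
    if k < 2:
        raise ValueError("a simplex ETF needs at least two vectors")
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def cosine(u, v):
    """Cosine of the angle between two vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical usage: a fixed target configuration for 4 discrete actions.
etf = simplex_etf(4)
pairwise = [cosine(etf[i], etf[j]) for i in range(4) for j in range(i + 1, 4)]
```

For K vectors the pairwise cosine of a simplex ETF is -1/(K-1), the most negative value that K equiangular unit vectors can share, which is the sense in which the configuration maximally separates the actions.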

[27] arXiv:2509.02746 [pdf, html, other]
Title: Mentality: A Mamba-based Approach towards Foundation Models for EEG
Saarang Panchavati, Corey Arnold, William Speier
Journal-ref: In Proceedings of the ICLR 2024 Workshop on Learning from Time Series for Health (2024). Retrieved from https://openreview.net/forum?id=O6T38rRiFp
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)

This work explores the potential of foundation models, specifically a Mamba-based selective state space model, for enhancing EEG analysis in neurological disorder diagnosis. EEG, crucial for diagnosing conditions like epilepsy, presents significant challenges due to its noisy, high-dimensional, and nonlinear nature. Traditional machine learning methods have made advances in automating EEG analysis but often fail to capture its complex spatio-temporal dynamics. Recent advances in deep learning, particularly in sequence modeling, offer new avenues for creating more generalized and expressive models capable of handling such complexities. By training a Mamba-based model on a large dataset containing seizure and non-seizure EEG recordings through a self-supervised reconstruction task followed by a seizure detection task, we demonstrate the model's effectiveness, achieving an AUROC of 0.72 on a held-out test set. This approach marks a significant step toward developing large-scale, clinically applicable foundation models for EEG data analysis.

[28] arXiv:2509.02749 [pdf, other]
Title: The Impact of Adaptive Emotional Alignment on Mental State Attribution and User Empathy in HRI
Giorgia Buracchio, Ariele Callegari, Massimo Donini, Cristina Gena, Antonio Lieto, Alberto Lillo, Claudio Mattutino, Alessandro Mazzei, Linda Pigureddu, Manuel Striani, Fabiana Vernero
Comments: author copy of the paper accepted at ROMAN2025
Subjects: Robotics (cs.RO)

The paper presents an experiment on the effects of adaptive emotional alignment between agents, considered a prerequisite for empathic communication, in Human-Robot Interaction (HRI). Using the NAO robot, we investigate the impact of an emotionally aligned, empathic dialogue on these aspects: (i) the robot's persuasive effectiveness, (ii) the user's communication style, and (iii) the attribution of mental states and empathy to the robot. In an experiment with 42 participants, two conditions were compared: one with neutral communication and another where the robot provided responses adapted to the emotions expressed by the users. The results show that emotional alignment does not influence users' communication styles or have a persuasive effect. However, it significantly influences attribution of mental states to the robot and its perceived empathy.

[29] arXiv:2509.02751 [pdf, html, other]
Title: Deep Research is the New Analytics System: Towards Building the Runtime for AI-Driven Analytics
Matthew Russo, Tim Kraska
Comments: 6 pages, 2 figures, submitted to CIDR'26
Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

With advances in large language models (LLMs), researchers are creating new systems that can perform AI-driven analytics over large unstructured datasets. Recent work has explored executing such analytics queries using semantic operators -- a declarative set of AI-powered data transformations with natural language specifications. However, even when optimized, these operators can be expensive to execute on millions of records and their iterator execution semantics make them ill-suited for interactive data analytics tasks. In another line of work, Deep Research systems have demonstrated an ability to answer natural language question(s) over large datasets. These systems use one or more LLM agent(s) to plan their execution, process the dataset(s), and iteratively refine their answer. However, these systems do not explicitly optimize their query plans which can lead to poor plan execution. In order for AI-driven analytics to excel, we need a runtime which combines the optimized execution of semantic operators with the flexibility and more dynamic execution of Deep Research systems. As a first step towards this vision, we build a prototype which enables Deep Research agents to write and execute optimized semantic operator programs. We evaluate our prototype and demonstrate that it can outperform a handcrafted semantic operator program and open Deep Research systems on two basic queries. Compared to a standard open Deep Research agent, our prototype achieves up to 1.95x better F1-score. Furthermore, even if we give the agent access to semantic operators as tools, our prototype still achieves cost and runtime savings of up to 76.8% and 72.7% thanks to its optimized execution.

[30] arXiv:2509.02753 [pdf, html, other]
Title: LExI: Layer-Adaptive Active Experts for Efficient MoE Model Inference
Krishna Teja Chitty-Venkata, Sandeep Madireddy, Murali Emani, Venkatram Vishwanath
Comments: Preprint
Subjects: Machine Learning (cs.LG)

Mixture-of-Experts (MoE) models scale efficiently by activating only a subset of experts per token, offering a computationally sparse alternative to dense architectures. While prior post-training optimizations, such as inter- and intra-expert pruning, reduce memory usage, they provide limited gains in inference-time compute efficiency. Moreover, existing MoE architectures typically activate a fixed number of experts uniformly across all layers, resulting in redundant computation and suboptimal performance. In this work, we first demonstrate that MoE pruning strategies improve only the memory footprint but do not significantly improve inference performance on GPUs using optimized frameworks such as vLLM. To address this, we introduce LExI, a data-free optimization technique that determines the optimal number of active experts per layer in a pretrained MoE model. LExI leverages only the model weights to estimate the relative importance of each layer and adaptively assigns the number of active experts accordingly per layer. Experiments on state-of-the-art language and vision MoE benchmarks demonstrate that LExI significantly outperforms traditional MoE pruning approaches in terms of inference efficiency with negligible accuracy loss. For example, using LExI, Qwen1.5-MoE achieves the same throughput on an Nvidia H100 GPU with 10% better accuracy than traditional expert pruning.

[31] arXiv:2509.02754 [pdf, html, other]
Title: Do LLM Modules Generalize? A Study on Motion Generation for Autonomous Driving
Mingyi Wang, Jingke Wang, Tengju Ye, Junbo Chen, Kaicheng Yu
Comments: CoRL 2025
Subjects: Artificial Intelligence (cs.AI)

Recent breakthroughs in large language models (LLMs) have not only advanced natural language processing but also inspired their application in domains with structurally similar problems--most notably, autonomous driving motion generation. Both domains involve autoregressive sequence modeling, token-based representations, and context-aware decision making, making the transfer of LLM components a natural and increasingly common practice. However, despite promising early attempts, a systematic understanding of which LLM modules are truly transferable remains lacking. In this paper, we present a comprehensive evaluation of five key LLM modules--tokenizer design, positional embedding, pre-training paradigms, post-training strategies, and test-time computation--within the context of motion generation for autonomous driving. Through extensive experiments on the Waymo Sim Agents benchmark, we demonstrate that, when appropriately adapted, these modules can significantly improve performance for autonomous driving motion generation. In addition, we identify which techniques can be effectively transferred, analyze the potential reasons for the failure of others, and discuss the specific adaptations needed for autonomous driving scenarios. We evaluate our method on the Sim Agents task and achieve competitive results.

[32] arXiv:2509.02760 [pdf, html, other]
Title: A Digital Twin for Robotic Post Mortem Tissue Sampling using Virtual Reality
Maximilian Neidhardt, Ludwig Bosse, Vidas Raudonis, Kristina Allgoewer, Axel Heinemann, Benjamin Ondruschka, Alexander Schlaefer
Journal-ref: IEEE Robotics and Automation Letters 2025
Subjects: Robotics (cs.RO)

Studying tissue samples obtained during autopsies is the gold standard for diagnosing the cause of death and for understanding disease pathophysiology. Recently, interest has grown in post mortem minimally invasive biopsies, which are less destructive than an open autopsy and reduce the risk of infection. While manual biopsies under ultrasound guidance are more widely performed, robotic post mortem biopsies have been recently proposed. This approach can further reduce the risk of infection for physicians. However, planning of the procedure and control of the robot need to be efficient and usable. We explore a virtual reality setup with a digital twin to realize fully remote planning and control of robotic post mortem biopsies. The setup is evaluated with forensic pathologists in a usability study for three interaction methods. Furthermore, we assess clinical feasibility by evaluating the system with three human cadavers. Overall, 132 needle insertions were performed with an off-axis needle placement error of 5.30 ± 3.25 mm. Tissue samples were successfully biopsied and histopathologically verified. Users reported a very intuitive needle placement approach, indicating that the system is a promising, precise, and low-risk alternative to conventional approaches.

[33] arXiv:2509.02761 [pdf, html, other]
Title: Plan Verification for LLM-Based Embodied Task Completion Agents
Ananth Hariharan, Vardhan Dongre, Dilek Hakkani-Tür, Gokhan Tur
Subjects: Artificial Intelligence (cs.AI)

Large language model (LLM) based task plans and corresponding human demonstrations for embodied AI may be noisy, with unnecessary actions, redundant navigation, and logical errors that reduce policy quality. We propose an iterative verification framework in which a Judge LLM critiques action sequences and a Planner LLM applies the revisions, yielding progressively cleaner and more spatially coherent trajectories. Unlike rule-based approaches, our method relies on natural language prompting, enabling broad generalization across error types including irrelevant actions, contradictions, and missing steps. On a set of manually annotated actions from the TEACh embodied AI dataset, our framework achieves up to 90% recall and 100% precision across four state-of-the-art LLMs (GPT o4-mini, DeepSeek-R1, Gemini 2.5, LLaMA 4 Scout). The refinement loop converges quickly, with 96.5% of sequences requiring at most three iterations, while improving both temporal efficiency and spatial action organization. Crucially, the method preserves human error-recovery patterns rather than collapsing them, supporting future work on robust corrective behavior. By establishing plan verification as a reliable LLM capability for spatial planning and action refinement, we provide a scalable path to higher-quality training data for imitation learning in embodied AI.

[34] arXiv:2509.02762 [pdf, html, other]
Title: Synthetic generation of online social networks through homophily
Alejandro Buitrago López, Javier Pastor-Galindo, José A. Ruipérez-Valiente
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY)

Online social networks (OSNs) have become increasingly relevant for studying social behavior and information diffusion. Nevertheless, they are limited by restricted access to real OSN data due to privacy, legal, and platform-related constraints. In response, synthetic social networks serve as a viable approach to support controlled experimentation, but current generators reproduce only topology and overlook attribute-driven homophily and semantic realism.
This work proposes a homophily-based algorithm that produces synthetic microblogging social networks such as X. The model creates a social graph for a given number of users, integrating semantic affinity among user attributes, stochastic variation in link formation, triadic closure to foster clustering, and long-range connections to ensure global reachability. A systematic grid search is used to calibrate five hyperparameters (affinity strength, noise, closure probability, distant link probability, and candidate pool size) for reaching five structural values observed in real social networks (density, clustering coefficient, LCC proportion, normalized shortest path, and modularity).
The framework is validated by generating synthetic OSNs at four scales (10^3-10^6 nodes), and benchmarking them against a real-world Bluesky network comprising 4 million users. Comparative results show that the framework reliably reproduces the structural properties of the real network. Overall, the framework outperforms leading importance-sampling techniques applied to the same baseline. The generated graphs capture topological realism and yield attribute-driven communities that align with sociological expectations, providing a realistic, scalable testbed that liberates social researchers from relying on live digital platforms.

[35] arXiv:2509.02767 [pdf, html, other]
Title: A Novel IaaS Tax Model as Leverage Towards Green Cloud Computing
Benedikt Pittl, Werner Mach, Erich Schikuta
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computers and Society (cs.CY)

The cloud computing technology uses datacenters, which require energy. Recent trends show that the required energy for these datacenters will rise over time, or at least remain constant. Hence, the scientific community developed different algorithms, architectures, and approaches for improving the energy efficiency of cloud datacenters, which are summarized under the umbrella term Green Cloud computing. In this paper, we use an economic approach - taxes - for reducing the energy consumption of datacenters. We developed a tax model called GreenCloud tax, which penalizes energy-inefficient datacenters while fostering datacenters that are energy-efficient. Hence, providers running energy-efficient datacenters are able to offer cheaper prices to consumers, which consequently leads to a shift of workloads from energy-inefficient datacenters to energy-efficient datacenters. The GreenCloud tax approach was implemented using the simulation environment CloudSim. We applied real data sets published in the SPEC benchmark for the executed simulation scenarios, which we used for evaluating the GreenCloud tax.

[36] arXiv:2509.02771 [pdf, html, other]
Title: Analysis of Speaker Verification Performance Trade-offs with Neural Audio Codec Transmission
Nirmalya Mallick Thakur, Jia Qi Yip, Eng Siong Chng
Comments: Accepted by APSIPA ASC 2025
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Neural audio codecs (NACs) have made significant advancements in recent years and are rapidly being adopted in many audio processing pipelines. However, they can introduce audio distortions which degrade speaker verification (SV) performance. This study investigates the impact of both traditional and neural audio codecs at varying bitrates on three state-of-the-art SV models evaluated on the VoxCeleb1 dataset. Our findings reveal a consistent degradation in SV performance across all models and codecs as bitrates decrease. Notably, NACs do not fundamentally break SV performance when compared to traditional codecs. They outperform Opus by 6-8% at low bitrates (< 12 kbps) and remain marginally behind at higher bitrates ($\approx$ 24 kbps), with an EER increase of only 0.4-0.7%. The disparity at higher bitrates is likely due to the primary optimization of NACs for perceptual quality, which can inadvertently discard critical speaker-discriminative features, unlike Opus which was designed to preserve vocal characteristics. Our investigation suggests that NACs are a feasible alternative to traditional codecs, especially under bandwidth limitations. To bridge the gap at higher bitrates, future work should focus on developing speaker-aware NACs or retraining and adapting SV models.

[37] arXiv:2509.02774 [pdf, other]
Title: Computational Social Science and Critical Studies of Education and Technology: An Improbable Combination?
Rebecca Eynon, Nabeel Gillani
Comments: Forthcoming in Learning, Media and Technology
Subjects: Computers and Society (cs.CY)

As belief around the potential of computational social science grows, fuelled by recent advances in machine learning, data scientists are ostensibly becoming the new experts in education. Scholars engaged in critical studies of education and technology have sought to interrogate the growing datafication of education yet tend not to use computational methods as part of this response. In this paper, we discuss the feasibility and desirability of the use of computational approaches as part of a critical research agenda. Presenting and reflecting upon two examples of projects that use computational methods in education to explore questions of equity and justice, we suggest that such approaches might help expand the capacity of critical researchers to highlight existing inequalities, make visible possible approaches for beginning to address such inequalities, and engage marginalised communities in designing and ultimately deploying these possibilities. Drawing upon work within the fields of Critical Data Studies and Science and Technology Studies, we further reflect on the two cases to discuss the possibilities and challenges of reimagining computational methods for critical research in education and technology, focusing on six areas of consideration: criticality, philosophy, inclusivity, context, classification, and responsibility.

[38] arXiv:2509.02782 [pdf, html, other]
Title: Key Principles in Cross-Domain Hyper-Heuristic Performance
Václav Sobotka, Lucas Kletzander, Nysret Musliu, Hana Rudová
Subjects: Artificial Intelligence (cs.AI)

Cross-domain selection hyper-heuristics aim to distill decades of research on problem-specific heuristic search algorithms into adaptable general-purpose search strategies. In this respect, existing selection hyper-heuristics primarily focus on an adaptive selection of low-level heuristics (LLHs) from a predefined set. In contrast, we concentrate on the composition of this set and its strategic transformations. We systematically analyze transformations based on three key principles: solution acceptance, LLH repetitions, and perturbation intensity, i.e., the proportion of a solution affected by a perturbative LLH. We demonstrate the raw effects of our transformations on a trivial unbiased random selection mechanism. With an appropriately constructed transformation, this trivial method outperforms all available state-of-the-art hyper-heuristics on three challenging real-world domains and finds 11 new best-known solutions. The same method is competitive with the winner of the CHeSC competition, commonly used as the standard cross-domain benchmark. Moreover, we accompany several recent hyper-heuristics with such strategic transformations. Using this approach, we outperform the current state-of-the-art methods on both the CHeSC benchmark and real-world domains while often simplifying their designs.

[39] arXiv:2509.02783 [pdf, html, other]
Title: The Transparent Earth: A Multimodal Foundation Model for the Earth's Subsurface
Arnab Mazumder, Javier E. Santos, Noah Hobbs, Mohamed Mehana, Daniel O'Malley
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Geophysics (physics.geo-ph)

We present the Transparent Earth, a transformer-based architecture for reconstructing subsurface properties from heterogeneous datasets that vary in sparsity, resolution, and modality, where each modality represents a distinct type of observation (e.g., stress angle, mantle temperature, tectonic plate type). The model incorporates positional encodings of observations together with modality encodings, derived from a text embedding model applied to a description of each modality. This design enables the model to scale to an arbitrary number of modalities, making it straightforward to add new ones not considered in the initial design. We currently include eight modalities spanning directional angles, categorical classes, and continuous properties such as temperature and thickness. These capabilities support in-context learning, enabling the model to generate predictions either with no inputs or with an arbitrary number of additional observations from any subset of modalities. On validation data, this reduces errors in predicting stress angle by more than a factor of three. The proposed architecture is scalable and demonstrates improved performance with increased parameters. Together, these advances make the Transparent Earth an initial foundation model for the Earth's subsurface that ultimately aims to predict any subsurface property anywhere on Earth.

[40] arXiv:2509.02785 [pdf, other]
Title: DrDiff: Dynamic Routing Diffusion with Hierarchical Attention for Breaking the Efficiency-Quality Trade-off
Jusheng Zhang, Yijia Fan, Kaitong Cai, Zimeng Huang, Xiaofei Sun, Jian Wang, Chengpei Tang, Keze Wang
Comments: Accepted at EMNLP 2025 (Main Conference)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

This paper introduces DrDiff, a novel framework for long-text generation that overcomes the efficiency-quality trade-off through three core technologies. First, we design a dynamic expert scheduling mechanism that intelligently allocates computational resources during the diffusion process based on text complexity, enabling more efficient handling of text generation tasks of varying difficulty. Second, we introduce a Hierarchical Sparse Attention (HSA) mechanism that adaptively adjusts attention patterns according to a variety of input lengths, reducing computational complexity from $O(n^2)$ to $O(n)$ while maintaining model performance. Finally, we propose a soft absorption guidance optimization strategy that combines with DPM-solver++ to reduce diffusion steps, significantly improving generation speed. Comprehensive experiments on various long-text generation benchmarks demonstrate the superiority of our DrDiff over the existing SOTA methods.

[41] arXiv:2509.02792 [pdf, html, other]
Title: Structured Basis Function Networks: Loss-Centric Multi-Hypothesis Ensembles with Controllable Diversity
Alejandro Rodriguez Dominguez, Muhammad Shahzad, Xia Hong
Comments: 32 Pages, 10 Figures, 11 Tables
Subjects: Machine Learning (cs.LG)

Existing approaches to predictive uncertainty rely either on multi-hypothesis prediction, which promotes diversity but lacks principled aggregation, or on ensemble learning, which improves accuracy but rarely captures the structured ambiguity. This implicitly means that a unified framework consistent with the loss geometry remains absent. The Structured Basis Function Network addresses this gap by linking multi-hypothesis prediction and ensembling through centroidal aggregation induced by Bregman divergences. The formulation applies across regression and classification by aligning predictions with the geometry of the loss, and supports both a closed-form least-squares estimator and a gradient-based procedure for general objectives. A tunable diversity mechanism provides parametric control of the bias-variance-diversity trade-off, connecting multi-hypothesis generalisation with loss-aware ensemble aggregation. Experiments validate this relation and use the mechanism to study the complexity-capacity-diversity trade-off across datasets of increasing difficulty with deep-learning predictors.

[42] arXiv:2509.02794 [pdf, html, other]
Title: Learning General Policies From Examples
Blai Bonet, Hector Geffner
Subjects: Artificial Intelligence (cs.AI)

Combinatorial methods for learning general policies that solve large collections of planning problems have been recently developed. One of their strengths, in relation to deep learning approaches, is that the resulting policies can be understood and shown to be correct. A weakness is that the methods do not scale up and learn only from small training instances and feature pools that contain a few hundred states and features at most. In this work, we propose a new symbolic method for learning policies based on the generalization of sampled plans that ensures structural termination and hence acyclicity. The proposed learning approach is not based on SAT/ASP, as previous symbolic methods, but on a hitting set algorithm that can effectively handle problems with millions of states, and pools with hundreds of thousands of features. The formal properties of the approach are analyzed, and its scalability is tested on a number of benchmarks.

[43] arXiv:2509.02803 [pdf, html, other]
Title: Learning Laplacian Eigenvectors: a Pre-training Method for Graph Neural Networks
Howard Dai, Nyambura Njenga, Benjamin Whitsett, Catherine Ma, Darwin Deng, Sara de Ángel, Alexandre Van Tassel, Siddharth Viswanath, Ryan Pellico, Ian Adelstein, Smita Krishnaswamy
Subjects: Machine Learning (cs.LG)

We propose a novel framework for pre-training Graph Neural Networks (GNNs) by inductively learning Laplacian eigenvectors. Traditional Message Passing Neural Networks (MPNNs) often struggle to capture global and regional graph structure due to over-smoothing risk as network depth increases. Because the low-frequency eigenvectors of the graph Laplacian matrix encode global information, pre-training GNNs to predict these eigenvectors encourages the network to naturally learn large-scale structural patterns over each graph. Empirically, we show that models pre-trained via our framework outperform baseline models on a variety of graph structure-based tasks. While most existing pre-training methods focus on domain-specific tasks like node or edge feature reconstruction, our self-supervised pre-training framework is structure-based and highly flexible. Eigenvector-learning can be applied to all graph-based datasets, and can be used with synthetic features when task-specific data is sparse.

[44] arXiv:2509.02805 [pdf, html, other]
Title: Challenges in Understanding Modality Conflict in Vision-Language Models
Trang Nguyen, Jackson Michaels, Madalina Fiterau, David Jensen
Subjects: Machine Learning (cs.LG)

This paper highlights the challenge of decomposing conflict detection from conflict resolution in Vision-Language Models (VLMs) and presents potential approaches, including using a supervised metric via linear probes and group-based attention pattern analysis. We conduct a mechanistic investigation of LLaVA-OV-7B, a state-of-the-art VLM that exhibits diverse resolution behaviors when faced with conflicting multimodal inputs. Our results show that a linearly decodable conflict signal emerges in the model's intermediate layers and that attention patterns associated with conflict detection and resolution diverge at different stages of the network. These findings support the hypothesis that detection and resolution are functionally distinct mechanisms. We discuss how such decomposition enables more actionable interpretability and targeted interventions for improving model robustness in challenging multimodal settings.

[45] arXiv:2509.02806 [pdf, other]
Title: BISCAY: Practical Radio KPI Driven Congestion Control for Mobile Networks
Jon Larrea, Tanya Shreedhar, Mahesh K. Marina
Subjects: Networking and Internet Architecture (cs.NI)

Mobile application performance relies heavily on the congestion control design of the underlying transport, which is typically bottlenecked by the cellular link and has to cope with rapid cellular link bandwidth fluctuations. We observe that radio KPI measurements from the mobile device chipset can be exploited for precise and timely measurement of available bandwidth on the cellular link. Building on this insight, we propose Biscay, a practical and radio KPI-driven congestion control system design for mobile networks. Biscay leverages OpenDiag, the in-kernel real-time radio KPI extraction tool we introduce in this paper, along with our KPI-based accurate bandwidth determination layer towards dynamically adjusting the congestion window to optimally use the available bandwidth while keeping delay to a minimum. Our solution is practical and deployable, as shown through our implementation of Biscay and OpenDiag on unrooted Android 5G phones. We extensively evaluate Biscay against different state-of-the-art congestion control designs including BBR and CUBIC with emulations driven by real measurement traces as well as real-world experiments spanning diverse 4G and 5G scenarios, and show that it provides significant average and tail delay improvements (typically over 90% reduction) while yielding better or similar throughput. These gains are enabled by 100% improvement in the granularity of on-device radio KPI measurements with OpenDiag compared to existing alternatives like MobileInsight.

[46] arXiv:2509.02807 [pdf, html, other]
Title: PixFoundation 2.0: Do Video Multi-Modal LLMs Use Motion in Visual Grounding?
Mennatullah Siam
Comments: Work under review in NeurIPS 2025 with the title "Are we using Motion in Referring Segmentation? A Motion-Centric Evaluation"
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Multi-modal large language models (MLLMs) have shown impressive generalization across tasks using images and text modalities. While their extension to video has enabled tasks such as video question answering and video captioning, their pixel-level visual grounding abilities are less studied. In this work, we raise the pertinent question of whether motion is used in pixel-level visual grounding and whether video MLLMs can segment objects based on natural language expressions describing their motion patterns. We identify the shortcomings in the current benchmarks, where we show that a single frame can often suffice for capturing the motion referring expression without any temporal reasoning. To address this, we introduce four motion-centric probing techniques, particularly designed for the visual grounding task, to study video MLLMs' ability to identify true motion from a fake one and their ability to grasp the motion order. Consequently, we provide a motion-centric benchmark, MoCentric-Bench. It ensures that video MLLMs are evaluated towards leveraging the interaction between motion and language rather than being dominated by static appearance cues emphasized in existing visual grounding datasets. We further establish strong single-image baselines that are on par with or outperform prior methods. Finally, we explore simple motion-centric adaptation techniques that provide state-of-the-art performance on our MoCentric-Bench. Our motion-centric benchmark, evaluation and findings challenge future models to improve dense spatiotemporal grounding and pixel-level understanding within videos. Code and datasets will be made publicly available at this https URL.

[47] arXiv:2509.02808 [pdf, html, other]
Title: Improving the Resilience of Quadrotors in Underground Environments by Combining Learning-based and Safety Controllers
Isaac Ronald Ward, Mark Paral, Kristopher Riordan, Mykel J. Kochenderfer
Comments: Accepted and awarded best paper at the 11th International Conference on Control, Decision and Information Technologies (CoDIT 2025 - this https URL)
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)

Autonomously controlling quadrotors in large-scale subterranean environments is applicable to many areas such as environmental surveying, mining operations, and search and rescue. Learning-based controllers represent an appealing approach to autonomy, but are known to not generalize well to 'out-of-distribution' environments not encountered during training. In this work, we train a normalizing flow-based prior over the environment, which provides a measure of how far out-of-distribution the quadrotor is at any given time. We use this measure as a runtime monitor, allowing us to switch between a learning-based controller and a safe controller when we are sufficiently out-of-distribution. Our methods are benchmarked on a point-to-point navigation task in a simulated 3D cave environment based on real-world point cloud data from the DARPA Subterranean Challenge Final Event Dataset. Our experimental results show that our combined controller simultaneously possesses the liveness of the learning-based controller (completing the task quickly) and the safety of the safety controller (avoiding collision).

[48] arXiv:2509.02809 [pdf, other]
Title: Predicting Movie Success with Multi-Task Learning: A Hybrid Framework Combining GPT-Based Sentiment Analysis and SIR Propagation
Wenlan Xie
Subjects: Social and Information Networks (cs.SI)

This study presents a hybrid framework for predicting movie success. The framework integrates multi-task learning (MTL), GPT-based sentiment analysis, and Susceptible-Infected-Recovered (SIR) propagation modeling. The study examines limitations in existing approaches. It models static production attributes, information dissemination, and audience sentiment at the same time. The framework uses 5,840 films from 2004 to 2024 and approximately 300,000 user reviews. It shows predictive performance with classification accuracy of 0.964 and regression metrics of MAE 0.388. Ablation analysis indicates component interactions. Selective feature combinations perform better than the comprehensive model. This result questions assumptions about feature integration. The model shows virality patterns between successful and unsuccessful films. Innovations include epidemiological modeling for information diffusion, multidimensional sentiment features from GPT-based analysis, and a shared representation architecture that optimizes multiple success metrics. The framework provides applications in the film production lifecycle. It also contributes to understanding how audience engagement leads to commercial outcomes.
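
For readers unfamiliar with SIR propagation, the following is a minimal sketch of a discrete-time SIR simulation. It is not the paper's model: the function name `simulate_sir`, the forward-Euler update, and the parameter values are assumptions chosen for illustration, and the abstract does not say how SIR outputs are turned into features.

```python
def simulate_sir(beta, gamma, s0, i0, r0, steps, dt=1.0):
    """Forward-Euler simulation of the classic SIR compartment model.

    beta  : transmission rate (hypothetical value supplied by the caller)
    gamma : recovery rate (hypothetical value supplied by the caller)
    Returns a list of (S, I, R) tuples, one per time step, including t=0.
    """
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r  # total population, conserved by the update below
    history = [(s, i, r)]
    for _ in range(steps):
        new_infected = beta * s * i / n * dt   # S -> I flow
        new_recovered = gamma * i * dt         # I -> R flow
        s = s - new_infected
        i = i + new_infected - new_recovered
        r = r + new_recovered
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Illustrative values only: 990 "susceptible" viewers, 10 "spreaders".
    trajectory = simulate_sir(beta=0.3, gamma=0.1, s0=990, i0=10, r0=0, steps=50)
    peak_step = max(range(len(trajectory)), key=lambda t: trajectory[t][1])
    print("step with most active spreaders:", peak_step)
```

In an information-diffusion reading, S, I, and R could stand for people who have not yet heard of a film, people actively talking about it, and people who have stopped; that mapping is an interpretation, not something the abstract specifies.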

[49] arXiv:2509.02811 [pdf, html, other]
Title: Performance Evaluation of LoRa for IoT Applications in Non-Terrestrial Networks via ns-3
Alessandro Traspadini, Michele Zorzi, Marco Giordani
Comments: 6 pages, 4 figures, 2 tables. Accepted for publication in the 2025 IEEE Global Communications Conference (GLOBECOM) ©2025 IEEE. A. Traspadini, M. Zorzi, and M. Giordani "Performance Evaluation of LoRa for IoT Applications in Non-Terrestrial Networks via ns-3," in Proc. IEEE Global Communications Conference (GLOBECOM), 2025
Subjects: Networking and Internet Architecture (cs.NI)

The integration of Internet of Things (IoT) and Non-Terrestrial Networks (NTNs) has emerged as a key paradigm to provide connectivity for sensors and actuators via satellite gateways in remote areas where terrestrial infrastructure is limited or unavailable. Among other Low-Power Wide-Area Network (LPWAN) technologies for IoT, Long Range (LoRa) holds great potential given its long range, energy efficiency, and flexibility. In this paper, we explore the feasibility and performance of LoRa to support large-scale IoT connectivity through Low Earth Orbit (LEO) satellite gateways. To do so, we developed a new ns3-LoRa-NTN simulation module, which integrates and extends the ns3-LoRa and ns3-NTN modules, to enable full-stack end-to-end simulation of satellite communication in LoRa networks. Our results, given in terms of average data rate and Packet Reception Ratio (PRR), confirm that LoRa can effectively support direct communication from the ground to LEO satellites, but network optimization is required to mitigate collision probability when end nodes use the same Spreading Factors (SFs) over long distances.

[50] arXiv:2509.02812 [pdf, html, other]
Title: Rollout-Based Approximate Dynamic Programming for MDPs with Information-Theoretic Constraints
Zixuan He, Charalambos D. Charalambous, Photios A. Stavrou
Subjects: Systems and Control (eess.SY); Information Theory (cs.IT)

This paper studies a finite-horizon Markov decision problem with information-theoretic constraints, where the goal is to minimize directed information from the controlled source process to the control process, subject to stage-wise cost constraints, aiming for an optimal control policy. We propose a new way of approximating a solution for this problem, which is known to be formulated as an unconstrained MDP with a continuous information-state using Q-factors. To avoid the computational complexity of discretizing the continuous information-state space, we propose a truncated rollout-based backward-forward approximate dynamic programming (ADP) framework. Our approach consists of two phases: an offline base policy approximation over a shorter time horizon, followed by an online rollout lookahead minimization, both supported by provable convergence guarantees. We supplement our theoretical results with a numerical example where we demonstrate the cost improvement of the rollout method compared to a previously proposed policy approximation method, and the computational complexity observed in executing the offline and online phases for the two methods.

[51] arXiv:2509.02815 [pdf, html, other]
Title: Multi-Embodiment Locomotion at Scale with extreme Embodiment Randomization
Nico Bohlinger, Jan Peters
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)

We present a single, general locomotion policy trained on a diverse collection of 50 legged robots. By combining an improved embodiment-aware architecture (URMAv2) with a performance-based curriculum for extreme Embodiment Randomization, our policy learns to control millions of morphological variations. Our policy achieves zero-shot transfer to unseen real-world humanoid and quadruped robots.

[52] arXiv:2509.02820 [pdf, html, other]
Title: Unlearning That Lasts: Utility-Preserving, Robust, and Almost Irreversible Forgetting in LLMs
Naman Deep Singh, Maximilian Müller, Francesco Croce, Matthias Hein
Subjects: Machine Learning (cs.LG)

Unlearning in large language models (LLMs) involves precisely removing specific information from a pre-trained model. This is crucial to ensure safety of LLMs by deleting private data or harmful knowledge acquired during pre-training. However, existing unlearning methods often fall short when subjected to thorough evaluation. To overcome this, we introduce JensUn, where we leverage the Jensen-Shannon Divergence as the training objective for both forget and retain sets for more stable and effective unlearning dynamics compared to commonly used loss functions. In extensive experiments, JensUn achieves better forget-utility trade-off than competing methods, and even demonstrates strong resilience to benign relearning. Additionally, for a precise unlearning evaluation, we introduce LKF, a curated dataset of lesser-known facts that provides a realistic unlearning scenario. Finally, to comprehensively test unlearning methods, we propose (i) employing an LLM as semantic judge instead of the standard ROUGE score, and (ii) using worst-case unlearning evaluation over various paraphrases and input formats. Our improved evaluation framework reveals that many existing methods are less effective than previously thought.
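
As background for the training objective named above, here is a minimal sketch of the Jensen-Shannon divergence between two discrete distributions. It only illustrates the divergence itself; how JensUn applies it to forget and retain sets is not described in the abstract, and the helper names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    # Toy next-token distributions over a 3-word vocabulary (made-up numbers).
    model_probs = [0.7, 0.2, 0.1]
    target_probs = [0.1, 0.2, 0.7]
    print(js_divergence(model_probs, target_probs))
```

Unlike the KL divergence, this quantity is symmetric and bounded above by ln 2 (in nats), which is one commonly cited reason it can behave more stably as a loss.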

[53] arXiv:2509.02822 [pdf, html, other]
Title: Hybrid dynamical systems modeling of power systems
B.G. Odunlami, M. Netto, Y. Susuki
Subjects: Systems and Control (eess.SY)

The increasing integration of renewable energy sources has introduced complex dynamic behavior in power systems that challenges the adequacy of traditional continuous-time modeling approaches. These developments call for modeling frameworks that can capture the intricate interplay between continuous dynamics and discrete events characterizing modern grid operations. Hybrid dynamical systems offer a rigorous foundation for representing such mixed dynamics and have emerged as a valuable tool in power system analysis. Despite their potential, existing studies remain focused on isolated applications or case-specific implementations, offering limited generalizability and guidance for model selection. This paper addresses that gap by providing a comprehensive overview of hybrid modeling approaches relevant to power systems. It critically examines key formalisms, including hybrid automata, switched systems, and piecewise affine models, evaluating their respective strengths, limitations, and suitability across control, stability, and system design tasks. In doing so, the paper identifies open challenges and outlines future research directions to support the systematic application of hybrid methods in renewable-rich, converter-dominated power systems.

[54] arXiv:2509.02824 [pdf, html, other]
Title: GPS Spoofing Attacks on Automated Frequency Coordination System in Wi-Fi 6E and Beyond
Yilu Dong (1), Tianchang Yang (1), Arupjyoti Bhuyan (2), Syed Rafiul Hussain (1) ((1) The Pennsylvania State University, (2) Idaho National Laboratory)
Comments: 6 pages, 4 figures, to be published in European Wireless 2025
Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR)

The 6 GHz spectrum, recently opened for unlicensed use under Wi-Fi 6E and Wi-Fi 7, overlaps with frequencies used by mission-critical incumbent systems such as public safety communications and utility infrastructure. To prevent interference, the FCC mandates the use of Automated Frequency Coordination (AFC) systems, which assign safe frequency and power levels based on Wi-Fi Access Point (AP)-reported locations. In this work, we demonstrate that GPS-based location reporting, which Wi-Fi APs use, can be spoofed using inexpensive, off-the-shelf radio equipment. This enables attackers to manipulate AP behavior, gain unauthorized spectrum access, cause harmful interference, or disable APs entirely by spoofing them into foreign locations. We validate these attacks in a controlled lab setting against a commercial AP and evaluate a commercial AFC system under spoofed scenarios. Our findings highlight critical gaps in the security assumptions of AFC and motivate the need for stronger location integrity protections.

[55] arXiv:2509.02826 [pdf, html, other]
Title: Ensemble Learning for Healthcare: A Comparative Analysis of Hybrid Voting and Ensemble Stacking in Obesity Risk Prediction
Towhidul Islam, Md Sumon Ali
Comments: 26 pages, 3 figures, 16 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Applications (stat.AP); Computation (stat.CO)

Obesity is a critical global health issue driven by dietary, physiological, and environmental factors, and is strongly associated with chronic diseases such as diabetes, cardiovascular disorders, and cancer. Machine learning has emerged as a promising approach for early obesity risk prediction, yet a comparative evaluation of ensemble techniques -- particularly hybrid majority voting and ensemble stacking -- remains limited. This study aims to compare hybrid majority voting and ensemble stacking methods for obesity risk prediction, identifying which approach delivers higher accuracy and efficiency. The analysis seeks to highlight the complementary strengths of these ensemble techniques in guiding better predictive model selection for healthcare applications. Two datasets were utilized to evaluate three ensemble models: Majority Hard Voting, Weighted Hard Voting, and Stacking (with a Multi-Layer Perceptron as meta-classifier). A pool of nine Machine Learning (ML) algorithms, evaluated across a total of 50 hyperparameter configurations, was analyzed to identify the top three models to serve as base learners for the ensemble methods. Preprocessing steps involved dataset balancing and outlier detection, and model performance was evaluated using Accuracy and F1-Score. On Dataset-1, weighted hard voting and stacking achieved nearly identical performance (Accuracy: 0.920304, F1: 0.920070), outperforming majority hard voting. On Dataset-2, stacking demonstrated superior results (Accuracy: 0.989837, F1: 0.989825) compared to majority hard voting (Accuracy: 0.981707, F1: 0.981675) and weighted hard voting, which showed the lowest performance. The findings confirm that ensemble stacking provides stronger predictive capability, particularly for complex data distributions, while hybrid majority voting remains a robust alternative.
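
The difference between the two voting schemes compared above can be sketched in a few lines. This is an illustration under assumptions, not the authors' code: the label strings and weights are invented, and stacking (which trains a meta-classifier on base-learner outputs) is deliberately left out.

```python
from collections import Counter, defaultdict


def majority_hard_vote(predictions):
    """Return the label predicted by the most base learners.

    Ties are broken in favor of the label that appears first in `predictions`.
    """
    return Counter(predictions).most_common(1)[0][0]


def weighted_hard_vote(predictions, weights):
    """Return the label whose supporting base learners have the largest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Three hypothetical base learners voting on one sample.
    votes = ["obese", "normal", "normal"]
    print(majority_hard_vote(votes))                    # two of three learners agree
    print(weighted_hard_vote(votes, [0.6, 0.2, 0.3]))   # the first learner is trusted most
```

With these made-up weights the two schemes disagree on the same votes, which is the kind of case where the choice of ensemble rule changes the reported accuracy.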

[56] arXiv:2509.02828 [pdf, html, other]
Title: Store Languages of Turing Machines and Counter Machines
Noah Friesen, Oscar H. Ibarra, Jozef Jirásek, Ian McQuillan
Comments: 22 pages, 1 figure
Subjects: Formal Languages and Automata Theory (cs.FL)

The store language of an automaton is the set of store configurations (state and store contents, but not the input) that can appear as an intermediate step in an accepting computation. A one-way nondeterministic finite-visit Turing machine (fvNTM) is a Turing machine with a one-way read-only input tape, and a single worktape, where there is some number $k$ such that in every accepting computation, each worktape cell is visited at most $k$ times. We show that the store language of every fvNTM is a regular language. Furthermore, we show that the store language of every fvNTM augmented by reversal-bounded counters can be accepted by a machine with only reversal-bounded counters and no worktape. Several applications are given to problems in the areas of verification and fault tolerance, and to the study of right quotients. We also continue the investigation of the store languages of one-way and two-way machine models where we present some conditions under which their store languages are recursive or non-recursive.

[57] arXiv:2509.02830 [pdf, html, other]
Title: SSVD: Structured SVD for Parameter-Efficient Fine-Tuning and Benchmarking under Domain Shift in ASR
Pu Wang, Shinji Watanabe, Hugo Van hamme
Comments: Accepted by IEEE ASRU 2025
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

Parameter-efficient fine-tuning (PEFT) has emerged as a scalable solution for adapting large foundation models. While low-rank adaptation (LoRA) is widely used in speech applications, its state-of-the-art variants, e.g., VeRA, DoRA, PiSSA, and SVFT, are developed mainly for language and vision tasks, with limited validation in speech. This work presents the first comprehensive integration and benchmarking of these PEFT methods within ESPnet. We further introduce structured SVD-guided (SSVD) fine-tuning, which selectively rotates input-associated right singular vectors while keeping output-associated vectors fixed to preserve semantic mappings. This design enables robust domain adaptation with minimal trainable parameters and improved efficiency. We evaluate all methods on domain-shifted speech recognition tasks, including child speech and dialectal variation, across model scales from 0.1B to 2B. All implementations are released in ESPnet to support reproducibility and future work.

[58] arXiv:2509.02834 [pdf, html, other]
Title: Clustering Discourses: Racial Biases in Short Stories about Women Generated by Large Language Models
Gustavo Bonil, João Gondim, Marina dos Santos, Simone Hashiguti, Helena Maia, Nadia Silva, Helio Pedrini, Sandra Avila
Comments: 12 pages, 3 figures. Accepted at STIL @ BRACIS 2025
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

This study investigates how large language models, in particular LLaMA 3.2-3B, construct narratives about Black and white women in short stories generated in Portuguese. From 2100 texts, we applied computational methods to group semantically similar stories, allowing a selection for qualitative analysis. Three main discursive representations emerge: social overcoming, ancestral mythification and subjective self-realization. The analysis uncovers how grammatically coherent, seemingly neutral texts materialize a crystallized, colonially structured framing of the female body, reinforcing historical inequalities. The study proposes an integrated approach that combines machine learning techniques with qualitative, manual discourse analysis.

[59] arXiv:2509.02837 [pdf, html, other]
Title: HF-RAG: Hierarchical Fusion-based RAG with Multiple Sources and Rankers
Payel Santra, Madhusudan Ghosh, Debasis Ganguly, Partha Basuchowdhuri, Sudip Kumar Naskar
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Leveraging both labeled (input-output associations) and unlabeled data (wider contextual grounding) may provide complementary benefits in retrieval augmented generation (RAG). However, effectively combining evidence from these heterogeneous sources is challenging as the respective similarity scores are not inter-comparable. Additionally, aggregating beliefs from the outputs of multiple rankers can improve the effectiveness of RAG. Our proposed method first aggregates the top documents from a number of IR models using a standard rank fusion technique for each source (labeled and unlabeled). Next, we standardize the retrieval score distributions within each source by applying z-score transformation before merging the top-retrieved documents from the two sources. We evaluate our approach on the fact verification task, demonstrating that it consistently improves over the best-performing individual ranker or source and also shows better out-of-domain generalization.
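
The two-stage procedure described above (fuse rankers within each source, then z-score and merge across sources) can be sketched as follows. The abstract only says "a standard rank fusion technique", so the use of reciprocal rank fusion here is an assumption, as are the function names, the constant `k=60`, and the document ids.

```python
from statistics import mean, pstdev


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one score per doc.

    RRF is assumed here; the abstract does not name the fusion method.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return scores


def z_normalize(scores):
    """Standardize scores within one source so sources become comparable."""
    mu = mean(scores.values())
    sigma = pstdev(scores.values())
    if sigma == 0:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - mu) / sigma for doc, s in scores.items()}


def hierarchical_fusion(labeled_rankings, unlabeled_rankings, top_n=3):
    """Stage 1: fuse rankers per source. Stage 2: z-score, then merge sources."""
    labeled = z_normalize(reciprocal_rank_fusion(labeled_rankings))
    unlabeled = z_normalize(reciprocal_rank_fusion(unlabeled_rankings))
    merged = [("labeled", d, s) for d, s in labeled.items()]
    merged += [("unlabeled", d, s) for d, s in unlabeled.items()]
    merged.sort(key=lambda item: item[2], reverse=True)
    return merged[:top_n]


if __name__ == "__main__":
    # Two hypothetical rankers per source, each returning three doc ids.
    labeled = [["L1", "L2", "L3"], ["L2", "L1", "L3"]]
    unlabeled = [["U1", "U2", "U3"], ["U1", "U3", "U2"]]
    for source, doc, score in hierarchical_fusion(labeled, unlabeled):
        print(source, doc, round(score, 3))
```

Standardizing within each source before merging is what lets a document from the unlabeled collection compete with one from the labeled collection even though their raw scores come from different distributions.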

[60] arXiv:2509.02839 [pdf, other]
Title: An overview of Koopman-based control: From error bounds to closed-loop guarantees
Robin Strässer, Karl Worthmann, Igor Mezić, Julian Berberich, Manuel Schaller, Frank Allgöwer
Subjects: Systems and Control (eess.SY)

Controlling nonlinear dynamical systems remains a central challenge in a wide range of applications, particularly when accurate first-principle models are unavailable. Data-driven approaches offer a promising alternative by designing controllers directly from observed trajectories. A wide range of data-driven methods relies on the Koopman-operator framework that enables linear representations of nonlinear dynamics via lifting into higher-dimensional observable spaces. Finite-dimensional approximations, such as extended dynamic mode decomposition (EDMD) and its controlled variants, make prediction and feedback control tractable but introduce approximation errors that must be accounted for to provide rigorous closed-loop guarantees. This survey provides a systematic overview of Koopman-based control, emphasizing the connection between data-driven surrogate models generated from finite data, approximation errors, controller design, and closed-loop guarantees. We review theoretical foundations, error bounds, and both linear and bilinear EDMD-based control schemes, highlighting robust strategies that ensure stability and performance. Finally, we discuss open challenges and future directions at the interface of operator theory, approximation theory, and nonlinear control.

[61] arXiv:2509.02840 [pdf, other]
Title: Fast and Accurate SVD-Type Updating in Streaming Data
Johannes J. Brust, Michael A. Saunders
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Mathematical Software (cs.MS)

For a datastream, the change over a short interval is often of low rank. For high throughput information arranged in matrix format, recomputing an optimal SVD approximation after each step is typically prohibitive. Instead, incremental and truncated updating strategies are used, which may not scale for large truncation ranks. Therefore, we propose a set of efficient new algorithms that update a bidiagonal factorization and are about as accurate as the SVD methods. In particular, we develop a compact Householder-type algorithm that decouples a sparse part from a low-rank update and has about half the memory requirements of standard bidiagonalization methods. A second algorithm based on Givens rotations has only about 10 flops per rotation and scales quadratically with the problem size, compared to a typical cubic scaling. The algorithm is therefore effective for processing high-throughput updates, as we demonstrate in tracking large subspaces of recommendation systems and networks, and when compared to well known software such as LAPACK or the incremental SVD.
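
As a reminder of the building block behind the second algorithm, here is a minimal sketch of a single Givens rotation that zeroes one entry of a 2-vector. It is the textbook construction only; the paper's update scheme and its flop count are not reproduced, and the function names are hypothetical.

```python
import math


def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0  # nothing to rotate; use the identity
    return a / r, b / r


def apply_givens(c, s, x, y):
    """Apply the rotation [[c, s], [-s, c]] to the pair (x, y)."""
    return c * x + s * y, -s * x + c * y


if __name__ == "__main__":
    c, s = givens(3.0, 4.0)
    print(apply_givens(c, s, 3.0, 4.0))  # second component is (numerically) zero
```

Applying such rotations to pairs of rows or columns is the standard way to chase unwanted nonzeros out of a matrix while preserving orthogonal equivalence.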

[62] arXiv:2509.02844 [pdf, html, other]
Title: Conformal Prediction for Time-series Forecasting with Change Points
Sophia Sun, Rose Yu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Conformal prediction has been explored as a general and efficient way to provide uncertainty quantification for time series. However, current methods struggle to handle time series data with change points - sudden shifts in the underlying data-generating process. In this paper, we propose a novel Conformal Prediction for Time-series with Change points (CPTC) algorithm, addressing this gap by integrating a model to predict the underlying state with online conformal prediction to model uncertainties in non-stationary time series. We prove CPTC's validity and improved adaptivity in the time series setting under minimum assumptions, and demonstrate CPTC's practical effectiveness on 6 synthetic and real-world datasets, showing improved validity and adaptivity compared to state-of-the-art baselines.
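
For orientation, the following sketch shows standard split conformal prediction for a single point forecast, the basic procedure that methods like the one above build on. It is not the CPTC algorithm (it has no state model and no online update), and the residual values and function name are made up for illustration.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected quantile of calibration scores for miscoverage alpha."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the score to use
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]

# Absolute residuals |y - prediction| on a held-out calibration set (illustrative values).
residuals = [0.1, 0.4, 0.2, 0.9, 0.3, 0.5, 0.7, 0.6, 0.8]
q = conformal_quantile(residuals, alpha=0.2)

# Prediction interval around a new point forecast of 2.0.
forecast = 2.0
interval = (forecast - q, forecast + q)
print(q, interval)
```

The coverage guarantee of this basic procedure relies on exchangeability between calibration and test data, which is exactly the assumption that change points violate.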

[63] arXiv:2509.02846 [pdf, html, other]
Title: Towards Reasoning for PDE Foundation Models: A Reward-Model-Driven Inference-Time-Scaling Algorithm
Siddharth Mansingh, James Amarel, Ragib Arnab, Arvind Mohan, Kamaljeet Singh, Gerd J. Kunde, Nicolas Hengartner, Benjamin Migliori, Emily Casleton, Nathan A. Debarledeben, Ayan Biswas, Diane Oyen, Earl Lawrence
Subjects: Machine Learning (cs.LG); Computational Physics (physics.comp-ph)

Partial Differential Equations (PDEs) are the bedrock for modern computational sciences and engineering, and inherently computationally expensive. While PDE foundation models have shown much promise for simulating such complex spatio-temporal phenomena, existing models remain constrained by the pretraining datasets and struggle with auto-regressive rollout performance, especially in out-of-distribution (OOD) cases. Furthermore, they have significant compute and training data requirements which hamper their use in many critical applications. Inspired by recent advances in "thinking" strategies used in large language models (LLMs), we introduce the first test-time computing (TTC) strategy for PDEs that utilizes computational resources during inference to achieve more accurate predictions with fewer training samples and smaller models. We accomplish this with two types of reward models that evaluate predictions of a stochastic based model for spatio-temporal consistency. We demonstrate this method on compressible Euler-equation simulations from the PDEGym benchmark and show that TTC captures improved predictions relative to standard non-adaptive auto-regressive inference. This TTC framework marks a foundational step towards more advanced reasoning algorithms for PDE modeling, including building reinforcement-learning-based approaches, potentially transforming computational workflows in physics and engineering.

[64] arXiv:2509.02851 [pdf, other]
Title: Multi-Scale Deep Learning for Colon Histopathology: A Hybrid Graph-Transformer Approach
Sadra Saremi, Amirhossein Ahmadkhan Kordbacheh
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Colon cancer, also known as colorectal cancer, is one of the most malignant types of cancer worldwide. Early-stage detection of colon cancer is highly crucial to prevent its deterioration. This research presents a hybrid multi-scale deep learning architecture that synergizes capsule networks, graph attention mechanisms, transformer modules, and residual learning to advance colon cancer classification on the Lung and Colon Cancer Histopathological Image Dataset (LC25000). The proposed HG-TNet model introduces a hybrid architecture that combines the strengths of transformers and convolutional neural networks to capture multi-scale features in histopathological images. Mainly, a transformer branch extracts global contextual bonds by partitioning the image into patches by convolution-based patch embedding and then processing these patches through a transformer encoder. Analogously, a dedicated CNN branch captures fine-grained, local details through successive Incorporation these diverse features, combined with a self-supervised rotation prediction objective, produce a robust diagnostic representation that surpasses standard architectures in performance. Results show better performance not only in accuracy or loss function but also in these algorithms by utilizing capsule networks to preserve spatial orders and realize how each element individually combines and forms whole structures.

[65] arXiv:2509.02853 [pdf, other]
Title: The Architecture of AI Transformation: Four Strategic Patterns and an Emerging Frontier
Diana A. Wolfe, Alice Choe, Fergus Kidd
Comments: 59 pages, 2 tables, 4 figures
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

Despite extensive investment in artificial intelligence, 95% of enterprises report no measurable profit impact from AI deployments (MIT, 2025). We argue that this gap reflects paradigmatic lock-in that channels AI into incremental optimization rather than structural transformation. Using a cross-case analysis, we propose a 2x2 framework that reconceptualizes AI strategy along two independent dimensions: the degree of transformation achieved (incremental to transformational) and the treatment of human contribution (reduced to amplified). The framework surfaces four patterns now dominant in practice: individual augmentation, process automation, workforce substitution, and a less deployed frontier of collaborative intelligence. Evidence shows that the first three reinforce legacy work models and yield localized gains without durable value capture. Realizing collaborative intelligence requires three mechanisms: complementarity (pairing distinct human and machine strengths), co-evolution (mutual adaptation through interaction), and boundary-setting (human determination of ethical and strategic parameters). Complementarity and boundary-setting are observable in regulated and high-stakes domains; co-evolution is largely absent, which helps explain limited system-level impact. A case study analysis illustrates that advancing toward collaborative intelligence requires material restructuring of roles, governance, and data architecture rather than additional tools. The framework reframes AI transformation as an organizational design challenge: moving from optimizing the division of labor between humans and machines to architecting their convergence, with implications for operating models, workforce development, and the future of work.

[66] arXiv:2509.02855 [pdf, html, other]
Title: IDEAlign: Comparing Large Language Models to Human Experts in Open-ended Interpretive Annotations
Hyunji Nam, Lucia Langlois, James Malamut, Mei Tan, Dorottya Demszky
Comments: 10 pages, 9 pages for appendix
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)

Large language models (LLMs) are increasingly applied to open-ended, interpretive annotation tasks, such as thematic analysis by researchers or generating feedback on student work by teachers. These tasks involve free-text annotations requiring expert-level judgments grounded in specific objectives (e.g., research questions or instructional goals). Evaluating whether LLM-generated annotations align with those generated by expert humans is challenging to do at scale, and currently, no validated, scalable measure of similarity in ideas exists. In this paper, we (i) introduce the scalable evaluation of interpretive annotation by LLMs as a critical and understudied task, (ii) propose IDEAlign, an intuitive benchmarking paradigm for capturing expert similarity ratings via a "pick-the-odd-one-out" triplet judgment task, and (iii) evaluate various similarity metrics, including vector-based ones (topic models, embeddings) and LLM-as-a-judge via IDEAlign, against these human benchmarks. Applying this approach to two real-world educational datasets (interpretive analysis and feedback generation), we find that vector-based metrics largely fail to capture the nuanced dimensions of similarity meaningful to experts. Prompting LLMs via IDEAlign significantly improves alignment with expert judgments (9-30% increase) compared to traditional lexical and vector-based metrics. These results establish IDEAlign as a promising paradigm for evaluating LLMs against open-ended expert annotations at scale, informing responsible deployment of LLMs in education and beyond.

[67] arXiv:2509.02856 [pdf, html, other]
Title: Managing Correlations in Data and Privacy Demand
Syomantak Chaudhuri, Thomas A. Courtade
Comments: To appear at ACM CCS, 2025
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Previous works in the differential privacy literature that allow users to choose their privacy levels typically operate under the heterogeneous differential privacy (HDP) framework with the simplifying assumption that user data and privacy levels are not correlated. Firstly, we demonstrate that the standard HDP framework falls short when user data and privacy demands are allowed to be correlated. Secondly, to address this shortcoming, we propose an alternate framework, Add-remove Heterogeneous Differential Privacy (AHDP), that jointly accounts for user data and privacy preference. We show that AHDP is robust to possible correlations between data and privacy. Thirdly, we formalize the guarantees of the proposed AHDP framework through an operational hypothesis testing perspective. The hypothesis testing setup may be of independent interest in analyzing other privacy frameworks as well. Fourthly, we show that there exist non-trivial AHDP mechanisms that notably do not require prior knowledge of the data-privacy correlations. We propose some such mechanisms and apply them to core statistical tasks such as mean estimation, frequency estimation, and linear regression. The proposed mechanisms are simple to implement with minimal assumptions and modeling requirements, making them attractive for real-world use. Finally, we empirically evaluate proposed AHDP mechanisms, highlighting their trade-offs using LLM-generated synthetic datasets, which we release for future research.

[68] arXiv:2509.02859 [pdf, html, other]
Title: Speech DF Arena: A Leaderboard for Speech DeepFake Detection Models
Sandipana Dowerah, Atharva Kulkarni, Ajinkya Kulkarni, Hoan My Tran, Joonas Kalda, Artem Fedorchenko, Benoit Fauve, Damien Lolive, Tanel Alumäe, Matthew Magimai Doss
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

Parallel to the development of advanced deepfake audio generation, audio deepfake detection has also seen significant progress. However, a standardized and comprehensive benchmark is still missing. To address this, we introduce Speech DeepFake (DF) Arena, the first comprehensive benchmark for audio deepfake detection. Speech DF Arena provides a toolkit to uniformly evaluate detection systems, currently across 14 diverse datasets and attack scenarios, standardized evaluation metrics and protocols for reproducibility and transparency. It also includes a leaderboard to compare and rank the systems to help researchers and developers enhance their reliability and robustness. We include 14 evaluation sets, 12 state-of-the-art open-source and 3 proprietary detection systems. Our study presents many systems exhibiting high EER in out-of-domain scenarios, highlighting the need for extensive cross-domain evaluation. The leaderboard is hosted on Hugging Face and a toolkit for reproducing results across the listed datasets is available on GitHub.

[69] arXiv:2509.02860 [pdf, html, other]
Title: Vision: An Extensible Methodology for Formal Software Verification in Microservice Systems
Connor Wojtak, Darek Gajewski, Tomas Cerny
Comments: Accepted at MODELS 2025
Subjects: Software Engineering (cs.SE); Logic in Computer Science (cs.LO)

Microservice systems are becoming increasingly adopted due to their scalability, decentralized development, and support for continuous integration and delivery (CI/CD). However, this decentralized development by separate teams and continuous evolution can introduce miscommunication and incompatible implementations, undermining system maintainability and reliability across aspects from security policy to system architecture. We propose a novel methodology that statically reconstructs microservice source code into a formal system model. From this model, a Satisfiability Modulo Theories (SMT) constraint set can be derived, enabling formal verification. Our methodology is extensible, supporting software verification across multiple cross-cutting concerns. We focus on applying the methodology to verify the system architecture concern, presenting formal reasoning to validate the methodology's correctness and applicability for this concern. Additional concerns such as security policy implementation are considered. Future directions are established to extend and evaluate the methodology.

[70] arXiv:2509.02861 [pdf, html, other]
Title: Power Grid Control with Graph-Based Distributed Reinforcement Learning
Carlo Fabrizio, Gianvito Losapio, Marco Mussi, Alberto Maria Metelli, Marcello Restelli
Subjects: Machine Learning (cs.LG)

The necessary integration of renewable energy sources, combined with the expanding scale of power networks, presents significant challenges in controlling modern power grids. Traditional control systems, which are human and optimization-based, struggle to adapt and to scale in such an evolving context, motivating the exploration of more dynamic and distributed control strategies. This work advances a graph-based distributed reinforcement learning framework for real-time, scalable grid management. The proposed architecture consists of a network of distributed low-level agents acting on individual power lines and coordinated by a high-level manager agent. A Graph Neural Network (GNN) is employed to encode the network's topological information within the single low-level agent's observation. To accelerate convergence and enhance learning stability, the framework integrates imitation learning and potential-based reward shaping. In contrast to conventional decentralized approaches that decompose only the action space while relying on global observations, this method also decomposes the observation space. Each low-level agent acts based on a structured and informative local view of the environment constructed through the GNN. Experiments on the Grid2Op simulation environment show the effectiveness of the approach, which consistently outperforms the standard baseline commonly adopted in the field. Additionally, the proposed model proves to be much more computationally efficient than the simulation-based Expert method.
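
The potential-based reward shaping mentioned above can be sketched as follows. This uses the standard form, in which the term gamma * Phi(s') - Phi(s) is added to the environment reward; the potential values below are arbitrary placeholders, not the potential function used in the paper.

```python
def shaping_term(phi_s, phi_next, gamma):
    """Standard potential-based shaping term: gamma * Phi(s') - Phi(s)."""
    return gamma * phi_next - phi_s

def discounted_return(rewards, gamma):
    """Plain discounted return of a reward sequence."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

def shaped_return(rewards, potentials, gamma):
    """Discounted return when every step reward is augmented with the shaping term."""
    shaped = [
        r + shaping_term(potentials[t], potentials[t + 1], gamma)
        for t, r in enumerate(rewards)
    ]
    return discounted_return(shaped, gamma)

# Three-step trajectory; potentials[t] is Phi(s_t) for the visited states (placeholder values).
rewards = [1.0, 0.0, 2.0]
potentials = [0.5, 0.2, 0.9, 0.0]
gamma = 0.9

plain = discounted_return(rewards, gamma)
shaped = shaped_return(rewards, potentials, gamma)
# The shaping terms telescope, so the gap depends only on the first and last states.
gap = shaped - plain
print(plain, shaped, gap)
```

Because the gap depends only on the start and end states, shaping of this form changes the learning signal along the way without changing which policies are optimal.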

[71] arXiv:2509.02863 [pdf, other]
Title: Enhancing Machine Learning for Imbalanced Medical Data: A Quantum-Inspired Approach to Synthetic Oversampling (QI-SMOTE)
Vikas Kashtriya, Pardeep Singh
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Class imbalance remains a critical challenge in machine learning (ML), particularly in the medical domain, where underrepresented minority classes lead to biased models and reduced predictive performance. This study introduces Quantum-Inspired SMOTE (QI-SMOTE), a novel data augmentation technique that enhances the performance of ML classifiers, including Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), k-Nearest Neighbors (KNN), Gradient Boosting (GB), and Neural Networks, by leveraging quantum principles such as quantum evolution and layered entanglement. Unlike conventional oversampling methods, QI-SMOTE generates synthetic instances that preserve complex data structures, improving model generalization and classification accuracy. We validate QI-SMOTE on the MIMIC-III and MIMIC-IV datasets, using mortality detection as a benchmark task due to their clinical significance and inherent class imbalance. We compare our method against traditional oversampling techniques, including Borderline-SMOTE, ADASYN, SMOTE-ENN, SMOTE-TOMEK, and SVM-SMOTE, using key performance metrics such as Accuracy, F1-score, G-Mean, and AUC-ROC. The results demonstrate that QI-SMOTE significantly improves the effectiveness of ensemble methods (RF, GB, ADA), kernel-based models (SVM), and deep learning approaches by producing more informative and balanced training data. By integrating quantum-inspired transformations into the ML pipeline, QI-SMOTE not only mitigates class imbalance but also enhances the robustness and reliability of predictive models in medical diagnostics and decision-making. This study highlights the potential of quantum-inspired resampling techniques in advancing state-of-the-art ML methodologies.

[72] arXiv:2509.02864 [pdf, html, other]
Title: A-SEA3L-QA: A Fully Automated Self-Evolving, Adversarial Workflow for Arabic Long-Context Question-Answer Generation
Kesen Wang, Daulet Toibazar, Pedro J. Moreno
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

We present an end-to-end, self-evolving adversarial workflow for long-context Question-Answer (QA) Generation in Arabic. By orchestrating multiple specialized LVLMs: a question generator, an evaluator, and a swarm of answer generators, our system iteratively refines its own performance without any human intervention. Starting from raw, multi-page Arabic documents across diverse domains, the question generator produces fine-grained, context-aware queries to be tackled by the answer generator swarm, and the evaluator assesses and feeds back quality metrics. This closed-loop cycle enables continuous learning: low-confidence outputs trigger automated re-generation and model updates, progressively enhancing question difficulty and relevance. Moreover, we set the quality metrics as a tunable hyperparameter, enabling question generation at controllable and customizable difficulty levels. We release AraLongBench, a large-scale Arabic benchmark of single- and multi-page challenges spanning hundreds of pages, and demonstrate that our self-evolving workflow substantially outperforms static pipelines, markedly boosting the long-context comprehension capabilities of leading Arabic Large Vision Language Models (LVLMs). Lastly, we also meticulously architect a fully automated agentic workflow for long-context Arabic document collection.

[73] arXiv:2509.02869 [pdf, html, other]
Title: A Distributed Gradient-Based Deployment Strategy for a Network of Sensors with a Probabilistic Sensing Model
Hesam Mosalli, Amir G. Aghdam
Comments: The shorter version is accepted at the 64th IEEE Conference on Decision and Control
Subjects: Systems and Control (eess.SY)

This paper presents a distributed gradient-based deployment strategy to maximize coverage in hybrid wireless sensor networks (WSNs) with probabilistic sensing. Leveraging Voronoi partitioning, the overall coverage is reformulated as a sum of local contributions, enabling mobile sensors to optimize their positions using only local information. The strategy adopts the Elfes model to capture detection uncertainty and introduces a dynamic step size based on the gradient of the local coverage, ensuring movements adaptive to regional importance. Obstacle awareness is integrated via visibility constraints, projecting sensor positions to unobstructed paths. A threshold-based decision rule ensures movement occurs only for sufficiently large coverage gains, with convergence achieved when all sensors and their neighbors stop at a local maximum configuration. Simulations demonstrate improved coverage over static deployments, highlighting scalability and practicality for real-world applications.
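
To make the probabilistic sensing model concrete, here is a sketch of one common parameterization of the Elfes model: detection is certain up to an inner radius, decays exponentially up to a maximum radius, and is zero beyond it. The parameter names and values are assumptions for illustration and may differ from the paper's formulation.

```python
import math

def elfes_probability(distance, r_certain, r_max, lam, beta):
    """Detection probability at a given distance from the sensor.

    r_certain: radius within which detection is certain (assumed name)
    r_max:     maximum sensing radius (assumed name)
    lam, beta: decay parameters of the uncertain band (assumed names)
    """
    if distance <= r_certain:
        return 1.0
    if distance > r_max:
        return 0.0
    return math.exp(-lam * (distance - r_certain) ** beta)

params = dict(r_certain=2.0, r_max=5.0, lam=0.5, beta=1.0)
for d in (1.0, 3.0, 6.0):
    print(d, elfes_probability(d, **params))
```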

[74] arXiv:2509.02870 [pdf, html, other]
Title: Robotic 3D Flower Pose Estimation for Small-Scale Urban Farms
Harsh Muriki, Hong Ray Teo, Ved Sengupta, Ai-Ping Hu
Comments: 7 pages, 7 figures
Subjects: Robotics (cs.RO)

The small scale of urban farms and the commercial availability of low-cost robots (such as the FarmBot) that automate simple tending tasks enable an accessible platform for plant phenotyping. We have used a FarmBot with a custom camera end-effector to estimate strawberry plant flower pose (for robotic pollination) from acquired 3D point cloud models. We describe a novel algorithm that translates individual occupancy grids along orthogonal axes of a point cloud to obtain 2D images corresponding to the six viewpoints. For each image, 2D object detection models for flowers are used to identify 2D bounding boxes which can be converted into the 3D space to extract flower point clouds. Pose estimation is performed by fitting three shapes (superellipsoids, paraboloids and planes) to the flower point clouds and compared with manually labeled ground truth. Our method successfully finds approximately 80% of flowers scanned using our customized FarmBot platform and has a mean flower pose error of 7.7 degrees, which is sufficient for robotic pollination and rivals previous results. All code will be made available at this https URL.

[75] arXiv:2509.02873 [pdf, html, other]
Title: Portable Targeted Sampling Framework Using LLVM
Zhantong Qiu, Mahyar Samani, Jason Lowe-Power
Subjects: Hardware Architecture (cs.AR)

Comprehensive architectural evaluation of full workloads is throttled by slow simulation and per-binary sampling pipelines. We present Nugget, a flexible framework for portable sampling across simulators and real hardware, ISAs, and libraries. Nugget operates at the LLVM IR level to perform binary-agnostic interval analysis, then emits lightweight, cross-platform executables--nuggets--that can be validated on real machines before driving simulation. Across SPEC CPU2017, NPB, and LSMS, Nugget cuts interval-analysis cost by orders of magnitude relative to functional simulation (up to ~578X on multithreaded NPB), keeps single-thread overhead low, and enables native-speed validation of selected samples. Case studies with gem5 show that nuggets support evaluation of system performance and model accuracy. Nugget makes sampling methodology research faster and more portable.

[76] arXiv:2509.02876 [pdf, other]
Title: Generalizable Skill Learning for Construction Robots with Crowdsourced Natural Language Instructions, Composable Skills Standardization, and Large Language Model
Hongrui Yu, Vineet R. Kamat, Carol C. Menassa
Comments: Under review for ASCE OPEN: Multidisciplinary Journal of Civil Engineering
Subjects: Robotics (cs.RO)

The quasi-repetitive nature of construction work and the resulting lack of generalizability in programming construction robots present persistent challenges to the broad adoption of robots in the construction industry. Robots cannot achieve generalist capabilities as skills learnt from one domain cannot readily transfer to another work domain or be directly used to perform a different set of tasks. Human workers have to arduously reprogram their scene-understanding, path-planning, and manipulation components to enable the robots to perform alternate work tasks. The methods presented in this paper resolve a significant proportion of such reprogramming workload by proposing a generalizable learning architecture that directly teaches robots versatile task-performance skills through crowdsourced online natural language instructions. A Large Language Model (LLM), a standardized and modularized hierarchical modeling approach, and a Building Information Modeling-Robot semantic data pipeline are developed to address the multi-task skill transfer problem. The proposed skill standardization scheme and LLM-based hierarchical skill learning framework were tested with a long-horizon drywall installation experiment using a full-scale industrial robotic manipulator. The resulting robot task learning scheme achieves multi-task reprogramming with minimal effort and high quality.

[77] arXiv:2509.02878 [pdf, html, other]
Title: Designing a Lightweight GenAI Interface for Visual Data Analysis
Ratanond Koonchanok, Alex Kale, Khairi Reda
Subjects: Human-Computer Interaction (cs.HC)

Recent advances in Generative AI have transformed how users interact with data analysis through natural language interfaces. However, many systems rely too heavily on LLMs, creating risks of hallucination, opaque reasoning, and reduced user control. We present a hybrid visual analysis system that integrates GenAI in a constrained, high-level role to support statistical modeling while preserving transparency and user agency. GenAI translates natural language intent into formal statistical formulations, while interactive visualizations surface model behavior, residual patterns, and hypothesis comparisons to guide iterative exploration. Model fitting, diagnostics, and hypothesis testing are delegated entirely to a structured R-based backend, ensuring correctness, interpretability, and reproducibility. By combining GenAI-assisted intent translation with visualization-driven reasoning, our approach broadens access to modeling tools without compromising rigor. We present an example use case of the tool and discuss challenges and opportunities for future research.

[78] arXiv:2509.02885 [pdf, other]
Title: Efficient Dynamic Rank Aggregation
Morteza Alimi, Hourie Mehrabiun, Alireza Zarei
Comments: 23 pages, 4 figures, 15 tables
Subjects: Data Structures and Algorithms (cs.DS)

The rank aggregation problem, which has many real-world applications, refers to the process of combining multiple input rankings into a single aggregated ranking. In dynamic settings, where new rankings arrive over time, efficiently updating the aggregated ranking is essential. This paper develops a fast, theoretically and practically efficient dynamic rank aggregation algorithm. First, we develop the LR-Aggregation algorithm, built on top of the LR-tree data structure, which is itself modeled on the LR-distance, a novel and equivalent take on the classical Spearman's footrule distance. We then analyze the theoretical efficiency of the Pick-A-Perm algorithm, and show how it can be combined with the LR-aggregation algorithm using another data structure that we develop. We demonstrate through experimental evaluations that LR-Aggregation produces close to optimal solutions in practice. We show that Pick-A-Perm has a theoretical worst case approximation guarantee of 2. We also show that both the LR-Aggregation and Pick-A-Perm algorithms, as well as the methodology for combining them can be run in $O(n \log n)$ time. To the best of our knowledge, this is the first fast, near linear time rank aggregation algorithm in the dynamic setting, having both a theoretical approximation guarantee, and excellent practical performance (much better than the theoretical guarantee).
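
As a small illustration of the quantities named above, the sketch below computes Spearman's footrule distance and a deterministic variant of Pick-A-Perm that returns the input ranking with the smallest total distance to all inputs (the classical version picks an input ranking at random). This naive version is quadratic in the number of rankings and does not use the paper's LR-tree or its O(n log n) machinery; the function names are illustrative.

```python
def footrule(a, b):
    """Spearman's footrule: sum over items of the absolute difference in position."""
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(abs(i - pos_b[item]) for i, item in enumerate(a))

def total_cost(candidate, rankings):
    """Total footrule distance from a candidate ranking to every input ranking."""
    return sum(footrule(candidate, r) for r in rankings)

def best_input_ranking(rankings):
    """Return the input ranking that minimizes the total footrule distance."""
    return min(rankings, key=lambda cand: total_cost(cand, rankings))

rankings = [
    ["a", "b", "c"],
    ["a", "c", "b"],
    ["b", "a", "c"],
]
best = best_input_ranking(rankings)
print(best, total_cost(best, rankings))
```

Here the first ranking wins with a total cost of 4, against 6 for each of the other two.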

[79] arXiv:2509.02890 [pdf, html, other]
Title: Grocery to General Merchandise: A Cross-Pollination Recommender using LLMs and Real-Time Cart Context
Akshay Kekuda, Murali Mohana Krishna Dandu, Rimita Lahiri, Shiqin Cai, Sinduja Subramaniam, Evren Korpeoglu, Kannan Achan
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Modern e-commerce platforms strive to enhance customer experience by providing timely and contextually relevant recommendations. However, recommending general merchandise to customers focused on grocery shopping -- such as pairing milk with a milk frother -- remains a critical yet under-explored challenge. This paper introduces a cross-pollination (XP) framework, a novel approach that bridges grocery and general merchandise cross-category recommendations by leveraging multi-source product associations and real-time cart context. Our solution employs a two-stage framework: (1) a candidate generation mechanism that uses co-purchase market basket analysis and an LLM-based approach to identify novel item-item associations; and (2) a transformer-based ranker that leverages the real-time sequential cart context and optimizes for engagement signals such as add-to-carts. Offline analysis and online A/B tests show a 36% increase in add-to-cart rate with LLM-based retrieval, and a 27% NDCG@4 lift using the cart context-based ranker. Our work contributes practical techniques for cross-category recommendations and broader insights for e-commerce systems.
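
For readers unfamiliar with the ranking metric reported above, this sketch computes NDCG@k with linear gains and a log2 position discount. The abstract does not say which gain definition the authors use, so the linear gain and the relevance labels below are assumptions.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k positions (linear gain)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevance of recommended items in ranked order, e.g. 1 = added to cart (illustrative).
ranked_relevance = [0, 1, 0, 1, 0]
score = ndcg_at_k(ranked_relevance, k=4)
print(round(score, 4))
```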

[80] arXiv:2509.02892 [pdf, html, other]
Title: Improving Generative Methods for Causal Evaluation via Simulation-Based Inference
Pracheta Amaranath, Vinitra Muralikrishnan, Amit Sharma, David D. Jensen
Comments: 12 pages main text, 48 pages total
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)

Generating synthetic datasets that accurately reflect real-world observational data is critical for evaluating causal estimators, but remains a challenging task. Existing generative methods offer a solution by producing synthetic datasets anchored in the observed data (source data) while allowing variation in key parameters such as the treatment effect and amount of confounding bias. However, existing methods typically require users to provide point estimates of such parameters (rather than distributions) and fixed estimates (rather than estimates that can be improved with reference to the source data). This denies users the ability to express uncertainty over parameter values and removes the potential for posterior inference, potentially leading to unreliable estimator comparisons. We introduce simulation-based inference for causal evaluation (SBICE), a framework that models generative parameters as uncertain and infers their posterior distribution given a source dataset. Leveraging techniques in simulation-based inference, SBICE identifies parameter configurations that produce synthetic datasets closely aligned with the source data distribution. Empirical results demonstrate that SBICE improves the reliability of estimator evaluations by generating more realistic datasets, which supports a robust and data-consistent approach to causal benchmarking under uncertainty.

[81] arXiv:2509.02896 [pdf, html, other]
Title: Cut Costs, Not Accuracy: LLM-Powered Data Processing with Guarantees
Sepanta Zeighami, Shreya Shankar, Aditya Parameswaran
Comments: To appear in SIGMOD'26
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)

Large Language Models (LLMs) are being increasingly used as a building block in data systems to process large text datasets. To do so, LLM model providers offer multiple LLMs with different sizes, spanning various cost-quality trade-offs when processing text at scale. Top-of-the-line LLMs (e.g., GPT-4o, Claude Sonnet) operate with high accuracy but are prohibitively expensive when processing many records. To avoid high costs, more affordable but lower quality LLMs (e.g., GPT-4o-mini, Claude Haiku) can be used to process records, but we need to ensure that the overall accuracy does not deviate substantially from that of the top-of-the-line LLMs. The model cascade framework provides a blueprint to manage this trade-off, by using the confidence of LLMs in their output (e.g., log-probabilities) to decide on which records to use the affordable LLM. However, existing solutions following this framework provide only marginal cost savings and weak theoretical guarantees because of poor estimation of the quality of the affordable LLM's outputs. We present BARGAIN, a method that judiciously uses affordable LLMs in data processing to significantly reduce cost while providing strong theoretical guarantees on the solution quality. BARGAIN employs a novel adaptive sampling strategy and statistical estimation procedure that uses data and task characteristics and builds on recent statistical tools to make accurate estimations with tight theoretical guarantees. Variants of BARGAIN can support guarantees on accuracy, precision, or recall of the output. Experimental results across 8 real-world datasets show that BARGAIN reduces cost, on average, by up to 86% more than state-of-the-art, while providing stronger theoretical guarantees on accuracy of output, with similar gains when guaranteeing a desired level of precision or recall.
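
The model cascade idea described above can be sketched as a simple routing rule: keep the affordable model's answer when its confidence clears a threshold, otherwise fall back to the expensive model. The two model functions below are toy stand-ins, and the threshold is taken as given; choosing it with statistical guarantees is the contribution of BARGAIN and is not shown here.

```python
def cascade(records, cheap_model, expensive_model, threshold):
    """Route each record to the cheap model first; escalate low-confidence cases."""
    outputs = []
    escalated = 0
    for record in records:
        label, confidence = cheap_model(record)
        if confidence >= threshold:
            outputs.append(label)
        else:
            outputs.append(expensive_model(record))
            escalated += 1
    return outputs, escalated

# Toy stand-ins for the two LLMs (hypothetical): classify text as "short" or "long".
def cheap_model(text):
    label = "long" if len(text) > 10 else "short"
    confidence = min(1.0, abs(len(text) - 10) / 10)  # less sure near the boundary
    return label, confidence

def expensive_model(text):
    return "long" if len(text) > 10 else "short"

records = ["hi", "a medium one", "x" * 40]
outputs, escalated = cascade(records, cheap_model, expensive_model, threshold=0.5)
print(outputs, escalated)
```

Only the second record sits close to the decision boundary, so it is the only one sent to the expensive model.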

[82] arXiv:2509.02898 [pdf, html, other]
Title: PRECISE-AS: Personalized Reinforcement Learning for Efficient Point-of-Care Echocardiography in Aortic Stenosis Diagnosis
Armin Saadat, Nima Hashemi, Hooman Vaseli, Michael Y. Tsang, Christina Luong, Michiel Van de Panne, Teresa S. M. Tsang, Purang Abolmaesumi
Comments: To be published in MICCAI 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Aortic stenosis (AS) is a life-threatening condition caused by a narrowing of the aortic valve, leading to impaired blood flow. Despite its high prevalence, access to echocardiography (echo), the gold-standard diagnostic tool, is often limited due to resource constraints, particularly in rural and underserved areas. Point-of-care ultrasound (POCUS) offers a more accessible alternative but is restricted by operator expertise and the challenge of selecting the most relevant imaging views. To address this, we propose a reinforcement learning (RL)-driven active video acquisition framework that dynamically selects each patient's most informative echo videos. Unlike traditional methods that rely on a fixed set of videos, our approach continuously evaluates whether additional imaging is needed, optimizing both accuracy and efficiency. Tested on data from 2,572 patients, our method achieves 80.6% classification accuracy while using only 47% of the echo videos compared to a full acquisition. These results demonstrate the potential of active feature acquisition to enhance AS diagnosis, making echocardiographic assessments more efficient, scalable, and personalized. Our source code is available at: this https URL.

[83] arXiv:2509.02899 [pdf, html, other]
Title: Safe Sharing of Fast Kernel-Bypass I/O Among Nontrusting Applications
Alan Beadle, Michael L. Scott, John Criswell
Comments: 13 pages
Subjects: Operating Systems (cs.OS); Cryptography and Security (cs.CR)

Protected user-level libraries have been proposed as a way to allow mutually distrusting applications to safely share kernel-bypass services. In this paper, we identify and solve several previously unaddressed obstacles to realizing this design and identify several optimization opportunities. First, to preserve the kernel's ability to reclaim failed processes, protected library functions must complete in modest, bounded time. We show how to move unbounded waits outside the library itself, enabling synchronous interaction among processes without the need for polling. Second, we show how the bounded time requirement can be leveraged to achieve lower and more stable latency for inter-process interactions. Third, we observe that prior work on protected libraries is vulnerable to a buffer unmapping attack; we prevent this attack by preventing applications from removing pages that they share with the protected library. Fourth, we show how a trusted daemon can respond to asynchronous events and dynamically divide work with application threads in a protected library.
By extending and improving the protected library model, our work provides a new way to structure OS services, combining the advantages of kernel bypass and microkernels. We present a set of safety and performance guidelines for developers of protected libraries, and a set of recommendations for developers of future protected library operating systems. We demonstrate the convenience and performance of our approach with a prototype version of the DDS communication service. To the best of our knowledge, this prototype represents the first successful sharing of a kernel-bypass NIC among mutually untrusting applications. Relative to the commercial FastDDS implementation, we achieve approximately 50% lower latency and up to 7x throughput, with lower CPU utilization.

[84] arXiv:2509.02902 [pdf, html, other]
Title: LiGuard: A Streamlined Open-Source Framework for Rapid & Interactive Lidar Research
Muhammad Shahbaz, Shaurya Agarwal
Subjects: Computer Vision and Pattern Recognition (cs.CV)

There is a growing interest in the development of lidar-based autonomous mobility and Intelligent Transportation Systems (ITS). To work with lidar data, researchers often develop code specific to their application niche. This approach leads to duplication of effort across studies that, in many cases, share multiple methodological steps such as data input/output (I/O), pre/post processing, and common algorithms in multi-stage solutions. Moreover, slight changes in data, algorithms, and/or research focus may force major revisions in the code. To address these challenges, we present LiGuard, an open-source software framework that allows researchers to: 1) rapidly develop code for their lidar-based projects by providing built-in support for data I/O, pre/post processing, and commonly used algorithms, 2) interactively add/remove/reorder custom algorithms and adjust their parameters, and 3) visualize results for classification, detection, segmentation, and tracking tasks. Moreover, because it creates all the code files in structured directories, it allows easy sharing of entire projects or even the individual components to be reused by other researchers. The effectiveness of LiGuard is demonstrated via case studies.

[85] arXiv:2509.02903 [pdf, html, other]
Title: PercepTwin: Modeling High-Fidelity Digital Twins for Sim2Real LiDAR-based Perception for Intelligent Transportation Systems
Muhammad Shahbaz, Shaurya Agarwal
Subjects: Computer Vision and Pattern Recognition (cs.CV)

LiDAR-based perception in intelligent transportation systems (ITS), for tasks such as object detection, tracking, and semantic and instance segmentation, is predominantly solved by deep neural network models, which often require large-scale labeled datasets during training to achieve generalization. However, creating these datasets is costly and time-consuming, and requires human labor before the datasets are ready for training models. This hinders the scalability of LiDAR-based perception systems in ITS. Sim2Real learning offers a scalable alternative; however, its effectiveness depends on the fidelity of the source simulation(s) to the real world, in terms of environment structure, actor dynamics, and sensor emulations. In response, this paper introduces a rigorous and reproducible methodology for creating large-scale, high-quality synthetic datasets using High-Fidelity Digital Twins (HiFi DTs). The proposed workflow outlines the steps, tools, and best practices for digitally replicating real-world environments, encompassing static geometry modeling, road infrastructure replication, and dynamic traffic scenario generation. Leveraging open-source and readily available resources such as satellite imagery and OpenStreetMap data, alongside specific sensor configurations, this paper provides practical, detailed guidance for constructing robust synthetic environments. These environments subsequently facilitate scalable, cost-effective, and diverse dataset generation, forming a reliable foundation for robust Sim2Real learning.

[86] arXiv:2509.02904 [pdf, html, other]
Title: High-Fidelity Digital Twins for Bridging the Sim2Real Gap in LiDAR-Based ITS Perception
Muhammad Shahbaz, Shaurya Agarwal
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Sim2Real domain transfer offers a cost-effective and scalable approach for developing LiDAR-based perception (e.g., object detection, tracking, segmentation) in Intelligent Transportation Systems (ITS). However, perception models trained in simulation often underperform on real-world data due to distributional shifts. To address this Sim2Real gap, this paper proposes a high-fidelity digital twin (HiFi DT) framework that incorporates real-world background geometry, lane-level road topology, and sensor-specific specifications and placement. We formalize the domain adaptation challenge underlying Sim2Real learning and present a systematic method for constructing simulation environments that yield in-domain synthetic data. An off-the-shelf 3D object detector is trained on HiFi DT-generated synthetic data and evaluated on real data. Our experiments show that the DT-trained model outperforms the equivalent model trained on real data by 4.8%. To understand this gain, we quantify distributional alignment between synthetic and real data using multiple metrics, including Chamfer Distance (CD), Maximum Mean Discrepancy (MMD), Earth Mover's Distance (EMD), and Fréchet Distance (FD), at both raw-input and latent-feature levels. Results demonstrate that HiFi DTs substantially reduce domain shift and improve generalization across diverse evaluation scenarios. These findings underscore the significant role of digital twins in enabling reliable, simulation-based LiDAR perception for real-world ITS applications.
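
As an illustration of one of the alignment metrics named above, here is a minimal sketch of a symmetric Chamfer Distance between two small point clouds. The abstract does not say which variant the authors use; this version (mean nearest-neighbor Euclidean distance, summed over both directions) is an assumption, and the brute-force search is only suitable for tiny inputs.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def chamfer_distance(a: List[Point], b: List[Point]) -> float:
    """Symmetric Chamfer Distance: mean nearest-neighbor distance a->b plus b->a."""
    def one_way(src: List[Point], dst: List[Point]) -> float:
        # For every source point, distance to its closest point in dst, averaged.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


# Hypothetical toy clouds standing in for a synthetic and a real LiDAR scan.
synthetic = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
real = [(0.0, 0.0, 0.0)]
cd = chamfer_distance(synthetic, real)
```

A value of zero means every point in each cloud coincides with some point in the other; larger values indicate a larger geometric mismatch between the two sets.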

[87] arXiv:2509.02908 [pdf, html, other]
Title: Advancing Minority Stress Detection with Transformers: Insights from the Social Media Datasets
Santosh Chapagain, Cory J Cascalheira, Shah Muhammad Hamdi, Soukaina Filali Boubrahimi, Jillian R. Scheer
Comments: Accepted in Social Network Analysis and Mining Journal (SNAM)
Subjects: Computation and Language (cs.CL)

Individuals from sexual and gender minority groups experience disproportionately high rates of poor health outcomes and mental disorders compared to their heterosexual and cisgender counterparts, largely as a consequence of minority stress as described by Meyer's (2003) model. This study presents the first comprehensive evaluation of transformer-based architectures for detecting minority stress in online discourse. We benchmark multiple transformer models including ELECTRA, BERT, RoBERTa, and BART against traditional machine learning baselines and graph-augmented variants. We further evaluate zero-shot and few-shot learning paradigms to assess their applicability on underrepresented datasets. Experiments are conducted on the two largest publicly available Reddit corpora for minority stress detection, comprising 12,645 and 5,789 posts, and are repeated over five random seeds to ensure robustness. Our results demonstrate that integrating graph structure consistently improves detection performance across transformer-only models and that supervised fine-tuning with relational context outperforms zero-shot and few-shot approaches. Theoretical analysis reveals that modeling social connectivity and conversational context via graph augmentation sharpens the models' ability to identify key linguistic markers such as identity concealment, internalized stigma, and calls for support, suggesting that graph-enhanced transformers offer the most reliable foundation for digital health interventions and public health policy.

[88] arXiv:2509.02910 [pdf, other]
Title: The Basic B*** Effect: The Use of LLM-based Agents Reduces the Distinctiveness and Diversity of People's Choices
Sandra C. Matz, C. Blaine Horton, Sofie Goethals
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Large language models (LLMs) increasingly act on people's behalf: they write emails, buy groceries, and book restaurants. While the outsourcing of human decision-making to AI can be both efficient and effective, it raises a fundamental question: how does delegating identity-defining choices to AI reshape who people become? We study the impact of agentic LLMs on two identity-relevant outcomes: interpersonal distinctiveness - how unique a person's choices are relative to others - and intrapersonal diversity - the breadth of a single person's choices over time. Using real choices drawn from social-media behavior of 1,000 U.S. users (110,000 choices in total), we compare a generic and personalized agent to a human baseline. Both agents shift people's choices toward more popular options, reducing the distinctiveness of their behaviors and preferences. While the use of personalized agents tempers this homogenization (compared to the generic AI), it also more strongly compresses the diversity of people's preference portfolios by narrowing what they explore across topics and psychological affinities. Understanding how AI agents might flatten human experience, and how using generic versus personalized agents involves distinctiveness-diversity trade-offs, is critical for designing systems that augment rather than constrain human agency, and for safeguarding diversity in thought, taste, and expression.

[89] arXiv:2509.02915 [pdf, html, other]
Title: English Pronunciation Evaluation without Complex Joint Training: LoRA Fine-tuned Speech Multimodal LLM
Taekyung Ahn, Hosung Nam
Subjects: Computation and Language (cs.CL)

This study demonstrates that a Multimodal Large Language Model (MLLM) adapted via Low-Rank Adaptation (LoRA) can perform both Automatic Pronunciation Assessment (APA) and Mispronunciation Detection and Diagnosis (MDD) simultaneously. Leveraging Microsoft's Phi-4-multimodal-instruct, our fine-tuning method eliminates the need for complex architectural changes or separate training procedures conventionally required for these distinct tasks. Fine-tuned on the Speechocean762 dataset, the pronunciation evaluation scores predicted by the model exhibited a strong Pearson Correlation Coefficient (PCC > 0.7) with human-assigned scores, while achieving low Word Error Rate (WER) and Phoneme Error Rate (PER) (both < 0.15). Notably, fine-tuning only the LoRA layers was sufficient to achieve performance levels comparable to those achieved by fine-tuning all audio layers. This research highlights that an integrated pronunciation assessment system can be established by adapting large multimodal models without full fine-tuning, utilizing a significantly simpler training methodology compared to previous joint models designed for simultaneous APA and MDD. This efficient LoRA-based approach paves the way for more accessible, integrated, and effective Computer-Assisted Pronunciation Training (CAPT) technologies for English L2 learners.

[90] arXiv:2509.02918 [pdf, html, other]
Title: Single Domain Generalization in Diabetic Retinopathy: A Neuro-Symbolic Learning Approach
Midhat Urooj, Ayan Banerjee, Farhat Shaikh, Kuntal Thakur, Sandeep Gupta
Comments: Accepted in ANSyA 2025: 1st International Workshop on Advanced Neuro-Symbolic Applications
Journal-ref: ANSyA 2025: 1st International Workshop on Advanced Neuro-Symbolic Applications
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Domain generalization remains a critical challenge in medical imaging, where models trained on single sources often fail under real-world distribution shifts. We propose KG-DG, a neuro-symbolic framework for diabetic retinopathy (DR) classification that integrates vision transformers with expert-guided symbolic reasoning to enable robust generalization across unseen domains. Our approach leverages clinical lesion ontologies through structured, rule-based features and retinal vessel segmentation, fusing them with deep visual representations via a confidence-weighted integration strategy. The framework addresses both single-domain generalization (SDG) and multi-domain generalization (MDG) by minimizing the KL divergence between domain embeddings, thereby enforcing alignment of high-level clinical semantics. Extensive experiments across four public datasets (APTOS, EyePACS, Messidor-1, Messidor-2) demonstrate significant improvements: up to a 5.2% accuracy gain in cross-domain settings and a 6% improvement over baseline ViT models. Notably, our symbolic-only model achieves a 63.67% average accuracy in MDG, while the complete neuro-symbolic integration achieves the highest accuracy compared to existing published baselines and benchmarks in challenging SDG scenarios. Ablation studies reveal that lesion-based features (84.65% accuracy) substantially outperform purely neural approaches, confirming that symbolic components act as effective regularizers beyond merely enhancing interpretability. Our findings establish neuro-symbolic integration as a promising paradigm for building clinically robust and domain-invariant medical AI systems.

[91] arXiv:2509.02920 [pdf, html, other]
Title: Event Detection and Classification for Long Range Sensing of Elephants Using Seismic Signal
Jaliya L. Wijayaraja, Janaka L. Wijekoon, Malitha Wijesundara
Comments: This article has been accepted for publication in IEEE Access
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Emerging Technologies (cs.ET); Systems and Control (eess.SY)

Detecting elephants through seismic signals is an emerging research topic aimed at developing solutions for Human-Elephant Conflict (HEC). Despite the promising results, such solutions heavily rely on manual classification of elephant footfalls, which limits their applicability for real-time classification in natural settings. To address this limitation and build on our previous work, this study introduces a classification framework targeting resource-constrained implementations, prioritizing both accuracy and computational efficiency. As part of this framework, a novel event detection technique named Contextually Customized Windowing (CCW), tailored specifically for detecting elephant footfalls, was introduced, and evaluations were conducted by comparing it with the Short-Term Average/Long-Term Average (STA/LTA) method. The yielded results show that the maximum validated detection range was 155.6 m in controlled conditions and 140 m in natural environments. Elephant footfall classification using Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel demonstrated superior performance across multiple settings, achieving an accuracy of 99% in controlled environments, 73% in natural elephant habitats, and 70% in HEC-prone human habitats, the most challenging scenario. Furthermore, feature impact analysis using explainable AI identified the number of Zero Crossings and Dynamic Time Warping (DTW) Alignment Cost as the most influential factors in all experiments, while Predominant Frequency exhibited significant influence in controlled settings.
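
For readers unfamiliar with the STA/LTA baseline mentioned above, a minimal sketch follows. It shows the classic energy-ratio form only; the window lengths, threshold, and function names are hypothetical, and the paper's own CCW technique is not reproduced here.

```python
from typing import List


def sta_lta(signal: List[float], sta_len: int, lta_len: int) -> List[float]:
    """Ratio of short-term to long-term average energy, per sample.

    Assumes sta_len <= lta_len. Samples before the first full long-term
    window get a ratio of 0.0.
    """
    energy = [x * x for x in signal]
    ratios: List[float] = []
    for i in range(len(energy)):
        if i + 1 < lta_len:
            ratios.append(0.0)  # not enough history yet
            continue
        sta = sum(energy[i + 1 - sta_len:i + 1]) / sta_len
        lta = sum(energy[i + 1 - lta_len:i + 1]) / lta_len
        ratios.append(sta / lta if lta > 0 else 0.0)
    return ratios


def detect_events(signal: List[float], sta_len: int = 2, lta_len: int = 8,
                  threshold: float = 2.0) -> List[int]:
    """Indices where the STA/LTA ratio exceeds the (hypothetical) threshold."""
    return [i for i, r in enumerate(sta_lta(signal, sta_len, lta_len)) if r > threshold]


# Toy trace: low-amplitude noise with a two-sample burst standing in for a footfall.
trace = [0.1] * 10 + [1.0, 1.0] + [0.1] * 4
events = detect_events(trace)
```

The detector fires when recent energy rises sharply relative to the background level, which is why a fixed window can struggle with events whose duration varies; the abstract positions CCW as a windowing scheme tailored to footfalls for that setting.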

[92] arXiv:2509.02922 [pdf, html, other]
Title: Approximate constrained stochastic optimal control via parameterized input inference
Shahbaz P Qadri Syed, He Bai
Journal-ref: Automatica Volume 171, January 2025
Subjects: Systems and Control (eess.SY); Multiagent Systems (cs.MA); Robotics (cs.RO); Optimization and Control (math.OC)

Approximate methods to solve stochastic optimal control (SOC) problems have received significant interest from researchers in the past decade. Probabilistic inference approaches to SOC have been developed to solve nonlinear quadratic Gaussian problems. In this work, we propose an Expectation-Maximization (EM) based inference procedure to generate state-feedback controls for constrained SOC problems. We consider the inequality constraints for the state and controls and also the structural constraints for the controls. We employ barrier functions to address state and control constraints. We show that the expectation step leads to smoothing of the state-control pair while the maximization step on the non-zero subsets of the control parameters allows inference of structured stochastic optimal controllers. We demonstrate the effectiveness of the algorithm on unicycle obstacle avoidance, four-unicycle formation control, and quadcopter navigation in windy environment examples. In these examples, we perform an empirical study on the parametric effect of barrier functions on the state constraint satisfaction. We also present a comparative study of smoothing algorithms on the performance of the proposed approach.

[93] arXiv:2509.02923 [pdf, html, other]
Title: A Narrative Review of Clinical Decision Support Systems in Offloading Footwear for Diabetes-Related Foot Ulcers
Kunal Kumar, Muhammad Ashad Kabir, Luke Donnan, Sayed Ahmed
Comments: 44 pages, 2 figures, and 3 tables
Subjects: Machine Learning (cs.LG)

Offloading footwear helps prevent and treat diabetic foot ulcers (DFUs) by lowering plantar pressure (PP), yet prescription decisions remain fragmented: feature selection varies, personalization is limited, and evaluation practices differ. We performed a narrative review of 45 studies (12 guidelines/protocols, 25 knowledge-based systems, 8 machine-learning applications) published to Aug 2025. We thematically analyzed knowledge type, decision logic, evaluation methods, and enabling technologies. Guidelines emphasize PP thresholds (<=200 kPa or >=25-30% reduction) but rarely yield actionable, feature-level outputs. Knowledge-based systems use rule- and sensor-driven logic, integrating PP monitoring, adherence tracking, and usability testing. ML work introduces predictive, optimization, and generative models with high computational accuracy but limited explainability and clinical validation. Evaluation remains fragmented: protocols prioritize biomechanical tests; knowledge-based systems assess usability/adherence; ML studies focus on technical accuracy with weak linkage to long-term outcomes. From this synthesis we propose a five-part CDSS framework: (1) a minimum viable dataset; (2) a hybrid architecture combining rules, optimization, and explainable ML; (3) structured feature-level outputs; (4) continuous validation and evaluation; and (5) integration with clinical and telehealth workflows. This framework aims to enable scalable, patient-centered CDSSs for DFU care; prioritizing interoperable datasets, explainable models, and outcome-focused evaluation will be key to clinical adoption.

[94] arXiv:2509.02924 [pdf, html, other]
Title: Simulacra Naturae: Generative Ecosystem driven by Agent-Based Simulations and Brain Organoid Collective Intelligence
Nefeli Manoudaki, Mert Toka, Iason Paterakis, Diarmid Flatley
Comments: to be published in IEEE VISAP 2025
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

Simulacra Naturae is a data-driven media installation that explores collective care through the entanglement of biological computation, material ecologies, and generative systems. The work translates pre-recorded neural activity from brain organoids, lab-grown three-dimensional clusters of neurons, into a multi-sensory environment composed of generative visuals, spatial audio, living plants, and fabricated clay artifacts. These biosignals, streamed through a real-time system, modulate emergent agent behaviors inspired by natural systems such as termite colonies and slime molds. Rather than using biosignals as direct control inputs, Simulacra Naturae treats organoid activity as a co-creative force, allowing neural rhythms to guide the growth, form, and atmosphere of a generative ecosystem. The installation features computationally fabricated clay prints embedded with solenoids, adding physical sound resonances to the generative surround composition. The spatial environment, filled with live tropical plants and a floor-level projection layer featuring real-time generative AI visuals, invites participants into a sensory field shaped by nonhuman cognition. By grounding abstract data in living materials and embodied experience, Simulacra Naturae reimagines visualization as a practice of care, one that decentralizes human agency and opens new spaces for ethics, empathy, and ecological attunement within hybrid computational systems.

[95] arXiv:2509.02926 [pdf, html, other]
Title: Decoding the Rule Book: Extracting Hidden Moderation Criteria from Reddit Communities
Youngwoo Kim, Himanshu Beniwal, Steven L. Johnson, Thomas Hartvigsen
Comments: Accepted to EMNLP 2025 Main
Subjects: Computation and Language (cs.CL)

Effective content moderation systems require explicit classification criteria, yet online communities like subreddits often operate with diverse, implicit standards. This work introduces a novel approach to identify and extract these implicit criteria from historical moderation data using an interpretable architecture. We represent moderation criteria as score tables of lexical expressions associated with content removal, enabling systematic comparison across different communities. Our experiments demonstrate that these extracted lexical patterns effectively replicate the performance of neural moderation models while providing transparent insights into decision-making processes. The resulting criteria matrix reveals significant variations in how seemingly shared norms are actually enforced, uncovering previously undocumented moderation patterns including community-specific tolerances for language, features for topical restrictions, and underlying subcategories of the toxic speech classification.

[96] arXiv:2509.02927 [pdf, html, other]
Title: PDRL: Post-hoc Descriptor-based Residual Learning for Uncertainty-Aware Machine Learning Potentials
Shih-Peng Huang, Nontawat Charoenphakdee, Yuta Tsuboi, Yong-Bin Zhuang, Wenwen Li
Subjects: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci)

Ensemble methods are considered the gold standard for uncertainty quantification (UQ) for machine learning interatomic potentials (MLIPs). However, their high computational cost can limit their practicality. Alternative techniques, such as Monte Carlo dropout and deep kernel learning, have been proposed to improve computational efficiency; however, some of these methods cannot be applied to already trained models and may affect the prediction accuracy. In this paper, we propose a simple and efficient post-hoc framework for UQ that leverages the descriptor of a trained graph neural network potential to estimate residual errors. We refer to this method as post-hoc descriptor-based residual-based learning (PDRL). PDRL models the discrepancy between MLIP predictions and ground truth values, allowing these residuals to act as proxies for prediction uncertainty. We explore multiple variants of PDRL and benchmark them against established UQ methods, evaluating both their effectiveness and limitations.

[97] arXiv:2509.02928 [pdf, html, other]
Title: A Data-Driven RetinaNet Model for Small Object Detection in Aerial Images
Zhicheng Tang, Jinwen Tang, Yi Shang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

In the realm of aerial imaging, the ability to detect small objects is pivotal for a myriad of applications, encompassing environmental surveillance, urban design, and crisis management. Leveraging RetinaNet, this work unveils DDR-Net: a data-driven, deep-learning model devised to enhance the detection of diminutive objects. DDR-Net introduces novel, data-driven techniques to autonomously ascertain optimal feature maps and anchor estimations, cultivating a tailored and proficient training process while maintaining precision. Additionally, this paper presents an innovative sampling technique to bolster model efficacy under limited data training constraints. The model's enhanced detection capabilities support critical applications including wildlife and habitat monitoring, traffic flow optimization, and public safety improvements through accurate identification of small objects like vehicles and pedestrians. DDR-Net significantly reduces the cost and time required for data collection and training, offering efficient performance even with limited data. Empirical assessments over assorted aerial avian imagery datasets demonstrate that DDR-Net markedly surpasses RetinaNet and alternative contemporary models. These innovations advance current aerial image analysis technologies and promise wide-ranging impacts across multiple sectors including agriculture, security, and archaeology.

[98] arXiv:2509.02930 [pdf, html, other]
Title: VendiRL: A Framework for Self-Supervised Reinforcement Learning of Diversely Diverse Skills
Erik M. Lintunen
Comments: 17 pages including appendices
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

In self-supervised reinforcement learning (RL), one of the key challenges is learning a diverse set of skills to prepare agents for unknown future tasks. Despite impressive advances, scalability and evaluation remain prevalent issues. Regarding scalability, the search for meaningful skills can be obscured by high-dimensional feature spaces, where relevant features may vary across downstream task domains. For evaluating skill diversity, defining what constitutes "diversity" typically requires a hard commitment to a specific notion of what it means for skills to be diverse, potentially leading to inconsistencies in how skill diversity is understood, making results across different approaches hard to compare, and leaving many forms of diversity unexplored. To address these issues, we adopt a measure of sample diversity that translates ideas from ecology to machine learning -- the Vendi Score -- allowing the user to specify and evaluate any desired form of diversity. We demonstrate how this metric facilitates skill evaluation and introduce VendiRL, a unified framework for learning diversely diverse sets of skills. Given distinct similarity functions, VendiRL motivates distinct forms of diversity, which could support skill-diversity pretraining in new and richly interactive environments where optimising for various forms of diversity may be desirable.

[99] arXiv:2509.02933 [pdf, html, other]
Title: Demonstrating Visual Information Manipulation Attacks in Augmented Reality: A Hands-On Miniature City-Based Setup
Yanming Xiu, Maria Gorlatova
Comments: The paper has been accepted to 2025 MobiHoc 1st Workshop on Enhancing Security, Privacy, and Trust in Extended Reality (XR) Systems
Subjects: Human-Computer Interaction (cs.HC)

Augmented reality (AR) enhances user interaction with the real world but also presents vulnerabilities, particularly through Visual Information Manipulation (VIM) attacks. These attacks alter important real-world visual cues, leading to user confusion and misdirected actions. In this demo, we present a hands-on experience using a miniature city setup, where users interact with manipulated AR content via the Meta Quest 3. The demo highlights the impact of VIM attacks on user decision-making and underscores the need for effective security measures in AR systems. Future work includes a user study and cross-platform testing.

[100] arXiv:2509.02936 [pdf, html, other]
Title: Generalized Golub-Kahan bidiagonalization for generalized saddle point systems
Na-Na Wang, Ji-Cheng Li
Subjects: Numerical Analysis (math.NA)

We consider the iterative solution of generalized saddle point systems. When the right bottom block is zero, Arioli [SIAM J. Matrix Anal. Appl., 34 (2013), pp. 571-592] proposed a CRAIG algorithm based on generalized Golub-Kahan Bidiagonalization (GKB) for the augmented systems with the leading block being symmetric and positive definite (SPD), and then Dumitrasc et al. [SIAM J. Matrix Anal. Appl., 46 (2025), pp. 370-392] extended the GKB to the case where the symmetry condition of the leading block no longer holds and proposed a nonsymmetric version of the CRAIG algorithm (nsCRAIG). The CRAIG and nsCRAIG algorithms are theoretically equivalent to the Schur complement reduction (SCR) methods where the Conjugate Gradient (CG) method and the Full Orthogonalization Method (FOM) are applied to the associated Schur-complement equation, respectively. We extend the GKB and its nonsymmetric counterpart, used in the CRAIG and nsCRAIG algorithms respectively, to the case where the right bottom block of the saddle point system is nonzero. On this basis, we propose CRAIG and nsCRAIG algorithms for the solution of the generalized saddle point problems with the leading block being SPD and nonsymmetric positive definite (NSPD), respectively. They are also theoretically equivalent to the SCR methods with inner CG and FOM iterations for the associated Schur-complement equation, respectively. Moreover, we give algorithm steps of the two new solvers and propose appropriate stopping criteria based on an estimate of the energy norm for the error and the residual norm. Numerical comparison with MINRES or GMRES highlights the advantages of the proposed strategies in computational efficiency and/or memory requirements, and the associated implications.
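
The Schur complement reduction referred to above can be sketched as follows. The block names M, A, C, the right-hand sides f, g, the unknowns u, p, and the sign convention on the lower-right block are all assumptions chosen for illustration; the abstract does not fix a notation, and the paper may use a different but equivalent form.

```latex
\begin{bmatrix} M & A \\ A^{T} & -C \end{bmatrix}
\begin{bmatrix} u \\ p \end{bmatrix}
=
\begin{bmatrix} f \\ g \end{bmatrix},
\qquad
u = M^{-1}\,(f - A p),
\qquad
\underbrace{\left(A^{T} M^{-1} A + C\right)}_{S}\, p \;=\; A^{T} M^{-1} f - g .
```

Under these assumptions, the first block row gives u in terms of p (M is invertible because it is positive definite), and substituting into the second row yields the Schur-complement equation in p alone. Setting C = 0 recovers the classical saddle point case described at the start of the abstract.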

[101] arXiv:2509.02942 [pdf, html, other]
Title: RankGraph: Unified Heterogeneous Graph Learning for Cross-Domain Recommendation
Renzhi Wu, Junjie Yang, Li Chen, Hong Li, Li Yu, Hong Yan
Comments: RecSys 2025
Journal-ref: RecSys 2025
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)

Cross-domain recommendation systems face the challenge of integrating fine-grained user and item relationships across various product domains. To address this, we introduce RankGraph, a scalable graph learning framework designed to serve as a core component in recommendation foundation models (FMs). By constructing and leveraging graphs composed of heterogeneous nodes and edges across multiple products, RankGraph enables the integration of complex relationships between users, posts, ads, and other entities. Our framework employs a GPU-accelerated Graph Neural Network and contrastive learning, allowing for dynamic extraction of subgraphs such as item-item and user-user graphs to support similarity-based retrieval and real-time clustering. Furthermore, RankGraph integrates graph-based pretrained representations as contextual tokens into FM sequence models, enriching them with structured relational knowledge. RankGraph has demonstrated improvements in click (+0.92%) and conversion rates (+2.82%) in online A/B tests, showcasing its effectiveness in cross-domain recommendation scenarios.

[102] arXiv:2509.02943 [pdf, other]
Title: Knowledge graph-based personalized multimodal recommendation fusion framework
Yu Fang
Subjects: Information Retrieval (cs.IR)

In the contemporary age characterized by information abundance, rapid advancements in artificial intelligence have rendered recommendation systems indispensable. Conventional recommendation methodologies based on collaborative filtering or individual attributes encounter deficiencies in capturing nuanced user interests. Knowledge graphs and multimodal data integration offer enhanced representations of users and items with greater richness and precision. This paper reviews existing multimodal knowledge graph recommendation frameworks, identifying shortcomings in modal interaction and higher-order dependency modeling. We propose the Cross-Graph Cross-Modal Mutual Information-Driven Unified Knowledge Graph Learning and Recommendation Framework (CrossGMMI-DUKGLR), which employs pre-trained visual-text alignment models for feature extraction, achieves fine-grained modality fusion through multi-head cross-attention, and propagates higher-order adjacency information via graph attention networks.

[103] arXiv:2509.02946 [pdf, html, other]
Title: Deep Reinforcement Learning-Based Decision-Making Strategy Considering User Satisfaction Feedback in Demand Response Program
Xin Li, Li Ding, Qiao Lin, Zhen-Wei Yu
Comments: This version corrects equation display errors that occurred in the IEEE Xplore version. Please cite the official IEEE DOI:https://doi.org/10.1109/ICPST65050.2025.11089098
Journal-ref: 2025 IEEE 3rd International Conference on Power Science and Technology (ICPST)
Subjects: Systems and Control (eess.SY)

Demand response providers (DRPs) are intermediaries between the upper-level distribution system operator and the lower-level participants in demand response (DR) programs. Usually, DRPs act as leaders and determine electricity pricing strategies to maximize their economic revenue, while end-users adjust their power consumption following the pricing signals. However, this profit-seeking bi-level optimization model often neglects the satisfaction of end-users participating in DR programs. In addition, the detailed mathematical models underlying user decision-making strategy and satisfaction evaluation mechanism are typically unavailable to DRPs, posing significant challenges to conventional model-based solution methods. To address these issues, this paper designs a user-side satisfaction evaluation mechanism and proposes a multi-branch temporal fusion twin-delayed deep deterministic policy gradient (MBTF-TD3) reinforcement learning algorithm. User satisfaction feedback is incorporated into the reward function via a dynamically adjusted penalty term. The proposed MBTF structure effectively extracts temporal feature dependencies in the time-series observation data, and the dynamically adjusted penalty function successfully enhances the overall satisfaction level of users. Several experiments are conducted to validate the performance and the effectiveness of our proposed solution algorithm.

[104] arXiv:2509.02949 [pdf, html, other]
Title: ProMQA-Assembly: Multimodal Procedural QA Dataset on Assembly
Kimihiro Hasegawa, Wiradee Imrattanatrai, Masaki Asada, Susan Holm, Yuran Wang, Vincent Zhou, Ken Fukuda, Teruko Mitamura
Comments: 29 pages. Code and data: this https URL
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

Assistants on assembly tasks have a large potential to benefit humans from everyday tasks to industrial settings. However, no testbeds support application-oriented system evaluation in a practical setting, especially in assembly. To foster the development, we propose a new multimodal QA dataset on assembly activities. Our dataset, ProMQA-Assembly, consists of 391 QA pairs that require the multimodal understanding of human-activity recordings and their instruction manuals in an online-style manner. In the development, we adopt a semi-automated QA annotation approach, where LLMs generate candidates and humans verify them, as a cost-effective method, and further improve it by integrating fine-grained action labels to diversify question types. Furthermore, we create instruction task graphs for the target tasks of assembling toy vehicles. These newly created task graphs are used in our benchmarking experiment, as well as to facilitate the human verification process in the QA annotation. Utilizing our dataset, we benchmark models, including competitive proprietary multimodal models. Our results suggest great room for improvement for the current models. We believe our new evaluation dataset can contribute to the further development of procedural-activity assistants.

[105] arXiv:2509.02951 [pdf, html, other]
Title: Complex Scaling for the Junction of Semi-infinite Gratings
Fruzsina J. Agocs, Tristan Goodwill, Jeremy Hoskins
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)

We present and analyze an integral equation method for the scattering of a non-periodic source from a geometry consisting of two semi-infinite, periodic structures glued together in two dimensions. The two structures may involve a periodic wall, several layers of transmission surfaces with a shared period, or periodic sets of obstacles. This integral equation is posed on the infinite interface between the two periodic structures using kernels built out of the Green's function for each structure. To combat the slow decay of the Green's function, we also show that our integral equation can be analytically continued into the complex plane, where it can be truncated with exponential accuracy. A careful analysis of the domain Green's functions far from the periodic structure is then used to prove that the analytically continued equation is Fredholm index zero. Finally, we show that the solution we generate satisfies a radiation condition and demonstrate an efficient and high order solver for this problem.

[106] arXiv:2509.02952 [pdf, html, other]
Title: STAR: A Fast and Robust Rigid Registration Framework for Serial Histopathological Images
Zeyu Liu, Shengwei Ding
Comments: The code is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Registration of serial whole-slide histopathological images (WSIs) is critical for enabling direct comparison across diverse stains and for preparing paired datasets in artificial intelligence (AI) workflows such as virtual staining and biomarker prediction. While existing methods often rely on complex deformable or deep learning approaches that are computationally intensive and difficult to reproduce, lightweight rigid frameworks, which are sufficient for many consecutive-section scenarios, remain underdeveloped. We introduce STAR (Serial Tissue Alignment for Rigid registration), a fast and robust open-source framework for multi-WSI alignment. STAR integrates stain-conditioned preprocessing with a hierarchical coarse-to-fine correlation strategy, adaptive kernel scaling, and built-in quality control, achieving reliable rigid registration across heterogeneous tissue types and staining protocols, including hematoxylin-eosin (H&E), special histochemical stains (e.g., PAS, PASM, Masson's), and immunohistochemical (IHC) markers (e.g., CD31, KI67). Evaluated on the ANHIR 2019 and ACROBAT 2022 datasets spanning multiple organs and scanning conditions, STAR consistently produced stable alignments within minutes per slide, demonstrating robustness to cross-stain variability and partial tissue overlap. Beyond benchmarks, we present case studies on H&E-IHC alignment, construction of multi-IHC panels, and typical failure modes, underscoring both utility and limitations. Released as an open and lightweight tool, STAR provides a reproducible baseline that lowers the barrier for clinical adoption and enables large-scale paired data preparation for next-generation computational pathology.

[107] arXiv:2509.02958 [pdf, html, other]
Title: Lattice Annotated Temporal (LAT) Logic for Non-Markovian Reasoning
Kaustuv Mukherji, Jaikrishna Manojkumar Patil, Dyuman Aditya, Paulo Shakarian, Devendra Parkar, Lahari Pokala, Clark Dorman, Gerardo I. Simari
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Programming Languages (cs.PL)

We introduce Lattice Annotated Temporal (LAT) Logic, an extension of Generalized Annotated Logic Programs (GAPs) that incorporates temporal reasoning and supports open-world semantics through the use of a lower lattice structure. This logic combines an efficient deduction process with temporal logic programming to support non-Markovian relationships and open-world reasoning capabilities. The open-world aspect, a by-product of the use of the lower-lattice annotation structure, allows for efficient grounding through a Skolemization process, even in domains with infinite or highly diverse constants.
We provide a suite of theoretical results that bound the computational complexity of the grounding process, in addition to showing that many of the results on GAPs (using an upper lattice) still hold with the lower lattice and temporal extensions (though different proof techniques are required). Our open-source implementation, PyReason, features modular design, machine-level optimizations, and direct integration with reinforcement learning environments. Empirical evaluations across multi-agent simulations and knowledge graph tasks demonstrate up to three orders of magnitude speedup and up to five orders of magnitude memory reduction while maintaining or improving task performance. Additionally, we evaluate LAT Logic's value in reinforcement learning environments as a non-Markovian simulator, achieving up to three orders of magnitude faster simulation with improved agent performance, including a 26% increase in win rate due to capturing richer temporal dependencies. These results highlight LAT Logic's potential as a unified, extensible framework for open-world temporal reasoning in dynamic and uncertain environments. Our implementation is available at: this http URL.

[108] arXiv:2509.02962 [pdf, html, other]
Title: Resilient Multimodal Industrial Surface Defect Detection with Uncertain Sensors Availability
Shuai Jiang, Yunfeng Ma, Jingyu Zhou, Yuan Bian, Yaonan Wang, Min Liu
Comments: Accepted to IEEE/ASME Transactions on Mechatronics
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Multimodal industrial surface defect detection (MISDD) aims to identify and locate defects in industrial products by fusing RGB and 3D modalities. This article focuses on modality-missing problems caused by uncertain sensor availability in MISDD. In this context, the fusion of multiple modalities faces several difficulties, including learning mode transformation and information vacancy. To this end, we first propose cross-modal prompt learning, which includes: i) a cross-modal consistency prompt that establishes information consistency between the two visual modalities; ii) a modality-specific prompt inserted to adapt to different input patterns; iii) a missing-aware prompt attached to compensate for the information vacancy caused by dynamically missing modalities. In addition, we propose symmetric contrastive learning, which utilizes the text modality as a bridge for fusing the two visual modalities. Specifically, a paired antithetical text prompt is designed to generate binary text semantics, and triple-modal contrastive pre-training is used to accomplish multimodal learning. Experimental results show that our proposed method achieves 73.83% I-AUROC and 93.05% P-AUROC at a total missing rate of 0.7 for RGB and 3D modalities (exceeding state-of-the-art methods by 3.84% and 5.58%, respectively), and outperforms existing approaches to varying degrees under different missing types and rates. The source code will be available at this https URL.

[109] arXiv:2509.02964 [pdf, html, other]
Title: EdgeAttNet: Towards Barb-Aware Filament Segmentation
Victor Solomon, Piet Martens, Jingyu Liu, Rafal Angryk
Subjects: Computer Vision and Pattern Recognition (cs.CV); Solar and Stellar Astrophysics (astro-ph.SR); Image and Video Processing (eess.IV)

Accurate segmentation of solar filaments in H-alpha observations is critical for determining filament chirality, a key factor in the behavior of Coronal Mass Ejections (CMEs). However, existing methods often fail to capture fine-scale filament structures, particularly barbs, due to a limited ability to model long-range dependencies and spatial detail.
We propose EdgeAttNet, a segmentation architecture built on a U-Net backbone by introducing a novel, learnable edge map derived directly from the input image. This edge map is incorporated into the model by linearly transforming the attention Key and Query matrices with the edge information, thereby guiding the self-attention mechanism at the network's bottleneck to more effectively capture filament boundaries and barbs. By explicitly integrating this structural prior into the attention computations, EdgeAttNet enhances spatial sensitivity and segmentation accuracy while reducing the number of trainable parameters.
Trained end-to-end, EdgeAttNet outperforms U-Net and other U-Net-based transformer baselines on the MAGFILO dataset. It achieves higher segmentation accuracy and significantly better recognition of filament barbs, with faster inference performance suitable for practical deployment.

[110] arXiv:2509.02966 [pdf, other]
Title: KEPT: Knowledge-Enhanced Prediction of Trajectories from Consecutive Driving Frames with Vision-Language Models
Yujin Wang, Tianyi Wang, Quanfeng Liu, Wenxian Fan, Junfeng Jiao, Christian Claudel, Yunbing Yan, Bingzhao Gao, Jianqiang Wang, Hong Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Accurate short-horizon trajectory prediction is pivotal for safe and reliable autonomous driving, yet existing vision-language models (VLMs) often fail to effectively ground their reasoning in scene dynamics and domain knowledge. To address this challenge, this paper introduces KEPT, a knowledge-enhanced VLM framework that predicts ego trajectories directly from consecutive front-view driving frames. KEPT couples a temporal frequency-spatial fusion (TFSF) video encoder, trained via self-supervised learning with hard-negative mining, with a scalable k-means + HNSW retrieval stack that supplies scene-aligned exemplars. Retrieved priors are embedded into chain-of-thought (CoT) prompts with explicit planning constraints, while a triple-stage fine-tuning schedule incrementally aligns the language head to metric spatial cues, physically feasible motion, and temporally conditioned front-view planning. Evaluated on the nuScenes dataset, KEPT achieves state-of-the-art performance across open-loop protocols: under NoAvg, it achieves 0.70m average L2 with a 0.21% collision rate; under TemAvg with lightweight ego status, it attains 0.31m average L2 and a 0.07% collision rate. Ablation studies show that all three fine-tuning stages contribute complementary benefits, and that using Top-2 retrieved exemplars yields the best accuracy-safety trade-off. The k-means-clustered HNSW index delivers sub-millisecond retrieval latency, supporting practical deployment. These results indicate that retrieval-augmented, CoT-guided VLMs offer a promising, data-efficient pathway toward interpretable and trustworthy autonomous driving.

[111] arXiv:2509.02967 [pdf, html, other]
Title: AR-KAN: Autoregressive-Weight-Enhanced Kolmogorov-Arnold Network for Time Series Forecasting
Chen Zeng, Tiehang Xu, Qiao Wang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)

Conventional neural networks frequently face challenges in spectral analysis of signals. To address this challenge, Fourier neural networks (FNNs) and similar approaches integrate components of Fourier series into the structure of neural networks. Nonetheless, a significant hurdle is often overlooked: the superposition of periodic signals does not necessarily result in a periodic signal. For example, when forecasting almost periodic functions composed of signals with incommensurate frequencies, traditional models such as Autoregressive Integrated Moving Average (ARIMA) frequently outperform most neural networks, including large language models (LLMs). To address this problem, we propose AR-KAN, an autoregressive-weight-enhanced hybrid model that combines the benefits of both approaches. Using the Universal Myopic Mapping Theorem, we apply a Kolmogorov-Arnold Network (KAN) for the static nonlinear part and include memory through a pre-trained AR component, which can be interpreted as retaining the most useful information while eliminating redundancy. Experimental results indicate that AR-KAN delivers superior results on 72% of real-world datasets.

[112] arXiv:2509.02968 [pdf, html, other]
Title: Spiking control systems for soft robotics: a rhythmic case study in a soft robotic crawler
Juncal Arbelaiz, Alessio Franci, Naomi Ehrich Leonard, Rodolphe Sepulchre, Bassam Bamieh
Subjects: Systems and Control (eess.SY)

Inspired by spiking neural feedback, we propose a spiking controller for efficient locomotion in a soft robotic crawler. Its bistability, akin to neural fast positive feedback, combined with a sensorimotor slow negative feedback loop, generates rhythmic spiking. The closed-loop system is robust through the quantized actuation, and negative feedback ensures efficient locomotion with minimal external tuning. We prove that peristaltic waves arise from a supercritical Hopf bifurcation controlled by the sensorimotor gain. Dimensional analysis reveals a separation of mechanical and electrical timescales, and Geometric Singular Perturbation analysis explains endogenous crawling through relaxation oscillations. We further formulate and analytically solve an optimization problem in the singularly perturbed regime, proving that crawling at mechanical resonance maximizes speed by a matching of neuromechanical scales. Given the importance and ubiquity of rhythms and waves in soft-bodied locomotion, we envision that spiking control systems could be utilized in a variety of soft-robotic morphologies and modular distributed architectures, yielding significant robustness, adaptability, and energetic gains across scales.

[113] arXiv:2509.02969 [pdf, html, other]
Title: VQualA 2025 Challenge on Engagement Prediction for Short Videos: Methods and Results
Dasong Li, Sizhuo Ma, Hang Hua, Wenjie Li, Jian Wang, Chris Wei Zhou, Fengbin Guan, Xin Li, Zihao Yu, Yiting Lu, Ru-Ling Liao, Yan Ye, Zhibo Chen, Wei Sun, Linhan Cao, Yuqin Cao, Weixia Zhang, Wen Wen, Kaiwei Zhang, Zijian Chen, Fangfang Lu, Xiongkuo Min, Guangtao Zhai, Erjia Xiao, Lingfeng Zhang, Zhenjie Su, Hao Cheng, Yu Liu, Renjing Xu, Long Chen, Xiaoshuai Hao, Zhenpeng Zeng, Jianqin Wu, Xuxu Wang, Qian Yu, Bo Hu, Weiwei Wang, Pinxin Liu, Yunlong Tang, Luchuan Song, Jinxi He, Jiaru Wu, Hanjia Lyu
Comments: ICCV 2025 VQualA workshop EVQA track
Journal-ref: ICCV 2025 Workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Social and Information Networks (cs.SI)

This paper presents an overview of the VQualA 2025 Challenge on Engagement Prediction for Short Videos, held in conjunction with ICCV 2025. The challenge focuses on understanding and modeling the popularity of user-generated content (UGC) short videos on social media platforms. To support this goal, the challenge uses a new short-form UGC dataset featuring engagement metrics derived from real-world user interactions. The objective of the challenge is to promote robust modeling strategies that capture the complex factors influencing user engagement. Participants explored a variety of multi-modal features, including visual content, audio, and metadata provided by creators. The challenge attracted 97 participants and received 15 valid test submissions, contributing significantly to progress in short-form UGC video engagement prediction.

[114] arXiv:2509.02970 [pdf, html, other]
Title: Delayed Momentum Aggregation: Communication-efficient Byzantine-robust Federated Learning with Partial Participation
Kaoru Otsuka, Yuki Takezawa, Makoto Yamada
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

Federated Learning (FL) allows distributed model training across multiple clients while preserving data privacy, but it remains vulnerable to Byzantine clients that exhibit malicious behavior. While existing Byzantine-robust FL methods provide strong convergence guarantees (e.g., to a stationary point in expectation) under Byzantine attacks, they typically assume full client participation, which is unrealistic due to communication constraints and client availability. Under partial participation, existing methods fail immediately after the sampled clients contain a Byzantine majority, creating a fundamental challenge for sparse communication. First, we introduce delayed momentum aggregation, a novel principle where the server aggregates the most recently received gradients from non-participating clients alongside fresh momentum from active clients. Our optimizer D-Byz-SGDM (Delayed Byzantine-robust SGD with Momentum) implements this delayed momentum aggregation principle for Byzantine-robust FL with partial participation. Then, we establish convergence guarantees that recover previous full participation results and match the fundamental lower bounds we prove for the partial participation setting. Experiments on deep learning tasks validated our theoretical findings, showing stable and robust training under various Byzantine attacks.
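
The server-side idea described above can be illustrated with a small sketch. The code below is not the D-Byz-SGDM algorithm itself. It assumes that the server caches the latest vector received from every client and applies a coordinate-wise median as the robust aggregator, a choice made here only for illustration. The names `client_momentum` and `DelayedMomentumServer` are hypothetical.

```python
from statistics import median

def client_momentum(prev_momentum, grad, beta=0.9):
    # Exponential-moving-average momentum on an active client.
    # (The exact momentum rule in the paper may differ.)
    return [beta * m + (1.0 - beta) * g for m, g in zip(prev_momentum, grad)]

class DelayedMomentumServer:
    """Caches the most recent vector from each client and aggregates all of them."""

    def __init__(self, num_clients, dim):
        # Clients that have never reported contribute a zero vector.
        self.latest = [[0.0] * dim for _ in range(num_clients)]

    def receive(self, client_id, momentum):
        # Fresh momentum from a client sampled in the current round.
        self.latest[client_id] = list(momentum)

    def aggregate(self):
        # Coordinate-wise median over fresh AND stale entries, so a round
        # whose sampled clients are mostly Byzantine cannot dominate.
        return [median(column) for column in zip(*self.latest)]
```

For example, if four honest clients reported `[1.0, 1.0]` in earlier rounds and a later round samples only one Byzantine client sending `[1000.0, -1000.0]`, the aggregate over the cached vectors is still `[1.0, 1.0]`.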

[115] arXiv:2509.02972 [pdf, html, other]
Title: IL-SLAM: Intelligent Line-assisted SLAM Based on Feature Awareness for Dynamic Environments
Haolan Zhang, Thanh Nguyen Canh, Chenghao Li, Ruidong Yang, Yonghoon Ji, Nak Young Chong
Comments: submitted to International Conference on Robotic Computing and Communication (IEEE IRC)
Subjects: Robotics (cs.RO)

Visual Simultaneous Localization and Mapping (SLAM) plays a crucial role in autonomous systems. Traditional SLAM methods, based on static environment assumptions, struggle to handle complex dynamic environments. Recent dynamic SLAM systems employ geometric constraints and deep learning to remove dynamic features, yet this creates a new challenge: insufficient remaining point features for subsequent SLAM processes. Existing solutions address this by continuously introducing additional line and plane features to supplement point features, achieving robust tracking and pose estimation. However, current methods continuously introduce additional features regardless of necessity, causing two problems: unnecessary computational overhead and potential performance degradation from accumulated low-quality additional features and noise. To address these issues, this paper proposes a feature-aware mechanism that evaluates whether current features are adequate to determine if line feature support should be activated. This decision mechanism enables the system to introduce line features only when necessary, significantly reducing computational complexity of additional features while minimizing the introduction of low-quality features and noise. In subsequent processing, the introduced line features assist in obtaining better initial camera poses through tracking, local mapping, and loop closure, but are excluded from global optimization to avoid potential negative impacts from low-quality additional features in long-term process. Extensive experiments on TUM datasets demonstrate substantial improvements in both ATE and RPE metrics compared to ORB-SLAM3 baseline and superior performance over other dynamic SLAM and multi-feature methods.

[116] arXiv:2509.02973 [pdf, html, other]
Title: InstaDA: Augmenting Instance Segmentation Data with Dual-Agent System
Xianbao Hou, Yonghao He, Zeyd Boukhers, John See, Hu Su, Wei Sui, Cong Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Acquiring high-quality instance segmentation data is challenging due to the labor-intensive nature of the annotation process and significant class imbalances within datasets. Recent studies have utilized the integration of Copy-Paste and diffusion models to create more diverse datasets. However, these studies often lack deep collaboration between large language models (LLMs) and diffusion models, and underutilize the rich information within the existing training data. To address these limitations, we propose InstaDA, a novel, training-free Dual-Agent system designed to augment instance segmentation datasets. First, we introduce a Text-Agent (T-Agent) that enhances data diversity through collaboration between LLMs and diffusion models. This agent features a novel Prompt Rethink mechanism, which iteratively refines prompts based on the generated images. This process not only fosters collaboration but also increases image utilization and optimizes the prompts themselves. Additionally, we present an Image-Agent (I-Agent) aimed at enriching the overall data distribution. This agent augments the training set by generating new instances conditioned on the training images. To ensure practicality and efficiency, both agents operate as independent and automated workflows, enhancing usability. Experiments conducted on the LVIS 1.0 validation set indicate that InstaDA achieves significant improvements, with an increase of +4.0 in box average precision (AP) and +3.3 in mask AP compared to the baseline. Furthermore, it outperforms the leading model, DiverGen, by +0.3 in box AP and +0.1 in mask AP, with a notable +0.7 gain in box AP on common categories and mask AP gains of +0.2 on common categories and +0.5 on frequent categories.

[117] arXiv:2509.02981 [pdf, html, other]
Title: AdaGrad Meets Muon: Adaptive Stepsizes for Orthogonal Updates
Minxin Zhang, Yuxuan Liu, Hayden Schaeffer
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

The recently proposed Muon optimizer updates weight matrices via orthogonalized momentum and has demonstrated strong empirical success in large language model training. However, it remains unclear how to determine the learning rates for such orthogonalized updates. AdaGrad, by contrast, is a widely used adaptive method that scales stochastic gradients by accumulated past gradients. We propose a new algorithm, AdaGO, which combines a norm-based AdaGrad-type stepsize with an orthogonalized update direction, bringing together the benefits of both approaches. Unlike other adaptive variants of Muon, AdaGO preserves the orthogonality of the update direction, which can be interpreted as a spectral descent direction, while adapting the stepsizes to the optimization landscape by scaling the direction with accumulated past gradient norms. The implementation of AdaGO requires only minimal modification to Muon, with a single additional scalar variable, the accumulated squared gradient norms, to be computed, making it computationally and memory efficient. Optimal theoretical convergence rates are established for nonconvex functions in both stochastic and deterministic settings under standard smoothness and unbiased bounded-variance noise assumptions. Empirical results on CIFAR-10 classification and function regression demonstrate that AdaGO outperforms Muon and Adam.
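
A rough sketch of how a norm-based AdaGrad-type stepsize can be paired with an orthogonalized direction is given below. It is not the AdaGO update rule from the paper. The stepsize `lr / sqrt(accumulated squared gradient norms + eps)`, the momentum rule, and the cubic Newton-Schulz orthogonalization are all assumptions chosen for illustration, and every name in the code is hypothetical.

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_norm(A):
    return math.sqrt(sum(x * x for row in A for x in row))

def orthogonalize(M, iterations=15):
    """Approximate the orthogonal polar factor of M.

    Uses the cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X after
    scaling by the Frobenius norm (an illustrative choice, not Muon's
    tuned iteration).
    """
    scale = frobenius_norm(M)
    if scale == 0.0:
        return [row[:] for row in M]
    X = [[x / scale for x in row] for row in M]
    for _ in range(iterations):
        XXtX = matmul(matmul(X, transpose(X)), X)
        X = [[1.5 * x - 0.5 * y for x, y in zip(rx, ry)] for rx, ry in zip(X, XXtX)]
    return X

class AdaGradOrthogonalSketch:
    """Orthogonalized momentum direction with one scalar adaptive stepsize."""

    def __init__(self, shape, lr=1.0, beta=0.0, eps=1e-12):
        rows, cols = shape
        self.momentum = [[0.0] * cols for _ in range(rows)]
        self.acc = 0.0  # the single extra scalar: accumulated squared gradient norms
        self.lr, self.beta, self.eps = lr, beta, eps

    def step(self, W, G):
        self.momentum = [[self.beta * m + g for m, g in zip(rm, rg)]
                         for rm, rg in zip(self.momentum, G)]
        self.acc += frobenius_norm(G) ** 2
        stepsize = self.lr / math.sqrt(self.acc + self.eps)
        D = orthogonalize(self.momentum)
        # The direction stays (approximately) orthogonal; only its scale adapts.
        return [[w - stepsize * d for w, d in zip(rw, rd)] for rw, rd in zip(W, D)]
```

Because the stepsize is a scalar, scaling the direction does not disturb its orthogonality, which is the property the abstract emphasizes.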

[118] arXiv:2509.02982 [pdf, html, other]
Title: StableSleep: Source-Free Test-Time Adaptation for Sleep Staging with Lightweight Safety Rails
Hritik Arasu, Faisal R Jahangiri
Comments: 5 page paper, 8 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)

Sleep staging models often degrade when deployed on patients with unseen physiology or recording conditions. We propose a streaming, source-free test-time adaptation (TTA) recipe that combines entropy minimization (Tent) with Batch-Norm statistic refresh and two safety rails: an entropy gate to pause adaptation on uncertain windows and an EMA-based reset to reel back drift. On Sleep-EDF Expanded, using single-lead EEG (Fpz-Cz, 100 Hz, 30 s epochs; R&K to AASM mapping), we show consistent gains over a frozen baseline at seconds-level latency and minimal memory, reporting per-stage metrics and Cohen's kappa. The method is model-agnostic, requires no source data or patient calibration, and is practical for on-device or bedside use.
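
The two safety rails can be pictured with a small decision sketch. The abstract does not specify thresholds or the exact reset rule, so the version below is an assumption: the gate compares normalized prediction entropy with a fixed threshold, and the reset fires when an exponential moving average (EMA) of that entropy exceeds a second threshold. The class name `GatedAdapter` and all parameter values are hypothetical, and the actual Tent parameter update is left out.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def normalized_entropy(probs):
    # Shannon entropy divided by log(num_classes), so the value lies in [0, 1].
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))

class GatedAdapter:
    """Decides per window whether to adapt, skip, or reset (illustrative only)."""

    def __init__(self, gate=0.5, reset_level=0.8, decay=0.9):
        self.gate = gate                # assumed entropy-gate threshold
        self.reset_level = reset_level  # assumed drift threshold on the EMA
        self.decay = decay
        self.ema_entropy = 0.0

    def observe(self, logits):
        h = normalized_entropy(softmax(logits))
        self.ema_entropy = self.decay * self.ema_entropy + (1.0 - self.decay) * h
        if self.ema_entropy > self.reset_level:
            # Sustained uncertainty: the caller would restore saved weights here.
            self.ema_entropy = 0.0
            return "reset"
        # Confident window: the caller would take one entropy-minimization step.
        return "adapt" if h < self.gate else "skip"
```

A caller would run the model on each incoming window, pass the logits to `observe`, and act on the returned decision.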

[119] arXiv:2509.02983 [pdf, html, other]
Title: DUViN: Diffusion-Based Underwater Visual Navigation via Knowledge-Transferred Depth Features
Jinghe Yang, Minh-Quan Le, Mingming Gong, Ye Pu
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Autonomous underwater navigation remains a challenging problem due to limited sensing capabilities and the difficulty of constructing accurate maps in underwater environments. In this paper, we propose a Diffusion-based Underwater Visual Navigation policy via knowledge-transferred depth features, named DUViN, which enables vision-based end-to-end 4-DoF motion control for underwater vehicles in unknown environments. DUViN guides the vehicle to avoid obstacles and maintain a safe, perception-aware altitude relative to the terrain without relying on pre-built maps. To address the difficulty of collecting large-scale underwater navigation datasets, we propose a method that ensures robust generalization under domain shifts from in-air to underwater environments by leveraging depth features and introducing a novel model transfer strategy. Specifically, our training framework consists of two phases. First, we train the diffusion-based visual navigation policy on in-air datasets using a pre-trained depth feature extractor. Second, we retrain the extractor on an underwater depth estimation task and integrate the adapted extractor into the trained navigation policy from the first phase. Experiments in both simulated and real-world underwater environments demonstrate the effectiveness and generalization of our approach. The experimental videos are available at this https URL.

[120] arXiv:2509.02986 [pdf, html, other]
Title: CTBC: Contact-Triggered Blind Climbing for Wheeled Bipedal Robots with Instruction Learning and Reinforcement Learning
Rankun Li, Hao Wang, Qi Li, Zhuo Han, Yifei Chu, Linqi Ye, Wende Xie, Wenlong Liao
Subjects: Robotics (cs.RO)

In recent years, wheeled bipedal robots have gained increasing attention due to their advantages in mobility, such as high-speed locomotion on flat terrain. However, their performance in complex environments (e.g., staircases) remains inferior to that of traditional legged robots. To overcome this limitation, we propose a general contact-triggered blind climbing (CTBC) framework for wheeled bipedal robots. Upon detecting wheel-obstacle contact, the robot triggers a leg-lifting motion to overcome the obstacle. By leveraging a strongly-guided feedforward trajectory, our method enables the robot to rapidly acquire agile leg-lifting skills, significantly enhancing its capability to traverse unstructured terrain. The approach has been experimentally validated and successfully deployed on LimX Dynamics' wheeled bipedal robot, Tron1. Real-world tests demonstrate that Tron1 can reliably climb obstacles well beyond its wheel radius using only proprioceptive feedback.

[121] arXiv:2509.02990 [pdf, html, other]
Title: Automatically Generating High-Precision Simulated Road Networking in Traffic Scenario
Liang Xie, Wenke Huang
Comments: 7 pages, 11 figures
Journal-ref: ACM MOBICOM 2025
Subjects: Multimedia (cs.MM)

Existing lane-level simulation road network generation is labor-intensive, resource-demanding, and costly due to the need for large-scale data collection and manual post-editing. To overcome these limitations, we propose automatically generating high-precision simulated road networks in traffic scenario, an efficient and fully automated solution. Initially, real-world road street view data is collected through open-source street view map platforms, and a large-scale street view lane line dataset is constructed to provide a robust foundation for subsequent analysis. Next, an end-to-end lane line detection approach based on deep learning is designed, where a neural network model is trained to accurately detect the number and spatial distribution of lane lines in street view images, enabling automated extraction of lane information. Subsequently, by integrating coordinate transformation and map matching algorithms, the extracted lane information from street views is fused with the foundational road topology obtained from open-source map service platforms, resulting in the generation of a high-precision lane-level simulation road network. This method significantly reduces the costs associated with data collection and manual editing while enhancing the efficiency and accuracy of simulation road network generation. It provides reliable data support for urban traffic simulation, autonomous driving navigation, and the development of intelligent transportation systems, offering a novel technical pathway for the automated modeling of large-scale urban road networks.

[122] arXiv:2509.02993 [pdf, html, other]
Title: SPENet: Self-guided Prototype Enhancement Network for Few-shot Medical Image Segmentation
Chao Fan, Xibin Jia, Anqi Xiao, Hongyuan Yu, Zhenghan Yang, Dawei Yang, Hui Xu, Yan Huang, Liang Wang
Comments: Accepted by MICCAI2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Few-Shot Medical Image Segmentation (FSMIS) aims to segment novel classes of medical objects using only a few labeled images. Prototype-based methods have made significant progress in addressing FSMIS. However, they typically generate a single global prototype for the support image to match with the query image, overlooking intra-class variations. To address this issue, we propose a Self-guided Prototype Enhancement Network (SPENet). Specifically, we introduce a Multi-level Prototype Generation (MPG) module, which enables multi-granularity measurement between the support and query images by simultaneously generating a global prototype and an adaptive number of local prototypes. Additionally, we observe that not all local prototypes in the support image are beneficial for matching, especially when there are substantial discrepancies between the support and query images. To alleviate this issue, we propose a Query-guided Local Prototype Enhancement (QLPE) module, which adaptively refines support prototypes by incorporating guidance from the query image, thus mitigating the negative effects of such discrepancies. Extensive experiments on three public medical datasets demonstrate that SPENet outperforms existing state-of-the-art methods, achieving superior performance.

[123] arXiv:2509.02998 [pdf, html, other]
Title: Integrating Generative AI into Cybersecurity Education: A Study of OCR and Multimodal LLM-assisted Instruction
Karan Patel, Yu-Zheng Lin, Gaurangi Raul, Bono Po-Jen Shih, Matthew W. Redondo, Banafsheh Saber Latibari, Jesus Pacheco, Soheil Salehi, Pratik Satam
Comments: 9 pages, 3 figures, accepted by IEEE FIE 2025
Subjects: Computers and Society (cs.CY); Cryptography and Security (cs.CR)

This full paper describes LLM-assisted instruction integrated with a virtual cybersecurity lab platform. The digital transformation of Fourth Industrial Revolution (4IR) systems is reshaping workforce needs and widening skill gaps, especially among older workers. With rising emphasis on robotics, automation, AI, and security, re-skilling and up-skilling are essential. Generative AI can help build this workforce by acting as an instructional assistant to support skill acquisition during experiential learning. We present a generative AI instructional assistant integrated into a prior experiential learning platform. The assistant employs a zero-shot OCR-LLM pipeline within the legacy Cybersecurity Labs-as-a-Service (CLaaS) platform (2015). Text is extracted from slide images using Tesseract OCR, then simplified instructions are generated via a general-purpose LLM, enabling real-time instructional support with minimal infrastructure. The system was evaluated in a live university course where student feedback (n=42) averaged 7.83/10, indicating strong perceived usefulness. A comparative study with multimodal LLMs that directly interpret slide images showed higher performance on visually dense slides, but the OCR-LLM pipeline provided comparable pedagogical value on text-centric slides with much lower computational overhead and cost. This work demonstrates that a lightweight, easily integrable pipeline can effectively extend legacy platforms with modern generative AI, offering scalable enhancements for student comprehension in technical education.

[124] arXiv:2509.02999 [pdf, html, other]
Title: DiaCBT: A Long-Periodic Dialogue Corpus Guided by Cognitive Conceptualization Diagram for CBT-based Psychological Counseling
Yougen Zhou, Ningning Zhou, Qin Chen, Jie Zhou, Aimin Zhou, Liang He
Subjects: Computation and Language (cs.CL)

Psychotherapy reaches only a small fraction of individuals suffering from mental disorders due to social stigma and the limited availability of therapists. Large language models (LLMs), when equipped with professional psychotherapeutic skills, offer a promising solution to expand access to mental health services. However, the lack of psychological conversation datasets presents significant challenges in developing effective psychotherapy-guided conversational agents. In this paper, we construct a long-periodic dialogue corpus for counseling based on cognitive behavioral therapy (CBT). Our curated dataset includes multiple sessions for each counseling and incorporates cognitive conceptualization diagrams (CCDs) to guide client simulation across diverse scenarios. To evaluate the utility of our dataset, we train an in-depth counseling model and present a comprehensive evaluation framework to benchmark it against established psychological criteria for CBT-based counseling. Results demonstrate that DiaCBT effectively enhances LLMs' ability to emulate psychologists with CBT expertise, underscoring its potential for training more professional counseling agents.

[125] arXiv:2509.03000 [pdf, html, other]
Title: Closing the Visibility Gap: A Monitoring Framework for Verifiable Open RAN Operations
Hexuan Yu, Md Mohaimin Al Barat, Yang Xiao, Y. Thomas Hou, Wenjing Lou
Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR)

Open Radio Access Network (Open RAN) is reshaping mobile network architecture by promoting openness, disaggregation, and cross-vendor interoperability. However, this architectural flexibility introduces new security challenges, especially in deployments where multiple mobile network operators (MNOs) jointly operate shared components. Existing Zero Trust Architectures (ZTA) in O-RAN, as defined by governmental and industry standards, implicitly assume that authenticated components will comply with operational policies. However, this assumption creates a critical blind spot: misconfigured or compromised components can silently violate policies, misuse resources, or corrupt downstream processes (e.g., ML-based RIC xApps).
To address this critical gap, we propose a monitoring framework for low-trust O-RAN environments that proactively verifies configuration state and control behavior against tenant-defined policies. Our system provides scalable, verifiable oversight to enhance transparency and trust in O-RAN operations. We implement and evaluate the framework using standardized O-RAN configurations, with total processing latency of approximately 200 ms, demonstrating its efficiency and practicality for timely policy enforcement and compliance auditing in multi-MNO deployments.

[126] arXiv:2509.03002 [pdf, html, other]
Title: SOPSeg: Prompt-based Small Object Instance Segmentation in Remote Sensing Imagery
Chenhao Wang, Yingrui Ji, Yu Meng, Yunjian Zhang, Yao Zhu
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Extracting small objects from remote sensing imagery plays a vital role in various applications, including urban planning, environmental monitoring, and disaster management. While current research primarily focuses on small object detection, instance segmentation for small objects remains underexplored, with no dedicated datasets available. This gap stems from the technical challenges and high costs of pixel-level annotation for small objects. While the Segment Anything Model (SAM) demonstrates impressive zero-shot generalization, its performance on small-object segmentation deteriorates significantly, largely due to the coarse 1/16 feature resolution that causes severe loss of fine spatial details. To this end, we propose SOPSeg, a prompt-based framework specifically designed for small object segmentation in remote sensing imagery. It incorporates a region-adaptive magnification strategy to preserve fine-grained details, and employs a customized decoder that integrates edge prediction and progressive refinement for accurate boundary delineation. Moreover, we introduce a novel prompting mechanism tailored to the oriented bounding boxes widely adopted in remote sensing applications. SOPSeg outperforms existing methods in small object segmentation and facilitates efficient dataset construction for remote sensing tasks. We further construct a comprehensive small object instance segmentation dataset based on SODA-A, and will release both the model and dataset to support future research.

[127] arXiv:2509.03006 [pdf, html, other]
Title: Enhancing Robustness in Post-Processing Watermarking: An Ensemble Attack Network Using CNNs and Transformers
Tzuhsuan Huang, Cheng Yu Yeo, Tsai-Ling Huang, Hong-Han Shuai, Wen-Huang Cheng, Jun-Cheng Chen
Comments: 10 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent studies on deep watermarking have predominantly focused on in-processing watermarking, which integrates the watermarking process into image generation. However, post-processing watermarking, which embeds watermarks after image generation, offers more flexibility. It can be applied to outputs from any generative model (e.g., GANs, diffusion models) without needing access to the model's internal structure. It also allows users to embed unique watermarks into individual images. Therefore, this study focuses on post-processing watermarking and enhances its robustness by incorporating an ensemble attack network during training. We construct various versions of attack networks using CNN and Transformer in both spatial and frequency domains to investigate how each combination influences the robustness of the watermarking model. Our results demonstrate that combining a CNN-based attack network in the spatial domain with a Transformer-based attack network in the frequency domain yields the highest robustness in watermarking models. Extensive evaluation on the WAVES benchmark, using average bit accuracy as the metric, demonstrates that our ensemble attack network significantly enhances the robustness of baseline watermarking methods under various stress tests. In particular, for the Regeneration Attack defined in WAVES, our method improves StegaStamp by 18.743%. The code is released at: this https URL.
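
The abstract above reports results as average bit accuracy. As a minimal sketch of how that metric is commonly computed (the function names and the toy bit strings are illustrative assumptions, not taken from the paper or from the WAVES code base), the fraction of matching bits is taken per image and then averaged:

```python
import numpy as np

def bit_accuracy(embedded_bits, decoded_bits):
    """Fraction of watermark bits recovered correctly for one image."""
    embedded = np.asarray(embedded_bits)
    decoded = np.asarray(decoded_bits)
    return float(np.mean(embedded == decoded))

def average_bit_accuracy(pairs):
    """Mean bit accuracy over (embedded, decoded) pairs, one pair per image."""
    return float(np.mean([bit_accuracy(e, d) for e, d in pairs]))

# Hypothetical 4-bit messages: one bit flipped in the first image, none in the second.
pairs = [([1, 0, 1, 1], [1, 0, 0, 1]),
         ([0, 1, 1, 0], [0, 1, 1, 0])]
print(average_bit_accuracy(pairs))
```

A decoder that guesses at random would score about 0.5 on this metric, which is why robustness gains are usually read relative to that floor rather than to zero.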

[128] arXiv:2509.03010 [pdf, html, other]
Title: Mitigating Data Imbalance in Automated Speaking Assessment
Fong-Chun Tsai, Kuan-Tang Huang, Bi-Cheng Yan, Tien-Hong Lo, Berlin Chen
Comments: Submitted to APSIPA 2025
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Automated Speaking Assessment (ASA) plays a crucial role in evaluating second-language (L2) learners' proficiency. However, ASA models often suffer from class imbalance, leading to biased predictions. To address this, we introduce a novel objective for training ASA models, dubbed the Balancing Logit Variation (BLV) loss, which perturbs model predictions to improve feature representation for minority classes without modifying the dataset. Evaluations on the ICNALE benchmark dataset show that integrating the BLV loss into a celebrated text-based (BERT) model significantly enhances classification accuracy and fairness, making automated speech evaluation more robust for diverse learners.

[129] arXiv:2509.03011 [pdf, html, other]
Title: Lesion-Aware Visual-Language Fusion for Automated Image Captioning of Ulcerative Colitis Endoscopic Examinations
Alexis Ivan Lopez Escamilla, Gilberto Ochoa, Sharib Al
Comments: Miccai Demi Conference 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

We present a lesion-aware image captioning framework for ulcerative colitis (UC). The model integrates ResNet embeddings, Grad-CAM heatmaps, and CBAM-enhanced attention with a T5 decoder. Clinical metadata (MES score 0-3, vascular pattern, bleeding, erythema, friability, ulceration) is injected as natural-language prompts to guide caption generation. The system produces structured, interpretable descriptions aligned with clinical practice and provides MES classification and lesion tags. Compared with baselines, our approach improves caption quality and MES classification accuracy, supporting reliable endoscopic reporting.

[130] arXiv:2509.03012 [pdf, html, other]
Title: Uncertainty-aware Test-Time Training (UT$^3$) for Efficient On-the-fly Domain Adaptive Dense Regression
Uddeshya Upadhyay
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Deep neural networks (DNNs) are increasingly being used in autonomous systems. However, DNNs do not generalize well to domain shift. Adapting to a continuously evolving environment is a safety-critical challenge inevitably faced by all autonomous systems deployed to the real world. Recent work on test-time training proposes methods that adapt to a new test distribution on the fly by optimizing the DNN model for each test input using self-supervision. However, these techniques result in a sharp increase in inference time, as multiple forward and backward passes are required for a single test sample (for test-time training) before finally making the prediction based on the fine-tuned features. This is undesirable for real-world robotics applications where these models may be deployed to resource-constrained hardware with strong latency requirements. In this work, we propose a new framework (called UT$^3$) that leverages test-time training for improved performance in the presence of continuous domain shift while also decreasing the inference time, making it suitable for real-world applications. Our method proposes an uncertainty-aware self-supervision task for efficient test-time training that leverages the quantified uncertainty to selectively apply the training, leading to sharp improvements in the inference time while performing comparably to the standard test-time training protocol. Our proposed protocol offers a continuous setting to identify the selected keyframes, allowing the end-user to control how often to apply test-time training. We demonstrate the efficacy of our method on a dense regression task: monocular depth estimation.

[131] arXiv:2509.03015 [pdf, html, other]
Title: Harnessing Batched BLAS/LAPACK Kernels on GPUs for Parallel Solutions of Block Tridiagonal Systems
David Jin, Alexis Montoison, Sungho Shin
Subjects: Mathematical Software (cs.MS)

We present a GPU implementation for the factorization and solution of block-tridiagonal symmetric positive definite linear systems, which commonly arise in time-dependent estimation and optimal control problems. Our method employs a recursive algorithm based on Schur complement reduction, transforming the system into a hierarchy of smaller, independent blocks that can be efficiently solved in parallel using batched BLAS/LAPACK routines. While batched routines have been used in sparse solvers, our approach applies these kernels in a tailored way by exploiting the block-tridiagonal structure known in advance. Performance benchmarks based on our open-source, cross-platform implementation, TBD-GPU, demonstrate the advantages of this tailored utilization: achieving substantial speed-ups compared to state-of-the-art CPU direct solvers, including CHOLMOD and HSL MA57, while remaining competitive with NVIDIA cuDSS. However, the current implementation still performs sequential calls of batched routines at each recursion level, and the block size must be sufficiently large to adequately amortize kernel launch overhead.
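
To make the Schur complement reduction described above concrete, here is a minimal CPU sketch using odd-even (cyclic) reduction in NumPy. It is not the authors' TBD-GPU code: the function name, the storage layout (`D` for diagonal blocks, `U` for super-diagonal blocks), and the use of stacked `np.linalg.solve` calls as a stand-in for batched Cholesky routines are all assumptions made for illustration. At each level the odd-indexed blocks are eliminated independently of one another, which is the step that maps naturally onto batched kernels.

```python
import numpy as np

def solve_block_tridiag(D, U, b):
    """Solve A x = b for symmetric block-tridiagonal A by odd-even reduction.

    D: (n, m, m) diagonal blocks; U: (n-1, m, m) with A[i, i+1] = U[i] and
    A[i+1, i] = U[i].T; b: (n, m). Returns x with shape (n, m).
    """
    n, m, _ = D.shape
    if n == 1:
        return np.linalg.solve(D[0], b[0])[None, :]
    Up = np.concatenate([U, np.zeros((1, m, m))])  # pad so that Up[n-1] = 0
    odd = np.arange(1, n, 2)
    no, ne = len(odd), (n + 1) // 2
    r = ne - 1                      # odd blocks that have a right even neighbour
    Do, UL, UR = D[odd], Up[odd - 1], Up[odd]
    # Stacked (batched) solves against all odd diagonal blocks at once.
    WL = np.linalg.solve(Do, np.transpose(UL, (0, 2, 1)))  # D_i^{-1} A[i, i-1]
    WR = np.linalg.solve(Do, UR)                           # D_i^{-1} A[i, i+1]
    y = np.linalg.solve(Do, b[odd][..., None])[..., 0]     # D_i^{-1} b_i
    # Schur complement onto the even-indexed blocks.
    De, be = D[0::2].copy(), b[0::2].copy()
    De[:no] -= UL @ WL
    be[:no] -= np.einsum('pij,pj->pi', UL, y)
    URt = np.transpose(UR[:r], (0, 2, 1))
    De[1:] -= URt @ WR[:r]
    be[1:] -= np.einsum('pij,pj->pi', URt, y[:r])
    Ue = -(UL[:r] @ WR[:r])         # new coupling between consecutive even blocks
    xe = solve_block_tridiag(De, Ue, be)
    # Back-substitute for the eliminated odd blocks.
    x = np.empty_like(b)
    x[0::2] = xe
    xr = np.zeros((no, m))
    xr[:r] = xe[1:]
    x[odd] = (y - np.einsum('pij,pj->pi', WL, xe[:no])
                - np.einsum('pij,pj->pi', WR, xr))
    return x
```

The recursion halves the number of blocks at each level, so a system with n blocks needs roughly log2(n) levels, each consisting of a handful of stacked solves and matrix products. The levels themselves still run one after another, which matches the limitation the abstract notes about sequential calls at each recursion level.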

[132] arXiv:2509.03018 [pdf, html, other]
Title: Mycroft: Tracing Dependencies in Collective Communication Towards Reliable LLM Training
Yangtao Deng, Lei Zhang, Qinlong Wang, Xiaoyun Zhi, Xinlei Zhang, Zhuo Jiang, Haohan Xu, Lei Wang, Zuquan Song, Gaohong Liu, Yang Bai, Shuguang Wang, Wencong Xiao, Jianxi Ye, Minlan Yu, Hong Xu
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

Reliability is essential for ensuring efficiency in LLM training. However, many real-world reliability issues remain difficult to resolve, resulting in wasted resources and degraded model performance. Unfortunately, today's collective communication libraries operate as black boxes, hiding critical information needed for effective root cause analysis. We propose Mycroft, a lightweight distributed tracing and root cause analysis system designed to address previously hidden reliability issues in collective communication. Mycroft's key idea is to trace collective communication states and leverage internal control and data dependencies to resolve reliability problems in LLM training. Mycroft has been deployed at ByteDance for over six months to debug collective communication related issues at runtime. It detected anomalies within 15 seconds in 90% of cases and identified the root cause within 20 seconds in 60% of cases. We also conducted extensive fault injection experiments to demonstrate Mycroft's capability and efficiency.

[133] arXiv:2509.03020 [pdf, html, other]
Title: Training LLMs to be Better Text Embedders through Bidirectional Reconstruction
Chang Su, Dengliang Shi, Siyuan Huang, Jintao Du, Changhua Meng, Yu Cheng, Weiqiang Wang, Zhouhan Lin
Comments: accepted by EMNLP 2025 Main Conference
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

Large language models (LLMs) have increasingly been explored as powerful text embedders. Existing LLM-based text embedding approaches often leverage the embedding of the final token, typically a reserved special token such as [EOS]. However, these tokens have not been intentionally trained to capture the semantics of the whole context, limiting their capacity as text embeddings, especially for retrieval and re-ranking tasks. We propose to add a new training stage before contrastive learning to enrich the semantics of the final token embedding. This stage employs bidirectional generative reconstruction tasks, namely EBQ2D (Embedding-Based Query-to-Document) and EBD2Q (Embedding-Based Document-to-Query), which interleave to anchor the [EOS] embedding and reconstruct either side of Query-Document pairs. Experimental results demonstrate that our additional training stage significantly improves LLM performance on the Massive Text Embedding Benchmark (MTEB), achieving new state-of-the-art results across different LLM base models and scales.
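
The final-token pooling mentioned above can be sketched as follows. This is an illustration under stated assumptions (right-padded batches, a 0/1 attention mask, and a hypothetical function name); it shows only how the last real token's hidden state is selected, not the proposed EBQ2D/EBD2Q training stage.

```python
import numpy as np

def last_token_embedding(hidden_states, attention_mask):
    """Pick the hidden state of the last non-padding token in each sequence.

    hidden_states: (batch, seq_len, dim); attention_mask: (batch, seq_len)
    with 1 for real tokens and 0 for right padding (an assumption here).
    """
    hidden = np.asarray(hidden_states)
    mask = np.asarray(attention_mask)
    last = mask.sum(axis=1) - 1                  # index of the final real token
    return hidden[np.arange(hidden.shape[0]), last]

# Toy batch: two sequences of length 3, the second padded by one position.
hidden = np.arange(12, dtype=float).reshape(2, 3, 2)
mask = np.array([[1, 1, 1], [1, 1, 0]])
emb = last_token_embedding(hidden, mask)
```

With left-padded batches the last position is always the final token, so the index computation would reduce to taking position -1.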

[134] arXiv:2509.03024 [pdf, html, other]
Title: Efficient Privacy-Preserving Recommendation on Sparse Data using Fully Homomorphic Encryption
Moontaha Nishat Chowdhury, André Bauer, Minxuan Zhou
Comments: The paper is accepted at the 21st IEEE International eScience Conference (eScience'25) and will be published soon. Link: this https URL
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In today's data-driven world, recommendation systems personalize user experiences across industries but rely on sensitive data, raising privacy concerns. Fully homomorphic encryption (FHE) can secure these systems, but a significant challenge in applying FHE to recommendation systems is efficiently handling the inherently large and sparse user-item rating matrices. FHE operations are computationally intensive, and naively processing various sparse matrices in recommendation systems would be prohibitively expensive. Additionally, the communication overhead between parties remains a critical concern in encrypted domains. We propose a novel approach combining Compressed Sparse Row (CSR) representation with FHE-based matrix factorization that efficiently handles matrix sparsity in the encrypted domain while minimizing communication costs. Our experimental results demonstrate high recommendation accuracy with encrypted data while achieving the lowest communication costs, effectively preserving user privacy.
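
For readers unfamiliar with the Compressed Sparse Row (CSR) layout named above, the following plaintext sketch shows how a sparse user-item rating matrix is stored and used. It involves no encryption and is not the paper's FHE pipeline; the function names and the toy ratings are assumptions for illustration.

```python
def to_csr(dense):
    """Convert a dense rating matrix (list of rows) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)      # stored rating
                col_idx.append(j)     # item index of that rating
        row_ptr.append(len(values))   # where the next user's ratings start
    return values, col_idx, row_ptr

def csr_row_dot(values, col_idx, row_ptr, i, vec):
    """Dot product of user row i with a dense vector, touching only stored entries."""
    return sum(values[k] * vec[col_idx[k]]
               for k in range(row_ptr[i], row_ptr[i + 1]))

# Toy matrix: 3 users x 4 items, zeros mean "not rated".
ratings = [[5, 0, 0, 3],
           [0, 0, 0, 0],
           [0, 4, 0, 0]]
values, col_idx, row_ptr = to_csr(ratings)
```

Only the three stored ratings are visited in any row operation, which is the property that matters when each arithmetic step is an expensive homomorphic operation.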

[135] arXiv:2509.03025 [pdf, html, other]
Title: Unveiling the Response of Large Vision-Language Models to Visually Absent Tokens
Sohee Kim, Soohyun Ryu, Joonhyung Park, Eunho Yang
Comments: accepted to EMNLP 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Large Vision-Language Models (LVLMs) generate contextually relevant responses by jointly interpreting visual and textual inputs. However, our findings reveal that they often mistakenly perceive text inputs lacking visual evidence as being part of the image, leading to erroneous responses. In light of this finding, we probe whether LVLMs possess an internal capability to determine if textual concepts are grounded in the image, and discover a specific subset of Feed-Forward Network (FFN) neurons, termed Visual Absence-aware (VA) neurons, that consistently signal the visual absence through a distinctive activation pattern. Leveraging these patterns, we develop a detection module that systematically classifies whether an input token is visually grounded. Guided by its prediction, we propose a method to refine the outputs by reinterpreting question prompts or replacing the detected absent tokens during generation. Extensive experiments show that our method effectively mitigates the models' tendency to falsely presume the visual presence of text input and that it generalizes across various LVLMs.

[136] arXiv:2509.03029 [pdf, other]
Title: Multimodal learning of melt pool dynamics in laser powder bed fusion
Satyajit Mojumder, Pallock Halder, Tiana Tonge
Comments: 20 pages, 6 figures, 1 table
Subjects: Machine Learning (cs.LG)

While multiple sensors are used for real-time monitoring in additive manufacturing, not all provide practical or reliable process insights. For example, high-speed X-ray imaging offers valuable spatial information about subsurface melt pool behavior but is costly and impractical for most industrial settings. In contrast, absorptivity data from low-cost photodiodes correlate with melt pool dynamics but are often too noisy for accurate prediction when used alone. In this paper, we propose a multimodal data fusion approach for predicting melt pool dynamics by combining high-fidelity X-ray data with low-fidelity absorptivity data in the Laser Powder Bed Fusion (LPBF) process. Our multimodal learning framework integrates convolutional neural networks (CNNs) for spatial feature extraction from X-ray data with recurrent neural networks (RNNs) for temporal feature extraction from absorptivity signals, using an early fusion strategy. The multimodal model is further used as a transfer learning model to fine-tune the RNN model that can predict melt pool dynamics only with absorptivity, with greater accuracy compared to the multimodal model. Results show that training with both modalities significantly improves prediction accuracy compared to using either modality alone. Furthermore, once trained, the model can infer melt pool characteristics using only absorptivity data, eliminating the need for expensive X-ray imaging. This multimodal fusion approach enables cost-effective, real-time monitoring and has broad applicability in additive manufacturing.

[137] arXiv:2509.03030 [pdf, html, other]
Title: Population-aware Online Mirror Descent for Mean-Field Games with Common Noise by Deep Reinforcement Learning
Zida Wu, Mathieu Lauriere, Matthieu Geist, Olivier Pietquin, Ankur Mehta
Comments: 2025 IEEE 64th Conference on Decision and Control (CDC)
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA); Robotics (cs.RO); Systems and Control (eess.SY)

Mean Field Games (MFGs) offer a powerful framework for studying large-scale multi-agent systems. Yet, learning Nash equilibria in MFGs remains a challenging problem, particularly when the initial distribution is unknown or when the population is subject to common noise. In this paper, we introduce an efficient deep reinforcement learning (DRL) algorithm designed to achieve population-dependent Nash equilibria without relying on averaging or historical sampling, inspired by Munchausen RL and Online Mirror Descent. The resulting policy is adaptable to various initial distributions and sources of common noise. Through numerical experiments on seven canonical examples, we demonstrate that our algorithm exhibits superior convergence properties compared to state-of-the-art algorithms, particularly a DRL version of Fictitious Play for population-dependent policies. The performance in the presence of common noise underscores the robustness and adaptability of our approach.

[138] arXiv:2509.03032 [pdf, html, other]
Title: Background Matters Too: A Language-Enhanced Adversarial Framework for Person Re-Identification
Kaicong Huang, Talha Azfar, Jack M. Reilly, Thomas Guggisberg, Ruimin Ke
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Person re-identification faces two core challenges: precisely locating the foreground target while suppressing background noise, and extracting fine-grained features from the target region. Numerous visual-only approaches address these issues by partitioning an image and applying attention modules, yet they rely on costly manual annotations and struggle with complex occlusions. Recent multimodal methods, motivated by CLIP, introduce semantic cues to guide visual understanding. However, they focus solely on foreground information and overlook the potential value of background cues. Inspired by human perception, we argue that background semantics are as important as the foreground semantics in ReID, as humans tend to eliminate background distractions while focusing on target appearance. Therefore, this paper proposes an end-to-end framework that jointly models foreground and background information within a dual-branch cross-modal feature extraction pipeline. To help the network distinguish between the two domains, we propose an intra-semantic alignment and inter-semantic adversarial learning strategy. Specifically, we align visual and textual features that share the same semantics across domains, while simultaneously penalizing similarity between foreground and background features to enhance the network's discriminative power. This strategy drives the model to actively suppress noisy background regions and enhance attention toward identity-relevant foreground cues. Comprehensive experiments on two holistic and two occluded ReID benchmarks demonstrate the effectiveness and generality of the proposed method, with results that match or surpass those of current state-of-the-art approaches.

[139] arXiv:2509.03034 [pdf, html, other]
Title: On a class of twisted elliptic curve codes
Xiaofeng Liu, Jun Zhang, Fang-Wei Fu
Subjects: Information Theory (cs.IT)

Motivated by the studies of twisted generalized Reed-Solomon (TGRS) codes, we initiate the study of twisted elliptic curve codes (TECCs) in this paper. In particular, we study a class of TECCs with one twist. The parity-check matrices of the TECCs are explicitly given by computing the Weil differentials. Then the sufficient and necessary conditions of self-duality are presented. The minimum distances of the TECCs are also determined. Moreover, examples of MDS, AMDS, self-dual and MDS self-dual TECCs are given. Finally, we calculate the dimensions of the Schur squares of TECCs and show the non-equivalence between TECCs and ECCs/GRS codes.

[140] arXiv:2509.03036 [pdf, html, other]
Title: Knowledge Integration for Physics-informed Symbolic Regression Using Pre-trained Large Language Models
Bilge Taskin, Wenxiong Xie, Teddy Lazebnik
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Symbolic Computation (cs.SC)

Symbolic regression (SR) has emerged as a powerful tool for automated scientific discovery, enabling the derivation of governing equations from experimental data. A growing body of work illustrates the promise of integrating domain knowledge into SR to improve the discovered equation's generality and usefulness. Physics-informed SR (PiSR) addresses this by incorporating domain knowledge, but current methods often require specialized formulations and manual feature engineering, limiting their use to domain experts. In this study, we leverage pre-trained Large Language Models (LLMs) to facilitate knowledge integration in PiSR. By harnessing the contextual understanding of LLMs trained on vast scientific literature, we aim to automate the incorporation of domain knowledge, reducing the need for manual intervention and making the process more accessible to a broader range of scientific problems. Namely, the LLM is integrated into the SR's loss function, adding a term for the LLM's evaluation of the equation produced by the SR. We extensively evaluate our method using three SR algorithms (DEAP, gplearn, and PySR) and three pre-trained LLMs (Falcon, Mistral, and Llama 2) across three physical dynamics (dropping ball, simple harmonic motion, and electromagnetic wave). The results demonstrate that LLM integration consistently improves the reconstruction of physical dynamics from data, enhancing the robustness of SR models to noise and complexity. We further explore the impact of prompt engineering, finding that more informative prompts significantly improve performance.

[141] arXiv:2509.03037 [pdf, html, other]
Title: TraceLLM: Security Diagnosis Through Traces and Smart Contracts in Ethereum
Shuzheng Wang, Yue Huang, Zhuoer Xu, Yuming Huang, Jing Tang
Subjects: Cryptography and Security (cs.CR); Emerging Technologies (cs.ET); Software Engineering (cs.SE)

Ethereum smart contracts hold tens of billions of USD in DeFi and NFTs, yet comprehensive security analysis remains difficult due to unverified code, proxy-based architectures, and the reliance on manual inspection of complex execution traces. Existing approaches fall into two main categories: anomaly transaction detection, which flags suspicious transactions but offers limited insight into specific attack strategies hidden in execution traces inside transactions, and code vulnerability detection, which cannot analyze unverified contracts and struggles to show how identified flaws are exploited in real incidents. As a result, analysts must still manually align transaction traces with contract code to reconstruct attack scenarios and conduct forensics. To address this gap, TraceLLM is proposed as a framework that leverages LLMs to integrate execution trace-level detection with decompiled contract code. We introduce a new anomaly execution path identification algorithm and an LLM-refined decompile tool to identify vulnerable functions and provide explicit attack paths to the LLM. TraceLLM establishes the first benchmark for joint trace and contract code-driven security analysis. For comparison, proxy baselines are created by jointly transmitting the results of three representative code analyses along with raw traces to the LLM. TraceLLM identifies attacker and victim addresses with 85.19% precision and produces automated reports with 70.37% factual precision across 27 cases with ground truth expert reports, achieving 25.93% higher accuracy than the best baseline. Moreover, across 148 real-world Ethereum incidents, TraceLLM automatically generates reports with 66.22% expert-verified accuracy, demonstrating strong generalizability.

[142] arXiv:2509.03041 [pdf, html, other]
Title: MedLiteNet: Lightweight Hybrid Medical Image Segmentation Model
Pengyang Yu, Haoquan Wang, Gerard Marks, Tahar Kechadi, Laurence T. Yang, Sahraoui Dhelim, Nyothiri Aung
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Accurate skin-lesion segmentation remains a key technical challenge for computer-aided diagnosis of skin cancer. Convolutional neural networks, while effective, are constrained by limited receptive fields and thus struggle to model long-range dependencies. Vision Transformers capture global context, yet their quadratic complexity and large parameter budgets hinder use on the small-sample medical datasets common in dermatology. We introduce MedLiteNet, a lightweight CNN-Transformer hybrid tailored for dermoscopic segmentation that achieves high precision through hierarchical feature extraction and multi-scale context aggregation. The encoder stacks depth-wise Mobile Inverted Bottleneck blocks to curb computation, inserts a bottleneck-level cross-scale token-mixing unit to exchange information between resolutions, and embeds a boundary-aware self-attention module to sharpen lesion contours.

[143] arXiv:2509.03044 [pdf, html, other]
Title: DCDB: Dynamic Conditional Dual Diffusion Bridge for Ill-posed Multi-Tasks
Chengjie Huang, Jiafeng Yan, Jing Li, Lu Bai
Comments: 15 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Conditional diffusion models have made impressive progress in the field of image processing, but the characteristics of constructing data distribution pathways make it difficult to exploit the intrinsic correlation between tasks in multi-task scenarios, which is even worse in ill-posed tasks with a lack of training data. In addition, traditional static condition control makes it difficult for networks to learn in multi-task scenarios with its dynamically evolving characteristics. To address these challenges, we propose a dynamic conditional double diffusion bridge training paradigm to build a general framework for ill-posed multi-tasks. Firstly, this paradigm decouples the diffusion and condition generation processes, avoiding the dependence of the diffusion model on supervised data in ill-posed tasks. Secondly, generated by the same noise schedule, dynamic conditions are used to gradually adjust their statistical characteristics, naturally embed time-related information, and reduce the difficulty of network learning. We analyze the learning objectives of the network under different conditional forms in the single-step denoising process and compare the changes in its attention weights in the network, demonstrating the superiority of our dynamic conditions. Taking dehazing and visible-infrared fusion as typical ill-posed multi-task scenarios, we achieve the best performance in multiple indicators on public datasets. The code has been publicly released at: this https URL.

[144] arXiv:2509.03047 [pdf, html, other]
Title: FlashRecovery: Fast and Low-Cost Recovery from Failures for Large-Scale Training of LLMs
Haijun Zhang, Jinxiang Wang, Zhenhua Yu, Yanyong Zhang, Xuejie Ji, Kaining Mao, Jun Zhang, Yaqing Zhang, Ting Wu, Fei Jie, Xiemin Huang, Zhifang Cai, Junhua Cheng, Shuwei Wang, Wei Li, Xiaoming Bao, Hua Xu, Shixiong Zhao, Jun Li, Hongwei Sun, Ziyang Zhang, Yi Xiong, Chunsheng Li
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)

Large language models (LLMs) have made a profound impact across various fields due to their advanced capabilities. However, training these models at unprecedented scales requires extensive AI accelerator clusters and sophisticated parallelism strategies, which pose significant challenges in maintaining system reliability over prolonged training periods. A major concern is the substantial loss of training time caused by inevitable hardware and software failures. To address these challenges, we present FlashRecovery, a fast and low-cost failure recovery system comprising three core modules: (1) Active and real-time failure detection. This module performs continuous training state monitoring, enabling immediate identification of hardware and software failures within seconds, thus ensuring rapid incident response; (2) Scale-independent task restart. By employing different recovery strategies for normal and faulty nodes, combined with an optimized communication group reconstruction protocol, our approach ensures that the recovery time remains nearly constant, regardless of cluster scale; (3) Checkpoint-free recovery within one step. Our novel recovery mechanism enables single-step restoration, completely eliminating dependence on traditional checkpointing methods and their associated overhead. Collectively, these innovations enable FlashRecovery to achieve optimal Recovery Time Objective (RTO) and Recovery Point Objective (RPO), substantially improving the reliability and efficiency of long-duration LLM training. Experimental results demonstrate that the FlashRecovery system can restore training on a cluster with 4,800 devices in 150 seconds. We also verify that the time required for failure recovery is nearly consistent across different scales of training tasks.

[145] arXiv:2509.03049 [pdf, html, other]
Title: Multi-layer Digital Twin System for Future Mobile Metaverse
Gaosheng Zhao, Dong In Kim
Comments: This article has been accepted for publication in IEEE Wireless Communications
Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)

In the upcoming 6G era, the communication networks are expected to face unprecedented challenges in terms of complexity and dynamics. Digital Twin (DT) technology, with its various digital capabilities, holds great potential to facilitate the transformation of the communication network from passive responding to proactive adaptation. Thus, in this paper, we propose a multi-layer DT system that coordinates local DT, edge DT, and cloud DT for future network architecture and functions. In our vision, the proposed DT system will not only achieve real-time data-driven decision-making and digital agent functions previously handled by centralized DT, but will do so in a more distributed, mobile, layer-by-layer manner. Moreover, it will supply essential data, pre-trained models, and open interfaces for future metaverse applications, enabling creators and users to efficiently develop and experience metaverse services.

[146] arXiv:2509.03052 [pdf, html, other]
Title: Fast approximation algorithms for the 1-median problem on real-world large graphs
Keisuke Ueta, Wei Wu, Mutsunori Yagiura
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)

The 1-median problem (1MP) on undirected weighted graphs seeks to find a facility location minimizing the total weighted distance to all customer nodes. Although the 1MP can be solved exactly by computing the single-source shortest paths from each customer node, such approaches become computationally expensive on large-scale graphs with millions of nodes. In many real-world applications, such as recommendation systems based on large-scale knowledge graphs, the number of nodes (i.e., potential facility locations) is enormous, whereas the number of customer nodes is relatively small and spatially concentrated. In such cases, exhaustive graph exploration is not only inefficient but also unnecessary. Leveraging this observation, we propose three approximation algorithms that reduce computation by terminating Dijkstra's algorithm early. We provide theoretical analysis showing that one of the proposed algorithms guarantees an approximation ratio of 2, whereas the other two improve this ratio to 1.618. We demonstrate that the lower bound of the approximation ratio is 1.2 by presenting a specific instance. Moreover, we show that all proposed algorithms return optimal solutions when the number of customer nodes is less than or equal to three. Extensive experiments demonstrate that our algorithms significantly outperform baseline exact methods in runtime while maintaining near-optimal accuracy across all tested graph types. Notably, on grid graphs with 10 million nodes, our algorithms obtain all optimal solutions within 1 millisecond, whereas the baseline exact method requires over 70 seconds on average.
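
The exact baseline mentioned in the abstract (one single-source shortest-path computation per customer node, then a minimum over all candidate locations) can be sketched as follows. This is only an illustration of that baseline, not the authors' early-termination algorithms; the function names, the adjacency-list format, and the tiny path graph are hypothetical.

```python
import heapq

def dijkstra(graph, source):
    """Return shortest-path distances from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def exact_one_median(graph, customers):
    """customers maps customer node -> weight; returns (best_node, cost)."""
    total = {node: 0.0 for node in graph}
    for c, weight in customers.items():
        dist = dijkstra(graph, c)  # full exploration: the costly part
        for node in graph:
            total[node] += weight * dist.get(node, float("inf"))
    best = min(total, key=total.get)
    return best, total[best]

# Hypothetical path graph a - b - c - d with unit edge weights.
graph = {
    "a": [("b", 1.0)],
    "b": [("a", 1.0), ("c", 1.0)],
    "c": [("b", 1.0), ("d", 1.0)],
    "d": [("c", 1.0)],
}
customers = {"a": 1.0, "c": 1.0, "d": 1.0}
best, cost = exact_one_median(graph, customers)
print(best, cost)
```

Each call to `dijkstra` explores the whole graph, which is the cost the abstract says becomes prohibitive at millions of nodes when the customers are few and concentrated.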

[147] arXiv:2509.03054 [pdf, html, other]
Title: Binary Quantization For LLMs Through Dynamic Grouping
Xinzhe Zheng, Zhen-Qun Yang, Haoran Xie, S. Joe Qin, Arlene Chen, Fangzhen Lin
Comments: 14 pages, 11 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of Natural Language Processing (NLP) tasks, but require substantial memory and computational resources. Binary quantization, which compresses model weights from 16-bit Brain Float to 1-bit representations in {-1, 1}, offers significant reductions in storage and inference costs. However, such aggressive quantization often leads to notable performance degradation compared to more conservative 4-bit quantization methods. In this research, we propose a novel optimization objective tailored for binary quantization, along with three algorithms designed to realize it effectively. Our method enhances blocked quantization by dynamically identifying optimal unstructured sub-matrices through adaptive grouping strategies. Experimental results demonstrate that our approach achieves an average bit length of just 1.007 bits, while maintaining high model quality. Specifically, our quantized LLaMA 3.2 3B model attains a perplexity of 8.23, remarkably close to the original 7.81, and far surpasses the previous SOTA, BiLLM, whose perplexity is 123.90. Furthermore, our method is competitive with SOTA 4-bit approaches such as GPTQ in both performance and efficiency. The compression process is highly efficient, requiring only 14 seconds to quantize the full LLaMA 3.2 3B weights on a single CPU core, with the entire process completing in under 100 minutes and exhibiting embarrassingly parallel properties.
Code: this https URL
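
As background for the 1-bit representation in {-1, 1}, here is a minimal sketch of the standard single-scale binarization of one weight group: each weight is replaced by its sign, and one shared scale `alpha` is stored for the group. For fixed signs, the mean absolute value is the scale that minimizes the squared reconstruction error. This is a generic baseline, not the paper's dynamic grouping method; the function names and the example group are hypothetical.

```python
def binarize(weights):
    """Approximate weights by alpha * sign(w) with one shared scale alpha."""
    # alpha = mean(|w|) minimizes sum((w - alpha * sign(w))**2) over alpha.
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs

def squared_error(weights, alpha, signs):
    """Squared reconstruction error of the binarized group."""
    return sum((w - alpha * s) ** 2 for w, s in zip(weights, signs))

# Hypothetical group of four weights sharing one scale.
group = [0.5, -1.5, 1.0, -1.0]
alpha, signs = binarize(group)
err = squared_error(group, alpha, signs)
print(alpha, signs, err)
```

How weights are assigned to groups determines which values share a scale, which is the degree of freedom a grouping strategy can exploit.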

[148] arXiv:2509.03056 [pdf, html, other]
Title: Discrete Functional Geometry of ReLU Networks via ReLU Transition Graphs
Sahil Rajesh Dhayalkar
Comments: 7 pages, 3 figures. Submitted as a conference paper to 2025 5th International Conference on Robotics, Automation, and Artificial Intelligence (RAAI 2025)
Subjects: Machine Learning (cs.LG)

We extend the ReLU Transition Graph (RTG) framework into a comprehensive graph-theoretic model for understanding deep ReLU networks. In this model, each node represents a linear activation region, and edges connect regions that differ by a single ReLU activation flip, forming a discrete geometric structure over the network's functional behavior. We prove that RTGs at random initialization exhibit strong expansion, binomial degree distributions, and spectral properties that tightly govern generalization. These structural insights enable new bounds on capacity via region entropy and on generalization via spectral gap and edge-wise KL divergence. Empirically, we construct RTGs for small networks, measure their smoothness and connectivity properties, and validate theoretical predictions. Our results show that region entropy saturates under overparameterization, spectral gap correlates with generalization, and KL divergence across adjacent regions reflects functional smoothness. This work provides a unified framework for analyzing ReLU networks through the lens of discrete functional geometry, offering new tools to understand, diagnose, and improve generalization.
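
The graph definition above (nodes are linear activation regions, edges join regions that differ by one ReLU flip) can be illustrated on a toy network. The sketch below assumes a single hidden layer with scalar input, finds regions by sampling inputs rather than by exact enumeration, and treats any two observed patterns at Hamming distance 1 as adjacent. All names and the example weights are hypothetical and are not taken from the paper.

```python
from itertools import combinations

def activation_pattern(x, weights, biases):
    """Return a tuple of 0/1 flags, one per hidden ReLU unit, for scalar input x."""
    return tuple(1 if w * x + b > 0 else 0 for w, b in zip(weights, biases))

def build_rtg(samples, weights, biases):
    """Nodes are observed activation patterns; edges join patterns one flip apart."""
    nodes = {activation_pattern(x, weights, biases) for x in samples}
    edges = set()
    for p, q in combinations(sorted(nodes), 2):
        # Hamming distance 1 means exactly one ReLU changed state.
        if sum(a != b for a, b in zip(p, q)) == 1:
            edges.add((p, q))
    return nodes, edges

# Hypothetical two-unit layer: relu(x) and relu(x - 1).
weights, biases = [1.0, 1.0], [0.0, -1.0]
samples = [i / 10 for i in range(-20, 31)]  # grid over [-2, 3]
nodes, edges = build_rtg(samples, weights, biases)
print(sorted(nodes), sorted(edges))
```

Sampling can miss small regions, so for anything beyond a toy network the node set found this way is a lower bound on the true number of regions.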

[149] arXiv:2509.03057 [pdf, other]
Title: Structure-Learnable Adapter Fine-Tuning for Parameter-Efficient Large Language Models
Ming Gong, Yingnan Deng, Nia Qi, Yujun Zou, Zhihao Xue, Yun Zi
Subjects: Computation and Language (cs.CL)

This paper addresses the issues of parameter redundancy, rigid structure, and limited task adaptability in the fine-tuning of large language models. It proposes an adapter-based fine-tuning method built on a structure-learnable mechanism. By introducing differentiable gating functions and structural sparsity control variables, the method enables automatic optimization of adapter insertion points, activation paths, and module combinations. This allows the model to adjust its structure flexibly in multi-task settings to match different task characteristics. With the backbone parameters kept frozen, the method uses a structure search mechanism to guide the dynamic construction of task-specific efficient substructures during training. This significantly improves parameter utilization and representational capacity. In addition, the paper designs a set of sensitivity analysis experiments to systematically evaluate the effects of sparsity weight, noise injection ratio, and data perturbation on model performance. These experiments verify the stability and robustness of the proposed method across various multi-task natural language understanding tasks. The experimental results show that the proposed method outperforms mainstream parameter-efficient tuning techniques on multiple tasks. It achieves a better balance among accuracy, compression rate, and robustness to noise and perturbation.

[150] arXiv:2509.03058 [pdf, html, other]
Title: EverTracer: Hunting Stolen Large Language Models via Stealthy and Robust Probabilistic Fingerprint
Zhenhua Xu, Meng Han, Wenpeng Xing
Comments: Accepted by EMNLP2025 Main
Subjects: Cryptography and Security (cs.CR)

The proliferation of large language models (LLMs) has intensified concerns over model theft and license violations, necessitating robust and stealthy ownership verification. Existing fingerprinting methods either require impractical white-box access or introduce detectable statistical anomalies. We propose EverTracer, a novel gray-box fingerprinting framework that ensures stealthy and robust model provenance tracing. EverTracer is the first to repurpose Membership Inference Attacks (MIAs) for defensive use, embedding ownership signals via memorization instead of artificial trigger-output overfitting. It consists of Fingerprint Injection, which fine-tunes the model on any natural language data without detectable artifacts, and Verification, which leverages a calibrated probability variation signal to distinguish fingerprinted models. This approach remains robust against adaptive adversaries, including input-level and model-level modifications. Extensive experiments across architectures demonstrate EverTracer's state-of-the-art effectiveness, stealthiness, and resilience, establishing it as a practical solution for securing LLM intellectual property. Our code and data are publicly available at this https URL.

[151] arXiv:2509.03059 [pdf, html, other]
Title: Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers
Xingyue Huang, Rishabh, Gregor Franke, Ziyi Yang, Jiamu Bai, Weijie Bai, Jinhe Bi, Zifeng Ding, Yiqun Duan, Chengyu Fan, Wendong Fan, Xin Gao, Ruohao Guo, Yuan He, Zhuangzhuang He, Xianglong Hu, Neil Johnson, Bowen Li, Fangru Lin, Siyu Lin, Tong Liu, Yunpu Ma, Hao Shen, Hao Sun, Beibei Wang, Fangyijie Wang, Hao Wang, Haoran Wang, Yang Wang, Yifeng Wang, Zhaowei Wang, Ziyang Wang, Yifan Wu, Zikai Xiao, Chengxing Xie, Fan Yang, Junxiao Yang, Qianshuo Ye, Ziyu Ye, Guangtao Zeng, Yuwen Ebony Zhang, Zeyu Zhang, Zihao Zhu, Bernard Ghanem, Philip Torr, Guohao Li
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Recent advances in Large Language Models (LLMs) have shown that their reasoning capabilities can be significantly improved through Reinforcement Learning with Verifiable Reward (RLVR), particularly in domains like mathematics and programming, where ground-truth correctness can be automatically evaluated. However, extending this success to other reasoning-intensive domains remains challenging due to the scarcity of high-quality, verifiable datasets and the high cost of human supervision. In this work, we introduce the Loong Project: an open-source framework for scalable synthetic data generation and verification across a diverse range of reasoning-intensive domains. The framework consists of two key components: (1) LoongBench, a curated seed dataset containing 8,729 human-vetted examples across 12 domains (e.g., Advanced Mathematics, Chemistry, Logic), each paired with executable code and rich metadata; and (2) LoongEnv, a modular synthetic data generation environment that supports multiple prompting strategies to produce new question-answer-code triples. Together, these components form an agent-environment loop that enables reinforcement learning, where an LLM-based agent is rewarded for generating Chain-of-Thought (CoT) solutions that align with code-executed answers. Empirically, we benchmark LoongBench on a broad suite of both open-source and proprietary LLMs to evaluate domain coverage and reveal performance bottlenecks. In addition, we conduct a comprehensive analysis of synthetic data generated by LoongEnv, examining correctness, difficulty, and diversity. Code and documentation are available at this https URL.

[152] arXiv:2509.03060 [pdf, html, other]
Title: A Long Short-Term Memory (LSTM) Model for Business Sentiment Analysis Based on Recurrent Neural Network
Md. Jahidul Islam Razin, Md. Abdul Karim, M. F. Mridha, S M Rafiuddin, Tahira Alam
Comments: 11 pages, 9 figures, 3 tables, published in Sustainable Communication Networks and Application: Proceedings of ICSCN 2020 (2021). Paper presents an LSTM-based business sentiment analysis model with 91.33% accuracy, compares against KNN, SVM, and Naive Bayes, and discusses methodology, dataset, training/testing, results, and implementation tools
Subjects: Computation and Language (cs.CL)

Business sentiment analysis (BSA) is one of the significant and popular topics of natural language processing. It is a kind of sentiment analysis technique for business purposes. Different categories of sentiment analysis techniques, such as lexicon-based techniques and different types of machine learning algorithms, are applied for sentiment analysis on different languages like English, Hindi, Spanish, etc. In this paper, long short-term memory (LSTM) is applied for business sentiment analysis, where a recurrent neural network is used. An LSTM model is used in a modified approach to prevent the vanishing gradient problem rather than applying the conventional recurrent neural network (RNN). To apply the modified RNN model, a product review dataset is used. In this experiment, 70% of the data is used to train the LSTM and the remaining 30% is used for testing. The result of this modified RNN model is compared with other conventional RNN models. It is noted that the proposed model performs better than the other conventional RNN models. Here, the proposed model, i.e., the modified RNN model approach, has achieved around 91.33% accuracy. By applying this model, any business company or e-commerce business site can identify the feedback from their customers about different types of products that customers like or dislike. Based on the customer reviews, a business company or e-commerce platform can evaluate its marketing strategy.

[153] arXiv:2509.03061 [pdf, html, other]
Title: Isolated Bangla Handwritten Character Classification using Transfer Learning
Abdul Karim, S M Rafiuddin, Jahidul Islam Razin, Tahira Alam
Comments: 13 pages, 14 figures, published in the Proceedings of the 2nd International Conference on Computing Advancements (ICCA 2022), IEEE. Strong experimental section with comparisons across models (3DCNN, ResNet50, MobileNet)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The Bangla language consists of fifty distinct characters and many compound characters. Several notable studies have been performed to recognize Bangla characters, both handwritten and optical. Our approach uses transfer learning to classify the basic, distinct, as well as compound Bangla handwritten characters while avoiding the vanishing gradient problem. Deep Neural Network techniques such as 3D Convolutional Neural Network (3DCNN), Residual Neural Network (ResNet), and MobileNet are applied to generate an end-to-end classification of all possible standard formations of handwritten characters in the Bangla language. The Bangla Lekha Isolated dataset, which contains 166,105 Bangla character image samples categorized into 84 distinct classes, is used for this classification model. The model achieved 99.82% accuracy on training data and 99.46% accuracy on test data. Comparisons with various state-of-the-art benchmarks of Bangla handwritten character classification show that the proposed model achieves better accuracy in classifying the data.

[154] arXiv:2509.03062 [pdf, html, other]
Title: High Cursive Complex Character Recognition using GAN External Classifier
S M Rafiuddin
Comments: 10 pages, 8 figures, published in the Proceedings of the 2nd International Conference on Computing Advancements (ICCA 2022). Paper introduces ADA-GAN with an external classifier for complex cursive handwritten character recognition, evaluated on MNIST and BanglaLekha datasets, showing improved robustness compared to CNN baselines
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Handwritten characters can be trickier to classify due to their complex and cursive nature compared to simple and non-cursive characters. We present an external classifier along with a Generative Adversarial Network that can classify highly cursive and complex characters. The generator network produces fake handwritten character images, which are then used to augment the training data after adding adversarially perturbed noise and achieving a confidence score above a threshold with the discriminator network. The results show that the accuracy of convolutional neural networks decreases as character complexity increases, but our proposed model, ADA-GAN, remains more robust and effective for both cursive and complex characters.

[155] arXiv:2509.03071 [pdf, other]
Title: AI-Generated Images for representing Individuals: Navigating the Thin Line Between Care and Bias
Julia C. Ahrend, Björn Döge, Tom M Duscher, Dario Rodighiero
Comments: Pictorial for IEEE VIS Art Program 2025 (VISAP). Theme: Collective Care. 15 pages, 38 figures
Subjects: Computers and Society (cs.CY)

This research discusses the figurative tensions that arise when using portraits to represent individuals behind a dataset. In the broader effort to communicate European data related to depression, the Kiel Science Communication Network (KielSCN) team attempted to engage a wider audience by combining interactive data graphics with AI-generated images of people. This article examines the project's decisions and results, reflecting on the reaction from the audience when information design incorporates figurative representations of individuals within the data.

[156] arXiv:2509.03093 [pdf, html, other]
Title: Are We SOLID Yet? An Empirical Study on Prompting LLMs to Detect Design Principle Violations
Fatih Pehlivan, Arçin Ülkü Ergüzen, Sahand Moslemi Yengejeh, Mayasah Lami, Anil Koyuncu
Comments: Accepted to ASE2025
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

Traditional static analysis methods struggle to detect semantic design flaws, such as violations of the SOLID principles, which require a strong understanding of object-oriented design patterns and principles. Existing solutions typically focus on individual SOLID principles or specific programming languages, leaving a gap in the ability to detect violations across all five principles in multi-language codebases. This paper presents a new approach: a methodology that leverages tailored prompt engineering to assess LLMs on their ability to detect SOLID violations across multiple languages. We present a benchmark of four leading LLMs (CodeLlama, DeepSeekCoder, QwenCoder, and GPT-4o Mini) on their ability to detect violations of all five SOLID principles. For this evaluation, we construct a new benchmark dataset of 240 manually validated code examples. Using this dataset, we test four distinct prompt strategies inspired by established zero-shot, few-shot, and chain-of-thought techniques to systematically measure their impact on detection accuracy. Our emerging results reveal a stark hierarchy among models, with GPT-4o Mini decisively outperforming the others, yet even it struggles with challenging principles like DIP. Crucially, we show that prompt strategy has a dramatic impact, but no single strategy is universally best; for instance, a deliberative ENSEMBLE prompt excels at OCP detection while a hint-based EXAMPLE prompt is superior for DIP violations. Across all experiments, detection accuracy is heavily influenced by language characteristics and degrades sharply with increasing code complexity. These initial findings demonstrate that effective, AI-driven design analysis requires not a single best model, but a tailored approach that matches the right model and prompt to the specific design context, highlighting the potential of LLMs to support maintainability through AI-assisted code analysis.

[157] arXiv:2509.03095 [pdf, html, other]
Title: TRELLIS-Enhanced Surface Features for Comprehensive Intracranial Aneurysm Analysis
Clément Hervé, Paul Garnier, Jonathan Viquerat, Elie Hachem
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Intracranial aneurysms pose a significant clinical risk yet are difficult to detect, delineate and model due to limited annotated 3D data. We propose a cross-domain feature-transfer approach that leverages the latent geometric embeddings learned by TRELLIS, a generative model trained on large-scale non-medical 3D datasets, to augment neural networks for aneurysm analysis. By replacing conventional point normals or mesh descriptors with TRELLIS surface features, we systematically enhance three downstream tasks: (i) classifying aneurysms versus healthy vessels in the Intra3D dataset, (ii) segmenting aneurysm and vessel regions on 3D meshes, and (iii) predicting time-evolving blood-flow fields using a graph neural network on the AnXplore dataset. Our experiments show that the inclusion of these features yields strong gains in accuracy, F1-score and segmentation quality over state-of-the-art baselines, and reduces simulation error by 15%. These results illustrate the broader potential of transferring 3D representations from general-purpose generative models to specialized medical tasks.

[158] arXiv:2509.03098 [pdf, other]
Title: Compressed verification for post-quantum signatures with long-term public keys
Gustavo Banegas (LIX, GRACE), Anaëlle Le Dévéhat (GRACE, LIX), Benjamin Smith (GRACE, LIX)
Journal-ref: 24th International Conference on Cryptology and Network Security, Nov 2025, Osaka, Japan
Subjects: Cryptography and Security (cs.CR)

Many signature applications, such as root certificates, secure software updates, and authentication protocols, involve long-lived public keys that are transferred or installed once and then used for many verifications. This key longevity makes post-quantum signature schemes with conservative assumptions (e.g., structure-free lattices) attractive for long-term security. But many such schemes, especially those with short signatures, suffer from extremely large public keys. Even in scenarios where bandwidth is not a major concern, large keys increase storage costs and slow down verification. We address this with a method to replace large public keys in GPV-style signatures with smaller, private verification keys. This significantly reduces verifier storage and runtime while preserving security. Applied to the conservative, short-signature schemes Wave and Squirrels, our method compresses Squirrels-I keys from 665 kB to 20.7 kB and Wave822 keys from 3.5 MB to 207.97 kB.

[159] arXiv:2509.03102 [pdf, html, other]
Title: CARPO: Leveraging Listwise Learning-to-Rank for Context-Aware Query Plan Optimization
Wenrui Zhou, Qiyu Liu, Jingshu Peng, Aoqian Zhang, Lei Chen
Subjects: Databases (cs.DB)

Efficient data processing is increasingly vital, with query optimizers playing a fundamental role in translating SQL queries into optimal execution plans. Traditional cost-based optimizers, however, often generate suboptimal plans due to flawed heuristics and inaccurate cost models, leading to the emergence of Learned Query Optimizers (LQOs). To address challenges in existing LQOs, such as the inconsistency and suboptimality inherent in pairwise ranking methods, we introduce CARPO, a generic framework leveraging listwise learning-to-rank for context-aware query plan optimization. CARPO distinctively employs a Transformer-based model for holistic evaluation of candidate plan sets and integrates a robust hybrid decision mechanism, featuring Out-Of-Distribution (OOD) detection with a top-$k$ fallback strategy to ensure reliability. Furthermore, CARPO can be seamlessly integrated with existing plan embedding techniques, demonstrating strong adaptability. Comprehensive experiments on TPC-H and STATS benchmarks demonstrate that CARPO significantly outperforms both native PostgreSQL and Lero, achieving a Top-1 Rate of 74.54% on the TPC-H benchmark compared to Lero's 3.63%, and reducing the total execution time to 3719.16 ms compared to PostgreSQL's 22577.87 ms.
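
To make the listwise idea concrete: a listwise loss scores the whole set of candidate plans at once, instead of comparing plans two at a time. The sketch below uses a ListNet-style top-one cross-entropy, which is an assumption chosen for illustration; the abstract does not state which listwise loss CARPO uses, and the latencies, scores, and function names here are hypothetical.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def listwise_loss(scores, latencies_ms):
    """Cross-entropy between top-one distributions of targets and model scores."""
    # Lower latency is better, so negate and rescale to get target relevance.
    scale = max(latencies_ms)
    target = softmax([-lat / scale for lat in latencies_ms])
    predicted = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))

# Hypothetical candidate plans for one query, with measured latencies.
latencies = [120.0, 45.0, 300.0]
good_scores = [0.5, 2.0, -1.0]  # ranks the fastest plan first
bad_scores = [-1.0, 0.5, 2.0]   # ranks the slowest plan first
loss_good = listwise_loss(good_scores, latencies)
loss_bad = listwise_loss(bad_scores, latencies)
print(loss_good, loss_bad)
```

Because the loss is computed over the full candidate list, a scorer trained this way yields a single consistent ordering, which avoids the cyclic preferences that pairwise comparisons can produce.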

[160] arXiv:2509.03103 [pdf, other]
Title: FastCaps: A Design Methodology for Accelerating Capsule Network on Field Programmable Gate Arrays
Abdul Rahoof, Vivek Chaturvedi, Muhammad Shafique
Comments: 2023 International Joint Conference on Neural Networks (IJCNN)
Subjects: Hardware Architecture (cs.AR)

Capsule Network (CapsNet) has shown significant improvement in understanding the variation in images along with better generalization ability compared to traditional Convolutional Neural Network (CNN). CapsNet preserves spatial relationships among extracted features and applies dynamic routing to efficiently learn the internal connections between capsules. However, due to the capsule structure and the complexity of the routing mechanism, it is non-trivial to accelerate CapsNet performance in its original form on Field Programmable Gate Array (FPGA). Most of the existing works on CapsNet have achieved limited acceleration as they implement only the dynamic routing algorithm on FPGA, while considering all the processing steps synergistically is important for real-world applications of Capsule Networks. Towards this, we propose a novel two-step approach that deploys a full-fledged CapsNet on FPGA. First, we prune the network using a novel Look-Ahead Kernel Pruning (LAKP) methodology that uses the sum of look-ahead scores of the model parameters. Next, we simplify the nonlinear operations, reorder loops, and parallelize operations of the routing algorithm to reduce CapsNet hardware complexity. To the best of our knowledge, this is the first work accelerating a full-fledged CapsNet on FPGA. Experimental results on the MNIST and F-MNIST datasets (typical in the Capsule Network community) show that the proposed LAKP approach achieves an effective compression rate of 99.26% and 98.84%, and achieves a throughput of 82 FPS and 48 FPS on Xilinx PYNQ-Z1 FPGA, respectively. Furthermore, reducing the hardware complexity of the routing algorithm increases the throughput to 1351 FPS and 934 FPS, respectively. As corroborated by our results, this work enables highly performance-efficient deployment of CapsNets on low-cost FPGAs that are popular in modern edge devices.

[161] arXiv:2509.03104 [pdf, html, other]
Title: The High Cost of Keeping Warm: Characterizing Overhead in Serverless Autoscaling Policies
Leonid Kondrashov, Boxi Zhou, Hancheng Wang, Dmitrii Ustiugov
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Serverless computing is transforming cloud application development, but the performance-cost trade-offs of control plane designs remain poorly understood due to a lack of open, cross-platform benchmarks and detailed system analyses. In this work, we address these gaps by designing a serverless system that approximates the scaling behaviors of commercial providers, including AWS Lambda and Google Cloud Run. We systematically compare the performance and cost-efficiency of both synchronous and asynchronous autoscaling policies by replaying real-world workloads and varying key autoscaling parameters.
We demonstrate that our open-source systems can closely replicate the operational characteristics of commercial platforms, enabling reproducible and transparent experimentation. By evaluating how autoscaling parameters affect latency, memory usage, and CPU overhead, we reveal several key findings. First, we find that serverless systems exhibit significant computational overhead due to instance churn equivalent to 10-40% of the CPU cycles spent on request handling, primarily originating from worker nodes. Second, we observe high memory allocation due to scaling policy: 2-10 times more than actively used. Finally, we demonstrate that reducing these overheads typically results in significant performance degradation in the current systems, underscoring the need for new, cost-efficient autoscaling strategies. Additionally, we employ a hybrid methodology that combines real control plane deployments with large-scale simulation to extend our evaluation closer to a production scale, thereby bridging the gap between small research clusters and real-world environments.

[162] arXiv:2509.03108 [pdf, html, other]
Title: Backdoor Poisoning Attack Against Face Spoofing Attack Detection Methods
Shota Iwamatsu, Koichi Ito, Takafumi Aoki
Comments: 2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Face recognition systems are robust against environmental changes and noise, and thus may be vulnerable to illegal authentication attempts using user face photos, such as spoofing attacks. To prevent such spoofing attacks, it is crucial to discriminate whether the input image is a live user image or a spoofed image prior to the face recognition process. Most existing spoofing attack detection methods utilize deep learning, which necessitates a substantial amount of training data. Consequently, if malicious data is injected into a portion of the training dataset, a specific spoofing attack may be erroneously classified as live, leading to false this http URL this paper, we propose a novel backdoor poisoning attack method to demonstrate the latent threat of backdoor poisoning within face anti-spoofing detection. The proposed method enables certain spoofing attacks to bypass detection by embedding features extracted from the spoofing attack's face image into a live face image without inducing any perceptible visual this http URL experiments conducted on public datasets, we demonstrate that the proposed method constitutes a realistic threat to existing spoofing attack detection systems.

[163] arXiv:2509.03110 [pdf, html, other]
Title: LSAM: Asynchronous Distributed Training with Landscape-Smoothed Sharpness-Aware Minimization
Yunfei Teng, Sixin Zhang
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

While Sharpness-Aware Minimization (SAM) improves generalization in deep neural networks by minimizing both loss and sharpness, it suffers from inefficiency in distributed large-batch training. We present Landscape-Smoothed SAM (LSAM), a novel optimizer that preserves SAM's generalization advantages while offering superior efficiency. LSAM integrates SAM's adversarial steps with an asynchronous distributed sampling scheme, producing a smoothed sharpness-aware loss landscape for optimization. This design eliminates synchronization bottlenecks, accelerates large-batch convergence, and delivers higher final accuracy compared to data-parallel SAM.
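For readers unfamiliar with the adversarial step that LSAM builds on, the following is a minimal sketch of a plain, single-worker SAM update on a toy quadratic loss. It does not implement LSAM's asynchronous sampling or landscape smoothing, which the abstract does not specify; the toy loss, `lr`, and `rho` values are assumptions chosen only for illustration.

```python
import math

# Toy loss f(w) = 0.5 * (1 * w0^2 + 10 * w1^2); chosen for illustration only.
CURVATURES = [1.0, 10.0]

def loss(w):
    return 0.5 * sum(c * x * x for c, x in zip(CURVATURES, w))

def grad(w):
    return [c * x for c, x in zip(CURVATURES, w)]

def sam_step(w, lr=0.05, rho=0.05):
    """One plain SAM update: ascend to a nearby worst-case point, then descend."""
    g = grad(w)
    norm = math.sqrt(sum(x * x for x in g)) or 1.0  # avoid division by zero
    # Adversarial (ascent) step of radius rho along the normalized gradient.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descent step on the original weights using the gradient at the perturbed point.
    g_adv = grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

w = [1.0, 1.0]
initial_loss = loss(w)
for _ in range(100):
    w = sam_step(w)
final_loss = loss(w)
```

Each update needs two gradient evaluations, which is one reason SAM is costly in synchronized large-batch settings.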

[164] arXiv:2509.03112 [pdf, other]
Title: Information transmission: Inferring change area from change moment in time series remote sensing images
Jialu Li, Chen Wu, Meiqi Hu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Time series change detection is a critical task for exploring ecosystem dynamics using time series remote sensing images, because it can simultaneously indicate where and when changes occur. While deep learning has shown excellent performance in this domain, it continues to approach change area detection and change moment identification as distinct tasks. Given that change area can be inferred from change moment, we propose a time series change detection network, named CAIM-Net (Change Area Inference from Moment Network), to ensure consistency between change area and change moment results. CAIM-Net infers change area from change moment based on the intrinsic relationship between time series analysis and spatial change detection. CAIM-Net comprises three key steps: Difference Extraction and Enhancement, Coarse Change Moment Extraction, and Fine Change Moment Extraction and Change Area Inference. In the Difference Extraction and Enhancement step, a lightweight encoder with batch dimension stacking is designed to rapidly extract difference features. Subsequently, boundary enhancement convolution is applied to amplify these difference features. In the Coarse Change Moment Extraction step, the enhanced difference features from the first step are used for spatiotemporal correlation analysis, and then two distinct methods are employed to determine coarse change moments. In the Fine Change Moment Extraction and Change Area Inference step, a multiscale temporal Class Activation Mapping (CAM) module first increases the weight of the change-occurring moment from coarse change moments. Then the weighted change moment is used to infer change area based on the fact that pixels with the change moment must have undergone a change.
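The final inference rule (a pixel that has a change moment must belong to the change area) can be illustrated with a small sketch. This is not CAIM-Net itself: the per-pixel moment map, the `NO_CHANGE` sentinel, and the function name are hypothetical and stand in for the network's outputs.

```python
NO_CHANGE = -1  # hypothetical sentinel: no change moment detected at this pixel

def infer_change_area(change_moment):
    """Derive a binary change-area mask from a per-pixel change-moment map.

    change_moment[r][c] holds the time index at which the pixel changed,
    or NO_CHANGE if no change moment was found.
    """
    return [[1 if t != NO_CHANGE else 0 for t in row] for row in change_moment]

# 3x3 toy map: entries are time-step indices of the detected change.
moments = [
    [-1, 2, 2],
    [-1, -1, 3],
    [0, -1, -1],
]
area = infer_change_area(moments)
```

Because the mask is derived from the moment map, the two outputs cannot disagree, which is the consistency property the abstract emphasizes.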

[165] arXiv:2509.03113 [pdf, html, other]
Title: Mitigating Multimodal Hallucinations via Gradient-based Self-Reflection
Shan Wang, Maying Shen, Nadine Chang, Chuong Nguyen, Hongdong Li, Jose M. Alvarez
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

Hallucinations in multimodal large language models are caused by the text-visual bias and the co-occurrence bias. The former reflects an over-reliance on text information in the decision-making process, while the latter arises from the statistical object-pairing patterns abstracted from the training data. Existing mitigation methods heuristically address these biases without understanding the fluctuating bias level across the instances. We first propose estimating the influence of respective token types (visual, prompt, and previous outputs) using a gradient-based self-reflection method. The estimated token influence further enables the detection of object-related visual tokens and their integration into an influence-aware contrastive decoding framework to mitigate both types of biases simultaneously. Our method operates without the need for additional resources, such as costly fine-tuning, extra models, or data statistics. Extensive experiments show it effectively reduces hallucinations, achieving up to a 92% accuracy increase on LLaVA-QA90.

[166] arXiv:2509.03114 [pdf, html, other]
Title: Towards Realistic Hand-Object Interaction with Gravity-Field Based Diffusion Bridge
Miao Xu, Xiangyu Zhu, Xusheng Liang, Zidu Wang, Jinlin Wu, Zhen Lei
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Existing reconstruction or hand-object pose estimation methods are capable of producing coarse interaction states. However, due to the complex and diverse geometry of both human hands and objects, these approaches often suffer from interpenetration or leave noticeable gaps in regions that are supposed to be in contact. Moreover, the surface of a real human hand undergoes non-negligible deformations during interaction, which are difficult to capture and represent with previous methods. To tackle these challenges, we formulate hand-object interaction as an attraction-driven process and propose a Gravity-Field Based Diffusion Bridge (GravityDB) to simulate interactions between a deformable hand surface and rigid objects. Our approach effectively resolves the aforementioned issues by generating physically plausible interactions that are free of interpenetration, ensure stable grasping, and capture realistic hand deformations. Furthermore, we incorporate semantic information from textual descriptions to guide the construction of the gravitational field, enabling more semantically meaningful interaction regions. Extensive qualitative and quantitative experiments on multiple datasets demonstrate the effectiveness of our method.

[167] arXiv:2509.03116 [pdf, other]
Title: Measuring Scalar Constructs in Social Science with LLMs
Hauke Licht, Rupak Sarkar, Patrick Y. Wu, Pranav Goel, Niklas Stoehr, Elliott Ash, Alexander Miserlis Hoyle
Comments: Accepted to EMNLP 2025 (Main)
Subjects: Computation and Language (cs.CL)

Many constructs that characterize language, like its complexity or emotionality, have a naturally continuous semantic structure; a public speech is not just "simple" or "complex," but exists on a continuum between extremes. Although large language models (LLMs) are an attractive tool for measuring scalar constructs, their idiosyncratic treatment of numerical outputs raises questions of how to best apply them. We address these questions with a comprehensive evaluation of LLM-based approaches to scalar construct measurement in social science. Using multiple datasets sourced from the political science literature, we evaluate four approaches: unweighted direct pointwise scoring, aggregation of pairwise comparisons, token-probability-weighted pointwise scoring, and finetuning. Our study yields actionable findings for applied researchers. First, LLMs prompted to generate pointwise scores directly from texts produce discontinuous distributions with bunching at arbitrary numbers. The quality of the measurements improves with pairwise comparisons made by LLMs, but it improves even more by taking pointwise scores and weighting them by token probability. Finally, finetuning smaller models with as few as 1,000 training pairs can match or exceed the performance of prompted LLMs.
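As a rough illustration of token-probability-weighted pointwise scoring, the sketch below takes the probabilities a model assigns to each candidate score token and returns their expectation, instead of keeping only the most likely token. The 1-to-5 scale and the probability values are made up for the example; the paper's prompts and models are not reproduced here.

```python
def weighted_score(token_probs):
    """Expected score under the model's distribution over score tokens.

    token_probs maps a score token (e.g. "3") to its probability. The values
    are renormalized in case probability mass fell on non-score tokens.
    """
    total = sum(token_probs.values())
    if total <= 0:
        raise ValueError("no probability mass on score tokens")
    return sum(int(tok) * p for tok, p in token_probs.items()) / total

# Hypothetical next-token probabilities for a 1-5 complexity rating.
probs = {"1": 0.05, "2": 0.10, "3": 0.40, "4": 0.35, "5": 0.10}

direct_score = int(max(probs, key=probs.get))  # unweighted: most likely token only
expected_score = weighted_score(probs)         # weighted: continuous-valued score
```

The unweighted score can only take the integer values on the scale, whereas the weighted score varies continuously, which is consistent with the bunching the abstract reports for direct scoring.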

[168] arXiv:2509.03117 [pdf, html, other]
Title: PromptCOS: Towards System Prompt Copyright Auditing for LLMs via Content-level Output Similarity
Yuchen Yang, Yiming Li, Hongwei Yao, Enhao Huang, Shuo Shao, Bingrun Yang, Zhibo Wang, Dacheng Tao, Zhan Qin
Subjects: Cryptography and Security (cs.CR)

The rapid progress of large language models (LLMs) has greatly enhanced reasoning tasks and facilitated the development of LLM-based applications. A critical factor in improving LLM-based applications is the design of effective system prompts, which significantly impact the behavior and output quality of LLMs. However, system prompts are susceptible to theft and misuse, which could undermine the interests of prompt owners. Existing methods protect prompt copyrights through watermark injection and verification but face challenges due to their reliance on intermediate LLM outputs (e.g., logits), which limits their practical feasibility.
In this paper, we propose PromptCOS, a method for auditing prompt copyright based on content-level output similarity. It embeds watermarks by optimizing the prompt while simultaneously co-optimizing a special verification query and content-level signal marks. This is achieved by leveraging cyclic output signals and injecting auxiliary tokens to ensure reliable auditing in content-only scenarios. Additionally, it incorporates cover tokens to protect the watermark from malicious deletion. For copyright verification, PromptCOS identifies unauthorized usage by comparing the similarity between the suspicious output and the signal mark. Experimental results demonstrate that our method achieves high effectiveness (99.3% average watermark similarity), strong distinctiveness (60.8% greater than the best baseline), high fidelity (accuracy degradation of no more than 0.58%), robustness (resilience against three types of potential attacks), and computational efficiency (up to 98.1% reduction in computational cost). Our code is available at GitHub this https URL.

[169] arXiv:2509.03118 [pdf, html, other]
Title: A Hierarchical Deep Reinforcement Learning Framework for Traffic Signal Control with Predictable Cycle Planning
Hankang Gu, Yuli Zhang, Chengming Wang, Ruiyuan Jiang, Ziheng Qiao, Pengfei Fan, Dongyao Jia
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

Deep reinforcement learning (DRL) has become a popular approach in traffic signal control (TSC) due to its ability to learn adaptive policies from complex traffic environments. Within DRL-based TSC methods, two primary control paradigms are "choose phase" and "switch" strategies. Although the agent in the choose phase paradigm selects the next active phase adaptively, this paradigm may result in unexpected phase sequences for drivers, disrupting their anticipation and potentially compromising safety at intersections. Meanwhile, the switch paradigm allows the agent to decide whether to switch to the next predefined phase or extend the current phase. While this structure maintains a more predictable order, it can lead to unfair and inefficient phase allocations, as certain movements may be extended disproportionately while others are neglected. In this paper, we propose a DRL model, named Deep Hierarchical Cycle Planner (DHCP), to allocate the traffic signal cycle duration hierarchically. A high-level agent first determines the split of the total cycle time between the North-South (NS) and East-West (EW) directions based on the overall traffic state. Then, a low-level agent further divides the allocated duration within each major direction between straight and left-turn movements, enabling more flexible durations for the two movements. We test our model on both real and synthetic road networks, along with multiple sets of real and synthetic traffic flows. Empirical results show our model achieves the best performance over all datasets against baselines.
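The two-level allocation can be sketched as plain arithmetic, assuming each agent's action is a fraction in [0, 1]. The action encoding, the minimum green time, and the omission of yellow and all-red intervals are assumptions of this sketch, not details taken from the paper.

```python
def split(total, fraction, minimum):
    """Split `total` seconds into two parts, each at least `minimum` seconds."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if total < 2 * minimum:
        raise ValueError("total too short for the required minimums")
    first = minimum + fraction * (total - 2 * minimum)
    return first, total - first

def plan_cycle(cycle, ns_fraction, ns_straight_fraction, ew_straight_fraction,
               min_green=5.0):
    """Hierarchical cycle plan: NS/EW split first, then straight/left per direction."""
    # High-level decision: share of the cycle given to North-South vs East-West.
    ns, ew = split(cycle, ns_fraction, 2 * min_green)
    # Low-level decisions: straight vs left-turn time within each direction.
    ns_straight, ns_left = split(ns, ns_straight_fraction, min_green)
    ew_straight, ew_left = split(ew, ew_straight_fraction, min_green)
    return {
        "NS_straight": ns_straight, "NS_left": ns_left,
        "EW_straight": ew_straight, "EW_left": ew_left,
    }

plan = plan_cycle(cycle=120.0, ns_fraction=0.6,
                  ns_straight_fraction=0.75, ew_straight_fraction=0.5)
```

The phase order stays fixed and only the durations vary, so drivers see a predictable sequence while every movement is guaranteed at least the minimum green time.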

[170] arXiv:2509.03119 [pdf, html, other]
Title: Forbal: Force Balanced 2-5 Degree of Freedom Robot Manipulator Built from a Five Bar Linkage
Yash Vyas, Matteo Bottin
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

A force balanced manipulator design based on the closed chain planar five bar linkage is developed and experimentally validated. We present 2 variants as a modular design: Forbal-2, a planar 2-DOF manipulator, and its extension to 5-DOF spatial motion called Forbal-5. The design considerations in terms of geometric, kinematic, and dynamic design that fulfill the force balance conditions while maximizing workspace are discussed. Then, the inverse kinematics of both variants are derived from geometric principles.
We validate the improvements from force balancing the manipulator through comparative experiments with counter mass balanced and unbalanced configurations. The results show how the balanced configuration yields a reduction in the average reaction moments of up to 66%, a reduction of average joint torques of up to 79%, as well as a noticeable reduction in position error for Forbal-2. For Forbal-5, which has a higher end effector payload mass, the joint torques are reduced up to 84% for the balanced configuration. Experimental results validate that the balanced manipulator design is suitable for applications where the reduction of joint torques and reaction forces/moments helps achieve millimeter level precision.

[171] arXiv:2509.03122 [pdf, html, other]
Title: From Evaluation to Defense: Constructing Persistent Edit-Based Fingerprints for Large Language Models
Yue Li, Xin Yi, Dongsheng Shi, Yongyi Cui, Gerard de Melo, Xiaoling Wang, Linlin Wang
Comments: preprint
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The intellectual property (IP) protection of Large Language Models (LLMs) is increasingly critical. Injecting specialized fingerprints into LLMs through instruction tuning is a common IP protection technique. However, this may significantly degrade model performance, requires substantial computational resources, and exhibits poor persistence under model modifications. We argue that knowledge editing offers a lightweight alternative that is more suitable for fingerprint injection. Accordingly, we apply knowledge editing to fingerprint injection for the first time and demonstrate its strong capability. Despite using scrambled text as fingerprints to prevent them from being overwritten during fine-tuning, degradation still occurs under large-scale fine-tuning. To address this, we propose Fingerprint Subspace-aware Fine-Tuning (FSFT), which reduces fingerprint degradation by constraining the update of the fingerprint subspace. The performance of FSFT exceeds fine-tuning by 10% even in the worst-case scenario. Additionally, we observe that the fingerprint-injected models struggle to distinguish between fingerprints and similar texts due to the high similarity of their features. This finding underscores the urgent need for more robust and fine-grained fingerprinting injection methods for LLMs.

[172] arXiv:2509.03123 [pdf, html, other]
Title: Kangaroo: A Private and Amortized Inference Framework over WAN for Large-Scale Decision Tree Evaluation
Wei Xu, Hui Zhu, Yandong Zheng, Song Bian, Ning Sun, Hao Yuan, Dengguo Feng, Hui Li
Subjects: Cryptography and Security (cs.CR)

With the rapid adoption of Models-as-a-Service, concerns about data and model privacy have become increasingly critical. To solve these problems, various privacy-preserving inference schemes have been proposed. In particular, due to the efficiency and interpretability of decision trees, private decision tree evaluation (PDTE) has garnered significant attention. However, existing PDTE schemes suffer from significant limitations: their communication and computation costs scale with the number of trees, the number of nodes, or the tree depth, which makes them inefficient for large-scale models, especially over WAN networks. To address these issues, we propose Kangaroo, a private and amortized decision tree inference framework built upon packed homomorphic encryption. Specifically, we design a novel model hiding and encoding scheme, together with secure feature selection, oblivious comparison, and secure path evaluation protocols, enabling full amortization of the overhead as the number of nodes or trees scales. Furthermore, we enhance the performance and functionality of the framework through optimizations, including same-sharing-for-same-model, latency-aware, and adaptive encoding adjustment strategies. Kangaroo achieves a $14\times$ to $59\times$ performance improvement over state-of-the-art (SOTA) one-round interactive schemes in WAN environments. For large-scale decision tree inference tasks, it delivers a $3\times$ to $44\times$ speedup compared to existing schemes. Notably, Kangaroo enables the evaluation of a random forest with $969$ trees and $411825$ nodes in approximately $60$ ms per tree (amortized) under WAN environments.

[173] arXiv:2509.03126 [pdf, html, other]
Title: On the Smart Coordination of Flexibility Scheduling in Multi-carrier Integrated Energy Systems
Christian Doh Dinga, Sander van Rijn, Laurens de Vries, Milos Cvetkovic
Subjects: Systems and Control (eess.SY); General Economics (econ.GN); Optimization and Control (math.OC)

Coordinating the interactions between flexibility assets in multi-carrier integrated energy systems (MIES) can lead to an efficient integration of variable renewable energy resources, and a cost-efficient energy transition. However, the proliferation of flexibility assets and their participation in active demand response increases the complexity of coordinating these interactions. This paper introduces different approaches to model the coordination of flexibility scheduling in MIES. We propose a market auction-inspired model coupling approach to address the challenges of preserving the autonomy and privacy of flexibility providers, and the issue of scalability. We benchmark our approach against co-optimization and an iterative price-response method by conducting experiments with varying problem sizes and computing infrastructure. We show that our approach scales well and is suitable for modeling flexibility in large-scale energy systems in a more realistic way. From an optimality standpoint, the flexibility dispatch schedules and electricity prices are "near-optimal". Our methodology is implemented as a new open-source software, which offers several practical applications. For example, flexibility providers and network operators can couple their models to simulate the interaction between their systems without disclosing confidential information; policy regulators can use it to investigate new market design and regulations to optimize the utilization of flexibility in MIES.

[174] arXiv:2509.03128 [pdf, html, other]
Title: Successive Cancellation Decoding For General Monotone Chain Polar Codes
Zichang Ren, Chunhang Zheng, Dou Li, Yuping Zhao
Subjects: Information Theory (cs.IT)

Monotone chain polar codes generalize classical polar codes to multivariate settings, offering a flexible approach for achieving the entire admissible rate region in the distributed lossless coding problem. However, this flexibility also introduces significant challenges for existing successive cancellation (SC) based decoding schemes. Motivated by the need for a general SC decoding solution, we present a comprehensive decoding strategy for monotone chain polar codes that can handle arbitrary numbers of terminals, non-binary alphabets, and decoding along arbitrary monotone chains. Specifically, we formulate the SC decoding task as a series of inference subtasks over the polar transform and propose a computational graph framework based on probability propagation principles. This approach highlights the impact of variable switching during decoding and shows that time complexity varies between $O(N\log{N})$ and $O(N^2)$, depending on the specific chain structure. Moreover, we demonstrate that the widely used $O(N)$ space optimization is not universally applicable to monotone chain polar codes, which prompts us to introduce a constant-time decoder forking strategy based on the proposed logical computation graphs. This strategy enables time-efficient list decoding without relying on $O(N)$-space techniques. Numerical results verify the superior performance of the proposed scheme compared with the classical lazy-copy scheme.

[175] arXiv:2509.03130 [pdf, html, other]
Title: A Plug-and-play Model-agnostic Embedding Enhancement Approach for Explainable Recommendation
Yunqi Mi, Boyang Yan, Guoshuai Zhao, Jialie Shen, Xueming Qian
Subjects: Information Retrieval (cs.IR)

Existing multimedia recommender systems provide users with suggestions of media, such as games and movies, by evaluating similarities. To enhance the semantics and explainability of embeddings, it is a consensus to apply additional information (e.g., interactions, contexts, popularity). However, without systematic consideration of representativeness and value, the utility and explainability of embeddings drop drastically. Hence, we introduce RVRec, a plug-and-play model-agnostic embedding enhancement approach that can improve both personalization and explainability of existing systems. Specifically, we propose a probability-based embedding optimization method that uses a contrastive loss based on negative 2-Wasserstein distance to learn to enhance the representativeness of the embeddings. In addition, we introduce a reweighing method based on a multivariate Shapley values strategy to evaluate and explore the value of interactions and embeddings. Extensive experiments on multiple backbone recommenders and real-world datasets show that RVRec can improve the personalization and explainability of existing recommenders, outperforming state-of-the-art baselines.
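To make the idea of a contrastive loss built on negative 2-Wasserstein distance concrete, here is a minimal sketch that assumes each embedding is a Gaussian with diagonal covariance, for which the distance has a closed form. The diagonal-Gaussian assumption, the InfoNCE-style loss, and the `temperature` parameter are choices made for this illustration; the abstract does not state RVRec's exact formulation.

```python
import math

def w2_distance(mu1, sigma1, mu2, sigma2):
    """2-Wasserstein distance between two diagonal Gaussians.

    For N(mu1, diag(sigma1^2)) and N(mu2, diag(sigma2^2)):
    W2^2 = ||mu1 - mu2||^2 + ||sigma1 - sigma2||^2.
    """
    sq = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    sq += sum((a - b) ** 2 for a, b in zip(sigma1, sigma2))
    return math.sqrt(sq)

def contrastive_loss(anchor, positive, negatives, temperature=1.0):
    """InfoNCE-style loss using negative W2 distance as the similarity.

    Each embedding is a (mean, std) pair of equal-length lists.
    """
    candidates = [positive] + list(negatives)
    sims = [-w2_distance(*anchor, *cand) / temperature for cand in candidates]
    # Numerically stable log-sum-exp over all candidates.
    m = max(sims)
    log_denominator = m + math.log(sum(math.exp(s - m) for s in sims))
    return log_denominator - sims[0]

anchor = ([0.0, 0.0], [1.0, 1.0])
near = ([0.1, 0.0], [1.0, 1.0])
far = ([3.0, 4.0], [1.0, 1.0])

loss_good = contrastive_loss(anchor, positive=near, negatives=[far])
loss_bad = contrastive_loss(anchor, positive=far, negatives=[near])
```

The loss is small when the positive sample is closer to the anchor than the negatives in Wasserstein distance, and large otherwise.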

[176] arXiv:2509.03131 [pdf, html, other]
Title: RecBase: Generative Foundation Model Pretraining for Zero-Shot Recommendation
Sashuai Zhou, Weinan Gan, Qijiong Liu, Ke Lei, Jieming Zhu, Hai Huang, Yan Xia, Ruiming Tang, Zhenhua Dong, Zhou Zhao
Journal-ref: EMNLP 2025
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)

Recent advances in LLM-based recommendation have shown promise, yet their cross-domain generalization is hindered by a fundamental mismatch between language-centric pretraining and the recommendation task. Existing methods, relying on language-level knowledge, fail to capture dynamic, item-level user interests across domains. To bridge this gap, we propose RecBase, a domain-agnostic foundational model pretrained with a recommendation-oriented objective. RecBase leverages a large-scale, heterogeneous, cross-domain corpus with unified textual representations and feature mappings to enhance cross-domain generalization. To further align item semantics across domains, we introduce a unified item tokenizer that encodes items into hierarchical concept identifiers, enabling structured representation and efficient vocabulary sharing. The model is trained using an autoregressive objective to capture complex item-level sequential patterns. On eight real-world datasets, our 1.5B-parameter model matches or surpasses the performance of LLM baselines up to 7B parameters in zero-shot and cross-domain recommendation tasks.

[177] arXiv:2509.03136 [pdf, html, other]
Title: Adaptive KV-Cache Compression without Manually Setting Budget
Chenxia Tang, Jianchun Liu, Hongli Xu, Liusheng Huang
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)

Large language model (LLM) inference relies heavily on KV-caches to accelerate autoregressive decoding, but the resulting memory footprint grows rapidly with sequence length, posing significant efficiency challenges. Current KV-cache compression methods suffer from a Procrustes' bed problem: they force diverse workloads into fixed compression ratios, leading to suboptimal resource allocation and inference performance. To this end, we present GVote, an adaptive KV-cache compression scheme that eliminates manual budget specification while achieving superior accuracy-efficiency trade-offs. GVote operates on the principle that the important keys are the aggregation of keys required by future queries. The method predicts future query attention demands by Monte-Carlo-style sampling of potential queries and aggregating the selected keys to determine the optimal cache budget without manual specification. Experimental evaluation demonstrates GVote's effectiveness across multiple benchmarks, including GSM8K, RULER and LongBench. Compared to baselines, GVote exhibits a 2$\times$ memory reduction while maintaining higher or comparable accuracy.
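The sample-then-aggregate principle can be sketched as follows. This is only an illustration under stated assumptions: future queries are drawn from an isotropic Gaussian around a given mean, and each sampled query "votes" for the smallest set of keys that covers a fixed share of its attention mass. Neither the sampling distribution nor the selection rule is taken from the paper, and all function names are hypothetical.

```python
import math
import random

def attention_weights(query, keys):
    """Softmax of scaled dot-product scores between one query and all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def keys_needed(query, keys, mass=0.95):
    """Smallest set of key indices covering `mass` of this query's attention."""
    weights = attention_weights(query, keys)
    order = sorted(range(len(keys)), key=lambda i: weights[i], reverse=True)
    chosen, covered = set(), 0.0
    for i in order:
        chosen.add(i)
        covered += weights[i]
        if covered >= mass:
            break
    return chosen

def vote_keep_set(keys, query_mean, query_std, n_samples=16, mass=0.95, seed=0):
    """Union of the keys selected by sampled queries; its size is the budget."""
    rng = random.Random(seed)
    keep = set()
    for _ in range(n_samples):
        query = [rng.gauss(m, query_std) for m in query_mean]
        keep |= keys_needed(query, keys, mass)
    return keep

key_rng = random.Random(1)
keys = [[key_rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(32)]
keep = vote_keep_set(keys, query_mean=[0.5, 0.0, -0.5, 0.0], query_std=0.3)
budget = len(keep)  # cache budget emerges from the votes rather than being preset
```

In this sketch the budget adapts to the workload: sharply peaked attention yields a small keep set, while diffuse attention yields a larger one.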

[178] arXiv:2509.03137 [pdf, html, other]
Title: A Neural Network Approach to Multi-radionuclide TDCR Beta Spectroscopy
Li Yi, Qian Yang
Comments: 15 pages, 3 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Nuclear Experiment (nucl-ex); Computational Physics (physics.comp-ph); Instrumentation and Detectors (physics.ins-det)

Liquid scintillation triple-to-double coincidence ratio (TDCR) spectroscopy is widely adopted as a standard method for radionuclide quantification because of its inherent advantages such as high precision, self-calibrating capability, and independence from radioactive reference sources. However, multiradionuclide analysis via TDCR faces the challenges of limited automation and reliance on mixture-specific standards, which may not be easily available. Here, we present an Artificial Intelligence (AI) framework that combines numerical spectral simulation and deep learning for standard-free automated analysis. $\beta$ spectra for model training were generated using Geant4 simulations coupled with statistically modeled detector response sampling. A tailored neural network architecture, trained on this dataset covering various nuclide mixing ratios and quenching scenarios, enables autonomous resolution of individual radionuclide activities and detection efficiency through end-to-end learning paradigms. The model delivers consistent high accuracy across tasks: activity proportions (mean absolute error = 0.009), detection efficiencies (mean absolute error = 0.002), and spectral reconstruction (Structural Similarity Index = 0.9998), validating its physical plausibility for quenched $\beta$ spectroscopy. This AI-driven methodology exhibits significant potential for automated safety-compliant multiradionuclide analysis with robust generalization, real-time processing capabilities, and engineering feasibility, particularly in scenarios where reference materials are unavailable or rapid field analysis is required.

[179] arXiv:2509.03140 [pdf, html, other]
Title: Decentralised self-organisation of pivoting cube ensembles using geometric deep learning
Nadezhda Dobreva, Emmanuel Blazquez, Jai Grover, Dario Izzo, Yuzhen Qin, Dominik Dold
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Robotics (cs.RO)

We present a decentralized model for autonomous reconfiguration of homogeneous pivoting cube modular robots in two dimensions. Each cube in the ensemble is controlled by a neural network that only gains information from other cubes in its local neighborhood, trained using reinforcement learning. Furthermore, using geometric deep learning, we include the grid symmetries of the cube ensemble in the neural network architecture. We find that even the most localized versions succeed in reconfiguring to the target shape, although reconfiguration happens faster the more information about the whole ensemble is available to individual cubes. Near-optimal reconfiguration is achieved with only nearest neighbor interactions by using multiple rounds of information passing between cubes, allowing them to accumulate more global information about the ensemble. Compared to standard neural network architectures, using geometric deep learning approaches provided only minor benefits. Overall, we successfully demonstrate mostly local control of a modular self-assembling system, which is transferable to other space-relevant systems with different action spaces, such as sliding cube modular robots and CubeSat swarms.

[180] arXiv:2509.03141 [pdf, html, other]
Title: Temporally-Aware Diffusion Model for Brain Progression Modelling with Bidirectional Temporal Regularisation
Mattia Litrico, Francesco Guarnera, Mario Valerio Giuffrida, Daniele Ravì, Sebastiano Battiato
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Generating realistic MRIs to accurately predict future changes in the structure of the brain is an invaluable tool for clinicians in assessing clinical outcomes and analysing the disease progression at the patient level. However, existing methods present some limitations: (i) some approaches fail to explicitly capture the relationship between structural changes and time intervals, especially when trained on age-imbalanced datasets; (ii) others rely only on scan interpolation, which lacks clinical utility, as it generates intermediate images between timepoints rather than future pathological progression; and (iii) most approaches rely on 2D slice-based architectures, thereby disregarding full 3D anatomical context, which is essential for accurate longitudinal predictions. We propose a 3D Temporally-Aware Diffusion Model (TADM-3D), which accurately predicts brain progression on MRI volumes. To better model the relationship between time interval and brain changes, TADM-3D uses a pre-trained Brain-Age Estimator (BAE) that guides the diffusion model in the generation of MRIs that accurately reflect the expected age difference between baseline and generated follow-up scans. Additionally, to further improve the temporal awareness of TADM-3D, we propose the Back-In-Time Regularisation (BITR), by training TADM-3D to predict bidirectionally from the baseline to follow-up (forward), as well as from the follow-up to baseline (backward). Although predicting past scans has limited clinical applications, this regularisation helps the model generate temporally more accurate scans. We train and evaluate TADM-3D on the OASIS-3 dataset, and we validate the generalisation performance on an external test set from the NACC dataset. The code will be available upon acceptance.

[181] arXiv:2509.03143 [pdf, html, other]
Title: An experimental and computational study of an Estonian single-person word naming
Kaidi Lõo, Arvi Tavast, Maria Heitmeier, Harald Baayen
Subjects: Computation and Language (cs.CL)

This study investigates lexical processing in Estonian. A large-scale single-subject experiment is reported that combines the word naming task with eye-tracking. Five response variables (first fixation duration, total fixation duration, number of fixations, word naming latency, and spoken word duration) are analyzed with the generalized additive model. Of central interest is the question of whether measures for lexical processing generated by a computational model of the mental lexicon (the Discriminative Lexicon Model, DLM) are predictive for these response variables, and how they compare to classical predictors such as word frequency, neighborhood size, and inflectional paradigm size. Computational models were implemented both with linear and deep mappings. Central findings are, first, that DLM-based measures are powerful predictors for lexical processing, second, that DLM-measures using deep learning are not necessarily more precise predictors of lexical processing than DLM-measures using linear mappings, third, that classical predictors tend to provide somewhat more precise fits compared to DLM-based predictors (except for total fixation duration, where the two provide equivalent goodness of fit), and fourth, that in the naming task lexical variables are not predictive for first fixation duration and the total number of fixations. As the DLM works with mappings from form to meaning, the predictivity of DLM-based measures for total fixation duration, naming latencies, and spoken word duration indicates that meaning is heavily involved in the present word naming task.

[182] arXiv:2509.03145 [pdf, html, other]
Title: Efficient and Secure Sleepy Model for BFT Consensus
Pengkun Ren, Hai Dong, Zahir Tari, Pengcheng Zhang
Comments: Accepted to ESORICS 2025, 20 pages, 7 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Byzantine Fault Tolerant (BFT) consensus protocols for dynamically available systems face a critical challenge: balancing latency and security under fluctuating node participation. Existing solutions often require multiple rounds of voting per decision, leading to high latency or limited resilience to adversarial behavior. This paper presents a BFT protocol integrating a pre-commit mechanism with publicly verifiable secret sharing (PVSS) into message transmission. By binding users' identities to their messages through PVSS, our approach reduces communication rounds. Compared to other state-of-the-art methods, our protocol typically requires only four network delays (4$\Delta$) in common scenarios while being resilient to up to 1/2 adversarial participants. This integration enhances the efficiency and security of the protocol without compromising integrity. Theoretical analysis demonstrates the robustness of the protocol against Byzantine attacks. Experimental evaluations show that, compared to traditional BFT protocols, our protocol significantly prevents fork occurrences and improves chain stability. Furthermore, compared to the longest-chain protocol, our protocol maintains stability and lower latency in scenarios with moderate participation fluctuations.

[183] arXiv:2509.03148 [pdf, html, other]
Title: Expanding the WMT24++ Benchmark with Rumantsch Grischun, Sursilvan, Sutsilvan, Surmiran, Puter, and Vallader
Jannis Vamvas, Ignacio Pérez Prat, Not Battesta Soliva, Sandra Baltermia-Guetg, Andrina Beeli, Simona Beeli, Madlaina Capeder, Laura Decurtins, Gian Peder Gregori, Flavia Hobi, Gabriela Holderegger, Arina Lazzarini, Viviana Lazzarini, Walter Rosselli, Bettina Vital, Anna Rutkiewicz, Rico Sennrich
Comments: Submitted to WMT25 (Open Language Data Initiative Shared Task)
Subjects: Computation and Language (cs.CL)

The Romansh language, spoken in Switzerland, has limited resources for machine translation evaluation. In this paper, we present a benchmark for six varieties of Romansh: Rumantsch Grischun, a supra-regional variety, and five regional varieties: Sursilvan, Sutsilvan, Surmiran, Puter, and Vallader. Our reference translations were created by human translators based on the WMT24++ benchmark, which ensures parallelism with more than 55 other languages. An automatic evaluation of existing MT systems and LLMs shows that translation out of Romansh into German is handled relatively well for all the varieties, but translation into Romansh is still challenging.

[184] arXiv:2509.03151 [pdf, html, other]
Title: Convergence for adaptive resampling of random Fourier features
Xin Huang, Aku Kammonen, Anamika Pandey, Mattias Sandberg, Erik von Schwerin, Anders Szepessy, Raúl Tempone
Comments: 50 pages, 19 figures
Subjects: Numerical Analysis (math.NA); Machine Learning (stat.ML)

The machine learning random Fourier feature method for data in high dimension is computationally and theoretically attractive since the optimization is based on a convex standard least squares problem and independent sampling of Fourier frequencies. The challenge is to sample the Fourier frequencies well. This work proves convergence of a data adaptive method based on resampling the frequencies asymptotically optimally, as the number of nodes and amount of data tend to infinity. Numerical results based on resampling and adaptive random walk steps together with approximations of the least squares problem by conjugate gradient iterations confirm the analysis for regression and classification problems.
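
As a rough illustration of the non-adaptive baseline this abstract starts from, the sketch below draws Fourier frequencies independently and then solves the resulting convex least squares problem. The Gaussian sampling scale, the toy target, and the names `rff_features` and `fit_rff` are assumptions made for this example. The adaptive resampling analysed in the paper is not shown, and a direct solver stands in for the conjugate gradient iterations mentioned above.

```python
import numpy as np

def rff_features(x, omegas):
    """Map inputs x (n, d) to real Fourier features for frequencies omegas (K, d)."""
    proj = x @ omegas.T                      # (n, K) inner products omega_k . x
    return np.hstack([np.cos(proj), np.sin(proj)])

def fit_rff(x, y, omegas):
    """Fit the feature amplitudes by ordinary least squares (a convex problem)."""
    phi = rff_features(x, omegas)
    coef, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(400, 1))    # toy 1D inputs (assumption)
y = np.sin(3.0 * x[:, 0])                    # toy regression target (assumption)

# Independent sampling of frequencies; the scale 3.0 is an arbitrary choice.
omegas = rng.normal(scale=3.0, size=(50, 1))

coef = fit_rff(x, y, omegas)
pred = rff_features(x, omegas) @ coef
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
print(f"training RMSE: {rmse:.2e}")
```

How well such a fit works depends on where the frequencies fall, which is the sampling question the abstract identifies as the main challenge.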

[185] arXiv:2509.03154 [pdf, html, other]
Title: Preserving instance continuity and length in segmentation through connectivity-aware loss computation
Karol Szustakowski, Luk Frank, Julia Esser, Jan Gründemann, Marie Piraud
Comments: © 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In many biomedical segmentation tasks, the preservation of elongated structure continuity and length is more important than voxel-wise accuracy. We propose two novel loss functions, Negative Centerline Loss and Simplified Topology Loss, that, applied to Convolutional Neural Networks (CNNs), help preserve connectivity of output instances. Moreover, we discuss characteristics of experiment design, such as downscaling and spacing correction, that help obtain continuous segmentation masks. We evaluate our approach on a 3D light-sheet fluorescence microscopy dataset of axon initial segments (AIS), a task prone to discontinuity due to signal dropout. Compared to standard CNNs and existing topology-aware losses, our methods reduce the number of segmentation discontinuities per instance, particularly in regions with missing input signal, resulting in improved instance length calculation in downstream applications. Our findings demonstrate that structural priors embedded in the loss design can significantly enhance the reliability of segmentation for biological applications.

[186] arXiv:2509.03161 [pdf, html, other]
Title: Domain Adaptation of LLMs for Process Data
Rafael Seidi Oyamada, Jari Peeperkorn, Jochen De Weerdt, Johannes De Smedt
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In recent years, Large Language Models (LLMs) have emerged as a prominent area of interest across various research domains, including Process Mining (PM). Current applications in PM have predominantly centered on prompt engineering strategies or the transformation of event logs into narrative-style datasets, thereby exploiting the semantic capabilities of LLMs to address diverse tasks. In contrast, this study investigates the direct adaptation of pretrained LLMs to process data without natural language reformulation, motivated by the fact that these models excel in generating sequences of tokens, similar to the objective in PM. More specifically, we focus on parameter-efficient fine-tuning techniques to mitigate the computational overhead typically associated with such models. Our experimental setup focuses on Predictive Process Monitoring (PPM), and considers both single- and multi-task predictions. The results demonstrate a potential improvement in predictive performance over state-of-the-art recurrent neural network (RNN) approaches and recent narrative-style-based solutions, particularly in the multi-task setting. Additionally, our fine-tuned models exhibit faster convergence and require significantly less hyperparameter optimization.

[187] arXiv:2509.03162 [pdf, html, other]
Title: SinhalaMMLU: A Comprehensive Benchmark for Evaluating Multitask Language Understanding in Sinhala
Ashmari Pramodya, Nirasha Nelki, Heshan Shalinda, Chamila Liyanage, Yusuke Sakai, Randil Pushpananda, Ruvan Weerasinghe, Hidetaka Kamigaito, Taro Watanabe
Comments: 19 pages, 11 figures
Subjects: Computation and Language (cs.CL)

Large Language Models (LLMs) demonstrate impressive general knowledge and reasoning abilities, yet their evaluation has predominantly focused on global or anglocentric subjects, often neglecting low-resource languages and culturally specific content. While recent multilingual benchmarks attempt to bridge this gap, many rely on automatic translation, which can introduce errors and misrepresent the original cultural context. To address this, we introduce SinhalaMMLU, the first multiple-choice question answering benchmark designed specifically for Sinhala, a low-resource language. The dataset includes over 7,000 questions spanning secondary to collegiate education levels, aligned with the Sri Lankan national curriculum, and covers six domains and 30 subjects, encompassing both general academic topics and culturally grounded knowledge. We evaluate 26 LLMs on SinhalaMMLU and observe that, while Claude 3.5 Sonnet and GPT-4o achieve the highest average accuracies at 67% and 62% respectively, overall model performance remains limited. In particular, models struggle in culturally rich domains such as the Humanities, revealing substantial room for improvement in adapting LLMs to low-resource and culturally specific contexts.

[188] arXiv:2509.03164 [pdf, html, other]
Title: OPRA-Vis: Visual Analytics System to Assist Organization-Public Relationship Assessment with Large Language Models
Sangbong Yoo, Seongbum Seo, Chanyoung Yoon, Hyelim Lee, Jeong-Nam Kim, Chansoo Kim, Yun Jang, Takanori Fujiwara
Subjects: Human-Computer Interaction (cs.HC)

Analysis of public opinions collected from digital media helps organizations maintain positive relationships with the public. Such public relations (PR) analysis often involves assessing opinions, for example, measuring how strongly people trust an organization. Pre-trained Large Language Models (LLMs) hold great promise for supporting Organization-Public Relationship Assessment (OPRA) because they can map unstructured public text to OPRA dimensions and articulate rationales through prompting. However, adapting LLMs for PR analysis typically requires fine-tuning on large labeled datasets, which is both labor-intensive and knowledge-intensive, making it difficult for PR researchers to apply these models. In this paper, we present OPRA-Vis, a visual analytics system that leverages LLMs for OPRA without requiring extensive labeled data. Our framework employs Chain-of-Thought prompting to guide LLMs in analyzing public opinion data by incorporating PR expertise directly into the reasoning process. Furthermore, OPRA-Vis provides visualizations that reveal the clues and reasoning paths used by LLMs, enabling users to explore, critique, and refine model decisions. We demonstrate the effectiveness of OPRA-Vis through two real-world use cases and evaluate it quantitatively, through comparisons with alternative LLMs and prompting strategies, and qualitatively, through assessments of usability, effectiveness, and expert feedback.

[189] arXiv:2509.03168 [pdf, html, other]
Title: Target Enclosing Control for Nonholonomic Multi-Agent Systems with Connectivity Maintenance and Collision Avoidance
Boyin Zheng, Yahui Hao, Lu Liu
Subjects: Systems and Control (eess.SY)

This article addresses the moving target enclosing control problem for nonholonomic multi-agent systems with guaranteed network connectivity and collision avoidance. We propose a novel control scheme to handle distance constraints imposed by the agents' limited interaction ranges and collision-free thresholds. By leveraging a Henneberg construction method, we innovatively formulate the target enclosing requirements within an isostatic distance-based formation framework, facilitating the integration of distance constraints. Compared with existing results, our approach ensures the positive definiteness of the underlying rigidity matrix and does not require controlling the target's motion. To eliminate the occurrences of control singularities caused by nonholonomic constraints, we propose a fixed-time angular control law using barrier Lyapunov functions. Additionally, we develop a linear velocity control law using the prescribed performance control approach and transformed error constraints. We rigorously prove that our control laws enable the multi-agent system to asymptotically achieve the desired angular formation pattern around a moving target while satisfying the established distance constraints. Finally, a simulation example is provided to validate the effectiveness of the proposed method.

[190] arXiv:2509.03169 [pdf, html, other]
Title: Rashomon in the Streets: Explanation Ambiguity in Scene Understanding
Helge Spieker, Jørn Eirik Betten, Arnaud Gotlieb, Nadjib Lazaar, Nassim Belmecheri
Comments: AAAI 2025 Fall Symposium: AI Trustworthiness and Risk Assessment for Challenged Contexts (ATRACC)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Explainable AI (XAI) is essential for validating and trusting models in safety-critical applications like autonomous driving. However, the reliability of XAI is challenged by the Rashomon effect, where multiple, equally accurate models can offer divergent explanations for the same prediction. This paper provides the first empirical quantification of this effect for the task of action prediction in real-world driving scenes. Using Qualitative Explainable Graphs (QXGs) as a symbolic scene representation, we train Rashomon sets of two distinct model classes: interpretable, pair-based gradient boosting models and complex, graph-based Graph Neural Networks (GNNs). Using feature attribution methods, we measure the agreement of explanations both within and between these classes. Our results reveal significant explanation disagreement. Our findings suggest that explanation ambiguity is an inherent property of the problem, not just a modeling artifact.

[191] arXiv:2509.03170 [pdf, html, other]
Title: Count2Density: Crowd Density Estimation without Location-level Annotations
Mattia Litrico, Feng Chen, Michael Pound, Sotirios A Tsaftaris, Sebastiano Battiato, Mario Valerio Giuffrida
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Crowd density estimation is a well-known computer vision task aimed at estimating the density distribution of people in an image. The main challenge in this domain is the reliance on fine-grained location-level annotations (i.e., points placed on top of each individual) to train deep networks. Collecting such detailed annotations is tedious and time-consuming, and poses a significant barrier to scalability for real-world applications. To alleviate this burden, we present Count2Density: a novel pipeline designed to predict meaningful density maps containing quantitative spatial information using only count-level annotations (i.e., total number of people) during training. To achieve this, Count2Density generates pseudo-density maps leveraging past predictions stored in a Historical Map Bank, thereby reducing confirmation bias. This bank is initialised using an unsupervised saliency estimator to provide an initial spatial prior and is iteratively updated with an exponential moving average (EMA) of predicted density maps. These pseudo-density maps are obtained by sampling locations from estimated crowd areas using a hypergeometric distribution, with the number of samples determined by the count-level annotations. To further enhance the spatial awareness of the model, we add a self-supervised contrastive spatial regulariser to encourage similar feature representations within crowded regions while maximising dissimilarity with background regions. Experimental results demonstrate that our approach significantly outperforms cross-domain adaptation methods and achieves better results than recent state-of-the-art approaches in semi-supervised settings across several datasets. Additional analyses validate the effectiveness of each individual component of our pipeline, confirming the ability of Count2Density to effectively retrieve spatial information from count-level annotations and enabling accurate subregion counting.

[192] arXiv:2509.03171 [pdf, html, other]
Title: Plan More, Debug Less: Applying Metacognitive Theory to AI-Assisted Programming Education
Tung Phung, Heeryung Choi, Mengyan Wu, Adish Singla, Christopher Brooks
Comments: AIED'25 paper
Subjects: Computers and Society (cs.CY)

The growing adoption of generative AI in education highlights the need to integrate established pedagogical principles into AI-assisted learning environments. This study investigates the potential of metacognitive theory to inform AI-assisted programming education through a hint system designed around the metacognitive phases of planning, monitoring, and evaluation. Upon request, the system can provide three types of AI-generated hints--planning, debugging, and optimization--to guide students at different stages of problem-solving. Through a study with 102 students in an introductory data science programming course, we find that students perceive and engage with planning hints most highly, whereas optimization hints are rarely requested. We observe a consistent association between requesting planning hints and achieving higher grades across question difficulty and student competency. However, when facing harder tasks, students seek additional debugging but not more planning support. These insights contribute to the growing field of AI-assisted programming education by providing empirical evidence on the importance of pedagogical principles in AI-assisted learning.

[193] arXiv:2509.03176 [pdf, html, other]
Title: Systematic Evaluation of Attribution Methods: Eliminating Threshold Bias and Revealing Method-Dependent Performance Patterns
Serra Aksoy
Comments: 15 pages, 9 figures
Subjects: Machine Learning (cs.LG)

Attribution methods explain neural network predictions by identifying influential input features, but their evaluation suffers from threshold selection bias that can reverse method rankings and undermine conclusions. Current protocols binarize attribution maps at single thresholds, where threshold choice alone can alter rankings by over 200 percentage points. We address this flaw with a threshold-free framework that computes Area Under the Curve for Intersection over Union (AUC-IoU), capturing attribution quality across the full threshold spectrum. Evaluating seven attribution methods on dermatological imaging, we show single-threshold metrics yield contradictory results, while threshold-free evaluation provides reliable differentiation. XRAI achieves 31% improvement over LIME and 204% over vanilla Integrated Gradients, with size-stratified analysis revealing performance variations up to 269% across lesion scales. These findings establish methodological standards that eliminate evaluation artifacts and enable evidence-based method selection. The threshold-free framework provides both theoretical insight into attribution behavior and practical guidance for robust comparison in medical imaging and beyond.

[194] arXiv:2509.03179 [pdf, html, other]
Title: AutoDetect: Designing an Autoencoder-based Detection Method for Poisoning Attacks on Object Detection Applications in the Military Domain
Alma M. Liezenga, Stefan Wijnja, Puck de Haan, Niels W. T. Brink, Jip J. van Stijn, Yori Kamphuis, Klamer Schutte
Comments: To be presented at SPIE: Sensors + Imaging, Artificial Intelligence for Security and Defence Applications II
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Poisoning attacks pose an increasing threat to the security and robustness of Artificial Intelligence systems in the military domain. The widespread use of open-source datasets and pretrained models exacerbates this risk. Despite the severity of this threat, there is limited research on the application and detection of poisoning attacks on object detection systems. This is especially problematic in the military domain, where attacks can have grave consequences. In this work, we investigate both the effect of poisoning attacks on military object detectors in practice and the best approach to detect these attacks. To support this research, we create a small, custom dataset featuring military vehicles: MilCivVeh. We explore the vulnerability of military object detectors to poisoning attacks by implementing a modified version of the BadDet attack: a patch-based poisoning attack. We then assess its impact, finding that while a positive attack success rate is achievable, it requires a substantial portion of the data to be poisoned -- raising questions about its practical applicability. To address the detection challenge, we test both specialized poisoning detection methods and anomaly detection methods from the visual industrial inspection domain. Since our research shows that both classes of methods are lacking, we introduce our own patch detection method: AutoDetect, a simple, fast, and lightweight autoencoder-based method. Our method shows promising results in separating clean from poisoned samples using the reconstruction error of image slices, outperforming existing methods, while being less time- and memory-intensive. We urge that the availability of large, representative datasets in the military domain is a prerequisite to further evaluate risks of poisoning attacks and opportunities for patch detection.

[195] arXiv:2509.03181 [pdf, other]
Title: Beyond Words: Interjection Classification for Improved Human-Computer Interaction
Yaniv Goren, Yuval Cohen, Alexander Apartsin, Yehudit Aperstein
Comments: 9 pages
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

In the realm of human-computer interaction, fostering a natural dialogue between humans and machines is paramount. A key, often overlooked, component of this dialogue is the use of interjections such as "mmm" and "hmm". Despite their frequent use to express agreement, hesitation, or requests for information, these interjections are typically dismissed as "non-words" by Automatic Speech Recognition (ASR) engines. Addressing this gap, we introduce a novel task dedicated to interjection classification, a pioneer in the field to our knowledge. This task is challenging due to the short duration of interjection signals and significant inter- and intra-speaker variability. In this work, we present and publish a dataset of interjection signals collected specifically for interjection classification. We employ this dataset to train and evaluate a baseline deep learning model. To enhance performance, we augment the training dataset using techniques such as tempo and pitch transformation, which significantly improve classification accuracy, making models more robust. The interjection dataset, a Python library for the augmentation pipeline, baseline model, and evaluation scripts, are available to the research community.

[196] arXiv:2509.03185 [pdf, html, other]
Title: PPORLD-EDNetLDCT: A Proximal Policy Optimization-Based Reinforcement Learning Framework for Adaptive Low-Dose CT Denoising
Debopom Sutradhar, Ripon Kumar Debnath, Mohaimenul Azam Khan Raiaan, Yan Zhang, Reem E. Mohamed, Sami Azam
Comments: 20 pages, 5 figures, 5 tables. Submitted to Computers in Biology and Medicine for peer review
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Low-dose computed tomography (LDCT) is critical for minimizing radiation exposure, but it often leads to increased noise and reduced image quality. Traditional denoising methods, such as iterative optimization or supervised learning, often fail to preserve image quality. To address these challenges, we introduce PPORLD-EDNetLDCT, a reinforcement learning (RL)-based approach with an Encoder-Decoder for LDCT. Our method utilizes a dynamic RL-based approach in which an advanced proximal policy optimization (PPO) algorithm is used to optimize denoising policies in real time, based on image quality feedback, trained via a custom gym environment. The experimental results on the low dose CT image and projection dataset demonstrate that the proposed PPORLD-EDNetLDCT model outperforms traditional denoising techniques and other DL-based methods, achieving a peak signal-to-noise ratio (PSNR) of 41.87, a structural similarity index measure (SSIM) of 0.9814 and a root mean squared error (RMSE) of 0.00236. Moreover, on the NIH-AAPM-Mayo Clinic Low Dose CT Challenge dataset our method achieved a PSNR of 41.52, SSIM of 0.9723 and RMSE of 0.0051. Furthermore, we validated the quality of denoising using a classification task on the COVID-19 LDCT dataset, where the images processed by our method improved the classification accuracy to 94%, achieving 4% higher accuracy than without RL-based denoising. This method offers a promising solution for safer and more accurate LDCT imaging.

[197] arXiv:2509.03187 [pdf, html, other]
Title: Enhancing Interpretability and Effectiveness in Recommendation with Numerical Features via Learning to Contrast the Counterfactual samples
Xiaoxiao Xu, Hao Wu, Wenhui Yu, Lantao Hu, Peng Jiang, Kun Gai
Comments: Accepted by TheWebConf2024
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)

We propose a general model-agnostic Contrastive learning framework with Counterfactual Samples Synthesizing (CCSS) for modeling the monotonicity between the neural network output and numerical features, which is critical for the interpretability and effectiveness of recommender systems. CCSS models the monotonicity via a two-stage process: synthesizing counterfactual samples and contrasting the counterfactual samples. The two techniques are naturally integrated into a model-agnostic framework, forming an end-to-end training process. Extensive empirical tests are conducted on a publicly available dataset and a real industrial dataset, and the results demonstrate the effectiveness of our proposed CCSS. In addition, CCSS has been deployed in our real large-scale industrial recommender, successfully serving hundreds of millions of users.

[198] arXiv:2509.03191 [pdf, html, other]
Title: Tabular foundation model for GEOAI benchmark problems BM/AirportSoilProperties/2/2025
Taiga Saito, Yu Otake, Stephen Wu
Subjects: Machine Learning (cs.LG)

This paper presents a novel application of the Tabular Prior-Data Fitted Network (TabPFN) - a transformer-based foundation model for tabular data - to geotechnical site characterization problems defined in the GEOAI benchmark BM/AirportSoilProperties/2/2025. Two tasks are addressed: (1) predicting the spatial variation of undrained shear strength (su) across borehole depth profiles, and (2) imputing missing mechanical parameters in a dense-site dataset. We apply TabPFN in a zero-training, few-shot, in-context learning setting - without hyper-parameter tuning - and provide it with additional context from the big indirect database (BID). The study demonstrates that TabPFN, as a general-purpose foundation model, achieved superior accuracy and well-calibrated predictive distributions compared to a conventional hierarchical Bayesian model (HBM) baseline, while also offering significant gains in inference efficiency. In Benchmark Problem #1 (spatial su prediction), TabPFN outperformed the HBM in prediction accuracy and delivered an order-of-magnitude faster runtime. In Benchmark Problem #2 (missing mechanical parameter imputation), TabPFN likewise achieved lower RMSE for all target parameters with well-quantified uncertainties, though its cumulative computation cost was higher than HBM's due to its one-variable-at-a-time inference. These results mark the first successful use of a tabular foundation model in geotechnical modeling, suggesting a potential paradigm shift in probabilistic site characterization.

[199] arXiv:2509.03198 [pdf, html, other]
Title: Efficient QR-based Column Subset Selection through Randomized Sparse Embeddings
Israa Fakih, Laura Grigori
Subjects: Numerical Analysis (math.NA)

In this paper, we introduce an efficient algorithm for column subset selection that combines the column-pivoted QR factorization with sparse subspace embeddings. The proposed method, SE-QRSC, is particularly effective for wide matrices with significantly more columns than rows. Starting from a matrix $A$, the algorithm selects $k$ columns from the sketched matrix $B = A \Omega^T$, where $\Omega$ is a sparse subspace embedding of $\mathrm{range}(A^T)$. The sparsity structure of $\Omega$ is then exploited to map the selected pivots back to the corresponding columns of $A$, which are then used to produce the final subset of selected columns. We prove that this procedure yields a factorization with strong rank-revealing properties, thus revealing the spectrum of $A$. The resulting bounds exhibit a reduced dependence on the number of columns of $A$ compared to those obtained from the strong rank-revealing QR factorization of $A$. Moreover, when the leverage scores are known, such as for orthogonal matrices, or can be efficiently approximated, the bounds become entirely independent of the column dimension. For general matrices, the algorithm can be extended by first applying an additional subspace embedding of $\mathrm{range}(A)$.
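
The sketch-then-map-back pipeline described above can be illustrated roughly as follows. This is a hypothetical toy version, not the SE-QRSC algorithm itself: it assumes $\Omega$ is a CountSketch-style embedding with one signed nonzero per column, uses a plain greedy pivoted Gram-Schmidt in place of a library column-pivoted QR, and maps pivots back by re-running the pivoting on the columns of $A$ hashed to the selected buckets. All function names are made up for this example.

```python
import numpy as np

def qrcp_pivots(m, k):
    """First k pivot indices of column-pivoted QR, via greedy Gram-Schmidt."""
    r = np.array(m, dtype=float)             # working copy of the residual
    pivots = []
    for _ in range(k):
        norms = np.linalg.norm(r, axis=0)
        j = int(np.argmax(norms))            # column with largest residual norm
        if norms[j] == 0.0:                  # rank exhausted
            break
        pivots.append(j)
        q = r[:, j] / norms[j]
        r -= np.outer(q, q @ r)              # deflate the chosen direction
    return pivots

def sketched_column_selection(a, k, d, rng):
    """Pick k columns of a wide matrix a using a d-column sparse sketch."""
    m, n = a.shape
    buckets = rng.integers(0, d, size=n)     # row of Omega holding column i's nonzero
    signs = rng.choice([-1.0, 1.0], size=n)  # value of that nonzero
    bt = np.zeros((d, m))
    np.add.at(bt, buckets, (a * signs).T)    # bt.T equals B = A Omega^T
    bucket_pivots = qrcp_pivots(bt.T, k)     # pivot on the small sketched matrix
    # Map back: the columns of A that were hashed into the selected buckets.
    candidates = np.flatnonzero(np.isin(buckets, bucket_pivots))
    local = qrcp_pivots(a[:, candidates], k) # final choice among the candidates
    return candidates[local]

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 200))            # wide toy matrix: 8 rows, 200 columns
cols = sketched_column_selection(a, k=4, d=32, rng=rng)
print(cols)
```

The point of the design is that the expensive pivoting runs on a matrix with `d` columns and then on a small candidate set, rather than on all `n` columns of the wide matrix.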

[200] arXiv:2509.03199 [pdf, html, other]
Title: Finding My Way: Influence of Different Audio Augmented Reality Navigation Cues on User Experience and Subjective Usefulness
Sina Hinzmann, Francesco Vona, Juliane Henning, Mohamed Amer, Omar Abdellatif, Tanja Kojic, Jan-Niklas Voigt-Antons
Subjects: Human-Computer Interaction (cs.HC)

As augmented reality (AR) becomes increasingly prevalent in mobile and context-aware applications, the role of auditory cues in guiding users through physical environments is becoming critical. This study investigates the effectiveness and user experience of various categories of audio cues, including fully non-verbal sounds and speech-derived Spearcons, during outdoor navigation tasks using the Meta Quest 3 headset. Twenty participants navigated five outdoor routes using audio-only cue types: Artificial Sounds, Nature Sounds, Spearcons, Musical Instruments, and Auditory Icons. Subjective evaluations were collected to assess the perceived effectiveness and user experience of each sound type. Results revealed significant differences in perceived novelty and stimulation across sound types. Artificial Sounds and Musical Instruments were rated higher than Spearcons in novelty, while Artificial Sounds were also rated higher than Spearcons in stimulation. Overall preference was evenly split between Nature Sounds and Artificial Sounds. These findings suggest that incorporating aspects of novelty and user engagement in auditory feedback design may enhance the effectiveness of AR navigation systems.

[201] arXiv:2509.03201 [pdf, html, other]
Title: CapsBeam: Accelerating Capsule Network based Beamformer for Ultrasound Non-Steered Plane Wave Imaging on Field Programmable Gate Array
Abdul Rahoof, Vivek Chaturvedi, Mahesh Raveendranatha Panicker, Muhammad Shafique
Journal-ref: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, issue. 7, pp. 1934-1944, 2025
Subjects: Hardware Architecture (cs.AR)

In recent years, there has been a growing trend in accelerating computationally complex non-real-time beamforming algorithms in ultrasound imaging using deep learning models. However, due to their large size and complexity, these state-of-the-art deep learning techniques pose significant challenges when deployed on resource-constrained edge devices. In this work, we propose a novel capsule network based beamformer called CapsBeam, designed to operate on raw radio-frequency data and provide an envelope of beamformed data through non-steered plane wave insonification. In experiments on in-vivo data, CapsBeam reduced artifacts compared to the standard Delay-and-Sum (DAS) beamforming. For in-vitro data, CapsBeam demonstrated a 32.31% increase in contrast, along with gains of 16.54% and 6.7% in axial and lateral resolution compared to the DAS. Similarly, in-silico data showed a 26% enhancement in contrast, along with improvements of 13.6% and 21.5% in axial and lateral resolution, respectively, compared to the DAS. To reduce the parameter redundancy and enhance the computational efficiency, we pruned the model using our multi-layer LookAhead Kernel Pruning (LAKP-ML) methodology, achieving a compression ratio of 85% without affecting the image quality. Additionally, the hardware complexity of the proposed model is reduced by applying quantization, simplification of non-linear operations, and parallelizing operations. Finally, we propose a specialized accelerator architecture for the pruned and optimized CapsBeam model, implemented on a Xilinx ZU7EV FPGA. The proposed accelerator achieved a throughput of 30 GOPS for the convolution operation and 17.4 GOPS for the dynamic routing operation.

[202] arXiv:2509.03204 [pdf, html, other]
Title: Exploring the Design Space of Fair Tree Learning Algorithms
Kiara Stempel, Mattia Cerrato, Stefan Kramer
Subjects: Machine Learning (cs.LG)

Decision trees have been studied extensively in the context of fairness, aiming to maximize prediction performance while ensuring non-discrimination against different groups. Techniques in this space usually focus on imposing constraints at training time, constraining the search space so that solutions which display unacceptable values of relevant metrics are not considered, discarded, or discouraged. If we assume one target variable y and one sensitive attribute s, the design space of tree learning algorithms can be spanned as follows: (i) One can have one tree T that is built using an objective function that is a function of y, s, and T. For instance, one can build a tree based on the weighted information gain regarding y (maximizing) and s (minimizing). (ii) The second option is to have one tree model T that uses an objective function in y and T and a constraint on s and T. Here, s is no longer part of the objective, but part of a constraint. This can be achieved greedily by aborting a further split as soon as the condition that optimizes the objective in y fails to satisfy the constraint on s. A simple way to explore other splits is to backtrack during tree construction once a fairness constraint is violated. (iii) The third option is to have two trees T_y and T_s, one for y and one for s, such that the tree structure for y and s does not have to be shared. In this way, information regarding y and regarding s can be used independently, without having to constrain the choices in tree construction by the mutual information between the two variables. Quite surprisingly, of the three options, only the first one and the greedy variant of the second have been studied in the literature so far. In this paper, we introduce the above two additional options from that design space and characterize them experimentally on multiple datasets.
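
Option (i) above can be illustrated with a small sketch. The abstract does not give the exact objective, so the difference form below (gain on y minus a weight `lam` times gain on s) and the names `fair_split_score` and `lam` are assumptions made for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, mask):
    """Entropy reduction in `labels` when rows are split by the boolean `mask`."""
    left = [v for v, m in zip(labels, mask) if m]
    right = [v for v, m in zip(labels, mask) if not m]
    n = len(labels)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children


def fair_split_score(y, s, mask, lam=1.0):
    """Hypothetical objective: reward gain on target y, penalize gain on sensitive s."""
    return information_gain(y, mask) - lam * information_gain(s, mask)


# Toy data: four rows, target y and sensitive attribute s.
y = [1, 1, 0, 0]
s = [0, 1, 0, 1]
split_on_y = [True, True, False, False]  # separates y, carries no information on s
split_on_s = [True, False, True, False]  # separates s, carries no information on y
score_a = fair_split_score(y, s, split_on_y)
score_b = fair_split_score(y, s, split_on_s)
print(score_a, score_b)
```

A greedy tree builder would pick the candidate split with the highest score, so the first split is preferred here. Option (ii) would instead maximize `information_gain(y, mask)` alone and reject any split whose gain on s exceeds a threshold.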

[203] arXiv:2509.03206 [pdf, html, other]
Title: Autonomous Learning From Success and Failure: Goal-Conditioned Supervised Learning with Negative Feedback
Zeqiang Zhang, Fabian Wurzberger, Gerrit Schmid, Sebastian Gottwald, Daniel A. Braun
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Reinforcement learning faces significant challenges when applied to tasks characterized by sparse reward structures. Although imitation learning, within the domain of supervised learning, offers faster convergence, it relies heavily on human-generated demonstrations. Recently, Goal-Conditioned Supervised Learning (GCSL) has emerged as a potential solution by enabling self-imitation learning for autonomous systems. By strategically relabelling goals, agents can derive policy insights from their own experiences. Despite the successes of this framework, it presents two notable limitations: (1) Learning exclusively from self-generated experiences can exacerbate the agents' inherent biases; (2) The relabelling strategy allows agents to focus solely on successful outcomes, precluding them from learning from their mistakes. To address these issues, we propose a novel model that integrates contrastive learning principles into the GCSL framework to learn from both success and failure. Through empirical evaluations, we demonstrate that our algorithm overcomes limitations imposed by agents' initial biases and thereby enables more exploratory behavior. This facilitates the identification and adoption of effective policies, leading to superior performance across a variety of challenging environments.

[204] arXiv:2509.03211 [pdf, html, other]
Title: Efficient Active Training for Deep LiDAR Odometry
Beibei Zhou, Zhiyuan Zhang, Zhenbo Song, Jianhui Guo, Hui Kong
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Robust and efficient deep LiDAR odometry models are crucial for accurate localization and 3D reconstruction, but typically require extensive and diverse training data to adapt to diverse environments, leading to inefficiencies. To tackle this, we introduce an active training framework designed to selectively extract training data from diverse environments, thereby reducing the training load and enhancing model generalization. Our framework is based on two key strategies: Initial Training Set Selection (ITSS) and Active Incremental Selection (AIS). ITSS begins by breaking down motion sequences from general weather into nodes and edges for detailed trajectory analysis, prioritizing diverse sequences to form a rich initial training dataset for training the base model. For complex sequences that are difficult to analyze, especially under challenging snowy weather conditions, AIS uses scene reconstruction and prediction inconsistency to iteratively select training samples, refining the model to handle a wide range of real-world scenarios. Experiments across datasets and weather conditions validate our approach's effectiveness. Notably, our method matches the performance of full-dataset training with just 52% of the sequence volume, demonstrating the training efficiency and robustness of our active training paradigm. By optimizing the training process, our approach sets the stage for more agile and reliable LiDAR odometry systems, capable of navigating diverse environmental conditions with greater precision.

[205] arXiv:2509.03212 [pdf, html, other]
Title: AIVA: An AI-based Virtual Companion for Emotion-aware Interaction
Chenxi Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent advances in Large Language Models (LLMs) have significantly improved natural language understanding and generation, enhancing Human-Computer Interaction (HCI). However, LLMs are limited to unimodal text processing and lack the ability to interpret emotional cues from non-verbal signals, hindering more immersive and empathetic interactions. This work explores integrating multimodal sentiment perception into LLMs to create emotion-aware agents. We propose AIVA, an AI-based virtual companion that captures multimodal sentiment cues, enabling emotionally aligned and animated HCI. AIVA introduces a Multimodal Sentiment Perception Network (MSPN) using a cross-modal fusion transformer and supervised contrastive learning to provide emotional cues. Additionally, we develop an emotion-aware prompt engineering strategy for generating empathetic responses and integrate a Text-to-Speech (TTS) system and animated avatar module for expressive interactions. AIVA provides a framework for emotion-aware agents with applications in companion robotics, social care, mental health, and human-centered AI.

[206] arXiv:2509.03214 [pdf, html, other]
Title: RTGMFF: Enhanced fMRI-based Brain Disorder Diagnosis via ROI-driven Text Generation and Multimodal Feature Fusion
Junhao Jia, Yifei Sun, Yunyou Liu, Cheng Yang, Changmiao Wang, Feiwei Qin, Yong Peng, Wenwen Min
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Functional magnetic resonance imaging (fMRI) is a powerful tool for probing brain function, yet reliable clinical diagnosis is hampered by low signal-to-noise ratios, inter-subject variability, and the limited frequency awareness of prevailing CNN- and Transformer-based models. Moreover, most fMRI datasets lack textual annotations that could contextualize regional activation and connectivity patterns. We introduce RTGMFF, a framework that unifies automatic ROI-level text generation with multimodal feature fusion for brain-disorder diagnosis. RTGMFF consists of three components: (i) ROI-driven fMRI text generation deterministically condenses each subject's activation, connectivity, age, and sex into reproducible text tokens; (ii) Hybrid frequency-spatial encoder fuses a hierarchical wavelet-mamba branch with a cross-scale Transformer encoder to capture frequency-domain structure alongside long-range spatial dependencies; and (iii) Adaptive semantic alignment module embeds the ROI token sequence and visual features in a shared space, using a regularized cosine-similarity loss to narrow the modality gap. Extensive experiments on the ADHD-200 and ABIDE benchmarks show that RTGMFF surpasses current methods in diagnostic accuracy, achieving notable gains in sensitivity, specificity, and area under the ROC curve. Code is available at this https URL.

[207] arXiv:2509.03215 [pdf, html, other]
Title: Triangle Detection in Worst-Case Sparse Graphs via Local Sketching
Hongyi Duan, Jian'an Zhang
Comments: Work in progress. Several technical details remain to be fully verified; comments and corrections are appreciated
Subjects: Data Structures and Algorithms (cs.DS); Computational Geometry (cs.CG)

We present a non-algebraic, locality-preserving framework for triangle detection in worst-case sparse graphs. Our algorithm processes the graph in $O(\log n)$ independent layers and partitions incident edges into prefix-based classes where each class maintains a 1-sparse triple over a prime field. Potential witnesses are surfaced by pair-key (PK) alignment, and every candidate is verified by a three-stage, zero-false-positive pipeline: a class-level 1-sparse consistency check, two slot-level decodings, and a final adjacency confirmation. To obtain single-run high-probability coverage, we further instantiate $R=c_G\log n$ independent PK groups per class (each probing a constant number of complementary buckets), which amplifies the per-layer hit rate from $\Theta(1/\log n)$ to $1-n^{-\Omega(1)}$ without changing the accounting. A one-shot pairing discipline and class-term triggering yield a per-(layer,level) accounting bound of $O(m)$, while keep-coin concentration ensures that each vertex retains only $O(d^+(x))$ keys with high probability. Consequently, the total running time is $O(m\log^2 n)$ and the peak space is $O(m\log n)$, both with high probability. The algorithm emits a succinct Seeds+Logs artifact that enables a third party to replay all necessary checks and certify a NO-instance in $\tilde O(m\log n)$ time. We also prove a $\Theta(1/\log n)$ hit-rate lower bound for any single PK family under a constant-probe local model (via Yao)--motivating the use of $\Theta(\log n)$ independent groups--and discuss why global algebraic convolutions would break near-linear accounting or run into fine-grained barriers. We outline measured paths toward Las Vegas $O(m\log n)$ and deterministic near-linear variants.

[208] arXiv:2509.03219 [pdf, html, other]
Title: Uncertainty-driven Adaptive Exploration
Leonidas Bakopoulos, Georgios Chalkiadakis
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Adaptive exploration methods propose ways to learn complex policies via alternating between exploration and exploitation. An important question for such methods is to determine the appropriate moment to switch between exploration and exploitation and vice versa. This is critical in domains that require the learning of long and complex sequences of actions. In this work, we present a generic adaptive exploration framework that employs uncertainty to address this important issue in a principled manner. Our framework includes previous adaptive exploration approaches as special cases. Moreover, we can incorporate in our framework any uncertainty-measuring mechanism of choice, for instance mechanisms used in intrinsic motivation or epistemic uncertainty-based exploration methods. We experimentally demonstrate that our framework gives rise to adaptive exploration strategies that outperform standard ones across several MuJoCo environments.

[209] arXiv:2509.03221 [pdf, html, other]
Title: LGBP-OrgaNet: Learnable Gaussian Band Pass Fusion of CNN and Transformer Features for Robust Organoid Segmentation and Tracking
Jing Zhang, Siying Tao, Jiao Li, Tianhe Wang, Junchen Wu, Ruqian Hao, Xiaohui Du, Ruirong Tan, Rui Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Organoids replicate organ structure and function, playing a crucial role in fields such as tumor treatment and drug screening. Their shape and size can indicate their developmental status, but traditional fluorescence labeling methods risk compromising their structure. Therefore, this paper proposes an automated, non-destructive approach to organoid segmentation and tracking. We introduce LGBP-OrgaNet, a deep learning-based system that accurately segments, tracks, and quantifies organoids. The model leverages complementary information extracted from CNN and Transformer modules and introduces a feature fusion module, Learnable Gaussian Band Pass Fusion, to merge data from the two branches. Additionally, in the decoder, the model uses a Bidirectional Cross Fusion Block to fuse multi-scale features, and finally completes the decoding through progressive concatenation and upsampling. SROrga demonstrates satisfactory segmentation accuracy and robustness on organoid segmentation datasets, providing a potent tool for organoid research.

[210] arXiv:2509.03222 [pdf, html, other]
Title: The Role of Embodiment in Intuitive Whole-Body Teleoperation for Mobile Manipulation
Sophia Bianchi Moyen, Rickmer Krohn, Sophie Lueth, Kay Pompetzki, Jan Peters, Vignesh Prasad, Georgia Chalvatzaki
Comments: 8 pages, 8 figures, Accepted at the IEEE-RAS International Conference on Humanoid Robots (Humanoids) 2025
Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Intuitive teleoperation interfaces are essential for mobile manipulation robots to ensure high-quality data collection while reducing operator workload. A strong sense of embodiment combined with minimal physical and cognitive demands not only enhances the user experience during large-scale data collection, but also helps maintain data quality over extended periods. This becomes especially crucial for challenging long-horizon mobile manipulation tasks that require whole-body coordination. We compare two distinct robot control paradigms: a coupled embodiment integrating arm manipulation and base navigation functions, and a decoupled embodiment treating these systems as separate control entities. Additionally, we evaluate two visual feedback mechanisms: immersive virtual reality and conventional screen-based visualization of the robot's field of view. These configurations were systematically assessed across a complex, multi-stage task sequence requiring integrated planning and execution. Our results show that the use of VR as a feedback modality increases task completion time, cognitive workload, and perceived effort of the teleoperator. Coupling manipulation and navigation leads to a comparable workload on the user as decoupling the embodiments, while preliminary experiments suggest that data acquired by coupled teleoperation leads to better imitation learning performance. Our holistic view on intuitive teleoperation interfaces provides valuable insight into collecting high-quality, high-dimensional mobile manipulation data at scale with the human operator in mind. Project website: this https URL

[211] arXiv:2509.03226 [pdf, html, other]
Title: BAMG: A Block-Aware Monotonic Graph Index for Disk-Based Approximate Nearest Neighbor Search
Huiling Li, Jianliang Xu
Subjects: Databases (cs.DB)

Approximate Nearest Neighbor Search (ANNS) over high-dimensional vectors is a foundational problem in databases, where disk I/O often emerges as the dominant performance bottleneck at scale. Existing graph indexing solutions for disk-based ANNS typically either optimize the storage layout for a given graph or construct the graph independently of the storage layout, thus overlooking their interaction. In this paper, we propose the Block-aware Monotonic Relative Neighborhood Graph (BMRNG), a novel graph structure that jointly considers both geometric distance and storage layout for edge selection, theoretically guaranteeing the existence of I/O monotonic search paths. To address the scalability challenge of BMRNG construction, we further develop a practical and efficient variant, the Block-Aware Monotonic Graph (BAMG), which can be constructed in linear time from a monotonic graph considering the storage layout. BAMG integrates block-aware edge pruning with a decoupled storage design that separates raw vectors from the graph index, thereby maximizing block utilization and minimizing redundant disk reads. Additionally, we design a multi-layer navigation graph for adaptive and efficient query entry, along with a block-first search algorithm that prioritizes intra-block traversal to fully exploit each disk I/O operation. Extensive experiments on real-world datasets demonstrate that BAMG achieves up to 2.1x higher throughput and reduces I/O reads by up to 52% compared to state-of-the-art methods, while maintaining comparable recall.

[212] arXiv:2509.03228 [pdf, html, other]
Title: NeurStore: Efficient In-database Deep Learning Model Management System
Siqi Xiang, Sheng Wang, Xiaokui Xiao, Cong Yue, Zhanhao Zhao, Beng Chin Ooi
Comments: 15 pages, 14 figures, Accepted at SIGMOD 2026
Subjects: Databases (cs.DB); Machine Learning (cs.LG)

With the prevalence of in-database AI-powered analytics, there is an increasing demand for database systems to efficiently manage the ever-expanding number and size of deep learning models. However, existing database systems typically store entire models as monolithic files or apply compression techniques that overlook the structural characteristics of deep learning models, resulting in suboptimal model storage overhead. This paper presents NeurStore, a novel in-database model management system that enables efficient storage and utilization of deep learning models. First, NeurStore employs a tensor-based model storage engine to enable fine-grained model storage within databases. In particular, we enhance the hierarchical navigable small world (HNSW) graph to index tensors, and only store additional deltas for tensors within a predefined similarity threshold to ensure tensor-level deduplication. Second, we propose a delta quantization algorithm that effectively compresses delta tensors, thus achieving a superior compression ratio with controllable model accuracy loss. Finally, we devise a compression-aware model loading mechanism, which improves model utilization performance by enabling direct computation on compressed tensors. Experimental evaluations demonstrate that NeurStore achieves superior compression ratios and competitive model loading throughput compared to state-of-the-art approaches.
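
The idea of storing a tensor as a compressed delta against a similar base tensor can be sketched as follows. The abstract does not describe NeurStore's actual delta quantization algorithm, so this uses plain uniform quantization as a stand-in; the function names, the `bits` parameter, and the flat-list tensor representation are all hypothetical.

```python
def quantize_delta(base, tensor, bits=8):
    """Encode `tensor` as integer codes for its offset from `base` (uniform grid)."""
    delta = [t - b for t, b in zip(tensor, base)]
    lo, hi = min(delta), max(delta)
    levels = (1 << bits) - 1
    # Step size of the grid; fall back to 1.0 when all deltas are identical.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((d - lo) / scale) for d in delta]
    return codes, lo, scale


def dequantize_delta(base, codes, lo, scale):
    """Reconstruct an approximation of the original tensor from base plus codes."""
    return [b + lo + c * scale for b, c in zip(base, codes)]


# Toy example: a stored base tensor and a near-duplicate to be deduplicated.
base = [0.10, -0.20, 0.30, 0.05]
tensor = [0.11, -0.19, 0.28, 0.05]
codes, lo, scale = quantize_delta(base, tensor, bits=8)
restored = dequantize_delta(base, codes, lo, scale)
max_error = max(abs(r - t) for r, t in zip(restored, tensor))
print(codes, max_error)
```

With uniform rounding the reconstruction error per element is at most half a grid step, which is one simple way to make the accuracy loss controllable through `bits`.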

[213] arXiv:2509.03231 [pdf, other]
Title: Exploring persuasive Interactions with generative social robots: An experimental framework
Stephan Vonschallen, Larissa Julia Corina Finsler, Theresa Schmiedel, Friederike Eyssel
Comments: A shortened version of this paper was accepted as poster for the Thirteenth International Conference on Human-Agent Interaction (HAI2025)
Subjects: Robotics (cs.RO)

Integrating generative AI such as large language models into social robots has improved their ability to engage in natural, human-like communication. This study presents a method to examine their persuasive capabilities. We designed an experimental framework focused on decision making and tested it in a pilot that varied robot appearance and self-knowledge. Using qualitative analysis, we evaluated interaction quality, persuasion effectiveness, and the robot's communicative strategies. Participants generally experienced the interaction positively, describing the robot as competent, friendly, and supportive, while noting practical limits such as delayed responses and occasional speech-recognition errors. Persuasiveness was highly context dependent and shaped by robot behavior: participants responded well to polite, reasoned suggestions and expressive gestures, but emphasized the need for more personalized, context-aware arguments and clearer social roles. These findings suggest that generative social robots can influence user decisions, but their effectiveness depends on communicative nuance and contextual relevance. We propose refinements to the framework to further study persuasive dynamics between robots and human users.

[214] arXiv:2509.03232 [pdf, html, other]
Title: Card Sorting with Fewer Cards and the Same Mental Models? A Re-examination of an Established Practice
Eduard Kuric, Peter Demcak, Matus Krajcovic
Subjects: Human-Computer Interaction (cs.HC)

To keep card sorting with a lot of cards concise, a common strategy for gauging mental models involves presenting participants with fewer randomly selected cards instead of the full set. This is a decades-old practice, but its effects have lacked systematic examination. To assess how randomized subsets affect data, we conducted an experiment with 160 participants. We compared results between full and randomized 60% card sets, then analyzed sample size requirements and the impacts of individual personality and cognitive factors. Our results demonstrate that randomized subsets can yield comparable similarity matrices to standard card sorting, but thematic patterns in categories can differ. Increased data variability also warrants larger sample sizes (25-35 for a 60% card subset). Results indicate that personality traits and cognitive reflection interact with card sorting. Our research suggests evidence-based practices for conducting card sorting while exposing the influence of study design and individual differences on measurement of mental models.

[215] arXiv:2509.03234 [pdf, html, other]
Title: TeRA: Vector-based Random Tensor Network for High-Rank Adaptation of Large Language Models
Yuxuan Gu, Wuyang Zhou, Giorgos Iacovides, Danilo Mandic
Subjects: Machine Learning (cs.LG)

Parameter-Efficient Fine-Tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), have significantly reduced the number of trainable parameters needed in fine-tuning large language models (LLMs). Subsequent developments of LoRA-style adapters have diverged into two main directions: (1) enhancing model expressivity with high-rank adapters, and (2) pushing for further parameter reduction, as exemplified by vector-based methods. However, these approaches present a trade-off, as achieving the expressivity of high-rank weight updates typically comes at the cost of sacrificing the extreme parameter efficiency offered by vector-based techniques. To address this issue, we propose a vector-based random Tensor network for high-Rank Adaptation (TeRA), a novel PEFT method that achieves high-rank weight updates while retaining the parameter efficiency of vector-based PEFT adapters. This is achieved by parameterizing the tensorized weight update matrix as a Tucker-like tensor network (TN), in which large randomly initialized factors are frozen and shared across layers, while only small layer-specific scaling vectors, formed by entries in diagonal factor matrices, are trained. This design effectively decouples the rank of the weight update matrix from the number of trainable parameters. Comprehensive experiments demonstrate that TeRA matches or even outperforms high-rank adapters, while requiring a trainable parameter count similar to vector-based methods. Theoretical analysis and ablation studies further validate the effectiveness of our approach.

[216] arXiv:2509.03236 [pdf, html, other]
Title: OneSearch: A Preliminary Exploration of the Unified End-to-End Generative Framework for E-commerce Search
Ben Chen, Xian Guo, Siyuan Wang, Zihan Liang, Yue Lv, Yufei Ma, Xinlong Xiao, Bowen Xue, Xuxin Zhang, Ying Yang, Huangyu Dai, Xing Xu, Tong Zhao, Mingcan Peng, XiaoYang Zheng, Cong Zhang, Qihang Zhao, Yuqing Ding, Chenyi Lei, Wenwu Ou, Han Li
Subjects: Information Retrieval (cs.IR)

Traditional e-commerce search systems employ multi-stage cascading architectures (MCA) that progressively filter items through recall, pre-ranking, and ranking stages. While effective at balancing computational efficiency with business conversion, these systems suffer from fragmented computation and optimization objective collisions across stages, which ultimately limit their performance ceiling. To address these issues, we propose OneSearch, the first industrially deployed end-to-end generative framework for e-commerce search. This framework introduces three key innovations: (1) a Keyword-enhanced Hierarchical Quantization Encoding (KHQE) module, to preserve both hierarchical semantics and distinctive item attributes while maintaining strong query-item relevance constraints; (2) a multi-view user behavior sequence injection strategy that constructs behavior-driven user IDs and incorporates both explicit short-term and implicit long-term sequences to model user preferences comprehensively; and (3) a Preference-Aware Reward System (PARS) featuring multi-stage supervised fine-tuning and adaptive reward-weighted ranking to capture fine-grained user preferences. Extensive offline evaluations on large-scale industry datasets demonstrate OneSearch's superior performance for high-quality recall and ranking. Rigorous online A/B tests confirm its ability to enhance relevance in the same exposure position, achieving statistically significant improvements: +1.67% item CTR, +2.40% buyer, and +3.22% order volume. Furthermore, OneSearch reduces operational expenditure by 75.40% and improves Model FLOPs Utilization from 3.26% to 27.32%. The system has been successfully deployed across multiple search scenarios in Kuaishou, serving millions of users and generating tens of millions of PVs daily.

[217] arXiv:2509.03238 [pdf, html, other]
Title: Vibration Damping in Underactuated Cable-suspended Artwork -- Flying Belt Motion Control
Martin Goubej, Lauria Clarke, Martin Hrabačka, David Tolar
Comments: 10 pages, 10 figures
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

This paper presents a comprehensive refurbishment of the interactive robotic art installation Standards and Double Standards by Rafael Lozano-Hemmer. The installation features an array of belts suspended from the ceiling, each actuated by stepper motors and dynamically oriented by a vision-based tracking system that follows the movements of exhibition visitors. The original system was limited by oscillatory dynamics, resulting in torsional and pendulum-like vibrations that constrained rotational speed and reduced interactive responsiveness. To address these challenges, the refurbishment involved significant upgrades to both hardware and motion control algorithms. A detailed mathematical model of the flying belt system was developed to accurately capture its dynamic behavior, providing a foundation for advanced control design. An input shaping method, formulated as a convex optimization problem, was implemented to effectively suppress vibrations, enabling smoother and faster belt movements. Experimental results demonstrate substantial improvements in system performance and audience interaction. This work exemplifies the integration of robotics, control engineering, and interactive art, offering new solutions to technical challenges in real-time motion control and vibration damping for large-scale kinetic installations.
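
For readers unfamiliar with input shaping, the classic two-impulse zero-vibration (ZV) shaper gives a feel for the technique. This is the textbook closed-form shaper, not the convex-optimization formulation the paper describes, and the natural frequency and damping ratio used below are made-up values rather than parameters of the installation.

```python
import math


def zv_shaper(omega_n, zeta):
    """Return [(time_s, amplitude), ...] for a two-impulse zero-vibration shaper.

    omega_n: undamped natural frequency of the mode in rad/s.
    zeta: damping ratio of the mode (0 <= zeta < 1).
    """
    root = math.sqrt(1.0 - zeta ** 2)
    omega_d = omega_n * root                      # damped natural frequency
    k = math.exp(-zeta * math.pi / root)          # decay over half a damped period
    return [(0.0, 1.0 / (1.0 + k)), (math.pi / omega_d, k / (1.0 + k))]


def residual_vibration(impulses, omega_n, zeta):
    """Residual amplitude at the last impulse time, relative to one unit impulse at t=0."""
    omega_d = omega_n * math.sqrt(1.0 - zeta ** 2)
    t_last = impulses[-1][0]
    c = sum(a * math.exp(-zeta * omega_n * (t_last - t)) * math.cos(omega_d * t)
            for t, a in impulses)
    s = sum(a * math.exp(-zeta * omega_n * (t_last - t)) * math.sin(omega_d * t)
            for t, a in impulses)
    return math.hypot(c, s)


# Hypothetical pendulum-like mode: 0.5 Hz, lightly damped.
omega_n = 2.0 * math.pi * 0.5
zeta = 0.02
shaper = zv_shaper(omega_n, zeta)
shaped = residual_vibration(shaper, omega_n, zeta)
unshaped = residual_vibration([(0.0, 1.0)], omega_n, zeta)
print(shaper, shaped, unshaped)
```

Convolving a motion command with these two impulses cancels the modelled mode exactly, at the cost of delaying the command by half a damped period. Optimization-based shapers generalize this by trading delay against robustness to modelling error and by handling several modes at once.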

[218] arXiv:2509.03240 [pdf, html, other]
Title: Evaluation of Stress Detection as Time Series Events -- A Novel Window-Based F1-Metric
Harald Vilhelm Skat-Rørdam, Sneha Das, Kathrine Sofie Rasmussen, Nicole Nadine Lønfeldt, Line Clemmensen
Comments: 15 pages, 6 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME)

Accurate evaluation of event detection in time series is essential for applications such as stress monitoring with wearable devices, where ground truth is typically annotated as single-point events, even though the underlying phenomena are gradual and temporally diffused. Standard metrics like F1 and point-adjusted F1 (F1$_{pa}$) often misrepresent model performance in such real-world, imbalanced datasets. We introduce a window-based F1 metric (F1$_w$) that incorporates temporal tolerance, enabling a more robust assessment of event detection when exact alignment is unrealistic. Empirical analysis in three physiological datasets, two in-the-wild (ADARP, Wrist Angel) and one experimental (ROAD), indicates that F1$_w$ reveals meaningful model performance patterns invisible to conventional metrics, while its window size can be adapted to domain knowledge to avoid overestimation. We show that the choice of evaluation metric strongly influences the interpretation of model performance: using predictions from TimesFM, only our temporally tolerant metrics reveal statistically significant improvements over random and null baselines in the two in-the-wild use cases. This work addresses key gaps in time series evaluation and provides practical guidance for healthcare applications where requirements for temporal precision vary by context.
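
A temporally tolerant F1 score can be sketched as below. The abstract does not spell out the definition of F1$_w$, so the greedy one-to-one matching rule and the symmetric tolerance `window` are assumptions chosen for illustration.

```python
def window_f1(true_events, pred_events, window):
    """F1 score where a prediction is a hit if it falls within `window` time
    steps of an annotated event that has not already been matched."""
    unmatched = sorted(true_events)
    tp = 0
    for p in sorted(pred_events):
        # Greedily match to the earliest still-unmatched annotation in tolerance.
        hit = next((t for t in unmatched if abs(t - p) <= window), None)
        if hit is not None:
            unmatched.remove(hit)
            tp += 1
    fp = len(pred_events) - tp   # predictions with no annotation nearby
    fn = len(unmatched)          # annotations that no prediction reached
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: two single-point annotations and three predictions (time steps).
truth = [100, 400]
preds = [97, 250, 405]
strict = window_f1(truth, preds, window=0)
tolerant = window_f1(truth, preds, window=5)
print(strict, tolerant)
```

With zero tolerance none of the predictions count, while a window of five steps credits the two near misses and leaves one false positive. Making the window too wide would credit unrelated predictions, which is the overestimation risk the abstract mentions.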

[219] arXiv:2509.03241 [pdf, html, other]
Title: Unsupervised Learning based Element Resource Allocation for Reconfigurable Intelligent Surfaces in mmWave Network
Pujitha Mamillapalli, Yoghitha Ramamoorthi, Abhinav Kumar, Tomoki Murakami, Tomoaki Ogawa, Yasushi Takatori
Subjects: Machine Learning (cs.LG)

The increasing demand for high data rates and seamless connectivity in wireless systems has sparked significant interest in reconfigurable intelligent surfaces (RIS) and artificial intelligence-based wireless applications. RIS typically comprises passive reflective antenna elements that control the wireless propagation environment by adequately tuning the phase of the reflective elements. The allocation of RIS elements to multiple user equipment (UEs) is crucial for efficiently utilizing RIS. In this work, we formulate a joint optimization problem that optimizes the RIS phase configuration and resource allocation under an $\alpha$-fair scheduling framework and propose an efficient way of allocating RIS elements. Conventional iterative optimization methods, however, suffer from exponentially increasing computational complexity as the number of RIS elements increases and also complicate the generation of training labels for supervised learning. To overcome these challenges, we propose a five-layer fully connected neural network (FNN) combined with a preprocessing technique to significantly reduce input dimensionality, lower computational complexity, and enhance scalability. The simulation results show that our proposed NN-based solution reduces computational overhead while significantly improving system throughput by 6.8% compared to existing RIS element allocation schemes. Furthermore, the proposed system achieves better performance while reducing computational complexity, making it significantly more scalable than the iterative optimization algorithms.

[220] arXiv:2509.03242 [pdf, html, other]
Title: TopoMap: A Feature-based Semantic Discriminator of the Topographical Regions in the Test Input Space
Gianmarco De Vita, Nargiz Humbatova, Paolo Tonella
Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE)

Testing Deep Learning (DL)-based systems is an open challenge. Although it is relatively easy to find inputs that cause a DL model to misbehave, the grouping of inputs by features that make the DL model under test fail is largely unexplored. Existing approaches for DL testing introduce perturbations that may focus on specific failure-inducing features, while neglecting others that belong to different regions of the feature space. In this paper, we create an explicit topographical map of the input feature space. Our approach, named TopoMap, is both black-box and model-agnostic as it relies solely on features that characterise the input space. To discriminate the inputs according to the specific features they share, we first apply dimensionality reduction to obtain input embeddings, which are then subjected to clustering. Each DL model might require specific embedding computations and clustering algorithms to achieve a meaningful separation of inputs into discriminative groups. We propose a novel way to evaluate alternative configurations of embedding and clustering techniques. We used a deep neural network (DNN) as an approximation of a human evaluator who could tell whether a pair of clusters can be discriminated based on the features of the included elements. We use such a DNN to automatically select the optimal topographical map of the inputs among all those that are produced by different embedding/clustering configurations. The evaluation results show that the maps generated by TopoMap consist of distinguishable and meaningful regions. In addition, we evaluate the effectiveness of TopoMap using mutation analysis. In particular, we assess whether the clusters in our topographical map allow for an effective selection of mutation-killing inputs. Experimental results show that our approach outperforms random selection by 35% on average on killable mutants; by 61% on non-killable ones.

[221] arXiv:2509.03244 [pdf, html, other]
Title: FoMEMO: Towards Foundation Models for Expensive Multi-objective Optimization
Yiming Yao, Fei Liu, Liang Zhao, Xi Lin, Qingfu Zhang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Expensive multi-objective optimization is a prevalent and crucial concern in many real-world scenarios, where sample-efficiency is vital due to the limited evaluations to recover the true Pareto front for decision making. Existing works either involve rebuilding Gaussian process surrogates from scratch for each objective in each new problem encountered, or rely on extensive past domain experiments for pre-training deep learning models, making them hard to generalize and impractical to cope with various emerging applications in the real world. To address this issue, we propose a new paradigm named FoMEMO (Foundation Models for Expensive Multi-objective Optimization), which enables the establishment of a foundation model conditioned on any domain trajectory and user preference, and facilitates fast in-context optimization based on the predicted preference-wise aggregation posteriors. Rather than accessing extensive domain experiments in the real world, we demonstrate that pre-training the foundation model with a diverse set of hundreds of millions of synthetic data can lead to superior adaptability to unknown problems, without necessitating any subsequent model training or updates in the optimization process. We evaluate our method across a variety of synthetic benchmarks and real-world applications, and demonstrate its superior generality and competitive performance compared to existing methods.

[222] arXiv:2509.03249 [pdf, other]
Title: Structure Transfer: an Inference-Based Calculus for the Transformation of Representations
Daniel Raggi, Gem Stapleton, Mateja Jamnik, Aaron Stockdill, Grecia Garcia Garcia, Peter C-H. Cheng
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)

Representation choice is of fundamental importance to our ability to communicate and reason effectively. A major unsolved problem, addressed in this paper, is how to devise representational-system (RS) agnostic techniques that drive representation transformation and choice. We present a novel calculus, called structure transfer, that enables representation transformation across diverse RSs. Specifically, given a source representation drawn from a source RS, the rules of structure transfer allow us to generate a target representation for a target RS. The generality of structure transfer comes in part from its ability to ensure that the source representation and the generated target representation satisfy any specified relation (such as semantic equivalence). This is done by exploiting schemas, which encode knowledge about RSs. Specifically, schemas can express preservation of information across relations between any pair of RSs, and this knowledge is used by structure transfer to derive a structure for the target representation which ensures that the desired relation holds. We formalise this using Representational Systems Theory \cite{raggi2022rst}, building on the key concept of a construction space. The abstract nature of construction spaces grants them the generality to model RSs of diverse kinds, including formal languages, geometric figures and diagrams, as well as informal notations. Consequently, structure transfer is a system-agnostic calculus that can be used to identify alternative representations in a wide range of practical settings.

[223] arXiv:2509.03256 [pdf, html, other]
Title: Comparison of End-to-end Speech Assessment Models for the NOCASA 2025 Challenge
Aleksei Žavoronkov, Tanel Alumäe
Comments: Published at IEEE MLSP 2025
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

This paper presents an analysis of three end-to-end models developed for the NOCASA 2025 Challenge, aimed at automatic word-level pronunciation assessment for children learning Norwegian as a second language. Our models include an encoder-decoder Siamese architecture (E2E-R), a prefix-tuned direct classification model leveraging pretrained wav2vec2.0 representations, and a novel model integrating alignment-free goodness-of-pronunciation (GOP) features computed via CTC. We introduce a weighted ordinal cross-entropy loss tailored for optimizing metrics such as unweighted average recall and mean absolute error. Among the explored methods, our GOP-CTC-based model achieved the highest performance, substantially surpassing challenge baselines and attaining top leaderboard scores.

[224] arXiv:2509.03257 [pdf, html, other]
Title: Hidden Convexity in Active Learning: A Convexified Online Input Design for ARX Systems
Nicolas Chatzikiriakos, Bowen Song, Philipp Rank, Andrea Iannelli
Comments: Accepted for presentation at CDC 2025
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

The goal of this work is to accelerate the identification of an unknown ARX system from trajectory data through online input design. Specifically, we present an active learning algorithm that sequentially selects the input to excite the system according to an experiment design criterion using the past measured data. The adopted criterion yields a non-convex optimization problem, but we provide an exact convex reformulation that allows the global optimizer to be found in a computationally tractable way. Moreover, we give sample complexity bounds on the estimation error due to the stochastic noise. Numerical studies showcase the effectiveness of our algorithm and the benefits of the convex reformulation.

[225] arXiv:2509.03260 [pdf, html, other]
Title: HyPV-LEAD: Proactive Early-Warning of Cryptocurrency Anomalies through Data-Driven Structural-Temporal Modeling
Minjung Park, Gyuyeon Na, Soyoun Kim, Sunyoung Moon, HyeonJeong Cha, Sangmi Chai
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Risk Management (q-fin.RM)

Abnormal cryptocurrency transactions, such as mixing services, fraudulent transfers, and pump-and-dump operations, pose escalating risks to financial integrity but remain notoriously difficult to detect due to class imbalance, temporal volatility, and complex network dependencies. Existing approaches are predominantly model-centric and post hoc, flagging anomalies only after they occur and thus offering limited preventive value. This paper introduces HyPV-LEAD (Hyperbolic Peak-Valley Lead-time Enabled Anomaly Detection), a data-driven early-warning framework that explicitly incorporates lead time into anomaly detection. Unlike prior methods, HyPV-LEAD integrates three innovations: (1) window-horizon modeling to guarantee actionable lead-time alerts, (2) Peak-Valley (PV) sampling to mitigate class imbalance while preserving temporal continuity, and (3) hyperbolic embedding to capture the hierarchical and scale-free properties of blockchain transaction networks. Empirical evaluation on large-scale Bitcoin transaction data demonstrates that HyPV-LEAD consistently outperforms state-of-the-art baselines, achieving a PR-AUC of 0.9624 with significant gains in precision and recall. Ablation studies further confirm that each component (PV sampling, hyperbolic embedding, and structural-temporal modeling) provides complementary benefits, with the full framework delivering the highest performance. By shifting anomaly detection from reactive classification to proactive early-warning, HyPV-LEAD establishes a robust foundation for real-time risk management, anti-money laundering (AML) compliance, and financial security in dynamic blockchain environments.

[226] arXiv:2509.03261 [pdf, html, other]
Title: Parallel-Constraint Model Predictive Control: Exploiting Parallel Computation for Improving Safety
Elias Fontanari, Gianni Lunardi, Matteo Saveriano, Andrea Del Prete
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Ensuring constraint satisfaction is a key requirement for safety-critical systems, which include most robotic platforms. For example, constraints can be used for modeling joint position/velocity/torque limits and collision avoidance. Constrained systems are often controlled using Model Predictive Control, because of its ability to naturally handle constraints, relying on numerical optimization. However, ensuring constraint satisfaction is challenging for nonlinear systems/constraints. A well-known tool to make controllers safe is the so-called control-invariant set (a.k.a. safe set). In our previous work, we have shown that safety can be improved by letting the safe-set constraint recede along the MPC horizon. In this paper, we push that idea further by exploiting parallel computation to improve safety. We solve several MPC problems at the same time, where each problem instantiates the safe-set constraint at a different time step along the horizon. Finally, the controller can select the best solution according to some user-defined criteria. We validated this idea through extensive simulations with a 3-joint robotic arm, showing that significant improvements can be achieved in terms of safety and performance, even using as few as 4 computational cores.

[227] arXiv:2509.03262 [pdf, html, other]
Title: PI3DETR: Parametric Instance Detection of 3D Point Cloud Edges with a Geometry-Aware 3DETR
Fabio F. Oberweger, Michael Schwingshackl, Vanessa Staderini
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present PI3DETR, an end-to-end framework that directly predicts 3D parametric curve instances from raw point clouds, avoiding the intermediate representations and multi-stage processing common in prior work. Extending 3DETR, our model introduces a geometry-aware matching strategy and specialized loss functions that enable unified detection of differently parameterized curve types, including cubic Bézier curves, line segments, circles, and arcs, in a single forward pass. Optional post-processing steps further refine predictions without adding complexity. This streamlined design improves robustness to noise and varying sampling densities, addressing critical challenges in real-world LiDAR and 3D sensing scenarios. PI3DETR sets a new state-of-the-art on the ABC dataset and generalizes effectively to real sensor data, offering a simple yet powerful solution for 3D edge and curve estimation.

[228] arXiv:2509.03263 [pdf, html, other]
Title: Estudio de la eficiencia en la escalabilidad de GPUs para el entrenamiento de Inteligencia Artificial (Study of GPU Scalability Efficiency for Artificial Intelligence Training)
David Cortes, Carlos Juiz, Belen Bermejo
Comments: 8 pages, in Spanish language, 8 figures, Conference at SARTECO 2025, Spain
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Performance (cs.PF)

Training large-scale deep learning models has become a key challenge for the scientific community and industry. While the massive use of GPUs can significantly speed up training times, this approach has a negative impact on efficiency. In this article, we present a detailed analysis of the times reported by MLPerf Training v4.1 on four workloads: BERT, Llama2 LoRA, RetinaNet, and Stable Diffusion, showing that there are configurations that optimise the relationship between performance, GPU usage, and efficiency. The results point to a break-even point that allows training times to be reduced while maximising efficiency.

[229] arXiv:2509.03265 [pdf, html, other]
Title: Compressed Dictionary Matching on Run-Length Encoded Strings
Philip Bille, Inge Li Gørtz, Simon J. Puglisi, Simon R. Tarnow
Journal-ref: 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 21:1-21:16, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2025)
Subjects: Data Structures and Algorithms (cs.DS)

Given a set of pattern strings $\mathcal{P}=\{P_1, P_2,\ldots, P_k\}$ and a text string $S$, the classic dictionary matching problem is to report all occurrences of each pattern in $S$. We study the dictionary problem in the compressed setting, where the pattern strings and the text string are compressed using run-length encoding, and the goal is to solve the problem without decompression and achieve efficient time and space in the size of the compressed strings. Let $m$ and $n$ be the total length of the patterns $\mathcal{P}$ and the length of the text string $S$, respectively, and let $\overline{m}$ and $\overline{n}$ be the total number of runs in the run-length encoding of the patterns in $\mathcal{P}$ and $S$, respectively. Our main result is an algorithm that achieves $O( (\overline{m} + \overline{n})\log \log m + \mathrm{occ})$ expected time, and $O(\overline{m})$ space, where $\mathrm{occ}$ is the total number of occurrences of patterns in $S$. This is the first non-trivial solution to the problem. Since any solution must read the input, our time bound is optimal within a $\log \log m$ factor. We introduce several new techniques to achieve our bounds, including a new compressed representation of the classic Aho-Corasick automaton and a new efficient string index that supports fast queries in run-length encoded strings.
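To illustrate what matching "without decompression" means, here is a small sketch that run-length encodes strings and finds occurrences of a single pattern by comparing runs directly. This is a naive baseline written for illustration, not the paper's algorithm: it handles one pattern at a time and takes time proportional to the product of the run counts in the worst case. The function names `rle` and `rle_find` are hypothetical.

```python
from itertools import groupby


def rle(s):
    """Run-length encode a string into a list of (char, run_length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def rle_find(pattern_runs, text_runs):
    """Return start positions (in the uncompressed text) of all occurrences.

    Naive single-pattern baseline that works on runs only.
    """
    if not pattern_runs:
        return []
    # starts[i] is the uncompressed position where text run i begins.
    starts, pos = [], 0
    for _, length in text_runs:
        starts.append(pos)
        pos += length
    k = len(pattern_runs)
    occ = []
    if k == 1:
        ch, plen = pattern_runs[0]
        for i, (tch, tlen) in enumerate(text_runs):
            if tch == ch and tlen >= plen:
                # Every offset in the run that leaves room for the pattern.
                occ.extend(range(starts[i], starts[i] + tlen - plen + 1))
        return occ
    (fch, flen), (lch, llen) = pattern_runs[0], pattern_runs[-1]
    for i in range(len(text_runs) - k + 1):
        tch, tlen = text_runs[i]
        ech, elen = text_runs[i + k - 1]
        # First and last pattern runs may be shorter than the text runs.
        if tch != fch or tlen < flen or ech != lch or elen < llen:
            continue
        # Interior pattern runs must equal the text runs exactly.
        if text_runs[i + 1:i + k - 1] == pattern_runs[1:k - 1]:
            occ.append(starts[i] + tlen - flen)
    return occ
```

For example, `rle("aaabbbaab")` has 4 runs for a text of length 9, and `rle_find(rle("ab"), rle("aaabbbaab"))` inspects only those runs. The key observation is that run boundaries of the pattern must line up with run boundaries of the text, except that the first and last pattern runs may be shorter than the text runs they fall in.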

[230] arXiv:2509.03267 [pdf, html, other]
Title: SynBT: High-quality Tumor Synthesis for Breast Tumor Segmentation by 3D Diffusion Model
Hongxu Yang, Edina Timko, Levente Lippenszky, Vanda Czipczer, Lehel Ferenczi
Comments: Accepted by MICCAI 2025 Deep-Breath Workshop. Supported by IHI SYNTHIA project
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Synthetic tumors in medical images offer controllable characteristics that facilitate the training of machine learning models, leading to improved segmentation performance. However, existing tumor synthesis methods perform suboptimally when the tumor occupies a large spatial volume, as in breast tumor segmentation in MRI with a large field-of-view (FOV), while commonly used tumor generation methods are based on small patches. In this paper, we propose a 3D medical diffusion model, called SynBT, to generate high-quality breast tumors (BT) in contrast-enhanced MRI images. The proposed model consists of a patch-to-volume autoencoder, which compresses high-resolution MRIs into a compact latent space while preserving the resolution of volumes with large FOV. Using the obtained latent space feature vector, a mask-conditioned diffusion model is used to synthesize breast tumors within selected regions of breast tissue, resulting in realistic tumor appearances. We evaluated the proposed method on a tumor segmentation task, which demonstrated that the proposed high-quality tumor synthesis method can improve common segmentation models by 2-3% Dice Score on a large public dataset, and therefore benefits tumor segmentation in MRI images.

[231] arXiv:2509.03269 [pdf, html, other]
Title: Bridging Gaps Between Student and Expert Evaluations of AI-Generated Programming Hints
Tung Phung, Mengyan Wu, Heeryung Choi, Gustavo Soares, Sumit Gulwani, Adish Singla, Christopher Brooks
Comments: L@S'25
Subjects: Computers and Society (cs.CY)

Generative AI has the potential to enhance education by providing personalized feedback to students at scale. Recent work has proposed techniques to improve AI-generated programming hints and has evaluated their performance based on expert-designed rubrics or student ratings. However, it remains unclear how the rubrics used to design these techniques align with students' perceived helpfulness of hints. In this paper, we systematically study the mismatches in perceived hint quality from students' and experts' perspectives based on the deployment of AI-generated hints in a Python programming course. We analyze scenarios with discrepancies between student and expert evaluations, in particular, where experts rated a hint as high-quality while the student found it unhelpful. We identify key reasons for these discrepancies and classify them into categories, such as hints not accounting for the student's main concern or not considering previous help requests. Finally, we propose and discuss preliminary results on potential methods to bridge these gaps, first by extending the expert-designed quality rubric and then by adapting the hint generation process, e.g., incorporating the student's comments or history. These efforts contribute toward scalable, personalized, and pedagogically sound AI-assisted feedback systems, which are particularly important for high-enrollment educational settings.

[232] arXiv:2509.03270 [pdf, html, other]
Title: AI Safety Assurance in Electric Vehicles: A Case Study on AI-Driven SOC Estimation
Martin Skoglund, Fredrik Warg, Aria Mirzai, Anders Thorsen, Karl Lundgren, Peter Folkesson, Bastian Havers-zulka
Comments: 12 pages, 9 figures, EVS38, this https URL
Subjects: Software Engineering (cs.SE); Robotics (cs.RO)

Integrating Artificial Intelligence (AI) technology in electric vehicles (EV) introduces unique challenges for safety assurance, particularly within the framework of ISO 26262, which governs functional safety in the automotive domain. Traditional assessment methodologies are not geared toward evaluating AI-based functions and require evolving standards and practices. This paper explores how an independent assessment of an AI component in an EV can be achieved when combining ISO 26262 with the recently released ISO/PAS 8800, whose scope is AI safety for road vehicles. The AI-driven State of Charge (SOC) battery estimation exemplifies the process. Key features relevant to the independent assessment of this extended evaluation approach are identified. As part of the evaluation, robustness testing of the AI component is conducted using fault injection experiments, wherein perturbed sensor inputs are systematically introduced to assess the component's resilience to input variance.

[233] arXiv:2509.03271 [pdf, html, other]
Title: Beyond Quantification: Navigating Uncertainty in Professional AI Systems
Sylvie Delacroix, Diana Robinson, Umang Bhatt, Jacopo Domenicucci, Jessica Montgomery, Gael Varoquaux, Carl Henrik Ek, Vincent Fortuin, Yulan He, Tom Diethe, Neill Campbell, Mennatallah El-Assady, Soren Hauberg, Ivana Dusparic, Neil Lawrence
Subjects: Human-Computer Interaction (cs.HC)

The growing integration of large language models across professional domains transforms how experts make critical decisions in healthcare, education, and law. While significant research effort focuses on getting these systems to communicate their outputs with probabilistic measures of reliability, many consequential forms of uncertainty in professional contexts resist such quantification. A physician pondering the appropriateness of documenting possible domestic abuse, a teacher assessing cultural sensitivity, or a mathematician distinguishing procedural from conceptual understanding face forms of uncertainty that cannot be reduced to percentages. This paper argues for moving beyond simple quantification toward richer expressions of uncertainty essential for beneficial AI integration. We propose participatory refinement processes through which professional communities collectively shape how different forms of uncertainty are communicated. Our approach acknowledges that uncertainty expression is a form of professional sense-making that requires collective development rather than algorithmic optimization.

[234] arXiv:2509.03277 [pdf, html, other]
Title: PointAD+: Learning Hierarchical Representations for Zero-shot 3D Anomaly Detection
Qihang Zhou, Shibo He, Jiangtao Yan, Wenchao Meng, Jiming Chen
Comments: Submitted to TPAMI
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we aim to transfer CLIP's robust 2D generalization capabilities to identify 3D anomalies across unseen objects of highly diverse class semantics. To this end, we propose a unified framework to comprehensively detect and segment 3D anomalies by leveraging both point- and pixel-level information. We first design PointAD, which leverages point-pixel correspondence to represent 3D anomalies through their associated rendering pixel representations. This approach is referred to as implicit 3D representation, as it focuses solely on rendering pixel anomalies but neglects the inherent spatial relationships within point clouds. Then, we propose PointAD+ to further broaden the interpretation of 3D anomalies by introducing explicit 3D representation, emphasizing spatial abnormality to uncover abnormal spatial relationships. Hence, we propose G-aggregation to involve geometry information and make the aggregated point representations spatially aware. To simultaneously capture rendering and spatial abnormality, PointAD+ proposes hierarchical representation learning, incorporating implicit and explicit anomaly semantics into hierarchical text prompts: rendering prompts for the rendering layer and geometry prompts for the geometry layer. A cross-hierarchy contrastive alignment is further introduced to promote the interaction between the rendering and geometry layers, facilitating mutual anomaly learning. Finally, PointAD+ integrates anomaly semantics from both layers to capture the generalized anomaly semantics. At test time, PointAD+ can integrate RGB information in a plug-and-play manner and further improve its detection performance. Extensive experiments demonstrate the superiority of PointAD+ in ZS 3D anomaly detection across unseen objects with highly diverse class semantics, achieving a holistic understanding of abnormality.

[235] arXiv:2509.03281 [pdf, html, other]
Title: A Brain-Inspired Gating Mechanism Unlocks Robust Computation in Spiking Neural Networks
Qianyi Bai, Haiteng Wang, Qiang Yu
Subjects: Neural and Evolutionary Computing (cs.NE)

While spiking neural networks (SNNs) provide a biologically inspired and energy-efficient computational framework, their robustness and the dynamic advantages inherent to biological neurons remain significantly underutilized owing to oversimplified neuron models. In particular, conventional leaky integrate-and-fire (LIF) neurons often omit the dynamic conductance mechanisms inherent in biological neurons, thereby limiting their capacity to cope with noise and temporal variability. In this work, we revisit dynamic conductance from a functional perspective and uncover its intrinsic role as a biologically plausible gating mechanism that modulates information flow. Building on this insight, we introduce the Dynamic Gated Neuron (DGN), a novel spiking unit in which membrane conductance evolves in response to neuronal activity, enabling selective input filtering and adaptive noise suppression. We provide a theoretical analysis showing that DGN possesses enhanced stochastic stability compared to standard LIF models, with dynamic conductance intriguingly acting as a disturbance rejection mechanism. DGN-based SNNs demonstrate superior performance across extensive evaluations on anti-noise tasks and temporal-related benchmarks such as TIDIGITS and SHD, consistently exhibiting excellent robustness. Our results highlight, for the first time, a biologically plausible dynamic gating as a key mechanism for robust spike-based computation, providing not only theoretical guarantees but also strong empirical validations. This work thus paves the way for more resilient, efficient, and biologically inspired spiking neural networks.

[236] arXiv:2509.03286 [pdf, html, other]
Title: Accountability Framework for Healthcare AI Systems: Towards Joint Accountability in Decision Making
Prachi Bagave, Marcus Westberg, Marijn Janssen, Aaron Yi Ding
Comments: To be published in AAAI AIES 2025
Subjects: Artificial Intelligence (cs.AI)

AI is transforming the healthcare domain and is increasingly helping practitioners to make health-related decisions. Therefore, accountability becomes a crucial concern for critical AI-driven decisions. Although regulatory bodies, such as the EU Commission, provide guidelines, they are high-level and focus on the "what" that should be done and less on the "how", creating a knowledge gap for actors. Through an extensive analysis, we found that the term accountability is perceived and dealt with in many different ways, depending on the actor's expertise and domain of work. With increasing concerns about AI accountability issues and the ambiguity around this term, this paper bridges the gap between the "what" and "how" of AI accountability, specifically for AI systems in healthcare. We do this by analysing the concept of accountability, formulating an accountability framework, and providing a three-tier structure for handling various accountability mechanisms. Our accountability framework positions the regulations of healthcare AI systems and the mechanisms adopted by the actors under a consistent accountability regime. Moreover, the three-tier structure guides the actors of the healthcare AI system to categorise the mechanisms based on their conduct. Through our framework, we advocate that decision-making in healthcare AI holds shared dependencies, where accountability should be dealt with jointly and should foster collaborations. We highlight the role of explainability in instigating communication and information sharing between the actors to further facilitate the collaborative process.

[237] arXiv:2509.03290 [pdf, html, other]
Title: Machine Learning-Driven Anomaly Detection for 5G O-RAN Performance Metrics
Babak Azkaei, Kishor Chandra Joshi, George Exarchakos
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG); Systems and Control (eess.SY)

The ever-increasing reliance of critical services on network infrastructure, coupled with the increased operational complexity of beyond-5G/6G networks, necessitates proactive and automated network fault management. The provision for open interfaces among different radio access network (RAN) elements and the integration of AI/ML into network architecture enabled by the Open RAN (O-RAN) specifications bring new possibilities for active network health monitoring and anomaly detection. In this paper we leverage these advantages and develop an anomaly detection framework that proactively detects possible throughput drops for a UE and minimizes post-handover failures. We propose two actionable anomaly detection algorithms tailored for real-world deployment. The first algorithm identifies user equipment (UE) at risk of severe throughput degradation by analyzing key performance indicators (KPIs) such as resource block utilization and signal quality metrics, enabling proactive handover initiation. The second algorithm evaluates neighbor cell radio coverage quality, filtering out cells with anomalous signal strength or interference levels. This reduces candidate targets for handover by 41.27% on average. Together, these methods mitigate post-handover failures and throughput drops while operating much faster than the near-real-time latency constraints. This paves the way for self-healing 6G networks.

[238] arXiv:2509.03294 [pdf, html, other]
Title: A Comprehensive Guide to Differential Privacy: From Theory to User Expectations
Napsu Karmitsa, Antti Airola, Tapio Pahikkala, Tinja Pitkämäki
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The increasing availability of personal data has enabled significant advances in fields such as machine learning, healthcare, and cybersecurity. However, this data abundance also raises serious privacy concerns, especially in light of powerful re-identification attacks and growing legal and ethical demands for responsible data use. Differential privacy (DP) has emerged as a principled, mathematically grounded framework for mitigating these risks. This review provides a comprehensive survey of DP, covering its theoretical foundations, practical mechanisms, and real-world applications. It explores key algorithmic tools and domain-specific challenges - particularly in privacy-preserving machine learning and synthetic data generation. The report also highlights usability issues and the need for improved communication and transparency in DP systems. Overall, the goal is to support informed adoption of DP by researchers and practitioners navigating the evolving landscape of data privacy.

[239] arXiv:2509.03300 [pdf, html, other]
Title: LatPhon: Lightweight Multilingual G2P for Romance Languages and English
Luis Felipe Chary, Miguel Arjona Ramirez
Subjects: Computation and Language (cs.CL)

Grapheme-to-phoneme (G2P) conversion is a key front-end for text-to-speech (TTS), automatic speech recognition (ASR), speech-to-speech translation (S2ST) and alignment systems, especially across multiple Latin-script languages. We present LatPhon, a 7.5M-parameter Transformer jointly trained on six such languages: English, Spanish, French, Italian, Portuguese, and Romanian. On the public ipa-dict corpus, it attains a mean phoneme error rate (PER) of 3.5%, outperforming the byte-level ByT5 baseline (5.4%) and approaching language-specific WFSTs (3.2%) while occupying 30 MB of memory, which makes on-device deployment feasible when needed. These results indicate that compact multilingual G2P can serve as a universal front-end for Latin-language speech pipelines.
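As background for the PER figures, the sketch below computes a phoneme error rate in the usual way: the Levenshtein edit distance between predicted and reference phoneme sequences, divided by the total reference length. The abstract does not describe the paper's scoring script, so details such as tokenization and per-language averaging are assumptions, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit-cost edits)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(pairs):
    """Corpus-level PER over (reference_phonemes, predicted_phonemes) pairs."""
    errors = total = 0
    for ref, hyp in pairs:
        errors += edit_distance(ref, hyp)
        total += len(ref)
    return errors / total if total else 0.0
```

A mean PER across languages, as reported above, would then be the average of this quantity computed separately on each language's test set, assuming that is how the authors aggregate.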

[240] arXiv:2509.03303 [pdf, html, other]
Title: Automatic Differentiation of Agent-Based Models
Arnau Quera-Bofarull, Nicholas Bishop, Joel Dyer, Daniel Jarne Ornia, Anisoara Calinescu, Doyne Farmer, Michael Wooldridge
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)

Agent-based models (ABMs) simulate complex systems by capturing the bottom-up interactions of individual agents comprising the system. Many complex systems of interest, such as epidemics or financial markets, involve thousands or even millions of agents. Consequently, ABMs often become computationally demanding and rely on the calibration of numerous free parameters, which has significantly hindered their widespread adoption. In this paper, we demonstrate that automatic differentiation (AD) techniques can effectively alleviate these computational burdens. By applying AD to ABMs, the gradients of the simulator become readily available, greatly facilitating essential tasks such as calibration and sensitivity analysis. Specifically, we show how AD enables variational inference (VI) techniques for efficient parameter calibration. Our experiments demonstrate substantial performance improvements and computational savings using VI on three prominent ABMs: Axtell's model of firms; Sugarscape; and the SIR epidemiological model. Our approach thus significantly enhances the practicality and scalability of ABMs for studying complex systems.

[241] arXiv:2509.03310 [pdf, html, other]
Title: app.build: A Production Framework for Scaling Agentic Prompt-to-App Generation with Environment Scaffolding
Evgenii Kniazev, Arseny Kravchenko, Igor Rekun, James Broadhead, Nikita Shamgunov, Pranav Sah, Pratik Nichite, Ivan Yamshchikov
Subjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

We present app.build (this https URL), an open-source framework that improves LLM-based application generation through systematic validation and structured environments. Our approach combines multi-layered validation pipelines, stack-specific orchestration, and model-agnostic architecture, implemented across three reference stacks. Through evaluation on 30 generation tasks, we demonstrate that comprehensive validation achieves a 73.3% viability rate with 30% reaching perfect quality scores, while open-weights models achieve 80.8% of closed-model performance when provided structured environments. The open-source framework has been adopted by the community, with over 3,000 applications generated to date. This work demonstrates that scaling reliable AI agents requires scaling environments, not just models, providing empirical insights and complete reference implementations for production-oriented agent systems.

[242] arXiv:2509.03312 [pdf, other]
Title: AgenTracer: Who Is Inducing Failure in the LLM Agentic Systems?
Guibin Zhang, Junhao Wang, Junjie Chen, Wangchunshu Zhou, Kun Wang, Shuicheng Yan
Subjects: Computation and Language (cs.CL); Multiagent Systems (cs.MA)

Large Language Model (LLM)-based agentic systems, often comprising multiple models, complex tool invocations, and orchestration protocols, substantially outperform monolithic agents. Yet this very sophistication amplifies their fragility, making them more prone to system failure. Pinpointing the specific agent or step responsible for an error within long execution traces defines the task of agentic system failure attribution. Current state-of-the-art reasoning LLMs, however, remain strikingly inadequate for this challenge, with accuracy generally below 10%. To address this gap, we propose AgenTracer, the first automated framework for annotating failed multi-agent trajectories via counterfactual replay and programmed fault injection, producing the curated dataset TracerTraj. Leveraging this resource, we develop AgenTracer-8B, a lightweight failure tracer trained with multi-granular reinforcement learning, capable of efficiently diagnosing errors in verbose multi-agent interactions. On the Who&When benchmark, AgenTracer-8B outperforms giant proprietary LLMs like Gemini-2.5-Pro and Claude-4-Sonnet by up to 18.18%, setting a new standard in LLM agentic failure attribution. More importantly, AgenTracer-8B delivers actionable feedback to off-the-shelf multi-agent systems like MetaGPT and MaAS with 4.8-14.2% performance gains, empowering self-correcting and self-evolving agentic AI.

[243] arXiv:2509.03316 [pdf, html, other]
Title: Meta-Imputation Balanced (MIB): An Ensemble Approach for Handling Missing Data in Biomedical Machine Learning
Fatemeh Azad, Zoran Bosnić, Matjaž Kukar
Subjects: Machine Learning (cs.LG)

Missing data represents a fundamental challenge in machine learning applications, often reducing model performance and reliability. This problem is particularly acute in fields like bioinformatics and clinical machine learning, where datasets are frequently incomplete due to the nature of both data generation and data collection. While numerous imputation methods exist, from simple statistical techniques to advanced deep learning models, no single method consistently performs well across diverse datasets and missingness mechanisms. This paper proposes a novel Meta-Imputation approach that learns to combine the outputs of multiple base imputers to predict missing values more accurately. By training the proposed method called Meta-Imputation Balanced (MIB) on synthetically masked data with known ground truth, the system learns to predict the most suitable imputed value based on the behavior of each method. Our work highlights the potential of ensemble learning in imputation and paves the way for more robust, modular, and interpretable preprocessing pipelines in real-world machine learning systems.

[244] arXiv:2509.03318 [pdf, other]
Title: Semantically Reflected Programs
Eduard Kamburjan, Vidar Norstein Klungre, Yuanwei Qu, Rudolf Schlatte, Egor V. Kostylev, Martin Giese, Einar Broch Johnsen
Subjects: Programming Languages (cs.PL); Logic in Computer Science (cs.LO)

This paper addresses the dichotomy between the formalization of structural and the formalization of behavioral knowledge by means of semantically lifted programs, which explore an intuitive connection between programs and knowledge graphs. While knowledge graphs and ontologies are eminently useful to represent formal knowledge about a system's individuals and universals, programming languages are designed to describe the system's evolution. To address this dichotomy, we introduce a semantic lifting of the program states of an executing program into a knowledge graph, for an object-oriented programming language. The resulting graph is exposed as a semantic reflection layer within the programming language, allowing programmers to leverage knowledge of the application domain in their programs. In this paper, we formalize semantic lifting and semantic reflection for a small programming language, SMOL, explain the operational aspects of the language, and consider type correctness and virtualisation for runtime program queries through the semantic reflection layer. We illustrate semantic lifting and semantic reflection through a case study of geological modelling and discuss different applications of the technique. The language implementation is open source and available online.

[245] arXiv:2509.03319 [pdf, html, other]
Title: Temporal social network modeling of mobile connectivity data with graph neural networks
Joel Jaskari, Chandreyee Roy, Fumiko Ogushi, Mikko Saukkoriipi, Jaakko Sahlsten, Kimmo Kaski
Comments: 22 pages, 7 figures
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG)

Graph neural networks (GNNs) have emerged as a state-of-the-art data-driven tool for modeling connectivity data of graph-structured complex networks and integrating information of their nodes and edges in space and time. However, as of yet, the analysis of social networks using the time series of people's mobile connectivity data has not been extensively investigated. In the present study, we investigate four snapshot-based temporal GNNs in predicting the phone call and SMS activity between users of a mobile communication network. In addition, we develop a simple non-GNN baseline model using the recently proposed EdgeBank method. Our analysis shows that the ROLAND temporal GNN outperforms the baseline model in most cases, whereas the other three GNNs perform on average worse than the baseline. The results show that GNN-based approaches hold promise in the analysis of temporal social networks through mobile connectivity data. However, due to the relatively small performance margin between ROLAND and the baseline model, further research is required on specialized GNN architectures for temporal social network analysis.

[246] arXiv:2509.03321 [pdf, html, other]
Title: Empowering Lightweight MLLMs with Reasoning via Long CoT SFT
Linyu Ou
Subjects: Computer Vision and Pattern Recognition (cs.CV)

While Reinforcement Learning with Verifiable Rewards has enhanced the reasoning of large-scale language models (LLMs), its efficacy for lightweight multimodal language models (MLLMs) with fewer than seven billion parameters remains underexplored. This paper investigates the role of long Chain-of-Thought (long CoT) data in enhancing the reasoning abilities of such MLLMs. Our findings demonstrate that Supervised Fine-Tuning (SFT) with long CoT data significantly improves MLLM reasoning. Furthermore, we observe that after this initial SFT phase, MLLMs can achieve additional performance gains through a subsequent RL stage. We conclude that a SFT stage with long CoT data is a critical prerequisite for developing the reasoning capabilities of lightweight MLLMs.

[247] arXiv:2509.03323 [pdf, other]
Title: Heatmap Guided Query Transformers for Robust Astrocyte Detection across Immunostains and Resolutions
Xizhe Zhang, Jiayang Zhu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Astrocytes are critical glial cells whose altered morphology and density are hallmarks of many neurological disorders. However, their intricate branching and stain-dependent variability make automated detection in histological images a highly challenging task. To address these challenges, we propose a hybrid CNN-Transformer detector that combines local feature extraction with global contextual reasoning. A heatmap-guided query mechanism generates spatially grounded anchors for small and faint astrocytes, while a lightweight Transformer module improves discrimination in dense clusters. Evaluated on ALDH1L1- and GFAP-stained astrocyte datasets, the model consistently outperformed Faster R-CNN, YOLOv11 and DETR, achieving higher sensitivity with fewer false positives, as confirmed by FROC analysis. These results highlight the potential of hybrid CNN-Transformer architectures for robust astrocyte detection and provide a foundation for advanced computational pathology tools.

[248] arXiv:2509.03324 [pdf, html, other]
Title: InfraDiffusion: zero-shot depth map restoration with diffusion models and prompted segmentation from sparse infrastructure point clouds
Yixiong Jing, Cheng Zhang, Haibing Wu, Guangming Wang, Olaf Wysocki, Brian Sheil
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Point clouds are widely used for infrastructure monitoring because they provide geometric information, and segmentation is required for downstream tasks such as defect detection. Existing research has automated semantic segmentation of structural components, while brick-level segmentation (identifying defects such as spalling and mortar loss) has been primarily conducted from RGB images. However, acquiring high-resolution images is impractical in low-light environments like masonry tunnels. Point clouds, though robust to dim lighting, are typically unstructured, sparse, and noisy, limiting fine-grained segmentation. We present InfraDiffusion, a zero-shot framework that projects masonry point clouds into depth maps using virtual cameras and restores them by adapting the Denoising Diffusion Null-space Model (DDNM). Without task-specific training, InfraDiffusion enhances visual clarity and geometric consistency of depth maps. Experiments on masonry bridge and tunnel point cloud datasets show significant improvements in brick-level segmentation using the Segment Anything Model (SAM), underscoring its potential for automated inspection of masonry assets. Our code and data are available at this https URL.

[249] arXiv:2509.03329 [pdf, html, other]
Title: SESGO: Spanish Evaluation of Stereotypical Generative Outputs
Melissa Robles, Catalina Bernal, Denniss Raigoso, Mateo Dulce Rubio
Subjects: Computers and Society (cs.CY); Computation and Language (cs.CL)

This paper addresses the critical gap in evaluating bias in multilingual Large Language Models (LLMs), with a specific focus on Spanish language within culturally-aware Latin American contexts. Despite widespread global deployment, current evaluations remain predominantly US-English-centric, leaving potential harms in other linguistic and cultural contexts largely underexamined. We introduce a novel, culturally-grounded framework for detecting social biases in instruction-tuned LLMs. Our approach adapts the underspecified question methodology from the BBQ dataset by incorporating culturally-specific expressions and sayings that encode regional stereotypes across four social categories: gender, race, socioeconomic class, and national origin. Using more than 4,000 prompts, we propose a new metric that combines accuracy with the direction of error to effectively balance model performance and bias alignment in both ambiguous and disambiguated contexts. To our knowledge, our work presents the first systematic evaluation examining how leading commercial LLMs respond to culturally specific bias in the Spanish language, revealing varying patterns of bias manifestation across state-of-the-art models. We also contribute evidence that bias mitigation techniques optimized for English do not effectively transfer to Spanish tasks, and that bias patterns remain largely consistent across different sampling temperatures. Our modular framework offers a natural extension to new stereotypes, bias categories, or languages and cultural contexts, representing a significant step toward more equitable and culturally-aware evaluation of AI systems in the diverse linguistic environments where they operate.

[250] arXiv:2509.03331 [pdf, html, other]
Title: VulnRepairEval: An Exploit-Based Evaluation Framework for Assessing Large Language Model Vulnerability Repair Capabilities
Weizhe Wang, Wei Ma, Qiang Hu, Yao Zhang, Jianfei Sun, Bin Wu, Yang Liu, Guangquan Xu, Lingxiao Jiang
Subjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR)

The adoption of Large Language Models (LLMs) for automated software vulnerability patching has shown promising outcomes on carefully curated evaluation sets. Nevertheless, existing datasets predominantly rely on superficial validation methods rather than exploit-based verification, leading to overestimated performance in security-sensitive applications. This paper introduces VulnRepairEval, an evaluation framework anchored in functional Proof-of-Concept (PoC) exploits. Our framework delivers a comprehensive, containerized evaluation pipeline that enables reproducible differential assessment, where repair success requires the original exploit to fail execution against the modified code. The benchmark construction involved extensive data curation: we processed over 400 CVEs and approximately 2,500 potential sources to extract a collection of authentic vulnerability instances (23 Python CVEs) amenable to automated testing with working PoCs. Through VulnRepairEval, we conduct a comprehensive evaluation of 12 popular LLMs and observe a significant performance deficit: even the top-performing model successfully addresses merely 5/23 instances (about 21.7%), exposing critical weaknesses in security-focused applications. Our failure analysis reveals that most unsuccessful attempts stem from imprecise vulnerability identification and patches containing syntactic or semantic errors. Enhanced prompting strategies and multi-agent approaches yield minimal improvements, with overall effectiveness remaining largely unaffected. This work contributes a stringent, practical evaluation framework for LLM-driven vulnerability remediation and underscores the necessity for assessment protocols that authentically reflect real-world exploitation scenarios.

[251] arXiv:2509.03335 [pdf, html, other]
Title: EvolveSignal: A Large Language Model Powered Coding Agent for Discovering Traffic Signal Control Algorithms
Leizhen Wang, Peibo Duan, Hao Wang, Yue Wang, Jian Xu, Nan Zheng, Zhenliang Ma
Subjects: Machine Learning (cs.LG)

In traffic engineering, fixed-time traffic signal control remains widely used for its low cost, stability, and interpretability. However, its design depends on hand-crafted formulas (e.g., Webster) and manual re-timing by engineers to adapt to demand changes, which is labor-intensive and often yields suboptimal results under heterogeneous or congested conditions. This paper introduces EvolveSignal, a large language model (LLM)-powered coding agent that automatically discovers new traffic signal control algorithms. We formulate the problem as program synthesis, where candidate algorithms are represented as Python functions with fixed input-output structures and iteratively optimized through external evaluations (e.g., a traffic simulator) and evolutionary search. Experiments on a signalized intersection demonstrate that the discovered algorithms outperform Webster's baseline, reducing average delay by 20.1% and average stops by 47.1%. Beyond performance, ablation and incremental analyses reveal that EvolveSignal modifications, such as adjusting cycle length bounds, incorporating right-turn demand, and rescaling green allocations, can offer practically meaningful insights for traffic engineers. This work opens a new research direction by leveraging AI for algorithm design in traffic signal control, bridging program synthesis with transportation engineering.
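As an illustration of what a candidate "Python function with a fixed input-output structure" might look like, here is a minimal sketch of the classical Webster baseline: the optimal cycle length C = (1.5L + 5) / (1 - Y), with effective green time split in proportion to each phase's critical flow ratio. The function names, argument layout, and example numbers are assumptions made for illustration; the abstract does not specify the interface used in the paper.

```python
from typing import List


def webster_cycle_length(lost_time_s: float, flow_ratios: List[float]) -> float:
    """Webster's optimal cycle length C = (1.5 * L + 5) / (1 - Y), in seconds.

    lost_time_s: total lost time per cycle, L.
    flow_ratios: critical flow ratio (demand / saturation flow) per phase.
    """
    y_total = sum(flow_ratios)
    if not 0.0 < y_total < 1.0:
        raise ValueError("sum of critical flow ratios must lie in (0, 1)")
    return (1.5 * lost_time_s + 5.0) / (1.0 - y_total)


def webster_green_splits(cycle_s: float, lost_time_s: float,
                         flow_ratios: List[float]) -> List[float]:
    """Split the effective green time (C - L) in proportion to flow ratios."""
    y_total = sum(flow_ratios)
    effective_green = cycle_s - lost_time_s
    return [effective_green * y / y_total for y in flow_ratios]


# Hypothetical two-phase intersection: L = 10 s, flow ratios 0.3 and 0.2.
cycle = webster_cycle_length(10.0, [0.3, 0.2])
greens = webster_green_splits(cycle, 10.0, [0.3, 0.2])
```

An evolutionary search of the kind the abstract describes would mutate the body of such a function while keeping its signature fixed, scoring each variant in a simulator.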

[252] arXiv:2509.03337 [pdf, html, other]
Title: New Bounds for Linear Codes with Applications
Liren Lin, Guanghui Zhang, Bocong Chen, Hongwei Liu
Comments: 15 pages
Subjects: Information Theory (cs.IT)

Bounds on linear codes play a central role in coding theory, as they capture the fundamental trade-off between error-correction capability (minimum distance) and information rate (dimension relative to length). Classical results characterize this trade-off solely in terms of the parameters $n$, $k$, $d$ and $q$. In this work we derive new bounds under the additional assumption that the code contains a nonzero codeword of weight $w$. By combining residual-code techniques with classical results such as the Singleton and Griesmer bounds, we obtain explicit inequalities linking $n$, $k$, $d$, $q$ and $w$. These bounds impose sharper restrictions on admissible codeword weights, particularly those close to the minimum distance or to the code length. Applications include refined constraints on the weights of MDS codes, numerical restrictions on general linear codes, and excluded weight ranges in the weight distribution. Numerical comparisons across standard parameter sets demonstrate that these $w$-aware bounds strictly enlarge known excluded weight ranges and sharpen structural limitations on linear codes.
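For reference, the classical ingredients named in the abstract can be stated as follows for an $[n, k, d]_q$ linear code. These are the standard textbook forms, given here as background; they are not the paper's new $w$-dependent inequalities, which the abstract does not spell out. The block assumes the `amsmath` package.

```latex
% Classical bounds for an [n, k, d]_q linear code (textbook statements).
\begin{align*}
  d &\le n - k + 1
    && \text{(Singleton bound)} \\
  n &\ge \sum_{i=0}^{k-1} \left\lceil \frac{d}{q^{i}} \right\rceil
    && \text{(Griesmer bound)}
\end{align*}
% Standard residual-code lemma: if c is a codeword of weight w with
% w < q d / (q - 1), the residual code Res(C, c) has parameters
\[
  [\, n - w,\; k - 1,\; d' \,]_q
  \quad \text{with} \quad
  d' \ge d - w + \left\lceil \frac{w}{q} \right\rceil .
\]
```

Applying a classical bound to the residual code is the usual way an inequality in $n$, $k$, $d$, $q$ and $w$ arises, which is presumably the mechanism the abstract refers to.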

[253] arXiv:2509.03340 [pdf, html, other]
Title: Equivariant Flow Matching for Symmetry-Breaking Bifurcation Problems
Fleur Hendriks, Ondřej Rokoš, Martin Doškář, Marc G.D. Geers, Vlado Menkovski
Comments: 12 pages, 7 figures including appendices
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Computational Physics (physics.comp-ph)

Bifurcation phenomena in nonlinear dynamical systems often lead to multiple coexisting stable solutions, particularly in the presence of symmetry breaking. Deterministic machine learning models struggle to capture this multiplicity, averaging over solutions and failing to represent lower-symmetry outcomes. In this work, we propose a generative framework based on flow matching to model the full probability distribution over bifurcation outcomes. Our method enables direct sampling of multiple valid solutions while preserving system symmetries through equivariant modeling. We introduce a symmetric matching strategy that aligns predicted and target outputs under group actions, allowing accurate learning in equivariant settings. We validate our approach on a range of systems, from toy models to complex physical problems such as buckling beams and the Allen-Cahn equation. Our results demonstrate that flow matching significantly outperforms non-probabilistic and variational methods in capturing multimodal distributions and symmetry-breaking bifurcations, offering a principled and scalable solution for modeling multistability in high-dimensional systems.

[254] arXiv:2509.03341 [pdf, html, other]
Title: On the MIA Vulnerability Gap Between Private GANs and Diffusion Models
Ilana Sebag, Jean-Yves Franceschi, Alain Rakotomamonjy, Alexandre Allauzen, Jamal Atif
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Generative Adversarial Networks (GANs) and diffusion models have emerged as leading approaches for high-quality image synthesis. While both can be trained under differential privacy (DP) to protect sensitive data, their sensitivity to membership inference attacks (MIAs), a key threat to data confidentiality, remains poorly understood. In this work, we present the first unified theoretical and empirical analysis of the privacy risks faced by differentially private generative models. We begin by showing, through a stability-based analysis, that GANs exhibit fundamentally lower sensitivity to data perturbations than diffusion models, suggesting a structural advantage in resisting MIAs. We then validate this insight with a comprehensive empirical study using a standardized MIA pipeline to evaluate privacy leakage across datasets and privacy budgets. Our results consistently reveal a marked privacy robustness gap in favor of GANs, even in strong DP regimes, highlighting that model type alone can critically shape privacy leakage.

[255] arXiv:2509.03345 [pdf, html, other]
Title: Language Models Do Not Follow Occam's Razor: A Benchmark for Inductive and Abductive Reasoning
Yunxin Sun, Abulhair Saparov
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Reasoning is a core capability in artificial intelligence systems, for which large language models (LLMs) have recently shown remarkable progress. However, most work focuses exclusively on deductive reasoning, which is problematic since other types of reasoning are also essential in solving real-world problems, and they are less explored. This work focuses on evaluating LLMs' inductive and abductive reasoning capabilities. We introduce a programmable and synthetic dataset, InAbHyD (pronounced in-a-bid), where each reasoning example consists of an incomplete world model and a set of observations. The task for the intelligent agent is to produce hypotheses to explain observations under the incomplete world model to solve each reasoning example. We propose a new metric to evaluate the quality of hypotheses based on Occam's Razor. We evaluate and analyze some state-of-the-art LLMs. Our analysis shows that LLMs can perform inductive and abductive reasoning in simple scenarios, but struggle with complex world models and producing high-quality hypotheses, even with popular reasoning-enhancing techniques such as in-context learning and RLVR.

[256] arXiv:2509.03346 [pdf, html, other]
Title: Solving Polynomial Systems with Gröbner Bases: An Introduction to F4 and FGLM
Anna Maria Bigatti, Alessio Caminata, Tor Kristian Ellingsen, Evelina Lanteri, Andrea Sanguineti, Irene Villa
Comments: are welcome
Subjects: Symbolic Computation (cs.SC); Commutative Algebra (math.AC)

These notes originate from a reading course held by the authors in the spring of 2024 at the Università di Genova. They provide a hands-on introduction to the F4 and FGLM algorithms. In addition to the notes, we present two implementations of the algorithms: FGLM in CoCoALib and F4 in Sage. These implementations closely follow the structure of the algorithms as described here and are intended to help readers experiment with them in practice, thereby gaining a deeper understanding.

[257] arXiv:2509.03348 [pdf, html, other]
Title: Generative Auto-Bidding in Large-Scale Competitive Auctions via Diffusion Completer-Aligner
Yewen Li, Jingtong Gao, Nan Jiang, Shuai Mao, Ruyi An, Fei Pan, Xiangyu Zhao, Bo An, Qingpeng Cai, Peng Jiang
Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)

Auto-bidding is central to computational advertising, achieving notable commercial success by optimizing advertisers' bids within economic constraints. Recently, large generative models show potential to revolutionize auto-bidding by generating bids that could flexibly adapt to complex, competitive environments. Among them, diffusers stand out for their ability to address sparse-reward challenges by focusing on trajectory-level accumulated rewards, as well as their explainable capability, i.e., planning a future trajectory of states and executing bids accordingly. However, diffusers struggle with generation uncertainty, particularly regarding dynamic legitimacy between adjacent states, which can lead to poor bids and further cause significant loss of ad impression opportunities when competing with other advertisers in a highly competitive auction environment. To address it, we propose a Causal auto-Bidding method based on a Diffusion completer-aligner framework, termed CBD. Firstly, we augment the diffusion training process with an extra random variable t, where the model observes t-length historical sequences with the goal of completing the remaining sequence, thereby enhancing the generated sequences' dynamic legitimacy. Then, we employ a trajectory-level return model to refine the generated trajectories, aligning more closely with advertisers' objectives. Experimental results across diverse settings demonstrate that our approach not only achieves superior performance on large-scale auto-bidding benchmarks, such as a 29.9% improvement in conversion value in the challenging sparse-reward auction setting, but also delivers significant improvements on the Kuaishou online advertising platform, including a 2.0% increase in target cost.

[258] arXiv:2509.03350 [pdf, html, other]
Title: Exposing Privacy Risks in Anonymizing Clinical Data: Combinatorial Refinement Attacks on k-Anonymity Without Auxiliary Information
Somiya Chhillar, Mary K. Righi, Rebecca E. Sutter, Evgenios M. Kornaropoulos
Subjects: Cryptography and Security (cs.CR)

Despite longstanding criticism from the privacy community, k-anonymity remains a widely used standard for data anonymization, mainly due to its simplicity, regulatory alignment, and preservation of data utility. However, non-experts often defend k-anonymity on the grounds that, in the absence of auxiliary information, no known attacks can compromise its protections. In this work, we refute this claim by introducing Combinatorial Refinement Attacks (CRA), a new class of privacy attacks targeting k-anonymized datasets produced using local recoding. This is the first method that does not rely on external auxiliary information or assumptions about the underlying data distribution. CRA leverages the utility-optimizing behavior of the local recoding anonymization in ARX, a widely used open-source tool for anonymizing data in clinical settings, to formulate a linear program that significantly reduces the space of plausible sensitive values. To validate our findings, we partnered with a network of free community health clinics, an environment where (1) auxiliary information is indeed hard to find due to the population they serve and (2) open-source k-anonymity solutions are attractive due to regulatory obligations and limited resources. Our results on real-world clinical microdata reveal that even in the absence of external information, established anonymization frameworks do not deliver the promised level of privacy, raising critical privacy concerns.
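As background for the property under attack, a table is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The sketch below checks only that textbook definition; it does not implement the CRA attack or ARX's local recoding, and the function name, column names, and records are hypothetical.

```python
from collections import Counter
from typing import Mapping, Sequence


def is_k_anonymous(records: Sequence[Mapping[str, str]],
                   quasi_identifiers: Sequence[str],
                   k: int) -> bool:
    """Return True if every quasi-identifier combination occurs at least k times."""
    group_sizes = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(size >= k for size in group_sizes.values())


# Made-up generalized records: two share an equivalence class, one is alone.
records = [
    {"age": "30-39", "zip": "148**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "148**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "148**", "diagnosis": "flu"},
]
satisfies_2 = is_k_anonymous(records, ["age", "zip"], k=2)
satisfies_1 = is_k_anonymous(records, ["age", "zip"], k=1)
```

Passing this check says nothing about how the generalization was chosen, which is the gap the abstract says the attack exploits.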

[259] arXiv:2509.03351 [pdf, other]
Title: epiGPTope: A machine learning-based epitope generator and classifier
Natalia Flechas Manrique, Alberto Martínez, Elena López-Martínez, Luc Andrea, Román Orus, Aitor Manteca, Aitziber L. Cortajarena, Llorenç Espinosa-Portalés
Comments: 11 pages, 4 figures. Supplementary Information with 5 pages, 4 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)

Epitopes are short antigenic peptide sequences which are recognized by antibodies or immune cell receptors. These are central to the development of immunotherapies, vaccines, and diagnostics. However, the rational design of synthetic epitope libraries is challenging due to the large combinatorial sequence space, $20^n$ combinations for linear epitopes of $n$ amino acids, making screening and testing infeasible, even with high-throughput experimental techniques. In this study, we present a large language model, epiGPTope, pre-trained on protein data and specifically fine-tuned on linear epitopes, which for the first time can directly generate novel epitope-like sequences, which are found to possess statistical properties analogous to those of known epitopes. This generative approach can be used to prepare libraries of epitope candidate sequences. We further train statistical classifiers to predict whether an epitope sequence is of bacterial or viral origin, thus narrowing the candidate library and increasing the likelihood of identifying specific epitopes. We propose that such a combination of generative and predictive models can be of assistance in epitope discovery. The approach uses only primary amino acid sequences of linear epitopes, bypassing the need for a geometric framework or hand-crafted features of the sequences. By developing a method to create biologically feasible sequences, we anticipate faster and more cost-effective generation and screening of synthetic epitopes, with relevant applications in the development of new biotechnologies.

[260] arXiv:2509.03353 [pdf, html, other]
Title: Fair Resource Allocation for Fleet Intelligence
Oguzhan Baser, Kaan Kale, Po-han Li, Sandeep Chinchali
Comments: This paper has been accepted for presentation at the 2025 IEEE Global Communications Conference (GLOBECOM 2025)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Resource allocation is crucial for the performance optimization of cloud-assisted multi-agent intelligence. Traditional methods often overlook agents' diverse computational capabilities and complex operating environments, leading to inefficient and unfair resource distribution. To address this, we open-sourced Fair-Synergy, an algorithmic framework that utilizes the concave relationship between the agents' accuracy and the system resources to ensure fair resource allocation across fleet intelligence. We extend traditional allocation approaches to encompass a multidimensional machine learning utility landscape defined by model parameters, training data volume, and task complexity. We evaluate Fair-Synergy with advanced vision and language models such as BERT, VGG16, MobileNet, and ResNets on datasets including MNIST, CIFAR-10, CIFAR-100, BDD, and GLUE. We demonstrate that Fair-Synergy outperforms standard benchmarks by up to 25% in multi-agent inference and 11% in multi-agent learning settings. Also, we explore how the level of fairness affects the least advantaged, most advantaged, and average agents, providing insights for equitable fleet intelligence.

[261] arXiv:2509.03358 [pdf, other]
Title: Some patterns of sleep quality and Daylight Saving Time across countries: a predictive and exploratory analysis
Bhanu Sharma, Eugene Pinsky
Comments: 16 Pages
Journal-ref: International Journal of Data Mining & Knowledge Management Process (IJDKP) 2025
Subjects: Machine Learning (cs.LG)

In this study we analyzed average sleep durations across 61 countries to examine the impact of Daylight Saving Time (DST) practices. Key metrics influencing sleep were identified, and statistical correlation analysis was applied to explore relationships among these factors. Countries were grouped based on DST observance, and visualizations compared sleep patterns between DST and non-DST regions. Results show that, on average, countries observing DST tend to report longer sleep durations than those that do not. A more detailed pattern emerged when accounting for latitude: at lower latitudes, DST-observing countries reported shorter sleep durations compared to non-DST countries, while at higher latitudes, DST-observing countries reported longer average sleep durations. These findings suggest that the influence of DST on sleep may be moderated by geographical location.

[262] arXiv:2509.03365 [pdf, other]
Title: The distribution of calibrated likelihood functions on the probability-likelihood Aitchison simplex
Paul-Gauthier Noé, Andreas Nautsch, Driss Matrouf, Pierre-Michel Bousquet, Jean-François Bonastre
Comments: Preprint. Under review
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

While calibration of probabilistic predictions has been widely studied, this paper instead addresses calibration of likelihood functions. This has been discussed, especially in biometrics, in cases with only two exhaustive and mutually exclusive hypotheses (classes) where likelihood functions can be written as log-likelihood-ratios (LLRs). After defining calibration for LLRs and its connection with the concept of weight-of-evidence, we present the idempotence property and its associated constraint on the distribution of the LLRs. Although these results have been known for decades, they have been limited to the binary case. Here, we extend them to cases with more than two hypotheses by using the Aitchison geometry of the simplex, which allows us to recover, in vector form, the additive form of Bayes' rule, thereby extending the LLR and the weight-of-evidence to any number of hypotheses. In particular, we extend the definition of calibration, the idempotence, and the constraint on the distribution of likelihood functions to this multiple-hypothesis, multiclass counterpart of the LLR: the isometric-log-ratio transformed likelihood function. This work is mainly conceptual, but we still provide one application to machine learning by presenting a non-linear discriminant analysis where the discriminant components form a calibrated likelihood function over the classes, thereby improving the interpretability and the reliability of the method.
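
For the binary case described above, the standard relations can be written out as follows. The notation (hypotheses $H_1$ and $H_2$, observation $x$, LLR $\ell$, natural logarithm) is chosen here for illustration and may differ from the paper's.

```latex
% Additive form of Bayes' rule: posterior log-odds = LLR + prior log-odds
\log\frac{P(H_1 \mid x)}{P(H_2 \mid x)}
  = \underbrace{\log\frac{p(x \mid H_1)}{p(x \mid H_2)}}_{\ell(x)}
  + \log\frac{P(H_1)}{P(H_2)}

% Idempotence of a calibrated LLR: the LLR of the LLR is the LLR itself
\log\frac{p(\ell \mid H_1)}{p(\ell \mid H_2)} = \ell

% Equivalent constraint on the two class-conditional distributions of the LLR
p(\ell \mid H_1) = e^{\ell}\, p(\ell \mid H_2)
```

The third line is just the second one exponentiated; it is the binary form of the distributional constraint that the abstract says is extended to more than two hypotheses.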

[263] arXiv:2509.03367 [pdf, html, other]
Title: Tuning Block Size for Workload Optimization in Consortium Blockchain Networks
Narges Dadkhah, Somayeh Mohammadi, Gerhard Wunder
Subjects: Cryptography and Security (cs.CR)

Determining the optimal block size is crucial for achieving high throughput in blockchain systems. Many studies have focused on tuning various components, such as databases, network bandwidth, and consensus mechanisms. However, the impact of block size on system performance remains a topic of debate, often resulting in divergent views and even leading to new forks in blockchain networks. This research proposes a mathematical model to maximize performance by determining the ideal block size for Hyperledger Fabric, a prominent consortium blockchain. By leveraging machine learning and solving the model with a genetic algorithm, the proposed approach assesses how factors such as block size, transaction size, and network capacity influence the block processing time. The integration of an optimization solver enables precise adjustments to block size configuration before deployment, ensuring improved performance from the outset. This systematic approach aims to balance block processing efficiency, network latency, and system throughput, offering a robust solution to improve blockchain performance across diverse business contexts.

[264] arXiv:2509.03370 [pdf, html, other]
Title: Neural Field Turing Machine: A Differentiable Spatial Computer
Akash Malhotra, Nacéra Seghouani
Comments: 11 Pages, 6 Figures
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

We introduce the Neural Field Turing Machine (NFTM), a differentiable architecture that unifies symbolic computation, physical simulation, and perceptual inference within continuous spatial fields. NFTM combines a neural controller, continuous memory field, and movable read/write heads that perform local updates. At each timestep, the controller reads local patches, computes updates via learned rules, and writes them back while updating head positions. This design achieves linear O(N) scaling through fixed-radius neighborhoods while maintaining Turing completeness under bounded error. We demonstrate three example instantiations of NFTM: cellular automata simulation (Rule 110), physics-informed PDE solvers (2D heat equation), and iterative image refinement (CIFAR-10 inpainting). These instantiations learn local update rules that compose into global dynamics, exhibit stable long-horizon rollouts, and generalize beyond training horizons. NFTM provides a unified computational substrate bridging discrete algorithms and continuous field dynamics within a single differentiable framework.
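
As a point of reference for the Rule 110 instantiation, here is a minimal hand-coded sketch of the local update that such a model would have to learn. It is the classical cellular-automaton rule, not the NFTM controller; the periodic boundary and the function name `rule110_step` are assumptions made for this example.

```python
RULE = 110  # Wolfram rule number; its 8 bits form the lookup table


def rule110_step(cells: list[int]) -> list[int]:
    """Apply one synchronous Rule 110 update with periodic boundaries.

    Each new cell depends only on its radius-1 neighborhood, so a full
    step over N cells costs O(N).
    """
    n = len(cells)
    next_cells = []
    for i in range(n):
        left = cells[(i - 1) % n]
        center = cells[i]
        right = cells[(i + 1) % n]
        index = (left << 2) | (center << 1) | right  # neighborhood as 0..7
        next_cells.append((RULE >> index) & 1)       # look up the rule bit
    return next_cells


if __name__ == "__main__":
    state = [0] * 15 + [1]
    for _ in range(8):
        print("".join("#" if c else "." for c in state))
        state = rule110_step(state)
```

The fixed-radius neighborhood is what gives the linear cost per step that the abstract refers to.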

[265] arXiv:2509.03373 [pdf, html, other]
Title: Cluster and then Embed: A Modular Approach for Visualization
Elizabeth Coda, Ery Arias-Castro, Gal Mishne
Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

Dimensionality reduction methods such as t-SNE and UMAP are popular methods for visualizing data with a potential (latent) clustered structure. They are known to group data points at the same time as they embed them, resulting in visualizations with well-separated clusters that preserve local information well. However, t-SNE and UMAP also tend to distort the global geometry of the underlying data. We propose a more transparent, modular approach consisting of first clustering the data, then embedding each cluster, and finally aligning the clusters to obtain a global embedding. We demonstrate this approach on several synthetic and real-world datasets and show that it is competitive with existing methods, while being much more transparent.

[266] arXiv:2509.03376 [pdf, html, other]
Title: Transformer-Guided Content-Adaptive Graph Learning for Hyperspectral Unmixing
Hui Chen, Liangyu Liu, Xianchao Xiu, Wanquan Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Hyperspectral unmixing (HU) aims to decompose each mixed pixel in remote sensing images into a set of endmembers and their corresponding abundances. Despite significant progress in this field using deep learning, most methods fail to simultaneously characterize global dependencies and local consistency, making it difficult to preserve both long-range interactions and boundary details. This letter proposes a novel transformer-guided content-adaptive graph unmixing framework (T-CAGU), which overcomes these challenges by employing a transformer to capture global dependencies and introducing a content-adaptive graph neural network to enhance local relationships. Unlike previous work, T-CAGU integrates multiple propagation orders to dynamically learn the graph structure, ensuring robustness against noise. Furthermore, T-CAGU leverages a graph residual mechanism to preserve global information and stabilize training. Experimental results demonstrate its superiority over state-of-the-art methods. Our code is available at this https URL.

[267] arXiv:2509.03377 [pdf, html, other]
Title: Amplifying Effective CXL Memory Bandwidth for LLM Inference via Transparent Near-Data Processing
Rui Xie, Asad Ul Haq, Linsen Ma, Yunhua Fang, Zirak Burzin Engineer, Liu Liu, Tong Zhang
Subjects: Hardware Architecture (cs.AR)

Large language model (LLM) inference is bottlenecked by the limited bandwidth of CXL-based memory used for capacity expansion. We introduce CXL-NDP, a transparent near-data processing architecture that amplifies effective CXL bandwidth without requiring changes to the this http URL interface or AI models. CXL-NDP integrates a precision-scalable bit-plane layout for dynamic quantization with transparent lossless compression of weights and KV caches directly within the CXL device. In end-to-end serving, CXL-NDP improves throughput by 43%, extends the maximum context length by 87%, and reduces the KV cache footprint by 46.9% without accuracy loss. Hardware synthesis confirms its practicality with a modest silicon footprint, lowering the barrier for adopting efficient, scalable CXL-based memory in generative AI infrastructure.

[268] arXiv:2509.03379 [pdf, html, other]
Title: TinyDrop: Tiny Model Guided Token Dropping for Vision Transformers
Guoxin Wang, Qingyuan Wang, Binhua Huang, Shaowu Chen, Deepu John
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Vision Transformers (ViTs) achieve strong performance in image classification but incur high computational costs from processing all image tokens. To reduce inference costs in large ViTs without compromising accuracy, we propose TinyDrop, a training-free token dropping framework guided by a lightweight vision model. The guidance model estimates the importance of tokens while performing inference, so that low-importance tokens can be selectively discarded when the large ViT model performs its attention calculations. The framework operates plug-and-play, requires no architectural modifications, and is compatible with diverse ViT architectures. Evaluations on standard image classification benchmarks demonstrate that our framework reduces FLOPs by up to 80% for ViTs with minimal accuracy degradation, highlighting its generalization capability and practical utility for efficient ViT-based classification.
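
The token-selection step can be sketched as a simple top-k filter over per-token importance scores. How TinyDrop derives those scores is not specified in the abstract, so they are taken as a given input here; `select_tokens` and `keep_ratio` are hypothetical names introduced for this example.

```python
import math


def select_tokens(importance: list[float], keep_ratio: float) -> list[int]:
    """Return the indices of the highest-importance tokens, in original order.

    `importance` is assumed to come from a small guidance model;
    `keep_ratio` is the fraction of tokens passed on to the large model.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, math.ceil(keep_ratio * len(importance)))
    # Rank token indices by descending importance, then keep the top k.
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:k])  # restore positional order for the large model


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.3, 0.7]
    print(select_tokens(scores, keep_ratio=0.5))
```

Because self-attention cost grows with the number of tokens, passing on only the selected indices is where the FLOP savings would come from.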

[269] arXiv:2509.03380 [pdf, other]
Title: Situating AI Agents in their World: Aspective Agentic AI for Dynamic Partially Observable Information Systems
Peter J. Bentley, Soo Ling Lim, Fuyuki Ishikawa
Comments: 9 pages
Journal-ref: 7th International Workshop on Agent-Based Modelling of Human Behaviour (ABMHuB'25), ALife 2025
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Agentic LLM AI agents are often little more than autonomous chatbots: actors following scripts, often controlled by an unreliable director. This work introduces a bottom-up framework that situates AI agents in their environment, with all behaviors triggered by changes in their environments. It introduces the notion of aspects, similar to the idea of umwelt, where sets of agents perceive their environment differently to each other, enabling clearer control of information. We provide an illustrative implementation and show that compared to a typical architecture, which leaks up to 83% of the time, aspective agentic AI enables zero information leakage. We anticipate that this concept of specialist agents working efficiently in their own information niches can provide improvements to both security and efficiency.

[270] arXiv:2509.03381 [pdf, html, other]
Title: Dependency Chain Analysis of ROS 2 DDS QoS Policies: From Lifecycle Tutorial to Static Verification
Sanghoon Lee, Junha Kang, Kyung-Joon Park
Comments: 14 pages, 4 figures
Subjects: Networking and Internet Architecture (cs.NI); Robotics (cs.RO)

Robot Operating System 2 (ROS 2) relies on the Data Distribution Service (DDS), which offers more than 20 Quality of Service (QoS) policies governing availability, reliability, and resource usage. Yet ROS 2 users lack clear guidance on safe policy combinations and validation processes prior to deployment, which often leads to trial-and-error tuning and unexpected runtime failures. To address these challenges, we analyze DDS Publisher-Subscriber communication over a life cycle divided into Discovery, Data Exchange, and Disassociation, and provide a user oriented tutorial explaining how 16 QoS policies operate in each phase. Building on this analysis, we derive a QoS dependency chain that formalizes inter-policy relationships and classifies 41 dependency violation rules, capturing constraints that commonly cause communication failures in practice. Finally, we introduce QoS Guard, a ROS 2 package that statically validates DDS XML profiles offline, flags conflicts, and enables safe, predeployment tuning without establishing a live ROS 2 session. Together, these contributions give ROS 2 users both conceptual insight and a concrete tool that enables early detection of misconfigurations, improving the reliability and resource efficiency of ROS 2 based robotic systems.
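
To illustrate what an offline check of this kind can look like, the sketch below tests three well-known requested-versus-offered rules from the DDS specification (reliability, durability, deadline). It is not QoS Guard, does not parse XML profiles, and does not reproduce the paper's 41 rules; the `QosProfile` class and `check_compatibility` function are hypothetical names used only for this example.

```python
import math
from dataclasses import dataclass

# Ordering of policy kinds: an offered kind must be at least as strong
# as the requested kind for a Publisher and Subscriber to match.
RELIABILITY_ORDER = {"BEST_EFFORT": 0, "RELIABLE": 1}
DURABILITY_ORDER = {"VOLATILE": 0, "TRANSIENT_LOCAL": 1, "TRANSIENT": 2, "PERSISTENT": 3}


@dataclass(frozen=True)
class QosProfile:
    """Hypothetical, simplified view of one endpoint's QoS settings."""
    reliability: str
    durability: str
    deadline_s: float = math.inf  # deadline period in seconds


def check_compatibility(offered: QosProfile, requested: QosProfile) -> list[str]:
    """Return human-readable violations between a Publisher (offered)
    and a Subscriber (requested); an empty list means no conflict found."""
    violations = []
    if RELIABILITY_ORDER[offered.reliability] < RELIABILITY_ORDER[requested.reliability]:
        violations.append("RELIABILITY: offered kind is weaker than requested")
    if DURABILITY_ORDER[offered.durability] < DURABILITY_ORDER[requested.durability]:
        violations.append("DURABILITY: offered kind is weaker than requested")
    if offered.deadline_s > requested.deadline_s:
        violations.append("DEADLINE: offered period is longer than requested")
    return violations


if __name__ == "__main__":
    pub = QosProfile(reliability="BEST_EFFORT", durability="VOLATILE")
    sub = QosProfile(reliability="RELIABLE", durability="VOLATILE")
    for message in check_compatibility(pub, sub):
        print(message)
```

Checks like these need only the two profiles, which is why they can run before any live ROS 2 session exists.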

[271] arXiv:2509.03383 [pdf, html, other]
Title: ANNIE: Be Careful of Your Robots
Yiyang Huang, Zixuan Wang, Zishen Wan, Yapeng Tian, Haobo Xu, Yinhe Han, Yiming Gan
Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)

The integration of vision-language-action (VLA) models into embodied AI (EAI) robots is rapidly advancing their ability to perform complex, long-horizon tasks in human-centric environments. However, EAI systems introduce critical security risks: a compromised VLA model can directly translate adversarial perturbations on sensory input into unsafe physical actions. Traditional safety definitions and methodologies from the machine learning community are no longer sufficient. EAI systems raise new questions, such as what constitutes safety, how to measure it, and how to design effective attack and defense mechanisms in physically grounded, interactive settings. In this work, we present the first systematic study of adversarial safety attacks on embodied AI systems, grounded in ISO standards for human-robot interactions. We (1) formalize a principled taxonomy of safety violations (critical, dangerous, risky) based on physical constraints such as separation distance, velocity, and collision boundaries; (2) introduce ANNIEBench, a benchmark of nine safety-critical scenarios with 2,400 video-action sequences for evaluating embodied safety; and (3) propose ANNIE-Attack, a task-aware adversarial framework with an attack leader model that decomposes long-horizon goals into frame-level perturbations. Our evaluation across representative EAI models shows attack success rates exceeding 50% across all safety categories. We further demonstrate sparse and adaptive attack strategies and validate the real-world impact through physical robot experiments. These results expose a previously underexplored but highly consequential attack surface in embodied AI systems, highlighting the urgent need for security-driven defenses in the physical AI era. Code is available at this https URL.

[272] arXiv:2509.03385 [pdf, html, other]
Title: Human Preference-Aligned Concept Customization Benchmark via Decomposed Evaluation
Reina Ishikawa, Ryo Fujii, Hideo Saito, Ryo Hachiuma
Comments: Accepted to ICCV Workshop 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Evaluating concept customization is challenging, as it requires a comprehensive assessment of fidelity to generative prompts and concept images. Moreover, evaluating multiple concepts is considerably more difficult than evaluating a single concept, as it demands detailed assessment not only for each individual concept but also for the interactions among concepts. While humans can intuitively assess generated images, existing metrics often provide either overly narrow or overly generalized evaluations, resulting in misalignment with human preference. To address this, we propose Decomposed GPT Score (D-GPTScore), a novel human-aligned evaluation method that decomposes evaluation criteria into finer aspects and incorporates aspect-wise assessments using Multimodal Large Language Model (MLLM). Additionally, we release Human Preference-Aligned Concept Customization Benchmark (CC-AlignBench), a benchmark dataset containing both single- and multi-concept tasks, enabling stage-wise evaluation across a wide difficulty range -- from individual actions to multi-person interactions. Our method significantly outperforms existing approaches on this benchmark, exhibiting higher correlation with human preferences. This work establishes a new standard for evaluating concept customization and highlights key challenges for future research. The benchmark and associated materials are available at this https URL.

[273] arXiv:2509.03386 [pdf, html, other]
Title: Hierarchical Low-Altitude Wireless Network Empowered Air Traffic Management
Ziye Jia, Jia He, Yuanhao Cui, Qiuming Zhu, Ligang Yuan, Fuhui Zhou, Qihui Wu, Dusit Niyato, Zhu Han
Comments: 7 pages, 6 figures
Subjects: Networking and Internet Architecture (cs.NI)

With the increasing development of low-altitude aircraft, the rational design of low-altitude networks directly impacts aerial safety and resource utilization. To address the challenges of environmental complexity and aircraft diversity in traffic management, we propose a hierarchical low-altitude wireless network (HLWN) framework. Empowered by the three-dimensional spatial discretization and integrated wireless monitoring mechanisms in HLWN, we design low-altitude air corridors to guarantee safe operation and optimization. In addition, we develop a multi-dimensional flight risk assessment through conflict detection and probabilistic collision analysis, facilitating dynamic collision avoidance for heterogeneous aircraft. Finally, open issues and future directions are investigated to provide insights into HLWN development.

[274] arXiv:2509.03391 [pdf, html, other]
Title: More Parameters Than Populations: A Systematic Literature Review of Large Language Models within Survey Research
Trent D. Buskirk, Florian Keusch, Leah von der Heyde, Adam Eck
Subjects: Digital Libraries (cs.DL); Computers and Society (cs.CY)

Survey research has a long-standing history of being a human-powered field, but one that embraces various technologies for the collection, processing, and analysis of various behavioral, political, and social outcomes of interest, among others. At the same time, Large Language Models (LLMs) bring new technological challenges and prerequisites in order to fully harness their potential. In this paper, we report work-in-progress on a systematic literature review based on keyword searches from multiple large-scale databases as well as citation networks that assesses how LLMs are currently being applied within the survey research process. We synthesize and organize our findings according to the survey research process to include examples of LLM usage across three broad phases: pre-data collection, data collection, and post-data collection. We discuss selected examples of potential use cases for LLMs as well as their pitfalls based on examples from existing literature. Considering that survey research has rich experience and history regarding data quality, we discuss some opportunities and describe future outlooks for survey research to contribute to the continued development and refinement of LLMs.

[275] arXiv:2509.03392 [pdf, html, other]
Title: More AI Assistance Reduces Cognitive Engagement: Examining the AI Assistance Dilemma in AI-Supported Note-Taking
Xinyue Chen, Kunlin Ruan, Kexin Phyllis Ju, Nathan Yap, Xu Wang
Comments: Accepted by CSCW2025
Subjects: Human-Computer Interaction (cs.HC)

As AI tools become increasingly embedded in cognitively demanding tasks such as note-taking, questions remain about whether they enhance or undermine cognitive engagement. This paper examines the "AI Assistance Dilemma" in note-taking, investigating how varying levels of AI support affect user engagement and comprehension. In a within-subject experiment, we asked participants (N=30) to take notes during lecture videos under three conditions: Automated AI (high assistance with structured notes), Intermediate AI (moderate assistance with real-time summary), and Minimal AI (low assistance with transcript). Results reveal that Intermediate AI yields the highest post-test scores and Automated AI the lowest. Participants, however, preferred the automated setup due to its perceived ease of use and lower cognitive effort, suggesting a discrepancy between preferred convenience and cognitive benefits. Our study provides insights into designing AI assistance that preserves cognitive engagement, offering implications for designing moderate AI support in cognitive tasks.

[276] arXiv:2509.03393 [pdf, html, other]
Title: Exploring a Graph-based Approach to Offline Reinforcement Learning for Sepsis Treatment
Taisiya Khakharova, Lucas Sakizloglou, Leen Lambers
Comments: 18th European Workshop on Reinforcement Learning (EWRL 2025)
Subjects: Machine Learning (cs.LG)

Sepsis is a serious, life-threatening condition. When treating sepsis, it is challenging to determine the correct amount of intravenous fluids and vasopressors for a given patient. While automated reinforcement learning (RL)-based methods have been used to support these decisions with promising results, previous studies have relied on relational data. Given the complexity of modern healthcare data, representing data as a graph may provide a more natural and effective approach. This study models patient data from the well-known MIMIC-III dataset as a heterogeneous graph that evolves over time. Subsequently, we explore two Graph Neural Network architectures - GraphSAGE and GATv2 - for learning patient state representations, adopting the approach of decoupling representation learning from policy learning. The encoders are trained to produce latent state representations, jointly with decoders that predict the next patient state. These representations are then used for policy learning with the dBCQ algorithm. The results of our experimental evaluation confirm the potential of a graph-based approach, while highlighting the complexity of representation learning in this domain.

[277] arXiv:2509.03394 [pdf, html, other]
Title: CloudFormer: An Attention-based Performance Prediction for Public Clouds with Unknown Workload
Amirhossein Shahbazinia, Darong Huang, Luis Costero, David Atienza
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Performance (cs.PF)

Cloud platforms are increasingly relied upon to host diverse, resource-intensive workloads due to their scalability, flexibility, and cost-efficiency. In multi-tenant cloud environments, virtual machines are consolidated on shared physical servers to improve resource utilization. While virtualization guarantees resource partitioning for CPU, memory, and storage, it cannot ensure performance isolation. Competition for shared resources such as last-level cache, memory bandwidth, and network interfaces often leads to severe performance degradation. Existing management techniques, including VM scheduling and resource provisioning, require accurate performance prediction to mitigate interference. However, this remains challenging in public clouds due to the black-box nature of VMs and the highly dynamic nature of workloads. To address these limitations, we propose CloudFormer, a dual-branch Transformer-based model designed to predict VM performance degradation in black-box environments. CloudFormer jointly models temporal dynamics and system-level interactions, leveraging 206 system metrics at one-second resolution across both static and dynamic scenarios. This design enables the model to capture transient interference effects and adapt to varying workload conditions without scenario-specific tuning. Complementing the methodology, we provide a fine-grained dataset that significantly expands the temporal resolution and metric diversity compared to existing benchmarks. Experimental results demonstrate that CloudFormer consistently outperforms state-of-the-art baselines across multiple evaluation metrics, achieving robust generalization across diverse and previously unseen workloads. Notably, CloudFormer attains a mean absolute error (MAE) of just 7.8%, representing a substantial improvement in predictive accuracy and outperforming existing methods by at least 28%.

[278] arXiv:2509.03399 [pdf, html, other]
Title: Tangential Action Spaces: Geometry, Memory and Cost in Holonomic and Nonholonomic Agents
Marcel Blattner
Comments: 28 pages, 6 figures
Subjects: Systems and Control (eess.SY)

How much energy must an embodied agent spend to remember its past actions? We present Tangential Action Spaces (TAS), a differential-geometric framework revealing a fundamental trade-off between memory and energy in embodied agents. By modeling agents as hierarchical manifolds with projections Phi: P -> C and Psi: C -> I connecting physical (P), cognitive (C), and intentional (I) spaces, we show that the geometry of Phi dictates both memory mechanisms and their energetic costs. Our main contributions are: (1) a rigorous classification proving that one-to-one projections (diffeomorphisms) require engineered dynamics for memory while many-to-one projections (fibrations) enable intrinsic geometric memory through connection curvature; (2) a proof that any deviation from the energy-minimal lift incurs a quantifiable penalty, establishing that path-dependent behavior necessarily costs energy; and (3) a universal principle that excess cost Delta E scales with the square of accumulated holonomy (geometric memory). We validate this cost-memory duality through five systems: the strip-sine system (engineered memory, Delta E proportional to (Delta h)^2), helical and twisted fibrations (intrinsic geometric memory), and flat/cylindrical fibrations (proving curvature, not topology, creates memory). This framework bridges geometric mechanics and embodied cognition, explaining biological motor diversity and providing design principles for efficient robotic control.

[279] arXiv:2509.03403 [pdf, html, other]
Title: Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
Chenlu Ye, Zhou Yu, Ziji Zhang, Hao Chen, Narayanan Sadagopan, Jing Huang, Tong Zhang, Anurag Beniwal
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Reinforcement learning with verifiable rewards (RLVR) has emerged as a predominant paradigm for mathematical reasoning tasks, offering stable improvements in reasoning ability. However, Outcome Reward Models (ORMs) in RLVR are too coarse-grained to distinguish flawed reasoning within correct answers or valid reasoning within incorrect answers. This lack of granularity introduces significantly noisy and misleading gradients and hinders further progress in reasoning process quality. While Process Reward Models (PRMs) offer fine-grained guidance for intermediate steps, they frequently suffer from inaccuracies and are susceptible to reward hacking.
To resolve this dilemma, we introduce PRocess cOnsistency Filter (PROF), an effective data process curation method that harmonizes noisy, fine-grained process rewards with accurate, coarse-grained outcome rewards. Rather than naively blending PRM and ORM in the objective function (arXiv:archive/2506.18896), PROF leverages their complementary strengths through consistency-driven sample selection. Our approach retains correct responses with higher averaged process values and incorrect responses with lower averaged process values, while maintaining positive/negative training sample balance. Extensive experiments demonstrate that our method not only consistently improves the final accuracy by over $4\%$ compared to the blending approaches, but also strengthens the quality of intermediate reasoning steps. Code and training recipes are available at this https URL.
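
The selection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Response` class, the `consistency_filter` function, and in particular the balancing rule (keeping the same number `k` of correct and incorrect responses) are choices made for this example, and every response is assumed to have at least one step score.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class Response:
    text: str
    is_correct: bool          # outcome reward from a verifier
    step_scores: list[float]  # per-step process rewards from a PRM


def mean_process_value(response: Response) -> float:
    """Average the process rewards over all reasoning steps."""
    return fmean(response.step_scores)


def consistency_filter(responses: list[Response], keep_per_class: int) -> list[Response]:
    """Keep correct responses with the highest mean process value and
    incorrect responses with the lowest, in equal numbers (assumed rule)."""
    correct = sorted(
        (r for r in responses if r.is_correct), key=mean_process_value, reverse=True
    )
    incorrect = sorted(
        (r for r in responses if not r.is_correct), key=mean_process_value
    )
    # Balance the two classes; returns an empty list if either class is empty.
    k = min(keep_per_class, len(correct), len(incorrect))
    return correct[:k] + incorrect[:k]
```

In this sketch the process rewards only decide which samples are kept, while the outcome reward still supplies the training signal, which is one way to read "consistency-driven sample selection" as opposed to blending the two rewards in the objective.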

[280] arXiv:2509.03405 [pdf, html, other]
Title: LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations
Daniela Gottesman, Alon Gilae-Dotan, Ido Cohen, Yoav Gur-Arieh, Marius Mosbach, Ori Yoran, Mor Geva
Comments: Submitted to TACL, August 2025
Subjects: Computation and Language (cs.CL)

Language models (LMs) increasingly drive real-world applications that require world knowledge. However, the internal processes through which models turn data into representations of knowledge and beliefs about the world are poorly understood. Insights into these processes could pave the way for developing LMs with knowledge representations that are more consistent, robust, and complete. To facilitate studying these questions, we present LMEnt, a suite for analyzing knowledge acquisition in LMs during pretraining. LMEnt introduces: (1) a knowledge-rich pretraining corpus, fully annotated with entity mentions, based on Wikipedia, (2) an entity-based retrieval method over pretraining data that outperforms previous approaches by as much as 80.4%, and (3) 12 pretrained models with up to 1B parameters and 4K intermediate checkpoints, with comparable performance to popular open-sourced models on knowledge benchmarks. Together, these resources provide a controlled environment for analyzing connections between entity mentions in pretraining and downstream performance, and the effects of causal interventions in pretraining data. We show the utility of LMEnt by studying knowledge acquisition across checkpoints, finding that fact frequency is key, but does not fully explain learning trends. We release LMEnt to support studies of knowledge in LMs, including knowledge representations, plasticity, editing, attribution, and learning dynamics.

[281] arXiv:2509.03407 [pdf, other]
Title: Learning Mechanism Underlying NLP Pre-Training and Fine-Tuning
Yarden Tzach, Ronit D. Gross, Ella Koresh, Shalom Rosner, Or Shpringer, Tal Halevi, Ido Kanter
Comments: 46 pages, 18 figures, 10 tables
Subjects: Computation and Language (cs.CL)

Natural language processing (NLP) enables the understanding and generation of meaningful human language, typically by pre-training a complex architecture on a large dataset to learn the language and then fine-tuning its weights to implement a specific task. Two goals are examined: to understand the mechanism underlying successful pre-training and to determine the interplay between the pre-training accuracy and the fine-tuning of classification tasks. The following main results were obtained: the accuracy per token (APT) increased with its appearance frequency in the dataset, and its average over all tokens served as an order parameter to quantify pre-training success, which increased along the transformer blocks. Pre-training broke the symmetry among tokens and grouped them into finite, small, strong match token clusters, as inferred from the presented token confusion matrix. This feature was sharpened along the transformer blocks toward the output layer, enhancing its performance considerably compared with that of the embedding layer. Consequently, higher-order language structures were generated by pre-training, even though the learning cost function was directed solely at identifying a single token. These pre-training findings were reflected by the improved fine-tuning accuracy along the transformer blocks. Additionally, the output label prediction confidence was found to be independent of the average input APT, as the input meaning was preserved since the tokens are replaced primarily by strong match tokens. Finally, although pre-training is commonly absent in image classification tasks, its underlying mechanism is similar to that used in fine-tuning NLP classification tasks, hinting at its universality. The results were based on the BERT-6 architecture pre-trained on the Wikipedia dataset and fine-tuned on the FewRel and DBpedia classification tasks.

[282] arXiv:2509.03408 [pdf, html, other]
Title: Scalable and Loosely-Coupled Multimodal Deep Learning for Breast Cancer Subtyping
Mohammed Amer, Mohamed A. Suliman, Tu Bui, Nuria Garcia, Serban Georgescu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Healthcare applications are inherently multimodal, benefiting greatly from the integration of diverse data sources. However, the modalities available in clinical settings can vary across different locations and patients. A key area that stands to gain from multimodal integration is breast cancer molecular subtyping, an important clinical task that can facilitate personalized treatment and improve patient prognosis. In this work, we propose a scalable and loosely-coupled multimodal framework that seamlessly integrates data from various modalities, including copy number variation (CNV), clinical records, and histopathology images, to enhance breast cancer subtyping. While our primary focus is on breast cancer, our framework is designed to easily accommodate additional modalities, offering the flexibility to scale up or down with minimal overhead without requiring re-training of existing modalities, making it applicable to other types of cancers as well. We introduce a dual-based representation for whole slide images (WSIs), combining traditional image-based and graph-based WSI representations. This novel dual approach results in significant performance improvements. Moreover, we present a new multimodal fusion strategy, demonstrating its ability to enhance performance across a range of multimodal conditions. Our comprehensive results show that integrating our dual-based WSI representation with CNV and clinical health records, along with our pipeline and fusion strategy, outperforms state-of-the-art methods in breast cancer subtyping.

[283] arXiv:2509.03409 [pdf, html, other]
Title: Multi-level SSL Feature Gating for Audio Deepfake Detection
Hoan My Tran, Damien Lolive, Aghilas Sini, Arnaud Delhay, Pierre-François Marteau, David Guennec
Comments: This paper has been accepted by ACM MM 2025
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM)

Recent advancements in generative AI, particularly in speech synthesis, have enabled the generation of highly natural-sounding synthetic speech that closely mimics human voices. While these innovations hold promise for applications like assistive technologies, they also pose significant risks, including misuse for fraudulent activities, identity theft, and security threats. Current research on spoofing detection countermeasures remains limited by poor generalization to unseen deepfake attacks and languages. To address this, we propose a gating mechanism that extracts relevant features from the XLS-R speech foundation model, which serves as a front-end feature extractor. For the downstream back-end classifier, we employ Multi-kernel gated Convolution (MultiConv) to capture both local and global speech artifacts. Additionally, we introduce Centered Kernel Alignment (CKA) as a similarity metric to enforce diversity in learned features across different MultiConv layers. By integrating CKA with our gating mechanism, we hypothesize that each component helps improve the learning of distinct synthetic speech patterns. Experimental results demonstrate that our approach achieves state-of-the-art performance on in-domain benchmarks while generalizing robustly to out-of-domain datasets, including multilingual speech samples. This underscores its potential as a versatile solution for detecting evolving speech deepfake threats.
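
For readers unfamiliar with Centered Kernel Alignment, the sketch below computes the commonly used linear variant of CKA between two feature matrices (rows are samples, columns are features). It only illustrates the similarity metric itself. Whether the authors use the linear or a kernel variant, and how the score enters their training objective, is not stated in the abstract, so the function names and the linear form here are assumptions.

```python
import math


def center_columns(matrix):
    """Subtract each column's mean so every feature has zero mean."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    return [[row[j] - means[j] for j in range(n_cols)] for row in matrix]


def cross_gram_sq_norm(a, b):
    """Squared Frobenius norm of a^T b for row-major matrices a and b."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            entry = sum(a[k][i] * b[k][j] for k in range(len(a)))
            total += entry * entry
    return total


def linear_cka(x, y):
    """Linear CKA: ||Xc^T Yc||_F^2 / (||Xc^T Xc||_F * ||Yc^T Yc||_F)."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of samples")
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_gram_sq_norm(xc, yc)
    denominator = math.sqrt(cross_gram_sq_norm(xc, xc) * cross_gram_sq_norm(yc, yc))
    return numerator / denominator if denominator > 0 else 0.0
```

A score near 1 means the two representations are nearly identical up to rotation and isotropic scaling, while a score near 0 means they are linearly unrelated. Penalizing high CKA between layers is one plausible way to encourage diverse features.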

[284] arXiv:2509.03417 [pdf, html, other]
Title: Initialization Schemes for Kolmogorov-Arnold Networks: An Empirical Study
Spyros Rigas, Dhruv Verma, Georgios Alexandridis, Yixuan Wang
Comments: 30 pages, 19 figures
Subjects: Machine Learning (cs.LG)

Kolmogorov-Arnold Networks (KANs) are a recently introduced neural architecture that replaces fixed nonlinearities with trainable activation functions, offering enhanced flexibility and interpretability. While KANs have been applied successfully across scientific and machine learning tasks, their initialization strategies remain largely unexplored. In this work, we study initialization schemes for spline-based KANs, proposing two theory-driven approaches inspired by LeCun and Glorot, as well as an empirical power-law family with tunable exponents. Our evaluation combines large-scale grid searches on function fitting and forward PDE benchmarks, an analysis of training dynamics through the lens of the Neural Tangent Kernel, and evaluations on a subset of the Feynman dataset. Our findings indicate that the Glorot-inspired initialization significantly outperforms the baseline in parameter-rich models, while power-law initialization achieves the strongest performance overall, both across tasks and for architectures of varying size. All code and data accompanying this manuscript are publicly available at this https URL.

[285] arXiv:2509.03419 [pdf, other]
Title: Curse of Knowledge: When Complex Evaluation Context Benefits yet Biases LLM Judges
Weiyuan Li, Xintao Wang, Siyu Yuan, Rui Xu, Jiangjie Chen, Qingqing Dong, Yanghua Xiao, Deqing Yang
Comments: 8 pages, 4 figures, conference
Subjects: Computation and Language (cs.CL)

As large language models (LLMs) grow more capable, they face increasingly diverse and complex tasks, making reliable evaluation challenging. The paradigm of LLMs as judges has emerged as a scalable solution, yet prior work primarily focuses on simple settings. Their reliability in complex tasks--where multi-faceted rubrics, unstructured reference answers, and nuanced criteria are critical--remains understudied. In this paper, we constructed ComplexEval, a challenge benchmark designed to systematically expose and quantify Auxiliary Information Induced Biases. We systematically investigated and validated 6 previously unexplored biases across 12 basic and 3 advanced scenarios. Key findings reveal: (1) all evaluated models exhibit significant susceptibility to these biases, with bias magnitude scaling with task complexity; (2) notably, Large Reasoning Models (LRMs) show paradoxical vulnerability. Our in-depth analysis offers crucial insights for improving the accuracy and verifiability of evaluation signals, paving the way for more general and robust evaluation models.

[286] arXiv:2509.03425 [pdf, html, other]
Title: LINKER: Learning Interactions Between Functional Groups and Residues With Chemical Knowledge-Enhanced Reasoning and Explainability
Phuc Pham, Viet Thanh Duy Nguyen, Truong-Son Hy
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)

Accurate identification of interactions between protein residues and ligand functional groups is essential to understand molecular recognition and guide rational drug design. Existing deep learning approaches for protein-ligand interpretability often rely on 3D structural input or use distance-based contact labels, limiting both their applicability and biological relevance. We introduce LINKER, the first sequence-based model to predict residue-functional group interactions in terms of biologically defined interaction types, using only protein sequences and the ligand SMILES as input. LINKER is trained with structure-supervised attention, where interaction labels are derived from 3D protein-ligand complexes via functional group-based motif extraction. By abstracting ligand structures into functional groups, the model focuses on chemically meaningful substructures while predicting interaction types rather than mere spatial proximity. Crucially, LINKER requires only sequence-level input at inference time, enabling large-scale application in settings where structural data is unavailable. Experiments on the LP-PDBBind benchmark demonstrate that structure-informed supervision over functional group abstractions yields interaction predictions closely aligned with ground-truth biochemical annotations.

[287] arXiv:2509.03426 [pdf, html, other]
Title: Time-Scaling State-Space Models for Dense Video Captioning
AJ Piergiovanni, Ganesh Satish Mallya, Dahun Kim, Anelia Angelova
Comments: BMVC 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Dense video captioning is a challenging video understanding task which aims to simultaneously segment the video into a sequence of meaningful consecutive events and to generate detailed captions to accurately describe each event. Existing methods often encounter difficulties when working with the long videos associated with dense video captioning, due to the computational complexity and memory limitations. Furthermore, traditional approaches require the entire video as input, in order to produce an answer, which precludes online processing of the video. We address these challenges by time-scaling State-Space Models (SSMs) to even longer sequences than before. Our approach, State-Space Models with Transfer State, combines both the long-sequence and recurrent properties of SSMs and addresses the main limitation of SSMs which are otherwise not able to sustain their state for very long contexts, effectively scaling SSMs further in time. The proposed model is particularly suitable for generating captions on-the-fly, in an online or streaming manner, without having to wait for the full video to be processed, which is more beneficial in practice. When applied to dense video captioning, our approach scales well with video lengths and uses 7x fewer FLOPs.

[288] arXiv:2509.03427 [pdf, html, other]
Title: Federated Learning: An approach with Hybrid Homomorphic Encryption
Pedro Correia, Ivan Silva, Ivone Amorim, Eva Maia, Isabel Praça
Comments: 19 pages, 8 figures, To be published in the conference Security and Trust Management(STM), ESORICS 2025
Subjects: Cryptography and Security (cs.CR)

Federated Learning (FL) is a distributed machine learning approach that promises privacy by keeping the data on the device. However, gradient reconstruction and membership-inference attacks show that model updates still leak information. Fully Homomorphic Encryption (FHE) can address those privacy concerns, but it suffers from ciphertext expansion and imposes prohibitive overhead on resource-constrained devices. We propose the first Hybrid Homomorphic Encryption (HHE) framework for FL that pairs the PASTA symmetric cipher with the BFV FHE scheme. Clients encrypt local model updates with PASTA and send both the lightweight ciphertexts and the PASTA key (itself BFV-encrypted) to the server, which performs a homomorphic evaluation of the decryption circuit of PASTA and aggregates the resulting BFV ciphertexts. A prototype implementation, developed on top of the Flower FL framework, shows that on an independently and identically distributed (IID) MNIST dataset with 12 clients and 10 training rounds, the proposed HHE system achieves 97.6% accuracy, just 1.3% below plaintext, while reducing client upload bandwidth by over 2,000x and cutting client runtime by 30% compared to a system based solely on the BFV FHE scheme. However, server computational cost increases by roughly 15621x for each client participating in the training phase, a challenge to be addressed in future work.

[289] arXiv:2509.03430 [pdf, html, other]
Title: EclipseTouch: Touch Segmentation on Ad Hoc Surfaces using Worn Infrared Shadow Casting
Vimal Mollyn, Nathan DeVrio, Chris Harrison
Comments: Accepted to UIST 2025
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Robotics (cs.RO)

The ability to detect touch events on uninstrumented, everyday surfaces has been a long-standing goal for mixed reality systems. Prior work has shown that virtual interfaces bound to physical surfaces offer performance and ergonomic benefits over tapping at interfaces floating in the air. A wide variety of approaches have been previously developed, to which we contribute a new headset-integrated technique called EclipseTouch. We use a combination of a computer-triggered camera and one or more infrared emitters to create structured shadows, from which we can accurately estimate hover distance (mean error of 6.9 mm) and touch contact (98.0% accuracy). We discuss how our technique works across a range of conditions, including surface material, interaction orientation, and environmental lighting.

[290] arXiv:2509.03432 [pdf, html, other]
Title: A New Approach to Direct Discretization of Wave Kinetic Equations with Application to a Nonlinear Schrodinger System in 2D
J. W. Banks, J. Shatah
Subjects: Numerical Analysis (math.NA)

Wave Kinetic Equations (WKEs) are often used to describe the evolution of ensemble averaged wave amplitudes for nonlinear wave systems. In the present manuscript we describe a new approach to direct numerical simulation of solutions to WKEs. This new method relies on a piecewise polynomial approximation of the resonant manifold, followed by numerical quadrature of the collision integral. The approach is general in nature, and is discussed in detail here for a particular nonlinear Schrodinger model in 2 spatial dimensions. Detailed convergence studies demonstrate 2nd-order accuracy for model collision integrals, and self-convergence studies for the WKE show near 2nd-order rates. Furthermore, comparison of the WKE approximation to ensemble averages of the NLS illustrates the efficacy of the method and the validity of the WKE, for both isotropic and anisotropic solutions.

[291] arXiv:2509.03433 [pdf, html, other]
Title: Decoding Visual Neural Representations by Multimodal with Dynamic Balancing
Kaili Sun, Xingyu Miao, Bing Zhai, Haoran Duan, Yang Long
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this work, we propose an innovative framework that integrates EEG, image, and text data, aiming to decode visual neural representations from low signal-to-noise ratio EEG signals. Specifically, we introduce text modality to enhance the semantic correspondence between EEG signals and visual content. With the explicit semantic labels provided by text, image and EEG features of the same category can be more closely aligned with the corresponding text representations in a shared multimodal space. To fully utilize pre-trained visual and textual representations, we propose an adapter module that alleviates the instability of high-dimensional representation while facilitating the alignment and fusion of cross-modal features. Additionally, to alleviate the imbalance in multimodal feature contributions introduced by the textual representations, we propose a Modal Consistency Dynamic Balance (MCDB) strategy that dynamically adjusts the contribution weights of each modality. We further propose a stochastic perturbation regularization (SPR) term to enhance the generalization ability of semantic perturbation-based models by introducing dynamic Gaussian noise in the modality optimization process. The evaluation results on the ThingsEEG dataset show that our method surpasses previous state-of-the-art methods in both Top-1 and Top-5 accuracy metrics, improving by 2.0% and 4.7% respectively.

[292] arXiv:2509.03436 [pdf, html, other]
Title: Cost-Optimized Systems Engineering for IoT-Enabled Robot Nurse in Infectious Pandemic Management
Md Mhamud Hussen Sifat, Md Maruf, Md Rokunuzzaman
Comments: 11 pages, 10 figures, 4 tables, 1 algorithm. Corresponding author: Md Maruf
Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC); Systems and Control (eess.SY)

The utilization of robotic technology has gained traction in healthcare facilities due to progress in the field that enables time and cost savings, minimizes waste, and improves patient care. Digital healthcare technologies that leverage automation, such as robotics and artificial intelligence, have the potential to enhance the sustainability and profitability of healthcare systems in the long run. However, the recent COVID-19 pandemic has amplified the need for cyber-physical robots to automate check-ups and medication administration. A robot nurse is controlled by the Internet of Things (IoT) and can serve as an automated medical assistant while also allowing supervisory control based on custom commands. This system helps reduce infection risk and improves outcomes in pandemic settings. This research presents a test case with a nurse robot that can assess a patient's health status and take action accordingly. We also evaluate the system's performance in medication administration, health-status monitoring, and life-cycle considerations.

[293] arXiv:2509.03442 [pdf, html, other]
Title: Evaluating Diverse Feature Extraction Techniques of Multifaceted IoT Malware Analysis: A Survey
Zhuoyun Qian, Hongyi Miao, Yili Jiang, Qin Hu, Jiaqi Huang, Cheng Zhang, Fangtian Zhong
Subjects: Cryptography and Security (cs.CR)

As IoT devices continue to proliferate, their reliability is increasingly constrained by security concerns. In response, researchers have developed diverse malware analysis techniques to detect and classify IoT malware. These techniques typically rely on extracting features at different levels from IoT applications, giving rise to a wide range of feature extraction methods. However, current approaches still face significant challenges when applied in practice. This survey provides a comprehensive review of feature extraction techniques for IoT malware analysis from multiple perspectives. We first examine static and dynamic feature extraction methods, followed by hybrid approaches. We then explore feature representation strategies based on graph learning. Finally, we compare the strengths and limitations of existing techniques, highlight open challenges, and outline promising directions for future research.

[294] arXiv:2509.03446 [pdf, html, other]
Title: Graph neural networks for learning liquid simulations in dynamic scenes containing kinematic objects
Niteesh Midlagajni, Constantin A. Rothkopf
Subjects: Machine Learning (cs.LG)

Simulating particle dynamics with high fidelity is crucial for solving real-world interaction and control tasks involving liquids in design, graphics, and robotics. Recently, data-driven approaches, particularly those based on graph neural networks (GNNs), have shown progress in tackling such problems. However, these approaches are often limited to learning fluid behavior in static free-fall environments or simple manipulation settings involving primitive objects, often overlooking complex interactions with dynamically moving kinematic rigid bodies. Here, we propose a GNN-based framework designed from the ground up to learn the dynamics of liquids under rigid body interactions and active manipulations, where particles are represented as graph nodes and particle-object collisions are handled using surface representations with the bounding volume hierarchy (BVH) algorithm. This approach enables the network to model complex interactions between liquid particles and intricate surface geometries. Our model accurately captures fluid behavior in dynamic settings and can also function as a simulator in static free-fall environments. Despite being trained on a single-object manipulation task of pouring, our model generalizes effectively to environments with unseen objects and novel manipulation tasks such as stirring and scooping. Finally, we show that the learned dynamics can be leveraged to solve control and manipulation tasks using gradient-based optimization methods.

[295] arXiv:2509.03451 [pdf, html, other]
Title: SmartPoser: Arm Pose Estimation with a Smartphone and Smartwatch Using UWB and IMU Data
Nathan DeVrio, Vimal Mollyn, Chris Harrison
Comments: The first two listed authors contributed equally. Published at UIST 2023
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Robotics (cs.RO)

The ability to track a user's arm pose could be valuable in a wide range of applications, including fitness, rehabilitation, augmented reality input, life logging, and context-aware assistants. Unfortunately, this capability is not readily available to consumers. Systems either require cameras, which carry privacy issues, or utilize multiple worn IMUs or markers. In this work, we describe how an off-the-shelf smartphone and smartwatch can work together to accurately estimate arm pose. Moving beyond prior work, we take advantage of more recent ultra-wideband (UWB) functionality on these devices to capture absolute distance between the two devices. This measurement is the perfect complement to inertial data, which is relative and suffers from drift. We quantify the performance of our software-only approach using off-the-shelf devices, showing it can estimate the wrist and elbow joints with a median positional error of 11.0 cm, without the user having to provide training data.

[296] arXiv:2509.03462 [pdf, html, other]
Title: SAM-LLM: Interpretable Lane Change Trajectory Prediction via Parametric Finetuning
Zhuo Cao, Yunxiao Shi, Min Xu
Comments: 5 pages
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

This work introduces SAM-LLM, a novel hybrid architecture that bridges the gap between the contextual reasoning of Large Language Models (LLMs) and the physical precision of kinematic lane change models for autonomous driving. The system is designed for interpretable lane change trajectory prediction by finetuning an LLM to output the core physical parameters of a trajectory model instead of raw coordinates. For lane-keeping scenarios, the model predicts discrete coordinates, but for lane change maneuvers, it generates the parameters for an enhanced Sinusoidal Acceleration Model (SAM), including lateral displacement, maneuver duration, initial lateral velocity, and longitudinal velocity change. This parametric approach yields a complete, continuous, and physically plausible trajectory model that is inherently interpretable and computationally efficient, achieving an 80% reduction in output size compared to coordinate-based methods. The SAM-LLM achieves a state-of-the-art overall intention prediction accuracy of 98.73%, demonstrating performance equivalent to traditional LLM predictors while offering significant advantages in explainability and resource efficiency.
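
To make the parametric idea concrete, the sketch below evaluates the classical sinusoidal acceleration lane change model, in which lateral acceleration follows one sine period over the maneuver and the lateral offset is y(t) = D (t/T - sin(2πt/T) / (2π)). It uses only two of the parameters named in the abstract, lateral displacement D and maneuver duration T. The "enhanced" model's treatment of initial lateral velocity and longitudinal velocity change is not described in the abstract and is not modeled here, and the function names are hypothetical.

```python
import math


def sam_lateral_offset(t, lateral_displacement, duration):
    """Lateral offset at time t under the classical sinusoidal acceleration model.

    The offset starts at 0, ends at `lateral_displacement`, and has zero
    lateral velocity at both ends of the maneuver.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    tau = min(max(t / duration, 0.0), 1.0)  # normalized time, clamped to [0, 1]
    return lateral_displacement * (tau - math.sin(2 * math.pi * tau) / (2 * math.pi))


def sample_trajectory(lateral_displacement, duration, steps):
    """Return (time, lateral offset) pairs at steps + 1 evenly spaced times."""
    times = [i * duration / steps for i in range(steps + 1)]
    return [(t, sam_lateral_offset(t, lateral_displacement, duration)) for t in times]
```

Because two numbers determine the entire lateral profile, a model that predicts them can be resampled at any time resolution, which is consistent with the output-size reduction the abstract reports for parametric over coordinate-based outputs.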

[297] arXiv:2509.03463 [pdf, other]
Title: The Impact of Critique on LLM-Based Model Generation from Natural Language: The Case of Activity Diagrams
Parham Khamsepour, Mark Cole, Ish Ashraf, Sandeep Puri, Mehrdad Sabetzadeh, Shiva Nejati
Subjects: Software Engineering (cs.SE)

Large Language Models (LLMs) show strong potential for automating the generation of models from natural-language descriptions. A common approach is an iterative generate-critique-refine loop, where candidate models are produced, evaluated, and updated based on detected issues. This process needs to address: (1) structural correctness - compliance with well-formedness rules - and (2) semantic alignment - accurate reflection of the intended meaning in the source text. We present LADEX (LLM-based Activity Diagram Extractor), a pipeline for deriving activity diagrams from natural-language process descriptions using an LLM-driven critique-refine process. Structural checks in LADEX can be performed either algorithmically or by an LLM, while alignment checks are always performed by an LLM. We design five ablated variants of LADEX to study: (i) the impact of the critique-refine loop itself, (ii) the role of LLM-based semantic checks, and (iii) the comparative effectiveness of algorithmic versus LLM-based structural checks.
To evaluate LADEX, we compare the generated activity diagrams with expert-created ground truths using trace-based operational semantics. This enables automated measurement of correctness and completeness. Experiments on two datasets indicate that: (1) the critique-refine loop improves structural validity, correctness, and completeness compared to single-pass generation; (2) algorithmic structural checks eliminate inconsistencies that LLM-based checks fail to detect, improving correctness by an average of 17.81% and completeness by 13.24% over LLM-only checks; and (3) combining algorithmic structural checks with LLM-based semantic checks, implemented using the reasoning-focused O4 Mini, achieves the best overall performance - yielding average correctness of up to 86.37% and average completeness of up to 88.56% - while requiring fewer than five LLM calls on average.

[298] arXiv:2509.03465 [pdf, html, other]
Title: Joint Training of Image Generator and Detector for Road Defect Detection
Kuan-Chuan Peng
Comments: This paper is accepted to ICCV 2025 Workshop on Representation Learning with Very Limited Resources: When Data, Modalities, Labels, and Computing Resources are Scarce as an oral paper
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Road defect detection is important for road authorities to reduce the vehicle damage caused by road defects. Considering the practical scenarios where the defect detectors are typically deployed on edge devices with limited memory and computational resources, we aim to perform road defect detection without using ensemble-based methods or test-time augmentation (TTA). To this end, we propose to Jointly Train the image Generator and Detector for road defect detection (dubbed JTGD). We design dual discriminators for the generative model to enforce that both the synthesized defect patches and the overall images look plausible. The synthesized image quality is improved by our proposed CLIP-based Fréchet Inception Distance loss. The generative model in JTGD is trained jointly with the detector to encourage the generative model to synthesize harder examples for the detector. Since the harder, higher-quality synthesized images produced by this design are used in the data augmentation, JTGD outperforms the state-of-the-art method in the RDD2022 road defect detection benchmark across various countries under the condition of no ensemble and no TTA. JTGD uses less than 20% of the number of parameters of the competing baseline, which makes it more suitable for deployment on edge devices in practice.

[299] arXiv:2509.03467 [pdf, html, other]
Title: Continuous Saudi Sign Language Recognition: A Vision Transformer Approach
Soukeina Elhassen, Lama Al Khuzayem, Areej Alhothali, Ohoud Alzamzami, Nahed Alowaidi
Comments: 23 pages, 13 figures, 5 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Sign language (SL) is an essential communication form for hearing-impaired and deaf people, enabling engagement within the broader society. Despite its significance, limited public awareness of SL often leads to inequitable access to educational and professional opportunities, thereby contributing to social exclusion, particularly in Saudi Arabia, where over 84,000 individuals depend on Saudi Sign Language (SSL) as their primary form of communication. Although certain technological approaches have helped to improve communication for individuals with hearing impairments, there continues to be an urgent requirement for more precise and dependable translation techniques, especially for Arabic sign language variants like SSL. Most state-of-the-art solutions have primarily focused on non-Arabic sign languages, resulting in a considerable absence of resources dedicated to Arabic sign language, specifically SSL. The complexity of the Arabic language and the prevalence of isolated sign language datasets that concentrate on individual words instead of continuous speech contribute to this issue. Our research represents an important step in developing SSL resources. To address this gap, we introduce the first continuous Saudi Sign Language dataset, called KAU-CSSL, which focuses on complete sentences to facilitate further research and enable sophisticated systems for SSL recognition and translation. Additionally, we propose a transformer-based model, utilizing a pretrained ResNet-18 for spatial feature extraction and a Transformer Encoder with Bidirectional LSTM for temporal dependencies, achieving 99.02% accuracy in signer-dependent mode and 77.71% accuracy in signer-independent mode. This development leads the way to not only improving communication tools for the SSL community but also making a substantial contribution to the wider field of sign language.

[300] arXiv:2509.03471 [pdf, html, other]
Title: Linear Relaxation Schemes with Asymptotically Compatible Energy Law for Time-Fractional Phase-Field Models
Hui Yu, Zhaoyang Wang, Ping Lin
Subjects: Numerical Analysis (math.NA)

In this paper, we propose a variable time-step linear relaxation scheme for time-fractional phase-field equations with a free energy density in general polynomial form. The $L1^{+}$-CN formula is used to discretize the fractional derivative, and an auxiliary variable is introduced to approximate the nonlinear term by directly solving algebraic equations rather than differential-algebraic equations as in the invariant energy quadratization (IEQ) and the scalar auxiliary variable (SAV) approaches. The developed semi-discrete scheme is second-order accurate in time, and the inconsistency between the auxiliary and the original variables does not deteriorate over time. Furthermore, we take the time-fractional volume-conserved Allen-Cahn equation, the time-fractional Cahn-Hilliard equation, and the time-fractional Swift-Hohenberg equation as examples to demonstrate that the constructed schemes are energy stable and that the discrete energy dissipation law is asymptotically compatible with the classical one when the fractional-order parameter $\alpha\rightarrow 1^{-}$. Several numerical examples demonstrate the effectiveness of the proposed scheme. In particular, numerical results confirm that the auxiliary variable remains well aligned with the original variable, and the error between them does not continue to increase over time before the system reaches steady state.

[301] arXiv:2509.03472 [pdf, html, other]
Title: DPQuant: Efficient and Differentially-Private Model Training via Dynamic Quantization Scheduling
Yubo Gao, Renbo Tu, Gennady Pekhimenko, Nandita Vijaykumar
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

Differentially-Private SGD (DP-SGD) is a powerful technique to protect user privacy when using sensitive data to train neural networks. During training, converting model weights and activations into low-precision formats, i.e., quantization, can drastically reduce training times, energy consumption, and cost, and is thus a widely used technique. In this work, we demonstrate that quantization causes significantly higher accuracy degradation in DP-SGD compared to regular SGD. We observe that this is caused by noise injection in DP-SGD, which amplifies quantization variance, leading to disproportionately large accuracy degradation. To address this challenge, we present DPQuant, a dynamic quantization framework that adaptively selects a changing subset of layers to quantize at each epoch. Our method combines two key ideas that effectively reduce quantization variance: (i) probabilistic sampling of the layers that rotates which layers are quantized every epoch, and (ii) loss-aware layer prioritization, which uses a differentially private loss sensitivity estimator to identify layers that can be quantized with minimal impact on model quality. This estimator consumes a negligible fraction of the overall privacy budget, preserving DP guarantees. Empirical evaluations on ResNet18, ResNet50, and DenseNet121 across a range of datasets demonstrate that DPQuant consistently outperforms static quantization baselines, achieving near Pareto-optimal accuracy-compute trade-offs and up to 2.21x theoretical throughput improvements on low-precision hardware, with less than 2% drop in validation accuracy.
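
The per-epoch layer selection described in this abstract can be pictured with the hypothetical sketch below, which draws a fresh subset of layers each epoch and favors layers with low estimated loss sensitivity. Everything specific here is an assumption for illustration: the inverse-sensitivity weighting, the weighted sampling without replacement (the Efraimidis-Spirakis key method), and all names. The abstract does not describe the authors' sampling rule or their differentially private estimator, which this sketch does not implement.

```python
import random


def sample_layers_to_quantize(sensitivity, num_layers, rng, eps=1e-8):
    """Pick `num_layers` distinct layers, favoring low-sensitivity ones.

    `sensitivity` maps layer name -> non-negative estimated loss sensitivity
    (assumed to come from some external estimator).
    """
    if not 0 <= num_layers <= len(sensitivity):
        raise ValueError("num_layers must be between 0 and the number of layers")

    keyed = []
    for name, score in sensitivity.items():
        weight = 1.0 / (score + eps)  # assumption: inverse-sensitivity weight
        # Efraimidis-Spirakis: key = u ** (1 / weight); the largest keys win.
        keyed.append((rng.random() ** (1.0 / weight), name))

    keyed.sort(reverse=True)
    return [name for _, name in keyed[:num_layers]]


# Usage: a fresh draw every epoch rotates which layers are quantized.
rng = random.Random(0)
scores = {"conv1": 0.9, "conv2": 0.1, "fc": 0.5}
schedule = [sample_layers_to_quantize(scores, 2, rng) for _ in range(3)]
```

Randomizing the subset rather than always quantizing the same least-sensitive layers is one way to avoid concentrating quantization error in a fixed part of the network across epochs.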

[302] arXiv:2509.03474 [pdf, html, other]
Title: Geometric Foundations of Tuning without Forgetting in Neural ODEs
Erkan Bayram, Mohamed-Ali Belabbas, Tamer Başar
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

In our earlier work, we introduced the principle of Tuning without Forgetting (TwF) for sequential training of neural ODEs, where training samples are added iteratively and parameters are updated within the subspace of control functions that preserves the end-point mapping at previously learned samples on the manifold of output labels in the first-order approximation sense. In this letter, we prove that this parameter subspace forms a Banach submanifold of finite codimension under nonsingular controls, and we characterize its tangent space. This reveals that TwF corresponds to a continuation/deformation of the control function along the tangent space of this Banach submanifold, providing a theoretical foundation for why it preserves the mapping (i.e., does not forget) exactly during sequential training, beyond the first-order approximation.

[303] arXiv:2509.03477 [pdf, html, other]
Title: Robult: Leveraging Redundancy and Modality Specific Features for Robust Multimodal Learning
Duy A. Nguyen, Abhi Kamboj, Minh N. Do
Comments: Accepted and presented at IJCAI 2025 in Montreal, Canada
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Addressing missing modalities and limited labeled data is crucial for advancing robust multimodal learning. We propose Robult, a scalable framework designed to mitigate these challenges by preserving modality-specific information and leveraging redundancy through a novel information-theoretic approach. Robult optimizes two core objectives: (1) a soft Positive-Unlabeled (PU) contrastive loss that maximizes task-relevant feature alignment while effectively utilizing limited labeled data in semi-supervised settings, and (2) a latent reconstruction loss that ensures unique modality-specific information is retained. These strategies, embedded within a modular design, enhance performance across various downstream tasks and ensure resilience to incomplete modalities during inference. Experimental results across diverse datasets validate that Robult achieves superior performance over existing approaches in both semi-supervised learning and missing modality contexts. Furthermore, its lightweight design promotes scalability and seamless integration with existing architectures, making it suitable for real-world multimodal applications.

[304] arXiv:2509.03479 [pdf, other]
Title: Design and Optimization of Reinforcement Learning-Based Agents in Text-Based Games
Haonan Wang, Mingjia Zhao, Junfeng Sun, Wei Liu
Comments: 6 pages
Journal-ref: Copyright (c) 2025 International Journal of Computer Science and Information Technology
Subjects: Computation and Language (cs.CL)

As AI technology advances, research in playing text-based games with agents has become progressively popular. In this paper, a novel approach to agent design and agent learning is presented in the context of reinforcement learning. A deep learning model is first applied to process game text and build a world model. Next, the agent is trained through a policy gradient-based deep reinforcement learning method to facilitate conversion from state value to optimal policy. The enhanced agent works better in several text-based game experiments and significantly surpasses previous agents on game completion ratio and win rate. Our study introduces novel understanding and empirical ground for using reinforcement learning for text games and sets the stage for developing and optimizing reinforcement learning agents for more general domains and problems.

[305] arXiv:2509.03481 [pdf, html, other]
Title: PoolPy: Flexible Group Testing Design for Large-Scale Screening
Lorenzo Talamanca, Julian Trouillon
Subjects: Information Theory (cs.IT)

In large screening campaigns, group testing can greatly reduce the number of tests needed when compared to testing each sample individually. However, choosing and applying an appropriate group testing method remains challenging due to the wide variety in design and performance across methods, and the lack of accessible tools. Here, we present PoolPy, a unified framework for designing and selecting optimal group testing strategies across ten different methods according to user-defined constraints, such as time, cost or sample dilution. By computing over 10,000 group testing designs made available through a web interface, we identified key trade-offs, such as minimizing test number or group size, that define applicability to specific use cases. Overall, we show that no single method is universally optimal, and provide clear indications for method choice on a case-by-case basis.
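
To make the trade-off between test number and group size concrete, here is a minimal sketch using classic two-stage (Dorfman) pooling, where a pool is tested once and every sample in a positive pool is retested individually. Dorfman pooling is chosen here only as a well-known example; whether and how PoolPy implements it is not stated in the abstract, and the function names and the `max_pool_size` constraint (standing in for a sample-dilution limit) are hypothetical.

```python
def dorfman_tests_per_sample(prevalence, pool_size):
    """Expected number of tests per sample under two-stage Dorfman pooling."""
    if pool_size < 2:
        return 1.0  # pool of one is just individual testing
    # One pooled test shared by `pool_size` samples, plus individual retests
    # whenever the pool contains at least one positive sample.
    return 1.0 / pool_size + 1.0 - (1.0 - prevalence) ** pool_size

def best_pool_size(prevalence, max_pool_size):
    """Pool size minimizing expected tests, subject to a maximum pool size."""
    return min(
        range(1, max_pool_size + 1),
        key=lambda k: dorfman_tests_per_sample(prevalence, k),
    )

for p in (0.01, 0.05, 0.5):
    k = best_pool_size(p, max_pool_size=20)
    print(p, k, dorfman_tests_per_sample(p, k))
```

At high prevalence the best choice falls back to individual testing, which matches the abstract's point that no single design is universally optimal.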

[306] arXiv:2509.03484 [pdf, other]
Title: Globally Asymptotically Stable Trajectory Tracking of Underactuated UAVs using Geometric Algebra
Ignacio Rubio Scola, Omar Alejandro Garcia Alcantara, Steven Sandoval, Eduardo Steed Espinoza Quesada, Hernan Haimovich, Luis Rodolfo Garcia Carrillo
Comments: This work has been submitted to the IEEE TAES for possible publication
Subjects: Systems and Control (eess.SY)

This paper employs Geometric Algebra (GA) tools to model the dynamics of objects in 3-dimensional space, serving as a proof of concept to facilitate control design for trajectory tracking in underactuated systems. For control purposes, the model is structured as a cascade system, where a rotational subsystem drives a translational one. The rotational subsystem is linear, while the translational subsystem follows a linear-plus-perturbation form, thereby reducing the complexity of control design. A control strategy requiring only simple operations, no memory, and no iterative search loops is presented to illustrate the main features of the GA model. By employing GA to model both translations and rotations, a singularity-free and geometrically intuitive representation can be achieved through the use of the geometric product. Closed-loop stability is rigorously established using input-to-state stability methods. Numerical simulations of a quad tilt-rotorcraft performing trajectory tracking in a windy environment validate the controller's stability and performance.

[307] arXiv:2509.03487 [pdf, html, other]
Title: SafeProtein: Red-Teaming Framework and Benchmark for Protein Foundation Models
Jigang Fan, Zhenghong Zhou, Ruofan Jin, Le Cong, Mengdi Wang, Zaixi Zhang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Biomolecules (q-bio.BM); Quantitative Methods (q-bio.QM)

Proteins play crucial roles in almost all biological processes. The advancement of deep learning has greatly accelerated the development of protein foundation models, leading to significant successes in protein understanding and design. However, the lack of systematic red-teaming for these models has raised serious concerns about their potential misuse, such as generating proteins with biological safety risks. This paper introduces SafeProtein, the first red-teaming framework designed for protein foundation models to the best of our knowledge. SafeProtein combines multimodal prompt engineering and heuristic beam search to systematically design red-teaming methods and conduct tests on protein foundation models. We also curated SafeProtein-Bench, which includes a manually constructed red-teaming benchmark dataset and a comprehensive evaluation protocol. SafeProtein achieved continuous jailbreaks on state-of-the-art protein foundation models (up to 70% attack success rate for ESM3), revealing potential biological safety risks in current protein foundation models and providing insights for the development of robust security protection technologies for frontier models. The codes will be made publicly available at this https URL.

[308] arXiv:2509.03493 [pdf, html, other]
Title: On Entropy Control in LLM-RL Algorithms
Han Shen
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

For RL algorithms, appropriate entropy control is crucial to their effectiveness. To control the policy entropy, a commonly used method is entropy regularization, which is adopted in various popular RL algorithms including PPO, SAC and A3C. Although entropy regularization has conventionally proven effective in robotics and games RL, studies have found that it gives weak to no gains in LLM-RL training. In this work, we study the issues of the entropy bonus in the LLM-RL setting. Specifically, we first argue that conventional entropy regularization suffers from the LLM's extremely large response space and the sparsity of the optimal outputs. As a remedy, we propose AEnt, an entropy control method that utilizes a new clamped entropy bonus with an automatically adjusted coefficient. The clamped entropy is evaluated with the re-normalized policy defined on a certain smaller token space, which encourages exploration within a more compact response set. In addition, the algorithm automatically adjusts the entropy coefficient according to the clamped entropy value, effectively controlling the entropy-induced bias while leveraging the entropy's benefits. AEnt is tested in math-reasoning tasks under different base models and datasets, and it is observed that AEnt outperforms the baselines consistently across multiple benchmarks.
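
As a rough illustration of evaluating entropy on a re-normalized policy over a smaller token space, the sketch below restricts a next-token distribution to its most likely tokens, re-normalizes, and clamps the resulting entropy. The abstract does not specify how the smaller token space is chosen or how the clamp is defined, so the top-`k` selection, the upper clamp `clamp_max`, and the function name are assumptions, not the AEnt algorithm itself.

```python
import math

def clamped_entropy(probs, k, clamp_max):
    """Entropy of the policy re-normalized over its k most likely tokens,
    clamped from above at `clamp_max` (both choices are assumptions)."""
    top = sorted(probs, reverse=True)[:k]
    total = sum(top)
    renorm = [p / total for p in top]
    entropy = -sum(p * math.log(p) for p in renorm if p > 0.0)
    return min(entropy, clamp_max)

# A toy next-token distribution over a 6-token vocabulary.
probs = [0.5, 0.3, 0.1, 0.05, 0.03, 0.02]
print(clamped_entropy(probs, k=3, clamp_max=1.0))
```

Restricting the support keeps the bonus from rewarding probability mass spread over the long tail of unlikely tokens, which is the motivation the abstract gives for a more compact response set.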

[309] arXiv:2509.03494 [pdf, html, other]
Title: Parameter-Efficient Adaptation of mPLUG-Owl2 via Pixel-Level Visual Prompts for NR-IQA
Yahya Benmahane, Mohammed El Hassouni
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we propose a novel parameter-efficient adaptation method for No-Reference Image Quality Assessment (NR-IQA) using visual prompts optimized in pixel-space. Unlike full fine-tuning of Multimodal Large Language Models (MLLMs), our approach trains only 600K parameters at most (< 0.01% of the base model), while keeping the underlying model fully frozen. During inference, these visual prompts are combined with images via addition and processed by mPLUG-Owl2 with the textual query "Rate the technical quality of the image." Evaluations across distortion types (synthetic, realistic, AI-generated) on KADID-10k, KonIQ-10k, and AGIQA-3k demonstrate competitive performance against fully fine-tuned methods and specialized NR-IQA models, achieving 0.93 SRCC on KADID-10k. To our knowledge, this is the first work to leverage pixel-space visual prompts for NR-IQA, enabling efficient MLLM adaptation for low-level vision tasks. The source code is publicly available at https://github.com/yahya-ben/mplug2-vp-for-nriqa.

[310] arXiv:2509.03497 [pdf, html, other]
Title: Invariant Features for Global Crop Type Classification
Xin-Yi Tong, Sherrie Wang
Subjects: Machine Learning (cs.LG)

Accurately obtaining crop type and its spatial distribution at a global scale is critical for food security, agricultural policy-making, and sustainable development. Remote sensing offers an efficient solution for large-scale crop classification, but the limited availability of reliable ground samples in many regions constrains applicability across geographic areas. To address performance declines under geospatial shifts, this study identifies remote sensing features that are invariant to geographic variation and proposes strategies to enhance cross-regional generalization. We construct CropGlobe, a global crop type dataset with 300,000 pixel-level samples from eight countries across five continents, covering six major food and industrial crops (corn, soybeans, rice, wheat, sugarcane, cotton). With broad geographic coverage, CropGlobe enables a systematic evaluation under cross-country, cross-continent, and cross-hemisphere transfer. We compare the transferability of temporal multi-spectral features (Sentinel-2-based 1D/2D median features and harmonic coefficients) and hyperspectral features (from EMIT). To improve generalization under spectral and phenological shifts, we design CropNet, a lightweight and robust CNN tailored for pixel-level crop classification, coupled with temporal data augmentation (time shift, time scale, and magnitude warping) that simulates realistic cross-regional phenology. Experiments show that 2D median temporal features from Sentinel-2 consistently exhibit the strongest invariance across all transfer scenarios, and augmentation further improves robustness, particularly when training data diversity is limited. Overall, the work identifies more invariant feature representations that enhance geographic transferability and suggests a promising path toward scalable, low-cost crop type applications across globally diverse regions.

[311] arXiv:2509.03498 [pdf, html, other]
Title: OneCAT: Decoder-Only Auto-Regressive Model for Unified Understanding and Generation
Han Li, Xinyu Peng, Yaoming Wang, Zelin Peng, Xin Chen, Rongxiang Weng, Jingang Wang, Xunliang Cai, Wenrui Dai, Hongkai Xiong
Comments: technical report
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We introduce OneCAT, a unified multimodal model that seamlessly integrates understanding, generation, and editing within a novel, pure decoder-only transformer architecture. Our framework uniquely eliminates the need for external components such as Vision Transformers (ViT) or vision tokenizer during inference, leading to significant efficiency gains, especially for high-resolution inputs. This is achieved through a modality-specific Mixture-of-Experts (MoE) structure trained with a single autoregressive (AR) objective, which also natively supports dynamic resolutions. Furthermore, we pioneer a multi-scale visual autoregressive mechanism within the Large Language Model (LLM) that drastically reduces decoding steps compared to diffusion-based methods while maintaining state-of-the-art performance. Our findings demonstrate the powerful potential of pure autoregressive modeling as a sufficient and elegant foundation for unified multimodal intelligence. As a result, OneCAT sets a new performance standard, outperforming existing open-source unified multimodal models across benchmarks for multimodal generation, editing, and understanding.

[312] arXiv:2509.03499 [pdf, html, other]
Title: DeepSea MOT: A benchmark dataset for multi-object tracking on deep-sea video
Kevin Barnard, Elaine Liu, Kristine Walz, Brian Schlining, Nancy Jacobsen Stout, Lonny Lundsten
Comments: 5 pages, 3 figures, dataset available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Benchmarking multi-object tracking and object detection model performance is an essential step in machine learning model development, as it allows researchers to evaluate model detection and tracker performance on human-generated 'test' data, facilitating consistent comparisons between models and trackers and aiding performance optimization. In this study, a novel benchmark video dataset was developed and used to assess the performance of several Monterey Bay Aquarium Research Institute object detection models and a FathomNet single-class object detection model together with several trackers. The dataset consists of four video sequences representing midwater and benthic deep-sea habitats. Performance was evaluated using Higher Order Tracking Accuracy, a metric that balances detection, localization, and association accuracy. To the best of our knowledge, this is the first publicly available benchmark for multi-object tracking in deep-sea video footage. We provide the benchmark data, a clearly documented workflow for generating additional benchmark videos, as well as example Python notebooks for computing metrics.

[313] arXiv:2509.03500 [pdf, html, other]
Title: Real-Time Instrument Planning and Perception for Novel Measurements of Dynamic Phenomena
Itai Zilberstein, Alberto Candela, Steve Chien
Comments: Appears in Proceedings of 18th Symposium on Advanced Space Technologies in Robotics and Automation
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Advancements in onboard computing mean remote sensing agents can employ state-of-the-art computer vision and machine learning at the edge. These capabilities can be leveraged to unlock new rare, transient, and pinpoint measurements of dynamic science phenomena. In this paper, we present an automated workflow that synthesizes the detection of these dynamic events in look-ahead satellite imagery with autonomous trajectory planning for a follow-up high-resolution sensor to obtain pinpoint measurements. We apply this workflow to the use case of observing volcanic plumes. We analyze classification approaches including traditional machine learning algorithms and convolutional neural networks. We present several trajectory planning algorithms that track the morphological features of a plume and integrate these algorithms with the classifiers. We show through simulation an order of magnitude increase in the utility return of the high-resolution instrument compared to baselines while maintaining efficient runtimes.

[314] arXiv:2509.03501 [pdf, html, other]
Title: Strefer: Empowering Video LLMs with Space-Time Referring and Reasoning via Synthetic Instruction Data
Honglu Zhou, Xiangyu Peng, Shrikant Kendre, Michael S. Ryoo, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles
Comments: This technical report serves as the archival version of our paper accepted at the ICCV 2025 Workshop. For more information, please visit our project website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Next-generation AI companions must go beyond general video understanding to resolve spatial and temporal references in dynamic, real-world environments. Existing Video Large Language Models (Video LLMs), while capable of coarse-level comprehension, struggle with fine-grained, spatiotemporal reasoning, especially when user queries rely on time-based event references for temporal anchoring, or gestural cues for spatial anchoring to clarify object references and positions. To bridge this critical gap, we introduce Strefer, a synthetic instruction data generation framework designed to equip Video LLMs with spatiotemporal referring and reasoning capabilities. Strefer produces diverse instruction-tuning data using a data engine that pseudo-annotates temporally dense, fine-grained video metadata, capturing rich spatial and temporal information in a structured manner, including subjects, objects, their locations as masklets, and their action descriptions and timelines. Our approach enhances the ability of Video LLMs to interpret spatial and temporal references, fostering more versatile, space-time-aware reasoning essential for real-world AI companions. Without using proprietary models, costly human annotation, or the need to annotate large volumes of new videos, experimental evaluations show that models trained with data produced by Strefer outperform baselines on tasks requiring spatial and temporal disambiguation. Additionally, these models exhibit enhanced space-time-aware reasoning, establishing a new foundation for perceptually grounded, instruction-tuned Video LLMs.

[315] arXiv:2509.03503 [pdf, html, other]
Title: Warming Up for Zeroth-Order Federated Pre-Training with Low Resource Clients
Gwen Legate, Irina Rish, Eugene Belilovsky
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Federated learning enables collaborative model training across numerous edge devices without requiring participants to share data; however, memory and communication constraints on these edge devices may preclude their participation in training. We consider a setting in which a subset of edge devices are below a critical memory or communication threshold required to conduct model updates. Under typical federated optimization algorithms, these devices are excluded from training, which renders their data inaccessible and increases system-induced bias. We are inspired by MeZO, a zeroth-order method used for memory-efficient fine-tuning. The increased variance inherent to zeroth-order gradient approximations has relegated previous zeroth-order optimizers exclusively to the domain of fine-tuning, a limitation we seek to correct. We devise a federated, memory-efficient zeroth-order optimizer, ZOWarmUp, that permits zeroth-order training from a random initialization. ZOWarmUp leverages differing client capabilities and careful variance reduction techniques to facilitate participation of under-represented, low-resource clients in model training. Like other federated zeroth-order methods, ZOWarmUp eliminates the need for edge devices to transmit their full gradients to the server and instead relies on only a small set of random seeds, rendering the up-link communication cost negligible. We present experiments using various datasets and model architectures to show that ZOWarmUp is a robust algorithm that can be applied under a wide variety of circumstances. For systems with a high proportion of edge devices that would otherwise be excluded from training, this algorithm provides access to a greater volume and diversity of data, thus improving training outcomes.
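
The seed-based communication pattern described above can be sketched generically. The example below is a plain two-point zeroth-order estimator in the style of MeZO, not ZOWarmUp itself: the client sends only a seed and one scalar, and the server regenerates the perturbation direction from the seed. The function names, the toy quadratic loss, and the step size are hypothetical, and none of the paper's variance reduction or warm-up logic is included.

```python
import random

def perturbation(seed, dim):
    """Regenerate the same Gaussian direction from a shared seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def client_scalar(loss_fn, params, seed, eps=1e-3):
    """Client side: two forward passes give a directional-derivative estimate."""
    z = perturbation(seed, len(params))
    plus = [p + eps * zi for p, zi in zip(params, z)]
    minus = [p - eps * zi for p, zi in zip(params, z)]
    return (loss_fn(plus) - loss_fn(minus)) / (2.0 * eps)

def server_update(params, seed, scalar, lr):
    """Server side: rebuild the direction from the seed and take a step."""
    z = perturbation(seed, len(params))
    return [p - lr * scalar * zi for p, zi in zip(params, z)]

def quadratic(w):
    """Toy loss standing in for a real model's training loss."""
    return sum(x * x for x in w)

params = [1.0, -2.0, 0.5]
for step in range(200):
    g = client_scalar(quadratic, params, seed=step)  # only (seed, g) is "uploaded"
    params = server_update(params, seed=step, scalar=g, lr=0.02)
print(quadratic(params))
```

The uplink payload per step is one integer and one float regardless of model size, which is why the abstract calls the up-link cost negligible; the price is the higher variance of the gradient estimate.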

[316] arXiv:2509.03505 [pdf, html, other]
Title: LimiX: Unleashing Structured-Data Modeling Capability for Generalist Intelligence
Xingxuan Zhang, Gang Ren, Han Yu, Hao Yuan, Hui Wang, Jiansheng Li, Jiayun Wu, Lang Mo, Li Mao, Mingchao Hao, Ningbo Dai, Renzhe Xu, Shuyang Li, Tianyang Zhang, Yue He, Yuanrui Wang, Yunjia Zhang, Zijing Xu, Dongzhe Li, Fang Gao, Hao Zou, Jiandong Liu, Jiashuo Liu, Jiawei Xu, Kaijie Cheng, Kehan Li, Linjun Zhou, Qing Li, Shaohua Fan, Xiaoyu Lin, Xinyan Han, Xuanyue Li, Yan Lu, Yuan Xue, Yuanyuan Jiang, Zimu Wang, Zhenlei Wang, Peng Cui
Comments: 56 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

We argue that progress toward general intelligence requires complementary foundation models grounded in language, the physical world, and structured data. This report presents LimiX, the first installment of our large structured-data models (LDMs). LimiX treats structured data as a joint distribution over variables and missingness, thus capable of addressing a wide range of tabular tasks through query-based conditional prediction via a single model. LimiX is pretrained using masked joint-distribution modeling with an episodic, context-conditional objective, where the model predicts for query subsets conditioned on dataset-specific contexts, supporting rapid, training-free adaptation at inference. We evaluate LimiX across 10 large structured-data benchmarks with broad regimes of sample size, feature dimensionality, class number, categorical-to-numerical feature ratio, missingness, and sample-to-feature ratios. With a single model and a unified interface, LimiX consistently surpasses strong baselines including gradient-boosting trees, deep tabular networks, recent tabular foundation models, and automated ensembles, as shown in Figure 1 and Figure 2. The superiority holds across a wide range of tasks, such as classification, regression, missing value imputation, and data generation, often by substantial margins, while avoiding task-specific architectures or bespoke training per task. All LimiX models are publicly accessible under Apache 2.0.

[317] arXiv:2509.03510 [pdf, other]
Title: A comprehensive Persian offline handwritten database for investigating the effects of heritability and family relationships on handwriting
Abbas Zohrevand, Javad Sadri, Zahra Imani
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper introduces a comprehensive database for research and investigation on the effects of inheritance on handwriting. A database has been created that can be used to answer questions such as: Is there a genetic component to handwriting? Is handwriting inherited? Do family relationships affect handwriting? A variety of handwritten samples, such as digits, letters, shapes and free paragraphs, from 210 families (including grandparents, parents, uncles, aunts, siblings, cousins, nephews and nieces) have been collected using specially designed forms, and the family relationships of all writers are captured. To the best of our knowledge, no such database is presently available. Based on comparisons and investigation of features of the handwriting of family members, similarities among their features and writing styles are detected. Our database is freely available to the pattern recognition community, and we hope it will pave the way for investigations on the effects of inheritance and family relationships on handwriting.

[318] arXiv:2509.03515 [pdf, html, other]
Title: Can the Waymo Open Motion Dataset Support Realistic Behavioral Modeling? A Validation Study with Naturalistic Trajectories
Yanlin Zhang, Sungyong Chung, Nachuan Li, Dana Monzer, Hani S. Mahmassani, Samer H. Hamdar, Alireza Talebpour
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY); Applications (stat.AP)

The Waymo Open Motion Dataset (WOMD) has become a popular resource for data-driven modeling of autonomous vehicle (AV) behavior. However, its validity for behavioral analysis remains uncertain due to proprietary post-processing, the absence of error quantification, and the segmentation of trajectories into 20-second clips. This study examines whether WOMD accurately captures the dynamics and interactions observed in real-world AV operations. Leveraging an independently collected naturalistic dataset from Level 4 AV operations in Phoenix, Arizona (PHX), we perform comparative analyses across three representative urban driving scenarios: discharging at signalized intersections, car-following, and lane-changing behaviors. For the discharging analysis, headways are manually extracted from aerial video to ensure negligible measurement error. For the car-following and lane-changing cases, we apply the Simulation-Extrapolation (SIMEX) method to account for empirically estimated error in the PHX data and use Dynamic Time Warping (DTW) distances to quantify behavioral differences. Results across all scenarios consistently show that behavior in PHX falls outside the behavioral envelope of WOMD. Notably, WOMD underrepresents short headways and abrupt decelerations. These findings suggest that behavioral models calibrated solely on WOMD may systematically underestimate the variability, risk, and complexity of naturalistic driving. Caution is therefore warranted when using WOMD for behavior modeling without proper validation against independently collected data.
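
For readers unfamiliar with Dynamic Time Warping, the following is a minimal sketch of the textbook DTW distance between two one-dimensional series, using absolute difference as the local cost. It shows only the generic algorithm; the study's actual features, local cost, and any windowing constraints are not given in the abstract, and the example speed profiles are made up.

```python
def dtw_distance(a, b):
    """Textbook DTW distance between two 1-D sequences (absolute-difference cost)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] repeats against b[j-1]
                cost[i][j - 1],      # b[j-1] repeats against a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]

# Two hypothetical speed profiles with the same shape but different timing.
follower_a = [10.0, 10.0, 8.0, 6.0, 6.0]
follower_b = [10.0, 8.0, 8.0, 6.0]
print(dtw_distance(follower_a, follower_b))
```

Because DTW allows samples to be stretched in time, trajectories that differ only in timing score as similar, while differences in magnitude, such as a more abrupt deceleration, still contribute to the distance.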

[319] arXiv:2509.03516 [pdf, html, other]
Title: Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?
Ouxiang Li, Yuan Wang, Xinting Hu, Huijuan Huang, Rui Chen, Jiarong Ou, Xin Tao, Pengfei Wan, Fuli Feng
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Text-to-image (T2I) generation aims to synthesize images from textual prompts, which jointly specify what must be shown and imply what can be inferred, thereby corresponding to two core capabilities: composition and reasoning. However, with the emerging advances of T2I models in reasoning beyond composition, existing benchmarks reveal clear limitations in providing comprehensive evaluations across and within these capabilities. Meanwhile, these advances also enable models to handle more complex prompts, whereas current benchmarks remain limited to low scene density and simplified one-to-one reasoning. To address these limitations, we propose T2I-CoReBench, a comprehensive and complex benchmark that evaluates both composition and reasoning capabilities of T2I models. To ensure comprehensiveness, we structure composition around scene graph elements (instance, attribute, and relation) and reasoning around the philosophical framework of inference (deductive, inductive, and abductive), formulating a 12-dimensional evaluation taxonomy. To increase complexity, driven by the inherent complexities of real-world scenarios, we curate each prompt with high compositional density for composition and multi-step inference for reasoning. We also pair each prompt with a checklist that specifies individual yes/no questions to assess each intended element independently to facilitate fine-grained and reliable evaluation. In statistics, our benchmark comprises 1,080 challenging prompts and around 13,500 checklist questions. Experiments across 27 current T2I models reveal that their composition capability still remains limited in complex high-density scenarios, while the reasoning capability lags even further behind as a critical bottleneck, with all models struggling to infer implicit elements from prompts. Our project page: this https URL.

[320] arXiv:2509.03518 [pdf, html, other]
Title: Can LLMs Lie? Investigation beyond Hallucination
Haoran Huan, Mihir Prabhudesai, Mengning Wu, Shantanu Jaiswal, Deepak Pathak
Comments: Website at this https URL
Subjects: Machine Learning (cs.LG)

Large language models (LLMs) have demonstrated impressive capabilities across a variety of tasks, but their increasing autonomy in real-world applications raises concerns about their trustworthiness. While hallucinations (unintentional falsehoods) have been widely studied, the phenomenon of lying, where an LLM knowingly generates falsehoods to achieve an ulterior objective, remains underexplored. In this work, we systematically investigate the lying behavior of LLMs, differentiating it from hallucinations and testing it in practical scenarios. Through mechanistic interpretability techniques, we uncover the neural mechanisms underlying deception, employing logit lens analysis, causal interventions, and contrastive activation steering to identify and control deceptive behavior. We study real-world lying scenarios and introduce behavioral steering vectors that enable fine-grained manipulation of lying tendencies. Further, we explore the trade-offs between lying and end-task performance, establishing a Pareto frontier where dishonesty can enhance goal optimization. Our findings contribute to the broader discourse on AI ethics, shedding light on the risks and potential safeguards for deploying LLMs in high-stakes environments. Code and more illustrations are available at this https URL

Cross submissions (showing 83 of 83 entries)

[321] arXiv:2509.02568 (cross-list from eess.SP) [pdf, html, other]
Title: EEG-MSAF: An Interpretable Microstate Framework uncovers Default-Mode Decoherence in Early Neurodegeneration
Mohammad Mehedi Hasan, Pedro G. Lind, Hernando Ombao, Anis Yazidi, Rabindra Khadka
Comments: Dementia, EEG, Microstates, Explainable, SHAP
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Dementia (DEM) is a growing global health challenge, underscoring the need for early and accurate diagnosis. Electroencephalography (EEG) provides a non-invasive window into brain activity, but conventional methods struggle to capture its transient complexity. We present the EEG Microstate Analysis Framework (EEG-MSAF), an end-to-end pipeline that leverages EEG microstates (discrete, quasi-stable topographies) to identify DEM-related biomarkers and distinguish DEM, mild cognitive impairment (MCI), and normal cognition (NC). EEG-MSAF comprises three stages: (1) automated microstate feature extraction, (2) classification with machine learning (ML), and (3) feature ranking using Shapley Additive Explanations (SHAP) to highlight key biomarkers. We evaluate on two EEG datasets: the public Chung-Ang University EEG (CAUEEG) dataset and a clinical cohort from Thessaloniki Hospital. Our framework demonstrates strong performance and generalizability. On CAUEEG, EEG-MSAF-SVM achieves 89% ± 0.01 accuracy, surpassing the deep learning baseline CEEDNET by 19.3%. On the Thessaloniki dataset, it reaches 95% ± 0.01 accuracy, comparable to EEGConvNeXt. SHAP analysis identifies mean correlation and occurrence as the most informative metrics: disruption of microstate C (salience/attention network) dominates DEM prediction, while microstate F, a novel default-mode pattern, emerges as a key early biomarker for both MCI and DEM. By combining accuracy, generalizability, and interpretability, EEG-MSAF advances EEG-based dementia diagnosis and sheds light on brain dynamics across the cognitive spectrum.

[322] arXiv:2509.02571 (cross-list from eess.AS) [pdf, other]
Title: Gaussian Process Regression of Steering Vectors With Physics-Aware Deep Composite Kernels for Augmented Listening
Diego Di Carlo (RIKEN AIP), Koyama Shoichi (UTokyo), Nugraha Aditya Arie (RIKEN AIP), Fontaine Mathieu (LTCI, S2A), Bando Yoshiaki (AIST), Yoshii Kazuyoshi (RIKEN AIP)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)

This paper investigates continuous representations of steering vectors over frequency and position of microphone and source for augmented listening (e.g., spatial filtering and binaural rendering) with precise control of the sound field perceived by the user. Steering vectors have typically been used for representing the spatial characteristics of the sound field as a function of the listening position. The basic algebraic representation of steering vectors assuming an idealized environment cannot deal with the scattering effect of the sound field. One may thus collect a discrete set of real steering vectors measured in dedicated facilities and super-resolve (i.e., upsample) them. Recently, physics-aware deep learning methods have been effectively used for this purpose. Such deterministic super-resolution, however, suffers from the overfitting problem due to the non-uniform uncertainty over the measurement space. To solve this problem, we integrate an expressive representation based on the neural field (NF) into the principled probabilistic framework based on the Gaussian process (GP). Specifically, we propose a physics-aware composite kernel that models the directional incoming waves and the subsequent scattering effect. Our comprehensive comparative experiment showed the effectiveness of the proposed method under data insufficiency conditions. In downstream tasks such as speech enhancement and binaural rendering using the simulated data of the SPEAR challenge, the oracle performances were attained with less than ten times fewer measurements.

[323] arXiv:2509.02582 (cross-list from physics.med-ph) [pdf, other]
Title: Application of Quantum Convolutional Neural Networks for MRI-Based Brain Tumor Detection and Classification
Sugih Pratama Nugraha, Ariiq Islam Alfajri, Tony Sumaryada, Duong Thanh Tai, Nissren Tamam, Abdelmoneim Sulieman, Sitti Yani
Subjects: Medical Physics (physics.med-ph); Computer Vision and Pattern Recognition (cs.CV)

This study explores the application of Quantum Convolutional Neural Networks (QCNNs) for brain tumor classification using MRI images, leveraging quantum computing for enhanced computational efficiency. A dataset of 3,264 MRI images, including glioma, meningioma, pituitary tumors, and non-tumor cases, was utilized. The data was split into 80% training and 20% testing, with an oversampling technique applied to address class imbalance. The QCNN model consists of quantum convolution layers, flatten layers, and dense layers, with a filter size of 2, depth of 4, and 4 qubits, trained over 10 epochs. Two models were developed: a binary classification model distinguishing tumor presence and a multiclass classification model categorizing tumor types. The binary model achieved 88% accuracy, improving to 89% after data balancing, while the multiclass model achieved 52% accuracy, increasing to 62% after oversampling. Despite strong binary classification performance, the multiclass model faced challenges due to dataset complexity and quantum circuit limitations. These findings suggest that QCNNs hold promise for medical imaging applications, particularly in binary classification. However, further refinements, including optimized quantum circuit architectures and hybrid classical-quantum approaches, are necessary to enhance multiclass classification accuracy and improve QCNN applicability in clinical settings.

[324] arXiv:2509.02585 (cross-list from eess.IV) [pdf, html, other]
Title: Pan-Cancer mitotic figures detection and domain generalization: MIDOG 2025 Challenge
Zhuoyan Shen, Esther Bär, Maria Hawkins, Konstantin Bräutigam, Charles-Antoine Collins-Fekete
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

This report details our submission to the Mitotic Domain Generalization (MIDOG) 2025 challenge, which addresses the critical task of mitotic figure detection in histopathology for cancer prognostication. Following the "Bitter Lesson"\cite{sutton2019bitterlesson} principle that emphasizes data scale over algorithmic novelty, we have publicly released two new datasets to bolster training data for both conventional \cite{Shen2024framework} and atypical mitoses \cite{shen_2025_16780587}. In addition, we implement up-to-date training methodologies for both tracks and reach a Track-1 F1-Score of 0.8407 on our test set, as well as a Track-2 balanced accuracy of 0.9107 for atypical mitotic cell classification.

[325] arXiv:2509.02586 (cross-list from eess.IV) [pdf, html, other]
Title: MitoDetect++: A Domain-Robust Pipeline for Mitosis Detection and Atypical Subtyping
Esha Sadia Nasir, Jiaqi Lv, Mostafa Jahanifer, Shan E Ahmed Raza
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Automated detection and classification of mitotic figures, especially distinguishing atypical from normal, remains a critical challenge in computational pathology. We present MitoDetect++, a unified deep learning pipeline designed for the MIDOG 2025 challenge, addressing both mitosis detection and atypical mitosis classification. For detection (Track 1), we employ a U-Net-based encoder-decoder architecture with EfficientNetV2-L as the backbone, enhanced with attention modules, and trained via combined segmentation losses. For classification (Track 2), we leverage the Virchow2 vision transformer, fine-tuned efficiently using Low-Rank Adaptation (LoRA) to minimize resource consumption. To improve generalization and mitigate domain shifts, we integrate strong augmentations, focal loss, and group-aware stratified 5-fold cross-validation. At inference, we deploy test-time augmentation (TTA) to boost robustness. Our method achieves a balanced accuracy of 0.892 across validation domains, highlighting its clinical applicability and scalability across tasks.

[326] arXiv:2509.02588 (cross-list from eess.IV) [pdf, html, other]
Title: Sequential Hard Mining: a data-centric approach for Mitosis Detection
Maxime W. Lafarge, Viktor H. Koelzer
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

With the continuously growing availability of annotated datasets of mitotic figures in histology images, finding the best way to use this unprecedented amount of data to train deep learning models has become a new challenge. Here, we build upon previously proposed approaches with a focus on efficient sampling of training data inspired by boosting techniques and present our candidate solutions for the two tracks of the MIDOG 2025 challenge.

[327] arXiv:2509.02589 (cross-list from eess.IV) [pdf, html, other]
Title: Normal and Atypical Mitosis Image Classifier using Efficient Vision Transformer
Xuan Qi, Dominic Labella, Thomas Sanford, Maxwell Lee
Comments: for grandchallenge midog 2025 track 2 abstract
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

We tackle atypical versus normal mitosis classification in the MIDOG 2025 challenge using EfficientViT-L2, a hybrid CNN-ViT architecture optimized for accuracy and efficiency. A unified dataset of 13,938 nuclei from seven cancer types (MIDOG++ and AMi-Br) was used, with atypical mitoses comprising ~15%. To assess domain generalization, we applied leave-one-cancer-type-out cross-validation with 5-fold ensembles, using stain-deconvolution for image augmentation. For challenge submissions, we trained an ensemble with the same 5-fold split but on all cancer types. In the preliminary evaluation phase, this model achieved balanced accuracy of 0.859, ROC AUC of 0.942, and raw accuracy of 0.85, demonstrating competitive and well-balanced performance across metrics.

[328] arXiv:2509.02591 (cross-list from eess.IV) [pdf, html, other]
Title: Ensemble of Pathology Foundation Models for MIDOG 2025 Track 2: Atypical Mitosis Classification
Mieko Ochi, Bae Yuan
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Mitotic figures are classified into typical and atypical variants, with atypical counts correlating strongly with tumor aggressiveness. Accurate differentiation is therefore essential for patient prognostication and resource allocation, yet remains challenging even for expert pathologists. Here, we leveraged Pathology Foundation Models (PFMs) pre-trained on large histopathology datasets and applied parameter-efficient fine-tuning via low-rank adaptation. During training, we employed a fisheye transform to emphasize mitoses and Fourier Domain Adaptation using ImageNet target images. Finally, we ensembled multiple PFMs to integrate complementary morphological insights, achieving a high balanced accuracy on the Preliminary Evaluation Phase dataset.

[329] arXiv:2509.02593 (cross-list from eess.IV) [pdf, html, other]
Title: Robust Pan-Cancer Mitotic Figure Detection with YOLOv12
Raphaël Bourgade, Guillaume Balezo, Thomas Walter
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Mitotic figures represent a key histoprognostic feature in tumor pathology, providing crucial insights into tumor aggressiveness and proliferation. However, their identification remains challenging, subject to significant inter-observer variability, even among experienced pathologists. To address this issue, the MItosis DOmain Generalization (MIDOG) 2025 challenge marks the third edition of an international competition aiming to develop robust mitosis detection algorithms. In this paper, we present a mitotic figure detection approach based on the YOLOv12 object detection architecture, achieving an $F_1$-score of 0.801 on the preliminary test set of the MIDOG 2025 challenge, without relying on external data.

[330] arXiv:2509.02594 (cross-list from q-bio.QM) [pdf, html, other]
Title: OpenAI's HealthBench in Action: Evaluating an LLM-Based Medical Assistant on Realistic Clinical Queries
Sandhanakrishnan Ravichandran, Shivesh Kumar, Rogerio Corga Da Silva, Miguel Romano, Reinhard Berkels, Michiel van der Heijden, Olivier Fail, Valentine Emmanuel Gnanapragasam
Comments: 13 pages, two graphs
Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Information Retrieval (cs.IR)

Evaluating large language models (LLMs) on their ability to generate high-quality, accurate, situationally aware answers to clinical questions requires going beyond conventional benchmarks to assess how these systems behave in complex, high-stakes clinical scenarios. Traditional evaluations are often limited to multiple-choice questions that fail to capture essential competencies such as contextual reasoning, awareness, and uncertainty handling. To address these limitations, we evaluate our agentic, RAG-based clinical support assistant, this http URL, using HealthBench, a rubric-driven benchmark composed of open-ended, expert-annotated health conversations. On the Hard subset of 1,000 challenging examples, this http URL achieves a HealthBench score of 0.51, substantially outperforming leading frontier LLMs (GPT-5, o3, Grok 3, GPT-4, Gemini 2.5, etc.) across all behavioral axes (accuracy, completeness, instruction following, etc.). In a separate 100-sample evaluation against similar agentic RAG assistants (OpenEvidence, this http URL), it maintains a performance lead with a HealthBench score of 0.54. These results highlight this http URL strengths in communication, instruction following, and accuracy, while also revealing areas for improvement in context awareness and completeness of a response. Overall, the findings underscore the utility of behavior-level, rubric-based evaluation for building a reliable and trustworthy AI-enabled clinical support assistant.

[331] arXiv:2509.02595 (cross-list from eess.IV) [pdf, html, other]
Title: ConvNeXt with Histopathology-Specific Augmentations for Mitotic Figure Classification
Hana Feki, Alice Blondel, Thomas Walter
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Accurate mitotic figure classification is crucial in computational pathology, as mitotic activity informs cancer grading and patient prognosis. Distinguishing atypical mitotic figures (AMFs), which indicate higher tumor aggressiveness, from normal mitotic figures (NMFs) remains challenging due to subtle morphological differences and high intra-class variability. This task is further complicated by domain shifts, including variations in organ, tissue type, and scanner, as well as limited annotations and severe class imbalance. To address these challenges in Track 2 of the MIDOG 2025 Challenge, we propose a solution based on the lightweight ConvNeXt architecture, trained on all available datasets (AMi-Br, AtNorM-Br, AtNorM-MD, and OMG-Octo) to maximize domain coverage. Robustness is enhanced through a histopathology-specific augmentation pipeline, including elastic and stain-specific transformations, and balanced sampling to mitigate class imbalance. A grouped 5-fold cross-validation strategy ensures reliable evaluation. On the preliminary leaderboard, our model achieved a balanced accuracy of 0.8961, ranking among the top entries. These results highlight that broad domain exposure combined with targeted augmentation strategies is key to building accurate and generalizable mitotic figure classifiers.

[332] arXiv:2509.02596 (cross-list from econ.GN) [pdf, other]
Title: Introducing LCOAI: A Standardized Economic Metric for Evaluating AI Deployment Costs
Eliseo Curcio
Subjects: General Economics (econ.GN); Systems and Control (eess.SY)

As artificial intelligence (AI) becomes foundational to enterprise infrastructure, organizations face growing challenges in accurately assessing the full economic implications of AI deployment. Existing metrics such as API token costs, GPU-hour billing, or Total Cost of Ownership (TCO) fail to capture the complete lifecycle costs of AI systems and provide limited comparability across deployment models. This paper introduces the Levelized Cost of Artificial Intelligence (LCOAI), a standardized economic metric designed to quantify the total capital (CAPEX) and operational (OPEX) expenditures per unit of productive AI output, normalized by valid inference volume. Analogous to established metrics like LCOE (levelized cost of electricity) and LCOH (levelized cost of hydrogen) in the energy sector, LCOAI offers a rigorous, transparent framework to evaluate and compare the cost-efficiency of vendor API deployments versus self-hosted, fine-tuned models. We define the LCOAI methodology in detail and apply it to three representative scenarios: the OpenAI GPT-4.1 API, the Anthropic Claude Haiku API, and a self-hosted LLaMA-2-13B deployment, demonstrating how LCOAI captures critical trade-offs in scalability, investment planning, and cost optimization. Extensive sensitivity analyses further explore the impact of inference volume, CAPEX, and OPEX variability on lifecycle economics. The results illustrate the practical utility of LCOAI in procurement, infrastructure planning, and automation strategy, and establish it as a foundational benchmark for AI economic analysis. Policy implications and areas for future refinement, including environmental and performance-adjusted cost metrics, are also discussed.
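
The abstract describes LCOAI as total CAPEX plus OPEX normalized by valid inference volume. A minimal sketch of that ratio follows, assuming an undiscounted sum over periods; the paper's exact formula (for example, any discounting as in LCOE) is not given here, and the function name, parameters, and cost figures are hypothetical.

```python
def lcoai(capex, opex_per_period, valid_inferences_per_period):
    """Levelized cost per valid inference (undiscounted sketch, an assumption).

    capex: up-front capital expenditure
    opex_per_period: operating cost for each period
    valid_inferences_per_period: count of valid inferences in each period
    """
    total_volume = sum(valid_inferences_per_period)
    if total_volume <= 0:
        raise ValueError("total valid inference volume must be positive")
    return (capex + sum(opex_per_period)) / total_volume


# Hypothetical figures, for illustration only (not taken from the paper).
volume = [50_000_000] * 3
self_hosted = lcoai(capex=120_000.0, opex_per_period=[30_000.0] * 3,
                    valid_inferences_per_period=volume)
vendor_api = lcoai(capex=0.0, opex_per_period=[90_000.0] * 3,
                   valid_inferences_per_period=volume)
print(f"self-hosted: {self_hosted:.4f} per inference")
print(f"vendor API:  {vendor_api:.4f} per inference")
```

Keeping the denominator as valid inferences, rather than all requests, means failed or discarded outputs raise the levelized cost instead of diluting it.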

[333] arXiv:2509.02597 (cross-list from eess.IV) [pdf, html, other]
Title: Solutions for Mitotic Figure Detection and Atypical Classification in MIDOG 2025
Shuting Xu, Runtong Liu, Zhixuan Chen, Junlin Hou, Hao Chen
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Deep learning has driven significant advances in mitotic figure analysis within computational pathology. In this paper, we present our approach to the Mitosis Domain Generalization (MIDOG) 2025 Challenge, which consists of two distinct tasks, i.e., mitotic figure detection and atypical mitosis classification. For the mitotic figure detection task, we propose a two-stage detection-classification framework that first localizes candidate mitotic figures and subsequently refines the predictions using a dedicated classification module. For the atypical mitosis classification task, we employ an ensemble strategy that integrates predictions from multiple state-of-the-art deep learning architectures to improve robustness and accuracy. Extensive experiments demonstrate the effectiveness of our proposed methods across both tasks.

[334] arXiv:2509.02598 (cross-list from eess.IV) [pdf, other]
Title: MIDOG 2025: Mitotic Figure Detection with Attention-Guided False Positive Correction
Andrew Broad, Jason Keighley, Lucy Godson, Alex Wright
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI)

We present a novel approach which extends the existing Fully Convolutional One-Stage Object Detector (FCOS) for mitotic figure detection. Our composite model adds a Feedback Attention Ladder CNN (FAL-CNN) model for classification of normal versus abnormal mitotic figures, feeding into a fusion network that is trained to generate adjustments to bounding boxes predicted by FCOS. Our network aims to reduce the false positive rate of the FCOS object detector, to improve the accuracy of object detection and enhance the generalisability of the network. Our model achieved an F1 score of 0.655 for mitosis detection on the preliminary evaluation dataset.

[335] arXiv:2509.02599 (cross-list from eess.IV) [pdf, html, other]
Title: RF-DETR for Robust Mitotic Figure Detection: A MIDOG 2025 Track 1 Approach
Piotr Giedziun, Jan Sołtysik, Mateusz Górczany, Norbert Ropiak, Marcin Przymus, Piotr Krajewski, Jarosław Kwiecień, Artur Bartczak, Izabela Wasiak, Mateusz Maniewski
Comments: Challenge report for MIDOG 2025 Track 1
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Mitotic figure detection in histopathology images remains challenging due to significant domain shifts across different scanners, staining protocols, and tissue types. This paper presents our approach for the MIDOG 2025 challenge Track 1, focusing on robust mitotic figure detection across diverse histological contexts. While we initially planned a two-stage approach combining high-recall detection with subsequent classification refinement, time constraints led us to focus on optimizing a single-stage detection pipeline. We employed RF-DETR (Roboflow Detection Transformer) with hard negative mining, trained on the MIDOG++ dataset. On the preliminary test set, our method achieved an F1 score of 0.789 with a recall of 0.839 and precision of 0.746, demonstrating effective generalization across unseen domains. The proposed solution offers insights into the importance of training data balance and hard negative mining for addressing domain shift challenges in mitotic figure detection.
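
The three reported metrics are linked: F1 is the harmonic mean of precision and recall. A short sketch of that relationship, using the figures quoted in the abstract (the helper name is illustrative):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract above.
f1 = f1_score(precision=0.746, recall=0.839)
print(f"F1 = {f1:.3f}")
```

The value lands close to the reported 0.789; a small difference is expected because the published precision and recall are themselves rounded.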

[336] arXiv:2509.02600 (cross-list from eess.IV) [pdf, html, other]
Title: Team Westwood Solution for MIDOG 2025 Challenge
Tengyou Xu, Haochen Yang, Xiang 'Anthony' Chen, Hongyan Gu, Mohammad Haeri
Comments: 2 pages, 2 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

This abstract presents our solution (Team Westwood) for mitosis detection and atypical mitosis classification in the MItosis DOmain Generalization (MIDOG) 2025 challenge. For mitosis detection, we trained an nnUNetV2 for initial mitosis candidate screening with high sensitivity, followed by a random forest classifier ensembling predictions of three convolutional neural networks (CNNs): EfficientNet-b3, EfficientNet-b5, and EfficientNetV2-s. For the atypical mitosis classification, we trained another random forest classifier ensembling the predictions of three CNNs: EfficientNet-b3, EfficientNet-b5, and InceptionV3. On the preliminary test set, our solution achieved an F1 score of 0.7450 for track 1 mitosis detection, and a balanced accuracy of 0.8722 for track 2 atypical mitosis classification.

[337] arXiv:2509.02601 (cross-list from eess.IV) [pdf, html, other]
Title: Foundation Model-Driven Classification of Atypical Mitotic Figures with Domain-Aware Training Strategies
Piotr Giedziun, Jan Sołtysik, Mateusz Górczany, Norbert Ropiak, Marcin Przymus, Piotr Krajewski, Jarosław Kwiecień, Artur Bartczak, Izabela Wasiak, Mateusz Maniewski
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

We present a solution for the MIDOG 2025 Challenge Track 2, addressing binary classification of normal mitotic figures (NMFs) versus atypical mitotic figures (AMFs). The approach leverages the pathology-specific foundation model H-optimus-0, selected based on recent cross-domain generalization benchmarks and our empirical testing, with Low-Rank Adaptation (LoRA) fine-tuning and MixUp augmentation. Implementation includes soft labels based on multi-expert consensus, hard negative mining, adaptive focal loss, metric learning, and domain adaptation. The method demonstrates both the promise and challenges of applying foundation models to this complex classification task, achieving reasonable performance in the preliminary evaluation phase.

[338] arXiv:2509.02603 (cross-list from stat.AP) [pdf, html, other]
Title: A systematic machine learning approach to measure and assess biases in mobile phone population data
Carmen Cabrera, Francisco Rowe
Comments: 18 pages; 5 figures; 2 tables
Subjects: Applications (stat.AP); Computers and Society (cs.CY)

Traditional sources of population data, such as censuses and surveys, are costly, infrequent, and often unavailable in crisis-affected regions. Mobile phone application data offer near real-time, high-resolution insights into population distribution, but their utility is undermined by unequal access to and use of digital technologies, creating biases that threaten representativeness. Despite growing recognition of these issues, there is still no standard framework to measure and explain such biases, limiting the reliability of digital traces for research and policy. We develop and implement a systematic, replicable framework to quantify coverage bias in aggregated mobile phone application data without requiring individual-level demographic attributes. The approach combines a transparent indicator of population coverage with explainable machine learning to identify contextual drivers of spatial bias. Using four datasets for the United Kingdom benchmarked against the 2021 census, we show that mobile phone data consistently achieve higher population coverage than major national surveys, but substantial biases persist across data sources and subnational areas. Coverage bias is strongly associated with demographic, socioeconomic, and geographic features, often in complex nonlinear ways. Contrary to common assumptions, multi-application datasets do not necessarily reduce bias compared to single-app sources. Our findings establish a foundation for bias assessment standards in mobile phone data, offering practical tools for researchers, statistical agencies, and policymakers to harness these datasets responsibly and equitably.

[339] arXiv:2509.02606 (cross-list from q-bio.QM) [pdf, html, other]
Title: Lessons Learned from Deploying Adaptive Machine Learning Agents with Limited Data for Real-time Cell Culture Process Monitoring
Thanh Tung Khuat, Johnny Peng, Robert Bassett, Ellen Otte, Bogdan Gabrys
Subjects: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG)

This study explores the deployment of three machine learning (ML) approaches for real-time prediction of glucose, lactate, and ammonium concentrations in cell culture processes, using Raman spectroscopy as input features. The research addresses challenges associated with limited data availability and process variability, providing a comparative analysis of pretrained models, just-in-time learning (JITL), and online learning algorithms. Two industrial case studies are presented to evaluate the impact of varying bioprocess conditions on model performance. The findings highlight the specific conditions under which pretrained models demonstrate superior predictive accuracy and identify scenarios where JITL or online learning approaches are more effective for adaptive process monitoring. This study also highlights the critical importance of updating the deployed models/agents with the latest offline analytical measurements during bioreactor operations to maintain the model performance against the changes in cell growth behaviours and operating conditions throughout the bioreactor run. Additionally, the study confirms the usefulness of a simple mixture-of-experts framework in achieving enhanced accuracy and robustness for real-time predictions of metabolite concentrations based on Raman spectral data. These insights contribute to the development of robust strategies for the efficient deployment of ML models in dynamic and changing biomanufacturing environments.

[340] arXiv:2509.02607 (cross-list from eess.IV) [pdf, other]
Title: Towards Digital Twins for Optimal Radioembolization
Nisanth Kumar Panneerselvam, Guneet Mummaneni, Emilie Roncali
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI)

Radioembolization is a localized liver cancer treatment that delivers radioactive microspheres (30 micron) to tumors via a catheter inserted in the hepatic arterial tree. The goal is to maximize therapeutic efficacy while minimizing damage to healthy liver tissue. However, optimization is challenging due to complex hepatic artery anatomy, variable blood flow, and uncertainty in microsphere transport. The creation of dynamic, patient-specific digital twins may provide a transformative solution to these challenges. This work outlines a framework for a liver radioembolization digital twin using high-fidelity computational fluid dynamics (CFD) and/or recent physics-informed machine learning approaches. The CFD approach involves microsphere transport calculations in the hepatic arterial tree with individual patient data, which enables personalized treatment planning. Although accurate, traditional CFD is computationally expensive and limits clinical applicability.
To accelerate simulations, physics-informed neural networks (PINNs) and their generative extensions play an increasingly important role. PINNs integrate governing equations, such as the Navier-Stokes equations, directly into the neural network training process, enabling mesh-free, data-efficient approximation of blood flow and microsphere transport. Physics-informed generative adversarial networks (PI-GANs), diffusion models (PI-DMs), and transformer-based architectures further enable uncertainty-aware, temporally resolved predictions with reduced computational cost. These AI surrogates not only maintain physical fidelity but also support rapid sampling of diverse flow scenarios, facilitating real-time decision support.
Together, CFD and physics-informed AI methods form the foundation of a dynamic, patient-specific digital twin to optimize radioembolization planning and ultimately improve clinical outcomes.

[341] arXiv:2509.02610 (cross-list from q-bio.QM) [pdf, html, other]
Title: Resilient Biosecurity in the Era of AI-Enabled Bioweapons
Jonathan Feldman, Tal Feldman
Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI)

Recent advances in generative biology have enabled the design of novel proteins, creating significant opportunities for drug discovery while also introducing new risks, including the potential development of synthetic bioweapons. Existing biosafety measures primarily rely on inference-time filters such as sequence alignment and protein-protein interaction (PPI) prediction to detect dangerous outputs. In this study, we evaluate the performance of three leading PPI prediction tools: AlphaFold 3, AF3Complex, and SpatialPPIv2. These models were tested on well-characterized viral-host interactions, such as those involving Hepatitis B and SARS-CoV-2. Despite being trained on many of the same viruses, the models fail to detect a substantial number of known interactions. Strikingly, none of the tools successfully identifies any of the four experimentally validated SARS-CoV-2 mutants with confirmed binding. These findings suggest that current predictive filters are inadequate for reliably flagging even known biological threats and are even less likely to detect novel ones. We argue for a shift toward response-oriented infrastructure, including rapid experimental validation, adaptable biomanufacturing, and regulatory frameworks capable of operating at the speed of AI-driven developments.

[342] arXiv:2509.02612 (cross-list from eess.IV) [pdf, html, other]
Title: Is Synthetic Image Augmentation Useful for Imbalanced Classification Problems? Case-Study on the MIDOG2025 Atypical Cell Detection Competition
Leire Benito-Del-Valle, Pedro A. Moreno-Sánchez, Itziar Egusquiza, Itsaso Vitoria, Artzai Picón, Cristina López-Saratxaga, Adrian Galdran
Comments: version 0, to be updated; submitted to midog 2025
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

The MIDOG 2025 challenge extends prior work on mitotic figure detection by introducing a new Track 2 on atypical mitosis classification. This task aims to distinguish normal from atypical mitotic figures in histopathology images, a clinically relevant but highly imbalanced and cross-domain problem. We investigated two complementary backbones: (i) ConvNeXt-Small, pretrained on ImageNet, and (ii) a histopathology-specific ViT from Lunit trained via self-supervision. To address the strong prevalence imbalance (9408 normal vs. 1741 atypical), we synthesized additional atypical examples to approximate class balance and compared models trained with real-only vs. real+synthetic data. Using five-fold cross-validation, both backbones reached strong performance (mean AUROC approximately 95 percent), with ConvNeXt achieving slightly higher peaks while Lunit exhibited greater fold-to-fold stability. Synthetic balancing, however, did not lead to consistent improvements. On the organizers' preliminary hidden test set, explicitly designed as an out-of-distribution debug subset, ConvNeXt attained the highest AUROC (95.4 percent), whereas Lunit remained competitive on balanced accuracy. These findings suggest that both ImageNet and domain-pretrained backbones are viable for atypical mitosis classification, with domain-pretraining conferring robustness and ImageNet pretraining reaching higher peaks, while naive synthetic balancing has limited benefit. Full hidden test set results will be reported upon challenge completion.
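
With 9408 normal and 1741 atypical examples, plain accuracy can look strong for a classifier that never detects the minority class, which is why balanced accuracy is reported for this track. A minimal sketch using only the class counts from the abstract; the trivial "always normal" classifier is a hypothetical baseline, not a model from the paper.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Class counts quoted in the abstract; "atypical" is treated as the positive class.
n_normal, n_atypical = 9408, 1741

# Hypothetical baseline that labels every example as normal.
tp, fn = 0, n_atypical
tn, fp = n_normal, 0

plain_accuracy = (tp + tn) / (n_normal + n_atypical)
bal_acc = balanced_accuracy(tp, fn, tn, fp)
print(f"plain accuracy:    {plain_accuracy:.3f}")
print(f"balanced accuracy: {bal_acc:.3f}")
```

The baseline scores roughly 0.84 on plain accuracy while its balanced accuracy is exactly 0.5, the chance level for a binary task.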

[343] arXiv:2509.02614 (cross-list from stat.AP) [pdf, html, other]
Title: Use ADAS Data to Predict Near-Miss Events: A Group-Based Zero-Inflated Poisson Approach
Xinbo Zhang, Montserrat Guillen, Lishuai Li, Xin Li, Youhua Frank Chen
Comments: Preprint. 10 pages, 3 figures, 4 tables. Submitted to 2025 IEEE International Conference on Big Data (IEEE BigData 2025). Corresponding authors: Youhua Frank Chen (youhchen@cityu.this http URL)
Subjects: Applications (stat.AP); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)

Driving behavior big data leverages multi-sensor telematics to understand how people drive and powers applications such as risk evaluation, insurance pricing, and targeted intervention. Usage-based insurance (UBI) built on these data has become mainstream. Telematics-captured near-miss events (NMEs) provide a timely alternative to claim-based risk, but weekly NMEs are sparse, highly zero-inflated, and behaviorally heterogeneous even after exposure normalization. Analyzing multi-sensor telematics and ADAS warnings, we show that the traditional statistical models underfit the dataset. We address these challenges by proposing a set of zero-inflated Poisson (ZIP) frameworks that learn latent behavior groups and fit offset-based count models via EM to yield calibrated, interpretable weekly risk predictions. Using a naturalistic dataset from a fleet of 354 commercial drivers over a year, during which the drivers completed 287,511 trips and logged 8,142,896 km in total, our results show consistent improvements over baselines and prior telematics models, with lower AIC/BIC values in-sample and better calibration out-of-sample. We also conducted sensitivity analyses on the EM-based grouping for the number of clusters, finding that the gains were robust and interpretable. Practically, this supports context-aware ratemaking on a weekly basis and fairer premiums by recognizing heterogeneous driving styles.
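
For readers unfamiliar with the model family, the following is a minimal sketch of a single zero-inflated Poisson component with an exposure offset. It does not reproduce the paper's group-based EM procedure; the parameter names and numeric values are hypothetical. With a log link, an offset of log(exposure) is equivalent to multiplying the per-unit rate by the exposure, which is what the code does.

```python
import math


def zip_pmf(k, pi_zero, rate, exposure):
    """P(Y = k) under a zero-inflated Poisson with mean rate * exposure.

    pi_zero: probability of a structural zero (e.g. a week with no risk at all)
    rate: expected events per unit of exposure (e.g. per km)
    exposure: exposure for the period (e.g. km driven in the week)
    """
    lam = rate * exposure
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi_zero + (1 - pi_zero) * poisson
    return (1 - pi_zero) * poisson


def zip_mean(pi_zero, rate, exposure):
    """Expected count: the Poisson mean scaled by the non-structural-zero share."""
    return (1 - pi_zero) * rate * exposure


# Hypothetical driver-week: 200 km driven, 0.01 events per km, 60% structural zeros.
p_zero = zip_pmf(0, pi_zero=0.6, rate=0.01, exposure=200.0)
expected = zip_mean(pi_zero=0.6, rate=0.01, exposure=200.0)
print(f"P(no near-miss this week) = {p_zero:.3f}")
print(f"expected near-misses      = {expected:.2f}")
```

The extra mass at zero is what lets the model fit sparse weekly counts that a plain Poisson with the same mean would underpredict.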

[344] arXiv:2509.02615 (cross-list from astro-ph.IM) [pdf, other]
Title: Radio Astronomy in the Era of Vision-Language Models: Prompt Sensitivity and Adaptation
Mariia Drozdova, Erica Lastufka, Vitaliy Kinakh, Taras Holotyak, Daniel Schaerer, Slava Voloshynovskiy
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Artificial Intelligence (cs.AI)

Vision-Language Models (VLMs), such as recent Qwen and Gemini models, are positioned as general-purpose AI systems capable of reasoning across domains. Yet their capabilities in scientific imaging, especially on unfamiliar and potentially previously unseen data distributions, remain poorly understood. In this work, we assess whether generic VLMs, presumed to lack exposure to astronomical corpora, can perform morphology-based classification of radio galaxies using the MiraBest FR-I/FR-II dataset. We explore prompting strategies using natural language and schematic diagrams, and, to the best of our knowledge, we are the first to introduce visual in-context examples within prompts in astronomy. Additionally, we evaluate lightweight supervised adaptation via LoRA fine-tuning. Our findings reveal three trends: (i) even prompt-based approaches can achieve good performance, suggesting that VLMs encode useful priors for unfamiliar scientific domains; (ii) however, outputs are highly unstable, varying sharply with superficial prompt changes such as layout, ordering, or decoding temperature, even when semantic content is held constant; and (iii) with just 15M trainable parameters and no astronomy-specific pretraining, fine-tuned Qwen-VL achieves near state-of-the-art performance (3% error rate), rivaling domain-specific models. These results suggest that the apparent "reasoning" of VLMs often reflects prompt sensitivity rather than genuine inference, raising caution for their use in scientific domains. At the same time, with minimal adaptation, generic VLMs can rival specialized models, offering a promising but fragile tool for scientific discovery.

[345] arXiv:2509.02617 (cross-list from stat.ML) [pdf, html, other]
Title: Gaussian process surrogate with physical law-corrected prior for multi-coupled PDEs defined on irregular geometry
Pucheng Tang, Hongqiao Wang, Wenzhou Lin, Qian Chen, Heng Yong
Comments: 40 pages, 16 figures, 7 tables
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO)

Parametric partial differential equations (PDEs) are fundamental mathematical tools for modeling complex physical systems, yet their numerical evaluation across parameter spaces remains computationally intensive when using conventional high-fidelity solvers. To address this challenge, we propose a novel physical law-corrected prior Gaussian process (LC-prior GP) surrogate modeling framework that effectively integrates data-driven learning with underlying physical constraints to flexibly handle multi-coupled variables defined on complex geometries. The proposed approach leverages proper orthogonal decomposition (POD) to parameterize high-dimensional PDE solutions via their dominant modes and associated coefficients, thereby enabling efficient Gaussian process (GP) surrogate modeling within a reduced-dimensional coefficient space. A key contribution lies in the incorporation of physical laws together with a limited number of parameter samples to correct the GP posterior mean, thus avoiding reliance on computationally expensive numerical solvers. Furthermore, interpolation functions are constructed to describe the mapping from the full parameter space to the physics-based correction term. This mapping is subsequently backpropagated to constrain the original GP surrogate, yielding a more physically consistent conditional prior. To handle irregular geometries, the radial basis function-finite difference (RBF-FD) method is incorporated during training set computation, with its inherent differentiation matrices providing both computational efficiency and numerical accuracy for physical constraint optimization. The effectiveness of the proposed method is demonstrated through numerical experiments involving a reaction-diffusion model, miscible flooding models, and Navier-Stokes equations with multi-physics coupling defined on irregular domains.

[346] arXiv:2509.02622 (cross-list from eess.AS) [pdf, other]
Title: IS${}^3$ : Generic Impulsive--Stationary Sound Separation in Acoustic Scenes using Deep Filtering
Berger Clémentine (IDS, S2A), Stamadiatis Paraskevas (IDS, S2A), Badeau Roland (IDS, S2A), Essid Slim (IDS, S2A)
Journal-ref: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, IEEE, Oct 2025, Tahoe City, CA, United States
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD); Signal Processing (eess.SP)

We are interested in audio systems capable of performing a differentiated processing of stationary backgrounds and isolated acoustic events within an acoustic scene, whether for applying specific processing methods to each part or for focusing solely on one while ignoring the other. Such systems have applications in real-world scenarios, including robust adaptive audio rendering systems (e.g., EQ or compression), plosive attenuation in voice mixing, noise suppression or reduction, robust acoustic event classification or even bioacoustics. To this end, we introduce IS${}^3$, a neural network designed for Impulsive--Stationary Sound Separation that isolates impulsive acoustic events from the stationary background using a deep filtering approach and can act as a pre-processing stage for the above-mentioned tasks. To ensure optimal training, we propose a sophisticated data generation pipeline that curates and adapts existing datasets for this task. We demonstrate that a learning-based approach, built on a relatively lightweight neural architecture and trained with well-designed and varied data, is successful in this previously unaddressed task, outperforming the Harmonic--Percussive Sound Separation masking method, adapted from music signal processing research, and wavelet filtering on objective separation metrics.

[347] arXiv:2509.02627 (cross-list from eess.IV) [pdf, html, other]
Title: A Two-Stage Strategy for Mitosis Detection Using Improved YOLO11x Proposals and ConvNeXt Classification
Jie Xiao, Mengye Lyu, Shaojun Liu
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI)

MIDOG 2025 Track 1 requires mitosis detection in whole-slide images (WSIs) containing non-tumor, inflamed, and necrotic regions. Due to the complicated and heterogeneous context, as well as possible artifacts, there are often false positives and false negatives, thus degrading the detection F1-score. To address this problem, we propose a two-stage framework. Firstly, an improved YOLO11x, integrated with EMA attention and LSConv, is employed to generate mitosis candidates. We use a low confidence threshold to generate as many proposals as possible, ensuring the detection recall. Then, a ConvNeXt-Tiny classifier is employed to filter out the false positives, ensuring the detection precision. Consequently, the proposed two-stage framework can generate a high detection F1-score. Evaluated on a fused dataset comprising MIDOG++, MITOS_WSI_CCMCT, and MITOS_WSI_CMC, our framework achieves an F1-score of 0.882, which is 0.035 higher than the single-stage YOLO11x baseline. This performance gain is produced by a significant precision improvement, from 0.762 to 0.839, and a comparable recall. The code is available at this https URL.

[348] arXiv:2509.02629 (cross-list from quant-ph) [pdf, html, other]
Title: \textit{In Silico} Benchmarking of Detectable Byzantine Agreement in Noisy Quantum Networks
Mayank Bhatia, Shaan Doshi, Daniel Winton, Brian Doolittle, Bruno Abreu, Santiago Núñez-Corrales
Comments: 10 pages, 17 figures
Subjects: Quantum Physics (quant-ph); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)

Quantum communication resources offer significant advantages for fault-tolerant distributed protocols, particularly in Byzantine Agreement (BA), where reliability against adversarial interference is essential. Quantum Detectable Byzantine Agreement (QDBA) enables consensus protocols that surpass classical limitations by leveraging entangled quantum states. In this work, we focus on the practical realization of QDBA using Einstein-Podolsky-Rosen (EPR) pairs, the simplest maximally entangled quantum resources, making the protocol experimentally accessible across current quantum hardware platforms. We present a comprehensive computational study of the EPRQDBA protocol under realistic quantum network conditions, utilizing the Aliro Quantum Network Simulator to evaluate the performance and robustness of the protocol. Our simulations systematically explore the protocol's parameter space, including variations in network size, traitorous node count, the amount of entanglement consumed in the protocol, and physically motivated noise models tailored specifically for superconducting and photonic qubit technologies. Through extensive numerical experiments, we provide insights into how these physically realistic parameters impact protocol performance, establishing critical thresholds and optimal operational regimes for experimental implementations. This work bridges theoretical advances in quantum consensus protocols with practical network implementations, offering a concrete reference for experimentalists. Our findings serve as a guideline for evaluating and optimizing QDBA implementations in realistic, noisy environments.

[349] arXiv:2509.02630 (cross-list from eess.IV) [pdf, html, other]
Title: Challenges and Lessons from MIDOG 2025: A Two-Stage Approach to Domain-Robust Mitotic Figure Detection
Euiseop Song, Jaeyoung Park, Jaewoo Park
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Mitotic figure detection remains a challenging task in computational pathology due to domain variability and morphological complexity. This paper describes our participation in the MIDOG 2025 challenge, focusing on robust mitotic figure detection across diverse tissue domains. We developed a two-stage pipeline combining Faster R-CNN for candidate detection with an ensemble of three classifiers (DenseNet-121, EfficientNet-v2, InceptionResNet-v2) for false positive reduction. Our best submission achieved an F1-score of 0.2237 (Recall: 0.9528, Precision: 0.1267) using a Faster R-CNN trained solely on the MIDOG++ dataset. While our high recall demonstrates effective mitotic figure detection, the critically low precision (12.67%) reveals fundamental challenges in distinguishing true mitoses from morphologically similar imposters across diverse domains. Analysis of six submission variants showed that subsequent optimization attempts were counterproductive, highlighting the complexity of domain generalization in histopathology. This work provides valuable insights into the practical challenges of developing robust mitotic figure detection algorithms and emphasizes the importance of effective false positive suppression strategies.
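
The reported scores can be checked against the standard definition of F1 as the harmonic mean of precision and recall. The sketch below uses only the precision and recall quoted in the abstract; the function name `f1_score` is a local helper, not taken from any library.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Values quoted in the abstract.
    print(round(f1_score(precision=0.1267, recall=0.9528), 4))
```

Because the harmonic mean is dominated by the smaller term, a recall above 0.95 cannot compensate for a precision near 0.13, which is consistent with the reported F1-score of 0.2237.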

[350] arXiv:2509.02637 (cross-list from eess.IV) [pdf, other]
Title: A Single Detect Focused YOLO Framework for Robust Mitotic Figure Detection
Yasemin Topuz, M. Taha Gökcan, Serdar Yıldız, Songül Varlı
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Mitotic figure detection is a crucial task in computational pathology, as mitotic activity serves as a strong prognostic marker for tumor aggressiveness. However, domain variability that arises from differences in scanners, tissue types, and staining protocols poses a major challenge to the robustness of automated detection methods. In this study, we introduce SDF-YOLO (Single Detect Focused YOLO), a lightweight yet domain-robust detection framework designed specifically for small, rare targets such as mitotic figures. The model builds on YOLOv11 with task-specific modifications, including a single detection head aligned with mitotic figure scale, coordinate attention to enhance positional sensitivity, and improved cross-channel feature mixing. Experiments were conducted on three datasets that span human and canine tumors: MIDOG++, canine cutaneous mast cell tumor (CCMCT), and canine mammary carcinoma (CMC). When submitted to the preliminary test set for the MIDOG2025 challenge, SDF-YOLO achieved an average precision (AP) of 0.799, with a precision of 0.758, a recall of 0.775, an F1 score of 0.766, and an FROC-AUC of 5.793, demonstrating both competitive accuracy and computational efficiency. These results indicate that SDF-YOLO provides a reliable and efficient framework for robust mitotic figure detection across diverse domains.

[351] arXiv:2509.02639 (cross-list from q-bio.GN) [pdf, html, other]
Title: Enhanced Single-Cell RNA-seq Embedding through Gene Expression and Data-Driven Gene-Gene Interaction Integration
Hojjat Torabi Goudarzi, Maziyar Baran Pouyan
Comments: 33 pages, 9 figures, article
Journal-ref: Computers in Biology and Medicine 188 (2025) 109880
Subjects: Genomics (q-bio.GN); Artificial Intelligence (cs.AI)

Single-cell RNA sequencing (scRNA-seq) provides unprecedented insights into cellular heterogeneity, enabling detailed analysis of complex biological systems at single-cell resolution. However, the high dimensionality and technical noise inherent in scRNA-seq data pose significant analytical challenges. While current embedding methods focus primarily on gene expression levels, they often overlook crucial gene-gene interactions that govern cellular identity and function. To address this limitation, we present a novel embedding approach that integrates both gene expression profiles and data-driven gene-gene interactions. Our method first constructs a Cell-Leaf Graph (CLG) using random forest models to capture regulatory relationships between genes, while simultaneously building a K-Nearest Neighbor Graph (KNNG) to represent expression similarities between cells. These graphs are then combined into an Enriched Cell-Leaf Graph (ECLG), which serves as input for a graph neural network to compute cell embeddings. By incorporating both expression levels and gene-gene interactions, our approach provides a more comprehensive representation of cellular states. Extensive evaluation across multiple datasets demonstrates that our method enhances the detection of rare cell populations and improves downstream analyses such as visualization, clustering, and trajectory inference. This integrated approach represents a significant advance in single-cell data analysis, offering a more complete framework for understanding cellular diversity and dynamics.

[352] arXiv:2509.02640 (cross-list from eess.IV) [pdf, html, other]
Title: Adaptive Learning Strategies for Mitotic Figure Classification in MIDOG2025 Challenge
Biwen Meng, Xi Long, Jingxin Liu
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Atypical mitotic figures (AMFs) are clinically relevant indicators of abnormal cell division, yet their reliable detection remains challenging due to morphological ambiguity and scanner variability. In this work, we investigated three variants of adapting the pathology foundation model UNI2-h for the MIDOG2025 Track 2 challenge. Starting from a LoRA-based baseline, we found that visual prompt tuning (VPT) substantially improved generalization, and that further integrating test-time augmentation (TTA) with Vahadane and Macenko stain normalization provided the best robustness. Our final submission achieved a balanced accuracy of 0.8837 and an ROC-AUC of 0.9513 on the preliminary leaderboard, ranking within the top 10 teams. These results demonstrate that prompt-based adaptation combined with stain-normalization TTA offers an effective strategy for atypical mitosis classification under diverse imaging conditions.

[353] arXiv:2509.02642 (cross-list from physics.chem-ph) [pdf, html, other]
Title: BioMD: All-atom Generative Model for Biomolecular Dynamics Simulation
Bin Feng, Jiying Zhang, Xinni Zhang, Zijing Liu, Yu Li
Subjects: Chemical Physics (physics.chem-ph); Artificial Intelligence (cs.AI)

Molecular dynamics (MD) simulations are essential tools in computational chemistry and drug discovery, offering crucial insights into dynamic molecular behavior. However, their utility is significantly limited by substantial computational costs, which severely restrict accessible timescales for many biologically relevant processes. Despite the encouraging performance of existing machine learning (ML) methods, they struggle to generate extended biomolecular system trajectories, primarily due to the lack of MD datasets and the large computational demands of modeling long historical trajectories. Here, we introduce BioMD, the first all-atom generative model to simulate long-timescale protein-ligand dynamics using a hierarchical framework of forecasting and interpolation. We demonstrate the effectiveness and versatility of BioMD on the DD-13M (ligand unbinding) and MISATO datasets. For both datasets, BioMD generates highly realistic conformations, showing high physical plausibility and low reconstruction errors. Besides, BioMD successfully generates ligand unbinding paths for 97.1% of the protein-ligand systems within ten attempts, demonstrating its ability to explore critical unbinding pathways. Collectively, these results establish BioMD as a tool for simulating complex biomolecular processes, offering broad applicability for computational chemistry and drug discovery.

[354] arXiv:2509.02648 (cross-list from q-bio.GN) [pdf, other]
Title: Optimizing Prognostic Biomarker Discovery in Pancreatic Cancer Through Hybrid Ensemble Feature Selection and Multi-Omics Data
John Zobolas, Anne-Marie George, Alberto López, Sebastian Fischer, Marc Becker, Tero Aittokallio
Comments: 52 pages, 5 figures, 9 Supplementary Figures, 1 Supplementary Table
Subjects: Genomics (q-bio.GN); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM); Applications (stat.AP)

Prediction of patient survival using high-dimensional multi-omics data requires systematic feature selection methods that ensure predictive performance, sparsity, and reliability for prognostic biomarker discovery. We developed a hybrid ensemble feature selection (hEFS) approach that combines data subsampling with multiple prognostic models, integrating both embedded and wrapper-based strategies for survival prediction. Omics features are ranked using a voting-theory-inspired aggregation mechanism across models and subsamples, while the optimal number of features is selected via a Pareto front, balancing predictive accuracy and model sparsity without any user-defined thresholds. When applied to multi-omics datasets from three pancreatic cancer cohorts, hEFS identifies significantly fewer and more stable biomarkers compared to the conventional, late-fusion CoxLasso models, while maintaining comparable discrimination performance. Implemented within the open-source mlr3fselect R package, hEFS offers a robust, interpretable, and clinically valuable tool for prognostic modelling and biomarker discovery in high-dimensional survival settings.

[355] arXiv:2509.02649 (cross-list from stat.ML) [pdf, other]
Title: Fast kernel methods: Sobolev, physics-informed, and additive models
Nathan Doumèche (LPSM, EDF R&D OSIRIS), Francis Bach (ENS-PSL), Gérard Biau (LPSM, IUF), Claire Boyer (LMO)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST); Methodology (stat.ME)

Kernel methods are powerful tools in statistical learning, but their cubic complexity in the sample size n limits their use on large-scale datasets. In this work, we introduce a scalable framework for kernel regression with O(n log n) complexity, fully leveraging GPU acceleration. The approach is based on a Fourier representation of kernels combined with non-uniform fast Fourier transforms (NUFFT), enabling exact, fast, and memory-efficient computations. We instantiate our framework in three settings: Sobolev kernel regression, physics-informed regression, and additive models. When known, the proposed estimators are shown to achieve minimax convergence rates, consistent with classical kernel theory. Empirical results demonstrate that our methods can process up to tens of billions of samples within minutes, providing both statistical accuracy and computational scalability. These contributions establish a flexible approach, paving the way for the routine application of kernel methods in large-scale learning tasks.

[356] arXiv:2509.02651 (cross-list from q-bio.OT) [pdf, other]
Title: Quantifying Clinician Bias and its Effects on Schizophrenia Diagnosis in the Emergency Department of the Mount Sinai Health System
Alissa A. Valentine, Lauren A. Lepow, Lili Chan, Alexander W. Charney, Isotta Landi
Subjects: Other Quantitative Biology (q-bio.OT); Machine Learning (cs.LG)

In the United States, schizophrenia (SCZ) carries a race and sex disparity that may be explained by clinician bias - a belief held by a clinician about a patient that prevents impartial clinical decision making. The emergency department (ED) is marked by higher rates of stress that lead to clinicians relying more on implicit biases during decision making. In this work, we considered a large cohort of psychiatric patients in the ED from the Mount Sinai Health System (MSHS) in New York City to investigate the effects of clinician bias on SCZ diagnosis while controlling for known risk factors and patient sociodemographic information. Clinician bias was quantified as the ratio of negative to total sentences within a patient's first ED note. We utilized a logistic regression to predict SCZ diagnosis given patient race, sex, age, history of trauma or substance use disorder, and the ratio of negative sentences. Our findings showed that an increased ratio of negative sentences is associated with higher odds of obtaining a SCZ diagnosis [OR (95% CI)=1.408 (1.361-1.456)]. Identifying as male [OR (95% CI)=1.112 (1.055-1.173)] or Black [OR (95% CI)=1.081 (1.031-1.133)] increased one's odds of being diagnosed with SCZ. However, from an intersectional lens, Black female patients with high SES have the highest odds of obtaining a SCZ diagnosis [OR (95% CI)=1.629 (1.535-1.729)]. Results such as these suggest that SES does not act as a protective buffer against SCZ diagnosis in all patients, demanding more attention to the quantification of health disparities. Lastly, we demonstrated that clinician bias is operational with real world data and related to increased odds of obtaining a stigmatizing diagnosis such as SCZ.

[357] arXiv:2509.02653 (cross-list from physics.soc-ph) [pdf, other]
Title: Quantifying the Social Costs of Power Outages and Restoration Disparities Across Four U.S. Hurricanes
Xiangpeng Li, Junwei Ma, Bo Li, Ali Mostafavi
Subjects: Physics and Society (physics.soc-ph); Machine Learning (cs.LG); General Economics (econ.GN)

The multifaceted nature of disaster impact shows that densely populated areas contribute more to aggregate burden, while sparsely populated but heavily affected regions suffer disproportionately at the individual level. This study introduces a framework for quantifying the societal impacts of power outages by translating customer weighted outage exposure into deprivation measures, integrating welfare metrics with three recovery indicators (average outage days per customer, restoration duration, and relative restoration rate) computed from sequential EAGLE I observations and linked to Zip Code Tabulation Area demographics. Applied to four United States hurricanes (Beryl 2024 Texas, Helene 2024 Florida, Milton 2024 Florida, and Ida 2021 Louisiana), this standardized pipeline provides the first cross event, fine scale evaluation of outage impacts and their drivers. Results demonstrate regressive patterns with greater burdens in lower income areas; mechanistic analysis shows that deprivation increases with longer restoration durations and decreases with faster restoration rates; explainable modeling identifies restoration duration as the dominant driver; and clustering reveals distinct recovery typologies not captured by conventional reliability metrics. This framework delivers a transferable method for assessing outage impacts and equity, comparative cross event evidence linking restoration dynamics to social outcomes, and actionable spatial analyses that support equity informed restoration planning and resilience investment.

[358] arXiv:2509.02710 (cross-list from physics.med-ph) [pdf, html, other]
Title: Toward a robust lesion detection model in breast DCE-MRI: adapting foundation models to high-risk women
Gabriel A.B. do Nascimento, Vincent Dong, Guilherme J. Cavalcante, Alex Nguyen, Thaís G. do Rêgo, Yuri Malheiros, Telmo M. Silva Filho, Carla R. Zeballos Torrez, James C. Gee, Anne Marie McCarthy, Andrew D. A. Maidment, Bruno Barufaldi
Subjects: Medical Physics (physics.med-ph); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Accurate breast MRI lesion detection is critical for early cancer diagnosis, especially in high-risk populations. We present a classification pipeline that adapts a pretrained foundation model, the Medical Slice Transformer (MST), for breast lesion classification using dynamic contrast-enhanced MRI (DCE-MRI). Leveraging DINOv2-based self-supervised pretraining, MST generates robust per-slice feature embeddings, which are then used to train a Kolmogorov--Arnold Network (KAN) classifier. The KAN provides a flexible and interpretable alternative to conventional convolutional networks by enabling localized nonlinear transformations via adaptive B-spline activations. This enhances the model's ability to differentiate benign from malignant lesions in imbalanced and heterogeneous clinical datasets. Experimental results demonstrate that the MST+KAN pipeline outperforms the baseline MST classifier, achieving AUC = $0.80 \pm 0.02$ while preserving interpretability through attention-based heatmaps. Our findings highlight the effectiveness of combining foundation model embeddings with advanced classification strategies for building robust and generalizable breast MRI analysis tools.

[359] arXiv:2509.02724 (cross-list from eess.SP) [pdf, other]
Title: Recall Gabor Communication Theory and Joint Time-Frequency Analysis
Xiang-Gen Xia
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

In this article, we first briefly recall Gabor's communication theory, then the Gabor transform and expansion, and finally their connection with joint time-frequency analysis.

[360] arXiv:2509.02758 (cross-list from math.HO) [pdf, other]
Title: Optimizing Geometry Problem Sets for Skill Development
Michael Bouzinier, Sergey Trifonov
Subjects: History and Overview (math.HO); Artificial Intelligence (cs.AI)

This article describes an ontology and methodology for annotating and organizing Euclidean Geometry problems, developed in the early 1990s and implemented as a software tool. While the majority of this work -- including the ontology and solution graph paradigm -- was completed over thirty years ago, we argue that it has renewed relevance in the context of modern artificial intelligence. In particular, we explore the hypothesis that this established framework can facilitate automated solution validation and feedback when paired with contemporary large language models, thereby supporting teachers and self-learners in geometry education. We document the original architecture and its enduring value, and outline pathways for bridging historical educational resources with next-generation AI techniques.

[361] arXiv:2509.02797 (cross-list from eess.SP) [pdf, html, other]
Title: minPIC: Towards Optimal Power Allocation in Multi-User Interference Channels
Sagnik Bhattacharya, Abhiram Rao Gorle, John M. Cioffi
Comments: To appear in IEEE GLOBECOM 2025
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

6G envisions massive cell-free networks with spatially nested multiple access (MAC) and broadcast (BC) channels without centralized coordination. This makes optimal resource allocation across power, subcarriers, and decoding orders crucial for interference channels (ICs), where neither transmitters nor receivers can cooperate. Current orthogonal multiple access (OMA) methods, as well as non-orthogonal (NOMA) and rate-splitting (RSMA) schemes, rely on fixed heuristics for interference management, leading to suboptimal rates, power inefficiency, and scalability issues. This paper proposes a novel minPIC framework for optimal power, subcarrier, and decoding order allocation in general multi-user ICs. Unlike existing methods, minPIC eliminates heuristic SIC order assumptions. Despite the convexity of the IC capacity region, fixing an SIC order induces non-convexity in resource allocation, traditionally requiring heuristic approximations. We instead introduce a dual-variable-guided sorting criterion to identify globally optimal SIC orders, followed by convex optimization with auxiliary log-det constraints, efficiently solved via binary search. We also demonstrate that minPIC could potentially meet the stringent high-rate, low-power targets of immersive XR and other 6G applications. To the best of our knowledge, minPIC is the first algorithmic realisation of the Pareto boundary of the SIC-achievable rate region for Gaussian ICs, opening the door to scalable interference management in cell-free networks.

[362] arXiv:2509.02800 (cross-list from econ.GN) [pdf, html, other]
Title: Too Noisy to Collude? Algorithmic Collusion Under Laplacian Noise
Niuniu Zhang
Subjects: General Economics (econ.GN); Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA)

The rise of autonomous pricing systems has sparked growing concern over algorithmic collusion in markets from retail to housing. This paper examines controlled information quality as an ex ante policy lever: by reducing the fidelity of data that pricing algorithms draw on, regulators can frustrate collusion before supracompetitive prices emerge. We show, first, that information quality is the central driver of competitive outcomes, shaping prices, profits, and consumer welfare. Second, we demonstrate that collusion can be slowed or destabilized by injecting carefully calibrated noise into pooled market data, yielding a feasibility region where intervention disrupts cartels without undermining legitimate pricing. Together, these results highlight information control as a lightweight yet practical lever to blunt digital collusion at its source.
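
A minimal sketch of the noise-injection idea, assuming zero-mean Laplace noise with a regulator-chosen scale is added independently to each pooled observation. The function name `add_laplace_noise`, the `scale` parameter, and the sample prices are hypothetical; how the paper calibrates the noise and which data it perturbs are not specified in the abstract.

```python
import random
from typing import List, Optional, Sequence


def add_laplace_noise(values: Sequence[float], scale: float,
                      rng: Optional[random.Random] = None) -> List[float]:
    """Return a copy of `values` with Laplace(0, scale) noise added to each entry.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with location 0 and scale `scale`.
    """
    if scale < 0:
        raise ValueError("scale must be non-negative")
    if scale == 0:
        return list(values)  # no degradation of information quality
    rng = rng or random.Random()
    rate = 1.0 / scale
    return [v + rng.expovariate(rate) - rng.expovariate(rate) for v in values]


if __name__ == "__main__":
    pooled_prices = [10.0, 10.5, 11.0, 10.8]  # hypothetical market data
    print(add_laplace_noise(pooled_prices, scale=0.5, rng=random.Random(0)))
```

A larger `scale` corresponds to lower information quality; the feasibility region described in the abstract would be the range of scales that disrupts coordination without distorting legitimate pricing.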

[363] arXiv:2509.02804 (cross-list from math.OC) [pdf, html, other]
Title: A Proximal Descent Method for Minimizing Weakly Convex Optimization
Feng-Yi Liao, Yang Zheng
Comments: 54 pages, 3 tables, and 3 figures
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

We study the problem of minimizing an $m$-weakly convex and possibly nonsmooth function. Weak convexity provides a broad framework that subsumes convex, smooth, and many composite nonconvex functions. In this work, we propose a $\textit{proximal descent method}$, a simple and efficient first-order algorithm that combines the inexact proximal point method with classical convex bundle techniques. Our analysis establishes explicit non-asymptotic convergence rates in terms of $(\eta,\epsilon)$-inexact stationarity. In particular, the method finds an $(\eta,\epsilon)$-inexact stationary point using at most $\mathcal{O}\!\left( \Big(\tfrac{1}{\eta^2} + \tfrac{1}{\epsilon}\Big) \max\!\left\{\tfrac{1}{\eta^2}, \tfrac{1}{\epsilon}\right\} \right)$ function value and subgradient evaluations. Consequently, the algorithm also achieves the best-known complexity of $\mathcal{O}(1/\delta^4)$ for finding an approximate Moreau stationary point with $\|\nabla f_{2m}(x)\|\leq \delta$. A distinctive feature of our method is its \emph{automatic adaptivity}: with no parameter tuning or algorithmic modification, it accelerates to $\mathcal{O}(1/\delta^2)$ complexity under smoothness and further achieves linear convergence under quadratic growth. Overall, this work bridges convex bundle methods and weakly convex optimization, while providing accelerated guarantees under structural assumptions.

[364] arXiv:2509.02909 (cross-list from quant-ph) [pdf, html, other]
Title: Treasure Hunt in Anonymous Graphs with Quantum Pebbles by Oblivious Agents
Gaurav Gaur, Barun Gorain, Rishi Ranjan Singh, Daya Gaur
Subjects: Quantum Physics (quant-ph); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Emerging Technologies (cs.ET)

We investigate the problem of finding a static treasure in anonymous graphs using oblivious agents and introduce a novel approach that leverages quantum information. In anonymous graphs, vertices are unlabelled, indistinguishable, and edges are locally labelled with port numbers. Agents typically rely on stationary classical pebbles placed by an oracle to guide their search. However, this classical approach is constrained by limited information transmission and high traversal complexity. Classical pebbles are not sufficient for search if the agents are oblivious. We propose the first use of quantum pebbles for search in anonymous graphs. Quantum pebbles periodically emit qubits in a fixed quantum state. Each pebble encodes the port number to the next node using a unique quantum state. The agent determines the correct path by performing measurements in multiple bases, exploiting the probabilistic nature of quantum measurement to distinguish states. We show that this strategy enables an oblivious agent to locate the treasure in $D$ steps using $D$ quantum pebbles, where $D$ is the length of the shortest path between the starting point and the treasure. Moreover, only $O((\log D + \log \Delta)/(\log 1/\delta))$ measurements per node are required to ensure high success probability in a graph with maximum degree $\Delta$ where $\delta = \cos^2(\frac{\pi}{2\Delta})$. We propose the use of quantum information as a guidance mechanism in anonymous graph search. We demonstrate that quantum pebbles can not only emulate the functionality of classical pebbles but can do so with improved efficiency, offering a promising direction for future quantum-enhanced distributed algorithms.
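
To get a feel for the measurement bound quoted above, the following minimal sketch simply evaluates the expression $(\log D + \log \Delta)/\log(1/\delta)$ with $\delta = \cos^2(\frac{\pi}{2\Delta})$. The function name `measurements_per_node` is hypothetical, and the constant hidden by the $O(\cdot)$ notation is assumed to be 1 purely for illustration; the abstract does not state it.

```python
import math

def measurements_per_node(D, max_degree):
    """Evaluate (log D + log Delta) / log(1/delta), delta = cos^2(pi / (2 Delta)).

    Assumption: the constant hidden in the big-O bound is taken to be 1.
    """
    delta = math.cos(math.pi / (2 * max_degree)) ** 2
    return (math.log(D) + math.log(max_degree)) / math.log(1 / delta)

# The value grows with the path length D and with the maximum degree Delta,
# because delta approaches 1 as Delta increases.
for D, max_degree in [(8, 2), (8, 4), (64, 4)]:
    print(D, max_degree, round(measurements_per_node(D, max_degree), 2))
```

Under this reading, a larger maximum degree makes $\delta$ approach 1, so more measurements per node are needed for the same $D$.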

[365] arXiv:2509.02937 (cross-list from math.OC) [pdf, html, other]
Title: Faster Gradient Methods for Highly-smooth Stochastic Bilevel Optimization
Lesi Chen, Junru Li, Jingzhao Zhang
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)

This paper studies the complexity of finding an $\epsilon$-stationary point for stochastic bilevel optimization when the upper-level problem is nonconvex and the lower-level problem is strongly convex. Recent work proposed the first-order method, F${}^2$SA, achieving the $\tilde{\mathcal{O}}(\epsilon^{-6})$ upper complexity bound for first-order smooth problems. This is slower than the optimal $\Omega(\epsilon^{-4})$ complexity lower bound in its single-level counterpart. In this work, we show that faster rates are achievable for higher-order smooth problems. We first reformulate F${}^2$SA as approximating the hyper-gradient with a forward difference. Based on this observation, we propose a class of methods F${}^2$SA-$p$ that uses $p$th-order finite difference for hyper-gradient approximation and improves the upper bound to $\tilde{\mathcal{O}}(p \epsilon^{-4-2/p})$ for $p$th-order smooth problems. Finally, we demonstrate that the $\Omega(\epsilon^{-4})$ lower bound also holds for stochastic bilevel problems when the high-order smoothness holds for the lower-level variable, indicating that the upper bound of F${}^2$SA-$p$ is nearly optimal in the highly smooth region $p = \Omega( \log \epsilon^{-1} / \log \log \epsilon^{-1})$.

[366] arXiv:2509.02947 (cross-list from quant-ph) [pdf, html, other]
Title: Zero-Error Nash Equilibrium: Harnessing Nonlocal Correlation in Incomplete Information Games
Ambuj, Tushar, Siddharth R. Pandey, Ram Krishna Patra, Anandamay Das Bhowmik, Kuntal Som, Amit Mukherjee
Comments: Comments are welcome. 11 pages, 1 figure
Subjects: Quantum Physics (quant-ph); Computer Science and Game Theory (cs.GT)

Claude Shannon's zero-error communication paradigm reshaped our understanding of fault-tolerant information transfer. Here, we adapt this notion into game theory with incomplete information. We ask: can players with private information coordinate on a Nash equilibrium with zero probability of error? We identify Bayesian games in which such coordination is impossible classically, yet achievable by harnessing Bell nonlocal correlations. We formalize this requirement as zero-error Nash equilibrium coordination, establishing a new bridge between information theory, game theory, and quantum nonlocality. Furthermore, we construct a tripartite Bayesian game that admits zero-error Nash equilibrium coordination with genuine entanglement, and a two-player game where a stronger notion of coordination can be achieved using every two-qubit pure entangled state except the maximally entangled one. Crucially, the advantage persists under experimentally relevant noise, demonstrating nonlocality as a robust resource for near-zero error decision-making under uncertainty.

[367] arXiv:2509.02957 (cross-list from eess.IV) [pdf, html, other]
Title: Ensemble YOLO Framework for Multi-Domain Mitotic Figure Detection in Histopathology Images
Navya Sri Kelam, Akash Parekh, Saikiran Bonthu, Nitin Singhal
Comments: 3 pages, MIDOG25 Challenge
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Accurate detection of mitotic figures in whole slide histopathological images remains a challenging task due to their scarcity, morphological heterogeneity, and the variability introduced by tissue preparation and staining protocols. The MIDOG competition series provides standardized benchmarks for evaluating detection approaches across diverse domains, thus motivating the development of generalizable deep learning models. In this work, we investigate the performance of two modern one-stage detectors, YOLOv5 and YOLOv8, trained on MIDOG++, CMC, and CCMCT datasets. To enhance robustness, training incorporated stain-invariant color perturbations and texture-preserving augmentations. In internal validation, YOLOv5 achieved superior precision, while YOLOv8 provided improved recall, reflecting architectural trade-offs between anchor-based and anchor-free detection. To capitalize on these complementary strengths, we employed an ensemble of the two models, which improved sensitivity without a major reduction in precision. These findings highlight the effectiveness of ensemble strategies built upon contemporary object detectors to advance automated mitosis detection in digital pathology.

[368] arXiv:2509.02971 (cross-list from stat.ML) [pdf, html, other]
Title: Scale-Adaptive Generative Flows for Multiscale Scientific Data
Yifan Chen, Eric Vanden-Eijnden
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA); Probability (math.PR)

Flow-based generative models can face significant challenges when modeling scientific data with multiscale Fourier spectra, often producing large errors in fine-scale features. We address this problem within the framework of stochastic interpolants, via principled design of noise distributions and interpolation schedules. The key insight is that the noise should not be smoother than the target data distribution -- measured by Fourier spectrum decay rates -- to ensure bounded drift fields near the initial time. For Gaussian and near-Gaussian distributions whose fine-scale structure is known, we show that spectrum-matched noise improves numerical efficiency compared to standard white-noise approaches. For complex non-Gaussian distributions, we develop scale-adaptive interpolation schedules that address the numerical ill-conditioning arising from rougher-than-data noise. Numerical experiments on synthetic Gaussian random fields and solutions to the stochastic Allen-Cahn and Navier-Stokes equations validate our approach and demonstrate its ability to generate high-fidelity samples at lower computational cost than traditional approaches.

[369] arXiv:2509.02992 (cross-list from quant-ph) [pdf, other]
Title: Programmable Quantum Matter: Heralding Large Cluster States in Driven Inhomogeneous Spin Ensembles
Pratyush Anand, Louis Follet, Odiel Hooybergs, Dirk R. Englund
Comments: 21 pages main text, 9 figures; 27 pages Supplementary Information, 13 figures
Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET); Information Theory (cs.IT)

Atom-like emitters in solids are promising platforms for quantum sensing and information processing, but inhomogeneities in the emitter fine structure complicate quantum control. We present a framework that leverages this diversity to reduce the resources for generating optically heralded spin cluster states across $N_q$ emitters from the conventional order $O(N_q)$ to $O(1)$ in ensembles of $N_q \sim 10$-$100$. An optimized pulse sequence simultaneously corrects pulse-length and detuning errors, achieving single-qubit gate fidelities exceeding $99.99\%$ for errors (normalized relative to the Rabi drive strength) up to 0.3, while maintaining fidelities above $99\%$ for errors as large as 0.4. Applied as a Carr-Purcell-Meiboom-Gill (CPMG) dynamical decoupling protocol to the dominant noise spectrum of silicon-vacancy centers in diamond, it enhances ensemble coherence times by over $7\times$ compared to interleaved bang-bang based CPMG. For state-of-the-art dilution refrigerators, global resonant optimal decoupling across $N_q$ spins sharply reduces heating, addressing the trade-off between the spin coherence and scaling to $N_q \gg 1$. We further introduce a modified single-photon entanglement protocol with an efficient algorithm for deterministic entanglement compilation. Depending on the decoupling time window, our method yields order $O(10^2$-$10^4)$ more entanglement links than bang-bang sequences, with theoretical guarantees of order $\Omega(N_q)$ unique links, improvable by control tuning. Together, these techniques provide scalable tools - including global control, phase denoising, remote entanglement, and compilation - for robust quantum computing architectures with heterogeneous spin ensembles.

[370] arXiv:2509.02996 (cross-list from math.PR) [pdf, html, other]
Title: Group-averaged Markov chains: mixing improvement
Michael C.H. Choi, Youjia Wang
Comments: 68 pages
Subjects: Probability (math.PR); Information Theory (cs.IT); Group Theory (math.GR); Computation (stat.CO)

For Markov kernels $P$ on a general state space $\mathcal{X}$, we introduce a new class of averaged Markov kernels $P_{da}(G,\nu)$ of $P$ induced by a group $G$ that acts on $\mathcal{X}$ and a probability measure $\nu$ on $G \times G$. Notable special cases are the group-orbit average $\overline{P}$, left-average $P_{la}$, right-average $P_{ra}$ and the independent-double-average $(P_{la})_{ra}$. For $\pi$-stationary $P$ in which $\pi$ is invariant with respect to $G$, we show that in general $P_{da}$ enjoys more favorable convergence properties than $P$ based on metrics such as spectral gap or asymptotic variance, and within the family of $P_{da}$ the most preferable kernel is in general $(P_{la})_{ra}$. We demonstrate that $P_{la}, P_{ra}, (P_{la})_{ra}$ are comparable in terms of mixing times, which supports the use of $P_{la}, P_{ra}$ in practice as computationally cheaper alternatives to $(P_{la})_{ra}$. These averaged kernels also admit natural geometric interpretations: they emerge as unique projections of $P$ onto specific $G$-invariant structures under the Kullback-Leibler divergence or the Hilbert-Schmidt norm and satisfy Pythagorean identities. On the other hand, in the general case if $\pi$ is not invariant with respect to $G$, we propose and study a technique that we call state-dependent averaging of Markov kernels which generalizes the earlier results to this setting. As examples and applications, this averaging perspective not only allows us to recast state-of-the-art Markov chain samplers such as Hamiltonian Monte Carlo or piecewise-deterministic Markov processes as specific cases of $P_{da}$, but also enables improvements to existing samplers such as Metropolis-Hastings, achieving rapid mixing in some toy models or when $\pi$ is the discrete uniform distribution.

[371] arXiv:2509.03004 (cross-list from quant-ph) [pdf, html, other]
Title: Identifiability and minimality bounds of quantum and post-quantum models of classical stochastic processes
Paul M. Riechers, Thomas J. Elliott
Comments: 11 pages, 4 figures
Subjects: Quantum Physics (quant-ph); Statistical Mechanics (cond-mat.stat-mech); Computation and Language (cs.CL); Formal Languages and Automata Theory (cs.FL); Information Theory (cs.IT)

To make sense of the world around us, we develop models, constructed to enable us to replicate, describe, and explain the behaviours we see. Focusing on the broad case of sequences of correlated random variables, i.e., classical stochastic processes, we tackle the question of determining whether or not two different models produce the same observable behavior. This is the problem of identifiability. Curiously, the physics of the model need not correspond to the physics of the observations; recent work has shown that it is even advantageous -- in terms of memory and thermal efficiency -- to employ quantum models to generate classical stochastic processes. We resolve the identifiability problem in this regime, providing a means to compare any two models of a classical process, be the models classical, quantum, or `post-quantum', by mapping them to a canonical `generalized' hidden Markov model. Further, this enables us to place (sometimes tight) bounds on the minimal dimension required of a quantum model to generate a given classical stochastic process.

[372] arXiv:2509.03013 (cross-list from eess.AS) [pdf, html, other]
Title: Speech Intelligibility Assessment with Uncertainty-Aware Whisper Embeddings and sLSTM
Ryandhimas E. Zezario, Dyah A.M.G. Wisnu, Hsin-Min Wang, Yu Tsao
Comments: Accepted to APSIPA ASC 2025
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Non-intrusive speech intelligibility prediction remains challenging due to variability in speakers, noise conditions, and subjective perception. We propose an uncertainty-aware approach that leverages Whisper embeddings in combination with statistical features, specifically the mean, standard deviation, and entropy computed across the embedding dimensions. The entropy, computed via a softmax over the feature dimension, serves as a proxy for uncertainty, complementing global information captured by the mean and standard deviation. To model the sequential structure of speech, we adopt a scalar long short-term memory (sLSTM) network, which efficiently captures long-range dependencies. Building on this foundation, we propose iMTI-Net, an improved multi-target intelligibility prediction network that integrates convolutional neural network (CNN) and sLSTM components within a multitask learning framework. It jointly predicts human intelligibility scores and machine-based word error rates (WER) from Google ASR and Whisper. Experimental results show that iMTI-Net outperforms the original MTI-Net across multiple evaluation metrics, demonstrating the effectiveness of incorporating uncertainty-aware features and the CNN-sLSTM architecture.
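
As a rough illustration of the statistical features described above, the sketch below computes the mean, standard deviation, and softmax entropy across the dimensions of a single embedding frame. The function name `frame_statistics`, the use of the population standard deviation, and the natural logarithm are assumptions made for this example; the abstract does not specify these details, and real Whisper embeddings would replace the toy vectors.

```python
import math

def frame_statistics(frame):
    """Return (mean, std, entropy) for one embedding frame (a list of floats)."""
    n = len(frame)
    mean = sum(frame) / n
    # Population standard deviation across the embedding dimensions (assumed).
    std = math.sqrt(sum((v - mean) ** 2 for v in frame) / n)
    # Numerically stable softmax over the feature dimension.
    peak = max(frame)
    exps = [math.exp(v - peak) for v in frame]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Shannon entropy in nats; high entropy means a flat, uncertain profile.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return mean, std, entropy

flat = frame_statistics([1.0, 1.0, 1.0, 1.0])     # maximally flat frame
peaked = frame_statistics([10.0, 0.0, 0.0, 0.0])  # one dominant dimension
print(flat, peaked)
```

A flat frame reaches the maximum entropy $\log n$, while a frame dominated by one dimension yields an entropy near zero, which is why the entropy can act as an uncertainty proxy alongside the mean and standard deviation.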

[373] arXiv:2509.03017 (cross-list from eess.AS) [pdf, html, other]
Title: Non-Intrusive Intelligibility Prediction for Hearing Aids: Recent Advances, Trends, and Challenges
Ryandhimas E. Zezario
Comments: APSIPA ASC 2025 perspective paper
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

This paper provides an overview of recent progress in non-intrusive speech intelligibility prediction for hearing aids (HA). We summarize developments in robust acoustic feature extraction, hearing loss modeling, and the use of emerging architectures for long-sequence processing. Listener-specific adaptation strategies and domain generalization approaches that aim to improve robustness in unseen acoustic environments are also discussed. Remaining challenges, such as the need for large-scale, diverse datasets and reliable cross-profile generalization, are acknowledged. Our goal is to offer a perspective on current trends, ongoing challenges, and possible future directions toward practical and reliable HA-oriented intelligibility prediction systems.

[374] arXiv:2509.03021 (cross-list from eess.AS) [pdf, html, other]
Title: A Study on Zero-Shot Non-Intrusive Speech Intelligibility for Hearing Aids Using Large Language Models
Ryandhimas E. Zezario, Dyah A.M.G. Wisnu, Hsin-Min Wang, Yu Tsao
Comments: Accepted to IEEE ICCE-TW 2025
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

This work focuses on zero-shot non-intrusive speech assessment for hearing aids (HA) using large language models (LLMs). Specifically, we introduce GPT-Whisper-HA, an extension of GPT-Whisper, a zero-shot non-intrusive speech assessment model based on LLMs. GPT-Whisper-HA is designed for speech assessment for HA, incorporating MSBG hearing loss and NAL-R simulations to process audio input based on each individual's audiogram, two automatic speech recognition (ASR) modules for audio-to-text representation, and GPT-4o to predict two corresponding scores, followed by score averaging for the final estimated score. Experimental results indicate that GPT-Whisper-HA achieves a 2.59% relative root mean square error (RMSE) improvement over GPT-Whisper, confirming the potential of LLMs for zero-shot speech assessment in predicting subjective intelligibility for HA users.

[375] arXiv:2509.03023 (cross-list from math.AT) [pdf, other]
Title: Homotopy equivalence of digital pictures in $\mathbb{Z}^2$
Dae-Woong Lee, P. Christopher Staecker
Comments: 21 pages, 11 figures
Subjects: Algebraic Topology (math.AT); Discrete Mathematics (cs.DM)

We investigate the properties of digital homotopy in the context of digital pictures $(X,\kappa,\bar \kappa)$, where $X\subsetneq \mathbb{Z}^n$ is a finite set, $\kappa$ is an adjacency relation on $X$, and $\bar \kappa$ is an adjacency relation on the complement of $X$. In particular we focus on homotopy equivalence between digital pictures in $\mathbb{Z}^2$. We define a numerical homotopy-type invariant for digital pictures in $\mathbb{Z}^2$ called the outer perimeter, which is a basic tool for distinguishing homotopy types of digital pictures. When a digital picture has no holes, we show that it is homotopy equivalent to its rc-convex hull, obtained by ``filling in the gaps'' of any row or column. We show that a digital picture $(X,c_i,c_j)$ is homotopy equivalent to only finitely many other digital pictures $(Y,c_i,c_j)$. At the end of the paper, we raise a conjecture on the row-column-convex hull of a digital picture.

[376] arXiv:2509.03064 (cross-list from math.CO) [pdf, html, other]
Title: Representation number of word-representable co-bipartite graph
Biswajit Das, Ramesh Hariharasubramanian
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

A graph $G = (V, E)$ is said to be word-representable if there exists a word $w$ over the alphabet $V$ such that, for any two distinct letters $x, y \in V$, the letters $x$ and $y$ alternate in $w$ if and only if $xy \in E$. A graph is co-bipartite if its complement is bipartite. Therefore, the vertex set of a co-bipartite graph can be partitioned into two disjoint subsets $X$ and $Y$ such that the subgraphs induced by $X$ and $Y$ are cliques.
The concept of word-representability for graph classes has gained significant attention in recent years. The book Words and Graphs by Sergey Kitaev and Vadim Lozin presents examples of co-bipartite graphs that are not word-representable. It is known that a graph is word-representable if and only if it admits a semi-transitive orientation. Although the necessary and sufficient conditions for the existence of a semi-transitive orientation in co-bipartite graphs have been established, the characterization based on vertex ordering remains open. In this paper, we present necessary and sufficient conditions for a co-bipartite graph to be word-representable in terms of its vertex ordering. Furthermore, based on this vertex ordering, we provide an algorithm to construct a $3$-uniform word-representation for any word-representable co-bipartite graph. Using this result, we prove that except for the permutation graphs, the representation number of all other word-representable co-bipartite graphs is $3$.
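
The definition of word-representability given above can be made concrete with a short sketch. The helper names `alternate` and `graph_of_word` are hypothetical and are not taken from the paper; the code only checks the alternation condition and recovers the graph that a given word represents, assuming each vertex is a single character.

```python
from itertools import combinations

def alternate(word, x, y):
    """True if x and y alternate in word once all other letters are deleted."""
    restricted = [ch for ch in word if ch in (x, y)]
    if x not in restricted or y not in restricted:
        return False
    # Alternation means no two consecutive remaining letters are equal.
    return all(a != b for a, b in zip(restricted, restricted[1:]))

def graph_of_word(word):
    """Edge set of the graph represented by word: xy is an edge iff x, y alternate."""
    letters = sorted(set(word))
    return {frozenset((x, y)) for x, y in combinations(letters, 2)
            if alternate(word, x, y)}

# "abacbc" uses every letter twice (a 2-uniform word) and represents the path a-b-c.
edges = graph_of_word("abacbc")
print(sorted("".join(sorted(e)) for e in edges))  # ['ab', 'bc']
```

Here `a` and `c` do not alternate (the restriction is `aacc`), so `ac` is not an edge, whereas `ab` and `bc` restrict to `abab` and `bcbc`.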

[377] arXiv:2509.03066 (cross-list from eess.SP) [pdf, html, other]
Title: S2M2ECG: Spatio-temporal bi-directional State Space Model Enabled Multi-branch Mamba for ECG
Huaicheng Zhang, Ruoxin Wang, Chenlian Zhou, Jiguang Shi, Yue Ge, Zhoutong Li, Sheng Chang, Hao Wang, Jin He, Qijun Huang
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

As one of the most effective methods for cardiovascular disease (CVD) diagnosis, multi-lead Electrocardiogram (ECG) signals present a characteristic multi-sensor information fusion challenge that has been continuously researched in deep learning domains. Despite the numerous algorithms proposed with different DL architectures, maintaining a balance among performance, computational complexity, and multi-source ECG feature fusion remains challenging. Recently, state space models (SSMs), particularly Mamba, have demonstrated remarkable effectiveness across various fields. Their inherent design for high-efficiency computation and linear complexity makes them particularly suitable for low-dimensional data like ECGs. This work proposes S2M2ECG, an SSM architecture featuring three-level fusion mechanisms: (1) Spatio-temporal bi-directional SSMs with segment tokenization for low-level signal fusion, (2) Intra-lead temporal information fusion with bi-directional scanning to enhance recognition accuracy in both forward and backward directions, (3) Cross-lead feature interaction modules for spatial information fusion. To fully leverage the multi-lead mechanisms inherent in ECG signals, a multi-branch design and lead fusion modules are incorporated, enabling individual analysis of each lead while ensuring seamless integration with others. Experimental results reveal that S2M2ECG achieves superior performance in the rhythmic, morphological, and clinical scenarios. Moreover, its lightweight architecture ensures it has nearly the fewest parameters among existing models, making it highly suitable for efficient inference and convenient deployment. Collectively, S2M2ECG offers a promising alternative that strikes an excellent balance among performance, computational complexity, and ECG-specific characteristics, paving the way for high-performance, lightweight computations in CVD diagnosis.

[378] arXiv:2509.03075 (cross-list from astro-ph.IM) [pdf, html, other]
Title: A description of the radio astronomy data processing tool DDF Pipeline
Mathis Certenais, François Bodin, Laurent Morin
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Distributed, Parallel, and Cluster Computing (cs.DC)

This paper presents the DDF Pipeline, a radio astronomy data processing tool initially designed for the LOw-Frequency ARray (LOFAR) radio-telescope and a candidate for processing data from the Square Kilometre Array (SKA). This work describes the DDF Pipeline software and presents a coarse-grain profiling execution to characterize its performance.

[379] arXiv:2509.03084 (cross-list from q-bio.BM) [pdf, html, other]
Title: SurGBSA: Learning Representations From Molecular Dynamics Simulations
Derek Jones, Yue Yang, Felice C. Lightstone, Niema Moshiri, Jonathan E. Allen, Tajana S. Rosing
Subjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG)

Self-supervised pretraining from static structures of drug-like compounds and proteins enables powerful learned feature representations. Learned features demonstrate state-of-the-art performance on a range of predictive tasks including molecular properties, structure generation, and protein-ligand interactions. The majority of approaches are limited by their use of static structures, and it remains an open question how best to use atomistic molecular dynamics (MD) simulations to develop more generalized models to improve prediction accuracy for novel molecular structures. We present SURrogate mmGBSA (SurGBSA) as a new modeling approach for MD-based representation learning, which learns a surrogate function of the Molecular Mechanics Generalized Born Surface Area (MMGBSA). We show for the first time the benefits of physics-informed pre-training to train a surrogate MMGBSA model on a collection of over 1.4 million 3D trajectories collected from MD simulations of the CASF-2016 benchmark. SurGBSA demonstrates a dramatic 6,497x speedup versus a traditional physics-based single-point MMGBSA calculation while nearly matching single-point MMGBSA accuracy on the challenging pose ranking problem for identification of the correct top pose (-0.4% difference). Our work advances the development of molecular foundation models by showing model improvements when training on MD simulations. Models, code and training data are made publicly available.

[380] arXiv:2509.03121 (cross-list from math.CO) [pdf, html, other]
Title: Expansion of gap-planar graphs
David R. Wood
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

A graph is $k$-gap-planar if it has a drawing in the plane such that every crossing can be charged to one of the two edges involved so that at most $k$ crossings are charged to each edge. We show this class of graphs has linear expansion. In particular, every $r$-shallow minor of a $k$-gap-planar graph has density $O(rk)$. Several extensions of this result are proved: for topological minors, for $k$-cover-planar graphs, for $k$-gap-cover-planar graphs, and for drawings on any surface. Applications to graph colouring are presented.

[381] arXiv:2509.03165 (cross-list from astro-ph.CO) [pdf, html, other]
Title: PatchNet: A hierarchical approach for neural field-level inference from Quijote Simulations
Anirban Bairagi, Benjamin Wandelt
Subjects: Cosmology and Nongalactic Astrophysics (astro-ph.CO); Information Theory (cs.IT)

\textit{What is the cosmological information content of a cubic Gigaparsec of dark matter? } Extracting cosmological information from the non-linear matter distribution has high potential to tighten parameter constraints in the era of next-generation surveys such as Euclid, DESI, and the Vera Rubin Observatory. Traditional approaches relying on summary statistics like the power spectrum and bispectrum, though analytically tractable, fail to capture the full non-Gaussian and non-linear structure of the density field. Simulation-Based Inference (SBI) provides a powerful alternative by learning directly from forward-modeled simulations. In this work, we apply SBI to the \textit{Quijote} dark matter simulations and introduce a hierarchical method that integrates small-scale information from field sub-volumes or \textit{patches} with large-scale statistics such as power spectrum and bispectrum. This hybrid strategy is efficient both computationally and in terms of the amount of training data required. It overcomes the memory limitations associated with full-field training. We show that our approach enhances Fisher information relative to analytical summaries and matches that of a very different approach (wavelet-based statistics), providing evidence that we are estimating the full information content of the dark matter density field at the resolution of $\sim 7.8~\mathrm{Mpc}/h$.

[382] arXiv:2509.03173 (cross-list from eess.IV) [pdf, html, other]
Title: Deep Self-knowledge Distillation: A hierarchical supervised learning for coronary artery segmentation
Mingfeng Lin
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Coronary artery disease is a leading cause of mortality, underscoring the critical importance of precise diagnosis through X-ray angiography. Manual coronary artery segmentation from these images is time-consuming and inefficient, prompting the development of automated models. However, existing methods, whether rule-based or deep learning models, struggle with issues like poor performance and limited generalizability. Moreover, current knowledge distillation methods applied in this field have not fully exploited the hierarchical knowledge of the model, leading to wasted information and limited gains in segmentation performance. To address these issues, this paper introduces Deep Self-knowledge Distillation, a novel approach for coronary artery segmentation that leverages hierarchical outputs for supervision. By combining Deep Distribution Loss and Pixel-wise Self-knowledge Distillation Loss, our method enhances the student model's segmentation performance through a hierarchical learning strategy, effectively transferring knowledge from the teacher model. Our method combines a loosely constrained probabilistic distribution vector with tightly constrained pixel-wise supervision, providing dual regularization for the segmentation model while also enhancing its generalization and robustness. Extensive experiments on the XCAD and DCA1 datasets demonstrate that our approach outperforms other models in Dice coefficient, accuracy, sensitivity, and IoU.

[383] arXiv:2509.03188 (cross-list from eess.IV) [pdf, html, other]
Title: Prompt-Guided Patch UNet-VAE with Adversarial Supervision for Adrenal Gland Segmentation in Computed Tomography Medical Images
Hania Ghouse, Muzammil Behzad
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Segmentation of small and irregularly shaped abdominal organs, such as the adrenal glands in CT imaging, remains a persistent challenge due to severe class imbalance, poor spatial context, and limited annotated data. In this work, we propose a unified framework that combines variational reconstruction, supervised segmentation, and adversarial patch-based feedback to address these limitations in a principled and scalable manner. Our architecture is built upon a VAE-UNet backbone that jointly reconstructs input patches and generates voxel-level segmentation masks, allowing the model to learn disentangled representations of anatomical structure and appearance. We introduce a patch-based training pipeline that selectively injects synthetic patches generated from the learned latent space, and systematically study the effects of varying synthetic-to-real patch ratios during training. To further enhance output fidelity, the framework incorporates perceptual reconstruction loss using VGG features, as well as a PatchGAN-style discriminator for adversarial supervision over spatial realism. Comprehensive experiments on the BTCV dataset demonstrate that our approach improves segmentation accuracy, particularly in boundary-sensitive regions, while maintaining strong reconstruction quality. Our findings highlight the effectiveness of hybrid generative-discriminative training regimes for small-organ segmentation and provide new insights into balancing realism, diversity, and anatomical consistency in data-scarce scenarios.

[384] arXiv:2509.03230 (cross-list from physics.soc-ph) [pdf, html, other]
Title: Network connectivity analysis via shortest paths
Silvia Noschese, Lothar Reichel
Comments: 17 pages, 4 figures
Subjects: Physics and Society (physics.soc-ph); Numerical Analysis (math.NA)

Complex systems of interacting components often can be modeled by a simple graph $\mathcal{G}$ that consists of a set of $n$ nodes and a set of $m$ edges. Such a graph can be represented by an adjacency matrix $A\in\mathbb{R}^{n\times n}$, whose $(ij)$th entry is one if there is an edge pointing from node $i$ to node $j$, and is zero otherwise. The matrix $A$ and its positive integer powers reveal important properties of the graph and allow the construction of the path length matrix $L$ for the graph. The $(ij)$th entry of $L$ is the length of the shortest path from node $i$ to node $j$; if there is no path between these nodes, then the value of the entry is set to $\infty$. We are interested in how well information flows via shortest paths of the graph. This can be studied with the aid of the path length matrix. The path length matrix allows the definition of several measures of communication in the network defined by the graph such as the global $K$-efficiency, which considers shortest paths that are made up of at most $K$ edges for some $K<n$, as well as the number of such shortest paths. Novel notions of connectivity introduced in this paper help us understand the importance of specific edges for the flow of information through the graph. This is of interest when seeking to simplify a network by removing selected edges or trying to assess the sensitivity of the flow of information to changes due to exterior causes such as a traffic stoppage on a road network.
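
The construction of the path length matrix from powers of the adjacency matrix can be sketched as follows. The function names are hypothetical, the diagonal of $L$ is set to zero by assumption, and the formula used for the global $K$-efficiency (the average of $1/L_{ij}$ over ordered pairs $i \neq j$, counting only paths of at most $K$ edges) is an assumed definition chosen for illustration; the abstract does not state the exact formula.

```python
import math

def path_length_matrix(A):
    """L[i][j] = smallest k >= 1 with (A^k)[i][j] > 0, or inf if no path exists."""
    n = len(A)
    L = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    # reach[i][j] is True when some walk of exactly k edges leads from i to j.
    reach = [[bool(A[i][j]) for j in range(n)] for i in range(n)]
    for k in range(1, n):  # a shortest path between distinct nodes has < n edges
        for i in range(n):
            for j in range(n):
                if i != j and reach[i][j] and L[i][j] == math.inf:
                    L[i][j] = k
        # Boolean product reach * A gives walks of k + 1 edges.
        reach = [[any(reach[i][m] and A[m][j] for m in range(n))
                  for j in range(n)] for i in range(n)]
    return L

def global_k_efficiency(L, K):
    """Assumed definition: mean of 1/L[i][j] over i != j, using only L[i][j] <= K."""
    n = len(L)
    total = sum(1.0 / L[i][j] for i in range(n) for j in range(n)
                if i != j and L[i][j] <= K)
    return total / (n * (n - 1))

# Directed path 0 -> 1 -> 2.
A = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]
L = path_length_matrix(A)
print(L[0][2], global_k_efficiency(L, 1), global_k_efficiency(L, 2))
```

In this toy graph, raising $K$ from 1 to 2 adds the two-edge path from node 0 to node 2, so the efficiency increases; removing an edge and recomputing the value gives a simple way to probe how much that edge matters.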

[385] arXiv:2509.03280 (cross-list from quant-ph) [pdf, html, other]
Title: An experience-based classification of quantum bugs in quantum software
Nils Quetschlich, Olivia Di Matteo
Comments: 25 pages, 4 figures. To appear in special issue of Computing, "Pivoting Quantum Computing Using Software Engineering Best Practices"
Subjects: Quantum Physics (quant-ph); Software Engineering (cs.SE)

As quantum computers continue to improve in quality and scale, there is a growing need for accessible software frameworks for programming them. However, the unique behavior of quantum systems means specialized approaches, beyond traditional software development, are required. This is particularly true for debugging due to quantum bugs, i.e., bugs that occur precisely because an algorithm is a quantum algorithm. Pinpointing a quantum bug's root cause often requires significant developer time, as there is little established guidance for quantum debugging techniques. Developing such guidance is the main challenge we sought to address. In this work, we describe a set of 14 quantum bugs, sourced primarily from our experience as quantum software developers, and supplemented by analysis of open-source GitHub repositories. We detail their context, symptoms, and the techniques applied to identify and fix them. While classifying these bugs based on existing schemes, we observed that most emerged due to unique interactions between multiple aspects of an algorithm or workflow. In other words, they occurred because more than one thing went wrong, which provided important insight into why quantum debugging is more challenging. Furthermore, based on this clustering, we found that - unexpectedly - there is no clear relationship between debugging strategies and bug classes. Further research is needed to develop effective and systematic quantum debugging strategies.

[386] arXiv:2509.03292 (cross-list from eess.AS) [pdf, html, other]
Title: Improving Perceptual Audio Aesthetic Assessment via Triplet Loss and Self-Supervised Embeddings
Dyah A. M. G. Wisnu, Ryandhimas E. Zezario, Stefano Rini, Hsin-Min Wang, Yu Tsao
Comments: Accepted by IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2025
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

We present a system for automatic multi-axis perceptual quality prediction of generative audio, developed for Track 2 of the AudioMOS Challenge 2025. The task is to predict four Audio Aesthetic Scores--Production Quality, Production Complexity, Content Enjoyment, and Content Usefulness--for audio generated by text-to-speech (TTS), text-to-audio (TTA), and text-to-music (TTM) systems. A main challenge is the domain shift between natural training data and synthetic evaluation data. To address this, we combine BEATs, a pretrained transformer-based audio representation model, with a multi-branch long short-term memory (LSTM) predictor and use a triplet loss with buffer-based sampling to structure the embedding space by perceptual similarity. Our results show that this improves embedding discriminability and generalization, enabling domain-robust audio quality assessment without synthetic training data.

[387] arXiv:2509.03306 (cross-list from quant-ph) [pdf, html, other]
Title: Evaluating Security Properties in the Execution of Quantum Circuits
Paolo Bernardi, Antonio Brogi, Gian-Luigi Ferrari, Giuseppe Bisicchia
Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR)

Quantum computing is a disruptive technology that is expected to offer significant advantages in many critical fields (e.g. drug discovery and cryptography). The security of information processed by such machines is therefore paramount. Currently, modest Noisy Intermediate-Scale Quantum (NISQ) devices are available. The goal of this work is to identify a practical, heuristic methodology to evaluate security properties, such as secrecy and integrity, while using quantum processors owned by potentially untrustworthy providers.

[388] arXiv:2509.03311 (cross-list from eess.SP) [pdf, html, other]
Title: Credible Uncertainty Quantification under Noise and System Model Mismatch
Penggao Yan, Li-Ta Hsu
Comments: This manuscript has been submitted to IEEE Signal Processing Letters
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

State estimators often provide self-assessed uncertainty metrics, such as covariance matrices, whose reliability is critical for downstream tasks. However, these self-assessments can be misleading due to underlying modeling violations like noise or system model mismatch. This letter addresses the problem of estimator credibility by introducing a unified, multi-metric evaluation framework. We construct a compact credibility portfolio that synergistically combines traditional metrics like the Normalized Estimation Error Squared (NEES) and the Noncredibility Index (NCI) with proper scoring rules, namely the Negative Log-Likelihood (NLL) and the Energy Score (ES). Our key contributions are a novel energy distance-based location test to robustly detect system model misspecification and a method that leverages the asymmetric sensitivities of NLL and ES to distinguish optimistic covariance scaling from system bias. Monte Carlo simulations across six distinct credibility scenarios demonstrate that our proposed method achieves high classification accuracy (80-100%), drastically outperforming single-metric baselines which consistently fail to provide a complete and correct diagnosis. This framework provides a practical tool for turning patterns of credibility indicators into actionable diagnoses of model deficiencies.

[389] arXiv:2509.03317 (cross-list from stat.ML) [pdf, html, other]
Title: Bayesian Additive Regression Trees for functional ANOVA model
Seokhun Park, Insung Kong, Yongdai Kim
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Bayesian Additive Regression Trees (BART) is a powerful statistical model that leverages the strengths of Bayesian inference and regression trees. It has received significant attention for capturing complex non-linear relationships and interactions among predictors. However, the accuracy of BART often comes at the cost of interpretability. To address this limitation, we propose ANOVA Bayesian Additive Regression Trees (ANOVA-BART), a novel extension of BART based on the functional ANOVA decomposition, which is used to decompose the variability of a function into different interactions, each representing the contribution of a different set of covariates or factors. Our proposed ANOVA-BART enhances interpretability, preserves and extends the theoretical guarantees of BART, and achieves superior predictive performance. Specifically, we establish that the posterior concentration rate of ANOVA-BART is nearly minimax optimal, and we further provide the same convergence rates for each interaction, which are not available for BART. Moreover, comprehensive experiments confirm that ANOVA-BART surpasses BART in both accuracy and uncertainty quantification, while also demonstrating its effectiveness in component selection. These results suggest that ANOVA-BART offers a compelling alternative to BART by balancing predictive accuracy, interpretability, and theoretical consistency.

[390] arXiv:2509.03333 (cross-list from eess.SP) [pdf, html, other]
Title: Baseband Model, Cutoff Rate Bounds and Constellation Shaping for Mixed Gaussian-Impulsive Noise
Tianfu Qi, Jun Wang
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

Mixed noise, composed of white Gaussian noise (WGN) and impulsive noise (IN), appears in numerous communication scenarios and can severely degrade system performance. In this paper, we address this issue by optimizing the transmitted constellation under mixed noise based on a theoretical analysis of the cutoff rate (CR). First, starting from the passband model of the mixed noise, we derive its corresponding baseband representation. Due to the complexity of the CR, an exact analytic expression is generally intractable. Therefore, the baseband noise model is employed to obtain closed-form lower and upper bounds of the CR. A piecewise linear approximation is applied to derive efficient bounds by exploiting the algebraic properties of the integral terms. These bounds are then used as criteria to optimize the transmitted constellation points in both geometric and probabilistic distributions. The projected gradient method is employed to solve the optimization problem, and the convergence and properties of the solutions are analyzed. Numerical results demonstrate that the proposed CR bounds are tight and exhibit the expected asymptotic behavior. Furthermore, the optimized constellation scheme achieves a significant rate improvement compared to baselines.

[391] arXiv:2509.03336 (cross-list from q-bio.MN) [pdf, other]
Title: AI-Driven Drug Repurposing through miRNA-mRNA Relation
Sharanya Manoharan, Balu Bhasuran, Oviya Ramalakshmi Iyyappan, Mohamed Saleem Abdul Shukkoor, Malathi Sellapan, Kalpana Raja
Subjects: Molecular Networks (q-bio.MN); Information Retrieval (cs.IR); Quantitative Methods (q-bio.QM)

miRNA-mRNA relations are closely linked to several biological processes and disease mechanisms. In a recent study, we tested the performance of large language models (LLMs) on extracting miRNA-mRNA relations from PubMed. PubMedBERT achieved the best performance, a 0.783 F1 score, on the miRNA-mRNA Interaction Corpus (MMIC). Here, we first applied the fine-tuned PubMedBERT model to extract miRNA-mRNA relations from PubMed for chronic obstructive pulmonary disease (COPD), Alzheimer's disease (AD), stroke, type 2 diabetes mellitus (T2DM), chronic liver disease, and cancer. Next, we retrieved miRNA-drug relations using KinderMiner, a literature mining tool for relation extraction. Then, we constructed three interaction networks, (1) a disease-centric network, (2) a drug-centric network, and (3) a miRNA-centric network, comprising 3497 nodes and 16417 edges organized as a directed graph to capture complex biological relationships. Finally, we validated the drugs using MIMIC-IV. Our integrative approach revealed both established and novel candidate drugs for the diseases under study through 595 miRNA-drug relations extracted from PubMed. To the best of our knowledge, this is the first study to systematically extract and visualize relationships among four distinct biomedical entities: miRNA, mRNA, drug, and disease.

[392] arXiv:2509.03339 (cross-list from math.CO) [pdf, other]
Title: Line Graphs of Non-Word-Representable Graphs are Not Always Non-Word-Representable
Khyodeno Mozhui, Tithi Dwary, K. V. Krishna
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

A graph is said to be word-representable if there exists a word over its vertex set such that any two vertices are adjacent if and only if they alternate in the word. If no such word exists, the graph is non-word-representable. In the literature, there are examples of non-word-representable graphs whose line graphs are non-word-representable. However, it is an open problem to determine whether the line graph of a non-word-representable graph is always non-word-representable. In this work, we address the open problem by considering a class of non-word-representable graphs, viz., Mycielski graphs of odd cycles of length at least five, and show that their line graphs are word-representable.
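The definition of word-representability in this abstract can be checked mechanically for a given word. The sketch below assumes the standard reading of "alternate": two letters alternate if, after deleting every other letter from the word, no two equal letters are adjacent. The function names and the small example are illustrative and are not taken from the paper.

```python
from itertools import combinations


def alternate(word, x, y):
    """True if x and y alternate in word, i.e. the subword induced by
    x and y never repeats a letter twice in a row."""
    induced = [ch for ch in word if ch in (x, y)]
    return all(a != b for a, b in zip(induced, induced[1:]))


def represents(word, vertices, edges):
    """True if word represents the graph: every vertex occurs in the word,
    and two vertices are adjacent exactly when they alternate."""
    if any(v not in word for v in vertices):
        return False
    edge_set = {frozenset(e) for e in edges}
    return all(
        alternate(word, x, y) == (frozenset((x, y)) in edge_set)
        for x, y in combinations(vertices, 2)
    )


# The path a - b - c: a and c must not alternate.
path_edges = [("a", "b"), ("b", "c")]
ok = represents("abacbc", "abc", path_edges)
not_ok = represents("abcabc", "abc", path_edges)
```

In the word `abacbc`, the pairs (a, b) and (b, c) induce `abab` and `bcbc`, while (a, c) induces `aacc`, so the word represents the path. The word `abcabc` makes every pair alternate and therefore represents the triangle instead.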

[393] arXiv:2509.03372 (cross-list from eess.AS) [pdf, html, other]
Title: An Effective Strategy for Modeling Score Ordinality and Non-uniform Intervals in Automated Speaking Assessment
Tien-Hong Lo, Szu-Yu Chen, Yao-Ting Sung, Berlin Chen
Comments: Accepted at ASRU 2025
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

A recent line of research on automated speaking assessment (ASA) has benefited from self-supervised learning (SSL) representations, which capture rich acoustic and linguistic patterns in non-native speech without underlying assumptions of feature curation. However, speech-based SSL models capture acoustic-related traits but overlook linguistic content, while text-based SSL models rely on ASR output and fail to encode prosodic nuances. Moreover, most prior work treats proficiency levels as nominal classes, ignoring their ordinal structure and the non-uniform intervals between proficiency labels. To address these limitations, we propose an effective ASA approach combining SSL with handcrafted indicator features via a novel modeling paradigm. We further introduce a multi-margin ordinal loss that jointly models both the score ordinality and non-uniform intervals of proficiency labels. Extensive experiments on the TEEMI corpus show that our method consistently outperforms strong baselines and generalizes well to unseen prompts.

[394] arXiv:2509.03378 (cross-list from stat.ML) [pdf, html, other]
Title: Understanding and Improving the Shampoo Optimizer via Kullback-Leibler Minimization
Wu Lin, Scott C. Lowe, Felix Dangel, Runa Eschenhagen, Zikun Xu, Roger B. Grosse
Comments: technical report, working in progress
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

As an adaptive method, Shampoo employs a structured second-moment estimation, and its effectiveness has attracted growing attention. Prior work has primarily analyzed its estimation scheme through the Frobenius norm. Motivated by the natural connection between the second moment and a covariance matrix, we propose studying Shampoo's estimation as covariance estimation through the lens of Kullback-Leibler (KL) minimization. This alternative perspective reveals a previously hidden limitation, motivating improvements to Shampoo's design. Building on this insight, we develop a practical estimation scheme, termed KL-Shampoo, that eliminates Shampoo's reliance on Adam for stabilization, thereby removing the additional memory overhead introduced by Adam. Preliminary results show that KL-Shampoo improves Shampoo's performance, enabling it to stabilize without Adam and even outperform its Adam-stabilized variant, SOAP, in neural network pretraining.

[395] arXiv:2509.03390 (cross-list from math.CO) [pdf, html, other]
Title: Row Impartial Terminus
Eric Gottlieb, Dawood Khatana, Matjaž Krnc, Peter Muršič, Ismael Qureshi
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

We introduce Row Impartial Terminus (RIT), an impartial combinatorial game played on integer partitions. We show that any position in RIT can be uniquely decomposed into a core and a remnant. Our central result is that the Conway pair of any RIT position, which determines the outcome under both normal and misère play, is identical to the Conway pair of a corresponding position in the game of Nim defined by the remnant. This finding provides a complete winning strategy for both variants of RIT, reducing its analysis to the well-understood framework of Nim. As a consequence, we classify RIT within the Conway-Gurvich-Ho hierarchy, showing it to be forced and miserable but not pet.

[396] arXiv:2509.03421 (cross-list from eess.IV) [pdf, other]
Title: Generalist versus Specialist Vision Foundation Models for Ocular Disease and Oculomics
Yukun Zhou, Paul Nderitu, Jocelyn Hui Lin Goh, Justin Engelmann, Siegfried K. Wagner, Anran Ran, Hongyang Jiang, Lie Ju, Ke Zou, Sahana Srinivasan, Hyunmin Kim, Takahiro Ninomiya, Zheyuan Wang, Gabriel Dawei Yang, Eden Ruffell, Dominic Williamson, Rui Santos, Gabor Mark Somfai, Carol Y. Cheung, Tien Yin Wong, Daniel C. Alexander, Yih Chung Tham, Pearse A. Keane
Comments: 39 pages, 8 Figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Medical foundation models, pre-trained with large-scale clinical data, demonstrate strong performance in diverse clinically relevant applications. RETFound, trained on nearly one million retinal images, exemplifies this approach in applications with retinal images. However, the emergence of increasingly powerful and multifold larger generalist foundation models such as DINOv2 and DINOv3 raises the question of whether domain-specific pre-training remains essential, and if so, what gap persists. To investigate this, we systematically evaluated the adaptability of DINOv2 and DINOv3 in retinal image applications, compared to two specialist RETFound models, RETFound-MAE and RETFound-DINOv2. We assessed performance on ocular disease detection and systemic disease prediction using two adaptation strategies: fine-tuning and linear probing. Data efficiency and adaptation efficiency were further analysed to characterise trade-offs between predictive performance and computational cost. Our results show that although scaling generalist models yields strong adaptability across diverse tasks, RETFound-DINOv2 consistently outperforms these generalist foundation models in ocular-disease detection and oculomics tasks, demonstrating stronger generalisability and data efficiency. These findings suggest that specialist retinal foundation models remain the most effective choice for clinical applications, while the narrowing gap with generalist foundation models suggests that continued data and model scaling can deliver domain-relevant gains and position them as strong foundations for future medical foundation models.

[397] arXiv:2509.03431 (cross-list from math.AP) [pdf, html, other]
Title: A novel approach to study the wellposedness of the 3D fluid-2D plate interaction PDE System
George Avalos, Pelin G. Geredeli, Hemanta Kunwar, Hyesuk Lee
Comments: 23 pages, 4 figures
Subjects: Analysis of PDEs (math.AP); Numerical Analysis (math.NA)

We consider a certain fluid-structure interaction (FSI) system with a view to obtaining an alternative methodology for establishing its strongly continuous semigroup wellposedness. (Semigroup generation for this FSI was originally considered in Avalos-Clark (2014).) The FSI model under consideration describes the vibrations of an incompressible fluid within a 3D cavity as it interacts with the elastic membrane on the "free" upper boundary of the cavity. Such coupled PDE systems appear in a variety of natural settings such as biomedicine, aeroelasticity, and fluid dynamics.
Our proof of $C_0$-semigroup wellposedness is based on a proper application of the Lumer-Phillips Theorem. In this regard, our main challenge is to show the maximality of the corresponding semigroup generator. To this end, we develop a "nonstandard" inf-sup approach which avoids the use of technical nonlocal maps in the associated bilinear forms, unlike the earlier paper Avalos-Clark (2014), and allows for the solution of the fluid and plate variables simultaneously. Our new inf-sup strategy will lead to a more efficient mixed finite element method (FEM) for approximating solutions to the FSI problem, inasmuch as the bilinear forms in our novel variational formulation are free from the computationally intensive nonlocal solution operators invoked in Avalos-Clark (2014). We also perform numerical tests based on this formulation using a benchmark problem and present numerical results to demonstrate the effectiveness of our approach.

[398] arXiv:2509.03438 (cross-list from stat.ML) [pdf, html, other]
Title: Non-Linear Counterfactual Aggregate Optimization
Benjamin Heymann, Otmane Sakhi
Comments: Recsys '25, CONSEQUENCES: Causality, Counterfactuals & Sequential Decision-Making Workshop
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We consider the problem of directly optimizing a non-linear function of an outcome, where this outcome itself is the sum of many small contributions. The non-linearity of the function means that the problem is not equivalent to the maximization of the expectation of the individual contribution. By leveraging the concentration properties of the sum of individual outcomes, we derive a scalable descent algorithm that directly optimizes for our stated objective. This makes it possible, for instance, to maximize the probability of a successful A/B test, for which it can be wiser to target a success criterion, such as exceeding a given uplift, rather than chasing the highest expected payoff.

[399] arXiv:2509.03443 (cross-list from math.OC) [pdf, html, other]
Title: On the Perturbed Projection-Based Distributed Gradient-Descent Algorithm: A Fully-Distributed Adaptive Redesign
Tarek Bazizi, Mohamed Maghenem, Paolo Frasca, Antonio Lorìa, Elena Panteley
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

In this work, we revisit a classical distributed gradient-descent algorithm, introducing an interesting class of perturbed multi-agent systems. The state of each subsystem represents a local estimate of a solution to the global optimization problem. Thereby, the network is required to minimize local cost functions, while gathering the local estimates around a common value. Such a complex task suggests the interplay of consensus-based dynamics with gradient-descent dynamics. The latter descent dynamics involves the projection operator, which is assumed to provide corrupted projections of a specific form, reminiscent of existing (fast) projection algorithms. Hence, for the resulting class of perturbed networks, we are able to adaptively tune some gains in a fully distributed fashion, to approach the optimal consensus set up to arbitrary-desired precision.

[400] arXiv:2509.03456 (cross-list from stat.ML) [pdf, html, other]
Title: Off-Policy Learning in Large Action Spaces: Optimization Matters More Than Estimation
Imad Aouali, Otmane Sakhi
Comments: Recsys '25, CONSEQUENCES: Causality, Counterfactuals & Sequential Decision-Making Workshop
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Off-policy evaluation (OPE) and off-policy learning (OPL) are foundational for decision-making in offline contextual bandits. Recent advances in OPL primarily optimize OPE estimators with improved statistical properties, assuming that better estimators inherently yield superior policies. Although theoretically justified, we argue this estimator-centric approach neglects a critical practical obstacle: challenging optimization landscapes. In this paper, we provide theoretical insights and extensive empirical evidence showing that current OPL methods encounter severe optimization issues, particularly as action spaces become large. We demonstrate that simpler weighted log-likelihood objectives enjoy substantially better optimization properties and still recover competitive, often superior, learned policies. Our findings emphasize the necessity of explicitly addressing optimization considerations in the development of OPL algorithms for large action spaces.

[401] arXiv:2509.03475 (cross-list from math.OC) [pdf, html, other]
Title: From Image Denoisers to Regularizing Imaging Inverse Problems: An Overview
Hong Ye Tan, Subhadip Mukherjee, Junqi Tang
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Inverse problems lie at the heart of modern imaging science, with broad applications in areas such as medical imaging, remote sensing, and microscopy. Recent years have witnessed a paradigm shift in solving imaging inverse problems, where data-driven regularizers are used increasingly, leading to remarkably high-fidelity reconstruction. A particularly notable approach for data-driven regularization is to use learned image denoisers as implicit priors in iterative image reconstruction algorithms. This survey presents a comprehensive overview of this powerful and emerging class of algorithms, commonly referred to as plug-and-play (PnP) methods. We begin by providing a brief background on image denoising and inverse problems, followed by a short review of traditional regularization strategies. We then explore how proximal splitting algorithms, such as the alternating direction method of multipliers (ADMM) and proximal gradient descent (PGD), can naturally accommodate learned denoisers in place of proximal operators, and under what conditions such replacements preserve convergence. The role of Tweedie's formula in connecting optimal Gaussian denoisers and score estimation is discussed, which lays the foundation for regularization-by-denoising (RED) and more recent diffusion-based posterior sampling methods. We discuss theoretical advances regarding the convergence of PnP algorithms, both within the RED and proximal settings, emphasizing the structural assumptions that the denoiser must satisfy for convergence, such as non-expansiveness, Lipschitz continuity, and local homogeneity. We also address practical considerations in algorithm design, including choices of denoiser architecture and acceleration strategies.
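The plug-and-play idea summarized in this abstract, replacing the proximal operator in proximal gradient descent with a denoiser, can be sketched in a few lines. Everything below is a toy stand-in and an assumption for illustration: the data-fidelity term is $\frac{1}{2}\|x - y\|^2$ on a 1D signal, and a three-point moving average plays the role that a learned denoiser would play in practice.

```python
def moving_average_denoiser(x):
    """Stand-in denoiser: average each sample with its immediate
    neighbours (windows are truncated at the two ends)."""
    out = []
    for i in range(len(x)):
        window = x[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out


def pnp_pgd(y, step=0.5, iterations=50):
    """PnP proximal gradient descent for the toy fidelity 0.5 * ||x - y||^2.

    Each iteration takes a gradient step on the fidelity term and then
    applies the denoiser where a proximal operator would normally go."""
    x = list(y)
    for _ in range(iterations):
        gradient_step = [xi - step * (xi - yi) for xi, yi in zip(x, y)]
        x = moving_average_denoiser(gradient_step)
    return x


noisy = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
restored = pnp_pgd(noisy)
```

With this particular averaging denoiser and step size 0.5, each iteration is a contraction in the maximum norm, so the iterates settle to a fixed point. For learned denoisers, convergence depends on structural conditions such as the non-expansiveness discussed in the survey.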

[402] arXiv:2509.03495 (cross-list from quant-ph) [pdf, html, other]
Title: Learning AC Power Flow Solutions using a Data-Dependent Variational Quantum Circuit
Thinh Viet Le, Md Obaidur Rahman, Vassilis Kekatos
Comments: 7 pages, 6 figures, accepted for the IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids 2025
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC)

Interconnection studies require solving numerous instances of the AC load or power flow (AC PF) problem to simulate diverse scenarios as power systems navigate the ongoing energy transition. To expedite such studies, this work leverages recent advances in quantum computing to find or predict AC PF solutions using a variational quantum circuit (VQC). VQCs are trainable models that run on modern-day noisy intermediate-scale quantum (NISQ) hardware to accomplish elaborate optimization and machine learning (ML) tasks. Our first contribution is to pose a single instance of the AC PF as a nonlinear least-squares fit over the VQC trainable parameters (weights) and solve it using a hybrid classical/quantum computing approach. The second contribution is to feed PF specifications as features into a data-embedded VQC and train the resultant quantum ML (QML) model to predict general PF solutions. The third contribution is to develop a novel protocol to efficiently measure AC-PF quantum observables by exploiting the graph structure of a power network. Preliminary numerical tests indicate that the proposed VQC models attain enhanced prediction performance over a deep neural network despite using far fewer weights. The proposed quantum AC-PF framework sets the foundations for addressing more elaborate grid tasks via quantum computing.

[403] arXiv:2509.03496 (cross-list from quant-ph) [pdf, html, other]
Title: Information-Theoretic Lower Bounds for Approximating Monomials via Optimal Quantum Tsallis Entropy Estimation
Qisheng Wang
Comments: Submitted to CCC 2025. 36 pages, 1 figure, 1 algorithm
Subjects: Quantum Physics (quant-ph); Computational Complexity (cs.CC); Information Theory (cs.IT); Classical Analysis and ODEs (math.CA)

This paper reveals a conceptually new connection from information theory to approximation theory via quantum algorithms for entropy estimation. Specifically, we provide an information-theoretic lower bound $\Omega(\sqrt{n})$ on the approximate degree of the monomial $x^n$, compared to the analytic lower bounds shown in Newman and Rivlin (Aequ. Math. 1976) via Fourier analysis and in Sachdeva and Vishnoi (Found. Trends Theor. Comput. Sci. 2014) via the Markov brothers' inequality. This is done by relating the polynomial approximation of monomials to quantum Tsallis entropy estimation. This further implies a quantum algorithm that estimates to within additive error $\varepsilon$ the Tsallis entropy of integer order $q \geq 2$ of an unknown probability distribution $p$ or an unknown quantum state $\rho$, using $\widetilde \Theta(\frac{1}{\sqrt{q}\varepsilon})$ queries to the quantum oracle that produces a sample from $p$ or prepares a copy of $\rho$, improving the prior best $O(\frac{1}{\varepsilon})$ via the Shift test due to Ekert, Alves, Oi, Horodecki, Horodecki and Kwek (Phys. Rev. Lett. 2002). To the best of our knowledge, this is the first quantum entropy estimator with optimal query complexity (up to polylogarithmic factors) for all parameters simultaneously.
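For reference, the quantity being estimated can be computed classically when the distribution is known. The sketch below uses the standard definition $S_q(p) = \frac{1 - \sum_i p_i^q}{q-1}$ of the Tsallis entropy of order $q$; it only illustrates the target quantity and is not the quantum estimator from the paper.

```python
def tsallis_entropy(p, q):
    """Tsallis entropy of order q != 1 for a probability vector p:
    S_q(p) = (1 - sum_i p_i**q) / (q - 1)."""
    if q == 1:
        raise ValueError("order q = 1 is the Shannon limit and is not handled here")
    return (1.0 - sum(pi ** q for pi in p)) / (q - 1)


uniform_two = tsallis_entropy([0.5, 0.5], 2)   # 1 - (1/4 + 1/4) = 0.5
point_mass = tsallis_entropy([1.0], 2)         # 1 - 1 = 0.0
```

For integer $q \geq 2$ the entropy is an affine function of the power sum $\sum_i p_i^q$, which is a polynomial in the probabilities.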

Replacement submissions (showing 294 of 294 entries)

[404] arXiv:2208.13266 (replaced) [pdf, html, other]
Title: JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents
Kaizhi Zheng, Kaiwen Zhou, Jing Gu, Yue Fan, Jialu Wang, Zonglin Di, Xuehai He, Xin Eric Wang
Comments: 19th International Conference on Neurosymbolic Learning and Reasoning
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Building a conversational embodied agent to execute real-life tasks has been a long-standing yet quite challenging research goal, as it requires effective human-agent communication, multi-modal understanding, long-range sequential decision making, etc. Traditional symbolic methods have scaling and generalization issues, while end-to-end deep learning models suffer from data scarcity and high task complexity, and are often hard to explain. To benefit from both worlds, we propose JARVIS, a neuro-symbolic commonsense reasoning framework for modular, generalizable, and interpretable conversational embodied agents. First, it acquires symbolic representations by prompting large language models (LLMs) for language understanding and sub-goal planning, and by constructing semantic maps from visual observations. Then the symbolic module reasons for sub-goal planning and action generation based on task- and action-level common sense. Extensive experiments on the TEACh dataset validate the efficacy and efficiency of our JARVIS framework, which achieves state-of-the-art (SOTA) results on all three dialog-based embodied tasks, including Execution from Dialog History (EDH), Trajectory from Dialog (TfD), and Two-Agent Task Completion (TATC) (e.g., our method boosts the unseen Success Rate on EDH from 6.1% to 15.8%). Moreover, we systematically analyze the essential factors that affect the task performance and also demonstrate the superiority of our method in few-shot settings. Our JARVIS model ranks first in the Alexa Prize SimBot Public Benchmark Challenge.

[405] arXiv:2210.10547 (replaced) [pdf, html, other]
Title: Hierarchical Multi-Interest Co-Network For Coarse-Grained Ranking
Xu Yuan, Chen Xu, Qiwei Chen, Chao Li, Junfeng Ge, Wenwu Ou
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)

In this era of information explosion, a personalized recommendation system makes it convenient for users to find the information they are interested in. To deal with billions of users and items, large-scale online recommendation services usually consist of three stages: candidate generation, coarse-grained ranking, and fine-grained ranking. The success of each stage depends on whether the model accurately captures the interests of users, which are usually hidden in users' behavior data. Previous research shows that users' interests are diverse, and one vector is not sufficient to capture users' different preferences. Therefore, many methods use multiple vectors to encode users' interests. However, there are two unsolved problems: (1) The similarity of different vectors in existing methods is too high, with too much redundant information. Consequently, the interests of users are not fully represented. (2) Existing methods model the long-term and short-term behaviors together, ignoring the differences between them. This paper proposes a Hierarchical Multi-Interest Co-Network (HCN) to capture users' diverse interests in the coarse-grained ranking stage. Specifically, we design a hierarchical multi-interest extraction layer to update users' diverse interest centers iteratively. The multiple embedded vectors obtained in this way contain more information and represent the interests of users better in various aspects. Furthermore, we develop a Co-Interest Network to integrate users' long-term and short-term interests. Experiments on several real-world datasets and one large-scale industrial dataset show that HCN outperforms the state-of-the-art methods. We deploy HCN in a large-scale real-world e-commerce system and achieve an extra 2.5% improvement in GMV (Gross Merchandise Value).

[406] arXiv:2210.11570 (replaced) [pdf, other]
Title: Online Resource Allocation with Cancellations
Farbod Ekbatani, Yiding Feng, Rad Niazadeh
Comments: A preliminary conference version of this work appeared at the ACM Conference on Economics and Computation (EC) 2023 under the title "Online Resource Allocation with Buyback: Optimal Algorithms via Primal-Dual."; an earlier version of this paper was distributed under the title "Online Matching with Cancellation Costs."
Subjects: Data Structures and Algorithms (cs.DS); Computer Science and Game Theory (cs.GT)

We initiate the study of two-sided online resource allocation with costly cancellations. Our focus is on edge-weighted online bipartite matching (and several of its extensions), where nodes arrive online and request offline resources. In contrast to the classic literature, any fraction of an offline resource that was preallocated to an earlier online node can be reclaimed, resulting in the loss of the previously allocated edge-weight plus an additional penalty equal to a non-negative constant factor $f$ times the edge-weight. Parameterizing the problem by the buyback factor $f$, our main result is the development of optimal competitive algorithms for \emph{all possible values} of $f$ through a novel primal-dual family of algorithms in the fractional (or equivalently, large capacity) setting, and establishing their optimality by deriving matching lower bounds. Interestingly, our results reveal a phase transition: for the small buyback regime ($f < \frac{e-2}{2}$), the optimal competitive ratio is $\frac{e}{e-(1+f)}$, and for the large buyback regime ($f \geq \frac{e-2}{2}$), the competitive ratio is $-W_{-1}\left(\frac{-1}{e(1+f)}\right)$, where $W_{-1}$ is the non-principal branch of the Lambert $W$ function. We also study variants of this model, such as matching with deterministic integral allocations. We again show a phase transition: for the small buyback regime ($f < \frac{1}{3}$), the optimal competitive ratio is $\frac{2}{1-f}$, while for the large buyback regime ($f \geq \frac{1}{3}$), the competitive ratio is $1 + 2f + 2\sqrt{f(1+f)}$. We further consider various extensions, including to configuration allocations and submodular welfare maximization, as well as negative values of $f$, modeling secondary supply channels or overflow capacities available at discounted rates. Our unifying primal-dual framework achieves the exact optimal competitive ratio across all these variants.
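As a numerical illustration of the two phase transitions stated above, the following Python sketch evaluates the quoted competitive ratios. The function names are hypothetical and not taken from the paper. The sketch assumes $f \geq 0$, and instead of calling a library Lambert $W$ routine it finds $r = -W_{-1}\left(\frac{-1}{e(1+f)}\right)$ by bisection on the equivalent equation $r - \ln r = 1 + \ln(1+f)$ with $r \geq 1$.

```python
import math

def neg_lambert_w_minus1(f):
    """Return r = -W_{-1}(-1 / (e * (1 + f))) for f >= 0.

    With w = -r, the defining equation w * exp(w) = -1 / (e * (1 + f))
    becomes r - ln(r) = 1 + ln(1 + f); the W_{-1} branch is r >= 1,
    where the left-hand side is increasing, so bisection applies.
    """
    target = 1.0 + math.log(1.0 + f)
    lo, hi = 1.0, 2.0
    while hi - math.log(hi) < target:   # grow the bracket until it contains r
        hi *= 2.0
    for _ in range(200):                # bisection to floating-point precision
        mid = 0.5 * (lo + hi)
        if mid - math.log(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fractional_ratio(f):
    """Competitive ratio quoted for the fractional (large capacity) setting."""
    if f < (math.e - 2.0) / 2.0:        # small buyback regime
        return math.e / (math.e - (1.0 + f))
    return neg_lambert_w_minus1(f)      # large buyback regime

def integral_ratio(f):
    """Competitive ratio quoted for deterministic integral allocations."""
    if f < 1.0 / 3.0:                   # small buyback regime
        return 2.0 / (1.0 - f)
    return 1.0 + 2.0 * f + 2.0 * math.sqrt(f * (1.0 + f))

if __name__ == "__main__":
    for f in (0.0, 0.2, (math.e - 2.0) / 2.0, 1.0 / 3.0, 1.0):
        print(f, fractional_ratio(f), integral_ratio(f))
```

Substituting the thresholds into the stated formulas shows that the two branches agree there: both fractional expressions equal 2 at $f = \frac{e-2}{2}$, and both integral expressions equal 3 at $f = \frac{1}{3}$.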

[407] arXiv:2210.14275 (replaced) [pdf, html, other]
Title: Similarity between Units of Natural Language: The Transition from Coarse to Fine Estimation
Wenchuan Mu
Comments: PhD thesis
Subjects: Computation and Language (cs.CL)

Capturing the similarities between human language units is crucial for explaining how humans associate different objects, and therefore its computation has received extensive attention, research, and applications. With the ever-increasing amount of information around us, calculating similarity becomes increasingly complex. In many cases, such as legal or medical affairs, measuring similarity requires extra care and precision, as small acts within a language unit can have significant real-world effects. My research goal in this thesis is to develop regression models that account for similarities between language units in a more refined way.
Computation of similarity has come a long way, but approaches to debugging the measures are often based on continually fitting human judgment values. To this end, my goal is to develop an algorithm that precisely catches loopholes in a similarity calculation. Furthermore, most methods have vague definitions of the similarities they compute and are often difficult to interpret. The proposed framework addresses both shortcomings. It constantly improves the model through catching different loopholes. In addition, every refinement of the model provides a reasonable explanation. The regression model introduced in this thesis is called progressively refined similarity computation, which combines attack testing with adversarial training. The similarity regression model of this thesis achieves state-of-the-art performance in handling edge cases.

[408] arXiv:2212.14511 (replaced) [pdf, html, other]
Title: Cost-Driven Representation Learning for Linear Quadratic Gaussian Control: Part I
Yi Tian, Kaiqing Zhang, Russ Tedrake, Suvrit Sra
Comments: 51 pages; extended journal version, with an end-to-end guarantee added
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC); Machine Learning (stat.ML)

We study the task of learning state representations from potentially high-dimensional observations, with the goal of controlling an unknown partially observable system. We pursue a cost-driven approach, where a dynamic model in some latent state space is learned by predicting the costs without predicting the observations or actions. In particular, we focus on an intuitive cost-driven state representation learning method for solving Linear Quadratic Gaussian (LQG) control, one of the most fundamental partially observable control problems. As our main results, we establish finite-sample guarantees of finding a near-optimal state representation function and a near-optimal controller using the directly learned latent model, for finite-horizon time-varying LQG control problems. To the best of our knowledge, despite various empirical successes, finite-sample guarantees of such a cost-driven approach remain elusive. Our result underscores the value of predicting multi-step costs, an idea that is key to our theory, and notably also an idea that is known to be empirically valuable for learning state representations. A second part of this work, that is to appear as Part II, addresses the infinite-horizon linear time-invariant setting; it also extends the results to an approach that implicitly learns the latent dynamics, inspired by the recent empirical breakthrough of MuZero in model-based reinforcement learning.

[409] arXiv:2304.08630 (replaced) [pdf, html, other]
Title: MFGLib: A Library for Mean-Field Games
Xin Guo, Anran Hu, Matteo Santamaria, Mahan Tajrobehkar, Junzi Zhang
Subjects: Computer Science and Game Theory (cs.GT)

Mean-field games (MFGs) are limiting models to approximate $N$-player games, with a number of applications. Despite the ever-growing numerical literature on computation of MFGs, there is no library that allows researchers and practitioners to easily create and solve their own MFG problems. The purpose of this document is to introduce MFGLib, an open-source Python library for solving general MFGs with a user-friendly and customizable interface. It serves as a handy tool for creating and analyzing generic MFG environments, along with embedded auto-tuners for all implemented algorithms. The package is distributed under the MIT license and the source code and documentation can be found at this https URL.

[410] arXiv:2306.02192 (replaced) [pdf, html, other]
Title: Correcting Auto-Differentiation in Neural-ODE Training
Yewei Xu, Shi Chen, Qin Li
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)

Does the use of auto-differentiation yield reasonable updates for deep neural networks (DNNs)? Specifically, when DNNs are designed to adhere to neural ODE architectures, can we trust the gradients provided by auto-differentiation? Through mathematical analysis and numerical evidence, we demonstrate that when neural networks employ high-order methods, such as Linear Multistep Methods (LMM) or Explicit Runge-Kutta Methods (ERK), to approximate the underlying ODE flows, brute-force auto-differentiation often introduces artificial oscillations in the gradients that prevent convergence. In the case of Leapfrog and 2-stage ERK, we propose simple post-processing techniques that effectively eliminate these oscillations, correct the gradient computation, and thus return accurate updates.

[411] arXiv:2310.03311 (replaced) [pdf, html, other]
Title: Deep Variational Multivariate Information Bottleneck -- A Framework for Variational Losses
Eslam Abdelaleem, Ilya Nemenman, K. Michael Martini
Subjects: Machine Learning (cs.LG); Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT); Data Analysis, Statistics and Probability (physics.data-an)

Variational dimensionality reduction methods are widely used for their accuracy, generative capabilities, and robustness. We introduce a unifying framework that generalizes both traditional and state-of-the-art methods. The framework is based on an interpretation of the multivariate information bottleneck, trading off the information preserved in an encoder graph (defining what to compress) against that in a decoder graph (defining a generative model for data). Using this approach, we rederive existing methods, including the deep variational information bottleneck, variational autoencoders, and deep multiview information bottleneck. We naturally extend the deep variational CCA (DVCCA) family to beta-DVCCA and introduce a new method, the deep variational symmetric information bottleneck (DVSIB). DSIB, the deterministic limit of DVSIB, connects to modern contrastive learning approaches such as Barlow Twins, among others. We evaluate these methods on Noisy MNIST and Noisy CIFAR-100, showing that algorithms better matched to the structure of the problem, like DVSIB and beta-DVCCA, produce better latent spaces as measured by classification accuracy, dimensionality of the latent variables, and sample efficiency, and consistently outperform other approaches under comparable conditions. Additionally, we benchmark against state-of-the-art models, achieving superior or competitive accuracy. Our results demonstrate that this framework can seamlessly incorporate diverse multi-view representation learning algorithms, providing a foundation for designing novel, problem-specific loss functions.

[412] arXiv:2312.02420 (replaced) [pdf, html, other]
Title: Repurposing SAM for User-Defined Semantics Aware Segmentation
Rohit Kundu, Sudipta Paul, Arindam Dutta, Amit K. Roy-Chowdhury
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The Segment Anything Model (SAM) excels at generating precise object masks from input prompts but lacks semantic awareness, failing to associate its generated masks with specific object categories. To address this limitation, we propose U-SAM, a novel framework that imbibes semantic awareness into SAM, enabling it to generate targeted masks for user-specified object categories. Given only object class names as input from the user, U-SAM provides pixel-level semantic annotations for images without requiring any labeled/unlabeled samples from the test data distribution. Our approach leverages synthetically generated or web crawled images to accumulate semantic information about the desired object classes. We then learn a mapping function between SAM's mask embeddings and object class labels, effectively enhancing SAM with granularity-specific semantic recognition capabilities. As a result, users can obtain meaningful and targeted segmentation masks for specific objects they request, rather than generic and unlabeled masks. We evaluate U-SAM on PASCAL VOC 2012 and MSCOCO-80, achieving significant mIoU improvements of +17.95% and +5.20%, respectively, over state-of-the-art methods. By transforming SAM into a semantically aware segmentation model, U-SAM offers a practical and flexible solution for pixel-level annotation across diverse and unseen domains in a resource-constrained environment.

[413] arXiv:2401.11666 (replaced) [pdf, html, other]
Title: P2DT: Mitigating Forgetting in task-incremental Learning with progressive prompt Decision Transformer
Zhiyuan Wang, Xiaoyang Qu, Jing Xiao, Bokui Chen, Jianzong Wang
Comments: Accepted by the 49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Catastrophic forgetting poses a substantial challenge for managing intelligent agents controlled by a large model, causing performance degradation when these agents face new tasks. In our work, we propose a novel solution - the Progressive Prompt Decision Transformer (P2DT). This method enhances a transformer-based model by dynamically appending decision tokens during new task training, thus fostering task-specific policies. Our approach mitigates forgetting in continual and offline reinforcement learning scenarios. Moreover, P2DT leverages trajectories collected via traditional reinforcement learning from all tasks and generates new task-specific tokens during training, thereby retaining knowledge from previous studies. Preliminary results demonstrate that our model effectively alleviates catastrophic forgetting and scales well with increasing task environments.

[414] arXiv:2401.11667 (replaced) [pdf, html, other]
Title: INCPrompt: Task-Aware incremental Prompting for Rehearsal-Free Class-incremental Learning
Zhiyuan Wang, Xiaoyang Qu, Jing Xiao, Bokui Chen, Jianzong Wang
Comments: Accepted by the 49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)
Subjects: Machine Learning (cs.LG)

This paper introduces INCPrompt, an innovative continual learning solution that effectively addresses catastrophic forgetting. INCPrompt's key innovation lies in its use of adaptive key-learner and task-aware prompts that capture task-relevant information. This unique combination encapsulates general knowledge across tasks and encodes task-specific knowledge. Our comprehensive evaluation across multiple continual learning benchmarks demonstrates INCPrompt's superiority over existing algorithms, showing its effectiveness in mitigating catastrophic forgetting while maintaining high performance. These results highlight the significant impact of task-aware incremental prompting on continual learning performance.

[415] arXiv:2401.14996 (replaced) [pdf, html, other]
Title: A Resolution-Based Interactive Proof System for UNSAT
Philipp Czerner, Javier Esparza, Valentin Krasotin, Adrian Krauss
Comments: 26 pages
Subjects: Logic in Computer Science (cs.LO)

Modern SAT or QBF solvers are expected to produce correctness certificates. However, certificates have worst-case exponential size (unless $\textsf{NP}=\textsf{coNP}$), and at recent SAT competitions the largest certificates of unsatisfiability are starting to reach terabyte size.
Recently, Couillard, Czerner, Esparza, and Majumdar suggested replacing certificates with interactive proof systems based on the $\textsf{IP}=\textsf{PSPACE}$ theorem. They presented an interactive protocol between a prover and a verifier for an extension of QBF. The overall running time of the protocol is linear in the time needed by a standard BDD-based algorithm, and the time invested by the verifier is polynomial in the size of the formula. (So, in particular, the verifier never has to read or process exponentially long certificates.) We call such an interactive protocol competitive with the BDD algorithm for solving QBF.
While BDD algorithms are state-of-the-art for certain classes of QBF instances, no modern (UN)SAT solver is based on BDDs. For this reason, we initiate the study of interactive certification for more practical SAT algorithms. In particular, we address the question whether interactive protocols can be competitive with some variant of resolution. We present two contributions. First, we prove a theorem that reduces the problem of finding competitive interactive protocols to finding an arithmetisation of formulas satisfying certain commutativity properties. (Arithmetisation is the fundamental technique underlying the $\textsf{IP}=\textsf{PSPACE}$ theorem.) Then, we apply the theorem to give the first interactive protocol for the Davis-Putnam resolution procedure. We also report on an implementation and give some experimental results.
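To make the notion of arithmetisation mentioned above concrete, here is a minimal Python sketch of the standard textbook arithmetisation of a CNF formula over a prime field (negation as $1-x$, conjunction as a product, disjunction via De Morgan). This is an assumption-laden illustration only: the abstract does not specify the arithmetisation used for Davis-Putnam resolution, which must satisfy additional commutativity properties and may differ. The function names and the DIMACS-style clause encoding are choices made for this sketch.

```python
from itertools import product

P = 2**61 - 1  # a Mersenne prime, used here as the field modulus

def arith_clause(clause, assignment):
    """Arithmetise a clause (list of nonzero ints, DIMACS style).

    A positive literal x maps to x, a negative literal to 1 - x, and the
    disjunction to 1 - prod(1 - literal), all computed modulo P.
    """
    all_false = 1
    for lit in clause:
        value = assignment[abs(lit)] % P
        if lit < 0:
            value = (1 - value) % P
        all_false = all_false * (1 - value) % P
    return (1 - all_false) % P

def arith_cnf(cnf, assignment):
    """Arithmetise a conjunction of clauses as the product of clause values."""
    result = 1
    for clause in cnf:
        result = result * arith_clause(clause, assignment) % P
    return result

def count_models(cnf, num_vars):
    """Sum the polynomial over all Boolean points (brute force).

    On 0/1 inputs the polynomial equals the truth value, so the sum counts
    satisfying assignments modulo P; an unsatisfiable formula sums to 0.
    """
    total = 0
    for bits in product((0, 1), repeat=num_vars):
        assignment = dict(zip(range(1, num_vars + 1), bits))
        total = (total + arith_cnf(cnf, assignment)) % P
    return total

if __name__ == "__main__":
    print(count_models([[1], [-1]], 1))     # unsatisfiable formula
    print(count_models([[1, 2]], 2))        # x1 OR x2
```

In an interactive protocol the verifier would not enumerate assignments as `count_models` does; the brute-force sum here only shows which quantity the prover is claiming to be zero for an unsatisfiable formula.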

[416] arXiv:2402.07911 (replaced) [pdf, html, other]
Title: From Metrics to Meaning: Time to Rethink Evaluation in Human-AI Collaborative Design
Sean P. Walton, Ben J. Evans, Alma A. M. Rahat, James Stovold, Jakub Vincalek
Comments: 31 pages, under review
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Neural and Evolutionary Computing (cs.NE)

As AI systems increasingly shape decision making in creative design contexts, understanding how humans engage with these tools has become a critical challenge for interactive intelligent systems research. This paper contributes a challenge to rethink how to evaluate human--AI collaborative systems, advocating for a more nuanced and multidimensional approach. Findings from one of the largest field studies to date (n = 808) of a human--AI co-creative system, The Genetic Car Designer, complemented by a controlled lab study (n = 12) are presented. The system is based on an interactive evolutionary algorithm where participants were tasked with designing a simple two-dimensional representation of a car. Participants were exposed to galleries of design suggestions generated by an intelligent system, MAP--Elites, and a random control. Results indicate that exposure to galleries generated by MAP--Elites significantly enhanced both cognitive and behavioural engagement, leading to higher-quality design outcomes. Crucially for the wider community, the analysis reveals that conventional evaluation methods, which often focus solely on behavioural and design quality metrics, fail to capture the full spectrum of user engagement. By considering the human--AI design process as a changing emotional, behavioural and cognitive state of the designer, we propose evaluating human--AI systems holistically and considering intelligent systems as a core part of the user experience -- not simply a back end tool.

[417] arXiv:2402.16114 (replaced) [pdf, html, other]
Title: Controlling Deformable Objects with Non-negligible Dynamics: a Shape-Regulation Approach to End-Point Positioning
Sebastien Tiburzio (1), Tomás Coleman (1), Daniel Feliu-Talegon (1), Cosimo Della Santina (1 and 2) ((1) Department of Cognitive Robotics, Delft University of Technology, Delft, The Netherlands, (2) Institute of Robotics and Mechatronics, German Aerospace Center (DLR), Oberpfaffenhofen, Germany)
Comments: 15 pages, 18 figures. Accepted for publication as a Regular Paper in the IEEE Transactions on Robotics (T-RO)
Subjects: Robotics (cs.RO)

Model-based manipulation of deformable objects has traditionally dealt with objects while neglecting their dynamics, thus mostly focusing on very lightweight objects at steady state. At the same time, soft robotic research has made considerable strides toward general modeling and control, despite soft robots and deformable objects being very similar from a mechanical standpoint. In this work, we leverage these recent results to develop a control-oriented, fully dynamic framework of slender deformable objects grasped at one end by a robotic manipulator. We introduce a dynamic model of this system using functional strain parameterizations and describe the manipulation challenge as a regulation control problem. This enables us to define a fully model-based control architecture, for which we can prove analytically closed-loop stability and provide sufficient conditions for steady state convergence to the desired state. The nature of this work is intended to be markedly experimental. We provide an extensive experimental validation of the proposed ideas, tasking a robot arm with controlling the distal end of six different cables, in a given planar position and orientation in space.

[418] arXiv:2403.02284 (replaced) [pdf, other]
Title: Graphical Quadratic Algebra
Dario Stein, Fabio Zanasi, Robin Piedeleu, Richard Samuelson
Subjects: Logic in Computer Science (cs.LO); Category Theory (math.CT); Optimization and Control (math.OC)

Convex analysis and Gaussian probability are tightly connected, as is most evident in the theory of linear regression. Our work introduces an algebraic perspective on this relationship, in the form of a diagrammatic calculus of string diagrams, called Graphical Quadratic Algebra (GQA). We show that GQA is a complete axiomatisation for the category of quadratic relations, a compositional formulation of quadratic problems. Moreover, we identify a sub-theory of GQA which is complete for the category of Gaussian probabilistic processes. We show how GQA may be used to study linear regression and probabilistic programming.

[419] arXiv:2403.04931 (replaced) [pdf, html, other]
Title: A Survey on Human-AI Collaboration with Large Foundation Models
Vanshika Vats, Marzia Binta Nizam, Minghao Liu, Ziyuan Wang, Richard Ho, Mohnish Sai Prasad, Vincent Titterton, Sai Venkat Malreddy, Riya Aggarwal, Yanwen Xu, Lei Ding, Jay Mehta, Nathan Grinnell, Li Liu, Sijia Zhong, Devanathan Nallur Gandamani, Xinyi Tang, Rohan Ghosalkar, Celeste Shen, Rachel Shen, Nafisa Hussain, Kesav Ravichandran, James Davis
Comments: Topic and scope refinement
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)

As the capabilities of artificial intelligence (AI) continue to expand rapidly, Human-AI (HAI) Collaboration, combining human intellect and AI systems, has become pivotal for advancing problem-solving and decision-making processes. The advent of Large Foundation Models (LFMs) has greatly expanded its potential, offering unprecedented capabilities by leveraging vast amounts of data to understand and predict complex patterns. At the same time, realizing this potential responsibly requires addressing persistent challenges related to safety, fairness, and control. This paper reviews the crucial integration of LFMs with HAI, highlighting both opportunities and risks. We structure our analysis around four areas: human-guided model development, collaborative design principles, ethical and governance frameworks, and applications in high-stakes domains. Our review shows that successful HAI systems are not the automatic result of stronger models but the product of careful, human-centered design. By identifying key open challenges, this survey aims to give insight into current and future research that turns the raw power of LFMs into partnerships that are reliable, trustworthy, and beneficial to society.

[420] arXiv:2403.09752 (replaced) [pdf, html, other]
Title: Explainable Machine Learning-Based Security and Privacy Protection Framework for Internet of Medical Things Systems
Ayoub Si-ahmed, Mohammed Ali Al-Garadi, Narhimene Boustia
Comments: 40 pages, 13 figures, 6 tables, journal paper
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

The Internet of Medical Things transcends traditional medical boundaries, enabling a transition from reactive treatment to proactive prevention. This innovative method revolutionizes healthcare by facilitating early disease detection and tailored care, particularly in chronic disease management, where IoMT automates treatments based on real-time health data collection. Nonetheless, its benefits are countered by significant security challenges that endanger the lives of its users due to the sensitivity and value of the processed data, thereby attracting malicious interests. Moreover, the utilization of wireless communication for data transmission exposes medical data to interception and tampering by cybercriminals. Additionally, anomalies may arise due to human error, network interference, or hardware malfunctions. In this context, anomaly detection based on Machine Learning (ML) is an interesting solution, but it comes up against obstacles in terms of explicability and privacy protection. To address these challenges, a new framework for Intrusion Detection Systems is introduced, leveraging Artificial Neural Networks for intrusion detection while utilizing Federated Learning for privacy preservation. Additionally, eXplainable Artificial Intelligence methods are incorporated to enhance model explanation and interpretation. The efficacy of the proposed framework is evaluated and compared with centralized approaches using multiple datasets containing network and medical data, simulating various attack types impacting the confidentiality, integrity, and availability of medical and physiological data. The results offer compelling evidence that the FL method performs comparably to the centralized method, demonstrating high performance. Additionally, it affords the dual advantage of safeguarding privacy and providing model explanation while adhering to ethical principles.

[421] arXiv:2404.00024 (replaced) [pdf, html, other]
Title: Hey, Teacher, (Don't) Leave Those Kids Alone: Standardizing HRI Education
Alexis E. Block
Comments: Presented at the Designing an Intro to HRI Course Workshop at HRI 2024 (arXiv:2403.05588)
Subjects: Robotics (cs.RO); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

Creating a standardized introductory course becomes more critical as the field of human-robot interaction (HRI) becomes more established. This paper outlines the key components necessary to provide an undergraduate with a sufficient foundational understanding of the interdisciplinary nature of this field and provides proposed course content. It emphasizes the importance of creating a course with theoretical and experimental components to accommodate all different learning preferences. This manuscript also advocates creating or adopting a universal platform to standardize the hands-on component of introductory HRI courses, regardless of university funding or size. Next, it recommends formal training in how to read scientific articles and stay up to date with the latest relevant papers. Finally, it provides detailed lecture content and project milestones for a 15-week semester. By creating a standardized course, researchers can ensure consistency and quality are maintained across institutions, which will help students as well as industrial and academic employers understand what foundational knowledge is expected.

[422] arXiv:2404.08217 (replaced) [pdf, other]
Title: Escape with Your Self: Sound and Expressive Bidirectional Typing with Avoidance for Reachability Types
Songlin Jia, Guannan Wei, Siyuan He, Yuyan Bao, Tiark Rompf
Subjects: Programming Languages (cs.PL)

Algorithmic type checking and inference of reachability types present a particular challenge with regard to subtyping. As a restricted form of dependent types, reachability types are subject to the avoidance problem: a variable mentioned in types becomes ill-scoped when its defining scope ends. Prior works thus introduce self-references, akin to "this" pointers in OO languages, to replace the escaping variable, so that an escaping object's "this" pointer can serve as the new logical owner of any captured resources. Nevertheless, conversions involving self-references require reasoning about function qualifiers. As prior work isolates subtyping judgements from associated qualifiers, their system requires manually-inserted term-level coercions (i.e., $\eta$-expansion) to support escaping values. This, of course, is highly unsatisfactory for algorithmic avoidance.
In this work, we propose the first typing algorithm for reachability types with formal soundness guarantees, and with an avoidance strategy based entirely on subtyping. We first present a refined declarative reachability type system, $G_{<:}^\blacklozenge$, which includes an expressive self-aware subtyping theory for self-references, and is built on algorithmic contexts where holes can reside in partially specified qualifiers. On top of that, we develop the bidirectional typing system, $G_\leftrightharpoons^\blacklozenge$, which infers qualifiers by a lightweight unification mechanism, and converts types automatically for avoidance. $G_{<:}^\blacklozenge$ is proven sound by a logical relation, and $G_\leftrightharpoons^\blacklozenge$ is proven decidable and sound with respect to $G_{<:}^\blacklozenge$. The result is an end-to-end formally verified type checker, implemented and mechanized in Lean, which is able to type-check challenging example programs such as escaping Church-encoded data types.

[423] arXiv:2405.06208 (replaced) [pdf, html, other]
Title: A Lock-free Binary Trie
Jeremy Ko
Subjects: Data Structures and Algorithms (cs.DS)

A binary trie is a sequential data structure for a dynamic set on the universe $\{0,\dots,u-1\}$ supporting Search with $O(1)$ worst-case step complexity, and Insert, Delete, and Predecessor operations with $O(\log u)$ worst-case step complexity.
We give a wait-free implementation of a relaxed binary trie, using read, write, CAS, and ($\log u$)-bit AND operations. It supports all operations with the same worst-case step complexity as the sequential binary trie. However, Predecessor operations may not return a key when there are concurrent update operations. We use this as a component of a lock-free, linearizable implementation of a binary trie. It supports Search with $O(1)$ worst-case step complexity and Insert, Delete and Predecessor with $O(c^2 + \log u)$ amortized step complexity, where $c$ is a measure of the contention.
A lock-free binary trie is more challenging to implement than many other lock-free data structures because Insert and Delete operations perform a non-constant number of modifications to the binary trie in the worst case to ensure the correctness of Predecessor operations.
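For orientation, the sequential binary trie that the abstract starts from can be sketched as follows in Python. This is a single-threaded illustration only, not the wait-free or lock-free construction of the paper; the class name, method names, and the array (heap) layout are assumptions made for this sketch. Each leaf bit records membership of one key and each internal bit is the OR of its two children, so Insert and Delete touch up to $\log u$ nodes, which is the non-constant number of modifications the abstract refers to.

```python
class BinaryTrie:
    """Sequential binary trie over the universe {0, ..., 2**k - 1}."""

    def __init__(self, k):
        self.u = 1 << k
        # Heap layout: root at index 1, leaf for key x at index u + x.
        self.bits = [False] * (2 * self.u)

    def search(self, x):
        """O(1): read the leaf bit."""
        return self.bits[self.u + x]

    def insert(self, x):
        """O(log u): set the leaf and every unset ancestor."""
        node = self.u + x
        while node >= 1 and not self.bits[node]:
            self.bits[node] = True
            node //= 2

    def delete(self, x):
        """O(log u): clear the leaf, then ancestors whose subtree is empty."""
        node = self.u + x
        if not self.bits[node]:
            return
        self.bits[node] = False
        while node > 1 and not self.bits[node ^ 1]:  # sibling is empty too
            node //= 2
            self.bits[node] = False

    def predecessor(self, y):
        """O(log u): largest key strictly less than y, or None."""
        node = self.u + y
        # Climb until we are a right child whose left sibling is non-empty.
        while node > 1:
            if node % 2 == 1 and self.bits[node - 1]:
                node -= 1
                break
            node //= 2
        else:
            return None
        # Descend to the rightmost set leaf in that sibling's subtree.
        while node < self.u:
            node = 2 * node + 1 if self.bits[2 * node + 1] else 2 * node
        return node - self.u


if __name__ == "__main__":
    trie = BinaryTrie(4)
    for key in (3, 5, 9):
        trie.insert(key)
    print(trie.predecessor(9), trie.predecessor(3))
```

A concurrent Predecessor that reads these bits while another thread is halfway through the upward walk of `insert` or `delete` can observe an inconsistent path, which gives some intuition for why the relaxed trie in the abstract allows Predecessor to return no key under concurrent updates.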

[424] arXiv:2405.06464 (replaced) [pdf, other]
Title: Single-seed generation of Brownian paths and integrals for adaptive and high order SDE solvers
Andraž Jelinčič, James Foster, Patrick Kidger
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Probability (math.PR); Computation (stat.CO)

Despite the success of adaptive time-stepping in ODE simulation, it has so far seen few applications for Stochastic Differential Equations (SDEs). To simulate SDEs adaptively, methods such as the Virtual Brownian Tree (VBT) have been developed, which can generate Brownian motion (BM) non-chronologically. However, in most applications, knowing only the values of Brownian motion is not enough to achieve a high order of convergence; for that, we must compute time-integrals of BM such as $\int_s^t W_r \, dr$. With the aim of using high order SDE solvers adaptively, we extend the VBT to generate these integrals of BM in addition to the Brownian increments. A JAX-based implementation of our construction is included in the popular Diffrax library (this https URL).
Since the entire Brownian path produced by VBT is uniquely determined by a single PRNG seed, previously generated samples need not be stored, which results in a constant memory footprint and enables experiment repeatability and strong error estimation. Based on binary search, the VBT's time complexity is logarithmic in the tolerance parameter $\varepsilon$. Unlike the original VBT algorithm, which was only precise at some dyadic times, we prove that our construction exactly matches the joint distribution of the Brownian motion and its time integrals at any query times, provided they are at least $\varepsilon$ apart.
We present two applications of adaptive high order solvers enabled by our new VBT. Using adaptive solvers to simulate a high-volatility CIR model, we achieve more than twice the convergence order of constant stepping. We apply an adaptive third order underdamped or kinetic Langevin solver to an MCMC problem, where our approach outperforms the No U-Turn Sampler, while using only a tenth of its function evaluations.
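The single-seed idea described above can be illustrated with a heavily simplified Python sketch. It covers only the basic Brownian-bridge bisection for values of $W_t$ on $[0, 1]$: it does not generate the time integrals, does not reproduce the exact-distribution guarantee of the paper, and is not the Diffrax API. The function name, the node-numbering scheme, and the way per-node generators are derived from the seed are all assumptions of this sketch.

```python
import math
import random

def brownian_value(t, seed, tol=1e-6):
    """Approximate W(t) for t in [0, 1], determined entirely by `seed`.

    Every node of the dyadic tree gets its own deterministic normal draw,
    so nothing has to be stored between queries. `seed` is assumed to be
    a non-negative integer.
    """
    def normal(node):
        # Hypothetical scheme: derive a per-node generator from (seed, node).
        return random.Random((seed << 64) | node).gauss(0.0, 1.0)

    s, u = 0.0, 1.0
    ws, wu = 0.0, normal(0)          # W(0) = 0 and W(1) ~ N(0, 1)
    node = 1                         # root of the tree; children are 2n, 2n + 1
    while u - s > tol:
        m = 0.5 * (s + u)
        # Brownian bridge: W(m) | W(s), W(u) ~ N((W(s) + W(u)) / 2, (u - s) / 4)
        wm = 0.5 * (ws + wu) + math.sqrt((u - s) / 4.0) * normal(node)
        if t < m:
            u, wu, node = m, wm, 2 * node
        else:
            s, ws, node = m, wm, 2 * node + 1
    # Linear interpolation inside the final interval of width <= tol.
    return ws + (wu - ws) * (t - s) / (u - s)

if __name__ == "__main__":
    print(brownian_value(0.25, seed=7), brownian_value(0.75, seed=7))
```

Because each query walks at most about $\log_2(1/\text{tol})$ levels and recomputes the draws it needs, memory use stays constant and the same seed yields the same path regardless of query order, mirroring the repeatability property the abstract describes.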

[425] arXiv:2405.19229 (replaced) [pdf, html, other]
Title: On Generating Monolithic and Model Reconciling Explanations in Probabilistic Scenarios
Stylianos Loukas Vasileiou, William Yeoh, Alessandro Previti, Tran Cao Son
Subjects: Artificial Intelligence (cs.AI)

Explanation generation frameworks aim to make AI systems' decisions transparent and understandable to human users. However, generating explanations in uncertain environments characterized by incomplete information and probabilistic models remains a significant challenge. In this paper, we propose a novel framework for generating probabilistic monolithic explanations and model reconciling explanations. Monolithic explanations provide self-contained reasons for an explanandum without considering the agent receiving the explanation, while model reconciling explanations account for the knowledge of the agent receiving the explanation. For monolithic explanations, our approach integrates uncertainty by utilizing probabilistic logic to increase the probability of the explanandum. For model reconciling explanations, we propose a framework that extends the logic-based variant of the model reconciliation problem to account for probabilistic human models, where the goal is to find explanations that increase the probability of the explanandum while minimizing conflicts between the explanation and the probabilistic human model. We introduce explanatory gain and explanatory power as quantitative metrics to assess the quality of these explanations. Further, we present algorithms that exploit the duality between minimal correction sets and minimal unsatisfiable sets to efficiently compute both types of explanations in probabilistic contexts. Extensive experimental evaluations on various benchmarks demonstrate the effectiveness and scalability of our approach in generating explanations under uncertainty.
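The duality between minimal correction sets (MCSes) and minimal unsatisfiable sets (MUSes) mentioned above can be checked by brute force on a toy example. This is a minimal sketch for the plain propositional case only: the four clauses, the variable names, and the helper functions are illustrative assumptions, and nothing here reproduces the paper's probabilistic algorithms.

```python
from itertools import combinations, product

# Toy clause set (assumed for illustration): each clause is a list of
# (variable, polarity) literals. Clauses: x, not x, (x or y), not y.
CLAUSES = [
    [("x", True)],
    [("x", False)],
    [("x", True), ("y", True)],
    [("y", False)],
]
VARS = ["x", "y"]
ALL = frozenset(range(len(CLAUSES)))


def satisfiable(idxs):
    # Brute-force SAT check over all assignments of VARS.
    for values in product([False, True], repeat=len(VARS)):
        assign = dict(zip(VARS, values))
        if all(any(assign[v] == pol for v, pol in CLAUSES[i]) for i in idxs):
            return True
    return False


def minimal_sets(predicate):
    # Enumerate subsets smallest-first and keep those that satisfy the
    # predicate and contain no previously found (smaller) satisfying set.
    found = []
    for size in range(len(ALL) + 1):
        for combo in combinations(sorted(ALL), size):
            s = frozenset(combo)
            if any(f <= s for f in found):
                continue
            if predicate(s):
                found.append(s)
    return found


# MUS: minimal subset of clauses that is unsatisfiable.
muses = minimal_sets(lambda s: not satisfiable(s))
# MCS: minimal subset whose removal makes the remaining clauses satisfiable.
mcses = minimal_sets(lambda s: satisfiable(ALL - s))
# Minimal hitting sets of the MUSes (sets that intersect every MUS).
hitting = minimal_sets(lambda s: all(s & m for m in muses))
```

On this example the minimal hitting sets of the MUSes coincide with the MCSes, which is the hitting-set duality that such algorithms exploit; real solvers avoid the exponential enumeration used here.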

[426] arXiv:2406.09701 (replaced) [pdf, html, other]
Title: Towards Explainable Vulnerability Detection with Large Language Models
Qiheng Mao, Zhenhao Li, Xing Hu, Kui Liu, Xin Xia, Jianling Sun
Subjects: Software Engineering (cs.SE)

Software vulnerabilities pose significant risks to the security and integrity of software systems. Although prior studies have explored vulnerability detection using deep learning and pre-trained models, these approaches often fail to provide the detailed explanations necessary for developers to understand and remediate vulnerabilities effectively. The advent of large language models (LLMs) has introduced transformative potential due to their advanced generative capabilities and ability to comprehend complex contexts, offering new possibilities for addressing these challenges. In this paper, we propose LLMVulExp, an automated framework designed to specialize LLMs for the dual tasks of vulnerability detection and explanation. To address the challenges of acquiring high-quality annotated data and injecting domain-specific knowledge, LLMVulExp leverages prompt-based techniques for annotating vulnerability explanations and finetunes LLMs using instruction tuning with Low-Rank Adaptation (LoRA), enabling LLMVulExp to detect vulnerability types in code while generating detailed explanations, including the cause, location, and repair suggestions. Additionally, we employ a Chain-of-Thought (CoT) based key code extraction strategy to focus LLMs on analyzing vulnerability-prone code, further enhancing detection accuracy and explanatory depth. Our experimental results demonstrate that LLMVulExp achieves over a 90% F1 score on the SeVC dataset, effectively combining high detection accuracy with actionable and coherent explanations. This study highlights the feasibility of utilizing LLMs for real-world vulnerability detection and explanation tasks, providing critical insights into their adaptation and application in software security.

[427] arXiv:2406.13748 (replaced) [pdf, html, other]
Title: Learn and Unlearn: Addressing Misinformation in Multilingual LLMs
Taiming Lu, Philipp Koehn
Comments: EMNLP 2025 Main Conference
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

This paper investigates the propagation of harmful information in multilingual large language models (LLMs) and evaluates the efficacy of various unlearning methods. We demonstrate that fake information, regardless of the language it is in, once introduced into these models through training data, can spread across different languages, compromising the integrity and reliability of the generated content. Our findings reveal that standard unlearning techniques, which typically focus on English data, are insufficient in mitigating the spread of harmful content in multilingual contexts and could inadvertently reinforce harmful content across languages. We show that only by addressing harmful responses in both English and the original language of the harmful data can we effectively eliminate generations for all languages. This underscores the critical need for comprehensive unlearning strategies that consider the multilingual nature of modern LLMs to enhance their safety and reliability across diverse linguistic landscapes.

[428] arXiv:2406.14427 (replaced) [pdf, html, other]
Title: Frugal inference for control
Itzel Olivos-Castillo, Paul Schrater, Xaq Pitkow
Subjects: Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)

A key challenge in advancing artificial intelligence is achieving the right balance between utility maximization and resource use by both external movement and internal computation. While this trade-off has been studied in fully observable settings, our understanding of resource efficiency in partially observable environments remains limited. Motivated by this challenge, we develop a version of the POMDP framework where the information gained through inference is treated as a resource that must be optimized alongside task performance and motion effort. By solving this problem in environments described by linear-Gaussian dynamics, we uncover fundamental principles of resource efficiency. Our study reveals a phase transition in the inference, switching from a Bayes-optimal approach to one that strategically leaves some uncertainty unresolved. This frugal behavior gives rise to a structured family of equally effective strategies, facilitating adaptation to later objectives and constraints overlooked during the original optimization. We illustrate the applicability of our framework and the generality of the principles we derived using two nonlinear tasks. Overall, this work provides a foundation for a new type of rational computation that both brains and machines could use for effective but resource-efficient control under uncertainty.

[429] arXiv:2406.15486 (replaced) [pdf, html, other]
Title: SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
Qianchao Zhu, Jiangfei Duan, Chang Chen, Siran Liu, Guanyu Feng, Xin Lv, Xiao Chuanfu, Dahua Lin, Chao Yang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Large language models (LLMs) now support extremely long context windows, but the quadratic complexity of vanilla attention results in significantly long Time-to-First-Token (TTFT) latency. Existing approaches to address this complexity require additional pretraining or finetuning, and often sacrifice model accuracy. In this paper, we first provide both theoretical and empirical foundations for near-lossless sparse attention. We find that dynamically capturing head-specific sparse patterns at runtime with low overhead is crucial. To address this, we propose SampleAttention, an adaptive structured and near-lossless sparse attention. Leveraging observed significant sparse patterns, SampleAttention attends to a fixed percentage of adjacent tokens to capture local window patterns, and employs a two-stage query-guided key-value filtering approach, which adaptively selects a minimum set of key-values with low overhead, to capture column stripe patterns. Comprehensive evaluations show that SampleAttention can seamlessly replace vanilla attention in off-the-shelf LLMs with nearly no accuracy loss, and reduces TTFT by up to $2.42\times$ compared with FlashAttention.
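The two structured patterns named above, a local window and column stripes, can be pictured as a boolean attention mask. The sketch below only builds such a mask under stated assumptions: the function name `sparse_mask` is hypothetical, the mask is causal, and the stripe columns are passed in by hand rather than chosen by the paper's two-stage query-guided filtering, which is not reproduced here.

```python
import math


def sparse_mask(n, window_ratio, stripe_cols):
    """Causal mask over n tokens: mask[q][k] is True if query q attends to key k.

    window_ratio: fraction of the sequence covered by the local window (assumed knob).
    stripe_cols:  key positions that every later query attends to (assumed input).
    """
    window = max(1, math.ceil(window_ratio * n))
    stripes = set(stripe_cols)
    mask = []
    for q in range(n):
        row = []
        for k in range(n):
            causal = k <= q                 # never attend to future tokens
            local = q - k < window          # key lies within the local window
            row.append(causal and (local or k in stripes))
        mask.append(row)
    return mask


def density(mask):
    # Fraction of allowed (query, key) pairs among all causal pairs.
    n = len(mask)
    kept = sum(sum(row) for row in mask)
    return kept / (n * (n + 1) / 2)
```

For example, `sparse_mask(6, 0.5, [0])` lets the last query attend to keys 3, 4, and 5 through the window and to key 0 through the stripe, while keys 1 and 2 are skipped.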

[430] arXiv:2406.15596 (replaced) [pdf, html, other]
Title: DiVerify: Hardening Identity-Based Software Signing with Programmable Diverse-Context Scopes
Chinenye L. Okafor, Trishank Kuppusamy, James C. Davis, Santiago Torres-Arias
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)

Code signing enables software developers to digitally sign their code using cryptographic keys, thereby associating the code with a specific key. This key is then linked to an identity (e.g., through an identity provider), allowing users to establish trust in the origin of the signature and verify both the code's origin and integrity. However, this code-identity binding is only as trustworthy as the mechanisms enforcing it. State-of-the-art identity-based code signing schemes have a major shortcoming: they fail to provide verifiable information about the context in which a signature is generated. If an identity verification server is compromised or the signing client behaves maliciously, the resulting signature may falsely suggest a trustworthy origin, despite the absence of actual developer intent.
To address these issues, we propose a diverse identity verification approach that reduces reliance on a single source of verification and enforces stronger guarantees around the signing process itself. By combining multiple identity signals with verifiable execution environments, our system improves confidence that signatures reflect the intent of a legitimate user, produced under expected conditions. Signing in our DiVerify prototype incurs only a few kilobytes of additional storage, less than 0.4% of the average package size in widely used ecosystems like PyPI, and signing completes in under 100 ms on a typical deployment.

[431] arXiv:2406.17642 (replaced) [pdf, other]
Title: Banishing LLM Hallucinations Requires Rethinking Generalization
Johnny Li, Saksham Consul, Eda Zhou, James Wong, Naila Farooqui, Yuxin Ye, Nithyashree Manohar, Zhuxiaona Wei, Tian Wu, Ben Echols, Sharon Zhou, Gregory Diamos
Comments: I want to revisit some of the experiments in this paper, specifically figure 5
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Despite their powerful chat, coding, and reasoning abilities, Large Language Models (LLMs) frequently hallucinate. Conventional wisdom suggests that hallucinations are a consequence of a balance between creativity and factuality, which can be mitigated, but not eliminated, by grounding the LLM in external knowledge sources. Through extensive systematic experiments, we show that these traditional approaches fail to explain why LLMs hallucinate in practice. Specifically, we show that LLMs augmented with a massive Mixture of Memory Experts (MoME) can easily memorize large datasets of random numbers. We corroborate these experimental findings with a theoretical construction showing that simple neural networks trained to predict the next token hallucinate when the training loss is above a threshold, as it usually is in practice when training on internet-scale data. We interpret our findings by comparing against traditional retrieval methods for mitigating hallucinations. We use our findings to design a first generation model for removing hallucinations -- Lamini-1 -- that stores facts in a massive mixture of millions of memory experts that are retrieved dynamically.

[432] arXiv:2407.00079 (replaced) [pdf, html, other]
Title: Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
Ruoyu Qin, Zheming Li, Weiran He, Mingxing Zhang, Yongwei Wu, Weimin Zheng, Xinran Xu
Comments: 23 pages, 13 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. It features a KVCache-centric disaggregated architecture that separates the prefill and decoding clusters. It also leverages the underutilized CPU, DRAM, and SSD resources of the GPU cluster to implement a disaggregated cache of KVCache. The core of Mooncake is its KVCache-centric scheduler, which maximizes overall effective throughput while meeting latency-related Service Level Objectives (SLOs). Unlike traditional studies that assume all requests will be processed, Mooncake faces challenges due to highly overloaded scenarios. To mitigate these, we developed a prediction-based early rejection policy. Experiments show that Mooncake excels in long-context scenarios. Compared to the baseline method, Mooncake can achieve up to a 525% increase in throughput in certain simulated scenarios while adhering to SLOs. Under real workloads, Mooncake's innovative architecture enables Kimi to handle 75% more requests.

[433] arXiv:2408.07897 (replaced) [pdf, html, other]
Title: The Nah Bandit: Modeling User Non-compliance in Recommendation Systems
Tianyue Zhou, Jung-Hoon Cho, Cathy Wu
Comments: 12 pages, 8 figures, accepted by IEEE Transactions on Control of Network Systems
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Multiagent Systems (cs.MA); Systems and Control (eess.SY)

Recommendation systems now pervade the digital world, ranging from advertising to entertainment. However, it remains challenging to implement effective recommendation systems in the physical world, such as in mobility or health. This work focuses on a key challenge: in the physical world, it is often easy for the user to opt out of taking a recommendation if it is not to her liking, and to fall back to her baseline behavior. It is thus crucial in cyber-physical recommendation systems to operate with an interaction model that is aware of such user behavior, lest the user abandon the recommendations altogether. This paper thus introduces the Nah Bandit, a tongue-in-cheek reference to describe a Bandit problem where users can say `nah' to the recommendation and opt for their preferred option instead. As such, this problem lies in between a typical bandit setup and supervised learning. We model the user non-compliance by parameterizing an anchoring effect of recommendations on users. We then propose the Expert with Clustering (EWC) algorithm, a hierarchical approach that incorporates feedback from both recommended and non-recommended options to accelerate user preference learning. In a recommendation scenario with $N$ users, $T$ rounds per user, and $K$ clusters, EWC achieves a regret bound of $O(N\sqrt{T\log K} + NT)$, achieving superior theoretical performance in the short term compared to the LinUCB algorithm. Experimental results also highlight that EWC outperforms both supervised learning and traditional contextual bandit approaches. This advancement reveals that effective use of non-compliance feedback can accelerate preference learning and improve recommendation accuracy. This work lays the foundation for future research in Nah Bandit, providing a robust framework for more effective recommendation systems.

[434] arXiv:2408.16206 (replaced) [pdf, html, other]
Title: RMMI: Reactive Mobile Manipulation using an Implicit Neural Map
Nicolas Marticorena, Tobias Fischer, Jesse Haviland, Niko Suenderhauf
Comments: 8 pages, 6 figures, accepted to the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025
Subjects: Robotics (cs.RO)

Mobile manipulator robots operating in complex domestic and industrial environments must effectively coordinate their base and arm motions while avoiding obstacles. While current reactive control methods gracefully achieve this coordination, they rely on simplified and idealised geometric representations of the environment to avoid collisions. This limits their performance in cluttered environments. To address this problem, we introduce RMMI, a reactive control framework that leverages the ability of neural Signed Distance Fields (SDFs) to provide a continuous and differentiable representation of the environment's geometry. RMMI formulates a quadratic program that optimises jointly for robot base and arm motion, maximises the manipulability, and avoids collisions through a set of inequality constraints. These constraints are constructed by querying the SDF for the distance and direction to the closest obstacle for a large number of sampling points on the robot. We evaluate RMMI both in simulation and in a set of real-world experiments. For reaching in cluttered environments, we observe a 25% increase in success rate. For additional details, code, and experiment videos, please visit this https URL.

[435] arXiv:2409.00061 (replaced) [pdf, other]
Title: Enhancing Natural Language Inference Performance with Knowledge Graph for COVID-19 Automated Fact-Checking in Indonesian Language
Arief Purnama Muharram, Ayu Purwarianti
Comments: Accepted for publication in the Journal of ICT Research and Applications (JICTRA)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Automated fact-checking is a key strategy to overcome the spread of COVID-19 misinformation on the internet. These systems typically leverage deep learning approaches through Natural Language Inference (NLI) to verify the truthfulness of information based on supporting evidence. However, one challenge that arises in deep learning is performance stagnation due to a lack of knowledge during training. This study proposes using a Knowledge Graph (KG) as external knowledge to enhance NLI performance for automated COVID-19 fact-checking in the Indonesian language. The proposed model architecture comprises three modules: a fact module, an NLI module, and a classifier module. The fact module processes information from the KG, while the NLI module handles semantic relationships between the given premise and hypothesis. The representation vectors from both modules are concatenated and fed into the classifier module to produce the final result. The model was trained using the generated Indonesian COVID-19 fact-checking dataset and the COVID-19 KG Bahasa Indonesia. Our study demonstrates that incorporating KGs can significantly improve NLI performance in fact-checking, achieving the best accuracy of 0.8616. This suggests that KGs are a valuable component for enhancing NLI performance in automated fact-checking.

[436] arXiv:2409.01120 (replaced) [pdf, other]
Title: Coverage and metadata completeness and accuracy of African research publications in OpenAlex: A comparative analysis
Patricia Alonso-Alvarez, Nees Jan van Eck
Subjects: Digital Libraries (cs.DL)

Unlike traditional proprietary data sources such as Scopus and the Web of Science (WoS), OpenAlex emphasizes its comprehensiveness. This study analyzes OpenAlex coverage and metadata completeness and accuracy of African research publications. To achieve this, OpenAlex is compared with Scopus, WoS, and African Journals Online (AJOL). First, we examine the coverage of African research publications in OpenAlex relative to Scopus, WoS, and AJOL. Then, we assess and compare the availability and accuracy of metadata in OpenAlex, Scopus, and WoS. The findings indicate that OpenAlex offers the most extensive publication coverage. In terms of metadata, OpenAlex provides high coverage for publication and author information, though its coverage of affiliations, references, and funder information is comparatively lower. Metadata accuracy is similarly high for publication and author fields, while affiliation, reference, and funding information show higher rates of missing or incomplete data. Notably, the results demonstrate that both metadata availability and accuracy in OpenAlex improve significantly for publications also indexed in Scopus and WoS. These findings suggest that OpenAlex has the potential to replace proprietary data sources for certain types of analyses. However, for some metadata fields, there remains a trade-off between extensiveness and accuracy.

[437] arXiv:2409.06509 (replaced) [pdf, html, other]
Title: Aligning Machine and Human Visual Representations across Abstraction Levels
Lukas Muttenthaler, Klaus Greff, Frieda Born, Bernhard Spitzer, Simon Kornblith, Michael C. Mozer, Klaus-Robert Müller, Thomas Unterthiner, Andrew K. Lampinen
Comments: 91 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Deep neural networks have achieved success across a wide range of applications, including as models of human behavior and neural representations in vision tasks. However, neural network training and human learning differ in fundamental ways, and neural networks often fail to generalize as robustly as humans do, raising questions regarding the similarity of their underlying representations. What is missing for modern learning systems to exhibit more human-aligned behavior? We highlight a key misalignment between vision models and humans: whereas human conceptual knowledge is hierarchically organized from fine- to coarse-scale distinctions, model representations do not accurately capture all these levels of abstraction. To address this misalignment, we first train a teacher model to imitate human judgments, then transfer human-aligned structure from its representations to refine the representations of pretrained state-of-the-art vision foundation models via finetuning. These human-aligned models more accurately approximate human behavior and uncertainty across a wide range of similarity tasks, including a new dataset of human judgments spanning multiple levels of semantic abstractions. They also perform better on a diverse set of machine learning tasks, increasing generalization and out-of-distribution robustness. Thus, infusing neural networks with additional human knowledge yields a best-of-both-worlds representation that is both more consistent with human cognitive judgments and more practically useful, paving the way toward more robust, interpretable, and human-aligned artificial intelligence systems.

[438] arXiv:2409.11017 (replaced) [pdf, html, other]
Title: Stretchable Electrohydraulic Artificial Muscle for Full Motion Ranges in Musculoskeletal Antagonistic Joints
Amirhossein Kazemipour, Ronan Hinchet, Robert K. Katzschmann
Comments: This paper has been accepted to the IEEE International Conference on Robotics and Automation (ICRA) 2025
Subjects: Robotics (cs.RO)

Artificial muscles play a crucial role in musculoskeletal robotics and prosthetics to approximate the force-generating functionality of biological muscle. However, current artificial muscle systems are typically limited to either contraction or extension, not both. This limitation hinders the development of fully functional artificial musculoskeletal systems. We address this challenge by introducing an artificial antagonistic muscle system capable of both contraction and extension. Our design integrates non-stretchable electrohydraulic soft actuators (HASELs) with electrostatic clutches within an antagonistic musculoskeletal framework. This configuration enables an antagonistic joint to achieve a full range of motion without displacement loss due to tendon slack. We implement a synchronization method to coordinate muscle and clutch units, ensuring smooth motion profiles and speeds. This approach facilitates seamless transitions between antagonistic muscles at operational frequencies of up to 3.2 Hz. While our prototype utilizes electrohydraulic actuators, this muscle-clutch concept is adaptable to other non-stretchable artificial muscles, such as McKibben actuators, expanding their capability for extension and full range of motion in antagonistic setups. Our design represents a significant advancement in the development of fundamental components for more functional and efficient artificial musculoskeletal systems, bringing their capabilities closer to those of their biological counterparts.

[439] arXiv:2409.11171 (replaced) [pdf, other]
Title: Preventing Inactive CBF Safety Filters Caused by Invalid Relative Degree Assumptions
Lukas Brunke, Siqi Zhou, Angela P. Schoellig
Comments: 8 pages, 4 figures, accepted for publication in the IEEE Transactions on Automatic Control
Subjects: Systems and Control (eess.SY)

Control barrier function (CBF) safety filters have emerged as a popular framework to certify and modify potentially unsafe control inputs, for example, those provided by a reinforcement learning agent or a non-expert user. Typical CBF safety filter designs assume that the system has a uniform relative degree. This assumption is restrictive and is frequently overlooked in practice. When violated, the assumption can cause the safety filter to become inactive, allowing large and possibly unsafe control inputs to be applied to the system. In discrete-time implementations, the inactivity issue often manifests as chattering close to the safety boundary and/or constraint violations. In this work, we provide an in-depth discussion on the safety filter inactivity issue, propose a mitigation strategy based on multiple CBFs, and derive an upper bound on the sampling time for safety under sampled-data control. The effectiveness of our proposed method is validated through both simulation and quadrotor experiments.
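The inactivity issue described above can be seen in the textbook single-constraint CBF filter, which minimizes $\|u - u_{\mathrm{nom}}\|^2$ subject to $L_f h + L_g h \, u + \alpha h \ge 0$. The sketch below is a minimal closed-form version under stated assumptions: a single CBF, a linear class-K term $\alpha h$, and no input bounds. The function name `cbf_filter` and its arguments are hypothetical, and the paper's multi-CBF mitigation is not reproduced.

```python
def cbf_filter(u_nom, lf_h, lg_h, h, alpha=1.0):
    """Project u_nom onto the half-space {u : lf_h + lg_h . u + alpha * h >= 0}.

    u_nom: nominal control input (list of floats)
    lf_h:  Lie derivative of h along the drift dynamics (float)
    lg_h:  Lie derivative of h along the input matrix (list of floats)
    h:     current value of the barrier function (float)
    """
    slack = lf_h + sum(a * u for a, u in zip(lg_h, u_nom)) + alpha * h
    norm_sq = sum(a * a for a in lg_h)
    if slack >= 0.0:
        return list(u_nom)  # nominal input already satisfies the constraint
    if norm_sq == 0.0:
        # L_g h vanishes: the constraint does not depend on u, so the filter
        # cannot modify the input and passes u_nom through unchanged.
        return list(u_nom)
    scale = -slack / norm_sq
    return [u + scale * a for u, a in zip(u_nom, lg_h)]
```

For a single integrator $\dot{x} = u$ with $h(x) = 1 - x$ at $x = 0.9$, the call `cbf_filter([2.0], 0.0, [-1.0], 0.1)` reduces the input to about 0.1, while the same call with `lg_h=[0.0]` returns the nominal input untouched, which is the inactive case.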

[440] arXiv:2409.11661 (replaced) [pdf, html, other]
Title: Bridging the Domain Gap for Flight-Ready Spaceborne Vision
Tae Ha Park, Simone D'Amico
Comments: Accepted to Journal of Spacecraft and Rockets
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This work presents Spacecraft Pose Network v3 (SPNv3), a Neural Network (NN) for monocular pose estimation of a known, non-cooperative target spacecraft. SPNv3 is designed and trained to be computationally efficient while providing robustness to spaceborne images that have not been observed during offline training and validation on the ground. These characteristics are essential to deploying NNs on space-grade edge devices. They are achieved through careful NN design choices, and an extensive trade-off analysis reveals features such as data augmentation, transfer learning and vision transformer architecture as a few of those that contribute to simultaneously maximizing robustness and minimizing computational overhead. Experiments demonstrate that the final SPNv3 can achieve state-of-the-art pose accuracy on hardware-in-the-loop images from a robotic testbed while having been trained exclusively on computer-generated synthetic images, effectively bridging the domain gap between synthetic and real imagery. At the same time, SPNv3 runs well above the update frequency of modern satellite navigation filters when tested on a representative graphical processing unit system with flight heritage. Overall, SPNv3 is an efficient, flight-ready NN model readily applicable to close-range rendezvous and proximity operations with target resident space objects.

[441] arXiv:2409.15550 (replaced) [pdf, html, other]
Title: Talk, Listen, Connect: How Humans and AI Evaluate Empathy in Responses to Emotionally Charged Narratives
Mahnaz Roshanaei, Rezvaneh Rezapour, Magy Seif El-Nasr
Comments: 21 pages, 4 figures, 6 tables. Title updated from "Talk, Listen, Connect: Navigating Empathy in Human-AI Interactions" to "Talk, Listen, Connect: How Humans and AI Evaluate Empathy in Responses to Emotionally Charged Narratives" in this version. This is version 2 (v2) of the paper. All previous citations of arXiv:2409.15550 with the old title still refer to the same paper
Subjects: Human-Computer Interaction (cs.HC)

Social interactions promote well-being, yet barriers like geographic distance, time limitations, and mental health conditions can limit face-to-face interactions. Emotionally responsive AI systems, such as chatbots, offer new opportunities for social and emotional support, but raise critical questions about how empathy is perceived and experienced in human-AI interactions. This study examines how empathy is evaluated in AI-generated versus human responses. Using personal narratives, we explored how persona attributes (e.g., gender, empathic traits, shared experiences) and story qualities affect empathy ratings. We compared responses from standard and fine-tuned AI models with human judgments. Results show that while humans are highly sensitive to emotional vividness and shared experience, AI responses are less influenced by these cues and often lack nuance in empathic expression. These findings highlight challenges in designing emotionally intelligent systems that respond meaningfully across diverse users and contexts, and inform the design of ethically aware tools to support social connection and well-being.

[442] arXiv:2409.19954 (replaced) [pdf, html, other]
Title: Domain Consistency Representation Learning for Lifelong Person Re-Identification
Shiben Liu, Huijie Fan, Qiang Wang, Weihong Ren, Yandong Tang, Yang Cong
Comments: 12 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Lifelong person re-identification (LReID) exhibits a contradictory relationship between intra-domain discrimination and inter-domain gaps when learning from continuous data. Intra-domain discrimination focuses on individual nuances (e.g., clothing type, accessories), while inter-domain gaps emphasize domain consistency. Achieving a trade-off between maximizing intra-domain discrimination and minimizing inter-domain gaps is a crucial challenge for improving LReID performance. Most existing methods strive to reduce inter-domain gaps through knowledge distillation to maintain domain consistency. However, they often ignore intra-domain discrimination. To address this challenge, we propose a novel domain consistency representation learning (DCR) model that explores global and attribute-wise representations as a bridge to balance intra-domain discrimination and inter-domain gaps. At the intra-domain level, we explore the complementary relationship between global and attribute-wise representations to improve discrimination among similar identities. Because excessive learning of intra-domain discrimination can lead to catastrophic forgetting, we further develop an attribute-oriented anti-forgetting (AF) strategy that explores attribute-wise representations to enhance inter-domain consistency, and propose a knowledge consolidation (KC) strategy to facilitate knowledge transfer. Extensive experiments show that our DCR achieves superior performance compared to state-of-the-art LReID methods. Our code is available at this https URL.

[443] arXiv:2410.03861 (replaced) [pdf, html, other]
Title: Refinement of Monocular Depth Maps via Multi-View Differentiable Rendering
Laura Fink, Linus Franke, Bernhard Egger, Joachim Keinert, Marc Stamminger
Comments: 8 pages main paper + 3 pages of references + 6 pages appendix
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Accurate depth estimation is at the core of many applications in computer graphics, vision, and robotics. Current state-of-the-art monocular depth estimators, trained on extensive datasets, generalize well but lack the 3D consistency needed for many applications. In this paper, we combine the strength of those generalizing monocular depth estimation techniques with multi-view data by framing this as an analysis-by-synthesis optimization problem to lift and refine such relative depth maps to accurate error-free depth maps. After an initial global scale estimation through structure-from-motion point clouds, we further refine the depth map through optimization enforcing multi-view consistency via photometric and geometric losses with differentiable rendering of the meshed depth map. In a two-stage optimization, scaling is further refined first, and afterwards artifacts and errors in the depth map are corrected via nearby-view photometric supervision. Our evaluation shows that our method is able to generate detailed, high-quality, view-consistent, accurate depth maps, even in challenging indoor scenarios, and outperforms state-of-the-art multi-view depth reconstruction approaches on such datasets.
Project page and source code can be found at this https URL.

[444] arXiv:2410.06340 (replaced) [pdf, html, other]
Title: FedGraph: A Research Library and Benchmark for Federated Graph Learning
Yuhang Yao, Yuan Li, Xinyi Fan, Junhao Li, Kay Liu, Weizhao Jin, Yu Yang, Srivatsan Ravi, Philip S. Yu, Carlee Joe-Wong
Comments: this https URL
Subjects: Machine Learning (cs.LG)

Federated graph learning (FGL) is an emerging field with significant practical challenges. While algorithms have been proposed to improve the accuracy of training graph neural networks, for tasks such as node classification on federated graphs, system performance is often overlooked, despite being crucial for real-world deployment. To bridge this gap, we introduce FedGraph, a research library designed for practical distributed training and comprehensive benchmarking of FGL algorithms. FedGraph supports a range of state-of-the-art graph learning methods and includes a monitoring class that evaluates system performance, with a particular focus on communication and computation costs during training. Unlike existing federated learning platforms, FedGraph natively integrates homomorphic encryption to enhance privacy preservation and supports scalable deployment across multiple physical machines with system-level performance evaluation to guide the system design of future algorithms. To enhance efficiency and privacy, we propose a low-rank communication scheme for algorithms like FedGCN that require pre-training communication, accelerating both the pre-training and training phases. Extensive experiments benchmark FGL algorithms on three major graph learning tasks and demonstrate FedGraph as the first efficient FGL framework to support encrypted low-rank communication and scale to graphs with 100 million nodes.

[445] arXiv:2410.15048 (replaced) [pdf, other]
Title: MorphAgent: Empowering Agents through Self-Evolving Profiles and Decentralized Collaboration
Siyuan Lu, Jiaqi Shao, Bing Luo, Tao Lin
Subjects: Artificial Intelligence (cs.AI)

Large Language Model (LLM) based multi-agent systems (MAS) have shown promise in tackling complex tasks, but often rely on predefined roles and centralized coordination, limiting their adaptability to evolving challenges. This paper introduces MorphAgent, a novel Autonomous, Self-Organizing, and Self-Adaptive Multi-Agent System for decentralized agent collaboration that enables agents to dynamically evolve their roles and capabilities. Our approach employs self-evolving agent profiles, optimized through three key metrics, guiding agents in refining their individual expertise while maintaining complementary team dynamics. MorphAgent implements a two-phase process: a Profile Update phase for profile optimization, followed by a Task Execution phase where agents continuously adapt their roles based on task feedback. Our experimental results show that MorphAgent outperforms existing frameworks in terms of task performance and adaptability to changing requirements, paving the way for more robust and versatile multi-agent collaborative systems.

[446] arXiv:2410.16033 (replaced) [pdf, html, other]
Title: TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling
Jiahao Qiu, Yifu Lu, Yifan Zeng, Jiacheng Guo, Jiayi Geng, Chenhao Zhu, Xinzhe Juan, Ling Yang, Huazheng Wang, Kaixuan Huang, Yue Wu, Mengdi Wang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Inference-time alignment enhances the performance of large language models without requiring additional training or fine-tuning, but it presents challenges in balancing computational efficiency with high-quality output. Best-of-N (BoN) sampling, a simple yet powerful approach, generates multiple responses and selects the best one, achieving improved performance but at a high computational cost. We propose TreeBoN, a novel framework that integrates a speculative tree-search strategy into BoN sampling. TreeBoN maintains a set of parent nodes, iteratively branching and pruning low-quality responses, thereby reducing computational overhead while maintaining high output quality. Our approach also leverages token-level rewards from Direct Preference Optimization (DPO) to guide tree expansion and prune low-quality paths. We evaluate TreeBoN using AlpacaFarm, HH-RLHF, UltraFeedback, GSM8K, and TutorEval datasets, demonstrating consistent improvements. Specifically, TreeBoN achieves the highest win rate of 65% on TutorEval and around 60% win rates across the other datasets, outperforming standard BoN with the same computational cost and showcasing its scalability and alignment efficacy.
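The standard Best-of-N baseline described in this abstract can be sketched as follows. This is a minimal illustration of plain BoN only, not the TreeBoN tree search; the `generate` and `reward` callables and the toy functions below are hypothetical stand-ins for a language model and a reward model.

```python
import random
from typing import Callable


def best_of_n(
    prompt: str,
    generate: Callable[[str, random.Random], str],
    reward: Callable[[str, str], float],
    n: int,
    seed: int = 0,
) -> str:
    """Sample n candidate responses and return the one with the highest reward."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    # Draw n independent candidates from the (hypothetical) generator.
    candidates = [generate(prompt, rng) for _ in range(n)]
    # Score each candidate and keep the best one.
    return max(candidates, key=lambda response: reward(prompt, response))


def toy_generate(prompt: str, rng: random.Random) -> str:
    """Hypothetical generator: emits three random words."""
    words = ["good", "bad", "fine", "great"]
    return " ".join(rng.choice(words) for _ in range(3))


def toy_reward(prompt: str, response: str) -> float:
    """Hypothetical reward: favors 'great' and 'good', penalizes 'bad'."""
    return (
        response.count("great")
        + 0.5 * response.count("good")
        - response.count("bad")
    )


if __name__ == "__main__":
    print(best_of_n("Rate this paper.", toy_generate, toy_reward, n=8))
```

The cost of this baseline grows linearly with N because every candidate is generated in full before scoring, which is the overhead the abstract says TreeBoN aims to reduce.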

[447] arXiv:2410.16822 (replaced) [pdf, html, other]
Title: Can Large Language Models Act as Ensembler for Multi-GNNs?
Hanqi Duan, Yao Cheng, Jianxiang Yu, Yao Liu, Xiang Li
Subjects: Artificial Intelligence (cs.AI)

Graph Neural Networks (GNNs) have emerged as powerful models for learning from graph-structured data. However, GNNs lack an inherent capability for semantic understanding of rich textual node attributes, limiting their effectiveness in applications. On the other hand, we empirically observe that no single existing GNN model consistently outperforms the others across diverse datasets. In this paper, we study whether LLMs can act as an ensembler for multi-GNNs and propose the LensGNN model. The model first aligns multiple GNNs, mapping the representations of different GNNs into the same space. Then, through LoRA fine-tuning, it aligns the space between the GNN and the LLM, injecting graph tokens and textual information into LLMs. This allows LensGNN to ensemble multiple GNNs and take advantage of the strengths of the LLM, leading to a deeper understanding of both textual semantic information and graph structural information. The experimental results show that LensGNN outperforms existing models. This research advances text-attributed graph ensemble learning by providing a robust and superior solution for integrating semantic and structural information. We provide our code and data here: this https URL.

[448] arXiv:2410.18002 (replaced) [pdf, html, other]
Title: Digital Network Twins for Next-generation Wireless: Creation, Optimization, and Challenges
Zifan Zhang, Zhiyuan Peng, Hanzhi Yu, Mingzhe Chen, Yuchen Liu
Comments: Under Major Revision in IEEE Network
Subjects: Networking and Internet Architecture (cs.NI)

Digital network twins (DNTs), by representing a physical network using a virtual model, offer significant benefits such as streamlined network development, enhanced productivity, and cost reduction for next-generation (nextG) communication infrastructure. Existing works mainly describe the deployment of DNT technologies in various service this http URL full life cycle of DNTs for telecommunication has not yet been comprehensively studied, particularly in the aspects of fine-grained creation, real-time adaptation, resource-efficient deployment, and security protection. This article presents an in-depth overview of DNTs, exploring their concrete integration into networks and communication, covering the fundamental designs, the emergent applications, and critical challenges in multiple dimensions. We also include two detailed case studies to illustrate how DNTs can be applied in real-world scenarios such as wireless traffic forecasting and edge caching. Additionally, a forward-looking vision of the research opportunities in tackling the challenges of DNTs is provided, aiming to fully maximize the benefits of DNTs in nextG networks.

[449] arXiv:2410.20712 (replaced) [pdf, html, other]
Title: Interaction-Aware Vulnerability Detection in Smart Contract Bytecodes
Wenkai Li, Xiaoqi Li, Yingjie Mao, Yuqing Zhang
Comments: This work is accepted by TDSC
Subjects: Cryptography and Security (cs.CR)

The detection of vulnerabilities in smart contracts remains a significant challenge. While numerous tools are available for analyzing smart contracts in source code, only about 1.79% of smart contracts on Ethereum are open-source. Most existing tools that target bytecodes consider only the semantic logic context and disregard function interface information in the bytecodes. In this paper, we propose COBRA, a novel framework that integrates semantic context and function interfaces to detect vulnerabilities in smart contract bytecodes. To the best of our knowledge, COBRA is the first framework that combines these two features. Moreover, to infer the function signatures that are not present in signature databases, we propose SRIF, which automatically learns the rules of function signatures from the smart contract bytecodes. The bytecodes associated with the function signatures are collected by constructing a control flow graph (CFG) for the SRIF training. We optimize the semantic context using the operation code in the static single assignment (SSA) format. Finally, we integrate the context and function interface representations in the latent space as the contract feature embedding. The contract features in the hidden space are decoded for vulnerability classification with a decoder and attention module. Experimental results demonstrate that SRIF can achieve a 94.76% F1-score for function signature inference. Furthermore, when the ground truth ABI exists, COBRA achieves a 93.45% F1-score for vulnerability classification. In the absence of ABI, the inferred function feature fills the encoder, and the system accomplishes an 89.46% recall rate.

[450] arXiv:2410.20940 (replaced) [pdf, html, other]
Title: Attacking Misinformation Detection Using Adversarial Examples Generated by Language Models
Piotr Przybyła, Euan McGill, Horacio Saggion
Comments: Presented at EMNLP 2025
Subjects: Computation and Language (cs.CL)

Large language models have many beneficial applications, but can they also be used to attack content-filtering algorithms in social media platforms? We investigate the challenge of generating adversarial examples to test the robustness of text classification algorithms detecting low-credibility content, including propaganda, false claims, rumours and hyperpartisan news. We focus on simulation of content moderation by setting realistic limits on the number of queries an attacker is allowed to attempt. Within our solution (TREPAT), initial rephrasings are generated by large language models with prompts inspired by meaning-preserving NLP tasks, such as text simplification and style transfer. Subsequently, these modifications are decomposed into small changes, applied through a beam search procedure, until the victim classifier changes its decision. We perform (1) quantitative evaluation using various prompts, models and query limits, (2) targeted manual assessment of the generated text and (3) qualitative linguistic analysis. The results confirm the superiority of our approach in the constrained scenario, especially in the case of long input texts (news articles), where exhaustive search is not feasible.

[451] arXiv:2410.22846 (replaced) [pdf, html, other]
Title: Integrating Knowledge Graphs and Visualization Dashboards for Advance Data Discovery in VESA
Pawandeep Kaur Betz, Tobias Hecking, Andreas Gerndt
Subjects: Databases (cs.DB)

The increasing complexity and scale of scientific datasets demand advanced tools for efficient discovery and exploration. Traditional search systems often fall short in addressing the multidimensional nature of data and their intricate relationships, limiting their utility for researchers. This paper introduces the Knowledge Graph Based Visualization Search Application (VESA), which reshapes the process of data discovery by leveraging knowledge graph technology to establish meaningful connections and employing a visualization dashboard to enable multidimensional exploration. A software prototype is developed, showcasing our use case of connecting two Earth System Science repositories via a knowledge graph backend and visualization dashboard at the frontend. The framework's effectiveness was assessed against guidelines derived from a comprehensive literature review and further validated through an online user study. The evaluation revealed positive reception, highlighting VESA's low learning curve, ease of use, and potential to enhance data discovery workflows.

[452] arXiv:2411.02708 (replaced) [pdf, html, other]
Title: Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios
Yunkai Dang, Mengxi Gao, Yibo Yan, Xin Zou, Yanggan Gu, Jungang Li, Jingyu Wang, Peijie Jiang, Aiwei Liu, Jia Liu, Xuming Hu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Multimodal large language models (MLLMs) have recently achieved state-of-the-art performance on tasks ranging from visual question answering to video understanding. However, existing studies have concentrated mainly on visual-textual misalignment, leaving largely unexplored the MLLMs' ability to preserve an originally correct answer when confronted with misleading information. We reveal a response uncertainty phenomenon: across nine standard datasets, twelve state-of-the-art open-source MLLMs overturn a previously correct answer in 65% of cases after receiving a single deceptive cue. To systematically quantify this vulnerability, we propose a two-stage evaluation pipeline: (1) elicit each model's original response on unperturbed inputs; (2) inject explicit (false-answer hints) and implicit (contextual contradictions) misleading instructions, and compute the misleading rate - the fraction of correct-to-incorrect flips. Leveraging the most susceptible examples, we curate the Multimodal Uncertainty Benchmark (MUB), a collection of image-question pairs stratified into low, medium, and high difficulty based on how many of twelve state-of-the-art MLLMs they mislead. Extensive evaluation on twelve open-source and five closed-source models reveals high uncertainty: average misleading rates exceed 86%, with explicit cues over 67.19% and implicit cues over 80.67%. To reduce the misleading rate, we then fine-tune all open-source MLLMs on a compact 2000-sample mixed-instruction dataset, reducing misleading rates to 6.97% (explicit) and 32.77% (implicit), boosting consistency by nearly 29.37% on highly deceptive inputs, and slightly improving accuracy on standard benchmarks. Our code is available at this https URL.
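The misleading rate named in this abstract, the fraction of correct-to-incorrect flips, can be illustrated with a short sketch. The function name and the choice of denominator (only the answers that were correct before the misleading cue) are assumptions made for illustration; the abstract does not spell out the exact normalization.

```python
from typing import Sequence


def misleading_rate(
    correct_before: Sequence[bool], correct_after: Sequence[bool]
) -> float:
    """Fraction of initially correct answers that become incorrect.

    correct_before[i]: whether the answer to item i was correct on the clean input.
    correct_after[i]:  whether it was still correct after the misleading cue.
    Assumption: the rate is normalized by the number of initially correct answers.
    """
    if len(correct_before) != len(correct_after):
        raise ValueError("both sequences must have the same length")
    # Keep only the items the model originally answered correctly.
    outcomes = [after for before, after in zip(correct_before, correct_after) if before]
    if not outcomes:
        return 0.0  # no initially correct answers, so nothing can flip
    flips = sum(1 for after in outcomes if not after)
    return flips / len(outcomes)


if __name__ == "__main__":
    before = [True, True, False, True]
    after = [False, True, False, False]
    print(misleading_rate(before, after))
```

Items that were already wrong on the clean input are excluded here, since they cannot flip from correct to incorrect.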

[453] arXiv:2411.03871 (replaced) [pdf, html, other]
Title: Safe Sequences via Dominators in DAGs for Path-Covering Problems
Francisco Sena, Romeo Rizzi, Alexandru I. Tomescu
Subjects: Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC); Genomics (q-bio.GN)

A path-covering problem on a directed acyclic graph (DAG) requires finding a set of source-to-sink paths that cover all the nodes, all the arcs, or subsets thereof, and that are additionally optimal with respect to some function. In this paper we study safe sequences of nodes or arcs, namely sequences that appear in some path of every path cover of a DAG.
We show that safe sequences admit a simple characterization via cutnodes. Moreover, we establish a connection between maximal safe sequences and leaf-to-root paths in the source- and sink-dominator trees of the DAG, which may be of independent interest in the extensive literature on dominators. With dominator trees, safe sequences admit an O(n)-size representation and a linear-time output-sensitive enumeration algorithm running in time O(m + o), where n and m are the number of nodes and arcs, respectively, and o is the total length of the maximal safe sequences.
We then apply maximal safe sequences to simplify Integer Linear Programs (ILPs) for two path-covering problems, LeastSquares and MinPathError, which are at the core of RNA transcript assembly problems from bioinformatics. On various datasets, maximal safe sequences can be computed in under 0.1 seconds per graph, on average, and ILP solvers whose search space is reduced in this manner exhibit significant speed-ups. For example on graphs with a large width, average speed-ups are in the range 50-250x for MinPathError and in the range 80-350x for LeastSquares. Optimizing ILPs using safe sequences can thus become a fast building block of practical RNA transcript assembly tools, and more generally, of path-covering problems.

[454] arXiv:2411.05085 (replaced) [pdf, html, other]
Title: PadChest-GR: A Bilingual Chest X-ray Dataset for Grounded Radiology Report Generation
Daniel C. Castro, Aurelia Bustos, Shruthi Bannur, Stephanie L. Hyland, Kenza Bouzid, Maria Teodora Wetscherek, Maria Dolores Sánchez-Valverde, Lara Jaques-Pérez, Lourdes Pérez-Rodríguez, Kenji Takeda, José María Salinas, Javier Alvarez-Valle, Joaquín Galant Herrero, Antonio Pertusa
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

Radiology report generation (RRG) aims to create free-text radiology reports from clinical imaging. Grounded radiology report generation (GRRG) extends RRG by including the localisation of individual findings on the image. Currently, there are no manually annotated chest X-ray (CXR) datasets to train GRRG models. In this work, we present a dataset called PadChest-GR (Grounded-Reporting) derived from PadChest aimed at training GRRG models for CXR images. We curate a public bilingual dataset of 4,555 CXR studies with grounded reports (3,099 abnormal and 1,456 normal), each containing complete lists of sentences describing individual present (positive) and absent (negative) findings in English and Spanish. In total, PadChest-GR contains 7,037 positive and 3,422 negative finding sentences. Every positive finding sentence is associated with up to two independent sets of bounding boxes labelled by different readers and has categorical labels for finding type, locations, and progression. To the best of our knowledge, PadChest-GR is the first manually curated dataset designed to train GRRG models for understanding and interpreting radiological images and generated text. By including detailed localisation and comprehensive annotations of all clinically relevant findings, it provides a valuable resource for developing and evaluating GRRG models from CXR images. PadChest-GR can be downloaded upon request from this https URL

[455] arXiv:2411.11514 (replaced) [pdf, html, other]
Title: Learning a Neural Association Network for Self-supervised Multi-Object Tracking
Shuai Li, Michael Burke, Subramanian Ramamoorthy, Juergen Gall
Comments: BMVC2025 poster
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper introduces a novel framework to learn data association for multi-object tracking in a self-supervised manner. Fully-supervised learning methods are known to achieve excellent tracking performance, but acquiring identity-level annotations is tedious and time-consuming. Motivated by the fact that in real-world scenarios object motion can usually be represented by a Markov process, we present a novel expectation maximization (EM) algorithm that trains a neural network to associate detections for tracking, without requiring prior knowledge of their temporal correspondences. At the core of our method lies a neural Kalman filter, with an observation model conditioned on associations of detections parameterized by a neural network. Given a batch of frames as input, data associations between detections from adjacent frames are predicted by a neural network followed by a Sinkhorn normalization that determines the assignment probabilities of detections to states. Kalman smoothing is then used to obtain the marginal probability of observations given the inferred states, producing a training objective to maximize this marginal probability using gradient descent. The proposed framework is fully differentiable, allowing the underlying neural model to be trained end-to-end. We evaluate our approach on the challenging MOT17, MOT20, and BDD100K datasets and achieve state-of-the-art results in comparison to self-supervised trackers using public detections.
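The Sinkhorn normalization step mentioned in this abstract can be sketched in a few lines. This is a generic textbook version for a square score matrix, written in pure Python; the function name, the iteration count, and the square-matrix assumption are illustrative and are not taken from the paper.

```python
import math
from typing import List


def sinkhorn(scores: List[List[float]], n_iters: int = 50) -> List[List[float]]:
    """Turn a square score matrix into an approximately doubly stochastic matrix.

    Rows could stand for detections and columns for states (an assumption made
    here for illustration). Entries are exponentiated, then rows and columns
    are normalized alternately.
    """
    size = len(scores)
    if size == 0 or any(len(row) != size for row in scores):
        raise ValueError("scores must be a non-empty square matrix")
    # Subtract the global maximum before exponentiating for numerical stability.
    top = max(max(row) for row in scores)
    probs = [[math.exp(value - top) for value in row] for row in scores]
    for _ in range(n_iters):
        # Normalize each row so that it sums to 1.
        probs = [[value / sum(row) for value in row] for row in probs]
        # Normalize each column so that it sums to 1.
        col_sums = [sum(row[j] for row in probs) for j in range(size)]
        probs = [[value / col_sums[j] for j, value in enumerate(row)] for row in probs]
    return probs


if __name__ == "__main__":
    for row in sinkhorn([[1.0, 0.5], [0.2, 3.0]]):
        print([round(value, 4) for value in row])
```

Because the procedure uses only exponentials, sums, and divisions, the same computation is differentiable when written in an autodiff framework, which is consistent with the end-to-end training the abstract describes.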

[456] arXiv:2411.13314 (replaced) [pdf, html, other]
Title: I2TTS: Image-indicated Immersive Text-to-speech Synthesis with Spatial Perception
Jiawei Zhang, Tian-Hao Zhang, Jun Wang, Jiaran Gao, Xinyuan Qian, Xu-Cheng Yin
Comments: Accepted by APSIPA ASC2025
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Controlling the style and characteristics of speech synthesis is crucial for adapting the output to specific contexts and user requirements. Previous Text-to-speech (TTS) works have focused primarily on the technical aspects of producing natural-sounding speech, such as intonation, rhythm, and clarity. However, they overlook the fact that there is a growing emphasis on spatial perception of synthesized speech, which may provide an immersive experience in gaming and virtual reality. To solve this issue, in this paper, we present a novel multi-modal TTS approach, namely Image-indicated Immersive Text-to-speech Synthesis (I2TTS). Specifically, we introduce a scene prompt encoder that integrates visual scene prompts directly into the synthesis pipeline to control the speech generation process. Additionally, we propose a reverberation classification and refinement technique that adjusts the synthesized mel-spectrogram to enhance the immersive experience, ensuring that the involved reverberation condition matches the scene accurately. Experimental results demonstrate that our model achieves high-quality scene and spatial matching without compromising speech naturalness, marking a significant advancement in the field of context-aware speech synthesis. Project demo page: this https URL. Index Terms: speech synthesis, scene prompt, spatial perception.

[457] arXiv:2411.14679 (replaced) [pdf, html, other]
Title: Recursive Gaussian Process State Space Model
Tengjie Zheng, Haipeng Chen, Lin Cheng, Shengping Gong, Xu Huang
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)

Learning dynamical models from data is not only fundamental but also holds great promise for advancing principle discovery, time-series prediction, and controller design. Among various approaches, Gaussian Process State-Space Models (GPSSMs) have recently gained significant attention due to their combination of flexibility and interpretability. However, for online learning, the field lacks an efficient method suitable for scenarios where prior information regarding data distribution and model function is limited. To address this issue, this paper proposes a recursive GPSSM method with adaptive capabilities for both operating domains and Gaussian process (GP) hyperparameters. Specifically, we first utilize first-order linearization to derive a Bayesian update equation for the joint distribution between the system state and the GP model, enabling closed-form and domain-independent learning. Second, an online selection algorithm for inducing points is developed based on informative criteria to achieve lightweight learning. Third, to support online hyperparameter optimization, we recover historical measurement information from the current filtering distribution. Comprehensive evaluations on both synthetic and real-world datasets demonstrate the superior accuracy, computational efficiency, and adaptability of our method compared to state-of-the-art online GPSSM techniques.

[458] arXiv:2411.16073 (replaced) [pdf, html, other]
Title: Soft-TransFormers for Continual Learning
Haeyong Kang, Chang D. Yoo
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Inspired by the Well-initialized Lottery Ticket Hypothesis (WLTH), which provides suboptimal fine-tuning solutions, we propose a novel fully fine-tuned continual learning (CL) method referred to as Soft-TransFormers (Soft-TF). Soft-TF sequentially learns and selects an optimal soft-network for each task. During sequential training in CL, a well-initialized Soft-TF mask optimizes the weights of sparse layers to obtain task-adaptive soft (real-valued) networks, while keeping the well-pre-trained layer parameters frozen. At inference, the identified task-adaptive network of Soft-TF masks the parameters of the pre-trained network, mapping to an optimal solution for each task and minimizing Catastrophic Forgetting (CF) - the soft-masking preserves the knowledge of the pre-trained network. Extensive experiments on the Vision Transformer (ViT) and the Language Transformer (BERT) demonstrate the effectiveness of Soft-TF, achieving state-of-the-art performance across Vision and Language Class Incremental Learning (CIL) scenarios.

[459] arXiv:2411.17960 (replaced) [pdf, html, other]
Title: Calibrating DRAMPower Model for HPC: A Runtime Perspective from Real-Time Measurements
Xinyu Shi, Dina Ali Abdelhamid, Thomas Ilsche, Saeideh Alinezhad Chamazcoti, Timon Evenblij, Mohit Gupta, Francky Catthoor
Comments: Supplementary Materials for Computer Architecture Letter
Subjects: Hardware Architecture (cs.AR)

Main memory's rising energy consumption has emerged as a critical challenge in modern computing architectures, particularly in large-scale systems, driven by frequent access patterns, growing data volumes, and insufficient power management strategies. Accurate modeling of DRAM power consumption is essential to address this challenge and optimize energy efficiency. However, existing modeling tools often rely on vendor-provided datasheet values that are obtained under worst-case or idealized conditions. As a result, they fail to capture important system-level factors, such as temperature variations, chip aging, and workload-induced variability, which leads to significant discrepancies between estimated and actual power consumption observed in real deployments. In this work, we propose a runtime calibration methodology for the DRAMPower model using energy measurements collected from real-system experiments. By applying custom memory benchmarks on an HPC cluster and leveraging fine-grained power monitoring infrastructure, we refine key current parameters (IDD values) in the model. Our calibration reduces the average energy estimation error to less than 5%, substantially improving modeling accuracy and making DRAMPower a more reliable tool for power-aware system design and optimization on the target server platform.

[460] arXiv:2411.19475 (replaced) [pdf, html, other]
Title: GalaxAlign: Mimicking Citizen Scientists' Multimodal Guidance for Galaxy Morphology Analysis
Ruoqi Wang, Haitao Wang, Qiong Luo
Comments: ACM MM 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Astrophysics of Galaxies (astro-ph.GA); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Galaxy morphology analysis involves studying galaxies based on their shapes and structures. For such studies, fundamental tasks include identifying and classifying galaxies in astronomical images, as well as retrieving visually or structurally similar galaxies through similarity search. Existing methods either directly train domain-specific foundation models on large, annotated datasets or fine-tune vision foundation models on a smaller set of images. The former is effective but costly, while the latter is more resource-efficient but often yields lower accuracy. To address these challenges, we introduce GalaxAlign, a multimodal approach inspired by how citizen scientists identify galaxies in astronomical images by following textual descriptions and matching schematic symbols. Specifically, GalaxAlign employs a tri-modal alignment framework to align three types of data during fine-tuning: (1) schematic symbols representing galaxy shapes and structures, (2) textual labels for these symbols, and (3) galaxy images. By incorporating multimodal instructions, GalaxAlign eliminates the need for expensive pretraining and enhances the effectiveness of fine-tuning. Experiments on galaxy classification and similarity search demonstrate that our method effectively fine-tunes general pre-trained models for astronomical tasks by incorporating domain-specific multi-modal knowledge. Code is available at this https URL.

[461] arXiv:2412.00177 (replaced) [pdf, html, other]
Title: LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene Relighting
Xiaoyan Xing, Konrad Groh, Sezer Karaoglu, Theo Gevers, Anand Bhattad
Comments: Corrects an evaluation bug in Table 1 due to a data normalization error. Thanks to the Sony PlayStation team for discovering and reporting the issue. The paper's core contributions, qualitative results, and user study are unaffected. We also include a minor update to the method to further improve result quality. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)

We introduce LumiNet, a novel architecture that leverages generative models and latent intrinsic representations for effective lighting transfer. Given a source image and a target lighting image, LumiNet synthesizes a relit version of the source scene that captures the target's lighting. Our approach makes two key contributions: a data curation strategy from the StyleGAN-based relighting model for our training, and a modified diffusion-based ControlNet that processes both latent intrinsic properties from the source image and latent extrinsic properties from the target image. We further improve lighting transfer through a learned adaptor (MLP) that injects the target's latent extrinsic properties via cross-attention and fine-tuning.
Unlike traditional ControlNet, which generates images with conditional maps from a single scene, LumiNet processes latent representations from two different images - preserving geometry and albedo from the source while transferring lighting characteristics from the target. Experiments demonstrate that our method successfully transfers complex lighting phenomena including specular highlights and indirect illumination across scenes with varying spatial layouts and materials, outperforming existing approaches on challenging indoor scenes using only images as input.

[462] arXiv:2412.09049 (replaced) [pdf, html, other]
Title: Dial-In LLM: Human-Aligned LLM-in-the-loop Intent Clustering for Customer Service Dialogues
Mengze Hong, Wailing Ng, Chen Jason Zhang, Yuanfeng Song, Di Jiang
Comments: Accepted by EMNLP 2025 Main Conference
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Discovering customer intentions is crucial for automated service agents, yet existing intent clustering methods often fall short due to their reliance on embedding distance metrics and neglect of underlying semantic structures. To address these limitations, we propose an LLM-in-the-loop (LLM-ITL) intent clustering framework, integrating the language understanding capabilities of LLMs into conventional clustering algorithms. Specifically, this paper (1) examines the effectiveness of fine-tuned LLMs in semantic coherence evaluation and intent cluster naming, achieving over 95% accuracy aligned with human judgments; (2) designs an LLM-ITL framework that facilitates the iterative discovery of coherent intent clusters and the optimal number of clusters; and (3) introduces context-aware techniques tailored for customer service dialogue. Since existing English benchmarks lack sufficient semantic diversity and intent coverage, we further present a comprehensive Chinese dialogue intent dataset comprising over 100k real customer service calls with 1,507 human-annotated clusters. The proposed approaches significantly outperform LLM-guided baselines, achieving notable improvements in clustering quality, cost efficiency, and downstream applications. Combined with several best practices, our findings highlight the prominence of LLM-in-the-loop techniques for scalable dialogue data mining.

[463] arXiv:2412.12278 (replaced) [pdf, html, other]
Title: Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content
Rohit Kundu, Hao Xiong, Vishal Mohanty, Athula Balachandran, Amit K. Roy-Chowdhury
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Existing DeepFake detection techniques primarily focus on facial manipulations, such as face-swapping or lip-syncing. However, advancements in text-to-video (T2V) and image-to-video (I2V) generative models now allow fully AI-generated synthetic content and seamless background alterations, challenging face-centric detection methods and demanding more versatile approaches.
To address this, we introduce the Universal Network for Identifying Tampered and synthEtic videos (UNITE) model, which, unlike traditional detectors, captures full-frame manipulations. UNITE extends detection capabilities to scenarios without faces, non-human subjects, and complex background modifications. It leverages a transformer-based architecture that processes domain-agnostic features extracted from videos via the SigLIP-So400M foundation model. Given limited datasets encompassing both facial/background alterations and T2V/I2V content, we integrate task-irrelevant data alongside standard DeepFake datasets in training. We further mitigate the model's tendency to over-focus on faces by incorporating an attention-diversity (AD) loss, which promotes diverse spatial attention across video frames. Combining AD loss with cross-entropy improves detection performance across varied contexts. Comparative evaluations demonstrate that UNITE outperforms state-of-the-art detectors on datasets (in cross-data settings) featuring face/background manipulations and fully synthetic T2V/I2V videos, showcasing its adaptability and generalizable detection capabilities.

[464] arXiv:2412.16490 (replaced) [pdf, html, other]
Title: BODex: Scalable and Efficient Robotic Dexterous Grasp Synthesis Using Bilevel Optimization
Jiayi Chen, Yubin Ke, He Wang
Comments: ICRA 2025
Subjects: Robotics (cs.RO)

Robotic dexterous grasping is important for interacting with the environment. To unleash the potential of data-driven models for dexterous grasping, a large-scale, high-quality dataset is essential. While gradient-based optimization offers a promising way to construct such datasets, previous works suffer from limitations such as inefficiency, strong assumptions in the grasp quality energy, or limited object sets for experiments. Moreover, the lack of a standard benchmark for comparing different methods and datasets hinders progress in this field. To address these challenges, we develop a highly efficient synthesis system and a comprehensive benchmark with MuJoCo for dexterous grasping. We formulate grasp synthesis as a bilevel optimization problem, combining a novel lower-level quadratic program (QP) with an upper-level gradient descent process. By leveraging recent advances in CUDA-accelerated robotic libraries and GPU-based QP solvers, our system can parallelize thousands of grasps and synthesize over 49 grasps per second on a single 3090 GPU. Our synthesized grasps for Shadow, Allegro, and Leap hands all achieve a success rate above 75% in simulation, with a penetration depth under 1 mm, outperforming existing baselines on nearly all metrics. Compared to the previous large-scale dataset, DexGraspNet, our dataset significantly improves the performance of learning models, raising the success rate from around 40% to 80% in simulation. Real-world testing of the trained model on the Shadow Hand achieves an 81% success rate across 20 diverse objects. The code and datasets are released on our project page: this https URL.

[465] arXiv:2501.06988 (replaced) [pdf, other]
Title: Fully Differentiable Boundary Element Solver for Hydrodynamic Sensitivity Analysis of Wave-Structure Interactions
Kapil Khanal, Carlos A. Michelén Ströfer, Matthieu Ancellin, Maha Haji
Journal-ref: Applied Ocean Research, 2025
Subjects: Computational Engineering, Finance, and Science (cs.CE)

Accurately predicting wave-structure interactions is critical for the effective design and analysis of marine structures. This is typically achieved using solvers that employ the boundary element method (BEM), which relies on linear potential flow theory. Precise estimation of the sensitivity of these interactions is equally important for system-level applications such as design optimization. Current BEM solvers are unable to provide these sensitivities as they do not support automatic differentiation (AD). To address these challenges, we have developed a fully differentiable BEM solver

[466] arXiv:2501.08848 (replaced) [pdf, html, other]
Title: RouteNet-Gauss: Hardware-Enhanced Network Modeling with Machine Learning
Carlos Güemes-Palau, Miquel Ferriol-Galmés, Jordi Paillisse-Vilanova, Albert López-Brescó, Pere Barlet-Ros, Albert Cabellos-Aparicio
Comments: 13 pages, 11 figures
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Network simulation is pivotal in network modeling, assisting with tasks ranging from capacity planning to performance estimation. Traditional approaches such as Discrete Event Simulation (DES) face limitations in terms of computational cost and accuracy. This paper introduces RouteNet-Gauss, a novel integration of a testbed network with a Machine Learning (ML) model to address these challenges. By using the testbed as a hardware accelerator, RouteNet-Gauss generates training datasets rapidly and simulates network scenarios with high fidelity to real-world conditions. Experimental results show that RouteNet-Gauss significantly reduces prediction errors by up to 95% and achieves a 488x speedup in inference time compared to state-of-the-art DES-based methods. RouteNet-Gauss's modular architecture is dynamically constructed based on the specific characteristics of the network scenario, such as topology and routing. This enables it to understand and generalize to different network configurations beyond those seen during training, including networks up to 10x larger. Additionally, it supports Temporal Aggregated Performance Estimation (TAPE), providing configurable temporal granularity and maintaining high accuracy in flow performance metrics. This approach shows promise in improving both simulation efficiency and accuracy, offering a valuable tool for network operators.

[467] arXiv:2501.09997 (replaced) [pdf, html, other]
Title: Attention-guided Self-reflection for Zero-shot Hallucination Detection in Large Language Models
Qiang Liu, Xinlong Chen, Yue Ding, Bowen Song, Weiqiang Wang, Shu Wu, Liang Wang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Hallucination has emerged as a significant barrier to the effective application of Large Language Models (LLMs). In this work, we introduce a novel Attention-Guided SElf-Reflection (AGSER) approach for zero-shot hallucination detection in LLMs. The AGSER method utilizes attention contributions to categorize the input query into attentive and non-attentive queries. Each query is then processed separately through the LLMs, allowing us to compute consistency scores between the generated responses and the original answer. The difference between the two consistency scores serves as a hallucination estimator. In addition to its efficacy in detecting hallucinations, AGSER notably reduces computational overhead, requiring only three passes through the LLM and utilizing two sets of tokens. We have conducted extensive experiments with four widely-used LLMs across three different hallucination benchmarks, demonstrating that our approach significantly outperforms existing methods in zero-shot hallucination detection.
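
The estimator described in this abstract can be illustrated with a minimal sketch. Everything beyond the abstract is an assumption made for illustration: the top-k split rule, the Jaccard word-overlap stand-in for the consistency score, the sign convention of the difference, and all function names are hypothetical and are not taken from the AGSER paper. The answers to the attentive and non-attentive queries are assumed to come from separate LLM calls that are not shown.

```python
from typing import Callable, List, Sequence, Tuple


def split_query(tokens: Sequence[str], attention: Sequence[float],
                k: int) -> Tuple[List[str], List[str]]:
    """Split a query into attentive tokens (the k highest attention
    contributions) and non-attentive tokens (the rest), keeping the
    original token order in both parts. The top-k rule is an assumption."""
    order = sorted(range(len(tokens)), key=lambda i: attention[i], reverse=True)
    top = set(order[:k])
    attentive = [t for i, t in enumerate(tokens) if i in top]
    non_attentive = [t for i, t in enumerate(tokens) if i not in top]
    return attentive, non_attentive


def jaccard(a: str, b: str) -> float:
    """Hypothetical consistency score: word-level Jaccard overlap."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def hallucination_estimate(original_answer: str,
                           attentive_answer: str,
                           non_attentive_answer: str,
                           consistency: Callable[[str, str], float] = jaccard) -> float:
    """Difference between the two consistency scores. Subtracting the
    non-attentive score from the attentive one is an assumed convention."""
    return (consistency(attentive_answer, original_answer)
            - consistency(non_attentive_answer, original_answer))


tokens = ["What", "is", "the", "capital", "of", "France"]
attention = [0.05, 0.05, 0.05, 0.40, 0.05, 0.40]  # made-up contributions
attentive, non_attentive = split_query(tokens, attention, k=2)
estimate = hallucination_estimate("Paris", "Paris", "I do not know")
print(attentive, non_attentive, estimate)
```

Counting the original answer plus one answer per sub-query gives the three LLM passes mentioned in the abstract.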

[468] arXiv:2501.11992 (replaced) [pdf, html, other]
Title: Survey on Hand Gesture Recognition from Visual Input
Manousos Linardakis, Iraklis Varlamis, Georgios Th. Papadopoulos
Comments: 37 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Hand gesture recognition has become an important research area, driven by the growing demand for human-computer interaction in fields such as sign language recognition, virtual and augmented reality, and robotics. Despite the rapid growth of the field, there are few surveys that comprehensively cover recent research developments, available solutions, and benchmark datasets. This survey addresses this gap by examining the latest advancements in hand gesture and 3D hand pose recognition from various types of camera input data including RGB images, depth images, and videos from monocular or multiview cameras, examining the differing methodological requirements of each approach. Furthermore, an overview of widely used datasets is provided, detailing their main characteristics and application domains. Finally, open challenges such as achieving robust recognition in real-world environments, handling occlusions, ensuring generalization across diverse users, and addressing computational efficiency for real-time applications are highlighted to guide future research directions. By synthesizing the objectives, methodologies, and applications of recent studies, this survey offers valuable insights into current trends, challenges, and opportunities for future research in human hand gesture recognition.

[469] arXiv:2501.12553 (replaced) [pdf, html, other]
Title: ViDDAR: Vision Language Model-Based Task-Detrimental Content Detection for Augmented Reality
Yanming Xiu, Tim Scargill, Maria Gorlatova
Comments: The paper has been accepted to the 2025 IEEE Conference on Virtual Reality and 3D User Interfaces (IEEE VR), and selected for publication in the 2025 IEEE Transactions on Visualization and Computer Graphics (TVCG) special issue
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In Augmented Reality (AR), virtual content enhances user experience by providing additional information. However, improperly positioned or designed virtual content can be detrimental to task performance, as it can impair users' ability to accurately interpret real-world information. In this paper we examine two types of task-detrimental virtual content: obstruction attacks, in which virtual content prevents users from seeing real-world objects, and information manipulation attacks, in which virtual content interferes with users' ability to accurately interpret real-world information. We provide a mathematical framework to characterize these attacks and create a custom open-source dataset for attack evaluation. To address these attacks, we introduce ViDDAR (Vision language model-based Task-Detrimental content Detector for Augmented Reality), a comprehensive full-reference system that leverages Vision Language Models (VLMs) and advanced deep learning techniques to monitor and evaluate virtual content in AR environments, employing a user-edge-cloud architecture to balance performance with low latency. To the best of our knowledge, ViDDAR is the first system to employ VLMs for detecting task-detrimental content in AR settings. Our evaluation results demonstrate that ViDDAR effectively understands complex scenes and detects task-detrimental content, achieving up to 92.15% obstruction detection accuracy with a detection latency of 533 ms, and an 82.46% information manipulation content detection accuracy with a latency of 9.62 s.

[470] arXiv:2501.13199 (replaced) [pdf, html, other]
Title: Symbolic Control for Autonomous Docking of Marine Surface Vessels
Elizabeth Dietrich, Emir Cem Gezer, Bingzhuo Zhong, Murat Arcak, Majid Zamani, Roger Skjetne, Asgeir Johan Sørensen
Subjects: Systems and Control (eess.SY)

We develop a hierarchical control architecture for autonomous docking maneuvers of a dynamic positioning vessel and provide formal safety guarantees. At the upper level, we treat the vessel's desired surge, sway, and yaw velocities as control inputs and synthesize a symbolic controller in real time. The desired velocities are then executed by the vessel's low-level velocity feedback control loop. We next investigate methods to optimize the performance of the proposed control scheme. The results are evaluated on a simulation model of a marine surface vessel in the presence of static obstacles and, for the first time, through physical experiments on a scale model vessel.

[471] arXiv:2501.19073 (replaced) [pdf, html, other]
Title: Pareto-frontier Entropy Search with Variational Lower Bound Maximization
Masanori Ishikura, Masayuki Karasuyama
Comments: Published at ICML2025
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

This study considers multi-objective Bayesian optimization (MOBO) through the information gain of the Pareto-frontier. To calculate the information gain, a predictive distribution conditioned on the Pareto-frontier plays a key role, which is defined as a distribution truncated by the Pareto-frontier. However, it is usually impossible to obtain the entire Pareto-frontier in a continuous domain, and therefore, the complete truncation cannot be known. We consider an approximation of the truncated distribution by using a mixture distribution consisting of two possible approximate truncations obtainable from a subset of the Pareto-frontier, which we call over- and under-truncation. Since the optimal balance of the mixture is unknown beforehand, we propose optimizing the balancing coefficient through the variational lower bound maximization framework, by which the approximation error of the information gain can be minimized. Our empirical evaluation demonstrates the effectiveness of the proposed method, particularly when the number of objective functions is large.

[472] arXiv:2502.01684 (replaced) [pdf, html, other]
Title: Predict, Cluster, Refine: A Joint Embedding Predictive Self-Supervised Framework for Graph Representation Learning
Srinitish Srinivasan, Omkumar CU
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)

Graph representation learning has emerged as a cornerstone for tasks like node classification and link prediction, yet prevailing self-supervised learning (SSL) methods face challenges such as computational inefficiency, reliance on contrastive objectives, and representation collapse. Existing approaches often depend on feature reconstruction, negative sampling, or complex decoders, which introduce training overhead and hinder generalization. Further, current techniques that address such limitations fail to account for the contribution of node embeddings to a certain prediction in the absence of labeled nodes. To address these limitations, we propose a novel joint embedding predictive framework for graph SSL that eliminates contrastive objectives and negative sampling while preserving semantic and structural information. Additionally, we introduce a semantic-aware objective term that incorporates pseudo-labels derived from Gaussian Mixture Models (GMMs), enhancing node discriminability by evaluating latent feature contributions. Extensive experiments demonstrate that our framework outperforms state-of-the-art graph SSL methods across benchmarks, achieving superior performance without contrastive loss or complex decoders. Key innovations include (1) a non-contrastive, view-invariant joint embedding predictive architecture, (2) leveraging the single-context, multiple-target relationship between subgraphs, and (3) GMM-based pseudo-label scoring to capture semantic contributions. This work advances graph SSL by offering a computationally efficient, collapse-resistant paradigm that bridges spatial and semantic graph features for downstream tasks. The code for our paper can be found at this https URL

[473] arXiv:2502.04387 (replaced) [pdf, html, other]
Title: FedP$^2$EFT: Federated Learning to Personalize PEFT for Multilingual LLMs
Royson Lee, Minyoung Kim, Fady Rezk, Rui Li, Stylianos I. Venieris, Timothy Hospedales
Comments: Preprint
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Federated learning (FL) has enabled the training of multilingual large language models (LLMs) on diverse and decentralized multilingual data, especially on low-resource languages. To improve client-specific performance, personalization via the use of parameter-efficient fine-tuning (PEFT) modules such as LoRA is common. This involves a personalization strategy (PS), such as the design of the PEFT adapter structures (e.g., in which layers to add LoRAs and what ranks) and choice of hyperparameters (e.g., learning rates) for fine-tuning. Instead of manual PS configuration, we propose FedP$^2$EFT, a federated learning-to-personalize method for multilingual LLMs in cross-device FL settings. Unlike most existing PEFT structure selection methods, which are prone to overfitting low-data regimes, FedP$^2$EFT collaboratively learns the optimal personalized PEFT structure for each client via Bayesian sparse rank selection. Evaluations on both simulated and real-world multilingual FL benchmarks demonstrate that FedP$^2$EFT largely outperforms existing personalized fine-tuning methods, while complementing other existing FL methods. Code is available at this https URL.

[474] arXiv:2502.05961 (replaced) [pdf, html, other]
Title: The Human Labour of Data Work: Capturing Cultural Diversity through World Wide Dishes
Siobhan Mackenzie Hall, Samantha Dalal, Raesetje Sefala, Foutse Yuehgoh, Aisha Alaagib, Imane Hamzaoui, Shu Ishida, Jabez Magomere, Lauren Crais, Aya Salama, Tejumade Afonja
Subjects: Computers and Society (cs.CY)

This paper provides guidance for building and maintaining infrastructure for participatory AI efforts by sharing reflections on building World Wide Dishes (WWD), a bottom-up, community-led image and text dataset of culinary dishes and associated cultural customs. We present WWD as an example of participatory dataset creation, where community members both guide the design of the research process and contribute to the crowdsourced dataset. This approach incorporates localised expertise and knowledge to address the limitations of web-scraped Internet datasets acknowledged in the Participatory AI discourse. We show that our approach can result in curated, high-quality data that supports decentralised contributions from communities that do not typically contribute to datasets due to a variety of systemic factors. Our project demonstrates the importance of participatory mediators in supporting community engagement by identifying the kinds of labour they performed to make WWD possible. We surface three dimensions of labour performed by participatory mediators that are crucial for participatory dataset construction: building trust with community members, making participation accessible, and contextualising community values to support meaningful data collection. Drawing on our findings, we put forth five lessons for building infrastructure to support future participatory AI efforts.

[475] arXiv:2502.06380 (replaced) [pdf, html, other]
Title: Structure-preserving contrastive learning for spatial time series
Yiru Jiao, Sander van Cranenburgh, Simeon Calvert, Hans van Lint
Comments: TL;DR: Preserving certain structures of similarity relations in spatio-temporal data can improve downstream task performance via contrastive learning
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

The effectiveness of neural network models largely relies on learning meaningful latent patterns from data, where self-supervised learning of informative representations can enhance model performance and generalisability. However, self-supervised representation learning for spatially characterised time series, which are ubiquitous in the transportation domain, poses unique challenges due to the necessity of maintaining fine-grained spatio-temporal similarities in the latent space. In this study, we introduce two structure-preserving regularisers for the contrastive learning of spatial time series: one regulariser preserves the topology of similarities between instances, and the other preserves the graph geometry of similarities across spatial and temporal dimensions. To balance the contrastive learning objective and the need for structure preservation, we propose a dynamic weighting mechanism that adaptively manages this trade-off and stabilises training. We validate the proposed method through extensive experiments, including multivariate time series classification to demonstrate its general applicability, as well as macroscopic and microscopic traffic prediction to highlight its particular usefulness in encoding traffic interactions. Across all tasks, our method preserves the similarity structures more effectively and improves state-of-the-art task performance. This method can be integrated with an arbitrary neural network model and is particularly beneficial for time series data with spatial or geographical features. Furthermore, our findings suggest that well-preserved similarity structures in the latent space indicate more informative and useful representations. This provides insights for designing more effective neural networks for data-driven transportation research. Our code is made openly accessible with all resulting data at this https URL

[476] arXiv:2502.11128 (replaced) [pdf, html, other]
Title: FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching
Hui Wang, Shujie Liu, Lingwei Meng, Jinyu Li, Yifan Yang, Shiwan Zhao, Haiyang Sun, Yanqing Liu, Haoqin Sun, Jiaming Zhou, Yan Lu, Yong Qin
Comments: Accepted by ACM Multimedia 2025
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

To advance continuous-valued token modeling and temporal-coherence enforcement, we propose FELLE, an autoregressive model that integrates language modeling with token-wise flow matching. By leveraging the autoregressive nature of language models and the generative efficacy of flow matching, FELLE effectively predicts continuous-valued tokens (mel-spectrograms). For each continuous-valued token, FELLE modifies the general prior distribution in flow matching by incorporating information from the previous step, improving coherence and stability. Furthermore, to enhance synthesis quality, FELLE introduces a coarse-to-fine flow-matching mechanism, generating continuous-valued tokens hierarchically, conditioned on the language model's output. Experimental results demonstrate the potential of incorporating flow-matching techniques in autoregressive mel-spectrogram modeling, leading to significant improvements in TTS generation quality, as shown in this https URL.

[477] arXiv:2502.11951 (replaced) [pdf, other]
Title: Quantum Data Encoding and Variational Algorithms: A Framework for Hybrid Quantum Classical Machine Learning
Bhavna Bose, Saurav Verma
Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Quantum Physics (quant-ph)

The development of quantum computers has been the stimulus that enables the realization of Quantum Machine Learning (QML), an area that integrates the computational framework of quantum mechanics with the adaptive properties of classical machine learning. This article proposes a broad architecture that connects classical data pipelines with quantum algorithms; hybrid quantum-classical models emerge as a promising route to scalable, near-term quantum benefit. At the core of this architecture lies the Classical-Quantum (CQ) paradigm, in which high-dimensional classical data are encoded into qubit states using encoding strategies based on amplitude and angle of rotation, along with superposition mapping. These techniques allow exponential compression of information into Hilbert space representations, which, together with reduced sample complexity, allows greater feature expressivity. We also examine variational quantum circuits, in which quantum gates are expressed as trainable variables and run with classical optimizers to overcome the decoherence, noise, and gate-depth constraints of existing Noisy Intermediate-Scale Quantum (NISQ) devices. Experimental comparisons with a Quantum Naive Bayes classifier show that even small quantum circuits can approximate probabilistic inference with accuracy competitive with classical benchmarks, and have much better robustness to noisy data distributions. This model not only explains the algorithmic and architectural design of QML but also offers a roadmap for putting quantum kernels, variational algorithms, and hybrid feedback loops into practice, in areas including optimization, computer vision, and medical diagnostics. The results support the idea that hybrid architectures with strong data encoding and adaptive error protection are key to moving QML from theory to practice.

[478] arXiv:2502.14791 (replaced) [pdf, html, other]
Title: Rapid Word Learning Through Meta In-Context Learning
Wentao Wang, Guangyuan Jiang, Tal Linzen, Brenden M. Lake
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Humans can quickly learn a new word from a few illustrative examples, and then systematically and flexibly use it in novel contexts. Yet the abilities of current language models for few-shot word learning, and methods for improving these abilities, are underexplored. In this study, we introduce a novel method, Meta-training for IN-context learNing Of Words (Minnow). This method trains language models to generate new examples of a word's usage given a few in-context examples, using a special placeholder token to represent the new word. This training is repeated on many new words to develop a general word-learning ability. We find that training models from scratch with Minnow on human-scale child-directed language enables strong few-shot word learning, comparable to a large language model (LLM) pre-trained on orders of magnitude more data. Furthermore, through discriminative and generative evaluations, we demonstrate that finetuning pre-trained LLMs with Minnow improves their ability to discriminate between new words, identify syntactic categories of new words, and generate reasonable new usages and definitions for new words, based on one or a few in-context examples. These findings highlight the data efficiency of Minnow and its potential to improve language model performance in word learning tasks.

[479] arXiv:2502.15785 (replaced) [pdf, html, other]
Title: Investigating a Model-Agnostic and Imputation-Free Approach for Irregularly-Sampled Multivariate Time-Series Modeling
Abhilash Neog, Arka Daw, Sepideh Fatemi Khorasgani, Medha Sawhney, Aanish Pradhan, Mary E. Lofton, Bennett J. McAfee, Adrienne Breef-Pilz, Heather L. Wander, Dexter W Howard, Cayelan C. Carey, Paul Hanson, Anuj Karpatne
Comments: 21 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Modeling Irregularly-sampled and Multivariate Time Series (IMTS) is crucial across a variety of applications where different sets of variates may be missing at different time-steps due to sensor malfunctions or high data acquisition costs. Existing approaches for IMTS either consider a two-stage impute-then-model framework or involve specialized architectures specific to a particular model and task. We perform a series of experiments to derive novel insights about the performance of IMTS methods on a variety of semi-synthetic and real-world datasets for both classification and forecasting. We also introduce Missing Feature-aware Time Series Modeling (MissTSM), a novel model-agnostic and imputation-free approach for IMTS modeling. We show that MissTSM achieves competitive performance compared to other IMTS approaches, especially when the amount of missing values is large and the data lacks simplistic periodic structures - conditions common to real-world IMTS applications.

[480] arXiv:2502.18179 (replaced) [pdf, html, other]
Title: Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs
Gaye Colakoglu, Gürkan Solmaz, Jonathan Fürst
Comments: accepted at EMNLP'25
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

This paper defines and explores the design space for information extraction (IE) from layout-rich documents using large language models (LLMs). The three core challenges of layout-aware IE with LLMs are 1) data structuring, 2) model engagement, and 3) output refinement. Our study investigates the sub-problems and methods within these core challenges, such as input representation, chunking, prompting, selection of LLMs, and multimodal models. It examines the effect of different design choices through LayIE-LLM, a new, open-source, layout-aware IE test suite, benchmarking against traditional, fine-tuned IE models. The results on two IE datasets show that LLMs require adjustment of the IE pipeline to achieve competitive performance: the optimized configuration found with LayIE-LLM achieves 13.3 to 37.5 F1 points more than a general-practice baseline configuration using the same LLM. To find a well-working configuration, we develop a one-factor-at-a-time (OFAT) method that achieves near-optimal results. Our method is only 0.8 to 1.8 points lower than the best full factorial exploration with a fraction (2.8%) of the required computation. Overall, we demonstrate that, if well-configured, general-purpose LLMs match the performance of specialized models, providing a cost-effective, finetuning-free alternative. Our test suite is available at this https URL.
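
The one-factor-at-a-time idea mentioned in this abstract can be sketched generically. This is a minimal illustration and not the LayIE-LLM implementation: the factor names, their levels, the toy scoring function, and the greedy choice to carry each factor's best level forward are all assumptions.

```python
from typing import Any, Callable, Dict, List, Tuple

Config = Dict[str, Any]


def ofat_search(factors: Dict[str, List[Any]], baseline: Config,
                score: Callable[[Config], float]) -> Tuple[Config, float, int]:
    """Greedy one-factor-at-a-time search: vary one factor while the others
    stay fixed, keep the best level found, then move to the next factor.
    Returns the best configuration, its score, and the evaluation count."""
    best = dict(baseline)
    best_score = score(best)
    evaluations = 1
    for name, levels in factors.items():
        for level in levels:
            if level == best[name]:
                continue  # this configuration has already been scored
            candidate = {**best, name: level}
            candidate_score = score(candidate)
            evaluations += 1
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
    return best, best_score, evaluations


# Hypothetical design space and additive toy score (illustration only).
factors = {
    "chunk_size": [512, 1024, 2048],
    "input_format": ["text", "markdown"],
    "prompt_style": ["zero_shot", "few_shot"],
}
baseline = {"chunk_size": 512, "input_format": "text", "prompt_style": "zero_shot"}
GAIN = {512: 0, 1024: 2, 2048: 1, "text": 0, "markdown": 1,
        "zero_shot": 0, "few_shot": 3}


def toy_score(cfg: Config) -> float:
    return float(sum(GAIN[value] for value in cfg.values()))


best, best_score, n_evals = ofat_search(factors, baseline, toy_score)
print(best, best_score, n_evals)
```

On this toy additive score the search needs 5 evaluations, whereas a full factorial sweep over the same factors would need 3 x 2 x 2 = 12. When factors interact, a search of this kind can miss the best combination, which is consistent with the small gap to full factorial exploration reported in the abstract.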

[481] arXiv:2502.18893 (replaced) [pdf, html, other]
Title: Distributed Online Task Assignment via Inexact ADMM for unplanned online tasks and its Applications to Security
Ziqi Yang, Roberto Tron
Comments: IEEE Transactions on Control of Network Systems
Subjects: Multiagent Systems (cs.MA)

In multi-robot system (MRS) applications, efficient task assignment is essential not only for coordinating agents and ensuring mission success but also for maintaining overall system security. In this work, we first propose an optimization-based distributed task assignment algorithm that dynamically assigns mandatory security-critical tasks and optional tasks among teams. Leveraging an inexact Alternating Direction Method of Multipliers (ADMM)-based approach, we decompose the task assignment problem into separable and non-separable subproblems. The non-separable subproblems are transformed into an inexact ADMM update by projected gradient descent, which can be performed through several communication steps within the team.
In the second part of this paper, we formulate a comprehensive framework that enables MRS under plan-deviation attacks to handle online tasks without compromising security. The process begins with a security analysis that determines whether an online task can be executed securely by a robot and, if so, the required time and location for the robot to rejoin the team. Next, the proposed task assignment algorithm is used to allocate security-related tasks and verified online tasks. Finally, task fulfillment is managed using a Control Lyapunov Function (CLF)-based controller, while security enforcement is ensured through a Control Barrier Function (CBF)-based security filter. Through simulations, we demonstrate that the proposed framework allows MRS to effectively respond to unplanned online tasks while maintaining security guarantees.

[482] arXiv:2502.20336 (replaced) [pdf, html, other]
Title: A posteriori certification for neural network approximations to PDEs
Lewin Ernst, Nikolaos Rekatsinas, Karsten Urban
Subjects: Numerical Analysis (math.NA)

We propose rigorous lower and upper bounds for neural network (NN) approximations to PDEs by efficiently computing the Riesz representations of suitable extensions and restrictions of the NN residual towards geometrically simpler domains, which are either embedded in or enveloping the original domain. Error bounds are proven and detailed for elliptic as well as parabolic problems. Numerical experiments show the good quantitative behaviour of the derived upper and lower error bounds.

[483] arXiv:2502.21102 (replaced) [pdf, html, other]
Title: Minimal positive Markov realizations
Hamed Taghavian, Jens Sjölund
Subjects: Systems and Control (eess.SY)

Finding a positive state-space realization with the minimum dimension for a given transfer function is an open problem in control theory. In this paper, we focus on positive realizations in Markov form and propose a linear programming approach that computes them with a minimum dimension. This minimum dimension of positive Markov realizations is an upper bound on the minimal positive realization dimension. However, we show that these two dimensions are equal for certain systems.

[484] arXiv:2503.00479 (replaced) [pdf, html, other]
Title: Bayesian Active Learning for Multi-Criteria Comparative Judgement in Educational Assessment
Andy Gray, Alma Rahat, Tom Crick, Stephen Lindsay
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)

Comparative Judgement (CJ) provides an alternative assessment approach by evaluating work holistically rather than breaking it into discrete criteria. This method leverages human ability to make nuanced comparisons, yielding more reliable and valid assessments. CJ aligns with real-world evaluations, where overall quality emerges from the interplay of various elements. However, rubrics remain widely used in education, offering structured criteria for grading and detailed feedback. This creates a gap between CJ's holistic ranking and the need for criterion-based performance breakdowns.
This paper addresses this gap using a Bayesian approach. We build on Bayesian CJ (BCJ) by Gray et al., which directly models preferences instead of using likelihoods over total scores, allowing for expected ranks with uncertainty estimation. Their entropy-based active learning method selects the most informative pairwise comparisons for assessors. We extend BCJ to handle multiple independent learning outcome (LO) components, defined by a rubric, enabling both holistic and component-wise predictive rankings with uncertainty estimates. Additionally, we propose a method to aggregate entropies and identify the most informative comparison for assessors. Experiments on synthetic and real data demonstrate our method's effectiveness. Finally, we address a key limitation of BCJ, which is the inability to quantify assessor agreement. We show how to derive agreement levels, enhancing transparency in assessment.
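As a rough illustration of entropy-based pair selection, the sketch below picks the pairwise comparison whose predicted outcome is most uncertain. It assumes each pair already has a single predicted preference probability and uses plain binary entropy; the names `binary_entropy`, `most_informative_pair`, and the example probabilities are hypothetical and are not taken from the BCJ paper, whose actual criterion and multi-criteria aggregation may differ.

```python
import math


def binary_entropy(p):
    """Entropy (in bits) of a Bernoulli outcome with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def most_informative_pair(pref_probs):
    """Return the pair whose predicted preference is closest to a coin flip.

    pref_probs maps (item_i, item_j) to the predicted probability that
    item_i is preferred over item_j (hypothetical input format).
    """
    return max(pref_probs, key=lambda pair: binary_entropy(pref_probs[pair]))


# Hypothetical predicted preferences for three pieces of work.
probs = {("A", "B"): 0.95, ("A", "C"): 0.55, ("B", "C"): 0.20}
next_pair = most_informative_pair(probs)
```

Here the pair ("A", "C") is selected because its predicted preference (0.55) is the closest to 0.5, so an assessor's judgement on it would be expected to reduce uncertainty the most under this simplified criterion.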

[485] arXiv:2503.00634 (replaced) [pdf, html, other]
Title: Efficiently Editing Mixture-of-Experts Models with Compressed Experts
Yifei He, Yang Liu, Chen Liang, Hany Hassan Awadalla
Comments: EMNLP 2025 Findings
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Mixture-of-Experts (MoE) models have become a key approach for scaling large language models efficiently by activating only a subset of experts during training and inference. Typically, the number of activated experts presents a trade-off: fewer experts reduce computational costs, while more experts improve performance. Recent studies reveal that not all activated experts contribute equally to model performance, with some providing minimal utility, particularly when finetuning pretrained MoE models for specialized downstream tasks. The co-existence of significant and redundant parameters in experts provides an opportunity to reduce the number of activated experts while maintaining model performance. In this work, we propose the concept of compressed experts, lightweight modules that serve as compact representations of full experts. Our approach preserves the most important experts while replacing other auxiliary activated experts with compressed experts. The reduction of active parameters significantly lowers inference costs while achieving comparable performance. Extensive experiments on models including Phi-MoE and OLMoE demonstrate that compressed experts recover over 90% of full expert performance across various tasks while reducing active parameters by more than 30% and saving 20% in inference costs. This approach enables efficient deployment of MoE models in resource-constrained settings and facilitates scaling to larger models with manageable overhead. Our code is available at this https URL.

[486] arXiv:2503.03561 (replaced) [pdf, html, other]
Title: Transformer-Based Power Optimization for Max-Min Fairness in Cell-Free Massive MIMO
Irched Chafaa, Giacomo Bacci, Luca Sanguinetti
Comments: Journal: IEEE Wireless Communications Letters; Publication Date: August 2025
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Power allocation is an important task in wireless communication networks. Classical optimization algorithms and deep learning methods, while effective in small and static scenarios, become either computationally demanding or unsuitable for large and dynamic networks with varying user loads. This letter explores the potential of transformer-based deep learning models to address these challenges. We propose a transformer neural network to jointly predict optimal uplink and downlink power using only user and access point positions. The max-min fairness problem in cell-free massive multiple-input multiple-output systems is considered. Numerical results show that the trained model provides near-optimal performance and adapts to varying numbers of users and access points without retraining, additional processing, or updating its neural network architecture. This demonstrates the effectiveness of the proposed model in achieving robust and flexible power allocation for dynamic networks.

[487] arXiv:2503.05546 (replaced) [pdf, html, other]
Title: Impoola: The Power of Average Pooling for Image-Based Deep Reinforcement Learning
Raphael Trumpp, Ansgar Schäfftlein, Mirco Theile, Marco Caccamo
Comments: Reinforcement Learning Conference 2025
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

As image-based deep reinforcement learning tackles more challenging tasks, increasing model size has become an important factor in improving performance. Recent studies achieved this by focusing on the parameter efficiency of scaled networks, typically using Impala-CNN, a 15-layer ResNet-inspired network, as the image encoder. However, while Impala-CNN evidently outperforms older CNN architectures, potential advancements in network design for deep reinforcement learning-specific image encoders remain largely unexplored. We find that replacing the flattening of output feature maps in Impala-CNN with global average pooling leads to a notable performance improvement. This approach outperforms larger and more complex models in the Procgen Benchmark, particularly in terms of generalization. We call our proposed encoder model Impoola-CNN. A decrease in the network's translation sensitivity may be central to this improvement, as we observe the most significant gains in games without agent-centered observations. Our results demonstrate that network scaling is not just about increasing model size - efficient network design is also an essential factor. We make our code available at this https URL.
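The difference between flattening and global average pooling can be illustrated with a minimal pure-Python sketch. It is not the Impoola-CNN code; the helper names and the toy feature maps are hypothetical, and the example only shows that per-channel averaging yields a shorter output that does not change when activations shift spatially, whereas a flattened vector does.

```python
def flatten(feature_maps):
    """Flatten C x H x W feature maps into one vector of length C*H*W."""
    return [v for channel in feature_maps for row in channel for v in row]


def global_average_pool(feature_maps):
    """Average each channel over all spatial positions (output length C)."""
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Two 2x3 channels with hypothetical activations.
maps = [
    [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 2.0, 0.0], [0.0, 4.0, 0.0]],
]
# The same activations shifted one column to the right.
shifted = [
    [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 2.0], [0.0, 0.0, 4.0]],
]

flat_changes = flatten(maps) != flatten(shifted)
pooled_changes = global_average_pool(maps) != global_average_pool(shifted)
```

In this toy case the flattened vectors differ after the shift while the pooled vectors are identical, which is in line with the abstract's hypothesis that reduced translation sensitivity contributes to the improvement. The pooled output also has one value per channel instead of one per pixel, so the layer that follows needs far fewer input weights.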

[488] arXiv:2503.05720 (replaced) [pdf, html, other]
Title: That is Unacceptable: the Moral Foundations of Canceling
Soda Marem Lo, Oscar Araque, Rajesh Sharma, Marco Antonio Stranisci
Subjects: Computers and Society (cs.CY); Computation and Language (cs.CL)

Canceling is a morally-driven phenomenon that hinders the development of safe social media platforms and contributes to ideological polarization. To address this issue we present the Canceling Attitudes Detection (CADE) dataset, an annotated corpus of canceling incidents aimed at exploring the factors of disagreements in evaluating people's canceling attitudes on social media. Specifically, we study the impact of annotators' morality in their perception of canceling, showing that morality is an independent axis for the explanation of disagreement on this phenomenon. Annotators' judgments heavily depend on the type of controversial events and involved celebrities. This shows the need to develop more event-centric datasets to better understand how harms are perpetrated on social media and to develop more aware technologies for their detection.

[489] arXiv:2503.07702 (replaced) [pdf, html, other]
Title: A Reliable Self-Organized Distributed Complex Network for Communication of Smart Agents
Mehdi Bakhshipoor, Yousef Azizi, Seyed Ehsan Nedaaee Oskoee
Subjects: Multiagent Systems (cs.MA)

Collaboration is a fundamental and essential characteristic of many complex systems, ranging from ant colonies to human societies. Each component within a complex system interacts with others, even at a distance, to accomplish a given task. A network of collaboration can be defined to study the collective behavior of such systems within the framework of complex networks. The nodes in these networks may represent simple organisms or more sophisticated intelligent agents, such as humans. In this study, we utilize intelligent agents (nodes) trained through reinforcement learning techniques to establish connections with their neighbors, ultimately leading to the emergence of a large-scale communication cluster. Notably, there is no centralized administrator; instead, agents must adjust their connections based on information obtained from local observations. The connection strategy is formulated using a physical Hamiltonian, thereby categorizing this intelligent system under the paradigm of "Physics-Guided Machine Learning". The resulting self-organized distributed complex network has numerous industrial applications, including constructing Internet of Things (IoT) networks. The design of such networks often encounters challenges, the most critical of which is ensuring effective connectivity for reliable communication while optimizing energy consumption. IoT networks are inherently dynamic in many real-world applications, such as Vehicle Ad-hoc Networks (VANETs), where nodes are mobile, and the connection topology evolves rapidly over time. These systems require a robust and rapidly self-organizing communication network. Our findings demonstrate that the proposed intelligent agents facilitate the formation of self-organized complex networks capable of maintaining network-wide connectivity across various dynamic scenarios while simultaneously optimizing average electrical power consumption.

[490] arXiv:2503.08580 (replaced) [pdf, html, other]
Title: Comparing Next-Day Wildfire Predictability of MODIS and VIIRS Satellite Data
Justus Karlsson, Yonghao Xu, Amanda Berg, Leif Haglund
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Multiple studies have performed next-day fire prediction using satellite imagery. Two main satellites are used to detect wildfires: MODIS and VIIRS. Both satellites provide fire mask products, called MOD14 and VNP14, respectively. Studies have used one or the other, but there has been no comparison between them to determine which might be more suitable for next-day fire prediction. In this paper, we first evaluate how well VIIRS and MODIS data can be used to forecast wildfire spread one day ahead. We find that the model using VIIRS as input and VNP14 as target achieves the best results. Interestingly, the model using MODIS as input and VNP14 as target performs significantly better than using VNP14 as input and MOD14 as target. Next, we discuss why MOD14 might be harder to use for predicting next-day fires. We find that the MOD14 fire mask is highly stochastic and does not correlate with reasonable fire spread patterns. This is detrimental for machine learning tasks, as the model learns irrational patterns. Therefore, we conclude that MOD14 is unsuitable for next-day fire prediction and that VNP14 is a much better option. However, using MODIS input and VNP14 as target, we achieve a significant improvement in predictability. This indicates that an improved fire detection model is possible for MODIS. The full code and dataset are available online: this https URL

[491] arXiv:2503.11427 (replaced) [pdf, html, other]
Title: FlowKac: An Efficient Neural Fokker-Planck solver using Temporal Normalizing Flows and the Feynman-Kac Formula
Naoufal El Bekri, Lucas Drumetz, Franck Vermet
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Machine Learning (stat.ML)

Solving the Fokker-Planck equation for high-dimensional complex dynamical systems remains a pivotal yet challenging task due to the intractability of analytical solutions and the limitations of traditional numerical methods. In this work, we present FlowKac, a novel approach that reformulates the Fokker-Planck equation using the Feynman-Kac formula, allowing the solution to be queried at a given point via the expected values of stochastic paths. A key innovation of FlowKac lies in its adaptive stochastic sampling scheme, which significantly reduces the computational complexity while maintaining high accuracy. This sampling technique, coupled with a time-indexed normalizing flow designed for capturing time-evolving probability densities, enables robust sampling of collocation points, resulting in a flexible and mesh-free solver. This formulation mitigates the curse of dimensionality and enhances computational efficiency and accuracy, which is particularly crucial for applications that inherently require dimensions beyond the conventional three. We validate the robustness and scalability of our method through various experiments on a range of stochastic differential equations, demonstrating significant improvements over existing techniques.

[492] arXiv:2503.12137 (replaced) [pdf, html, other]
Title: A State Alignment-Centric Approach to Federated System Identification: The FedAlign Framework
Ertuğrul Keçeci, Müjde Güzelkaya, Tufan Kumbasar
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)

This paper presents FedAlign, a Federated Learning (FL) framework particularly designed for System Identification (SYSID) tasks by aligning state representations. Local workers can learn State-Space Models (SSMs) with equivalent representations but different dynamics. We demonstrate that directly aggregating these local SSMs via FedAvg results in a global model with altered system dynamics. FedAlign overcomes this problem by employing similarity transformation matrices to align state representations of local SSMs, thereby establishing a common parameter basin that retains the dynamics of local SSMs. FedAlign computes similarity transformation matrices via two distinct approaches: FedAlign-A and FedAlign-O. In FedAlign-A, we represent the global SSM in controllable canonical form (CCF). We apply control theory to analytically derive similarity transformation matrices that convert each local SSM into this form. Yet, establishing the global SSM in CCF brings additional alignment challenges in multi-input multi-output SYSID, as the CCF representation is not unique, unlike in single-input single-output SYSID. In FedAlign-O, we address these alignment challenges by reformulating the local parameter basin alignment problem as an optimization task. We determine the parameter basin of a local worker as the common parameter basin and solve least-squares problems to obtain the similarity transformation matrices needed to align the remaining local SSMs. Through experiments conducted on synthetic and real-world datasets, we show that FedAlign outperforms FedAvg, converges faster, and provides improved stability of the global SSM thanks to the efficient alignment of local parameter basins.
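Why naive averaging can alter dynamics, and why alignment helps, can be seen in a small sketch. It is not the FedAlign implementation: the two discrete-time SISO models, the transformation T, and all helper names are hypothetical, and T is assumed to be known here, whereas FedAlign derives it analytically or by least squares. The sketch uses the standard similarity transformation (A, B, C) -> (T A T^-1, T B, C T^-1), which leaves the Markov parameters C A^k B unchanged.

```python
from fractions import Fraction as F


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transform(model, T, T_inv):
    """Apply the similarity transformation x' = T x to (A, B, C)."""
    A, B, C = model
    return (matmul(matmul(T, A), T_inv), matmul(T, B), matmul(C, T_inv))


def average(m1, m2):
    """Element-wise parameter average of two models (FedAvg-style)."""
    return tuple([[(a + b) / 2 for a, b in zip(r1, r2)]
                  for r1, r2 in zip(M1, M2)] for M1, M2 in zip(m1, m2))


def markov(model, n):
    """First n Markov parameters C A^k B of a SISO model."""
    A, B, C = model
    out, v = [], B
    for _ in range(n):
        out.append(matmul(C, v)[0][0])
        v = matmul(A, v)
    return out


# Worker 1: a second-order model in companion form (hypothetical values).
local_1 = ([[F(0), F(1)], [F(-1, 8), F(3, 4)]], [[F(0)], [F(1)]], [[F(1), F(0)]])
# Worker 2: the same input-output behaviour in a rescaled state basis.
T = [[F(2), F(0)], [F(0), F(1)]]
T_inv = [[F(1, 2), F(0)], [F(0), F(1)]]
local_2 = transform(local_1, T, T_inv)

naive = average(local_1, local_2)              # average without alignment
aligned_2 = transform(local_2, T_inv, T)       # map worker 2 back to basis 1
aligned = average(local_1, aligned_2)          # average after alignment
```

In this toy setup both workers have identical Markov parameters, yet the naive average does not reproduce them, while the average taken after mapping worker 2 back into worker 1's basis does. Exact rational arithmetic is used only so that the comparison is free of rounding effects.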

[493] arXiv:2503.12615 (replaced) [pdf, other]
Title: LATINO-PRO: LAtent consisTency INverse sOlver with PRompt Optimization
Alessio Spagnoletti, Jean Prost, Andrés Almansa, Nicolas Papadakis, Marcelo Pereyra
Comments: 27 pages, 24 figures, International Conference on Computer Vision, ICCV 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Text-to-image latent diffusion models (LDMs) have recently emerged as powerful generative models with great potential for solving inverse problems in imaging. However, leveraging such models in a Plug & Play (PnP), zero-shot manner remains challenging because it requires identifying a suitable text prompt for the unknown image of interest. Also, existing text-to-image PnP approaches are highly computationally expensive. We herein address these challenges by proposing a novel PnP inference paradigm specifically designed for embedding generative models within stochastic inverse solvers, with special attention to Latent Consistency Models (LCMs), which distill LDMs into fast generators. We leverage our framework to propose LAtent consisTency INverse sOlver (LATINO), the first zero-shot PnP framework to solve inverse problems with priors encoded by LCMs. Our conditioning mechanism avoids automatic differentiation and reaches SOTA quality in as little as 8 neural function evaluations. As a result, LATINO delivers remarkably accurate solutions and is significantly more memory and computationally efficient than previous approaches. We then embed LATINO within an empirical Bayesian framework that automatically calibrates the text prompt from the observed measurements by marginal maximum likelihood estimation. Extensive experiments show that prompt self-calibration greatly improves estimation, allowing LATINO with PRompt Optimization to define new SOTAs in image reconstruction quality and computational efficiency. The code is available at this https URL

[494] arXiv:2503.14494 (replaced) [pdf, html, other]
Title: Deeply Supervised Flow-Based Generative Models
Inkyu Shin, Chenglin Yang, Liang-Chieh Chen
Comments: Accepted to ICCV 2025. Project website at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Flow-based generative models have charted an impressive path across multiple visual generation tasks by adhering to a simple principle: learning velocity representations of a linear interpolant. However, we observe that training velocity solely from the final-layer output underutilizes the rich inter-layer representations, potentially impeding model convergence. To address this limitation, we introduce DeepFlow, a novel framework that enhances velocity representation through inter-layer communication. DeepFlow partitions transformer layers into balanced branches with deep supervision and inserts a lightweight Velocity Refiner with Acceleration (VeRA) block between adjacent branches, which aligns the intermediate velocity features within transformer blocks. Powered by the improved deep supervision via the internal velocity alignment, DeepFlow converges 8 times faster on ImageNet with equivalent performance and further reduces FID by 2.6 while halving training time compared to previous flow-based models without classifier-free guidance. DeepFlow also outperforms baselines in text-to-image generation tasks, as evidenced by evaluations on MSCOCO and zero-shot GenEval.

[495] arXiv:2503.15867 (replaced) [pdf, html, other]
Title: TruthLens: Visual Grounding for Universal DeepFake Reasoning
Rohit Kundu, Shan Jia, Vishal Mohanty, Athula Balachandran, Amit K. Roy-Chowdhury
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Detecting DeepFakes has become a crucial research area as the widespread use of AI image generators enables the effortless creation of face-manipulated and fully synthetic content, while existing methods are often limited to binary classification (real vs. fake) and lack interpretability. To address these challenges, we propose TruthLens, a novel, unified, and highly generalizable framework that goes beyond traditional binary classification, providing detailed, textual reasoning for its predictions. Distinct from conventional methods, TruthLens performs MLLM grounding.
TruthLens uses a task-driven representation integration strategy that unites global semantic context from a multimodal large language model (MLLM) with region-specific forensic cues through explicit cross-modal adaptation of a vision-only model. This enables nuanced, region-grounded reasoning for both face-manipulated and fully synthetic content, and supports fine-grained queries such as "Does the eyes/nose/mouth look real or fake?" -- capabilities beyond pretrained MLLMs alone. Extensive experiments across diverse datasets demonstrate that TruthLens sets a new benchmark in both forensic interpretability and detection accuracy, generalizing to seen and unseen manipulations alike. By unifying high-level scene understanding with fine-grained region grounding, TruthLens delivers transparent DeepFake forensics, bridging a critical gap in the literature.

[496] arXiv:2503.16423 (replaced) [pdf, html, other]
Title: GAEA: A Geolocation Aware Conversational Assistant
Ron Campos, Ashmal Vayani, Parth Parag Kulkarni, Rohit Gupta, Aizan Zafar, Aritra Dutta, Mubarak Shah
Comments: The dataset and code used in this submission are available at: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Image geolocalization, in which an AI model traditionally predicts the precise GPS coordinates of an image, is a challenging task with many downstream applications. However, the user cannot utilize the model to further their knowledge beyond the GPS coordinates; the model lacks an understanding of the location and the conversational ability to communicate with the user. Recently, with the tremendous progress of large multimodal models (LMMs) -- proprietary and open-source -- researchers have attempted to geolocalize images via LMMs. However, the issues remain unaddressed; beyond general tasks, for more specialized downstream tasks, such as geolocalization, LMMs struggle. In this work, we propose solving this problem by introducing a conversational model, GAEA, that provides information regarding the location of an image as the user requires. No large-scale dataset enabling the training of such a model exists. Thus, we propose GAEA-1.4M, a comprehensive dataset comprising over 800k images and approximately 1.4M question-answer pairs, constructed by leveraging OpenStreetMap (OSM) attributes and geographical context clues. For quantitative evaluation, we propose a diverse benchmark, GAEA-Bench, comprising 3.5k image-text pairs to evaluate conversational capabilities equipped with diverse question types. We consider 11 state-of-the-art open-source and proprietary LMMs and demonstrate that GAEA significantly outperforms the best open-source model, LLaVA-OneVision, by 18.2% and the best proprietary model, GPT-4o, by 7.2%. Our dataset, model, and code are available.

[497] arXiv:2503.18659 (replaced) [pdf, html, other]
Title: A filtered two-step variational integrator for charged-particle dynamics in a moderate or strong magnetic field
Ting Li, Bin Wang
Subjects: Numerical Analysis (math.NA)

This article is concerned with a new filtered two-step variational integrator for solving the charged-particle dynamics in a mildly non-homogeneous moderate or strong magnetic field with a dimensionless parameter $\epsilon$ inversely proportional to the strength of the magnetic field. In the case of a moderate magnetic field ($\epsilon=1$), second-order error bounds and long-time energy and momentum conservation are obtained. Moreover, the proof of the long-term analysis is accomplished by backward error analysis. For the strong magnetic field ($0<\epsilon \ll 1$), this paper clarifies the behaviour of the filtered variational integrator for both a large stepsize $h^2 \geq C\epsilon$ and a smaller stepsize $h \leq c\epsilon$. The approach to analysing the error bounds for these two stepsizes is based on comparing the modulated Fourier expansions of the exact and the numerical solutions. It is shown that the proposed integrator achieves a second-order accuracy $\mathcal{O}(h^2)$ in the position and in the parallel velocity for a large stepsize and an $\mathcal{O}(\epsilon)$ accuracy for a smaller stepsize. This paper also yields the long-time energy and magnetic moment conservation for the strong magnetic field by developing the modulated Fourier expansion of the proposed scheme. All the theoretical results on the error behaviour and long-term conservation are numerically demonstrated by four numerical experiments.

[498] arXiv:2503.23768 (replaced) [pdf, html, other]
Title: Texture or Semantics? Vision-Language Models Get Lost in Font Recognition
Zhecheng Li, Guoxian Song, Yujun Cai, Zhen Xiong, Junsong Yuan, Yiwei Wang
Comments: Accepted to COLM 2025
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

Modern Vision-Language Models (VLMs) exhibit remarkable visual and linguistic capabilities, achieving impressive performance in various tasks such as image recognition and object localization. However, their effectiveness in fine-grained tasks remains an open question. In everyday scenarios, individuals encountering design materials, such as magazines, typography tutorials, research papers, or branding content, may wish to identify aesthetically pleasing fonts used in the text. Given their multimodal capabilities and free accessibility, many VLMs are often considered potential tools for font recognition. This raises a fundamental question: Do VLMs truly possess the capability to recognize fonts? To investigate this, we introduce the Font Recognition Benchmark (FRB), a compact and well-structured dataset comprising 15 commonly used fonts. FRB includes two versions: (i) an easy version, where 10 sentences are rendered in different fonts, and (ii) a hard version, where each text sample consists of the names of the 15 fonts themselves, introducing a Stroop effect that challenges model perception. Through extensive evaluation of various VLMs on font recognition tasks, we arrive at the following key findings: (i) Current VLMs exhibit limited font recognition capabilities, with many state-of-the-art models failing to achieve satisfactory performance and being easily affected by the Stroop effect introduced by textual information. (ii) Few-shot learning and Chain-of-Thought (CoT) prompting provide minimal benefits in improving font recognition accuracy across different VLMs. (iii) Attention analysis sheds light on the inherent limitations of VLMs in capturing semantic features.

[499] arXiv:2503.24342 (replaced) [pdf, html, other]
Title: Coordinating Distributed Energy Resources with Nodal Pricing in Distribution Networks: a Game-Theoretic Approach
Eli Brock, Jingqi Li, Javad Lavaei, Somayeh Sojoudi
Subjects: Systems and Control (eess.SY)

We propose a real-time nodal pricing mechanism for cost minimization and voltage control in a distribution network with autonomous distributed energy resources and analyze the resulting market using stochastic game theory. Unlike existing methods, the proposed pricing scheme does not require device-aware centralized coordination or communication between prosumers. By developing new sufficient conditions under which a stochastic game is a Markov potential game, we show that the problem of computing an equilibrium for the proposed model is equivalent to solving a single-agent Markov Decision Process. These new conditions are general and may apply to other applications. We compute the equilibrium for an IEEE test system to empirically demonstrate the effectiveness of the pricing policy.

[500] arXiv:2504.00389 (replaced) [pdf, other]
Title: CyberBOT: Towards Reliable Cybersecurity Education via Ontology-Grounded Retrieval Augmented Generation
Chengshuai Zhao, Riccardo De Maria, Tharindu Kumarage, Kumar Satvik Chaudhary, Garima Agrawal, Yiwen Li, Jongchan Park, Yuli Deng, Ying-Chih Chen, Huan Liu
Comments: Accepted by The Conference on Information and Knowledge Management (CIKM) 2025
Subjects: Artificial Intelligence (cs.AI)

Advancements in large language models (LLMs) have enabled the development of intelligent educational tools that support inquiry-based learning across technical domains. In cybersecurity education, where accuracy and safety are paramount, systems must go beyond surface-level relevance to provide information that is both trustworthy and domain-appropriate. To address this challenge, we introduce CyberBOT, a question-answering chatbot that leverages a retrieval-augmented generation (RAG) pipeline to incorporate contextual information from course-specific materials and validate responses using a domain-specific cybersecurity ontology. The ontology serves as a structured reasoning layer that constrains and verifies LLM-generated answers, reducing the risk of misleading or unsafe guidance. CyberBOT has been deployed in a large graduate-level course at Arizona State University (ASU), where more than one hundred students actively engage with the system through a dedicated web-based platform. Computational evaluations in lab environments highlight the potential capacity of CyberBOT, and a forthcoming field study will evaluate its pedagogical impact. By integrating structured domain reasoning with modern generative capabilities, CyberBOT illustrates a promising direction for developing reliable and curriculum-aligned AI applications in specialized educational contexts.

[501] arXiv:2504.00397 (replaced) [pdf, html, other]
Title: Control Barrier Function Synthesis for Nonlinear Systems with Dual Relative Degree
Gilbert Bahati, Ryan K. Cosner, Max H. Cohen, Ryan M. Bena, Aaron D. Ames
Subjects: Systems and Control (eess.SY)

Control barrier functions (CBFs) are a powerful tool for synthesizing safe control actions; however, constructing CBFs remains difficult for general nonlinear systems. In this work, we provide a constructive framework for synthesizing CBFs for systems with dual relative degree -- where different inputs influence the outputs at two different orders of differentiation; this is common in systems with orientation-based actuation, such as unicycles and quadrotors. In particular, we propose dual relative degree CBFs (DRD-CBFs) and show that these DRD-CBFs can be constructively synthesized and used to guarantee system safety. Our method constructs DRD-CBFs by leveraging the dual relative degree property -- combining a CBF for an integrator chain with a Lyapunov function certifying the tracking of safe inputs generated for this linear system. We apply these results to dual relative degree systems, both in simulation and experimentally on hardware using quadruped and quadrotor robotic platforms.
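For readers unfamiliar with CBF safety filters, the following minimal sketch shows the basic idea on a one-dimensional single integrator with relative degree one. It does not implement the DRD-CBF construction described in the abstract; the system, the barrier h(x) = x_max - x, the gain alpha, and all names are hypothetical choices. For dynamics dx/dt = u, the condition dh/dt >= -alpha * h reduces to u <= alpha * (x_max - x), so the minimally invasive filter has a closed form.

```python
def safety_filter(u_nom, x, x_max=1.0, alpha=2.0):
    """Return the input closest to u_nom that satisfies the CBF condition.

    Barrier: h(x) = x_max - x (safe when h >= 0), dynamics: dx/dt = u.
    Condition dh/dt >= -alpha * h is equivalent to u <= alpha * (x_max - x).
    """
    return min(u_nom, alpha * (x_max - x))


def simulate(x0, u_nom, steps=50, dt=0.1):
    """Forward-Euler rollout of the filtered single integrator."""
    x, trajectory = x0, [x0]
    for _ in range(steps):
        x = x + dt * safety_filter(u_nom, x)
        trajectory.append(x)
    return trajectory


# The nominal input pushes the state towards the boundary x_max = 1.
trajectory = simulate(x0=0.0, u_nom=1.0)
```

Far from the boundary the nominal input passes through unchanged; close to it the filter scales the input down so the state approaches x_max without crossing it. With the step size used here (alpha * dt <= 1), the discrete rollout also stays within the safe set.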

[502] arXiv:2504.00969 (replaced) [pdf, html, other]
Title: HDVIO2.0: Wind and Disturbance Estimation with Hybrid Dynamics VIO
Giovanni Cioffi, Leonard Bauersfeld, Davide Scaramuzza
Comments: Transactions on Robotics (T-RO) 2025
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Visual-inertial odometry (VIO) is widely used for state estimation in autonomous micro aerial vehicles using onboard sensors. Current methods improve VIO by incorporating a model of the translational vehicle dynamics, yet their performance degrades when faced with low-accuracy vehicle models or continuous external disturbances, like wind. Additionally, incorporating rotational dynamics in these models is computationally intractable when they are deployed in online applications, e.g., in a closed-loop control system. We present HDVIO2.0, which models full 6-DoF, translational and rotational, vehicle dynamics and tightly incorporates them into a VIO with minimal impact on the runtime. HDVIO2.0 builds upon the previous work, HDVIO, and addresses these challenges through a hybrid dynamics model combining a point-mass vehicle model with a learning-based component, with access to control commands and IMU history, to capture complex aerodynamic effects. The key idea behind modeling the rotational dynamics is to represent them with continuous-time functions. HDVIO2.0 leverages the divergence between the actual motion and the predicted motion from the hybrid dynamics model to estimate external forces as well as the robot state. Our system surpasses the performance of state-of-the-art methods in experiments using public and new drone dynamics datasets, as well as real-world flights in winds up to 25 km/h. Unlike existing approaches, we also show that accurate vehicle dynamics predictions are achievable without precise knowledge of the full vehicle state.

[503] arXiv:2504.01086 (replaced) [pdf, html, other]
Title: MPCritic: A plug-and-play MPC architecture for reinforcement learning
Nathan P. Lawrence, Thomas Banker, Ali Mesbah
Comments: CDC 2025 final version
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)

The reinforcement learning (RL) and model predictive control (MPC) communities have developed vast ecosystems of theoretical approaches and computational tools for solving optimal control problems. Given their conceptual similarities but differing strengths, there has been increasing interest in synergizing RL and MPC. However, existing approaches tend to be limited for various reasons, including the computational cost of MPC in an RL algorithm and software hurdles towards seamless integration of MPC and RL tools. These challenges often result in the use of "simple" MPC schemes or RL algorithms, neglecting the state-of-the-art in both areas. This paper presents MPCritic, a machine learning-friendly architecture that interfaces seamlessly with MPC tools. MPCritic utilizes the loss landscape defined by a parameterized MPC problem, focusing on "soft" optimization over batched training steps, thereby updating the MPC parameters while avoiding costly minimization and parametric sensitivities. Since the MPC structure is preserved during training, an MPC agent can be readily used for online deployment, where robust constraint satisfaction is paramount. We demonstrate the versatility of MPCritic, in terms of MPC architectures and RL algorithms that it can accommodate, on classic control benchmarks.

[504] arXiv:2504.01096 (replaced) [pdf, html, other]
Title: Efficient State Estimation of a Networked FlipIt Model
Brandon Collins, Thomas Gherna, Keith Paarporn, Shouhuai Xu, Philip N. Brown
Subjects: Cryptography and Security (cs.CR)

The Boolean Kalman Filter and associated Boolean Dynamical System Theory have been proposed to study the spread of infection on computer networks. Such models feature a network through which attacks propagate, an intrusion detection system that provides noisy signals of the true state of the network, and the capability of the defender to clean a subset of computers at any time. The Boolean Kalman Filter has been used to solve the optimal estimation problem, by estimating the hidden true state given the attack-defense dynamics and noisy observations. However, this algorithm is intractable because it runs in exponential time and space with respect to the network size. We address this feasibility problem by proposing a mean-field estimation approach, which is inspired by the epidemic modeling literature. Although our approach is heuristic, we prove that our estimator exactly matches the optimal estimator in certain non-trivial cases. We conclude by using simulations to show both the run-time improvement and estimation accuracy of our approach.

[505] arXiv:2504.09766 (replaced) [pdf, html, other]
Title: On the representation of stack operators by mathematical morphology
Diego Marcondes
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper introduces the class of grey-scale image stack operators as those that (a) map binary images into binary images and (b) commute on average with cross-sectioning. Equivalently, stack operators are 1-Lipschitz extensions of set operators which can be represented by applying a characteristic set operator to the cross-sections of the image and adding. In particular, they are a generalisation of stack filters, for which the characteristic set operators are increasing. Our main result is that stack operators inherit lattice properties of the characteristic set operators. We focus on the case of translation-invariant and locally defined stack operators and show the main result by deducing the characteristic function, kernel, and basis representation of stack operators. The results of this paper have implications for the design of image operators, since they imply that to solve some grey-scale image processing problems it is enough to design an operator that performs the desired transformation on binary images, and then consider its extension given by a stack operator. We leave many topics for future research regarding the machine learning of stack operators and the characterisation of the image processing problems that can be solved by them.
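
The representation described above (apply a characteristic set operator to each cross-section, then add) can be illustrated on a 1D signal. The following is a minimal sketch, not the paper's construction: the function names, the 3-point majority operator, and the replicated-border convention are all assumptions made for illustration. Majority is an increasing operator, so this particular instance is a stack filter and reproduces the grey-scale 3-point median.

```python
def cross_section(signal, t):
    """Binary cross-section of a grey-scale signal at level t: 1 where signal >= t."""
    return [1 if v >= t else 0 for v in signal]


def binary_majority3(bits):
    """Hypothetical characteristic set operator: 3-point majority, replicated borders."""
    padded = [bits[0]] + list(bits) + [bits[-1]]
    return [1 if padded[i] + padded[i + 1] + padded[i + 2] >= 2 else 0
            for i in range(len(bits))]


def stack_operator(signal, set_operator, max_level):
    """Apply set_operator to every cross-section (levels 1..max_level) and add."""
    out = [0] * len(signal)
    for t in range(1, max_level + 1):
        filtered = set_operator(cross_section(signal, t))
        out = [o + f for o, f in zip(out, filtered)]
    return out


signal = [0, 3, 1, 2, 2, 0]
result = stack_operator(signal, binary_majority3, max_level=3)
print(result)  # [0, 1, 2, 2, 2, 0]
```

With `max_level=1` and a binary input, the output is the set operator's own binary output, which matches property (a) in the abstract.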

[506] arXiv:2504.09833 (replaced) [pdf, html, other]
Title: PPF: Pre-training and Preservative Fine-tuning of Humanoid Locomotion via Model-Assumption-based Regularization
Hyunyoung Jung, Zhaoyuan Gu, Ye Zhao, Hae-Won Park, Sehoon Ha
Subjects: Robotics (cs.RO)

Humanoid locomotion is a challenging task due to its inherent complexity and high-dimensional dynamics, as well as the need to adapt to diverse and unpredictable environments. In this work, we introduce a novel learning framework for effectively training a humanoid locomotion policy that imitates the behavior of a model-based controller while extending its capabilities to handle more complex locomotion tasks, such as more challenging terrain and higher velocity commands. Our framework consists of three key components: pre-training through imitation of the model-based controller, fine-tuning via reinforcement learning, and model-assumption-based regularization (MAR) during fine-tuning. In particular, MAR aligns the policy with actions from the model-based controller only in states where the model assumption holds to prevent catastrophic forgetting. We evaluate the proposed framework through comprehensive simulation tests and hardware experiments on a full-size humanoid robot, Digit, demonstrating a forward speed of 1.5 m/s and robust locomotion across diverse terrains, including slippery, sloped, uneven, and sandy terrains.
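
The idea of regularizing only where the model assumption holds can be sketched as a masked penalty. This is an illustrative guess at the shape of such a term, not the authors' loss: the squared-error form, the function name `mar_penalty`, and the boolean mask `assumption_holds` are assumptions.

```python
def mar_penalty(policy_actions, controller_actions, assumption_holds):
    """Mean squared action gap over the states where the model assumption holds."""
    total, count = 0.0, 0
    for pi_a, mb_a, ok in zip(policy_actions, controller_actions, assumption_holds):
        if not ok:
            continue  # no regularization where the model assumption is violated
        total += sum((p - m) ** 2 for p, m in zip(pi_a, mb_a))
        count += 1
    return total / count if count else 0.0


# Two states: the assumption holds in the first only, so the second is ignored.
penalty = mar_penalty(
    policy_actions=[[1.0, 0.0], [0.0, 0.0]],
    controller_actions=[[0.0, 0.0], [5.0, 5.0]],
    assumption_holds=[True, False],
)
print(penalty)  # 1.0
```

In a fine-tuning loop, a term like this would presumably be added to the reinforcement learning objective with some weight, which is likewise an assumption here.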

[507] arXiv:2504.13231 (replaced) [pdf, html, other]
Title: WildFireCan-MMD: A Multimodal Dataset for Classification of User-Generated Content During Wildfires in Canada
Braeden Sherritt, Isar Nejadgholi, Efstratios Aivaliotis, Khaled Mslmani, Marzieh Amini
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Rapid information access is vital during wildfires, yet traditional data sources are slow and costly. Social media offers real-time updates, but extracting relevant insights remains a challenge. In this work, we focus on multimodal wildfire social media data, which, although existing in current datasets, is currently underrepresented in Canadian contexts. We present WildFireCan-MMD, a new multimodal dataset of X posts from recent Canadian wildfires, annotated across twelve key themes. We evaluate zero-shot vision-language models on this dataset and compare their results with those of custom-trained and baseline classifiers. We show that while baseline methods and zero-shot prompting offer quick deployment, custom-trained models outperform them when labelled data is available. Our best-performing custom model reaches 84.48% f-score, outperforming VLMs and baseline classifiers. We also demonstrate how this model can be used to uncover trends during wildfires, through the collection and analysis of a large unlabeled dataset. Our dataset facilitates future research in wildfire response, and our findings highlight the importance of tailored datasets and task-specific training. Importantly, such datasets should be localized, as disaster response requirements vary across regions and contexts.

[508] arXiv:2504.13529 (replaced) [pdf, html, other]
Title: Improving Bayesian Optimization for Portfolio Management with an Adaptive Scheduling
Zinuo You, John Cartlidge, Karen Elliott, Menghan Ge, Daniel Gold
Comments: 5 pages, 2 figures; author manuscript accepted for ICAAI 2025, 9th International Conference on Advances in Artificial Intelligence, Nov 2025, Manchester, UK
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Computational Finance (q-fin.CP); Portfolio Management (q-fin.PM)

Existing black-box portfolio management systems are prevalent in the financial industry due to commercial and safety constraints, though their performance can fluctuate dramatically with changing market regimes. Evaluating these non-transparent systems is computationally expensive, as fixed budgets limit the number of possible observations. Therefore, achieving stable and sample-efficient optimization for these systems has become a critical challenge. This work presents a novel Bayesian optimization framework (TPE-AS) that improves search stability and efficiency for black-box portfolio models under these limited observation budgets. Standard Bayesian optimization, which solely maximizes expected return, can yield erratic search trajectories and misalign the surrogate model with the true objective, thereby wasting the limited evaluation budget. To mitigate these issues, we propose a weighted Lagrangian estimator that leverages an adaptive schedule and importance sampling. This estimator dynamically balances exploration and exploitation by incorporating both the maximization of model performance and the minimization of the variance of model observations. It guides the search from broad, performance-seeking exploration towards stable and desirable regions as the optimization progresses. Extensive experiments and ablation studies, which establish our proposed method as the primary approach and other configurations as baselines, demonstrate its effectiveness across four backtest settings with three distinct black-box portfolio management models.

[509] arXiv:2504.14212 (replaced) [pdf, html, other]
Title: Bias Analysis and Mitigation through Protected Attribute Detection and Regard Classification
Takuma Udagawa, Yang Zhao, Hiroshi Kanayama, Bishwaranjan Bhattacharjee
Comments: Accepted to EMNLP 2025 (Findings)
Subjects: Computation and Language (cs.CL)

Large language models (LLMs) acquire general linguistic knowledge from massive-scale pretraining. However, pretraining data mainly comprised of web-crawled texts contain undesirable social biases which can be perpetuated or even amplified by LLMs. In this study, we propose an efficient yet effective annotation pipeline to investigate social biases in the pretraining corpora. Our pipeline consists of protected attribute detection to identify diverse demographics, followed by regard classification to analyze the language polarity towards each attribute. Through our experiments, we demonstrate the effect of our bias analysis and mitigation measures, focusing on Common Crawl as the most representative pretraining corpus.

[510] arXiv:2504.15800 (replaced) [pdf, html, other]
Title: FinDER: Financial Dataset for Question Answering and Evaluating Retrieval-Augmented Generation
Chanyeol Choi, Jihoon Kwon, Jaeseon Ha, Hojun Choi, Chaewoon Kim, Yongjae Lee, Jy-yong Sohn, Alejandro Lopez-Lira
Comments: 10 pages, 3 figures, ICLR 2025 Workshop Advances in Financial AI
Subjects: Information Retrieval (cs.IR)

In the fast-paced financial domain, accurate and up-to-date information is critical to addressing ever-evolving market conditions. Retrieving this information correctly is essential in financial Question-Answering (QA), since many language models struggle with factual accuracy in this domain. We present FinDER, an expert-generated dataset tailored for Retrieval-Augmented Generation (RAG) in finance. Unlike existing QA datasets that provide predefined contexts and rely on relatively clear and straightforward queries, FinDER focuses on annotating search-relevant evidence by domain experts, offering 5,703 query-evidence-answer triplets derived from real-world financial inquiries. These queries frequently include abbreviations, acronyms, and concise expressions, capturing the brevity and ambiguity common in the realistic search behavior of professionals. By challenging models to retrieve relevant information from large corpora rather than relying on readily determined contexts, FinDER offers a more realistic benchmark for evaluating RAG systems. We further present a comprehensive evaluation of multiple state-of-the-art retrieval models and Large Language Models, showcasing challenges derived from a realistic benchmark to drive future research on truthful and precise RAG in the financial domain.

[511] arXiv:2504.16485 (replaced) [pdf, html, other]
Title: On Developers' Self-Declaration of AI-Generated Code: An Analysis of Practices
Syed Mohammad Kashif, Peng Liang, Amjed Tahir
Comments: 36 pages, 15 images, 8 tables, Manuscript revision submitted to a journal (2025)
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

AI code generation tools have gained significant popularity among developers, who use them to assist in software development due to their capability to generate code. Existing studies have mainly explored the quality, e.g., correctness and security, of AI-generated code, while in real-world software development, the prerequisite is to distinguish AI-generated code from human-written code, which emphasizes the need for developers to explicitly declare AI-generated code. To this end, this study intends to understand how developers self-declare AI-generated code and explore the reasons why developers choose to self-declare or not. We conducted a mixed-methods study consisting of two phases. In the first phase, we mined GitHub repositories and collected 613 instances of AI-generated code snippets. In the second phase, we conducted a follow-up practitioners' survey, which received 111 valid responses. Our research revealed the practices followed by developers to self-declare AI-generated code. Most practitioners (76.6%) always or sometimes self-declare AI-generated code. In contrast, other practitioners (23.4%) noted that they never self-declare AI-generated code. The reasons for self-declaring AI-generated code include the need to track and monitor the code for future review and debugging, and ethical considerations. The reasons for not self-declaring AI-generated code include extensive modifications to AI-generated code and the developers' perception that self-declaration is an unnecessary activity. We finally provided guidelines for practitioners to self-declare AI-generated code, addressing ethical and code quality concerns.

[512] arXiv:2504.18829 (replaced) [pdf, html, other]
Title: Dexonomy: Synthesizing All Dexterous Grasp Types in a Grasp Taxonomy
Jiayi Chen, Yubin Ke, Lin Peng, He Wang
Comments: Accepted by Robotics: Science and Systems (RSS 2025)
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Generalizable dexterous grasping with suitable grasp types is a fundamental skill for intelligent robots. Developing such skills requires a large-scale and high-quality dataset that covers numerous grasp types (i.e., at least those categorized by the GRASP taxonomy), but collecting such data is extremely challenging. Existing automatic grasp synthesis methods are often limited to specific grasp types or object categories, hindering scalability. This work proposes an efficient pipeline capable of synthesizing contact-rich, penetration-free, and physically plausible grasps for any grasp type, object, and articulated hand. Starting from a single human-annotated template for each hand and grasp type, our pipeline tackles the complicated synthesis problem with two stages: optimize the object to fit the hand template first, and then locally refine the hand to fit the object in simulation. To validate the synthesized grasps, we introduce a contact-aware control strategy that allows the hand to apply the appropriate force at each contact point to the object. Those validated grasps can also be used as new grasp templates to facilitate future synthesis. Experiments show that our method significantly outperforms previous type-unaware grasp synthesis baselines in simulation. Using our algorithm, we construct a dataset containing 10.7k objects and 9.5M grasps, covering 31 grasp types in the GRASP taxonomy. Finally, we train a type-conditional generative model that successfully performs the desired grasp type from single-view object point clouds, achieving an 82.3% success rate in real-world experiments. Project page: this https URL.

[513] arXiv:2504.18840 (replaced) [pdf, html, other]
Title: Distributed Lloyd-Based Algorithm for Uncertainty-Aware Multi-Robot Under-Canopy Flocking
Manuel Boldrer, Vit Kratky, Viktor Walter, Martin Saska
Subjects: Robotics (cs.RO)

In this letter, we present a distributed algorithm for flocking in complex environments that operates at constant altitude, without explicit communication, with no a priori information about the environment, and by using only on-board sensing and computation capabilities. We provide sufficient conditions to guarantee that each robot reaches its goal region in a finite time, avoiding collisions with obstacles and other robots without exceeding a desired maximum distance from a predefined set of neighbors (flocking or proximity constraint). The proposed approach allows operation in crowded scenarios and handles tracking errors and on-board sensing errors without violating safety and proximity constraints. The algorithm was verified through simulations with a varying number of UAVs and also through numerous real-world experiments in a dense forest involving up to four UAVs.

[514] arXiv:2504.18942 (replaced) [pdf, html, other]
Title: LawFlow: Collecting and Simulating Lawyers' Thought Processes on Business Formation Case Studies
Debarati Das, Khanh Chi Le, Ritik Sachin Parkar, Karin De Langis, Brendan Madson, Chad M. Berryman, Robin M. Willis, Daniel H. Moses, Brett McDonnell, Daniel Schwarcz, Dongyeop Kang
Comments: Accepted at COLM 2025
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Legal practitioners, particularly those early in their careers, face complex, high-stakes tasks that require adaptive, context-sensitive reasoning. While AI holds promise in supporting legal work, current datasets and models are narrowly focused on isolated subtasks and fail to capture the end-to-end decision-making required in real-world practice. To address this gap, we introduce LawFlow, a dataset of complete end-to-end legal workflows collected from trained law students, grounded in real-world business entity formation scenarios. Unlike prior datasets focused on input-output pairs or linear chains of thought, LawFlow captures dynamic, modular, and iterative reasoning processes that reflect the ambiguity, revision, and client-adaptive strategies of legal practice. Using LawFlow, we compare human and LLM-generated workflows, revealing systematic differences in structure, reasoning flexibility, and plan execution. Human workflows tend to be modular and adaptive, while LLM workflows are more sequential, exhaustive, and less sensitive to downstream implications. Our findings also suggest that legal professionals prefer AI to carry out supportive roles, such as brainstorming, identifying blind spots, and surfacing alternatives, rather than executing complex workflows end-to-end. Our results highlight both the current limitations of LLMs in supporting complex legal workflows and opportunities for developing more collaborative, reasoning-aware legal AI systems.
All data and code are available on our project page (this https URL).

[515] arXiv:2505.01103 (replaced) [pdf, html, other]
Title: Semi-Centennial REDUCE
Arthur C. Norman, Stephen M. Watt
Subjects: Symbolic Computation (cs.SC)

We present a version of the REDUCE computer algebra system as it was in the early 1970s. We show how this historical version of REDUCE may be built and run in very modest present-day environments and outline some of its capabilities.

[516] arXiv:2505.01530 (replaced) [pdf, other]
Title: Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-tuned Document Understanding Transformer
Muhammad Tayyab Khan, Zane Yong, Lequn Chen, Jun Ming Tan, Wenhe Feng, Seung Ki Moon
Comments: This manuscript has been accepted for publication at IEEE International Conference on Industrial Engineering and Engineering Management (IEEM)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Accurate extraction of key information from 2D engineering drawings is crucial for high-precision manufacturing. Manual extraction is slow and labor-intensive, while traditional Optical Character Recognition (OCR) techniques often struggle with complex layouts and overlapping symbols, resulting in unstructured outputs. To address these challenges, this paper proposes a novel hybrid deep learning framework for structured information extraction by integrating an Oriented Bounding Box (OBB) detection model with a transformer-based document parsing model (Donut). An in-house annotated dataset is used to train YOLOv11 for detecting nine key categories: Geometric Dimensioning and Tolerancing (GD&T), General Tolerances, Measures, Materials, Notes, Radii, Surface Roughness, Threads, and Title Blocks. Detected OBBs are cropped into images and labeled to fine-tune Donut for structured JSON output. Fine-tuning strategies include a single model trained across all categories and category-specific models. Results show that the single model consistently outperforms category-specific ones across all evaluation metrics, achieving higher precision (94.77% for GD&T), recall (100% for most categories), and F1 score (97.3%), while reducing hallucinations (5.23%). The proposed framework improves accuracy, reduces manual effort, and supports scalable deployment in precision-driven industries.

[517] arXiv:2505.02273 (replaced) [pdf, html, other]
Title: Demystifying optimized prompts in language models
Rimon Melamed, Lucas H. McCabe, H. Howie Huang
Comments: EMNLP 2025 Main
Subjects: Computation and Language (cs.CL)

Modern language models (LMs) are not robust to out-of-distribution inputs. Machine generated ("optimized") prompts can be used to modulate LM outputs and induce specific behaviors while appearing completely uninterpretable. In this work, we investigate the composition of optimized prompts, as well as the mechanisms by which LMs parse and build predictions from optimized prompts. We find that optimized prompts primarily consist of punctuation and noun tokens which are rarer in the training data. Internally, optimized prompts are clearly distinguishable from natural language counterparts based on sparse subsets of the model's activations. Across various families of instruction-tuned models, optimized prompts follow a similar path in how their representations form through the network.

[518] arXiv:2505.02476 (replaced) [pdf, html, other]
Title: Point Cloud Recombination: Systematic Real Data Augmentation Using Robotic Targets for LiDAR Perception Validation
Hubert Padusinski, Christian Steinhauser, Christian Scherl, Julian Gaal, Jacob Langner
Comments: Pre-print for IEEE IAVVC 2025
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

The validation of LiDAR-based perception of intelligent mobile systems operating in open-world applications remains a challenge due to the variability of real environmental conditions. Virtual simulations allow the generation of arbitrary scenes under controlled conditions but lack physical sensor characteristics, such as intensity responses or material-dependent effects. In contrast, real-world data offers true sensor realism but provides less control over influencing factors, hindering sufficient validation. Existing approaches address this problem with augmentation of real-world point cloud data by transferring objects between scenes. However, these methods do not consider validation and remain limited in controllability because they rely on empirical data. We address these limitations by proposing Point Cloud Recombination, which systematically augments captured point cloud scenes by integrating point clouds acquired from physical target objects measured in controlled laboratory environments. This enables the creation of vast amounts and varieties of repeatable, physically accurate test scenes with respect to phenomena-aware occlusions with registered 3D meshes. Using the Ouster OS1-128 Rev7 sensor, we demonstrate the augmentation of real-world urban and rural scenes with humanoid targets featuring varied clothing and poses, for repeatable positioning. We show that the recombined scenes closely match real sensor outputs, enabling targeted testing, scalable failure analysis, and improved system safety. By providing controlled yet sensor-realistic data, our method enables trustworthy conclusions about the limitations of specific sensors in combination with their algorithms, e.g., object detection.

[519] arXiv:2505.03911 (replaced) [pdf, html, other]
Title: Explaining Anomalies with Tensor Networks
Hans Hohenfeld, Marius Beuerle, Elie Mounzer
Comments: 6 pages, 3 figures, Accepted for publication at IEEE QAI 2025
Subjects: Machine Learning (cs.LG); Quantum Physics (quant-ph)

Tensor networks, a class of variational quantum many-body wave functions, have attracted considerable research interest across many disciplines, including classical machine learning. Recently, Aizpurua et al. demonstrated explainable anomaly detection with matrix product states on a discrete-valued cyber-security task, using quantum-inspired methods to gain insight into the learned model and detected anomalies. Here, we extend this framework to real-valued data domains. We furthermore introduce tree tensor networks for the task of explainable anomaly detection. We demonstrate these methods with three benchmark problems, and show adequate predictive performance compared to several baseline models as well as the ability of both tensor network architectures to explain anomalous samples. We thereby extend the application of tensor networks to a broader class of potential problems and open a pathway for future extensions to more complex tensor network architectures.

[520] arXiv:2505.05225 (replaced) [pdf, html, other]
Title: QualBench: Benchmarking Chinese LLMs with Localized Professional Qualifications for Vertical Domain Evaluation
Mengze Hong, Wailing Ng, Chen Jason Zhang, Di Jiang
Comments: Accepted by EMNLP 2025 Main Conference. Homepage: this https URL
Subjects: Computation and Language (cs.CL)

The rapid advancement of Chinese LLMs underscores the need for vertical-domain evaluations to ensure reliable applications. However, existing benchmarks often lack domain coverage and provide limited insights into the Chinese working context. Leveraging qualification exams as a unified framework for expertise evaluation, we introduce QualBench, the first multi-domain Chinese QA benchmark dedicated to localized assessment of Chinese LLMs. The dataset includes over 17,000 questions across six vertical domains, drawn from 24 Chinese qualifications to align with national policies and professional standards. Results reveal an interesting pattern of Chinese LLMs consistently surpassing non-Chinese models, with the Qwen2.5 model outperforming the more advanced GPT-4o, emphasizing the value of localized domain knowledge in meeting qualification requirements. The average accuracy of 53.98% reveals the current gaps in domain coverage within model capabilities. Furthermore, we identify performance degradation caused by LLM crowdsourcing, assess data contamination, and illustrate the effectiveness of prompt engineering and model fine-tuning, suggesting opportunities for future improvements through multi-domain RAG and Federated Learning.

[521] arXiv:2505.05755 (replaced) [pdf, html, other]
Title: Insertion Language Models: Sequence Generation with Arbitrary-Position Insertions
Dhruvesh Patel, Aishwarya Sahoo, Avinash Amballa, Tahira Naseem, Tim G. J. Rudner, Andrew McCallum
Comments: Additional related work. Code available at: this https URL
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Autoregressive models (ARMs), which predict subsequent tokens one-by-one "from left to right," have achieved significant success across a wide range of sequence generation tasks. However, they struggle to accurately represent sequences that require satisfying sophisticated constraints or whose sequential dependencies are better addressed by out-of-order generation. Masked Diffusion Models (MDMs) address some of these limitations, but the process of unmasking multiple tokens simultaneously in MDMs can introduce incoherences, and MDMs cannot handle arbitrary infilling constraints when the number of tokens to be filled in is not known in advance. In this work, we introduce Insertion Language Models (ILMs), which learn to insert tokens at arbitrary positions in a sequence -- that is, they select jointly both the position and the vocabulary element to be inserted. By inserting tokens one at a time, ILMs can represent strong dependencies between tokens, and their ability to generate sequences in arbitrary order allows them to accurately model sequences where token dependencies do not follow a left-to-right sequential structure. To train ILMs, we propose a tailored network parameterization and use a simple denoising objective. Our empirical evaluation demonstrates that ILMs outperform both ARMs and MDMs on common planning tasks. Furthermore, we show that ILMs outperform MDMs and perform on par with ARMs in an unconditional text generation task while offering greater flexibility than MDMs in arbitrary-length text infilling. The code is available at: this https URL.

[522] arXiv:2505.08957 (replaced) [pdf, html, other]
Title: Even Faster Algorithm for the Chamfer Distance
Ying Feng, Piotr Indyk
Comments: Simplified Section 4
Subjects: Computational Geometry (cs.CG); Data Structures and Algorithms (cs.DS)

For two d-dimensional point sets A, B of size up to n, the Chamfer distance from A to B is defined as CH(A,B) = \sum_{a \in A} \min_{b \in B} \|a-b\|. The Chamfer distance is a widely used measure for quantifying dissimilarity between sets of points, used in many machine learning and computer vision applications. A recent work of Bakshi et al., NeurIPS'23, gave the first near-linear time (1+eps)-approximate algorithm, with a running time of O(nd log(n)/eps^2). In this paper we improve the running time further, to O(nd(log log(n) + log(1/eps))/eps^2). When eps is a constant, this reduces the gap between the upper bound and the trivial Omega(dn) lower bound significantly, from O(log n) to O(log log n).
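
For reference, the definition above can be evaluated directly. The sketch below is the naive exact computation, which takes O(|A| |B| d) time; it is not the approximation algorithm from the paper, and the function name `chamfer_distance` is chosen here only for illustration.

```python
import math


def chamfer_distance(A, B):
    """Exact CH(A, B): sum over a in A of the distance to its nearest b in B."""
    if not B:
        raise ValueError("B must be non-empty")
    return sum(min(math.dist(a, b) for b in B) for a in A)


A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 1.0), (3.0, 0.0)]
print(chamfer_distance(A, B))  # 1 + sqrt(2), about 2.414
print(chamfer_distance(B, A))  # 3.0
```

The two printed values differ, which shows that the Chamfer distance from A to B is not symmetric in its arguments.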

[523] arXiv:2505.09016 (replaced) [pdf, html, other]
Title: Resource Allocation with Multi-Team Collaboration Based on Hamilton's Rule
Riwa Karam, Ruoyu Lin, Brooks A. Butler, Magnus Egerstedt
Subjects: Systems and Control (eess.SY)

This paper presents a multi-team collaboration strategy based on Hamilton's rule from ecology that facilitates resource allocation among multiple teams, where agents are considered a shared resource among all teams that must be allocated appropriately. We construct an algorithmic framework that allows teams to make bids for agents that consider the costs and benefits of transferring agents while also considering relative mission importance for each team. This framework is applied to a multi-team coverage control mission to demonstrate its effectiveness. It is shown that the necessary criteria of a mission evaluation function are met by framing it as a function of the locational coverage cost of each team with respect to agent gain and loss, and these results are illustrated through simulations.

[524] arXiv:2505.10978 (replaced) [pdf, html, other]
Title: Group-in-Group Policy Optimization for LLM Agent Training
Lang Feng, Zhenghai Xue, Tingcong Liu, Bo An
Comments: Preprint
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Recent advances in group-based reinforcement learning (RL) have driven frontier large language models (LLMs) in single-turn tasks like mathematical reasoning. However, their scalability to long-horizon LLM agent training remains limited. Unlike static tasks, agent-environment interactions unfold over many steps and often yield sparse or delayed rewards, making credit assignment across individual steps significantly more challenging. In this work, we propose Group-in-Group Policy Optimization (GiGPO), a novel RL algorithm that achieves fine-grained credit assignment for LLM agents while preserving the appealing properties of group-based RL: critic-free, low memory, and stable convergence. GiGPO introduces a two-level structure for estimating relative advantage: (i) At the episode-level, GiGPO computes macro relative advantages based on groups of complete trajectories; (ii) At the step-level, GiGPO introduces an anchor state grouping mechanism that retroactively constructs step-level groups by identifying repeated environment states across trajectories. Actions stemming from the same state are grouped together, enabling micro relative advantage estimation. This hierarchical structure effectively captures both global trajectory quality and local step effectiveness without relying on auxiliary models or additional rollouts. We evaluate GiGPO on two challenging agent benchmarks, ALFWorld and WebShop, using Qwen2.5-1.5B-Instruct and Qwen2.5-7B-Instruct. Crucially, GiGPO delivers fine-grained per-step credit signals and achieves performance gains of > 12% on ALFWorld and > 9% on WebShop over the GRPO baseline, all while maintaining the same GPU memory overhead, identical LLM rollout, and incurring little to no additional time cost.
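
The two-level grouping described above can be sketched in a few lines. This is a rough illustration under stated assumptions, not the paper's implementation: the mean/std normalization, the use of a discounted return as the per-step score, the discount `gamma`, and all function names are assumptions introduced here.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize(values, eps=1e-8):
    """Group-relative score: subtract the group mean, divide by the group std (assumed form)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def episode_advantages(episode_returns):
    """Macro level: one score per complete trajectory, relative to the group."""
    return normalize(episode_returns)


def step_advantages(trajectories, gamma=0.95):
    """Micro level: group steps that share the same (hashable) state across trajectories.

    trajectories: list of lists of (state, action, reward) tuples.
    Returns a dict mapping (trajectory_index, step_index) to an advantage.
    """
    groups = defaultdict(list)  # anchor state -> [(traj_idx, step_idx, return)]
    for i, traj in enumerate(trajectories):
        ret, returns = 0.0, []
        for _, _, reward in reversed(traj):
            ret = reward + gamma * ret  # discounted return from this step onward
            returns.append(ret)
        returns.reverse()
        for t, ((state, _, _), g) in enumerate(zip(traj, returns)):
            groups[state].append((i, t, g))
    adv = {}
    for members in groups.values():
        scores = normalize([g for _, _, g in members])
        for (i, t, _), a in zip(members, scores):
            adv[(i, t)] = a
    return adv


# Two rollouts that share the anchor state "s0" but take different actions there.
trajs = [
    [("s0", "left", 0.0), ("s1", "go", 1.0)],
    [("s0", "right", 0.0), ("s2", "go", 0.0)],
]
macro = episode_advantages([1.0, 0.0])
micro = step_advantages(trajs)
```

Here the action taken at `"s0"` in the first rollout receives a positive step-level score and the one in the second rollout a negative score, while states visited only once form singleton groups and score zero.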

[525] arXiv:2505.13188 (replaced) [pdf, html, other]
Title: When a Reinforcement Learning Agent Encounters Unknown Unknowns
Juntian Zhu, Miguel de Carvalho, Zhouwang Yang, Fengxiang He
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

An AI agent might surprisingly find she has reached an unknown state which she has never been aware of -- an unknown unknown. We mathematically ground this scenario in reinforcement learning: an agent, after taking an action calculated from value functions $Q$ and $V$ defined on the {\it {aware domain}}, reaches a state out of the domain. To enable the agent to handle this scenario, we propose an {\it episodic Markov decision {process} with growing awareness} (EMDP-GA) model, taking a new {\it noninformative value expansion} (NIVE) approach to expand value functions to newly aware areas: when an agent arrives at an unknown unknown, value functions $Q$ and $V$ whereon are initialised by noninformative beliefs -- the averaged values on the aware domain. This design is out of respect for the complete absence of knowledge in the newly discovered state. The upper confidence bound momentum Q-learning is then adapted to the growing awareness for training the EMDP-GA model. We prove that (1) the regret of our approach is asymptotically consistent with the state of the art (SOTA) without exposure to unknown unknowns in an extremely uncertain environment, and (2) our computational complexity and space complexity are comparable with the SOTA -- these collectively suggest that though an unknown unknown is surprising, it will be asymptotically properly discovered with decent speed and an affordable cost.
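
The noninformative value expansion idea (initialising values at a newly discovered state with the average over the aware domain) can be sketched as follows. This is a simplified tabular illustration under assumed data structures (`q` and `v` as plain dictionaries, a hypothetical `expand_values` helper); the paper's episodic, per-step formulation may differ.

```python
def expand_values(q, v, new_state, actions):
    """Initialise Q and V at a newly discovered state.

    q: dict mapping (state, action) -> float on the aware domain
    v: dict mapping state -> float on the aware domain
    The new entries are set to the average of the existing values,
    reflecting the absence of any knowledge about the new state.
    """
    if new_state in v:
        return  # already part of the aware domain; nothing to expand
    q_mean = sum(q.values()) / len(q) if q else 0.0
    v_mean = sum(v.values()) / len(v) if v else 0.0
    for a in actions:
        q[(new_state, a)] = q_mean
    v[new_state] = v_mean
```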

[526] arXiv:2505.13754 (replaced) [pdf, html, other]
Title: Unsupervised Learning of Local Updates for Maximum Independent Set in Dynamic Graphs
Devendra Parkar, Anya Chaturvedi, Joshua J. Daymude
Comments: 13 pages, 2 figures, 2 tables, 3 algorithms
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)

We present the first unsupervised learning model for finding Maximum Independent Sets (MaxIS) in dynamic graphs where edges change over time. Our method combines structural learning from graph neural networks (GNNs) with a learned distributed update mechanism that, given an edge addition or deletion event, modifies nodes' internal memories and infers their MaxIS membership in a single, parallel step. We parameterize our model by the update mechanism's radius and investigate the resulting performance-runtime tradeoffs for various dynamic graph topologies. We evaluate our model against a mixed integer programming solver and the state-of-the-art learning-based methods for MaxIS on static graphs (ICML 2020; NeurIPS 2020, 2023). Across synthetic and empirical dynamic graphs of 50-1,000 nodes, our model achieves competitive approximation ratios with excellent scalability; on large graphs, it significantly outperforms the state-of-the-art learning methods in solution quality, runtime, and memory usage. When generalizing to graphs of 10,000 nodes (100x larger than the ones used for training), our model produces MaxIS solutions 1.05-1.18x larger than any other learning method, even while maintaining competitive runtimes.

[527] arXiv:2505.14257 (replaced) [pdf, html, other]
Title: Mitigating Hallucination in Large Vision-Language Models through Aligning Attention Distribution to Information Flow
Jianfei Zhao, Feng Zhang, Xin Sun, Chong Feng
Comments: Accepted to Findings of EMNLP 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Due to the unidirectional masking mechanism, Decoder-Only models propagate information from left to right. LVLMs (Large Vision-Language Models) follow the same architecture, with visual information gradually integrated into semantic representations during forward propagation. Through systematic analysis, we observe that the majority of the visual information is absorbed into the semantic representations. However, the model's attention distribution does not exhibit sufficient emphasis on semantic representations. This misalignment between the attention distribution and the actual information flow undermines the model's visual understanding ability and contributes to hallucinations. To address this issue, we enhance the model's visual understanding by leveraging the core information embedded in semantic representations. Specifically, we identify attention heads that focus on core semantic representations based on their attention distributions. Then, through a two-stage optimization paradigm, we propagate the advantages of these attention heads across the entire model, aligning the attention distribution with the actual information flow. We evaluate our method on three image captioning benchmarks using five different LVLMs, demonstrating its effectiveness in significantly reducing hallucinations. Further experiments reveal a trade-off between reduced hallucinations and richer details. Notably, our method allows for manual adjustment of the model's conservativeness, enabling flexible control to meet diverse real-world requirements.

[528] arXiv:2505.16022 (replaced) [pdf, html, other]
Title: NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning
Wei Liu, Siya Qi, Xinyu Wang, Chen Qian, Yali Du, Yulan He
Comments: 20 pages, 5 tables, 12 figures. Accepted to EMNLP 2025
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Recent advances such as DeepSeek R1-Zero highlight the effectiveness of incentive training, a reinforcement learning paradigm that computes rewards solely based on the final answer part of a language model's output, thereby encouraging the generation of intermediate reasoning steps. However, these methods fundamentally rely on external verifiers, which limits their applicability to domains like mathematics and coding where such verifiers are readily available. Although reward models can serve as verifiers, they require high-quality annotated data and are costly to train. In this work, we propose NOVER, NO-VERifier Reinforcement Learning, a general reinforcement learning framework that requires only standard supervised fine-tuning data with no need for an external verifier. NOVER enables incentive training across a wide range of text-to-text tasks and outperforms the model of the same size distilled from large reasoning models such as DeepSeek R1 671B by 7.7 percent. Moreover, the flexibility of NOVER enables new possibilities for optimizing large language models, such as inverse incentive training.

[529] arXiv:2505.17067 (replaced) [pdf, html, other]
Title: Unveil Multi-Picture Descriptions for Multilingual Mild Cognitive Impairment Detection via Contrastive Learning
Kristin Qi, Jiali Cheng, Youxiang Zhu, Hadi Amiri, Xiaohui Liang
Comments: IEEE Global Communications Conference (GlobeCom) 2025
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Detecting Mild Cognitive Impairment from picture descriptions is critical yet challenging, especially in multilingual and multiple picture settings. Prior work has primarily focused on English speakers describing a single picture (e.g., the 'Cookie Theft'). The TAUKDIAL-2024 challenge expands this scope by introducing multilingual speakers and multiple pictures, which presents new challenges in analyzing picture-dependent content. To address these challenges, we propose a framework with three components: (1) enhancing discriminative representation learning via supervised contrastive learning, (2) involving image modality rather than relying solely on speech and text modalities, and (3) applying a Product of Experts (PoE) strategy to mitigate spurious correlations and overfitting. Our framework improves MCI detection performance, achieving a +7.1% increase in Unweighted Average Recall (UAR) (from 68.1% to 75.2%) and a +2.9% increase in F1 score (from 80.6% to 83.5%) compared to the text unimodal baseline. Notably, the contrastive learning component yields greater gains for the text modality compared to speech. These results highlight our framework's effectiveness in multilingual and multi-picture MCI detection.

[530] arXiv:2505.17137 (replaced) [pdf, html, other]
Title: Cog-TiPRO: Iterative Prompt Refinement with LLMs to Detect Cognitive Decline via Longitudinal Voice Assistant Commands
Kristin Qi, Youxiang Zhu, Caroline Summerour, John A. Batsis, Xiaohui Liang
Comments: IEEE Global Communications Conference (GlobeCom) 2025
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Early detection of cognitive decline is crucial for enabling interventions that can slow neurodegenerative disease progression. Traditional diagnostic approaches rely on labor-intensive clinical assessments, which are impractical for frequent monitoring. Our pilot study investigates voice assistant systems (VAS) as non-invasive tools for detecting cognitive decline through longitudinal analysis of speech patterns in voice commands. Over an 18-month period, we collected voice commands from 35 older adults, with 15 participants providing daily at-home VAS interactions. To address the challenges of analyzing these short, unstructured and noisy commands, we propose Cog-TiPRO, a framework that combines (1) LLM-driven iterative prompt refinement for linguistic feature extraction, (2) HuBERT-based acoustic feature extraction, and (3) transformer-based temporal modeling. Using iTransformer, our approach achieves 73.80% accuracy and 72.67% F1-score in detecting MCI, outperforming its baseline by 27.13%. Through our LLM approach, we identify linguistic features that uniquely characterize everyday command usage patterns in individuals experiencing cognitive decline.

[531] arXiv:2505.20015 (replaced) [pdf, html, other]
Title: On the class of coding optimality of human languages and the origins of Zipf's law
Ramon Ferrer-i-Cancho
Comments: a few typos corrected, in press in Europhysics Letters
Subjects: Computation and Language (cs.CL); Physics and Society (physics.soc-ph)

Here we present a new class of optimality for coding systems. Members of that class are displaced linearly from optimal coding and thus exhibit Zipf's law, namely a power-law distribution of frequency ranks. Within that class, Zipf's law, the size-rank law and the size-probability law form a group-like structure. We identify human languages that are members of the class. All languages showing sufficient agreement with Zipf's law are potential members of the class. In contrast, there are communication systems in other species that cannot be members of that class because they exhibit an exponential distribution instead, although those of dolphins and humpback whales might be. We provide a new insight into plots of frequency versus rank in double logarithmic scale. For any system, a straight line in that scale indicates that the lengths of optimal codes under non-singular coding and under uniquely decodable encoding are displaced by a linear function whose slope is the exponent of Zipf's law. For systems under compression and constrained to be uniquely decodable, such a straight line may indicate that the system is coding close to optimality. We provide support for the hypothesis that Zipf's law originates from compression and define testable conditions for the emergence of Zipf's law in compressing systems.
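
As a rough illustration of the "straight line in double logarithmic scale" reading of Zipf's law, the sketch below estimates the exponent as the negative slope of an ordinary least-squares fit of log frequency against log rank. This is a generic diagnostic, not the method of the paper, and the function name `zipf_exponent` is hypothetical.

```python
import math


def zipf_exponent(frequencies):
    """Estimate the Zipf exponent from frequencies sorted by rank.

    Fits log(f) = c - alpha * log(r) by least squares and returns alpha.
    frequencies[0] is the frequency of rank 1; all values must be > 0.
    """
    xs = [math.log(r) for r in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return -sxy / sxx
```

For an exact power law f(r) proportional to 1/r, the fitted exponent is 1 up to floating-point error; on real data a least-squares fit in log-log space is known to be a crude estimator and is used here only to make the geometric picture concrete.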

[532] arXiv:2505.20203 (replaced) [pdf, other]
Title: Shutdownable Agents through POST-Agency
Elliott Thornley
Subjects: Artificial Intelligence (cs.AI)

Many fear that future artificial agents will resist shutdown. I present an idea - the POST-Agents Proposal - for ensuring that doesn't happen. I propose that we train agents to satisfy Preferences Only Between Same-Length Trajectories (POST). I then prove that POST - together with other conditions - implies Neutrality+: the agent maximizes expected utility, ignoring the probability distribution over trajectory-lengths. I argue that Neutrality+ keeps agents shutdownable and allows them to be useful.

[533] arXiv:2505.20353 (replaced) [pdf, html, other]
Title: FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation
Dong Liu, Yanxuan Yu, Jiayi Zhang, Yifan Li, Ben Lengerich, Ying Nian Wu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Performance (cs.PF)

Diffusion Transformers (DiT) are powerful generative models but remain computationally intensive due to their iterative structure and deep transformer stacks. To alleviate this inefficiency, we propose FastCache, a hidden-state-level caching and compression framework that accelerates DiT inference by exploiting redundancy within the model's internal representations. FastCache introduces a dual strategy: (1) a spatial-aware token selection mechanism that adaptively filters redundant tokens based on hidden state saliency, and (2) a transformer-level cache that reuses latent activations across timesteps when changes are statistically insignificant. These modules work jointly to reduce unnecessary computation while preserving generation fidelity through learnable linear approximation. Theoretical analysis shows that FastCache maintains bounded approximation error under a hypothesis-testing-based decision rule. Empirical evaluations across multiple DiT variants demonstrate substantial reductions in latency and memory usage, with the best generation output quality compared to other cache methods, as measured by FID and t-FID. Code implementation of FastCache is available on GitHub at this https URL.

[534] arXiv:2505.20357 (replaced) [pdf, html, other]
Title: Learning and Interpreting Gravitational-Wave Features from CNNs with a Random Forest Approach
Jun Tian, He Wang, Jibo He, Yu Pan, Shuo Cao, Qingquan Jiang
Journal-ref: 2025 Mach. Learn.: Sci. Technol. 6 035045
Subjects: Machine Learning (cs.LG); General Relativity and Quantum Cosmology (gr-qc); Data Analysis, Statistics and Probability (physics.data-an)

Convolutional neural networks (CNNs) have become widely adopted in gravitational wave (GW) detection pipelines due to their ability to automatically learn hierarchical features from raw strain data. However, the physical meaning of these learned features remains underexplored, limiting the interpretability of such models. In this work, we propose a hybrid architecture that combines a CNN-based feature extractor with a random forest (RF) classifier to improve both detection performance and interpretability. Unlike prior approaches that directly connect classifiers to CNN outputs, our method introduces four physically interpretable metrics - variance, signal-to-noise ratio (SNR), waveform overlap, and peak amplitude - computed from the final convolutional layer. These are jointly used with the CNN output in the RF classifier to enable more informed decision boundaries. Tested on long-duration strain datasets, our hybrid model outperforms a baseline CNN model, achieving a relative improvement of 21\% in sensitivity at a fixed false alarm rate of 10 events per month. Notably, it also shows improved detection of low-SNR signals (SNR $\le$ 10), which are especially vulnerable to misclassification in noisy environments. Feature attribution via the RF model reveals that both CNN-extracted and handcrafted features contribute significantly to classification decisions, with learned variance and CNN outputs ranked among the most informative. These findings suggest that physically motivated post-processing of CNN feature maps can serve as a valuable tool for interpretable and efficient GW detection, bridging the gap between deep learning and domain knowledge.

[535] arXiv:2505.21360 (replaced) [pdf, html, other]
Title: CRISP-NAM: Competing Risks Interpretable Survival Prediction with Neural Additive Models
Dhanesh Ramachandram, Ananya Raval
Comments: Added Feature Importance Diagrams and co-author
Subjects: Machine Learning (cs.LG)

Competing risks are crucial considerations in survival modelling, particularly in healthcare domains where patients may experience multiple distinct event types. We propose CRISP-NAM (Competing Risks Interpretable Survival Prediction with Neural Additive Models), an interpretable neural additive model for competing risks survival analysis which extends the neural additive architecture to model cause-specific hazards while preserving feature-level interpretability. Each feature contributes independently to risk estimation through dedicated neural networks, allowing for visualization of complex non-linear relationships between covariates and each competing risk. We demonstrate competitive performance on multiple datasets compared to existing approaches.

[536] arXiv:2505.23643 (replaced) [pdf, other]
Title: Securing AI Agents with Information-Flow Control
Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

As AI agents become increasingly autonomous and capable, ensuring their security against vulnerabilities such as prompt injection becomes critical. This paper explores the use of information-flow control (IFC) to provide security guarantees for AI agents. We present a formal model to reason about the security and expressiveness of agent planners. Using this model, we characterize the class of properties enforceable by dynamic taint-tracking and construct a taxonomy of tasks to evaluate security and utility trade-offs of planner designs. Informed by this exploration, we present Fides, a planner that tracks confidentiality and integrity labels, deterministically enforces security policies, and introduces novel primitives for selectively hiding information. Its evaluation in AgentDojo demonstrates that this approach enables us to complete a broad range of tasks with security guarantees. A tutorial to walk readers through the concepts introduced in the paper can be found at this https URL.

[537] arXiv:2505.23980 (replaced) [pdf, html, other]
Title: DeepTopoNet: A Framework for Subglacial Topography Estimation on the Greenland Ice Sheets
Bayu Adhi Tama, Mansa Krishna, Homayra Alam, Mostafa Cham, Omar Faruque, Gong Cheng, Jianwu Wang, Mathieu Morlighem, Vandana Janeja
Comments: Accepted as Full Application Track Paper in SIGSPATIAL 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Understanding Greenland's subglacial topography is critical for projecting the future mass loss of the ice sheet and its contribution to global sea-level rise. However, the complex and sparse nature of observational data, particularly information about the bed topography under the ice sheet, significantly increases the uncertainty in model projections. Bed topography is traditionally measured by airborne ice-penetrating radar that measures the ice thickness directly underneath the aircraft, leaving data gaps of tens of kilometers in between flight lines. This study introduces a deep learning framework, which we call DeepTopoNet, that integrates radar-derived ice thickness observations and BedMachine Greenland data through a novel dynamic loss-balancing mechanism. Among all efforts to reconstruct bed topography, BedMachine has emerged as one of the most widely used datasets, combining mass conservation principles and ice thickness measurements to generate high-resolution bed elevation estimates. The proposed loss function adaptively adjusts the weighting between radar and BedMachine data, ensuring robustness in areas with limited radar coverage while leveraging the high spatial resolution of BedMachine predictions (i.e., bed estimates). Our approach incorporates gradient-based and trend surface features to enhance model performance and utilizes a CNN architecture designed for subgrid-scale predictions. By systematically testing on the Upernavik Isstrøm region, the model achieves high accuracy, outperforming baseline methods in reconstructing subglacial terrain. This work demonstrates the potential of deep learning in bridging observational gaps, providing a scalable and efficient solution to inferring subglacial topography.

[538] arXiv:2506.01211 (replaced) [pdf, html, other]
Title: Iola Walker: A Mobile Footfall Detection System for Music Composition
William B. James
Subjects: Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

This outing is part of a larger music technology research project. The objective is to find a method for materially enhancing music using hardware and software. There is a strong likelihood that there exists a new medium for experiencing music via a wearable device that ordinary listeners prefer over the current state of the art. If such a medium is discovered, it is a step towards altruistic, prosocial reform in the music industry. A new playback system infrastructure has a chance to soothe some of the societal problems tied to the larger entertainment industry ecosystem. Iola Walker is a music playback system that allows musicians to compose music that changes in accordance with the listener's gait. Artifacts are available here: this https URL

[539] arXiv:2506.01269 (replaced) [pdf, html, other]
Title: Region-of-Interest-Guided Deep Joint Source-Channel Coding for Image Transmission
Hansung Choi, Daewon Seo
Subjects: Information Theory (cs.IT)

Deep joint source-channel coding (deepJSCC) methods have shown promising improvements in communication performance over wireless networks. However, existing approaches primarily focus on enhancing overall image reconstruction quality, which may not fully align with user experiences, often driven by the quality of regions of interest (ROI). Motivated by this, we propose ROI-guided joint source-channel coding (ROI-JSCC), a novel deepJSCC framework that prioritizes high-quality transmission of ROI. The ROI-JSCC consists of four key components: (1) Image ROI embedding, (2) ROI-guided split processing, (3) ROI-based loss function design, and (4) ROI-adaptive bandwidth allocation. Together, these components allow ROI-JSCC to selectively enhance the ROI reconstruction quality at varying ROI positions while maintaining overall image quality with minimal computational overhead. Experimental results under diverse communication environments demonstrate that ROI-JSCC significantly improves ROI reconstruction quality while maintaining competitive average image quality compared to recent state-of-the-art methods.

[540] arXiv:2506.01326 (replaced) [pdf, other]
Title: ORMind: A Cognitive-Inspired End-to-End Reasoning Framework for Operations Research
Zhiyuan Wang, Bokui Chen, Yinya Huang, Qingxing Cao, Ming He, Jianping Fan, Xiaodan Liang
Comments: Accepted by Annual Meetings of the Association for Computational Linguistics 2025
Subjects: Artificial Intelligence (cs.AI)

Operations research (OR) is widely deployed to solve critical decision-making problems with complex objectives and constraints, impacting manufacturing, logistics, finance, and healthcare outcomes. While Large Language Models (LLMs) have shown promising results in various domains, their practical application in industry-relevant OR problems presents significant challenges and opportunities. Preliminary industrial applications of LLMs for operations research face two critical deployment challenges: 1) Self-correction focuses on code syntax rather than mathematical accuracy, causing costly errors; 2) Complex expert selection creates unpredictable workflows that reduce transparency and increase maintenance costs, making them impractical for time-sensitive business applications. To address these business limitations, we introduce ORMind, a cognitive-inspired framework that enhances optimization through counterfactual reasoning. Our approach emulates human cognition, implementing an end-to-end workflow that systematically transforms requirements into mathematical models and executable solver code. It is currently being tested internally in Lenovo's AI Assistant, with plans to enhance optimization capabilities for both business and consumer customers. Experiments demonstrate that ORMind outperforms existing methods, achieving a 9.5\% improvement on the NL4Opt dataset and a 14.6\% improvement on the ComplexOR dataset.

[541] arXiv:2506.01729 (replaced) [pdf, html, other]
Title: Update-Aware Robust Optimal Model Predictive Control for Nonlinear Systems
J. Wehbeh, E. C. Kerrigan
Comments: 6 pages, 2 figures, published in the IEEE Control System Letters (2025)
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

Robust optimal or min-max model predictive control (MPC) approaches aim to guarantee constraint satisfaction over a known, bounded uncertainty set while minimizing a worst-case performance bound. Traditionally, these methods compute a trajectory that meets the desired properties over a fixed prediction horizon, apply a portion of the resulting input, and then re-solve the MPC problem using newly obtained measurements at the next time step. However, this approach fails to account for the fact that the control trajectory will be updated in the future, potentially leading to conservative designs. In this paper, we present a novel update-aware robust optimal MPC algorithm for decreasing horizon problems on nonlinear systems that explicitly accounts for future control trajectory updates. This additional insight allows our method to provably expand the feasible solution set and guarantee improved worst-case performance bounds compared to existing techniques. Our approach formulates the trajectory generation problem as a sequence of nested existence-constrained semi-infinite programs (SIPs), which can be efficiently solved using local reduction techniques. To demonstrate its effectiveness, we evaluate our approach on a planar quadrotor problem, where it clearly outperforms an equivalent method that does not account for future updates at the cost of increased computation time.

[542] arXiv:2506.01742 (replaced) [pdf, html, other]
Title: Smooth Logic Constraints in Nonlinear Optimization and Optimal Control Problems
J. Wehbeh, E. C. Kerrigan
Comments: 6 pages, 7 figures, accepted for publication at the 2025 IEEE Conference on Decision and Control
Subjects: Systems and Control (eess.SY)

In some optimal control problems, complex relationships between states and inputs cannot be easily represented using continuous constraints, necessitating the use of discrete logic instead. This paper presents a method for incorporating such logic constraints directly within continuous optimization frameworks, eliminating the need for binary variables or specialized solvers. Our approach reformulates arbitrary logic constraints under minimal assumptions as max-min constraints, which are then smoothed by introducing auxiliary variables into the optimization problem. When these reformulated constraints are satisfied, they guarantee that the original logical conditions hold, ensuring correctness in the optimization process. We demonstrate the effectiveness of this method on two planar quadrotor control tasks with complex logic constraints. Compared to existing techniques for encoding logic in continuous optimization, our approach achieves faster computational performance and improved convergence to feasible solutions.
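
The max-min encoding of logic constraints can be made concrete with a small sketch. It relies only on the standard equivalences that a conjunction of constraints g_i(x) <= 0 holds exactly when max_i g_i(x) <= 0, and a disjunction holds exactly when min_i g_i(x) <= 0. The helper names (`atom`, `AND`, `OR`, `residual`) are hypothetical, and the smoothing with auxiliary variables described in the abstract is not reproduced here.

```python
def atom(g):
    """Wrap a constraint function g, read as the condition g(x) <= 0."""
    return ("atom", g)


def AND(*children):
    """All child constraints must hold."""
    return ("and", children)


def OR(*children):
    """At least one child constraint must hold."""
    return ("or", children)


def residual(node, x):
    """Evaluate the max-min residual of a logic formula at x.

    The formula is satisfied at x exactly when the result is <= 0:
    AND maps to max over children, OR maps to min over children.
    """
    kind, payload = node
    if kind == "atom":
        return payload(x)
    values = [residual(child, x) for child in payload]
    return max(values) if kind == "and" else min(values)
```

For example, the condition "(x <= 1 or x >= 3) and x >= 0" becomes `AND(OR(atom(lambda x: x - 1), atom(lambda x: 3 - x)), atom(lambda x: -x))`, and a single nonsmooth inequality `residual(formula, x) <= 0` replaces the logical statement.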

[543] arXiv:2506.03590 (replaced) [pdf, html, other]
Title: VCDiag: Classifying Erroneous Waveforms for Failure Triage Acceleration
Minh Luu, Surya Jasper, Khoi Le, Evan Pan, Michael Quinn, Aakash Tyagi, Jiang Hu
Subjects: Machine Learning (cs.LG)

Failure triage in design functional verification is critical but time-intensive, relying on manual specification reviews, log inspections, and waveform analyses. While machine learning (ML) has improved areas like stimulus generation and coverage closure, its application to RTL-level simulation failure triage, particularly for large designs, remains limited. VCDiag offers an efficient, adaptable approach using VCD data to classify failing waveforms and pinpoint likely failure locations. In the largest experiment, VCDiag achieves over 94% accuracy in identifying the top three most likely modules. The framework introduces a novel signal selection and statistical compression approach, achieving over 120x reduction in raw data size while preserving features essential for classification. It can also be integrated into diverse Verilog/SystemVerilog designs and testbenches.

[544] arXiv:2506.05668 (replaced) [pdf, other]
Title: RNE: plug-and-play diffusion inference-time control and energy-based training
Jiajun He, José Miguel Hernández-Lobato, Yuanqi Du, Francisco Vargas
Comments: 48 pages; 15 figures; Add more experiments on energy-based training, fix several typos and an error in RNC-TDS paragraph
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Diffusion models generate data by removing noise gradually, which corresponds to the time-reversal of a noising process. However, access to only the denoising kernels is often insufficient. In many applications, we need the knowledge of the marginal densities along the generation trajectory, which enables tasks such as inference-time control. To address this gap, in this paper, we introduce the Radon-Nikodym Estimator (RNE). Based on the concept of the density ratio between path distributions, it reveals a fundamental connection between marginal densities and transition kernels, providing a flexible plug-and-play framework that unifies diffusion density estimation, inference-time control, and energy-based diffusion training under a single perspective. Experiments demonstrated that RNE delivers strong results in inference-time control applications, such as annealing and model composition, with promising inference-time scaling performance. Moreover, RNE provides a simple yet efficient regularisation for training energy-based diffusion.

[545] arXiv:2506.07940 (replaced) [pdf, html, other]
Title: Gradients: When Markets Meet Fine-tuning -- A Distributed Approach to Model Optimisation
Christopher Subia-Waud (Rayonlabs Team)
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Current AutoML platforms leave substantial performance untapped. Testing 180 fine-tuning tasks across models from 70M to 70B parameters, we found that HuggingFace AutoTrain, TogetherAI, Databricks, and Google Cloud consistently produce suboptimal configurations. Gradients, built on the Bittensor network, attacks this problem through competition. Independent miners race to find optimal hyperparameters, earning rewards proportional to their models' performance. This tournament drives exploration of configuration spaces that single-strategy methods never examine. In our experiments, Gradients achieved a 100\% win rate against TogetherAI, Databricks, and Google Cloud, and beat HuggingFace AutoTrain in 82.8\% of experiments. Mean improvements reached 42.1\% against commercial platforms. Retrieval-augmented generation tasks saw 30-40\% gains; diffusion models improved 23.4\% on person-specific generation. When miners compete for rewards, they develop optimization strategies that centralized approaches overlook. These findings demonstrate that decentralized systems with economic incentives can systematically outperform traditional AutoML, suggesting market dynamics may be key to achieving superior fine-tuning results. Code is available at this https URL.

[546] arXiv:2506.09785 (replaced) [pdf, html, other]
Title: A theoretical framework for self-supervised contrastive learning for continuous dependent data
Alexander Marusov, Aleksandr Yugay, Alexey Zaytsev
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Self-supervised learning (SSL) has emerged as a powerful approach to learning representations, particularly in the field of computer vision. However, its application to dependent data, such as temporal and spatio-temporal domains, remains underexplored. Besides, traditional contrastive SSL methods often assume \emph{semantic independence between samples}, which does not hold for dependent data exhibiting complex correlations. We propose a novel theoretical framework for contrastive SSL tailored to \emph{continuous dependent data}, which allows the nearest samples to be semantically close to each other. In particular, we propose two possible \textit{ground truth similarity measures} between objects -- \emph{hard} and \emph{soft} closeness. Under it, we derive an analytical form for the \textit{estimated similarity matrix} that accommodates both types of closeness between samples, thereby introducing dependency-aware loss functions. We validate our approach, \emph{Dependent TS2Vec}, on temporal and spatio-temporal downstream problems. Given the dependency patterns presented in the data, our approach surpasses modern ones for dependent data, highlighting the effectiveness of our theoretically grounded loss functions for SSL in capturing spatio-temporal dependencies. Specifically, we outperform TS2Vec on the standard UEA and UCR benchmarks, with accuracy improvements of $4.17$\% and $2.08$\%, respectively. Furthermore, on the drought classification task, which involves complex spatio-temporal patterns, our method achieves a $7$\% higher ROC-AUC score.

[547] arXiv:2506.12100 (replaced) [pdf, html, other]
Title: LLM Embedding-based Attribution (LEA): Quantifying Source Contributions to Generative Model's Response for Vulnerability Analysis
Reza Fayyazi, Michael Zuzak, Shanchieh Jay Yang
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Large Language Models (LLMs) are increasingly used for cybersecurity threat analysis, but their deployment in security-sensitive environments raises trust and safety concerns. With over 21,000 vulnerabilities disclosed in 2025, manual analysis is infeasible, making scalable and verifiable AI support critical. When querying LLMs, dealing with emerging vulnerabilities is challenging as they have a training cut-off date. While Retrieval-Augmented Generation (RAG) can inject up-to-date context to alleviate the cut-off date limitation, it remains unclear how much LLMs rely on retrieved evidence versus the model's internal knowledge, and whether the retrieved information is meaningful or even correct. This uncertainty could mislead security analysts, mis-prioritize patches, and increase security risks. Therefore, this work proposes LLM Embedding-based Attribution (LEA) to analyze the generated responses for vulnerability exploitation analysis. More specifically, LEA quantifies the relative contribution of internal knowledge vs. retrieved content in the generated responses. We evaluate LEA on 500 critical vulnerabilities disclosed between 2016 and 2025, across three RAG settings -- valid, generic, and incorrect -- using three state-of-the-art LLMs. Our results demonstrate LEA's ability to detect clear distinctions between non-retrieval, generic-retrieval, and valid-retrieval scenarios with over 95% accuracy on larger models. Finally, we demonstrate the limitations posed by incorrect retrieval of vulnerability information and raise a cautionary note to the cybersecurity community regarding the blind reliance on LLMs and RAG for vulnerability analysis. LEA offers security analysts a metric to audit RAG-enhanced workflows, supporting the transparent and trustworthy deployment of AI in cybersecurity threat analysis.

[548] arXiv:2506.12348 (replaced) [pdf, html, other]
Title: Real-Time Per-Garment Virtual Try-On with Temporal Consistency for Loose-Fitting Garments
Zaiqiang Wu, I-Chao Shen, Takeo Igarashi
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)

Per-garment virtual try-on methods collect garment-specific datasets and train networks tailored to each garment to achieve superior results. However, these approaches often struggle with loose-fitting garments due to two key limitations: (1) They rely on human body semantic maps to align garments with the body, but these maps become unreliable when body contours are obscured by loose-fitting garments, resulting in degraded outcomes; (2) They train garment synthesis networks on a per-frame basis without utilizing temporal information, leading to noticeable jittering artifacts. To address the first limitation, we propose a two-stage approach for robust semantic map estimation. First, we extract a garment-invariant representation from the raw input image. This representation is then passed through an auxiliary network to estimate the semantic map. This enhances the robustness of semantic map estimation under loose-fitting garments during garment-specific dataset generation. To address the second limitation, we introduce a recurrent garment synthesis framework that incorporates temporal dependencies to improve frame-to-frame coherence while maintaining real-time performance. We conducted qualitative and quantitative evaluations to demonstrate that our method outperforms existing approaches in both image quality and temporal coherence. Ablation studies further validate the effectiveness of the garment-invariant representation and the recurrent synthesis framework.

[549] arXiv:2506.12389 (replaced) [pdf, other]
Title: Revisiting Clustering of Neural Bandits: Selective Reinitialization for Mitigating Loss of Plasticity
Zhiyuan Su, Sunhao Dai, Xiao Zhang
Comments: Some proof details are being revised
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Clustering of Bandits (CB) methods enhance sequential decision-making by grouping bandits into clusters based on similarity and incorporating cluster-level contextual information, demonstrating effectiveness and adaptability in applications like personalized streaming recommendations. However, when extending CB algorithms to their neural version (commonly referred to as Clustering of Neural Bandits, or CNB), they suffer from loss of plasticity, where neural network parameters become rigid and less adaptable over time, limiting their ability to adapt to non-stationary environments (e.g., dynamic user preferences in recommendation). To address this challenge, we propose Selective Reinitialization (SeRe), a novel bandit learning framework that dynamically preserves the adaptability of CNB algorithms in evolving environments. SeRe leverages a contribution utility metric to identify and selectively reset underutilized units, mitigating loss of plasticity while maintaining stable knowledge retention. Furthermore, when combining SeRe with CNB algorithms, the adaptive change detection mechanism adjusts the reinitialization frequency according to the degree of non-stationarity, ensuring effective adaptation without unnecessary resets. Theoretically, we prove that SeRe enables sublinear cumulative regret in piecewise-stationary environments, outperforming traditional CNB approaches in long-term performance. Extensive experiments on six real-world recommendation datasets demonstrate that SeRe-enhanced CNB algorithms can effectively mitigate the loss of plasticity with lower regret, improving adaptability and robustness in dynamic settings.

[550] arXiv:2506.13265 (replaced) [pdf, html, other]
Title: Open-Set LiDAR Panoptic Segmentation Guided by Uncertainty-Aware Learning
Rohit Mohan, Julia Hindel, Florian Drews, Claudius Gläser, Daniele Cattaneo, Abhinav Valada
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)

Autonomous vehicles that navigate in open-world environments may encounter previously unseen object classes. However, most existing LiDAR panoptic segmentation models rely on closed-set assumptions, failing to detect unknown object instances. In this work, we propose ULOPS, an uncertainty-guided open-set panoptic segmentation framework that leverages Dirichlet-based evidential learning to model predictive uncertainty. Our architecture incorporates separate decoders for semantic segmentation with uncertainty estimation, embedding with prototype association, and instance center prediction. During inference, we leverage uncertainty estimates to identify and segment unknown instances. To strengthen the model's ability to differentiate between known and unknown objects, we introduce three uncertainty-driven loss functions. Uniform Evidence Loss encourages high uncertainty in unknown regions. Adaptive Uncertainty Separation Loss ensures a consistent difference in uncertainty estimates between known and unknown objects at a global scale. Contrastive Uncertainty Loss refines this separation at the fine-grained level. To evaluate open-set performance, we extend benchmark settings on KITTI-360 and introduce a new open-set evaluation for nuScenes. Extensive experiments demonstrate that ULOPS consistently outperforms existing open-set LiDAR panoptic segmentation methods.

[551] arXiv:2506.13554 (replaced) [pdf, html, other]
Title: Non-Asymptotic Stability and Consistency Guarantees for Physics-Informed Neural Networks via Coercive Operator Analysis
Ronald Katende
Subjects: Machine Learning (cs.LG); Functional Analysis (math.FA); Numerical Analysis (math.NA)

We present a unified theoretical framework for analyzing the stability and consistency of Physics-Informed Neural Networks (PINNs), grounded in operator coercivity, variational formulations, and non-asymptotic perturbation theory. PINNs approximate solutions to partial differential equations (PDEs) by minimizing residual losses over sampled collocation and boundary points. We formalize both operator-level and variational notions of consistency, proving that residual minimization in Sobolev norms leads to convergence in energy and uniform norms under mild regularity. Deterministic stability bounds quantify how bounded perturbations to the network outputs propagate through the full composite loss, while probabilistic concentration results via McDiarmid's inequality yield sample complexity guarantees for residual-based generalization. A unified generalization bound links residual consistency, projection error, and perturbation sensitivity. Empirical results on elliptic, parabolic, and nonlinear PDEs confirm the predictive accuracy of our theoretical bounds across regimes. The framework identifies key structural principles, such as operator coercivity, activation smoothness, and sampling admissibility, that underlie robust and generalizable PINN training, offering principled guidance for the design and analysis of PDE-informed learning systems.

[552] arXiv:2506.15079 (replaced) [pdf, html, other]
Title: Neural Canonical Polyadic Factorization for Traffic Analysis
Wenyu Luo, Yikai Hou, Peng Tang
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Modern intelligent transportation systems rely on accurate spatiotemporal traffic analysis to optimize urban mobility and infrastructure resilience. However, pervasive missing data caused by sensor failures and heterogeneous sensing gaps fundamentally hinders reliable traffic modeling. This paper proposes a Neural Canonical Polyadic Factorization (NCPF) model that synergizes low-rank tensor algebra with deep representation learning for robust traffic data imputation. The model embeds CP decomposition into a neural architecture through learnable embedding projections, where sparse traffic tensors are encoded into dense latent factors across road segments, time intervals, and mobility metrics. A hierarchical feature fusion mechanism employs Hadamard products to explicitly model multilinear interactions, while stacked multilayer perceptron layers nonlinearly refine these representations to capture complex spatiotemporal couplings. Extensive evaluations on six urban traffic datasets demonstrate NCPF's superiority over six state-of-the-art baselines. By unifying CP decomposition's interpretable factor analysis with neural networks' nonlinear expressive power, NCPF provides a principled yet flexible approach for high-dimensional traffic data imputation, offering critical support for next-generation transportation digital twins and adaptive traffic control systems.
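
As a rough illustration of the multilinear interaction mentioned in the abstract, the sketch below estimates one tensor entry from three latent factor vectors via their Hadamard (element-wise) product. This is classical CP reconstruction only, not the authors' NCPF model: the names `hadamard` and `cp_entry` and the toy factor values are hypothetical, and the MLP refinement stage is omitted.

```python
def hadamard(*vectors):
    """Element-wise product of equal-length vectors."""
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("all factor vectors must share the same rank")
    out = [1.0] * length
    for v in vectors:
        out = [o * x for o, x in zip(out, v)]
    return out


def cp_entry(road_vec, time_vec, metric_vec):
    """Classical CP estimate of one entry: sum over the rank of the Hadamard product."""
    return sum(hadamard(road_vec, time_vec, metric_vec))


# Toy rank-2 latent factors (made-up numbers, for illustration only).
road = [1.0, 2.0]
time = [0.5, 1.0]
metric = [2.0, 3.0]
print(cp_entry(road, time, metric))
```

In a neural variant, the Hadamard product would presumably be passed to further layers instead of being summed directly; that step is left out here because the abstract does not specify it.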

[553] arXiv:2506.17948 (replaced) [pdf, html, other]
Title: Your Build Scripts Stink: The State of Code Smells in Build Scripts
Mahzabin Tamanna, Yash Chandrani, Matthew Burrows, Brandon Wroblewski, Laurie Williams, Dominik Wermke
Comments: 13 pages, 5 tables, 2 figures
Subjects: Software Engineering (cs.SE)

Build scripts are files that automate the process of compiling source code, managing dependencies, running tests, and packaging software into deployable artifacts. These scripts are ubiquitous in modern software development pipelines for streamlining testing and delivery. While developing build scripts, practitioners may inadvertently introduce code smells. Code smells are recurring patterns of poor coding practices that may lead to build failures or increase risk and technical debt. The goal of this study is to aid practitioners in avoiding code smells in build scripts through an empirical study of build scripts and issues on GitHub. We employed a mixed-methods approach, combining qualitative and quantitative analysis. We conducted a qualitative analysis of 2000 build-script-related GitHub issues. Next, we developed a static analysis tool, Sniffer, to identify code smells in 5882 build scripts of Maven, Gradle, CMake, and Make files, collected from 4877 open-source GitHub repositories. We identified 13 code smell categories, with a total of 10,895 smell occurrences, where 3184 were in Maven, 1214 in Gradle, 337 in CMake, and 6160 in Makefiles.
Our analysis revealed that Insecure URLs were the most prevalent code smell in Maven build scripts, while Hardcoded Paths/URLs were commonly observed in both Gradle and CMake scripts. Wildcard Usage emerged as the most frequent smell in Makefiles. The co-occurrence analysis revealed strong associations between specific smell pairs of Hardcoded Paths/URLs with Duplicates, and Inconsistent Dependency Management with Empty or Incomplete Tags, indicating potential underlying issues in the build script structure and maintenance practices. Based on our findings, we also recommended strategies to mitigate the existence of code smells in build scripts to improve the efficiency, reliability, and maintainability of software projects.

[554] arXiv:2506.18096 (replaced) [pdf, html, other]
Title: Deep Research Agents: A Systematic Examination And Roadmap
Yuxuan Huang, Yihang Chen, Haozheng Zhang, Kang Li, Huichi Zhou, Meng Fang, Linyi Yang, Xiaoguang Li, Lifeng Shang, Songcen Xu, Jianye Hao, Kun Shao, Jun Wang
Subjects: Artificial Intelligence (cs.AI)

The rapid progress of Large Language Models (LLMs) has given rise to a new category of autonomous AI systems, referred to as Deep Research (DR) agents. These agents are designed to tackle complex, multi-turn informational research tasks by leveraging a combination of dynamic reasoning, adaptive long-horizon planning, multi-hop information retrieval, iterative tool use, and the generation of structured analytical reports. In this paper, we conduct a detailed analysis of the foundational technologies and architectural components that constitute Deep Research agents. We begin by reviewing information acquisition strategies, contrasting API-based retrieval methods with browser-based exploration. We then examine modular tool-use frameworks, including code execution, multimodal input processing, and the integration of Model Context Protocols (MCPs) to support extensibility and ecosystem development. To systematize existing approaches, we propose a taxonomy that differentiates between static and dynamic workflows, and we classify agent architectures based on planning strategies and agent composition, including single-agent and multi-agent configurations. We also provide a critical evaluation of current benchmarks, highlighting key limitations such as restricted access to external knowledge, sequential execution inefficiencies, and misalignment between evaluation metrics and the practical objectives of DR agents. Finally, we outline open challenges and promising directions for future research. A curated and continuously updated repository of DR agent research is available at this https URL.

[555] arXiv:2506.18368 (replaced) [pdf, html, other]
Title: Sequential keypoint density estimator: an overlooked baseline of skeleton-based video anomaly detection
Anja Delić, Matej Grcić, Siniša Šegvić
Comments: ICCV 2025 Highlight
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Detecting anomalous human behaviour is an important visual task in safety-critical applications such as healthcare monitoring, workplace safety, or public surveillance. In these contexts, abnormalities are often reflected in unusual human poses. Thus, we propose SeeKer, a method for detecting anomalies in sequences of human skeletons. Our method formulates the skeleton sequence density through autoregressive factorization at the keypoint level. The corresponding conditional distributions represent probable keypoint locations given prior skeletal motion. We formulate the joint distribution of the considered skeleton as causal prediction of conditional Gaussians across its constituent keypoints. A skeleton is flagged as anomalous if its keypoint locations surprise our model (i.e. receive a low density). In practice, our anomaly score is a weighted sum of per-keypoint log-conditionals, where the weights account for the confidence of the underlying keypoint detector. Despite its conceptual simplicity, SeeKer surpasses all previous methods on the UBnormal and MSAD-HR datasets while delivering competitive performance on the ShanghaiTech dataset.
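
The scoring rule described in the abstract, a confidence-weighted sum of per-keypoint Gaussian log-conditionals, can be sketched as follows. This is a minimal illustration under stated assumptions, not the SeeKer implementation: the isotropic Gaussian form, the function names, and the fixed toy means (which in the paper would come from an autoregressive model of prior motion) are all hypothetical.

```python
import math


def gaussian_log_density(point, mean, std):
    """Log-density of a point under an isotropic Gaussian (assumed form)."""
    sq_dist = sum((p - m) ** 2 for p, m in zip(point, mean))
    dim = len(point)
    return (
        -0.5 * sq_dist / std ** 2
        - dim * math.log(std)
        - 0.5 * dim * math.log(2 * math.pi)
    )


def weighted_log_density(keypoints, means, stds, confidences):
    """Confidence-weighted sum of per-keypoint log-conditionals."""
    return sum(
        w * gaussian_log_density(x, mu, s)
        for x, mu, s, w in zip(keypoints, means, stds, confidences)
    )


# Toy predicted distributions for two keypoints (made-up values).
means = [(0.0, 0.0), (1.0, 1.0)]
stds = [0.1, 0.1]
confidences = [0.9, 0.8]  # hypothetical keypoint-detector confidences

expected_pose = [(0.0, 0.0), (1.0, 1.0)]
surprising_pose = [(0.0, 0.0), (2.0, 1.0)]
print(weighted_log_density(expected_pose, means, stds, confidences))
print(weighted_log_density(surprising_pose, means, stds, confidences))
```

Under this sign convention, a lower value means the pose is more surprising to the model; a keypoint with zero detector confidence contributes nothing to the score.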

[556] arXiv:2506.19608 (replaced) [pdf, html, other]
Title: ChordPrompt: Orchestrating Cross-Modal Prompt Synergy for Multi-Domain Incremental Learning in CLIP
Zhiyuan Wang, Bokui Chen
Comments: Accepted by the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2025)
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Continual learning (CL) empowers pre-trained vision-language models to adapt effectively to novel or previously underrepresented data distributions without comprehensive retraining, enhancing their adaptability and efficiency. While vision-language models like CLIP show great promise, they struggle to maintain performance across domains in incremental learning scenarios. Existing prompt learning methods face two main limitations: 1) they primarily focus on class-incremental learning scenarios, lacking specific strategies for multi-domain task incremental learning; 2) most current approaches employ single-modal prompts, neglecting the potential benefits of cross-modal information exchange. To address these challenges, we propose the ChordPrompt framework, which facilitates a harmonious interplay between visual and textual prompts. ChordPrompt introduces cross-modal prompts to leverage interactions between visual and textual information. Our approach also employs domain-adaptive text prompts to select appropriate prompts for continual adaptation across multiple domains. Comprehensive experiments on multi-domain incremental learning benchmarks demonstrate that ChordPrompt outperforms state-of-the-art methods in zero-shot generalization and downstream task performance.

[557] arXiv:2506.19992 (replaced) [pdf, html, other]
Title: HERCULES: Hierarchical Embedding-based Recursive Clustering Using LLMs for Efficient Summarization
Gabor Petnehazi, Bernadett Aradi
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

The explosive growth of complex datasets across various modalities necessitates advanced analytical tools that not only group data effectively but also provide human-understandable insights into the discovered structures. We introduce HERCULES (Hierarchical Embedding-based Recursive Clustering Using LLMs for Efficient Summarization), a novel algorithm and Python package designed for hierarchical k-means clustering of diverse data types, including text, images, and numeric data (processed one modality per run). HERCULES constructs a cluster hierarchy by recursively applying k-means clustering, starting from individual data points at level 0. A key innovation is its deep integration of Large Language Models (LLMs) to generate semantically rich titles and descriptions for clusters at each level of the hierarchy, significantly enhancing interpretability. The algorithm supports two main representation modes: `direct' mode, which clusters based on original data embeddings or scaled numeric features, and `description' mode, which clusters based on embeddings derived from LLM-generated summaries. Users can provide a `topic\_seed' to guide LLM-generated summaries towards specific themes. An interactive visualization tool facilitates thorough analysis and understanding of the clustering results. We demonstrate HERCULES's capabilities and discuss its potential for extracting meaningful, hierarchical knowledge from complex datasets.

[558] arXiv:2506.21619 (replaced) [pdf, html, other]
Title: IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech
Siyi Zhou, Yiquan Zhou, Yi He, Xun Zhou, Jinchao Wang, Wei Deng, Jingchen Shu
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Existing autoregressive large-scale text-to-speech (TTS) models have advantages in speech naturalness, but their token-by-token generation mechanism makes it difficult to precisely control the duration of synthesized speech. This becomes a significant limitation in applications requiring strict audio-visual synchronization, such as video dubbing. This paper introduces IndexTTS2, which proposes a novel, general, and autoregressive model-friendly method for speech duration control. The method supports two generation modes: one explicitly specifies the number of generated tokens to precisely control speech duration; the other freely generates speech in an autoregressive manner without specifying the number of tokens, while faithfully reproducing the prosodic features of the input prompt. Furthermore, IndexTTS2 achieves disentanglement between emotional expression and speaker identity, enabling independent control over timbre and emotion. In the zero-shot setting, the model can accurately reconstruct the target timbre (from the timbre prompt) while perfectly reproducing the specified emotional tone (from the style prompt). To enhance speech clarity in highly emotional expressions, we incorporate GPT latent representations and design a novel three-stage training paradigm to improve the stability of the generated speech. Additionally, to lower the barrier for emotional control, we designed a soft instruction mechanism based on text descriptions by fine-tuning Qwen3, effectively guiding the generation of speech with the desired emotional orientation. Finally, experimental results on multiple datasets show that IndexTTS2 outperforms state-of-the-art zero-shot TTS models in terms of word error rate, speaker similarity, and emotional fidelity. Audio samples are available at: this https URL

[559] arXiv:2506.21915 (replaced) [pdf, other]
Title: An Effective Two-Phase Genetic Algorithm for Solving the Resource Constrained Project Scheduling Problem (RCPSP)
D. Sun, S. Zhou
Comments: 12 pages
Subjects: Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)

This note presents a simple and effective variation of the genetic algorithm (GA) for solving RCPSP, denoted as 2-Phase Genetic Algorithm (2PGA). The 2PGA implements GA parent selection in two phases: Phase-1 includes the best current solutions in the parent pool, and Phase-2 excludes the best current solutions from the parent pool. The 2PGA carries out the GA evolution by alternating the two phases iteratively. In exploring a solution space, Phase-1 emphasizes intensification in the current neighborhood, while Phase-2 emphasizes diversification to escape local traps. The 2PGA was tested on the standard benchmark problems in PSPLIB; the results show that the algorithm is effective and has improved some of the best heuristic solutions.
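
The alternating parent selection described above can be sketched as follows. This is one possible reading of the abstract, not the authors' 2PGA: the names `parent_pool` and `phase_schedule`, the `elite_count` and `phase_length` parameters, and the convention that lower fitness is better (e.g. project makespan) are assumptions made for illustration.

```python
def parent_pool(population, fitness, phase, elite_count):
    """Return the parent pool for a phase (lower fitness = better, by assumption)."""
    ranked = sorted(population, key=fitness)
    if phase == 1:
        # Phase-1: the best current solutions stay eligible as parents.
        return ranked
    # Phase-2: the best current solutions are excluded from the pool.
    return ranked[elite_count:]


def phase_schedule(generations, phase_length):
    """Yield 1 or 2 per generation, switching phase every phase_length generations."""
    for g in range(generations):
        yield 1 if (g // phase_length) % 2 == 0 else 2


# Toy population where each "solution" is just its own makespan.
population = [5, 3, 9, 1, 7]
for phase in phase_schedule(generations=4, phase_length=2):
    pool = parent_pool(population, fitness=lambda s: s, phase=phase, elite_count=2)
    print(phase, pool)
```

How many top solutions count as "best", and how long each phase lasts, are not stated in the abstract, so both are left as free parameters here.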

[560] arXiv:2506.23294 (replaced) [pdf, html, other]
Title: Threshold Signatures for Central Bank Digital Currencies
Mostafa Abdelrahman, Filip Rezabek, Lars Hupel, Kilian Glas, Georg Carle
Subjects: Cryptography and Security (cs.CR)

Digital signatures are crucial for securing Central Bank Digital Currency (CBDC) transactions. Like most forms of digital currencies, CBDC solutions rely on signatures for transaction authenticity and integrity, leading to major issues in the case of private key compromise. Our work explores threshold signature schemes (TSSs) in the context of CBDCs. TSSs allow distributed key management and signing, reducing the risk of a compromised key. We analyze CBDC-specific requirements, considering the applicability of TSSs, and use the Filia CBDC solution as a base for a detailed evaluation. As most of the current solutions rely on ECDSA for compatibility, we focus on ECDSA-based TSSs and their supporting libraries. Our performance evaluation measured the computational and communication complexity across key processes, as well as the throughput and latency of end-to-end transactions. The results confirm that TSSs can enhance the security of CBDC implementations while maintaining acceptable performance for real-world deployments.

[561] arXiv:2506.23367 (replaced) [pdf, html, other]
Title: You Sound a Little Tense: L2 Tailored Clear TTS Using Durational Vowel Properties
Paige Tuttösí, H. Henny Yeung, Yue Wang, Jean-Julien Aucouturier, Angelica Lim
Comments: Accepted to ISCA Speech Synthesis Workshop, 2025, Project webpage here: this https URL Code here: this https URL
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

We present the first text-to-speech (TTS) system tailored to second language (L2) speakers. We use duration differences between American English tense (longer) and lax (shorter) vowels to create a "clarity mode" for Matcha-TTS. Our perception studies showed that French-L1, English-L2 listeners had fewer (at least 9.15%) transcription errors when using our clarity mode, and found it more encouraging and respectful than overall slowed down speech. Remarkably, listeners were not aware of these effects: despite the decreased word error rate in clarity mode, listeners still believed that slowing all target words was the most intelligible, suggesting that actual intelligibility does not correlate with perceived intelligibility. Additionally, we found that Whisper-ASR did not use the same cues as L2 speakers to differentiate difficult vowels and is not sufficient to assess the intelligibility of TTS systems for these individuals.

[562] arXiv:2506.23903 (replaced) [pdf, html, other]
Title: GroundingDINO-US-SAM: Text-Prompted Multi-Organ Segmentation in Ultrasound with LoRA-Tuned Vision-Language Models
Hamza Rasaee, Taha Koleilat, Hassan Rivaz
Comments: 11 pages, 3 figures, 7 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Accurate and generalizable object segmentation in ultrasound imaging remains a significant challenge due to anatomical variability, diverse imaging protocols, and limited annotated data. In this study, we propose a prompt-driven vision-language model (VLM) that integrates Grounding DINO with SAM2 to enable object segmentation across multiple ultrasound organs. A total of 18 public ultrasound datasets, encompassing the breast, thyroid, liver, prostate, kidney, and paraspinal muscle, were utilized. These datasets were divided into 15 for fine-tuning and validation of Grounding DINO using Low Rank Adaptation (LoRA) to the ultrasound domain, and 3 were held out entirely for testing to evaluate performance in unseen distributions. Comprehensive experiments demonstrate that our approach outperforms state-of-the-art segmentation methods, including UniverSeg, MedSAM, MedCLIP-SAM, BiomedParse, and SAMUS on most seen datasets while maintaining strong performance on unseen datasets without additional fine-tuning. These results underscore the promise of VLMs in scalable and robust ultrasound image analysis, reducing dependence on large, organ-specific annotated datasets. We will publish our code on this http URL after acceptance.

[563] arXiv:2507.00790 (replaced) [pdf, html, other]
Title: LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling
Huaqiu Li, Yong Wang, Tongwen Huang, Hailang Huang, Haoqian Wang, Xiangxiang Chu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Unified image restoration is a significantly challenging task in low-level vision. Existing methods either make tailored designs for specific tasks, limiting their generalizability across various types of degradation, or rely on training with paired datasets, thereby suffering from closed-set constraints. To address these issues, we propose a novel, dataset-free, and unified approach through recurrent posterior sampling utilizing a pretrained latent diffusion model. Our method incorporates a multimodal understanding model to provide semantic priors for the generative model under a task-blind condition. Furthermore, it utilizes a lightweight module to align the degraded input with the generated preference of the diffusion model, and employs recurrent refinement for posterior sampling. Extensive experiments demonstrate that our method outperforms state-of-the-art methods, validating its effectiveness and robustness. Our code and data are available at this https URL.

[564] arXiv:2507.00917 (replaced) [pdf, html, other]
Title: A Survey: Learning Embodied Intelligence from Physical Simulators and World Models
Xiaoxiao Long, Qingrui Zhao, Kaiwen Zhang, Zihao Zhang, Dingrui Wang, Yumeng Liu, Zhengjie Shu, Yi Lu, Shouzheng Wang, Xinzhe Wei, Wei Li, Wei Yin, Yao Yao, Jia Pan, Qiu Shen, Ruigang Yang, Xun Cao, Qionghai Dai
Comments: Updated with recent progress. 49 pages, 25 figures, 6 tables; GitHub repository available at this https URL
Subjects: Robotics (cs.RO)

The pursuit of artificial general intelligence (AGI) has placed embodied intelligence at the forefront of robotics research. Embodied intelligence focuses on agents capable of perceiving, reasoning, and acting within the physical world. Achieving robust embodied intelligence requires not only advanced perception and control, but also the ability to ground abstract cognition in real-world interactions. Two foundational technologies, physical simulators and world models, have emerged as critical enablers in this quest. Physical simulators provide controlled, high-fidelity environments for training and evaluating robotic agents, allowing safe and efficient development of complex behaviors. In contrast, world models empower robots with internal representations of their surroundings, enabling predictive planning and adaptive decision-making beyond direct sensory input. This survey systematically reviews recent advances in learning embodied AI through the integration of physical simulators and world models. We analyze their complementary roles in enhancing autonomy, adaptability, and generalization in intelligent robots, and discuss the interplay between external simulation and internal modeling in bridging the gap between simulated training and real-world deployment. By synthesizing current progress and identifying open challenges, this survey aims to provide a comprehensive perspective on the path toward more capable and generalizable embodied AI systems. We also maintain an active repository that contains up-to-date literature and open-source projects at this https URL.

[565] arXiv:2507.02654 (replaced) [pdf, html, other]
Title: Breaking the HBM Bit Cost Barrier: Domain-Specific ECC for AI Inference Infrastructure
Rui Xie, Asad Ul Haq, Yunhua Fang, Linsen Ma, Sanchari Sen, Swagath Venkataramani, Liu Liu, Tong Zhang
Subjects: Hardware Architecture (cs.AR)

High-Bandwidth Memory (HBM) delivers exceptional bandwidth and energy efficiency for AI workloads, but its high cost per bit, driven in part by stringent on-die reliability requirements, poses a growing barrier to scalable deployment. This work explores a system-level approach to cost reduction by eliminating on-die ECC and shifting all fault management to the memory controller. We introduce a domain-specific ECC framework combining large-codeword Reed-Solomon (RS) correction with lightweight fine-grained CRC detection, differential parity updates to mitigate write amplification, and tunable protection based on data importance. Our evaluation using LLM inference workloads shows that, even under raw HBM bit error rates up to $10^{-3}$, the system retains over 78\% of throughput and 97\% of model accuracy compared with systems equipped with ideal error-free HBM. By treating reliability as a tunable system parameter rather than a fixed hardware constraint, our design opens a new path toward low-cost, high-performance HBM deployment in AI infrastructure.

[566] arXiv:2507.03034 (replaced) [pdf, html, other]
Title: Rethinking Data Protection in the (Generative) Artificial Intelligence Era
Yiming Li, Shuo Shao, Yu He, Junfeng Guo, Tianwei Zhang, Zhan Qin, Pin-Yu Chen, Michael Backes, Philip Torr, Dacheng Tao, Kui Ren
Comments: Perspective paper for a broader scientific audience. The first two authors contributed equally to this paper. 13 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)

The (generative) artificial intelligence (AI) era has profoundly reshaped the meaning and value of data. No longer confined to static content, data now permeates every stage of the AI lifecycle, from the training samples that shape model parameters to the prompts and outputs that drive real-world model deployment. This shift renders traditional notions of data protection insufficient, while the boundaries of what needs safeguarding remain poorly defined. Failing to safeguard data in AI systems can inflict societal and individual harm, underscoring the urgent need to clearly delineate the scope of and rigorously enforce data protection. In this perspective, we propose a four-level taxonomy, including non-usability, privacy preservation, traceability, and deletability, that captures the diverse protection needs arising in modern (generative) AI models and systems. Our framework offers a structured understanding of the trade-offs between data utility and control, spanning the entire AI pipeline, including training datasets, model weights, system prompts, and AI-generated content. We analyze representative technical approaches at each level and reveal regulatory blind spots that leave critical assets exposed. By offering a structured lens to align future AI technologies and governance with trustworthy data practices, we underscore the urgency of rethinking data protection for modern AI techniques and provide timely guidance for developers, researchers, and regulators alike.

[567] arXiv:2507.04416 (replaced) [pdf, html, other]
Title: RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling
Xiuying Wei, Anunay Yadav, Razvan Pascanu, Caglar Gulcehre
Subjects: Computation and Language (cs.CL)

Transformers have become the cornerstone of modern large-scale language models, but their reliance on softmax attention poses a computational bottleneck at both training and inference. Recurrent models offer high efficiency, but compressing the full sequence into a fixed-size and holistic representation suffers from memory degradation in long contexts and limits fine-grained retrieval. To address this, we propose RAT, an intermediate design that bridges the efficiency of RNNs and the capacity of attention. RAT partitions the input into chunks, applies recurrence within each chunk for local dependencies, and softmax-based attention across chunks for long-range interactions. This design mitigates memory degradation and enables direct access to distant tokens, while retaining computational efficiency. Empirically, with a chunk size of 16, the RAT block achieves a 7x improvement in training speed with 100K token sequences and 9x in generation at the 4K position, while maintaining similar performance compared to standard attention. We demonstrate this by training 1.3B parameter models from scratch and performing large-scale evaluations, including short- and long-context benchmarks, as well as supervised fine-tuning (SFT). We further propose a hybrid architecture that interleaves RAT with local attention. By combining efficient long-range modeling with strong local interactions, this hybrid design not only improves inference speed and reduces cache memory usage, but also consistently enhances performance and shows the overall best results. Code is available at this https URL.
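The chunking idea in the abstract above can be illustrated with a deliberately small toy. The sketch below is not the RAT block: it assumes scalar inputs, a fixed exponential-moving-average recurrence that restarts at each chunk boundary, and causal softmax attention over one summary state per chunk. The function name `chunked_recurrence_attention` and the `decay` parameter are hypothetical.

```python
import math

def chunked_recurrence_attention(xs, chunk_size, decay=0.5):
    """Toy scalar model: recurrence inside each chunk, attention across chunks."""
    outputs = []
    summaries = []  # final recurrent state of every completed chunk
    state = 0.0
    for t, x in enumerate(xs):
        if t % chunk_size == 0:
            if t > 0:
                summaries.append(state)  # store the finished chunk's summary
            state = 0.0  # recurrence restarts at each chunk boundary
        # Local dependency: simple EMA recurrence within the current chunk.
        state = decay * state + (1.0 - decay) * x
        # Long-range interaction: attend over past chunk summaries
        # plus the running state of the current chunk (causal by construction).
        keys = summaries + [state]
        scores = [state * k for k in keys]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append(sum(w * k for w, k in zip(weights, keys)) / total)
    return outputs

outputs = chunked_recurrence_attention([1.0, 1.0, 1.0, 1.0], chunk_size=2)
```

In this toy, each position attends over at most one entry per chunk rather than one entry per token, which is the source of the cost saving that chunking is meant to provide.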

[568] arXiv:2507.06463 (replaced) [pdf, html, other]
Title: Evaluating Efficiency and Novelty of LLM-Generated Code for Graph Analysis
Atieh Barati Nia, Mohammad Dindoost, David A. Bader
Comments: 7 pages, v2: minor revision to match final paper published in the 29th Annual IEEE High Performance Extreme Computing Conference (HPEC), Virtual, September 15-19, 2025. Outstanding Student Paper Award
Subjects: Software Engineering (cs.SE)

Large Language Models (LLMs) are increasingly used to automate software development, yet most prior evaluations focus on functional correctness or high-level languages such as Python. As one of the first systematic explorations of LLM-assisted software performance engineering, we present a comprehensive study of LLMs' ability to generate efficient C implementations of graph-analysis routines -- code that must satisfy stringent runtime and memory constraints. This emerging field of LLM-assisted algorithm engineering holds significant promise, as these models may possess the capability to design novel approaches that improve existing algorithms and their implementations. Eight state-of-the-art models (OpenAI ChatGPT o3 and o4-mini-high, Anthropic Claude 4 Sonnet and Sonnet Extended, Google Gemini 2.5 Flash and Pro, xAI Grok 3-Think, and DeepSeek DeepThink R1) are benchmarked using two distinct approaches. The first approach evaluates the ability of LLMs to generate algorithms that outperform existing benchmarks. The second approach assesses their capability to generate graph algorithms for integration into performance-critical systems. The results show that Claude Sonnet 4 Extended achieves superior performance in ready-to-use code generation and efficiency, outperforming human-written baselines in triangle counting. Although our findings demonstrate that contemporary LLMs excel in optimizing and integrating established algorithms, the potential for these models to eventually invent transformative algorithmic techniques represents a compelling frontier for future research. We provide prompts, generated code, and measurement scripts to promote reproducible research in this rapidly evolving domain. All of the source code is available on GitHub at this https URL.

[569] arXiv:2507.06656 (replaced) [pdf, html, other]
Title: Enhancing Diffusion Model Stability for Image Restoration via Gradient Management
Hongjie Wu, Mingqin Zhang, Linchao He, Ji-Zhe Zhou, Jiancheng Lv
Comments: Accepted to ACM Multimedia 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Diffusion models have shown remarkable promise for image restoration by leveraging powerful priors. Prominent methods typically frame the restoration problem within a Bayesian inference framework, which iteratively combines a denoising step with a likelihood guidance step. However, the interactions between these two components in the generation process remain underexplored. In this paper, we analyze the underlying gradient dynamics of these components and identify significant instabilities. Specifically, we demonstrate conflicts between the prior and likelihood gradient directions, alongside temporal fluctuations in the likelihood gradient itself. We show that these instabilities disrupt the generative process and compromise restoration performance. To address these issues, we propose Stabilized Progressive Gradient Diffusion (SPGD), a novel gradient management technique. SPGD integrates two synergistic components: (1) a progressive likelihood warm-up strategy to mitigate gradient conflicts; and (2) adaptive directional momentum (ADM) smoothing to reduce fluctuations in the likelihood gradient. Extensive experiments across diverse restoration tasks demonstrate that SPGD significantly enhances generation stability, leading to state-of-the-art performance in quantitative metrics and visually superior results. Code is available at this https URL.

[570] arXiv:2507.08406 (replaced) [pdf, other]
Title: CCSS: Hardware-Accelerated RTL Simulation with Fast Combinational Logic Computing and Sequential Logic Synchronization
Weigang Feng, Yijia Zhang, Zekun Wang, Zhengyang Wang, Yi Wang, Peijun Ma, Ningyi Xu
Comments: We plan to add more experiments and refine the figures in the paper. In addition, the overall structure needs significant revision to improve its readability
Subjects: Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC)

As transistor counts in a single chip exceed tens of billions, the complexity of RTL-level simulation and verification has grown exponentially, often extending simulation campaigns to several months. In industry practice, RTL simulation is divided into two phases: functional debug and system validation. While system validation demands high simulation speed and is typically accelerated using FPGAs, functional debug relies on rapid compilation, rendering multi-core CPUs the primary choice. However, the limited simulation speed of CPUs has become a major bottleneck. To address this challenge, we propose CCSS, a scalable multi-core RTL simulation platform that achieves both fast compilation and high simulation throughput. CCSS accelerates combinational logic computation and sequential logic synchronization through specialized architecture and compilation strategies. It employs a balanced DAG partitioning method and efficient boolean computation cores for combinational logic, and adopts a low-latency network-on-chip (NoC) design to synchronize sequential states across cores efficiently. Experimental results show that CCSS delivers up to 12.9x speedup over state-of-the-art multi-core simulators.

[571] arXiv:2507.09879 (replaced) [pdf, html, other]
Title: Covering a Few Submodular Constraints and Applications
Tanvi Bajpai, Chandra Chekuri, Pooja Kulkarni
Comments: 34 pages. Accepted to APPROX 2025
Subjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)

We consider the problem of covering multiple submodular constraints. Given a finite ground set $N$, a cost function $c: N \rightarrow \mathbb{R}_+$, $r$ monotone submodular functions $f_1,f_2,\ldots,f_r$ over $N$ and requirements $b_1,b_2,\ldots,b_r$, the goal is to find a minimum cost subset $S \subseteq N$ such that $f_i(S) \ge b_i$ for $1 \le i \le r$. When $r=1$ this is the well-known Submodular Set Cover problem. Previous work \cite{chekuri2022covering} considered the setting when $r$ is large and developed bi-criteria approximation algorithms, and approximation algorithms for the important special case when each $f_i$ is a weighted coverage function. These are fairly general models and capture several concrete and interesting problems as special cases. The approximation ratios for these problems are at least $\Omega(\log r)$, which is unavoidable when $r$ is part of the input. In this paper, motivated by some recent applications, we consider the problem when $r$ is a fixed constant and obtain two main results. For covering multiple submodular constraints we obtain a randomized bi-criteria approximation algorithm that for any given integer $\alpha \ge 1$ outputs a set $S$ such that $f_i(S) \ge$ $(1-1/e^\alpha -\epsilon)b_i$ for each $i \in [r]$ and $\mathbb{E}[c(S)] \le (1+\epsilon)\alpha \cdot \sf{OPT}$. Second, when the $f_i$ are weighted coverage functions from a deletion-closed set system we obtain a $(1+\epsilon)$ $(\frac{e}{e-1})$ $(1+\beta)$-approximation where $\beta$ is the approximation ratio for the underlying set cover instances via the natural LP. These results show that one can obtain nearly as good an approximation for any fixed $r$ as what one would achieve for $r=1$. We mention some applications that follow easily from these general results and anticipate more in the future.
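To make the problem statement above concrete, here is a minimal sketch of the textbook greedy heuristic applied to the potential $\sum_i \min(f_i(S), b_i)$. It is not the randomized bi-criteria algorithm of the paper; the function name `greedy_cover`, the small coverage instance, and the costs are all hypothetical and chosen only for illustration.

```python
def greedy_cover(ground, cost, funcs, reqs):
    """Greedy heuristic: add the element with the best gain-per-cost ratio
    until every constraint f_i(S) >= b_i holds. Costs must be positive."""
    def potential(S):
        # Sum of truncated values; equals sum(reqs) exactly when all constraints hold.
        return sum(min(f(S), b) for f, b in zip(funcs, reqs))

    chosen = set()
    target = sum(reqs)
    while potential(chosen) < target:
        base = potential(chosen)
        best, best_ratio = None, 0.0
        # Sorted iteration keeps tie-breaking deterministic.
        for e in sorted(ground - chosen):
            ratio = (potential(chosen | {e}) - base) / cost[e]
            if ratio > best_ratio:
                best, best_ratio = e, ratio
        if best is None:
            raise ValueError("requirements cannot be met")
        chosen.add(best)
    return chosen

# Hypothetical instance: each element covers some items; f_i counts covered
# items inside one sub-universe (an unweighted coverage function).
covers = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}, "d": {1}}
cost = {"a": 1.0, "b": 1.0, "c": 1.5, "d": 1.0}

def coverage(universe):
    return lambda S: len(universe & set().union(*(covers[e] for e in S)))

f1, f2 = coverage({1, 2, 3}), coverage({4, 5})
result = greedy_cover(set(covers), cost, [f1, f2], [3, 1])
```

On this instance the heuristic selects `a` and then `c`, which satisfies both requirements. In general the greedy choice carries only a logarithmic-type guarantee and need not be optimal.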

[572] arXiv:2507.10578 (replaced) [pdf, html, other]
Title: When and Where do Data Poisons Attack Textual Inversion?
Jeremy Styborski, Mingzhi Lyu, Jiayou Lu, Nupur Kapur, Adams Kong
Comments: Accepted to ICCV 2025
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Poisoning attacks pose significant challenges to the robustness of diffusion models (DMs). In this paper, we systematically analyze when and where poisoning attacks textual inversion (TI), a widely used personalization technique for DMs. We first introduce Semantic Sensitivity Maps, a novel method for visualizing the influence of poisoning on text embeddings. Second, we identify and experimentally verify that DMs exhibit non-uniform learning behavior across timesteps, focusing on lower-noise samples. Poisoning attacks inherit this bias and inject adversarial signals predominantly at lower timesteps. Lastly, we observe that adversarial signals distract learning away from relevant concept regions within training data, corrupting the TI process. Based on these insights, we propose Safe-Zone Training (SZT), a novel defense mechanism comprised of 3 key components: (1) JPEG compression to weaken high-frequency poison signals, (2) restriction to high timesteps during TI training to avoid adversarial signals at lower timesteps, and (3) loss masking to constrain learning to relevant regions. Extensive experiments across multiple poisoning methods demonstrate that SZT greatly enhances the robustness of TI against all poisoning attacks, improving generative quality beyond prior published defenses. Code: this http URL Data: this http URL

[573] arXiv:2507.17695 (replaced) [pdf, html, other]
Title: Symbiotic Agents: A Novel Paradigm for Trustworthy AGI-driven Networks
Ilias Chatzistefanidis, Navid Nikaein
Comments: Submitted to Computer Networks AI for 6G
Subjects: Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)

Large Language Model (LLM)-based autonomous agents are expected to play a vital role in the evolution of 6G networks, by empowering real-time decision-making related to management and service provisioning to end-users. This shift facilitates the transition from a specialized intelligence approach, where artificial intelligence (AI) algorithms handle isolated tasks, to artificial general intelligence (AGI)-driven networks, where agents possess broader reasoning capabilities and can manage diverse network functions. In this paper, we introduce a novel agentic paradigm, which we term symbiotic agents, that combines LLMs with real-time optimization algorithms towards Trustworthy AI. Optimizers at the LLM's input-level provide bounded uncertainty steering for numerically precise tasks, whereas output-level optimizers supervised by the LLM enable adaptive real-time control. We design and implement two novel agent types: (i) Radio Access Network optimizers, and (ii) multi-agent negotiators for Service-Level Agreements (SLAs). We further propose an end-to-end architecture for AGI networks and evaluate it on a 5G testbed capturing channel fluctuations from moving vehicles. Results show that symbiotic agents reduce decision errors fivefold compared to standalone LLM-based agents, while smaller language models (SLMs) achieve similar accuracy with a 99.9% reduction in GPU resource overhead and near-real-time loops of 82 ms. A multi-agent demonstration for collaborative RAN on the real-world testbed highlights significant flexibility in service-level agreement and resource allocation, reducing RAN over-utilization by approximately 44%. Drawing on our findings and open-source implementations, we introduce the symbiotic paradigm as the foundation for next-generation, AGI-driven networks: systems designed to remain adaptable, efficient, and trustworthy even as LLMs advance.

[574] arXiv:2507.17736 (replaced) [pdf, html, other]
Title: Symmetric Private Information Retrieval (SPIR) on Graph-Based Replicated Systems
Shreya Meel, Sennur Ulukus
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Databases (cs.DB); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

We introduce the problem of symmetric private information retrieval (SPIR) on replicated databases modeled by a simple graph. In this model, each vertex corresponds to a server, and a message is replicated on two servers if and only if there is an edge between them. We consider the setting where the server-side common randomness necessary to accomplish SPIR is also replicated at the servers according to the graph, and we call this message-specific common randomness. In this setting, we establish a lower bound on the SPIR capacity, i.e., the maximum download rate, for general graphs, by proposing an achievable SPIR scheme. Next, we prove that, for any SPIR scheme to be feasible, the minimum size of message-specific randomness should be equal to the size of a message. Finally, by providing matching upper bounds, we derive the exact SPIR capacity for the class of path and regular graphs.

[575] arXiv:2507.19733 (replaced) [pdf, other]
Title: Integrating Activity Predictions in Knowledge Graphs
Forrest Hare, Alec Sculley, Cameron Stockton
Comments: 21 pages. 18 figures. Conference: Semantic Technology for Intelligence, Defense, and Security (STIDS 2024)
Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB)

We argue that ontology-structured knowledge graphs can play a crucial role in generating predictions about future events. By leveraging the semantic framework provided by Basic Formal Ontology (BFO) and Common Core Ontologies (CCO), we demonstrate how data such as the movements of a fishing vessel can be organized in and retrieved from a knowledge graph. These query results are then used to create Markov chain models, allowing us to predict future states based on the vessel's history. To fully support this process, we introduce the term 'spatiotemporal instant' to complete the necessary structural semantics. Additionally, we critique the prevailing ontological model of probability, according to which probabilities are about the future. We propose an alternative view, where at least some probabilities are treated as being about actual process profiles, which better captures the dynamics of real-world phenomena. Finally, we demonstrate how our Markov chain-based probability calculations can be seamlessly integrated back into the knowledge graph, enabling further analysis and decision-making.
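As a rough illustration of the Markov chain step mentioned above, the sketch below estimates first-order transition probabilities from an ordered sequence of observed states. The state labels and the function name `estimate_transitions` are hypothetical and do not come from the paper; the knowledge-graph query that would produce such a sequence is not shown.

```python
from collections import Counter, defaultdict

def estimate_transitions(states):
    """Estimate first-order Markov transition probabilities from a state sequence."""
    counts = defaultdict(Counter)
    # Count each observed transition (current state -> next state).
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    # Normalize each row of counts into a probability distribution.
    return {
        current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for current, row in counts.items()
    }

# Hypothetical vessel activity history (illustrative labels only).
history = ["port", "transit", "fishing", "fishing",
           "transit", "port", "transit", "fishing"]
model = estimate_transitions(history)
```

Here `model["transit"]` gives the estimated distribution over the next state after a transit leg. With so few observations the estimates are coarse, and a real history would need far more data.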

[576] arXiv:2507.20301 (replaced) [pdf, html, other]
Title: Advancing Dialectal Arabic to Modern Standard Arabic Machine Translation
Abdullah Alabdullah, Lifeng Han, Chenghua Lin
Subjects: Computation and Language (cs.CL)

Dialectal Arabic (DA) poses a persistent challenge for natural language processing (NLP), as most everyday communication in the Arab world occurs in dialects that diverge significantly from Modern Standard Arabic (MSA). This linguistic divide impedes progress in Arabic machine translation. This paper presents two core contributions to advancing DA-MSA translation for the Levantine, Egyptian, and Gulf dialects, particularly in low-resource and computationally constrained settings: (i) a comprehensive evaluation of training-free prompting techniques, and (ii) the development of a resource-efficient fine-tuning pipeline. Our evaluation of prompting strategies across six large language models (LLMs) found that few-shot prompting consistently outperformed zero-shot, chain-of-thought, and our proposed Ara-TEaR method. Ara-TEaR is designed as a three-stage self-refinement prompting process, targeting frequent meaning-transfer and adaptation errors in DA-MSA translation. In this evaluation, GPT-4o achieved the highest performance across all prompting settings. For fine-tuning LLMs, a quantized Gemma2-9B model achieved a chrF++ score of 49.88, outperforming zero-shot GPT-4o (44.58). Joint multi-dialect trained models outperformed single-dialect counterparts by over 10% chrF++, and 4-bit quantization reduced memory usage by 60% with less than 1% performance loss. The results and insights of our experiments offer a practical blueprint for improving dialectal inclusion in Arabic NLP, showing that high-quality DA-MSA machine translation is achievable even with limited resources and paving the way for more inclusive language technologies.

[577] arXiv:2507.20800 (replaced) [pdf, other]
Title: LanternNet: A Hub-and-Spoke System to Seek and Suppress Spotted Lanternfly Populations
Vinil Polepalli
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

The invasive spotted lanternfly (SLF) poses a significant threat to agriculture and ecosystems, causing widespread damage. Current control methods, such as egg scraping, pesticides, and quarantines, prove labor-intensive, environmentally hazardous, and inadequate for long-term SLF suppression. This research introduces LanternNet, a novel autonomous robotic Hub-and-Spoke system designed for scalable detection and suppression of SLF populations. A central, tree-mimicking hub utilizes a YOLOv8 computer vision model for precise SLF identification. Three specialized robotic spokes perform targeted tasks: pest neutralization, environmental monitoring, and navigation/mapping. Field deployment across multiple infested sites over 5 weeks demonstrated LanternNet's efficacy. Quantitative analysis revealed significant reductions (p < 0.01, paired t-tests) in SLF populations and corresponding improvements in tree health indicators across the majority of test sites. Compared to conventional methods, LanternNet offers substantial cost advantages and improved scalability. Furthermore, the system's adaptability for enhanced autonomy and targeting of other invasive species presents significant potential for broader ecological impact. LanternNet demonstrates the transformative potential of integrating robotics and AI for advanced invasive species management and improved environmental outcomes.

[578] arXiv:2507.23751 (replaced) [pdf, html, other]
Title: CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks
Ping Yu, Jack Lanchantin, Tianlu Wang, Weizhe Yuan, Olga Golovneva, Ilia Kulikov, Sainbayar Sukhbaatar, Jason Weston, Jing Xu
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

We propose CoT-Self-Instruct, a synthetic data generation method that instructs LLMs to first reason and plan via Chain-of-Thought (CoT) based on given seed tasks, and then generate a new synthetic example of similar quality and complexity. This is followed by a filtering step to select high-quality data using automatic metrics, which are then used for LLM training. In verifiable reasoning, our synthetic data significantly outperforms existing training datasets, such as s1k and OpenMathReasoning, when evaluated on MATH500, AMC23, AIME24, and GPQA-Diamond. For non-verifiable instruction-following tasks, our method surpasses the performance of both human and standard Self-Instruct training data on the AlpacaEval 2.0 and Arena-Hard benchmarks.

[579] arXiv:2508.01197 (replaced) [pdf, html, other]
Title: A Coarse-to-Fine Approach to Multi-Modality 3D Occupancy Grounding
Zhan Shi, Song Wang, Junbo Chen, Jianke Zhu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Visual grounding aims to identify objects or regions in a scene based on natural language descriptions, essential for spatially aware perception in autonomous driving. However, existing visual grounding tasks typically depend on bounding boxes that often fail to capture fine-grained details. Not all voxels within a bounding box are occupied, resulting in inaccurate object representations. To address this, we introduce a benchmark for 3D occupancy grounding in challenging outdoor scenes. Built on the nuScenes dataset, it integrates natural language with voxel-level occupancy annotations, offering more precise object perception compared to the traditional grounding task. Moreover, we propose GroundingOcc, an end-to-end model designed for 3D occupancy grounding through multi-modal learning. It combines visual, textual, and point cloud features to predict object location and occupancy information from coarse to fine. Specifically, GroundingOcc comprises a multimodal encoder for feature extraction, an occupancy head for voxel-wise predictions, and a grounding head to refine localization. Additionally, a 2D grounding module and a depth estimation module enhance geometric understanding, thereby boosting model performance. Extensive experiments on the benchmark demonstrate that our method outperforms existing baselines on 3D occupancy grounding. The dataset is available at this https URL.

[580] arXiv:2508.01415 (replaced) [pdf, html, other]
Title: RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems
Mingcong Lei, Honghao Cai, Binbin Que, Zezhou Cui, Liangchen Tan, Junkun Hong, Gehan Hu, Shuangyu Zhu, Yimou Wu, Shaohan Jiang, Ge Wang, Zhen Li, Shuguang Cui, Yiming Zhao, Yatong Han
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

We present RoboMemory, a brain-inspired multi-memory framework for lifelong learning in physical embodied systems, addressing critical challenges in real-world environments: continuous learning, multi-module memory latency, task correlation capture, and infinite-loop mitigation in closed-loop planning. Grounded in cognitive neuroscience, it integrates four core modules: the Information Preprocessor (thalamus-like), the Lifelong Embodied Memory System (hippocampus-like), the Closed-Loop Planning Module (prefrontal lobe-like), and the Low-Level Executer (cerebellum-like) to enable long-term planning and cumulative learning. The Lifelong Embodied Memory System, central to the framework, alleviates inference speed issues in complex memory frameworks via parallelized updates/retrieval across Spatial, Temporal, Episodic, and Semantic submodules. It incorporates a dynamic Knowledge Graph (KG) and consistent architectural design to enhance memory consistency and scalability. Evaluations on EmbodiedBench show RoboMemory outperforms the open-source baseline (Qwen2.5-VL-72B-Ins) by 25% in average success rate and surpasses the closed-source State-of-the-Art (SOTA) (Claude3.5-Sonnet) by 5%, establishing new SOTA. Ablation studies validate key components (critic, spatial memory, long-term memory), while real-world deployment confirms its lifelong learning capability with significantly improved success rates across repeated tasks. RoboMemory alleviates high latency challenges with scalability, serving as a foundational reference for integrating multi-modal memory systems in physical robots.

[581] arXiv:2508.01550 (replaced) [pdf, html, other]
Title: RepoForge: Training a SOTA Fast-thinking SWE Agent with an End-to-End Data Curation Pipeline Synergizing SFT and RL at Scale
Zhilong Chen, Chengzong Zhao, Boyuan Chen, Dayi Lin, Yihao Chen, Arthur Leung, Gopi Krishnan Rajbahadur, Gustavo A. Oliva, Haoxiang Zhang, Aaditya Bhatia, Chong Chun Yong, Ahmed E. Hassan
Subjects: Software Engineering (cs.SE)

Training software engineering (SWE) LLMs is bottlenecked by expensive infrastructure, inefficient evaluation pipelines, scarce training data, and costly quality control. We present RepoForge, an autonomous, end-to-end pipeline that generates, evaluates, and trains SWE agents at scale. Our key contributions include: (1) RepoForge-8B-Agent, achieving 17.4% on SWE-Bench-Verified \citep{swebench_verified2024}, establishing a new state of the art for $\leq$8B non-thinking LLMs; (2) 7,304 executable environments auto-generated from real GitHub commits with zero manual intervention; (3) 14$\times$ storage reduction (1.4GB $\rightarrow$ 102MB per instance) via intelligent dependency management and image pruning; (4) $>$70% faster evaluation using a Ray-powered \citep{ray2018} distributed RepoForge harness; (5) 19,000$\times$ cheaper labeling through our automated SPICE \citep{spice2024} difficulty assessment technique. By unifying storage-efficient sandboxing, a Ray-powered evaluation harness, automated data generation, SPICE-based labeling, and a bubble-free RL scaffold, we demonstrate that even $\leq$8B models can reach new state-of-the-art performance on demanding benchmarks like SWE-Bench-Verified. Our approach addresses critical bottlenecks in SWE agent training: high storage costs of container-based evaluation, inefficient sequential reward pipelines, limited availability of high-quality training data, expensive manual labeling, and multi-turn RL pipeline bottlenecks.

[582] arXiv:2508.03665 (replaced) [pdf, html, other]
Title: A DbC Inspired Neurosymbolic Layer for Trustworthy Agent Design
Claudiu Leoveanu-Condrei
Comments: 4 pages, 1 figure
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Generative models, particularly Large Language Models (LLMs), produce fluent outputs yet lack verifiable guarantees. We adapt Design by Contract (DbC) and type-theoretic principles to introduce a contract layer that mediates every LLM call. Contracts stipulate semantic and type requirements on inputs and outputs, coupled with probabilistic remediation to steer generation toward compliance. The layer exposes the dual view of LLMs as semantic parsers and probabilistic black-box components. Contract satisfaction is probabilistic and semantic validation is operationally defined through programmer-specified conditions on well-typed data structures. More broadly, this work postulates that any two agents satisfying the same contracts are functionally equivalent with respect to those contracts.

[583] arXiv:2508.03700 (replaced) [pdf, other]
Title: MagicGUI: A Foundational Mobile GUI Agent with Scalable Data Pipeline and Reinforcement Fine-tuning
Liujian Tang, Shaokang Dong, Yijia Huang, Minqi Xiang, Hongtao Ruan, Bin Wang, Shuo Li, Zhiheng Xi, Zhihui Cao, Hailiang Pang, Heng Kong, He Yang, Mingxu Chai, Zhilin Gao, Xingyu Liu, Yingnan Fu, Jiaming Liu, Xuanjing Huang, Yu-Gang Jiang, Tao Gui, Qi Zhang, Kang Wang, Yunke Zhang, Yuran Wang
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

This paper presents MagicGUI, a foundational mobile GUI agent designed to address critical challenges in perception, grounding, and reasoning within real-world mobile GUI environments. The framework is underpinned by the following six key components: (1) a comprehensive and accurate dataset, constructed via the scalable GUI Data Pipeline, which aggregates the largest and most diverse GUI-centric multimodal data to date from open-source repositories, automated crawling, and targeted manual annotation; (2) enhanced perception and grounding capabilities, facilitating fine-grained multimodal alignment for UI element referencing, grounding, and screen comprehension; (3) a comprehensive and unified action space, encompassing both fundamental UI operations and complex interactive intents to support human-agent interactions; (4) planning-oriented reasoning mechanisms that enable the model to decompose complex user instructions into sequential actions with explicit intermediate meta-plan reasoning; (5) an iterative two-stage training procedure, combining large-scale continued pre-training on 7.8M samples with reinforcement fine-tuning utilizing a spatially enhanced composite reward and dual filtering strategy; and (6) competitive performance on both the proprietary Magic-RICH benchmark and over a dozen public benchmarks, achieving superior performance across GUI perception and agent tasks, while demonstrating robust generalization and real-world deployment potential in practical mobile GUI scenarios, as detailed in Figure 1.

[584] arXiv:2508.04416 (replaced) [pdf, html, other]
Title: Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning
Haoji Zhang, Xin Gu, Jiawen Li, Chixiang Ma, Sule Bai, Chubin Zhang, Bowen Zhang, Zhichao Zhou, Dongliang He, Yansong Tang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The video reasoning ability of multimodal large language models (MLLMs) is crucial for downstream tasks like video question answering and temporal grounding. While recent approaches have explored text-based chain-of-thought (CoT) reasoning for MLLMs, these methods often suffer from limited cross-modal interaction and increased hallucination, especially with longer videos or reasoning chains. To address these challenges, we propose Video Intelligence via Tool-Augmented Learning (VITAL), a novel end-to-end agentic video reasoning framework. With a visual toolbox, the model can densely sample new video frames on demand and generate multimodal CoT for precise long video reasoning. We observe that temporal grounding and question answering are mutually beneficial for video understanding tasks. Therefore, we construct two high-quality multi-task video reasoning datasets: MTVR-CoT-72k for supervised fine-tuning and MTVR-RL-110k for reinforcement learning. Moreover, we propose a Difficulty-aware Group Relative Policy Optimization algorithm (DGRPO) to mitigate difficulty imbalance in multi-task reinforcement learning. Extensive experiments on 11 challenging video understanding benchmarks demonstrate the advanced reasoning ability of VITAL, outperforming existing methods in video question answering and temporal grounding tasks, especially in long video scenarios. Code is available at this https URL.

[585] arXiv:2508.05093 (replaced) [pdf, html, other]
Title: An End-to-End Multi-objective Ensemble Ranking Framework for Video Recommendation
Tiantian He, Minzhi Xie, Runtong Li, Xiaoxiao Xu, Jiaqi Yu, Zixiu Wang, Lantao Hu, Han Li, Kun Gai
Subjects: Information Retrieval (cs.IR)

We propose a novel End-to-end Multi-objective Ensemble Ranking framework (EMER) for the multi-objective ensemble ranking module, which is the most critical component of the short video recommendation system. EMER enhances personalization by replacing manually-designed heuristic formulas with an end-to-end modeling paradigm. EMER introduces a meticulously designed loss function to address the fundamental challenge of defining effective supervision for ensemble ranking, where no single ground-truth signal can fully capture user satisfaction. Moreover, EMER introduces a novel sample organization method and a transformer-based network architecture to capture the comparative relationships among candidates, which are critical for effective ranking. Additionally, we propose an offline-online consistent evaluation system to enhance the efficiency of offline model optimization, which is an established yet persistent challenge within the multi-objective ranking domain in industry. Extensive empirical tests are conducted on a real industrial dataset, and the results demonstrate the effectiveness of our proposed framework. In addition, our framework has been deployed in the primary scenarios of Kuaishou, a short video recommendation platform with hundreds of millions of daily active users, achieving a 1.39% increase in overall App Stay Time and a 0.196% increase in 7-day user Lifetime (LT7), which are substantial improvements.

[586] arXiv:2508.07834 (replaced) [pdf, html, other]
Title: KIRETT: Knowledge-Graph-Based Smart Treatment Assistant for Intelligent Rescue Operations
Mubaris Nadeem, Johannes Zenkert, Lisa Bender, Christian Weber, Madjid Fathi
Comments: LWDA'23, KIRETT project, University of Siegen, Germany
Journal-ref: In LWDA (pp. 259-270) 2023
Subjects: Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)

Over the years, the need for rescue operations throughout the world has increased rapidly. Demographic changes and the resulting risk of injury or health disorders form the basis for emergency calls. In such scenarios, first responders are in a rush to reach the patient in need, provide first aid, and save lives. In these situations, they must be able to provide personalized and optimized healthcare in the shortest possible time and estimate the patient's condition with the help of freshly recorded vital data in an emergency situation. However, in such a time-dependent situation, first responders and medical experts cannot fully draw on their knowledge and need assistance and recommendations for further medical treatments. To achieve this, knowledge that is calculated, evaluated, and processed on the spot must be made available to improve treatments by first responders. The Knowledge Graph presented in this article as a central knowledge representation provides first responders with an innovative knowledge management that enables intelligent treatment recommendations with an artificial intelligence-based pre-recognition of the situation.

[587] arXiv:2508.08005 (replaced) [pdf, html, other]
Title: Learning to Select MCP Algorithms: From Traditional ML to Dual-Channel GAT-MLP
Xiang Li, Shanshan Wang, Chenglong Xiao
Comments: 10 pages, 6 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Extensive experiments and prior studies show that no single maximum clique algorithm consistently performs best across all instances, highlighting the importance of selecting suitable algorithms based on instance features. Through an extensive analysis of relevant studies, it is found that there is a lack of research work concerning algorithm selection oriented toward the Maximum Clique Problem (MCP). In this work, we propose a learning-based framework that integrates both traditional machine learning and graph neural networks to address this gap. We construct a labeled dataset by running four exact MCP algorithms on a diverse collection of graph instances, accompanied by structural and global statistical features extracted from each graph. We first evaluate four conventional classifiers: Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), and K-Nearest Neighbors (KNN), across multiple dataset variants. Experimental results show that RF consistently shows strong performance across metrics and dataset variants, making it a reliable baseline. In addition, feature importance analysis indicates that connectivity and topological structure are strong predictors of algorithm performance. Building on these findings, we develop a dual-channel model named GAT-MLP, which combines a Graph Attention Network (GAT) for local structural encoding with a Multilayer Perceptron (MLP) for global feature modeling. The GAT-MLP model shows strong and consistent performance across all metrics. Our results highlight the effectiveness of dual-channel architectures and the promise of graph neural networks in combinatorial algorithm selection.

[588] arXiv:2508.08022 (replaced) [pdf, html, other]
Title: Optimizing Federated Learning for Scalable Power-demand Forecasting in Microgrids
Roopkatha Banerjee, Sampath Koti, Gyanendra Singh, Anirban Chakraborty, Gurunath Gurrala, Bhushan Jagyasi, Yogesh Simmhan
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

Real-time monitoring of power consumption in cities and micro-grids through the Internet of Things (IoT) can help forecast future demand and optimize grid operations. But moving all consumer-level usage data to the cloud for predictions and analysis at fine time scales can expose activity patterns. Federated Learning (FL) is a privacy-sensitive collaborative DNN training approach that retains data on edge devices, trains the models on private data locally, and aggregates the local models in the cloud. But key challenges exist: (i) clients can have non-independent and identically distributed (non-IID) data, and (ii) the learning should be computationally cheap while scaling to 1000s of (unseen) clients. In this paper, we develop and evaluate several optimizations to FL training across edge and cloud for time-series demand forecasting in micro-grids and city-scale utilities using DNNs to achieve a high prediction accuracy while minimizing the training cost. We showcase the benefit of using exponentially weighted loss while training and show that it further improves the prediction of the final model. Finally, we evaluate these strategies by validating over 1000s of clients for three states in the US from the OpenEIA corpus, and performing FL both in a pseudo-distributed setting and a Pi edge cluster. The results highlight the benefits of the proposed methods over baselines like ARIMA and DNNs trained for individual consumers, which are not scalable.
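
The aggregation step described above (local models combined in the cloud) can be illustrated with sample-count-weighted federated averaging. This is a minimal sketch of the standard FedAvg rule, not the optimizations proposed in the paper; the function name `federated_average` and the flat-list model representation are assumptions made for illustration.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average flat parameter vectors, weighting each client by its sample count.

    Hypothetical helper: real models would hold tensors per layer, and the
    paper's aggregation strategy may differ from plain FedAvg.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client model")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    aggregated = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all client models must have the same shape")
        for i, value in enumerate(weights):
            # Each client contributes in proportion to its local data size.
            aggregated[i] += value * size / total
    return aggregated


global_model = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only parameter vectors and sample counts leave the edge devices here, which matches the privacy motivation stated in the abstract.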

[589] arXiv:2508.08040 (replaced) [pdf, html, other]
Title: BadPromptFL: A Novel Backdoor Threat to Prompt-based Federated Learning in Multimodal Models
Maozhen Zhang, Mengnan Zhao, Bo Wang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Prompt-based tuning has emerged as a lightweight alternative to full fine-tuning in large vision-language models, enabling efficient adaptation via learned contextual prompts. This paradigm has recently been extended to federated learning settings (e.g., PromptFL), where clients collaboratively train prompts under data privacy constraints. However, the security implications of prompt-based aggregation in federated multimodal learning remain largely unexplored, leaving a critical attack surface unaddressed. In this paper, we introduce BadPromptFL, the first backdoor attack targeting prompt-based federated learning in multimodal contrastive models. In BadPromptFL, compromised clients jointly optimize local backdoor triggers and prompt embeddings, injecting poisoned prompts into the global aggregation process. These prompts are then propagated to benign clients, enabling universal backdoor activation at inference without modifying model parameters. Leveraging the contextual learning behavior of CLIP-style architectures, BadPromptFL achieves high attack success rates (e.g., >90%) with minimal visibility and limited client participation. Extensive experiments across multiple datasets and aggregation protocols validate the effectiveness, stealth, and generalizability of our attack, raising critical concerns about the robustness of prompt-based federated learning in real-world deployments.

[590] arXiv:2508.08624 (replaced) [pdf, html, other]
Title: Communication Efficient Robotic Mixed Reality with Gaussian Splatting Cross-Layer Optimization
Chenxuan Liu, He Li, Zongze Li, Shuai Wang, Wei Xu, Kejiang Ye, Derrick Wing Kwan Ng, Chengzhong Xu
Comments: 14 pages, 18 figures, to appear in IEEE Transactions on Cognitive Communications and Networking
Subjects: Robotics (cs.RO); Information Theory (cs.IT)

Realizing low-cost communication in robotic mixed reality (RoboMR) systems presents a challenge, due to the necessity of uploading high-resolution images through wireless channels. This paper proposes Gaussian splatting (GS) RoboMR (GSMR), which enables the simulator to opportunistically render a photo-realistic view from the robot's pose by calling "memory" from a GS model, thus reducing the need for excessive image uploads. However, the GS model may involve discrepancies compared to the actual environments. To this end, a GS cross-layer optimization (GSCLO) framework is further proposed, which jointly optimizes content switching (i.e., deciding whether or not to upload an image) and power allocation (i.e., adjusting to content profiles) across different frames by minimizing a newly derived GSMR loss function. The GSCLO problem is addressed by an accelerated penalty optimization (APO) algorithm that reduces computational complexity by over 10x compared to traditional branch-and-bound and search algorithms. Moreover, variants of GSCLO are presented to achieve robust, low-power, and multi-robot GSMR. Extensive experiments demonstrate that the proposed GSMR paradigm and GSCLO method achieve significant improvements over existing benchmarks on both wheeled and legged robots in terms of diverse metrics in various scenarios. For the first time, it is found that RoboMR can be achieved with ultra-low communication costs, and that a mixture of data is useful for enhancing GS performance in dynamic scenarios.

[591] arXiv:2508.09600 (replaced) [pdf, html, other]
Title: OSUM-EChat: Enhancing End-to-End Empathetic Spoken Chatbot via Understanding-Driven Spoken Dialogue
Xuelong Geng, Qijie Shao, Hongfei Xue, Shuiyuan Wang, Hanke Xie, Zhao Guo, Yi Zhao, Guojian Li, Wenjie Tian, Chengyou Wang, Zhixian Zhao, Kangxiang Xia, Ziyu Zhang, Zhennan Lin, Tianlun Zuo, Mingchen Shao, Yuang Cao, Guobin Ma, Longhao Li, Yuhang Dai, Dehui Gao, Dake Guo, Lei Xie
Subjects: Sound (cs.SD)

Empathy is crucial in enabling natural interactions within spoken dialogue systems, allowing machines to recognize and respond appropriately to paralinguistic cues such as age, gender, and emotion. Recent advancements in end-to-end speech language models, which unify speech understanding and generation, provide promising solutions. However, several challenges persist, including an over-reliance on large-scale dialogue datasets, insufficient extraction of paralinguistic cues vital for conveying empathy, and the lack of empathy-specific datasets and evaluation frameworks. To address these issues, we introduce OSUM-EChat, an open-source, end-to-end spoken dialogue system designed to enhance empathetic interactions, particularly in resource-limited settings. OSUM-EChat introduces two key innovations: (1) a three-stage understanding-driven spoken dialogue training strategy that extends the capabilities of a large speech understanding model to spoken dialogue tasks, and (2) a linguistic-paralinguistic dual thinking mechanism that integrates paralinguistic understanding through a chain of thought with dialogue generation, enabling the system to produce more empathetic responses. This approach reduces reliance on large-scale dialogue datasets while maintaining high-quality empathetic interactions. Additionally, we introduce the EChat-200K dataset, a rich corpus of empathetic speech-to-speech dialogues, and the EChat-eval benchmark, a comprehensive framework for evaluating the empathetic capabilities of dialogue systems. Experimental results demonstrate that OSUM-EChat outperforms end-to-end spoken dialogue models regarding empathetic responsiveness, validating its effectiveness.

[592] arXiv:2508.09777 (replaced) [pdf, html, other]
Title: In-place Double Stimulus Methodology for Subjective Assessment of High Quality Images
Shima Mohammadi, Mohsen Jenadeleh, Michela Testolina, Jon Sneyers, Touradj Ebrahimi, Dietmar Saupe, João Ascenso
Comments: 6 pages, 5 figures, Accepted at European Workshop on Visual Information Processing
Subjects: Multimedia (cs.MM)

This paper introduces a novel double stimulus subjective assessment methodology for the evaluation of high quality images to address the limitations of existing protocols in detecting subtle perceptual differences. The In-place Double Stimulus Quality Scale (IDSQS) allows subjects to alternately view a reference and a distorted image at the same spatial location, facilitating a more intuitive detection of differences in quality, especially at high to visually lossless quality levels. A large-scale crowdsourcing study employing this methodology was conducted, generating a comprehensive public dataset to evaluate perceived image quality across several compression algorithms and distortion levels. An additional contribution is the modeling of quality scores using a Beta distribution, allowing for the assessment of variability and subject consistency. Our findings demonstrate the effectiveness of the IDSQS methodology in achieving high correlation with more precise subjective evaluation benchmarks. The dataset, subjective data, and graphical user interface developed for this study are publicly available at this https URL
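
To make the Beta-distribution modeling of quality scores concrete, the sketch below fits Beta parameters by the method of moments to scores rescaled into the open interval (0, 1). The fitting method, the rescaling, and the name `fit_beta_moments` are assumptions for illustration; the abstract does not state how the authors estimate the parameters.

```python
from statistics import mean, pvariance
from typing import Sequence, Tuple


def fit_beta_moments(scores: Sequence[float]) -> Tuple[float, float]:
    """Estimate (alpha, beta) of a Beta distribution from scores in (0, 1).

    Method of moments: with mean m and variance v,
    alpha = m * k and beta = (1 - m) * k, where k = m * (1 - m) / v - 1.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    if any(not 0.0 < s < 1.0 for s in scores):
        raise ValueError("scores must lie strictly between 0 and 1")
    m = mean(scores)
    v = pvariance(scores)
    if v <= 0.0 or v >= m * (1.0 - m):
        raise ValueError("variance is incompatible with a Beta distribution")
    k = m * (1.0 - m) / v - 1.0
    return m * k, (1.0 - m) * k


# Hypothetical ratings for one stimulus, already rescaled to (0, 1).
alpha, beta = fit_beta_moments([0.2, 0.4, 0.6, 0.8])
```

A larger `alpha + beta` indicates tighter agreement among subjects on that stimulus, which is one way the fitted distribution can express variability and consistency.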

[593] arXiv:2508.09853 (replaced) [pdf, other]
Title: STREAM (ChemBio): A Standard for Transparently Reporting Evaluations in AI Model Reports
Tegan McCaslin, Jide Alaga, Samira Nedungadi, Seth Donoughe, Tom Reed, Rishi Bommasani, Chris Painter, Luca Righetti
Comments: 47 pages, 1 figure. Includes appendices and reporting template
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

Evaluations of dangerous AI capabilities are important for managing catastrophic risks. Public transparency into these evaluations - including what they test, how they are conducted, and how their results inform decisions - is crucial for building trust in AI development. We propose STREAM (A Standard for Transparently Reporting Evaluations in AI Model Reports), a standard to improve how model reports disclose evaluation results, initially focusing on chemical and biological (ChemBio) benchmarks. Developed in consultation with 23 experts across government, civil society, academia, and frontier AI companies, this standard is designed to (1) be a practical resource to help AI developers present evaluation results more clearly, and (2) help third parties identify whether model reports provide sufficient detail to assess the rigor of the ChemBio evaluations. We concretely demonstrate our proposed best practices with "gold standard" examples, and also provide a three-page reporting template to enable AI developers to implement our recommendations more easily.

[594] arXiv:2508.10059 (replaced) [pdf, html, other]
Title: CodeGrad: Integrating Multi-Step Verification with Gradient-Based LLM Refinement
Yueke Zhang, Yifan Zhang, Kevin Leach, Yu Huang
Comments: 6 Pages
Subjects: Software Engineering (cs.SE)

While Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, they often produce solutions that lack guarantees of correctness, robustness, and efficiency. This limitation is particularly acute in domains requiring strict constraints. CodeGrad introduces a principled framework that integrates rigorous verification techniques directly into an iterative LLM-based generation loop. It uniquely treats code as a differentiable variable, converting structured feedback and mathematical constraints into a textual pseudo-gradient. This gradient guides the model to iteratively refine solutions, ensuring they are not only functional but also robust and mathematically justified.
We evaluate CodeGrad on the HumanEval, HumanEval+, and LiveCodeBench benchmarks. Our implementation outperforms strong baselines, achieving an absolute improvement of up to 27% on HumanEval and a 41% relative improvement on the challenging LiveCodeBench V6. StructuredGrad generates mathematically justified code that is robust and efficient, paving the way for reliable AI-assisted software development in high-stakes applications.

[595] arXiv:2508.10198 (replaced) [pdf, html, other]
Title: Digital Contact Tracing: Examining the Effects of Understanding and Release Organization on Public Trust
Lucas Draper
Subjects: Computers and Society (cs.CY)

Contact tracing has existed in various forms for a very long time. With the rise of COVID-19, the concept has become increasingly important to help slow the spread of the virus. One approach to modernizing contact tracing is to introduce applications that detect all close contacts without individuals having to interact knowingly. 101 United States adults were surveyed in June of 2022 regarding their perceptions and trust of COVID-19 contact tracing applications. We see no definitive correlation between an individual's understanding of privacy protection procedures for contact tracing applications and their willingness to trust such an application. We also see that the release of the application by a private entity like Google-Apple or by a public entity like the United States Federal Government has no significant correlation with a person's trust in the application.

[596] arXiv:2508.11133 (replaced) [pdf, html, other]
Title: MoNaCo: More Natural and Complex Questions for Reasoning Across Dozens of Documents
Tomer Wolfson, Harsh Trivedi, Mor Geva, Yoav Goldberg, Dan Roth, Tushar Khot, Ashish Sabharwal, Reut Tsarfaty
Comments: Accepted for publication in Transactions of the Association for Computational Linguistics (TACL), 2025. Authors pre-print
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Databases (cs.DB)

Automated agents, powered by large language models (LLMs), are emerging as the go-to tool for querying information. However, evaluation benchmarks for LLM agents rarely feature natural questions that are both information-seeking and genuinely time-consuming for humans. To address this gap we introduce MoNaCo, a benchmark of 1,315 natural and time-consuming questions that require dozens, and at times hundreds, of intermediate steps to solve -- far more than any existing QA benchmark. To build MoNaCo, we developed a decomposed annotation pipeline to elicit and manually answer real-world time-consuming questions at scale. Frontier LLMs evaluated on MoNaCo achieve at most 61.2% F1, hampered by low recall and hallucinations. Our results underscore the limitations of LLM-powered agents in handling the complexity and sheer breadth of real-world information-seeking tasks -- with MoNaCo providing an effective resource for tracking such progress. The MoNaCo benchmark, codebase, prompts and model predictions are all publicly available at: this https URL

[597] arXiv:2508.12232 (replaced) [pdf, html, other]
Title: LinkAnchor: An Autonomous LLM-Based Agent for Issue-to-Commit Link Recovery
Arshia Akhavan, Alireza Hosseinpour, Abbas Heydarnoori, Mehdi Keshani
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

Issue-to-commit link recovery plays an important role in software traceability and improves project management. However, it remains a challenging task. A study on GitHub shows that only 42.2% of the issues are correctly linked to their commits. This highlights the potential for further development and research in this area. Existing studies have employed various AI/ML-based approaches, and with the recent development of large language models, researchers have leveraged LLMs to tackle this problem. These approaches suffer from two main issues. First, LLMs are constrained by limited context windows and cannot ingest all of the available data sources, such as long commit histories, extensive issue comments, and large code repositories. Second, most methods operate on individual issue-commit pairs; that is, given a single issue-commit pair, they determine whether the commit resolves the issue. This quickly becomes impractical in real-world repositories containing tens of thousands of commits. To address these limitations, we present LinkAnchor, the first autonomous LLM-based agent designed for issue-to-commit link recovery. The lazy-access architecture of LinkAnchor enables the underlying LLM to access the rich context of software, spanning commits, issue comments, and code files, without exceeding the token limit by dynamically retrieving only the most relevant contextual data. Additionally, LinkAnchor is able to automatically pinpoint the target commit rather than exhaustively scoring every possible candidate. Our evaluations show that LinkAnchor outperforms state-of-the-art issue-to-commit link recovery approaches by 60-262% in Hit@1 score across all our case study projects. We also publicly release LinkAnchor as a ready-to-use tool, along with our replication package. LinkAnchor is designed and tested for GitHub and Jira, and is easily extendable to other platforms.

[598] arXiv:2508.12750 (replaced) [pdf, html, other]
Title: D2-Mamba: Dual-Scale Fusion and Dual-Path Scanning with SSMs for Shadow Removal
Linhao Li, Boya Jin, Zizhe Li, Lanqing Guo, Hao Cheng, Bo Li, Yongfeng Dong
Comments: Paper Under Review
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Shadow removal aims to restore images that are partially degraded by shadows, where the degradation is spatially localized and non-uniform. Unlike general restoration tasks that assume global degradation, shadow removal can leverage abundant information from non-shadow regions for guidance. However, the transformation required to correct shadowed areas often differs significantly from that of well-lit regions, making it challenging to apply uniform correction strategies. This necessitates the effective integration of non-local contextual cues and adaptive modeling of region-specific transformations. To this end, we propose a novel Mamba-based network featuring dual-scale fusion and dual-path scanning to selectively propagate contextual information based on transformation similarity across regions. Specifically, the proposed Dual-Scale Fusion Mamba Block (DFMB) enhances multi-scale feature representation by fusing original features with low-resolution features, effectively reducing boundary artifacts. The Dual-Path Mamba Group (DPMG) captures global features via horizontal scanning and incorporates a mask-aware adaptive scanning strategy, which improves structural continuity and fine-grained region modeling. Experimental results demonstrate that our method significantly outperforms existing state-of-the-art approaches on shadow removal benchmarks.

[599] arXiv:2508.13401 (replaced) [pdf, html, other]
Title: AIM 2025 Rip Current Segmentation (RipSeg) Challenge Report
Andrei Dumitriu, Florin Miron, Florin Tatui, Radu Tudor Ionescu, Radu Timofte, Aakash Ralhan, Florin-Alexandru Vasluianu, Shenyang Qian, Mitchell Harley, Imran Razzak, Yang Song, Pu Luo, Yumei Li, Cong Xu, Jinming Chai, Kexin Zhang, Licheng Jiao, Lingling Li, Siqi Yu, Chao Zhang, Kehuan Song, Fang Liu, Puhua Chen, Xu Liu, Jin Hu, Jinyang Xu, Biao Liu
Comments: Challenge report paper from AIM2025 Workshop at ICCVW 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This report presents an overview of the AIM 2025 RipSeg Challenge, a competition designed to advance techniques for automatic rip current segmentation in still images. Rip currents are dangerous, fast-moving flows that pose a major risk to beach safety worldwide, making accurate visual detection an important and underexplored research task. The challenge builds on RipVIS, the largest available rip current dataset, and focuses on single-class instance segmentation, where precise delineation is critical to fully capture the extent of rip currents. The dataset spans diverse locations, rip current types, and camera orientations, providing a realistic and challenging benchmark.
In total, $75$ participants registered for this first edition, resulting in $5$ valid test submissions. Teams were evaluated on a composite score combining $F_1$, $F_2$, $AP_{50}$, and $AP_{[50:95]}$, ensuring robust and application-relevant rankings. The top-performing methods leveraged deep learning architectures, domain adaptation techniques, pretrained models, and domain generalization strategies to improve performance under diverse conditions.
This report outlines the dataset details, competition framework, evaluation metrics, and final results, providing insights into the current state of rip current segmentation. We conclude with a discussion of key challenges, lessons learned from the submissions, and future directions for expanding RipSeg.
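
Two of the metrics in the composite score, $F_1$ and $F_2$, are instances of the general $F_\beta$ measure, sketched below from true-positive, false-positive, and false-negative counts. The function name `f_beta` and the example counts are illustrative assumptions; how the challenge weights the four metrics is not stated in the abstract, so no composite is computed here.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily than precision (F2 uses beta = 2).
    Returns 0.0 when there are no true positives.
    """
    if min(tp, fp, fn) < 0:
        raise ValueError("counts must be non-negative")
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: every rip current is found, but half the detections are false alarms.
f1 = f_beta(tp=10, fp=10, fn=0, beta=1.0)
f2 = f_beta(tp=10, fp=10, fn=0, beta=2.0)
```

With these counts `f2` exceeds `f1`, because $F_2$ penalizes missed rip currents more than false alarms, a reasonable emphasis for a safety-oriented task.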

[600] arXiv:2508.15327 (replaced) [pdf, html, other]
Title: Search-Based Credit Assignment for Offline Preference-Based Reinforcement Learning
Xiancheng Gao, Yufeng Shi, Wengang Zhou, Houqiang Li
Comments: 7 pages, 6 figures, under review
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Offline reinforcement learning refers to the process of learning policies from fixed datasets, without requiring additional environment interaction. However, it often relies on well-defined reward functions, which are difficult and expensive to design. Human feedback is an appealing alternative, but its two common forms, expert demonstrations and preferences, have complementary limitations. Demonstrations provide stepwise supervision, but they are costly to collect and often reflect limited expert behavior modes. In contrast, preferences are easier to collect, but it is unclear which parts of a behavior contribute most to a trajectory segment, leaving credit assignment unresolved. In this paper, we introduce a Search-Based Preference Weighting (SPW) scheme to unify these two feedback sources. For each transition in a preference labeled trajectory, SPW searches for the most similar state-action pairs from expert demonstrations and directly derives stepwise importance weights based on their similarity scores. These weights are then used to guide standard preference learning, enabling more accurate credit assignment that traditional approaches struggle to achieve. We demonstrate that SPW enables effective joint learning from preferences and demonstrations, outperforming prior methods that leverage both feedback types on challenging robot manipulation tasks.
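
The search step can be sketched as a nearest-neighbor lookup: each transition in a preference-labeled segment is compared against expert state-action pairs, and its similarity to the closest one becomes its stepwise weight. This is a minimal sketch under stated assumptions, not the authors' implementation: the Euclidean distance, the exponential kernel, the `temperature` parameter, the normalization, and the name `stepwise_weights` are all hypothetical choices.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def stepwise_weights(
    segment: Sequence[Vector],
    demonstrations: Sequence[Vector],
    temperature: float = 1.0,
) -> List[float]:
    """Weight each transition by its similarity to the nearest expert pair.

    Each vector is assumed to be a concatenated (state, action) feature.
    Weights are normalized to sum to 1 over the segment.
    """
    if not segment or not demonstrations:
        raise ValueError("segment and demonstrations must be non-empty")
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    similarities = []
    for transition in segment:
        nearest = min(euclidean(transition, d) for d in demonstrations)
        # Assumed kernel: similarity decays exponentially with distance.
        similarities.append(math.exp(-nearest / temperature))
    total = sum(similarities)
    return [s / total for s in similarities]


weights = stepwise_weights([[0.0, 0.0], [3.0, 4.0]], [[0.0, 0.0]])
```

Transitions that resemble expert behavior receive most of the weight, so a preference label is attributed mainly to those steps when the weights are applied to the per-step terms of a preference loss.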

[601] arXiv:2508.17985 (replaced) [pdf, html, other]
Title: Integration of Computer Vision with Adaptive Control for Autonomous Driving Using ADORE
Abu Shad Ahammed, Md Shahi Amran Hossain, Sayeri Mukherjee, Roman Obermaisser, Md. Ziaur Rahman
Subjects: Robotics (cs.RO)

Ensuring safety in autonomous driving requires a seamless integration of perception and decision making under uncertain conditions. Although computer vision (CV) models such as YOLO achieve high accuracy in detecting traffic signs and obstacles, their performance degrades in drift scenarios caused by weather variations or unseen objects. This work presents a simulated autonomous driving system that combines a context-aware CV model with adaptive control using the ADORE framework. The CARLA simulator was integrated with ADORE via the ROS bridge, allowing real-time communication between perception, decision, and control modules. A simulated test case was designed in both clear and drift weather conditions to demonstrate the robust detection performance of the perception model while ADORE successfully adapted vehicle behavior to speed limits and obstacles with low response latency. The findings highlight the potential of coupling deep learning-based perception with rule-based adaptive decision making to improve safety-critical automotive systems.

[602] arXiv:2508.18222 (replaced) [pdf, html, other]
Title: Exploratory Notes on Symbolic Constraints in Polyhedral Enclosure and Tetrahedral Decomposition in Genus-0 Polyhedra
Moustapha Itani
Subjects: Computational Geometry (cs.CG); Combinatorics (math.CO)

I present a coordinate-free, symbolic framework for deciding whether a given set of polygonal faces can form a closed, genus-zero polyhedral surface and for predicting how such a surface could be decomposed into internal tetrahedra. The method uses only discrete incidence variables, such as the number of internal tetrahedra $T$, internal gluing triangles $N_i$, and internal triangulation segments $S_i$, and applies combinatorial feasibility checks before any geometric embedding is attempted. For polyhedra in normal form, I record exact incidence identities linking $V,E,F$ to a flatness parameter $S:=\sum_f(\deg f-3)$, and I identify parity-sensitive effects in $E$, $F$, and $S$. The external identities and parity-sensitive bounds hold universally for genus-0 polyhedral graphs. For internal quantities, I prove exact relations $N_i=2T-V+2$ and $T-N_i+S_i=1$ (with $S_i$ taken to be the number of interior edges) and obtain restricted linear ranges within a shell-aligned ladder subclass (SALT), where at most one interior edge is introduced per layer. Consequently, I propose a symbolic workflow that yields rapid pre-checks for structural impossibility, reducing the need for costly geometric validation in computational geometry, graphics, and automated modeling.
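
The two internal relations stated above, $N_i=2T-V+2$ and $T-N_i+S_i=1$, lend themselves to a quick combinatorial pre-check. The sketch below tests whether a proposed triple $(T, V, S_i)$ is consistent with both; the function names are hypothetical, and the check assumes every vertex lies on the boundary surface, as in the examples listed in the comments.

```python
def internal_triangles(num_tetrahedra: int, num_vertices: int) -> int:
    """Number of internal gluing triangles from N_i = 2T - V + 2."""
    return 2 * num_tetrahedra - num_vertices + 2


def is_consistent(num_tetrahedra: int, num_vertices: int, interior_edges: int) -> bool:
    """Check T - N_i + S_i = 1 with N_i derived from T and V."""
    n_i = internal_triangles(num_tetrahedra, num_vertices)
    return n_i >= 0 and num_tetrahedra - n_i + interior_edges == 1


# Single tetrahedron: T=1, V=4, no interior edge.
single = is_consistent(1, 4, 0)
# Triangular bipyramid as two tetrahedra glued on one face: T=2, V=5.
bipyramid_two = is_consistent(2, 5, 0)
# Same bipyramid as three tetrahedra around one interior edge: T=3, V=5.
bipyramid_three = is_consistent(3, 5, 1)
# Octahedron as four tetrahedra around one interior axis: T=4, V=6.
octahedron = is_consistent(4, 6, 1)
# Three tetrahedra on five vertices with no interior edge is rejected.
impossible = is_consistent(3, 5, 0)
```

The first four configurations pass and the last fails, showing how an infeasible decomposition can be ruled out from counts alone before any coordinates are considered.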

[603] arXiv:2508.18298 (replaced) [pdf, html, other]
Title: Murakkab: Resource-Efficient Agentic Workflow Orchestration in Cloud Platforms
Gohar Irfan Chaudhry, Esha Choukse, Haoran Qiu, Íñigo Goiri, Rodrigo Fonseca, Adam Belay, Ricardo Bianchini
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

Agentic workflows commonly coordinate multiple models and tools with complex control logic. They are quickly becoming the dominant paradigm for AI applications. However, serving them remains inefficient with today's frameworks. The key problem is that they expose workflows as opaque sequences of model and tool calls that tightly couple agent logic with model and hardware choices. Often, these workflow components are fragmented across different entities, preventing systems from reasoning about trade-offs across accuracy, latency, energy, and cost. This leads to resource waste and degraded service-level objectives (SLOs).
We present Murakkab, a resource-efficient serving system for agentic workflows. Murakkab introduces a declarative abstraction that decouples workflow specification from execution configuration. A profile-guided optimizer and adaptive runtime jointly manage the full stack: orchestrating workflow components, mapping them to models and hardware, and dynamically reconfiguring execution to satisfy user-defined SLOs. By exposing the internal structure of agentic workflows, Murakkab enables cross-layer optimization that existing frameworks and cloud schedulers cannot achieve.
Our evaluation on diverse workflows shows that Murakkab reduces GPU usage by up to 2.8$\times$, energy consumption by 3.7$\times$, and cost by 4.3$\times$ while maintaining SLOs.

[604] arXiv:2508.18453 (replaced) [pdf, other]
Title: Privacy-Preserving Federated Learning Framework for Risk-Based Adaptive Authentication
Yaser Baseri, Abdelhakim Senhaji Hafid, Dimitrios Makrakis, Hamidreza Fereidouni
Subjects: Cryptography and Security (cs.CR)

Balancing robust security with strong privacy guarantees is critical for Risk-Based Adaptive Authentication (RBA), particularly in decentralized settings. Federated Learning (FL) offers a promising solution by enabling collaborative risk assessment without centralizing user data. However, existing FL approaches struggle with Non-Independent and Identically Distributed (Non-IID) user features, resulting in biased, unstable, and poorly generalized global models. This paper introduces FL-RBA2, a novel Federated Learning framework for Risk-Based Adaptive Authentication that addresses Non-IID challenges through a mathematically grounded similarity transformation. By converting heterogeneous user features (including behavioral, biometric, contextual, interaction-based, and knowledge-based modalities) into IID similarity vectors, FL-RBA2 supports unbiased aggregation and personalized risk modeling across distributed clients. The framework mitigates cold-start limitations via clustering-based risk labeling, incorporates Differential Privacy (DP) to safeguard sensitive information, and employs Message Authentication Codes (MACs) to ensure model integrity and authenticity. Federated updates are securely aggregated into a global model, achieving strong balance between user privacy, scalability, and adaptive authentication robustness. Rigorous game-based security proofs in the Random Oracle Model formally establish privacy, correctness, and adaptive security guarantees. Extensive experiments on keystroke, mouse, and contextual datasets validate FL-RBA2's effectiveness in high-risk user detection and its resilience to model inversion and inference attacks, even under strong DP constraints.
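
The use of Message Authentication Codes for model integrity can be illustrated with Python's standard `hmac` module. This is a minimal sketch under stated assumptions, not the FL-RBA2 protocol: the JSON serialization, the shared key, and the function names `tag_update` and `verify_update` are hypothetical choices made for this example.

```python
import hashlib
import hmac
import json
from typing import Sequence


def tag_update(key: bytes, weights: Sequence[float]) -> bytes:
    """Compute an HMAC-SHA256 tag over a serialized model update."""
    payload = json.dumps(list(weights)).encode("utf-8")  # serialization format is an assumption
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify_update(key: bytes, weights: Sequence[float], tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(tag_update(key, weights), tag)


# Hypothetical key and update, for illustration only.
key = b"hypothetical-shared-key"
update = [0.12, -0.5, 3.0]
tag = tag_update(key, update)

if __name__ == "__main__":
    print("untampered update accepted:", verify_update(key, update, tag))
    print("tampered update accepted:", verify_update(key, [0.12, -0.5, 3.1], tag))
```

A receiver holding the same key rejects any update whose recomputed tag differs, so a modified update is detected before aggregation.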

[605] arXiv:2508.18721 (replaced) [pdf, html, other]
Title: LLM as an Execution Estimator: Recovering Missing Dependency for Practical Time-travelling Debugging
Yunrui Pei, Hongshu Wang, Wenjie Zhang, Yun Lin, Weiyu Kong, Jin song Dong
Subjects: Software Engineering (cs.SE)

Determining the dynamic data dependency of a step that reads a variable $v$ is challenging. It typically requires either exhaustive instrumentation, which becomes prohibitively expensive when $v$ is defined within library calls, or repeated executions, which are impractical for non-deterministic programs. In this work, we propose RecovSlicing for computing dynamic data dependency in a single run, with only partial instrumentation. We explore the intuition that an LLM can potentially infer program dynamics based on a partially recorded trace and relevant code as its context. Given (1) a partially recorded trace of a program $P$ and (2) the slicing criteria consisting of a query step $s$ and a query variable $v$ read by $s$, RecovSlicing computes the runtime definition of $v$ on the trace by estimating the miss-recorded execution of $P$. In this work, we allow the user to specify an implicit query variable. Technically, building upon a non-deterministic LLM, we address the challenges of (1) precisely recovering runtime variable values and structures from the recorded execution and (2) aligning the memory addresses of recovered variables and recorded variables for definition analysis. We evaluate RecovSlicing on 8300 data dependencies across three slicing benchmarks, comparing it with Slicer4J, ND-Slicer, LLM Slicer, and re-execution Slicer. RecovSlicing achieves significantly higher accuracy (80.3%, 91.1%, 98.3%) and recall (up to 98.3%) than the best baseline (accuracy: 39.0%, 82.0%, 59.9%; recall: 53.4%, 79.1%, 87.1%). Integrated into a dual-slicing regression bug localizer, it identifies 16% more regressions.

[606] arXiv:2508.19200 (replaced) [pdf, html, other]
Title: The Ramon Llull's Thinking Machine for Automated Ideation
Xinran Zhao, Boyuan Zheng, Chenglei Si, Haofei Yu, Ken Liu, Runlong Zhou, Ruochen Li, Tong Chen, Xiang Li, Yiming Zhang, Tongshuang Wu
Comments: 21 pages, 3 figures
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

This paper revisits Ramon Llull's Ars combinatoria - a medieval framework for generating knowledge through symbolic recombination - as a conceptual foundation for building a modern Llull's thinking machine for research ideation. Our approach defines three compositional axes: Theme (e.g., efficiency, adaptivity), Domain (e.g., question answering, machine translation), and Method (e.g., adversarial training, linear attention). These elements represent high-level abstractions common in scientific work - motivations, problem settings, and technical approaches - and serve as building blocks for LLM-driven exploration. We mine elements from human experts or conference papers and show that prompting LLMs with curated combinations produces research ideas that are diverse, relevant, and grounded in current literature. This modern thinking machine offers a lightweight, interpretable tool for augmenting scientific creativity and suggests a path toward collaborative ideation between humans and AI.
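
The three compositional axes can be pictured as a Cartesian product. The sketch below uses only the example elements named in the abstract; the prompt template and the function name `build_prompts` are hypothetical and are not the authors' prompts.

```python
from itertools import product

# Example elements taken from the abstract.
THEMES = ["efficiency", "adaptivity"]
DOMAINS = ["question answering", "machine translation"]
METHODS = ["adversarial training", "linear attention"]


def build_prompts(themes, domains, methods):
    """Enumerate every (theme, domain, method) combination as a prompt string."""
    return [
        f"Propose a research idea that improves {theme} in {domain} using {method}."
        for theme, domain, method in product(themes, domains, methods)
    ]


prompts = build_prompts(THEMES, DOMAINS, METHODS)

if __name__ == "__main__":
    for prompt in prompts:
        print(prompt)
```

With two elements per axis this yields eight combinations; the abstract's "curated combinations" suggests that only a selected subset of such a product would be sent to the LLM.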

[607] arXiv:2508.19352 (replaced) [pdf, html, other]
Title: Memorization in Graph Neural Networks
Adarsh Jamadandi, Jing Xu, Adam Dziedzic, Franziska Boenisch
Comments: Version3, updated affiliation
Subjects: Machine Learning (cs.LG)

Deep neural networks (DNNs) have been shown to memorize their training data, yet similar analyses for graph neural networks (GNNs) remain largely under-explored. We introduce NCMemo (Node Classification Memorization), the first framework to quantify label memorization in semi-supervised node classification. We first establish an inverse relationship between memorization and graph homophily, i.e., the property that connected nodes share similar labels/features. We find that lower homophily significantly increases memorization, indicating that GNNs rely on memorization to learn less homophilic graphs. Secondly, we analyze GNN training dynamics. We find that the increased memorization in low homophily graphs is tightly coupled to the GNNs' implicit bias on using graph structure during learning. In low homophily regimes, this structure is less informative, hence inducing memorization of the node labels to minimize training loss. Finally, we show that nodes with higher label inconsistency in their feature-space neighborhood are significantly more prone to memorization. Building on our insights into the link between graph homophily and memorization, we investigate graph rewiring as a means to mitigate memorization. Our results demonstrate that this approach effectively reduces memorization without compromising model performance. Moreover, we show that it lowers the privacy risk for previously memorized data points in practice. Thus, our work not only advances understanding of GNN learning but also supports more privacy-preserving GNN deployment.
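
Homophily, described above as connected nodes sharing similar labels, is often quantified with the edge homophily ratio: the fraction of edges whose endpoints carry the same label. The abstract does not say which homophily measure the paper uses, so the choice of this ratio, the function name, and the toy graph below are assumptions for illustration.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share the same label."""
    if not edges:
        raise ValueError("edge homophily is undefined for a graph with no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Toy 4-cycle with two classes; values are illustrative only.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
ratio = edge_homophily(edges, labels)

if __name__ == "__main__":
    print("edge homophily ratio:", ratio)
```

A ratio near 1 indicates a strongly homophilic graph; lower values correspond to the low-homophily regime in which the abstract reports increased memorization.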

[608] arXiv:2508.19444 (replaced) [pdf, html, other]
Title: Infrastructure-enabled risk assessment of hazardous road conditions on rural roads during inclement weather
Suhala Rabab Saba, Sagar Dasgupta, Mizanur Rahman, Nathan Huynh, Li Zhao, Mehmet C. Vuran, Qiang Liu, Eren Erman Ozguven
Comments: 18 pages, 5 figures, 5 tables
Subjects: Computational Engineering, Finance, and Science (cs.CE)

Rural roadways often expose Commercial Motor Vehicle (CMV) drivers to hazardous conditions, such as heavy fog, rain, snow, black ice, and flash floods, many of which remain unreported in real time. This lack of timely information, coupled with limited infrastructure in rural areas, significantly increases the risk of crashes. Although various sensing technologies exist to monitor individual hazards like low visibility or surface friction, they rarely assess the combined driving risk posed by multiple simultaneous hazards, nor do they provide actionable recommendations such as safe advisory speeds. To address this critical gap, in this study, we present a roadway hazard risk assessment framework that provides an approach to quantify the probability and severity of crash occurrences due to specific roadway hazards. To evaluate this framework, we present a case study in which we construct a synthetic "year-long" dataset that encompasses every possible pairing of road surface and visibility conditions. Our analysis confirms that the combined Probability-Severity scoring yields a coherent, stepwise risk profile across all hazard scenarios. These results validate the practicality of our risk assessment approach and provide a foundation for deploying graduated safety measures in real-world roadway operations.

[609] arXiv:2508.19493 (replaced) [pdf, html, other]
Title: Mind the Third Eye! Benchmarking Privacy Awareness in MLLM-powered Smartphone Agents
Zhixin Lin, Jungang Li, Shidong Pan, Yibo Shi, Yue Yao, Dongliang Xu
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)

Smartphones bring significant convenience to users but also enable devices to extensively record various types of personal information. Existing smartphone agents powered by Multimodal Large Language Models (MLLMs) have achieved remarkable performance in automating different tasks. However, this comes at a cost: these agents are granted substantial access to users' sensitive personal information during operation. To gain a thorough understanding of the privacy awareness of these agents, we present what is, to the best of our knowledge, the first large-scale benchmark, encompassing 7,138 scenarios. In addition, for the privacy context in each scenario, we annotate its type (e.g., Account Credentials), sensitivity level, and location. We then carefully benchmark seven available mainstream smartphone agents. Our results demonstrate that almost all benchmarked agents show unsatisfactory privacy awareness (RA), with performance remaining below 60% even with explicit hints. Overall, closed-source agents show better privacy ability than open-source ones, and Gemini 2.0-flash performs best, achieving an RA of 67%. We also find that the agents' privacy detection capability is highly related to scenario sensitivity level, i.e., scenarios with a higher sensitivity level are typically more identifiable. We hope the findings encourage the research community to rethink the unbalanced utility-privacy tradeoff of smartphone agents. Our code and benchmark are available at this https URL.

[610] arXiv:2508.19828 (replaced) [pdf, html, other]
Title: Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning
Sikuan Yan, Xiufeng Yang, Zuchao Huang, Ercong Nie, Zifeng Ding, Zonggen Li, Xiaowen Ma, Hinrich Schütze, Volker Tresp, Yunpu Ma
Comments: work in progress
Subjects: Computation and Language (cs.CL); Multiagent Systems (cs.MA)

Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless, constrained by limited context windows that hinder long-horizon reasoning. Recent efforts to address this limitation often augment LLMs with an external memory bank, yet most existing pipelines are static and heuristic-driven, lacking any learned mechanism for deciding what to store, update, or retrieve. We present Memory-R1, a reinforcement learning (RL) framework that equips LLMs with the ability to actively manage and utilize external memory through two specialized agents: a Memory Manager that learns to perform structured memory operations, including adding, updating, deleting, or taking no operation on memory entries; and an Answer Agent that selects the most relevant entries and reasons over them to produce an answer. Both agents are fine-tuned with outcome-driven RL (PPO and GRPO), enabling adaptive memory management and utilization with minimal supervision. With as few as 152 question-answer pairs and a corresponding temporal memory bank for training, Memory-R1 outperforms the strongest existing baseline and demonstrates strong generalization across diverse question types and LLM backbones. Beyond presenting an effective approach, this work provides insights into how RL can unlock more agentic, memory-aware behavior in LLMs, pointing toward richer, more persistent reasoning systems.
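
The four structured memory operations named in the abstract (add, update, delete, no operation) can be pictured as a small interface over a memory bank. The sketch below is hypothetical: the `Op` enum, the `MemoryBank` class, and the example entries are assumptions for illustration, and the learned policy that decides which operation to issue is not modeled.

```python
from enum import Enum


class Op(Enum):
    ADD = "add"
    UPDATE = "update"
    DELETE = "delete"
    NOOP = "noop"


class MemoryBank:
    """Toy external memory: integer ids mapped to text entries (hypothetical design)."""

    def __init__(self):
        self.entries = {}
        self._next_id = 0

    def apply(self, op, entry_id=None, text=None):
        """Apply one operation; return the affected entry id, or None for NOOP."""
        if op is Op.ADD:
            if text is None:
                raise ValueError("ADD requires text")
            new_id = self._next_id
            self.entries[new_id] = text
            self._next_id += 1
            return new_id
        if op is Op.UPDATE:
            if entry_id not in self.entries:
                raise KeyError(entry_id)
            if text is None:
                raise ValueError("UPDATE requires text")
            self.entries[entry_id] = text
            return entry_id
        if op is Op.DELETE:
            del self.entries[entry_id]  # raises KeyError if the entry does not exist
            return entry_id
        return None  # NOOP leaves the bank unchanged


# Illustrative sequence of operations; in the described system a trained
# Memory Manager would choose them, which this sketch does not attempt.
bank = MemoryBank()
first = bank.apply(Op.ADD, text="User adopted a dog.")
bank.apply(Op.UPDATE, entry_id=first, text="User adopted two dogs.")
second = bank.apply(Op.ADD, text="User likes hiking.")
bank.apply(Op.DELETE, entry_id=second)
bank.apply(Op.NOOP)

if __name__ == "__main__":
    print(bank.entries)
```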

[611] arXiv:2508.19855 (replaced) [pdf, html, other]
Title: Youtu-GraphRAG: Vertically Unified Agents for Graph Retrieval-Augmented Complex Reasoning
Junnan Dong, Siyu An, Yifei Yu, Qian-Wen Zhang, Linhao Luo, Xiao Huang, Yunsheng Wu, Di Yin, Xing Sun
Comments: 19 pages, 7 figures, 6 tables
Subjects: Information Retrieval (cs.IR)

Graph retrieval-augmented generation (GraphRAG) has effectively enhanced large language models in complex reasoning by organizing fragmented knowledge into explicitly structured graphs. Prior efforts have been made to improve either graph construction or graph retrieval in isolation, yielding suboptimal performance, especially when domain shifts occur. In this paper, we propose a vertically unified agentic paradigm, Youtu-GraphRAG, to jointly connect the entire framework as an intricate integration. Specifically, (i) a seed graph schema is introduced to bound the automatic extraction agent with targeted entity types, relations and attribute types, and is continuously expanded for scalability over unseen domains; (ii) To obtain higher-level knowledge upon the schema, we develop novel dually-perceived community detection, fusing structural topology with subgraph semantics for comprehensive knowledge organization. This naturally yields a hierarchical knowledge tree that supports both top-down filtering and bottom-up reasoning with community summaries; (iii) An agentic retriever is designed to interpret the same graph schema to transform complex queries into tractable and parallel sub-queries. It iteratively performs reflection for more advanced reasoning; (iv) To alleviate the knowledge leaking problem in pre-trained LLMs, we propose a tailored anonymous dataset and a novel 'Anonymity Reversion' task that deeply measures the real performance of the GraphRAG frameworks. Extensive experiments across six challenging benchmarks demonstrate the robustness of Youtu-GraphRAG, remarkably moving the Pareto frontier with up to 90.71% saving of token costs and 16.62% higher accuracy over state-of-the-art baselines. The results indicate the framework's adaptability, allowing seamless domain transfer with minimal intervention on schema.

[612] arXiv:2508.20282 (replaced) [pdf, html, other]
Title: Network-Level Prompt and Trait Leakage in Local Research Agents
Hyejun Jeong, Mohammadreza Teymoorianfard, Abhinav Kumar, Amir Houmansadr, Eugene Bagdasarian
Comments: Code available at this https URL
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

We show that Web and Research Agents (WRAs) -- language model-based systems that investigate complex topics on the Internet -- are vulnerable to inference attacks by passive network adversaries such as ISPs. These agents could be deployed locally by organizations and individuals for privacy, legal, or financial purposes. Unlike sporadic web browsing by humans, WRAs visit $70{-}140$ domains with distinguishable timing correlations, enabling unique fingerprinting attacks.
Specifically, we demonstrate a novel prompt and user trait leakage attack against WRAs that only leverages their network-level metadata (i.e., visited IP addresses and their timings). We start by building a new dataset of WRA traces based on user search queries and queries generated by synthetic personas. We define a behavioral metric (called OBELS) to comprehensively assess similarity between original and inferred prompts, showing that our attack recovers over 73% of the functional and domain knowledge of user prompts. Extending to a multi-session setting, we recover up to 19 of 32 latent traits with high accuracy. Our attack remains effective under partial observability and noisy conditions. Finally, we discuss mitigation strategies that constrain domain diversity or obfuscate traces, showing negligible utility impact while reducing attack effectiveness by an average of 29%.

[613] arXiv:2508.20553 (replaced) [pdf, other]
Title: DMPC-Swarm: Distributed Model Predictive Control on Nano UAV Swarms
Alexander Gräfe, Joram Eickhoff, Marco Zimmerling, Sebastian Trimpe
Subjects: Systems and Control (eess.SY)

Swarms of unmanned aerial vehicles (UAVs) are increasingly becoming vital to our society, undertaking tasks such as search and rescue, surveillance and delivery. A special variant of Distributed Model Predictive Control (DMPC) has emerged as a promising approach for the safe management of these swarms by combining the scalability of distributed computation with dynamic swarm motion control. In this DMPC method, multiple agents solve local optimization problems with coupled anti-collision constraints, periodically exchanging their solutions. Despite its potential, existing methodologies using this DMPC variant have yet to be deployed on distributed hardware that fully utilizes true distributed computation and wireless communication. This is primarily due to the lack of a communication system tailored to meet the unique requirements of mobile swarms and an architecture that supports distributed computation while adhering to the payload constraints of UAVs. We present DMPC-SWARM, a new swarm control methodology that integrates an efficient, stateless low-power wireless communication protocol with a novel DMPC algorithm that provably avoids UAV collisions even under message loss. By utilizing event-triggered and distributed off-board computing, DMPC-SWARM supports nano UAVs, allowing them to benefit from additional computational resources while retaining scalability and fault tolerance. In a detailed theoretical analysis, we prove that DMPC-SWARM guarantees collision avoidance under realistic conditions, including communication delays and message loss. Finally, we present DMPC-SWARM's implementation on a swarm of up to 16 nano-quadcopters, demonstrating the first realization of these DMPC variants with computation distributed on multiple physical devices interconnected by a real wireless mesh network. A video showcasing DMPC-SWARM is available at this http URL.

[614] arXiv:2508.20757 (replaced) [pdf, html, other]
Title: GUARD: Glocal Uncertainty-Aware Robust Decoding for Effective and Efficient Open-Ended Text Generation
Yuanhao Ding, Esteban Garces Arias, Meimingwei Li, Julian Rodemann, Matthias Aßenmacher, Danlu Chen, Gaojuan Fan, Christian Heumann, Chongsheng Zhang
Comments: Accepted at Findings of the Association for Computational Linguistics: EMNLP 2025
Subjects: Computation and Language (cs.CL)

Open-ended text generation faces a critical challenge: balancing coherence with diversity in LLM outputs. While contrastive search-based decoding strategies have emerged to address this trade-off, their practical utility is often limited by hyperparameter dependence and high computational costs. We introduce GUARD, a self-adaptive decoding method that effectively balances these competing objectives through a novel "Glocal" uncertainty-driven framework. GUARD combines global entropy estimates with local entropy deviations to integrate both long-term and short-term uncertainty signals. We demonstrate that our proposed global entropy formulation effectively mitigates abrupt variations in uncertainty, such as sudden overconfidence or high entropy spikes, and provides theoretical guarantees of unbiasedness and consistency. To reduce computational overhead, we incorporate a simple yet effective token-count-based penalty into GUARD. Experimental results demonstrate that GUARD achieves a good balance between text diversity and coherence, while exhibiting substantial improvements in generation speed. In a more nuanced comparison study across different dimensions of text quality, both human and LLM evaluators validated its remarkable performance. Our code is available at this https URL.

[615] arXiv:2508.21236 (replaced) [pdf, html, other]
Title: Population-Scale Network Embeddings Expose Educational Divides in Network Structure Related to Right-Wing Populist Voting
Malte Lüken, Javier Garcia-Bernardo, Sreeparna Deb, Flavio Hafner, Megha Khosla
Comments: 30 pages, 6 figures, Supplementary Materials available at this https URL; small textual changes, updated Figure 6
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG); Applications (stat.AP)

Administrative registry data can be used to construct population-scale networks whose ties reflect shared social contexts between persons. With machine learning, such networks can be encoded into numerical representations -- embeddings -- that automatically capture individuals' position within the network. We created embeddings for all persons in the Dutch population from a population-scale network that represents five shared contexts: neighborhood, work, family, household, and school. To assess the informativeness of these embeddings, we used them to predict right-wing populist voting. Embeddings alone predicted right-wing populist voting above chance-level but performed worse than individual characteristics. Combining the best subset of embeddings with individual characteristics only slightly improved predictions. After transforming the embeddings to make their dimensions more sparse and orthogonal, we found that one embedding dimension was strongly associated with the outcome. Mapping this dimension back to the population network revealed differences in network structure related to right-wing populist voting between different school ties and achieved education levels. Our study contributes methodologically by demonstrating how population-scale network embeddings can be made interpretable, and substantively by linking structural network differences in education to right-wing populist voting.

[616] arXiv:2508.21302 (replaced) [pdf, html, other]
Title: Locus: Agentic Predicate Synthesis for Directed Fuzzing
Jie Zhu, Chihao Shen, Ziyang Li, Jiahao Yu, Yizheng Chen, Kexin Pei
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

Directed fuzzing aims to find program inputs that lead to specified target program states. It has broad applications, such as debugging system crashes, confirming reported bugs, and generating exploits for potential vulnerabilities. This task is inherently challenging because target states are often deeply nested in the program, while the search space manifested by numerous possible program inputs is prohibitively large. Existing approaches rely on branch distances or manually-specified constraints to guide the search; however, the branches alone are often insufficient to precisely characterize progress toward reaching the target states, while the manually specified constraints are often tailored for specific bug types and thus difficult to generalize to diverse target states and programs.
We present Locus, a novel framework to improve the efficiency of directed fuzzing. Our key insight is to synthesize predicates to capture fuzzing progress as semantically meaningful intermediate states, serving as milestones towards reaching the target states. When used to instrument the program under fuzzing, they can reject executions unlikely to reach the target states, while providing additional coverage guidance. To automate this task and generalize to diverse programs, Locus features an agentic framework with program analysis tools to synthesize and iteratively refine the candidate predicates, while ensuring the predicates strictly relax the target states to prevent false rejections via symbolic execution. Our evaluation shows that Locus substantially improves the efficiency of eight state-of-the-art fuzzers in discovering real-world vulnerabilities, achieving an average speedup of 41.6x. So far, Locus has found eight previously unpatched bugs, with one already acknowledged with a draft patch.

[617] arXiv:2508.21314 (replaced) [pdf, html, other]
Title: Convergence of regularized agent-state-based Q-learning in POMDPs
Amit Sinha, Matthieu Geist, Aditya Mahajan
Comments: Accepted in CDC 2025
Subjects: Machine Learning (cs.LG)

In this paper, we present a framework to understand the convergence of commonly used Q-learning reinforcement learning algorithms in practice. Two salient features of such algorithms are: (i) the Q-table is recursively updated using an agent state (such as the state of a recurrent neural network) which is not a belief state or an information state and (ii) policy regularization is often used to encourage exploration and stabilize the learning algorithm. We investigate the simplest form of such Q-learning algorithms which we call regularized agent-state-based Q-learning (RASQL) and show that it converges under mild technical conditions to the fixed point of an appropriately defined regularized MDP, which depends on the stationary distribution induced by the behavioral policy. We also show that a similar analysis continues to work for a variant of RASQL that learns periodic policies. We present numerical examples to illustrate that the empirical convergence behavior matches the proposed theoretical limit.

[618] arXiv:2508.21376 (replaced) [pdf, html, other]
Title: AHELM: A Holistic Evaluation of Audio-Language Models
Tony Lee, Haoqin Tu, Chi Heem Wong, Zijun Wang, Siwei Yang, Yifan Mai, Yuyin Zhou, Cihang Xie, Percy Liang
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Evaluations of audio-language models (ALMs) -- multimodal models that take interleaved audio and text as input and output text -- are hindered by the lack of standardized benchmarks; most benchmarks measure only one or two capabilities and omit evaluative aspects such as fairness or safety. Furthermore, comparison across models is difficult as separate evaluations test a limited number of models and use different prompting methods and inference parameters. To address these shortfalls, we introduce AHELM, a benchmark that aggregates various datasets -- including 2 new synthetic audio-text datasets called PARADE, which evaluates the ALMs on avoiding stereotypes, and CoRe-Bench, which measures reasoning over conversational audio through inferential multi-turn question answering -- to holistically measure the performance of ALMs across 10 aspects we have identified as important to the development and usage of ALMs: audio perception, knowledge, reasoning, emotion detection, bias, fairness, multilinguality, robustness, toxicity, and safety. We also standardize the prompts, inference parameters, and evaluation metrics to ensure equitable comparisons across models. We test 14 open-weight and closed-API ALMs from 3 developers and 3 additional simple baseline systems each consisting of an automatic speech recognizer and a language model. Our results show that while Gemini 2.5 Pro ranks top in 5 out of 10 aspects, it exhibits group unfairness ($p=0.01$) on ASR tasks whereas most of the other models do not. We also find that the baseline systems perform reasonably well on AHELM, with one ranking 6th overall despite having only speech-to-text capabilities. For transparency, all raw prompts, model generations, and outputs are available on our website at this https URL. AHELM is intended to be a living benchmark and new datasets and models will be added over time.

[619] arXiv:2509.00045 (replaced) [pdf, html, other]
Title: Performance is not All You Need: Sustainability Considerations for Algorithms
Xiang Li, Chong Zhang, Hongpeng Wang, Shreyank Narayana Gowda, Yushi Li, Xiaobo Jin
Comments: 18 pages, 6 figures. Accepted Chinese Conference on Pattern Recognition and Computer Vision 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Performance (cs.PF)

This work focuses on the high carbon emissions generated by deep learning model training, specifically addressing the core challenge of balancing algorithm performance and energy consumption. It proposes an innovative two-dimensional sustainability evaluation system. Different from the traditional single performance-oriented evaluation paradigm, this study pioneered two quantitative indicators that integrate energy efficiency ratio and accuracy: the sustainable harmonic mean (FMS) integrates accumulated energy consumption and performance parameters through the harmonic mean to reveal the algorithm performance under unit energy consumption; the area under the sustainability curve (ASC) constructs a performance-power consumption curve to characterize the energy efficiency characteristics of the algorithm throughout the cycle. To verify the universality of the indicator system, the study constructed benchmarks in various multimodal tasks, including image classification, segmentation, pose estimation, and batch and online learning. Experiments demonstrate that the system can provide a quantitative basis for evaluating cross-task algorithms and promote the transition of green AI research from theory to practice. Our sustainability evaluation framework code can be found here, providing methodological support for the industry to establish algorithm energy efficiency standards.
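
The abstract names two indicators but does not give their formulas, so the following is only a rough sketch of the general idea and not the paper's definitions of FMS or ASC. Every concrete choice here is an assumption: the energy score `1 - energy / budget`, the plain two-term harmonic mean, the trapezoidal rule, and all function names and numbers.

```python
def harmonic_mean(a, b):
    """Harmonic mean of two non-negative scores; zero if either score is zero."""
    if a <= 0 or b <= 0:
        return 0.0
    return 2 * a * b / (a + b)


def energy_score(energy_used, energy_budget):
    """Map accumulated energy to [0, 1]; lower consumption scores higher (assumed normalization)."""
    return max(0.0, 1.0 - energy_used / energy_budget)


def area_under_curve(energy, performance):
    """Trapezoidal area under a performance-versus-accumulated-energy curve."""
    if len(energy) != len(performance):
        raise ValueError("energy and performance must have the same length")
    area = 0.0
    for i in range(1, len(energy)):
        width = energy[i] - energy[i - 1]
        area += width * (performance[i] + performance[i - 1]) / 2
    return area


# Hypothetical training log: accumulated energy (kWh) and accuracy per checkpoint.
energy_log = [0.0, 1.0, 2.0]
accuracy_log = [0.0, 0.6, 0.8]

combined = harmonic_mean(accuracy_log[-1], energy_score(energy_log[-1], energy_budget=4.0))
curve_area = area_under_curve(energy_log, accuracy_log)

if __name__ == "__main__":
    print("harmonic-mean style score:", combined)
    print("area under performance-energy curve:", curve_area)
```

A harmonic mean penalizes a model that does well on only one of the two axes, which is one plausible reason to prefer it over an arithmetic mean when balancing accuracy against energy use.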

[620] arXiv:2509.00062 (replaced) [pdf, html, other]
Title: Scaffold Diffusion: Sparse Multi-Category Voxel Structure Generation with Discrete Diffusion
Justin Jung
Comments: 6 pages, LaTeX; typos corrected, figure added
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Generating realistic sparse multi-category 3D voxel structures is difficult due to the cubic memory scaling of voxel structures and the significant class imbalance caused by sparsity. We introduce Scaffold Diffusion, a generative model designed for sparse multi-category 3D voxel structures. By treating voxels as tokens, Scaffold Diffusion uses a discrete diffusion language model to generate 3D voxel structures. We show that discrete diffusion language models can be extended beyond inherently sequential domains such as text to generate spatially coherent 3D structures. We evaluate on Minecraft house structures from the 3D-Craft dataset and demonstrate that, unlike prior baselines and an auto-regressive formulation, Scaffold Diffusion produces realistic and coherent structures even when trained on data with over 98% sparsity. We provide an interactive viewer where readers can visualize generated samples and the generation process: this https URL

[621] arXiv:2509.00096 (replaced) [pdf, html, other]
Title: Pruning Weights but Not Truth: Safeguarding Truthfulness While Pruning LLMs
Yao Fu, Runchao Li, Xianxuan Long, Haotian Yu, Xiaotian Han, Yu Yin, Pan Li
Comments: Accepted to EMNLP2025 findings (poster)
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Neural network pruning has emerged as a promising approach for deploying LLMs in low-resource scenarios while preserving downstream task performance. However, for the first time, we reveal that such pruning disrupts LLMs' internal activation features crucial for lie detection, where probing classifiers (typically small logistic regression models) trained on these features assess the truthfulness of LLM-generated statements. This discovery raises a crucial open question: how can we prune LLMs without sacrificing these critical lie detection capabilities? Our investigation further reveals that naively adjusting layer-wise pruning sparsity based on importance inadvertently removes crucial weights, failing to improve lie detection performance despite its reliance on the most crucial LLM layer. To address this issue, we propose Truthful Pruning aligned by Layer-wise Outliers (TPLO), which places greater emphasis on layers with more activation outliers and stronger discriminative features simultaneously. This preserves LLMs' original performance while retaining critical features of inner states needed for robust lie detection. Moreover, we introduce a prompting rule to enrich the TruthfulQA benchmark for better calibrating LLM pruning. Empirical results show that our approach improves the hallucination detection for pruned LLMs (achieving 88% accuracy at 50% sparsity) and enhances their performance on TruthfulQA.

[622] arXiv:2509.00117 (replaced) [pdf, html, other]
Title: Embodied AI: Emerging Risks and Opportunities for Policy Action
Jared Perlo, Alexander Robey, Fazl Barez, Luciano Floridi, Jakob Mökander
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Robotics (cs.RO)

The field of embodied AI (EAI) is rapidly advancing. Unlike virtual AI, EAI systems can exist in, learn from, reason about, and act in the physical world. With recent advances in AI models and hardware, EAI systems are becoming increasingly capable across wider operational domains. While EAI systems can offer many benefits, they also pose significant risks, including physical harm from malicious use, mass surveillance, as well as economic and societal disruption. These risks require urgent attention from policymakers, as existing policies governing industrial robots and autonomous vehicles are insufficient to address the full range of concerns EAI systems present. To help address this issue, this paper makes three contributions. First, we provide a taxonomy of the physical, informational, economic, and social risks EAI systems pose. Second, we analyze policies in the US, EU, and UK to assess how existing frameworks address these risks and to identify critical gaps. We conclude by offering policy recommendations for the safe and beneficial deployment of EAI systems, such as mandatory testing and certification schemes, clarified liability frameworks, and strategies to manage EAI's potentially transformative economic and societal impacts.

[623] arXiv:2509.00367 (replaced) [pdf, html, other]
Title: A Multimodal and Multi-centric Head and Neck Cancer Dataset for Tumor Segmentation and Outcome Prediction
Numan Saeed, Salma Hassan, Shahad Hardan, Ahmed Aly, Darya Taratynova, Umair Nawaz, Ufaq Khan, Muhammad Ridzuan, Vincent Andrearczyk, Adrien Depeursinge, Mathieu Hatt, Thomas Eugene, Raphaël Metz, Mélanie Dore, Gregory Delpon, Vijay Ram Kumar Papineni, Kareem Wahid, Cem Dede, Alaa Mohamed Shawky Ali, Carlos Sjogreen, Mohamed Naser, Clifton D. Fuller, Valentin Oreiller, Mario Jreige, John O. Prior, Catherine Cheze Le Rest, Olena Tankyevych, Pierre Decazes, Su Ruan, Stephanie Tanadini-Lang, Martin Vallières, Hesham Elhalawani, Ronan Abgral, Romain Floch, Kevin Kerleguer, Ulrike Schick, Maelle Mauguen, Arman Rahmim, Mohammad Yaqub
Comments: 10 pages, 5 figures. Numan Saeed is the corresponding author. Numan Saeed, Salma Hassan and Shahad Hardan contributed equally to this work. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We describe a publicly available multimodal dataset of annotated Positron Emission Tomography/Computed Tomography (PET/CT) studies for head and neck cancer research. The dataset includes 1123 FDG-PET/CT studies from patients with histologically confirmed head and neck cancer, acquired from 10 international medical centers. All examinations consisted of co-registered PET/CT scans with varying acquisition protocols, reflecting real-world clinical diversity across institutions. Primary gross tumor volumes (GTVp) and involved lymph nodes (GTVn) were manually segmented by experienced radiation oncologists and radiologists following standardized guidelines and quality control measures. We provide anonymized NIfTI files of all studies, along with expert-annotated segmentation masks, radiotherapy dose distribution for a subset of patients, and comprehensive clinical metadata. This metadata includes TNM staging, HPV status, demographics (age and gender), long-term follow-up outcomes, survival times, censoring indicators, and treatment information. We demonstrate how this dataset can be used for three key clinical tasks: automated tumor segmentation, recurrence-free survival prediction, and HPV status classification, providing benchmark results using state-of-the-art deep learning models, including UNet, SegResNet, and multimodal prognostic frameworks.

[624] arXiv:2509.00578 (replaced) [pdf, html, other]
Title: C-DiffDet+: Fusing Global Scene Context with Generative Denoising for High-Fidelity Object Detection
Abdellah Zakaria Sellam, Ilyes Benaissa, Salah Eddine Bekhouche, Abdenour Hadid, Vito Renó, Cosimo Distante
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Fine-grained object detection in challenging visual domains, such as vehicle damage assessment, is difficult even for human experts to perform reliably. While DiffusionDet has advanced the state-of-the-art through conditional denoising diffusion, its performance remains limited by local feature conditioning in context-dependent scenarios. We address this fundamental limitation by introducing Context-Aware Fusion (CAF), which leverages cross-attention mechanisms to integrate global scene context with local proposal features directly. The global context is generated using a separate dedicated encoder that captures comprehensive environmental information, enabling each object proposal to attend to scene-level understanding. Our framework significantly enhances the generative detection paradigm by enabling each object proposal to attend to comprehensive environmental information. Experimental results demonstrate an improvement over state-of-the-art models on the CarDD benchmark, establishing new performance benchmarks for context-aware object detection in fine-grained domains.

[625] arXiv:2509.00587 (replaced) [pdf, other]
Title: A Hoare Logic for Symmetry Properties
Vaibhav Mehta, Justin Hsu
Comments: Accepted to OOPSLA '25
Subjects: Programming Languages (cs.PL)

Many natural program correctness properties can be stated in terms of symmetries, but existing formal methods have little support for reasoning about such properties. We consider how to formally verify a broad class of symmetry properties expressed in terms of group actions. To specify these properties, we design a syntax for group actions, supporting standard constructions and a natural notion of entailment. Then, we develop a Hoare-style logic for verifying symmetry properties of imperative programs, where group actions take the place of the typical pre- and post-condition assertions. Finally, we develop a prototype tool SymVerif, and use it to verify symmetry properties on a series of handcrafted benchmarks. Our tool uncovered an error in a model of a dynamical system described by McLachlan and Quispel (2002).

[626] arXiv:2509.00591 (replaced) [pdf, other]
Title: Probe-Rewrite-Evaluate: A Workflow for Reliable Benchmarks and Quantifying Evaluation Awareness
Lang Xiong, Nishant Bhargava, Wesley Chang, Jianhang Hong, Haihao Liu, Kevin Zhu
Subjects: Computation and Language (cs.CL)

Large Language Models (LLMs) often exhibit significant behavioral shifts when they perceive a change from a real-world deployment context to a controlled evaluation setting, a phenomenon known as "evaluation awareness." This discrepancy poses a critical challenge for AI alignment, as benchmark performance may not accurately reflect a model's true safety and honesty. In this work, we systematically quantify these behavioral changes by manipulating the perceived context of prompts. We introduce a methodology that uses a linear probe to score prompts on a continuous scale from "test-like" to "deploy-like" and leverage an LLM rewriting strategy to shift these prompts towards a more natural, deployment-style context while preserving the original task. Using this method, we achieved a 30% increase in the average probe score across a strategic role-playing dataset after rewriting. Evaluating a suite of state-of-the-art models on these original and rewritten prompts, we find that rewritten "deploy-like" prompts induce a significant and consistent shift in behavior. Across all models, we observed an average increase in honest responses of 5.26% and a corresponding average decrease in deceptive responses of 12.40%. Furthermore, refusal rates increased by an average of 6.38%, indicating heightened safety compliance. Our findings demonstrate that evaluation awareness is a quantifiable and manipulable factor that directly influences LLM behavior, revealing that models are more prone to unsafe or deceptive outputs in perceived test environments. This underscores the urgent need for more realistic evaluation frameworks to accurately gauge true model alignment before deployment.

[627] arXiv:2509.00761 (replaced) [pdf, html, other]
Title: L-MARS: Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search
Ziqi Wang, Boqin Yuan
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

We present L-MARS (Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search), a system that reduces hallucination and uncertainty in legal question answering through coordinated multi-agent reasoning and retrieval. Unlike single-pass retrieval-augmented generation (RAG), L-MARS decomposes queries into subproblems, issues targeted searches across heterogeneous sources (Serper web, local RAG, CourtListener case law), and employs a Judge Agent to verify sufficiency, jurisdiction, and temporal validity before answer synthesis. This iterative reasoning-search-verification loop maintains coherence, filters noisy evidence, and grounds answers in authoritative law. We evaluated L-MARS on LegalSearchQA, a new benchmark of 200 up-to-date multiple choice legal questions in 2025. Results show that L-MARS substantially improves factual accuracy, reduces uncertainty, and achieves higher preference scores from both human experts and LLM-based judges. Our work demonstrates that multi-agent reasoning with agentic search offers a scalable and reproducible blueprint for deploying LLMs in high-stakes domains requiring precise legal retrieval and deliberation.

[628] arXiv:2509.00798 (replaced) [pdf, html, other]
Title: Multimodal Iterative RAG for Knowledge Visual Question Answering
Changin Choi, Wonseok Lee, Jungmin Ko, Wonjong Rhee
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

While Multimodal Large Language Models (MLLMs) have significantly advanced multimodal understanding, their performance remains limited on knowledge-intensive visual questions that require external knowledge beyond the image. Although Retrieval-Augmented Generation (RAG) has become a promising solution for providing models with external knowledge, its conventional single-pass framework often fails to gather sufficient knowledge. To overcome this limitation, we propose MI-RAG, a Multimodal Iterative RAG framework that leverages reasoning to enhance retrieval and update reasoning over newly retrieved knowledge across modalities. At each iteration, MI-RAG leverages an accumulated reasoning record to dynamically formulate a multi-query. These queries then drive a joint search across heterogeneous knowledge bases containing both visually-grounded and textual knowledge. The newly acquired knowledge is synthesized into the reasoning record, progressively refining understanding across iterations. Experiments on challenging benchmarks, including Encyclopedic VQA, InfoSeek, and OK-VQA, show that MI-RAG significantly improves both retrieval recall and answer accuracy, establishing a scalable approach for compositional reasoning in knowledge-intensive VQA.

[629] arXiv:2509.00831 (replaced) [pdf, html, other]
Title: UPGS: Unified Pose-aware Gaussian Splatting for Dynamic Scene Deblurring
Zhijing Wu, Longguang Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Reconstructing dynamic 3D scenes from monocular video has broad applications in AR/VR, robotics, and autonomous navigation, but often fails due to severe motion blur caused by camera and object motion. Existing methods commonly follow a two-step pipeline, where camera poses are first estimated and then 3D Gaussians are optimized. Since blurring artifacts usually undermine pose estimation, pose errors can accumulate and produce inferior reconstruction results. To address this issue, we introduce a unified optimization framework by incorporating camera poses as learnable parameters complementary to 3DGS attributes for end-to-end optimization. Specifically, we recast camera and object motion as per-primitive SE(3) affine transformations on 3D Gaussians and formulate a unified optimization objective. For stable optimization, we introduce a three-stage training schedule that optimizes camera poses and Gaussians alternately. In particular, 3D Gaussians are first trained with the poses fixed, and then the poses are optimized with the 3D Gaussians left untouched. Finally, all learnable parameters are optimized together. Extensive experiments on the Stereo Blur dataset and challenging real-world sequences demonstrate that our method achieves significant gains in reconstruction quality and pose estimation accuracy over prior dynamic deblurring methods.
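
The three-stage schedule described in this abstract can be illustrated with a minimal sketch. The function name, the parameter-group labels, and the stage lengths below are hypothetical; the abstract does not state how long each stage runs or how the authors implement the switch.

```python
def trainable_groups(step, stage1_steps, stage2_steps):
    """Return the parameter groups optimized at a given training step.

    Stage lengths are illustrative assumptions, not values from the paper.
    """
    if step < stage1_steps:
        # Stage 1: train the 3D Gaussians while camera poses stay fixed.
        return {"gaussians"}
    if step < stage1_steps + stage2_steps:
        # Stage 2: optimize camera poses while the Gaussians stay untouched.
        return {"poses"}
    # Stage 3: optimize all learnable parameters together.
    return {"gaussians", "poses"}


# Hypothetical schedule: 100 steps for stage 1, 50 steps for stage 2.
schedule = [trainable_groups(s, 100, 50) for s in (0, 99, 100, 149, 150)]
```

In a real training loop, the returned set would decide which optimizer parameter groups receive gradient updates at that step.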

[630] arXiv:2509.00868 (replaced) [pdf, html, other]
Title: A Modular and Scalable Simulator for Connected-UAVs Communication in 5G Networks
Yong Su, Yiyi Chen, Shenghong Yi, Hui Feng, Yuedong Xu, Wang Xiang, Bo Hu
Comments: a short version is accepted by MSWiM 2025
Subjects: Networking and Internet Architecture (cs.NI)

Cellular-connected UAV systems have enabled a wide range of low-altitude aerial services. However, these systems still face many challenges, such as frequent handovers and the inefficiency of traditional transport protocols. To better study these issues, we develop a modular and scalable simulation platform specifically designed for UAV communication, leveraging MATLAB's wireless communication research ecosystem. The platform supports flexible 5G NR node deployment, customizable UAV mobility models, and multi-network-interface extensions. It also supports multiple transport protocols, such as TCP, UDP, and QUIC, allowing users to investigate how different transport protocols affect UAV communication performance. In addition, the platform includes a handover management module, enabling the evaluation of both traditional and learning-based handover strategies. Our platform can serve as a testbed for the development and evaluation of advanced transmission strategies in cellular-connected UAV systems.

[631] arXiv:2509.00891 (replaced) [pdf, html, other]
Title: ChatCLIDS: Simulating Persuasive AI Dialogues to Promote Closed-Loop Insulin Adoption in Type 1 Diabetes Care
Zonghai Yao, Talha Chafekar, Junda Wang, Shuo Han, Feiyun Ouyang, Junhui Qian, Lingxi Li, Hong Yu
Comments: Equal contribution for the first two authors
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Real-world adoption of closed-loop insulin delivery systems (CLIDS) in type 1 diabetes remains low, driven not by technical failure, but by diverse behavioral, psychosocial, and social barriers. We introduce ChatCLIDS, the first benchmark to rigorously evaluate LLM-driven persuasive dialogue for health behavior change. Our framework features a library of expert-validated virtual patients, each with clinically grounded, heterogeneous profiles and realistic adoption barriers, and simulates multi-turn interactions with nurse agents equipped with a diverse set of evidence-based persuasive strategies. ChatCLIDS uniquely supports longitudinal counseling and adversarial social influence scenarios, enabling robust, multi-dimensional evaluation. Our findings reveal that while larger and more reflective LLMs adapt strategies over time, all models struggle to overcome resistance, especially under realistic social pressure. These results highlight critical limitations of current LLMs for behavior change, and offer a high-fidelity, scalable testbed for advancing trustworthy persuasive AI in healthcare and beyond.

[632] arXiv:2509.00905 (replaced) [pdf, html, other]
Title: Spotlighter: Revisiting Prompt Tuning from a Representative Mining View
Yutong Gao, Maoyuan Shao, Xinyang Huang, Chuang Zhu, Lijuan Sun, Yu Weng, Xuan Liu, Guoshun Nan
Comments: Accepted as EMNLP 2025 Findings
Journal-ref: EMNLP2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

CLIP's success has demonstrated that prompt tuning can achieve robust cross-modal semantic alignment for tasks ranging from open-domain recognition to fine-grained classification. However, redundant or weakly relevant feature components introduce noise and incur unnecessary computational costs. In this work, we propose Spotlighter, a lightweight token-selection framework that simultaneously enhances accuracy and efficiency in prompt tuning. Spotlighter evaluates each visual token's activation from both sample-wise and semantic-wise perspectives and retains only the top-scoring tokens for downstream prediction. A class-specific semantic memory bank of learned prototypes refines this selection, ensuring semantic representativeness and compensating for discarded features. To further prioritize informative signals, we introduce a two-level ranking mechanism that dynamically weights token-prototype interactions. Across 11 few-shot benchmarks, Spotlighter outperforms CLIP by up to 11.19% in harmonic mean accuracy and achieves up to 0.8K additional FPS, with only 21 extra parameters. These results establish Spotlighter as an effective and scalable baseline for prompt tuning. Code for our method will be available at this https URL.

[633] arXiv:2509.00911 (replaced) [pdf, other]
Title: GS-TG: 3D Gaussian Splatting Accelerator with Tile Grouping for Reducing Redundant Sorting while Preserving Rasterization Efficiency
Joongho Jo, Jongsun Park
Comments: DAC 2025
Subjects: Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV)

3D Gaussian Splatting (3D-GS) has emerged as a promising alternative to neural radiance fields (NeRF) as it offers high speed as well as high image quality in novel view synthesis. Despite these advancements, 3D-GS still struggles to meet the frames per second (FPS) demands of real-time applications. In this paper, we introduce GS-TG, a tile-grouping-based accelerator that enhances 3D-GS rendering speed by reducing redundant sorting operations and preserving rasterization efficiency. GS-TG addresses a critical trade-off in 3D-GS rendering: increasing the tile size effectively reduces redundant sorting operations, but it also increases unnecessary rasterization computations. Accordingly, during sorting, GS-TG groups small tiles into large tiles so that sorting operations are shared across the tiles within each group, significantly reducing redundant computations. During rasterization, a bitmask assigned to each Gaussian identifies the relevant small tiles, enabling efficient sharing of sorting results. Consequently, GS-TG enables sorting to be performed as if a large tile size were used, by grouping tiles during the sorting stage, while allowing rasterization to proceed with the original small tiles by using bitmasks in the rasterization stage. GS-TG is a lossless method requiring no retraining or fine-tuning, and it can be seamlessly integrated with previous 3D-GS optimization techniques. Experimental results show that GS-TG achieves an average speed-up of 1.54 times over state-of-the-art 3D-GS accelerators.
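
The idea of sorting once per tile group and then filtering per small tile with a bitmask can be sketched in software as follows. This is only an illustration under stated assumptions: the data layout, the 2x2 group size, and all function names are hypothetical, and GS-TG itself is a hardware accelerator whose actual design is not described at this level in the abstract.

```python
# Hypothetical data: each Gaussian has a depth and the set of small tiles
# (indexed 0..3 inside one assumed 2x2 tile group) that it overlaps.
GAUSSIANS = [
    {"id": "a", "depth": 3.0, "tiles": {0, 1}},
    {"id": "b", "depth": 1.0, "tiles": {1, 3}},
    {"id": "c", "depth": 2.0, "tiles": {0, 2, 3}},
]
TILES_PER_GROUP = 4


def tile_bitmask(tiles):
    """Encode the overlapped small tiles as one integer bitmask."""
    mask = 0
    for tile in tiles:
        mask |= 1 << tile
    return mask


def sort_group(gaussians):
    """Sort once per tile group (front to back) and attach a bitmask to each Gaussian."""
    ordered = sorted(gaussians, key=lambda g: g["depth"])
    return [(g["id"], tile_bitmask(g["tiles"])) for g in ordered]


def per_tile_lists(sorted_group, tiles_per_group):
    """Reuse the shared sorted list; the bitmask selects Gaussians for each small tile."""
    return {
        tile: [gid for gid, mask in sorted_group if mask & (1 << tile)]
        for tile in range(tiles_per_group)
    }


shared = sort_group(GAUSSIANS)
tile_lists = per_tile_lists(shared, TILES_PER_GROUP)
```

Each small tile receives its Gaussians already in depth order, so only one sort is performed for the whole group while rasterization still skips Gaussians that do not touch the tile.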

[634] arXiv:2509.00971 (replaced) [pdf, other]
Title: CoreThink: A Symbolic Reasoning Layer to reason over Long Horizon Tasks with LLMs
Jay Vaghasiya, Omkar Ghugarkar, Vishvesh Bhat, Vipul Dholaria, Julian McAuley
Subjects: Artificial Intelligence (cs.AI)

We introduce CoreThink, a state-of-the-art Reasoning Layer built upon a novel reasoning method called General Symbolics. This approach diverges from reasoning paradigms such as test-time scaling, Supervised Fine-Tuning (SFT), and Reinforcement Learning with Verifiable Rewards (RLVR). CoreThink General Symbolic Reasoner (GSR) is specifically structured around three key use cases: tool-calling, code generation, and planning, demonstrating exemplary performance across a total of seven benchmarks in their respective areas. Notably, we are achieving SOTA scores of 66.66% on Livecodebench v6, 89% on Instruction-Following Evals, and 24.4% on ARC-AGI-2. We also present an agentic coding IDE, developed using the principles of General Symbolics, which achieves a state-of-the-art accuracy of 62.3% on SWE-Bench Lite. We are able to achieve these improvements without any fine-tuning or training costs. Our Reasoning Layer is designed to provide a pure performance uplift, ensuring that a model's accuracy on reasoning tasks is never negatively impacted. We argue that incumbent methods will eventually lead to diminishing returns in LLM performance, necessitating the development of new reasoning techniques. This technical report details our approach at a high level and the availability of the CoreThink models for reasoning-intensive use cases.

[635] arXiv:2509.01028 (replaced) [pdf, html, other]
Title: CompSlider: Compositional Slider for Disentangled Multiple-Attribute Image Generation
Zixin Zhu, Kevin Duarte, Mamshad Nayeem Rizve, Chengyuan Xu, Ratheesh Kalarot, Junsong Yuan
Comments: Accepted by ICCV 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In text-to-image (T2I) generation, achieving fine-grained control over attributes such as age or smile remains challenging, even with detailed text prompts. Slider-based methods offer a solution for precise control of image attributes. Existing approaches typically train an individual adapter for each attribute separately, overlooking the entanglement among multiple attributes. As a result, interference occurs among different attributes, preventing precise control of multiple attributes together. To address this challenge, we aim to disentangle multiple attributes in slider-based generation to enable more reliable and independent attribute manipulation. Our approach, CompSlider, can generate a conditional prior for the T2I foundation model to control multiple attributes simultaneously. Furthermore, we introduce novel disentanglement and structure losses to compose multiple attribute changes while maintaining structural consistency within the image. Since CompSlider operates in the latent space of the conditional prior and does not require retraining the foundation model, it reduces the computational burden for both training and inference. We evaluate our approach on a variety of image attributes and highlight its generality by extending to video generation.

[636] arXiv:2509.01030 (replaced) [pdf, html, other]
Title: Identifying Origins of Place Names via Retrieval Augmented Generation
Alexis Horde-Vo, Matt Duckham, Estrid He, Rafe Benli
Subjects: Information Retrieval (cs.IR)

Who is the "Batman" behind "Batman Street" in Melbourne? Understanding the historical, cultural, and societal narratives behind place names can reveal the rich context that has shaped a community. Although place names serve as essential spatial references in gazetteers, they often lack information about place name origins. Enriching these place names in today's gazetteers is a time-consuming, manual process that requires extensive exploration of a vast archive of documents and text sources. Recent advances in natural language processing and language models (LMs) hold the promise of significant automation of identifying place name origins due to their powerful capability to exploit the semantics of the stored documents. This chapter presents a retrieval augmented generation pipeline designed to search for place name origins over a broad knowledge base, DBpedia. Given a spatial query, our approach first extracts sub-graphs that may contain knowledge relevant to the query; then ranks the extracted sub-graphs to generate the final answer to the query using fine-tuned LM-based models (i.e., ColBERTv2 and Llama2). Our results highlight the key challenges facing automated retrieval of place name origins, especially the tendency of language models to under-use the spatial information contained in texts as a discriminating factor. Our approach also frames the wider implications for geographic information retrieval using retrieval augmented generation.

[637] arXiv:2509.01245 (replaced) [pdf, html, other]
Title: Towards Agentic OS: An LLM Agent Framework for Linux Schedulers
Yusheng Zheng, Yanpeng Hu, Wei Zhang, Andi Quinn
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Operating Systems (cs.OS)

Operating system schedulers suffer from a fundamental semantic gap, where kernel policies fail to understand application-specific needs, leading to suboptimal performance. We introduce SchedCP, the first framework that enables fully autonomous Large Language Model (LLM) agents to safely and efficiently optimize Linux schedulers without human involvement. Our core insight is that the challenge is not merely to apply a better LLM, but to architect a decoupled control plane that separates the AI's role of semantic reasoning ("what to optimize") from the system's role of execution ("how to observe and act"). Implemented as a Model Context Protocol (MCP) server, SchedCP provides a stable interface with three key services: a Workload Analysis Engine, an evolving Scheduler Policy Repository, and an Execution Verifier that validates all AI-generated code and configurations before deployment with static and dynamic analysis.
We demonstrate this architecture's power with sched-agent, a multi-agent system that autonomously analyzes workloads, synthesizes custom eBPF scheduling policies, and deploys them via the sched_ext infrastructure. Our evaluation shows that SchedCP achieves up to a 1.79x performance improvement and a 13x cost reduction compared to naive agentic approaches, all while maintaining a high success rate. By bridging the semantic gap, SchedCP democratizes expert-level system optimization and represents a step towards creating truly self-optimizing, application-aware operating systems. The code is open-sourced at this https URL

[638] arXiv:2509.01275 (replaced) [pdf, html, other]
Title: Novel Category Discovery with X-Agent Attention for Open-Vocabulary Semantic Segmentation
Jiahao Li, Yang Lu, Yachao Zhang, Fangyong Wang, Yuan Xie, Yanyun Qu
Comments: Accepted by ACMMM2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Open-vocabulary semantic segmentation (OVSS) conducts pixel-level classification via text-driven alignment, where the domain discrepancy between base category training and open-vocabulary inference poses challenges in discriminative modeling of latent unseen categories. To address this challenge, existing vision-language model (VLM)-based approaches demonstrate commendable performance through pre-trained multi-modal representations. However, the fundamental mechanisms of latent semantic comprehension remain underexplored, creating a bottleneck for OVSS. In this work, we initiate a probing experiment to explore distribution patterns and dynamics of latent semantics in VLMs under inductive learning paradigms. Building on these insights, we propose X-Agent, an innovative OVSS framework employing a latent semantic-aware "agent" to orchestrate cross-modal attention mechanisms, simultaneously optimizing latent semantic dynamics and amplifying their perceptibility. Extensive benchmark evaluations demonstrate that X-Agent achieves state-of-the-art performance while effectively enhancing latent semantic saliency.

[639] arXiv:2509.01293 (replaced) [pdf, html, other]
Title: Equivariant U-Shaped Neural Operators for the Cahn-Hilliard Phase-Field Model
Xiao Xue, M.F.P. ten Eikelder, Tianyue Yang, Yiqing Li, Kan He, Shuo Wang, Peter V. Coveney
Subjects: Machine Learning (cs.LG); Computational Physics (physics.comp-ph); Fluid Dynamics (physics.flu-dyn)

Phase separation in binary mixtures, governed by the Cahn-Hilliard equation, plays a central role in interfacial dynamics across materials science and soft matter. While numerical solvers are accurate, they are often computationally expensive and lack flexibility across varying initial conditions and geometries. Neural operators provide a data-driven alternative by learning solution operators between function spaces, but current architectures often fail to capture multiscale behavior and neglect underlying physical symmetries. Here we show that an equivariant U-shaped neural operator (E-UNO) can learn the evolution of the phase-field variable from short histories of past dynamics, achieving accurate predictions across space and time. The model combines global spectral convolution with a multi-resolution U-shaped architecture and regulates translation equivariance to align with the underlying physics. E-UNO outperforms standard Fourier neural operator and U-shaped neural operator baselines, particularly on fine-scale and high-frequency structures. By encoding symmetry and scale hierarchy, the model generalizes better, requires less training data, and yields physically consistent dynamics. This establishes E-UNO as an efficient surrogate for complex phase-field systems.

[640] arXiv:2509.01498 (replaced) [pdf, html, other]
Title: MSA2-Net: Utilizing Self-Adaptive Convolution Module to Extract Multi-Scale Information in Medical Image Segmentation
Chao Deng, Xiaosen Li, Xiao Qin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

The nnUNet segmentation framework adeptly adjusts most hyperparameters in training scripts automatically, but it overlooks the tuning of internal hyperparameters within the segmentation network itself, which constrains the model's ability to generalize. Addressing this limitation, this study presents a novel Self-Adaptive Convolution Module that dynamically adjusts the size of the convolution kernels depending on the unique fingerprints of different datasets. This adjustment enables MSA2-Net, when equipped with this module, to proficiently capture both global and local features within the feature maps. The Self-Adaptive Convolution Module is strategically integrated into two key components of MSA2-Net: the Multi-Scale Convolution Bridge (MSConvBridge) and the Multi-Scale Amalgamation Decoder (MSADecoder). In the MSConvBridge, the module enhances the ability to refine outputs from various stages of the CSWin Transformer during the skip connections, effectively eliminating redundant data that could potentially impair the decoder's performance. Simultaneously, the MSADecoder, utilizing the module, excels in capturing detailed information of organs varying in size during the decoding phase. This capability ensures that the decoder's output closely reproduces the intricate details within the feature maps, thus yielding highly accurate segmentation images. MSA2-Net, bolstered by this advanced architecture, has demonstrated exceptional performance, achieving Dice coefficient scores of 86.49%, 92.56%, 93.37%, and 92.98% on the Synapse, ACDC, Kvasir, and Skin Lesion Segmentation (ISIC2017) datasets, respectively. This underscores MSA2-Net's robustness and precision in medical image segmentation tasks across various datasets.

[641] arXiv:2509.01527 (replaced) [pdf, other]
Title: A Privacy-Preserving Recommender for Filling Web Forms Using a Local Large Language Model
Amirreza Nayyeri, Abbas Rasoolzadegan
Subjects: Software Engineering (cs.SE)

Web applications are increasingly used in critical domains such as education, finance, and e-commerce. This highlights the need to ensure their failure-free performance. One effective method for evaluating failure-free performance is web form testing, where defining effective test scenarios is key to a complete and accurate evaluation. A core aspect of this process involves filling form fields with suitable values to create effective test cases. However, manually generating these values is time-consuming and prone to errors. To address this, various tools have been developed to assist testers. With the appearance of large language models (LLMs), a new generation of tools seeks to handle this task more intelligently. Although many LLM-based tools have been introduced, as these models typically rely on cloud infrastructure, their use in testing confidential web forms raises concerns about unintended data leakage and breaches of confidentiality. This paper introduces a privacy-preserving recommender that operates locally using a large language model. The tool assists testers in web form testing by suggesting effective field values. This tool analyzes the HTML structure of forms, detects input types, and extracts constraints based on each field's type and contextual content, guiding proper field filling.

[642] arXiv:2509.01530 (replaced) [pdf, html, other]
Title: Asymmetric Impact of Basic Scientists during Applied Shift
Rikuei Kaku, Mikako Bito, Keita Nishimoto, Ichiro Sakata, Kimitaka Asatani
Subjects: Digital Libraries (cs.DL)

Despite broad acclaim for basic research, science is undergoing an applied shift that marginalizes basic scientists. This gap reflects an incomplete understanding of their distinctive roles, which prevents translating philosophical appreciation into effective support. We introduce a scalable metric--the application score--to position research along the basic-applied spectrum and apply it to 62 million publications (1970-2023) to reveal the distinctive contributions of basic scientists. We find a structural asymmetry: involvement of basic scientists substantially increases citation impact, even more so in applied contexts, while applied scientists show no such effect in basic domains. This asymmetric effect arises from their intellectual leadership in conceptualization, writing, and experimental design, amplified in large, multidisciplinary, and intermediate career teams. Yet basic scientists remain concentrated in historically prestigious institutions, while new entrants shift toward applied work, indicating critical undersupply. These findings provide large-scale evidence for the indispensable role of basic scientists, guiding policy and institutional strategy to sustain the foundations of discovery and innovation.

[643] arXiv:2509.01839 (replaced) [pdf, html, other]
Title: HodgeFormer: Transformers for Learnable Operators on Triangular Meshes through Data-Driven Hodge Matrices
Akis Nousias, Stavros Nousias
Comments: 13 pages, 11 figures, 9 tables
Subjects: Graphics (cs.GR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Currently, prominent Transformer architectures applied to graphs and meshes for shape analysis tasks employ traditional attention layers that heavily utilize spectral features requiring costly eigenvalue decomposition-based methods. To encode the mesh structure, these methods derive positional embeddings that rely heavily on eigenvalue decomposition-based operations, e.g. on the Laplacian matrix, or on heat-kernel signatures, which are then concatenated to the input features. This paper proposes a novel approach inspired by the explicit construction of the Hodge Laplacian operator in Discrete Exterior Calculus as a product of discrete Hodge operators and exterior derivatives, i.e. $(L := \star_0^{-1} d_0^T \star_1 d_0)$. We adjust the Transformer architecture in a novel deep learning layer that utilizes the multi-head attention mechanism to approximate Hodge matrices $\star_0$, $\star_1$ and $\star_2$ and learn families of discrete operators $L$ that act on mesh vertices, edges and faces. Our approach results in a computationally efficient architecture that achieves comparable performance in mesh segmentation and classification tasks, through a direct learning framework, while eliminating the need for costly eigenvalue decomposition operations or complex preprocessing operations.
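
As a rough illustration of the operator $L := \star_0^{-1} d_0^T \star_1 d_0$ named in the abstract, the sketch below assembles it for a single triangle with NumPy. The helper names and the fixed diagonal Hodge stars are assumptions made for this example only; in the paper the Hodge matrices are approximated by attention layers rather than specified by hand.

```python
import numpy as np

def exterior_derivative_d0(num_vertices, edges):
    """Signed edge-vertex incidence matrix d_0 (one row per oriented edge)."""
    d0 = np.zeros((len(edges), num_vertices))
    for row, (tail, head) in enumerate(edges):
        d0[row, tail] = -1.0
        d0[row, head] = 1.0
    return d0

def hodge_laplacian(d0, star0_diag, star1_diag):
    """L = star_0^{-1} d_0^T star_1 d_0, assuming diagonal Hodge stars."""
    star0_inv = np.diag(1.0 / np.asarray(star0_diag, dtype=float))
    star1 = np.diag(np.asarray(star1_diag, dtype=float))
    return star0_inv @ d0.T @ star1 @ d0

# One triangle with vertices 0, 1, 2 and three oriented edges.
edges = [(0, 1), (1, 2), (0, 2)]
d0 = exterior_derivative_d0(3, edges)

# Placeholder stars: identity weights on vertices and edges (an assumption).
L = hodge_laplacian(d0, star0_diag=[1, 1, 1], star1_diag=[1, 1, 1])
print(L)
```

With identity stars the result reduces to the ordinary graph Laplacian of the triangle, which makes the construction easy to check by hand; non-trivial star weights would rescale edges and vertices.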

[644] arXiv:2509.01882 (replaced) [pdf, html, other]
Title: HydroVision: Predicting Optically Active Parameters in Surface Water Using Computer Vision
Shubham Laxmikant Deshmukh, Matthew Wilchek, Feras A. Batarseh
Comments: This paper is under peer review for IEEE Journal of Oceanic Engineering
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Ongoing advancements in computer vision, particularly in pattern recognition and scene classification, have enabled new applications in environmental monitoring. Deep learning now offers non-contact methods for assessing water quality and detecting contamination, both critical for disaster response and public health protection. This work introduces HydroVision, a deep learning-based scene classification framework that estimates optically active water quality parameters including Chlorophyll-Alpha, Chlorophylls, Colored Dissolved Organic Matter (CDOM), Phycocyanins, Suspended Sediments, and Turbidity from standard Red-Green-Blue (RGB) images of surface water. HydroVision supports early detection of contamination trends and strengthens monitoring by regulatory agencies during external environmental stressors, industrial activities, and force majeure events. The model is trained on more than 500,000 seasonally varied images collected from the United States Geological Survey Hydrologic Imagery Visualization and Information System between 2022 and 2024. This approach leverages widely available RGB imagery as a scalable, cost-effective alternative to traditional multispectral and hyperspectral remote sensing. Four state-of-the-art convolutional neural networks (VGG-16, ResNet50, MobileNetV2, DenseNet121) and a Vision Transformer are evaluated through transfer learning to identify the best-performing architecture. DenseNet121 achieves the highest validation performance, with an R2 score of 0.89 in predicting CDOM, demonstrating the framework's promise for real-world water quality monitoring across diverse conditions. While the current model is optimized for well-lit imagery, future work will focus on improving robustness under low-light and obstructed scenarios to expand its operational utility.

[645] arXiv:2509.01984 (replaced) [pdf, html, other]
Title: Discrete Noise Inversion for Next-scale Autoregressive Text-based Image Editing
Quan Dao, Xiaoxiao He, Ligong Han, Ngan Hoai Nguyen, Amin Heyrani Nobar, Faez Ahmed, Han Zhang, Viet Anh Nguyen, Dimitris Metaxas
Comments: update affiliation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Visual autoregressive models (VAR) have recently emerged as a promising class of generative models, achieving performance comparable to diffusion models in text-to-image generation tasks. While conditional generation has been widely explored, the ability to perform prompt-guided image editing without additional training is equally critical, as it supports numerous practical real-world applications. This paper investigates the text-to-image editing capabilities of VAR by introducing Visual AutoRegressive Inverse Noise (VARIN), the first noise inversion-based editing technique designed explicitly for VAR models. VARIN leverages a novel pseudo-inverse function for argmax sampling, named Location-aware Argmax Inversion (LAI), to generate inverse Gumbel noises. These inverse noises enable precise reconstruction of the source image and facilitate targeted, controllable edits aligned with textual prompts. Extensive experiments demonstrate that VARIN effectively modifies source images according to specified prompts while significantly preserving the original background and structural details, thus validating its efficacy as a practical editing approach.

[646] arXiv:2509.02028 (replaced) [pdf, html, other]
Title: See No Evil: Adversarial Attacks Against Linguistic-Visual Association in Referring Multi-Object Tracking Systems
Halima Bouzidi, Haoyu Liu, Mohammad Abdullah Al Faruque
Comments: 12 pages, 1 figure, 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)

Language-vision understanding has driven the development of advanced perception systems, most notably the emerging paradigm of Referring Multi-Object Tracking (RMOT). By leveraging natural-language queries, RMOT systems can selectively track objects that satisfy a given semantic description, guided through Transformer-based spatial-temporal reasoning modules. End-to-End (E2E) RMOT models further unify feature extraction, temporal memory, and spatial reasoning within a Transformer backbone, enabling long-range spatial-temporal modeling over fused textual-visual representations. Despite these advances, the reliability and robustness of RMOT remain underexplored. In this paper, we examine the security implications of RMOT systems from a design-logic perspective, identifying adversarial vulnerabilities that compromise both the linguistic-visual referring and track-object matching components. Additionally, we uncover a novel vulnerability in advanced RMOT models employing FIFO-based memory, whereby targeted and consistent attacks on their spatial-temporal reasoning introduce errors that persist within the history buffer over multiple subsequent frames. We present VEIL, a novel adversarial framework designed to disrupt the unified referring-matching mechanisms of RMOT models. We show that carefully crafted digital and physical perturbations can corrupt the tracking logic reliability, inducing track ID switches and terminations. We conduct comprehensive evaluations using the Refer-KITTI dataset to validate the effectiveness of VEIL and demonstrate the urgent need for security-aware RMOT designs for critical large-scale applications.

[647] arXiv:2509.02170 (replaced) [pdf, html, other]
Title: Avoidance Decoding for Diverse Multi-Branch Story Generation
Kyeongman Park, Nakyeong Yang, Kyomin Jung
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large Language Models (LLMs) often generate repetitive and monotonous outputs, especially in tasks like story generation, due to limited creative diversity when given the same input prompt. To address this challenge, we propose a novel decoding strategy, Avoidance Decoding, that modifies token logits by penalizing similarity to previously generated outputs, thereby encouraging more diverse multi-branch stories. This penalty adaptively balances two similarity measures: (1) Concept-level Similarity Penalty, which is prioritized in early stages to diversify initial story concepts, and (2) Narrative-level Similarity Penalty, which is increasingly emphasized later to ensure natural yet diverse plot development. Notably, our method achieves up to 2.6 times higher output diversity and reduces repetition by an average of 30% compared to strong baselines, while effectively mitigating text degeneration. Furthermore, we reveal that our method activates a broader range of neurons, demonstrating that it leverages the model's intrinsic creativity.
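
The logit adjustment described above can be pictured with a toy sketch. Everything here is an assumption for illustration: the function name, the linear weighting schedule, the `strength` parameter, and the per-token similarity vectors (the abstract does not say how the two similarity measures are computed or combined).

```python
import numpy as np

def avoidance_logits(logits, concept_sim, narrative_sim, step, total_steps, strength=2.0):
    """Subtract a time-varying mix of two similarity penalties from the logits.

    concept_sim and narrative_sim are hypothetical per-token similarity
    scores in [0, 1] with respect to previously generated stories.
    """
    logits = np.asarray(logits, dtype=float)
    concept_sim = np.asarray(concept_sim, dtype=float)
    narrative_sim = np.asarray(narrative_sim, dtype=float)
    # Assumed linear schedule: the concept-level weight decays over time
    # while the narrative-level weight grows.
    narrative_weight = step / max(total_steps - 1, 1)
    concept_weight = 1.0 - narrative_weight
    penalty = concept_weight * concept_sim + narrative_weight * narrative_sim
    return logits - strength * penalty

logits = [1.0, 1.0, 1.0]
concept_sim = [1.0, 0.0, 0.0]    # token 0 repeats an earlier story concept
narrative_sim = [0.0, 1.0, 0.0]  # token 1 repeats an earlier plot development

early = avoidance_logits(logits, concept_sim, narrative_sim, step=0, total_steps=5)
late = avoidance_logits(logits, concept_sim, narrative_sim, step=4, total_steps=5)
print(early, late)
```

In this toy setup the first decoding step penalizes only the concept-level repeat and the last step penalizes only the narrative-level repeat, which mirrors the early/late emphasis the abstract describes.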

[648] arXiv:2509.02184 (replaced) [pdf, html, other]
Title: Task and Motion Planning of Dynamic Systems using Hyperproperties for Signal Temporal Logics
Jianing Zhao, Bowen Ye, Xinyi Yu, Rupak Majumdar, Xiang Yin
Subjects: Systems and Control (eess.SY)

We investigate the task and motion planning problem for dynamical systems under signal temporal logic (STL) specifications. Existing works on STL control synthesis mainly focus on generating plans that satisfy properties over a single executed trajectory. In this work, we consider the planning problem for hyperproperties evaluated over a set of possible trajectories, which naturally arise in information-flow control problems. Specifically, we study discrete-time dynamical systems and employ the recently developed temporal logic HyperSTL as the new objective for planning. To solve this problem, we propose a novel recursive counterexample-guided synthesis approach capable of effectively handling HyperSTL specifications with multiple alternating quantifiers. The proposed method is not only applicable to planning but also extends to HyperSTL model checking for discrete-time dynamical systems. Finally, we present case studies on security-preserving planning and ambiguity-free planning to demonstrate the effectiveness of the proposed HyperSTL planning framework.

[649] arXiv:2509.02281 (replaced) [pdf, html, other]
Title: Balanced Multimodal Learning: An Unidirectional Dynamic Interaction Perspective
Shijie Wang, Li Zhang, Xinyan Liang, Yuhua Qian, Shen Hu
Subjects: Machine Learning (cs.LG); Multimedia (cs.MM)

Multimodal learning typically utilizes multimodal joint loss to integrate different modalities and enhance model performance. However, this joint learning strategy can induce modality imbalance, where strong modalities overwhelm weaker ones and limit exploitation of individual information from each modality and the inter-modality interaction information. Existing strategies such as dynamic loss weighting, auxiliary objectives and gradient modulation mitigate modality imbalance based on joint loss. These methods remain fundamentally reactive, detecting and correcting imbalance after it arises, while leaving the competitive nature of the joint loss untouched. This limitation drives us to explore a new strategy for multimodal imbalance learning that does not rely on the joint loss, enabling more effective interactions between modalities and better utilization of information from individual modalities and their interactions. In this paper, we introduce Unidirectional Dynamic Interaction (UDI), a novel strategy that abandons the conventional joint loss in favor of a proactive, sequential training scheme. UDI first trains the anchor modality to convergence, then uses its learned representations to guide the other modality via unsupervised loss. Furthermore, the dynamic adjustment of modality interactions allows the model to adapt to the task at hand, ensuring that each modality contributes optimally. By decoupling modality optimization and enabling directed information flow, UDI prevents domination by any single modality and fosters effective cross-modal feature learning. Our experimental results demonstrate that UDI outperforms existing methods in handling modality imbalance, leading to performance improvement in multimodal learning tasks.

[650] arXiv:2509.02283 (replaced) [pdf, html, other]
Title: Sem-RaDiff: Diffusion-Based 3D Radar Semantic Perception in Cluttered Agricultural Environments
Ruibin Zhang, Fei Gao
Subjects: Robotics (cs.RO)

Accurate and robust environmental perception is crucial for robot autonomous navigation. While current methods typically adopt optical sensors (e.g., camera, LiDAR) as primary sensing modalities, their susceptibility to visual occlusion often leads to degraded performance or complete system failure. In this paper, we focus on agricultural scenarios where robots are exposed to the risk of onboard sensor contamination. Leveraging radar's strong penetration capability, we introduce a radar-based 3D environmental perception framework as a viable alternative. It comprises three core modules designed for dense and accurate semantic perception: 1) Parallel frame accumulation to enhance the signal-to-noise ratio of radar raw data. 2) A diffusion model-based hierarchical learning framework that first filters radar sidelobe artifacts and then generates fine-grained 3D semantic point clouds. 3) A specifically designed sparse 3D network optimized for processing large-scale radar raw data. We conducted extensive benchmark comparisons and experimental evaluations on a self-built dataset collected in real-world agricultural field scenes. Results demonstrate that our method achieves superior structural and semantic prediction performance compared to existing methods, while simultaneously reducing computational and memory costs by 51.3% and 27.5%, respectively. Furthermore, our approach achieves complete reconstruction and accurate classification of thin structures such as poles and wires, which existing methods struggle to perceive, highlighting its potential for dense and accurate 3D radar perception.

[651] arXiv:2509.02305 (replaced) [pdf, html, other]
Title: Hues and Cues: Human vs. CLIP
Nuria Alabau-Bosque, Jorge Vila-Tomás, Paula Daudén-Oliver, Pablo Hernández-Cámara, Jose Manuel Jaén-Lorites, Valero Laparra, Jesús Malo
Comments: 4 pages, 3 figures. 8th annual conference on Cognitive Computational Neuroscience
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Playing games is inherently human, and many games are created to challenge different human characteristics. However, these tasks are often left out when evaluating the human-like nature of artificial models. The objective of this work is to propose a new approach to evaluate artificial models via board games. To this effect, we test the color perception and color naming capabilities of CLIP by playing the board game Hues & Cues and assess its alignment with humans. Our experiments show that CLIP is generally well aligned with human observers, but our approach brings to light certain cultural biases and inconsistencies when dealing with different abstraction levels that are hard to identify with other testing strategies. Our findings indicate that assessing models with different tasks like board games can make certain deficiencies in the models stand out in ways that are difficult to test with the commonly used benchmarks.

[652] arXiv:2509.02379 (replaced) [pdf, html, other]
Title: MedDINOv3: How to adapt vision foundation models for medical image segmentation?
Yuheng Li, Yizhou Wu, Yuxiang Lai, Mingzhe Hu, Xiaofeng Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Accurate segmentation of organs and tumors in CT and MRI scans is essential for diagnosis, treatment planning, and disease monitoring. While deep learning has advanced automated segmentation, most models remain task-specific, lacking generalizability across modalities and institutions. Vision foundation models (FMs) pretrained on billion-scale natural images offer powerful and transferable representations. However, adapting them to medical imaging faces two key challenges: (1) the ViT backbone of most foundation models still underperforms specialized CNNs on medical image segmentation, and (2) the large domain gap between natural and medical images limits transferability. We introduce MedDINOv3, a simple and effective framework for adapting DINOv3 to medical segmentation. We first revisit plain ViTs and design a simple and effective architecture with multi-scale token aggregation. Then, we perform domain-adaptive pretraining on CT-3M, a curated collection of 3.87M axial CT slices, using a multi-stage DINOv3 recipe to learn robust dense features. MedDINOv3 matches or exceeds state-of-the-art performance across four segmentation benchmarks, demonstrating the potential of vision foundation models as unified backbones for medical image segmentation. The code is available at this https URL.

[653] arXiv:2509.02424 (replaced) [pdf, html, other]
Title: Faster and Better: Reinforced Collaborative Distillation and Self-Learning for Infrared-Visible Image Fusion
Yuhao Wang, Lingjuan Miao, Zhiqiang Zhou, Yajun Qiao, Lei Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Infrared and visible image fusion plays a critical role in enhancing scene perception by combining complementary information from different modalities. Despite recent advances, achieving high-quality image fusion with lightweight models remains a significant challenge. To bridge this gap, we propose a novel collaborative distillation and self-learning framework for image fusion driven by reinforcement learning. Unlike conventional distillation, this approach not only enables the student model to absorb image fusion knowledge from the teacher model, but more importantly, allows the student to perform self-learning on more challenging samples to enhance its capabilities. Particularly, in our framework, a reinforcement learning agent explores and identifies a more suitable training strategy for the student. The agent takes both the student's performance and the teacher-student gap as inputs, which leads to the generation of challenging samples to facilitate the student's self-learning. Simultaneously, it dynamically adjusts the teacher's guidance strength based on the student's state to optimize the knowledge transfer. Experimental results demonstrate that our method can significantly improve student performance and achieve better fusion results compared to existing techniques.

[654] arXiv:2509.02479 (replaced) [pdf, html, other]
Title: SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
Zhenghai Xue, Longtao Zheng, Qian Liu, Yingru Li, Xiaosen Zheng, Zejun Ma, Bo An
Subjects: Machine Learning (cs.LG)

Large Language Models (LLMs) can significantly improve their reasoning capabilities by interacting with external tools, a paradigm known as Tool-Integrated Reasoning (TIR). However, extending TIR to multi-turn scenarios using Reinforcement Learning (RL) is often hindered by training instability and performance collapse. We identify that such instability is primarily caused by a distributional drift from external tool feedback, leading to the generation of low-probability tokens. This issue compounds over successive turns, causing catastrophic gradient norm explosions that derail the training process. To address this challenge, we introduce SimpleTIR, a plug-and-play algorithm that stabilizes multi-turn TIR training. Its core strategy is to identify and filter out trajectories containing void turns, i.e., turns that yield neither a code block nor a final answer. By removing these problematic trajectories from the policy update, SimpleTIR effectively blocks the harmful, high-magnitude gradients, thus stabilizing the learning dynamics. Extensive experiments show that SimpleTIR achieves state-of-the-art performance on challenging math reasoning benchmarks, notably elevating the AIME24 score from a text-only baseline of 22.1 to 50.5 when starting from the Qwen2.5-7B base model. Furthermore, by avoiding the constraints of supervised fine-tuning, SimpleTIR encourages the model to discover diverse and sophisticated reasoning patterns, such as self-correction and cross-validation.
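
The void-turn filter lends itself to a short sketch. The markers used to recognize a code block (a triple-backtick fence) and a final answer (a `\boxed{...}` expression), as well as the function names and the list-of-strings trajectory format, are assumptions for this example and are not taken from the paper.

```python
import re

FENCE = "`" * 3  # assumed code-block delimiter
CODE_BLOCK = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)
FINAL_ANSWER = re.compile(r"\\boxed\{.*?\}")  # assumed final-answer marker

def is_void_turn(turn_text):
    """A turn is void if it yields neither a code block nor a final answer."""
    return not (CODE_BLOCK.search(turn_text) or FINAL_ANSWER.search(turn_text))

def filter_trajectories(trajectories):
    """Keep only trajectories (lists of turn strings) with no void turn."""
    return [t for t in trajectories if not any(is_void_turn(turn) for turn in t)]

good = [FENCE + "python\nprint(2 + 2)\n" + FENCE, "The answer is \\boxed{4}."]
bad = [FENCE + "python\nprint(2 + 2)\n" + FENCE, "Let me think about this more...", "\\boxed{4}"]

kept = filter_trajectories([good, bad])
print(len(kept))
```

Only the first trajectory survives, because the second contains a turn with neither marker. In an RL loop, a filter like this would presumably run on sampled rollouts before the policy update.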

[655] arXiv:2509.02499 (replaced) [pdf, html, other]
Title: MoSEs: Uncertainty-Aware AI-Generated Text Detection via Mixture of Stylistics Experts with Conditional Thresholds
Junxi Wu, Jinpeng Wang, Zheng Liu, Bin Chen, Dongjian Hu, Hao Wu, Shu-Tao Xia
Comments: EMNLP 2025
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The rapid advancement of large language models has intensified public concerns about their potential misuse. Therefore, it is important to build trustworthy AI-generated text detection systems. Existing methods neglect stylistic modeling and mostly rely on static thresholds, which greatly limits the detection performance. In this paper, we propose the Mixture of Stylistic Experts (MoSEs) framework that enables stylistics-aware uncertainty quantification through conditional threshold estimation. MoSEs contains three core components, namely, the Stylistics Reference Repository (SRR), the Stylistics-Aware Router (SAR), and the Conditional Threshold Estimator (CTE). For input text, SAR can activate the appropriate reference data in SRR and provide them to CTE. Subsequently, CTE jointly models the linguistic statistical properties and semantic features to dynamically determine the optimal threshold. With a discrimination score, MoSEs yields prediction labels with the corresponding confidence level. Our framework achieves an average improvement of 11.34% in detection performance compared to baselines. More notably, MoSEs shows a larger improvement of 39.15% in the low-resource case. Our code is available at this https URL.

[656] arXiv:2211.01280 (replaced) [pdf, html, other]
Title: An Exponentially Converging Particle Method for the Mixed Nash Equilibrium of Continuous Games
Guillaume Wang, Lénaïc Chizat
Comments: 76 pages, 6 figures. Compared to journal version: fixed typos, made cosmetic adjustments, corrected proofs in Appendices C.2 and D.2
Journal-ref: Open Journal of Mathematical Optimization, Volume 6 (2025), article no. 1, 66 p
Subjects: Optimization and Control (math.OC); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)

We consider the problem of computing mixed Nash equilibria of two-player zero-sum games with continuous sets of pure strategies and with first-order access to the payoff function. This problem arises for example in game-theory-inspired machine learning applications, such as distributionally-robust learning. In those applications, the strategy sets are high-dimensional and thus methods based on discretisation cannot tractably return high-accuracy solutions.
In this paper, we introduce and analyze a particle-based method that enjoys guaranteed local convergence for this problem. This method consists in parametrizing the mixed strategies as atomic measures and applying proximal point updates to both the atoms' weights and positions. It can be interpreted as a time-implicit discretization of the "interacting" Wasserstein-Fisher-Rao gradient flow.
We prove that, under non-degeneracy assumptions, this method converges at an exponential rate to the exact mixed Nash equilibrium from any initialization satisfying a natural notion of closeness to optimality. We illustrate our results with numerical experiments and discuss applications to max-margin and distributionally-robust classification using two-layer neural networks, where our method has a natural interpretation as a simultaneous training of the network's weights and of the adversarial distribution.

[657] arXiv:2306.13903 (replaced) [pdf, html, other]
Title: On the local consequence of modal Product logic: standard completeness and decidability
Amanda Vidal
Subjects: Logic (math.LO); Computational Complexity (cs.CC)

We study modal extensions of product fuzzy logic in two settings: (i) Kripke models where the accessibility relation itself takes fuzzy values, and (ii) Kripke models with a classical (crisp) accessibility relation. In both cases, the models can be evaluated either over all product algebras or over a single product algebra. In this paper, we focus on the local consequence relation for these four types of modal product logics. We show that reasoning in these modal logics can be reduced to reasoning in propositional product logic. This reduction leads to two main results. First, these logics are standard complete: the corresponding logic defined using all product algebras coincides with the one defined using only the standard product algebra on the interval [0, 1]. Second, we show that these logics are decidable.

[658] arXiv:2404.00084 (replaced) [pdf, html, other]
Title: KKL theorem for the influence of a set of variables
Tomasz Przybyłowski
Comments: To appear in SIAM journal on Discrete Mathematics
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Probability (math.PR)

Consider a Boolean function f on the n-dimensional hypercube, and a set of variables (indexed by) $S \subset \{1,2,\ldots,n\}.$ The coalition influence of the variables S on a function f is the probability that after a random assignment of variables not in S, the value of f is undetermined. In this paper, we study a complementary notion, which we call the joint influence: the probability that, after a random assignment of variables not in S, the value of f is dependent on all variables in S.
We show that for an arbitrary fixed d, every Boolean function f on n variables admits a d-set of joint influence at least $\tfrac{1}{10} W^{\geq d}(f) (\frac{\log n}{n})^d$, where $W^{\geq d}(f)$ is the Fourier weight of f at degrees at least d. This result is a direct generalisation of the Kahn-Kalai-Linial theorem. Further, we give an example demonstrating essential sharpness of the above bound. In our study of the joint influence we consider another notion of multi-bit influence recently introduced by Tal.
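
For small n, the joint influence defined above can be computed by brute force, which may help make the definition concrete. The sketch below follows the stated definition directly; the function names are made up for this example, and variables are indexed from 0 rather than from 1 as in the abstract.

```python
from itertools import product

def depends_on(f, base, S, i):
    """True if flipping variable i changes f for some assignment of the variables in S."""
    for inner_bits in product([0, 1], repeat=len(S)):
        x = list(base)
        for j, b in zip(S, inner_bits):
            x[j] = b
        y = list(x)
        y[i] ^= 1
        if f(x) != f(y):
            return True
    return False

def joint_influence(f, n, S):
    """Probability, over a uniform assignment of the variables outside S,
    that the restricted function depends on every variable in S."""
    S = sorted(S)
    outside = [i for i in range(n) if i not in S]
    hits = 0
    for outer_bits in product([0, 1], repeat=len(outside)):
        base = [0] * n
        for i, b in zip(outside, outer_bits):
            base[i] = b
        if all(depends_on(f, base, S, i) for i in S):
            hits += 1
    return hits / 2 ** len(outside)

and3 = lambda x: x[0] & x[1] & x[2]
maj3 = lambda x: int(sum(x) >= 2)

print(joint_influence(and3, 3, {0, 1}))  # x2 = 1 leaves AND(x0, x1); x2 = 0 makes f constant
print(joint_influence(maj3, 3, {0, 1}))  # both restrictions (AND, OR) depend on x0 and x1
```

The enumeration is exponential in n, so this is only useful for sanity checks on tiny functions.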

[659] arXiv:2405.00282 (replaced) [pdf, html, other]
Title: MF-OML: Online Mean-Field Reinforcement Learning with Occupation Measures for Large Population Games
Anran Hu, Junzi Zhang
Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

Reinforcement learning for multi-agent games has attracted considerable attention recently. However, given the challenge of solving Nash equilibria for large population games, existing works with guaranteed polynomial complexities either focus on variants of zero-sum and potential games, or aim at solving (coarse) correlated equilibria, or require access to simulators, or rely on certain assumptions that are hard to verify. This work proposes MF-OML (Mean-Field Occupation-Measure Learning), an online mean-field reinforcement learning algorithm for computing approximate Nash equilibria of large population sequential symmetric games. MF-OML is the first fully polynomial multi-agent reinforcement learning algorithm for provably solving Nash equilibria (up to mean-field approximation gaps that vanish as the number of players $N$ goes to infinity) beyond variants of zero-sum and potential games. When evaluated by the cumulative deviation from Nash equilibria, the algorithm is shown to achieve a high probability regret bound of $\tilde{O}(M^{3/4}+N^{-1/2}M)$ for games with the strong Lasry-Lions monotonicity condition, and a regret bound of $\tilde{O}(M^{11/12}+N^{-1/6}M)$ for games with only the Lasry-Lions monotonicity condition, where $M$ is the total number of episodes and $N$ is the number of agents of the game. As a byproduct, we also obtain the first tractable globally convergent computational algorithm for computing approximate Nash equilibria of monotone mean-field games.

[660] arXiv:2409.00827 (replaced) [pdf, html, other]
Title: Log-concavity of the independence polynomials of $\mathbf{W}_{p}$ graphs
Do Trong Hoang, Vadim E. Levit, Eugen Mandrescu, My Hanh Pham
Comments: 16 pages, 2 figures
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

Let $G$ be a graph of order $n$. For a positive integer $p$, $G$ is said to be a $\mathbf{W}_{p}$ graph if $n\geq p$ and every $p$ pairwise disjoint independent sets of $G$ are contained within $p$ pairwise disjoint maximum independent sets. In this paper, we establish that every connected $\mathbf{W}_{p}$ graph $G$ is $p$-quasi-regularizable if and only if $n\geq(p+1)\cdot\alpha$, where $\alpha$ is the independence number of $G$ and $p\neq2$. This finding ensures that the independence polynomial of a connected $\mathbf{W}_{p}$ graph $G$ is log-concave whenever $(p+1)\cdot\alpha\leq n\leq p\cdot\alpha+2\sqrt{p\cdot\alpha+p}$ and $\frac{\alpha^{2}}{4\left( \alpha+1\right) }\leq p$, or $p\cdot\alpha+2\sqrt{p\cdot\alpha+p}<n\leq \frac{\left( \alpha^{2}+1\right) \cdot p+\left( \alpha-1\right) ^{2}}{\alpha-1}$ and $\frac{\alpha\left( \alpha-1\right) }{\alpha+1}\leq p$. Moreover, the clique corona graph $G\circ K_{p}$ serves as an example of the $\mathbf{W}_{p}$ graph class. We further demonstrate that the independence polynomial of $G\circ K_{p}$ is always log-concave for sufficiently large $p$.
Keywords: very well-covered graph; quasi-regularizable graph; corona graph; $\mathbf{W}_{p}$ graph; independence polynomial; log-concavity.

[661] arXiv:2409.06336 (replaced) [pdf, html, other]
Title: Towards Agentic AI on Particle Accelerators
Antonin Sulc, Thorsten Hellert, Raimund Kammering, Hayden Hoschouer, Jason St. John
Comments: 5 pages, 3 figures, Machine Learning and the Physical Sciences Workshop at the 38th Conference on Neural Information Processing Systems (NeurIPS)
Journal-ref: Machine Learning and the Physical Sciences Workshop at the 38th conference on Neural Information Processing Systems (NeurIPS) December 15, 2024
Subjects: Accelerator Physics (physics.acc-ph); Artificial Intelligence (cs.AI)

As particle accelerators grow in complexity, traditional control methods face increasing challenges in achieving optimal performance. This paper envisions a paradigm shift: a decentralized multi-agent framework for accelerator control, powered by Large Language Models (LLMs) and distributed among autonomous agents. We present a proposition of a self-improving decentralized system where intelligent agents handle high-level tasks and communication and each agent is specialized to control individual accelerator components.
This approach raises some questions: What are the future applications of AI in particle accelerators? How can we implement an autonomous complex system such as a particle accelerator where agents gradually improve through experience and human feedback? What are the implications of integrating a human-in-the-loop component for labeling operational data and providing expert guidance? We show three examples, where we demonstrate the viability of such architecture.

[662] arXiv:2410.12030 (replaced) [pdf, html, other]
Title: Clifford Strategies in Interactive Protocols are Classically Simulatable
Itay Shalit
Comments: This version includes an extended introduction. Accepted to TCC 2025
Subjects: Quantum Physics (quant-ph); Computational Complexity (cs.CC); Cryptography and Security (cs.CR)

$\text{MIP}^\ast$ is the class of languages decidable by an efficient classical verifier interacting with multiple quantum provers that share entangled qubits but cannot communicate. Notably, $\text{MIP}^\ast$ was proved to equal $\text{RE}$, the class of all recursively enumerable languages.
We introduce the complexity class $\text{Clifford-MIP}^\ast$, which restricts quantum provers to Clifford operations and classical post-processing of measurement results, while still allowing shared entangled qubits in any quantum state. We show that any strategy in this model can be simulated by classical provers with shared random bits, and therefore admits a local hidden-variable description. Consequently, $\text{Clifford-MIP}^\ast = \text{MIP}$, a vastly smaller complexity class compared to $\text{RE}$.
Moreover, we resolve an open question posed by Kalai et al. (STOC 2023), by showing that quantum advantage in any single-round non-local game requires at least two provers operating outside the $\text{Clifford-MIP}^\ast$ computational model. This rules out a proposed approach for significantly improving the efficiency of quantum advantage tests that are based on compiling non-local games into single-prover interactive protocols.

[663] arXiv:2410.15361 (replaced) [pdf, html, other]
Title: A Novel Characterization of the Population Area Under the Risk Coverage Curve (AURC) and Rates of Finite Sample Estimators
Han Zhou, Jordy Van Landeghem, Teodora Popordanoska, Matthew B. Blaschko
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The selective classifier (SC) has been proposed for rank based uncertainty thresholding, which could have applications in safety critical areas such as medical diagnostics, autonomous driving, and the justice system. The Area Under the Risk-Coverage Curve (AURC) has emerged as the foremost evaluation metric for assessing the performance of SC systems. In this work, we present a formal statistical formulation of population AURC, presenting an equivalent expression that can be interpreted as a reweighted risk function. Through Monte Carlo methods, we derive empirical AURC plug-in estimators for finite sample scenarios. The weight estimators associated with these plug-in estimators are shown to be consistent, with low bias and tightly bounded mean squared error (MSE). The plug-in estimators are proven to converge at a rate of $\mathcal{O}(\sqrt{\ln(n)/n})$ demonstrating statistical consistency. We empirically validate the effectiveness of our estimators through experiments across multiple datasets, model architectures, and confidence score functions (CSFs), demonstrating consistency and effectiveness in fine-tuning AURC performance.
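
To make the "reweighted risk" reading concrete, here is a minimal sketch of one common empirical form of AURC: rank samples by confidence, compute the selective risk at each coverage level $k/n$, and average. Rearranging that double sum gives a weighted sum of per-sample losses, which the second function computes. The function names and the rank weights below are assumptions that follow from this rearrangement; they are not claimed to be the plug-in estimators or weight estimators analysed in the paper.

```python
def _rank_by_confidence(confidences):
    """Indices sorted from most to least confident."""
    return sorted(range(len(confidences)), key=lambda i: -confidences[i])


def aurc_curve_average(confidences, losses):
    """Average of the selective risks at coverages 1/n, 2/n, ..., 1."""
    n = len(losses)
    if n == 0:
        raise ValueError("need at least one sample")
    running_loss = 0.0
    total = 0.0
    for k, i in enumerate(_rank_by_confidence(confidences), start=1):
        running_loss += losses[i]
        total += running_loss / k       # selective risk at coverage k/n
    return total / n


def aurc_reweighted(confidences, losses):
    """Same quantity written as sum_i w_i * loss_(i), where the sample at
    confidence rank i gets weight w_i = (1/n) * sum_{k=i}^{n} 1/k."""
    n = len(losses)
    if n == 0:
        raise ValueError("need at least one sample")
    weights = [0.0] * n
    tail = 0.0
    for rank in range(n, 0, -1):        # accumulate the harmonic tail
        tail += 1.0 / rank
        weights[rank - 1] = tail / n
    order = _rank_by_confidence(confidences)
    return sum(w * losses[i] for w, i in zip(weights, order))


conf = [0.9, 0.8, 0.6, 0.3]
loss = [0.0, 0.0, 1.0, 1.0]             # 0/1 losses, errors at low confidence
print(aurc_curve_average(conf, loss), aurc_reweighted(conf, loss))
```

The two functions agree up to floating-point rounding, and the weights make explicit that losses on high-confidence samples count more, since those samples are included at every coverage level.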

[664] arXiv:2411.00446 (replaced) [pdf, html, other]
Title: A Lorentz-Equivariant Transformer for All of the LHC
Johann Brehmer, Víctor Bresó, Pim de Haan, Tilman Plehn, Huilin Qu, Jonas Spinner, Jesse Thaler
Comments: 27 pages, 7 figures, 9 tables. v2: added table 5, improved tagging results. v3: added table 7, incorporate feedback
Subjects: High Energy Physics - Phenomenology (hep-ph); Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex)

We show that the Lorentz-Equivariant Geometric Algebra Transformer (L-GATr) yields state-of-the-art performance for a wide range of machine learning tasks at the Large Hadron Collider. L-GATr represents data in a geometric algebra over space-time and is equivariant under Lorentz transformations. The underlying architecture is a versatile and scalable transformer, which is able to break symmetries if needed. We demonstrate the power of L-GATr for amplitude regression and jet classification, and then benchmark it as the first Lorentz-equivariant generative network. For all three LHC tasks, we find significant improvements over previous architectures.

[665] arXiv:2411.10775 (replaced) [pdf, html, other]
Title: Beyond Feature Mapping GAP: Integrating Real HDRTV Priors for Superior SDRTV-to-HDRTV Conversion
Gang He, Kepeng Xu, Li Xu, Wenxin Yu, Xianyun Wu
Comments: accepted by IJCAI 2025
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

The rise of HDR-WCG display devices has highlighted the need to convert SDRTV to HDRTV, as most video sources are still in SDR. Existing methods primarily focus on designing neural networks to learn a single-style mapping from SDRTV to HDRTV. However, the limited information in SDRTV and the diversity of styles in real-world conversions render this process an ill-posed problem, thereby constraining the performance and generalization of these methods. Inspired by generative approaches, we propose a novel method for SDRTV to HDRTV conversion guided by real HDRTV priors. Despite the limited information in SDRTV, introducing real HDRTV as reference priors significantly constrains the solution space of the originally high-dimensional ill-posed problem. This shift transforms the task from solving an unreferenced prediction problem to making a referenced selection, thereby markedly enhancing the accuracy and reliability of the conversion process. Specifically, our approach comprises two stages: the first stage employs a Vector Quantized Generative Adversarial Network to capture HDRTV priors, while the second stage matches these priors to the input SDRTV content to recover realistic HDRTV outputs. We evaluate our method on public datasets, demonstrating its effectiveness with significant improvements in both objective and subjective metrics across real and synthetic datasets.

[666] arXiv:2412.02670 (replaced) [pdf, html, other]
Title: The Broader Landscape of Robustness in Algorithmic Statistics
Gautam Kamath
Comments: To appear in IEEE BITS the Information Theory Magazine
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Statistics Theory (math.ST)

The last decade has seen a number of advances in computationally efficient algorithms for statistical methods subject to robustness constraints. An estimator may be robust in a number of different ways: to contamination of the dataset, to heavy-tailed data, or in the sense that it preserves privacy of the dataset. We survey recent results in these areas with a focus on the problem of mean estimation, drawing technical and conceptual connections between the various forms of robustness, showing that the same underlying algorithmic ideas lead to computationally efficient estimators in all these settings.

[667] arXiv:2501.01483 (replaced) [pdf, html, other]
Title: Embedding Similarity Guided License Plate Super Resolution
Abderrezzaq Sendjasni, Mohamed-Chaker Larabi
Comments: Accepted in Neurocomputing
Journal-ref: Neurocomputing 651, 2025, 130657
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Super-resolution (SR) techniques play a pivotal role in enhancing the quality of low-resolution images, particularly for applications such as security and surveillance, where accurate license plate recognition is crucial. This study proposes a novel framework that combines pixel-based loss with embedding similarity learning to address the unique challenges of license plate super-resolution (LPSR). The introduced pixel and embedding consistency loss (PECL) integrates a Siamese network and applies contrastive loss to enforce embedding similarity and thereby improve perceptual and structural fidelity. By effectively balancing pixel-wise accuracy with embedding-level consistency, the framework achieves superior alignment of fine-grained features between high-resolution (HR) and super-resolved (SR) license plates. Extensive experiments on the CCPD and PKU datasets validate the efficacy of the proposed framework, demonstrating consistent improvements over state-of-the-art methods in terms of PSNR, SSIM, LPIPS, and optical character recognition (OCR) accuracy. These results highlight the potential of embedding similarity learning to advance both perceptual quality and task-specific performance in extreme super-resolution scenarios.

[668] arXiv:2501.03379 (replaced) [pdf, html, other]
Title: The hard-core model in graph theory
Ewan Davies, Ross J. Kang
Comments: 35 pages; prepared as a chapter for a forthcoming volume, Topics in Probabilistic Graph Theory; v2 refers to some selected newer developments
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Probability (math.PR)

An independent set may not contain both a vertex and one of its neighbours. This basic fact makes the uniform distribution over independent sets rather special. We consider the hard-core model, an essential generalization of the uniform distribution over independent sets. We show how its local analysis yields remarkable insights into the global structure of independent sets in the host graph, in connection with, for instance, Ramsey numbers, graph colourings, and sphere packings.
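
For readers new to the model, the following sketch spells out the standard definition on a tiny graph: an independent set $I$ receives weight $\lambda^{|I|}$, the partition function $Z$ is the sum of these weights, and $\lambda=1$ recovers the uniform distribution over independent sets. The helper names are hypothetical and the brute-force enumeration is only meant for very small graphs; it illustrates the definition, not the local analysis techniques surveyed in the chapter.

```python
from fractions import Fraction
from itertools import combinations


def independent_sets(n, edges):
    """Yield every independent set of a graph on vertices 0..n-1."""
    edge_set = {frozenset(e) for e in edges}
    for k in range(n + 1):
        for s in combinations(range(n), k):
            if all(frozenset(p) not in edge_set for p in combinations(s, 2)):
                yield s


def hard_core_stats(n, edges, lam):
    """Return (Z, expected size) for the hard-core model at fugacity lam,
    where each independent set I has probability lam**len(I) / Z."""
    lam = Fraction(lam)
    partition = Fraction(0)
    weighted_size = Fraction(0)
    for s in independent_sets(n, edges):
        weight = lam ** len(s)
        partition += weight
        weighted_size += len(s) * weight
    return partition, weighted_size / partition


# Path on three vertices 0 - 1 - 2: independent sets are
# {}, {0}, {1}, {2}, {0, 2}, so Z(lam) = 1 + 3*lam + lam**2.
print(hard_core_stats(3, [(0, 1), (1, 2)], 1))
```

Exact rational arithmetic via `Fraction` keeps the small example free of rounding; at $\lambda=1$ the path on three vertices has $Z=5$ and expected independent-set size $1$.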

[669] arXiv:2501.13842 (replaced) [pdf, html, other]
Title: On Supportedness in Multi-Objective Combinatorial Optimization
David Könen, Michael Stiglmayr
Comments: arXiv admin note: text overlap with arXiv:2305.12867
Subjects: Optimization and Control (math.OC); Discrete Mathematics (cs.DM)

This paper addresses an inconsistency in various definitions of supported non-dominated points within multi-objective combinatorial problems (MOCO). MOCO problems are known to contain supported and unsupported non-dominated points, with the latter typically outnumbering the former. Supported points are, in general, easier to determine, can serve as representations, and are used in two-phase methods to generate the entire non-dominated point set. Despite their importance, several different characterizations for supported efficient solutions (and supported non-dominated points) are used in the literature.
While these definitions are equivalent for multi-objective linear problems, they can yield different sets of supported non-dominated points for MOCO problems. We show by an example that these definitions are not equivalent for MOCO or general multi-objective optimization problems. Moreover, we analyze the structural and computational properties of the resulting sets of supported non-dominated points. These considerations motivate us to summarize equivalent definitions and characterizations for supported efficient solutions and to introduce a distinction between supported and weakly supported efficient solutions.

[670] arXiv:2502.18393 (replaced) [pdf, other]
Title: Learning sparse generalized linear models with binary outcomes via iterative hard thresholding
Namiko Matsumoto, Arya Mazumdar
Subjects: Statistics Theory (math.ST); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)

In statistics, generalized linear models (GLMs) are widely used for modeling data and can expressively capture potential nonlinear dependence of the model's outcomes on its covariates. Within the broad family of GLMs, those with binary outcomes, which include logistic and probit regressions, are motivated by common tasks such as binary classification with (possibly) non-separable data. In addition, in modern machine learning and statistics, data is often high-dimensional yet has a low intrinsic dimension, making sparsity constraints in models another reasonable consideration. In this work, we propose to use and analyze an iterative hard thresholding (projected gradient descent on the ReLU loss) algorithm, called binary iterative hard thresholding (BIHT), for parameter estimation in sparse GLMs with binary outcomes. We establish that BIHT is statistically efficient and converges to the correct solution for parameter estimation in a general class of sparse binary GLMs. Unlike many other methods for learning GLMs, including maximum likelihood estimation, generalized approximate message passing, and GLM-tron (Kakade et al. 2011; Bahmani et al. 2016), BIHT does not require knowledge of the GLM's link function, offering flexibility and generality in allowing the algorithm to learn arbitrary binary GLMs. As two applications, logistic and probit regression are additionally studied. In this regard, it is shown that in logistic regression, the algorithm is in fact statistically optimal in the sense that the order-wise sample complexity matches (up to logarithmic factors) the lower bound obtained previously. To the best of our knowledge, this is the first work achieving statistical optimality for logistic regression in all noise regimes with a computationally efficient algorithm. Moreover, for probit regression, our sample complexity is on the same order as that obtained for logistic regression.
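
The iteration described above can be sketched as follows, assuming the commonly stated normalized BIHT update: take a step along $A^{\top}(y-\operatorname{sign}(Ax))$, which is (up to a constant factor) a subgradient step on the one-sided loss $\sum_i \max(0,-y_i\langle a_i,x\rangle)$, keep the $k$ largest-magnitude entries, and rescale to unit norm. The step size, iteration count, zero initialization, and function names are assumptions made for illustration; the exact variant and parameter choices analysed in the paper may differ. The code is dependency-free for clarity rather than speed.

```python
import math


def sign(v):
    """Sign with sign(0) = 0."""
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)


def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest."""
    keep = sorted(range(len(x)), key=lambda j: abs(x[j]), reverse=True)[:k]
    out = [0.0] * len(x)
    for j in keep:
        out[j] = x[j]
    return out


def biht(A, y, k, step=1.0, iters=50):
    """Sketch of normalized binary iterative hard thresholding.

    A: list of m covariate rows (each of length d); y: labels in {-1, +1};
    k: sparsity level. Returns a k-sparse unit-norm estimate."""
    m, d = len(A), len(A[0])
    x = [0.0] * d
    for _ in range(iters):
        # residual between observed labels and currently predicted signs
        residual = [
            y[i] - sign(sum(a * b for a, b in zip(A[i], x))) for i in range(m)
        ]
        # gradient-type step: x + (step / m) * A^T residual
        moved = [
            x[j] + (step / m) * sum(A[i][j] * residual[i] for i in range(m))
            for j in range(d)
        ]
        x = hard_threshold(moved, k)          # project onto k-sparse vectors
        norm = math.sqrt(sum(v * v for v in x))
        if norm > 0:
            x = [v / norm for v in x]         # project onto the unit sphere
    return x
```

Note that the update uses only the observed binary labels and the covariates, so nothing in this sketch depends on the link function.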

[671] arXiv:2502.21269 (replaced) [pdf, html, other]
Title: Dynamical Decoupling of Generalization and Overfitting in Large Two-Layer Networks
Andrea Montanari, Pierfrancesco Urbani
Comments: 85 pages; 62 pdf figures
Subjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG)

Understanding the inductive bias and generalization properties of large overparametrized machine learning models requires characterizing the dynamics of the training algorithm. We study the learning dynamics of large two-layer neural networks via dynamical mean field theory, a well-established technique of non-equilibrium statistical physics. We show that, for large network width, the training dynamics exhibits a separation of timescales which implies: $(i)$ The emergence of a slow time scale associated with the growth in Gaussian/Rademacher complexity of the network; $(ii)$ Inductive bias towards small complexity if the initialization has small enough complexity; $(iii)$ A dynamical decoupling between feature learning and overfitting regimes; $(iv)$ A non-monotone behavior of the test error, associated with a `feature unlearning' regime at large times.

[672] arXiv:2503.03600 (replaced) [pdf, html, other]
Title: Bounding the computational power of bosonic systems
Varun Upreti, Dorian Rudolph, Ulysse Chabaud
Comments: 11 pages, 3 figures, 10 pages of appendix
Subjects: Quantum Physics (quant-ph); Computational Complexity (cs.CC)

Bosonic quantum systems operate in an infinite-dimensional Hilbert space, unlike discrete-variable quantum systems. This distinct mathematical structure leads to fundamental differences in quantum information processing, such as an exponentially greater complexity of state tomography [MMB+24] or a factoring algorithm in constant space [BCCRK24]. Yet, it remains unclear whether this structural difference of bosonic systems may also translate to a practical computational advantage over finite-dimensional quantum computers. Here we take a step towards answering this question by showing that universal bosonic quantum computations can be simulated in exponential time on a classical computer, significantly improving the best previous upper bound requiring exponential memory [CJMM24]. In complexity-theoretic terms, we improve the best upper bound on $\textsf{CVBQP}$ from $\textsf{EXPSPACE}$ to $\textsf{EXP}$. This result is achieved using a simulation strategy based on finite energy cutoffs and approximate coherent state decompositions. While we propose ways to potentially refine this bound, we also present arguments supporting the plausibility of an exponential computational advantage of bosonic quantum computers over their discrete-variable counterparts. Furthermore, we emphasize the role of circuit energy as a resource and discuss why it may act as the fundamental bottleneck in realizing this advantage in practical implementations.

[673] arXiv:2503.22896 (replaced) [pdf, html, other]
Title: Representation and Stability Analysis of 1D PDEs with Periodic Boundary Conditions
Declan Jagt, Sergei Chernyshenko, Matthew Peet
Subjects: Analysis of PDEs (math.AP); Systems and Control (eess.SY); Optimization and Control (math.OC)

PDEs with periodic boundary conditions are frequently used to model processes in large spatial environments, assuming solutions to extend periodically beyond some bounded interval. However, solutions to these PDEs often do not converge to a unique equilibrium, but instead converge to non-stationary trajectories existing in the nullspace of the spatial differential operator (e.g. $\frac{\partial^2}{\partial x^2}$). To analyse this convergence behaviour, in this paper, it is shown how such trajectories can be modeled for a broad class of linear, 2nd order, 1D PDEs with periodic as well as more general boundary conditions, using the Partial Integral Equation (PIE) representation. In particular, it is first shown how any PDE state satisfying these boundary conditions can be uniquely expressed in terms of two components, existing in the image and the nullspace of the differential operator $\frac{\partial^2}{\partial x^2}$, respectively. An equivalent representation of linear PDEs is then derived as a PIE, explicitly defining the dynamics of both state components. Finally, a notion of exponential stability is defined that requires only one of the state components to converge to zero, and it is shown how this stability notion can be tested by solving a linear operator inequality. The proposed methodology is applied to examples of heat and wave equations, demonstrating that exponential stability can be verified with tight bounds on the rate of decay.

[674] arXiv:2504.01463 (replaced) [pdf, html, other]
Title: Versatile silicon integrated photonic processor: a reconfigurable solution for next-generation AI clusters
Ying Zhu, Yifan Liu, Xinyu Yang, Kailai Liu, Xin Hua, Ming Luo, Jia Liu, Siyao Chang, Shengxiang Zhang, Miao Wu, Zhicheng Wang, Hongguang Zhang, Daigao Chen, Xi Xiao, Shaohua Yu
Subjects: Optics (physics.optics); Hardware Architecture (cs.AR)

Artificial intelligence models pose serious challenges in intensive computing and high-bandwidth communication for conventional electronic circuit-based computing clusters. Silicon photonic technologies, owing to their high speed, low latency, large bandwidth, and complementary metal-oxide-semiconductor compatibility, have been widely implemented for data transfer and actively explored as photonic neural networks in AI clusters. However, current silicon photonic integrated chips lack adaptability for multifunctional use and hardware-software systematic coordination. Here, we develop a reconfigurable silicon photonic processor with $40$ programmable unit cells integrating over $160$ components, which, to the best of our knowledge, is the first to realize diverse functions with a chip for AI clusters, from computing acceleration and signal processing to network switching and secure encryption. Through a self-developed automated testing, compilation, and tuning framework for the processor without in-network monitoring photodetectors, we implement $4\times4$ dual-direction unitary and $3\times3$ uni-direction non-unitary matrix multiplications, neural networks for image recognition, micro-ring modulator wavelength locking, $4\times4$ photonic channel switching, and silicon photonic physical unclonable functions. This optoelectronic processing system, incorporating the photonic processor and its software stack, paves the way for both advanced photonic system-on-chip design and the construction of photo-electronic AI clusters.

[675] arXiv:2504.02385 (replaced) [pdf, html, other]
Title: Quantum singular value transformation without block encodings: Near-optimal complexity with minimal ancilla
Shantanav Chakraborty, Soumyabrata Hazra, Tongyang Li, Changpeng Shao, Xinzhao Wang, Yuxin Zhang
Comments: This article has been split into two parts. This version contains the first part and is about QSVT without using block encoding with one ancilla and near optimal circuit depth. The other part is about randomized QSVT and will be available on arXiv soon
Subjects: Quantum Physics (quant-ph); Data Structures and Algorithms (cs.DS)

We develop new algorithms for Quantum Singular Value Transformation (QSVT), a unifying framework that encapsulates most known quantum algorithms and serves as the foundation for new ones. Existing implementations of QSVT rely on block encoding, incurring an intrinsic $O(\log L)$ ancilla overhead and circuit depth $\widetilde{O}(L d\lambda )$ for polynomial transformations of a Hamiltonian $H=\sum_{k=1}^L H_k$, where $d$ is the polynomial degree and $\lambda=\sum_{k}\|H_k\|$.
We introduce a simple yet powerful approach that utilizes only basic Hamiltonian simulation techniques, namely, Trotter methods, to: (i) eliminate the need for block encoding, (ii) reduce the ancilla overhead to only a single qubit, and (iii) still maintain near-optimal complexity. Our method achieves a circuit depth of $\widetilde{O}(L(d\lambda_{\mathrm{comm}})^{1+o(1)})$, without requiring any complicated multi-qubit controlled gates. Moreover, $\lambda_{\mathrm{comm}}$ depends on the nested commutators of the terms of $H$ and can be substantially smaller than $\lambda$ for many physically relevant Hamiltonians, a feature absent in standard QSVT. To achieve these results, we make use of Richardson extrapolation in a novel way, systematically eliminating errors in any interleaved sequence of arbitrary unitaries and Hamiltonian evolution operators, thereby establishing a general framework that encompasses QSVT but is more broadly applicable.
As applications, we develop end-to-end quantum algorithms for solving linear systems and estimating ground state properties of Hamiltonians, both achieving near-optimal complexity without relying on oracular access. Overall, our results establish a new framework for quantum algorithms, significantly reducing hardware overhead while maintaining near-optimal performance, with implications for both near-term and fault-tolerant quantum computing.

[676] arXiv:2504.06463 (replaced) [pdf, html, other]
Title: AstroClearNet: Deep image prior for multi-frame astronomical image restoration
Yashil Sukurdeep, Fausto Navarro, Tamás Budavári
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Computer Vision and Pattern Recognition (cs.CV)

Recovering high-fidelity images of the night sky from blurred observations is a fundamental problem in astronomy, where traditional methods typically fall short. In ground-based astronomy, combining multiple exposures to enhance signal-to-noise ratios is further complicated by variations in the point-spread function caused by atmospheric turbulence. In this work, we present a self-supervised multi-frame method, based on deep image priors, for denoising, deblurring, and coadding ground-based exposures. Central to our approach is a carefully designed convolutional neural network that integrates information across multiple observations and enforces physically motivated constraints. We demonstrate the method's potential by processing Hyper Suprime-Cam exposures, yielding promising preliminary results with sharper restored images.

[677] arXiv:2504.10453 (replaced) [pdf, html, other]
Title: Anchors no more: Using peculiar velocities to constrain $H_0$ and the primordial Universe without calibrators
Davide Piras, Francesco Sorrenti, Ruth Durrer, Martin Kunz
Comments: 23 pages, 6 figures. Minor changes to match version published in JCAP. Code available at this https URL
Journal-ref: Journal of Cosmology and Astroparticle Physics, Volume 2025, September 2025
Subjects: Cosmology and Nongalactic Astrophysics (astro-ph.CO); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG); General Relativity and Quantum Cosmology (gr-qc)

We develop a novel approach to constrain the Hubble parameter $H_0$ and the primordial power spectrum amplitude $A_\mathrm{s}$ using type Ia supernovae (SNIa) data. By considering SNIa as tracers of the peculiar velocity field, we can model their distance and their covariance as a function of cosmological parameters without the need for calibrators like Cepheids; this yields a new independent probe of the large-scale structure based on SNIa data without distance anchors. Crucially, we implement a differentiable pipeline in JAX, including efficient emulators and affine sampling, reducing inference time from years to hours on a single GPU. We first validate our method on mock datasets, demonstrating that we can constrain $H_0$ and $\log 10^{10}A_\mathrm{s}$ within $10\%$ and $15\%$, respectively, using $\mathcal{O}(10^3)$ SNIa. We then test our pipeline with SNIa from an $N$-body simulation, obtaining $6\%$-level unbiased constraints on $H_0$ with a moderate noise level. We finally apply our method to Pantheon+ data, constraining $H_0$ at the $15\%$ level without Cepheids when fixing $A_\mathrm{s}$ to its $\it{Planck}$ value. On the other hand, we obtain $20\%$-level constraints on $\log 10^{10}A_\mathrm{s}$ in agreement with $\it{Planck}$ when including Cepheids in the analysis. In light of upcoming observations of low redshift SNIa from the Zwicky Transient Facility and the Vera Rubin Legacy Survey of Space and Time, surveys for which our method will develop its full potential, we make our code publicly available.

[678] arXiv:2504.13037 (replaced) [pdf, other]
Title: Towards Cardiac MRI Foundation Models: Comprehensive Visual-Tabular Representations for Whole-Heart Assessment and Beyond
Yundi Zhang, Paul Hager, Che Liu, Suprosanna Shit, Chen Chen, Daniel Rueckert, Jiazhen Pan
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Cardiac magnetic resonance imaging is the gold standard for non-invasive cardiac assessment, offering rich spatio-temporal views of the cardiac anatomy and physiology. Patient-level health factors, such as demographics, metabolic, and lifestyle, are known to substantially influence cardiovascular health and disease risk, yet remain uncaptured by CMR alone. To holistically understand cardiac health and to enable the best possible interpretation of an individual's disease risk, CMR and patient-level factors must be jointly exploited within an integrated framework. Recent multi-modal approaches have begun to bridge this gap, yet they often rely on limited spatio-temporal data and focus on isolated clinical tasks, thereby hindering the development of a comprehensive representation for cardiac health evaluation. To overcome these limitations, we introduce ViTa, a step toward foundation models that delivers a comprehensive representation of the heart and a precise interpretation of individual disease risk. Leveraging data from 42,000 UK Biobank participants, ViTa integrates 3D+T cine stacks from short-axis and long-axis views, enabling a complete capture of the cardiac cycle. These imaging data are then fused with detailed tabular patient-level factors, enabling context-aware insights. This multi-modal paradigm supports a wide spectrum of downstream tasks, including cardiac phenotype and physiological feature prediction, segmentation, and classification of cardiac and metabolic diseases within a single unified framework. By learning a shared latent representation that bridges rich imaging features and patient context, ViTa moves beyond traditional, task-specific models toward a universal, patient-specific understanding of cardiac health, highlighting its potential to advance clinical utility and scalability in cardiac analysis.

[679] arXiv:2505.04255 (replaced) [pdf, other]
Title: Model-based learning for joint channel estimation and hybrid MIMO precoding
Nay Klaimi (IETR, INSA Rennes), Amira Bedoui (IETR, INSA Rennes), Clément Elvira (IETR), Philippe Mary (INSA Rennes, IETR), Luc Le Magoarou (INSA Rennes, IETR)
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Hybrid precoding is a key ingredient of cost-effective massive multiple-input multiple-output transceivers. However, jointly setting digital and analog precoders to optimally serve multiple users is a difficult optimization problem. Moreover, it relies heavily on precise knowledge of the channels, which is difficult to obtain, especially when considering realistic systems comprising hardware impairments. In this paper, a joint channel estimation and hybrid precoding method is proposed, which consists of an end-to-end architecture taking received pilots as inputs and outputting precoders. The resulting neural network is fully model-based, making it lightweight and interpretable with very few learnable parameters. The channel estimation step is performed using the unfolded matching pursuit algorithm, accounting for imperfect knowledge of the antenna system, while the precoding step is done via unfolded projected gradient ascent. The great potential of the proposed method is empirically demonstrated on realistic synthetic channels.

[680] arXiv:2505.04283 (replaced) [pdf, html, other]
Title: On multiplicities of interpoint distances
Felix Christian Clemen, Adrian Dumitrescu, Dingyuan Liu
Comments: 11 pages, 4 figures
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

Given a set $X\subseteq\mathbb{R}^2$ of $n$ points and a distance $d>0$, the multiplicity of $d$ is the number of times the distance $d$ appears between points in $X$. Let $a_1(X) \geq a_2(X) \geq \cdots \geq a_m(X)$ denote the multiplicities of the $m$ distances determined by $X$ and let $a(X)=\left(a_1(X),\dots,a_m(X)\right)$. In this paper, we study several questions from Erdős's time regarding distance multiplicities. Among other results, we show that:
(1) If $X$ is convex or ``not too convex'', then there exists a distance other than the diameter that has multiplicity at most $n$.
(2) There exists a set $X \subseteq \mathbb{R}^2$ of $n$ points, such that many distances occur with high multiplicity. In particular, at least $n^{\Omega(1/\log\log{n})}$ distances have superlinear multiplicity in $n$.
(3) For any (not necessarily fixed) integer $1\leq k\leq\log{n}$, there exists $X\subseteq\mathbb{R}^2$ of $n$ points, such that the difference between the $k^{\text{th}}$ and $(k+1)^{\text{th}}$ largest multiplicities is at least $\Omega(\frac{n\log{n}}{k})$. Moreover, the distances in $X$ with the largest $k$ multiplicities can be prescribed.
(4) For every $n\in\mathbb{N}$, there exists $X\subseteq\mathbb{R}^2$ of $n$ points, not all collinear or cocircular, such that $a(X)= (n-1,n-2,\ldots,1)$. There also exists $Y\subseteq\mathbb{R}^2$ of $n$ points with pairwise distinct distance multiplicities and $a(Y) \neq (n-1,n-2,\ldots,1)$.
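
The multiplicity vector $a(X)$ defined above is easy to compute for small point sets, and the sketch below does so for points with integer coordinates, grouping pairs by exact squared distance to avoid floating-point comparisons. The function name `distance_multiplicities` is hypothetical. The first example, equally spaced points on a line, attains $a(X)=(n-1,n-2,\ldots,1)$ but is collinear, so it is exactly the kind of configuration that result (4) excludes; it is shown only to illustrate the definition.

```python
from collections import Counter
from itertools import combinations


def distance_multiplicities(points):
    """Return a(X): multiplicities of the distinct pairwise distances,
    sorted in non-increasing order. Points are (x, y) integer pairs, so
    squared distances can be compared exactly."""
    counts = Counter(
        (px - qx) ** 2 + (py - qy) ** 2
        for (px, py), (qx, qy) in combinations(points, 2)
    )
    return sorted(counts.values(), reverse=True)


# Five equally spaced collinear points: distances 1, 2, 3, 4 occur
# 4, 3, 2, 1 times respectively.
print(distance_multiplicities([(i, 0) for i in range(5)]))

# Unit square: four sides of length 1 and two diagonals of length sqrt(2).
print(distance_multiplicities([(0, 0), (1, 0), (1, 1), (0, 1)]))
```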

[681] arXiv:2505.16893 (replaced) [pdf, html, other]
Title: Statistical Test for Saliency Maps of Graph Neural Networks via Selective Inference
Shuichi Nishino, Tomohiro Shiraishi, Teruyuki Katsuoka, Ichiro Takeuchi
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Graph Neural Networks (GNNs) have gained prominence for their ability to process graph-structured data across various domains. However, interpreting GNN decisions remains a significant challenge, leading to the adoption of saliency maps for identifying salient subgraphs composed of influential nodes and edges. Despite their utility, the reliability of GNN saliency maps has been questioned, particularly in terms of their robustness to input noise. In this study, we propose a statistical testing framework to rigorously evaluate the significance of saliency maps. Our main contribution lies in addressing the inflation of the Type I error rate caused by double-dipping of data, leveraging the framework of Selective Inference. Our method provides statistically valid $p$-values while controlling the Type I error rate, ensuring that identified salient subgraphs contain meaningful information rather than random artifacts. The method is applicable to a variety of saliency methods with piecewise linearity (e.g., Class Activation Mapping). We validate our method on synthetic and real-world datasets, demonstrating its capability in assessing the reliability of GNN interpretations.

[682] arXiv:2506.05657 (replaced) [pdf, html, other]
Title: Emulating compact binary population synthesis simulations with uncertainty quantification and model comparison using Bayesian normalizing flows
Anarya Ray
Comments: 16 pages, 4 figures
Subjects: High Energy Astrophysical Phenomena (astro-ph.HE); Machine Learning (cs.LG); General Relativity and Quantum Cosmology (gr-qc)

Population synthesis simulations of compact binary coalescences (CBCs) play a crucial role in extracting astrophysical insights from an ensemble of gravitational wave (GW) observations. However, realistic simulations can be costly to implement for a dense grid of initial conditions. Normalizing flows can emulate population synthesis runs to enable simulation-based inference from observed catalogs and data augmentation for feature prediction in rarely synthesizable sub-populations. However, flow predictions can be fraught with uncertainties, especially for sparse training sets. In this work, we develop a method for quantifying and marginalizing uncertainties in the emulators by implementing the Bayesian Normalizing flow, a conditional density estimator constructed from Bayesian neural networks. Using the exact likelihood function naturally associated with density estimators, we sample the posterior distribution of flow parameters with suitably chosen priors to quantify and marginalize over flow uncertainties. We demonstrate the accuracy, calibration, inference, and data-augmentation impacts of the estimated uncertainties for simulations of binary black hole populations formed through common envelope evolution. We outline the applications of the proposed methodology in the context of simulation-based inference from growing GW catalogs and feature prediction, with state-of-the-art binary evolution simulators, now marginalized over model and data uncertainties.

[683] arXiv:2506.18072 (replaced) [pdf, html, other]
Title: Multimodal Medical Image Binding via Shared Text Embeddings
Yunhao Liu, Suyang Xi, Shiqi Liu, Hong Ding, Chicheng Jin, Chong Zhong, Junjun He, Catherine C. Liu, Yiqing Shen
Comments: 10 pages, 3 figures
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Medical image analysis increasingly relies on the integration of multiple imaging modalities to capture complementary anatomical and functional information, enabling more accurate diagnosis and treatment planning. Achieving aligned feature representations across these diverse modalities is therefore important for effective multimodal analysis. While contrastive language-image pre-training (CLIP) and its variants have enabled image-text alignments, they require explicitly paired data between any two modalities, which is difficult to acquire in medical contexts. To address this gap, we present Multimodal Medical Image Binding with Text (M\textsuperscript{3}Bind), a novel pre-training framework that enables seamless alignment of multiple medical imaging modalities through a shared text representation space without requiring explicit paired data between any two medical image modalities. Specifically, based on the insight that different images can naturally bind with text, M\textsuperscript{3}Bind first fine-tunes pre-trained CLIP-like image-text models to align their modality-specific text embedding spaces while preserving their original image-text alignments. Subsequently, we distill these modality-specific text encoders into a unified model, creating a shared text embedding space. Experiments on X-ray, CT, retina, ECG, and pathological images on multiple downstream tasks demonstrate that M\textsuperscript{3}Bind achieves state-of-the-art performance in zero-shot classification, few-shot classification, and cross-modal retrieval tasks compared to its CLIP-like counterparts. These results validate M\textsuperscript{3}Bind's effectiveness in achieving cross-image-modal alignment for medical analysis.

[684] arXiv:2507.04233 (replaced) [pdf, html, other]
Title: Grid-Reg: Detector-Free Gridized Feature Learning and Matching for Large-Scale SAR-Optical Image Registration
Xiaochen Wei, Weiwei Guo, Zenghui Zhang, Wenxian Yu
Comments: This work has been submitted to the IEEE for possible publication
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

It is highly challenging to register large-scale, heterogeneous SAR and optical images, particularly across platforms, due to significant geometric, radiometric, and temporal differences, which most existing methods struggle to address. To overcome these challenges, we propose Grid-Reg, a grid-based multimodal registration framework comprising a domain-robust descriptor extraction network, Hybrid Siamese Correlation Metric Learning Network (HSCMLNet), and a grid-based solver (Grid-Solver) for transformation parameter estimation. In heterogeneous imagery with large modality gaps and geometric differences, obtaining accurate correspondences is inherently difficult. To robustly measure similarity between gridded patches, HSCMLNet integrates a hybrid Siamese module with a correlation metric learning module (CMLModule) based on equiangular unit basis vectors (EUBVs), together with a manifold consistency loss to promote modality-invariant, discriminative feature learning. The Grid-Solver estimates transformation parameters by minimizing a global grid matching loss through a progressive dual-loop search strategy to reliably find patch correspondences across entire images. Furthermore, we curate a challenging benchmark dataset for SAR-to-optical registration using real-world UAV MiniSAR data and Google Earth optical imagery. Extensive experiments demonstrate that our proposed approach achieves superior performance over state-of-the-art methods.

[685] arXiv:2507.10383 (replaced) [pdf, html, other]
Title: Dynamical stability for dense patterns in discrete attractor neural networks
Uri Cohen, Máté Lengyel
Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)

Neural networks storing multiple discrete attractors are canonical models of biological memory. Previously, the dynamical stability of such networks could only be guaranteed under highly restrictive conditions. Here, we derive a theory of the local stability of discrete fixed points in a broad class of networks with graded neural activities and in the presence of noise. By directly analyzing the bulk and outliers of the Jacobian spectrum, we show that all fixed points are stable below a critical load that is distinct from the classical \textit{critical capacity} and depends on the statistics of neural activities in the fixed points as well as the single-neuron activation function. Our analysis highlights the computational benefits of threshold-linear activation and sparse-like patterns.

[686] arXiv:2507.17686 (replaced) [pdf, html, other]
Title: Debiased maximum-likelihood estimators for hazard ratios under kernel-based machine-learning adjustment
Takashi Hayakawa, Satoshi Asai
Comments: Proposition 3 of the first version was wrong. This was fixed by introducing new theoretical results in the second version so that all of the claims of the first version are valid. For some reason, in the uploading of the second version, a duplicate of the first version was wrongly publicized, which we think is not our mistake. We therefore upload the updated version as version 3.
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Previous studies have shown that hazard ratios between treatment groups estimated with the Cox model are uninterpretable because the unspecified baseline hazard of the model fails to identify temporal change in the risk set composition due to treatment assignment and unobserved factors among multiple, contradictory scenarios. To alleviate this problem, especially in studies based on observational data with uncontrolled dynamic treatment and real-time measurement of many covariates, we propose abandoning the baseline hazard and using kernel-based machine learning to explicitly model the change in the risk set with or without latent variables. For this framework, we clarify the context in which hazard ratios can be causally interpreted, and then develop a method based on Neyman orthogonality to compute debiased maximum-likelihood estimators of hazard ratios, proving necessary convergence results. Numerical simulations confirm that the proposed method identifies the true hazard ratios with minimal bias. These results lay the foundation for developing a useful, alternative method for causal inference with uncontrolled, observational data in modern epidemiology.

[687] arXiv:2507.19369 (replaced) [pdf, html, other]
Title: Binaural Target Speaker Extraction using HRTFs
Yoav Ellinson, Sharon Gannot
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

In this work, we aim to imitate the human ability to selectively attend to a single speaker, even in the presence of multiple simultaneous talkers. To achieve this, we propose a novel approach for binaural target speaker extraction that leverages the listener's Head-Related Transfer Function (HRTF) to isolate the desired speaker. Notably, our method does not rely on speaker embeddings, making it speaker-independent and enabling strong generalization across multiple speech datasets and languages. We employ a fully complex-valued neural network that operates directly on the complex-valued Short-Time Fourier transform (STFT) of the mixed audio signals, and compare it to a Real-Imaginary (RI)-based neural network, demonstrating the advantages of the former. We first evaluate the method in an anechoic, noise-free scenario, achieving excellent extraction performance while preserving the binaural cues of the target signal. We then extend the evaluation to reverberant conditions. Our method proves robust, maintaining speech clarity and source directionality while simultaneously reducing reverberation. A comparative analysis with existing binaural Target Speaker Extraction (TSE) methods demonstrates that our approach attains performance on par with competing techniques in terms of noise reduction and perceptual quality, while offering a clear advantage in preserving binaural cues. Page: this https URL

[688] arXiv:2508.14376 (replaced) [pdf, html, other]
Title: A generalized Hurwitz stability criterion via rectangular block Hankel matrices for nonmonic matrix polynomials
Xuzhou Zhan, Zixiang Ni
Subjects: Optimization and Control (math.OC); Numerical Analysis (math.NA)

We develop a Hurwitz stability criterion for nonmonic matrix polynomials via column reduction, generalizing existing approaches constrained by the monic assumption, as well as Gantmacher's classical stability criterion via Markov parameters. Starting from redefining the associated Markov parameters through a column-wise adaptive splitting method, our framework constructs two structured matrices whose rectangular Hankel blocks are obtained via the extraction of these parameters. We establish an explicit interrelation between the inertias of column reduced matrix polynomials and the derived structured matrices. Furthermore, we demonstrate that the Hurwitz stability of column reduced matrix polynomials can be determined by the Hermitian positive definiteness of these rectangular block Hankel matrices.

[689] arXiv:2508.15318 (replaced) [pdf, html, other]
Title: Flow Matching at Scale: A Machine Learning Framework for Efficient Large-Size Sampling of Many-Body Systems
Qian-Rui Lee, Daw-Wei Wang
Subjects: Statistical Mechanics (cond-mat.stat-mech); Machine Learning (cs.LG)

We propose a machine learning framework based on Flow Matching to overcome the scaling limitations of Markov Chain Monte Carlo (MCMC) methods. We demonstrate its capability in the 2D XY model, where a single network, trained only on configurations from a small ($32\times 32$) lattice at sparse temperature points, generates reliable samples for a significantly larger system ($128\times 128$) across a continuous temperature range without retraining. The generated configurations show strong agreement with key thermodynamic observables and correctly capture the signatures of the Berezinskii-Kosterlitz-Thouless (BKT) transition. This dual generalization is enabled by the Flow Matching framework, which allows us to learn a continuous, temperature-conditioned mapping. At the same time, the inductive biases of the underlying CNN architecture ensure that the learned local physical rules are scale-invariant. This "train-small, generate-large" capability offers a powerful and efficient alternative for studying critical phenomena. The method can be directly applied to other classical or quantum many-body systems described by continuous fields on a lattice. Furthermore, this framework can serve as a powerful proposal generator in a hybrid scheme with MCMC, dramatically accelerating high-precision studies of the thermodynamic limit.

[690] arXiv:2508.19362 (replaced) [pdf, html, other]
Title: Geodesic complexity of the octahedron, and an algorithm for cut loci on convex polyhedra
Florian Frick, Pranav Rajbhandari
Comments: 44 pages, 26 figures
Subjects: Metric Geometry (math.MG); Computational Geometry (cs.CG)

The geodesic complexity of a length space $X$ quantifies the required number of case distinctions to continuously choose a shortest path connecting any given start and end point. We prove a local lower bound for the geodesic complexity of $X$ obtained by embedding simplices into $X\times X$. We additionally create and prove correctness of an algorithm to find cut loci on surfaces of convex polyhedra, as the structure of a space's cut loci is related to its geodesic complexity. We use these techniques to prove the geodesic complexity of the octahedron is four. Our method is inspired by earlier work of Recio-Mitter and Davis, and thus recovers their results on the geodesic complexity of the $n$-torus and the tetrahedron, respectively.

[691] arXiv:2508.19660 (replaced) [pdf, html, other]
Title: Arbitrary Precision Printed Ternary Neural Networks with Holistic Evolutionary Approximation
Vojtech Mrazek, Konstantinos Balaskas, Paula Carolina Lozano Duarte, Zdenek Vasicek, Mehdi B. Tahoori, Georgios Zervakis
Comments: Accepted for publication at IEEE Transactions on Circuits and Systems for Artificial Intelligence (TCASAI)
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Printed electronics offer a promising alternative for applications beyond silicon-based systems, requiring properties like flexibility, stretchability, conformality, and ultra-low fabrication costs. Despite the large feature sizes in printed electronics, printed neural networks have attracted attention for meeting target application requirements, though realizing complex circuits remains challenging. This work bridges the gap between classification accuracy and area efficiency in printed neural networks, covering the entire processing-near-sensor system design and co-optimization from the analog-to-digital interface (a major area and power bottleneck) to the digital classifier. We propose an automated framework for designing printed Ternary Neural Networks with arbitrary input precision, utilizing multi-objective optimization and holistic approximation. Our circuits outperform existing approximate printed neural networks by 17x in area and 59x in power on average, being the first to enable printed-battery-powered operation with under 5% accuracy loss while accounting for analog-to-digital interfacing costs.

[692] arXiv:2508.19897 (replaced) [pdf, html, other]
Title: The Information Dynamics of Generative Diffusion
Luca Ambrogioni
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Generative diffusion models have emerged as a powerful class of models in machine learning, yet a unified theoretical understanding of their operation is still developing. This perspective paper offers an integrated view of generative diffusion by connecting its dynamic, information-theoretic, and thermodynamic properties under a unified mathematical framework. We demonstrate that the rate of conditional entropy production during generation (i.e. the generative bandwidth) is directly governed by the expected divergence of the score function's vector field. This divergence, in turn, is linked to the branching of trajectories and generative bifurcations, which we characterize as symmetry-breaking phase transitions in the energy landscape. This synthesis offers a powerful insight: the process of generation is fundamentally driven by the controlled, noise-induced breaking of (approximate) symmetries, where peaks in information transfer correspond to critical transitions between possible outcomes. The score function acts as a dynamic non-linear filter that regulates the bandwidth of the noise by suppressing fluctuations that are incompatible with the data.

[693] arXiv:2508.20748 (replaced) [pdf, html, other]
Title: An Efficient Data-Driven Framework for Linear Quadratic Output Feedback Control
Jun Xie, Yuan-Hua Ni, Yiqin Yang, Bo Xu
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

Linear quadratic regulator with unmeasurable states and unknown system matrix parameters better aligns with practical scenarios. However, for this problem, balancing the optimality of the resulting controller and the leniency of the algorithm's feasibility conditions remains a non-trivial challenge, as no well-established general method has yet been developed to address this trade-off. To address this gap, this study first develops a comprehensive theoretical framework for state parameterization that equivalently substitutes for unknown states. By analyzing the controllability of consistent systems satisfied by substitute states, this framework quantifies the capability of substitute state data matrices to parameterize unknown closed-loop systems and output feedback controllers, thereby constructing a modified state parameterization form that meets the complete data parameterization condition of Willems' Fundamental Lemma. Leveraging this framework, this study proposes efficient model-free off-policy policy iteration and value iteration algorithms with theoretical guarantees to solve for the optimal output feedback controller. Compared with existing studies, particularly for multi-output problems where existing model-free reinforcement learning algorithms may fail, the proposed method removes redundant information in substitute states and the additional full row rank condition on regression matrices, thereby ensuring the solution of optimal output feedback controllers equivalent to optimal state feedback controllers for multi-output systems. Furthermore, this study pioneers a comprehensive and highly scalable theoretical analysis of state parameterization from a data-driven viewpoint, and the proposed algorithms exhibit significant advantages in implementation conditions, data demand, unknown handling, and convergence speed.

[694] arXiv:2509.00116 (replaced) [pdf, html, other]
Title: Meta-learning ecological priors from large language models explains human learning and decision making
Akshay K. Jagadish, Mirko Thalmann, Julian Coda-Forno, Marcel Binz, Eric Schulz
Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI)

Human cognition is profoundly shaped by the environments in which it unfolds. Yet, it remains an open question whether learning and decision making can be explained as a principled adaptation to the statistical structure of real-world tasks. We introduce ecologically rational analysis, a computational framework that unifies the normative foundations of rational analysis with ecological grounding. Leveraging large language models to generate ecologically valid cognitive tasks at scale, and using meta-learning to derive rational models optimized for these environments, we develop a new class of learning algorithms: Ecologically Rational Meta-learned Inference (ERMI). ERMI internalizes the statistical regularities of naturalistic problem spaces and adapts flexibly to novel situations, without requiring hand-crafted heuristics or explicit parameter updates. We show that ERMI captures human behavior across 15 experiments spanning function learning, category learning, and decision making, outperforming several established cognitive models in trial-by-trial prediction. Our results suggest that much of human cognition may reflect adaptive alignment to the ecological structure of the problems we encounter in everyday life.

[695] arXiv:2509.01057 (replaced) [pdf, html, other]
Title: Q-Learning-Driven Adaptive Rewiring for Cooperative Control in Heterogeneous Networks
Yi-Ning Weng, Hsuan-Wei Lee
Comments: 40 pages, 9 figures
Subjects: Physics and Society (physics.soc-ph); Artificial Intelligence (cs.AI)

Cooperation emergence in multi-agent systems represents a fundamental statistical physics problem where microscopic learning rules drive macroscopic collective behavior transitions. We propose a Q-learning-based variant of adaptive rewiring that builds on mechanisms studied in the literature. This method combines temporal difference learning with network restructuring so that agents can optimize strategies and social connections based on interaction histories. Through neighbor-specific Q-learning, agents develop sophisticated partnership management strategies that enable cooperator cluster formation, creating spatial separation between cooperative and defective regions. Using power-law networks that reflect real-world heterogeneous connectivity patterns, we evaluate emergent behaviors under varying rewiring constraint levels, revealing distinct cooperation patterns across parameter space rather than sharp thermodynamic transitions. Our systematic analysis identifies three behavioral regimes: a permissive regime (low constraints) enabling rapid cooperative cluster formation, an intermediate regime with sensitive dependence on dilemma strength, and a patient regime (high constraints) where strategic accumulation gradually optimizes network structure. Simulation results show that while moderate constraints create transition-like zones that suppress cooperation, fully adaptive rewiring enhances cooperation levels through systematic exploration of favorable network configurations. Quantitative analysis reveals that increased rewiring frequency drives large-scale cluster formation with power-law size distributions. Our results establish a new paradigm for understanding intelligence-driven cooperation pattern formation in complex adaptive systems, revealing how machine learning serves as an alternative driving force for spontaneous organization in multi-agent networks.

[696] arXiv:2509.02327 (replaced) [pdf, other]
Title: Variational Uncertainty Decomposition for In-Context Learning
I. Shavindra Jayasekera, Jacob Si, Filippo Valdettaro, Wenlong Chen, A. Aldo Faisal, Yingzhen Li
Comments: Fixing author order; typo p.20
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

As large language models (LLMs) gain popularity in conducting prediction tasks in-context, understanding the sources of uncertainty in in-context learning becomes essential to ensuring reliability. The recent hypothesis of in-context learning performing predictive Bayesian inference opens the avenue for Bayesian uncertainty estimation, particularly for decomposing uncertainty into epistemic uncertainty due to lack of in-context data and aleatoric uncertainty inherent in the in-context prediction task. However, the decomposition idea remains under-explored due to the intractability of the latent parameter posterior from the underlying Bayesian model. In this work, we introduce a variational uncertainty decomposition framework for in-context learning without explicitly sampling from the latent parameter posterior, by optimising auxiliary queries as probes to obtain an upper bound to the aleatoric uncertainty of an LLM's in-context learning procedure, which also induces a lower bound to the epistemic uncertainty. Through experiments on synthetic and real-world tasks, we show quantitatively and qualitatively that the decomposed uncertainties obtained from our method exhibit desirable properties of epistemic and aleatoric uncertainty.
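For orientation, the bound relationship described in this abstract can be sketched with the standard entropy-based Bayesian decomposition. This is an assumption for illustration only: the abstract does not state which uncertainty measure the paper uses, and the symbols $x$ (query), $D$ (in-context data), $\theta$ (latent parameter), $U_a$, $U_e$, and $\hat{U}_a$ are notation introduced here, not taken from the source.

```latex
% Assumed notation (not from the abstract): x = query, D = in-context data,
% \theta = latent parameter of the underlying Bayesian model.
\[
\underbrace{\mathbb{H}\big[p(y \mid x, D)\big]}_{\text{total } U_{\mathrm{tot}}}
=
\underbrace{\mathbb{E}_{p(\theta \mid D)}\,\mathbb{H}\big[p(y \mid x, \theta)\big]}_{\text{aleatoric } U_a}
+
\underbrace{\mathbb{I}\big[y ; \theta \mid x, D\big]}_{\text{epistemic } U_e}
\]
% Any upper bound on the aleatoric term yields a lower bound on the epistemic term:
\[
\hat{U}_a \ge U_a
\quad\Longrightarrow\quad
U_{\mathrm{tot}} - \hat{U}_a \;\le\; U_{\mathrm{tot}} - U_a \;=\; U_e .
\]
```

Under this assumed decomposition, the total predictive entropy is computable from the model's in-context prediction alone, which is why bounding one term is enough to bound the other.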

[697] arXiv:2509.02476 (replaced) [pdf, html, other]
Title: Wild Refitting for Model-Free Excess Risk Evaluation of Opaque ML/AI Models under Bregman Loss
Haichen Hu, David Simchi-Levi
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We study the problem of evaluating the excess risk of classical penalized empirical risk minimization (ERM) with Bregman losses. We show that by leveraging the recently proposed wild refitting procedure (Wainwright, 2025), one can efficiently upper bound the excess risk through the so-called "wild optimism," without relying on the global structure of the underlying function class. This property makes our approach inherently model-free. Unlike conventional analyses, our framework operates with just one dataset and black-box access to the training procedure. The method involves randomized vector-valued symmetrization with an appropriate scaling of the prediction residues and constructing artificially modified outcomes, upon which we retrain a second predictor for excess risk estimation. We establish high-probability performance guarantees both under the fixed design setting and the random design setting, demonstrating that wild refitting under Bregman losses, with an appropriately chosen wild noise scale, yields a valid upper bound on the excess risk. This work thus is promising for theoretically evaluating modern opaque ML and AI models such as deep neural networks and large language models, where the model class is too complex for classical learning theory and empirical process techniques to apply.
