-
MSC-Bench: A Rigorous Benchmark for Multi-Server Tool Orchestration
Authors:
Jia-Kai Dong,
I-Wei Huang,
Chun-Tin Wu,
Yi-Tien Tsai
Abstract:
We introduce MSC-Bench, a large-scale benchmark for evaluating multi-hop, end-to-end tool orchestration by LLM agents in a hierarchical Model-Context Protocol (MCP) ecosystem. Existing benchmarks often evaluate tools in isolation, ignoring challenges such as functional overlap and cross-server orchestration, leading to overly optimistic assessments. MSC-Bench addresses these gaps by constructing ground truth through 'equal function sets', allowing objective metrics such as F1 score and reducing the dependency on LLM-as-a-judge evaluation. Organized as a five-level curriculum, it systematically tests agent capabilities from single-tool orchestration to complex cross-server planning, and robustness to out-of-scope requests. Experiments reveal that rigid hierarchies can hinder performance without co-designed strategies, and even state-of-the-art agents exhibit systemic weaknesses in robustness. MSC-Bench provides a diagnostic framework to expose these limitations and guide the development of more capable and efficient tool-using agents. The benchmark and resources are publicly available at https://github.com/snooow1029/MSC_Bench.
Submitted 22 October, 2025;
originally announced October 2025.
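The MSC-Bench abstract above says ground truth is built from 'equal function sets' so that objective metrics such as F1 can be used. The sketch below is one plausible reading of that idea, not the benchmark's actual scorer: each required step is a set of interchangeable tools, and a predicted tool counts as correct if it falls in a set that has not been matched yet. The function name, the tool names, and the assumption that the sets are disjoint are all hypothetical.

```python
from typing import FrozenSet, List, Sequence


def f1_with_equal_function_sets(
    predicted: Sequence[str],
    required: List[FrozenSet[str]],
) -> float:
    """F1 where each required slot is satisfied by any tool in its set.

    Assumption (not from the abstract): slots are disjoint, so greedy
    matching is enough to count true positives.
    """
    unique_predictions = list(dict.fromkeys(predicted))  # de-duplicate, keep order
    unmatched = list(required)
    true_pos = 0
    for tool in unique_predictions:
        for slot in unmatched:
            if tool in slot:
                unmatched.remove(slot)  # each slot can be satisfied only once
                true_pos += 1
                break
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(unique_predictions)
    recall = true_pos / len(required)
    return 2 * precision * recall / (precision + recall)


# Hypothetical tool names, for illustration only.
required_slots = [
    frozenset({"maps.search", "geo.find"}),
    frozenset({"weather.get", "meteo.forecast"}),
]
score = f1_with_equal_function_sets(
    ["geo.find", "weather.get", "calc.add"], required_slots
)
```

Here both required slots are satisfied and one extra tool is predicted, so precision is 2/3, recall is 1, and the score is 0.8.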
-
When LRP Diverges from Leave-One-Out in Transformers
Authors:
Weiqiu You,
Siqi Zeng,
Yao-Hung Hubert Tsai,
Makoto Yamada,
Han Zhao
Abstract:
Leave-One-Out (LOO) provides an intuitive measure of feature importance but is computationally prohibitive. While Layer-Wise Relevance Propagation (LRP) offers a potentially efficient alternative, its axiomatic soundness in modern Transformers remains largely under-examined. In this work, we first show that the bilinear propagation rules used in recent advances of AttnLRP violate the implementation invariance axiom. We prove this analytically and confirm it empirically in linear attention layers. Second, we also revisit CP-LRP as a diagnostic baseline and find that bypassing relevance propagation through the softmax layer -- backpropagating relevance only through the value matrices -- significantly improves alignment with LOO, particularly in middle-to-late Transformer layers. Overall, our results suggest that (i) bilinear factorization sensitivity and (ii) softmax propagation error potentially jointly undermine LRP's ability to approximate LOO in Transformers.
Submitted 21 October, 2025;
originally announced October 2025.
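To make the Leave-One-Out baseline in the entry above concrete, here is a minimal sketch under stated assumptions: the model is any callable that returns a scalar, and "leaving out" a feature means replacing it with a baseline value (zero here). The toy model and the choice of baseline are illustrative and are not taken from the paper.

```python
from typing import Callable, List, Sequence


def leave_one_out(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: float = 0.0,
) -> List[float]:
    """Importance of feature i = model(x) - model(x with feature i ablated)."""
    full_output = model(x)
    scores = []
    for i in range(len(x)):
        ablated = list(x)
        ablated[i] = baseline  # replace one feature, keep the rest
        scores.append(full_output - model(ablated))
    return scores


def toy_model(x: Sequence[float]) -> float:
    # Hypothetical model with one linear term and one bilinear term.
    return 2.0 * x[0] + x[1] * x[2]


loo_scores = leave_one_out(toy_model, [1.0, 2.0, 3.0])
```

The loop needs one extra forward pass per feature, which is why the abstract calls LOO computationally prohibitive for long inputs. In this toy case the two features in the bilinear term each receive the full product as their score.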
-
Multimodal Chip Physical Design Engineer Assistant
Authors:
Yun-Da Tsai,
Chang-Yu Chao,
Liang-Yeh Shen,
Tsung-Han Lin,
Haoyu Yang,
Mark Ho,
Yi-Chen Lu,
Wen-Hao Liu,
Shou-De Lin,
Haoxing Ren
Abstract:
Modern chip physical design relies heavily on Electronic Design Automation (EDA) tools, which often struggle to provide interpretable feedback or actionable guidance for improving routing congestion. In this work, we introduce a Multimodal Large Language Model Assistant (MLLMA) that bridges this gap by not only predicting congestion but also delivering human-interpretable design suggestions. Our method combines automated feature generation through MLLM-guided genetic prompting with an interpretable preference learning framework that models congestion-relevant tradeoffs across visual, tabular, and textual inputs. We compile these insights into a "Design Suggestion Deck" that surfaces the most influential layout features and proposes targeted optimizations. Experiments on the CircuitNet benchmark demonstrate that our approach outperforms existing models on both accuracy and explainability. Additionally, our design suggestion guidance case study and qualitative analyses confirm that the learned preferences align with real-world design principles and are actionable for engineers. This work highlights the potential of MLLMs as interactive assistants for interpretable and context-aware physical design optimization.
Submitted 2 July, 2025;
originally announced October 2025.
-
A Modular AIoT Framework for Low-Latency Real-Time Robotic Teleoperation in Smart Cities
Authors:
Shih-Chieh Sun,
Yun-Cheng Tsai
Abstract:
This paper presents an AI-driven IoT robotic teleoperation system designed for real-time remote manipulation and intelligent visual monitoring, tailored for smart city applications. The architecture integrates a Flutter-based cross-platform mobile interface with MQTT-based control signaling and WebRTC video streaming via the LiveKit framework. A YOLOv11-nano model is deployed for lightweight object detection, enabling real-time perception with annotated visual overlays delivered to the user interface. Control commands are transmitted via MQTT to an ESP8266-based actuator node, which coordinates multi-axis robotic arm motion through an Arduino Mega2560 controller. The backend infrastructure is hosted on DigitalOcean, ensuring scalable cloud orchestration and stable global communication. Latency evaluations conducted under both local and international VPN scenarios (including Hong Kong, Japan, and Belgium) demonstrate actuator response times as low as 0.2 seconds and total video latency under 1.2 seconds, even across high-latency networks. This low-latency dual-protocol design ensures responsive closed-loop interaction and robust performance in distributed environments. Unlike conventional teleoperation platforms, the proposed system emphasizes modular deployment, real-time AI sensing, and adaptable communication strategies, making it well-suited for smart city scenarios such as remote infrastructure inspection, public equipment servicing, and urban automation. Future enhancements will focus on edge-device deployment, adaptive routing, and integration with city-scale IoT networks to enhance resilience and scalability.
Submitted 13 October, 2025;
originally announced October 2025.
-
Digital Twin-enabled Multi-generation Control Co-Design with Deep Reinforcement Learning
Authors:
Ying-Kuan Tsai,
Vispi Karkaria,
Yi-Ping Chen,
Wei Chen
Abstract:
Control Co-Design (CCD) integrates physical and control system design to improve the performance of dynamic and autonomous systems. Despite advances in uncertainty-aware CCD methods, real-world uncertainties remain highly unpredictable. Multi-generation design addresses this challenge by considering the full lifecycle of a product: data collected from each generation informs the design of subsequent generations, enabling progressive improvements in robustness and efficiency. Digital Twin (DT) technology further strengthens this paradigm by creating virtual representations that evolve over the lifecycle through real-time sensing, model updating, and adaptive re-optimization. This paper presents a DT-enabled CCD framework that integrates Deep Reinforcement Learning (DRL) to jointly optimize physical design and controller. DRL accelerates real-time decision-making by allowing controllers to continuously learn from data and adapt to uncertain environments. Extending this approach, the framework employs a multi-generation paradigm, where each cycle of deployment, operation, and redesign uses collected data to refine DT models, improve uncertainty quantification through quantile regression, and inform next-generation designs of both physical components and controllers. The framework is demonstrated on an active suspension system, where DT-enabled learning from road conditions and driving behaviors yields smoother and more stable control trajectories. Results show that the method significantly enhances dynamic performance, robustness, and efficiency. Contributions of this work include: (1) extending CCD into a lifecycle-oriented multi-generation framework, (2) leveraging DTs for continuous model updating and informed design, and (3) employing DRL to accelerate adaptive real-time decision-making.
Submitted 12 October, 2025;
originally announced October 2025.
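The entry above mentions quantile regression for uncertainty quantification. The abstract does not state the loss used, so the sketch below shows the standard pinball (quantile) loss as an assumption about how such a regressor is typically trained; the function name and the sample numbers are illustrative.

```python
from typing import Sequence


def pinball_loss(
    targets: Sequence[float],
    predictions: Sequence[float],
    quantile: float,
) -> float:
    """Mean pinball loss for one quantile level in (0, 1)."""
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must lie strictly between 0 and 1")
    total = 0.0
    for y, y_hat in zip(targets, predictions):
        error = y - y_hat
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += max(quantile * error, (quantile - 1.0) * error)
    return total / len(targets)


loss_q90 = pinball_loss([1.0, 2.0, 3.0], [2.0, 2.0, 2.0], quantile=0.9)
```

With a quantile of 0.9, under-predicting by one unit costs 0.9 while over-predicting by one unit costs 0.1, which pushes the fitted value toward the upper tail of the data.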
-
Identification of low-energy kaons in the ProtoDUNE-SP detector
Authors:
DUNE Collaboration,
S. Abbaslu,
F. Abd Alrahman,
A. Abed Abud,
R. Acciarri,
L. P. Accorsi,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
C. Adriano,
F. Akbar,
F. Alemanno,
N. S. Alex,
K. Allison,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
A. Aman,
H. Amar,
P. Amedo,
J. Anderson,
D. A. Andrade,
C. Andreopoulos
, et al. (1325 additional authors not shown)
Abstract:
The Deep Underground Neutrino Experiment (DUNE) is a next-generation neutrino experiment with a rich physics program that includes searches for the hypothetical phenomenon of proton decay. Utilizing liquid-argon time-projection chamber technology, DUNE is expected to achieve world-leading sensitivity in the proton decay channels that involve charged kaons in their final states. The first DUNE demonstrator, ProtoDUNE Single-Phase, was a 0.77 kt detector that operated from 2018 to 2020 at the CERN Neutrino Platform, exposed to a mixed hadron and electron test-beam with momenta ranging from 0.3 to 7 GeV/c. We present a selection of low-energy kaons among the secondary particles produced in hadronic reactions, using data from the 6 and 7 GeV/c beam runs. The selection efficiency is 1\% and the sample purity 92\%. The initial energies of the selected kaon candidates encompass the expected energy range of kaons originating from proton decay events in DUNE (below $\sim$200 MeV). In addition, we demonstrate the capability of this detector technology to discriminate between kaons and other particles such as protons and muons, and provide a comprehensive description of their energy loss in liquid argon, which shows good agreement with the simulation. These results pave the way for future proton decay searches at DUNE.
Submitted 9 October, 2025;
originally announced October 2025.
-
pyGinkgo: A Sparse Linear Algebra Operator Framework for Python
Authors:
Keshvi Tuteja,
Gregor Olenik,
Roman Mishchuk,
Yu-Hsiang Tsai,
Markus Götz,
Achim Streit,
Hartwig Anzt,
Charlotte Debus
Abstract:
Sparse linear algebra is a cornerstone of many scientific computing and machine learning applications. Python has become a popular choice for these applications due to its simplicity and ease of use. Yet high-performance sparse kernels in Python remain limited in functionality, especially on modern CPU and GPU architectures. We present pyGinkgo, a lightweight and Pythonic interface to the Ginkgo library, offering high-performance sparse linear algebra support with platform portability across CUDA, HIP, and OpenMP backends. pyGinkgo bridges the gap between high-performance C++ backends and Python usability by exposing Ginkgo's capabilities via Pybind11 and a NumPy- and PyTorch-compatible interface. We benchmark pyGinkgo's performance against state-of-the-art Python libraries including SciPy, CuPy, PyTorch, and TensorFlow. Results across hardware from different vendors demonstrate that pyGinkgo consistently outperforms existing Python tools in both sparse matrix-vector (SpMV) product and iterative solver performance, while maintaining performance parity with native Ginkgo C++ code. Our work positions pyGinkgo as a compelling backend for sparse machine learning models and scientific workflows.
Submitted 9 October, 2025;
originally announced October 2025.
-
Reasoning under Vision: Understanding Visual-Spatial Cognition in Vision-Language Models for CAPTCHA
Authors:
Python Song,
Luke Tenyi Chang,
Yun-Yun Tsai,
Penghui Li,
Junfeng Yang
Abstract:
CAPTCHA, originally designed to distinguish humans from robots, has evolved into a real-world benchmark for assessing the spatial reasoning capabilities of vision-language models. In this work, we first show that step-by-step reasoning is crucial for vision-language models (VLMs) to solve CAPTCHAs, which represent high-difficulty spatial reasoning tasks, and that current commercial vision-language models still struggle with such reasoning. In particular, we observe that most commercial VLMs (e.g., Gemini, Claude, GPT) fail to effectively solve CAPTCHAs and thus achieve low accuracy (around 21.9 percent). However, our findings indicate that requiring the model to perform step-by-step reasoning before generating the final coordinates can significantly enhance its solving accuracy, underscoring the severity of the gap. To systematically study this issue, we introduce CAPTCHA-X, the first real-world CAPTCHA benchmark with reasoning, covering seven categories of CAPTCHAs (such as Gobang and hCaptcha) with step-by-step action solutions and grounding annotations. We further define five reasoning-oriented metrics that enable a comprehensive evaluation of models' reasoning capabilities. To validate the effectiveness of reasoning, we also propose a general agentic VLM-based framework that incorporates the model's inherent reasoning abilities. Our method achieves state-of-the-art performance across five high-difficulty CAPTCHA types, with an average solving accuracy of 83.9 percent, substantially surpassing existing baselines. These results reveal the limitations of current models and highlight the importance of reasoning for future progress on visual-spatial challenges.
Submitted 7 October, 2025;
originally announced October 2025.
-
Feebly Interacting Particles: FIPs at LHCb
Authors:
J. Alimena,
J. Boyd,
G. Cacciapaglia,
A. Casais Vidal,
X. Cid Vidal,
S. Collaviti,
A. De Oyanguren Campos,
G. Dalla Valle Garcia,
G. Elor,
G. Ferretti,
D. Gorbunov,
E. Goudzovski,
J. Hajer,
J. Jerhot,
B. Kishor Jashal,
V. Kholoimov,
J. Klaric,
F. Kling,
E. Kriukova,
Y. Kyselov,
G. Lanfranchi,
C. Langenbruch,
A. Merli,
M. Ovchinnykov,
J. Pfaller
, et al. (12 additional authors not shown)
Abstract:
With the establishment and maturation of the experimental programs searching for new physics with sizeable couplings at the LHC, there is an increasing interest in the broader particle and astrophysics community for exploring the physics of light and feebly-interacting particles as a paradigm complementary to a New Physics sector at the TeV scale and beyond. FIPs@LHCb continues the successful series of the FIPs workshops, FIPs 2020 and FIPs 2022. The main focus of the workshop was to explore the LHCb potential to search for FIPs thanks to the new software trigger deployed during the recent upgrade. Equally important goals of the workshop were to update the available parameter space in the commonly used FIPs benchmarks by including recent results from the high energy physics community and to discuss recent theory progress necessary for a more accurate definition of observables related to FIP benchmarks. This document presents the summary of the talks presented at the workshops and the outcome of subsequent discussions.
Submitted 6 October, 2025;
originally announced October 2025.
-
Relevance-Zone Reduction in Game Solving
Authors:
Chi-Huang Lin,
Ting Han Wei,
Chun-Jui Wang,
Hung Guei,
Chung-Chin Shih,
Yun-Jui Tsai,
I-Chen Wu,
Ti-Rong Wu
Abstract:
Game solving aims to find the optimal strategies for all players and determine the theoretical outcome of a game. However, due to the exponential growth of game trees, many games remain unsolved, even though methods like AlphaZero have demonstrated super-human performance in game playing. The Relevance-Zone (RZ) is a local strategy reuse technique that restricts the search to only the regions relevant to the outcome, significantly reducing the search space. However, RZs are not unique: different solutions may result in RZs of varying sizes. Smaller RZs are generally more favorable, as they increase the chance of reuse and improve pruning efficiency. To this end, we propose an iterative RZ reduction method that repeatedly solves the same position while gradually restricting the region involved, guiding the solver toward smaller RZs. We design three constraint generation strategies and integrate an RZ Pattern Table to fully leverage past solutions. In experiments on 7x7 Killall-Go, our method reduces the average RZ size to 85.95% of the original. Furthermore, the reduced RZs can be permanently stored as reusable knowledge for future solving tasks, especially for larger board sizes or different openings.
Submitted 1 October, 2025;
originally announced October 2025.
-
Let LLMs Speak Embedding Languages: Generative Text Embeddings via Iterative Contrastive Refinement
Authors:
Yu-Che Tsai,
Kuan-Yu Chen,
Yuan-Chi Li,
Yuan-Hao Chen,
Ching-Yu Tsai,
Shou-De Lin
Abstract:
Existing large language model (LLM)-based embeddings typically adopt an encoder-only paradigm, treating LLMs as static feature extractors and overlooking their core generative strengths. We introduce GIRCSE (Generative Iterative Refinement for Contrastive Sentence Embeddings), a novel framework that leverages autoregressive generation to iteratively refine semantic representations. By producing sequences of soft tokens optimized under a contrastive objective, GIRCSE captures latent concepts and implicit semantics that encoder-only methods often miss. To guide this process, we propose an Iterative Contrastive Refinement (ICR) objective that encourages each refinement step to yield better representations. Extensive experiments show that GIRCSE outperforms strong LLM-based embedding baselines on the MTEB benchmark and instruction-following tasks. Moreover, GIRCSE exhibits an emergent test-time scaling property: generating more tokens at inference steadily improves embedding quality. Our results establish generative iterative refinement as a new paradigm for representation learning.
Submitted 29 September, 2025;
originally announced September 2025.
-
Collisional Baryon-Dominated Dwarf Galaxies: A New Probe of Bursty Feedback and Dark Matter Physics
Authors:
Yi-Ying Wang,
Daneng Yang,
Keyu Lu,
Yue-Lin Sming Tsai,
Yi-Zhong Fan
Abstract:
High-velocity collisions between gas-rich ultra-diffuse galaxies present a promising formation channel for baryon-dominated dwarf galaxies (BDDGs). Using hydrodynamical simulations, we show that the progenitors' baryonic binding energy, $|E_{\rm bind}|$, critically controls the outcome. Repeated potential fluctuations, e.g., from bursty feedback, inject energy and reduce $|E_{\rm bind}|$ by $\approx 15\%$, yielding fewer but substantially more massive BDDGs. By contrast, elastic self-interacting dark matter produces comparable cores without lowering $|E_{\rm bind}|$, resulting in negligible effect. This provides a novel way to distinguish between two leading galactic core formation channels, i.e., the baryon feedback and elastic dark matter self-interaction. Among 15 paired simulation runs, 13 show higher BDDG masses in the weakened-binding case, and about two thirds exhibit $>100\%$ mass enhancements. The simulations also predict systematically lower gas fractions due to sustained post-collision star formation, yielding a clean observational signature. Upcoming wide-field imaging (CSST, LSST), HI surveys (FAST), and kinematic follow-up will be crucial to test this scenario.
Submitted 29 September, 2025;
originally announced September 2025.
-
Directly Probing Neutrino Interactions through CMB Phase Shift Measurements
Authors:
Gabriele Montefalcone,
Subhajit Ghosh,
Kimberly K. Boddy,
Daven Wei Ren Ho,
Yuhsin Tsai
Abstract:
Perturbations in the cosmic neutrino background produce a characteristic phase shift in the acoustic oscillations imprinted in the anisotropies of the cosmic microwave background (CMB), providing a unique observational probe of neutrino physics. In this work, we explore how this phase shift signature is altered in the presence of neutrino interactions with temperature-dependent scattering rates, motivated by physical constructions for neutrino self-interactions and neutrino-dark matter couplings. A key finding is that the phase shift in these realistic models -- characterized by gradual rather than instantaneous decoupling -- maintains the same functional form as the free-streaming template, with only the asymptotic amplitude decreasing for stronger interactions that delay decoupling. This simple parametrization enables us to directly constrain neutrino interactions through phase shift measurements in the temperature and polarization power spectra from CMB observations. Analyzing the latest data from \textit{Planck}, the Atacama Cosmology Telescope, and the South Pole Telescope, we derive strong constraints on the neutrino decoupling redshift. Our global analysis indicates that neutrinos have been freely streaming since deep within the radiation-dominated epoch. We also explore flavor-dependent scenarios in which only one neutrino species interacts. Overall, our work establishes a signature-driven framework that exploits the clean phase shift signal in the acoustic oscillations of the CMB as a precise and robust probe of non-standard neutrino interactions in the early universe.
Submitted 24 September, 2025;
originally announced September 2025.
-
Cryogenics and purification systems of the ICARUS T600 detector installation at Fermilab
Authors:
F. Abd Alrahman,
P. Abratenko,
N. Abrego-Martinez,
A. Aduszkiewicz,
F. Akbar,
L. Aliaga Soplin,
M. Artero Pons,
J. Asaadi,
W. F. Badgett,
B. Behera,
V. Bellini,
R. Benocci,
J. Berger,
S. Berkman,
O. Beltramello,
S. Bertolucci,
M. Betancourt,
A. Blanchet,
F. Boffelli,
M. Bonesini,
T. Boone,
B. Bottino,
A. Braggiotti,
J. Bremer,
S. J. Brice
, et al. (172 additional authors not shown)
Abstract:
This paper describes the cryogenic and purification systems of the ICARUS T600 detector in its present implementation at the Fermi National Accelerator Laboratory, Illinois, USA. The ICARUS T600 detector is made of four large Time Projection Chambers, installed in two separate containers of about 275 m$^3$ each. The detector uses liquid argon both as target and as active medium. For the correct operation of the detector, the liquid argon must be kept in very stable thermal conditions and the contamination of electronegative impurities must be consistently kept at the level of small fractions of parts per billion. The detector was previously operated in Italy, at the INFN Gran Sasso Underground Laboratory, in a three-year run on the CERN to LNGS Long Baseline Neutrino Beam. For its operation on the Booster and NuMI neutrino beams at Fermilab, for the search for sterile neutrinos and measurements of neutrino-argon cross sections, the detector was moved from Gran Sasso to CERN for the upgrades required for operation at shallow depth with high-intensity neutrino beams. The liquid argon containers, the thermal insulation, and all the cryogenic equipment have been completely redesigned and rebuilt, following the schemes of the previous installation in Gran Sasso. The detector and all the equipment have been transported to Fermilab, where they have been installed, tested, and recently put into operation. The work described in this paper has been conducted as a joint responsibility of CERN and Fermilab, with supervision provided by the ICARUS Collaboration. Design, installation, testing, commissioning, and operation are the result of a common effort of CERN, Fermilab, and INFN groups.
Submitted 1 October, 2025; v1 submitted 22 September, 2025;
originally announced September 2025.
-
Hunyuan3D Studio: End-to-End AI Pipeline for Game-Ready 3D Asset Generation
Authors:
Biwen Lei,
Yang Li,
Xinhai Liu,
Shuhui Yang,
Lixin Xu,
Jingwei Huang,
Ruining Tang,
Haohan Weng,
Jian Liu,
Jing Xu,
Zhen Zhou,
Yiling Zhu,
Jiankai Xing,
Jiachen Xu,
Changfeng Ma,
Xinhao Yan,
Yunhan Yang,
Chunshi Wang,
Duoteng Xu,
Xueqi Ma,
Yuguang Chen,
Jing Li,
Mingxin Yang,
Sheng Zhang,
Yifei Feng
, et al. (75 additional authors not shown)
Abstract:
The creation of high-quality 3D assets, a cornerstone of modern game development, has long been characterized by labor-intensive and specialized workflows. This paper presents Hunyuan3D Studio, an end-to-end AI-powered content creation platform designed to revolutionize the game production pipeline by automating and streamlining the generation of game-ready 3D assets. At its core, Hunyuan3D Studio integrates a suite of advanced neural modules (such as Part-level 3D Generation, Polygon Generation, Semantic UV, etc.) into a cohesive and user-friendly system. This unified framework allows for the rapid transformation of a single concept image or textual description into a fully-realized, production-quality 3D model complete with optimized geometry and high-fidelity PBR textures. We demonstrate that assets generated by Hunyuan3D Studio are not only visually compelling but also adhere to the stringent technical requirements of contemporary game engines, significantly reducing iteration time and lowering the barrier to entry for 3D content creation. By providing a seamless bridge from creative intent to technical asset, Hunyuan3D Studio represents a significant leap forward for AI-assisted workflows in game development and interactive media.
Submitted 16 September, 2025;
originally announced September 2025.
-
Quantum-Enhanced Forecasting for Deep Reinforcement Learning in Algorithmic Trading
Authors:
Jun-Hao Chen,
Yu-Chien Huang,
Yun-Cheng Tsai,
Samuel Yen-Chi Chen
Abstract:
The convergence of quantum-inspired neural networks and deep reinforcement learning offers a promising avenue for financial trading. We implemented a trading agent for USD/TWD by integrating Quantum Long Short-Term Memory (QLSTM) for short-term trend prediction with Quantum Asynchronous Advantage Actor-Critic (QA3C), a quantum-enhanced variant of the classical A3C. Trained on data from 2000-01-01 to 2025-04-30 (80\% training, 20\% testing), the long-only agent achieves 11.87\% return over around 5 years with 0.92\% max drawdown, outperforming several currency ETFs. We detail state design (QLSTM features and indicators), reward function for trend-following/risk control, and multi-core training. Results show hybrid models yield competitive FX trading performance. Implications include QLSTM's effectiveness for small-profit trades with tight risk and future enhancements. Key hyperparameters: QLSTM sequence length$=$4, QA3C workers$=$8. Limitations: classical quantum simulation and simplified strategy. \footnote{The views expressed in this article are those of the authors and do not represent the views of Wells Fargo. This article is for informational purposes only. Nothing contained in this article should be construed as investment advice. Wells Fargo makes no express or implied warranties and expressly disclaims all legal, tax, and accounting implications related to this article.}
Submitted 11 September, 2025; v1 submitted 11 September, 2025;
originally announced September 2025.
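
The return and max drawdown figures quoted in the abstract above are standard equity-curve statistics. As a minimal sketch of how such figures are usually computed (the function names and the equity values below are illustrative assumptions, not the authors' code or data):

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # highest value seen so far
        worst = max(worst, (peak - value) / peak)  # decline from that peak
    return worst


def total_return(equity):
    """Simple return from the first to the last point of the curve."""
    return equity[-1] / equity[0] - 1.0


# Hypothetical equity curve (made-up numbers, not the paper's results).
curve = [100.0, 102.0, 101.0, 105.0, 104.0, 110.0]
print(f"total return: {total_return(curve):.2%}")
print(f"max drawdown: {max_drawdown(curve):.2%}")
```

Whether the paper measures drawdown on account equity or on another series is not stated in the abstract, so treat the definition here as the common convention only.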
-
A Novel Summation Formula for the Hurwitz-Kronecker Class Number
Authors:
Yi-Ju Tsai
Abstract:
The purpose of this paper is to present a novel and elegant summation formula for $H_w$, the Kronecker-Hurwitz class number. Specifically, for any prime $p$, we have the formula: $$ \sum_{t^2<p} H_w(t^2-p) = \frac{p-2}{3}. $$
Submitted 10 September, 2025;
originally announced September 2025.
-
Towards mono-energetic virtual $ν$ beam cross-section measurements: A feasibility study of $ν$-Ar interaction analysis with DUNE-PRISM
Authors:
DUNE Collaboration,
S. Abbaslu,
A. Abed Abud,
R. Acciarri,
L. P. Accorsi,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
C. Adriano,
F. Akbar,
F. Alemanno,
N. S. Alex,
K. Allison,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
A. Aman,
H. Amar,
P. Amedo,
J. Anderson,
D. A. Andrade,
C. Andreopoulos,
M. Andreotti
, et al. (1302 additional authors not shown)
Abstract:
Neutrino-nucleus cross-section measurements are critical for future neutrino oscillation analyses. However, our models to describe them require further refinement, and a deeper understanding of the underlying physics is essential for future neutrino oscillation experiments to realize their ambitious physics goals. Current neutrino cross-section measurements reveal clear deficiencies in neutrino interaction modeling, but almost all are reported averaged over broad neutrino fluxes, rendering their interpretation challenging. Using the DUNE-PRISM concept (Deep Underground Neutrino Experiment Precision Reaction Independent Spectrum Measurement) -- a movable near detector that samples multiple off-axis positions -- neutrino interaction measurements can be used to construct narrow virtual fluxes (less than 100 MeV wide). These fluxes can be used to extract charged-current neutrino-nucleus cross sections as functions of outgoing lepton kinematics within specific neutrino energy ranges. Based on a dedicated simulation with realistic event statistics and flux-related systematic uncertainties, but assuming an almost-perfect detector, we run a feasibility study demonstrating how DUNE-PRISM data can be used to measure muon neutrino charged-current integrated and differential cross sections over narrow fluxes. We find that this approach enables a model-independent reconstruction of powerful observables, including energy transfer, typically accessible only in electron scattering measurements, but that large exposures may be required for differential cross-section measurements with few-\% statistical uncertainties.
Submitted 9 September, 2025;
originally announced September 2025.
-
Boiling After the Dust Settles: Constraining First-Order Phase Transitions During Dark Energy Domination
Authors:
Seth Koren,
Yuhsin Tsai,
Runqing Wang
Abstract:
A first-order phase transition could occur in the late universe when vacuum energy begins dominating the energy density ($z \lesssim 0.3$) and convert some latent heat into other forms such as invisible radiation. This generic possibility also has concrete motivation in particle physics models which invoke a multitude of vacua to address theoretical puzzles. The naïve constraint on such an event comes from measurements of the Hubble expansion rate, but this can only probe transitions involving $\mathcal{O}(10)\%$ of the dark energy. In this work, we show that significantly tighter constraints appear when accounting for phase transition fluctuations affecting CMB photon propagation anisotropically, akin to the integrated Sachs-Wolfe effect. For instance, if a completed phase transition has $β/H_\star\lesssim 25$, current CMB data limits the associated vacuum energy released to less than $1\%$ of the dark energy. A transition to negative vacuum energy (quasi-anti-de Sitter) is allowed only for $β/H_\star \gtrsim 300$. For $β/H_\star \lesssim 500$, the universe will not crunch for at least $14$ Gyr.
Submitted 8 September, 2025;
originally announced September 2025.
-
Operation of a Modular 3D-Pixelated Liquid Argon Time-Projection Chamber in a Neutrino Beam
Authors:
DUNE Collaboration,
S. Abbaslu,
A. Abed Abud,
R. Acciarri,
L. P. Accorsi,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
C. Adriano,
F. Akbar,
F. Alemanno,
N. S. Alex,
K. Allison,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
A. Aman,
H. Amar,
P. Amedo,
J. Anderson,
D. A. Andrade,
C. Andreopoulos,
M. Andreotti
, et al. (1299 additional authors not shown)
Abstract:
The 2x2 Demonstrator, a prototype for the Deep Underground Neutrino Experiment (DUNE) liquid argon (LAr) Near Detector, was exposed to the Neutrinos from the Main Injector (NuMI) neutrino beam at Fermi National Accelerator Laboratory (Fermilab). This detector prototypes a new modular design for a liquid argon time-projection chamber (LArTPC), composed of a two-by-two array of four modules, each further segmented into two optically-isolated LArTPCs. The 2x2 Demonstrator features a number of pioneering technologies, including a low-profile resistive field shell to establish drift fields, native 3D ionization pixelated imaging, and a high-coverage dielectric light readout system. The 2.4 tonne active mass detector is flanked upstream and downstream by supplemental solid-scintillator tracking planes, repurposed from the MINERvA experiment, which track ionizing particles exiting the argon volume. The antineutrino beam data collected by the detector over a 4.5 day period in 2024 include over 30,000 neutrino interactions in the LAr active volume -- the first neutrino interactions reported by a DUNE detector prototype. During its physics-quality run, the 2x2 Demonstrator operated at a nominal drift field of 500 V/cm and maintained good LAr purity, with a stable electron lifetime of approximately 1.25 ms. This paper describes the detector and supporting systems, summarizes the installation and commissioning, and presents the initial validation of collected NuMI beam and off-beam self-triggers. In addition, it highlights observed interactions in the detector volume, including candidate muon anti-neutrino events.
Submitted 6 September, 2025;
originally announced September 2025.
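
The electron lifetime quoted in the abstract above is conventionally read through an exponential attenuation model, in which a fraction exp(-t/τ) of the ionization charge survives a drift of duration t for lifetime τ. A minimal sketch of that relation follows; the drift times are hypothetical illustration values, since the abstract gives only the lifetime and the drift field.

```python
import math


def surviving_fraction(drift_time_ms, lifetime_ms):
    """Fraction of ionization charge left after drifting for drift_time_ms,
    under the standard exponential attenuation model exp(-t / tau)."""
    if lifetime_ms <= 0:
        raise ValueError("lifetime must be positive")
    return math.exp(-drift_time_ms / lifetime_ms)


LIFETIME_MS = 1.25  # approximate electron lifetime reported in the abstract

# Hypothetical drift times (assumptions for illustration only).
for drift_time_ms in (0.05, 0.1, 0.2):
    frac = surviving_fraction(drift_time_ms, LIFETIME_MS)
    print(f"{drift_time_ms:.2f} ms drift -> {frac:.3f} of the charge survives")
```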
-
Measurement of single charged pion production in charged-current $ν_μ$-Ar interactions with the MicroBooNE detector
Authors:
P. Abratenko,
D. Andrade Aldana,
L. Arellano,
J. Asaadi,
A. Ashkenazi,
S. Balasubramanian,
B. Baller,
A. Barnard,
G. Barr,
D. Barrow,
J. Barrow,
V. Basque,
J. Bateman,
B. Behera,
O. Benevides Rodrigues,
S. Berkman,
A. Bhat,
M. Bhattacharya,
V. Bhelande,
M. Bishai,
A. Blake,
B. Bogart,
T. Bolton,
M. B. Brunetti,
L. Camilleri
, et al. (155 additional authors not shown)
Abstract:
We present flux-averaged charged-current $ν_μ$ cross-section measurements on argon for final states containing exactly one $π^\pm$ and no other hadrons except nucleons. The analysis uses data from the MicroBooNE experiment in the Booster Neutrino Beam, corresponding to $1.11 \times 10^{21}$ protons on target. Total and single-differential cross-section measurements are provided within a phase space restricted to muon momenta above 150 MeV, pion momenta above 100 MeV, and muon-pion opening angles smaller than 2.65 rad. Differential cross sections are reported with respect to the scattering angles of the muon and pion relative to the beam direction, their momenta, and their combined opening angle. The differential cross section with respect to muon momentum is based on a subset of selected events with the muon track fully contained in the detector, whereas the cross section with respect to pion momentum is based on a subset of selected events rich in pions that have not hadronically scattered on the argon before coming to rest. The latter has not been measured on argon before. The total cross section is measured as $(3.75~\pm~0.07~\textrm{(stat.)}~\pm~0.80~\textrm{(syst.)}) \times 10^{-38} \, \text{cm}^2/\text{Ar}$ at a mean energy of approximately 0.8 GeV. Comparisons of the measured cross sections with predictions from multiple neutrino-nucleus interaction generators show good overall agreement, except at very forward muon angles.
Submitted 3 September, 2025;
originally announced September 2025.
-
Deep Learning for Crack Detection: A Review of Learning Paradigms, Generalizability, and Datasets
Authors:
Xinan Zhang,
Haolin Wang,
Yung-An Hsieh,
Zhongyu Yang,
Anthony Yezzi,
Yi-Chang Tsai
Abstract:
Crack detection plays a crucial role in civil infrastructures, including inspection of pavements, buildings, etc., and deep learning has significantly advanced this field in recent years. While numerous technical and review papers exist in this domain, emerging trends are reshaping the landscape. These shifts include transitions in learning paradigms (from fully supervised learning to semi-supervised, weakly-supervised, unsupervised, few-shot, domain adaptation and fine-tuning foundation models), improvements in generalizability (from single-dataset performance to cross-dataset evaluation), and diversification in dataset acquisition (from RGB images to specialized sensor-based data). In this review, we systematically analyze these trends and highlight representative works. Additionally, we introduce a new annotated dataset collected with 3D laser scans, 3DCrack, to support future research and conduct extensive benchmarking experiments to establish baselines for commonly used deep learning methodologies, including recent foundation models. Our findings provide insights into the evolving methodologies and future directions in deep learning-based crack detection. Project page: https://github.com/nantonzhang/Awesome-Crack-Detection
Submitted 16 September, 2025; v1 submitted 13 August, 2025;
originally announced August 2025.
-
Widest Path Games and Maximality Inheritance in Bounded Value Iteration for Stochastic Games
Authors:
Kittiphon Phalakarn,
Yun Chen Tsai,
Ichiro Hasuo
Abstract:
For model checking stochastic games (SGs), bounded value iteration (BVI) algorithms have gained attention as efficient approximate methods with rigorous precision guarantees. However, BVI may not terminate or converge when the target SG contains end components. Most existing approaches address this issue by explicitly detecting and processing end components--a process that is often computationally expensive. An exception is the widest path-based BVI approach previously studied by Phalakarn et al., which we refer to as 1WP-BVI. The method performs particularly well in the presence of numerous end components. Nonetheless, its theoretical foundations remain somewhat ad hoc. In this paper, we identify and formalize the core principles underlying the widest path-based BVI approach by (i) presenting 2WP-BVI, a clean BVI algorithm based on (2-player) widest path games, and (ii) proving its correctness using what we call the maximality inheritance principle--a proof principle previously employed in a well-known result in probabilistic model checking. Our experimental results demonstrate the practical relevance and potential of our proposed 2WP-BVI algorithm.
Submitted 8 August, 2025;
originally announced August 2025.
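
For background on the "widest path" notion the abstract above builds on, here is a minimal sketch of the classical single-player widest path (maximum bottleneck) problem, solved with a Dijkstra-style search. This is not the 2WP-BVI algorithm from the paper, and the graph and function name are illustrative assumptions.

```python
import heapq


def widest_path_widths(graph, source):
    """Best bottleneck width from source to every reachable node.

    graph maps a node to a list of (neighbor, edge_width) pairs.
    The width of a path is the minimum edge width along it.
    """
    best = {source: float("inf")}
    heap = [(-float("inf"), source)]  # max-heap via negated widths
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if width < best.get(node, 0.0):
            continue  # stale entry; a wider path was already found
        for neighbor, edge_width in graph.get(node, []):
            candidate = min(width, edge_width)
            if candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                heapq.heappush(heap, (-candidate, neighbor))
    return best


# Hypothetical graph with edge widths in [0, 1].
example = {
    "s": [("a", 0.9), ("b", 0.4)],
    "a": [("t", 0.5), ("b", 0.7)],
    "b": [("t", 0.8)],
}
print(widest_path_widths(example, "s"))
```

In the example, the widest route to "t" goes through "a" and then "b", because its narrowest edge is wider than the narrowest edge on either direct route.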
-
Benchmarking Quantum and Classical Sequential Models for Urban Telecommunication Forecasting
Authors:
Chi-Sheng Chen,
Samuel Yen-Chi Chen,
Yun-Cheng Tsai
Abstract:
In this study, we evaluate the performance of classical and quantum-inspired sequential models in forecasting univariate time series of incoming SMS activity (SMS-in) using the Milan Telecommunication Activity Dataset. Due to data completeness limitations, we focus exclusively on the SMS-in signal for each spatial grid cell. We compare five models, LSTM (baseline), Quantum LSTM (QLSTM), Quantum Adaptive Self-Attention (QASA), Quantum Receptance Weighted Key-Value (QRWKV), and Quantum Fast Weight Programmers (QFWP), under varying input sequence lengths (4, 8, 12, 16, 32 and 64). All models are trained to predict the next 10-minute SMS-in value based solely on historical values within a given sequence window. Our findings indicate that different models exhibit varying sensitivities to sequence length, suggesting that quantum enhancements are not universally advantageous. Rather, the effectiveness of quantum modules is highly dependent on the specific task and architectural design, reflecting inherent trade-offs among model size, parameterization strategies, and temporal modeling capabilities.
Submitted 22 September, 2025; v1 submitted 6 August, 2025;
originally announced August 2025.
-
GoldMind: A Teacher-Centered Knowledge Management System for Higher Education -- Lessons from Iterative Design
Authors:
Gloria Fernández-Nieto,
Lele Sha,
Yuheng Li,
Yi-Shan Tsai,
Guanliang Chen,
Yinwei Wei,
Weiqing Wang,
Jinchun Wen,
Shaveen Singh,
Ivan Silva,
Yuanfang Li,
Dragan Gašević,
Zachari Swiecki
Abstract:
Designing Knowledge Management Systems (KMSs) for higher education requires addressing complex human-technology interactions, especially where staff turnover and changing roles create ongoing challenges for reusing knowledge. While advances in process mining and Generative AI enable new ways of designing features to support knowledge management, existing KMSs often overlook the realities of educators' workflows, leading to low adoption and limited impact. This paper presents findings from a two-year human-centred design study with 108 higher education teachers, focused on the iterative co-design and evaluation of GoldMind, a KMS supporting in-the-flow knowledge management during digital teaching tasks. Through three design-evaluation cycles, we examined how teachers interacted with the system and how their feedback informed successive refinements. Insights are synthesised across three themes: (1) Technology Lessons from user interaction data, (2) Design Considerations shaped by co-design and usability testing, and (3) Human Factors, including cognitive load and knowledge behaviours, analysed using Epistemic Network Analysis.
Submitted 23 August, 2025; v1 submitted 6 August, 2025;
originally announced August 2025.
-
Capturing and Sharing Know-How through Visual Process Representations: A Human-Centred Approach to Teacher Workflows
Authors:
Gloria Fernández-Nieto,
Vanessa Echeverria,
Yuheng Li,
Yi-Shan Tsai,
Lele Sha,
Guanliang Chen,
Dragan Gasevic,
Zachari Swiecki
Abstract:
Knowledge Management is crucial for capturing and transferring expertise within universities, especially in high staff turnover contexts where expertise loss disrupts teaching. Documenting teachers' workflows is time-intensive and diverts experts from core responsibilities. Sequential Pattern Mining (SPM) leverages log data to identify expert workflows, offering an automated alternative to represent workflows but requiring transformation into intuitive formats for novice educators. This paper introduces Visual Process Representations (VPR), a design approach combining SPM, Knowledge Management processes, and storytelling techniques to convert expert log data into clear visualisations. We detail the design phases and report a study evaluating visual affordances (text lists vs. pictorial-style) and teachers' perceptions of four versions of the VPR with 160 higher education teachers on Prolific. Results indicate improved task performance, usability, and engagement, particularly with enriched visuals, though process memorability and task time improvements were limited. The findings highlight VPR's potential to visualise workflows and support novice educators.
Submitted 6 August, 2025;
originally announced August 2025.
-
MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh
Authors:
Shuangkang Fang,
I-Chao Shen,
Yufeng Wang,
Yi-Hsuan Tsai,
Yi Yang,
Shuchang Zhou,
Wenrui Ding,
Takeo Igarashi,
Ming-Hsuan Yang
Abstract:
We present MeshLLM, a novel framework that leverages large language models (LLMs) to understand and generate text-serialized 3D meshes. Our approach addresses key limitations in existing methods, including the limited dataset scale when catering to LLMs' token length and the loss of 3D structural information during mesh serialization. We introduce a Primitive-Mesh decomposition strategy, which divides 3D meshes into structurally meaningful subunits. This enables the creation of a large-scale dataset with 1500k+ samples, almost 50 times larger than previous methods, which aligns better with the LLM scaling law principles. Furthermore, we propose inferring face connectivity from vertices and local mesh assembly training strategies, significantly enhancing the LLMs' ability to capture mesh topology and spatial structures. Experiments show that MeshLLM outperforms the state-of-the-art LLaMA-Mesh in both mesh generation quality and shape understanding, highlighting its great potential in processing text-serialized 3D meshes.
Submitted 5 August, 2025; v1 submitted 2 August, 2025;
originally announced August 2025.
-
Semi-Classical Asymptotic Expansions for Toeplitz Quantizations on Complex Manifolds and Orbifolds
Authors:
Yi-Hsin Tsai
Abstract:
In this thesis, we introduce complex manifolds with local spectral gaps and study their asymptotic behavior using the scaling method. With these asymptotics, we obtain an asymptotic expansion for the Bergman kernel of a Hermitian holomorphic orbifold line bundle satisfying the local spectral gap condition. Furthermore, we establish the full asymptotic expansion of both the Bergman kernel and the Toeplitz operator, using the observations of the scaled Bergman kernel and the stationary phase formula. In addition, we establish the deformation quantization for Toeplitz operators with pseudodifferential operators.
Submitted 19 July, 2025;
originally announced August 2025.
-
HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels
Authors:
HunyuanWorld Team,
Zhenwei Wang,
Yuhao Liu,
Junta Wu,
Zixiao Gu,
Haoyuan Wang,
Xuhui Zuo,
Tianyu Huang,
Wenhuan Li,
Sheng Zhang,
Yihang Lian,
Yulin Tsai,
Lifu Wang,
Sicong Liu,
Puhua Jiang,
Xianghui Yang,
Dongyuan Guo,
Yixuan Tang,
Xinyue Mao,
Jiaao Yu,
Junlin Yu,
Jihong Zhang,
Meng Chen,
Liang Dong,
Yiwen Jia
, et al. (30 additional authors not shown)
Abstract:
Creating immersive and playable 3D worlds from texts or images remains a fundamental challenge in computer vision and graphics. Existing world generation approaches typically fall into two categories: video-based methods that offer rich diversity but lack 3D consistency and rendering efficiency, and 3D-based methods that provide geometric consistency but struggle with limited training data and memory-inefficient representations. To address these limitations, we present HunyuanWorld 1.0, a novel framework that combines the best of both worlds for generating immersive, explorable, and interactive 3D scenes from text and image conditions. Our approach features three key advantages: 1) 360° immersive experiences via panoramic world proxies; 2) mesh export capabilities for seamless compatibility with existing computer graphics pipelines; 3) disentangled object representations for augmented interactivity. The core of our framework is a semantically layered 3D mesh representation that leverages panoramic images as 360° world proxies for semantic-aware world decomposition and reconstruction, enabling the generation of diverse 3D worlds. Extensive experiments demonstrate that our method achieves state-of-the-art performance in generating coherent, explorable, and interactive 3D worlds while enabling versatile applications in virtual reality, physical simulation, game development, and interactive content creation.
Submitted 13 August, 2025; v1 submitted 29 July, 2025;
originally announced July 2025.
-
On the Testing of complete causal mediation and its applications
Authors:
Yichin Tsai,
Wan-Tzu Chang,
Jia Jyun Sie,
Cathy SJ Fann,
Iebin Lian
Abstract:
The Complete Mediation Test (CMT) is a specialized approach to mediation analysis that assesses whether an independent variable A influences an outcome variable Y exclusively through a mediator M, without any direct effect. An application of CMT lies in Mendelian Randomization (MR) studies, where it can be used to investigate non-pleiotropy, that is, to test whether genetic variants impact a disease outcome solely through their effect on a target exposure variable. Traditionally, CMT has relied on two significance-based criteria and a proportion-based criterion with a heuristic threshold that has not been rigorously evaluated. In this paper, we explore the theoretical properties of conventional CMT and propose using the standardized absolute proportion of mediation (SAPM) as a criterion for CMT. We systematically assess the performance of various CMT criteria via simulation and demonstrate their practical utility in the context of MR studies. Our results indicate that the SAPM offers the best performance. We also propose using different optimal thresholds depending on whether the mediator and outcome are continuous or binary. The SAPM with proper thresholds ensures that the indirect pathway meaningfully accounts for the effect of the exposure on the outcome, thereby strengthening the case for complete mediation.
Submitted 17 July, 2025;
originally announced July 2025.
-
Heterogeneous Bribery, Technology Choice, and Capital Accumulation
Authors:
Jafar M. Olimov,
Yi-Chan Tsai,
Hao-Yu Yang
Abstract:
We study the production, entry, and technological decisions of firms in the presence of bribery. We find that bribery can be justified even in the absence of bureaucratic inefficiencies. We document substantial technology-specific heterogeneity in bribery in 148 countries and incorporate it into a general equilibrium model, where firms use capital-intensive or labor-intensive technology. When bribery more heavily affects less efficient labor-intensive firms, resources move toward more efficient capital-intensive firms, resulting in higher capital accumulation and aggregate output. In poorer countries, the elimination of bribery only for capital-intensive firms increases the capital stock by 18.7% more and the aggregate output by 3.4% more than the complete elimination of bribery. In wealthier countries, the elimination of bribery only for capital-intensive firms increases the capital stock by 44.4% more and the aggregate output by 15.4% more than the complete elimination of bribery. Our findings challenge the established view of bribery as uniformly harmful and demonstrate how the within-country heterogeneity in bribery can explain cross-country differences in income.
Submitted 18 July, 2025; v1 submitted 17 July, 2025;
originally announced July 2025.
-
Quantum-Enhanced Reinforcement Learning with LSTM Forecasting Signals for Optimizing Fintech Trading Decisions
Authors:
Yen-Ku Liu,
Yun-Huei Pan,
Pei-Fan Lu,
Yun-Cheng Tsai,
Samuel Yen-Chi Chen
Abstract:
Financial trading environments are characterized by high volatility, numerous macroeconomic signals, and dynamically shifting market regimes, where traditional reinforcement learning methods often fail to deliver breakthrough performance. In this study, we design a reinforcement learning framework tailored for financial systems by integrating quantum circuits. We compare (1) the performance of classical A3C versus quantum A3C algorithms, and (2) the impact of incorporating LSTM-based predictions of the following week's economic trends on learning outcomes. The experimental framework adopts a custom Gymnasium-compatible trading environment, simulating discrete trading actions and evaluating rewards based on portfolio feedback. Experimental results show that quantum models - especially when combined with predictive signals - demonstrate superior performance and stability under noisy financial conditions, even with shallow quantum circuit depth.
Submitted 17 July, 2025;
originally announced July 2025.
-
Spatial and Temporal Evaluations of the Liquid Argon Purity in ProtoDUNE-SP
Authors:
DUNE Collaboration,
S. Abbaslu,
A. Abed Abud,
R. Acciarri,
L. P. Accorsi,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
C. Adriano,
F. Akbar,
F. Alemanno,
N. S. Alex,
K. Allison,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
A. Aman,
H. Amar,
P. Amedo,
J. Anderson,
D. A. Andrade,
C. Andreopoulos,
M. Andreotti
, et al. (1301 additional authors not shown)
Abstract:
Liquid argon time projection chambers (LArTPCs) rely on highly pure argon to ensure that ionization electrons produced by charged particles reach readout arrays. ProtoDUNE Single-Phase (ProtoDUNE-SP) was an approximately 700-ton liquid argon detector intended to prototype the Deep Underground Neutrino Experiment (DUNE) Far Detector Horizontal Drift module. It contains two drift volumes bisected by the cathode plane assembly, which is biased to create an almost uniform electric field in both volumes. The DUNE Far Detector modules must have robust cryogenic systems capable of filtering argon and supplying the TPC with clean liquid. This paper will explore comparisons of the argon purity measured by the purity monitors with those measured using muons in the TPC from October 2018 to November 2018. A new method is introduced to measure the liquid argon purity in the TPC using muons crossing both drift volumes of ProtoDUNE-SP. For extended periods on the timescale of weeks, the drift electron lifetime was measured to be above 30 ms using both systems. A particular focus will be placed on the measured purity of argon as a function of position in the detector.
Submitted 27 August, 2025; v1 submitted 11 July, 2025;
originally announced July 2025.
-
Constraints on axionlike particles from 16.5 years of Fermi-LAT data and prospects for VLAST
Authors:
Zhi-Qi Guo,
Yue-Lin Sming Tsai,
Lei Wu,
Zi-Qing Xia
Abstract:
Axionlike particles (ALPs), hypothetical particles beyond the Standard Model, are considered promising dark matter candidates. ALPs can convert into photons and vice versa in a magnetic field via the Primakoff effect, potentially generating detectable oscillations in $γ$-ray spectra. This study analyzes 16.5 years of data from the Fermi Large Area Telescope (Fermi-LAT) on NGC 1275, the brightest galaxy in the Perseus cluster, to constrain the ALP parameter space. Our results improve the previous 95% exclusion limits of the photon-ALP coupling $g_{aγ}$ by a factor of 2 in the ALP mass range of $4\times 10^{-10}\,\mathrm{eV}\lesssim m_{a}\lesssim 5\times 10^{-9}\,\mathrm{eV}$. Moreover, we investigate the projected sensitivity of the future Very Large Area $γ$-ray Space Telescope (VLAST) in searching for ALPs. We find that (i) the expected sensitivity on the ALP-photon coupling can be stronger than that from the upcoming International Axion Observatory (IAXO) in the ALP mass range of $2\times 10^{-11}\,\mathrm{eV}\lesssim m_{a}\lesssim 1\times 10^{-7}\,\mathrm{eV}$, with the best sensitivity of $g_{aγ}\sim 7\times 10^{-13}\,\mathrm{GeV^{-1}}$ at $m_{a}\sim 2\times 10^{-10}\,\mathrm{eV}$; (ii) VLAST can extend the sensitivity to ALP masses below $5\times 10^{-12}\,\mathrm{eV}$, where the ALP-photon coupling $g_{aγ}\gtrsim 1.5\times 10^{-11}\,\mathrm{GeV^{-1}}$ will be excluded; (iii) the entire parameter space of ALPs accounting for TeV transparency can be fully tested. These results demonstrate that VLAST will offer an excellent opportunity for ALP searches.
Submitted 19 October, 2025; v1 submitted 10 July, 2025;
originally announced July 2025.
-
EscherNet++: Simultaneous Amodal Completion and Scalable View Synthesis through Masked Fine-Tuning and Enhanced Feed-Forward 3D Reconstruction
Authors:
Xinan Zhang,
Muhammad Zubair Irshad,
Anthony Yezzi,
Yi-Chang Tsai,
Zsolt Kira
Abstract:
We propose EscherNet++, a masked fine-tuned diffusion model that can synthesize novel views of objects in a zero-shot manner with amodal completion ability. Existing approaches utilize multiple stages and complex pipelines to first hallucinate missing parts of the image and then perform novel view synthesis, which fail to consider cross-view dependencies and require redundant storage and computing for separate stages. Instead, we apply masked fine-tuning including input-level and feature-level masking to enable an end-to-end model with the improved ability to synthesize novel views and conduct amodal completion. In addition, we empirically integrate our model with other feed-forward image-to-mesh models without extra training and achieve competitive results with reconstruction time decreased by 95%, thanks to its ability to synthesize arbitrary query views. Our method's scalable nature further enhances fast 3D reconstruction. Despite fine-tuning on a smaller dataset and batch size, our method achieves state-of-the-art results, improving PSNR by 3.9 and Volume IoU by 0.28 on occluded tasks in 10-input settings, while also generalizing to real-world occluded reconstruction.
Submitted 10 July, 2025;
originally announced July 2025.
-
Measurement of charged-current muon neutrino-argon interactions without pions in the final state using the MicroBooNE detector
Authors:
MicroBooNE collaboration,
P. Abratenko,
D. Andrade Aldana,
L. Arellano,
J. Asaadi,
A. Ashkenazi,
S. Balasubramanian,
B. Baller,
A. Barnard,
G. Barr,
D. Barrow,
J. Barrow,
V. Basque,
J. Bateman,
O. Benevides Rodrigues,
S. Berkman,
A. Bhat,
M. Bhattacharya,
M. Bishai,
A. Blake,
B. Bogart,
T. Bolton,
M. B. Brunetti,
L. Camilleri,
D. Caratelli
, et al. (152 additional authors not shown)
Abstract:
We report a new measurement of flux-integrated differential cross sections for charged-current (CC) muon neutrino interactions with argon nuclei that produce no final state pions $(ν_μ\mathrm{CC}0π)$. These interactions are of particular importance as a topologically defined signal dominated by quasielastic-like interactions. This measurement was performed with the MicroBooNE liquid argon time projection chamber detector located at the Fermilab Booster Neutrino Beam (BNB), and uses an exposure of $1.3\times10^{21}$ protons on target collected between 2015 and 2020. The results are presented in terms of single and double-differential cross sections as a function of the final state muon momentum and angle. The data are compared with widely-used neutrino event generators. We find good agreement with the single-differential measurements, while only a subset of generators are also able to adequately describe the data in double-differential distributions. This work facilitates comparison with Cherenkov detector measurements, including those located at the BNB.
Submitted 1 July, 2025;
originally announced July 2025.
-
Operation of the Trigger System for the ICARUS Detector at Fermilab
Authors:
ICARUS collaboration,
F. Abd Alrahman,
P. Abratenko,
N. Abrego-Martinez,
A. Aduszkiewicz,
F. Akbar,
L. Aliaga Soplin,
M. Artero Pons,
J. Asaadi,
W. F. Badgett,
B. Baibussinov,
F. Battisti,
V. Bellini,
R. Benocci,
J. Berger,
S. Berkman,
S. Bertolucci,
M. Betancourt,
A. Blanchet,
F. Boffelli,
M. Bonesini,
T. Boone,
B. Bottino,
A. Braggiotti,
D. Brailsford
, et al. (164 additional authors not shown)
Abstract:
The ICARUS liquid argon TPC detector is taking data on the Booster (BNB) and Main Injector (NuMI) Neutrino beam lines at Fermilab with a trigger system based on the scintillation light produced by charged particles in coincidence with the proton beam extraction from the accelerators. The architecture and the deployment of the trigger system in the first two runs for physics are presented, as well as the triggered event rates. The event recognition efficiency has been evaluated as a function of the deposited energy and the position of cosmic muons stopping inside the detector.
Submitted 5 August, 2025; v1 submitted 25 June, 2025;
originally announced June 2025.
-
Self-Interacting Dark Matter with Mass Segregation: A Unified Explanation of Dwarf Cores and Small-Scale Lenses
Authors:
Daneng Yang,
Yi-Zhong Fan,
Siyuan Hou,
Yue-Lin Sming Tsai
Abstract:
In two-component SIDM models with inter-species interactions, mass segregation arises naturally from collisional relaxation, enhancing central densities and gravothermal evolution without requiring large cross sections. We propose a model with velocity-dependent interactions, both within and between species, that connects observations across several halo mass scales while remaining consistent with cluster-scale constraints. This combination enables modest mass segregation in low-mass and typical-concentration halos, consistent with recent dwarf galaxy clustering measurements. Using cosmological zoom-in simulations and controlled isolated halo studies, we show that this model produces dwarf galaxy cores that grow over time, explains the structure of dark perturbers observed in strong lensing systems, and significantly increases the number and efficiency of small-scale lenses, consistent with the galaxy-galaxy strong lensing excess reported in clusters. Our results establish mass segregation in two-component SIDM as a self-consistent and testable model capable of simultaneously addressing multiple small-scale challenges in structure formation.
Submitted 17 June, 2025;
originally announced June 2025.
-
Comprehensive Verilog Design Problems: A Next-Generation Benchmark Dataset for Evaluating Large Language Models and Agents on RTL Design and Verification
Authors:
Nathaniel Pinckney,
Chenhui Deng,
Chia-Tung Ho,
Yun-Da Tsai,
Mingjie Liu,
Wenfei Zhou,
Brucek Khailany,
Haoxing Ren
Abstract:
We present the Comprehensive Verilog Design Problems (CVDP) benchmark, a new dataset and infrastructure to advance LLM and agent research in hardware design and verification. CVDP includes 783 problems across 13 task categories, covering RTL generation, verification, debugging, specification alignment, and technical Q&A authored by experienced hardware engineers. Problems are offered in both non-agentic and agentic formats. The benchmark introduces more realistic and challenging contexts than prior work, with state-of-the-art models achieving no more than 34% pass@1 on code generation. Agentic tasks, especially those involving RTL reuse and verification, are particularly difficult. Evaluation uses open-source tools and model scoring infrastructure, with comprehension tasks assessed via BLEU and LLM-based judging. CVDP reveals substantial gaps in current model capabilities, underscoring the need for continued research toward robust, real-world hardware design automation.
Submitted 16 June, 2025;
originally announced June 2025.
-
Tree-Based Text Retrieval via Hierarchical Clustering in RAG Frameworks: Application on Taiwanese Regulations
Authors:
Chia-Heng Yu,
Yen-Lung Tsai
Abstract:
Traditional Retrieval-Augmented Generation (RAG) systems employ brute-force inner product search to retrieve the top-k most similar documents, which are then combined with the user query and passed to a language model. This allows the model to access external knowledge and reduce hallucinations. However, selecting an appropriate k value remains a significant challenge in practical applications: a small k may fail to retrieve sufficient information, while a large k can introduce excessive and irrelevant content. To address this, we propose a hierarchical clustering-based retrieval method that eliminates the need to predefine k. Our approach maintains the accuracy and relevance of system responses while adaptively selecting semantically relevant content. In the experiment stage, we applied our method to a Taiwanese legal dataset with expert-graded queries. The results show that our approach achieves superior performance in expert evaluations and maintains high precision while eliminating the need to predefine k, demonstrating improved accuracy and interpretability in legal text retrieval tasks. Our framework is simple to implement and easily integrates with existing RAG pipelines, making it a practical solution for real-world applications under limited resources.
Submitted 16 June, 2025;
originally announced June 2025.
-
Scrutinizing the impact of the solar modulation on AMS-02 antiproton excess
Authors:
Kai-Kai Duan,
Xiao Wang,
Wen-Hao Li,
Zhi-Hui Xu,
Yue-Lin Sming Tsai,
Yi-Zhong Fan
Abstract:
This study examines the impact of solar modulation on the antiproton excess observed by AMS-02, which may indicate dark matter (DM) annihilation. We analyze three solar modulation models: the force-field approximation (FFA), a time-, charge-, and rigidity-dependent FFA, and a three-dimensional numerical simulation based on the Parker transport equation. Based on the latest AMS-02 antiproton data (2025), our results show that the significance of the DM signal is sensitive to the chosen modulation model, with a 2$σ$ signal for the FFA (4$σ$ if including data from H, He, C, O, B/C, and B/O) and a reduced significance for more complex models. We also address systematic uncertainties using two methods: the add-in-quadrature method, which assumes uncorrelated uncertainties between energy bins, and the nuisance parameter method, which treats systematic uncertainties as nuisance parameters during the fitting process. Fitting the scenario of DM annihilation to $b\bar{b}$ to the AMS-02 antiproton data with three different solar modulation models shows that the add-in-quadrature method causes overfitting, whereas the nuisance parameter approach leads to underfitting. Statistically, the signal region of the FFA model using the add-in-quadrature method is the most reliable. This work highlights the need for refined solar modulation models and a better treatment of uncertainties for a conclusive interpretation of the AMS-02 data.
Submitted 10 October, 2025; v1 submitted 16 June, 2025;
originally announced June 2025.
-
Chance and Mass Interpretations of Probabilities in Markov Decision Processes (Extended Version)
Authors:
Yun Chen Tsai,
Kittiphon Phalakarn,
S. Akshay,
Ichiro Hasuo
Abstract:
Markov decision processes (MDPs) are a popular model for decision-making in the presence of uncertainty. The conventional view of MDPs in verification treats them as state transformers with probabilities defined over sequences of states and with schedulers making random choices. An alternative view, especially well-suited for modeling dynamical systems, defines MDPs as distribution transformers with schedulers distributing probability masses. Our main contribution is a unified semantical framework that accommodates these two views and two new ones. These four semantics of MDPs arise naturally through identifying different sources of randomness in an MDP (namely schedulers, configurations, and transitions) and providing different ways of interpreting these probabilities (called the chance and mass interpretations). These semantics are systematically unified through a mathematical construct called chance-mass (CM) classifier. As another main contribution, we study a reachability problem in each of the two new semantics, demonstrating their hardness and providing two algorithms for solving them.
Submitted 24 July, 2025; v1 submitted 12 June, 2025;
originally announced June 2025.
-
CO-VADA: A Confidence-Oriented Voice Augmentation Debiasing Approach for Fair Speech Emotion Recognition
Authors:
Yun-Shao Tsai,
Yi-Cheng Lin,
Huang-Cheng Chou,
Hung-yi Lee
Abstract:
Bias in speech emotion recognition (SER) systems often stems from spurious correlations between speaker characteristics and emotional labels, leading to unfair predictions across demographic groups. Many existing debiasing methods require model-specific changes or demographic annotations, limiting their practical use. We present CO-VADA, a Confidence-Oriented Voice Augmentation Debiasing Approach that mitigates bias without modifying model architecture or relying on demographic information. CO-VADA identifies training samples that reflect bias patterns present in the training data and then applies voice conversion to alter irrelevant attributes and generate samples. These augmented samples introduce speaker variations that differ from dominant patterns in the data, guiding the model to focus more on emotion-relevant features. Our framework is compatible with various SER models and voice conversion tools, making it a scalable and practical solution for improving fairness in SER systems.
Submitted 6 June, 2025;
originally announced June 2025.
-
ScaleRTL: Scaling LLMs with Reasoning Data and Test-Time Compute for Accurate RTL Code Generation
Authors:
Chenhui Deng,
Yun-Da Tsai,
Guan-Ting Liu,
Zhongzhi Yu,
Haoxing Ren
Abstract:
Recent advances in large language models (LLMs) have enabled near-human performance on software coding benchmarks, but their effectiveness in RTL code generation remains limited due to the scarcity of high-quality training data. While prior efforts have fine-tuned LLMs for RTL tasks, they do not fundamentally overcome the data bottleneck and lack support for test-time scaling due to their non-reasoning nature. In this work, we introduce ScaleRTL, the first reasoning LLM for RTL coding that scales up both high-quality reasoning data and test-time compute. Specifically, we curate a diverse set of long chain-of-thought reasoning traces averaging 56K tokens each, resulting in a dataset of 3.5B tokens that captures rich RTL knowledge. Fine-tuning a general-purpose reasoning model on this corpus yields ScaleRTL that is capable of deep RTL reasoning. Subsequently, we further enhance the performance of ScaleRTL through a novel test-time scaling strategy that extends the reasoning process via iteratively reflecting on and self-correcting previous reasoning steps. Experimental results show that ScaleRTL achieves state-of-the-art performance on VerilogEval and RTLLM, outperforming 18 competitive baselines by up to 18.4% on VerilogEval and 12.7% on RTLLM.
Submitted 15 July, 2025; v1 submitted 5 June, 2025;
originally announced June 2025.
-
ScoreRAG: A Retrieval-Augmented Generation Framework with Consistency-Relevance Scoring and Structured Summarization for News Generation
Authors:
Pei-Yun Lin,
Yen-lung Tsai
Abstract:
This research introduces ScoreRAG, an approach to enhance the quality of automated news generation. Despite advancements in Natural Language Processing and large language models, current news generation methods often struggle with hallucinations, factual inconsistencies, and lack of domain-specific expertise when producing news articles. ScoreRAG addresses these challenges through a multi-stage framework combining retrieval-augmented generation, consistency relevance evaluation, and structured summarization. The system first retrieves relevant news documents from a vector database, maps them to complete news items, and assigns consistency relevance scores based on large language model evaluations. These documents are then reranked according to relevance, with low-quality items filtered out. The framework proceeds to generate graded summaries based on relevance scores, which guide the large language model in producing complete news articles following professional journalistic standards. Through this methodical approach, ScoreRAG aims to significantly improve the accuracy, coherence, informativeness, and professionalism of generated news articles while maintaining stability and consistency throughout the generation process. The code and demo are available at: https://github.com/peiyun2260/ScoreRAG.
Submitted 4 June, 2025;
originally announced June 2025.
-
IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation
Authors:
Yuanze Lin,
Yi-Wen Chen,
Yi-Hsuan Tsai,
Ronald Clark,
Ming-Hsuan Yang
Abstract:
Although diffusion-based models can generate high-quality and high-resolution video sequences from textual or image inputs, they lack explicit integration of geometric cues when controlling scene lighting and visual appearance across frames. To address this limitation, we propose IllumiCraft, an end-to-end diffusion framework accepting three complementary inputs: (1) high-dynamic-range (HDR) video maps for detailed lighting control; (2) synthetically relit frames with randomized illumination changes (optionally paired with a static background reference image) to provide appearance cues; and (3) 3D point tracks that capture precise 3D geometry information. By integrating the lighting, appearance, and geometry cues within a unified diffusion architecture, IllumiCraft generates temporally coherent videos aligned with user-defined prompts. It supports background-conditioned and text-conditioned video relighting and provides better fidelity than existing controllable video generation methods. Project Page: https://yuanze-lin.me/IllumiCraft_page
Submitted 3 June, 2025;
originally announced June 2025.
-
Enhancing Interpretability of Quantum-Assisted Blockchain Clustering via AI Agent-Based Qualitative Analysis
Authors:
Yun-Cheng Tsai,
Yen-Ku Liu,
Samuel Yen-Chi Chen
Abstract:
Blockchain transaction data is inherently high-dimensional, noisy, and entangled, posing substantial challenges for traditional clustering algorithms. While quantum-enhanced clustering models have demonstrated promising performance gains, their interpretability remains limited, restricting their application in sensitive domains such as financial fraud detection and blockchain governance. To address this gap, we propose a two-stage analysis framework that synergistically combines quantitative clustering evaluation with AI Agent-assisted qualitative interpretation. In the first stage, we employ classical clustering methods and evaluation metrics including the Silhouette Score, Davies-Bouldin Index, and Calinski-Harabasz Index to determine the optimal cluster count and baseline partition quality. In the second stage, we integrate an AI Agent to generate human-readable, semantic explanations of clustering results, identifying intra-cluster characteristics and inter-cluster relationships. Our experiments reveal that while fully trained Quantum Neural Networks (QNN) outperform random Quantum Features (QF) in quantitative metrics, the AI Agent further uncovers nuanced differences between these methods, notably exposing the singleton cluster phenomenon in QNN-driven models. The consolidated insights from both stages consistently endorse the three-cluster configuration, demonstrating the practical value of our hybrid approach. This work advances the interpretability frontier in quantum-assisted blockchain analytics and lays the groundwork for future autonomous AI-orchestrated clustering frameworks.
Submitted 1 June, 2025;
originally announced June 2025.
-
Low-Rank Head Avatar Personalization with Registers
Authors:
Sai Tanmay Reddy Chakkera,
Aggelina Chatziagapi,
Md Moniruzzaman,
Chen-Ping Yu,
Yi-Hsuan Tsai,
Dimitris Samaras
Abstract:
We introduce a novel method for low-rank personalization of a generic model for head avatar generation. Prior work proposes generic models that achieve high-quality face animation by leveraging large-scale datasets of multiple identities. However, such generic models usually fail to synthesize unique identity-specific details, since they learn a general domain prior. To adapt to specific subjects, we find that it is still challenging to capture high-frequency facial details via popular solutions like low-rank adaptation (LoRA). This motivates us to propose a specific architecture, a Register Module, that enhances the performance of LoRA, while requiring only a small number of parameters to adapt to an unseen identity. Our module is applied to intermediate features of a pre-trained model, storing and re-purposing information in a learnable 3D feature space. To demonstrate the efficacy of our personalization method, we collect a dataset of talking videos of individuals with distinctive facial details, such as wrinkles and tattoos. Our approach faithfully captures unseen faces, outperforming existing methods quantitatively and qualitatively. We will release the code, models, and dataset to the public.
Submitted 2 June, 2025;
originally announced June 2025.
-
EgoVIS@CVPR: What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning
Authors:
Chi-Hsi Kung,
Frangil Ramirez,
Juhyung Ha,
Yi-Ting Chen,
David Crandall,
Yi-Hsuan Tsai
Abstract:
Understanding a procedural activity requires modeling both how action steps transform the scene, and how evolving scene transformations can influence the sequence of action steps, even those that are accidental or erroneous. Yet, existing work on procedure-aware video representations fails to explicitly learn the state changes (scene transformations). In this work, we study procedure-aware video representation learning by incorporating state-change descriptions generated by LLMs as supervision signals for video encoders. Moreover, we generate state-change counterfactuals that simulate hypothesized failure outcomes, allowing models to learn by imagining the unseen "What if" scenarios. This counterfactual reasoning facilitates the model's ability to understand the cause and effect of each step in an activity. To verify the procedure awareness of our model, we conduct extensive experiments on procedure-aware tasks, including temporal action segmentation, error detection, and more. Our results demonstrate the effectiveness of the proposed state-change descriptions and their counterfactuals, and achieve significant improvements on multiple tasks.
Submitted 26 September, 2025; v1 submitted 30 May, 2025;
originally announced June 2025.
-
Robust and Annotation-Free Wound Segmentation on Noisy Real-World Pressure Ulcer Images: Towards Automated DESIGN-R® Assessment
Authors:
Yun-Cheng Tsai
Abstract:
Purpose: Accurate wound segmentation is essential for automated DESIGN-R scoring. However, existing models such as FUSegNet, which are trained primarily on foot ulcer datasets, often fail to generalize to wounds on other body sites.
Methods: We propose an annotation-efficient pipeline that combines a lightweight YOLOv11n-based detector with the pre-trained FUSegNet segmentation model. Instead of relying on pixel-level annotations or retraining for new anatomical regions, our method achieves robust performance using only 500 manually labeled bounding boxes. This zero fine-tuning approach effectively bridges the domain gap and enables direct deployment across diverse wound types. This is an advance not previously demonstrated in the wound segmentation literature.
Results: Evaluated on three real-world test sets spanning foot, sacral, and trochanter wounds, our YOLO plus FUSegNet pipeline improved mean IoU by 23 percentage points over vanilla FUSegNet and increased end-to-end DESIGN-R size estimation accuracy from 71 percent to 94 percent (see Table 3 for details).
Conclusion: Our pipeline generalizes effectively across body sites without task-specific fine-tuning, demonstrating that minimal supervision, with 500 annotated ROIs, is sufficient for scalable, annotation-light wound segmentation. This capability paves the way for real-world DESIGN-R automation, reducing reliance on pixel-wise labeling, streamlining documentation workflows, and supporting objective and consistent wound scoring in clinical practice. We will publicly release the trained detector weights and configuration to promote reproducibility and facilitate downstream deployment.
Submitted 29 May, 2025;
originally announced May 2025.