Showing 201–250 of 43,810 results for author: Huang

  1. arXiv:2506.11130  [pdf, ps, other]

    cs.CL cs.AI cs.SD eess.AS

    A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data

    Authors: Cheng-Kang Chou, Chan-Jan Hsu, Ho-Lam Chung, Liang-Hsuan Tseng, Hsi-Chun Cheng, Yu-Kuan Fu, Kuan Po Huang, Hung-Yi Lee

    Abstract: We propose a self-refining framework that enhances ASR performance with only unlabeled datasets. The process starts with an existing ASR model generating pseudo-labels on unannotated speech, which are then used to train a high-fidelity text-to-speech (TTS) system. Then, synthesized speech text pairs are bootstrapped into the original ASR system, completing the closed-loop self-improvement cycle. W…

    Submitted 16 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

  2. arXiv:2506.11127  [pdf, ps, other]

    cs.CL cs.AI

    GUIRoboTron-Speech: Towards Automated GUI Agents Based on Speech Instructions

    Authors: Wenkang Han, Zhixiong Zeng, Jing Huang, Shu Jiang, Liming Zheng, Longrong Yang, Haibo Qiu, Chang Yao, Jingyuan Chen, Lin Ma

    Abstract: Autonomous agents for Graphical User Interfaces (GUIs) are revolutionizing human-computer interaction, yet their reliance on text-based instructions imposes limitations on accessibility and convenience, particularly in hands-free scenarios. To address this gap, we propose GUIRoboTron-Speech, the first end-to-end autonomous GUI agent that directly accepts speech instructions and on-device screensho…

    Submitted 10 June, 2025; originally announced June 2025.

  3. arXiv:2506.11121  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    SUTA-LM: Bridging Test-Time Adaptation and Language Model Rescoring for Robust ASR

    Authors: Wei-Ping Huang, Guan-Ting Lin, Hung-yi Lee

    Abstract: Despite progress in end-to-end ASR, real-world domain mismatches still cause performance drops, which Test-Time Adaptation (TTA) aims to mitigate by adjusting models during inference. Recent work explores combining TTA with external language models, using techniques like beam search rescoring or generative error correction. In this work, we identify a previously overlooked challenge: TTA can inter…

    Submitted 9 June, 2025; originally announced June 2025.

  4. arXiv:2506.11106  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    Graph-based RAG Enhancement via Global Query Disambiguation and Dependency-Aware Reranking

    Authors: Ningyuan Li, Junrui Liu, Yi Shan, Minghui Huang, Tong Li

    Abstract: Contemporary graph-based retrieval-augmented generation (RAG) methods typically begin by extracting entities from user queries and then leverage pre-constructed knowledge graphs to retrieve related relationships and metadata. However, this pipeline's exclusive reliance on entity-level extraction can lead to the misinterpretation or omission of latent yet critical information and relations. As a re…

    Submitted 7 June, 2025; originally announced June 2025.

  5. arXiv:2506.11104  [pdf, ps, other]

    cs.CL cs.AI

    DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration

    Authors: Hanzhi Zhang, Heng Fan, Kewei Sha, Yan Huang, Yunhe Feng

    Abstract: Long-context understanding is crucial for many NLP applications, yet transformers struggle with efficiency due to the quadratic complexity of self-attention. Sparse attention methods alleviate this cost but often impose static, predefined masks, failing to capture heterogeneous attention patterns. This results in suboptimal token interactions, limiting adaptability and retrieval accuracy in long-s…

    Submitted 6 June, 2025; originally announced June 2025.

  6. arXiv:2506.11094  [pdf, ps, other]

    cs.CL cs.AI cs.CR

    The Scales of Justitia: A Comprehensive Survey on Safety Evaluation of LLMs

    Authors: Songyang Liu, Chaozhuo Li, Jiameng Qiu, Xi Zhang, Feiran Huang, Litian Zhang, Yiming Hei, Philip S. Yu

    Abstract: With the rapid advancement of artificial intelligence technology, Large Language Models (LLMs) have demonstrated remarkable potential in the field of Natural Language Processing (NLP), including areas such as content generation, human-computer interaction, machine translation, and code generation, among others. However, their widespread deployment has also raised significant safety concerns. In re…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: 21 pages, preprint

  7. arXiv:2506.11073  [pdf, ps, other]

    cs.CL cs.AI cs.CV

    CLAIM: Mitigating Multilingual Object Hallucination in Large Vision-Language Models with Cross-Lingual Attention Intervention

    Authors: Zekai Ye, Qiming Li, Xiaocheng Feng, Libo Qin, Yichong Huang, Baohang Li, Kui Jiang, Yang Xiang, Zhirui Zhang, Yunfei Lu, Duyu Tang, Dandan Tu, Bing Qin

    Abstract: Large Vision-Language Models (LVLMs) have demonstrated impressive multimodal abilities but remain prone to multilingual object hallucination, with a higher likelihood of generating responses inconsistent with the visual input when utilizing queries in non-English languages compared to English. Most existing approaches to address these rely on pretraining or fine-tuning, which are resource-intensiv…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: ACL2025 Main

  8. arXiv:2506.11050  [pdf, ps, other]

    cs.LG

    NSW-EPNews: A News-Augmented Benchmark for Electricity Price Forecasting with LLMs

    Authors: Zhaoge Bi, Linghan Huang, Haolin Jin, Qingwen Zeng, Huaming Chen

    Abstract: Electricity price forecasting is a critical component of modern energy-management systems, yet existing approaches heavily rely on numerical histories and ignore contemporaneous textual signals. We introduce NSW-EPNews, the first benchmark that jointly evaluates time-series models and large language models (LLMs) on real-world electricity-price prediction. The dataset includes over 175,000 half-ho…

    Submitted 21 May, 2025; originally announced June 2025.

    Comments: 9 pages' main texts. Submitted to NeurIPS 2025 Datasets and Benchmarks Track

  9. arXiv:2506.11041  [pdf, ps, other]

    cs.LG

    ChemHGNN: A Hierarchical Hypergraph Neural Network for Reaction Virtual Screening and Discovery

    Authors: Xiaobao Huang, Yihong Ma, Anjali Gurajapu, Jules Schleinitz, Zhichun Guo, Sarah E. Reisman, Nitesh V. Chawla

    Abstract: Reaction virtual screening and discovery are fundamental challenges in chemistry and materials science, where traditional graph neural networks (GNNs) struggle to model multi-reactant interactions. In this work, we propose ChemHGNN, a hypergraph neural network (HGNN) framework that effectively captures high-order relationships in reaction networks. Unlike GNNs, which require constructing complete…

    Submitted 21 May, 2025; originally announced June 2025.

  10. arXiv:2506.10981  [pdf, ps, other]

    cs.CV

    SceneCompleter: Dense 3D Scene Completion for Generative Novel View Synthesis

    Authors: Weiliang Chen, Jiayi Bi, Yuanhui Huang, Wenzhao Zheng, Yueqi Duan

    Abstract: Generative models have gained significant attention in novel view synthesis (NVS) by alleviating the reliance on dense multi-view captures. However, existing methods typically fall into a conventional paradigm, where generative models first complete missing areas in 2D, followed by 3D recovery techniques to reconstruct the scene, which often results in overly smooth surfaces and distorted geometry…

    Submitted 12 June, 2025; originally announced June 2025.

  11. arXiv:2506.10966  [pdf, ps, other]

    cs.RO

    GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation

    Authors: Ning Gao, Yilun Chen, Shuai Yang, Xinyi Chen, Yang Tian, Hao Li, Haifeng Huang, Hanqing Wang, Tai Wang, Jiangmiao Pang

    Abstract: Robotic manipulation in real-world settings remains challenging, especially regarding robust generalization. Existing simulation platforms lack sufficient support for exploring how policies adapt to varied instructions and scenarios. Thus, they lag behind the growing interest in instruction-following foundation models like LLMs, whose adaptability is crucial yet remains underexplored in fair compa…

    Submitted 12 June, 2025; originally announced June 2025.

  12. arXiv:2506.10965  [pdf, ps, other]

    cond-mat.str-el cond-mat.mes-hall

    Apparent inconsistency between Streda formula and Hall conductivity in reentrant integer quantum anomalous Hall effect in twisted MoTe$_2$

    Authors: Yi Huang, Seth Musser, Jihang Zhu, Yang-Zhi Chou, Sankar Das Sarma

    Abstract: Recent experiments in twisted bilayer MoTe$_2$ (tMoTe$_2$) have uncovered a rich landscape of correlated phases. In this work, we investigate the reentrant integer quantum anomalous Hall (RIQAH) states reported in F. Xu et al., arXiv:2504.06972, which display a notable mismatch between the Hall conductivity measured via transport and that inferred from the Streda formula. We argue that this disc…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: 8 pages, 3 figures

  13. arXiv:2506.10962  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    SpectralAR: Spectral Autoregressive Visual Generation

    Authors: Yuanhui Huang, Weiliang Chen, Wenzhao Zheng, Yueqi Duan, Jie Zhou, Jiwen Lu

    Abstract: Autoregressive visual generation has garnered increasing attention due to its scalability and compatibility with other modalities compared with diffusion models. Most existing methods construct visual sequences as spatial patches for autoregressive generation. However, image patches are inherently parallel, contradicting the causal nature of autoregressive modeling. To address this, we propose a S…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: Project Page: https://huang-yh.github.io/spectralar/

  14. arXiv:2506.10937  [pdf, ps, other]

    cond-mat.mtrl-sci hep-lat

    Physics-informed Machine Learning Analysis for Nanoscale Grain Mapping by Synchrotron Laue Microdiffraction

    Authors: Ka Hung Chan, Xinyue Huang, Nobumichi Tamura, Xian Chen

    Abstract: Understanding the grain morphology, orientation distribution, and crystal structure of nanocrystals is essential for optimizing the mechanical and physical properties of functional materials. Synchrotron X-ray Laue microdiffraction is a powerful technique for characterizing crystal structures and orientation mapping using focused X-rays. However, when grain sizes are smaller than the beam size, mi…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: 8 pages, 5 figures

  15. arXiv:2506.10915  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    M4V: Multi-Modal Mamba for Text-to-Video Generation

    Authors: Jiancheng Huang, Gengwei Zhang, Zequn Jie, Siyu Jiao, Yinlong Qian, Ling Chen, Yunchao Wei, Lin Ma

    Abstract: Text-to-video generation has significantly enriched content creation and holds the potential to evolve into powerful world simulators. However, modeling the vast spatiotemporal space remains computationally demanding, particularly when employing Transformers, which incur quadratic complexity in sequence processing and thus limit practical applications. Recent advancements in linear-time sequence m…

    Submitted 12 June, 2025; originally announced June 2025.

  16. arXiv:2506.10887  [pdf, ps, other]

    cs.CL cs.LG

    Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers

    Authors: Yixiao Huang, Hanlin Zhu, Tianyu Guo, Jiantao Jiao, Somayeh Sojoudi, Michael I. Jordan, Stuart Russell, Song Mei

    Abstract: Large language models (LLMs) can acquire new knowledge through fine-tuning, but this process exhibits a puzzling duality: models can generalize remarkably from new facts, yet are also prone to hallucinating incorrect information. However, the reasons for this phenomenon remain poorly understood. In this work, we argue that both behaviors stem from a single mechanism known as out-of-context reasoni…

    Submitted 12 June, 2025; originally announced June 2025.

  17. arXiv:2506.10857  [pdf, ps, other]

    cs.CV cs.AI cs.MM

    VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos

    Authors: Jiashuo Yu, Yue Wu, Meng Chu, Zhifei Ren, Zizheng Huang, Pei Chu, Ruijie Zhang, Yinan He, Qirui Li, Songze Li, Zhenxiang Li, Zhongying Tu, Conghui He, Yu Qiao, Yali Wang, Yi Wang, Limin Wang

    Abstract: We present VRBench, the first long narrative video benchmark crafted for evaluating large models' multi-step reasoning capabilities, addressing limitations in existing evaluations that overlook temporal reasoning and procedural validity. It comprises 1,010 long videos (with an average duration of 1.6 hours), along with 9,468 human-labeled multi-step question-answering pairs and 30,292 reasoning st…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: Technical Report

  18. arXiv:2506.10754  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    BNMusic: Blending Environmental Noises into Personalized Music

    Authors: Chi Zuo, Martin B. Møller, Pablo Martínez-Nuevo, Huayang Huang, Yu Wu, Ye Zhu

    Abstract: Acoustic masking is a conventional audio-engineering technique for reducing the annoyance of environmental noises by covering them up with other dominant yet less intrusive sounds. However, misalignment between the dominant sound and the noise, such as mismatched downbeats, often requires an excessive volume increase to achieve effective masking. Motivate…

    Submitted 12 June, 2025; originally announced June 2025.

  19. arXiv:2506.10730  [pdf, ps, other]

    cs.CV

    IQE-CLIP: Instance-aware Query Embedding for Zero-/Few-shot Anomaly Detection in Medical Domain

    Authors: Hong Huang, Weixiang Sun, Zhijian Wu, Jingwen Niu, Donghuan Lu, Xian Wu, Yefeng Zheng

    Abstract: Recently, rapid advances in vision-language models such as CLIP have led to significant progress in zero-/few-shot anomaly detection (ZFSAD) tasks. However, most existing CLIP-based ZFSAD methods commonly assume prior knowledge of categories and rely on carefully crafted prompts tailored to specific scenarios. While such meticulously designed text prompts effectively capture semantic inform…

    Submitted 20 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

  20. arXiv:2506.10692  [pdf, ps, other]

    quant-ph

    Observation of High-Order Quantum Pancharatnam-Berry Phase with Structured Photons

    Authors: Shuang-Yin Huang, He Jiang, Zhi-Cheng Ren, Zi-Mo Cheng, Wen-Zheng Zhu, Jing Gao, Chang Liu, Xi-Lin Wang, Hui-Tian Wang

    Abstract: When a quantum system evolves so that it returns to its initial state, it will acquire a geometric phase acting as a memory of the transformation of a physical system, which has been experimentally measured in a variety of physical systems. In optics, the most prominent example is the Pancharatnam-Berry (PB) phase. Recent technological advances in phase and polarization structure have led to the d…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: 6 pages, 4 figures, Accepted by Fundamental Research

  21. arXiv:2506.10639  [pdf, ps, other]

    cs.CV

    GigaVideo-1: Advancing Video Generation via Automatic Feedback with 4 GPU-Hours Fine-Tuning

    Authors: Xiaoyi Bao, Jindi Lv, Xiaofeng Wang, Zheng Zhu, Xinze Chen, YuKun Zhou, Jiancheng Lv, Xingang Wang, Guan Huang

    Abstract: Recent progress in diffusion models has greatly enhanced video generation quality, yet these models still require fine-tuning to improve specific dimensions like instance preservation, motion rationality, composition, and physical plausibility. Existing fine-tuning approaches often rely on human annotations and large-scale computational resources, limiting their practicality. In this work, we prop…

    Submitted 12 June, 2025; originally announced June 2025.

  22. arXiv:2506.10594  [pdf, ps, other]

    cs.CV

    Hierarchical Error Assessment of CAD Models for Aircraft Manufacturing-and-Measurement

    Authors: Jin Huang, Honghua Chen, Mingqiang Wei

    Abstract: The most essential feature of aviation equipment is high quality, including high performance, high stability and high reliability. In this paper, we propose a novel hierarchical error assessment framework for aircraft CAD models within a manufacturing-and-measurement platform, termed HEA-MM. HEA-MM employs structured light scanners to obtain comprehensive 3D measurements of manufactured workpieces…

    Submitted 12 June, 2025; originally announced June 2025.

  23. arXiv:2506.10592  [pdf]

    cond-mat.mes-hall physics.app-ph

    A Taylor Series Approximation Model for Characterizing the Output Resistance of a GFET

    Authors: Xiomara Ribero-Figueroa, Anibal Pacheco-Sanchez, Tzu-Jung Huang, David Jiménez, Ivan Puchades, Reydezel Torres-Torres

    Abstract: The mobility-degradation-based model for the drain-to-source or output resistance of a graphene field-effect-transistor is linearized here using a Taylor series approximation. This simplification is shown to be valid from magnitudes of the gate voltage not significantly higher than the Dirac voltage, and it enables the analytical determination of the transconductance parameter, the voltage related…

    Submitted 12 June, 2025; originally announced June 2025.

    Journal ref: IEEE Transactions on Electron Devices, vol. 71, no. 11, pp. 7204-7207, Nov. 2024

  24. arXiv:2506.10580  [pdf, ps, other]

    cs.GR cs.CV

    Transformer IMU Calibrator: Dynamic On-body IMU Calibration for Inertial Motion Capture

    Authors: Chengxu Zuo, Jiawei Huang, Xiao Jiang, Yuan Yao, Xiangren Shi, Rui Cao, Xinyu Yi, Feng Xu, Shihui Guo, Yipeng Qin

    Abstract: In this paper, we propose a novel dynamic calibration method for sparse inertial motion capture systems, which is the first to break the restrictive absolute static assumption in IMU calibration, i.e., the coordinate drift $R_{G'G}$ and measurement offset $R_{BS}$ remain constant during the entire motion, thereby significantly expanding their application scenarios. Specifically, we achieve real-time estimat…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: Accepted by SIGGRAPH 2025 (TOG)

  25. arXiv:2506.10521  [pdf, ps, other]

    cs.AI cs.CL

    Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning

    Authors: Yuhao Zhou, Yiheng Wang, Xuming He, Ruoyao Xiao, Zhiwei Li, Qiantai Feng, Zijie Guo, Yuejin Yang, Hao Wu, Wenxuan Huang, Jiaqi Wei, Dan Si, Xiuqi Yao, Jia Bu, Haiwen Huang, Tianfan Fu, Shixiang Tang, Ben Fei, Dongzhan Zhou, Fenghua Ling, Yan Lu, Siqi Sun, Chenhui Li, Guanjie Zheng, Jiancheng Lv, et al. (2 additional authors not shown)

    Abstract: Scientific discoveries increasingly rely on complex multimodal reasoning based on information-intensive scientific data and domain-specific expertise. Empowered by expert-level scientific benchmarks, scientific Multimodal Large Language Models (MLLMs) hold the potential to significantly enhance this discovery process in realistic workflows. However, current scientific benchmarks mostly focus on ev…

    Submitted 12 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

    Comments: 82 pages

  26. arXiv:2506.10520  [pdf, ps, other]

    cs.IR cs.LG

    Macro Graph of Experts for Billion-Scale Multi-Task Recommendation

    Authors: Hongyu Yao, Zijin Hong, Hao Chen, Yuanchen Bei, Zhiqing Li, Qijie Shen, Zuobin Ying, Huan Gong, Feiran Huang

    Abstract: Graph-based multi-task learning at billion-scale presents a significant challenge, as different tasks correspond to distinct billion-scale graphs. Traditional multi-task learning methods often neglect these graph structures, relying solely on individual user and item embeddings. However, disregarding graph structures overlooks substantial potential for improving performance. In this paper, we intr…

    Submitted 12 June, 2025; originally announced June 2025.

  27. arXiv:2506.10511  [pdf, ps, other]

    math.PR

    Uniqueness and dimension for the geodesic of the critical long-range percolation metric

    Authors: Jian Ding, Zherui Fan, Lu-Jing Huang

    Abstract: By recent works of Bäumler [2] and of the authors of this paper [5], the (limiting) random metric for the critical long-range percolation was constructed. In this paper, we prove the uniqueness of the geodesic between two fixed points, for which an important ingredient of independent interest is the continuity of the metric distribution. In addition, we establish the Hausdorff dimension of the geo…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: 18 pages, 5 figures

    MSC Class: 60K35; 82B27; 82B43

  28. arXiv:2506.10508  [pdf, other]

    cs.CL cs.AI

    Reliable Reasoning Path: Distilling Effective Guidance for LLM Reasoning with Knowledge Graphs

    Authors: Yilin Xiao, Chuang Zhou, Qinggang Zhang, Bo Li, Qing Li, Xiao Huang

    Abstract: Large language models (LLMs) often struggle with knowledge-intensive tasks due to a lack of background knowledge and a tendency to hallucinate. To address these limitations, integrating knowledge graphs (KGs) with LLMs has been intensively studied. Existing KG-enhanced LLMs focus on supplementary factual knowledge, but still struggle with solving complex questions. We argue that refining the relat…

    Submitted 12 June, 2025; originally announced June 2025.

  29. arXiv:2506.10507  [pdf, ps, other]

    cs.GR cs.CV

    Edit360: 2D Image Edits to 3D Assets from Any Angle

    Authors: Junchao Huang, Xinting Hu, Zhuotao Tian, Shaoshuai Shi, Li Jiang

    Abstract: Recent advances in diffusion models have significantly improved image generation and editing, but extending these capabilities to 3D assets remains challenging, especially for fine-grained edits that require multi-view consistency. Existing methods typically restrict editing to predetermined viewing angles, severely limiting their flexibility and practical applications. We introduce Edit360, a tun…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: 11 pages, 9 figures

  30. arXiv:2506.10505  [pdf, ps, other]

    cs.CV

    J-DDL: Surface Damage Detection and Localization System for Fighter Aircraft

    Authors: Jin Huang, Mingqiang Wei, Zikuan Li, Hangyu Qu, Wei Zhao, Xinyu Bai

    Abstract: Ensuring the safety and extended operational life of fighter aircraft necessitates frequent and exhaustive inspections. While surface defect detection is feasible for human inspectors, manual methods face critical limitations in scalability, efficiency, and consistency due to the vast surface area, structural complexity, and operational demands of aircraft maintenance. We propose a smart surface d…

    Submitted 12 June, 2025; originally announced June 2025.

  31. arXiv:2506.10465  [pdf, ps, other]

    cs.CV

    MedSeg-R: Reasoning Segmentation in Medical Images with Multimodal Large Language Models

    Authors: Yu Huang, Zelin Peng, Yichen Zhao, Piao Yang, Xiaokang Yang, Wei Shen

    Abstract: Medical image segmentation is crucial for clinical diagnosis, yet existing models are limited by their reliance on explicit human instructions and lack the active reasoning capabilities to understand complex clinical questions. While recent advancements in multimodal large language models (MLLMs) have improved medical question-answering (QA) tasks, most methods struggle to generate precise segment…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: †: Equal contribution

  32. arXiv:2506.10403  [pdf, ps, other]

    cs.LG cs.AI

    Time To Impeach LLM-as-a-Judge: Programs are the Future of Evaluation

    Authors: Tzu-Heng Huang, Harit Vishwakarma, Frederic Sala

    Abstract: Large language models (LLMs) are widely used to evaluate the quality of LLM generations and responses, but this leads to significant challenges: high API costs, uncertain reliability, inflexible pipelines, and inherent biases. To address these, we introduce PAJAMA (Program-As-a-Judge for Automated Model Assessment), a new alternative that uses LLMs to synthesize executable judging programs instead…

    Submitted 12 June, 2025; originally announced June 2025.

  33. arXiv:2506.10395  [pdf, ps, other]

    cs.CV cs.AI

    Pisces: An Auto-regressive Foundation Model for Image Understanding and Generation

    Authors: Zhiyang Xu, Jiuhai Chen, Zhaojiang Lin, Xichen Pan, Lifu Huang, Tianyi Zhou, Madian Khabsa, Qifan Wang, Di Jin, Michihiro Yasunaga, Lili Yu, Xi Victoria Lin, Shaoliang Nie

    Abstract: Recent advances in large language models (LLMs) have enabled multimodal foundation models to tackle both image understanding and generation within a unified framework. Despite these gains, unified models often underperform compared to specialized models in either task. A key challenge in developing unified models lies in the inherent differences between the visual features needed for image underst…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: Unified image understanding and generation model

  34. arXiv:2506.10364  [pdf, ps, other]

    cs.LG cs.CL cs.CR

    Can We Infer Confidential Properties of Training Data from LLMs?

    Authors: Pengrun Huang, Chhavi Yadav, Ruihan Wu, Kamalika Chaudhuri

    Abstract: Large language models (LLMs) are increasingly fine-tuned on domain-specific datasets to support applications in fields such as healthcare, finance, and law. These fine-tuning datasets often have sensitive and confidential dataset-level properties -- such as patient demographics or disease prevalence -- that are not intended to be revealed. While prior work has studied property inference attacks on…

    Submitted 15 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

  35. arXiv:2506.10362  [pdf, ps, other]

    eess.SP

    Relaxation-Free Min-k-Partition for PCI Assignment in 5G Networks

    Authors: Yeqing Qiu, Chengpiao Huang, Ye Xue, Zhipeng Jiang, Qingjiang Shi, Dong Zhang, Zhi-Quan Luo

    Abstract: Physical Cell Identity (PCI) is a critical parameter in 5G networks. Efficient and accurate PCI assignment is essential for mitigating mod-3 interference, mod-30 interference, collisions, and confusions among cells, which directly affect network reliability and user experience. In this paper, we propose a novel framework for PCI assignment by decomposing the problem into Min-3-Partition, Min-10-Pa…

    Submitted 13 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

  36. arXiv:2506.10353  [pdf, ps, other]

    cs.CV

    Motion-R1: Chain-of-Thought Reasoning and Reinforcement Learning for Human Motion Generation

    Authors: Runqi Ouyang, Haoyun Li, Zhenyuan Zhang, Xiaofeng Wang, Zheng Zhu, Guan Huang, Xingang Wang

    Abstract: Recent advances in large language models, especially in natural language understanding and reasoning, have opened new possibilities for text-to-motion generation. Although existing approaches have made notable progress in semantic alignment and motion synthesis, they often rely on end-to-end mapping strategies that fail to capture deep linguistic structures and logical reasoning. Consequently, gen…

    Submitted 16 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

  37. arXiv:2506.10335  [pdf, ps, other]

    cs.CV

    PointGS: Point Attention-Aware Sparse View Synthesis with Gaussian Splatting

    Authors: Lintao Xiang, Hongpei Zheng, Yating Huang, Qijun Yang, Hujun Yin

    Abstract: 3D Gaussian splatting (3DGS) is an innovative rendering technique that surpasses the neural radiance field (NeRF) in both rendering speed and visual quality by leveraging an explicit 3D scene representation. Existing 3DGS approaches require a large number of calibrated views to generate a consistent and complete scene representation. When input views are limited, 3DGS tends to overfit the training…

    Submitted 12 June, 2025; originally announced June 2025.

  38. arXiv:2506.10316  [pdf, ps, other]

    hep-ex

    Search for sub-GeV invisible particles in inclusive decays of $J/ψ$ to $φ$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, et al. (704 additional authors not shown)

    Abstract: A search for an invisible particle, $X$, with a mass between 0 and 0.96 $\textrm{GeV}/\textit{c}^{2}$, is performed in the process $J/ψ\rightarrowφ+ X$ using $(8774.0\pm39.4)\times10^{6}$ $J/ψ$ events collected with the BESIII detector from 2017 to 2019. The $φ$ meson is fully reconstructed and an efficient veto of photons, neutral and charged hadrons up to twice the $K_L^0$ mass is applied to the…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: 10 pages, 3 figures

  39. arXiv:2506.10315  [pdf, ps, other]

    cs.LG

    PyLO: Towards Accessible Learned Optimizers in PyTorch

    Authors: Paul Janson, Benjamin Therien, Quentin Anthony, Xiaolong Huang, Abhinav Moudgil, Eugene Belilovsky

    Abstract: Learned optimizers have been an active research topic over the past decade, with increasing progress toward practical, general-purpose optimizers that can serve as drop-in replacements for widely used methods like Adam. However, recent advances -- such as VeLO, which was meta-trained for 4000 TPU-months -- remain largely inaccessible to the broader community, in part due to their reliance on JAX a…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Accepted at ICML CODEML Workshop 2025

  40. arXiv:2506.10308  [pdf, ps, other]

    quant-ph cond-mat.str-el physics.chem-ph physics.comp-ph

    Coupled Lindblad pseudomode theory for simulating open quantum systems

    Authors: Zhen Huang, Gunhee Park, Garnet Kin-Lic Chan, Lin Lin

    Abstract: Coupled Lindblad pseudomode theory is a promising approach for simulating non-Markovian quantum dynamics on both classical and quantum platforms, with dynamics that can be realized as a quantum channel. We provide theoretical evidence that the number of coupled pseudomodes only needs to scale as $\mathrm{polylog}(T/\varepsilon)$ in the simulation time $T$ and precision $\varepsilon$. Inspired by t…

    Submitted 11 June, 2025; originally announced June 2025.

  41. arXiv:2506.10291  [pdf, ps, other]

    eess.SY

    Learning-Based Stable Optimal Control for Infinite-Time Nonlinear Regulation Problems

    Authors: Han Wang, Di Wu, Lin Cheng, Shengping Gong, Xu Huang

    Abstract: Infinite-time nonlinear optimal regulation control is widely utilized in aerospace engineering as a systematic method for synthesizing stable controllers. However, conventional methods often rely on a linearization hypothesis, while recent learning-based approaches rarely consider stability guarantees. This paper proposes a learning-based framework to learn a stable optimal controller for nonlinear…

    Submitted 11 June, 2025; originally announced June 2025.

  42. arXiv:2506.10264  [pdf, ps, other]

    cs.AI

    WGSR-Bench: Wargame-based Game-theoretic Strategic Reasoning Benchmark for Large Language Models

    Authors: Qiyue Yin, Pei Xu, Qiaozhe Li, Shengda Liu, Shengqi Shen, Tong Wang, Yihong Han, Xiaonan Zhao, Likun Yang, Shiyue Cao, Shiyu Qiu, Yuxuan Liu, Shizhao Yu, Lei Cui, Chengxin Yan, Jie Sun, Xiangquan Tang, Kaiqi Huang

    Abstract: Recent breakthroughs in Large Language Models (LLMs) have led to a qualitative leap in artificial intelligence's performance on reasoning tasks, particularly demonstrating remarkable capabilities in mathematical, symbolic, and commonsense reasoning. However, as a critical component of advanced human cognition, strategic reasoning, i.e., the ability to assess multi-agent behaviors in dynamic envir…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: 15 pages, 17 figures

  43. arXiv:2506.10191  [pdf, ps, other]

    quant-ph cond-mat.other physics.app-ph

    Constructive interference at the edge of quantum ergodic dynamics

    Authors: Dmitry A. Abanin, Rajeev Acharya, Laleh Aghababaie-Beni, Georg Aigeldinger, Ashok Ajoy, Ross Alcaraz, Igor Aleiner, Trond I. Andersen, Markus Ansmann, Frank Arute, Kunal Arya, Abraham Asfaw, Nikita Astrakhantsev, Juan Atalaya, Ryan Babbush, Dave Bacon, Brian Ballard, Joseph C. Bardin, Christian Bengs, Andreas Bengtsson, Alexander Bilmes, Sergio Boixo, Gina Bortoli, Alexandre Bourassa, Jenna Bovaird, et al. (240 additional authors not shown)

    Abstract: Quantum observables in the form of few-point correlators are the key to characterizing the dynamics of quantum many-body systems. In dynamics with fast entanglement generation, quantum observables generally become insensitive to the details of the underlying dynamics at long times due to the effects of scrambling. In experimental systems, repeated time-reversal protocols have been successfully imp…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: See following link: https://zenodo.org/records/15640503, which includes: Circuits used in Fig. 3d, Fig. 3e, Fig. 4a, Fig. 4b of the main text. In addition, OTOC (C^(2)) circuits and data with 95, 40 and 31 qubits are also provided. For system sizes <= 40 qubits, we include exact simulation results. For system sizes > 40, we include experimental data

  44. Rethinking Brain Tumor Segmentation from the Frequency Domain Perspective

    Authors: Minye Shao, Zeyu Wang, Haoran Duan, Yawen Huang, Bing Zhai, Shizheng Wang, Yang Long, Yefeng Zheng

    Abstract: Precise segmentation of brain tumors, particularly contrast-enhancing regions visible in post-contrast MRI (areas highlighted by contrast agent injection), is crucial for accurate clinical diagnosis and treatment planning but remains challenging. However, current methods exhibit notable performance degradation in segmenting these enhancing brain tumor areas, largely due to insufficient considerati…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Accepted by IEEE Transactions on Medical Imaging

  45. arXiv:2506.10128  [pdf, ps, other]

    cs.CV cs.LG

    ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs

    Authors: Xiyao Wang, Zhengyuan Yang, Chao Feng, Yongyuan Liang, Yuhang Zhou, Xiaoyu Liu, Ziyi Zang, Ming Li, Chung-Ching Lin, Kevin Lin, Linjie Li, Furong Huang, Lijuan Wang

    Abstract: Reinforcement learning (RL) has shown great effectiveness for fine-tuning large language models (LLMs) using tasks that are challenging yet easily verifiable, such as math reasoning or code generation. However, extending this success to visual perception in vision-language models (VLMs) has been impeded by the scarcity of vision-centric tasks that are simultaneously challenging and unambiguously v…

    Submitted 11 June, 2025; originally announced June 2025.

  46. arXiv:2506.10092  [pdf, ps, other]

    cs.DB

    GPU Acceleration of SQL Analytics on Compressed Data

    Authors: Zezhou Huang, Krystian Sakowski, Hans Lehnert, Wei Cui, Carlo Curino, Matteo Interlandi, Marius Dumitru, Rathijit Sen

    Abstract: GPUs are uniquely suited to accelerate (SQL) analytics workloads thanks to their massive compute parallelism and High Bandwidth Memory (HBM) -- when datasets fit in the GPU HBM, performance is unparalleled. Unfortunately, GPU HBMs remain typically small when compared with lower-bandwidth CPU main memory. Besides brute-force scaling across many GPUs, current solutions to accelerate queries on large…

    Submitted 11 June, 2025; originally announced June 2025.

  47. arXiv:2506.10082  [pdf, ps, other]

    cs.CV

    LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning

    Authors: Chenjian Gao, Lihe Ding, Xin Cai, Zhanpeng Huang, Zibin Wang, Tianfan Xue

    Abstract: Video editing using diffusion models has achieved remarkable results in generating high-quality edits for videos. However, current methods often rely on large-scale pretraining, limiting flexibility for specific edits. First-frame-guided editing provides control over the first frame, but lacks flexibility over subsequent frames. To address this, we propose a mask-based LoRA (Low-Rank Adaptation) t…

    Submitted 19 June, 2025; v1 submitted 11 June, 2025; originally announced June 2025.

    Comments: 12 pages

  48. arXiv:2506.10027  [pdf, ps, other]

    cs.GR cs.CV cs.LG

    Learning-based density-equalizing map

    Authors: Yanwen Huang, Lok Ming Lui, Gary P. T. Choi

    Abstract: Density-equalizing map (DEM) serves as a powerful technique for creating shape deformations with the area changes reflecting an underlying density function. In recent decades, DEM has found widespread applications in fields such as data visualization, geometry processing, and medical imaging. Traditional approaches to DEM primarily rely on iterative numerical solvers for diffusion equations or opt…

    Submitted 9 June, 2025; originally announced June 2025.

  49. arXiv:2506.10019  [pdf, ps, other]

    cs.CL cs.AI cs.CV cs.LG

    A Survey of Automatic Evaluation Methods on Text, Visual and Speech Generations

    Authors: Tian Lan, Yang-Hao Zhou, Zi-Ao Ma, Fanshu Sun, Rui-Qing Sun, Junyu Luo, Rong-Cheng Tu, Heyan Huang, Chen Xu, Zhijing Wu, Xian-Ling Mao

    Abstract: Recent advances in deep learning have significantly enhanced generative AI capabilities across text, images, and audio. However, automatically evaluating the quality of these generated outputs presents ongoing challenges. Although numerous automatic evaluation methods exist, current research lacks a systematic framework that comprehensively organizes these methods across text, visual, and audio mo…

    Submitted 6 June, 2025; originally announced June 2025.

  50. arXiv:2506.09994  [pdf, ps, other]

    cs.RO cs.AI

    eFlesh: Highly customizable Magnetic Touch Sensing using Cut-Cell Microstructures

    Authors: Venkatesh Pattabiraman, Zizhou Huang, Daniele Panozzo, Denis Zorin, Lerrel Pinto, Raunaq Bhirangi

    Abstract: If human experience is any guide, operating effectively in unstructured environments -- like homes and offices -- requires robots to sense the forces during physical interaction. Yet, the lack of a versatile, accessible, and easily customizable tactile sensor has led to fragmented, sensor-specific solutions in robotic manipulation -- and in many cases, to force-unaware, sensorless approaches. With…

    Submitted 11 June, 2025; originally announced June 2025.