Showing 1–50 of 43,810 results for author: Huang

  1. arXiv:2506.17211  [pdf, ps, other]

    cs.LG

    BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning

    Authors: Xuechen Zhang, Zijian Huang, Yingcong Li, Chenshun Ni, Jiasi Chen, Samet Oymak

    Abstract: Small language models (SLMs) struggle to learn complex reasoning behaviors, especially when high-quality traces are scarce or difficult to learn from. The standard training approach combines a supervised fine-tuning (SFT) stage, often to distill capabilities of a larger model, followed by a reinforcement learning (RL) stage such as Group Relative Policy Optimization (GRPO). In this paper, we invest…

    Submitted 20 June, 2025; originally announced June 2025.

  2. arXiv:2506.17206  [pdf, ps, other]

    cs.GR cs.CV cs.LG

    DreamCube: 3D Panorama Generation via Multi-plane Synchronization

    Authors: Yukun Huang, Yanning Zhou, Jianan Wang, Kaiyi Huang, Xihui Liu

    Abstract: 3D panorama synthesis is a promising yet challenging task that demands high-quality and diverse visual appearance and geometry of the generated omnidirectional content. Existing methods leverage rich image priors from pre-trained 2D foundation models to circumvent the scarcity of 3D panoramic data, but the incompatibility between 3D panoramas and 2D single views limits their effectiveness. In this…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: Project page: https://yukun-huang.github.io/DreamCube/

  3. arXiv:2506.17110  [pdf, ps, other]

    cs.RO cs.CV

    Monocular One-Shot Metric-Depth Alignment for RGB-Based Robot Grasping

    Authors: Teng Guo, Baichuan Huang, Jingjin Yu

    Abstract: Accurate 6D object pose estimation is a prerequisite for successfully completing robotic prehensile and non-prehensile manipulation tasks. At present, 6D pose estimation for robotic manipulation generally relies on depth sensors based on, e.g., structured light, time-of-flight, and stereo-vision, which can be expensive, produce noisy output (as compared with RGB cameras), and fail to handle transp…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: Accepted to IROS 2025

  4. arXiv:2506.17046  [pdf, ps, other]

    cs.CL cs.LG

    MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models

    Authors: Xiaolong Wang, Zhaolu Kang, Wangyuxuan Zhai, Xinyue Lou, Yunghwei Lai, Ziyue Wang, Yawen Wang, Kaiyu Huang, Yile Wang, Peng Li, Yang Liu

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated significant advances across numerous vision-language tasks. Due to their strong image-text alignment capability, MLLMs can effectively understand image-text pairs with clear meanings. However, effectively resolving the inherent ambiguities in natural language and visual contexts remains challenging. Existing multimodal benchmarks typically…

    Submitted 20 June, 2025; originally announced June 2025.

  5. arXiv:2506.16957  [pdf, ps, other]

    eess.SP

    Wi-Fi Sensing Tool Release: Gathering 802.11ax Channel State Information from a Commercial Wi-Fi Access Point

    Authors: Zisheng Wang, Feng Li, Hangbin Zhao, Zihuan Mao, Yaodong Zhang, Qisheng Huang, Bo Cao, Mingming Cao, Baolin He, Qilin Hou

    Abstract: Wi-Fi sensing has emerged as a powerful technology, leveraging channel state information (CSI) extracted from wireless data packets to enable diverse applications, ranging from human presence detection to gesture recognition and health monitoring. However, support for CSI extraction from commercial Wi-Fi access points is lacking and out of date. This paper introduces ZTECSITool, a toolkit designed to capture high-re…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: This work has been submitted to the IEEE for possible publication

  6. arXiv:2506.16955  [pdf, ps, other]

    nucl-ex astro-ph.IM

    Search for the in-situ production of $^{77}$Ge in the GERDA neutrinoless double-beta decay experiment

    Authors: M. Agostini, A. Alexander, G. Araujo, A. M. Bakalyarov, M. Balata, I. Barabanov, L. Baudis, C. Bauer, S. Belogurov, A. Bettini, L. Bezrukov, V. Biancacci, E. Bossio, V. Bothe, R. Brugnera, A. Caldwell, S. Calgaro, C. Cattadori, A. Chernogorov, P. -J. Chiu, T. Comellato, V. D'Andrea, E. V. Demidova, N. Di Marco, E. Doroshkevich , et al. (86 additional authors not shown)

    Abstract: The beta decay of $^{77}$Ge and $^{77\mathrm{m}}$Ge, both produced by neutron capture on $^{76}$Ge, is a potential background for Germanium based neutrinoless double-beta decay search experiments such as GERDA or the LEGEND experiment. In this work we present a search for $^{77}$Ge decays in the full GERDA Phase II data set. A delayed coincidence method was employed to identify the decay of…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: 11 pages, 7 figures

  7. arXiv:2506.16953  [pdf, ps, other]

    math.CO

    Dimensions of compositions modulo a prime

    Authors: Jia Huang

    Abstract: The (ordinary) representation theory of the symmetric group is fascinating and has rich connections to combinatorics, including the Frobenius correspondence to the self-dual graded Hopf algebra of symmetric functions. The $0$-Hecke algebra (of type $A$) is a deformation of the group algebra of the symmetric group, and its representation theory has an analogous correspondence to the dual graded Hop…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: 18 pages

    MSC Class: 05E10

  8. arXiv:2506.16934  [pdf]

    eess.IV cs.CV

    PET Tracer Separation Using Conditional Diffusion Transformer with Multi-latent Space Learning

    Authors: Bin Huang, Feihong Xu, Xinchong Shi, Shan Huang, Binxuan Li, Fei Li, Qiegen Liu

    Abstract: In clinical practice, single-radiotracer positron emission tomography (PET) is commonly used for imaging. Although multi-tracer PET imaging can provide supplementary information of radiotracers that are sensitive to physiological function changes, enabling a more comprehensive characterization of physiological and pathological states, the gamma-photon pairs generated by positron annihilation react…

    Submitted 20 June, 2025; originally announced June 2025.

  9. arXiv:2506.16922  [pdf, ps, other]

    hep-ph astro-ph.CO astro-ph.HE

    Low-Energy Supernova Constraints on Lepton Flavor Violating Axions

    Authors: Zi-Miao Huang, Zuowei Liu

    Abstract: The extreme conditions within the supernova core, a high-temperature and high-density environment, create an ideal laboratory for the search for new physics beyond the Standard Model. Of particular interest are low-energy supernovae, characterized by their low explosion energies, which place strong constraints on the new-physics energy transfer from the core to the mantle. We compute low-energy su…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: 12 pages, 8 figures

  10. arXiv:2506.16796  [pdf, ps, other]

    cs.CV

    RealSR-R1: Reinforcement Learning for Real-World Image Super-Resolution with Vision-Language Chain-of-Thought

    Authors: Junbo Qiao, Miaomiao Cai, Wei Li, Yutong Liu, Xudong Huang, Gaoqi He, Jiao Xie, Jie Hu, Xinghao Chen, Shaohui Lin

    Abstract: Real-World Image Super-Resolution is one of the most challenging tasks in image restoration. However, existing methods struggle with an accurate understanding of degraded image content, leading to reconstructed results that are both low-fidelity and unnatural. We present RealSR-R1 in this work, which empowers the RealSR models with understanding and reasoning capabilities. Inspired by the success o…

    Submitted 20 June, 2025; originally announced June 2025.

  11. SocialSim: Towards Socialized Simulation of Emotional Support Conversation

    Authors: Zhuang Chen, Yaru Cao, Guanqun Bi, Jincenzi Wu, Jinfeng Zhou, Xiyao Xiao, Si Chen, Hongning Wang, Minlie Huang

    Abstract: Emotional support conversation (ESC) helps reduce people's psychological stress and provide emotional value through interactive dialogues. Due to the high cost of crowdsourcing a large ESC corpus, recent attempts use large language models for dialogue augmentation. However, existing approaches largely overlook the social dynamics inherent in ESC, leading to less effective simulations. In this pape…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: AAAI 2025 Paper #32116 (Without Publication Edits)

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 39(2), 1274-1282, 2025

  12. arXiv:2506.16728  [pdf, ps, other]

    cs.CV

    Few-Shot Generalized Category Discovery With Retrieval-Guided Decision Boundary Enhancement

    Authors: Yunhan Ren, Feng Luo, Siyu Huang

    Abstract: While existing Generalized Category Discovery (GCD) models have achieved significant success, their performance with limited labeled samples and a small number of known categories remains largely unexplored. In this work, we introduce the task of Few-shot Generalized Category Discovery (FSGCD), aiming to achieve competitive performance in GCD tasks under conditions of known information scarcity. T…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: Accepted by ICMR 2025

  13. arXiv:2506.16718  [pdf, ps, other]

    cs.MA cs.AI

    Generalizable Agent Modeling for Agent Collaboration-Competition Adaptation with Multi-Retrieval and Dynamic Generation

    Authors: Chenxu Wang, Yonggang Jin, Cheng Hu, Youpeng Zhao, Zipeng Dai, Jian Zhao, Shiyu Huang, Liuyu Xiang, Junge Zhang, Zhaofeng He

    Abstract: Adapting a single agent to a new multi-agent system brings challenges, necessitating adjustments across various tasks, environments, and interactions with unknown teammates and opponents. Addressing this challenge is highly complex, and researchers have proposed two simplified scenarios, Multi-agent reinforcement learning for zero-shot learning and Ad-Hoc Teamwork. Building on these foundations, w…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: This manuscript is under submission to Neurocomputing

    Report number: NEUCOM-D-25-02272R1

  14. arXiv:2506.16695  [pdf]

    cond-mat.mtrl-sci

    Crystal Growth of Chalcogenides and Oxy-Chalcogenides Using Chloride Exchange Reaction

    Authors: Shantanu Singh, Boyang Zhao, Christopher E. Stevens, Mythili Surendran, Tzu-Chi Huang, Bi-Hsuan Lin, Joshua R. Hendrickson, Jayakanth Ravichandran

    Abstract: Chalcogenides and oxy-chalcogenides, including complex chalcogenides and transition metal dichalcogenides, are emerging semiconductors with direct or indirect band gaps within the visible spectrum. These materials are being explored for various photonic and electronic applications, such as photodetectors, photovoltaics, and phase-change electronics. Understanding the fundamental properties of thes…

    Submitted 19 June, 2025; originally announced June 2025.

  15. arXiv:2506.16691  [pdf, ps, other]

    cs.CV

    LaVi: Efficient Large Vision-Language Models via Internal Feature Modulation

    Authors: Tongtian Yue, Longteng Guo, Yepeng Tang, Zijia Zhao, Xinxin Zhu, Hua Huang, Jing Liu

    Abstract: Despite the impressive advancements of Large Vision-Language Models (LVLMs), existing approaches suffer from a fundamental bottleneck: inefficient visual-language integration. Current methods either disrupt the model's inherent structure or introduce severe long-context computational burden, severely limiting scalability and efficiency. In this paper, we rethink multimodal integration and present…

    Submitted 19 June, 2025; originally announced June 2025.

  16. arXiv:2506.16683  [pdf, ps, other]

    cs.IR cs.AI

    A Simple Contrastive Framework Of Item Tokenization For Generative Recommendation

    Authors: Penglong Zhai, Yifang Yuan, Fanyi Di, Jie Li, Yue Liu, Chen Li, Jie Huang, Sicong Wang, Yao Xu, Xin Li

    Abstract: Generative retrieval-based recommendation has emerged as a promising paradigm aiming at directly generating the identifiers of the target candidates. However, in large-scale recommendation systems, this approach becomes increasingly cumbersome due to the redundancy and sheer scale of the token space. To overcome these limitations, recent research has explored the use of semantic tokens as an alter…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: 12 pages, 7 figures

  17. arXiv:2506.16654  [pdf, ps, other]

    cs.LG cs.AI cs.DB

    Relational Deep Learning: Challenges, Foundations and Next-Generation Architectures

    Authors: Vijay Prakash Dwivedi, Charilaos Kanatsoulis, Shenyang Huang, Jure Leskovec

    Abstract: Graph machine learning has led to a significant increase in the capabilities of models that learn on arbitrary graph-structured data and has been applied to molecules, social networks, recommendation systems, and transportation, among other domains. Data in multi-tabular relational databases can also be constructed as 'relational entity graphs' for Relational Deep Learning (RDL) - a new blueprint…

    Submitted 19 June, 2025; originally announced June 2025.

  18. arXiv:2506.16633  [pdf, ps, other]

    cs.CL cs.AI cs.MM

    GeoGuess: Multimodal Reasoning based on Hierarchy of Visual Information in Street View

    Authors: Fenghua Cheng, Jinxiang Wang, Sen Wang, Zi Huang, Xue Li

    Abstract: Multimodal reasoning is a process of understanding, integrating and inferring information across different data modalities. It has recently attracted surging academic attention as a benchmark for Artificial Intelligence (AI). Although there are various tasks for evaluating multimodal reasoning ability, they still have limitations. Lack of reasoning on hierarchical visual clues at different levels…

    Submitted 19 June, 2025; originally announced June 2025.

  19. arXiv:2506.16595  [pdf]

    cond-mat.mes-hall

    Optimizing Time-resolved Magneto-optical Kerr Effect for High-fidelity Magnetic Characterization

    Authors: Yun Kim, Dingbin Huang, Deyuan Lyu, Haoyue Sun, Jian-Ping Wang, Paul A. Crowell, Xiaojia Wang

    Abstract: Spintronics has emerged as a key technology for fast and non-volatile memory with great CMOS compatibility. As the building blocks for these cutting-edge devices, magnetic materials require precise characterization of their critical properties, such as the effective anisotropy field ($H_{\rm{k,eff}}$, related to magnetic stability) and damping ($α$, a key factor in device energy efficiency). Accurate…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: Submitted to Appl. Phys. Lett. Manuscript: 16 pages, 5 figures; Supplementary Materials: 18 pages, 12 figures

  20. arXiv:2506.16594  [pdf, ps, other]

    cs.CL

    A Scoping Review of Synthetic Data Generation for Biomedical Research and Applications

    Authors: Hanshu Rao, Weisi Liu, Haohan Wang, I-Chan Huang, Zhe He, Xiaolei Huang

    Abstract: Synthetic data generation--mitigating data scarcity, privacy concerns, and data quality challenges in biomedical fields--has been facilitated by rapid advances of large language models (LLMs). This scoping review follows PRISMA-ScR guidelines and synthesizes 59 studies, published between 2020 and 2025 and collected from PubMed, ACM, Web of Science, and Google Scholar. The review systematically exa…

    Submitted 19 June, 2025; originally announced June 2025.

  21. arXiv:2506.16578  [pdf, ps, other]

    cs.CV

    SafeTriage: Facial Video De-identification for Privacy-Preserving Stroke Triage

    Authors: Tongan Cai, Haomiao Ni, Wenchao Ma, Yuan Xue, Qian Ma, Rachel Leicht, Kelvin Wong, John Volpi, Stephen T. C. Wong, James Z. Wang, Sharon X. Huang

    Abstract: Effective stroke triage in emergency settings often relies on clinicians' ability to identify subtle abnormalities in facial muscle coordination. While recent AI models have shown promise in detecting such patterns from patient facial videos, their reliance on real patient data raises significant ethical and privacy challenges -- especially when training robust and generalizable models across inst…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: IPMI 2025

  22. arXiv:2506.16531  [pdf, ps, other]

    cs.CV

    How Hard Is Snow? A Paired Domain Adaptation Dataset for Clear and Snowy Weather: CADC+

    Authors: Mei Qi Tang, Sean Sedwards, Chengjie Huang, Krzysztof Czarnecki

    Abstract: The impact of snowfall on 3D object detection performance remains underexplored. Conducting such an evaluation requires a dataset with sufficient labelled data from both weather conditions, ideally captured in the same driving environment. Current driving datasets with LiDAR point clouds either do not provide enough labelled data in both snowy and clear weather conditions, or rely on de-snowing me…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: IEEE IV 2025

  23. arXiv:2506.16504  [pdf, ps, other]

    cs.CV cs.AI

    Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details

    Authors: Zeqiang Lai, Yunfei Zhao, Haolin Liu, Zibo Zhao, Qingxiang Lin, Huiwen Shi, Xianghui Yang, Mingxin Yang, Shuhui Yang, Yifei Feng, Sheng Zhang, Xin Huang, Di Luo, Fan Yang, Fang Yang, Lifu Wang, Sicong Liu, Yixuan Tang, Yulin Cai, Zebin He, Tian Liu, Yuhong Liu, Jie Jiang, Linus, Jingwei Huang , et al. (1 additional authors not shown)

    Abstract: In this report, we present Hunyuan3D 2.5, a robust suite of 3D diffusion models aimed at generating high-fidelity and detailed textured 3D assets. Hunyuan3D 2.5 follows the two-stage pipeline of its previous version, Hunyuan3D 2.0, while demonstrating substantial advancements in both shape and texture generation. In terms of shape generation, we introduce a new shape foundation model -- LATTICE, which…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: Technical report

  24. arXiv:2506.16481  [pdf, ps, other]

    astro-ph.EP

    SO emission in the dynamically perturbed protoplanetary disks around CQ Tau and MWC 758

    Authors: Francesco Zagaria, Haochang Jiang, Gianni Cataldi, Stefano Facchini, Myriam Benisty, Yuri Aikawa, Sean Andrews, Jaehan Bae, Marcelo Barraza-Alfaro, Pietro Curone, Ian Czekala, Daniele Fasano, Cassandra Hall, Iain Hammond, Jane Huang, John D. Ilee, Andrés F. Izquierdo, Jensen Lawrence, Giuseppe Lodato, François Ménard, Christophe Pinte, Giovanni P. Rosotti, Jochen Stadler, Richard Teague, Leonardo Testi , et al. (3 additional authors not shown)

    Abstract: We report the serendipitous detection of the SO $J_N=6_5-5_4$ (219.949 GHz) rotational transition in archival Atacama Large Millimeter/submillimeter Array (ALMA) observations of the spiral hosting protoplanetary disks around CQ Tau (with $\approx4.9σ$ significance) and MWC 758 (with $\approx3.4σ$ significance). In the former, the SO emission comes in the shape of a ring, arises from the edge of th…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: Accepted for publication in ApJ. 23 pages, 7 figures

  25. arXiv:2506.16447  [pdf, ps, other]

    cs.CR cs.CL

    Probe before You Talk: Towards Black-box Defense against Backdoor Unalignment for Large Language Models

    Authors: Biao Yi, Tiansheng Huang, Sishuo Chen, Tong Li, Zheli Liu, Zhixuan Chu, Yiming Li

    Abstract: Backdoor unalignment attacks against Large Language Models (LLMs) enable the stealthy compromise of safety alignment using a hidden trigger while evading normal safety auditing. These attacks pose significant threats to the applications of LLMs in the real-world Large Language Model as a Service (LLMaaS) setting, where the deployed model is a fully black-box system that can only interact through t…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: Accepted at ICLR 2025

    Journal ref: Proceedings of The Thirteenth International Conference on Learning Representations (ICLR 2025)

  26. arXiv:2506.16398  [pdf, ps, other]

    cs.CV

    HyperPath: Knowledge-Guided Hyperbolic Semantic Hierarchy Modeling for WSI Analysis

    Authors: Peixiang Huang, Yanyan Huang, Weiqin Zhao, Junjun He, Lequan Yu

    Abstract: Pathology is essential for cancer diagnosis, with multiple instance learning (MIL) widely used for whole slide image (WSI) analysis. WSIs exhibit a natural hierarchy -- patches, regions, and slides -- with distinct semantic associations. While some methods attempt to leverage this hierarchy for improved representation, they predominantly rely on Euclidean embeddings, which struggle to fully captur…

    Submitted 19 June, 2025; originally announced June 2025.

  27. arXiv:2506.16381  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems

    Authors: Kexin Huang, Qian Tu, Liwei Fan, Chenchen Yang, Dong Zhang, Shimin Li, Zhaoye Fei, Qinyuan Cheng, Xipeng Qiu

    Abstract: In modern speech synthesis, paralinguistic information--such as a speaker's vocal timbre, emotional state, and dynamic prosody--plays a critical role in conveying nuance beyond mere semantics. Traditional Text-to-Speech (TTS) systems rely on fixed style labels or inserting a speech prompt to control these cues, which severely limits flexibility. Recent attempts seek to employ natural-language inst…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: 19 pages, 9 figures

  28. arXiv:2506.16346  [pdf]

    cond-mat.mtrl-sci physics.app-ph

    Preferred Synthesis of Armchair SnS2 Nanotubes

    Authors: Abid, Luneng Zhao, Ju Huang, Yongjia Zheng, Yuta Sato, Qingyun Lin, Zhen Han, Chunxia Yang, Tianyu Wang, Bill Herve Nduwarugira, Yicheng Ma, Lingfeng Wang, Yige Zheng, Hang Wang, Salman Ullah, Afzal Khan, Qi Zhang, Wenbin Li, Junfeng Gao, Bingfeng Ju, Feng Ding, Yan Li, Kazu Suenaga, Shigeo Maruyama, Huayong Yang , et al. (1 additional authors not shown)

    Abstract: In this work, we present the synthesis of tin disulfide (SnS2) nanotubes (NTs) with a preferred chiral angle. A sacrificial template is used to create channels of boron nitride nanotubes (BNNTs) with an optimized diameter of 4-5 nm, inside of which SnS2 NTs are formed with high yield and structural purity. Atomic resolution imaging and nano-area electron diffraction reveal that these synthesized…

    Submitted 19 June, 2025; originally announced June 2025.

  29. arXiv:2506.16336  [pdf, ps, other]

    cs.RO cs.MA

    Goal-conditioned Hierarchical Reinforcement Learning for Sample-efficient and Safe Autonomous Driving at Intersections

    Authors: Yiou Huang

    Abstract: Reinforcement learning (RL) exhibits remarkable potential in addressing autonomous driving tasks. However, it is difficult to train a sample-efficient and safe policy in complex scenarios. In this article, we propose a novel hierarchical reinforcement learning (HRL) framework with a goal-conditioned collision prediction (GCCP) module. In the hierarchical structure, the GCCP module predicts collisi…

    Submitted 19 June, 2025; originally announced June 2025.

  30. arXiv:2506.16317  [pdf, ps, other]

    hep-ph

    Two loop QCD corrections to $e^+ e^- \to J/ψ+ η_c$ in asymptotic expansion

    Authors: Cong Li, Xu-Dong Huang, Wen-Long Sang

    Abstract: Within the framework of NRQCD, the short-distance coefficients (SDCs) for the process $e^+e^-\to J/ψ+η_c$ have been obtained up to NNLO in asymptotic expansions over $r={16m_c^2}/{s}$ up to $r^{15}$. Although these asymptotic expressions deviate from the full results near the threshold $r= 1$, they provide excellent approximations to the full results for $r<0.8$, with deviations less than…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: 18 pages, 4 figures, 1 table, 1 attached file

  31. arXiv:2506.16305  [pdf, ps, other]

    math.AP math.DG

    A remark for fully non-linear elliptic equations on compact almost Hermitian manifolds

    Authors: Liding Huang

    Abstract: In this paper, we generalize the definition of sub-slope, introduced by Guo-Song, to almost Hermitian manifolds and prove the existence of solutions for a general class of fully non-linear equations on compact almost Hermitian manifolds. As an application, we solve the complex Hessian quotient equation and the deformed Hermitian-Yang-Mills equation in the almost Hermitian setting.

    Submitted 19 June, 2025; originally announced June 2025.

  32. arXiv:2506.16265  [pdf, ps, other]

    cs.CV cs.RO eess.IV physics.geo-ph

    Dense 3D Displacement Estimation for Landslide Monitoring via Fusion of TLS Point Clouds and Embedded RGB Images

    Authors: Zhaoyi Wang, Jemil Avers Butt, Shengyu Huang, Tomislav Medic, Andreas Wieser

    Abstract: Landslide monitoring is essential for understanding geohazards and mitigating associated risks. However, existing point cloud-based methods typically rely on either geometric or radiometric information and often yield sparse or non-3D displacement estimates. In this paper, we propose a hierarchical partition-based coarse-to-fine approach that fuses 3D point clouds and co-registered RGB images to e…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: 20 pages, 16 figures. Preprint under peer review. Example data and code available at [GitHub](https://github.com/zhaoyiww/fusion4landslide)

  33. arXiv:2506.16261  [pdf, ps, other]

    math.AP

    Global well-posedness for 2D compressible radially symmetric Navier-Stokes equations with swirl

    Authors: Xiangdi Huang, Weili Meng

    Abstract: In this paper, we consider the radially symmetric compressible Navier-Stokes equations with swirl in two-dimensional disks, where the shear viscosity coefficient \(μ= \text{const}> 0\), and the bulk one \(λ= ρ^β(β>0)\). When \(β\geq 1\), we prove the global existence and asymptotic behavior of the large strong solutions for initial values that allow for vacuum. One of the key ingredients is to sho…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: 44 pages

    MSC Class: 35Q30; 76N10

  34. arXiv:2506.16250  [pdf, ps, other]

    quant-ph cs.IT

    Graph-Cover-based Characterization of the Bethe Partition Function of Double-Edge Factor Graphs

    Authors: Yuwen Huang, Pascal O. Vontobel

    Abstract: For standard factor graphs (S-FGs) with non-negative real-valued local functions, Vontobel provided a combinatorial characterization of the Bethe approximation of the partition function, also known as the Bethe partition function, using finite graph covers. The proof of this characterization, i.e., the graph-cover theorem for S-FGs, heavily relied on the method of types. In this paper, we study…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: arXiv admin note: substantial text overlap with arXiv:2412.05942

  35. arXiv:2506.16233  [pdf, ps, other]

    astro-ph.GA cs.LG

    Can AI Dream of Unseen Galaxies? Conditional Diffusion Model for Galaxy Morphology Augmentation

    Authors: Chenrui Ma, Zechang Sun, Tao Jing, Zheng Cai, Yuan-Sen Ting, Song Huang, Mingyu Li

    Abstract: Observational astronomy relies on visual feature identification to detect critical astrophysical phenomena. While machine learning (ML) increasingly automates this process, models often struggle with generalization in large-scale surveys due to the limited representativeness of labeled datasets -- whether from simulations or human annotation -- a challenge pronounced for rare yet scientifically va…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: We have submitted to AAS journals. See another independent work for further reference -- Category-based Galaxy Image Generation via Diffusion Models (Fan, Tang et al.). Comments are welcome

  36. arXiv:2506.16211  [pdf, ps, other]

    cs.RO

    ControlVLA: Few-shot Object-centric Adaptation for Pre-trained Vision-Language-Action Models

    Authors: Puhao Li, Yingying Wu, Ziheng Xi, Wanlin Li, Yuzhe Huang, Zhiyuan Zhang, Yinghan Chen, Jianan Wang, Song-Chun Zhu, Tengyu Liu, Siyuan Huang

    Abstract: Learning real-world robotic manipulation is challenging, particularly when limited demonstrations are available. Existing methods for few-shot manipulation often rely on simulation-augmented data or pre-built modules like grasping and pose estimation, which struggle with sim-to-real gaps and lack extensibility. While large-scale imitation pre-training shows promise, adapting these general-purpose…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: Website: https://controlvla.github.io

  37. arXiv:2506.16210  [pdf, ps, other]

    eess.IV cs.CV

    From Coarse to Continuous: Progressive Refinement Implicit Neural Representation for Motion-Robust Anisotropic MRI Reconstruction

    Authors: Zhenxuan Zhang, Lipei Zhang, Yanqi Cheng, Zi Wang, Fanwen Wang, Haosen Zhang, Yue Yang, Yinzhe Wu, Jiahao Huang, Angelica I Aviles-Rivero, Zhifan Gao, Guang Yang, Peter J. Lally

    Abstract: In motion-robust magnetic resonance imaging (MRI), slice-to-volume reconstruction is critical for recovering anatomically consistent 3D brain volumes from 2D slices, especially under accelerated acquisitions or patient motion. However, this task remains challenging due to hierarchical structural disruptions. It includes local detail loss from k-space undersampling, global structural aliasing cause…

    Submitted 19 June, 2025; originally announced June 2025.

  38. arXiv:2506.16136  [pdf, ps, other]

    cs.SE

    Seeing is Fixing: Cross-Modal Reasoning with Multimodal LLMs for Visual Software Issue Fixing

    Authors: Kai Huang, Jian Zhang, Xiaofei Xie, Chunyang Chen

    Abstract: Large language model (LLM)-based automated program repair (APR) techniques have shown promising results in resolving real-world GitHub issue tasks. Existing APR systems are primarily evaluated in unimodal settings (e.g., SWE-bench). However, these autonomous systems struggle to resolve multimodal problem scenarios (e.g., SWE-bench M) due to limitations in interpreting and leveraging visual informa…

    Submitted 19 June, 2025; originally announced June 2025.

  39. arXiv:2506.16112  [pdf, ps, other]

    cs.CV

    AutoV: Learning to Retrieve Visual Prompt for Large Vision-Language Models

    Authors: Yuan Zhang, Chun-Kai Fan, Tao Huang, Ming Lu, Sicheng Yu, Junwen Pan, Kuan Cheng, Qi She, Shanghang Zhang

    Abstract: Inspired by text prompts in large language models (LLMs), visual prompts have been explored to enhance the reasoning capabilities of large vision-language models (LVLMs). Current methods design heuristic visual prompts, such as overlaying a text-query-guided attention heatmap on the original input image. However, designing effective prompts manually is challenging and time-consuming, and it often…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: 19 pages

  40. arXiv:2506.16102  [pdf, ps, other]

    eess.IV cs.CV

    Fast Training-free Perceptual Image Compression

    Authors: Ziran Zhu, Tongda Xu, Minye Huang, Dailan He, Xingtong Ge, Xinjie Zhang, Ling Li, Yan Wang

    Abstract: Training-free perceptual image codecs adopt a pre-trained unconditional generative model during decoding to avoid training a new conditional generative model. However, they heavily rely on diffusion inversion or sample communication, which take from 1 min to an intractable amount of time to decode a single image. In this paper, we propose a training-free algorithm that improves the perceptual quality of any ex…

    Submitted 19 June, 2025; originally announced June 2025.

  41. arXiv:2506.16100  [pdf, ps, other

    hep-ph

    Seesaw Portal to Super Heavy Dark Matter with $Z_3$ Symmetry

    Authors: Cai-Xia Yang, Zhi-Long Han, Fei Huang, Yi Jin, Honglei Li

    Abstract: Right-handed neutrinos $N$ are introduced to explain the origin of the tiny neutrino masses via the seesaw mechanism. As required by the relatively large Yukawa coupling and leptogenesis, the masses of right-handed neutrinos are beyond $10^{9}$ GeV. Such heavy right-handed neutrinos can mediate the production of super heavy dark matter $χ$ via the freeze-in mechanism. In the minimal $Z_2$ symmetric model, the…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: 19 pages, 7 figures

  42. arXiv:2506.16078  [pdf, ps, other

    cs.LG cs.AI cs.CL cs.CR

    Probing the Robustness of Large Language Models Safety to Latent Perturbations

    Authors: Tianle Gu, Kexin Huang, Zongqi Wang, Yixu Wang, Jie Li, Yuanqi Yao, Yang Yao, Yujiu Yang, Yan Teng, Yingchun Wang

    Abstract: Safety alignment is a key requirement for building reliable Artificial General Intelligence. Despite significant advances in safety alignment, we observe that minor latent shifts can still trigger unsafe responses in aligned models. We argue that this stems from the shallow nature of existing alignment methods, which focus on surface-level refusal behaviors without sufficiently altering internal r…

    Submitted 19 June, 2025; originally announced June 2025.

  43. arXiv:2506.16037  [pdf, ps, other

    cs.CL cs.LG

    Enhancing Document-Level Question Answering via Multi-Hop Retrieval-Augmented Generation with LLaMA 3

    Authors: Xinyue Huang, Ziqi Lin, Fang Sun, Wenchao Zhang, Kejian Tong, Yunbo Liu

    Abstract: This paper presents a novel Retrieval-Augmented Generation (RAG) framework tailored for complex question answering tasks, addressing challenges in multi-hop reasoning and contextual understanding across lengthy documents. Built upon LLaMA 3, the framework integrates a dense retrieval module with advanced context fusion and multi-hop reasoning mechanisms, enabling more accurate and coherent respons…

    Submitted 19 June, 2025; originally announced June 2025.

  44. arXiv:2506.16031  [pdf, ps, other

    astro-ph.HE

    Longtime Monitoring of TeV Radio Galaxies with HAWC

    Authors: R. Alfaro, C. Alvarez, E. Anita-Rangel, J. C. Arteaga-Velázquez, D. Avila Rojas, H. A. Ayala Solares, R. Babu, P. Bangale, E. Belmont-Moreno, A. Bernal, K. S. Caballero-Mora, T. Capistrán, A. Carramiñana, F. Carreón, S. Casanova, U. Cotti, J. Cotzomi, S. Coutiño de León, E. De la Fuente, D. Depaoli, P. Desiati, N. Di Lalla, R. Diaz Hernandez, M. A. DuVernois, J. C. Díaz-Vélez, et al. (63 additional authors not shown)

    Abstract: We present the monitoring of the TeV-emitting radio galaxies M87, NGC 1275, 3C 264, and IC 310 with the High Altitude Water Cherenkov Observatory (HAWC) over a period of approximately $7.5$ years. The analysis includes light curves at daily, weekly and monthly time scales for the four sources. We report the detection of gamma-ray emission from M87 with a significance exceeding 5$σ$. Due to its sig…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: 14 pages, 1 table, 7 figures

  45. arXiv:2506.16020  [pdf, ps, other

    cs.SD eess.AS

    VS-Singer: Vision-Guided Stereo Singing Voice Synthesis with Consistency Schrödinger Bridge

    Authors: Zijing Zhao, Kai Wang, Hao Huang, Ying Hu, Liang He, Jichen Yang

    Abstract: To explore the potential advantages of utilizing spatial cues from images for generating stereo singing voices with room reverberation, we introduce VS-Singer, a vision-guided model designed to produce stereo singing voices with room reverberation from scene images. VS-Singer comprises three modules: firstly, a modal interaction network integrates spatial features into text encoding to create a li…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: Accepted by Interspeech 2025

  46. arXiv:2506.16001  [pdf, ps, other

    cs.LG cs.AI

    AutoHFormer: Efficient Hierarchical Autoregressive Transformer for Time Series Prediction

    Authors: Qianru Zhang, Honggang Wen, Ming Li, Dong Huang, Siu-Ming Yiu, Christian S. Jensen, Pietro Liò

    Abstract: Time series forecasting requires architectures that simultaneously achieve three competing objectives: (1) strict temporal causality for reliable predictions, (2) sub-quadratic complexity for practical scalability, and (3) multi-scale pattern recognition for accurate long-horizon forecasting. We introduce AutoHFormer, a hierarchical autoregressive transformer that addresses these challenges throug…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 14 pages

  47. arXiv:2506.15961  [pdf

    cs.DC cs.AI cs.LG

    TrainVerify: Equivalence-Based Verification for Distributed LLM Training

    Authors: Yunchi Lu, Youshan Miao, Cheng Tan, Peng Huang, Yi Zhu, Xian Zhang, Fan Yang

    Abstract: Training large language models (LLMs) at scale requires parallel execution across thousands of devices, incurring enormous computational costs. Yet, these costly distributed training runs are rarely verified, leaving them prone to silent errors and potentially wasting millions of GPU hours. We introduce TrainVerify, a system for verifiable distributed training of LLMs. Given a deep learning model's lo…

    Submitted 18 June, 2025; originally announced June 2025.

  48. arXiv:2506.15956  [pdf, ps, other

    physics.app-ph cond-mat.mes-hall

    Scalable quantum current source on commercial 22-nm CMOS process technology

    Authors: Ajit Dash, Suyash Pati Tripathi, Dimitrios Georgakopoulos, MengKe Feng, Steve Yianni, Ensar Vahapoglu, Md Mamunur Rahman, Shai Bonen, Owen Brace, Jonathan Y. Huang, Wee Han Lim, Kok Wai Chan, Will Gilbert, Arne Laucht, Andrea Morello, Andre Saraiva, Christopher C. Escott, Sorin P. Voinigescu, Andrew S. Dzurak, Tuomo Tanttu

    Abstract: Utilizing quantum effects in nanoscopic devices has in the past mostly been accessible through academic cleanrooms and research foundries. Opening the quantum frontier for wider industrial applications likely requires the scale of well-established complementary metal-oxide-semiconductor (CMOS) foundries for manufacturing transistor-based quantum devices operable above subkelvin temperatures. Here,…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 16 pages, 4 figures, 3 extended data figures, 7 extended data tables

  49. arXiv:2506.15943  [pdf, ps, other

    cs.LG

    On the optimal regret of collaborative personalized linear bandits

    Authors: Bruce Huang, Ruida Zhou, Lin F. Yang, Suhas Diggavi

    Abstract: Stochastic linear bandits are a fundamental model for sequential decision making, where an agent selects a vector-valued action and receives a noisy reward with expected value given by an unknown linear function. Although well studied in the single-agent setting, many real-world scenarios involve multiple agents solving heterogeneous bandit problems, each with a different unknown parameter. Applyi…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 30 pages, 4 figures

  50. arXiv:2506.15873  [pdf, ps, other

    cs.HC

    DeckFlow: Iterative Specification on a Multimodal Generative Canvas

    Authors: Gregory Croisdale, Emily Huang, John Joon Young Chung, Anhong Guo, Xu Wang, Austin Z. Henley, Cyrus Omar

    Abstract: Generative AI promises to allow people to create high-quality personalized media. Although powerful, we identify three fundamental design problems with existing tooling through a literature review. We introduce a multimodal generative AI tool, DeckFlow, to address these problems. First, DeckFlow supports task decomposition by allowing users to maintain multiple interconnected subtasks on an infini…

    Submitted 18 June, 2025; originally announced June 2025.