
Showing 1–50 of 1,591 results for author: Wu, K

  1. arXiv:2507.04789  [pdf, ps, other]

    cs.RO

    Training-free Generation of Temporally Consistent Rewards from VLMs

    Authors: Yinuo Zhao, Jiale Yuan, Zhiyuan Xu, Xiaoshuai Hao, Xinyi Zhang, Kun Wu, Zhengping Che, Chi Harold Liu, Jian Tang

    Abstract: Recent advances in vision-language models (VLMs) have significantly improved performance in embodied tasks such as goal decomposition and visual comprehension. However, providing accurate rewards for robotic manipulation without fine-tuning VLMs remains challenging due to the absence of domain-specific robotic knowledge in pre-trained datasets and high computational costs that hinder real-time app…

    Submitted 7 July, 2025; originally announced July 2025.

  2. arXiv:2507.04455  [pdf, ps, other]

    cs.CL

    GradOT: Training-free Gradient-preserving Offsite-tuning for Large Language Models

    Authors: Kai Yao, Zhaorui Tan, Penglei Gao, Lichun Li, Kaixin Wu, Yinggui Wang, Yuan Zhao, Yixin Ji, Wei Wang, Jianke Zhu

    Abstract: The rapid growth of large language models (LLMs) with traditional centralized fine-tuning emerges as a key technique for adapting these models to domain-specific challenges, yielding privacy risks for both model and data owners. One promising solution, called offsite-tuning (OT), is proposed to address these challenges, where a weaker emulator is compressed from the original model and further fine…

    Submitted 6 July, 2025; originally announced July 2025.

    Comments: Accepted by ACL 2025 main

  3. arXiv:2507.04118  [pdf, ps, other]

    cs.CV

    PromptSR: Cascade Prompting for Lightweight Image Super-Resolution

    Authors: Wenyang Liu, Chen Cai, Jianjun Gao, Kejun Wu, Yi Wang, Kim-Hui Yap, Lap-Pui Chau

    Abstract: Although the lightweight Vision Transformer has significantly advanced image super-resolution (SR), it faces the inherent challenge of a limited receptive field due to the window-based self-attention modeling. The quadratic computational complexity relative to window size restricts its ability to use a large window size for expanding the receptive field while maintaining low computational costs. T…

    Submitted 5 July, 2025; originally announced July 2025.

    Comments: Accepted in TMM

  4. arXiv:2507.03243  [pdf, ps, other]

    cs.HC

    Beyond Charging Anxiety: An Explainable Approach to Understanding User Preferences of EV Charging Stations Using Review Data

    Authors: Zifei Wang, Emmanuel Abolarin, Kai Wu, Venkatarao Rebba, Jian Hu, Zhen Hu, Shan Bao, Feng Zhou

    Abstract: Electric vehicles (EVs) charging infrastructure is directly related to the overall EV user experience and thus impacts the widespread adoption of EVs. Understanding key factors that affect EV users' charging experience is essential for building a robust and user-friendly EV charging infrastructure. This study leverages about 17,000 charging station (CS) reviews on Google Maps to explore EV user…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: 19 pages, 8 figures

  5. arXiv:2507.01192  [pdf, ps, other]

    cs.CC

    PCPP-Based Reconfiguration Inapproximability: Query Complexity vs. Soundness Gap Trade-offs

    Authors: Venkatesan Guruswami, Xuandi Ren, Kewen Wu

    Abstract: The Reconfiguration Inapproximability Hypothesis (RIH), recently established by Hirahara-Ohsaka (STOC'24) and Karthik-Manurangsi (ECCC'24), studies the hardness of reconfiguring one solution into another in constraint satisfaction problems (CSP) when restricted to approximate intermediate solutions. In this work, we make a tighter connection between RIH's soundness gap and that of probabilisticall…

    Submitted 1 July, 2025; originally announced July 2025.

  6. arXiv:2507.01059  [pdf, ps, other]

    cs.MA cs.AI cs.CL cs.CV cs.RO

    Automated Vehicles Should be Connected with Natural Language

    Authors: Xiangbo Gao, Keshu Wu, Hao Zhang, Kexin Tian, Yang Zhou, Zhengzhong Tu

    Abstract: Multi-agent collaborative driving promises improvements in traffic safety and efficiency through collective perception and decision making. However, existing communication media -- including raw sensor data, neural network features, and perception results -- suffer limitations in bandwidth efficiency, information completeness, and agent interoperability. Moreover, traditional approaches have large…

    Submitted 29 June, 2025; originally announced July 2025.

  7. arXiv:2506.21123  [pdf, ps, other]

    eess.SP

    Characterization of Rydberg-Atom Signal Reception of Dual-Frequency Signals Coupled with Two Energy Levels

    Authors: Hao Wu, Chongwu Xie, Xinyuan Yao, Kang-Da Wu, Shanchi Wu, Rui Ni, Guo-Yong Xiang, Chen Gong

    Abstract: Rydberg atomic sensors have been adopted for novel radio frequency (RF) measurement technique and the sensing capability for signals in multiple frequencies makes it attractive for multi-user communication. However, unlike traditional antennas where the signals in multiple frequencies are orthogonal, the received signals of atomic sensors corresponding to different energy levels will be downconver…

    Submitted 26 June, 2025; originally announced June 2025.

  8. arXiv:2506.20707  [pdf, ps, other]

    astro-ph.CO

    SPT-3G D1: CMB temperature and polarization power spectra and cosmology from 2019 and 2020 observations of the SPT-3G Main field

    Authors: E. Camphuis, W. Quan, L. Balkenhol, A. R. Khalife, F. Ge, F. Guidi, N. Huang, G. P. Lynch, Y. Omori, C. Trendafilova, A. J. Anderson, B. Ansarinejad, M. Archipley, P. S. Barry, K. Benabed, A. N. Bender, B. A. Benson, F. Bianchini, L. E. Bleem, F. R. Bouchet, L. Bryant, M. G. Campitiello, J. E. Carlstrom, C. L. Chang, P. Chaubal, et al. (72 additional authors not shown)

    Abstract: We present measurements of the temperature and E-mode polarization angular power spectra of the cosmic microwave background (CMB) from observations of 4% of the sky with SPT-3G, the current camera on the South Pole Telescope (SPT). The maps used in this analysis are the deepest used in a CMB TT/TE/EE analysis to date. The maps and resulting power spectra have been validated through blind and unbli…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: The manuscript contains 83 pages, 42 figures, and 11 tables

  9. arXiv:2506.20535  [pdf, ps, other]

    cs.DC cs.AI cs.LG

    WattsOnAI: Measuring, Analyzing, and Visualizing Energy and Carbon Footprint of AI Workloads

    Authors: Hongzhen Huang, Kunming Zhang, Hanlong Liao, Kui Wu, Guoming Tang

    Abstract: The rapid advancement of AI, particularly large language models (LLMs), has raised significant concerns about the energy use and carbon emissions associated with model training and inference. However, existing tools for measuring and reporting such impacts are often fragmented, lacking systematic metric integration and offering limited support for correlation analysis among them. This paper presen…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: 11 pages, 7 figures and 5 tables

  10. arXiv:2506.19684  [pdf, ps, other]

    eess.SP

    Beyond 200 Gb/s/lane: An Analytical Approach to Optimal Detection in Shaped IM-DD Optical Links with Relative Intensity Noise

    Authors: Felipe Villenas, Kaiquan Wu, Yunus Can Gültekin, Jamal Riani, Alex Alvarado

    Abstract: Next-generation intensity-modulation (IM) and direct-detection (DD) systems used in data centers are expected to operate at 400 Gb/s/lane and beyond. Such rates can be achieved by increasing the system bandwidth or the modulation format, which in turn requires maintaining or increasing the signal-to-noise ratio (SNR). Such SNR requirements can be achieved by increasing the transmitted optical powe…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: preprint

  11. arXiv:2506.19627  [pdf, ps, other]

    eess.SP

    On Error Rate Approximations for FSO Systems with Weak Turbulence and Pointing Errors

    Authors: Carmen Álvarez Roa, Yunus Can Gültekin, Kaiquan Wu, Cornelis Willem Korevaar, Alex Alvarado

    Abstract: Atmospheric attenuation, atmospheric turbulence, geometric spread, and pointing errors, degrade the performance of free-space optical transmission. In the weak turbulence regime, the probability density function describing the distribution of the channel fading coefficient that models these four effects is known in the literature. This function is an integral equation, which makes it difficult to…

    Submitted 24 June, 2025; originally announced June 2025.

  12. arXiv:2506.19283  [pdf, ps, other]

    cs.CV cs.AI cs.RO

    AirV2X: Unified Air-Ground Vehicle-to-Everything Collaboration

    Authors: Xiangbo Gao, Yuheng Wu, Fengze Yang, Xuewen Luo, Keshu Wu, Xinghao Chen, Yuping Wang, Chenxi Liu, Yang Zhou, Zhengzhong Tu

    Abstract: While multi-vehicular collaborative driving demonstrates clear advantages over single-vehicle autonomy, traditional infrastructure-based V2X systems remain constrained by substantial deployment costs and the creation of "uncovered danger zones" in rural and suburban areas. We present AirV2X-Perception, a large-scale dataset that leverages Unmanned Aerial Vehicles (UAVs) as a flexible alternative o…

    Submitted 2 July, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

  13. arXiv:2506.18512  [pdf, ps, other]

    eess.IV cs.CL cs.CV q-bio.QM

    MedTVT-R1: A Multimodal LLM Empowering Medical Reasoning and Diagnosis

    Authors: Yuting Zhang, Kaishen Yuan, Hao Lu, Yutao Yue, Jintai Chen, Kaishun Wu

    Abstract: Accurate and interpretable multi-disease diagnosis remains a critical challenge in medical research, particularly when leveraging heterogeneous multimodal medical data. Current approaches often rely on single-modal data, limiting their ability to comprehensively understand complex diseases. To address this, we propose MedTVT-R1, a novel Multimodal Large Language Model (MLLM) framework designed to…

    Submitted 23 June, 2025; originally announced June 2025.

  14. arXiv:2506.16160  [pdf, ps, other]

    cs.CV

    Align the GAP: Prior-based Unified Multi-Task Remote Physiological Measurement Framework For Domain Generalization and Personalization

    Authors: Jiyao Wang, Xiao Yang, Hao Lu, Dengbo He, Kaishun Wu

    Abstract: Multi-source synsemantic domain generalization (MSSDG) for multi-task remote physiological measurement seeks to enhance the generalizability of these metrics and attracts increasing attention. However, challenges like partial labeling and environmental noise may disrupt task-specific accuracy. Meanwhile, given that real-time adaptation is necessary for personalized products, the test-time personal…

    Submitted 19 June, 2025; originally announced June 2025.

  15. arXiv:2506.12854  [pdf, other]

    astro-ph.EP astro-ph.IM astro-ph.SR

    A Far-Infrared Search for Planet Nine Using AKARI All-Sky Survey

    Authors: Amos Y. -A. Chen, Tomotsugu Goto, Issei Yamamura, Takao Nakagawa, Cossas K. -W. Wu, Terry Long Phan, Tetsuya Hashimoto, Yuri Uno, Simon C. -C. Ho, Seong Jin Kim

    Abstract: An unusual orbital element clustering of Kuiper belt objects (KBOs) has been observed. The most promising dynamic solution is the presence of a giant planet in the outer Solar system, Planet Nine. However, due to its extreme distance, intensive searches in optical have not been successful. We aim to find Planet Nine in the far-infrared, where it has the peak of the black body radiation, using the…

    Submitted 15 June, 2025; originally announced June 2025.

    Comments: 12 pages, 8 figures, 2 tables. Accepted for publication in Publications of the Astronomical Society of Australia. For more information, see https://docs.google.com/document/d/1WUfVFAe-cWPEq4YJVCMagR3PaN_dudgWsGs9oZnVvQc/edit?usp=sharing

  16. arXiv:2506.10729  [pdf]

    cond-mat.mes-hall

    Construction of Kondo Chains by Engineering Porphyrin π-Radicals on Au(111)

    Authors: Yan Zhao, Kaiyue Jiang, Peng-Yi Liu, Ruoning Li, Jie Li, Xin Li, Xinchen Fang, Anjing Zhao, Yutong Zhu, Hongxiang Xu, Ting Chen, Dong Wang, Xiaodong Zhuang, Shimin Hou, Kai Wu, Song Gao, Qing-Feng Sun, Yajie Zhang, Yongfeng Wang

    Abstract: Quantum manipulation of molecular radical spins provides a crucial platform for exploring emergent phenomena in many-body systems. Here, we combine surface-confined synthesis with scanning tunneling microscopy (STM) tip-induced dehydrogenation to achieve atom-precise engineering of quasi-one-dimensional porphyrin-based Kondo chains (1-7 units) on Au(111). Key design innovations leverage large-size…

    Submitted 12 June, 2025; originally announced June 2025.

  17. arXiv:2506.10155  [pdf]

    cs.CL cs.AI cs.LG

    Measuring Corporate Human Capital Disclosures: Lexicon, Data, Code, and Research Opportunities

    Authors: Elizabeth Demers, Victor Xiaoqi Wang, Kean Wu

    Abstract: Human capital (HC) is increasingly important to corporate value creation. Unlike other assets, however, HC is not currently subject to well-defined measurement or disclosure rules. We use a machine learning algorithm (word2vec) trained on a confirmed set of HC disclosures to develop a comprehensive list of HC-related keywords classified into five subcategories (DEI; health and safety; labor relati…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: 50 pages, 6 figures, 5 tables

    Journal ref: Journal of Information Systems 38 (2024) 163-186

  18. arXiv:2506.09482  [pdf, ps, other]

    cs.CV

    Marrying Autoregressive Transformer and Diffusion with Multi-Reference Autoregression

    Authors: Dingcheng Zhen, Qian Qiao, Tan Yu, Kangxi Wu, Ziwei Zhang, Siyuan Liu, Shunshun Yin, Ming Tao

    Abstract: We introduce TransDiff, the first image generation model that marries Autoregressive (AR) Transformer with diffusion models. In this joint modeling framework, TransDiff encodes labels and images into high-level semantic features and employs a diffusion model to estimate the distribution of image samples. On the ImageNet 256x256 benchmark, TransDiff significantly outperforms other image generation…

    Submitted 15 June, 2025; v1 submitted 11 June, 2025; originally announced June 2025.

  19. arXiv:2506.09124  [pdf, ps, other]

    hep-th hep-ph

    The geometric bookkeeping guide to Feynman integral reduction and $\varepsilon$-factorised differential equations

    Authors: Iris Bree, Federico Gasparotto, Antonela Matijašić, Pouria Mazloumi, Dmytro Melnichenko, Sebastian Pögel, Toni Teschke, Xing Wang, Stefan Weinzierl, Konglong Wu, Xiaofeng Xu

    Abstract: We report on three improvements in the context of Feynman integral reduction and $\varepsilon$-factorised differential equations: Firstly, we show that with a specific choice of prefactors, we trivialise the $\varepsilon$-dependence of the integration-by-parts identities. Secondly, we observe that with a specific choice of order relation in the Laporta algorithm, we directly obtain a basis of mast…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: 8 pages

  20. arXiv:2506.08822  [pdf, ps, other]

    cs.RO cs.AI

    FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency

    Authors: Yifei Su, Ning Liu, Dong Chen, Zhen Zhao, Kun Wu, Meng Li, Zhiyuan Xu, Zhengping Che, Jian Tang

    Abstract: Generative modeling-based visuomotor policies have been widely adopted in robotic manipulation attributed to their ability to model multimodal action distributions. However, the high inference cost of multi-step sampling limits their applicability in real-time robotic systems. To address this issue, existing approaches accelerate the sampling process in generative modeling-based visuomotor policie…

    Submitted 10 June, 2025; originally announced June 2025.

  21. arXiv:2506.07179  [pdf, ps, other]

    cs.LG cs.AI

    Regularized Adaptive Graph Learning for Large-Scale Traffic Forecasting

    Authors: Kaiqi Wu, Weiyang Kong, Sen Zhang, Yubao Liu, Zitong Chen

    Abstract: Traffic prediction is a critical task in spatial-temporal forecasting with broad applications in travel planning and urban management. Adaptive graph convolution networks have emerged as mainstream solutions due to their ability to learn node embeddings in a data-driven manner and capture complex latent dependencies. However, existing adaptive graph learning methods for traffic forecasting often e…

    Submitted 8 June, 2025; originally announced June 2025.

  22. arXiv:2506.06644  [pdf, ps, other]

    cs.LG stat.ML

    Spark Transformer: Reactivating Sparsity in FFN and Attention

    Authors: Chong You, Kan Wu, Zhipeng Jia, Lin Chen, Srinadh Bhojanapalli, Jiaxian Guo, Utku Evci, Jan Wassenberg, Praneeth Netrapalli, Jeremiah J. Willcock, Suvinay Subramanian, Felix Chern, Alek Andreev, Shreya Pathak, Felix Yu, Prateek Jain, David E. Culler, Henry M. Levy, Sanjiv Kumar

    Abstract: The discovery of the lazy neuron phenomenon in trained Transformers, where the vast majority of neurons in their feed-forward networks (FFN) are inactive for each token, has spurred tremendous interests in activation sparsity for enhancing large model efficiency. While notable progress has been made in translating such sparsity to wall-time benefits, modern Transformers have moved away from the Re…

    Submitted 6 June, 2025; originally announced June 2025.

  23. arXiv:2506.05630  [pdf, ps, other]

    astro-ph.HE

    X-ray Polarization Detection of the Pulsar Wind Nebula in G21.5-0.9 with IXPE

    Authors: Niccolò Di Lalla, Nicola Omodei, Niccolò Bucciantini, Jack T. Dinsmore, Nicolò Cibrario, Stefano Silvestri, Josephine Wong, Patrick Slane, Tsunefumi Mizuno, Michela Negro, Roger W. Romani, Riccardo Ferrazzoli, Stephen Chi-Yung Ng, Miltiadis Michailidis, Yi-Jung Yang, Fei Xie, Martin C. Weisskopf, Philip Kaaret, Iván Agudo, L. A. Antonelli, Matteo Bachetti, Luca Baldini, Wayne H. Baumgartner, Ronaldo Bellazzini, Stefano Bianchi, et al. (76 additional authors not shown)

    Abstract: We present the X-ray polarization observation of G21.5-0.9, a young Galactic supernova remnant (SNR), conducted with the Imaging X-ray Polarimetry Explorer (IXPE) in October 2023, with a total livetime of approximately 837 ks. Using different analysis methods, such as a space-integrated study of the entire region of the PWN and a space-resolved polarization map, we detect significant polarization…

    Submitted 5 June, 2025; originally announced June 2025.

  24. arXiv:2506.04941  [pdf, ps, other]

    cs.RO

    ArtVIP: Articulated Digital Assets of Visual Realism, Modular Interaction, and Physical Fidelity for Robot Learning

    Authors: Zhao Jin, Zhengping Che, Zhen Zhao, Kun Wu, Yuheng Zhang, Yinuo Zhao, Zehui Liu, Qiang Zhang, Xiaozhu Ju, Jing Tian, Yousong Xue, Jian Tang

    Abstract: Robot learning increasingly relies on simulation to advance complex ability such as dexterous manipulations and precise interactions, necessitating high-quality digital assets to bridge the sim-to-real gap. However, existing open-source articulated-object datasets for simulation are limited by insufficient visual realism and low physical fidelity, which hinder their utility for training models mas…

    Submitted 5 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

  25. arXiv:2506.04562  [pdf, other]

    cs.GR cs.CV

    Handle-based Mesh Deformation Guided By Vision Language Model

    Authors: Xingpeng Sun, Shiyang Jia, Zherong Pan, Kui Wu, Aniket Bera

    Abstract: Mesh deformation is a fundamental tool in 3D content manipulation. Despite extensive prior research, existing approaches often suffer from low output quality, require significant manual tuning, or depend on data-intensive training. To address these limitations, we introduce a training-free, handle-based mesh deformation method. Our core idea is to leverage a Vision-Language Model (VLM) to interp…

    Submitted 4 June, 2025; originally announced June 2025.

  26. arXiv:2506.03574  [pdf, ps, other]

    cs.RO

    SwitchVLA: Execution-Aware Task Switching for Vision-Language-Action Models

    Authors: Meng Li, Zhen Zhao, Zhengping Che, Fei Liao, Kun Wu, Zhiyuan Xu, Pei Ren, Zhao Jin, Ning Liu, Jian Tang

    Abstract: Robots deployed in dynamic environments must be able to not only follow diverse language instructions but flexibly adapt when user intent changes mid-execution. While recent Vision-Language-Action (VLA) models have advanced multi-task learning and instruction following, they typically assume static task intent, failing to respond when new instructions arrive during ongoing execution. This limitati…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Website: https://switchvla.github.io

  27. arXiv:2506.02917  [pdf, other]

    cs.RO

    Text-guided Generation of Efficient Personalized Inspection Plans

    Authors: Xingpeng Sun, Zherong Pan, Xifeng Gao, Kui Wu, Aniket Bera

    Abstract: We propose a training-free, Vision-Language Model (VLM)-guided approach for efficiently generating trajectories to facilitate target inspection planning based on text descriptions. Unlike existing Vision-and-Language Navigation (VLN) methods designed for general agents in unknown environments, our approach specifically targets the efficient inspection of known scenes, with widespread applications…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 8 pages, 5 figures

  28. arXiv:2506.01761  [pdf, ps, other]

    eess.SP

    A New 5 bit/2D-symbol Modulation Format for Relative Intensity Noise-dominated IM-DD Systems

    Authors: Felipe Villenas, Kaiquan Wu, Yunus Can Gültekin, Jamal Riani, Alex Alvarado

    Abstract: We propose a novel 5-bit/2D-symbol modulation format based on PAM-6 optimized for IM-DD systems dominated by relative intensity noise. The proposed modulation scheme improves SNR by 0.94 dB compared to conventional PAM-6 and achieves near-optimal BER performance.

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Submitted to ECOC 2025

  29. arXiv:2506.01376  [pdf, ps, other]

    cs.LG

    Modeling All-Atom Glycan Structures via Hierarchical Message Passing and Multi-Scale Pre-training

    Authors: Minghao Xu, Jiaze Song, Keming Wu, Xiangxin Zhou, Bin Cui, Wentao Zhang

    Abstract: Understanding the various properties of glycans with machine learning has shown some preliminary promise. However, previous methods mainly focused on modeling the backbone structure of glycans as graphs of monosaccharides (i.e., sugar units), while they neglected the atomic structures underlying each monosaccharide, which are actually important indicators of glycan properties. We fill this blank b…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Published at ICML 2025. All code and data are released

  30. arXiv:2506.00298  [pdf, ps, other]

    astro-ph.GA astro-ph.CO

    Millimeter-wave observations of Euclid Deep Field South using the South Pole Telescope: A data release of temperature maps and catalogs

    Authors: M. Archipley, A. Hryciuk, L. E. Bleem, K. Kornoelje, M. Klein, A. J. Anderson, B. Ansarinejad, M. Aravena, L. Balkenhol, P. S. Barry, K. Benabed, A. N. Bender, B. A. Benson, F. Bianchini, S. Bocquet, F. R. Bouchet, E. Camphuis, M. G. Campitiello, J. E. Carlstrom, J. Cathey, C. L. Chang, S. C. Chapman, P. Chaubal, P. M. Chichura, A. Chokshi, et al. (86 additional authors not shown)

    Abstract: Context. The South Pole Telescope third-generation camera (SPT-3G) has observed over 10,000 square degrees of sky at 95, 150, and 220 GHz (3.3, 2.0, 1.4 mm, respectively) overlapping the ongoing 14,000 square-degree Euclid Wide Survey. The Euclid collaboration recently released Euclid Deep Field observations in the first quick data release (Q1). Aims. With the goal of releasing complementary milli…

    Submitted 30 May, 2025; originally announced June 2025.

    Comments: 26 pages, 12 figures, to be submitted to A&A

  31. arXiv:2505.24476  [pdf, ps, other]

    cs.CV

    Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model

    Authors: Yuting Zhang, Hao Lu, Qingyong Hu, Yin Wang, Kaishen Yuan, Xin Liu, Kaishun Wu

    Abstract: Periodic or quasi-periodic phenomena reveal intrinsic characteristics in various natural processes, such as weather patterns, movement behaviors, traffic flows, and biological signals. Given that these phenomena span multiple modalities, the capabilities of Multimodal Large Language Models (MLLMs) offer promising potential to effectively capture and understand their complex nature. However, curren…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: Accepted by CVPR 2025

  32. arXiv:2505.23189  [pdf, ps, other]

    cs.RO cs.CV

    TrackVLA: Embodied Visual Tracking in the Wild

    Authors: Shaoan Wang, Jiazhao Zhang, Minghan Li, Jiahang Liu, Anqi Li, Kui Wu, Fangwei Zhong, Junzhi Yu, Zhizheng Zhang, He Wang

    Abstract: Embodied visual tracking is a fundamental skill in Embodied AI, enabling an agent to follow a specific target in dynamic environments using only egocentric vision. This task is inherently challenging as it requires both accurate target recognition and effective trajectory planning under conditions of severe occlusion and high scene dynamics. Existing approaches typically address this challenge thr…

    Submitted 29 May, 2025; originally announced May 2025.

  33. arXiv:2505.22787  [pdf, ps, other]

    cs.CL

    Can Large Language Models Match the Conclusions of Systematic Reviews?

    Authors: Christopher Polzak, Alejandro Lozano, Min Woo Sun, James Burgess, Yuhui Zhang, Kevin Wu, Serena Yeung-Levy

    Abstract: Systematic reviews (SR), in which experts summarize and analyze evidence across individual studies to provide insights on a specialized topic, are a cornerstone for evidence-based clinical decision-making, research, and policy. Given the exponential growth of scientific articles, there is growing interest in using large language models (LLMs) to automate SR generation. However, the ability of LLMs…

    Submitted 28 May, 2025; originally announced May 2025.

  34. arXiv:2505.22523  [pdf, ps, other]

    cs.CV

    PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models

    Authors: Junwen Chen, Heyang Jiang, Yanbin Wang, Keming Wu, Ji Li, Chao Zhang, Keiji Yanai, Dong Chen, Yuhui Yuan

    Abstract: Generating high-quality, multi-layer transparent images from text prompts can unlock a new level of creative control, allowing users to edit each layer as effortlessly as editing text outputs from LLMs. However, the development of multi-layer generative models lags behind that of conventional text-to-image models due to the absence of a large, high-quality corpus of multi-layer transparent data. I…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Homepage: https://prism-layers.github.io/

  35. arXiv:2505.21743  [pdf, other]

    cs.LG cs.AI

    Simulating the Unseen: Crash Prediction Must Learn from What Did Not Happen

    Authors: Zihao Li, Xinyuan Cao, Xiangbo Gao, Kexin Tian, Keshu Wu, Mohammad Anis, Hao Zhang, Keke Long, Jiwan Jiang, Xiaopeng Li, Yunlong Zhang, Tianbao Yang, Dominique Lord, Zhengzhong Tu, Yang Zhou

    Abstract: Traffic safety science has long been hindered by a fundamental data paradox: the crashes we most wish to prevent are precisely those events we rarely observe. Existing crash-frequency models and surrogate safety metrics rely heavily on sparse, noisy, and under-reported records, while even sophisticated, high-fidelity simulations undersample the long-tailed situations that trigger catastrophic outc…

    Submitted 27 May, 2025; originally announced May 2025.

  36. arXiv:2505.21349  [pdf, ps, other]

    cs.CE

    Out of the Past: An AI-Enabled Pipeline for Traffic Simulation from Noisy, Multimodal Detector Data and Stakeholder Feedback

    Authors: Rex Chen, Karen Wu, John McCartney, Norman Sadeh, Fei Fang

    Abstract: How can a traffic simulation be designed to faithfully reflect real-world traffic conditions? Past data-driven approaches to traffic simulation in the literature have relied on unrealistic or suboptimal heuristics. They also fail to adequately account for the effects of uncertainty and multimodality in the data on simulation outcomes. In this work, we integrate advances in AI to construct a three-…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 12 pages; 5 figures; preprint version

  37. arXiv:2505.21220  [pdf, ps, other]

    astro-ph.CO cs.LG

    Wavelet Flow For Extragalactic Foreground Simulations

    Authors: M. Mebratu, W. L. K. Wu

    Abstract: Extragalactic foregrounds in cosmic microwave background (CMB) observations are both a source of cosmological and astrophysical information and a nuisance to the CMB. Effective field-level modeling that captures their non-Gaussian statistical distributions is increasingly important for optimal information extraction, particularly given the precise and low-noise observations from current and upcomi…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 19 pages, 7 figures

  38. arXiv:2505.20718  [pdf, ps, other]

    cs.CV cs.AI

    VLM Can Be a Good Assistant: Enhancing Embodied Visual Tracking with Self-Improving Vision-Language Models

    Authors: Kui Wu, Shuhang Xu, Hao Chen, Churan Wang, Zhoujun Li, Yizhou Wang, Fangwei Zhong

    Abstract: We introduce a novel self-improving framework that enhances Embodied Visual Tracking (EVT) with Vision-Language Models (VLMs) to address the limitations of current active visual tracking systems in recovering from tracking failure. Our approach combines the off-the-shelf active tracking methods with VLMs' reasoning capabilities, deploying a fast visual policy for normal tracking and activating VLM…

    Submitted 28 May, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

  39. arXiv:2505.20710  [pdf, ps, other]

    cs.CV

    Hierarchical Instruction-aware Embodied Visual Tracking

    Authors: Kui Wu, Hao Chen, Churan Wang, Fakhri Karray, Zhoujun Li, Yizhou Wang, Fangwei Zhong

    Abstract: User-Centric Embodied Visual Tracking (UC-EVT) presents a novel challenge for reinforcement learning-based models due to the substantial gap between high-level user instructions and low-level agent actions. While recent advancements in language models (e.g., LLMs, VLMs, VLAs) have improved instruction comprehension, these models face critical limitations in either inference speed (LLMs, VLMs) or g…

    Submitted 27 May, 2025; originally announced May 2025.

  40. arXiv:2505.19611  [pdf, other]

    cs.CV cs.AI

    Align and Surpass Human Camouflaged Perception: Visual Refocus Reinforcement Fine-Tuning

    Authors: Ruolin Shen, Xiaozhong Ji, Kai WU, Jiangning Zhang, Yijun He, HaiHua Yang, Xiaobin Hu, Xiaoyu Sun

    Abstract: Current multi-modal models exhibit a notable misalignment with the human visual system when identifying objects that are visually assimilated into the background. Our observations reveal that these multi-modal models cannot distinguish concealed objects, demonstrating an inability to emulate human cognitive processes which effectively utilize foreground-background similarity principles for visual…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Project Website: https://github.com/HUuxiaobin/VRRF

  41. arXiv:2505.19539  [pdf, ps, other]

    eess.SP

    Water Level Sensing via Communication Signals in a Bi-Static System

    Authors: Zhongqin Wang, J. Andrew Zhang, Kai Wu, Y. Jay Guo

    Abstract: Accurate water level sensing is essential for flood monitoring, agricultural irrigation, and water resource optimization. Traditional methods require dedicated sensor deployments, leading to high installation costs, vulnerability to interference, and limited resolution. This work proposes PMNs-WaterSense, a novel scheme leveraging Channel State Information (CSI) from existing mobile networks for w…

    Submitted 26 May, 2025; originally announced May 2025.

  42. arXiv:2505.18805  [pdf, ps, other]

    cs.GR

    DiffHairCard: Auto Hair Card Extraction with Differentiable Rendering

    Authors: Zhongtian Zheng, Tao Huang, Haozhe Su, Xueqi Ma, Yuefan Shen, Tongtong Wang, Yin Yang, Xifeng Gao, Zherong Pan, Kui Wu

    Abstract: Hair cards remain a widely used representation for hair modeling in real-time applications, offering a practical trade-off between visual fidelity, memory usage, and performance. However, generating high-quality hair card models remains a challenging and labor-intensive task. This work presents an automated pipeline for converting strand-based hair models into hair card models with a limited numbe…

    Submitted 24 May, 2025; originally announced May 2025.

  43. arXiv:2505.15431  [pdf, ps, other]

    cs.CL

    Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought

    Authors: Tencent Hunyuan Team, Ao Liu, Botong Zhou, Can Xu, Chayse Zhou, ChenChen Zhang, Chengcheng Xu, Chenhao Wang, Decheng Wu, Dengpeng Wu, Dian Jiao, Dong Du, Dong Wang, Feng Zhang, Fengzong Lian, Guanghui Xu, Guanwei Zhang, Hai Wang, Haipeng Luo, Han Hu, Huilin Xu, Jiajia Wu, Jianchen Zhu, Jianfeng Yan, Jiaqi Zhu, et al. (230 additional authors not shown)

    Abstract: As Large Language Models (LLMs) rapidly advance, we introduce Hunyuan-TurboS, a novel large hybrid Transformer-Mamba Mixture of Experts (MoE) model. It synergistically combines Mamba's long-sequence processing efficiency with Transformer's superior contextual understanding. Hunyuan-TurboS features an adaptive long-short chain-of-thought (CoT) mechanism, dynamically switching between rapid response…

    Submitted 4 July, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

  44. arXiv:2505.12884  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    TinyAlign: Boosting Lightweight Vision-Language Models by Mitigating Modal Alignment Bottlenecks

    Authors: Yuanze Hu, Zhaoxin Fan, Xinyu Wang, Gen Li, Ye Qiu, Zhichao Yang, Wenjun Wu, Kejian Wu, Yifan Sun, Xiaotie Deng, Jin Dong

    Abstract: Lightweight Vision-Language Models (VLMs) are indispensable for resource-constrained applications. The prevailing approach to aligning vision and language models involves freezing both the vision encoder and the language model while training small connector modules. However, this strategy heavily depends on the intrinsic capabilities of the language model, which can be suboptimal for lightweight m…

    Submitted 30 June, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

  45. arXiv:2505.12285  [pdf, ps, other]

    cs.NE

    CALM: Co-evolution of Algorithms and Language Model for Automatic Heuristic Design

    Authors: Ziyao Huang, Weiwei Wu, Kui Wu, Jianping Wang, Wei-Bin Lee

    Abstract: Tackling complex optimization problems often relies on expert-designed heuristics, typically crafted through extensive trial and error. Recent advances demonstrate that large language models (LLMs), when integrated into well-designed evolutionary search frameworks, can autonomously discover high-performing heuristics at a fraction of the traditional cost. However, existing approaches predominantly…

    Submitted 18 May, 2025; originally announced May 2025.

  46. arXiv:2505.11733  [pdf, ps, other]

    cs.CL

    MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports

    Authors: Kevin Wu, Eric Wu, Rahul Thapa, Kevin Wei, Angela Zhang, Arvind Suresh, Jacqueline J. Tao, Min Woo Sun, Alejandro Lozano, James Zou

    Abstract: Doctors and patients alike increasingly use Large Language Models (LLMs) to diagnose clinical cases. However, unlike domains such as math or coding, where correctness can be objectively defined by the final answer, medical diagnosis requires both the outcome and the reasoning process to be accurate. Currently, widely used medical benchmarks like MedQA and MMLU assess only accuracy in the final ans…

    Submitted 20 May, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

  47. arXiv:2505.11462  [pdf, ps, other]

    cs.CL cs.AI

    Disentangling Reasoning and Knowledge in Medical Large Language Models

    Authors: Rahul Thapa, Qingyang Wu, Kevin Wu, Harrison Zhang, Angela Zhang, Eric Wu, Haotian Ye, Suhana Bedi, Nevin Aresh, Joseph Boen, Shriya Reddy, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou

    Abstract: Medical reasoning in large language models (LLMs) aims to emulate clinicians' diagnostic thinking, but current benchmarks such as MedQA-USMLE, MedMCQA, and PubMedQA often mix reasoning with factual recall. We address this by separating 11 biomedical QA benchmarks into reasoning- and knowledge-focused subsets using a PubMedBERT classifier that reaches 81 percent accuracy, comparable to human perfor…

    Submitted 23 June, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

  48. arXiv:2505.10560  [pdf, other]

    cs.DB cs.NI

    Approximation-First Timeseries Monitoring Query At Scale

    Authors: Zeying Zhu, Jonathan Chamberlain, Kenny Wu, David Starobinski, Zaoxing Liu

    Abstract: Timeseries monitoring systems such as Prometheus play a crucial role in gaining observability of the underlying system components. These systems collect timeseries metrics from various system components and perform monitoring queries over periodic window-based aggregations (i.e., rule queries). However, despite wide adoption, the operational costs and query latency of rule queries remain high. In…

    Submitted 15 May, 2025; originally announced May 2025.

  49. arXiv:2505.09126  [pdf, ps, other]

    math.DS

    Multiple parameter bifurcations in a modified Gower-Leslie predator-prey system with addictive Allee effect

    Authors: Xiaoling Wang, Kuilin Wu, Lan Zou

    Abstract: In this paper, we explore a modified Leslie-Gower type predator-prey model with Holling I functional response and addictive Allee effect in prey. It is shown that the highest codimension of a nilpotent cusp is 4, and the model can undergo degenerate Bogdanov-Takens bifurcation of codimension 4. Besides, when the model has a center-type equilibrium, we show that it is a weak focus with order 4, and th…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 29 pages, 9 figures

  50. arXiv:2505.08854  [pdf, ps, other]

    cs.CV cs.AI cs.RO

    Generative AI for Autonomous Driving: Frontiers and Opportunities

    Authors: Yuping Wang, Shuo Xing, Cui Can, Renjie Li, Hongyuan Hua, Kexin Tian, Zhaobin Mo, Xiangbo Gao, Keshu Wu, Sulong Zhou, Hengxu You, Juntong Peng, Junge Zhang, Zehao Wang, Rui Song, Mingxuan Yan, Walter Zimmer, Xingcheng Zhou, Peiran Li, Zhaohan Lu, Chia-Ju Chen, Yue Huang, Ryan A. Rossi, Lichao Sun, Hongkai Yu, et al. (22 additional authors not shown)

    Abstract: Generative Artificial Intelligence (GenAI) constitutes a transformative technological wave that reconfigures industries through its unparalleled capabilities for content creation, reasoning, planning, and multimodal understanding. This revolutionary force offers the most promising path yet toward solving one of engineering's grandest challenges: achieving reliable, fully autonomous driving, partic…

    Submitted 13 May, 2025; originally announced May 2025.