Showing 1–50 of 4,737 results for author: Chen, Z

  1. arXiv:2505.10446  [pdf, ps, other]

    cs.CL

    Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models

    Authors: Zemin Huang, Zhiyang Chen, Zijun Wang, Tiancheng Li, Guo-Jun Qi

    Abstract: We introduce the Diffusion Chain of Lateral Thought (DCoLT), a reasoning framework for diffusion language models. DCoLT treats each intermediate step in the reverse diffusion process as a latent "thinking" action and optimizes the entire reasoning trajectory to maximize the reward on the correctness of the final answer with outcome-based Reinforcement Learning (RL). Unlike traditional Chain… ▽ More

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.10278  [pdf, ps, other]

    cs.AI

    MASS: Multi-Agent Simulation Scaling for Portfolio Construction

    Authors: Taian Guo, Haiyang Shen, Jinsheng Huang, Zhengyang Mao, Junyu Luo, Zhuoru Chen, Xuhui Liu, Bingyu Xia, Luchen Liu, Yun Ma, Ming Zhang

    Abstract: LLM-based multi-agent systems have gained significant attention for their potential in simulation and enhancing performance. However, existing works are limited to pure simulations or are constrained by predefined workflows, restricting their applicability and effectiveness. In this paper, we introduce the Multi-Agent Scaling Simulation (MASS) for portfolio construction. MASS achieves stable and continuous… ▽ More

    Submitted 15 May, 2025; originally announced May 2025.

  3. arXiv:2505.10007  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Sample Complexity of Distributionally Robust Average-Reward Reinforcement Learning

    Authors: Zijun Chen, Shengbo Wang, Nian Si

    Abstract: Motivated by practical applications where stable long-term performance is critical, such as robotics, operations research, and healthcare, we study the problem of distributionally robust (DR) average-reward reinforcement learning. We propose two algorithms that achieve near-optimal sample complexity. The first reduces the problem to a DR discounted Markov decision process (MDP), while the second, An… ▽ More

    Submitted 15 May, 2025; originally announced May 2025.

  4. arXiv:2505.09103  [pdf, ps, other]

    cs.RO

    VGC-RIO: A Tightly Integrated Radar-Inertial Odometry with Spatial Weighted Doppler Velocity and Local Geometric Constrained RCS Histograms

    Authors: Jianguang Xiang, Xiaofeng He, Zizhuo Chen, Lilian Zhang, Xincan Luo, Jun Mao

    Abstract: Recent advances in 4D radar-inertial odometry have demonstrated promising potential for autonomous localization in adverse conditions. However, effective handling of sparse and noisy radar measurements remains a critical challenge. In this paper, we propose a radar-inertial odometry with a spatial weighting method that adapts to unevenly distributed points and a novel point-description histogram… ▽ More

    Submitted 14 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

  5. arXiv:2505.08527  [pdf, other]

    cs.CV

    Leveraging Segment Anything Model for Source-Free Domain Adaptation via Dual Feature Guided Auto-Prompting

    Authors: Zheang Huai, Hui Tang, Yi Li, Zhuangzhuang Chen, Xiaomeng Li

    Abstract: Source-free domain adaptation (SFDA) for segmentation aims at adapting a model trained in the source domain to perform well in the target domain with only the source model and unlabeled target data. Inspired by the recent success of Segment Anything Model (SAM) which exhibits the generality of segmenting images of various modalities and in different domains given human-annotated prompts like boundi… ▽ More

    Submitted 13 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

  6. arXiv:2505.08151  [pdf, other]

    cs.AI

    Foundation Models Knowledge Distillation For Battery Capacity Degradation Forecast

    Authors: Joey Chan, Zhen Chen, Ershun Pan

    Abstract: Accurate estimation of lithium-ion battery capacity degradation is critical for enhancing the reliability and safety of battery operations. Traditional expert models, tailored to specific scenarios, provide isolated estimations. With the rapid advancement of data-driven techniques, a series of general-purpose time-series foundation models have been developed. However, foundation models specificall… ▽ More

    Submitted 12 May, 2025; originally announced May 2025.

  7. arXiv:2505.07591  [pdf, ps, other]

    cs.CL cs.AI

    A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models

    Authors: Junjie Ye, Caishuang Huang, Zhuohan Chen, Wenjie Fu, Chenyuan Yang, Leyi Yang, Yilong Wu, Peng Wang, Meng Zhou, Xiaolong Yang, Tao Gui, Qi Zhang, Zhongchao Shi, Jianping Fan, Xuanjing Huang

    Abstract: Instruction following evaluates large language models (LLMs) on their ability to generate outputs that adhere to user-defined constraints. However, existing benchmarks often rely on templated constraint prompts, which lack the diversity of real-world usage and limit fine-grained performance assessment. To fill this gap, we propose a multi-dimensional constraint framework encompassing three constra… ▽ More

    Submitted 12 May, 2025; originally announced May 2025.

  8. arXiv:2505.07515  [pdf, ps, other]

    cs.DS math.PR

    Improved Mixing of Critical Hardcore Model

    Authors: Zongchen Chen, Tianhui Jiang

    Abstract: The hardcore model is one of the most classic and widely studied examples of undirected graphical models. Given a graph $G$, the hardcore model describes a Gibbs distribution of $λ$-weighted independent sets of $G$. In the last two decades, a beautiful computational phase transition has been established at a precise threshold $λ_c(Δ)$ where $Δ$ denotes the maximum degree, where the task of samplin… ▽ More

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 28 pages, 0 figures

  9. arXiv:2505.07396  [pdf, ps, other]

    cs.CV cs.LG

    TUM2TWIN: Introducing the Large-Scale Multimodal Urban Digital Twin Benchmark Dataset

    Authors: Olaf Wysocki, Benedikt Schwab, Manoj Kumar Biswanath, Michael Greza, Qilin Zhang, Jingwei Zhu, Thomas Froech, Medhini Heeramaglore, Ihab Hijazi, Khaoula Kanna, Mathias Pechinger, Zhaiyu Chen, Yao Sun, Alejandro Rueda Segura, Ziyang Xu, Omar AbdelGafar, Mansour Mehranfar, Chandan Yeshwanth, Yueh-Cheng Liu, Hadi Yazdi, Jiapan Wang, Stefan Auer, Katharina Anders, Klaus Bogenberger, Andre Borrmann, et al. (9 additional authors not shown)

    Abstract: Urban Digital Twins (UDTs) have become essential for managing cities and integrating complex, heterogeneous data from diverse sources. Creating UDTs involves challenges at multiple process stages, including acquiring accurate 3D source data, reconstructing high-fidelity 3D models, maintaining models' updates, and ensuring seamless interoperability to downstream tasks. Current datasets are usually… ▽ More

    Submitted 13 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

    Comments: Submitted to the ISPRS Journal of Photogrammetry and Remote Sensing

  10. arXiv:2505.07180  [pdf, ps, other]

    cs.LG stat.ML

    Causal View of Time Series Imputation: Some Identification Results on Missing Mechanism

    Authors: Ruichu Cai, Kaitao Zheng, Junxian Huang, Zijian Li, Zhengming Chen, Boyan Xu, Zhifeng Hao

    Abstract: Time series imputation is one of the most challenging problems and has broad applications in various fields like health care and the Internet of Things. Existing methods mainly aim to model the temporally latent dependencies and the generation process from the observed time series data. In real-world scenarios, different types of missing mechanisms, like MAR (Missing At Random), and MNAR (Missing No… ▽ More

    Submitted 11 May, 2025; originally announced May 2025.

  11. arXiv:2505.07062  [pdf, ps, other]

    cs.CV cs.AI

    Seed1.5-VL Technical Report

    Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, et al. (172 additional authors not shown)

    Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed of a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluati… ▽ More

    Submitted 11 May, 2025; originally announced May 2025.

  12. arXiv:2505.07035  [pdf, ps, other]

    cs.IT eess.SP

    Robust Movable-Antenna Position Optimization with Imperfect CSI for MISO Systems

    Authors: Haifeng Ma, Weidong Mei, Xin Wei, Boyu Ning, Zhi Chen

    Abstract: Movable antenna (MA) technology has emerged as a promising solution for reconfiguring wireless channel conditions through local antenna movement within confined regions. Unlike previous works assuming perfect channel state information (CSI), this letter addresses the robust MA position optimization problem under imperfect CSI conditions for a multiple-input single-output (MISO) MA system. Specific… ▽ More

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: Accepted to IEEE Communications Letters

  13. arXiv:2505.06918  [pdf, other]

    eess.IV cs.CV cs.LG

    Uni-AIMS: AI-Powered Microscopy Image Analysis

    Authors: Yanhui Hong, Nan Wang, Zhiyi Xia, Haoyi Tao, Xi Fang, Yiming Li, Jiankun Wang, Peng Jin, Xiaochen Cai, Shengyu Li, Ziqi Chen, Zezhong Zhang, Guolin Ke, Linfeng Zhang

    Abstract: This paper presents a systematic solution for the intelligent recognition and automatic analysis of microscopy images. We developed a data engine that generates high-quality annotated datasets through a combination of the collection of diverse microscopy images from experiments, synthetic data generation and a human-in-the-loop annotation process. To address the unique challenges of microscopy ima… ▽ More

    Submitted 11 May, 2025; originally announced May 2025.

  14. arXiv:2505.06860  [pdf, other]

    cs.CR cs.AI cs.LG

    DP-TRAE: A Dual-Phase Merging Transferable Reversible Adversarial Example for Image Privacy Protection

    Authors: Xia Du, Jiajie Zhu, Jizhe Zhou, Chi-man Pun, Zheng Lin, Cong Wu, Zhe Chen, Jun Luo

    Abstract: In the field of digital security, Reversible Adversarial Examples (RAE) combine adversarial attacks with reversible data hiding techniques to effectively protect sensitive data and prevent unauthorized analysis by malicious Deep Neural Networks (DNNs). However, existing RAE techniques primarily focus on white-box attacks, lacking a comprehensive evaluation of their effectiveness in black-box scena… ▽ More

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: 12 pages, 5 figures

  15. arXiv:2505.06288  [pdf]

    cs.LG stat.ML

    IIKL: Isometric Immersion Kernel Learning with Riemannian Manifold for Geometric Preservation

    Authors: Zihao Chen, Wenyong Wang, Jiachen Yang, Yu Xiang

    Abstract: Geometric representation learning in preserving the intrinsic geometric and topological properties for discrete non-Euclidean data is crucial in scientific applications. Previous research generally mapped non-Euclidean discrete data into Euclidean space during representation learning, which may lead to the loss of some critical geometric information. In this paper, we propose a novel Isometric Imm… ▽ More

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 16 pages, 14 figures

  16. arXiv:2505.06217  [pdf, ps, other]

    cs.CV

    Adapting a Segmentation Foundation Model for Medical Image Classification

    Authors: Pengfei Gu, Haoteng Tang, Islam A. Ebeid, Jose A. Nunez, Fabian Vazquez, Diego Adame, Marcus Zhan, Huimin Li, Bin Fu, Danny Z. Chen

    Abstract: Recent advancements in foundation models, such as the Segment Anything Model (SAM), have shown strong performance in various vision tasks, particularly image segmentation, due to their impressive zero-shot segmentation capabilities. However, effectively adapting such models for medical image classification is still a less explored topic. In this paper, we introduce a new framework to adapt SAM for… ▽ More

    Submitted 9 May, 2025; originally announced May 2025.

  17. arXiv:2505.06118  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    The Application of Deep Learning for Lymph Node Segmentation: A Systematic Review

    Authors: Jingguo Qu, Xinyang Han, Man-Lik Chui, Yao Pu, Simon Takadiyi Gunda, Ziman Chen, Jing Qin, Ann Dorothy King, Winnie Chiu-Wing Chu, Jing Cai, Michael Tin-Cheung Ying

    Abstract: Automatic lymph node segmentation is the cornerstone for advances in computer vision tasks for early detection and staging of cancer. Traditional segmentation methods are constrained by manual delineation and variability in operator proficiency, limiting their ability to achieve high accuracy. The introduction of deep learning technologies offers new possibilities for improving the accuracy of lym… ▽ More

    Submitted 9 May, 2025; originally announced May 2025.

  18. arXiv:2505.05914  [pdf, ps, other]

    cs.IT eess.SP

    Mechanical Power Modeling and Energy Efficiency Maximization for Movable Antenna Systems

    Authors: Xin Wei, Weidong Mei, Xuan Huang, Zhi Chen, Boyu Ning

    Abstract: Movable antennas (MAs) have recently garnered significant attention in wireless communications due to their capability to reshape wireless channels via local antenna movement within a confined region. However, to achieve accurate antenna movement, MA drivers introduce non-negligible mechanical power consumption, rendering energy efficiency (EE) optimization more critical compared to conventional f… ▽ More

    Submitted 9 May, 2025; originally announced May 2025.

  19. arXiv:2505.05829  [pdf, other]

    cs.CV cs.LG eess.IV

    Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition

    Authors: Zhiyuan Chen, Keyi Li, Yifan Jia, Le Ye, Yufei Ma

    Abstract: Diffusion transformer (DiT) models have achieved remarkable success in image generation, thanks to their exceptional generative capabilities and scalability. Nonetheless, the iterative nature of diffusion models (DMs) results in high computational complexity, posing challenges for deployment. Although existing cache-based acceleration methods try to utilize the inherent temporal similarity to skip… ▽ More

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: accepted by CVPR2025

  20. arXiv:2505.05794  [pdf, ps, other]

    cs.AR cs.AI cs.NE

    What Is Next for LLMs? Next-Generation AI Computing Hardware Using Photonic Chips

    Authors: Renjie Li, Wenjie Wei, Qi Xin, Xiaoli Liu, Sixuan Mao, Erik Ma, Zijian Chen, Malu Zhang, Haizhou Li, Zhaoyu Zhang

    Abstract: Large language models (LLMs) are rapidly pushing the limits of contemporary computing hardware. For example, training GPT-3 has been estimated to consume around 1300 MWh of electricity, and projections suggest future models may require city-scale (gigawatt) power budgets. These demands motivate exploration of computing paradigms beyond conventional von Neumann architectures. This review surveys em… ▽ More

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: 36 pages, 22 figures

  21. arXiv:2505.05784  [pdf, ps, other]

    q-fin.TR cs.AI cs.CE q-fin.CP

    FlowHFT: Flow Policy Induced Optimal High-Frequency Trading under Diverse Market Conditions

    Authors: Yang Li, Zhi Chen, Steve Yang

    Abstract: High-frequency trading (HFT) is an investing strategy that continuously monitors market states and places bid and ask orders at millisecond speeds. Traditional HFT approaches fit models with historical data and assume that future market states follow similar patterns. This limits the effectiveness of any single model to the specific conditions it was trained for. Additionally, these models achieve… ▽ More

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: 14 pages, 1 figure, 6 tables, 2 algorithms

  22. arXiv:2505.05724  [pdf, ps, other]

    cs.IT

    Towards Secure Semantic Transmission In the Era of GenAI: A Diffusion-based Framework

    Authors: Boxiang He, Zihan Chen, Junshan Luo, Chuanhong Liu, Shilian Wang, Fanggang Wang, Tony Q. S. Quek

    Abstract: Semantic communication, due to its focus on transmitting meaning rather than raw bit data, poses unique security challenges compared to traditional communication systems. In particular, semantic communication systems are vulnerable to malicious attacks that focus on the semantic layer, with the intention of understanding or distorting the intended meaning of the transmitted privacy… ▽ More

    Submitted 8 May, 2025; originally announced May 2025.

  23. arXiv:2505.05474  [pdf, ps, other]

    cs.CV

    3D Scene Generation: A Survey

    Authors: Beichen Wen, Haozhe Xie, Zhaoxi Chen, Fangzhou Hong, Ziwei Liu

    Abstract: 3D scene generation seeks to synthesize spatially structured, semantically meaningful, and photorealistic environments for applications such as immersive media, robotics, autonomous driving, and embodied AI. Early methods based on procedural rules offered scalability but limited diversity. Recent advances in deep generative models (e.g., GANs, diffusion models) and 3D representations (e.g., NeRF,… ▽ More

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: Project Page: https://github.com/hzxie/Awesome-3D-Scene-Generation

  24. arXiv:2505.05301  [pdf, other]

    quant-ph cs.LG math.OC

    Operator-Level Quantum Acceleration of Non-Logconcave Sampling

    Authors: Jiaqi Leng, Zhiyan Ding, Zherui Chen, Lin Lin

    Abstract: Sampling from probability distributions of the form $σ\propto e^{-βV}$, where $V$ is a continuous potential, is a fundamental task across physics, chemistry, biology, computer science, and statistics. However, when $V$ is non-convex, the resulting distribution becomes non-logconcave, and classical methods such as Langevin dynamics often exhibit poor performance. We introduce the first quantum algo… ▽ More

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 43 pages, 7 figures

  25. arXiv:2505.05018  [pdf, ps, other]

    cs.IT

    Diffusion-enabled Secure Semantic Communication Against Eavesdropping

    Authors: Boxiang He, Zihan Chen, Fanggang Wang, Shilian Wang, Zhijin Qin, Tony Q. S. Quek

    Abstract: In this paper, artificial noise (AN) is introduced into semantic communication systems for the first time to prevent semantic eavesdropping. However, the introduction of AN also poses challenges for the legitimate receiver in extracting semantic information. Recently, denoising diffusion probabilistic models (DDPM) have demonstrated their powerful capabilities in generating multimedia content. Here, the paired plugg… ▽ More

    Submitted 8 May, 2025; originally announced May 2025.

  26. arXiv:2505.04656  [pdf, other]

    cs.GR

    MeshGen: Generating PBR Textured Mesh with Render-Enhanced Auto-Encoder and Generative Data Augmentation

    Authors: Zilong Chen, Yikai Wang, Wenqiang Sun, Feng Wang, Yiwen Chen, Huaping Liu

    Abstract: In this paper, we introduce MeshGen, an advanced image-to-3D pipeline that generates high-quality 3D meshes with detailed geometry and physically based rendering (PBR) textures. Addressing the challenges faced by existing 3D native diffusion models, such as suboptimal auto-encoder performance, limited controllability, poor generalization, and inconsistent image-based PBR texturing, MeshGen employs… ▽ More

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: To appear at CVPR 2025 with highlight

  27. arXiv:2505.04519  [pdf, other]

    cs.CL

    Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs

    Authors: Yehui Tang, Yichun Yin, Yaoyuan Wang, Hang Zhou, Yu Pan, Wei Guo, Ziyang Zhang, Miao Rang, Fangcheng Liu, Naifu Zhang, Binghan Li, Yonghan Dong, Xiaojun Meng, Yasheng Wang, Dong Li, Yin Li, Dandan Tu, Can Chen, Youliang Yan, Fisher Yu, Ruiming Tang, Yunhe Wang, Botian Huang, Bo Wang, Boxiao Liu, et al. (49 additional authors not shown)

    Abstract: Sparse large language models (LLMs) with Mixture of Experts (MoE) and close to a trillion parameters are dominating the realm of most capable language models. However, the massive model scale poses significant challenges for the underlying software and hardware systems. In this paper, we aim to uncover a recipe to harness such scale on Ascend NPUs. The key goals are better usage of the computing r… ▽ More

    Submitted 7 May, 2025; originally announced May 2025.

  28. arXiv:2505.04113  [pdf, ps, other]

    cs.SD eess.AS

    Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment

    Authors: Xueyao Zhang, Yuancheng Wang, Chaoren Wang, Ziniu Li, Zhuo Chen, Zhizheng Wu

    Abstract: Modern zero-shot text-to-speech (TTS) systems, despite using extensive pre-training, often struggle in challenging scenarios such as tongue twisters, repeated words, code-switching, and cross-lingual synthesis, leading to intelligibility issues. To address these limitations, this paper leverages preference alignment techniques, which enable targeted construction of out-of-pretraining-distribution… ▽ More

    Submitted 7 May, 2025; originally announced May 2025.

  29. arXiv:2505.04073  [pdf]

    cs.CL

    Natural Language Generation in Healthcare: A Review of Methods and Applications

    Authors: Mengxian Lyu, Xiaohan Li, Ziyi Chen, Jinqian Pan, Cheng Peng, Sankalp Talankar, Yonghui Wu

    Abstract: Natural language generation (NLG) is the key technology to achieve generative artificial intelligence (AI). With the breakthroughs in large language models (LLMs), NLG has been widely used in various medical applications, demonstrating the potential to enhance clinical workflows, support clinical decision-making, and improve clinical documentation. Heterogeneous and diverse medical data modalities… ▽ More

    Submitted 6 May, 2025; originally announced May 2025.

  30. arXiv:2505.03985  [pdf, other]

    cs.AI cs.SE

    LogiDebrief: A Signal-Temporal Logic based Automated Debriefing Approach with Large Language Models Integration

    Authors: Zirong Chen, Ziyan An, Jennifer Reynolds, Kristin Mullen, Stephen Martini, Meiyi Ma

    Abstract: Emergency response services are critical to public safety, with 9-1-1 call-takers playing a key role in ensuring timely and effective emergency operations. To ensure call-taking performance consistency, quality assurance is implemented to evaluate and refine call-takers' skillsets. However, traditional human-led evaluations struggle with high call volumes, leading to low coverage and delayed asses… ▽ More

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Accepted at IJCAI-2025

  31. arXiv:2505.03814  [pdf, other]

    stat.ML cs.AI cs.CL cs.LG

    Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs

    Authors: Ganghua Wang, Zhaorun Chen, Bo Li, Haifeng Xu

    Abstract: As foundation models continue to scale, the size of trained models grows exponentially, presenting significant challenges for their evaluation. Current evaluation practices involve curating increasingly large datasets to assess the performance of large language models (LLMs). However, there is a lack of systematic analysis and guidance on determining the sufficiency of test data or selecting infor… ▽ More

    Submitted 2 May, 2025; originally announced May 2025.

  32. arXiv:2505.03804  [pdf, other]

    cs.LG cs.AI

    MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance

    Authors: Xing Hu, Zhixuan Chen, Dawei Yang, Zukang Xu, Chen Xu, Zhihang Yuan, Sifan Zhou, Jiangyong Yu

    Abstract: Mixture-of-Experts (MoE) large language models (LLMs), which leverage dynamic routing and sparse activation to enhance efficiency and scalability, have achieved higher performance while reducing computational costs. However, these models face significant memory overheads, limiting their practical deployment and broader adoption. Post-training quantization (PTQ), a widely used method for compressin… ▽ More

    Submitted 2 May, 2025; originally announced May 2025.

  33. arXiv:2505.03803  [pdf, other]

    cs.LG cs.AI

    RWKVQuant: Quantizing the RWKV Family with Proxy Guided Hybrid of Scalar and Vector Quantization

    Authors: Chen Xu, Yuxuan Yue, Zukang Xu, Xing Hu, Jiangyong Yu, Zhixuan Chen, Sifan Zhou, Zhihang Yuan, Dawei Yang

    Abstract: RWKV is a modern RNN architecture with comparable performance to Transformer, but still faces challenges when deployed to resource-constrained devices. Post Training Quantization (PTQ), which is an essential technique to reduce model size and inference latency, has been widely used in Transformer models. However, it suffers significant degradation of performance when applied to RWKV. This paper… ▽ More

    Submitted 2 May, 2025; originally announced May 2025.

  34. arXiv:2505.03547  [pdf, ps, other]

    cs.AI

    STORY2GAME: Generating (Almost) Everything in an Interactive Fiction Game

    Authors: Eric Zhou, Shreyas Basavatia, Moontashir Siam, Zexin Chen, Mark O. Riedl

    Abstract: We introduce STORY2GAME, a novel approach to using Large Language Models to generate text-based interactive fiction games that starts by generating a story, populates the world, and builds the code for actions in a game engine that enables the story to play out interactively. Whereas a given set of hard-coded actions can artificially constrain story generation, the ability to generate actions mean… ▽ More

    Submitted 6 May, 2025; originally announced May 2025.

  35. arXiv:2505.03310  [pdf, other]

    cs.CV

    3D Gaussian Splatting Data Compression with Mixture of Priors

    Authors: Lei Liu, Zhenghao Chen, Dong Xu

    Abstract: 3D Gaussian Splatting (3DGS) data compression is crucial for enabling efficient storage and transmission in 3D scene modeling. However, its development remains limited due to inadequate entropy models and suboptimal quantization strategies for both lossless and lossy compression scenarios, where existing methods have yet to 1) fully leverage hyperprior information to construct robust conditional e… ▽ More

    Submitted 6 May, 2025; originally announced May 2025.

  36. arXiv:2505.03293  [pdf, other]

    cs.CL

    Ψ-Arena: Interactive Assessment and Optimization of LLM-based Psychological Counselors with Tripartite Feedback

    Authors: Shijing Zhu, Zhuang Chen, Guanqun Bi, Binghang Li, Yaxi Deng, Dazhen Wan, Libiao Peng, Xiyao Xiao, Rongsheng Zhang, Tangjie Lv, Zhipeng Hu, FangFang Li, Minlie Huang

    Abstract: Large language models (LLMs) have shown promise in providing scalable mental health support, while evaluating their counseling capability remains crucial to ensure both efficacy and safety. Existing evaluations are limited by the static assessment that focuses on knowledge tests, the single perspective that centers on user experience, and the open-loop framework that lacks actionable feedback. To… ▽ More

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: in progress

  37. arXiv:2505.03127  [pdf, ps, other]

    eess.SY cs.IT

    Integrated Sensing, Computing, Communication, and Control for Time-Sequence-Based Semantic Communications

    Authors: Qingliang Li, Bo Chang, Weidong Mei, Zhi Chen

    Abstract: In the upcoming industrial internet of things (IIoT) era, a surge of task-oriented applications will rely on real-time wireless control systems (WCSs). For these systems, ultra-reliable and low-latency wireless communication will be crucial to ensure the timely transmission of control information. To achieve this purpose, we propose a novel time-sequence-based semantic communication paradigm, wher… ▽ More

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: This version of the manuscript was submitted to IEEE Transactions on Communications for possible publication

  38. arXiv:2505.02833  [pdf, other]

    cs.RO cs.CV cs.LG

    TWIST: Teleoperated Whole-Body Imitation System

    Authors: Yanjie Ze, Zixuan Chen, João Pedro Araújo, Zi-ang Cao, Xue Bin Peng, Jiajun Wu, C. Karen Liu

    Abstract: Teleoperating humanoid robots in a whole-body manner marks a fundamental step toward developing general-purpose robotic intelligence, with human motion providing an ideal interface for controlling all degrees of freedom. Yet, most current humanoid teleoperation systems fall short of enabling coordinated whole-body behavior, typically limiting themselves to isolated locomotion or manipulation tasks… ▽ More

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: Project website: https://humanoid-teleop.github.io

  39. arXiv:2505.02795  [pdf, other]

    cs.LG cs.AI cs.DC

    HSplitLoRA: A Heterogeneous Split Parameter-Efficient Fine-Tuning Framework for Large Language Models

    Authors: Zheng Lin, Yuxin Zhang, Zhe Chen, Zihan Fang, Xianhao Chen, Praneeth Vepakomma, Wei Ni, Jun Luo, Yue Gao

    Abstract: Recently, large language models (LLMs) have achieved remarkable breakthroughs, revolutionizing the natural language processing domain and beyond. Due to immense parameter sizes, fine-tuning these models with private data for diverse downstream tasks has become mainstream. Though federated learning (FL) offers a promising solution for fine-tuning LLMs without sharing raw data, substantial computing… ▽ More

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: 16 pages, 22 figures

  40. arXiv:2505.02655  [pdf, ps, other]

    cs.LG cs.AI

    SCFormer: Structured Channel-wise Transformer with Cumulative Historical State for Multivariate Time Series Forecasting

    Authors: Shiwei Guo, Ziang Chen, Yupeng Ma, Yunfei Han, Yi Wang

    Abstract: The Transformer model has shown strong performance in multivariate time series forecasting by leveraging channel-wise self-attention. However, this approach lacks temporal constraints when computing temporal features and does not utilize cumulative historical series effectively. To address these limitations, we propose the Structured Channel-wise Transformer with Cumulative Historical state (SCForm… ▽ More

    Submitted 5 May, 2025; originally announced May 2025.

  41. arXiv:2505.02390  [pdf, ps, other]

    cs.LG cs.AI

    Quantitative Analysis of Performance Drop in DeepSeek Model Quantization

    Authors: Enbo Zhao, Yi Shen, Shuming Shi, Jieyun Huang, Zhihao Chen, Ning Wang, Siqi Xiao, Jian Zhang, Kai Wang, Shiguo Lian

    Abstract: Recently, there has been high demand for deploying DeepSeek-R1 and V3 locally, possibly because the official service is often busy and some organizations have data privacy concerns. While single-machine deployment offers infrastructure simplicity, the models' 671B FP8 parameter configuration exceeds the practical memory limits of a standard 8-GPU machine. Quantization is a widely used… ▽ More

    Submitted 5 May, 2025; originally announced May 2025.

  42. arXiv:2505.02186  [pdf]

    cs.CE

    Probabilistic Method for Optimizing Submarine Search and Rescue Strategy Under Environmental Uncertainty

    Authors: Runhao Liu, Ziming Chen, Peng Zhang

    Abstract: When coping with the urgent challenge of locating and rescuing a deep-sea submersible in the event of communication or power failure, environmental uncertainty in the ocean cannot be ignored. However, classic physical models are limited to deterministic scenarios. Therefore, we present a hybrid algorithm framework combined with dynamic analysis for target submarine, Monte Carlo and Bayesian metho… ▽ More

    Submitted 4 May, 2025; originally announced May 2025.

  43. arXiv:2505.01656  [pdf, other]

    cs.CV

    A Novel WaveInst-based Network for Tree Trunk Structure Extraction and Pattern Analysis in Forest Inventory

    Authors: Chenyang Fan, Xujie Zhu, Taige Luo, Sheng Xu, Zhulin Chen, Hongxin Yang

    Abstract: The pattern analysis of tree structure holds significant scientific value for genetic breeding and forestry management. The current trunk and branch extraction technologies are mainly LiDAR-based or UAV-based. The former approaches obtain high-precision 3D data, but their equipment cost is high and the three-dimensional (3D) data processing is complex. The latter approaches efficiently capture canop…

    Submitted 2 May, 2025; originally announced May 2025.

  44. arXiv:2505.01450  [pdf, other]

    cs.LG

    Towards Film-Making Production Dialogue, Narration, Monologue Adaptive Moving Dubbing Benchmarks

    Authors: Chaoyi Wang, Junjie Zheng, Zihao Chen, Shiyu Xia, Chaofan Ding, Xiaohao Zhang, Xi Tao, Xiaoming He, Xinhan Di

    Abstract: Movie dubbing has advanced significantly, yet assessing the real-world effectiveness of these models remains challenging. A comprehensive evaluation benchmark is crucial for two key reasons: 1) Existing metrics fail to fully capture the complexities of dialogue, narration, monologue, and actor adaptability in movie dubbing. 2) A practical evaluation system should offer valuable insights to improve…

    Submitted 29 April, 2025; originally announced May 2025.

    Comments: 6 pages, 3 figures, accepted to the AI for Content Creation workshop at CVPR 2025 in Nashville, TN

  45. arXiv:2505.00949  [pdf, other]

    cs.CL cs.AI cs.LG

    Llama-Nemotron: Efficient Reasoning Models

    Authors: Akhiad Bercovich, Itay Levy, Izik Golan, Mohammad Dabbah, Ran El-Yaniv, Omri Puny, Ido Galil, Zach Moshe, Tomer Ronen, Najeeb Nabwani, Ido Shahaf, Oren Tropp, Ehud Karpas, Ran Zilberstein, Jiaqi Zeng, Soumye Singhal, Alexander Bukharin, Yian Zhang, Tugrul Konuk, Gerald Shen, Ameya Sunil Mahabaleshwarkar, Bilal Kartal, Yoshi Suhara, Olivier Delalleau, Zijia Chen, et al. (109 additional authors not shown)

    Abstract: We introduce the Llama-Nemotron series of models, an open family of heterogeneous reasoning models that deliver exceptional reasoning capabilities, inference efficiency, and an open license for enterprise use. The family comes in three sizes -- Nano (8B), Super (49B), and Ultra (253B) -- and performs competitively with state-of-the-art reasoning models such as DeepSeek-R1 while offering superior i…

    Submitted 14 May, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

  46. arXiv:2505.00610  [pdf, other]

    cs.AI

    Combining LLMs with Logic-Based Framework to Explain MCTS

    Authors: Ziyan An, Xia Wang, Hendrik Baier, Zirong Chen, Abhishek Dubey, Taylor T. Johnson, Jonathan Sprinkle, Ayan Mukhopadhyay, Meiyi Ma

    Abstract: In response to the lack of trust in Artificial Intelligence (AI) for sequential planning, we design a Computational Tree Logic-guided large language model (LLM)-based natural language explanation framework for the Monte Carlo Tree Search (MCTS) algorithm. MCTS is often considered challenging to interpret due to the complexity of its search trees, but our framework is flexible enough to ha…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: Accepted by AAMAS-25 as an extended abstract

  47. arXiv:2505.00527  [pdf, other]

    cs.RO

    DeCo: Task Decomposition and Skill Composition for Zero-Shot Generalization in Long-Horizon 3D Manipulation

    Authors: Zixuan Chen, Junhui Yin, Yangtao Chen, Jing Huo, Pinzhuo Tian, Jieqi Shi, Yiwen Hou, Yinchuan Li, Yang Gao

    Abstract: Generalizing language-conditioned multi-task imitation learning (IL) models to novel long-horizon 3D manipulation tasks remains a significant challenge. To address this, we propose DeCo (Task Decomposition and Skill Composition), a model-agnostic framework compatible with various multi-task IL models, designed to enhance their zero-shot generalization to novel, compositional, long-horizon 3D manip…

    Submitted 1 May, 2025; originally announced May 2025.

  48. arXiv:2505.00395  [pdf, other]

    cs.IT

    GAN-based Generator of Adversarial Attack on Intelligent End-to-End Autoencoder-based Communication System

    Authors: Jianyuan Chen, Lin Zhang, Zuwei Chen, Yawen Chen, Hongcheng Zhuang

    Abstract: Deep neural networks have been applied in wireless communication systems to intelligently adapt to dynamically changing channel conditions, while users remain under threat of malicious attacks due to the broadcast nature of wireless channels. However, most attack models require knowledge of the target details, which is difficult to implement in real systems. Our object…

    Submitted 1 May, 2025; originally announced May 2025.

  49. arXiv:2505.00036  [pdf, other]

    cs.CL cs.CY

    A Framework to Assess the Persuasion Risks Large Language Model Chatbots Pose to Democratic Societies

    Authors: Zhongren Chen, Joshua Kalla, Quan Le, Shinpei Nakamura-Sakai, Jasjeet Sekhon, Ruixiao Wang

    Abstract: In recent years, significant concern has emerged regarding the potential threat that Large Language Models (LLMs) pose to democratic societies through their persuasive capabilities. We expand upon existing research by conducting two survey experiments and a real-world simulation exercise to determine whether it is more cost-effective to persuade a large number of voters using LLM chatbots compared…

    Submitted 29 April, 2025; originally announced May 2025.

  50. arXiv:2504.21814  [pdf, other]

    cs.CV

    Why Compress What You Can Generate? When GPT-4o Generation Ushers in Image Compression Fields

    Authors: Yixin Gao, Xiaohan Pan, Xin Li, Zhibo Chen

    Abstract: The rapid development of AIGC foundation models has revolutionized the paradigm of image compression, which paves the way for the abandonment of most pixel-level transform and coding, compelling us to ask: why compress what you can generate if the AIGC foundation model is powerful enough to faithfully generate intricate structure and fine-grained details from nothing more than some compact descrip…

    Submitted 30 April, 2025; originally announced April 2025.