
Showing 1–50 of 3,950 results for author: Zhang, C

Searching in archive cs.
  1. arXiv:2505.10281  [pdf, other]

    cs.CV

    MFogHub: Bridging Multi-Regional and Multi-Satellite Data for Global Marine Fog Detection and Forecasting

    Authors: Mengqiu Xu, Kaixin Chen, Heng Guo, Yixiang Huang, Ming Wu, Zhenwei Shi, Chuang Zhang, Jun Guo

    Abstract: Deep learning approaches for marine fog detection and forecasting have outperformed traditional methods, demonstrating significant scientific and practical importance. However, the limited availability of open-source datasets remains a major challenge. Existing datasets, often focused on a single region or satellite, restrict the ability to evaluate model performance across diverse conditions and…

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.09989  [pdf, ps, other]

    cs.DC cs.AI cs.NI

    AI Greenferencing: Routing AI Inferencing to Green Modular Data Centers with Heron

    Authors: Tella Rajashekhar Reddy, Palak, Rohan Gandhi, Anjaly Parayil, Chaojie Zhang, Mike Shepperd, Liangcheng Yu, Jayashree Mohan, Srinivasan Iyengar, Shivkumar Kalyanaraman, Debopam Bhattacherjee

    Abstract: AI power demand is growing unprecedentedly thanks to the high power density of AI compute and the emerging inferencing workload. On the supply side, abundant wind power is waiting for grid access in interconnection queues. In this light, this paper argues bringing AI workload to modular compute clusters co-located in wind farms. Our deployment right-sizing strategy makes it economically viable to…

    Submitted 15 May, 2025; originally announced May 2025.

  3. arXiv:2505.09577  [pdf, ps, other]

    cs.RO

    VTLA: Vision-Tactile-Language-Action Model with Preference Learning for Insertion Manipulation

    Authors: Chaofan Zhang, Peng Hao, Xiaoge Cao, Xiaoshuai Hao, Shaowei Cui, Shuo Wang

    Abstract: While vision-language models have advanced significantly, their application in language-conditioned robotic manipulation is still underexplored, especially for contact-rich tasks that extend beyond visually dominant pick-and-place scenarios. To bridge this gap, we introduce Vision-Tactile-Language-Action model, a novel framework that enables robust policy generation in contact-intensive scenarios…

    Submitted 14 May, 2025; originally announced May 2025.

  4. arXiv:2505.09305  [pdf]

    cs.RO

    Embodied Intelligent Industrial Robotics: Concepts and Techniques

    Authors: Chaoran Zhang, Chenhao Zhang, Zhaobo Xu, Qinghongbing Xie, Pingfa Feng, Long Zeng

    Abstract: In recent years, embodied intelligent robotics (EIR) has made significant progress in multi-modal perception, autonomous decision-making, and physical interaction. Some robots have already been tested in general-purpose scenarios such as homes and shopping malls. We aim to advance the research and application of embodied intelligence in industrial scenes. However, current EIR lacks a deep understa…

    Submitted 15 May, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

    Comments: 60 pages, 11 figures. The associated project can be found at https://github.com/jackyzengl/EIIR

  5. arXiv:2505.08258  [pdf, ps, other]

    cs.NI

    Hybrid Wi-Fi/PDR Indoor Localization with Fingerprint Matching

    Authors: Chunyi Zhang, Zongwei Li, Xiaoqi Li

    Abstract: Indoor position technology has become one of the research highlights in the Internet of Things (IoT), but there is still a lack of universal, low-cost, and high-precision solutions. This paper conducts research on indoor position technology based on location fingerprints and proposes a practical hybrid indoor positioning system. In this experiment, the location fingerprint database is established…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 18 pages

  6. arXiv:2505.08200  [pdf, ps, other]

    cs.CL cs.AI

    A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs

    Authors: Artem Shelmanov, Ekaterina Fadeeva, Akim Tsvigun, Ivan Tsvigun, Zhuohan Xie, Igor Kiselev, Nico Daheim, Caiqi Zhang, Artem Vazhentsev, Mrinmaya Sachan, Preslav Nakov, Timothy Baldwin

    Abstract: Large Language Models (LLMs) have the tendency to hallucinate, i.e., to sporadically generate false or fabricated information. This presents a major challenge, as hallucinations often appear highly convincing and users generally lack the tools to detect them. Uncertainty quantification (UQ) provides a framework for assessing the reliability of model outputs, aiding in the identification of potenti…

    Submitted 12 May, 2025; originally announced May 2025.

  7. arXiv:2505.08194  [pdf, other]

    cs.RO

    CLTP: Contrastive Language-Tactile Pre-training for 3D Contact Geometry Understanding

    Authors: Wenxuan Ma, Xiaoge Cao, Yixiang Zhang, Chaofan Zhang, Shaobo Yang, Peng Hao, Bin Fang, Yinghao Cai, Shaowei Cui, Shuo Wang

    Abstract: Recent advancements in integrating tactile sensing with vision-language models (VLMs) have demonstrated remarkable potential for robotic multimodal perception. However, existing tactile descriptions remain limited to superficial attributes like texture, neglecting critical contact states essential for robotic manipulation. To bridge this gap, we propose CLTP, an intuitive and effective language ta…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 16 pages

  8. arXiv:2505.08178  [pdf, ps, other]

    cs.CV

    Monocular Depth Guided Occlusion-Aware Disparity Refinement via Semi-supervised Learning in Laparoscopic Images

    Authors: Ziteng Liu, Dongdong He, Chenghong Zhang, Wenpeng Gao, Yili Fu

    Abstract: Occlusion and the scarcity of labeled surgical data are significant challenges in disparity estimation for stereo laparoscopic images. To address these issues, this study proposes a Depth Guided Occlusion-Aware Disparity Refinement Network (DGORNet), which refines disparity maps by leveraging monocular depth information unaffected by occlusion. A Position Embedding (PE) module is introduced to pro…

    Submitted 12 May, 2025; originally announced May 2025.

  9. arXiv:2505.07858  [pdf, other]

    cs.CL cs.AI

    Scaling Laws for Speculative Decoding

    Authors: Siyuan Yan, Mo Zhu, Guo-qing Jiang, Jianfei Wang, Jiaxing Chen, Wentai Zhang, Xiang Liao, Xiao Cui, Chen Zhang, Zhuoran Song, Ran Zhu

    Abstract: The escalating demand for efficient decoding in large language models (LLMs) is particularly critical for reasoning-intensive architectures like OpenAI-o3 and DeepSeek-R1, which depend on extended chain-of-thought reasoning. This study investigates speculative decoding techniques through dense LLM architectures to establish foundational insights for accelerating reasoning tasks. While speculative…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 17 pages, 8 figures

  10. arXiv:2505.07782  [pdf, ps, other]

    cs.LG

    MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering

    Authors: Rushi Qiang, Yuchen Zhuang, Yinghao Li, Dingu Sagar V K, Rongzhi Zhang, Changhao Li, Ian Shu-Hei Wong, Sherry Yang, Percy Liang, Chao Zhang, Bo Dai

    Abstract: We introduce MLE-Dojo, a Gym-style framework for systematically reinforcement learning, evaluating, and improving autonomous large language model (LLM) agents in iterative machine learning engineering (MLE) workflows. Unlike existing benchmarks that primarily rely on static datasets or single-attempt evaluations, MLE-Dojo provides an interactive environment enabling agents to iteratively experimen…

    Submitted 12 May, 2025; originally announced May 2025.

  11. arXiv:2505.07347  [pdf, other]

    cs.CV

    AI-Enabled Accurate Non-Invasive Assessment of Pulmonary Hypertension Progression via Multi-Modal Echocardiography

    Authors: Jiewen Yang, Taoran Huang, Shangwei Ding, Xiaowei Xu, Qinhua Zhao, Yong Jiang, Jiarong Guo, Bin Pu, Jiexuan Zheng, Caojin Zhang, Hongwen Fei, Xiaomeng Li

    Abstract: Echocardiographers can detect pulmonary hypertension using Doppler echocardiography; however, accurately assessing its progression often proves challenging. Right heart catheterization (RHC), the gold standard for precise evaluation, is invasive and unsuitable for routine use, limiting its practicality for timely diagnosis and monitoring of pulmonary hypertension progression. Here, we propose MePH…

    Submitted 12 May, 2025; originally announced May 2025.

  12. arXiv:2505.07203  [pdf, other]

    cs.DC

    PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications

    Authors: Kuntai Du, Bowen Wang, Chen Zhang, Yiming Cheng, Qing Lan, Hejian Sang, Yihua Cheng, Jiayi Yao, Xiaoxuan Liu, Yifan Qiao, Ion Stoica, Junchen Jiang

    Abstract: Besides typical generative applications, like ChatGPT, GitHub Copilot, and Cursor, we observe an emerging trend that LLMs are increasingly used in traditional discriminative tasks, such as recommendation, credit verification, and data labeling. The key characteristic of these emerging use cases is that the LLM generates only a single output token, rather than an arbitrarily long sequence of tokens…

    Submitted 11 May, 2025; originally announced May 2025.

  13. arXiv:2505.07062  [pdf, ps, other]

    cs.CV cs.AI

    Seed1.5-VL Technical Report

    Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, et al. (172 additional authors not shown)

    Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed with a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluati…

    Submitted 11 May, 2025; originally announced May 2025.

  14. arXiv:2505.07057  [pdf, ps, other]

    cs.CV

    DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models

    Authors: Junhao Xia, Chaoyang Zhang, Yecheng Zhang, Chengyang Zhou, Zhichang Wang, Bochun Liu, Dongshuo Yin

    Abstract: Video generation based on diffusion models presents a challenging multimodal task, with video editing emerging as a pivotal direction in this field. Recent video editing approaches primarily fall into two categories: training-required and training-free methods. While training-based methods incur high computational costs, training-free alternatives often yield suboptimal performance. To address the…

    Submitted 11 May, 2025; originally announced May 2025.

  15. arXiv:2505.07032  [pdf, other]

    cs.CV

    MarkMatch: Same-Hand Stuffing Detection

    Authors: Fei Zhao, Runlin Zhang, Chengcui Zhang, Nitesh Saxena

    Abstract: We present MarkMatch, a retrieval system for detecting whether two paper ballot marks were filled by the same hand. Unlike the previous SOTA method BubbleSig, which used binary classification on isolated mark pairs, MarkMatch ranks stylistic similarity between a query mark and a mark in the database using contrastive learning. Our model is trained with a dense batch similarity matrix and a dual lo…

    Submitted 11 May, 2025; originally announced May 2025.

  16. arXiv:2505.07027  [pdf, ps, other]

    cs.AI cs.CL cs.LG cs.NE physics.chem-ph

    LLM-Augmented Chemical Synthesis and Design Decision Programs

    Authors: Haorui Wang, Jeff Guo, Lingkai Kong, Rampi Ramprasad, Philippe Schwaller, Yuanqi Du, Chao Zhang

    Abstract: Retrosynthesis, the process of breaking down a target molecule into simpler precursors through a series of valid reactions, stands at the core of organic chemistry and drug development. Although recent machine learning (ML) research has advanced single-step retrosynthetic modeling and subsequent route searches, these solutions remain restricted by the extensive combinatorial space of possible path…

    Submitted 11 May, 2025; originally announced May 2025.

  17. arXiv:2505.07003  [pdf, ps, other]

    cs.CV

    CMD: Controllable Multiview Diffusion for 3D Editing and Progressive Generation

    Authors: Peng Li, Suizhi Ma, Jialiang Chen, Yuan Liu, Chongyi Zhang, Wei Xue, Wenhan Luo, Alla Sheffer, Wenping Wang, Yike Guo

    Abstract: Recently, 3D generation methods have shown their powerful ability to automate 3D model creation. However, most 3D generation methods only rely on an input image or a text prompt to generate a 3D model, which lacks the control of each component of the generated 3D model. Any modifications of the input image lead to an entire regeneration of the 3D models. In this paper, we introduce a new method ca…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: SIGGRAPH 2025

  18. arXiv:2505.06944  [pdf, ps, other]

    cs.IT

    Radio Map-Enabled 3D Trajectory and Communication Optimization for Low-Altitude Air-Ground Cooperation

    Authors: Menghao Hu, Tong Zhang, Shuai Wang, Chiya Zhang, Changyang She, Gaojie Chen, Miaowen Wen

    Abstract: Low-altitude economy includes the application of unmanned aerial vehicles (UAVs) serving ground robots. This paper investigates the 3-dimensional (3D) trajectory and communication optimization for low-altitude air-ground cooperation systems, where mobile unmanned ground vehicles (UGVs) upload data to UAVs. We propose a joint optimization algorithm to maximize the minimal sum-rate of UGVs while ens…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: 6 pages; 6 figures; submitted to IEEE for possible publication

  19. arXiv:2505.06810  [pdf, ps, other]

    cs.ET

    QSeer: A Quantum-Inspired Graph Neural Network for Parameter Initialization in Quantum Approximate Optimization Algorithm Circuits

    Authors: Lei Jiang, Chi Zhang, Fan Chen

    Abstract: To mitigate the barren plateau problem, effective parameter initialization is crucial for optimizing the Quantum Approximate Optimization Algorithm (QAOA) in the near-term Noisy Intermediate-Scale Quantum (NISQ) era. Prior physics-driven approaches leveraged the optimal parameter concentration phenomenon, utilizing medium values of previously optimized QAOA parameters stored in databases as initia…

    Submitted 10 May, 2025; originally announced May 2025.

  20. arXiv:2505.06321  [pdf, other]

    cs.LG cs.AI

    Learn to Think: Bootstrapping LLM Reasoning Capability Through Graph Learning

    Authors: Hang Gao, Chenhao Zhang, Tie Wang, Junsuo Zhao, Fengge Wu, Changwen Zheng, Huaping Liu

    Abstract: Large Language Models (LLMs) have achieved remarkable success across various domains. However, they still face significant challenges, including high computational costs for training and limitations in solving complex reasoning problems. Although existing methods have extended the reasoning capabilities of LLMs through structured paradigms, these approaches often rely on task-specific prompts and…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: Accepted by IJCAI 2025

  21. arXiv:2505.06307  [pdf, ps, other]

    cs.CR cs.AI

    Large Language Model-driven Security Assistant for Internet of Things via Chain-of-Thought

    Authors: Mingfei Zeng, Ming Xie, Xixi Zheng, Chunhai Li, Chuan Zhang, Liehuang Zhu

    Abstract: The rapid development of Internet of Things (IoT) technology has transformed people's way of life and has a profound impact on both production and daily activities. However, with the rapid advancement of IoT technology, the security of IoT devices has become an unavoidable issue in both research and applications. Although some efforts have been made to detect or mitigate IoT security vulnerabiliti…

    Submitted 8 May, 2025; originally announced May 2025.

  22. arXiv:2505.05992  [pdf, ps, other]

    cs.NE

    CogniSNN: A First Exploration to Random Graph Architecture based Spiking Neural Networks with Enhanced Expandability and Neuroplasticity

    Authors: Yongsheng Huang, Peibo Duan, Zhipeng Liu, Kai Sun, Changsheng Zhang, Bin Zhang, Mingkun Xu

    Abstract: Despite advances in spiking neural networks (SNNs) in numerous tasks, their architectures remain highly similar to traditional artificial neural networks (ANNs), restricting their ability to mimic natural connections between biological neurons. This paper develops a new modeling paradigm for SNN with random graph architecture (RGA), termed Cognition-aware SNN (CogniSNN). Furthermore, we improve th…

    Submitted 9 May, 2025; originally announced May 2025.

  23. arXiv:2505.05595  [pdf, ps, other]

    q-fin.TR cs.AI cs.CE cs.LG

    Trading Under Uncertainty: A Distribution-Based Strategy for Futures Markets Using FutureQuant Transformer

    Authors: Wenhao Guo, Yuda Wang, Zeqiao Huang, Changjiang Zhang, Shumin Ma

    Abstract: In the complex landscape of traditional futures trading, where vast data and variables like real-time Limit Order Books (LOB) complicate price predictions, we introduce the FutureQuant Transformer model, leveraging attention mechanisms to navigate these challenges. Unlike conventional models focused on point predictions, the FutureQuant model excels in forecasting the range and volatility of futur…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 16 pages, 12 figures

  24. arXiv:2505.05225  [pdf, ps, other]

    cs.CL

    QualBench: Benchmarking Chinese LLMs with Localized Professional Qualifications for Vertical Domain Evaluation

    Authors: Mengze Hong, Wailing Ng, Di Jiang, Chen Jason Zhang

    Abstract: The rapid advancement of Chinese large language models (LLMs) underscores the need for domain-specific evaluations to ensure reliable applications. However, existing benchmarks often lack coverage in vertical domains and offer limited insights into the Chinese working context. Leveraging qualification exams as a unified framework for human expertise evaluation, we introduce QualBench, the first mu…

    Submitted 8 May, 2025; originally announced May 2025.

  25. arXiv:2505.04836  [pdf, ps, other]

    eess.SP cs.CV

    Integrated Image Reconstruction and Target Recognition based on Deep Learning Technique

    Authors: Cien Zhang, Jiaming Zhang, Jiajun He, Okan Yurduseven

    Abstract: Computational microwave imaging (CMI) has gained attention as an alternative technique for conventional microwave imaging techniques, addressing their limitations such as hardware-intensive physical layer and slow data collection acquisition speed to name a few. Despite these advantages, CMI still encounters notable computational bottlenecks, especially during the image reconstruction stage. In th…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: Submitted to The 2025 15th IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC 2025)

  26. arXiv:2505.04519  [pdf, other]

    cs.CL

    Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs

    Authors: Yehui Tang, Yichun Yin, Yaoyuan Wang, Hang Zhou, Yu Pan, Wei Guo, Ziyang Zhang, Miao Rang, Fangcheng Liu, Naifu Zhang, Binghan Li, Yonghan Dong, Xiaojun Meng, Yasheng Wang, Dong Li, Yin Li, Dandan Tu, Can Chen, Youliang Yan, Fisher Yu, Ruiming Tang, Yunhe Wang, Botian Huang, Bo Wang, Boxiao Liu, et al. (49 additional authors not shown)

    Abstract: Sparse large language models (LLMs) with Mixture of Experts (MoE) and close to a trillion parameters are dominating the realm of most capable language models. However, the massive model scale poses significant challenges for the underlying software and hardware systems. In this paper, we aim to uncover a recipe to harness such scale on Ascend NPUs. The key goals are better usage of the computing r…

    Submitted 7 May, 2025; originally announced May 2025.

  27. arXiv:2505.03973  [pdf, other]

    cs.CL

    Divide, Optimize, Merge: Fine-Grained LLM Agent Optimization at Scale

    Authors: Jiale Liu, Yifan Zeng, Shaokun Zhang, Chi Zhang, Malte Højmark-Bertelsen, Marie Normann Gadeberg, Huazheng Wang, Qingyun Wu

    Abstract: LLM-based optimization has shown remarkable potential in enhancing agentic systems. However, the conventional approach of prompting LLM optimizer with the whole training trajectories on training dataset in a single pass becomes untenable as datasets grow, leading to context window overflow and degraded pattern recognition. To address these challenges, we propose Fine-Grained Optimization (FGO), a…

    Submitted 6 May, 2025; originally announced May 2025.

  28. arXiv:2505.03836  [pdf, ps, other]

    cs.IR cs.AI cs.CV

    OBD-Finder: Explainable Coarse-to-Fine Text-Centric Oracle Bone Duplicates Discovery

    Authors: Chongsheng Zhang, Shuwen Wu, Yingqi Chen, Matthias Aßenmacher, Christian Heumann, Yi Men, Gaojuan Fan, João Gama

    Abstract: Oracle Bone Inscription (OBI) is the earliest systematic writing system in China, while the identification of Oracle Bone (OB) duplicates is a fundamental issue in OBI research. In this work, we design a progressive OB duplicate discovery framework that combines unsupervised low-level keypoints matching with high-level text-centric content-based matching to refine and rank the candidate OB duplica…

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: This is the long version of our OBD-Finder paper for AI-enabled Oracle Bone Duplicates Discovery (currently under review at the ECML PKDD 2025 Demo Track). The models, video illustration and demonstration of this paper are available at: https://github.com/cszhangLMU/OBD-Finder/. Illustration video: https://www.youtube.com/watch?v=5QT4f0YIo0Q

  29. arXiv:2505.03641  [pdf, other]

    cs.AI

    Synthesizing Images on Perceptual Boundaries of ANNs for Uncovering and Manipulating Human Perceptual Variability

    Authors: Chen Wei, Chi Zhang, Jiachen Zou, Haotian Deng, Dietmar Heinke, Quanying Liu

    Abstract: Human decision-making in cognitive tasks and daily life exhibits considerable variability, shaped by factors such as task difficulty, individual preferences, and personal experiences. Understanding this variability across individuals is essential for uncovering the perceptual and decision-making mechanisms that humans rely on when faced with uncertainty and ambiguity. We present a computational fr…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Accepted at ICML 2025

  30. arXiv:2505.03567  [pdf, other]

    cs.CV

    Uncertainty-Aware Prototype Semantic Decoupling for Text-Based Person Search in Full Images

    Authors: Zengli Luo, Canlong Zhang, Xiaochun Lu, Zhixin Li, Zhiwen Wang

    Abstract: Text-based pedestrian search (TBPS) in full images aims to locate a target pedestrian in untrimmed images using natural language descriptions. However, in complex scenes with multiple pedestrians, existing methods are limited by uncertainties in detection and matching, leading to degraded performance. To address this, we propose UPD-TBPS, a novel framework comprising three modules: Multi-granulari…

    Submitted 6 May, 2025; v1 submitted 6 May, 2025; originally announced May 2025.

    Comments: 9 pages, 5 figures

  31. arXiv:2505.03420  [pdf, other]

    cs.MM cs.CV

    Mitigating Image Captioning Hallucinations in Vision-Language Models

    Authors: Fei Zhao, Chengcui Zhang, Runlin Zhang, Tianyang Wang, Xi Li

    Abstract: Hallucinations in vision-language models (VLMs) hinder reliability and real-world applicability, usually stemming from distribution shifts between pretraining data and test samples. Existing solutions, such as retraining or fine-tuning on additional data, demand significant computational resources and labor-intensive data collection, while ensemble-based methods incur additional costs by introduci…

    Submitted 11 May, 2025; v1 submitted 6 May, 2025; originally announced May 2025.

  32. arXiv:2505.03059  [pdf, other]

    cs.CL

    Improving Model Alignment Through Collective Intelligence of Open-Source LLMS

    Authors: Junlin Wang, Roy Xie, Shang Zhu, Jue Wang, Ben Athiwaratkun, Bhuwan Dhingra, Shuaiwen Leon Song, Ce Zhang, James Zou

    Abstract: Building helpful and harmless large language models (LLMs) requires effective model alignment approach based on human instructions and feedback, which necessitates high-quality human-labeled data. Constructing such datasets is often expensive and hard to scale, and may face potential limitations on diversity and generalization. To address these challenges, we introduce Mixture of Agents Alignment…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: ICML 2025

  33. arXiv:2505.02922  [pdf, ps, other]

    cs.LG

    RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference

    Authors: Yaoqi Chen, Jinkai Zhang, Baotong Lu, Qianxi Zhang, Chengruidong Zhang, Jingjia Luo, Di Liu, Huiqiang Jiang, Qi Chen, Jing Liu, Bailu Ding, Xiao Yan, Jiawei Jiang, Chen Chen, Mingxing Zhang, Yuqing Yang, Fan Yang, Mao Yang

    Abstract: The growing context lengths of large language models (LLMs) pose significant challenges for efficient inference, primarily due to GPU memory and bandwidth constraints. We present RetroInfer, a novel system that reconceptualizes the key-value (KV) cache as a vector storage system which exploits the inherent attention sparsity to accelerate long-context LLM inference. At its core is the wave index,…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: 16 pages

  34. arXiv:2505.01665  [pdf, other]

    cs.LG

    Adaptively Point-weighting Curriculum Learning

    Authors: Wensheng Li, Hao Wang, Ruifeng Zhou, Hanting Guan, Chao Zhang, Dacheng Tao

    Abstract: Curriculum learning (CL) is referred to as a training strategy that makes easy samples learned first and then fits hard samples. It imitates the process of humans learning knowledge, and has become a potential manner of effectively training deep networks. In this study, we develop the adaptively point-weighting (APW) curriculum learning algorithm, which adaptively assigns the weight to every train…

    Submitted 2 May, 2025; originally announced May 2025.

  35. arXiv:2505.01618  [pdf, other]

    cs.LG cs.AI

    Don't be lazy: CompleteP enables compute-efficient deep transformers

    Authors: Nolan Dey, Bin Claire Zhang, Lorenzo Noci, Mufan Li, Blake Bordelon, Shane Bergsma, Cengiz Pehlevan, Boris Hanin, Joel Hestness

    Abstract: We study compute efficiency of LLM training when using different parameterizations, i.e., rules for adjusting model and optimizer hyperparameters (HPs) as model size changes. Some parameterizations fail to transfer optimal base HPs (such as learning rate) across changes in model depth, requiring practitioners to either re-tune these HPs as they scale up (expensive), or accept sub-optimal training…

    Submitted 14 May, 2025; v1 submitted 2 May, 2025; originally announced May 2025.

    Comments: 10 main pages, 16 appendix pages, 13 figures

  36. arXiv:2505.01432  [pdf, other]

    cs.CE

    Dynamic Asset Pricing: Integrating FinBERT-Based Sentiment Quantification with the Fama--French Five-Factor Model

    Authors: Chi Zhang

    Abstract: This paper presents a comprehensive study on the integration of text-derived, time-varying sentiment factors into traditional multi-factor asset pricing models. Leveraging FinBERT, a domain-specific deep learning language model, we construct a dynamic sentiment index and its volatility from large-scale financial news and social media data covering 2020 to 2022. By embedding these sentiment measure…

    Submitted 22 April, 2025; originally announced May 2025.

    Comments: 22 pages, 2 figures

  37. arXiv:2505.01003  [pdf, other]

    cs.CV

    3D Human Pose Estimation via Spatial Graph Order Attention and Temporal Body Aware Transformer

    Authors: Kamel Aouaidjia, Aofan Li, Wenhao Zhang, Chongsheng Zhang

    Abstract: Nowadays, Transformers and Graph Convolutional Networks (GCNs) are the prevailing techniques for 3D human pose estimation. However, Transformer-based methods either ignore the spatial neighborhood relationships between the joints when used for skeleton representations or disregard the local temporal patterns of the local joint movements in skeleton sequence modeling, while GCN-based methods often…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: 16 pages, 9 figures, 7 tables

  38. arXiv:2505.00812  [pdf, other]

    cs.LG cs.AI

    Handling Label Noise via Instance-Level Difficulty Modeling and Dynamic Optimization

    Authors: Kuan Zhang, Chengliang Chai, Jingzhe Xu, Chi Zhang, Ye Yuan, Guoren Wang, Lei Cao

    Abstract: Recent studies indicate that deep neural networks degrade in generalization performance under noisy supervision. Existing methods focus on isolating clean subsets or correcting noisy labels, facing limitations such as high computational costs, heavy hyperparameter tuning process, and coarse-grained optimization. To address these challenges, we propose a novel two-stage noisy learning framework tha…

    Submitted 1 May, 2025; originally announced May 2025.

  39. arXiv:2505.00742  [pdf, other]

    cs.CV cs.AI eess.IV

    Zoomer: Adaptive Image Focus Optimization for Black-box MLLM

    Authors: Jiaxu Qian, Chendong Wang, Yifan Yang, Chaoyun Zhang, Huiqiang Jiang, Xufang Luo, Yu Kang, Qingwei Lin, Anlan Zhang, Shiqi Jiang, Ting Cao, Tianjun Mao, Suman Banerjee, Guyue Liu, Saravan Rajmohan, Dongmei Zhang, Yuqing Yang, Qi Zhang, Lili Qiu

    Abstract: Recent advancements in multimodal large language models (MLLMs) have broadened the scope of vision-language tasks, excelling in applications like image captioning and interactive question-answering. However, these models struggle with accurately processing visual data, particularly in tasks requiring precise object recognition and fine visual details. Stringent token limits often result in the omi…

    Submitted 29 April, 2025; originally announced May 2025.

  40. arXiv:2505.00690  [pdf, other]

    cs.CV cs.AI cs.RO

    Towards Autonomous Micromobility through Scalable Urban Simulation

    Authors: Wayne Wu, Honglin He, Chaoyuan Zhang, Jack He, Seth Z. Zhao, Ran Gong, Quanyi Li, Bolei Zhou

    Abstract: Micromobility, which utilizes lightweight mobile machines moving in urban public spaces, such as delivery robots and mobility scooters, emerges as a promising alternative to vehicular mobility. Current micromobility depends mostly on human manual operation (in-person or remote control), which raises safety and efficiency concerns when navigating busy urban environments full of unpredictable obstac…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: CVPR 2025 Highlight. Project page: https://metadriverse.github.io/urban-sim/

  41. arXiv:2505.00551  [pdf, other]

    cs.CL

    100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models

    Authors: Chong Zhang, Yue Deng, Xiang Lin, Bin Wang, Dianwen Ng, Hai Ye, Xingxuan Li, Yao Xiao, Zhanfeng Mo, Qi Zhang, Lidong Bing

    Abstract: The recent development of reasoning language models (RLMs) represents a novel evolution in large language models. In particular, the recent release of DeepSeek-R1 has generated widespread social impact and sparked enthusiasm in the research community for exploring the explicit reasoning paradigm of language models. However, the implementation details of the released models have not been fully open…

    Submitted 15 May, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

  42. arXiv:2505.00415  [pdf, other]

    cs.LG

    CICADA: Cross-Domain Interpretable Coding for Anomaly Detection and Adaptation in Multivariate Time Series

    Authors: Tian Lan, Yifei Gao, Yimeng Lu, Chen Zhang

    Abstract: Unsupervised Time series anomaly detection plays a crucial role in applications across industries. However, existing methods face significant challenges due to data distributional shifts across different domains, which are exacerbated by the non-stationarity of time series over time. Existing models fail to generalize under multiple heterogeneous source domains and emerging unseen new target domai…

    Submitted 1 May, 2025; originally announced May 2025.

  43. arXiv:2505.00340  [pdf, ps, other]

    cs.CR

    Vehicular Communication Security: Multi-Channel and Multi-Factor Authentication

    Authors: Marco De Vincenzi, Shuyang Sun, Chen Bo Calvin Zhang, Manuel Garcia, Shaozu Ding, Chiara Bodei, Ilaria Matteucci, Sanjay E. Sarma, Dajiang Suo

    Abstract: Secure and reliable communications are crucial for Intelligent Transportation Systems (ITSs), where Vehicle-to-Infrastructure (V2I) communication plays a key role in enabling mobility-enhancing and safety-critical services. Current V2I authentication relies on credential-based methods over wireless Non-Line-of-Sight (NLOS) channels, leaving them exposed to remote impersonation and proximity attack…

    Submitted 8 May, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

  44. arXiv:2505.00031  [pdf, other]

    cs.CL cs.AI

    Learning to Plan Before Answering: Self-Teaching LLMs to Learn Abstract Plans for Problem Solving

    Authors: Jin Zhang, Flood Sung, Zhilin Yang, Yang Gao, Chongjie Zhang

    Abstract: In the field of large language model (LLM) post-training, the effectiveness of utilizing synthetic data generated by the LLM itself has been well-presented. However, a key question remains unaddressed: what essential information should such self-generated data encapsulate? Existing approaches only produce step-by-step problem solutions, and fail to capture the abstract meta-knowledge necessary for…

    Submitted 28 April, 2025; originally announced May 2025.

  45. arXiv:2504.21435  [pdf, other]

    cs.CV cs.AI cs.CL

    SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding

    Authors: Chenkai Zhang, Yiming Lei, Zeming Liu, Haitao Leng, Shaoguo Liu, Tingting Gao, Qingjie Liu, Yunhong Wang

    Abstract: With the rapid development of Multi-modal Large Language Models (MLLMs), an increasing number of benchmarks have been established to evaluate the video understanding capabilities of these models. However, these benchmarks focus on standalone videos and mainly assess "visual elements" like human actions and object states. In reality, contemporary videos often encompass complex and continuous narrat…

    Submitted 13 May, 2025; v1 submitted 30 April, 2025; originally announced April 2025.

    Comments: 29 pages, 15 figures, CVPR 2025

  46. arXiv:2504.21055  [pdf, ps, other]

    cs.LG cs.AI

    Modeling and Performance Analysis for Semantic Communications Based on Empirical Results

    Authors: Shuai Ma, Bin Shen, Chuanhui Zhang, Youlong Wu, Hang Li, Shiyin Li, Guangming Shi, Naofal Al-Dhahir

    Abstract: Due to the black-box characteristics of deep learning based semantic encoders and decoders, finding a tractable method for the performance analysis of semantic communications is a challenging problem. In this paper, we propose an Alpha-Beta-Gamma (ABG) formula to model the relationship between the end-to-end measurement and SNR, which can be applied for both image reconstruction tasks and inferenc…

    Submitted 29 April, 2025; originally announced April 2025.

  47. arXiv:2504.20771  [pdf, other]

    cs.CL

    Turing Machine Evaluation for Large Language Model

    Authors: Haitao Wu, Zongbo Han, Huaxi Huang, Changqing Zhang

    Abstract: With the rapid development and widespread application of Large Language Models (LLMs), rigorous evaluation has become particularly crucial. This research adopts a novel perspective, focusing on evaluating the core computational reasoning ability of LLMs, defined as the capacity of model to accurately understand rules, and execute logically computing operations. This capability assesses the reliabi…

    Submitted 29 April, 2025; originally announced April 2025.

  48. arXiv:2504.20452  [pdf, other]

    cs.IR cs.AI

    Enhancing News Recommendation with Hierarchical LLM Prompting

    Authors: Hai-Dang Kieu, Delvin Ce Zhang, Minh Duc Nguyen, Min Xu, Qiang Wu, Dung D. Le

    Abstract: Personalized news recommendation systems often struggle to effectively capture the complexity of user preferences, as they rely heavily on shallow representations, such as article titles and abstracts. To address this problem, we introduce a novel method, namely PNR-LLM, for Large Language Models for Personalized News Recommendation. Specifically, PNR-LLM harnesses the generation capabilities of L…

    Submitted 29 April, 2025; originally announced April 2025.

  49. arXiv:2504.20314  [pdf, other]

    cs.LG cs.AI

    Perturbation-efficient Zeroth-order Optimization for Hardware-friendly On-device Training

    Authors: Qitao Tan, Sung-En Chang, Rui Xia, Huidong Ji, Chence Yang, Ci Zhang, Jun Liu, Zheng Zhan, Zhou Zou, Yanzhi Wang, Jin Lu, Geng Yuan

    Abstract: Zeroth-order (ZO) optimization is an emerging deep neural network (DNN) training paradigm that offers computational simplicity and memory savings. However, this seemingly promising approach faces a significant and long-ignored challenge. ZO requires generating a substantial number of Gaussian random numbers, which poses significant difficulties and even makes it infeasible for hardware platforms,…

    Submitted 28 April, 2025; originally announced April 2025.

  50. arXiv:2504.20050  [pdf, ps, other]

    cs.CR

    Multi-Party Private Set Operations from Predicative Zero-Sharing

    Authors: Minglang Dong, Yu Chen, Cong Zhang, Yujie Bai, Yang Cao

    Abstract: Typical protocols in the multi-party private set operations (MPSO) setting enable m > 2 parties to perform certain secure computation on the intersection or union of their private sets, realizing a very limited range of MPSO functionalities. Most works in this field focus on just one or two specific functionalities, resulting in a large variety of isolated schemes and a lack of a unified framework…

    Submitted 10 April, 2025; originally announced April 2025.