Showing 1–50 of 491 results for author: Wei, J

Searching in archive cs.
  1. arXiv:2505.08775

    cs.CL

    HealthBench: Evaluating Large Language Models Towards Improved Human Health

    Authors: Rahul K. Arora, Jason Wei, Rebecca Soskin Hicks, Preston Bowman, Joaquin Quiñonero-Candela, Foivos Tsimpourlas, Michael Sharman, Meghan Shah, Andrea Vallone, Alex Beutel, Johannes Heidecke, Karan Singhal

    Abstract: We present HealthBench, an open-source benchmark measuring the performance and safety of large language models in healthcare. HealthBench consists of 5,000 multi-turn conversations between a model and an individual user or healthcare professional. Responses are evaluated using conversation-specific rubrics created by 262 physicians. Unlike previous multiple-choice or short-answer benchmarks, Healt…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: Blog: https://openai.com/index/healthbench/ Code: https://github.com/openai/simple-evals

  2. arXiv:2505.08316

    cs.CE cs.CV

    Improving Unsupervised Task-driven Models of Ventral Visual Stream via Relative Position Predictivity

    Authors: Dazhong Rong, Hao Dong, Xing Gao, Jiyu Wei, Di Hong, Yaoyao Hao, Qinming He, Yueming Wang

    Abstract: Based on the concept that ventral visual stream (VVS) mainly functions for object recognition, current unsupervised task-driven methods model VVS by contrastive learning, and have achieved good brain similarity. However, we believe functions of VVS extend beyond just object recognition. In this paper, we introduce an additional function involving VVS, named relative position (RP) prediction. We fi…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: This paper has been accepted for full publication at CogSci 2025 (https://cognitivesciencesociety.org/cogsci-2025/)

  3. arXiv:2505.06005

    cs.DS cs.CC cs.DM

    Second Price Matching with Complete Allocation and Degree Constraints

    Authors: Rom Pinchasi, Neta Singer, Lukas Vogl, Jiaye Wei

    Abstract: We study the Second Price Matching problem, introduced by Azar, Birnbaum, Karlin, and Nguyen in 2009. In this problem, a bipartite graph (bidders and goods) is given, and the profit of a matching is the number of matches containing a second unmatched bidder. Maximizing profit is known to be APX-hard and the current best approximation guarantee is $1/2$. APX-hardness even holds when all degrees are…

    Submitted 9 May, 2025; originally announced May 2025.

  4. arXiv:2505.04519

    cs.CL

    Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs

    Authors: Yehui Tang, Yichun Yin, Yaoyuan Wang, Hang Zhou, Yu Pan, Wei Guo, Ziyang Zhang, Miao Rang, Fangcheng Liu, Naifu Zhang, Binghan Li, Yonghan Dong, Xiaojun Meng, Yasheng Wang, Dong Li, Yin Li, Dandan Tu, Can Chen, Youliang Yan, Fisher Yu, Ruiming Tang, Yunhe Wang, Botian Huang, Bo Wang, Boxiao Liu, et al. (49 additional authors not shown)

    Abstract: Sparse large language models (LLMs) with Mixture of Experts (MoE) and close to a trillion parameters are dominating the realm of most capable language models. However, the massive model scale poses significant challenges for the underlying software and hardware systems. In this paper, we aim to uncover a recipe to harness such scale on Ascend NPUs. The key goals are better usage of the computing r…

    Submitted 7 May, 2025; originally announced May 2025.

  5. arXiv:2505.02648

    cs.CV

    MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation

    Authors: Mingcheng Li, Xiaolu Hou, Ziyang Liu, Dingkang Yang, Ziyun Qian, Jiawei Chen, Jinjie Wei, Yue Jiang, Qingyao Xu, Lihua Zhang

    Abstract: Diffusion models have shown excellent performance in text-to-image generation. Nevertheless, existing methods often suffer from performance bottlenecks when handling complex prompts that involve multiple objects, characteristics, and relations. Therefore, we propose a Multi-agent Collaboration-based Compositional Diffusion (MCCD) for text-to-image generation for complex scenes. Specifically, we de…

    Submitted 6 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

  6. arXiv:2505.02152

    cs.RO

    Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions

    Authors: Cunxin Fan, Xiaosong Jia, Yihang Sun, Yixiao Wang, Jianglan Wei, Ziyang Gong, Xiangyu Zhao, Masayoshi Tomizuka, Xue Yang, Junchi Yan, Mingyu Ding

    Abstract: Vision-Language-Action (VLA) models have shown great promise for generalist robotic manipulation in the physical world. However, existing models are restricted to robot observations and text-only instructions, lacking the flexibility of interleaved multimodal instructions enabled by recent advances in foundation models in the digital world. In this paper, we present Interleave-VLA, the first frame…

    Submitted 4 May, 2025; originally announced May 2025.

  7. arXiv:2505.00586

    cs.RO cs.LG

    ParkDiffusion: Heterogeneous Multi-Agent Multi-Modal Trajectory Prediction for Automated Parking using Diffusion Models

    Authors: Jiarong Wei, Niclas Vödisch, Anna Rehr, Christian Feist, Abhinav Valada

    Abstract: Automated parking is a critical feature of Advanced Driver Assistance Systems (ADAS), where accurate trajectory prediction is essential to bridge perception and planning modules. Despite its significance, research in this domain remains relatively limited, with most existing studies concentrating on single-modal trajectory prediction of vehicles. In this work, we propose ParkDiffusion, a novel app…

    Submitted 1 May, 2025; originally announced May 2025.

  8. arXiv:2504.20069

    cs.LG cs.AI eess.SP

    A Simple Review of EEG Foundation Models: Datasets, Advancements and Future Perspectives

    Authors: Junhong Lai, Jiyu Wei, Lin Yao, Yueming Wang

    Abstract: Electroencephalogram (EEG) signals play a crucial role in understanding brain activity and diagnosing neurological disorders. This review focuses on the recent development of EEG foundation models (EEG-FMs), which have shown great potential in processing and analyzing EEG data. We discuss various EEG-FMs, including their architectures, pre-training strategies, their pre-training and downstream data…

    Submitted 24 April, 2025; originally announced April 2025.

  9. arXiv:2504.19237

    cs.SE

    Deep Reinforcement Learning for Automated Web GUI Testing

    Authors: Zhiyu Gu, Chenxu Liu, Guoquan Wu, Yifei Zhang, ChenXi Yang, Zheheng Liang, Wei Chen, Jun Wei

    Abstract: Automated GUI testing of web applications has always been considered a challenging task considering their large state space and complex interaction logic. Deep Reinforcement Learning (DRL) is a recent extension of Reinforcement Learning (RL), which takes advantage of the powerful learning capabilities of neural networks, making it suitable for complex exploration space. In this paper, leveraging t…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: 12 pages, 7 figures

  10. arXiv:2504.17427

    cs.IR

    Beyond Whole Dialogue Modeling: Contextual Disentanglement for Conversational Recommendation

    Authors: Guojia An, Jie Zou, Jiwei Wei, Chaoning Zhang, Fuming Sun, Yang Yang

    Abstract: Conversational recommender systems aim to provide personalized recommendations by analyzing and utilizing contextual information related to dialogue. However, existing methods typically model the dialogue context as a whole, neglecting the inherent complexity and entanglement within the dialogue. Specifically, a dialogue comprises both focus information and background information, which mutually i…

    Submitted 24 April, 2025; originally announced April 2025.

  11. arXiv:2504.16358

    cs.CL

    Text-to-TrajVis: Enabling Trajectory Data Visualizations from Natural Language Questions

    Authors: Tian Bai, Huiyan Ying, Kailong Suo, Junqiu Wei, Tao Fan, Yuanfeng Song

    Abstract: This paper introduces the Text-to-TrajVis task, which aims to transform natural language questions into trajectory data visualizations, facilitating the development of natural language interfaces for trajectory visualization systems. As this is a novel task, there is currently no relevant dataset available in the community. To address this gap, we first devised a new visualization language called…

    Submitted 22 April, 2025; originally announced April 2025.

  12. arXiv:2504.16074

    cs.CL

    PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models

    Authors: Shi Qiu, Shaoyang Guo, Zhuo-Yang Song, Yunbo Sun, Zeyu Cai, Jiashen Wei, Tianyu Luo, Yixuan Yin, Haoxu Zhang, Yi Hu, Chenyang Wang, Chencheng Tang, Haoling Chang, Qi Liu, Ziheng Zhou, Tianyu Zhang, Jingtian Zhang, Zhangyi Liu, Minghao Li, Yuku Zhang, Boxuan Jing, Xianqi Yin, Yutong Ren, Zizhuo Fu, Weike Wang, et al. (27 additional authors not shown)

    Abstract: We introduce PHYBench, a novel, high-quality benchmark designed for evaluating reasoning capabilities of large language models (LLMs) in physical contexts. PHYBench consists of 500 meticulously curated physics problems based on real-world physical scenarios, designed to assess the ability of models to understand and reason about realistic physical processes. Covering mechanics, electromagnetism, t…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: 21 pages, 8 figures, 4 tables

  13. arXiv:2504.15742

    cs.DB cs.SE

    Proving Cypher Query Equivalence

    Authors: Lei Tang, Wensheng Dou, Yingying Zheng, Lijie Xu, Wei Wang, Jun Wei, Tao Huang

    Abstract: Graph database systems store graph data as nodes and relationships, and utilize graph query languages (e.g., Cypher) for efficiently querying graph data. Proving the equivalence of graph queries is an important foundation for optimizing graph query performance, ensuring graph query reliability, etc. Although researchers have proposed many SQL query equivalence provers for relational database syste…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: 14 pages, accepted by ICDE 2025

  14. arXiv:2504.15545

    eess.IV cs.CV

    VLM-based Prompts as the Optimal Assistant for Unpaired Histopathology Virtual Staining

    Authors: Zizhi Chen, Xinyu Zhang, Minghao Han, Yizhou Liu, Ziyun Qian, Weifeng Zhang, Xukun Zhang, Jingwei Wei, Lihua Zhang

    Abstract: In histopathology, tissue sections are typically stained using common H&E staining or special stains (MAS, PAS, PASM, etc.) to clearly visualize specific tissue structures. The rapid advancement of deep learning offers an effective solution for generating virtually stained images, significantly reducing the time and labor costs associated with traditional histochemical staining. However, a new cha…

    Submitted 21 April, 2025; originally announced April 2025.

  15. arXiv:2504.14858

    cs.AI cs.CL

    AlignRAG: An Adaptable Framework for Resolving Misalignments in Retrieval-Aware Reasoning of RAG

    Authors: Jiaqi Wei, Hao Zhou, Xiang Zhang, Di Zhang, Zijie Qiu, Wei Wei, Jinzhe Li, Wanli Ouyang, Siqi Sun

    Abstract: Retrieval-augmented generation (RAG) has emerged as a foundational paradigm for knowledge-grounded text generation. However, existing RAG pipelines often fail to ensure that the reasoning trajectories align with the evidential constraints imposed by retrieved content. In this paper, we reframe RAG as a problem of retrieval-aware reasoning and identify a core challenge: reasoning misalignment-the m…

    Submitted 21 April, 2025; originally announced April 2025.

  16. arXiv:2504.13655

    cs.CL cs.AI cs.IR

    Multi-Type Context-Aware Conversational Recommender Systems via Mixture-of-Experts

    Authors: Jie Zou, Cheng Lin, Weikang Guo, Zheng Wang, Jiwei Wei, Yang Yang, Hengtao Shen

    Abstract: Conversational recommender systems enable natural language conversations and thus lead to a more engaging and effective recommendation scenario. As the conversations for recommender systems usually contain limited contextual information, many existing conversational recommender systems incorporate external sources to enrich the contextual information. However, how to combine different types of con…

    Submitted 18 April, 2025; originally announced April 2025.

    Comments: 30 pages

  17. arXiv:2504.13596

    cs.CV cs.RO

    LMPOcc: 3D Semantic Occupancy Prediction Utilizing Long-Term Memory Prior from Historical Traversals

    Authors: Shanshuai Yuan, Julong Wei, Muer Tie, Xiangyun Ren, Zhongxue Gan, Wenchao Ding

    Abstract: Vision-based 3D semantic occupancy prediction is critical for autonomous driving, enabling unified modeling of static infrastructure and dynamic agents. In practice, autonomous vehicles may repeatedly traverse identical geographic locations under varying environmental conditions, such as weather fluctuations and illumination changes. Existing methods in 3D occupancy prediction predominantly integr…

    Submitted 18 April, 2025; originally announced April 2025.

  18. arXiv:2504.13420

    cs.RO cs.SE

    Testing the Fault-Tolerance of Multi-Sensor Fusion Perception in Autonomous Driving Systems

    Authors: Haoxiang Tian, Wenqiang Ding, Xingshuo Han, Guoquan Wu, An Guo, Junqi Zhang, Wei Chen, Jun Wei, Tianwei Zhang

    Abstract: High-level Autonomous Driving Systems (ADSs), such as Google Waymo and Baidu Apollo, typically rely on multi-sensor fusion (MSF) based approaches to perceive their surroundings. This strategy increases perception robustness by combining the respective strengths of the camera and LiDAR and directly affects the safety-critical driving decisions of autonomous vehicles (AVs). However, in real-world au…

    Submitted 17 April, 2025; originally announced April 2025.

  19. arXiv:2504.12516

    cs.CL

    BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents

    Authors: Jason Wei, Zhiqing Sun, Spencer Papay, Scott McKinney, Jeffrey Han, Isa Fulford, Hyung Won Chung, Alex Tachard Passos, William Fedus, Amelia Glaese

    Abstract: We present BrowseComp, a simple yet challenging benchmark for measuring the ability for agents to browse the web. BrowseComp comprises 1,266 questions that require persistently navigating the internet in search of hard-to-find, entangled information. Despite the difficulty of the questions, BrowseComp is simple and easy-to-use, as predicted answers are short and easily verifiable against reference…

    Submitted 16 April, 2025; originally announced April 2025.

  20. arXiv:2504.10686

    cs.CV eess.IV

    The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Hang Guo, Lei Sun, Zongwei Wu, Radu Timofte, Yawei Li, Yao Zhang, Xinning Chai, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Li Song, Hongyuan Yu, Pufan Xu, Cheng Wan, Zhijuan Huang, Peng Guo, Shuyuan Cui, Chenjun Li, Xuehai Hu, Pan Pan, Xin Zhang, Heng Zhang, Qing Luo, Linyan Jiang, et al. (122 additional authors not shown)

    Abstract: This paper presents a comprehensive review of the NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR). The challenge aimed to advance the development of deep models that optimize key computational metrics, i.e., runtime, parameters, and FLOPs, while achieving a PSNR of at least 26.90 dB on the $\operatorname{DIV2K\_LSDIR\_valid}$ dataset and 26.99 dB on the…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR2025 NTIRE Workshop, Efficient Super-Resolution Challenge Report. 50 pages

  21. arXiv:2504.10258

    cs.CV cs.MM

    XY-Cut++: Advanced Layout Ordering via Hierarchical Mask Mechanism on a Novel Benchmark

    Authors: Shuai Liu, Youmeng Li, Jizeng Wei

    Abstract: Document Reading Order Recovery is a fundamental task in document image understanding, playing a pivotal role in enhancing Retrieval-Augmented Generation (RAG) and serving as a critical preprocessing step for large language models (LLMs). Existing methods often struggle with complex layouts (e.g., multi-column newspapers), high-overhead interactions between cross-modal elements (visual regions and…

    Submitted 14 April, 2025; originally announced April 2025.

  22. arXiv:2504.09601

    cs.CV cs.LG cs.MM eess.IV physics.med-ph

    Mixture-of-Shape-Experts (MoSE): End-to-End Shape Dictionary Framework to Prompt SAM for Generalizable Medical Segmentation

    Authors: Jia Wei, Xiaoqi Zhao, Jonghye Woo, Jinsong Ouyang, Georges El Fakhri, Qingyu Chen, Xiaofeng Liu

    Abstract: Single domain generalization (SDG) has recently attracted growing attention in medical image segmentation. One promising strategy for SDG is to leverage consistent semantic shape priors across different imaging protocols, scanner vendors, and clinical sites. However, existing dictionary learning methods that encode shape priors often suffer from limited representational power with a small set of o…

    Submitted 13 April, 2025; originally announced April 2025.

    Comments: Accepted to CVPR 2025 workshop

  23. arXiv:2504.09566

    cs.CL

    Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution

    Authors: Chenghao Li, Chaoning Zhang, Yi Lu, Jiaquan Zhang, Qigan Sun, Xudong Wang, Jiwei Wei, Guoqing Wang, Yang Yang, Heng Tao Shen

    Abstract: Chain-of-Thought (CoT) prompting enhances the reasoning of large language models (LLMs) by decomposing problems into sequential steps, mimicking human logic and reducing errors. However, complex tasks with vast solution spaces and vague constraints often exceed the capacity of a single reasoning chain. Inspired by Minimal Free Resolution (MFR) in commutative algebra and algebraic geometry, we prop…

    Submitted 16 April, 2025; v1 submitted 13 April, 2025; originally announced April 2025.

  24. arXiv:2504.08154

    cs.CV

    Investigating Vision-Language Model for Point Cloud-based Vehicle Classification

    Authors: Yiqiao Li, Jie Wei, Camille Kamga

    Abstract: Heavy-duty trucks pose significant safety challenges due to their large size and limited maneuverability compared to passenger vehicles. A deeper understanding of truck characteristics is essential for enhancing the safety perspective of cooperative autonomous driving. Traditional LiDAR-based truck classification methods rely on extensive manual annotations, which makes them labor-intensive and co…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: 5 pages, 3 figures, 1 table, CVPR DriveX workshop

  25. arXiv:2504.07866

    cs.CL cs.AI

    Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs

    Authors: Yichun Yin, Wenyong Huang, Kaikai Song, Yehui Tang, Xueyu Wu, Wei Guo, Peng Guo, Yaoyuan Wang, Xiaojun Meng, Yasheng Wang, Dong Li, Can Chen, Dandan Tu, Yin Li, Fisher Yu, Ruiming Tang, Yunhe Wang, Baojun Wang, Bin Wang, Bo Wang, Boxiao Liu, Changzheng Zhang, Duyu Tang, Fei Mi, Hui Jin, et al. (27 additional authors not shown)

    Abstract: We present Pangu Ultra, a Large Language Model (LLM) with 135 billion parameters and dense Transformer modules trained on Ascend Neural Processing Units (NPUs). Although the field of LLM has been witnessing unprecedented advances in pushing the scale and capability of LLM in recent years, training such a large-scale model still involves significant optimization and system challenges. To stabilize…

    Submitted 11 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

    Comments: fix conflicts of LaTeX packages

  26. arXiv:2504.04753

    cs.CV

    CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images

    Authors: Cheng Chen, Jiacheng Wei, Tianrun Chen, Chi Zhang, Xiaofeng Yang, Shangzhan Zhang, Bingchen Yang, Chuan-Sheng Foo, Guosheng Lin, Qixing Huang, Fayao Liu

    Abstract: Creating CAD digital twins from the physical world is crucial for manufacturing, design, and simulation. However, current methods typically rely on costly 3D scanning with labor-intensive post-processing. To provide a user-friendly design process, we explore the problem of reverse engineering from unconstrained real-world CAD images that can be easily captured by users of all experiences. However,…

    Submitted 10 April, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

    Comments: Accepted to CVPR2025

  27. arXiv:2503.23348

    cs.RO cs.CV

    Physically Ground Commonsense Knowledge for Articulated Object Manipulation with Analytic Concepts

    Authors: Jianhua Sun, Jiude Wei, Yuxuan Li, Cewu Lu

    Abstract: We humans rely on a wide range of commonsense knowledge to interact with an extensive number and categories of objects in the physical world. Likewise, such commonsense knowledge is also crucial for robots to successfully develop generalized object manipulation skills. While recent advancements in Large Language Models (LLM) have showcased their impressive capabilities in acquiring commonsense know…

    Submitted 30 March, 2025; originally announced March 2025.

  28. arXiv:2503.23273

    cs.DS

    Improved algorithms for single machine serial-batch scheduling to minimize makespan and maximum cost

    Authors: Shuguang Li, Zhenxin Wen, Jing Wei

    Abstract: This paper studies the bicriteria problem of scheduling $n$ jobs on a serial-batch machine to minimize makespan and maximum cost simultaneously. A serial-batch machine can process up to $b$ jobs as a batch, where $b$ is known as the batch capacity. When a new batch starts, a constant setup time is required for the machine. Within each batch, the jobs are processed sequentially, and thus the proces…

    Submitted 29 March, 2025; originally announced March 2025.

  29. arXiv:2503.18746

    cs.CV

    Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition

    Authors: Yifei Zhang, Chang Liu, Jin Wei, Xiaomeng Yang, Yu Zhou, Can Ma, Xiangyang Ji

    Abstract: Text images are unique in their dual nature, encompassing both visual and linguistic information. The visual component encompasses structural and appearance-based features, while the linguistic dimension incorporates contextual and semantic elements. In scenarios with degraded visual quality, linguistic patterns serve as crucial supplements for comprehension, highlighting the necessity of integrat…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: CVPR 2025

  30. arXiv:2503.16071

    cs.CL cs.AI cs.IR

    Tuning LLMs by RAG Principles: Towards LLM-native Memory

    Authors: Jiale Wei, Shuchi Wu, Ruochen Liu, Xiang Ying, Jingbo Shang, Fangbo Tao

    Abstract: Memory, additional information beyond the training of large language models (LLMs), is crucial to various real-world applications, such as personal assistants. The two mainstream solutions to incorporate memory into the generation process are long-context LLMs and retrieval-augmented generation (RAG). In this paper, we first systematically compare these two types of solutions on three renovated/new…

    Submitted 20 March, 2025; originally announced March 2025.

  31. arXiv:2503.13252

    cs.RO eess.SP

    Digital Beamforming Enhanced Radar Odometry

    Authors: Jingqi Jiang, Shida Xu, Kaicheng Zhang, Jiyuan Wei, Jingyang Wang, Sen Wang

    Abstract: Radar has become an essential sensor for autonomous navigation, especially in challenging environments where camera and LiDAR sensors fail. 4D single-chip millimeter-wave radar systems, in particular, have drawn increasing attention thanks to their ability to provide spatial and Doppler information with low hardware cost and power consumption. However, most single-chip radar systems using traditio…

    Submitted 17 March, 2025; originally announced March 2025.

  32. arXiv:2503.11004

    cs.CV

    VA-AR: Learning Velocity-Aware Action Representations with Mixture of Window Attention

    Authors: Jiangning Wei, Lixiong Qin, Bo Yu, Tianjian Zou, Chuhan Yan, Dandan Xiao, Yang Yu, Lan Yang, Ke Li, Jun Liu

    Abstract: Action recognition is a crucial task in artificial intelligence, with significant implications across various domains. We initially perform a comprehensive analysis of seven prominent action recognition methods across five widely-used datasets. This analysis reveals a critical, yet previously overlooked, observation: as the velocity of actions increases, the performance of these methods variably d…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: Accepted by AAAI 2025

  33. arXiv:2503.10907

    cs.MA cs.AI cs.CY

    H2-MARL: Multi-Agent Reinforcement Learning for Pareto Optimality in Hospital Capacity Strain and Human Mobility during Epidemic

    Authors: Xueting Luo, Hao Deng, Jihong Yang, Yao Shen, Huanhuan Guo, Zhiyuan Sun, Mingqing Liu, Jiming Wei, Shengjie Zhao

    Abstract: The necessity of achieving an effective balance between minimizing the losses associated with restricting human mobility and ensuring hospital capacity has gained significant attention in the aftermath of COVID-19. Reinforcement learning (RL)-based strategies for human mobility management have recently advanced in addressing the dynamic evolution of cities and epidemics; however, they still face c…

    Submitted 13 March, 2025; originally announced March 2025.

  34. arXiv:2503.10084

    cs.CL

    Why Does Your CoT Prompt (Not) Work? Theoretical Analysis of Prompt Space Complexity, its Interaction with Answer Space During CoT Reasoning with LLMs: A Recurrent Perspective

    Authors: Xiang Zhang, Juntai Cao, Jiaqi Wei, Chenyu You, Dujian Ding

    Abstract: Despite the remarkable successes of Large Language Models (LLMs), their fundamental Transformer architecture possesses inherent theoretical limitations that restrict their capability to handle reasoning tasks with increasing computational complexity. Chain-of-Thought (CoT) prompting has emerged as a practical solution, supported by several theoretical studies. However, current CoT-based methods (i…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: arXiv admin note: substantial text overlap with arXiv:2410.14198

  35. arXiv:2503.08163

    cs.LG cs.AI cs.CE

    XAI4Extremes: An interpretable machine learning framework for understanding extreme-weather precursors under climate change

    Authors: Jiawen Wei, Aniruddha Bora, Vivek Oommen, Chenyu Dong, Juntao Yang, Jeff Adie, Chen Chen, Simon See, George Karniadakis, Gianmarco Mengaldo

    Abstract: Extreme weather events are increasing in frequency and intensity due to climate change. This, in turn, is exacting a significant toll in communities worldwide. While prediction skills are increasing with advances in numerical weather prediction and artificial intelligence tools, extreme weather still presents challenges. More specifically, identifying the precursors of such extreme weather events a…

    Submitted 11 March, 2025; originally announced March 2025.

  36. arXiv:2503.08102

    cs.AI cs.CL cs.HC

    AI-native Memory 2.0: Second Me

    Authors: Jiale Wei, Xiang Ying, Tao Gao, Fangyi Bao, Felix Tao, Jingbo Shang

    Abstract: Human interaction with the external world fundamentally involves the exchange of personal memory, whether with other individuals, websites, applications, or, in the future, AI agents. A significant portion of this interaction is redundant, requiring users to repeatedly provide the same information across different contexts. Existing solutions, such as browser-stored credentials, autofill mechanism…

    Submitted 12 March, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

  37. arXiv:2503.08040

    cs.LG

    Accurate INT8 Training Through Dynamic Block-Level Fallback

    Authors: Pengle Zhang, Jia Wei, Jintao Zhang, Jun Zhu, Jianfei Chen

    Abstract: Transformer models have achieved remarkable success across various AI applications but face significant training costs. Low-bit training, such as INT8 training, can leverage computational units with higher throughput, and has already demonstrated its effectiveness on GPT2 models with block-level quantization. However, it struggles with modern Transformer variants incorporating GLU units. This is b…

    Submitted 11 March, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

  38. arXiv:2503.04170

    cs.ET cs.AI

    Towards Intelligent Transportation with Pedestrians and Vehicles In-the-Loop: A Surveillance Video-Assisted Federated Digital Twin Framework

    Authors: Xiaolong Li, Jianhao Wei, Haidong Wang, Li Dong, Ruoyang Chen, Changyan Yi, Jun Cai, Dusit Niyato, Xuemin Shen

    Abstract: In intelligent transportation systems (ITSs), incorporating pedestrians and vehicles in-the-loop is crucial for developing realistic and safe traffic management solutions. However, existing work falls short of simulating complex real-world ITS scenarios, primarily due to the lack of a digital twin implementation framework for characterizing interactions between pedestrians and vehicles at different loc…

    Submitted 6 March, 2025; originally announced March 2025.

  39. arXiv:2503.04036

    cs.CR cs.CL cs.LG

    Robust Data Watermarking in Language Models by Injecting Fictitious Knowledge

    Authors: Xinyue Cui, Johnny Tian-Zheng Wei, Swabha Swayamdipta, Robin Jia

    Abstract: Data watermarking in language models injects traceable signals, such as specific token sequences or stylistic patterns, into copyrighted text, allowing copyright holders to track and verify training data ownership. Previous data watermarking techniques primarily focus on effective memorization after pretraining, while overlooking challenges that arise in other stages of the LLM pipeline, such as t…

    Submitted 11 March, 2025; v1 submitted 5 March, 2025; originally announced March 2025.

  40. arXiv:2502.19777

    cs.CV

    InPK: Infusing Prior Knowledge into Prompt for Vision-Language Models

    Authors: Shuchang Zhou, Jiwei Wei, Shiyuan He, Yuyang Zhou, Chaoning Zhang, Jie Zou, Ning Xie, Yang Yang

    Abstract: Prompt tuning has become a popular strategy for adapting Vision-Language Models (VLMs) to zero/few-shot visual recognition tasks. Some prompting techniques introduce prior knowledge due to its richness, but when learnable tokens are randomly initialized and disconnected from prior knowledge, they tend to overfit on seen classes and struggle with domain shifts for unseen ones. To address this issue…

    Submitted 31 March, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

  41. arXiv:2502.18137

    cs.LG cs.AI cs.CV cs.PF

    SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference

    Authors: Jintao Zhang, Chendong Xiang, Haofeng Huang, Jia Wei, Haocheng Xi, Jun Zhu, Jianfei Chen

    Abstract: An efficient attention implementation is essential for large models due to its quadratic time complexity. Fortunately, attention commonly exhibits sparsity, i.e., many values in the attention map are near zero, allowing for the omission of corresponding computations. Many studies have utilized the sparse pattern to accelerate attention. However, most existing works focus on optimizing attention wi…

    Submitted 1 May, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

  42. arXiv:2502.17899  [pdf, other]

    cs.CL

    Can Large Language Models Identify Implicit Suicidal Ideation? An Empirical Evaluation

    Authors: Tong Li, Shu Yang, Junchao Wu, Jiyao Wei, Lijie Hu, Mengdi Li, Derek F. Wong, Joshua R. Oltmanns, Di Wang

    Abstract: We present a comprehensive evaluation framework for assessing Large Language Models' (LLMs) capabilities in suicide prevention, focusing on two critical aspects: the Identification of Implicit Suicidal ideation (IIS) and the Provision of Appropriate Supportive responses (PAS). We introduce \ourdata, a novel dataset of 1,308 test cases built upon psychological frameworks including D/S-IAT and Negat…

    Submitted 25 February, 2025; originally announced February 2025.

  43. arXiv:2502.16290  [pdf, other]

    cs.CY cs.CL

    Interrogating LLM design under a fair learning doctrine

    Authors: Johnny Tian-Zheng Wei, Maggie Wang, Ameya Godbole, Jonathan H. Choi, Robin Jia

    Abstract: The current discourse on large language models (LLMs) and copyright largely takes a "behavioral" perspective, focusing on model outputs and evaluating whether they are substantially similar to training data. However, substantial similarity is difficult to define algorithmically and a narrow focus on model outputs is insufficient to address all copyright risks. In this interdisciplinary work, we ta…

    Submitted 22 February, 2025; originally announced February 2025.

  44. arXiv:2502.13383  [pdf, other]

    cs.CL cs.CV cs.LG

    MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification

    Authors: Linzhuang Sun, Hao Liang, Jingxuan Wei, Bihui Yu, Tianpeng Li, Fan Yang, Zenan Zhou, Wentao Zhang

    Abstract: According to the Test-Time Scaling, the integration of External Slow-Thinking with the Verify mechanism has been demonstrated to enhance multi-round reasoning in large language models (LLMs). However, in the multimodal (MM) domain, there is still a lack of a strong MM-Verifier. In this paper, we introduce MM-Verifier and MM-Reasoner to enhance multimodal reasoning through longer inference and more…

    Submitted 18 February, 2025; originally announced February 2025.

  45. arXiv:2502.12548  [pdf, other]

    cs.LG cs.AI

    Improving the Stability of GNN Force Field Models by Reducing Feature Correlation

    Authors: Yujie Zeng, Wenlong He, Ihor Vasyltsov, Jiaxin Wei, Ying Zhang, Lin Chen, Yuehua Dai

    Abstract: Recently, Graph Neural Network based Force Field (GNNFF) models have been widely used in Molecular Dynamics (MD) simulation, which is one of the most cost-effective means in semiconductor material research. However, even though such models provide high accuracy in energy and force Mean Absolute Error (MAE) over trained (in-distribution) datasets, they often become unstable during long-time MD simulation when u…

    Submitted 18 February, 2025; originally announced February 2025.

  46. arXiv:2502.11880  [pdf, other]

    cs.LG cs.AI cs.CL cs.DC

    Bitnet.cpp: Efficient Edge Inference for Ternary LLMs

    Authors: Jinheng Wang, Hansong Zhou, Ting Song, Shijie Cao, Yan Xia, Ting Cao, Jianyu Wei, Shuming Ma, Hongyu Wang, Furu Wei

    Abstract: The advent of 1-bit large language models (LLMs), led by BitNet b1.58, has spurred interest in ternary LLMs. Despite this, research and practical applications focusing on efficient edge inference for ternary LLMs remain scarce. To bridge this gap, we introduce Bitnet.cpp, an inference system optimized for BitNet b1.58 and ternary LLMs. Given that mixed-precision matrix multiplication (mpGEMM) cons…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: 18 pages, 11 figures

  47. arXiv:2502.11134  [pdf, other]

    cs.AI astro-ph.IM

    Solving Online Resource-Constrained Scheduling for Follow-Up Observation in Astronomy: a Reinforcement Learning Approach

    Authors: Yajie Zhang, Ce Yu, Chao Sun, Jizeng Wei, Junhan Ju, Shanjiang Tang

    Abstract: In the astronomical observation field, determining the allocation of observation resources of the telescope array and planning follow-up observations for targets of opportunity (ToOs) are indispensable components of astronomical scientific discovery. This problem is computationally challenging, given the online observation setting and the abundance of time-varying factors that can affect whether a…

    Submitted 16 February, 2025; originally announced February 2025.

  48. arXiv:2502.06501  [pdf, other]

    cs.CV

    Learning Clustering-based Prototypes for Compositional Zero-shot Learning

    Authors: Hongyu Qu, Jianan Wei, Xiangbo Shu, Wenguan Wang

    Abstract: Learning primitive (i.e., attribute and object) concepts from seen compositions is the primary challenge of Compositional Zero-Shot Learning (CZSL). Existing CZSL solutions typically rely on oversimplified data assumptions, e.g., modeling each primitive with a single centroid primitive representation, ignoring the natural diversities of the attribute (resp. object) when coupled with different obje…

    Submitted 22 February, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

    Comments: Accepted to ICLR 2025; Project page: https://github.com/quhongyu/ClusPro

  49. arXiv:2502.01968  [pdf, other]

    cs.CL cs.AI

    Token Cleaning: Fine-Grained Data Selection for LLM Supervised Fine-Tuning

    Authors: Jinlong Pang, Na Di, Zhaowei Zhu, Jiaheng Wei, Hao Cheng, Chen Qian, Yang Liu

    Abstract: Recent studies show that in supervised fine-tuning (SFT) of large language models (LLMs), data quality matters more than quantity. While most data cleaning methods concentrate on filtering entire samples, the quality of individual tokens within a sample can vary significantly. After pre-training, even in high-quality samples, patterns or phrases that are not task-related can be redundant or uninfo…

    Submitted 3 February, 2025; originally announced February 2025.

  50. arXiv:2502.01325  [pdf, other]

    cs.HC

    The Homework Wars: Exploring Emotions, Behaviours, and Conflicts in Parent-Child Homework Interactions

    Authors: Nan Gao, Yibin Liu, Xin Tang, Yanyan Liu, Chun Yu, Yun Huang, Yuntao Wang, Flora D. Salim, Xuhai Orson Xu, Jun Wei, Yuanchun Shi

    Abstract: Parental involvement in homework is a crucial aspect of family education, but it often leads to emotional strain and conflicts that can severely impact family well-being. This paper presents findings from a 4-week in situ study involving 78 families in China, where we collected and analyzed 602 valid audio recordings (totalling 475 hours) and daily surveys. Leveraging large language models (LLMs)…

    Submitted 4 February, 2025; v1 submitted 3 February, 2025; originally announced February 2025.