Showing 1–50 of 1,064 results for author: Mao, S

  1. arXiv:2505.09343  [pdf, ps, other]

    cs.DC cs.AI cs.AR

    Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

    Authors: Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Huazuo Gao, Jiashi Li, Liyue Zhang, Panpan Huang, Shangyan Zhou, Shirong Ma, Wenfeng Liang, Ying He, Yuqing Wang, Yuxuan Liu, Y. X. Wei

    Abstract: The rapid scaling of large language models (LLMs) has unveiled critical limitations in current hardware architectures, including constraints in memory capacity, computational efficiency, and interconnection bandwidth. DeepSeek-V3, trained on 2,048 NVIDIA H800 GPUs, demonstrates how hardware-aware model co-design can effectively address these challenges, enabling cost-efficient training and inferen…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive version will appear as part of the Industry Track in Proceedings of the 52nd Annual International Symposium on Computer Architecture (ISCA '25)

  2. arXiv:2505.07003  [pdf, ps, other]

    cs.CV

    CMD: Controllable Multiview Diffusion for 3D Editing and Progressive Generation

    Authors: Peng Li, Suizhi Ma, Jialiang Chen, Yuan Liu, Chongyi Zhang, Wei Xue, Wenhan Luo, Alla Sheffer, Wenping Wang, Yike Guo

    Abstract: Recently, 3D generation methods have shown their powerful ability to automate 3D model creation. However, most 3D generation methods only rely on an input image or a text prompt to generate a 3D model, which lacks the control of each component of the generated 3D model. Any modifications of the input image lead to an entire regeneration of the 3D models. In this paper, we introduce a new method ca…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: Siggraph 2025

  3. arXiv:2505.06637  [pdf, other]

    cs.AI

    Exploring Multimodal Foundation AI and Expert-in-the-Loop for Sustainable Management of Wild Salmon Fisheries in Indigenous Rivers

    Authors: Chi Xu, Yili Jin, Sami Ma, Rongsheng Qian, Hao Fang, Jiangchuan Liu, Xue Liu, Edith C. H. Ngai, William I. Atlas, Katrina M. Connors, Mark A. Spoljaric

    Abstract: Wild salmon are essential to the ecological, economic, and cultural sustainability of the North Pacific Rim. Yet climate variability, habitat loss, and data limitations in remote ecosystems that lack basic infrastructure support pose significant challenges to effective fisheries management. This project explores the integration of multimodal foundation AI and expert-in-the-loop frameworks to enhan…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: 10 pages, accepted by IJCAI 2025, AI and Social Good Track

  4. arXiv:2505.05794  [pdf, ps, other]

    cs.AR cs.AI cs.NE

    What Is Next for LLMs? Next-Generation AI Computing Hardware Using Photonic Chips

    Authors: Renjie Li, Wenjie Wei, Qi Xin, Xiaoli Liu, Sixuan Mao, Erik Ma, Zijian Chen, Malu Zhang, Haizhou Li, Zhaoyu Zhang

    Abstract: Large language models (LLMs) are rapidly pushing the limits of contemporary computing hardware. For example, training GPT-3 has been estimated to consume around 1300 MWh of electricity, and projections suggest future models may require city-scale (gigawatt) power budgets. These demands motivate exploration of computing paradigms beyond conventional von Neumann architectures. This review surveys em…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: 36 pages, 22 figures

  5. arXiv:2505.05595  [pdf, ps, other]

    q-fin.TR cs.AI cs.CE cs.LG

    Trading Under Uncertainty: A Distribution-Based Strategy for Futures Markets Using FutureQuant Transformer

    Authors: Wenhao Guo, Yuda Wang, Zeqiao Huang, Changjiang Zhang, Shumin Ma

    Abstract: In the complex landscape of traditional futures trading, where vast data and variables like real-time Limit Order Books (LOB) complicate price predictions, we introduce the FutureQuant Transformer model, leveraging attention mechanisms to navigate these challenges. Unlike conventional models focused on point predictions, the FutureQuant model excels in forecasting the range and volatility of futur…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 16 pages, 12 figures

  6. arXiv:2505.03261  [pdf, other]

    cs.CV eess.IV

    DiffVQA: Video Quality Assessment Using Diffusion Feature Extractor

    Authors: Wei-Ting Chen, Yu-Jiet Vong, Yi-Tsung Lee, Sy-Yen Kuo, Qiang Gao, Sizhuo Ma, Jian Wang

    Abstract: Video Quality Assessment (VQA) aims to evaluate video quality based on perceptual distortions and human preferences. Despite the promising performance of existing methods using Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), they often struggle to align closely with human perceptions, particularly in diverse real-world scenarios. This challenge is exacerbated by the limited sc…

    Submitted 6 May, 2025; originally announced May 2025.

  7. arXiv:2505.01406  [pdf, other]

    cs.CV cs.CR cs.LG

    VIDSTAMP: A Temporally-Aware Watermark for Ownership and Integrity in Video Diffusion Models

    Authors: Mohammadreza Teymoorianfard, Shiqing Ma, Amir Houmansadr

    Abstract: The rapid rise of video diffusion models has enabled the generation of highly realistic and temporally coherent videos, raising critical concerns about content authenticity, provenance, and misuse. Existing watermarking approaches, whether passive, post-hoc, or adapted from image-based techniques, often struggle to withstand video-specific manipulations such as frame insertion, dropping, or reorde…

    Submitted 2 May, 2025; originally announced May 2025.

  8. arXiv:2505.00979  [pdf, other]

    cs.CL cs.AI

    Synthesize-on-Graph: Knowledgeable Synthetic Data Generation for Continue Pre-training of Large Language Models

    Authors: Xuhui Jiang, Shengjie Ma, Chengjin Xu, Cehao Yang, Liyu Zhang, Jian Guo

    Abstract: Large Language Models (LLMs) have achieved remarkable success but remain data-inefficient, especially when learning from small, specialized corpora with limited and proprietary data. Existing synthetic data generation methods for continue pre-training focus on intra-document content and overlook cross-document knowledge associations, limiting content diversity and depth. We propose Synthetic-on-Gr…

    Submitted 1 May, 2025; originally announced May 2025.

  9. arXiv:2505.00687  [pdf, ps, other]

    eess.IV cs.CV

    GuideSR: Rethinking Guidance for One-Step High-Fidelity Diffusion-Based Super-Resolution

    Authors: Aditya Arora, Zhengzhong Tu, Yufei Wang, Ruizheng Bai, Jian Wang, Sizhuo Ma

    Abstract: In this paper, we propose GuideSR, a novel single-step diffusion-based image super-resolution (SR) model specifically designed to enhance image fidelity. Existing diffusion-based SR approaches typically adapt pre-trained generative models to image restoration tasks by adding extra conditioning on a VAE-downsampled representation of the degraded input, which often compromises structural fidelity. G…

    Submitted 1 May, 2025; originally announced May 2025.

  10. arXiv:2505.00063  [pdf, other]

    cs.CL cs.CV

    GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling

    Authors: Siqi Li, Yufan Shen, Xiangnan Chen, Jiayi Chen, Hengwei Ju, Haodong Duan, Song Mao, Hongbin Zhou, Bo Zhang, Pinlong Cai, Licheng Wen, Botian Shi, Yong Liu, Xinyu Cai, Yu Qiao

    Abstract: The rapid advancement of multimodal large language models (MLLMs) has profoundly impacted the document domain, creating a wide array of application scenarios. This progress highlights the need for a comprehensive benchmark to evaluate these models' capabilities across various document-specific tasks. However, existing benchmarks often fail to locate specific model weaknesses or guide systematic im…

    Submitted 30 April, 2025; originally announced May 2025.

  11. arXiv:2504.21801  [pdf, other]

    cs.CL cs.AI

    DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition

    Authors: Z. Z. Ren, Zhihong Shao, Junxiao Song, Huajian Xin, Haocheng Wang, Wanjia Zhao, Liyue Zhang, Zhe Fu, Qihao Zhu, Dejian Yang, Z. F. Wu, Zhibin Gou, Shirong Ma, Hongxuan Tang, Yuxuan Liu, Wenjun Gao, Daya Guo, Chong Ruan

    Abstract: We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a ch…

    Submitted 30 April, 2025; originally announced April 2025.

  12. arXiv:2504.21055  [pdf, ps, other]

    cs.LG cs.AI

    Modeling and Performance Analysis for Semantic Communications Based on Empirical Results

    Authors: Shuai Ma, Bin Shen, Chuanhui Zhang, Youlong Wu, Hang Li, Shiyin Li, Guangming Shi, Naofal Al-Dhahir

    Abstract: Due to the black-box characteristics of deep learning based semantic encoders and decoders, finding a tractable method for the performance analysis of semantic communications is a challenging problem. In this paper, we propose an Alpha-Beta-Gamma (ABG) formula to model the relationship between the end-to-end measurement and SNR, which can be applied for both image reconstruction tasks and inferenc…

    Submitted 29 April, 2025; originally announced April 2025.

  13. arXiv:2504.20674  [pdf, other]

    cs.CE

    DiffLiB: High-fidelity differentiable modeling of lithium-ion batteries and efficient gradient-based parameter identification

    Authors: Weipeng Xu, Kaiqi Yang, Yuzhi Zhang, Shichao Sun, Sheng Mao, Tianju Xue

    Abstract: The physics-based Doyle-Fuller-Newman (DFN) model, widely adopted for its precise electrochemical modeling, stands out among various simulation models of lithium-ion batteries (LIBs). Although the DFN model is powerful in forward predictive analysis, the inverse identification of its model parameters has remained a long-standing challenge. The numerous unknown parameters associated with the nonlin…

    Submitted 29 April, 2025; originally announced April 2025.

  14. arXiv:2504.20412  [pdf, other]

    cs.SE cs.AI cs.OS

    CrashFixer: A crash resolution agent for the Linux kernel

    Authors: Alex Mathai, Chenxi Huang, Suwei Ma, Jihwan Kim, Hailie Mitchell, Aleksandr Nogikh, Petros Maniatis, Franjo Ivančić, Junfeng Yang, Baishakhi Ray

    Abstract: Code large language models (LLMs) have shown impressive capabilities on a multitude of software engineering tasks. In particular, they have demonstrated remarkable utility in the task of code repair. However, common benchmarks used to evaluate the performance of code LLMs are often limited to small-scale settings. In this work, we build upon kGym, which shares a benchmark for system-level Linux ke…

    Submitted 13 May, 2025; v1 submitted 29 April, 2025; originally announced April 2025.

  15. arXiv:2504.19660  [pdf, other]

    cs.NI eess.SP

    Decentralization of Generative AI via Mixture of Experts for Wireless Networks: A Comprehensive Survey

    Authors: Yunting Xu, Jiacheng Wang, Ruichen Zhang, Changyuan Zhao, Dusit Niyato, Jiawen Kang, Zehui Xiong, Bo Qian, Haibo Zhou, Shiwen Mao, Abbas Jamalipour, Xuemin Shen, Dong In Kim

    Abstract: Mixture of Experts (MoE) has emerged as a promising paradigm for scaling model capacity while preserving computational efficiency, particularly in large-scale machine learning architectures such as large language models (LLMs). Recent advances in MoE have facilitated its adoption in wireless networks to address the increasing complexity and heterogeneity of modern communication systems. This paper…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: Survey paper, 30 pages, 13 figures

  16. arXiv:2504.19399  [pdf, other]

    cs.RO

    Follow Everything: A Leader-Following and Obstacle Avoidance Framework with Goal-Aware Adaptation

    Authors: Qianyi Zhang, Shijian Ma, Boyi Liu, Jianhao Jiao, Dimitrios Kanoulas

    Abstract: Robust and flexible leader-following is a critical capability for robots to integrate into human society. While existing methods struggle to generalize to leaders of arbitrary form and often fail when the leader temporarily leaves the robot's field of view, this work introduces a unified framework addressing both challenges. First, traditional detection models are replaced with a segmentation mode…

    Submitted 12 May, 2025; v1 submitted 27 April, 2025; originally announced April 2025.

  17. arXiv:2504.18432  [pdf, other]

    cs.NI

    FlexiNS: A SmartNIC-Centric, Line-Rate and Flexible Network Stack

    Authors: Xuzheng Chen, Jie Zhang, Baolin Zhu, Xueying Zhu, Zhongqing Chen, Shu Ma, Lingjun Zhu, Chao Shi, Yin Zhang, Zeke Wang

    Abstract: As the gap between network and CPU speeds rapidly increases, the CPU-centric network stack proves inadequate due to excessive CPU and memory overhead. While hardware-offloaded network stacks alleviate these issues, they suffer from limited flexibility in both control and data planes. Offloading network stack to off-path SmartNIC seems promising to provide high flexibility; however, throughput rema…

    Submitted 25 April, 2025; originally announced April 2025.

  18. arXiv:2504.18415  [pdf, other]

    cs.CL cs.LG

    BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs

    Authors: Hongyu Wang, Shuming Ma, Furu Wei

    Abstract: Efficient deployment of 1-bit Large Language Models (LLMs) is hindered by activation outliers, which complicate quantization to low bit-widths. We introduce BitNet v2, a novel framework enabling native 4-bit activation quantization for 1-bit LLMs. To tackle outliers in attention and feed-forward network activations, we propose H-BitLinear, a module applying an online Hadamard transformation prior…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: Work in progress

  19. POET: Prompt Offset Tuning for Continual Human Action Adaptation

    Authors: Prachi Garg, Joseph K J, Vineeth N Balasubramanian, Necati Cihan Camgoz, Chengde Wan, Kenrick Kin, Weiguang Si, Shugao Ma, Fernando De La Torre

    Abstract: As extended reality (XR) is redefining how users interact with computing devices, research in human action recognition is gaining prominence. Typically, models deployed on immersive computing devices are static and limited to their default set of classes. The goal of our research is to provide users and developers with the capability to personalize their experience by adding new action classes to…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: ECCV 2024 (Oral), webpage https://humansensinglab.github.io/POET-continual-action-recognition/

    Journal ref: ECCV 2024, Lecture Notes in Computer Science, vol. 15122, Springer, 2025, pp. 436-455

  20. arXiv:2504.16146  [pdf, other]

    eess.SP cs.IT cs.NI

    Aerial Active STAR-RIS-assisted Satellite-Terrestrial Covert Communications

    Authors: Chuang Zhang, Geng Sun, Jiahui Li, Jiacheng Wang, Ruichen Zhang, Dusit Niyato, Shiwen Mao, Tony Q. S. Quek

    Abstract: An integration of satellites and terrestrial networks is crucial for enhancing performance of next generation communication systems. However, the networks are hindered by the long-distance path loss and security risks in dense urban environments. In this work, we propose a satellite-terrestrial covert communication system assisted by the aerial active simultaneous transmitting and reflecting recon…

    Submitted 22 April, 2025; originally announced April 2025.

  21. arXiv:2504.15622  [pdf, other]

    cs.CR

    Exploring the Role of Large Language Models in Cybersecurity: A Systematic Survey

    Authors: Shuang Tian, Tao Zhang, Jiqiang Liu, Jiacheng Wang, Xuangou Wu, Xiaoqiang Zhu, Ruichen Zhang, Weiting Zhang, Zhenhui Yuan, Shiwen Mao, Dong In Kim

    Abstract: With the rapid development of technology and the acceleration of digitalisation, the frequency and complexity of cyber security threats are increasing. Traditional cybersecurity approaches, often based on static rules and predefined scenarios, are struggling to adapt to the rapidly evolving nature of modern cyberattacks. There is an urgent need for more adaptive and intelligent defence strategies.…

    Submitted 28 April, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

    Comments: 20 pages, 3 figures

  22. arXiv:2504.15037  [pdf, other]

    cs.LG

    A Call for New Recipes to Enhance Spatial Reasoning in MLLMs

    Authors: Huanyu Zhang, Chengzu Li, Wenshan Wu, Shaoguang Mao, Yan Xia, Ivan Vulić, Zhang Zhang, Liang Wang, Tieniu Tan, Furu Wei

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated impressive performance in general vision-language tasks. However, recent studies have exposed critical limitations in their spatial reasoning capabilities. This deficiency in spatial reasoning significantly constrains MLLMs' ability to interact effectively with the physical world, thereby limiting their broader applications. We argue that…

    Submitted 21 April, 2025; originally announced April 2025.

  23. arXiv:2504.14587  [pdf, other]

    cs.LG cs.IR

    Generative Auto-Bidding with Value-Guided Explorations

    Authors: Jingtong Gao, Yewen Li, Shuai Mao, Peng Jiang, Nan Jiang, Yejing Wang, Qingpeng Cai, Fei Pan, Peng Jiang, Kun Gai, Bo An, Xiangyu Zhao

    Abstract: Auto-bidding, with its strong capability to optimize bidding decisions within dynamic and competitive online environments, has become a pivotal strategy for advertising platforms. Existing approaches typically employ rule-based strategies or Reinforcement Learning (RL) techniques. However, rule-based strategies lack the flexibility to adapt to time-varying market conditions, and RL-based methods s…

    Submitted 25 April, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

  24. arXiv:2504.13915  [pdf, other]

    cs.CV

    Memory-efficient Streaming VideoLLMs for Real-time Procedural Video Understanding

    Authors: Dibyadip Chatterjee, Edoardo Remelli, Yale Song, Bugra Tekin, Abhay Mittal, Bharat Bhatnagar, Necati Cihan Camgöz, Shreyas Hampali, Eric Sauser, Shugao Ma, Angela Yao, Fadime Sener

    Abstract: We introduce ProVideLLM, an end-to-end framework for real-time procedural video understanding. ProVideLLM integrates a multimodal cache configured to store two types of tokens - verbalized text tokens, which provide compressed textual summaries of long-term observations, and visual tokens, encoded with DETR-QFormer to capture fine-grained details from short-term observations. This design reduces t…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: 13 pages, 5 figures; https://dibschat.github.io/ProVideLLM

  25. arXiv:2504.12401  [pdf, other]

    cs.CV

    NTIRE 2025 Challenge on Event-Based Image Deblurring: Methods and Results

    Authors: Lei Sun, Andrea Alfarano, Peiqi Duan, Shaolin Su, Kaiwei Wang, Boxin Shi, Radu Timofte, Danda Pani Paudel, Luc Van Gool, Qinglin Liu, Wei Yu, Xiaoqian Lv, Lu Yang, Shuigen Wang, Shengping Zhang, Xiangyang Ji, Long Bao, Yuqiang Yang, Jinao Song, Ziyi Wang, Shuang Wen, Heng Sun, Kean Liu, Mingchen Zhong, Senyan Xu, et al. (63 additional authors not shown)

    Abstract: This paper presents an overview of NTIRE 2025 the First Challenge on Event-Based Image Deblurring, detailing the proposed methodologies and corresponding results. The primary goal of the challenge is to design an event-based method that achieves high-quality image deblurring, with performance quantitatively assessed using Peak Signal-to-Noise Ratio (PSNR). Notably, there are no restrictions on com…

    Submitted 16 April, 2025; originally announced April 2025.

  26. arXiv:2504.12285  [pdf, other]

    cs.CL cs.LG

    BitNet b1.58 2B4T Technical Report

    Authors: Shuming Ma, Hongyu Wang, Shaohan Huang, Xingxing Zhang, Ying Hu, Ting Song, Yan Xia, Furu Wei

    Abstract: We introduce BitNet b1.58 2B4T, the first open-source, native 1-bit Large Language Model (LLM) at the 2-billion parameter scale. Trained on a corpus of 4 trillion tokens, the model has been rigorously evaluated across benchmarks covering language understanding, mathematical reasoning, coding proficiency, and conversational ability. Our results demonstrate that BitNet b1.58 2B4T achieves performanc…

    Submitted 24 April, 2025; v1 submitted 16 April, 2025; originally announced April 2025.

    Comments: Work in progress

  27. arXiv:2504.11588  [pdf, other]

    cs.CV cs.AI

    Deep Learning Approaches for Medical Imaging Under Varying Degrees of Label Availability: A Comprehensive Survey

    Authors: Siteng Ma, Honghui Du, Yu An, Jing Wang, Qinqin Wang, Haochang Wu, Aonghus Lawlor, Ruihai Dong

    Abstract: Deep learning has achieved significant breakthroughs in medical imaging, but these advancements are often dependent on large, well-annotated datasets. However, obtaining such datasets poses a significant challenge, as it requires time-consuming and labor-intensive annotations from medical experts. Consequently, there is growing interest in learning paradigms such as incomplete, inexact, and absent…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: 33 pages, 10 figures, 8 tables. Will be submitted to Medical Image Analysis

    MSC Class: 68T07; 68T45; 92C50; 92C55 ACM Class: I.2.10; I.4.5; I.4.6; I.4.9; J.3

  28. arXiv:2504.11281  [pdf, other]

    cs.HC cs.CL cs.CR

    The Obvious Invisible Threat: LLM-Powered GUI Agents' Vulnerability to Fine-Print Injections

    Authors: Chaoran Chen, Zhiping Zhang, Bingcan Guo, Shang Ma, Ibrahim Khalilov, Simret A Gebreegziabher, Yanfang Ye, Ziang Xiao, Yaxing Yao, Tianshi Li, Toby Jia-Jun Li

    Abstract: A Large Language Model (LLM) powered GUI agent is a specialized autonomous system that performs tasks on the user's behalf according to high-level instructions. It does so by perceiving and interpreting the graphical user interfaces (GUIs) of relevant apps, often visually, inferring necessary sequences of actions, and then interacting with GUIs by executing the actions such as clicking, typing, an…

    Submitted 15 April, 2025; originally announced April 2025.

  29. arXiv:2504.10918  [pdf, other]

    cs.HC

    Adaptive Human-Agent Teaming: A Review of Empirical Studies from the Process Dynamics Perspective

    Authors: Mengyao Wang, Jiayun Wu, Shuai Ma, Nuo Li, Peng Zhang, Ning Gu, Tun Lu

    Abstract: The rapid advancement of AI, including Large Language Models, has propelled autonomous agents forward, accelerating the human-agent teaming (HAT) paradigm to leverage complementary strengths. However, HAT research remains fragmented, often focusing on isolated team development phases or specific challenges like trust calibration while overlooking the real-world need for adaptability. Addressing th…

    Submitted 15 April, 2025; originally announced April 2025.

  30. arXiv:2504.09823  [pdf, other]

    cs.IR

    RAKG:Document-level Retrieval Augmented Knowledge Graph Construction

    Authors: Hairong Zhang, Jiaheng Si, Guohang Yan, Boyuan Qi, Pinlong Cai, Song Mao, Ding Wang, Botian Shi

    Abstract: With the rise of knowledge graph based retrieval-augmented generation (RAG) techniques such as GraphRAG and Pike-RAG, the role of knowledge graphs in enhancing the reasoning capabilities of large language models (LLMs) has become increasingly prominent. However, traditional Knowledge Graph Construction (KGC) methods face challenges like complex entity disambiguation, rigid schema definition, and i…

    Submitted 13 April, 2025; originally announced April 2025.

    Comments: 9 pages, 6 figures

  31. arXiv:2504.08520  [pdf, other]

    eess.SP cs.IT

    Joint Transmit Waveform and Receive Filter Design for ISAC System with Jamming

    Authors: Yuan Shu, Chenhao Qi, Shiwen Mao

    Abstract: In this paper, to suppress jamming in the complex electromagnetic environment, we propose a joint transmit waveform and receive filter design framework for integrated sensing and communications (ISAC). By jointly optimizing the transmit waveform and receive filters, we aim at minimizing the multiuser interference (MUI), subject to the constraints of the target mainlobe, jamming mainlobe and peak s…

    Submitted 11 April, 2025; originally announced April 2025.

  32. arXiv:2504.07278  [pdf]

    cs.LG cs.AI

    A Multi-Phase Analysis of Blood Culture Stewardship: Machine Learning Prediction, Expert Recommendation Assessment, and LLM Automation

    Authors: Fatemeh Amrollahi, Nicholas Marshall, Fateme Nateghi Haredasht, Kameron C Black, Aydin Zahedivash, Manoj V Maddali, Stephen P. Ma, Amy Chang, MD Phar Stanley C Deresinski, Mary Kane Goldstein, Steven M. Asch, Niaz Banaei, Jonathan H Chen

    Abstract: Blood cultures are often over ordered without clear justification, straining healthcare resources and contributing to inappropriate antibiotic use pressures worsened by the global shortage. In study of 135483 emergency department (ED) blood culture orders, we developed machine learning (ML) models to predict the risk of bacteremia using structured electronic health record (EHR) data and provider n…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: 10 pages, 2 figures, 2 tables, conference

  33. arXiv:2504.07002  [pdf, ps, other]

    cs.CR cs.SE

    DeCoMa: Detecting and Purifying Code Dataset Watermarks through Dual Channel Code Abstraction

    Authors: Yuan Xiao, Yuchen Chen, Shiqing Ma, Haocheng Huang, Chunrong Fang, Yanwei Chen, Weisong Sun, Yunfeng Zhu, Xiaofang Zhang, Zhenyu Chen

    Abstract: Watermarking is a technique to help identify the source of data points, which can be used to help prevent the misuse of protected datasets. Existing methods on code watermarking, leveraging the idea from the backdoor research, embed stealthy triggers as watermarks. Despite their high resilience against dilution attacks and backdoor detections, the robustness has not been fully evaluated. To fill th…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: Accepted to ISSTA 2025. Code is available at https://github.com/xiaoyuanpigo/DeCoMa

  34. Signaling Human Intentions to Service Robots: Understanding the Use of Social Cues during In-Person Conversations

    Authors: Hanfang Lyu, Xiaoyu Wang, Nandi Zhang, Shuai Ma, Qian Zhu, Yuhan Luo, Fugee Tsung, Xiaojuan Ma

    Abstract: As social service robots become commonplace, it is essential for them to effectively interpret human signals, such as verbal, gesture, and eye gaze, when people need to focus on their primary tasks to minimize interruptions and distractions. Toward such a socially acceptable Human-Robot Interaction, we conducted a study ($N=24$) in an AR-simulated context of a coffee chat. Participants elicited so…

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: CHI '25

  35. arXiv:2504.04346  [pdf, other]

    cs.AI cs.SI

    Crowdsourcing-Based Knowledge Graph Construction for Drug Side Effects Using Large Language Models with an Application on Semaglutide

    Authors: Zhijie Duan, Kai Wei, Zhaoqian Xue, Jiayan Zhou, Shu Yang, Siyuan Ma, Jin Jin, Lingyao Li

    Abstract: Social media is a rich source of real-world data that captures valuable patient experience information for pharmacovigilance. However, mining data from unstructured and noisy social media content remains a challenging task. We present a systematic framework that leverages large language models (LLMs) to extract medication side effects from social media and organize them into a knowledge graph (KG)…

    Submitted 7 April, 2025; v1 submitted 5 April, 2025; originally announced April 2025.

    MSC Class: J.4

  36. ProtoGCD: Unified and Unbiased Prototype Learning for Generalized Category Discovery

    Authors: Shijie Ma, Fei Zhu, Xu-Yao Zhang, Cheng-Lin Liu

    Abstract: Generalized category discovery (GCD) is a pragmatic but underexplored problem, which requires models to automatically cluster and discover novel categories by leveraging the labeled samples from old classes. The challenge is that unlabeled data contain both old and new classes. Early works leveraging pseudo-labeling with parametric classifiers handle old and new classes separately, which brings ab…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: Accepted to IEEE TPAMI 2025

  37. arXiv:2504.02495  [pdf, other]

    cs.CL cs.AI cs.LG

    Inference-Time Scaling for Generalist Reward Modeling

    Authors: Zijun Liu, Peiyi Wang, Runxin Xu, Shirong Ma, Chong Ruan, Peng Li, Yang Liu, Yu Wu

    Abstract: Reinforcement learning (RL) has been widely adopted in post-training for large language models (LLMs) at scale. Recently, the incentivization of reasoning capabilities in LLMs from RL indicates that $\textit{proper learning methods could enable effective inference-time scalability}$. A key challenge of RL is to obtain accurate reward signals for LLMs in various domains beyond verifiable questions…

    Submitted 5 April, 2025; v1 submitted 3 April, 2025; originally announced April 2025.

    Comments: Preprint, under review. 42 pages

  38. arXiv:2504.02061  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Aligned Better, Listen Better for Audio-Visual Large Language Models

    Authors: Yuxin Guo, Shuailei Ma, Shijie Ma, Xiaoyi Bao, Chen-Wei Xie, Kecheng Zheng, Tingyu Weng, Siyang Sun, Yun Zheng, Wei Zou

    Abstract: Audio is essential for multimodal video understanding. On the one hand, video inherently contains audio, which supplies complementary information to vision. Besides, video large language models (Video-LLMs) can encounter many audio-centric settings. However, existing Video-LLMs and Audio-Visual Large Language Models (AV-LLMs) exhibit deficiencies in exploiting audio information, leading to weak un…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: Accepted to ICLR 2025

  39. arXiv:2504.00890  [pdf, other]

    stat.ML cs.LG

    Privacy-Preserving Transfer Learning for Community Detection using Locally Distributed Multiple Networks

    Authors: Xiao Guo, Xuming He, Xiangyu Chang, Shujie Ma

    Abstract: This paper develops a new spectral clustering-based method called TransNet for transfer learning in community detection of network data. Our goal is to improve the clustering performance of the target network using auxiliary source networks, which are heterogeneous, privacy-preserved, and locally stored across various sources. The edges of each locally stored network are perturbed using the random…

    Submitted 1 April, 2025; originally announced April 2025.

  40. arXiv:2503.24229  [pdf, other]

    cs.CV

    Pre-training with 3D Synthetic Data: Learning 3D Point Cloud Instance Segmentation from 3D Synthetic Scenes

    Authors: Daichi Otsuka, Shinichi Mae, Ryosuke Yamada, Hirokatsu Kataoka

    Abstract: In the recent years, the research community has witnessed growing use of 3D point cloud data for the high applicability in various real-world applications. By means of 3D point cloud, this modality enables to consider the actual size and spatial understanding. The applied fields include mechanical control of robots, vehicles, or other real-world systems. Along this line, we would like to improve 3…

    Submitted 31 March, 2025; originally announced March 2025.

  41. arXiv:2503.23290  [pdf, other]

    cs.NI

    Efficient Twin Migration in Vehicular Metaverses: Multi-Agent Split Deep Reinforcement Learning with Spatio-Temporal Trajectory Generation

    Authors: Junlong Chen, Jiawen Kang, Minrui Xu, Fan Wu, Hongliang Zhang, Huawei Huang, Dusit Niyato, Shiwen Mao

    Abstract: Vehicle Twins (VTs) as digital representations of vehicles can provide users with immersive experiences in vehicular metaverse applications, e.g., Augmented Reality (AR) navigation and embodied intelligence. VT migration is an effective way that migrates the VT when the locations of physical entities keep changing to maintain seamless immersive VT services. However, an efficient VT migration is ch…

    Submitted 29 March, 2025; originally announced March 2025.

  42. arXiv:2503.21297  [pdf, other]

    cs.AR cs.DC

    MLDSE: Scaling Design Space Exploration Infrastructure for Multi-Level Hardware

    Authors: Huanyu Qu, Weihao Zhang, Junfeng Lin, Songchen Ma, Hongyi Li, Luping Shi, Chengzhong Xu

    Abstract: To efficiently support large-scale NNs, multi-level hardware, leveraging advanced integration and interconnection technologies, has emerged as a promising solution to counter the slowdown of Moore's law. However, the vast design space of such hardware, coupled with the complexity of their spatial hierarchies and organizations, introduces significant challenges for design space exploration (DSE). E…

    Submitted 27 March, 2025; originally announced March 2025.

  43. arXiv:2503.20981  [pdf, other]

    cs.CL cs.AI cs.SI

    Patients Speak, AI Listens: LLM-based Analysis of Online Reviews Uncovers Key Drivers for Urgent Care Satisfaction

    Authors: Xiaoran Xu, Zhaoqian Xue, Chi Zhang, Jhonatan Medri, Junjie Xiong, Jiayan Zhou, Jin Jin, Yongfeng Zhang, Siyuan Ma, Lingyao Li

    Abstract: Investigating the public experience of urgent care facilities is essential for promoting community healthcare development. Traditional survey methods often fall short due to limited scope, time, and spatial coverage. Crowdsourcing through online reviews or social media offers a valuable approach to gaining such insights. With recent advancements in large language models (LLMs), extracting nuanced…

    Submitted 26 March, 2025; originally announced March 2025.

  44. arXiv:2503.20762  [pdf, other]

    cs.LG math.OC

    ASGO: Adaptive Structured Gradient Optimization

    Authors: Kang An, Yuxing Liu, Rui Pan, Shiqian Ma, Donald Goldfarb, Tong Zhang

    Abstract: Training deep neural networks (DNNs) is a structured optimization problem, because the parameters are naturally represented by matrices and tensors rather than simple vectors. Under this structural representation, it has been widely observed that gradients are low-rank and Hessians are approximately block-wise diagonal. These structured properties are crucial for designing efficient optimization a…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: 25 pages, 4 figures

  45. arXiv:2503.20663  [pdf, other]

    cs.CV

    ARMO: Autoregressive Rigging for Multi-Category Objects

    Authors: Mingze Sun, Shiwei Mao, Keyi Chen, Yurun Chen, Shunlin Lu, Jingbo Wang, Junting Dong, Ruqi Huang

    Abstract: Recent advancements in large-scale generative models have significantly improved the quality and diversity of 3D shape generation. However, most existing methods focus primarily on generating static 3D models, overlooking the potentially dynamic nature of certain shapes, such as humanoids, animals, and insects. To address this gap, we focus on rigging, a fundamental task in animation that establis…

    Submitted 26 March, 2025; originally announced March 2025.

  46. arXiv:2503.19480  [pdf, other]

    cs.CV

    GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers

    Authors: Shijie Ma, Yuying Ge, Teng Wang, Yuxin Guo, Yixiao Ge, Ying Shan

    Abstract: The synergy between generative and discriminative models receives growing attention. While discriminative Contrastive Language-Image Pre-Training (CLIP) excels in high-level semantics, it struggles with perceiving fine-grained visual details. Generally, to enhance representations, generative models take CLIP's visual features as conditions for reconstruction. However, the underlying principle rema…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: Project released at: https://mashijie1028.github.io/GenHancer/

  47. arXiv:2503.19367  [pdf, other]

    cs.CV

    VGAT: A Cancer Survival Analysis Framework Transitioning from Generative Visual Question Answering to Genomic Reconstruction

    Authors: Zizhi Chen, Minghao Han, Xukun Zhang, Shuwei Ma, Tao Liu, Xing Wei, Lihua Zhang

    Abstract: Multimodal learning combining pathology images and genomic sequences enhances cancer survival analysis but faces clinical implementation barriers due to limited access to genomic sequencing in under-resourced regions. To enable survival prediction using only whole-slide images (WSI), we propose the Visual-Genomic Answering-Guided Transformer (VGAT), a framework integrating Visual Question Answerin…

    Submitted 29 March, 2025; v1 submitted 25 March, 2025; originally announced March 2025.

    Comments: Accepted by ICME 2025

  48. arXiv:2503.18073  [pdf, other]

    cs.CV cs.RO

    PanopticSplatting: End-to-End Panoptic Gaussian Splatting

    Authors: Yuxuan Xie, Xuan Yu, Changjian Jiang, Sitong Mao, Shunbo Zhou, Rui Fan, Rong Xiong, Yue Wang

    Abstract: Open-vocabulary panoptic reconstruction is a challenging task for simultaneous scene reconstruction and understanding. Recently, methods have been proposed for 3D scene understanding based on Gaussian splatting. However, these methods are multi-staged, suffering from the accumulated errors and the dependence of hand-designed components. To streamline the pipeline and achieve global optimization, w…

    Submitted 23 March, 2025; originally announced March 2025.

    Comments: 8 pages, 6 figures

  49. arXiv:2503.18065  [pdf, other]

    cs.CV cs.AI cs.CL cs.RO

    Unseen from Seen: Rewriting Observation-Instruction Using Foundation Models for Augmenting Vision-Language Navigation

    Authors: Ziming Wei, Bingqian Lin, Yunshuang Nie, Jiaqi Chen, Shikui Ma, Hang Xu, Xiaodan Liang

    Abstract: Data scarcity is a long-standing challenge in the Vision-Language Navigation (VLN) field, which extremely hinders the generalization of agents to unseen environments. Previous works primarily rely on additional simulator data or web-collected images/videos to improve the generalization. However, the simulator environments still face limited diversity, and the web-collected data often requires exte…

    Submitted 23 March, 2025; originally announced March 2025.

  50. arXiv:2503.18061  [pdf, other]

    cs.NE cs.LG

    Reinforcement Learning-based Self-adaptive Differential Evolution through Automated Landscape Feature Learning

    Authors: Hongshu Guo, Sijie Ma, Zechuan Huang, Yuzhi Hu, Zeyuan Ma, Xinglin Zhang, Yue-Jiao Gong

    Abstract: Recently, Meta-Black-Box-Optimization (MetaBBO) methods significantly enhance the performance of traditional black-box optimizers through meta-learning flexible and generalizable meta-level policies that excel in dynamic algorithm configuration (DAC) tasks within the low-level optimization, reducing the expertise required to adapt optimizers for novel optimization tasks. Though promising, existing…

    Submitted 23 March, 2025; originally announced March 2025.

    Comments: Accepted as full paper at ACM GECCO 2025