Showing 1–50 of 182 results for author: Sheng, Z

Searching in archive cs.
  1. arXiv:2506.24044  [pdf, ps, other]

    cs.CV cs.AI cs.RO

    A Survey on Vision-Language-Action Models for Autonomous Driving

    Authors: Sicong Jiang, Zilin Huang, Kangan Qian, Ziang Luo, Tianze Zhu, Yang Zhong, Yihong Tang, Menglin Kong, Yunlong Wang, Siwen Jiao, Hao Ye, Zihao Sheng, Xin Zhao, Tuopu Wen, Zheng Fu, Sikai Chen, Kun Jiang, Diange Yang, Seongjin Choi, Lijun Sun

    Abstract: The rapid progress of multimodal large language models (MLLM) has paved the way for Vision-Language-Action (VLA) paradigms, which integrate visual perception, natural language understanding, and control within a single policy. Researchers in autonomous driving are actively adapting these methods to the vehicle domain. Such models promise autonomous vehicles that can interpret high-level instructio…

    Submitted 30 June, 2025; originally announced June 2025.

  2. arXiv:2506.23490  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    UltraTwin: Towards Cardiac Anatomical Twin Generation from Multi-view 2D Ultrasound

    Authors: Junxuan Yu, Yaofei Duan, Yuhao Huang, Yu Wang, Rongbo Ling, Weihao Luo, Ang Zhang, Jingxian Xu, Qiongying Ni, Yongsong Zhou, Binghan Li, Haoran Dou, Liping Liu, Yanfen Chu, Feng Geng, Zhe Sheng, Zhifeng Ding, Dingxin Zhang, Rui Huang, Yuhang Zhang, Xiaowei Xu, Tao Tan, Dong Ni, Zhongshan Gou, Xin Yang

    Abstract: Echocardiography is routine for cardiac examination. However, 2D ultrasound (US) struggles with accurate metric calculation and direct observation of 3D cardiac structures. Moreover, 3D US is limited by low resolution, small field of view and scarce availability in practice. Constructing the cardiac anatomical twin from 2D images is promising to provide precise treatment planning and clinical quan…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: Accepted by MICCAI 2025

  3. arXiv:2506.18046  [pdf, ps, other]

    cs.LG

    TAB: Unified Benchmarking of Time Series Anomaly Detection Methods

    Authors: Xiangfei Qiu, Zhe Li, Wanghui Qiu, Shiyan Hu, Lekui Zhou, Xingjian Wu, Zhengyu Li, Chenjuan Guo, Aoying Zhou, Zhenli Sheng, Jilin Hu, Christian S. Jensen, Bin Yang

    Abstract: Time series anomaly detection (TSAD) plays an important role in many domains such as finance, transportation, and healthcare. With the ongoing instrumentation of reality, more time series data will be available, leading also to growing demands for TSAD. While many TSAD methods already exist, new and better methods are still desirable. However, effective progress hinges on the availability of relia…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: Accepted by PVLDB 2025

  4. Convergence-Privacy-Fairness Trade-Off in Personalized Federated Learning

    Authors: Xiyu Zhao, Qimei Cui, Weicai Li, Wei Ni, Ekram Hossain, Quan Z. Sheng, Xiaofeng Tao, Ping Zhang

    Abstract: Personalized federated learning (PFL), e.g., the renowned Ditto, strikes a balance between personalization and generalization by conducting federated learning (FL) to guide personalized learning (PL). While FL is unaffected by personalized model training, in Ditto, PL depends on the outcome of the FL. However, the clients' concern about their privacy and consequent perturbation of their local mode…

    Submitted 17 June, 2025; originally announced June 2025.

  5. A Novel Indicator for Quantifying and Minimizing Information Utility Loss of Robot Teams

    Authors: Xiyu Zhao, Qimei Cui, Wei Ni, Quan Z. Sheng, Abbas Jamalipour, Guoshun Nan, Xiaofeng Tao, Ping Zhang

    Abstract: The timely exchange of information among robots within a team is vital, but it can be constrained by limited wireless capacity. The inability to deliver information promptly can result in estimation errors that impact collaborative efforts among robots. In this paper, we propose a new metric termed Loss of Information Utility (LoIU) to quantify the freshness and utility of information critical for…

    Submitted 17 June, 2025; originally announced June 2025.

  6. arXiv:2506.05610  [pdf, ps, other]

    cs.CL

    Mitigating Confounding in Speech-Based Dementia Detection through Weight Masking

    Authors: Zhecheng Sheng, Xiruo Ding, Brian Hur, Changye Li, Trevor Cohen, Serguei Pakhomov

    Abstract: Deep transformer models have been used to detect linguistic anomalies in patient transcripts for early Alzheimer's disease (AD) screening. While pre-trained neural language models (LMs) fine-tuned on AD transcripts perform well, little research has explored the effects of the gender of the speakers represented by these transcripts. This work addresses gender confounding in dementia detection and p…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: 16 pages, 20 figures. Accepted to ACL 2025 Main Conference

  7. arXiv:2505.16377  [pdf]

    cs.RO cs.AI

    VL-SAFE: Vision-Language Guided Safety-Aware Reinforcement Learning with World Models for Autonomous Driving

    Authors: Yansong Qu, Zilin Huang, Zihao Sheng, Jiancong Chen, Sikai Chen, Samuel Labi

    Abstract: Reinforcement learning (RL)-based autonomous driving policy learning faces critical limitations such as low sample efficiency and poor generalization; its reliance on online interactions and trial-and-error learning is especially unacceptable in safety-critical scenarios. Existing methods including safe RL often fail to capture the true semantic meaning of "safety" in complex driving contexts, lea…

    Submitted 22 May, 2025; originally announced May 2025.

  8. arXiv:2505.11320  [pdf, other]

    cs.CR

    Understanding and Characterizing Obfuscated Funds Transfers in Ethereum Smart Contracts

    Authors: Zhang Sheng, Tan Kia Quang, Shen Wang, Shengchen Duan, Kai Li, Yue Duan

    Abstract: Scam contracts on Ethereum have rapidly evolved alongside the rise of DeFi and NFT ecosystems, utilizing increasingly complex code obfuscation techniques to avoid early detection. This paper systematically investigates how obfuscation amplifies the financial risks of fraudulent contracts and undermines existing auditing tools. We propose a transfer-centric obfuscation taxonomy, distilling seven ke…

    Submitted 16 May, 2025; originally announced May 2025.

  9. arXiv:2505.09661  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Introducing voice timbre attribute detection

    Authors: Jinghao He, Zhengyan Sheng, Liping Chen, Kong Aik Lee, Zhen-Hua Ling

    Abstract: This paper focuses on explaining the timbre conveyed by speech signals and introduces a task termed voice timbre attribute detection (vTAD). In this task, voice timbre is explained with a set of sensory attributes describing its human perception. A pair of speech utterances is processed, and their intensity is compared in a designated timbre descriptor. Moreover, a framework is proposed, which is…

    Submitted 22 June, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

    Comments: arXiv admin note: substantial text overlap with arXiv:2505.09382

  10. arXiv:2505.09382  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    The Voice Timbre Attribute Detection 2025 Challenge Evaluation Plan

    Authors: Zhengyan Sheng, Jinghao He, Liping Chen, Kong Aik Lee, Zhen-Hua Ling

    Abstract: Voice timbre refers to the unique quality or character of a person's voice that distinguishes it from others as perceived by human hearing. The Voice Timbre Attribute Detection (VtaD) 2025 challenge focuses on explaining the voice timbre attribute in a comparative manner. In this challenge, the human impression of voice timbre is verbalized with a set of sensory descriptors, including bright, coar…

    Submitted 22 June, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

  11. arXiv:2505.06911  [pdf, ps, other]

    cs.LG cs.AI

    Enhancing Robustness to Missing Modalities through Clustered Federated Learning

    Authors: Lishan Yang, Wei Emma Zhang, Quan Z. Sheng, Weitong Chen, Lina Yao, Weitong Chen, Ali Shakeri

    Abstract: In the era of big data, data mining has become indispensable for uncovering hidden patterns and insights from vast and complex datasets. The integration of multimodal data sources further enhances its potential. Multimodal Federated Learning (MFL) is a distributed approach that enhances the efficiency and quality of multimodal learning, ensuring collaborative work and privacy protection. However,…

    Submitted 2 July, 2025; v1 submitted 11 May, 2025; originally announced May 2025.

    Comments: 15 pages, 3 figures

    ACM Class: I.2.11; I.2.7

  12. Electricity Cost Minimization for Multi-Workflow Allocation in Geo-Distributed Data Centers

    Authors: Shuang Wang, He Zhang, Tianxing Wu, Yueyou Zhang, Wei Emma Zhang, Quan Z. Sheng

    Abstract: Worldwide, Geo-distributed Data Centers (GDCs) provide computing and storage services for massive workflow applications, resulting in high electricity costs that vary depending on geographical locations and time. How to reduce electricity costs while satisfying the deadline constraints of workflow applications is important in GDCs, which is determined by the execution time of servers, power, and e…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: Accepted by IEEE Transactions on Services Computing

  13. arXiv:2504.18010  [pdf, other]

    cs.RO cs.AI cs.HC

    Sky-Drive: A Distributed Multi-Agent Simulation Platform for Human-AI Collaborative and Socially-Aware Future Transportation

    Authors: Zilin Huang, Zihao Sheng, Zhengyang Wan, Yansong Qu, Yuhao Luo, Boyue Wang, Pei Li, Yen-Jung Chen, Jiancong Chen, Keke Long, Jiayi Meng, Yue Leng, Sikai Chen

    Abstract: Recent advances in autonomous system simulation platforms have significantly enhanced the safe and scalable testing of driving policies. However, existing simulators do not yet fully meet the needs of future transportation research, particularly in enabling effective human-AI collaboration and modeling socially-aware driving agents. This paper introduces Sky-Drive, a novel distributed multi-agent s…

    Submitted 27 May, 2025; v1 submitted 24 April, 2025; originally announced April 2025.

    Comments: 14 pages, 7 figures

  14. arXiv:2504.13405  [pdf, other]

    cs.CV

    ProgRoCC: A Progressive Approach to Rough Crowd Counting

    Authors: Shengqin Jiang, Linfei Li, Haokui Zhang, Qingshan Liu, Amin Beheshti, Jian Yang, Anton van den Hengel, Quan Z. Sheng, Yuankai Qi

    Abstract: As the number of individuals in a crowd grows, enumeration-based techniques become increasingly infeasible and their estimates increasingly unreliable. We propose instead an estimation-based version of the problem, which we label Rough Crowd Counting, that delivers better accuracy on the basis of training data that is easier to acquire. Rough crowd counting requires only rough annotations of the number o…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Under review

  15. arXiv:2503.20104  [pdf, other]

    cs.CL

    "Is There Anything Else?": Examining Administrator Influence on Linguistic Features from the Cookie Theft Picture Description Cognitive Test

    Authors: Changye Li, Zhecheng Sheng, Trevor Cohen, Serguei Pakhomov

    Abstract: Alzheimer's Disease (AD) dementia is a progressive neurodegenerative disease that negatively impacts patients' cognitive ability. Previous studies have demonstrated that changes in naturalistic language samples can be useful for early screening of AD dementia. However, the nature of language deficits often requires test administrators to use various speech elicitation techniques during spontaneous…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: Accepted to CMCL 2025 workshop, co-located with NAACL 2025

  16. arXiv:2503.19735  [pdf]

    eess.IV cs.CV

    InterSliceBoost: Identifying Tissue Layers in Three-dimensional Ultrasound Images for Chronic Lower Back Pain (cLBP) Assessment

    Authors: Zixue Zeng, Matthew Cartier, Xiaoyan Zhao, Pengyu Chen, Xin Meng, Zhiyu Sheng, Maryam Satarpour, John M Cormack, Allison C. Bean, Ryan P. Nussbaum, Maya Maurer, Emily Landis-Walkenhorst, Kang Kim, Ajay D. Wasan, Jiantao Pu

    Abstract: Available studies on chronic lower back pain (cLBP) typically focus on one or a few specific tissues rather than conducting a comprehensive layer-by-layer analysis. Since three-dimensional (3-D) images often contain hundreds of slices, manual annotation of these anatomical structures is both time-consuming and error-prone. We aim to develop and validate a novel approach called InterSliceBoost to e…

    Submitted 25 March, 2025; originally announced March 2025.

  17. arXiv:2503.03642  [pdf, other]

    cs.DS

    Improved FPT Approximation Algorithms for TSP

    Authors: Jingyang Zhao, Zimo Sheng, Mingyu Xiao

    Abstract: TSP is a classic and extensively studied problem with numerous real-world applications in artificial intelligence and operations research. It is well-known that TSP admits a constant approximation ratio on metric graphs but becomes NP-hard to approximate within any computable function $f(n)$ on general graphs. This disparity highlights a significant gap between the results on metric graphs and gen…

    Submitted 21 March, 2025; v1 submitted 5 March, 2025; originally announced March 2025.

    Comments: Improve the runtime of the FPT 3-approx. alg. from $2^{\mathcal{O}({q^2})}\cdot n^{\mathcal{O}(1)}$ to $2^{\mathcal{O}({q\log q})}\cdot n^{\mathcal{O}(1)}$

  18. arXiv:2502.16094  [pdf, other]

    cs.CR

    Merger-as-a-Stealer: Stealing Targeted PII from Aligned LLMs with Model Merging

    Authors: Lin Lu, Zhigang Zuo, Ziji Sheng, Pan Zhou

    Abstract: Model merging has emerged as a promising approach for updating large language models (LLMs) by integrating multiple domain-specific models into a cross-domain merged model. Despite its utility and plug-and-play nature, unmonitored mergers can introduce significant security vulnerabilities, such as backdoor attacks and model merging abuse. In this paper, we identify a novel and more realistic attac…

    Submitted 22 February, 2025; originally announced February 2025.

    Comments: 17 pages, 3 figures

  19. arXiv:2502.15119  [pdf, other]

    cs.RO cs.AI cs.CV

    CurricuVLM: Towards Safe Autonomous Driving via Personalized Safety-Critical Curriculum Learning with Vision-Language Models

    Authors: Zihao Sheng, Zilin Huang, Yansong Qu, Yue Leng, Sruthi Bhavanam, Sikai Chen

    Abstract: Ensuring safety in autonomous driving systems remains a critical challenge, particularly in handling rare but potentially catastrophic safety-critical scenarios. While existing research has explored generating safety-critical scenarios for autonomous vehicle (AV) testing, there is limited work on effectively incorporating these scenarios into policy learning to enhance safety. Furthermore, develop…

    Submitted 20 February, 2025; originally announced February 2025.

  20. arXiv:2502.11094  [pdf, other]

    cs.SD cs.AI

    SyncSpeech: Low-Latency and Efficient Dual-Stream Text-to-Speech based on Temporal Masked Transformer

    Authors: Zhengyan Sheng, Zhihao Du, Shiliang Zhang, Zhijie Yan, Yexin Yang, Zhenhua Ling

    Abstract: This paper presents a dual-stream text-to-speech (TTS) model, SyncSpeech, capable of receiving streaming text input from upstream models while simultaneously generating streaming speech, facilitating seamless interaction with large language models. SyncSpeech has the following advantages: Low latency, as it begins generating streaming speech upon receiving the second text token; High efficiency, a…

    Submitted 16 February, 2025; originally announced February 2025.

  21. arXiv:2502.11013  [pdf, other]

    cs.LG cs.AI

    Collaborative Deterministic-Probabilistic Forecasting for Real-World Spatiotemporal Systems

    Authors: Zhi Sheng, Yuan Yuan, Yudi Zhang, Depeng Jin, Yong Li

    Abstract: Probabilistic forecasting is crucial for real-world spatiotemporal systems, such as climate, energy, and urban environments, where quantifying uncertainty is essential for informed, risk-aware decision-making. While diffusion models have shown promise in capturing complex data distributions, their application to spatiotemporal forecasting remains limited due to complex spatiotemporal dynamics and…

    Submitted 17 May, 2025; v1 submitted 16 February, 2025; originally announced February 2025.

  22. arXiv:2502.10669  [pdf, other]

    cs.CV

    Is Self-Supervised Pre-training on Satellite Imagery Better than ImageNet? A Systematic Study with Sentinel-2

    Authors: Saad Lahrichi, Zion Sheng, Shufan Xia, Kyle Bradbury, Jordan Malof

    Abstract: Self-supervised learning (SSL) has demonstrated significant potential in pre-training robust models with limited labeled data, making it particularly valuable for remote sensing (RS) tasks. A common assumption is that pre-training on domain-aligned data provides maximal benefits on downstream tasks, particularly when compared to ImageNet-pretraining (INP). In this work, we investigate this assumpt…

    Submitted 14 February, 2025; originally announced February 2025.

  23. arXiv:2502.07049  [pdf, other]

    cs.CR cs.AI

    LLMs in Software Security: A Survey of Vulnerability Detection Techniques and Insights

    Authors: Ze Sheng, Zhicheng Chen, Shuning Gu, Heqing Huang, Guofei Gu, Jeff Huang

    Abstract: Large Language Models (LLMs) are emerging as transformative tools for software vulnerability detection, addressing critical challenges in the security domain. Traditional methods, such as static and dynamic analysis, often falter due to inefficiencies, high false positive rates, and the growing complexity of modern software systems. By leveraging their ability to analyze code structures, identify…

    Submitted 12 February, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

    Comments: 33 pages, 12 figures

  24. arXiv:2501.17690  [pdf]

    cs.CV cs.AI cs.LG

    Segmentation-Aware Generative Reinforcement Network (GRN) for Tissue Layer Segmentation in 3-D Ultrasound Images for Chronic Low-back Pain (cLBP) Assessment

    Authors: Zixue Zeng, Xiaoyan Zhao, Matthew Cartier, Tong Yu, Jing Wang, Xin Meng, Zhiyu Sheng, Maryam Satarpour, John M Cormack, Allison Bean, Ryan Nussbaum, Maya Maurer, Emily Landis-Walkenhorst, Dinesh Kumbhare, Kang Kim, Ajay Wasan, Jiantao Pu

    Abstract: We introduce a novel segmentation-aware joint training framework called generative reinforcement network (GRN) that integrates segmentation loss feedback to optimize both image generation and segmentation performance in a single stage. An image enhancement technique called segmentation-guided enhancement (SGE) is also developed, where the generator produces images tailored specifically for the seg…

    Submitted 23 June, 2025; v1 submitted 29 January, 2025; originally announced January 2025.

  25. arXiv:2501.15289  [pdf, other]

    cs.DC

    ExClique: An Express Consensus Algorithm for High-Speed Transaction Process in Blockchains

    Authors: Chonghe Zhao, Yipeng Zhou, Shengli Zhang, Quan Z. Sheng, Yang Zhang, Shiting Wen

    Abstract: Proof of Authority (PoA) plays a pivotal role in blockchains for reaching consensus. Clique, which selects consensus nodes to generate blocks with a pre-determined order, is the most popular implementation of PoA due to its low communication overhead and energy consumption. However, our study unveils that the speed to process transactions by Clique is severely restricted by 1) the long communicati…

    Submitted 25 January, 2025; originally announced January 2025.

    Comments: Accepted for publication in IEEE INFOCOM 2025

  26. arXiv:2501.13794  [pdf, ps, other]

    cs.LG

    Unveiling the Power of Noise Priors: Enhancing Diffusion Models for Mobile Traffic Prediction

    Authors: Zhi Sheng, Daisy Yuan, Jingtao Ding, Yong Li

    Abstract: Accurate prediction of mobile traffic, i.e., network traffic from cellular base stations, is crucial for optimizing network performance and supporting urban development. However, the non-stationary nature of mobile traffic, driven by human activity and environmental changes, leads to both regular patterns and abrupt variations. Diffusion models excel in capturing such complex temporal dynamics due…

    Submitted 26 June, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

  27. arXiv:2501.13382  [pdf, other]

    cs.PF cs.DC

    Accelerating Gaussian beam tracing method with dynamic parallelism on graphics processing units

    Authors: Zhang Sheng, Lishu Duan, Hanbo Jiang

    Abstract: This study presents a reconstruction of the Gaussian Beam Tracing solution using CUDA, with a particular focus on the utilisation of GPU acceleration as a means of overcoming the performance limitations of traditional CPU algorithms in complex acoustic simulations. The algorithm is implemented and optimised on the NVIDIA RTX A6000 GPU, resulting in a notable enhancement in the performance of the G…

    Submitted 22 January, 2025; originally announced January 2025.

  28. arXiv:2501.06394  [pdf, other]

    cs.SD cs.AI eess.AS

    UniSpeaker: A Unified Approach for Multimodality-driven Speaker Generation

    Authors: Zhengyan Sheng, Zhihao Du, Heng Lu, Shiliang Zhang, Zhen-Hua Ling

    Abstract: Recent advancements in personalized speech generation have brought synthetic speech increasingly close to the realism of target speakers' recordings, yet multimodal speaker generation remains on the rise. This paper introduces UniSpeaker, a unified approach for multimodality-driven speaker generation. Specifically, we propose a unified voice aggregator based on KV-Former, applying soft contrastive…

    Submitted 10 January, 2025; originally announced January 2025.

  29. arXiv:2501.05589   

    cs.HC

    LGL-BCI: A Motor-Imagery-Based Brain-Computer Interface with Geometric Learning

    Authors: Jianchao Lu, Yuzhe Tian, Yang Zhang, Quan Z. Sheng, Xi Zheng

    Abstract: Brain-computer interfaces are groundbreaking technology whereby brain signals are used to control external devices. Despite some advances in recent years, electroencephalogram (EEG)-based motor-imagery tasks face challenges, such as amplitude and phase variability and complex spatial correlations, with a need for smaller models and faster inference. In this study, we develop a prototype, called t…

    Submitted 24 February, 2025; v1 submitted 9 January, 2025; originally announced January 2025.

    Comments: We made a submission by mistake. The article arXiv:2501.05589 should have been submitted as an update of article arXiv:2310.08051 instead of as a new submission. We are seeking to remove arXiv:2501.05589 and update arXiv:2310.08051 to the latest version

  30. arXiv:2501.02032  [pdf, other]

    cs.CR cs.AI cs.SE

    Dynamic Feature Fusion: Combining Global Graph Structures and Local Semantics for Blockchain Fraud Detection

    Authors: Zhang Sheng, Liangliang Song, Yanbin Wang

    Abstract: The advent of blockchain technology has facilitated the widespread adoption of smart contracts in the financial sector. However, current fraud detection methodologies exhibit limitations in capturing both global structural patterns within transaction networks and local semantic relationships embedded in transaction data. Most existing models focus on either structural information or semantic featu…

    Submitted 3 January, 2025; originally announced January 2025.

  31. arXiv:2412.18254  [pdf, other]

    cs.CV

    RaCMC: Residual-Aware Compensation Network with Multi-Granularity Constraints for Fake News Detection

    Authors: Xinquan Yu, Ziqi Sheng, Wei Lu, Xiangyang Luo, Jiantao Zhou

    Abstract: Multimodal fake news detection aims to automatically identify real or fake news, thereby mitigating the adverse effects caused by such misinformation. Although prevailing approaches have demonstrated their effectiveness, challenges persist in cross-modal feature fusion and refinement for classification. To address this, we present a residual-aware compensation network with multi-granularity constr…

    Submitted 24 December, 2024; originally announced December 2024.

    Comments: 9 pages, 4 figures

  32. arXiv:2412.15544  [pdf, other]

    cs.RO cs.AI cs.CV

    VLM-RL: A Unified Vision Language Models and Reinforcement Learning Framework for Safe Autonomous Driving

    Authors: Zilin Huang, Zihao Sheng, Yansong Qu, Junwei You, Sikai Chen

    Abstract: In recent years, reinforcement learning (RL)-based methods for learning driving policies have gained increasing attention in the autonomous driving community and have achieved remarkable progress in various driving scenarios. However, traditional RL approaches rely on manually engineered rewards, which require extensive human effort and often lack generalizability. To address these limitations, we…

    Submitted 19 December, 2024; originally announced December 2024.

    Comments: 28 pages, 16 figures

  33. arXiv:2412.14602  [pdf, other]

    cs.LG cs.AI

    Towards Scalable and Deep Graph Neural Networks via Noise Masking

    Authors: Yuxuan Liang, Wentao Zhang, Zeang Sheng, Ling Yang, Quanqing Xu, Jiawei Jiang, Yunhai Tong, Bin Cui

    Abstract: In recent years, Graph Neural Networks (GNNs) have achieved remarkable success in many graph mining tasks. However, scaling them to large graphs is challenging due to the high computational and storage costs of repeated feature propagation and non-linear transformation during training. One commonly employed approach to address this challenge is model-simplification, which only executes the Propaga…

    Submitted 9 April, 2025; v1 submitted 19 December, 2024; originally announced December 2024.

  34. arXiv:2412.10117  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models

    Authors: Zhihao Du, Yuxuan Wang, Qian Chen, Xian Shi, Xiang Lv, Tianyu Zhao, Zhifu Gao, Yexin Yang, Changfeng Gao, Hui Wang, Fan Yu, Huadai Liu, Zhengyan Sheng, Yue Gu, Chong Deng, Wen Wang, Shiliang Zhang, Zhijie Yan, Jingren Zhou

    Abstract: In our previous work, we introduced CosyVoice, a multilingual speech synthesis model based on supervised discrete speech tokens. By employing progressive semantic decoding with two popular generative models, language models (LMs) and Flow Matching, CosyVoice demonstrated high prosody naturalness, content consistency, and speaker similarity in speech in-context learning. Recently, significant progr…

    Submitted 25 December, 2024; v1 submitted 13 December, 2024; originally announced December 2024.

    Comments: Tech report, work in progress

  35. arXiv:2412.09981  [pdf, other]

    cs.CV cs.AI

    SUMI-IFL: An Information-Theoretic Framework for Image Forgery Localization with Sufficiency and Minimality Constraints

    Authors: Ziqi Sheng, Wei Lu, Xiangyang Luo, Jiantao Zhou, Xiaochun Cao

    Abstract: Image forgery localization (IFL) is a crucial technique for preventing tampered image misuse and protecting social safety. However, due to the rapid development of image tampering technologies, extracting more comprehensive and accurate forgery clues remains an urgent challenge. To address these challenges, we introduce a novel information-theoretic IFL framework named SUMI-IFL that imposes suffic…

    Submitted 27 December, 2024; v1 submitted 13 December, 2024; originally announced December 2024.

  36. arXiv:2412.07673  [pdf, other]

    cs.HC

    Ask Humans or AI? Exploring Their Roles in Visualization Troubleshooting

    Authors: Shuyu Shen, Sirong Lu, Leixian Shen, Zhonghua Sheng, Nan Tang, Yuyu Luo

    Abstract: Visualization authoring is an iterative process requiring users to modify parameters like color schemes and data transformations to achieve desired aesthetics and effectively convey insights. Due to the complexity of these adjustments, users often create defective visualizations and require troubleshooting support. In this paper, we examine two primary approaches for visualization troubleshooting:…

    Submitted 10 December, 2024; originally announced December 2024.

    Comments: 14 pages, 7 figures

  37. arXiv:2411.17388  [pdf, other]

    cs.CL cs.AI

    Can LLMs be Good Graph Judge for Knowledge Graph Construction?

    Authors: Haoyu Huang, Chong Chen, Zeang Sheng, Yang Li, Wentao Zhang

    Abstract: In real-world scenarios, most of the data obtained from the information retrieval (IR) system is unstructured. Converting natural language sentences into structured Knowledge Graphs (KGs) remains a critical challenge. We identified three limitations with respect to existing KG construction methods: (1) There could be a large amount of noise in real-world documents, which could result in extracting…

    Submitted 20 May, 2025; v1 submitted 26 November, 2024; originally announced November 2024.

  38. arXiv:2411.12972  [pdf, other]

    cs.LG

    UniFlow: A Foundation Model for Unified Urban Spatio-Temporal Flow Prediction

    Authors: Yuan Yuan, Jingtao Ding, Chonghua Han, Zhi Sheng, Depeng Jin, Yong Li

    Abstract: Urban spatio-temporal flow prediction, encompassing traffic flows and crowd flows, is crucial for optimizing city infrastructure and managing traffic and emergency responses. Traditional approaches have relied on separate models tailored to either grid-based data, representing cities as uniform cells, or graph-based data, modeling cities as networks of nodes and edges. In this paper, we build UniF…

    Submitted 31 March, 2025; v1 submitted 19 November, 2024; originally announced November 2024.

  39. arXiv:2411.12588  [pdf, other]

    cs.SI

    Learning To Sample the Meta-Paths for Social Event Detection

    Authors: Congbo Ma, Hu Wang, Zitai Qiu, Shan Xue, Jia Wu, Jian Yang, Preslav Nakov, Quan Z. Sheng

    Abstract: Social media data is inherently rich, as it includes not only text content, but also users, geolocation, entities, temporal information, and their relationships. This data richness can be effectively modeled using heterogeneous information networks (HINs) as it can handle multiple types of nodes and relationships, allowing for a comprehensive representation of complex interactions within social da…

    Submitted 19 November, 2024; originally announced November 2024.

  40. arXiv:2411.06493  [pdf, other]

    cs.CR cs.AI

    LProtector: An LLM-driven Vulnerability Detection System

    Authors: Ze Sheng, Fenghua Wu, Xiangwu Zuo, Chao Li, Yuxin Qiao, Lei Hang

    Abstract: This paper presents LProtector, an automated vulnerability detection system for C/C++ codebases driven by the large language model (LLM) GPT-4o and Retrieval-Augmented Generation (RAG). As software complexity grows, traditional methods face challenges in detecting vulnerabilities effectively. LProtector leverages GPT-4o's powerful code comprehension and generation capabilities to perform binary cl…

    Submitted 14 November, 2024; v1 submitted 10 November, 2024; originally announced November 2024.

    Comments: 5 pages, 4 figures. This is a preprint version of the article. The final version will be published in the proceedings of the IEEE conference

  41. arXiv:2411.03672  [pdf]

    cs.CV cs.AI

    MetaSSC: Enhancing 3D Semantic Scene Completion for Autonomous Driving through Meta-Learning and Long-sequence Modeling

    Authors: Yansong Qu, Zixuan Xu, Zilin Huang, Zihao Sheng, Tiantian Chen, Sikai Chen

    Abstract: Semantic scene completion (SSC) is essential for achieving comprehensive perception in autonomous driving systems. However, existing SSC methods often overlook the high deployment costs in real-world applications. Traditional architectures, such as 3D Convolutional Neural Networks (3D CNNs) and self-attention mechanisms, face challenges in efficiently capturing long-range dependencies within 3D vo…

    Submitted 19 February, 2025; v1 submitted 6 November, 2024; originally announced November 2024.

  42. arXiv:2409.07494  [pdf, other]

    cs.CR cs.LG q-fin.GN

    Ethereum Fraud Detection via Joint Transaction Language Model and Graph Representation Learning

    Authors: Jianguo Sun, Yifan Jia, Yanbin Wang, Yiwei Liu, Zhang Sheng, Ye Tian

    Abstract: Ethereum faces growing fraud threats. Current fraud detection methods, whether employing graph neural networks or sequence models, fail to consider the semantic information and similarity patterns within transactions. Moreover, these approaches do not leverage the potential synergistic benefits of combining both types of models. To address these challenges, we propose TLMG4Eth that combines a tran…

    Submitted 18 February, 2025; v1 submitted 9 September, 2024; originally announced September 2024.

  43. arXiv:2409.00858  [pdf, other]

    cs.RO cs.AI cs.HC cs.LG

    Trustworthy Human-AI Collaboration: Reinforcement Learning with Human Feedback and Physics Knowledge for Safe Autonomous Driving

    Authors: Zilin Huang, Zihao Sheng, Sikai Chen

    Abstract: In the field of autonomous driving, developing safe and trustworthy autonomous driving policies remains a significant challenge. Recently, Reinforcement Learning with Human Feedback (RLHF) has attracted substantial attention due to its potential to enhance training safety and sampling efficiency. Nevertheless, existing RLHF-enabled methods often falter when faced with imperfect human demonstration…

    Submitted 5 September, 2024; v1 submitted 1 September, 2024; originally announced September 2024.

    Comments: 33 pages, 20 figures

  44. arXiv:2408.17380  [pdf, other]

    cs.AI cs.LG

    Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control

    Authors: Zihao Sheng, Zilin Huang, Sikai Chen

    Abstract: Model-based reinforcement learning (RL) is anticipated to exhibit higher sample efficiency compared to model-free RL by utilizing a virtual environment model. However, it is challenging to obtain sufficiently accurate representations of the environmental dynamics due to uncertainties in complex systems and environments. An inaccurate environment model may degrade the sample efficiency and performa…

    Submitted 2 February, 2025; v1 submitted 30 August, 2024; originally announced August 2024.

    Comments: Accepted by Communications in Transportation Research

  45. arXiv:2408.14851  [pdf, other]

    cs.IR

    Graph and Sequential Neural Networks in Session-based Recommendation: A Survey

    Authors: Zihao Li, Chao Yang, Yakun Chen, Xianzhi Wang, Hongxu Chen, Guandong Xu, Lina Yao, Quan Z. Sheng

    Abstract: Recent years have witnessed the remarkable success of recommendation systems (RSs) in alleviating the information overload problem. As a new paradigm of RSs, session-based recommendation (SR) specializes in users' short-term preference capture and aims to provide a more dynamic and timely recommendation based on the ongoing interacted actions. In this survey, we will give a comprehensive overview…

    Submitted 27 August, 2024; originally announced August 2024.

  46. arXiv:2408.14735  [pdf, other]

    cs.MM cs.CR cs.DC

    PPVF: An Efficient Privacy-Preserving Online Video Fetching Framework with Correlated Differential Privacy

    Authors: Xianzhi Zhang, Yipeng Zhou, Di Wu, Quan Z. Sheng, Miao Hu, Linchang Xiao

    Abstract: Online video streaming has evolved into an integral component of the contemporary Internet landscape. Yet, the disclosure of user requests presents formidable privacy challenges. As users stream their preferred online videos, their requests are automatically seized by video content providers, potentially leaking users' privacy. Unfortunately, current protection methods are not well-suited to pre…

    Submitted 26 August, 2024; originally announced August 2024.

  47. arXiv:2408.09478  [pdf, other]

    cs.LG cs.CR

    Mitigating Noise Detriment in Differentially Private Federated Learning with Model Pre-training

    Authors: Huitong Jin, Yipeng Zhou, Laizhong Cui, Quan Z. Sheng

    Abstract: Pre-training exploits public datasets to pre-train an advanced machine learning model, so that the model can be easily tuned to adapt to various downstream tasks. Pre-training has been extensively explored to mitigate computation and communication resource consumption. Inspired by these advantages, we are the first to explore how model pre-training can mitigate noise detriment in differentially pr…

    Submitted 18 August, 2024; originally announced August 2024.

  48. arXiv:2408.08642  [pdf, other]

    cs.LG

    The Power of Bias: Optimizing Client Selection in Federated Learning with Heterogeneous Differential Privacy

    Authors: Jiating Ma, Yipeng Zhou, Qi Li, Quan Z. Sheng, Laizhong Cui, Jiangchuan Liu

    Abstract: To preserve the data privacy, the federated learning (FL) paradigm emerges in which clients only expose model gradients rather than original data for conducting model training. To enhance the protection of model gradients in FL, differentially private federated learning (DPFL) is proposed which incorporates differentially private (DP) noises to obfuscate gradients before they are exposed. Yet, an…

    Submitted 16 August, 2024; originally announced August 2024.

  49. arXiv:2408.08108  [pdf, other]

    cs.CV

    Unsupervised Part Discovery via Dual Representation Alignment

    Authors: Jiahao Xia, Wenjian Huang, Min Xu, Jianguo Zhang, Haimin Zhang, Ziyu Sheng, Dong Xu

    Abstract: Object parts serve as crucial intermediate representations in various downstream tasks, but part-level representation learning still has not received as much attention as other vision tasks. Previous research has established that Vision Transformer can learn instance-level attention without labels, extracting high-quality instance-level representations for boosting downstream tasks. In this paper,…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: Accepted by TPAMI-2024

  50. arXiv:2406.11389  [pdf, other]

    cs.LG

    SEFraud: Graph-based Self-Explainable Fraud Detection via Interpretative Mask Learning

    Authors: Kaidi Li, Tianmeng Yang, Min Zhou, Jiahao Meng, Shendi Wang, Yihui Wu, Boshuai Tan, Hu Song, Lujia Pan, Fan Yu, Zhenli Sheng, Yunhai Tong

    Abstract: Graph-based fraud detection has widespread application in modern industry scenarios, such as spam review and malicious account detection. While considerable efforts have been devoted to designing adequate fraud detectors, the interpretability of their results has often been overlooked. Previous works have attempted to generate explanations for specific instances using post-hoc explaining methods s…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024