
Showing 1–50 of 1,557 results for author: Tang, J

Searching in archive cs.
  1. arXiv:2505.10257

    cs.CV

    Sage Deer: A Super-Aligned Driving Generalist Is Your Copilot

    Authors: Hao Lu, Jiaqi Tang, Jiyao Wang, Yunfan LU, Xu Cao, Qingyong Hu, Yin Wang, Yuting Zhang, Tianxin Xie, Yunpeng Zhang, Yong Chen, Jiayu. Gao, Bin Huang, Dengbo He, Shuiguang Deng, Hao Chen, Ying-Cong Chen

    Abstract: The intelligent driving cockpit, an important part of intelligent driving, needs to match different users' comfort, interaction, and safety needs. This paper aims to build a Super-Aligned and GEneralist DRiving agent, SAGE DeeR. Sage Deer achieves three highlights: (1) Super alignment: It achieves different reactions according to different people's preferences and biases. (2) Generalist: It can u…

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.09498

    cs.CV cs.AI

    Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput

    Authors: Bo Zhang, Shuo Li, Runhe Tian, Yang Yang, Jixin Tang, Jinhao Zhou, Lin Ma

    Abstract: In this paper, we introduce Flash-VL 2B, a novel approach to optimizing Vision-Language Models (VLMs) for real-time applications, targeting ultra-low latency and high throughput without sacrificing accuracy. Leveraging advanced architectural enhancements and efficient computational strategies, Flash-VL 2B is designed to maximize throughput by reducing processing time while maintaining competitive…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 18 pages, 7 figures

  3. arXiv:2505.09388

    cs.CL

    Qwen3 Technical Report

    Authors: An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, et al. (35 additional authors not shown)

    Abstract: In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes models of both dense and Mixture-of-Expert (MoE) architectures, with parameter scales ranging from 0.6 to 235 billion. A key innovation in Qwen3 is the integration…

    Submitted 14 May, 2025; originally announced May 2025.

  4. arXiv:2505.07062

    cs.CV cs.AI

    Seed1.5-VL Technical Report

    Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, et al. (172 additional authors not shown)

    Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed of a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluati…

    Submitted 11 May, 2025; originally announced May 2025.

  5. arXiv:2505.06687

    cs.AR

    Extend IVerilog to Support Batch RTL Fault Simulation

    Authors: Jiaping Tang, Jianan Mu, Zizhen Liu, Zhiteng Chao, Jing Ye, Huawei Li

    Abstract: The advancement of functional safety has made RTL-level fault simulation increasingly important to achieve iterative efficiency in the early stages of design and to ensure compliance with functional safety standards. In this paper, we extend IVerilog to support batch RTL fault simulation and integrate the event-driven algorithm and the concurrent fault simulation algorithm. Comparative experiments…

    Submitted 10 May, 2025; originally announced May 2025.

  6. arXiv:2505.05512

    cs.CV cs.RO

    Occupancy World Model for Robots

    Authors: Zhang Zhang, Qiang Zhang, Wei Cui, Shuai Shi, Yijie Guo, Gang Han, Wen Zhao, Jingkai Sun, Jiahang Cao, Jiaxu Wang, Hao Cheng, Xiaozhu Ju, Zhengping Che, Renjing Xu, Jian Tang

    Abstract: Understanding and forecasting the scene evolutions deeply affect the exploration and decision of embodied agents. While traditional methods simulate scene evolutions through trajectory prediction of potential instances, current works use the occupancy world model as a generative framework for describing fine-grained overall scene dynamics. However, existing methods cluster on the outdoor structure…

    Submitted 7 May, 2025; originally announced May 2025.

  7. Statistical CSI Acquisition for Multi-frequency Massive MIMO Systems

    Authors: Jinke Tang, Li You, Xinrui Gong, Chenjie Xie, Xiqi Gao, Xiang-Gen Xia, Xueyuan Shi

    Abstract: Multi-frequency massive multi-input multi-output (MIMO) communication is a promising strategy for both 5G and future 6G systems, ensuring reliable transmission while enhancing frequency resource utilization. Statistical channel state information (CSI) has been widely adopted in multi-frequency massive MIMO transmissions to reduce overhead and improve transmission performance. In this paper, we pro…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 15 pages, 9 figures. Accepted for publication in IEEE Transactions on Communications

  8. Massive MIMO-OFDM Channel Acquisition with Time-Frequency Phase-Shifted Pilots

    Authors: Jinke Tang, Xiqi Gao, Li You, Ding Shi, Jiyuan Yang, Xiang-Gen Xia, Xinwei Zhao, Peigang Jiang

    Abstract: In this paper, we propose a channel acquisition approach with time-frequency phase-shifted pilots (TFPSPs) for massive multi-input multi-output orthogonal frequency division multiplexing (MIMO-OFDM) systems. We first present a triple-beam (TB) based channel tensor model, allowing for the representation of the space-frequency-time (SFT) domain channel as the product of beam matrices and the TB doma…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 15 pages, 10 figures. Accepted for publication in IEEE Transactions on Communications

  9. arXiv:2505.04172

    eess.IV cs.HC physics.med-ph

    A Dataset and Toolkit for Multiparameter Cardiovascular Physiology Sensing on Rings

    Authors: Jiankai Tang, Kegang Wang, Yingke Ding, Jiatong Ji, Zeyu Wang, Xiyuxing Zhang, Ping Chen, Yuanchun Shi, Yuntao Wang

    Abstract: Smart rings offer a convenient way to continuously and unobtrusively monitor cardiovascular physiological signals. However, a gap remains between the ring hardware and reliable methods for estimating cardiovascular parameters, partly due to the lack of publicly available datasets and standardized analysis tools. In this work, we present $τ$-Ring, the first open-source ring-based dataset designed f…

    Submitted 8 May, 2025; v1 submitted 7 May, 2025; originally announced May 2025.

  10. arXiv:2505.00619

    cs.CV

    Diverse Semantics-Guided Feature Alignment and Decoupling for Visible-Infrared Person Re-Identification

    Authors: Neng Dong, Shuanglin Yan, Liyan Zhang, Jinhui Tang

    Abstract: Visible-Infrared Person Re-Identification (VI-ReID) is a challenging task due to the large modality discrepancy between visible and infrared images, which complicates the alignment of their features into a suitable common space. Moreover, style noise, such as illumination and color contrast, reduces the identity discriminability and modality invariance of features. To address these challenges, we…

    Submitted 1 May, 2025; originally announced May 2025.

  11. arXiv:2505.00232

    cs.LG cs.AI

    Scaling On-Device GPU Inference for Large Generative Models

    Authors: Jiuqiang Tang, Raman Sarokin, Ekaterina Ignasheva, Grant Jensen, Lin Chen, Juhyun Lee, Andrei Kulik, Matthias Grundmann

    Abstract: Driven by the advancements in generative AI, large machine learning models have revolutionized domains such as image processing, audio synthesis, and speech recognition. While server-based deployments remain the locus of peak performance, the imperative for on-device inference, necessitated by privacy and efficiency considerations, persists. Recognizing GPUs as the on-device ML accelerator with th…

    Submitted 30 April, 2025; originally announced May 2025.

    Comments: to be published in CVPR 2025 Workshop on Efficient and On-Device Generation (EDGE)

  12. arXiv:2504.21347

    cs.AI cs.HC

    IRL Dittos: Embodied Multimodal AI Agent Interactions in Open Spaces

    Authors: Seonghee Lee, Denae Ford, John Tang, Sasa Junuzovic, Asta Roseway, Ed Cutrell, Kori Inkpen

    Abstract: We introduce the In Real Life (IRL) Ditto, an AI-driven embodied agent designed to represent remote colleagues in shared office spaces, creating opportunities for real-time exchanges even in their absence. IRL Ditto offers a unique hybrid experience by allowing in-person colleagues to encounter a digital version of their remote teammates, initiating greetings, updates, or small talk as they might…

    Submitted 30 April, 2025; originally announced April 2025.

    Comments: 8 pages, 3 figures

    ACM Class: H.5.2; I.2.9

  13. arXiv:2504.19607

    cs.RO

    Adaptive Locomotion on Mud through Proprioceptive Sensing of Substrate Properties

    Authors: Shipeng Liu, Jiaze Tang, Siyuan Meng, Feifei Qian

    Abstract: Muddy terrains present significant challenges for terrestrial robots, as subtle changes in composition and water content can lead to large variations in substrate strength and force responses, causing the robot to slip or get stuck. This paper presents a method to estimate mud properties using proprioceptive sensing, enabling a flipper-driven robot to adapt its locomotion through muddy substrates…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: 12 pages, 8 figures

  14. arXiv:2504.19353

    cs.LG cs.AI

    Flow Along the K-Amplitude for Generative Modeling

    Authors: Weitao Du, Shuning Chang, Jiasheng Tang, Yu Rong, Fan Wang, Shengchao Liu

    Abstract: In this work, we propose a novel generative learning paradigm, K-Flow, an algorithm that flows along the $K$-amplitude. Here, $k$ is a scaling parameter that organizes frequency bands (or projected coefficients), and amplitude describes the norm of such projected coefficients. By incorporating the $K$-amplitude decomposition, K-Flow enables flow matching across the scaling parameter as time. We di…

    Submitted 27 April, 2025; originally announced April 2025.

  15. arXiv:2504.19298

    cs.CL

    AndroidGen: Building an Android Language Agent under Data Scarcity

    Authors: Hanyu Lai, Junjie Gao, Xiao Liu, Yifan Xu, Shudan Zhang, Yuxiao Dong, Jie Tang

    Abstract: Large language models have opened up a world of possibilities for various NLP tasks, sparking optimism for the future. Despite their potential, LLMs have yet to be widely used as agents on real mobile devices. The main challenge is the need for high-quality data sources. Time constraints and labor intensity often hinder human annotation. On the other hand, existing LLMs exhibit inadequate completi…

    Submitted 27 April, 2025; originally announced April 2025.

  16. arXiv:2504.18428

    cs.CL

    PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts

    Authors: Yiming Wang, Pei Zhang, Jialong Tang, Haoran Wei, Baosong Yang, Rui Wang, Chenshu Sun, Feitong Sun, Jiran Zhang, Junxuan Wu, Qiqian Cang, Yichang Zhang, Fei Huang, Junyang Lin, Fei Huang, Jingren Zhou

    Abstract: In this paper, we introduce PolyMath, a multilingual mathematical reasoning benchmark covering 18 languages and 4 easy-to-hard difficulty levels. Our benchmark ensures difficulty comprehensiveness, language diversity, and high-quality translation, making it a highly discriminative multilingual mathematical benchmark in the era of reasoning LLMs. We conduct a comprehensive evaluation for advanced L…

    Submitted 30 April, 2025; v1 submitted 25 April, 2025; originally announced April 2025.

    Comments: Work in Progress

  17. arXiv:2504.18020

    cs.CV

    Federated Client-tailored Adapter for Medical Image Segmentation

    Authors: Guyue Hu, Siyuan Song, Yukun Kang, Zhu Yin, Gangming Zhao, Chenglong Li, Jin Tang

    Abstract: Medical image segmentation in X-ray images is beneficial for computer-aided diagnosis and lesion localization. Existing methods mainly fall into a centralized learning paradigm, which is inapplicable in the practical medical scenario that only has access to distributed data islands. Federated Learning has the potential to offer a distributed solution but struggles with heavy training instability d…

    Submitted 24 April, 2025; originally announced April 2025.

  18. arXiv:2504.16473

    cs.AR

    ERASER: Efficient RTL FAult Simulation Framework with Trimmed Execution Redundancy

    Authors: Jiaping Tang, Jianan Mu, Silin Liu, Zizhen Liu, Feng Gu, Xinyu Zhang, Leyan Wang, Shenwen Liang, Jing Ye, Huawei Li, Xiaowei Li

    Abstract: As intelligent computing devices increasingly integrate into human life, ensuring the functional safety of the corresponding electronic chips becomes more critical. A key metric for functional safety is achieving a sufficient fault coverage. To meet this requirement, extensive time-consuming fault simulation of the RTL code is necessary during the chip design phase. The main overhead in RTL fault s…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: 7 pages

  19. arXiv:2504.16423

    cs.HC

    Advancing Radar Hand Gesture Recognition: A Hybrid Spectrum Synthetic Framework Merging Simulation with Neural Networks

    Authors: Jiaqi Tang, Xinbo Xu, Yinsong Xu, Qingchao Chen

    Abstract: Millimeter wave (mmWave) radar sensors play a vital role in hand gesture recognition (HGR) by detecting subtle motions while preserving user privacy. However, the limited scale of radar datasets hinders the performance. Existing synthetic data generation methods fall short in two key areas. On the one hand, modeling-based approaches fail to accurately simulate the wave propagation and reflection a…

    Submitted 23 April, 2025; originally announced April 2025.

  20. arXiv:2504.16339

    cs.AR

    Transitive Array: An Efficient GEMM Accelerator with Result Reuse

    Authors: Cong Guo, Chiyue Wei, Jiaming Tang, Bowen Duan, Song Han, Hai Li, Yiran Chen

    Abstract: Deep Neural Networks (DNNs) and Large Language Models (LLMs) have revolutionized artificial intelligence, yet their deployment faces significant memory and computational challenges, especially in resource-constrained environments. Quantization techniques have mitigated some of these issues by reducing data precision, primarily focusing on General Matrix Multiplication (GEMM). This study introduces…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: ISCA 2025

  21. arXiv:2504.15806

    cs.LG cs.AI

    DAE-KAN: A Kolmogorov-Arnold Network Model for High-Index Differential-Algebraic Equations

    Authors: Kai Luo, Juan Tang, Mingchao Cai, Xiaoqing Zeng, Manqi Xie, Ming Yan

    Abstract: Kolmogorov-Arnold Networks (KANs) have emerged as a promising alternative to Multi-layer Perceptrons (MLPs) due to their superior function-fitting abilities in data-driven modeling. In this paper, we propose a novel framework, DAE-KAN, for solving high-index differential-algebraic equations (DAEs) by integrating KANs with Physics-Informed Neural Networks (PINNs). This framework not only preserves…

    Submitted 23 April, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

  22. arXiv:2504.15796

    cs.CV cs.LG

    Locating and Mitigating Gradient Conflicts in Point Cloud Domain Adaptation via Saliency Map Skewness

    Authors: Jiaqi Tang, Yinsong Xu, Qingchao Chen

    Abstract: Object classification models utilizing point cloud data are fundamental for 3D media understanding, yet they often struggle with unseen or out-of-distribution (OOD) scenarios. Existing point cloud unsupervised domain adaptation (UDA) methods typically employ a multi-task learning (MTL) framework that combines primary classification tasks with auxiliary self-supervision tasks to bridge the gap betw…

    Submitted 22 April, 2025; originally announced April 2025.

  23. arXiv:2504.15615

    cs.LG stat.ML

    Dimension-Free Decision Calibration for Nonlinear Loss Functions

    Authors: Jingwu Tang, Jiayun Wu, Zhiwei Steven Wu, Jiahao Zhang

    Abstract: When model predictions inform downstream decision making, a natural question is under what conditions can the decision-makers simply respond to the predictions as if they were the true outcomes. Calibration suffices to guarantee that simple best-response to predictions is optimal. However, calibration for high-dimensional prediction outcome spaces requires exponential computational and statistical…

    Submitted 22 April, 2025; originally announced April 2025.

  24. arXiv:2504.15046

    cs.AI

    Text-to-Decision Agent: Learning Generalist Policies from Natural Language Supervision

    Authors: Shilin Zhang, Zican Hu, Wenhao Wu, Xinyi Xie, Jianxiang Tang, Chunlin Chen, Daoyi Dong, Yu Cheng, Zhenhong Sun, Zhi Wang

    Abstract: RL systems usually tackle generalization by inferring task beliefs from high-quality samples or warmup explorations. The restricted form limits their generality and usability since these supervision signals are expensive and even infeasible to acquire in advance for unseen tasks. Learning directly from the raw text about decision tasks is a promising alternative to leverage a much broader source o…

    Submitted 22 April, 2025; v1 submitted 21 April, 2025; originally announced April 2025.

    Comments: 18 pages, 8 figures

  25. arXiv:2504.14894

    cs.RO eess.SY

    Never too Cocky to Cooperate: An FIM and RL-based USV-AUV Collaborative System for Underwater Tasks in Extreme Sea Conditions

    Authors: Jingzehua Xu, Guanwen Xie, Jiwei Tang, Yimian Ding, Weiyi Liu, Shuai Zhang, Yi Li

    Abstract: This paper develops a novel unmanned surface vehicle (USV)-autonomous underwater vehicle (AUV) collaborative system designed to enhance underwater task performance in extreme sea conditions. The system integrates a dual strategy: (1) high-precision multi-AUV localization enabled by Fisher information matrix-optimized USV path planning, and (2) reinforcement learning-based cooperative planning and…

    Submitted 21 April, 2025; originally announced April 2025.

  26. arXiv:2504.14877

    cs.CV

    Collaborative Enhancement Network for Low-quality Multi-spectral Vehicle Re-identification

    Authors: Aihua Zheng, Yongqi Sun, Zi Wang, Chenglong Li, Jin Tang

    Abstract: The performance of multi-spectral vehicle Re-identification (ReID) is significantly degraded when some important discriminative cues in visible, near infrared and thermal infrared spectra are lost. Existing methods generate or enhance missing details in low-quality spectra data using the high-quality one, generally called the primary spectrum, but how to justify the primary spectrum is a challengi…

    Submitted 21 April, 2025; originally announced April 2025.

  27. arXiv:2504.14847

    cs.CV

    Reliable Multi-Modal Object Re-Identification via Modality-Aware Graph Reasoning

    Authors: Xixi Wan, Aihua Zheng, Zi Wang, Bo Jiang, Jin Tang, Jixin Ma

    Abstract: Multi-modal data provides abundant and diverse object information, crucial for effective modal interactions in Re-Identification (ReID) tasks. However, existing approaches often overlook the quality variations in local features and fail to fully leverage the complementary information across modalities, particularly in the case of low-quality features. In this paper, we propose to address this issu…

    Submitted 20 April, 2025; originally announced April 2025.

  28. arXiv:2504.14604

    cs.RO

    RoboOcc: Enhancing the Geometric and Semantic Scene Understanding for Robots

    Authors: Zhang Zhang, Qiang Zhang, Wei Cui, Shuai Shi, Yijie Guo, Gang Han, Wen Zhao, Hengle Ren, Renjing Xu, Jian Tang

    Abstract: 3D occupancy prediction enables the robots to obtain spatial fine-grained geometry and semantics of the surrounding scene, and has become an essential task for embodied perception. Existing methods based on 3D Gaussians instead of dense voxels do not effectively exploit the geometry and opacity properties of Gaussians, which limits the network's estimation of complex environments and also limits t…

    Submitted 20 April, 2025; originally announced April 2025.

  29. arXiv:2504.14582

    cs.CV

    NTIRE 2025 Challenge on Image Super-Resolution ($\times$4): Methods and Results

    Authors: Zheng Chen, Kai Liu, Jue Gong, Jingkai Wang, Lei Sun, Zongwei Wu, Radu Timofte, Yulun Zhang, Xiangyu Kong, Xiaoxuan Yu, Hyunhee Park, Suejin Han, Hakjae Jeon, Dafeng Zhang, Hyung-Ju Chun, Donghun Ryou, Inju Ha, Bohyung Han, Lu Zhao, Yuyi Zhang, Pengyu Yan, Jiawei Hu, Pengwei Liu, Fengjun Guo, Hongyuan Yu, et al. (86 additional authors not shown)

    Abstract: This paper presents the NTIRE 2025 image super-resolution ($\times$4) challenge, one of the associated competitions of the 10th NTIRE Workshop at CVPR 2025. The challenge aims to recover high-resolution (HR) images from low-resolution (LR) counterparts generated through bicubic downsampling with a $\times$4 scaling factor. The objective is to develop effective network designs or solutions that ach…

    Submitted 28 April, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

    Comments: NTIRE 2025 webpage: https://www.cvlai.net/ntire/2025. Code: https://github.com/zhengchen1999/NTIRE2025_ImageSR_x4

  30. arXiv:2504.14482

    cs.CL cs.SD

    DialogueAgents: A Hybrid Agent-Based Speech Synthesis Framework for Multi-Party Dialogue

    Authors: Xiang Li, Duyi Pan, Hongru Xiao, Jiale Han, Jing Tang, Jiabao Ma, Wei Wang, Bo Cheng

    Abstract: Speech synthesis is crucial for human-computer interaction, enabling natural and intuitive communication. However, existing datasets involve high construction costs due to manual annotation and suffer from limited character diversity, contextual scenarios, and emotional expressiveness. To address these issues, we propose DialogueAgents, a novel hybrid agent-based speech synthesis framework, which…

    Submitted 20 April, 2025; originally announced April 2025.

    Comments: Accepted by ICME 2025. Dataset and code are publicly available: https://github.com/uirlx/DialogueAgents

  31. arXiv:2504.14423

    cs.CV cs.AI

    Adversarial Attack for RGB-Event based Visual Object Tracking

    Authors: Qiang Chen, Xiao Wang, Haowen Wang, Bo Jiang, Lin Zhu, Dawei Zhang, Yonghong Tian, Jin Tang

    Abstract: Visual object tracking is a crucial research topic in the fields of computer vision and multi-modal fusion. Among various approaches, robust visual tracking that combines RGB frames with Event streams has attracted increasing attention from researchers. While striving for high accuracy and efficiency in tracking, it is also important to explore how to effectively conduct adversarial attacks and de…

    Submitted 19 April, 2025; originally announced April 2025.

  32. arXiv:2504.14326

    cs.NI

    Diffusion-based Dynamic Contract for Federated AI Agent Construction in Mobile Metaverses

    Authors: Jinbo Wen, Jiawen Kang, Yang Zhang, Yue Zhong, Dusit Niyato, Jie Xu, Jianhang Tang, Chau Yuen

    Abstract: Mobile metaverses have attracted significant attention from both academia and industry, which are envisioned as the next-generation Internet, providing users with immersive and ubiquitous metaverse services through mobile devices. Driven by Large Language Models (LLMs) and Vision-Language Models (VLMs), Artificial Intelligence (AI) agents hold the potential to empower the creation, maintenance, an…

    Submitted 19 April, 2025; originally announced April 2025.

  33. arXiv:2504.14147

    cs.IR cs.AI cs.CL

    HF4Rec: Human-Like Feedback-Driven Optimization Framework for Explainable Recommendation

    Authors: Jiakai Tang, Jingsen Zhang, Zihang Tian, Xueyang Feng, Lei Wang, Xu Chen

    Abstract: Recent advancements in explainable recommendation have greatly bolstered user experience by elucidating the decision-making rationale. However, the existing methods actually fail to provide effective feedback signals for potentially better or worse generated explanations due to their reliance on traditional supervised learning paradigms in sparse interaction data. To address these issues, we propo…

    Submitted 18 April, 2025; originally announced April 2025.

  34. arXiv:2504.13925

    cs.HC cs.CY

    TigerGPT: A New AI Chatbot for Adaptive Campus Climate Surveys

    Authors: Jinwen Tang, Songxi Chen, Yi Shang

    Abstract: Campus climate surveys play a pivotal role in capturing how students, faculty, and staff experience university life, yet traditional methods frequently suffer from low participation and minimal follow-up. We present TigerGPT, a new AI chatbot that generates adaptive, context-aware dialogues enriched with visual elements. Through real-time follow-up prompts, empathetic messaging, and flexible topic…

    Submitted 11 April, 2025; originally announced April 2025.

  35. arXiv:2504.13811

    cs.CR cs.LG

    Can LLMs handle WebShell detection? Overcoming Detection Challenges with Behavioral Function-Aware Framework

    Authors: Feijiang Han, Jiaming Zhang, Chuyi Deng, Jianheng Tang, Yunhuai Liu

    Abstract: WebShell attacks, in which malicious scripts are injected into web servers, are a major cybersecurity threat. Traditional machine learning and deep learning methods are hampered by issues such as the need for extensive training data, catastrophic forgetting, and poor generalization. Recently, Large Language Models (LLMs) have gained attention for code-related tasks, but their potential in WebShell…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Under Review

  36. arXiv:2504.13176

    cs.CV

    IMAGGarment-1: Fine-Grained Garment Generation for Controllable Fashion Design

    Authors: Fei Shen, Jian Yu, Cong Wang, Xin Jiang, Xiaoyu Du, Jinhui Tang

    Abstract: This paper presents IMAGGarment-1, a fine-grained garment generation (FGG) framework that enables high-fidelity garment synthesis with precise control over silhouette, color, and logo placement. Unlike existing methods that are limited to single-condition inputs, IMAGGarment-1 addresses the challenges of multi-conditional controllability in personalized fashion design and digital apparel applicati…

    Submitted 17 April, 2025; originally announced April 2025.

  37. arXiv:2504.12576

    cs.CV cs.AI

    CM3AE: A Unified RGB Frame and Event-Voxel/-Frame Pre-training Framework

    Authors: Wentao Wu, Xiao Wang, Chenglong Li, Bo Jiang, Jin Tang, Bin Luo, Qi Liu

    Abstract: Event cameras have attracted increasing attention in recent years due to their advantages in high dynamic range, high temporal resolution, low power consumption, and low latency. Some researchers have begun exploring pre-training directly on event data. Nevertheless, these efforts often fail to establish strong connections with RGB frames, limiting their applicability in multi-modal fusion scenari…

    Submitted 16 April, 2025; originally announced April 2025.

  38. arXiv:2504.12292

    cs.CV cs.AI cs.LG

    SHeaP: Self-Supervised Head Geometry Predictor Learned via 2D Gaussians

    Authors: Liam Schoneveld, Zhe Chen, Davide Davoli, Jiapeng Tang, Saimon Terazawa, Ko Nishino, Matthias Nießner

    Abstract: Accurate, real-time 3D reconstruction of human heads from monocular images and videos underlies numerous visual applications. As 3D ground truth data is hard to come by at scale, previous methods have sought to learn from abundant 2D videos in a self-supervised manner. Typically, this involves the use of differentiable mesh rendering, which is effective but faces limitations. To improve on this, w…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: For video demonstrations and additional materials please see https://nlml.github.io/sheap/

  39. arXiv:2504.11914

    cs.CV

    AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection

    Authors: Yuhao Chao, Jie Liu, Jie Tang, Gangshan Wu

    Abstract: Industrial Anomaly Detection (IAD) poses a formidable challenge due to the scarcity of defective samples, making it imperative to deploy models capable of robust generalization to detect unseen anomalies effectively. Traditional approaches, often constrained by hand-crafted features or domain-specific expert models, struggle to address this limitation, underscoring the need for a paradigm shift. W…

    Submitted 16 April, 2025; originally announced April 2025.

  40. arXiv:2504.10985

    cs.CV

    DMPT: Decoupled Modality-aware Prompt Tuning for Multi-modal Object Re-identification

    Authors: Minghui Lin, Shu Wang, Xiang Wang, Jianhua Tang, Longbin Fu, Zhengrong Zuo, Nong Sang

    Abstract: Current multi-modal object re-identification approaches based on large-scale pre-trained backbones (i.e., ViT) have displayed remarkable progress and achieved excellent performance. However, these methods usually adopt the standard full fine-tuning paradigm, which requires the optimization of considerable backbone parameters, causing extensive computational and storage requirements. In this work,…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

  41. arXiv:2504.10686

    cs.CV eess.IV

    The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Hang Guo, Lei Sun, Zongwei Wu, Radu Timofte, Yawei Li, Yao Zhang, Xinning Chai, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Li Song, Hongyuan Yu, Pufan Xu, Cheng Wan, Zhijuan Huang, Peng Guo, Shuyuan Cui, Chenjun Li, Xuehai Hu, Pan Pan, Xin Zhang, Heng Zhang, Qing Luo, Linyan Jiang, et al. (122 additional authors not shown)

    Abstract: This paper presents a comprehensive review of the NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR). The challenge aimed to advance the development of deep models that optimize key computational metrics, i.e., runtime, parameters, and FLOPs, while achieving a PSNR of at least 26.90 dB on the $\operatorname{DIV2K\_LSDIR\_valid}$ dataset and 26.99 dB on the…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR2025 NTIRE Workshop, Efficient Super-Resolution Challenge Report. 50 pages

  42. arXiv:2504.10358

    cs.CV cs.AI

    FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos

    Authors: Rui Chen, Lei Sun, Jing Tang, Geng Li, Xiangxiang Chu

    Abstract: Recent advances in video generation have posed great challenges in the assessment of AI-generated content, particularly with the emergence of increasingly sophisticated models. The various inconsistencies and defects observed in such videos are inherently complex, making overall scoring notoriously difficult. In this paper, we emphasize the critical importance of integrating fine-grained reasoning…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: 10 pages, 4 figures

  43. arXiv:2504.09461  [pdf, other]

    cs.RO cs.AR

    ADDT -- A Digital Twin Framework for Proactive Safety Validation in Autonomous Driving Systems

    Authors: Bo Yu, Chaoran Yuan, Zishen Wan, Jie Tang, Fadi Kurdahi, Shaoshan Liu

    Abstract: Autonomous driving systems continue to face safety-critical failures, often triggered by rare and unpredictable corner cases that evade conventional testing. We present the Autonomous Driving Digital Twin (ADDT) framework, a high-fidelity simulation platform designed to proactively identify hidden faults, evaluate real-time performance, and validate safety before deployment. ADDT combines realisti…

    Submitted 13 April, 2025; originally announced April 2025.

  44. arXiv:2504.09215  [pdf, other]

    cs.CV cs.MM

    Multi-scale Activation, Refinement, and Aggregation: Exploring Diverse Cues for Fine-Grained Bird Recognition

    Authors: Zhicheng Zhang, Hao Tang, Jinhui Tang

    Abstract: Given the critical role of birds in ecosystems, Fine-Grained Bird Recognition (FGBR) has gained increasing attention, particularly in distinguishing birds within similar subcategories. Although Vision Transformer (ViT)-based methods often outperform Convolutional Neural Network (CNN)-based methods in FGBR, recent studies reveal that the limited receptive field of plain ViT model hinders representa…

    Submitted 12 April, 2025; originally announced April 2025.

    Comments: Accepted by AAAI2025

  45. arXiv:2504.08902  [pdf, other]

    cs.CV cs.LG

    LookingGlass: Generative Anamorphoses via Laplacian Pyramid Warping

    Authors: Pascal Chang, Sergio Sancho, Jingwei Tang, Markus Gross, Vinicius C. Azevedo

    Abstract: Anamorphosis refers to a category of images that are intentionally distorted, making them unrecognizable when viewed directly. Their true form only reveals itself when seen from a specific viewpoint, which can be through some catadioptric device like a mirror or a lens. While the construction of these mathematical devices can be traced back to as early as the 17th century, they are only interpreta…

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: Accepted at CVPR 2025 (Oral)

  46. arXiv:2504.08242  [pdf, other]

    cs.DC cs.AI cs.NI

    Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices

    Authors: Shengyuan Ye, Bei Ouyang, Liekang Zeng, Tianyi Qian, Xiaowen Chu, Jian Tang, Xu Chen

    Abstract: Generative large language models (LLMs) have garnered significant attention due to their exceptional capabilities in various AI tasks. Traditionally deployed in cloud datacenters, LLMs are now increasingly moving towards more accessible edge platforms to protect sensitive user data and ensure privacy preservation. The limited computational resources of individual edge devices, however, can result…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: Accepted by IEEE International Conference on Computer Communications 2025

  47. arXiv:2504.07433  [pdf, other]

    cs.CL

    From Token to Line: Enhancing Code Generation with a Long-Term Perspective

    Authors: Tingwei Lu, Yangning Li, Liyuan Wang, Binghuai Lin, Jiwei Tang, Wanshi Xu, Hai-Tao Zheng, Yinghui Li, Bingxu An, Zhao Wei, Yong Xu

    Abstract: The emergence of large language models (LLMs) has significantly promoted the development of code generation task, sparking a surge in pertinent literature. Current research is hindered by redundant generation results and a tendency to overfit local patterns in the short term. Although existing studies attempt to alleviate the issue by adopting a multi-token prediction strategy, there remains limit…

    Submitted 18 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

  48. arXiv:2504.07282  [pdf, other]

    cs.CL

    RAISE: Reinforenced Adaptive Instruction Selection For Large Language Models

    Authors: Lv Qingsong, Yangning Li, Zihua Lan, Zishan Xu, Jiwei Tang, Yinghui Li, Wenhao Jiang, Hai-Tao Zheng, Philip S. Yu

    Abstract: In the instruction fine-tuning of large language models (LLMs), it has become a consensus that a few high-quality instructions are superior to a large number of low-quality instructions. At present, many instruction selection methods have been proposed, but most of these methods select instruction based on heuristic quality metrics, and only consider data selection before training. These designs l…

    Submitted 14 April, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

  49. arXiv:2504.06803  [pdf, other]

    cs.CV

    DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation

    Authors: Wangbo Zhao, Yizeng Han, Jiasheng Tang, Kai Wang, Hao Luo, Yibing Song, Gao Huang, Fan Wang, Yang You

    Abstract: Diffusion Transformer (DiT), an emerging diffusion model for visual generation, has demonstrated superior performance but suffers from substantial computational costs. Our investigations reveal that these costs primarily stem from the static inference paradigm, which inevitably introduces redundant computation in certain diffusion timesteps and spatial regions. To overcome thi…

    Submitted 16 April, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

    Comments: Extended journal version for ICLR. arXiv admin note: substantial text overlap with arXiv:2410.03456

  50. arXiv:2504.06322  [pdf]

    cs.CY cs.AI cs.HC

    Assessing employment and labour issues implicated by using AI

    Authors: Thijs Willems, Darion Jin Hotan, Jiawen Cheryl Tang, Norakmal Hakim bin Norhashim, King Wang Poon, Zi An Galvyn Goh, Radha Vinod

    Abstract: This chapter critiques the dominant reductionist approach in AI and work studies, which isolates tasks and skills as replaceable components. Instead, it advocates for a systemic perspective that emphasizes the interdependence of tasks, roles, and workplace contexts. Two complementary approaches are proposed: an ethnographic, context-rich method that highlights how AI reconfigures work environments…

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: This manuscript is accepted for publication in Emad Yaghmaei, et al., eds., Global Perspectives on AI Impact Assessment (Oxford University Press, forthcoming 2025)