Showing 1–50 of 4,372 results for author: Zhang, L

  1. arXiv:2505.10493  [pdf, ps, other]

    cs.CL

    CL-RAG: Bridging the Gap in Retrieval-Augmented Generation with Curriculum Learning

    Authors: Shaohan Wang, Licheng Zhang, Zheren Fu, Zhendong Mao

    Abstract: Retrieval-Augmented Generation (RAG) is an effective method to enhance the capabilities of large language models (LLMs). Existing methods focus on optimizing the retriever or generator in the RAG system by directly utilizing the top-k retrieved documents. However, document effectiveness varies significantly across user queries, i.e., some documents provide valuable knowledge while others…

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.10307  [pdf, ps, other]

    cs.LG

    Negative Metric Learning for Graphs

    Authors: Yiyang Zhao, Chengpei Wu, Lilin Zhang, Ning Yang

    Abstract: Graph contrastive learning (GCL) often suffers from false negatives, which degrades the performance on downstream tasks. The existing methods addressing the false negative issue usually rely on human prior knowledge, still leading GCL to suboptimal results. In this paper, we propose a novel Negative Metric Learning (NML) enhanced GCL (NML-GCL). NML-GCL employs a learnable Negative Metric Network (…

    Submitted 15 May, 2025; originally announced May 2025.

  3. arXiv:2505.09907  [pdf]

    cs.LG cs.AI cs.CE

    Avocado Price Prediction Using a Hybrid Deep Learning Model: TCN-MLP-Attention Architecture

    Authors: Linwei Zhang, LuFeng, Ruijia Liang

    Abstract: With the growing demand for healthy foods, agricultural product price forecasting has become increasingly important. Hass avocados, as a high-value crop, exhibit complex price fluctuations influenced by factors such as seasonality, region, and weather. Traditional prediction models often struggle with highly nonlinear and dynamic data. To address this, we propose a hybrid deep learning model, TCN-…

    Submitted 14 May, 2025; originally announced May 2025.

  4. arXiv:2505.09701  [pdf, ps, other]

    cs.CL

    VeriFact: Enhancing Long-Form Factuality Evaluation with Refined Fact Extraction and Reference Facts

    Authors: Xin Liu, Lechen Zhang, Sheza Munir, Yiyang Gu, Lu Wang

    Abstract: Large language models (LLMs) excel at generating long-form responses, but evaluating their factuality remains challenging due to complex inter-sentence dependencies within the generated facts. Prior solutions predominantly follow a decompose-decontextualize-verify pipeline but often fail to capture essential context and miss key relational facts. In this paper, we introduce VeriFact, a factuality…

    Submitted 14 May, 2025; originally announced May 2025.

  5. arXiv:2505.09415  [pdf, other]

    cs.CV

    FaceShield: Explainable Face Anti-Spoofing with Multimodal Large Language Models

    Authors: Hongyang Wang, Yichen Shi, Zhuofu Tao, Yuhao Gao, Liepiao Zhang, Xun Lin, Jun Feng, Xiaochen Yuan, Zitong Yu, Xiaochun Cao

    Abstract: Face anti-spoofing (FAS) is crucial for protecting facial recognition systems from presentation attacks. Previous methods approached this task as a classification problem, lacking interpretability and reasoning behind the predicted results. Recently, multimodal large language models (MLLMs) have shown strong capabilities in perception, reasoning, and decision-making in visual tasks. However, there…

    Submitted 14 May, 2025; originally announced May 2025.

  6. arXiv:2505.09343  [pdf, ps, other]

    cs.DC cs.AI cs.AR

    Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

    Authors: Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Huazuo Gao, Jiashi Li, Liyue Zhang, Panpan Huang, Shangyan Zhou, Shirong Ma, Wenfeng Liang, Ying He, Yuqing Wang, Yuxuan Liu, Y. X. Wei

    Abstract: The rapid scaling of large language models (LLMs) has unveiled critical limitations in current hardware architectures, including constraints in memory capacity, computational efficiency, and interconnection bandwidth. DeepSeek-V3, trained on 2,048 NVIDIA H800 GPUs, demonstrates how hardware-aware model co-design can effectively address these challenges, enabling cost-efficient training and inferen…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive version will appear as part of the Industry Track in Proceedings of the 52nd Annual International Symposium on Computer Architecture (ISCA '25)

  7. arXiv:2505.09193  [pdf, other]

    eess.IV cs.CV

    BiECVC: Gated Diversification of Bidirectional Contexts for Learned Video Compression

    Authors: Wei Jiang, Junru Li, Kai Zhang, Li Zhang

    Abstract: Recent forward prediction-based learned video compression (LVC) methods have achieved impressive results, even surpassing VVC reference software VTM under the Low Delay B (LDB) configuration. In contrast, learned bidirectional video compression (BVC) remains underexplored and still lags behind its forward-only counterparts. This performance gap is mainly due to the limited ability to extract diver…

    Submitted 14 May, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

    Comments: The first learned video codec that surpasses VTM 13.2 RA across all standard test datasets. Code will be available at https://github.com/JiangWeibeta/ECVC

  8. arXiv:2505.09103  [pdf, ps, other]

    cs.RO

    VGC-RIO: A Tightly Integrated Radar-Inertial Odometry with Spatial Weighted Doppler Velocity and Local Geometric Constrained RCS Histograms

    Authors: Jianguang Xiang, Xiaofeng He, Zizhuo Chen, Lilian Zhang, Xincan Luo, Jun Mao

    Abstract: Recent advances in 4D radar-inertial odometry have demonstrated promising potential for autonomous localization in adverse conditions. However, effective handling of sparse and noisy radar measurements remains a critical challenge. In this paper, we propose a radar-inertial odometry with a spatial weighting method that adapts to unevenly distributed points and a novel point-description histogram…

    Submitted 14 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

  9. arXiv:2505.08919  [pdf, ps, other]

    cs.GR cs.AI cs.CV

    Template-Guided Reconstruction of Pulmonary Segments with Neural Implicit Functions

    Authors: Kangxian Xie, Yufei Zhu, Kaiming Kuang, Li Zhang, Hongwei Bran Li, Mingchen Gao, Jiancheng Yang

    Abstract: High-quality 3D reconstruction of pulmonary segments plays a crucial role in segmentectomy and surgical treatment planning for lung cancer. Due to the resolution requirement of the target reconstruction, conventional deep learning-based methods often suffer from computational resource constraints or limited granularity. Conversely, implicit modeling is favored due to its computational efficiency a…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: In revision process

  10. arXiv:2505.08723  [pdf, other]

    cs.CV

    TiMo: Spatiotemporal Foundation Model for Satellite Image Time Series

    Authors: Xiaolei Qin, Di Wang, Jing Zhang, Fengxiang Wang, Xin Su, Bo Du, Liangpei Zhang

    Abstract: Satellite image time series (SITS) provide continuous observations of the Earth's surface, making them essential for applications such as environmental management and disaster assessment. However, existing spatiotemporal foundation models rely on plain vision transformers, which encode entire temporal sequences without explicitly capturing multiscale spatiotemporal relationships between land objec…

    Submitted 13 May, 2025; originally announced May 2025.

  11. arXiv:2505.08686  [pdf, ps, other]

    cs.GR cs.CV cs.LG

    CAD-Coder: Text-Guided CAD Files Code Generation

    Authors: Changqi He, Shuhan Zhang, Liguo Zhang, Jiajun Miao

    Abstract: Computer-aided design (CAD) is a way to digitally create 2D drawings and 3D models of real-world products. Traditional CAD typically relies on hand-drawing by experts or modifications of existing library files, which doesn't allow for rapid personalization. With the emergence of generative artificial intelligence, convenient and efficient personalized CAD generation has become possible. However, e…

    Submitted 13 May, 2025; originally announced May 2025.

    Report number: ICCV 2025 Submission 11025

  12. arXiv:2505.08203  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Not that Groove: Zero-Shot Symbolic Music Editing

    Authors: Li Zhang

    Abstract: Most work in AI music generation has focused on audio, which has seen limited use in the music production industry due to its rigidity. To maximize flexibility while assuming only textual instructions from producers, we are among the first to tackle symbolic music editing. We circumvent the known challenge of lack of labeled data by proving that LLMs with zero-shot prompting can effectively edit drum…

    Submitted 12 May, 2025; originally announced May 2025.

  13. arXiv:2505.08137  [pdf, ps, other]

    cs.LG cs.CL cs.GR cs.MM

    Large Language Models for Computer-Aided Design: A Survey

    Authors: Licheng Zhang, Bach Le, Naveed Akhtar, Siew-Kei Lam, Tuan Ngo

    Abstract: Large Language Models (LLMs) have seen rapid advancements in recent years, with models like ChatGPT and DeepSeek showcasing their remarkable capabilities across diverse domains. While substantial research has been conducted on LLMs in various fields, a comprehensive review focusing on their integration with Computer-Aided Design (CAD) remains notably absent. CAD is the industry standard for 3D mo…

    Submitted 12 May, 2025; originally announced May 2025.

  14. arXiv:2505.07896  [pdf, ps, other]

    q-bio.GN cs.AI

    Bridging Large Language Models and Single-Cell Transcriptomics in Dissecting Selective Motor Neuron Vulnerability

    Authors: Douglas Jiang, Zilin Dai, Luxuan Zhang, Qiyi Yu, Haoqi Sun, Feng Tian

    Abstract: Understanding cell identity and function through single-cell level sequencing data remains a key challenge in computational biology. We present a novel framework that leverages gene-specific textual annotations from the NCBI Gene database to generate biologically contextualized cell embeddings. For each cell in a single-cell RNA sequencing (scRNA-seq) dataset, we rank genes by expression level, re…

    Submitted 11 May, 2025; originally announced May 2025.

  15. arXiv:2505.07882  [pdf, other]

    cs.AI cs.LG

    Enhancing Trust Management System for Connected Autonomous Vehicles Using Machine Learning Methods: A Survey

    Authors: Qian Xu, Lei Zhang, Yixiao Liu

    Abstract: Connected Autonomous Vehicles (CAVs) operate in dynamic, open, and multi-domain networks, rendering them vulnerable to various threats. Trust Management Systems (TMS) systematically organize essential steps in the trust mechanism, identifying malicious nodes against internal threats and external threats, as well as ensuring reliable decision-making for more cooperative tasks. Recent advances in ma…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: 31 pages, 9 figures

  16. arXiv:2505.07721  [pdf, other]

    cs.CV

    Gameplay Highlights Generation

    Authors: Vignesh Edithal, Le Zhang, Ilia Blank, Imran Junejo

    Abstract: In this work, we enable gamers to share their gaming experience on social media by automatically generating eye-catching highlight reels from their gameplay session. Our automation will save time for gamers while increasing audience engagement. We approach the highlight generation problem by first identifying intervals in the video where interesting events occur and then concatenating them. We develo…

    Submitted 12 May, 2025; originally announced May 2025.

  17. arXiv:2505.07692  [pdf, other]

    cs.DB

    ABase: the Multi-Tenant NoSQL Serverless Database for Diverse and Dynamic Workloads in Large-scale Cloud Environments

    Authors: Rong Kang, Yanbin Chen, Ye Liu, Fuxin Jiang, Qingshuo Li, Miao Ma, Jian Liu, Guangliang Zhao, Tieying Zhang, Jianjun Chen, Lei Zhang

    Abstract: Multi-tenant architectures enhance the elasticity and resource utilization of NoSQL databases by allowing multiple tenants to co-locate and share resources. However, in large-scale cloud environments, the diverse and dynamic nature of workloads poses significant challenges for multi-tenant NoSQL databases. Based on our practical observations, we have identified three crucial challenges: (1) the im…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: SIGMOD 2025 accepted

  18. arXiv:2505.07322  [pdf, ps, other]

    cs.CV

    RealRep: Generalized SDR-to-HDR Conversion with Style Disentangled Representation Learning

    Authors: Gang He, Siqi Wang, Kepeng Xu, Lin Zhang

    Abstract: High-Dynamic-Range Wide-Color-Gamut (HDR-WCG) technology is becoming increasingly prevalent, intensifying the demand for converting Standard Dynamic Range (SDR) content to HDR. Existing methods primarily rely on fixed tone mapping operators, which are inadequate for handling SDR inputs with diverse styles commonly found in real-world scenarios. To address this challenge, we propose a generalized S…

    Submitted 12 May, 2025; originally announced May 2025.

  19. arXiv:2505.07247  [pdf, other]

    cs.CL cs.AI

    SAS-Bench: A Fine-Grained Benchmark for Evaluating Short Answer Scoring with Large Language Models

    Authors: Peichao Lai, Kexuan Zhang, Yi Lin, Linyihan Zhang, Feiyang Ye, Jinhao Yan, Yanwei Xu, Conghui He, Yilei Wang, Wentao Zhang, Bin Cui

    Abstract: Subjective Answer Grading (SAG) plays a crucial role in education, standardized testing, and automated assessment systems, particularly for evaluating short-form responses in Short Answer Scoring (SAS). However, existing approaches often produce coarse-grained scores and lack detailed reasoning. Although large language models (LLMs) have demonstrated potential as zero-shot evaluators, they remain…

    Submitted 15 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

  20. arXiv:2505.07208  [pdf, other]

    cs.SE

    An Empirical Study: MEMS as a Static Performance Metric

    Authors: Liwei Zhang, Baoquan Cui, Xutong Ma, Jian Zhang

    Abstract: Static performance estimation is essential during compile-time analysis, yet traditional runtime-based methods are costly and platform-dependent. We investigate mems, the number of memory accesses, as a static and architecture-independent performance metric. We develop a Clang-based automated instrumentation tool that rewrites source code to insert path tracing and mems counting logic. Th…

    Submitted 11 May, 2025; originally announced May 2025.

  21. arXiv:2505.06918  [pdf, other]

    eess.IV cs.CV cs.LG

    Uni-AIMS: AI-Powered Microscopy Image Analysis

    Authors: Yanhui Hong, Nan Wang, Zhiyi Xia, Haoyi Tao, Xi Fang, Yiming Li, Jiankun Wang, Peng Jin, Xiaochen Cai, Shengyu Li, Ziqi Chen, Zezhong Zhang, Guolin Ke, Linfeng Zhang

    Abstract: This paper presents a systematic solution for the intelligent recognition and automatic analysis of microscopy images. We developed a data engine that generates high-quality annotated datasets through a combination of the collection of diverse microscopy images from experiments, synthetic data generation and a human-in-the-loop annotation process. To address the unique challenges of microscopy ima…

    Submitted 11 May, 2025; originally announced May 2025.

  22. arXiv:2505.06512  [pdf, other]

    cs.CV

    HCMA: Hierarchical Cross-model Alignment for Grounded Text-to-Image Generation

    Authors: Hang Wang, Zhi-Qi Cheng, Chenhao Lin, Chao Shen, Lei Zhang

    Abstract: Text-to-image synthesis has progressed to the point where models can generate visually compelling images from natural language prompts. Yet, existing methods often fail to reconcile high-level semantic fidelity with explicit spatial control, particularly in scenes involving multiple objects, nuanced relations, or complex layouts. To bridge this gap, we propose a Hierarchical Cross-Modal Alignment…

    Submitted 14 May, 2025; v1 submitted 10 May, 2025; originally announced May 2025.

    Comments: 10 pages, 4 figures

  23. arXiv:2505.06347  [pdf, ps, other]

    quant-ph cs.AI hep-lat hep-ph

    Quantum State Preparation via Large-Language-Model-Driven Evolution

    Authors: Qing-Hong Cao, Zong-Yue Hou, Ying-Ying Li, Xiaohui Liu, Zhuo-Yang Song, Liang-Qi Zhang, Shutao Zhang, Ke Zhao

    Abstract: We propose an automated framework for quantum circuit design by integrating large-language models (LLMs) with evolutionary optimization to overcome the rigidity, scalability limitations, and expert dependence of traditional ones in variational quantum algorithms. Our approach (FunSearch) autonomously discovers hardware-efficient ansätze with new features of scalability and system-size-independent…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: 6 + 4 pages, 14 figures

    Report number: CPTNP-25-0001

  24. arXiv:2505.06277  [pdf, other]

    eess.SP cs.AI cs.CV cs.NI

    Terahertz Spatial Wireless Channel Modeling with Radio Radiance Field

    Authors: John Song, Lihao Zhang, Feng Ye, Haijian Sun

    Abstract: Terahertz (THz) communication is a key enabler for 6G systems, offering ultra-wide bandwidth and unprecedented data rates. However, THz signal propagation differs significantly from lower-frequency bands due to severe free space path loss, minimal diffraction and specular reflection, and prominent scattering, making conventional channel modeling and pilot-based estimation approaches inefficient. I…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: submitted to IEEE conferences

  25. arXiv:2505.05936  [pdf, ps, other]

    cs.CV

    CGTrack: Cascade Gating Network with Hierarchical Feature Aggregation for UAV Tracking

    Authors: Weihong Li, Xiaoqiong Liu, Heng Fan, Libo Zhang

    Abstract: Recent advancements in visual object tracking have markedly improved the capabilities of unmanned aerial vehicle (UAV) tracking, which is a critical component in real-world robotics applications. While the integration of hierarchical lightweight networks has become a prevalent strategy for enhancing efficiency in UAV tracking, it often results in a significant drop in network capacity, which furth…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: Accepted by ICRA 2025

  26. arXiv:2505.05738  [pdf, ps, other]

    cs.LG cs.AI

    Accurate and Efficient Multivariate Time Series Forecasting via Offline Clustering

    Authors: Yiming Niu, Jinliang Deng, Lulu Zhang, Zimu Zhou, Yongxin Tong

    Abstract: Accurate and efficient multivariate time series (MTS) forecasting is essential for applications such as traffic management and weather prediction, which depend on capturing long-range temporal dependencies and interactions between entities. Existing methods, particularly those based on Transformer architectures, compute pairwise dependencies across all time steps, leading to a computational comple…

    Submitted 8 May, 2025; originally announced May 2025.

  27. arXiv:2505.05007  [pdf, other]

    cs.CV

    Driving with Context: Online Map Matching for Complex Roads Using Lane Markings and Scenario Recognition

    Authors: Xin Bi, Zhichao Li, Yuxuan Xia, Panpan Tong, Lijuan Zhang, Yang Chen, Junsheng Fu

    Abstract: Accurate online map matching is fundamental to vehicle navigation and the activation of intelligent driving functions. Current online map matching methods are prone to errors in complex road networks, especially in multilevel road areas. To address this challenge, we propose an online Standard Definition (SD) map matching method by constructing a Hidden Markov Model (HMM) with multiple probability…

    Submitted 10 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Comments: 9 pages and 12 figures. Under review at IEEE RA-L

  28. arXiv:2505.04967  [pdf, ps, other]

    cs.SI cs.LG

    Community and hyperedge inference in multiple hypergraphs

    Authors: Li Ni, Ziqi Deng, Lin Mu, Lei Zhang, Wenjian Luo, Yiwen Zhang

    Abstract: Hypergraphs, capable of representing high-order interactions via hyperedges, have become a powerful tool for modeling real-world biological and social systems. Inherent relationships within these real-world systems, such as the encoding relationship between genes and their protein products, drive the establishment of interconnections between multiple hypergraphs. Here, we demonstrate how to utiliz…

    Submitted 8 May, 2025; originally announced May 2025.

  29. arXiv:2505.04396  [pdf, other]

    cs.LG physics.ao-ph

    Supporting renewable energy planning and operation with data-driven high-resolution ensemble weather forecast

    Authors: Jingnan Wang, Jie Chao, Shangshang Yang, Congyi Nai, Kaijun Ren, Kefeng Deng, Xi Chen, Yaxin Liu, Hanqiuzi Wen, Ziniu Xiao, Lifeng Zhang, Xiaodong Wang, Jiping Guan, Baoxiang Pan

    Abstract: The planning and operation of renewable energy, especially wind power, depend crucially on accurate, timely, and high-resolution weather information. Coarse-grid global numerical weather forecasts are typically downscaled to meet these requirements, introducing challenges of scale inconsistency, process representation error, computation cost, and entanglement of distinct uncertainty sources from c…

    Submitted 7 May, 2025; originally announced May 2025.

  30. Revolutionizing Newcomers' Onboarding Process in OSS Communities: The Future AI Mentor

    Authors: Xin Tan, Xiao Long, Yinghao Zhu, Lin Shi, Xiaoli Lian, Li Zhang

    Abstract: Onboarding newcomers is vital for the sustainability of open-source software (OSS) projects. To lower barriers and increase engagement, OSS projects have dedicated experts who provide guidance for newcomers. However, timely responses are often hindered by experts' busy schedules. The recent rapid advancements of AI in software engineering have brought opportunities to leverage AI as a substitute f…

    Submitted 7 May, 2025; originally announced May 2025.

  31. arXiv:2505.04180  [pdf, other]

    cs.IR

    Towards Large-scale Generative Ranking

    Authors: Yanhua Huang, Yuqi Chen, Xiong Cao, Rui Yang, Mingliang Qi, Yinghao Zhu, Qingchang Han, Yaowei Liu, Zhaoyu Liu, Xuefeng Yao, Yuting Jia, Leilei Ma, Yinqi Zhang, Taoyu Zhu, Liujie Zhang, Lei Chen, Weihang Chen, Min Zhu, Ruiwen Xu, Lei Zhang

    Abstract: Generative recommendation has recently emerged as a promising paradigm in information retrieval. However, generative ranking systems are still understudied, particularly with respect to their effectiveness and feasibility in large-scale industrial settings. This paper investigates this topic at the ranking stage of Xiaohongshu's Explore Feed, a recommender system that serves hundreds of millions o…

    Submitted 8 May, 2025; v1 submitted 7 May, 2025; originally announced May 2025.

  32. arXiv:2505.04094  [pdf, other]

    cs.CR cs.SE

    SolPhishHunter: Towards Detecting and Understanding Phishing on Solana

    Authors: Ziwei Li, Zigui Jiang, Ming Fang, Jiaxin Chen, Zhiying Wu, Jiajing Wu, Lun Zhang, Zibin Zheng

    Abstract: Solana is a rapidly evolving blockchain platform that has attracted an increasing number of users. However, this growth has also drawn the attention of malicious actors, with some phishers extending their reach into the Solana ecosystem. Unlike platforms such as Ethereum, Solana has distinct designs of accounts and transactions, leading to the emergence of new types of phishing transactions that w…

    Submitted 6 May, 2025; originally announced May 2025.

  33. arXiv:2505.03463  [pdf, other]

    cs.CV physics.med-ph

    Nonperiodic dynamic CT reconstruction using backward-warping INR with regularization of diffeomorphism (BIRD)

    Authors: Muge Du, Zhuozhao Zheng, Wenying Wang, Guotao Quan, Wuliang Shi, Le Shen, Li Zhang, Liang Li, Yinong Liu, Yuxiang Xing

    Abstract: Dynamic computed tomography (CT) reconstruction faces significant challenges in addressing motion artifacts, particularly for nonperiodic rapid movements such as cardiac imaging with fast heart rates. Traditional methods struggle with the extreme limited-angle problems inherent in nonperiodic cases. Deep learning methods have improved performance but face generalization challenges. Recent implicit…

    Submitted 6 May, 2025; originally announced May 2025.

  34. arXiv:2505.03266  [pdf]

    physics.optics cs.IT eess.SP

    Rapid diagnostics of reconfigurable intelligent surfaces using space-time-coding modulation

    Authors: Yi Ning Zheng, Lei Zhang, Xiao Qing Chen, Marco Rossi, Giuseppe Castaldi, Shuo Liu, Tie Jun Cui, Vincenzo Galdi

    Abstract: Reconfigurable intelligent surfaces (RISs) have emerged as a key technology for shaping smart wireless environments in next-generation wireless communication systems. To support the large-scale deployment of RISs, a reliable and efficient diagnostic method is essential to ensure optimal performance. In this work, a robust and efficient approach for RIS diagnostics is proposed using a space-time co…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 30 pages, 6 figures, 1 table, supporting information

  35. arXiv:2505.02831  [pdf, other]

    cs.CV

    No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves

    Authors: Dengyang Jiang, Mengmeng Wang, Liuzhuozheng Li, Lei Zhang, Haoyu Wang, Wei Wei, Guang Dai, Yanning Zhang, Jingdong Wang

    Abstract: Recent studies have demonstrated that learning a meaningful internal representation can both accelerate generative training and enhance the generation quality of diffusion transformers. However, existing approaches need to either introduce an external and complex representation training framework or rely on a large-scale, pre-trained representation foundation model to provide representation…

    Submitted 13 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

    Comments: Self-Representation Alignment for Diffusion Transformers. Code: https://github.com/vvvvvjdy/SRA

  36. arXiv:2505.02648  [pdf, other]

    cs.CV

    MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation

    Authors: Mingcheng Li, Xiaolu Hou, Ziyang Liu, Dingkang Yang, Ziyun Qian, Jiawei Chen, Jinjie Wei, Yue Jiang, Qingyao Xu, Lihua Zhang

    Abstract: Diffusion models have shown excellent performance in text-to-image generation. Nevertheless, existing methods often suffer from performance bottlenecks when handling complex prompts that involve multiple objects, characteristics, and relations. Therefore, we propose a Multi-agent Collaboration-based Compositional Diffusion (MCCD) for text-to-image generation for complex scenes. Specifically, we de…

    Submitted 6 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

  37. arXiv:2505.02086  [pdf, other]

    cs.CE

    A Deep Learning Scheme of Electromagnetic Scattering From Scatterers With Incomplete Profiles

    Authors: Ji-Yuan Wang, Xin-Yue Lou, Liang Zhang, Yun-Chuan Wang, Xiao-Min Pan

    Abstract: A deep learning scheme is proposed to solve the electromagnetic (EM) scattering problems where the profile of the dielectric scatterer of interest is incomplete. As compensation, a limited amount of scattering data is provided, which in principle contains sufficient information associated with the missing part of the profile. The existing solvers can hardly realize the compensation if the k…

    Submitted 4 May, 2025; originally announced May 2025.

  38. arXiv:2505.02064  [pdf, other]

    cs.CV

    RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video

    Authors: Shuhang Xun, Sicheng Tao, Jungang Li, Yibo Shi, Zhixin Lin, Zhanhui Zhu, Yibo Yan, Hanqian Li, Linghao Zhang, Shikang Wang, Yixin Liu, Hanbo Zhang, Ying Ma, Xuming Hu

    Abstract: Multimodal Large Language Models (MLLMs) increasingly excel at perception, understanding, and reasoning. However, current benchmarks inadequately evaluate their ability to perform these tasks continuously in dynamic, real-world environments. To bridge this gap, we introduce RTV-Bench, a fine-grained benchmark for MLLM real-time video analysis. RTV-Bench uses three key principles: (1) Multi-Timesta…

    Submitted 5 May, 2025; v1 submitted 4 May, 2025; originally announced May 2025.

    Comments: 13 pages, 4 figures, 5 tables

  39. arXiv:2505.01831  [pdf, other]

    eess.IV cs.CV

    Multi-Scale Target-Aware Representation Learning for Fundus Image Enhancement

    Authors: Haofan Wu, Yin Huang, Yuqing Wu, Qiuyu Yang, Bingfang Wang, Li Zhang, Muhammad Fahadullah Khan, Ali Zia, M. Saleh Memon, Syed Sohail Bukhari, Abdul Fattah Memon, Daizong Ji, Ya Zhang, Ghulam Mustafa, Yin Fang

    Abstract: High-quality fundus images provide essential anatomical information for clinical screening and ophthalmic disease diagnosis. Yet, due to hardware limitations, operational variability, and patient compliance, fundus images often suffer from low resolution and signal-to-noise ratio. Recent years have witnessed promising progress in fundus image enhancement. However, existing works usually focus on r…

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: Under review at Neural Networks

  40. arXiv:2505.01676  [pdf, other]

    cs.DB cs.SE

    LogDB: Multivariate Log-based Failure Diagnosis for Distributed Databases (Extended from MultiLog)

    Authors: Lingzhe Zhang, Tong Jia, Mengxi Jia, Ying Li

    Abstract: Distributed databases, as the core infrastructure software for internet applications, play a critical role in modern cloud services. However, existing distributed databases frequently experience system failures and performance degradation, often leading to significant economic losses. Log data, naturally generated within systems, can effectively reflect internal system states. In practice, operato…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: An extension of MultiLog

  41. arXiv:2505.01660  [pdf, other]

    cs.LG

    Focal-SAM: Focal Sharpness-Aware Minimization for Long-Tailed Classification

    Authors: Sicong Li, Qianqian Xu, Zhiyong Yang, Zitai Wang, Linchao Zhang, Xiaochun Cao, Qingming Huang

    Abstract: Real-world datasets often follow a long-tailed distribution, making generalization to tail classes difficult. Recent methods resorted to long-tail variants of Sharpness-Aware Minimization (SAM), such as ImbSAM and CC-SAM, to improve generalization by flattening the loss landscape. However, these attempts face a trade-off between computational efficiency and control over the loss landscape. On the…

    Submitted 2 May, 2025; originally announced May 2025.

  42. arXiv:2505.01652  [pdf, other]

    cs.LG cs.AI

    Causally Fair Node Classification on Non-IID Graph Data

    Authors: Yucong Dai, Lu Zhang, Yaowei Hu, Susan Gauch, Yongkai Wu

    Abstract: Fair machine learning seeks to identify and mitigate biases in predictions against unfavorable populations characterized by demographic attributes, such as race and gender. Recently, a few works have extended fairness to graph data, such as social networks, but most of them neglect the causal relationships among data instances. This paper addresses the prevalent challenge in fairness-aware ML algo…

    Submitted 2 May, 2025; originally announced May 2025.

  43. arXiv:2505.01050  [pdf, other]

    cs.CV cs.LG

    Transferable Adversarial Attacks on Black-Box Vision-Language Models

    Authors: Kai Hu, Weichen Yu, Li Zhang, Alexander Robey, Andy Zou, Chengming Xu, Haoqi Hu, Matt Fredrikson

    Abstract: Vision Large Language Models (VLLMs) are increasingly deployed to offer advanced capabilities on inputs comprising both text and images. While prior research has shown that adversarial attacks can transfer from open-source to proprietary black-box models in text-only and vision-only contexts, the extent and effectiveness of such vulnerabilities remain underexplored for VLLMs. We present a comprehe…

    Submitted 2 May, 2025; originally announced May 2025.

  44. arXiv:2505.00979  [pdf, other]

    cs.CL cs.AI

    Synthesize-on-Graph: Knowledgeable Synthetic Data Generation for Continue Pre-training of Large Language Models

    Authors: Xuhui Jiang, Shengjie Ma, Chengjin Xu, Cehao Yang, Liyu Zhang, Jian Guo

    Abstract: Large Language Models (LLMs) have achieved remarkable success but remain data-inefficient, especially when learning from small, specialized corpora with limited and proprietary data. Existing synthetic data generation methods for continue pre-training focus on intra-document content and overlook cross-document knowledge associations, limiting content diversity and depth. We propose Synthetic-on-Gr…

    Submitted 1 May, 2025; originally announced May 2025.

  45. arXiv:2505.00627  [pdf, other]

    cs.CV

    Brain Foundation Models with Hypergraph Dynamic Adapter for Brain Disease Analysis

    Authors: Zhongying Deng, Haoyu Wang, Ziyan Huang, Lipei Zhang, Angelica I. Aviles-Rivero, Chaoyu Liu, Junjun He, Zoe Kourtzi, Carola-Bibiane Schönlieb

    Abstract: Brain diseases, such as Alzheimer's disease and brain tumors, present profound challenges due to their complexity and societal impact. Recent advancements in brain foundation models have shown significant promise in addressing a range of brain-related tasks. However, current brain foundation models are limited by task and data homogeneity, restricted generalization beyond segmentation or classific…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: 35 pages, 4 figures

  46. arXiv:2505.00619  [pdf, other]

    cs.CV

    Diverse Semantics-Guided Feature Alignment and Decoupling for Visible-Infrared Person Re-Identification

    Authors: Neng Dong, Shuanglin Yan, Liyan Zhang, Jinhui Tang

    Abstract: Visible-Infrared Person Re-Identification (VI-ReID) is a challenging task due to the large modality discrepancy between visible and infrared images, which complicates the alignment of their features into a suitable common space. Moreover, style noise, such as illumination and color contrast, reduces the identity discriminability and modality invariance of features. To address these challenges, we…

    Submitted 1 May, 2025; originally announced May 2025.

  47. arXiv:2505.00395  [pdf, other]

    cs.IT

    GAN-based Generator of Adversarial Attack on Intelligent End-to-End Autoencoder-based Communication System

    Authors: Jianyuan Chen, Lin Zhang, Zuwei Chen, Yawen Chen, Hongcheng Zhuang

    Abstract: Deep neural networks have been applied in wireless communication systems to intelligently adapt to dynamically changing channel conditions, while users remain under threat of malicious attacks due to the broadcast nature of wireless channels. However, most attack models require knowledge of the target's details, which makes them difficult to implement in real systems. Our object…

    Submitted 1 May, 2025; originally announced May 2025.

  48. arXiv:2505.00302  [pdf, other]

    cs.LG

    Temporal Attention Evolutional Graph Convolutional Network for Multivariate Time Series Forecasting

    Authors: Xinlong Zhao, Liying Zhang, Tianbo Zou, Yan Zhang

    Abstract: Multivariate time series forecasting enables the prediction of future states by leveraging historical data, thereby facilitating decision-making processes. Each data node in a multivariate time series encompasses a sequence of multiple dimensions. These nodes exhibit interdependent relationships, forming a graph structure. While existing prediction methods often assume a fixed graph structure, man…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: 13 pages, 7 figures

    MSC Class: 68T09 (Primary); 68T07 (Secondary)

  49. arXiv:2505.00236  [pdf]

    cs.LG

    Node2Vec-DGI-EL: A Hierarchical Graph Representation Learning Model for Ingredient-Disease Association Prediction

    Authors: Leifeng Zhang, Xin Dong, Shuaibing Jia, Jianhua Zhang

    Abstract: Traditional Chinese medicine, as an essential component of traditional medicine, contains active ingredients that serve as a crucial source for modern drug development, holding immense therapeutic potential and development value. A multi-layered and complex network is formed from Chinese medicine to diseases and used to predict the potential associations between Chinese medicine ingredients and di…

    Submitted 30 April, 2025; originally announced May 2025.

  50. arXiv:2504.21801  [pdf, other]

    cs.CL cs.AI

    DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition

    Authors: Z. Z. Ren, Zhihong Shao, Junxiao Song, Huajian Xin, Haocheng Wang, Wanjia Zhao, Liyue Zhang, Zhe Fu, Qihao Zhu, Dejian Yang, Z. F. Wu, Zhibin Gou, Shirong Ma, Hongxuan Tang, Yuxuan Liu, Wenjun Gao, Daya Guo, Chong Ruan

    Abstract: We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a ch…

    Submitted 30 April, 2025; originally announced April 2025.