
Showing 1–50 of 2,675 results for author: Sun, Y

Searching in archive cs.
  1. arXiv:2505.10297  [pdf, other]

    cs.LG cs.AI cs.CR

    Defending the Edge: Representative-Attention for Mitigating Backdoor Attacks in Federated Learning

    Authors: Chibueze Peace Obioma, Youcheng Sun, Mustafa A. Mustafa

    Abstract: Federated learning (FL) enhances privacy and reduces communication cost for resource-constrained edge clients by supporting distributed model training at the edge. However, the heterogeneous nature of such devices produces diverse, non-independent and identically distributed (non-IID) data, making the detection of backdoor attacks more challenging. In this paper, we propose a novel federated repr…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: Submitted to ESORICS 2025

  2. arXiv:2505.09965  [pdf, ps, other]

    cs.CV

    MambaControl: Anatomy Graph-Enhanced Mamba ControlNet with Fourier Refinement for Diffusion-Based Disease Trajectory Prediction

    Authors: Hao Yang, Tao Tan, Shuai Tan, Weiqin Yang, Kunyan Cai, Calvin Chen, Yue Sun

    Abstract: Modelling disease progression in precision medicine requires capturing complex spatio-temporal dynamics while preserving anatomical integrity. Existing methods often struggle with longitudinal dependencies and structural consistency in progressive disorders. To address these limitations, we introduce MambaControl, a novel framework that integrates selective state-space modelling with diffusion pro…

    Submitted 15 May, 2025; originally announced May 2025.

  3. arXiv:2505.09659  [pdf, ps, other]

    cs.LG cs.CL

    LAS: Loss-less ANN-SNN Conversion for Fully Spike-Driven Large Language Models

    Authors: Long Chen, Xiaotian Song, Yanan Sun

    Abstract: Spiking Large Language Models (LLMs) have emerged as an energy-efficient alternative to conventional LLMs through their event-driven computation. To effectively obtain spiking LLMs, researchers develop different ANN-to-SNN conversion methods by leveraging pre-trained ANN parameters while inheriting the energy efficiency of SNN. However, existing conversion methods struggle with extreme activation…

    Submitted 14 May, 2025; originally announced May 2025.

  4. arXiv:2505.09284  [pdf, ps, other]

    cs.LG stat.ML

    Generating Full-field Evolution of Physical Dynamics from Irregular Sparse Observations

    Authors: Panqi Chen, Yifan Sun, Lei Cheng, Yang Yang, Weichang Li, Yang Liu, Weiqing Liu, Jiang Bian, Shikai Fang

    Abstract: Modeling and reconstructing multidimensional physical dynamics from sparse and off-grid observations presents a fundamental challenge in scientific research. Recently, diffusion-based generative modeling shows promising potential for physical simulation. However, current approaches typically operate on on-grid data with preset spatiotemporal resolution, but struggle with the sparsely observed and…

    Submitted 14 May, 2025; originally announced May 2025.

  5. arXiv:2505.08915  [pdf, ps, other]

    cs.LG cond-mat.dis-nn cond-mat.stat-mech

    An Analytical Characterization of Sloppiness in Neural Networks: Insights from Linear Models

    Authors: Jialin Mao, Itay Griniasty, Yan Sun, Mark K. Transtrum, James P. Sethna, Pratik Chaudhari

    Abstract: Recent experiments have shown that training trajectories of multiple deep neural networks with different architectures, optimization algorithms, hyper-parameter settings, and regularization methods evolve on a remarkably low-dimensional "hyper-ribbon-like" manifold in the space of probability distributions. Inspired by the similarities in the training trajectories of deep networks and linear netwo…

    Submitted 13 May, 2025; originally announced May 2025.

  6. arXiv:2505.08838  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    Ultrasound Report Generation with Multimodal Large Language Models for Standardized Texts

    Authors: Peixuan Ge, Tongkun Su, Faqin Lv, Baoliang Zhao, Peng Zhang, Chi Hong Wong, Liang Yao, Yu Sun, Zenan Wang, Pak Kin Wong, Ying Hu

    Abstract: Ultrasound (US) report generation is a challenging task due to the variability of US images, operator dependence, and the need for standardized text. Unlike X-ray and CT, US imaging lacks consistent datasets, making automation difficult. In this study, we propose a unified framework for multi-organ and multilingual US report generation, integrating fragment-based multilingual training and leveragi…

    Submitted 13 May, 2025; originally announced May 2025.

  7. arXiv:2505.08295  [pdf, ps, other]

    cs.LG cs.AI

    A Practical Introduction to Deep Reinforcement Learning

    Authors: Yinghan Sun, Hongxi Wang, Hua Chen, Wei Zhang

    Abstract: Deep reinforcement learning (DRL) has emerged as a powerful framework for solving sequential decision-making problems, achieving remarkable success in a wide range of applications, including game AI, autonomous driving, biomedicine, and large language models. However, the diversity of algorithms and the complexity of theoretical foundations often pose significant challenges for beginners seeking t…

    Submitted 13 May, 2025; originally announced May 2025.

  8. Will Your Next Pair Programming Partner Be Human? An Empirical Evaluation of Generative AI as a Collaborative Teammate in a Semester-Long Classroom Setting

    Authors: Wenhan Lyu, Yimeng Wang, Yifan Sun, Yixuan Zhang

    Abstract: Generative AI (GenAI), especially Large Language Models (LLMs), is rapidly reshaping both programming workflows and computer science education. Many programmers now incorporate GenAI tools into their workflows, including for collaborative coding tasks such as pair programming. While prior research has demonstrated the benefits of traditional pair programming and begun to explore GenAI-assisted cod…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: Accepted by Learning @ Scale 2025

  9. arXiv:2505.07674  [pdf]

    cs.LG

    Joint Graph Convolution and Sequential Modeling for Scalable Network Traffic Estimation

    Authors: Nan Jiang, Wenxuan Zhu, Xu Han, Weiqiang Huang, Yumeng Sun

    Abstract: This study focuses on the challenge of predicting network traffic within complex topological environments. It introduces a spatiotemporal modeling approach that integrates Graph Convolutional Networks (GCN) with Gated Recurrent Units (GRU). The GCN component captures spatial dependencies among network nodes, while the GRU component models the temporal evolution of traffic data. This combination al…

    Submitted 12 May, 2025; originally announced May 2025.

  10. arXiv:2505.07546  [pdf, ps, other]

    cs.IR cs.AI

    GRADA: Graph-based Reranker against Adversarial Documents Attack

    Authors: Jingjie Zheng, Aryo Pradipta Gema, Giwon Hong, Xuanli He, Pasquale Minervini, Youcheng Sun, Qiongkai Xu

    Abstract: Retrieval Augmented Generation (RAG) frameworks improve the accuracy of large language models (LLMs) by integrating external knowledge from retrieved documents, thereby overcoming the limitations of models' static intrinsic knowledge. However, these systems are susceptible to adversarial attacks that manipulate the retrieval process by introducing documents that are adversarial yet semantically si…

    Submitted 12 May, 2025; originally announced May 2025.

  11. arXiv:2505.07396  [pdf, ps, other]

    cs.CV cs.LG

    TUM2TWIN: Introducing the Large-Scale Multimodal Urban Digital Twin Benchmark Dataset

    Authors: Olaf Wysocki, Benedikt Schwab, Manoj Kumar Biswanath, Michael Greza, Qilin Zhang, Jingwei Zhu, Thomas Froech, Medhini Heeramaglore, Ihab Hijazi, Khaoula Kanna, Mathias Pechinger, Zhaiyu Chen, Yao Sun, Alejandro Rueda Segura, Ziyang Xu, Omar AbdelGafar, Mansour Mehranfar, Chandan Yeshwanth, Yueh-Cheng Liu, Hadi Yazdi, Jiapan Wang, Stefan Auer, Katharina Anders, Klaus Bogenberger, Andre Borrmann, et al. (9 additional authors not shown)

    Abstract: Urban Digital Twins (UDTs) have become essential for managing cities and integrating complex, heterogeneous data from diverse sources. Creating UDTs involves challenges at multiple process stages, including acquiring accurate 3D source data, reconstructing high-fidelity 3D models, maintaining models' updates, and ensuring seamless interoperability to downstream tasks. Current datasets are usually…

    Submitted 13 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

    Comments: Submitted to the ISPRS Journal of Photogrammetry and Remote Sensing

  12. arXiv:2505.06896  [pdf, ps, other]

    cs.DC stat.CO

    RCOMPSs: A Scalable Runtime System for R Code Execution on Manycore Systems

    Authors: Xiran Zhang, Javier Conejero, Sameh Abdulah, Jorge Ejarque, Ying Sun, Rosa M. Badia, David E. Keyes, Marc G. Genton

    Abstract: R has become a cornerstone of scientific and statistical computing due to its extensive package ecosystem, expressive syntax, and strong support for reproducible analysis. However, as data sizes and computational demands grow, native R parallelism support remains limited. This paper presents RCOMPSs, a scalable runtime system that enables efficient parallel execution of R applications on multicore…

    Submitted 11 May, 2025; originally announced May 2025.

  13. arXiv:2505.06263  [pdf]

    eess.SP cs.LG

    From Biometrics to Environmental Control: AI-Enhanced Digital Twins for Personalized Health Interventions in Healing Landscapes

    Authors: Yiping Meng, Yiming Sun

    Abstract: The dynamic nature of human health and comfort calls for adaptive systems that respond to individual physiological needs in real time. This paper presents an AI-enhanced digital twin framework that integrates biometric signals, specifically electrocardiogram (ECG) data, with environmental parameters such as temperature, humidity, and ventilation. Leveraging IoT-enabled sensors and biometric monito…

    Submitted 4 May, 2025; originally announced May 2025.

  14. arXiv:2505.06152  [pdf, ps, other]

    cs.CV cs.AI

    MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks

    Authors: Wenqi Zeng, Yuqi Sun, Chenxi Ma, Weimin Tan, Bo Yan

    Abstract: Medical vision-language models (VLMs) have shown promise as clinical assistants across various medical fields. However, specialized dermatology VLM capable of delivering professional and detailed diagnostic analysis remains underdeveloped, primarily due to less specialized text descriptions in current dermatology multimodal datasets. To address this issue, we propose MM-Skin, the first large-scale…

    Submitted 9 May, 2025; originally announced May 2025.

  15. arXiv:2505.06145  [pdf]

    cs.CL cs.LG

    Towards Robust Few-Shot Text Classification Using Transformer Architectures and Dual Loss Strategies

    Authors: Xu Han, Yumeng Sun, Weiqiang Huang, Hongye Zheng, Junliang Du

    Abstract: Few-shot text classification has important application value in low-resource environments. This paper proposes a strategy that combines adaptive fine-tuning, contrastive learning, and regularization optimization to improve the classification performance of Transformer-based models. Experiments on the FewRel 2.0 dataset show that T5-small, DeBERTa-v3, and RoBERTa-base perform well in few-shot tasks…

    Submitted 9 May, 2025; originally announced May 2025.

  16. arXiv:2505.04649  [pdf, ps, other]

    cs.CL

    FRAME: Feedback-Refined Agent Methodology for Enhancing Medical Research Insights

    Authors: Chengzhang Yu, Yiming Zhang, Zhixin Liu, Zenghui Ding, Yining Sun, Zhanpeng Jin

    Abstract: The automation of scientific research through large language models (LLMs) presents significant opportunities but faces critical challenges in knowledge synthesis and quality assurance. We introduce Feedback-Refined Agent Methodology (FRAME), a novel framework that enhances medical paper generation through iterative refinement and structured feedback. Our approach comprises three key innovations:…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 12 pages, 4 figures, 5 tables

  17. arXiv:2505.04519  [pdf, other]

    cs.CL

    Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs

    Authors: Yehui Tang, Yichun Yin, Yaoyuan Wang, Hang Zhou, Yu Pan, Wei Guo, Ziyang Zhang, Miao Rang, Fangcheng Liu, Naifu Zhang, Binghan Li, Yonghan Dong, Xiaojun Meng, Yasheng Wang, Dong Li, Yin Li, Dandan Tu, Can Chen, Youliang Yan, Fisher Yu, Ruiming Tang, Yunhe Wang, Botian Huang, Bo Wang, Boxiao Liu, et al. (49 additional authors not shown)

    Abstract: Sparse large language models (LLMs) with Mixture of Experts (MoE) and close to a trillion parameters are dominating the realm of most capable language models. However, the massive model scale poses significant challenges for the underlying software and hardware systems. In this paper, we aim to uncover a recipe to harness such scale on Ascend NPUs. The key goals are better usage of the computing r…

    Submitted 7 May, 2025; originally announced May 2025.

  18. arXiv:2505.04486  [pdf, other]

    cs.CV cs.AI cs.LG

    Efficient Flow Matching using Latent Variables

    Authors: Anirban Samaddar, Yixuan Sun, Viktor Nilsson, Sandeep Madireddy

    Abstract: Flow matching models have shown great potential in image generation tasks among probabilistic generative models. Building upon the ideas of continuous normalizing flows, flow matching models generalize the transport path of the diffusion models from a simple prior distribution to the data. Most flow matching models in the literature do not explicitly model the underlying structure/manifold in the…

    Submitted 7 May, 2025; originally announced May 2025.

  19. arXiv:2505.04232  [pdf, ps, other]

    cs.IT

    Binary Reconstruction Codes for Correcting One Deletion and One Substitution

    Authors: Yuling Li, Yubo Sun, Gennian Ge

    Abstract: In this paper, we investigate binary reconstruction codes capable of correcting one deletion and one substitution. We define the single-deletion single-substitution ball function $\mathcal{B}$ as a mapping from a sequence to the set of sequences that can be derived from it by performing one deletion and one substitution. A binary $(n,N;\mathcal{B})$-reconstruction code is defined a…

    Submitted 7 May, 2025; originally announced May 2025.

  20. arXiv:2505.04046  [pdf, other]

    cs.LG cs.CR cs.CV

    Reliable Disentanglement Multi-view Learning Against View Adversarial Attacks

    Authors: Xuyang Wang, Siyuan Duan, Qizhi Li, Guiduo Duan, Yuan Sun, Dezhong Peng

    Abstract: Recently, trustworthy multi-view learning has attracted extensive attention because evidence learning can provide reliable uncertainty estimation to enhance the credibility of multi-view predictions. Existing trusted multi-view learning methods implicitly assume that multi-view data is secure. In practice, however, in safety-sensitive applications such as autonomous driving and security monitoring…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 11 pages, 11 figures, accepted by International Joint Conference on Artificial Intelligence (IJCAI 2025)

  21. arXiv:2505.03621  [pdf, other]

    cs.CV

    PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing

    Authors: Yiping Xie, Bo Zhao, Mingtong Dai, Jian-Ping Zhou, Yue Sun, Tao Tan, Weicheng Xie, Linlin Shen, Zitong Yu

    Abstract: Remote photoplethysmography (rPPG) enables non-contact physiological measurement but remains highly susceptible to illumination changes, motion artifacts, and limited temporal modeling. Large Language Models (LLMs) excel at capturing long-range dependencies, offering a potential solution but struggle with the continuous, noise-sensitive nature of rPPG signals due to their text-centric design. To b…

    Submitted 6 May, 2025; originally announced May 2025.

  22. arXiv:2505.03114  [pdf, other]

    cs.CV

    Path and Bone-Contour Regularized Unpaired MRI-to-CT Translation

    Authors: Teng Zhou, Jax Luo, Yuping Sun, Yiheng Tan, Shun Yao, Nazim Haouchine, Scott Raymond

    Abstract: Accurate MRI-to-CT translation promises the integration of complementary imaging information without the need for additional imaging sessions. Given the practical challenges associated with acquiring paired MRI and CT scans, the development of robust methods capable of leveraging unpaired datasets is essential for advancing the MRI-to-CT translation. Current unpaired MRI-to-CT translation methods,…

    Submitted 5 May, 2025; originally announced May 2025.

  23. arXiv:2505.02862  [pdf, ps, other]

    cs.CL cs.AI

    Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs

    Authors: Haoming Yang, Ke Ma, Xiaojun Jia, Yingfei Sun, Qianqian Xu, Qingming Huang

    Abstract: Despite the remarkable performance of Large Language Models (LLMs), they remain vulnerable to jailbreak attacks, which can compromise their safety mechanisms. Existing studies often rely on brute-force optimization or manual design, failing to uncover potential risks in real-world scenarios. To address this, we propose a novel jailbreak attack framework, ICRT, inspired by heuristics and biases in…

    Submitted 3 May, 2025; originally announced May 2025.

  24. Spatiotemporal Non-Uniformity-Aware Online Task Scheduling in Collaborative Edge Computing for Industrial Internet of Things

    Authors: Yang Li, Xing Zhang, Yukun Sun, Wenbo Wang, Bo Lei

    Abstract: Mobile edge computing mitigates the shortcomings of cloud computing caused by unpredictable wide-area network latency and serves as a critical enabling technology for the Industrial Internet of Things (IIoT). Unlike cloud computing, mobile edge networks offer limited and distributed computing resources. As a result, collaborative edge computing emerges as a promising technology that enhances edge…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: Accepted to IEEE Transactions on Mobile Computing

  25. Robust Duality Learning for Unsupervised Visible-Infrared Person Re-Identification

    Authors: Yongxiang Li, Yuan Sun, Yang Qin, Dezhong Peng, Xi Peng, Peng Hu

    Abstract: Unsupervised visible-infrared person re-identification (UVI-ReID) aims to retrieve pedestrian images across different modalities without costly annotations, but faces challenges due to the modality gap and lack of supervision. Existing methods often adopt self-training with clustering-generated pseudo-labels but implicitly assume these labels are always correct. In practice, however, this assumpti…

    Submitted 6 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

  26. arXiv:2505.02152  [pdf, other]

    cs.RO

    Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions

    Authors: Cunxin Fan, Xiaosong Jia, Yihang Sun, Yixiao Wang, Jianglan Wei, Ziyang Gong, Xiangyu Zhao, Masayoshi Tomizuka, Xue Yang, Junchi Yan, Mingyu Ding

    Abstract: Vision-Language-Action (VLA) models have shown great promise for generalist robotic manipulation in the physical world. However, existing models are restricted to robot observations and text-only instructions, lacking the flexibility of interleaved multimodal instructions enabled by recent advances in foundation models in the digital world. In this paper, we present Interleave-VLA, the first frame…

    Submitted 4 May, 2025; originally announced May 2025.

  27. Multi-Scale Graph Learning for Anti-Sparse Downscaling

    Authors: Yingda Fan, Runlong Yu, Janet R. Barclay, Alison P. Appling, Yiming Sun, Yiqun Xie, Xiaowei Jia

    Abstract: Water temperature can vary substantially even across short distances within the same sub-watershed. Accurate prediction of stream water temperature at fine spatial resolutions (i.e., fine scales, $\leq$ 1 km) enables precise interventions to maintain water quality and protect aquatic habitats. Although spatiotemporal models have made substantial progress in spatially coarse time series modeling, c…

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: AAAI-25, Multi-scale deep learning approach for spatial downscaling of geospatial data with sparse observations

    MSC Class: 68T05; 68U05

    ACM Class: I.2.6; I.2.10

    Journal ref: AAAI-25, pages 27969-27977, 2025

  28. arXiv:2505.01267  [pdf, other]

    cs.CV

    Diffusion-based Adversarial Purification from the Perspective of the Frequency Domain

    Authors: Gaozheng Pei, Ke Ma, Yingfei Sun, Qianqian Xu, Qingming Huang

    Abstract: The diffusion-based adversarial purification methods attempt to drown adversarial perturbations into a part of isotropic noise through the forward process, and then recover the clean images through the reverse process. Due to the lack of distribution information about adversarial perturbations in the pixel domain, it is often unavoidable to damage normal semantics. We turn to the frequency domain…

    Submitted 2 May, 2025; originally announced May 2025.

  29. arXiv:2505.01008  [pdf, other]

    cs.LG

    Where's the liability in the Generative Era? Recovery-based Black-Box Detection of AI-Generated Content

    Authors: Haoyue Bai, Yiyou Sun, Wei Cheng, Haifeng Chen

    Abstract: The recent proliferation of photorealistic images created by generative models has sparked both excitement and concern, as these images are increasingly indistinguishable from real ones to the human eye. While offering new creative and commercial possibilities, the potential for misuse, such as in misinformation and fraud, highlights the need for effective detection methods. Current detection appr…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: CVPR 2025

  30. arXiv:2504.21444  [pdf, other]

    cs.NI

    A Unified QoS-Aware Multiplexing Framework for Next Generation Immersive Communication with Legacy Wireless Applications

    Authors: Jihong Li, Shunqing Zhang, Tao Yu, Guangjin Pan, Kaixuan Huang, Xiaojing Chen, Yanzan Sun, Junyu Liu, Jiandong Li, Derrick Wing Kwan Ng

    Abstract: Immersive communication, including emerging augmented reality, virtual reality, and holographic telepresence, has been identified as a key service for enabling next-generation wireless applications. To align with legacy wireless applications, such as enhanced mobile broadband or ultra-reliable low-latency communication, network slicing has been widely adopted. However, attempting to statistically…

    Submitted 2 May, 2025; v1 submitted 30 April, 2025; originally announced April 2025.

  31. arXiv:2504.21187  [pdf, other]

    cs.LG

    LIFT: LLM-Based Pragma Insertion for HLS via GNN Supervised Fine-Tuning

    Authors: Neha Prakriya, Zijian Ding, Yizhou Sun, Jason Cong

    Abstract: FPGAs are increasingly adopted in datacenter environments for their reconfigurability and energy efficiency. High-Level Synthesis (HLS) tools have eased FPGA programming by raising the abstraction level from RTL to untimed C/C++, yet attaining high performance still demands expert knowledge and iterative manual insertion of optimization pragmas to modify the microarchitecture. To address this chal…

    Submitted 29 April, 2025; originally announced April 2025.

  32. arXiv:2504.20460  [pdf, ps, other]

    cs.IT

    Sequence Reconstruction under Channels with Multiple Bursts of Insertions or Deletions

    Authors: Zhaojun Lan, Yubo Sun, Wenjun Yu, Gennian Ge

    Abstract: The sequence reconstruction problem involves a model where a sequence is transmitted over several identical channels. This model investigates the minimum number of channels required for the unique reconstruction of the transmitted sequence. Levenshtein established that this number exceeds the maximum size of the intersection between the error balls of any two distinct transmitted sequences by one.…

    Submitted 29 April, 2025; originally announced April 2025.

  33. arXiv:2504.19441  [pdf, ps, other]

    cs.IT eess.SP

    Age of Information Analysis for NOMA-Assisted Grant-Free Transmissions with Randomly Arrived Packets

    Authors: Yanshi Sun, Yanglin Ye, Caihong Kai, Zhiguo Ding, Bin Chen

    Abstract: This paper investigates the application of non-orthogonal multiple access (NOMA) to grant-free transmissions to reduce the age of information (AoI) in uplink status update systems, where multiple sources upload their status updates to a common receiver. Unlike existing studies which adopted the idealized generate-at-will (GAW) model, i.e., a status update data can be generated and transmit…

    Submitted 27 April, 2025; originally announced April 2025.

  34. arXiv:2504.19432  [pdf, other]

    cs.CV cs.AI

    EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation

    Authors: Zhe Dong, Yuzhe Sun, Tianzhu Liu, Wangmeng Zuo, Yanfeng Gu

    Abstract: Satellite imagery and maps, as two fundamental data modalities in remote sensing, offer direct observations of the Earth's surface and human-interpretable geographic abstractions, respectively. The task of bidirectional translation between satellite images and maps (BSMT) holds significant potential for applications in urban planning and disaster response. However, this task presents two major cha…

    Submitted 27 April, 2025; originally announced April 2025.

  35. arXiv:2504.18765  [pdf, other]

    cs.AI

    A Vision for Auto Research with LLM Agents

    Authors: Chengwei Liu, Chong Wang, Jiayue Cao, Jingquan Ge, Kun Wang, Lvye Zhang, Ming-Ming Cheng, Penghai Zhao, Tianlin Li, Xiaojun Jia, Xiang Li, Xinfeng Li, Yang Liu, Yebo Feng, Yihao Huang, Yijia Xu, Yuqiang Sun, Zhenhong Zhou, Zhengzi Xu

    Abstract: This paper introduces Agent-Based Auto Research, a structured multi-agent framework designed to automate, coordinate, and optimize the full lifecycle of scientific research. Leveraging the capabilities of large language models (LLMs) and modular agent collaboration, the system spans all major research phases, including literature review, ideation, methodology planning, experimentation, paper writi…

    Submitted 25 April, 2025; originally announced April 2025.

  36. arXiv:2504.18022  [pdf, other]

    cs.IT eess.SY

    Iterative Joint Detection of Kalman Filter and Channel Decoder for Sensor-to-Controller Link in Wireless Networked Control Systems

    Authors: Jinnan Piao, Dong Li, Yiming Sun, Zhibo Li, Ming Yang, Xueting Yu

    Abstract: In this letter, we propose an iterative joint detection algorithm of Kalman filter (KF) and channel decoder for the sensor-to-controller link of wireless networked control systems, which utilizes the prior information of control system to improve the control and communication performance. In the algorithm, we first use the KF to estimate the probability density of the control system outputs and ca…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: 5 pages, 4 figures

  37. arXiv:2504.17828  [pdf, other]

    cs.CV cs.AI

    VEU-Bench: Towards Comprehensive Understanding of Video Editing

    Authors: Bozheng Li, Yongliang Wu, Yi Lu, Jiashuo Yu, Licheng Tang, Jiawang Cao, Wenqing Zhu, Yuyang Sun, Jay Wu, Wenbo Zhu

    Abstract: Widely shared videos on the internet are often edited. Recently, although Video Large Language Models (Vid-LLMs) have made great progress in general video understanding tasks, their capabilities in video editing understanding (VEU) tasks remain unexplored. To address this gap, in this paper, we introduce VEU-Bench (Video Editing Understanding Benchmark), a comprehensive benchmark that categorizes…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: Accepted to CVPR2025

  38. arXiv:2504.17075  [pdf, other]

    cs.CL cs.CY

    Agree to Disagree? A Meta-Evaluation of LLM Misgendering

    Authors: Arjun Subramonian, Vagrant Gautam, Preethi Seshadri, Dietrich Klakow, Kai-Wei Chang, Yizhou Sun

    Abstract: Numerous methods have been proposed to measure LLM misgendering, including probability-based evaluations (e.g., automatically with templatic sentences) and generation-based evaluations (e.g., with automatic heuristics or human validation). However, it has gone unexamined whether these evaluation methods have convergent validity, that is, whether their results align. Therefore, we conduct a systema…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: Work in progress

  39. arXiv:2504.16918  [pdf, other]

    cs.CL cs.AI

    OptimAI: Optimization from Natural Language Using LLM-Powered AI Agents

    Authors: Raghav Thind, Youran Sun, Ling Liang, Haizhao Yang

    Abstract: Optimization plays a vital role in scientific research and practical applications, but formulating a concrete optimization problem described in natural language into a mathematical form and selecting a suitable solver to solve the problem requires substantial domain expertise. We introduce OptimAI, a framework for solving Optimization problems described in natural language by…

    Submitted 23 April, 2025; originally announced April 2025.

  40. arXiv:2504.16374  [pdf, other]

    cs.RO

    DPGP: A Hybrid 2D-3D Dual Path Potential Ghost Probe Zone Prediction Framework for Safe Autonomous Driving

    Authors: Weiming Qu, Jiawei Du, Shenghai Yuan, Jia Wang, Yang Sun, Shengyi Liu, Yuanhao Zhu, Jianfeng Yu, Song Cao, Rui Xia, Xiaoyu Tang, Xihong Wu, Dingsheng Luo

    Abstract: Modern robots must coexist with humans in dense urban environments. A key challenge is the ghost probe problem, where pedestrians or objects unexpectedly rush into traffic paths. This issue affects both autonomous vehicles and human drivers. Existing works propose vehicle-to-everything (V2X) strategies and non-line-of-sight (NLOS) imaging for ghost probe zone detection. However, most require high…

    Submitted 22 April, 2025; originally announced April 2025.

  41. arXiv:2504.16172  [pdf, other]

    math.NA cs.AI cs.LG math.PR stat.ML

    Physics-Informed Inference Time Scaling via Simulation-Calibrated Scientific Machine Learning

    Authors: Zexi Fan, Yan Sun, Shihao Yang, Yiping Lu

    Abstract: High-dimensional partial differential equations (PDEs) pose significant computational challenges across fields ranging from quantum chemistry to economics and finance. Although scientific machine learning (SciML) techniques offer approximate solutions, they often suffer from bias and neglect crucial physical insights. Inspired by inference-time scaling strategies in language models, we propose Sim…

    Submitted 25 April, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

  42. arXiv:2504.16142  [pdf]

    eess.SP cs.AI cs.LG

    A Non-Invasive Load Monitoring Method for Edge Computing Based on MobileNetV3 and Dynamic Time Regulation

    Authors: Hangxu Liu, Yaojie Sun, Yu Wang

    Abstract: In recent years, non-intrusive load monitoring (NILM) technology has attracted much attention in the related research field by virtue of its unique advantage of utilizing single meter data to achieve accurate decomposition of device-level energy consumption. Cutting-edge methods based on machine learning and deep learning have achieved remarkable results in load decomposition accuracy by fusing ti…

    Submitted 22 April, 2025; originally announced April 2025.

  43. arXiv:2504.16084  [pdf, ps, other]

    cs.CL cs.LG

    TTRL: Test-Time Reinforcement Learning

    Authors: Yuxin Zuo, Kaiyan Zhang, Shang Qu, Li Sheng, Xuekai Zhu, Biqing Qi, Youbang Sun, Ganqu Cui, Ning Ding, Bowen Zhou

    Abstract: This paper investigates Reinforcement Learning (RL) on data without explicit labels for reasoning tasks in Large Language Models (LLMs). The core challenge of the problem is reward estimation during inference while not having access to ground-truth information. While this setting appears elusive, we find that common practices in Test-Time Scaling (TTS), such as majority voting, yield surprisingly…

    Submitted 22 April, 2025; originally announced April 2025.

  44. arXiv:2504.16074  [pdf, other]

    cs.CL

    PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models

    Authors: Shi Qiu, Shaoyang Guo, Zhuo-Yang Song, Yunbo Sun, Zeyu Cai, Jiashen Wei, Tianyu Luo, Yixuan Yin, Haoxu Zhang, Yi Hu, Chenyang Wang, Chencheng Tang, Haoling Chang, Qi Liu, Ziheng Zhou, Tianyu Zhang, Jingtian Zhang, Zhangyi Liu, Minghao Li, Yuku Zhang, Boxuan Jing, Xianqi Yin, Yutong Ren, Zizhuo Fu, Weike Wang, et al. (27 additional authors not shown)

    Abstract: We introduce PHYBench, a novel, high-quality benchmark designed for evaluating reasoning capabilities of large language models (LLMs) in physical contexts. PHYBench consists of 500 meticulously curated physics problems based on real-world physical scenarios, designed to assess the ability of models to understand and reason about realistic physical processes. Covering mechanics, electromagnetism, t…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: 21 pages, 8 figures, 4 tables

  45. arXiv:2504.15585  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment

    Authors: Kun Wang, Guibin Zhang, Zhenhong Zhou, Jiahao Wu, Miao Yu, Shiqian Zhao, Chenlong Yin, Jinhu Fu, Yibo Yan, Hanjun Luo, Liang Lin, Zhihao Xu, Haolang Lu, Xinye Cao, Xinyun Zhou, Weifei Jin, Fanci Meng, Junyuan Mao, Hao Wu, Minghe Wang, Fan Zhang, Junfeng Fang, Chengwei Liu, Yifan Zhang, Qiankun Li, et al. (57 additional authors not shown)

    Abstract: The remarkable success of Large Language Models (LLMs) has illuminated a promising pathway toward achieving Artificial General Intelligence for both academic and industrial communities, owing to their unprecedented performance across various applications. As LLMs continue to gain prominence in both research and commercial domains, their security and safety implications have become a growing concer…

    Submitted 22 April, 2025; originally announced April 2025.

  46. arXiv:2504.15322  [pdf, other]

    cs.LG cs.AI physics.ao-ph

    How to systematically develop an effective AI-based bias correction model?

    Authors: Xiao Zhou, Yuze Sun, Jie Wu, Xiaomeng Huang

    Abstract: This study introduces ReSA-ConvLSTM, an artificial intelligence (AI) framework for systematic bias correction in numerical weather prediction (NWP). We propose three innovations by integrating dynamic climatological normalization, ConvLSTM with temporal causality constraints, and residual self-attention mechanisms. The model establishes a physics-aware nonlinear mapping between ECMWF forecasts and…

    Submitted 20 April, 2025; originally announced April 2025.

  47. arXiv:2504.14877  [pdf, other]

    cs.CV

    Collaborative Enhancement Network for Low-quality Multi-spectral Vehicle Re-identification

    Authors: Aihua Zheng, Yongqi Sun, Zi Wang, Chenglong Li, Jin Tang

    Abstract: The performance of multi-spectral vehicle Re-identification (ReID) is significantly degraded when some important discriminative cues in visible, near infrared and thermal infrared spectra are lost. Existing methods generate or enhance missing details in low-quality spectra data using the high-quality one, generally called the primary spectrum, but how to justify the primary spectrum is a challengi…

    Submitted 21 April, 2025; originally announced April 2025.

  48. arXiv:2504.14845  [pdf, other]

    cs.IR

    Enhancing the Patent Matching Capability of Large Language Models via the Memory Graph

    Authors: Qiushi Xiong, Zhipeng Xu, Zhenghao Liu, Mengjia Wang, Zulong Chen, Yue Sun, Yu Gu, Xiaohua Li, Ge Yu

    Abstract: Intellectual Property (IP) management involves strategically protecting and utilizing intellectual assets to enhance organizational innovation, competitiveness, and value creation. Patent matching is a crucial task in intellectual property management, which facilitates the organization and utilization of patents. Existing models often rely on the emergent capabilities of Large Language Models (LLM…

    Submitted 20 April, 2025; originally announced April 2025.

  49. arXiv:2504.14548  [pdf, other]

    cs.CV cs.AI

    VGNC: Reducing the Overfitting of Sparse-view 3DGS via Validation-guided Gaussian Number Control

    Authors: Lifeng Lin, Rongfeng Lu, Quan Chen, Haofan Ren, Ming Lu, Yaoqi Sun, Chenggang Yan, Anke Xue

    Abstract: Sparse-view 3D reconstruction is a fundamental yet challenging task in practical 3D reconstruction applications. Recently, many methods based on the 3D Gaussian Splatting (3DGS) framework have been proposed to address sparse-view 3D reconstruction. Although these methods have made considerable advancements, they still show significant issues with overfitting. To reduce the overfitting, we introduc…

    Submitted 20 April, 2025; originally announced April 2025.

    Comments: 10 pages, 8 figures

  50. arXiv:2504.14350  [pdf, other]

    cs.AI

    Time's Up! An Empirical Study of LLM Reasoning Ability Under Output Length Constraint

    Authors: Yi Sun, Han Wang, Jiaqiang Li, Jiacheng Liu, Xiangyu Li, Hao Wen, Huiwen Zheng, Yan Liang, Yuanchun Li, Yunxin Liu

    Abstract: Recent work has demonstrated the remarkable potential of Large Language Models (LLMs) in test-time scaling. By making the models think before answering, they are able to achieve much higher accuracy with extra inference computation. However, in many real-world scenarios, models are used under time constraints, where an answer should be given to the user within a certain output length. It is unclea…

    Submitted 22 April, 2025; v1 submitted 19 April, 2025; originally announced April 2025.