Showing 1–50 of 4,506 results for author: Wang, L

  1. arXiv:2505.10351  [pdf, other]

    cs.CV

    A Unified and Scalable Membership Inference Method for Visual Self-supervised Encoder via Part-aware Capability

    Authors: Jie Zhu, Jirong Zha, Ding Li, Leye Wang

    Abstract: Self-supervised learning shows promise in harnessing extensive unlabeled data, but it also confronts significant privacy concerns, especially in vision. In this paper, we perform membership inference on visual self-supervised models in a more realistic setting: self-supervised training method and details are unknown for an adversary when attacking as he usually faces a black-box system in practice…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: An extension of our ACM CCS2024 conference paper (arXiv:2404.02462). We show the impacts of scaling from both data and model aspects on membership inference for self-supervised visual encoders

  2. arXiv:2505.10018  [pdf, other]

    cs.RO

    LEMON-Mapping: Loop-Enhanced Large-Scale Multi-Session Point Cloud Merging and Optimization for Globally Consistent Mapping

    Authors: Lijie Wang, Xiaoyi Zhong, Ziyi Xu, Kaixin Chai, Anke Zhao, Tianyu Zhao, Qianhao Wang, Fei Gao

    Abstract: With the rapid development of robotics, multi-robot collaboration has become critical and challenging. One key problem is integrating data from multiple robots to build a globally consistent and accurate map for robust cooperation and precise localization. While traditional multi-robot pose graph optimization (PGO) maintains basic global consistency, it focuses primarily on pose optimization and i…

    Submitted 15 May, 2025; originally announced May 2025.

  3. arXiv:2505.09995  [pdf, ps, other]

    cs.NI cs.DL

    A Survey on Open-Source Edge Computing Simulators and Emulators: The Computing and Networking Convergence Perspective

    Authors: Jianpeng Qi, Chao Liu, Xiao Zhang, Lei Wang, Rui Wang, Junyu Dong, Yanwei Yu

    Abstract: Edge computing, with its low latency, dynamic scalability, and location awareness, along with the convergence of computing and communication paradigms, has been successfully applied in critical domains such as industrial IoT, smart healthcare, smart homes, and public safety. This paper provides a comprehensive survey of open-source edge computing simulators and emulators, presented in our GitHub r…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 10 pages, 2 figures, 5 tables

  4. arXiv:2505.09926  [pdf, ps, other]

    cs.CV cs.AI

    AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection

    Authors: Bin-Bin Gao, Yue Zhu, Jiangtao Yan, Yuezhi Cai, Weixi Zhang, Meng Wang, Jun Liu, Yong Liu, Lei Wang, Chengjie Wang

    Abstract: Universal visual anomaly detection aims to identify anomalies from novel or unseen vision domains without additional fine-tuning, which is critical in open scenarios. Recent studies have demonstrated that pre-trained vision-language models like CLIP exhibit strong generalization with just zero or a few normal images. However, existing methods struggle with designing prompt templates, complex token…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 27 pages, 15 figures, 22 tables

  5. arXiv:2505.09701  [pdf, ps, other]

    cs.CL

    VeriFact: Enhancing Long-Form Factuality Evaluation with Refined Fact Extraction and Reference Facts

    Authors: Xin Liu, Lechen Zhang, Sheza Munir, Yiyang Gu, Lu Wang

    Abstract: Large language models (LLMs) excel at generating long-form responses, but evaluating their factuality remains challenging due to complex inter-sentence dependencies within the generated facts. Prior solutions predominantly follow a decompose-decontextualize-verify pipeline but often fail to capture essential context and miss key relational facts. In this paper, we introduce VeriFact, a factuality…

    Submitted 14 May, 2025; originally announced May 2025.

  6. arXiv:2505.09196  [pdf, other]

    cs.CV

    PDE: Gene Effect Inspired Parameter Dynamic Evolution for Low-light Image Enhancement

    Authors: Tong Li, Lizhi Wang, Hansen Feng, Lin Zhu, Hua Huang

    Abstract: Low-light image enhancement (LLIE) is a fundamental task in computational photography, aiming to improve illumination, reduce noise, and enhance image quality. While recent advancements focus on designing increasingly complex neural network models, we observe a peculiar phenomenon: resetting certain parameters to random values unexpectedly improves enhancement performance for some images. Drawing…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 11 pages, 9 tables, 9 figures

  7. arXiv:2505.09074  [pdf, other]

    cs.RO

    Deployable and Generalizable Motion Prediction: Taxonomy, Open Challenges and Future Directions

    Authors: Letian Wang, Marc-Antoine Lavoie, Sandro Papais, Barza Nisar, Yuxiao Chen, Wenhao Ding, Boris Ivanovic, Hao Shao, Abulikemu Abuduweili, Evan Cook, Yang Zhou, Peter Karkus, Jiachen Li, Changliu Liu, Marco Pavone, Steven Waslander

    Abstract: Motion prediction, the anticipation of future agent states or scene evolution, is rooted in human cognition, bridging perception and decision-making. It enables intelligent systems, such as robots and self-driving cars, to act safely in dynamic, human-involved environments, and informs broader time-series reasoning challenges. With advances in methods, representations, and datasets, the field has…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: Initial draft, 162 pages, 40 figures, 13 tables

  8. arXiv:2505.08804  [pdf, ps, other]

    cs.CR cs.LG

    TokenProber: Jailbreaking Text-to-image Models via Fine-grained Word Impact Analysis

    Authors: Longtian Wang, Xiaofei Xie, Tianlin Li, Yuhan Zhi, Chao Shen

    Abstract: Text-to-image (T2I) models have significantly advanced in producing high-quality images. However, such models have the ability to generate images containing not-safe-for-work (NSFW) content, such as pornography, violence, political content, and discrimination. To mitigate the risk of generating NSFW content, refusal mechanisms, i.e., safety checkers, have been developed to check potential NSFW con…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: 13 pages, 5 figures

  9. arXiv:2505.08614  [pdf, ps, other]

    cs.CV

    WaveGuard: Robust Deepfake Detection and Source Tracing via Dual-Tree Complex Wavelet and Graph Neural Networks

    Authors: Ziyuan He, Zhiqing Guo, Liejun Wang, Gaobo Yang, Yunfeng Diao, Dan Ma

    Abstract: Deepfake technology poses increasing risks such as privacy invasion and identity theft. To address these threats, we propose WaveGuard, a proactive watermarking framework that enhances robustness and imperceptibility via frequency-domain embedding and graph-based structural consistency. Specifically, we embed watermarks into high-frequency sub-bands using Dual-Tree Complex Wavelet Transform (DT-CW…

    Submitted 13 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

    Comments: 11 pages, 5 figures, 4 tables

  10. arXiv:2505.07796  [pdf, other]

    cs.CL cs.AI cs.LG

    Learning Dynamics in Continual Pre-Training for Large Language Models

    Authors: Xingjin Wang, Howe Tissue, Lu Wang, Linjing Li, Daniel Dajun Zeng

    Abstract: Continual Pre-Training (CPT) has become a popular and effective method to apply strong foundation models to specific downstream tasks. In this work, we explore the learning dynamics throughout the CPT process for large language models. We specifically focus on how general and downstream domain performance evolves at each training step, with domain performance measured via validation losses. We hav…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: Accepted to ICML2025 (spotlight)

  11. arXiv:2505.07634  [pdf, ps, other]

    cs.RO cs.AI cs.CV

    Neural Brain: A Neuroscience-inspired Framework for Embodied Agents

    Authors: Jian Liu, Xiongtao Shi, Thai Duy Nguyen, Haitian Zhang, Tianxiang Zhang, Wei Sun, Yanjie Li, Athanasios V. Vasilakos, Giovanni Iacca, Arshad Ali Khan, Arvind Kumar, Jae Won Cho, Ajmal Mian, Lihua Xie, Erik Cambria, Lin Wang

    Abstract: The rapid evolution of artificial intelligence (AI) has shifted from static, data-driven models to dynamic systems capable of perceiving and interacting with real-world environments. Despite advancements in pattern recognition and symbolic reasoning, current AI systems, such as large language models, remain disembodied, unable to physically engage with the world. This limitation has driven the ris…

    Submitted 14 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

    Comments: 51 pages, 17 figures, 9 tables

  12. arXiv:2505.07622  [pdf, ps, other]

    cs.CV

    A Unified Hierarchical Framework for Fine-grained Cross-view Geo-localization over Large-scale Scenarios

    Authors: Zhuo Song, Ye Zhang, Kunhong Li, Longguang Wang, Yulan Guo

    Abstract: Cross-view geo-localization is a promising solution for large-scale localization problems, requiring the sequential execution of retrieval and metric localization tasks to achieve fine-grained predictions. However, existing methods typically focus on designing standalone models for these two tasks, resulting in inefficient collaboration and increased training overhead. In this paper, we propose Un…

    Submitted 12 May, 2025; originally announced May 2025.

  13. arXiv:2505.07581  [pdf, other]

    cs.AI cs.CY

    YuLan-OneSim: Towards the Next Generation of Social Simulator with Large Language Models

    Authors: Lei Wang, Heyang Gao, Xiaohe Bo, Xu Chen, Ji-Rong Wen

    Abstract: Leveraging large language model (LLM) based agents to simulate human social behaviors has recently gained significant attention. In this paper, we introduce a novel social simulator called YuLan-OneSim. Compared to previous works, YuLan-OneSim distinguishes itself in five key aspects: (1) Code-free scenario construction: Users can simply describe and refine their simulation scenarios through natur…

    Submitted 12 May, 2025; originally announced May 2025.

  14. arXiv:2505.07528  [pdf, other]

    cs.CL

    SEReDeEP: Hallucination Detection in Retrieval-Augmented Models via Semantic Entropy and Context-Parameter Fusion

    Authors: Lei Wang

    Abstract: Retrieval-Augmented Generation (RAG) models frequently encounter hallucination phenomena when integrating external information with internal parametric knowledge. Empirical studies demonstrate that the disequilibrium between external contextual information and internal parametric knowledge constitutes a primary factor in hallucination generation. Existing hallucination detection methodologies pred…

    Submitted 12 May, 2025; originally announced May 2025.

  15. arXiv:2505.07381  [pdf, ps, other]

    cs.CV cs.AI

    Few-shot Semantic Encoding and Decoding for Video Surveillance

    Authors: Baoping Cheng, Yukun Zhang, Liming Wang, Xiaoyan Xie, Tao Fu, Dongkun Wang, Xiaoming Tao

    Abstract: With the continuous increase in the number and resolution of video surveillance cameras, the burden of transmitting and storing surveillance video is growing. Traditional communication methods based on Shannon's theory are facing optimization bottlenecks. Semantic communication, as an emerging communication method, is expected to break through this bottleneck and reduce the storage and transmissio…

    Submitted 12 May, 2025; originally announced May 2025.

  16. arXiv:2505.06898  [pdf, ps, other]

    cs.CV cs.CL

    Multi-Modal Explainable Medical AI Assistant for Trustworthy Human-AI Collaboration

    Authors: Honglong Yang, Shanshan Song, Yi Qin, Lehan Wang, Haonan Wang, Xinpeng Ding, Qixiang Zhang, Bodong Du, Xiaomeng Li

    Abstract: Generalist Medical AI (GMAI) systems have demonstrated expert-level performance in biomedical perception tasks, yet their clinical utility remains limited by inadequate multi-modal explainability and suboptimal prognostic capabilities. Here, we present XMedGPT, a clinician-centric, multi-modal AI assistant that integrates textual and visual interpretability to support transparent and trustworthy m…

    Submitted 11 May, 2025; originally announced May 2025.

  17. arXiv:2505.06880  [pdf, ps, other]

    cs.SE

    Benchmarking and Revisiting Code Generation Assessment: A Mutation-Based Approach

    Authors: Longtian Wang, Tianlin Li, Xiaofei Xie, Yuhan Zhi, Jian Wang, Chao Shen

    Abstract: Code Large Language Models (CLLMs) have exhibited outstanding performance in program synthesis, attracting the focus of the research community. The evaluation of CLLM's program synthesis capability has generally relied on manually curated benchmarks. However, there is a substantial gap between real-world scenarios and benchmark settings. Existing benchmarks typically provide only a single input pr…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: 8 pages, 4 figures

  18. arXiv:2505.06625  [pdf, ps, other]

    cs.AR cs.AI cs.OS

    CaMDN: Enhancing Cache Efficiency for Multi-tenant DNNs on Integrated NPUs

    Authors: Tianhao Cai, Liang Wang, Limin Xiao, Meng Han, Zeyu Wang, Lin Sun, Xiaojian Liao

    Abstract: With the rapid development of DNN applications, multi-tenant execution, where multiple DNNs are co-located on a single SoC, is becoming a prevailing trend. Although many methods are proposed in prior works to improve multi-tenant performance, the impact of shared cache is not well studied. This paper proposes CaMDN, an architecture-scheduling co-design to enhance cache efficiency for multi-tenant…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: 7 pages, 9 figures. This paper has been accepted to the 2025 Design Automation Conference (DAC)

  19. arXiv:2505.06573  [pdf, ps, other]

    cs.CV

    ElectricSight: 3D Hazard Monitoring for Power Lines Using Low-Cost Sensors

    Authors: Xingchen Li, LiDian Wang, Yu Sheng, ZhiPeng Tang, Haojie Ren, Guoliang You, YiFan Duan, Jianmin Ji, Yanyong Zhang

    Abstract: Protecting power transmission lines from potential hazards involves critical tasks, one of which is the accurate measurement of distances between power lines and potential threats, such as large cranes. The challenge with this task is that the current sensor-based methods face challenges in balancing accuracy and cost in distance measurement. A common practice is to install cameras on transmission…

    Submitted 10 May, 2025; originally announced May 2025.

  20. arXiv:2505.06557  [pdf, ps, other]

    cs.CV

    Weakly Supervised Temporal Sentence Grounding via Positive Sample Mining

    Authors: Lu Dong, Haiyu Zhang, Hongjie Zhang, Yifei Huang, Zhen-Hua Ling, Yu Qiao, Limin Wang, Yali Wang

    Abstract: The task of weakly supervised temporal sentence grounding (WSTSG) aims to detect temporal intervals corresponding to a language description from untrimmed videos with only video-level video-language correspondence. For an anchor sample, most existing approaches generate negative samples either from other videos or within the same video for contrastive learning. However, some training samples are h…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: TCSVT 2025, doi at https://ieeexplore.ieee.org/document/10970001

  21. arXiv:2505.06556  [pdf, other]

    cs.DB cs.DC

    TierBase: A Workload-Driven Cost-Optimized Key-Value Store

    Authors: Zhitao Shen, Shiyu Yang, Weibo Chen, Kunming Wang, Yue Li, Jiabao Jin, Wei Jia, Junwei Chen, Yuan Su, Xiaoxia Duan, Wei Chen, Lei Wang, Jie Song, Ruoyi Ruan, Xuemin Lin

    Abstract: In the current era of data-intensive applications, the demand for high-performance, cost-effective storage solutions is paramount. This paper introduces a Space-Performance Cost Model for key-value store, designed to guide cost-effective storage configuration decisions. The model quantifies the trade-offs between performance and storage costs, providing a framework for optimizing resource allocati…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: Accepted by ICDE 2025

  22. arXiv:2505.06418  [pdf, ps, other]

    cs.CL

    Is your multimodal large language model a good science tutor?

    Authors: Ming Liu, Liwen Wang, Wensheng Zhang

    Abstract: Multimodal large language models (MLLMs) demonstrate impressive performance on scientific reasoning tasks (e.g., ScienceQA). However, most existing benchmarks focus narrowly on the accuracy of the final answer while ignoring other metrics. In particular, when applying MLLMs to educational contexts, the goal is not only correctness but also the ability to teach. In this paper, we propose a framewor…

    Submitted 9 May, 2025; originally announced May 2025.

  23. arXiv:2505.06413  [pdf, ps, other]

    cs.CV cs.AI

    Natural Reflection Backdoor Attack on Vision Language Model for Autonomous Driving

    Authors: Ming Liu, Siyuan Liang, Koushik Howlader, Liwen Wang, Dacheng Tao, Wensheng Zhang

    Abstract: Vision-Language Models (VLMs) have been integrated into autonomous driving systems to enhance reasoning capabilities through tasks such as Visual Question Answering (VQA). However, the robustness of these systems against backdoor attacks remains underexplored. In this paper, we propose a natural reflection-based backdoor attack targeting VLM systems in autonomous driving scenarios, aiming to induc…

    Submitted 9 May, 2025; originally announced May 2025.

  24. arXiv:2505.05922  [pdf, ps, other]

    cs.CR cs.LG

    Cape: Context-Aware Prompt Perturbation Mechanism with Differential Privacy

    Authors: Haoqi Wu, Wei Dai, Li Wang, Qiang Yan

    Abstract: Large Language Models (LLMs) have gained significant popularity due to their remarkable capabilities in text understanding and generation. However, despite their widespread deployment in inference services such as ChatGPT, concerns about the potential leakage of sensitive user data have arisen. Existing solutions primarily rely on privacy-enhancing technologies to mitigate such risks, facing the t…

    Submitted 15 May, 2025; v1 submitted 9 May, 2025; originally announced May 2025.

    Comments: to be published in ICML 2025

  25. arXiv:2505.05515  [pdf, other]

    q-bio.NC cs.LG

    Nature's Insight: A Novel Framework and Comprehensive Analysis of Agentic Reasoning Through the Lens of Neuroscience

    Authors: Zinan Liu, Haoran Li, Jingyi Lu, Gaoyuan Ma, Xu Hong, Giovanni Iacca, Arvind Kumar, Shaojun Tang, Lin Wang

    Abstract: Autonomous AI is no longer a hard-to-reach concept, it enables the agents to move beyond executing tasks to independently addressing complex problems, adapting to change while handling the uncertainty of the environment. However, what makes the agents truly autonomous? It is agentic reasoning, that is crucial for foundation models to develop symbolic logic, statistical correlations, or large-scale…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 39 pages, 17 figures

  26. arXiv:2505.05456  [pdf, other]

    cs.CV

    SITE: towards Spatial Intelligence Thorough Evaluation

    Authors: Wenqi Wang, Reuben Tan, Pengyue Zhu, Jianwei Yang, Zhengyuan Yang, Lijuan Wang, Andrey Kolobov, Jianfeng Gao, Boqing Gong

    Abstract: Spatial intelligence (SI) represents a cognitive ability encompassing the visualization, manipulation, and reasoning about spatial relationships, underpinning disciplines from neuroscience to robotics. We introduce SITE, a benchmark dataset towards SI Thorough Evaluation in a standardized format of multi-choice visual question-answering, designed to assess large vision-language models' spatial int…

    Submitted 8 May, 2025; originally announced May 2025.

  27. arXiv:2505.05315  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Scalable Chain of Thoughts via Elastic Reasoning

    Authors: Yuhui Xu, Hanze Dong, Lei Wang, Doyen Sahoo, Junnan Li, Caiming Xiong

    Abstract: Large reasoning models (LRMs) have achieved remarkable progress on complex tasks by generating extended chains of thought (CoT). However, their uncontrolled output lengths pose significant challenges for real-world deployment, where inference-time budgets on tokens, latency, or compute are strictly constrained. We propose Elastic Reasoning, a novel framework for scalable chain of thoughts that exp…

    Submitted 8 May, 2025; originally announced May 2025.

  28. arXiv:2505.04956  [pdf, other]

    cs.LG cs.AI

    Graffe: Graph Representation Learning via Diffusion Probabilistic Models

    Authors: Dingshuo Chen, Shuchen Xue, Liuji Chen, Yingheng Wang, Qiang Liu, Shu Wu, Zhi-Ming Ma, Liang Wang

    Abstract: Diffusion probabilistic models (DPMs), widely recognized for their potential to generate high-quality samples, tend to go unnoticed in representation learning. While recent progress has highlighted their potential for capturing visual semantics, adapting DPMs to graph representation learning remains in its infancy. In this paper, we introduce Graffe, a self-supervised diffusion model proposed for…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 16 pages, 4 figures, under review

  29. arXiv:2505.04921  [pdf, other]

    cs.CV cs.CL

    Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

    Authors: Yunxin Li, Zhenyu Liu, Zitao Li, Xuanyu Zhang, Zhenran Xu, Xinyu Chen, Haoyuan Shi, Shenyuan Jiang, Xintong Wang, Jifang Wang, Shouzheng Huang, Xinping Zhao, Borui Jiang, Lanqing Hong, Longyue Wang, Zhuotao Tian, Baoxing Huai, Wenhan Luo, Weihua Luo, Zheng Zhang, Baotian Hu, Min Zhang

    Abstract: Reasoning lies at the heart of intelligence, shaping the ability to make decisions, draw conclusions, and generalize across domains. In artificial intelligence, as systems increasingly operate in open, uncertain, and multimodal environments, reasoning becomes essential for enabling robust and adaptive behavior. Large Multimodal Reasoning Models (LMRMs) have emerged as a promising paradigm, integra…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 75 pages, 10 figures; Project: https://github.com/HITsz-TMG/Awesome-Large-Multimodal-Reasoning-Models

  30. arXiv:2505.03779  [pdf, other]

    cs.LG

    Neural Co-Optimization of Structural Topology, Manufacturable Layers, and Path Orientations for Fiber-Reinforced Composites

    Authors: Tao Liu, Tianyu Zhang, Yongxue Chen, Weiming Wang, Yu Jiang, Yuming Huang, Charlie C. L. Wang

    Abstract: We propose a neural network-based computational framework for the simultaneous optimization of structural topology, curved layers, and path orientations to achieve strong anisotropic strength in fiber-reinforced thermoplastic composites while ensuring manufacturability. Our framework employs three implicit neural fields to represent geometric shape, layer sequence, and fiber orientation. This enab…

    Submitted 30 April, 2025; originally announced May 2025.

  31. arXiv:2505.03380  [pdf, other]

    cs.CV cs.AI eess.IV

    Reinforced Correlation Between Vision and Language for Precise Medical AI Assistant

    Authors: Haonan Wang, Jiaji Mao, Lehan Wang, Qixiang Zhang, Marawan Elbatel, Yi Qin, Huijun Hu, Baoxun Li, Wenhui Deng, Weifeng Qin, Hongrui Li, Jialin Liang, Jun Shen, Xiaomeng Li

    Abstract: Medical AI assistants support doctors in disease diagnosis, medical image analysis, and report generation. However, they still face significant challenges in clinical use, including limited accuracy with multimodal content and insufficient validation in real-world settings. We propose RCMed, a full-stack AI assistant that improves multimodal alignment in both input and output, enabling precise ana…

    Submitted 6 May, 2025; originally announced May 2025.

  32. arXiv:2505.03116  [pdf, ps, other]

    cs.CV

    TimeTracker: Event-based Continuous Point Tracking for Video Frame Interpolation with Non-linear Motion

    Authors: Haoyue Liu, Jinghan Xu, Yi Chang, Hanyu Zhou, Haozhi Zhao, Lin Wang, Luxin Yan

    Abstract: Video frame interpolation (VFI) that leverages the bio-inspired event cameras as guidance has recently shown better performance and memory efficiency than the frame-based methods, thanks to the event cameras' advantages, such as high temporal resolution. A hurdle for event-based VFI is how to effectively deal with non-linear motion, caused by the dynamic changes in motion direction and speed withi…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: Accepted by CVPR 2025

  33. arXiv:2505.03097  [pdf, other]

    cs.CV

    Not All Parameters Matter: Masking Diffusion Models for Enhancing Generation Ability

    Authors: Lei Wang, Senmao Li, Fei Yang, Jianye Wang, Ziheng Zhang, Yuhan Liu, Yaxing Wang, Jian Yang

    Abstract: The diffusion models, in early stages focus on constructing basic image structures, while the refined details, including local features and textures, are generated in later stages. Thus the same network layers are forced to learn both structural and textural information simultaneously, significantly differing from the traditional deep learning architectures (e.g., ResNet or GANs) which captures or…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: Accepted to CVPR 2025

  34. arXiv:2505.02835  [pdf, ps, other]

    cs.CV cs.CL

    R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

    Authors: Yi-Fan Zhang, Xingyu Lu, Xiao Hu, Chaoyou Fu, Bin Wen, Tianke Zhang, Changyi Liu, Kaiyu Jiang, Kaibing Chen, Kaiyu Tang, Haojie Ding, Jiankang Chen, Fan Yang, Zhang Zhang, Tingting Gao, Liang Wang

    Abstract: Multimodal Reward Models (MRMs) play a crucial role in enhancing the performance of Multimodal Large Language Models (MLLMs). While recent advancements have primarily focused on improving the model structure and training data of MRMs, there has been limited exploration into the effectiveness of long-term reasoning capabilities for reward modeling and how to activate these capabilities in MRMs. In…

    Submitted 9 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

    Comments: Home page: https://github.com/yfzhang114/r1_reward

  35. arXiv:2505.02471  [pdf, other]

    cs.CV

    Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction

    Authors: Inclusion AI, Biao Gong, Cheng Zou, Dandan Zheng, Hu Yu, Jingdong Chen, Jianxin Sun, Junbo Zhao, Jun Zhou, Kaixiang Ji, Lixiang Ru, Libin Wang, Qingpei Guo, Rui Liu, Weilong Chai, Xinyu Xiao, Ziyuan Huang

    Abstract: We introduce Ming-Lite-Uni, an open-source multimodal framework featuring a newly designed unified visual generator and a native multimodal autoregressive model tailored for unifying vision and language. Specifically, this project provides an open-source implementation of the integrated MetaQueries and M2-omni framework, while introducing the novel multi-scale learnable tokens and multi-scale repr…

    Submitted 7 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

    Comments: https://github.com/inclusionAI/Ming/tree/main/Ming-unify

  36. EnsembleCI: Ensemble Learning for Carbon Intensity Forecasting

    Authors: Leyi Yan, Linda Wang, Sihang Liu, Yi Ding

    Abstract: Carbon intensity (CI) measures the average carbon emissions generated per unit of electricity, making it a crucial metric for quantifying and managing the environmental impact. Accurate CI predictions are vital for minimizing carbon footprints, yet the state-of-the-art method (CarbonCast) falls short due to its inability to address regional variability and lack of adaptability. To address these li…

    Submitted 6 May, 2025; v1 submitted 3 May, 2025; originally announced May 2025.

    Comments: 5 pages, 5 figures, 3 tables, In The 16th ACM International Conference on Future and Sustainable Energy Systems (E-ENERGY'25)

  37. arXiv:2505.01712  [pdf, other]

    cs.AI cs.NI

    World Model-Based Learning for Long-Term Age of Information Minimization in Vehicular Networks

    Authors: Lingyi Wang, Rashed Shelim, Walid Saad, Naren Ramakrishnan

    Abstract: Traditional reinforcement learning (RL)-based learning approaches for wireless networks rely on expensive trial-and-error mechanisms and real-time feedback based on extensive environment interactions, which leads to low data efficiency and short-sighted policies. These limitations become particularly problematic in complex, dynamic networks with high uncertainty and long-term planning requirements…

    Submitted 3 May, 2025; originally announced May 2025.

  38. arXiv:2505.01595  [pdf, other]

    cs.CL cs.AI cs.LG

    Always Tell Me The Odds: Fine-grained Conditional Probability Estimation

    Authors: Liaoyaqi Wang, Zhengping Jiang, Anqi Liu, Benjamin Van Durme

    Abstract: We present a state-of-the-art model for fine-grained probability estimation of propositions conditioned on context. Recent advances in large language models (LLMs) have significantly enhanced their reasoning capabilities, particularly on well-defined tasks with complete information. However, LLMs continue to struggle with making accurate and well-calibrated probabilistic predictions under uncertai…

    Submitted 2 May, 2025; originally announced May 2025.

  39. arXiv:2505.01537  [pdf, other]

    cs.HC

    Passing the Buck to AI: How Individuals' Decision-Making Patterns Affect Reliance on AI

    Authors: Katelyn Xiaoying Mei, Rock Yuren Pang, Alex Lyford, Lucy Lu Wang, Katharina Reinecke

    Abstract: Psychological research has identified different patterns individuals have while making decisions, such as vigilance (making decisions after thorough information gathering), hypervigilance (rushed and anxious decision-making), and buckpassing (deferring decisions to others). We examine whether these decision-making patterns shape peoples' likelihood of seeking out or relying on AI. In an online exp…

    Submitted 2 May, 2025; originally announced May 2025.

  40. arXiv:2505.01168  [pdf, other]

    cs.LG cs.AI

    Harmonizing Intra-coherence and Inter-divergence in Ensemble Attacks for Adversarial Transferability

    Authors: Zhaoyang Ma, Zhihao Wu, Wang Lu, Xin Gao, Jinghang Yue, Taolin Zhang, Lipo Wang, Youfang Lin, Jing Wang

    Abstract: The development of model ensemble attacks has significantly improved the transferability of adversarial examples, but this progress also poses severe threats to the security of deep neural networks. Existing methods, however, face two critical challenges: insufficient capture of shared gradient directions across models and a lack of adaptive weight allocation mechanisms. To address these issues, w…

    Submitted 2 May, 2025; originally announced May 2025.

  41. arXiv:2505.01068  [pdf, other]

    cs.CL cs.AI

    Multimodal Transformers are Hierarchical Modal-wise Heterogeneous Graphs

    Authors: Yijie Jin, Junjie Peng, Xuanchao Lin, Haochen Yuan, Lan Wang, Cangzhi Zheng

    Abstract: Multimodal Sentiment Analysis (MSA) is a rapidly developing field that integrates multimodal information to recognize sentiments, and existing models have made significant progress in this area. The central challenge in MSA is multimodal fusion, which is predominantly addressed by Multimodal Transformers (MulTs). Although act as the paradigm, MulTs suffer from efficiency concerns. In this work, fr…

    Submitted 2 May, 2025; originally announced May 2025.

  42. arXiv:2505.00974  [pdf, ps, other]

    cs.IT

    On the Worst-Case Complexity of Gibbs Decoding for Reed--Muller Codes

    Authors: Xuzhe Xia, Nicholas Kwan, Lele Wang

    Abstract: Reed--Muller (RM) codes are known to achieve capacity on binary symmetric channels (BSC) under the Maximum a Posteriori (MAP) decoder. However, it remains an open problem to design a capacity achieving polynomial-time RM decoder. Due to a lemma by Liu, Cuff, and Verdú, it can be shown that decoding by sampling from the posterior distribution is also capacity-achieving for RM codes over BSC. The Gi…

    Submitted 1 May, 2025; originally announced May 2025.

  43. arXiv:2505.00822  [pdf, other]

    stat.ME cs.LG stat.ML

    Q-Learning with Clustered-SMART (cSMART) Data: Examining Moderators in the Construction of Clustered Adaptive Interventions

    Authors: Yao Song, Kelly Speth, Amy Kilbourne, Andrew Quanbeck, Daniel Almirall, Lu Wang

    Abstract: A clustered adaptive intervention (cAI) is a pre-specified sequence of decision rules that guides practitioners on how best - and based on which measures - to tailor cluster-level intervention to improve outcomes at the level of individuals within the clusters. A clustered sequential multiple assignment randomized trial (cSMART) is a type of trial that is used to inform the empirical development o…

    Submitted 1 May, 2025; originally announced May 2025.

  44. arXiv:2504.21646  [pdf, other]

    cs.CV

    Diffusion-based Adversarial Identity Manipulation for Facial Privacy Protection

    Authors: Liqin Wang, Qianyue Hu, Wei Lu, Xiangyang Luo

    Abstract: The success of face recognition (FR) systems has led to serious privacy concerns due to potential unauthorized surveillance and user tracking on social networks. Existing methods for enhancing privacy fail to generate natural face images that can protect facial privacy. In this paper, we propose diffusion-based adversarial identity manipulation (DiffAIM) to generate natural and highly transferable…

    Submitted 30 April, 2025; originally announced April 2025.

  45. arXiv:2504.21571  [pdf]

    cs.NI

    FreeBeacon: Efficient Communication and Data Aggregation in Battery-Free IoT

    Authors: Gaosheng Liu, Kasım Sinan Yıldırım, Lin Wang

    Abstract: To improve sustainability, Internet-of-Things (IoT) is increasingly adopting battery-free devices powered by ambient energy scavenged from the environment. The unpredictable availability of ambient energy leads to device intermittency, bringing critical challenges to device communication and related fundamental operations like data aggregation. We propose FreeBeacon, a novel scheme for efficient…

    Submitted 30 April, 2025; originally announced April 2025.

  46. arXiv:2504.21421  [pdf]

    cs.CL

    The Distribution of Dependency Distance and Hierarchical Distance in Contemporary Written Japanese and Its Influencing Factors

    Authors: Linxuan Wang, Shuiyuan Yu

    Abstract: To explore the relationship between dependency distance (DD) and hierarchical distance (HD) in Japanese, we compared the probability distributions of DD and HD with and without sentence length fixed, and analyzed the changes in mean dependency distance (MDD) and mean hierarchical distance (MHD) as sentence length increases, along with their correlation coefficient based on the Balanced Corpus of C…

    Submitted 30 April, 2025; originally announced April 2025.

    Comments: This paper has been accepted by the 13th International Quantitative Linguistics Conference QUALICO 2025

  47. arXiv:2504.21099  [pdf, other]

    cs.LG cs.AI

    A Survey on Parameter-Efficient Fine-Tuning for Foundation Models in Federated Learning

    Authors: Jieming Bian, Yuanzhe Peng, Lei Wang, Yin Huang, Jie Xu

    Abstract: Foundation models have revolutionized artificial intelligence by providing robust, versatile architectures pre-trained on large-scale datasets. However, adapting these massive models to specific downstream tasks requires fine-tuning, which can be prohibitively expensive in computational resources. Parameter-Efficient Fine-Tuning (PEFT) methods address this challenge by selectively updating only a…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: survey paper, under updating

  48. arXiv:2504.21043  [pdf, other]

    cs.CR cs.AI

    CodeBC: A More Secure Large Language Model for Smart Contract Code Generation in Blockchain

    Authors: Lingxiang Wang, Hainan Zhang, Qinnan Zhang, Ziwei Wang, Hongwei Zheng, Jin Dong, Zhiming Zheng

    Abstract: Large language models (LLMs) excel at generating code from natural language instructions, yet they often lack an understanding of security vulnerabilities. This limitation makes it difficult for LLMs to avoid security risks in generated code, particularly in high-security programming tasks such as smart contract development for blockchain. Researchers have attempted to enhance the vulnerability aw…

    Submitted 6 May, 2025; v1 submitted 28 April, 2025; originally announced April 2025.

  49. arXiv:2504.20447  [pdf, other]

    cs.SD cs.AI eess.AS

    APG-MOS: Auditory Perception Guided-MOS Predictor for Synthetic Speech

    Authors: Zhicheng Lian, Lizhi Wang, Hua Huang

    Abstract: Automatic speech quality assessment aims to quantify subjective human perception of speech through computational models to reduce the need for labor-consuming manual evaluations. While models based on deep learning have achieved progress in predicting mean opinion scores (MOS) to assess synthetic speech, the neglect of fundamental auditory perception mechanisms limits consistency with human judgme…

    Submitted 29 April, 2025; originally announced April 2025.

  50. arXiv:2504.20073  [pdf, other]

    cs.LG cs.AI cs.CL

    RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning

    Authors: Zihan Wang, Kangrui Wang, Qineng Wang, Pingyue Zhang, Linjie Li, Zhengyuan Yang, Kefan Yu, Minh Nhat Nguyen, Licheng Liu, Eli Gottlieb, Monica Lam, Yiping Lu, Kyunghyun Cho, Jiajun Wu, Li Fei-Fei, Lijuan Wang, Yejin Choi, Manling Li

    Abstract: Training large language models (LLMs) as interactive agents presents unique challenges including long-horizon decision making and interacting with stochastic environment feedback. While reinforcement learning (RL) has enabled progress in static tasks, multi-turn agent RL training remains underexplored. We propose StarPO (State-Thinking-Actions-Reward Policy Optimization), a general framework for t…

    Submitted 24 April, 2025; originally announced April 2025.