
Showing 1–50 of 7,743 results for author: Wang, J

Searching in archive cs.
  1. arXiv:2505.10565  [pdf, other]

    cs.CV

    Depth Anything with Any Prior

    Authors: Zehan Wang, Siyu Chen, Lihe Yang, Jialei Wang, Ziang Zhang, Hengshuang Zhao, Zhou Zhao

    Abstract: This work presents Prior Depth Anything, a framework that combines incomplete but precise metric information in depth measurement with relative but complete geometric structures in depth prediction, generating accurate, dense, and detailed metric depth maps for any scene. To this end, we design a coarse-to-fine pipeline to progressively integrate the two complementary depth sources. First, we intr…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: Home page: https://prior-depth-anything.github.io/

  2. arXiv:2505.10425  [pdf, ps, other]

    cs.LG

    Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs

    Authors: Jingyao Wang, Wenwen Qiang, Zeen Song, Changwen Zheng, Hui Xiong

    Abstract: Large language models (LLMs) excel at complex tasks thanks to advances in reasoning abilities. However, existing methods overlook the trade-off between reasoning effectiveness and computational efficiency, often encouraging unnecessarily long reasoning chains and wasting tokens. To address this, we propose Learning to Think (L2T), an information-theoretic reinforcement fine-tuning framework for LL…

    Submitted 15 May, 2025; originally announced May 2025.

  3. arXiv:2505.10257  [pdf, ps, other]

    cs.CV

    Sage Deer: A Super-Aligned Driving Generalist Is Your Copilot

    Authors: Hao Lu, Jiaqi Tang, Jiyao Wang, Yunfan LU, Xu Cao, Qingyong Hu, Yin Wang, Yuting Zhang, Tianxin Xie, Yunpeng Zhang, Yong Chen, Jiayu. Gao, Bin Huang, Dengbo He, Shuiguang Deng, Hao Chen, Ying-Cong Chen

    Abstract: The intelligent driving cockpit, an important part of intelligent driving, needs to match different users' comfort, interaction, and safety needs. This paper aims to build a Super-Aligned and GEneralist DRiving agent, SAGE DeeR. Sage Deer achieves three highlights: (1) Super alignment: It achieves different reactions according to different people's preferences and biases. (2) Generalist: It can u…

    Submitted 15 May, 2025; originally announced May 2025.

  4. arXiv:2505.10083  [pdf, ps, other]

    cs.LG

    ChronoSteer: Bridging Large Language Model and Time Series Foundation Model via Synthetic Data

    Authors: Chengsen Wang, Qi Qi, Zhongwen Rao, Lujia Pan, Jingyu Wang, Jianxin Liao

    Abstract: Conventional forecasting methods rely on unimodal time series data, limiting their ability to exploit rich textual information. Recently, large language models (LLMs) and time series foundation models (TSFMs) have demonstrated powerful capability in textual reasoning and temporal modeling, respectively. Integrating the strengths of both to construct a multimodal model that concurrently leverages b…

    Submitted 15 May, 2025; originally announced May 2025.

  5. arXiv:2505.10043  [pdf, other]

    cs.IR cs.AI

    Boosting Text-to-Chart Retrieval through Training with Synthesized Semantic Insights

    Authors: Yifan Wu, Lutao Yan, Yizhang Zhu, Yinan Mei, Jiannan Wang, Nan Tang, Yuyu Luo

    Abstract: Charts are crucial for data analysis and decision-making. Text-to-chart retrieval systems have become increasingly important for Business Intelligence (BI), where users need to find relevant charts that match their analytical needs. These needs can be categorized into precise queries that are well-specified and fuzzy queries that are more exploratory -- both require understanding the semantics and…

    Submitted 15 May, 2025; originally announced May 2025.

  6. arXiv:2505.10034  [pdf, ps, other]

    cs.AI

    The First MPDD Challenge: Multimodal Personality-aware Depression Detection

    Authors: Changzeng Fu, Zelin Fu, Xinhe Kuang, Jiacheng Dong, Qi Zhang, Kaifeng Su, Yikai Su, Wenbo Shi, Junfeng Yao, Yuliang Zhao, Shiqi Zhao, Jiadong Wang, Siyang Song, Chaoran Liu, Yuichiro Yoshikawa, Björn Schuller, Hiroshi Ishiguro

    Abstract: Depression is a widespread mental health issue affecting diverse age groups, with notable prevalence among college students and the elderly. However, existing datasets and detection methods primarily focus on young adults, neglecting the broader age spectrum and individual differences that influence depression manifestation. Current approaches often establish a direct mapping between multimodal da…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: This paper has been accepted as part of the MPDD Challenge in the ACMMM 2025 Grand Challenge

    MSC Class: 68T07

    ACM Class: I.2.0; H.5.1

  7. arXiv:2505.09110  [pdf, ps, other]

    cs.CR cs.DC cs.LG

    Toward Malicious Clients Detection in Federated Learning

    Authors: Zhihao Dou, Jiaqi Wang, Wei Sun, Zhuqing Liu, Minghong Fang

    Abstract: Federated learning (FL) enables multiple clients to collaboratively train a global machine learning model without sharing their raw data. However, the decentralized nature of FL introduces vulnerabilities, particularly to poisoning attacks, where malicious clients manipulate their local models to disrupt the training process. While Byzantine-robust aggregation rules have been developed to mitigate…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: To appear in ACM ASIACCS 2025

  8. arXiv:2505.09039  [pdf, ps, other]

    cs.CL

    Atomic Consistency Preference Optimization for Long-Form Question Answering

    Authors: Jingfeng Chen, Raghuveer Thirukovalluru, Junlin Wang, Kaiwei Luo, Bhuwan Dhingra

    Abstract: Large Language Models (LLMs) frequently produce factoid hallucinations - plausible yet incorrect answers. A common mitigation strategy is model alignment, which improves factual accuracy by training on curated factual and non-factual pairs. However, this approach often relies on a stronger model (e.g., GPT-4) or an external knowledge base to assess factual correctness, which may not always be acce…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 16 pages, 2 figures

  9. arXiv:2505.08446  [pdf, ps, other]

    cs.AI

    Agent-as-a-Service based on Agent Network

    Authors: Yuhan Zhu, Haojie Liu, Jian Wang, Bing Li, Zikang Yin, Yefei Liao

    Abstract: The rise of large model-based AI agents has spurred interest in Multi-Agent Systems (MAS) for their capabilities in decision-making, collaboration, and adaptability. While the Model Context Protocol (MCP) addresses tool invocation and data exchange challenges via a unified protocol, it lacks support for organizing agent-level collaboration. To bridge this gap, we propose Agent-as-a-Service based o…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: work in progress

  10. arXiv:2505.08349  [pdf, other]

    cs.CV cs.AI

    FAD: Frequency Adaptation and Diversion for Cross-domain Few-shot Learning

    Authors: Ruixiao Shi, Fu Feng, Yucheng Xie, Jing Wang, Xin Geng

    Abstract: Cross-domain few-shot learning (CD-FSL) requires models to generalize from limited labeled samples under significant distribution shifts. While recent methods enhance adaptability through lightweight task-specific modules, they operate solely in the spatial domain and overlook frequency-specific variations that are often critical for robust transfer. We observe that spatially similar images across…

    Submitted 13 May, 2025; originally announced May 2025.

  11. arXiv:2505.08318  [pdf, ps, other]

    cs.DB

    A Unified Model for Cardinality Estimation by Learning from Data and Queries via Sum-Product Networks

    Authors: Jiawei Liu, Ju Fan, Tongyu Liu, Kai Zeng, Jiannan Wang, Quehuan Liu, Tao Ye, Nan Tang

    Abstract: Cardinality estimation is a fundamental component in database systems, crucial for generating efficient execution plans. Despite advancements in learning-based cardinality estimation, existing methods may struggle to simultaneously optimize the key criteria: estimation accuracy, inference time, and storage overhead, limiting their practical applicability in real-world database environments. This p…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 17 pages, 8 figures

    ACM Class: H.2.4; E.5

  12. arXiv:2505.08294  [pdf, other]

    cs.CV

    FauForensics: Boosting Audio-Visual Deepfake Detection with Facial Action Units

    Authors: Jian Wang, Baoyuan Wu, Li Liu, Qingshan Liu

    Abstract: The rapid evolution of generative AI has increased the threat of realistic audio-visual deepfakes, demanding robust detection methods. Existing solutions primarily address unimodal (audio or visual) forgeries but struggle with multimodal manipulations due to inadequate handling of heterogeneous modality features and poor generalization across datasets. To this end, we propose a novel framework cal…

    Submitted 13 May, 2025; originally announced May 2025.

  13. arXiv:2505.08083  [pdf, ps, other]

    cs.CY cs.HC

    LLMs to Support K-12 Teachers in Culturally Relevant Pedagogy: An AI Literacy Example

    Authors: Jiayi Wang, Ruiwei Xiao, Xinying Hou, Hanqi Li, Ying Jui Tseng, John Stamper, Ken Koedinger

    Abstract: Culturally Relevant Pedagogy (CRP) is vital in K-12 education, yet teachers struggle to implement CRP into practice due to time, training, and resource gaps. This study explores how Large Language Models (LLMs) can address these barriers by introducing CulturAIEd, an LLM tool that assists teachers in adapting AI literacy curricula to students' cultural contexts. Through an exploratory pilot with f…

    Submitted 12 May, 2025; originally announced May 2025.

  14. arXiv:2505.07967  [pdf, ps, other]

    stat.ML cs.LG

    Wasserstein Distributionally Robust Nonparametric Regression

    Authors: Changyu Liu, Yuling Jiao, Junhui Wang, Jian Huang

    Abstract: Distributionally robust optimization has become a powerful tool for prediction and decision-making under model uncertainty. By focusing on the local worst-case risk, it enhances robustness by identifying the most unfavorable distribution within a predefined ambiguity set. While extensive research has been conducted in parametric settings, studies on nonparametric frameworks remain limited. This pa…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 50 pages

    MSC Class: 62G05; 62G08; 68T07

  15. arXiv:2505.07879  [pdf, ps, other]

    cs.IR cs.AI cs.CV

    OMGM: Orchestrate Multiple Granularities and Modalities for Efficient Multimodal Retrieval

    Authors: Wei Yang, Jingjing Fu, Rui Wang, Jinyu Wang, Lei Song, Jiang Bian

    Abstract: Vision-language retrieval-augmented generation (RAG) has become an effective approach for tackling Knowledge-Based Visual Question Answering (KB-VQA), which requires external knowledge beyond the visual content presented in images. The effectiveness of Vision-language RAG systems hinges on multimodal retrieval, which is inherently challenging due to the diverse modalities and knowledge granulariti…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: 19 pages, 6 figures, 17 tables

  16. arXiv:2505.07858  [pdf, other]

    cs.CL cs.AI

    Scaling Laws for Speculative Decoding

    Authors: Siyuan Yan, Mo Zhu, Guo-qing Jiang, Jianfei Wang, Jiaxing Chen, Wentai Zhang, Xiang Liao, Xiao Cui, Chen Zhang, Zhuoran Song, Ran Zhu

    Abstract: The escalating demand for efficient decoding in large language models (LLMs) is particularly critical for reasoning-intensive architectures like OpenAI-o3 and DeepSeek-R1, which depend on extended chain-of-thought reasoning. This study investigates speculative decoding techniques through dense LLM architectures to establish foundational insights for accelerating reasoning tasks. While speculative…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 17 pages, 8 figures

  17. arXiv:2505.07845  [pdf, other]

    cs.RO

    PierGuard: A Planning Framework for Underwater Robotic Inspection of Coastal Piers

    Authors: Pengyu Wang, Hin Wang Lin, Jialu Li, Jiankun Wang, Ling Shi, Max Q. -H. Meng

    Abstract: Using underwater robots instead of humans for the inspection of coastal piers can enhance efficiency while reducing risks. A key challenge in performing these tasks lies in achieving efficient and rapid path planning within complex environments. Sampling-based path planning methods, such as Rapidly-exploring Random Tree* (RRT*), have demonstrated notable performance in high-dimensional spaces. In…

    Submitted 6 May, 2025; originally announced May 2025.

  18. arXiv:2505.07839  [pdf]

    eess.IV cs.AI

    Sub-diffraction terahertz backpropagation compressive imaging

    Authors: Yongsheng Zhu, Shaojing Liu, Ximiao Wang, Runli Li, Haili Yang, Jiali Wang, Hongjia Zhu, Yanlin Ke, Ningsheng Xu, Huanjun Chen, Shaozhi Deng

    Abstract: Terahertz single-pixel imaging (TSPI) has garnered significant attention due to its simplicity and cost-effectiveness. However, the relatively long wavelength of THz waves limits sub-diffraction-scale imaging resolution. Although TSPI technique can achieve sub-wavelength resolution, it requires harsh experimental conditions and time-consuming processes. Here, we propose a sub-diffraction THz backp…

    Submitted 5 May, 2025; originally announced May 2025.

  19. arXiv:2505.07808  [pdf]

    cs.RO

    AcoustoBots: A swarm of robots for acoustophoretic multimodal interactions

    Authors: Narsimlu Kemsaram, James Hardwick, Jincheng Wang, Bonot Gautam, Ceylan Besevli, Giorgos Christopoulos, Sourabh Dogra, Lei Gao, Akin Delibasi, Diego Martinez Plasencia, Orestis Georgiou, Marianna Obrist, Ryuji Hirayama, Sriram Subramanian

    Abstract: Acoustophoresis has enabled novel interaction capabilities, such as levitation, volumetric displays, mid-air haptic feedback, and directional sound generation, to open new forms of multimodal interactions. However, its traditional implementation as a singular static unit limits its dynamic range and application versatility. This paper introduces AcoustoBots - a novel convergence of acoustophoresis…

    Submitted 12 May, 2025; originally announced May 2025.

  20. arXiv:2505.07555  [pdf, ps, other]

    cs.IT eess.SP

    Energy-Efficient Resource Allocation for NOMA-Assisted Uplink Pinching-Antenna Systems

    Authors: Ming Zeng, Xingwang Li, Ji Wang, Gaojian Huang, Octavia A. Dobre, Zhiguo Ding

    Abstract: The pinching-antenna architecture has emerged as a promising solution for reconfiguring wireless propagation environments and enhancing system performance. While prior research has primarily focused on sum-rate maximization or transmit power minimization of pinching-antenna systems, the critical aspect of energy efficiency (EE) has received limited attention. Given the increasing importance of EE…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: Submitted to IEEE WCL; 4 figures; 5 pages

  21. arXiv:2505.07396  [pdf, ps, other]

    cs.CV cs.LG

    TUM2TWIN: Introducing the Large-Scale Multimodal Urban Digital Twin Benchmark Dataset

    Authors: Olaf Wysocki, Benedikt Schwab, Manoj Kumar Biswanath, Michael Greza, Qilin Zhang, Jingwei Zhu, Thomas Froech, Medhini Heeramaglore, Ihab Hijazi, Khaoula Kanna, Mathias Pechinger, Zhaiyu Chen, Yao Sun, Alejandro Rueda Segura, Ziyang Xu, Omar AbdelGafar, Mansour Mehranfar, Chandan Yeshwanth, Yueh-Cheng Liu, Hadi Yazdi, Jiapan Wang, Stefan Auer, Katharina Anders, Klaus Bogenberger, Andre Borrmann , et al. (9 additional authors not shown)

    Abstract: Urban Digital Twins (UDTs) have become essential for managing cities and integrating complex, heterogeneous data from diverse sources. Creating UDTs involves challenges at multiple process stages, including acquiring accurate 3D source data, reconstructing high-fidelity 3D models, maintaining models' updates, and ensuring seamless interoperability to downstream tasks. Current datasets are usually…

    Submitted 13 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

    Comments: Submitted to the ISPRS Journal of Photogrammetry and Remote Sensing

  22. arXiv:2505.07062  [pdf, ps, other]

    cs.CV cs.AI

    Seed1.5-VL Technical Report

    Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng , et al. (172 additional authors not shown)

    Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed of a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluati…

    Submitted 11 May, 2025; originally announced May 2025.

  23. arXiv:2505.06918  [pdf, other]

    eess.IV cs.CV cs.LG

    Uni-AIMS: AI-Powered Microscopy Image Analysis

    Authors: Yanhui Hong, Nan Wang, Zhiyi Xia, Haoyi Tao, Xi Fang, Yiming Li, Jiankun Wang, Peng Jin, Xiaochen Cai, Shengyu Li, Ziqi Chen, Zezhong Zhang, Guolin Ke, Linfeng Zhang

    Abstract: This paper presents a systematic solution for the intelligent recognition and automatic analysis of microscopy images. We developed a data engine that generates high-quality annotated datasets through a combination of the collection of diverse microscopy images from experiments, synthetic data generation and a human-in-the-loop annotation process. To address the unique challenges of microscopy ima…

    Submitted 11 May, 2025; originally announced May 2025.

  24. arXiv:2505.06903  [pdf, ps, other]

    cs.CV

    CheXLearner: Text-Guided Fine-Grained Representation Learning for Progression Detection

    Authors: Yuanzhuo Wang, Junwen Duan, Xinyu Li, Jianxin Wang

    Abstract: Temporal medical image analysis is essential for clinical decision-making, yet existing methods either align images and text at a coarse level - causing potential semantic mismatches - or depend solely on visual information, lacking medical semantic integration. We present CheXLearner, the first end-to-end framework that unifies anatomical region detection, Riemannian manifold-based structure alig…

    Submitted 11 May, 2025; originally announced May 2025.

  25. arXiv:2505.06880  [pdf, ps, other]

    cs.SE

    Benchmarking and Revisiting Code Generation Assessment: A Mutation-Based Approach

    Authors: Longtian Wang, Tianlin Li, Xiaofei Xie, Yuhan Zhi, Jian Wang, Chao Shen

    Abstract: Code Large Language Models (CLLMs) have exhibited outstanding performance in program synthesis, attracting the focus of the research community. The evaluation of CLLM's program synthesis capability has generally relied on manually curated benchmarks. However, there is a substantial gap between real-world scenarios and benchmark settings. Existing benchmarks typically provide only a single input pr…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: 8 pages, 4 figures

  26. arXiv:2505.06791  [pdf, ps, other]

    cs.RO

    cpRRTC: GPU-Parallel RRT-Connect for Constrained Motion Planning

    Authors: Jiaming Hu, Jiawei Wang, Henrik Christensen

    Abstract: Motion planning is a fundamental problem in robotics that involves generating feasible trajectories for a robot to follow. Recent advances in parallel computing, particularly through CPU and GPU architectures, have significantly reduced planning times to the order of milliseconds. However, constrained motion planning especially using sampling based methods on GPUs remains underexplored. Prior work…

    Submitted 10 May, 2025; originally announced May 2025.

  27. arXiv:2505.06702  [pdf, ps, other]

    cs.HC

    Do Language Model Agents Align with Humans in Rating Visualizations? An Empirical Study

    Authors: Zekai Shao, Yi Shan, Yixuan He, Yuxuan Yao, Junhong Wang, Xiaolong, Zhang, Yu Zhang, Siming Chen

    Abstract: Large language models encode knowledge in various domains and demonstrate the ability to understand visualizations. They may also capture visualization design knowledge and potentially help reduce the cost of formative studies. However, it remains a question whether large language models are capable of predicting human feedback on visualizations. To investigate this question, we conducted three st…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: 14 pages, 8 figures

  28. arXiv:2505.06524  [pdf, ps, other]

    cs.CV

    Causal Prompt Calibration Guided Segment Anything Model for Open-Vocabulary Multi-Entity Segmentation

    Authors: Jingyao Wang, Jianqi Zhang, Wenwen Qiang, Changwen Zheng

    Abstract: Despite the strength of the Segment Anything Model (SAM), it struggles with generalization issues in open-vocabulary multi-entity segmentation (OVMS). Through empirical and causal analyses, we find that (i) the prompt bias is the primary cause of the generalization issues; (ii) this bias is closely tied to the task-irrelevant generating factors within the prompts, which act as confounders and affe…

    Submitted 10 May, 2025; originally announced May 2025.

  29. arXiv:2505.06520  [pdf, other]

    cs.LG cs.AI cs.CR

    PRUNE: A Patching Based Repair Framework for Certifiable Unlearning of Neural Networks

    Authors: Xuran Li, Jingyi Wang, Xiaohan Yuan, Peixin Zhang, Zhan Qin, Zhibo Wang, Kui Ren

    Abstract: It is often desirable to remove (a.k.a. unlearn) a specific part of the training data from a trained neural network model. A typical application scenario is to protect the data holder's right to be forgotten, which has been promoted by many recent regulation rules. Existing unlearning methods involve training alternative models with remaining data, which may be costly and challenging to verify fro…

    Submitted 10 May, 2025; originally announced May 2025.

  30. arXiv:2505.06497  [pdf, ps, other]

    cs.LG

    FedADP: Unified Model Aggregation for Federated Learning with Heterogeneous Model Architectures

    Authors: Jiacheng Wang, Hongtao Lv, Lei Liu

    Abstract: Traditional Federated Learning (FL) faces significant challenges in terms of efficiency and accuracy, particularly in heterogeneous environments where clients employ diverse model architectures and have varying computational resources. Such heterogeneity complicates the aggregation process, leading to performance bottlenecks and reduced model generalizability. To address these issues, we propose F…

    Submitted 9 May, 2025; originally announced May 2025.

  31. arXiv:2505.06240  [pdf, ps, other]

    eess.SP cs.IT

    Pinching-Antenna Assisted Simultaneous Wireless Information and Power Transfer

    Authors: Yixuan Li, Ji Wang, Yuanwei Liu, Zhiguo Ding

    Abstract: This letter introduces a novel pinching-antenna-system (PASS) assisted simultaneous wireless information and power transfer (SWIPT), where multiple pinching antennas (PAs) are strategically activated on a waveguide to facilitate information transmission to multiple information receivers (IRs) and power transfer to multiple energy receivers (ERs) simultaneously. Leveraging the single-waveguide arc…

    Submitted 26 April, 2025; originally announced May 2025.

  32. arXiv:2505.05950  [pdf, ps, other]

    cs.LG

    FloE: On-the-Fly MoE Inference on Memory-constrained GPU

    Authors: Yuxin Zhou, Zheng Li, Jun Zhang, Jue Wang, Yiping Wang, Zhongle Xie, Ke Chen, Lidan Shou

    Abstract: With the widespread adoption of Mixture-of-Experts (MoE) models, there is a growing demand for efficient inference on memory-constrained devices. While offloading expert parameters to CPU memory and loading activated experts on demand has emerged as a potential solution, the large size of activated experts overburdens the limited PCIe bandwidth, hindering the effectiveness in latency-sensitive sce…

    Submitted 11 May, 2025; v1 submitted 9 May, 2025; originally announced May 2025.

    Comments: Accepted by ICML 2025

  33. arXiv:2505.05901  [pdf, other]

    cs.CV cs.AI

    Examining the Source of Defects from a Mechanical Perspective for 3D Anomaly Detection

    Authors: Hanzhe Liang, Aoran Wang, Jie Zhou, Xin Jin, Can Gao, Jinbao Wang

    Abstract: In this paper, we explore a novel approach to 3D anomaly detection (AD) that goes beyond merely identifying anomalies based on structural characteristics. Our primary perspective is that most anomalies arise from unpredictable defective forces originating from both internal and external sources. To address these anomalies, we seek out opposing forces that can help correct them. Therefore, we intro…

    Submitted 15 May, 2025; v1 submitted 9 May, 2025; originally announced May 2025.

    Comments: 26 pages

  34. arXiv:2505.05834  [pdf, other]

    cs.CV

    Dual-level Fuzzy Learning with Patch Guidance for Image Ordinal Regression

    Authors: Chunlai Dong, Haochao Ying, Qibo Qiu, Jinhong Wang, Danny Chen, Jian Wu

    Abstract: Ordinal regression bridges regression and classification by assigning objects to ordered classes. While human experts rely on discriminative patch-level features for decisions, current approaches are limited by the availability of only image-level ordinal labels, overlooking fine-grained patch-level characteristics. In this paper, we propose a Dual-level Fuzzy Learning with Patch Guidance framewor…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: Accepted by IJCAI 2025

  35. arXiv:2505.05804  [pdf, other]

    cs.CV

    Describe Anything in Medical Images

    Authors: Xi Xiao, Yunbei Zhang, Thanh-Huy Nguyen, Ba-Thinh Lam, Janet Wang, Jihun Hamm, Tianyang Wang, Xingjian Li, Xiao Wang, Hao Xu, Tianming Liu, Min Xu

    Abstract: Localized image captioning has made significant progress with models like the Describe Anything Model (DAM), which can generate detailed region-specific descriptions without explicit region-text supervision. However, such capabilities have yet to be widely applied to specialized domains like medical imaging, where diagnostic interpretation relies on subtle regional findings rather than global unde…

    Submitted 9 May, 2025; originally announced May 2025.

  36. A Day in Their Shoes: Using LLM-Based Perspective-Taking Interactive Fiction to Reduce Stigma Toward Dirty Work

    Authors: Xiangzhe Yuan, Jiajun Wang, Qian Wan, Siying Hu

    Abstract: Occupations referred to as "dirty work" often face entrenched social stigma, which adversely affects the mental health of workers in these fields and impedes occupational equity. In this study, we propose a novel Interactive Fiction (IF) framework powered by Large Language Models (LLMs) to encourage perspective-taking and reduce biases against these stigmatized yet essential roles. Through an expe…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: Conference paper for FAccT '25

  37. arXiv:2505.05599  [pdf, ps, other]

    cs.CV cs.AI

    Enhancing Satellite Object Localization with Dilated Convolutions and Attention-aided Spatial Pooling

    Authors: Seraj Al Mahmud Mostafa, Chenxi Wang, Jia Yue, Yuta Hozumi, Jianwu Wang

    Abstract: Object localization in satellite imagery is particularly challenging due to the high variability of objects, low spatial resolution, and interference from noise and dominant features such as clouds and city lights. In this research, we focus on three satellite datasets: upper atmospheric Gravity Waves (GW), mesospheric Bores (Bore), and Ocean Eddies (OE), each presenting its own unique challenges.…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: This paper has been accepted to International conference on Advanced Machine Learning and Data Science (AMLDS) 2025

  38. arXiv:2505.05517  [pdf, other]

    cs.CV cs.LG cs.RO

    Web2Grasp: Learning Functional Grasps from Web Images of Hand-Object Interactions

    Authors: Hongyi Chen, Yunchao Yao, Yufei Ye, Zhixuan Xu, Homanga Bharadhwaj, Jiashun Wang, Shubham Tulsiani, Zackory Erickson, Jeffrey Ichnowski

    Abstract: Functional grasp is essential for enabling dexterous multi-finger robot hands to manipulate objects effectively. However, most prior work either focuses on power grasping, which simply involves holding an object still, or relies on costly teleoperated robot demonstrations to teach robots how to grasp each object functionally. Instead, we propose extracting human grasp information from web images s…

    Submitted 12 May, 2025; v1 submitted 7 May, 2025; originally announced May 2025.

  39. arXiv:2505.05512  [pdf, other]

    cs.CV cs.RO

    Occupancy World Model for Robots

    Authors: Zhang Zhang, Qiang Zhang, Wei Cui, Shuai Shi, Yijie Guo, Gang Han, Wen Zhao, Jingkai Sun, Jiahang Cao, Jiaxu Wang, Hao Cheng, Xiaozhu Ju, Zhengping Che, Renjing Xu, Jian Tang

    Abstract: Understanding and forecasting the scene evolutions deeply affect the exploration and decision of embodied agents. While traditional methods simulate scene evolutions through trajectory prediction of potential instances, current works use the occupancy world model as a generative framework for describing fine-grained overall scene dynamics. However, existing methods cluster on the outdoor structure…

    Submitted 7 May, 2025; originally announced May 2025.

  40. arXiv:2505.04921  [pdf, other]

    cs.CV cs.CL

    Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

    Authors: Yunxin Li, Zhenyu Liu, Zitao Li, Xuanyu Zhang, Zhenran Xu, Xinyu Chen, Haoyuan Shi, Shenyuan Jiang, Xintong Wang, Jifang Wang, Shouzheng Huang, Xinping Zhao, Borui Jiang, Lanqing Hong, Longyue Wang, Zhuotao Tian, Baoxing Huai, Wenhan Luo, Weihua Luo, Zheng Zhang, Baotian Hu, Min Zhang

    Abstract: Reasoning lies at the heart of intelligence, shaping the ability to make decisions, draw conclusions, and generalize across domains. In artificial intelligence, as systems increasingly operate in open, uncertain, and multimodal environments, reasoning becomes essential for enabling robust and adaptive behavior. Large Multimodal Reasoning Models (LMRMs) have emerged as a promising paradigm, integra…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 75 pages, 10 figures; Project: https://github.com/HITsz-TMG/Awesome-Large-Multimodal-Reasoning-Models

  41. arXiv:2505.04892  [pdf, other]

    cs.DS

    PSSketch: Finding Persistent and Sparse Flow with High Accuracy and Efficiency

    Authors: Jiayao Wang, Qilong Shi, Xiyan Liang, Han Wang, Wenjun Li, Ziling Wei, Weizhe Zhang, Shuhui Chen

    Abstract: Finding persistent sparse (PS) flow is critical to early warning of many threats. Previous works have predominantly focused on either heavy or persistent flows, with limited attention given to PS flows. Although some recent studies pay attention to PS flows, they struggle to establish an objective criterion due to insufficient data-driven observations, resulting in reduced accuracy. In this paper,…

    Submitted 7 May, 2025; originally announced May 2025.

  42. arXiv:2505.04410  [pdf, other]

    cs.CV

    DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception

    Authors: Junjie Wang, Bin Chen, Yulin Li, Bin Kang, Yichi Chen, Zhuotao Tian

    Abstract: Dense visual prediction tasks have been constrained by their reliance on predefined categories, limiting their applicability in real-world scenarios where visual concepts are unbounded. While Vision-Language Models (VLMs) like CLIP have shown promise in open-vocabulary tasks, their direct application to dense prediction often leads to suboptimal performance due to limitations in local feature repr…

    Submitted 7 May, 2025; originally announced May 2025.

  43. arXiv:2505.04396  [pdf, other

    cs.LG physics.ao-ph

    Supporting renewable energy planning and operation with data-driven high-resolution ensemble weather forecast

    Authors: Jingnan Wang, Jie Chao, Shangshang Yang, Congyi Nai, Kaijun Ren, Kefeng Deng, Xi Chen, Yaxin Liu, Hanqiuzi Wen, Ziniu Xiao, Lifeng Zhang, Xiaodong Wang, Jiping Guan, Baoxiang Pan

    Abstract: The planning and operation of renewable energy, especially wind power, depend crucially on accurate, timely, and high-resolution weather information. Coarse-grid global numerical weather forecasts are typically downscaled to meet these requirements, introducing challenges of scale inconsistency, process representation error, computation cost, and entanglement of distinct uncertainty sources from c…

    Submitted 7 May, 2025; originally announced May 2025.

  44. arXiv:2505.04098  [pdf, other

    cs.NI eess.SP

    Satellite-Assisted Low-Altitude Economy Networking: Concepts, Applications, and Opportunities

    Authors: Shizhao He, Jiacheng Wang, Ying-Chang Liang, Geng Sun, Dusit Niyato

    Abstract: The low-altitude economy (LAE) is a new economic paradigm that leverages low-altitude vehicles (LAVs) to perform diverse missions across diverse areas. To support the operations of LAE, it is essential to establish LAE networks that enable LAV management and communications. Existing studies mainly reuse terrestrial networks to construct LAE networks. However, the limited coverage of terrestrial net…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 9 pages, 4 figures

  45. arXiv:2505.04084  [pdf, other

    cs.SE cs.AI

    An Empirical Study of OpenAI API Discussions on Stack Overflow

    Authors: Xiang Chen, Jibin Wang, Chaoyang Gao, Xiaolin Ju, Zhanqi Cui

    Abstract: The rapid advancement of large language models (LLMs), represented by OpenAI's GPT series, has significantly impacted various domains such as natural language processing, software development, education, healthcare, finance, and scientific research. However, OpenAI APIs introduce unique challenges that differ from traditional APIs, such as the complexities of prompt engineering, token-based cost m…

    Submitted 6 May, 2025; originally announced May 2025.

  46. arXiv:2505.04003  [pdf, ps, other

    eess.IV cs.CV

    Prototype-Based Information Compensation Network for Multi-Source Remote Sensing Data Classification

    Authors: Feng Gao, Sheng Liu, Chuanzheng Gong, Xiaowei Zhou, Jiayi Wang, Junyu Dong, Qian Du

    Abstract: Multi-source remote sensing data joint classification aims to provide accuracy and reliability of land cover classification by leveraging the complementary information from multiple data sources. Existing methods confront two challenges: inter-frequency multi-source feature coupling and inconsistency of complementary information exploration. To solve these issues, we present a Prototype-based Info…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Accepted by IEEE TGRS 2025

  47. arXiv:2505.03822  [pdf

    cs.LG cs.AI

    DRSLF: Double Regularized Second-Order Low-Rank Representation for Web Service QoS Prediction

    Authors: Hao Wu, Jialiang Wang

    Abstract: Quality-of-Service (QoS) data plays a crucial role in cloud service selection. Since users cannot access all services, QoS can be represented by a high-dimensional and incomplete (HDI) matrix. Latent factor analysis (LFA) models have been proven effective as low-rank representation techniques for addressing this issue. However, most LFA models rely on first-order optimizers and use L2-norm regular…

    Submitted 2 May, 2025; originally announced May 2025.

  48. arXiv:2505.03667  [pdf, other

    cs.CV

    Distribution-Conditional Generation: From Class Distribution to Creative Generation

    Authors: Fu Feng, Yucheng Xie, Xu Yang, Jing Wang, Xin Geng

    Abstract: Text-to-image (T2I) diffusion models are effective at producing semantically aligned images, but their reliance on training data distributions limits their ability to synthesize truly novel, out-of-distribution concepts. Existing methods typically enhance creativity by combining pairs of known concepts, yielding compositions that, while out-of-distribution, remain linguistically describable and bo…

    Submitted 6 May, 2025; originally announced May 2025.

  49. arXiv:2505.03533  [pdf, other

    cs.LG

    Small-Scale-Fading-Aware Resource Allocation in Wireless Federated Learning

    Authors: Jiacheng Wang, Le Liang, Hao Ye, Chongtao Guo, Shi Jin

    Abstract: Judicious resource allocation can effectively enhance federated learning (FL) training performance in wireless networks by addressing both system and statistical heterogeneity. However, existing strategies typically rely on block fading assumptions, which overlook rapid channel fluctuations within each round of FL gradient uploading, leading to a degradation in FL training performance. Therefore,…

    Submitted 6 May, 2025; originally announced May 2025.

  50. arXiv:2505.03467  [pdf

    cs.CL

    Uncertainty-Aware Large Language Models for Explainable Disease Diagnosis

    Authors: Shuang Zhou, Jiashuo Wang, Zidu Xu, Song Wang, David Brauer, Lindsay Welton, Jacob Cogan, Yuen-Hei Chung, Lei Tian, Zaifu Zhan, Yu Hou, Mingquan Lin, Genevieve B. Melton, Rui Zhang

    Abstract: Explainable disease diagnosis, which leverages patient information (e.g., signs and symptoms) and computational models to generate probable diagnoses and reasonings, offers clear clinical values. However, when clinical notes encompass insufficient evidence for a definite diagnosis, such as the absence of definitive symptoms, diagnostic uncertainty usually arises, increasing the risk of misdiagnosi…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 22 pages, 8 figures