
Showing 1–50 of 72,084 results for author: Wang

Searching in archive cs.
  1. arXiv:2506.18330  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning

    Authors: Lixin Wu, Na Cai, Qiao Cheng, Jiachen Wang, Yitao Duan

    Abstract: We introduce Confucius3-Math, an open-source large language model with 14B parameters that (1) runs efficiently on a single consumer-grade GPU; (2) achieves SOTA performances on a range of mathematical reasoning tasks, outperforming many models with significantly larger sizes. In particular, as part of our mission to enhancing education and knowledge dissemination with AI, Confucius3-Math is speci…

    Submitted 23 June, 2025; originally announced June 2025.

  2. arXiv:2506.18309  [pdf, ps, other]

    cs.IR cs.AI

    LettinGo: Explore User Profile Generation for Recommendation System

    Authors: Lu Wang, Di Zhang, Fangkai Yang, Pu Zhao, Jianfeng Liu, Yuefeng Zhan, Hao Sun, Qingwei Lin, Weiwei Deng, Dongmei Zhang, Feng Sun, Qi Zhang

    Abstract: User profiling is pivotal for recommendation systems, as it transforms raw user interaction data into concise and structured representations that drive personalized recommendations. While traditional embedding-based profiles lack interpretability and adaptability, recent advances with large language models (LLMs) enable text-based profiles that are semantically richer and more transparent. However…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: 11 pages, 3 figures

  3. arXiv:2506.18308  [pdf]

    cs.HC

    Supporting Car-Following Behavior through V2V-Based Beyond-Visual-Range Information Display

    Authors: Feiqi Gu, Zhixiong Wang, Zhenyu Wang, Dengbo He

    Abstract: Rear-end collisions constituted a large portion of crashes on the road, despite efforts to mitigate rear-end collisions, such as forward collision warnings. The chance of rear-end collisions is closely related to drivers' car-following (CF) behaviors in the traffic flow. Given that drivers may rely on more than the information of the direct lead vehicle (DLV) when making CF decisions, expanding dr…

    Submitted 23 June, 2025; originally announced June 2025.

  4. arXiv:2506.18254  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    RLPR: Extrapolating RLVR to General Domains without Verifiers

    Authors: Tianyu Yu, Bo Ji, Shouli Wang, Shu Yao, Zefan Wang, Ganqu Cui, Lifan Yuan, Ning Ding, Yuan Yao, Zhiyuan Liu, Maosong Sun, Tat-Seng Chua

    Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) demonstrates promising potential in advancing the reasoning capabilities of LLMs. However, its success remains largely confined to mathematical and code domains. This primary limitation stems from the heavy reliance on domain-specific verifiers, which results in prohibitive complexity and limited scalability. To address the challenge, our key o…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: Project Website: https://github.com/openbmb/RLPR

  5. arXiv:2506.18246  [pdf, ps, other]

    cs.CV

    Referring Expression Instance Retrieval and A Strong End-to-End Baseline

    Authors: Xiangzhao Hao, Kuan Zhu, Hongyu Guo, Haiyun Guo, Ming Tang, JinQiao Wang

    Abstract: Natural language querying of visual content underpins many vision-language tasks, typically categorized by text granularity and visual search scope. Text-Image Retrieval (TIR) retrieves whole images using coarse descriptions, while Referring Expression Comprehension (REC) localizes objects using fine-grained expressions within a single image. However, real-world scenarios often require both instan…

    Submitted 22 June, 2025; originally announced June 2025.

  6. arXiv:2506.18240  [pdf, ps, other]

    cs.LG cs.AI physics.optics

    Quantum-Classical Hybrid Quantized Neural Network

    Authors: Wenxin Li, Chuan Wang, Hongdong Zhu, Qi Gao, Yin Ma, Hai Wei, Kai Wen

    Abstract: Here in this work, we present a novel Quadratic Binary Optimization (QBO) model for quantized neural network training, enabling the use of arbitrary activation and loss functions through spline interpolation. We introduce Forward Interval Propagation (FIP), a method designed to tackle the challenges of non-linearity and the multi-layer composite structure in neural networks by discretizing activat…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: 30 pages, 5 figures, comments are welcome

  7. arXiv:2506.18237  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    AdapThink: Adaptive Thinking Preferences for Reasoning Language Model

    Authors: Xu Wan, Wei Wang, Wenyue Xu, Wotao Yin, Jie Song, Mingyang Sun

    Abstract: Reinforcement Learning (RL)-based post-training has significantly advanced the complex reasoning capabilities of language models, fostering sophisticated self-reflection processes. However, this "slow thinking" paradigm presents a critical challenge to reasoning efficiency: models may expend excessive computation on simple questions and shift reasoning prematurely for complex ones. Previous mech…

    Submitted 22 June, 2025; originally announced June 2025.

  8. arXiv:2506.18233  [pdf, ps, other]

    cs.AI

    The 4th Dimension for Scaling Model Size

    Authors: Ruike Zhu, Hanwen Zhang, Tianyu Shi, Chi Wang, Tianyi Zhou, Zengyi Qin

    Abstract: Scaling the size of large language models typically involves three dimensions: depth, width, and the number of parameters. In this work, we explore a fourth dimension, virtual logical depth (VLD), which increases the effective algorithmic depth without changing the overall parameter count by reusing parameters within the model. Although parameter reuse is not a new concept, its potential and chara…

    Submitted 22 June, 2025; originally announced June 2025.

  9. arXiv:2506.18204  [pdf, ps, other]

    cs.CV cs.AI

    Multimodal Fusion SLAM with Fourier Attention

    Authors: Youjie Zhou, Guofeng Mei, Yiming Wang, Yi Wan, Fabio Poiesi

    Abstract: Visual SLAM is particularly challenging in environments affected by noise, varying lighting conditions, and darkness. Learning-based optical flow algorithms can leverage multiple modalities to address these challenges, but traditional optical flow-based visual SLAM approaches often require significant computational resources. To overcome this limitation, we propose FMF-SLAM, an efficient multimodal…

    Submitted 22 June, 2025; originally announced June 2025.

  10. arXiv:2506.18178  [pdf, ps, other]

    cs.RO

    Integrating LLMs and Digital Twins for Adaptive Multi-Robot Task Allocation in Construction

    Authors: Min Deng, Bo Fu, Lingyao Li, Xi Wang

    Abstract: Multi-robot systems are emerging as a promising solution to the growing demand for productivity, safety, and adaptability across industrial sectors. However, effectively coordinating multiple robots in dynamic and uncertain environments, such as construction sites, remains a challenge, particularly due to unpredictable factors like material delays, unexpected site conditions, and weather-induced d…

    Submitted 22 June, 2025; originally announced June 2025.

  11. arXiv:2506.18172  [pdf, ps, other]

    cs.CV cs.AI

    STACT-Time: Spatio-Temporal Cross Attention for Cine Thyroid Ultrasound Time Series Classification

    Authors: Irsyad Adam, Tengyue Zhang, Shrayes Raman, Zhuyu Qiu, Brandon Taraku, Hexiang Feng, Sile Wang, Ashwath Radhachandran, Shreeram Athreya, Vedrana Ivezic, Peipei Ping, Corey Arnold, William Speier

    Abstract: Thyroid cancer is among the most common cancers in the United States. Thyroid nodules are frequently detected through ultrasound (US) imaging, and some require further evaluation via fine-needle aspiration (FNA) biopsy. Despite its effectiveness, FNA often leads to unnecessary biopsies of benign nodules, causing patient discomfort and anxiety. To address this, the American College of Radiology Thy…

    Submitted 22 June, 2025; originally announced June 2025.

  12. arXiv:2506.18145  [pdf, ps, other]

    cs.LG cs.AI

    Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection

    Authors: Zheng Zhan, Liliang Ren, Shuohang Wang, Liyuan Liu, Yang Liu, Yeyun Gong, Yanzhi Wang, Yelong Shen

    Abstract: Linear State Space Models (SSMs) offer remarkable performance gains in efficient sequence modeling, with constant inference-time computation and memory complexity. Recent advances, such as Mamba, further enhance SSMs with input-dependent gating and hardware-aware implementations, positioning them as strong alternatives to Transformers for long sequence modeling. However, efficiently scaling the ex…

    Submitted 22 June, 2025; originally announced June 2025.

  13. arXiv:2506.18134  [pdf, ps, other]

    cs.CV

    Targeted False Positive Synthesis via Detector-guided Adversarial Diffusion Attacker for Robust Polyp Detection

    Authors: Quan Zhou, Gan Luo, Qiang Hu, Qingyong Zhang, Jinhua Zhang, Yinjiao Tian, Qiang Li, Zhiwei Wang

    Abstract: Polyp detection is crucial for colorectal cancer screening, yet existing models are limited by the scale and diversity of available data. While generative models show promise for data augmentation, current methods mainly focus on enhancing polyp diversity, often overlooking the critical issue of false positives. In this paper, we address this gap by proposing an adversarial diffusion framework to…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: Early Accepted by MICCAI 2025

  14. arXiv:2506.18123  [pdf, ps, other]

    cs.RO cs.LG

    RoboArena: Distributed Real-World Evaluation of Generalist Robot Policies

    Authors: Pranav Atreya, Karl Pertsch, Tony Lee, Moo Jin Kim, Arhan Jain, Artur Kuramshin, Clemens Eppner, Cyrus Neary, Edward Hu, Fabio Ramos, Jonathan Tremblay, Kanav Arora, Kirsty Ellis, Luca Macesanu, Matthew Leonard, Meedeum Cho, Ozgur Aslan, Shivin Dass, Jie Wang, Xingfang Yuan, Xuning Yang, Abhishek Gupta, Dinesh Jayaraman, Glen Berseth, Kostas Daniilidis, et al. (5 additional authors not shown)

    Abstract: Comprehensive, unbiased, and comparable evaluation of modern generalist policies is uniquely challenging: existing approaches for robot benchmarking typically rely on heavy standardization, either by specifying fixed evaluation tasks and environments, or by hosting centralized "robot challenges", and do not readily scale to evaluating generalist policies across a broad range of tasks and environ…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: Website: https://robo-arena.github.io/

  15. CT Radiomics-Based Explainable Machine Learning Model for Accurate Differentiation of Malignant and Benign Endometrial Tumors: A Two-Center Study

    Authors: Tingrui Zhang, Honglin Wu, Zekun Jiang, Yingying Wang, Rui Ye, Huiming Ni, Chang Liu, Jin Cao, Xuan Sun, Rong Shao, Xiaorong Wei, Yingchun Sun

    Abstract: Aimed to develop and validate a CT radiomics-based explainable machine learning model for diagnosing malignancy and benignity specifically in endometrial cancer (EC) patients. A total of 83 EC patients from two centers, including 46 with malignant and 37 with benign conditions, were included, with data split into a training set (n=59) and a testing set (n=24). The regions of interest (ROIs) were m…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: 30 pages, 5 figures, 3 tables

  16. arXiv:2506.18102  [pdf, ps, other]

    cs.CL

    InspireDebate: Multi-Dimensional Subjective-Objective Evaluation-Guided Reasoning and Optimization for Debating

    Authors: Fuyu Wang, Jiangtong Li, Kun Zhu, Changjun Jiang

    Abstract: With the rapid advancements in large language models (LLMs), debating tasks, such as argument quality assessment and debate process simulation, have made significant progress. However, existing LLM-based debating systems focus on responding to specific arguments while neglecting objective assessments such as authenticity and logical validity. Furthermore, these systems lack a structured approach t…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: 20 pages; Accepted to ACL 2025 Main

  17. arXiv:2506.18096  [pdf, ps, other]

    cs.AI

    Deep Research Agents: A Systematic Examination And Roadmap

    Authors: Yuxuan Huang, Yihang Chen, Haozheng Zhang, Kang Li, Meng Fang, Linyi Yang, Xiaoguang Li, Lifeng Shang, Songcen Xu, Jianye Hao, Kun Shao, Jun Wang

    Abstract: The rapid progress of Large Language Models (LLMs) has given rise to a new category of autonomous AI systems, referred to as Deep Research (DR) agents. These agents are designed to tackle complex, multi-turn informational research tasks by leveraging a combination of dynamic reasoning, adaptive long-horizon planning, multi-hop information retrieval, iterative tool use, and the generation of struct…

    Submitted 22 June, 2025; originally announced June 2025.

  18. arXiv:2506.18095  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation

    Authors: Junying Chen, Zhenyang Cai, Pengcheng Chen, Shunian Chen, Ke Ji, Xidong Wang, Yunjin Yang, Benyou Wang

    Abstract: Recent advances in multimodal generative models have unlocked photorealistic, instruction-aligned image generation, yet leading systems like GPT-4o-Image remain proprietary and inaccessible. To democratize these capabilities, we present ShareGPT-4o-Image, the first dataset comprising 45K text-to-image and 46K text-and-image-to-image data, all synthesized using GPT-4o's image generation capabilitie…

    Submitted 22 June, 2025; originally announced June 2025.

  19. arXiv:2506.18088  [pdf, ps, other]

    cs.RO cs.AI cs.CL cs.CV cs.MA

    RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation

    Authors: Tianxing Chen, Zanxin Chen, Baijun Chen, Zijian Cai, Yibin Liu, Qiwei Liang, Zixuan Li, Xianliang Lin, Yiheng Ge, Zhenyu Gu, Weiliang Deng, Yubin Guo, Tian Nian, Xuanbing Xie, Qiangyu Chen, Kailun Su, Tianling Xu, Guodong Liu, Mengkang Hu, Huan-ang Gao, Kaixuan Wang, Zhixuan Liang, Yusen Qin, Xiaokang Yang, Ping Luo, et al. (1 additional author not shown)

    Abstract: Simulation-based data synthesis has emerged as a powerful paradigm for enhancing real-world robotic manipulation. However, existing synthetic datasets remain insufficient for robust bimanual manipulation due to two challenges: (1) the lack of an efficient, scalable data generation method for novel tasks, and (2) oversimplified simulation environments that fail to capture real-world complexity. We…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: Project Page: https://robotwin-platform.github.io/

  20. arXiv:2506.18084  [pdf, ps, other]

    cs.CV

    TEM^3-Learning: Time-Efficient Multimodal Multi-Task Learning for Advanced Assistive Driving

    Authors: Wenzhuo Liu, Yicheng Qiao, Zhen Wang, Qiannan Guo, Zilong Chen, Meihua Zhou, Xinran Li, Letian Wang, Zhiwei Li, Huaping Liu, Wenshuo Wang

    Abstract: Multi-task learning (MTL) can advance assistive driving by exploring inter-task correlations through shared representations. However, existing methods face two critical limitations: single-modality constraints limiting comprehensive scene understanding and inefficient architectures impeding real-time deployment. This paper proposes TEM^3-Learning (Time-Efficient Multimodal Multi-task Learning), a…

    Submitted 22 June, 2025; originally announced June 2025.

  21. arXiv:2506.18071  [pdf, ps, other]

    cs.CV cs.AI

    MUPA: Towards Multi-Path Agentic Reasoning for Grounded Video Question Answering

    Authors: Jisheng Dang, Huilin Song, Junbin Xiao, Bimei Wang, Han Peng, Haoxuan Li, Xun Yang, Meng Wang, Tat-Seng Chua

    Abstract: Grounded Video Question Answering (Grounded VideoQA) requires aligning textual answers with explicit visual evidence. However, modern multimodal models often rely on linguistic priors and spurious correlations, resulting in poorly grounded predictions. In this work, we propose MUPA, a cooperative MUlti-Path Agentic approach that unifies video grounding, question answering, answer reflection and ag…

    Submitted 22 June, 2025; originally announced June 2025.

  22. arXiv:2506.18067  [pdf, ps, other]

    eess.SP cs.IT

    Cooperative Bistatic ISAC Systems for Low-Altitude Economy

    Authors: Zhenkun Zhang, Yining Xu, Cunhua Pan, Hong Ren, Yiming Yu, Jiangzhou Wang

    Abstract: The burgeoning low-altitude economy (LAE) necessitates integrated sensing and communication (ISAC) systems capable of high-accuracy multi-target localization and velocity estimation under hardware and coverage constraints inherent in conventional ISAC architectures. This paper addresses these challenges by proposing a cooperative bistatic ISAC framework within MIMO-OFDM cellular networks, enabling…

    Submitted 22 June, 2025; originally announced June 2025.

  23. arXiv:2506.18052  [pdf, ps, other]

    cs.SI

    A Survey on False Information Detection: From A Perspective of Propagation on Social Networks

    Authors: Kun Xie, Sibo Wang

    Abstract: The proliferation of false information in the digital age has become a pressing concern, necessitating the development of effective and robust detection methods. This paper offers a comprehensive review of existing false information detection techniques, approached from a novel perspective that emphasizes the propagation characteristics of misinformation. We introduce a new taxonomy that categoriz…

    Submitted 22 June, 2025; originally announced June 2025.

  24. arXiv:2506.18050  [pdf, ps, other]

    cs.SE

    VFArchē: A Dual-Mode Framework for Locating Vulnerable Functions in Open-Source Software

    Authors: Lyuye Zhang, Jian Zhang, Kaixuan Li, Chong Wang, Chengwei Liu, Jiahui Wu, Sen Chen, Yaowen Zheng, Yang Liu

    Abstract: Software Composition Analysis (SCA) has become pivotal in addressing vulnerabilities inherent in software project dependencies. In particular, reachability analysis is increasingly used in Open-Source Software (OSS) projects to identify reachable vulnerabilities (e.g., CVEs) through call graphs, enabling a focus on exploitable risks. Performing reachability analysis typically requires the vulnerab…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: 15 pages

  25. arXiv:2506.18048  [pdf, ps, other]

    cs.CV

    CLGRPO: Reasoning Ability Enhancement for Small VLMs

    Authors: Fanyi Wang, Binzhi Dong, Haotian Hu, Jinjin Xu, Zhiwang Zhang

    Abstract: Small Vision Language Models (SVLMs) generally refer to models with parameter sizes less than or equal to 2B. Their low cost and power consumption characteristics confer high commercial value. However, their reasoning abilities are limited by the number of parameters. To address this issue, this paper proposes a post-training optimization paradigm called the Incremental Training Strategy to enhanc…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: 11 pages, 5 figures

  26. arXiv:2506.18042  [pdf, ps, other]

    cs.CV

    CmFNet: Cross-modal Fusion Network for Weakly-supervised Segmentation of Medical Images

    Authors: Dongdong Meng, Sheng Li, Hao Wu, Suqing Tian, Wenjun Ma, Guoping Wang, Xueqing Yan

    Abstract: Accurate automatic medical image segmentation relies on high-quality, dense annotations, which are costly and time-consuming. Weakly supervised learning provides a more efficient alternative by leveraging sparse and coarse annotations instead of dense, precise ones. However, segmentation performance degradation and overfitting caused by sparse annotations remain key challenges. To address these is…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: 10 pages, 6 figures

  27. arXiv:2506.18028  [pdf, ps, other]

    cs.CV

    MiCo: Multiple Instance Learning with Context-Aware Clustering for Whole Slide Image Analysis

    Authors: Junjian Li, Hulin Kuang, Jin Liu, Hailin Yue, Mengshen He, Jianxin Wang

    Abstract: Multiple instance learning (MIL) has shown significant promise in histopathology whole slide image (WSI) analysis for cancer diagnosis and prognosis. However, the inherent spatial heterogeneity of WSIs presents critical challenges, as morphologically similar tissue types are often dispersed across distant anatomical regions. Conventional MIL methods struggle to model these scattered tissue distrib…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: MICCAI 2025

  28. arXiv:2506.18023  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    PP-DocBee2: Improved Baselines with Efficient Data for Multimodal Document Understanding

    Authors: Kui Huang, Xinrong Chen, Wenyu Lv, Jincheng Liao, Guanzhong Wang, Yi Liu

    Abstract: This report introduces PP-DocBee2, an advanced version of the PP-DocBee, designed to enhance multimodal document understanding. Built on a large multimodal model architecture, PP-DocBee2 addresses the limitations of its predecessor through key technological improvements, including enhanced synthetic data quality, improved visual feature fusion strategy, and optimized inference methodologies. These…

    Submitted 22 June, 2025; originally announced June 2025.

  29. arXiv:2506.18019  [pdf, ps, other]

    cs.AI

    Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities

    Authors: Yuanchen Bei, Weizhi Zhang, Siwen Wang, Weizhi Chen, Sheng Zhou, Hao Chen, Yong Li, Jiajun Bu, Shirui Pan, Yizhou Yu, Irwin King, Fakhri Karray, Philip S. Yu

    Abstract: AI agents have experienced a paradigm shift, from early dominance by reinforcement learning (RL) to the rise of agents powered by large language models (LLMs), and now further advancing towards a synergistic fusion of RL and LLM capabilities. This progression has endowed AI agents with increasingly strong abilities. Despite these advances, to accomplish complex real-world tasks, agents are require…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: 20 pages, 7 figures

  30. arXiv:2506.18016  [pdf, ps, other]

    cs.RO cs.AI

    ADA-DPM: A Neural Descriptors-based Adaptive Noise Point Filtering Strategy for SLAM

    Authors: Yongxin Shao, Binrui Wang, Aihong Tan

    Abstract: LiDAR SLAM has demonstrated significant application value in various fields, including mobile robot navigation and high-precision map construction. However, existing methods often need to make a trade-off between positioning accuracy and system robustness when faced with dynamic object interference, point cloud noise, and unstructured environments. To address this challenge, we propose an adaptive…

    Submitted 22 June, 2025; originally announced June 2025.

  31. arXiv:2506.18013  [pdf, ps, other]

    cs.DB cs.DS

    Dual-Hierarchy Labelling: Scaling Up Distance Queries on Dynamic Road Networks

    Authors: Muhammad Farhan, Henning Koehler, Qing Wang

    Abstract: Computing the shortest-path distance between any two given vertices in road networks is an important problem. A tremendous amount of research has been conducted to address this problem, most of which are limited to static road networks. Since road networks undergo various real-time traffic conditions, there is a pressing need to address this problem for dynamic road networks. Existing state-of-the…

    Submitted 22 June, 2025; originally announced June 2025.

  32. arXiv:2506.18006  [pdf, ps, other]

    cs.CV

    OSDMamba: Enhancing Oil Spill Detection from Remote Sensing Images Using Selective State Space Model

    Authors: Shuaiyu Chen, Fu Wang, Peng Ren, Chunbo Luo, Zeyu Fu

    Abstract: Semantic segmentation is commonly used for Oil Spill Detection (OSD) in remote sensing images. However, the limited availability of labelled oil spill samples and class imbalance present significant challenges that can reduce detection accuracy. Furthermore, most existing methods, which rely on convolutional neural networks (CNNs), struggle to detect small oil spill areas due to their limited rece…

    Submitted 22 June, 2025; originally announced June 2025.

  33. arXiv:2506.17968  [pdf, ps, other]

    cs.LG cs.AI cs.CV math.PR stat.ML

    h-calibration: Rethinking Classifier Recalibration with Probabilistic Error-Bounded Objective

    Authors: Wenjian Huang, Guiping Cao, Jiahao Xia, Jingkun Chen, Hao Wang, Jianguo Zhang

    Abstract: Deep neural networks have demonstrated remarkable performance across numerous learning tasks but often suffer from miscalibration, resulting in unreliable probability outputs. This has inspired many recent works on mitigating miscalibration, particularly through post-hoc recalibration methods that aim to obtain calibrated probabilities without sacrificing the classification performance of pre-trai…

    Submitted 22 June, 2025; originally announced June 2025.

  34. arXiv:2506.17964  [pdf, ps, other]

    cs.CE

    Learning from the Storm: A Multivariate Machine Learning Approach to Predicting Hurricane-Induced Economic Losses

    Authors: Bolin Shen, Eren Erman Ozguven, Yue Zhao, Guang Wang, Yiqun Xie, Yushun Dong

    Abstract: Florida is particularly vulnerable to hurricanes, which frequently cause substantial economic losses. While prior studies have explored specific contributors to hurricane-induced damage, few have developed a unified framework capable of integrating a broader range of influencing factors to comprehensively assess the sources of economic loss. In this study, we propose a comprehensive modeling frame…

    Submitted 22 June, 2025; originally announced June 2025.

  35. arXiv:2506.17960  [pdf, ps, other]

    cs.RO cs.AI

    GeNIE: A Generalizable Navigation System for In-the-Wild Environments

    Authors: Jiaming Wang, Diwen Liu, Jizhuo Chen, Jiaxuan Da, Nuowen Qian, Tram Minh Man, Harold Soh

    Abstract: Reliable navigation in unstructured, real-world environments remains a significant challenge for embodied agents, especially when operating across diverse terrains, weather conditions, and sensor configurations. In this paper, we introduce GeNIE (Generalizable Navigation System for In-the-Wild Environments), a robust navigation framework designed for global deployment. GeNIE integrates a generaliz…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: 8 pages, 5 figures. Jiaming Wang, Diwen Liu, and Jizhuo Chen contributed equally

  36. arXiv:2506.17945  [pdf, ps, other]

    cs.MA

    Optimization of Flying Ad Hoc Network Topology and Collaborative Path Planning for Multiple UAVs

    Authors: Ming He, Peizhao Wang, Haihua Chen, Bin Sun, Hongpeng Wang

    Abstract: Multiple unmanned aerial vehicles (UAVs) play a vital role in monitoring and data collection in wide area environments with harsh conditions. In most scenarios, issues such as real-time data retrieval and real-time UAV positioning are often disregarded, essentially neglecting the communication constraints. In this paper, we comprehensively address both the coverage of the target area and the data…

    Submitted 22 June, 2025; originally announced June 2025.

  37. arXiv:2506.17930  [pdf, ps, other]

    cs.AI cs.CL cs.LG cs.NE cs.RO

    Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective

    Authors: Jianyu Wang, Zhiqiang Hu, Lidong Bing

    Abstract: We propose a novel prompt design paradigm that challenges conventional wisdom in large language model (LLM) prompting. While conventional wisdom prioritizes well-crafted instructions and demonstrations for in-context learning (ICL), we show that pruning random demonstrations into seemingly incoherent "gibberish" can remarkably improve performance across diverse tasks. Notably, the "gibberish" alwa…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: ICML 2025, and Code will be released at: https://github.com/jianyu-cs/PromptQuine/

    Journal ref: Forty-second International Conference on Machine Learning, 2025

  38. arXiv:2506.17916  [pdf, ps, other]

    cs.DS math.PR

    Semirandom Planted Clique via 1-norm Isometry Property

    Authors: Venkatesan Guruswami, Hsin-Po Wang

    Abstract: We give a polynomial-time algorithm that finds a planted clique of size $k \ge \sqrt{n \log n}$ in the semirandom model, improving the state-of-the-art $\sqrt{n} (\log n)^2$ bound. This $\textit{semirandom planted clique problem}$ concerns finding the planted subset $S$ of $k$ vertices of a graph $G$ on $V$, where the induced subgraph $G[S]$ is complete, the cut edges in $G[S; V \setminus S]$ are…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: 13 pages, 2 figures, IPCO 2025

  39. arXiv:2506.17912  [pdf, ps, other]

    cs.CV cs.MM

    PlanMoGPT: Flow-Enhanced Progressive Planning for Text to Motion Synthesis

    Authors: Chuhao Jin, Haosen Li, Bingzi Zhang, Che Liu, Xiting Wang, Ruihua Song, Wenbing Huang, Ying Qin, Fuzheng Zhang, Di Zhang

    Abstract: Recent advances in large language models (LLMs) have enabled breakthroughs in many multimodal generation tasks, but a significant performance gap still exists in text-to-motion generation, where LLM-based methods lag far behind non-LLM methods. We identify the granularity of motion tokenization as a critical bottleneck: fine-grained tokenization induces local dependency issues, where LLMs overemph…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: 14 pages, 7 figures

  40. arXiv:2506.17881  [pdf, ps, other]

    cs.CL cs.AI

    Multi-turn Jailbreaking via Global Refinement and Active Fabrication

    Authors: Hua Tang, Lingyong Yan, Yukun Zhao, Shuaiqiang Wang, Jizhou Huang, Dawei Yin

    Abstract: Large Language Models (LLMs) have achieved exceptional performance across a wide range of tasks. However, they still pose significant safety risks due to the potential misuse for malicious purposes. Jailbreaks, which aim to elicit models to generate harmful content, play a critical role in identifying the underlying security threats. Recent jailbreaking primarily focuses on single-turn scenarios,…

    Submitted 21 June, 2025; originally announced June 2025.

  41. arXiv:2506.17873  [pdf, ps, other]

    cs.CV cs.AI

    SurgVidLM: Towards Multi-grained Surgical Video Understanding with Large Language Model

    Authors: Guankun Wang, Wenjin Mo, Junyi Wang, Long Bai, Kun Yuan, Ming Hu, Jinlin Wu, Junjun He, Yiming Huang, Nicolas Padoy, Zhen Lei, Hongbin Liu, Nassir Navab, Hongliang Ren

    Abstract: Recent advances in Multimodal Large Language Models have demonstrated great potential in the medical domain, facilitating users to understand surgical scenes and procedures. Beyond image-based methods, the exploration of Video Large Language Models (Vid-LLMs) has emerged as a promising avenue for capturing the complex sequences of information involved in surgery. However, there is still a lack of…

    Submitted 21 June, 2025; originally announced June 2025.

  42. arXiv:2506.17864  [pdf, ps, other]

    cs.CL

    QueueEDIT: Structural Self-Correction for Sequential Model Editing in LLMs

    Authors: Taolin Zhang, Haidong Kang, Dongyang Li, Qizhou Chen, Chengyu Wang, Xiaofeng He, Richang Hong

    Abstract: Recently, large language models (LLMs) have demonstrated impressive results but still suffer from hallucinations. Model editing has been proposed to correct factual inaccuracies in LLMs. A challenging case is sequential model editing (SME), which aims to rectify errors continuously rather than treating them as a one-time task. During SME, the general capabilities of LLMs can be negatively affected…

    Submitted 21 June, 2025; originally announced June 2025.

  43. arXiv:2506.17858  [pdf, ps, other]

    cs.CV

    Fetuses Made Simple: Modeling and Tracking of Fetal Shape and Pose

    Authors: Yingcheng Liu, Peiqi Wang, Sebastian Diaz, Esra Abaci Turk, Benjamin Billot, Patricia Ellen Grant, Polina Golland

    Abstract: Analyzing fetal body motion and shape is paramount in prenatal diagnostics and monitoring. Existing methods for fetal MRI analysis mainly rely on anatomical keypoints or volumetric body segmentations. Keypoints simplify body structure to facilitate motion analysis, but may ignore important details of full-body shape. Body segmentations capture complete shape information but complicate temporal ana…

    Submitted 21 June, 2025; originally announced June 2025.

  44. arXiv:2506.17828  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach

    Authors: Xinnan Zhang, Chenliang Li, Siliang Zeng, Jiaxiang Li, Zhongruo Wang, Kaixiang Lin, Songtao Lu, Alfredo Garcia, Mingyi Hong

    Abstract: Aligning large language models (LLMs) with human preferences usually requires fine-tuning methods such as RLHF and DPO. These methods directly optimize the model parameters, so they cannot be used in test-time to improve model performance, nor are they applicable when the model weights are not accessible. In contrast, test-time methods sidestep weight updates by leveraging reward functions to guid…

    Submitted 21 June, 2025; originally announced June 2025.

  45. arXiv:2506.17798  [pdf, ps, other]

    cs.SE cs.CR

    SAVANT: Vulnerability Detection in Application Dependencies through Semantic-Guided Reachability Analysis

    Authors: Wang Lingxiang, Quanzhi Fu, Wenjia Song, Gelei Deng, Yi Liu, Dan Williams, Ying Zhang

    Abstract: The integration of open-source third-party library dependencies in Java development introduces significant security risks when these libraries contain known vulnerabilities. Existing Software Composition Analysis (SCA) tools struggle to effectively detect vulnerable API usage from these libraries due to limitations in understanding API usage semantics and computational challenges in analyzing comp…

    Submitted 21 June, 2025; originally announced June 2025.

  46. arXiv:2506.17784  [pdf, ps, other]

    cs.AI

    AnyMAC: Cascading Flexible Multi-Agent Collaboration via Next-Agent Prediction

    Authors: Song Wang, Zhen Tan, Zihan Chen, Shuang Zhou, Tianlong Chen, Jundong Li

    Abstract: Recent progress in large language model (LLM)-based multi-agent collaboration highlights the power of structured communication in enabling collective intelligence. However, existing methods largely rely on static or graph-based inter-agent topologies, lacking the potential adaptability and flexibility in communication. In this work, we propose a new framework that rethinks multi-agent coordination…

    Submitted 21 June, 2025; originally announced June 2025.

  47. arXiv:2506.17772  [pdf, ps, other]

    cs.SE

    PAGENT: Learning to Patch Software Engineering Agents

    Authors: Haoran Xue, Gias Uddin, Song Wang

    Abstract: LLM Agents produce patches automatically to resolve an issue. However, they can generate inaccurate patches. Little is known about the root causes behind those failed patches or how those could be fixed. This paper reports an empirical study of the failed patches generated by seven top LLM code agents. We collected 114 issues from the SWE-bench Lite dataset that remained unresolved across the agen…

    Submitted 21 June, 2025; originally announced June 2025.

  48. arXiv:2506.17728  [pdf, ps, other]

    cs.CL cs.AI

    KAG-Thinker: Teaching Large Language Models to Think with Human-like Reasoning Process

    Authors: Dalong Zhang, Jun Xu, Jun Zhou, Lei Liang, Lin Yuan, Ling Zhong, Mengshu Sun, Peilong Zhao, QiWei Wang, Xiaorui Wang, Xinkai Du, YangYang Hou, Yu Ao, ZhaoYang Wang, Zhengke Gui, ZhiYing Yi, Zhongpu Bo

    Abstract: In this paper, we introduce KAG-Thinker, a novel human-like reasoning framework built upon a parameter-light large language model (LLM). Our approach enhances the logical coherence and contextual consistency of the thinking process in question-answering (Q&A) tasks on domain-specific knowledge bases (KBs) within LLMs. This framework simulates human cognitive mechanisms for handling complex proble…

    Submitted 21 June, 2025; originally announced June 2025.

  49. arXiv:2506.17709  [pdf, ps, other]

    cs.LG cs.CR stat.ML

    CEGA: A Cost-Effective Approach for Graph-Based Model Extraction and Acquisition

    Authors: Zebin Wang, Menghan Lin, Bolin Shen, Ken Anderson, Molei Liu, Tianxi Cai, Yushun Dong

    Abstract: Graph Neural Networks (GNNs) have demonstrated remarkable utility across diverse applications, and their growing complexity has made Machine Learning as a Service (MLaaS) a viable platform for scalable deployment. However, this accessibility also exposes GNN to serious security threats, most notably model extraction attacks (MEAs), in which adversaries strategically query a deployed model to const…

    Submitted 21 June, 2025; originally announced June 2025.

  50. arXiv:2506.17697  [pdf, ps, other]

    cs.AI

    Beyond Syntax: Action Semantics Learning for App Agents

    Authors: Bohan Tang, Dezhao Luo, Jingxuan Chen, Shaogang Gong, Jianye Hao, Jun Wang, Kun Shao

    Abstract: The advent of Large Language Models (LLMs) enables the rise of App agents that interpret user intent and operate smartphone Apps through actions such as clicking and scrolling. While prompt-based solutions with closed LLM APIs show promising ability, they incur heavy compute costs and external API dependency. Fine-tuning smaller open-source LLMs solves these limitations. However, current fine-tuni…

    Submitted 21 June, 2025; originally announced June 2025.