Showing 1–50 of 806 results for author: Gao, S

Searching in archive cs.
  1. arXiv:2509.25552  [pdf, ps, other]

    cs.AI

    Evaluating Foundation Models with Pathological Concept Learning for Kidney Cancer

    Authors: Shangqi Gao, Sihan Wang, Yibo Gao, Boming Wang, Xiahai Zhuang, Anne Warren, Grant Stewart, James Jones, Mireia Crispin-Ortuzar

    Abstract: To evaluate the translational capabilities of foundation models, we develop a pathological concept learning approach focused on kidney cancer. By leveraging TNM staging guidelines and pathology reports, we build comprehensive pathological concepts for kidney cancer. Then, we extract deep features from whole slide images using foundation models, construct pathological graphs to capture spatial corr…

    Submitted 29 September, 2025; originally announced September 2025.

    Comments: Best Paper Award at MICCAI AMAI 2025

    ACM Class: J.3

  2. arXiv:2509.24498  [pdf, ps, other]

    cs.SE

    JSProtect: A Scalable Obfuscation Framework for Mini-Games in WeChat

    Authors: Zhihao Li, Chaozheng Wang, Zongjie Li, Xinyong Peng, Zelin Su, Qun Xia, Haochuan Lu, Ting Xiong, Man Ho Lam, Shuzheng Gao, Yuchong Xie, Cuiyun Gao, Shuai Wang, Yuetang Deng, Huafeng Ma

    Abstract: The WeChat mini-game ecosystem faces rampant intellectual property theft to other platforms via secondary development, yet existing JavaScript obfuscation tools are ill-equipped for large-scale applications, suffering from prohibitive processing times, severe runtime performance degradation, and unsustainable code size inflation. This paper introduces JSProtect, a high-throughput parallelized obfu…

    Submitted 29 September, 2025; originally announced September 2025.

    Comments: 10 pages

  3. arXiv:2509.23690  [pdf, ps, other]

    cs.CV cs.CL

    HomeSafeBench: A Benchmark for Embodied Vision-Language Models in Free-Exploration Home Safety Inspection

    Authors: Siyuan Gao, Jiashu Yao, Haoyu Wen, Yuhang Guo, Zeming Liu, Heyan Huang

    Abstract: Embodied agents can identify and report safety hazards in home environments. Accurately evaluating their capabilities in home safety inspection tasks is crucial, but existing benchmarks suffer from two key limitations. First, they oversimplify safety inspection tasks by using textual descriptions of the environment instead of direct visual information, which hinders the accurate evaluation of…

    Submitted 28 September, 2025; originally announced September 2025.

  4. arXiv:2509.23426  [pdf, ps, other]

    cs.AI cs.LG

    Democratizing AI scientists using ToolUniverse

    Authors: Shanghua Gao, Richard Zhu, Pengwei Sui, Zhenglun Kong, Sufian Aldogom, Yepeng Huang, Ayush Noori, Reza Shamji, Krishna Parvataneni, Theodoros Tsiligkaridis, Marinka Zitnik

    Abstract: AI scientists are emerging computational systems that serve as collaborative partners in discovery. These systems remain difficult to build because they are bespoke, tied to rigid workflows, and lack shared environments that unify tools, data, and analyses into a common ecosystem. In omics, unified ecosystems have transformed research by enabling interoperability, reuse, and community-driven devel…

    Submitted 27 September, 2025; originally announced September 2025.

    Comments: https://aiscientist.tools

  5. arXiv:2509.19182  [pdf, ps, other]

    cs.HC cs.AI

    YAC: Bridging Natural Language and Interactive Visual Exploration with Generative AI for Biomedical Data Discovery

    Authors: Devin Lange, Shanghua Gao, Pengwei Sui, Austen Money, Priya Misner, Marinka Zitnik, Nils Gehlenborg

    Abstract: Incorporating natural language input has the potential to improve the capabilities of biomedical data discovery interfaces. However, user interface elements and visualizations are still powerful tools for interacting with data, even in the new world of generative AI. In our prototype system, YAC, Yet Another Chatbot, we bridge the gap between natural language and interactive visualizations by gene…

    Submitted 23 September, 2025; originally announced September 2025.

  6. arXiv:2509.18808  [pdf, ps, other]

    cs.SE

    SR-Eval: Evaluating LLMs on Code Generation under Stepwise Requirement Refinement

    Authors: Zexun Zhan, Shuzheng Gao, Ruida Hu, Cuiyun Gao

    Abstract: Large language models (LLMs) have achieved remarkable progress in code generation. However, existing benchmarks mainly formalize the task as a static, single-turn problem, overlooking the stepwise requirement changes and iterative workflows in real-world software development. This mismatch limits the understanding of how well LLMs can support real-world development workflows. Constructing such ite…

    Submitted 23 September, 2025; originally announced September 2025.

  7. arXiv:2509.16454  [pdf, ps, other]

    cs.HC cs.AI

    A Generative AI System for Biomedical Data Discovery with Grammar-Based Visualizations

    Authors: Devin Lange, Shanghua Gao, Pengwei Sui, Austen Money, Priya Misner, Marinka Zitnik, Nils Gehlenborg

    Abstract: We explore the potential for combining generative AI with grammar-based visualizations for biomedical data discovery. In our prototype, we use a multi-agent system to generate visualization specifications and apply filters. These visualizations are linked together, resulting in an interactive dashboard that is progressively constructed. Our system leverages the strengths of natural language while…

    Submitted 19 September, 2025; originally announced September 2025.

  8. arXiv:2509.14233  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Apertus: Democratizing Open and Compliant LLMs for Global Language Environments

    Authors: Alejandro Hernández-Cano, Alexander Hägele, Allen Hao Huang, Angelika Romanou, Antoni-Joan Solergibert, Barna Pasztor, Bettina Messmer, Dhia Garbaya, Eduard Frank Ďurech, Ido Hakimi, Juan García Giraldo, Mete Ismayilzada, Negar Foroutan, Skander Moalla, Tiancheng Chen, Vinko Sabolčec, Yixuan Xu, Michael Aerni, Badr AlKhamissi, Ines Altemir Marinas, Mohammad Hossein Amani, Matin Ansaripour, Ilia Badanin, Harold Benoit, Emanuela Boros, et al. (76 additional authors not shown)

    Abstract: We present Apertus, a fully open suite of large language models (LLMs) designed to address two systemic shortcomings in today's open model ecosystem: data compliance and multilingual representation. Unlike many prior models that release weights without reproducible data pipelines or regard for content-owner rights, Apertus models are pretrained exclusively on openly available data, retroactively r…

    Submitted 17 September, 2025; originally announced September 2025.

  9. arXiv:2509.14210  [pdf, ps, other]

    cs.RO

    GLIDE: A Coordinated Aerial-Ground Framework for Search and Rescue in Unknown Environments

    Authors: Seth Farrell, Chenghao Li, Hongzhan Yu, Hesam Mojtahedi, Sicun Gao, Henrik I. Christensen

    Abstract: We present a cooperative aerial-ground search-and-rescue (SAR) framework that pairs two unmanned aerial vehicles (UAVs) with an unmanned ground vehicle (UGV) to achieve rapid victim localization and obstacle-aware navigation in unknown environments. We dub this framework Guided Long-horizon Integrated Drone Escort (GLIDE), highlighting the UGV's reliance on UAV guidance for long-horizon planning.…

    Submitted 28 September, 2025; v1 submitted 17 September, 2025; originally announced September 2025.

  10. arXiv:2509.12777  [pdf, ps, other]

    cs.CV cs.AI

    CECT-Mamba: a Hierarchical Contrast-enhanced-aware Model for Pancreatic Tumor Subtyping from Multi-phase CECT

    Authors: Zhifang Gong, Shuo Gao, Ben Zhao, Yingjing Xu, Yijun Yang, Shenghong Ju, Guangquan Zhou

    Abstract: Contrast-enhanced computed tomography (CECT) is the primary imaging technique that provides valuable spatial-temporal information about lesions, enabling the accurate diagnosis and subclassification of pancreatic tumors. However, the high heterogeneity and variability of pancreatic tumors still pose substantial challenges for precise subtyping diagnosis. Previous methods fail to effectively explor…

    Submitted 16 September, 2025; originally announced September 2025.

  11. arXiv:2509.09342  [pdf, ps, other]

    cs.IR

    CESRec: Constructing Pseudo Interactions for Sequential Recommendation via Conversational Feedback

    Authors: Yifan Wang, Shen Gao, Jiabao Fang, Rui Yan, Billy Chiu, Shuo Shang

    Abstract: Sequential Recommendation Systems (SRS) have become essential in many real-world applications. However, existing SRS methods often rely on collaborative filtering signals and fail to capture real-time user preferences, while Conversational Recommendation Systems (CRS) excel at eliciting immediate interests through natural language interactions but neglect historical behavior. To bridge this gap, w…

    Submitted 11 September, 2025; originally announced September 2025.

  12. arXiv:2509.07996  [pdf, ps, other]

    cs.CV cs.RO

    3D and 4D World Modeling: A Survey

    Authors: Lingdong Kong, Wesley Yang, Jianbiao Mei, Youquan Liu, Ao Liang, Dekai Zhu, Dongyue Lu, Wei Yin, Xiaotao Hu, Mingkai Jia, Junyuan Deng, Kaiwen Zhang, Yang Wu, Tianyi Yan, Shenyuan Gao, Song Wang, Linfeng Li, Liang Pan, Yong Liu, Jianke Zhu, Wei Tsang Ooi, Steven C. H. Hoi, Ziwei Liu

    Abstract: World modeling has become a cornerstone in AI research, enabling agents to understand, represent, and predict the dynamic environments they inhabit. While prior work largely emphasizes generative methods for 2D image and video data, they overlook the rapidly growing body of work that leverages native 3D and 4D representations such as RGB-D imagery, occupancy grids, and LiDAR point clouds for large…

    Submitted 11 September, 2025; v1 submitted 4 September, 2025; originally announced September 2025.

    Comments: Survey; 34 pages, 10 figures, 14 tables; GitHub Repo at https://github.com/worldbench/survey

  13. arXiv:2509.07504  [pdf, ps, other]

    cs.CR

    Backdoor Attacks and Defenses in Computer Vision Domain: A Survey

    Authors: Bilal Hussain Abbasi, Yanjun Zhang, Leo Zhang, Shang Gao

    Abstract: Backdoor (trojan) attacks embed hidden, controllable behaviors into machine-learning models so that models behave normally on benign inputs but produce attacker-chosen outputs when a trigger is present. This survey reviews the rapidly growing literature on backdoor attacks and defenses in the computer-vision domain. We introduce a multi-dimensional taxonomy that organizes attacks and defenses by i…

    Submitted 9 September, 2025; originally announced September 2025.

  14. arXiv:2509.06052  [pdf, ps, other]

    cs.SE cs.AI cs.CR

    Empirical Study of Code Large Language Models for Binary Security Patch Detection

    Authors: Qingyuan Li, Binchang Li, Cuiyun Gao, Shuzheng Gao, Zongjie Li

    Abstract: Security patch detection (SPD) is crucial for maintaining software security, as unpatched vulnerabilities can lead to severe security risks. In recent years, numerous learning-based SPD approaches have demonstrated promising results on source code. However, these approaches typically cannot be applied to closed-source applications and proprietary systems that constitute a significant portion of re…

    Submitted 7 September, 2025; originally announced September 2025.

  15. arXiv:2509.05926  [pdf, ps, other]

    physics.optics cs.AI physics.app-ph

    Meta-training of diffractive meta-neural networks for super-resolution direction of arrival estimation

    Authors: Songtao Yang, Sheng Gao, Chu Wu, Zejia Zhao, Haiou Zhang, Xing Lin

    Abstract: Diffractive neural networks leverage the high-dimensional characteristics of electromagnetic (EM) fields for high-throughput computing. However, the existing architectures face challenges in integrating large-scale multidimensional metasurfaces with precise network training and have not utilized multidimensional EM field coding schemes for super-resolution sensing. Here, we propose diffractive meta-…

    Submitted 7 September, 2025; originally announced September 2025.

    Comments: 47 pages, 17 figures

  16. arXiv:2509.05881  [pdf, ps, other]

    cs.SE cs.AI

    GeoAnalystBench: A GeoAI benchmark for assessing large language models for spatial analysis workflow and code generation

    Authors: Qianheng Zhang, Song Gao, Chen Wei, Yibo Zhao, Ying Nie, Ziru Chen, Shijie Chen, Yu Su, Huan Sun

    Abstract: Recent advances in large language models (LLMs) have fueled growing interest in automating geospatial analysis and GIS workflows, yet their actual capabilities remain uncertain. In this work, we call for rigorous evaluation of LLMs on well-defined geoprocessing tasks before making claims about full GIS automation. To this end, we present GeoAnalystBench, a benchmark of 50 Python-based tasks derive…

    Submitted 6 September, 2025; originally announced September 2025.

    Comments: 34 pages, 8 figures

    ACM Class: I.2

    Journal ref: Transactions in GIS, 2025

  17. arXiv:2509.04337  [pdf, ps, other]

    cs.IR cs.AI cs.LG

    Decoupled Entity Representation Learning for Pinterest Ads Ranking

    Authors: Jie Liu, Yinrui Li, Jiankai Sun, Kungang Li, Han Sun, Sihan Wang, Huasen Wu, Siyuan Gao, Paulo Soares, Nan Li, Zhifang Liu, Haoyang Li, Siping Ji, Ling Leng, Prathibha Deshikachar

    Abstract: In this paper, we introduce a novel framework following an upstream-downstream paradigm to construct user and item (Pin) embeddings from diverse data sources, which are essential for Pinterest to deliver personalized Pins and ads effectively. Our upstream models are trained on extensive data sources featuring varied signals, utilizing complex architectures to capture intricate relationships betwee…

    Submitted 4 September, 2025; originally announced September 2025.

  18. arXiv:2508.21581  [pdf, ps, other]

    cs.CV

    Integrating Pathology and CT Imaging for Personalized Recurrence Risk Prediction in Renal Cancer

    Authors: Daniël Boeke, Cedrik Blommestijn, Rebecca N. Wray, Kalina Chupetlovska, Shangqi Gao, Zeyu Gao, Regina G. H. Beets-Tan, Mireia Crispin-Ortuzar, James O. Jones, Wilson Silva, Ines P. Machado

    Abstract: Recurrence risk estimation in clear cell renal cell carcinoma (ccRCC) is essential for guiding postoperative surveillance and treatment. The Leibovich score remains widely used for stratifying distant recurrence risk but offers limited patient-level resolution and excludes imaging information. This study evaluates multimodal recurrence prediction by integrating preoperative computed tomography (CT…

    Submitted 29 August, 2025; originally announced August 2025.

    Comments: 12 pages, 2 figures, 1 table. Accepted at the Multimodal Learning and Fusion Across Scales for Clinical Decision Support (ML-CDS) Workshop, MICCAI 2025. This is the submitted version with authors, affiliations, and acknowledgements included; it has not undergone peer review or revisions. The final version will appear in the Springer Lecture Notes in Computer Science (LNCS) proceedings

  19. arXiv:2508.21148  [pdf, ps, other]

    cs.CL cs.AI

    A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

    Authors: Ming Hu, Chenglong Ma, Wei Li, Wanghan Xu, Jiamin Wu, Jucheng Hu, Tianbin Li, Guohang Zhuang, Jiaqi Liu, Yingzhou Lu, Ying Chen, Chaoyang Zhang, Cheng Tan, Jie Ying, Guocheng Wu, Shujian Gao, Pengcheng Chen, Jiashi Lin, Haitao Wu, Lulu Chen, Fengxiang Wang, Yuanyuan Zhang, Xiangyu Zhao, Feilong Tang, Encheng Su, et al. (78 additional authors not shown)

    Abstract: Scientific Large Language Models (Sci-LLMs) are transforming how knowledge is represented, integrated, and applied in scientific research, yet their progress is shaped by the complex nature of scientific data. This survey presents a comprehensive, data-centric synthesis that reframes the development of Sci-LLMs as a co-evolution between models and their underlying data substrate. We formulate a un…

    Submitted 28 August, 2025; originally announced August 2025.

  20. arXiv:2508.17590  [pdf, ps, other]

    cs.DB cs.AI cs.CL cs.MA

    RubikSQL: Lifelong Learning Agentic Knowledge Base as an Industrial NL2SQL System

    Authors: Zui Chen, Han Li, Xinhao Zhang, Xiaoyu Chen, Chunyin Dong, Yifeng Wang, Xin Cai, Su Zhang, Ziqi Li, Chi Ding, Jinxu Li, Shuai Wang, Dousheng Zhao, Sanhai Gao, Guangyi Liu

    Abstract: We present RubikSQL, a novel NL2SQL system designed to address key challenges in real-world enterprise-level NL2SQL, such as implicit intents and domain-specific terminology. RubikSQL frames NL2SQL as a lifelong learning task, demanding both Knowledge Base (KB) maintenance and SQL generation. RubikSQL systematically builds and refines its KB through techniques including database profiling, structu…

    Submitted 24 August, 2025; originally announced August 2025.

    Comments: 18 pages, 3 figures, 3 tables, to be submitted to VLDB 2026 (PVLDB Volume 19)

    ACM Class: H.2.3; I.2.4; I.2.7

  21. arXiv:2508.15763  [pdf, ps, other]

    cs.LG cs.CL cs.CV

    Intern-S1: A Scientific Multimodal Foundation Model

    Authors: Lei Bai, Zhongrui Cai, Yuhang Cao, Maosong Cao, Weihan Cao, Chiyu Chen, Haojiong Chen, Kai Chen, Pengcheng Chen, Ying Chen, Yongkang Chen, Yu Cheng, Pei Chu, Tao Chu, Erfei Cui, Ganqu Cui, Long Cui, Ziyun Cui, Nianchen Deng, Ning Ding, Nanqing Dong, Peijie Dong, Shihan Dou, Sinan Du, Haodong Duan, et al. (152 additional authors not shown)

    Abstract: In recent years, a plethora of open-source foundation models have emerged, achieving remarkable progress in some widely attended fields, with performance being quite close to that of closed-source models. However, in high-value but more challenging scientific professional fields, either the fields still rely on expert models, or the progress of general foundation models lags significantly compared…

    Submitted 24 August, 2025; v1 submitted 21 August, 2025; originally announced August 2025.

  22. arXiv:2508.15146  [pdf, ps, other]

    cs.HC

    QueryGenie: Making LLM-Based Database Querying Transparent and Controllable

    Authors: Longfei Chen, Shenghan Gao, Shiwei Wang, Ken Lin, Yun Wang, Quan Li

    Abstract: Conversational user interfaces powered by large language models (LLMs) have significantly lowered the technical barriers to database querying. However, existing tools still encounter several challenges, such as misinterpretation of user intent, generation of hallucinated content, and the absence of effective mechanisms for human feedback, all of which undermine their reliability and practical utili…

    Submitted 20 August, 2025; originally announced August 2025.

    Comments: Accepted by The 38th Annual ACM Symposium on User Interface Software and Technology (UIST Adjunct '25), September 28-October 1, 2025, Busan, Republic of Korea

  23. arXiv:2508.14601  [pdf, ps, other]

    cs.NI

    Multi-Tier UAV Edge Computing for Low Altitude Networks Towards Long-Term Energy Stability

    Authors: Yufei Ye, Shijian Gao, Xinhu Zheng, Liuqing Yang

    Abstract: This paper presents a novel multi-tier UAV-assisted edge computing system designed for low-altitude networks. The system comprises vehicle users, lightweight Low-Tier UAVs (L-UAVs), and a High-Tier UAV (H-UAV). L-UAVs function as small-scale edge servers positioned closer to vehicle users, while the H-UAV, equipped with a more powerful server and a larger-capacity battery, serves as mobile backup server…

    Submitted 20 August, 2025; originally announced August 2025.

  24. arXiv:2508.12491  [pdf, ps, other]

    cs.LG

    Cost-Aware Contrastive Routing for LLMs

    Authors: Reza Shirkavand, Shangqian Gao, Peiran Yu, Heng Huang

    Abstract: We study cost-aware routing for large language models across diverse and dynamic pools of models. Existing approaches often overlook prompt-specific context, rely on expensive model profiling, assume a fixed set of experts, or use inefficient trial-and-error strategies. We introduce Cost-Spectrum Contrastive Routing (CSCR), a lightweight framework that maps both prompts and models into a shared em…

    Submitted 17 August, 2025; originally announced August 2025.

  25. arXiv:2508.12226  [pdf, ps, other]

    cs.CV

    In vivo 3D ultrasound computed tomography of musculoskeletal tissues with generative neural physics

    Authors: Zhijun Zeng, Youjia Zheng, Chang Su, Qianhang Wu, Hao Hu, Zeyuan Dong, Shan Gao, Yang Lv, Rui Tang, Ligang Cui, Zhiyong Hou, Weijun Lin, Zuoqiang Shi, Yubing Li, He Sun

    Abstract: Ultrasound computed tomography (USCT) is a radiation-free, high-resolution modality but remains limited for musculoskeletal imaging due to conventional ray-based reconstructions that neglect strong scattering. We propose a generative neural physics framework that couples generative networks with physics-informed neural simulation for fast, high-fidelity 3D USCT. By learning a compact surrogate of…

    Submitted 16 August, 2025; originally announced August 2025.

    MSC Class: 65N21; 92C55; 68T07

  26. arXiv:2508.11958  [pdf, ps, other]

    cs.SE

    Clean Code, Better Models: Enhancing LLM Performance with Smell-Cleaned Dataset

    Authors: Zhipeng Xue, Xiaoting Zhang, Zhipeng Gao, Xing Hu, Shan Gao, Xin Xia, Shanping Li

    Abstract: Large Language Models (LLMs) have demonstrated great potential in code-related tasks. However, most research focuses on improving the output quality of LLMs (e.g., correctness), and less attention has been paid to the LLM input (e.g., the training code quality). Given that code smells widely exist in practice and can negatively impact software maintainability and readability, this study…

    Submitted 16 August, 2025; originally announced August 2025.

  27. arXiv:2508.11737  [pdf, ps, other]

    cs.CV cs.AI cs.CL cs.LG

    Ovis2.5 Technical Report

    Authors: Shiyin Lu, Yang Li, Yu Xia, Yuwei Hu, Shanshan Zhao, Yanqing Ma, Zhichao Wei, Yinglun Li, Lunhao Duan, Jianshan Zhao, Yuxuan Han, Haijun Li, Wanying Chen, Junke Tang, Chengkun Hou, Zhixing Du, Tianli Zhou, Wenjie Zhang, Huping Ding, Jiahe Li, Wen Li, Gui Hu, Yiliang Gu, Siran Yang, Jiamang Wang, et al. (17 additional authors not shown)

    Abstract: We present Ovis2.5, a successor to Ovis2 designed for native-resolution visual perception and strong multimodal reasoning. Ovis2.5 integrates a native-resolution vision transformer that processes images at their native, variable resolutions, avoiding the degradation from fixed-resolution tiling and preserving both fine detail and global layout -- crucial for visually dense content like complex cha…

    Submitted 15 August, 2025; originally announced August 2025.

  28. arXiv:2508.09142  [pdf, ps, other]

    eess.SP cs.AI

    Bayesian-Driven Graph Reasoning for Active Radio Map Construction

    Authors: Wenlihan Lu, Shijian Gao, Miaowen Wen, Yuxuan Liang, Liuqing Yang, Chan-Byoung Chae, H. Vincent Poor

    Abstract: With the emergence of the low-altitude economy, radio maps have become essential for ensuring reliable wireless connectivity to aerial platforms. Autonomous aerial agents are commonly deployed for data collection using waypoint-based navigation; however, their limited battery capacity significantly constrains coverage and efficiency. To address this, we propose an uncertainty-aware radio map (URAM…

    Submitted 22 August, 2025; v1 submitted 28 July, 2025; originally announced August 2025.

  29. arXiv:2508.07811  [pdf, ps, other]

    cs.CV

    DiTVR: Zero-Shot Diffusion Transformer for Video Restoration

    Authors: Sicheng Gao, Nancy Mehta, Zongwei Wu, Radu Timofte

    Abstract: Video restoration aims to reconstruct high quality video sequences from low quality inputs, addressing tasks such as super resolution, denoising, and deblurring. Traditional regression based methods often produce unrealistic details and require extensive paired datasets, while recent generative diffusion models face challenges in ensuring temporal consistency. We introduce DiTVR, a zero shot video…

    Submitted 11 August, 2025; originally announced August 2025.

    Comments: 7 pages, 6 figures

  30. arXiv:2508.06926  [pdf, ps, other]

    cs.SE

    Integrating Rules and Semantics for LLM-Based C-to-Rust Translation

    Authors: Feng Luo, Kexing Ji, Cuiyun Gao, Shuzheng Gao, Jia Feng, Kui Liu, Xin Xia, Michael R. Lyu

    Abstract: Automated translation of legacy C code into Rust aims to ensure memory safety while reducing the burden of manual migration. Early approaches in code translation rely on static rule-based methods, but they suffer from limited coverage due to dependence on predefined rule patterns. Recent works regard the task as a sequence-to-sequence problem by leveraging large language models (LLMs). Although th…

    Submitted 9 August, 2025; originally announced August 2025.

    Comments: Accepted in ICSME 25 Industry Track

  31. arXiv:2508.05432  [pdf, ps, other]

    cs.AI cs.CY

    Whose Truth? Pluralistic Geo-Alignment for (Agentic) AI

    Authors: Krzysztof Janowicz, Zilong Liu, Gengchen Mai, Zhangyu Wang, Ivan Majic, Alexandra Fortacz, Grant McKenzie, Song Gao

    Abstract: AI (super) alignment describes the challenge of ensuring (future) AI systems behave in accordance with societal norms and goals. While a quickly evolving literature is addressing biases and inequalities, the geographic variability of alignment remains underexplored. Simply put, what is considered appropriate, truthful, or legal can differ widely across regions due to cultural norms, political real…

    Submitted 7 August, 2025; originally announced August 2025.

  32. arXiv:2508.03686  [pdf, ps, other]

    cs.CL cs.AI

    CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward

    Authors: Shudong Liu, Hongwei Liu, Junnan Liu, Linchen Xiao, Songyang Gao, Chengqi Lyu, Yuzhe Gu, Wenwei Zhang, Derek F. Wong, Songyang Zhang, Kai Chen

    Abstract: Answer verification is crucial not only for evaluating large language models (LLMs) by matching their unstructured outputs against standard answers, but also serves as the reward model to guide LLM optimization. Most evaluation frameworks rely on regularized matching or employ general LLMs for answer verification, which demands extensive, repetitive customization for regex rules or evaluation prom…

    Submitted 5 August, 2025; originally announced August 2025.

    Comments: Technical Report; 31 Pages

  33. arXiv:2508.03535  [pdf, ps, other]

    cs.CV

    CoEmoGen: Towards Semantically-Coherent and Scalable Emotional Image Content Generation

    Authors: Kaishen Yuan, Yuting Zhang, Shang Gao, Yijie Zhu, Wenshuo Chen, Yutao Yue

    Abstract: Emotional Image Content Generation (EICG) aims to generate semantically clear and emotionally faithful images based on given emotion categories, with broad application prospects. While recent text-to-image diffusion models excel at generating concrete concepts, they struggle with the complexity of abstract emotions. There have also emerged methods specifically designed for EICG, but they excessive…

    Submitted 5 August, 2025; originally announced August 2025.

    Comments: 10 pages, 9 figures

  34. arXiv:2508.03002  [pdf, ps, other]

    cs.LG

    Where and How to Enhance: Discovering Bit-Width Contribution for Mixed Precision Quantization

    Authors: Haidong Kang, Lianbo Ma, Guo Yu, Shangce Gao

    Abstract: Mixed precision quantization (MPQ) is an effective quantization approach to achieve accuracy-complexity trade-off of neural network, through assigning different bit-widths to network activations and weights in each layer. The typical way of existing MPQ methods is to optimize quantization policies (i.e., bit-width allocation) in a gradient descent manner, termed as Differentiable (DMPQ). At the en…

    Submitted 4 August, 2025; originally announced August 2025.

  35. arXiv:2508.02461  [pdf]

    cs.CR

    Experimental Evaluation of Post-Quantum Homomorphic Encryption for Privacy-Preserving V2X Communication

    Authors: Abdullah Al Mamun, Kyle Yates, Antsa Rakotondrafara, Mashrur Chowdhury, Ryann Cartor, Shuhong Gao

    Abstract: Intelligent Transportation Systems (ITS) fundamentally rely on vehicle-generated data for applications such as congestion monitoring and route optimization, making the preservation of user privacy a critical challenge. Homomorphic Encryption (HE) offers a promising solution by enabling computation on encrypted data without revealing underlying content. This study presents the first real-world expe…

    Submitted 4 August, 2025; originally announced August 2025.

    Comments: This version has been submitted to the TRB Annual Meeting 2026 and is currently under review

  36. arXiv:2508.01655  [pdf, ps, other]

    cs.CR cs.SE

    JSidentify-V2: Leveraging Dynamic Memory Fingerprinting for Mini-Game Plagiarism Detection

    Authors: Zhihao Li, Chaozheng Wang, Zongjie Li, Xinyong Peng, Qun Xia, Haochuan Lu, Ting Xiong, Shuzheng Gao, Cuiyun Gao, Shuai Wang, Yuetang Deng, Huafeng Ma

    Abstract: The explosive growth of mini-game platforms has led to widespread code plagiarism, where malicious users access popular games' source code and republish them with modifications. While existing static analysis tools can detect simple obfuscation techniques like variable renaming and dead code injection, they fail against sophisticated deep obfuscation methods such as encrypted code with local or cl…

    Submitted 3 August, 2025; originally announced August 2025.

    Comments: 12 pages

  37. Unlocking Excellence: The Impact of Voucher Incentives on Cybersecurity Education

    Authors: Jianhua Li, Shang Gao, Michelle Harvey, Trina Myers

    Abstract: While voucher incentives have been popular for primary and secondary schools, they are less used in higher education. In this study, we leverage industry voucher incentives to inspire students in cybersecurity education (CSE). We adopt a 100% portfolio-based assessment strategy, where students can freely select their target grades in the investigated unit. We purposely design one of the high disti…

    Submitted 2 August, 2025; originally announced August 2025.

  38. arXiv:2508.01473  [pdf, ps, other]

    cs.CL

    TreeDiff: AST-Guided Code Generation with Diffusion LLMs

    Authors: Yiming Zeng, Jinghan Cao, Zexin Li, Yiming Chen, Tao Ren, Dawei Xiang, Xidong Wu, Shangqian Gao, Tingting Yu

    Abstract: Recent advances in diffusion-based language models have opened new possibilities for controllable and bidirectional sequence generation. These models provide an alternative to traditional autoregressive approaches by framing text generation as an iterative denoising process. However, applying diffusion models to structured domains such as source code remains a significant challenge. Programming la…

    Submitted 7 August, 2025; v1 submitted 2 August, 2025; originally announced August 2025.

  39. arXiv:2508.01174  [pdf, ps, other]

    cs.LG cs.AI

    RSPO: Risk-Seeking Policy Optimization for Pass@k and Max@k Metrics in Large Language Models

    Authors: Kaichen Zhang, Shenghao Gao, Yuzhong Hong, Haipeng Sun, Junwei Bao, Hongfei Jiang, Yang Song, Hong Dingqian, Hui Xiong

    Abstract: Current large language model post-training optimizes a risk-neutral objective that maximizes expected reward, yet evaluation relies heavily on risk-seeking metrics like Pass@k (at least one success in k trials) and Max@k (maximum reward across k responses). This mismatch in risk preferences can inevitably lead to suboptimal performance. To bridge this gap, we propose Risk-Seeking Policy Optimizati…

    Submitted 1 August, 2025; originally announced August 2025.

  40. arXiv:2508.00719  [pdf, ps, other]

    cs.CL cs.AI

    DAMR: Efficient and Adaptive Context-Aware Knowledge Graph Question Answering with LLM-Guided MCTS

    Authors: Yingxu Wang, Shiqi Fan, Mengzhu Wang, Siyang Gao, Chao Wang, Nan Yin

    Abstract: Knowledge Graph Question Answering (KGQA) aims to interpret natural language queries and perform structured reasoning over knowledge graphs by leveraging their relational and semantic structures to retrieve accurate answers. Existing methods primarily follow either the retrieve-then-reason paradigm, which relies on Graph Neural Networks or heuristic rules to extract static candidate paths, or dyna…

    Submitted 25 September, 2025; v1 submitted 1 August, 2025; originally announced August 2025.

  41. arXiv:2507.23400  [pdf, ps, other]

    cs.CL cs.IR

    MRGSEM-Sum: An Unsupervised Multi-document Summarization Framework based on Multi-Relational Graphs and Structural Entropy Minimization

    Authors: Yongbing Zhang, Fang Nan, Shengxiang Gao, Yuxin Huang, Kaiwen Tan, Zhengtao Yu

    Abstract: The core challenge faced by multi-document summarization is the complexity of relationships among documents and the presence of information redundancy. Graph clustering is an effective paradigm for addressing this issue, as it models the complex relationships among documents using graph structures and reduces information redundancy through clustering, achieving significant research progress. Howev…

    Submitted 31 July, 2025; originally announced July 2025.

  42. arXiv:2507.21649  [pdf, ps, other]

    cs.CV

    The Evolution of Video Anomaly Detection: A Unified Framework from DNN to MLLM

    Authors: Shibo Gao, Peipei Yang, Haiyang Guo, Yangyang Liu, Yi Chen, Shuai Li, Han Zhu, Jian Xu, Xu-Yao Zhang, Linlin Huang

    Abstract: Video anomaly detection (VAD) aims to identify and ground anomalous behaviors or events in videos, serving as a core technology in the fields of intelligent surveillance and public safety. With the advancement of deep learning, the continuous evolution of deep model architectures has driven innovation in VAD methodologies, significantly enhancing feature representation and scene adaptability, ther…

    Submitted 29 July, 2025; originally announced July 2025.

  43. arXiv:2507.21507  [pdf, ps, other]

    cs.CV cs.MM

    VAGU & GtS: LLM-Based Benchmark and Framework for Joint Video Anomaly Grounding and Understanding

    Authors: Shibo Gao, Peipei Yang, Yangyang Liu, Yi Chen, Han Zhu, Xuyao Zhang, Linlin Huang

    Abstract: Video Anomaly Detection (VAD) aims to identify anomalous events in videos and accurately determine their time intervals. Current VAD methods mainly fall into two categories: traditional DNN-based approaches that focus on temporal localization, and LLM-based approaches that emphasize semantic understanding. Both anomaly understanding and grounding are essential for comprehensive video anomaly detec…

    Submitted 29 July, 2025; originally announced July 2025.

    Comments: 21 pages, 19 figures, 8 tables

  44. arXiv:2507.21302  [pdf, ps, other]

    cs.CL

    Can human clinical rationales improve the performance and explainability of clinical text classification models?

    Authors: Christoph Metzner, Shang Gao, Drahomira Herrmannova, Heidi A. Hanson

    Abstract: AI-driven clinical text classification is vital for explainable automated retrieval of population-level health information. This work investigates whether human-based clinical rationales can serve as additional supervision to improve both performance and explainability of transformer-based models that automatically encode clinical documents. We analyzed 99,125 human-based clinical rationales that…

    Submitted 28 July, 2025; originally announced July 2025.

  45. arXiv:2507.20745  [pdf, ps, other]

    cs.CV cs.AI cs.MM

    Regularizing Subspace Redundancy of Low-Rank Adaptation

    Authors: Yue Zhu, Haiwen Diao, Shang Gao, Jiazuo Yu, Jiawen Zhu, Yunzhi Zhuge, Shuai Hao, Xu Jia, Lu Zhang, Ying Zhang, Huchuan Lu

    Abstract: Low-Rank Adaptation (LoRA) and its variants have delivered strong capability in Parameter-Efficient Transfer Learning (PETL) by minimizing trainable parameters and benefiting from reparameterization. However, their projection matrices remain unrestricted during training, causing high representation redundancy and diminishing the effectiveness of feature adaptation in the resulting subspaces. While…

    Submitted 28 July, 2025; originally announced July 2025.

    Comments: 10 pages, 4 figures, Accepted by ACMMM2025

  46. arXiv:2507.19905  [pdf, ps, other]

    cs.CR cs.CV

    ConSeg: Contextual Backdoor Attack Against Semantic Segmentation

    Authors: Bilal Hussain Abbasi, Zirui Gong, Yanjun Zhang, Shang Gao, Antonio Robles-Kelly, Leo Zhang

    Abstract: Despite significant advancements in computer vision, semantic segmentation models may be susceptible to backdoor attacks. These attacks, involving hidden triggers, aim to cause the models to misclassify instances of the victim class as the target class when triggers are present, posing serious threats to the reliability of these models. To further explore the field of backdoor attacks against sema…

    Submitted 26 July, 2025; originally announced July 2025.

  47. arXiv:2507.19427  [pdf, ps, other]

    cs.LG cs.AI

    Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding

    Authors: StepFun: Bin Wang, Bojun Wang, Changyi Wan, Guanzhe Huang, Hanpeng Hu, Haonan Jia, Hao Nie, Mingliang Li, Nuo Chen, Siyu Chen, Song Yuan, Wuxun Xie, Xiaoniu Song, Xing Chen, Xingping Yang, Xuelin Zhang, Yanbo Yu, Yaoyu Wang, Yibo Zhu, Yimin Jiang, Yu Zhou, Yuanwei Lu, Houyi Li, et al. (175 additional authors not shown)

    Abstract: Large language models (LLMs) face low hardware efficiency during decoding, especially for long-context reasoning tasks. This paper introduces Step-3, a 321B-parameter VLM with hardware-aware model-system co-design optimized for minimizing decoding costs. Step-3 innovates in two key dimensions: (1) A novel Multi-Matrix Factorization Attention (MFA) mechanism that significantly reduces both KV cache…

    Submitted 25 July, 2025; originally announced July 2025.

  48. arXiv:2507.17528  [pdf, ps, other]

    cs.LG

    Generalized Low-Rank Matrix Contextual Bandits with Graph Information

    Authors: Yao Wang, Jiannan Li, Yue Kang, Shanxing Gao, Zhenxin Xiao

    Abstract: The matrix contextual bandit (CB), as an extension of the well-known multi-armed bandit, is a powerful framework that has been widely applied in sequential decision-making scenarios involving low-rank structure. In many real-world scenarios, such as online advertising and recommender systems, additional graph information often exists beyond the low-rank structure, that is, the similar relationship…

    Submitted 23 July, 2025; originally announced July 2025.

  49. arXiv:2507.16867  [pdf, ps, other]

    cs.LG cs.AI

    Diffusion-Modeled Reinforcement Learning for Carbon and Risk-Aware Microgrid Optimization

    Authors: Yunyi Zhao, Wei Zhang, Cheng Xiang, Hongyang Du, Dusit Niyato, Shuhua Gao

    Abstract: This paper introduces DiffCarl, a diffusion-modeled carbon- and risk-aware reinforcement learning algorithm for intelligent operation of multi-microgrid systems. With the growing integration of renewables and increasing system complexity, microgrid communities face significant challenges in real-time energy scheduling and optimization under uncertainty. DiffCarl integrates a diffusion model into a…

    Submitted 21 July, 2025; originally announced July 2025.

    Comments: 10 pages, 5 figures

  50. arXiv:2507.16814  [pdf, ps, other]

    cs.LG cs.CV

    Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning

    Authors: Junhao Shen, Haiteng Zhao, Yuzhe Gu, Songyang Gao, Kuikun Liu, Haian Huang, Jianfei Gao, Dahua Lin, Wenwei Zhang, Kai Chen

    Abstract: Enhancing large vision-language models (LVLMs) with visual slow-thinking reasoning is crucial for solving complex multimodal tasks. However, since LVLMs are mainly trained with vision-language alignment, it is difficult to adopt on-policy reinforcement learning (RL) to develop the slow thinking ability because the rollout space is restricted by its initial abilities. Off-policy RL offers a way to…

    Submitted 22 July, 2025; originally announced July 2025.