Showing 1–50 of 1,770 results for author: Gao, Y

Searching in archive cs.
  1. arXiv:2505.09971  [pdf, ps, other]

    cs.CV

    APCoTTA: Continual Test-Time Adaptation for Semantic Segmentation of Airborne LiDAR Point Clouds

    Authors: Yuan Gao, Shaobo Xia, Sheng Nie, Cheng Wang, Xiaohuan Xi, Bisheng Yang

    Abstract: Airborne laser scanning (ALS) point cloud segmentation is a fundamental task for large-scale 3D scene understanding. In real-world applications, models are typically fixed after training. However, domain shifts caused by changes in the environment, sensor types, or sensor degradation often lead to a decline in model performance. Continuous Test-Time Adaptation (CTTA) offers a solution by adapting…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 18 pages, 12 figures

  2. arXiv:2505.09415  [pdf, other]

    cs.CV

    FaceShield: Explainable Face Anti-Spoofing with Multimodal Large Language Models

    Authors: Hongyang Wang, Yichen Shi, Zhuofu Tao, Yuhao Gao, Liepiao Zhang, Xun Lin, Jun Feng, Xiaochen Yuan, Zitong Yu, Xiaochun Cao

    Abstract: Face anti-spoofing (FAS) is crucial for protecting facial recognition systems from presentation attacks. Previous methods approached this task as a classification problem, lacking interpretability and reasoning behind the predicted results. Recently, multimodal large language models (MLLMs) have shown strong capabilities in perception, reasoning, and decision-making in visual tasks. However, there…

    Submitted 14 May, 2025; originally announced May 2025.

  3. arXiv:2505.09386  [pdf, ps, other]

    cs.NI cs.SI

    Instant AoI Optimization through Relay Location Selection in Disaster Multi-hop Communication

    Authors: Yang Gao, Zezhi Zeng

    Abstract: Meteorological disasters such as typhoons, forest fires, and floods can damage the communication infrastructures, which will further disable the communication capabilities of cellular networks. The multi-hop wireless communication based on IoT devices (e.g., rescue robots, UAVs, and mobile devices) becomes an available and rapidly deployable communication approach for search and rescue operations.…

    Submitted 14 May, 2025; originally announced May 2025.

  4. arXiv:2505.09258  [pdf, ps, other]

    cs.DC

    Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

    Authors: Zhonggen Li, Xiangyu Ke, Yifan Zhu, Yunjun Gao, Feifei Li

    Abstract: Graph embeddings provide continuous vector representations of nodes in a graph, which are widely applicable in community detection, recommendations, and various scientific fields. However, existing graph embedding systems either face scalability challenges due to the high cost of RAM and multiple GPUs, or rely on disk storage at the expense of I/O efficiency. In this paper, we propose Legend, a li…

    Submitted 15 May, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

  5. arXiv:2505.09155  [pdf, ps, other]

    cs.CV

    AMSnet 2.0: A Large AMS Database with AI Segmentation for Net Detection

    Authors: Yichen Shi, Zhuofu Tao, Yuhao Gao, Li Huang, Hongyang Wang, Zhiping Yu, Ting-Jung Lin, Lei He

    Abstract: Current multimodal large language models (MLLMs) struggle to understand circuit schematics due to their limited recognition capabilities. This could be attributed to the lack of high-quality schematic-netlist training data. Existing work such as AMSnet applies schematic parsing to generate netlists. However, these methods rely on hard-coded heuristics and are difficult to apply to complex or noisy…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: accepted by LAD25

  6. arXiv:2505.08808  [pdf, ps, other]

    cs.CV cs.AI

    SparseMeXT Unlocking the Potential of Sparse Representations for HD Map Construction

    Authors: Anqing Jiang, Jinhao Chai, Yu Gao, Yiru Wang, Yuwen Heng, Zhigang Sun, Hao Sun, Zezhong Zhao, Li Sun, Jian Zhou, Lijuan Zhu, Shugong Xu, Hao Zhao

    Abstract: Recent advancements in high-definition (HD) map construction have demonstrated the effectiveness of dense representations, which heavily rely on computationally intensive bird's-eye view (BEV) features. While sparse representations offer a more efficient alternative by avoiding dense BEV processing, existing methods often lag behind due to the lack of tailored designs. These limitations…

    Submitted 11 May, 2025; originally announced May 2025.

  7. arXiv:2505.08744  [pdf, other]

    cs.AI

    DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models

    Authors: Xiaoyang Chen, Xinan Dai, Yu Du, Qian Feng, Naixu Guo, Tingshuo Gu, Yuting Gao, Yingyi Gao, Xudong Han, Xiang Jiang, Yilin Jin, Hongyi Lin, Shisheng Lin, Xiangnan Li, Yuante Li, Yixing Li, Zhentao Lai, Zilu Ma, Yingrong Peng, Jiacheng Qian, Hao-Yu Sun, Jianbo Sun, Zirui Wang, Siwei Wu, Zian Wang, et al. (6 additional authors not shown)

    Abstract: To advance the mathematical proficiency of large language models (LLMs), the DeepMath team has launched an open-source initiative aimed at developing an open mathematical LLM and systematically evaluating its mathematical creativity. This paper represents the initial contribution of this initiative. While recent developments in mathematical LLMs have predominantly emphasized reasoning skills, as e…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 14 pages, 4 figures

  8. arXiv:2505.07818  [pdf, other]

    cs.CV

    DanceGRPO: Unleashing GRPO on Visual Generation

    Authors: Zeyue Xue, Jie Wu, Yu Gao, Fangyuan Kong, Lingting Zhu, Mengzhao Chen, Zhiheng Liu, Wei Liu, Qiushan Guo, Weilin Huang, Ping Luo

    Abstract: Recent breakthroughs in generative models, particularly diffusion models and rectified flows, have revolutionized visual content creation, yet aligning model outputs with human preferences remains a critical challenge. Existing reinforcement learning (RL)-based methods for visual generation face critical limitations: incompatibility with modern Ordinary Differential Equations (ODEs)-based sampling p…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: Project Page: https://dancegrpo.github.io/

  9. arXiv:2505.07687  [pdf, ps, other]

    eess.IV cs.CV

    ABS-Mamba: SAM2-Driven Bidirectional Spiral Mamba Network for Medical Image Translation

    Authors: Feng Yuan, Yifan Gao, Wenbin Wu, Keqing Wu, Xiaotong Guo, Jie Jiang, Xin Gao

    Abstract: Accurate multi-modal medical image translation requires harmonizing global anatomical semantics and local structural fidelity, a challenge complicated by intermodality information loss and structural distortion. We propose ABS-Mamba, a novel architecture integrating the Segment Anything Model 2 (SAM2) for organ-aware semantic representation, specialized convolutional neural networks (CNNs) for pr…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: MICCAI 2025 (under review)

  10. arXiv:2505.07559  [pdf, ps, other]

    cs.IT eess.SP

    Pinching-Antenna Systems (PASS) Aided Over-the-air Computation

    Authors: Zhonghao Lyu, Haoyun Li, Yulan Gao, Ming Xiao, H. Vincent Poor

    Abstract: Over-the-air computation (AirComp) enables fast data aggregation for edge intelligence applications. However, the performance of AirComp can be severely degraded by channel misalignments. Pinching antenna systems (PASS) have recently emerged as a promising solution for physically reshaping favorable wireless channels to reduce misalignments and thus AirComp errors, via low-cost, fully passive, and…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 5 figures

  11. HALO: Half Life-Based Outdated Fact Filtering in Temporal Knowledge Graphs

    Authors: Feng Ding, Tingting Wang, Yupeng Gao, Shuo Yu, Jing Ren, Feng Xia

    Abstract: Outdated facts in temporal knowledge graphs (TKGs) result from exceeding the expiration date of facts, which negatively impact reasoning performance on TKGs. However, existing reasoning methods primarily focus on positive importance of historical facts, neglecting adverse effects of outdated facts. Besides, training on these outdated facts yields extra computational cost. To address these challeng…

    Submitted 12 May, 2025; originally announced May 2025.

  12. arXiv:2505.07294  [pdf, other]

    cs.RO cs.AI cs.LG eess.SY

    HuB: Learning Extreme Humanoid Balance

    Authors: Tong Zhang, Boyuan Zheng, Ruiqian Nai, Yingdong Hu, Yen-Jen Wang, Geng Chen, Fanqi Lin, Jiongye Li, Chuye Hong, Koushil Sreenath, Yang Gao

    Abstract: The human body demonstrates exceptional motor capabilities, such as standing steadily on one foot or performing a high kick with the leg raised over 1.5 meters, both requiring precise balance control. While recent research on humanoid control has leveraged reinforcement learning to track human motions for skill acquisition, applying this paradigm to balance-intensive tasks remains challenging. In th…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: Project website: https://hub-robot.github.io

  13. arXiv:2505.06883  [pdf, other]

    cs.RO cs.AI cs.LG

    FACET: Force-Adaptive Control via Impedance Reference Tracking for Legged Robots

    Authors: Botian Xu, Haoyang Weng, Qingzhou Lu, Yang Gao, Huazhe Xu

    Abstract: Reinforcement learning (RL) has made significant strides in legged robot control, enabling locomotion across diverse terrains and complex loco-manipulation capabilities. However, the commonly used position or velocity tracking-based objectives are agnostic to forces experienced by the robot, leading to stiff and potentially dangerous behaviors and poor control during forceful interactions. To addr…

    Submitted 11 May, 2025; originally announced May 2025.

  14. arXiv:2505.05155  [pdf, other]

    cs.LG cs.CR

    FedTDP: A Privacy-Preserving and Unified Framework for Trajectory Data Preparation via Federated Learning

    Authors: Zhihao Zeng, Ziquan Fang, Wei Shao, Lu Chen, Yunjun Gao

    Abstract: Trajectory data, which capture the movement patterns of people and vehicles over time and space, are crucial for applications like traffic optimization and urban planning. However, issues such as noise and incompleteness often compromise data quality, leading to inaccurate trajectory analyses and limiting the potential of these applications. While Trajectory Data Preparation (TDP) can enhance data…

    Submitted 8 May, 2025; originally announced May 2025.

  15. arXiv:2505.05056  [pdf, other]

    cs.CL cs.AI

    Teochew-Wild: The First In-the-wild Teochew Dataset with Orthographic Annotations

    Authors: Linrong Pan, Chenglong Jiang, Gaoze Hou, Ying Gao

    Abstract: This paper reports the construction of the Teochew-Wild, a speech corpus of the Teochew dialect. The corpus includes 18.9 hours of in-the-wild Teochew speech data from multiple speakers, covering both formal and colloquial expressions, with precise orthographic and pinyin annotations. Additionally, we provide supplementary text processing tools and resources to propel research and applications in…

    Submitted 8 May, 2025; originally announced May 2025.

  16. arXiv:2505.04852  [pdf, other]

    cs.SE cs.AI cs.PL

    PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C to Safer Rust

    Authors: Yifei Gao, Chengpeng Wang, Pengxiang Huang, Xuwei Liu, Mingwei Zheng, Xiangyu Zhang

    Abstract: There has been a growing interest in translating C code to Rust due to Rust's robust memory and thread safety guarantees. Tools such as C2RUST enable syntax-guided transpilation from C to semantically equivalent Rust code. However, the resulting Rust programs often rely heavily on unsafe constructs, particularly raw pointers, which undermines Rust's safety guarantees. This paper aims to improve th…

    Submitted 9 May, 2025; v1 submitted 7 May, 2025; originally announced May 2025.

  17. arXiv:2505.04665  [pdf]

    cs.CL cs.AI

    Personalized Risks and Regulatory Strategies of Large Language Models in Digital Advertising

    Authors: Haoyang Feng, Yanjun Dai, Yuan Gao

    Abstract: Although large language models have demonstrated the potential for personalized advertising recommendations in experimental environments, in actual operations, how advertising recommendation systems can be combined with measures such as user privacy protection and data security is still an area worthy of in-depth discussion. To this end, this paper studies the personalized risks and regulatory str…

    Submitted 7 May, 2025; originally announced May 2025.

  18. arXiv:2505.04653  [pdf, ps, other]

    cs.CL cs.AI cs.CV cs.LG

    Advancing Conversational Diagnostic AI with Multimodal Reasoning

    Authors: Khaled Saab, Jan Freyberg, Chunjong Park, Tim Strother, Yong Cheng, Wei-Hung Weng, David G. T. Barrett, David Stutz, Nenad Tomasev, Anil Palepu, Valentin Liévin, Yash Sharma, Roma Ruparel, Abdullah Ahmed, Elahe Vedadi, Kimberly Kanada, Cian Hughes, Yun Liu, Geoff Brown, Yang Gao, Sean Li, S. Sara Mahdavi, James Manyika, Katherine Chou, Yossi Matias, et al. (11 additional authors not shown)

    Abstract: Large Language Models (LLMs) have demonstrated great potential for conducting diagnostic conversations but evaluation has been largely limited to language-only interactions, deviating from the real-world requirements of remote care delivery. Instant messaging platforms permit clinicians and patients to upload and discuss multimodal medical artifacts seamlessly in medical consultation, but the abil…

    Submitted 6 May, 2025; originally announced May 2025.

  19. arXiv:2505.03750  [pdf, other]

    cs.AR cs.AI

    AI-Powered Agile Analog Circuit Design and Optimization

    Authors: Jinhai Hu, Wang Ling Goh, Yuan Gao

    Abstract: Artificial intelligence (AI) techniques are transforming analog circuit design by automating device-level tuning and enabling system-level co-optimization. This paper integrates two approaches: (1) AI-assisted transistor sizing using Multi-Objective Bayesian Optimization (MOBO) for direct circuit parameter optimization, demonstrated on a linearly tunable transconductor; and (2) AI-integrated circu…

    Submitted 8 May, 2025; v1 submitted 17 April, 2025; originally announced May 2025.

    Comments: 3 pages, 5 figures, AI4X, 2025

  20. arXiv:2505.02795  [pdf, other]

    cs.LG cs.AI cs.DC

    HSplitLoRA: A Heterogeneous Split Parameter-Efficient Fine-Tuning Framework for Large Language Models

    Authors: Zheng Lin, Yuxin Zhang, Zhe Chen, Zihan Fang, Xianhao Chen, Praneeth Vepakomma, Wei Ni, Jun Luo, Yue Gao

    Abstract: Recently, large language models (LLMs) have achieved remarkable breakthroughs, revolutionizing the natural language processing domain and beyond. Due to immense parameter sizes, fine-tuning these models with private data for diverse downstream tasks has become mainstream. Though federated learning (FL) offers a promising solution for fine-tuning LLMs without sharing raw data, substantial computing…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: 16 pages, 22 figures

  21. arXiv:2505.02211  [pdf, other]

    eess.IV cs.CV

    CSASN: A Multitask Attention-Based Framework for Heterogeneous Thyroid Carcinoma Classification in Ultrasound Images

    Authors: Peiqi Li, Yincheng Gao, Renxing Li, Haojie Yang, Yunyun Liu, Boji Liu, Jiahui Ni, Ying Zhang, Yulu Wu, Xiaowei Fang, Lehang Guo, Liping Sun, Jiangang Chen

    Abstract: Heterogeneous morphological features and data imbalance pose significant challenges in rare thyroid carcinoma classification using ultrasound imaging. To address this issue, we propose a novel multitask learning framework, Channel-Spatial Attention Synergy Network (CSASN), which integrates a dual-branch feature extractor - combining EfficientNet for local spatial encoding and ViT for global semant…

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: 18 pages, 10 figures, 4 tables

  22. arXiv:2505.01974  [pdf, other]

    cs.RO

    KineDex: Learning Tactile-Informed Visuomotor Policies via Kinesthetic Teaching for Dexterous Manipulation

    Authors: Di Zhang, Chengbo Yuan, Chuan Wen, Hai Zhang, Junqiao Zhao, Yang Gao

    Abstract: Collecting demonstrations enriched with fine-grained tactile information is critical for dexterous manipulation, particularly in contact-rich tasks that require precise force control and physical interaction. While prior works primarily focus on teleoperation or video-based retargeting, they often suffer from kinematic mismatches and the absence of real-time tactile feedback, hindering the acquisi…

    Submitted 3 May, 2025; originally announced May 2025.

  23. arXiv:2505.00527  [pdf, other]

    cs.RO

    DeCo: Task Decomposition and Skill Composition for Zero-Shot Generalization in Long-Horizon 3D Manipulation

    Authors: Zixuan Chen, Junhui Yin, Yangtao Chen, Jing Huo, Pinzhuo Tian, Jieqi Shi, Yiwen Hou, Yinchuan Li, Yang Gao

    Abstract: Generalizing language-conditioned multi-task imitation learning (IL) models to novel long-horizon 3D manipulation tasks remains a significant challenge. To address this, we propose DeCo (Task Decomposition and Skill Composition), a model-agnostic framework compatible with various multi-task IL models, designed to enhance their zero-shot generalization to novel, compositional, long-horizon 3D manip…

    Submitted 1 May, 2025; originally announced May 2025.

  24. arXiv:2505.00415  [pdf, other]

    cs.LG

    CICADA: Cross-Domain Interpretable Coding for Anomaly Detection and Adaptation in Multivariate Time Series

    Authors: Tian Lan, Yifei Gao, Yimeng Lu, Chen Zhang

    Abstract: Unsupervised time series anomaly detection plays a crucial role in applications across industries. However, existing methods face significant challenges due to data distributional shifts across different domains, which are exacerbated by the non-stationarity of time series over time. Existing models fail to generalize under multiple heterogeneous source domains and emerging unseen new target domai…

    Submitted 1 May, 2025; originally announced May 2025.

  25. arXiv:2505.00031  [pdf, other]

    cs.CL cs.AI

    Learning to Plan Before Answering: Self-Teaching LLMs to Learn Abstract Plans for Problem Solving

    Authors: Jin Zhang, Flood Sung, Zhilin Yang, Yang Gao, Chongjie Zhang

    Abstract: In the field of large language model (LLM) post-training, the effectiveness of utilizing synthetic data generated by the LLM itself has been well-presented. However, a key question remains unaddressed: what essential information should such self-generated data encapsulate? Existing approaches only produce step-by-step problem solutions, and fail to capture the abstract meta-knowledge necessary for…

    Submitted 28 April, 2025; originally announced May 2025.

  26. arXiv:2504.21814  [pdf, other]

    cs.CV

    Why Compress What You Can Generate? When GPT-4o Generation Ushers in Image Compression Fields

    Authors: Yixin Gao, Xiaohan Pan, Xin Li, Zhibo Chen

    Abstract: The rapid development of AIGC foundation models has revolutionized the paradigm of image compression, which paves the way for the abandonment of most pixel-level transform and coding, compelling us to ask: why compress what you can generate if the AIGC foundation model is powerful enough to faithfully generate intricate structure and fine-grained details from nothing more than some compact descrip…

    Submitted 30 April, 2025; originally announced April 2025.

  27. arXiv:2504.21738  [pdf, ps, other]

    cs.RO

    LangWBC: Language-directed Humanoid Whole-Body Control via End-to-end Learning

    Authors: Yiyang Shao, Xiaoyu Huang, Bike Zhang, Qiayuan Liao, Yuman Gao, Yufeng Chi, Zhongyu Li, Sophia Shao, Koushil Sreenath

    Abstract: General-purpose humanoid robots are expected to interact intuitively with humans, enabling seamless integration into daily life. Natural language provides the most accessible medium for this purpose. However, translating language into humanoid whole-body motion remains a significant challenge, primarily due to the gap between linguistic understanding and physical actions. In this work, we present…

    Submitted 30 April, 2025; originally announced April 2025.

  28. arXiv:2504.21414  [pdf, other]

    cs.CV

    Adapting In-Domain Few-Shot Segmentation to New Domains without Retraining

    Authors: Qi Fan, Kaiqi Liu, Nian Liu, Hisham Cholakkal, Rao Muhammad Anwer, Wenbin Li, Yang Gao

    Abstract: Cross-domain few-shot segmentation (CD-FSS) aims to segment objects of novel classes in new domains, which is often challenging due to the diverse characteristics of target domains and the limited availability of support data. Most CD-FSS methods redesign and retrain in-domain FSS models using various domain-generalization techniques, which are effective but costly to train. To address these issue…

    Submitted 12 May, 2025; v1 submitted 30 April, 2025; originally announced April 2025.

  29. arXiv:2504.21282  [pdf, ps, other]

    cs.DB

    Birdie: Natural Language-Driven Table Discovery Using Differentiable Search Index

    Authors: Yuxiang Guo, Zhonghao Hu, Yuren Mao, Baihua Zheng, Yunjun Gao, Mingwei Zhou

    Abstract: Natural language (NL)-driven table discovery identifies relevant tables from large table repositories based on NL queries. While current deep-learning-based methods using the traditional dense vector search pipeline, i.e., representation-index-search, achieve remarkable accuracy, they face several limitations that impede further performance improvements: (i) the errors accumulated during the table…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: Accepted by VLDB 2025

  30. arXiv:2504.21054  [pdf, other]

    cs.CR cs.AI

    FFCBA: Feature-based Full-target Clean-label Backdoor Attacks

    Authors: Yangxu Yin, Honglong Chen, Yudong Gao, Peng Sun, Liantao Wu, Zhe Li, Weifeng Liu

    Abstract: Backdoor attacks pose a significant threat to deep neural networks, as backdoored models would misclassify poisoned samples with specific triggers into target classes while maintaining normal performance on clean samples. Among these, multi-target backdoor attacks can simultaneously target multiple classes. However, existing multi-target backdoor attacks all follow the dirty-label paradigm, where…

    Submitted 29 April, 2025; originally announced April 2025.

  31. arXiv:2504.21052  [pdf, other]

    cs.CR cs.AI

    SFIBA: Spatial-based Full-target Invisible Backdoor Attacks

    Authors: Yangxu Yin, Honglong Chen, Yudong Gao, Peng Sun, Zhishuai Li, Weifeng Liu

    Abstract: Multi-target backdoor attacks pose significant security threats to deep neural networks, as they can preset multiple target classes through a single backdoor injection. This allows attackers to control the model to misclassify poisoned samples with triggers into any desired target class during inference, exhibiting superior attack performance compared with conventional backdoor attacks. However, e…

    Submitted 29 April, 2025; originally announced April 2025.

  32. arXiv:2504.20624  [pdf, other]

    cs.AI

    PaRT: Enhancing Proactive Social Chatbots with Personalized Real-Time Retrieval

    Authors: Zihan Niu, Zheyong Xie, Shaosheng Cao, Chonggang Lu, Zheyu Ye, Tong Xu, Zuozhu Liu, Yan Gao, Jia Chen, Zhe Xu, Yi Wu, Yao Hu

    Abstract: Social chatbots have become essential intelligent companions in daily scenarios ranging from emotional support to personal interaction. However, conventional chatbots with passive response mechanisms usually rely on users to initiate or sustain dialogues by bringing up new topics, resulting in diminished engagement and shortened dialogue duration. In this paper, we present PaRT, a novel framework…

    Submitted 29 April, 2025; originally announced April 2025.

  33. arXiv:2504.17809  [pdf, other]

    cs.NI

    Monero Peer-to-peer Network Topology Analysis

    Authors: Yu Gao, Yu Zhang, Matija Piškorec, Claudio J. Tessone

    Abstract: Monero, a privacy-focused cryptocurrency, employs a decentralized peer-to-peer (P2P) network that plays a critical role in transaction propagation and consensus formation. While much research has explored Monero's privacy transaction mechanisms, its underlying P2P network architecture has remained relatively underexplored. In this study, building on our recent work on Monero network detection, we…

    Submitted 22 April, 2025; originally announced April 2025.

  34. arXiv:2504.17515  [pdf, other]

    cs.CV

    Mamba-Sea: A Mamba-based Framework with Global-to-Local Sequence Augmentation for Generalizable Medical Image Segmentation

    Authors: Zihan Cheng, Jintao Guo, Jian Zhang, Lei Qi, Luping Zhou, Yinghuan Shi, Yang Gao

    Abstract: To segment medical images with distribution shifts, domain generalization (DG) has emerged as a promising setting to train models on source domains that can generalize to unseen target domains. Existing DG methods are mainly based on CNN or ViT architectures. Recently, advanced state space models, represented by Mamba, have shown promising results in various supervised medical image segmentation.…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: Accepted by IEEE TMI 2025. The code is available at https://github.com/orange-czh/Mamba-Sea

  35. arXiv:2504.17309  [pdf, other]

    cs.CL

    CoheMark: A Novel Sentence-Level Watermark for Enhanced Text Quality

    Authors: Junyan Zhang, Shuliang Liu, Aiwei Liu, Yubo Gao, Jungang Li, Xiaojie Gu, Xuming Hu

    Abstract: Watermarking technology is a method used to trace the usage of content generated by large language models. Sentence-level watermarking aids in preserving the semantic integrity within individual sentences while maintaining greater robustness. However, many existing sentence-level watermarking techniques depend on arbitrary segmentation or generation processes to embed watermarks, which can limit t…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: Published at the 1st workshop on GenAI Watermarking, collocated with ICLR 2025

  36. arXiv:2504.15986  [pdf, other]

    cs.DC

    Charting the Uncharted: The Landscape of Monero Peer-to-Peer Network

    Authors: Yu Gao, Matija Piškorec, Yu Zhang, Nicolò Vallarano, Claudio J. Tessone

    Abstract: The Monero blockchain enables anonymous transactions through advanced cryptography in its peer-to-peer network, which underpins decentralization, security, and trustless interactions. However, privacy measures obscure peer connections, complicating network analysis. This study proposes a method to infer peer connections in Monero's latest protocol version, where timestamp data is unavailable. We c…

    Submitted 22 April, 2025; originally announced April 2025.

  37. arXiv:2504.15909  [pdf, other]

    cs.IR

    Synergizing RAG and Reasoning: A Systematic Review

    Authors: Yunfan Gao, Yun Xiong, Yijie Zhong, Yuxi Bi, Ming Xue, Haofen Wang

    Abstract: Recent breakthroughs in large language models (LLMs), particularly in reasoning capabilities, have propelled Retrieval-Augmented Generation (RAG) to unprecedented levels. By synergizing retrieval mechanisms with advanced reasoning, LLMs can now tackle increasingly complex problems. This paper presents a systematic review of the collaborative interplay between RAG and reasoning, clearly defining "r…

    Submitted 24 April, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

  38. arXiv:2504.15619  [pdf, other]

    cs.CV

    AdaViP: Aligning Multi-modal LLMs via Adaptive Vision-enhanced Preference Optimization

    Authors: Jinda Lu, Jinghan Li, Yuan Gao, Junkang Wu, Jiancan Wu, Xiang Wang, Xiangnan He

    Abstract: Preference alignment through Direct Preference Optimization (DPO) has demonstrated significant effectiveness in aligning multimodal large language models (MLLMs) with human preferences. However, existing methods focus primarily on language preferences while neglecting the critical visual context. In this paper, we propose an Adaptive Vision-enhanced Preference optimization (AdaViP) that addresses…

    Submitted 22 April, 2025; originally announced April 2025.

  39. arXiv:2504.14861  [pdf, other]

    cs.DB cs.IR

    Stitching Inner Product and Euclidean Metrics for Topology-aware Maximum Inner Product Search

    Authors: Tingyang Chen, Cong Fu, Xiangyu Ke, Yunjun Gao, Yabo Ni, Anxiang Zeng

    Abstract: Maximum Inner Product Search (MIPS) is a fundamental challenge in machine learning and information retrieval, particularly in high-dimensional data applications. Existing approaches to MIPS either rely solely on Inner Product (IP) similarity, which faces issues with local optima and redundant computations, or reduce the MIPS problem to the Nearest Neighbor Search under the Euclidean metric via spa…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: Accepted by SIGIR 2025

  40. arXiv:2504.14837  [pdf, other]

    cs.DB

    SQL-Factory: A Multi-Agent Framework for High-Quality and Large-Scale SQL Generation

    Authors: Jiahui Li, Tongwang Wu, Yuren Mao, Yunjun Gao, Yajie Feng, Huaizhong Liu

    Abstract: A high-quality SQL corpus is essential for intelligent databases. For example, Text-to-SQL requires SQL queries and corresponding natural language questions as training samples. However, collecting such a query corpus remains challenging in practice due to the high cost of manual annotation, which highlights the importance of automatic SQL generation. Despite recent advances, existing generation methods s…

    Submitted 1 May, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

  41. arXiv:2504.13479  [pdf, other]

    cs.NI cs.DC cs.LG

    SFL-LEO: Asynchronous Split-Federated Learning Design for LEO Satellite-Ground Network Framework

    Authors: Jiasheng Wu, Jingjing Zhang, Zheng Lin, Zhe Chen, Xiong Wang, Wenjun Zhu, Yue Gao

    Abstract: Recently, the rapid development of LEO satellite networks has spurred another widespread concern: data processing at satellites. However, achieving efficient computation at LEO satellites in highly dynamic satellite networks is challenging and remains an open problem when considering the constrained computation capability of LEO satellites. For the first time, we propose a novel distributed learning fram…

    Submitted 18 April, 2025; originally announced April 2025.

    Comments: 13 pages, 14 figures

  42. arXiv:2504.12034  [pdf, other]

    cs.SE cs.CR

    OpDiffer: LLM-Assisted Opcode-Level Differential Testing of Ethereum Virtual Machine

    Authors: Jie Ma, Ningyu He, Jinwen Xi, Mingzhe Xing, Haoyu Wang, Ying Gao, Yinliang Yue

    Abstract: As Ethereum continues to thrive, the Ethereum Virtual Machine (EVM) has become the cornerstone powering tens of millions of active smart contracts. Intuitively, security issues in EVMs could lead to inconsistent behaviors among smart contracts or even denial-of-service of the entire blockchain network. However, to the best of our knowledge, only a limited number of studies focus on the security of…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: To appear in ISSTA'25

  43. arXiv:2504.11346  [pdf, other]

    cs.CV

    Seedream 3.0 Technical Report

    Authors: Yu Gao, Lixue Gong, Qiushan Guo, Xiaoxia Hou, Zhichao Lai, Fanshi Li, Liang Li, Xiaochen Lian, Chao Liao, Liyang Liu, Wei Liu, Yichun Shi, Shiqi Sun, Yu Tian, Zhi Tian, Peng Wang, Rui Wang, Xuanda Wang, Xun Wang, Ye Wang, Guofeng Wu, Jie Wu, Xin Xia, Xuefeng Xiao, Zhonghua Zhai, et al. (6 additional authors not shown)

    Abstract: We present Seedream 3.0, a high-performance Chinese-English bilingual image generation foundation model. We develop several technical improvements to address existing challenges in Seedream 2.0, including alignment with complicated prompts, fine-grained typography generation, suboptimal visual aesthetics and fidelity, and limited image resolutions. Specifically, the advancements of Seedream 3.0 st… ▽ More

    Submitted 16 April, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

    Comments: Seedream 3.0 Technical Report

  44. arXiv:2504.10961  [pdf

    cs.HC cs.AI

    Evaluating Trust in AI, Human, and Co-produced Feedback Among Undergraduate Students

    Authors: Audrey Zhang, Yifei Gao, Wannapon Suraworachet, Tanya Nazaretsky, Mutlu Cukurova

    Abstract: As generative AI transforms educational feedback practices, understanding students' perceptions of different feedback providers becomes crucial for effective implementation. This study addresses a critical gap by comparing undergraduate students' trust in AI-generated, human-created, and human-AI co-produced feedback, informing how institutions can adapt feedback practices in this new era. Through… ▽ More

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: 35 pages, 6 figures. Under review at Assessment and Evaluation in Higher Education

  45. arXiv:2504.09941  [pdf, other

    cs.LG cs.AI

    FedRecon: Missing Modality Reconstruction in Distributed Heterogeneous Environments

    Authors: Junming Liu, Guosun Zeng, Ding Wang, Yanting Gao, Yufei Jin

    Abstract: Multimodal data are often incomplete and exhibit Non-Independent and Identically Distributed (Non-IID) characteristics in real-world scenarios. These inherent limitations lead to both modality heterogeneity through partial modality absence and data heterogeneity from distribution divergence, creating fundamental challenges for effective federated learning (FL). To address these coupled challenges,… ▽ More

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: 18 pages, 32 figures

  46. arXiv:2504.09915  [pdf, other

    cs.IR cs.MM

    StePO-Rec: Towards Personalized Outfit Styling Assistant via Knowledge-Guided Multi-Step Reasoning

    Authors: Yuxi Bi, Yunfan Gao, Haofen Wang

    Abstract: Advancements in Generative AI offer new opportunities for FashionAI, surpassing traditional recommendation systems that often lack transparency and struggle to integrate expert knowledge, leaving the potential for personalized fashion styling untapped. To address these challenges, we present PAFA (Principle-Aware Fashion), a multi-granular knowledge base that organizes professional styling… ▽ More

    Submitted 14 April, 2025; originally announced April 2025.

  47. arXiv:2504.09580  [pdf, ps, other

    cs.IT

    Bounds and Optimal Constructions of Generalized Merge-Convertible Codes for Code Conversion into LRCs

    Authors: Haoming Shi, Weijun Fang, Yuan Gao

    Abstract: Error-correcting codes are essential for ensuring fault tolerance in modern distributed data storage systems. However, in practice, factors such as the failure rates of storage devices can vary significantly over time, resulting in changes to the optimal code parameters. To reduce storage cost while maintaining efficiency, Maturana and Rashmi introduced a theoretical framework known as code conver… ▽ More

    Submitted 20 April, 2025; v1 submitted 13 April, 2025; originally announced April 2025.

  48. arXiv:2504.09567  [pdf, other

    stat.ML cs.LG stat.ME

    Conditional Independence Test Based on Transport Maps

    Authors: Chenxuan He, Yuan Gao, Liping Zhu, Jian Huang

    Abstract: Testing conditional independence between two random vectors given a third is a fundamental and challenging problem in statistics, particularly in multivariate nonparametric settings due to the complexity of conditional structures. We propose a novel framework for testing conditional independence using transport maps. At the population level, we show that two well-defined transport maps can transfo… ▽ More

    Submitted 13 April, 2025; originally announced April 2025.

    Comments: 35 pages

    MSC Class: 62G05; 62G08; 68T07

  49. arXiv:2504.09255  [pdf, other

    cs.CV

    FVQ: A Large-Scale Dataset and A LMM-based Method for Face Video Quality Assessment

    Authors: Sijing Wu, Yunhao Li, Ziwen Xu, Yixuan Gao, Huiyu Duan, Wei Sun, Guangtao Zhai

    Abstract: Face video quality assessment (FVQA) deserves to be explored in addition to general video quality assessment (VQA), as face videos are the primary content on social media platforms and the human visual system (HVS) is particularly sensitive to human faces. However, FVQA is rarely explored due to the lack of large-scale FVQA datasets. To fill this gap, we present the first large-scale in-the-wild FVQA… ▽ More

    Submitted 12 April, 2025; originally announced April 2025.

  50. arXiv:2504.09065  [pdf, other

    cs.DB

    Substitutability-Based Graph Node Pricing

    Authors: Huiju Wang, Yuanyuan Gao, Zhengkui Wang, Xiao Yue

    Abstract: In the era of data commodification, the pricing of graph data presents unique challenges that differ significantly from traditional data markets. This paper addresses the critical issue of node pricing within graph structures, an area that has been largely overlooked in existing literature. We introduce a novel pricing mechanism based on the concept of substitutability, inspired by economic principl… ▽ More

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: 12 pages, 7 figures