Showing 1–50 of 795 results for author: Gu, Y

Searching in archive cs.
  1. arXiv:2505.09701  [pdf, ps, other]

    cs.CL

    VeriFact: Enhancing Long-Form Factuality Evaluation with Refined Fact Extraction and Reference Facts

    Authors: Xin Liu, Lechen Zhang, Sheza Munir, Yiyang Gu, Lu Wang

    Abstract: Large language models (LLMs) excel at generating long-form responses, but evaluating their factuality remains challenging due to complex inter-sentence dependencies within the generated facts. Prior solutions predominantly follow a decompose-decontextualize-verify pipeline but often fail to capture essential context and miss key relational facts. In this paper, we introduce VeriFact, a factuality…

    Submitted 14 May, 2025; originally announced May 2025.

  2. arXiv:2505.09325  [pdf, ps, other]

    cs.SD eess.AS

    SingNet: Towards a Large-Scale, Diverse, and In-the-Wild Singing Voice Dataset

    Authors: Yicheng Gu, Chaoren Wang, Junan Zhang, Xueyao Zhang, Zihao Fang, Haorui He, Zhizheng Wu

    Abstract: The lack of a publicly-available large-scale and diverse dataset has long been a significant bottleneck for singing voice applications like Singing Voice Synthesis (SVS) and Singing Voice Conversion (SVC). To tackle this problem, we present SingNet, an extensive, diverse, and in-the-wild singing voice dataset. Specifically, we propose a data processing pipeline to extract ready-to-use training dat…

    Submitted 14 May, 2025; originally announced May 2025.

  3. arXiv:2505.05913  [pdf, ps, other]

    cs.CV

    DFEN: Dual Feature Equalization Network for Medical Image Segmentation

    Authors: Jianjian Yin, Yi Chen, Chengyu Li, Zhichao Zheng, Yanhui Gu, Junsheng Zhou

    Abstract: Current methods for medical image segmentation primarily focus on extracting contextual feature information from the perspective of the whole image. While these methods have shown effective performance, none of them take into account the fact that pixels at the boundary and regions with a low number of class pixels capture more contextual feature information from other classes, leading to misclass…

    Submitted 9 May, 2025; originally announced May 2025.

  4. arXiv:2505.03981  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains

    Authors: Qianchu Liu, Sheng Zhang, Guanghui Qin, Timothy Ossowski, Yu Gu, Ying Jin, Sid Kiblawi, Sam Preston, Mu Wei, Paul Vozila, Tristan Naumann, Hoifung Poon

    Abstract: Recent proprietary models (e.g., o3) have begun to demonstrate strong multimodal reasoning capabilities. Yet, most existing open-source research concentrates on training text-only reasoning models, with evaluations limited to mainly mathematical and general-domain tasks. Therefore, it remains unclear how to effectively extend reasoning capabilities beyond text input and general domains. This paper…

    Submitted 6 May, 2025; originally announced May 2025.

  5. arXiv:2504.21501  [pdf, ps, other]

    cs.LG

    Deep Learning Optimization Using Self-Adaptive Weighted Auxiliary Variables

    Authors: Yaru Liu, Yiqi Gu, Michael K. Ng

    Abstract: In this paper, we develop a new optimization framework for the least squares learning problem via fully connected neural networks or physics-informed neural networks. The gradient descent sometimes behaves inefficiently in deep learning because of the high non-convexity of loss functions and the vanishing gradient issue. Our idea is to introduce auxiliary variables to separate the layers of the de…

    Submitted 30 April, 2025; originally announced April 2025.

    Comments: 32 pages, 11 figures

  6. arXiv:2504.21040  [pdf, other]

    cs.CV

    Can a Large Language Model Assess Urban Design Quality? Evaluating Walkability Metrics Across Expertise Levels

    Authors: Chenyi Cai, Kosuke Kuriyama, Youlong Gu, Filip Biljecki, Pieter Herthogs

    Abstract: Urban street environments are vital to supporting human activity in public spaces. The emergence of big data, such as street view images (SVIs) combined with multimodal large language models (MLLMs), is transforming how researchers and practitioners investigate, measure, and evaluate semantic and visual elements of urban environments. Considering the low threshold for creating automated evaluative…

    Submitted 28 April, 2025; originally announced April 2025.

  7. arXiv:2504.19432  [pdf, other]

    cs.CV cs.AI

    EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation

    Authors: Zhe Dong, Yuzhe Sun, Tianzhu Liu, Wangmeng Zuo, Yanfeng Gu

    Abstract: Satellite imagery and maps, as two fundamental data modalities in remote sensing, offer direct observations of the Earth's surface and human-interpretable geographic abstractions, respectively. The task of bidirectional translation between satellite images and maps (BSMT) holds significant potential for applications in urban planning and disaster response. However, this task presents two major cha…

    Submitted 27 April, 2025; originally announced April 2025.

  8. arXiv:2504.19314  [pdf, other]

    cs.CL

    BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese

    Authors: Peilin Zhou, Bruce Leon, Xiang Ying, Can Zhang, Yifan Shao, Qichen Ye, Dading Chong, Zhiling Jin, Chenxuan Xie, Meng Cao, Yuxin Gu, Sixin Hong, Jing Ren, Jian Chen, Chao Liu, Yining Hua

    Abstract: As large language models (LLMs) evolve into tool-using agents, the ability to browse the web in real-time has become a critical yardstick for measuring their reasoning and retrieval competence. Existing benchmarks such as BrowseComp concentrate on English and overlook the linguistic, infrastructural, and censorship-related complexities of other major information ecosystems -- most notably Chinese.…

    Submitted 1 May, 2025; v1 submitted 27 April, 2025; originally announced April 2025.

    Comments: Under Review

  9. TSUE: A Two-Stage Data Update Method for an Erasure Coded Cluster File System

    Authors: Zheng Wei, Jing Xing, Yida Gu, Wenjing Huang, Dong Dai, Guangming Tan, Dingwen Tao

    Abstract: Compared to replication-based storage systems, erasure-coded storage incurs significantly higher overhead during data updates. To address this issue, various parity logging methods have been proposed. Nevertheless, due to the long update path and substantial amount of random I/O involved in erasure code update processes, the resulting long latency and low throughput often fail to meet the requir…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: 14 pages, 8 figures, accepted by ACM HPDC 2025

  10. arXiv:2504.14845  [pdf, other]

    cs.IR

    Enhancing the Patent Matching Capability of Large Language Models via the Memory Graph

    Authors: Qiushi Xiong, Zhipeng Xu, Zhenghao Liu, Mengjia Wang, Zulong Chen, Yue Sun, Yu Gu, Xiaohua Li, Ge Yu

    Abstract: Intellectual Property (IP) management involves strategically protecting and utilizing intellectual assets to enhance organizational innovation, competitiveness, and value creation. Patent matching is a crucial task in intellectual property management, which facilitates the organization and utilization of patents. Existing models often rely on the emergent capabilities of Large Language Models (LLM…

    Submitted 20 April, 2025; originally announced April 2025.

  11. arXiv:2504.14812  [pdf, ps, other]

    cs.CR

    CSI2Dig: Recovering Digit Content from Smartphone Loudspeakers Using Channel State Information

    Authors: Yangyang Gu, Xianglong Li, Haolin Wu, Jing Chen, Kun He, Ruiying Du, Cong Wu

    Abstract: Eavesdropping on sounds emitted by mobile device loudspeakers can capture sensitive digital information, such as SMS verification codes, credit card numbers, and withdrawal passwords, which poses significant security risks. Existing schemes either require expensive specialized equipment, rely on spyware, or are limited to close-range signal acquisition. In this paper, we propose a scheme, CSI2Dig,…

    Submitted 20 April, 2025; originally announced April 2025.

    Comments: 14 pages, 14 figures

    MSC Class: 68T10 ACM Class: I.5.1

  12. arXiv:2504.14694  [pdf, other]

    cs.LG cs.AI

    Learning Critically: Selective Self Distillation in Federated Learning on Non-IID Data

    Authors: Yuting He, Yiqiang Chen, XiaoDong Yang, Hanchao Yu, Yi-Hua Huang, Yang Gu

    Abstract: Federated learning (FL) enables multiple clients to collaboratively train a global model while keeping local data decentralized. Data heterogeneity (non-IID) across clients has imposed significant challenges to FL, which makes local models re-optimize towards their own local optima and forget the global knowledge, resulting in performance degradation and convergence slowdown. Many existing works h…

    Submitted 20 April, 2025; originally announced April 2025.

  13. arXiv:2504.12341  [pdf, other]

    cs.CL

    Streamlining Biomedical Research with Specialized LLMs

    Authors: Linqing Chen, Weilei Wang, Yubin Xia, Wentao Wu, Peng Xu, Zilong Bai, Jie Fang, Chaobo Xu, Ran Hu, Licong Xu, Haoran Hua, Jing Sun, Hanmeng Zhong, Jin Liu, Tian Qiu, Haowen Liu, Meng Hu, Xiuwen Li, Fei Gao, Yong Gu, Tao Shi, Chaochao Wang, Jianping Lu, Cheng Sun, Yixin Wang, et al. (8 additional authors not shown)

    Abstract: In this paper, we propose a novel system that integrates state-of-the-art, domain-specific large language models with advanced information retrieval techniques to deliver comprehensive and context-aware responses. Our approach facilitates seamless interaction among diverse components, enabling cross-validation of outputs to produce accurate, high-quality responses enriched with relevant data, imag…

    Submitted 15 April, 2025; originally announced April 2025.

    Journal ref: Proceedings of the 31st International Conference on Computational Linguistics: System Demonstrations, pp. 9--19, 2025

  14. arXiv:2504.10967  [pdf, other]

    cs.CV

    An Efficient and Mixed Heterogeneous Model for Image Restoration

    Authors: Yubin Gu, Yuan Meng, Kaihang Zheng, Xiaoshuai Sun, Jiayi Ji, Weijian Ruan, Liujuan Cao, Rongrong Ji

    Abstract: Image restoration (IR), as a fundamental multimedia data processing task, has a significant impact on downstream visual applications. In recent years, researchers have focused on developing general-purpose IR models capable of handling diverse degradation types, thereby reducing the cost and complexity of model development. Current mainstream approaches are based on three architectural paradigms:…

    Submitted 19 April, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

    Comments: v2: modify some typos

  15. arXiv:2504.09506  [pdf, other]

    cs.CV

    Pillar-Voxel Fusion Network for 3D Object Detection in Airborne Hyperspectral Point Clouds

    Authors: Yanze Jiang, Yanfeng Gu, Xian Li

    Abstract: Hyperspectral point clouds (HPCs) can simultaneously characterize 3D spatial and spectral information of ground objects, offering excellent 3D perception and target recognition capabilities. Current approaches for generating HPCs often involve fusion techniques with hyperspectral images and LiDAR point clouds, which inevitably lead to geometric-spectral distortions due to fusion errors and obstacl…

    Submitted 13 April, 2025; originally announced April 2025.

  16. arXiv:2504.08646  [pdf, other]

    cs.CV cs.RO

    MBE-ARI: A Multimodal Dataset Mapping Bi-directional Engagement in Animal-Robot Interaction

    Authors: Ian Noronha, Advait Prasad Jawaji, Juan Camilo Soto, Jiajun An, Yan Gu, Upinder Kaur

    Abstract: Animal-robot interaction (ARI) remains an unexplored challenge in robotics, as robots struggle to interpret the complex, multimodal communication cues of animals, such as body language, movement, and vocalizations. Unlike human-robot interaction, which benefits from established datasets and frameworks, animal-robot interaction lacks the foundational resources needed to facilitate meaningful bidire…

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: Accepted to ICRA 2025

  17. arXiv:2504.08524  [pdf, other]

    eess.AS cs.AI cs.SD

    Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion

    Authors: Na Li, Chuke Wang, Yu Gu, Zhifeng Li

    Abstract: Voice conversion (VC) transforms source speech into a target voice by preserving the content. However, timbre information from the source speaker is inherently embedded in the content representations, causing significant timbre leakage and reducing similarity to the target speaker. To address this, we introduce a residual block to a content extractor. The residual block consists of two weighted br…

    Submitted 29 April, 2025; v1 submitted 11 April, 2025; originally announced April 2025.

  18. arXiv:2504.07079  [pdf, other]

    cs.AI cs.CL cs.CV

    SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills

    Authors: Boyuan Zheng, Michael Y. Fatemi, Xiaolong Jin, Zora Zhiruo Wang, Apurva Gandhi, Yueqi Song, Yu Gu, Jayanth Srinivasa, Gaowen Liu, Graham Neubig, Yu Su

    Abstract: To survive and thrive in complex environments, humans have evolved sophisticated self-improvement mechanisms through environment exploration, hierarchical abstraction of experiences into reusable skills, and collaborative construction of an ever-growing skill repertoire. Despite recent advancements, autonomous web agents still lack crucial self-improvement capabilities, struggling with procedural…

    Submitted 9 April, 2025; originally announced April 2025.

  19. AVP-AP: Self-supervised Automatic View Positioning in 3D cardiac CT via Atlas Prompting

    Authors: Xiaolin Fan, Yan Wang, Yingying Zhang, Mingkun Bao, Bosen Jia, Dong Lu, Yifan Gu, Jian Cheng, Haogang Zhu

    Abstract: Automatic view positioning is crucial for cardiac computed tomography (CT) examinations, including disease diagnosis and surgical planning. However, it is highly challenging due to individual variability and large 3D search space. Existing work needs labor-intensive and time-consuming manual annotations to train view-specific models, which are limited to predicting only a fixed set of planes. Howe…

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: 12 pages, 8 figures, published to TMI

    Journal ref: IEEE TRANSACTIONS ON MEDICAL IMAGING, March 2025

  20. arXiv:2504.05594  [pdf, other]

    cs.CV

    Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model

    Authors: Qi Mao, Lan Chen, Yuchao Gu, Mike Zheng Shou, Ming-Hsuan Yang

    Abstract: Balancing fidelity and editability is essential in text-based image editing (TIE), where failures commonly lead to over- or under-editing issues. Existing methods typically rely on attention injections for structure preservation and leverage the inherent text alignment capabilities of pre-trained text-to-image (T2I) models for editability, but they lack explicit and unified mechanisms to properly…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: under review

  21. arXiv:2504.04619  [pdf, other]

    cs.DS

    New Algorithms for Incremental Minimum Spanning Trees and Temporal Graph Applications

    Authors: Xiangyun Ding, Yan Gu, Yihan Sun

    Abstract: Processing graphs with temporal information (the temporal graphs) has become increasingly important in the real world. In this paper, we study efficient solutions to temporal graph applications using new algorithms for Incremental Minimum Spanning Trees (MST). The first contribution of this work is to formally discuss how a broad set of setting-problem combinations of temporal graph processing can…

    Submitted 9 May, 2025; v1 submitted 6 April, 2025; originally announced April 2025.

  22. arXiv:2504.04589  [pdf, other]

    cs.SD eess.AS eess.SP

    Diff-SSL-G-Comp: Towards a Large-Scale and Diverse Dataset for Virtual Analog Modeling

    Authors: Yicheng Gu, Runsong Zhang, Lauri Juvela, Zhizheng Wu

    Abstract: Virtual Analog (VA) modeling aims to simulate the behavior of hardware circuits via algorithms to replicate their tone digitally. Dynamic Range Compressor (DRC) is an audio processing module that controls the dynamics of a track by reducing and amplifying the volumes of loud and quiet sounds, which is essential in music production. In recent years, neural-network-based VA modeling has shown great…

    Submitted 6 April, 2025; originally announced April 2025.

    Comments: Submitted to DAFx 2025

  23. arXiv:2504.02842  [pdf, other]

    eess.SP cs.LG stat.AP stat.ME

    Enhanced ECG Arrhythmia Detection Accuracy by Optimizing Divergence-Based Data Fusion

    Authors: Baozhuo Su, Qingli Dou, Kang Liu, Zhengxian Qu, Jerry Deng, Ting Tan, Yanan Gu

    Abstract: AI computation in healthcare faces significant challenges when clinical datasets are limited and heterogeneous. Integrating datasets from multiple sources and different equipment is critical for effective AI computation but is complicated by their diversity, complexity, and lack of representativeness, so we often need to join multiple datasets for analysis. The currently used method is fusion aft…

    Submitted 19 March, 2025; originally announced April 2025.

    Comments: 13 pages, 8 figures, 6 tables

  24. arXiv:2504.01990  [pdf, other]

    cs.AI

    Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

    Authors: Bang Liu, Xinfeng Li, Jiayi Zhang, Jinlin Wang, Tanjin He, Sirui Hong, Hongzhang Liu, Shaokun Zhang, Kaitao Song, Kunlun Zhu, Yuheng Cheng, Suyuchen Wang, Xiaoqiang Wang, Yuyu Luo, Haibo Jin, Peiyan Zhang, Ollie Liu, Jiaqi Chen, Huan Zhang, Zhaoyang Yu, Haochen Shi, Boyan Li, Dekun Wu, Fengwei Teng, Xiaojun Jia, et al. (22 additional authors not shown)

    Abstract: The advent of large language models (LLMs) has catalyzed a transformative shift in artificial intelligence, paving the way for advanced intelligent agents capable of sophisticated reasoning, robust perception, and versatile action across diverse domains. As these agents increasingly drive AI research and practical applications, their design, evaluation, and continuous improvement present intricate…

    Submitted 31 March, 2025; originally announced April 2025.

  25. arXiv:2504.01786  [pdf, other]

    cs.GR cs.LG

    BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing

    Authors: Yunqi Gu, Ian Huang, Jihyeon Je, Guandao Yang, Leonidas Guibas

    Abstract: 3D graphics editing is crucial in applications like movie production and game design, yet it remains a time-consuming process that demands highly specialized domain expertise. Automating this process is challenging because graphical editing requires performing a variety of tasks, each requiring distinct skill sets. Recently, vision-language models (VLMs) have emerged as a powerful framework for au…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: CVPR 2025 Accepted

  26. arXiv:2504.00977  [pdf, ps, other]

    cs.CL

    Chinese Grammatical Error Correction: A Survey

    Authors: Mengyang Qiu, Qingyu Gao, Linxuan Yang, Yang Gu, Tran Minh Nguyen, Zihao Huang, Jungyeul Park

    Abstract: Chinese Grammatical Error Correction (CGEC) is a critical task in Natural Language Processing, addressing the growing demand for automated writing assistance in both second-language (L2) and native (L1) Chinese writing. While L2 learners struggle with mastering complex grammatical structures, L1 users also benefit from CGEC in academic, professional, and formal contexts where writing precision is…

    Submitted 1 April, 2025; originally announced April 2025.

  27. arXiv:2503.23536  [pdf, other]

    cs.LG cs.AI

    A Survey on Unlearnable Data

    Authors: Jiahao Li, Yiqiang Chen, Yunbing Xing, Yang Gu, Xiangyuan Lan

    Abstract: Unlearnable data (ULD) has emerged as an innovative defense technique to prevent machine learning models from learning meaningful patterns from specific data, thus protecting data privacy and security. By introducing perturbations to the training data, ULD degrades model performance, making it difficult for unauthorized models to extract useful representations. Despite the growing significance of…

    Submitted 1 April, 2025; v1 submitted 30 March, 2025; originally announced March 2025.

    Comments: 31 pages, 3 figures, Code in https://github.com/LiJiahao-Alex/Awesome-UnLearnable-Data

  28. arXiv:2503.21460  [pdf, other]

    cs.CL

    Large Language Model Agent: A Survey on Methodology, Applications and Challenges

    Authors: Junyu Luo, Weizhi Zhang, Ye Yuan, Yusheng Zhao, Junwei Yang, Yiyang Gu, Bohan Wu, Binqi Chen, Ziyue Qiao, Qingqing Long, Rongcheng Tu, Xiao Luo, Wei Ju, Zhiping Xiao, Yifan Wang, Meng Xiao, Chenwu Liu, Jingyang Yuan, Shichang Zhang, Yiqiao Jin, Fan Zhang, Xian Wu, Hanqing Zhao, Dacheng Tao, Philip S. Yu, et al. (1 additional author not shown)

    Abstract: The era of intelligent agents is upon us, driven by revolutionary advancements in large language models. Large Language Model (LLM) agents, with goal-driven behaviors and dynamic adaptation capabilities, potentially represent a critical pathway toward artificial general intelligence. This survey systematically deconstructs LLM agent systems through a methodology-centered taxonomy, linking architec…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: 329 papers surveyed, resources are at https://github.com/luo-junyu/Awesome-Agent-Papers

  29. arXiv:2503.20631  [pdf, other]

    cs.RO cs.CV

    Robust Flower Cluster Matching Using The Unscented Transform

    Authors: Andy Chu, Rashik Shrestha, Yu Gu, Jason N. Gross

    Abstract: Monitoring flowers over time is essential for precision robotic pollination in agriculture. To accomplish this, a continuous spatial-temporal observation of plant growth can be done using stationary RGB-D cameras. However, image registration becomes a serious challenge due to changes in the visual appearance of the plant caused by the pollination process and occlusions from growth and camera angle…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: *CASE2025 Under Review*

  30. arXiv:2503.19406  [pdf, other]

    cs.CV

    M$^2$CD: A Unified MultiModal Framework for Optical-SAR Change Detection with Mixture of Experts and Self-Distillation

    Authors: Ziyuan Liu, Jiawei Zhang, Wenyu Wang, Yuantao Gu

    Abstract: Most existing change detection (CD) methods focus on optical images captured at different times, and deep learning (DL) has achieved remarkable success in this domain. However, in extreme scenarios such as disaster response, synthetic aperture radar (SAR), with its active imaging capability, is more suitable for providing post-event data. This introduces new challenges for CD methods, as existing…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: 5 pages, 2 figures

  31. arXiv:2503.19325  [pdf, other]

    cs.CV

    Long-Context Autoregressive Video Modeling with Next-Frame Prediction

    Authors: Yuchao Gu, Weijia Mao, Mike Zheng Shou

    Abstract: Long-context autoregressive modeling has significantly advanced language generation, but video generation still struggles to fully utilize extended temporal contexts. To investigate long-context video modeling, we introduce Frame AutoRegressive (FAR), a strong baseline for video autoregressive modeling. Just as language models learn causal dependencies between tokens (i.e., Token AR), FAR models t…

    Submitted 17 April, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

    Comments: Project page at https://farlongctx.github.io/

  32. arXiv:2503.18784  [pdf, other]

    cs.CV

    Leveraging Perturbation Robustness to Enhance Out-of-Distribution Detection

    Authors: Wenxi Chen, Raymond A. Yeh, Shaoshuai Mou, Yan Gu

    Abstract: Out-of-distribution (OOD) detection is the task of identifying inputs that deviate from the training data distribution. This capability is essential for safely deploying deep computer vision models in open-world environments. In this work, we propose a post-hoc method, Perturbation-Rectified OOD detection (PRO), based on the insight that prediction confidence for OOD inputs is more susceptible to…

    Submitted 24 March, 2025; originally announced March 2025.

  33. arXiv:2503.17983  [pdf, other]

    cs.CV

    Histomorphology-driven multi-instance learning for breast cancer WSI classification

    Authors: Baizhi Wang, Rui Yan, Wenxin Ma, Xu Zhang, Yuhao Wang, Xiaolong Li, Yunjie Gu, Zihang Jiang, S. Kevin Zhou

    Abstract: Histomorphology is crucial in breast cancer diagnosis. However, existing whole slide image (WSI) classification methods struggle to effectively incorporate histomorphology information, limiting their ability to capture key and fine-grained pathological features. To address this limitation, we propose a novel framework that explicitly incorporates histomorphology (tumor cellularity, cellular morpho…

    Submitted 23 March, 2025; originally announced March 2025.

    Comments: 10 pages, 5 figures

  34. arXiv:2503.15667  [pdf, other]

    cs.CV

    DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis

    Authors: Yuming Gu, Phong Tran, Yujian Zheng, Hongyi Xu, Heyuan Li, Adilbek Karmanov, Hao Li

    Abstract: Generating high-quality 360-degree views of human heads from single-view images is essential for enabling accessible immersive telepresence applications and scalable personalized content creation. While cutting-edge methods for full head generation are limited to modeling realistic human heads, the latest diffusion-based approaches for style-omniscient head synthesis can produce only frontal views…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: Page: https://freedomgu.github.io/DiffPortrait360 Code: https://github.com/FreedomGu/DiffPortrait360/

  35. arXiv:2503.13327  [pdf, other]

    cs.CV

    Edit Transfer: Learning Image Editing via Vision In-Context Relations

    Authors: Lan Chen, Qi Mao, Yuchao Gu, Mike Zheng Shou

    Abstract: We introduce a new setting, Edit Transfer, where a model learns a transformation from just a single source-target example and applies it to a new query image. While text-based methods excel at semantic manipulations through textual prompts, they often struggle with precise geometric details (e.g., poses and viewpoint changes). Reference-based editing, on the other hand, typically focuses on style…

    Submitted 17 March, 2025; originally announced March 2025.

  36. arXiv:2503.12441  [pdf, other]

    cs.CV

    Consistent-Point: Consistent Pseudo-Points for Semi-Supervised Crowd Counting and Localization

    Authors: Yuda Zou, Zelong Liu, Yuliang Gu, Bo Du, Yongchao Xu

    Abstract: Crowd counting and localization are important in applications such as public security and traffic management. Existing methods have achieved impressive results thanks to extensive laborious annotations. This paper proposes a novel point-localization-based semi-supervised crowd counting and localization method termed Consistent-Point. We identify and address two inconsistencies of pseudo-points, whi…

    Submitted 16 March, 2025; originally announced March 2025.

  37. arXiv:2503.11692  [pdf, other]

    cs.RO cs.CV

    FloPE: Flower Pose Estimation for Precision Pollination

    Authors: Rashik Shrestha, Madhav Rijal, Trevor Smith, Yu Gu

    Abstract: This study presents Flower Pose Estimation (FloPE), a real-time flower pose estimation framework for computationally constrained robotic pollination systems. Robotic pollination has been proposed to supplement natural pollination to ensure global food security due to the decreased population of natural pollinators. However, flower pose estimation for pollination is challenging due to natural varia…

    Submitted 8 March, 2025; originally announced March 2025.

    Comments: IROS2025 under review

  38. arXiv:2503.11342  [pdf, other]

    cs.CV

    Road Rage Reasoning with Vision-language Models (VLMs): Task Definition and Evaluation Dataset

    Authors: Yibing Weng, Yu Gu, Fuji Ren

    Abstract: Road rage, triggered by driving-related stimuli such as traffic congestion and aggressive driving, poses a significant threat to road safety. Previous research on road rage regulation has primarily focused on response suppression, lacking proactive prevention capabilities. With the advent of Vision-Language Models (VLMs), it has become possible to reason about trigger events visually and then enga…

    Submitted 14 March, 2025; originally announced March 2025.

  39. arXiv:2503.10103  [pdf, other]

    cs.CV cs.LG

    Improving Diffusion-based Inverse Algorithms under Few-Step Constraint via Learnable Linear Extrapolation

    Authors: Jiawei Zhang, Ziyuan Liu, Leon Yan, Gen Li, Yuantao Gu

    Abstract: Diffusion models have demonstrated remarkable performance in modeling complex data priors, catalyzing their widespread adoption in solving various inverse problems. However, the inherently iterative nature of diffusion-based inverse algorithms often requires hundreds to thousands of steps, with performance degradation occurring under fewer steps which limits their practical applicability. While hi…

    Submitted 16 March, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

    Comments: preprint

  40. arXiv:2503.09078  [pdf, other]

    cs.RO

    Sequential Multi-Object Grasping with One Dexterous Hand

    Authors: Sicheng He, Zeyu Shangguan, Kuanning Wang, Yongchong Gu, Yuqian Fu, Yanwei Fu, Daniel Seita

    Abstract: Sequentially grasping multiple objects with multi-fingered hands is common in daily life, where humans can fully leverage the dexterity of their hands to enclose multiple objects. However, the diversity of object geometries and the complex contact interactions required for high-DOF hands to grasp one object while enclosing another make sequential multi-object grasping challenging for robots. In th…

    Submitted 12 March, 2025; originally announced March 2025.

  41. arXiv:2503.07497  [pdf, other]

    cs.RO

    Force Aware Branch Manipulation To Assist Agricultural Tasks

    Authors: Madhav Rijal, Rashik Shrestha, Trevor Smith, Yu Gu

    Abstract: This study presents a methodology to safely manipulate branches to aid various agricultural tasks. Humans in a real agricultural environment often manipulate branches to perform agricultural tasks effectively, but current agricultural robots lack this capability. This proposed strategy to manipulate branches can aid in different precision agriculture tasks, such as fruit picking in dense foliage,…

    Submitted 11 March, 2025; v1 submitted 10 March, 2025; originally announced March 2025.

  42. arXiv:2503.04156  [pdf]

    eess.SP cs.SD eess.AS

    Frequency-Based Alignment of EEG and Audio Signals Using Contrastive Learning and SincNet for Auditory Attention Detection

    Authors: Yuan Liao, Yuhong Zhang, Qiushi Han, Yuhang Yang, Weiwei Ding, Yuzhe Gu, Hengxin Yang, Liya Huang

    Abstract: Humans exhibit a remarkable ability to focus auditory attention in complex acoustic environments, such as cocktail parties. Auditory attention detection (AAD) aims to identify the attended speaker by analyzing brain signals, such as electroencephalography (EEG) data. Existing AAD algorithms often leverage deep learning's powerful nonlinear modeling capabilities, but few consider the neural mechanisms…

    Submitted 6 March, 2025; originally announced March 2025.

  43. arXiv:2503.03562  [pdf, other]

    cs.CV cs.AI

    Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection

    Authors: Wenqiao Li, Yao Gu, Xintao Chen, Xiaohao Xu, Ming Hu, Xiaonan Huang, Yingna Wu

    Abstract: Humans detect real-world object anomalies by perceiving, interacting, and reasoning based on object-conditioned physical knowledge. The long-term goal of Industrial Anomaly Detection (IAD) is to enable machines to autonomously replicate this skill. However, current IAD algorithms are largely developed and tested on static, semantically simple datasets, which diverge from real-world scenarios where…

    Submitted 25 March, 2025; v1 submitted 5 March, 2025; originally announced March 2025.

    Comments: Accepted by CVPR25

  44. arXiv:2503.03294  [pdf, other]

    eess.IV cs.CV

    Interactive Segmentation and Report Generation for CT Images

    Authors: Yannian Gu, Wenhui Lei, Hanyu Chen, Xiaofan Zhang, Shaoting Zhang

    Abstract: Automated CT report generation plays a crucial role in improving diagnostic accuracy and clinical workflow efficiency. However, existing methods lack interpretability and impede patient-clinician understanding, while their static nature restricts radiologists from dynamically adjusting assessments during image review. Inspired by interactive segmentation techniques, we propose a novel interactive…

    Submitted 5 March, 2025; originally announced March 2025.

  45. arXiv:2503.02846  [pdf, other]

    cs.CL

    Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs

    Authors: Yuzhe Gu, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai Chen

    Abstract: Large language models (LLMs) exhibit hallucinations (i.e., unfaithful or nonsensical information) when serving as AI assistants in various domains. Since hallucinations always come with truthful content in the LLM responses, previous factuality alignment methods that conduct response-level preference learning inevitably introduced noises during training. Therefore, this paper proposes a fine-grain…

    Submitted 4 March, 2025; originally announced March 2025.

    Comments: Accepted by ICLR 2025. Code is available at https://github.com/open-compass/ANAH

  46. arXiv:2503.02047  [pdf, other]

    cs.DB

    Quantifying Point Contributions: A Lightweight Framework for Efficient and Effective Query-Driven Trajectory Simplification

    Authors: Yumeng Song, Yu Gu, Tianyi Li, Yushuai Li, Christian S. Jensen, Ge Yu

    Abstract: As large volumes of trajectory data accumulate, simplifying trajectories to reduce storage and querying costs is increasingly studied. Existing proposals face three main problems. First, they require numerous iterations to decide which GPS points to delete. Second, they focus only on the relationships between neighboring points (local information) while neglecting the overall structure (global inf…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: Accepted by VLDB2025

  47. arXiv:2503.01424  [pdf, other]

    cs.AI cs.CL

    From Hypothesis to Publication: A Comprehensive Survey of AI-Driven Research Support Systems

    Authors: Zekun Zhou, Xiaocheng Feng, Lei Huang, Xiachong Feng, Ziyun Song, Ruihan Chen, Liang Zhao, Weitao Ma, Yuxuan Gu, Baoxin Wang, Dayong Wu, Guoping Hu, Ting Liu, Bing Qin

    Abstract: Research is a fundamental process driving the advancement of human civilization, yet it demands substantial time and effort from researchers. In recent years, the rapid development of artificial intelligence (AI) technologies has inspired researchers to explore how AI can accelerate and enhance research. To monitor relevant advancements, this paper presents a systematic review of the progress in t…

    Submitted 3 March, 2025; originally announced March 2025.

  48. arXiv:2503.01390  [pdf, other]

    cs.OS cs.PL cs.SE

    Scalable and Accurate Application-Level Crash-Consistency Testing via Representative Testing

    Authors: Yile Gu, Ian Neal, Jiexiao Xu, Shaun Christopher Lee, Ayman Said, Musa Haydar, Jacob Van Geffen, Rohan Kadekodi, Andrew Quinn, Baris Kasikci

    Abstract: Crash consistency is essential for applications that must persist data. Crash-consistency testing has been commonly applied to find crash-consistency bugs in applications. The crash-state space grows exponentially as the number of operations in the program increases, necessitating techniques for pruning the search space. However, state-of-the-art crash-state space pruning is far from ideal. Some t…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: OOPSLA 2025

  49. arXiv:2503.00273  [pdf, ps, other]

    stat.ML cs.IT cs.LG

    Evolution of Information in Interactive Decision Making: A Case Study for Multi-Armed Bandits

    Authors: Yuzhou Gu, Yanjun Han, Jian Qian

    Abstract: We study the evolution of information in interactive decision making through the lens of a stochastic multi-armed bandit problem. Focusing on a fundamental example where a unique optimal arm outperforms the rest by a fixed margin, we characterize the optimal success probability and mutual information over time. Our findings reveal distinct growth phases in mutual information -- initially linear, t…

    Submitted 28 February, 2025; originally announced March 2025.

  50. arXiv:2502.21253  [pdf]

    cs.CE

    A novel boundary integrated neural networks for in plane fracture mechanics analysis of elastic and piezoelectric materials

    Authors: Peijun Zhang, Yan Gu, Okyay Altay, Chuanzeng Zhang

    Abstract: In this study, we propose a novel approach, termed boundary integrated neural networks (BINNs), for analyzing in-plane crack problems within the framework of linear elastic fracture mechanics. The proposed approach integrates artificial neural networks (ANNs) with classical boundary integral equations (BIEs), enabling an efficient and accurate evaluation of partial differential equations (PDEs) as…

    Submitted 28 February, 2025; originally announced February 2025.