
Showing 1–50 of 286 results for author: Lyu, Y

Searching in archive cs.
  1. arXiv:2507.03730  [pdf, ps, other]

    cs.CV cs.AI cs.HC cs.LG

    Less is More: Empowering GUI Agent with Context-Aware Simplification

    Authors: Gongwei Chen, Xurui Zhou, Rui Shao, Yibo Lyu, Kaiwen Zhou, Shuai Wang, Wentao Li, Yinchuan Li, Zhongang Qi, Liqiang Nie

    Abstract: The research focus of GUI agents is shifting from text-dependent to pure-vision-based approaches, which, though promising, prioritize comprehensive pre-training data collection while neglecting contextual modeling challenges. We probe the characteristics of element and history contextual modeling in GUI agent and summarize: 1) the high-density and loose-relation of element context highlight the ex…

    Submitted 4 July, 2025; originally announced July 2025.

    Comments: Accepted to ICCV 2025

  2. arXiv:2507.03037  [pdf, ps, other]

    cs.CV

    Intelligent Histology for Tumor Neurosurgery

    Authors: Xinhai Hou, Akhil Kondepudi, Cheng Jiang, Yiwei Lyu, Samir Harake, Asadur Chowdury, Anna-Katharina Meißner, Volker Neuschmelting, David Reinecke, Gina Furtjes, Georg Widhalm, Lisa Irina Koerner, Jakob Straehle, Nicolas Neidert, Pierre Scheffler, Juergen Beck, Michael Ivan, Ashish Shah, Aditya Pandey, Sandra Camelo-Piragua, Dieter Henrik Heiland, Oliver Schnell, Chris Freudiger, Jacob Young, Melike Pekmezci, et al. (5 additional authors not shown)

    Abstract: The importance of rapid and accurate histologic analysis of surgical tissue in the operating room has been recognized for over a century. Our standard-of-care intraoperative pathology workflow is based on light microscopy and H&E histology, which is slow, resource-intensive, and lacks real-time digital imaging capabilities. Here, we present an emerging and innovative method for intraoperative his…

    Submitted 2 July, 2025; originally announced July 2025.

  3. arXiv:2507.02503  [pdf, ps, other]

    cs.LG cs.AI cs.CE

    Continual Gradient Low-Rank Projection Fine-Tuning for LLMs

    Authors: Chenxu Wang, Yilin Lyu, Zicheng Sun, Liping Jing

    Abstract: Continual fine-tuning of Large Language Models (LLMs) is hampered by the trade-off between efficiency and expressiveness. Low-Rank Adaptation (LoRA) offers efficiency but constrains the model's ability to learn new tasks and transfer knowledge due to its low-rank nature and reliance on explicit parameter constraints. We propose GORP (Gradient LOw Rank Projection) for Continual Learning, a novel tr…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: 15 pages, 6 figures, accepted by ACL 2025 main

  4. arXiv:2507.02029  [pdf, ps, other]

    cs.RO

    RoboBrain 2.0 Technical Report

    Authors: BAAI RoboBrain Team, Mingyu Cao, Huajie Tan, Yuheng Ji, Minglan Lin, Zhiyu Li, Zhou Cao, Pengwei Wang, Enshen Zhou, Yi Han, Yingbo Tang, Xiangqi Xu, Wei Guo, Yaoxu Lyu, Yijie Xu, Jiayu Shi, Mengfei Du, Cheng Chi, Mengdi Zhao, Xiaoshuai Hao, Junkai Zhao, Xiaojie Zhang, Shanyu Rong, Huaihai Lyu, Zhengliang Cai, et al. (26 additional authors not shown)

    Abstract: We introduce RoboBrain 2.0, our latest generation of embodied vision-language foundation models, designed to unify perception, reasoning, and planning for complex embodied tasks in physical environments. It comes in two variants: a lightweight 7B model and a full-scale 32B model, featuring a heterogeneous architecture with a vision encoder and a language model. Despite its compact size, RoboBrain…

    Submitted 5 July, 2025; v1 submitted 2 July, 2025; originally announced July 2025.

  5. arXiv:2507.01243  [pdf, ps, other]

    cs.RO cs.LG

    Jump-Start Reinforcement Learning with Self-Evolving Priors for Extreme Monopedal Locomotion

    Authors: Ziang Zheng, Guojian Zhan, Shiqi Liu, Yao Lyu, Tao Zhang, Shengbo Eben Li

    Abstract: Reinforcement learning (RL) has shown great potential in enabling quadruped robots to perform agile locomotion. However, directly training policies to simultaneously handle dual extreme challenges, i.e., extreme underactuation and extreme terrains, as in monopedal hopping tasks, remains highly challenging due to unstable early-stage interactions and unreliable reward feedback. To address this, we…

    Submitted 1 July, 2025; originally announced July 2025.

  6. arXiv:2506.18443  [pdf, ps, other]

    cs.RO cs.CV

    Radar and Event Camera Fusion for Agile Robot Ego-Motion Estimation

    Authors: Yang Lyu, Zhenghao Zou, Yanfeng Li, Chunhui Zhao, Quan Pan

    Abstract: Achieving reliable ego motion estimation for agile robots, e.g., aerobatic aircraft, remains challenging because most robot sensors fail to respond timely and clearly to highly dynamic robot motions, often resulting in measurement blurring, distortion, and delays. In this paper, we propose an IMU-free and feature-association-free framework to achieve aggressive ego-motion velocity estimation of a…

    Submitted 23 June, 2025; originally announced June 2025.

  7. arXiv:2506.13276  [pdf, ps, other]

    cs.AI

    Navigating the Black Box: Leveraging LLMs for Effective Text-Level Graph Injection Attacks

    Authors: Yuefei Lyu, Chaozhuo Li, Xi Zhang, Tianle Zhang

    Abstract: Text-attributed graphs (TAGs) integrate textual data with graph structures, providing valuable insights in applications such as social network analysis and recommendation systems. Graph Neural Networks (GNNs) effectively capture both topological structure and textual information in TAGs but are vulnerable to adversarial attacks. Existing graph injection attack (GIA) methods assume that attackers c…

    Submitted 16 June, 2025; originally announced June 2025.

  8. arXiv:2506.11148  [pdf, ps, other]

    cs.CV cs.LG

    LLM-to-Phy3D: Physically Conform Online 3D Object Generation with LLMs

    Authors: Melvin Wong, Yueming Lyu, Thiago Rios, Stefan Menzel, Yew-Soon Ong

    Abstract: The emergence of generative artificial intelligence (GenAI) and large language models (LLMs) has revolutionized the landscape of digital content creation in different modalities. However, its potential use in Physical AI for engineering design, where the production of physically viable artifacts is paramount, remains vastly underexplored. The absence of physical knowledge in existing LLM-to-3D mod…

    Submitted 11 June, 2025; originally announced June 2025.

  9. arXiv:2506.10366  [pdf, ps, other]

    cs.CV

    FSATFusion: Frequency-Spatial Attention Transformer for Infrared and Visible Image Fusion

    Authors: Tianpei Zhang, Jufeng Zhao, Yiming Zhu, Guangmang Cui, Yuhan Lyu

    Abstract: The infrared and visible images fusion (IVIF) is receiving increasing attention from both the research community and industry due to its excellent results in downstream applications. Existing deep learning approaches often utilize convolutional neural networks to extract image features. However, the inherently capacity of convolution operations to capture global context can lead to information los…

    Submitted 12 June, 2025; originally announced June 2025.

  10. arXiv:2506.09345  [pdf, ps, other]

    cs.CV

    An Effective End-to-End Solution for Multimodal Action Recognition

    Authors: Songping Wang, Xiantao Hu, Yueming Lyu, Caifeng Shan

    Abstract: Recently, multimodal tasks have strongly advanced the field of action recognition with their rich multimodal information. However, due to the scarcity of tri-modal data, research on tri-modal action recognition tasks faces many challenges. To this end, we have proposed a comprehensive multimodal action recognition solution that effectively utilizes multimodal information. First, the existing data…

    Submitted 10 June, 2025; originally announced June 2025.

  11. arXiv:2506.09344  [pdf, ps, other]

    cs.AI cs.CL cs.CV cs.LG cs.SD eess.AS

    Ming-Omni: A Unified Multimodal Model for Perception and Generation

    Authors: Inclusion AI, Biao Gong, Cheng Zou, Chuanyang Zheng, Chunluan Zhou, Canxiang Yan, Chunxiang Jin, Chunjie Shen, Dandan Zheng, Fudong Wang, Furong Xu, GuangMing Yao, Jun Zhou, Jingdong Chen, Jianxin Sun, Jiajia Liu, Jianjiang Zhu, Jun Peng, Kaixiang Ji, Kaiyou Song, Kaimeng Ren, Libin Wang, Lixiang Ru, Lele Xie, Longhua Tan, et al. (33 additional authors not shown)

    Abstract: We propose Ming-Omni, a unified multimodal model capable of processing images, text, audio, and video, while demonstrating strong proficiency in both speech and image generation. Ming-Omni employs dedicated encoders to extract tokens from different modalities, which are then processed by Ling, an MoE architecture equipped with newly proposed modality-specific routers. This design enables a single…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: 18 pages, 8 figures

  12. arXiv:2506.02839  [pdf, ps, other]

    cs.IR cs.AI

    DeepShop: A Benchmark for Deep Research Shopping Agents

    Authors: Yougang Lyu, Xiaoyu Zhang, Lingyong Yan, Maarten de Rijke, Zhaochun Ren, Xiuying Chen

    Abstract: Web agents for online shopping have shown great promise in automating user interactions across e-commerce platforms. Benchmarks for assessing such agents do not reflect the complexity of real-world shopping scenarios, as they often consist of overly simple queries with deterministic paths, such as "Find iPhone 15." Real shopping scenarios are inherently more layered, involving multi-dimensional pr…

    Submitted 3 June, 2025; originally announced June 2025.

  13. arXiv:2505.22967  [pdf, ps, other]

    cs.LG cs.MA

    MermaidFlow: Redefining Agentic Workflow Generation via Safety-Constrained Evolutionary Programming

    Authors: Chengqi Zheng, Jianda Chen, Yueming Lyu, Wen Zheng Terence Ng, Haopeng Zhang, Yew-Soon Ong, Ivor Tsang, Haiyan Yin

    Abstract: Despite the promise of autonomous agentic reasoning, existing workflow generation methods frequently produce fragile, unexecutable plans due to unconstrained LLM-driven construction. We introduce MermaidFlow, a framework that redefines the agentic search space through safety-constrained graph evolution. At its core, MermaidFlow represent workflows as a verifiable intermediate representation using…

    Submitted 28 May, 2025; originally announced May 2025.

  14. arXiv:2505.21862  [pdf, ps, other]

    cs.CV

    Towards Scalable Language-Image Pre-training for 3D Medical Imaging

    Authors: Chenhui Zhao, Yiwei Lyu, Asadur Chowdury, Edward Harake, Akhil Kondepudi, Akshay Rao, Xinhai Hou, Honglak Lee, Todd Hollon

    Abstract: Language-image pre-training has demonstrated strong performance in 2D medical imaging, but its success in 3D modalities such as CT and MRI remains limited due to the high computational demands of volumetric data, which pose a significant barrier to training on large-scale, uncurated clinical studies. In this study, we introduce Hierarchical attention for Language-Image Pre-training (HLIP), a scala…

    Submitted 27 May, 2025; originally announced May 2025.

  15. arXiv:2505.18657  [pdf, ps, other]

    cs.AI

    MLLMs are Deeply Affected by Modality Bias

    Authors: Xu Zheng, Chenfei Liao, Yuqian Fu, Kaiyu Lei, Yuanhuiyi Lyu, Lutao Jiang, Bin Ren, Jialei Chen, Jiawen Wang, Chengxin Li, Linfeng Zhang, Danda Pani Paudel, Xuanjing Huang, Yu-Gang Jiang, Nicu Sebe, Dacheng Tao, Luc Van Gool, Xuming Hu

    Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have shown promising results in integrating diverse modalities such as texts and images. MLLMs are heavily influenced by modality bias, often relying on language while under-utilizing other modalities like visual inputs. This position paper argues that MLLMs are deeply affected by modality bias. Firstly, we diagnose the current state of m…

    Submitted 24 May, 2025; originally announced May 2025.

  16. arXiv:2505.18341  [pdf, ps, other]

    cs.RO cs.AI

    CrashAgent: Crash Scenario Generation via Multi-modal Reasoning

    Authors: Miao Li, Wenhao Ding, Haohong Lin, Yiqi Lyu, Yihang Yao, Yuyou Zhang, Ding Zhao

    Abstract: Training and evaluating autonomous driving algorithms requires a diverse range of scenarios. However, most available datasets predominantly consist of normal driving behaviors demonstrated by human drivers, resulting in a limited number of safety-critical cases. This imbalance, often referred to as a long-tail distribution, restricts the ability of driving algorithms to learn from crucial scenario…

    Submitted 23 May, 2025; originally announced May 2025.

  17. arXiv:2505.16936  [pdf, ps, other]

    cs.LG

    SPAR: Self-supervised Placement-Aware Representation Learning for Multi-Node IoT Systems

    Authors: Yizhuo Chen, Tianchen Wang, You Lyu, Yanlan Hu, Jinyang Li, Tomoyoshi Kimura, Hongjue Zhao, Yigong Hu, Denizhan Kara, Tarek Abdelzaher

    Abstract: This work develops the underpinnings of self-supervised placement-aware representation learning given spatially-distributed (multi-view and multimodal) sensor observations, motivated by the need to represent external environmental state in multi-sensor IoT systems in a manner that correctly distills spatial phenomena from the distributed multi-vantage observations. The objective of sensing in IoT…

    Submitted 23 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

  18. arXiv:2505.11920  [pdf, ps, other]

    cs.RO

    H2R: A Human-to-Robot Data Augmentation for Robot Pre-training from Videos

    Authors: Guangrun Li, Yaoxu Lyu, Zhuoyang Liu, Chengkai Hou, Jieyu Zhang, Shanghang Zhang

    Abstract: Large-scale pre-training using videos has proven effective for robot learning. However, the models pre-trained on such data can be suboptimal for robot learning due to the significant visual gap between human hands and those of different robots. To remedy this, we propose H2R, a simple data augmentation technique that detects human hand keypoints, synthesizes robot motions in simulation, and compo…

    Submitted 26 May, 2025; v1 submitted 17 May, 2025; originally announced May 2025.

  19. arXiv:2505.11907  [pdf, ps, other]

    cs.CV

    Are Multimodal Large Language Models Ready for Omnidirectional Spatial Reasoning?

    Authors: Zihao Dongfang, Xu Zheng, Ziqiao Weng, Yuanhuiyi Lyu, Danda Pani Paudel, Luc Van Gool, Kailun Yang, Xuming Hu

    Abstract: The 180x360 omnidirectional field of view captured by 360-degree cameras enables their use in a wide range of applications such as embodied AI and virtual reality. Although recent advances in multimodal large language models (MLLMs) have shown promise in visual-spatial reasoning, most studies focus on standard pinhole-view images, leaving omnidirectional perception largely unexplored. In this pape…

    Submitted 17 May, 2025; originally announced May 2025.

  20. arXiv:2505.06635  [pdf, ps, other]

    cs.CV

    Reducing Unimodal Bias in Multi-Modal Semantic Segmentation with Multi-Scale Functional Entropy Regularization

    Authors: Xu Zheng, Yuanhuiyi Lyu, Lutao Jiang, Danda Pani Paudel, Luc Van Gool, Xuming Hu

    Abstract: Fusing and balancing multi-modal inputs from novel sensors for dense prediction tasks, particularly semantic segmentation, is critically important yet remains a significant challenge. One major limitation is the tendency of multi-modal frameworks to over-rely on easily learnable modalities, a phenomenon referred to as unimodal dominance or bias. This issue becomes especially problematic in real-wo…

    Submitted 10 May, 2025; originally announced May 2025.

  21. arXiv:2505.03673  [pdf, ps, other]

    cs.RO

    RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration

    Authors: Huajie Tan, Xiaoshuai Hao, Cheng Chi, Minglan Lin, Yaoxu Lyu, Mingyu Cao, Dong Liang, Zhuo Chen, Mengsi Lyu, Cheng Peng, Chenrui He, Yulong Ao, Yonghua Lin, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang

    Abstract: The dawn of embodied intelligence has ushered in an unprecedented imperative for resilient, cognition-enabled multi-agent collaboration across next-generation ecosystems, revolutionizing paradigms in autonomous manufacturing, adaptive service robotics, and cyber-physical production architectures. However, current robotic systems face significant limitations, such as limited cross-embodiment adapta…

    Submitted 5 June, 2025; v1 submitted 6 May, 2025; originally announced May 2025.

    Comments: 22 pages, 10 figures

  22. arXiv:2505.02961  [pdf, ps, other]

    cs.SE

    Can We Recycle Our Old Models? An Empirical Evaluation of Model Selection Mechanisms for AIOps Solutions

    Authors: Yingzhe Lyu, Hao Li, Heng Li, Ahmed E. Hassan

    Abstract: AIOps (Artificial Intelligence for IT Operations) solutions leverage the tremendous amount of data produced during the operation of large-scale systems and machine learning models to assist software practitioners in their system operations. Existing AIOps solutions usually maintain AIOps models against concept drift through periodical retraining, despite leaving a pile of discarded historical mode…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: arXiv admin note: text overlap with arXiv:2311.03213

  23. arXiv:2505.02508  [pdf, ps, other]

    stat.ML cs.LG math.ST

    Resolving Memorization in Empirical Diffusion Model for Manifold Data in High-Dimensional Spaces

    Authors: Yang Lyu, Yuchun Qian, Tan Minh Nguyen, Xin T. Tong

    Abstract: Diffusion models is a popular computational tool to generate new data samples. It utilizes a forward diffusion process that add noise to the data distribution and then use a reverse process to remove noises to produce samples from the data distribution. However, when the empirical data distribution consists of $n$ data point, using the empirical diffusion model will necessarily produce one of the…

    Submitted 6 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

  24. arXiv:2504.17670  [pdf, other]

    cs.CV

    DiMeR: Disentangled Mesh Reconstruction Model

    Authors: Lutao Jiang, Jiantao Lin, Kanghao Chen, Wenhang Ge, Xin Yang, Yifan Jiang, Yuanhuiyi Lyu, Xu Zheng, Yinchuan Li, Yingcong Chen

    Abstract: We propose DiMeR, a novel geometry-texture disentangled feed-forward model with 3D supervision for sparse-view mesh reconstruction. Existing methods confront two persistent obstacles: (i) textures can conceal geometric errors, i.e., visually plausible images can be rendered even with wrong geometry, producing multiple ambiguous optimization objectives in geometry-texture mixed solution space for s…

    Submitted 26 May, 2025; v1 submitted 24 April, 2025; originally announced April 2025.

    Comments: Project Page: https://lutao2021.github.io/DiMeR_page/

  25. arXiv:2504.16947  [pdf, other]

    cs.SI cs.AI

    SCRAG: Social Computing-Based Retrieval Augmented Generation for Community Response Forecasting in Social Media Environments

    Authors: Dachun Sun, You Lyu, Jinning Li, Yizhuo Chen, Tianshi Wang, Tomoyoshi Kimura, Tarek Abdelzaher

    Abstract: This paper introduces SCRAG, a prediction framework inspired by social computing, designed to forecast community responses to real or hypothetical social media posts. SCRAG can be used by public relations specialists (e.g., to craft messaging in ways that avoid unintended misinterpretations) or public figures and influencers (e.g., to anticipate social responses), among other applications related…

    Submitted 18 April, 2025; originally announced April 2025.

  26. arXiv:2504.14921   

    cs.CV cs.AI

    Fast Adversarial Training with Weak-to-Strong Spatial-Temporal Consistency in the Frequency Domain on Videos

    Authors: Songping Wang, Hanqing Liu, Yueming Lyu, Xiantao Hu, Ziwen He, Wei Wang, Caifeng Shan, Liang Wang

    Abstract: Adversarial Training (AT) has been shown to significantly enhance adversarial robustness via a min-max optimization approach. However, its effectiveness in video recognition tasks is hampered by two main challenges. First, fast adversarial training for video models remains largely unexplored, which severely impedes its practical applications. Specifically, most video adversarial training methods a…

    Submitted 23 April, 2025; v1 submitted 21 April, 2025; originally announced April 2025.

    Comments: After the submission of the paper, we realized that the study still has room for expansion. In order to make the research findings more profound and comprehensive, we have decided to withdraw the paper so that we can conduct further research and expansion

  27. arXiv:2504.12129   

    cs.CV

    Anti-Aesthetics: Protecting Facial Privacy against Customized Text-to-Image Synthesis

    Authors: Songping Wang, Yueming Lyu, Shiqi Liu, Ning Li, Tong Tong, Hao Sun, Caifeng Shan

    Abstract: The rise of customized diffusion models has spurred a boom in personalized visual content creation, but also poses risks of malicious misuse, severely threatening personal privacy and copyright protection. Some studies show that the aesthetic properties of images are highly positively correlated with human perception of image quality. Inspired by this, we approach the problem from a novel and intr…

    Submitted 23 April, 2025; v1 submitted 16 April, 2025; originally announced April 2025.

    Comments: After the submission of the paper, we realized that the study still has room for expansion. In order to make the research findings more profound and comprehensive, we have decided to withdraw the paper so that we can conduct further research and expansion

  28. arXiv:2504.10871  [pdf, other]

    cs.CV

    DAAF:Degradation-Aware Adaptive Fusion Framework for Robust Infrared and Visible Images Fusion

    Authors: Tianpei Zhang, Jufeng Zhao, Yiming Zhu, Guangmang Cui, Yuxin Jing, Yuhan Lyu

    Abstract: Existing infrared and visible image fusion(IVIF) algorithms often prioritize high-quality images, neglecting image degradation such as low light and noise, which limits the practical potential. This paper propose Degradation-Aware Adaptive image Fusion (DAAF), which achieves unified modeling of adaptive degradation optimization and image fusion. Specifically, DAAF comprises an auxiliary Adaptive D…

    Submitted 15 April, 2025; originally announced April 2025.

  29. arXiv:2504.09612  [pdf, other]

    cs.HC

    A Systematic Literature Review of Infrastructure Studies in SIGCHI

    Authors: Yao Lyu, Jie Cai, John M. Carroll

    Abstract: Infrastructure is an indispensable part of human life. Over the past decades, the Human-Computer Interaction (HCI) community has paid increasing attention to human interactions with infrastructure. In this paper, we conducted a systematic literature review on infrastructure studies in SIGCHI, one of the most influential communities in HCI. We collected a total of 190 primary studies, covering work…

    Submitted 15 April, 2025; v1 submitted 13 April, 2025; originally announced April 2025.

    Comments: Accepted to CSCW'25

  30. arXiv:2504.07758  [pdf, other]

    cs.CV eess.IV

    PIDSR: Complementary Polarized Image Demosaicing and Super-Resolution

    Authors: Shuangfan Zhou, Chu Zhou, Youwei Lyu, Heng Guo, Zhanyu Ma, Boxin Shi, Imari Sato

    Abstract: Polarization cameras can capture multiple polarized images with different polarizer angles in a single shot, bringing convenience to polarization-based downstream tasks. However, their direct outputs are color-polarization filter array (CPFA) raw images, requiring demosaicing to reconstruct full-resolution, full-color polarized images; unfortunately, this necessary step introduces artifacts that m…

    Submitted 22 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

    Comments: CVPR 2025

  31. arXiv:2504.04334  [pdf, other]

    cs.SE

    Artificial Intelligence for Software Architecture: Literature Review and the Road Ahead

    Authors: Alessio Bucaioni, Martin Weyssow, Junda He, Yunbo Lyu, David Lo

    Abstract: This paper presents a forward-looking vision for artificial intelligence-driven software architecture that addresses longstanding challenges in design and evolution. Although artificial intelligence has achieved notable success in software engineering, its explicit application to software architecture remains under-explored. Traditional practices, heavily reliant on expert knowledge and complex tr…

    Submitted 5 April, 2025; originally announced April 2025.

  32. arXiv:2504.04141  [pdf, other]

    cs.CL

    Cognitive Debiasing Large Language Models for Decision-Making

    Authors: Yougang Lyu, Shijie Ren, Yue Feng, Zihan Wang, Zhumin Chen, Zhaochun Ren, Maarten de Rijke

    Abstract: Large language models (LLMs) have shown potential in supporting decision-making applications, particularly as personal assistants in the financial, healthcare, and legal domains. While prompt engineering strategies have enhanced the capabilities of LLMs in decision-making, cognitive biases inherent to LLMs present significant challenges. Cognitive biases are systematic patterns of deviation from n…

    Submitted 23 May, 2025; v1 submitted 5 April, 2025; originally announced April 2025.

  33. arXiv:2504.03071  [pdf, other]

    cs.CL cs.AI

    AD-GPT: Large Language Models in Alzheimer's Disease

    Authors: Ziyu Liu, Lintao Tang, Zeliang Sun, Zhengliang Liu, Yanjun Lyu, Wei Ruan, Yangshuang Xu, Liang Shan, Jiyoon Shin, Xiaohe Chen, Dajiang Zhu, Tianming Liu, Rongjie Liu, Chao Huang

    Abstract: Large language models (LLMs) have emerged as powerful tools for medical information retrieval, yet their accuracy and depth remain limited in specialized domains such as Alzheimer's disease (AD), a growing global health challenge. To address this gap, we introduce AD-GPT, a domain-specific generative pre-trained transformer designed to enhance the retrieval and analysis of AD-related genetic and n…

    Submitted 3 April, 2025; originally announced April 2025.

  34. arXiv:2504.02855  [pdf, other]

    eess.SY cs.AI

    Exploration of Multi-Element Collaborative Research and Application for Modern Power System Based on Generative Large Models

    Authors: Lu Cheng, Qixiu Zhang, Beibei Xu, Zhiwei Huang, Cirun Zhang, Yanan Lyu, Fan Zhang

    Abstract: The transition to intelligent, low-carbon power systems necessitates advanced optimization strategies for managing renewable energy integration, energy storage, and carbon emissions. Generative Large Models (GLMs) provide a data-driven approach to enhancing forecasting, scheduling, and market operations by processing multi-source data and capturing complex system dynamics. This paper explores the…

    Submitted 26 March, 2025; originally announced April 2025.

  35. arXiv:2503.23153  [pdf, other]

    cs.HC cs.AI

    Conversational Agents for Older Adults' Health: A Systematic Literature Review

    Authors: Jiaxin An, Siqi Yi, Yao Lyu, Houjiang Liu, Yan Zhang

    Abstract: There has been vast literature that studies Conversational Agents (CAs) in facilitating older adults' health. The vast and diverse studies warrants a comprehensive review that concludes the main findings and proposes research directions for future studies, while few literature review did it from human-computer interaction (HCI) perspective. In this study, we present a survey of existing studies on…

    Submitted 29 March, 2025; originally announced March 2025.

    Comments: 31 pages, 4 figures

  36. arXiv:2503.19823  [pdf, other]

    q-bio.NC cs.AI cs.CV

    GyralNet Subnetwork Partitioning via Differentiable Spectral Modularity Optimization

    Authors: Yan Zhuang, Minheng Chen, Chao Cao, Tong Chen, Jing Zhang, Xiaowei Yu, Yanjun Lyu, Lu Zhang, Tianming Liu, Dajiang Zhu

    Abstract: Understanding the structural and functional organization of the human brain requires a detailed examination of cortical folding patterns, among which the three-hinge gyrus (3HG) has been identified as a key structural landmark. GyralNet, a network representation of cortical folding, models 3HGs as nodes and gyral crests as edges, highlighting their role as critical hubs in cortico-cortical connect…

    Submitted 31 March, 2025; v1 submitted 25 March, 2025; originally announced March 2025.

    Comments: 10 pages, 3 figures

  37. arXiv:2503.18016  [pdf, other]

    cs.CV

    Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook

    Authors: Xu Zheng, Ziqiao Weng, Yuanhuiyi Lyu, Lutao Jiang, Haiwei Xue, Bin Ren, Danda Paudel, Nicu Sebe, Luc Van Gool, Xuming Hu

    Abstract: Retrieval-augmented generation (RAG) has emerged as a pivotal technique in artificial intelligence (AI), particularly in enhancing the capabilities of large language models (LLMs) by enabling access to external, reliable, and up-to-date knowledge sources. In the context of AI-Generated Content (AIGC), RAG has proven invaluable by augmenting model outputs with supplementary, relevant information, t…

    Submitted 23 March, 2025; originally announced March 2025.

    Comments: 19 pages, 10 figures

  38. arXiv:2503.14655  [pdf, other]

    q-bio.NC cs.AI cs.CV eess.IV

    Core-Periphery Principle Guided State Space Model for Functional Connectome Classification

    Authors: Minheng Chen, Xiaowei Yu, Jing Zhang, Tong Chen, Chao Cao, Yan Zhuang, Yanjun Lyu, Lu Zhang, Tianming Liu, Dajiang Zhu

    Abstract: Understanding the organization of human brain networks has become a central focus in neuroscience, particularly in the study of functional connectivity, which plays a crucial role in diagnosing neurological disorders. Advances in functional magnetic resonance imaging and machine learning techniques have significantly improved brain network analysis. However, traditional machine learning approaches…

    Submitted 18 March, 2025; originally announced March 2025.

  39. arXiv:2503.14084  [pdf, other]

    eess.IV cs.LG

    Semantic Communication in Dynamic Channel Scenarios: Collaborative Optimization of Dual-Pipeline Joint Source-Channel Coding and Personalized Federated Learning

    Authors: Xingrun Yan, Shiyuan Zuo, Yifeng Lyu, Rongfei Fan, Han Hu

    Abstract: Semantic communication is designed to tackle issues like bandwidth constraints and high latency in communication systems. However, in complex network topologies with multiple users, the enormous combinations of client data and channel state information (CSI) pose significant challenges for existing semantic communication architectures. To improve the generalization ability of semantic communicatio…

    Submitted 18 March, 2025; originally announced March 2025.

  40. arXiv:2503.09621  [pdf, other]

    eess.SY cs.RO

    Adaptive Deadlock Avoidance for Decentralized Multi-agent Systems via CBF-inspired Risk Measurement

    Authors: Yanze Zhang, Yiwei Lyu, Siwon Jo, Yupeng Yang, Wenhao Luo

    Abstract: Decentralized safe control plays an important role in multi-agent systems given the scalability and robustness without reliance on a central authority. However, without an explicit global coordinator, the decentralized control methods are often prone to deadlock -- a state where the system reaches equilibrium, causing the robots to stall. In this paper, we propose a generalized decentralized frame…

    Submitted 8 March, 2025; originally announced March 2025.

    Comments: 7 pages, accepted to ICRA 2025

  41. arXiv:2503.07640  [pdf]

    cs.LG cs.AI q-bio.NC

    BrainNet-MoE: Brain-Inspired Mixture-of-Experts Learning for Neurological Disease Identification

    Authors: Jing Zhang, Xiaowei Yu, Tong Chen, Chao Cao, Mingheng Chen, Yan Zhuang, Yanjun Lyu, Lu Zhang, Li Su, Tianming Liu, Dajiang Zhu

    Abstract: Lewy body dementia (LBD) is the second most common neurodegenerative dementia after Alzheimer's disease (AD). Early differentiation between AD and LBD is crucial because they require different treatment approaches, but this is challenging due to significant clinical overlap, heterogeneity, complex pathogenesis, and the rarity of LBD. While recent advances in artificial intelligence (AI) demons…

    Submitted 5 March, 2025; originally announced March 2025.

  42. arXiv:2503.07098  [pdf, other]

    cs.CV

    OmniSAM: Omnidirectional Segment Anything Model for UDA in Panoramic Semantic Segmentation

    Authors: Ding Zhong, Xu Zheng, Chenfei Liao, Yuanhuiyi Lyu, Jialei Chen, Shengyang Wu, Linfeng Zhang, Xuming Hu

    Abstract: Segment Anything Model 2 (SAM2) has emerged as a strong base model in various pinhole imaging segmentation tasks. However, when applying it to the $360^\circ$ domain, the significant field-of-view (FoV) gap between pinhole ($70^\circ \times 70^\circ$) and panoramic images ($180^\circ \times 360^\circ$) poses unique challenges. Two major concerns for this application include 1) inevitable distortion a…

    Submitted 10 March, 2025; originally announced March 2025.

  43. arXiv:2503.06923  [pdf, other]

    cs.CV cs.AI

    From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers

    Authors: Jiacheng Liu, Chang Zou, Yuanhuiyi Lyu, Junjie Chen, Linfeng Zhang

    Abstract: Diffusion Transformers (DiT) have revolutionized high-fidelity image and video synthesis, yet their computational demands remain prohibitive for real-time applications. To solve this problem, feature caching has been proposed to accelerate diffusion models by caching the features in the previous timesteps and then reusing them in the following timesteps. However, at timesteps with significant inte…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: 13 pages, 14 figures

  44. arXiv:2503.06700  [pdf, other]

    cs.CV

    MemorySAM: Memorize Modalities and Semantics with Segment Anything Model 2 for Multi-modal Semantic Segmentation

    Authors: Chenfei Liao, Xu Zheng, Yuanhuiyi Lyu, Haiwei Xue, Yihong Cao, Jiawen Wang, Kailun Yang, Xuming Hu

    Abstract: Research has focused on Multi-Modal Semantic Segmentation (MMSS), where pixel-wise predictions are derived from multiple visual modalities captured by diverse sensors. Recently, the large vision model, Segment Anything Model 2 (SAM2), has shown strong zero-shot segmentation performance on both images and videos. When extending SAM2 to MMSS, two issues arise: 1. How can SAM2 be adapted to multi-mod…

    Submitted 20 March, 2025; v1 submitted 9 March, 2025; originally announced March 2025.

  45. arXiv:2503.06276   

    cs.CV

    Exploring Adversarial Transferability between Kolmogorov-arnold Networks

    Authors: Songping Wang, Xinquan Yue, Yueming Lyu, Caifeng Shan

    Abstract: Kolmogorov-Arnold Networks (KANs) have emerged as a transformative model paradigm, significantly impacting various fields. However, their adversarial robustness remains underexplored, especially across different KAN architectures. To explore this critical safety issue, we conduct an analysis and find that due to overfitting to the specific basis functions of KANs, they possess poor adversaria…

    Submitted 23 April, 2025; v1 submitted 8 March, 2025; originally announced March 2025.

    Comments: After submitting the paper, we realized that the study still has room for expansion. To make the research findings more thorough and comprehensive, we have decided to withdraw the paper so that we can conduct further research.

  46. arXiv:2503.05474  [pdf, other]

    cs.LG cs.AI

    Personalized Federated Learning via Learning Dynamic Graphs

    Authors: Ziran Zhou, Guanyu Gao, Xiaohu Wu, Yan Lyu

    Abstract: Personalized Federated Learning (PFL) aims to train a personalized model for each client that is tailored to its local data distribution, since standard federated learning fails to perform well on individual clients due to variations in their local data distributions. Most existing PFL methods focus on personalizing the aggregated global model for each client, neglecting the fundamental aspect of federated learning: the r…

    Submitted 7 March, 2025; originally announced March 2025.

  47. Eggly: Designing Mobile Augmented Reality Neurofeedback Training Games for Children with Autism Spectrum Disorder

    Authors: Yue Lyu, Pengcheng An, Yage Xiao, Zibo Selena Zhang, Huan Zhang, Keiko Katsuragawa, Jian Zhao

    Abstract: Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that affects how children communicate and relate to other people and the world around them. Emerging studies have shown that neurofeedback training (NFT) games are an effective and playful intervention to enhance social and attentional capabilities for autistic children. However, NFT is primarily available in a clinical setting that i…

    Submitted 6 March, 2025; originally announced March 2025.

    Comments: 23 pages, 9 figures, Presented at Ubicomp'23

    ACM Class: J.4

    Journal ref: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2023

  48. arXiv:2502.20658  [pdf]

    cs.HC cs.CY cs.SI

    Displaying Fear, Sadness, and Joy in Public: Schizophrenia Vloggers' Video Narration of Emotion and Online Care-Seeking

    Authors: Jiaying "Lizzy" Liu, Yunlong Wang, Allen Jue, Yao Lyu, Yiheng Su, Shuo Niu, Yan Zhang

    Abstract: Individuals with severe mental illnesses (SMI), particularly schizophrenia, experience complex and intense emotions frequently. They increasingly turn to vlogging as an authentic medium for emotional disclosure and online support-seeking. While previous research has primarily focused on text-based disclosure, little is known about how people construct narratives around emotions and emotional exper…

    Submitted 27 February, 2025; originally announced February 2025.

  49. arXiv:2502.19298  [pdf, other]

    cs.IR

    Agent-centric Information Access

    Authors: Evangelos Kanoulas, Panagiotis Eustratiadis, Yongkang Li, Yougang Lyu, Vaishali Pal, Gabrielle Poerwawinata, Jingfen Qiao, Zihan Wang

    Abstract: As large language models (LLMs) become more specialized, we envision a future where millions of expert LLMs exist, each trained on proprietary data and excelling in specific domains. In such a system, answering a query requires selecting a small subset of relevant models, querying them efficiently, and synthesizing their responses. This paper introduces a framework for agent-centric information ac…

    Submitted 26 February, 2025; originally announced February 2025.

  50. arXiv:2502.18702  [pdf, other]

    cs.IR cs.CL

    A Cooperative Multi-Agent Framework for Zero-Shot Named Entity Recognition

    Authors: Zihan Wang, Ziqi Zhao, Yougang Lyu, Zhumin Chen, Maarten de Rijke, Zhaochun Ren

    Abstract: Zero-shot named entity recognition (NER) aims to develop entity recognition systems from unannotated text corpora. This task presents substantial challenges due to minimal human intervention. Recent work has adapted large language models (LLMs) for zero-shot NER by crafting specialized prompt templates. It advances model self-learning abilities by incorporating self-annotated demonstrations. Howev…

    Submitted 25 February, 2025; originally announced February 2025.

    Comments: Accepted at WWW 2025