Showing 1–50 of 7,305 results for author: Chen, Y

Searching in archive cs.

  1. arXiv:2505.10348  [pdf, ps, other]

    cs.HC cs.SD eess.AS

    ListenNet: A Lightweight Spatio-Temporal Enhancement Nested Network for Auditory Attention Detection

    Authors: Cunhang Fan, Xiaoke Yang, Hongyu Zhang, Ying Chen, Lu Li, Jian Zhou, Zhao Lv

    Abstract: Auditory attention detection (AAD) aims to identify the direction of the attended speaker in multi-speaker environments from brain signals, such as Electroencephalography (EEG) signals. However, existing EEG-based AAD methods overlook the spatio-temporal dependencies of EEG signals, limiting their decoding and generalization abilities. To address these issues, this paper proposes a Lightweight Spa…

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.10257  [pdf, ps, other]

    cs.CV

    Sage Deer: A Super-Aligned Driving Generalist Is Your Copilot

    Authors: Hao Lu, Jiaqi Tang, Jiyao Wang, Yunfan LU, Xu Cao, Qingyong Hu, Yin Wang, Yuting Zhang, Tianxin Xie, Yunpeng Zhang, Yong Chen, Jiayu Gao, Bin Huang, Dengbo He, Shuiguang Deng, Hao Chen, Ying-Cong Chen

    Abstract: The intelligent driving cockpit, an important part of intelligent driving, needs to match different users' comfort, interaction, and safety needs. This paper aims to build a Super-Aligned and GEneralist DRiving agent, SAGE DeeR. Sage Deer achieves three highlights: (1) Super alignment: It achieves different reactions according to different people's preferences and biases. (2) Generalist: It can u…

    Submitted 15 May, 2025; originally announced May 2025.

  3. arXiv:2505.09861  [pdf, other]

    cs.LG cs.AI cs.IR stat.ME

    LiDDA: Data Driven Attribution at LinkedIn

    Authors: John Bencina, Erkut Aykutlug, Yue Chen, Zerui Zhang, Stephanie Sorenson, Shao Tang, Changshuai Wei

    Abstract: Data Driven Attribution, which assigns conversion credits to marketing interactions based on causal patterns learned from data, is the foundation of modern marketing intelligence and vital to any marketing business or advertising platform. In this paper, we introduce a unified transformer-based attribution approach that can handle member-level data, aggregate-level data, and integration of exte…

    Submitted 14 May, 2025; originally announced May 2025.

  4. arXiv:2505.09653  [pdf, other]

    quant-ph cs.AI cs.ET cs.LG cs.NE

    Differentiable Quantum Architecture Search in Quantum-Enhanced Neural Network Parameter Generation

    Authors: Samuel Yen-Chi Chen, Chen-Yu Liu, Kuan-Cheng Chen, Wei-Jia Huang, Yen-Jui Chang, Wei-Hao Huang

    Abstract: The rapid advancements in quantum computing (QC) and machine learning (ML) have led to the emergence of quantum machine learning (QML), which integrates the strengths of both fields. Among QML approaches, variational quantum circuits (VQCs), also known as quantum neural networks (QNNs), have shown promise both empirically and theoretically. However, their broader adoption is hindered by reliance o…

    Submitted 13 May, 2025; originally announced May 2025.

  5. arXiv:2505.09590  [pdf, ps, other]

    cs.IR

    Distance-aware Self-adaptive Graph Convolution for Fine-grained Hierarchical Recommendation

    Authors: Tao Huang, Yihong Chen, Wei Fan, Wei Zhou, Junhao Wen

    Abstract: Graph Convolutional Networks (GCNs) are widely used to improve recommendation accuracy and performance by effectively learning the representations of user and item nodes. However, two major challenges remain: (1) the lack of further optimization in the graph representation structure and (2) insufficient attention given to the varying contributions of different convolutional layers. This paper propo…

    Submitted 14 May, 2025; originally announced May 2025.

  6. arXiv:2505.09558  [pdf, other]

    eess.AS cs.AI cs.LG cs.MM cs.SD

    WavReward: Spoken Dialogue Models With Generalist Reward Evaluators

    Authors: Shengpeng Ji, Tianle Liang, Yangzhuo Li, Jialong Zuo, Minghui Fang, Jinzheng He, Yifu Chen, Zhengqing Liu, Ziyue Jiang, Xize Cheng, Siqi Zheng, Jin Xu, Junyang Lin, Zhou Zhao

    Abstract: End-to-end spoken dialogue models such as GPT-4o-audio have recently garnered significant attention in the speech domain. However, the evaluation of spoken dialogue models' conversational performance has largely been overlooked. This is primarily because intelligent chatbots convey a wealth of non-textual information that cannot be easily measured using text-based language models like ChatGPT.…

    Submitted 14 May, 2025; originally announced May 2025.

  7. arXiv:2505.09395  [pdf, other]

    quant-ph cs.AI cs.LG

    Quantum-Enhanced Parameter-Efficient Learning for Typhoon Trajectory Forecasting

    Authors: Chen-Yu Liu, Kuan-Cheng Chen, Yi-Chien Chen, Samuel Yen-Chi Chen, Wei-Hao Huang, Wei-Jia Huang, Yen-Jui Chang

    Abstract: Typhoon trajectory forecasting is essential for disaster preparedness but remains computationally demanding due to the complexity of atmospheric dynamics and the resource requirements of deep learning models. Quantum-Train (QT), a hybrid quantum-classical framework that leverages quantum neural networks (QNNs) to generate trainable parameters exclusively during training, eliminating the need for q…

    Submitted 14 May, 2025; originally announced May 2025.

  8. arXiv:2505.09115  [pdf, ps, other]

    cs.HC cs.AI

    PreCare: Designing AI Assistants for Advance Care Planning (ACP) to Enhance Personal Value Exploration, Patient Knowledge, and Decisional Confidence

    Authors: Yu Lun Hsu, Yun-Rung Chou, Chiao-Ju Chang, Yu-Cheng Chang, Zer-Wei Lee, Rokas Gipiškis, Rachel Li, Chih-Yuan Shih, Jen-Kuei Peng, Hsien-Liang Huang, Jaw-Shiun Tsai, Mike Y. Chen

    Abstract: Advance Care Planning (ACP) allows individuals to specify their preferred end-of-life life-sustaining treatments before they become incapacitated by injury or terminal illness (e.g., coma, cancer, dementia). While online ACP offers high accessibility, it lacks key benefits of clinical consultations, including personalized value exploration and immediate clarification of decision consequences. To brid…

    Submitted 13 May, 2025; originally announced May 2025.

  9. arXiv:2505.09109  [pdf, ps, other]

    cs.RO cs.CV

    FoldNet: Learning Generalizable Closed-Loop Policy for Garment Folding via Keypoint-Driven Asset and Demonstration Synthesis

    Authors: Yuxing Chen, Bowen Xiao, He Wang

    Abstract: Due to the deformability of garments, generating a large amount of high-quality data for robotic garment manipulation tasks is highly challenging. In this paper, we present a synthetic garment dataset that can be used for robotic garment folding. We begin by constructing geometric garment templates based on keypoints and applying generative models to generate realistic texture patterns. Leveraging…

    Submitted 13 May, 2025; originally announced May 2025.

  10. arXiv:2505.09074  [pdf, other]

    cs.RO

    Deployable and Generalizable Motion Prediction: Taxonomy, Open Challenges and Future Directions

    Authors: Letian Wang, Marc-Antoine Lavoie, Sandro Papais, Barza Nisar, Yuxiao Chen, Wenhao Ding, Boris Ivanovic, Hao Shao, Abulikemu Abuduweili, Evan Cook, Yang Zhou, Peter Karkus, Jiachen Li, Changliu Liu, Marco Pavone, Steven Waslander

    Abstract: Motion prediction, the anticipation of future agent states or scene evolution, is rooted in human cognition, bridging perception and decision-making. It enables intelligent systems, such as robots and self-driving cars, to act safely in dynamic, human-involved environments, and informs broader time-series reasoning challenges. With advances in methods, representations, and datasets, the field has…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: Initial draft, 162 pages, 40 figures, 13 tables

  11. arXiv:2505.08971  [pdf, ps, other]

    cs.CV cs.CL cs.LG

    Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training

    Authors: Yangyi Chen, Hao Peng, Tong Zhang, Heng Ji

    Abstract: In standard large vision-language models (LVLMs) pre-training, the model typically maximizes the joint probability of the caption conditioned on the image via next-token prediction (NTP); however, since only a small subset of caption tokens directly relates to the visual content, this naive NTP unintentionally fits the model to noise and increases the risk of hallucination. We present PRIOR, a sim…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: The code will be available at https://github.com/Yangyi-Chen/PRIOR

  12. arXiv:2505.08782  [pdf, ps, other]

    cs.LG cs.CE

    Addressing the Current Challenges of Quantum Machine Learning through Multi-Chip Ensembles

    Authors: Junghoon Justin Park, Jiook Cha, Samuel Yen-Chi Chen, Huan-Hsin Tseng, Shinjae Yoo

    Abstract: Quantum Machine Learning (QML) holds significant promise for solving computational challenges across diverse domains. However, its practical deployment is constrained by the limitations of noisy intermediate-scale quantum (NISQ) devices, including noise, limited scalability, and trainability issues in variational quantum circuits (VQCs). We introduce the multi-chip ensemble VQC framework, which pa…

    Submitted 13 May, 2025; originally announced May 2025.

  13. arXiv:2505.08712  [pdf, ps, other]

    cs.RO

    NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance

    Authors: Wenzhe Cai, Jiaqi Peng, Yuqiang Yang, Yujian Zhang, Meng Wei, Hanqing Wang, Yilun Chen, Tai Wang, Jiangmiao Pang

    Abstract: Learning navigation in dynamic open-world environments is an important yet challenging skill for robots. Most previous methods rely on precise localization and mapping or learn from expensive real-world demonstrations. In this paper, we propose the Navigation Diffusion Policy (NavDP), an end-to-end framework that is trained solely in simulation and can transfer zero-shot to different embodiments in divers…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 14 pages, 6 figures

  14. arXiv:2505.08604  [pdf, ps, other]

    cs.CV

    Unsupervised Out-of-Distribution Detection in Medical Imaging Using Multi-Exit Class Activation Maps and Feature Masking

    Authors: Yu-Jen Chen, Xueyang Li, Yiyu Shi, Tsung-Yi Ho

    Abstract: Out-of-distribution (OOD) detection is essential for ensuring the reliability of deep learning models in medical imaging applications. This work is motivated by the observation that class activation maps (CAMs) for in-distribution (ID) data typically emphasize regions that are highly relevant to the model's predictions, whereas OOD data often lacks such focused activations. By masking input images…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 10 pages, 2 figures

  15. arXiv:2505.08548  [pdf, other]

    cs.RO cs.AI cs.LG

    From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation

    Authors: Yifu Yuan, Haiqin Cui, Yibin Chen, Zibin Dong, Fei Ni, Longxin Kou, Jinyi Liu, Pengyi Li, Yan Zheng, Jianye Hao

    Abstract: Achieving generalization in robotic manipulation remains a critical challenge, particularly for unseen scenarios and novel tasks. Current Vision-Language-Action (VLA) models, while building on top of general Vision-Language Models (VLMs), still fall short of achieving robust zero-shot performance due to the scarcity and heterogeneity prevalent in embodied datasets. To address these limitations, we…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: Early version

  16. arXiv:2505.08525  [pdf]

    cs.CV

    Dynamic Snake Upsampling Operater and Boundary-Skeleton Weighted Loss for Tubular Structure Segmentation

    Authors: Yiqi Chen, Ganghai Huang, Sheng Zhang, Jianglin Dai

    Abstract: Accurate segmentation of tubular topological structures (e.g., fissures and vasculature) is critical in various fields to guarantee dependable downstream quantitative analysis and modeling. However, in dense prediction tasks such as semantic segmentation and super-resolution, conventional upsampling operators cannot accommodate the slenderness of tubular structures and the curvature of morphology.…

    Submitted 13 May, 2025; originally announced May 2025.

  17. arXiv:2505.08367  [pdf, ps, other]

    cs.RO

    MA-ROESL: Motion-aware Rapid Reward Optimization for Efficient Robot Skill Learning from Single Videos

    Authors: Xianghui Wang, Xinming Zhang, Yanjun Chen, Xiaoyu Shen, Wei Zhang

    Abstract: Vision-language models (VLMs) have demonstrated excellent high-level planning capabilities, enabling locomotion skill learning from video demonstrations without the need for meticulous human-level reward design. However, the improper frame sampling method and low training efficiency of current methods remain a critical bottleneck, resulting in substantial computational overhead and time costs. To…

    Submitted 13 May, 2025; originally announced May 2025.

  18. arXiv:2505.08266  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Open the Eyes of MPNN: Vision Enhances MPNN in Link Prediction

    Authors: Yanbin Wei, Xuehao Wang, Zhan Zhuang, Yang Chen, Shuhao Chen, Yulong Zhang, Yu Zhang, James Kwok

    Abstract: Message-passing graph neural networks (MPNNs) and structural features (SFs) are cornerstones for the link prediction task. However, as a common and intuitive mode of understanding, the potential of visual perception has been overlooked in the MPNN community. For the first time, we equip MPNNs with vision structural awareness by proposing an effective framework called Graph Vision Network (GVN), al…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: ICML 2025

  19. arXiv:2505.08199  [pdf, ps, other]

    cs.LG

    A Multi-scale Representation Learning Framework for Long-Term Time Series Forecasting

    Authors: Boshi Gao, Qingjian Ni, Fanbo Ju, Yu Chen, Ziqi Zhao

    Abstract: Long-term time series forecasting (LTSF) offers broad utility in practical settings like energy consumption and weather prediction. Accurately predicting long-term changes, however, is demanding due to the intricate temporal patterns and inherent multi-scale variations within time series. This work confronts key issues in LTSF, including the suboptimal use of multi-granularity information, the neg…

    Submitted 12 May, 2025; originally announced May 2025.

  20. arXiv:2505.07734  [pdf, ps, other]

    cs.CV

    LAMM-ViT: AI Face Detection via Layer-Aware Modulation of Region-Guided Attention

    Authors: Jiangling Zhang, Weijie Zhu, Jirui Huang, Yaxiong Chen

    Abstract: Detecting AI-synthetic faces presents a critical challenge: it is hard to capture consistent structural relationships between facial regions across diverse generation techniques. Current methods, which focus on specific artifacts rather than fundamental inconsistencies, often fail when confronted with novel generative models. To address this limitation, we introduce Layer-aware Mask Modulation Vis…

    Submitted 12 May, 2025; originally announced May 2025.

  21. arXiv:2505.07692  [pdf, other]

    cs.DB

    ABase: the Multi-Tenant NoSQL Serverless Database for Diverse and Dynamic Workloads in Large-scale Cloud Environments

    Authors: Rong Kang, Yanbin Chen, Ye Liu, Fuxin Jiang, Qingshuo Li, Miao Ma, Jian Liu, Guangliang Zhao, Tieying Zhang, Jianjun Chen, Lei Zhang

    Abstract: Multi-tenant architectures enhance the elasticity and resource utilization of NoSQL databases by allowing multiple tenants to co-locate and share resources. However, in large-scale cloud environments, the diverse and dynamic nature of workloads poses significant challenges for multi-tenant NoSQL databases. Based on our practical observations, we have identified three crucial challenges: (1) the im…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: SIGMOD 2025 accepted

  22. arXiv:2505.07460  [pdf, other]

    cs.AI cs.CL

    A Survey on Collaborative Mechanisms Between Large and Small Language Models

    Authors: Yi Chen, JiaHao Zhao, HaoHao Han

    Abstract: Large Language Models (LLMs) deliver powerful AI capabilities but face deployment challenges due to high resource costs and latency, whereas Small Language Models (SLMs) offer efficiency and deployability at the cost of reduced performance. Collaboration between LLMs and SLMs emerges as a crucial paradigm to synergistically balance these trade-offs, enabling advanced AI applications, especially on…

    Submitted 12 May, 2025; originally announced May 2025.

  23. arXiv:2505.07245  [pdf]

    cs.LG cs.AI

    REMEDI: Relative Feature Enhanced Meta-Learning with Distillation for Imbalanced Prediction

    Authors: Fei Liu, Huanhuan Ren, Yu Guan, Xiuxu Wang, Wang Lv, Zhiqiang Hu, Yaxi Chen

    Abstract: Predicting future vehicle purchases among existing owners presents a critical challenge due to extreme class imbalance (<0.5% positive rate) and complex behavioral patterns. We propose REMEDI (Relative feature Enhanced Meta-learning with Distillation for Imbalanced prediction), a novel multi-stage framework addressing these challenges. REMEDI first trains diverse base models to capture complementa…

    Submitted 12 May, 2025; originally announced May 2025.

  24. arXiv:2505.07130  [pdf, ps, other]

    cs.IT

    Minimal Linear Codes Violating the Ashikhmin-Barg Condition from Arbitrary Projective Linear Codes

    Authors: Hao Chen, Yaqi Chen, Conghui Xie, Huimin Lao

    Abstract: In recent years, there have been many constructions of minimal linear codes violating the Ashikhmin-Barg condition from Boolean functions, linear codes with few nonzero weights or partial difference sets. In this paper, we first give a general method to transform a minimal code satisfying the Ashikhmin-Barg condition to a minimal code violating the Ashikhmin-Barg condition. Then we give a construc…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: 27 pages

  25. arXiv:2505.07062  [pdf, ps, other]

    cs.CV cs.AI

    Seed1.5-VL Technical Report

    Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng , et al. (172 additional authors not shown)

    Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed of a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluati…

    Submitted 11 May, 2025; originally announced May 2025.

  26. arXiv:2505.06987  [pdf, other]

    cs.CL cs.AI

    Convert Language Model into a Value-based Strategic Planner

    Authors: Xiaoyu Wang, Yue Zhao, Qingqing Gu, Zhonglin Jiang, Xiaokai Chen, Yong Chen, Luo Ji

    Abstract: Emotional support conversation (ESC) aims to alleviate the emotional distress of individuals through effective conversations. Although large language models (LLMs) have made remarkable progress on ESC, most of these studies might not define the diagram from the state model perspective, therefore providing a suboptimal solution for long-term satisfaction. To address such an issue, we leverage t…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: 11 pages, 5 figures, Accepted by ACL 2025 Industry Track

  27. arXiv:2505.06901  [pdf, ps, other]

    cs.AR

    Ecco: Improving Memory Bandwidth and Capacity for LLMs via Entropy-aware Cache Compression

    Authors: Feng Cheng, Cong Guo, Chiyue Wei, Junyao Zhang, Changchun Zhou, Edward Hanson, Jiaqi Zhang, Xiaoxiao Liu, Hai "Helen" Li, Yiran Chen

    Abstract: Large language models (LLMs) have demonstrated transformative capabilities across diverse artificial intelligence applications, yet their deployment is hindered by substantial memory and computational demands, especially in resource-constrained environments. Quantization techniques have emerged as a critical solution, reducing data precision to enhance memory and computational efficiency. However,…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: ISCA 2025

  28. arXiv:2505.06840  [pdf, other]

    cs.CV

    Visual Instruction Tuning with Chain of Region-of-Interest

    Authors: Yixin Chen, Shuai Zhang, Boran Han, Bernie Wang

    Abstract: High-resolution (HR) images are pivotal for enhancing the recognition and understanding capabilities of multimodal large language models (MLLMs). However, directly increasing image resolution can significantly escalate computational demands. In this study, we propose a method called Chain of Region-of-Interest (CoRoI) for Visual Instruction Tuning, aimed at alleviating the computational burden ass…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: N/A

  29. arXiv:2505.06475  [pdf, ps, other]

    cs.LG

    Probing In-Context Learning: Impact of Task Complexity and Model Architecture on Generalization and Efficiency

    Authors: Binwen Liu, Peiyu Xu, Quan Yuan, Yihong Chen

    Abstract: We investigate in-context learning (ICL) through a meticulous experimental framework that systematically varies task complexity and model architecture. Extending beyond the linear regression baseline, we introduce Gaussian kernel regression and nonlinear dynamical system tasks, which emphasize temporal and recursive reasoning. We evaluate four distinct models: a GPT2-style Transformer, a Transform…

    Submitted 9 May, 2025; originally announced May 2025.

  30. arXiv:2505.06302  [pdf, other]

    cs.LG cs.AI

    QiMeng-TensorOp: Automatically Generating High-Performance Tensor Operators with Hardware Primitives

    Authors: Xuzhi Zhang, Shaohui Peng, Qirui Zhou, Yuanbo Wen, Qi Guo, Ruizhi Chen, Xinguo Zhu, Weiqiang Xiong, Haixin Chen, Congying Ma, Ke Gao, Chen Zhao, Yanjun Wu, Yunji Chen, Ling Li

    Abstract: Computation-intensive tensor operators constitute over 90% of the computations in Large Language Models (LLMs) and Deep Neural Networks. Automatically and efficiently generating high-performance tensor operators with hardware primitives is crucial for diverse and ever-evolving hardware architectures like RISC-V, ARM, and GPUs, as manually optimized implementation takes at least months and lacks po…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 10 pages, 5 figures

    ACM Class: I.2.2

  31. arXiv:2505.06274  [pdf, ps, other]

    cs.LG cs.AI

    PARM: Multi-Objective Test-Time Alignment via Preference-Aware Autoregressive Reward Model

    Authors: Baijiong Lin, Weisen Jiang, Yuancheng Xu, Hao Chen, Ying-Cong Chen

    Abstract: Multi-objective test-time alignment aims to adapt large language models (LLMs) to diverse multi-dimensional user preferences during inference while keeping LLMs frozen. Recently, GenARM (Xu et al., 2025) first independently trains Autoregressive Reward Models (ARMs) for each preference dimension without awareness of each other, then combines their outputs based on user-specific preference vectors…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Accepted by ICML 2025

  32. arXiv:2505.06105  [pdf, other]

    eess.IV cs.CV

    S2MNet: Speckle-To-Mesh Net for Three-Dimensional Cardiac Morphology Reconstruction via Echocardiogram

    Authors: Xilin Gong, Yongkai Chen, Shushan Wu, Fang Wang, Ping Ma, Wenxuan Zhong

    Abstract: Echocardiogram is the most commonly used imaging modality in cardiac assessment due to its non-invasive nature, real-time capability, and cost-effectiveness. Despite its advantages, most clinical echocardiograms provide only two-dimensional views, limiting the ability to fully assess cardiac anatomy and function in three dimensions. While three-dimensional echocardiography exists, it often suffers…

    Submitted 9 May, 2025; originally announced May 2025.

  33. arXiv:2505.05913  [pdf, ps, other]

    cs.CV

    DFEN: Dual Feature Equalization Network for Medical Image Segmentation

    Authors: Jianjian Yin, Yi Chen, Chengyu Li, Zhichao Zheng, Yanhui Gu, Junsheng Zhou

    Abstract: Current methods for medical image segmentation primarily focus on extracting contextual feature information from the perspective of the whole image. While these methods have shown effective performance, none of them take into account the fact that pixels at the boundary and regions with a low number of class pixels capture more contextual feature information from other classes, leading to misclass…

    Submitted 9 May, 2025; originally announced May 2025.

  34. arXiv:2505.05869  [pdf]

    cs.LG cs.AI physics.comp-ph

    Generative Discovery of Partial Differential Equations by Learning from Math Handbooks

    Authors: Hao Xu, Yuntian Chen, Rui Cao, Tianning Tang, Mengge Du, Jian Li, Adrian H. Callaghan, Dongxiao Zhang

    Abstract: Data driven discovery of partial differential equations (PDEs) is a promising approach for uncovering the underlying laws governing complex systems. However, purely data driven techniques face the dilemma of balancing search space with optimization efficiency. This study introduces a knowledge guided approach that incorporates existing PDEs documented in a mathematical handbook to facilitate the d…

    Submitted 9 May, 2025; originally announced May 2025.

  35. arXiv:2505.05714  [pdf, other]

    cs.CL

    TopicVD: A Topic-Based Dataset of Video-Guided Multimodal Machine Translation for Documentaries

    Authors: Jinze Lv, Jian Chen, Zi Long, Xianghua Fu, Yin Chen

    Abstract: Most existing multimodal machine translation (MMT) datasets are predominantly composed of static images or short video clips, lacking extensive video data across diverse domains and topics. As a result, they fail to meet the demands of real-world MMT tasks, such as documentary translation. In this study, we developed TopicVD, a topic-based dataset for video-supported multimodal machine translation…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: NLDB 2025

  36. arXiv:2505.05446  [pdf, ps, other]

    cs.CV cs.CL

    Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding

    Authors: Han Xiao, Yina Xie, Guanxin Tan, Yinghao Chen, Rui Hu, Ke Wang, Aojun Zhou, Hao Li, Hao Shao, Xudong Lu, Peng Gao, Yafei Wen, Xiaoxin Chen, Shuai Ren, Hongsheng Li

    Abstract: Visual Document Understanding has become essential with the increase of text-rich visual content. This field poses significant challenges due to the need for effective integration of visual perception and textual comprehension, particularly across diverse document types with complex layouts. Moreover, existing fine-tuning datasets for this domain often fall short in providing the detailed contextu…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: CVPR2025

  37. arXiv:2505.05440  [pdf, other]

    cs.AI

    EcoAgent: An Efficient Edge-Cloud Collaborative Multi-Agent Framework for Mobile Automation

    Authors: Biao Yi, Xavier Hu, Yurun Chen, Shengyu Zhang, Hongxia Yang, Fan Wu, Fei Wu

    Abstract: Cloud-based mobile agents powered by (multimodal) large language models ((M)LLMs) offer strong reasoning abilities but suffer from high latency and cost. While fine-tuned (M)SLMs enable edge deployment, they often lose general capabilities and struggle with complex tasks. To address this, we propose EcoAgent, an Edge-Cloud cOllaborative multi-agent framework for…

    Submitted 9 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

  38. arXiv:2505.05410  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Reasoning Models Don't Always Say What They Think

    Authors: Yanda Chen, Joe Benton, Ansh Radhakrishnan, Jonathan Uesato, Carson Denison, John Schulman, Arushi Somani, Peter Hase, Misha Wagner, Fabien Roger, Vlad Mikulik, Samuel R. Bowman, Jan Leike, Jared Kaplan, Ethan Perez

    Abstract: Chain-of-thought (CoT) offers a potential boon for AI safety as it allows monitoring a model's CoT to try to understand its intentions and reasoning processes. However, the effectiveness of such monitoring hinges on CoTs faithfully representing models' actual reasoning processes. We evaluate CoT faithfulness of state-of-the-art reasoning models across 6 reasoning hints presented in the prompts and…

    Submitted 8 May, 2025; originally announced May 2025.

  39. arXiv:2505.05137  [pdf, ps, other]

    cs.LG cs.CV

    Research on Anomaly Detection Methods Based on Diffusion Models

    Authors: Yi Chen

    Abstract: Anomaly detection is a fundamental task in machine learning and data mining, with significant applications in cybersecurity, industrial fault diagnosis, and clinical disease monitoring. Traditional methods, such as statistical modeling and machine learning-based approaches, often face challenges in handling complex, high-dimensional data distributions. In this study, we explore the potential of di…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 6 pages, 3 tables

  40. arXiv:2505.05057  [pdf, other]

    cs.SE

    Towards Mitigating API Hallucination in Code Generated by LLMs with Hierarchical Dependency Aware

    Authors: Yujia Chen, Mingyu Chen, Cuiyun Gao, Zhihan Jiang, Zhongqi Li, Yuchi Ma

    Abstract: Application Programming Interfaces (APIs) are crucial in modern software development. Large Language Models (LLMs) assist in automated code generation but often struggle with API hallucination, including invoking non-existent APIs and misusing existing ones in practical development scenarios. Existing studies resort to Retrieval-Augmented Generation (RAG) methods for mitigating the hallucination i…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: Accepted by FSE 2025 Industry Track

  41. arXiv:2505.05007  [pdf, other]

    cs.CV

    Driving with Context: Online Map Matching for Complex Roads Using Lane Markings and Scenario Recognition

    Authors: Xin Bi, Zhichao Li, Yuxuan Xia, Panpan Tong, Lijuan Zhang, Yang Chen, Junsheng Fu

    Abstract: Accurate online map matching is fundamental to vehicle navigation and the activation of intelligent driving functions. Current online map matching methods are prone to errors in complex road networks, especially in multilevel road areas. To address this challenge, we propose an online Standard Definition (SD) map matching method by constructing a Hidden Markov Model (HMM) with multiple probability…

    Submitted 10 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Comments: 9 pages and 12 figures. Under review at IEEE RA-L

  42. arXiv:2505.04891  [pdf]

    cs.LG cs.AI stat.ML

    Clustering with Communication: A Variational Framework for Single Cell Representation Learning

    Authors: Cong Qi, Yeqing Chen, Jie Zhang, Wei Zhi

    Abstract: Single-cell RNA sequencing (scRNA-seq) has revealed complex cellular heterogeneity, but recent studies emphasize that understanding biological function also requires modeling cell-cell communication (CCC), the signaling interactions mediated by ligand-receptor pairs that coordinate cellular behavior. Tools like CellChat have demonstrated that CCC plays a critical role in processes such as cell dif…

    Submitted 7 May, 2025; originally announced May 2025.

  43. arXiv:2505.04741  [pdf, other]

    cs.LG cs.AI cs.CL

    When Bad Data Leads to Good Models

    Authors: Kenneth Li, Yida Chen, Fernanda Viégas, Martin Wattenberg

    Abstract: In large language model (LLM) pretraining, data quality is believed to determine model quality. In this paper, we re-examine the notion of "quality" from the perspective of pre- and post-training co-design. Specifically, we explore the possibility that pre-training on more toxic data can lead to better control in post-training, ultimately decreasing a model's output toxicity. First, we use a toy e…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: ICML 2025

  44. arXiv:2505.04656  [pdf, other]

    cs.GR

    MeshGen: Generating PBR Textured Mesh with Render-Enhanced Auto-Encoder and Generative Data Augmentation

    Authors: Zilong Chen, Yikai Wang, Wenqiang Sun, Feng Wang, Yiwen Chen, Huaping Liu

    Abstract: In this paper, we introduce MeshGen, an advanced image-to-3D pipeline that generates high-quality 3D meshes with detailed geometry and physically based rendering (PBR) textures. Addressing the challenges faced by existing 3D native diffusion models, such as suboptimal auto-encoder performance, limited controllability, poor generalization, and inconsistent image-based PBR texturing, MeshGen employs…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: To appear at CVPR 2025 with highlight

  45. arXiv:2505.04652  [pdf, other]

    eess.IV cs.CV

    Rethinking Boundary Detection in Deep Learning-Based Medical Image Segmentation

    Authors: Yi Lin, Dong Zhang, Xiao Fang, Yufan Chen, Kwang-Ting Cheng, Hao Chen

    Abstract: Medical image segmentation is a pivotal task within the realms of medical image analysis and computer vision. While current methods have shown promise in accurately segmenting major regions of interest, the precise segmentation of boundary areas remains challenging. In this study, we propose a novel network architecture named CTO, which combines Convolutional Neural Networks (CNNs), Vision Transfo…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Accepted by Medical Image Analysis

  46. arXiv:2505.04638  [pdf, ps, other]

    cs.AI cs.CL cs.IR

    Towards Artificial Intelligence Research Assistant for Expert-Involved Learning

    Authors: Tianyu Liu, Simeng Han, Xiao Luo, Hanchen Wang, Pan Lu, Biqing Zhu, Yuge Wang, Keyi Li, Jiapeng Chen, Rihao Qu, Yufeng Liu, Xinyue Cui, Aviv Yaish, Yuhang Chen, Minsheng Hao, Chuhan Li, Kexing Li, Arman Cohan, Hua Xu, Mark Gerstein, James Zou, Hongyu Zhao

    Abstract: Large Language Models (LLMs) and Large Multi-Modal Models (LMMs) have emerged as transformative tools in scientific research, yet their reliability and specific contributions to biomedical applications remain insufficiently characterized. In this study, we present ARtificial Intelligence research assistant for Expert-involved Learning (ARIEL), a multimodal datas…

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: 36 pages, 7 figures

  47. arXiv:2505.04410  [pdf, other]

    cs.CV

    DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception

    Authors: Junjie Wang, Bin Chen, Yulin Li, Bin Kang, Yichi Chen, Zhuotao Tian

    Abstract: Dense visual prediction tasks have been constrained by their reliance on predefined categories, limiting their applicability in real-world scenarios where visual concepts are unbounded. While Vision-Language Models (VLMs) like CLIP have shown promise in open-vocabulary tasks, their direct application to dense prediction often leads to suboptimal performance due to limitations in local feature repr…

    Submitted 7 May, 2025; originally announced May 2025.

  48. arXiv:2505.04180  [pdf, other]

    cs.IR

    Towards Large-scale Generative Ranking

    Authors: Yanhua Huang, Yuqi Chen, Xiong Cao, Rui Yang, Mingliang Qi, Yinghao Zhu, Qingchang Han, Yaowei Liu, Zhaoyu Liu, Xuefeng Yao, Yuting Jia, Leilei Ma, Yinqi Zhang, Taoyu Zhu, Liujie Zhang, Lei Chen, Weihang Chen, Min Zhu, Ruiwen Xu, Lei Zhang

    Abstract: Generative recommendation has recently emerged as a promising paradigm in information retrieval. However, generative ranking systems are still understudied, particularly with respect to their effectiveness and feasibility in large-scale industrial settings. This paper investigates this topic at the ranking stage of Xiaohongshu's Explore Feed, a recommender system that serves hundreds of millions o…

    Submitted 8 May, 2025; v1 submitted 7 May, 2025; originally announced May 2025.

  49. arXiv:2505.04161  [pdf]

    cs.LG cs.CY cs.MA physics.comp-ph

    Optimization of Infectious Disease Intervention Measures Based on Reinforcement Learning -- Empirical analysis based on UK COVID-19 epidemic data

    Authors: Baida Zhang, Yakai Chen, Huichun Li, Zhenghu Zu

    Abstract: Globally, the outbreaks of infectious diseases have exerted an extremely profound and severe influence on health security and the economy. During the critical phases of epidemics, devising effective intervention measures poses a significant challenge to both the academic and practical arenas. There is numerous research based on reinforcement learning to optimize intervention measures of infectious…

    Submitted 7 May, 2025; originally announced May 2025.

  50. Bringing legal knowledge to the public by constructing a legal question bank using large-scale pre-trained language model

    Authors: Mingruo Yuan, Ben Kao, Tien-Hsuan Wu, Michael M. K. Cheung, Henry W. H. Chan, Anne S. Y. Cheung, Felix W. H. Chan, Yongxi Chen

    Abstract: Access to legal information is fundamental to access to justice. Yet accessibility refers not only to making legal documents available to the public, but also rendering legal information comprehensible to them. A vexing problem in bringing legal information to the public is how to turn formal legal documents such as legislation and judgments, which are often highly technical, to easily navigable a…

    Submitted 7 May, 2025; originally announced May 2025.

    Journal ref: Artificial Intelligence and Law 2024-09