
Showing 1–50 of 281 results for author: Pan, H

Searching in archive cs.
  1. arXiv:2507.04049  [pdf, ps, other]

    cs.CV cs.RO

    Breaking Imitation Bottlenecks: Reinforced Diffusion Powers Diverse Trajectory Generation

    Authors: Ziying Song, Lin Liu, Hongyu Pan, Bencheng Liao, Mingzhe Guo, Lei Yang, Yongchang Zhang, Shaoqing Xu, Caiyan Jia, Yadan Luo

    Abstract: Most end-to-end autonomous driving methods rely on imitation learning from single expert demonstrations, often leading to conservative and homogeneous behaviors that limit generalization in complex real-world scenarios. In this work, we propose DIVER, an end-to-end driving framework that integrates reinforcement learning with diffusion-based generation to produce diverse and feasible trajectories.…

    Submitted 5 July, 2025; originally announced July 2025.

    Comments: 16 pages, 6 figures

  2. arXiv:2507.00576  [pdf, ps, other]

    cs.DC

    DynoStore: A wide-area distribution system for the management of data over heterogeneous storage

    Authors: Dante D. Sanchez-Gallegos, J. L. Gonzalez-Compean, Maxime Gonthier, Valerie Hayot-Sasson, J. Gregory Pauloski, Haochen Pan, Kyle Chard, Jesus Carretero, Ian Foster

    Abstract: Data distribution across different facilities offers benefits such as enhanced resource utilization, increased resilience through replication, and improved performance by processing data near its source. However, managing such data is challenging due to heterogeneous access protocols, disparate authentication models, and the lack of a unified coordination framework. This paper presents DynoStore,…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: 10 pages. Conference: The 25th IEEE International Symposium on Cluster, Cloud, and Internet Computing

  3. arXiv:2506.23488  [pdf, ps, other]

    cs.NI

    Generative AI-enhanced Low-Altitude UAV-Mounted Stacked Intelligent Metasurfaces

    Authors: Geng Sun, Mingzhe Fan, Lei Zhang, Hongyang Pan, Jiahui Li, Chuang Zhang, Linyao Li, Changyuan Zhao, Chau Yuen

    Abstract: Wireless communication systems face significant challenges in meeting the increasing demands for higher data rates and more reliable connectivity in complex environments. Stacked intelligent metasurfaces (SIMs) have emerged as a promising technology for realizing wave-domain signal processing, with mobile SIMs offering superior communication performance compared to their fixed counterparts. In thi…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: This paper has already been submitted to TCCN

  4. arXiv:2506.23334  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    Federated Breast Cancer Detection Enhanced by Synthetic Ultrasound Image Augmentation

    Authors: Hongyi Pan, Ziliang Hong, Gorkem Durak, Ziyue Xu, Ulas Bagci

    Abstract: Federated learning (FL) has emerged as a promising paradigm for collaboratively training deep learning models across institutions without exchanging sensitive medical data. However, its effectiveness is often hindered by limited data availability and non-independent, identically distributed data across participating clients, which can degrade model performance and generalization. To address these…

    Submitted 29 June, 2025; originally announced June 2025.

  5. arXiv:2506.18364  [pdf, ps, other]

    cs.CV

    Spatial frequency information fusion network for few-shot learning

    Authors: Wenqing Zhao, Guojia Xie, Han Pan, Biao Yang, Weichuan Zhang

    Abstract: The objective of Few-shot learning is to fully leverage the limited data resources for exploring the latent correlations within the data by applying algorithms and training a model with outstanding performance that can adequately meet the demands of practical applications. In practical applications, the number of images in each category is usually less than that in traditional deep learning, which…

    Submitted 23 June, 2025; originally announced June 2025.

  6. arXiv:2506.15701  [pdf, ps, other]

    cs.LG cs.AI

    Compiler-R1: Towards Agentic Compiler Auto-tuning with Reinforcement Learning

    Authors: Haolin Pan, Hongyu Lin, Haoran Luo, Yang Liu, Kaichun Yao, Libo Zhang, Mingjie Xing, Yanjun Wu

    Abstract: Compiler auto-tuning optimizes pass sequences to improve performance metrics such as Intermediate Representation (IR) instruction count. Although recent advances leveraging Large Language Models (LLMs) have shown promise in automating compiler tuning, two significant challenges still remain: the absence of high-quality reasoning datasets for agents training, and limited effective interactions with…

    Submitted 29 May, 2025; originally announced June 2025.

  7. arXiv:2506.13612  [pdf, ps, other]

    cs.CR cs.AI cs.DC

    EBS-CFL: Efficient and Byzantine-robust Secure Clustered Federated Learning

    Authors: Zhiqiang Li, Haiyong Bao, Menghong Guan, Hao Pan, Cheng Huang, Hong-Ning Dai

    Abstract: Despite federated learning (FL)'s potential in collaborative learning, its performance has deteriorated due to the data heterogeneity of distributed users. Recently, clustered federated learning (CFL) has emerged to address this challenge by partitioning users into clusters according to their similarity. However, CFL faces difficulties in training when users are unwilling to share their cluster id…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Accepted by AAAI 25

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 39(17), 18593-18601, 2025

  8. arXiv:2506.13050  [pdf, ps, other]

    cs.GR cs.CV

    NeuVAS: Neural Implicit Surfaces for Variational Shape Modeling

    Authors: Pengfei Wang, Qiujie Dong, Fangtian Liang, Hao Pan, Lei Yang, Congyi Zhang, Guying Lin, Caiming Zhang, Yuanfeng Zhou, Changhe Tu, Shiqing Xin, Alla Sheffer, Xin Li, Wenping Wang

    Abstract: Neural implicit shape representation has drawn significant attention in recent years due to its smoothness, differentiability, and topological flexibility. However, directly modeling the shape of a neural implicit surface, especially as the zero-level set of a neural signed distance function (SDF), with sparse geometric control is still a challenging task. Sparse input shape control typically incl…

    Submitted 15 June, 2025; originally announced June 2025.

  9. arXiv:2506.04669  [pdf, other]

    cs.LG

    Noise-Resistant Label Reconstruction Feature Selection for Partial Multi-Label Learning

    Authors: Wanfu Gao, Hanlin Pan, Qingqi Han, Kunpeng Liu

    Abstract: The "Curse of dimensionality" is prevalent across various data patterns, which increases the risk of model overfitting and leads to a decline in model classification performance. However, few studies have focused on this issue in Partial Multi-label Learning (PML), where each sample is associated with a set of candidate labels, at least one of which is correct. Existing PML methods addressing this…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: Accepted at IJCAI 2025

  10. D-Rex: Heterogeneity-Aware Reliability Framework and Adaptive Algorithms for Distributed Storage

    Authors: Maxime Gonthier, Dante D. Sanchez-Gallegos, Haochen Pan, Bogdan Nicolae, Sicheng Zhou, Hai Duc Nguyen, Valerie Hayot-Sasson, J. Gregory Pauloski, Jesus Carretero, Kyle Chard, Ian Foster

    Abstract: The exponential growth of data necessitates distributed storage models, such as peer-to-peer systems and data federations. While distributed storage can reduce costs and increase reliability, the heterogeneity in storage capacity, I/O performance, and failure rates of storage resources makes their efficient use a challenge. Further, node failures are common and can lead to data unavailability and…

    Submitted 29 May, 2025; originally announced June 2025.

    Comments: To be published at the 2025 International Conference on Supercomputing, Salt Lake City, UT, USA

  11. arXiv:2506.00936  [pdf, ps, other]

    cs.LG cs.AI q-bio.QM

    Uncertainty-Aware Metabolic Stability Prediction with Dual-View Contrastive Learning

    Authors: Peijin Guo, Minghui Li, Hewen Pan, Bowen Chen, Yang Wu, Zikang Guo, Leo Yu Zhang, Shengshan Hu, Shengqing Hu

    Abstract: Accurate prediction of molecular metabolic stability (MS) is critical for drug research and development but remains challenging due to the complex interplay of molecular interactions. Despite recent advances in graph neural networks (GNNs) for MS prediction, current approaches face two critical limitations: (1) incomplete molecular modeling due to atom-centric message-passing mechanisms that disre…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: This manuscript has been accepted for publication at ECML-PKDD 2025. The final version will be published in the conference proceedings

  12. arXiv:2505.24245  [pdf, ps, other]

    cs.CV cs.AI

    LTM3D: Bridging Token Spaces for Conditional 3D Generation with Auto-Regressive Diffusion Framework

    Authors: Xin Kang, Zihan Zheng, Lei Chu, Yue Gao, Jiahao Li, Hao Pan, Xuejin Chen, Yan Lu

    Abstract: We present LTM3D, a Latent Token space Modeling framework for conditional 3D shape generation that integrates the strengths of diffusion and auto-regressive (AR) models. While diffusion-based methods effectively model continuous latent spaces and AR models excel at capturing inter-token dependencies, combining these paradigms for 3D shape generation remains a challenge. To address this, LTM3D feat…

    Submitted 30 May, 2025; originally announced May 2025.

  13. arXiv:2505.23228  [pdf, ps, other]

    cs.LG

    Graph Random Walk with Feature-Label Space Alignment: A Multi-Label Feature Selection Method

    Authors: Wanfu Gao, Jun Gao, Qingqi Han, Hanlin Pan, Kunpeng Liu

    Abstract: The rapid growth in feature dimension may introduce implicit associations between features and labels in multi-label datasets, making the relationships between features and labels increasingly complex. Moreover, existing methods often adopt low-dimensional linear decomposition to explore the associations between features and labels. However, linear decomposition struggles to capture complex nonlin…

    Submitted 29 May, 2025; originally announced May 2025.

  14. arXiv:2505.22477  [pdf]

    cs.HC cs.AI cs.CY

    Human-Centered Human-AI Collaboration (HCHAC)

    Authors: Qi Gao, Wei Xu, Hanxi Pan, Mowei Shen, Zaifeng Gao

    Abstract: In the intelligent era, the interaction between humans and intelligent systems fundamentally involves collaboration with autonomous intelligent agents. Human-AI Collaboration (HAC) represents a novel type of human-machine relationship facilitated by autonomous intelligent machines equipped with AI technologies. In this paradigm, AI agents serve not only as auxiliary tools but also as active teamma…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: This article is a chapter from the upcoming book Handbook of Human-Centered Artificial Intelligence

  15. arXiv:2505.20714  [pdf, ps, other]

    cs.NI cs.AI cs.LG

    Wideband RF Radiance Field Modeling Using Frequency-embedded 3D Gaussian Splatting

    Authors: Zechen Li, Lanqing Yang, Yiheng Bian, Hao Pan, Yongjian Fu, Yezhou Wang, Yi-Chao Chen, Guangtao Xue, Ju Ren

    Abstract: This paper presents an innovative frequency-embedded 3D Gaussian splatting (3DGS) algorithm for wideband radio-frequency (RF) radiance field modeling, offering an advancement over the existing works limited to single-frequency modeling. Grounded in fundamental physics, we uncover the complex relationship between EM wave propagation behaviors and RF frequencies. Inspired by this, we design an EM fe…

    Submitted 27 May, 2025; originally announced May 2025.

  16. arXiv:2505.19188  [pdf, ps, other]

    cs.LG

    Chordless Structure: A Pathway to Simple and Expressive GNNs

    Authors: Hongxu Pan, Shuxian Hu, Mo Zhou, Zhibin Wang, Rong Gu, Chen Tian, Kun Yang, Sheng Zhong

    Abstract: Researchers have proposed various methods of incorporating more structured information into the design of Graph Neural Networks (GNNs) to enhance their expressiveness. However, these methods are either computationally expensive or lacking in provable expressiveness. In this paper, we observe that the chords increase the complexity of the graph structure while contributing little useful information…

    Submitted 25 May, 2025; originally announced May 2025.

  17. arXiv:2505.16403  [pdf, ps, other]

    cs.LG eess.SY

    Performance Guaranteed Poisoning Attacks in Federated Learning: A Sliding Mode Approach

    Authors: Huazi Pan, Yanjun Zhang, Leo Yu Zhang, Scott Adams, Abbas Kouzani, Suiyang Khoo

    Abstract: Manipulation of local training data and local updates, i.e., the poisoning attack, is the main threat arising from the collaborative nature of the federated learning (FL) paradigm. Most existing poisoning attacks aim to manipulate local data/models in a way that causes denial-of-service (DoS) issues. In this paper, we introduce a novel attack method, named Federated Learning Sliding Attack (FedSA)…

    Submitted 28 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: This paper is to appear in IJCAI 2025, code available at: https://github.com/Halsey777/FedSA

  18. arXiv:2505.14738  [pdf, ps, other]

    cs.AI

    R&D-Agent: Automating Data-Driven AI Solution Building Through LLM-Powered Automated Research, Development, and Evolution

    Authors: Xu Yang, Xiao Yang, Shikai Fang, Bowen Xian, Yuante Li, Jian Wang, Minrui Xu, Haoran Pan, Xinpeng Hong, Weiqing Liu, Yelong Shen, Weizhu Chen, Jiang Bian

    Abstract: Recent advances in AI and ML have transformed data science, yet increasing complexity and expertise requirements continue to hinder progress. While crowdsourcing platforms alleviate some challenges, high-level data science tasks remain labor-intensive and iterative. To overcome these limitations, we introduce R&D-Agent, a dual-agent framework for iterative exploration. The Researcher agent uses pe…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: 7 pages, 1 figure, 1 table

  19. arXiv:2505.13211  [pdf, ps, other]

    cs.CV cs.AI

    MAGI-1: Autoregressive Video Generation at Scale

    Authors: Sand. ai, Hansi Teng, Hongyu Jia, Lei Sun, Lingzhi Li, Maolin Li, Mingqiu Tang, Shuai Han, Tianning Zhang, W. Q. Zhang, Weifeng Luo, Xiaoyang Kang, Yuchen Sun, Yue Cao, Yunpeng Huang, Yutong Lin, Yuxin Fang, Zewei Tao, Zheng Zhang, Zhongshu Wang, Zixun Liu, Dai Shi, Guoli Su, Hanwen Sun, Hong Pan, et al. (14 additional authors not shown)

    Abstract: We present MAGI-1, a world model that generates videos by autoregressively predicting a sequence of video chunks, defined as fixed-length segments of consecutive frames. Trained to denoise per-chunk noise that increases monotonically over time, MAGI-1 enables causal temporal modeling and naturally supports streaming generation. It achieves strong performance on image-to-video (I2V) tasks condition…

    Submitted 19 May, 2025; originally announced May 2025.

  20. arXiv:2505.12628  [pdf, ps, other]

    cs.LG

    Dual-Agent Reinforcement Learning for Automated Feature Generation

    Authors: Wanfu Gao, Zengyao Man, Hanlin Pan, Kunpeng Liu

    Abstract: Feature generation involves creating new features from raw data to capture complex relationships among the original features, improving model robustness and machine learning performance. Current methods using reinforcement learning for feature generation have made feature exploration more flexible and efficient. However, several challenges remain: first, during feature expansion, a large number of…

    Submitted 18 May, 2025; originally announced May 2025.

  21. arXiv:2505.02016  [pdf, ps, other]

    cs.AR

    ForgeEDA: A Comprehensive Multimodal Dataset for Advancing EDA

    Authors: Zhengyuan Shi, Zeju Li, Chengyu Ma, Yunhao Zhou, Ziyang Zheng, Jiawei Liu, Hongyang Pan, Lingfeng Zhou, Kezhi Li, Jiaying Zhu, Lingwei Yan, Zhiqiang He, Chenhao Xue, Wentao Jiang, Fan Yang, Guangyu Sun, Xiaoyan Yang, Gang Chen, Chuan Shi, Zhufei Chu, Jun Yang, Qiang Xu

    Abstract: We introduce ForgeEDA, an open-source comprehensive circuit dataset across various categories. ForgeEDA includes diverse circuit representations such as Register Transfer Level (RTL) code, Post-mapping (PM) netlists, And-Inverter Graphs (AIGs), and placed netlists, enabling comprehensive analysis and development. We demonstrate ForgeEDA's utility by benchmarking state-of-the-art EDA algorithms on…

    Submitted 4 May, 2025; originally announced May 2025.

  22. arXiv:2505.00032  [pdf]

    cs.CL cs.AI

    MDD-LLM: Towards Accuracy Large Language Models for Major Depressive Disorder Diagnosis

    Authors: Yuyang Sha, Hongxin Pan, Wei Xu, Weiyu Meng, Gang Luo, Xinyu Du, Xiaobing Zhai, Henry H. Y. Tong, Caijuan Shi, Kefeng Li

    Abstract: Major depressive disorder (MDD) impacts more than 300 million people worldwide, highlighting a significant public health issue. However, the uneven distribution of medical resources and the complexity of diagnostic methods have resulted in inadequate attention to this disorder in numerous countries and regions. This paper introduces a high-performance MDD diagnosis tool named MDD-LLM, an AI-driven…

    Submitted 28 April, 2025; originally announced May 2025.

  23. arXiv:2504.13914  [pdf, other]

    cs.CL

    Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning

    Authors: ByteDance Seed, Jiaze Chen, Tiantian Fan, Xin Liu, Lingjun Liu, Zhiqi Lin, Mingxuan Wang, Chengyi Wang, Xiangpeng Wei, Wenyuan Xu, Yufeng Yuan, Yu Yue, Lin Yan, Qiying Yu, Xiaochen Zuo, Chi Zhang, Ruofei Zhu, Zhecheng An, Zhihao Bai, Yu Bao, Xingyan Bin, Jiangjie Chen, Feng Chen, Hongmin Chen, et al. (249 additional authors not shown)

    Abstract: We introduce Seed1.5-Thinking, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks. Seed1.5-Thinking achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains. For in…

    Submitted 29 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

  24. arXiv:2504.12824  [pdf, other]

    cs.AR

    Mixed Structural Choice Operator: Enhancing Technology Mapping with Heterogeneous Representations

    Authors: Zhang Hu, Hongyang Pan, Yinshui Xia, Lunyao Wang, Zhufei Chu

    Abstract: The independence of logic optimization and technology mapping poses a significant challenge in achieving high-quality synthesis results. Recent studies have improved optimization outcomes through collaborative optimization of multiple logic representations and have improved structural bias through structural choices. However, these methods still rely on technology-independent optimization and fail…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Accepted by DAC 2025. Please note that this is not the final camera-ready version

  25. arXiv:2504.12667  [pdf, other]

    cs.CV

    Two Tasks, One Goal: Uniting Motion and Planning for Excellent End To End Autonomous Driving Performance

    Authors: Lin Liu, Ziying Song, Hongyu Pan, Lei Yang, Caiyan Jia

    Abstract: End-to-end autonomous driving has made impressive progress in recent years. Former end-to-end autonomous driving approaches often decouple planning and motion tasks, treating them as separate modules. This separation overlooks the potential benefits that planning can gain from learning out-of-distribution data encountered in motion tasks. However, unifying these tasks poses significant challenges,…

    Submitted 17 April, 2025; originally announced April 2025.

  26. arXiv:2504.09282  [pdf, other]

    cs.CV

    VideoAds for Fast-Paced Video Understanding: Where Opensource Foundation Models Beat GPT-4o & Gemini-1.5 Pro

    Authors: Zheyuan Zhang, Monica Dou, Linkai Peng, Hongyi Pan, Ulas Bagci, Boqing Gong

    Abstract: Advertisement videos serve as a rich and valuable source of purpose-driven information, encompassing high-quality visual, textual, and contextual cues designed to engage viewers. They are often more complex than general videos of similar duration due to their structured narratives and rapid scene transitions, posing significant challenges to multi-modal large language models (MLLMs). In this work,…

    Submitted 12 April, 2025; originally announced April 2025.

  27. arXiv:2504.08000  [pdf, other]

    cs.AI cs.LG

    Neuron-level Balance between Stability and Plasticity in Deep Reinforcement Learning

    Authors: Jiahua Lan, Sen Zhang, Haixia Pan, Ruijun Liu, Li Shen, Dacheng Tao

    Abstract: In contrast to the human ability to continuously acquire knowledge, agents struggle with the stability-plasticity dilemma in deep reinforcement learning (DRL), which refers to the trade-off between retaining existing skills (stability) and learning new knowledge (plasticity). Current methods focus on balancing these two aspects at the network level, lacking sufficient differentiation and fine-grai…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: Reinforcement learning, RL skill neuron, stability and plasticity

  28. arXiv:2504.04616  [pdf, other]

    cs.CL

    DynClean: Training Dynamics-based Label Cleaning for Distantly-Supervised Named Entity Recognition

    Authors: Qi Zhang, Huitong Pan, Zhijia Chen, Longin Jan Latecki, Cornelia Caragea, Eduard Dragut

    Abstract: Distantly Supervised Named Entity Recognition (DS-NER) has attracted attention due to its scalability and ability to automatically generate labeled data. However, distant annotation introduces many mislabeled instances, limiting its performance. Most of the existing work attempt to solve this problem by developing intricate models to learn from the noisy labels. An alternative approach is to attem…

    Submitted 6 April, 2025; originally announced April 2025.

    Comments: Accepted to NAACL2025-Findings

  29. arXiv:2504.02830  [pdf, other]

    math.OC cs.GR cs.LG

    DualMS: Implicit Dual-Channel Minimal Surface Optimization for Heat Exchanger Design

    Authors: Weizheng Zhang, Hao Pan, Lin Lu, Xiaowei Duan, Xin Yan, Ruonan Wang, Qiang Du

    Abstract: Heat exchangers are critical components in a wide range of engineering applications, from energy systems to chemical processing, where efficient thermal management is essential. The design objectives for heat exchangers include maximizing the heat exchange rate while minimizing the pressure drop, requiring both a large interface area and a smooth internal structure. State-of-the-art designs, such…

    Submitted 19 May, 2025; v1 submitted 2 March, 2025; originally announced April 2025.

  30. arXiv:2503.23091  [pdf, ps, other]

    cs.CL

    Parsing Through Boundaries in Chinese Word Segmentation

    Authors: Yige Chen, Zelong Li, Cindy Zhang, Changbing Yang, Amandisa Cady, Ai Ka Lee, Zejiao Zeng, Eunkyul Leah Jo, Haihua Pan, Jungyeul Park

    Abstract: Chinese word segmentation is a foundational task in natural language processing (NLP), with far-reaching effects on syntactic analysis. Unlike alphabetic languages like English, Chinese lacks explicit word boundaries, making segmentation both necessary and inherently ambiguous. This study highlights the intricate relationship between word segmentation and syntactic parsing, providing a clearer und…

    Submitted 4 July, 2025; v1 submitted 29 March, 2025; originally announced March 2025.

    Comments: Submitted to EMNLP2025 System Demonstration

  31. arXiv:2503.20644  [pdf, other]

    cs.CV

    MMGen: Unified Multi-modal Image Generation and Understanding in One Go

    Authors: Jiepeng Wang, Zhaoqing Wang, Hao Pan, Yuan Liu, Dongdong Yu, Changhu Wang, Wenping Wang

    Abstract: A unified diffusion framework for multi-modal generation and understanding has the transformative potential to achieve seamless and controllable image diffusion and other cross-modal tasks. In this paper, we introduce MMGen, a unified framework that integrates multiple generative tasks into a single diffusion model. This includes: (1) multi-modal category-conditioned generation, where multi-modal…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: Our project page: https://jiepengwang.github.io/MMGen/

  32. arXiv:2503.18297  [pdf, other]

    cs.CV

    Image-to-Text for Medical Reports Using Adaptive Co-Attention and Triple-LSTM Module

    Authors: Yishen Liu, Shengda Liu, Hudan Pan

    Abstract: Medical report generation requires specialized expertise that general large models often fail to accurately capture. Moreover, the inherent repetition and similarity in medical data make it difficult for models to extract meaningful features, resulting in a tendency to overfit. So in this paper, we propose a multimodal model, Co-Attention Triple-LSTM Network (CA-TriNet), a deep learning model that…

    Submitted 27 March, 2025; v1 submitted 23 March, 2025; originally announced March 2025.

  33. arXiv:2503.17666  [pdf, other]

    cs.LG q-bio.QM

    Multi-Modality Representation Learning for Antibody-Antigen Interactions Prediction

    Authors: Peijin Guo, Minghui Li, Hewen Pan, Ruixiang Huang, Lulu Xue, Shengqing Hu, Zikang Guo, Wei Wan, Shengshan Hu

    Abstract: While deep learning models play a crucial role in predicting antibody-antigen interactions (AAI), the scarcity of publicly available sequence-structure pairings constrains their generalization. Current AAI methods often focus on residue-level static details, overlooking fine-grained structural representations of antibodies and their inter-antibody similarities. To tackle this challenge, we introdu…

    Submitted 22 March, 2025; originally announced March 2025.

    Comments: 2025 IEEE International Conference on Multimedia and Expo (ICME 2025), June 30 - July 4, 2025, Nantes, France

  34. arXiv:2503.17340  [pdf, other]

    cs.MM cs.AI cs.CV cs.SD eess.AS

    Align Your Rhythm: Generating Highly Aligned Dance Poses with Gating-Enhanced Rhythm-Aware Feature Representation

    Authors: Congyi Fan, Jian Guan, Xuanjia Zhao, Dongli Xu, Youtian Lin, Tong Ye, Pengming Feng, Haiwei Pan

    Abstract: Automatically generating natural, diverse and rhythmic human dance movements driven by music is vital for virtual reality and film industries. However, generating dance that naturally follows music remains a challenge, as existing methods lack proper beat alignment and exhibit unnatural motion dynamics. In this paper, we propose Danceba, a novel framework that leverages gating mechanism to enhance…

    Submitted 21 March, 2025; originally announced March 2025.

    Comments: 10 pages, 6 figures

  35. arXiv:2503.13517  [pdf, other]

    cs.CL cs.AI

    CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning

    Authors: Hao Cui, Zahra Shamsi, Gowoon Cheon, Xuejian Ma, Shutong Li, Maria Tikhanovskaya, Peter Norgaard, Nayantara Mudur, Martyna Plomecka, Paul Raccuglia, Yasaman Bahri, Victor V. Albert, Pranesh Srinivasan, Haining Pan, Philippe Faist, Brian Rohr, Ekin Dogus Cubuk, Muratahan Aykol, Amil Merchant, Michael J. Statt, Dan Morris, Drew Purves, Elise Kleeman, Ruth Alcantara, Matthew Abraham, et al. (9 additional authors not shown)

    Abstract: Scientific problem-solving involves synthesizing information while applying expert knowledge. We introduce CURIE, a scientific long-Context Understanding, Reasoning and Information Extraction benchmark to measure the potential of Large Language Models (LLMs) in scientific problem-solving and assisting scientists in realistic workflows. This benchmark introduces ten challenging tasks with a total of…

    Submitted 13 May, 2025; v1 submitted 14 March, 2025; originally announced March 2025.

    Comments: Accepted at ICLR 2025 main conference

  36. arXiv:2503.12752  [pdf, other]

    cs.DC

    WRATH: Workload Resilience Across Task Hierarchies in Task-based Parallel Programming Frameworks

    Authors: Sicheng Zhou, Zhuozhao Li, Valérie Hayot-Sasson, Haochen Pan, Maxime Gonthier, J. Gregory Pauloski, Ryan Chard, Kyle Chard, Ian Foster

    Abstract: Failures in Task-based Parallel Programming (TBPP) can severely degrade performance and result in incomplete or incorrect outcomes. Existing failure-handling approaches, including reactive, proactive, and resilient methods such as retry and checkpointing mechanisms, often apply uniform retry mechanisms regardless of the root cause of failures, failing to account for the unique characteristics of T…

    Submitted 27 March, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

    Comments: Preprint version

  37. arXiv:2503.11973  [pdf, other]

    cs.LG

    Machine Learning-Based Model for Postoperative Stroke Prediction in Coronary Artery Disease

    Authors: Haonan Pan, Shuheng Chen, Elham Pishgar, Kamiar Alaei, Greg Placencia, Maryam Pishgar

    Abstract: Coronary artery disease remains one of the leading causes of mortality globally. Despite advances in revascularization treatments like PCI and CABG, postoperative stroke is inevitable. This study aims to develop and evaluate a sophisticated machine learning prediction model to assess postoperative stroke risk in coronary revascularization patients. This research employed data from the MIMIC-IV data…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 19 pages, 7 figures, submitted to PLOS One. The study employs machine learning techniques, particularly Support Vector Machines, to predict postoperative stroke risk in coronary artery disease patients undergoing revascularization. It utilizes the MIMIC-IV v3.1 database and incorporates SHapley Additive exPlanations (SHAP) analysis for model interpretation

    MSC Class: 62P10, 68T05, 90C90 (Primary); 62J02, 68Q32, 90C59, 92C50 (Secondary)

  38. arXiv:2503.10615  [pdf, other]

    cs.CV

    R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

    Authors: Yi Yang, Xiaoxuan He, Hongkun Pan, Xiyan Jiang, Yan Deng, Xingtao Yang, Haoyu Lu, Dacheng Yin, Fengyun Rao, Minfeng Zhu, Bo Zhang, Wei Chen

    Abstract: Large Language Models have demonstrated remarkable reasoning capability in complex textual tasks. However, multimodal reasoning, which requires integrating visual and textual information, remains a significant challenge. Existing visual-language models often struggle to effectively analyze and reason visual content, resulting in suboptimal performance on complex reasoning tasks. Moreover, the abse…

    Submitted 18 March, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

    Comments: Code and Model: https://github.com/Fancy-MLLM/R1-onevision

  39. arXiv:2503.10115  [pdf, other]

    cs.LG

    Reconsidering Feature Structure Information and Latent Space Alignment in Partial Multi-label Feature Selection

    Authors: Hanlin Pan, Kunpeng Liu, Wanfu Gao

    Abstract: The purpose of partial multi-label feature selection is to select the most representative feature subset, where the data comes from partial multi-label datasets that have label ambiguity issues. For label disambiguation, previous methods mainly focus on utilizing the information inside the labels and the relationship between the labels and features. However, the information existing in the feature…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: 9 pages, 6 figures, accepted at AAAI 25

  40. arXiv:2503.10036  [pdf, other]

    cs.DB

    CCaaLF: Concurrency Control as a Learnable Function

    Authors: Hexiang Pan, Shaofeng Cai, Tien Tuan Anh Dinh, Yuncheng Wu, Yeow Meng Chee, Gang Chen, Beng Chin Ooi

    Abstract: Concurrency control (CC) algorithms are important in modern transactional databases, as they enable high performance by executing transactions concurrently while ensuring correctness. However, state-of-the-art CC algorithms struggle to perform well across diverse workloads, and most do not consider workload drifts. In this paper, we propose CCaaLF (Concurrency Control as a Learnable Function), a…

    Submitted 25 March, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

    MSC Class: 68P15

    ACM Class: H.2.4

  41. arXiv:2503.05991  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    GrInAdapt: Scaling Retinal Vessel Structural Map Segmentation Through Grounding, Integrating and Adapting Multi-device, Multi-site, and Multi-modal Fundus Domains

    Authors: Zixuan Liu, Aaron Honjaya, Yuekai Xu, Yi Zhang, Hefu Pan, Xin Wang, Linda G Shapiro, Sheng Wang, Ruikang K Wang

    Abstract: Retinal vessel segmentation is critical for diagnosing ocular conditions, yet current deep learning methods are limited by modality-specific challenges and significant distribution shifts across imaging devices, resolutions, and anatomical regions. In this paper, we propose GrInAdapt, a novel framework for source-free multi-target domain adaptation that leverages multi-view images to refine segmen…

    Submitted 7 March, 2025; originally announced March 2025.

  42. arXiv:2503.04446  [pdf, other]

    cs.SI cs.MM

    SMTPD: A New Benchmark for Temporal Prediction of Social Media Popularity

    Authors: Yijie Xu, Bolun Zheng, Wei Zhu, Hangjia Pan, Yuchen Yao, Ning Xu, Anan Liu, Quan Zhang, Chenggang Yan

    Abstract: The social media popularity prediction task aims to predict the popularity of posts on social media platforms, which benefits application scenarios such as content optimization, digital marketing and online advertising. Though many studies have made significant progress, few of them pay much attention to integrating popularity prediction with temporal alignment. In…

    Submitted 6 March, 2025; originally announced March 2025.

    Comments: Accepted by CVPR 2025

  43. arXiv:2503.03125  [pdf, other]

    cs.RO

    Don't Shake the Wheel: Momentum-Aware Planning in End-to-End Autonomous Driving

    Authors: Ziying Song, Caiyan Jia, Lin Liu, Hongyu Pan, Yongchang Zhang, Junming Wang, Xingyu Zhang, Shaoqing Xu, Lei Yang, Yadan Luo

    Abstract: End-to-end autonomous driving frameworks enable seamless integration of perception and planning but often rely on one-shot trajectory prediction, which may lead to unstable control and vulnerability to occlusions in single-frame perception. To address this, we propose the Momentum-Aware Driving (MomAD) framework, which introduces trajectory momentum and perception momentum to stabilize and refine…

    Submitted 8 May, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

    Comments: 16 pages, 8 figures

  44. arXiv:2503.01090  [pdf, other]

    cs.CL

    Precise Localization of Memories: A Fine-grained Neuron-level Knowledge Editing Technique for LLMs

    Authors: Haowen Pan, Xiaozhi Wang, Yixin Cao, Zenglin Shi, Xun Yang, Juanzi Li, Meng Wang

    Abstract: Knowledge editing aims to update outdated information in Large Language Models (LLMs). A representative line of study is locate-then-edit methods, which typically employ causal tracing to identify the modules responsible for recalling factual knowledge about entities. However, we find these methods are often sensitive only to changes in the subject entity, leaving them less effective at adapting t…

    Submitted 17 March, 2025; v1 submitted 2 March, 2025; originally announced March 2025.

    Comments: ICLR 2025

  45. arXiv:2502.06816  [pdf, other]

    cs.LG cs.AI

    DeepCell: Multiview Representation Learning for Post-Mapping Netlists

    Authors: Zhengyuan Shi, Chengyu Ma, Ziyang Zheng, Lingfeng Zhou, Hongyang Pan, Wentao Jiang, Fan Yang, Xiaoyan Yang, Zhufei Chu, Qiang Xu

    Abstract: Representation learning for post-mapping (PM) netlists is a critical challenge in Electronic Design Automation (EDA), driven by the diverse and complex nature of modern circuit designs. Existing approaches focus on intermediate representations like And-Inverter Graphs (AIGs), limiting their applicability to post-synthesis stages. We introduce DeepCell, a multiview representation learning framework…

    Submitted 4 February, 2025; originally announced February 2025.

  46. arXiv:2502.05293  [pdf, other]

    cs.DC

    Optimizing Fine-Grained Parallelism Through Dynamic Load Balancing on Multi-Socket Many-Core Systems

    Authors: Wenyi Wang, Maxime Gonthier, Poornima Nookala, Haochen Pan, Ian Foster, Ioan Raicu, Kyle Chard

    Abstract: Achieving efficient task parallelism on many-core architectures is an important challenge. The widely used GNU OpenMP implementation of the popular OpenMP parallel programming model incurs high overhead for fine-grained, short-running tasks due to time spent on runtime synchronization. In this work, we introduce and analyze three key advances that collectively achieve significant performance gains…

    Submitted 19 March, 2025; v1 submitted 7 February, 2025; originally announced February 2025.

    Comments: 13 pages, 11 figures, camera-ready, accepted by IPDPS 2025

    ACM Class: D.1.3

  47. arXiv:2502.02780  [pdf, other]

    cs.HC cs.AI cs.LG

    Classroom Simulacra: Building Contextual Student Generative Agents in Online Education for Learning Behavioral Simulation

    Authors: Songlin Xu, Hao-Ning Wen, Hongyi Pan, Dallas Dominguez, Dongyin Hu, Xinyu Zhang

    Abstract: Student simulation helps educators improve teaching by interacting with virtual students. However, most existing approaches ignore the modulation effects of course materials because of two challenges: the lack of datasets with granularly annotated course materials, and the limitation of existing simulation models in processing extremely long textual data. To solve the challenges, we first ru…

    Submitted 4 February, 2025; originally announced February 2025.

    Comments: 26 pages

  48. arXiv:2501.12023  [pdf, other]

    cs.LG cs.CV eess.IV

    Comparative Analysis of Pre-trained Deep Learning Models and DINOv2 for Cushing's Syndrome Diagnosis in Facial Analysis

    Authors: Hongjun Liu, Changwei Song, Jiaqi Qiang, Jianqiang Li, Hui Pan, Lin Lu, Xiao Long, Qing Zhao, Jiuzuo Huang, Shi Chen

    Abstract: Cushing's syndrome is a condition caused by excessive glucocorticoid secretion from the adrenal cortex, often manifesting with moon facies and plethora, making facial data crucial for diagnosis. Previous studies have used pre-trained convolutional neural networks (CNNs) for diagnosing Cushing's syndrome using frontal facial images. However, CNNs are better at capturing local features, while Cushin…

    Submitted 21 January, 2025; originally announced January 2025.

  49. arXiv:2501.10651  [pdf, other]

    cs.DC cond-mat.mtrl-sci cs.LG

    MOFA: Discovering Materials for Carbon Capture with a GenAI- and Simulation-Based Workflow

    Authors: Xiaoli Yan, Nathaniel Hudson, Hyun Park, Daniel Grzenda, J. Gregory Pauloski, Marcus Schwarting, Haochen Pan, Hassan Harb, Samuel Foreman, Chris Knight, Tom Gibbs, Kyle Chard, Santanu Chaudhuri, Emad Tajkhorshid, Ian Foster, Mohamad Moosavi, Logan Ward, E. A. Huerta

    Abstract: We present MOFA, an open-source generative AI (GenAI) plus simulation workflow for high-throughput generation of metal-organic frameworks (MOFs) on large-scale high-performance computing (HPC) systems. MOFA addresses key challenges in integrating GPU-accelerated computing for GPU-intensive GenAI tasks, including distributed training and inference, alongside CPU- and GPU-optimized tasks for screeni…

    Submitted 17 January, 2025; originally announced January 2025.

    Comments: 13 pages, 10 figures

  50. arXiv:2501.02471  [pdf, other]

    cs.CL cs.AI

    Hengqin-RA-v1: Advanced Large Language Model for Diagnosis and Treatment of Rheumatoid Arthritis with Dataset based Traditional Chinese Medicine

    Authors: Yishen Liu, Shengda Luo, Zishao Zhong, Tongtong Wu, Jianguo Zhang, Peiyao Ou, Yong Liang, Liang Liu, Hudan Pan

    Abstract: Large language models (LLMs), primarily trained on English texts, often face biases and inaccuracies in Chinese contexts. Their limitations are pronounced in fields like Traditional Chinese Medicine (TCM), where cultural and clinical subtleties are vital, further hindered by a lack of domain-specific data for conditions such as rheumatoid arthritis (RA). To address these issues, this paper introduces Hengqin-RA-…

    Submitted 27 March, 2025; v1 submitted 5 January, 2025; originally announced January 2025.

    Comments: 8 pages, 5 figures, AAAI-2025 Workshop