-
Breaking Imitation Bottlenecks: Reinforced Diffusion Powers Diverse Trajectory Generation
Authors:
Ziying Song,
Lin Liu,
Hongyu Pan,
Bencheng Liao,
Mingzhe Guo,
Lei Yang,
Yongchang Zhang,
Shaoqing Xu,
Caiyan Jia,
Yadan Luo
Abstract:
Most end-to-end autonomous driving methods rely on imitation learning from single expert demonstrations, often leading to conservative and homogeneous behaviors that limit generalization in complex real-world scenarios. In this work, we propose DIVER, an end-to-end driving framework that integrates reinforcement learning with diffusion-based generation to produce diverse and feasible trajectories.…
▽ More
Most end-to-end autonomous driving methods rely on imitation learning from single expert demonstrations, often leading to conservative and homogeneous behaviors that limit generalization in complex real-world scenarios. In this work, we propose DIVER, an end-to-end driving framework that integrates reinforcement learning with diffusion-based generation to produce diverse and feasible trajectories. At the core of DIVER lies a reinforced diffusion-based generation mechanism. First, the model conditions on map elements and surrounding agents to generate multiple reference trajectories from a single ground-truth trajectory, alleviating the limitations of imitation learning that arise from relying solely on single expert demonstrations. Second, reinforcement learning is employed to guide the diffusion process, where reward-based supervision enforces safety and diversity constraints on the generated trajectories, thereby enhancing their practicality and generalization capability. Furthermore, to address the limitations of L2-based open-loop metrics in capturing trajectory diversity, we propose a novel Diversity metric to evaluate the diversity of multi-mode predictions.Extensive experiments on the closed-loop NAVSIM and Bench2Drive benchmarks, as well as the open-loop nuScenes dataset, demonstrate that DIVER significantly improves trajectory diversity, effectively addressing the mode collapse problem inherent in imitation learning.
△ Less
Submitted 5 July, 2025;
originally announced July 2025.
-
DynoStore: A wide-area distribution system for the management of data over heterogeneous storage
Authors:
Dante D. Sanchez-Gallegos,
J. L. Gonzalez-Compean,
Maxime Gonthier,
Valerie Hayot-Sasson,
J. Gregory Pauloski,
Haochen Pan,
Kyle Chard,
Jesus Carretero,
Ian Foster
Abstract:
Data distribution across different facilities offers benefits such as enhanced resource utilization, increased resilience through replication, and improved performance by processing data near its source. However, managing such data is challenging due to heterogeneous access protocols, disparate authentication models, and the lack of a unified coordination framework. This paper presents DynoStore,…
▽ More
Data distribution across different facilities offers benefits such as enhanced resource utilization, increased resilience through replication, and improved performance by processing data near its source. However, managing such data is challenging due to heterogeneous access protocols, disparate authentication models, and the lack of a unified coordination framework. This paper presents DynoStore, a system that manages data across heterogeneous storage systems. At the core of DynoStore are data containers, an abstraction that provides standardized interfaces for seamless data management, irrespective of the underlying storage systems. Multiple data container connections create a cohesive wide-area storage network, ensuring resilience using erasure coding policies. Furthermore, a load-balancing algorithm ensures equitable and efficient utilization of storage resources. We evaluate DynoStore using benchmarks and real-world case studies, including the management of medical and satellite data across geographically distributed environments. Our results demonstrate a 10\% performance improvement compared to centralized cloud-hosted systems while maintaining competitive performance with state-of-the-art solutions such as Redis and IPFS. DynoStore also exhibits superior fault tolerance, withstanding more failures than traditional systems.
△ Less
Submitted 1 July, 2025;
originally announced July 2025.
-
Generative AI-enhanced Low-Altitude UAV-Mounted Stacked Intelligent Metasurfaces
Authors:
Geng Sun,
Mingzhe Fan,
Lei Zhang,
Hongyang Pan,
Jiahui Li,
Chuang Zhang,
Linyao Li,
Changyuan Zhao,
Chau Yuen
Abstract:
Wireless communication systems face significant challenges in meeting the increasing demands for higher data rates and more reliable connectivity in complex environments. Stacked intelligent metasurfaces (SIMs) have emerged as a promising technology for realizing wave-domain signal processing, with mobile SIMs offering superior communication performance compared to their fixed counterparts. In thi…
▽ More
Wireless communication systems face significant challenges in meeting the increasing demands for higher data rates and more reliable connectivity in complex environments. Stacked intelligent metasurfaces (SIMs) have emerged as a promising technology for realizing wave-domain signal processing, with mobile SIMs offering superior communication performance compared to their fixed counterparts. In this paper, we investigate a novel unmanned aerial vehicle (UAV)-mounted SIMs (UAV-SIMs) assisted communication system within the low-altitude economy (LAE) networks paradigm, where UAVs function as both base stations that cache SIM-processed data and mobile platforms that flexibly deploy SIMs to enhance uplink communications from ground users. To maximize network capacity, we formulate a UAV-SIM-based joint optimization problem (USBJOP) that comprehensively addresses three critical aspects: the association between UAV-SIMs and users, the three-dimensional positioning of UAV-SIMs, and the phase shifts across multiple SIM layers. Due to the inherent non-convexity and NP-hardness of USBJOP, we decompose it into three sub-optimization problems, \textit{i.e.}, association between UAV-SIMs and users optimization problem (AUUOP), UAV location optimization problem (ULOP), and UAV-SIM phase shifts optimization problem (USPSOP), and solve them using an alternating optimization strategy. Specifically, we transform AUUOP and ULOP into convex forms solvable by the CVX tool, while addressing USPSOP through a generative artificial intelligence (GAI)-based hybrid optimization algorithm. Simulations demonstrate that our proposed approach significantly outperforms benchmark schemes, achieving approximately 1.5 times higher network capacity compared to suboptimal alternatives. Additionally, our proposed GAI method reduces the algorithm runtime by 10\% while maintaining solution quality.
△ Less
Submitted 29 June, 2025;
originally announced June 2025.
-
Federated Breast Cancer Detection Enhanced by Synthetic Ultrasound Image Augmentation
Authors:
Hongyi Pan,
Ziliang Hong,
Gorkem Durak,
Ziyue Xu,
Ulas Bagci
Abstract:
Federated learning (FL) has emerged as a promising paradigm for collaboratively training deep learning models across institutions without exchanging sensitive medical data. However, its effectiveness is often hindered by limited data availability and non-independent, identically distributed data across participating clients, which can degrade model performance and generalization. To address these…
▽ More
Federated learning (FL) has emerged as a promising paradigm for collaboratively training deep learning models across institutions without exchanging sensitive medical data. However, its effectiveness is often hindered by limited data availability and non-independent, identically distributed data across participating clients, which can degrade model performance and generalization. To address these challenges, we propose a generative AI based data augmentation framework that integrates synthetic image sharing into the federated training process for breast cancer diagnosis via ultrasound images. Specifically, we train two simple class-specific Deep Convolutional Generative Adversarial Networks: one for benign and one for malignant lesions. We then simulate a realistic FL setting using three publicly available breast ultrasound image datasets: BUSI, BUS-BRA, and UDIAT. FedAvg and FedProx are adopted as baseline FL algorithms. Experimental results show that incorporating a suitable number of synthetic images improved the average AUC from 0.9206 to 0.9237 for FedAvg and from 0.9429 to 0.9538 for FedProx. We also note that excessive use of synthetic data reduced performance, underscoring the importance of maintaining a balanced ratio of real and synthetic samples. Our findings highlight the potential of generative AI based data augmentation to enhance FL results in the breast ultrasound image classification task.
△ Less
Submitted 29 June, 2025;
originally announced June 2025.
-
Spatial frequency information fusion network for few-shot learning
Authors:
Wenqing Zhao,
Guojia Xie,
Han Pan,
Biao Yang,
Weichuan Zhang
Abstract:
The objective of Few-shot learning is to fully leverage the limited data resources for exploring the latent correlations within the data by applying algorithms and training a model with outstanding performance that can adequately meet the demands of practical applications. In practical applications, the number of images in each category is usually less than that in traditional deep learning, which…
▽ More
The objective of Few-shot learning is to fully leverage the limited data resources for exploring the latent correlations within the data by applying algorithms and training a model with outstanding performance that can adequately meet the demands of practical applications. In practical applications, the number of images in each category is usually less than that in traditional deep learning, which can lead to over-fitting and poor generalization performance. Currently, many Few-shot classification models pay more attention to spatial domain information while neglecting frequency domain information, which contains more feature information. Ignoring frequency domain information will prevent the model from fully exploiting feature information, which would effect the classification performance. Based on conventional data augmentation, this paper proposes an SFIFNet with innovative data preprocessing. The key of this method is enhancing the accuracy of image feature representation by integrating frequency domain information with spatial domain information. The experimental results demonstrate the effectiveness of this method in enhancing classification performance.
△ Less
Submitted 23 June, 2025;
originally announced June 2025.
-
Compiler-R1: Towards Agentic Compiler Auto-tuning with Reinforcement Learning
Authors:
Haolin Pan,
Hongyu Lin,
Haoran Luo,
Yang Liu,
Kaichun Yao,
Libo Zhang,
Mingjie Xing,
Yanjun Wu
Abstract:
Compiler auto-tuning optimizes pass sequences to improve performance metrics such as Intermediate Representation (IR) instruction count. Although recent advances leveraging Large Language Models (LLMs) have shown promise in automating compiler tuning, two significant challenges still remain: the absence of high-quality reasoning datasets for agents training, and limited effective interactions with…
▽ More
Compiler auto-tuning optimizes pass sequences to improve performance metrics such as Intermediate Representation (IR) instruction count. Although recent advances leveraging Large Language Models (LLMs) have shown promise in automating compiler tuning, two significant challenges still remain: the absence of high-quality reasoning datasets for agents training, and limited effective interactions with the compilation environment. In this work, we introduce Compiler-R1, the first reinforcement learning (RL)-driven framework specifically augmenting LLM capabilities for compiler auto-tuning. Compiler-R1 features a curated, high-quality reasoning dataset and a novel two-stage end-to-end RL training pipeline, enabling efficient environment exploration and learning through an outcome-based reward. Extensive experiments across seven datasets demonstrate Compiler-R1 achieving an average 8.46% IR instruction count reduction compared to opt -Oz, showcasing the strong potential of RL-trained LLMs for compiler optimization. Our code and datasets are publicly available at https://github.com/Panhaolin2001/Compiler-R1.
△ Less
Submitted 29 May, 2025;
originally announced June 2025.
-
EBS-CFL: Efficient and Byzantine-robust Secure Clustered Federated Learning
Authors:
Zhiqiang Li,
Haiyong Bao,
Menghong Guan,
Hao Pan,
Cheng Huang,
Hong-Ning Dai
Abstract:
Despite federated learning (FL)'s potential in collaborative learning, its performance has deteriorated due to the data heterogeneity of distributed users. Recently, clustered federated learning (CFL) has emerged to address this challenge by partitioning users into clusters according to their similarity. However, CFL faces difficulties in training when users are unwilling to share their cluster id…
▽ More
Despite federated learning (FL)'s potential in collaborative learning, its performance has deteriorated due to the data heterogeneity of distributed users. Recently, clustered federated learning (CFL) has emerged to address this challenge by partitioning users into clusters according to their similarity. However, CFL faces difficulties in training when users are unwilling to share their cluster identities due to privacy concerns. To address these issues, we present an innovative Efficient and Robust Secure Aggregation scheme for CFL, dubbed EBS-CFL. The proposed EBS-CFL supports effectively training CFL while maintaining users' cluster identity confidentially. Moreover, it detects potential poisonous attacks without compromising individual client gradients by discarding negatively correlated gradients and aggregating positively correlated ones using a weighted approach. The server also authenticates correct gradient encoding by clients. EBS-CFL has high efficiency with client-side overhead O(ml + m^2) for communication and O(m^2l) for computation, where m is the number of cluster identities, and l is the gradient size. When m = 1, EBS-CFL's computational efficiency of client is at least O(log n) times better than comparison schemes, where n is the number of clients.In addition, we validate the scheme through extensive experiments. Finally, we theoretically prove the scheme's security.
△ Less
Submitted 16 June, 2025;
originally announced June 2025.
-
NeuVAS: Neural Implicit Surfaces for Variational Shape Modeling
Authors:
Pengfei Wang,
Qiujie Dong,
Fangtian Liang,
Hao Pan,
Lei Yang,
Congyi Zhang,
Guying Lin,
Caiming Zhang,
Yuanfeng Zhou,
Changhe Tu,
Shiqing Xin,
Alla Sheffer,
Xin Li,
Wenping Wang
Abstract:
Neural implicit shape representation has drawn significant attention in recent years due to its smoothness, differentiability, and topological flexibility. However, directly modeling the shape of a neural implicit surface, especially as the zero-level set of a neural signed distance function (SDF), with sparse geometric control is still a challenging task. Sparse input shape control typically incl…
▽ More
Neural implicit shape representation has drawn significant attention in recent years due to its smoothness, differentiability, and topological flexibility. However, directly modeling the shape of a neural implicit surface, especially as the zero-level set of a neural signed distance function (SDF), with sparse geometric control is still a challenging task. Sparse input shape control typically includes 3D curve networks or, more generally, 3D curve sketches, which are unstructured and cannot be connected to form a curve network, and therefore more difficult to deal with. While 3D curve networks or curve sketches provide intuitive shape control, their sparsity and varied topology pose challenges in generating high-quality surfaces to meet such curve constraints. In this paper, we propose NeuVAS, a variational approach to shape modeling using neural implicit surfaces constrained under sparse input shape control, including unstructured 3D curve sketches as well as connected 3D curve networks. Specifically, we introduce a smoothness term based on a functional of surface curvatures to minimize shape variation of the zero-level set surface of a neural SDF. We also develop a new technique to faithfully model G0 sharp feature curves as specified in the input curve sketches. Comprehensive comparisons with the state-of-the-art methods demonstrate the significant advantages of our method.
△ Less
Submitted 15 June, 2025;
originally announced June 2025.
-
Noise-Resistant Label Reconstruction Feature Selection for Partial Multi-Label Learning
Authors:
Wanfu Gao,
Hanlin Pan,
Qingqi Han,
Kunpeng Liu
Abstract:
The "Curse of dimensionality" is prevalent across various data patterns, which increases the risk of model overfitting and leads to a decline in model classification performance. However, few studies have focused on this issue in Partial Multi-label Learning (PML), where each sample is associated with a set of candidate labels, at least one of which is correct. Existing PML methods addressing this…
▽ More
The "Curse of dimensionality" is prevalent across various data patterns, which increases the risk of model overfitting and leads to a decline in model classification performance. However, few studies have focused on this issue in Partial Multi-label Learning (PML), where each sample is associated with a set of candidate labels, at least one of which is correct. Existing PML methods addressing this problem are mainly based on the low-rank assumption. However, low-rank assumption is difficult to be satisfied in practical situations and may lead to loss of high-dimensional information. Furthermore, we find that existing methods have poor ability to identify positive labels, which is important in real-world scenarios. In this paper, a PML feature selection method is proposed considering two important characteristics of dataset: label relationship's noise-resistance and label connectivity. Our proposed method utilizes label relationship's noise-resistance to disambiguate labels. Then the learning process is designed through the reformed low-rank assumption. Finally, representative labels are found through label connectivity, and the weight matrix is reconstructed to select features with strong identification ability to these labels. The experimental results on benchmark datasets demonstrate the superiority of the proposed method.
△ Less
Submitted 5 June, 2025;
originally announced June 2025.
-
D-Rex: Heterogeneity-Aware Reliability Framework and Adaptive Algorithms for Distributed Storage
Authors:
Maxime Gonthier,
Dante D. Sanchez-Gallegos,
Haochen Pan,
Bogdan Nicolae,
Sicheng Zhou,
Hai Duc Nguyen,
Valerie Hayot-Sasson,
J. Gregory Pauloski,
Jesus Carretero,
Kyle Chard,
Ian Foster
Abstract:
The exponential growth of data necessitates distributed storage models, such as peer-to-peer systems and data federations. While distributed storage can reduce costs and increase reliability, the heterogeneity in storage capacity, I/O performance, and failure rates of storage resources makes their efficient use a challenge. Further, node failures are common and can lead to data unavailability and…
▽ More
The exponential growth of data necessitates distributed storage models, such as peer-to-peer systems and data federations. While distributed storage can reduce costs and increase reliability, the heterogeneity in storage capacity, I/O performance, and failure rates of storage resources makes their efficient use a challenge. Further, node failures are common and can lead to data unavailability and even data loss. Erasure coding is a common resiliency strategy implemented in storage systems to mitigate failures by striping data across storage locations. However, erasure coding is computationally expensive and existing systems do not consider the heterogeneous resources and their varied capacity and performance when placing data chunks. We tackle the challenges of using erasure coding with distributed and heterogeneous nodes, aiming to store as much data as possible, minimize encoding and decoding time, and meeting user-defined reliability requirements for each data item. We propose two new dynamic scheduling algorithms, D-Rex LB and D-Rex SC, that adaptively choose erasure coding parameters and map chunks to heterogeneous nodes. D-Rex SC achieves robust performance for both storage utilization and throughput, at a higher computational cost, while D-Rex LB is faster but with slightly less competitive performance. In addition, we propose two greedy algorithms, GreedyMinStorage and GreedyLeastUsed, that optimize for storage utilization and load balancing, respectively. Our experimental evaluation shows that our dynamic schedulers store, on average, 45% more data items without significantly degrading I/O throughput compared to state-of-the-art algorithms, while GreedyLeastUsed is able to store 21% more data items while also increasing throughput.
△ Less
Submitted 29 May, 2025;
originally announced June 2025.
-
Uncertainty-Aware Metabolic Stability Prediction with Dual-View Contrastive Learning
Authors:
Peijin Guo,
Minghui Li,
Hewen Pan,
Bowen Chen,
Yang Wu,
Zikang Guo,
Leo Yu Zhang,
Shengshan Hu,
Shengqing Hu
Abstract:
Accurate prediction of molecular metabolic stability (MS) is critical for drug research and development but remains challenging due to the complex interplay of molecular interactions. Despite recent advances in graph neural networks (GNNs) for MS prediction, current approaches face two critical limitations: (1) incomplete molecular modeling due to atom-centric message-passing mechanisms that disre…
▽ More
Accurate prediction of molecular metabolic stability (MS) is critical for drug research and development but remains challenging due to the complex interplay of molecular interactions. Despite recent advances in graph neural networks (GNNs) for MS prediction, current approaches face two critical limitations: (1) incomplete molecular modeling due to atom-centric message-passing mechanisms that disregard bond-level topological features, and (2) prediction frameworks that lack reliable uncertainty quantification. To address these challenges, we propose TrustworthyMS, a novel contrastive learning framework designed for uncertainty-aware metabolic stability prediction. First, a molecular graph topology remapping mechanism synchronizes atom-bond interactions through edge-induced feature propagation, capturing both localized electronic effects and global conformational constraints. Second, contrastive topology-bond alignment enforces consistency between molecular topology views and bond patterns via feature alignment, enhancing representation robustness. Third, uncertainty modeling through Beta-Binomial uncertainty quantification enables simultaneous prediction and confidence calibration under epistemic uncertainty. Through extensive experiments, our results demonstrate that TrustworthyMS outperforms current state-of-the-art methods in terms of predictive performance.
△ Less
Submitted 1 June, 2025;
originally announced June 2025.
-
LTM3D: Bridging Token Spaces for Conditional 3D Generation with Auto-Regressive Diffusion Framework
Authors:
Xin Kang,
Zihan Zheng,
Lei Chu,
Yue Gao,
Jiahao Li,
Hao Pan,
Xuejin Chen,
Yan Lu
Abstract:
We present LTM3D, a Latent Token space Modeling framework for conditional 3D shape generation that integrates the strengths of diffusion and auto-regressive (AR) models. While diffusion-based methods effectively model continuous latent spaces and AR models excel at capturing inter-token dependencies, combining these paradigms for 3D shape generation remains a challenge. To address this, LTM3D feat…
▽ More
We present LTM3D, a Latent Token space Modeling framework for conditional 3D shape generation that integrates the strengths of diffusion and auto-regressive (AR) models. While diffusion-based methods effectively model continuous latent spaces and AR models excel at capturing inter-token dependencies, combining these paradigms for 3D shape generation remains a challenge. To address this, LTM3D features a Conditional Distribution Modeling backbone, leveraging a masked autoencoder and a diffusion model to enhance token dependency learning. Additionally, we introduce Prefix Learning, which aligns condition tokens with shape latent tokens during generation, improving flexibility across modalities. We further propose a Latent Token Reconstruction module with Reconstruction-Guided Sampling to reduce uncertainty and enhance structural fidelity in generated shapes. Our approach operates in token space, enabling support for multiple 3D representations, including signed distance fields, point clouds, meshes, and 3D Gaussian Splatting. Extensive experiments on image- and text-conditioned shape generation tasks demonstrate that LTM3D outperforms existing methods in prompt fidelity and structural accuracy while offering a generalizable framework for multi-modal, multi-representation 3D generation.
△ Less
Submitted 30 May, 2025;
originally announced May 2025.
-
Graph Random Walk with Feature-Label Space Alignment: A Multi-Label Feature Selection Method
Authors:
Wanfu Gao,
Jun Gao,
Qingqi Han,
Hanlin Pan,
Kunpeng Liu
Abstract:
The rapid growth in feature dimension may introduce implicit associations between features and labels in multi-label datasets, making the relationships between features and labels increasingly complex. Moreover, existing methods often adopt low-dimensional linear decomposition to explore the associations between features and labels. However, linear decomposition struggles to capture complex nonlin…
▽ More
The rapid growth in feature dimension may introduce implicit associations between features and labels in multi-label datasets, making the relationships between features and labels increasingly complex. Moreover, existing methods often adopt low-dimensional linear decomposition to explore the associations between features and labels. However, linear decomposition struggles to capture complex nonlinear associations and may lead to misalignment between the feature space and the label space. To address these two critical challenges, we propose innovative solutions. First, we design a random walk graph that integrates feature-feature, label-label, and feature-label relationships to accurately capture nonlinear and implicit indirect associations, while optimizing the latent representations of associations between features and labels after low-rank decomposition. Second, we align the variable spaces by leveraging low-dimensional representation coefficients, while preserving the manifold structure between the original high-dimensional multi-label data and the low-dimensional representation space. Extensive experiments and ablation studies conducted on seven benchmark datasets and three representative datasets using various evaluation metrics demonstrate the superiority of the proposed method\footnote{Code: https://github.com/Heilong623/-GRW-}.
△ Less
Submitted 29 May, 2025;
originally announced May 2025.
-
Human-Centered Human-AI Collaboration (HCHAC)
Authors:
Qi Gao,
Wei Xu,
Hanxi Pan,
Mowei Shen,
Zaifeng Gao
Abstract:
In the intelligent era, the interaction between humans and intelligent systems fundamentally involves collaboration with autonomous intelligent agents. Human-AI Collaboration (HAC) represents a novel type of human-machine relationship facilitated by autonomous intelligent machines equipped with AI technologies. In this paradigm, AI agents serve not only as auxiliary tools but also as active teamma…
▽ More
In the intelligent era, the interaction between humans and intelligent systems fundamentally involves collaboration with autonomous intelligent agents. Human-AI Collaboration (HAC) represents a novel type of human-machine relationship facilitated by autonomous intelligent machines equipped with AI technologies. In this paradigm, AI agents serve not only as auxiliary tools but also as active teammates, partnering with humans to accomplish tasks collaboratively. Human-centered AI (HCAI) emphasizes that humans play critical leadership roles in the collaboration. This human-led collaboration imparts new dimensions to the human-machine relationship, necessitating innovative research perspectives, paradigms, and agenda to address the unique challenges posed by HAC. This chapter delves into the essence of HAC from the human-centered perspective, outlining its core concepts and distinguishing features. It reviews the current research methodologies and research agenda within the HAC field from the HCAI perspective, highlighting advancements and ongoing studies. Furthermore, a framework for human-centered HAC (HCHAC) is proposed by integrating these reviews and analyses. A case study of HAC in the context of autonomous vehicles is provided, illustrating practical applications and the synergistic interactions between humans and AI agents. Finally, it identifies potential future research directions aimed at enhancing the effectiveness, reliability, and ethical integration of human-centered HAC systems in diverse domains.
△ Less
Submitted 28 May, 2025;
originally announced May 2025.
-
Wideband RF Radiance Field Modeling Using Frequency-embedded 3D Gaussian Splatting
Authors:
Zechen Li,
Lanqing Yang,
Yiheng Bian,
Hao Pan,
Yongjian Fu,
Yezhou Wang,
Yi-Chao Chen,
Guangtao Xue,
Ju Ren
Abstract:
This paper presents an innovative frequency-embedded 3D Gaussian splatting (3DGS) algorithm for wideband radio-frequency (RF) radiance field modeling, offering an advancement over the existing works limited to single-frequency modeling. Grounded in fundamental physics, we uncover the complex relationship between EM wave propagation behaviors and RF frequencies. Inspired by this, we design an EM fe…
▽ More
This paper presents an innovative frequency-embedded 3D Gaussian splatting (3DGS) algorithm for wideband radio-frequency (RF) radiance field modeling, offering an advancement over the existing works limited to single-frequency modeling. Grounded in fundamental physics, we uncover the complex relationship between EM wave propagation behaviors and RF frequencies. Inspired by this, we design an EM feature network with attenuation and radiance modules to learn the complex relationships between RF frequencies and the key properties of each 3D Gaussian, specifically the attenuation factor and RF signal intensity. By training the frequency-embedded 3DGS model, we can efficiently reconstruct RF radiance fields at arbitrary unknown frequencies within a given 3D environment. Finally, we propose a large-scale power angular spectrum (PAS) dataset containing 50000 samples ranging from 1 to 100 GHz in 6 indoor environments, and conduct extensive experiments to verify the effectiveness of our method. Our approach achieves an average Structural Similarity Index Measure (SSIM) up to 0.72, and a significant improvement up to 17.8% compared to the current state-of-the-art (SOTA) methods trained on individual test frequencies. Additionally, our method achieves an SSIM of 0.70 without prior training on these frequencies, which represents only a 2.8% performance drop compared to models trained with full PAS data. This demonstrates our model's capability to estimate PAS at unknown frequencies. For related code and datasets, please refer to https://github.com/sim-2-real/Wideband3DGS.
△ Less
Submitted 27 May, 2025;
originally announced May 2025.
-
Chordless Structure: A Pathway to Simple and Expressive GNNs
Authors:
Hongxu Pan,
Shuxian Hu,
Mo Zhou,
Zhibin Wang,
Rong Gu,
Chen Tian,
Kun Yang,
Sheng Zhong
Abstract:
Researchers have proposed various methods of incorporating more structured information into the design of Graph Neural Networks (GNNs) to enhance their expressiveness. However, these methods are either computationally expensive or lacking in provable expressiveness. In this paper, we observe that the chords increase the complexity of the graph structure while contributing little useful information…
▽ More
Researchers have proposed various methods of incorporating more structured information into the design of Graph Neural Networks (GNNs) to enhance their expressiveness. However, these methods are either computationally expensive or lacking in provable expressiveness. In this paper, we observe that the chords increase the complexity of the graph structure while contributing little useful information in many cases. In contrast, chordless structures are more efficient and effective for representing the graph. Therefore, when leveraging the information of cycles, we choose to omit the chords. Accordingly, we propose a Chordless Structure-based Graph Neural Network (CSGNN) and prove that its expressiveness is strictly more powerful than the k-hop GNN (KPGNN) with polynomial complexity. Experimental results on real-world datasets demonstrate that CSGNN outperforms existing GNNs across various graph tasks while incurring lower computational costs and achieving better performance than the GNNs of 3-WL expressiveness.
△ Less
Submitted 25 May, 2025;
originally announced May 2025.
-
Performance Guaranteed Poisoning Attacks in Federated Learning: A Sliding Mode Approach
Authors:
Huazi Pan,
Yanjun Zhang,
Leo Yu Zhang,
Scott Adams,
Abbas Kouzani,
Suiyang Khoo
Abstract:
Manipulation of local training data and local updates, i.e., the poisoning attack, is the main threat arising from the collaborative nature of the federated learning (FL) paradigm. Most existing poisoning attacks aim to manipulate local data/models in a way that causes denial-of-service (DoS) issues. In this paper, we introduce a novel attack method, named Federated Learning Sliding Attack (FedSA)…
▽ More
Manipulation of local training data and local updates, i.e., the poisoning attack, is the main threat arising from the collaborative nature of the federated learning (FL) paradigm. Most existing poisoning attacks aim to manipulate local data/models in a way that causes denial-of-service (DoS) issues. In this paper, we introduce a novel attack method, named Federated Learning Sliding Attack (FedSA) scheme, aiming at precisely introducing the extent of poisoning in a subtle controlled manner. It operates with a predefined objective, such as reducing global model's prediction accuracy by 10%. FedSA integrates robust nonlinear control-Sliding Mode Control (SMC) theory with model poisoning attacks. It can manipulate the updates from malicious clients to drive the global model towards a compromised state, achieving this at a controlled and inconspicuous rate. Additionally, leveraging the robust control properties of FedSA allows precise control over the convergence bounds, enabling the attacker to set the global accuracy of the poisoned model to any desired level. Experimental results demonstrate that FedSA can accurately achieve a predefined global accuracy with fewer malicious clients while maintaining a high level of stealth and adjustable learning rates.
△ Less
Submitted 28 May, 2025; v1 submitted 22 May, 2025;
originally announced May 2025.
-
R&D-Agent: Automating Data-Driven AI Solution Building Through LLM-Powered Automated Research, Development, and Evolution
Authors:
Xu Yang,
Xiao Yang,
Shikai Fang,
Bowen Xian,
Yuante Li,
Jian Wang,
Minrui Xu,
Haoran Pan,
Xinpeng Hong,
Weiqing Liu,
Yelong Shen,
Weizhu Chen,
Jiang Bian
Abstract:
Recent advances in AI and ML have transformed data science, yet increasing complexity and expertise requirements continue to hinder progress. While crowdsourcing platforms alleviate some challenges, high-level data science tasks remain labor-intensive and iterative. To overcome these limitations, we introduce R&D-Agent, a dual-agent framework for iterative exploration. The Researcher agent uses pe…
▽ More
Recent advances in AI and ML have transformed data science, yet increasing complexity and expertise requirements continue to hinder progress. While crowdsourcing platforms alleviate some challenges, high-level data science tasks remain labor-intensive and iterative. To overcome these limitations, we introduce R&D-Agent, a dual-agent framework for iterative exploration. The Researcher agent uses performance feedback to generate ideas, while the Developer agent refines code based on error feedback. By enabling multiple parallel exploration traces that merge and enhance one another, R&D-Agent narrows the gap between automated solutions and expert-level performance. Evaluated on MLE-Bench, R&D-Agent emerges as the top-performing machine learning engineering agent, demonstrating its potential to accelerate innovation and improve precision across diverse data science applications. We have open-sourced R&D-Agent on GitHub: https://github.com/microsoft/RD-Agent.
△ Less
Submitted 20 May, 2025;
originally announced May 2025.
-
MAGI-1: Autoregressive Video Generation at Scale
Authors:
Sand. ai,
Hansi Teng,
Hongyu Jia,
Lei Sun,
Lingzhi Li,
Maolin Li,
Mingqiu Tang,
Shuai Han,
Tianning Zhang,
W. Q. Zhang,
Weifeng Luo,
Xiaoyang Kang,
Yuchen Sun,
Yue Cao,
Yunpeng Huang,
Yutong Lin,
Yuxin Fang,
Zewei Tao,
Zheng Zhang,
Zhongshu Wang,
Zixun Liu,
Dai Shi,
Guoli Su,
Hanwen Sun,
Hong Pan
, et al. (14 additional authors not shown)
Abstract:
We present MAGI-1, a world model that generates videos by autoregressively predicting a sequence of video chunks, defined as fixed-length segments of consecutive frames. Trained to denoise per-chunk noise that increases monotonically over time, MAGI-1 enables causal temporal modeling and naturally supports streaming generation. It achieves strong performance on image-to-video (I2V) tasks condition…
▽ More
We present MAGI-1, a world model that generates videos by autoregressively predicting a sequence of video chunks, defined as fixed-length segments of consecutive frames. Trained to denoise per-chunk noise that increases monotonically over time, MAGI-1 enables causal temporal modeling and naturally supports streaming generation. It achieves strong performance on image-to-video (I2V) tasks conditioned on text instructions, providing high temporal consistency and scalability, which are made possible by several algorithmic innovations and a dedicated infrastructure stack. MAGI-1 facilitates controllable generation via chunk-wise prompting and supports real-time, memory-efficient deployment by maintaining constant peak inference cost, regardless of video length. The largest variant of MAGI-1 comprises 24 billion parameters and supports context lengths of up to 4 million tokens, demonstrating the scalability and robustness of our approach. The code and models are available at https://github.com/SandAI-org/MAGI-1 and https://github.com/SandAI-org/MagiAttention. The product can be accessed at https://sand.ai.
△ Less
Submitted 19 May, 2025;
originally announced May 2025.
-
Dual-Agent Reinforcement Learning for Automated Feature Generation
Authors:
Wanfu Gao,
Zengyao Man,
Hanlin Pan,
Kunpeng Liu
Abstract:
Feature generation involves creating new features from raw data to capture complex relationships among the original features, improving model robustness and machine learning performance. Current methods using reinforcement learning for feature generation have made feature exploration more flexible and efficient. However, several challenges remain: first, during feature expansion, a large number of…
▽ More
Feature generation involves creating new features from raw data to capture complex relationships among the original features, improving model robustness and machine learning performance. Current methods using reinforcement learning for feature generation have made feature exploration more flexible and efficient. However, several challenges remain: first, during feature expansion, a large number of redundant features are generated. When removing them, current methods only retain the best features each round, neglecting those that perform poorly initially but could improve later. Second, the state representation used by current methods fails to fully capture complex feature relationships. Third, there are significant differences between discrete and continuous features in tabular data, requiring different operations for each type. To address these challenges, we propose a novel dual-agent reinforcement learning method for feature generation. Two agents are designed: the first generates new features, and the second determines whether they should be preserved. A self-attention mechanism enhances state representation, and diverse operations distinguish interactions between discrete and continuous features. The experimental results on multiple datasets demonstrate that the proposed method is effective. The code is available at https://github.com/extess0/DARL.
△ Less
Submitted 18 May, 2025;
originally announced May 2025.
-
ForgeEDA: A Comprehensive Multimodal Dataset for Advancing EDA
Authors:
Zhengyuan Shi,
Zeju Li,
Chengyu Ma,
Yunhao Zhou,
Ziyang Zheng,
Jiawei Liu,
Hongyang Pan,
Lingfeng Zhou,
Kezhi Li,
Jiaying Zhu,
Lingwei Yan,
Zhiqiang He,
Chenhao Xue,
Wentao Jiang,
Fan Yang,
Guangyu Sun,
Xiaoyan Yang,
Gang Chen,
Chuan Shi,
Zhufei Chu,
Jun Yang,
Qiang Xu
Abstract:
We introduce ForgeEDA, an open-source comprehensive circuit dataset across various categories. ForgeEDA includes diverse circuit representations such as Register Transfer Level (RTL) code, Post-mapping (PM) netlists, And-Inverter Graphs (AIGs), and placed netlists, enabling comprehensive analysis and development. We demonstrate ForgeEDA's utility by benchmarking state-of-the-art EDA algorithms on…
▽ More
We introduce ForgeEDA, an open-source comprehensive circuit dataset across various categories. ForgeEDA includes diverse circuit representations such as Register Transfer Level (RTL) code, Post-mapping (PM) netlists, And-Inverter Graphs (AIGs), and placed netlists, enabling comprehensive analysis and development. We demonstrate ForgeEDA's utility by benchmarking state-of-the-art EDA algorithms on critical tasks such as Power, Performance, and Area (PPA) optimization, highlighting its ability to expose performance gaps and drive advancements. Additionally, ForgeEDA's scale and diversity facilitate the training of AI models for EDA tasks, demonstrating its potential to improve model performance and generalization. By addressing limitations in existing datasets, ForgeEDA aims to catalyze breakthroughs in modern IC design and support the next generation of innovations in EDA.
△ Less
Submitted 4 May, 2025;
originally announced May 2025.
-
MDD-LLM: Towards Accuracy Large Language Models for Major Depressive Disorder Diagnosis
Authors:
Yuyang Sha,
Hongxin Pan,
Wei Xu,
Weiyu Meng,
Gang Luo,
Xinyu Du,
Xiaobing Zhai,
Henry H. Y. Tong,
Caijuan Shi,
Kefeng Li
Abstract:
Major depressive disorder (MDD) impacts more than 300 million people worldwide, highlighting a significant public health issue. However, the uneven distribution of medical resources and the complexity of diagnostic methods have resulted in inadequate attention to this disorder in numerous countries and regions. This paper introduces a high-performance MDD diagnosis tool named MDD-LLM, an AI-driven…
▽ More
Major depressive disorder (MDD) impacts more than 300 million people worldwide, highlighting a significant public health issue. However, the uneven distribution of medical resources and the complexity of diagnostic methods have resulted in inadequate attention to this disorder in numerous countries and regions. This paper introduces a high-performance MDD diagnosis tool named MDD-LLM, an AI-driven framework that utilizes fine-tuned large language models (LLMs) and extensive real-world samples to tackle challenges in MDD diagnosis. Therefore, we select 274,348 individual information from the UK Biobank cohort to train and evaluate the proposed method. Specifically, we select 274,348 individual records from the UK Biobank cohort and design a tabular data transformation method to create a large corpus for training and evaluating the proposed approach. To illustrate the advantages of MDD-LLM, we perform comprehensive experiments and provide several comparative analyses against existing model-based solutions across multiple evaluation metrics. Experimental results show that MDD-LLM (70B) achieves an accuracy of 0.8378 and an AUC of 0.8919 (95% CI: 0.8799 - 0.9040), significantly outperforming existing machine learning and deep learning frameworks for MDD diagnosis. Given the limited exploration of LLMs in MDD diagnosis, we examine numerous factors that may influence the performance of our proposed method, such as tabular data transformation techniques and different fine-tuning strategies.
△ Less
Submitted 28 April, 2025;
originally announced May 2025.
-
Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning
Authors:
ByteDance Seed,
:,
Jiaze Chen,
Tiantian Fan,
Xin Liu,
Lingjun Liu,
Zhiqi Lin,
Mingxuan Wang,
Chengyi Wang,
Xiangpeng Wei,
Wenyuan Xu,
Yufeng Yuan,
Yu Yue,
Lin Yan,
Qiying Yu,
Xiaochen Zuo,
Chi Zhang,
Ruofei Zhu,
Zhecheng An,
Zhihao Bai,
Yu Bao,
Xingyan Bin,
Jiangjie Chen,
Feng Chen,
Hongmin Chen
, et al. (249 additional authors not shown)
Abstract:
We introduce Seed1.5-Thinking, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks. Seed1.5-Thinking achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains. For in…
▽ More
We introduce Seed1.5-Thinking, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks. Seed1.5-Thinking achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains. For instance, it surpasses DeepSeek R1 by 8% in win rate on non-reasoning tasks, indicating its broader applicability. Compared to other state-of-the-art reasoning models, Seed1.5-Thinking is a Mixture-of-Experts (MoE) model with a relatively small size, featuring 20B activated and 200B total parameters. As part of our effort to assess generalized reasoning, we develop two internal benchmarks, BeyondAIME and Codeforces, both of which will be publicly released to support future research. Model trial link: https://www.volcengine.com/experience/ark.
△ Less
Submitted 29 April, 2025; v1 submitted 10 April, 2025;
originally announced April 2025.
-
Mixed Structural Choice Operator: Enhancing Technology Mapping with Heterogeneous Representations
Authors:
Zhang Hu,
Hongyang Pan,
Yinshui Xia,
Lunyao Wang,
Zhufei Chu
Abstract:
The independence of logic optimization and technology mapping poses a significant challenge in achieving high-quality synthesis results. Recent studies have improved optimization outcomes through collaborative optimization of multiple logic representations and have improved structural bias through structural choices. However, these methods still rely on technology-independent optimization and fail…
▽ More
The independence of logic optimization and technology mapping poses a significant challenge in achieving high-quality synthesis results. Recent studies have improved optimization outcomes through collaborative optimization of multiple logic representations and have improved structural bias through structural choices. However, these methods still rely on technology-independent optimization and fail to truly resolve structural bias issues. This paper proposes a scalable and efficient framework based on Mixed Structural Choices (MCH). This is a novel heterogeneous mapping method that combines multiple logic representations with technology-aware optimization. MCH flexibly integrates different logic representations and stores candidates for various optimization strategies. By comprehensively evaluating the technology costs of these candidates, it enhances technology mapping and addresses structural bias issues in logic synthesis. Notably, the MCH-based lookup table (LUT) mapping algorithm set new records in the EPFL Best Results Challenge by combining the structural strengths of both And-Inverter Graph (AIG) and XOR-Majority Graph (XMG) logic representations. Additionally, MCH-based ASIC technology mapping achieves a 3.73% area and 8.94% delay reduction (balanced), 20.35% delay reduction (delay-oriented), and 21.02% area reduction (area-oriented), outperforming traditional structural choice methods. Furthermore, MCH-based logic optimization utilizes diverse structures to surpass local optima and achieve better results.
△ Less
Submitted 17 April, 2025;
originally announced April 2025.
-
Two Tasks, One Goal: Uniting Motion and Planning for Excellent End To End Autonomous Driving Performance
Authors:
Lin Liu,
Ziying Song,
Hongyu Pan,
Lei Yang,
Caiyan Jia
Abstract:
End-to-end autonomous driving has made impressive progress in recent years. Former end-to-end autonomous driving approaches often decouple planning and motion tasks, treating them as separate modules. This separation overlooks the potential benefits that planning can gain from learning out-of-distribution data encountered in motion tasks. However, unifying these tasks poses significant challenges,…
▽ More
End-to-end autonomous driving has made impressive progress in recent years. Former end-to-end autonomous driving approaches often decouple planning and motion tasks, treating them as separate modules. This separation overlooks the potential benefits that planning can gain from learning out-of-distribution data encountered in motion tasks. However, unifying these tasks poses significant challenges, such as constructing shared contextual representations and handling the unobservability of other vehicles' states. To address these challenges, we propose TTOG, a novel two-stage trajectory generation framework. In the first stage, a diverse set of trajectory candidates is generated, while the second stage focuses on refining these candidates through vehicle state information. To mitigate the issue of unavailable surrounding vehicle states, TTOG employs a self-vehicle data-trained state estimator, subsequently extended to other vehicles. Furthermore, we introduce ECSA (equivariant context-sharing scene adapter) to enhance the generalization of scene representations across different agents. Experimental results demonstrate that TTOG achieves state-of-the-art performance across both planning and motion tasks. Notably, on the challenging open-loop nuScenes dataset, TTOG reduces the L2 distance by 36.06\%. Furthermore, on the closed-loop Bench2Drive dataset, our approach achieves a 22\% improvement in the driving score (DS), significantly outperforming existing baselines.
△ Less
Submitted 17 April, 2025;
originally announced April 2025.
-
VideoAds for Fast-Paced Video Understanding: Where Opensource Foundation Models Beat GPT-4o & Gemini-1.5 Pro
Authors:
Zheyuan Zhang,
Monica Dou,
Linkai Peng,
Hongyi Pan,
Ulas Bagci,
Boqing Gong
Abstract:
Advertisement videos serve as a rich and valuable source of purpose-driven information, encompassing high-quality visual, textual, and contextual cues designed to engage viewers. They are often more complex than general videos of similar duration due to their structured narratives and rapid scene transitions, posing significant challenges to multi-modal large language models (MLLMs). In this work,…
▽ More
Advertisement videos serve as a rich and valuable source of purpose-driven information, encompassing high-quality visual, textual, and contextual cues designed to engage viewers. They are often more complex than general videos of similar duration due to their structured narratives and rapid scene transitions, posing significant challenges to multi-modal large language models (MLLMs). In this work, we introduce VideoAds, the first dataset tailored for benchmarking the performance of MLLMs on advertisement videos. VideoAds comprises well-curated advertisement videos with complex temporal structures, accompanied by \textbf{manually} annotated diverse questions across three core tasks: visual finding, video summary, and visual reasoning. We propose a quantitative measure to compare VideoAds against existing benchmarks in terms of video complexity. Through extensive experiments, we find that Qwen2.5-VL-72B, an opensource MLLM, achieves 73.35\% accuracy on VideoAds, outperforming GPT-4o (66.82\%) and Gemini-1.5 Pro (69.66\%); the two proprietary models especially fall behind the opensource model in video summarization and reasoning, but perform the best in visual finding. Notably, human experts easily achieve a remarkable accuracy of 94.27\%. These results underscore the necessity of advancing MLLMs' temporal modeling capabilities and highlight VideoAds as a potentially pivotal benchmark for future research in understanding video that requires high FPS sampling. The dataset and evaluation code will be publicly available at https://videoadsbenchmark.netlify.app.
△ Less
Submitted 12 April, 2025;
originally announced April 2025.
-
Neuron-level Balance between Stability and Plasticity in Deep Reinforcement Learning
Authors:
Jiahua Lan,
Sen Zhang,
Haixia Pan,
Ruijun Liu,
Li Shen,
Dacheng Tao
Abstract:
In contrast to the human ability to continuously acquire knowledge, agents struggle with the stability-plasticity dilemma in deep reinforcement learning (DRL), which refers to the trade-off between retaining existing skills (stability) and learning new knowledge (plasticity). Current methods focus on balancing these two aspects at the network level, lacking sufficient differentiation and fine-grai…
▽ More
In contrast to the human ability to continuously acquire knowledge, agents struggle with the stability-plasticity dilemma in deep reinforcement learning (DRL), which refers to the trade-off between retaining existing skills (stability) and learning new knowledge (plasticity). Current methods focus on balancing these two aspects at the network level, lacking sufficient differentiation and fine-grained control of individual neurons. To overcome this limitation, we propose Neuron-level Balance between Stability and Plasticity (NBSP) method, by taking inspiration from the observation that specific neurons are strongly relevant to task-relevant skills. Specifically, NBSP first (1) defines and identifies RL skill neurons that are crucial for knowledge retention through a goal-oriented method, and then (2) introduces a framework by employing gradient masking and experience replay techniques targeting these neurons to preserve the encoded existing skills while enabling adaptation to new tasks. Numerous experimental results on the Meta-World and Atari benchmarks demonstrate that NBSP significantly outperforms existing approaches in balancing stability and plasticity.
△ Less
Submitted 9 April, 2025;
originally announced April 2025.
-
DynClean: Training Dynamics-based Label Cleaning for Distantly-Supervised Named Entity Recognition
Authors:
Qi Zhang,
Huitong Pan,
Zhijia Chen,
Longin Jan Latecki,
Cornelia Caragea,
Eduard Dragut
Abstract:
Distantly Supervised Named Entity Recognition (DS-NER) has attracted attention due to its scalability and ability to automatically generate labeled data. However, distant annotation introduces many mislabeled instances, limiting its performance. Most of the existing work attempt to solve this problem by developing intricate models to learn from the noisy labels. An alternative approach is to attem…
▽ More
Distantly Supervised Named Entity Recognition (DS-NER) has attracted attention due to its scalability and ability to automatically generate labeled data. However, distant annotation introduces many mislabeled instances, limiting its performance. Most of the existing work attempt to solve this problem by developing intricate models to learn from the noisy labels. An alternative approach is to attempt to clean the labeled data, thus increasing the quality of distant labels. This approach has received little attention for NER. In this paper, we propose a training dynamics-based label cleaning approach, which leverages the behavior of a model as training progresses to characterize the distantly annotated samples. We also introduce an automatic threshold estimation strategy to locate the errors in distant labels. Extensive experimental results demonstrate that: (1) models trained on our cleaned DS-NER datasets, which were refined by directly removing identified erroneous annotations, achieve significant improvements in F1-score, ranging from 3.18% to 8.95%; and (2) our method outperforms numerous advanced DS-NER approaches across four datasets.
△ Less
Submitted 6 April, 2025;
originally announced April 2025.
-
DualMS: Implicit Dual-Channel Minimal Surface Optimization for Heat Exchanger Design
Authors:
Weizheng Zhang,
Hao Pan,
Lin Lu,
Xiaowei Duan,
Xin Yan,
Ruonan Wang,
Qiang Du
Abstract:
Heat exchangers are critical components in a wide range of engineering applications, from energy systems to chemical processing, where efficient thermal management is essential. The design objectives for heat exchangers include maximizing the heat exchange rate while minimizing the pressure drop, requiring both a large interface area and a smooth internal structure. State-of-the-art designs, such…
▽ More
Heat exchangers are critical components in a wide range of engineering applications, from energy systems to chemical processing, where efficient thermal management is essential. The design objectives for heat exchangers include maximizing the heat exchange rate while minimizing the pressure drop, requiring both a large interface area and a smooth internal structure. State-of-the-art designs, such as triply periodic minimal surfaces (TPMS), have proven effective in optimizing heat exchange efficiency. However, TPMS designs are constrained by predefined mathematical equations, limiting their adaptability to freeform boundary shapes. Additionally, TPMS structures do not inherently control flow directions, which can lead to flow stagnation and undesirable pressure drops.
This paper presents DualMS, a novel computational framework for optimizing dual-channel minimal surfaces specifically for heat exchanger designs in freeform shapes. To the best of our knowledge, this is the first attempt to directly optimize minimal surfaces for two-fluid heat exchangers, rather than relying on TPMS. Our approach formulates the heat exchange maximization problem as a constrained connected maximum cut problem on a graph, with flow constraints guiding the optimization process. To address undesirable pressure drops, we model the minimal surface as a classification boundary separating the two fluids, incorporating an additional regularization term for area minimization. We employ a neural network that maps spatial points to binary flow types, enabling it to classify flow skeletons and automatically determine the surface boundary. DualMS demonstrates greater flexibility in surface topology compared to TPMS and achieves superior thermal performance, with lower pressure drops while maintaining a similar heat exchange rate under the same material cost.
△ Less
Submitted 19 May, 2025; v1 submitted 2 March, 2025;
originally announced April 2025.
-
Parsing Through Boundaries in Chinese Word Segmentation
Authors:
Yige Chen,
Zelong Li,
Cindy Zhang,
Changbing Yang,
Amandisa Cady,
Ai Ka Lee,
Zejiao Zeng,
Eunkyul Leah Jo,
Haihua Pan,
Jungyeul Park
Abstract:
Chinese word segmentation is a foundational task in natural language processing (NLP), with far-reaching effects on syntactic analysis. Unlike alphabetic languages like English, Chinese lacks explicit word boundaries, making segmentation both necessary and inherently ambiguous. This study highlights the intricate relationship between word segmentation and syntactic parsing, providing a clearer und…
▽ More
Chinese word segmentation is a foundational task in natural language processing (NLP), with far-reaching effects on syntactic analysis. Unlike alphabetic languages like English, Chinese lacks explicit word boundaries, making segmentation both necessary and inherently ambiguous. This study highlights the intricate relationship between word segmentation and syntactic parsing, providing a clearer understanding of how different segmentation strategies shape dependency structures in Chinese. Focusing on the Chinese GSD treebank, we analyze multiple word boundary schemes, each reflecting distinct linguistic and computational assumptions, and examine how they influence the resulting syntactic structures. To support detailed comparison, we introduce an interactive web-based visualization tool that displays parsing outcomes across segmentation methods.
△ Less
Submitted 4 July, 2025; v1 submitted 29 March, 2025;
originally announced March 2025.
-
MMGen: Unified Multi-modal Image Generation and Understanding in One Go
Authors:
Jiepeng Wang,
Zhaoqing Wang,
Hao Pan,
Yuan Liu,
Dongdong Yu,
Changhu Wang,
Wenping Wang
Abstract:
A unified diffusion framework for multi-modal generation and understanding has the transformative potential to achieve seamless and controllable image diffusion and other cross-modal tasks. In this paper, we introduce MMGen, a unified framework that integrates multiple generative tasks into a single diffusion model. This includes: (1) multi-modal category-conditioned generation, where multi-modal…
▽ More
A unified diffusion framework for multi-modal generation and understanding has the transformative potential to achieve seamless and controllable image diffusion and other cross-modal tasks. In this paper, we introduce MMGen, a unified framework that integrates multiple generative tasks into a single diffusion model. This includes: (1) multi-modal category-conditioned generation, where multi-modal outputs are generated simultaneously through a single inference process, given category information; (2) multi-modal visual understanding, which accurately predicts depth, surface normals, and segmentation maps from RGB images; and (3) multi-modal conditioned generation, which produces corresponding RGB images based on specific modality conditions and other aligned modalities. Our approach develops a novel diffusion transformer that flexibly supports multi-modal output, along with a simple modality-decoupling strategy to unify various tasks. Extensive experiments and applications demonstrate the effectiveness and superiority of MMGen across diverse tasks and conditions, highlighting its potential for applications that require simultaneous generation and understanding.
△ Less
Submitted 26 March, 2025;
originally announced March 2025.
-
Image-to-Text for Medical Reports Using Adaptive Co-Attention and Triple-LSTM Module
Authors:
Yishen Liu,
Shengda Liu,
Hudan Pan
Abstract:
Medical report generation requires specialized expertise that general large models often fail to accurately capture. Moreover, the inherent repetition and similarity in medical data make it difficult for models to extract meaningful features, resulting in a tendency to overfit. So in this paper, we propose a multimodal model, Co-Attention Triple-LSTM Network (CA-TriNet), a deep learning model that…
▽ More
Medical report generation requires specialized expertise that general large models often fail to accurately capture. Moreover, the inherent repetition and similarity in medical data make it difficult for models to extract meaningful features, resulting in a tendency to overfit. So in this paper, we propose a multimodal model, Co-Attention Triple-LSTM Network (CA-TriNet), a deep learning model that combines transformer architectures with a Multi-LSTM network. Its Co-Attention module synergistically links a vision transformer with a text transformer to better differentiate medical images with similarities, augmented by an adaptive weight operator to catch and amplify image labels with minor similarities. Furthermore, its Triple-LSTM module refines generated sentences using targeted image objects. Extensive evaluations over three public datasets have demonstrated that CA-TriNet outperforms state-of-the-art models in terms of comprehensive ability, even pre-trained large language models on some metrics.
△ Less
Submitted 27 March, 2025; v1 submitted 23 March, 2025;
originally announced March 2025.
-
Multi-Modality Representation Learning for Antibody-Antigen Interactions Prediction
Authors:
Peijin Guo,
Minghui Li,
Hewen Pan,
Ruixiang Huang,
Lulu Xue,
Shengqing Hu,
Zikang Guo,
Wei Wan,
Shengshan Hu
Abstract:
While deep learning models play a crucial role in predicting antibody-antigen interactions (AAI), the scarcity of publicly available sequence-structure pairings constrains their generalization. Current AAI methods often focus on residue-level static details, overlooking fine-grained structural representations of antibodies and their inter-antibody similarities. To tackle this challenge, we introdu…
▽ More
While deep learning models play a crucial role in predicting antibody-antigen interactions (AAI), the scarcity of publicly available sequence-structure pairings constrains their generalization. Current AAI methods often focus on residue-level static details, overlooking fine-grained structural representations of antibodies and their inter-antibody similarities. To tackle this challenge, we introduce a multi-modality representation approach that integates 3D structural and 1D sequence data to unravel intricate intra-antibody hierarchical relationships. By harnessing these representations, we present MuLAAIP, an AAI prediction framework that utilizes graph attention networks to illuminate graph-level structural features and normalized adaptive graph convolution networks to capture inter-antibody sequence associations. Furthermore, we have curated an AAI benchmark dataset comprising both structural and sequence information along with interaction labels. Through extensive experiments on this benchmark, our results demonstrate that MuLAAIP outperforms current state-of-the-art methods in terms of predictive performance. The implementation code and dataset are publicly available at https://github.com/trashTian/MuLAAIP for reproducibility.
△ Less
Submitted 22 March, 2025;
originally announced March 2025.
-
Align Your Rhythm: Generating Highly Aligned Dance Poses with Gating-Enhanced Rhythm-Aware Feature Representation
Authors:
Congyi Fan,
Jian Guan,
Xuanjia Zhao,
Dongli Xu,
Youtian Lin,
Tong Ye,
Pengming Feng,
Haiwei Pan
Abstract:
Automatically generating natural, diverse and rhythmic human dance movements driven by music is vital for virtual reality and film industries. However, generating dance that naturally follows music remains a challenge, as existing methods lack proper beat alignment and exhibit unnatural motion dynamics. In this paper, we propose Danceba, a novel framework that leverages gating mechanism to enhance…
▽ More
Automatically generating natural, diverse and rhythmic human dance movements driven by music is vital for virtual reality and film industries. However, generating dance that naturally follows music remains a challenge, as existing methods lack proper beat alignment and exhibit unnatural motion dynamics. In this paper, we propose Danceba, a novel framework that leverages gating mechanism to enhance rhythm-aware feature representation for music-driven dance generation, which achieves highly aligned dance poses with enhanced rhythmic sensitivity. Specifically, we introduce Phase-Based Rhythm Extraction (PRE) to precisely extract rhythmic information from musical phase data, capitalizing on the intrinsic periodicity and temporal structures of music. Additionally, we propose Temporal-Gated Causal Attention (TGCA) to focus on global rhythmic features, ensuring that dance movements closely follow the musical rhythm. We also introduce Parallel Mamba Motion Modeling (PMMM) architecture to separately model upper and lower body motions along with musical features, thereby improving the naturalness and diversity of generated dance movements. Extensive experiments confirm that Danceba outperforms state-of-the-art methods, achieving significantly better rhythmic alignment and motion diversity. Project page: https://danceba.github.io/ .
△ Less
Submitted 21 March, 2025;
originally announced March 2025.
-
CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning
Authors:
Hao Cui,
Zahra Shamsi,
Gowoon Cheon,
Xuejian Ma,
Shutong Li,
Maria Tikhanovskaya,
Peter Norgaard,
Nayantara Mudur,
Martyna Plomecka,
Paul Raccuglia,
Yasaman Bahri,
Victor V. Albert,
Pranesh Srinivasan,
Haining Pan,
Philippe Faist,
Brian Rohr,
Ekin Dogus Cubuk,
Muratahan Aykol,
Amil Merchant,
Michael J. Statt,
Dan Morris,
Drew Purves,
Elise Kleeman,
Ruth Alcantara,
Matthew Abraham
, et al. (9 additional authors not shown)
Abstract:
Scientific problem-solving involves synthesizing information while applying expert knowledge. We introduce CURIE, a scientific long-Context Understanding,Reasoning and Information Extraction benchmark to measure the potential of Large Language Models (LLMs) in scientific problem-solving and assisting scientists in realistic workflows. This benchmark introduces ten challenging tasks with a total of…
▽ More
Scientific problem-solving involves synthesizing information while applying expert knowledge. We introduce CURIE, a scientific long-Context Understanding,Reasoning and Information Extraction benchmark to measure the potential of Large Language Models (LLMs) in scientific problem-solving and assisting scientists in realistic workflows. This benchmark introduces ten challenging tasks with a total of 580 problems and solution pairs curated by experts in six disciplines - materials science, condensed matter physics, quantum computing, geospatial analysis, biodiversity, and proteins - covering both experimental and theoretical work-flows in science. We evaluate a range of closed and open LLMs on tasks in CURIE which requires domain expertise, comprehension of long in-context information,and multi-step reasoning. While Gemini Flash 2.0 and Claude-3 show consistent high comprehension across domains, the popular GPT-4o and command-R+ fail dramatically on protein sequencing tasks. With the best performance at 32% there is much room for improvement for all models. We hope that insights gained from CURIE can guide the future development of LLMs in sciences. Evaluation code and data are in https://github.com/google/curie
△ Less
Submitted 13 May, 2025; v1 submitted 14 March, 2025;
originally announced March 2025.
-
WRATH: Workload Resilience Across Task Hierarchies in Task-based Parallel Programming Frameworks
Authors:
Sicheng Zhou,
Zhuozhao Li,
Valérie Hayot-Sasson,
Haochen Pan,
Maxime Gonthier,
J. Gregory Pauloski,
Ryan Chard,
Kyle Chard,
Ian Foster
Abstract:
Failures in Task-based Parallel Programming (TBPP) can severely degrade performance and result in incomplete or incorrect outcomes. Existing failure-handling approaches, including reactive, proactive, and resilient methods such as retry and checkpointing mechanisms, often apply uniform retry mechanisms regardless of the root cause of failures, failing to account for the unique characteristics of T…
▽ More
Failures in Task-based Parallel Programming (TBPP) can severely degrade performance and result in incomplete or incorrect outcomes. Existing failure-handling approaches, including reactive, proactive, and resilient methods such as retry and checkpointing mechanisms, often apply uniform retry mechanisms regardless of the root cause of failures, failing to account for the unique characteristics of TBPP frameworks such as heterogeneous resource availability and task-level failures. To address these limitations, we propose WRATH, a novel systematic approach that categorizes failures based on the unique layered structure of TBPP frameworks and defines specific responses to address failures at different layers. WRATH combines a distributed monitoring system and a resilient module to collaboratively address different types of failures in real time. The monitoring system captures execution and resource information, reports failures, and profiles tasks across different layers of TBPP frameworks. The resilient module then categorizes failures and responds with appropriate actions, such as hierarchically retrying failed tasks on suitable resources. Evaluations demonstrate that WRATH significantly improves TBPP robustness, tripling the task success rate and maintaining an application success rate of over 90% for resolvable failures. Additionally, WRATH can reduce the time to failure by 20%-50%, allowing tasks that are destined to fail to be identified and fail more quickly.
△ Less
Submitted 27 March, 2025; v1 submitted 16 March, 2025;
originally announced March 2025.
-
Machine Learning-Based Model for Postoperative Stroke Prediction in Coronary Artery Disease
Authors:
Haonan Pan,
Shuheng Chen,
Elham Pishgar,
Kamiar Alaei,
Greg Placencia,
Maryam Pishgar
Abstract:
Coronary artery disease remains one of the leading causes of mortality globally. Despite advances in revascularization treatments like PCI and CABG, postoperative stroke is inevitable. This study aims to develop and evaluate a sophisticated machine learning prediction model to assess postoperative stroke risk in coronary revascularization patients.This research employed data from the MIMIC-IV data…
▽ More
Coronary artery disease remains one of the leading causes of mortality globally. Despite advances in revascularization treatments like PCI and CABG, postoperative stroke is inevitable. This study aims to develop and evaluate a sophisticated machine learning prediction model to assess postoperative stroke risk in coronary revascularization patients.This research employed data from the MIMIC-IV database, consisting of a cohort of 7023 individuals. Study data included clinical, laboratory, and comorbidity variables. To reduce multicollinearity, variables with over 30% missing values and features with a correlation coefficient larger than 0.9 were deleted. The dataset has 70% training and 30% test. The Random Forest technique interpolated residual dataset missing values. Numerical values were normalized, whereas categorical variables were one-hot encoded. LASSO regularization selected features, and grid search found model hyperparameters. Finally, Logistic Regression, XGBoost, SVM, and CatBoost were employed for predictive modeling, and SHAP analysis assessed stroke risk for each variable. AUC of 0.855 (0.829-0.878) showed that SVM model outperformed logistic regression and CatBoost models in prior research. SHAP research showed that the Charlson Comorbidity Index (CCI), diabetes, chronic kidney disease, and heart failure are significant prognostic factors for postoperative stroke. This study shows that improved machine learning reduces overfitting and improves model predictive accuracy. Models using the CCI alone cannot predict postoperative stroke risk as accurately as those using independent comorbidity variables. The suggested technique provides a more thorough and individualized risk assessment by encompassing a wider range of clinically relevant characteristics, making it a better reference for preoperative risk assessments and targeted intervention.
△ Less
Submitted 14 March, 2025;
originally announced March 2025.
-
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
Authors:
Yi Yang,
Xiaoxuan He,
Hongkun Pan,
Xiyan Jiang,
Yan Deng,
Xingtao Yang,
Haoyu Lu,
Dacheng Yin,
Fengyun Rao,
Minfeng Zhu,
Bo Zhang,
Wei Chen
Abstract:
Large Language Models have demonstrated remarkable reasoning capability in complex textual tasks. However, multimodal reasoning, which requires integrating visual and textual information, remains a significant challenge. Existing visual-language models often struggle to effectively analyze and reason visual content, resulting in suboptimal performance on complex reasoning tasks. Moreover, the abse…
▽ More
Large Language Models have demonstrated remarkable reasoning capability in complex textual tasks. However, multimodal reasoning, which requires integrating visual and textual information, remains a significant challenge. Existing visual-language models often struggle to effectively analyze and reason visual content, resulting in suboptimal performance on complex reasoning tasks. Moreover, the absence of comprehensive benchmarks hinders the accurate assessment of multimodal reasoning capabilities. In this paper, we introduce R1-Onevision, a multimodal reasoning model designed to bridge the gap between visual perception and deep reasoning. To achieve this, we propose a cross-modal reasoning pipeline that transforms images into formal textural representations, enabling precise language-based reasoning. Leveraging this pipeline, we construct the R1-Onevision dataset which provides detailed, step-by-step multimodal reasoning annotations across diverse domains. We further develop the R1-Onevision model through supervised fine-tuning and reinforcement learning to cultivate advanced reasoning and robust generalization abilities. To comprehensively evaluate multimodal reasoning performance across different grades, we introduce R1-Onevision-Bench, a benchmark aligned with human educational stages, covering exams from junior high school to university and beyond. Experimental results show that R1-Onevision achieves state-of-the-art performance, outperforming models such as GPT-4o and Qwen2.5-VL on multiple challenging multimodal reasoning benchmarks.
△ Less
Submitted 18 March, 2025; v1 submitted 13 March, 2025;
originally announced March 2025.
-
Reconsidering Feature Structure Information and Latent Space Alignment in Partial Multi-label Feature Selection
Authors:
Hanlin Pan,
Kunpeng Liu,
Wanfu Gao
Abstract:
The purpose of partial multi-label feature selection is to select the most representative feature subset, where the data comes from partial multi-label datasets that have label ambiguity issues. For label disambiguation, previous methods mainly focus on utilizing the information inside the labels and the relationship between the labels and features. However, the information existing in the feature…
▽ More
The purpose of partial multi-label feature selection is to select the most representative feature subset, where the data comes from partial multi-label datasets that have label ambiguity issues. For label disambiguation, previous methods mainly focus on utilizing the information inside the labels and the relationship between the labels and features. However, the information existing in the feature space is rarely considered, especially in partial multi-label scenarios where the noises is considered to be concentrated in the label space while the feature information is correct. This paper proposes a method based on latent space alignment, which uses the information mined in feature space to disambiguate in latent space through the structural consistency between labels and features. In addition, previous methods overestimate the consistency of features and labels in the latent space after convergence. We comprehensively consider the similarity of latent space projections to feature space and label space, and propose new feature selection term. This method also significantly improves the positive label identification ability of the selected features. Comprehensive experiments demonstrate the superiority of the proposed method.
△ Less
Submitted 13 March, 2025;
originally announced March 2025.
-
CCaaLF: Concurrency Control as a Learnable Function
Authors:
Hexiang Pan,
Shaofeng Cai,
Tien Tuan Anh Dinh,
Yuncheng Wu,
Yeow Meng Chee,
Gang Chen,
Beng Chin Ooi
Abstract:
Concurrency control (CC) algorithms are important in modern transactional databases, as they enable high performance by executing transactions concurrently while ensuring correctness. However, state-of-the-art CC algorithms struggle to perform well across diverse workloads, and most do not consider workload drifts.
In this paper, we propose CCaaLF (Concurrency Control as a Learnable Function), a…
▽ More
Concurrency control (CC) algorithms are important in modern transactional databases, as they enable high performance by executing transactions concurrently while ensuring correctness. However, state-of-the-art CC algorithms struggle to perform well across diverse workloads, and most do not consider workload drifts.
In this paper, we propose CCaaLF (Concurrency Control as a Learnable Function), a novel learned concurrency control algorithm designed to achieve high performance across varying workloads. The algorithm is quick to optimize, making it robust against dynamic workloads. CCaaLF learns an agent function that captures a large number of design choices from existing CC algorithms. The function is implemented as an efficient in-database lookup table that maps database states to concurrency control actions. The learning process is based on a combination of Bayesian optimization and a novel graph reduction algorithm, which converges quickly to a function that achieves high transaction throughput. We compare CCaaLF against five state-of-the-art CC algorithms and show that our algorithm consistently outperforms them in terms of transaction throughput and optimization time.
△ Less
Submitted 25 March, 2025; v1 submitted 13 March, 2025;
originally announced March 2025.
-
GrInAdapt: Scaling Retinal Vessel Structural Map Segmentation Through Grounding, Integrating and Adapting Multi-device, Multi-site, and Multi-modal Fundus Domains
Authors:
Zixuan Liu,
Aaron Honjaya,
Yuekai Xu,
Yi Zhang,
Hefu Pan,
Xin Wang,
Linda G Shapiro,
Sheng Wang,
Ruikang K Wang
Abstract:
Retinal vessel segmentation is critical for diagnosing ocular conditions, yet current deep learning methods are limited by modality-specific challenges and significant distribution shifts across imaging devices, resolutions, and anatomical regions. In this paper, we propose GrInAdapt, a novel framework for source-free multi-target domain adaptation that leverages multi-view images to refine segmen…
▽ More
Retinal vessel segmentation is critical for diagnosing ocular conditions, yet current deep learning methods are limited by modality-specific challenges and significant distribution shifts across imaging devices, resolutions, and anatomical regions. In this paper, we propose GrInAdapt, a novel framework for source-free multi-target domain adaptation that leverages multi-view images to refine segmentation labels and enhance model generalizability for optical coherence tomography angiography (OCTA) of the fundus of the eye. GrInAdapt follows an intuitive three-step approach: (i) grounding images to a common anchor space via registration, (ii) integrating predictions from multiple views to achieve improved label consensus, and (iii) adapting the source model to diverse target domains. Furthermore, GrInAdapt is flexible enough to incorporate auxiliary modalities such as color fundus photography, to provide complementary cues for robust vessel segmentation. Extensive experiments on a multi-device, multi-site, and multi-modal retinal dataset demonstrate that GrInAdapt significantly outperforms existing domain adaptation methods, achieving higher segmentation accuracy and robustness across multiple domains. These results highlight the potential of GrInAdapt to advance automated retinal vessel analysis and support robust clinical decision-making.
△ Less
Submitted 7 March, 2025;
originally announced March 2025.
-
SMTPD: A New Benchmark for Temporal Prediction of Social Media Popularity
Authors:
Yijie Xu,
Bolun Zheng,
Wei Zhu,
Hangjia Pan,
Yuchen Yao,
Ning Xu,
Anan Liu,
Quan Zhang,
Chenggang Yan
Abstract:
Social media popularity prediction task aims to predict the popularity of posts on social media platforms, which has a positive driving effect on application scenarios such as content optimization, digital marketing and online advertising. Though many studies have made significant progress, few of them pay much attention to the integration between popularity prediction with temporal alignment. In…
▽ More
Social media popularity prediction task aims to predict the popularity of posts on social media platforms, which has a positive driving effect on application scenarios such as content optimization, digital marketing and online advertising. Though many studies have made significant progress, few of them pay much attention to the integration between popularity prediction with temporal alignment. In this paper, with exploring YouTube's multilingual and multi-modal content, we construct a new social media temporal popularity prediction benchmark, namely SMTPD, and suggest a baseline framework for temporal popularity prediction. Through data analysis and experiments, we verify that temporal alignment and early popularity play crucial roles in social media popularity prediction for not only deepening the understanding of temporal dynamics of popularity in social media but also offering a suggestion about developing more effective prediction models in this field. Code is available at https://github.com/zhuwei321/SMTPD.
△ Less
Submitted 6 March, 2025;
originally announced March 2025.
-
Don't Shake the Wheel: Momentum-Aware Planning in End-to-End Autonomous Driving
Authors:
Ziying Song,
Caiyan Jia,
Lin Liu,
Hongyu Pan,
Yongchang Zhang,
Junming Wang,
Xingyu Zhang,
Shaoqing Xu,
Lei Yang,
Yadan Luo
Abstract:
End-to-end autonomous driving frameworks enable seamless integration of perception and planning but often rely on one-shot trajectory prediction, which may lead to unstable control and vulnerability to occlusions in single-frame perception. To address this, we propose the Momentum-Aware Driving (MomAD) framework, which introduces trajectory momentum and perception momentum to stabilize and refine…
▽ More
End-to-end autonomous driving frameworks enable seamless integration of perception and planning but often rely on one-shot trajectory prediction, which may lead to unstable control and vulnerability to occlusions in single-frame perception. To address this, we propose the Momentum-Aware Driving (MomAD) framework, which introduces trajectory momentum and perception momentum to stabilize and refine trajectory predictions. MomAD comprises two core components: (1) Topological Trajectory Matching (TTM) employs Hausdorff Distance to select the optimal planning query that aligns with prior paths to ensure coherence;(2) Momentum Planning Interactor (MPI) cross-attends the selected planning query with historical queries to expand static and dynamic perception files. This enriched query, in turn, helps regenerate long-horizon trajectory and reduce collision risks. To mitigate noise arising from dynamic environments and detection errors, we introduce robust instance denoising during training, enabling the planning model to focus on critical signals and improve its robustness. We also propose a novel Trajectory Prediction Consistency (TPC) metric to quantitatively assess planning stability. Experiments on the nuScenes dataset demonstrate that MomAD achieves superior long-term consistency (>=3s) compared to SOTA methods. Moreover, evaluations on the curated Turning-nuScenes shows that MomAD reduces the collision rate by 26% and improves TPC by 0.97m (33.45%) over a 6s prediction horizon, while closedloop on Bench2Drive demonstrates an up to 16.3% improvement in success rate.
△ Less
Submitted 8 May, 2025; v1 submitted 4 March, 2025;
originally announced March 2025.
-
Precise Localization of Memories: A Fine-grained Neuron-level Knowledge Editing Technique for LLMs
Authors:
Haowen Pan,
Xiaozhi Wang,
Yixin Cao,
Zenglin Shi,
Xun Yang,
Juanzi Li,
Meng Wang
Abstract:
Knowledge editing aims to update outdated information in Large Language Models (LLMs). A representative line of study is locate-then-edit methods, which typically employ causal tracing to identify the modules responsible for recalling factual knowledge about entities. However, we find these methods are often sensitive only to changes in the subject entity, leaving them less effective at adapting t…
▽ More
Knowledge editing aims to update outdated information in Large Language Models (LLMs). A representative line of study is locate-then-edit methods, which typically employ causal tracing to identify the modules responsible for recalling factual knowledge about entities. However, we find these methods are often sensitive only to changes in the subject entity, leaving them less effective at adapting to changes in relations. This limitation results in poor editing locality, which can lead to the persistence of irrelevant or inaccurate facts, ultimately compromising the reliability of LLMs. We believe this issue arises from the insufficient precision of knowledge localization. To address this, we propose a Fine-grained Neuron-level Knowledge Editing (FiNE) method that enhances editing locality without affecting overall success rates. By precisely identifying and modifying specific neurons within feed-forward networks, FiNE significantly improves knowledge localization and editing. Quantitative experiments demonstrate that FiNE efficiently achieves better overall performance compared to existing techniques, providing new insights into the localization and modification of knowledge within LLMs.
△ Less
Submitted 17 March, 2025; v1 submitted 2 March, 2025;
originally announced March 2025.
-
DeepCell: Multiview Representation Learning for Post-Mapping Netlists
Authors:
Zhengyuan Shi,
Chengyu Ma,
Ziyang Zheng,
Lingfeng Zhou,
Hongyang Pan,
Wentao Jiang,
Fan Yang,
Xiaoyan Yang,
Zhufei Chu,
Qiang Xu
Abstract:
Representation learning for post-mapping (PM) netlists is a critical challenge in Electronic Design Automation (EDA), driven by the diverse and complex nature of modern circuit designs. Existing approaches focus on intermediate representations like And-Inverter Graphs (AIGs), limiting their applicability to post-synthesis stages. We introduce DeepCell, a multiview representation learning framework…
▽ More
Representation learning for post-mapping (PM) netlists is a critical challenge in Electronic Design Automation (EDA), driven by the diverse and complex nature of modern circuit designs. Existing approaches focus on intermediate representations like And-Inverter Graphs (AIGs), limiting their applicability to post-synthesis stages. We introduce DeepCell, a multiview representation learning framework that integrates structural and functional insights from both PM netlists and AIGs to learn rich, generalizable embeddings. At its core, DeepCell employs the novel Mask Circuit Modeling (MCM) mechanism, which refines PM netlist representations in a self-supervised manner using pretrained AIG encoders. DeepCell sets a new benchmark in PM netlist representation, outperforming existing methods in predictive accuracy and reconstruction fidelity. To validate its efficacy, we apply DeepCell to functional Engineering Change Orders (ECO), achieving significant reductions in patch generation costs and runtime while improving patch quality.
△ Less
Submitted 4 February, 2025;
originally announced February 2025.
-
Optimizing Fine-Grained Parallelism Through Dynamic Load Balancing on Multi-Socket Many-Core Systems
Authors:
Wenyi Wang,
Maxime Gonthier,
Poornima Nookala,
Haochen Pan,
Ian Foster,
Ioan Raicu,
Kyle Chard
Abstract:
Achieving efficient task parallelism on many-core architectures is an important challenge. The widely used GNU OpenMP implementation of the popular OpenMP parallel programming model incurs high overhead for fine-grained, short-running tasks due to time spent on runtime synchronization. In this work, we introduce and analyze three key advances that collectively achieve significant performance gains…
▽ More
Achieving efficient task parallelism on many-core architectures is an important challenge. The widely used GNU OpenMP implementation of the popular OpenMP parallel programming model incurs high overhead for fine-grained, short-running tasks due to time spent on runtime synchronization. In this work, we introduce and analyze three key advances that collectively achieve significant performance gains. First, we introduce XQueue, a lock-less concurrent queue implementation to replace GNU's priority task queue and remove the global task lock. Second, we develop a scalable, efficient, and hybrid lock-free/lock-less distributed tree barrier to address the high hardware synchronization overhead from GNU's centralized barrier. Third, we develop two lock-less and NUMA-aware load balancing strategies. We evaluate our implementation using Barcelona OpenMP Task Suite (BOTS) benchmarks. We show that the use of XQueue and the distributed tree barrier can improve performance by up to 1522.8$\times$ compared to the original GNU OpenMP. We further show that lock-less load balancing can improve performance by up to 4$\times$ compared to GNU OpenMP using XQueue.
△ Less
Submitted 19 March, 2025; v1 submitted 7 February, 2025;
originally announced February 2025.
-
Classroom Simulacra: Building Contextual Student Generative Agents in Online Education for Learning Behavioral Simulation
Authors:
Songlin Xu,
Hao-Ning Wen,
Hongyi Pan,
Dallas Dominguez,
Dongyin Hu,
Xinyu Zhang
Abstract:
Student simulation supports educators to improve teaching by interacting with virtual students. However, most existing approaches ignore the modulation effects of course materials because of two challenges: the lack of datasets with granularly annotated course materials, and the limitation of existing simulation models in processing extremely long textual data. To solve the challenges, we first ru…
▽ More
Student simulation supports educators to improve teaching by interacting with virtual students. However, most existing approaches ignore the modulation effects of course materials because of two challenges: the lack of datasets with granularly annotated course materials, and the limitation of existing simulation models in processing extremely long textual data. To solve the challenges, we first run a 6-week education workshop from N = 60 students to collect fine-grained data using a custom built online education system, which logs students' learning behaviors as they interact with lecture materials over time. Second, we propose a transferable iterative reflection (TIR) module that augments both prompting-based and finetuning-based large language models (LLMs) for simulating learning behaviors. Our comprehensive experiments show that TIR enables the LLMs to perform more accurate student simulation than classical deep learning models, even with limited demonstration data. Our TIR approach better captures the granular dynamism of learning performance and inter-student correlations in classrooms, paving the way towards a ''digital twin'' for online education.
△ Less
Submitted 4 February, 2025;
originally announced February 2025.
-
Comparative Analysis of Pre-trained Deep Learning Models and DINOv2 for Cushing's Syndrome Diagnosis in Facial Analysis
Authors:
Hongjun Liu,
Changwei Song,
Jiaqi Qiang,
Jianqiang Li,
Hui Pan,
Lin Lu,
Xiao Long,
Qing Zhao,
Jiuzuo Huang,
Shi Chen
Abstract:
Cushing's syndrome is a condition caused by excessive glucocorticoid secretion from the adrenal cortex, often manifesting with moon facies and plethora, making facial data crucial for diagnosis. Previous studies have used pre-trained convolutional neural networks (CNNs) for diagnosing Cushing's syndrome using frontal facial images. However, CNNs are better at capturing local features, while Cushin…
▽ More
Cushing's syndrome is a condition caused by excessive glucocorticoid secretion from the adrenal cortex, often manifesting with moon facies and plethora, making facial data crucial for diagnosis. Previous studies have used pre-trained convolutional neural networks (CNNs) for diagnosing Cushing's syndrome using frontal facial images. However, CNNs are better at capturing local features, while Cushing's syndrome often presents with global facial features. Transformer-based models like ViT and SWIN, which utilize self-attention mechanisms, can better capture long-range dependencies and global features. Recently, DINOv2, a foundation model based on visual Transformers, has gained interest. This study compares the performance of various pre-trained models, including CNNs, Transformer-based models, and DINOv2, in diagnosing Cushing's syndrome. We also analyze gender bias and the impact of freezing mechanisms on DINOv2. Our results show that Transformer-based models and DINOv2 outperformed CNNs, with ViT achieving the highest F1 score of 85.74%. Both the pre-trained model and DINOv2 had higher accuracy for female samples. DINOv2 also showed improved performance when freezing parameters. In conclusion, Transformer-based models and DINOv2 are effective for Cushing's syndrome classification.
△ Less
Submitted 21 January, 2025;
originally announced January 2025.
-
MOFA: Discovering Materials for Carbon Capture with a GenAI- and Simulation-Based Workflow
Authors:
Xiaoli Yan,
Nathaniel Hudson,
Hyun Park,
Daniel Grzenda,
J. Gregory Pauloski,
Marcus Schwarting,
Haochen Pan,
Hassan Harb,
Samuel Foreman,
Chris Knight,
Tom Gibbs,
Kyle Chard,
Santanu Chaudhuri,
Emad Tajkhorshid,
Ian Foster,
Mohamad Moosavi,
Logan Ward,
E. A. Huerta
Abstract:
We present MOFA, an open-source generative AI (GenAI) plus simulation workflow for high-throughput generation of metal-organic frameworks (MOFs) on large-scale high-performance computing (HPC) systems. MOFA addresses key challenges in integrating GPU-accelerated computing for GPU-intensive GenAI tasks, including distributed training and inference, alongside CPU- and GPU-optimized tasks for screeni…
▽ More
We present MOFA, an open-source generative AI (GenAI) plus simulation workflow for high-throughput generation of metal-organic frameworks (MOFs) on large-scale high-performance computing (HPC) systems. MOFA addresses key challenges in integrating GPU-accelerated computing for GPU-intensive GenAI tasks, including distributed training and inference, alongside CPU- and GPU-optimized tasks for screening and filtering AI-generated MOFs using molecular dynamics, density functional theory, and Monte Carlo simulations. These heterogeneous tasks are unified within an online learning framework that optimizes the utilization of available CPU and GPU resources across HPC systems. Performance metrics from a 450-node (14,400 AMD Zen 3 CPUs + 1800 NVIDIA A100 GPUs) supercomputer run demonstrate that MOFA achieves high-throughput generation of novel MOF structures, with CO$_2$ adsorption capacities ranking among the top 10 in the hypothetical MOF (hMOF) dataset. Furthermore, the production of high-quality MOFs exhibits a linear relationship with the number of nodes utilized. The modular architecture of MOFA will facilitate its integration into other scientific applications that dynamically combine GenAI with large-scale simulations.
△ Less
Submitted 17 January, 2025;
originally announced January 2025.
-
Hengqin-RA-v1: Advanced Large Language Model for Diagnosis and Treatment of Rheumatoid Arthritis with Dataset based Traditional Chinese Medicine
Authors:
Yishen Liu,
Shengda Luo,
Zishao Zhong,
Tongtong Wu,
Jianguo Zhang,
Peiyao Ou,
Yong Liang,
Liang Liu,
Hudan Pan
Abstract:
Large language models (LLMs) primarily trained on English texts, often face biases and inaccuracies in Chinese contexts. Their limitations are pronounced in fields like Traditional Chinese Medicine (TCM), where cultural and clinical subtleties are vital, further hindered by a lack of domain-specific data, such as rheumatoid arthritis (RA). To address these issues, this paper introduces Hengqin-RA-…
▽ More
Large language models (LLMs) primarily trained on English texts, often face biases and inaccuracies in Chinese contexts. Their limitations are pronounced in fields like Traditional Chinese Medicine (TCM), where cultural and clinical subtleties are vital, further hindered by a lack of domain-specific data, such as rheumatoid arthritis (RA). To address these issues, this paper introduces Hengqin-RA-v1, the first large language model specifically tailored for TCM with a focus on diagnosing and treating RA. We also present HQ-GCM-RA-C1, a comprehensive RA-specific dataset curated from ancient Chinese medical literature, classical texts, and modern clinical studies. This dataset empowers Hengqin-RA-v1 to deliver accurate and culturally informed responses, effectively bridging the gaps left by general-purpose models. Extensive experiments demonstrate that Hengqin-RA-v1 outperforms state-of-the-art models, even surpassing the diagnostic accuracy of TCM practitioners in certain cases.
△ Less
Submitted 27 March, 2025; v1 submitted 5 January, 2025;
originally announced January 2025.