-
Semi-supervised Graph Anomaly Detection via Robust Homophily Learning
Authors:
Guoguo Ai,
Hezhe Qiao,
Hui Yan,
Guansong Pang
Abstract:
Semi-supervised graph anomaly detection (GAD) utilizes a small set of labeled normal nodes to identify abnormal nodes from a large set of unlabeled nodes in a graph. Current methods in this line posit that 1) normal nodes share a similar level of homophily and 2) the labeled normal nodes can well represent the homophily patterns in the normal class. However, this assumption often does not hold wel…
▽ More
Semi-supervised graph anomaly detection (GAD) utilizes a small set of labeled normal nodes to identify abnormal nodes from a large set of unlabeled nodes in a graph. Current methods in this line posit that 1) normal nodes share a similar level of homophily and 2) the labeled normal nodes can well represent the homophily patterns in the normal class. However, this assumption often does not hold well since normal nodes in a graph can exhibit diverse homophily in real-world GAD datasets. In this paper, we propose RHO, namely Robust Homophily Learning, to adaptively learn such homophily patterns. RHO consists of two novel modules, adaptive frequency response filters (AdaFreq) and graph normality alignment (GNA). AdaFreq learns a set of adaptive spectral filters that capture different frequency components of the labeled normal nodes with varying homophily in the channel-wise and cross-channel views of node attributes. GNA is introduced to enforce consistency between the channel-wise and cross-channel homophily representations to robustify the normality learned by the filters in the two views. Experiments on eight real-world GAD datasets show that RHO can effectively learn varying, often under-represented, homophily in the small normal node set and substantially outperforms state-of-the-art competing methods. Code is available at https://github.com/mala-lab/RHO.
△ Less
Submitted 18 June, 2025;
originally announced June 2025.
-
Embedded Mean Field Reinforcement Learning for Perimeter-defense Game
Authors:
Li Wang,
Xin Yu,
Xuxin Lv,
Gangzheng Ai,
Wenjun Wu
Abstract:
With the rapid advancement of unmanned aerial vehicles (UAVs) and missile technologies, perimeter-defense game between attackers and defenders for the protection of critical regions have become increasingly complex and strategically significant across a wide range of domains. However, existing studies predominantly focus on small-scale, simplified two-dimensional scenarios, often overlooking reali…
▽ More
With the rapid advancement of unmanned aerial vehicles (UAVs) and missile technologies, perimeter-defense game between attackers and defenders for the protection of critical regions have become increasingly complex and strategically significant across a wide range of domains. However, existing studies predominantly focus on small-scale, simplified two-dimensional scenarios, often overlooking realistic environmental perturbations, motion dynamics, and inherent heterogeneity--factors that pose substantial challenges to real-world applicability. To bridge this gap, we investigate large-scale heterogeneous perimeter-defense game in a three-dimensional setting, incorporating realistic elements such as motion dynamics and wind fields. We derive the Nash equilibrium strategies for both attackers and defenders, characterize the victory regions, and validate our theoretical findings through extensive simulations. To tackle large-scale heterogeneous control challenges in defense strategies, we propose an Embedded Mean-Field Actor-Critic (EMFAC) framework. EMFAC leverages representation learning to enable high-level action aggregation in a mean-field manner, supporting scalable coordination among defenders. Furthermore, we introduce a lightweight agent-level attention mechanism based on reward representation, which selectively filters observations and mean-field information to enhance decision-making efficiency and accelerate convergence in large-scale tasks. Extensive simulations across varying scales demonstrate the effectiveness and adaptability of EMFAC, which outperforms established baselines in both convergence speed and overall performance. To further validate practicality, we test EMFAC in small-scale real-world experiments and conduct detailed analyses, offering deeper insights into the framework's effectiveness in complex scenarios.
△ Less
Submitted 20 May, 2025;
originally announced May 2025.
-
Baichuan-Omni-1.5 Technical Report
Authors:
Yadong Li,
Jun Liu,
Tao Zhang,
Tao Zhang,
Song Chen,
Tianpeng Li,
Zehuan Li,
Lijun Liu,
Lingfeng Ming,
Guosheng Dong,
Da Pan,
Chong Li,
Yuanbo Fang,
Dongdong Kuang,
Mingrui Wang,
Chenglin Zhu,
Youwei Zhang,
Hongyu Guo,
Fengyu Zhang,
Yuran Wang,
Bowen Ding,
Wei Song,
Xu Li,
Yuqi Huo,
Zheng Liang
, et al. (68 additional authors not shown)
Abstract:
We introduce Baichuan-Omni-1.5, an omni-modal model that not only has omni-modal understanding capabilities but also provides end-to-end audio generation capabilities. To achieve fluent and high-quality interaction across modalities without compromising the capabilities of any modality, we prioritized optimizing three key aspects. First, we establish a comprehensive data cleaning and synthesis pip…
▽ More
We introduce Baichuan-Omni-1.5, an omni-modal model that not only has omni-modal understanding capabilities but also provides end-to-end audio generation capabilities. To achieve fluent and high-quality interaction across modalities without compromising the capabilities of any modality, we prioritized optimizing three key aspects. First, we establish a comprehensive data cleaning and synthesis pipeline for multimodal data, obtaining about 500B high-quality data (text, audio, and vision). Second, an audio-tokenizer (Baichuan-Audio-Tokenizer) has been designed to capture both semantic and acoustic information from audio, enabling seamless integration and enhanced compatibility with MLLM. Lastly, we designed a multi-stage training strategy that progressively integrates multimodal alignment and multitask fine-tuning, ensuring effective synergy across all modalities. Baichuan-Omni-1.5 leads contemporary models (including GPT4o-mini and MiniCPM-o 2.6) in terms of comprehensive omni-modal capabilities. Notably, it achieves results comparable to leading models such as Qwen2-VL-72B across various multimodal medical benchmarks.
△ Less
Submitted 25 January, 2025;
originally announced January 2025.
-
GrokFormer: Graph Fourier Kolmogorov-Arnold Transformers
Authors:
Guoguo Ai,
Guansong Pang,
Hezhe Qiao,
Yuan Gao,
Hui Yan
Abstract:
Graph Transformers (GTs) have demonstrated remarkable performance in graph representation learning over popular graph neural networks (GNNs). However, self--attention, the core module of GTs, preserves only low-frequency signals in graph features, leading to ineffectiveness in capturing other important signals like high-frequency ones. Some recent GT models help alleviate this issue, but their fle…
▽ More
Graph Transformers (GTs) have demonstrated remarkable performance in graph representation learning over popular graph neural networks (GNNs). However, self--attention, the core module of GTs, preserves only low-frequency signals in graph features, leading to ineffectiveness in capturing other important signals like high-frequency ones. Some recent GT models help alleviate this issue, but their flexibility and expressiveness are still limited since the filters they learn are fixed on predefined graph spectrum or spectral order. To tackle this challenge, we propose a Graph Fourier Kolmogorov-Arnold Transformer (GrokFormer), a novel GT model that learns highly expressive spectral filters with adaptive graph spectrum and spectral order through a Fourier series modeling over learnable activation functions. We demonstrate theoretically and empirically that the proposed GrokFormer filter offers better expressiveness than other spectral methods. Comprehensive experiments on 10 real-world node classification datasets across various domains, scales, and graph properties, as well as 5 graph classification datasets, show that GrokFormer outperforms state-of-the-art GTs and GNNs. Our code is available at https://github.com/GGA23/GrokFormer
△ Less
Submitted 29 May, 2025; v1 submitted 26 November, 2024;
originally announced November 2024.
-
Baichuan 2: Open Large-scale Language Models
Authors:
Aiyuan Yang,
Bin Xiao,
Bingning Wang,
Borong Zhang,
Ce Bian,
Chao Yin,
Chenxu Lv,
Da Pan,
Dian Wang,
Dong Yan,
Fan Yang,
Fei Deng,
Feng Wang,
Feng Liu,
Guangwei Ai,
Guosheng Dong,
Haizhou Zhao,
Hang Xu,
Haoze Sun,
Hongda Zhang,
Hui Liu,
Jiaming Ji,
Jian Xie,
JunTao Dai,
Kun Fang
, et al. (30 additional authors not shown)
Abstract:
Large language models (LLMs) have demonstrated remarkable performance on a variety of natural language tasks based on just a few examples of natural language instructions, reducing the need for extensive feature engineering. However, most powerful LLMs are closed-source or limited in their capability for languages other than English. In this technical report, we present Baichuan 2, a series of lar…
▽ More
Large language models (LLMs) have demonstrated remarkable performance on a variety of natural language tasks based on just a few examples of natural language instructions, reducing the need for extensive feature engineering. However, most powerful LLMs are closed-source or limited in their capability for languages other than English. In this technical report, we present Baichuan 2, a series of large-scale multilingual language models containing 7 billion and 13 billion parameters, trained from scratch, on 2.6 trillion tokens. Baichuan 2 matches or outperforms other open-source models of similar size on public benchmarks like MMLU, CMMLU, GSM8K, and HumanEval. Furthermore, Baichuan 2 excels in vertical domains such as medicine and law. We will release all pre-training model checkpoints to benefit the research community in better understanding the training dynamics of Baichuan 2.
△ Less
Submitted 17 April, 2025; v1 submitted 19 September, 2023;
originally announced September 2023.
-
FemtoDet: An Object Detection Baseline for Energy Versus Performance Tradeoffs
Authors:
Peng Tu,
Xu Xie,
Guo AI,
Yuexiang Li,
Yawen Huang,
Yefeng Zheng
Abstract:
Efficient detectors for edge devices are often optimized for parameters or speed count metrics, which remain in weak correlation with the energy of detectors.
However, some vision applications of convolutional neural networks, such as always-on surveillance cameras, are critical for energy constraints.
This paper aims to serve as a baseline by designing detectors to reach tradeoffs between ene…
▽ More
Efficient detectors for edge devices are often optimized for parameters or speed count metrics, which remain in weak correlation with the energy of detectors.
However, some vision applications of convolutional neural networks, such as always-on surveillance cameras, are critical for energy constraints.
This paper aims to serve as a baseline by designing detectors to reach tradeoffs between energy and performance from two perspectives:
1) We extensively analyze various CNNs to identify low-energy architectures, including selecting activation functions, convolutions operators, and feature fusion structures on necks. These underappreciated details in past work seriously affect the energy consumption of detectors;
2) To break through the dilemmatic energy-performance problem, we propose a balanced detector driven by energy using discovered low-energy components named \textit{FemtoDet}.
In addition to the novel construction, we improve FemtoDet by considering convolutions and training strategy optimizations.
Specifically, we develop a new instance boundary enhancement (IBE) module for convolution optimization to overcome the contradiction between the limited capacity of CNNs and detection tasks in diverse spatial representations, and propose a recursive warm-restart (RecWR) for optimizing training strategy to escape the sub-optimization of light-weight detectors by considering the data shift produced in popular augmentations.
As a result, FemtoDet with only 68.77k parameters achieves a competitive score of 46.3 AP50 on PASCAL VOC and 1.11 W $\&$ 64.47 FPS on Qualcomm Snapdragon 865 CPU platforms.
Extensive experiments on COCO and TJU-DHD datasets indicate that the proposed method achieves competitive results in diverse scenes.
△ Less
Submitted 13 August, 2023; v1 submitted 17 January, 2023;
originally announced January 2023.
-
Deep Reinforcement Learning based Dynamic Optimization of Bus Timetable
Authors:
Guanqun Ai,
Xingquan Zuo,
Gang chen,
Binglin Wu
Abstract:
Bus timetable optimization is a key issue to reduce operational cost of bus companies and improve the service quality. Existing methods use exact or heuristic algorithms to optimize the timetable in an offline manner. In practice, the passenger flow may change significantly over time. Timetables determined in offline cannot adjust the departure interval to satisfy the changed passenger flow. Aimin…
▽ More
Bus timetable optimization is a key issue to reduce operational cost of bus companies and improve the service quality. Existing methods use exact or heuristic algorithms to optimize the timetable in an offline manner. In practice, the passenger flow may change significantly over time. Timetables determined in offline cannot adjust the departure interval to satisfy the changed passenger flow. Aiming at improving the online performance of bus timetable, we propose a Deep Reinforcement Learning based bus Timetable dynamic Optimization method (DRL-TO). In this method, the timetable optimization is considered as a sequential decision problem. A Deep Q-Network (DQN) is employed as the decision model to determine whether to dispatch a bus service during each minute of the service period. Therefore, the departure intervals of bus services are determined in real time in accordance with passenger demand. We identify several new and useful state features for the DQN, including the load factor, carrying capacity utilization rate, and the number of stranding passengers. Taking into account both the interests of the bus company and passengers, a reward function is designed, which includes the indicators of full load rate, empty load rate, passengers' waiting time, and the number of stranding passengers. Building on an existing method for calculating the carrying capacity, we develop a new technique to enhance the matching degree at each bus station. Experiments demonstrate that compared with the timetable generated by the state-of-the-art bus timetable optimization approach based on a memetic algorithm (BTOA-MA), Genetic Algorithm (GA) and the manual method, DRL-TO can dynamically determine the departure intervals based on the real-time passenger flow, saving 8$\%$ of vehicles and reducing 17$\%$ of passengers' waiting time on average.
△ Less
Submitted 14 July, 2021;
originally announced July 2021.
-
Accurate Face Detection for High Performance
Authors:
Faen Zhang,
Xinyu Fan,
Guo Ai,
Jianfei Song,
Yongqiang Qin,
Jiahong Wu
Abstract:
Face detection has witnessed significant progress due to the advances of deep convolutional neural networks (CNNs). Its central issue in recent years is how to improve the detection performance of tiny faces. To this end, many recent works propose some specific strategies, redesign the architecture and introduce new loss functions for tiny object detection. In this report, we start from the popula…
▽ More
Face detection has witnessed significant progress due to the advances of deep convolutional neural networks (CNNs). Its central issue in recent years is how to improve the detection performance of tiny faces. To this end, many recent works propose some specific strategies, redesign the architecture and introduce new loss functions for tiny object detection. In this report, we start from the popular one-stage RetinaNet approach and apply some recent tricks to obtain a high performance face detector. Specifically, we apply the Intersection over Union (IoU) loss function for regression, employ the two-step classification and regression for detection, revisit the data augmentation based on data-anchor-sampling for training, utilize the max-out operation for classification and use the multi-scale testing strategy for inference. As a consequence, the proposed face detection method achieves state-of-the-art performance on the most popular and challenging face detection benchmark WIDER FACE dataset.
△ Less
Submitted 24 May, 2019; v1 submitted 4 May, 2019;
originally announced May 2019.