-
Image Recognition with Online Lightweight Vision Transformer: A Survey
Authors:
Zherui Zhang,
Rongtao Xu,
Jie Zhou,
Changwei Wang,
Xingtian Pei,
Wenhao Xu,
Jiguang Zhang,
Li Guo,
Longxiang Gao,
Wenbo Xu,
Shibiao Xu
Abstract:
The Transformer architecture has achieved significant success in natural language processing, motivating its adaptation to computer vision tasks. Unlike convolutional neural networks, vision transformers inherently capture long-range dependencies and enable parallel processing, yet lack inductive biases and efficiency benefits, facing significant computational and memory challenges that limit its…
▽ More
The Transformer architecture has achieved significant success in natural language processing, motivating its adaptation to computer vision tasks. Unlike convolutional neural networks, vision transformers inherently capture long-range dependencies and enable parallel processing, yet lack inductive biases and efficiency benefits, facing significant computational and memory challenges that limit its real-world applicability. This paper surveys various online strategies for generating lightweight vision transformers for image recognition, focusing on three key areas: Efficient Component Design, Dynamic Network, and Knowledge Distillation. We evaluate the relevant exploration for each topic on the ImageNet-1K benchmark, analyzing trade-offs among precision, parameters, throughput, and more to highlight their respective advantages, disadvantages, and flexibility. Finally, we propose future research directions and potential challenges in the lightweighting of vision transformers with the aim of inspiring further exploration and providing practical guidance for the community. Project Page: https://github.com/ajxklo/Lightweight-VIT
△ Less
Submitted 10 May, 2025; v1 submitted 5 May, 2025;
originally announced May 2025.
-
Revisiting the Relationship between Adversarial and Clean Training: Why Clean Training Can Make Adversarial Training Better
Authors:
MingWei Zhou,
Xiaobing Pei
Abstract:
Adversarial training (AT) is an effective technique for enhancing adversarial robustness, but it usually comes at the cost of a decline in generalization ability. Recent studies have attempted to use clean training to assist adversarial training, yet there are contradictions among the conclusions. We comprehensively summarize the representative strategies and, with a focus on the multi - view hypo…
▽ More
Adversarial training (AT) is an effective technique for enhancing adversarial robustness, but it usually comes at the cost of a decline in generalization ability. Recent studies have attempted to use clean training to assist adversarial training, yet there are contradictions among the conclusions. We comprehensively summarize the representative strategies and, with a focus on the multi - view hypothesis, provide a unified explanation for the contradictory phenomena among different studies. In addition, we conduct an in - depth analysis of the knowledge combinations transferred from clean - trained models to adversarially - trained models in previous studies, and find that they can be divided into two categories: reducing the learning difficulty and providing correct guidance. Based on this finding, we propose a new idea of leveraging clean training to further improve the performance of advanced AT methods.We reveal that the problem of generalization degradation faced by AT partly stems from the difficulty of adversarial training in learning certain sample features, and this problem can be alleviated by making full use of clean training.
△ Less
Submitted 30 March, 2025;
originally announced April 2025.
-
Which2comm: An Efficient Collaborative Perception Framework for 3D Object Detection
Authors:
Duanrui Yu,
Jing You,
Xin Pei,
Anqi Qu,
Dingyu Wang,
Shaocheng Jia
Abstract:
Collaborative perception allows real-time inter-agent information exchange and thus offers invaluable opportunities to enhance the perception capabilities of individual agents. However, limited communication bandwidth in practical scenarios restricts the inter-agent data transmission volume, consequently resulting in performance declines in collaborative perception systems. This implies a trade-of…
▽ More
Collaborative perception allows real-time inter-agent information exchange and thus offers invaluable opportunities to enhance the perception capabilities of individual agents. However, limited communication bandwidth in practical scenarios restricts the inter-agent data transmission volume, consequently resulting in performance declines in collaborative perception systems. This implies a trade-off between perception performance and communication cost. To address this issue, we propose Which2comm, a novel multi-agent 3D object detection framework leveraging object-level sparse features. By integrating semantic information of objects into 3D object detection boxes, we introduce semantic detection boxes (SemDBs). Innovatively transmitting these information-rich object-level sparse features among agents not only significantly reduces the demanding communication volume, but also improves 3D object detection performance. Specifically, a fully sparse network is constructed to extract SemDBs from individual agents; a temporal fusion approach with a relative temporal encoding mechanism is utilized to obtain the comprehensive spatiotemporal features. Extensive experiments on the V2XSet and OPV2V datasets demonstrate that Which2comm consistently outperforms other state-of-the-art methods on both perception performance and communication cost, exhibiting better robustness to real-world latency. These results present that for multi-agent collaborative 3D object detection, transmitting only object-level sparse features is sufficient to achieve high-precision and robust performance.
△ Less
Submitted 25 March, 2025; v1 submitted 21 March, 2025;
originally announced March 2025.
-
Multi-view Fake News Detection Model Based on Dynamic Hypergraph
Authors:
Rongping Ye,
Xiaobing Pei
Abstract:
With the rapid development of online social networks and the inadequacies in content moderation mechanisms, the detection of fake news has emerged as a pressing concern for the public. Various methods have been proposed for fake news detection, including text-based approaches as well as a series of graph-based approaches. However, the deceptive nature of fake news renders text-based approaches les…
▽ More
With the rapid development of online social networks and the inadequacies in content moderation mechanisms, the detection of fake news has emerged as a pressing concern for the public. Various methods have been proposed for fake news detection, including text-based approaches as well as a series of graph-based approaches. However, the deceptive nature of fake news renders text-based approaches less effective. Propagation tree-based methods focus on the propagation process of individual news, capturing pairwise relationships but lacking the capability to capture high-order complex relationships. Large heterogeneous graph-based approaches necessitate the incorporation of substantial additional information beyond news text and user data, while hypergraph-based approaches rely on predefined hypergraph structures. To tackle these issues, we propose a novel dynamic hypergraph-based multi-view fake news detection model (DHy-MFND) that learns news embeddings across three distinct views: text-level, propagation tree-level, and hypergraph-level. By employing hypergraph structures to model complex high-order relationships among multiple news pieces and introducing dynamic hypergraph structure learning, we optimize predefined hypergraph structures while learning news embeddings. Additionally, we introduce contrastive learning to capture authenticity-relevant embeddings across different views. Extensive experiments on two benchmark datasets demonstrate the effectiveness of our proposed DHy-MFND compared with a broad range of competing baselines.
△ Less
Submitted 26 December, 2024;
originally announced December 2024.
-
AHSG: Adversarial Attack on High-level Semantics in Graph Neural Networks
Authors:
Kai Yuan,
Jiahao Zhang,
Yidi Wang,
Xiaobing Pei
Abstract:
Adversarial attacks on Graph Neural Networks aim to perturb the performance of the learner by carefully modifying the graph topology and node attributes. Existing methods achieve attack stealthiness by constraining the modification budget and differences in graph properties. However, these methods typically disrupt task-relevant primary semantics directly, which results in low defensibility and de…
▽ More
Adversarial attacks on Graph Neural Networks aim to perturb the performance of the learner by carefully modifying the graph topology and node attributes. Existing methods achieve attack stealthiness by constraining the modification budget and differences in graph properties. However, these methods typically disrupt task-relevant primary semantics directly, which results in low defensibility and detectability of the attack. In this paper, we propose an Adversarial Attack on High-level Semantics for Graph Neural Networks (AHSG), which is a graph structure attack model that ensures the retention of primary semantics. By combining latent representations with shared primary semantics, our model retains detectable attributes and relational patterns of the original graph while leveraging more subtle changes to carry out the attack. Then we use the Projected Gradient Descent algorithm to map the latent representations with attack effects to the adversarial graph. Through experiments on robust graph deep learning models equipped with defense strategies, we demonstrate that AHSG outperforms other state-of-the-art methods in attack effectiveness. Additionally, using Contextual Stochastic Block Models to detect the attacked graph further validates that our method preserves the primary semantics of the graph.
△ Less
Submitted 17 April, 2025; v1 submitted 10 December, 2024;
originally announced December 2024.
-
Cross-Self KV Cache Pruning for Efficient Vision-Language Inference
Authors:
Xiaohuan Pei,
Tao Huang,
Chang Xu
Abstract:
KV cache pruning has emerged as a promising technique for reducing memory and computation costs in long-context auto-regressive generation. Existing methods for vision-language models (VLMs) typically rely on self-attention scores from large language models (LLMs) to identify and prune irrelevant tokens. However, these approaches overlook the inherent distributional discrepancies between modalitie…
▽ More
KV cache pruning has emerged as a promising technique for reducing memory and computation costs in long-context auto-regressive generation. Existing methods for vision-language models (VLMs) typically rely on self-attention scores from large language models (LLMs) to identify and prune irrelevant tokens. However, these approaches overlook the inherent distributional discrepancies between modalities, often leading to inaccurate token importance estimation and the over-pruning of critical visual tokens. To address this, we propose decomposing attention scores into intra-modality attention (within the same modality) and inter-modality attention (across modalities), enabling more precise KV cache pruning by independently managing these distinct attention types. Additionally, we introduce an n-softmax function to counteract distribution shifts caused by pruning, preserving the original smoothness of attention scores and ensuring stable performance. Our final training-free method, \textbf{C}ross-\textbf{S}elf \textbf{P}runing (CSP), achieves competitive performance compared to models with full KV caches while significantly outperforming previous pruning methods. Extensive evaluations on MileBench, a benchmark encompassing 29 multimodal datasets, demonstrate CSP's effectiveness, achieving up to a 41\% performance improvement on challenging tasks like conversational embodied dialogue while reducing the KV cache budget by 13.6\%. The code is available at https://github.com/TerryPei/CSP
△ Less
Submitted 5 December, 2024;
originally announced December 2024.
-
CVVLSNet: Vehicle Location and Speed Estimation Using Partial Connected Vehicle Trajectory Data
Authors:
Jiachen Ye,
Dingyu Wang,
Shaocheng Jia,
Xin Pei,
Zi Yang,
Yi Zhang,
S. C. Wong
Abstract:
Real-time estimation of vehicle locations and speeds is crucial for developing many beneficial transportation applications in traffic management and control, e.g., adaptive signal control. Recent advances in communication technologies facilitate the emergence of connected vehicles (CVs), which can share traffic information with nearby CVs or infrastructures. At the early stage of connectivity, onl…
▽ More
Real-time estimation of vehicle locations and speeds is crucial for developing many beneficial transportation applications in traffic management and control, e.g., adaptive signal control. Recent advances in communication technologies facilitate the emergence of connected vehicles (CVs), which can share traffic information with nearby CVs or infrastructures. At the early stage of connectivity, only a portion of vehicles are CVs. The locations and speeds for those non-CVs (NCs) are not accessible and must be estimated to obtain the full traffic information. To address the above problem, this paper proposes a novel CV-based Vehicle Location and Speed estimation network, CVVLSNet, to simultaneously estimate the vehicle locations and speeds exclusively using partial CV trajectory data. A road cell occupancy (RCO) method is first proposed to represent the variable vehicle state information. Spatiotemporal interactions can be integrated by simply fusing the RCO representations. Then, CVVLSNet, taking the Coding-RAte TransformEr (CRATE) network as a backbone, is introduced to estimate the vehicle locations and speeds. Moreover, physical vehicle size constraints are also considered in loss functions. Extensive experiments indicate that the proposed method significantly outperformed the existing method under various CV penetration rates, signal timings, and volume-to-capacity ratios.
△ Less
Submitted 30 September, 2024;
originally announced October 2024.
-
CCDepth: A Lightweight Self-supervised Depth Estimation Network with Enhanced Interpretability
Authors:
Xi Zhang,
Yaru Xue,
Shaocheng Jia,
Xin Pei
Abstract:
Self-supervised depth estimation, which solely requires monocular image sequence as input, has become increasingly popular and promising in recent years. Current research primarily focuses on enhancing the prediction accuracy of the models. However, the excessive number of parameters impedes the universal deployment of the model on edge devices. Moreover, the emerging neural networks, being black-…
▽ More
Self-supervised depth estimation, which solely requires monocular image sequence as input, has become increasingly popular and promising in recent years. Current research primarily focuses on enhancing the prediction accuracy of the models. However, the excessive number of parameters impedes the universal deployment of the model on edge devices. Moreover, the emerging neural networks, being black-box models, are difficult to analyze, leading to challenges in understanding the rationales for performance improvements. To mitigate these issues, this study proposes a novel hybrid self-supervised depth estimation network, CCDepth, comprising convolutional neural networks (CNNs) and the white-box CRATE (Coding RAte reduction TransformEr) network. This novel network uses CNNs and the CRATE modules to extract local and global information in images, respectively, thereby boosting learning efficiency and reducing model size. Furthermore, incorporating the CRATE modules into the network enables a mathematically interpretable process in capturing global features. Extensive experiments on the KITTI dataset indicate that the proposed CCDepth network can achieve performance comparable with those state-of-the-art methods, while the model size has been significantly reduced. In addition, a series of quantitative and qualitative analyses on the inner features in the CCDepth network further confirm the effectiveness of the proposed method.
△ Less
Submitted 30 September, 2024;
originally announced September 2024.
-
Multi-V2X: A Large Scale Multi-modal Multi-penetration-rate Dataset for Cooperative Perception
Authors:
Rongsong Li,
Xin Pei
Abstract:
Cooperative perception through vehicle-to-everything (V2X) has garnered significant attention in recent years due to its potential to overcome occlusions and enhance long-distance perception. Great achievements have been made in both datasets and algorithms. However, existing real-world datasets are limited by the presence of few communicable agents, while synthetic datasets typically cover only v…
▽ More
Cooperative perception through vehicle-to-everything (V2X) has garnered significant attention in recent years due to its potential to overcome occlusions and enhance long-distance perception. Great achievements have been made in both datasets and algorithms. However, existing real-world datasets are limited by the presence of few communicable agents, while synthetic datasets typically cover only vehicles. More importantly, the penetration rate of connected and autonomous vehicles (CAVs) , a critical factor for the deployment of cooperative perception technologies, has not been adequately addressed. To tackle these issues, we introduce Multi-V2X, a large-scale, multi-modal, multi-penetration-rate dataset for V2X perception. By co-simulating SUMO and CARLA, we equip a substantial number of cars and roadside units (RSUs) in simulated towns with sensor suites, and collect comprehensive sensing data. Datasets with specified CAV penetration rates can be obtained by masking some equipped cars as normal vehicles. In total, our Multi-V2X dataset comprises 549k RGB frames, 146k LiDAR frames, and 4,219k annotated 3D bounding boxes across six categories. The highest possible CAV penetration rate reaches 86.21%, with up to 31 agents in communication range, posing new challenges in selecting agents to collaborate with. We provide comprehensive benchmarks for cooperative 3D object detection tasks. Our data and code are available at https://github.com/RadetzkyLi/Multi-V2X .
△ Less
Submitted 8 September, 2024;
originally announced September 2024.
-
Transferable Adversarial Facial Images for Privacy Protection
Authors:
Minghui Li,
Jiangxiong Wang,
Hao Zhang,
Ziqi Zhou,
Shengshan Hu,
Xiaobing Pei
Abstract:
The success of deep face recognition (FR) systems has raised serious privacy concerns due to their ability to enable unauthorized tracking of users in the digital world. Previous studies proposed introducing imperceptible adversarial noises into face images to deceive those face recognition models, thus achieving the goal of enhancing facial privacy protection. Nevertheless, they heavily rely on u…
▽ More
The success of deep face recognition (FR) systems has raised serious privacy concerns due to their ability to enable unauthorized tracking of users in the digital world. Previous studies proposed introducing imperceptible adversarial noises into face images to deceive those face recognition models, thus achieving the goal of enhancing facial privacy protection. Nevertheless, they heavily rely on user-chosen references to guide the generation of adversarial noises, and cannot simultaneously construct natural and highly transferable adversarial face images in black-box scenarios. In light of this, we present a novel face privacy protection scheme with improved transferability while maintain high visual quality. We propose shaping the entire face space directly instead of exploiting one kind of facial characteristic like makeup information to integrate adversarial noises. To achieve this goal, we first exploit global adversarial latent search to traverse the latent space of the generative model, thereby creating natural adversarial face images with high transferability. We then introduce a key landmark regularization module to preserve the visual identity information. Finally, we investigate the impacts of various kinds of latent spaces and find that $\mathcal{F}$ latent space benefits the trade-off between visual naturalness and adversarial transferability. Extensive experiments over two datasets demonstrate that our approach significantly enhances attack transferability while maintaining high visual quality, outperforming state-of-the-art methods by an average 25% improvement in deep FR models and 10% improvement on commercial FR APIs, including Face++, Aliyun, and Tencent.
△ Less
Submitted 17 July, 2024;
originally announced August 2024.
-
Fully Exploiting Every Real Sample: SuperPixel Sample Gradient Model Stealing
Authors:
Yunlong Zhao,
Xiaoheng Deng,
Yijing Liu,
Xinjun Pei,
Jiazhi Xia,
Wei Chen
Abstract:
Model stealing (MS) involves querying and observing the output of a machine learning model to steal its capabilities. The quality of queried data is crucial, yet obtaining a large amount of real data for MS is often challenging. Recent works have reduced reliance on real data by using generative models. However, when high-dimensional query data is required, these methods are impractical due to the…
▽ More
Model stealing (MS) involves querying and observing the output of a machine learning model to steal its capabilities. The quality of queried data is crucial, yet obtaining a large amount of real data for MS is often challenging. Recent works have reduced reliance on real data by using generative models. However, when high-dimensional query data is required, these methods are impractical due to the high costs of querying and the risk of model collapse. In this work, we propose using sample gradients (SG) to enhance the utility of each real sample, as SG provides crucial guidance on the decision boundaries of the victim model. However, utilizing SG in the model stealing scenario faces two challenges: 1. Pixel-level gradient estimation requires extensive query volume and is susceptible to defenses. 2. The estimation of sample gradients has a significant variance. This paper proposes Superpixel Sample Gradient stealing (SPSG) for model stealing under the constraint of limited real samples. With the basic idea of imitating the victim model's low-variance patch-level gradients instead of pixel-level gradients, SPSG achieves efficient sample gradient estimation through two steps. First, we perform patch-wise perturbations on query images to estimate the average gradient in different regions of the image. Then, we filter the gradients through a threshold strategy to reduce variance. Exhaustive experiments demonstrate that, with the same number of real samples, SPSG achieves accuracy, agreements, and adversarial success rate significantly surpassing the current state-of-the-art MS methods. Codes are available at https://github.com/zyl123456aB/SPSG_attack.
△ Less
Submitted 18 May, 2024;
originally announced June 2024.
-
IENE: Identifying and Extrapolating the Node Environment for Out-of-Distribution Generalization on Graphs
Authors:
Haoran Yang,
Xiaobing Pei,
Kai Yuan
Abstract:
Due to the performance degradation of graph neural networks (GNNs) under distribution shifts, the work on out-of-distribution (OOD) generalization on graphs has received widespread attention. A novel perspective involves distinguishing potential confounding biases from different environments through environmental identification, enabling the model to escape environmentally-sensitive correlations a…
▽ More
Due to the performance degradation of graph neural networks (GNNs) under distribution shifts, the work on out-of-distribution (OOD) generalization on graphs has received widespread attention. A novel perspective involves distinguishing potential confounding biases from different environments through environmental identification, enabling the model to escape environmentally-sensitive correlations and maintain stable performance under distribution shifts. However, in graph data, confounding factors not only affect the generation process of node features but also influence the complex interaction between nodes. We observe that neglecting either aspect of them will lead to a decrease in performance. In this paper, we propose IENE, an OOD generalization method on graphs based on node-level environmental identification and extrapolation techniques. It strengthens the model's ability to extract invariance from two granularities simultaneously, leading to improved generalization. Specifically, to identify invariance in features, we utilize the disentangled information bottleneck framework to achieve mutual promotion between node-level environmental estimation and invariant feature learning. Furthermore, we extrapolate topological environments through graph augmentation techniques to identify structural invariance. We implement the conceptual method with specific algorithms and provide theoretical analysis and proofs for our approach. Extensive experimental evaluations on two synthetic and four real-world OOD datasets validate the superiority of IENE, which outperforms existing techniques and provides a flexible framework for enhancing the generalization of GNNs.
△ Less
Submitted 2 June, 2024;
originally announced June 2024.
-
QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge
Authors:
Hongwei Bran Li,
Fernando Navarro,
Ivan Ezhov,
Amirhossein Bayat,
Dhritiman Das,
Florian Kofler,
Suprosanna Shit,
Diana Waldmannstetter,
Johannes C. Paetzold,
Xiaobin Hu,
Benedikt Wiestler,
Lucas Zimmer,
Tamaz Amiranashvili,
Chinmay Prabhakar,
Christoph Berger,
Jonas Weidner,
Michelle Alonso-Basant,
Arif Rashid,
Ujjwal Baid,
Wesam Adel,
Deniz Ali,
Bhakti Baheti,
Yingbin Bai,
Ishaan Bhatt,
Sabri Can Cetindag
, et al. (55 additional authors not shown)
Abstract:
Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de…
▽ More
Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the development and evaluation of automated segmentation algorithms. Accurately modeling and quantifying this variability is essential for enhancing the robustness and clinical applicability of these algorithms. We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ), which was organized in conjunction with International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2020 and 2021. The challenge focuses on the uncertainty quantification of medical image segmentation which considers the omnipresence of inter-rater variability in imaging datasets. The large collection of images with multi-rater annotations features various modalities such as MRI and CT; various organs such as the brain, prostate, kidney, and pancreas; and different image dimensions 2D-vs-3D. A total of 24 teams submitted different solutions to the problem, combining various baseline models, Bayesian neural networks, and ensemble model techniques. The obtained results indicate the importance of the ensemble models, as well as the need for further research to develop efficient 3D methods for uncertainty quantification methods in 3D segmentation tasks.
△ Less
Submitted 24 June, 2024; v1 submitted 19 March, 2024;
originally announced May 2024.
-
AmBC-NOMA-Aided Short-Packet Communication for High Mobility V2X Transmissions
Authors:
Xinyue Pei,
Xingwei Wang,
Yingyang Chen,
Tingrui Pei,
Miaowen Wen
Abstract:
In this paper, we investigate the performance of ambient backscatter communication non-orthogonal multiple access (AmBC-NOMA)-assisted short packet communication for high-mobility vehicle-to-everything transmissions. In the proposed system, a roadside unit (RSU) transmits a superimposed signal to a typical NOMA user pair. Simultaneously, the backscatter device (BD) transmits its own signal towards…
▽ More
In this paper, we investigate the performance of ambient backscatter communication non-orthogonal multiple access (AmBC-NOMA)-assisted short packet communication for high-mobility vehicle-to-everything transmissions. In the proposed system, a roadside unit (RSU) transmits a superimposed signal to a typical NOMA user pair. Simultaneously, the backscatter device (BD) transmits its own signal towards the user pair by reflecting and modulating the RSU's superimposed signals. Due to vehicles' mobility, we consider realistic assumptions of time-selective fading and channel estimation errors. Theoretical expressions for the average block error rates (BLERs) of both users are derived. Furthermore, analysis and insights on transmit signal-to-noise ratio, vehicles' mobility, imperfect channel estimation, the reflection efficiency at the BD, and blocklength are provided. Numerical results validate the theoretical findings and reveal that the AmBC-NOMA system outperforms its orthogonal multiple access counterpart in terms of BLER performance.
△ Less
Submitted 26 May, 2024;
originally announced May 2024.
-
EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba
Authors:
Xiaohuan Pei,
Tao Huang,
Chang Xu
Abstract:
Prior efforts in light-weight model development mainly centered on CNN and Transformer-based designs yet faced persistent challenges. CNNs adept at local feature extraction compromise resolution while Transformers offer global reach but escalate computational demands $\mathcal{O}(N^2)$. This ongoing trade-off between accuracy and efficiency remains a significant hurdle. Recently, state space model…
▽ More
Prior efforts in light-weight model development mainly centered on CNN and Transformer-based designs yet faced persistent challenges. CNNs adept at local feature extraction compromise resolution while Transformers offer global reach but escalate computational demands $\mathcal{O}(N^2)$. This ongoing trade-off between accuracy and efficiency remains a significant hurdle. Recently, state space models (SSMs), such as Mamba, have shown outstanding performance and competitiveness in various tasks such as language modeling and computer vision, while reducing the time complexity of global information extraction to $\mathcal{O}(N)$. Inspired by this, this work proposes to explore the potential of visual state space models in light-weight model design and introduce a novel efficient model variant dubbed EfficientVMamba. Concretely, our EfficientVMamba integrates a atrous-based selective scan approach by efficient skip sampling, constituting building blocks designed to harness both global and local representational features. Additionally, we investigate the integration between SSM blocks and convolutions, and introduce an efficient visual state space block combined with an additional convolution branch, which further elevate the model performance. Experimental results show that, EfficientVMamba scales down the computational complexity while yields competitive results across a variety of vision tasks. For example, our EfficientVMamba-S with $1.3$G FLOPs improves Vim-Ti with $1.5$G FLOPs by a large margin of $5.6\%$ accuracy on ImageNet. Code is available at: \url{https://github.com/TerryPei/EfficientVMamba}.
△ Less
Submitted 14 March, 2024;
originally announced March 2024.
-
LocalMamba: Visual State Space Model with Windowed Selective Scan
Authors:
Tao Huang,
Xiaohuan Pei,
Shan You,
Fei Wang,
Chen Qian,
Chang Xu
Abstract:
Recent advancements in state space models, notably Mamba, have demonstrated significant progress in modeling long sequences for tasks like language understanding. Yet, their application in vision tasks has not markedly surpassed the performance of traditional Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). This paper posits that the key to enhancing Vision Mamba (ViM) lies in…
▽ More
Recent advancements in state space models, notably Mamba, have demonstrated significant progress in modeling long sequences for tasks like language understanding. Yet, their application in vision tasks has not markedly surpassed the performance of traditional Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). This paper posits that the key to enhancing Vision Mamba (ViM) lies in optimizing scan directions for sequence modeling. Traditional ViM approaches, which flatten spatial tokens, overlook the preservation of local 2D dependencies, thereby elongating the distance between adjacent tokens. We introduce a novel local scanning strategy that divides images into distinct windows, effectively capturing local dependencies while maintaining a global perspective. Additionally, acknowledging the varying preferences for scan patterns across different network layers, we propose a dynamic method to independently search for the optimal scan choices for each layer, substantially improving performance. Extensive experiments across both plain and hierarchical models underscore our approach's superiority in effectively capturing image representations. For example, our model significantly outperforms Vim-Ti by 3.1% on ImageNet with the same 1.5G FLOPs. Code is available at: https://github.com/hunto/LocalMamba.
△ Less
Submitted 14 March, 2024;
originally announced March 2024.
-
Hyperedge Interaction-aware Hypergraph Neural Network
Authors:
Rongping Ye,
Xiaobing Pei,
Haoran Yang,
Ruiqi Wang
Abstract:
Hypergraphs provide an effective modeling approach for modeling high-order relationships in many real-world datasets. To capture such complex relationships, several hypergraph neural networks have been proposed for learning hypergraph structure, which propagate information from nodes to hyperedges and then from hyperedges back to nodes. However, most existing methods focus on information propagati…
▽ More
Hypergraphs provide an effective modeling approach for modeling high-order relationships in many real-world datasets. To capture such complex relationships, several hypergraph neural networks have been proposed for learning hypergraph structure, which propagate information from nodes to hyperedges and then from hyperedges back to nodes. However, most existing methods focus on information propagation between hyperedges and nodes, neglecting the interactions among hyperedges themselves. In this paper, we propose HeIHNN, a hyperedge interaction-aware hypergraph neural network, which captures the interactions among hyperedges during the convolution process and introduce a novel mechanism to enhance information flow between hyperedges and nodes. Specifically, HeIHNN integrates the interactions between hyperedges into the hypergraph convolution by constructing a three-stage information propagation process. After propagating information from nodes to hyperedges, we introduce a hyperedge-level convolution to update the hyperedge embeddings. Finally, the embeddings that capture rich information from the interaction among hyperedges will be utilized to update the node embeddings. Additionally, we introduce a hyperedge outlier removal mechanism in the information propagation stages between nodes and hyperedges, which dynamically adjusts the hypergraph structure using the learned embeddings, effectively removing outliers. Extensive experiments conducted on real-world datasets show the competitive performance of HeIHNN compared with state-of-the-art methods.
△ Less
Submitted 5 April, 2024; v1 submitted 28 January, 2024;
originally announced January 2024.
-
Semi-supervised learning via DQN for log anomaly detection
Authors:
Yingying He,
Xiaobing Pei
Abstract:
Log anomaly detection is a critical component in modern software system security and maintenance, serving as a crucial support and basis for system monitoring, operation, and troubleshooting. It aids operations personnel in timely identification and resolution of issues. However, current methods in log anomaly detection still face challenges such as underutilization of unlabeled data, imbalance be…
▽ More
Log anomaly detection is a critical component in modern software system security and maintenance, serving as a crucial support and basis for system monitoring, operation, and troubleshooting. It aids operations personnel in timely identification and resolution of issues. However, current methods in log anomaly detection still face challenges such as underutilization of unlabeled data, imbalance between normal and anomaly class data, and high rates of false positives and false negatives, leading to insufficient effectiveness in anomaly recognition. In this study, we propose a semi-supervised log anomaly detection method named DQNLog, which integrates deep reinforcement learning to enhance anomaly detection performance by leveraging a small amount of labeled data and large-scale unlabeled data. To address issues of imbalanced data and insufficient labeling, we design a state transition function biased towards anomalies based on cosine similarity, aiming to capture semantic-similar anomalies rather than favoring the majority class. To enhance the model's capability in learning anomalies, we devise a joint reward function that encourages the model to utilize labeled anomalies and explore unlabeled anomalies, thereby reducing false positives and false negatives. Additionally, to prevent the model from deviating from normal trajectories due to misestimation, we introduce a regularization term in the loss function to ensure the model retains prior knowledge during updates. We evaluate DQNLog on three widely used datasets, demonstrating its ability to effectively utilize large-scale unlabeled data and achieve promising results across all experimental datasets.
△ Less
Submitted 30 July, 2024; v1 submitted 6 January, 2024;
originally announced January 2024.
-
Deeper and Wider Networks for Performance Metrics Prediction in Communication Networks
Authors:
Aijia Liu,
Shiqing Liu,
Xiaobing Pei
Abstract:
In today's era, users have increasingly high expectations regarding the performance and efficiency of communication networks. Network operators aspire to achieve efficient network planning, operation, and optimization through Digital Twin Networks (DTN). The effectiveness of DTN heavily relies on the network model, with graph neural networks (GNN) playing a crucial role in network modeling. Howeve…
▽ More
In today's era, users have increasingly high expectations regarding the performance and efficiency of communication networks. Network operators aspire to achieve efficient network planning, operation, and optimization through Digital Twin Networks (DTN). The effectiveness of DTN heavily relies on the network model, with graph neural networks (GNN) playing a crucial role in network modeling. However, existing network modeling methods still lack a comprehensive understanding of communication networks. In this paper, we propose DWNet (Deeper and Wider Networks), a heterogeneous graph neural network modeling method based on data-driven approaches that aims to address end-to-end latency and jitter prediction in network models. This method stands out due to two distinctive features: firstly, it introduces deeper levels of state participation in the message passing process; secondly, it extensively integrates relevant features during the feature fusion process. Through experimental validation and evaluation, our model achieves higher prediction accuracy compared to previous research achievements, particularly when dealing with unseen network topologies during model training. Our model not only provides more accurate predictions but also demonstrates stronger generalization capabilities across diverse topological structures.
△ Less
Submitted 31 December, 2023;
originally announced January 2024.
-
HC-Ref: Hierarchical Constrained Refinement for Robust Adversarial Training of GNNs
Authors:
Xiaobing Pei,
Haoran Yang,
Gang Shen
Abstract:
Recent studies have shown that attackers can catastrophically reduce the performance of GNNs by maliciously modifying the graph structure or node features on the graph. Adversarial training, which has been shown to be one of the most effective defense mechanisms against adversarial attacks in computer vision, holds great promise for enhancing the robustness of GNNs. There is limited research on de…
▽ More
Recent studies have shown that attackers can catastrophically reduce the performance of GNNs by maliciously modifying the graph structure or node features on the graph. Adversarial training, which has been shown to be one of the most effective defense mechanisms against adversarial attacks in computer vision, holds great promise for enhancing the robustness of GNNs. There is limited research on defending against attacks by performing adversarial training on graphs, and it is crucial to delve deeper into this approach to optimize its effectiveness. Therefore, based on robust adversarial training on graphs, we propose a hierarchical constraint refinement framework (HC-Ref) that enhances the anti-perturbation capabilities of GNNs and downstream classifiers separately, ultimately leading to improved robustness. We propose corresponding adversarial regularization terms that are conducive to adaptively narrowing the domain gap between the normal part and the perturbation part according to the characteristics of different layers, promoting the smoothness of the predicted distribution of both parts. Moreover, existing research on graph robust adversarial training primarily concentrates on training from the standpoint of node feature perturbations and seldom takes into account alterations in the graph structure. This limitation makes it challenging to prevent attacks based on topological changes in the graph. This paper generates adversarial examples by utilizing graph structure perturbations, offering an effective approach to defend against attack methods that are based on topological changes. Extensive experiments on two real-world graph benchmarks show that HC-Ref successfully resists various attacks and has better node classification performance compared to several baseline methods.
△ Less
Submitted 8 December, 2023;
originally announced December 2023.
-
Unify Change Point Detection and Segment Classification in a Regression Task for Transportation Mode Identification
Authors:
Rongsong Li,
Xin Pei
Abstract:
Identifying travelers' transportation modes is important in transportation science and location-based services. It's appealing for researchers to leverage GPS trajectory data to infer transportation modes with the popularity of GPS-enabled devices, e.g., smart phones. Existing studies frame this problem as classification task. The dominant two-stage studies divide the trip into single-one mode seg…
▽ More
Identifying travelers' transportation modes is important in transportation science and location-based services. It's appealing for researchers to leverage GPS trajectory data to infer transportation modes with the popularity of GPS-enabled devices, e.g., smart phones. Existing studies frame this problem as classification task. The dominant two-stage studies divide the trip into single-one mode segments first and then categorize these segments. The over segmentation strategy and inevitable error propagation bring difficulties to classification stage and make optimizing the whole system hard. The recent one-stage works throw out trajectory segmentation entirely to avoid these by directly conducting point-wise classification for the trip, whereas leaving predictions dis-continuous. To solve above-mentioned problems, inspired by YOLO and SSD in object detection, we propose to reframe change point detection and segment classification as a unified regression task instead of the existing classification task. We directly regress coordinates of change points and classify associated segments. In this way, our method divides the trip into segments under a supervised manner and leverage more contextual information, obtaining predictions with high accuracy and continuity. Two frameworks, TrajYOLO and TrajSSD, are proposed to solve the regression task and various feature extraction backbones are exploited. Exhaustive experiments on GeoLife dataset show that the proposed method has competitive overall identification accuracy of 0.853 when distinguishing five modes: walk, bike, bus, car, train. As for change point detection, our method increases precision at the cost of drop in recall. All codes are available at https://github.com/RadetzkyLi/TrajYOLO-SSD.
△ Less
Submitted 7 December, 2023;
originally announced December 2023.
-
Neural Architecture Retrieval
Authors:
Xiaohuan Pei,
Yanxi Li,
Minjing Dong,
Chang Xu
Abstract:
With the increasing number of new neural architecture designs and substantial existing neural architectures, it becomes difficult for the researchers to situate their contributions compared with existing neural architectures or establish the connections between their designs and other relevant ones. To discover similar neural architectures in an efficient and automatic manner, we define a new prob…
▽ More
With the increasing number of new neural architecture designs and substantial existing neural architectures, it becomes difficult for the researchers to situate their contributions compared with existing neural architectures or establish the connections between their designs and other relevant ones. To discover similar neural architectures in an efficient and automatic manner, we define a new problem Neural Architecture Retrieval which retrieves a set of existing neural architectures which have similar designs to the query neural architecture. Existing graph pre-training strategies cannot address the computational graph in neural architectures due to the graph size and motifs. To fulfill this potential, we propose to divide the graph into motifs which are used to rebuild the macro graph to tackle these issues, and introduce multi-level contrastive learning to achieve accurate graph representation learning. Extensive evaluations on both human-designed and synthesized neural architectures demonstrate the superiority of our algorithm. Such a dataset which contains 12k real-world network architectures, as well as their embedding, is built for neural architecture retrieval.
△ Less
Submitted 17 March, 2024; v1 submitted 15 July, 2023;
originally announced July 2023.
-
GPT Self-Supervision for a Better Data Annotator
Authors:
Xiaohuan Pei,
Yanxi Li,
Chang Xu
Abstract:
The task of annotating data into concise summaries poses a significant challenge across various domains, frequently requiring the allocation of significant time and specialized knowledge by human experts. Despite existing efforts to use large language models for annotation tasks, significant problems such as limited applicability to unlabeled data, the absence of self-supervised methods, and the l…
▽ More
The task of annotating data into concise summaries poses a significant challenge across various domains, frequently requiring the allocation of significant time and specialized knowledge by human experts. Despite existing efforts to use large language models for annotation tasks, significant problems such as limited applicability to unlabeled data, the absence of self-supervised methods, and the lack of focus on complex structured data still persist. In this work, we propose a GPT self-supervision annotation method, which embodies a generating-recovering paradigm that leverages the one-shot learning capabilities of the Generative Pretrained Transformer (GPT). The proposed approach comprises a one-shot tuning phase followed by a generation phase. In the one-shot tuning phase, we sample a data from the support set as part of the prompt for GPT to generate a textual summary, which is then used to recover the original data. The alignment score between the recovered and original data serves as a self-supervision navigator to refine the process. In the generation stage, the optimally selected one-shot sample serves as a template in the prompt and is applied to generating summaries from challenging datasets. The annotation performance is evaluated by tuning several human feedback reward networks and by calculating alignment scores between original and recovered data at both sentence and structure levels. Our self-supervised annotation method consistently achieves competitive scores, convincingly demonstrating its robust strength in various data-to-summary annotation tasks.
△ Less
Submitted 8 June, 2023; v1 submitted 7 June, 2023;
originally announced June 2023.
-
Exploring a multi_stage feedback teaching mode for graduate students of software engineering discipline based on project_driven competition
Authors:
Xiangdong Pei,
Rui Zhang
Abstract:
Aiming at the current problems of theory-oriented,practice-light,and lack of innovation ability in the teaching of postgraduate software engineering courses,a multi-stage feedback teaching mode for software engineering postgraduates based on competition project_driven is proposed. The model is driven by the competition project,and implementing suggestions are given in terms of stage allocation of…
▽ More
Aiming at the current problems of theory-oriented,practice-light,and lack of innovation ability in the teaching of postgraduate software engineering courses,a multi-stage feedback teaching mode for software engineering postgraduates based on competition project_driven is proposed. The model is driven by the competition project,and implementing suggestions are given in terms of stage allocation of software engineering course tasks and ability cultivation,competition case design and process evaluation improvement,etc. Through the implementation of this teaching mode,students enthusiasm and initiative are expected to be stimulated,and the overall development of students professional skills and comprehension ability would be improved to meet the demand of society for software engineering technical talents.
△ Less
Submitted 19 December, 2022;
originally announced December 2022.
-
TraInterSim: Adaptive and Planning-Aware Hybrid-Driven Traffic Intersection Simulation
Authors:
Pei Lv,
Xinming Pei,
Xinyu Ren,
Yuzhen Zhang,
Chaochao Li,
Mingliang Xu
Abstract:
Traffic intersections are important scenes that can be seen almost everywhere in the traffic system. Currently, most simulation methods perform well at highways and urban traffic networks. In intersection scenarios, the challenge lies in the lack of clearly defined lanes, where agents with various motion plannings converge in the central area from different directions. Traditional model-based meth…
▽ More
Traffic intersections are important scenes that can be seen almost everywhere in the traffic system. Currently, most simulation methods perform well at highways and urban traffic networks. In intersection scenarios, the challenge lies in the lack of clearly defined lanes, where agents with various motion plannings converge in the central area from different directions. Traditional model-based methods are difficult to drive agents to move realistically at intersections without enough predefined lanes, while data-driven methods often require a large amount of high-quality input data. Simultaneously, tedious parameter tuning is inevitable involved to obtain the desired simulation results. In this paper, we present a novel adaptive and planning-aware hybrid-driven method (TraInterSim) to simulate traffic intersection scenarios. Our hybrid-driven method combines an optimization-based data-driven scheme with a velocity continuity model. It guides the agent's movements using real-world data and can generate those behaviors not present in the input data. Our optimization method fully considers velocity continuity, desired speed, direction guidance, and planning-aware collision avoidance. Agents can perceive others' motion planning and relative distance to avoid possible collisions. To preserve the individual flexibility of different agents, the parameters in our method are automatically adjusted during the simulation. TraInterSim can generate realistic behaviors of heterogeneous agents in different traffic intersection scenarios in interactive rates. Through extensive experiments as well as user studies, we validate the effectiveness and rationality of the proposed simulation method.
△ Less
Submitted 5 April, 2023; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Unrestricted Adversarial Samples Based on Non-semantic Feature Clusters Substitution
Authors:
MingWei Zhou,
Xiaobing Pei
Abstract:
Most current methods generate adversarial examples with the $L_p$ norm specification. As a result, many defense methods utilize this property to eliminate the impact of such attacking algorithms. In this paper,we instead introduce "unrestricted" perturbations that create adversarial samples by using spurious relations which were learned by model training. Specifically, we find feature clusters in…
▽ More
Most current methods generate adversarial examples with the $L_p$ norm specification. As a result, many defense methods utilize this property to eliminate the impact of such attacking algorithms. In this paper,we instead introduce "unrestricted" perturbations that create adversarial samples by using spurious relations which were learned by model training. Specifically, we find feature clusters in non-semantic features that are strongly correlated with model judgment results, and treat them as spurious relations learned by the model. Then we create adversarial samples by using them to replace the corresponding feature clusters in the target image. Experimental evaluations show that in both black-box and white-box situations. Our adversarial examples do not change the semantics of images, while still being effective at fooling an adversarially trained DNN image classifier.
△ Less
Submitted 31 August, 2022;
originally announced September 2022.
-
Multi-dimensional Racism Classification during COVID-19: Stigmatization, Offensiveness, Blame, and Exclusion
Authors:
Xin Pei,
Deval Mehta
Abstract:
Transcending the binary categorization of racist texts, our study takes cues from social science theories to develop a multi-dimensional model for racism detection, namely stigmatization, offensiveness, blame, and exclusion. With the aid of BERT and topic modeling, this categorical detection enables insights into the underlying subtlety of racist discussion on digital platforms during COVID-19. Ou…
▽ More
Transcending the binary categorization of racist texts, our study takes cues from social science theories to develop a multi-dimensional model for racism detection, namely stigmatization, offensiveness, blame, and exclusion. With the aid of BERT and topic modeling, this categorical detection enables insights into the underlying subtlety of racist discussion on digital platforms during COVID-19. Our study contributes to enriching the scholarly discussion on deviant racist behaviours on social media. First, a stage-wise analysis is applied to capture the dynamics of the topic changes across the early stages of COVID-19 which transformed from a domestic epidemic to an international public health emergency and later to a global pandemic. Furthermore, mapping this trend enables a more accurate prediction of public opinion evolvement concerning racism in the offline world, and meanwhile, the enactment of specified intervention strategies to combat the upsurge of racism during the global public health crisis like COVID-19. In addition, this interdisciplinary research also points out a direction for future studies on social network analysis and mining. Integration of social science perspectives into the development of computational methods provides insights into more accurate data detection and analytics.
△ Less
Submitted 28 August, 2022;
originally announced August 2022.
-
Fine-grained Graph Learning for Multi-view Subspace Clustering
Authors:
Yidi Wang,
Xiaobing Pei,
Haoxi Zhan
Abstract:
Multi-view subspace clustering (MSC) is a popular unsupervised method by integrating heterogeneous information to reveal the intrinsic clustering structure hidden across views. Usually, MSC methods use graphs (or affinity matrices) fusion to learn a common structure, and further apply graph-based approaches to clustering. Despite progress, most of the methods do not establish the connection betwee…
▽ More
Multi-view subspace clustering (MSC) is a popular unsupervised method by integrating heterogeneous information to reveal the intrinsic clustering structure hidden across views. Usually, MSC methods use graphs (or affinity matrices) fusion to learn a common structure, and further apply graph-based approaches to clustering. Despite progress, most of the methods do not establish the connection between graph learning and clustering. Meanwhile, conventional graph fusion strategies assign coarse-grained weights to combine multi-graph, ignoring the importance of local structure. In this paper, we propose a fine-grained graph learning framework for multi-view subspace clustering (FGL-MSC) to address these issues. To utilize the multi-view information sufficiently, we design a specific graph learning method by introducing graph regularization and a local structure fusion pattern. The main challenge is how to optimize the fine-grained fusion weights while generating the learned graph that fits the clustering task, thus making the clustering representation meaningful and competitive. Accordingly, an iterative algorithm is proposed to solve the above joint optimization problem, which obtains the learned graph, the clustering representation, and the fusion weights simultaneously. Extensive experiments on eight real-world datasets show that the proposed framework has comparable performance to the state-of-the-art methods. The source code of the proposed method is available at https://github.com/siriuslay/FGL-MSC.
△ Less
Submitted 13 August, 2023; v1 submitted 12 January, 2022;
originally announced January 2022.
-
DMRVisNet: Deep Multi-head Regression Network for Pixel-wise Visibility Estimation Under Foggy Weather
Authors:
Jing You,
Shaocheng Jia,
Xin Pei,
Danya Yao
Abstract:
Scene perception is essential for driving decision-making and traffic safety. However, fog, as a kind of common weather, frequently appears in the real world, especially in the mountain areas, making it difficult to accurately observe the surrounding environments. Therefore, precisely estimating the visibility under foggy weather can significantly benefit traffic management and safety. To address…
▽ More
Scene perception is essential for driving decision-making and traffic safety. However, fog, as a kind of common weather, frequently appears in the real world, especially in the mountain areas, making it difficult to accurately observe the surrounding environments. Therefore, precisely estimating the visibility under foggy weather can significantly benefit traffic management and safety. To address this, most current methods use professional instruments outfitted at fixed locations on the roads to perform the visibility measurement; these methods are expensive and less flexible. In this paper, we propose an innovative end-to-end convolutional neural network framework to estimate the visibility leveraging Koschmieder's law exclusively using the image data. The proposed method estimates the visibility by integrating the physical model into the proposed framework, instead of directly predicting the visibility value via the convolutional neural work. Moreover, we estimate the visibility as a pixel-wise visibility map against those of previous visibility measurement methods which solely predict a single value for an entire image. Thus, the estimated result of our method is more informative, particularly in uneven fog scenarios, which can benefit to developing a more precise early warning system for foggy weather, thereby better protecting the intelligent transportation infrastructure systems and promoting its development. To validate the proposed framework, a virtual dataset, FACI, containing 3,000 foggy images in different concentrations, is collected using the AirSim platform. Detailed experiments show that the proposed method achieves performance competitive to those of state-of-the-art methods.
△ Less
Submitted 8 December, 2021;
originally announced December 2021.
-
Next-Generation Multiple Access Based on NOMA with Power Level Modulation
Authors:
Xinyue Pei,
Yingyang Chen,
Miaowen Wen,
Hua Yu,
Erdal Panayirci,
H. Vincent Poor
Abstract:
To cope with the explosive traffic growth of next-generation wireless communications, it is necessary to design next-generation multiple access techniques that can provide higher spectral efficiency as well as larger-scale connectivity. As a promising candidate, power-domain non-orthogonal multiple access (NOMA) has been widely studied. In conventional power-domain NOMA, multiple users are multipl…
▽ More
To cope with the explosive traffic growth of next-generation wireless communications, it is necessary to design next-generation multiple access techniques that can provide higher spectral efficiency as well as larger-scale connectivity. As a promising candidate, power-domain non-orthogonal multiple access (NOMA) has been widely studied. In conventional power-domain NOMA, multiple users are multiplexed in the same time and frequency band by different preset power levels, which, however, may limit the spectral efficiency under practical finite alphabet inputs. Inspired by the concept of spatial modulation, we propose to solve this problem by encoding extra information bits into the power levels, and exploit different signal constellations to help the receiver distinguish between them. To convey this idea, termed power selection (PS)-NOMA, clearly, we consider a simple downlink two-user NOMA system with finite input constellations. Assuming maximum-likelihood detection, we derive closed-form approximate bit error ratio (BER) expressions for both users. The achievable rates of both users are also derived in closed form. Simulation results verify the analysis and show that the proposed PS-NOMA outperforms conventional NOMA in terms of BER and achievable rate.
△ Less
Submitted 27 July, 2021;
originally announced July 2021.
-
Beyond a binary of (non)racist tweets: A four-dimensional categorical detection and analysis of racist and xenophobic opinions on Twitter in early Covid-19
Authors:
Xin Pei,
Deval Mehta
Abstract:
Transcending the binary categorization of racist and xenophobic texts, this research takes cues from social science theories to develop a four dimensional category for racism and xenophobia detection, namely stigmatization, offensiveness, blame, and exclusion. With the aid of deep learning techniques, this categorical detection enables insights into the nuances of emergent topics reflected in raci…
▽ More
Transcending the binary categorization of racist and xenophobic texts, this research takes cues from social science theories to develop a four dimensional category for racism and xenophobia detection, namely stigmatization, offensiveness, blame, and exclusion. With the aid of deep learning techniques, this categorical detection enables insights into the nuances of emergent topics reflected in racist and xenophobic expression on Twitter. Moreover, a stage wise analysis is applied to capture the dynamic changes of the topics across the stages of early development of Covid-19 from a domestic epidemic to an international public health emergency, and later to a global pandemic. The main contributions of this research include, first the methodological advancement. By bridging the state-of-the-art computational methods with social science perspective, this research provides a meaningful approach for future research to gain insight into the underlying subtlety of racist and xenophobic discussion on digital platforms. Second, by enabling a more accurate comprehension and even prediction of public opinions and actions, this research paves the way for the enactment of effective intervention policies to combat racist crimes and social exclusion under Covid-19.
△ Less
Submitted 17 July, 2021;
originally announced July 2021.
-
Self-supervised Depth Estimation Leveraging Global Perception and Geometric Smoothness Using On-board Videos
Authors:
Shaocheng Jia,
Xin Pei,
Wei Yao,
S. C. Wong
Abstract:
Self-supervised depth estimation has drawn much attention in recent years as it does not require labeled data but image sequences. Moreover, it can be conveniently used in various applications, such as autonomous driving, robotics, realistic navigation, and smart cities. However, extracting global contextual information from images and predicting a geometrically natural depth map remain challengin…
▽ More
Self-supervised depth estimation has drawn much attention in recent years as it does not require labeled data but image sequences. Moreover, it can be conveniently used in various applications, such as autonomous driving, robotics, realistic navigation, and smart cities. However, extracting global contextual information from images and predicting a geometrically natural depth map remain challenging. In this paper, we present DLNet for pixel-wise depth estimation, which simultaneously extracts global and local features with the aid of our depth Linformer block. This block consists of the Linformer and innovative soft split multi-layer perceptron blocks. Moreover, a three-dimensional geometry smoothness loss is proposed to predict a geometrically natural depth map by imposing the second-order smoothness constraint on the predicted three-dimensional point clouds, thereby realizing improved performance as a byproduct. Finally, we explore the multi-scale prediction strategy and propose the maximum margin dual-scale prediction strategy for further performance improvement. In experiments on the KITTI and Make3D benchmarks, the proposed DLNet achieves performance competitive to those of the state-of-the-art methods, reducing time and space complexities by more than $62\%$ and $56\%$, respectively. Extensive testing on various real-world situations further demonstrates the strong practicality and generalization capability of the proposed model.
△ Less
Submitted 7 June, 2021;
originally announced June 2021.
-
A Received Power Model for Reconfigurable Intelligent Surface and Measurement-based Validations
Authors:
Zipeng Wang,
Li Tan,
Haifan Yin,
Kai Wang,
Xilong Pei,
David Gesbert
Abstract:
The idea of using a Reconfigurable Intelligent Surface (RIS) consisting of a large array of passive scattering elements to assist wireless communication systems has recently attracted much attention from academia and industry. A central issue with RIS is how much power they can effectively convey to the target radio nodes. Regarding this question, several power level models exist in the literature…
▽ More
The idea of using a Reconfigurable Intelligent Surface (RIS) consisting of a large array of passive scattering elements to assist wireless communication systems has recently attracted much attention from academia and industry. A central issue with RIS is how much power they can effectively convey to the target radio nodes. Regarding this question, several power level models exist in the literature but few have been validated through experiments. In this paper, we propose a radar cross section-based received power model for an RIS-aided wireless communication system that is rooted in the physical properties of RIS. Our proposed model follows the intuition that the received power is related to the distances from the transmitter/receiver to the RIS, the angles in the TX-RIS-RX triangle, the effective area of each element, and the reflection coefficient of each element. To the best of our knowledge, this paper is the first to model the angle-dependent phase shift of the reflection coefficient, which is typically ignored in existing literature. We further measure the received power with our experimental platform in different scenarios to validate our model. The measurement results show that our model is appropriate both in near field and far field and can characterize the impact of angles well.
△ Less
Submitted 13 May, 2021;
originally announced May 2021.
-
Black-box Gradient Attack on Graph Neural Networks: Deeper Insights in Graph-based Attack and Defense
Authors:
Haoxi Zhan,
Xiaobing Pei
Abstract:
Graph Neural Networks (GNNs) have received significant attention due to their state-of-the-art performance on various graph representation learning tasks. However, recent studies reveal that GNNs are vulnerable to adversarial attacks, i.e. an attacker is able to fool the GNNs by perturbing the graph structure or node features deliberately. While being able to successfully decrease the performance…
▽ More
Graph Neural Networks (GNNs) have received significant attention due to their state-of-the-art performance on various graph representation learning tasks. However, recent studies reveal that GNNs are vulnerable to adversarial attacks, i.e. an attacker is able to fool the GNNs by perturbing the graph structure or node features deliberately. While being able to successfully decrease the performance of GNNs, most existing attacking algorithms require access to either the model parameters or the training data, which is not practical in the real world.
In this paper, we develop deeper insights into the Mettack algorithm, which is a representative grey-box attacking method, and then we propose a gradient-based black-box attacking algorithm. Firstly, we show that the Mettack algorithm will perturb the edges unevenly, thus the attack will be highly dependent on a specific training set. As a result, a simple yet useful strategy to defense against Mettack is to train the GNN with the validation set. Secondly, to overcome the drawbacks, we propose the Black-Box Gradient Attack (BBGA) algorithm. Extensive experiments demonstrate that out proposed method is able to achieve stable attack performance without accessing the training sets of the GNNs. Further results shows that our proposed method is also applicable when attacking against various defense methods.
△ Less
Submitted 9 October, 2021; v1 submitted 30 April, 2021;
originally announced April 2021.
-
RIS-Aided Wireless Communications: Prototyping, Adaptive Beamforming, and Indoor/Outdoor Field Trials
Authors:
Xilong Pei,
Haifan Yin,
Li Tan,
Lin Cao,
Zhanpeng Li,
Kai Wang,
Kun Zhang,
Emil Björnson
Abstract:
The prospects of using a Reconfigurable Intelligent Surface (RIS) to aid wireless communication systems have recently received much attention from academia and industry. Most papers make theoretical studies based on elementary models, while the prototyping of RIS-aided wireless communication and real-world field trials are scarce. In this paper, we describe a new RIS prototype consisting of 1100 c…
▽ More
The prospects of using a Reconfigurable Intelligent Surface (RIS) to aid wireless communication systems have recently received much attention from academia and industry. Most papers make theoretical studies based on elementary models, while the prototyping of RIS-aided wireless communication and real-world field trials are scarce. In this paper, we describe a new RIS prototype consisting of 1100 controllable elements working at 5.8 GHz band. We propose an efficient algorithm for configuring the RIS over the air by exploiting the geometrical array properties and a practical receiver-RIS feedback link. In our indoor test, where the transmitter and receiver are separated by a 30 cm thick concrete wall, our RIS prototype provides a 26 dB power gain compared to the baseline case where the RIS is replaced by a copper plate. A 27 dB power gain was observed in the short-distance outdoor measurement. We also carried out long-distance measurements and successfully transmitted a 32 Mbps data stream over 500 m. A 1080p video was live-streamed and it only played smoothly when the RIS was utilized. The power consumption of the RIS is around 1 W. Our paper is vivid proof that the RIS is a very promising technology for future wireless communications.
△ Less
Submitted 31 July, 2021; v1 submitted 28 February, 2021;
originally announced March 2021.
-
I-GCN: Robust Graph Convolutional Network via Influence Mechanism
Authors:
Haoxi Zhan,
Xiaobing Pei
Abstract:
Deep learning models for graphs, especially Graph Convolutional Networks (GCNs), have achieved remarkable performance in the task of semi-supervised node classification. However, recent studies show that GCNs suffer from adversarial perturbations. Such vulnerability to adversarial attacks significantly decreases the stability of GCNs when being applied to security-critical applications. Defense me…
▽ More
Deep learning models for graphs, especially Graph Convolutional Networks (GCNs), have achieved remarkable performance in the task of semi-supervised node classification. However, recent studies show that GCNs suffer from adversarial perturbations. Such vulnerability to adversarial attacks significantly decreases the stability of GCNs when being applied to security-critical applications. Defense methods such as preprocessing, attention mechanism and adversarial training have been discussed by various studies. While being able to achieve desirable performance when the perturbation rates are low, such methods are still vulnerable to high perturbation rates. Meanwhile, some defending algorithms perform poorly when the node features are not visible. Therefore, in this paper, we propose a novel mechanism called influence mechanism, which is able to enhance the robustness of the GCNs significantly. The influence mechanism divides the effect of each node into two parts: introverted influence which tries to maintain its own features and extroverted influence which exerts influences on other nodes. Utilizing the influence mechanism, we propose the Influence GCN (I-GCN) model. Extensive experiments show that our proposed model is able to achieve higher accuracy rates than state-of-the-art methods when defending against non-targeted attacks.
△ Less
Submitted 10 December, 2020;
originally announced December 2020.
-
Hybrid Multicast/Unicast Design in NOMA-based Vehicular Caching System with Supplementary Material
Authors:
Xinyue Pei,
Hua Yu,
Yingyang Chen,
Miaowen Wen,
Gaojie Chen
Abstract:
In this paper, we investigate a hybrid multicast/ unicast scheme for a multiple-input single-output cache-aided non-orthogonal multiple access (NOMA) vehicular scenario in the face of rapidly fluctuating vehicular wireless channels. Considering a more practical situation, imperfect channel state information is taking into account. In this paper, we formulate an optimization problem to maximize the…
▽ More
In this paper, we investigate a hybrid multicast/ unicast scheme for a multiple-input single-output cache-aided non-orthogonal multiple access (NOMA) vehicular scenario in the face of rapidly fluctuating vehicular wireless channels. Considering a more practical situation, imperfect channel state information is taking into account. In this paper, we formulate an optimization problem to maximize the unicast sum rate under the constraints of the peak power, the peak backhaul, the minimum unicast rate, and the maximum multicast outage probability. To solve the formulated non-convex problem, a lower bound relaxation method is proposed, which enables a division of the original problem into two convex sub-problems. Computer simulations show that the proposed caching-aided NOMA is superior to the orthogonal multiple access counterpart.
△ Less
Submitted 22 November, 2020; v1 submitted 6 November, 2020;
originally announced November 2020.
-
Optimizing AD Pruning of Sponsored Search with Reinforcement Learning
Authors:
Yijiang Lian,
Zhijie Chen,
Xin Pei,
Shuang Li,
Yifei Wang,
Yuefeng Qiu,
Zhiheng Zhang,
Zhipeng Tao,
Liang Yuan,
Hanju Guan,
Kefeng Zhang,
Zhigang Li,
Xiaochun Liu
Abstract:
Industrial sponsored search system (SSS) can be logically divided into three modules: keywords matching, ad retrieving, and ranking. During ad retrieving, the ad candidates grow exponentially. A query with high commercial value might retrieve a great deal of ad candidates such that the ranking module could not afford. Due to limited latency and computing resources, the candidates have to be pruned…
▽ More
Industrial sponsored search system (SSS) can be logically divided into three modules: keywords matching, ad retrieving, and ranking. During ad retrieving, the ad candidates grow exponentially. A query with high commercial value might retrieve a great deal of ad candidates such that the ranking module could not afford. Due to limited latency and computing resources, the candidates have to be pruned earlier. Suppose we set a pruning line to cut SSS into two parts: upstream and downstream. The problem we are going to address is: how to pick out the best $K$ items from $N$ candidates provided by the upstream to maximize the total system's revenue. Since the industrial downstream is very complicated and updated quickly, a crucial restriction in this problem is that the selection scheme should get adapted to the downstream. In this paper, we propose a novel model-free reinforcement learning approach to fixing this problem. Our approach considers downstream as a black-box environment, and the agent sequentially selects items and finally feeds into the downstream, where revenue would be estimated and used as a reward to improve the selection policy. To the best of our knowledge, this is first time to consider the system optimization from a downstream adaption view. It is also the first time to use reinforcement learning techniques to tackle this problem. The idea has been successfully realized in Baidu's sponsored search system, and online long time A/B test shows remarkable improvements on revenue.
△ Less
Submitted 5 August, 2020;
originally announced August 2020.
-
Bottom-up mechanism and improved contract net protocol for the dynamic task planning of heterogeneous Earth observation resources
Authors:
Baoju Liu,
Min Deng,
Guohua Wu,
Xinyu Pei,
Haifeng Li,
Witold Pedrycz
Abstract:
Earth observation resources are becoming increasingly indispensable in disaster relief, damage assessment and related domains. Many unpredicted factors, such as the change of observation task requirements, to the occurring of bad weather and resource failures, may cause the scheduled observation scheme to become infeasible. Therefore, it is crucial to be able to promptly and maybe frequently devel…
▽ More
Earth observation resources are becoming increasingly indispensable in disaster relief, damage assessment and related domains. Many unpredicted factors, such as the change of observation task requirements, to the occurring of bad weather and resource failures, may cause the scheduled observation scheme to become infeasible. Therefore, it is crucial to be able to promptly and maybe frequently develop high-quality replanned observation schemes that minimize the effects on the scheduled tasks. A bottom-up distributed coordinated framework together with an improved contract net are proposed to facilitate the dynamic task replanning for heterogeneous Earth observation resources. This hierarchical framework consists of three levels, namely, neighboring resource coordination, single planning center coordination, and multiple planning center coordination. Observation tasks affected by unpredicted factors are assigned and treated along with a bottom-up route from resources to planning centers. This bottom-up distributed coordinated framework transfers part of the computing load to various nodes of the observation systems to allocate tasks more efficiently and robustly. To support the prompt assignment of large-scale tasks to proper Earth observation resources in dynamic environments, we propose a multiround combinatorial allocation (MCA) method. Moreover, a new float interval-based local search algorithm is proposed to obtain the promising planning scheme more quickly. The experiments demonstrate that the MCA method can achieve a better task completion rate for large-scale tasks with satisfactory time efficiency. It also demonstrates that this method can help to efficiently obtain replanning schemes based on original scheme in dynamic environments.
△ Less
Submitted 9 June, 2021; v1 submitted 12 July, 2020;
originally announced July 2020.
-
#Coronavirus or #Chinesevirus?!: Understanding the negative sentiment reflected in Tweets with racist hashtags across the development of COVID-19
Authors:
Xin Pei,
Deval Mehta
Abstract:
Situated in the global outbreak of COVID-19, our study enriches the discussion concerning the emergent racism and xenophobia on social media. With big data extracted from Twitter, we focus on the analysis of negative sentiment reflected in tweets marked with racist hashtags, as racism and xenophobia are more likely to be delivered via the negative sentiment. Especially, we propose a stage-based ap…
▽ More
Situated in the global outbreak of COVID-19, our study enriches the discussion concerning the emergent racism and xenophobia on social media. With big data extracted from Twitter, we focus on the analysis of negative sentiment reflected in tweets marked with racist hashtags, as racism and xenophobia are more likely to be delivered via the negative sentiment. Especially, we propose a stage-based approach to capture how the negative sentiment changes along with the three development stages of COVID-19, under which it transformed from a domestic epidemic into an international public health emergency and later, into the global pandemic. At each stage, sentiment analysis enables us to recognize the negative sentiment from tweets with racist hashtags, and keyword extraction allows for the discovery of themes in the expression of negative sentiment by these tweets. Under this public health crisis of human beings, this stage-based approach enables us to provide policy suggestions for the enactment of stage-specific intervention strategies to combat racism and xenophobia on social media in a more effective way.
△ Less
Submitted 17 May, 2020;
originally announced May 2020.
-
Conceptual Design and Preliminary Results of a VR-based Radiation Safety Training System for Interventional Radiologists
Authors:
Yi Guo,
Li Mao,
Gongsen Zhang,
Zhi Chen,
Xi Pei,
X. George Xu
Abstract:
Recent studies have reported an increased risk of developing brain and neck tumors, as well as cataracts, in practitioners in interventional radiology (IR). Occupational radiation protection in IR has been a top concern for regulatory agencies and professional societies. To help minimize occupational radiation exposure in IR, we conceptualized a virtual reality (VR) based radiation safety training…
▽ More
Recent studies have reported an increased risk of developing brain and neck tumors, as well as cataracts, in practitioners in interventional radiology (IR). Occupational radiation protection in IR has been a top concern for regulatory agencies and professional societies. To help minimize occupational radiation exposure in IR, we conceptualized a virtual reality (VR) based radiation safety training system to help operators understand complex radiation fields and to avoid high radiation areas through game-like interactive simulations. The preliminary development of the system has yielded results suggesting that the training system can calculate and report the radiation exposure after each training session based on a database precalculated from computational phantoms and Monte Carlo simulations and the position information provided in real-time by the MS Hololens headset worn by trainee. In addition, real-time dose rate and cumulative dose will be displayed to the trainee by MS Hololens to help them adjust their practice. This paper presents the conceptual design of the overall hardware and software design, as well as preliminary results to combine MS HoloLens headset and complex 3D X-ray field spatial distribution data to create a mixed reality environment for safety training purpose in IR.
△ Less
Submitted 14 January, 2020;
originally announced January 2020.
-
Automatic Lumbar Spinal CT Image Segmentation with a Dual Densely Connected U-Net
Authors:
He Tang,
Xiaobing Pei,
Shilong Huang,
Xin Li,
Chao Liu
Abstract:
The clinical treatment of degenerative and developmental lumbar spinal stenosis (LSS) is different. Computed tomography (CT) is helpful in distinguishing degenerative and developmental LSS due to its advantage in imaging of osseous and calcified tissues. However, boundaries of the vertebral body, spinal canal and dural sac have low contrast and hard to identify in a CT image, so the diagnosis depe…
▽ More
The clinical treatment of degenerative and developmental lumbar spinal stenosis (LSS) is different. Computed tomography (CT) is helpful in distinguishing degenerative and developmental LSS due to its advantage in imaging of osseous and calcified tissues. However, boundaries of the vertebral body, spinal canal and dural sac have low contrast and hard to identify in a CT image, so the diagnosis depends heavily on the knowledge of expert surgeons and radiologists. In this paper, we develop an automatic lumbar spinal CT image segmentation method to assist LSS diagnosis. The main contributions of this paper are the following: 1) a new lumbar spinal CT image dataset is constructed that contains 2393 axial CT images collected from 279 patients, with the ground truth of pixel-level segmentation labels; 2) a dual densely connected U-shaped neural network (DDU-Net) is used to segment the spinal canal, dural sac and vertebral body in an end-to-end manner; 3) DDU-Net is capable of segmenting tissues with large scale-variant, inconspicuous edges (e.g., spinal canal) and extremely small size (e.g., dural sac); and 4) DDU-Net is practical, requiring no image preprocessing such as contrast enhancement, registration and denoising, and the running time reaches 12 FPS. In the experiment, we achieve state-of-the-art performance on the lumbar spinal image segmentation task. We expect that the technique will increase both radiology workflow efficiency and the perceived value of radiology reports for referring clinicians and patients.
△ Less
Submitted 4 February, 2020; v1 submitted 21 October, 2019;
originally announced October 2019.
-
DualTable: A Hybrid Storage Model for Update Optimization in Hive
Authors:
Songlin Hu,
Wantao Liu,
Tilmann Rabl,
Shuo Huang,
Ying Liang,
Zheng Xiao,
Hans-Arno Jacobsen,
Xubin Pei,
Jiye Wang
Abstract:
Hive is the most mature and prevalent data warehouse tool providing SQL-like interface in the Hadoop ecosystem. It is successfully used in many Internet companies and shows its value for big data processing in traditional industries. However, enterprise big data processing systems as in Smart Grid applications usually require complicated business logics and involve many data manipulation operation…
▽ More
Hive is the most mature and prevalent data warehouse tool providing SQL-like interface in the Hadoop ecosystem. It is successfully used in many Internet companies and shows its value for big data processing in traditional industries. However, enterprise big data processing systems as in Smart Grid applications usually require complicated business logics and involve many data manipulation operations like updates and deletes. Hive cannot offer sufficient support for these while preserving high query performance. Hive using the Hadoop Distributed File System (HDFS) for storage cannot implement data manipulation efficiently and Hive on HBase suffers from poor query performance even though it can support faster data manipulation.There is a project based on Hive issue Hive-5317 to support update operations, but it has not been finished in Hive's latest version. Since this ACID compliant extension adopts same data storage format on HDFS, the update performance problem is not solved.
In this paper, we propose a hybrid storage model called DualTable, which combines the efficient streaming reads of HDFS and the random write capability of HBase. Hive on DualTable provides better data manipulation support and preserves query performance at the same time. Experiments on a TPC-H data set and on a real smart grid data set show that Hive on DualTable is up to 10 times faster than Hive when executing update and delete operations.
△ Less
Submitted 1 December, 2014; v1 submitted 28 April, 2014;
originally announced April 2014.