
Sentinel: Scheduling Live Streams with Proactive Anomaly Detection in Crowdsourced Cloud-Edge Platforms

Authors: Yuting Li, Shaoyuan Huang, Tengwen Zhang, Cheng Zhang, Xiaofei Wang, Victor C. M. Leung

Abstract: With the rapid growth of live streaming services, Crowdsourced Cloud-edge service Platforms (CCPs) are playing an increasingly important role in meeting the growing demand. Although stream scheduling plays a critical role in optimizing CCPs' revenue, most optimization strategies struggle to achieve practical results due to various anomalies in unstable CCPs. Additionally, the substantial scale of CCPs magnifies the difficulties of anomaly detection in time-sensitive scheduling. To tackle these challenges, this paper proposes Sentinel, a proactive anomaly detection-based scheduling framework. Sentinel models the scheduling process as a two-stage Pre-Post-Scheduling paradigm: in the pre-scheduling stage, Sentinel conducts anomaly detection and constructs a strategy pool; in the post-scheduling stage, upon request arrival, it triggers the appropriate scheduling action based on a pre-generated strategy. Extensive experiments on realistic datasets show that Sentinel significantly reduces anomaly frequency by 70%, improves revenue by 74%, and doubles the scheduling speed.

Submitted 29 May, 2025; originally announced May 2025.

Comments: arXiv admin note: text overlap with arXiv:2402.14619
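The two-stage Pre-Post-Scheduling idea in the Sentinel abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Node` fields, the anomaly-score threshold, and the ranking by spare capacity are hypothetical stand-ins, chosen only to show how the expensive work (anomaly filtering, ranking) moves into the pre-scheduling stage so that the post-scheduling step on request arrival is a cheap lookup.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical edge node; field names are illustrative assumptions."""
    name: str
    capacity: int               # max concurrent streams the node accepts
    load: int = 0               # streams currently assigned
    anomaly_score: float = 0.0  # assumed output of some anomaly detector


def pre_schedule(nodes, threshold=0.5):
    """Pre-scheduling (sketch): filter out nodes flagged as anomalous and
    rank the rest by spare capacity to form a strategy pool."""
    healthy = [n for n in nodes if n.anomaly_score < threshold]
    return sorted(healthy, key=lambda n: n.capacity - n.load, reverse=True)


def post_schedule(pool):
    """Post-scheduling (sketch): on request arrival, assign the stream to the
    first node in the precomputed pool that still has room. No anomaly
    detection runs on this path."""
    for node in pool:
        if node.load < node.capacity:
            node.load += 1
            return node.name
    return None  # pool exhausted; the caller must handle this case


nodes = [Node("a", 2), Node("b", 5, load=1, anomaly_score=0.9), Node("c", 3)]
pool = pre_schedule(nodes)
print([n.name for n in pool])                   # ['c', 'a']
print([post_schedule(pool) for _ in range(5)])  # ['c', 'c', 'c', 'a', 'a']
```

Node `b` is excluded by its (assumed) anomaly score, and requests fill `c` before `a`. Reducing the strategy pool to a ranked list is a deliberate simplification; it only mirrors the abstract's motivation of keeping anomaly detection out of the time-sensitive request path.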

arXiv:2505.01821

Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey

Authors: Jing Liu, Yao Du, Kun Yang, Yan Wang, Xiping Hu, Zehua Wang, Yang Liu, Peng Sun, Azzedine Boukerche, Victor C. M. Leung

Abstract: Edge-cloud collaborative computing (ECCC) has emerged as a pivotal paradigm for addressing the computational demands of modern intelligent applications, integrating cloud resources with edge devices to enable efficient, low-latency processing. Recent advancements in AI, particularly deep learning and large language models (LLMs), have dramatically enhanced the capabilities of these distributed systems, yet they also introduce significant challenges in model deployment and resource management. In this survey, we comprehensively examine the intersection of distributed intelligence and model optimization within edge-cloud environments, providing a structured tutorial on fundamental architectures, enabling technologies, and emerging applications. Additionally, we systematically analyze model optimization approaches, including compression, adaptation, and neural architecture search, alongside AI-driven resource management strategies that balance performance, energy efficiency, and latency requirements. We further explore critical aspects of privacy protection and security enhancement within ECCC systems and examine practical deployments across diverse applications, spanning autonomous driving, healthcare, and industrial automation. Performance analysis and benchmarking techniques are also explored in depth to establish evaluation standards for these complex systems.
Furthermore, the review identifies critical research directions, including LLM deployment, 6G integration, neuromorphic computing, and quantum computing, offering a roadmap for addressing persistent challenges in heterogeneity management, real-time processing, and scalability. By bridging theoretical advancements and practical deployments, this survey offers researchers and practitioners a holistic perspective on leveraging AI to optimize distributed computing environments, fostering innovation in next-generation intelligent systems.

Submitted 17 May, 2025; v1 submitted 3 May, 2025; originally announced May 2025.

Comments: 30 pages, 10 figures, 6 tables

arXiv:2504.15632

A Study on Mixup-Inspired Augmentation Methods for Software Vulnerability Detection

Authors: Seyed Shayan Daneshvar, Da Tan, Shaowei Wang, Carson Leung

Abstract: Various deep learning (DL) methods have recently been used to detect software vulnerabilities. Real-world software vulnerability datasets are rare and hard to acquire, as there is no simple metric for classifying vulnerabilities. Such datasets are heavily imbalanced, and none of the current datasets is large by DL standards. To tackle these problems, a recent work tried to augment the dataset at the source-code level by generating realistic single-statement vulnerabilities, which is not very practical and requires manual checking of the generated vulnerabilities. In this paper, we explore augmenting vulnerabilities at the representation level to help current models learn better, which, to the best of our knowledge, has never been done before. We implement and evaluate five augmentation techniques that augment the embedding of the data and have recently been used for code search, a completely different software engineering task. We also introduce a conditioned version of these augmentation methods, which ensures the augmentation does not change the vulnerable section of the vector representation. We show that such augmentation methods can be helpful and increase the F1-score by up to 9.67%, yet they cannot beat Random Oversampling when balancing datasets, which increases the F1-score by 10.82%.

Submitted 26 April, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

Comments: Accepted at EASE 2025, Istanbul, Turkey
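To make the representation-level idea in the vulnerability-detection study concrete, here is a minimal sketch of classic Mixup applied to embedding vectors, a hypothetical "conditioned" variant, and the Random Oversampling baseline the abstract compares against. The abstract does not name its five augmentation techniques, so plain Mixup stands in as one representative. The conditioned variant assumes the vulnerable section is given as a set of embedding indices; that is an illustrative assumption, not the authors' definition.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Classic Mixup on two embeddings and their (numeric) labels:
    interpolate both with a weight lam drawn from Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = lam * y1 + (1 - lam) * y2
    return x, y, lam


def conditioned_mixup(x1, x2, protected, alpha=0.2, rng=random):
    """Hypothetical conditioned variant: indices in `protected` (assumed to
    mark the vulnerable section of x1) are copied unchanged; only the other
    dimensions are interpolated, so x1's label is kept."""
    lam = rng.betavariate(alpha, alpha)
    x = [a if i in protected else lam * a + (1 - lam) * b
         for i, (a, b) in enumerate(zip(x1, x2))]
    return x, lam


def random_oversample(samples, labels, rng=random):
    """Random Oversampling: duplicate randomly chosen samples of each
    minority class until every class matches the largest class."""
    by_class = {}
    for sample, lab in zip(samples, labels):
        by_class.setdefault(lab, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out_x, out_y = [], []
    for lab, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_x.extend(items + extra)
        out_y.extend([lab] * target)
    return out_x, out_y


rng = random.Random(0)
x, y, lam = mixup([0.0, 1.0], 0, [1.0, 0.0], 1, rng=rng)
xs, ys = random_oversample(["v1", "s1", "s2", "s3"], [1, 0, 0, 0], rng=rng)
print(ys.count(0), ys.count(1))  # 3 3
```

The conditioned variant never alters the protected positions, which mirrors the abstract's requirement that augmentation leave the vulnerable section of the representation intact.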

arXiv:2504.03798

An Intelligent and Privacy-Preserving Digital Twin Model for Aging-in-Place

Authors: Yongjie Wang, Jonathan Cyril Leung, Ming Chen, Zhiwei Zeng, Benny Toh Hsiang Tan, Yang Qiu, Zhiqi Shen

Abstract: The population of older adults is steadily increasing, with a strong preference for aging-in-place rather than moving to care facilities. Consequently, supporting this growing demographic has become a significant global challenge. However, facilitating successful aging-in-place is difficult, requiring consideration of multiple factors such as data privacy, health status monitoring, and living environments to improve health outcomes. In this paper, we propose an unobtrusive sensor system designed for installation in older adults' homes. Using data from the sensors, our system constructs a digital twin, a virtual representation of events and activities that occur in the home. The system uses neural network models and decision rules to capture residents' activities and living environments. This digital twin enables continuous health monitoring by providing actionable insights into residents' well-being. Our system is designed to be low-cost and privacy-preserving, with the aim of providing green and safe monitoring for the health of older adults. We have successfully deployed our system in two homes for two months, and our findings demonstrate the feasibility and effectiveness of digital twin technology in supporting independent living for older adults. This study highlights that our system could revolutionize elder care by enabling personalized interventions, such as lifestyle adjustments, medical treatments, or modifications to the residential environment, to enhance health outcomes.

Submitted 4 April, 2025; originally announced April 2025.

Comments: accepted to IEEE TENSYMP 2025

MSC Class: 68T05; ACM Class: I.2; J.3

arXiv:2503.18808

CRCL: Causal Representation Consistency Learning for Anomaly Detection in Surveillance Videos

Authors: Yang Liu, Hongjin Wang, Zepu Wang, Xiaoguang Zhu, Jing Liu, Peng Sun, Rui Tang, Jianwei Du, Victor C. M. Leung, Liang Song

Abstract: Video Anomaly Detection (VAD) remains a fundamental yet formidable task in the video understanding community, with promising applications in areas such as information forensics and public safety protection. Due to the rarity and diversity of anomalies, existing methods use only easily collected regular events to model the inherent normality of spatial-temporal patterns in an unsupervised manner. Previous studies have shown that existing unsupervised VAD models cannot handle label-independent data offsets (e.g., scene changes) in real-world scenarios and may fail to respond to subtle anomalies due to the overgeneralization of deep neural networks. Inspired by causality learning, we argue that there exist causal factors that can adequately generalize the prototypical patterns of regular events and present significant deviations when anomalous instances occur. In this regard, we propose Causal Representation Consistency Learning (CRCL) to implicitly mine potential scene-robust causal variables in unsupervised video normality learning. Specifically, building on structural causal models, we propose scene-debiasing learning and causality-inspired normality learning to strip away entangled scene bias in deep representations and learn causal video normality, respectively. Extensive experiments on benchmarks validate the superiority of our method over conventional deep representation learning.
Moreover, ablation studies and extension validation show that CRCL can cope with label-independent biases in multi-scene settings and maintain stable performance with only limited training data available.

Submitted 24 March, 2025; originally announced March 2025.

Comments: Accepted for publication by IEEE Transactions on Image Processing

arXiv:2503.08156

Towards Large-scale Chemical Reaction Image Parsing via a Multimodal Large Language Model

Authors: Yufan Chen, Ching Ting Leung, Jianwei Sun, Yong Huang, Linyan Li, Hao Chen, Hanyu Gao

Abstract: Artificial intelligence (AI) has demonstrated significant promise in advancing organic chemistry research; however, its effectiveness depends on the availability of high-quality chemical reaction data. Currently, most published chemical reactions are not available in machine-readable form, limiting the broader application of AI in this field. The extraction of published chemical reactions into structured databases still relies heavily on manual curation, and robust automatic parsing of chemical reaction images into machine-readable data remains a significant challenge. To address this, we introduce the Reaction Image Multimodal large language model (RxnIM), the first multimodal large language model specifically designed to parse chemical reaction images into machine-readable reaction data. RxnIM not only extracts key chemical components from reaction images but also interprets the textual content that describes reaction conditions. Together with a specially designed method for generating large-scale training datasets, our approach achieves excellent performance, with an average F1 score of 88% on various benchmarks, surpassing previously published methods by 5%. This represents a crucial step toward the automatic construction of large databases of machine-readable reaction data parsed from images in the chemistry literature, providing essential data resources for AI research in chemistry. The source code, model checkpoints, and datasets developed in this work are released under permissive licenses.
An instance of the RxnIM web application can be accessed at https://huggingface.co/spaces/CYF200127/RxnIM.

Submitted 11 March, 2025; originally announced March 2025.

arXiv:2502.19450

CLIP-Optimized Multimodal Image Enhancement via ISP-CNN Fusion for Coal Mine IoVT under Uneven Illumination

Authors: Shuai Wang, Shihao Zhang, Jiaqi Wu, Zijian Tian, Wei Chen, Tongzhu Jin, Miaomiao Xue, Zehua Wang, Fei Richard Yu, Victor C. M. Leung

Abstract: Clear monitoring images are crucial for the safe operation of coal mine Internet of Video Things (IoVT) systems. However, low illumination and uneven brightness in underground environments significantly degrade image quality, posing challenges for enhancement methods that often rely on difficult-to-obtain paired reference images. Additionally, there is a trade-off between enhancement performance and computational efficiency on edge devices within IoVT systems. To address these issues, we propose a multimodal image enhancement method tailored for coal mine IoVT, utilizing an ISP-CNN fusion architecture optimized for uneven illumination. This two-stage strategy combines global enhancement with detail optimization, effectively improving image quality, especially in poorly lit areas. A CLIP-based multimodal iterative optimization allows for unsupervised training of the enhancement algorithm. By integrating traditional image signal processing (ISP) with convolutional neural networks (CNN), our approach reduces computational complexity while maintaining high performance, making it suitable for real-time deployment on edge devices. Experimental results demonstrate that our method effectively mitigates uneven brightness and enhances key image quality metrics, improving PSNR by 2.9%-4.9%, SSIM by 4.3%-11.4%, and VIF by 4.9%-17.8% compared to seven state-of-the-art algorithms.
Simulated coal mine monitoring scenarios validate our method's ability to balance performance and computational demands, facilitating real-time enhancement and supporting safer mining operations.

Submitted 26 February, 2025; originally announced February 2025.

arXiv:2502.16036

AI Models Still Lag Behind Traditional Numerical Models in Predicting Sudden-Turning Typhoons

Authors: Daosheng Xu, Zebin Lu, Jeremy Cheuk-Hin Leung, Dingchi Zhao, Yi Li, Yang Shi, Bin Chen, Gaozhen Nie, Naigeng Wu, Xiangjun Tian, Yi Yang, Shaoqing Zhang, Banglin Zhang

Abstract: Given the interpretability, accuracy, and stability of numerical weather prediction (NWP) models, current operational weather forecasting relies heavily on the NWP approach. In the past two years, the rapid development of Artificial Intelligence (AI) has provided an alternative solution for medium-range (1-10 days) weather forecasting. Bi et al. (2023) (hereafter Bi23) introduced the first AI-based weather prediction (AIWP) model in China, named Pangu-Weather, which offers fast prediction without compromising accuracy. In their work, Bi23 made notable claims regarding its effectiveness in extreme weather predictions. However, this claim is not persuasive, because the extreme nature of the two tropical cyclone (TC) examples presented in Bi23, namely Typhoon Kong-rey and Typhoon Yutu, stems primarily from their intensities rather than their moving paths. Their claim may be misread as implying that Pangu-Weather works well in predicting unusual typhoon paths, which was not explicitly analyzed. Here, we reassess Pangu-Weather's ability to predict extreme TC trajectories from 2020-2024. Results reveal that while Pangu-Weather overall outperforms NWP models in predicting TC tracks, it falls short in accurately predicting the rarely observed sudden-turning tracks, such as that of Typhoon Khanun in 2023. We argue that current AIWP models still lag behind traditional NWP models in predicting such rare extreme events in medium-range forecasts.

Submitted 21 February, 2025; originally announced February 2025.

arXiv:2502.10687

Multi-objective Aerial IRS-assisted ISAC Optimization via Generative AI-enhanced Deep Reinforcement Learning

Authors: Wenwen Xie, Geng Sun, Jiacheng Wang, Hongyang Du, Jiawen Kang, Kaibin Huang, Victor C. M. Leung

Abstract: Integrated sensing and communication (ISAC) has garnered substantial research interest owing to its pivotal role in advancing the development of next-generation (6G) wireless networks. However, achieving a performance balance between communication and sensing in the dual-function radar communication (DFRC)-based ISAC system remains a significant challenge. In this paper, an aerial intelligent reflecting surface (IRS)-assisted ISAC system is explored, where a base station (BS) supports dual-functional operations, enabling both data transmission for multiple users and sensing for a blocked target, with the channel quality enhanced by an IRS mounted on an unmanned aerial vehicle (UAV). Moreover, we formulate an integrated communication, sensing, and energy efficiency multi-objective optimization problem (CSEMOP), which aims to maximize the communication rate of the users and the echo rate of the target while minimizing UAV propulsion energy consumption, by jointly optimizing the BS beamforming matrix, the IRS phase shifts, and the flight velocity and angle of the UAV. Considering the non-convexity, trade-off, and dynamic nature of the formulated CSEMOP, we propose a generative diffusion model-based deep deterministic policy gradient (GDMDDPG) method to solve the problem. Specifically, the diffusion model is incorporated into the actor network of DDPG to improve action quality, with a noise perturbation mechanism for better exploration and a recent prioritized experience replay (RPER) sampling mechanism for enhanced training efficiency.
Simulation results indicate that the GDMDDPG method delivers superior performance compared to existing methods.

Submitted 15 February, 2025; originally announced February 2025.

arXiv:2502.09038

AoI-Sensitive Data Forwarding with Distributed Beamforming in UAV-Assisted IoT

Authors: Zifan Lang, Guixia Liu, Geng Sun, Jiahui Li, Zemin Sun, Jiacheng Wang, Victor C. M. Leung

Abstract: This paper proposes a UAV-assisted forwarding system based on distributed beamforming to improve the age of information (AoI) in Internet of Things (IoT) networks. Specifically, UAVs collect and relay data between sensor nodes (SNs) and the remote base station (BS). However, flight delays increase the AoI and degrade network performance. To mitigate this, we adopt distributed beamforming to extend the communication range, reduce the flight frequency, and ensure continuous data relay and efficient energy utilization. Then, we formulate an optimization problem to minimize AoI and UAV energy consumption by jointly optimizing the UAV trajectories and communication schedules. The problem is non-convex and highly dynamic, so we propose a deep reinforcement learning (DRL)-based algorithm to solve it, which enhances stability and accelerates convergence. Simulation results show that the proposed algorithm effectively addresses the problem and outperforms other benchmark algorithms.

Submitted 13 February, 2025; originally announced February 2025.

Comments: 6 pages, 4 figures, ICC2025

arXiv:2501.18439

MolGraph-xLSTM: A graph-based dual-level xLSTM framework with multi-head mixture-of-experts for enhanced molecular representation and interpretability

Authors: Yan Sun, Yutong Lu, Yan Yi Li, Zihao Jing, Carson K. Leung, Pingzhao Hu

Abstract: Predicting molecular properties is essential for drug discovery, and computational methods can greatly enhance this process. Molecular graphs have become a focus for representation learning, with Graph Neural Networks (GNNs) widely used. However, GNNs often struggle to capture long-range dependencies. To address this, we propose MolGraph-xLSTM, a novel graph-based xLSTM model that enhances feature extraction and effectively models long-range interactions within molecules. Our approach processes molecular graphs at two scales: atom-level and motif-level. For atom-level graphs, a GNN-based xLSTM framework with jumping knowledge extracts local features and aggregates multilayer information to capture both local and global patterns effectively. Motif-level graphs provide complementary structural information for a broader molecular view. Embeddings from both scales are refined via a multi-head mixture of experts (MHMoE), further enhancing expressiveness and performance. We validate MolGraph-xLSTM on 10 molecular property prediction datasets, covering both classification and regression tasks. Our model demonstrates consistent performance across all datasets, with improvements of up to 7.03% on the BBBP dataset for classification and 7.54% on the ESOL dataset for regression compared to baselines. On average, MolGraph-xLSTM achieves an AUROC improvement of 3.18% for classification tasks and an RMSE reduction of 3.83% across regression datasets compared to the baseline methods.
These results confirm the effectiveness of our model, offering a promising solution for molecular representation learning for drug discovery.

Submitted 30 January, 2025; originally announced January 2025.
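The "jumping knowledge" aggregation mentioned in the MolGraph-xLSTM abstract can be sketched as follows: keep each atom's representation from every message-passing layer and combine them per node, here by element-wise max or by concatenation. This sketch uses plain mean-neighbor message passing in place of the paper's GNN-based xLSTM layers and omits the motif-level graph and the MHMoE; the function names are hypothetical.

```python
def propagate(adj, h):
    """One round of mean aggregation over each node and its neighbors.
    adj[i] lists the neighbors of node i; h[i] is node i's feature vector."""
    out = []
    for i, nbrs in enumerate(adj):
        group = [h[i]] + [h[j] for j in nbrs]
        dim = len(h[i])
        out.append([sum(vec[k] for vec in group) / len(group) for k in range(dim)])
    return out


def jumping_knowledge(adj, h0, num_layers=3, mode="max"):
    """Run `num_layers` rounds of propagation, then combine every layer's
    representation of each node (including the input layer h0)."""
    layers = [h0]
    for _ in range(num_layers):
        layers.append(propagate(adj, layers[-1]))
    per_node = list(zip(*layers))  # per_node[i] = node i's vector at each layer
    if mode == "max":  # element-wise max across layers
        return [[max(vals) for vals in zip(*reps)] for reps in per_node]
    if mode == "cat":  # concatenate all layers' vectors
        return [[v for rep in reps for v in rep] for reps in per_node]
    raise ValueError(f"unknown mode: {mode}")


# A 3-atom chain 0-1-2 with a one-dimensional feature on atom 0 only.
adj = [[1], [0, 2], [1]]
h0 = [[1.0], [0.0], [0.0]]
print(jumping_knowledge(adj, h0, num_layers=2, mode="max"))
```

In `max` mode each atom keeps, per feature, whichever layer's value is largest, so shallow (local) and deeper (more global) neighborhood information can both survive into the final embedding; `cat` mode keeps all of it at the cost of a wider vector.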

arXiv:2501.15468

Low-altitude Friendly-Jamming for Satellite-Maritime Communications via Generative AI-enabled Deep Reinforcement Learning

Authors: Jiawei Huang, Aimin Wang, Geng Sun, Jiahui Li, Jiacheng Wang, Dusit Niyato, Victor C. M. Leung

Abstract: Low Earth Orbit (LEO) satellites can be used to assist maritime wireless communications for data transmission across wide-ranging areas. However, the extensive coverage of LEO satellites, combined with the openness of channels, exposes the communication process to security risks. This paper presents a low-altitude friendly-jamming LEO satellite-maritime communication system enabled by an unmanned aerial vehicle (UAV) to ensure data security at the physical layer. Since such a system requires trade-off policies that balance the secrecy rate and the energy consumption of the UAV to meet evolving scenario demands, we formulate a secure satellite-maritime communication multi-objective optimization problem (SSMCMOP). To solve this dynamic, long-term optimization problem, we reformulate it as a Markov decision process. We then propose a transformer-enhanced soft actor critic (TransSAC) algorithm, a generative artificial intelligence-enabled deep reinforcement learning approach that solves the reformulated problem while capturing global dependencies and diversely exploring weights. Simulation results demonstrate that TransSAC outperforms various baselines and achieves an optimal secrecy rate while effectively minimizing the energy consumption of the UAV. Moreover, the results identify more suitable constraint values for the system.

Submitted 26 January, 2025; originally announced January 2025.

arXiv:2501.14832

Resource Allocation Driven by Large Models in Future Semantic-Aware Networks

Authors: Haijun Zhang, Jiaxin Ni, Zijun Wu, Xiangnan Liu, V. C. M. Leung

Abstract: Large models have emerged as a key enabler for the popularization of future networked intelligent applications. However, the surge of data traffic brought by intelligent applications puts pressure on the resource utilization and energy consumption of future networks. With efficient content understanding capabilities, semantic communication holds significant potential for reducing data transmission in intelligent applications. In this article, resource allocation driven by large models in semantic-aware networks is investigated. Specifically, a semantic-aware communication network architecture based on scene graph models and multimodal pre-trained models is designed to achieve efficient data transmission. Building on the proposed network architecture, an intelligent resource allocation scheme for semantic-aware networks is proposed to further enhance resource utilization efficiency. In the resource allocation scheme, semantic transmission quality is adopted as an evaluation metric, and the impact of wireless channel fading on semantic transmission is analyzed. To maximize the semantic transmission quality for multiple users, a diffusion model-based decision-making scheme is designed to address the power allocation problem in semantic-aware networks. Simulation results demonstrate that the proposed large-model-driven network architecture and resource allocation scheme achieve high-quality semantic transmission.

Submitted 23 January, 2025; originally announced January 2025.

arXiv:2501.11430

A Survey on Diffusion Models for Anomaly Detection

Authors: Jing Liu, Zhenchao Ma, Zepu Wang, Chenxuanyin Zou, Jiayang Ren, Zehua Wang, Liang Song, Bo Hu, Yang Liu, Victor C. M. Leung

Abstract: Diffusion models (DMs) have emerged as a powerful class of generative AI models, showing remarkable potential in anomaly detection (AD) tasks across various domains, such as cybersecurity, fraud detection, healthcare, and manufacturing. The intersection of these two fields, termed diffusion models for anomaly detection (DMAD), offers promising solutions for identifying deviations in increasingly complex and high-dimensional data. In this survey, we review recent advances in DMAD research. We begin by presenting the fundamental concepts of AD and DMs, followed by a comprehensive analysis of classic DM architectures including DDPMs, DDIMs, and Score SDEs. We further categorize existing DMAD methods into reconstruction-based, density-based, and hybrid approaches, providing detailed examinations of their methodological innovations. We also explore the diverse tasks across different data modalities, encompassing image, time series, video, and multimodal data analysis. Furthermore, we discuss critical challenges and emerging research directions, including computational efficiency, model interpretability, robustness enhancement, edge-cloud collaboration, and integration with large language models. The collection of DMAD research papers and resources is available at https://github.com/fdjingliu/DMAD.

Submitted 26 February, 2025; v1 submitted 20 January, 2025; originally announced January 2025.

arXiv:2501.10693

Distributionally Robust Policy Evaluation and Learning for Continuous Treatment with Observational Data

Authors: Cheuk Hang Leung, Yiyan Huang, Yijun Li, Qi Wu

Abstract: Using offline observational data for policy evaluation and learning allows decision-makers to evaluate and learn a policy that connects characteristics and interventions. Most existing literature has either focused on discrete treatment spaces or assumed no difference between the distributions of the policy-learning and policy-deployment environments. These assumptions restrict applications in many real-world scenarios where distribution shifts are present with continuous treatment. To overcome these challenges, this paper focuses on developing a distributionally robust policy under a continuous treatment setting. The proposed distributionally robust estimators are built on the Inverse Probability Weighting (IPW) method, extended from the discrete setting, for policy evaluation and learning under continuous treatments. Specifically, we introduce a kernel function into the proposed IPW estimator to mitigate the exclusion of observations that occurs when the standard IPW method is applied to continuous treatments. We then provide a finite-sample analysis that guarantees the convergence of the proposed distributionally robust policy evaluation and learning estimators. Comprehensive experiments further verify the effectiveness of our approach when distribution shifts are present.

Submitted 18 January, 2025; originally announced January 2025.

arXiv:2501.10408 [pdf, other]

Leveraging Cross-Attention Transformer and Multi-Feature Fusion for Cross-Linguistic Speech Emotion Recognition

Authors: Ruoyu Zhao, Xiantao Jiang, F. Richard Yu, Victor C. M. Leung, Tao Wang, Shaohu Zhang

Abstract: Speech Emotion Recognition (SER) plays a crucial role in enhancing human-computer interaction. Cross-Linguistic SER (CLSER) has been a challenging research problem due to significant variability in linguistic and acoustic features of different languages. In this study, we propose a novel approach, HuMP-CAT, which combines HuBERT, MFCC, and prosodic characteristics. These features are fused using a cross-attention transformer (CAT) mechanism during feature extraction. Transfer learning is applied to transfer knowledge from a source emotional speech dataset to the target corpus for emotion recognition. We use IEMOCAP as the source dataset to train the source model and evaluate the proposed method on seven datasets in five languages (i.e., English, German, Spanish, Italian, and Chinese). We show that, by fine-tuning the source model with a small portion of speech from the target datasets, HuMP-CAT achieves an average accuracy of 78.75% across the seven datasets, with notable performance of 88.69% on EMODB (German language) and 79.48% on EMOVO (Italian language). Our extensive evaluation demonstrates that HuMP-CAT outperforms existing methods across multiple target languages.

Submitted 6 January, 2025; originally announced January 2025.

arXiv:2412.18230 [pdf, other]

Efficient Detection Framework Adaptation for Edge Computing: A Plug-and-play Neural Network Toolbox Enabling Edge Deployment

Authors: Jiaqi Wu, Shihao Zhang, Simin Chen, Lixu Wang, Zehua Wang, Wei Chen, Fangyuan He, Zijian Tian, F. Richard Yu, Victor C. M. Leung

Abstract: Edge computing has emerged as a key paradigm for deploying deep learning-based object detection in time-sensitive scenarios. However, existing edge detection methods face three challenges: 1) difficulty balancing detection precision with lightweight models, 2) limited adaptability of generalized deployment designs, and 3) insufficient real-world validation. To address these issues, we propose the Edge Detection Toolbox (ED-TOOLBOX), which utilizes generalizable plug-and-play components to adapt object detection models for edge environments. Specifically, we introduce a lightweight Reparameterized Dynamic Convolutional Network (Rep-DConvNet) featuring weighted multi-shape convolutional branches to enhance detection performance. Additionally, we design a Sparse Cross-Attention (SC-A) network with a localized-mapping-assisted self-attention mechanism, enabling a well-crafted joint module for adaptive feature transfer. For real-world applications, we incorporate an Efficient Head into the YOLO framework to accelerate edge model optimization. To demonstrate practical impact, we identify a gap in helmet detection (existing work overlooks band fastening, a critical safety factor) and create the Helmet Band Detection Dataset (HBDD). Using ED-TOOLBOX-optimized models, we address this real-world task. Extensive experiments validate the effectiveness of ED-TOOLBOX, with edge detection models outperforming six state-of-the-art methods in visual surveillance simulations and achieving real-time, accurate performance. These results highlight ED-TOOLBOX as a superior solution for edge object detection.

Submitted 24 December, 2024; originally announced December 2024.

arXiv:2412.17616 [pdf, other]

doi 10.1145/3737456

Facial Expression Analysis and Its Potentials in IoT Systems: A Contemporary Survey

Authors: Zixuan Shangguan, Yanjie Dong, Song Guo, Victor C. M. Leung, M. Jamal Deen, Xiping Hu

Abstract: Facial expressions convey human emotions and can be categorized into macro-expressions (MaEs) and micro-expressions (MiEs) based on duration and intensity. While MaEs are voluntary and easily recognized, MiEs are involuntary, rapid, and can reveal concealed emotions. The integration of facial expression analysis with Internet of Things (IoT) systems has significant potential across diverse scenarios. IoT-enhanced MaE analysis enables real-time monitoring of patient emotions, facilitating improved mental health care in smart healthcare. Similarly, IoT-based MiE detection enhances surveillance accuracy and threat detection in smart security. Our work aims to provide a comprehensive overview of research progress in facial expression analysis and to explore its potential integration with IoT systems. We discuss the distinctions between our work and existing surveys, elaborate on advancements in MaE and MiE analysis techniques across various learning paradigms, and examine their potential applications in IoT. We highlight challenges and future directions for the convergence of facial expression-based technologies and IoT systems, aiming to foster innovation in this domain. By presenting recent developments and practical applications, our work offers a systematic understanding of how facial expression analysis can enhance IoT systems in healthcare, security, and beyond.

Submitted 23 May, 2025; v1 submitted 23 December, 2024; originally announced December 2024.

arXiv:2412.14185 [pdf, other]

Fabric Sensing of Intrinsic Hand Muscle Activity

Authors: Katelyn Lee, Runsheng Wang, Ava Chen, Lauren Winterbottom, Ho Man Colman Leung, Lisa Maria DiSalvo, Iris Xu, Jingxi Xu, Dawn M. Nilsen, Joel Stein, Xia Zhou, Matei Ciocarlie

Abstract: Wearable robotics have the capacity to help stroke survivors with the assistance and rehabilitation of hand function. Many devices that use surface electromyography (sEMG) for control rely on extrinsic muscle signals, since sEMG sensors are relatively easy to place on the forearm without interfering with hand activity. In this work, we target the intrinsic muscles of the thumb, which are superficial to the skin and thus potentially more accessible via sEMG sensing. However, traditional rigid electrodes cannot be placed on the hand without adding bulk and affecting hand functionality. We thus present a novel sensing sleeve that uses textile electrodes to measure sEMG activity of intrinsic thumb muscles. We evaluate the sleeve's performance on detecting thumb movements and muscle activity during both isolated and isometric muscle contractions of the thumb and fingers. This work highlights the potential of textile-based sensors as a low-cost, lightweight, and non-obtrusive alternative to conventional sEMG sensors for wearable robotics.

Submitted 6 December, 2024; originally announced December 2024.

Comments: 6 pages, 4 figures, ICORR 2025 submission

arXiv:2412.06456 [pdf, other]

UAV Virtual Antenna Array Deployment for Uplink Interference Mitigation in Data Collection Networks

Authors: Hongjuan Li, Hui Kang, Geng Sun, Jiahui Li, Jiacheng Wang, Xue Wang, Dusit Niyato, Victor C. M. Leung

Abstract: Unmanned aerial vehicles (UAVs) have gained considerable attention as a platform for establishing aerial wireless networks and communications. However, the line-of-sight dominance in air-to-ground communications often leads to significant interference with terrestrial networks, reducing communication efficiency among terrestrial terminals. This paper explores a novel uplink interference mitigation approach based on the collaborative beamforming (CB) method in multi-UAV network systems. Specifically, the UAV swarm forms a UAV-enabled virtual antenna array (VAA) to transmit gathered data to multiple base stations (BSs) for data backup and distributed processing. However, there is a trade-off between the effectiveness of CB-based interference mitigation and the energy conservation of UAVs. Thus, by jointly optimizing the excitation current weights and hover positions of UAVs as well as the sequence of data transmission to various BSs, we formulate an uplink interference mitigation multi-objective optimization problem (MOOP) to simultaneously reduce interference, enhance transmission efficiency, and improve energy efficiency. In response to the computational demands of the formulated problem, we introduce an evolutionary computation method, namely a chaotic non-dominated sorting genetic algorithm II (CNSGA-II) with multiple improved operators. Simulation results show that the proposed CNSGA-II efficiently addresses the formulated MOOP, outperforming several other comparative algorithms. Moreover, the proposed CB-based uplink interference mitigation approach can significantly reduce the interference caused by UAVs to non-receiving BSs.

Submitted 9 December, 2024; originally announced December 2024.

Comments: This paper has been accepted by IEEE Internet of Things Journal

arXiv:2412.05641 [pdf, other]

Hyperedge Anomaly Detection with Hypergraph Neural Network

Authors: Md. Tanvir Alam, Chowdhury Farhan Ahmed, Carson K. Leung

Abstract: A hypergraph is a data structure that enables us to model higher-order associations among data entities. Conventional graph-structured data can represent pairwise relationships only, whereas a hypergraph enables us to associate any number of entities, which is essential in many real-life applications. Hypergraph learning algorithms have been well studied for numerous problem settings, such as node classification and link prediction. However, much less research has been conducted on anomaly detection from hypergraphs. Anomaly detection identifies events that deviate from the usual pattern and can be applied to hypergraphs to detect unusual higher-order associations. In this work, we propose an end-to-end hypergraph neural network-based model for identifying anomalous associations in a hypergraph. Our proposed algorithm operates in an unsupervised manner without requiring any labeled data. Extensive experimentation on several real-life datasets demonstrates the effectiveness of our model in detecting anomalous hyperedges.

Submitted 7 December, 2024; originally announced December 2024.

arXiv:2411.09712 [pdf, other]

Digital Twin-Assisted Space-Air-Ground Integrated Multi-Access Edge Computing for Low-Altitude Economy: An Online Decentralized Optimization Approach

Authors: Long He, Geng Sun, Zemin Sun, Jiacheng Wang, Hongyang Du, Dusit Niyato, Jiangchuan Liu, Victor C. M. Leung

Abstract: The emergence of space-air-ground integrated multi-access edge computing (SAGIMEC) networks opens a significant opportunity for the rapidly growing low-altitude economy (LAE), facilitating the development of various applications by offering efficient communication and computing services. However, the heterogeneous nature of SAGIMEC networks, coupled with the stringent computational and communication requirements of diverse applications in the LAE, introduces considerable challenges in integrating SAGIMEC into the LAE. In this work, we first present a digital twin-assisted SAGIMEC paradigm for LAE, where the digital twin enables reliable network monitoring and management, while SAGIMEC provides efficient computation offloading services for Internet of Things sensor devices (ISDs). Then, a joint satellite selection, computation offloading, communication resource allocation, computation resource allocation, and UAV trajectory control optimization problem (JSC4OP) is formulated to maximize the quality of service (QoS) of ISDs. Given the complexity of JSC4OP, we propose an online decentralized optimization approach (ODOA) to address the problem. Specifically, JSC4OP is first transformed into a real-time decision-making optimization problem (RDOP) by leveraging Lyapunov optimization. Then, to solve the RDOP, we introduce an online learning-based latency prediction method to predict the uncertain system environment and a game-theoretic decision-making method to make real-time decisions. Finally, theoretical analysis confirms the effectiveness of the ODOA, while the simulation results demonstrate that the proposed ODOA outperforms alternative approaches in terms of overall system performance.

Submitted 30 January, 2025; v1 submitted 12 November, 2024; originally announced November 2024.

Comments: arXiv admin note: text overlap with arXiv:2406.11918

arXiv:2411.00838 [pdf, other]

Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment

Authors: Jiaqi Wu, Simin Chen, Zehua Wang, Wei Chen, Zijian Tian, F. Richard Yu, Victor C. M. Leung

Abstract: As the volume of image data grows, data-oriented cloud computing in Internet of Video Things (IoVT) systems encounters latency issues. Task-oriented edge computing addresses this by shifting data analysis to the edge. However, the limited computational power of edge devices poses challenges for executing visual tasks. Existing methods struggle to balance high model performance with low resource consumption; lightweight neural networks often underperform, while device-specific models designed by Neural Architecture Search (NAS) fail to adapt to heterogeneous devices. To address these issues, we propose a novel co-design framework that optimizes neural network architecture and deployment strategies during inference for high throughput. Specifically, it implements a dynamic model structure based on re-parameterization, coupled with a Roofline-based model partitioning strategy to enhance the computational performance of edge devices. We also employ a multi-objective co-optimization approach to balance throughput and accuracy. Additionally, we derive the mathematical consistency and convergence of partitioned models. Experimental results demonstrate significant improvements in throughput (12.05% on MNIST, 18.83% on ImageNet) and superior classification accuracy compared to baseline algorithms. Our method consistently achieves stable performance across different devices, underscoring its adaptability. Simulated experiments further confirm its efficacy in high-accuracy, real-time detection of small objects in IoVT systems.

Submitted 29 October, 2024; originally announced November 2024.

arXiv:2410.13901 [pdf, ps, other]

SoK: Prompt Hacking of Large Language Models

Authors: Baha Rababah, Shang, Wu, Matthew Kwiatkowski, Carson Leung, Cuneyt Gurcan Akcora

Abstract: The safety and robustness of applications based on large language models (LLMs) remain critical challenges in artificial intelligence. Among the key threats to these applications are prompt hacking attacks, which can significantly undermine the security and reliability of LLM-based systems. In this work, we offer a comprehensive and systematic overview of three distinct types of prompt hacking: jailbreaking, leaking, and injection, addressing the nuances that differentiate them despite their overlapping characteristics. To enhance the evaluation of LLM-based applications, we propose a novel framework that categorizes LLM responses into five distinct classes, moving beyond the traditional binary classification. This approach provides more granular insights into the AI's behavior, improving diagnostic precision and enabling more targeted enhancements to the system's safety and robustness.

Submitted 15 October, 2024; originally announced October 2024.

arXiv:2410.03885 [pdf, other]

Collaborative Safety-Critical Formation Control with Obstacle Avoidance

Authors: Brooks A. Butler, Chi Ho Leung, Philip E. Paré

Abstract: This work explores a collaborative method for ensuring safety in multi-agent formation control problems. We formulate a control barrier function (CBF) based safety filter control law for a generic distributed formation controller and extend our previously developed collaborative safety framework to an obstacle avoidance problem for agents with acceleration control inputs. We then incorporate multi-obstacle collision avoidance into the collaborative safety framework. This framework includes a method for computing the maximum capability of agents to satisfy their individual safety requirements. We analyze the convergence rate of our collaborative safety algorithm, and prove the linear-time convergence of cooperating agents to a jointly feasible safe action for all agents under the special case of a tree-structured communication network with a single obstacle for each agent. We illustrate the analytical results via simulation on a mass-spring kinematics-based formation controller and demonstrate the finite-time convergence of the collaborative safety algorithm in the simple proven case, the more general case of a fully-connected system with multiple static obstacles, and with dynamic obstacles.

Submitted 4 October, 2024; originally announced October 2024.

Comments: This work is under review for publication in Automatica. arXiv admin note: text overlap with arXiv:2311.11156

arXiv:2410.02345 [pdf, other]

Coastal Underwater Evidence Search System with Surface-Underwater Collaboration

Authors: Hin Wang Lin, Pengyu Wang, Zhaohua Yang, Ka Chun Leung, Fangming Bao, Ka Yu Kui, Jian Xiang Erik Xu, Ling Shi

Abstract: The coastal underwater evidence search system with surface-underwater collaboration is designed to revolutionize the search for artificial objects in coastal underwater environments, overcoming limitations associated with traditional methods such as divers and tethered remotely operated vehicles. Our multi-robot collaborative system consists of three parts: an autonomous surface vehicle as a mission control center, a towed underwater vehicle for wide-area search, and a biomimetic underwater robot inspired by marine organisms for detailed inspections of identified areas. We conduct extensive simulations and real-world experiments in pond environments and coastal fields to demonstrate the system's potential to surpass the limitations of conventional underwater search methods, offering a robust and efficient solution for law enforcement and recovery operations in marine settings.

Submitted 3 October, 2024; originally announced October 2024.

Comments: This paper has been accepted by the 18th International Conference on Control, Automation, Robotics and Vision (ICARCV)

arXiv:2409.17668 [pdf]

A Database Engineered System for Big Data Analytics on Tornado Climatology

Authors: Fengfan Bian, Carson K. Leung, Piers Grenier, Harry Pu, Samuel Ning, Alfredo Cuzzocrea

Abstract: Recognizing the challenges with current tornado warning systems, we investigate alternative approaches. In particular, we present a database engineered system that integrates information from heterogeneous rich data sources, including climatology data for tornadoes and data just before a tornado warning. The system aids in predicting tornado occurrences by identifying the data points that form the basis of a tornado warning. Evaluation on US data highlights the advantages of using a classification forecasting recurrent neural network (RNN) model. The results highlight the effectiveness of our database engineered system for big data analytics on tornado climatology, especially in accurately predicting tornado lead-time, magnitude, and location, contributing to the development of sustainable cities.

Submitted 26 September, 2024; originally announced September 2024.

arXiv:2408.10691 [pdf, other]

Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches

Authors: Yanjie Dong, Haijun Zhang, Chengming Li, Song Guo, Victor C. M. Leung, Xiping Hu

Abstract: Since the invention of GPT-2 (1.5B) in 2019, large language models (LLMs) have transitioned from specialized models to versatile foundation models. LLMs exhibit impressive zero-shot ability; however, they require fine-tuning on local datasets and significant resources for deployment. Traditional fine-tuning techniques with first-order optimizers require substantial GPU memory that exceeds mainstream hardware capability, which motivates the investigation of memory-efficient methods. Model compression techniques can reduce energy consumption, operational costs, and environmental impact, and thus support sustainable artificial intelligence advancements. Additionally, large-scale foundation models have expanded to create images, audio, videos, and multi-modal contents, further emphasizing the need for efficient deployment. Therefore, we present a comprehensive overview of the prevalent memory-efficient fine-tuning methods over the network edge. We also review the state-of-the-art literature on model compression to provide a vision for deploying LLMs over the network edge.

Submitted 1 October, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

arXiv:2407.18338 [pdf, other]

SMiCRM: A Benchmark Dataset of Mechanistic Molecular Images

Authors: Ching Ting Leung, Yufan Chen, Hanyu Gao

Abstract: Optical chemical structure recognition (OCSR) systems aim to extract the molecular structure information, usually in the form of a molecular graph or SMILES, from images of chemical molecules. While many tools have been developed for this purpose, challenges still exist due to the different types of noise that might exist in the images. Specifically, we focus on 'arrow-pushing' diagrams, a typical type of chemical image used to demonstrate electron flow in mechanistic steps. We present Structural molecular identifier of Molecular images in Chemical Reaction Mechanisms (SMiCRM), a dataset designed to benchmark machine recognition capabilities of chemical molecules with arrow-pushing annotations. Comprising 453 images, it spans a broad array of organic chemical reactions, each illustrated with molecular structures and mechanistic arrows. SMiCRM offers a rich collection of annotated molecule images for enhancing the benchmarking process for OCSR methods. This dataset includes a machine-readable molecular identity for each image as well as mechanistic arrows showing electron flow during chemical reactions. It presents a more authentic and challenging task for testing molecular recognition technologies, and achieving this task can greatly enrich the mechanistic information in computer-extracted chemical reaction data.

Submitted 25 July, 2024; originally announced July 2024.

Comments: Under Submission

arXiv:2406.08115 [pdf, other]

Resource Allocation and Workload Scheduling for Large-Scale Distributed Deep Learning: A Survey

Authors: Feng Liang, Zhen Zhang, Haifeng Lu, Chengming Li, Victor C. M. Leung, Yanyi Guo, Xiping Hu

Abstract: With rapidly increasing distributed deep learning workloads in large-scale data centers, efficient distributed deep learning framework strategies for resource allocation and workload scheduling have become the key to high-performance deep learning. The large-scale environment with large volumes of datasets, models, and computational and communication resources raises various unique challenges for resource allocation and workload scheduling in distributed deep learning, such as scheduling complexity, resource and workload heterogeneity, and fault tolerance. To uncover these challenges and corresponding solutions, this survey reviews the literature, mainly from 2019 to 2024, on efficient resource allocation and workload scheduling strategies for large-scale distributed DL. We explore these strategies by focusing on various resource types, scheduling granularity levels, and performance goals during distributed training and inference processes. We highlight critical challenges for each topic and discuss key insights of existing technologies. To illustrate practical large-scale resource allocation and workload scheduling in real distributed deep learning scenarios, we use a case study of training large language models. This survey aims to encourage computer science, artificial intelligence, and communications researchers to understand recent advances and explore future research directions for efficient framework strategies for large-scale distributed deep learning.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2405.10347 [pdf, other]

Networking Systems for Video Anomaly Detection: A Tutorial and Survey

Authors: Jing Liu, Yang Liu, Jieyu Lin, Jielin Li, Liang Cao, Peng Sun, Bo Hu, Liang Song, Azzedine Boukerche, Victor C. M. Leung

Abstract: The increasing utilization of surveillance cameras in smart cities, coupled with the surge of online video applications, has heightened concerns regarding public security and privacy protection, which propelled automated Video Anomaly Detection (VAD) into a fundamental research task within the Artificial Intelligence (AI) community. With the advancements in deep learning and edge computing, VAD has made significant progress in synergy with emerging applications in smart cities and the video internet, and has moved beyond the conventional research scope of algorithm engineering to deployable Networking Systems for VAD (NSVAD), a practical hotspot for intersection exploration in the AI, IoVT, and computing fields. In this article, we delineate the foundational assumptions, learning frameworks, and applicable scenarios of various deep learning-driven VAD routes, offering an exhaustive tutorial for novices in NSVAD. In addition, this article elucidates core concepts by reviewing recent advances and typical solutions and aggregating available research resources accessible at https://github.com/fdjingliu/NSVAD. Lastly, this article projects future development trends and discusses how the integration of AI and computing technologies can address existing research challenges and promote open opportunities, serving as an insightful guide for prospective researchers and engineers.

Submitted 3 April, 2025; v1 submitted 15 May, 2024; originally announced May 2024.

Comments: Accepted to ACM Computing Surveys. For more information and supplementary material, please visit https://github.com/fdjingliu/NSVAD

arXiv:2405.07518 [pdf, other]

doi 10.1109/MICRO61859.2024.00100

SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts

Authors: Raghu Prabhakar, Ram Sivaramakrishnan, Darshan Gandhi, Yun Du, Mingran Wang, Xiangyu Song, Kejie Zhang, Tianren Gao, Angela Wang, Karen Li, Yongning Sheng, Joshua Brot, Denis Sokolov, Apurv Vivek, Calvin Leung, Arjun Sabnis, Jiayu Bai, Tuowen Zhao, Mark Gottscho, David Jackson, Mark Luttrell, Manish K. Shah, Edison Chen, Kaizhao Liang, Swayambhoo Jain , et al. (5 additional authors not shown)

Abstract: Monolithic large language models (LLMs) like GPT-4 have paved the way for modern generative AI applications. Training, serving, and maintaining monolithic LLMs at scale, however, remains prohibitively expensive and challenging. The disproportionate increase in the compute-to-memory ratio of modern AI accelerators has created a memory wall, necessitating new methods to deploy AI. Composition of Experts (CoE) is an alternative modular approach that lowers the cost and complexity of training and serving. However, this approach presents two key challenges when using conventional hardware: (1) without fused operations, smaller models have lower operational intensity, which makes high utilization more challenging to achieve; and (2) hosting a large number of models can be either prohibitively expensive or slow when dynamically switching between them. In this paper, we describe how combining CoE, streaming dataflow, and a three-tier memory system scales the AI memory wall. We describe Samba-CoE, a CoE system with 150 experts and a trillion total parameters. We deploy Samba-CoE on the SambaNova SN40L Reconfigurable Dataflow Unit (RDU), a commercial dataflow accelerator architecture that has been co-designed for enterprise inference and training applications. The chip introduces a new three-tier memory system with on-chip distributed SRAM, on-package HBM, and off-package DDR DRAM. A dedicated inter-RDU network enables scaling up and out over multiple sockets.
We demonstrate speedups ranging from 2× to 13× on various benchmarks running on eight RDU sockets compared with an unfused baseline. We show that for CoE inference deployments, the 8-socket RDU Node reduces machine footprint by up to 19×, speeds up model switching time by 15× to 31×, and achieves an overall speedup of 3.7× over a DGX H100 and 6.6× over a DGX A100.
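The first challenge listed above, that smaller models without fused operations have lower operational intensity, can be checked with a back-of-the-envelope calculation. The sketch below is our own illustration, not SambaNova's performance model: it assumes 2-byte elements (e.g., BF16), counts a multiply-add as 2 FLOPs, and assumes every operand crosses the off-chip boundary exactly once. The helper names are hypothetical.

```python
def gemm_flops(m, k, n):
    """FLOPs of an (m x k) @ (k x n) matrix multiply, counting a multiply-add as 2."""
    return 2 * m * k * n


def chained_gemm_intensity(m, k, n1, n2, fused, bytes_per_elem=2):
    """Operational intensity (FLOPs per off-chip byte) of Y = (X @ W1) @ W2.

    Simplified traffic model (an assumption for illustration): each operand
    crosses the off-chip boundary exactly once. Unfused, the intermediate
    H = X @ W1 (m x n1) is written out by the first kernel and read back by
    the second; fused, H stays on-chip.
    """
    flops = gemm_flops(m, k, n1) + gemm_flops(m, n1, n2)
    elems = m * k + k * n1 + n1 * n2 + m * n2  # X, W1, W2 read; Y written
    if not fused:
        elems += 2 * m * n1  # H written, then read back
    return flops / (elems * bytes_per_elem)


if __name__ == "__main__":
    tokens = 4096
    for width in (1024, 8192):
        unfused = chained_gemm_intensity(tokens, width, width, width, fused=False)
        fused = chained_gemm_intensity(tokens, width, width, width, fused=True)
        print(f"width={width}: unfused {unfused:.0f} FLOP/B, fused {fused:.0f} FLOP/B")
```

Under this model, narrowing the layer width lowers FLOPs per byte, while keeping the intermediate on-chip raises it for any shape, which is the intuition behind pairing small experts with operator fusion.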

Submitted 4 November, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

Comments: 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO)

ACM Class: C.1.3; C.0

arXiv:2405.03798 [pdf, other]

Update Rate, Accuracy, and Age of Information in a Wireless Sensor Network

Authors: Xinlu Dai, Cyril Leung

Abstract: Age of Information (AoI), namely the time that has elapsed since the most recently delivered packet was generated, is receiving increasing attention with the emergence of many real-time applications that rely on the exchange of time-sensitive information. AoI captures the freshness of the information from the perspective of the destination. The term "accuracy of information" is used to assess how close the estimate at the destination is to the parameter value measured by the sensor. In this paper, the mean square error (MSE) is used to evaluate the accuracy of information. We focus on a single sensor that monitors a time-sensitive physical process, which is modelled as a random walk. Whenever the state of the random walk changes by more than a specified threshold, the sensor generates a status update packet and transmits it to the destination. When no update packet is received, the destination assumes that the state of the process has not changed. We study the problem of finding the minimum update rate under AoI and accuracy of information constraints. More specifically, we derive analytical expressions for the update rate, the AoI, and the MSE.
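The threshold-triggered update policy described above can be simulated directly to see how the threshold trades update rate against accuracy (MSE) and AoI. This is a minimal Monte Carlo sketch, not the paper's analytical derivation: it assumes a symmetric ±1 random walk in discrete time, zero transmission delay, and an initial update delivered at time 0. The function `simulate` is hypothetical.

```python
import random


def simulate(threshold, steps, seed=0):
    """Threshold-triggered status updates for a +/-1 random walk.

    Assumptions (ours, for illustration): unit time steps, a symmetric
    +/-1 walk, zero transmission delay (an update generated at step t is
    delivered at step t), and an initial update at t = 0.
    Returns (update_rate, mse, mean_aoi).
    """
    rng = random.Random(seed)
    state = 0        # true value of the process at the sensor
    estimate = 0     # destination's belief: value in the last delivered update
    last_gen = 0     # generation time of the most recently delivered update
    updates = 0
    sq_err = 0.0
    aoi_sum = 0
    for t in range(1, steps + 1):
        state += rng.choice((-1, 1))
        # The sensor transmits only when the change since the last update
        # exceeds the threshold.
        if abs(state - estimate) > threshold:
            estimate = state
            last_gen = t
            updates += 1
        # Between updates the destination assumes the state is unchanged.
        sq_err += (state - estimate) ** 2
        aoi_sum += t - last_gen
    return updates / steps, sq_err / steps, aoi_sum / steps
```

With threshold 0 every step triggers an update, so the MSE and AoI are zero at the cost of the maximum update rate; raising the threshold lowers the rate, while the per-step error stays bounded by the threshold.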

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.15292 [pdf, other]

Multi-objective Optimization for Multi-UAV-assisted Mobile Edge Computing

Authors: Geng Sun, Yixian Wang, Zemin Sun, Qingqing Wu, Jiawen Kang, Dusit Niyato, Victor C. M. Leung

Abstract: Recent developments in unmanned aerial vehicles (UAVs) and mobile edge computing (MEC) have provided users with flexible and resilient computing services. However, meeting the computing-intensive and latency-sensitive demands of users poses a significant challenge due to the limited resources of UAVs. To address this challenge, we present a multi-objective optimization approach for multi-UAV-assisted MEC systems. First, we formulate a multi-objective optimization problem aimed at minimizing the total task completion delay, reducing the total UAV energy consumption, and maximizing the total amount of offloaded tasks by jointly optimizing task offloading, computation resource allocation, and UAV trajectory control. Since the problem is a mixed-integer non-linear programming (MINLP) problem and is NP-hard, we propose a joint task offloading, computation resource allocation, and UAV trajectory control (JTORATC) approach to solve it.
Because the decision variables of task offloading, computation resource allocation, and UAV trajectory control are coupled with each other, the original problem is split into three sub-problems, i.e., task offloading, computation resource allocation, and UAV trajectory control, which are solved individually to obtain the corresponding decisions. Specifically, the task offloading sub-problem is solved using distributed splitting and threshold rounding methods, the computation resource allocation sub-problem is solved using the Karush-Kuhn-Tucker (KKT) method, and the UAV trajectory control sub-problem is solved using the successive convex approximation (SCA) method. Simulation results show that the proposed JTORATC outperforms the other benchmark methods.
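To illustrate how the KKT method can handle a computation resource allocation sub-problem, consider a deliberately simplified stand-in (our assumption, not the paper's formulation): a UAV with total CPU frequency F serves offloaded tasks requiring c_i cycles each, and the goal is to minimize the total computation delay Σ c_i / f_i subject to Σ f_i ≤ F. The KKT conditions yield the closed form f_i = F √c_i / Σ_j √c_j. The function names are hypothetical.

```python
import math


def kkt_allocation(cycles, capacity):
    """Optimal CPU frequencies for: minimize sum_i c_i / f_i  s.t.  sum_i f_i <= F.

    Stationarity of the Lagrangian gives c_i / f_i**2 = lam for every i, so
    f_i is proportional to sqrt(c_i). The capacity constraint is tight at the
    optimum (delay decreases in every f_i), which fixes the constant.
    """
    roots = [math.sqrt(c) for c in cycles]
    total = sum(roots)
    return [capacity * r / total for r in roots]


def total_delay(cycles, freqs):
    """Sum of per-task computation delays c_i / f_i."""
    return sum(c / f for c, f in zip(cycles, freqs))


if __name__ == "__main__":
    cycles = [1.0, 4.0, 9.0]
    freqs = kkt_allocation(cycles, 6.0)
    print(freqs, total_delay(cycles, freqs))
```

The resulting minimum delay is (Σ √c_i)² / F; an equal split of the same capacity gives a strictly larger delay whenever the c_i differ.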

Submitted 23 March, 2024; originally announced April 2024.

arXiv:2404.13348 [pdf, other]

Socialized Learning: A Survey of the Paradigm Shift for Edge Intelligence in Networked Systems

Authors: Xiaofei Wang, Yunfeng Zhao, Chao Qiu, Qinghua Hu, Victor C. M. Leung

Abstract: Amidst the robust impetus from artificial intelligence (AI) and big data, edge intelligence (EI) has emerged as a nascent computing paradigm, synthesizing AI with edge computing (EC) to become an exemplary solution for unleashing the full potential of AI services. Nonetheless, challenges in communication costs, resource allocation, privacy, and security continue to constrain its ability to support services with diverse requirements. In response to these issues, this paper introduces socialized learning (SL) as a promising solution that further propels the advancement of EI. SL is a learning paradigm predicated on social principles and behaviors, aimed at amplifying the collaborative capacity and collective intelligence of agents within the EI system. SL not only enhances the system's adaptability but also optimizes communication and networking processes, which are essential for distributed intelligence across diverse devices and platforms. Therefore, a combination of SL and EI may greatly facilitate the development of collaborative intelligence in future networks. This paper presents the findings of a literature review on the integration of EI and SL, summarizing the latest achievements in existing research on EI and SL. Subsequently, we delve comprehensively into the limitations of EI and how it could benefit from SL. Special emphasis is placed on communication challenges, networking strategies, and other aspects of these systems, underlining the role of optimized network solutions in improving system efficiency.
Based on these discussions, we elaborate in detail on three integrated components: socialized architecture, socialized training, and socialized inference, analyzing their strengths and weaknesses. Finally, we identify some possible future applications of combining SL and EI, discuss open problems, and suggest directions for future research.

Submitted 3 November, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

Comments: This paper has been accepted by IEEE Communications Surveys and Tutorials

arXiv:2404.07450 [pdf, other]

Collaborative Ground-Space Communications via Evolutionary Multi-objective Deep Reinforcement Learning

Authors: Jiahui Li, Geng Sun, Qingqing Wu, Dusit Niyato, Jiawen Kang, Abbas Jamalipour, Victor C. M. Leung

Abstract: In this paper, we propose a distributed collaborative beamforming (DCB)-based uplink communication paradigm for enabling ground-space direct communications. Specifically, DCB treats the terminals that are unable to establish efficient direct connections with the low Earth orbit (LEO) satellites as distributed antennas, forming a virtual antenna array to enhance the terminal-to-satellite uplink achievable rates and durations. However, such systems need multiple trade-off policies that variously balance the terminal-satellite uplink achievable rate, the energy consumption of terminals, and the satellite switching frequency to satisfy changing scenario requirements. Thus, we perform a multi-objective optimization analysis and formulate a long-term optimization problem. To ensure applicability across different terminal cluster scales, we reformulate this problem into an action space-reduced and universal multi-objective Markov decision process. Then, we propose an evolutionary multi-objective deep reinforcement learning algorithm to obtain the desirable policies, in which low-value actions are masked to speed up the training process. As such, a model trained once can cover more changing terminal-satellite uplink scenarios. Simulation results show that the proposed algorithm outperforms various baselines, and they also yield several useful insights.
Specifically, it is found that DCB enables terminals that cannot reach the uplink achievable threshold to achieve efficient direct uplink transmission, which reveals that DCB is an effective solution for enabling direct ground-space communications. Moreover, the results reveal that the proposed algorithm obtains multiple policies that favor different objectives and achieve near-optimal uplink achievable rates with low switching frequency.
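The abstract states that low-value actions are masked to speed up training. A common way to realize action masking in policy-based deep reinforcement learning is to give masked actions zero probability before sampling or selecting an action; the sketch below shows that generic mechanism. How the paper decides which actions count as low-value is not described here, so the `mask` argument is simply supplied by the caller, and `masked_softmax` and `greedy_action` are hypothetical helpers.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over the actions whose mask entry is True; masked actions get 0."""
    valid = [x for x, keep in zip(logits, mask) if keep]
    if not valid:
        raise ValueError("at least one action must remain unmasked")
    peak = max(valid)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) if keep else 0.0 for x, keep in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_action(logits, mask):
    """Index of the highest-logit action among the unmasked ones."""
    return max((i for i, keep in enumerate(mask) if keep), key=lambda i: logits[i])


if __name__ == "__main__":
    logits = [2.0, 5.0, 1.0]
    mask = [True, False, True]  # hypothetical: action 1 flagged as low-value
    print(masked_softmax(logits, mask), greedy_action(logits, mask))
```

Because masked actions never receive probability mass, the agent neither explores them nor spends gradient updates on them, which is how masking shrinks the effective action space.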

Submitted 10 April, 2024; originally announced April 2024.

Comments: This paper has been submitted to IEEE Journal on Selected Areas in Communications

arXiv:2404.06114 [pdf, other]

Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey

Authors: Feng Liang, Zhen Zhang, Haifeng Lu, Victor C. M. Leung, Yanyi Guo, Xiping Hu

Abstract: With the rapid growth in the volume of data sets, models, and devices in the domain of deep learning, increasing attention is being paid to large-scale distributed deep learning. In contrast to traditional distributed deep learning, the large-scale scenario poses new challenges that include fault tolerance, scalability of algorithms and infrastructures, and heterogeneity in data sets, models, and resources. Due to intensive synchronization of models and sharing of data across GPUs and computing nodes during distributed training and inference, communication efficiency becomes the bottleneck for achieving high performance at a large scale. This article surveys the literature over the period 2018-2023 on algorithms and technologies aimed at achieving efficient communication in large-scale distributed deep learning at various levels, including algorithms, frameworks, and infrastructures. Specifically, we first introduce efficient algorithms for model synchronization and communication data compression in the context of large-scale distributed training. Next, we introduce efficient strategies related to resource allocation and task scheduling for use in distributed training and inference. After that, we present the latest technologies pertaining to modern communication infrastructures used in distributed deep learning, with a focus on examining the impact of communication overhead in large-scale and heterogeneous settings.
Finally, we conduct a case study on the distributed training of large language models at a large scale to illustrate how to apply these technologies in real cases. This article aims to offer researchers a comprehensive understanding of the current landscape of large-scale distributed deep learning and to reveal promising future research directions toward communication-efficient solutions in this scope.
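Communication data compression, one of the algorithm-level topics listed above, can be made concrete with top-k gradient sparsification, a widely studied technique: each worker transmits only the k largest-magnitude gradient entries as index/value pairs and, with error feedback, carries the untransmitted remainder into the next step. This is a minimal pure-Python sketch chosen by us as a representative example; it is not taken from the survey, and the names are hypothetical.

```python
def topk_sparsify(grad, k):
    """Indices and values of the k largest-magnitude entries (indices ascending)."""
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    idx = sorted(order[:k])
    return idx, [grad[i] for i in idx]


def densify(indices, values, size):
    """Receiver side: rebuild a dense vector, with zeros for untransmitted entries."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


class ErrorFeedbackTopK:
    """Top-k with error feedback: untransmitted mass is added to the next gradient."""

    def __init__(self, size, k):
        self.k = k
        self.residual = [0.0] * size

    def compress(self, grad):
        corrected = [g + r for g, r in zip(grad, self.residual)]
        idx, vals = topk_sparsify(corrected, self.k)
        sent = densify(idx, vals, len(corrected))
        # Whatever was not sent this round is remembered for the next round.
        self.residual = [c - s for c, s in zip(corrected, sent)]
        return idx, vals
```

Sending k index/value pairs instead of n dense values cuts traffic roughly by n / (2k); error feedback ensures small but persistent gradient components are eventually transmitted rather than silently dropped.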

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.01347 [pdf, other]

Mining Sequential Patterns in Uncertain Databases Using Hierarchical Index Structure

Authors: Kashob Kumar Roy, Md Hasibul Haque Moon, Md Mahmudur Rahman, Chowdhury Farhan Ahmed, Carson K. Leung

Abstract: In this uncertain world, data uncertainty is inherent in many applications, and its importance is growing drastically due to the rapid development of modern technologies. Nowadays, researchers pay more attention to mining patterns in uncertain databases. A few recent works attempt to mine frequent uncertain sequential patterns. Despite their success, they fail to reduce the number of false-positive patterns generated during the mining process and cannot maintain the patterns efficiently. In this paper, we propose multiple theoretically tightened pruning upper bounds that remarkably reduce the mining space. A novel hierarchical structure is introduced to maintain the patterns in a space-efficient way. Afterward, we develop a versatile framework for mining uncertain sequential patterns that can effectively handle weight constraints as well. Besides, with the advent of incremental uncertain databases, existing works are not scalable. Several incremental sequential pattern mining algorithms exist, but they are limited to precise databases. Therefore, we propose a new technique to adapt our framework to mine patterns when the database is incremental. Finally, we conduct extensive experiments on several real-life datasets and show the efficacy of our framework in different applications.
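As background for the expected-support-based mining described above, the sketch below computes the expected support of a sequential pattern in a small uncertain database. It assumes (our choice, for illustration) that each item carries an independent existential probability, that patterns consist of single-item events, and that a sequence contributes the maximum probability over all occurrences of the pattern, a definition used in some prior uncertain sequential pattern mining work. The paper's pruning upper bounds and hierarchical index are not reproduced, and the function names are hypothetical.

```python
def max_occurrence_prob(sequence, pattern):
    """Highest probability of any single occurrence of `pattern` in `sequence`.

    `sequence` is a list of itemsets, each a dict {item: existential probability};
    `pattern` is a list of items that must match strictly later itemsets in order.
    best[j] is the best probability of matching pattern[:j] in the prefix seen so far.
    """
    best = [1.0] + [0.0] * len(pattern)
    for itemset in sequence:
        # Iterate j downwards so one itemset cannot match two pattern positions.
        for j in range(len(pattern), 0, -1):
            p = itemset.get(pattern[j - 1])
            if p is not None:
                best[j] = max(best[j], best[j - 1] * p)
    return best[-1]


def expected_support(database, pattern):
    """Sum of per-sequence maximum occurrence probabilities."""
    return sum(max_occurrence_prob(seq, pattern) for seq in database)


if __name__ == "__main__":
    db = [
        [{"a": 0.9}, {"b": 0.5, "c": 0.4}, {"b": 0.8}],
        [{"a": 0.6, "b": 0.7}, {"c": 1.0}],
    ]
    print(expected_support(db, ["a", "b"]), expected_support(db, ["a", "c"]))
```

Pruning upper bounds matter because this per-sequence computation must otherwise be repeated for every candidate pattern; a tight bound lets a miner discard candidates before scanning the database.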

Submitted 31 March, 2024; originally announced April 2024.

Comments: Accepted at PAKDD 2021. arXiv admin note: text overlap with arXiv:2404.00746

arXiv:2404.00746 [pdf, other]

Mining Weighted Sequential Patterns in Incremental Uncertain Databases

Authors: Kashob Kumar Roy, Md Hasibul Haque Moon, Md Mahmudur Rahman, Chowdhury Farhan Ahmed, Carson Kai-Sang Leung

Abstract: Due to the rapid development of science and technology, the importance of imprecise, noisy, and uncertain data is increasing at an exponential rate. Thus, mining patterns in uncertain databases has drawn the attention of researchers. Moreover, frequent sequences of items from these databases need to be discovered for meaningful knowledge with great impact. In many real cases, weights of items and patterns are introduced as a measure of importance to find interesting sequences. Hence, a weight constraint needs to be handled while mining sequential patterns. Besides, due to the dynamic nature of databases, mining important information has become more challenging. Instead of mining patterns from scratch after each increment, incremental mining algorithms utilize previously mined information to update the result immediately. Several algorithms exist to mine frequent patterns and weighted sequences from incremental databases. However, these algorithms are confined to precise databases. Therefore, in this work, we develop an algorithm to mine frequent sequences in uncertain databases. Furthermore, we propose two new techniques for mining when the database is incremental. Extensive experiments have been conducted for performance evaluation, and the analysis shows the efficiency of our proposed framework.
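Two ideas from this abstract, weight constraints and incremental updates, can be sketched together. Assuming (our assumptions) that a pattern's weight is the mean of its item weights and that each sequence contributes its maximum occurrence probability, expected support is a sum over sequences, so a new batch only has to be scanned once and added to the stored totals. This toy miner tracks a fixed candidate set; a real incremental algorithm, including the two techniques proposed in the paper, must also discover patterns that become frequent only after an increment. All names are hypothetical.

```python
def occurrence_prob(sequence, pattern):
    """Max probability of any single occurrence of `pattern` in an uncertain sequence."""
    best = [1.0] + [0.0] * len(pattern)
    for itemset in sequence:
        for j in range(len(pattern), 0, -1):  # downwards: one itemset, one position
            p = itemset.get(pattern[j - 1])
            if p is not None:
                best[j] = max(best[j], best[j - 1] * p)
    return best[-1]


class IncrementalWeightedMiner:
    """Keeps expected supports of fixed candidates up to date across increments."""

    def __init__(self, patterns, item_weights):
        self.patterns = [tuple(p) for p in patterns]
        self.item_weights = item_weights
        self.exp_sup = {p: 0.0 for p in self.patterns}
        self.size = 0

    def weight(self, pattern):
        # Assumed weighting scheme: average of the pattern's item weights.
        return sum(self.item_weights[i] for i in pattern) / len(pattern)

    def add_batch(self, sequences):
        # Expected support is additive over sequences, so only new data is scanned.
        for seq in sequences:
            for p in self.patterns:
                self.exp_sup[p] += occurrence_prob(seq, p)
        self.size += len(sequences)

    def frequent(self, min_ratio):
        """Candidates whose weighted expected support reaches min_ratio * |DB|."""
        threshold = min_ratio * self.size
        return sorted(p for p in self.patterns
                      if self.weight(p) * self.exp_sup[p] >= threshold)
```

After each `add_batch`, `frequent` can be re-evaluated without rescanning earlier sequences, which is the saving that incremental mining aims for.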

Submitted 31 March, 2024; originally announced April 2024.

Comments: Accepted to Information Sciences journal

Journal ref: Information Sciences 582 (2022): 865-896

arXiv:2403.17934 [pdf, other]

AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation

Authors: Qingping Sun, Yanjun Wang, Ailing Zeng, Wanqi Yin, Chen Wei, Wenjia Wang, Haiyi Mei, Chi Sing Leung, Ziwei Liu, Lei Yang, Zhongang Cai

Abstract: Expressive human pose and shape estimation (a.k.a. 3D whole-body mesh recovery) involves human body, hand, and facial expression estimation. Most existing methods tackle this task in a two-stage manner, first detecting human body parts with an off-the-shelf detection model and then inferring the different body parts individually. Despite the impressive results achieved, these methods suffer from 1) loss of valuable contextual information via cropping, 2) the introduction of distractions, and 3) a lack of inter-association among different persons and body parts, inevitably causing performance degradation, especially in crowded scenes. To address these issues, we introduce a novel all-in-one-stage framework, AiOS, for multiple expressive human pose and shape recovery without an additional human detection step. Specifically, our method is built upon DETR, which treats the multi-person whole-body mesh recovery task as a progressive set prediction problem with various sequential detections. We devise decoder tokens and extend them to our task. In particular, we first employ a human token to probe a human location in the image and encode global features for each instance, which provides a coarse location for the later transformer block. Then, we introduce a joint-related token to probe the human joints in the image and encode fine-grained local features, which collaborate with the global features to regress the whole-body mesh.
This straightforward but effective model outperforms previous state-of-the-art methods with a 9% reduction in NMVE on AGORA, a 30% reduction in PVE on EHF, a 10% reduction in PVE on ARCTIC, and a 3% reduction in PVE on EgoBody.
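DETR-style set prediction, which the abstract says AiOS builds on, is trained by first finding the one-to-one assignment between predicted instances and ground-truth people that minimizes a matching cost. The sketch below brute-forces that assignment for a handful of instances using an L1 distance between hypothetical 2-D points. Real DETR implementations use the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`) with a cost that combines class scores and box terms, and AiOS's own matching cost is not described here.

```python
from itertools import permutations


def l1(a, b):
    """L1 distance between two equal-length coordinate tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def match_predictions(preds, targets):
    """Minimum-cost one-to-one assignment of targets to predictions (brute force).

    Requires len(preds) >= len(targets); unmatched predictions would be
    supervised as "no object" in DETR-style training.
    Returns (sorted list of (pred_idx, target_idx), total cost).
    """
    best_cost, best_perm = float("inf"), None
    # perm[t] is the prediction index assigned to target t.
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(l1(preds[p], targets[t]) for t, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return sorted((p, t) for t, p in enumerate(best_perm)), best_cost


if __name__ == "__main__":
    preds = [(0, 0), (10, 10), (5, 5)]   # hypothetical predicted person locations
    targets = [(9, 9), (1, 1)]           # hypothetical ground-truth locations
    print(match_predictions(preds, targets))
```

Brute force grows factorially and is only for illustration; the Hungarian algorithm solves the same assignment in polynomial time.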

Submitted 26 March, 2024; originally announced March 2024.

Comments: Homepage: https://ttxskk.github.io/AiOS/

arXiv:2403.03691 [pdf, other]

doi 10.1186/s13321-024-00926-w

MolNexTR: A Generalized Deep Learning Model for Molecular Image Recognition

Authors: Yufan Chen, Ching Ting Leung, Yong Huang, Jianwei Sun, Hao Chen, Hanyu Gao

Abstract: In the field of chemical structure recognition, converting molecular images into machine-readable data formats such as SMILES strings remains a significant challenge, primarily due to the varied drawing styles and conventions prevalent in chemical literature. To bridge this gap, we propose MolNexTR, a novel image-to-graph deep learning model that fuses the strengths of ConvNeXt, a powerful Convolutional Neural Network variant, and Vision-TRansformer. This integration facilitates a more detailed extraction of both local and global features from molecular images. MolNexTR can predict atoms and bonds simultaneously and understand their layout rules. It also excels at flexibly integrating symbolic chemistry principles to discern chirality and decipher abbreviated structures. We further incorporate a series of advanced algorithms, including an improved data augmentation module, an image contamination module, and a post-processing module for obtaining the final SMILES output. These modules cooperate to enhance the model's robustness to the diverse styles of molecular images found in real literature. On our test sets, MolNexTR demonstrates superior performance, achieving an accuracy of 81-97%, marking a significant advancement in the domain of molecular structure recognition.

Submitted 27 August, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

Journal ref: Journal of Cheminformatics (2024) 16:141

arXiv:2403.00473 [pdf, other]

Computer-Controlled 3D Freeform Surface Weaving

Authors: Xiangjia Chen, Lip M. Lai, Zishun Liu, Chengkai Dai, Isaac C. W. Leung, Charlie C. L. Wang, Yeung Yam

Abstract: In this paper, we present a new computer-controlled weaving technology that enables the fabrication of woven structures in the shape of given 3D surfaces using threads made of non-traditional materials with high bending stiffness, allowing for multiple applications of the resulting woven fabrics. A new weaving machine and a new manufacturing process are developed to realize 3D surface weaving based on the principle of short-row shaping. A computational solution is investigated to convert input 3D freeform surfaces into the corresponding weaving operations (denoted as W-code) that guide the operation of this system. A variety of examples using cotton threads, conductive threads, and optical fibres are fabricated by our prototype system to demonstrate its functionality.

Submitted 8 May, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

arXiv:2402.18392 [pdf, other]

Unveiling the Potential of Robustness in Selecting Conditional Average Treatment Effect Estimators

Authors: Yiyan Huang, Cheuk Hang Leung, Siyi Wang, Yijun Li, Qi Wu

Abstract: The growing demand for personalized decision-making has led to a surge of interest in estimating the Conditional Average Treatment Effect (CATE). Various types of CATE estimators have been developed with advancements in machine learning and causal inference. However, selecting the desirable CATE estimator through a conventional model validation procedure remains impractical due to the absence of counterfactual outcomes in observational data. Existing approaches for CATE estimator selection, such as plug-in and pseudo-outcome metrics, face two challenges. First, they must determine the metric form and the underlying machine learning models for fitting nuisance parameters (e.g., outcome function, propensity function, and plug-in learner). Second, they lack a specific focus on selecting a robust CATE estimator. To address these challenges, this paper introduces a Distributionally Robust Metric (DRM) for CATE estimator selection. The proposed DRM is nuisance-free, eliminating the need to fit models for nuisance parameters, and it effectively prioritizes the selection of a distributionally robust CATE estimator. The experimental results validate the effectiveness of the DRM method in selecting CATE estimators that are robust to the distribution shift incurred by covariate shift and hidden confounders.
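For context on the existing pseudo-outcome metrics mentioned above (not the proposed DRM), the sketch below scores candidate CATE estimators against the inverse-propensity-weighted pseudo-outcome Y(T - e)/(e(1 - e)), whose conditional mean equals the CATE under unconfoundedness. It assumes the propensity e is known, as in a randomized experiment; in observational data, e is one of the nuisance parameters such metrics must fit, which is the dependence the DRM is designed to avoid. The function names and toy data are hypothetical.

```python
def ipw_pseudo_outcome(y, t, e):
    """IPW pseudo-outcome; its mean given X equals the CATE if e is the true propensity."""
    return y * (t - e) / (e * (1.0 - e))


def pseudo_outcome_risk(tau_hat, data):
    """Mean squared gap between an estimator's CATE and the pseudo-outcome.

    `data` is a list of (x, t, y, e) validation tuples.
    """
    errors = [(tau_hat(x) - ipw_pseudo_outcome(y, t, e)) ** 2 for x, t, y, e in data]
    return sum(errors) / len(errors)


def select_estimator(candidates, data):
    """Name of the candidate (dict: name -> callable) with the lowest risk."""
    return min(candidates, key=lambda name: pseudo_outcome_risk(candidates[name], data))


# Toy validation data (illustrative assumptions): true CATE is 2x, outcomes
# are noise-free, and treatment is randomized with known propensity 0.5.
data = [(x, t, x + t * 2 * x, 0.5) for x in (1.0, 2.0, 3.0) for t in (0, 1)]
candidates = {"good": lambda x: 2 * x, "zero": lambda x: 0.0}
```

Only differences in risk between candidates are meaningful: the expected risk equals the estimator's squared error plus a pseudo-outcome variance term that is the same for every candidate.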

Submitted 31 October, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: This paper was accepted by NeurIPS-2024

arXiv:2402.05396 [pdf, other]

TASER: Temporal Adaptive Sampling for Fast and Accurate Dynamic Graph Representation Learning

Authors: Gangda Deng, Hongkuan Zhou, Hanqing Zeng, Yinglong Xia, Christopher Leung, Jianbo Li, Rajgopal Kannan, Viktor Prasanna

Abstract: Recently, Temporal Graph Neural Networks (TGNNs) have demonstrated state-of-the-art performance in various high-impact applications, including fraud detection and content recommendation. Despite their success, TGNNs are prone to the noise prevalent in real-world dynamic graphs, such as time-deprecated links and skewed interaction distributions. The noise causes two critical issues that significantly compromise the accuracy of TGNNs: (1) models are supervised by inferior interactions, and (2) noisy input induces high variance in the aggregated messages. However, current TGNN denoising techniques do not consider the diverse and dynamic noise patterns of each node. In addition, they suffer from excessive mini-batch generation overheads caused by traversing more neighbors. We believe the remedy for fast and accurate TGNNs lies in temporal adaptive sampling. In this work, we propose TASER, the first adaptive sampling method for TGNNs optimized for accuracy, efficiency, and scalability. TASER adapts its mini-batch selection based on training dynamics and its temporal neighbor selection based on the contextual, structural, and temporal properties of past interactions. To alleviate the bottleneck in mini-batch generation, TASER implements a pure GPU-based temporal neighbor finder and a dedicated GPU feature cache. We evaluate the performance of TASER using two state-of-the-art backbone TGNNs.
On five popular datasets, TASER outperforms the corresponding baselines by an average of 2.3% in Mean Reciprocal Rank (MRR) while achieving an average 5.1× speedup in training time.
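A temporal neighbor finder answers queries of the form "the most recent k neighbors of node v before time t". The CPU sketch below (ours, not TASER's GPU implementation) keeps each node's interactions sorted by timestamp and uses binary search, so a query costs O(log d + k) for a node of degree d. It implements plain most-recent sampling, a common baseline policy; TASER's adaptive, learned neighbor selection is not reproduced, and the class name is hypothetical.

```python
import bisect
from collections import defaultdict


class TemporalNeighborFinder:
    """Per-node interaction lists sorted by time, queried with binary search."""

    def __init__(self, events):
        # events: iterable of (src, dst, timestamp); each edge is stored for both endpoints.
        adj = defaultdict(list)
        for u, v, ts in events:
            adj[u].append((ts, v))
            adj[v].append((ts, u))
        self.times, self.nbrs = {}, {}
        for node, items in adj.items():
            items.sort()
            self.times[node] = [ts for ts, _ in items]
            self.nbrs[node] = [n for _, n in items]

    def most_recent(self, node, t, k):
        """Up to k (neighbor, timestamp) pairs with timestamp strictly before t."""
        times = self.times.get(node, [])
        end = bisect.bisect_left(times, t)  # excludes interactions at or after t
        if end == 0:
            return []
        start = max(0, end - k)
        return list(zip(self.nbrs[node][start:end], times[start:end]))


if __name__ == "__main__":
    finder = TemporalNeighborFinder([(1, 2, 10), (1, 3, 20), (1, 4, 30), (2, 3, 25)])
    print(finder.most_recent(1, 31, 2))
```

Excluding interactions at or after the query time prevents information from the future leaking into a node's embedding during training.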

Submitted 23 November, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

Comments: IPDPS 2024

arXiv:2402.03317 [pdf, other]

SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization

Authors: Xixu Hu, Runkai Zheng, Jindong Wang, Cheuk Hang Leung, Qi Wu, Xing Xie

Abstract: Vision Transformers (ViTs) are increasingly used in computer vision due to their high performance, but their vulnerability to adversarial attacks is a concern. Existing methods lack a solid theoretical basis, focusing mainly on empirical training adjustments. This study introduces SpecFormer, tailored to fortify ViTs against adversarial attacks, with theoretical underpinnings. We establish local Lipschitz bounds for the self-attention layer and propose Maximum Singular Value Penalization (MSVP) to precisely manage these bounds. By incorporating MSVP into ViTs' attention layers, we enhance the model's robustness without compromising training efficiency. SpecFormer, the resulting model, outperforms other state-of-the-art models in defending against adversarial attacks, as demonstrated by experiments on the CIFAR and ImageNet datasets. Code is released at https://github.com/microsoft/robustlearn.
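MSVP penalizes the largest singular value of attention weight matrices. A standard way to estimate that value cheaply is power iteration on WᵀW, the same estimator used in spectral normalization. The pure-Python sketch below shows the estimator and a penalty of the form λ Σ σ_max(W); which matrices are penalized and how the term is weighted in SpecFormer are not specified here, so `msvp_penalty` and its `lam` argument are hypothetical. In training, the estimate would be computed inside an autodiff framework so the penalty can be backpropagated.

```python
import math
import random


def _matvec(m, v):
    """Matrix (list of rows) times vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def max_singular_value(w, iters=100, seed=0):
    """Estimate sigma_max(W) by power iteration on W^T W."""
    rng = random.Random(seed)
    wt = [list(col) for col in zip(*w)]  # transpose of W
    v = [rng.gauss(0.0, 1.0) for _ in range(len(w[0]))]
    v_norm = _norm(v)
    v = [x / v_norm for x in v]
    sigma = 0.0
    for _ in range(iters):
        u = _matvec(w, v)        # u = W v
        u_norm = _norm(u)
        if u_norm == 0.0:
            return 0.0
        u = [x / u_norm for x in u]
        v = _matvec(wt, u)       # v = W^T u
        sigma = _norm(v)         # ||W^T u|| converges to sigma_max
        v = [x / sigma for x in v]
    return sigma


def msvp_penalty(weight_matrices, lam):
    """Hypothetical regularizer: lam times the sum of largest singular values."""
    return lam * sum(max_singular_value(w) for w in weight_matrices)
```

Because a linear map's Lipschitz constant (in the L2 norm) equals its largest singular value, shrinking σ_max tightens the bound on how much a small input perturbation can change the layer output.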

Submitted 13 July, 2024; v1 submitted 2 January, 2024; originally announced February 2024.

Comments: Accepted by ECCV 2024; 27 pages; code is at: https://github.com/microsoft/robustlearn

arXiv:2312.10388 [pdf, other]

The Causal Impact of Credit Lines on Spending Distributions

Authors: Yijun Li, Cheuk Hang Leung, Xiangqian Sun, Chaoqun Wang, Yiyan Huang, Xing Yan, Qi Wu, Dongdong Wang, Zhixiang Huang

Abstract: Consumer credit services offered by e-commerce platforms provide customers with convenient loan access during shopping and have the potential to stimulate sales. To understand the causal impact of credit lines on spending, previous studies have employed causal estimators based on direct regression (DR), inverse propensity weighting (IPW), and double machine learning (DML) to estimate the treatment effect. However, these estimators do not account for the fact that an individual's spending can be represented as a distribution, which captures the range and pattern of amounts spent across different orders. By not treating the outcome as a distribution, they may overlook valuable insights embedded in the outcome distribution. This paper develops a distribution-valued estimator framework that extends the existing real-valued DR-, IPW-, and DML-based estimators to distribution-valued estimators within Rubin's causal framework. We establish their consistency and apply them to a real dataset from a large e-commerce platform. Our findings reveal that credit lines positively influence spending across all quantiles; however, as credit lines increase, consumers allocate more to luxuries (higher quantiles) than to necessities (lower quantiles).

Submitted 16 December, 2023; originally announced December 2023.
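To make "spending as a distribution" concrete, here is a simplified sketch, not the paper's estimator. Each customer's order amounts are summarised by an empirical quantile function, and a normalised IPW estimator contrasts the weighted mean quantile functions of treated and control customers at chosen quantile levels. Averaging quantile functions pointwise is one natural way to average one-dimensional distributions; whether the paper uses exactly this construction is an assumption. The propensity scores are assumed known, treatment is simplified to a binary indicator, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (treated?, propensity score e(X), list of order amounts for that customer)
Unit = Tuple[bool, float, Sequence[float]]


def empirical_quantile(values: Sequence[float], tau: float) -> float:
    """Inverse-CDF (lower) empirical quantile for tau in [0, 1]."""
    xs = sorted(values)
    k = max(1, math.ceil(tau * len(xs)))
    return xs[k - 1]


def ipw_quantile_effect(units: Sequence[Unit], taus: Sequence[float]) -> List[float]:
    """Normalised IPW contrast of mean quantile functions, treated minus control."""
    effects = []
    for tau in taus:
        num1 = den1 = num0 = den0 = 0.0
        for treated, e, spend in units:
            q = empirical_quantile(spend, tau)  # this customer's tau-quantile
            if treated:
                num1 += q / e
                den1 += 1.0 / e
            else:
                num0 += q / (1.0 - e)
                den0 += 1.0 / (1.0 - e)
        if den1 == 0.0 or den0 == 0.0:
            raise ValueError("need both treated and control units")
        effects.append(num1 / den1 - num0 / den0)
    return effects


units = [(True, 0.5, [10.0, 20.0, 30.0, 40.0]), (False, 0.5, [5.0, 10.0, 15.0, 20.0])]
print(ipw_quantile_effect(units, [0.5, 1.0]))
```

Reporting one contrast per quantile level mirrors the abstract's finding that effects differ between lower quantiles (necessities) and higher quantiles (luxuries).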

arXiv:2312.07917 [pdf, other]

On Designing Multi-UAV aided Wireless Powered Dynamic Communication via Hierarchical Deep Reinforcement Learning

Authors: Ze Yu Zhao, Yue Ling Che, Sheng Luo, Gege Luo, Kaishun Wu, Victor C. M. Leung

Abstract: This paper proposes a novel design for the wireless powered communication network (WPCN) in dynamic environments with the assistance of multiple unmanned aerial vehicles (UAVs). Unlike existing studies, where the low-power wireless nodes (WNs) often conform to the coherent harvest-then-transmit protocol, under our newly proposed double-threshold based WN type updating rule, each WN can dynamically and repeatedly update its WN type, acting as an E-node for non-linear energy harvesting over time slots or as an I-node for transmitting data over sub-slots. To maximize the total transmission data size of all the WNs over T slots, each UAV individually determines its trajectory and binary wireless energy transmission (WET) decisions over time slots and its binary wireless data collection (WDC) decisions over sub-slots, under the constraints of each UAV's limited on-board energy and each WN's node type updating rule. However, because the UAVs' trajectories are tightly coupled with their WET and WDC decisions, and each WN's battery energy is time-varying, this problem is difficult to solve optimally. We therefore propose a new multi-agent based hierarchical deep reinforcement learning (MAHDRL) framework with two tiers to solve the problem efficiently: a soft actor-critic (SAC) policy in tier 1 determines each UAV's continuous trajectory and binary WET decision over time slots, and a deep Q-network (DQN) policy in tier 2 determines each UAV's binary WDC decisions over sub-slots given the UAV trajectory from tier 1.
Both the SAC and DQN policies are executed in a distributed manner at each UAV. Finally, extensive simulation results validate the superior performance of the proposed MAHDRL approach over various state-of-the-art benchmarks.

Submitted 6 June, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

Comments: 13 pages, 10 figures; submitted for possible journal publication

arXiv:2311.13233 [pdf, other]

A Survey of Adversarial CAPTCHAs on its History, Classification and Generation

Authors: Zisheng Xu, Qiao Yan, F. Richard Yu, Victor C. M. Leung

Abstract: Completely Automated Public Turing test to tell Computers and Humans Apart, CAPTCHA for short, is an essential and relatively easy way to defend against malicious attacks carried out by bots. The trade-off between security and usability limits the use of massive geometric transformations to interfere with deep-model recognition, and deep models have even outperformed humans on complex CAPTCHAs. The discovery of adversarial examples offers an ideal solution to this trade-off: integrating adversarial examples with CAPTCHAs yields adversarial CAPTCHAs that can fool deep models. In this paper, we extend the definition of adversarial CAPTCHAs and propose a classification method for them. We then systematically review commonly used methods for generating adversarial examples, as well as methods that have been successfully used to generate adversarial CAPTCHAs. We also analyze defense methods that can be used against adversarial CAPTCHAs, indicating potential threats to them. Finally, we discuss possible future research directions for adversarial CAPTCHAs.

Submitted 22 November, 2023; originally announced November 2023.

Comments: Submitted to ACM Computing Surveys (Under Review)
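The fast gradient sign method (FGSM) is one standard way to generate adversarial examples of the kind discussed above: it nudges every input feature by a fixed step `eps` in the direction that increases the model's loss. A minimal sketch, assuming a toy logistic-regression classifier stands in for a deep CAPTCHA solver and that inputs are pixel intensities in [0, 1]; the model, weights and function names are hypothetical and chosen so the gradient has a closed form.

```python
import math
from typing import List


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(x: List[float], w: List[float], b: float) -> float:
    """Probability of the positive class under a toy logistic model."""
    return _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm(x: List[float], y: int, w: List[float], b: float, eps: float) -> List[float]:
    """One FGSM step: x + eps * sign(grad_x loss), clipped to the [0, 1] pixel range."""
    p = predict(x, w, b)
    # For binary cross-entropy with a logistic model, d loss / d x = (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    sign = [(g > 0) - (g < 0) for g in grad]
    return [min(1.0, max(0.0, xi + eps * s)) for xi, s in zip(x, sign)]


x, w, b = [0.5, 0.5], [1.0, -1.0], 0.0
x_adv = fgsm(x, y=1, w=w, b=b, eps=0.1)
print(x_adv, predict(x, w, b), predict(x_adv, w, b))
```

The perturbation is bounded by `eps` per pixel, which is what keeps an adversarial CAPTCHA readable for humans while pushing the model's confidence in the true label down.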

arXiv:2311.11156 [pdf, other]

Collaborative Safe Formation Control for Coupled Multi-Agent Systems

Authors: Brooks A. Butler, Chi Ho Leung, Philip E. Paré

Abstract: The safe control of multi-robot swarms is a challenging and active field of research, where common goals include maintaining group cohesion while simultaneously avoiding obstacles and inter-agent collisions. Building on our previously developed theory of distributed collaborative safety-critical control for networked dynamic systems, we propose a distributed algorithm for the formation control of robot swarms given individual agent dynamics, induced formation dynamics, and local neighborhood position and velocity information within a defined sensing radius for each agent. Individual safety guarantees for each agent are obtained using rounds of communication between neighbors to restrict unsafe control actions among cooperating agents through safety conditions derived from high-order control barrier functions. We provide conditions under which a swarm is guaranteed to achieve collective safety with respect to multiple obstacles using a modified collaborative safety algorithm. We demonstrate the performance of our distributed algorithm via simulation in a simplified physics-based environment.

Submitted 2 April, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

Comments: This work has been accepted to be presented at the 2024 European Control Conference
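The safety conditions mentioned above come from high-order control barrier functions (HOCBFs). As a heavily simplified illustration, not the paper's distributed collaborative algorithm, the sketch below handles a single agent with assumed double-integrator dynamics (control = acceleration) and one circular obstacle, using h = ||p - o||^2 - r^2 and linear class-K functions with hypothetical gains `k1`, `k2`. The second-order condition is linear in the control, so the minimal correction to a nominal input is a closed-form half-space projection. Communication rounds between neighbors are not modelled.

```python
from typing import Tuple

Vec = Tuple[float, float]


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1]


def hocbf_filter(p: Vec, v: Vec, u_nom: Vec, obstacle: Vec, radius: float,
                 k1: float = 1.0, k2: float = 1.0) -> Vec:
    """Minimally modify u_nom so a double integrator satisfies a 2nd-order CBF condition.

    h = ||p - o||^2 - r^2, psi1 = h_dot + k1*h, and we require psi1_dot + k2*psi1 >= 0,
    which expands to the linear constraint a . u >= b.
    """
    d = (p[0] - obstacle[0], p[1] - obstacle[1])
    h = _dot(d, d) - radius ** 2
    h_dot = 2.0 * _dot(d, v)
    a = (2.0 * d[0], 2.0 * d[1])
    # h_ddot = 2 v.v + 2 d.u, so the condition is 2 d.u >= -(2 v.v + (k1+k2) h_dot + k1 k2 h).
    b = -(2.0 * _dot(v, v) + (k1 + k2) * h_dot + k1 * k2 * h)
    slack = _dot(a, u_nom) - b
    if slack >= 0.0:
        return u_nom  # nominal input already satisfies the constraint
    if _dot(a, a) == 0.0:
        raise ValueError("agent is at the obstacle centre; constraint is degenerate")
    # Closed-form solution of min ||u - u_nom||^2 subject to a . u >= b.
    scale = -slack / _dot(a, a)
    return (u_nom[0] + scale * a[0], u_nom[1] + scale * a[1])


# Agent at (2, 0) moving towards a unit-radius obstacle at the origin with zero nominal input.
print(hocbf_filter((2.0, 0.0), (-1.0, 0.0), (0.0, 0.0), (0.0, 0.0), 1.0))
```

Here the filter returns a braking acceleration of 0.75 away from the obstacle. In a multi-agent setting, each agent would add one such constraint per neighbor and obstacle, which generally requires a quadratic-program solver instead of this single-constraint closed form.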

arXiv:2310.01980 [pdf, other]

UAV Swarm-enabled Collaborative Secure Relay Communications with Time-domain Colluding Eavesdropper

Authors: Chuang Zhang, Geng Sun, Qingqing Wu, Jiahui Li, Shuang Liang, Dusit Niyato, Victor C. M. Leung

Abstract: Unmanned aerial vehicles (UAVs) acting as aerial relays are practically appealing for assisting Internet of Things (IoT) networks. In this work, we aim to use a UAV swarm to assist secure communication between the micro base station (MBS) equipped with a planar array antenna (PAA) and the IoT terminal devices via collaborative beamforming (CB), so as to counteract the effects of collusive eavesdropping attacks in the time domain. Specifically, we formulate a UAV swarm-enabled secure relay multi-objective optimization problem (US2RMOP) that simultaneously maximizes the achievable sum rate of the associated IoT terminal devices, minimizes the achievable sum rate of the eavesdropper, and minimizes the energy consumption of the UAV swarm, by jointly optimizing the excitation current weights of both the MBS and the UAV swarm, the selection of the UAV receiver, the positions of the UAVs, and the user association order of the IoT terminal devices. Furthermore, the formulated US2RMOP is proved to be a non-convex, NP-hard, and large-scale optimization problem. Therefore, we propose an improved multi-objective grasshopper algorithm (IMOGOA) with several specific designs to address the problem. Simulation results demonstrate the effectiveness of the proposed UAV swarm-enabled collaborative secure relay strategy and the superiority of IMOGOA.

Submitted 3 October, 2023; originally announced October 2023.

Comments: Submitted to IEEE Transactions on Mobile Computing
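Because US2RMOP has three conflicting objectives (maximize the devices' sum rate, minimize the eavesdropper's sum rate, minimize UAV energy), candidate solutions are naturally compared by Pareto dominance. The sketch below shows only that comparison and a non-dominated filter; it is not IMOGOA, whose archive and update rules the abstract does not describe. The objective tuple layout and the function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical objective vector: (IoT sum rate, eavesdropper sum rate, UAV energy)
Objectives = Tuple[float, float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and strictly better on one.

    The first objective is maximised; the other two are minimised.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(solutions: List[Objectives]) -> List[Objectives]:
    """Keep the solutions not dominated by any other candidate."""
    return [
        s for i, s in enumerate(solutions)
        if not any(dominates(o, s) for j, o in enumerate(solutions) if j != i)
    ]


candidates = [(10.0, 2.0, 5.0), (8.0, 3.0, 6.0), (12.0, 4.0, 5.0), (9.0, 1.0, 7.0)]
print(pareto_front(candidates))
```

Here (8, 3, 6) is dropped because (10, 2, 5) is better on all three objectives, while the remaining three trade off against one another and all stay on the front.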

Showing 1–50 of 191 results for author: Leung, C