AI Flow: Perspectives, Scenarios, and Approaches

Authors: Hongjun An, Wenhan Hu, Sida Huang, Siqi Huang, Ruanjun Li, Yuanzhi Liang, Jiawei Shao, Yiliang Song, Zihan Wang, Cheng Yuan, Chi Zhang, Hongyuan Zhang, Wenhao Zhuang, Xuelong Li

Abstract: Pioneered by the foundational information theory by Claude Shannon and the visionary framework of machine intelligence by Alan Turing, the convergent evolution of information and communication technologies (IT/CT) has created an unbroken wave of connectivity and computation. This synergy has sparked a technological revolution, now reaching its peak with large artificial intelligence (AI) models that are reshaping industries and redefining human-machine collaboration. However, the realization of ubiquitous intelligence faces considerable challenges due to substantial resource consumption in large models and high communication bandwidth demands. To address these challenges, AI Flow has been introduced as a multidisciplinary framework that integrates cutting-edge IT and CT advancements, with a particular emphasis on the following three key points. First, a device-edge-cloud framework serves as the foundation, which integrates end devices, edge servers, and cloud clusters to optimize scalability and efficiency for low-latency model inference. Second, we introduce the concept of familial models, which refers to a series of different-sized models with aligned hidden features, enabling effective collaboration and the flexibility to adapt to varying resource constraints and dynamic scenarios. Third, connectivity- and interaction-based intelligence emergence is a novel paradigm of AI Flow. By leveraging communication networks to enhance connectivity, the collaboration among AI models across heterogeneous nodes achieves emergent intelligence that surpasses the capability of any single model. The innovations of AI Flow provide enhanced intelligence, timely responsiveness, and ubiquitous accessibility to AI services, paving the way for the tighter fusion of AI techniques and communication systems.

Submitted 3 July, 2025; v1 submitted 14 June, 2025; originally announced June 2025.

Comments: Authors are with Institute of Artificial Intelligence (TeleAI), China Telecom, China. Author names are listed alphabetically by surname. This work was conducted at TeleAI, facilitated by Dr. Jiawei Shao (e-mail: [email protected]) under the leadership of Prof. Xuelong Li. The corresponding author is Prof. Xuelong Li (e-mail: xuelong [email protected]), the CTO and Chief Scientist of China Telecom

arXiv:2505.01074 [pdf, other]

WirelessAgent: Large Language Model Agents for Intelligent Wireless Networks

Authors: Jingwen Tong, Wei Guo, Jiawei Shao, Qiong Wu, Zijian Li, Zehong Lin, Jun Zhang

Abstract: The rapid evolution of wireless networks presents unprecedented challenges in managing complex and dynamic systems. Existing methods are increasingly facing fundamental limitations in addressing these challenges. In this paper, we introduce WirelessAgent, a novel framework that harnesses large language models (LLMs) to create autonomous AI agents for diverse wireless network tasks. This framework integrates four core modules that mirror human cognitive processes: perception, memory, planning, and action. To implement it, we provide a basic usage based on agentic workflows and the LangGraph architecture. We demonstrate the effectiveness of WirelessAgent through a comprehensive case study on network slicing. The numerical results show that WirelessAgent achieves 44.4% higher bandwidth utilization than the Prompt-based method, while performing only 4.3% below the Rule-based optimality. Notably, WirelessAgent delivers near-optimal network throughput across diverse network scenarios. These results underscore the framework's potential for intelligent and autonomous resource management in future wireless networks. The code is available at https://github.com/jwentong/WirelessAgent_R1.

Submitted 2 May, 2025; originally announced May 2025.

Comments: This manuscript is an extended version of a previous magazine version and is now submitted to a journal for possible publication. arXiv admin note: text overlap with arXiv:2409.07964

arXiv:2503.20195 [pdf, other]

Mutual Information-Empowered Task-Oriented Communication: Principles, Applications and Challenges

Authors: Hongru Li, Songjie Xie, Jiawei Shao, Zixin Wang, Hengtao He, Shenghui Song, Jun Zhang, Khaled B. Letaief

Abstract: Mutual information (MI)-based guidelines have recently proven to be effective for designing task-oriented communication systems, where the ultimate goal is to extract and transmit task-relevant information for downstream tasks. This paper provides a comprehensive overview of MI-empowered task-oriented communication, highlighting how MI-based methods can serve as a unifying design framework in various task-oriented communication scenarios. We begin with the roadmap of MI for designing task-oriented communication systems, and then introduce the roles and applications of MI to guide feature encoding, transmission optimization, and efficient training with two case studies. We further elaborate the limitations and challenges of MI-based methods. Finally, we identify several open issues in MI-based task-oriented communication to inspire future research.

Submitted 25 March, 2025; originally announced March 2025.

Comments: 8 pages, 5 figures, submitted to IEEE for potential publication
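
As a rough illustration of the quantity this overview is built around, the sketch below computes the mutual information I(X;Y) of two discrete variables from a joint probability table. The function name `mutual_information` and the toy tables are assumptions made for illustration; they are not taken from the paper, which concerns learned estimators and bounds rather than exact tables.

```python
import math


def mutual_information(joint):
    """Return I(X;Y) in bits for a joint probability table joint[x][y].

    Hypothetical helper for illustration only; assumes the entries are
    non-negative and sum to 1.
    """
    px = [sum(row) for row in joint]        # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]  # marginal distribution of Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0.0:  # terms with p(x, y) = 0 contribute nothing
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi


# Toy tables: X and Y independent, and X fully determining Y.
independent = [[0.25, 0.25], [0.25, 0.25]]
correlated = [[0.5, 0.0], [0.0, 0.5]]
print(mutual_information(independent), mutual_information(correlated))
```

In a task-oriented setting, X would be the transmitted feature and Y the task label; a feature that carries more task-relevant information has higher I(X;Y).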

arXiv:2503.12926 [pdf, other]

Task-Oriented Feature Compression for Multimodal Understanding via Device-Edge Co-Inference

Authors: Cheng Yuan, Zhening Liu, Jiashu Lv, Jiawei Shao, Yufei Jiang, Jun Zhang, Xuelong Li

Abstract: With the rapid development of large multimodal models (LMMs), multimodal understanding applications are emerging. As most LMM inference requests originate from edge devices with limited computational capabilities, the predominant inference pipeline involves directly forwarding the input data to an edge server which handles all computations. However, this approach introduces high transmission latency due to limited uplink bandwidth of edge devices and significant computation latency caused by the prohibitive number of visual tokens, thus hindering delay-sensitive tasks and degrading user experience. To address this challenge, we propose a task-oriented feature compression (TOFC) method for multimodal understanding in a device-edge co-inference framework, where visual features are merged by clustering and encoded by a learnable and selective entropy model before feature projection. Specifically, we employ density peaks clustering based on K nearest neighbors to reduce the number of visual features, thereby minimizing both data transmission and computational complexity. Subsequently, a learnable entropy model with hyperprior is utilized to encode and decode merged features, further reducing transmission overhead. To enhance compression efficiency, multiple entropy models are adaptively selected based on the characteristics of the visual features, enabling a more accurate estimation of the probability distribution. Comprehensive experiments on seven visual question answering benchmarks validate the effectiveness of the proposed TOFC method. Results show that TOFC achieves up to 60% reduction in data transmission overhead and 50% reduction in system latency while maintaining identical task performance, compared with traditional image compression methods.

Submitted 17 March, 2025; originally announced March 2025.
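
The clustering step named in this abstract can be sketched as follows. This is a minimal, pure-Python version of density peaks clustering with K-nearest-neighbour densities, followed by averaging each cluster into one merged feature. The function name `dpc_knn_merge`, the exponential density formula, and the averaging rule are assumptions chosen for illustration; the abstract does not specify the authors' exact formulation.

```python
import math


def dpc_knn_merge(features, num_clusters, k):
    """Cluster feature vectors with DPC-KNN and average each cluster.

    Hypothetical sketch, not the TOFC implementation.
    Assumes len(features) > k >= 1 and num_clusters <= len(features).
    """
    n = len(features)
    # Pairwise Euclidean distances between all feature vectors.
    dist = [[math.dist(a, b) for b in features] for a in features]

    # Local density: large when the k nearest neighbours are close.
    rho = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:k]
        rho.append(math.exp(-sum(d * d for d in nearest) / len(nearest)))

    # Distance to the closest point of strictly higher density; points
    # with no denser neighbour get the largest pairwise distance.
    max_dist = max(max(row) for row in dist)
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max_dist)

    # Cluster centres are the points with the largest rho * delta score.
    ranked = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)
    centers = ranked[:num_clusters]

    # Assign every point to its nearest centre.
    labels = [min(range(len(centers)), key=lambda c: dist[i][centers[c]])
              for i in range(n)]
    for c, idx in enumerate(centers):
        labels[idx] = c  # a centre always belongs to its own cluster

    # Merge each cluster into a single feature by element-wise averaging.
    merged = []
    for c in range(len(centers)):
        members = [features[i] for i in range(n) if labels[i] == c]
        merged.append([sum(vals) / len(members) for vals in zip(*members)])
    return labels, merged


# Six toy 2-D "visual features" forming two tight groups.
toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels, merged = dpc_knn_merge(toy, num_clusters=2, k=2)
print(labels, merged)
```

Here six features are reduced to two merged features, which is the sense in which clustering lowers both the amount of data to transmit and the number of tokens the server has to process.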

arXiv:2503.00467 [pdf, other]

Adaptive Rectangular Convolution for Remote Sensing Pansharpening

Authors: Xueyang Wang, Zhixin Zheng, Jiandong Shao, Yule Duan, Liang-Jian Deng

Abstract: Recent advancements in convolutional neural network (CNN)-based techniques for remote sensing pansharpening have markedly enhanced image quality. However, conventional convolutional modules in these methods have two critical drawbacks. First, the sampling positions in convolution operations are confined to a fixed square window. Second, the number of sampling points is preset and remains unchanged. Given the diverse object sizes in remote sensing images, these rigid parameters lead to suboptimal feature extraction. To overcome these limitations, we introduce an innovative convolutional module, Adaptive Rectangular Convolution (ARConv). ARConv adaptively learns both the height and width of the convolutional kernel and dynamically adjusts the number of sampling points based on the learned scale. This approach enables ARConv to effectively capture scale-specific features of various objects within an image, optimizing kernel sizes and sampling locations. Additionally, we propose ARNet, a network architecture in which ARConv is the primary convolutional module. Extensive evaluations across multiple datasets reveal the superiority of our method in enhancing pansharpening performance over previous techniques. Ablation studies and visualization further confirm the efficacy of ARConv.

Submitted 1 March, 2025; originally announced March 2025.

Comments: 8 pages, 6 figures, Accepted by CVPR

arXiv:2501.13772 [pdf, ps, other]

Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models

Authors: Hao Cheng, Erjia Xiao, Jing Shao, Yichi Wang, Le Yang, Chao Shen, Philip Torr, Jindong Gu, Renjing Xu

Abstract: Large Language Models (LLMs) demonstrate impressive zero-shot performance across a wide range of natural language processing tasks. Integrating various modality encoders further expands their capabilities, giving rise to Multimodal Large Language Models (MLLMs) that process not only text but also visual and auditory modality inputs. However, these advanced capabilities may also pose significant security risks, as models can be exploited to generate harmful or inappropriate content through jailbreak attacks. While prior work has extensively explored how manipulating textual or visual modality inputs can circumvent safeguards in LLMs and MLLMs, the vulnerability of Large Audio-Language Models (LALMs) to audio-specific jailbreaks remains largely underexplored. To address this gap, we introduce Jailbreak-AudioBench, which consists of the Toolbox, curated Dataset, and comprehensive Benchmark. The Toolbox supports not only text-to-audio conversion but also various editing techniques for injecting audio hidden semantics. The curated Dataset provides diverse explicit and implicit jailbreak audio examples in both original and edited forms. Utilizing this dataset, we evaluate multiple state-of-the-art LALMs and establish the most comprehensive Jailbreak benchmark to date for audio modality. Finally, Jailbreak-AudioBench establishes a foundation for advancing future research on LALM safety alignment by enabling the in-depth exposure of more powerful jailbreak threats, such as query-based audio editing, and by facilitating the development of effective defense mechanisms.

Submitted 1 June, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

arXiv:2411.12469 [pdf, other]

AI Flow at the Network Edge

Authors: Jiawei Shao, Xuelong Li

Abstract: Recent advancements in large language models (LLMs) and their multimodal variants have led to remarkable progress across various domains, demonstrating impressive capabilities and unprecedented potential. In the era of ubiquitous connectivity, leveraging communication networks to distribute intelligence is a transformative concept, envisioning AI-powered services accessible at the network edge. However, pushing large models from the cloud to resource-constrained environments faces critical challenges. Model inference on low-end devices leads to excessive latency and performance bottlenecks, while raw data transmission over limited bandwidth networks causes high communication overhead. This article presents AI Flow, a framework that streamlines the inference process by jointly leveraging the heterogeneous resources available across devices, edge nodes, and cloud servers, making intelligence flow across networks. To facilitate cooperation among multiple computational nodes, the proposed framework explores a paradigm shift in the design of communication network systems from transmitting information flow to intelligence flow, where the goal of communications is task-oriented and folded into the inference process. Experimental results demonstrate the effectiveness of the proposed framework through an image captioning use case, showcasing the ability to reduce response latency while maintaining high-quality captions. This article serves as a position paper for identifying the motivation, challenges, and principles of AI Flow.

Submitted 13 February, 2025; v1 submitted 19 November, 2024; originally announced November 2024.

Comments: This paper has been accepted to IEEE Network Magazine

arXiv:2407.20748 [pdf, other]

Task-Oriented Communication for Vehicle-to-Infrastructure Cooperative Perception

Authors: Jiawei Shao, Teng Li, Jun Zhang

Abstract: Vehicle-to-infrastructure (V2I) cooperative perception plays a crucial role in autonomous driving scenarios. Despite its potential to improve perception accuracy and robustness, the large amount of raw sensor data inevitably results in high communication overhead. To mitigate this issue, we propose TOCOM-V2I, a task-oriented communication framework for V2I cooperative perception, which reduces bandwidth consumption by transmitting only task-relevant information, instead of the raw data stream, for perceiving the surrounding environment. Our contributions are threefold. First, we propose a spatial-aware feature selection module to filter out irrelevant information based on spatial relationships and perceptual prior. Second, we introduce a hierarchical entropy model to exploit redundancy within the features for efficient compression and transmission. Finally, we utilize a scaled dot-product attention architecture to fuse vehicle-side and infrastructure-side features to improve perception performance. Experimental results demonstrate the effectiveness of TOCOM-V2I.

Submitted 30 July, 2024; originally announced July 2024.
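
The fusion operation named in this abstract, scaled dot-product attention, can be sketched in a few lines of plain Python. The mapping used here (vehicle-side features as queries, infrastructure-side features as keys and values) and all function names are assumptions for illustration; the abstract does not state how TOCOM-V2I wires the two feature sets together.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


# Hypothetical toy features: one vehicle-side query attending over two
# infrastructure-side key/value pairs.
vehicle = [[1.0, 0.0]]
infra_keys = [[1.0, 0.0], [0.0, 1.0]]
infra_values = [[10.0, 0.0], [0.0, 10.0]]
print(scaled_dot_product_attention(vehicle, infra_keys, infra_values))
```

The query weights the first value more heavily because it aligns with the first key, which is how attention lets one side emphasise the most relevant features from the other.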

arXiv:2407.10632 [pdf, other]

Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model

Authors: Zhening Liu, Xinjie Zhang, Jiawei Shao, Zehong Lin, Jun Zhang

Abstract: With the rapid advancement of stereo vision technologies, stereo image compression has emerged as a crucial field that continues to draw significant attention. Previous approaches have primarily employed a unidirectional paradigm, where the compression of one view is dependent on the other, resulting in imbalanced compression. To address this issue, we introduce a symmetric bidirectional stereo image compression architecture, named BiSIC. Specifically, we propose a 3D convolution based codec backbone to capture local features and incorporate bidirectional attention blocks to exploit global features. Moreover, we design a novel cross-dimensional entropy model that integrates various conditioning factors, including the spatial context, channel context, and stereo dependency, to effectively estimate the distribution of latent representations for entropy coding. Extensive experiments demonstrate that our proposed BiSIC outperforms conventional image/video compression standards, as well as state-of-the-art learning-based methods, in terms of both PSNR and MS-SSIM.

Submitted 26 October, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

Comments: ECCV 2024

arXiv:2407.00949 [pdf, ps, other]

SpectralKAN: Kolmogorov-Arnold Network for Hyperspectral Images Change Detection

Authors: Yanheng Wang, Xiaohan Yu, Yongsheng Gao, Jianjun Sha, Jian Wang, Lianru Gao, Yonggang Zhang, Xianhui Rong

Abstract: It has been verified that deep learning methods, including convolutional neural networks (CNNs), graph neural networks (GNNs), and transformers, can accurately extract features from hyperspectral images (HSIs). These algorithms perform exceptionally well on HSIs change detection (HSIs-CD). However, the downside of these impressive results is the enormous number of parameters, FLOPs, GPU memory, and training and test times required. In this paper, we propose a spectral Kolmogorov-Arnold Network for HSIs-CD (SpectralKAN). SpectralKAN represents a multivariate continuous function as a composition of activation functions to extract HSI features and perform classification. These activation functions are B-spline functions with different parameters that can simulate various functions. In SpectralKAN, a KAN encoder is proposed to enhance computational efficiency for HSIs. A spatial-spectral KAN encoder is also introduced, where the spatial KAN encoder extracts spatial features and compresses the spatial dimensions from patch size to one. The spectral KAN encoder then extracts spectral features and classifies them into changed and unchanged categories. We use five HSIs-CD datasets to verify the effectiveness of SpectralKAN. Experimental verification has shown that SpectralKAN maintains high HSIs-CD accuracy while requiring fewer parameters, FLOPs, GPU memory, training and testing times, thereby increasing the efficiency of HSIs-CD. The code will be available at https://github.com/yanhengwang-heu/SpectralKAN.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2405.09514 [pdf, other]

Tackling Distribution Shifts in Task-Oriented Communication with Information Bottleneck

Authors: Hongru Li, Jiawei Shao, Hengtao He, Shenghui Song, Jun Zhang, Khaled B. Letaief

Abstract: Task-oriented communication aims to extract and transmit task-relevant information to significantly reduce the communication overhead and transmission latency. However, the unpredictable distribution shifts between training and test data, including domain shift and semantic shift, can dramatically undermine the system performance. In order to tackle these challenges, it is crucial to ensure that the encoded features can generalize to domain-shifted data and detect semantic-shifted data, while remaining compact for transmission. In this paper, we propose a novel approach based on the information bottleneck (IB) principle and invariant risk minimization (IRM) framework. The proposed method aims to extract compact and informative features that possess high capability for effective domain-shift generalization and accurate semantic-shift detection without any knowledge of the test data during training. Specifically, we propose an invariant feature encoding approach based on the IB principle and IRM framework for domain-shift generalization, which aims to find the causal relationship between the input data and task result by minimizing the complexity and domain dependence of the encoded feature. Furthermore, we enhance the task-oriented communication with the label-dependent feature encoding approach for semantic-shift detection which achieves joint gains in IB optimization and detection performance. To avoid the intractable computation of the IB-based objective, we leverage variational approximation to derive a tractable upper bound for optimization. Extensive simulation results on image classification tasks demonstrate that the proposed scheme outperforms state-of-the-art approaches and achieves a better rate-distortion tradeoff.

Submitted 15 May, 2024; originally announced May 2024.

Comments: 13 pages, 8 figures, submitted to IEEE for potential publication
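
For orientation, the information bottleneck principle referred to in this abstract is usually written as the following trade-off, with X the input, Y the task target, and Z the encoded feature. This is the standard textbook form; the paper's own objective, which also involves IRM and a variational bound, may differ in its terms and weighting.

```latex
\min_{p(z \mid x)} \; I(X;Z) \;-\; \beta \, I(Z;Y), \qquad \beta > 0
```

The first term penalizes how much of the input the feature retains (compactness for transmission), the second rewards how much it says about the task, and the multiplier \beta sets the balance between the two.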

arXiv:2403.08505 [pdf, other]

CAMSIC: Content-aware Masked Image Modeling Transformer for Stereo Image Compression

Authors: Xinjie Zhang, Shenyuan Gao, Zhening Liu, Jiawei Shao, Xingtong Ge, Dailan He, Tongda Xu, Yan Wang, Jun Zhang

Abstract: Existing learning-based stereo image codecs adopt sophisticated transformation with simple entropy models derived from single image codecs to encode latent representations. However, those entropy models struggle to effectively capture the spatial-disparity characteristics inherent in stereo images, which leads to suboptimal rate-distortion results. In this paper, we propose a stereo image compression framework, named CAMSIC. CAMSIC independently transforms each image to latent representation and employs a powerful decoder-free Transformer entropy model to capture both spatial and disparity dependencies, by introducing a novel content-aware masked image modeling (MIM) technique. Our content-aware MIM facilitates efficient bidirectional interaction between prior information and estimated tokens, which naturally obviates the need for an extra Transformer decoder. Experiments show that our stereo image codec achieves state-of-the-art rate-distortion performance on two stereo image datasets Cityscapes and InStereo2K with fast encoding and decoding speed. Code is available at https://github.com/Xinjie-Q/CAMSIC.

Submitted 8 February, 2025; v1 submitted 13 March, 2024; originally announced March 2024.

Comments: Accepted by AAAI 2025

arXiv:2307.02779 [pdf, other]

Large Language Models Empowered Autonomous Edge AI for Connected Intelligence

Authors: Yifei Shen, Jiawei Shao, Xinjie Zhang, Zehong Lin, Hao Pan, Dongsheng Li, Jun Zhang, Khaled B. Letaief

Abstract: The evolution of wireless networks gravitates towards connected intelligence, a concept that envisions seamless interconnectivity among humans, objects, and intelligence in a hyper-connected cyber-physical world. Edge artificial intelligence (Edge AI) is a promising solution to achieve connected intelligence by delivering high-quality, low-latency, and privacy-preserving AI services at the network edge. This article presents a vision of autonomous edge AI systems that automatically organize, adapt, and optimize themselves to meet users' diverse requirements, leveraging the power of large language models (LLMs), i.e., Generative Pretrained Transformer (GPT). By exploiting the powerful abilities of GPT in language understanding, planning, and code generation, as well as incorporating classic wisdom such as task-oriented communication and edge federated learning, we present a versatile framework that efficiently coordinates edge AI models to cater to users' personal demands while automatically generating code to train new models in a privacy-preserving manner. Experimental results demonstrate the system's remarkable ability to accurately comprehend user demands, efficiently execute AI models with minimal cost, and effectively create high-performance AI models at edge servers.

Submitted 25 December, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

Comments: IEEE Communications Magazine

arXiv:2305.17630 [pdf, ps, other]

doi 10.1103/PhysRevA.109.012609

Solving quantum optimal control problems using projection-operator-based Newton steps

Authors: Jieqiu Shao, Mantas Naris, John Hauser, Marco M. Nicotra

Abstract: The Quantum Projection Operator-Based Newton Method for Trajectory Optimization (Q-PRONTO) is a numerical method for solving quantum optimal control problems. This paper significantly improves prior versions of the quantum projection operator by introducing a regulator that stabilizes the solution estimate at every iteration. This modification is shown to not only improve the convergence rate of the algorithm, but also steer the solver towards better local minima compared to the unregulated case. Numerical examples showcase how Q-PRONTO can be used to solve multi-input quantum optimal control problems featuring time-varying costs and undesirable populations that ought to be avoided during the transient.

Submitted 8 January, 2024; v1 submitted 28 May, 2023; originally announced May 2023.

Comments: 11 pages, 9 figures

Journal ref: Phys. Rev. A 109 (2024) 012609

arXiv:2305.12423 [pdf, other]

Task-Oriented Communication with Out-of-Distribution Detection: An Information Bottleneck Framework

Authors: Hongru Li, Wentao Yu, Hengtao He, Jiawei Shao, Shenghui Song, Jun Zhang, Khaled B. Letaief

Abstract: Task-oriented communication is an emerging paradigm for next-generation communication networks, which extracts and transmits task-relevant information, instead of raw data, for downstream applications. Most existing deep learning (DL)-based task-oriented communication systems adopt a closed-world scenario, assuming either the same data distribution for training and testing, or that the system has access to a large out-of-distribution (OoD) dataset for retraining. However, in practical open-world scenarios, task-oriented communication systems need to handle unknown OoD data. Under such circumstances, the powerful approximation ability of learning methods may force the task-oriented communication systems to overfit the training data (i.e., in-distribution data) and provide overconfident judgments when encountering OoD data. Based on the information bottleneck (IB) framework, we propose a class conditional IB (CCIB) approach to address this problem in this paper, supported by information-theoretical insights. The idea is to extract distinguishable features from in-distribution data while keeping their compactness and informativeness. This is achieved by imposing the class conditional latent prior distribution and enforcing the latent of different classes to be far away from each other. Simulation results demonstrate that the proposed approach detects OoD data more efficiently than the baselines and state-of-the-art approaches, without compromising the rate-distortion tradeoff.

Submitted 27 January, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

Comments: Code available on GitHub; accepted by IEEE GLOBECOM 2023

arXiv:2303.11599 [pdf, other]

Low-complexity Deep Video Compression with A Distributed Coding Architecture

Authors: Xinjie Zhang, Jiawei Shao, Jun Zhang

Abstract: Prevalent predictive coding-based video compression methods rely on a heavy encoder to reduce temporal redundancy, which makes it challenging to deploy them on resource-constrained devices. Since the 1970s, distributed source coding theory has indicated that independent encoding and joint decoding with side information (SI) can achieve high-efficient compression of correlated sources. This has inspired a distributed coding architecture aiming at reducing the encoding complexity. However, traditional distributed coding methods suffer from a substantial performance gap to predictive coding ones. Inspired by the great success of learning-based compression, we propose the first end-to-end distributed deep video compression framework to improve the rate-distortion performance. A key ingredient is an effective SI generation module at the decoder, which helps to effectively exploit inter-frame correlations without computation-intensive encoder-side motion estimation and compensation. Experiments show that our method significantly outperforms conventional distributed video coding and H.264. Meanwhile, it enjoys 6-7x encoding speedup against DVC [1] with comparable compression performance. Code is released at https://github.com/Xinjie-Q/Distributed-DVC.

Submitted 2 April, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

Comments: Accepted by ICME 2023

arXiv:2301.09799 [pdf, other]

LDMIC: Learning-based Distributed Multi-view Image Coding

Authors: Xinjie Zhang, Jiawei Shao, Jun Zhang

Abstract: Multi-view image compression plays a critical role in 3D-related applications. Existing methods adopt a predictive coding architecture, which requires joint encoding to compress the corresponding disparity as well as residual information. This demands collaboration among cameras and enforces the epipolar geometric constraint between different views, which makes it challenging to deploy these methods in distributed camera systems with randomly overlapping fields of view. Meanwhile, distributed source coding theory indicates that efficient data compression of correlated sources can be achieved by independent encoding and joint decoding, which motivates us to design a learning-based distributed multi-view image coding (LDMIC) framework. With independent encoders, LDMIC introduces a simple yet effective joint context transfer module based on the cross-attention mechanism at the decoder to effectively capture the global inter-view correlations, which is insensitive to the geometric relationships between images. Experimental results show that LDMIC significantly outperforms both traditional and learning-based MIC methods while enjoying fast encoding speed. Code will be released at https://github.com/Xinjie-Q/LDMIC.

Submitted 12 April, 2023; v1 submitted 23 January, 2023; originally announced January 2023.

Comments: Accepted by ICLR 2023

arXiv:2211.14049 [pdf, other]

Task-Oriented Communication for Edge Video Analytics

Authors: Jiawei Shao, Xinjie Zhang, Jun Zhang

Abstract: With the development of artificial intelligence (AI) techniques and the increasing popularity of camera-equipped devices, many edge video analytics applications are emerging, calling for the deployment of computation-intensive AI models at the network edge. Edge inference is a promising solution to move the computation-intensive workloads from low-end devices to a powerful edge server for video analytics, but the device-server communications will remain a bottleneck due to the limited bandwidth. This paper proposes a task-oriented communication framework for edge video analytics, where multiple devices collect the visual sensory data and transmit the informative features to an edge server for processing. To enable low-latency inference, this framework removes video redundancy in spatial and temporal domains and transmits minimal information that is essential for the downstream task, rather than reconstructing the videos at the edge server. Specifically, it extracts compact task-relevant features based on the deterministic information bottleneck (IB) principle, which characterizes a tradeoff between the informativeness of the features and the communication cost. As the features of consecutive frames are temporally correlated, we propose a temporal entropy model (TEM) to reduce the bitrate by taking the previous features as side information in feature encoding. To further improve the inference performance, we build a spatial-temporal fusion module at the server to integrate features of the current and previous frames for joint inference. Extensive experiments on video analytics tasks evidence that the proposed framework effectively encodes task-relevant information of video data and achieves a better rate-performance tradeoff than existing methods.

Submitted 1 April, 2024; v1 submitted 25 November, 2022; originally announced November 2022.

Comments: This paper was accepted to IEEE Transactions on Wireless Communications (TWC)

arXiv:2205.10619 [pdf, other]

A Pilot Study of Relating MYCN-Gene Amplification with Neuroblastoma-Patient CT Scans

Authors: Zihan Zhang, Xiang Xiang, Xuehua Peng, Jianbo Shao

Abstract: Neuroblastoma is one of the most common cancers in infants, and the initial diagnosis of this disease is difficult. At present, the MYCN gene amplification (MNA) status is detected by invasive pathological examination of tumor samples. This is time-consuming and may have a hidden impact on children. To handle this problem, we adopt multiple machine learning (ML) algorithms to predict the presence or absence of MYCN gene amplification. The dataset is composed of retrospective CT images of 23 neuroblastoma patients. Different from previous work, we develop the algorithm without manually segmented primary tumors, which are time-consuming to produce and not practical. Instead, we only need the coordinate of the center point and the number of tumor slices given by a subspecialty-trained pediatric radiologist. Specifically, the CNN-based method uses a pre-trained convolutional neural network, and the radiomics-based method extracts radiomics features. Our results show that the CNN-based method outperforms the radiomics-based method.

Submitted 21 May, 2022; originally announced May 2022.

arXiv:2205.05144 [pdf, other]

doi 10.1364/DH.2022.W3A.3

Limited-memory BFGS Optimisation of Phase-Only Computer-Generated Hologram for Fraunhofer Diffraction

Authors: Jinze Sha, Andrew Kadis, Fan Yang, Timothy D. Wilkinson

Abstract: We implement a novel limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimisation algorithm with cross entropy (CE) loss function, to produce phase-only computer-generated hologram (CGH) for holographic displays, with validation on a binary-phase modulation holographic projector.

Submitted 10 May, 2022; originally announced May 2022.

Journal ref: Digital Holography and 3-D Imaging 2022 Technical Digest Series (Optica Publishing Group, 2022), paper W3A.3

arXiv:2204.06913 [pdf, other]

doi 10.2352/J.ImagingSci.Technol.2023.67.3.030405

Digital Pre-Distorted One-Step Phase Retrieval Algorithm for Real-Time Hologram Generation for Holographic Displays

Authors: Jinze Sha, Adam Goldney, Andrew Kadis, Jana Skirnewskaja, Timothy D. Wilkinson

Abstract: In a computer-generated holographic projection system, the image is reconstructed via the diffraction of light from a spatial light modulator. In this process, several factors could contribute to non-linearities between the reconstruction and the target image. This paper evaluates the non-linearity of the overall holographic projection system experimentally, using binary phase holograms computed using the one-step phase retrieval (OSPR) algorithm, and then applies a digital pre-distortion (DPD) method to correct for the non-linearity. Both a notable increase in reconstruction quality and a significant reduction in mean squared error were observed, proving the effectiveness of the proposed DPD-OSPR algorithm.

Submitted 12 January, 2023; v1 submitted 14 April, 2022; originally announced April 2022.

Journal ref: Journal of Imaging Science and Technology, 2023, pp 030405-1 - 030405-7

arXiv:2203.09487 [pdf, other]

Defending Against Adversarial Attack in ECG Classification with Adversarial Distillation Training

Authors: Jiahao Shao, Shijia Geng, Zhaoji Fu, Weilun Xu, Tong Liu, Shenda Hong

Abstract: In clinics, doctors rely on electrocardiograms (ECGs) to assess severe cardiac disorders. Owing to the development of technology and the increase in health awareness, ECG signals are currently obtained by using medical and commercial devices. Deep neural networks (DNNs) can be used to analyze these signals because of their high accuracy rate. However, researchers have found that adversarial attacks can significantly reduce the accuracy of DNNs. Studies have been conducted to defend ECG-based DNNs against traditional adversarial attacks, such as projected gradient descent (PGD), and smooth adversarial perturbation (SAP) which targets ECG classification; however, to the best of our knowledge, no study has completely explored the defense against adversarial attacks targeting ECG classification. Thus, we conducted different experiments to explore the effects of defense methods against white-box and black-box adversarial attacks targeting ECG classification, and we found that some common defense methods performed well against these attacks. Besides, we proposed a new defense method called Adversarial Distillation Training (ADT) which comes from defensive distillation and can effectively improve the generalization performance of DNNs. The results show that our method performed more effectively against adversarial attacks targeting ECG classification than the other baseline methods, namely, adversarial training, defensive distillation, Jacob regularization, and noise-to-signal ratio regularization. Furthermore, we found that our method performed better against PGD attacks with low noise levels, which means that our method has stronger robustness.

Submitted 14 March, 2022; originally announced March 2022.

arXiv:2111.08795 [pdf, other]

doi 10.1103/PhysRevA.105.032605

A Projection Operator-based Newton Method for the Trajectory Optimization of Closed Quantum Systems

Authors: Jieqiu Shao, Joshua Combes, John Hauser, Marco M. Nicotra

Abstract: Quantum optimal control is an important technology that enables fast state preparation and gate design. In the absence of an analytic solution, most quantum optimal control methods rely on an iterative scheme to update the solution estimate. At present, the convergence rate of existing solvers is at most superlinear. This paper develops a new general purpose solver for quantum optimal control based on the PRojection Operator Newton method for Trajectory Optimization, or PRONTO. Specifically, the proposed approach uses a projection operator to incorporate the Schrödinger equation directly into the cost function, which is then minimized using a quasi-Newton method. At each iteration, the descent direction is obtained by computing the analytic solution to a Linear-Quadratic trajectory optimization problem. The resulting method guarantees monotonic convergence at every iteration and quadratic convergence in proximity of the solution. To highlight the potential of PRONTO, we present a numerical example that employs it to solve the optimal state-to-state mapping problem for a qubit and compares its performance to a state-of-the-art quadratic optimal control method.

Submitted 25 October, 2022; v1 submitted 16 November, 2021; originally announced November 2021.

Comments: 10 pages

Journal ref: Phys. Rev. A 105, 032605 (2022)

arXiv:2109.00172 [pdf, other]

Task-Oriented Communication for Multi-Device Cooperative Edge Inference

Authors: Jiawei Shao, Yuyi Mao, Jun Zhang

Abstract: This paper investigates task-oriented communication for multi-device cooperative edge inference, where a group of distributed low-end edge devices transmit the extracted features of local samples to a powerful edge server for inference. While cooperative edge inference can overcome the limited sensing capability of a single device, it substantially increases the communication overhead and may incur excessive latency. To enable low-latency cooperative inference, we propose a learning-based communication scheme that optimizes local feature extraction and distributed feature encoding in a task-oriented manner, i.e., to remove data redundancy and transmit information that is essential for the downstream inference task rather than reconstructing the data samples at the edge server. Specifically, we leverage an information bottleneck (IB) principle to extract the task-relevant feature at each edge device and adopt a distributed information bottleneck (DIB) framework to formalize a single-letter characterization of the optimal rate-relevance tradeoff for distributed feature encoding. To admit flexible control of the communication overhead, we extend the DIB framework to a distributed deterministic information bottleneck (DDIB) objective that explicitly incorporates the representational costs of the encoded features. As the IB-based objectives are computationally prohibitive for high-dimensional data, we adopt variational approximations to make the optimization problems tractable. To compensate for the potential performance loss due to the variational approximations, we also develop a selective retransmission (SR) mechanism to identify the redundancy in the encoded features of multiple edge devices to attain additional communication overhead reduction. Extensive experiments evidence that the proposed task-oriented communication scheme achieves a better rate-relevance tradeoff than baseline methods.

Submitted 12 September, 2023; v1 submitted 31 August, 2021; originally announced September 2021.

Comments: This paper was accepted to IEEE Transactions on Wireless Communications

arXiv:2108.13009 [pdf, ps, other]

Communication-Computation Efficient Device-Edge Co-Inference via AutoML

Authors: Xinjie Zhang, Jiawei Shao, Yuyi Mao, Jun Zhang

Abstract: Device-edge co-inference, which partitions a deep neural network between a resource-constrained mobile device and an edge server, has recently emerged as a promising paradigm to support intelligent mobile applications. To accelerate the inference process, on-device model sparsification and intermediate feature compression are regarded as two prominent techniques. However, as the on-device model sparsity level and intermediate feature compression ratio have direct impacts on computation workload and communication overhead respectively, and both of them affect the inference accuracy, finding the optimal values of these hyper-parameters brings a major challenge due to the large search space. In this paper, we endeavor to develop an efficient algorithm to determine these hyper-parameters. By selecting a suitable model split point and a pair of encoder/decoder for the intermediate feature vector, this problem is cast as a sequential decision problem, for which a novel automated machine learning (AutoML) framework is proposed based on deep reinforcement learning (DRL). Experiment results on an image classification task demonstrate the effectiveness of the proposed framework in achieving a better communication-computation trade-off and significant inference speedup against various baseline schemes.

Submitted 31 August, 2021; v1 submitted 30 August, 2021; originally announced August 2021.

arXiv:2103.09565 [pdf, other]

Color image segmentation based on a convex K-means approach

Authors: Tingting Wu, Xiaoyu Gu, Jinbo Shao, Ruoxuan Zhou, Zhi Li

Abstract: Image segmentation is a fundamental and challenging task in image processing and computer vision. Color image segmentation is attracting more attention because a color image provides more information than a gray image. In this paper, we propose a variational model based on a convex K-means approach to segment color images. The proposed variational method uses a combination of $l_1$ and $l_2$ regularizers to maintain edge information of objects in images while overcoming the staircase effect. Meanwhile, our one-stage strategy is an improved version based on the smoothing and thresholding strategy, which contributes to improving the accuracy of segmentation. The proposed method performs the following steps. First, we specify the color set, which can be determined by a human or by the K-means method. Second, we use a variational model to obtain the most appropriate color for each pixel from the color set via convex relaxation and lifting. The Chambolle-Pock algorithm and simplex projection are applied to solve the variational model effectively. Experimental results and comparison analysis demonstrate the effectiveness and robustness of our method.

Submitted 17 March, 2021; originally announced March 2021.

arXiv:2102.04170 [pdf, other]

Learning Task-Oriented Communication for Edge Inference: An Information Bottleneck Approach

Authors: Jiawei Shao, Yuyi Mao, Jun Zhang

Abstract: This paper investigates task-oriented communication for edge inference, where a low-end edge device transmits the extracted feature vector of a local data sample to a powerful edge server for processing. It is critical to encode the data into an informative and compact representation for low-latency inference given the limited bandwidth. We propose a learning-based communication scheme that jointly optimizes feature extraction, source coding, and channel coding in a task-oriented manner, i.e., targeting the downstream inference task rather than data reconstruction. Specifically, we leverage an information bottleneck (IB) framework to formalize a rate-distortion tradeoff between the informativeness of the encoded feature and the inference performance. As the IB optimization is computationally prohibitive for the high-dimensional data, we adopt a variational approximation, namely the variational information bottleneck (VIB), to build a tractable upper bound. To reduce the communication overhead, we leverage a sparsity-inducing distribution as the variational prior for the VIB framework to sparsify the encoded feature vector. Furthermore, considering dynamic channel conditions in practical communication systems, we propose a variable-length feature encoding scheme based on dynamic neural networks to adaptively adjust the activated dimensions of the encoded feature to different channel conditions. Extensive experiments evidence that the proposed task-oriented communication system achieves a better rate-distortion tradeoff than baseline methods and significantly reduces the feature transmission latency in dynamic channel conditions.

Submitted 18 January, 2023; v1 submitted 8 February, 2021; originally announced February 2021.

Comments: This paper was accepted to the IEEE JSAC Series on Machine Learning for Communications and Networks and will be published in Jan 2022

arXiv:2007.09561 [pdf, ps, other]

Leader-Driven Opinion Dynamics in Signed Social Networks With Asynchronous Trust/Distrust Level Evolution

Authors: Lei Shi, Yuhua Cheng, Jinliang Shao, Xiaofan Wang, Hanmin Sheng

Abstract: Trust and distrust are common in the opinion interactions among agents in social networks, and they are described by the edges with positive and negative weights in the signed digraph, respectively. It has been shown in social psychology that although the opinions of most agents (followers) tend to prevail, sometimes one agent (leader) with a firm stand and strong influence can impact or even overthrow the preferences of followers. This paper aims to analyze how the leader influences the formation of followers' opinions in signed social networks. In addition, this paper considers an asynchronous evolution mechanism of trust/distrust level based on opinion difference, in which the trust/distrust level between neighboring agents is portrayed as a nonlinear weight function of their opinion difference, and each agent interacts with the neighbors to update the trust/distrust level and opinion at the times determined by its own will. Based on the related properties of sub-stochastic and super-stochastic matrices, the inequality conditions about positive and negative weights to achieve opinion consensus and polarization are established. Some numerical simulations based on two well-known networks called the "12 Angry Men" network and the Karate Club network are provided to verify the correctness of the theoretical results.

Submitted 3 May, 2021; v1 submitted 18 July, 2020; originally announced July 2020.

Comments: 12 pages, 15 figures

arXiv:2006.14836 [pdf, ps, other]

doi 10.1109/LCSYS.2020.3003789

Distributed Localization in Wireless Sensor Networks Under Denial-of-Service Attacks

Authors: Lei Shi, Qingchen Liu, Jinliang Shao, Yuhua Cheng

Abstract: In this paper, we study the problem of localizing the sensors' positions in presence of denial-of-service (DoS) attacks. We consider a general attack model, in which the attacker action is only constrained through the frequency and duration of DoS attacks. We propose a distributed iterative localization algorithm with an abandonment strategy based on the barycentric coordinate of a sensor with respect to its neighbors, which is computed through relative distance measurements. In particular, if a sensor's communication links for receiving its neighbors' information lose packets due to DoS attacks, then the sensor abandons the location estimation. When the attacker launches DoS attacks, the AS-DILOC algorithm is proved theoretically to be able to accurately locate the sensors regardless of the attack strategy at each time. The effectiveness of the proposed algorithm is demonstrated through simulation examples.

Submitted 26 June, 2020; originally announced June 2020.

arXiv:2006.07976 [pdf, other]

Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization

Authors: Junting Pan, Siyu Chen, Mike Zheng Shou, Yu Liu, Jing Shao, Hongsheng Li

Abstract: Localizing persons and recognizing their actions from videos is a challenging task towards high-level video understanding. Recent advances have been achieved by modeling direct pairwise relations between entities. In this paper, we take one step further: we not only model direct relations between pairs but also take into account indirect higher-order relations established upon multiple elements. We propose to explicitly model the Actor-Context-Actor Relation, which is the relation between two actors based on their interactions with the context. To this end, we design an Actor-Context-Actor Relation Network (ACAR-Net) which builds upon a novel High-order Relation Reasoning Operator and an Actor-Context Feature Bank to enable indirect relation reasoning for spatio-temporal action localization. Experiments on AVA and UCF101-24 datasets show the advantages of modeling actor-context-actor relations, and visualization of attention maps further verifies that our model is capable of finding relevant higher-order relations to support action detection. Notably, our method ranks first in the AVA-Kinetics action localization task of ActivityNet Challenge 2020, outperforming other entries by a significant margin (+6.71 mAP). Training code and models will be available at https://github.com/Siyu-C/ACAR-Net.

Submitted 20 April, 2021; v1 submitted 14 June, 2020; originally announced June 2020.

Comments: Accepted in CVPR 2021

arXiv:2006.02166 [pdf, ps, other]

Communication-Computation Trade-Off in Resource-Constrained Edge Inference

Authors: Jiawei Shao, Jun Zhang

Abstract: The recent breakthrough in artificial intelligence (AI), especially deep neural networks (DNNs), has affected every branch of science and technology. Particularly, edge AI has been envisioned as a major application scenario to provide DNN-based services at edge devices. This article presents effective methods for edge inference at resource-constrained devices. It focuses on device-edge co-inference, assisted by an edge computing server, and investigates a critical trade-off between the computation cost of the on-device model and the communication cost of forwarding the intermediate feature to the edge server. A three-step framework is proposed for the effective inference: (1) model split point selection to determine the on-device model, (2) communication-aware model compression to reduce the on-device computation and the resulting communication overhead simultaneously, and (3) task-oriented encoding of the intermediate feature to further reduce the communication overhead. Experiments demonstrate that our proposed framework achieves a better trade-off and significantly lower inference latency than baseline methods.

Submitted 14 October, 2020; v1 submitted 3 June, 2020; originally announced June 2020.

Comments: This work has been submitted to the IEEE for possible publication

arXiv:2004.01867 [pdf, ps, other]

Sub/super-stochastic matrix with applications to bipartite tracking control over signed networks

Authors: Lei Shi, Wei Xing Zheng, Jinliang Shao, Yuhua Cheng

Abstract: In this contribution, the properties of sub-stochastic and super-stochastic matrices are applied to analyze bipartite tracking problems of multi-agent systems (MASs) over signed networks, in which edges with positive and negative weights describe cooperation and competition among the agents, respectively. For completeness, the content is divided into two parts. In the first part, we examine the dynamics of bipartite tracking for first-order MASs, second-order MASs, and general linear MASs in the presence of asynchronous interactions. Asynchronous interactions mean that each agent interacts with its neighbors only at the instants when it wants to update its state, rather than being compelled to stay consistent with other agents. In the second part, we investigate bipartite tracking problems in different practical scenarios, such as time delays, switching topologies, random networks, lossy links, matrix disturbance, external noise disturbance, and a leader with unmeasurable velocity and acceleration. The bipartite tracking problems of MASs under these scenario settings can be equivalently converted into product convergence problems of infinite sub-stochastic matrices (ISubSM) or infinite super-stochastic matrices (ISupSM). Using nonnegative matrix theory together with some key results on the compositions of directed edge sets, we establish systematic algebraic-graphical methods for dealing with the product convergence of ISubSM and ISupSM. Finally, the efficiency of the proposed methods is verified by computer simulations.
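As a rough illustration of what "product convergence" of sub-stochastic matrices means, the sketch below multiplies a repeating sequence of small matrices and tracks the largest row sum of the running product. It assumes the usual definition of a sub-stochastic matrix (nonnegative entries, every row sum at most 1, at least one strictly smaller). The matrices `A` and `B` and the helper names are hypothetical and chosen so that convergence to zero is easy to see; this is not the algebraic-graphical method of the paper, which covers far more general cases.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two compatible matrices."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for row in a]


def max_row_sum(m: Matrix) -> float:
    """Infinity norm of a nonnegative matrix (its largest row sum)."""
    return max(sum(row) for row in m)


def is_sub_stochastic(m: Matrix, tol: float = 1e-12) -> bool:
    """Nonnegative entries, all row sums <= 1, at least one row sum < 1."""
    nonneg = all(x >= 0.0 for row in m for x in row)
    sums = [sum(row) for row in m]
    return nonneg and all(s <= 1.0 + tol for s in sums) and any(s < 1.0 - tol for s in sums)


def product_norm(factors: List[Matrix], rounds: int) -> float:
    """Left-multiply the factors cyclically, as in x(k+1) = A(k) x(k),
    and return the largest row sum of the resulting product."""
    n = len(factors[0])
    p = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(rounds):
        for f in factors:
            p = matmul(f, p)
    return max_row_sum(p)


# Hypothetical toy matrices: A has row sums 0.9 and 0.9, B has 0.9 and 1.0.
A = [[0.5, 0.4], [0.3, 0.6]]
B = [[0.2, 0.7], [0.5, 0.5]]

if __name__ == "__main__":
    for rounds in (1, 10, 50):
        print(rounds, product_norm([A, B], rounds))
```

Because every row sum of `A` is 0.9, each pass through the pair shrinks the largest row sum by at least that factor, so the product tends to the zero matrix. The harder cases treated in the paper are those where such a uniform contraction is not available at every step.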

Submitted 16 December, 2020; v1 submitted 4 April, 2020; originally announced April 2020.

arXiv:2003.13067 [pdf, other]

doi 10.1016/j.trb.2021.06.009

Optimizing Coordinated Vehicle Platooning: An Analytical Approach Based on Stochastic Dynamic Programming

Authors: Xi Xiong, Junyi Sha, Li Jin

Abstract: Platooning connected and autonomous vehicles (CAVs) can improve traffic and fuel efficiency. However, scalable platooning operations require junction-level coordination, which has not been well studied. In this paper, we study the coordination of vehicle platooning at highway junctions. We consider a setting where CAVs randomly arrive at a highway junction according to a general renewal process. When a CAV approaches the junction, a system operator determines whether the CAV will merge into the platoon ahead according to the positions and speeds of the CAV and the platoon. We formulate a Markov decision process to minimize the discounted cumulative travel cost, i.e., fuel consumption plus travel delay, over an infinite time horizon. We show that the optimal policy is threshold-based: the CAV will merge with the platoon if and only if the difference between the CAV's and the platoon's predicted times of arrival at the junction is less than a constant threshold. We also propose two ready-to-implement algorithms to derive the optimal policy. A comparison with the classical value iteration algorithm indicates that our approach, which explicitly incorporates the characteristics of the optimal policy, is significantly more efficient computationally. Importantly, we show that the optimal policy under Poisson arrivals can be obtained by solving a system of integral equations. We also validate our results in simulation with Real-time Strategy (RTS) using real traffic data. The simulation results indicate that the proposed method yields better performance than the conventional method.
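The threshold structure of the policy stated in the abstract can be sketched in a few lines. The function and parameter names below are hypothetical, the threshold value is a placeholder rather than one derived from the paper's algorithms, and taking the absolute value of the arrival-time gap is an assumption made here for symmetry.

```python
def should_merge(cav_eta: float, platoon_eta: float, threshold: float) -> bool:
    """Return True when the CAV should merge into the platoon ahead.

    cav_eta and platoon_eta are predicted times of arrival at the junction
    (same time unit); threshold is the constant from the threshold policy.
    Using abs() on the gap is an assumption of this sketch.
    """
    if threshold < 0:
        raise ValueError("threshold must be nonnegative")
    return abs(cav_eta - platoon_eta) < threshold


if __name__ == "__main__":
    # Placeholder numbers, not taken from the paper.
    print(should_merge(cav_eta=12.0, platoon_eta=10.0, threshold=3.0))
    print(should_merge(cav_eta=20.0, platoon_eta=10.0, threshold=3.0))
```

The substantive work in the paper lies in showing that such a rule is optimal and in computing the threshold; the sketch only shows how the resulting rule would be applied once a threshold is known.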

Submitted 8 May, 2020; v1 submitted 29 March, 2020; originally announced March 2020.

Report number: Volume 150, August 2021, Pages 482-502

Journal ref: 2021

arXiv:1910.14315 [pdf, ps, other]

BottleNet++: An End-to-End Approach for Feature Compression in Device-Edge Co-Inference Systems

Authors: Jiawei Shao, Jun Zhang

Abstract: The emergence of various intelligent mobile applications demands the deployment of powerful deep learning models on resource-constrained mobile devices. The device-edge co-inference framework provides a promising solution by splitting a neural network between a mobile device and an edge computing server. To balance the on-device computation and the communication overhead, the splitting point needs to be carefully picked, and the intermediate feature needs to be compressed before transmission. Existing studies decoupled the design of model splitting, feature compression, and communication, which may lead to excessive resource consumption on the mobile device. In this paper, we introduce an end-to-end architecture, named BottleNet++, that consists of an encoder, a non-trainable channel layer, and a decoder for more efficient feature compression and transmission. The encoder and decoder essentially implement joint source-channel coding via convolutional neural networks (CNNs), while explicitly considering the effect of channel noise. By exploiting the strong sparsity and the fault-tolerant property of the intermediate feature in a deep neural network (DNN), BottleNet++ achieves a much higher compression ratio than existing methods. Furthermore, by providing the channel condition to the encoder as an input, our method generalizes well across different channel conditions. Compared with merely transmitting intermediate data without feature compression, BottleNet++ achieves up to 64x bandwidth reduction over the additive white Gaussian noise channel and up to 256x bit compression ratio in the binary erasure channel, with less than 2% reduction in accuracy. With a higher compression ratio, BottleNet++ enables splitting a DNN at earlier layers, which leads to up to 3x reduction in on-device computation compared with other compression methods.

Submitted 5 June, 2020; v1 submitted 31 October, 2019; originally announced October 2019.

arXiv:1910.06809 [pdf, other]

Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis

Authors: Xihui Liu, Guojun Yin, Jing Shao, Xiaogang Wang, Hongsheng Li

Abstract: Semantic image synthesis aims at generating photorealistic images from semantic layouts. Previous approaches based on conditional generative adversarial networks (GANs) show state-of-the-art performance on this task; they either feed the semantic label maps as inputs to the generator or use them to modulate the activations in normalization layers via affine transformations. We argue that convolutional kernels in the generator should be aware of the distinct semantic labels at different locations when generating images. To better exploit the semantic layout in the image generator, we propose to predict convolutional kernels conditioned on the semantic label map to generate the intermediate feature maps from the noise maps and eventually generate the images. Moreover, we propose a feature pyramid semantics-embedding discriminator, which is more effective than previous multi-scale discriminators in enhancing fine details and semantic alignment between the generated images and the input semantic layouts. We achieve state-of-the-art results on both quantitative metrics and subjective evaluation on various semantic segmentation datasets, demonstrating the effectiveness of our approach.

Submitted 10 January, 2020; v1 submitted 15 October, 2019; originally announced October 2019.

Comments: Accepted by NeurIPS 2019. Code is available soon at https://github.com/xh-liu/CC-FPSE

arXiv:1804.06745 [pdf, other]

Efficient Channel Estimator with Angle-Division Multiple Access

Authors: Xiaozhen Liu, Jin Sha, Hongxiang Xie, Feifei Gao, Shi Jin, Zaichen Zhang, Xiaohu You, Chuan Zhang

Abstract: Massive multiple-input multiple-output (M-MIMO) is an enabling technology of 5G wireless communication. The performance of an M-MIMO system is highly dependent on the speed and accuracy of obtaining the channel state information (CSI). The computational complexity of channel estimation for an M-MIMO system can be reduced by making use of the sparsity of the M-MIMO channel. In this paper, we propose, for the first time, a hardware-efficient channel estimator based on angle-division multiple access (ADMA). Preamble, uplink (UL), and downlink (DL) training are also implemented. For further hardware efficiency, optimizations of the quantization and approximation strategies are discussed. Implementation techniques such as pipelining and systolic processing are also employed for hardware regularity. Numerical results and an FPGA implementation demonstrate the advantages of the proposed channel estimator.

Submitted 17 April, 2018; originally announced April 2018.

Showing 1–36 of 36 results for author: Shao, J