-
Baichuan-Omni-1.5 Technical Report
Authors:
Yadong Li,
Jun Liu,
Tao Zhang,
Tao Zhang,
Song Chen,
Tianpeng Li,
Zehuan Li,
Lijun Liu,
Lingfeng Ming,
Guosheng Dong,
Da Pan,
Chong Li,
Yuanbo Fang,
Dongdong Kuang,
Mingrui Wang,
Chenglin Zhu,
Youwei Zhang,
Hongyu Guo,
Fengyu Zhang,
Yuran Wang,
Bowen Ding,
Wei Song,
Xu Li,
Yuqi Huo,
Zheng Liang
, et al. (68 additional authors not shown)
Abstract:
We introduce Baichuan-Omni-1.5, an omni-modal model that not only has omni-modal understanding capabilities but also provides end-to-end audio generation capabilities. To achieve fluent and high-quality interaction across modalities without compromising the capabilities of any modality, we prioritized optimizing three key aspects. First, we establish a comprehensive data cleaning and synthesis pipeline for multimodal data, obtaining about 500B high-quality data (text, audio, and vision). Second, an audio-tokenizer (Baichuan-Audio-Tokenizer) has been designed to capture both semantic and acoustic information from audio, enabling seamless integration and enhanced compatibility with MLLM. Lastly, we designed a multi-stage training strategy that progressively integrates multimodal alignment and multitask fine-tuning, ensuring effective synergy across all modalities. Baichuan-Omni-1.5 leads contemporary models (including GPT4o-mini and MiniCPM-o 2.6) in terms of comprehensive omni-modal capabilities. Notably, it achieves results comparable to leading models such as Qwen2-VL-72B across various multimodal medical benchmarks.
Submitted 25 January, 2025;
originally announced January 2025.
-
Gradient-Free Adversarial Purification with Diffusion Models
Authors:
Xuelong Dai,
Dong Wang,
Duan Mingxing,
Bin Xiao
Abstract:
Adversarial training and adversarial purification are two effective and practical defense methods to enhance a model's robustness against adversarial attacks. However, adversarial training necessitates additional training, while adversarial purification suffers from low time efficiency. More critically, current defenses are designed under the perturbation-based adversarial threat model, which is ineffective against the recently proposed unrestricted adversarial attacks. In this paper, we propose an effective and efficient adversarial defense method that counters both perturbation-based and unrestricted adversarial attacks. Our defense is inspired by the observation that adversarial attacks are typically located near the decision boundary and are sensitive to pixel changes. To address this, we introduce adversarial anti-aliasing to mitigate adversarial modifications. Additionally, we propose adversarial super-resolution, which leverages prior knowledge from clean datasets to benignly recover images. These approaches do not require additional training and are computationally efficient without calculating gradients. Extensive experiments against both perturbation-based and unrestricted adversarial attacks demonstrate that our defense method outperforms state-of-the-art adversarial purification methods.
Submitted 22 January, 2025;
originally announced January 2025.
-
Fluid-Antenna Enhanced ISAC: Joint Antenna Positioning and Dual-Functional Beamforming Design under Perfect and Imperfect CSI
Authors:
Tian Hao,
Changxin Shi,
Qingqing Wu,
Bin Xia,
Yinghong Guo,
Lianghui Ding,
Feng Yang
Abstract:
Integrated sensing and communication (ISAC) emerges as an essential technique for overcoming spectrum congestion. However, the performance of traditional ISAC systems with fixed-position antennas (FPAs) is limited due to insufficient exploration of the spatial degrees of freedom (DoF). Recently, the fluid antenna (FA), whose position is reconfigurable, has been developed to enhance sensing and communication performance by reshaping the channel. This paper investigates an FA-enhanced ISAC system where a base station is equipped with multiple FAs to communicate with multiple single-antenna users and with FPAs to sense a point target. We consider both perfect and imperfect channel state information (CSI) of the communication channel and sensing channel. In both cases, we focus on maximizing the sensing signal-to-noise ratio (SNR) by optimizing the positions of the FAs and the dual-functional beamforming under constraints on the FA moving region, the minimum FA distance, and the minimum signal-to-interference-plus-noise ratio (SINR) per user. Specifically, for the ideal case of perfect CSI, an iterative alternating optimization (AO) algorithm is proposed to tackle the formulated problem, where the dual-functional beamforming and the FA positions are obtained via semidefinite relaxation (SDR) and successive convex approximation (SCA) techniques. Then, for the imperfect CSI case, we propose an AO-based iterative algorithm where the $\mathcal{S}$-procedure and SCA are applied to obtain the dual-functional beamforming and the FA positions. Furthermore, we analytically and numerically prove the convergence of the proposed algorithms. Numerical results demonstrate the notable gains of the proposed algorithms in the respective cases.
Submitted 25 July, 2024;
originally announced July 2024.
-
Fluid-Antenna Enhanced Integrated Sensing and Communication: Joint Antenna Positioning and Beamforming Design
Authors:
Tian Hao,
Changxin Shi,
Yinghong Guo,
Bin Xia,
Feng Yang
Abstract:
This paper investigates a fluid antenna (FA) enhanced integrated sensing and communication (ISAC) system consisting of a base station (BS), multiple single-antenna communication users, and one point target, where the BS is equipped with FAs to enhance both the communication and sensing performance. First, we formulate a problem that maximizes the radar signal-to-noise ratio (SNR) by jointly optimizing the FAs' positions and transmit beamforming matrix. Then, to tackle this highly non-convex problem, we present efficient algorithms by using alternating optimization (AO), successive convex approximation (SCA), and semi-definite relaxation (SDR). Numerical results demonstrate the convergence behavior and effectiveness of the proposed algorithm.
Submitted 7 July, 2024;
originally announced July 2024.
-
Beware of Overestimated Decoding Performance Arising from Temporal Autocorrelations in Electroencephalogram Signals
Authors:
Xiran Xu,
Bo Wang,
Boda Xiao,
Yadong Niu,
Yiwen Wang,
Xihong Wu,
Jing Chen
Abstract:
Researchers have reported high decoding accuracy (>95%) using non-invasive electroencephalogram (EEG) signals for brain-computer interface (BCI) decoding tasks such as image decoding, emotion recognition, and auditory spatial attention detection. Since these EEG data were usually collected with well-designed paradigms in labs, the reliability and robustness of the corresponding decoding methods were doubted by some researchers, who argued that such decoding accuracy was overestimated due to the inherent temporal autocorrelation of EEG signals. However, the coupling between the stimulus-driven neural responses and the EEG temporal autocorrelations makes it difficult to confirm whether this overestimation truly exists. Furthermore, the underlying pitfalls behind overestimated decoding accuracy have not been fully explained due to a lack of appropriate formulation. In this work, we formulate the pitfall in various EEG decoding tasks in a unified framework. EEG data were recorded from watermelons to remove stimulus-driven neural responses. Labels were assigned to continuous EEG according to the experimental design for EEG recording of several typical datasets, and then the decoding methods were applied. The results showed that the labels could be successfully decoded as long as continuous EEG data with the same label were split into training and test sets. Further analysis indicated that high accuracy in various BCI decoding tasks could be achieved by associating labels with EEG intrinsic temporal autocorrelation features. These results underscore the importance of choosing the right experimental designs and data splits in BCI decoding tasks to prevent inflated accuracies due to EEG temporal autocorrelation.
Submitted 27 May, 2024;
originally announced May 2024.
-
FLARE: A New Federated Learning Framework with Adjustable Learning Rates over Resource-Constrained Wireless Networks
Authors:
Bingnan Xiao,
Jingjing Zhang,
Wei Ni,
Xin Wang
Abstract:
Wireless federated learning (WFL) suffers from heterogeneity prevailing in the data distributions, computing powers, and channel conditions of participating devices. This paper presents a new Federated Learning with Adjusted leaRning ratE (FLARE) framework to mitigate the impact of the heterogeneity. The key idea is to allow the participating devices to adjust their individual learning rates and local training iterations, adapting to their instantaneous computing powers. The convergence upper bound of FLARE is established rigorously under a general setting with non-convex models in the presence of non-i.i.d. datasets and imbalanced computing powers. By minimizing the upper bound, we further optimize the scheduling of FLARE to exploit the channel heterogeneity. A nested problem structure is revealed to facilitate iteratively allocating the bandwidth with binary search and selecting devices with a new greedy method. A linear problem structure is also identified and a low-complexity linear programming scheduling policy is designed when training models have large Lipschitz constants. Experiments demonstrate that FLARE consistently outperforms the baselines in test accuracy, and converges much faster with the proposed scheduling policy.
Submitted 23 April, 2024;
originally announced April 2024.
-
A Framework for Safe Probabilistic Invariance Verification of Stochastic Dynamical Systems
Authors:
Taoran Wu,
Yiqing Yu,
Bican Xia,
Ji Wang,
Bai Xue
Abstract:
Ensuring safety through set invariance has proven to be a valuable method in various robotics and control applications. This paper introduces a comprehensive framework for the safe probabilistic invariance verification of both discrete- and continuous-time stochastic dynamical systems over an infinite time horizon. The objective is to ascertain the lower and upper bounds of liveness probabilities for a given safe set and set of initial states. The liveness probability signifies the likelihood of the system remaining within the safe set indefinitely, starting from a state in the initial set. To address this problem, we propose optimizations for verifying safe probabilistic invariance in discrete-time and continuous-time stochastic dynamical systems. These optimizations are constructed either by using a method based on Doob's nonnegative supermartingale inequality or by relaxing the equations described in [30,32], which can precisely characterize the probability of reaching a target set while avoiding unsafe states. Finally, we demonstrate the effectiveness of these optimizations through several examples using semi-definite programming tools.
Submitted 3 August, 2024; v1 submitted 13 April, 2024;
originally announced April 2024.
-
Unified Predefined-time Stability Conditions of Nonlinear Systems with Lyapunov Analysis
Authors:
Bing Xiao,
Haichao Zhang,
Shijie Zhao,
Lu Cao
Abstract:
This brief gives a set of unified Lyapunov stability conditions to guarantee the predefined-time/finite-time stability of a dynamical system. The derived Lyapunov theorem for autonomous systems establishes equivalence with existing theorems on predefined-time/finite-time stability. The findings proposed herein are used to develop a nonsingular sliding mode control framework for an Euler-Lagrange system and to analyze its stability, and the upper bound for the settling time can be arbitrarily determined a priori through a predefined time constant.
Submitted 1 April, 2024;
originally announced April 2024.
-
Low-Trace Adaptation of Zero-shot Self-supervised Blind Image Denoising
Authors:
Jintong Hu,
Bin Xia,
Bingchen Li,
Wenming Yang
Abstract:
Deep learning-based denoisers have been the focus of recent developments in image denoising. In the past few years, there has been increasing interest in developing self-supervised denoising networks that only require noisy images, without the need for clean ground truth for training. However, a performance gap remains between current self-supervised methods and their supervised counterparts. Additionally, these methods commonly depend on assumptions about noise characteristics, thereby constraining their applicability in real-world scenarios. Inspired by the properties of the Frobenius norm expansion, we discover that incorporating a trace term reduces the optimization goal disparity between self-supervised and supervised methods, thereby enhancing the performance of self-supervised learning. To exploit this insight, we propose a trace-constraint loss function and design the low-trace adaptation Noise2Noise (LoTA-N2N) model that bridges the gap between self-supervised and supervised learning. Furthermore, we have discovered that several existing self-supervised denoising frameworks naturally fall within the proposed trace-constraint loss as subcases. Extensive experiments conducted on natural and confocal image datasets indicate that our method achieves state-of-the-art performance among zero-shot self-supervised image denoising approaches, without relying on any assumptions regarding the noise.
Submitted 18 March, 2024;
originally announced March 2024.
-
Multi-frequency antenna for quasi-isotropic radiator and 6G massive IoT
Authors:
Bing Xiao,
Hang Wong,
Kam Man Shum
Abstract:
An isotropic antenna radiates and receives electromagnetic waves uniformly in magnitude in 3D space. A multi-frequency quasi-isotropic antenna can serve as a practically feasible solution to emulate an ideal multi-frequency isotropic radiator. It is also an essential technology for mobile smart devices for massive IoT in the upcoming 6G. However, ever since the quasi-isotropic antenna was proposed and achieved more than half a century ago, at most two discrete narrow frequency bands have been achieved, because multi-frequency isotropic radiation significantly increases structural complexity. This limitation impedes numerous related electromagnetic experiments and advances in wireless communication. Here, for the first time, a design method for multi-band (>2) quasi-isotropic antennas is proposed. An exemplified quasi-isotropic antenna with the desired four frequency bands is also presented for demonstration. The measured results validate the excellent performance of this antenna in both electromagnetics and wireless communications.
Submitted 18 December, 2023;
originally announced January 2024.
-
On Completeness of SDP-Based Barrier Certificate Synthesis over Unbounded Domains
Authors:
Hao Wu,
Shenghua Feng,
Ting Gan,
Jie Wang,
Bican Xia,
Naijun Zhan
Abstract:
Barrier certificates, serving as differential invariants that witness system safety, play a crucial role in the verification of cyber-physical systems (CPS). Prevailing computational methods for synthesizing barrier certificates are based on semidefinite programming (SDP) and exploit Putinar's Positivstellensatz. Consequently, these approaches are limited by the Archimedean condition, which requires all variables to be bounded, i.e., systems are defined over bounded domains. For systems over unbounded domains, unfortunately, existing methods become incomplete and may fail to identify potential barrier certificates.
In this paper, we address this limitation for the unbounded cases. We first give a complete characterization of polynomial barrier certificates by using homogenization, a recent technique in the optimization community to reduce an unbounded optimization problem to a bounded one. Furthermore, motivated by this formulation, we introduce the definition of homogenized systems and propose a complete characterization of a family of non-polynomial barrier certificates with more expressive power. Experimental results demonstrate that our two approaches are more effective while maintaining a comparable level of efficiency.
Submitted 8 July, 2024; v1 submitted 24 December, 2023;
originally announced December 2023.
-
Disturbance Rejection Control for Autonomous Trolley Collection Robots with Prescribed Performance
Authors:
Rui-Dong Xi,
Liang Lu,
Xue Zhang,
Xiao Xiao,
Bingyi Xia,
Jiankun Wang,
Max Q. -H. Meng
Abstract:
Trajectory tracking control of autonomous trolley collection robots (ATCR) is challenging due to the complex environment, serious noise, and external disturbances. This work investigates a control scheme for ATCR subject to severe environmental interference. A kinematics-model-based adaptive sliding mode disturbance observer with fast convergence is first proposed to estimate the lumped disturbances. On this basis, a robust controller with prescribed performance is proposed using a backstepping technique, which improves the transient performance and guarantees fast convergence. Simulation results are provided to illustrate the effectiveness of the proposed control scheme.
Submitted 22 September, 2023;
originally announced September 2023.
-
Semantic reconstruction of continuous language from MEG signals
Authors:
Bo Wang,
Xiran Xu,
Longxiang Zhang,
Boda Xiao,
Xihong Wu,
Jing Chen
Abstract:
Decoding language from neural signals holds considerable theoretical and practical importance. Previous research has indicated the feasibility of decoding text or speech from invasive neural signals. However, when using non-invasive neural signals, significant challenges are encountered due to their low quality. In this study, we proposed a data-driven approach for decoding the semantics of language from magnetoencephalography (MEG) signals recorded while subjects were listening to continuous speech. First, a multi-subject decoding model was trained using contrastive learning to reconstruct continuous word embeddings from MEG data. Subsequently, a beam search algorithm was adopted to generate text sequences based on the reconstructed word embeddings. Given a candidate sentence in the beam, a language model was used to predict the subsequent words. The word embeddings of the subsequent words were correlated with the reconstructed word embedding. These correlations were then used as a measure of the probability for the next word. The results showed that the proposed continuous word embedding model can effectively leverage both subject-specific and subject-shared information. Additionally, the decoded text exhibited significant similarity to the target text, with an average BERTScore of 0.816, a score comparable to that in the previous fMRI study.
Submitted 14 September, 2023;
originally announced September 2023.
-
Research on Damage Analysis of Key Parts of UAV Flight Control System
Authors:
Tianshun Li,
Huaimin Chen,
Ben Xiao,
Hao Li,
Shiyu Hao,
Di Hai,
Xuetong Wang
Abstract:
A set of hardware-in-the-loop simulation methods based on the UAV model is proposed to create fault data, which are used to identify the parts where faults occur. Actual flight experimental data are used to demonstrate the reliability of the Simulink models. Then a series of typical faults with various amplitudes is injected into different channels of UAV parts on the hardware-in-the-loop simulation platform. Fault data are created this way, and the effect on UAV flight and task/control can be obtained through damage analysis. Typical fault characteristics are extracted, and the faulty parts can be analyzed and identified. The trend along which faults will develop can also be determined, and the causes of faults can be inferred from exterior performance, which supports precise attack and performance evaluation techniques.
Submitted 6 September, 2023;
originally announced September 2023.
-
IRS-Enabled Covert and Reliable Communications: How Many Reflection Elements are Required?
Authors:
Manlin Wang,
Bin Xia,
Yao Yao,
Zhiyong Chen,
Jiangzhou Wang
Abstract:
Short-packet communications are applied to various scenarios where transmission covertness and reliability are crucial due to the open wireless medium and finite blocklength. Although the intelligent reflection surface (IRS) has been widely utilized to enhance transmission covertness and reliability, the question of how many reflection elements at the IRS are required remains unanswered, which is vital to system design and practical deployment. An inherent strong coupling exists between transmission covertness and reliability under the IRS, which makes the question intractable. To address this issue, the detection error probability at the warder and its approximation are derived first to reveal the relation between covertness performance and the number of reflection elements. Besides, to evaluate the reliability performance of the system, the decoding error probability at the receiver is also derived. Subsequently, the asymptotic reliability performance in high covertness regimes is investigated, which provides theoretical predictions about the number of reflection elements at the IRS required to achieve a decoding error probability close to 0 with given covertness requirements. Furthermore, Monte Carlo simulations verify the accuracy of the derived results for detection (decoding) error probabilities and the validity of the theoretical predictions for reflection elements. Moreover, results show that more reflection elements are required to achieve high reliability with tighter covertness requirements, longer blocklength, and higher transmission rates.
Submitted 9 September, 2023; v1 submitted 4 August, 2023;
originally announced August 2023.
-
Covert and Reliable Short-Packet Communications against A Proactive Warder
Authors:
Manlin Wang,
Yao Yao,
Bin Xia,
Zhiyong Chen,
Jiangzhou Wang
Abstract:
Wireless short-packet communications pose challenges to the security and reliability of the transmission. Besides, a proactive warder, who detects and interferes with the potential transmission, compounds these challenges. Compared with a passive warder, the proactive warder introduces an extra jamming channel, which renders the analytical methods and results in existing works inapplicable. Thus, effective system design schemes are required for short-packet communications against the proactive warder. To address this issue, we consider the analysis and design of covert and reliable transmissions for the above systems. Specifically, to investigate the reliability and covertness performance of the system, the detection error probability at the warder and the decoding error probability at the receiver are derived, both of which are affected by the transmit power and the jamming power. Furthermore, to maximize the effective throughput, an optimization framework is proposed under reliability and covertness constraints. Numerical results verify the accuracy of the analytical results and the feasibility of the optimization framework. It is shown that the tradeoff between transmission reliability and covertness is changed by the proactive warder compared with the passive one. Besides, it is shown that a longer blocklength is always beneficial for improving the throughput of systems with optimized transmission rates. However, when transmission rates are fixed, the blocklength should be carefully designed, since the maximum one is not optimal in this case.
Submitted 4 August, 2023;
originally announced August 2023.
-
Over-The-Air Federated Learning: Status Quo, Open Challenges, and Future Directions
Authors:
Bingnan Xiao,
Xichen Yu,
Wei Ni,
Xin Wang,
H. Vincent Poor
Abstract:
The development of applications based on artificial intelligence and implemented over wireless networks is increasingly rapid and is expected to grow dramatically in the future. The resulting demand for the aggregation of large amounts of data has caused serious communication bottlenecks in wireless networks, particularly at the network edge. Over-the-air federated learning (OTA-FL), leveraging the superposition feature of multi-access channels (MACs), enables users at the network edge to share spectrum resources and achieves efficient and low-latency global model aggregation. This paper provides a holistic review of progress in OTA-FL and points to potential future research directions. Specifically, we classify OTA-FL from the perspective of system settings, including single-antenna OTA-FL, multi-antenna OTA-FL, and OTA-FL with the aid of the emerging reconfigurable intelligent surface (RIS) technology, and the contributions of existing works in these areas are summarized. Moreover, we discuss the trust, security, and privacy aspects of OTA-FL, and highlight concerns arising from security and privacy. Finally, challenges and potential research directions are discussed to promote the future development of OTA-FL in terms of improving system performance, reliability, and trustworthiness. Specific challenges to be addressed include model distortion under channel fading, the ineffective OTA aggregation of local models trained on substantially unbalanced data, and the limited accessibility and verifiability of individual local models.
△ Less
Submitted 3 July, 2023;
originally announced July 2023.
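The aggregation idea in the abstract above can be illustrated with a minimal sketch. It assumes an idealized multi-access channel (unit gains, perfect synchronization) so that the server observes only the sum of the users' updates plus additive Gaussian noise; the function name `ota_aggregate`, the noise model, and the plain-list representation of model updates are illustrative assumptions, not details taken from the survey.

```python
import random


def ota_aggregate(local_updates, noise_std=0.0, seed=0):
    """Average local updates as an idealized over-the-air sum plus receiver noise.

    Assumption: every user transmits simultaneously over a channel with unit gain,
    so the server sees the superposition (sum) of all signals, never individual ones.
    """
    rng = random.Random(seed)
    num_users = len(local_updates)
    dim = len(local_updates[0])
    aggregated = []
    for j in range(dim):
        # Superposition: the channel adds the j-th coordinate of all users' updates.
        received = sum(update[j] for update in local_updates)
        # Additive receiver noise (hypothetical Gaussian model).
        received += rng.gauss(0.0, noise_std)
        # The server rescales the received sum to obtain the average update.
        aggregated.append(received / num_users)
    return aggregated


updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
noiseless = ota_aggregate(updates)               # exact average without noise
noisy = ota_aggregate(updates, noise_std=0.1)    # average perturbed by channel noise
print(noiseless, noisy)
```

In this simplified picture the server never has access to an individual update, which relates to the limited accessibility and verifiability of local models that the abstract lists as an open challenge.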
-
The contribution of T2 relaxation time to diffusion MRI quantification and its clinical implications: a hypothesis
Authors:
Yi Xiang J Wang,
Kai-Xuan Zhao,
Fu-Zhao Ma,
Ben-Heng Xiao
Abstract:
With the liver as the reference, the marked underestimation of both fast diffusion (PF) and slow diffusion (Dslow) in the spleen is likely due to MRI properties of the spleen such as its much longer T2 relaxation time. It is possible that a longer T2 relaxation time partially mitigates the signal decay effect of various gradients on diffusion-weighted images. This phenomenon will not be limited to the spleen. Most liver tumors have a longer T2 relaxation time than their native normal tissue, and this is considered to be associated with oedema. On the other hand, most tumors are measured with lower MRI diffusion (despite being oedematous). The reason why malignant tumors have lower diffusion values [apparent diffusion coefficient (ADC) and Dslow] is poorly understood but has been proposed to be related to a combination of higher cellularity, tissue disorganization, and increased extracellular space tortuosity. These explanations may be true, but it is also possible that many tumors have MRI properties similar to the spleen, such as longer T2 (relative to the liver), and these MRI properties may also contribute to the lower MRI-measured ADC and Dslow. In other words, if we could hypothetically plant a piece of spleen tissue in the liver, MRI would recognize this planted spleen tissue as being similar to a tumor and measure it to have lower diffusion than the liver.
Submitted 3 June, 2023;
originally announced June 2023.
-
i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data
Authors:
Ziyi Yang,
Mahmoud Khademi,
Yichong Xu,
Reid Pryzant,
Yuwei Fang,
Chenguang Zhu,
Dongdong Chen,
Yao Qian,
Mei Gao,
Yi-Ling Chen,
Robert Gmyr,
Naoyuki Kanda,
Noel Codella,
Bin Xiao,
Yu Shi,
Lu Yuan,
Takuya Yoshioka,
Michael Zeng,
Xuedong Huang
Abstract:
The convergence of text, visual, and audio data is a key step towards human-like artificial intelligence; however, the current Vision-Language-Speech landscape is dominated by encoder-only models which lack generative abilities. We propose closing this gap with i-Code V2, the first model capable of generating natural language from any combination of Vision, Language, and Speech data. i-Code V2 is an integrative system that leverages state-of-the-art single-modality encoders, combining their outputs with a new modality-fusing encoder in order to flexibly project combinations of modalities into a shared representational space. Next, language tokens are generated from these representations via an autoregressive decoder. The whole framework is pretrained end-to-end on a large collection of dual- and single-modality datasets using a novel text completion objective that can be generalized across arbitrary combinations of modalities. i-Code V2 matches or outperforms state-of-the-art single- and dual-modality baselines on 7 multimodal tasks, demonstrating the power of generative multimodal pretraining across a diversity of tasks and signals.
Submitted 20 May, 2023;
originally announced May 2023.
-
Fusion-Based Multi-User Semantic Communications for Wireless Image Transmission over Degraded Broadcast Channels
Authors:
Tong Wu,
Zhiyong Chen,
Meixia Tao,
Bin Xia,
Wenjun Zhang
Abstract:
Degraded broadcast channels (DBC) are a typical multi-user communications scenario. There exist classic transmission methods, such as superposition coding with successive interference cancellation, to achieve the DBC capacity region. However, semantic communication methods over DBC still lack in-depth research. To address this, we design a fusion-based multi-user semantic communications system for wireless image transmission over DBC in this paper. The proposed architecture supports a transmitter extracting semantic features for two users separately, and learns to dynamically fuse these semantic features into a joint latent representation for broadcasting. The key here is to design a flexible image semantic fusion (FISF) module to fuse the semantic features of two users, and to use a multi-layer perceptron (MLP) based neural network to adjust the weights of different user semantic features for flexible adaptability to different users' channels. Experiments present the semantic performance region based on the peak signal-to-noise ratio (PSNR) of both users, and show that the proposed system dominates the traditional methods.
Submitted 16 May, 2023;
originally announced May 2023.
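For readers unfamiliar with the metric used above, PSNR follows the standard definition 10·log10(MAX² / MSE). The sketch below computes it for one user's reconstructed image; the flattened-list image representation, the 8-bit peak value of 255, and the function name `psnr` are assumptions made for illustration and do not reflect the paper's evaluation code.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical pixel values for one user: original image and its reconstruction.
original = [52, 55, 61, 59]
received = [54, 55, 60, 57]
print(psnr(original, received))
```

A two-user performance region, as described in the abstract, would then be traced by evaluating this metric separately for each user across different fusion weights.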
-
Knowledge Distillation based Degradation Estimation for Blind Super-Resolution
Authors:
Bin Xia,
Yulun Zhang,
Yitong Wang,
Yapeng Tian,
Wenming Yang,
Radu Timofte,
Luc Van Gool
Abstract:
Blind image super-resolution (Blind-SR) aims to recover a high-resolution (HR) image from its corresponding low-resolution (LR) input image with unknown degradations. Most of the existing works design an explicit degradation estimator for each degradation to guide SR. However, it is infeasible to provide concrete labels of multiple degradation combinations (e.g., blur, noise, JPEG compression) to supervise the degradation estimator training. In addition, these special designs for certain degradations, such as blur, impede the models from being generalized to handle different degradations. To this end, it is necessary to design an implicit degradation estimator that can extract discriminative degradation representation for all degradations without relying on the supervision of degradation ground-truth. In this paper, we propose a Knowledge Distillation based Blind-SR network (KDSR). It consists of a knowledge distillation based implicit degradation estimator network (KD-IDE) and an efficient SR network. To learn the KDSR model, we first train a teacher network: KD-IDE$_{T}$. It takes paired HR and LR patches as inputs and is optimized with the SR network jointly. Then, we further train a student network KD-IDE$_{S}$, which only takes LR images as input and learns to extract the same implicit degradation representation (IDR) as KD-IDE$_{T}$. In addition, to fully use extracted IDR, we design a simple, strong, and efficient IDR based dynamic convolution residual block (IDR-DCRB) to build an SR network. We conduct extensive experiments under classic and real-world degradation settings. The results show that KDSR achieves SOTA performance and can generalize to various degradation processes. The source codes and pre-trained models will be released.
Submitted 16 February, 2023; v1 submitted 30 November, 2022;
originally announced November 2022.
-
A 2030 United States Macro Grid Unlocking Geographical Diversity to Accomplish Clean Energy Goals
Authors:
Yixing Xu,
Daniel Olsen,
Bainan Xia,
Dan Livengood,
Victoria Hunt,
Yifan Li,
Lane Smith
Abstract:
Some U.S. states have set clean energy goals and targets in an effort to decarbonize their electricity sectors. There are many reasons for such goals and targets, including the increasingly apparent effects of climate change. A handful of states (Washington, California, New York, and Virginia) are aiming for deep decarbonization by 2050 or earlier, a mere 30 years or less from today. The urgency of substantial carbon emissions reduction (50% or more by 2030) needed to avoid catastrophic climate impacts requires even more ambitious efforts than some of the original targets (e.g., a 30% renewable portfolio standard) set for between now and 2030. With the cost of solar and wind energy falling faster than expected in recent years, economics are also driving rapid expansion of clean energy investments. With this in mind, this report examines combinations of interregional AC and High-Voltage DC (HVDC) transmission upgrades and additions to evaluate the benefits of large-scale transmission expansion.
Submitted 18 November, 2022;
originally announced November 2022.
-
Basic Binary Convolution Unit for Binarized Image Restoration Network
Authors:
Bin Xia,
Yulun Zhang,
Yitong Wang,
Yapeng Tian,
Wenming Yang,
Radu Timofte,
Luc Van Gool
Abstract:
Lighter and faster image restoration (IR) models are crucial for deployment on resource-limited devices. Binary neural network (BNN), one of the most promising model compression methods, can dramatically reduce the computations and parameters of full-precision convolutional neural networks (CNN). However, BNNs and full-precision CNNs have different properties, and we can hardly use the experience of designing CNNs to develop BNNs. In this study, we reconsider components in binary convolution, such as residual connection, BatchNorm, activation function, and structure, for IR tasks. We conduct systematic analyses to explain each component's role in binary convolution and discuss the pitfalls. Specifically, we find that residual connection can reduce the information loss caused by binarization; BatchNorm can solve the value range gap between residual connection and binary convolution; and the position of the activation function dramatically affects the performance of BNN. Based on our findings and analyses, we design a simple yet efficient basic binary convolution unit (BBCU). Furthermore, we divide IR networks into four parts and specially design variants of BBCU for each part to explore the benefit of binarizing these parts. We conduct experiments on different IR tasks, and our BBCU significantly outperforms other BNNs and lightweight models, which shows that BBCU can serve as a basic unit for binarized IR networks. All code and models will be released.
Submitted 16 February, 2023; v1 submitted 1 October, 2022;
originally announced October 2022.
-
Image Synthesis with Disentangled Attributes for Chest X-Ray Nodule Augmentation and Detection
Authors:
Zhenrong Shen,
Xi Ouyang,
Bin Xiao,
Jie-Zhi Cheng,
Qian Wang,
Dinggang Shen
Abstract:
Lung nodule detection in chest X-ray (CXR) images is common in the early screening of lung cancers. Deep-learning-based Computer-Assisted Diagnosis (CAD) systems can support radiologists for nodule screening in CXR. However, training such robust and accurate CAD systems requires large-scale and diverse medical data with high-quality annotations. To alleviate the limited availability of such datasets, lung nodule synthesis methods are proposed for the sake of data augmentation. Nevertheless, previous methods lack the ability to generate nodules that are realistic with the size attribute desired by the detector. To address this issue, we introduce a novel lung nodule synthesis framework in this paper, which decomposes nodule attributes into three main aspects: shape, size, and texture. A GAN-based Shape Generator first models nodule shapes by generating diverse shape masks. The following Size Modulation then enables quantitative control on the diameters of the generated nodule shapes in pixel-level granularity. A coarse-to-fine gated convolutional Texture Generator finally synthesizes visually plausible nodule textures conditioned on the modulated shape masks. Moreover, we propose to synthesize nodule CXR images by controlling the disentangled nodule attributes for data augmentation, in order to better compensate for the nodules that are easily missed in the detection task. Our experiments demonstrate the enhanced image quality, diversity, and controllability of the proposed lung nodule synthesis framework. We also validate the effectiveness of our data augmentation in greatly improving nodule detection performance.
Submitted 19 July, 2022;
originally announced July 2022.
-
Structured Sparsity Learning for Efficient Video Super-Resolution
Authors:
Bin Xia,
Jingwen He,
Yulun Zhang,
Yitong Wang,
Yapeng Tian,
Wenming Yang,
Luc Van Gool
Abstract:
The high computational costs of video super-resolution (VSR) models hinder their deployment on resource-limited devices (e.g., smartphones and drones). Existing VSR models contain considerable redundant filters, which drag down the inference efficiency. To prune these unimportant filters, we develop a structured pruning scheme called Structured Sparsity Learning (SSL) according to the properties of VSR. In SSL, we design pruning schemes for several key components in VSR models, including residual blocks, recurrent networks, and upsampling networks. Specifically, we develop a Residual Sparsity Connection (RSC) scheme for residual blocks of recurrent networks to liberate pruning restrictions and preserve the restoration information. For upsampling networks, we design a pixel-shuffle pruning scheme to guarantee the accuracy of feature channel-space conversion. In addition, we observe that pruning error would be amplified as the hidden states propagate along recurrent networks. To alleviate the issue, we design Temporal Finetuning (TF). Extensive experiments show that SSL can significantly outperform recent methods quantitatively and qualitatively.
Submitted 25 March, 2023; v1 submitted 15 June, 2022;
originally announced June 2022.
-
How Much Demand Flexibility Could Have Spared Texas from the 2021 Outage?
Authors:
Dongqi Wu,
Xiangtian Zheng,
Ali Menati,
Lane Smith,
Bainan Xia,
Yixing Xu,
Chanan Singh,
Le Xie
Abstract:
The February 2021 Texas winter power outage led to hundreds of deaths and billions of dollars in economic losses, largely due to generation failure and record-breaking electric demand. In this paper, we study the scaling-up of demand flexibility as a means to avoid load shedding during such an extreme weather event. The three mechanisms considered are interruptible load, residential load rationing, and incentive-based demand response. By simulating on a synthetic but realistic large-scale Texas grid model along with demand flexibility modeling and electricity outage data, we identify portfolios of mixed mechanisms that exactly avoid outages, which a single mechanism may fail to do owing to decaying marginal effects. We also reveal a complementary relationship between interruptible load and residential load rationing and find nonlinear impacts of incentive-based demand response on the efficacy of other mechanisms.
Submitted 31 May, 2022;
originally announced June 2022.
-
Physical-World Optical Adversarial Attacks on 3D Face Recognition
Authors:
Yanjie Li,
Yiquan Li,
Xuelong Dai,
Songtao Guo,
Bin Xiao
Abstract:
2D face recognition has been proven insecure against physical adversarial attacks. However, few studies have investigated the possibility of attacking real-world 3D face recognition systems. Recently proposed 3D-printed attacks cannot generate adversarial points in the air. In this paper, we attack 3D face recognition systems through elaborate optical noises. We take structured light 3D scanners as our attack target. End-to-end attack algorithms are designed to generate adversarial illumination for 3D faces through the inherent or an additional projector to produce adversarial points at arbitrary positions. Nevertheless, face reflectance is a complex procedure because the skin is translucent. To involve this projection-and-capture procedure in optimization loops, we model it with a Lambertian rendering model and use SfSNet to estimate the albedo. Moreover, to improve the resistance to distance and angle changes while keeping the perturbation unnoticeable, a 3D transform invariant loss and two kinds of sensitivity maps are introduced. Experiments are conducted in both simulated and physical worlds. We successfully attacked point-cloud-based and depth-image-based 3D face recognition algorithms while needing fewer perturbations than previous state-of-the-art physical-world 3D adversarial attacks.
Submitted 13 November, 2022; v1 submitted 26 May, 2022;
originally announced May 2022.
-
Damage Maximization for Combat Network with Limited Costs
Authors:
Jintao Yu,
Bing Xiao,
Yuzhu Cui
Abstract:
Maximizing the damage by attacking specific nodes of the combat network can efficiently disrupt enemies' defense capability, protect our critical units, and enhance the resistance to the destruction of system-of-system (SOS). However, the modeling of the combat network damage is not practical enough. In this paper, we report a more realistic model to study the combat network damage maximization problems. By analyzing realistic situations, the cost of damage is redefined based on the network topology and the functional characteristics of nodes. The damage effect is also updated according to the combat network topology and operational capability. Hence, a cost-limited damage maximization model for the combat network is constructed. In addition, to obtain optimal solutions, an improved genetic algorithm (IPGA) based on prior information is proposed. As a result, our method has a significant advantage in the feasibility and effectiveness compared with other algorithms in experiments. The attack pattern of the combat network and the convergence and complexity of the proposed algorithm are further explored. The improved model and algorithm, as well as the mined attack patterns, can provide support for military decisions.
Submitted 23 December, 2022; v1 submitted 9 May, 2022;
originally announced May 2022.
-
Robustness of double-layer group-dependent combat network with cascading failure
Authors:
Jintao Yu,
Bing Xiao,
Yuzhu Cui
Abstract:
The networked combat system-of-system (CSOS) is the trend of combat development with the innovation of technology. The achievement of combat effectiveness requires CSOS to have a good ability to deal with external interference. Here we report a modeling method of CSOS from the perspective of complex networks and explore the robustness of the combat network based on this. Firstly, a more realistic double-layer heterogeneous dependent combat network model is established. Then, the conditional group dependency situation is considered to design failure rules for dependent failure, and the coupling relation between the double-layer subnets is analyzed for cascading failure. Based on this, the initial load and capacity of the node are defined, respectively, as well as the load redistribution strategy and the status judgment rules for the cascading failure model. Simulation experiments are carried out by changing the attack modes and different parameters, and the results show that the robustness of the combat network can be effectively improved by improving the tolerance limit of one-way dependency of the functional net, the node capacity of the functional subnet and the tolerance of the overload state. The conclusions of this paper can provide a useful reference for network structure optimization and network security protection in the military field.
Submitted 9 December, 2022; v1 submitted 9 May, 2022;
originally announced May 2022.
-
i-Code: An Integrative and Composable Multimodal Learning Framework
Authors:
Ziyi Yang,
Yuwei Fang,
Chenguang Zhu,
Reid Pryzant,
Dongdong Chen,
Yu Shi,
Yichong Xu,
Yao Qian,
Mei Gao,
Yi-Ling Chen,
Liyang Lu,
Yujia Xie,
Robert Gmyr,
Noel Codella,
Naoyuki Kanda,
Bin Xiao,
Lu Yuan,
Takuya Yoshioka,
Michael Zeng,
Xuedong Huang
Abstract:
Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to maintain a holistic worldview. Most current pretraining methods, however, are limited to one or two modalities. We present i-Code, a self-supervised pretraining framework where users may flexibly combine the modalities of vision, speech, and language into unified and general-purpose vector representations. In this framework, data from each modality are first given to pretrained single-modality encoders. The encoder outputs are then integrated with a multimodal fusion network, which uses novel attention mechanisms and other architectural innovations to effectively combine information from the different modalities. The entire system is pretrained end-to-end with new objectives including masked modality unit modeling and cross-modality contrastive learning. Unlike previous research using only video for pretraining, the i-Code framework can dynamically process single, dual, and triple-modality data during training and inference, flexibly projecting different combinations of modalities into a single representation space. Experimental results demonstrate how i-Code can outperform state-of-the-art techniques on five video understanding tasks and the GLUE NLP benchmark, improving by as much as 11% and demonstrating the power of integrative multimodal pretraining.
Submitted 5 May, 2022; v1 submitted 3 May, 2022;
originally announced May 2022.
-
Polar Transformation Based Multiple Instance Learning Assisting Weakly Supervised Image Segmentation With Loose Bounding Box Annotations
Authors:
Juan Wang,
Bin Xia
Abstract:
This study investigates weakly supervised image segmentation using loose bounding box supervision. It presents a multiple instance learning strategy based on polar transformation to assist image segmentation when loose bounding boxes are employed as supervision. In this strategy, weighted smooth maximum approximation is introduced to incorporate the observation that pixels closer to the origin of the polar transformation are more likely to belong to the object in the bounding box. The proposed approach was evaluated on a public medical dataset using Dice coefficient. The results demonstrate its superior performance. The codes are available at https://github.com/wangjuan313/wsis-polartransform.
Submitted 2 March, 2022;
originally announced March 2022.
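Two quantities named in the abstract above can be sketched in a few lines. The smooth maximum below uses one common form (the α-softmax, a softmax-weighted average) extended with per-pixel weights; whether this matches the paper's exact formulation is an assumption, as are the function names, the example weights, and the value of `alpha`. The Dice coefficient follows its standard definition for binary masks.

```python
import math


def weighted_smooth_max(values, weights, alpha=4.0):
    """Differentiable approximation of max(values) with positive per-element weights.

    Assumed form: sum_i w_i * x_i * exp(alpha * x_i) / sum_i w_i * exp(alpha * x_i).
    alpha = 0 gives the weighted mean; large alpha approaches the hard maximum.
    """
    shift = max(values)  # subtract the maximum for numerical stability
    terms = [w * math.exp(alpha * (v - shift)) for v, w in zip(values, weights)]
    return sum(v * t for v, t in zip(values, terms)) / sum(terms)


def dice_coefficient(prediction, target):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) for two binary masks given as 0/1 sequences."""
    intersection = sum(p * t for p, t in zip(prediction, target))
    total = sum(prediction) + sum(target)
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical foreground probabilities along one polar line; larger weights are
# given to pixels nearer the origin of the polar transformation.
probabilities = [0.9, 0.7, 0.2]
origin_weights = [1.0, 0.6, 0.3]
print(weighted_smooth_max(probabilities, origin_weights))
print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))
```

Replacing the hard maximum over a bag of pixels with a smooth one keeps the multiple-instance loss differentiable with respect to every pixel rather than only the top-scoring one.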
-
Human-like Driving Decision at Unsignalized Intersections Based on Game Theory
Authors:
Daofei Li,
Guanming Liu,
Bin Xiao
Abstract:
Unsignalized intersection driving is challenging for automated vehicles. For safe and efficient performances, the diverse and dynamic behaviors of interacting vehicles should be considered. Based on a game-theoretic framework, a human-like payoff design methodology is proposed for the automated decision at unsignalized intersections. Prospect Theory is introduced to map the objective collision risk to the subjective driver payoffs, and the driving style can be quantified as a tradeoff between safety and speed. To account for the dynamics of interaction, a probabilistic model is further introduced to describe the acceleration tendency of drivers. Simulation results show that the proposed decision algorithm can describe the dynamic process of two-vehicle interaction in limit cases. Statistics of uniformly-sampled cases simulation indicate that the success rate of safe interaction reaches 98%, while the speed efficiency can also be guaranteed. The proposed approach is further applied and validated in four-vehicle interaction scenarios at a four-arm intersection.
Submitted 9 January, 2022; v1 submitted 12 December, 2021;
originally announced December 2021.
-
Generating Unrestricted 3D Adversarial Point Clouds
Authors:
Xuelong Dai,
Yanjie Li,
Hua Dai,
Bin Xiao
Abstract:
Utilizing 3D point cloud data has become an urgent need for the deployment of artificial intelligence in many areas like facial recognition and self-driving. However, deep learning for 3D point clouds is still vulnerable to adversarial attacks, e.g., iterative attacks, point transformation attacks, and generative attacks. These attacks need to restrict perturbations of adversarial examples within a strict bound, leading to unrealistic adversarial 3D point clouds. In this paper, we propose an Adversarial Graph-Convolutional Generative Adversarial Network (AdvGCGAN) to generate visually realistic adversarial 3D point clouds from scratch. Specifically, we use a graph convolutional generator and a discriminator with an auxiliary classifier to generate realistic point clouds, which learn the latent distribution from the real 3D data. The unrestricted adversarial attack loss is incorporated in the special adversarial training of GAN, which enables the generator to generate the adversarial examples to spoof the target network. Compared with existing state-of-the-art attack methods, the experimental results demonstrate the effectiveness of our unrestricted adversarial attack methods with a higher attack success rate and visual quality. Additionally, the proposed AdvGCGAN can achieve better performance against defense models and better transferability than existing attack methods with strong camouflage.
Submitted 18 November, 2021; v1 submitted 17 November, 2021;
originally announced November 2021.
-
Deep Learning-based Physical-Layer Secret Key Generation for FDD Systems
Authors:
Xinwei Zhang,
Guyue Li,
Junqing Zhang,
Aiqun Hu,
Zongyue Hou,
Bin Xiao
Abstract:
Physical-layer key generation (PKG) establishes cryptographic keys from highly correlated measurements of wireless channels, which relies on reciprocal channel characteristics between uplink and downlink, is a promising wireless security technique for Internet of Things (IoT). However, it is challenging to extract common features in frequency division duplexing (FDD) systems as uplink and downlink…
▽ More
Physical-layer key generation (PKG) establishes cryptographic keys from highly correlated measurements of wireless channels, which relies on reciprocal channel characteristics between uplink and downlink, is a promising wireless security technique for Internet of Things (IoT). However, it is challenging to extract common features in frequency division duplexing (FDD) systems as uplink and downlink transmissions operate at different frequency bands whose channel frequency responses are not reciprocal any more. Existing PKG methods for FDD systems have many limitations, i.e., high overhead and security problems. This paper proposes a novel PKG scheme that uses the feature mapping function between different frequency bands obtained by deep learning to make two users generate highly similar channel features in FDD systems. In particular, this is the first time to apply deep learning for PKG in FDD systems. We first prove the existence of the band feature mapping function for a given environment and a feedforward network with a single hidden layer can approximate the mapping function. Then a Key Generation neural Network (KGNet) is proposed for reciprocal channel feature construction, and a key generation scheme based on the KGNet is also proposed. Numerical results verify the excellent performance of the KGNet-based key generation scheme in terms of randomness, key generation ratio, and key error rate. Besides, the overhead analysis shows that the method proposed in this paper can be used for resource-contrained IoT devices in FDD systems.
Submitted 30 August, 2021; v1 submitted 18 May, 2021;
originally announced May 2021.
-
Shaping Advice in Deep Multi-Agent Reinforcement Learning
Authors:
Baicen Xiao,
Bhaskar Ramasubramanian,
Radha Poovendran
Abstract:
Multi-agent reinforcement learning involves multiple agents interacting with each other and a shared environment to complete tasks. When rewards provided by the environment are sparse, agents may not receive immediate feedback on the quality of actions that they take, thereby affecting learning of policies. In this paper, we propose a method called Shaping Advice in deep Multi-agent reinforcement learning (SAM) to augment the reward signal from the environment with an additional reward termed shaping advice. The shaping advice is given by a difference of potential functions at consecutive time-steps. Each potential function is a function of observations and actions of the agents. The shaping advice needs to be specified only once at the start of training, and can be easily provided by non-experts. We show through theoretical analyses and experimental validation that shaping advice provided by SAM does not distract agents from completing tasks specified by the environment reward. Theoretically, we prove that convergence of policy gradients and value functions when using SAM implies convergence of these quantities in the absence of SAM. Experimentally, we evaluate SAM on three tasks in the multi-agent Particle World environment that have sparse rewards. We observe that using SAM results in agents learning policies to complete tasks faster and obtaining higher rewards than: i) using sparse rewards alone; ii) a state-of-the-art reward redistribution method.
Submitted 29 March, 2021;
originally announced March 2021.
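The abstract above describes shaping advice as a difference of potential functions at consecutive time-steps, added to the environment reward. A minimal sketch of that idea follows. The discount factor `gamma`, the function names, and the toy potential are assumptions made for illustration and are not taken from the paper.

```python
from typing import Callable, Sequence

# Hypothetical type: a potential maps (observation, action) to a scalar.
Potential = Callable[[Sequence[float], Sequence[float]], float]


def shaping_advice(phi: Potential,
                   obs: Sequence[float], act: Sequence[float],
                   next_obs: Sequence[float], next_act: Sequence[float],
                   gamma: float = 0.99) -> float:
    """Difference of potentials at consecutive time-steps.

    The discounting by gamma is an assumption (the usual potential-based form).
    """
    return gamma * phi(next_obs, next_act) - phi(obs, act)


def shaped_reward(env_reward: float, phi: Potential,
                  obs: Sequence[float], act: Sequence[float],
                  next_obs: Sequence[float], next_act: Sequence[float],
                  gamma: float = 0.99) -> float:
    """Environment reward augmented with the shaping advice."""
    return env_reward + shaping_advice(phi, obs, act, next_obs, next_act, gamma)


def neg_distance_potential(obs: Sequence[float], act: Sequence[float]) -> float:
    """Toy potential (illustrative only): negative Euclidean distance of the
    first two observation coordinates from a goal at the origin.
    The action is ignored here."""
    return -((obs[0] ** 2 + obs[1] ** 2) ** 0.5)
```

With this toy potential, a step that moves an agent closer to the goal yields positive advice even when the sparse environment reward is zero.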
-
Switching Controller Synthesis for Delay Hybrid Systems under Perturbations
Authors:
Yunjun Bai,
Ting Gan,
Li Jiao,
Bican Xia,
Bai Xue,
Naijun Zhan
Abstract:
Delays are ubiquitous in modern hybrid systems, which exhibit both continuous and discrete dynamical behaviors. Induced by signal transmission, conversion, the nature of plants, and so on, delays may appear either in the continuous evolution of a hybrid system, such that the evolution depends not only on the present state but also on its execution history, or in the discrete switching between its different control modes. In this paper we propose a new model of hybrid systems, called delay hybrid automata, to capture the dynamics of systems with these two kinds of delays. Furthermore, based upon this model we study the robust switching controller synthesis problem, in which the controlled delay system must satisfy the specified safety properties regardless of perturbations. To this end, a novel method is proposed to synthesize switching controllers based on the computation of differential invariants for continuous evolution and backward reachable sets of discrete jumps with delays. Finally, we implement a prototypical tool of our approach and demonstrate it on some case studies.
Submitted 21 March, 2021;
originally announced March 2021.
-
The upgrade of EAST Safety and Interlock system
Authors:
Z. C. Zhang,
B. J. Xiao,
Z. S. Ji,
Y. Wang,
F. Xia,
Z. H. Xu
Abstract:
The Experimental Advanced Superconducting Tokamak (EAST), a national-level large-scale scientific project of China, plays a key role in research on the peaceful use of fusion energy. The safety and interlock system (SIS) is in charge of the supervision and control of all the EAST components involved in protecting personnel and the tokamak from potential accidents. As physics experiments have developed, the SIS has come close to reaching its limits for expandability. Therefore, a prototype for upgrading the EAST SIS has been designed, and a fast architecture based on COTS FPGA is incorporated into the new SIS. This paper presents the EAST machine and human protection mechanism and the architecture of the upgraded safety and interlock system.
Submitted 30 October, 2020;
originally announced November 2020.
-
Safety-Critical Online Control with Adversarial Disturbances
Authors:
Bhaskar Ramasubramanian,
Baicen Xiao,
Linda Bushnell,
Radha Poovendran
Abstract:
This paper studies the control of safety-critical dynamical systems in the presence of adversarial disturbances. We seek to synthesize state-feedback controllers to minimize a cost incurred due to the disturbance, while respecting a safety constraint. The safety constraint is given by a bound on an H-inf norm, while the cost is specified as an upper bound on the H-2 norm of the system. We consider an online setting where costs at each time are revealed only after the controller at that time is chosen. We propose an iterative approach to the synthesis of the controller by solving a modified discrete-time Riccati equation. Solutions of this equation enforce the safety constraint. We compare the cost of this controller with that of the optimal controller when one has complete knowledge of disturbances and costs in hindsight. We show that the regret function, which is defined as the difference between these costs, varies logarithmically with the time horizon. We validate our approach on a process control setup that is subject to two kinds of adversarial attacks.
Submitted 20 September, 2020;
originally announced September 2020.
-
CDE-GAN: Cooperative Dual Evolution Based Generative Adversarial Network
Authors:
Shiming Chen,
Wenjie Wang,
Beihao Xia,
Xinge You,
Zehong Cao,
Weiping Ding
Abstract:
Generative adversarial networks (GANs) have been a popular deep generative model for real-world applications. Despite many recent efforts on GANs, mode collapse and instability are still open problems caused by the difficulties of adversarial optimization. In this paper, motivated by the cooperative co-evolutionary algorithm, we propose a Cooperative Dual Evolution based Generative Adversarial Network (CDE-GAN) to circumvent these drawbacks. In essence, CDE-GAN incorporates dual evolution with respect to the generator(s) and discriminators into a unified evolutionary adversarial framework to conduct effective adversarial multi-objective optimization. Thus it exploits the complementary properties and injects dual mutation diversity into training to steadily diversify the estimated density in capturing multi-modes and improve generative performance. Specifically, CDE-GAN decomposes the complex adversarial optimization problem into two subproblems (generation and discrimination), and each subproblem is solved with a separate subpopulation (E-Generators and E-Discriminators), evolved by its own evolutionary algorithm. Additionally, we propose a Soft Mechanism to balance the trade-off between E-Generators and E-Discriminators to conduct steady training for CDE-GAN. Extensive experiments on one synthetic dataset and three real-world benchmark image datasets demonstrate that the proposed CDE-GAN achieves competitive and superior performance over baselines in generating good-quality and diverse samples. The code and more generated results are available at our project homepage: https://shiming-chen.github.io/CDE-GAN-website/CDE-GAN.html.
Submitted 23 March, 2021; v1 submitted 21 August, 2020;
originally announced August 2020.
-
Design of Small Multi-band Full-screen Smartwatch Antenna for IoT applications
Authors:
Bing Xiao,
Hang Wong,
Di Wu,
Kwan L. Yeung
Abstract:
The smartwatch is a potential candidate for the Internet of Things (IoT) hub. However, the performance of smartwatch antennas is severely restricted by the smartwatch structure, especially when the antennas are designed by traditional methods. To adapt smartwatches to the role of IoT hub, a novel method of designing multi-band smartwatch antennas is presented in this paper, aiming at increasing the number of frequency bands, omni-directivity, and structural suitability. First, the fundamental structure (including the full screen and the system PCB) of the smartwatch is analyzed as a whole by characteristic mode analysis (CMA). Thus, abundant resources of characteristic modes are introduced. The fundamental structure is then modified as the radiator of a multi-band antenna. Then, a non-radiating capacitive coupling element (CCE) excites the desired four 0.5-wavelength modes from this structure. This method can fully utilize the intrinsic modes of the smartwatch structure itself and thus exhibits multiple advantages: significantly small size, smaller ground, omni-directional radiation, and a good fit to the full-screen smartwatch structure.
Submitted 23 May, 2021; v1 submitted 29 December, 2019;
originally announced December 2019.
-
Context-endcoding for neural network based skull stripping in magnetic resonance imaging
Authors:
Zhen Liu,
Borui Xiao,
Yuemeng Li,
Yong Fan
Abstract:
Skull stripping is usually the first step for most brain analysis processes in magnetic resonance images. Many deep learning neural network based methods have been developed to achieve higher accuracy. Since 3D deep learning models suffer from high computational cost and are subject to GPU memory limits, a variety of 2D deep learning methods have been developed. However, existing 2D deep learning methods are not equipped to effectively capture the 3D semantic information that is needed to achieve higher accuracy. In this paper, we propose a context-encoding method to empower the 2D network to capture the 3D context information. For the context-encoding method, we first encode the 2D features of the original 2D network, then encode the sub-volume of 3D MRI images, and finally fuse the encoded 2D features and 3D features with a semantic encoding classification loss. For computational efficiency, we encode the sub-volume of 3D MRI images instead of building a 3D neural network; even so, extensive experiments on three benchmark datasets demonstrate that our method can achieve superior accuracy to state-of-the-art alternative methods, with Dice scores of 99.6% on NFBS, 99.09% on LPBA40, and 99.17% on OASIS.
Submitted 23 October, 2019;
originally announced October 2019.
-
HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation
Authors:
Bowen Cheng,
Bin Xiao,
Jingdong Wang,
Honghui Shi,
Thomas S. Huang,
Lei Zhang
Abstract:
Bottom-up human pose estimation methods have difficulties in predicting the correct pose for small persons due to challenges in scale variation. In this paper, we present HigherHRNet: a novel bottom-up human pose estimation method for learning scale-aware representations using high-resolution feature pyramids. Equipped with multi-resolution supervision for training and multi-resolution aggregation for inference, the proposed approach is able to solve the scale variation challenge in bottom-up multi-person pose estimation and localize keypoints more precisely, especially for small persons. The feature pyramid in HigherHRNet consists of feature map outputs from HRNet and upsampled higher-resolution outputs through a transposed convolution. HigherHRNet outperforms the previous best bottom-up method by 2.5% AP for medium persons on COCO test-dev, showing its effectiveness in handling scale variation. Furthermore, HigherHRNet achieves a new state-of-the-art result on COCO test-dev (70.5% AP) without using refinement or other post-processing techniques, surpassing all existing bottom-up methods. HigherHRNet even surpasses all top-down methods on CrowdPose test (67.6% AP), suggesting its robustness in crowded scenes. The code and models are available at https://github.com/HRNet/Higher-HRNet-Human-Pose-Estimation.
Submitted 12 March, 2020; v1 submitted 27 August, 2019;
originally announced August 2019.
-
Potential-Based Advice for Stochastic Policy Learning
Authors:
Baicen Xiao,
Bhaskar Ramasubramanian,
Andrew Clark,
Hannaneh Hajishirzi,
Linda Bushnell,
Radha Poovendran
Abstract:
This paper augments the reward received by a reinforcement learning agent with potential functions in order to help the agent learn (possibly stochastic) optimal policies. We show that a potential-based reward shaping scheme is able to preserve optimality of stochastic policies, and demonstrate that the ability of an agent to learn an optimal policy is not affected when this scheme is augmented to soft Q-learning. We propose a method to impart potential based advice schemes to policy gradient algorithms. An algorithm that considers an advantage actor-critic architecture augmented with this scheme is proposed, and we give guarantees on its convergence. Finally, we evaluate our approach on a puddle-jump grid world with indistinguishable states, and the continuous state and action mountain car environment from classical control. Our results indicate that these schemes allow the agent to learn a stochastic optimal policy faster and obtain a higher average reward.
Submitted 20 July, 2019;
originally announced July 2019.
-
QFlow: A Learning Approach to High QoE Video Streaming at the Wireless Edge
Authors:
Rajarshi Bhattacharyya,
Archana Bura,
Desik Rengarajan,
Mason Rumuly,
Bainan Xia,
Srinivas Shakkottai,
Dileep Kalathil,
Ricky K. P. Mok,
Amogh Dhamdhere
Abstract:
The predominant use of wireless access networks is for media streaming applications, which are only gaining popularity as ever more devices become available for this purpose. However, current access networks treat all packets identically, and lack the agility to determine which clients are most in need of service at a given time. Software reconfigurability of networking devices has seen wide adoption, and this in turn implies that agile control policies can be now instantiated on access networks. The goal of this work is to design, develop and demonstrate QFlow, a learning approach to create a value chain from the application on one side, to algorithms operating over reconfigurable infrastructure on the other, so that applications are able to obtain necessary resources for optimal performance. Using YouTube video streaming as an example, we illustrate how QFlow is able to adaptively provide such resources and attain a high QoE for all clients at a wireless access point.
Submitted 13 May, 2020; v1 submitted 3 January, 2019;
originally announced January 2019.
-
Training Multi-Task Adversarial Network for Extracting Noise-Robust Speaker Embedding
Authors:
Jianfeng Zhou,
Tao Jiang,
Lin Li,
Qingyang Hong,
Zhe Wang,
Bingyin Xia
Abstract:
Achieving robust speaker recognition performance in noisy environments is still a challenging task. Motivated by the promising performance of multi-task training in a variety of image processing tasks, we explore the potential of multi-task adversarial training for learning a noise-robust speaker embedding. In this paper we present a novel framework which consists of three components: an encoder that extracts the noise-robust speaker embedding; a classifier that classifies the speakers; and a discriminator that discriminates the noise type of the speaker embedding. In addition, we propose a training strategy that uses the training accuracy as an indicator to stabilize the multi-class adversarial optimization process. We conduct our experiments on English and Mandarin corpora, and the experimental results demonstrate that our proposed multi-task adversarial training method can greatly outperform the other methods without adversarial training in noisy environments. Furthermore, experiments indicate that our method is also able to improve the speaker verification performance in the clean condition.
Submitted 12 May, 2019; v1 submitted 22 November, 2018;
originally announced November 2018.
-
Parameter Synthesis Problems for one parametric clock Timed Automata
Authors:
Liyun Dai,
Taolue Chen,
Zhiming Liu,
Bican Xia,
Naijun Zhan,
Kim G. Larsen
Abstract:
In this paper, we study the parameter synthesis problem for a class of parametric timed automata. The problem asks to construct the set of valuations of the parameters in the parametric timed automaton, referred to as the feasible region, under which the resulting timed automaton satisfies certain properties. We show that the parameter synthesis problem of parametric timed automata with only one parametric clock (with unlimited concretely constrained clocks) and arbitrarily many parameters is solvable when all the expressions are linear expressions. Moreover, the synthesis problem is solvable when the constraints are parameter polynomial inequalities rather than just simple constraints and the parameter domain is the nonnegative real numbers.
Submitted 15 September, 2018;
originally announced September 2018.
-
Barrier Certificates Revisited
Authors:
Liyun Dai,
Ting Gan,
Bican Xia,
Naijun Zhan
Abstract:
A barrier certificate can separate the state space of a considered hybrid system (HS) into safe and unsafe parts according to the safety property to be verified. Therefore this notion has been widely used in the verification of HSs. A stronger condition on barrier certificates means that less expressive barrier certificates can be synthesized. On the other hand, synthesizing more expressive barrier certificates often means high complexity. In [9], Kong et al. considered how to relax the condition of barrier certificates while still keeping their convexity so that one can synthesize more expressive barrier certificates efficiently using semi-definite programming (SDP). In this paper, we first discuss how to relax the condition of barrier certificates in a general way, while still keeping their convexity. Particularly, one can then utilize different weaker conditions flexibly to synthesize different kinds of barrier certificates with more expressiveness efficiently using SDP. These barriers give more opportunities to verify the considered system. We also show how to combine two functions together to form a combined barrier certificate in order to prove a safety property under consideration, whereas neither of them can be used as a barrier certificate separately, even according to any relaxed condition. Another contribution of this paper is that we discuss how to discover certificates from the general relaxed condition by SDP. In particular, we focus on how to avoid the unsoundness because of numeric error caused by SDP with symbolic checking.
Submitted 24 October, 2013;
originally announced October 2013.