-
Towards Interpretable PolSAR Image Classification: Polarimetric Scattering Mechanism Informed Concept Bottleneck and Kolmogorov-Arnold Network
Authors:
Jinqi Zhang,
Fangzhou Han,
Di Zhuang,
Lamei Zhang,
Bin Zou,
Li Yuan
Abstract:
In recent years, Deep Learning (DL) based methods have received extensive attention in the field of PolSAR image classification and have shown excellent performance. However, due to the "black-box" nature of DL methods, interpreting the extracted high-dimensional features and tracing back the decision-making process based on these features remain unresolved problems. In this study, we first highlight this issue and attempt an interpretability analysis of DL-based PolSAR image classification with the help of Polarimetric Target Decomposition (PTD), a feature extraction method tied to the scattering mechanisms unique to the PolSAR image processing field. By constructing polarimetric conceptual labels and a novel structure named Parallel Concept Bottleneck Networks (PaCBM), we transform the uninterpretable high-dimensional features into human-comprehensible concepts based on physically verifiable polarimetric scattering mechanisms. Then, the Kolmogorov-Arnold Network (KAN) replaces the Multi-Layer Perceptron (MLP) to achieve a more concise and understandable mapping between layers and further enhanced non-linear modeling ability. The experimental results on several PolSAR datasets show that the proposed pipeline can conceptualize the features while achieving satisfactory accuracy, and that an analytical function predicting category labels from conceptual labels can be obtained by combining spline functions, thus promoting research on the interpretability of DL-based PolSAR image classification models.
Submitted 4 July, 2025;
originally announced July 2025.
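The abstract above relies on KAN replacing the MLP's fixed activations with learnable univariate spline functions on each edge. The following is a minimal sketch of one KAN layer, assuming a degree-1 (piecewise-linear "hat") B-spline basis on a uniform grid for brevity; the original KAN formulation typically uses higher-order B-splines plus a base activation, and the names hat_basis, spline_edge, and kan_layer are hypothetical, not the authors' code.

```python
from typing import List


def hat_basis(x: float, grid: List[float]) -> List[float]:
    """Degree-1 B-spline ("hat") basis values at x on a uniform grid.

    Inputs are assumed to lie inside [grid[0], grid[-1]]; outside it every
    basis function is zero.
    """
    h = grid[1] - grid[0]  # uniform knot spacing
    return [max(0.0, 1.0 - abs(x - g) / h) for g in grid]


def spline_edge(x: float, coeffs: List[float], grid: List[float]) -> float:
    """Learnable univariate edge function phi(x) = sum_i c_i * B_i(x)."""
    return sum(c * b for c, b in zip(coeffs, hat_basis(x, grid)))


def kan_layer(xs: List[float], coeffs: List[List[List[float]]],
              grid: List[float]) -> List[float]:
    """One KAN layer: output j is the sum over inputs i of phi_{j,i}(x_i).

    coeffs[j][i] holds the spline coefficients of the edge from input i
    to output j (these would be the trainable parameters).
    """
    return [
        sum(spline_edge(x, coeffs[j][i], grid) for i, x in enumerate(xs))
        for j in range(len(coeffs))
    ]


if __name__ == "__main__":
    grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
    square = [g * g for g in grid]  # edge that interpolates x**2 on the grid
    ident = list(grid)              # edge that interpolates x
    # A single output node computing approximately x0**2 + x1
    print(kan_layer([0.5, 0.25], [[square, ident]], grid))
```

Because every edge is a one-dimensional function, each one can be plotted or fitted to a closed-form expression on its own, which is what makes an analytical mapping from concepts to labels possible in principle.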
-
MTSIC: Multi-stage Transformer-based GAN for Spectral Infrared Image Colorization
Authors:
Tingting Liu,
Yuan Liu,
Jinhui Tang,
Liyin Yuan,
Chengyu Liu,
Chunlai Li,
Xiubao Sui,
Qian Chen
Abstract:
Thermal infrared (TIR) images, acquired through thermal radiation imaging, are unaffected by variations in lighting conditions and atmospheric haze. However, TIR images inherently lack color and texture information, limiting downstream tasks and potentially causing visual fatigue. Existing colorization methods primarily rely on single-band images with limited spectral information and insufficient feature extraction capabilities, which often result in image distortion and semantic ambiguity. In contrast, multiband infrared imagery provides richer spectral data, facilitating the preservation of finer details and enhancing semantic accuracy. In this paper, we propose a generative adversarial network (GAN)-based framework designed to integrate spectral information to enhance the colorization of infrared images. The framework employs a multi-stage spectral self-attention Transformer network (MTSIC) as the generator. Each spectral feature is treated as a token for self-attention computation, and a multi-head self-attention mechanism forms a spatial-spectral attention residual block (SARB), achieving multi-band feature mapping and reducing semantic confusion. Multiple SARB units are integrated into a Transformer-based single-stage network (STformer), which uses a U-shaped architecture to extract contextual information, combined with multi-scale wavelet blocks (MSWB) to align semantic information in the spatial-frequency dual domain. Multiple STformer modules are cascaded to form MTSIC, progressively optimizing the reconstruction quality. Experimental results demonstrate that the proposed method significantly outperforms traditional techniques and effectively enhances the visual quality of infrared images.
Submitted 20 June, 2025;
originally announced June 2025.
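The MTSIC abstract treats each spectral feature as a token for self-attention. A minimal sketch of that idea follows, assuming a single head, identity query/key/value projections, and a 1/sqrt(N) scale; the real SARB block uses learned projections and multiple heads, and the function name spectral_self_attention is hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def spectral_self_attention(bands: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head attention in which each spectral band is one token.

    bands: C rows (one per band), each a flattened spatial map of length N.
    Returns (attention matrix C x C, output C x N).
    Simplification: identity Q/K/V projections and scale 1/sqrt(N).
    """
    c, n = len(bands), len(bands[0])
    scale = 1.0 / math.sqrt(n)
    # Band-to-band similarity: a C x C matrix, independent of image size
    scores = [[scale * sum(a * b for a, b in zip(bands[i], bands[j]))
               for j in range(c)] for i in range(c)]
    attn = [softmax(row) for row in scores]
    # Each output band is an attention-weighted mix of all input bands
    out = [[sum(attn[i][k] * bands[k][p] for k in range(c)) for p in range(n)]
           for i in range(c)]
    return attn, out


if __name__ == "__main__":
    # Three bands of a 2x2 image, flattened to length 4
    bands = [[1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 2.0, 2.0], [0.0, 1.0, 0.0, 1.0]]
    attn, out = spectral_self_attention(bands)
    print([round(sum(r), 6) for r in attn])  # each row of weights sums to 1
```

One reason to attend across bands rather than pixels is cost: the attention matrix is C x C, so it grows with the number of bands instead of quadratically with the number of pixels.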
-
RAISE: Optimizing RIS Placement to Maximize Task Throughput in Multi-Server Vehicular Edge Computing
Authors:
Yanan Ma,
Zhengru Fang,
Longzhi Yuan,
Yiqin Deng,
Xianhao Chen,
Yuguang Fang
Abstract:
Given the limited computing capabilities of autonomous vehicles, onboard processing of large volumes of latency-sensitive tasks presents significant challenges. While vehicular edge computing (VEC) has emerged as a solution, offloading data-intensive tasks to roadside servers or other vehicles is hindered by large obstacles like trucks/buses and the surge in service demands during rush hours. To address these challenges, a Reconfigurable Intelligent Surface (RIS) can be leveraged to mitigate interference from ground signals and reach more edge servers by adaptively elevating the RIS. To this end, we propose RAISE, an optimization framework for RIS placement in multi-server VEC systems. Specifically, RAISE optimizes RIS altitude and tilt angle jointly with task assignment to maximize task throughput under deadline constraints. To find a solution, a two-layer optimization approach is proposed, where the inner layer exploits the unimodularity of the task assignment problem to derive an efficient optimal strategy, while the outer layer develops a near-optimal, low-complexity hill climbing (HC) algorithm for RIS placement. Extensive experiments demonstrate that the proposed RAISE framework consistently outperforms existing benchmarks.
Submitted 22 March, 2025;
originally announced March 2025.
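The outer layer of RAISE is described as hill climbing over RIS placements. A minimal sketch of greedy hill climbing on a discrete (altitude, tilt) grid follows, assuming the inner task-assignment problem is available as a black-box score function; hill_climb and the toy score are hypothetical stand-ins, not the paper's algorithm or throughput model.

```python
from typing import Callable, Tuple

State = Tuple[int, int]  # (altitude index, tilt index) on a discrete grid


def hill_climb(score: Callable[[State], float], start: State,
               alt_range: int, tilt_range: int, max_iters: int = 1000) -> State:
    """Greedy hill climbing over a 2-D discrete grid of RIS placements.

    score(state) stands in for the inner layer: the best task throughput
    achievable for that placement, assumed to be solved exactly elsewhere.
    """
    current, best = start, score(start)
    for _ in range(max_iters):
        a, t = current
        neighbors = [(a + da, t + dt)
                     for da, dt in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= a + da < alt_range and 0 <= t + dt < tilt_range]
        if not neighbors:
            break  # degenerate 1x1 grid
        cand = max(neighbors, key=score)
        cand_score = score(cand)
        if cand_score <= best:
            break  # local optimum: no neighboring placement improves throughput
        current, best = cand, cand_score
    return current


if __name__ == "__main__":
    # Hypothetical throughput surface peaking at altitude 3, tilt 2
    toy = lambda s: -((s[0] - 3) ** 2) - (s[1] - 2) ** 2
    print(hill_climb(toy, (0, 0), alt_range=10, tilt_range=10))
```

Plain hill climbing only guarantees a local optimum; how close it gets to the global one depends on the shape of the throughput landscape, which is what the paper's near-optimality claim addresses.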
-
Resilient Quantized Consensus in Multi-Hop Relay Networks
Authors:
Liwei Yuan,
Hideaki Ishii
Abstract:
We study resilient quantized consensus in multi-agent systems, where some agents may malfunction. The network consists of agents taking integer-valued states, and the agents' communication is subject to asynchronous updates and time delays. We utilize the quantized weighted mean subsequence reduced algorithm where agents communicate with others through multi-hop relays. We prove necessary and sufficient conditions for our algorithm to achieve the objective under the malicious and Byzantine attack models. Our approach has tighter graph conditions compared to the one-hop algorithm and the flooding-based algorithms for binary consensus. Numerical examples verify the efficacy of our algorithm.
Submitted 12 February, 2025;
originally announced February 2025.
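The abstract builds on the quantized weighted mean subsequence reduced (W-MSR) algorithm. A minimal sketch of a single one-hop update for a normal agent follows, assuming equal weights and rounding down to keep states integer; the paper's multi-hop relay version removes values differently, and quantized_wmsr_step is a hypothetical name.

```python
from typing import List


def quantized_wmsr_step(own: int, received: List[int], f: int) -> int:
    """One quantized W-MSR-style update (one-hop, equal weights).

    1. Drop up to f of the largest received values strictly above own.
    2. Drop up to f of the smallest received values strictly below own.
    3. Average own value with the remaining values and round down,
       so the new state stays integer-valued.
    """
    larger = sorted(v for v in received if v > own)
    smaller = sorted(v for v in received if v < own)
    equal = [v for v in received if v == own]
    kept = smaller[f:] + equal + larger[:max(0, len(larger) - f)]
    values = kept + [own]
    return sum(values) // len(values)  # integer floor of the mean


if __name__ == "__main__":
    # With f = 1, the outlying value 100 (e.g. from a malicious agent) is discarded
    print(quantized_wmsr_step(5, [1, 4, 6, 100], f=1))
```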
-
Reaching Resilient Leader-Follower Consensus in Time-Varying Networks via Multi-Hop Relays
Authors:
Liwei Yuan,
Hideaki Ishii
Abstract:
We study resilient leader-follower consensus of multi-agent systems (MASs) in the presence of adversarial agents, where agents' communication is modeled by time-varying topologies. The objective is to develop distributed algorithms for the nonfaulty/normal followers to track an arbitrary reference value propagated by a set of leaders while they interact with the unknown adversarial agents. Our approaches are based on the weighted mean subsequence reduced (W-MSR) algorithms, with agents capable of communicating with multi-hop neighbors. Our algorithms can handle agents with first-order and second-order dynamics. Moreover, we characterize necessary and sufficient graph conditions for our algorithms to succeed in terms of the novel notion of jointly robust following graphs. Our graph condition is tighter than the sufficient conditions in the literature when agents use only one-hop communication (without relays). Using multi-hop relays, we can enhance the robustness of leader-follower networks without increasing communication links and obtain further relaxed graph requirements for our algorithms to succeed. Numerical examples are given to verify the efficacy of our algorithms.
Submitted 15 November, 2024;
originally announced November 2024.
-
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model
Authors:
Liuhan Chen,
Zongjian Li,
Bin Lin,
Bin Zhu,
Qian Wang,
Shenghai Yuan,
Xing Zhou,
Xinhua Cheng,
Li Yuan
Abstract:
Variational Autoencoder (VAE), which compresses videos into latent representations, is a crucial preceding component of Latent Video Diffusion Models (LVDMs). At the same reconstruction quality, the more thoroughly the VAE compresses videos, the more efficient the LVDMs are. However, most LVDMs utilize a 2D image VAE, which compresses videos only in the spatial dimension and ignores the temporal dimension. How to conduct temporal compression for videos in a VAE to obtain more concise latent representations while ensuring accurate reconstruction is seldom explored. To fill this gap, we propose an omni-dimension compression VAE, named OD-VAE, which can compress videos both temporally and spatially. Although OD-VAE's more thorough compression poses a great challenge to video reconstruction, it can still achieve high reconstruction accuracy thanks to our careful design. To obtain a better trade-off between video reconstruction quality and compression speed, four variants of OD-VAE are introduced and analyzed. In addition, a novel tail initialization is designed to train OD-VAE more efficiently, and a novel inference strategy is proposed to enable OD-VAE to handle videos of arbitrary length with limited GPU memory. Comprehensive experiments on video reconstruction and LVDM-based video generation demonstrate the effectiveness and efficiency of our proposed methods.
Submitted 9 September, 2024; v1 submitted 2 September, 2024;
originally announced September 2024.
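To make the efficiency argument in the OD-VAE abstract concrete, the sketch below computes latent shapes with and without temporal compression. It assumes a causal video VAE that keeps the first frame separately and compresses the remaining frames by t_factor; the factors 4 (time) and 8 (space) and the name latent_shape are illustrative assumptions, not values taken from the paper.

```python
from typing import Tuple


def latent_shape(frames: int, height: int, width: int,
                 t_factor: int = 4, s_factor: int = 8) -> Tuple[int, int, int]:
    """Latent (T', H', W') for a causal video VAE (illustrative factors).

    The first frame is encoded on its own; the remaining frames are
    compressed t_factor times in time. A 2D image VAE corresponds to
    t_factor = 1 (no temporal compression).
    """
    if (frames - 1) % t_factor or height % s_factor or width % s_factor:
        raise ValueError("input size not divisible by the compression factors")
    return 1 + (frames - 1) // t_factor, height // s_factor, width // s_factor


if __name__ == "__main__":
    t, h, w = latent_shape(17, 256, 256)                   # spatio-temporal
    t2, h2, w2 = latent_shape(17, 256, 256, t_factor=1)    # spatial only
    print((t, h, w), (t2, h2, w2), (t2 * h2 * w2) / (t * h * w))
```

Under these assumed factors, a 17-frame clip yields 5 latent frames instead of 17, so the diffusion model processes about 3.4 times fewer latent positions at the same spatial resolution.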
-
Resilient Average Consensus with Adversaries via Distributed Detection and Recovery
Authors:
Liwei Yuan,
Hideaki Ishii
Abstract:
We study the problem of resilient average consensus in multi-agent systems where some of the agents are subject to failures or attacks. The objective of resilient average consensus is for non-faulty/normal agents to converge to the average of their initial values despite the erroneous effects of malicious agents. To this end, we propose a distributed iterative resilient average consensus algorithm for multi-agent networks with general directed topologies. The proposed algorithm has two parts at each iteration: detection and averaging. For the detection part, we propose two distributed algorithms, one of which can detect malicious agents using only information from direct in-neighbors. For the averaging part, we extend the applicability of an existing averaging algorithm in which normal agents can remove the accumulated effects of malicious agents once they are detected. Another important feature of our method is that it can handle the case where malicious agents are neighbors and collaborate with each other to prevent the normal ones from averaging. This case cannot be solved by existing detection approaches in the related literature. Moreover, our algorithm is efficient in storage usage, especially for large-scale networks, as each agent only requires the values of neighbors within two hops. Lastly, numerical examples are given to verify the efficacy of the proposed algorithms.
Submitted 29 May, 2024;
originally announced May 2024.
-
EvaNet: Elevation-Guided Flood Extent Mapping on Earth Imagery (Extended Version)
Authors:
Mirza Tanzim Sami,
Da Yan,
Saugat Adhikari,
Lyuheng Yuan,
Jiao Han,
Zhe Jiang,
Jalal Khalil,
Yang Zhou
Abstract:
Accurate and timely mapping of flood extent from high-resolution satellite imagery plays a crucial role in disaster management tasks such as damage assessment and relief activities. However, current state-of-the-art solutions are based on U-Net, which cannot segment the flood pixels accurately due to ambiguous pixels (e.g., tree canopies, clouds) that prevent a direct judgment from the spectral features alone. Thanks to the digital elevation model (DEM) data readily available from sources such as the United States Geological Survey (USGS), this work explores the use of an elevation map to improve flood extent mapping. We propose EvaNet, an elevation-guided segmentation model based on the encoder-decoder architecture with two novel techniques: (1) a loss function encoding the physical law of gravity that if a location is flooded (resp. dry), then its adjacent locations with a lower (resp. higher) elevation must also be flooded (resp. dry); (2) a new (de)convolution operation that integrates the elevation map through a location-sensitive gating mechanism to regulate how much spectral feature information flows through adjacent layers. Extensive experiments show that EvaNet significantly outperforms the U-Net baselines and works as a perfect drop-in replacement for U-Net in existing solutions to flood extent mapping.
Submitted 25 September, 2024; v1 submitted 27 April, 2024;
originally announced April 2024.
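The gravity rule in the EvaNet abstract can be written as an inequality on flood probabilities: for adjacent cells where j is lower than i, p_j should be at least p_i. The dry case gives the same inequality, because a dry lower cell forces the higher neighbor to be dry. A minimal sketch of a hinge penalty for violations follows, assuming 4-neighbor adjacency; gravity_penalty is a hypothetical formulation, not EvaNet's actual loss.

```python
from typing import List

Grid = List[List[float]]


def gravity_penalty(prob: Grid, elev: Grid) -> float:
    """Sum of hinge violations of: flooded(i) and elev[j] < elev[i] => flooded(j).

    prob[r][c] is the predicted flood probability and elev[r][c] the DEM height.
    For each 4-neighbor pair (i, j) with j lower than i, add max(0, p_i - p_j).
    """
    rows, cols = len(prob), len(prob[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and elev[rr][cc] < elev[r][c]:
                    total += max(0.0, prob[r][c] - prob[rr][cc])
    return total


if __name__ == "__main__":
    elev = [[3.0, 2.0, 1.0]]                         # terrain slopes downhill to the right
    print(gravity_penalty([[0.9, 0.5, 0.8]], elev))  # middle cell violates the rule
    print(gravity_penalty([[0.1, 0.5, 0.9]], elev))  # physically consistent map: 0.0
```

In training, such a term would be added to a per-pixel segmentation loss so that predictions contradicting the elevation map are discouraged.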
-
Image Quality Assessment With Compressed Sampling
Authors:
Ronghua Liao,
Chen Hui,
Lang Yuan,
Haiqi Zhu,
Feng Jiang
Abstract:
No-Reference Image Quality Assessment (NR-IQA) aims to estimate image quality in accordance with subjective human perception. However, most methods focus on exploring increasingly complex networks to improve final performance, which comes with limitations on input images. Especially when applied to high-resolution (HR) images, these methods often have to resize the original image to meet the model input. To alleviate this issue, we propose two networks for NR-IQA with Compressive Sampling (dubbed CL-IQA and CS-IQA). They consist of four components: (1) the Compressed Sampling Module (CSM), which samples the image; (2) the Adaptive Embedding Module (AEM), which embeds the measurements to extract high-level features; (3) the Vision Transformer and Scale Swin Transformer Blocks Module (SSTM), which extract deep features; and (4) the Dual Branch (DB), which produces the final quality score. Experiments show that our proposed methods outperform other methods on various datasets with less data usage.
Submitted 11 September, 2024; v1 submitted 26 April, 2024;
originally announced April 2024.
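The Compressed Sampling Module in the abstract reduces the input before feature extraction. A minimal sketch of block-based compressed sampling follows, computing y_b = Phi * vec(x_b) for every B x B block with a shared random Gaussian matrix Phi. The Gaussian matrix, the sampling ratio parameter, and the names gaussian_matrix and block_cs_sample are assumptions for illustration; the paper's CSM may use a learned or different measurement matrix.

```python
import random
from typing import List


def gaussian_matrix(m: int, n: int, seed: int = 0) -> List[List[float]]:
    """Random M x N measurement matrix with i.i.d. N(0, 1/M) entries."""
    rng = random.Random(seed)
    sigma = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(m)]


def block_cs_sample(image: List[List[float]], block: int, ratio: float,
                    seed: int = 0) -> List[List[float]]:
    """Block-based compressed sampling: y_b = Phi @ vec(x_b) for each block.

    Returns one measurement vector of length M = round(ratio * block**2) per
    block, so the data passed on shrinks by roughly `ratio` without resizing
    the image. Border rows/columns that do not fill a whole block are skipped.
    """
    h, w = len(image), len(image[0])
    n = block * block
    m = max(1, round(ratio * n))
    phi = gaussian_matrix(m, n, seed)  # shared across all blocks
    measurements = []
    for r in range(0, h - h % block, block):
        for c in range(0, w - w % block, block):
            vec = [image[r + i][c + j] for i in range(block) for j in range(block)]
            measurements.append([sum(a * x for a, x in zip(row, vec)) for row in phi])
    return measurements


if __name__ == "__main__":
    img = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
    ys = block_cs_sample(img, block=4, ratio=0.25)
    print(len(ys), len(ys[0]))  # 4 blocks, 4 measurements each (64 -> 16 values)
```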
-
Dependability Evaluation of Stable Diffusion with Soft Errors on the Model Parameters
Authors:
Zhen Gao,
Lini Yuan,
Pedro Reviriego,
Shanshan Liu,
Fabrizio Lombardi
Abstract:
Stable Diffusion is a popular Transformer-based model for image generation from text; it applies an image information creator to the input text, and visual knowledge is added step by step to create an image that corresponds to the input text. However, this diffusion process can be corrupted by errors from the underlying hardware, which are especially relevant for implementations at the nanoscale. In this paper, the dependability of Stable Diffusion is studied with a focus on soft errors in the memory that stores the model parameters; specifically, errors are injected into some critical layers of the Transformer in different blocks of the image information creator to evaluate their impact on model performance. The simulation results reveal several conclusions: 1) errors on the down blocks of the creator have a larger impact on the quality of the generated images than those on the up blocks, while errors on the middle block have a negligible effect; 2) errors on the self-attention (SA) layers have a larger impact on the results than those on the cross-attention (CA) layers; 3) for CA layers, errors at deeper levels result in a larger impact; 4) errors on blocks at the first levels tend to introduce noise in the image, while those on deep layers tend to introduce large colored blocks. These results provide an initial understanding of the impact of errors on Stable Diffusion.
Submitted 30 March, 2024;
originally announced April 2024.
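Soft errors in parameter memory are commonly modeled as single bit flips in the stored weights. A minimal sketch of injecting such flips into float32 values follows, assuming IEEE 754 binary32 storage and uniformly random bit positions; flip_bit and inject_errors are hypothetical helpers, not the paper's injection framework.

```python
import random
import struct
from typing import List


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of a float32 value (bit 0 = mantissa LSB, bit 31 = sign)."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


def inject_errors(weights: List[float], n_errors: int, seed: int = 0) -> List[float]:
    """Return a copy of `weights` with n_errors random single-bit flips."""
    rng = random.Random(seed)
    out = list(weights)
    for _ in range(n_errors):
        idx = rng.randrange(len(out))
        out[idx] = flip_bit(out[idx], rng.randrange(32))
    return out


if __name__ == "__main__":
    print(flip_bit(1.0, 0))   # mantissa LSB: 1.0000001192092896 (tiny change)
    print(flip_bit(1.0, 31))  # sign bit: -1.0
    print(flip_bit(1.0, 30))  # exponent MSB: inf (catastrophic change)
```

The examples show why the bit position matters: a mantissa flip barely changes a weight, while a flip in the high exponent bits can turn it into an enormous value or infinity that propagates through later layers.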
-
Generative Enhancement for 3D Medical Images
Authors:
Lingting Zhu,
Noel Codella,
Dongdong Chen,
Zhenchao Jin,
Lu Yuan,
Lequan Yu
Abstract:
The limited availability of 3D medical image datasets, due to privacy concerns and high collection or annotation costs, poses significant challenges in the field of medical imaging. While a promising alternative is the use of synthesized medical data, there are few solutions for realistic 3D medical image synthesis due to difficulties in backbone design and fewer 3D training samples compared to 2D counterparts. In this paper, we propose GEM-3D, a novel generative approach to the synthesis of 3D medical images and the enhancement of existing datasets using conditional diffusion models. Our method begins with a 2D slice, denoted as the informed slice, which serves as the patient prior, and propagates the generation process using a 3D segmentation mask. By decomposing the 3D medical images into masks and patient prior information, GEM-3D offers a flexible yet effective solution for generating versatile 3D images from existing datasets. GEM-3D can enable dataset enhancement by combining informed slice selection and generation at random positions, along with editable mask volumes to introduce large variations in diffusion sampling. Moreover, as the informed slice contains patient-wise information, GEM-3D can also facilitate counterfactual image synthesis and dataset-level de-enhancement with desired control. Experiments on brain MRI and abdomen CT images demonstrate that GEM-3D is capable of synthesizing high-quality 3D medical images with volumetric consistency, offering a straightforward solution for dataset enhancement during inference. The code is available at https://github.com/HKU-MedAI/GEM-3D.
Submitted 24 May, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
Asynchronous Approximate Byzantine Consensus: A Multi-hop Relay Method and Tight Graph Conditions
Authors:
Liwei Yuan,
Hideaki Ishii
Abstract:
We study a multi-agent resilient consensus problem, where some agents are of the Byzantine type and try to prevent the normal ones from reaching consensus. In our setting, normal agents communicate with each other asynchronously over multi-hop relay channels with delays. To solve this asynchronous Byzantine consensus problem, we develop the multi-hop weighted mean subsequence reduced (MW-MSR) algorithm. The main contribution is that we characterize a tight graph condition for our algorithm to achieve Byzantine consensus, which is expressed in the novel notion of strictly robust graphs. We show that the multi-hop communication is effective for enhancing the network's resilience against Byzantine agents. As a result, we also obtain novel conditions for resilient consensus under the malicious attack model, which are tighter than those known in the literature. Furthermore, the proposed algorithm can be viewed as a generalization of the conventional flooding-based algorithms, with less computational complexity. Lastly, we provide numerical examples to show the effectiveness of the proposed algorithm.
Submitted 12 March, 2024;
originally announced March 2024.
-
Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment
Authors:
Yiju Guo,
Ganqu Cui,
Lifan Yuan,
Ning Ding,
Zexu Sun,
Bowen Sun,
Huimin Chen,
Ruobing Xie,
Jie Zhou,
Yankai Lin,
Zhiyuan Liu,
Maosong Sun
Abstract:
Alignment in artificial intelligence pursues consistency between model responses and human preferences as well as values. In practice, the multifaceted nature of human preferences inadvertently introduces what is known as the "alignment tax", a compromise where enhancements in alignment within one objective (e.g., harmlessness) can diminish performance in others (e.g., helpfulness). However, existing alignment techniques are mostly unidirectional, leading to suboptimal trade-offs and poor flexibility over various objectives. To navigate this challenge, we argue for the importance of grounding LLMs with evident preferences. We introduce controllable preference optimization (CPO), which explicitly specifies preference scores for different objectives, thereby guiding the model to generate responses that meet the requirements. Our experimental analysis reveals that the aligned models can provide responses that match various preferences among the "3H" (helpfulness, honesty, harmlessness) desiderata. Furthermore, by introducing diverse data and alignment goals, we surpass baseline methods in aligning with single objectives, hence mitigating the impact of the alignment tax and achieving improvements in multi-objective alignment.
Submitted 11 October, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
-
Knowledge Graph Driven UAV Cognitive Semantic Communication Systems for Efficient Object Detection
Authors:
Xi Song,
Lu Yuan,
Zhibo Qu,
Fuhui Zhou,
Qihui Wu,
Tony Q. S. Quek,
Rose Qingyang Hu
Abstract:
Unmanned aerial vehicles (UAVs) are widely used for object detection. However, existing UAV-based object detection systems face a serious challenge, namely limited computation, energy, and communication resources, which limit the achievable detection performance. To overcome this challenge, a UAV cognitive semantic communication system is proposed by exploiting a knowledge graph. Moreover, a multi-scale compression network is designed for semantic compression to reduce data transmission volume while guaranteeing detection performance. Furthermore, an object detection scheme is proposed that uses the knowledge graph to overcome channel noise interference and compression distortion. Simulation results on a practical aerial image dataset demonstrate that, compared to the benchmark systems, our proposed system has superior detection accuracy, communication robustness, and computation efficiency even under high compression rates and low signal-to-noise ratio (SNR) conditions.
Submitted 21 February, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
Human Sensing via Passive Spectrum Monitoring
Authors:
Huaizheng Mu,
Liangqi Yuan,
Jia Li
Abstract:
Human sensing is significantly improving our lifestyle in many fields such as elderly healthcare and public safety. Research has demonstrated that human activity can alter the passive radio frequency (PRF) spectrum, which represents the passive reception of RF signals in the surrounding environment without actively transmitting a target signal. This paper proposes a novel passive human sensing method that utilizes PRF spectrum alteration as a biometrics modality for human authentication, localization, and activity recognition. The proposed method uses software-defined radio (SDR) technology to acquire the PRF in the frequency band sensitive to human signature. Additionally, the PRF spectrum signatures are classified and regressed by five machine learning (ML) algorithms based on different human sensing tasks. The proposed Sensing Humans among Passive Radio Frequency (SHAPR) method was tested in several environments and scenarios, including a laboratory, a living room, a classroom, and a vehicle, to verify its generality. The experimental results show that the SHAPR method achieved more than 95% accuracy in the four scenarios for the three human sensing tasks, with a localization error of less than 0.8 m. These results indicate that the SHAPR technique can be considered a new human signature modality with high accuracy, robustness, and general applicability.
Submitted 27 June, 2023;
originally announced June 2023.
-
i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data
Authors:
Ziyi Yang,
Mahmoud Khademi,
Yichong Xu,
Reid Pryzant,
Yuwei Fang,
Chenguang Zhu,
Dongdong Chen,
Yao Qian,
Mei Gao,
Yi-Ling Chen,
Robert Gmyr,
Naoyuki Kanda,
Noel Codella,
Bin Xiao,
Yu Shi,
Lu Yuan,
Takuya Yoshioka,
Michael Zeng,
Xuedong Huang
Abstract:
The convergence of text, visual, and audio data is a key step towards human-like artificial intelligence; however, the current Vision-Language-Speech landscape is dominated by encoder-only models, which lack generative abilities. We propose closing this gap with i-Code V2, the first model capable of generating natural language from any combination of Vision, Language, and Speech data. i-Code V2 is an integrative system that leverages state-of-the-art single-modality encoders, combining their outputs with a new modality-fusing encoder in order to flexibly project combinations of modalities into a shared representational space. Next, language tokens are generated from these representations via an autoregressive decoder. The whole framework is pretrained end-to-end on a large collection of dual- and single-modality datasets using a novel text completion objective that can be generalized across arbitrary combinations of modalities. i-Code V2 matches or outperforms state-of-the-art single- and dual-modality baselines on 7 multimodal tasks, demonstrating the power of generative multimodal pretraining across a diversity of tasks and signals.
Submitted 20 May, 2023;
originally announced May 2023.
-
Smart Pressure e-Mat for Human Sleeping Posture and Dynamic Activity Recognition
Authors:
Liangqi Yuan,
Yuan Wei,
Jia Li
Abstract:
With the emphasis on healthcare, early childhood education, and fitness, non-invasive measurement and recognition methods have received more attention. Pressure sensing has been extensively studied because of its advantages of simple structure, easy access, visualization application, and harmlessness. This paper introduces a Smart Pressure e-Mat (SPeM) system based on piezoresistive material, Velostat, for human monitoring applications, including recognition of sleeping postures, sports, and yoga. After a subsystem scans the e-mat readings and processes the signal, it generates a pressure image stream. Deep neural networks (DNNs) are used to fit and train the pressure image stream and recognize the corresponding human behavior. Four sleeping postures and 13 dynamic activities inspired by Nintendo Switch Ring Fit Adventure (RFA) are used as a preliminary validation of the proposed SPeM system. The SPeM system achieves high accuracies in both applications, demonstrating the high accuracy and generalizability of the models. Compared with other pressure sensor-based systems, SPeM possesses more flexible applications and commercial application prospects, with reliable, robust, and repeatable properties.
Submitted 19 November, 2024; v1 submitted 18 May, 2023;
originally announced May 2023.
-
Passive Radio Frequency-based 3D Indoor Positioning System via Ensemble Learning
Authors:
Liangqi Yuan,
Houlin Chen,
Robert Ewing,
Jia Li
Abstract:
Passive radio frequency (PRF)-based indoor positioning systems (IPS) have attracted researchers' attention due to their low price, easy and customizable configuration, and non-invasive design. This paper proposes a PRF-based three-dimensional (3D) indoor positioning system (PIPS), which is able to use signals of opportunity (SoOP) for positioning and also capture a scenario signature. PIPS passively monitors SoOPs containing scenario signatures through a single receiver. Moreover, PIPS leverages the Dynamic Data Driven Applications System (DDDAS) framework to devise and customize the sampling frequency, enabling the system to use the most impacted frequency band as the rated frequency band. Various regression methods within three ensemble learning strategies are used to train and predict the receiver position. The PRF spectrum of 60 positions is collected in the experimental scenario, and three criteria are applied to evaluate the performance of PIPS. Experimental results show that the proposed PIPS possesses the advantages of high accuracy, configurability, and robustness.
Submitted 25 March, 2023;
originally announced April 2023.
-
i-Code: An Integrative and Composable Multimodal Learning Framework
Authors:
Ziyi Yang,
Yuwei Fang,
Chenguang Zhu,
Reid Pryzant,
Dongdong Chen,
Yu Shi,
Yichong Xu,
Yao Qian,
Mei Gao,
Yi-Ling Chen,
Liyang Lu,
Yujia Xie,
Robert Gmyr,
Noel Codella,
Naoyuki Kanda,
Bin Xiao,
Lu Yuan,
Takuya Yoshioka,
Michael Zeng,
Xuedong Huang
Abstract:
Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to maintain a holistic worldview. Most current pretraining methods, however, are limited to one or two modalities. We present i-Code, a self-supervised pretraining framework where users may flexibly combine the modalities of vision, speech, and language into unified and general-purpose vector representations. In this framework, data from each modality are first given to pretrained single-modality encoders. The encoder outputs are then integrated with a multimodal fusion network, which uses novel attention mechanisms and other architectural innovations to effectively combine information from the different modalities. The entire system is pretrained end-to-end with new objectives including masked modality unit modeling and cross-modality contrastive learning. Unlike previous research using only video for pretraining, the i-Code framework can dynamically process single, dual, and triple-modality data during training and inference, flexibly projecting different combinations of modalities into a single representation space. Experimental results demonstrate how i-Code can outperform state-of-the-art techniques on five video understanding tasks and the GLUE NLP benchmark, improving by as much as 11% and demonstrating the power of integrative multimodal pretraining.
Submitted 5 May, 2022; v1 submitted 3 May, 2022;
originally announced May 2022.
-
Event-triggered Approximate Byzantine Consensus with Multi-hop Communication
Authors:
Liwei Yuan,
Hideaki Ishii
Abstract:
In this paper, we consider a resilient consensus problem for a multi-agent network where some of the agents are subject to Byzantine attacks and may transmit erroneous state values to their neighbors. In particular, we develop an event-triggered update rule that tackles this problem and also reduces communication for each agent. Our approach is based on the mean subsequence reduced (MSR) algorithm, with agents capable of communicating with multi-hop neighbors. Since delays are critical in such an environment, we provide necessary graph conditions for the proposed algorithm to perform well with communication delays. We highlight that, through multi-hop communication, the required network connectivity can be reduced, especially in comparison with the common one-hop communication case. Lastly, we show the effectiveness of the proposed algorithm through a numerical example.
Submitted 19 April, 2022;
originally announced April 2022.
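The event-triggered idea in the abstract above can be illustrated with a minimal sketch. It assumes a common trigger of the form |x_i(k) - x̂_i| > c0 * alpha^k, where x̂_i is the last value agent i broadcast. The threshold form and the names `should_transmit`, `transmission_times`, `c0` and `alpha` are hypothetical choices for illustration, not the rule from the paper. The MSR trimming step is also omitted here.

```python
from typing import List


def should_transmit(current: float, last_sent: float, k: int,
                    c0: float = 0.5, alpha: float = 0.8) -> bool:
    """Hypothetical trigger: broadcast when the state has drifted from the last
    transmitted value by more than the decaying threshold c0 * alpha**k."""
    return abs(current - last_sent) > c0 * alpha ** k


def transmission_times(trajectory: List[float],
                       c0: float = 0.5, alpha: float = 0.8) -> List[int]:
    """Return the time steps at which one agent would broadcast its state."""
    last_sent = trajectory[0]
    times = [0]  # assume the initial state is always broadcast
    for k, x in enumerate(trajectory[1:], start=1):
        if should_transmit(x, last_sent, k, c0, alpha):
            last_sent = x  # neighbors hold this value until the next event
            times.append(k)
    return times


print(transmission_times([0.0, 0.1, 0.2, 1.0, 1.0]))  # [0, 3]
```

Between events, neighbors keep using the last broadcast value, which is where the communication savings come from. A decaying threshold is one design choice that still lets the states settle. A constant threshold would instead leave a residual disagreement of roughly that size.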
-
Towards Best Practice of Interpreting Deep Learning Models for EEG-based Brain Computer Interfaces
Authors:
Jian Cui,
Liqiang Yuan,
Zhaoxiang Wang,
Ruilin Li,
Tianzi Jiang
Abstract:
As deep learning has achieved state-of-the-art performance on many EEG-based BCI tasks, many efforts have been made in recent years to understand what the models have learned. This is commonly done by generating a heatmap indicating the extent to which each pixel of the input contributes to the final classification of a trained model. Despite their wide use, it is not yet understood to what extent the obtained interpretation results can be trusted and how accurately they reflect the model decisions. To fill this research gap, we conduct a study to evaluate different deep interpretation techniques quantitatively on EEG datasets. The results reveal the importance of selecting a proper interpretation technique as the initial step. In addition, we find that the quality of the interpretation results is inconsistent across individual samples, even when a method with good overall performance is used. Many factors, including model structure and dataset type, could potentially affect the quality of the interpretation results. Based on these observations, we propose a set of procedures that allow the interpretation results to be presented in an understandable and trustworthy way. We illustrate the usefulness of our method for EEG-based BCI with instances selected from different scenarios.
Submitted 17 April, 2023; v1 submitted 12 February, 2022;
originally announced February 2022.
-
Resilient Consensus with Multi-hop Communication
Authors:
Liwei Yuan,
Hideaki Ishii
Abstract:
In this paper, we study the problem of resilient consensus for a multi-agent network where some of the nodes might be adversarial, attempting to prevent consensus by transmitting faulty values. Our approach is based on the so-called weighted mean subsequence reduced (W-MSR) algorithm, with a special emphasis on its use by agents capable of communicating with multi-hop neighbors. The MSR algorithm is a powerful tool for achieving resilient consensus under minimal requirements on network structure, characterized by the class of robust graphs. Our analysis highlights that, through multi-hop communication, the required network connectivity can be reduced, especially in comparison with the common one-hop communication case. Moreover, we analyze the multi-hop W-MSR algorithm with communication delays, since the values from different multi-hop neighbors may arrive at the agents at different time steps.
Submitted 10 January, 2022;
originally announced January 2022.
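The multi-hop variant builds on the core one-hop W-MSR update. That update can be sketched as follows, assuming each normal agent knows the parameter F (the maximum number of faulty neighbors to tolerate) and uses equal weights over the values it retains. The function name `wmsr_update` is hypothetical. The multi-hop path bookkeeping and the delays analyzed in the paper are not modeled.

```python
from typing import List


def wmsr_update(own: float, neighbor_values: List[float], F: int) -> float:
    """One equal-weight W-MSR step for a single normal agent."""
    larger = sorted(v for v in neighbor_values if v > own)
    smaller = sorted(v for v in neighbor_values if v < own)
    equal = [v for v in neighbor_values if v == own]
    # Discard up to F values strictly larger than our own (the largest ones) ...
    kept_larger = larger[:max(len(larger) - F, 0)]
    # ... and up to F values strictly smaller than our own (the smallest ones).
    kept_smaller = smaller[min(F, len(smaller)):]
    kept = [own] + kept_smaller + equal + kept_larger
    return sum(kept) / len(kept)  # equal weights over the retained values


# With F = 1, both 10.0 (largest) and 0.4 (smallest) are discarded.
print(wmsr_update(0.5, [0.4, 0.6, 10.0], F=1))
```

Trimming is done relative to the agent's own value. An extreme value from a possibly faulty neighbor therefore cannot pull the update outside the range spanned by the retained values.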
-
KFWC: A Knowledge-Driven Deep Learning Model for Fine-grained Classification of Wet-AMD
Authors:
Haihong E,
Jiawen He,
Tianyi Hu,
Lifei Wang,
Lifei Yuan,
Ruru Zhang,
Meina Song
Abstract:
Automated diagnosis using deep neural networks can help ophthalmologists detect the blinding eye disease wet Age-related Macular Degeneration (AMD). Wet-AMD has two similar subtypes, Neovascular AMD and Polypoidal Choroidal Vasculopathy (PCV). However, due to the difficulty of data collection and the similarity between images, most studies have achieved only coarse-grained classification of wet-AMD rather than finer-grained classification of wet-AMD subtypes. To solve this issue, we propose a Knowledge-driven Fine-grained Wet-AMD Classification Model (KFWC) to classify fine-grained diseases with insufficient data. By introducing a priori knowledge of 10 lesion signs of the input images into the KFWC, we aim to accelerate the KFWC through multi-label classification pre-training and to locate the decisive image features in the fine-grained disease classification task, thereby achieving better classification. At the same time, the KFWC provides good interpretability and effectively alleviates the burden of data collection and annotation in fine-grained disease classification for wet-AMD. Experiments demonstrate the effectiveness of the KFWC, which reaches an AU-ROC score of 99.71%, with considerable improvements over the data-driven variant without knowledge (w/o Knowledge) and over ophthalmologists: 6.69% over the strongest baseline and 4.14% over ophthalmologists.
Submitted 23 December, 2021;
originally announced December 2021.
-
Automatic Modulation Classification Using Involution Enabled Residual Networks
Authors:
Hao Zhang,
Lu Yuan,
Guangyu Wu,
Fuhui Zhou,
Qihui Wu
Abstract:
Automatic modulation classification (AMC) is of crucial importance for realizing intelligent wireless communications. Many deep learning-based models, especially convolutional neural networks (CNNs), have been proposed for AMC. However, their computational cost is very high, which makes them unsuitable for beyond-fifth-generation wireless communication networks that have stringent requirements on classification accuracy and computing time. To tackle these challenges, a novel involution-enabled AMC scheme is proposed using the bottleneck structure of residual networks. Involution is used instead of convolution to enhance the discrimination capability and expressiveness of the model by incorporating a self-attention mechanism. Simulation results demonstrate that the proposed scheme achieves superior classification performance and faster convergence compared with other benchmark schemes.
Submitted 23 August, 2021;
originally announced August 2021.
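Involution inverts the usual convolution design. The kernel is generated at each spatial location from the feature vector at that location, and the same kernel is shared across channels. The sketch below is a minimal single-group NumPy version. It assumes a single linear map `w_gen` as the kernel generator, which is a simplification. The residual-bottleneck placement used in the AMC scheme above is not shown, and all names are hypothetical.

```python
import numpy as np


def involution2d(x: np.ndarray, w_gen: np.ndarray, k: int = 3) -> np.ndarray:
    """Single-group involution.

    x:     feature map of shape (C, H, W)
    w_gen: hypothetical kernel-generating weights of shape (k*k, C)
    """
    c, h, w = x.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    out = np.zeros((c, h, w))
    for i in range(h):
        for j in range(w):
            # The kernel is computed from the feature vector at (i, j) itself,
            # so it differs from pixel to pixel (spatially specific).
            kernel = (w_gen @ x[:, i, j]).reshape(k, k)
            patch = xp[:, i:i + k, j:j + k]  # (C, k, k) neighborhood
            # The same kernel is applied to every channel (channel-agnostic).
            out[:, i, j] = (patch * kernel).sum(axis=(1, 2))
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w_gen = 0.1 * rng.standard_normal((9, 4))
print(involution2d(x, w_gen).shape)  # (4, 8, 8)
```

The weights are input-dependent, as in attention, but the number of parameters is independent of the number of output channels. This is the property usually credited with lowering compute relative to a comparable convolution.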
-
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Authors:
Hassan Akbari,
Liangzhe Yuan,
Rui Qian,
Wei-Hong Chuang,
Shih-Fu Chang,
Yin Cui,
Boqing Gong
Abstract:
We present a framework for learning multimodal representations from unlabeled data using convolution-free Transformer architectures. Specifically, our Video-Audio-Text Transformer (VATT) takes raw signals as inputs and extracts multimodal representations that are rich enough to benefit a variety of downstream tasks. We train VATT end-to-end from scratch using multimodal contrastive losses and evaluate its performance by the downstream tasks of video action recognition, audio event classification, image classification, and text-to-video retrieval. Furthermore, we study a modality-agnostic, single-backbone Transformer by sharing weights among the three modalities. We show that the convolution-free VATT outperforms state-of-the-art ConvNet-based architectures in the downstream tasks. Especially, VATT's vision Transformer achieves the top-1 accuracy of 82.1% on Kinetics-400, 83.6% on Kinetics-600, 72.7% on Kinetics-700, and 41.1% on Moments in Time, new records while avoiding supervised pre-training. Transferring to image classification leads to 78.7% top-1 accuracy on ImageNet compared to 64.7% by training the same Transformer from scratch, showing the generalizability of our model despite the domain gap between videos and images. VATT's audio Transformer also sets a new record on waveform-based audio event recognition by achieving the mAP of 39.4% on AudioSet without any supervised pre-training. VATT's source code is publicly available.
Submitted 6 December, 2021; v1 submitted 22 April, 2021;
originally announced April 2021.
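A multimodal contrastive objective can be illustrated with a generic symmetric InfoNCE loss between two batches of paired embeddings, for example video and audio clips from the same time span. This is a minimal NumPy sketch under that assumption, not VATT's exact loss. The temperature value and the function names are illustrative choices.

```python
import numpy as np


def info_nce(a: np.ndarray, b: np.ndarray, temperature: float = 0.07) -> float:
    """InfoNCE loss where row i of `a` is the positive pair of row i of `b`."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)  # cosine similarity
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # (N, N); positives lie on the diagonal
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def symmetric_info_nce(a: np.ndarray, b: np.ndarray, temperature: float = 0.07) -> float:
    """Average the a->b and b->a directions, as is common for two modalities."""
    return 0.5 * (info_nce(a, b, temperature) + info_nce(b, a, temperature))


video = np.eye(4)
audio_aligned = np.eye(4)                  # each clip matches its own audio
audio_shuffled = np.eye(4)[[1, 0, 3, 2]]  # pairs deliberately mismatched
print(symmetric_info_nce(video, audio_aligned) < symmetric_info_nce(video, audio_shuffled))  # True
```

The other items in the batch serve as negatives, so no labels are needed. This is what lets such models train from unlabeled video.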
-
AutoFlow: Hotspot-Aware, Dynamic Load Balancing for Distributed Stream Processing
Authors:
Pengqi Lu,
Liang Yuan,
Yunquan Zhang,
Hang Cao,
Kun Li
Abstract:
Stream applications are widely deployed in the cloud. While modern distributed streaming systems such as Flink and Spark Streaming can schedule and execute them efficiently, streaming dataflows often change dynamically, which may cause computation imbalance and backpressure. We introduce AutoFlow, an automatic, hotspot-aware dynamic load-balancing system for streaming dataflows. It incorporates a centralized scheduler that dynamically monitors the load balance across the entire dataflow and performs state migrations accordingly. The scheduler achieves these two tasks using a simple asynchronous distributed control-message mechanism and a hotspot-diminishing algorithm. The timing mechanism supports implicit barriers and highly efficient state migration without global barriers or operator pauses. It also supports time-window-based load-balance measurement and feeds these measurements to the hotspot-diminishing algorithm without user intervention. We implemented AutoFlow on top of Ray, an actor-based distributed execution framework. Our evaluation on various streaming benchmark datasets shows that AutoFlow achieves good load balance and incurs low latency overhead under highly skewed workloads.
Submitted 16 March, 2021;
originally announced March 2021.
-
Secure Consensus with Distributed Detection via Two-hop Communication
Authors:
Liwei Yuan,
Hideaki Ishii
Abstract:
In this paper, we consider a multi-agent resilient consensus problem, where some of the nodes may behave maliciously. The approach is to equip all nodes with a scheme to detect neighboring nodes when they behave in an abnormal fashion. To this end, the nodes exchange not only their own states but also information regarding their neighbor nodes. Such two-hop communication has long been studied in fault-tolerant algorithms in computer science. We propose two distributed schemes for the detection of malicious nodes and resilient consensus, with different requirements on communication resources and network structure. In particular, the detection schemes become effective under certain connectivity properties of the network, so that the non-malicious nodes can share enough information about their neighbors. It is shown, however, that these requirements are less stringent than those for conventional algorithms. A numerical example is presented to demonstrate the performance of the proposed methods in wireless sensor networks.
Submitted 13 January, 2021;
originally announced January 2021.
-
Exploiting Shared Knowledge from Non-COVID Lesions for Annotation-Efficient COVID-19 CT Lung Infection Segmentation
Authors:
Yichi Zhang,
Qingcheng Liao,
Lin Yuan,
He Zhu,
Jiezhen Xing,
Jicong Zhang
Abstract:
The novel coronavirus disease (COVID-19) is highly contagious and has spread all over the world, posing an extremely serious threat to all countries. Automatic lung infection segmentation from computed tomography (CT) plays an important role in the quantitative analysis of COVID-19. However, the major challenge lies in the scarcity of annotated COVID-19 datasets. Currently, there are several public non-COVID lung lesion segmentation datasets, which offer the potential to transfer useful information to the related COVID-19 segmentation task. In this paper, we propose a novel relation-driven collaborative learning model to exploit shared knowledge from non-COVID lesions for annotation-efficient COVID-19 CT lung infection segmentation. The model consists of a general encoder that captures general lung lesion features based on multiple non-COVID lesions, and a target encoder that focuses on task-specific features based on COVID-19 infections. Features extracted from the two parallel encoders are concatenated for the subsequent decoder. We develop a collaborative learning scheme to regularize the feature-level relation consistency of a given input and encourage the model to learn more general and discriminative representations of COVID-19 infections. Extensive experiments demonstrate that, when trained with limited COVID-19 data, exploiting shared knowledge from non-COVID lesions can further improve state-of-the-art performance by up to 3.0% in Dice similarity coefficient and 4.2% in normalized surface Dice. Our proposed method offers new insights into annotation-efficient deep learning for COVID-19 infection segmentation and shows strong potential for real-world applications in the global fight against COVID-19 in the absence of sufficient high-quality annotations.
Submitted 27 July, 2021; v1 submitted 31 December, 2020;
originally announced December 2020.
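The headline metric above, the Dice similarity coefficient, is DSC = 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B. A minimal NumPy sketch for binary masks follows. The small `eps` guard against empty masks is an implementation choice, not something taken from the paper. The normalized surface Dice, which depends on a boundary tolerance, is not shown.

```python
import numpy as np


def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Dice similarity coefficient between two binary segmentation masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    # eps avoids division by zero when both masks are empty (the result is then 0).
    return float(2.0 * intersection / (pred.sum() + target.sum() + eps))


pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
print(round(dice_coefficient(pred, target), 4))  # 0.6667
```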
-
Fooling the primate brain with minimal, targeted image manipulation
Authors:
Li Yuan,
Will Xiao,
Giorgia Dellaferrera,
Gabriel Kreiman,
Francis E. H. Tay,
Jiashi Feng,
Margaret S. Livingstone
Abstract:
Artificial neural networks (ANNs) are considered the current best models of biological vision. ANNs are the best predictors of neural activity in the ventral stream; moreover, recent work has demonstrated that ANN models fitted to neuronal activity can guide the synthesis of images that drive pre-specified response patterns in small neuronal populations. Despite this success in predicting and steering firing activity, these results have not been connected with perceptual or behavioral changes. Here we propose an array of methods for creating minimal, targeted image perturbations that lead to changes in both neuronal activity and perception, as reflected in behavior. We generated 'deceptive images' of human faces, monkey faces, and noise patterns so that they are perceived as a different, pre-specified target category, and measured both monkey neuronal responses and human behavior in response to these images. We found several effective methods for changing primate visual categorization that required much smaller image changes than untargeted noise. Our work shares the same goal as adversarial attacks, namely the manipulation of images with minimal, targeted noise that leads ANN models to misclassify the images. Our results represent a valuable step in quantifying and characterizing the differences in perturbation robustness between biological and artificial vision.
Submitted 30 March, 2022; v1 submitted 11 November, 2020;
originally announced November 2020.
-
Ultrasonic and Electromagnetic Sensors for Downhole Reservoir Characterization
Authors:
K. Wang,
H. T. Chien,
S. Liao,
L. P. Yuan,
S. H. Sheen,
S. Bakhtiari,
A. C. Raptis
Abstract:
The current work covers the evaluation of ultrasonic and electromagnetic (EM) techniques applied to temperature measurement and flow characterization for Enhanced Geothermal Systems (EGS). We have evaluated both ultrasonic techniques and microwave radiometry for temperature gradient and profile measurements. A waveguide-based ultrasonic probe was developed to measure the temperature gradient. A statistical approach for estimating the average grain size via spectral analysis of the scattered ultrasonic signals is introduced. For directional temperature measurement, different microwave antenna designs are compared numerically, and an array loop antenna design is selected for further development. Finally, techniques to characterize the porosity and permeability of a hot dry rock resource are presented.
Submitted 30 June, 2020;
originally announced July 2020.
-
Projection Inpainting Using Partial Convolution for Metal Artifact Reduction
Authors:
Lin Yuan,
Yixing Huang,
Andreas Maier
Abstract:
In computed tomography (CT), reconstructed images suffer from metal artifacts when metal implants are present in the patient's body. To reduce metal artifacts, metals are typically removed from the projection images, so the metal-corrupted projection areas need to be inpainted. Among deep learning inpainting methods, convolutional neural networks (CNNs) such as the U-Net are widely used. However, such CNNs compute convolutional filter responses over both valid and corrupted pixel values, resulting in unsatisfactory image quality. In this work, partial convolution, which relies only on valid pixel values, is applied to projection inpainting. U-Nets with partial convolution and with conventional convolution are compared for metal artifact reduction. Our experiments demonstrate that the U-Net with partial convolution inpaints the metal-corrupted areas better than the one with conventional convolution.
Submitted 2 May, 2020;
originally announced May 2020.
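Partial convolution, as commonly formulated for image inpainting, works in three steps. It convolves only the valid (unmasked) pixels, rescales the result by the fraction of valid pixels in each window, and marks an output location as valid if its window contained at least one valid input. The sketch below is a minimal single-channel NumPy version of one such layer. It assumes the metal-trace mask is 1 for valid and 0 for corrupted detector pixels. The function and parameter names are hypothetical.

```python
import numpy as np


def partial_conv2d(x: np.ndarray, mask: np.ndarray, weight: np.ndarray,
                   bias: float = 0.0):
    """One partial-convolution layer on a single-channel image.

    x:      image of shape (H, W)
    mask:   1 for valid pixels, 0 for corrupted (e.g., metal-trace) pixels
    weight: odd-sized square kernel of shape (k, k)
    Returns the output image and the updated mask.
    """
    k = weight.shape[0]
    pad = k // 2
    xp = np.pad(x * mask, pad)  # corrupted values never reach the filter
    mp = np.pad(mask.astype(float), pad)  # zero padding is treated as invalid
    h, w = x.shape
    out = np.zeros((h, w))
    new_mask = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            valid = mp[i:i + k, j:j + k].sum()
            if valid > 0:
                window = xp[i:i + k, j:j + k]
                # Rescale by (window size / number of valid pixels).
                out[i, j] = (weight * window).sum() * (k * k / valid) + bias
                new_mask[i, j] = 1.0  # this location is now considered filled
    return out, new_mask


projection = np.ones((5, 5))
mask = np.ones((5, 5))
mask[2, 2] = 0  # one corrupted detector pixel
out, new_mask = partial_conv2d(projection, mask, np.full((3, 3), 1 / 9))
print(out[2, 2], new_mask[2, 2])
```

Stacking such layers shrinks the hole step by step, because the updated mask marks newly filled locations as valid for the next layer. In a U-Net, this replaces each ordinary convolution.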
-
Cross-domain Correspondence Learning for Exemplar-based Image Translation
Authors:
Pan Zhang,
Bo Zhang,
Dong Chen,
Lu Yuan,
Fang Wen
Abstract:
We present a general framework for exemplar-based image translation, which synthesizes a photo-realistic image from an input in a distinct domain (e.g., a semantic segmentation mask, edge map, or pose keypoints), given an exemplar image. The output's style (e.g., color, texture) is consistent with the semantically corresponding objects in the exemplar. We propose to jointly learn the cross-domain correspondence and the image translation, where the two tasks facilitate each other and can thus be learned with weak supervision. The images from distinct domains are first aligned to an intermediate domain where dense correspondence is established. Then, the network synthesizes images based on the appearance of semantically corresponding patches in the exemplar. We demonstrate the effectiveness of our approach on several image translation tasks. Our method significantly outperforms state-of-the-art methods in image quality, with the image style faithful to the exemplar and semantically consistent. Moreover, we show the utility of our method for several applications.
Submitted 12 April, 2020;
originally announced April 2020.
-
Modular Deep Reinforcement Learning with Temporal Logic Specifications
Authors:
Lim Zun Yuan,
Mohammadhosein Hasanbeig,
Alessandro Abate,
Daniel Kroening
Abstract:
We propose an actor-critic, model-free, and online Reinforcement Learning (RL) framework for continuous-state, continuous-action Markov Decision Processes (MDPs) when the reward is highly sparse but encompasses a high-level temporal structure. We represent this temporal structure by a finite-state machine and construct, on the fly, a synchronised product of the MDP and the finite-state machine. The temporal structure acts as a guide for the RL agent within the product, where a modular Deep Deterministic Policy Gradient (DDPG) architecture is proposed to generate a low-level control policy. We evaluate our framework in a Mars rover experiment and present the success rate of the synthesised policy.
Submitted 22 November, 2019; v1 submitted 23 September, 2019;
originally announced September 2019.
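The on-the-fly product described above can be sketched as a wrapper around an environment step. The finite-state machine advances on the label of each new MDP state, and the agent observes the pair (s, q). This minimal Python sketch makes several assumptions. It uses a toy 1-D environment, a two-step task ("reach A, then reach B"), and a sparse reward of 1 on entering an accepting automaton state. All names are hypothetical. In a modular DDPG architecture, a separate actor-critic module would be selected by q, which is not implemented here.

```python
from typing import Callable, Dict, Set, Tuple


def product_step(s: int, q: int, a: int,
                 env_step: Callable[[int, int], int],
                 label: Callable[[int], str],
                 delta: Dict[Tuple[int, str], int],
                 accepting: Set[int]) -> Tuple[Tuple[int, int], float]:
    """Advance the MDP and the finite-state machine together (synchronised product)."""
    s_next = env_step(s, a)
    # The automaton reads the label of the new MDP state; unspecified
    # transitions keep the automaton in its current state.
    q_next = delta.get((q, label(s_next)), q)
    reward = 1.0 if q_next in accepting else 0.0  # sparse, task-level reward
    return (s_next, q_next), reward


def toy_env_step(s: int, a: int) -> int:
    return s + a  # toy 1-D world: the action is a displacement


def toy_label(s: int) -> str:
    return "A" if s >= 2 else ("B" if s <= -2 else "")


DELTA = {(0, "A"): 1, (1, "B"): 2}  # q0 --A--> q1 --B--> q2
ACCEPTING = {2}

state, rewards = (0, 0), []
for action in [1, 1, -1, -1, -1, -1]:
    state, r = product_step(*state, action, toy_env_step, toy_label, DELTA, ACCEPTING)
    rewards.append(r)
print(state, rewards)  # (-2, 2) [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
```

Reaching B before A earns nothing, because the automaton is still in q0. The temporal order is therefore enforced by the product state rather than by reward shaping.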