
Oscillatory Associative Memory with Exponential Capacity

Authors: Taosha Guo, Arie Ogranovich, Arvind R. Venkatakrishnan, Madelyn R. Shapiro, Francesco Bullo, Fabio Pasqualetti

Abstract: The slowing of Moore's law and the increasing energy demands of machine learning present critical challenges for both the hardware and machine learning communities, and drive the development of novel computing paradigms. Of particular interest is the challenge of incorporating memory efficiently into the learning process. Inspired by how human brains store and retrieve information, associative memory mechanisms provide a class of computational methods that can store and retrieve patterns in a robust, energy-efficient manner. Existing associative memory architectures, such as the celebrated Hopfield model and oscillatory associative memory networks, store patterns as stable equilibria of network dynamics. However, the capacity (i.e., the number of patterns that a network can memorize, normalized by its number of nodes) of existing oscillatory models has been shown to decrease with the size of the network, making them impractical for large-scale, real-world applications. In this paper, we propose a novel associative memory architecture based on Kuramoto oscillators. We show that the capacity of our associative memory network increases exponentially with network size and that the network features no spurious memories. In addition, we present algorithms and numerical experiments to support these theoretical findings, providing guidelines for the hardware implementation of the proposed associative memory networks.

Submitted 3 April, 2025; originally announced April 2025.

Comments: 7 pages, 5 figures
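
For readers unfamiliar with the underlying dynamics, the following is a minimal sketch of the classical all-to-all Kuramoto model, dθ_i/dt = ω_i + (K/N) Σ_j sin(θ_j − θ_i), integrated with forward Euler. It shows only the textbook model, not the associative memory architecture proposed in the paper; the coupling strength, step size, and function names are illustrative assumptions. The order parameter r measures phase coherence (r = 1 means full synchrony).

```python
import math

def kuramoto_step(theta, omega, k, dt):
    """One forward-Euler step of the classical all-to-all Kuramoto model."""
    n = len(theta)
    new_theta = []
    for i in range(n):
        # Sum of sinusoidal phase differences from every oscillator to oscillator i.
        coupling = sum(math.sin(theta[j] - theta[i]) for j in range(n))
        new_theta.append(theta[i] + dt * (omega[i] + (k / n) * coupling))
    return new_theta

def order_parameter(theta):
    """Magnitude r of the mean phase vector; r close to 1 means synchronized."""
    n = len(theta)
    re = sum(math.cos(t) for t in theta) / n
    im = sum(math.sin(t) for t in theta) / n
    return math.hypot(re, im)

def simulate(theta0, omega, k, dt=0.01, steps=2000):
    """Integrate the model for a fixed number of Euler steps (illustrative defaults)."""
    theta = list(theta0)
    for _ in range(steps):
        theta = kuramoto_step(theta, omega, k, dt)
    return theta
```

With identical natural frequencies and initial phases inside a half circle, sufficiently strong coupling drives r toward 1.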

arXiv:2503.17467

doi 10.1109/TCSVT.2025.3552049

High Efficiency Wiener Filter-based Point Cloud Quality Enhancement for MPEG G-PCC

Authors: Yuxuan Wei, Zehan Wang, Tian Guo, Hao Liu, Liquan Shen, Hui Yuan

Abstract: Point clouds, which directly record the geometry and attributes of scenes or objects by a large number of points, are widely used in various applications such as virtual reality and immersive communication. However, due to the huge data volume and unstructured geometry, efficient compression of point clouds is crucial. In recent years, the Moving Picture Experts Group has been developing a geometry-based point cloud compression (G-PCC) standard for both static and dynamic point clouds. Although lossy compression with G-PCC can achieve a very high compression ratio, the reconstruction quality is relatively low, especially at low bitrates. To mitigate this problem, we propose a high efficiency Wiener filter that can be integrated into the encoder and decoder pipeline of G-PCC to improve the reconstruction quality as well as the rate-distortion performance for dynamic point clouds. Specifically, we first propose a basic Wiener filter, and then improve it by introducing coefficient inheritance and variance-based point classification for the Luma component. In addition, to reduce the complexity of the nearest neighbor search during the application of the Wiener filter, we propose a Morton code-based fast nearest neighbor search algorithm for efficient calculation of filter coefficients.
Experimental results demonstrate that the proposed method achieves average Bjøntegaard delta rates of -6.1%, -7.3%, and -8.0% for the Luma, Chroma Cb, and Chroma Cr components, respectively, under the lossless-geometry-lossy-attributes configuration compared to the latest G-PCC encoding platform (i.e., geometry-based solid content test model version 7.0 release candidate 2), at an affordable computational complexity.

Submitted 21 March, 2025; originally announced March 2025.
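
The Morton-code-based neighbor search mentioned above builds on a standard idea: interleave the bits of integer x, y, z coordinates into a single Z-order key, sort points by that key, and look for neighbors only among points that are close in the sorted order. The sketch below assumes non-negative integer coordinates below 2^bits and an arbitrary search window; it is an approximate search for illustration and is not claimed to be the paper's algorithm.

```python
def part1by2(v, bits=10):
    """Spread the low `bits` bits of v so two zero bits separate each original bit."""
    out = 0
    for b in range(bits):
        out |= ((v >> b) & 1) << (3 * b)
    return out

def morton3d(x, y, z, bits=10):
    """Interleave coordinate bits as ... z1 y1 x1 z0 y0 x0 (Z-order key)."""
    return part1by2(x, bits) | (part1by2(y, bits) << 1) | (part1by2(z, bits) << 2)

def approx_knn(points, k, window=8, bits=10):
    """For each point index, return its k nearest neighbors among the points
    within `window` positions of it in Morton order (approximate search)."""
    order = sorted(range(len(points)), key=lambda i: morton3d(*points[i], bits=bits))
    result = {}
    for pos, i in enumerate(order):
        lo, hi = max(0, pos - window), min(len(order), pos + window + 1)
        cands = [order[q] for q in range(lo, hi) if order[q] != i]
        # Rank the candidates by squared Euclidean distance.
        cands.sort(key=lambda j: sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
        result[i] = cands[:k]
    return result
```

Sorting once by key costs O(N log N), after which each query inspects only a fixed-size window instead of all N points; the price is that true neighbors falling outside the window can be missed.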

arXiv:2503.06755

Transfer Learning for LQR Control

Authors: Taosha Guo, Fabio Pasqualetti

Abstract: In this paper, we study a transfer learning framework for Linear Quadratic Regulator (LQR) control, where (i) the dynamics of the system of interest (target system) are unknown and only a short trajectory of impulse responses from the target system is provided, and (ii) impulse responses are available from $N$ source systems with different dynamics. We show that the LQR controller can be learned from a sufficiently long trajectory of impulse responses. Further, a transferable mode set can be identified using the available data from source systems and the target system, enabling the reconstruction of the target system's impulse responses for controller design. By leveraging data from source systems, we show that the sample complexity for synthesizing the LQR controller can be reduced by $50 \%$. Algorithms and numerical examples are provided to demonstrate the implementation of the proposed transfer control framework.

Submitted 1 May, 2025; v1 submitted 9 March, 2025; originally announced March 2025.

Comments: 6 pages, 2 figures
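
As background for the LQR setting, the sketch below computes the model-based LQR gain for a known scalar discrete-time system x_{t+1} = a x_t + b u_t with cost Σ (q x_t² + r u_t²), by iterating the Riccati recursion P ← q + a²P − (abP)²/(r + b²P) and setting K = abP/(r + b²P), so that u_t = −K x_t. This is the classical baseline that assumes the dynamics are known; it does not implement the paper's data-driven transfer method, and the scalar case is chosen only for brevity.

```python
def scalar_lqr(a, b, q, r, iters=1000, tol=1e-12):
    """Iterate the scalar discrete-time Riccati recursion to a fixed point.

    Returns (K, P) where u_t = -K * x_t is the infinite-horizon LQR policy.
    """
    p = q
    for _ in range(iters):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            p = p_next
            break
        p = p_next
    k = a * b * p / (r + b * b * p)
    return k, p
```

For a = b = q = r = 1 the fixed point satisfies P² − P − 1 = 0, so P is the golden ratio and K = 1/P ≈ 0.618, giving a stable closed loop a − bK ≈ 0.382.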

arXiv:2503.00047

PCE-GAN: A Generative Adversarial Network for Point Cloud Attribute Quality Enhancement based on Optimal Transport

Authors: Tian Guo, Hui Yuan, Qi Liu, Honglei Su, Raouf Hamzaoui, Sam Kwong

Abstract: Point cloud compression significantly reduces data volume but sacrifices reconstruction quality, highlighting the need for advanced quality enhancement techniques. Most existing approaches focus primarily on point-to-point fidelity, often neglecting the importance of perceptual quality as interpreted by the human visual system. To address this issue, we propose a generative adversarial network for point cloud quality enhancement (PCE-GAN), grounded in optimal transport theory, with the goal of simultaneously optimizing both data fidelity and perceptual quality. The generator consists of a local feature extraction (LFE) unit, a global spatial correlation (GSC) unit, and a feature squeeze unit. The LFE unit uses dynamic graph construction and a graph attention mechanism to efficiently extract local features, placing greater emphasis on points with severe distortion. The GSC unit uses the geometry information of neighboring patches to construct an extended local neighborhood and introduces a transformer-style structure to capture long-range global correlations. The discriminator computes the deviation between the probability distributions of the enhanced point cloud and the original point cloud, guiding the generator to achieve high-quality reconstruction. Experimental results show that the proposed method achieves state-of-the-art performance.
Specifically, when PCE-GAN is applied to the latest geometry-based point cloud compression (G-PCC) test model, it achieves an average BD-rate of -19.2% compared with the PredLift coding configuration and -18.3% compared with the RAHT coding configuration. Subjective comparisons show a significant improvement in texture clarity and color transitions, revealing finer details and more natural color gradients.

Submitted 26 February, 2025; originally announced March 2025.
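
The G-PCC entries above report Bjøntegaard delta rates (BD-rate). A common way to compute it is to fit log-bitrate as a cubic function of quality (e.g., PSNR) for each codec, integrate both fits over the overlapping quality range, and convert the average log-rate difference into a percentage; negative values mean bitrate savings. The sketch below assumes exactly four rate-distortion points per codec with distinct PSNR values, so the cubic interpolates them exactly, and uses Lagrange interpolation with Simpson's rule (exact for cubics) to stay within the standard library. It is not the reference implementation used by MPEG.

```python
import math

def lagrange_eval(xs, ys, x):
    """Evaluate the polynomial interpolating (xs, ys) at x; xs must be distinct."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def simpson(f, lo, hi, n=200):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate difference (%) of test vs. anchor at equal quality."""
    la = [math.log(r) for r in rate_anchor]
    lt = [math.log(r) for r in rate_test]
    # Integrate only over the PSNR range covered by both curves.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    ia = simpson(lambda p: lagrange_eval(psnr_anchor, la, p), lo, hi)
    it = simpson(lambda p: lagrange_eval(psnr_test, lt, p), lo, hi)
    avg_diff = (it - ia) / (hi - lo)
    return (math.exp(avg_diff) - 1) * 100
```

A test codec that needs 10% less rate at every anchor quality point yields a BD-rate of −10%.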

arXiv:2412.20466

Single-image reflection removal via self-supervised diffusion models

Authors: Zhengyang Lu, Weifan Wang, Tianhao Guo, Feng Wang

Abstract: Reflections often degrade the visual quality of images captured through transparent surfaces, and reflection removal methods suffer from a shortage of paired real-world samples. This paper proposes a hybrid approach that combines cycle-consistency with denoising diffusion probabilistic models (DDPM) to effectively remove reflections from single images without requiring paired training data. The method introduces a Reflective Removal Network (RRN) that leverages DDPMs to model the decomposition process and recover the transmission image, and a Reflective Synthesis Network (RSN) that re-synthesizes the input image from the separated components through a nonlinear attention-based mechanism. Experimental results demonstrate the effectiveness of the proposed method on the SIR$^2$ dataset, the Flash-Based Reflection Removal (FRR) dataset, and a newly introduced Museum Reflection Removal (MRR) dataset, showing superior performance compared to state-of-the-art methods.

Submitted 29 December, 2024; originally announced December 2024.

arXiv:2409.10293

SPAC: Sampling-based Progressive Attribute Compression for Dense Point Clouds

Authors: Xiaolong Mao, Hui Yuan, Tian Guo, Shiqi Jiang, Raouf Hamzaoui, Sam Kwong

Abstract: We propose an end-to-end attribute compression method for dense point clouds. The proposed method combines a frequency sampling module, an adaptive scale feature extraction module with geometry assistance, and a global hyperprior entropy model. The frequency sampling module uses a Hamming window and the Fast Fourier Transform to extract high-frequency components of the point cloud. The difference between the original point cloud and the sampled point cloud is divided into multiple sub-point clouds. These sub-point clouds are then partitioned using an octree, providing a structured input for feature extraction. The feature extraction module integrates adaptive convolutional layers and uses offset-attention to capture both local and global features. Then, a geometry-assisted attribute feature refinement module is used to refine the extracted attribute features. Finally, a global hyperprior model is introduced for entropy encoding. This model propagates hyperprior parameters from the deepest (base) layer to the other layers, further enhancing the encoding efficiency. At the decoder, a mirrored network is used to progressively restore features and reconstruct the color attribute through transposed convolutional layers. The proposed method encodes base layer information at a low bitrate and progressively adds enhancement layer information to improve reconstruction accuracy.
Compared to the latest G-PCC test model (TMC13v23) under the MPEG common test conditions (CTCs), the proposed method achieved an average Bjøntegaard delta bitrate reduction of 24.58% for the Y component (21.23% for YUV combined) on the MPEG Category Solid dataset and 22.48% for the Y component (17.19% for YUV combined) on the MPEG Category Dense dataset. This is the first instance of a learning-based codec outperforming the G-PCC standard on these datasets under the MPEG CTCs.

Submitted 16 September, 2024; originally announced September 2024.

Comments: 136pages, 13 figures
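
SPAC's frequency sampling module relies on a Hamming window and the Fourier transform. The 1-D sketch below shows those two ingredients only: it applies the Hamming window w[n] = 0.54 − 0.46 cos(2πn/(N−1)), takes a naive O(N²) DFT for self-containment, and reports the fraction of one-sided spectral energy at or above a cutoff bin. The cutoff and the energy-ratio measure are assumptions for illustration; the paper operates on point clouds, and its sampling procedure is not reproduced here.

```python
import cmath
import math

def hamming(n):
    """Symmetric Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def dft(x):
    """Naive discrete Fourier transform (O(N^2)); an FFT would be used in practice."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def high_freq_ratio(signal, cutoff):
    """Fraction of windowed one-sided spectral energy in bins cutoff..N/2."""
    w = hamming(len(signal))
    spec = dft([s * wi for s, wi in zip(signal, w)])
    half = spec[: len(signal) // 2 + 1]
    energy = [abs(c) ** 2 for c in half]
    total = sum(energy)
    return sum(energy[cutoff:]) / total if total else 0.0
```

A smooth signal keeps almost all of its energy below the cutoff, while a rapidly alternating one concentrates it near the Nyquist bin; the window suppresses the spectral leakage that would otherwise blur this distinction.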

arXiv:2409.07640

Scoping Sustainable Collaborative Mixed Reality

Authors: Yasra Chandio, Noman Bashir, Tian Guo, Elsa Olivetti, Fatima Anwar

Abstract: Mixed Reality (MR) is becoming ubiquitous as it finds applications in education, healthcare, and other sectors beyond leisure. While MR end devices, such as headsets, have low energy intensity, the total number of devices and the resource requirements of the entire MR ecosystem, which includes cloud and edge endpoints, can be significant. The resulting operational and embodied carbon footprint of MR has led to concerns about its environmental implications. Recent research has explored reducing the carbon footprint of MR devices through hardware design-space exploration or network optimizations. However, many additional avenues for enhancing MR's sustainability remain open, including energy savings in non-processor components and carbon-aware optimizations in collaborative MR ecosystems. In this paper, we aim to identify key challenges, existing solutions, and promising research directions for improving MR sustainability. We explore adjacent fields of embedded and mobile computing systems for insights and outline MR-specific problems requiring new solutions. We identify the challenges that must be tackled to enable researchers, developers, and users to avail themselves of these opportunities in collaborative MR systems.

Submitted 11 September, 2024; originally announced September 2024.

Comments: IEEE International Symposium on the Emerging Metaverse (ISEMV)

arXiv:2406.01414

CE-NAS: An End-to-End Carbon-Efficient Neural Architecture Search Framework

Authors: Yiyang Zhao, Yunzhuo Liu, Bo Jiang, Tian Guo

Abstract: This work presents a novel approach to neural architecture search (NAS) that aims to increase carbon efficiency for the model design process. The proposed framework CE-NAS addresses the key challenge of high carbon cost associated with NAS by exploiting variations in the carbon intensity of energy and differences in the energy consumption of different NAS algorithms. At a high level, CE-NAS leverages a reinforcement-learning agent to dynamically adjust GPU resources based on carbon intensity, predicted by a time-series transformer, to balance energy-efficient sampling and energy-intensive evaluation tasks. Furthermore, CE-NAS leverages a recently proposed multi-objective optimizer to effectively reduce the NAS search space. We demonstrate the efficacy of CE-NAS in lowering carbon emissions while achieving SOTA results for both NAS datasets and open-domain NAS tasks. For example, on the HW-NasBench dataset, CE-NAS reduces carbon emissions by up to 7.22X while maintaining a search efficiency comparable to vanilla NAS. For open-domain NAS tasks, CE-NAS achieves SOTA results with 97.35% top-1 accuracy on CIFAR-10 with only 1.68M parameters and a carbon consumption of 38.53 lbs of CO2. On ImageNet, our searched model achieves 80.6% top-1 accuracy with a 0.78 ms TensorRT latency using FP16 on NVIDIA V100, consuming only 909.86 lbs of CO2, making it comparable to other one-shot-based NAS baselines.

Submitted 17 July, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: arXiv admin note: text overlap with arXiv:2307.04131

arXiv:2403.01960

A robust audio deepfake detection system via multi-view feature

Authors: Yujie Yang, Haochen Qin, Hang Zhou, Chengcheng Wang, Tianyu Guo, Kai Han, Yunhe Wang

Abstract: With the advancement of generative modeling techniques, synthetic human speech is becoming increasingly indistinguishable from real speech, posing difficult challenges for audio deepfake detection (ADD) systems. In this paper, we exploit audio features to improve the generalizability of ADD systems. We investigate ADD task performance over a broad range of audio features, including various handcrafted and learning-based features. Experiments show that learning-based audio features pretrained on a large amount of data generalize better than handcrafted features in out-of-domain scenarios. We then further improve the generalizability of the ADD system using the proposed multi-feature approaches, which incorporate complementary information from features of different views. The model trained on ASV2019 data achieves an equal error rate of 24.27% on the In-the-Wild dataset.

Submitted 4 March, 2024; originally announced March 2024.

Comments: 5 pages, 2 figures
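
The result above is reported as an equal error rate (EER), the operating point at which the false acceptance rate for spoofed audio equals the false rejection rate for bona fide audio. The sketch below sweeps thresholds over the observed scores and returns the mean of the two rates where they are closest, a common discrete approximation. It assumes higher scores indicate bona fide speech; this convention is an assumption, not taken from the paper.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate EER by sweeping every observed score as a decision threshold.

    A sample is accepted as bona fide when its score is >= the threshold.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False acceptance: spoofed samples accepted as bona fide.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection: bona fide samples rejected.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]
```

Perfectly separated score distributions give an EER of 0; overlapping ones give a positive value, so lower is better.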

arXiv:2402.17487

Bit Rate Matching Algorithm Optimization in JPEG-AI Verification Model

Authors: Panqi Jia, A. Burakhan Koyuncu, Jue Mao, Ze Cui, Yi Ma, Tiansheng Guo, Timofey Solovyev, Alexander Karabutov, Yin Zhao, Jing Wang, Elena Alshina, Andre Kaup

Abstract: Research on neural network (NN) based image compression has shown superior performance compared to classical compression frameworks. Unlike the hand-engineered transforms in the classical frameworks, NN-based models learn non-linear transforms that provide more compact bit representations, and achieve faster coding speed on parallel devices than their classical counterparts. These properties attracted the attention of both the scientific and industrial communities, resulting in the standardization activity JPEG-AI. The verification model for the standardization process of JPEG-AI is already in development and has surpassed the advanced VVC intra codec. To generate reconstructed images with the desired bits per pixel and assess the BD-rate performance of both the JPEG-AI verification model and VVC intra, bit rate matching is employed. However, the current JPEG-AI verification model experiences significant slowdowns during bit rate matching and yields suboptimal performance due to an unsuitable model. The proposed methodology offers a gradual algorithmic optimization for matching bit rates, resulting in a fourfold acceleration and over 1% improvement in BD-rate at the base operation point. At the high operation point, the acceleration increases up to sixfold.

Submitted 27 February, 2024; originally announced February 2024.

Comments: Accepted at (IEEE) PCS 2024; 6 pages
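
Bit rate matching searches for the codec setting that yields a target bits-per-pixel value. The generic sketch below uses bisection over a continuous quality parameter and counts encoder calls, since each call is a full encode and dominates run time. `encode_bpp` is a hypothetical stand-in for running an encoder, and monotonicity of bpp in the parameter is assumed; this is not the optimization proposed for the JPEG-AI verification model.

```python
def match_bit_rate(encode_bpp, target_bpp, lo, hi, tol=1e-3, max_iter=50):
    """Bisection on a quality parameter q in [lo, hi] so that encode_bpp(q) is
    within tol of target_bpp. Assumes encode_bpp increases monotonically in q.

    Returns (q, number_of_encoder_calls).
    """
    q = (lo + hi) / 2
    calls = 0
    for _ in range(max_iter):
        q = (lo + hi) / 2
        bpp = encode_bpp(q)  # hypothetical: one full encode per call
        calls += 1
        if abs(bpp - target_bpp) <= tol:
            break
        if bpp < target_bpp:
            lo = q  # rate too low: raise quality
        else:
            hi = q  # rate too high: lower quality
    return q, calls
```

For example, with a toy encoder whose rate is 0.02·q bpp, matching 0.75 bpp converges to q = 37.5; any scheme that reaches the tolerance in fewer calls directly reduces matching time.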

arXiv:2303.09002

doi 10.1109/LCSYS.2023.3285167

Imitation and Transfer Learning for LQG Control

Authors: Taosha Guo, Abed AlRahman Al Makdah, Vishaal Krishnan, Fabio Pasqualetti

Abstract: In this paper, we study an imitation and transfer learning setting for Linear Quadratic Gaussian (LQG) control, where (i) the system dynamics, noise statistics and cost function are unknown and expert data is provided (that is, sequences of optimal inputs and outputs) to learn the LQG controller, and (ii) multiple control tasks are performed for the same system but with different LQG costs. We show that the LQG controller can be learned from a set of expert trajectories of length $n(l+2)-1$, with $n$ and $l$ the dimension of the system state and output, respectively. Further, the controller can be decomposed as the product of an estimation matrix, which depends only on the system dynamics, and a control matrix, which depends on the LQG cost. This data-based separation principle allows us to transfer the estimation matrix across different LQG tasks, and to reduce the length of the expert trajectories needed to learn the LQG controller to $2n+m-1$, with $m$ the dimension of the inputs (for single-input systems with $l=2$, this yields approximately a $50\%$ reduction of the required expert data).

Submitted 22 June, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: Accepted by IEEE L-CSS
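
The two trajectory lengths in the abstract can be compared directly. For single-input systems (m = 1) with l = 2 outputs, the full requirement is n(l+2) − 1 = 4n − 1 samples, while transferring the estimation matrix needs 2n + m − 1 = 2n, so the relative saving 1 − 2n/(4n − 1) approaches 50% as n grows. The small helper below (a hypothetical name) just evaluates these formulas.

```python
def expert_lengths(n, l, m):
    """Expert-trajectory lengths from the abstract's formulas.

    n: state dimension, l: output dimension, m: input dimension.
    Returns (full length, length with transferred estimation matrix, relative saving).
    """
    full = n * (l + 2) - 1
    transfer = 2 * n + m - 1
    return full, transfer, 1 - transfer / full
```

For n = 10 this gives 39 versus 20 samples, a saving of about 48.7%.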

arXiv:2211.02903

VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer

Authors: Yongmao Zhang, Heyang Xue, Hanzhao Li, Lei Xie, Tingwei Guo, Ruixiong Zhang, Caixia Gong

Abstract: The end-to-end singing voice synthesis (SVS) model VISinger can achieve better performance than the typical two-stage model with fewer parameters. However, VISinger has several problems: the text-to-phase problem, in which the end-to-end model learns a meaningless text-to-phase mapping; the glitch problem, in which the harmonic components corresponding to the periodic signal of voiced segments change abruptly, producing audible artefacts; and a low sampling rate, since 24 kHz does not meet the needs of high-fidelity, full-band generation (44.1 kHz or higher). In this paper, we propose VISinger 2 to address these issues by integrating digital signal processing (DSP) methods with VISinger. Specifically, inspired by recent advances in differentiable digital signal processing (DDSP), we incorporate a DSP synthesizer into the decoder to solve the above issues. The DSP synthesizer consists of a harmonic synthesizer and a noise synthesizer that generate periodic and aperiodic signals, respectively, from the latent representation z in VISinger. It supervises the posterior encoder to extract a latent representation without phase information, which prevents the prior encoder from modelling the text-to-phase mapping. To avoid glitch artefacts, HiFi-GAN is modified to accept the waveforms generated by the DSP synthesizer as a condition to produce the singing voice. Moreover, with the improved waveform decoder, VISinger 2 generates 44.1 kHz singing audio with richer expression and better quality.
Experiments on the OpenCpop corpus show that VISinger 2 outperforms VISinger, CpopSing, and RefineSinger in both subjective and objective metrics.

Submitted 5 November, 2022; originally announced November 2022.

Comments: Submitted to ICASSP 2023
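
The harmonic synthesizer in VISinger 2 follows the DDSP idea of an additive oscillator bank. The sketch below shows the generic form: the phase is the running sum of 2π f0/sr, each harmonic k contributes a_k sin(k·phase), and harmonics at or above the Nyquist frequency are dropped to avoid aliasing. Per-sample f0 and harmonic amplitudes are passed in directly here; in VISinger 2 they are derived from the latent representation z, and the noise synthesizer is omitted. Names and interfaces are illustrative assumptions.

```python
import math

def harmonic_synth(f0, amps, sr):
    """Additive harmonic oscillator bank.

    f0:   per-sample fundamental frequency in Hz
    amps: per-sample list of harmonic amplitudes [a_1, a_2, ...]
    sr:   sample rate in Hz
    """
    out = []
    phase = 0.0
    two_pi = 2.0 * math.pi
    for f, a in zip(f0, amps):
        # Accumulate instantaneous phase; wrapping keeps it numerically small
        # and does not change sin(k * phase) for integer k.
        phase = (phase + two_pi * f / sr) % two_pi
        sample = 0.0
        for k, ak in enumerate(a, start=1):
            if k * f < sr / 2:  # skip harmonics at or above Nyquist
                sample += ak * math.sin(k * phase)
        out.append(sample)
    return out
```

Because the phase is integrated sample by sample, a time-varying f0 produces a continuous waveform without phase jumps.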

arXiv:2210.07707

doi 10.1109/TIE.2022.3212378

Generative Adversarial Learning for Trusted and Secure Clustering in Industrial Wireless Sensor Networks

Authors: Liu Yang, Simon X. Yang, Yun Li, Yinzhi Lu, Tan Guo

Abstract: Traditional machine learning techniques have been widely used to establish trust management systems. However, the scale of the training dataset can significantly affect the security performance of these systems, and detecting malicious nodes is a great challenge due to the absence of labeled data for novel attacks. To address this issue, this paper presents a generative adversarial network (GAN) based trust management mechanism for Industrial Wireless Sensor Networks (IWSNs). First, type-2 fuzzy logic is adopted to evaluate the reputation of sensor nodes while alleviating the uncertainty problem. Then, trust vectors are collected to train a GAN-based codec structure, which is used for further malicious node detection. Moreover, to avoid normal nodes being permanently isolated from the network due to detection errors, a GAN-based trust redemption model is constructed to enhance the resilience of trust management. Based on the latest detection results, a trust model update method is developed to adapt to the dynamic industrial environment. The proposed trust management mechanism is finally applied to secure clustering for reliable and real-time data transmission, and simulation results show that it achieves a high detection rate of up to 96% and a low false positive rate below 8%.

Submitted 14 October, 2022; originally announced October 2022.

arXiv:2209.01386

SaleNet: A low-power end-to-end CNN accelerator for sustained attention level evaluation using EEG

Authors: Chao Zhang, Zijian Tang, Taoming Guo, Jiaxin Lei, Jiaxin Xiao, Anhe Wang, Shuo Bai, Milin Zhang

Abstract: This paper proposes SaleNet, an end-to-end convolutional neural network (CNN) for sustained attention level evaluation using prefrontal electroencephalogram (EEG). A bias-driven pruning method is proposed together with group convolution, global average pooling (GAP), near-zero pruning, weight clustering, and quantization for model compression, achieving a total compression ratio of 183.11x. The compressed SaleNet obtains a state-of-the-art subject-independent sustained attention level classification accuracy of 84.2% on the 6-subject EEG database recorded in this work. SaleNet is implemented on an Artix-7 FPGA with a competitive power consumption of 0.11 W and an energy efficiency of 8.19 GOps/W.

Submitted 3 September, 2022; originally announced September 2022.

Comments: 5 pages, 4 figures, to be published in IEEE International Symposium on Circuits and Systems (ISCAS) 2022
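
Two of the generic compression steps listed for SaleNet, near-zero (magnitude) pruning and uniform quantization, can be sketched as follows. The threshold, the 8-bit symmetric scheme, and the compression-ratio accounting (dense float32 baseline versus storing only nonzero codes, ignoring index overhead) are assumptions for illustration; the sketch does not reproduce the bias-driven pruning or weight clustering, nor the reported 183.11x ratio.

```python
def prune_near_zero(weights, threshold):
    """Zero out weights whose magnitude is below the threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]

def quantize_symmetric(weights, bits=8):
    """Uniform symmetric quantization; returns integer codes and the scale.

    Dequantized weight = code * scale.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale

def compression_ratio(n_weights, n_nonzero, bits):
    """Dense float32 storage divided by storage of nonzero codes (indices ignored)."""
    return (n_weights * 32) / max(1, n_nonzero * bits)
```

Pruning and quantization compound multiplicatively: keeping 10% of the weights at 8 bits already gives 40x under this simplified accounting.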

arXiv:2208.01221

Generative Adversarial Learning for Intelligent Trust Management in 6G Wireless Networks

Authors: Liu Yang, Yun Li, Simon X. Yang, Yinzhi Lu, Tan Guo, Keping Yu

Abstract: The emerging sixth-generation (6G) network integrates heterogeneous wireless networks to seamlessly support networking anywhere and anytime. However, 6G must also offer a high Quality-of-Trust to meet mobile users' expectations. Artificial intelligence (AI) is considered one of the most important components of 6G, which makes AI-based trust management a promising paradigm for providing trusted and reliable services. In this article, a generative adversarial learning-enabled trust management method is presented for 6G wireless networks. Some typical AI-based trust management schemes are first reviewed, and then a potential heterogeneous and intelligent 6G architecture is introduced. Next, the integration of AI and trust management is developed to optimize intelligence and security. Finally, the presented AI-based trust management method is applied to secure clustering to achieve reliable and real-time communications. Simulation results demonstrate its excellent performance in guaranteeing network security and service quality.

Submitted 1 August, 2022; originally announced August 2022.

arXiv:2207.10282

doi 10.1109/JSEN.2021.3070689

An Evolutionary Game based Secure Clustering Protocol with Fuzzy Trust Evaluation and Outlier Detection for Wireless Sensor Networks

Authors: Liu Yang, Yinzhi Lu, Simon X. Yang, Yuanchang Zhong, Tan Guo, Zhifang Liang

Abstract: Trustworthy and reliable data delivery is a challenging task in Wireless Sensor Networks (WSNs) due to their unique characteristics and constraints. To achieve secure data delivery and address the conflict between security and energy, in this paper we present an evolutionary game based secure clustering protocol with fuzzy trust evaluation and outlier detection for WSNs. First, a fuzzy trust evaluation method is presented to transform transmission evidence into trust values while effectively alleviating trust uncertainty. Then, a K-Means based outlier detection scheme is proposed to further analyze the many trust values obtained via fuzzy trust evaluation or trust recommendation. It can discover the commonalities and differences among sensor nodes while improving the accuracy of outlier detection. Finally, we present an evolutionary game based secure clustering protocol to achieve a trade-off between security assurance and energy saving for sensor nodes when electing cluster heads. A sensor node that fails to become a cluster head can securely choose its own head by isolating suspicious nodes. Simulation results verify that our secure clustering protocol can effectively defend the network against attacks from internal selfish or compromised nodes. Correspondingly, the timely data transfer rate can be improved significantly.

Submitted 20 July, 2022; originally announced July 2022.

arXiv:2207.09936 [pdf]

doi 10.1109/TII.2020.3019286

A Secure Clustering Protocol with Fuzzy Trust Evaluation and Outlier Detection for Industrial Wireless Sensor Networks

Authors: Liu Yang, Yinzhi Lu, Simon X. Yang, Tan Guo, Zhifang Liang

Abstract: Security is one of the major concerns in Industrial Wireless Sensor Networks (IWSNs). To ensure security in clustered IWSNs, this paper presents a secure clustering protocol with fuzzy trust evaluation and outlier detection (SCFTO). First, to deal with transmission uncertainty in an open wireless medium, an interval type-2 fuzzy logic controller is adopted to estimate trust values. Then, a density-based outlier detection mechanism is introduced to obtain an adaptive trust threshold, which is used to prevent malicious nodes from becoming cluster heads. Finally, a fuzzy logic based cluster head election method is proposed to balance energy saving and security assurance, so that a normal sensor node with more residual energy or less confidence in other nodes has a higher probability of becoming the cluster head. Extensive experiments verify that our secure clustering protocol can effectively defend the network against attacks from internal malicious or compromised nodes.

Submitted 20 July, 2022; originally announced July 2022.

arXiv:2207.09057 [pdf]

doi 10.1109/TII.2021.3128954

An Intelligent Trust Cloud Management Method for Secure Clustering in 5G enabled Internet of Medical Things

Authors: Liu Yang, Keping Yu, Simon X. Yang, Chinmay Chakraborty, Yinzhi Lu, Tan Guo

Abstract: 5G edge computing enabled Internet of Medical Things (IoMT) is an efficient technology for providing decentralized medical services, while device-to-device (D2D) communication is a promising paradigm for future 5G networks. To ensure secure and reliable communication in 5G edge computing and D2D enabled IoMT systems, this paper presents an intelligent trust cloud management method. First, an active training mechanism is proposed to construct the standard trust clouds. Second, individual trust clouds of the IoMT devices are established through fuzzy trust inference and recommendation. Third, a trust classification scheme is proposed to determine whether an IoMT device is malicious. Finally, a trust cloud update mechanism is presented to make the proposed trust management method adaptive and intelligent in an open wireless medium. Simulation results demonstrate that the proposed method can effectively address the trust uncertainty issue and improve the detection accuracy of malicious devices.
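Trust clouds of this kind are commonly built on the normal cloud model, which describes a concept by an expectation Ex, an entropy En (spread) and a hyper-entropy He (uncertainty of the spread). Assuming that is the model meant here, which the abstract does not confirm, the standard forward normal cloud generator can be sketched as follows. Each "drop" is a trust sample x with a certainty degree mu; the function name and parameters are hypothetical.

```python
import math
import random


def normal_cloud_drops(ex, en, he, n, seed=0):
    """Forward normal cloud generator (assumed model, not the paper's code).

    ex: expectation, en: entropy, he: hyper-entropy, n: number of drops.
    Returns a list of (x, mu) pairs, where mu in (0, 1] is the certainty degree.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    drops = []
    for _ in range(n):
        en_i = rng.gauss(en, he)  # entropy perturbed by the hyper-entropy
        if en_i == 0:
            continue  # degenerate draw; skip to avoid division by zero
        x = rng.gauss(ex, abs(en_i))  # drop position around the expectation
        mu = math.exp(-((x - ex) ** 2) / (2 * en_i ** 2))  # certainty degree
        drops.append((x, mu))
    return drops
```

A larger He makes the drops more dispersed, which is how a cloud model represents uncertainty about the trust value itself rather than only its spread.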

Submitted 19 July, 2022; originally announced July 2022.

arXiv:2207.08226 [pdf]

doi 10.1109/TII.2022.3186891

An Intelligent Deterministic Scheduling Method for Ultra-Low Latency Communication in Edge Enabled Industrial Internet of Things

Authors: Yinzhi Lu, Liu Yang, Simon X. Yang, Qiaozhi Hua, Arun Kumar Sangaiah, Tan Guo, Keping Yu

Abstract: Edge-enabled Industrial Internet of Things (IIoT) platforms are of great significance for accelerating the development of smart industry. However, with the dramatic increase in real-time IIoT applications, it is a great challenge to support fast response time, low latency, and efficient bandwidth utilization. To address this issue, Time-Sensitive Networking (TSN) has recently been studied as a way to realize low-latency communication via deterministic scheduling. To the best of our knowledge, the combinability of multiple flows, which can significantly affect scheduling performance, has never been systematically analyzed before. In this article, we first analyze the combinability problem. Then, a non-collision theory based deterministic scheduling (NDS) method is proposed to achieve ultra-low latency communication for time-sensitive flows. Moreover, to improve bandwidth utilization, a dynamic queue scheduling (DQS) method is presented for best-effort flows. Experimental results demonstrate that NDS/DQS can effectively support deterministic ultra-low latency services and guarantee efficient bandwidth utilization.

Submitted 17 July, 2022; originally announced July 2022.

arXiv:2204.08720 [pdf, other]

Audio Deep Fake Detection System with Neural Stitching for ADD 2022

Authors: Rui Yan, Cheng Wen, Shuran Zhou, Tingwei Guo, Wei Zou, Xiangang Li

Abstract: This paper describes our best system and methodology for ADD 2022: The First Audio Deep Synthesis Detection Challenge [Yi2022ADD]. The same system was used for both rounds of evaluation in Track 3.2, with a similar training methodology. The first round of Track 3.2 data is generated by text-to-speech (TTS) or voice conversion (VC) algorithms, while the second round consists of fake audio generated by other participants in Track 3.1, aiming to spoof our systems. Our systems use a standard 34-layer ResNet with multi-head attention pooling [india2019self] to learn a discriminative embedding for fake audio and spoof detection. We further use neural stitching to boost the model's generalization capability so that it performs equally well on different tasks; more details are given in the following sections. The experiments show that our proposed method outperforms all other systems, with a 10.1% equal error rate (EER) in Track 3.2.
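The equal error rate (EER) reported above is the operating point where the false positive rate equals the false negative rate. A minimal sketch of how it can be estimated from detector scores follows; it assumes label 1 marks the positive (e.g. fake) class, higher scores mean "more likely positive", and it returns the mean of the two rates at the threshold where they are closest. This is a simple discrete approximation, not the challenge's official scoring script, and the function name is hypothetical.

```python
def equal_error_rate(scores, labels):
    """Approximate EER by sweeping every observed score as a threshold.

    labels: 1 = positive class, 0 = negative class (assumed convention).
    A sample is predicted positive when its score >= threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative samples")
    best = None
    for t in sorted(set(scores)) + [float("inf")]:
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        fpr, fnr = fp / neg, fn / pos
        gap = abs(fpr - fnr)
        if best is None or gap < best[0]:
            best = (gap, (fpr + fnr) / 2)  # keep the closest crossing so far
    return best[1]
```

Perfectly separated scores give an EER of 0, while scores that carry no information push it toward 0.5.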

Submitted 19 April, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

Comments: Accepted to ICASSP 2022

arXiv:2204.08692 [pdf, other]

Time Domain Adversarial Voice Conversion for ADD 2022

Authors: Cheng Wen, Tingwei Guo, Xingjun Tan, Rui Yan, Shuran Zhou, Chuandong Xie, Wei Zou, Xiangang Li

Abstract: In this paper, we describe our speech generation system for the first Audio Deep Synthesis Detection Challenge (ADD 2022). First, we build an any-to-many voice conversion (VC) system to convert source speech with arbitrary language content into the target speaker's fake speech. Then, the converted speech generated by VC is post-processed in the time domain to improve its deception ability. The experimental results show that our system has adversarial ability against anti-spoofing detectors with a slight compromise in audio quality and speaker similarity. This system ranked top in Track 3.1 of ADD 2022, showing that our method also generalizes well against different detectors.

Submitted 19 April, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

Comments: Accepted to ICASSP 2022

arXiv:2204.08686 [pdf, ps, other]

Audio-Visual Wake Word Spotting System For MISP Challenge 2021

Authors: Yanguang Xu, Jianwei Sun, Yang Han, Shuaijiang Zhao, Chaoyang Mei, Tingwei Guo, Shuran Zhou, Chuandong Xie, Wei Zou, Xiangang Li

Abstract: This paper presents the details of our system designed for Task 1 of the Multimodal Information Based Speech Processing (MISP) Challenge 2021. The purpose of Task 1 is to leverage both audio and video information to improve the environmental robustness of far-field wake word spotting. In the proposed system, first, we use speech enhancement algorithms such as beamforming and weighted prediction error (WPE) to process the multi-microphone conversational audio. Second, several data augmentation techniques are applied to simulate a more realistic far-field scenario. For the video information, the provided region of interest (ROI) is used to obtain a visual representation. Then, a multi-layer CNN is proposed to learn audio and visual representations, and these representations are fed into our two-branch attention-based network, such as a transformer or conformer, which is employed for fusion. The focal loss is used to fine-tune the model and improves performance significantly. Finally, multiple trained models are integrated by voting to achieve our final score of 0.091.
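The focal loss mentioned above down-weights well-classified examples so that training concentrates on hard ones. A minimal sketch of the standard binary form, FL = -alpha_t (1 - p_t)^gamma log(p_t), is shown below. The defaults gamma = 2 and alpha = 0.25 are the values commonly used with the original focal loss and are assumptions here, since the abstract does not give the system's settings; the function name is hypothetical.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss.

    p: predicted probabilities of the positive class (e.g. wake word present).
    y: labels in {0, 1}. gamma and alpha are assumed default hyperparameters.
    """
    total = 0.0
    for pi, yi in zip(p, y):
        pi = min(max(pi, eps), 1 - eps)  # clamp to avoid log(0)
        p_t = pi if yi == 1 else 1 - pi  # probability assigned to the true class
        alpha_t = alpha if yi == 1 else 1 - alpha  # class-balancing weight
        total += -alpha_t * (1 - p_t) ** gamma * math.log(p_t)
    return total / len(p)
```

With gamma = 0 and alpha = 0.5 this reduces to half the ordinary binary cross-entropy; increasing gamma shrinks the contribution of confident, correct predictions.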

Submitted 19 April, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

Comments: Accepted to ICASSP 2022

arXiv:2202.08994 [pdf, other]

REFUGE2 Challenge: A Treasure Trove for Multi-Dimension Analysis and Evaluation in Glaucoma Screening

Authors: Huihui Fang, Fei Li, Junde Wu, Huazhu Fu, Xu Sun, Jaemin Son, Shuang Yu, Menglu Zhang, Chenglang Yuan, Cheng Bian, Baiying Lei, Benjian Zhao, Xinxing Xu, Shaohua Li, Francisco Fumero, José Sigut, Haidar Almubarak, Yakoub Bazi, Yuanhao Guo, Yating Zhou, Ujjwal Baid, Shubham Innani, Tianjiao Guo, Jie Yang, José Ignacio Orlando, et al. (3 additional authors not shown)

Abstract: With the rapid development of artificial intelligence (AI) in medical image processing, deep learning in color fundus photography (CFP) analysis is also evolving. Although there are some open-source, labeled datasets of CFPs in the ophthalmology community, large-scale datasets for screening only have labels of disease categories, and datasets with annotations of fundus structures are usually small in size. In addition, labeling standards are not uniform across datasets, and there is no clear information on the acquisition device. Here we release a multi-annotation, multi-quality, and multi-device color fundus image dataset for glaucoma analysis on an original challenge, the Retinal Fundus Glaucoma Challenge 2nd Edition (REFUGE2). The REFUGE2 dataset contains 2000 color fundus images with annotations of glaucoma classification, optic disc/cup segmentation, and fovea localization. Meanwhile, the REFUGE2 challenge sets three sub-tasks of automatic glaucoma diagnosis and fundus structure analysis and provides an online evaluation framework. Given the multi-device and multi-quality nature of the data, several methods with strong generalization ability were provided in the challenge to make predictions more robust. This shows that REFUGE2 brings attention to the characteristics of real-world multi-domain data, bridging the gap between scientific research and clinical application.

Submitted 29 December, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

Comments: 29 pages, 21 figures

arXiv:2108.10550 [pdf]

A generative adversarial approach to facilitate archival-quality histopathologic diagnoses from frozen tissue sections

Authors: Kianoush Falahkheirkhah, Tao Guo, Michael Hwang, Pheroze Tamboli, Christopher G Wood, Jose A Karam, Kanishka Sircar, Rohit Bhargava

Abstract: In clinical diagnostics and research involving histopathology, formalin-fixed paraffin-embedded (FFPE) tissue is almost universally favored for its superb image quality. However, its tissue processing time (more than 24 hours) can slow decision-making. In contrast, fresh frozen (FF) processing (less than 1 hour) can yield rapid information, but diagnostic accuracy is suboptimal due to lack of clearing, morphologic deformation, and more frequent artifacts. Here, we bridge this gap using artificial intelligence. We synthesize FFPE-like images (virtual FFPE) from FF images using a generative adversarial network (GAN) on 98 paired kidney samples derived from 40 patients. Five board-certified pathologists evaluated the results in a blinded test. Image quality of the virtual FFPE data was assessed to be high and showed a close resemblance to real FFPE images. Clinical assessments of disease on the virtual FFPE images showed higher inter-observer agreement compared to FF images. The nearly instantaneously generated virtual FFPE images can not only reduce time to information but also facilitate more precise diagnosis from routine FF images without extraneous costs and effort.

Submitted 24 August, 2021; originally announced August 2021.

Comments: 24 pages, 6 figures, and 3 tables

arXiv:2107.00705 [pdf, other]

Precise Feature Selection and Case Study of Intrusion Detection in an Industrial Control System (ICS) Environment

Authors: Terry Guo, Animesh Dahal, Ambareen Siraj

Abstract: This paper presents analytical techniques to improve redundancy and relevance assessment for precise selection of features in practical multi-class raw datasets. We propose a matrix-rank based $k$-medoids algorithm that is guaranteed to output all independent medoids. The new algorithm uses matrix rank as a robust indicator, whereas a traditional $k$-medoids algorithm depends on the specific dataset and on how the distance between any two features is defined. Another advantage is that the total number of operations in the nested loops is bounded, unlike some $k$-medoids algorithms that involve random search. Sparse regression is an efficient tool for feature relevance analysis, but its outcome can depend on which labeled datasets are employed. A compensation method is introduced in this paper to handle unequal class occurrence in a practical raw dataset. To assess the proposed techniques quantitatively, an existing Industrial Control System (ICS) dataset is used to perform intrusion detection. The numerical results generated from this case study validate the effectiveness and necessity of the proposed analytical framework.
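The idea of using matrix rank as a redundancy indicator can be illustrated with a greedy sketch: a feature is kept only if adding its column increases the rank of the kept set, so linearly dependent (redundant) features are dropped without choosing a distance metric. This is a swapped-in illustration, not the paper's $k$-medoids algorithm; it uses exact rational arithmetic to avoid floating-point rank errors, and the function names are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Exact matrix rank via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    if not m:
        return 0
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def independent_features(columns):
    """Greedily keep feature columns that are linearly independent of those kept so far.

    columns: list of features, each given as its list of sample values.
    """
    kept = []
    for j, col in enumerate(columns):
        candidate = [columns[k] for k in kept] + [col]
        if rank(candidate) == len(candidate):
            kept.append(j)
    return kept
```

For instance, a feature that is twice another, or the sum of two kept features, does not raise the rank and is skipped. The number of rank evaluations is bounded by the number of features, echoing the bounded-operations property claimed in the abstract.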

Submitted 1 July, 2021; originally announced July 2021.

arXiv:2010.09275 [pdf, other]

DiDiSpeech: A Large Scale Mandarin Speech Corpus

Authors: Tingwei Guo, Cheng Wen, Dongwei Jiang, Ne Luo, Ruixiong Zhang, Shuaijiang Zhao, Wubo Li, Cheng Gong, Wei Zou, Kun Han, Xiangang Li

Abstract: This paper introduces a new open-source Mandarin speech corpus called DiDiSpeech. It consists of about 800 hours of speech data sampled at 48 kHz from 6000 speakers, together with the corresponding transcripts. All speech data in the corpus were recorded in a quiet environment and are suitable for various speech processing tasks, such as voice conversion, multi-speaker text-to-speech, and automatic speech recognition. We conduct experiments on multiple speech tasks and evaluate performance, showing that the corpus is promising for both academic research and practical applications. The corpus is available at https://outreach.didichuxing.com/research/opendata/.

Submitted 8 February, 2021; v1 submitted 19 October, 2020; originally announced October 2020.

Comments: 5 pages, 2 figures, 11 tables

arXiv:2004.00006 [pdf, other]

PointAR: Efficient Lighting Estimation for Mobile Augmented Reality

Authors: Yiqin Zhao, Tian Guo

Abstract: We propose an efficient lighting estimation pipeline that is suitable to run on modern mobile devices, with resource complexity comparable to state-of-the-art mobile deep learning models. Our pipeline, PointAR, takes a single RGB-D image captured from the mobile camera and a 2D location in that image, and estimates 2nd order spherical harmonics coefficients. These estimated spherical harmonics coefficients can be directly used by rendering engines to support spatially variant indoor lighting in the context of augmented reality. Our key insight is to formulate lighting estimation as a learning problem directly on point clouds, which is partly inspired by the Monte Carlo integration used in real-time spherical harmonics lighting. While existing approaches estimate lighting information with complex deep learning pipelines, our method focuses on reducing computational complexity. Through both quantitative and qualitative experiments, we demonstrate that PointAR achieves lower lighting estimation errors compared to state-of-the-art methods. Further, our method requires an order of magnitude fewer resources, comparable to mobile-specific DNNs.
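The spherical harmonics output and the Monte Carlo integration it draws on can be illustrated with the textbook SH projection. This sketch assumes "2nd order" means bands l = 0, 1, 2 (9 coefficients per color channel), uses a single scalar radiance channel, and assumes sample directions are uniformly distributed on the unit sphere, so each sample carries weight 4π/N. A deterministic Fibonacci lattice stands in for the point cloud's directions. The helper names are hypothetical, and this is not PointAR's learned pipeline.

```python
import math


def sh9(x, y, z):
    """Real spherical harmonics basis for bands l = 0..2 at a unit direction."""
    return [
        0.282095,
        0.488603 * y,
        0.488603 * z,
        0.488603 * x,
        1.092548 * x * y,
        1.092548 * y * z,
        0.315392 * (3 * z * z - 1),
        1.092548 * x * z,
        0.546274 * (x * x - y * y),
    ]


def project_sh(samples):
    """Monte Carlo SH projection of (direction, radiance) samples.

    Assumes directions are uniform on the sphere (pdf = 1 / (4*pi)).
    """
    coeffs = [0.0] * 9
    for (x, y, z), radiance in samples:
        for i, basis in enumerate(sh9(x, y, z)):
            coeffs[i] += radiance * basis
    scale = 4 * math.pi / len(samples)
    return [c * scale for c in coeffs]


def fibonacci_sphere(n):
    """Deterministic, near-uniform unit directions (stand-in for random samples)."""
    golden = math.pi * (3 - math.sqrt(5))
    dirs = []
    for k in range(n):
        z = 1 - (2 * k + 1) / n
        r = math.sqrt(1 - z * z)
        dirs.append((r * math.cos(golden * k), r * math.sin(golden * k), z))
    return dirs
```

For a constant environment of radiance 1, only the l = 0 coefficient survives, at 4π · 0.282095 ≈ 3.545; a renderer then reconstructs lighting as the SH-weighted sum of these 9 values per channel.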

Submitted 17 July, 2020; v1 submitted 30 March, 2020; originally announced April 2020.

Comments: Accepted to 16th European Conference On Computer Vision (ECCV'20)

arXiv:2003.02012 [pdf, other]

Asymmetric Gained Deep Image Compression With Continuous Rate Adaptation

Authors: Ze Cui, Jing Wang, Shangyin Gao, Bo Bai, Tiansheng Guo, Yihui Feng

Abstract: With the development of deep learning techniques, the combination of deep learning with image compression has drawn a lot of attention. Recently, learned image compression methods have exceeded their classical counterparts in terms of rate-distortion performance. However, continuous rate adaptation remains an open question. Some learned image compression methods use multiple networks for multiple rates, while others use a single model at the expense of increased computational complexity and degraded performance. In this paper, we propose a continuously rate-adjustable learned image compression framework, the Asymmetric Gained Variational Autoencoder (AG-VAE). AG-VAE uses a pair of gain units to achieve discrete rate adaptation in a single model with negligible additional computation. Then, by using exponential interpolation, continuous rate adaptation is achieved without compromising performance. In addition, we propose an asymmetric Gaussian entropy model for more accurate entropy estimation. Exhaustive experiments show that our method achieves quantitative performance comparable to SOTA learned image compression methods and better qualitative performance than classical image codecs. In the ablation study, we confirm the usefulness and superiority of the gain units and the asymmetric Gaussian entropy model.
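The gain-unit idea can be sketched as channel-wise scaling of the latent representation, with exponential (geometric) interpolation between two trained gain vectors giving intermediate rates. The sketch assumes strictly positive gains and an interpolation of the form g = g_a^l · g_b^(1 - l) with l in [0, 1]; the function names are hypothetical, and the exact formulation in AG-VAE may differ in detail.

```python
def interpolate_gain(g_a, g_b, l):
    """Channel-wise exponential interpolation between two positive gain vectors.

    l = 1 returns g_a, l = 0 returns g_b (assumed convention).
    """
    if not 0.0 <= l <= 1.0:
        raise ValueError("l must lie in [0, 1]")
    if any(a <= 0 for a in g_a) or any(b <= 0 for b in g_b):
        raise ValueError("gains must be strictly positive")
    return [a ** l * b ** (1 - l) for a, b in zip(g_a, g_b)]


def apply_gain(latent, gain):
    """Encoder-side gain unit: scale each latent channel by its gain.

    latent: list of channels, each a list of values (hypothetical layout).
    """
    return [[v * g for v in channel] for channel, g in zip(latent, gain)]
```

Geometric rather than linear interpolation keeps the ratio between neighboring gain vectors uniform in log space, which matches the intuition that quantization step size, and hence rate, varies multiplicatively.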

Submitted 2 August, 2022; v1 submitted 4 March, 2020; originally announced March 2020.

Comments: Accepted by CVPR 2021

arXiv:1912.01242 [pdf, other]

"How do urban incidents affect traffic speed?" A Deep Graph Convolutional Network for Incident-driven Traffic Speed Prediction

Authors: Qinge Xie, Tiancheng Guo, Yang Chen, Yu Xiao, Xin Wang, Ben Y. Zhao

Abstract: Accurate traffic speed prediction is an important and challenging topic for transportation planning. Previous studies on traffic speed prediction predominantly used spatio-temporal and context features. However, they have not made good use of the impact of urban traffic incidents. In this work, we aim to use information about urban incidents to achieve better traffic speed prediction. Our incident-driven prediction framework consists of three processes. First, we propose a critical incident discovery method to discover urban traffic incidents with high impact on traffic speed. Second, we design a binary classifier that uses deep learning methods to extract latent incident impact features from its middle layer. Combining the above methods, we propose a Deep Incident-Aware Graph Convolutional Network (DIGC-Net) to effectively incorporate urban traffic incident, spatio-temporal, periodic and context features for traffic speed prediction. We conduct experiments on two real-world urban traffic datasets from San Francisco and New York City. The results demonstrate the superior performance of our model compared to the competing benchmarks.
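The graph convolution at the core of a network like DIGC-Net can be illustrated with the standard GCN layer, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where nodes might be road segments and H their per-segment features. This is the generic textbook layer, written with plain lists for self-containment; the abstract does not specify DIGC-Net's exact propagation rule, and the function names are hypothetical.

```python
import math


def normalized_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops added."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Dense matrix product of list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, features, weights):
    """One graph-convolution layer: ReLU(A_norm @ X @ W)."""
    a_norm = normalized_adjacency(adj)
    out = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in out]
```

Each node's output mixes its own features with its neighbors', so one layer spreads information one hop along the road graph; stacking layers widens the receptive field.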

Submitted 3 December, 2019; originally announced December 2019.

Comments: 18 pages, 8 figures

arXiv:1909.04572 [pdf, other]

doi 10.1109/TIP.2019.2942510

Deep MR Brain Image Super-Resolution Using Spatio-Structural Priors

Authors: Venkateswararao Cherukuri, Tiantong Guo, Steve. J. Schiff, Vishal Monga

Abstract: High-resolution Magnetic Resonance (MR) images are desired for accurate diagnostics. In practice, image resolution is restricted by factors like hardware and processing constraints. Recently, deep learning methods have been shown to produce compelling state-of-the-art results for image enhancement/super-resolution. Paying particular attention to desired high-resolution MR image structure, we propose a new regularized network that exploits image priors, namely a low-rank structure and a sharpness prior, to enhance deep MR image super-resolution (SR). Our contributions are incorporating these priors in an analytically tractable fashion as well as a novel prior-guided network architecture that accomplishes the super-resolution task. This is particularly challenging for the low-rank prior, since the rank is not a differentiable function of the image matrix (and hence of the network parameters), an issue we address by pursuing differentiable approximations of the rank. Sharpness is emphasized by the variance of the Laplacian, which we show can be implemented by a fixed feedback layer at the output of the network. As a key extension, we modify the fixed feedback (Laplacian) layer by learning a new set of training-data-driven filters that are optimized for enhanced sharpness. Experiments performed on publicly available MR brain image databases and comparisons against existing state-of-the-art methods show that the proposed prior-guided network offers significant practical gains in terms of improved SNR/image quality measures. Because our priors are on output images, the proposed method is versatile and can be combined with a wide variety of existing network architectures to further enhance their performance.
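The sharpness measure described above, the variance of the Laplacian, can be computed as follows. The sketch applies the common 3x3 discrete Laplacian kernel [[0, 1, 0], [1, -4, 1], [0, 1, 0]] to interior pixels of a grayscale image and returns the population variance of the responses; higher values indicate more edge content. It only shows the measure itself; in the paper the Laplacian is realized as a fixed feedback layer and used as a training prior, and the function name here is hypothetical.

```python
def laplacian_variance(img):
    """Variance of the 3x3 discrete Laplacian over interior pixels (higher = sharper).

    img: 2D grayscale image as a list of equal-length rows.
    """
    h, w = len(img), len(img[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3")
    vals = []
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            # 4-neighbour Laplacian response at (r, c)
            lap = (img[r - 1][c] + img[r + 1][c] + img[r][c - 1] + img[r][c + 1]
                   - 4 * img[r][c])
            vals.append(lap)
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)
```

A flat image scores 0, while a sharp step edge produces alternating positive and negative responses on either side of the edge and thus a positive variance; blurring the edge spreads and weakens those responses, lowering the score.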

Submitted 10 September, 2019; originally announced September 2019.

Comments: Accepted to IEEE Transactions on Image Processing

arXiv:1901.07061 [pdf, other]

Prior Information Guided Regularized Deep Learning for Cell Nucleus Detection

Authors: Mohammad Tofighi, Tiantong Guo, Jairam K. P. Vanamala, Vishal Monga

Abstract: Cell nuclei detection is a challenging research topic because of limitations in cellular image quality and the diversity of nuclear morphology, i.e. varying nuclei shapes, sizes, and overlaps between multiple cell nuclei. This has been a topic of enduring interest, with promising recent success shown by deep learning methods. These methods train Convolutional Neural Networks (CNNs) with a training set of input images and known, labeled nuclei locations. Many such methods are supplemented by spatial or morphological processing. Using a set of canonical cell nuclei shapes, prepared with the help of a domain expert, we develop a new approach that we call Shape Priors with Convolutional Neural Networks (SP-CNN). We further extend the network to introduce a shape prior (SP) layer and then allow it to become trainable (i.e. optimizable). We call this network tunable SP-CNN (TSP-CNN). In summary, we present new network structures that can incorporate 'expected behavior' of nucleus shapes via two components: learnable layers that perform the nucleus detection and a fixed processing part that guides the learning with prior information. Analytically, we formulate two new regularization terms that are targeted at: 1) learning the shapes, and 2) reducing false positives while simultaneously encouraging detection inside the cell nucleus boundary. Experimental results on two challenging datasets reveal that the proposed SP-CNN and TSP-CNN can outperform state-of-the-art alternatives.

Submitted 21 January, 2019; originally announced January 2019.

Comments: Accepted for Publication

Journal ref: IEEE Transactions on Medical Imaging, January 2019

arXiv:1809.03140 [pdf, other]

Deep MR Image Super-Resolution Using Structural Priors

Authors: Venkateswararao Cherukuri, Tiantong Guo, Steven J. Schiff, Vishal Monga

Abstract: High-resolution magnetic resonance (MR) images are desired for accurate diagnostics. In practice, image resolution is restricted by factors like hardware, cost and processing constraints. Recently, deep learning methods have been shown to produce compelling state-of-the-art results for image super-resolution. Paying particular attention to desired high-resolution MR image structure, we propose a new regularized network that exploits image priors, namely a low-rank structure and a sharpness prior, to enhance deep MR image super-resolution. Our contributions are incorporating these priors in an analytically tractable fashion in the learning of a convolutional neural network (CNN) that accomplishes the super-resolution task. This is particularly challenging for the low-rank prior, since the rank is not a differentiable function of the image matrix (and hence of the network parameters), an issue we address by pursuing differentiable approximations of the rank. Sharpness is emphasized by the variance of the Laplacian, which we show can be implemented by a fixed feedback layer at the output of the network. Experiments performed on two publicly available MR brain image databases exhibit promising results, particularly when training imagery is limited.

Submitted 10 September, 2018; originally announced September 2018.

Comments: Accepted to IEEE ICIP 2018

arXiv:1801.05458 [pdf, other]

Deep Network for Simultaneous Decomposition and Classification in UWB-SAR Imagery

Authors: Tiep Vu, Lam Nguyen, Tiantong Guo, Vishal Monga

Abstract: Classifying buried and obscured targets of interest from other natural and manmade clutter objects in the scene is an important problem for the U.S. Army. Targets of interest are often represented by signals captured using low-frequency (UHF to L-band) ultra-wideband (UWB) synthetic aperture radar (SAR) technology. This technology has been used in various applications, including ground penetration and sensing-through-the-wall. However, the technology still faces significant issues, including low-resolution SAR imagery in this particular frequency band, low radar cross sections (RCS), objects that are small compared to radar signal wavelengths, and heavy interference. The classification problem was first, and partially, addressed by the sparse representation-based classification (SRC) method, which can extract noise from signals and exploit cross-channel information. Despite providing promising results, SRC-related methods have drawbacks in representing nonlinear relations and dealing with larger training sets. In this paper, we propose a Simultaneous Decomposition and Classification Network (SDCN) to alleviate noise interference and enhance classification accuracy. The network contains two jointly trained sub-networks: the decomposition sub-network handles denoising, while the classification sub-network discriminates targets from confusers. Experimental results show significant improvements over a network without decomposition and over SRC-related methods.

Submitted 22 February, 2018; v1 submitted 16 January, 2018; originally announced January 2018.

Showing 1–33 of 33 results for author: Guo, T