Search | arXiv e-print repository

Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation

Authors: Yuan Gan, Jiaxu Miao, Yunze Wang, Yi Yang

Abstract: Advances in talking-head animation based on Latent Diffusion Models (LDM) enable the creation of highly realistic, synchronized videos. These fabricated videos are indistinguishable from real ones, increasing the risk of potential misuse for scams, political manipulation, and misinformation. Hence, addressing these ethical concerns has become a pressing issue in AI security. Recent proactive defense studies focused on countering LDM-based models by adding perturbations to portraits. However, these methods are ineffective at protecting reference portraits from advanced image-to-video animation. The limitations are twofold: 1) they fail to prevent images from being manipulated by audio signals, and 2) diffusion-based purification techniques can effectively eliminate protective perturbations. To address these challenges, we propose Silencer, a two-stage method designed to proactively protect the privacy of portraits. First, a nullifying loss is proposed to ignore audio control in talking-head generation. Second, we apply anti-purification loss in LDM to optimize the inverted latent feature to generate robust perturbations. Extensive experiments demonstrate the effectiveness of Silencer in proactively protecting portrait privacy. We hope this work will raise awareness among the AI security community regarding critical ethical issues related to talking-head generation techniques. Code: https://github.com/yuangan/Silencer.

Submitted 2 June, 2025; originally announced June 2025.

Comments: Accepted to CVPR 2025

arXiv:2405.14251

Efficient Navigation of a Robotic Fish Swimming Across the Vortical Flow Field

Authors: Haodong Feng, Dehan Yuan, Jiale Miao, Jie You, Yue Wang, Yi Zhu, Dixia Fan

Abstract: Navigating efficiently across vortical flow fields presents a significant challenge in various robotic applications. The dynamic and unsteady nature of vortical flows often disturbs the control of underwater robots, complicating their operation in hydrodynamic environments. Conventional control methods, which depend on accurate modeling, fail in these settings due to the complexity of fluid-structure interactions (FSI) caused by unsteady hydrodynamics. This study proposes a deep reinforcement learning (DRL) algorithm, trained in a data-driven manner, to enable efficient navigation of a robotic fish swimming across vortical flows. Our proposed algorithm incorporates the LSTM architecture and uses several recent consecutive observations as the state to address the issue of partial observation, often due to sensor limitations. We present a numerical study of navigation within a Karman vortex street, created by placing a stationary cylinder in a uniform flow, utilizing the immersed boundary-lattice Boltzmann method (IB-LBM). The aim is to train the robotic fish to discover efficient navigation policies, enabling it to reach a designated target point across the Karman vortex street from various initial positions. After training, the fish demonstrates the ability to rapidly reach the target from different initial positions, showcasing the effectiveness and robustness of our proposed algorithm. Analysis of the results reveals that the robotic fish can leverage velocity gains and pressure differences induced by the vortices to reach the target, underscoring the potential of our proposed algorithm in enhancing navigation in complex hydrodynamic environments.

Submitted 27 September, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

Comments: We would like to request the withdrawal of our submission due to some misunderstandings among the co-authors concerning the submission process. It appears that the current version was submitted before we reached a consensus among all authors. We are actively working to address these matters and plan to resubmit a revised version once we achieve agreement

arXiv:2403.16286 [pdf, other]

HemoSet: The First Blood Segmentation Dataset for Automation of Hemostasis Management

Authors: Albert J. Miao, Shan Lin, Jingpei Lu, Florian Richter, Benjamin Ostrander, Emily K. Funk, Ryan K. Orosco, Michael C. Yip

Abstract: Hemorrhaging occurs in surgeries of all types, forcing surgeons to quickly adapt to the visual interference that results from blood rapidly filling the surgical field. Introducing automation into the crucial surgical task of hemostasis management would offload mental and physical tasks from the surgeon and surgical assistants while simultaneously increasing the efficiency and safety of the operation. The first step in automation of hemostasis management is detection of blood in the surgical field. To propel the development of blood detection algorithms in surgeries, we present HemoSet, the first blood segmentation dataset based on bleeding during a live animal robotic surgery. Our dataset features vessel hemorrhage scenarios where turbulent flow leads to abnormal pooling geometries in surgical fields. These pools are formed in conditions endemic to surgical procedures -- uneven heterogeneous tissue, under glossy lighting conditions and rapid tool movement. We benchmark several state-of-the-art segmentation models and provide insight into the difficulties specific to blood detection. We intend for HemoSet to spur development of autonomous blood suction tools by providing a platform for training and refining blood segmentation models, addressing the precision needed for such robotics.

Submitted 2 June, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

arXiv:2308.06614 [pdf, other]

A Fog-based Smart Agriculture System to Detect Animal Intrusion

Authors: Jinpeng Miao, Dasari Rajasekhar, Shivakant Mishra, Sanjeet Kumar Nayak, Ramanarayan Yadav

Abstract: Smart agriculture is one of the most promising areas where IoT-enabled technologies have the potential to substantially improve the quality and quantity of the crops and reduce the associated operational cost. However, building a smart agriculture system presents several challenges, including high latency and bandwidth consumption associated with cloud computing, frequent Internet disconnections in rural areas, and the need to keep costs low for farmers. This paper presents an end-to-end, fog-based smart agriculture infrastructure that incorporates edge computing and LoRa-based communication to address these challenges. Our system is deployed to transform traditional agriculture land of rural areas into smart agriculture. We address the top concern of farmers - animals intruding - by proposing a solution that detects animal intrusion using low-cost PIR sensors, cameras, and computer vision. In particular, we propose three different sensor layouts and a novel algorithm for predicting animals' future locations. Our system can detect animals before they intrude into the field, identify them, predict their future locations, and alert farmers in a timely manner. Our experiments show that the system can effectively and quickly detect animal intrusions while maintaining a much lower cost than current state-of-the-art systems.

Submitted 12 August, 2023; originally announced August 2023.

Comments: 9 pages, 16 figures

arXiv:2305.00416 [pdf, other]

Quaternion Matrix Completion Using Untrained Quaternion Convolutional Neural Network for Color Image Inpainting

Authors: Jifei Miao, Kit Ian Kou, Liqiao Yang, Juan Han

Abstract: The use of quaternions as a novel tool for color image representation has yielded impressive results in color image processing. By considering the color image as a unified entity rather than separate color space components, quaternions can effectively exploit the strong correlation among the RGB channels, leading to enhanced performance. In particular, color image inpainting has benefited greatly from quaternion matrix completion techniques in recent years. However, existing quaternion matrix completion methods suffer from two major drawbacks. First, it can be difficult to choose a regularizer that captures the common characteristics of natural images, and sometimes the regularizer that is chosen based on empirical evidence may not be the optimal or efficient option. Second, the optimization process of quaternion matrix completion models is quite challenging because of the non-commutativity of quaternion multiplication. To address these two drawbacks, this paper proposes using an untrained quaternion convolutional neural network (QCNN) to directly generate the completed quaternion matrix. This approach replaces the explicit regularization term in the quaternion matrix completion model with an implicit prior that is learned by the QCNN. Extensive quantitative and qualitative evaluations demonstrate the superiority of the proposed method for color image inpainting compared with some existing quaternion-based and tensor-based methods.

Submitted 30 April, 2023; originally announced May 2023.
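
Note on the entry above: the abstract names the non-commutativity of quaternion multiplication as a source of optimization difficulty. The following is a minimal illustrative sketch in plain Python, not code from the paper. The helper names `hamilton` and `rgb_to_quaternion` are hypothetical, and storing an RGB pixel in the three imaginary parts of a pure quaternion is assumed from the convention described in other quaternion entries in this listing.

```python
def hamilton(p, q):
    """Hamilton product of two quaternions given as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i part
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j part
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k part
    )


def rgb_to_quaternion(r, g, b):
    """Encode one RGB pixel as a pure quaternion (zero real part). Assumed convention."""
    return (0.0, float(r), float(g), float(b))


# Basis elements: i * j = k, but j * i = -k.
i_unit = (0, 1, 0, 0)
j_unit = (0, 0, 1, 0)
ij = hamilton(i_unit, j_unit)
ji = hamilton(j_unit, i_unit)

# Two example pixels: swapping the operands flips the sign of the imaginary parts.
p = rgb_to_quaternion(1, 2, 3)
q = rgb_to_quaternion(4, 5, 6)
pq = hamilton(p, q)
qp = hamilton(q, p)
```

Because `pq` and `qp` differ, identities that rely on reordering scalar factors in real or complex matrix algebra do not carry over directly, which is one way to read the difficulty the abstract mentions.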

arXiv:2212.08361 [pdf, other]

Quaternion Tensor Completion with Sparseness for Color Video Recovery

Authors: Liqiao Yang, Kit Ian Kou, Jifei Miao, Yang Liu, Maggie Pui Man Hoi

Abstract: A novel low-rank completion algorithm based on the quaternion tensor is proposed in this paper. This approach uses the TQt-rank of quaternion tensor to maintain the structure of RGB channels throughout the entire process. In more detail, the pixels in each frame are encoded on three imaginary parts of a quaternion as an element in a quaternion matrix. Each quaternion matrix is then stacked into a quaternion tensor. A logarithmic function and truncated nuclear norm are employed to characterize the rank of the quaternion tensor in order to promote the low rankness of the tensor. Moreover, by introducing a newly defined quaternion tensor discrete cosine transform-based (QTDCT) regularization to the low-rank approximation framework, the optimized recovery results can be obtained in the local details of color videos. In particular, the sparsity of the quaternion tensor is reasonably characterized by l1 norm in the QDCT domain. This strategy is optimized via the two-step alternating direction method of multipliers (ADMM) framework. Numerical experimental results for recovering color videos show the obvious advantage of the proposed method over other potential competing approaches.

Submitted 16 December, 2022; originally announced December 2022.

arXiv:2211.12793 [pdf, other]

Low Rank Quaternion Matrix Completion Based on Quaternion QR Decomposition and Sparse Regularizer

Authors: Juan Han, Liqiao Yang, Kit Ian Kou, Jifei Miao, Lizhi Liu

Abstract: Matrix completion is one of the most challenging problems in computer vision. Recently, quaternion representations of color images have achieved competitive performance in many fields. Because it treats the color image as a whole, the coupling information between the three channels of the color image is better utilized. Due to this, low-rank quaternion matrix completion (LRQMC) algorithms have gained considerable attention from researchers. In contrast to the traditional quaternion matrix completion algorithms based on quaternion singular value decomposition (QSVD), we propose a novel method based on quaternion QR decomposition (QQR). In the first part of the paper, a novel method for calculating an approximate QSVD based on iterative QQR is proposed (CQSVD-QQR), whose computational complexity is lower than that of QSVD. The largest $r \ (r>0)$ singular values of a given quaternion matrix can be computed by using CQSVD-QQR. Then, we propose a new quaternion matrix completion method based on CQSVD-QQR which combines low-rank and sparse priors of color images. Experimental results on color images and color medical images demonstrate that our model outperforms state-of-the-art methods.

Submitted 23 November, 2022; originally announced November 2022.

arXiv:2210.16674 [pdf, other]

Semantic-SuPer: A Semantic-aware Surgical Perception Framework for Endoscopic Tissue Identification, Reconstruction, and Tracking

Authors: Shan Lin, Albert J. Miao, Jingpei Lu, Shunkai Yu, Zih-Yun Chiu, Florian Richter, Michael C. Yip

Abstract: Accurate and robust tracking and reconstruction of the surgical scene is a critical enabling technology toward autonomous robotic surgery. Existing algorithms for 3D perception in surgery mainly rely on geometric information, while we propose to also leverage semantic information inferred from the endoscopic video using image segmentation algorithms. In this paper, we present a novel, comprehensive surgical perception framework, Semantic-SuPer, that integrates geometric and semantic information to facilitate data association, 3D reconstruction, and tracking of endoscopic scenes, benefiting downstream tasks like surgical navigation. The proposed framework is demonstrated on challenging endoscopic data with deforming tissue, showing its advantages over our baseline and several other state-of-the-art approaches. Our code and dataset are available at https://github.com/ucsdarclab/Python-SuPer.

Submitted 20 February, 2023; v1 submitted 29 October, 2022; originally announced October 2022.

Comments: IEEE International Conference on Robotics and Automation (ICRA) 2023

arXiv:2209.02964 [pdf, other]

Quaternion Tensor Train Rank Minimization with Sparse Regularization in a Transformed Domain for Quaternion Tensor Completion

Authors: Jifei Miao, Kit Ian Kou, Liqiao Yang, Dong Cheng

Abstract: The tensor train rank (TT-rank) has achieved promising results in tensor completion due to its ability to capture the global low-rankness of higher-order (>3) tensors. On the other hand, recently, quaternions have proven to be a very suitable framework for encoding color pixels, and have obtained outstanding performance in various color image processing tasks. In this paper, the quaternion tensor train (QTT) decomposition is presented, and based on that the quaternion TT-rank (QTT-rank) is naturally defined, which are the generalizations of their counterparts in the real number field. In addition, to utilize the local sparse prior of the quaternion tensor, a general and flexible transform framework is defined. Combining both the global low-rank and local sparse priors of the quaternion tensor, we propose a novel quaternion tensor completion model, i.e., QTT-rank minimization with sparse regularization in a transformed domain. Specifically, we use the quaternion weighted nuclear norm (QWNN) of mode-n canonical unfolding quaternion matrices to characterize the global low-QTT-rankness, and the l1-norm of the quaternion tensor in a transformed domain to characterize the local sparse property. Moreover, to enable the QTT-rank minimization to handle color images and better handle color videos, we generalize KA, a tensor augmentation method, to quaternion tensors and define quaternion KA (QKA), which is a helpful pretreatment step for QTT-rank based optimization problems. The numerical experiments on color images and color videos inpainting tasks indicate the advantages of the proposed method over the state-of-the-art ones.

Submitted 7 September, 2022; originally announced September 2022.

arXiv:2207.01287 [pdf, other]

FFCNet: Fourier Transform-Based Frequency Learning and Complex Convolutional Network for Colon Disease Classification

Authors: Kai-Ni Wang, Yuting He, Shuaishuai Zhuang, Juzheng Miao, Xiaopu He, Ping Zhou, Guanyu Yang, Guang-Quan Zhou, Shuo Li

Abstract: Reliable automatic classification of colonoscopy images is of great significance in assessing the stage of colonic lesions and formulating appropriate treatment plans. However, it is challenging due to uneven brightness, location variability, inter-class similarity, and intra-class dissimilarity, affecting the classification accuracy. To address the above issues, we propose a Fourier-based Frequency Complex Network (FFCNet) for colon disease classification in this study. Specifically, FFCNet is a novel complex network that enables the combination of complex convolutional networks with frequency learning to overcome the loss of phase information caused by real convolution operations. Also, our Fourier transform transfers the average brightness of an image to a point in the spectrum (the DC component), alleviating the effects of uneven brightness by decoupling image content and brightness. Moreover, the image patch scrambling module in FFCNet generates random local spectral blocks, empowering the network to learn long-range and local disease-specific features and improving the discriminative ability of hard samples. We evaluated the proposed FFCNet on an in-house dataset with 2568 colonoscopy images, showing our method achieves high performance, outperforming previous state-of-the-art methods with an accuracy of 86.35% and an accuracy 4.46% higher than the backbone. The project page with code is available at https://github.com/soleilssss/FFCNet.

Submitted 4 July, 2022; originally announced July 2022.

Comments: Accepted for publication at the 25th International Conference on Medical Image Computing and Computer Assisted Intervention - MICCAI 2022
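
Note on the entry above: the abstract states that the Fourier transform moves an image's average brightness into a single spectral point, the DC component. The following is a minimal sketch of that property using a naive 2-D DFT in plain Python. It is not the FFCNet implementation; the function name `dft2` and the tiny 2x2 example image are assumptions made for illustration.

```python
import cmath


def dft2(img):
    """Naive 2-D discrete Fourier transform of a list-of-lists image (unnormalized)."""
    h, w = len(img), len(img[0])
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            total = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2j * cmath.pi * (u * y / h + v * x / w)
                    total += img[y][x] * cmath.exp(angle)
            out[u][v] = total
    return out


# Hypothetical tiny image and a uniformly brighter copy of it.
image = [[10, 20], [30, 40]]
brighter = [[pixel + 50 for pixel in row] for row in image]

spec = dft2(image)
spec_brighter = dft2(brighter)

# The (0, 0) coefficient is the pixel sum, so dividing by the pixel count gives the mean.
mean_from_dc = spec[0][0].real / 4

# A uniform brightness offset changes only the DC term; compare all other coefficients.
max_non_dc_change = max(
    abs(spec_brighter[u][v] - spec[u][v])
    for u in range(2)
    for v in range(2)
    if (u, v) != (0, 0)
)
```

In this sketch the brightness shift appears only in `spec_brighter[0][0]`, while the remaining coefficients agree up to floating-point rounding, which is the sense in which content and brightness are decoupled in the frequency domain.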

arXiv:2201.05344 [pdf, other]

AWSnet: An Auto-weighted Supervision Attention Network for Myocardial Scar and Edema Segmentation in Multi-sequence Cardiac Magnetic Resonance Images

Authors: Kai-Ni Wang, Xin Yang, Juzheng Miao, Lei Li, Jing Yao, Ping Zhou, Wufeng Xue, Guang-Quan Zhou, Xiahai Zhuang, Dong Ni

Abstract: Multi-sequence cardiac magnetic resonance (CMR) provides essential pathology information (scar and edema) to diagnose myocardial infarction. However, automatic pathology segmentation can be challenging due to the difficulty of effectively exploring the underlying information from the multi-sequence CMR data. This paper aims to tackle the scar and edema segmentation from multi-sequence CMR with a novel auto-weighted supervision framework, where the interactions among different supervised layers are explored under a task-specific objective using reinforcement learning. Furthermore, we design a coarse-to-fine framework to boost the small myocardial pathology region segmentation with shape prior knowledge. The coarse segmentation model identifies the left ventricle myocardial structure as a shape prior, while the fine segmentation model integrates a pixel-wise attention strategy with an auto-weighted supervision model to learn and extract salient pathological structures from the multi-sequence CMR data. Extensive experimental results on a publicly available dataset from Myocardial pathology segmentation combining multi-sequence CMR (MyoPS 2020) demonstrate our method can achieve promising performance compared with other state-of-the-art methods. Our method is promising in advancing the myocardial pathology assessment on multi-sequence CMR data. To motivate the community, we have made our code publicly available via https://github.com/soleilssss/AWSnet/tree/master.

Submitted 14 January, 2022; originally announced January 2022.

Comments: 19 pages, 10 figures, accepted by Medical Image Analysis

arXiv:2112.13982 [pdf, other]

Quaternion-based dynamic mode decomposition for background modeling in color videos

Authors: Juan Han, Kit Ian Kou, Jifei Miao

Abstract: Scene Background Initialization (SBI) is one of the challenging problems in computer vision. Dynamic mode decomposition (DMD) is a recently proposed method to robustly decompose a video sequence into the background model and the corresponding foreground part. However, this method needs to convert the color image into the grayscale image for processing, which leads to the neglect of the coupling information between the three channels of the color image. In this study, we propose a quaternion-based DMD (Q-DMD), which extends the DMD by quaternion matrix analysis, so as to completely preserve the inherent color structure of the color image and the color video. We exploit the standard eigenvalues of the quaternion matrix to compute its spectral decomposition and calculate the corresponding Q-DMD modes and eigenvalues. The results on the publicly available benchmark datasets prove that our Q-DMD outperforms the exact DMD method, and experiment results also demonstrate that the performance of our approach is comparable to that of the state-of-the-art ones.

Submitted 27 December, 2021; originally announced December 2021.

Comments: 16 pages

arXiv:2109.14797 [pdf, other]

Emergency Vehicles Audio Detection and Localization in Autonomous Driving

Authors: Hongyi Sun, Xinyi Liu, Kecheng Xu, Jinghao Miao, Qi Luo

Abstract: Emergency vehicles in service have right-of-way over all other vehicles. Hence, all other vehicles are supposed to take proper actions to yield to emergency vehicles with active sirens. As this task requires the cooperation between ears and eyes for human drivers, it also needs audio detection as a supplement to vision-based algorithms for fully autonomous driving vehicles. In urban driving scenarios, we need to know both the existence of emergency vehicles and their relative positions to us to decide the proper actions. We present a novel system from collecting the real-world siren data to the deployment of models using only two cost-efficient microphones. We are able to achieve promising performance for each task separately, especially within the crucial 10m to 50m distance range to react (the size of our ego vehicle is around 5m in length and 2m in width). The recall rate to determine the existence of sirens is 99.16%, the median and mean absolute angle errors are 9.64° and 19.18° respectively, and the median and mean absolute distance errors are 9.30m and 10.58m respectively within that range. We also benchmark various machine learning approaches that can determine the siren existence and sound source localization, which includes direction and distance, simultaneously within 50ms of latency.

Submitted 1 October, 2021; v1 submitted 29 September, 2021; originally announced September 2021.

arXiv:2107.01380 [pdf, other]

Low Rank Quaternion Matrix Recovery via Logarithmic Approximation

Authors: Liqiao Yang, Jifei Miao, Kit Ian Kou

Abstract: In color image processing, image completion aims to restore missing entries from the incomplete observation image. Recently, great progress has been made in achieving completion by approximately solving the rank minimization problem. In this paper, we utilize a novel quaternion matrix logarithmic norm to approximate rank under the quaternion matrix framework. On the one hand, unlike the traditional matrix completion method that handles RGB channels separately, the quaternion-based method is able to avoid destroying the structure of images by putting the color image in a pure quaternion matrix. On the other hand, the logarithmic norm induces a more accurate rank surrogate. Based on the logarithmic norm, we take advantage of not only a truncation technique but also a factorization strategy to achieve image restoration. Both strategies are optimized based on the alternating minimization framework. The experimental results demonstrate that the use of logarithmic surrogates in the quaternion domain is superior in solving the problem of color image completion.

Submitted 3 July, 2021; originally announced July 2021.

Comments: 35 pages, 7 figures

arXiv:2101.02443 [pdf, other]

Weighted Truncated Nuclear Norm Regularization for Low-Rank Quaternion Matrix Completion

Authors: Liqiao Yang, Kit Ian Kou, Jifei Miao

Abstract: In recent years, quaternion matrix completion (QMC) based on low-rank regularization has been gradually used in image de-noising and de-blurring. Unlike low-rank matrix completion (LRMC), which handles RGB images by recovering each color channel separately, the QMC models utilize the connection of three channels by processing them as a whole. Most of the existing quaternion-based methods formulate low-rank QMC (LRQMC) as a quaternion nuclear norm (a convex relaxation of the rank) minimization problem. The main limitation of these approaches is that the singular values are minimized simultaneously, so that the low-rank property cannot be approximated well and efficiently. To achieve a more accurate low-rank approximation, the matrix-based truncated nuclear norm has been proposed and proved to be superior. In this paper, we introduce a quaternion truncated nuclear norm (QTNN) for LRQMC and utilize the alternating direction method of multipliers (ADMM) to solve the optimization problem. We further propose weights to the residual error quaternion matrix during the update process for accelerating the convergence of the QTNN method with admissible performance. The weighted method utilizes a concise gradient descent strategy which has a theoretical guarantee in optimization. The effectiveness of our method is illustrated by experiments on real visual datasets.

Submitted 7 January, 2021; originally announced January 2021.

arXiv:2101.00364 [pdf, other]

Quaternion higher-order singular value decomposition and its applications in color image processing

Authors: Jifei Miao, Kit Ian Kou

Abstract: Higher-order singular value decomposition (HOSVD) is one of the most efficient tensor decomposition techniques. It has the salient ability to represent high-dimensional data and extract features. In more recent years, the quaternion has proven to be a very suitable tool for color pixel representation as it can well preserve cross-channel correlation of color channels. Motivated by the advantages of the HOSVD and the quaternion tool, in this paper, we generalize the HOSVD to the quaternion domain and define quaternion-based HOSVD (QHOSVD). Due to the non-commutability of quaternion multiplication, QHOSVD is not a trivial extension of the HOSVD. They have similar but different calculation procedures. The defined QHOSVD can be widely used in various visual data processing with color pixels. In this paper, we present two applications of the defined QHOSVD in color image processing: multi-focus color image fusion and color image denoising. The experimental results on the two applications respectively demonstrate the competitive performance of the proposed methods over some existing ones.

Submitted 1 January, 2021; originally announced January 2021.

arXiv:2011.04250 [pdf]

A Learning-Based Tune-Free Control Framework for Large Scale Autonomous Driving System Deployment

Authors: Yu Wang, Shu Jiang, Weiman Lin, Yu Cao, Longtao Lin, Jiangtao Hu, Jinghao Miao, Qi Luo

Abstract: This paper presents the design of a tune-free (human-out-of-the-loop parameter tuning) control framework, aiming at accelerating large-scale autonomous driving system deployment on various vehicles and driving environments. The framework consists of three machine-learning-based procedures, which jointly automate the control parameter tuning for autonomous driving, including: a learning-based dynamic modeling procedure, to enable the control-in-the-loop simulation with highly accurate vehicle dynamics for parameter tuning; a learning-based open-loop mapping procedure, to solve the feedforward control parameter tuning; and more significantly, a Bayesian-optimization-based closed-loop parameter tuning procedure, to automatically tune feedback control (PID, LQR, MRAC, MPC, etc.) parameters in a simulation environment. The paper shows an improvement in control performance with a significant increase in parameter tuning efficiency, in both simulation and road tests. This framework has been validated on different vehicles in the US and China.

Submitted 9 November, 2020; originally announced November 2020.

Comments: 8 pages, 12 figures

arXiv:2010.09776 [pdf, other]

SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving

Authors: Ming Zhou, Jun Luo, Julian Villella, Yaodong Yang, David Rusu, Jiayu Miao, Weinan Zhang, Montgomery Alban, Iman Fadakar, Zheng Chen, Aurora Chongxi Huang, Ying Wen, Kimia Hassanzadeh, Daniel Graves, Dong Chen, Zhengbang Zhu, Nhat Nguyen, Mohamed Elsayed, Kun Shao, Sanjeevan Ahilan, Baokuan Zhang, Jiannan Wu, Zhengang Fu, Kasra Rezaee, Peyman Yadmellat , et al. (12 additional authors not shown)

Abstract: Multi-agent interaction is a fundamental aspect of autonomous driving in the real world. Despite more than a decade of research and development, the problem of how to competently interact with diverse road users in diverse scenarios remains largely unsolved. Learning methods have much to offer towards solving this problem. But they require a realistic multi-agent simulator that generates diverse and competent driving interactions. To meet this need, we develop a dedicated simulation platform called SMARTS (Scalable Multi-Agent RL Training School). SMARTS supports the training, accumulation, and use of diverse behavior models of road users. These are in turn used to create increasingly more realistic and diverse interactions that enable deeper and broader research on multi-agent interaction. In this paper, we describe the design goals of SMARTS, explain its basic architecture and its key features, and illustrate its use through concrete multi-agent experiments on interactive scenarios. We open-source the SMARTS platform and the associated benchmark tasks and evaluation metrics to encourage and empower research on multi-agent learning for autonomous driving. Our code is available at https://github.com/huawei-noah/SMARTS.

Submitted 31 October, 2020; v1 submitted 19 October, 2020; originally announced October 2020.

Comments: 20 pages, 11 figures. Paper accepted to CoRL 2020

arXiv:2006.00749 [pdf, other]

Constrained low-rank quaternion approximation for color image denoising by bilateral random projections

Authors: Jifei Miao, Kit Ian Kou

Abstract: In this letter, we propose a novel low-rank quaternion approximation (LRQA) model by directly constraining the quaternion rank prior for effectively removing the noise in color images. The LRQA model treats the color image holistically rather than independently for the color space components, thus it can fully utilize the high correlation among RGB channels. We design an iterative algorithm by using quaternion bilateral random projections (Q-BRP) to efficiently optimize the proposed model. The main advantage of Q-BRP is that the approximation of the low-rank quaternion matrix can be obtained quite accurately in an inexpensive way. Furthermore, the color image denoising is also based on the nonlocal self-similarity (NSS) prior. The experimental results on color image denoising illustrate the effectiveness and superiority of the proposed method.

Submitted 1 June, 2020; originally announced June 2020.

arXiv:2005.02886 [pdf, other]

doi 10.1109/TSP.2020.3025519

Quaternion-based bilinear factor matrix norm minimization for color image inpainting

Authors: Jifei Miao, Kit Ian Kou

Abstract: As a new color image representation tool, the quaternion has achieved excellent results in color image processing, because it treats the color image as a whole rather than as separate color space components, and thus it can make full use of the high correlation among RGB channels. Recently, low-rank quaternion matrix completion (LRQMC) methods have proven very useful for color image inpainting. In this paper, we propose three novel LRQMC methods based on three quaternion-based bilinear factor (QBF) matrix norm minimization models. Specifically, we define the quaternion double Frobenius norm (Q-DFN), quaternion double nuclear norm (Q-DNN) and quaternion Frobenius/nuclear norm (Q-FNN), and then show their relationship with the quaternion-based matrix Schatten-p (Q-Schatten-p) norm for certain p values. The proposed methods can avoid computing quaternion singular value decompositions (QSVD) for large quaternion matrices, and thus can effectively reduce the calculation time compared with existing LRQMC methods. The experimental results demonstrate the superior performance of the proposed methods over some state-of-the-art low-rank (quaternion) matrix completion methods.

Submitted 6 May, 2020; originally announced May 2020.

arXiv:2004.10445 [pdf, other]

RESIRE: real space iterative reconstruction engine for Tomography

Authors: Minh Pham, Yakun Yuan, Arjun Rana, Jianwei Miao, Stanley Osher

Abstract: Tomography has made a revolutionary impact on diverse fields, ranging from macro-/mesoscopic scale studies in biology, radiology, and plasma physics to the characterization of 3D atomic structure in materials science. The fundamental task of tomography is to reconstruct a 3D object from a set of 2D projections. To solve the tomography problem, many algorithms have been developed. Among them are methods using transformation techniques, such as computed tomography (CT) based on the Radon transform and Generalized Fourier iterative reconstruction (GENFIRE) based on the Fourier slice theorem (FST), and direct methods, such as Simultaneous Iterative Reconstruction Technique (SIRT) and Simultaneous Algebraic Reconstruction Technique (SART), using gradient descent and algebraic techniques. In this paper, we propose a hybrid gradient descent to solve the tomography problem by combining the Fourier slice theorem and calculus of variations. By using simulated and experimental data, we show that the state-of-the-art RESIRE can produce superior results compared with previous methods; the reconstructed objects have higher quality and smaller relative errors. More importantly, RESIRE can deal rigorously with partially blocked projections, where only part of the projection information is provided, while other methods fail. We anticipate RESIRE will not only improve the reconstruction quality in all existing tomographic applications, but also expand the tomography method to a broad class of functional thin films. We expect RESIRE to find broad applications across diverse disciplines.

Submitted 25 April, 2020; v1 submitted 22 April, 2020; originally announced April 2020.

arXiv:1909.06567 [pdf, other]

Color image recovery using low-rank quaternion matrix completion algorithm

Authors: Jifei Miao, Kit Ian Kou

Abstract: As a new color image representation tool, the quaternion has achieved excellent results in color image processing problems. In this paper, we propose a novel low-rank quaternion matrix completion algorithm to recover missing data of color images. Motivated by two kinds of low-rank approximation approaches (low-rank decomposition and nuclear norm minimization) in traditional matrix-based methods, we combine the two approaches in our quaternion matrix-based model. Furthermore, the nuclear norm of the quaternion matrix is replaced by the sum of the Frobenius norms of its two low-rank factor quaternion matrices. Based on the relationship between a quaternion matrix and its equivalent complex matrix, the problem is eventually converted from the quaternion number field to the complex number field. An alternating minimization method is applied to solve the model. Simulation results on real-world color image recovery show the superior performance and efficiency of the proposed algorithm over some state-of-the-art tensor-based ones.

Submitted 14 September, 2019; originally announced September 2019.

arXiv:1906.01875 [pdf, other]

doi 10.1364/OE.27.031246

A semi-implicit relaxed Douglas-Rachford algorithm (sir-DR) for Ptychography

Authors: Minh Pham, Arjun Rana, Jianwei Miao, Stanley Osher

Abstract: Alternating projection based methods, such as ePIE and rPIE, have been used widely in ptychography. However, they only work well if there are adequate measurements (diffraction patterns); in the case of sparse data (i.e. fewer measurements), alternating projection underperforms and might not even converge. In this paper, we propose semi-implicit relaxed Douglas-Rachford (sir-DR), an accelerated iterative method, to solve the classical ptychography problem. Using both simulated and experimental data, we show that sir-DR improves the convergence speed and the reconstruction quality relative to ePIE and rPIE. Furthermore, in certain cases when sparsity is high, sir-DR converges while ePIE and rPIE fail. To help others use the algorithm, we post the Matlab source code of sir-DR on a public website (www.physics.ucla.edu/research/imaging/sir-DR). We anticipate that this algorithm can be generally applied to the ptychographic reconstruction of a wide range of samples in the physical and biological sciences.

Submitted 5 June, 2019; originally announced June 2019.

Showing 1–23 of 23 results for author: Miao, J