-
Spherical Pendulum with Quad-Rotor Thrust Vectoring Actuation -- A Novel Mechatronics and Control Benchmark Platform
Authors:
Yuchen Li,
Omar Curiel,
Sheng-Fan Wen,
Tsu-Chin Tsao
Abstract:
Motor-actuated pendulums have been established as arguably the most common laboratory prototypes used in control system education because of their relevance to robot manipulator control in industry. Meanwhile, multi-rotor drones such as quadcopters have become popular in industrial applications but have not been broadly employed in control education laboratories. Platforms with pendulums and multi-rotor copters present classical yet intriguing multi-degree-of-freedom (DoF) dynamics and coordinate systems for control system investigation. In this paper, we introduce a novel control platform in which a 2-DoF pendulum capable of azimuth and elevation rotation is actuated through vectored thrust generated by a quadcopter. Designed as a benchmark for mechatronics and nonlinear control education and research, the system integrates detailed mechatronic implementation with different control strategies. Specifically, we apply and compare small perturbation linearization (SPL), state feedback linearization (SFL), and partial feedback linearization (PFL) on the nonlinear system dynamics. Performance is evaluated by the time specifications of the step response and the root-mean-square (RMS) error of trajectory tracking. The robustness of the closed-loop system is validated under external disturbances, and both simulation and experimental results are presented to highlight the strengths and limitations of the nonlinear model-based control approaches.
Submitted 27 June, 2025;
originally announced June 2025.
-
DEKC: Data-Enable Control for Tethered Space Robot Deployment in the Presence of Uncertainty via Koopman Operator Theory
Authors:
Ao Jin,
Qinyi Wang,
Sijie Wen,
Ya Liu,
Ganghui Shen,
Panfeng Huang,
Fan Zhang
Abstract:
This work focuses on the deployment of a tethered space robot in the presence of unknown uncertainty. A data-enable framework called DEKC, which consists of an offline training part and an online execution part, is proposed for this purpose. The main idea is to model the unknown uncertainty as a dynamical system, which allows the uncertainty to be captured with high accuracy and convergence. The core of the proposed framework is a proxy model of the uncertainty, which is derived from data-driven Koopman theory and is decoupled from the controller design. In the offline stage, the lifting functions associated with the Koopman operator are parameterized with deep neural networks and then learned from sampled data by solving an optimization problem. In the online execution stage, the proxy model uses the lifting functions learned in the offline phase to capture the unknown uncertainty. The output of the proxy model is then supplied to the baseline controller as compensation, so that the effect of the uncertainty can be attenuated or even eliminated. Furthermore, for scenarios in which the performance of the proxy model may degrade, a receding-horizon scheme is proposed to update the proxy model online. Finally, extensive numerical simulations demonstrate the effectiveness of the proposed framework. The implementation of the DEKC framework is publicly available at https://github.com/NPU-RCIR/DEKC.git.
Submitted 9 June, 2025;
originally announced June 2025.
-
Noise Consistency Regularization for Improved Subject-Driven Image Synthesis
Authors:
Yao Ni,
Song Wen,
Piotr Koniusz,
Anoop Cherian
Abstract:
Fine-tuning Stable Diffusion enables subject-driven image synthesis by adapting the model to generate images containing specific subjects. However, existing fine-tuning methods suffer from two key issues: underfitting, where the model fails to reliably capture subject identity, and overfitting, where it memorizes the subject image and reduces background diversity. To address these challenges, we propose two auxiliary consistency losses for diffusion fine-tuning. First, a prior consistency regularization loss ensures that the predicted diffusion noise for prior (non-subject) images remains consistent with that of the pretrained model, improving fidelity. Second, a subject consistency regularization loss enhances the fine-tuned model's robustness to multiplicative noise modulated latent code, helping to preserve subject identity while improving diversity. Our experimental results demonstrate that incorporating these losses into fine-tuning not only preserves subject identity but also enhances image diversity, outperforming DreamBooth in terms of CLIP scores, background variation, and overall visual quality.
Submitted 6 June, 2025;
originally announced June 2025.
-
Optimal Parameter Adaptation for Safety-Critical Control via Safe Barrier Bayesian Optimization
Authors:
Shengbo Wang,
Ke Li,
Zheng Yan,
Zhenyuan Guo,
Song Zhu,
Guanghui Wen,
Shiping Wen
Abstract:
Safety is of paramount importance in control systems to avoid costly risks and catastrophic damages. The control barrier function (CBF) method, a promising solution for safety-critical control, poses a new challenge of enhancing control performance due to its direct modification of original control design and the introduction of uncalibrated parameters. In this work, we shed light on the crucial role of configurable parameters in the CBF method for performance enhancement with a systematical categorization. Based on that, we propose a novel framework combining the CBF method with Bayesian optimization (BO) to optimize the safe control performance. Considering feasibility/safety-critical constraints, we develop a safe version of BO using the barrier-based interior method to efficiently search for promising feasible configurable parameters. Furthermore, we provide theoretical criteria of our framework regarding safety and optimality. An essential advantage of our framework lies in that it can work in model-agnostic environments, leaving sufficient flexibility in designing objective and constraint functions. Finally, simulation experiments on swing-up control and high-fidelity adaptive cruise control are conducted to demonstrate the effectiveness of our framework.
Submitted 25 March, 2025;
originally announced March 2025.
-
Temporal Information Reconstruction and Non-Aligned Residual in Spiking Neural Networks for Speech Classification
Authors:
Qi Zhang,
Huamin Wang,
Hangchi Shen,
Shukai Duan,
Shiping Wen,
Tingwen Huang
Abstract:
Most models based on spiking neural networks (SNNs) use only a single temporal resolution for speech classification, which prevents them from learning information in the input data at different temporal scales. Additionally, because the time lengths of the data before and after the sub-modules of many models differ, effective residual connections cannot be applied to optimize the training of these models. To solve these problems, on the one hand, we reconstruct the temporal dimension of the audio spectrum and propose a novel method named Temporal Reconstruction (TR), inspired by the hierarchical processing that the human brain uses to understand speech. The reconstructed SNN model with TR can learn information in the input data at different temporal resolutions and can therefore model more comprehensive semantic information from audio data. On the other hand, we propose the Non-Aligned Residual (NAR) method by analyzing the audio data, which allows residual connections to be used between two audio data with different time lengths. We have conducted extensive experiments on the Spiking Speech Commands (SSC), the Spiking Heidelberg Digits (SHD), and the Google Speech Commands v0.02 (GSC) datasets. According to the experimental results, we achieve a state-of-the-art (SOTA) test classification accuracy of 81.02% on SSC among all SNN models, and a SOTA classification accuracy of 96.04% on SHD among all models.
Submitted 31 December, 2024;
originally announced January 2025.
-
XWSB: A Blend System Utilizing XLS-R and WavLM with SLS Classifier detection system for SVDD 2024 Challenge
Authors:
Qishan Zhang,
Shuangbing Wen,
Fangke Yan,
Tao Hu,
Jun Li
Abstract:
This paper introduces the model structure used in the SVDD 2024 Challenge, which was held this year for the first time. Singing voice deepfake detection (SVDD) faces complexities due to informal speech intonations and varying speech rates. In this paper, we propose the XWSB system, which achieved SOTA performance in the SVDD challenge. XWSB stands for XLS-R, WavLM, and SLS Blend, representing the integration of these technologies for the purpose of SVDD. Specifically, we used XLS-R&SLS, the best-performing model structure on the ASVspoof DF dataset, and applied SLS to WavLM to form the WavLM&SLS structure. Finally, we integrated the two models to form the XWSB system. Experimental results show that our system demonstrates advanced recognition capabilities in the SVDD challenge, specifically achieving an EER of 2.32% in the CtrSVDD track. The code and data can be found at https://github.com/QiShanZhang/XWSB_for_SVDD2024.
Submitted 27 September, 2024;
originally announced September 2024.
-
NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results
Authors:
Xin Li,
Kun Yuan,
Yajing Pei,
Yiting Lu,
Ming Sun,
Chao Zhou,
Zhibo Chen,
Radu Timofte,
Wei Sun,
Haoning Wu,
Zicheng Zhang,
Jun Jia,
Zhichao Zhang,
Linhan Cao,
Qiubo Chen,
Xiongkuo Min,
Weisi Lin,
Guangtao Zhai,
Jianhui Sun,
Tianyi Wang,
Lei Li,
Han Kong,
Wenxuan Wang,
Bing Li,
Cheng Luo
, et al. (43 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The purpose is to build new benchmarks and advance the development of S-UGC VQA. The competition had 200 participants and 13 teams submitted valid solutions for the final testing phase. The proposed solutions achieved state-of-the-art performances for S-UGC VQA. The project can be found at https://github.com/lixinustc/KVQChallenge-CVPR-NTIRE2024.
Submitted 17 April, 2024;
originally announced April 2024.
-
ImageBind-LLM: Multi-modality Instruction Tuning
Authors:
Jiaming Han,
Renrui Zhang,
Wenqi Shao,
Peng Gao,
Peng Xu,
Han Xiao,
Kaipeng Zhang,
Chris Liu,
Song Wen,
Ziyu Guo,
Xudong Lu,
Shuai Ren,
Yafei Wen,
Xiaoxin Chen,
Xiangyu Yue,
Hongsheng Li,
Yu Qiao
Abstract:
We present ImageBind-LLM, a multi-modality instruction tuning method of large language models (LLMs) via ImageBind. Existing works mainly focus on language and image instruction tuning, different from which, our ImageBind-LLM can respond to multi-modality conditions, including audio, 3D point clouds, video, and their embedding-space arithmetic by only image-text alignment training. During training, we adopt a learnable bind network to align the embedding space between LLaMA and ImageBind's image encoder. Then, the image features transformed by the bind network are added to word tokens of all layers in LLaMA, which progressively injects visual instructions via an attention-free and zero-initialized gating mechanism. Aided by the joint embedding of ImageBind, the simple image-text training enables our model to exhibit superior multi-modality instruction-following capabilities. During inference, the multi-modality inputs are fed into the corresponding ImageBind encoders, and processed by a proposed visual cache model for further cross-modal embedding enhancement. The training-free cache model retrieves from three million image features extracted by ImageBind, which effectively mitigates the training-inference modality discrepancy. Notably, with our approach, ImageBind-LLM can respond to instructions of diverse modalities and demonstrate significant language generation quality. Code is released at https://github.com/OpenGVLab/LLaMA-Adapter.
Submitted 11 September, 2023; v1 submitted 7 September, 2023;
originally announced September 2023.
-
Active Noise Control based on the Momentum Multichannel Normalized Filtered-x Least Mean Square Algorithm
Authors:
Dongyuan Shi,
Woon-Seng Gan,
Bhan Lam,
Shulin Wen,
Xiaoyi Shen
Abstract:
Multichannel active noise control (MCANC) is widely utilized to achieve significant noise cancellation area in the complicated acoustic field. Meanwhile, the filter-x least mean square (FxLMS) algorithm gradually becomes the benchmark solution for the implementation of MCANC due to its low computational complexity. However, its slow convergence speed more or less undermines the performance of dealing with quickly varying disturbances, such as piling noise. Furthermore, the noise power variation also deteriorates the robustness of the algorithm when it adopts the fixed step size. To solve these issues, we integrated the normalized multichannel FxLMS with the momentum method, which hence, effectively avoids the interference of the primary noise power and accelerates the convergence of the algorithm. To validate its effectiveness, we deployed this algorithm in a multichannel noise control window to control the real machine noise.
Submitted 7 August, 2023;
originally announced August 2023.
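The combination of a normalized filtered-x LMS update with a momentum term, as summarized in the abstract above, can be illustrated with a minimal single-channel sketch. This is not the authors' multichannel implementation: the primary and secondary path coefficients, the step size `mu`, the momentum factor `beta`, and the heavy-ball form of the momentum update are all assumptions chosen for illustration, and the secondary path is assumed to be known exactly.

```python
import random

def fir(coeffs, buf):
    """Dot product of FIR coefficients with a newest-first sample buffer."""
    return sum(c * s for c, s in zip(coeffs, buf))

def mean_square(values):
    """Average power of a sequence of samples."""
    return sum(v * v for v in values) / len(values)

def momentum_nfxlms(n_samples=3000, taps=16, mu=0.2, beta=0.5, seed=0):
    """Single-channel normalized FxLMS with heavy-ball momentum (sketch)."""
    rng = random.Random(seed)
    primary = [0.0, 0.5, 0.3, 0.1]  # hypothetical primary path
    secondary = [0.8, 0.2]          # hypothetical secondary path, assumed known
    w = [0.0] * taps                # control filter weights
    vel = [0.0] * taps              # momentum (velocity) term
    x_buf = [0.0] * taps            # reference history, newest first
    xf_buf = [0.0] * taps           # filtered-reference history, newest first
    y_buf = [0.0] * len(secondary)  # control output history, newest first
    errors = []
    for _ in range(n_samples):
        x_buf = [rng.gauss(0.0, 1.0)] + x_buf[:-1]
        d = fir(primary, x_buf)                  # disturbance at the error mic
        y_buf = [fir(w, x_buf)] + y_buf[:-1]     # control filter output
        e = d - fir(secondary, y_buf)            # residual error
        xf_buf = [fir(secondary, x_buf)] + xf_buf[:-1]
        norm = sum(v * v for v in xf_buf) + 1e-8  # power normalization
        for i in range(taps):
            # Normalized gradient step accumulated through the momentum term.
            vel[i] = beta * vel[i] + mu * e * xf_buf[i] / norm
            w[i] += vel[i]
        errors.append(e)
    return errors
```

Comparing `mean_square(errors[:200])` with `mean_square(errors[-200:])` gives a rough view of how far the residual has dropped; setting `beta=0.0` recovers a plain normalized FxLMS update for comparison.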
-
Model-Assisted Probabilistic Safe Adaptive Control With Meta-Bayesian Learning
Authors:
Shengbo Wang,
Ke Li,
Yin Yang,
Yuting Cao,
Tingwen Huang,
Shiping Wen
Abstract:
Breaking safety constraints in control systems can lead to potential risks, resulting in unexpected costs or catastrophic damage. Nevertheless, uncertainty is ubiquitous, even among similar tasks. In this paper, we develop a novel adaptive safe control framework that integrates meta learning, Bayesian models, and control barrier function (CBF) method. Specifically, with the help of CBF method, we learn the inherent and external uncertainties by a unified adaptive Bayesian linear regression (ABLR) model, which consists of a forward neural network (NN) and a Bayesian output layer. Meta learning techniques are leveraged to pre-train the NN weights and priors of the ABLR model using data collected from historical similar tasks. For a new control task, we refine the meta-learned models using a few samples, and introduce pessimistic confidence bounds into CBF constraints to ensure safe control. Moreover, we provide theoretical criteria to guarantee probabilistic safety during the control processes. To validate our approach, we conduct comparative experiments in various obstacle avoidance scenarios. The results demonstrate that our algorithm significantly improves the Bayesian model-based CBF method, and is capable for efficient safe exploration even with multiple uncertain constraints.
Submitted 13 July, 2023; v1 submitted 3 July, 2023;
originally announced July 2023.
-
Technology Report : Smartphone-Based Pedestrian Dead Reckoning Integrated with Data-Fusion-Adopted Visible Light Positioning
Authors:
Shangsheng Wen,
Ziyang Ge,
Danlan Yuan,
Yingcong Chen,
Xuecong Fang
Abstract:
Pedestrian dead-reckoning (PDR) is a potential indoor localization technology that obtains location estimation with the inertial measurement unit (IMU). However, one of its most significant drawbacks is the accumulation of its measurement error. This paper proposes a visible light positioning (VLP)-integrated PDR system, which could achieve real-time and accurate indoor positioning using IMU and the camera sensor of our smartphone. A multi-frame fusion method is proposed in the encoding and decoding process of the system, reaching 98.5% decoding accuracy with a 20-bit-long ID at the height of 2.1 m, which allows the variation in the shutter speeds of cameras and heights of the LED. Meanwhile, absolute locations and step length could be calibrated with the help of a single light-emitting diode (LED), promising average accuracy within 0.5 meters in a 108-meter walk.
Submitted 5 January, 2023;
originally announced January 2023.
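The drift problem and the absolute-position correction described in the abstract above can be sketched with a basic dead-reckoning loop. This is only an illustration under assumptions: the function name `pdr_track`, the `(step_length, heading)` input format, and the `vlp_fixes` dictionary standing in for LED-based position fixes are hypothetical, and the paper's multi-frame decoding and step-length calibration are not modeled.

```python
import math

def pdr_track(start, steps, vlp_fixes=None):
    """Dead-reckon a 2-D path from per-step length and heading.

    steps: list of (step_length_m, heading_rad) pairs, e.g. from an IMU.
    vlp_fixes: optional {step_index: (x, y)} absolute positions
               (a stand-in for VLP output) applied after that step.
    """
    vlp_fixes = vlp_fixes or {}
    x, y = start
    path = [(x, y)]
    for k, (length, heading) in enumerate(steps):
        # Standard PDR update: advance one step along the current heading.
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        # An absolute fix replaces the drifting estimate when available.
        if k in vlp_fixes:
            x, y = vlp_fixes[k]
        path.append((x, y))
    return path
```

Without fixes, any error in step length or heading accumulates along the path; each entry in `vlp_fixes` resets the estimate, which is the role the abstract assigns to the LED.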
-
A geometry method for LED mapping
Authors:
Junlin Huang,
Shangsheng Wen,
Weipeng Guan
Abstract:
With inputs from RGB-D camera, industrial camera and wheel odometer, in this letter, we propose a geometry-based detecting method, by which the 3-D modulated LED map can be acquired with the aid of visual odometry algorithm from ORB-SLAM2 system when the decoding result of LED-ID is inaccurate. Subsequently, an enhanced cost function is proposed to optimize the mapping result of LEDs. The average 3-D mapping error (8.5cm) is evaluated with a real-world experiment. This work can be viewed as a preliminary work of visible light positioning systems, offering a way to prevent the labor-intensive manual site surveys of LEDs.
Submitted 28 October, 2022; v1 submitted 24 October, 2022;
originally announced October 2022.
-
Generative Anomaly Detection for Time Series Datasets
Authors:
Zhuangwei Kang,
Ayan Mukhopadhyay,
Aniruddha Gokhale,
Shijie Wen,
Abhishek Dubey
Abstract:
Traffic congestion anomaly detection is of paramount importance in intelligent traffic systems. The goals of transportation agencies are two-fold: to monitor the general traffic conditions in the area of interest and to locate road segments under abnormal congestion states. Modeling congestion patterns can achieve these goals for citywide roadways, which amounts to learning the distribution of multivariate time series (MTS). However, existing works are either not scalable or unable to capture the spatial-temporal information in MTS simultaneously. To this end, we propose a principled and comprehensive framework consisting of a data-driven generative approach that can perform tractable density estimation for detecting traffic anomalies. Our approach first clusters segments in the feature space and then uses conditional normalizing flow to identify anomalous temporal snapshots at the cluster level in an unsupervised setting. Then, we identify anomalies at the segment level by using a kernel density estimator on the anomalous cluster. Extensive experiments on synthetic datasets show that our approach significantly outperforms several state-of-the-art congestion anomaly detection and diagnosis methods in terms of Recall and F1-Score. We also use the generative model to sample labeled data, which can train classifiers in a supervised setting, alleviating the lack of labeled data for anomaly detection in sparse settings.
Submitted 28 June, 2022;
originally announced June 2022.
-
Suboptimal Safety-Critical Control for Continuous Systems Using Prediction-Correction Online Optimization
Authors:
Shengbo Wang,
Shiping Wen,
Yin Yang,
Yuting Cao,
Kaibo Shi,
Tingwen Huang
Abstract:
This paper investigates control barrier function (CBF) based safety-critical control for continuous nonlinear control-affine systems using more efficient online algorithms through time-varying optimization. The idea is that when quadratic programming (QP) or other convex optimization algorithms needed in the CBF-based method are not computationally affordable, alternative suboptimal feasible solutions can be obtained more economically. By using the barrier-based interior point method, the constrained CBF-QP problems are converted into unconstrained ones, with suboptimal solutions tracked by two continuous descent-based algorithms. Considering the lag effect of tracking and exploiting the system information, a prediction method is added to the algorithms, which thereby achieve an exponential convergence rate to the time-varying suboptimal solutions. The convergence and robustness of the designed methods as well as the safety criteria of the algorithms are analyzed theoretically. Finally, the effectiveness is illustrated by simulations on anti-swing and obstacle avoidance tasks.
Submitted 20 March, 2023; v1 submitted 29 March, 2022;
originally announced March 2022.
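For readers unfamiliar with the CBF-QP that the abstract above takes as its starting point, the following minimal sketch shows the simplest case, where the QP has a closed-form solution. It is not the paper's prediction-correction algorithm: the scalar system x' = u, the barrier h(x) = x_max - x, the gain `alpha`, and the function names are assumptions chosen for illustration.

```python
def cbf_filter(x, u_nom, x_max=1.0, alpha=2.0):
    """Closed-form CBF-QP for the scalar system x' = u with h(x) = x_max - x.

    The CBF condition h' + alpha * h >= 0 becomes u <= alpha * (x_max - x).
    Minimizing (u - u_nom)^2 under that single constraint gives
    u = min(u_nom, alpha * (x_max - x)).
    """
    u_bound = alpha * (x_max - x)
    return min(u_nom, u_bound)

def simulate(x0=0.0, u_nom=1.0, dt=0.01, steps=500):
    """Forward-Euler rollout of the filtered closed loop."""
    x = x0
    xs = [x]
    for _ in range(steps):
        x += dt * cbf_filter(x, u_nom)
        xs.append(x)
    return xs
```

With a nominal input that always pushes toward the boundary, the filtered trajectory approaches `x_max` without crossing it. With several constraints or a vector input, no such closed form is available in general and a QP solver is needed, which is the computational cost the abstract aims to reduce.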
-
TransFusion: Multi-view Divergent Fusion for Medical Image Segmentation with Transformers
Authors:
Di Liu,
Yunhe Gao,
Qilong Zhangli,
Ligong Han,
Xiaoxiao He,
Zhaoyang Xia,
Song Wen,
Qi Chang,
Zhennan Yan,
Mu Zhou,
Dimitris Metaxas
Abstract:
Combining information from multi-view images is crucial to improve the performance and robustness of automated methods for disease diagnosis. However, due to the non-alignment characteristics of multi-view images, building correlation and data fusion across views largely remain an open problem. In this study, we present TransFusion, a Transformer-based architecture to merge divergent multi-view imaging information using convolutional layers and powerful attention mechanisms. In particular, the Divergent Fusion Attention (DiFA) module is proposed for rich cross-view context modeling and semantic dependency mining, addressing the critical issue of capturing long-range correlations between unaligned data from different image views. We further propose the Multi-Scale Attention (MSA) to collect global correspondence of multi-scale feature representations. We evaluate TransFusion on the Multi-Disease, Multi-View & Multi-Center Right Ventricular Segmentation in Cardiac MRI (M&Ms-2) challenge cohort. TransFusion demonstrates leading performance against the state-of-the-art methods and opens up new perspectives for multi-view imaging integration towards robust medical image segmentation.
Submitted 5 September, 2022; v1 submitted 21 March, 2022;
originally announced March 2022.
-
The Volcspeech system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge
Authors:
Chen Shen,
Yi Liu,
Wenzhi Fan,
Bin Wang,
Shixue Wen,
Yao Tian,
Jun Zhang,
Jingsheng Yang,
Zejun Ma
Abstract:
This paper describes our submission to ICASSP 2022 Multi-channel Multi-party Meeting Transcription (M2MeT) Challenge. For Track 1, we propose several approaches to empower the clustering-based speaker diarization system to handle overlapped speech. Front-end dereverberation and the direction-of-arrival (DOA) estimation are used to improve the accuracy of speaker diarization. Multi-channel combination and overlap detection are applied to reduce the missed speaker error. A modified DOVER-Lap is also proposed to fuse the results of different systems. We achieve the final DER of 5.79% on the Eval set and 7.23% on the Test set. For Track 2, we develop our system using the Conformer model in a joint CTC-attention architecture. Serialized output training is adopted to multi-speaker overlapped speech recognition. We propose a neural front-end module to model multi-channel audio and train the model end-to-end. Various data augmentation methods are utilized to mitigate over-fitting in the multi-channel multi-speaker E2E system. Transformer language model fusion is developed to achieve better performance. The final CER is 19.2% on the Eval set and 20.8% on the Test set.
Submitted 9 February, 2022; v1 submitted 8 February, 2022;
originally announced February 2022.
-
An improved bearing fault detection strategy based on artificial bee colony algorithm
Authors:
Haiquan Wang,
Wenxuan Yue,
Shengjun Wen,
Xiaobin Xu,
Menghao Su,
Shanshan Zhang,
Panpan Du
Abstract:
The operating state of a bearing directly affects the performance of rotating machinery, so it is critical to extract features from the original vibration signal accurately and decisively and to recognize faulty parts as early as possible. In this study, the one-dimensional ternary model, which has been proved to be an effective statistical method for feature selection, is introduced, and a shapelet transformation is proposed to calculate its parameter, namely the standard deviation of the transformed shapelets, which is usually selected by trial and error. Moreover, XGBoost is used to recognize faults from the obtained features, and an improved artificial bee colony (ABC) algorithm, in which the evolution is guided by the importance indices of different search spaces, is proposed to optimize the parameters of XGBoost. Here the value of the importance index is related to the probability of optimal solutions lying in a certain space, so the tendency of traditional ABC to fall into local optima can be avoided. Experimental results based on failure vibration signal samples show that the average accuracy of fault signal recognition can reach 97%, which is much higher than that of other extraction strategies, indicating improved extraction ability. With the improved ABC algorithm used to optimize the parameters of XGBoost, the classification accuracy improves from 97.02% to about 98.60% compared with the traditional classification strategy.
△ Less
Submitted 2 December, 2021; v1 submitted 1 December, 2021;
originally announced December 2021.
-
Robust Adaptive Safety-Critical Control for Unknown Systems with Finite-Time Element-Wise Parameter Estimation
Authors:
Shengbo Wang,
Bo Lyu,
Shiping Wen,
Kaibo Shi,
Song Zhu,
Tingwen Huang
Abstract:
Safety is always one of the most critical principles for a system to be controlled. This paper investigates a safety-critical control scheme for unknown structured systems by using the control barrier function (CBF) method. Benefiting from dynamic regressor extension and mixing (DREM), an extended element-wise parameter identification law is utilized to remove the uncertainty. On the one hand, it is shown that the proposed control scheme can always guarantee safety in the identification process with noised signal injection excitation, which was not considered in the previous study. On the other hand, the element-wise estimation process in DREM can minimize the conservatism of the safe adaptive process compared to other existing adaptive CBF algorithms. The stability as well as the forward invariance of the presented safe control-estimation scheme is proved. Furthermore, the robustness of the scheme under bounded disturbances is analyzed, where a robust CBF with modest conditions is used to ensure safety. The framework is illustrated by simulations on adaptive cruise control, where the slope resistance of the following vehicle is robustly estimated in finite time against small disturbances and the potential crash risk is avoided by the proposed safe control scheme.
Submitted 14 January, 2022; v1 submitted 27 November, 2021;
originally announced November 2021.
-
Optimal Tracking Control for Unknown Linear Systems with Finite-Time Parameter Estimation
Authors:
Shengbo Wang,
Shiping Wen,
Kaibo Shi,
Song Zhu,
Tingwen Huang
Abstract:
The optimal control input for linear systems can be solved from the algebraic Riccati equation (ARE), although it remains questionable whether the form of the exact solution can be obtained. In engineering, acceptable numerical solutions of the ARE can be found by iteration or optimization. Recently, gradient-descent-based numerical solutions have been proven effective in approximating the optimal ones. This paper introduces this method to the tracking problem for heterogeneous linear systems. In contrast to earlier work, the parameters in the dynamics of the linear systems are all assumed to be unknown, which makes the problem intractable since the gradient as well as the allowable initialization requires prior knowledge of the system dynamics. To solve this problem, the method named dynamic regressor extension and mixing (DREM) is improved to estimate the parameter matrices in finite time. Besides, a discounted factor is introduced to ensure the existence of optimal solutions for heterogeneous systems. Two simulation experiments are given to illustrate the effectiveness.
Submitted 6 January, 2022; v1 submitted 27 November, 2021;
originally announced November 2021.
-
A strong baseline for image and video quality assessment
Authors:
Shaoguo Wen,
Junle Wang
Abstract:
In this work, we present a simple yet effective unified model for perceptual quality assessment of image and video. In contrast to existing models which usually consist of complex network architecture, or rely on the concatenation of multiple branches of features, our model achieves a comparable performance by applying only one global feature derived from a backbone network (i.e. resnet18 in the presented work). Combined with some training tricks, the proposed model surpasses the current baselines of SOTA models on public and private datasets. Based on the architecture proposed, we release the models well trained for three common real-world scenarios: UGC videos in the wild, PGC videos with compression, Game videos with compression. These three pre-trained models can be directly applied for quality assessment, or be further fine-tuned for more customized usages. All the code, SDK, and the pre-trained weights of the proposed models are publicly available at https://github.com/Tencent/CenseoQoE.
Submitted 13 November, 2021;
originally announced November 2021.
-
Design and Evaluation of Active Noise Control on Machinery Noise
Authors:
Shulin Wen,
Duy Hai Nguyen,
Miqing Wang,
Woon-Seng Gan
Abstract:
Construction workers and residents living near construction sites are exposed to noise that might cause hearing loss, high blood pressure, heart disease, sleep disturbance and stress. Regulations have been introduced by national governments to limit the maximum permissible noise levels for construction works. A four-channel active noise control (ANC) system mounted on the opening of an enclosure is designed to prevent the machinery noise from spreading while retaining the heat diffusion path. A multi-channel FxLMS algorithm in the time domain is implemented on the main controller. A Genelec speaker is placed inside the box as the primary noise source to play back different types of noise. Analyses and experiments are carried out to investigate the controllable frequency range of this ANC system in detail. Considerable noise reduction performance is achieved for different recorded practical construction noises.
Submitted 2 November, 2021;
originally announced November 2021.
-
"One-Shot" Reduction of Additive Artifacts in Medical Images
Authors:
Yu-Jen Chen,
Yen-Jung Chang,
Shao-Cheng Wen,
Yiyu Shi,
Xiaowei Xu,
Tsung-Yi Ho,
Meiping Huang,
Haiyun Yuan,
Jian Zhuang
Abstract:
Medical images may contain various types of artifacts with different patterns and mixtures, which depend on many factors such as scan setting, machine condition, patients' characteristics, surrounding environment, etc. However, existing deep-learning-based artifact reduction methods are restricted by their training set with specific predetermined artifact types and patterns. As such, they have limited clinical adoption. In this paper, we introduce One-Shot medical image Artifact Reduction (OSAR), which exploits the power of deep learning but without using pre-trained general networks. Specifically, we train a light-weight image-specific artifact reduction network using data synthesized from the input image at test-time. Without requiring any prior large training data set, OSAR can work with almost any medical images that contain varying additive artifacts which are not in any existing data sets. In addition, Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) are used as vehicles to show that the proposed method can reduce artifacts better than the state of the art, both qualitatively and quantitatively, using shorter test time.
Submitted 23 October, 2021;
originally announced October 2021.
-
Subjective and Objective Quality Assessment of Mobile Gaming Video
Authors:
Shaoguo Wen,
Suiyi Ling,
Junle Wang,
Ximing Chen,
Lizhi Fang,
Yanqing Jing,
Patrick Le Callet
Abstract:
Nowadays, with the vigorous expansion and development of gaming video streaming techniques and services, the expectation of users, especially the mobile phone users, for higher quality of experience is also growing swiftly. As most of the existing research focuses on traditional video streaming, there is a clear lack of both subjective study and objective quality models that are tailored for quality assessment of mobile gaming content. To this end, in this study, we first present a brand new Tencent Gaming Video dataset containing 1293 mobile gaming sequences encoded with three different codecs. Second, we propose an objective quality framework, namely Efficient hard-RAnk Quality Estimator (ERAQUE), that is equipped with (1) a novel hard pairwise ranking loss, which forces the model to put more emphasis on differentiating similar pairs; (2) an adapted model distillation strategy, which could be utilized to compress the proposed model efficiently without causing significant performance drop. Extensive experiments demonstrate the efficiency and robustness of our model.
Submitted 27 January, 2021;
originally announced March 2021.
-
High Accuracy Visible Light Positioning Based on Multi-target Tracking Algorithm
Authors:
Linyi Huang,
Wentao Yang,
Shangsheng Wen,
Manxi Liu,
Weipeng Guan
Abstract:
In this paper, we propose a multi-target image tracking algorithm based on continuously adaptive mean-shift (Cam-shift) and the unscented Kalman filter. We extend the single-lamp tracking algorithm proposed in our previous work to multi-target tracking, and achieve better robustness under occlusion, real-time performance in completing one positioning, and relatively high accuracy by dynamically adjusting the weights of the multi-target motion states. Our previous algorithm is limited to the analysis of tracking error. In this paper, the results of the tracking algorithm are evaluated with the tracking error we defined. Then, combined with the double-lamp positioning algorithm, the real position of the terminal is calculated and evaluated with the positioning error we defined. Experiments show that the defined tracking error is 0.61 cm and the defined positioning error for 3-D positioning is 3.29 cm, with an average processing time of 91.63 ms per frame. Even if nearly half of the LED area is occluded, the tracking error remains at 5.25 cm. All of this shows that the proposed visible light positioning (VLP) method can track multiple targets for positioning at the same time with good robustness, real-time performance and accuracy. In addition, the definition and analysis of tracking errors and positioning errors indicate the direction for future efforts to reduce errors.
Submitted 26 May, 2021; v1 submitted 1 March, 2021;
originally announced March 2021.
-
Joint Demosaicking and Denoising in the Wild: The Case of Training Under Ground Truth Uncertainty
Authors:
Jierun Chen,
Song Wen,
S. -H. Gary Chan
Abstract:
Image demosaicking and denoising are the two key fundamental steps in digital camera pipelines, aiming to reconstruct clean color images from noisy luminance readings. In this paper, we propose and study Wild-JDD, a novel learning framework for joint demosaicking and denoising in the wild. In contrast to previous works which generally assume the ground truth of training data is a perfect reflection of the reality, we consider here the more common imperfect case of ground truth uncertainty in the wild. We first illustrate its manifestation as various kinds of artifacts including zipper effect, color moire and residual noise. Then we formulate a two-stage data degradation process to capture such ground truth uncertainty, where a conjugate prior distribution is imposed upon a base distribution. After that, we derive an evidence lower bound (ELBO) loss to train a neural network that approximates the parameters of the conjugate prior distribution conditioned on the degraded input. Finally, to further enhance the performance for out-of-distribution input, we design a simple but effective fine-tuning strategy by taking the input as a weakly informative prior. Taking into account ground truth uncertainty, Wild-JDD enjoys good interpretability during optimization. Extensive experiments validate that it outperforms state-of-the-art schemes on joint demosaicking and denoising tasks on both synthetic and realistic raw datasets.
Submitted 12 January, 2021;
originally announced January 2021.
-
Do Noises Bother Human and Neural Networks In the Same Way? A Medical Image Analysis Perspective
Authors:
Shao-Cheng Wen,
Yu-Jen Chen,
Zihao Liu,
Wujie Wen,
Xiaowei Xu,
Yiyu Shi,
Tsung-Yi Ho,
Qianjun Jia,
Meiping Huang,
Jian Zhuang
Abstract:
Deep learning has already demonstrated its power in medical images, including denoising, classification, segmentation, etc. All these applications are proposed to automatically analyze medical images beforehand, which brings more information to radiologists during clinical assessment for accuracy improvement. Recently, many medical denoising methods have shown significant artifact reduction and noise removal results, both quantitatively and qualitatively. However, those existing methods are developed around human vision, i.e., they are designed to minimize the noise effect that can be perceived by human eyes. In this paper, we introduce an application-guided denoising framework, which focuses on denoising for the following neural networks. In our experiments, we apply the proposed framework to different datasets, models, and use cases. Experimental results show that our proposed framework can achieve a better result than a human-vision denoising network.
Submitted 4 November, 2020;
originally announced November 2020.
-
A Pathologist-Annotated Dataset for Validating Artificial Intelligence: A Project Description and Pilot Study
Authors:
Sarah N Dudgeon,
Si Wen,
Matthew G Hanna,
Rajarsi Gupta,
Mohamed Amgad,
Manasi Sheth,
Hetal Marble,
Richard Huang,
Markus D Herrmann,
Clifford H. Szu,
Darick Tong,
Bruce Werness,
Evan Szu,
Denis Larsimont,
Anant Madabhushi,
Evangelos Hytopoulos,
Weijie Chen,
Rajendra Singh,
Steven N. Hart,
Joel Saltz,
Roberto Salgado,
Brandon D Gallas
Abstract:
Purpose: In this work, we present a collaboration to create a validation dataset of pathologist annotations for algorithms that process whole slide images (WSIs). We focus on data collection and evaluation of algorithm performance in the context of estimating the density of stromal tumor infiltrating lymphocytes (sTILs) in breast cancer. Methods: We digitized 64 glass slides of hematoxylin- and eosin-stained ductal carcinoma core biopsies prepared at a single clinical site. We created training materials and workflows to crowdsource pathologist image annotations on two modes: an optical microscope and two digital platforms. The workflows collect the ROI type, a decision on whether the ROI is appropriate for estimating the density of sTILs, and if appropriate, the sTIL density value for that ROI. Results: The pilot study yielded an abundant number of cases with nominal sTIL infiltration. Furthermore, we found that the sTIL densities are correlated within a case, and there is notable pathologist variability. Consequently, we outline plans to improve our ROI and case sampling methods. We also outline statistical methods to account for ROI correlations within a case and pathologist variability when validating an algorithm. Conclusion: We have built workflows for efficient data collection and tested them in a pilot study. As we prepare for pivotal studies, we will consider what it will take for the dataset to be fit for a regulatory purpose: study size, patient population, and pathologist training and qualifications. To this end, we will elicit feedback from the FDA via the Medical Device Development Tool program and from the broader digital pathology and AI community. Ultimately, we intend to share the dataset, statistical methods, and lessons learned.
Submitted 14 October, 2020;
originally announced October 2020.
-
Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching
Authors:
Di Hu,
Rui Qian,
Minyue Jiang,
Xiao Tan,
Shilei Wen,
Errui Ding,
Weiyao Lin,
Dejing Dou
Abstract:
Discriminatively localizing sounding objects in cocktail-party, i.e., mixed sound scenes, is commonplace for humans, but still challenging for machines. In this paper, we propose a two-stage learning framework to perform self-supervised class-aware sounding object localization. First, we propose to learn robust object representations by aggregating the candidate sound localization results in the single source scenes. Then, class-aware object localization maps are generated in the cocktail-party scenarios by referring to the pre-learned object knowledge, and the sounding objects are accordingly selected by matching audio and visual object category distributions, where the audiovisual consistency is viewed as the self-supervised signal. Experimental results in both realistic and synthesized cocktail-party videos demonstrate that our model is superior in filtering out silent objects and pointing out the location of sounding objects of different classes. Code is available at https://github.com/DTaoo/Discriminative-Sounding-Objects-Localization.
Submitted 12 October, 2020;
originally announced October 2020.
-
On the Security of Networked Control Systems in Smart Vehicle and its Adaptive Cruise Control
Authors:
Faezeh Farivar,
Mohammad Sayad Haghighi,
Alireza Jolfaei,
Sheng Wen
Abstract:
The benefits of the Internet of Vehicles (IoV) paradigm come with unprecedented security challenges. Among many applications of inter-connected systems, vehicular networks and smart cars are examples that are already rolled out. Smart vehicles not only have networks connecting their internal components, e.g. via the Controller Area Network (CAN) bus, but are also connected to the outside world through road side units and other vehicles. In some cases, the internal and external network packets pass through the same hardware and are merely isolated by software-defined rules. Any misconfiguration opens a window for hackers to intrude into vehicles' internal components, e.g. the central lock system, Engine Control Unit (ECU), Anti-lock Braking System (ABS) or Adaptive Cruise Control (ACC) system. Compromise of any of these can lead to disastrous outcomes. In this paper, we study the security of smart vehicles' adaptive cruise control systems in the presence of covert attacks. We define two covert/stealth attacks in the context of cruise control and propose a novel intrusion detection and compensation method to disclose and respond to such attacks. More precisely, we focus on the covert cyber attacks that compromise the integrity of the cruise controller and employ a neural network identifier in the IDS engine to estimate the system output dynamically and compare it against the ACC output. If any anomaly is detected, an embedded substitute controller kicks in and takes over the control. We conducted extensive experiments in MATLAB to evaluate the effectiveness of the proposed scheme in a simulated environment.
Submitted 4 August, 2020; v1 submitted 2 August, 2020;
originally announced August 2020.
-
Recognition and evaluation of constellation diagram using deep learning based on underwater wireless optical communication
Authors:
ZiHao Zhou,
WeiPeng Guan,
ShangSheng Wen
Abstract:
In this paper, we propose a method of constellation diagram recognition and evaluation using deep learning based on underwater wireless optical communication (UWOC). More specifically, a constellation diagram analyzer for UWOC systems based on a convolutional neural network (CNN) is designed for modulation format recognition (MFR), optical signal noise ratio (OSNR) estimation and phase error estimation. Besides, unsupervised learning is used to excavate a new optimization metric from various factors that affect the quality of the underwater channel. The proposed new metric synthesizes several original indexes, and we term it the multi noise spatial metric (MNSM). The proposed MNSM divides the quality of the constellation from high to low into several levels and reflects the quality of the UWOC channel. Through simulation, the constellation diagrams of four widely used M-QAM modulation formats for 16 OSNR values (15dB~30dB) are obtained, with the phase error standard deviations ranging from 0° to 45°. The results show that accuracies of 100%, 95% and 98.6% are achieved for MFR, OSNR estimation and phase noise estimation, respectively. Ablation studies are also carried out in order to analyze the performance of deep learning in the recognition of constellation diagrams.
Submitted 11 July, 2020;
originally announced July 2020.
-
Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection
Authors:
Liang Du,
Xiaoqing Ye,
Xiao Tan,
Jianfeng Feng,
Zhenbo Xu,
Errui Ding,
Shilei Wen
Abstract:
Object detection from 3D point clouds remains a challenging task, though recent studies have pushed the envelope with deep learning techniques. Owing to severe spatial occlusion and the inherent variance of point density with the distance to sensors, the appearance of the same object varies a lot in point cloud data. Designing a feature representation that is robust against such appearance changes is hence the key issue in a 3D object detection method. In this paper, we propose a domain-adaptation-like approach to enhance the robustness of the feature representation. More specifically, we bridge the gap between the perceptual domain, where the feature comes from a real scene, and the conceptual domain, where the feature is extracted from an augmented scene consisting of non-occluded point clouds rich in detailed information. This domain adaptation approach mimics the functionality of the human brain when performing object perception. Extensive experiments demonstrate that our simple yet effective approach fundamentally boosts the performance of 3D point cloud object detection and achieves state-of-the-art results.
Submitted 8 June, 2020;
originally announced June 2020.
-
Sub-Band Knowledge Distillation Framework for Speech Enhancement
Authors:
Xiang Hao,
Shixue Wen,
Xiangdong Su,
Yun Liu,
Guanglai Gao,
Xiaofei Li
Abstract:
In single-channel speech enhancement, methods based on full-band spectral features have been widely studied. However, only a few methods pay attention to non-full-band spectral features. In this paper, we explore a knowledge distillation framework based on sub-band spectral mapping for single-channel speech enhancement. Specifically, we divide the full frequency band into multiple sub-bands and pre-train an elite-level sub-band enhancement model (teacher model) for each sub-band. These teacher models are dedicated to processing their own sub-bands. Next, under the teacher models' guidance, we train a general sub-band enhancement model (student model) that works for all sub-bands. Without increasing the number of model parameters and computational complexity, the student model's performance is further improved. To evaluate our proposed method, we conducted a large number of experiments on an open-source data set. The final experimental results show that the guidance from the elite-level teacher models dramatically improves the student model's performance, which exceeds the full-band model by employing fewer parameters.
Submitted 29 October, 2020; v1 submitted 29 May, 2020;
originally announced May 2020.
-
NTIRE 2020 Challenge on Video Quality Mapping: Methods and Results
Authors:
Dario Fuoli,
Zhiwu Huang,
Martin Danelljan,
Radu Timofte,
Hua Wang,
Longcun Jin,
Dewei Su,
Jing Liu,
Jaehoon Lee,
Michal Kudelski,
Lukasz Bala,
Dmitry Hrybov,
Marcin Mozejko,
Muchen Li,
Siyao Li,
Bo Pang,
Cewu Lu,
Chao Li,
Dongliang He,
Fu Li,
Shilei Wen
Abstract:
This paper reviews the NTIRE 2020 challenge on video quality mapping (VQM), which addresses the issues of quality mapping from source video domain to target video domain. The challenge includes both a supervised track (track 1) and a weakly-supervised track (track 2) for two benchmark datasets. In particular, track 1 offers a new Internet video benchmark, requiring algorithms to learn the map from more compressed videos to less compressed videos in a supervised training manner. In track 2, algorithms are required to learn the quality mapping from one device to another when their quality varies substantially and weakly-aligned video pairs are available. For track 1, in total 7 teams competed in the final test phase, demonstrating novel and effective solutions to the problem. For track 2, some existing methods are evaluated, showing promising solutions to the weakly-supervised video quality mapping problem.
Submitted 15 June, 2020; v1 submitted 5 May, 2020;
originally announced May 2020.
-
NTIRE 2020 Challenge on Perceptual Extreme Super-Resolution: Methods and Results
Authors:
Kai Zhang,
Shuhang Gu,
Radu Timofte,
Taizhang Shang,
Qiuju Dai,
Shengchen Zhu,
Tong Yang,
Yandong Guo,
Younghyun Jo,
Sejong Yang,
Seon Joo Kim,
Lin Zha,
Jiande Jiang,
Xinbo Gao,
Wen Lu,
Jing Liu,
Kwangjin Yoon,
Taegyun Jeon,
Kazutoshi Akita,
Takeru Ooba,
Norimichi Ukita,
Zhipeng Luo,
Yuehan Yao,
Zhenyu Xu,
Dongliang He
, et al. (38 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2020 challenge on perceptual extreme super-resolution with a focus on proposed solutions and results. The challenge task was to super-resolve an input image with a magnification factor of 16 based on a set of prior examples of low and corresponding high resolution images. The goal is to obtain a network design capable of producing high resolution results with the best perceptual quality and similar to the ground truth. The track had 280 registered participants, and 19 teams submitted the final results. They gauge the state-of-the-art in single image super-resolution.
Submitted 3 May, 2020;
originally announced May 2020.
-
Variable Rate Image Compression Method with Dead-zone Quantizer
Authors:
Jing Zhou,
Akira Nakagawa,
Keizo Kato,
Sihan Wen,
Kimihiko Kazui,
Zhiming Tan
Abstract:
Deep learning based image compression methods have achieved superior performance compared with conventional transform-based codecs. With end-to-end Rate-Distortion Optimization (RDO) in the codec, the compression model is optimized with a Lagrange multiplier $λ$. In conventional codecs, the signal is decorrelated with an orthonormal transformation, and a uniform quantizer is introduced. We propose a variable rate image compression method with a dead-zone quantizer. First, the autoencoder network is trained with the RaDOGAGA \cite{radogaga} framework, which can make the latents isometric to the metric space, such as SSIM and MSE. Then the conventional dead-zone quantization method with arbitrary step size is used in the common trained network to provide flexible rate control. With the dead-zone quantizer, the experimental results show that our method performs comparably with independently optimized models within a wide range of bitrates.
Submitted 26 April, 2020; v1 submitted 13 April, 2020;
originally announced April 2020.
-
Indoor Localization System of ROS mobile robot based on Visible Light Communication
Authors:
Weipeng Guan,
Shihuan Chen,
Shangsheng Wen,
Wenyuan Hou,
Zequn Tan,
Ruihong Cen
Abstract:
In this paper, an indoor robot localization system based on Robot Operating System (ROS) and visible light communication (VLC) is presented. Building on our previous work, we designed a VLC localization and navigation package based on ROS, which contains the LED-ID detection and recognition method, the video target tracking algorithm and the double-lamp positioning algorithm. This package exploits the principle of double-lamp positioning and the loose coupling characteristics of ROS, and is implemented as loosely coupled ROS nodes. Consequently, this paper combines ROS and VLC, aiming at promoting the application of VLC positioning in mature robotic systems. Moreover, it pushes forward the development of localization applications based on VLC technology and lays a foundation for porting to other ROS robot platforms. Experimental results show that the proposed system can provide indoor localization within 1 cm and has good real-time performance, taking only 0.4 seconds for one-time positioning. If a high-performance laptop is used, the single positioning time can be reduced to 0.08 seconds. Therefore, this study confirms the practical application and the superior performance of VLC technology in ROS robots, showing the great potential of VLC localization. The video demo of the proposed robot positioning system based on VLC can be seen in *
Submitted 6 January, 2020;
originally announced January 2020.
-
A Sparse Representation Based Joint Demosaicing Method for Single-Chip Polarized Color Sensor
Authors:
Sijia Wen,
Yinqiang Zheng,
Feng Lu
Abstract:
The emergence of the single-chip polarized color sensor now allows for simultaneously capturing chromatic and polarimetric information of the scene on a monochromatic image plane. However, unlike the usual camera with an embedded demosaicing method, the latest polarized color camera is not delivered with a built-in demosaicing tool. For demosaicing, users have to either down-sample the captured images or use traditional interpolation techniques. Neither approach performs well, since polarization and color are interdependent. Therefore, joint chromatic and polarimetric demosaicing is the key to obtaining high-quality polarized color images. In this paper, we propose a joint chromatic and polarimetric demosaicing model to address this challenging problem. Instead of mechanically demosaicing the multi-channel polarized color image, we further present a sparse representation-based optimization strategy that utilizes chromatic information and polarimetric information to jointly optimize the model. To avoid the interaction between color and polarization during demosaicing, we separately construct the corresponding dictionaries. We also build an optical data acquisition system to collect a dataset, which contains various sources of polarization, such as illumination, reflectance and birefringence. Results of both qualitative and quantitative experiments show that our method is capable of faithfully recovering full RGB information of four polarization angles for each pixel from a single mosaic input image. Moreover, the proposed method performs well not only on synthetic data but also on real captured data.
Submitted 7 April, 2021; v1 submitted 16 December, 2019;
originally announced December 2019.
-
High accuracy and error analysis of indoor visible light positioning algorithm based on image sensor
Authors:
Shihuan Chen,
Weipeng Guan,
Zequn Tan,
Shangsheng Wen,
Manxi Liu,
Jingmin Wang,
Jingyi Li
Abstract:
In recent years, with the increasing demand for indoor positioning services, visible light indoor positioning based on image sensors has been widely studied. However, many studies only put forward the relevant localization algorithm without discussing the principle of visible light localization in depth. In this paper, we discuss in depth the principles of the two-light and three-light positioning algorithms based on the image sensor, including how these positioning algorithms work and an analysis of their errors. Based on this discussion, we propose two methods to improve the positioning accuracy, namely the rotation method and the dispersion circle method. We have verified the two optimization methods numerically and experimentally and obtained good positioning results. In particular, the positioning accuracy of the dual-lamp positioning algorithm based on dispersion circle optimization is up to 1.93 cm, while the average positioning error is only 0.82 cm, which is state of the art among positioning systems of the same type at present.
Submitted 29 April, 2020; v1 submitted 26 November, 2019;
originally announced November 2019.
-
Multi-scale and Context-adaptive Entropy Model for Image Compression
Authors:
Jing Zhou,
Sihan Wen,
Akira Nakagawa,
Kimihiko Kazui,
Zhiming Tan
Abstract:
We propose an end-to-end trainable image compression framework with a multi-scale and context-adaptive entropy model, especially for low bitrate compression. Due to the success of autoregressive priors in probabilistic generative models, the complementary combination of autoregressive and hierarchical priors can estimate the distribution of each latent representation accurately. Based on this combination, we first propose a multi-scale masked convolutional network as our autoregressive model. Second, to address the significant computational penalty of the generative model, we focus on decoded representations covered by the receptive field, and skip all-zero latents in the arithmetic codec. Finally, according to the low-rate compression constraint in CLIC-2019, we use a method to maximize MS-SSIM by allocating bitrate for each image.
Submitted 17 October, 2019;
originally announced October 2019.