-
PAEG: Phrase-level Adversarial Example Generation for Neural Machine Translation
Authors:
Juncheng Wan,
Jian Yang,
Shuming Ma,
Dongdong Zhang,
Weinan Zhang,
Yong Yu,
Zhoujun Li
Abstract:
While end-to-end neural machine translation (NMT) has achieved impressive progress, noisy input usually leads models to become fragile and unstable. Generating adversarial examples as the augmented data has been proved to be useful to alleviate this problem. Existing methods for adversarial example generation (AEG) are word-level or character-level, which ignore the ubiquitous phrase structure. In this paper, we propose a Phrase-level Adversarial Example Generation (PAEG) framework to enhance the robustness of the translation model. Our method further improves the gradient-based word-level AEG method by adopting a phrase-level substitution strategy. We verify our method on three benchmarks, including LDC Chinese-English, IWSLT14 German-English, and WMT14 English-German tasks. Experimental results demonstrate that our approach significantly improves translation performance and robustness to noise compared to previous strong baselines.
Submitted 24 October, 2022; v1 submitted 6 January, 2022;
originally announced January 2022.
-
Robust and Precise Facial Landmark Detection by Self-Calibrated Pose Attention Network
Authors:
Jun Wan,
Hui Xi,
Jie Zhou,
Zhihui Lai,
Witold Pedrycz,
Xu Wang,
Hang Sun
Abstract:
Current fully-supervised facial landmark detection methods have progressed rapidly and achieved remarkable performance. However, they still suffer when coping with faces under large poses and heavy occlusions for inaccurate facial shape constraints and insufficient labeled training samples. In this paper, we propose a semi-supervised framework, i.e., a Self-Calibrated Pose Attention Network (SCPAN) to achieve more robust and precise facial landmark detection in challenging scenarios. To be specific, a Boundary-Aware Landmark Intensity (BALI) field is proposed to model more effective facial shape constraints by fusing boundary and landmark intensity field information. Moreover, a Self-Calibrated Pose Attention (SCPA) model is designed to provide a self-learned objective function that enforces intermediate supervision without label information by introducing a self-calibrated mechanism and a pose attention mask. We show that by integrating the BALI fields and SCPA model into a novel self-calibrated pose attention network, more facial prior knowledge can be learned and the detection accuracy and robustness of our method for faces with large poses and heavy occlusions have been improved. The experimental results obtained for challenging benchmark datasets demonstrate that our approach outperforms state-of-the-art methods in the literature.
Submitted 22 December, 2021;
originally announced December 2021.
-
Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based Motion Recognition
Authors:
Benjia Zhou,
Pichao Wang,
Jun Wan,
Yanyan Liang,
Fan Wang,
Du Zhang,
Zhen Lei,
Hao Li,
Rong Jin
Abstract:
Decoupling spatiotemporal representation refers to decomposing the spatial and temporal features into dimension-independent factors. Although previous RGB-D-based motion recognition methods have achieved promising performance through the tightly coupled multi-modal spatiotemporal representation, they still suffer from (i) optimization difficulty under small data setting due to the tightly spatiotemporal-entangled modeling; (ii) information redundancy as it usually contains lots of marginal information that is weakly relevant to classification; and (iii) low interaction between multi-modal spatiotemporal information caused by insufficient late fusion. To alleviate these drawbacks, we propose to decouple and recouple spatiotemporal representation for RGB-D-based motion recognition. Specifically, we disentangle the task of learning spatiotemporal representation into 3 sub-tasks: (1) Learning high-quality and dimension-independent features through a decoupled spatial and temporal modeling network. (2) Recoupling the decoupled representation to establish stronger space-time dependency. (3) Introducing a Cross-modal Adaptive Posterior Fusion (CAPF) mechanism to capture cross-modal spatiotemporal information from RGB-D data. Seamless combination of these novel designs forms a robust spatiotemporal representation and achieves better performance than state-of-the-art methods on four public motion datasets. Our code is available at https://github.com/damo-cv/MotionRGBD.
Submitted 16 December, 2021;
originally announced December 2021.
-
Unsupervised feature selection via self-paced learning and low-redundant regularization
Authors:
Weiyi Li,
Hongmei Chen,
Tianrui Li,
Jihong Wan,
Binbin Sang
Abstract:
Much more attention has been paid to unsupervised feature selection nowadays due to the emergence of massive unlabeled data. The distribution of samples and the latent effect of training a learning method using samples in more effective order need to be considered so as to improve the robustness of the method. Self-paced learning is an effective method considering the training order of samples. In this study, an unsupervised feature selection is proposed by integrating the framework of self-paced learning and subspace learning. Moreover, the local manifold structure is preserved and the redundancy of features is constrained by two regularization terms. $L_{2,1/2}$-norm is applied to the projection matrix, which aims to retain discriminative features and further alleviate the effect of noise in the data. Then, an iterative method is presented to solve the optimization problem. The convergence of the method is proved theoretically and experimentally. The proposed method is compared with other state of the art algorithms on nine real-world datasets. The experimental results show that the proposed method can improve the performance of clustering methods and outperform other compared algorithms.
Submitted 14 December, 2021;
originally announced December 2021.
-
The Spectroscopic Binaries from LAMOST Medium-Resolution Survey (MRS). I. Searching for Double-lined Spectroscopic Binaries (SB2s) with Convolutional Neural Network
Authors:
Bo Zhang,
Ying-Jie Jing,
Fan Yang,
Jun-Chen Wan,
Xin Ji,
Jian-Ning Fu,
Chao Liu,
Xiao-Bin Zhang,
Feng Luo,
Hao Tian,
Yu-Tao Zhou,
Jia-Xin Wang,
Yan-Jun Guo,
Weikai Zong,
Jian-Ping Xiong,
Jiao Li
Abstract:
We developed a convolutional neural network (CNN) model to distinguish the double-lined spectroscopic binaries (SB2s) from others based on single exposure medium-resolution spectra ($R\sim 7,500$). The training set consists of a large set of mock spectra of single stars and binaries synthesized based on the MIST stellar evolutionary model and ATLAS9 atmospheric model. Our model reaches a novel theoretic false positive rate by adding a proper penalty on the negative sample (e.g., 0.12\% and 0.16\% for the blue/red arm when the penalty parameter $\Lambda=16$). Tests show that the performance is as expected and favors FGK-type main-sequence binaries with high mass ratio ($q \geq 0.7$) and large radial velocity separation ($\Delta v \geq 50\,\mathrm{km\,s^{-1}}$). Although the real false positive rate cannot be estimated reliably, validating on eclipsing binaries identified from Kepler light curves indicates that our model predicts low binary probabilities at eclipsing phases (0, 0.5, and 1.0) as expected. The color-magnitude diagram also helps illustrate its feasibility and capability of identifying FGK MS binaries from spectra. We conclude that this model is reasonably reliable and can provide an automatic approach to identify SB2s with period $\lesssim 10$ days. This work yields a catalog of binary probabilities for over 5 million spectra of 1 million sources from the LAMOST medium-resolution survey (MRS), and a catalog of 2198 SB2 candidates whose physical properties will be analyzed in our follow-up paper. Data products are made publicly available at the journal as well as our GitHub website.
Submitted 7 December, 2021;
originally announced December 2021.
-
Adaptive Channel Encoding for Point Cloud Analysis
Authors:
Guoquan Xu,
Hezhi Cao,
Yifan Zhang,
Jianwei Wan,
Ke Xu,
Yanxin Ma
Abstract:
The attention mechanism plays an increasingly important role in point cloud analysis, and channel attention is one of the hotspots. With so much channel information, it is difficult for neural networks to screen useful channel information. Thus, an adaptive channel encoding mechanism is proposed to capture channel relationships in this paper. It improves the quality of the representation generated by the network by explicitly encoding the interdependence between the channels of its features. Specifically, a channel-wise convolution (Channel-Conv) is proposed to adaptively learn the relationship between coordinates and features, so as to encode the channel. Different from the popular attention weight schemes, the Channel-Conv proposed in this paper realizes adaptability in convolution operation, rather than simply assigning different weights for channels. Extensive experiments on existing benchmarks verify that our method achieves state-of-the-art results.
Submitted 5 December, 2021;
originally announced December 2021.
-
Adaptive Channel Encoding Transformer for Point Cloud Analysis
Authors:
Guoquan Xu,
Hezhi Cao,
Yifan Zhang,
Yanxin Ma,
Jianwei Wan,
Ke Xu
Abstract:
Transformer plays an increasingly important role in various computer vision areas and remarkable achievements have also been made in point cloud analysis. Since they mainly focus on point-wise transformer, an adaptive channel encoding transformer is proposed in this paper. Specifically, a channel convolution called Transformer-Conv is designed to encode the channel. It can encode feature channels by capturing the potential relationship between coordinates and features. Compared with simply assigning attention weight to each channel, our method aims to encode the channel adaptively. In addition, our network adopts the neighborhood search method of low-level and high-level dual semantic receptive fields to improve the performance. Extensive experiments show that our method is superior to state-of-the-art point cloud classification and segmentation methods on three benchmark datasets.
Submitted 16 July, 2022; v1 submitted 5 December, 2021;
originally announced December 2021.
-
Quasi one-dimensional diffuse laser cooling of atoms
Authors:
Jin-Yin Wan,
Xin Wang,
Xiao Zhang,
Yan-Ling Meng,
Wen-Li Wang,
Yuan Sun,
Liang Liu
Abstract:
We demonstrate experimentally the generation of one-dimensional cold gases of $^{87}$Rb atoms by diffuse laser cooling (DLC). A horizontal slender vacuum glass tube with length of 105~cm and diameter of 2~cm is used in our experiment. The diffuse laser light inside the tube, which is generated by multi-reflection of injected lasers, cools the background vapor atoms. With 250~mW of cooling light and 50~mW of repumping light, an evenly distributed meter-long profile of atom cloud is obtained. We observe a factor-of-4 improvement in the atomic OD for a typical cooling duration of 170~ms and a sub-Doppler atomic temperature of 25~$\mu$K. The maximum number of detected cold atoms remains constant for a free-fall duration of 30~ms. Such samples are ideal for many quantum optical experiments involving electromagnetically induced transparency, electronically highly excited (Rydberg) atoms and quantum precision measurements.
Submitted 29 November, 2021; v1 submitted 28 November, 2021;
originally announced November 2021.
-
Correlation Improves Group Testing: Modeling Concentration-Dependent Test Errors
Authors:
Jiayue Wan,
Yujia Zhang,
Peter I. Frazier
Abstract:
Population-wide screening is a powerful tool for controlling infectious diseases. Group testing enables such screening despite limited resources. Viral concentrations of pooled samples are often positively correlated, either because prevalence and sample collection are influenced by location, or through intentional enhancement via pooling samples according to risk/household. Such correlation is known to improve efficiency under fixed test sensitivity. However, in reality, a test's sensitivity depends on the concentration of the analyte (e.g., viral RNA), as in the so-called dilution effect, where sensitivity decreases for larger pools. We show that concentration-dependent test error alters correlation's effect under the most widely used group testing procedure, the two-stage Dorfman procedure. We prove that when test sensitivity increases with concentration, pooling correlated samples together (correlated pooling) achieves asymptotically higher sensitivity than independently pooling the samples (naive pooling). In contrast, in the concentration-independent case, correlation does not affect sensitivity. Moreover, with concentration-dependent errors, correlation can degrade test efficiency compared to naive pooling, whereas under concentration-independent errors, correlation always improves efficiency. We propose an alternative measure of test resource usage, the number of positives found per test consumed, which we argue is better aligned with infection control, and show that correlated pooling outperforms naive pooling on this measure. In simulation, we show that the effect of correlation under realistic concentration-dependent test error meaningfully differs from correlation's effect assuming fixed sensitivity. Our findings underscore the importance for policy-makers of using models that incorporate naturally occurring correlation and of considering ways of strengthening this correlation.
Submitted 30 March, 2025; v1 submitted 14 November, 2021;
originally announced November 2021.
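For orientation, the two-stage Dorfman procedure named in the abstract above can be sketched in its classical textbook form. This is only the idealized baseline, assuming a perfect test and independent samples; it does not model the correlation or the concentration-dependent errors that the paper studies. The function names `dorfman_tests_per_person` and `best_pool_size` are hypothetical and chosen for illustration.

```python
def dorfman_tests_per_person(prevalence: float, pool_size: int) -> float:
    """Expected tests per person under classical two-stage Dorfman testing.

    Assumptions (not from the paper): the test is perfect and samples are
    independent, each positive with probability `prevalence`.
    Stage 1 tests one pooled sample; stage 2 retests every member
    individually only if the pool is positive.
    """
    if pool_size < 2:
        raise ValueError("pool_size must be at least 2")
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    # Probability that at least one member of the pool is positive.
    p_pool_positive = 1.0 - (1.0 - prevalence) ** pool_size
    # One pooled test shared by pool_size people, plus one individual
    # retest per person whenever the pool is positive.
    return 1.0 / pool_size + p_pool_positive


def best_pool_size(prevalence: float, max_pool_size: int = 50) -> int:
    """Pool size in [2, max_pool_size] minimizing expected tests per person."""
    return min(
        range(2, max_pool_size + 1),
        key=lambda n: dorfman_tests_per_person(prevalence, n),
    )
```

A value below 1 means pooling uses fewer tests than testing everyone individually; under these idealized assumptions the saving shrinks as prevalence grows.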
-
Towards Domain-Independent and Real-Time Gesture Recognition Using mmWave Signal
Authors:
Yadong Li,
Dongheng Zhang,
Jinbo Chen,
Jinwei Wan,
Dong Zhang,
Yang Hu,
Qibin Sun,
Yan Chen
Abstract:
Human gesture recognition using millimeter-wave (mmWave) signals provides attractive applications including smart home and in-car interfaces. While existing works achieve promising performance under controlled settings, practical applications are still limited due to the need of intensive data collection, extra training efforts when adapting to new domains, and poor performance for real-time recognition. In this paper, we propose DI-Gesture, a domain-independent and real-time mmWave gesture recognition system. Specifically, we first derive signal variations corresponding to human gestures with spatial-temporal processing. To enhance the robustness of the system and reduce data collecting efforts, we design a data augmentation framework for mmWave signals based on correlations between signal patterns and gesture variations. Furthermore, a spatial-temporal gesture segmentation algorithm is employed for real-time recognition. Extensive experimental results show DI-Gesture achieves an average accuracy of 97.92\%, 99.18\%, and 98.76\% for new users, environments, and locations, respectively. We also evaluate DI-Gesture in challenging scenarios like real-time recognition and sensing at extreme angles, all of which demonstrate the superior robustness and effectiveness of our system.
Submitted 8 October, 2022; v1 submitted 11 November, 2021;
originally announced November 2021.
-
LAE : Long-tailed Age Estimation
Authors:
Zenghao Bao,
Zichang Tan,
Yu Zhu,
Jun Wan,
Xibo Ma,
Zhen Lei,
Guodong Guo
Abstract:
Facial age estimation is an important yet very challenging problem in computer vision. To improve the performance of facial age estimation, we first formulate a simple standard baseline and build a much stronger one by collecting the tricks in pre-training, data augmentation, model architecture, and so on. Compared with the standard baseline, the proposed one significantly decreases the estimation errors. Moreover, long-tailed recognition has been an important topic in facial age datasets, where samples of the elderly and children are often scarce. To train a balanced age estimator, we propose a two-stage training method named Long-tailed Age Estimation (LAE), which decouples the learning procedure into representation learning and classification. The effectiveness of our approach has been demonstrated on the dataset provided by organizers of Guess The Age Contest 2021.
Submitted 25 October, 2021;
originally announced October 2021.
-
MARVEL: Raster Manga Vectorization via Primitive-wise Deep Reinforcement Learning
Authors:
Hao Su,
Jianwei Niu,
Xuefeng Liu,
Jiahe Cui,
Ji Wan
Abstract:
Manga is a fashionable Japanese-style comic form that is composed of black-and-white strokes and is generally displayed as raster images on digital devices. Typical mangas have simple textures, wide lines, and few color gradients, which make them well suited to vectorization and to the merits of vector graphics, e.g., adaptive resolutions and small file sizes. In this paper, we propose MARVEL (MAnga's Raster to VEctor Learning), a primitive-wise approach for vectorizing raster mangas by Deep Reinforcement Learning (DRL). Unlike previous learning-based methods which predict vector parameters for an entire image, MARVEL introduces a new perspective that regards an entire manga as a collection of basic primitives, namely stroke lines, and designs a DRL model to decompose the target image into a primitive sequence for achieving accurate vectorization. To improve vectorization accuracy and decrease file sizes, we further propose a stroke accuracy reward to predict accurate stroke lines, and a pruning mechanism to avoid generating erroneous and repeated strokes. Extensive subjective and objective experiments show that our MARVEL can generate impressive results and reaches the state-of-the-art level. Our code is open-source at: https://github.com/SwordHolderSH/Mang2Vec.
Submitted 18 July, 2023; v1 submitted 10 October, 2021;
originally announced October 2021.
-
Pose Refinement with Joint Optimization of Visual Points and Lines
Authors:
Shuang Gao,
Jixiang Wan,
Yishan Ping,
Xudong Zhang,
Shuzhou Dong,
Yuchen Yang,
Haikuan Ning,
Jijunnan Li,
Yandong Guo
Abstract:
High-precision camera re-localization technology in a pre-established 3D environment map is the basis for many tasks, such as Augmented Reality, Robotics and Autonomous Driving. The point-based visual re-localization approaches have been well developed in recent decades, but are insufficient in some feature-less cases. In this paper, we design a complete pipeline for camera pose refinement with points and lines, which contains the innovatively designed line extracting CNN named VLSE, the line matching and the pose optimization approaches. We adopt a novel line representation and customize a hybrid convolution block based on the Stacked Hourglass network, to detect accurate and stable line features on images. Then we apply a geometric-based strategy to obtain precise 2D-3D line correspondences using epipolar constraint and reprojection filtering. A subsequent point-line joint cost function is constructed to optimize the camera pose with the initial coarse pose from the pure point-based localization. Sufficient experiments are conducted on open datasets, i.e., line extractor on Wireframe and YorkUrban, localization performance on InLoc duc1 and duc2, to confirm the effectiveness of our point-line joint pose optimization method.
Submitted 26 July, 2022; v1 submitted 8 October, 2021;
originally announced October 2021.
-
The spectral property of hypergraph coverings
Authors:
Yi-Min Song,
Yi-Zheng Fan,
Yi Wang,
Meng-Yu Tian,
Jiang-Chao Wan
Abstract:
Let $H$ be a connected $m$-uniform hypergraph, and let $\mathcal{A}(H)$ be the adjacency tensor of $H$ whose spectrum is simply called the spectrum of $H$. Let $s(H)$ denote the number of eigenvectors of $\mathcal{A}(H)$ associated with the spectral radius, and $c(H)$ denote the number of eigenvalues of $\mathcal{A}(H)$ with modulus equal to the spectral radius, which are respectively called the stabilizing index and cyclic index of $H$. Let $\bar{H}$ be a $k$-fold covering of $H$ which can be obtained from some permutation assignment in the symmetric group $\mathbf{S}_k$ on $H$. In this paper, we first characterize the connectedness of $\bar{H}$ by its incidence graph and the permutation assignment, and then investigate the relationship between the spectral property of $H$ and that of $\bar{H}$. By applying module theory and group representation, if $\bar{H}$ is connected, we prove that $s(H) \mid s(\bar{H})$ and $c(H) \mid c(\bar{H})$. In particular, when $\bar{H}$ is a $2$-fold covering of $H$, if $m$ is even, we show that regardless of multiplicities, the spectrum of $\bar{H}$ contains the spectrum of $H$ and the spectrum of a signed hypergraph with $H$ as underlying hypergraph; if $m$ is odd, we give an explicit formula for $s(\bar{H})$. We also find some differences on the spectral property between hypergraph coverings and graph coverings by examples.
Submitted 1 October, 2023; v1 submitted 29 August, 2021;
originally announced August 2021.
-
Dual-Neighborhood Deep Fusion Network for Point Cloud Analysis
Authors:
Guoquan Xu,
Hezhi Cao,
Yifan Zhang,
Jianwei Wan,
Ke Xu,
Yanxin Ma
Abstract:
Recently, deep neural networks have made remarkable achievements in 3D point cloud classification. However, existing classification methods are mainly implemented on idealized point clouds and suffer heavy degradation of performance on non-idealized scenarios. To handle this problem, a feature representation learning method, named Dual-Neighborhood Deep Fusion Network (DNDFN), is proposed to serve as an improved point cloud encoder for the task of non-idealized point cloud classification. DNDFN utilizes a trainable neighborhood learning method called TN-Learning to capture the global key neighborhood. Then, the global neighborhood is fused with the local neighborhood to help the network achieve more powerful reasoning ability. Besides, an Information Transfer Convolution (IT-Conv) is proposed for DNDFN to learn the edge information between point-pairs and benefit the feature transfer procedure. The transmission of information in IT-Conv is similar to the propagation of information in the graph, which makes DNDFN closer to the human reasoning mode. Extensive experiments on existing benchmarks, especially non-idealized datasets, verify the effectiveness of DNDFN, and DNDFN achieves state-of-the-art results.
Submitted 5 May, 2022; v1 submitted 20 August, 2021;
originally announced August 2021.
-
3D High-Fidelity Mask Face Presentation Attack Detection Challenge
Authors:
Ajian Liu,
Chenxu Zhao,
Zitong Yu,
Anyang Su,
Xing Liu,
Zijian Kong,
Jun Wan,
Sergio Escalera,
Hugo Jair Escalante,
Zhen Lei,
Guodong Guo
Abstract:
The threat of 3D masks to face recognition systems is increasingly serious and has drawn wide concern from researchers. To facilitate the study of the algorithms, a large-scale High-Fidelity Mask dataset, namely CASIA-SURF HiFiMask (briefly HiFiMask) has been collected. Specifically, it consists of a total of 54,600 videos which are recorded from 75 subjects with 225 realistic masks under 7 new kinds of sensors. Based on this dataset and Protocol 3 which evaluates both the discrimination and generalization ability of the algorithm under the open set scenarios, we organized a 3D High-Fidelity Mask Face Presentation Attack Detection Challenge to boost the research of 3D mask-based attack detection. It attracted 195 teams for the development phase with a total of 18 teams qualifying for the final round. All the results were verified and re-run by the organizing team, and the results were used for the final ranking. This paper presents an overview of the challenge, including the introduction of the dataset used, the definition of the protocol, the calculation of the evaluation criteria, and the summary and publication of the competition results. Finally, we focus on introducing and analyzing the top ranking algorithms, the conclusion summary, and the research ideas for mask attack detection provided by this competition.
Submitted 16 August, 2021;
originally announced August 2021.
-
Artificial Intelligence-Driven Customized Manufacturing Factory: Key Technologies, Applications, and Challenges
Authors:
Jiafu Wan,
Xiaomin Li,
Hong-Ning Dai,
Andrew Kusiak,
Miguel Martínez-García,
Di Li
Abstract:
The traditional production paradigm of large batch production does not offer flexibility towards satisfying the requirements of individual customers. A new generation of smart factories is expected to support new multi-variety and small-batch customized production modes. For that, Artificial Intelligence (AI) is enabling higher value-added manufacturing by accelerating the integration of manufacturing and information communication technologies, including computing, communication, and control. The characteristics of a customized smart factory are to include self-perception, operations optimization, dynamic reconfiguration, and intelligent decision-making. The AI technologies will allow manufacturing systems to perceive the environment, adapt to external needs, and extract the processed knowledge, including business models, such as intelligent production, networked collaboration, and extended service models.
This paper focuses on the implementation of AI in customized manufacturing (CM). The architecture of an AI-driven customized smart factory is presented. Details of intelligent manufacturing devices, intelligent information interaction, and the construction of a flexible manufacturing line are showcased. The state-of-the-art AI technologies of potential use in CM, i.e., machine learning, multi-agent systems, Internet of Things, big data, and cloud-edge computing are surveyed. The AI-enabled technologies in a customized smart factory are validated with a case study of customized packaging. The experimental results have demonstrated that the AI-assisted CM offers the possibility of higher production flexibility and efficiency. Challenges and solutions related to AI in CM are also discussed.
Submitted 13 April, 2023; v1 submitted 7 August, 2021;
originally announced August 2021.
-
Machine learning enabled fast evaluation of dynamic aperture for storage ring accelerators
Authors:
Jinyu Wan,
Yi Jiao
Abstract:
For any storage ring-based large-scale scientific facility, one of the most important performance parameters is the dynamic aperture (DA), which measures the motion stability of charged particles in a global manner. To date, long-term tracking-based simulation is regarded as the most reliable method to calculate DA. However, numerical tracking may become a significant issue, especially when lots of candidate designs of a storage ring need to be evaluated. In this paper, we present a novel machine learning-based method, which can reduce the computation cost of DA tracking by approximately one order of magnitude, while keeping sufficiently high evaluation accuracy. Moreover, we demonstrate that this method is independent of concrete physical models of a storage ring. This method has the potential to be applied to similar problems of identifying irregular motions in other complex dynamical systems.
Submitted 22 October, 2021; v1 submitted 5 July, 2021;
originally announced July 2021.
-
ChaLearn Looking at People: Inpainting and Denoising challenges
Authors:
Sergio Escalera,
Marti Soler,
Stephane Ayache,
Umut Guclu,
Jun Wan,
Meysam Madadi,
Xavier Baro,
Hugo Jair Escalante,
Isabelle Guyon
Abstract:
Dealing with incomplete information is a well studied problem in the context of machine learning and computational intelligence. However, in the context of computer vision, the problem has only been studied in specific scenarios (e.g., certain types of occlusions in specific types of images), although it is common to have incomplete information in visual data. This chapter describes the design of an academic competition focusing on inpainting of images and video sequences that was part of the competition program of WCCI2018 and had a satellite event collocated with ECCV2018. The ChaLearn Looking at People Inpainting Challenge aimed at advancing the state of the art on visual inpainting by promoting the development of methods for recovering missing and occluded information from images and video. Three tracks were proposed in which visual inpainting might be helpful but still challenging: human body pose estimation, text overlays removal and fingerprint denoising. This chapter describes the design of the challenge, which includes the release of three novel datasets, and the description of evaluation metrics, baselines and evaluation protocol. The results of the challenge are analyzed and discussed in detail and conclusions derived from this event are outlined.
Submitted 24 June, 2021;
originally announced June 2021.
-
Resource Allocation and Service Provisioning in Multi-Agent Cloud Robotics: A Comprehensive Survey
Authors:
Mahbuba Afrin,
Jiong Jin,
Akhlaqur Rahman,
Ashfaqur Rahman,
Jiafu Wan,
Ekram Hossain
Abstract:
Robotic applications nowadays are widely adopted to enhance operational automation and performance of real-world Cyber-Physical Systems (CPSs) including Industry 4.0, agriculture, healthcare, and disaster management. These applications are composed of latency-sensitive, data-heavy, and compute-intensive tasks. The robots, however, are constrained in the computational power and storage capacity. The concept of multi-agent cloud robotics enables robot-to-robot cooperation and creates a complementary environment for the robots in executing large-scale applications with the capability to utilize the edge and cloud resources. However, in such a collaborative environment, the optimal resource allocation for robotic tasks is challenging to achieve. Heterogeneous energy consumption rates and application of execution costs associated with the robots and computing instances make it even more complex. In addition, the data transmission delay between local robots, edge nodes, and cloud data centres adversely affects the real-time interactions and impedes service performance guarantee. Taking all these issues into account, this paper comprehensively surveys the state-of-the-art on resource allocation and service provisioning in multi-agent cloud robotics. The paper presents the application domains of multi-agent cloud robotics through explicit comparison with the contemporary computing paradigms and identifies the specific research challenges. A complete taxonomy on resource allocation is presented for the first time, together with the discussion of resource pooling, computation offloading, and task scheduling for efficient service provisioning. Furthermore, we highlight the research gaps from the learned lessons, and present future directions deemed beneficial to further advance this emerging field.
Submitted 29 April, 2021;
originally announced April 2021.
-
Data-driven Chaos Indicator for Nonlinear Dynamics and Applications on Storage Ring Lattice Design
Authors:
Yongjun Li,
Jinyu Wan,
Allen Liu,
Yi Jiao,
Robert Rainer
Abstract:
A data-driven chaos indicator concept is introduced to characterize the degree of chaos for nonlinear dynamical systems. The indicator is represented by the prediction accuracy of surrogate models established purely from data. It provides a metric for the predictability of nonlinear motions in a given system. When using the indicator to implement a tune-scan for a quadratic Henon map, the main resonances and their asymmetric stop-band widths can be identified. When applied to particle transportation in a storage ring, as particle motion becomes more chaotic, its surrogate model prediction accuracy decreases correspondingly. Therefore, the prediction accuracy, acting as a chaos indicator, can be used directly as the objective for nonlinear beam dynamics optimization. This method provides a different perspective on nonlinear beam dynamics and an efficient method for nonlinear lattice optimization. Applications in dynamic aperture optimization are demonstrated as real world examples.
Submitted 9 November, 2021; v1 submitted 16 April, 2021;
originally announced April 2021.
-
Contrastive Context-Aware Learning for 3D High-Fidelity Mask Face Presentation Attack Detection
Authors:
Ajian Liu,
Chenxu Zhao,
Zitong Yu,
Jun Wan,
Anyang Su,
Xing Liu,
Zichang Tan,
Sergio Escalera,
Junliang Xing,
Yanyan Liang,
Guodong Guo,
Zhen Lei,
Stan Z. Li,
Du Zhang
Abstract:
Face presentation attack detection (PAD) is essential to secure face recognition systems primarily from high-fidelity mask attacks. Most existing 3D mask PAD benchmarks suffer from several drawbacks: 1) a limited number of mask identities, types of sensors, and a total number of videos; 2) low-fidelity quality of facial masks. Basic deep models and remote photoplethysmography (rPPG) methods achieved acceptable performance on these benchmarks but are still far from the needs of practical scenarios. To bridge the gap to real-world applications, we introduce a large-scale High-Fidelity Mask dataset, namely CASIA-SURF HiFiMask (briefly HiFiMask). Specifically, a total of 54,600 videos are recorded from 75 subjects with 225 realistic masks by 7 new kinds of sensors. Together with the dataset, we propose a novel Contrastive Context-aware Learning framework, namely CCL. CCL is a new training methodology for supervised PAD tasks, which is able to learn by leveraging rich contexts accurately (e.g., subjects, mask material and lighting) among pairs of live faces and high-fidelity mask attacks. Extensive experimental evaluations on HiFiMask and three additional 3D mask datasets demonstrate the effectiveness of our method.
Submitted 13 April, 2021;
originally announced April 2021.
-
On the location of zeros of the Laplacian matching polynomials of graphs
Authors:
Jiang-Chao Wan,
Yi Wang,
Ali Mohammadian
Abstract:
The Laplacian matching polynomial of a graph $G$, denoted by $\mathscr{L\hspace{-0.7mm}M}(G,x)$, is a new graph polynomial all of whose roots are nonnegative real numbers. In this paper, we investigate the location of zeros of the Laplacian matching polynomials. Let $G$ be a connected graph. We show that $0$ is a root of $\mathscr{L\hspace{-0.7mm}M}(G, x)$ if and only if $G$ is a tree. We prove that the number of distinct positive zeros of $\mathscr{L\hspace{-0.7mm}M}(G,x)$ is at least equal to the length of the longest path in $G$. It is also established that the zeros of $\mathscr{L\hspace{-0.7mm}M}(G,x)$ and $\mathscr{L\hspace{-0.7mm}M}(G-e,x)$ interlace for each edge $e$ of $G$. Using the path-tree of $G$, we present a linear algebraic approach to investigate the largest zero of $\mathscr{L\hspace{-0.7mm}M}(G,x)$ and particularly to give tight upper and lower bounds on it.
Submitted 22 June, 2021; v1 submitted 18 March, 2021;
originally announced March 2021.
-
Machine learning-based direct solver for one-to-many problems on temporal shaping of relativistic electron beams
Authors:
Jinyu Wan,
Yi Jiao
Abstract:
To control the temporal profile of a relativistic electron beam to meet the requirements of various advanced scientific applications like free-electron lasers and plasma wakefield acceleration, a widely used technique is to manipulate the dispersion terms, which turns out to be a one-to-many problem. Due to this intrinsic one-to-many property, current popular stochastic optimization approaches to temporal shaping may suffer from long computing times or sometimes suggest only one solution. Here we propose a real-time solver for one-to-many problems of temporal shaping, with the aid of a semi-supervised machine learning method, the conditional generative adversarial network (CGAN). We demonstrate that the CGAN solver can learn the one-to-many dynamics and is able to accurately and quickly predict the required dispersion terms for different custom temporal profiles. This machine learning-based solver is expected to have the potential for wide applications to one-to-many problems in other scientific fields.
Submitted 19 September, 2022; v1 submitted 11 March, 2021;
originally announced March 2021.
-
Volcano: Stateless Cache Side-channel Attack by Exploiting Mesh Interconnect
Authors:
Junpeng Wan,
Yanxiang Bi,
Zhe Zhou,
Zhou Li
Abstract:
Cache side-channel attacks lead to severe security threats in settings where a CPU is shared across users, e.g., in the cloud. The existing attacks rely on sensing the micro-architectural state changes made by victims, and this assumption can be invalidated by combining spatial (e.g., Intel CAT) and temporal isolation (e.g., time protection). In this work, we advance the state of cache side-channel attacks by showing stateless cache side-channel attacks that cannot be defeated by both spatial and temporal isolation.
This side-channel exploits the timing difference resulting from interconnect congestion. Specifically, to complete cache transactions, for Intel CPUs, cache lines would travel across cores via the CPU mesh interconnect. Nonetheless, the mesh links are shared by all cores, and cache isolation does not segregate the traffic. An attacker can generate interconnect traffic to contend with the victim's on a mesh link, hoping that extra delay will be measured. From the varying delays, the attacker can deduce the memory access pattern of a victim program and infer its sensitive data. Based on this idea, we implement Volcano and test it against the existing RSA implementations of JDK. We found that the RSA private key used by a victim process can be partially recovered. In the end, we propose a few directions for defense and call for the attention of the security community.
Submitted 7 March, 2021;
originally announced March 2021.
-
Regional Attention with Architecture-Rebuilt 3D Network for RGB-D Gesture Recognition
Authors:
Benjia Zhou,
Yunan Li,
Jun Wan
Abstract:
Human gesture recognition has drawn much attention in the area of computer vision. However, the performance of gesture recognition is always influenced by some gesture-irrelevant factors like the background and the clothes of performers. Therefore, focusing on the regions of the hand/arm is important to gesture recognition. Meanwhile, a more adaptive architecture-searched network structure can also perform better than block-fixed ones like ResNet, since it better increases the diversity of features in different stages of the network. In this paper, we propose a regional attention with architecture-rebuilt 3D network (RAAR3DNet) for gesture recognition. We replace the fixed Inception modules with the automatically rebuilt structure through the network via Neural Architecture Search (NAS), owing to the different shape and representation ability of features in the early, middle, and late stages of the network. It enables the network to capture different levels of feature representations at different layers more adaptively. Meanwhile, we also design a stackable regional attention module called Dynamic-Static Attention (DSA), which derives a Gaussian guidance heatmap and dynamic motion map to highlight the hand/arm regions and the motion information in the spatial and temporal domains, respectively. Extensive experiments on two recent large-scale RGB-D gesture datasets validate the effectiveness of the proposed method and show it outperforms state-of-the-art methods. The code of our method is available at: https://github.com/zhoubenjia/RAAR3DNet.
Submitted 9 March, 2021; v1 submitted 10 February, 2021;
originally announced February 2021.
-
Weight Rescaling: Effective and Robust Regularization for Deep Neural Networks with Batch Normalization
Authors:
Ziquan Liu,
Yufei Cui,
Jia Wan,
Yu Mao,
Antoni B. Chan
Abstract:
Weight decay is often used to ensure good generalization in the training practice of deep neural networks with batch normalization (BN-DNNs), where some convolution layers are invariant to weight rescaling due to the normalization. In this paper, we demonstrate that the practical usage of weight decay still has some unsolved problems in spite of existing theoretical work on explaining the effect of weight decay in BN-DNNs. On the one hand, when a non-adaptive learning rate optimizer, e.g., SGD with momentum (SGDM), is used, the effective learning rate continues to increase even after the initial training stage, which leads to an overfitting effect in many neural architectures. On the other hand, in both SGDM and adaptive learning rate optimizers, e.g., Adam, the effect of weight decay on generalization is quite sensitive to the hyperparameter. Thus, finding an optimal weight decay parameter requires extensive parameter searching. To address those weaknesses, we propose to regularize the weight norm using a simple yet effective weight rescaling (WRS) scheme as an alternative to weight decay. WRS controls the weight norm by explicitly rescaling it to the unit norm, which prevents a large increase in the gradient but also ensures a sufficiently large effective learning rate to improve generalization. On a variety of computer vision applications including image classification, object detection, semantic segmentation and crowd counting, we show the effectiveness and robustness of WRS compared with weight decay, implicit weight rescaling (weight standardization) and gradient projection (AdamP).
Submitted 17 June, 2022; v1 submitted 5 February, 2021;
originally announced February 2021.
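The weight rescaling idea in the abstract above (rescale the weight norm to the unit norm, relying on the scale invariance that batch normalization gives the layer) can be illustrated with a minimal sketch. This is not the authors' implementation: the per-filter granularity, the function names `rescale_to_unit_norm` and `normalized_response`, and the toy data are all assumptions made for illustration, and the batch-norm stand-in omits the affine parameters and epsilon.

```python
import math


def rescale_to_unit_norm(filters):
    """Rescale each filter (a flat list of weights) to unit L2 norm.

    Assumption: rescaling is done per filter; the abstract only says the
    weight norm is rescaled to the unit norm.
    """
    rescaled = []
    for w in filters:
        norm = math.sqrt(sum(x * x for x in w))
        # Leave an all-zero filter unchanged to avoid dividing by zero.
        rescaled.append(list(w) if norm == 0.0 else [x / norm for x in w])
    return rescaled


def normalized_response(w, inputs):
    """Pre-activations w.x over a batch, standardized the way batch norm
    would (hypothetical stand-in: no affine parameters, no epsilon)."""
    acts = [sum(wi * xi for wi, xi in zip(w, x)) for x in inputs]
    mean = sum(acts) / len(acts)
    var = sum((a - mean) ** 2 for a in acts) / len(acts)
    std = math.sqrt(var)
    return [(a - mean) / std for a in acts]


# Toy data (hypothetical): two filters and a batch of three inputs.
filters = [[3.0, 4.0], [0.5, -1.2]]
batch = [[1.0, 2.0], [0.0, -1.0], [2.5, 0.3]]
unit = rescale_to_unit_norm(filters)
```

Because the standardized response is unchanged when a filter is multiplied by a positive constant, `normalized_response(filters[0], batch)` and `normalized_response(unit[0], batch)` agree up to floating-point error, which is why the norm can be reset without altering what the normalized layer computes.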
-
The electromagnetic form factors of $Λ_c$ hyperon in the vector meson dominance model
Authors:
Junyao Wan,
Yongliang Yang,
Zhun Lu
Abstract:
We apply a modified vector meson dominance (VMD) model to analyze the electromagnetic form factors of the $Λ_c$ hyperon in the time-like reaction $e^+e^-\rightarrow Λ_c^+ \barΛ_c^-$. In the model, we include the contributions from the vector charmed mesons and their excitations $ψ(1S)$, $ψ(2S)$, $ψ(3770)$, $ψ(4040)$, $ψ(4160)$ and $ψ(4415)$. We perform a combined fit to the available data on the Born cross section in the reaction $e^+e^-\rightarrow Λ_c^+ \barΛ_c^-$ and the ratio of the electromagnetic form factors $|G_E/G_M|$ to obtain the values of the model parameters. Our results show that the VMD model can simultaneously describe the data of electromagnetic form factors from the Belle and BESIII Collaborations, and the behavior of $|G_E/G_M|$ for the BESIII data can be qualitatively reproduced by the VMD model prediction in the threshold region. Moreover, we predict the single and double polarization observables in $e^+e^-\rightarrow Λ_c^+ \barΛ_c^-$ reactions, which are experimentally accessible in the polarized process. We also obtain the form factors of the $Λ_c$ hyperon in the space-like region by analytically continuing the time-like form factors.
Submitted 16 October, 2021; v1 submitted 5 February, 2021;
originally announced February 2021.
-
Granular conditional entropy-based attribute reduction for partially labeled data with proxy labels
Authors:
Can Gao,
Jie Zhoua,
Duoqian Miao,
Xiaodong Yue,
Jun Wan
Abstract:
Attribute reduction is one of the most important research topics in the theory of rough sets, and many rough sets-based attribute reduction methods have thus been presented. However, most of them are specifically designed for dealing with either labeled data or unlabeled data, while many real-world applications come in the form of partial supervision. In this paper, we propose a rough sets-based semi-supervised attribute reduction method for partially labeled data. Particularly, with the aid of prior class distribution information about data, we first develop a simple yet effective strategy to produce the proxy labels for unlabeled data. Then the concept of information granularity is integrated into the information-theoretic measure, based on which, a novel granular conditional entropy measure is proposed, and its monotonicity is proved in theory. Furthermore, a fast heuristic algorithm is provided to generate the optimal reduct of partially labeled data, which could accelerate the process of attribute reduction by removing irrelevant examples and excluding redundant attributes simultaneously. Extensive experiments conducted on UCI data sets demonstrate that the proposed semi-supervised attribute reduction method is promising and even compares favourably with the supervised methods on labeled data and unlabeled data with true labels in terms of classification performance.
Submitted 23 January, 2021;
originally announced January 2021.
-
Measuring $H_0$ using X-ray and SZ effect observations of dynamically relaxed galaxy clusters
Authors:
Jenny T. Wan,
Adam B. Mantz,
Jack Sayers,
Steven W. Allen,
R. Glenn Morris,
Sunil R. Golwala
Abstract:
We use a sample of 14 massive, dynamically relaxed galaxy clusters to constrain the Hubble Constant, $H_0$, by combining X-ray and Sunyaev-Zel'dovich (SZ) effect signals measured with Chandra, Planck and Bolocam. This is the first such analysis to marginalize over an empirical, data-driven prior on the overall accuracy of X-ray temperature measurements, while our restriction to the most relaxed, massive clusters also minimizes astrophysical systematics. For a cosmological-constant model with $Ω_m = 0.3$ and $Ω_Λ = 0.7$, we find $H_0 = 67.3^{+21.3}_{-13.3}$ km/s/Mpc, limited by the temperature calibration uncertainty (compared to the statistically limited constraint of $H_0 = 72.3^{+7.6}_{-7.6}$ km/s/Mpc). The intrinsic scatter in the X-ray/SZ pressure ratio is found to be $13 \pm 4$ per cent ($10 \pm 3$ per cent when two clusters with significant galactic dust emission are removed from the sample), consistent with being primarily due to triaxiality and projection. We discuss the prospects for reducing the dominant systematic limitation to this analysis, with improved X-ray calibration and/or precise measurements of the relativistic SZ effect providing a plausible route to per cent level constraints on $H_0$.
Submitted 22 January, 2021;
originally announced January 2021.
-
GuidedStyle: Attribute Knowledge Guided Style Manipulation for Semantic Face Editing
Authors:
Xianxu Hou,
Xiaokang Zhang,
Linlin Shen,
Zhihui Lai,
Jun Wan
Abstract:
Although significant progress has been made in synthesizing high-quality and visually realistic face images by unconditional Generative Adversarial Networks (GANs), there is still a lack of control over the generation process in order to achieve semantic face editing. In addition, it remains very challenging to keep other face information untouched while editing the target attributes. In this paper, we propose a novel learning framework, called GuidedStyle, to achieve semantic face editing on StyleGAN by guiding the image generation process with a knowledge network. Furthermore, we allow an attention mechanism in the StyleGAN generator to adaptively select a single layer for style manipulation. As a result, our method is able to perform disentangled and controllable edits along various attributes, including smiling, eyeglasses, gender, mustache and hair color. Both qualitative and quantitative results demonstrate the superiority of our method over other competing methods for semantic face editing. Moreover, we show that our model can also be applied to different types of real and artistic face editing, demonstrating strong generalization ability.
Submitted 22 December, 2020;
originally announced December 2020.
-
Robust Facial Landmark Detection by Multi-order Multi-constraint Deep Networks
Authors:
Jun Wan,
Zhihui Lai,
Jing Li,
Jie Zhou,
Can Gao
Abstract:
Recently, heatmap regression has been widely explored in facial landmark detection and obtained remarkable performance. However, most of the existing heatmap regression-based facial landmark detection methods neglect to explore the high-order feature correlations, which are very important to learn more representative features and enhance shape constraints. Moreover, no explicit global shape constraints have been added to the final predicted landmarks, which leads to a reduction in accuracy. To address these issues, in this paper, we propose a Multi-order Multi-constraint Deep Network (MMDN) for more powerful feature correlations and shape constraints learning. Specifically, an Implicit Multi-order Correlating Geometry-aware (IMCG) model is proposed to introduce the multi-order spatial correlations and multi-order channel correlations for more discriminative representations. Furthermore, an Explicit Probability-based Boundary-adaptive Regression (EPBR) method is developed to enhance the global shape constraints and further search the semantically consistent landmarks in the predicted boundary for robust facial landmark detection. It is interesting to show that the proposed MMDN can generate more accurate boundary-adaptive landmark heatmaps and effectively enhance shape constraints on the predicted landmarks for faces with large pose variations and heavy occlusions. Experimental results on challenging benchmark datasets demonstrate the superiority of our MMDN over state-of-the-art facial landmark detection methods. The code is publicly available at https://github.com/junwan2014/MMDN-master.
Submitted 10 December, 2020; v1 submitted 9 December, 2020;
originally announced December 2020.
-
Combining Self-Supervised and Supervised Learning with Noisy Labels
Authors:
Yongqi Zhang,
Hui Zhang,
Quanming Yao,
Jun Wan
Abstract:
Since convolutional neural networks (CNNs) can easily overfit noisy labels, which are ubiquitous in visual classification tasks, it has been a great challenge to train CNNs against them robustly. Various methods have been proposed for this challenge. However, none of them pay attention to the difference between representation and classifier learning of CNNs. Thus, inspired by the observation that the classifier is more robust to noisy labels while the representation is much more fragile, and by the recent advances of self-supervised representation learning (SSRL) technologies, we design a new method, i.e., CS$^3$NL, to obtain the representation by SSRL without labels and train the classifier directly with noisy labels. Extensive experiments are performed on both synthetic and real benchmark datasets. Results demonstrate that the proposed method can beat the state-of-the-art ones by a large margin, especially under a high noise level.
Submitted 25 June, 2023; v1 submitted 16 November, 2020;
originally announced November 2020.
-
An End-to-end Method for Producing Scanning-robust Stylized QR Codes
Authors:
Hao Su,
Jianwei Niu,
Xuefeng Liu,
Qingfeng Li,
Ji Wan,
Mingliang Xu,
Tao Ren
Abstract:
Quick Response (QR) code is one of the most widely used two-dimensional codes worldwide. Traditional QR codes appear as random collections of black-and-white modules that lack visual semantics and aesthetic elements, which has inspired recent works to beautify the appearance of QR codes. However, these works adopt fixed generation algorithms and therefore can only generate QR codes with a pre-defined style. In this paper, combining the Neural Style Transfer technique, we propose a novel end-to-end method, named ArtCoder, to generate stylized QR codes that are personalized, diverse, attractive, and scanning-robust. To guarantee that the generated stylized QR codes are still scanning-robust, we propose a Sampling-Simulation layer, a module-based code loss, and a competition mechanism. The experimental results show that our stylized QR codes have high quality in both visual effect and scanning robustness, and that they are able to support real-world applications.
Submitted 16 November, 2020;
originally announced November 2020.
-
Robust Facial Landmark Detection by Cross-order Cross-semantic Deep Network
Authors:
Jun Wan,
Zhihui Lai,
Linlin Shen,
Jie Zhou,
Can Gao,
Gang Xiao,
Xianxu Hou
Abstract:
Recently, convolutional neural network (CNN)-based facial landmark detection methods have achieved great success. However, most existing CNN-based facial landmark detection methods have not attempted to activate multiple correlated facial parts and learn different semantic features from them, so they cannot accurately model the relationships among local details or fully explore more discriminative and fine semantic features; as a result, they suffer from partial occlusions and large pose variations. To address these problems, we propose a cross-order cross-semantic deep network (CCDN) to boost semantic feature learning for robust facial landmark detection. Specifically, a cross-order two-squeeze multi-excitation (CTM) module is proposed to introduce cross-order channel correlations for more discriminative representation learning and multiple attention-specific part activation. Moreover, a novel cross-order cross-semantic (COCS) regularizer is designed to drive the network to learn cross-order cross-semantic features from different activations for facial landmark detection. By integrating the CTM module and the COCS regularizer, the proposed CCDN can effectively activate and learn finer and complementary cross-order cross-semantic features to improve the accuracy of facial landmark detection under extremely challenging scenarios. Experimental results on challenging benchmark datasets demonstrate the superiority of our CCDN over state-of-the-art facial landmark detection methods.
Submitted 16 November, 2020;
originally announced November 2020.
-
NAS-FAS: Static-Dynamic Central Difference Network Search for Face Anti-Spoofing
Authors:
Zitong Yu,
Jun Wan,
Yunxiao Qin,
Xiaobai Li,
Stan Z. Li,
Guoying Zhao
Abstract:
Face anti-spoofing (FAS) plays a vital role in securing face recognition systems. Existing methods rely heavily on expert-designed networks, which may lead to a sub-optimal solution for the FAS task. Here we propose the first FAS method based on neural architecture search (NAS), called NAS-FAS, to discover well-suited task-aware networks. Unlike previous NAS works, which mainly focus on developing efficient search strategies in generic object classification, we pay more attention to studying the search spaces for the FAS task. The challenges of utilizing NAS for FAS are twofold: the networks searched on 1) a specific acquisition condition might perform poorly in unseen conditions, and 2) particular spoofing attacks might generalize badly for unseen attacks. To overcome these two issues, we develop a novel search space consisting of central difference convolution and pooling operators. Moreover, an efficient static-dynamic representation is exploited for fully mining the FAS-aware spatio-temporal discrepancy. Besides, we propose Domain/Type-aware Meta-NAS, which leverages cross-domain/type knowledge for robust searching. Finally, in order to evaluate the NAS transferability for cross datasets and unknown attack types, we release a large-scale 3D mask dataset, namely CASIA-SURF 3DMask, for supporting the new 'cross-dataset cross-type' testing protocol. Experiments demonstrate that the proposed NAS-FAS achieves state-of-the-art performance on nine FAS benchmark datasets with four testing protocols.
Submitted 3 November, 2020;
originally announced November 2020.
-
On Updating and Querying Submatrices
Authors:
Jason Yang,
Jun Wan
Abstract:
In this paper, we study the $d$-dimensional update-query problem. We provide lower bounds on update and query running times, assuming a long-standing conjecture on min-plus matrix multiplication, as well as algorithms that are close to the lower bounds. Given a $d$-dimensional matrix, an \textit{update} changes each element in a given submatrix from $x$ to $x\bigtriangledown v$, where $v$ is a given constant. A \textit{query} returns the $\bigtriangleup$ of all elements in a given submatrix. We study the cases where $\bigtriangledown$ and $\bigtriangleup$ are both commutative and associative binary operators. When $d = 1$, updates and queries can be performed in $O(\log N)$ worst-case time for many $(\bigtriangledown,\bigtriangleup)$ by using a segment tree with lazy propagation. However, when $d\ge 2$, similar techniques usually cannot be generalized. We show that if min-plus matrix multiplication cannot be computed in $O(N^{3-\varepsilon})$ time for any $\varepsilon>0$ (which is widely believed to be the case), then for $(\bigtriangledown,\bigtriangleup)=(+,\min)$, either updates or queries cannot both run in $O(N^{1-\varepsilon})$ time for any constant $\varepsilon>0$, or preprocessing cannot run in polynomial time. Finally, we show a special case where lazy propagation can be generalized for $d\ge 2$ and where updates and queries can run in $O(\log^d N)$ worst-case time. We present an algorithm that meets this running time and is simpler than similar algorithms of previous works.
Submitted 25 October, 2020;
originally announced October 2020.
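Illustration for the $d = 1$ case mentioned in the abstract above: a minimal Python sketch of a segment tree with lazy propagation for $(\bigtriangledown,\bigtriangleup)=(+,\min)$, where an update adds $v$ to every element of a range and a query returns the range minimum, each in $O(\log N)$ time. This is the standard textbook construction, not the authors' code; the class name `LazyMinSegmentTree` and its method names are assumptions made for this example.

```python
class LazyMinSegmentTree:
    """Range-add update and range-min query on a 1-D array (hypothetical helper)."""

    def __init__(self, values):
        self.n = len(values)
        self.min = [0] * (4 * self.n)   # minimum of each node's interval
        self.lazy = [0] * (4 * self.n)  # pending addition not yet pushed to children
        if self.n:
            self._build(1, 0, self.n - 1, values)

    def _build(self, node, lo, hi, values):
        if lo == hi:
            self.min[node] = values[lo]
            return
        mid = (lo + hi) // 2
        self._build(2 * node, lo, mid, values)
        self._build(2 * node + 1, mid + 1, hi, values)
        self.min[node] = min(self.min[2 * node], self.min[2 * node + 1])

    def _push(self, node):
        # Hand the pending addition down to both children, then clear it.
        if self.lazy[node]:
            for child in (2 * node, 2 * node + 1):
                self.min[child] += self.lazy[node]
                self.lazy[child] += self.lazy[node]
            self.lazy[node] = 0

    def update(self, left, right, v, node=1, lo=0, hi=None):
        """Add v to every element with index in [left, right] (inclusive)."""
        if hi is None:
            hi = self.n - 1
        if right < lo or hi < left:
            return
        if left <= lo and hi <= right:
            # Fully covered: record the addition here and stop descending.
            self.min[node] += v
            self.lazy[node] += v
            return
        self._push(node)
        mid = (lo + hi) // 2
        self.update(left, right, v, 2 * node, lo, mid)
        self.update(left, right, v, 2 * node + 1, mid + 1, hi)
        self.min[node] = min(self.min[2 * node], self.min[2 * node + 1])

    def query(self, left, right, node=1, lo=0, hi=None):
        """Return the minimum over indices in [left, right] (inclusive)."""
        if hi is None:
            hi = self.n - 1
        if right < lo or hi < left:
            return float("inf")
        if left <= lo and hi <= right:
            return self.min[node]
        self._push(node)
        mid = (lo + hi) // 2
        return min(self.query(left, right, 2 * node, lo, mid),
                   self.query(left, right, 2 * node + 1, mid + 1, hi))
```

Laziness works here because adding $v$ to a whole interval shifts that interval's minimum by exactly $v$, so a fully covered node can be updated without visiting its descendants. The abstract notes that this kind of technique usually does not generalize to $d\ge 2$.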
-
Robust Face Alignment by Multi-order High-precision Hourglass Network
Authors:
Jun Wan,
Zhihui Lai,
Jun Liu,
Jie Zhou,
Can Gao
Abstract:
Heatmap regression (HR) has become one of the mainstream approaches for face alignment and has obtained promising results under constrained environments. However, when a face image suffers from large pose variations, heavy occlusions and complicated illuminations, the performances of HR methods degrade greatly due to the low resolutions of the generated landmark heatmaps and the exclusion of important high-order information that can be used to learn more discriminative features. To address the alignment problem for faces with extremely large poses and heavy occlusions, this paper proposes a heatmap subpixel regression (HSR) method and a multi-order cross geometry-aware (MCG) model, which are seamlessly integrated into a novel multi-order high-precision hourglass network (MHHN). The HSR method is proposed to achieve high-precision landmark detection by a well-designed subpixel detection loss (SDL) and subpixel detection technology (SDT). At the same time, the MCG model is able to use the proposed multi-order cross information to learn more discriminative representations for enhancing facial geometric constraints and context information. To the best of our knowledge, this is the first study to explore heatmap subpixel regression for robust and high-precision face alignment. The experimental results from challenging benchmark datasets demonstrate that our approach outperforms state-of-the-art methods in the literature.
Submitted 17 October, 2020;
originally announced October 2020.
-
Smooth braneworld in $6$-dimensional asymptotically AdS spacetime
Authors:
Jun-Jie Wan,
Zheng-Quan Cui,
Wen-Bin Feng,
Yu-Xiao Liu
Abstract:
In this paper, we investigate a $6$-dimensional smooth thick braneworld model which contains a compact extra dimension and an infinitely large one. The braneworld is generated by a real scalar field with a $φ^6$ potential and the bulk is an asymptotically $\text{AdS}_6$ spacetime. The geometry achieves the localization of the free $U(1)$ gauge field, which is a problem in the $5$-dimensional Randall-Sundrum-like models. In addition, we analyze the stability of the braneworld system and the localization of gravitons.
Submitted 1 July, 2021; v1 submitted 10 October, 2020;
originally announced October 2020.
-
Tensor Perturbations and Thick Branes in Higher-dimensional $f(R)$ Gravity
Authors:
Zheng-Quan Cui,
Zi-Chao Lin,
Jun-Jie Wan,
Yu-Xiao Liu,
Li Zhao
Abstract:
We study brane worlds in an anisotropic higher-dimensional spacetime within the context of $f(R)$ gravity. First, we demonstrate that this spacetime with a concrete metric ansatz is stable against linear tensor perturbations under certain conditions. Moreover, the Kaluza-Klein modes of the graviton are analyzed. Second, we investigate thick brane solutions in six dimensions and their properties. We further exhibit two sets of solutions for thick branes. Finally, the effective potential of the Kaluza-Klein modes of the graviton is discussed for the two solved $f(R)$ models in higher dimensions.
Submitted 21 December, 2020; v1 submitted 1 September, 2020;
originally announced September 2020.
-
Think about boundary: Fusing multi-level boundary information for landmark heatmap regression
Authors:
Jinheng Xie,
Jun Wan,
Linlin Shen,
Zhihui Lai
Abstract:
Although current face alignment algorithms perform well at predicting the location of facial landmarks, huge challenges remain for faces with severe occlusion, large pose variations, etc. In contrast, the semantic location of the facial boundary is more likely to be preserved and estimated in these scenes. Therefore, we study a two-stage but end-to-end approach for exploring the relationship between the facial boundary and landmarks to get boundary-aware landmark predictions, which consists of two modules: the self-calibrated boundary estimation (SCBE) module and the boundary-aware landmark transform (BALT) module. In the SCBE module, we modify the stem layers and employ intermediate supervision to help generate high-quality facial boundary heatmaps. Boundary-aware features inherited from the SCBE module are integrated into the BALT module in a multi-scale fusion framework to better model the transformation from boundary to landmark heatmap. Experimental results on challenging benchmark datasets demonstrate that our approach outperforms state-of-the-art methods in the literature.
Submitted 25 August, 2020;
originally announced August 2020.
-
Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition
Authors:
Zitong Yu,
Benjia Zhou,
Jun Wan,
Pichao Wang,
Haoyu Chen,
Xin Liu,
Stan Z. Li,
Guoying Zhao
Abstract:
Gesture recognition has attracted considerable attention owing to its great potential in applications. Although great progress has been made recently in multi-modal learning methods, existing methods still lack effective integration to fully explore synergies among spatio-temporal modalities for gesture recognition. The problems are partially due to the fact that the existing manually designed network architectures have low efficiency in the joint learning of multi-modalities. In this paper, we propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition. The proposed method includes two key components: 1) enhanced temporal representation via the proposed 3D Central Difference Convolution (3D-CDC) family, which is able to capture rich temporal context via aggregating temporal difference information; and 2) optimized backbones for multi-sampling-rate branches and lateral connections among varied modalities. The resultant multi-modal multi-rate network provides a new perspective to understand the relationship between RGB and depth modalities and their temporal dynamics. Comprehensive experiments are performed on three benchmark datasets (IsoGD, NvGesture, and EgoGesture), demonstrating state-of-the-art performance in both single- and multi-modality settings. The code is available at https://github.com/ZitongYu/3DCDC-NAS
Submitted 21 August, 2020;
originally announced August 2020.
-
Group Testing Enables Asymptomatic Screening for COVID-19 Mitigation: Feasibility and Optimal Pool Size Selection with Dilution Effects
Authors:
Yifan Lin,
Yuxuan Ren,
Jingyuan Wan,
Massey Cashore,
Jiayue Wan,
Yujia Zhang,
Peter Frazier,
Enlu Zhou
Abstract:
Repeated asymptomatic screening for SARS-CoV-2 promises to control spread of the virus but would require too many resources to implement at scale. Group testing is promising for screening more people with fewer test resources: multiple samples tested together in one pool can be excluded with one negative test result. Existing approaches to group testing design for SARS-CoV-2 asymptomatic screening, however, do not consider dilution effects: that false negatives become more common with larger pools. As a consequence, they may recommend pool sizes that are too large or misestimate the benefits of screening. Modeling dilution effects, we derive closed-form expressions for the expected number of tests and false negatives/positives per person screened under two popular group testing methods: the linear and square array methods. We find that test error correlation induced by a common viral load across an individual's samples results in many fewer false negatives than would be expected from less realistic but more widely assumed independent errors. This insight also suggests that false positives can be controlled through repeated tests without significantly increasing false negatives. Using these closed-form expressions to trace a Pareto frontier over error rates and tests, we design testing protocols for repeated asymptomatic screening of a large population. We minimize disease prevalence by optimizing time-varying pool sizes and screening frequency, constrained by daily test capacity and a false positive limit. This provides a testing protocol practitioners can use for mitigating COVID-19. In a case study, we demonstrate the effectiveness of this methodology in controlling spread.
Submitted 16 November, 2020; v1 submitted 14 August, 2020;
originally announced August 2020.
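Illustration of the pool-size trade-off behind the abstract above: a minimal Python sketch of the classical two-stage (Dorfman) pooling calculation, in which a pool is tested once and, if positive, every member is retested individually. It assumes a perfect test with no dilution and no false results, which is exactly the idealization the paper moves beyond, so it does not reproduce the paper's closed-form expressions. The function names `dorfman_tests_per_person` and `best_pool_size` are hypothetical.

```python
def dorfman_tests_per_person(prevalence, pool_size):
    """Expected tests per person under idealized two-stage pooling.

    Assumes independent infections with the given prevalence and an
    error-free test (no dilution effects).
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    if pool_size == 1:
        return 1.0  # individual testing: one test per person
    # One pooled test shared by the pool, plus one retest per person
    # whenever the pool contains at least one positive sample.
    prob_pool_positive = 1.0 - (1.0 - prevalence) ** pool_size
    return 1.0 / pool_size + prob_pool_positive


def best_pool_size(prevalence, max_pool_size=50):
    """Pool size in 1..max_pool_size minimizing expected tests per person."""
    return min(range(1, max_pool_size + 1),
               key=lambda n: dorfman_tests_per_person(prevalence, n))
```

Under these idealized assumptions the expected cost keeps favoring larger pools as prevalence falls. Once dilution makes false negatives more common in larger pools, as the abstract describes, that recommendation can overshoot.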
-
Overhead Control with Reliable Transmission of Popular Packets in Ad-Hoc Social Networks
Authors:
Feng Xia,
Hannan Bin Liaqat,
Jing Deng,
Jiafu Wan,
Sajal K. Das
Abstract:
Reliable social connectivity and transmission of data for popular nodes are vital in multihop Ad-hoc Social Networks (ASNETs). In this networking paradigm, transmission unreliability could be caused by multiple social applications running on a single node. This leads to contention among nodes and connection paths. In addition, congestion can result from multiple senders transmitting data to a single receiver, with every sender waiting for a positive acknowledgment before moving on. Therefore, the traditional Transmission Control Protocol (TCP) performs poorly in ASNETs, because the available bandwidth is shared among nodes using round trip time and an acknowledgment is provided individually for every data packet. To solve these issues, we propose a technique, called Overhead Control with Reliable Transmission of Popular Packets in Ad-Hoc Social Networks (RTPS), which improves transmission reliability by assigning bandwidth to users based on their popularity levels: extra bandwidth is assigned to the nodes with higher popularity and their acknowledgments are sent with higher priority. In addition, RTPS further reduces contention and packet losses by delaying acknowledgment packet transmissions. Our detailed investigations demonstrate the excellent performance of RTPS in terms of throughput, latency, and overhead with different hop-distances and different numbers of concurrent TCP flows.
Submitted 8 August, 2020;
originally announced August 2020.
-
Fine-Grained Crowd Counting
Authors:
Jia Wan,
Nikil Senthil Kumar,
Antoni B. Chan
Abstract:
Current crowd counting algorithms are only concerned with the number of people in an image and therefore lack low-level fine-grained information about the crowd. For many practical applications, the total number of people in an image is not as useful as the number of people in each sub-category. For example, knowing the number of people waiting in line or browsing can help retail stores; knowing the number of people standing/sitting can help restaurants/cafeterias; knowing the number of violent/non-violent people can help police in crowd management. In this paper, we propose fine-grained crowd counting, which differentiates a crowd into categories based on the low-level behavior attributes of the individuals (e.g. standing/sitting or violent behavior) and then counts the number of people in each category. To enable research in this area, we construct a new dataset of four real-world fine-grained counting tasks: traveling direction on a sidewalk, standing or sitting, waiting in line or not, and exhibiting violent behavior or not. Since the appearance features of different crowd categories are similar, the challenge of fine-grained crowd counting is to effectively utilize contextual information to distinguish between categories. We propose a two-branch architecture, consisting of a density map estimation branch and a semantic segmentation branch. We propose two refinement strategies for improving the predictions of the two branches. First, to encode contextual information, we propose feature propagation guided by the density map prediction, which eliminates the effect of background features during propagation. Second, we propose a complementary attention model to share information between the two branches. Experimental results confirm the effectiveness of our method.
Submitted 12 July, 2020;
originally announced July 2020.
-
Ultra-fast quantum-well infared photodetectors operating at 10μm with flat response up to 70GHz at room temperature
Authors:
M. Hakl,
Q. Y. Lin,
S. Lepillet,
M. Billet,
J-F. Lampin,
S. Pirotta,
R. Colombelli,
W. J. Wan,
J. C. Cao,
H. Li,
E. Peytavit,
S. Barbieri
Abstract:
III-V semiconductor mid-infrared photodetectors based on intersubband transitions hold great potential for ultra-high-speed operation up to several hundreds of GHz. In this work we exploit a ~350nm-thick GaAs/Al0.2Ga0.8As multi-quantum-well heterostructure to demonstrate heterodyne detection at 10μm wavelength with a nearly flat frequency response up to 70GHz at room temperature, solely limited by the measurement system bandwidth. This is the broadest RF-bandwidth reported to date for a quantum-well mid-infrared photodetector. Responsivities of 0.15A/W and 1.5A/W are obtained at 300K and 77K respectively. To allow ultrafast operation and illumination at normal incidence, the detector consists of a 50Ohm coplanar waveguide, monolithically integrated with a 2D-array of sub-wavelength antennas, electrically interconnected by suspended wires. With this device architecture we obtain a parasitic capacitance of ~30fF, corresponding to the static capacitance of the antennas, yielding an RC-limited 3dB cutoff frequency >150GHz at 300K, extracted with a small-signal equivalent circuit model. Using this model, we quantitatively reproduce the detector frequency response and find intrinsic roll-off time constants as low as 1ps at room temperature.
Submitted 5 January, 2021; v1 submitted 1 July, 2020;
originally announced July 2020.
-
Single-Shot 3D Detection of Vehicles from Monocular RGB Images via Geometry Constrained Keypoints in Real-Time
Authors:
Nils Gählert,
Jun-Jun Wan,
Nicolas Jourdan,
Jan Finkbeiner,
Uwe Franke,
Joachim Denzler
Abstract:
In this paper we propose a novel 3D single-shot object detection method for detecting vehicles in monocular RGB images. Our approach lifts 2D detections to 3D space by predicting additional regression and classification parameters and hence keeping the runtime close to pure 2D object detection. The additional parameters are transformed to 3D bounding box keypoints within the network under geometric constraints. Our proposed method features a full 3D description including all three angles of rotation without supervision by any labeled ground truth data for the object's orientation, as it focuses on certain keypoints within the image plane. While our approach can be combined with any modern object detection framework with only little computational overhead, we exemplify the extension of SSD for the prediction of 3D bounding boxes. We test our approach on different datasets for autonomous driving and evaluate it using the challenging KITTI 3D Object Detection as well as the novel nuScenes Object Detection benchmarks. While we achieve competitive results on both benchmarks we outperform current state-of-the-art methods in terms of speed with more than 20 FPS for all tested datasets and image resolutions.
Submitted 23 June, 2020;
originally announced June 2020.
-
Nearly nondestructive thermometry of labeled cold atoms and application to isotropic laser cooling
Authors:
Xin Wang,
Yuan Sun,
Hua-Dong Cheng,
Jin-Yin Wan,
Yan-Ling Meng,
Ling Xiao,
Liang Liu
Abstract:
We have designed and implemented a straightforward method to deterministically measure the temperature of the selected segment of a cold atom ensemble, and we have also developed an upgrade in the form of nondestructive thermometry. The essence is to monitor the thermal expansion of the targeted cold atoms after labeling them through manipulating the internal states, and the nondestructive property relies upon the nearly lossless detection via driving a cycling transition. For cold atoms subject to isotropic laser cooling, this method has the unique capability of addressing only the atoms on the optical detection axis within the enclosure, which is exactly the part we care about in major applications such as atomic clock or quantum sensing. Furthermore, our results confirm the sub-Doppler cooling features in isotropic laser cooling, and we have investigated the relevant cooling properties. Meanwhile, we have applied the recently developed optical configuration with the cooling laser injection in the form of hollow beams, which helps to enhance the cooling performance and accumulate more cold atoms in the central regions.
Submitted 14 June, 2020;
originally announced June 2020.
-
Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation
Authors:
Jianqiang Wan,
Yang Liu,
Donglai Wei,
Xiang Bai,
Yongchao Xu
Abstract:
Image segmentation is a fundamental vision task and a crucial step for many applications. In this paper, we propose a fast image segmentation method based on a novel super boundary-to-pixel direction (super-BPD) and a customized segmentation algorithm with super-BPD. Precisely, we define BPD on each pixel as a two-dimensional unit vector pointing from its nearest boundary to the pixel. In the BPD, nearby pixels from different regions have opposite directions departing from each other, and adjacent pixels in the same region have directions pointing to the other or each other (i.e., around medial points). We make use of this property to partition an image into super-BPDs, which are novel informative superpixels with robust direction similarity for fast grouping into segmentation regions. Extensive experimental results on BSDS500 and Pascal Context demonstrate the accuracy and efficiency of the proposed super-BPD in segmenting images. In practice, the proposed super-BPD achieves comparable or superior performance with MCG while running at ~25fps vs. 0.07fps. Super-BPD also exhibits a noteworthy transferability to unseen scenes. The code is publicly available at https://github.com/JianqiangWan/Super-BPD.
Submitted 30 May, 2020;
originally announced June 2020.
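Illustration of the BPD definition given in the abstract above: a minimal pure-Python sketch that, for each non-boundary pixel, finds the nearest boundary pixel by brute force and returns the unit vector pointing from that boundary pixel to the pixel. It only illustrates the geometric definition and is not taken from the linked repository; the function name `boundary_to_pixel_directions`, the boolean-mask input, and the `(row, col)` vector convention are assumptions made for this example.

```python
import math


def boundary_to_pixel_directions(boundary_mask):
    """Return a grid of (d_row, d_col) unit vectors, one per pixel.

    boundary_mask is a list of rows of booleans; True marks a boundary pixel.
    Boundary pixels themselves get the zero vector (an assumption of this sketch).
    """
    height = len(boundary_mask)
    width = len(boundary_mask[0])
    boundary = [(r, c) for r in range(height) for c in range(width)
                if boundary_mask[r][c]]
    if not boundary:
        raise ValueError("boundary_mask contains no boundary pixels")

    directions = [[(0.0, 0.0)] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if boundary_mask[r][c]:
                continue
            # Brute-force nearest boundary pixel (squared Euclidean distance).
            br, bc = min(boundary,
                         key=lambda b: (r - b[0]) ** 2 + (c - b[1]) ** 2)
            d_row, d_col = r - br, c - bc
            norm = math.hypot(d_row, d_col)
            # Unit vector from the nearest boundary pixel to this pixel.
            directions[r][c] = (d_row / norm, d_col / norm)
    return directions
```

With a vertical boundary down the middle of a small grid, pixels on the two sides receive opposite directions, which is the property the abstract relies on to separate neighboring regions.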
-
KaLM at SemEval-2020 Task 4: Knowledge-aware Language Models for Comprehension And Generation
Authors:
Jiajing Wan,
Xinting Huang
Abstract:
This paper presents our strategies in SemEval 2020 Task 4: Commonsense Validation and Explanation. We propose a novel way to search for evidence and choose different large-scale pre-trained models as the backbone for the three subtasks. The results show that our evidence-searching approach improves model performance on the commonsense explanation task. Our team ranks 2nd in subtask C according to the human evaluation score.
Submitted 24 July, 2020; v1 submitted 24 May, 2020;
originally announced May 2020.