-
FastCLIPstyler: Optimisation-free Text-based Image Style Transfer Using Style Representations
Authors:
Ananda Padhmanabhan Suresh,
Sanjana Jain,
Pavit Noinongyao,
Ankush Ganguly,
Ukrit Watchareeruetai,
Aubin Samacoits
Abstract:
In recent years, language-driven artistic style transfer has emerged as a new type of style transfer technique, eliminating the need for a reference style image by using natural language descriptions of the style. The first model to achieve this, called CLIPstyler, has demonstrated impressive stylisation results. However, its lengthy optimisation procedure at runtime for each query limits its suit…
▽ More
In recent years, language-driven artistic style transfer has emerged as a new type of style transfer technique, eliminating the need for a reference style image by using natural language descriptions of the style. The first model to achieve this, called CLIPstyler, has demonstrated impressive stylisation results. However, its lengthy optimisation procedure at runtime for each query limits its suitability for many practical applications. In this work, we present FastCLIPstyler, a generalised text-based image style transfer model capable of stylising images in a single forward pass for arbitrary text inputs. Furthermore, we introduce EdgeCLIPstyler, a lightweight model designed for compatibility with resource-constrained devices. Through quantitative and qualitative comparisons with state-of-the-art approaches, we demonstrate that our models achieve superior stylisation quality based on measurable metrics while offering significantly improved runtime efficiency, particularly on edge devices.
△ Less
Submitted 14 November, 2023; v1 submitted 7 October, 2022;
originally announced October 2022.
-
Amortized Variational Inference: A Systematic Review
Authors:
Ankush Ganguly,
Sanjana Jain,
Ukrit Watchareeruetai
Abstract:
The core principle of Variational Inference (VI) is to convert the statistical inference problem of computing complex posterior probability densities into a tractable optimization problem. This property enables VI to be faster than several sampling-based techniques. However, the traditional VI algorithm is not scalable to large data sets and is unable to readily infer out-of-bounds data points wit…
▽ More
The core principle of Variational Inference (VI) is to convert the statistical inference problem of computing complex posterior probability densities into a tractable optimization problem. This property enables VI to be faster than several sampling-based techniques. However, the traditional VI algorithm is not scalable to large data sets and is unable to readily infer out-of-bounds data points without re-running the optimization process. Recent developments in the field, like stochastic-, black box-, and amortized-VI, have helped address these issues. Generative modeling tasks nowadays widely make use of amortized VI for its efficiency and scalability, as it utilizes a parameterized function to learn the approximate posterior density parameters. In this paper, we review the mathematical foundations of various VI techniques to form the basis for understanding amortized VI. Additionally, we provide an overview of the recent trends that address several issues of amortized VI, such as the amortization gap, generalization issues, inconsistent representation learning, and posterior collapse. Finally, we analyze alternate divergence measures that improve VI optimization.
△ Less
Submitted 24 October, 2023; v1 submitted 22 September, 2022;
originally announced September 2022.
-
Development of a face mask detection pipeline for mask-wearing monitoring in the era of the COVID-19 pandemic: A modular approach
Authors:
Benjaphan Sommana,
Ukrit Watchareeruetai,
Ankush Ganguly,
Samuel W. F. Earp,
Taya Kitiyakara,
Suparee Boonmanunt,
Ratchainant Thammasudjarit
Abstract:
During the SARS-Cov-2 pandemic, mask-wearing became an effective tool to prevent spreading and contracting the virus. The ability to monitor the mask-wearing rate in the population would be useful for determining public health strategies against the virus. However, artificial intelligence technologies for detecting face masks have not been deployed at a large scale in real-life to measure the mask…
▽ More
During the SARS-Cov-2 pandemic, mask-wearing became an effective tool to prevent spreading and contracting the virus. The ability to monitor the mask-wearing rate in the population would be useful for determining public health strategies against the virus. However, artificial intelligence technologies for detecting face masks have not been deployed at a large scale in real-life to measure the mask-wearing rate in public. In this paper, we present a two-step face mask detection approach consisting of two separate modules: 1) face detection and alignment and 2) face mask classification. This approach allowed us to experiment with different combinations of face detection and face mask classification modules. More specifically, we experimented with PyramidKey and RetinaFace as face detectors while maintaining a lightweight backbone for the face mask classification module. Moreover, we also provide a relabeled annotation of the test set of the AIZOO dataset, where we rectified the incorrect labels for some face images. The evaluation results on the AIZOO and Moxa 3K datasets showed that the proposed face mask detection pipeline surpassed the state-of-the-art methods. The proposed pipeline also yielded a higher mAP on the relabeled test set of the AIZOO dataset than the original test set. Since we trained the proposed model using in-the-wild face images, we can successfully deploy our model to monitor the mask-wearing rate using public CCTV images.
△ Less
Submitted 1 August, 2022; v1 submitted 30 December, 2021;
originally announced December 2021.
-
LOTR: Face Landmark Localization Using Localization Transformer
Authors:
Ukrit Watchareeruetai,
Benjaphan Sommana,
Sanjana Jain,
Pavit Noinongyao,
Ankush Ganguly,
Aubin Samacoits,
Samuel W. F. Earp,
Nakarin Sritrakool
Abstract:
This paper presents a novel Transformer-based facial landmark localization network named Localization Transformer (LOTR). The proposed framework is a direct coordinate regression approach leveraging a Transformer network to better utilize the spatial information in the feature map. An LOTR model consists of three main modules: 1) a visual backbone that converts an input image into a feature map, 2…
▽ More
This paper presents a novel Transformer-based facial landmark localization network named Localization Transformer (LOTR). The proposed framework is a direct coordinate regression approach leveraging a Transformer network to better utilize the spatial information in the feature map. An LOTR model consists of three main modules: 1) a visual backbone that converts an input image into a feature map, 2) a Transformer module that improves the feature representation from the visual backbone, and 3) a landmark prediction head that directly predicts the landmark coordinates from the Transformer's representation. Given cropped-and-aligned face images, the proposed LOTR can be trained end-to-end without requiring any post-processing steps. This paper also introduces the smooth-Wing loss function, which addresses the gradient discontinuity of the Wing loss, leading to better convergence than standard loss functions such as L1, L2, and Wing loss. Experimental results on the JD landmark dataset provided by the First Grand Challenge of 106-Point Facial Landmark Localization indicate the superiority of LOTR over the existing methods on the leaderboard and two recent heatmap-based approaches. On the WFLW dataset, the proposed LOTR framework demonstrates promising results compared with several state-of-the-art methods. Additionally, we report the improvement in state-of-the-art face recognition performance when using our proposed LOTRs for face alignment.
△ Less
Submitted 22 February, 2022; v1 submitted 21 September, 2021;
originally announced September 2021.