Connective Viewpoints of Signal-to-Noise Diffusion Models

Authors: Khanh Doan, Long Tung Vuong, Tuan Nguyen, Anh Tuan Bui, Quyen Tran, Thanh-Toan Do, Dinh Phung, Trung Le

Abstract: Diffusion models (DM) have become fundamental components of generative models, excelling across various domains such as image creation, audio generation, and complex data interpolation. Signal-to-Noise diffusion models constitute a diverse family covering most state-of-the-art diffusion models. While there have been several attempts to study Signal-to-Noise (S2N) diffusion models from various perspectives, there remains a need for a comprehensive study connecting different viewpoints and exploring new perspectives. In this study, we offer a comprehensive perspective on noise schedulers, examining their role through the lens of the signal-to-noise ratio (SNR) and its connections to information theory. Building upon this framework, we have developed a generalized backward equation to enhance the performance of the inference process.

Submitted 8 August, 2024; originally announced August 2024.

arXiv:2210.07646 [pdf, other]

Vision Transformer Visualization: What Neurons Tell and How Neurons Behave?

Authors: Van-Anh Nguyen, Khanh Pham Dinh, Long Tung Vuong, Thanh-Toan Do, Quan Hung Tran, Dinh Phung, Trung Le

Abstract: Recently, vision transformers (ViT) have been applied successfully to various tasks in computer vision. However, important questions such as why they work or how they behave remain largely unanswered. In this paper, we propose an effective visualization technique to assist us in exposing the information carried in neurons and feature embeddings across the ViT's layers. Our approach departs from the computational process of ViTs with a focus on visualizing the local and global information in input images and the latent feature embeddings at multiple levels. Visualizations of the input and of the embeddings at level 0 reveal interesting findings, such as support for why ViTs are generally robust to image occlusions and patch shuffling, and that, unlike in CNNs, level 0 embeddings already carry rich semantic details. Next, we develop a rigorous framework to perform effective visualizations across layers, exposing the effects of ViT filters and their grouping/clustering behaviors on object patches. Finally, we provide comprehensive experiments on real datasets to qualitatively and quantitatively demonstrate the merit of our proposed methods as well as our findings. https://github.com/byM1902/ViT_visualization

Submitted 17 October, 2022; v1 submitted 14 October, 2022; originally announced October 2022.

Comments: The first two authors contributed equally to this work. Our code is available at https://github.com/byM1902/ViT_visualization

arXiv:2209.09002 [pdf, other]

MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation

Authors: Chuanxia Zheng, Long Tung Vuong, Jianfei Cai, Dinh Phung

Abstract: Although two-stage Vector Quantized (VQ) generative models allow for synthesizing high-fidelity and high-resolution images, their quantization operator encodes similar patches within an image into the same index, resulting in repeated artifacts in similar adjacent regions when existing decoder architectures are used. To address this issue, we propose to incorporate spatially conditional normalization to modulate the quantized vectors so as to insert spatially variant information into the embedded index maps, encouraging the decoder to generate more photorealistic images. Moreover, we use multichannel quantization to increase the recombination capability of the discrete codes without increasing the cost of the model and codebook. Additionally, to generate discrete tokens at the second stage, we adopt a Masked Generative Image Transformer (MaskGIT) to learn an underlying prior distribution in the compressed latent space, which is much faster than the conventional autoregressive model. Experiments on two benchmark datasets demonstrate that our proposed modulated VQGAN is able to greatly improve the reconstructed image quality as well as provide high-fidelity image generation.

Submitted 19 September, 2022; originally announced September 2022.

arXiv:2005.07682 [pdf, other]

doi 10.1364/OPTICA.397707

Small-brain neural networks rapidly solve inverse problems with vortex Fourier encoders

Authors: Baurzhan Muminov, Luat T. Vuong

Abstract: We introduce a vortex phase transform with a lenslet-array to accompany shallow, dense, "small-brain" neural networks for high-speed and low-light imaging. Our single-shot ptychographic approach exploits the coherent diffraction, compact representation, and edge enhancement of Fourier-transformed spiral-phase gradients. With vortex spatial encoding, a small brain is trained to deconvolve images at rates 5-20 times faster than those achieved with random encoding schemes, where greater advantages are gained in the presence of noise. Once trained, the small brain reconstructs an object from intensity-only data, solving an inverse mapping without performing iterations on each image and without deep-learning schemes. With this hybrid, optical-digital, vortex Fourier encoded, small-brain scheme, we reconstruct MNIST Fashion objects illuminated with low-light flux (5 nJ/cm$^2$) at a rate of several thousand frames per second on a 15 W central processing unit, two orders of magnitude faster than convolutional neural networks.

Submitted 15 May, 2020; originally announced May 2020.

arXiv:1708.03850 [pdf, other]

Structure in scientific networks: towards predictions of research dynamism

Authors: Benjamin W. Stewart, Andy Rivas, Luat T. Vuong

Abstract: Certain areas of scientific research flourish while others lose advocates and attention. We are interested in whether structural patterns within citation networks correspond to the growth or decline of the research areas to which those networks belong. We focus on three topic areas within optical physics as a set of cases; those areas have developed along different trajectories: one continues to expand rapidly; another is on the wane after an earlier peak; the final area has re-emerged after a short waning period. These three areas have substantial overlaps in the types of equipment they use and general methodology; at the same time, their citation networks are largely independent of each other. For each of our three areas, we map the citation networks of the top-100 most-cited papers, published pre-1999. In order to quantify the structures of the selected articles' citation networks, we use a modified version of weak tie theory in tandem with entropy measures. Although the fortunes of a given research area are most obviously the result of accumulated innovations and impasses, our preliminary study provides evidence that these citation networks' emergent structures reflect those developments and may shape evolving conversations in the scholarly literature.

Submitted 13 August, 2017; originally announced August 2017.

Showing 1–5 of 5 results for author: Vuong, L T