Estimating Visual Information From Audio Through Manifold Learning
Authors:
Fabrizio Pedersoli,
Dryden Wiebe,
Amin Banitalebi,
Yong Zhang,
George Tzanetakis,
Kwang Moo Yi
Abstract:
We propose a new framework for extracting visual information about a scene using only audio signals. Audio-based methods can overcome some of the limitations of vision-based methods: they do not require "line-of-sight", are robust to occlusions and changes in illumination, and can function as a backup in case vision/lidar sensors fail. Therefore, audio-based methods can be useful even for applications in which only visual information is of interest. Our framework is based on Manifold Learning and consists of two steps. First, we train a Vector-Quantized Variational Auto-Encoder to learn the data manifold of the particular visual modality we are interested in. Second, we train an Audio Transformation network to map multi-channel audio signals to the latent representation of the corresponding visual sample. We show that our method is able to produce meaningful images from audio using a publicly available audio/visual dataset. In particular, we consider the prediction of two visual modalities from audio: depth and semantic segmentation. We hope the findings of our work can facilitate further research in visual information extraction from audio. Code is available at: https://github.com/ubc-vision/audio_manifold.
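The vector-quantization step at the heart of a Vector-Quantized Variational Auto-Encoder can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); the function name `quantize`, the codebook size, and the toy values are assumptions chosen for illustration. The idea shown is that a continuous latent vector, such as one predicted by an audio network, is snapped to its nearest entry in a learned codebook.

```python
import numpy as np

def quantize(z, codebook):
    """Map each latent vector in z (N, D) to its nearest codebook entry (K, D)."""
    # Squared Euclidean distance between every latent vector and every code.
    dist = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dist.argmin(axis=1)      # index of the nearest code per vector
    return idx, codebook[idx]      # discrete indices and quantized vectors

# Hypothetical codebook: 8 codes of dimension 4, code k = [k, k, k, k].
codebook = np.arange(8, dtype=float)[:, None] * np.ones((1, 4))

# Stand-in for latents predicted from audio: codes 2, 5, 5 with a small offset.
predicted = codebook[[2, 5, 5]] + 0.1

idx, zq = quantize(predicted, codebook)
```

In a full pipeline the quantized vectors would be passed to the decoder of the trained auto-encoder to produce the depth or segmentation image; that decoder is omitted here.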
Submitted 13 September, 2022; v1 submitted 3 August, 2022;
originally announced August 2022.
Robust LSB Watermarking Optimized for Local Structural Similarity
Authors:
Amin Banitalebi,
Said Nader-Esfahani,
Alireza Nasiri Avanaki
Abstract:
Growth of the Internet and networked multimedia systems has emphasized the need for copyright protection of media. Media can be images, audio clips, videos, etc. Digital watermarking is today used extensively for many applications, such as authentication of ownership or identification of illegal copies. A digital watermark is an invisible or possibly visible structure added to the original media (known as the asset). An image is treated as a communication channel when it is subjected to a watermark embedding procedure, so when embedding a digital watermark in an image, the capacity of the channel should be considered. There is a trade-off between imperceptibility, robustness, and capacity when embedding a watermark in an asset. In the case of image watermarks, it is reasonable that the watermarking algorithm should depend on the content and structure of the image. Conventionally, mean squared error (MSE) has been used as a common distortion measure to assess the quality of images. Newly developed quality metrics propose distortion measures that are based on the human visual system (HVS). These metrics show that MSE is not based on the HVS and lacks accuracy when dealing with perceptually important signals such as images and videos. SSIM, or structural similarity, is a state-of-the-art HVS-based image quality criterion that has recently attracted much interest. In this paper we propose a robust least significant bit (LSB) watermarking scheme which is optimized for structural similarity. The watermark is embedded into a host image through an adaptive algorithm. Various attacks on the embedding approach were examined, and simulation results show that the watermarked sequence can be extracted with acceptable accuracy after all attacks.
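For readers unfamiliar with LSB watermarking, the basic operation can be sketched as follows. This is plain, non-adaptive LSB embedding only; it does not reproduce the SSIM-optimized adaptive scheme proposed in the paper, and the function names `embed_lsb` and `extract_lsb` and the toy image are assumptions made for illustration.

```python
import numpy as np

def embed_lsb(host, bits):
    """Write one watermark bit into the least significant bit of each pixel."""
    flat = host.flatten()                      # flatten() returns a copy
    if bits.size > flat.size:
        raise ValueError("watermark is longer than the host image")
    # Clear the lowest bit (mask 0xFE), then set it to the watermark bit.
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits
    return flat.reshape(host.shape)

def extract_lsb(marked, n_bits):
    """Read the first n_bits watermark bits back from the pixel LSBs."""
    return marked.flatten()[:n_bits] & 1

# Toy 2x2 8-bit grayscale host image and a 4-bit watermark.
host = np.array([[10, 11], [200, 255]], dtype=np.uint8)
bits = np.array([1, 0, 1, 0], dtype=np.uint8)

marked = embed_lsb(host, bits)
recovered = extract_lsb(marked, bits.size)
```

Each pixel changes by at most one gray level, which is why LSB embedding is imperceptible. The same property makes the plain version fragile under attacks that alter the lowest bits, which is the trade-off between imperceptibility and robustness that the abstract describes.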
Submitted 13 March, 2018;
originally announced March 2018.