Search | arXiv e-print repository

IntroStyle: Training-Free Introspective Style Attribution using Diffusion Features

Authors: Anand Kumar, Jiteng Mu, Nuno Vasconcelos

Abstract: Text-to-image (T2I) models have gained widespread adoption among content creators and the general public. However, this has sparked significant concerns regarding data privacy and copyright infringement among artists. Consequently, there is an increasing demand for T2I models to incorporate mechanisms that prevent the generation of specific artistic styles, thereby safeguarding intellectual proper… ▽ More Text-to-image (T2I) models have gained widespread adoption among content creators and the general public. However, this has sparked significant concerns regarding data privacy and copyright infringement among artists. Consequently, there is an increasing demand for T2I models to incorporate mechanisms that prevent the generation of specific artistic styles, thereby safeguarding intellectual property rights. Existing methods for style extraction typically necessitate the collection of custom datasets and the training of specialized models. This, however, is resource-intensive, time-consuming, and often impractical for real-time applications. Moreover, it may not adequately address the dynamic nature of artistic styles and the rapidly evolving landscape of digital art. We present a novel, training-free framework to solve the style attribution problem, using the features produced by a diffusion model alone, without any external modules or retraining. This is denoted as introspective style attribution (IntroStyle) and demonstrates superior performance to state-of-the-art models for style retrieval. We also introduce a synthetic dataset of Style Hacks (SHacks) to isolate artistic style and evaluate fine-grained style attribution performance. △ Less

Submitted 18 December, 2024; originally announced December 2024.

Comments: 16 pages, 17 figures

arXiv:2104.05623 [pdf, other]

Rethinking and Improving the Robustness of Image Style Transfer

Authors: Pei Wang, Yijun Li, Nuno Vasconcelos

Abstract: Extensive research in neural style transfer methods has shown that the correlation between features extracted by a pre-trained VGG network has a remarkable ability to capture the visual style of an image. Surprisingly, however, this stylization quality is not robust and often degrades significantly when applied to features from more advanced and lightweight networks, such as those in the ResNet fa… ▽ More Extensive research in neural style transfer methods has shown that the correlation between features extracted by a pre-trained VGG network has a remarkable ability to capture the visual style of an image. Surprisingly, however, this stylization quality is not robust and often degrades significantly when applied to features from more advanced and lightweight networks, such as those in the ResNet family. By performing extensive experiments with different network architectures, we find that residual connections, which represent the main architectural difference between VGG and ResNet, produce feature maps of small entropy, which are not suitable for style transfer. To improve the robustness of the ResNet architecture, we then propose a simple yet effective solution based on a softmax transformation of the feature activations that enhances their entropy. Experimental results demonstrate that this small magic can greatly improve the quality of stylization results, even for networks with random weights. This suggests that the architecture used for feature extraction is more important than the use of learned weights for the task of style transfer. △ Less

Submitted 7 April, 2021; originally announced April 2021.

Comments: Published in CVPR2021 (Oral)

arXiv:2004.01997 [pdf, other]

doi 10.1007/978-3-030-32226-7_20

Volumetric Attention for 3D Medical Image Segmentation and Detection

Authors: Xudong Wang, Shizhong Han, Yunqiang Chen, Dashan Gao, Nuno Vasconcelos

Abstract: A volumetric attention(VA) module for 3D medical image segmentation and detection is proposed. VA attention is inspired by recent advances in video processing, enables 2.5D networks to leverage context information along the z direction, and allows the use of pretrained 2D detection models when training data is limited, as is often the case for medical applications. Its integration in the Mask R-CN… ▽ More A volumetric attention(VA) module for 3D medical image segmentation and detection is proposed. VA attention is inspired by recent advances in video processing, enables 2.5D networks to leverage context information along the z direction, and allows the use of pretrained 2D detection models when training data is limited, as is often the case for medical applications. Its integration in the Mask R-CNN is shown to enable state-of-the-art performance on the Liver Tumor Segmentation (LiTS) Challenge, outperforming the previous challenge winner by 3.9 points and achieving top performance on the LiTS leader board at the time of paper submission. Detection experiments on the DeepLesion dataset also show that the addition of VA to existing object detectors enables a 69.1 sensitivity at 0.5 false positive per image, outperforming the best published results by 6.6 points. △ Less

Submitted 4 April, 2020; originally announced April 2020.

Comments: Accepted by MICCAI 2019

Journal ref: In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 175-184. Springer, Cham, 2019

arXiv:1908.00658 [pdf, ps, other]

Robust Deep Sensing Through Transfer Learning in Cognitive Radio

Authors: Qihang Peng, Andrew Gilman, Nuno Vasconcelos, Pamela C. Cosman, Laurence B. Milstein

Abstract: We propose a robust spectrum sensing framework based on deep learning. The received signals at the secondary user's receiver are filtered, sampled and then directly fed into a convolutional neural network. Although this deep sensing is effective when operating in the same scenario as the collected training data, the sensing performance is degraded when it is applied in a different scenario with di… ▽ More We propose a robust spectrum sensing framework based on deep learning. The received signals at the secondary user's receiver are filtered, sampled and then directly fed into a convolutional neural network. Although this deep sensing is effective when operating in the same scenario as the collected training data, the sensing performance is degraded when it is applied in a different scenario with different wireless signals and propagation. We incorporate transfer learning into the framework to improve the robustness. Results validate the effectiveness as well as the robustness of the proposed deep spectrum sensing framework. △ Less

Submitted 1 August, 2019; originally announced August 2019.

arXiv:1809.02587 [pdf, other]

Self-Supervised Generation of Spatial Audio for 360 Video

Authors: Pedro Morgado, Nuno Vasconcelos, Timothy Langlois, Oliver Wang

Abstract: We introduce an approach to convert mono audio recorded by a 360 video camera into spatial audio, a representation of the distribution of sound over the full viewing sphere. Spatial audio is an important component of immersive 360 video viewing, but spatial audio microphones are still rare in current 360 video production. Our system consists of end-to-end trainable neural networks that separate in… ▽ More We introduce an approach to convert mono audio recorded by a 360 video camera into spatial audio, a representation of the distribution of sound over the full viewing sphere. Spatial audio is an important component of immersive 360 video viewing, but spatial audio microphones are still rare in current 360 video production. Our system consists of end-to-end trainable neural networks that separate individual sound sources and localize them on the viewing sphere, conditioned on multi-modal analysis of audio and 360 video frames. We introduce several datasets, including one filmed ourselves, and one collected in-the-wild from YouTube, consisting of 360 videos uploaded with spatial audio. During training, ground-truth spatial audio serves as self-supervision and a mixed down mono track forms the input to our network. Using our approach, we show that it is possible to infer the spatial location of sound sources based only on 360 video and a mono audio track. △ Less

Submitted 7 September, 2018; originally announced September 2018.

Comments: To appear in NIPS 2018

Showing 1–5 of 5 results for author: Vasconcelos, N