-
Artifacts of Idiosyncracy in Global Street View Data
Authors:
Tim Alpherts,
Sennay Ghebreab,
Nanne van Noord
Abstract:
Street view data is increasingly being used in computer vision applications in recent years. Machine learning datasets are collected for these applications using simple sampling techniques. These datasets are assumed to be a systematic representation of cities, especially when densely sampled. Prior works however, show that there are clear gaps in coverage, with certain cities or regions being cov…
▽ More
Street view data is increasingly being used in computer vision applications in recent years. Machine learning datasets are collected for these applications using simple sampling techniques. These datasets are assumed to be a systematic representation of cities, especially when densely sampled. Prior works however, show that there are clear gaps in coverage, with certain cities or regions being covered poorly or not at all. Here we demonstrate that a cities' idiosyncracies, such as city layout, may lead to biases in street view data for 28 cities across the globe, even when they are densely covered. We quantitatively uncover biases in the distribution of coverage of street view data and propose a method for evaluation of such distributions to get better insight in idiosyncracies in a cities' coverage. In addition, we perform a case study of Amsterdam with semi-structured interviews, showing how idiosyncracies of the collection process impact representation of cities and regions and allowing us to address biases at their source.
△ Less
Submitted 16 May, 2025;
originally announced May 2025.
-
EMPLACE: Self-Supervised Urban Scene Change Detection
Authors:
Tim Alpherts,
Sennay Ghebreab,
Nanne van Noord
Abstract:
Urban change is a constant process that influences the perception of neighbourhoods and the lives of the people within them. The field of Urban Scene Change Detection (USCD) aims to capture changes in street scenes using computer vision and can help raise awareness of changes that make it possible to better understand the city and its residents. Traditionally, the field of USCD has used supervised…
▽ More
Urban change is a constant process that influences the perception of neighbourhoods and the lives of the people within them. The field of Urban Scene Change Detection (USCD) aims to capture changes in street scenes using computer vision and can help raise awareness of changes that make it possible to better understand the city and its residents. Traditionally, the field of USCD has used supervised methods with small scale datasets. This constrains methods when applied to new cities, as it requires labour-intensive labeling processes and forces a priori definitions of relevant change. In this paper we introduce AC-1M the largest USCD dataset by far of over 1.1M images, together with EMPLACE, a self-supervising method to train a Vision Transformer using our adaptive triplet loss. We show EMPLACE outperforms SOTA methods both as a pre-training method for linear fine-tuning as well as a zero-shot setting. Lastly, in a case study of Amsterdam, we show that we are able to detect both small and large changes throughout the city and that changes uncovered by EMPLACE, depending on size, correlate with housing prices - which in turn is indicative of inequity.
△ Less
Submitted 22 March, 2025;
originally announced March 2025.
-
Stylistic Multi-Task Analysis of Ukiyo-e Woodblock Prints
Authors:
Selina Khan,
Nanne van Noord
Abstract:
In this work we present a large-scale dataset of \textit{Ukiyo-e} woodblock prints. Unlike previous works and datasets in the artistic domain that primarily focus on western art, this paper explores this pre-modern Japanese art form with the aim of broadening the scope for stylistic analysis and to provide a benchmark to evaluate a variety of art focused Computer Vision approaches. Our dataset con…
▽ More
In this work we present a large-scale dataset of \textit{Ukiyo-e} woodblock prints. Unlike previous works and datasets in the artistic domain that primarily focus on western art, this paper explores this pre-modern Japanese art form with the aim of broadening the scope for stylistic analysis and to provide a benchmark to evaluate a variety of art focused Computer Vision approaches. Our dataset consists of over $175.000$ prints with corresponding metadata (\eg artist, era, and creation date) from the 17th century to present day. By approaching stylistic analysis as a Multi-Task problem we aim to more efficiently utilize the available metadata, and learn more general representations of style. We show results for well-known baselines and state-of-the-art multi-task learning frameworks to enable future comparison, and to encourage stylistic analysis on this artistic domain.
△ Less
Submitted 16 October, 2024;
originally announced October 2024.
-
Context-Infused Visual Grounding for Art
Authors:
Selina Khan,
Nanne van Noord
Abstract:
Many artwork collections contain textual attributes that provide rich and contextualised descriptions of artworks. Visual grounding offers the potential for localising subjects within these descriptions on images, however, existing approaches are trained on natural images and generalise poorly to art. In this paper, we present CIGAr (Context-Infused GroundingDINO for Art), a visual grounding appro…
▽ More
Many artwork collections contain textual attributes that provide rich and contextualised descriptions of artworks. Visual grounding offers the potential for localising subjects within these descriptions on images, however, existing approaches are trained on natural images and generalise poorly to art. In this paper, we present CIGAr (Context-Infused GroundingDINO for Art), a visual grounding approach which utilises the artwork descriptions during training as context, thereby enabling visual grounding on art. In addition, we present a new dataset, Ukiyo-eVG, with manually annotated phrase-grounding annotations, and we set a new state-of-the-art for object detection on two artwork datasets.
△ Less
Submitted 16 October, 2024;
originally announced October 2024.
-
TULIP: Token-length Upgraded CLIP
Authors:
Ivona Najdenkoska,
Mohammad Mahdi Derakhshani,
Yuki M. Asano,
Nanne van Noord,
Marcel Worring,
Cees G. M. Snoek
Abstract:
We address the challenge of representing long captions in vision-language models, such as CLIP. By design these models are limited by fixed, absolute positional encodings, restricting inputs to a maximum of 77 tokens and hindering performance on tasks requiring longer descriptions. Although recent work has attempted to overcome this limit, their proposed approaches struggle to model token relation…
▽ More
We address the challenge of representing long captions in vision-language models, such as CLIP. By design these models are limited by fixed, absolute positional encodings, restricting inputs to a maximum of 77 tokens and hindering performance on tasks requiring longer descriptions. Although recent work has attempted to overcome this limit, their proposed approaches struggle to model token relationships over longer distances and simply extend to a fixed new token length. Instead, we propose a generalizable method, named TULIP, able to upgrade the token length to any length for CLIP-like models. We do so by improving the architecture with relative position encodings, followed by a training procedure that (i) distills the original CLIP text encoder into an encoder with relative position encodings and (ii) enhances the model for aligning longer captions with images. By effectively encoding captions longer than the default 77 tokens, our model outperforms baselines on cross-modal tasks such as retrieval and text-to-image generation. The code repository is available at https://github.com/ivonajdenkoska/tulip.
△ Less
Submitted 28 March, 2025; v1 submitted 13 October, 2024;
originally announced October 2024.
-
GO4Align: Group Optimization for Multi-Task Alignment
Authors:
Jiayi Shen,
Cheems Wang,
Zehao Xiao,
Nanne Van Noord,
Marcel Worring
Abstract:
This paper proposes \textit{GO4Align}, a multi-task optimization approach that tackles task imbalance by explicitly aligning the optimization across tasks. To achieve this, we design an adaptive group risk minimization strategy, comprising two techniques in implementation: (i) dynamical group assignment, which clusters similar tasks based on task interactions; (ii) risk-guided group indicators, wh…
▽ More
This paper proposes \textit{GO4Align}, a multi-task optimization approach that tackles task imbalance by explicitly aligning the optimization across tasks. To achieve this, we design an adaptive group risk minimization strategy, comprising two techniques in implementation: (i) dynamical group assignment, which clusters similar tasks based on task interactions; (ii) risk-guided group indicators, which exploit consistent task correlations with risk information from previous iterations. Comprehensive experimental results on diverse benchmarks demonstrate our method's performance superiority with even lower computational costs.
△ Less
Submitted 29 October, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
-
Find the Cliffhanger: Multi-Modal Trailerness in Soap Operas
Authors:
Carlo Bretti,
Pascal Mettes,
Hendrik Vincent Koops,
Daan Odijk,
Nanne van Noord
Abstract:
Creating a trailer requires carefully picking out and piecing together brief enticing moments out of a longer video, making it a challenging and time-consuming task. This requires selecting moments based on both visual and dialogue information. We introduce a multi-modal method for predicting the trailerness to assist editors in selecting trailer-worthy moments from long-form videos. We present re…
▽ More
Creating a trailer requires carefully picking out and piecing together brief enticing moments out of a longer video, making it a challenging and time-consuming task. This requires selecting moments based on both visual and dialogue information. We introduce a multi-modal method for predicting the trailerness to assist editors in selecting trailer-worthy moments from long-form videos. We present results on a newly introduced soap opera dataset, demonstrating that predicting trailerness is a challenging task that benefits from multi-modal information. Code is available at https://github.com/carlobretti/cliffhanger
△ Less
Submitted 29 January, 2024;
originally announced January 2024.
-
Blind Dates: Examining the Expression of Temporality in Historical Photographs
Authors:
Alexandra Barancová,
Melvin Wevers,
Nanne van Noord
Abstract:
This paper explores the capacity of computer vision models to discern temporal information in visual content, focusing specifically on historical photographs. We investigate the dating of images using OpenCLIP, an open-source implementation of CLIP, a multi-modal language and vision model. Our experiment consists of three steps: zero-shot classification, fine-tuning, and analysis of visual content…
▽ More
This paper explores the capacity of computer vision models to discern temporal information in visual content, focusing specifically on historical photographs. We investigate the dating of images using OpenCLIP, an open-source implementation of CLIP, a multi-modal language and vision model. Our experiment consists of three steps: zero-shot classification, fine-tuning, and analysis of visual content. We use the \textit{De Boer Scene Detection} dataset, containing 39,866 gray-scale historical press photographs from 1950 to 1999. The results show that zero-shot classification is relatively ineffective for image dating, with a bias towards predicting dates in the past. Fine-tuning OpenCLIP with a logistic classifier improves performance and eliminates the bias. Additionally, our analysis reveals that images featuring buses, cars, cats, dogs, and people are more accurately dated, suggesting the presence of temporal markers. The study highlights the potential of machine learning models like OpenCLIP in dating images and emphasizes the importance of fine-tuning for accurate temporal analysis. Future research should explore the application of these findings to color photographs and diverse datasets.
△ Less
Submitted 10 October, 2023;
originally announced October 2023.
-
Prototype-based Dataset Comparison
Authors:
Nanne van Noord
Abstract:
Dataset summarisation is a fruitful approach to dataset inspection. However, when applied to a single dataset the discovery of visual concepts is restricted to those most prominent. We argue that a comparative approach can expand upon this paradigm to enable richer forms of dataset inspection that go beyond the most prominent concepts. To enable dataset comparison we present a module that learns c…
▽ More
Dataset summarisation is a fruitful approach to dataset inspection. However, when applied to a single dataset the discovery of visual concepts is restricted to those most prominent. We argue that a comparative approach can expand upon this paradigm to enable richer forms of dataset inspection that go beyond the most prominent concepts. To enable dataset comparison we present a module that learns concept-level prototypes across datasets. We leverage self-supervised learning to discover these prototypes without supervision, and we demonstrate the benefits of our approach in two case-studies. Our findings show that dataset comparison extends dataset inspection and we hope to encourage more works in this direction. Code and usage instructions available at https://github.com/Nanne/ProtoSim
△ Less
Submitted 5 September, 2023;
originally announced September 2023.
-
Hierarchical Explanations for Video Action Recognition
Authors:
Sadaf Gulshad,
Teng Long,
Nanne van Noord
Abstract:
To interpret deep neural networks, one main approach is to dissect the visual input and find the prototypical parts responsible for the classification. However, existing methods often ignore the hierarchical relationship between these prototypes, and thus can not explain semantic concepts at both higher level (e.g., water sports) and lower level (e.g., swimming). In this paper inspired by human co…
▽ More
To interpret deep neural networks, one main approach is to dissect the visual input and find the prototypical parts responsible for the classification. However, existing methods often ignore the hierarchical relationship between these prototypes, and thus can not explain semantic concepts at both higher level (e.g., water sports) and lower level (e.g., swimming). In this paper inspired by human cognition system, we leverage hierarchal information to deal with uncertainty: When we observe water and human activity, but no definitive action it can be recognized as the water sports parent class. Only after observing a person swimming can we definitively refine it to the swimming action. To this end, we propose HIerarchical Prototype Explainer (HIPE) to build hierarchical relations between prototypes and classes. HIPE enables a reasoning process for video action classification by dissecting the input video frames on multiple levels of the class hierarchy, our method is also applicable to other video tasks. The faithfulness of our method is verified by reducing accuracy-explainability trade off on ActivityNet and UCF-101 while providing multi-level explanations.
△ Less
Submitted 3 April, 2023; v1 submitted 1 January, 2023;
originally announced January 2023.
-
An Analytics of Culture: Modeling Subjectivity, Scalability, Contextuality, and Temporality
Authors:
Nanne van Noord,
Melvin Wevers,
Tobias Blanke,
Julia Noordegraaf,
Marcel Worring
Abstract:
There is a bidirectional relationship between culture and AI; AI models are increasingly used to analyse culture, thereby shaping our understanding of culture. On the other hand, the models are trained on collections of cultural artifacts thereby implicitly, and not always correctly, encoding expressions of culture. This creates a tension that both limits the use of AI for analysing culture and le…
▽ More
There is a bidirectional relationship between culture and AI; AI models are increasingly used to analyse culture, thereby shaping our understanding of culture. On the other hand, the models are trained on collections of cultural artifacts thereby implicitly, and not always correctly, encoding expressions of culture. This creates a tension that both limits the use of AI for analysing culture and leads to problems in AI with respect to cultural complex issues such as bias.
One approach to overcome this tension is to more extensively take into account the intricacies and complexities of culture. We structure our discussion using four concepts that guide humanistic inquiry into culture: subjectivity, scalability, contextuality, and temporality. We focus on these concepts because they have not yet been sufficiently represented in AI research. We believe that possible implementations of these aspects into AI research leads to AI that better captures the complexities of culture. In what follows, we briefly describe these four concepts and their absence in AI research. For each concept, we define possible research challenges.
△ Less
Submitted 14 November, 2022;
originally announced November 2022.
-
Hyperbolic Image Segmentation
Authors:
Mina GhadimiAtigh,
Julian Schoep,
Erman Acar,
Nanne van Noord,
Pascal Mettes
Abstract:
For image segmentation, the current standard is to perform pixel-level optimization and inference in Euclidean output embedding spaces through linear hyperplanes. In this work, we show that hyperbolic manifolds provide a valuable alternative for image segmentation and propose a tractable formulation of hierarchical pixel-level classification in hyperbolic space. Hyperbolic Image Segmentation opens…
▽ More
For image segmentation, the current standard is to perform pixel-level optimization and inference in Euclidean output embedding spaces through linear hyperplanes. In this work, we show that hyperbolic manifolds provide a valuable alternative for image segmentation and propose a tractable formulation of hierarchical pixel-level classification in hyperbolic space. Hyperbolic Image Segmentation opens up new possibilities and practical benefits for segmentation, such as uncertainty estimation and boundary information for free, zero-label generalization, and increased performance in low-dimensional output embeddings.
△ Less
Submitted 11 March, 2022;
originally announced March 2022.
-
The Met Dataset: Instance-level Recognition for Artworks
Authors:
Nikolaos-Antonios Ypsilantis,
Noa Garcia,
Guangxing Han,
Sarah Ibrahimi,
Nanne Van Noord,
Giorgos Tolias
Abstract:
This work introduces a dataset for large-scale instance-level recognition in the domain of artworks. The proposed benchmark exhibits a number of different challenges such as large inter-class similarity, long tail distribution, and many classes. We rely on the open access collection of The Met museum to form a large training set of about 224k classes, where each class corresponds to a museum exhib…
▽ More
This work introduces a dataset for large-scale instance-level recognition in the domain of artworks. The proposed benchmark exhibits a number of different challenges such as large inter-class similarity, long tail distribution, and many classes. We rely on the open access collection of The Met museum to form a large training set of about 224k classes, where each class corresponds to a museum exhibit with photos taken under studio conditions. Testing is primarily performed on photos taken by museum guests depicting exhibits, which introduces a distribution shift between training and testing. Testing is additionally performed on a set of images not related to Met exhibits making the task resemble an out-of-distribution detection problem. The proposed benchmark follows the paradigm of other recent datasets for instance-level recognition on different domains to encourage research on domain independent approaches. A number of suitable approaches are evaluated to offer a testbed for future comparisons. Self-supervised and supervised contrastive learning are effectively combined to train the backbone which is used for non-parametric classification that is shown as a promising direction. Dataset webpage: http://cmp.felk.cvut.cz/met/
△ Less
Submitted 3 February, 2022;
originally announced February 2022.
-
Extending CLIP for Category-to-image Retrieval in E-commerce
Authors:
Mariya Hendriksen,
Maurits Bleeker,
Svitlana Vakulenko,
Nanne van Noord,
Ernst Kuiper,
Maarten de Rijke
Abstract:
E-commerce provides rich multimodal data that is barely leveraged in practice. One aspect of this data is a category tree that is being used in search and recommendation. However, in practice, during a user's session there is often a mismatch between a textual and a visual representation of a given category. Motivated by the problem, we introduce the task of category-to-image retrieval in e-commer…
▽ More
E-commerce provides rich multimodal data that is barely leveraged in practice. One aspect of this data is a category tree that is being used in search and recommendation. However, in practice, during a user's session there is often a mismatch between a textual and a visual representation of a given category. Motivated by the problem, we introduce the task of category-to-image retrieval in e-commerce and propose a model for the task, CLIP-ITA. The model leverages information from multiple modalities (textual, visual, and attribute modality) to create product representations. We explore how adding information from multiple modalities (textual, visual, and attribute modality) impacts the model's performance. In particular, we observe that CLIP-ITA significantly outperforms a comparable model that leverages only the visual modality and a comparable model that leverages the visual and attribute modality.
△ Less
Submitted 4 January, 2022; v1 submitted 21 December, 2021;
originally announced December 2021.
-
Inside Out Visual Place Recognition
Authors:
Sarah Ibrahimi,
Nanne van Noord,
Tim Alpherts,
Marcel Worring
Abstract:
Visual Place Recognition (VPR) is generally concerned with localizing outdoor images. However, localizing indoor scenes that contain part of an outdoor scene can be of large value for a wide range of applications. In this paper, we introduce Inside Out Visual Place Recognition (IOVPR), a task aiming to localize images based on outdoor scenes visible through windows. For this task we present the ne…
▽ More
Visual Place Recognition (VPR) is generally concerned with localizing outdoor images. However, localizing indoor scenes that contain part of an outdoor scene can be of large value for a wide range of applications. In this paper, we introduce Inside Out Visual Place Recognition (IOVPR), a task aiming to localize images based on outdoor scenes visible through windows. For this task we present the new large-scale dataset Amsterdam-XXXL, with images taken in Amsterdam, that consists of 6.4 million panoramic street-view images and 1000 user-generated indoor queries. Additionally, we introduce a new training protocol Inside Out Data Augmentation to adapt Visual Place Recognition methods for localizing indoor images, demonstrating the potential of Inside Out Visual Place Recognition. We empirically show the benefits of our proposed data augmentation scheme on a smaller scale, whilst demonstrating the difficulty of this large-scale dataset for existing methods. With this new task we aim to encourage development of methods for IOVPR. The dataset and code are available for research purposes at https://github.com/saibr/IOVPR
△ Less
Submitted 26 November, 2021;
originally announced November 2021.
-
Translating Visual Art into Music
Authors:
Maximilian Müller-Eberstein,
Nanne van Noord
Abstract:
The Synesthetic Variational Autoencoder (SynVAE) introduced in this research is able to learn a consistent mapping between visual and auditive sensory modalities in the absence of paired datasets. A quantitative evaluation on MNIST as well as the Behance Artistic Media dataset (BAM) shows that SynVAE is capable of retaining sufficient information content during the translation while maintaining cr…
▽ More
The Synesthetic Variational Autoencoder (SynVAE) introduced in this research is able to learn a consistent mapping between visual and auditive sensory modalities in the absence of paired datasets. A quantitative evaluation on MNIST as well as the Behance Artistic Media dataset (BAM) shows that SynVAE is capable of retaining sufficient information content during the translation while maintaining cross-modal latent space consistency. In a qualitative evaluation trial, human evaluators were furthermore able to match musical samples with the images which generated them with accuracies of up to 73%.
△ Less
Submitted 3 September, 2019;
originally announced September 2019.
-
I Bet You Are Wrong: Gambling Adversarial Networks for Structured Semantic Segmentation
Authors:
Laurens Samson,
Nanne van Noord,
Olaf Booij,
Michael Hofmann,
Efstratios Gavves,
Mohsen Ghafoorian
Abstract:
Adversarial training has been recently employed for realizing structured semantic segmentation, in which the aim is to preserve higher-level scene structural consistencies in dense predictions. However, as we show, value-based discrimination between the predictions from the segmentation network and ground-truth annotations can hinder the training process from learning to improve structural qualiti…
▽ More
Adversarial training has been recently employed for realizing structured semantic segmentation, in which the aim is to preserve higher-level scene structural consistencies in dense predictions. However, as we show, value-based discrimination between the predictions from the segmentation network and ground-truth annotations can hinder the training process from learning to improve structural qualities as well as disabling the network from properly expressing uncertainties. In this paper, we rethink adversarial training for semantic segmentation and propose to formulate the fake/real discrimination framework with a correct/incorrect training objective. More specifically, we replace the discriminator with a "gambler" network that learns to spot and distribute its budget in areas where the predictions are clearly wrong, while the segmenter network tries to leave no clear clues for the gambler where to bet. Empirical evaluation on two road-scene semantic segmentation tasks shows that not only does the proposed method re-enable expressing uncertainties, it also improves pixel-wise and structure-based metrics.
△ Less
Submitted 7 August, 2019;
originally announced August 2019.
-
Learning Task Relatedness in Multi-Task Learning for Images in Context
Authors:
Gjorgji Strezoski,
Nanne van Noord,
Marcel Worring
Abstract:
Multimedia applications often require concurrent solutions to multiple tasks. These tasks hold clues to each-others solutions, however as these relations can be complex this remains a rarely utilized property. When task relations are explicitly defined based on domain knowledge multi-task learning (MTL) offers such concurrent solutions, while exploiting relatedness between multiple tasks performed…
▽ More
Multimedia applications often require concurrent solutions to multiple tasks. These tasks hold clues to each-others solutions, however as these relations can be complex this remains a rarely utilized property. When task relations are explicitly defined based on domain knowledge multi-task learning (MTL) offers such concurrent solutions, while exploiting relatedness between multiple tasks performed over the same dataset. In most cases however, this relatedness is not explicitly defined and the domain expert knowledge that defines it is not available. To address this issue, we introduce Selective Sharing, a method that learns the inter-task relatedness from secondary latent features while the model trains. Using this insight, we can automatically group tasks and allow them to share knowledge in a mutually beneficial way. We support our method with experiments on 5 datasets in classification, regression, and ranking tasks and compare to strong baselines and state-of-the-art approaches showing a consistent improvement in terms of accuracy and parameter counts. In addition, we perform an activation region analysis showing how Selective Sharing affects the learned representation.
△ Less
Submitted 5 April, 2019;
originally announced April 2019.
-
Many Task Learning with Task Routing
Authors:
Gjorgji Strezoski,
Nanne van Noord,
Marcel Worring
Abstract:
Typical multi-task learning (MTL) methods rely on architectural adjustments and a large trainable parameter set to jointly optimize over several tasks. However, when the number of tasks increases so do the complexity of the architectural adjustments and resource requirements. In this paper, we introduce a method which applies a conditional feature-wise transformation over the convolutional activat…
▽ More
Typical multi-task learning (MTL) methods rely on architectural adjustments and a large trainable parameter set to jointly optimize over several tasks. However, when the number of tasks increases so do the complexity of the architectural adjustments and resource requirements. In this paper, we introduce a method which applies a conditional feature-wise transformation over the convolutional activations that enables a model to successfully perform a large number of tasks. To distinguish from regular MTL, we introduce Many Task Learning (MaTL) as a special case of MTL where more than 20 tasks are performed by a single model. Our method dubbed Task Routing (TR) is encapsulated in a layer we call the Task Routing Layer (TRL), which applied in an MaTL scenario successfully fits hundreds of classification tasks in one model. We evaluate our method on 5 datasets against strong baselines and state-of-the-art approaches.
△ Less
Submitted 28 March, 2019;
originally announced March 2019.
-
Light-weight pixel context encoders for image inpainting
Authors:
Nanne van Noord,
Eric Postma
Abstract:
In this work we propose Pixel Content Encoders (PCE), a light-weight image inpainting model, capable of generating novel con-tent for large missing regions in images. Unlike previously presented convolutional neural network based models, our PCE model has an order of magnitude fewer trainable parameters. Moreover, by incorporating dilated convolutions we are able to preserve fine grained spatial i…
▽ More
In this work we propose Pixel Content Encoders (PCE), a light-weight image inpainting model, capable of generating novel con-tent for large missing regions in images. Unlike previously presented convolutional neural network based models, our PCE model has an order of magnitude fewer trainable parameters. Moreover, by incorporating dilated convolutions we are able to preserve fine grained spatial information, achieving state-of-the-art performance on benchmark datasets of natural images and paintings. Besides image inpainting, we show that without changing the architecture, PCE can be used for image extrapolation, generating novel content beyond existing image boundaries.
△ Less
Submitted 17 January, 2018;
originally announced January 2018.
-
Learning scale-variant and scale-invariant features for deep image classification
Authors:
Nanne van Noord,
Eric Postma
Abstract:
Convolutional Neural Networks (CNNs) require large image corpora to be trained on classification tasks. The variation in image resolutions, sizes of objects and patterns depicted, and image scales, hampers CNN training and performance, because the task-relevant information varies over spatial scales. Previous work attempting to deal with such scale variations focused on encouraging scale-invariant…
▽ More
Convolutional Neural Networks (CNNs) require large image corpora to be trained on classification tasks. The variation in image resolutions, sizes of objects and patterns depicted, and image scales, hampers CNN training and performance, because the task-relevant information varies over spatial scales. Previous work attempting to deal with such scale variations focused on encouraging scale-invariant CNN representations. However, scale-invariant representations are incomplete representations of images, because images contain scale-variant information as well. This paper addresses the combined development of scale-invariant and scale-variant representations. We propose a multi- scale CNN method to encourage the recognition of both types of features and evaluate it on a challenging image classification task involving task-relevant characteristics at multiple scales. The results show that our multi-scale CNN outperforms single-scale CNN. This leads to the conclusion that encouraging the combined development of a scale-invariant and scale-variant representation in CNNs is beneficial to image recognition performance.
△ Less
Submitted 13 May, 2016; v1 submitted 3 February, 2016;
originally announced February 2016.
-
Exploring the influence of scale on artist attribution
Authors:
Nanne van Noord,
Eric Postma
Abstract:
Previous work has shown that the artist of an artwork can be identified by use of computational methods that analyse digital images. However, the digitised artworks are often investigated at a coarse scale discarding many of the important details that may define an artist's style. In recent years high resolution images of artworks have become available, which, combined with increased processing po…
▽ More
Previous work has shown that the artist of an artwork can be identified by use of computational methods that analyse digital images. However, the digitised artworks are often investigated at a coarse scale discarding many of the important details that may define an artist's style. In recent years high resolution images of artworks have become available, which, combined with increased processing power and new computational techniques, allow us to analyse digital images of artworks at a very fine scale. In this work we train and evaluate a Convolutional Neural Network (CNN) on the task of artist attribution using artwork images of varying resolutions. To this end, we combine two existing methods to enable the application of high resolution images to CNNs. By comparing the attribution performances obtained at different scales, we find that in most cases finer scales are beneficial to the attribution performance, whereas for a minority of the artists, coarser scales appear to be preferable. We conclude that artist attribution would benefit from a multi-scale CNN approach which vastly expands the possibilities for computational art forensics.
△ Less
Submitted 19 June, 2015;
originally announced June 2015.