-
Learning to cluster neuronal function
Authors:
Nina S. Nellen,
Polina Turishcheva,
Michaela Vystrčilová,
Shashwat Sridhar,
Tim Gollisch,
Andreas S. Tolias,
Alexander S. Ecker
Abstract:
Deep neural networks trained to predict neural activity from visual input and behaviour have shown great potential to serve as digital twins of the visual cortex. Per-neuron embeddings derived from these models could potentially be used to map the functional landscape or identify cell types. However, state-of-the-art predictive models of mouse V1 do not generate functional embeddings that exhibit clear clustering patterns which would correspond to cell types. This raises the question whether the lack of clustered structure is due to limitations of current models or a true feature of the functional organization of mouse V1. In this work, we introduce DECEMber -- Deep Embedding Clustering via Expectation Maximization-based refinement -- an explicit inductive bias into predictive models that enhances clustering by adding an auxiliary $t$-distribution-inspired loss function that enforces structured organization among per-neuron embeddings. We jointly optimize both neuronal feature embeddings and clustering parameters, updating cluster centers and scale matrices using the EM-algorithm. We demonstrate that these modifications improve cluster consistency while preserving high predictive performance and surpassing standard clustering methods in terms of stability. Moreover, DECEMber generalizes well across species (mice, primates) and visual areas (retina, V1, V4). The code is available at https://github.com/Nisone2000/sensorium/tree/neuroips_version.
Submitted 3 June, 2025;
originally announced June 2025.
-
Hierarchical clustering with maximum density paths and mixture models
Authors:
Martin Ritzert,
Polina Turishcheva,
Laura Hansel,
Paul Wollenhaupt,
Marissa A. Weis,
Alexander S. Ecker
Abstract:
Hierarchical clustering is an effective, interpretable method for analyzing structure in data. It reveals insights at multiple scales without requiring a predefined number of clusters and captures nested patterns and subtle relationships, which are often missed by flat clustering approaches. However, existing hierarchical clustering methods struggle with high-dimensional data, especially when there are no clear density gaps between modes. In this work, we introduce t-NEB, a probabilistically grounded hierarchical clustering method, which yields state-of-the-art clustering performance on naturalistic high-dimensional data. t-NEB consists of three steps: (1) density estimation via overclustering; (2) finding maximum density paths between clusters; (3) creating a hierarchical structure via bottom-up cluster merging. t-NEB uses a probabilistic parametric density model for both overclustering and cluster merging, which yields both high clustering performance and a meaningful hierarchy, making it a valuable tool for exploratory data analysis. Code is available at https://github.com/ecker-lab/tneb clustering.
Submitted 21 May, 2025; v1 submitted 19 March, 2025;
originally announced March 2025.
-
What should a neuron aim for? Designing local objective functions based on information theory
Authors:
Andreas C. Schneider,
Valentin Neuhaus,
David A. Ehrlich,
Abdullah Makkeh,
Alexander S. Ecker,
Viola Priesemann,
Michael Wibral
Abstract:
In modern deep neural networks, the learning dynamics of the individual neurons is often obscure, as the networks are trained via global optimization. Conversely, biological systems build on self-organized, local learning, achieving robustness and efficiency with limited global information. We here show how self-organization between individual artificial neurons can be achieved by designing abstract bio-inspired local learning goals. These goals are parameterized using a recent extension of information theory, Partial Information Decomposition (PID), which decomposes the information that a set of information sources holds about an outcome into unique, redundant and synergistic contributions. Our framework enables neurons to locally shape the integration of information from various input classes, i.e. feedforward, feedback, and lateral, by selecting which of the three inputs should contribute uniquely, redundantly or synergistically to the output. This selection is expressed as a weighted sum of PID terms, which, for a given problem, can be directly derived from intuitive reasoning or via numerical optimization, offering a window into understanding task-relevant local information processing. Achieving neuron-level interpretability while enabling strong performance using local learning, our work advances a principled information-theoretic foundation for local learning strategies.
Submitted 21 January, 2025; v1 submitted 3 December, 2024;
originally announced December 2024.
-
MNIST-Nd: a set of naturalistic datasets to benchmark clustering across dimensions
Authors:
Polina Turishcheva,
Laura Hansel,
Martin Ritzert,
Marissa A. Weis,
Alexander S. Ecker
Abstract:
Driven by advances in recording technology, large-scale high-dimensional datasets have emerged across many scientific disciplines. Especially in biology, clustering is often used to gain insights into the structure of such datasets, for instance to understand the organization of different cell types. However, clustering is known to scale poorly to high dimensions, even though the exact impact of dimensionality is unclear as current benchmark datasets are mostly two-dimensional. Here we propose MNIST-Nd, a set of synthetic datasets that share a key property of real-world datasets, namely that individual samples are noisy and clusters do not perfectly separate. MNIST-Nd is obtained by training mixture variational autoencoders with 2 to 64 latent dimensions on MNIST, resulting in six datasets with comparable structure but varying dimensionality. It thus offers the chance to disentangle the impact of dimensionality on clustering. Preliminary benchmarks of common clustering algorithms on MNIST-Nd suggest that Leiden is the most robust as dimensionality grows.
Submitted 21 October, 2024;
originally announced October 2024.
-
Retrospective for the Dynamic Sensorium Competition for predicting large-scale mouse primary visual cortex activity from videos
Authors:
Polina Turishcheva,
Paul G. Fahey,
Michaela Vystrčilová,
Laura Hansel,
Rachel Froebe,
Kayla Ponder,
Yongrong Qiu,
Konstantin F. Willeke,
Mohammad Bashiri,
Ruslan Baikulov,
Yu Zhu,
Lei Ma,
Shan Yu,
Tiejun Huang,
Bryan M. Li,
Wolf De Wulf,
Nina Kudryashova,
Matthias H. Hennig,
Nathalie L. Rochefort,
Arno Onken,
Eric Wang,
Zhiwei Ding,
Andreas S. Tolias,
Fabian H. Sinz,
Alexander S. Ecker
Abstract:
Understanding how biological visual systems process information is challenging because of the nonlinear relationship between visual input and neuronal responses. Artificial neural networks allow computational neuroscientists to create predictive models that connect biological and machine vision. Machine learning has benefited tremendously from benchmarks that compare different models on the same task under standardized conditions. However, there was no standardized benchmark to identify state-of-the-art dynamic models of the mouse visual system. To address this gap, we established the Sensorium 2023 Benchmark Competition with dynamic input, featuring a new large-scale dataset from the primary visual cortex of ten mice. This dataset includes responses from 78,853 neurons to 2 hours of dynamic stimuli per neuron, together with behavioral measurements such as running speed, pupil dilation, and eye movements. The competition ranked models in two tracks based on predictive performance for neuronal responses on a held-out test set: one focusing on predicting in-domain natural stimuli and another on out-of-distribution (OOD) stimuli to assess model generalization. As part of the NeurIPS 2023 competition track, we received more than 160 model submissions from 22 teams. Several new architectures for predictive models were proposed, and the winning teams improved the previous state-of-the-art model by 50%. Access to the dataset as well as the benchmarking infrastructure will remain online at www.sensorium-competition.net.
Submitted 12 July, 2024;
originally announced July 2024.
-
Tailor-designed models for the turbulent velocity gradient through normalizing flow
Authors:
Maurizio Carbone,
Vincent J. Peterhans,
Alexander S. Ecker,
Michael Wilczek
Abstract:
Small-scale turbulence can be comprehensively described in terms of velocity gradients, which makes them an appealing starting point for low-dimensional modeling. Typical models consist of stochastic equations based on closures for non-local pressure and viscous contributions. The fidelity of the resulting models depends on the accuracy of the underlying modeling assumptions. Here, we discuss an alternative data-driven approach leveraging machine learning to derive a velocity gradient model which captures its statistics by construction. We use a normalizing flow to learn the velocity gradient probability density function (PDF) from direct numerical simulation (DNS) of incompressible turbulence. Then, by using the equation for the single-time PDF of the velocity gradient, we construct a deterministic, yet chaotic, dynamical system featuring the learned steady-state PDF by design. Finally, utilizing gauge terms for the velocity gradient single-time statistics, we optimize the time correlations as obtained from our model against the DNS data. As a result, the model time realizations statistically closely resemble the time series from DNS.
Submitted 29 February, 2024;
originally announced February 2024.
-
Computer Vision for Primate Behavior Analysis in the Wild
Authors:
Richard Vogg,
Timo Lüddecke,
Jonathan Henrich,
Sharmita Dey,
Matthias Nuske,
Valentin Hassler,
Derek Murphy,
Julia Fischer,
Julia Ostner,
Oliver Schülke,
Peter M. Kappeler,
Claudia Fichtel,
Alexander Gail,
Stefan Treue,
Hansjörg Scherberger,
Florentin Wörgötter,
Alexander S. Ecker
Abstract:
Advances in computer vision as well as increasingly widespread video-based behavioral monitoring have great potential for transforming how we study animal cognition and behavior. However, there is still a fairly large gap between the exciting prospects and what can actually be achieved in practice today, especially in videos from the wild. With this perspective paper, we want to contribute towards closing this gap, by guiding behavioral scientists in what can be expected from current methods and steering computer vision researchers towards problems that are relevant to advance research in animal behavior. We start with a survey of the state-of-the-art methods for computer vision problems that are directly relevant to the video-based study of animal behavior, including object detection, multi-individual tracking, individual identification, and (inter)action recognition. We then review methods for effort-efficient learning, which is one of the biggest challenges from a practical perspective. Finally, we close with an outlook into the future of the emerging field of computer vision for animal behavior, where we argue that the field should develop approaches to unify detection, tracking, identification and (inter)action recognition in a single, video-based framework.
Submitted 12 August, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
Most discriminative stimuli for functional cell type clustering
Authors:
Max F. Burg,
Thomas Zenkel,
Michaela Vystrčilová,
Jonathan Oesterle,
Larissa Höfling,
Konstantin F. Willeke,
Jan Lause,
Sarah Müller,
Paul G. Fahey,
Zhiwei Ding,
Kelli Restivo,
Shashwat Sridhar,
Tim Gollisch,
Philipp Berens,
Andreas S. Tolias,
Thomas Euler,
Matthias Bethge,
Alexander S. Ecker
Abstract:
Identifying cell types and understanding their functional properties is crucial for unraveling the mechanisms underlying perception and cognition. In the retina, functional types can be identified by carefully selected stimuli, but this requires expert domain knowledge and biases the procedure towards previously known cell types. In the visual cortex, it is still unknown what functional types exist and how to identify them. Thus, for unbiased identification of the functional cell types in retina and visual cortex, new approaches are needed. Here we propose an optimization-based clustering approach using deep predictive models to obtain functional clusters of neurons using Most Discriminative Stimuli (MDS). Our approach alternates between stimulus optimization and cluster reassignment, akin to an expectation-maximization algorithm. The algorithm recovers functional clusters in mouse retina, marmoset retina and macaque visual area V4. This demonstrates that our approach can successfully find discriminative stimuli across species, stages of the visual system and recording techniques. The resulting most discriminative stimuli can be used to assign functional cell types fast and on the fly, without the need to train complex predictive models or show a large natural scene dataset, paving the way for experiments that were previously limited by experimental time. Crucially, MDS are interpretable: they visualize the distinctive stimulus patterns that most unambiguously identify a specific type of neuron.
Submitted 14 March, 2024; v1 submitted 29 November, 2023;
originally announced January 2024.
-
The Dynamic Sensorium competition for predicting large-scale mouse visual cortex activity from videos
Authors:
Polina Turishcheva,
Paul G. Fahey,
Laura Hansel,
Rachel Froebe,
Kayla Ponder,
Michaela Vystrčilová,
Konstantin F. Willeke,
Mohammad Bashiri,
Eric Wang,
Zhiwei Ding,
Andreas S. Tolias,
Fabian H. Sinz,
Alexander S. Ecker
Abstract:
Understanding how biological visual systems process information is challenging due to the complex nonlinear relationship between neuronal responses and high-dimensional visual input. Artificial neural networks have already improved our understanding of this system by allowing computational neuroscientists to create predictive models and bridge biological and machine vision. In the Sensorium 2022 competition, we introduced benchmarks for vision models with static input. However, animals operate and excel in dynamic environments, making it crucial to study and understand how the brain functions under these conditions. Moreover, many biological theories, such as predictive coding, suggest that previous input is crucial for current input processing. Currently, there is no standardized benchmark to identify state-of-the-art dynamic models of the mouse visual system. To address this gap, we propose the Sensorium 2023 Benchmark Competition with dynamic input. It includes the collection of a new large-scale dataset from the primary visual cortex of ten mice, containing responses from over 78,000 neurons to over 2 hours of dynamic stimuli per neuron. Participants in the main benchmark track will compete to identify the best predictive models of neuronal responses for dynamic input. We will also host a bonus track in which submission performance will be evaluated on out-of-domain input, using withheld neuronal responses to dynamic input stimuli whose statistics differ from the training set. Both tracks will offer behavioral data along with video stimuli. As before, we will provide code, tutorials, and strong pre-trained baseline models to encourage participation. We hope this competition will continue to strengthen the accompanying Sensorium benchmarks collection as a standard tool to measure progress in large-scale neural system identification models of the entire mouse visual hierarchy and beyond.
Submitted 12 July, 2024; v1 submitted 31 May, 2023;
originally announced May 2023.
-
The Sensorium competition on predicting large-scale mouse primary visual cortex activity
Authors:
Konstantin F. Willeke,
Paul G. Fahey,
Mohammad Bashiri,
Laura Pede,
Max F. Burg,
Christoph Blessing,
Santiago A. Cadena,
Zhiwei Ding,
Konstantin-Klemens Lurz,
Kayla Ponder,
Taliah Muhammad,
Saumil S. Patel,
Alexander S. Ecker,
Andreas S. Tolias,
Fabian H. Sinz
Abstract:
The neural underpinning of the biological visual system is challenging to study experimentally, in particular as the neuronal activity becomes increasingly nonlinear with respect to visual input. Artificial neural networks (ANNs) can serve a variety of goals for improving our understanding of this complex system, not only serving as predictive digital twins of sensory cortex for novel hypothesis generation in silico, but also incorporating bio-inspired architectural motifs to progressively bridge the gap between biological and machine vision. The mouse has recently emerged as a popular model system to study visual information processing, but no standardized large-scale benchmark to identify state-of-the-art models of the mouse visual system has been established. To fill this gap, we propose the Sensorium benchmark competition. We collected a large-scale dataset from mouse primary visual cortex containing the responses of more than 28,000 neurons across seven mice stimulated with thousands of natural images, together with simultaneous behavioral measurements that include running speed, pupil dilation, and eye movements. The benchmark challenge will rank models based on predictive performance for neuronal responses on a held-out test set, and includes two tracks for model input limited to either stimulus only (Sensorium) or stimulus plus behavior (Sensorium+). We provide a starting kit to lower the barrier for entry, including tutorials, pre-trained baseline models, and APIs with one-line commands for data loading and submission. We would like to see this as a starting point for regular challenges and data releases, and as a standard tool for measuring progress in large-scale neural system identification models of the mouse visual system and beyond.
Submitted 17 June, 2022;
originally announced June 2022.
-
Self-Supervised Graph Representation Learning for Neuronal Morphologies
Authors:
Marissa A. Weis,
Laura Hansel,
Timo Lüddecke,
Alexander S. Ecker
Abstract:
Unsupervised graph representation learning has recently gained interest in several application domains such as neuroscience, where modeling the diverse morphology of cell types in the brain is one of the key challenges. It is currently unknown how many excitatory cortical cell types exist and what their defining morphological features are. Here we present GraphDINO, a purely data-driven approach to learn low-dimensional representations of 3D neuronal morphologies from unlabeled large-scale datasets. GraphDINO is a novel transformer-based representation learning method for spatially-embedded graphs. To enable self-supervised learning on transformers, we (1) developed data augmentation strategies for spatially-embedded graphs, (2) adapted the positional encoding and (3) introduced a novel attention mechanism, AC-Attention, which combines attention-based global interaction between nodes and classic graph convolutional processing. We show, in two different species and across multiple brain areas, that this method yields morphological cell type clusterings that are on par with manual feature-based classification by experts, but without using prior knowledge about the structural features of neurons. Moreover, it outperforms previous approaches on quantitative benchmarks predicting expert labels. Our method could potentially enable data-driven discovery of novel morphological features and cell types in large-scale datasets. It is applicable beyond neuroscience in settings where samples in a dataset are graphs and graph-level embeddings are desired.
Submitted 21 June, 2023; v1 submitted 23 December, 2021;
originally announced December 2021.
-
Image Segmentation Using Text and Image Prompts
Authors:
Timo Lüddecke,
Alexander S. Ecker
Abstract:
Image segmentation is usually addressed by training a model for a fixed set of object classes. Incorporating additional classes or more complex queries later is expensive as it requires re-training the model on a dataset that encompasses these expressions. Here we propose a system that can generate image segmentations based on arbitrary prompts at test time. A prompt can be either a text or an image. This approach enables us to create a unified model (trained once) for three common segmentation tasks, which come with distinct challenges: referring expression segmentation, zero-shot segmentation and one-shot segmentation. We build upon the CLIP model as a backbone which we extend with a transformer-based decoder that enables dense prediction. After training on an extended version of the PhraseCut dataset, our system generates a binary segmentation map for an image based on a free-text prompt or on an additional image expressing the query. We analyze different variants of the latter image-based prompts in detail. This novel hybrid input allows for dynamic adaptation not only to the three segmentation tasks mentioned above, but to any binary segmentation task where a text or image query can be formulated. Finally, we find our system to adapt well to generalized queries involving affordances or properties. Code is available at https://eckerlab.org/code/clipseg.
Submitted 30 March, 2022; v1 submitted 18 December, 2021;
originally announced December 2021.
-
A Broad Dataset is All You Need for One-Shot Object Detection
Authors:
Claudio Michaelis,
Matthias Bethge,
Alexander S. Ecker
Abstract:
Is it possible to detect arbitrary objects from a single example? A central problem of all existing attempts at one-shot object detection is the generalization gap: Object categories used during training are detected much more reliably than novel ones. We here show that this generalization gap can be nearly closed by increasing the number of object categories used during training. Doing so allows us to improve generalization from seen to unseen classes from 45% to 89% and improve the state-of-the-art on COCO by 5.4 %AP50 (from 22.0 to 27.5). We verify that the effect is caused by the number of categories and not the number of training samples, and that it holds for different models, backbones and datasets. This result suggests that the key to strong few-shot detection models may not lie in sophisticated metric learning approaches, but instead simply in scaling the number of categories. We hope that our findings will help to better understand the challenges of few-shot learning and encourage future data annotation efforts to focus on wider datasets with a broader set of categories rather than gathering more samples per category.
Submitted 29 October, 2022; v1 submitted 9 November, 2020;
originally announced November 2020.
-
Benchmarking Unsupervised Object Representations for Video Sequences
Authors:
Marissa A. Weis,
Kashyap Chitta,
Yash Sharma,
Wieland Brendel,
Matthias Bethge,
Andreas Geiger,
Alexander S. Ecker
Abstract:
Perceiving the world in terms of objects and tracking them through time is a crucial prerequisite for reasoning and scene understanding. Recently, several methods have been proposed for unsupervised learning of object-centric representations. However, since these models were evaluated on different downstream tasks, it remains unclear how they compare in terms of basic perceptual abilities such as detection, figure-ground segmentation and tracking of objects. To close this gap, we design a benchmark with four data sets of varying complexity and seven additional test sets featuring challenging tracking scenarios relevant for natural videos. Using this benchmark, we compare the perceptual abilities of four object-centric approaches: ViMON, a video-extension of MONet, based on recurrent spatial attention, OP3, which exploits clustering via spatial mixture models, as well as TBA and SCALOR, which use explicit factorization via spatial transformers. Our results suggest that the architectures with unconstrained latent representations learn more powerful representations in terms of object detection, segmentation and tracking than the spatial transformer based architectures. We also observe that none of the methods are able to gracefully handle the most challenging tracking scenarios despite their synthetic nature, suggesting that our benchmark may provide fruitful guidance towards learning more robust object-centric video representations.
Submitted 29 June, 2021; v1 submitted 12 June, 2020;
originally announced June 2020.
-
Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming
Authors:
Claudio Michaelis,
Benjamin Mitzkus,
Robert Geirhos,
Evgenia Rusak,
Oliver Bringmann,
Alexander S. Ecker,
Matthias Bethge,
Wieland Brendel
Abstract:
The ability to detect objects regardless of image distortions or weather conditions is crucial for real-world applications of deep learning like autonomous driving. We here provide an easy-to-use benchmark to assess how object detection models perform when image quality degrades. The three resulting benchmark datasets, termed Pascal-C, Coco-C and Cityscapes-C, contain a large variety of image corruptions. We show that a range of standard object detection models suffer a severe performance loss on corrupted images (down to 30-60% of the original performance). However, a simple data augmentation trick, stylizing the training images, leads to a substantial increase in robustness across corruption type, severity and dataset. We envision our comprehensive benchmark to track future progress towards building robust object detection models. Benchmark, code and data are publicly available.
Submitted 31 March, 2020; v1 submitted 17 July, 2019;
originally announced July 2019.
-
One-Shot Instance Segmentation
Authors:
Claudio Michaelis,
Ivan Ustyuzhaninov,
Matthias Bethge,
Alexander S. Ecker
Abstract:
We tackle the problem of one-shot instance segmentation: Given an example image of a novel, previously unknown object category, find and segment all objects of this category within a complex scene. To address this challenging new task, we propose Siamese Mask R-CNN. It extends Mask R-CNN by a Siamese backbone encoding both reference image and scene, allowing it to target detection and segmentation towards the reference category. We demonstrate empirical results on MS Coco highlighting challenges of the one-shot setting: while transferring knowledge about instance segmentation to novel object categories works very well, targeting the detection network towards the reference category appears to be more difficult. Our work provides a first strong baseline for one-shot instance segmentation and will hopefully inspire further research into more powerful and flexible scene analysis algorithms. Code is available at: https://github.com/bethgelab/siamese-mask-rcnn
Submitted 28 May, 2019; v1 submitted 28 November, 2018;
originally announced November 2018.
-
A rotation-equivariant convolutional neural network model of primary visual cortex
Authors:
Alexander S. Ecker,
Fabian H. Sinz,
Emmanouil Froudarakis,
Paul G. Fahey,
Santiago A. Cadena,
Edgar Y. Walker,
Erick Cobos,
Jacob Reimer,
Andreas S. Tolias,
Matthias Bethge
Abstract:
Classical models describe primary visual cortex (V1) as a filter bank of orientation-selective linear-nonlinear (LN) or energy models, but these models fail to predict neural responses to natural stimuli accurately. Recent work shows that models based on convolutional neural networks (CNNs) lead to much more accurate predictions, but it remains unclear which features are extracted by V1 neurons beyond orientation selectivity and phase invariance. Here we work towards systematically studying V1 computations by categorizing neurons into groups that perform similar computations. We present a framework to identify common features independent of individual neurons' orientation selectivity by using a rotation-equivariant convolutional neural network, which automatically extracts every feature at multiple different orientations. We fit this model to responses of a population of 6000 neurons to natural images recorded in mouse primary visual cortex using two-photon imaging. We show that our rotation-equivariant network not only outperforms a regular CNN with the same number of feature maps, but also reveals a number of common features shared by many V1 neurons, which deviate from the typical textbook idea of V1 as a bank of Gabor filters. Our findings are a first step towards a powerful new tool to study the nonlinear computations in V1.
Submitted 27 September, 2018;
originally announced September 2018.
-
Diverse feature visualizations reveal invariances in early layers of deep neural networks
Authors:
Santiago A. Cadena,
Marissa A. Weis,
Leon A. Gatys,
Matthias Bethge,
Alexander S. Ecker
Abstract:
Visualizing features in deep neural networks (DNNs) can help in understanding their computations. Many previous studies aimed to visualize the selectivity of individual units by finding meaningful images that maximize their activation. However, comparatively little attention has been paid to visualizing the image transformations to which units in DNNs are invariant. Here we propose a method to discover invariances in the responses of hidden layer units of deep neural networks. Our approach is based on simultaneously searching for a batch of images that strongly activate a unit while at the same time being as distinct from each other as possible. We find that even early convolutional layers in VGG-19 exhibit various forms of response invariance: near-perfect phase invariance in some units and invariance to local diffeomorphic transformations in others. At the same time, we uncover representational differences with ResNet-50 in its corresponding layers. We conclude that invariance transformations are a major computational component learned by DNNs and we provide a systematic method to study them.
Submitted 27 July, 2018;
originally announced July 2018.
-
One-Shot Segmentation in Clutter
Authors:
Claudio Michaelis,
Matthias Bethge,
Alexander S. Ecker
Abstract:
We tackle the problem of one-shot segmentation: finding and segmenting a previously unseen object in a cluttered scene based on a single instruction example. We propose a novel dataset, which we call $\textit{cluttered Omniglot}$. Using a baseline architecture combining a Siamese embedding for detection with a U-net for segmentation, we show that increasing levels of clutter make the task progressively harder. Using oracle models with access to various amounts of ground-truth information, we evaluate different aspects of the problem and show that in this kind of visual search task, detection and segmentation are two intertwined problems, the solution to each of which helps solve the other. We therefore introduce $\textit{MaskNet}$, an improved model that attends to multiple candidate locations, generates segmentation proposals to mask out background clutter and selects among the segmented objects. Our findings suggest that such image recognition models based on an iterative refinement of object detection and foreground segmentation may provide a way to deal with highly cluttered scenes.
Submitted 13 June, 2018; v1 submitted 26 March, 2018;
originally announced March 2018.
-
Neural system identification for large populations separating "what" and "where"
Authors:
David A. Klindt,
Alexander S. Ecker,
Thomas Euler,
Matthias Bethge
Abstract:
Neuroscientists classify neurons into different types that perform similar computations at different locations in the visual field. Traditional methods for neural system identification do not capitalize on this separation of 'what' and 'where'. Learning deep convolutional feature spaces that are shared among many neurons provides an exciting path forward, but the architectural design needs to account for data limitations: while new experimental techniques enable recordings from thousands of neurons, experimental time is limited so that one can sample only a small fraction of each neuron's response space. Here, we show that a major bottleneck for fitting convolutional neural networks (CNNs) to neural data is the estimation of the individual receptive field locations, a problem whose surface has only been scratched thus far. We propose a CNN architecture with a sparse readout layer factorizing the spatial (where) and feature (what) dimensions. Our network scales well to thousands of neurons and short recordings and can be trained end-to-end. We evaluate this architecture on ground-truth data to explore the challenges and limitations of CNN-based system identification. Moreover, we show that our network model outperforms current state-of-the-art system identification models of mouse primary visual cortex.
Submitted 29 January, 2018; v1 submitted 7 November, 2017;
originally announced November 2017.
-
Synthesising Dynamic Textures using Convolutional Neural Networks
Authors:
Christina M. Funke,
Leon A. Gatys,
Alexander S. Ecker,
Matthias Bethge
Abstract:
Here we present a parametric model for dynamic textures. The model is based on spatiotemporal summary statistics computed from the feature representations of a Convolutional Neural Network (CNN) trained on object recognition. We demonstrate how the model can be used to synthesise new samples of dynamic textures and to predict motion in simple movies.
Submitted 22 February, 2017;
originally announced February 2017.
-
Controlling Perceptual Factors in Neural Style Transfer
Authors:
Leon A. Gatys,
Alexander S. Ecker,
Matthias Bethge,
Aaron Hertzmann,
Eli Shechtman
Abstract:
Neural Style Transfer has shown very exciting results enabling new forms of image manipulation. Here we extend the existing method to introduce control over spatial location, colour information and across spatial scale. We demonstrate how this enhances the method by allowing high-resolution controlled stylisation and helps to alleviate common failure cases such as applying ground textures to sky regions. Furthermore, by decomposing style into these perceptual factors we enable the combination of style information from multiple sources to generate new, perceptually appealing styles from existing ones. We also describe how these methods can be used to more efficiently produce large size, high-quality stylisation. Finally we show how the introduced control measures can be applied in recent methods for Fast Neural Style Transfer.
Submitted 11 May, 2017; v1 submitted 23 November, 2016;
originally announced November 2016.
-
A Neural Algorithm of Artistic Style
Authors:
Leon A. Gatys,
Alexander S. Ecker,
Matthias Bethge
Abstract:
In fine art, especially painting, humans have mastered the skill to create unique visual experiences through composing a complex interplay between the content and style of an image. Thus far the algorithmic basis of this process is unknown and there exists no artificial system with similar capabilities. However, in other key areas of visual perception such as object and face recognition near-human performance was recently demonstrated by a class of biologically inspired vision models called Deep Neural Networks. Here we introduce an artificial system based on a Deep Neural Network that creates artistic images of high perceptual quality. The system uses neural representations to separate and recombine content and style of arbitrary images, providing a neural algorithm for the creation of artistic images. Moreover, in light of the striking similarities between performance-optimised artificial neural networks and biological vision, our work offers a path forward to an algorithmic understanding of how humans create and perceive artistic imagery.
Submitted 2 September, 2015; v1 submitted 26 August, 2015;
originally announced August 2015.
-
Texture Synthesis Using Convolutional Neural Networks
Authors:
Leon A. Gatys,
Alexander S. Ecker,
Matthias Bethge
Abstract:
Here we introduce a new model of natural textures based on the feature spaces of convolutional neural networks optimised for object recognition. Samples from the model are of high perceptual quality demonstrating the generative power of neural networks trained in a purely discriminative fashion. Within the model, textures are represented by the correlations between feature maps in several layers of the network. We show that across layers the texture representations increasingly capture the statistical properties of natural images while making object information more and more explicit. The model provides a new tool to generate stimuli for neuroscience and might offer insights into the deep representations learned by convolutional neural networks.
Submitted 6 November, 2015; v1 submitted 27 May, 2015;
originally announced May 2015.