Search | arXiv e-print repository

Noninvasive precision modulation of high-level neural population activity via natural vision perturbations

Authors: Guy Gaziv, Sarah Goulding, Ani Ayvazian-Hancock, Yoon Bai, James J. DiCarlo

Abstract: Precise control of neural activity -- modulating target neurons deep in the brain while leaving nearby neurons unaffected -- is an outstanding challenge in neuroscience, generally approached using invasive techniques. This study investigates the possibility of precisely and noninvasively modulating neural activity in the high-level primate ventral visual stream via perturbations on one's natural v… ▽ More Precise control of neural activity -- modulating target neurons deep in the brain while leaving nearby neurons unaffected -- is an outstanding challenge in neuroscience, generally approached using invasive techniques. This study investigates the possibility of precisely and noninvasively modulating neural activity in the high-level primate ventral visual stream via perturbations on one's natural visual feed. When tested on macaque inferior temporal (IT) neural populations, we found quantitative agreement between the model-predicted and biologically realized effect: strong modulation concentrated on targeted neural sites. We extended this to demonstrate accurate injection of experimenter-chosen neural population patterns via subtle perturbations applied on the background of typical natural visual feeds. These results highlight that current machine-executable models of the ventral stream can now design noninvasive, visually-delivered, possibly imperceptible neural interventions at the resolution of individual neurons. △ Less

Submitted 9 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

arXiv:2412.09115 [pdf, other]

Vision CNNs trained to estimate spatial latents learned similar ventral-stream-aligned representations

Authors: Yudi Xie, Weichen Huang, Esther Alter, Jeremy Schwartz, Joshua B. Tenenbaum, James J. DiCarlo

Abstract: Studies of the functional role of the primate ventral visual stream have traditionally focused on object categorization, often ignoring -- despite much prior evidence -- its role in estimating "spatial" latents such as object position and pose. Most leading ventral stream models are derived by optimizing networks for object categorization, which seems to imply that the ventral stream is also deriv… ▽ More Studies of the functional role of the primate ventral visual stream have traditionally focused on object categorization, often ignoring -- despite much prior evidence -- its role in estimating "spatial" latents such as object position and pose. Most leading ventral stream models are derived by optimizing networks for object categorization, which seems to imply that the ventral stream is also derived under such an objective. Here, we explore an alternative hypothesis: Might the ventral stream be optimized for estimating spatial latents? And a closely related question: How different -- if at all -- are representations learned from spatial latent estimation compared to categorization? To ask these questions, we leveraged synthetic image datasets generated by a 3D graphic engine and trained convolutional neural networks (CNNs) to estimate different combinations of spatial and category latents. We found that models trained to estimate just a few spatial latents achieve neural alignment scores comparable to those trained on hundreds of categories, and the spatial latent performance of models strongly correlates with their neural alignment. Spatial latent and category-trained models have very similar -- but not identical -- internal representations, especially in their early and middle layers. We provide evidence that this convergence is partly driven by non-target latent variability in the training data, which facilitates the implicit learning of representations of those non-target latents. Taken together, these results suggest that many training objectives, such as spatial latents, can lead to similar models aligned neurally with the ventral stream. Thus, one should not assume that the ventral stream is optimized for object categorization only. As a field, we need to continue to sharpen our measures of comparing models to brains to better understand the functional roles of the ventral stream. △ Less

Submitted 17 February, 2025; v1 submitted 12 December, 2024; originally announced December 2024.

Comments: 30 pages, 21 figures, ICLR 2025

arXiv:2401.06005 [pdf, other]

How does the primate brain combine generative and discriminative computations in vision?

Authors: Benjamin Peters, James J. DiCarlo, Todd Gureckis, Ralf Haefner, Leyla Isik, Joshua Tenenbaum, Talia Konkle, Thomas Naselaris, Kimberly Stachenfeld, Zenna Tavares, Doris Tsao, Ilker Yildirim, Nikolaus Kriegeskorte

Abstract: Vision is widely understood as an inference problem. However, two contrasting conceptions of the inference process have each been influential in research on biological vision as well as the engineering of machine vision. The first emphasizes bottom-up signal flow, describing vision as a largely feedforward, discriminative inference process that filters and transforms the visual information to remo… ▽ More Vision is widely understood as an inference problem. However, two contrasting conceptions of the inference process have each been influential in research on biological vision as well as the engineering of machine vision. The first emphasizes bottom-up signal flow, describing vision as a largely feedforward, discriminative inference process that filters and transforms the visual information to remove irrelevant variation and represent behaviorally relevant information in a format suitable for downstream functions of cognition and behavioral control. In this conception, vision is driven by the sensory data, and perception is direct because the processing proceeds from the data to the latent variables of interest. The notion of "inference" in this conception is that of the engineering literature on neural networks, where feedforward convolutional neural networks processing images are said to perform inference. The alternative conception is that of vision as an inference process in Helmholtz's sense, where the sensory evidence is evaluated in the context of a generative model of the causal processes giving rise to it. In this conception, vision inverts a generative model through an interrogation of the evidence in a process often thought to involve top-down predictions of sensory data to evaluate the likelihood of alternative hypotheses. The authors include scientists rooted in roughly equal numbers in each of the conceptions and motivated to overcome what might be a false dichotomy between them and engage the other perspective in the realm of theory and experiment. The primate brain employs an unknown algorithm that may combine the advantages of both conceptions. We explain and clarify the terminology, review the key empirical evidence, and propose an empirical research program that transcends the dichotomy and sets the stage for revealing the mysterious hybrid algorithm of primate vision. △ Less

Submitted 11 January, 2024; originally announced January 2024.

arXiv:2312.14285 [pdf, other]

Probing Biological and Artificial Neural Networks with Task-dependent Neural Manifolds

Authors: Michael Kuoch, Chi-Ning Chou, Nikhil Parthasarathy, Joel Dapello, James J. DiCarlo, Haim Sompolinsky, SueYeon Chung

Abstract: Recently, growth in our understanding of the computations performed in both biological and artificial neural networks has largely been driven by either low-level mechanistic studies or global normative approaches. However, concrete methodologies for bridging the gap between these levels of abstraction remain elusive. In this work, we investigate the internal mechanisms of neural networks through t… ▽ More Recently, growth in our understanding of the computations performed in both biological and artificial neural networks has largely been driven by either low-level mechanistic studies or global normative approaches. However, concrete methodologies for bridging the gap between these levels of abstraction remain elusive. In this work, we investigate the internal mechanisms of neural networks through the lens of neural population geometry, aiming to provide understanding at an intermediate level of abstraction, as a way to bridge that gap. Utilizing manifold capacity theory (MCT) from statistical physics and manifold alignment analysis (MAA) from high-dimensional statistics, we probe the underlying organization of task-dependent manifolds in deep neural networks and macaque neural recordings. Specifically, we quantitatively characterize how different learning objectives lead to differences in the organizational strategies of these models and demonstrate how these geometric analyses are connected to the decodability of task-relevant information. These analyses present a strong direction for bridging mechanistic and normative theories in neural networks through neural population geometry, potentially opening up many future research avenues in both machine learning and neuroscience. △ Less

Submitted 21 December, 2023; originally announced December 2023.

Comments: To appear in the proceedings of the Conference on Parsimony and Learning (CPAL) 2024

arXiv:2312.05956 [pdf, other]

The Quest for an Integrated Set of Neural Mechanisms Underlying Object Recognition in Primates

Authors: Kohitij Kar, James J DiCarlo

Abstract: Visual object recognition -- the behavioral ability to rapidly and accurately categorize many visually encountered objects -- is core to primate cognition. This behavioral capability is algorithmically impressive because of the myriad identity-preserving viewpoints and scenes that dramatically change the visual image produced by the same object. Until recently, the brain mechanisms that support th… ▽ More Visual object recognition -- the behavioral ability to rapidly and accurately categorize many visually encountered objects -- is core to primate cognition. This behavioral capability is algorithmically impressive because of the myriad identity-preserving viewpoints and scenes that dramatically change the visual image produced by the same object. Until recently, the brain mechanisms that support that capability were deeply mysterious. However, over the last decade, this scientific mystery has been illuminated by the discovery and development of brain-inspired, image-computable, artificial neural network (ANN) systems that rival primates in this behavioral feat. Apart from fundamentally changing the landscape of artificial intelligence (AI), modified versions of these ANN systems are the current leading scientific hypotheses of an integrated set of mechanisms in the primate ventral visual stream that support object recognition. What separates brain-mapped versions of these systems from prior conceptual models is that they are Sensory-computable, Mechanistic, Anatomically Referenced, and Testable (SMART). Here, we review and provide perspective on the brain mechanisms that the currently leading SMART models address. We review the empirical brain and behavioral alignment successes and failures of those current models. Given ongoing advances in neurobehavioral measurements and AI, we discuss the next frontiers for even more accurate mechanistic understanding. And we outline the likely applications of that SMART-model-based understanding. △ Less

Submitted 10 December, 2023; originally announced December 2023.

arXiv:2308.06887 [pdf, other]

Robustified ANNs Reveal Wormholes Between Human Category Percepts

Authors: Guy Gaziv, Michael J. Lee, James J. DiCarlo

Abstract: The visual object category reports of artificial neural networks (ANNs) are notoriously sensitive to tiny, adversarial image perturbations. Because human category reports (aka human percepts) are thought to be insensitive to those same small-norm perturbations -- and locally stable in general -- this argues that ANNs are incomplete scientific models of human visual perception. Consistent with this… ▽ More The visual object category reports of artificial neural networks (ANNs) are notoriously sensitive to tiny, adversarial image perturbations. Because human category reports (aka human percepts) are thought to be insensitive to those same small-norm perturbations -- and locally stable in general -- this argues that ANNs are incomplete scientific models of human visual perception. Consistent with this, we show that when small-norm image perturbations are generated by standard ANN models, human object category percepts are indeed highly stable. However, in this very same "human-presumed-stable" regime, we find that robustified ANNs reliably discover low-norm image perturbations that strongly disrupt human percepts. These previously undetectable human perceptual disruptions are massive in amplitude, approaching the same level of sensitivity seen in robustified ANNs. Further, we show that robustified ANNs support precise perceptual state interventions: they guide the construction of low-norm image perturbations that strongly alter human category percepts toward specific prescribed percepts. These observations suggest that for arbitrary starting points in image space, there exists a set of nearby "wormholes", each leading the subject from their current category perceptual state into a semantically very different state. Moreover, contemporary ANN models of biological visual processing are now accurate enough to consistently guide us to those portals. △ Less

Submitted 4 October, 2023; v1 submitted 13 August, 2023; originally announced August 2023.

Comments: In NeurIPS 2023. Code: https://github.com/ggaziv/Wormholes Project Webpage: https://himjl.github.io/pwormholes

Journal ref: https://neurips.cc/virtual/2023/poster/72812

arXiv:2210.08340 [pdf]

Toward Next-Generation Artificial Intelligence: Catalyzing the NeuroAI Revolution

Authors: Anthony Zador, Sean Escola, Blake Richards, Bence Ölveczky, Yoshua Bengio, Kwabena Boahen, Matthew Botvinick, Dmitri Chklovskii, Anne Churchland, Claudia Clopath, James DiCarlo, Surya Ganguli, Jeff Hawkins, Konrad Koerding, Alexei Koulakov, Yann LeCun, Timothy Lillicrap, Adam Marblestone, Bruno Olshausen, Alexandre Pouget, Cristina Savin, Terrence Sejnowski, Eero Simoncelli, Sara Solla, David Sussillo , et al. (2 additional authors not shown)

Abstract: Neuroscience has long been an essential driver of progress in artificial intelligence (AI). We propose that to accelerate progress in AI, we must invest in fundamental research in NeuroAI. A core component of this is the embodied Turing test, which challenges AI animal models to interact with the sensorimotor world at skill levels akin to their living counterparts. The embodied Turing test shifts… ▽ More Neuroscience has long been an essential driver of progress in artificial intelligence (AI). We propose that to accelerate progress in AI, we must invest in fundamental research in NeuroAI. A core component of this is the embodied Turing test, which challenges AI animal models to interact with the sensorimotor world at skill levels akin to their living counterparts. The embodied Turing test shifts the focus from those capabilities like game playing and language that are especially well-developed or uniquely human to those capabilities, inherited from over 500 million years of evolution, that are shared with all animals. Building models that can pass the embodied Turing test will provide a roadmap for the next generation of AI. △ Less

Submitted 22 February, 2023; v1 submitted 15 October, 2022; originally announced October 2022.

Comments: White paper, 10 pages + 8 pages of references, 1 figures

arXiv:2206.11228 [pdf, other]

Adversarially trained neural representations may already be as robust as corresponding biological neural representations

Authors: Chong Guo, Michael J. Lee, Guillaume Leclerc, Joel Dapello, Yug Rao, Aleksander Madry, James J. DiCarlo

Abstract: Visual systems of primates are the gold standard of robust perception. There is thus a general belief that mimicking the neural representations that underlie those systems will yield artificial visual systems that are adversarially robust. In this work, we develop a method for performing adversarial visual attacks directly on primate brain activity. We then leverage this method to demonstrate that… ▽ More Visual systems of primates are the gold standard of robust perception. There is thus a general belief that mimicking the neural representations that underlie those systems will yield artificial visual systems that are adversarially robust. In this work, we develop a method for performing adversarial visual attacks directly on primate brain activity. We then leverage this method to demonstrate that the above-mentioned belief might not be well founded. Specifically, we report that the biological neurons that make up visual systems of primates exhibit susceptibility to adversarial perturbations that is comparable in magnitude to existing (robustly trained) artificial neural networks. △ Less

Submitted 19 June, 2022; originally announced June 2022.

Comments: 10 pages, 6 figures, ICML2022

arXiv:2111.06979 [pdf, other]

Neural Population Geometry Reveals the Role of Stochasticity in Robust Perception

Authors: Joel Dapello, Jenelle Feather, Hang Le, Tiago Marques, David D. Cox, Josh H. McDermott, James J. DiCarlo, SueYeon Chung

Abstract: Adversarial examples are often cited by neuroscientists and machine learning researchers as an example of how computational models diverge from biological sensory systems. Recent work has proposed adding biologically-inspired components to visual neural networks as a way to improve their adversarial robustness. One surprisingly effective component for reducing adversarial vulnerability is response… ▽ More Adversarial examples are often cited by neuroscientists and machine learning researchers as an example of how computational models diverge from biological sensory systems. Recent work has proposed adding biologically-inspired components to visual neural networks as a way to improve their adversarial robustness. One surprisingly effective component for reducing adversarial vulnerability is response stochasticity, like that exhibited by biological neurons. Here, using recently developed geometrical techniques from computational neuroscience, we investigate how adversarial perturbations influence the internal representations of standard, adversarially trained, and biologically-inspired stochastic networks. We find distinct geometric signatures for each type of network, revealing different mechanisms for achieving robust representations. Next, we generalize these results to the auditory domain, showing that neural stochasticity also makes auditory models more robust to adversarial perturbations. Geometric analysis of the stochastic networks reveals overlap between representations of clean and adversarially perturbed stimuli, and quantitatively demonstrates that competing geometric effects of stochasticity mediate a tradeoff between adversarial and clean performance. Our results shed light on the strategies of robust perception utilized by adversarially trained and stochastic networks, and help explain how stochasticity may be beneficial to machine and biological computation. △ Less

Submitted 12 November, 2021; originally announced November 2021.

Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

arXiv:2110.10645 [pdf, other]

Combining Different V1 Brain Model Variants to Improve Robustness to Image Corruptions in CNNs

Authors: Avinash Baidya, Joel Dapello, James J. DiCarlo, Tiago Marques

Abstract: While some convolutional neural networks (CNNs) have surpassed human visual abilities in object classification, they often struggle to recognize objects in images corrupted with different types of common noise patterns, highlighting a major limitation of this family of models. Recently, it has been shown that simulating a primary visual cortex (V1) at the front of CNNs leads to small improvements… ▽ More While some convolutional neural networks (CNNs) have surpassed human visual abilities in object classification, they often struggle to recognize objects in images corrupted with different types of common noise patterns, highlighting a major limitation of this family of models. Recently, it has been shown that simulating a primary visual cortex (V1) at the front of CNNs leads to small improvements in robustness to these image perturbations. In this study, we start with the observation that different variants of the V1 model show gains for specific corruption types. We then build a new model using an ensembling technique, which combines multiple individual models with different V1 front-end variants. The model ensemble leverages the strengths of each individual model, leading to significant improvements in robustness across all corruption categories and outperforming the base model by 38% on average. Finally, we show that using distillation, it is possible to partially compress the knowledge in the ensemble model into a single model with a V1 front-end. While the ensembling and distillation techniques used here are hardly biologically-plausible, the results presented here demonstrate that by combining the specific strengths of different neuronal circuits in V1 it is possible to improve the robustness of CNNs for a wide range of perturbations. △ Less

Submitted 7 December, 2021; v1 submitted 20 October, 2021; originally announced October 2021.

Comments: 15 pages with supplementary material, 3 main figures, 2 supplementary figures, 4 supplementary tables

Journal ref: Workshop on Shared Visual Representations in Human and Machine Intelligence 2021

arXiv:1909.09847 [pdf]

doi 10.32470/CCN.2018.1085-0

Topographic Deep Artificial Neural Networks (TDANNs) predict face selectivity topography in primate inferior temporal (IT) cortex

Authors: Hyodong Lee, James J. DiCarlo

Abstract: Deep convolutional neural networks are biologically driven models that resemble the hierarchical structure of primate visual cortex and are the current best predictors of the neural responses measured along the ventral stream. However, the networks lack topographic properties that are present in the visual cortex, such as orientation maps in primary visual cortex and category-selective maps in inf… ▽ More Deep convolutional neural networks are biologically driven models that resemble the hierarchical structure of primate visual cortex and are the current best predictors of the neural responses measured along the ventral stream. However, the networks lack topographic properties that are present in the visual cortex, such as orientation maps in primary visual cortex and category-selective maps in inferior temporal (IT) cortex. In this work, the minimum wiring cost constraint was approximated as an additional learning rule in order to generate topographic maps of the networks. We found that our topographic deep artificial neural networks (ANNs) can reproduce the category selectivity maps of the primate IT cortex. △ Less

Submitted 21 September, 2019; originally announced September 2019.

Comments: 2018 Conference on Cognitive Computational Neuroscience

arXiv:1909.06161 [pdf, other]

Brain-Like Object Recognition with High-Performing Shallow Recurrent ANNs

Authors: Jonas Kubilius, Martin Schrimpf, Kohitij Kar, Ha Hong, Najib J. Majaj, Rishi Rajalingham, Elias B. Issa, Pouya Bashivan, Jonathan Prescott-Roy, Kailyn Schmidt, Aran Nayebi, Daniel Bear, Daniel L. K. Yamins, James J. DiCarlo

Abstract: Deep convolutional artificial neural networks (ANNs) are the leading class of candidate models of the mechanisms of visual processing in the primate ventral stream. While initially inspired by brain anatomy, over the past years, these ANNs have evolved from a simple eight-layer architecture in AlexNet to extremely deep and branching architectures, demonstrating increasingly better object categoriz… ▽ More Deep convolutional artificial neural networks (ANNs) are the leading class of candidate models of the mechanisms of visual processing in the primate ventral stream. While initially inspired by brain anatomy, over the past years, these ANNs have evolved from a simple eight-layer architecture in AlexNet to extremely deep and branching architectures, demonstrating increasingly better object categorization performance, yet bringing into question how brain-like they still are. In particular, typical deep models from the machine learning community are often hard to map onto the brain's anatomy due to their vast number of layers and missing biologically-important connections, such as recurrence. Here we demonstrate that better anatomical alignment to the brain and high performance on machine learning as well as neuroscience measures do not have to be in contradiction. We developed CORnet-S, a shallow ANN with four anatomically mapped areas and recurrent connectivity, guided by Brain-Score, a new large-scale composite of neural and behavioral benchmarks for quantifying the functional fidelity of models of the primate ventral visual stream. Despite being significantly shallower than most models, CORnet-S is the top model on Brain-Score and outperforms similarly compact models on ImageNet. Moreover, our extensive analyses of CORnet-S circuitry variants reveal that recurrence is the main predictive factor of both Brain-Score and ImageNet top-1 performance. Finally, we report that the temporal evolution of the CORnet-S "IT" neural population resembles the actual monkey IT population dynamics. Taken together, these results establish CORnet-S, a compact, recurrent ANN, as the current best model of the primate ventral visual stream. △ Less

Submitted 28 October, 2019; v1 submitted 13 September, 2019; originally announced September 2019.

Comments: NeurIPS 2019 (Oral). Code available at https://github.com/dicarlolab/neurips2019

arXiv:1807.00053 [pdf, other]

Task-Driven Convolutional Recurrent Models of the Visual System

Authors: Aran Nayebi, Daniel Bear, Jonas Kubilius, Kohitij Kar, Surya Ganguli, David Sussillo, James J. DiCarlo, Daniel L. K. Yamins

Abstract: Feed-forward convolutional neural networks (CNNs) are currently state-of-the-art for object classification tasks such as ImageNet. Further, they are quantitatively accurate models of temporally-averaged responses of neurons in the primate brain's visual system. However, biological visual systems have two ubiquitous architectural features not shared with typical CNNs: local recurrence within cortic… ▽ More Feed-forward convolutional neural networks (CNNs) are currently state-of-the-art for object classification tasks such as ImageNet. Further, they are quantitatively accurate models of temporally-averaged responses of neurons in the primate brain's visual system. However, biological visual systems have two ubiquitous architectural features not shared with typical CNNs: local recurrence within cortical areas, and long-range feedback from downstream areas to upstream areas. Here we explored the role of recurrence in improving classification performance. We found that standard forms of recurrence (vanilla RNNs and LSTMs) do not perform well within deep CNNs on the ImageNet task. In contrast, novel cells that incorporated two structural features, bypassing and gating, were able to boost task accuracy substantially. We extended these design principles in an automated search over thousands of model architectures, which identified novel local recurrent cells and long-range feedback connections useful for object recognition. Moreover, these task-optimized ConvRNNs matched the dynamics of neural activity in the primate visual system better than feedforward networks, suggesting a role for the brain's recurrent connections in performing difficult visual behaviors. △ Less

Submitted 26 October, 2018; v1 submitted 20 June, 2018; originally announced July 2018.

Comments: NIPS 2018 Camera Ready Version, 16 pages including supplementary information, 6 figures

arXiv:1703.07633 [pdf]

Deep brain fluorescence imaging with minimally invasive ultra-thin optical fibers

Authors: Shay Ohayon, Antonio Miguel Caravaca-Aguirre, Rafael Piestun, James J. DiCarlo

Abstract: A major open challenge in neuroscience is the ability to measure and perturb neural activity in vivo from well-defined neural sub-populations at cellular resolution anywhere in the brain. However, limitations posed by scattering and absorption prohibit non-invasive (surface) multiphoton approaches for deep (>2mm) structures, while Gradient Refreactive Index (GRIN) endoscopes are thick and cause si… ▽ More A major open challenge in neuroscience is the ability to measure and perturb neural activity in vivo from well-defined neural sub-populations at cellular resolution anywhere in the brain. However, limitations posed by scattering and absorption prohibit non-invasive (surface) multiphoton approaches for deep (>2mm) structures, while Gradient Refreactive Index (GRIN) endoscopes are thick and cause significant damage upon insertion. Here, we demonstrate a novel microendoscope to image neural activity at arbitrary depths via an ultrathin multimode optical fiber (MMF) probe that is 5-10X thinner than commercially available microendoscopes. We demonstrate micron-scale resolution, multispectral and volumetric imaging. In contrast to previous approaches, we show that this method has an improved acquisition speed that is sufficient to capture rapid neuronal dynamics in-vivo in rodents expressing a genetically encoded calcium indicator. Our results emphasize the potential of this technology in neuroscience applications and open up possibilities for cellular resolution imaging in previously unreachable brain regions. △ Less

Submitted 9 November, 2017; v1 submitted 3 March, 2017; originally announced March 2017.

arXiv:1406.3284 [pdf, other]

doi 10.1371/journal.pcbi.1003963

Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition

Authors: Charles F. Cadieu, Ha Hong, Daniel L. K. Yamins, Nicolas Pinto, Diego Ardila, Ethan A. Solomon, Najib J. Majaj, James J. DiCarlo

Abstract: The primate visual system achieves remarkable visual object recognition performance even in brief presentations and under changes to object exemplar, geometric transformations, and background variation (a.k.a. core visual object recognition). This remarkable performance is mediated by the representation formed in inferior temporal (IT) cortex. In parallel, recent advances in machine learning have… ▽ More The primate visual system achieves remarkable visual object recognition performance even in brief presentations and under changes to object exemplar, geometric transformations, and background variation (a.k.a. core visual object recognition). This remarkable performance is mediated by the representation formed in inferior temporal (IT) cortex. In parallel, recent advances in machine learning have led to ever higher performing models of object recognition using artificial deep neural networks (DNNs). It remains unclear, however, whether the representational performance of DNNs rivals that of the brain. To accurately produce such a comparison, a major difficulty has been a unifying metric that accounts for experimental limitations such as the amount of noise, the number of neural recording sites, and the number trials, and computational limitations such as the complexity of the decoding classifier and the number of classifier training examples. In this work we perform a direct comparison that corrects for these experimental limitations and computational considerations. As part of our methodology, we propose an extension of "kernel analysis" that measures the generalization accuracy as a function of representational complexity. Our evaluations show that, unlike previous bio-inspired models, the latest DNNs rival the representational performance of IT cortex on this visual object recognition task. Furthermore, we show that models that perform well on measures of representational performance also perform well on measures of representational similarity to IT and on measures of predicting individual IT multi-unit responses. Whether these DNNs rely on computational mechanisms similar to the primate visual system is yet to be determined, but, unlike all previous bio-inspired models, that possibility cannot be ruled out merely on representational performance grounds. △ Less

Submitted 12 June, 2014; originally announced June 2014.

Comments: 35 pages, 12 figures, extends and expands upon arXiv:1301.3530

arXiv:1301.3530 [pdf, other]

The Neural Representation Benchmark and its Evaluation on Brain and Machine

Authors: Charles F. Cadieu, Ha Hong, Dan Yamins, Nicolas Pinto, Najib J. Majaj, James J. DiCarlo

Abstract: A key requirement for the development of effective learning representations is their evaluation and comparison to representations we know to be effective. In natural sensory domains, the community has viewed the brain as a source of inspiration and as an implicit benchmark for success. However, it has not been possible to directly test representational learning algorithms directly against the repr… ▽ More A key requirement for the development of effective learning representations is their evaluation and comparison to representations we know to be effective. In natural sensory domains, the community has viewed the brain as a source of inspiration and as an implicit benchmark for success. However, it has not been possible to directly test representational learning algorithms directly against the representations contained in neural systems. Here, we propose a new benchmark for visual representations on which we have directly tested the neural representation in multiple visual cortical areas in macaque (utilizing data from [Majaj et al., 2012]), and on which any computer vision algorithm that produces a feature space can be tested. The benchmark measures the effectiveness of the neural or machine representation by computing the classification loss on the ordered eigendecomposition of a kernel matrix [Montavon et al., 2011]. In our analysis we find that the neural representation in visual area IT is superior to visual area V4. In our analysis of representational learning algorithms, we find that three-layer models approach the representational performance of V4 and the algorithm in [Le et al., 2012] surpasses the performance of V4. Impressively, we find that a recent supervised algorithm [Krizhevsky et al., 2012] achieves performance comparable to that of IT for an intermediate level of image variation difficulty, and surpasses IT at a higher difficulty level. We believe this result represents a major milestone: it is the first learning algorithm we have found that exceeds our current estimate of IT representation performance. We hope that this benchmark will assist the community in matching the representational performance of visual cortex and will serve as an initial rallying point for further correspondence between representations derived in brains and machines. △ Less

Submitted 25 January, 2013; v1 submitted 15 January, 2013; originally announced January 2013.

Comments: The v1 version contained incorrectly computed kernel analysis curves and KA-AUC values for V4, IT, and the HT-L3 models. They have been corrected in this version

Showing 1–16 of 16 results for author: DiCarlo, J