Search | arXiv e-print repository

Successes and Limitations of Object-centric Models at Compositional Generalisation

Authors: Milton L. Montero, Jeffrey S. Bowers, Gaurav Malhotra

Abstract: In recent years, it has been shown empirically that standard disentangled latent variable models do not support robust compositional learning in the visual domain. Indeed, in spite of being designed with the goal of factorising datasets into their constituent factors of variations, disentangled models show extremely limited compositional generalisation capabilities. On the other hand, object-centr… ▽ More In recent years, it has been shown empirically that standard disentangled latent variable models do not support robust compositional learning in the visual domain. Indeed, in spite of being designed with the goal of factorising datasets into their constituent factors of variations, disentangled models show extremely limited compositional generalisation capabilities. On the other hand, object-centric architectures have shown promising compositional skills, albeit these have 1) not been extensively tested and 2) experiments have been limited to scene composition -- where models must generalise to novel combinations of objects in a visual scene instead of novel combinations of object properties. In this work, we show that these compositional generalisation skills extend to this later setting. Furthermore, we present evidence pointing to the source of these skills and how they can be improved through careful training. Finally, we point to one important limitation that still exists which suggests new directions of research. △ Less

Submitted 24 December, 2024; originally announced December 2024.

Comments: As it appeared in the Compositional Learning Workshop, NeurIPS 2024; 14 pages (5 main text, 7 appendices, 2 references); 9 figures

arXiv:2406.03653 [pdf, other]

Equivalence Set Restricted Latent Class Models (ESRLCM)

Authors: Jesse Bowers, Steve Culpepper

Abstract: Latent Class Models (LCMs) are used to cluster multivariate categorical data, commonly used to interpret survey responses. We propose a novel Bayesian model called the Equivalence Set Restricted Latent Class Model (ESRLCM). This model identifies clusters who have common item response probabilities, and does so more generically than traditional restricted latent attribute models. We verify the iden… ▽ More Latent Class Models (LCMs) are used to cluster multivariate categorical data, commonly used to interpret survey responses. We propose a novel Bayesian model called the Equivalence Set Restricted Latent Class Model (ESRLCM). This model identifies clusters who have common item response probabilities, and does so more generically than traditional restricted latent attribute models. We verify the identifiability of ESRLCMs, and demonstrate the effectiveness in both simulations and real-world applications. △ Less

Submitted 5 June, 2024; originally announced June 2024.

Comments: 43 pages, 10 tables, 1 figure

arXiv:2404.14325 [pdf, other]

doi 10.1371/journal.pcbi.1012673

Adapting to time: Why nature may have evolved a diverse set of neurons

Authors: Karim G. Habashy, Benjamin D. Evans, Dan F. M. Goodman, Jeffrey S. Bowers

Abstract: Brains have evolved diverse neurons with varying morphologies and dynamics that impact temporal information processing. In contrast, most neural network models use homogeneous units that vary only in spatial parameters (weights and biases). To explore the importance of temporal parameters, we trained spiking neural networks on tasks with varying temporal complexity, holding different parameter sub… ▽ More Brains have evolved diverse neurons with varying morphologies and dynamics that impact temporal information processing. In contrast, most neural network models use homogeneous units that vary only in spatial parameters (weights and biases). To explore the importance of temporal parameters, we trained spiking neural networks on tasks with varying temporal complexity, holding different parameter subsets constant. We found that adapting conduction delays is crucial for solving all test conditions under tight resource constraints. Remarkably, these tasks can be solved using only temporal parameters (delays and time constants) with constant weights. In more complex spatio-temporal tasks, an adaptable bursting parameter was essential. Overall, allowing adaptation of both temporal and spatial parameters enhances network robustness to noise, a vital feature for biological brains and neuromorphic computing systems. Our findings suggest that rich and adaptable dynamics may be the key for solving temporally structured tasks efficiently in evolving organisms, which would help explain the diverse physiological properties of biological neurons. △ Less

Submitted 12 January, 2025; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: 19 pages, 6 figures

ACM Class: K.3.2; I.2.m

Journal ref: PLoS Comput Biol 20(12): e1012673 (2024)

arXiv:2404.06989 [pdf]

On Fixing the Right Problems in Predictive Analytics: AUC Is Not the Problem

Authors: Ryan S. Baker, Nigel Bosch, Stephen Hutt, Andres F. Zambrano, Alex J. Bowers

Abstract: Recently, ACM FAccT published an article by Kwegyir-Aggrey and colleagues (2023), critiquing the use of AUC ROC in predictive analytics in several domains. In this article, we offer a critique of that article. Specifically, we highlight technical inaccuracies in that paper's comparison of metrics, mis-specification of the interpretation and goals of AUC ROC, the article's use of the accuracy metri… ▽ More Recently, ACM FAccT published an article by Kwegyir-Aggrey and colleagues (2023), critiquing the use of AUC ROC in predictive analytics in several domains. In this article, we offer a critique of that article. Specifically, we highlight technical inaccuracies in that paper's comparison of metrics, mis-specification of the interpretation and goals of AUC ROC, the article's use of the accuracy metric as a gold standard for comparison to AUC ROC, and the article's application of critiques solely to AUC ROC for concerns that would apply to the use of any metric. We conclude with a re-framing of the very valid concerns raised in that article, and discuss how the use of AUC ROC can remain a valid and appropriate practice in a well-informed predictive analytics approach taking those concerns into account. We conclude by discussing the combined use of multiple metrics, including machine learning bias metrics, and AUC ROC's place in such an approach. Like broccoli, AUC ROC is healthy, but also like broccoli, researchers and practitioners in our field shouldn't eat a diet of only AUC ROC. △ Less

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.05290 [pdf, other]

MindSet: Vision. A toolbox for testing DNNs on key psychological experiments

Authors: Valerio Biscione, Dong Yin, Gaurav Malhotra, Marin Dujmovic, Milton L. Montero, Guillermo Puebla, Federico Adolfi, Rachel F. Heaton, John E. Hummel, Benjamin D. Evans, Karim Habashy, Jeffrey S. Bowers

Abstract: Multiple benchmarks have been developed to assess the alignment between deep neural networks (DNNs) and human vision. In almost all cases these benchmarks are observational in the sense they are composed of behavioural and brain responses to naturalistic images that have not been manipulated to test hypotheses regarding how DNNs or humans perceive and identify objects. Here we introduce the toolbo… ▽ More Multiple benchmarks have been developed to assess the alignment between deep neural networks (DNNs) and human vision. In almost all cases these benchmarks are observational in the sense they are composed of behavioural and brain responses to naturalistic images that have not been manipulated to test hypotheses regarding how DNNs or humans perceive and identify objects. Here we introduce the toolbox MindSet: Vision, consisting of a collection of image datasets and related scripts designed to test DNNs on 30 psychological findings. In all experimental conditions, the stimuli are systematically manipulated to test specific hypotheses regarding human visual perception and object recognition. In addition to providing pre-generated datasets of images, we provide code to regenerate these datasets, offering many configurable parameters which greatly extend the dataset versatility for different research contexts, and code to facilitate the testing of DNNs on these image datasets using three different methods (similarity judgments, out-of-distribution classification, and decoder method), accessible at https://github.com/MindSetVision/mindset-vision. We test ResNet-152 on each of these methods as an example of how the toolbox can be used. △ Less

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2402.12675 [pdf, other]

Visual Reasoning in Object-Centric Deep Neural Networks: A Comparative Cognition Approach

Authors: Guillermo Puebla, Jeffrey S. Bowers

Abstract: Achieving visual reasoning is a long-term goal of artificial intelligence. In the last decade, several studies have applied deep neural networks (DNNs) to the task of learning visual relations from images, with modest results in terms of generalization of the relations learned. However, in recent years, object-centric representation learning has been put forward as a way to achieve visual reasonin… ▽ More Achieving visual reasoning is a long-term goal of artificial intelligence. In the last decade, several studies have applied deep neural networks (DNNs) to the task of learning visual relations from images, with modest results in terms of generalization of the relations learned. However, in recent years, object-centric representation learning has been put forward as a way to achieve visual reasoning within the deep learning framework. Object-centric models attempt to model input scenes as compositions of objects and relations between them. To this end, these models use several kinds of attention mechanisms to segregate the individual objects in a scene from the background and from other objects. In this work we tested relation learning and generalization in several object-centric models, as well as a ResNet-50 baseline. In contrast to previous research, which has focused heavily in the same-different task in order to asses relational reasoning in DNNs, we use a set of tasks -- with varying degrees of difficulty -- derived from the comparative cognition literature. Our results show that object-centric models are able to segregate the different objects in a scene, even in many out-of-distribution cases. In our simpler tasks, this improves their capacity to learn and generalize visual relations in comparison to the ResNet-50 baseline. However, object-centric models still struggle in our more difficult tasks and conditions. We conclude that abstract visual reasoning remains an open challenge for DNNs, including object-centric models. △ Less

Submitted 19 February, 2024; originally announced February 2024.

Comments: 16 pages, 14 figures

arXiv:2304.07091 [pdf, other]

The role of object-centric representations, guided attention, and external memory on generalizing visual relations

Authors: Guillermo Puebla, Jeffrey S. Bowers

Abstract: Visual reasoning is a long-term goal of vision research. In the last decade, several works have attempted to apply deep neural networks (DNNs) to the task of learning visual relations from images, with modest results in terms of the generalization of the relations learned. In recent years, several innovations in DNNs have been developed in order to enable learning abstract relation from images. In… ▽ More Visual reasoning is a long-term goal of vision research. In the last decade, several works have attempted to apply deep neural networks (DNNs) to the task of learning visual relations from images, with modest results in terms of the generalization of the relations learned. In recent years, several innovations in DNNs have been developed in order to enable learning abstract relation from images. In this work, we systematically evaluate a series of DNNs that integrate mechanism such as slot attention, recurrently guided attention, and external memory, in the simplest possible visual reasoning task: deciding whether two objects are the same or different. We found that, although some models performed better than others in generalizing the same-different relation to specific types of images, no model was able to generalize this relation across the board. We conclude that abstract visual reasoning remains largely an unresolved challenge for DNNs. △ Less

Submitted 14 April, 2023; originally announced April 2023.

arXiv:2302.03992 [pdf, other]

Convolutional Neural Networks Trained to Identify Words Provide a Surprisingly Good Account of Visual Form Priming Effects

Authors: Dong Yin, Valerio Biscione, Jeffrey Bowers

Abstract: A wide variety of orthographic coding schemes and models of visual word identification have been developed to account for masked priming data that provide a measure of orthographic similarity between letter strings. These models tend to include hand-coded orthographic representations with single unit coding for specific forms of knowledge (e.g., units coding for a letter in a given position). Here… ▽ More A wide variety of orthographic coding schemes and models of visual word identification have been developed to account for masked priming data that provide a measure of orthographic similarity between letter strings. These models tend to include hand-coded orthographic representations with single unit coding for specific forms of knowledge (e.g., units coding for a letter in a given position). Here we assess how well a range of these coding schemes and models account for the pattern of form priming effects taken from the Form Priming Project and compare these findings to results observed with 11 standard deep neural network models (DNNs) developed in computer science. We find that deep convolutional networks (CNNs) perform as well or better than the coding schemes and word recognition models, whereas transformer networks did less well. The success of CNNs is remarkable as their architectures were not developed to support word recognition (they were designed to perform well on object recognition), they classify pixel images of words (rather than artificial encodings of letter strings), and their training was highly simplified (not respecting many key aspects of human experience). In addition to these form priming effects, we find that the DNNs can account for visual similarity effects on priming that are beyond all current psychological models of priming. The findings add to the recent work of (Hannagan et al., 2021) and suggest that CNNs should be given more attention in psychology as models of human visual word recognition. △ Less

Submitted 14 March, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

arXiv:2206.01211 [pdf]

Electrically pumped quantum-dot lasers grown on 300 mm patterned Si photonic wafers

Authors: Chen Shang, Kaiyin Feng, Eamonn T. Hughes, Andrew Clark, Mukul Debnath, Rosalyn Koscica, Gerald Leake, Joshua Herman, David Harame, Peter Ludewig, Yating Wan, John E. Bowers

Abstract: Monolithic integration of quantum dot (QD) gain materials onto Si photonic platforms via direct epitaxial growth is a promising solution for on-chip light sources. Recent developments have demonstrated superior device reliability in blanket hetero-epitaxy of III-V devices on Si at elevated temperatures. Yet, thick, defect management epi designs prevent vertical light coupling from the gain region… ▽ More Monolithic integration of quantum dot (QD) gain materials onto Si photonic platforms via direct epitaxial growth is a promising solution for on-chip light sources. Recent developments have demonstrated superior device reliability in blanket hetero-epitaxy of III-V devices on Si at elevated temperatures. Yet, thick, defect management epi designs prevent vertical light coupling from the gain region to the Si-on-Insulator (SOI) waveguides. Here, we demonstrate the first electrically pumped QD lasers grown on a 300 mm patterned (001) Si wafer with a butt-coupled configuration by molecular beam epitaxy (MBE). Unique growth and fabrication challenges imposed by the template architecture have been resolved, contributing to continuous wave lasing to 60 °C and a maximum double-side output power of 126.6 mW at 20 °C with a double-side wall plug efficiency of 8.6%. The potential for robust on-chip laser operation and efficient low-loss light coupling to Si photonic circuits makes this heteroepitaxial integration platform on Si promising for scalable and low-cost mass production. △ Less

Submitted 2 June, 2022; originally announced June 2022.

Comments: 11 pages including references, 6 figures

arXiv:2204.03740 [pdf, other]

doi 10.1016/j.neunet.2023.02.032

Successes and critical failures of neural networks in capturing human-like speech recognition

Authors: Federico Adolfi, Jeffrey S. Bowers, David Poeppel

Abstract: Natural and artificial audition can in principle acquire different solutions to a given problem. The constraints of the task, however, can nudge the cognitive science and engineering of audition to qualitatively converge, suggesting that a closer mutual examination would potentially enrich artificial hearing systems and process models of the mind and brain. Speech recognition - an area ripe for su… ▽ More Natural and artificial audition can in principle acquire different solutions to a given problem. The constraints of the task, however, can nudge the cognitive science and engineering of audition to qualitatively converge, suggesting that a closer mutual examination would potentially enrich artificial hearing systems and process models of the mind and brain. Speech recognition - an area ripe for such exploration - is inherently robust in humans to a number transformations at various spectrotemporal granularities. To what extent are these robustness profiles accounted for by high-performing neural network systems? We bring together experiments in speech recognition under a single synthesis framework to evaluate state-of-the-art neural networks as stimulus-computable, optimized observers. In a series of experiments, we (1) clarify how influential speech manipulations in the literature relate to each other and to natural speech, (2) show the granularities at which machines exhibit out-of-distribution robustness, reproducing classical perceptual phenomena in humans, (3) identify the specific conditions where model predictions of human performance differ, and (4) demonstrate a crucial failure of all artificial systems to perceptually recover where humans do, suggesting alternative directions for theory and model building. These findings encourage a tighter synergy between the cognitive science and engineering of audition. △ Less

Submitted 19 April, 2023; v1 submitted 6 April, 2022; originally announced April 2022.

Journal ref: Neural Networks, 162, 199-211 (2023)

arXiv:2204.02283 [pdf, other]

Lost in Latent Space: Disentangled Models and the Challenge of Combinatorial Generalisation

Authors: Milton L. Montero, Jeffrey S. Bowers, Rui Ponte Costa, Casimir J. H. Ludwig, Gaurav Malhotra

Abstract: Recent research has shown that generative models with highly disentangled representations fail to generalise to unseen combination of generative factor values. These findings contradict earlier research which showed improved performance in out-of-training distribution settings when compared to entangled representations. Additionally, it is not clear if the reported failures are due to (a) encoders… ▽ More Recent research has shown that generative models with highly disentangled representations fail to generalise to unseen combination of generative factor values. These findings contradict earlier research which showed improved performance in out-of-training distribution settings when compared to entangled representations. Additionally, it is not clear if the reported failures are due to (a) encoders failing to map novel combinations to the proper regions of the latent space or (b) novel combinations being mapped correctly but the decoder/downstream process is unable to render the correct output for the unseen combinations. We investigate these alternatives by testing several models on a range of datasets and training settings. We find that (i) when models fail, their encoders also fail to map unseen combinations to correct regions of the latent space and (ii) when models succeed, it is either because the test conditions do not exclude enough examples, or because excluded generative factors determine independent parts of the output image. Based on these results, we argue that to generalise properly, models not only need to capture factors of variation, but also understand how to invert the generative process that was used to generate the data. △ Less

Submitted 14 June, 2024; v1 submitted 5 April, 2022; originally announced April 2022.

Comments: 10 pages and 7 figures in main text (not including references). 27 pages and 31 figures in appendix. Updated to match the camera-ready version

ACM Class: I.2.6; I.2.10; I.4.5; I.4.10; I.5.1; I.5.3

Journal ref: Adv.Neur.Info.Proc.Sys. 35 (2022) 10136-1049

arXiv:2203.07302 [pdf, other]

Mixed Evidence for Gestalt Grouping in Deep Neural Networks

Authors: Valerio Biscione, Jeffrey S. Bowers

Abstract: Gestalt psychologists have identified a range of conditions in which humans organize elements of a scene into a group or whole, and perceptual grouping principles play an essential role in scene perception and object identification. Recently, Deep Neural Networks (DNNs) trained on natural images (ImageNet) have been proposed as compelling models of human vision based on reports that they perform w… ▽ More Gestalt psychologists have identified a range of conditions in which humans organize elements of a scene into a group or whole, and perceptual grouping principles play an essential role in scene perception and object identification. Recently, Deep Neural Networks (DNNs) trained on natural images (ImageNet) have been proposed as compelling models of human vision based on reports that they perform well on various brain and behavioral benchmarks. Here we test a total of 16 networks covering a variety of architectures and learning paradigms (convolutional, attention-based, supervised and self-supervised, feed-forward and recurrent) on dots (Experiment 1) and more complex shapes (Experiment 2) stimuli that produce strong Gestalts effects in humans. In Experiment 1 we found that convolutional networks were indeed sensitive in a human-like fashion to the principles of proximity, linearity, and orientation, but only at the output layer. In Experiment 2, we found that most networks exhibited Gestalt effects only for a few sets, and again only at the latest stage of processing. Overall, self-supervised and Vision-Transformer appeared to perform worse than convolutional networks in terms of human similarity. Remarkably, no model presented a grouping effect at the early or intermediate stages of processing. This is at odds with the widespread assumption that Gestalts occur prior to object recognition, and indeed, serve to organize the visual scene for the sake of object recognition. Our overall conclusion is that, albeit noteworthy that networks trained on simple 2D images support a form of Gestalt grouping for some stimuli at the output layer, this ability does not seem to transfer to more complex features. Additionally, the fact that this grouping only occurs at the last layer suggests that networks learn fundamentally different perceptual properties than humans. △ Less

Submitted 20 February, 2023; v1 submitted 14 March, 2022; originally announced March 2022.

Comments: Accepted in Computational Brain & Behaviour

arXiv:2110.05861 [pdf, other]

Convolutional Neural Networks Are Not Invariant to Translation, but They Can Learn to Be

Authors: Valerio Biscione, Jeffrey S. Bowers

Abstract: When seeing a new object, humans can immediately recognize it across different retinal locations: the internal object representation is invariant to translation. It is commonly believed that Convolutional Neural Networks (CNNs) are architecturally invariant to translation thanks to the convolution and/or pooling operations they are endowed with. In fact, several studies have found that these netwo… ▽ More When seeing a new object, humans can immediately recognize it across different retinal locations: the internal object representation is invariant to translation. It is commonly believed that Convolutional Neural Networks (CNNs) are architecturally invariant to translation thanks to the convolution and/or pooling operations they are endowed with. In fact, several studies have found that these networks systematically fail to recognise new objects on untrained locations. In this work, we test a wide variety of CNNs architectures showing how, apart from DenseNet-121, none of the models tested was architecturally invariant to translation. Nevertheless, all of them could learn to be invariant to translation. We show how this can be achieved by pretraining on ImageNet, and it is sometimes possible with much simpler data sets when all the items are fully translated across the input canvas. At the same time, this invariance can be disrupted by further training due to catastrophic forgetting/interference. These experiments show how pretraining a network on an environment with the right `latent' characteristics (a more naturalistic environment) can result in the network learning deep perceptual rules which would dramatically improve subsequent generalization. △ Less

Submitted 12 October, 2021; originally announced October 2021.

Journal ref: Journal of Machine Learning Research 2021 22(229) 1-28

arXiv:2110.01476 [pdf, other]

doi 10.1016/j.neunet.2022.02.017

Learning Online Visual Invariances for Novel Objects via Supervised and Self-Supervised Training

Authors: Valerio Biscione, Jeffrey S. Bowers

Abstract: Humans can identify objects following various spatial transformations such as scale and viewpoint. This extends to novel objects, after a single presentation at a single pose, sometimes referred to as online invariance. CNNs have been proposed as a compelling model of human vision, but their ability to identify objects across transformations is typically tested on held-out samples of trained categ… ▽ More Humans can identify objects following various spatial transformations such as scale and viewpoint. This extends to novel objects, after a single presentation at a single pose, sometimes referred to as online invariance. CNNs have been proposed as a compelling model of human vision, but their ability to identify objects across transformations is typically tested on held-out samples of trained categories after extensive data augmentation. This paper assesses whether standard CNNs can support human-like online invariance by training models to recognize images of synthetic 3D objects that undergo several transformations: rotation, scaling, translation, brightness, contrast, and viewpoint. Through the analysis of models' internal representations, we show that standard supervised CNNs trained on transformed objects can acquire strong invariances on novel classes even when trained with as few as 50 objects taken from 10 classes. This extended to a different dataset of photographs of real objects. We also show that these invariances can be acquired in a self-supervised way, through solving the same/different task. We suggest that this latter approach may be similar to how humans acquire invariances. △ Less

Submitted 14 January, 2022; v1 submitted 4 October, 2021; originally announced October 2021.

Journal ref: Neural Networks, Volume 150, 2022, Pages 222-236, ISSN 0893-6080,

arXiv:2107.06872 [pdf, ps, other]

Generalisation in Neural Networks Does not Require Feature Overlap

Authors: Jeff Mitchell, Jeffrey S. Bowers

Abstract: That shared features between train and test data are required for generalisation in artificial neural networks has been a common assumption of both proponents and critics of these models. Here, we show that convolutional architectures avoid this limitation by applying them to two well known challenges, based on learning the identity function and learning rules governing sequences of words. In each… ▽ More That shared features between train and test data are required for generalisation in artificial neural networks has been a common assumption of both proponents and critics of these models. Here, we show that convolutional architectures avoid this limitation by applying them to two well known challenges, based on learning the identity function and learning rules governing sequences of words. In each case, successful performance on the test set requires generalising to features that were not present in the training data, which is typically not feasible for standard connectionist models. However, our experiments demonstrate that neural networks can succeed on such problems when they incorporate the weight sharing employed by convolutional architectures. In the image processing domain, such architectures are intended to reflect the symmetry under spatial translations of the natural world that such images depict. We discuss the role of symmetry in the two tasks and its connection to generalisation. △ Less

Submitted 4 July, 2021; originally announced July 2021.

Comments: 19 pages, 3 Figures. Submitted to Cognition

MSC Class: 68T07 ACM Class: I.2.6; J.4

arXiv:2105.14934 [pdf, other]

Review Of Integrated Photonic Elastic WDM Switches For Data Centers

Authors: Akhilesh S P Khope, Anirban Samanta, Xian Xiao, Ben Yoo, John E Bowers

Abstract: In this review paper, we present an elaborate discussion on wavelength selective switches and their demonstrations. We also review packaging and electronic photonic integration of switches; a topic neglected in other review papers. We also cover wavelength locking which is paramount in switching networks with many tunable filters. In this review paper, we present an elaborate discussion on wavelength selective switches and their demonstrations. We also review packaging and electronic photonic integration of switches; a topic neglected in other review papers. We also cover wavelength locking which is paramount in switching networks with many tunable filters. △ Less

Submitted 23 May, 2021; originally announced May 2021.

arXiv:2011.11757 [pdf, other]

Learning Translation Invariance in CNNs

Authors: Valerio Biscione, Jeffrey Bowers

Abstract: When seeing a new object, humans can immediately recognize it across different retinal locations: we say that the internal object representation is invariant to translation. It is commonly believed that Convolutional Neural Networks (CNNs) are architecturally invariant to translation thanks to the convolution and/or pooling operations they are endowed with. In fact, several works have found that t… ▽ More When seeing a new object, humans can immediately recognize it across different retinal locations: we say that the internal object representation is invariant to translation. It is commonly believed that Convolutional Neural Networks (CNNs) are architecturally invariant to translation thanks to the convolution and/or pooling operations they are endowed with. In fact, several works have found that these networks systematically fail to recognise new objects on untrained locations. In this work we show how, even though CNNs are not 'architecturally invariant' to translation, they can indeed 'learn' to be invariant to translation. We verified that this can be achieved by pretraining on ImageNet, and we found that it is also possible with much simpler datasets in which the items are fully translated across the input canvas. We investigated how this pretraining affected the internal network representations, finding that the invariance was almost always acquired, even though it was some times disrupted by further training due to catastrophic forgetting/interference. These experiments show how pretraining a network on an environment with the right 'latent' characteristics (a more naturalistic environment) can result in the network learning deep perceptual rules which would dramatically improve subsequent generalization. △ Less

Submitted 6 November, 2020; originally announced November 2020.

Comments: NeurIPS 2020 Workshop SVRHM

arXiv:2007.01062 [pdf, other]

Are there any 'object detectors' in the hidden layers of CNNs trained to identify objects or scenes?

Authors: Ella M. Gale, Nicholas Martin, Ryan Blything, Anh Nguyen, Jeffrey S. Bowers

Abstract: Various methods of measuring unit selectivity have been developed with the aim of better understanding how neural networks work. But the different measures provide divergent estimates of selectivity, and this has led to different conclusions regarding the conditions in which selective object representations are learned and the functional relevance of these representations. In an attempt to better… ▽ More Various methods of measuring unit selectivity have been developed with the aim of better understanding how neural networks work. But the different measures provide divergent estimates of selectivity, and this has led to different conclusions regarding the conditions in which selective object representations are learned and the functional relevance of these representations. In an attempt to better characterize object selectivity, we undertake a comparison of various selectivity measures on a large set of units in AlexNet, including localist selectivity, precision, class-conditional mean activity selectivity (CCMAS), network dissection,the human interpretation of activation maximization (AM) images, and standard signal-detection measures. We find that the different measures provide different estimates of object selectivity, with precision and CCMAS measures providing misleadingly high estimates. Indeed, the most selective units had a poor hit-rate or a high false-alarm rate (or both) in object classification, making them poor object detectors. We fail to find any units that are even remotely as selective as the 'grandmother cell' units reported in recurrent neural networks. In order to generalize these results, we compared selectivity measures on units in VGG-16 and GoogLeNet trained on the ImageNet or Places-365 datasets that have been described as 'object detectors'. Again, we find poor hit-rates and high false-alarm rates for object classification. We conclude that signal-detection measures provide a better assessment of single-unit selectivity compared to common alternative approaches, and that deep convolutional networks of image classification do not learn object detectors in their hidden layers. △ Less

Submitted 2 July, 2020; originally announced July 2020.

Comments: Published in Vision Research 2020, 19 pages, 8 figures

MSC Class: 68-T02 ACM Class: I.2.6; I.2.10; I.4.10; I.4.0; J.4

Journal ref: Vision Research, 2020

arXiv:1906.02136 [pdf]

LMF Reloaded

Authors: Laurent Romary, Mohamed Khemakhem, Fahad Khan, Jack Bowers, Nicoletta Calzolari, Monte George, Mandy Pet, Piotr Bański

Abstract: Lexical Markup Framework (LMF) or ISO 24613 [1] is a de jure standard that provides a framework for modelling and encoding lexical information in retrodigitised print dictionaries and NLP lexical databases. An in-depth review is currently underway within the standardisation subcommittee , ISO-TC37/SC4/WG4, to find a more modular, flexible and durable follow up to the original LMF standard publish… ▽ More Lexical Markup Framework (LMF) or ISO 24613 [1] is a de jure standard that provides a framework for modelling and encoding lexical information in retrodigitised print dictionaries and NLP lexical databases. An in-depth review is currently underway within the standardisation subcommittee , ISO-TC37/SC4/WG4, to find a more modular, flexible and durable follow up to the original LMF standard published in 2008. In this paper we will present some of the major improvements which have so far been implemented in the new version of LMF. △ Less

Submitted 23 May, 2019; originally announced June 2019.

Comments: AsiaLex 2019: Past, Present and Future, Jun 2019, Istanbul, Turkey

arXiv:1903.12354 [pdf, other]

Training neural networks to encode symbols enables combinatorial generalization

Authors: Ivan Vankov, Jeffrey Bowers

Abstract: Combinatorial generalization - the ability to understand and produce novel combinations of already familiar elements - is considered to be a core capacity of the human mind and a major challenge to neural network models. A significant body of research suggests that conventional neural networks can't solve this problem unless they are endowed with mechanisms specifically engineered for the purpose… ▽ More Combinatorial generalization - the ability to understand and produce novel combinations of already familiar elements - is considered to be a core capacity of the human mind and a major challenge to neural network models. A significant body of research suggests that conventional neural networks can't solve this problem unless they are endowed with mechanisms specifically engineered for the purpose of representing symbols. In this paper we introduce a novel way of representing symbolic structures in connectionist terms - the vectors approach to representing symbols (VARS), which allows training standard neural architectures to encode symbolic knowledge explicitly at their output layers. In two simulations, we show that neural networks not only can learn to produce VARS representations, but in doing so they achieve combinatorial generalization in their symbolic and non-symbolic output. This adds to other recent work that has shown improved combinatorial generalization under specific training conditions, and raises the question of whether specific mechanisms or training routines are needed to support symbolic processing. △ Less

Submitted 23 September, 2019; v1 submitted 29 March, 2019; originally announced March 2019.

arXiv:1806.03934 [pdf, other]

When and where do feed-forward neural networks learn localist representations?

Authors: Ella M. Gale, Nicolas Martin, Jeffrey S. Bowers

Abstract: According to parallel distributed processing (PDP) theory in psychology, neural networks (NN) learn distributed rather than interpretable localist representations. This view has been held so strongly that few researchers have analysed single units to determine if this assumption is correct. However, recent results from psychology, neuroscience and computer science have shown the occasional existen… ▽ More According to parallel distributed processing (PDP) theory in psychology, neural networks (NN) learn distributed rather than interpretable localist representations. This view has been held so strongly that few researchers have analysed single units to determine if this assumption is correct. However, recent results from psychology, neuroscience and computer science have shown the occasional existence of local codes emerging in artificial and biological neural networks. In this paper, we undertake the first systematic survey of when local codes emerge in a feed-forward neural network, using generated input and output data with known qualities. We find that the number of local codes that emerge from a NN follows a well-defined distribution across the number of hidden layer neurons, with a peak determined by the size of input data, number of examples presented and the sparsity of input data. Using a 1-hot output code drastically decreases the number of local codes on the hidden layer. The number of emergent local codes increases with the percentage of dropout applied to the hidden layer, suggesting that the localist encoding may offer a resilience to noisy networks. This data suggests that localist coding can emerge from feed-forward PDP networks and suggests some of the conditions that may lead to interpretable localist representations in the cortex. The findings highlight how local codes should not be dismissed out of hand. △ Less

Submitted 11 June, 2018; originally announced June 2018.

MSC Class: 92b20

arXiv:1611.10122 [pdf]

Deep encoding of etymological information in TEI

Authors: Jack Bowers, Laurent Romary

Abstract: This paper aims to provide a comprehensive modeling and representation of etymological data in digital dictionaries. The purpose is to integrate in one coherent framework both digital representations of legacy dictionaries, and also born-digital lexical databases that are constructed manually or semi-automatically. We want to propose a systematic and coherent set of modeling principles for a varie… ▽ More This paper aims to provide a comprehensive modeling and representation of etymological data in digital dictionaries. The purpose is to integrate in one coherent framework both digital representations of legacy dictionaries, and also born-digital lexical databases that are constructed manually or semi-automatically. We want to propose a systematic and coherent set of modeling principles for a variety of etymological phenomena that may contribute to the creation of a continuum between existing and future lexical constructs, where anyone interested in tracing the history of words and their meanings will be able to seamlessly query lexical resources.Instead of designing an ad hoc model and representation language for digital etymological data, we will focus on identifying all the possibilities offered by the TEI guidelines for the representation of lexical information. △ Less

Submitted 30 November, 2016; originally announced November 2016.

arXiv:1405.6260 [pdf, other]

Faster Reductions for Straight Skeletons to Motorcycle Graphs

Authors: John C. Bowers

Abstract: We give an algorithm that reduces the straight skeleton to the motorcycle graph in $O(n\log n)$ time for simple polygons and $O(n(\log n)\log m)$ time for a planar straight line graph (PSLG) with $m$ connected components. This improves on the previous best of $O(n(\log n)\log r)$ for polygons with $r$ reflex vertices (possibly with holes) and $O(n^2\log n)$ for general planar straight line graphs.… ▽ More We give an algorithm that reduces the straight skeleton to the motorcycle graph in $O(n\log n)$ time for simple polygons and $O(n(\log n)\log m)$ time for a planar straight line graph (PSLG) with $m$ connected components. This improves on the previous best of $O(n(\log n)\log r)$ for polygons with $r$ reflex vertices (possibly with holes) and $O(n^2\log n)$ for general planar straight line graphs. This allows us to speed up the straight skeleton algorithm for polygons and PSLGs. For a polygon with $h$ holes and $r$ reflex vertices we achieve a speedup from $O(n(\log n)\log r + r^{4/3+ε})$ time to $O(n(\log n)\log h + r^{4/3 + ε})$ time in the non-degenerate case and from $O(n(\log n)\log r + r^{17/11 + ε})$ to $O(n(\log n)\log h + r^{17/11 + ε})$ in degenerate cases. For a PSLG with $m$ connected components and $r$ reflex vertices, we gain a speed up from $O(n^{1 + ε} + n^{8/11 + ε}r^{9/11+ε})$ to $O(n(\log n)\log m + r^{4/3 + ε})$ in the non-degenerate case and from $O(n^{1 + ε} + n^{8/11 + ε}r^{9/11+ε})$ to $O(n(\log n)\log m + r^{17/11 + ε})$ in the degenerate case. △ Less

Submitted 6 August, 2014; v1 submitted 23 May, 2014; originally announced May 2014.

Comments: Fixed typos from V3

Showing 1–23 of 23 results for author: Bowers, J