-
Tracking objects that change in appearance with phase synchrony
Authors:
Sabine Muzellec,
Drew Linsley,
Alekh K. Ashok,
Ennio Mingolla,
Girik Malik,
Rufin VanRullen,
Thomas Serre
Abstract:
Objects we encounter often change appearance as we interact with them. Changes in illumination (shadows), object pose, or the movement of non-rigid objects can drastically alter available image features. How do biological visual systems track objects as they change? One plausible mechanism involves attentional mechanisms for reasoning about the locations of objects independently of their appearances -- a capability that prominent neuroscience theories have associated with computing through neural synchrony. Here, we describe a novel deep learning circuit that can learn to precisely control attention to features separately from their location in the world through neural synchrony: the complex-valued recurrent neural network (CV-RNN). Next, we compare object tracking in humans, the CV-RNN, and other deep neural networks (DNNs), using FeatureTracker: a large-scale challenge that asks observers to track objects as their locations and appearances change in precisely controlled ways. While humans effortlessly solved FeatureTracker, state-of-the-art DNNs did not. In contrast, our CV-RNN behaved similarly to humans on the challenge, providing a computational proof-of-concept for the role of phase synchronization as a neural substrate for tracking appearance-morphing objects as they move about.
Submitted 2 March, 2025; v1 submitted 2 October, 2024;
originally announced October 2024.
-
Extreme Image Transformations Facilitate Robust Latent Object Representations
Authors:
Girik Malik,
Dakarai Crowder,
Ennio Mingolla
Abstract:
Adversarial attacks can affect the object recognition capabilities of machines in the wild. These can often result from spurious correlations between input and class labels, and are prone to memorization in large networks. While networks are expected to do automated feature selection, it is not effective at the scale of the object. Humans, however, are able to select the minimum set of features required to form a robust representation of an object. In this work, we show that finetuning any pretrained off-the-shelf network with Extreme Image Transformations (EIT) not only helps in learning a robust latent representation, it also improves the performance of these networks against common adversarial attacks of various intensities. Our EIT-trained networks show strong activations in the object regions even when tested with more intense noise, showing promising generalizations across different kinds of adversarial attacks.
Submitted 19 September, 2023;
originally announced October 2023.
-
Extreme Image Transformations Affect Humans and Machines Differently
Authors:
Girik Malik,
Dakarai Crowder,
Ennio Mingolla
Abstract:
Some recent artificial neural networks (ANNs) claim to model aspects of primate neural and human performance data. Their success in object recognition is, however, dependent on exploiting low-level features for solving visual tasks in a way that humans do not. As a result, out-of-distribution or adversarial input is often challenging for ANNs. Humans instead learn abstract patterns and are mostly unaffected by many extreme image distortions. We introduce a set of novel image transforms inspired by neurophysiological findings and evaluate humans and ANNs on an object recognition task. We show that machines perform better than humans for certain transforms and struggle to perform at par with humans on others that are easy for humans. We quantify the differences in accuracy for humans and machines and find a ranking of difficulty for our transforms for human data. We also suggest how certain characteristics of human visual processing can be adapted to improve the performance of ANNs for our difficult-for-machines transforms.
Submitted 11 April, 2023; v1 submitted 30 November, 2022;
originally announced December 2022.
-
The Challenge of Appearance-Free Object Tracking with Feedforward Neural Networks
Authors:
Girik Malik,
Drew Linsley,
Thomas Serre,
Ennio Mingolla
Abstract:
Nearly all models for object tracking with artificial neural networks depend on appearance features extracted from a "backbone" architecture, designed for object recognition. Indeed, significant progress on object tracking has been spurred by introducing backbones that are better able to discriminate objects by their appearance. However, extensive neurophysiology and psychophysics evidence suggests that biological visual systems track objects using both appearance and motion features. Here, we introduce PathTracker, a visual challenge inspired by cognitive psychology, which tests the ability of observers to learn to track objects solely by their motion. We find that standard 3D-convolutional deep network models struggle to solve this task when clutter is introduced into the generated scenes, or when objects travel long distances. This challenge reveals that tracing the path of object motion is a blind spot of feedforward neural networks. We expect that strategies for appearance-free object tracking from biological vision can inspire solutions to these failures of deep neural networks.
Submitted 30 September, 2021;
originally announced October 2021.
-
Tracking Without Re-recognition in Humans and Machines
Authors:
Drew Linsley,
Girik Malik,
Junkyung Kim,
Lakshmi N Govindarajan,
Ennio Mingolla,
Thomas Serre
Abstract:
Imagine trying to track one particular fruitfly in a swarm of hundreds. Higher biological visual systems have evolved to track moving objects by relying on both appearance and motion features. We investigate if state-of-the-art deep neural networks for visual tracking are capable of the same. For this, we introduce PathTracker, a synthetic visual challenge that asks human observers and machines to track a target object in the midst of identical-looking "distractor" objects. While humans effortlessly learn PathTracker and generalize to systematic variations in task design, state-of-the-art deep networks struggle. To address this limitation, we identify and model circuit mechanisms in biological brains that are implicated in tracking objects based on motion cues. When instantiated as a recurrent network, our circuit model learns to solve PathTracker with a robust visual strategy that rivals human performance and explains a significant proportion of their decision-making on the challenge. We also show that the success of this circuit model extends to object tracking in natural videos. Adding it to a transformer-based architecture for object tracking builds tolerance to visual nuisances that affect object appearance, resulting in a new state-of-the-art performance on the large-scale TrackingNet object tracking challenge. Our work highlights the importance of building artificial vision models that can help us better understand human vision and improve computer vision.
Submitted 2 June, 2021; v1 submitted 27 May, 2021;
originally announced May 2021.