-
A discrete adjoint method for deterministic and probabilistic eikonal-equation-based inversion of traveltime for velocity and source location
Authors:
Andrea Zunino,
Scott Keating,
Andreas Fichtner
Abstract:
Seismic traveltime tomography represents a popular and useful tool for unravelling the structure of the subsurface across the scales. In this work we address the case where the forward model is represented by the eikonal equation and derive a formalism to solve the inverse problem where gradients are calculated efficiently using the discrete adjoint state method. Our approach provides gradients with respect to both velocity structure and source locations, allowing us to perform a consistent joint inversion. The forward problem is solved using a second-order fast-marching method, which provides a strategy to efficiently solve the adjoint problem. Our approach allows for arbitrary positions of both sources and receivers and for a refined grid around the source region to reduce errors in computed traveltimes. We show how gradients computed using the discrete adjoint method can be employed either for deterministic inversion, i.e., solving an optimization problem, or for a probabilistic (Bayesian) approach, i.e., obtaining a posterior probability density function. We show applications of our methodology on a set of synthetic examples both in 2D and 3D using the L-BFGS algorithm for the deterministic case and the Hamiltonian Monte Carlo algorithm for the probabilistic case.
Submitted 23 January, 2025;
originally announced January 2025.
-
Structured Detection for Simultaneous Super-Resolution and Optical Sectioning in Laser Scanning Microscopy
Authors:
Alessandro Zunino,
Giacomo Garrè,
Eleonora Perego,
Sabrina Zappone,
Mattia Donato,
Giuseppe Vicidomini
Abstract:
Fast and sensitive detector arrays enable image scanning microscopy (ISM), overcoming the trade-off between spatial resolution and signal-to-noise ratio (SNR) typical of confocal microscopy. However, current ISM approaches cannot provide optical sectioning and fail with thick samples, unless the size of the detector is limited. Thus, another trade-off between optical sectioning and SNR persists. Here, we propose a method without drawbacks that combines uncompromised super-resolution, high SNR, and optical sectioning. Furthermore, our approach enables super-sampling of images, relaxing Nyquist's criterion by a factor of two. Based on the observation that imaging with a detector array inherently embeds axial information about the sample, we designed a straightforward reconstruction algorithm that inverts the physical model of ISM. We present the comprehensive theoretical framework and validate our method with synthetic and experimental images of biological samples captured using a custom setup equipped with a single-photon avalanche diode (SPAD) array detector. We demonstrate the feasibility of our approach exciting fluorescence emission both in the linear and non-linear regime. Moreover, we generalize the algorithm for fluorescence lifetime imaging, fully exploiting the single-photon timing ability of the SPAD array detector. Our method outperforms conventional approaches to ISM and can be extended to any LSM technique.
Submitted 18 June, 2024;
originally announced June 2024.
-
Image Scanning Microscopy Reconstruction by Autocorrelation Inversion
Authors:
Daniele Ancora,
Alessandro Zunino,
Giuseppe Vicidomini,
Alvaro H. Crevenna
Abstract:
Confocal laser scanning microscopy (CLSM) stands out as one of the most widely used microscopy techniques, thanks to its three-dimensional imaging capability and its sub-diffraction spatial resolution, achieved through the closure of a pinhole in front of a single-element detector. However, the pinhole also rejects useful photons and beating the diffraction limit comes at the price of irremediably compromising the signal-to-noise ratio (SNR) of the data. Image scanning microscopy (ISM) emerged as the natural evolution of CLSM, exploiting a small array detector in place of the pinhole and the single-element detector. Each sensitive element is small enough to achieve sub-diffraction resolution through the confocal effect, but the size of the whole detector is large enough to guarantee excellent collection efficiency and SNR. However, the raw data produced by an ISM setup consists of a 4D dataset which can be seen as a set of confocal-like images. Thus, fusing the dataset into a single super-resolved image requires a dedicated reconstruction algorithm. Conventional methods are multi-image deconvolution, which requires prior knowledge of the system point spread functions (PSF), or adaptive pixel reassignment (APR), which is effective only on a limited range of experimental conditions. In this work, we describe and validate a novel concept for ISM image reconstruction based on autocorrelation inversion. We leverage unique properties of the autocorrelation to discard low-frequency components and maximize the resolution of the reconstructed image, without any assumption on the image or any knowledge of the PSF. Our results push the quality of the ISM reconstruction beyond the level provided by APR and open new perspectives for multi-dimensional image processing.
Submitted 13 April, 2024;
originally announced April 2024.
-
FollowMe: a Robust Person Following Framework Based on Re-Identification and Gestures
Authors:
Federico Rollo,
Andrea Zunino,
Gennaro Raiola,
Fabio Amadio,
Arash Ajoudani,
Nikolaos Tsagarakis
Abstract:
Human-robot interaction (HRI) has become a crucial enabler in houses and industries for facilitating operational flexibility. When it comes to mobile collaborative robots, this flexibility can be further increased due to the autonomous mobility and navigation capacity of the robotic agents, expanding their workspace and consequently, the personalizable assistance they can provide to the human operators. This however requires that the robot is capable of detecting and identifying the human counterpart in all stages of the collaborative task, and in particular while following a human in crowded workplaces. To respond to this need, we developed a unified perception and navigation framework, which enables the robot to identify and follow a target person using a combination of visual Re-Identification (Re-ID), hand gestures detection, and collision-free navigation. The Re-ID module can autonomously learn the features of a target person and use the acquired knowledge to visually re-identify the target. The navigation stack is used to follow the target avoiding obstacles and other individuals in the environment. Experiments are conducted with few subjects in a laboratory setting where some unknown dynamic obstacles are introduced.
Submitted 21 November, 2023;
originally announced November 2023.
-
CARPE-ID: Continuously Adaptable Re-identification for Personalized Robot Assistance
Authors:
Federico Rollo,
Andrea Zunino,
Nikolaos Tsagarakis,
Enrico Mingo Hoffman,
Arash Ajoudani
Abstract:
In today's Human-Robot Interaction (HRI) scenarios, a prevailing tendency exists to assume that the robot shall cooperate with the closest individual or that the scene involves merely a singular human actor. However, in realistic scenarios, such as shop floor operations, such an assumption may not hold and personalized target recognition by the robot in crowded environments is required. To fulfil this requirement, in this work, we propose a person re-identification module based on continual visual adaptation techniques that ensure the robot's seamless cooperation with the appropriate individual even subject to varying visual appearances or partial or complete occlusions. We test the framework singularly using recorded videos in a laboratory environment and an HRI scenario, i.e., a person-following task by a mobile robot. The targets are asked to change their appearance during tracking and to disappear from the camera field of view to test the challenging cases of occlusion and outfit variations. We compare our framework with one of the state-of-the-art Multi-Object Tracking (MOT) methods and the results show that CARPE-ID can accurately track each selected target throughout the experiments in all the cases (except two limit cases). At the same time, the state-of-the-art MOT method has a mean of 4 tracking errors for each video.
Submitted 31 January, 2024; v1 submitted 30 October, 2023;
originally announced October 2023.
-
Borehole fibre-optic seismology inside the Northeast Greenland Ice Stream
Authors:
Andreas Fichtner,
Coen Hofstede,
Lars Gebraad,
Andrea Zunino,
Dimitri Zigone,
Olaf Eisen
Abstract:
Ice streams are major contributors to ice sheet mass loss and sea level rise. Effects of their dynamic behaviour are imprinted into seismic properties, such as wave speeds and anisotropy. Here we present results from the first Distributed Acoustic Sensing (DAS) experiment in a deep ice-core borehole in the onset region of the Northeast Greenland Ice Stream. A series of active surface sources produced clear recordings of the P and S wavefield, including internal reflections, along a 1500 m long fibre-optic cable that was lowered into the borehole. The combination of nonlinear traveltime tomography with a firn model constrained by multi-mode surface wave data allows us to invert for P and S wave speeds with depth-dependent uncertainties on the order of only 10 m/s, and vertical resolution of 20–70 m. The wave speed model in conjunction with the regularly spaced DAS data enables a straightforward separation of internal upward reflections followed by a reverse-time migration that provides a detailed reflectivity image of the ice. While the differences between P and S wave speeds hint at anisotropy related to crystal orientation fabric, the reflectivity image seems to carry a pronounced climatic imprint caused by rapid variations in grain size. Currently, resolution is not limited by the DAS channel spacing. Instead, the maximum frequency of body waves below ~200 Hz, low signal-to-noise ratio caused by poor coupling, and systematic errors produced by the ray approximation appear to be the leading-order issues. Among these, only the latter has a simple existing solution in the form of full-waveform inversion. Improving signal bandwidth and quality, however, will likely require a significantly larger effort in terms of both sensing equipment and logistics.
Submitted 12 July, 2023;
originally announced July 2023.
-
Artifacts Mapping: Multi-Modal Semantic Mapping for Object Detection and 3D Localization
Authors:
Federico Rollo,
Gennaro Raiola,
Andrea Zunino,
Nikolaos Tsagarakis,
Arash Ajoudani
Abstract:
Geometric navigation is nowadays a well-established field of robotics and the research focus is shifting towards higher-level scene understanding, such as Semantic Mapping. When a robot needs to interact with its environment, it must be able to comprehend the contextual information of its surroundings. This work focuses on classifying and localising objects within a map, which is under construction (SLAM) or already built. To further explore this direction, we propose a framework that can autonomously detect and localize predefined objects in a known environment using a multi-modal sensor fusion approach (combining RGB and depth data from an RGB-D camera and a lidar). The framework consists of three key elements: understanding the environment through RGB data, estimating depth through multi-modal sensor fusion, and managing artifacts (i.e., filtering and stabilizing measurements). The experiments show that the proposed framework can accurately detect 98% of the objects in the real sample environment, without post-processing, while 85% and 80% of the objects were mapped using the single RGB-D camera or RGB + lidar setup, respectively. The comparison with single-sensor (camera or lidar) experiments is performed to show that sensor fusion allows the robot to accurately detect near and far obstacles, which would have been noisy or imprecise in a purely visual or laser-based approach.
Submitted 21 November, 2023; v1 submitted 3 July, 2023;
originally announced July 2023.
-
HMCLab: a framework for solving diverse geophysical inverse problems using the Hamiltonian Monte Carlo method
Authors:
Andrea Zunino,
Lars Gebraad,
Alessandro Ghirotto,
Andreas Fichtner
Abstract:
The use of the probabilistic approach to solve inverse problems is becoming more popular in the geophysical community, thanks to its ability to address nonlinear forward problems and to provide uncertainty quantification. However, such a strategy is often tailored to specific applications and therefore there is a lack of a common platform for solving a range of different geophysical inverse problems and showing potential and pitfalls. We demonstrate a common framework to solve such inverse problems ranging from, e.g., earthquake source location to potential field data inversion and seismic tomography. Within this approach, we can provide probabilities related to certain properties or structures of the subsurface. Thanks to its ability to address high-dimensional problems, the Hamiltonian Monte Carlo (HMC) algorithm has emerged as the state-of-the-art tool for solving geophysical inverse problems within the probabilistic framework. HMC requires the computation of gradients, which can be obtained by adjoint methods, making the solution of tomographic problems ultimately feasible. These results can be obtained with "HMCLab", a tool for solving a range of different geophysical inverse problems using sampling methods, focusing in particular on the HMC algorithm. HMCLab consists of a set of samplers and a set of geophysical forward problems. For each problem its misfit function and gradient computation are provided and, in addition, a set of prior models can be combined to inject additional information into the inverse problem. This allows users to experiment with probabilistic inverse problems and also address real-world studies. We show how to solve a selected set of problems within this framework using variants of the HMC algorithm and analyze the results. HMCLab is provided as an open source package written both in Python and Julia, welcoming contributions from the community.
Submitted 17 March, 2023;
originally announced March 2023.
-
Reconstructing the Image Scanning Microscopy Dataset: an Inverse Problem
Authors:
Alessandro Zunino,
Marco Castello,
Giuseppe Vicidomini
Abstract:
Confocal laser-scanning microscopy (CLSM) is one of the most popular optical architectures for fluorescence imaging. In CLSM, a focused laser beam excites the fluorescence emission from a specific specimen position. Some actuators scan the probed region across the sample and a photodetector collects a single intensity value for each scan point, building a two-dimensional image pixel-by-pixel. Recently, new fast single-photon array detectors have allowed the recording of a full bi-dimensional image of the probed region for each scan point, transforming CLSM into image scanning microscopy (ISM). The latter offers significant improvements over traditional imaging but requires an optimal processing tool to extract a super-resolved image from the four-dimensional dataset. Here we describe the image formation process in ISM from a statistical point of view, and we use the Bayesian framework to formulate a multi-image deconvolution problem. Notably, the single-photon detector suffers exclusively from the photon shot noise, enabling the development of an effective likelihood model. We derive an iterative likelihood maximization algorithm and test it on experimental and simulated data. Furthermore, we demonstrate that the ISM dataset is redundant, enabling the possibility of obtaining a reconstruction sampled at twice the scanning step. Our results prove that in ISM, under appropriate conditions, the Nyquist-Shannon sampling criterion is effectively relaxed. This finding can be exploited to speed up the acquisition process by a factor of four, further improving the versatility of ISM systems.
Submitted 22 November, 2022;
originally announced November 2022.
-
Diffuse ultrasound computed tomography for medical imaging
Authors:
Ines Elisa Ulrich,
Christian Boehm,
Andrea Zunino,
Cyrill Bösch,
Andreas Fichtner
Abstract:
An alternative approach to ultrasound computed tomography (USCT) for medical imaging is proposed, with the intent to (i) shorten acquisition time for devices with a large number of emitters, (ii) eliminate the calibration step, and (iii) suppress instrument noise. Inspired by seismic ambient field interferometry, the method rests on the active excitation of diffuse ultrasonic wavefields and the extraction of deterministic travel time information by inter-station correlation. To reduce stochastic errors and accelerate convergence, ensemble interferograms are obtained by phase-weighted stacking of observed and computed correlograms, generated with identical realizations of random sources. Mimicking a breast imaging setup, the accuracy of the travel time measurements as a function of the number of emitters and random realizations can be assessed both analytically and with spectral-element simulations for realistic breast phantoms. The results warrant tomographic reconstructions with straight- or bent-ray approaches, where the effect of inherent stochastic fluctuations can be made significantly smaller than the effect of subjective choices on regularisation. This work constitutes a first conceptual study and a necessary prelude to future implementations.
Submitted 24 January, 2022;
originally announced January 2022.
-
Compact CNN Structure Learning by Knowledge Distillation
Authors:
Waqar Ahmed,
Andrea Zunino,
Pietro Morerio,
Vittorio Murino
Abstract:
The concept of compressing deep Convolutional Neural Networks (CNNs) is essential to use limited computation, power, and memory resources on embedded devices. However, existing methods achieve this objective at the cost of a drop in inference accuracy in computer vision tasks. To address such a drawback, we propose a framework that leverages knowledge distillation along with customizable block-wise optimization to learn a lightweight CNN structure while preserving better control over the compression-performance tradeoff. Considering specific resource constraints, e.g., floating-point operations per inference (FLOPs) or model-parameters, our method results in state-of-the-art network compression while being capable of achieving better inference accuracy. In a comprehensive evaluation, we demonstrate that our method is effective, robust, and consistent with results over a variety of network architectures and datasets, at negligible training overhead. In particular, for the already compact network MobileNet_v2, our method offers up to 2x and 5.2x better model compression in terms of FLOPs and model-parameters, respectively, while getting 1.05% better model performance than the baseline network.
Submitted 19 April, 2021;
originally announced April 2021.
-
Informed Proposal Monte Carlo
Authors:
Sarouyeh Khoshkholgh,
Andrea Zunino,
Klaus Mosegaard
Abstract:
Any search or sampling algorithm for solution of inverse problems needs guidance to be efficient. Many algorithms collect and apply information about the problem on the fly, and much improvement has been made in this way. However, as a consequence of the No-Free-Lunch Theorem, the only way we can ensure a significantly better performance of search and sampling algorithms is to build in as much information about the problem as possible. In the special case of Markov Chain Monte Carlo sampling (MCMC) we review how this is done through the choice of proposal distribution, and we show how this way of adding more information about the problem can be made particularly efficient when based on an approximate physics model of the problem. A highly nonlinear inverse scattering problem with a high-dimensional model space serves as an illustration of the gain of efficiency through this approach.
Submitted 29 May, 2020;
originally announced May 2020.
-
Explainable Deep Classification Models for Domain Generalization
Authors:
Andrea Zunino,
Sarah Adel Bargal,
Riccardo Volpi,
Mehrnoosh Sameki,
Jianming Zhang,
Stan Sclaroff,
Vittorio Murino,
Kate Saenko
Abstract:
Conventionally, AI models are thought to trade off explainability for lower accuracy. We develop a training strategy that not only leads to a more explainable AI system for object classification, but as a consequence, suffers no perceptible accuracy degradation. Explanations are defined as regions of visual evidence upon which a deep classification network makes a decision. This is represented in the form of a saliency map conveying how much each pixel contributed to the network's decision. Our training strategy enforces a periodic saliency-based feedback to encourage the model to focus on the image regions that directly correspond to the ground-truth object. We quantify explainability using an automated metric, and using human judgement. We propose explainability as a means for bridging the visual-semantic gap between different domains where model explanations are used as a means of disentangling domain specific information from otherwise relevant features. We demonstrate that this leads to improved generalization to new domains without hindering performance on the original domain.
Submitted 13 March, 2020;
originally announced March 2020.
-
Dynamic multi-focus laser writing with acousto-optofluidics
Authors:
A. Zunino,
S. Surdo,
M. Duocastella
Abstract:
Laser writing of materials is normally performed by the sequential scanning of a single focused beam across a sample. This process is time-consuming and it can severely limit the throughput of laser systems in key applications such as surgery, microelectronics, or manufacturing. Here we report a parallelization strategy based on ultrasound waves in a liquid to diffract light into multiple beamlets. Adjusting amplitude, frequency, or phase of ultrasound allows tunable multi-focus distributions with sub-microsecond control. When combined with sample translation, the dynamic splitting of light leads to high-throughput laser processing, as demonstrated by locally modifying the morphological and wettability properties of metals, polymers, and ceramics. The results illustrate how acousto-optofluidic systems are universal tools for fast multi-focus generation, with potential impact in fields such as imaging or optical trapping.
Submitted 19 July, 2019;
originally announced July 2019.
-
Guided Zoom: Questioning Network Evidence for Fine-grained Classification
Authors:
Sarah Adel Bargal,
Andrea Zunino,
Vitali Petsiuk,
Jianming Zhang,
Kate Saenko,
Vittorio Murino,
Stan Sclaroff
Abstract:
We propose Guided Zoom, an approach that utilizes spatial grounding of a model's decision to make more informed predictions. It does so by making sure the model has "the right reasons" for a prediction, defined as reasons that are coherent with those used to make similar correct decisions at training time. The reason/evidence upon which a deep convolutional neural network makes a prediction is defined to be the spatial grounding, in the pixel space, for a specific class conditional probability in the model output. Guided Zoom examines how reasonable such evidence is for each of the top-k predicted classes, rather than solely trusting the top-1 prediction. We show that Guided Zoom improves the classification accuracy of a deep convolutional neural network model and obtains state-of-the-art results on three fine-grained classification benchmark datasets.
Submitted 23 March, 2020; v1 submitted 6 December, 2018;
originally announced December 2018.
-
Towards a Service-oriented Platform for Intelligent Apps in Intermediate Cities
Authors:
J. Andres Diaz-Pace,
Luis Berdun,
Alejandro Zunino,
Silvia Schiaffino
Abstract:
Smart cities are a growing trend in many cities in Argentina. In particular, the so-called intermediate cities present a context and requirements different from those of large cities with respect to smart cities. One aspect of relevance is to encourage the development of applications (generally for mobile devices) that enable citizens to take advantage of data and services normally associated with the city, for example, in the urban mobility domain. In this work, a platform is proposed for intermediate cities that provides "high level" services and allows the construction of software applications that consume those services. Our platform-centric strategy aims to integrate systems and heterogeneous data sources, and provide "intelligent" services to different applications. Examples of these services include: construction of user profiles, recommending local events, and collaborative sensing based on data mining techniques, among others. In this work, the design of this platform (currently in progress) is described, and experiences of applications for urban mobility are discussed, which are being migrated in the form of reusable services provided by the platform.
Submitted 27 August, 2018;
originally announced August 2018.
-
Excitation Dropout: Encouraging Plasticity in Deep Neural Networks
Authors:
Andrea Zunino,
Sarah Adel Bargal,
Pietro Morerio,
Jianming Zhang,
Stan Sclaroff,
Vittorio Murino
Abstract:
We propose a guided dropout regularizer for deep networks based on the evidence of a network prediction defined as the firing of neurons in specific paths. In this work, we utilize the evidence at each neuron to determine the probability of dropout, rather than dropping out neurons uniformly at random as in standard dropout. In essence, we drop out with higher probability those neurons that contribute more to decision making at training time. This approach penalizes high-saliency neurons that are most relevant for model prediction, i.e., those having stronger evidence. By dropping such high-saliency neurons, the network is forced to learn alternative paths in order to maintain loss minimization, resulting in a plasticity-like behavior, a characteristic of human brains too. We demonstrate better generalization ability, an increased utilization of network neurons, and a higher resilience to network compression using several metrics over four image/video recognition benchmarks.
Submitted 21 January, 2021; v1 submitted 23 May, 2018;
originally announced May 2018.
-
Excitation Backprop for RNNs
Authors:
Sarah Adel Bargal,
Andrea Zunino,
Donghyun Kim,
Jianming Zhang,
Vittorio Murino,
Stan Sclaroff
Abstract:
Deep models are state-of-the-art for many vision tasks including video action recognition and video captioning. Models are trained to caption or classify activity in videos, but little is known about the evidence used to make such decisions. Grounding decisions made by deep networks has been studied in spatial visual content, giving more insight into model predictions for images. However, such studies are relatively lacking for models of spatiotemporal visual content - videos. In this work, we devise a formulation that simultaneously grounds evidence in space and time, in a single pass, using top-down saliency. We visualize the spatiotemporal cues that contribute to a deep model's classification/captioning output using the model's internal representation. Based on these spatiotemporal cues, we are able to localize segments within a video that correspond with a specific action, or phrase from a caption, without explicitly optimizing/training for these tasks.
Submitted 8 March, 2018; v1 submitted 17 November, 2017;
originally announced November 2017.
-
What Will I Do Next? The Intention from Motion Experiment
Authors:
Andrea Zunino,
Jacopo Cavazza,
Atesh Koul,
Andrea Cavallo,
Cristina Becchio,
Vittorio Murino
Abstract:
In computer vision, video-based approaches have been widely explored for the early classification and the prediction of actions or activities. However, it remains unclear whether this modality (as compared to 3D kinematics) can still be reliable for the prediction of human intentions, defined as the overarching goal embedded in an action sequence. Since the same action can be performed with different intentions, this problem is more challenging, yet still tractable, as shown by quantitative cognitive studies which exploit the 3D kinematics acquired through motion capture systems. In this paper, we bridge cognitive and computer vision studies by demonstrating the effectiveness of video-based approaches for the prediction of human intentions. Precisely, we propose Intention from Motion, a new paradigm where, without using any contextual information, we consider instantaneous grasping motor acts involving a bottle in order to forecast why the bottle itself has been reached (to pass it, to place it in a box, to pour, or to drink the liquid inside). We process only the grasping onsets, casting intention prediction as a classification problem. Leveraging our multimodal acquisition (3D motion capture data and 2D optical videos), we compare the most commonly used 3D descriptors from cognitive studies with state-of-the-art video-based techniques. Since the two analyses achieve an equivalent performance, we demonstrate that computer vision tools are effective in capturing the kinematics and addressing the cognitive problem of human intention prediction.
Submitted 3 August, 2017;
originally announced August 2017.
-
Predicting Human Intentions from Motion Only: A 2D+3D Fusion Approach
Authors:
Andrea Zunino,
Jacopo Cavazza,
Atesh Koul,
Andrea Cavallo,
Cristina Becchio,
Vittorio Murino
Abstract:
In this paper, we address the new problem of the prediction of human intents. There is neuro-psychological evidence that actions performed by humans are anticipated by peculiar motor acts which are discriminative of the type of action to be performed afterwards. In other words, an actual intent can be forecast by looking at the kinematics of the immediately preceding movement. To prove it in a computational and quantitative manner, we devise a new experimental setup where, without using contextual information, we predict human intents all originating from the same motor act. We pose the problem as a classification task and we introduce a new multi-modal dataset consisting of a set of motion capture marker 3D data and 2D video sequences, where, by only analysing very similar movements in both training and test phases, we are able to predict the underlying intent, i.e., the future, never observed action. We also present an extensive experimental evaluation as a baseline, customizing state-of-the-art techniques for both 3D and 2D data analysis. Realizing that video processing methods lead to inferior performance but show complementary information with respect to 3D data sequences, we developed a 2D+3D fusion analysis where we achieve better classification accuracies, attesting the superiority of the multimodal approach for the context-free prediction of human intents.
Submitted 6 September, 2017; v1 submitted 31 May, 2016;
originally announced May 2016.
-
Revisiting Human Action Recognition: Personalization vs. Generalization
Authors:
Andrea Zunino,
Jacopo Cavazza,
Vittorio Murino
Abstract:
By thoroughly revisiting the classic human action recognition paradigm, this paper aims at proposing a new approach for the design of effective action classification systems. Taking publicly available three-dimensional (MoCap) action/activity datasets as a testbed, we analyzed and validated different training/testing strategies. In particular, considering that each human action in the datasets is performed several times by different subjects, we were able to precisely quantify the effect of inter- and intra-subject variability, so as to figure out the impact of several learning approaches in terms of classification performance. The net result is that standard testing strategies, which cross-validate the algorithm using typical splits of the data (holdout, k-fold, or one-subject-out), are always outperformed by a "personalization" strategy which learns how a subject is performing an action. In other words, it is advantageous to customize (i.e., personalize) the method to learn the actions carried out by each subject, rather than trying to generalize the action executions across subjects. Consequently, we finally propose an action recognition framework consisting of a two-stage classification approach where, given a test action, the subject is first identified before the actual recognition of the action takes place. Despite the basic, off-the-shelf descriptors and standard classifiers adopted, we noted a relevant increase in performance with respect to standard state-of-the-art algorithms, thus motivating the usage of personalized approaches for designing effective action recognition systems.
Submitted 2 May, 2016;
originally announced May 2016.
-
Kernelized Covariance for Action Recognition
Authors:
Jacopo Cavazza,
Andrea Zunino,
Marco San Biagio,
Vittorio Murino
Abstract:
In this paper we aim at increasing the descriptive power of the covariance matrix, which is limited to capturing only linear mutual dependencies between variables. We present a rigorous and principled mathematical pipeline to recover the kernel trick for computing the covariance matrix, enhancing it to model more complex, non-linear relationships conveyed by the raw data. To this end, we propose Kernelized-COV, which generalizes the original covariance representation without compromising the efficiency of the computation. In the experiments, we validate the proposed framework against many previous approaches in the literature, scoring on par with or better than the state of the art on benchmark datasets for 3D action recognition.
Submitted 2 September, 2016; v1 submitted 22 April, 2016;
originally announced April 2016.
-
Brainstorm/J: a Java Framework for Intelligent Agents
Authors:
Alejandro Zunino,
Analia Amandi
Abstract:
Despite the effort of many researchers in the area of multi-agent systems (MAS) for designing and programming agents, a few years ago the research community began to take into account that common features among different MAS exist. Based on these common features, several tools have tackled the problem of agent development on specific application domains or specific types of agents. As a consequence, their scope is restricted to a subset of the huge application domain of MAS. In this paper we propose a generic infrastructure for programming agents whose name is Brainstorm/J. The infrastructure has been implemented as an object-oriented framework. As a consequence, our approach supports a broader scope of MAS applications than previous efforts, being flexible and reusable.
Submitted 4 July, 2000;
originally announced July 2000.