Search | arXiv e-print repository

arXiv:2501.13532 [pdf, other]

A discrete adjoint method for deterministic and probabilistic eikonal-equation-based inversion of traveltime for velocity and source location

Authors: Andrea Zunino, Scott Keating, Andreas Fichtner

Abstract: Seismic traveltime tomography represents a popular and useful tool for unravelling the structure of the subsurface across scales. In this work we address the case where the forward model is represented by the eikonal equation and derive a formalism to solve the inverse problem where gradients are calculated efficiently using the discrete adjoint state method. Our approach provides gradients with respect to both velocity structure and source locations, allowing us to perform a consistent joint inversion. The forward problem is solved using a second-order fast-marching method, which provides a strategy to efficiently solve the adjoint problem. Our approach allows for arbitrary positions of both sources and receivers and for a refined grid around the source region to reduce errors in computed traveltimes. We show how gradients computed using the discrete adjoint method can be employed to perform either a deterministic inversion, i.e., solving an optimization problem, or a probabilistic (Bayesian) inversion, i.e., obtaining a posterior probability density function. We show applications of our methodology on a set of synthetic examples both in 2D and 3D using the L-BFGS algorithm for the deterministic case and the Hamiltonian Monte Carlo algorithm for the probabilistic case.

Submitted 23 January, 2025; originally announced January 2025.

Comments: 28 pages, 7 figures

MSC Class: 86A22
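The forward problem above is the eikonal equation |grad T| = 1/v solved by fast marching. For orientation only, here is a minimal first-order fast-marching sketch on a uniform 2D grid in plain Python. It is not the authors' second-order scheme, it has no adjoint, and the function name `fast_marching_2d` and the point-source-on-a-grid-node assumption are illustrative choices.

```python
import heapq
import math

def fast_marching_2d(velocity, h, source):
    """First-order fast marching for |grad T| = 1/v on a uniform grid (sketch).

    velocity: list of rows (nx x ny) of positive wave speeds
    h: grid spacing, identical in both directions
    source: (i, j) grid node where T = 0
    """
    nx, ny = len(velocity), len(velocity[0])
    INF = math.inf
    T = [[INF] * ny for _ in range(nx)]
    accepted = [[False] * ny for _ in range(nx)]
    si, sj = source
    T[si][sj] = 0.0
    heap = [(0.0, si, sj)]

    def accepted_value(i, j):
        # Only frozen (accepted) neighbours enter the upwind stencil.
        if 0 <= i < nx and 0 <= j < ny and accepted[i][j]:
            return T[i][j]
        return INF

    def local_update(i, j):
        a = min(accepted_value(i - 1, j), accepted_value(i + 1, j))
        b = min(accepted_value(i, j - 1), accepted_value(i, j + 1))
        sh = h / velocity[i][j]  # slowness times grid spacing
        if abs(a - b) < sh:
            # Two-sided upwind update: solve (t - a)^2 + (t - b)^2 = sh^2
            return 0.5 * (a + b + math.sqrt(2.0 * sh * sh - (a - b) ** 2))
        # One-sided update from the smaller accepted neighbour
        return min(a, b) + sh

    while heap:
        _, i, j = heapq.heappop(heap)
        if accepted[i][j]:
            continue  # stale heap entry
        accepted[i][j] = True
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= ni < nx and 0 <= nj < ny and not accepted[ni][nj]:
                t_new = local_update(ni, nj)
                if t_new < T[ni][nj]:
                    T[ni][nj] = t_new
                    heapq.heappush(heap, (t_new, ni, nj))
    return T
```

In a homogeneous medium this first-order scheme is exact along the grid axes and overestimates traveltimes off-axis (most visibly close to the source), which is one motivation for higher-order schemes and source-region grid refinement.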

arXiv:2406.12542 [pdf, other]

Structured Detection for Simultaneous Super-Resolution and Optical Sectioning in Laser Scanning Microscopy

Authors: Alessandro Zunino, Giacomo Garrè, Eleonora Perego, Sabrina Zappone, Mattia Donato, Giuseppe Vicidomini

Abstract: Fast and sensitive detector arrays enable image scanning microscopy (ISM), overcoming the trade-off between spatial resolution and signal-to-noise ratio (SNR) typical of confocal microscopy. However, current ISM approaches cannot provide optical sectioning and fail with thick samples, unless the size of the detector is limited. Thus, another trade-off between optical sectioning and SNR persists. Here, we propose a method without drawbacks that combines uncompromised super-resolution, high SNR, and optical sectioning. Furthermore, our approach enables super-sampling of images, relaxing Nyquist's criterion by a factor of two. Based on the observation that imaging with a detector array inherently embeds axial information about the sample, we designed a straightforward reconstruction algorithm that inverts the physical model of ISM. We present the comprehensive theoretical framework and validate our method with synthetic and experimental images of biological samples captured using a custom setup equipped with a single-photon avalanche diode (SPAD) array detector. We demonstrate the feasibility of our approach by exciting fluorescence emission in both the linear and non-linear regimes. Moreover, we generalize the algorithm for fluorescence lifetime imaging, fully exploiting the single-photon timing ability of the SPAD array detector. Our method outperforms conventional approaches to ISM and can be extended to any laser scanning microscopy (LSM) technique.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 71 pages, 32 figures

arXiv:2404.08946 [pdf, other]

Image Scanning Microscopy Reconstruction by Autocorrelation Inversion

Authors: Daniele Ancora, Alessandro Zunino, Giuseppe Vicidomini, Alvaro H. Crevenna

Abstract: Confocal laser scanning microscopy (CLSM) stands out as one of the most widely used microscopy techniques, thanks to its three-dimensional imaging capability and its sub-diffraction spatial resolution, achieved through the closure of a pinhole in front of a single-element detector. However, the pinhole also rejects useful photons and beating the diffraction limit comes at the price of irremediably compromising the signal-to-noise ratio (SNR) of the data. Image scanning microscopy (ISM) emerged as the natural evolution of CLSM, exploiting a small array detector in place of the pinhole and the single-element detector. Each sensitive element is small enough to achieve sub-diffraction resolution through the confocal effect, but the size of the whole detector is large enough to guarantee excellent collection efficiency and SNR. However, the raw data produced by an ISM setup consists of a 4D dataset which can be seen as a set of confocal-like images. Thus, fusing the dataset into a single super-resolved image requires a dedicated reconstruction algorithm. Conventional methods are multi-image deconvolution, which requires prior knowledge of the system point spread functions (PSF), or adaptive pixel reassignment (APR), which is effective only on a limited range of experimental conditions. In this work, we describe and validate a novel concept for ISM image reconstruction based on autocorrelation inversion. We leverage unique properties of the autocorrelation to discard low-frequency components and maximize the resolution of the reconstructed image, without any assumption on the image or any knowledge of the PSF. Our results push the quality of the ISM reconstruction beyond the level provided by APR and open new perspectives for multi-dimensional image processing.

Submitted 13 April, 2024; originally announced April 2024.

Comments: 21 pages, 5 figures

arXiv:2311.12992 [pdf, other]

doi 10.1109/ARSO56563.2023.10187536

FollowMe: a Robust Person Following Framework Based on Re-Identification and Gestures

Authors: Federico Rollo, Andrea Zunino, Gennaro Raiola, Fabio Amadio, Arash Ajoudani, Nikolaos Tsagarakis

Abstract: Human-robot interaction (HRI) has become a crucial enabler in houses and industries for facilitating operational flexibility. When it comes to mobile collaborative robots, this flexibility can be further increased due to the autonomous mobility and navigation capacity of the robotic agents, expanding their workspace and consequently, the personalizable assistance they can provide to the human operators. This, however, requires that the robot is capable of detecting and identifying the human counterpart in all stages of the collaborative task, and in particular while following a human in crowded workplaces. To respond to this need, we developed a unified perception and navigation framework, which enables the robot to identify and follow a target person using a combination of visual Re-Identification (Re-ID), hand gesture detection, and collision-free navigation. The Re-ID module can autonomously learn the features of a target person and use the acquired knowledge to visually re-identify the target. The navigation stack is used to follow the target while avoiding obstacles and other individuals in the environment. Experiments are conducted with few subjects in a laboratory setting where some unknown dynamic obstacles are introduced.

Submitted 21 November, 2023; originally announced November 2023.

Comments: published in "2023 IEEE International Conference on Advanced Robotics and Its Social Impacts (ARSO)"

arXiv:2310.19413 [pdf, other]

CARPE-ID: Continuously Adaptable Re-identification for Personalized Robot Assistance

Authors: Federico Rollo, Andrea Zunino, Nikolaos Tsagarakis, Enrico Mingo Hoffman, Arash Ajoudani

Abstract: In today's Human-Robot Interaction (HRI) scenarios, a prevailing tendency exists to assume that the robot shall cooperate with the closest individual or that the scene involves merely a single human actor. However, in realistic scenarios, such as shop floor operations, such an assumption may not hold and personalized target recognition by the robot in crowded environments is required. To fulfil this requirement, in this work, we propose a person re-identification module based on continual visual adaptation techniques that ensure the robot's seamless cooperation with the appropriate individual even subject to varying visual appearances or partial or complete occlusions. We test the framework both standalone, using recorded videos in a laboratory environment, and in an HRI scenario, i.e., a person-following task by a mobile robot. The targets are asked to change their appearance during tracking and to disappear from the camera field of view to test the challenging cases of occlusion and outfit variations. We compare our framework with one of the state-of-the-art Multi-Object Tracking (MOT) methods and the results show that CARPE-ID can accurately track each selected target throughout the experiments in all cases except two limit cases. By contrast, the state-of-the-art MOT method makes an average of 4 tracking errors per video.

Submitted 31 January, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

Comments: Accepted to the International Conference on Robotics and Automation (ICRA) 2024

arXiv:2307.05976 [pdf, other]

Borehole fibre-optic seismology inside the Northeast Greenland Ice Stream

Authors: Andreas Fichtner, Coen Hofstede, Lars Gebraad, Andrea Zunino, Dimitri Zigone, Olaf Eisen

Abstract: Ice streams are major contributors to ice sheet mass loss and sea level rise. Effects of their dynamic behaviour are imprinted into seismic properties, such as wave speeds and anisotropy. Here we present results from the first Distributed Acoustic Sensing (DAS) experiment in a deep ice-core borehole in the onset region of the Northeast Greenland Ice Stream. A series of active surface sources produced clear recordings of the P and S wavefield, including internal reflections, along a 1500 m long fibre-optic cable that was lowered into the borehole. The combination of nonlinear traveltime tomography with a firn model constrained by multi-mode surface wave data allows us to invert for P and S wave speeds with depth-dependent uncertainties on the order of only 10 m/s, and vertical resolution of 20–70 m. The wave speed model in conjunction with the regularly spaced DAS data enables a straightforward separation of internal upward reflections, followed by a reverse-time migration that provides a detailed reflectivity image of the ice. While the differences between P and S wave speeds hint at anisotropy related to crystal orientation fabric, the reflectivity image seems to carry a pronounced climatic imprint caused by rapid variations in grain size. Currently, resolution is not limited by the DAS channel spacing. Instead, the maximum frequency of body waves below ~200 Hz, low signal-to-noise ratio caused by poor coupling, and systematic errors produced by the ray approximation appear to be the leading-order issues. Among these, only the latter has a simple existing solution in the form of full-waveform inversion. Improving signal bandwidth and quality, however, will likely require a significantly larger effort in terms of both sensing equipment and logistics.

Submitted 12 July, 2023; originally announced July 2023.

Comments: 15 pages, 14 figures
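Traveltime tomography in a borehole starts from a forward model that predicts arrival times at each DAS channel. The simplest such model, assumed here purely for illustration, is a straight vertical ray through a layered (1D) medium, so that t(z) is the sum of layer thickness divided by layer speed down to depth z. The function name and the two-layer numbers in the check are hypothetical and are not the paper's firn or ice model; the paper itself notes that the ray approximation introduces systematic errors.

```python
def vertical_traveltimes(layer_tops, velocities, depths):
    """Traveltime of a vertically travelling wave from z = 0 to each depth.

    layer_tops: increasing depths (m) where each layer starts; layer_tops[0] == 0
    velocities: wave speed (m/s) in each layer
    depths: receiver (e.g. DAS channel) depths in m
    Assumes straight vertical rays (a simplification, not the paper's model).
    """
    times = []
    for z in depths:
        t = 0.0
        for k, top in enumerate(layer_tops):
            bottom = layer_tops[k + 1] if k + 1 < len(layer_tops) else float("inf")
            if z <= top:
                break
            # Path length inside layer k, capped at the receiver depth
            t += (min(z, bottom) - top) / velocities[k]
        times.append(t)
    return times
```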

arXiv:2307.01121 [pdf, other]

Artifacts Mapping: Multi-Modal Semantic Mapping for Object Detection and 3D Localization

Authors: Federico Rollo, Gennaro Raiola, Andrea Zunino, Nikolaos Tsagarakis, Arash Ajoudani

Abstract: Geometric navigation is nowadays a well-established field of robotics and the research focus is shifting towards higher-level scene understanding, such as Semantic Mapping. When a robot needs to interact with its environment, it must be able to comprehend the contextual information of its surroundings. This work focuses on classifying and localising objects within a map, which is under construction (SLAM) or already built. To further explore this direction, we propose a framework that can autonomously detect and localize predefined objects in a known environment using a multi-modal sensor fusion approach (combining RGB and depth data from an RGB-D camera and a lidar). The framework consists of three key elements: understanding the environment through RGB data, estimating depth through multi-modal sensor fusion, and managing artifacts (i.e., filtering and stabilizing measurements). The experiments show that the proposed framework can accurately detect 98% of the objects in the real sample environment, without post-processing, while 85% and 80% of the objects were mapped using the single RGB-D camera or RGB + lidar setup, respectively. The comparison with single-sensor (camera or lidar) experiments is performed to show that sensor fusion allows the robot to accurately detect near and far obstacles, which would have been noisy or imprecise in a purely visual or laser-based approach.

Submitted 21 November, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

Comments: Accepted to the 11th European Conference on Mobile Robots (ECMR) 2023
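The abstract estimates depth "through multi-modal sensor fusion" but does not say how the camera and lidar estimates are combined. One standard option, shown here only as an assumed stand-in, is inverse-variance weighting of independent Gaussian estimates: the less noisy sensor dominates, and the fused uncertainty is never larger than the best single sensor's. The function name and the per-sensor sigmas are hypothetical.

```python
def fuse_depths(estimates):
    """Inverse-variance weighted fusion of independent depth estimates.

    estimates: list of (depth_m, sigma_m) pairs, e.g. one from an RGB-D
    camera and one from a lidar (assumes independent Gaussian errors).
    Returns (fused_depth_m, fused_sigma_m).
    """
    weights = [1.0 / (sigma ** 2) for _, sigma in estimates]
    total = sum(weights)
    fused = sum(w * d for w, (d, _) in zip(weights, estimates)) / total
    return fused, (1.0 / total) ** 0.5
```

Because RGB-D depth noise typically grows with distance, a per-measurement sigma lets the lidar take over for far objects while the camera still contributes up close.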

arXiv:2303.10047 [pdf, other]

doi 10.1093/gji/ggad403

HMCLab: a framework for solving diverse geophysical inverse problems using the Hamiltonian Monte Carlo method

Authors: Andrea Zunino, Lars Gebraad, Alessandro Ghirotto, Andreas Fichtner

Abstract: The use of the probabilistic approach to solve inverse problems is becoming more popular in the geophysical community, thanks to its ability to address nonlinear forward problems and to provide uncertainty quantification. However, such a strategy is often tailored to specific applications and therefore there is a lack of a common platform for solving a range of different geophysical inverse problems and showing potential and pitfalls. We demonstrate a common framework to solve such inverse problems ranging from, e.g., earthquake source location to potential field data inversion and seismic tomography. Within this approach, we can provide probabilities related to certain properties or structures of the subsurface. Thanks to its ability to address high-dimensional problems, the Hamiltonian Monte Carlo (HMC) algorithm has emerged as the state-of-the-art tool for solving geophysical inverse problems within the probabilistic framework. HMC requires the computation of gradients, which can be obtained by adjoint methods, making the solution of tomographic problems ultimately feasible. These results can be obtained with "HMCLab", a tool for solving a range of different geophysical inverse problems using sampling methods, focusing in particular on the HMC algorithm. HMCLab consists of a set of samplers and a set of geophysical forward problems. For each problem its misfit function and gradient computation are provided and, in addition, a set of prior models can be combined to inject additional information into the inverse problem. This allows users to experiment with probabilistic inverse problems and also address real-world studies. We show how to solve a selected set of problems within this framework using variants of the HMC algorithm and analyze the results. HMCLab is provided as an open source package written both in Python and Julia, welcoming contributions from the community.

Submitted 17 March, 2023; originally announced March 2023.

Comments: 21 pages, 4 figures

Journal ref: Geophysical Journal International, Volume 235, Issue 3, December 2023, Pages 2979-2991
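The abstract's key point is that HMC needs only a misfit function and its gradient. A minimal, generic HMC sketch (unit mass matrix, leapfrog integrator, Metropolis correction on the total energy) makes that interface concrete. HMCLab's actual samplers and API are not reproduced here; `hmc_sample`, its default step size, and the number of leapfrog steps are illustrative assumptions.

```python
import math
import random

def hmc_sample(misfit, grad, x0, n_samples, step=0.1, n_leapfrog=20, seed=0):
    """Basic Hamiltonian Monte Carlo with a unit mass matrix (sketch).

    misfit(x): U(x) = -log p(x) up to a constant
    grad(x): gradient of U as a list of floats
    Returns (samples, acceptance_rate).
    """
    rng = random.Random(seed)
    x = list(x0)
    samples = []
    accepted = 0
    for _ in range(n_samples):
        p = [rng.gauss(0.0, 1.0) for _ in x]  # fresh Gaussian momenta
        x_new = list(x)
        # Leapfrog: half momentum step, alternating full steps, half momentum step
        g = grad(x_new)
        p_new = [pi - 0.5 * step * gi for pi, gi in zip(p, g)]
        for k in range(n_leapfrog):
            x_new = [xi + step * pi for xi, pi in zip(x_new, p_new)]
            g = grad(x_new)
            if k != n_leapfrog - 1:
                p_new = [pi - step * gi for pi, gi in zip(p_new, g)]
        p_new = [pi - 0.5 * step * gi for pi, gi in zip(p_new, g)]
        # Metropolis correction on H = U(x) + |p|^2 / 2
        h_old = misfit(x) + 0.5 * sum(pi * pi for pi in p)
        h_new = misfit(x_new) + 0.5 * sum(pi * pi for pi in p_new)
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
            accepted += 1
        samples.append(list(x))
    return samples, accepted / n_samples
```

For a standard normal target, U(x) = |x|^2 / 2 and grad U(x) = x; in a tomographic problem, grad would instead come from an adjoint computation.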

arXiv:2211.12510 [pdf, other]

doi 10.1088/1361-6420/accdc5

Reconstructing the Image Scanning Microscopy Dataset: an Inverse Problem

Authors: Alessandro Zunino, Marco Castello, Giuseppe Vicidomini

Abstract: Confocal laser-scanning microscopy (CLSM) is one of the most popular optical architectures for fluorescence imaging. In CLSM, a focused laser beam excites the fluorescence emission from a specific specimen position. Actuators scan the probed region across the sample and a photodetector collects a single intensity value for each scan point, building a two-dimensional image pixel-by-pixel. Recently, new fast single-photon array detectors have allowed the recording of a full two-dimensional image of the probed region for each scan point, transforming CLSM into image scanning microscopy (ISM). The latter offers significant improvements over traditional imaging but requires an optimal processing tool to extract a super-resolved image from the four-dimensional dataset. Here we describe the image formation process in ISM from a statistical point of view, and we use the Bayesian framework to formulate a multi-image deconvolution problem. Notably, the single-photon detector suffers exclusively from the photon shot noise, enabling the development of an effective likelihood model. We derive an iterative likelihood maximization algorithm and test it on experimental and simulated data. Furthermore, we demonstrate that the ISM dataset is redundant, making it possible to obtain reconstructions sampled at twice the scanning step. Our results prove that in ISM, under appropriate conditions, the Nyquist-Shannon sampling criterion is effectively relaxed. This finding can be exploited to speed up the acquisition process by a factor of four, further improving the versatility of ISM systems.

Submitted 22 November, 2022; originally announced November 2022.
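Since the detector is described as limited only by shot noise, the natural likelihood is Poisson. A classic way to maximise a Poisson likelihood for several images of the same object is the multiplicative multi-image Richardson-Lucy (EM) update, sketched here in 1D with circular convolution. This is an assumed illustration: the helper names are hypothetical, and the paper's own algorithm (including its super-sampled reconstruction) may differ.

```python
import math

def convolve_circular(x, h):
    """Circular convolution (h * x)[n] = sum_m h[m] x[n - m]."""
    n = len(x)
    return [sum(h[m] * x[(i - m) % n] for m in range(len(h))) for i in range(n)]

def correlate_circular(y, h):
    """Adjoint of convolve_circular: (h^T y)[n] = sum_m h[m] y[n + m]."""
    n = len(y)
    return [sum(h[m] * y[(i + m) % n] for m in range(len(h))) for i in range(n)]

def poisson_loglik(data, psfs, x):
    """Poisson log-likelihood of all images, dropping the log(d!) constant."""
    ll = 0.0
    for d, h in zip(data, psfs):
        for di, mi in zip(d, convolve_circular(x, h)):
            ll += di * math.log(mi) - mi
    return ll

def multi_image_rl(data, psfs, n_iter=50):
    """Multi-image Richardson-Lucy (EM) iterations; data must be positive."""
    n = len(data[0])
    x = [sum(sum(d) for d in data) / (len(data) * n)] * n  # flat positive start
    norm = sum(sum(h) for h in psfs)  # sum_k h_k^T 1, constant for circular conv
    for _ in range(n_iter):
        back = [0.0] * n
        for d, h in zip(data, psfs):
            model = convolve_circular(x, h)
            ratio = [di / mi for di, mi in zip(d, model)]
            back = [b + c for b, c in zip(back, correlate_circular(ratio, h))]
        x = [xi * bi / norm for xi, bi in zip(x, back)]
    return x
```

As an EM algorithm, each iteration cannot decrease the Poisson log-likelihood, and when every PSF sums to one the estimate's total flux equals the mean total count per image.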

arXiv:2201.09509 [pdf, other]

Diffuse ultrasound computed tomography for medical imaging

Authors: Ines Elisa Ulrich, Christian Boehm, Andrea Zunino, Cyrill Bösch, Andreas Fichtner

Abstract: An alternative approach to ultrasound computed tomography (USCT) for medical imaging is proposed, with the intent to (i) shorten acquisition time for devices with a large number of emitters, (ii) eliminate the calibration step, and (iii) suppress instrument noise. Inspired by seismic ambient field interferometry, the method rests on the active excitation of diffuse ultrasonic wavefields and the extraction of deterministic travel time information by inter-station correlation. To reduce stochastic errors and accelerate convergence, ensemble interferograms are obtained by phase-weighted stacking of observed and computed correlograms, generated with identical realizations of random sources. Mimicking a breast imaging setup, the accuracy of the travel time measurements as a function of the number of emitters and random realizations can be assessed both analytically and with spectral-element simulations for realistic breast phantoms. The results warrant tomographic reconstructions with straight- or bent-ray approaches, where the effect of inherent stochastic fluctuations can be made significantly smaller than the effect of subjective choices on regularisation. This work constitutes a first conceptual study and a necessary prelude to future implementations.

Submitted 24 January, 2022; originally announced January 2022.

Comments: 18 pages, 9 figures

MSC Class: 86-10; 86A22
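The core measurement in this approach is a traveltime read from the peak of an inter-station correlation. The basic step, finding the lag that maximises the circular cross-correlation of two records, can be sketched as follows. The ensemble averaging and phase-weighted stacking described in the abstract are omitted, and the white-noise "diffuse field", the 7-sample delay, and the noise level are assumptions for illustration only.

```python
import random

def circular_xcorr_lag(a, b):
    """Lag k (in samples) maximising c[k] = sum_n a[n] * b[(n + k) % N].

    A positive result means record b is delayed with respect to record a;
    dividing by the sampling rate converts it to a traveltime difference.
    """
    n = len(a)
    best_lag, best_val = 0, float("-inf")
    for k in range(n):
        c = sum(a[i] * b[(i + k) % n] for i in range(n))
        if c > best_val:
            best_lag, best_val = k, c
    return best_lag

# Hypothetical example: station B sees the same random wavefield 7 samples
# later than station A, plus independent instrument noise.
rng = random.Random(1)
station_a = [rng.gauss(0.0, 1.0) for _ in range(256)]
station_b = [station_a[(i - 7) % 256] + 0.5 * rng.gauss(0.0, 1.0) for i in range(256)]
print(circular_xcorr_lag(station_a, station_b))
```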

arXiv:2104.09191 [pdf, other]

Compact CNN Structure Learning by Knowledge Distillation

Authors: Waqar Ahmed, Andrea Zunino, Pietro Morerio, Vittorio Murino

Abstract: Compressing deep Convolutional Neural Networks (CNNs) is essential for running them within the limited computation, power, and memory resources of embedded devices. However, existing methods achieve this objective at the cost of a drop in inference accuracy in computer vision tasks. To address such a drawback, we propose a framework that leverages knowledge distillation along with customizable block-wise optimization to learn a lightweight CNN structure while preserving better control over the compression-performance tradeoff. Considering specific resource constraints, e.g., floating-point operations per inference (FLOPs) or model parameters, our method results in state-of-the-art network compression while achieving better inference accuracy. In a comprehensive evaluation, we demonstrate that our method is effective, robust, and consistent across a variety of network architectures and datasets, at negligible training overhead. In particular, for the already compact network MobileNet_v2, our method offers up to 2x and 5.2x better model compression in terms of FLOPs and model parameters, respectively, while achieving 1.05% better model performance than the baseline network.

Submitted 19 April, 2021; originally announced April 2021.

Comments: This paper has been accepted to ICPR 2020
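The framework above builds on knowledge distillation. For reference, here is a sketch of the standard soft-target distillation loss (Hinton-style: a hard-label cross-entropy plus a temperature-softened KL term scaled by T^2). The paper's full objective, including its block-wise optimization, may differ; the function names and the defaults temperature=4 and alpha=0.5 are assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Soft-target knowledge-distillation loss for one example (sketch)."""
    p_t = softmax(teacher_logits, temperature)  # softened teacher targets
    p_s = softmax(student_logits, temperature)  # softened student predictions
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    ce = -math.log(softmax(student_logits)[label])  # hard-label cross-entropy
    # The T^2 factor keeps soft-target gradients on the same scale as the hard term
    return alpha * ce + (1.0 - alpha) * temperature ** 2 * kl
```

A higher temperature exposes more of the teacher's "dark knowledge" (relative scores of wrong classes), which is what lets a compact student recover accuracy lost to compression.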

arXiv:2005.14398 [pdf, other]

doi 10.1093/gji/ggab173

Informed Proposal Monte Carlo

Authors: Sarouyeh Khoshkholgh, Andrea Zunino, Klaus Mosegaard

Abstract: Any search or sampling algorithm for the solution of inverse problems needs guidance to be efficient. Many algorithms collect and apply information about the problem on the fly, and much improvement has been made in this way. However, as a consequence of the No-Free-Lunch Theorem, the only way we can ensure a significantly better performance of search and sampling algorithms is to build in as much information about the problem as possible. In the special case of Markov Chain Monte Carlo sampling (MCMC) we review how this is done through the choice of proposal distribution, and we show how this way of adding more information about the problem can be made particularly efficient when based on an approximate physics model of the problem. A highly nonlinear inverse scattering problem with a high-dimensional model space serves as an illustration of the gain of efficiency through this approach.

Submitted 29 May, 2020; originally announced May 2020.
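The idea of injecting problem knowledge through the proposal distribution can be illustrated with its simplest form: an independence Metropolis-Hastings sampler whose proposal is a cheap approximation of the posterior. The acceptance ratio includes the proposal density, so any mismatch between the approximation and the true target is corrected exactly. This is a generic sketch, not the paper's algorithm; the Gaussian target and proposal in the check stand in, hypothetically, for a posterior and an approximate-physics proposal.

```python
import math
import random

def independence_mh(log_target, sample_proposal, log_proposal, x0, n_samples, seed=0):
    """Metropolis-Hastings with a state-independent proposal q (sketch).

    log_target(x): log posterior up to a constant
    sample_proposal(rng): draws y ~ q
    log_proposal(x): log q(x) up to a constant
    """
    rng = random.Random(seed)
    x = x0
    chain = []
    for _ in range(n_samples):
        y = sample_proposal(rng)
        # Acceptance ratio: [p(y) q(x)] / [p(x) q(y)]
        log_alpha = (log_target(y) - log_target(x)) + (log_proposal(x) - log_proposal(y))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = y
        chain.append(x)
    return chain
```

The closer q is to the posterior, the closer the acceptance rate gets to one; a proposal somewhat wider than the target keeps the sampler robust.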

arXiv:2003.06498 [pdf, other]

Explainable Deep Classification Models for Domain Generalization

Authors: Andrea Zunino, Sarah Adel Bargal, Riccardo Volpi, Mehrnoosh Sameki, Jianming Zhang, Stan Sclaroff, Vittorio Murino, Kate Saenko

Abstract: Conventionally, AI models are thought to trade off accuracy for explainability. We develop a training strategy that not only leads to a more explainable AI system for object classification, but as a consequence, suffers no perceptible accuracy degradation. Explanations are defined as regions of visual evidence upon which a deep classification network makes a decision. This is represented in the form of a saliency map conveying how much each pixel contributed to the network's decision. Our training strategy enforces periodic saliency-based feedback to encourage the model to focus on the image regions that directly correspond to the ground-truth object. We quantify explainability using an automated metric, and using human judgement. We propose explainability as a means for bridging the visual-semantic gap between different domains where model explanations are used as a means of disentangling domain-specific information from otherwise relevant features. We demonstrate that this leads to improved generalization to new domains without hindering performance on the original domain.

Submitted 13 March, 2020; originally announced March 2020.

arXiv:1907.08498 [pdf, other]

Dynamic multi-focus laser writing with acousto-optofluidics

Authors: A. Zunino, S. Surdo, M. Duocastella

Abstract: Laser writing of materials is normally performed by the sequential scanning of a single focused beam across a sample. This process is time-consuming and it can severely limit the throughput of laser systems in key applications such as surgery, microelectronics, or manufacturing. Here we report a parallelization strategy based on ultrasound waves in a liquid to diffract light into multiple beamlets. Adjusting amplitude, frequency, or phase of ultrasound allows tunable multi-focus distributions with sub-microsecond control. When combined with sample translation, the dynamic splitting of light leads to high-throughput laser processing, as demonstrated by locally modifying the morphological and wettability properties of metals, polymers, and ceramics. The results illustrate how acousto-optofluidic systems are universal tools for fast multi-focus generation, with potential impact in fields such as imaging or optical trapping.

Submitted 19 July, 2019; originally announced July 2019.

arXiv:1812.02626 [pdf, other]

Guided Zoom: Questioning Network Evidence for Fine-grained Classification

Authors: Sarah Adel Bargal, Andrea Zunino, Vitali Petsiuk, Jianming Zhang, Kate Saenko, Vittorio Murino, Stan Sclaroff

Abstract: We propose Guided Zoom, an approach that utilizes spatial grounding of a model's decision to make more informed predictions. It does so by making sure the model has "the right reasons" for a prediction, defined as reasons that are coherent with those used to make similar correct decisions at training time. The reason/evidence upon which a deep convolutional neural network makes a prediction is defined to be the spatial grounding, in the pixel space, for a specific class conditional probability in the model output. Guided Zoom examines how reasonable such evidence is for each of the top-k predicted classes, rather than solely trusting the top-1 prediction. We show that Guided Zoom improves the classification accuracy of a deep convolutional neural network model and obtains state-of-the-art results on three fine-grained classification benchmark datasets.

Submitted 23 March, 2020; v1 submitted 6 December, 2018; originally announced December 2018.

Comments: BMVC 2019 Camera Ready Version

arXiv:1808.10492 [pdf, other]

Towards a Service-oriented Platform for Intelligent Apps in Intermediate Cities

Authors: J. Andres Diaz-Pace, Luis Berdun, Alejandro Zunino, Silvia Schiaffino

Abstract: Smart cities are a growing trend in many cities in Argentina. In particular, the so-called intermediate cities present a context and requirements different from those of large cities with respect to smart cities. One relevant aspect is to encourage the development of applications (generally for mobile devices) that enable citizens to take advantage of data and services normally associated with the city, for example, in the urban mobility domain. In this work, we propose a platform for intermediate cities that provides "high level" services and allows the construction of software applications that consume those services. Our platform-centric strategy aims to integrate heterogeneous systems and data sources, and to provide "intelligent" services to different applications. Examples of these services include the construction of user profiles, recommendation of local events, and collaborative sensing based on data mining techniques, among others. In this work, the design of this platform (currently in progress) is described, and experiences with urban mobility applications, which are being migrated into reusable services provided by the platform, are discussed.

Submitted 27 August, 2018; originally announced August 2018.

Comments: in Spanish, Accepted at ICSC-CITIES 2018

arXiv:1805.09092 [pdf, other]

Excitation Dropout: Encouraging Plasticity in Deep Neural Networks

Authors: Andrea Zunino, Sarah Adel Bargal, Pietro Morerio, Jianming Zhang, Stan Sclaroff, Vittorio Murino

Abstract: We propose a guided dropout regularizer for deep networks based on the evidence of a network prediction, defined as the firing of neurons in specific paths. In this work, we utilize the evidence at each neuron to determine the probability of dropout, rather than dropping out neurons uniformly at random as in standard dropout. In essence, we drop out with higher probability those neurons that contribute more to decision making at training time. This approach penalizes high-saliency neurons that are most relevant for model prediction, i.e., those having stronger evidence. By dropping such high-saliency neurons, the network is forced to learn alternative paths in order to keep minimizing the loss, resulting in a plasticity-like behavior, a characteristic also found in human brains. We demonstrate better generalization ability, an increased utilization of network neurons, and a higher resilience to network compression using several metrics over four image/video recognition benchmarks.

Submitted 21 January, 2021; v1 submitted 23 May, 2018; originally announced May 2018.

Comments: This work is published in the International Journal of Computer Vision (IJCV) in 2021

arXiv:1711.06778 [pdf, other]

Excitation Backprop for RNNs

Authors: Sarah Adel Bargal, Andrea Zunino, Donghyun Kim, Jianming Zhang, Vittorio Murino, Stan Sclaroff

Abstract: Deep models are state-of-the-art for many vision tasks, including video action recognition and video captioning. Models are trained to caption or classify activity in videos, but little is known about the evidence used to make such decisions. Grounding decisions made by deep networks has been studied for spatial visual content, giving more insight into model predictions for images. However, such studies are relatively lacking for models of spatiotemporal visual content, i.e., videos. In this work, we devise a formulation that simultaneously grounds evidence in space and time, in a single pass, using top-down saliency. We visualize the spatiotemporal cues that contribute to a deep model's classification/captioning output using the model's internal representation. Based on these spatiotemporal cues, we are able to localize segments within a video that correspond to a specific action or to a phrase from a caption, without explicitly optimizing/training for these tasks.

Submitted 8 March, 2018; v1 submitted 17 November, 2017; originally announced November 2017.

Comments: CVPR 2018 Camera Ready Version

Journal ref: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018

arXiv:1708.01034 [pdf, other]

doi 10.1109/CVPRW.2017.7

What Will I Do Next? The Intention from Motion Experiment

Authors: Andrea Zunino, Jacopo Cavazza, Atesh Koul, Andrea Cavallo, Cristina Becchio, Vittorio Murino

Abstract: In computer vision, video-based approaches have been widely explored for the early classification and prediction of actions or activities. However, it remains unclear whether this modality (as compared to 3D kinematics) can still be reliable for the prediction of human intentions, defined as the overarching goal embedded in an action sequence. Since the same action can be performed with different intentions, this problem is more challenging, yet still tractable, as shown by quantitative cognitive studies that exploit the 3D kinematics acquired through motion capture systems. In this paper, we bridge cognitive and computer vision studies by demonstrating the effectiveness of video-based approaches for the prediction of human intentions. Specifically, we propose Intention from Motion, a new paradigm where, without using any contextual information, we consider instantaneous grasping motor acts involving a bottle in order to forecast why the bottle has been reached (to pass it, to place it in a box, to pour the liquid inside, or to drink it). We process only the grasping onsets, casting intention prediction as a classification problem. Leveraging our multimodal acquisition (3D motion capture data and 2D optical videos), we compare the most commonly used 3D descriptors from cognitive studies with state-of-the-art video-based techniques. Since the two analyses achieve equivalent performance, we demonstrate that computer vision tools are effective in capturing the kinematics and addressing the cognitive problem of human intention prediction.

Submitted 3 August, 2017; originally announced August 2017.

Comments: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops

arXiv:1605.09526 [pdf, other]

Predicting Human Intentions from Motion Only: A 2D+3D Fusion Approach

Authors: Andrea Zunino, Jacopo Cavazza, Atesh Koul, Andrea Cavallo, Cristina Becchio, Vittorio Murino

Abstract: In this paper, we address the new problem of predicting human intents. There is neuropsychological evidence that actions performed by humans are preceded by distinctive motor acts that are discriminative of the type of action to be performed afterwards. In other words, an actual intent can be forecast by looking at the kinematics of the immediately preceding movement. To prove this computationally and quantitatively, we devise a new experimental setup where, without using contextual information, we predict human intents that all originate from the same motor act. We pose the problem as a classification task and introduce a new multimodal dataset consisting of 3D motion capture marker data and 2D video sequences, where, by analysing only very similar movements in both training and test phases, we are able to predict the underlying intent, i.e., the future, never-observed action. We also present an extensive experimental evaluation as a baseline, customizing state-of-the-art techniques for both 3D and 2D data analysis. Observing that video processing methods lead to inferior performance but provide information complementary to 3D data sequences, we develop a 2D+3D fusion analysis that achieves better classification accuracy, attesting to the superiority of the multimodal approach for the context-free prediction of human intents.

Submitted 6 September, 2017; v1 submitted 31 May, 2016; originally announced May 2016.

Comments: Accepted as a poster at the 25th ACM Multimedia (ACM MM) 2017, Mountain View, California, USA

arXiv:1605.00392 [pdf, other]

Revisiting Human Action Recognition: Personalization vs. Generalization

Authors: Andrea Zunino, Jacopo Cavazza, Vittorio Murino

Abstract: By thoroughly revisiting the classic human action recognition paradigm, this paper proposes a new approach for the design of effective action classification systems. Taking publicly available three-dimensional (MoCap) action/activity datasets as a testbed, we analyzed and validated different training/testing strategies. In particular, since each human action in the datasets is performed several times by different subjects, we were able to precisely quantify the effect of inter- and intra-subject variability, and thus assess the impact of several learning approaches on classification performance. The net result is that standard testing strategies consisting of cross-validating the algorithm using typical splits of the data (holdout, k-fold, or one-subject-out) are always outperformed by a "personalization" strategy that learns how a subject performs an action. In other words, it is advantageous to customize (i.e., personalize) the method to learn the actions carried out by each subject, rather than trying to generalize action executions across subjects. Consequently, we propose an action recognition framework consisting of a two-stage classification approach where, given a test action, the subject is first identified before the actual recognition of the action takes place.
Despite the basic, off-the-shelf descriptors and standard classifiers adopted, we observed a notable increase in performance with respect to standard state-of-the-art algorithms, thus motivating the use of personalized approaches for designing effective action recognition systems.

Submitted 2 May, 2016; originally announced May 2016.

arXiv:1604.06582 [pdf, other]

Kernelized Covariance for Action Recognition

Authors: Jacopo Cavazza, Andrea Zunino, Marco San Biagio, Vittorio Murino

Abstract: In this paper, we aim to increase the descriptive power of the covariance matrix, which is limited to capturing only linear mutual dependencies between variables. We present a rigorous and principled mathematical pipeline to recover the kernel trick for computing the covariance matrix, enhancing it to model more complex, non-linear relationships conveyed by the raw data. To this end, we propose Kernelized-COV, which generalizes the original covariance representation without compromising the efficiency of the computation. In the experiments, we validate the proposed framework against many previous approaches in the literature, scoring on par with or better than the state of the art on benchmark datasets for 3D action recognition.

Submitted 2 September, 2016; v1 submitted 22 April, 2016; originally announced April 2016.

Comments: Accepted paper at the 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 2016

arXiv:cs/0007004 [pdf, ps, other]

Brainstorm/J: a Java Framework for Intelligent Agents

Authors: Alejandro Zunino, Analia Amandi

Abstract: Despite the efforts of many researchers in the area of multi-agent systems (MAS) on designing and programming agents, it was only a few years ago that the research community began to recognize that different MAS share common features. Based on these common features, several tools have tackled the problem of agent development for specific application domains or specific types of agents. As a consequence, their scope is restricted to a subset of the huge application domain of MAS. In this paper, we propose Brainstorm/J, a generic infrastructure for programming agents. The infrastructure has been implemented as an object-oriented framework. As a consequence, our approach supports a broader scope of MAS applications than previous efforts, while being flexible and reusable.

Submitted 4 July, 2000; originally announced July 2000.

Comments: 15 pages. To be published in Proceedings of the Second Argentinian Symposium on Artificial Intelligence (ASAI'2000 - 29th JAIIO). September 2000. Tandil, Buenos Aires, Argentina. See http://www.exa.unicen.edu.ar/~azunino

ACM Class: I.2.11

Showing 1–23 of 23 results for author: Zunino, A