
Meta Learning-Driven Iterative Refinement for Robust Anomaly Detection in Industrial Inspection

Authors: Muhammad Aqeel, Shakiba Sharifi, Marco Cristani, Francesco Setti

Abstract: This study investigates the performance of robust anomaly detection models in industrial inspection, focusing particularly on their ability to handle noisy data. We propose to leverage the adaptation ability of meta learning approaches to identify and reject noisy training data to improve the learning process. In our model, we employ Model Agnostic Meta Learning (MAML) and an iterative refinement process through an Inter-Quartile Range rejection scheme to enhance their adaptability and robustness. This approach significantly improves the models' capability to distinguish between normal and defective conditions. Our experiments on the well-known MVTec and KSDD2 datasets demonstrate that the proposed method not only excels in environments with substantial noise but can also contribute in the case of a clean training set, isolating those samples that are relatively out of distribution, thus offering significant improvements over traditional models.
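A single Inter-Quartile Range rejection step can be sketched as below. This is a minimal sketch, assuming each training sample already has an anomaly score where higher means more anomalous, and assuming the conventional Tukey factor k = 1.5 with only an upper fence; the MAML-based scoring and the paper's exact iteration schedule are not shown, and `iqr_reject` is a hypothetical name.

```python
import numpy as np

def iqr_reject(scores, k=1.5):
    """Return a boolean mask: True = keep the sample, False = reject it.

    Assumption: higher score means more anomalous, so only samples above
    the upper fence Q3 + k * IQR are treated as suspected noisy data.
    """
    scores = np.asarray(scores, dtype=float)
    q1, q3 = np.percentile(scores, [25, 75])
    upper_fence = q3 + k * (q3 - q1)
    return scores <= upper_fence
```

For example, `iqr_reject([0.10, 0.12, 0.11, 0.13, 0.09, 0.95])` keeps the first five samples and rejects the last one; in an iterative scheme the model would be retrained on the kept samples and the step repeated.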

Submitted 3 March, 2025; originally announced March 2025.

Comments: Accepted in the VISION workshop at ECCV 2024

arXiv:2411.13953 [pdf, other]

Material synthesis through simulations guided by machine learning: a position paper

Authors: Usman Syed, Federico Cunico, Uzair Khan, Eros Radicchi, Francesco Setti, Adolfo Speghini, Paolo Marone, Filiberto Semenzin, Marco Cristani

Abstract: In this position paper, we propose an approach for sustainable data collection in the field of optimal mix design for marble sludge reuse. Marble sludge, a calcium-rich residue from stone-cutting processes, can be repurposed by mixing it with various ingredients. However, determining the optimal mix design is challenging due to the variability in sludge composition and the costly, time-consuming nature of experimental data collection. We also investigate the possibility of using machine learning models, with meta-learning as an optimization tool, to estimate the correct quantity of stone-cutting sludge to be used in aggregates to obtain a mix design with specific mechanical properties that can be used successfully in the building industry. Our approach offers two key advantages: (i) through simulations, a large dataset can be generated, saving time and money during the data collection phase, and (ii) machine learning models, with performance enhanced through hyper-parameter optimization via meta-learning, can estimate optimal mix designs, reducing the need for extensive manual experimentation, lowering costs, minimizing environmental impact, and accelerating the processing of quarry sludge. Our idea promises to streamline the marble sludge reuse process by leveraging collective data and advanced machine learning, promoting sustainability and efficiency in the stone-cutting sector.

Submitted 26 November, 2024; v1 submitted 21 November, 2024; originally announced November 2024.

arXiv:2408.11561 [pdf, other]

doi 10.5220/0013178100003912

Self-Supervised Iterative Refinement for Anomaly Detection in Industrial Quality Control

Authors: Muhammad Aqeel, Shakiba Sharifi, Marco Cristani, Francesco Setti

Abstract: This study introduces the Iterative Refinement Process (IRP), a robust anomaly detection methodology designed for high-stakes industrial quality control. The IRP enhances defect detection accuracy through a cyclic data refinement strategy, iteratively removing misleading data points to improve model performance and robustness. We validate the IRP's effectiveness using two benchmark datasets, Kolektor SDD2 (KSDD2) and MVTec AD, covering a wide range of industrial products and defect types. Our experimental results demonstrate that the IRP consistently outperforms traditional anomaly detection models, particularly in environments with high noise levels. This study highlights the IRP's potential to significantly enhance anomaly detection processes in industrial settings, effectively managing the challenges of sparse and noisy data.
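The cyclic refinement idea (fit, score, drop misleading points, repeat) can be illustrated with a toy loop. This is a hypothetical sketch, not the IRP itself: the "model" is just the mean feature vector, the score is the Euclidean distance to it, and the cut-off `mean + n_std * std` of the distances is an assumed rejection rule chosen for illustration.

```python
import numpy as np

def iterative_refinement(features, max_rounds=10, n_std=2.0):
    """Return indices of samples that survive the refinement loop.

    Hypothetical stand-in: each round fits a toy model (the mean of the
    kept samples), scores samples by distance to it, and drops those
    beyond mean + n_std * std of the distances. Stops when nothing is
    removed or after max_rounds.
    """
    data = np.asarray(features, dtype=float)
    keep = np.arange(len(data))
    for _ in range(max_rounds):
        centre = data[keep].mean(axis=0)                    # toy "model"
        dist = np.linalg.norm(data[keep] - centre, axis=1)  # anomaly scores
        threshold = dist.mean() + n_std * dist.std()
        inliers = dist <= threshold
        if inliers.all():  # converged: no misleading points left
            break
        keep = keep[inliers]
    return keep
```

With ten identical points and one distant outlier, the first round removes the outlier and the second round removes nothing, so the loop stops.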

Submitted 3 March, 2025; v1 submitted 21 August, 2024; originally announced August 2024.

Comments: Accepted to VISAPP 2025

Journal ref: In Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP, ISBN 978-989-758-728-3, ISSN 2184-4321, pages 173-183 (2025)

arXiv:2407.11763 [pdf, other]

Enhancing Split Computing and Early Exit Applications through Predefined Sparsity

Authors: Luigi Capogrosso, Enrico Fraccaroli, Giulio Petrozziello, Francesco Setti, Samarjit Chakraborty, Franco Fummi, Marco Cristani

Abstract: In the past decade, Deep Neural Networks (DNNs) achieved state-of-the-art performance in a broad range of problems, spanning from object classification and action recognition to smart building and healthcare. The flexibility that makes DNNs such a pervasive technology comes at a price: the computational requirements preclude their deployment on most of the resource-constrained edge devices available today to solve real-time and real-world tasks. This paper introduces a novel approach to address this challenge by combining the concept of predefined sparsity with Split Computing (SC) and Early Exit (EE). In particular, SC aims at splitting a DNN with a part of it deployed on an edge device and the rest on a remote server. Instead, EE allows the system to stop using the remote server and rely solely on the edge device's computation if the answer is already good enough. Specifically, how to apply such a predefined sparsity to a SC and EE paradigm has never been studied. This paper studies this problem and shows how predefined sparsity significantly reduces the computational, storage, and energy burdens during the training and inference phases, regardless of the hardware platform. This makes it a valuable approach for enhancing the performance of SC and EE applications. Experimental results showcase reductions exceeding 4x in storage and computational complexity without compromising performance. The source code is available at https://github.com/intelligolabs/sparsity_sc_ee.
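The SC/EE decision described above can be sketched as follows. This is a minimal sketch, assuming a confidence-based exit rule (maximum softmax probability compared to a threshold, a common EE criterion that the abstract does not specify); `edge_head` and `server_tail` are hypothetical callables standing for the edge-side exit classifier and the server-side remainder of the split network. Predefined sparsity is not modelled here.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit_predict(features, edge_head, server_tail, threshold=0.9):
    """Return (predicted_class, used_server).

    edge_head and server_tail are hypothetical callables mapping the
    intermediate features computed on the edge device to class logits.
    """
    probs = softmax(edge_head(features))
    if probs.max() >= threshold:
        # Confident enough: answer locally, nothing is sent to the server.
        return int(probs.argmax()), False
    # Otherwise transmit the intermediate features and finish on the server.
    return int(np.argmax(server_tail(features))), True
```

A higher `threshold` sends more samples to the server (more accurate, more communication); a lower one keeps more of the computation on the edge device.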

Submitted 16 July, 2024; originally announced July 2024.

Comments: Accepted at the 27th Forum on Specification and Design Languages (FDL 2024)

arXiv:2407.03961 [pdf, other]

Leveraging Latent Diffusion Models for Training-Free In-Distribution Data Augmentation for Surface Defect Detection

Authors: Federico Girella, Ziyue Liu, Franco Fummi, Francesco Setti, Marco Cristani, Luigi Capogrosso

Abstract: Defect detection is the task of identifying defects in production samples. Usually, defect detection classifiers are trained on ground-truth data formed by normal samples (negative data) and samples with defects (positive data), where the latter are consistently fewer than normal samples. State-of-the-art data augmentation procedures add synthetic defect data by superimposing artifacts to normal samples to mitigate problems related to unbalanced training data. These techniques often produce out-of-distribution images, resulting in systems that learn what is not a normal sample but cannot accurately identify what a defect looks like. In this work, we introduce DIAG, a training-free Diffusion-based In-distribution Anomaly Generation pipeline for data augmentation. Unlike conventional image generation techniques, we implement a human-in-the-loop pipeline, where domain experts provide multimodal guidance to the model through text descriptions and region localization of the possible anomalies. This strategic shift enhances the interpretability of results and fosters a more robust human feedback loop, facilitating iterative improvements of the generated outputs. Remarkably, our approach operates in a zero-shot manner, avoiding time-consuming fine-tuning procedures while achieving superior performance. We demonstrate the efficacy and versatility of DIAG with respect to state-of-the-art data augmentation approaches on the challenging KSDD2 dataset, with an improvement in AP of approximately 18% when positive samples are available and 28% when they are missing. The source code is available at https://github.com/intelligolabs/DIAG.
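For reference, the Average Precision (AP) metric reported above can be computed as the mean of the precision values at the rank of each positive (defective) sample. This is a minimal non-interpolated sketch that does not treat tied scores specially; the exact AP implementation used in the paper is not stated in the abstract and may differ (for example, a library routine).

```python
def average_precision(labels, scores):
    """Non-interpolated AP: mean precision at the rank of each positive.

    labels: 1 for a defective (positive) sample, 0 for a normal one.
    scores: higher means the model considers the sample more defective.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(hits, 1)
```

For labels `[1, 0, 1, 0]` and scores `[0.9, 0.8, 0.7, 0.1]`, the positives sit at ranks 1 and 3, giving AP = (1 + 2/3) / 2 = 5/6.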

Submitted 11 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

Comments: Accepted at the 21st International Conference on Content-Based Multimedia Indexing (CBMI 2024)

arXiv:2406.00501 [pdf, other]

Diffusion-based Image Generation for In-distribution Data Augmentation in Surface Defect Detection

Authors: Luigi Capogrosso, Federico Girella, Francesco Taioli, Michele Dalla Chiara, Muhammad Aqeel, Franco Fummi, Francesco Setti, Marco Cristani

Abstract: In this study, we show that diffusion models can be used in industrial scenarios to improve the data augmentation procedure in the context of surface defect detection. In general, defect detection classifiers are trained on ground-truth data formed by normal samples (negative data) and samples with defects (positive data), where the latter are consistently fewer than normal samples. For these reasons, state-of-the-art data augmentation procedures add synthetic defect data by superimposing artifacts to normal samples. This leads to out-of-distribution augmented data so that the classification system learns what is not a normal sample but does not know what a defect really is. We show that diffusion models overcome this situation, providing more realistic in-distribution defects so that the model can learn the defect's genuine appearance. We propose a novel approach for data augmentation that mixes out-of-distribution with in-distribution samples, which we call In&Out. The approach can deal with two data augmentation setups: i) when no defects are available (zero-shot data augmentation) and ii) when defects are available, which can be in a small number (few-shot) or a large one (full-shot). We focus the experimental part on the most challenging benchmark in the state-of-the-art, i.e., the Kolektor Surface-Defect Dataset 2, defining the new state-of-the-art classification AP score under weak supervision of .782. The code is available at https://github.com/intelligolabs/in_and_out.

Submitted 1 June, 2024; originally announced June 2024.

Comments: Accepted at the 19th International Conference on Computer Vision Theory and Applications (VISAPP 2024)

arXiv:2312.17335 [pdf, other]

doi 10.1103/PhysRevD.111.046004

Quantum-gravitational noise correlation in nearby detectors

Authors: Maulik Parikh, Francesco Setti

Abstract: We consider quantum gravity fluctuations in a pair of nearby gravitational wave detectors. Quantum fluctuations of long-wavelength modes of the gravitational field induce coherent fluctuations in the detectors, leading to correlated noise. We determine the variance and covariance in the lengths of the arms of the detectors, and thereby obtain the graviton noise correlation. We find that the correlation depends on the angle between the detector arms as well as their separation distance. Using our result, we propose an experimental setup to detect this noise. The suggested interferometer configuration can be used to distinguish quantum-gravitational noise from both correlated and uncorrelated sources of background noise.

Submitted 8 February, 2025; v1 submitted 28 December, 2023; originally announced December 2023.

Comments: 10 pages, 4 figures, LaTeX

Journal ref: Phys.Rev.D 111 (2025) 4, 046004

arXiv:2312.17214 [pdf, other]

doi 10.1007/JHEP07(2024)214

Quantum-Gravitational Null Raychaudhuri Equation

Authors: Sang-Eon Bak, Maulik Parikh, Sudipta Sarkar, Francesco Setti

Abstract: We consider a congruence of null geodesics in the presence of a quantized spacetime metric. The coupling to a quantum metric induces fluctuations in the congruence; we calculate the change in the area of a pencil of geodesics induced by such fluctuations. For the gravitational field in its vacuum state, we find that quantum gravity contributes a correction to the null Raychaudhuri equation which is of the same sign as the classical terms. We thus derive a quantum-gravitational focusing theorem valid for linearized quantum gravity.
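For reference, the classical null Raychaudhuri equation in four spacetime dimensions, for a congruence with affinely parametrised tangent $k^a$, expansion $\theta$, shear $\sigma_{ab}$ and twist $\omega_{ab}$, is the standard textbook form below. It is shown only as background; the quantum-gravitational correction derived in the paper is not stated in the abstract and is not reproduced here.

```latex
\frac{d\theta}{d\lambda} = -\frac{1}{2}\theta^{2} - \sigma_{ab}\sigma^{ab} + \omega_{ab}\omega^{ab} - R_{ab}k^{a}k^{b}
```

For a twist-free congruence ($\omega_{ab} = 0$) with $R_{ab}k^{a}k^{b} \ge 0$, every term on the right-hand side is non-positive, which is what the classical focusing theorem relies on; a correction "of the same sign as the classical terms" therefore adds to focusing.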

Submitted 25 July, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

Comments: 15 pages, 1 figure, v2. published version in JHEP

Journal ref: JHEP 07 (2024) 214

arXiv:2309.02887 [pdf, other]

doi 10.1007/978-3-031-43153-1_15

A deep Natural Language Inference predictor without language-specific training data

Authors: Lorenzo Corradi, Alessandro Manenti, Francesca Del Bonifro, Francesco Setti, Dario Del Sorbo

Abstract: In this paper we present a technique of NLP to tackle the problem of inference relation (NLI) between pairs of sentences in a target language of choice without a language-specific training dataset. We exploit a generic translation dataset, manually translated, along with two instances of the same pre-trained model - the first to generate sentence embeddings for the source language, and the second fine-tuned over the target language to mimic the first. This technique is known as Knowledge Distillation. The model has been evaluated over machine translated Stanford NLI test dataset, machine translated Multi-Genre NLI test dataset, and manually translated RTE3-ITA test dataset. We also test the proposed architecture over different tasks to empirically demonstrate the generality of the NLI task. The model has been evaluated over the native Italian ABSITA dataset, on the tasks of Sentiment Analysis, Aspect-Based Sentiment Analysis, and Topic Recognition. We emphasise the generality and exploitability of the Knowledge Distillation technique that outperforms other methodologies based on machine translation, even though the former was not directly trained on the data it was tested over.
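The "mimic the first" step can be illustrated with a toy embedding-distillation objective: the student's embedding of a target-language sentence is pushed towards the teacher's embedding of the parallel source sentence by minimising mean squared error. This is a hypothetical sketch: the student here is a linear map solved in closed form with least squares and the data are synthetic, whereas the paper fine-tunes a second instance of a pre-trained model.

```python
import numpy as np

def distill_linear_student(student_features, teacher_embeddings):
    """Fit a toy linear student W minimising ||X @ W - T||^2.

    X: features of the target-language sentences (n x d_in), hypothetical.
    T: teacher embeddings of the parallel source sentences (n x d_out).
    """
    W, *_ = np.linalg.lstsq(student_features, teacher_embeddings, rcond=None)
    return W

def distillation_mse(student_features, teacher_embeddings, W):
    """Mean squared error between student and teacher embeddings."""
    return float(np.mean((student_features @ W - teacher_embeddings) ** 2))

# Synthetic parallel data (assumption): 20 sentence pairs.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))       # student-side inputs (target language)
T = X @ rng.normal(size=(8, 4))    # teacher embeddings (source language)
W = distill_linear_student(X, T)
```

Because the synthetic teacher embeddings are an exact linear function of the student inputs, the fitted student reproduces them with near-zero MSE; with real sentences the residual measures how well the student mimics the teacher.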

Submitted 6 September, 2023; originally announced September 2023.

Comments: Conference: ICIAP2023

arXiv:2308.00519 [pdf, other]

doi 10.3389/fcomp.2023.1153160

Markerless human pose estimation for biomedical applications: a survey

Authors: Andrea Avogaro, Federico Cunico, Bodo Rosenhahn, Francesco Setti

Abstract: Markerless Human Pose Estimation (HPE) proved its potential to support decision making and assessment in many fields of application. HPE is often preferred to traditional marker-based Motion Capture systems due to the ease of setup, portability, and affordable cost of the technology. However, the exploitation of HPE in biomedical applications is still under investigation. This review aims to provide an overview of current biomedical applications of HPE. In this paper, we examine the main features of HPE approaches and discuss whether or not those features are of interest to biomedical applications. We also identify those areas where HPE is already in use and present peculiarities and trends followed by researchers and practitioners. We include here 25 approaches to HPE and more than 40 studies of HPE applied to motor development assessment, neuromuscular rehabilitation, and gait & posture analysis. We conclude that markerless HPE offers great potential for extending diagnosis and rehabilitation outside hospitals and clinics, toward the paradigm of remote medical care.

Submitted 1 August, 2023; originally announced August 2023.

Journal ref: Frontiers in Computer Science 5, (2023): 1153160

arXiv:2303.03155 [pdf, other]

doi 10.1109/TPAMI.2024.3451994

Unsupervised Active Visual Search with Monte Carlo planning under Uncertain Detections

Authors: Francesco Taioli, Francesco Giuliari, Yiming Wang, Riccardo Berra, Alberto Castellini, Alessio Del Bue, Alessandro Farinelli, Marco Cristani, Francesco Setti

Abstract: We propose a solution for Active Visual Search of objects in an environment, whose 2D floor map is the only known information. Our solution has three key features that make it more plausible and robust to detector failures compared to state-of-the-art methods: (i) it is unsupervised as it does not need any training sessions. (ii) During the exploration, a probability distribution on the 2D floor map is updated according to an intuitive mechanism, while an improved belief update increases the effectiveness of the agent's exploration. (iii) We incorporate the awareness that an object detector may fail into the aforementioned probability modelling by exploiting the success statistics of a specific detector. Our solution is dubbed POMP-BE-PD (Pomcp-based Online Motion Planning with Belief by Exploration and Probabilistic Detection). It uses the current pose of an agent and an RGB-D observation to learn an optimal search policy, exploiting a POMDP solved by a Monte-Carlo planning approach. On the Active Vision Database benchmark, we increase the average success rate over all the environments by a significant 35% while decreasing the average path length by 4% with respect to competing methods. Thus, our results are state-of-the-art, even without using any training procedure.
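Point (iii), folding a detector's success statistics into the probability distribution over the floor map, can be illustrated with a generic Bayesian update. This is a hypothetical sketch, not the POMP-BE-PD update: it assumes a discretised map, a set of cells visible in the current view, and a detector characterised only by an assumed true-positive rate `tpr` and false-positive rate `fpr`.

```python
import numpy as np

def update_belief(belief, visible, detected, tpr=0.8, fpr=0.1):
    """Bayesian update of the per-cell probability that the object is there.

    belief:   prior probabilities over map cells (sums to 1).
    visible:  boolean mask of cells inside the current field of view.
    detected: whether the (imperfect) detector fired on this observation.
    tpr, fpr: assumed detector statistics, P(fire | object in view) and
              P(fire | object not in view).
    """
    belief = np.asarray(belief, dtype=float)
    visible = np.asarray(visible, dtype=bool)
    p_detect = np.where(visible, tpr, fpr)          # P(fire | object in cell)
    likelihood = p_detect if detected else 1.0 - p_detect
    posterior = belief * likelihood
    return posterior / posterior.sum()
```

A missed detection lowers the probability of the viewed cells without zeroing it, since the detector may have failed; a detection raises it.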

Submitted 6 March, 2023; originally announced March 2023.

Comments: 12 pages, 8 figures. Submitted for review at IEEE Transactions on Pattern Analysis and Machine Intelligence. arXiv admin note: text overlap with arXiv:2009.08140

arXiv:2212.14010 [pdf, other]

doi 10.1007/JHEP05(2023)125

Quantum Gravity Fluctuations in the Timelike Raychaudhuri Equation

Authors: Sang-Eon Bak, Maulik Parikh, Sudipta Sarkar, Francesco Setti

Abstract: We consider a timelike geodesic congruence in the presence of perturbative quantum fluctuations of the spacetime metric. We calculate the change in the volume of a bundle of geodesics due to such fluctuations and thereby obtain a quantum-gravitationally modified timelike Raychaudhuri equation. Quantum gravity generically increases the convergence of congruences and the production of caustics.
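As background, the classical (unmodified) timelike Raychaudhuri equation in four dimensions, for a geodesic congruence with four-velocity $u^a$ and proper time $\tau$, is the standard form below; the expansion $\theta$ is the fractional rate of change of the volume $V$ of the bundle. The quantum-gravitational modification obtained in the paper is not given in the abstract and is not reproduced here.

```latex
\frac{d\theta}{d\tau} = -\frac{1}{3}\theta^{2} - \sigma_{ab}\sigma^{ab} + \omega_{ab}\omega^{ab} - R_{ab}u^{a}u^{b},
\qquad
\theta = \frac{1}{V}\frac{dV}{d\tau}
```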

Submitted 25 May, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

Comments: 12 pages, 2 figures; v2. published version

Journal ref: JHEP 05 (2023) 125

arXiv:2209.11607 [pdf, other]

doi 10.1109/ICPR56361.2022.9956625

I-SPLIT: Deep Network Interpretability for Split Computing

Authors: Federico Cunico, Luigi Capogrosso, Francesco Setti, Damiano Carra, Franco Fummi, Marco Cristani

Abstract: This work makes a substantial step in the field of split computing, i.e., how to split a deep neural network to host its early part on an embedded device and the rest on a server. So far, potential split locations have been identified exploiting uniquely architectural aspects, i.e., based on the layer sizes. Under this paradigm, the efficacy of the split in terms of accuracy can be evaluated only after having performed the split and retrained the entire pipeline, making an exhaustive evaluation of all the plausible splitting points prohibitive in terms of time. Here we show that not only the architecture of the layers does matter, but the importance of the neurons contained therein too. A neuron is important if its gradient with respect to the correct class decision is high. It follows that a split should be applied right after a layer with a high density of important neurons, in order to preserve the information flowing until then. Upon this idea, we propose Interpretable Split (I-SPLIT): a procedure that identifies the most suitable splitting points by providing a reliable prediction on how well this split will perform in terms of classification accuracy, beforehand of its effective implementation. As a further major contribution of I-SPLIT, we show that the best choice for the splitting point on a multiclass categorization problem depends also on which specific classes the network has to deal with. Exhaustive experiments have been carried out on two networks, VGG16 and ResNet-50, and three datasets, Tiny-Imagenet-200, notMNIST, and Chest X-Ray Pneumonia. The source code is available at https://github.com/vips4/I-Split.
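The split-point heuristic stated above (split right after the layer with the highest density of important neurons) can be sketched as follows. This is a minimal sketch under stated assumptions: per-neuron gradients of the correct-class score are taken as already computed (for example, averaged over validation samples), and "important" is assumed to mean an absolute gradient above a hypothetical threshold `tau`; the actual I-SPLIT scoring may differ.

```python
import numpy as np

def important_neuron_density(layer_grads, tau):
    """Fraction of neurons per layer whose |gradient| w.r.t. the correct
    class score exceeds the (hypothetical) importance threshold tau."""
    return [float(np.mean(np.abs(g) > tau)) for g in layer_grads]

def suggest_split(layer_grads, tau):
    """Return (index of the layer to split after, per-layer densities).

    Ties are resolved in favour of the earliest layer (np.argmax)."""
    densities = important_neuron_density(layer_grads, tau)
    return int(np.argmax(densities)), densities
```

For three layers with gradients `[0.01, 0.02, 0.5, 0.03]`, `[0.4, 0.6, 0.7, 0.01]` and `[0.2, 0.01]` and `tau = 0.1`, the densities are 0.25, 0.75 and 0.5, so the sketch suggests splitting after the second layer (index 1).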

Submitted 23 September, 2022; originally announced September 2022.

Comments: ICPR 2022

arXiv:2208.07308 [pdf, other]

Pose Forecasting in Industrial Human-Robot Collaboration

Authors: Alessio Sampieri, Guido D'Amely, Andrea Avogaro, Federico Cunico, Geri Skenderi, Francesco Setti, Marco Cristani, Fabio Galasso

Abstract: Pushing back the frontiers of collaborative robots in industrial environments, we propose a new Separable-Sparse Graph Convolutional Network (SeS-GCN) for pose forecasting. For the first time, SeS-GCN bottlenecks the interaction of the spatial, temporal and channel-wise dimensions in GCNs, and it learns sparse adjacency matrices by a teacher-student framework. Compared to the state-of-the-art, it only uses 1.72% of the parameters and it is ~4 times faster, while still performing comparably in forecasting accuracy on Human3.6M at 1 second in the future, which enables cobots to be aware of human operators. As a second contribution, we present a new benchmark of Cobots and Humans in Industrial COllaboration (CHICO). CHICO includes multi-view videos, 3D poses and trajectories of 20 human operators and cobots, engaging in 7 realistic industrial actions. Additionally, it reports 226 genuine collisions, taking place during the human-cobot interaction. We test SeS-GCN on CHICO for two important perception tasks in robotics: human pose forecasting, where it reaches an average error of 85.3 mm (MPJPE) at 1 sec in the future with a run time of 2.3 msec, and collision detection, by comparing the forecasted human motion with the known cobot motion, obtaining an F1-score of 0.64.
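The MPJPE (Mean Per Joint Position Error) figure quoted above is the Euclidean distance between predicted and ground-truth 3D joint positions, averaged over joints (and frames). The sketch below shows this standard definition; it assumes both arrays are already in the same coordinate frame and units (e.g. mm), and omits any root-joint alignment that a given evaluation protocol may apply.

```python
import numpy as np

def mpjpe(pred, target):
    """Mean Per Joint Position Error.

    pred, target: arrays of shape (..., J, 3) holding 3D joint positions
    in the same units (e.g. millimetres); returns the mean Euclidean
    distance over all joints and leading dimensions.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.linalg.norm(pred - target, axis=-1).mean())
```

For two joints where one prediction is exact and the other is off by the vector (3, 4, 0), the per-joint errors are 0 and 5, so MPJPE is 2.5.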

Submitted 24 July, 2022; originally announced August 2022.

Comments: ECCV 2022

arXiv:2107.00914 [pdf, other]

POMP++: Pomcp-based Active Visual Search in unknown indoor environments

Authors: Francesco Giuliari, Alberto Castellini, Riccardo Berra, Alessio Del Bue, Alessandro Farinelli, Marco Cristani, Francesco Setti, Yiming Wang

Abstract: In this paper we focus on the problem of learning online an optimal policy for Active Visual Search (AVS) of objects in unknown indoor environments. We propose POMP++, a planning strategy that introduces a novel formulation on top of the classic Partially Observable Monte Carlo Planning (POMCP) framework, to allow training-free online policy learning in unknown environments. We present a new belief reinvigoration strategy which allows POMCP to be used with a dynamically growing state space to address the online generation of the floor map. We evaluate our method on two public benchmark datasets, AVD that is acquired by real robotic platforms and Habitat ObjectNav that is rendered from real 3D scene scans, achieving the best success rate with an improvement of >10% over the state-of-the-art methods.

Submitted 5 November, 2021; v1 submitted 2 July, 2021; originally announced July 2021.

Comments: Accepted at 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:2104.03178 [pdf, other]

The SARAS Endoscopic Surgeon Action Detection (ESAD) dataset: Challenges and methods

Authors: Vivek Singh Bawa, Gurkirt Singh, Francis KapingA, Inna Skarga-Bandurova, Elettra Oleari, Alice Leporini, Carmela Landolfo, Pengfei Zhao, Xi Xiang, Gongning Luo, Kuanquan Wang, Liangzhi Li, Bowen Wang, Shang Zhao, Li Li, Armando Stabile, Francesco Setti, Riccardo Muradore, Fabio Cuzzolin

Abstract: For an autonomous robotic system, monitoring surgeon actions and assisting the main surgeon during a procedure can be very challenging. The challenges come from the peculiar structure of the surgical scene, the greater similarity in appearance of actions performed via tools in a cavity compared to, say, human actions in unconstrained environments, as well as from the motion of the endoscopic camera. This paper presents ESAD, the first large-scale dataset designed to tackle the problem of surgeon action detection in endoscopic minimally invasive surgery. ESAD aims at contributing to increase the effectiveness and reliability of surgical assistant robots by realistically testing their awareness of the actions performed by a surgeon. The dataset provides bounding box annotation for 21 action classes on real endoscopic video frames captured during prostatectomy, and was used as the basis of a recent MIDL 2020 challenge. We also present an analysis of the dataset conducted using the baseline model which was released as part of the challenge, and a description of the top performing models submitted to the challenge together with the results they obtained. This study provides significant insight into what approaches can be effective and can be extended further. We believe that ESAD will serve in the future as a useful benchmark for all researchers active in surgeon action detection and assistive robotics at large.

Submitted 7 April, 2021; originally announced April 2021.

arXiv:2009.08140 [pdf, other]

POMP: Pomcp-based Online Motion Planning for active visual search in indoor environments

Authors: Yiming Wang, Francesco Giuliari, Riccardo Berra, Alberto Castellini, Alessio Del Bue, Alessandro Farinelli, Marco Cristani, Francesco Setti

Abstract: In this paper we focus on the problem of learning an optimal policy for Active Visual Search (AVS) of objects in known indoor environments with an online setup. Our POMP method uses as input the current pose of an agent (e.g. a robot) and an RGB-D frame. The task is to plan the next move that brings the agent closer to the target object. We model this problem as a Partially Observable Markov Decision Process solved by a Monte-Carlo planning approach. This allows us to make decisions on the next moves by iterating over the known scenario at hand, exploring the environment and searching for the object at the same time. Differently from the current state of the art in Reinforcement Learning, POMP does not require extensive and expensive (in time and computation) labelled data, and is therefore very agile in solving AVS in small and medium real scenarios. We only require the information of the floormap of the environment, an information usually available or that can be easily extracted from an a priori single exploration run. We validate our method on the publicly available AVD benchmark, achieving an average success rate of 0.76 with an average path length of 17.1, performing close to the state of the art but without any training needed. Additionally, we show experimentally the robustness of our method when the quality of the object detection goes from ideal to faulty.

Submitted 17 September, 2020; originally announced September 2020.

Comments: Accepted at BMVC 2020
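POMCP, the Monte-Carlo planner named in the POMP title, chooses actions inside its search tree with the UCB1 rule, which trades off an action's estimated value against how rarely it has been tried. The sketch below shows only that selection step, not the POMP planner itself; the `ucb1_select` name, the dictionary layout and the exploration constant `c` are illustrative assumptions, not taken from the paper.

```python
import math


def ucb1_select(stats, c=1.0):
    """Pick the action maximising Q + c * sqrt(ln N / n_a), the UCB1 rule used in POMCP trees.

    stats: dict mapping action -> (visit_count, mean_value) for one belief node
    (hypothetical layout). Unvisited actions are returned immediately so every
    action is tried at least once. c is a hypothetical exploration constant.
    """
    total = sum(n for n, _ in stats.values())
    best, best_score = None, -math.inf
    for action, (n, q) in stats.items():
        if n == 0:
            return action
        # Exploitation term q plus an exploration bonus that shrinks with visits.
        score = q + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best, best_score = action, score
    return best
```

With `c=0` the rule degenerates to greedy selection on the value estimates; larger values of `c` favour rarely visited moves.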

arXiv:2006.07164

ESAD: Endoscopic Surgeon Action Detection Dataset

Authors: Vivek Singh Bawa, Gurkirt Singh, Francis KapingA, Inna Skarga-Bandurova, Alice Leporini, Carmela Landolfo, Armando Stabile, Francesco Setti, Riccardo Muradore, Elettra Oleari, Fabio Cuzzolin

Abstract: In this work, we aim to increase the effectiveness of surgical assistant robots. We intend to make assistant robots safer by making them aware of the surgeon's actions, so that they can take appropriate assisting actions. In other words, we aim to solve the problem of surgeon action detection in endoscopic videos. To this end, we introduce a challenging dataset for surgeon action detection in real-world endoscopic videos. Action classes are chosen based on feedback from surgeons and annotated by medical professionals. Given a video frame, we draw a bounding box around the surgical tool that is performing an action and label it with the action label. Finally, we present a frame-level action detection baseline model based on recent advances in object detection. Results on our new dataset show that it provides enough interesting challenges for future methods and can serve as a strong benchmark for research on surgeon action detection in endoscopic videos.

Submitted 12 June, 2020; originally announced June 2020.

Comments: In the context of the SARAS ESAD Challenge at MIDL

arXiv:2005.06518

doi 10.1103/PhysRevD.102.032002

Search for millicharged particles in proton-proton collisions at $\sqrt{s} = 13$ TeV

Authors: A. Ball, G. Beauregard, J. Brooke, C. Campagnari, M. Carrigan, M. Citron, J. De La Haye, A. De Roeck, Y. Elskens, R. Escobar Franco, M. Ezeldine, B. Francis, M. Gastal, M. Ghimire, J. Goldstein, F. Golf, J. Guiang, A. Haas, R. Heller, C. S. Hill, L. Lavezzo, R. Loos, S. Lowette, G. Magill, B. Manley, et al. (13 additional authors not shown)

Abstract: We report on a search for elementary particles with charges much smaller than the electron charge using a data sample of proton-proton collisions provided by the CERN Large Hadron Collider in 2018, corresponding to an integrated luminosity of 37.5 fb$^{-1}$ at a center-of-mass energy of 13 TeV. A prototype scintillator-based detector is deployed to conduct the first search at a hadron collider sensitive to particles with charges ${\leq}0.1e$. The existence of new particles with masses between 20 and 4700 MeV is excluded at 95% confidence level for charges between $0.006e$ and $0.3e$, depending on their mass. New sensitivity is achieved for masses larger than $700$ MeV.

Submitted 13 May, 2020; originally announced May 2020.

Report number: CERN-EP-2020-072

Journal ref: Phys. Rev. D 102, 032002 (2020)

arXiv:2005.04813

The Visual Social Distancing Problem

Authors: Marco Cristani, Alessio Del Bue, Vittorio Murino, Francesco Setti, Alessandro Vinciarelli

Abstract: One of the main and most effective measures to contain the recent viral outbreak is the maintenance of so-called Social Distancing (SD). To comply with this constraint, workplaces, public institutions, public transport and schools will likely adopt restrictions on the minimum inter-personal distance between people. Given the current scenario, it is crucial to measure compliance with such physical constraints at scale in our daily life, in order to figure out the reasons for possible breaches of these distance limits, and to understand whether a breach implies a possible threat given the scene context. All of this must comply with privacy policies and make the measurement acceptable. To this end, we introduce the Visual Social Distancing (VSD) problem, defined as the automatic estimation of the inter-personal distance from an image, and the characterization of the related people aggregations. VSD is pivotal for a non-invasive analysis of whether people comply with the SD restriction, and for providing statistics about the level of safety of specific areas whenever this constraint is violated. We then discuss how VSD relates to previous literature in Social Signal Processing and indicate which existing Computer Vision methods can be used to address this problem. We conclude with future challenges related to the effectiveness of VSD systems, ethical implications and future application scenarios.

Submitted 10 May, 2020; originally announced May 2020.

Comments: 9 pages, 5 figures. All the authors contributed equally to this manuscript and are listed in alphabetical order. Under submission
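A minimal sketch of the two VSD ingredients named in the abstract above, assuming people have already been detected and their positions mapped to ground-plane coordinates in metres (a step the abstract leaves open): flag pairs closer than a distance limit, then merge those pairs into aggregations with union-find. The function names and the 2 m default are hypothetical, not taken from the paper.

```python
import math
from itertools import combinations


def violating_pairs(positions, min_distance=2.0):
    """Return index pairs (i, j), i < j, whose ground-plane distance is below min_distance.

    positions: list of (x, y) in metres, assumed to be estimated from the image already.
    min_distance: hypothetical safety limit in metres.
    """
    pairs = []
    for (i, p), (j, q) in combinations(enumerate(positions), 2):
        if math.dist(p, q) < min_distance:
            pairs.append((i, j))
    return pairs


def aggregations(n, pairs):
    """Group people linked by distance violations into clusters (union-find)."""
    parent = list(range(n))

    def find(a):
        # Path halving keeps the trees shallow.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i, j in pairs:
        parent[find(i)] = find(j)
    groups = {}
    for k in range(n):
        groups.setdefault(find(k), []).append(k)
    # Only clusters with at least two people count as aggregations.
    return [g for g in groups.values() if len(g) > 1]
```

For example, with positions `[(0, 0), (1, 0), (5, 5), (1.5, 0.5)]` the close pairs are `(0, 1)`, `(0, 3)` and `(1, 3)`, which merge into a single aggregation of people 0, 1 and 3.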

arXiv:1901.02000

Forecasting People Trajectories and Head Poses by Jointly Reasoning on Tracklets and Vislets

Authors: Irtiza Hasan, Francesco Setti, Theodore Tsesmelis, Vasileios Belagiannis, Sikandar Amin, Alessio Del Bue, Marco Cristani, Fabio Galasso

Abstract: In this work, we explore the correlation between people's trajectories and their head orientations. We argue that trajectory and head pose forecasting can be modelled as a joint problem. Recent approaches to trajectory forecasting leverage short-term trajectories (aka tracklets) of pedestrians to predict their future paths. In addition, sociological cues, such as expected destination or pedestrian interaction, are often combined with tracklets. In this paper, we propose MiXing-LSTM (MX-LSTM) to capture the interplay between positions and head orientations (vislets) thanks to a joint unconstrained optimization of full covariance matrices during the LSTM backpropagation. We additionally exploit the head orientations as a proxy for visual attention when modeling social interactions. MX-LSTM predicts future pedestrian locations and head poses, extending the standard capabilities of current approaches to long-term trajectory forecasting. Compared to the state of the art, our approach shows better performance on an extensive set of public benchmarks. MX-LSTM is particularly effective when people move slowly, i.e. the most challenging scenario for all other models. The proposed approach also allows for accurate predictions over a longer time horizon.

Submitted 15 October, 2019; v1 submitted 7 January, 2019; originally announced January 2019.

Comments: Accepted at IEEE Transactions on Pattern Analysis and Machine Intelligence 2019. arXiv admin note: text overlap with arXiv:1805.00652

arXiv:1805.00652

MX-LSTM: mixing tracklets and vislets to jointly forecast trajectories and head poses

Authors: Irtiza Hasan, Francesco Setti, Theodore Tsesmelis, Alessio Del Bue, Fabio Galasso, Marco Cristani

Abstract: Recent approaches to trajectory forecasting use tracklets to predict the future positions of pedestrians, exploiting Long Short-Term Memory (LSTM) architectures. This paper shows that adding vislets, that is, short sequences of head pose estimations, significantly increases trajectory forecasting performance. We then propose to use vislets in a novel framework called MX-LSTM, capturing the interplay between tracklets and vislets thanks to a joint unconstrained optimization of full covariance matrices during the LSTM backpropagation. At the same time, MX-LSTM predicts future head poses, extending the standard capabilities of long-term trajectory forecasting approaches. With standard head pose estimators and an attention-based social pooling, MX-LSTM sets the new trajectory forecasting state of the art on all the considered datasets (Zara01, Zara02, UCY, and TownCentre), with a dramatic margin when pedestrians slow down, a case where most forecasting approaches struggle to provide an accurate solution.

Submitted 2 May, 2018; originally announced May 2018.

Comments: 10 pages, 3 figures. To appear in CVPR 2018
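The MX-LSTM abstracts mention an unconstrained optimization of full covariance matrices. One standard way to achieve this, assumed here and not necessarily MX-LSTM's exact formulation (which models positions and head orientations jointly), is to let the network output unconstrained numbers and map them to a Cholesky factor with a positive diagonal, so that any output yields a valid covariance. The sketch computes the resulting Gaussian negative log-likelihood for a single 2-D position; `bivariate_nll` and its argument layout are hypothetical.

```python
import math


def bivariate_nll(target, mean, raw):
    """Negative log-likelihood of a 2-D point under N(mean, Sigma), with Sigma = L L^T.

    raw = (l11_raw, l21, l22_raw) are unconstrained reals (e.g. network outputs).
    L = [[exp(l11_raw), 0], [l21, exp(l22_raw)]] has a positive diagonal, so Sigma is
    positive definite for any raw values (a log-Cholesky parametrization, assumed here).
    """
    l11 = math.exp(raw[0])
    l21 = raw[1]
    l22 = math.exp(raw[2])
    # Residual between the observed and the predicted position.
    dx = target[0] - mean[0]
    dy = target[1] - mean[1]
    # Solve L z = d by forward substitution; the Mahalanobis term is then z^T z.
    z1 = dx / l11
    z2 = (dy - l21 * z1) / l22
    maha = z1 * z1 + z2 * z2
    # log|Sigma| = 2 * (log l11 + log l22) = 2 * (l11_raw + l22_raw).
    log_det = 2.0 * (raw[0] + raw[2])
    # 0.5 * (maha + log|Sigma| + 2 * log(2 * pi)) for a 2-D Gaussian.
    return 0.5 * (maha + log_det) + math.log(2.0 * math.pi)
```

Because gradients flow through `exp` and the off-diagonal entry without any projection step, an optimizer can update all three numbers freely during backpropagation.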

arXiv:1710.00568

Indirect Match Highlights Detection with Deep Convolutional Neural Networks

Authors: Marco Godi, Paolo Rota, Francesco Setti

Abstract: Highlights in a sports video are usually defined as actions that stimulate excitement or attract the attention of the audience. Considerable effort is spent designing techniques that find highlights automatically, in order to automate the otherwise manual editing process. Most state-of-the-art approaches try to solve the problem by training a classifier on information extracted from the TV-like framing of players on the pitch, learning to detect game actions that human observers label according to their perception of a highlight. Obviously, this is long and expensive work. In this paper, we reverse the paradigm: instead of looking at the gameplay and inferring what could be exciting for the audience, we directly analyze the audience behavior, which we assume is triggered by events happening during the game. We apply a deep 3D Convolutional Neural Network (3D-CNN) to extract visual features from cropped video recordings of the supporters attending the event. Outputs of the crops belonging to the same frame are then accumulated to produce a value indicating the Highlight Likelihood (HL), which is then used to discriminate between positive (i.e. when a highlight occurs) and negative samples (i.e. standard play or time-outs). Experimental results on a public dataset of ice-hockey matches demonstrate the effectiveness of our method and promote further research in this new and exciting direction.

Submitted 2 October, 2017; originally announced October 2017.

Comments: "Social Signal Processing and Beyond" workshop, in conjunction with ICIAP 2017
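The per-frame Highlight Likelihood step described in the abstract above can be illustrated as follows, assuming each supporter crop receives a highlight score in [0, 1] from the 3D-CNN and that "accumulation" means averaging over the crops of a frame. The abstract specifies neither choice; `highlight_likelihood`, `classify_frames` and the 0.5 threshold are hypothetical.

```python
from collections import defaultdict


def highlight_likelihood(crop_scores):
    """Aggregate per-crop scores into one Highlight Likelihood (HL) value per frame.

    crop_scores: iterable of (frame_id, score) pairs, one per supporter crop, where
    score is assumed to be the network's highlight probability for that crop.
    Averaging is an assumed accumulation rule.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for frame_id, score in crop_scores:
        totals[frame_id] += score
        counts[frame_id] += 1
    return {f: totals[f] / counts[f] for f in totals}


def classify_frames(hl, threshold=0.5):
    """Label a frame positive (highlight) when its HL exceeds a hypothetical threshold."""
    return {f: v > threshold for f, v in hl.items()}
```

Averaging rather than summing keeps HL comparable across frames that contain different numbers of visible supporters; a sum would favour crowded frames.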

arXiv:1409.2702

doi 10.1371/journal.pone.0123783

F-formation Detection: Individuating Free-standing Conversational Groups in Images

Authors: Francesco Setti, Chris Russell, Chiara Bassetti, Marco Cristani

Abstract: Detection of groups of interacting people is a very interesting and useful task in many modern technologies, with application fields spanning from video surveillance to social robotics. In this paper we first furnish a rigorous definition of a group, drawing on the background of the social sciences: this allows us to specify many kinds of group so far neglected in the Computer Vision literature. On top of this taxonomy, we present a detailed state of the art on group detection algorithms. Then, as a main contribution, we present a brand new method for the automatic detection of groups in still images, which is based on a graph-cuts framework for clustering individuals; in particular, we are able to codify in a computational sense the sociological definition of F-formation, which is very useful for encoding a group using only proxemic information: the position and orientation of people. We call the proposed method Graph-Cuts for F-formation (GCFF). We show that GCFF clearly outperforms all state-of-the-art methods in terms of different accuracy measures (some of them brand new), also demonstrating strong robustness to noise and versatility in recognizing groups of various cardinalities.

Submitted 9 September, 2014; originally announced September 2014.

Comments: 32 pages, submitted to PLOS One
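GCFF encodes the sociological F-formation from positions and orientations alone. A simple way to make this concrete, assumed here rather than taken from the paper, is to let each person vote for an o-space centre a fixed distance in front of them and then group people whose votes land close together. The sketch below replaces GCFF's graph-cuts optimization with a much simpler greedy clustering, so it illustrates the idea only; `ospace_centres`, `group_by_centres`, `stride` and `radius` are hypothetical.

```python
import math


def ospace_centres(people, stride=1.0):
    """Project each person's position forward along their orientation.

    people: list of (x, y, theta), theta in radians. stride is a hypothetical
    distance from a person to the centre of the o-space they are facing.
    """
    return [(x + stride * math.cos(t), y + stride * math.sin(t)) for x, y, t in people]


def group_by_centres(centres, radius=0.5):
    """Greedy clustering of o-space votes (a simplified stand-in for graph cuts).

    Each vote joins the first existing group whose mean centre lies within radius;
    otherwise it starts a new group. Returns lists of person indices.
    """
    groups = []  # each entry: [sum_x, sum_y, member_indices]
    for i, (cx, cy) in enumerate(centres):
        for g in groups:
            mx = g[0] / len(g[2])
            my = g[1] / len(g[2])
            if math.hypot(cx - mx, cy - my) <= radius:
                g[0] += cx
                g[1] += cy
                g[2].append(i)
                break
        else:
            groups.append([cx, cy, [i]])
    return [g[2] for g in groups]
```

Two people at (0, 0) and (2, 0) facing each other both vote for a centre near (1, 0) and end up in the same group, while a third person facing elsewhere forms a singleton. Unlike this greedy pass, a global optimization such as GCFF does not depend on the order in which people are visited.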

Showing 1–24 of 24 results for author: Setti, F