-
Material synthesis through simulations guided by machine learning: a position paper
Authors:
Usman Syed,
Federico Cunico,
Uzair Khan,
Eros Radicchi,
Francesco Setti,
Adolfo Speghini,
Paolo Marone,
Filiberto Semenzin,
Marco Cristani
Abstract:
In this position paper, we propose an approach for sustainable data collection in the field of optimal mix design for marble sludge reuse. Marble sludge, a calcium-rich residual from stone-cutting processes, can be repurposed by mixing it with various ingredients. However, determining the optimal mix design is challenging due to the variability in sludge composition and the costly, time-consuming…
▽ More
In this position paper, we propose an approach for sustainable data collection in the field of optimal mix design for marble sludge reuse. Marble sludge, a calcium-rich residual from stone-cutting processes, can be repurposed by mixing it with various ingredients. However, determining the optimal mix design is challenging due to the variability in sludge composition and the costly, time-consuming nature of experimental data collection. Also, we investigate the possibility of using machine learning models using meta-learning as an optimization tool to estimate the correct quantity of stone-cutting sludge to be used in aggregates to obtain a mix design with specific mechanical properties that can be used successfully in the building industry. Our approach offers two key advantages: (i) through simulations, a large dataset can be generated, saving time and money during the data collection phase, and (ii) Utilizing machine learning models, with performance enhancement through hyper-parameter optimization via meta-learning, to estimate optimal mix designs reducing the need for extensive manual experimentation, lowering costs, minimizing environmental impact, and accelerating the processing of quarry sludge. Our idea promises to streamline the marble sludge reuse process by leveraging collective data and advanced machine learning, promoting sustainability and efficiency in the stonecutting sector.
△ Less
Submitted 26 November, 2024; v1 submitted 21 November, 2024;
originally announced November 2024.
-
Learning based Ge'ez character handwritten recognition
Authors:
Hailemicael Lulseged Yimer,
Hailegabriel Dereje Degefa,
Marco Cristani,
Federico Cunico
Abstract:
Ge'ez, an ancient Ethiopic script of cultural and historical significance, has been largely neglected in handwriting recognition research, hindering the digitization of valuable manuscripts. Our study addresses this gap by developing a state-of-the-art Ge'ez handwriting recognition system using Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks. Our approach uses a two…
▽ More
Ge'ez, an ancient Ethiopic script of cultural and historical significance, has been largely neglected in handwriting recognition research, hindering the digitization of valuable manuscripts. Our study addresses this gap by developing a state-of-the-art Ge'ez handwriting recognition system using Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks. Our approach uses a two-stage recognition process. First, a CNN is trained to recognize individual characters, which then acts as a feature extractor for an LSTM-based system for word recognition. Our dual-stage recognition approach achieves new top scores in Ge'ez handwriting recognition, outperforming eight state-of-the-art methods, which are SVTR, ASTER, and others as well as human performance, as measured in the HHD-Ethiopic dataset work. This research significantly advances the preservation and accessibility of Ge'ez cultural heritage, with implications for historical document digitization, educational tools, and cultural preservation. The code will be released upon acceptance.
△ Less
Submitted 20 November, 2024;
originally announced November 2024.
-
IoT-Based Coma Patient Monitoring System
Authors:
Hailemicael Lulseged Yimer,
Hailegabriel Dereje Degefa,
Marco Cristani,
Federico Cunico
Abstract:
Continuous monitoring of coma patients is essential but challenging, especially in developing countries with limited resources, staff, and infrastructure. This paper presents a low-cost IoT-based system designed for such environments. It uses affordable hardware and robust software to monitor patients without constant internet access or extensive medical personnel. The system employs cost-effectiv…
▽ More
Continuous monitoring of coma patients is essential but challenging, especially in developing countries with limited resources, staff, and infrastructure. This paper presents a low-cost IoT-based system designed for such environments. It uses affordable hardware and robust software to monitor patients without constant internet access or extensive medical personnel. The system employs cost-effective sensors to track vital signs, including heart rate, body temperature, blood pressure, eye movement, and body position. An energy-efficient microcontroller processes data locally, synchronizing with a central server when network access is available. A locally hosted app provides on-site access to patient data, while a GSM module sends immediate alerts for critical events, even in areas with limited cellular coverage. This solution emphasizes ease of deployment, minimal maintenance, and resilience to power and network disruptions. Using open-source software and widely available hardware, it offers a scalable, adaptable system for resource-limited settings. At under $30, the system is a sustainable, cost-effective solution for continuous patient monitoring, bridging the gap until more advanced healthcare infrastructure is available.
△ Less
Submitted 20 November, 2024;
originally announced November 2024.
-
Multi-Camera Industrial Open-Set Person Re-Identification and Tracking
Authors:
Federico Cunico,
Marco Cristani
Abstract:
In recent years, the development of deep learning approaches for the task of person re-identification led to impressive results. However, this comes with a limitation for industrial and practical real-world applications. Firstly, most of the existing works operate on closed-world scenarios, in which the people to re-identify (probes) are compared to a closed-set (gallery). Real-world scenarios oft…
▽ More
In recent years, the development of deep learning approaches for the task of person re-identification led to impressive results. However, this comes with a limitation for industrial and practical real-world applications. Firstly, most of the existing works operate on closed-world scenarios, in which the people to re-identify (probes) are compared to a closed-set (gallery). Real-world scenarios often are open-set problems in which the gallery is not known a priori, but the number of open-set approaches in the literature is significantly lower. Secondly, challenges such as multi-camera setups, occlusions, real-time requirements, etc., further constrain the applicability of off-the-shelf methods. This work presents MICRO-TRACK, a Modular Industrial multi-Camera Re_identification and Open-set Tracking system that is real-time, scalable, and easy to integrate into existing industrial surveillance scenarios. Furthermore, we release a novel Re-ID and tracking dataset acquired in an industrial manufacturing facility, dubbed Facility-ReID, consisting of 18-minute videos captured by 8 surveillance cameras.
△ Less
Submitted 5 September, 2024;
originally announced September 2024.
-
Exploring 3D Human Pose Estimation and Forecasting from the Robot's Perspective: The HARPER Dataset
Authors:
Andrea Avogaro,
Andrea Toaiari,
Federico Cunico,
Xiangmin Xu,
Haralambos Dafas,
Alessandro Vinciarelli,
Emma Li,
Marco Cristani
Abstract:
We introduce HARPER, a novel dataset for 3D body pose estimation and forecast in dyadic interactions between users and Spot, the quadruped robot manufactured by Boston Dynamics. The key-novelty is the focus on the robot's perspective, i.e., on the data captured by the robot's sensors. These make 3D body pose analysis challenging because being close to the ground captures humans only partially. The…
▽ More
We introduce HARPER, a novel dataset for 3D body pose estimation and forecast in dyadic interactions between users and Spot, the quadruped robot manufactured by Boston Dynamics. The key-novelty is the focus on the robot's perspective, i.e., on the data captured by the robot's sensors. These make 3D body pose analysis challenging because being close to the ground captures humans only partially. The scenario underlying HARPER includes 15 actions, of which 10 involve physical contact between the robot and users. The Corpus contains not only the recordings of the built-in stereo cameras of Spot, but also those of a 6-camera OptiTrack system (all recordings are synchronized). This leads to ground-truth skeletal representations with a precision lower than a millimeter. In addition, the Corpus includes reproducible benchmarks on 3D Human Pose Estimation, Human Pose Forecasting, and Collision Prediction, all based on publicly available baseline approaches. This enables future HARPER users to rigorously compare their results with those we provide in this work.
△ Less
Submitted 23 March, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
A Machine Learning-oriented Survey on Tiny Machine Learning
Authors:
Luigi Capogrosso,
Federico Cunico,
Dong Seon Cheng,
Franco Fummi,
Marco Cristani
Abstract:
The emergence of Tiny Machine Learning (TinyML) has positively revolutionized the field of Artificial Intelligence by promoting the joint design of resource-constrained IoT hardware devices and their learning-based software architectures. TinyML carries an essential role within the fourth and fifth industrial revolutions in helping societies, economies, and individuals employ effective AI-infused…
▽ More
The emergence of Tiny Machine Learning (TinyML) has positively revolutionized the field of Artificial Intelligence by promoting the joint design of resource-constrained IoT hardware devices and their learning-based software architectures. TinyML carries an essential role within the fourth and fifth industrial revolutions in helping societies, economies, and individuals employ effective AI-infused computing technologies (e.g., smart cities, automotive, and medical robotics). Given its multidisciplinary nature, the field of TinyML has been approached from many different angles: this comprehensive survey wishes to provide an up-to-date overview focused on all the learning algorithms within TinyML-based solutions. The survey is based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodological flow, allowing for a systematic and complete literature survey. In particular, firstly we will examine the three different workflows for implementing a TinyML-based system, i.e., ML-oriented, HW-oriented, and co-design. Secondly, we propose a taxonomy that covers the learning panorama under the TinyML lens, examining in detail the different families of model optimization and design, as well as the state-of-the-art learning techniques. Thirdly, this survey will present the distinct features of hardware devices and software tools that represent the current state-of-the-art for TinyML intelligent edge applications. Finally, we discuss the challenges and future directions.
△ Less
Submitted 26 September, 2023; v1 submitted 21 September, 2023;
originally announced September 2023.
-
Language-enhanced RNR-Map: Querying Renderable Neural Radiance Field maps with natural language
Authors:
Francesco Taioli,
Federico Cunico,
Federico Girella,
Riccardo Bologna,
Alessandro Farinelli,
Marco Cristani
Abstract:
We present Le-RNR-Map, a Language-enhanced Renderable Neural Radiance map for Visual Navigation with natural language query prompts. The recently proposed RNR-Map employs a grid structure comprising latent codes positioned at each pixel. These latent codes, which are derived from image observation, enable: i) image rendering given a camera pose, since they are converted to Neural Radiance Field; i…
▽ More
We present Le-RNR-Map, a Language-enhanced Renderable Neural Radiance map for Visual Navigation with natural language query prompts. The recently proposed RNR-Map employs a grid structure comprising latent codes positioned at each pixel. These latent codes, which are derived from image observation, enable: i) image rendering given a camera pose, since they are converted to Neural Radiance Field; ii) image navigation and localization with astonishing accuracy. On top of this, we enhance RNR-Map with CLIP-based embedding latent codes, allowing natural language search without additional label data. We evaluate the effectiveness of this map in single and multi-object searches. We also investigate its compatibility with a Large Language Model as an "affordance query resolver". Code and videos are available at https://intelligolabs.github.io/Le-RNR-Map/
△ Less
Submitted 17 August, 2023;
originally announced August 2023.
-
Markerless human pose estimation for biomedical applications: a survey
Authors:
Andrea Avogaro,
Federico Cunico,
Bodo Rosenhahn,
Francesco Setti
Abstract:
Markerless Human Pose Estimation (HPE) proved its potential to support decision making and assessment in many fields of application. HPE is often preferred to traditional marker-based Motion Capture systems due to the ease of setup, portability, and affordable cost of the technology. However, the exploitation of HPE in biomedical applications is still under investigation. This review aims to provi…
▽ More
Markerless Human Pose Estimation (HPE) proved its potential to support decision making and assessment in many fields of application. HPE is often preferred to traditional marker-based Motion Capture systems due to the ease of setup, portability, and affordable cost of the technology. However, the exploitation of HPE in biomedical applications is still under investigation. This review aims to provide an overview of current biomedical applications of HPE. In this paper, we examine the main features of HPE approaches and discuss whether or not those features are of interest to biomedical applications. We also identify those areas where HPE is already in use and present peculiarities and trends followed by researchers and practitioners. We include here 25 approaches to HPE and more than 40 studies of HPE applied to motor development assessment, neuromuscolar rehabilitation, and gait & posture analysis. We conclude that markerless HPE offers great potential for extending diagnosis and rehabilitation outside hospitals and clinics, toward the paradigm of remote medical care.
△ Less
Submitted 1 August, 2023;
originally announced August 2023.
-
OO-dMVMT: A Deep Multi-view Multi-task Classification Framework for Real-time 3D Hand Gesture Classification and Segmentation
Authors:
Federico Cunico,
Federico Girella,
Andrea Avogaro,
Marco Emporio,
Andrea Giachetti,
Marco Cristani
Abstract:
Continuous mid-air hand gesture recognition based on captured hand pose streams is fundamental for human-computer interaction, particularly in AR / VR. However, many of the methods proposed to recognize heterogeneous hand gestures are tested only on the classification task, and the real-time low-latency gesture segmentation in a continuous stream is not well addressed in the literature. For this t…
▽ More
Continuous mid-air hand gesture recognition based on captured hand pose streams is fundamental for human-computer interaction, particularly in AR / VR. However, many of the methods proposed to recognize heterogeneous hand gestures are tested only on the classification task, and the real-time low-latency gesture segmentation in a continuous stream is not well addressed in the literature. For this task, we propose the On-Off deep Multi-View Multi-Task paradigm (OO-dMVMT). The idea is to exploit multiple time-local views related to hand pose and movement to generate rich gesture descriptions, along with using heterogeneous tasks to achieve high accuracy. OO-dMVMT extends the classical MVMT paradigm, where all of the multiple tasks have to be active at each time, by allowing specific tasks to switch on/off depending on whether they can apply to the input. We show that OO-dMVMT defines the new SotA on continuous/online 3D skeleton-based gesture recognition in terms of gesture classification accuracy, segmentation accuracy, false positives, and decision latency while maintaining real-time operation.
△ Less
Submitted 12 April, 2023;
originally announced April 2023.
-
Split-Et-Impera: A Framework for the Design of Distributed Deep Learning Applications
Authors:
Luigi Capogrosso,
Federico Cunico,
Michele Lora,
Marco Cristani,
Franco Fummi,
Davide Quaglia
Abstract:
Many recent pattern recognition applications rely on complex distributed architectures in which sensing and computational nodes interact together through a communication network. Deep neural networks (DNNs) play an important role in this scenario, furnishing powerful decision mechanisms, at the price of a high computational effort. Consequently, powerful state-of-the-art DNNs are frequently split…
▽ More
Many recent pattern recognition applications rely on complex distributed architectures in which sensing and computational nodes interact together through a communication network. Deep neural networks (DNNs) play an important role in this scenario, furnishing powerful decision mechanisms, at the price of a high computational effort. Consequently, powerful state-of-the-art DNNs are frequently split over various computational nodes, e.g., a first part stays on an embedded device and the rest on a server. Deciding where to split a DNN is a challenge in itself, making the design of deep learning applications even more complicated. Therefore, we propose Split-Et-Impera, a novel and practical framework that i) determines the set of the best-split points of a neural network based on deep network interpretability principles without performing a tedious try-and-test approach, ii) performs a communication-aware simulation for the rapid evaluation of different neural network rearrangements, and iii) suggests the best match between the quality of service requirements of the application and the performance in terms of accuracy and latency time.
△ Less
Submitted 22 March, 2023;
originally announced March 2023.
-
A Masked Face Classification Benchmark on Low-Resolution Surveillance Images
Authors:
Federico Cunico,
Andrea Toaiari,
Marco Cristani
Abstract:
We propose a novel image dataset focused on tiny faces wearing face masks for mask classification purposes, dubbed Small Face MASK (SF-MASK), composed of a collection made from 20k low-resolution images exported from diverse and heterogeneous datasets, ranging from 7 x 7 to 64 x 64 pixel resolution. An accurate visualization of this collection, through counting grids, made it possible to highlight…
▽ More
We propose a novel image dataset focused on tiny faces wearing face masks for mask classification purposes, dubbed Small Face MASK (SF-MASK), composed of a collection made from 20k low-resolution images exported from diverse and heterogeneous datasets, ranging from 7 x 7 to 64 x 64 pixel resolution. An accurate visualization of this collection, through counting grids, made it possible to highlight gaps in the variety of poses assumed by the heads of the pedestrians. In particular, faces filmed by very high cameras, in which the facial features appear strongly skewed, are absent. To address this structural deficiency, we produced a set of synthetic images which resulted in a satisfactory covering of the intra-class variance. Furthermore, a small subsample of 1701 images contains badly worn face masks, opening to multi-class classification challenges. Experiments on SF-MASK focus on face mask classification using several classifiers. Results show that the richness of SF-MASK (real + synthetic images) leads all of the tested classifiers to perform better than exploiting comparative face mask datasets, on a fixed 1077 images testing set. Dataset and evaluation code are publicly available here: https://github.com/HumaticsLAB/sf-mask
△ Less
Submitted 3 August, 2023; v1 submitted 23 November, 2022;
originally announced November 2022.
-
I-SPLIT: Deep Network Interpretability for Split Computing
Authors:
Federico Cunico,
Luigi Capogrosso,
Francesco Setti,
Damiano Carra,
Franco Fummi,
Marco Cristani
Abstract:
This work makes a substantial step in the field of split computing, i.e., how to split a deep neural network to host its early part on an embedded device and the rest on a server. So far, potential split locations have been identified exploiting uniquely architectural aspects, i.e., based on the layer sizes. Under this paradigm, the efficacy of the split in terms of accuracy can be evaluated only…
▽ More
This work makes a substantial step in the field of split computing, i.e., how to split a deep neural network to host its early part on an embedded device and the rest on a server. So far, potential split locations have been identified exploiting uniquely architectural aspects, i.e., based on the layer sizes. Under this paradigm, the efficacy of the split in terms of accuracy can be evaluated only after having performed the split and retrained the entire pipeline, making an exhaustive evaluation of all the plausible splitting points prohibitive in terms of time. Here we show that not only the architecture of the layers does matter, but the importance of the neurons contained therein too. A neuron is important if its gradient with respect to the correct class decision is high. It follows that a split should be applied right after a layer with a high density of important neurons, in order to preserve the information flowing until then. Upon this idea, we propose Interpretable Split (I-SPLIT): a procedure that identifies the most suitable splitting points by providing a reliable prediction on how well this split will perform in terms of classification accuracy, beforehand of its effective implementation. As a further major contribution of I-SPLIT, we show that the best choice for the splitting point on a multiclass categorization problem depends also on which specific classes the network has to deal with. Exhaustive experiments have been carried out on two networks, VGG16 and ResNet-50, and three datasets, Tiny-Imagenet-200, notMNIST, and Chest X-Ray Pneumonia. The source code is available at https://github.com/vips4/I-Split.
△ Less
Submitted 23 September, 2022;
originally announced September 2022.
-
Pose Forecasting in Industrial Human-Robot Collaboration
Authors:
Alessio Sampieri,
Guido D'Amely,
Andrea Avogaro,
Federico Cunico,
Geri Skenderi,
Francesco Setti,
Marco Cristani,
Fabio Galasso
Abstract:
Pushing back the frontiers of collaborative robots in industrial environments, we propose a new Separable-Sparse Graph Convolutional Network (SeS-GCN) for pose forecasting. For the first time, SeS-GCN bottlenecks the interaction of the spatial, temporal and channel-wise dimensions in GCNs, and it learns sparse adjacency matrices by a teacher-student framework. Compared to the state-of-the-art, it…
▽ More
Pushing back the frontiers of collaborative robots in industrial environments, we propose a new Separable-Sparse Graph Convolutional Network (SeS-GCN) for pose forecasting. For the first time, SeS-GCN bottlenecks the interaction of the spatial, temporal and channel-wise dimensions in GCNs, and it learns sparse adjacency matrices by a teacher-student framework. Compared to the state-of-the-art, it only uses 1.72% of the parameters and it is ~4 times faster, while still performing comparably in forecasting accuracy on Human3.6M at 1 second in the future, which enables cobots to be aware of human operators. As a second contribution, we present a new benchmark of Cobots and Humans in Industrial COllaboration (CHICO). CHICO includes multi-view videos, 3D poses and trajectories of 20 human operators and cobots, engaging in 7 realistic industrial actions. Additionally, it reports 226 genuine collisions, taking place during the human-cobot interaction. We test SeS-GCN on CHICO for two important perception tasks in robotics: human pose forecasting, where it reaches an average error of 85.3 mm (MPJPE) at 1 sec in the future with a run time of 2.3 msec, and collision detection, by comparing the forecasted human motion with the known cobot motion, obtaining an F1-score of 0.64.
△ Less
Submitted 24 July, 2022;
originally announced August 2022.