-
FLIM-based Salient Object Detection Networks with Adaptive Decoders
Authors:
Gilson Junior Soares,
Matheus Abrantes Cerqueira,
Jancarlo F. Gomes,
Laurent Najman,
Silvio Jamil F. Guimarães,
Alexandre Xavier Falcão
Abstract:
Salient Object Detection (SOD) methods can locate objects that stand out in an image, assign higher values to their pixels in a saliency map, and binarize the map outputting a predicted segmentation mask. A recent tendency is to investigate pre-trained lightweight models rather than deep neural networks in SOD tasks, coping with applications under limited computational resources. In this context, we have investigated lightweight networks using a methodology named Feature Learning from Image Markers (FLIM), which assumes that the encoder's kernels can be estimated from marker pixels on discriminative regions of a few representative images. This work proposes flyweight networks, hundreds of times lighter than lightweight models, for SOD by combining a FLIM encoder with an adaptive decoder, whose weights are estimated for each input image by a given heuristic function. Such FLIM networks are trained from three to four representative images only and without backpropagation, making the models suitable for applications under labeled data constraints as well. We study five adaptive decoders; two of them are introduced here. Differently from the previous ones that rely on one neuron per pixel with shared weights, the heuristic functions of the new adaptive decoders estimate the weights of each neuron per pixel. We compare FLIM models with adaptive decoders for two challenging SOD tasks with three lightweight networks from the state-of-the-art, two FLIM networks with decoders trained by backpropagation, and one FLIM network whose labeled markers define the decoder's weights. The experiments demonstrate the advantages of the proposed networks over the baselines, revealing the importance of further investigating such methods in new applications.
Submitted 29 April, 2025;
originally announced April 2025.
-
Multi-Sensor Fusion for Quadruped Robot State Estimation using Invariant Filtering and Smoothing
Authors:
Ylenia Nisticò,
Hajun Kim,
João Carlos Virgolino Soares,
Geoff Fink,
Hae-Won Park,
Claudio Semini
Abstract:
This letter introduces two multi-sensor state estimation frameworks for quadruped robots, built on the Invariant Extended Kalman Filter (InEKF) and Invariant Smoother (IS). The proposed methods, named E-InEKF and E-IS, fuse kinematics, IMU, LiDAR, and GPS data to mitigate position drift, particularly along the z-axis, a common issue in proprioceptive-based approaches. We derived observation models that satisfy group-affine properties to integrate LiDAR odometry and GPS into InEKF and IS. LiDAR odometry is incorporated using Iterative Closest Point (ICP) registration on a parallel thread, preserving the computational efficiency of proprioceptive-based state estimation. We evaluate E-InEKF and E-IS with and without exteroceptive sensors, benchmarking them against LiDAR-based odometry methods in indoor and outdoor experiments using the KAIST HOUND2 robot. Our methods achieve lower Relative Position Errors (RPE) and significantly reduce Absolute Trajectory Error (ATE), with improvements of up to 28% indoors and 40% outdoors compared to LIO-SAM and FAST-LIO2. Additionally, we compare E-InEKF and E-IS in terms of computational efficiency and accuracy.
Submitted 29 April, 2025;
originally announced April 2025.
-
MUSE: A Real-Time Multi-Sensor State Estimator for Quadruped Robots
Authors:
Ylenia Nisticò,
João Carlos Virgolino Soares,
Lorenzo Amatucci,
Geoff Fink,
Claudio Semini
Abstract:
This paper introduces an innovative state estimator, MUSE (MUlti-sensor State Estimator), designed to enhance state estimation's accuracy and real-time performance in quadruped robot navigation. The proposed state estimator builds upon our previous work presented in [1]. It integrates data from a range of onboard sensors, including IMUs, encoders, cameras, and LiDARs, to deliver a comprehensive and reliable estimation of the robot's pose and motion, even in slippery scenarios. We tested MUSE on a Unitree Aliengo robot, successfully closing the locomotion control loop in difficult scenarios, including slippery and uneven terrain. Benchmarking against Pronto [2] and VILENS [3] showed 67.6% and 26.7% reductions in translational errors, respectively. Additionally, MUSE outperformed DLIO [4], a LiDAR-inertial odometry system, in rotational errors and frequency, while the proprioceptive version of MUSE (P-MUSE) outperformed TSIF [5], with a 45.9% reduction in absolute trajectory error (ATE).
Submitted 27 March, 2025; v1 submitted 15 March, 2025;
originally announced March 2025.
-
SANDRO: a Robust Solver with a Splitting Strategy for Point Cloud Registration
Authors:
Michael Adlerstein,
João Carlos Virgolino Soares,
Angelo Bratta,
Claudio Semini
Abstract:
Point cloud registration is a critical problem in computer vision and robotics, especially in the field of navigation. Current methods often fail when faced with high outlier rates or take a long time to converge to a suitable solution. In this work, we introduce a novel algorithm for point cloud registration called SANDRO (Splitting strategy for point cloud Alignment using Non-convex anD Robust Optimization), which combines an Iteratively Reweighted Least Squares (IRLS) framework with a robust loss function with graduated non-convexity. This approach is further enhanced by a splitting strategy designed to handle high outlier rates and skewed distributions of outliers. SANDRO is capable of addressing important limitations of existing methods, as in challenging scenarios where the presence of high outlier rates and point cloud symmetries significantly hinder convergence. SANDRO achieves superior performance in terms of success rate when compared to the state-of-the-art methods, demonstrating a 20% improvement from the current state of the art when tested on the Redwood real dataset and 60% improvement when tested on synthetic data.
Submitted 10 March, 2025;
originally announced March 2025.
-
Complex-Valued Neural Networks for Ultra-Reliable Massive MIMO
Authors:
Pedro Benevenuto Valadares,
Jonathan Aguiar Soares,
Kayol Mayer,
Dalton Soares Arantes
Abstract:
In the evolving landscape of 5G and 6G networks, the demands extend beyond high data rates, ultra-low latency, and extensive coverage, increasingly emphasizing the need for reliability. This paper proposes an ultra-reliable multiple-input multiple-output (MIMO) scheme utilizing quasi-orthogonal space-time block coding (QOSTBC) combined with singular value decomposition (SVD) for channel state information (CSI) correction, significantly improving performance over QOSTBC and traditional orthogonal STBC (OSTBC) when analyzing spectral efficiency. Although QOSTBC enhances spectral efficiency, it also increases computational complexity at the maximum likelihood (ML) decoder. To address this, a neural network-based decoding scheme using phase-transmittance radial basis function (PT-RBF) architecture is also introduced to manage QOSTBC's complexity. Simulation results demonstrate improved system robustness and performance, making this approach a potential candidate for ultra-reliable communication in next-generation networks.
Submitted 16 January, 2025;
originally announced January 2025.
-
SCOT: Self-Supervised Contrastive Pretraining For Zero-Shot Compositional Retrieval
Authors:
Bhavin Jawade,
Joao V. B. Soares,
Kapil Thadani,
Deen Dayal Mohan,
Amir Erfan Eshratifar,
Benjamin Culpepper,
Paloma de Juan,
Srirangaraj Setlur,
Venu Govindaraju
Abstract:
Compositional image retrieval (CIR) is a multimodal learning task where a model combines a query image with a user-provided text modification to retrieve a target image. CIR finds applications in a variety of domains including product retrieval (e-commerce) and web search. Existing methods primarily focus on fully-supervised learning, wherein models are trained on datasets of labeled triplets such as FashionIQ and CIRR. This poses two significant challenges: (i) curating such triplet datasets is labor intensive; and (ii) models lack generalization to unseen objects and domains. In this work, we propose SCOT (Self-supervised COmpositional Training), a novel zero-shot compositional pretraining strategy that combines existing large image-text pair datasets with the generative capabilities of large language models to contrastively train an embedding composition network. Specifically, we show that the text embedding from a large-scale contrastively-pretrained vision-language model can be utilized as proxy target supervision during compositional pretraining, replacing the target image embedding. In zero-shot settings, this strategy surpasses SOTA zero-shot compositional retrieval methods as well as many fully-supervised methods on standard benchmarks such as FashionIQ and CIRR.
Submitted 12 January, 2025;
originally announced January 2025.
-
Proprioceptive State Estimation for Quadruped Robots using Invariant Kalman Filtering and Scale-Variant Robust Cost Functions
Authors:
Hilton Marques Souza Santana,
João Carlos Virgolino Soares,
Ylenia Nisticò,
Marco Antonio Meggiolaro,
Claudio Semini
Abstract:
Accurate state estimation is crucial for legged robot locomotion, as it provides the necessary information to allow control and navigation. However, it is also challenging, especially in scenarios with uneven and slippery terrain. This paper presents a new Invariant Extended Kalman filter for legged robot state estimation using only proprioceptive sensors. We formulate the methodology by combining recent advances in state estimation theory with the use of robust cost functions in the measurement update. We tested our methodology on quadruped robots through experiments and public datasets, showing that we can obtain a pose drift up to 40% lower in trajectories covering a distance of over 450m, in comparison with a state-of-the-art Invariant Extended Kalman filter.
Submitted 7 October, 2024;
originally announced October 2024.
-
Suicide Phenotyping from Clinical Notes in Safety-Net Psychiatric Hospital Using Multi-Label Classification with Pre-Trained Language Models
Authors:
Zehan Li,
Yan Hu,
Scott Lane,
Salih Selek,
Lokesh Shahani,
Rodrigo Machado-Vieira,
Jair Soares,
Hua Xu,
Hongfang Liu,
Ming Huang
Abstract:
Accurate identification and categorization of suicidal events can yield better suicide precautions, reduce operational burden, and improve care quality in high-acuity psychiatric settings. Pre-trained language models offer promise for identifying suicidality from unstructured clinical narratives. We evaluated the performance of four BERT-based models using two fine-tuning strategies (multiple single-label and single multi-label) for detecting coexisting suicidal events from 500 annotated psychiatric evaluation notes. The notes were labeled for suicidal ideation (SI), suicide attempts (SA), exposure to suicide (ES), and non-suicidal self-injury (NSSI). RoBERTa outperformed the other models using the multiple single-label classification strategy (acc=0.86, F1=0.78). MentalBERT (acc=0.83, F1=0.74) also exceeded BioClinicalBERT (acc=0.82, F1=0.72), which outperformed BERT (acc=0.80, F1=0.70). RoBERTa fine-tuned with single multi-label classification further improved the model performance (acc=0.88, F1=0.81). The findings highlight that model optimization, pretraining with domain-relevant data, and the single multi-label classification strategy enhance the model performance of suicide phenotyping. Keywords: EHR-based Phenotyping; Natural Language Processing; Secondary Use of EHR Data; Suicide Classification; BERT-based Model; Psychiatry; Mental Health
Submitted 3 October, 2024; v1 submitted 27 September, 2024;
originally announced September 2024.
-
Creating a Segmented Pointcloud of Grapevines by Combining Multiple Viewpoints Through Visual Odometry
Authors:
Michael Adlerstein,
Angelo Bratta,
João Carlos Virgolino Soares,
Giovanni Dessy,
Miguel Fernandes,
Matteo Gatti,
Claudio Semini
Abstract:
Grapevine winter pruning is a labor-intensive and repetitive process that significantly influences the quality and quantity of the grape harvest and produced wine of the following season. It requires a careful and expert detection of the point to be cut. Because of its complexity, repetitive nature and time constraint, the task requires skilled labor that needs to be trained. This extended abstract presents the computer vision pipeline employed in project Vinum, using detectron2 as a segmentation network and keypoint visual odometry to merge different observations into a single pointcloud used to make informed pruning decisions.
Submitted 29 August, 2024;
originally announced August 2024.
-
Panoptic-SLAM: Visual SLAM in Dynamic Environments using Panoptic Segmentation
Authors:
Gabriel Fischer Abati,
João Carlos Virgolino Soares,
Vivian Suzano Medeiros,
Marco Antonio Meggiolaro,
Claudio Semini
Abstract:
The majority of visual SLAM systems are not robust in dynamic scenarios. The ones that deal with dynamic objects in the scenes usually rely on deep-learning-based methods to detect and filter these objects. However, these methods cannot deal with unknown moving objects. This work presents Panoptic-SLAM, an open-source visual SLAM system robust to dynamic environments, even in the presence of unknown objects. It uses panoptic segmentation to filter dynamic objects from the scene during the state estimation process. Panoptic-SLAM is based on ORB-SLAM3, a state-of-the-art SLAM system for static environments. The implementation was tested using real-world datasets and compared with several state-of-the-art systems from the literature, including DynaSLAM, DS-SLAM, SaD-SLAM, PVO and FusingPanoptic. For example, Panoptic-SLAM is on average four times more accurate than PVO, the most recent panoptic-based approach for visual SLAM. Also, experiments were performed using a quadruped robot with an RGB-D camera to test the applicability of our method in real-world scenarios. The tests were validated by a ground-truth created with a motion capture system.
Submitted 3 May, 2024;
originally announced May 2024.
-
Salient Object-Aware Background Generation using Text-Guided Diffusion Models
Authors:
Amir Erfan Eshratifar,
Joao V. B. Soares,
Kapil Thadani,
Shaunak Mishra,
Mikhail Kuznetsov,
Yueh-Ning Ku,
Paloma de Juan
Abstract:
Generating background scenes for salient objects plays a crucial role across various domains including creative design and e-commerce, as it enhances the presentation and context of subjects by integrating them into tailored environments. Background generation can be framed as a task of text-conditioned outpainting, where the goal is to extend image content beyond a salient object's boundaries on a blank background. Although popular diffusion models for text-guided inpainting can also be used for outpainting by mask inversion, they are trained to fill in missing parts of an image rather than to place an object into a scene. Consequently, when used for background creation, inpainting models frequently extend the salient object's boundaries and thereby change the object's identity, which is a phenomenon we call "object expansion." This paper introduces a model for adapting inpainting diffusion models to the salient object outpainting task using Stable Diffusion and ControlNet architectures. We present a series of qualitative and quantitative results across models and datasets, including a newly proposed metric to measure object expansion that does not require any human labeling. Compared to Stable Diffusion 2.0 Inpainting, our proposed approach reduces object expansion by 3.6x on average with no degradation in standard visual metrics across multiple datasets.
Submitted 15 April, 2024;
originally announced April 2024.
-
Cooperative Modular Manipulation with Numerous Cable-Driven Robots for Assistive Construction and Gap Crossing
Authors:
Kevin Murphy,
Joao C. V. Soares,
Justin K. Yim,
Dustin Nottage,
Ahmet Soylemezoglu,
Joao Ramos
Abstract:
Soldiers in the field often need to cross negative obstacles, such as rivers or canyons, to reach goals or safety. Military gap crossing involves on-site construction of temporary bridges. However, this procedure is conducted with dangerous, time- and labor-intensive operations and specialized machinery. We envision a scalable robotic solution inspired by advancements in force-controlled and Cable Driven Parallel Robots (CDPRs); this solution can address the challenges inherent in this transportation problem, achieving fast, efficient, and safe deployment and field operations. We introduce the embodied vision in Co3MaNDR, a solution to the military gap crossing problem, a distributed robot consisting of several modules simultaneously pulling on a central payload, controlling the cables' tensions to achieve complex objectives, such as precise trajectory tracking or force amplification. Hardware experiments demonstrate teleoperation of a payload, trajectory following, and the sensing and amplification of operators' applied physical forces during slow operations. An operator was shown to manipulate a 27.2 kg (60 lb) payload with an average force utilization of 14.5% of its weight. Results indicate that the system can be scaled up to heavier payloads without compromising performance or introducing superfluous complexity. This research lays a foundation to expand CDPR technology to uncoordinated and unstable mobile platforms in unknown environments.
Submitted 19 March, 2024;
originally announced March 2024.
-
Classification of Major Depressive Disorder Using Vertex-Wise Brain Sulcal Depth, Curvature, and Thickness with a Deep and a Shallow Learning Model
Authors:
Roberto Goya-Maldonado,
Tracy Erwin-Grabner,
Ling-Li Zeng,
Christopher R. K. Ching,
Andre Aleman,
Alyssa R. Amod,
Zeynep Basgoze,
Francesco Benedetti,
Bianca Besteher,
Katharina Brosch,
Robin Bülow,
Romain Colle,
Colm G. Connolly,
Emmanuelle Corruble,
Baptiste Couvy-Duchesne,
Kathryn Cullen,
Udo Dannlowski,
Christopher G. Davey,
Annemiek Dols,
Jan Ernsting,
Jennifer W. Evans,
Lukas Fisch,
Paola Fuentes-Claramonte,
Ali Saffet Gonul,
Ian H. Gotlib
, et al. (62 additional authors not shown)
Abstract:
Major depressive disorder (MDD) is a complex psychiatric disorder that affects the lives of hundreds of millions of individuals around the globe. Even today, researchers debate if morphological alterations in the brain are linked to MDD, likely due to the heterogeneity of this disorder. The application of deep learning tools to neuroimaging data, capable of capturing complex non-linear patterns, has the potential to provide diagnostic and predictive biomarkers for MDD. However, previous attempts to demarcate MDD patients and healthy controls (HC) based on segmented cortical features via linear machine learning approaches have reported low accuracies. Here, we used globally representative data from the ENIGMA-MDD working group containing 7,012 participants from 30 sites (N=2,772 MDD and N=4,240 HC), which allows a comprehensive analysis with generalizable results. Based on the hypothesis that integration of vertex-wise cortical features can improve classification performance, we evaluated the classification of a DenseNet and a Support Vector Machine (SVM), with the expectation that the former would outperform the latter. We found that both classifiers exhibited close to chance performance (balanced accuracy DenseNet: 51%; SVM: 53%), when estimated on unseen sites. Slightly higher classification performance (balanced accuracy DenseNet: 58%; SVM: 55%) was found when the cross-validation folds contained subjects from all sites, indicating site effect. In conclusion, the integration of vertex-wise morphometric features and the use of the non-linear classifier did not lead to the differentiability between MDD and HC. Our results support the notion that MDD classification on this combination of such features and classifiers is unfeasible. Perhaps more sophisticated integration of multimodal information may lead to a higher performance in this diagnostic task.
Submitted 24 January, 2025; v1 submitted 18 November, 2023;
originally announced November 2023.
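A note on the metric reported in the entry above: balanced accuracy is the mean of the per-class recalls, which makes it informative when the classes are imbalanced, as with the 2,772 MDD and 4,240 HC participants. The sketch below is only an illustration of that definition. The function name and the toy labels are assumptions made for this example and are not taken from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        # Indices of all samples whose true label is class c.
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


def accuracy(y_true, y_pred):
    """Plain fraction of correctly predicted samples."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy, made-up labels: 2 positives and 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 10

print(accuracy(y_true, y_pred))           # plain accuracy looks good
print(balanced_accuracy(y_true, y_pred))  # balanced accuracy sits at chance
```

With these toy labels the majority-class predictor scores well on plain accuracy but only at chance level on balanced accuracy, which is why values near 50% in a two-class task are read as close to chance.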
-
HAL 9000: a Risk Manager for ITSs
Authors:
Tadeu Freitas,
Carlos Novo,
Joao Soares,
Ines Dutra,
Manuel E. Correia,
Behnam Shariati,
Rolando Martins
Abstract:
HAL 9000 is an Intrusion Tolerant Systems (ITSs) Risk Manager, which assesses configuration risks against potential intrusions. It utilizes gathered threat knowledge and remains operational, even in the absence of updated information. Based on its advice, the ITSs can dynamically and proactively adapt to recent threats to minimize and mitigate future intrusions from malicious adversaries. Our goal is to reduce the risk linked to the exploitation of recently uncovered vulnerabilities that have not been classified and/or do not have a script to reproduce the exploit, considering the potential that they may have already been exploited as zero-day exploits. Our experiments demonstrate that the proposed solution can effectively learn and replicate National Vulnerability Database's evaluation process with 99% accuracy.
Submitted 21 March, 2025; v1 submitted 15 November, 2023;
originally announced November 2023.
-
On the Computational Complexities of Complex-valued Neural Networks
Authors:
Kayol Soares Mayer,
Jonathan Aguiar Soares,
Ariadne Arrais Cruz,
Dalton Soares Arantes
Abstract:
Complex-valued neural networks (CVNNs) are nonlinear filters used in the digital signal processing of complex-domain data. Compared with real-valued neural networks (RVNNs), CVNNs can directly handle complex-valued input and output signals due to their complex domain parameters and activation functions. With the trend toward low-power systems, computational complexity analysis has become essential for measuring an algorithm's power consumption. Therefore, this paper presents both the quantitative and asymptotic computational complexities of CVNNs. This is a crucial tool in deciding which algorithm to implement. The mathematical operations are described in terms of the number of real-valued multiplications, as these are the most demanding operations. To determine which CVNN can be implemented in a low-power system, quantitative computational complexities can be used to accurately estimate the number of floating-point operations. We have also investigated the computational complexities of CVNNs discussed in some studies presented in the literature.
Submitted 19 October, 2023;
originally announced October 2023.
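To make the idea of counting real-valued multiplications in the entry above concrete: a textbook complex product (a+bi)(c+di) uses four real multiplications, and Gauss's rearrangement uses three at the cost of extra additions. The sketch below counts multiplications for a single fully connected complex layer under that simple model. It is an assumption-laden illustration, not the paper's complexity expressions: the function names are hypothetical, and activation functions and biases are ignored.

```python
def complex_dense_real_mults(n_in, n_out, mults_per_complex_product=4):
    """Real multiplications for one complex matrix-vector product.

    Assumes a dense layer with n_in complex inputs and n_out complex
    outputs, i.e. n_in * n_out complex products. Activations are ignored.
    """
    return mults_per_complex_product * n_in * n_out


def gauss_multiply(a, b, c, d):
    """Compute (a + bi)(c + di) with three real multiplications."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    # Real part: ac - bd; imaginary part: ad + bc.
    return k1 - k3, k1 + k2


# Hypothetical layer size chosen only for the example.
print(complex_dense_real_mults(64, 32))      # schoolbook product
print(complex_dense_real_mults(64, 32, 3))   # Gauss's three-multiplication form
print(gauss_multiply(1.0, 2.0, 3.0, 4.0))    # (1+2i)(3+4i)
```

Whether the three-multiplication form is worthwhile depends on the relative cost of additions on the target hardware, so the count alone is only a first-order estimate.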
-
CVNN-based Channel Estimation and Equalization in OFDM Systems Without Cyclic Prefix
Authors:
Heitor dos Santos Sousa,
Jonathan Aguiar Soares,
Kayol Soares Mayer,
Dalton Soares Arantes
Abstract:
In modern communication systems operating with Orthogonal Frequency-Division Multiplexing (OFDM), channel estimation requires minimal complexity with one-tap equalizers. However, this depends on cyclic prefixes, which must be sufficiently large to cover the channel impulse response. Conversely, the use of cyclic prefix (CP) decreases the useful information that can be conveyed in an OFDM frame, thereby degrading the spectral efficiency of the system. In this context, we study the impact of CPs on channel estimation with complex-valued neural networks (CVNNs). We show that the phase-transmittance radial basis function neural network offers superior results, in terms of required energy per bit, compared to classical minimum mean-squared error and least squares algorithms in scenarios without CP.
Submitted 25 August, 2023;
originally announced August 2023.
-
Matrices inducing generalized metric on sequences
Authors:
Eloi Araujo,
Fábio V. Martinez,
Carlos H. A. Higa,
José Soares
Abstract:
Sequence comparison is a basic task to capture similarities and differences between two or more sequences of symbols, with countless applications such as in computational biology. An alignment is a way to compare sequences, where a given scoring function determines the degree of similarity between them. Many scoring functions are obtained from scoring matrices. However, not all scoring matrices induce scoring functions which are distances, since the scoring function is not necessarily a metric. In this work we establish necessary and sufficient conditions for scoring matrices to induce each one of the properties of a metric in weighted edit distances. For a subset of scoring matrices that induce normalized edit distances, we also characterize each class of scoring matrices inducing normalized edit distances. Furthermore, we define an extended edit distance, which takes into account a set of editing operations that transforms one sequence into another regardless of the existence of a usual corresponding alignment to represent them, describing a criterion to find a sequence of edit operations whose weight is minimum. Similarly, we determine the class of scoring matrices that induces extended edit distances for each of the properties of a metric.
Submitted 15 March, 2023;
originally announced March 2023.
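As a rough illustration of the weighted edit distance mentioned in the abstract above, the following minimal sketch computes it by dynamic programming from a scoring matrix. It is not taken from the paper: the dictionary representation of the matrix, the `GAP` symbol, and the helper `unit_cost_matrix` are assumptions made for this example only.

```python
from typing import Dict, Iterable, Tuple

GAP = "-"  # hypothetical symbol for a space (insertion/deletion) in an alignment

ScoringMatrix = Dict[Tuple[str, str], float]


def weighted_edit_distance(s: str, t: str, score: ScoringMatrix) -> float:
    """Minimum total weight of an alignment of s and t under `score`."""
    n, m = len(s), len(t)
    # d[i][j] holds the minimum weight of aligning s[:i] with t[:j].
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + score[(s[i - 1], GAP)]  # delete s[i-1]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + score[(GAP, t[j - 1])]  # insert t[j-1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + score[(s[i - 1], t[j - 1])],  # match/substitute
                d[i - 1][j] + score[(s[i - 1], GAP)],           # delete
                d[i][j - 1] + score[(GAP, t[j - 1])],           # insert
            )
    return d[n][m]


def unit_cost_matrix(alphabet: Iterable[str]) -> ScoringMatrix:
    """Hypothetical helper: cost 0 for equal symbols, 1 otherwise (Levenshtein)."""
    symbols = list(alphabet) + [GAP]
    return {(a, b): 0.0 if a == b else 1.0 for a in symbols for b in symbols}
```

With the unit-cost matrix this reduces to the classical Levenshtein distance, which is a metric; other scoring matrices plugged into the same recurrence may or may not satisfy the metric properties, which is the question the abstract addresses.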
-
SoccerNet 2022 Challenges Results
Authors:
Silvio Giancola,
Anthony Cioppa,
Adrien Deliège,
Floriane Magera,
Vladimir Somers,
Le Kang,
Xin Zhou,
Olivier Barnich,
Christophe De Vleeschouwer,
Alexandre Alahi,
Bernard Ghanem,
Marc Van Droogenbroeck,
Abdulrahman Darwish,
Adrien Maglo,
Albert Clapés,
Andreas Luyts,
Andrei Boiarov,
Artur Xarles,
Astrid Orcesi,
Avijit Shah,
Baoyu Fan,
Bharath Comandur,
Chen Chen,
Chen Zhang,
Chen Zhao
, et al. (69 additional authors not shown)
Abstract:
The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team. In 2022, the challenges were composed of 6 vision-based tasks: (1) action spotting, focusing on retrieving action timestamps in long untrimmed videos, (2) replay grounding, focusing on retrieving the live moment of an action shown in a replay, (3) pitch localization, focusing on detecting line and goal part elements, (4) camera calibration, dedicated to retrieving the intrinsic and extrinsic camera parameters, (5) player re-identification, focusing on retrieving the same players across multiple views, and (6) multiple object tracking, focusing on tracking players and the ball through unedited video streams. Compared to last year's challenges, tasks (1-2) had their evaluation metrics redefined to consider tighter temporal accuracies, and tasks (3-6) were novel, including their underlying data and annotations. More information on the tasks, challenges and leaderboards is available at https://www.soccer-net.org. Baselines and development kits are available at https://github.com/SoccerNet.
Submitted 5 October, 2022;
originally announced October 2022.
-
PCA-based Channel Estimation for MIMO Communications
Authors:
Jonathan Aguiar Soares,
Kayol Soares Mayer,
Pedro Benevenuto Valadares,
Dalton Soares Arantes
Abstract:
In multiple-input multiple-output (MIMO) communications, channel estimation is paramount to keep base stations and users on track. This paper proposes a novel channel estimation approach based on principal component analysis (PCA) for MIMO orthogonal frequency division multiplexing systems. The channel frequency response is first estimated with the least squares method, and then PCA is used to filter only the higher singular components of the channel impulse response, which is then converted back to the frequency domain. The proposed approach is compared with minimum mean square error (MMSE) estimation in terms of bit error rate versus Eb/N0.
Submitted 27 September, 2022;
originally announced September 2022.
-
Visual Localization and Mapping in Dynamic and Changing Environments
Authors:
João Carlos Virgolino Soares,
Vivian Suzano Medeiros,
Gabriel Fischer Abati,
Marcelo Becker,
Glauco Caurin,
Marcelo Gattass,
Marco Antonio Meggiolaro
Abstract:
The real-world deployment of fully autonomous mobile robots depends on a robust SLAM (Simultaneous Localization and Mapping) system, capable of handling dynamic environments, where objects are moving in front of the robot, and changing environments, where objects are moved or replaced after the robot has already mapped the scene. This paper presents Changing-SLAM, a method for robust Visual SLAM in both dynamic and changing environments. This is achieved by using a Bayesian filter combined with a long-term data association algorithm. It also employs an efficient algorithm for dynamic keypoint filtering based on object detection that correctly identifies features inside the bounding box that are not dynamic, preventing a depletion of features that could cause lost tracks. Furthermore, a new dataset, called the PUC-USP dataset, was developed with RGB-D data specially designed for the evaluation of changing environments on an object level. Six sequences were created using a mobile robot, an RGB-D camera and a motion capture system. The sequences were designed to capture different scenarios that could lead to a tracking failure or a map corruption. To the best of our knowledge, Changing-SLAM is the first Visual SLAM system that is robust to both dynamic and changing environments, not assuming a given camera pose or a known map, being also able to operate in real time. The proposed method was evaluated using benchmark datasets and compared with other state-of-the-art methods, proving to be highly accurate.
Submitted 21 September, 2022;
originally announced September 2022.
-
Action Spotting using Dense Detection Anchors Revisited: Submission to the SoccerNet Challenge 2022
Authors:
João V. B. Soares,
Avijit Shah
Abstract:
This brief technical report describes our submission to the Action Spotting SoccerNet Challenge 2022. The challenge was part of the CVPR 2022 ActivityNet Workshop. Our submission was based on a recently proposed method which focuses on increasing temporal precision via a densely sampled set of detection anchors. Due to its emphasis on temporal precision, this approach had shown significant improvements in the tight average-mAP metric. Tight average-mAP was used as the evaluation criterion for the challenge, and is defined using small temporal evaluation tolerances, thus being more sensitive to small temporal errors. In order to further improve results, here we introduce small changes in the pre- and post-processing steps, and also combine different input feature types via late fusion. These changes brought improvements that helped us achieve the first place in the challenge and also led to a new state-of-the-art on SoccerNet's test set when using the dataset's standard experimental protocol. This report briefly reviews the action spotting method based on dense detection anchors, then focuses on the modifications introduced for the challenge. We also describe the experimental protocols and training procedures we used, and finally present our results.
Submitted 3 August, 2022; v1 submitted 15 June, 2022;
originally announced June 2022.
-
Temporally Precise Action Spotting in Soccer Videos Using Dense Detection Anchors
Authors:
João V. B. Soares,
Avijit Shah,
Topojoy Biswas
Abstract:
We present a model for temporally precise action spotting in videos, which uses a dense set of detection anchors, predicting a detection confidence and corresponding fine-grained temporal displacement for each anchor. We experiment with two trunk architectures, both of which are able to incorporate large temporal contexts while preserving the smaller-scale features required for precise localization: a one-dimensional version of a u-net, and a Transformer encoder (TE). We also suggest best practices for training models of this kind, by applying Sharpness-Aware Minimization (SAM) and mixup data augmentation. We achieve a new state-of-the-art on SoccerNet-v2, the largest soccer video dataset of its kind, with marked improvements in temporal localization. Additionally, our ablations show: the importance of predicting the temporal displacements; the trade-offs between the u-net and TE trunks; and the benefits of training with SAM and mixup.
Submitted 11 July, 2022; v1 submitted 20 May, 2022;
originally announced May 2022.
-
Analyzing Flight Delay Prediction Under Concept Drift
Authors:
Lucas Giusti,
Leonardo Carvalho,
Antonio Tadeu Gomes,
Rafaelli Coutinho,
Jorge Soares,
Eduardo Ogasawara
Abstract:
Flight delays impose challenges that impact any flight transportation system. Predicting when they are going to occur is an important way to mitigate this issue. However, the behavior of the flight delay system varies through time. This phenomenon is known in predictive analytics as concept drift. This paper investigates the prediction performance of different drift handling strategies in aviation under different scales (models trained from flights related to a single airport or the entire flight system). Specifically, two research questions were proposed and answered: (i) How do drift handling strategies influence the prediction performance of delays? (ii) Do different scales change the results of drift handling strategies? In our analysis, drift handling strategies are relevant, and their impacts vary according to scale and machine learning models used.
Submitted 4 April, 2021;
originally announced April 2021.
-
An effective and friendly tool for seed image analysis
Authors:
Andrea Loddo,
Cecilia Di Ruberto,
A. M. P. G. Vale,
Mariano Ucchesu,
J. M. Soares,
Gianluigi Bacchetta
Abstract:
Image analysis is an essential field for several topics in the life sciences, such as biology or botany. In particular, the analysis of seeds (e.g. fossil research) can provide significant information on their evolution, the history of agriculture, plant domestication and knowledge of diets in ancient times. This work aims to present software that performs image analysis for feature extraction and classification from images containing seeds through a novel and unique framework. In detail, we propose two ImageJ plugins, one able to extract morphological, textural and colour features from seed images, and another to classify seeds into categories using the extracted features. The experimental results demonstrated the correctness and validity of both the extracted features and the classification predictions. The proposed tool is easily extendable to other fields of image analysis.
Submitted 23 July, 2021; v1 submitted 31 March, 2021;
originally announced March 2021.
-
A new automatic approach to seed image analysis: From acquisition to segmentation
Authors:
A. M. P. G. Vale,
M. Ucchesu,
C. Di Ruberto,
A. Loddo,
J. M. Soares,
G. Bacchetta
Abstract:
Image Analysis offers a new tool for classifying vascular plant species based on the morphological and colorimetric features of the seeds, and has made significant contributions in systematic studies. However, in order to extract the morphological and colorimetric features, it is necessary to segment the image containing the samples to be analysed. This stage represents one of the most challenging steps in image processing, as it is difficult to separate uniform and homogeneous objects from the background. In this paper, we present a new, open source plugin for the automatic segmentation of an image of a seed sample. This plugin was written in Java to allow it to work with ImageJ open source software. The new plugin was tested on a total of 3,386 seed samples from 120 species belonging to the Fabaceae family. Digital images were acquired using a flatbed scanner. In order to test the efficacy of this approach in terms of identifying the edges of objects and separating them from the background, each sample was scanned using four different hues of blue for the background, and a total of 480 digital images were elaborated. The performance of the new plugin was compared with a method based on double image acquisition (with a black and white background) using the same seed samples, in which images were manually segmented using the Core ImageJ plugin. The results showed that the new plugin was able to segment all of the digital images without generating any object detection errors. In addition, the new plugin was able to segment images within an average of 0.02 s, while the average time for execution with the manual method was 63 s. This new open source plugin is proven to be able to work on a single image, and to be highly efficient in terms of time and segmentation when working with large numbers of images and a wide diversity of shapes.
Submitted 11 December, 2020;
originally announced December 2020.
-
Polyhedral study of the Convex Recoloring problem
Authors:
Manoel Campêlo,
Phablo F. S. Moura,
Joel C. Soares
Abstract:
A coloring of the vertices of a connected graph is convex if each color class induces a connected subgraph. We address the convex recoloring (CR) problem defined as follows. Given a graph $G$ and a coloring of its vertices, recolor a minimum number of vertices of $G$ so that the resulting coloring is convex. This problem, known to be NP-hard even on paths, was first motivated by applications on perfect phylogenies. In this work, we study CR on general graphs from a polyhedral point of view. First, we introduce a full-dimensional polytope based on the idea of connected subgraphs, and present a class of valid inequalities with right-hand side one that comprises all facet-defining inequalities with binary coefficients when the input graph is a tree. Moreover, we define a general class of inequalities with right-hand side in $\{1, \ldots, k\}$, where $k$ is the number of colors used in the initial coloring, and show sufficient conditions for validity and facetness of such inequalities. Finally, we report on computational experiments for an application on mobile networks that can be modeled by the polytope of CR on paths. We evaluate the potential of the proposed inequalities to reduce the integrality gaps.
Submitted 25 November, 2019;
originally announced November 2019.
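The definition of a convex coloring given in the abstract above (each color class induces a connected subgraph) can be checked directly. The sketch below is only an illustration of that definition, not of the paper's polyhedral approach; the adjacency-dictionary representation and the function name `is_convex_coloring` are assumptions of this example.

```python
from collections import deque
from typing import Dict, Hashable, Iterable


def is_convex_coloring(
    adj: Dict[Hashable, Iterable[Hashable]],
    color: Dict[Hashable, Hashable],
) -> bool:
    """Return True if every color class induces a connected subgraph."""
    # Group vertices by color.
    classes: Dict[Hashable, set] = {}
    for vertex, c in color.items():
        classes.setdefault(c, set()).add(vertex)

    for members in classes.values():
        # Breadth-first search restricted to vertices of this color.
        start = next(iter(members))
        seen = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w in members and w not in seen:
                    seen.add(w)
                    queue.append(w)
        if seen != members:
            return False  # this color class is split into several components
    return True


# Path 0 - 1 - 2 - 3
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
alternating = {0: "a", 1: "b", 2: "a", 3: "b"}
blocks = {0: "a", 1: "a", 2: "b", 3: "b"}
```

On the path above, `alternating` is not convex (both classes are disconnected) while `blocks` is; recoloring two vertices of `alternating` (for instance vertices 1 and 2) yields a convex coloring, which is the kind of quantity the CR problem minimizes.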
-
Image Captioning: Transforming Objects into Words
Authors:
Simao Herdade,
Armin Kappeler,
Kofi Boakye,
Joao Soares
Abstract:
Image captioning models typically follow an encoder-decoder architecture which uses abstract image feature vectors as input to the encoder. One of the most successful algorithms uses feature vectors extracted from the region proposals obtained from an object detector. In this work we introduce the Object Relation Transformer, which builds upon this approach by explicitly incorporating information about the spatial relationship between input detected objects through geometric attention. Quantitative and qualitative results demonstrate the importance of such geometric attention for image captioning, leading to improvements on all common captioning metrics on the MS-COCO dataset.
Submitted 11 January, 2020; v1 submitted 13 June, 2019;
originally announced June 2019.
-
A Review on Flight Delay Prediction
Authors:
Alice Sternberg,
Jorge Soares,
Diego Carvalho,
Eduardo Ogasawara
Abstract:
Flight delays hurt airlines, airports, and passengers. Their prediction is crucial during the decision-making process for all players of commercial aviation. Moreover, the development of accurate prediction models for flight delays became cumbersome due to the complexity of air transportation system, the number of methods for prediction, and the deluge of flight data. In this context, this paper presents a thorough literature review of approaches used to build flight delay prediction models from the Data Science perspective. We propose a taxonomy and summarize the initiatives used to address the flight delay prediction problem, according to scope, data, and computational methods, giving particular attention to an increased usage of machine learning methods. Besides, we also present a timeline of significant works that depicts relationships between flight delay prediction problems and research trends to address them.
The published version of this paper is available at https://doi.org/10.1080/01441647.2020.1861123.
Please cite as:
L. Carvalho, A. Sternberg, L. Maia Gonçalves, A. Beatriz Cruz, J.A. Soares, D. Brandão, D. Carvalho, and E. Ogasawara, 2020, On the relevance of data science for flight delay research: a systematic review, Transport Reviews
Submitted 4 April, 2021; v1 submitted 15 March, 2017;
originally announced March 2017.
-
Multi-Objective Software Suite of Two-Dimensional Shape Descriptors for Object-Based Image Analysis
Authors:
Andrea Baraldi,
João V. B. Soares
Abstract:
In recent years two sets of planar (2D) shape attributes, provided with an intuitive physical meaning, were proposed to the remote sensing community by, respectively, Nagao & Matsuyama and Shackelford & Davis in their seminal works on the increasingly popular geographic object based image analysis (GEOBIA) paradigm. These two published sets of intuitive geometric features were selected as initial conditions by the present R&D software project, whose multi-objective goal was to accomplish: (i) a minimally dependent and maximally informative design (knowledge/information representation) of a general purpose, user and application independent dictionary of 2D shape terms provided with a physical meaning intuitive to understand by human end users and (ii) an effective (accurate, scale invariant, easy to use) and efficient implementation of 2D shape descriptors. To comply with the Quality Assurance Framework for Earth Observation guidelines, the proposed suite of geometric functions is validated by means of a novel quantitative quality assurance policy, centered on inter feature dependence (causality) assessment. This innovative multivariate feature validation strategy is alternative to traditional feature selection procedures based on either inductive data learning classification accuracy estimation, which is inherently case specific, or cross correlation estimation, because statistical cross correlation does not imply causation. The project deliverable is an original general purpose software suite of seven validated off the shelf 2D shape descriptors intuitive to use. Alternative to existing commercial or open source software libraries of tens of planar shape functions whose informativeness remains unknown, it is eligible for use in (GE)OBIA systems in operating mode, expected to mimic human reasoning based on a convergence of evidence approach.
Submitted 2 February, 2017; v1 submitted 8 January, 2017;
originally announced January 2017.
-
Retinal Vessel Segmentation Using the 2-D Morlet Wavelet and Supervised Classification
Authors:
João V. B. Soares,
Jorge J. G. Leandro,
Roberto M. Cesar Jr.,
Herbert F. Jelinek,
Michael J. Cree
Abstract:
We present a method for automated segmentation of the vasculature in retinal images. The method produces segmentations by classifying each image pixel as vessel or non-vessel, based on the pixel's feature vector. Feature vectors are composed of the pixel's intensity and continuous two-dimensional Morlet wavelet transform responses taken at multiple scales. The Morlet wavelet is capable of tuning to specific frequencies, thus allowing noise filtering and vessel enhancement in a single step. We use a Bayesian classifier with class-conditional probability density functions (likelihoods) described as Gaussian mixtures, yielding a fast classification while being able to model complex decision surfaces, and compare its performance with the linear minimum squared error classifier. The probability distributions are estimated based on a training set of labeled pixels obtained from manual segmentations. The method's performance is evaluated on the publicly available DRIVE and STARE databases of manually labeled non-mydriatic images. On the DRIVE database, it achieves an area under the receiver operating characteristic (ROC) curve of 0.9598, slightly superior to that presented by the method of Staal et al.
Submitted 11 May, 2006; v1 submitted 30 September, 2005;
originally announced October 2005.