-
LimeSoDa: A Dataset Collection for Benchmarking of Machine Learning Regressors in Digital Soil Mapping
Authors:
J. Schmidinger,
S. Vogel,
V. Barkov,
A.-D. Pham,
R. Gebbers,
H. Tavakoli,
J. Correa,
T. R. Tavares,
P. Filippi,
E. J. Jones,
V. Lukas,
E. Boenecke,
J. Ruehlmann,
I. Schroeter,
E. Kramer,
S. Paetzold,
M. Kodaira,
A. M. J.-C. Wadoux,
L. Bragazza,
K. Metzger,
J. Huang,
D. S. M. Valente,
J. L. Safanelli,
E. L. Bottega,
R. S. D. Dalmolin, et al. (11 additional authors not shown)
Abstract:
Digital soil mapping (DSM) relies on a broad pool of statistical methods, yet determining the optimal method for a given context remains challenging and contentious. Benchmarking studies on multiple datasets are needed to reveal strengths and limitations of commonly used methods. Existing DSM studies usually rely on a single dataset with restricted access, leading to incomplete and potentially misleading conclusions. To address these issues, we introduce an open-access dataset collection called Precision Liming Soil Datasets (LimeSoDa). LimeSoDa consists of 31 field- and farm-scale datasets from various countries. Each dataset has three target soil properties: (1) soil organic matter or soil organic carbon, (2) clay content and (3) pH, alongside a set of features. Features are dataset-specific and were obtained by optical spectroscopy and by proximal and remote soil sensing. All datasets were aligned to a tabular format and are ready to use for modeling. We demonstrated the use of LimeSoDa for benchmarking by comparing the predictive performance of four learning algorithms across all datasets. This comparison included multiple linear regression (MLR), support vector regression (SVR), categorical boosting (CatBoost) and random forest (RF). The results showed that although no single algorithm was universally superior, certain algorithms performed better in specific contexts. MLR and SVR performed better on high-dimensional spectral datasets, likely due to better compatibility with principal components. In contrast, CatBoost and RF performed considerably better on datasets with a moderate number (< 20) of features. These benchmarking results illustrate that the performance of a method is highly context-dependent. LimeSoDa therefore provides an important resource for improving the development and evaluation of statistical methods in DSM.
Submitted 20 May, 2025; v1 submitted 27 February, 2025;
originally announced February 2025.
-
Multi-label Cross-lingual automatic music genre classification from lyrics with Sentence BERT
Authors:
Tiago Fernandes Tavares,
Fabio José Ayres
Abstract:
Music genres are shaped by both the stylistic features of songs and the cultural preferences of artists' audiences. Automatic classification of music genres using lyrics can be useful in several applications such as recommendation systems, playlist creation, and library organization. We present a multi-label, cross-lingual genre classification system based on multilingual sentence embeddings generated by sBERT. Using a bilingual Portuguese-English dataset with eight overlapping genres, we demonstrate the system's ability to train on lyrics in one language and predict genres in another. Our approach outperforms the baseline approach of translating lyrics and using a bag-of-words representation, improving the genrewise average F1-Score from 0.35 to 0.69. The classifier uses a one-vs-all architecture, enabling it to assign multiple genre labels to a single lyric. Experimental results reveal that dataset centralization notably improves cross-lingual performance. This approach offers a scalable solution for genre classification across underrepresented languages and cultural domains, advancing the capabilities of music information retrieval systems.
Submitted 7 January, 2025;
originally announced January 2025.
-
Measuring similarity between embedding spaces using induced neighborhood graphs
Authors:
Tiago F. Tavares,
Fabio Ayres,
Paris Smaragdis
Abstract:
Deep Learning techniques have excelled at generating embedding spaces that capture semantic similarities between items. Often these representations are paired, enabling experiments with analogies (pairs within the same domain) and cross-modality (pairs across domains). These experiments are based on specific assumptions about the geometry of embedding spaces, which allow finding paired items by extrapolating the positional relationships between embedding pairs in the training dataset, allowing for tasks such as finding new analogies, and multimodal zero-shot classification. In this work, we propose a metric to evaluate the similarity between paired item representations. Our proposal is built from the structural similarity between the nearest-neighbors induced graphs of each representation, and can be configured to compare spaces based on different distance metrics and on different neighborhood sizes. We demonstrate that our proposal can be used to identify similar structures at different scales, which is hard to achieve with kernel methods such as Centered Kernel Alignment (CKA). We further illustrate our method with two case studies: an analogy task using GloVe embeddings, and zero-shot classification in the CIFAR-100 dataset using CLIP embeddings. Our results show that accuracy in both analogy and zero-shot classification tasks correlates with the embedding similarity. These findings can help explain performance differences in these tasks, and may lead to improved design of paired-embedding models in the future.
Submitted 13 November, 2024;
originally announced November 2024.
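The idea of comparing two paired embedding spaces through their nearest-neighbor graphs can be illustrated with a small sketch. This is not the metric proposed in the paper: it is a simplified stand-in that assumes Euclidean distance and scores each item by the Jaccard overlap of its k-nearest-neighbor sets in the two spaces. The function names `knn_indices` and `neighborhood_overlap` are hypothetical.

```python
import numpy as np


def knn_indices(X, k):
    """Return the indices of the k nearest neighbors (Euclidean) of each row of X."""
    sq = np.sum(X ** 2, axis=1)
    # Squared pairwise distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d, np.inf)  # an item is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]


def neighborhood_overlap(A, B, k=5):
    """Mean Jaccard overlap between the k-NN sets of paired rows in A and B.

    A and B hold two representations of the same items, row i paired with row i.
    Assumption: this Jaccard score is an illustrative choice, not the paper's metric.
    """
    neighbors_a = knn_indices(A, k)
    neighbors_b = knn_indices(B, k)
    scores = []
    for row_a, row_b in zip(neighbors_a, neighbors_b):
        set_a, set_b = set(row_a.tolist()), set(row_b.tolist())
        scores.append(len(set_a & set_b) / len(set_a | set_b))
    return float(np.mean(scores))
```

Varying `k` changes the scale at which the two spaces are compared: small values probe local structure, larger values probe coarser structure. A rotation of a space leaves the score near 1, while two unrelated spaces score close to 0.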
-
On Class Separability Pitfalls In Audio-Text Contrastive Zero-Shot Learning
Authors:
Tiago Tavares,
Fabio Ayres,
Zhepei Wang,
Paris Smaragdis
Abstract:
Recent advances in audio-text cross-modal contrastive learning have shown its potential for zero-shot learning. One approach is to project item embeddings from pre-trained backbone neural networks into a cross-modal space in which item similarity can be calculated in either domain. This process relies on a strong unimodal pre-training of the backbone networks, and on a data-intensive training task for the projectors. These two processes can be biased by unintentional data leakage, which can arise from using supervised learning in pre-training or from inadvertently training the cross-modal projection using labels from the zero-shot learning evaluation. In this study, we show that a significant part of the measured zero-shot learning accuracy is due to strengths inherited from the audio and text backbones, that is, they are not learned in the cross-modal domain and are not transferred from one modality to another.
Submitted 23 August, 2024;
originally announced August 2024.
-
DESiRED -- Dynamic, Enhanced, and Smart iRED: A P4-AQM with Deep Reinforcement Learning and In-band Network Telemetry
Authors:
Leandro C. de Almeida,
Washington Rodrigo Dias da Silva,
Thiago C. Tavares,
Rafael Pasquini,
Chrysa Papagianni,
Fábio L. Verdi
Abstract:
Active Queue Management (AQM) is a mechanism employed to alleviate transient congestion in the buffers of network devices, such as routers and switches. Traditional AQM algorithms use fixed thresholds, like target delay or queue occupancy, to compute random packet drop probabilities. A very small target delay can increase packet losses and reduce link utilization, whereas a large target delay may lower the drop probability at the cost of longer queueing delays. Due to dynamic network traffic characteristics, where traffic fluctuations can lead to significant queue variations, a fixed-threshold AQM may not suit all applications. Consequently, we explore the question: What is the ideal threshold (target delay) for AQMs? In this work, we introduce DESiRED (Dynamic, Enhanced, and Smart iRED), a P4-based AQM that leverages precise network feedback from In-band Network Telemetry (INT) to feed a Deep Reinforcement Learning (DRL) model. This model dynamically adjusts the target delay based on rewards that maximize application Quality of Service (QoS). We evaluate DESiRED in a realistic P4-based test environment running an MPEG-DASH service. Our findings demonstrate up to a 90x reduction in video stall and a 42x increase in high-resolution video playback quality when the target delay is adjusted dynamically by DESiRED.
Submitted 27 October, 2023;
originally announced October 2023.
-
Unsupervised Improvement of Audio-Text Cross-Modal Representations
Authors:
Zhepei Wang,
Cem Subakan,
Krishna Subramani,
Junkai Wu,
Tiago Tavares,
Fabio Ayres,
Paris Smaragdis
Abstract:
Recent advances in using language models to obtain cross-modal audio-text representations have overcome the limitations of conventional training approaches that use predefined labels. This has allowed the community to make progress in tasks like zero-shot classification, which would otherwise not be possible. However, learning such representations requires a large amount of human-annotated audio-text pairs. In this paper, we study unsupervised approaches to improve the learning framework of such representations with unpaired text and audio. We explore domain-unspecific and domain-specific curation methods to create audio-text pairs that we use to further improve the model. We also show that when domain-specific curation is used in conjunction with a soft-labeled contrastive loss, we are able to obtain significant improvement in terms of zero-shot classification performance on downstream sound event classification or acoustic scene classification tasks.
Submitted 31 July, 2023; v1 submitted 2 May, 2023;
originally announced May 2023.
-
A multi-sensor human gait dataset captured through an optical system and inertial measurement units
Authors:
Geise Santos,
Marcelo Wanderley,
Tiago Tavares,
Anderson Rocha
Abstract:
Different technologies can acquire data for gait analysis, such as optical systems and inertial measurement units (IMUs). Each technology has its drawbacks and advantages, fitting best to particular applications. The presented multi-sensor human gait dataset comprises synchronized inertial and optical motion data from 25 subjects free of lower-limb injuries, aged between 18 and 47 years. A smartphone and a custom micro-controlled device with an IMU were attached to one of the subject's legs to capture accelerometer data, and 42 reflective markers were taped over the whole body to record three-dimensional trajectories. The trajectories and accelerations were simultaneously recorded and synchronized. Participants were instructed to walk on a straight, level walkway at their normal pace. Ten trials for each participant were recorded and pre-processed in each of two sessions, performed on different days. This dataset supports the comparison of gait parameters and properties of inertial and optical capture systems, and allows the study of gait characteristics specific to each system.
Submitted 29 November, 2021;
originally announced November 2021.
-
The CirCor DigiScope Dataset: From Murmur Detection to Murmur Classification
Authors:
Jorge Oliveira,
Francesco Renna,
Paulo Dias Costa,
Marcelo Nogueira,
Cristina Oliveira,
Carlos Ferreira,
Alipio Jorge,
Sandra Mattos,
Thamine Hatem,
Thiago Tavares,
Andoni Elola,
Ali Bahrami Rad,
Reza Sameni,
Gari D Clifford,
Miguel T. Coimbra
Abstract:
Cardiac auscultation is one of the most cost-effective techniques used to detect and identify many heart conditions. Computer-assisted decision systems based on auscultation can support physicians in their decisions. Unfortunately, the application of such systems in clinical trials is still minimal since most of them only aim to detect the presence of extra or abnormal waves in the phonocardiogram signal, i.e., only a binary ground truth variable (normal vs abnormal) is provided. This is mainly due to the lack of large publicly available datasets, where a more detailed description of such abnormal waves (e.g., cardiac murmurs) exists.
To pave the way to more effective research on healthcare recommendation systems based on auscultation, our team has prepared the currently largest pediatric heart sound dataset. A total of 5282 recordings have been collected from the four main auscultation locations of 1568 patients; in the process, 215780 heart sounds have been manually annotated. Furthermore, and for the first time, each cardiac murmur has been manually annotated by an expert annotator according to its timing, shape, pitch, grading, and quality. In addition, the auscultation locations where the murmur is present were identified, as well as the auscultation location where the murmur is detected most intensely. Such a detailed description for a relatively large number of heart sounds may pave the way for new machine learning algorithms with a real-world application for the detection and analysis of murmur waves for diagnostic purposes.
Submitted 24 December, 2021; v1 submitted 2 August, 2021;
originally announced August 2021.
-
Segment Relevance Estimation for Audio Analysis and Weakly-Labelled Classification
Authors:
Juliano Henrique Foleiss,
Tiago Fernandes Tavares
Abstract:
We propose a method that quantifies the importance, namely relevance, of audio segments for classification in weakly-labelled problems. It works by drawing information from a set of class-wise one-vs-all classifiers. By selecting the classifiers used in each specific classification problem, the relevance measure adapts to different user-defined viewpoints without requiring additional neural network training. This characteristic allows the relevance measure to highlight audio segments that quickly adapt to user-defined criteria. Such functionality can be used for computer-assisted audio analysis. We also propose a neural network architecture, namely RELNET, that leverages the relevance measure for weakly-labelled audio classification problems. RELNET was evaluated on the DCASE2018 dataset and achieved competitive classification results when compared to previous attention-based proposals.
Submitted 11 November, 2019;
originally announced November 2019.
-
Random Projections of Mel-Spectrograms as Low-Level Features for Automatic Music Genre Classification
Authors:
Juliano Henrique Foleiss,
Tiago Fernandes Tavares
Abstract:
In this work, we analyse the random projections of Mel-spectrograms as low-level features for music genre classification. This approach was compared to handcrafted features, features learned using an auto-encoder and features obtained from a transfer learning setting. Tests on five well-known, publicly available datasets show that random projections lead to results comparable to learned features and outperform features obtained via transfer learning in a shallow learning scenario. Random projections do not require extensive specialist knowledge and, at the same time, require less computational power for training than other projection-based low-level features. Therefore, they are a viable choice for shallow-learning, content-based music genre classification.
Submitted 11 November, 2019;
originally announced November 2019.
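As a rough illustration of what a random projection of Mel-spectrogram frames looks like, the sketch below multiplies each frame by a fixed Gaussian random matrix. The abstract does not specify the projection matrix, its scaling, or the output dimensionality, so the Gaussian entries, the `1/sqrt(n_components)` scaling, and the function name `random_projection_features` are all assumptions made for this example.

```python
import numpy as np


def random_projection_features(mel_spec, n_components=16, seed=0):
    """Project Mel-spectrogram frames onto a random low-dimensional basis.

    mel_spec: array of shape (n_frames, n_mels), one Mel-spectrum per row.
    Assumption: a Gaussian matrix scaled by 1/sqrt(n_components); the paper
    may use a different construction.
    """
    rng = np.random.default_rng(seed)
    n_mels = mel_spec.shape[1]
    # The projection is drawn once from the seed and involves no training.
    W = rng.standard_normal((n_mels, n_components)) / np.sqrt(n_components)
    return mel_spec @ W
```

Because the matrix is drawn from a seed rather than learned, the same features can be reproduced without storing a trained model, which is consistent with the lower training cost mentioned in the abstract.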
-
Texture Selection for Automatic Music Genre Classification
Authors:
Juliano H. Foleiss,
Tiago F. Tavares
Abstract:
Music Genre Classification is the problem of associating genre-related labels to digitized music tracks. It has applications in the organization of commercial and personal music collections. Often, music tracks are described as a set of timbre-inspired sound textures. In shallow-learning systems, the total number of sound textures per track is usually too high, and texture downsampling is necessary to make training tractable. Although previous work has solved this by linear downsampling, no extensive work has been done to evaluate how texture selection benefits genre classification in the context of bag-of-frames track descriptions. In this paper, we evaluate the impact of frame selection on automatic music genre classification in a bag-of-frames scenario. We also present a novel texture selector based on K-Means, aimed at identifying diverse sound textures within each track. We evaluated texture selection on diverse datasets and four different feature sets, as well as its relationship to a univariate feature selection strategy. The results show that frame selection leads to significant improvement over the single-vector baseline on datasets consisting of full-length tracks, regardless of the feature set. Results also indicate that the K-Means texture selector achieves significant improvements over the baseline, using fewer textures per track than the commonly used linear downsampling. The results also suggest that texture selection is complementary to the feature selection strategy evaluated. Our qualitative analysis indicates that texture variety within classes benefits model generalization. Our analysis shows that selecting specific audio excerpts can improve classification performance, and that it can be done automatically.
Submitted 28 May, 2019;
originally announced May 2019.