Search | arXiv e-print repository

arXiv:2505.20745 [pdf, ps, other]

Foundation Model Hidden Representations for Heart Rate Estimation from Auscultation

Authors: Jingping Nie, Dung T. Tran, Karan Thakkar, Vasudha Kowtha, Jon Huang, Carlos Avendano, Erdrin Azemi, Vikramjit Mitra

Abstract: Auscultation, particularly heart sound, is a non-invasive technique that provides essential vital sign information. Recently, self-supervised acoustic representation foundation models (FMs) have been proposed to offer insights into acoustics-based vital signs. However, there has been little exploration of the extent to which auscultation is encoded in these pre-trained FM representations. In this… ▽ More Auscultation, particularly heart sound, is a non-invasive technique that provides essential vital sign information. Recently, self-supervised acoustic representation foundation models (FMs) have been proposed to offer insights into acoustics-based vital signs. However, there has been little exploration of the extent to which auscultation is encoded in these pre-trained FM representations. In this work, using a publicly available phonocardiogram (PCG) dataset and a heart rate (HR) estimation model, we conduct a layer-wise investigation of six acoustic representation FMs: HuBERT, wav2vec2, wavLM, Whisper, Contrastive Language-Audio Pretraining (CLAP), and an in-house CLAP model. Additionally, we implement the baseline method from Nie et al., 2024 (which relies on acoustic features) and show that overall, representation vectors from pre-trained foundation models (FMs) offer comparable performance to the baseline. Notably, HR estimation using the representations from the audio encoder of the in-house CLAP model outperforms the results obtained from the baseline, achieving a lower mean absolute error (MAE) across various train/validation/test splits despite the domain mismatch. △ Less

Submitted 29 May, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

Comments: 5 pages, Interspeech 2025 conference

arXiv:2503.22711 [pdf, other]

Modeling speech emotion with label variance and analyzing performance across speakers and unseen acoustic conditions

Authors: Vikramjit Mitra, Amrit Romana, Dung T. Tran, Erdrin Azemi

Abstract: Spontaneous speech emotion data usually contain perceptual grades where graders assign emotion score after listening to the speech files. Such perceptual grades introduce uncertainty in labels due to grader opinion variation. Grader variation is addressed by using consensus grades as groundtruth, where the emotion with the highest vote is selected. Consensus grades fail to consider ambiguous insta… ▽ More Spontaneous speech emotion data usually contain perceptual grades where graders assign emotion score after listening to the speech files. Such perceptual grades introduce uncertainty in labels due to grader opinion variation. Grader variation is addressed by using consensus grades as groundtruth, where the emotion with the highest vote is selected. Consensus grades fail to consider ambiguous instances where a speech sample may contain multiple emotions, as captured through grader opinion uncertainty. We demonstrate that using the probability density function of the emotion grades as targets instead of the commonly used consensus grades, provide better performance on benchmark evaluation sets compared to results reported in the literature. We show that a saliency driven foundation model (FM) representation selection helps to train a state-of-the-art speech emotion model for both dimensional and categorical emotion recognition. Comparing representations obtained from different FMs, we observed that focusing on overall test-set performance can be deceiving, as it fails to reveal the models generalization capacity across speakers and gender. We demonstrate that performance evaluation across multiple test-sets and performance analysis across gender and speakers are useful in assessing usefulness of emotion models. Finally, we demonstrate that label uncertainty and data-skew pose a challenge to model evaluation, where instead of using the best hypothesis, it is useful to consider the 2- or 3-best hypotheses. △ Less

Submitted 24 March, 2025; originally announced March 2025.

Comments: 11 pages, 5 figures

arXiv:2501.03848 [pdf, other]

Semise: Semi-supervised learning for severity representation in medical image

Authors: Dung T. Tran, Hung Vu, Anh Tran, Hieu Pham, Hong Nguyen, Phong Nguyen

Abstract: This paper introduces SEMISE, a novel method for representation learning in medical imaging that combines self-supervised and supervised learning. By leveraging both labeled and augmented data, SEMISE addresses the challenge of data scarcity and enhances the encoder's ability to extract meaningful features. This integrated approach leads to more informative representations, improving performance o… ▽ More This paper introduces SEMISE, a novel method for representation learning in medical imaging that combines self-supervised and supervised learning. By leveraging both labeled and augmented data, SEMISE addresses the challenge of data scarcity and enhances the encoder's ability to extract meaningful features. This integrated approach leads to more informative representations, improving performance on downstream tasks. As result, our approach achieved a 12% improvement in classification and a 3% improvement in segmentation, outperforming existing methods. These results demonstrate the potential of SIMESE to advance medical image analysis and offer more accurate solutions for healthcare applications, particularly in contexts where labeled data is limited. △ Less

Submitted 7 January, 2025; originally announced January 2025.

Comments: Accepted for presentation at the 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI)

arXiv:2405.17582 [pdf]

doi 10.5281/zenodo.6190227

Building a temperature forecasting model for the city with the regression neural network (RNN)

Authors: Nguyen Phuc Tran, Duy Thanh Tran, Thi Thuy Nga Duong

Abstract: In recent years, a study by environmental organizations in the world and Vietnam shows that weather change is quite complex. global warming has become a serious problem in the modern world, which is a concern for scientists. last century, it was difficult to forecast the weather due to missing weather monitoring stations and technological limitations. this made it hard to collect data for building… ▽ More In recent years, a study by environmental organizations in the world and Vietnam shows that weather change is quite complex. global warming has become a serious problem in the modern world, which is a concern for scientists. last century, it was difficult to forecast the weather due to missing weather monitoring stations and technological limitations. this made it hard to collect data for building predictive models to make accurate simulations. in Vietnam, research on weather forecast models is a recent development, having only begun around 2000. along with advancements in computer science, mathematical models are being built and applied with machine learning techniques to create more accurate and reliable predictive models. this article will summarize the research and solutions for applying recurrent neural networks to forecast urban temperatures. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: 6 pages

Journal ref: The 6th International Conference for Small & Medium Business in 2020 (ICSMB 2020)

arXiv:2211.00466 [pdf, other]

doi 10.1109/INDIN51400.2023.10217993

Recognition of Defective Mineral Wool Using Pruned ResNet Models

Authors: Mehdi Rafiei, Dat Thanh Tran, Alexandros Iosifidis

Abstract: Mineral wool production is a non-linear process that makes it hard to control the final quality. Therefore, having a non-destructive method to analyze the product quality and recognize defective products is critical. For this purpose, we developed a visual quality control system for mineral wool. X-ray images of wool specimens were collected to create a training set of defective and non-defective… ▽ More Mineral wool production is a non-linear process that makes it hard to control the final quality. Therefore, having a non-destructive method to analyze the product quality and recognize defective products is critical. For this purpose, we developed a visual quality control system for mineral wool. X-ray images of wool specimens were collected to create a training set of defective and non-defective samples. Afterward, we developed several recognition models based on the ResNet architecture to find the most efficient model. In order to have a light-weight and fast inference model for real-life applicability, two structural pruning methods are applied to the classifiers. Considering the low quantity of the dataset, cross-validation and augmentation methods are used during the training. As a result, we obtained a model with more than 98% accuracy, which in comparison to the current procedure used at the company, it can recognize 20% more defective products. △ Less

Submitted 1 November, 2022; originally announced November 2022.

Comments: 6 pages, 5 figures, 3 tables Submitted on IEEE Transactions on Industrial Informatics

arXiv:2109.01184 [pdf, other]

Remote Multilinear Compressive Learning with Adaptive Compression

Authors: Dat Thanh Tran, Moncef Gabbouj, Alexandros Iosifidis

Abstract: Multilinear Compressive Learning (MCL) is an efficient signal acquisition and learning paradigm for multidimensional signals. The level of signal compression affects the detection or classification performance of a MCL model, with higher compression rates often associated with lower inference accuracy. However, higher compression rates are more amenable to a wider range of applications, especially… ▽ More Multilinear Compressive Learning (MCL) is an efficient signal acquisition and learning paradigm for multidimensional signals. The level of signal compression affects the detection or classification performance of a MCL model, with higher compression rates often associated with lower inference accuracy. However, higher compression rates are more amenable to a wider range of applications, especially those that require low operating bandwidth and minimal energy consumption such as Internet-of-Things (IoT) applications. Many communication protocols provide support for adaptive data transmission to maximize the throughput and minimize energy consumption. By developing compressive sensing and learning models that can operate with an adaptive compression rate, we can maximize the informational content throughput of the whole application. In this paper, we propose a novel optimization scheme that enables such a feature for MCL models. Our proposal enables practical implementation of adaptive compressive signal acquisition and inference systems. Experimental results demonstrated that the proposed approach can significantly reduce the amount of computations required during the training phase of remote learning systems but also improve the informational content throughput via adaptive-rate sensing. △ Less

Submitted 2 September, 2021; originally announced September 2021.

Comments: 2 figures, 6 tables

Showing 1–6 of 6 results for author: Tran, D T