Role of the Pretraining and the Adaptation data sizes for low-resource real-time MRI video segmentation

Tholan, Masoud Thajudeen; Hegde, Vinayaka; Sharma, Chetan; Ghosh, Prasanta Kumar

doi:10.1109/ICASSP49660.2025.10889096

arXiv:2502.14418 (eess)

[Submitted on 20 Feb 2025]

Abstract: Real-time Magnetic Resonance Imaging (rtMRI) is frequently used in speech production studies because it provides a complete view of the vocal tract during articulation. This study investigates the effectiveness of rtMRI in analyzing vocal tract movements by employing the SegNet and UNet models for Air-Tissue Boundary (ATB) segmentation. We pretrained several base models using increasing numbers of subjects and videos and assessed their performance on two datasets. The first consists of unseen subjects with unseen videos from the same data source; here the models achieved 0.33% and 0.91% (Pixel-wise Classification Accuracy (PCA) and Dice Coefficient, respectively) better than the matched condition. The second comprises unseen videos from a new data source; here the models reached 99.63% and 98.09% (PCA and Dice Coefficient, respectively) of the matched condition performance. Matched condition performance refers to the performance of a model trained only on the test subjects, which was set as the benchmark for the other models. Our findings highlight the significance of fine-tuning and adapting models with limited data. Notably, we demonstrate that effective model adaptation can be achieved with as few as 15 rtMRI frames from any new dataset.
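
The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes binary masks with 1 for tissue and 0 for air, the standard definitions of pixel-wise accuracy and the Dice coefficient, and a simple ratio for "percent of matched condition performance". The function names (`pixel_accuracy`, `dice_coefficient`, `relative_to_matched`) are hypothetical, and the abstract does not state the exact formulas the authors used, so this is an illustration rather than their evaluation code.

```python
from typing import Sequence

# Assumed representation: a 2D binary mask, 1 = tissue, 0 = air.
Mask = Sequence[Sequence[int]]


def pixel_accuracy(pred: Mask, truth: Mask) -> float:
    """Fraction of pixels whose predicted label equals the ground-truth label."""
    total = 0
    correct = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            total += 1
            correct += int(p == t)
    return correct / total


def dice_coefficient(pred: Mask, truth: Mask) -> float:
    """Dice = 2 * |P and T| / (|P| + |T|), computed for the foreground label 1."""
    intersection = 0
    pred_sum = 0
    truth_sum = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += int(p == 1 and t == 1)
            pred_sum += int(p == 1)
            truth_sum += int(t == 1)
    denom = pred_sum + truth_sum
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2 * intersection / denom


def relative_to_matched(score: float, matched_score: float) -> float:
    """Express a score as a percentage of the matched-condition score (assumed ratio)."""
    return 100.0 * score / matched_score


if __name__ == "__main__":
    # Toy 2x3 masks, not data from the paper.
    pred = [[1, 1, 0], [0, 1, 0]]
    truth = [[1, 0, 0], [0, 1, 1]]
    print(pixel_accuracy(pred, truth))
    print(dice_coefficient(pred, truth))
    print(relative_to_matched(0.9, 1.0))
```

On the toy masks, four of six pixels agree and two foreground pixels overlap out of three predicted and three true, so both metrics evaluate to 4/6. A model scoring above its matched-condition baseline would give a value above 100 from `relative_to_matched`.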

Comments:	Accepted to ICASSP 2025
Subjects:	Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)
Cite as:	arXiv:2502.14418 [eess.AS]
	(or arXiv:2502.14418v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2502.14418
Journal reference:	IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 2025, pp. 1-5
Related DOI:	https://doi.org/10.1109/ICASSP49660.2025.10889096

Submission history

From: Masoud Thajudeen Tholan
[v1] Thu, 20 Feb 2025 10:15:43 UTC (3,223 KB)
