Showing 1–6 of 6 results for author: Sasou, A

  1. arXiv:2309.11014  [pdf, ps, other]

    eess.AS cs.SD eess.SP

    Ensembling Multilingual Pre-Trained Models for Predicting Multi-Label Regression Emotion Share from Speech

    Authors: Bagus Tris Atmaja, Akira Sasou

    Abstract: Speech emotion recognition has evolved from research to practical applications. Previous studies of emotion recognition from speech have focused on developing models on certain datasets like IEMOCAP. The lack of data in the domain of emotion modeling emerges as a challenge to evaluate models in the other dataset, as well as to evaluate speech emotion recognition models that work in a multilingual…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: 4 pages, 6 tables, accepted in APSIPA-ASC 2023

  2. Effect of different splitting criteria on the performance of speech emotion recognition

    Authors: Bagus Tris Atmaja, Akira Sasou

    Abstract: Traditional speech emotion recognition (SER) evaluations have been performed merely on a speaker-independent condition; some of them even did not evaluate their result on this condition. This paper highlights the importance of splitting training and test data for SER by script, known as sentence-open or text-independent criteria. The results show that employing sentence-open criteria degraded the…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: Accepted at TENCON 2021

    Journal ref: TENCON 2021, pp 760-764

  3. arXiv:2210.05843  [pdf, other]

    eess.AS cs.SD

    Cross-dataset COVID-19 Transfer Learning with Cough Detection, Cough Segmentation, and Data Augmentation

    Authors: Bagus Tris Atmaja, Zanjabila, Suyanto, Akira Sasou

    Abstract: This paper addresses issues on cough-based COVID-19 detection. We propose a cross-dataset transfer learning approach to improve the performance of COVID-19 detection by incorporating cough detection, cough segmentation, and data augmentation. The first aimed at removing non-cough signals and cough signals with low probability. The second aimed at segregating several coughs in a waveform into indiv…

    Submitted 11 October, 2022; originally announced October 2022.

  4. Comparing Hysteresis Comparator and RMS Threshold Methods for Automatic Single Cough Segmentations

    Authors: Bagus Tris Atmaja, Zanjabila, Suyanto, Akira Sasou

    Abstract: Research on diagnosing diseases based on voice signals is currently increasing rapidly, including cough-related diseases. When training the cough sound signals into deep learning models, it is necessary to have a standard input by segmenting several cough signals into individual cough signals. Previous research has been developed to segment cough signals from non-cough signals. This research eval…

    Submitted 12 December, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: 3 figures, 3 tables, accepted in IJIT

  5. arXiv:2209.13146  [pdf, other]

    eess.AS

    Predicting Affective Vocal Bursts with Finetuned wav2vec 2.0

    Authors: Bagus Tris Atmaja, Akira Sasou

    Abstract: The studies of predicting affective states from human voices have relied heavily on speech. This study, indeed, explores the recognition of humans' affective state from their vocal burst, a short non-verbal vocalization. Borrowing the idea from the recent success of wav2vec 2.0, we evaluated finetuned wav2vec 2.0 models from different datasets to predict the affective state of the speaker from the…

    Submitted 25 October, 2022; v1 submitted 27 September, 2022; originally announced September 2022.

  6. arXiv:2207.10333  [pdf, other]

    eess.AS

    Jointly Predicting Emotion, Age, and Country Using Pre-Trained Acoustic Embedding

    Authors: Bagus Tris Atmaja, Zanjabila, Akira Sasou

    Abstract: In this paper, we demonstrated the benefit of using pre-trained model to extract acoustic embedding to jointly predict (multitask learning) three tasks: emotion, age, and native country. The pre-trained model was trained with wav2vec 2.0 large robust model on the speech emotion corpus. The emotion and age tasks were regression problems, while country prediction was a classification task. A single…

    Submitted 21 July, 2022; originally announced July 2022.

    Journal ref: International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW) 2022