Ensemble Machine Learning Model Trained on a New Synthesized Dataset Generalizes Well for Stress Prediction Using Wearable Devices

Vos, Gideon; Trinh, Kelly; Sarnyai, Zoltan; Azghadi, Mostafa Rahimi

doi:10.1016/j.jbi.2023.104556

Computer Science > Machine Learning

arXiv:2209.15146 (cs)

[Submitted on 30 Sep 2022 (v1), last revised 3 Dec 2023 (this version, v2)]

Title:Ensemble Machine Learning Model Trained on a New Synthesized Dataset Generalizes Well for Stress Prediction Using Wearable Devices

Authors:Gideon Vos, Kelly Trinh, Zoltan Sarnyai, Mostafa Rahimi Azghadi

View PDF HTML (experimental)

Abstract:Introduction. We investigate the generalization ability of models built on datasets containing a small number of subjects, recorded in single study protocols. Next, we propose and evaluate methods combining these datasets into a single, large dataset. Finally, we propose and evaluate the use of ensemble techniques by combining gradient boosting with an artificial neural network to measure predictive power on new, unseen data.
Methods. Sensor biomarker data from six public datasets were utilized in this study. To test model generalization, we developed a gradient boosting model trained on one dataset (SWELL), and tested its predictive power on two datasets previously used in other studies (WESAD, NEURO). Next, we merged four small datasets, i.e. (SWELL, NEURO, WESAD, UBFC-Phys), to provide a combined total of 99 subjects,. In addition, we utilized random sampling combined with another dataset (EXAM) to build a larger training dataset consisting of 200 synthesized subjects,. Finally, we developed an ensemble model that combines our gradient boosting model with an artificial neural network, and tested it on two additional, unseen publicly available stress datasets (WESAD and Toadstool).
Results. Our method delivers a robust stress measurement system capable of achieving 85% predictive accuracy on new, unseen validation data, achieving a 25% performance improvement over single models trained on small datasets.
Conclusion. Models trained on small, single study protocol datasets do not generalize well for use on new, unseen data and lack statistical power. Ma-chine learning models trained on a dataset containing a larger number of varied study subjects capture physiological variance better, resulting in more robust stress detection.

Comments:	37 pages, 11 figures
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
MSC classes:	I.2.0
Cite as:	arXiv:2209.15146 [cs.LG]
	(or arXiv:2209.15146v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2209.15146
Related DOI:	https://doi.org/10.1016/j.jbi.2023.104556

Submission history

From: Gideon Vos [view email]
[v1] Fri, 30 Sep 2022 00:20:57 UTC (1,317 KB)
[v2] Sun, 3 Dec 2023 05:39:00 UTC (1,841 KB)

Computer Science > Machine Learning

Title:Ensemble Machine Learning Model Trained on a New Synthesized Dataset Generalizes Well for Stress Prediction Using Wearable Devices

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Machine Learning

Title:Ensemble Machine Learning Model Trained on a New Synthesized Dataset Generalizes Well for Stress Prediction Using Wearable Devices

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators