Efficient Gesture Recognition for the Assistance of Visually Impaired People using Multi-Head Neural Networks

Alashhab, Samer; Gallego, Antonio Javier; Lozano, Miguel Ángel

doi:10.1016/j.engappai.2022.105188


arXiv:2205.06980 (cs)

[Submitted on 14 May 2022]




Abstract: This paper proposes an interactive system for mobile devices, controlled by hand gestures, aimed at helping people with visual impairments. The user interacts with the device by making simple static and dynamic hand gestures. Each gesture triggers a different action in the system, such as object recognition, scene description or image scaling (e.g., pointing a finger at an object will show a description of it). The system is based on a multi-head neural network architecture, which first detects and classifies the gesture and then, depending on the gesture detected, performs a second stage that carries out the corresponding action. This multi-head architecture optimizes the resources required to perform different tasks simultaneously, and reuses the information obtained from an initial backbone to perform different processes in the second stage. To train and evaluate the system, a dataset of about 40k images was manually compiled and labeled, including different types of hand gestures, backgrounds (indoors and outdoors), lighting conditions, etc. This dataset contains synthetic gestures (whose objective is to pre-train the system in order to improve the results) and real images captured using different mobile phones. The results obtained, and the comparison made with the state of the art, are competitive across the different actions performed by the system, such as the accuracy of classification and localization of gestures, or the generation of descriptions for objects and scenes.
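The two-stage, multi-head flow described in the abstract can be illustrated with a minimal control-flow sketch. This is not the authors' implementation: the functions below are plain-Python stand-ins for neural network components, and every name (`backbone`, `gesture_head`, `ACTION_HEADS`, `run`), the thresholds, and the gesture labels "frame" and "pinch" are assumptions made for illustration. Only the pairing of a pointing gesture with an object description comes from the abstract. The point it shows is that backbone features are computed once and reused by whichever second-stage head the detected gesture selects.

```python
from typing import Callable, Dict, List, Tuple

Features = List[float]


def backbone(image: List[float]) -> Features:
    """Stand-in for the shared feature extractor (hypothetical: scales the input)."""
    return [0.5 * x for x in image]


def gesture_head(features: Features) -> str:
    """Stand-in gesture classifier (hypothetical rule on the feature mean)."""
    mean = sum(features) / len(features)
    if mean < 0.2:
        return "point"
    if mean < 0.4:
        return "frame"  # hypothetical gesture label
    return "pinch"      # hypothetical gesture label


# Hypothetical second-stage heads; each one consumes the shared features.
def describe_object(features: Features) -> str:
    return f"object description from {len(features)} features"


def describe_scene(features: Features) -> str:
    return f"scene description from {len(features)} features"


def scale_image(features: Features) -> str:
    return f"scaled image from {len(features)} features"


# Assumed gesture-to-action mapping; only "point" -> object description
# is stated in the abstract.
ACTION_HEADS: Dict[str, Callable[[Features], str]] = {
    "point": describe_object,
    "frame": describe_scene,
    "pinch": scale_image,
}


def run(image: List[float]) -> Tuple[str, str]:
    features = backbone(image)        # stage 1: shared features, computed once
    gesture = gesture_head(features)  # stage 1: detect and classify the gesture
    action = ACTION_HEADS[gesture]    # stage 2: pick the head for that gesture
    return gesture, action(features)  # stage 2: reuse the same features
```

Calling `run` on an input executes only the head selected by the gesture, so the other action heads cost nothing for that frame, which is one plausible reading of how such an architecture saves resources.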

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Report number:	114
Cite as:	arXiv:2205.06980 [cs.CV]
	(or arXiv:2205.06980v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2205.06980
Journal reference:	Engineering Applications of Artificial Intelligence, 2022
Related DOI:	https://doi.org/10.1016/j.engappai.2022.105188

Submission history

From: Antonio Javier Gallego
[v1] Sat, 14 May 2022 06:01:47 UTC (9,851 KB)
