Sight Guide: A Wearable Assistive Perception and Navigation System for the Vision Assistance Race in the Cybathlon 2024

Authors: Patrick Pfreundschuh, Giovanni Cioffi, Cornelius von Einem, Alexander Wyss, Hans Wernher van de Venn, Cesar Cadena, Davide Scaramuzza, Roland Siegwart, Alireza Darvishy

Abstract: Visually impaired individuals face significant challenges navigating and interacting with unknown situations, particularly in tasks requiring spatial awareness and semantic scene understanding. To accelerate the development and evaluate the state of technologies that enable visually impaired people to solve these tasks, the Vision Assistance Race (VIS) at the Cybathlon 2024 competition was organized. In this work, we present Sight Guide, a wearable assistive system designed for the VIS. The system processes data from multiple RGB and depth cameras on an embedded computer that guides the user through complex, real-world-inspired tasks using vibration signals and audio commands. Our software architecture integrates classical robotics algorithms with learning-based approaches to enable capabilities such as obstacle avoidance, object detection, optical character recognition, and touchscreen interaction. In a testing environment, Sight Guide achieved a 95.7% task success rate, and further demonstrated its effectiveness during the Cybathlon competition. This work provides detailed insights into the system design, evaluation results, and lessons learned, and outlines directions towards a broader real-world applicability.

Submitted 3 June, 2025; originally announced June 2025.

arXiv:2503.22216 [pdf, other]

doi 10.1145/3706598.3713084

Towards More Accessible Scientific PDFs for People with Visual Impairments: Step-by-Step PDF Remediation to Improve Tag Accuracy

Authors: Felix M. Schmitt-Koopmann, Elaine M. Huang, Hans-Peter Hutter, Alireza Darvishy

Abstract: PDF inaccessibility is an ongoing challenge that hinders individuals with visual impairments from reading and navigating PDFs using screen readers. This paper presents a step-by-step process for both novice and experienced users to create accessible PDF documents, including an approach for creating alternative text for mathematical formulas without expert knowledge. In a study involving nineteen participants, we evaluated our prototype PAVE 2.0 by comparing it against Adobe Acrobat Pro, the existing standard for remediating PDFs. Our study shows that experienced users improved their tagging scores from 42.0% to 80.1%, and novice users from 39.2% to 75.2% with PAVE 2.0. Overall, fifteen participants stated that they would prefer to use PAVE 2.0 in the future, and all participants would recommend it for novice users. Our work demonstrates PAVE 2.0's potential for increasing PDF accessibility for people with visual impairments and highlights remaining challenges.

Submitted 28 March, 2025; originally announced March 2025.

Journal ref: CHI Conference on Human Factors in Computing Systems (CHI '25), April 26-May 1, 2025, Yokohama, Japan

arXiv:2411.14358 [pdf, other]

doi 10.3390/s24248164

InCrowd-VI: A Realistic Visual-Inertial Dataset for Evaluating SLAM in Indoor Pedestrian-Rich Spaces for Human Navigation

Authors: Marziyeh Bamdad, Hans-Peter Hutter, Alireza Darvishy

Abstract: Simultaneous localization and mapping (SLAM) techniques can be used to navigate the visually impaired, but the development of robust SLAM solutions for crowded spaces is limited by the lack of realistic datasets. To address this, we introduce InCrowd-VI, a novel visual-inertial dataset specifically designed for human navigation in indoor pedestrian-rich environments. Recorded using Meta Aria Project glasses, it captures realistic scenarios without environmental control. InCrowd-VI features 58 sequences totaling a 5 km trajectory length and 1.5 hours of recording time, including RGB, stereo images, and IMU measurements. The dataset captures important challenges such as pedestrian occlusions, varying crowd densities, complex layouts, and lighting changes. Ground-truth trajectories, accurate to approximately 2 cm, are provided in the dataset, originating from the Meta Aria project machine perception SLAM service. In addition, a semi-dense 3D point cloud of scenes is provided for each sequence. The evaluation of state-of-the-art visual odometry (VO) and SLAM algorithms on InCrowd-VI revealed severe performance limitations in these realistic scenarios. Under challenging conditions, systems exceeded the required localization accuracy of 0.5 meters and the 1% drift threshold, with classical methods showing drift up to 5-10%. While deep learning-based approaches maintained high pose estimation coverage (>90%), they failed to achieve real-time processing speeds necessary for walking pace navigation. These results demonstrate the need and value of a new dataset to advance SLAM research for visually impaired navigation in complex indoor environments. The dataset and associated tools are publicly available at https://incrowd-vi.cloudlab.zhaw.ch/.

Submitted 17 December, 2024; v1 submitted 21 November, 2024; originally announced November 2024.

Comments: 24 pages, 8 figures, 6 tables

arXiv:2404.13667 [pdf, other]

doi 10.1109/ACCESS.2024.3404834

MathNet: A Data-Centric Approach for Printed Mathematical Expression Recognition

Authors: Felix M. Schmitt-Koopmann, Elaine M. Huang, Hans-Peter Hutter, Thilo Stadelmann, Alireza Darvishy

Abstract: Printed mathematical expression recognition (MER) models are usually trained and tested using LaTeX-generated mathematical expressions (MEs) as input and the LaTeX source code as ground truth. As the same ME can be generated by various different LaTeX source codes, this leads to unwanted variations in the ground truth data that bias test performance results and hinder efficient learning. In addition, the use of only one font to generate the MEs heavily limits the generalization of the reported results to realistic scenarios. We propose a data-centric approach to overcome this problem, and present convincing experimental results: Our main contribution is an enhanced LaTeX normalization to map any LaTeX ME to a canonical form. Based on this process, we developed an improved version of the benchmark dataset im2latex-100k, featuring 30 fonts instead of one. Second, we introduce the real-world dataset realFormula, with MEs extracted from papers. Third, we developed a MER model, MathNet, based on a convolutional vision transformer, with superior results on all four test sets (im2latex-100k, im2latexv2, realFormula, and InftyMDB-1), outperforming the previous state of the art by up to 88.3%.

Submitted 21 April, 2024; originally announced April 2024.

Comments: 12 pages, 6 figures

Journal ref: IEEE Access 12 (2024) 76963-76974

arXiv:2305.14041

The state of scientific PDF accessibility in repositories: A survey in Switzerland

Authors: Alireza Darvishy, Rolf Sethe, Ines Engler, Oriane Pierres, Juliet Manning

Abstract: This survey analyzed the quality of the PDF documents on online repositories in Switzerland, examining their accessibility for people with visual impairments. Two minimal accessibility features were analyzed: the PDFs had to have tags and a hierarchical heading structure. The survey also included interviews with the managers or heads of multiple Swiss universities' repositories to assess the general opinion and knowledge of PDF accessibility. An analysis of interviewee responses indicates an overall lack of awareness of PDF accessibility, and showed that online repositories currently have no concrete plans to address the issue. This paper concludes by presenting a set of recommendations for online repositories to improve the accessibility of their PDF documents.

Submitted 14 June, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: We need to modify this paper and make some extensions before re-uploading

arXiv:2301.02546 [pdf]

A new conversational interaction concept for document creation and editing on mobile devices for visually impaired users

Authors: Alireza Darvishy, Hans-Peter Hutter, Edin Beljulji, Zeno Heeb

Abstract: This paper describes the ongoing development of a conversational interaction concept that allows visually impaired users to easily create and edit text documents on mobile devices using mainly voice input. In order to verify the concept, a prototype app was developed and tested for both iOS and Android systems, based on the natural-language understanding (NLU) platform Google Dialogflow. The app and interaction concept were repeatedly tested by users with and without visual impairments. Based on their feedback, the concept was continuously refined, adapted and improved on both mobile platforms. In an iterative user-centred design approach, the following research question was investigated: Can a visually impaired user rely mainly on speech commands to efficiently create and edit a document on mobile devices? User testing found that an interaction concept based on conversational speech commands was easy and intuitive for visually impaired users. However, it was also found that relying on speech commands alone created its own obstacles, and that a combination of gestures and voice interaction would be more robust. Future research and more extensive usability tests should be carried out among visually impaired users in order to optimize the interaction concept.

Submitted 6 January, 2023; originally announced January 2023.

arXiv:2212.04745 [pdf, other]

doi 10.1109/ACCESS.2024.3454571

SLAM for Visually Impaired People: a Survey

Authors: Marziyeh Bamdad, Davide Scaramuzza, Alireza Darvishy

Abstract: In recent decades, several assistive technologies have been developed to improve the ability of blind and visually impaired (BVI) individuals to navigate independently and safely. At the same time, simultaneous localization and mapping (SLAM) techniques have become sufficiently robust and efficient to be adopted in developing these assistive technologies. We present the first systematic literature review of 54 recent studies on SLAM-based solutions for blind and visually impaired people, focusing on literature published from 2017 onward. This review explores various localization and mapping techniques employed in this context. We systematically identified and categorized diverse SLAM approaches and analyzed their localization and mapping techniques, sensor types, computing resources, and machine-learning methods. We discuss the advantages and limitations of these techniques for blind and visually impaired navigation. Moreover, we examine the major challenges described across studies, including practical challenges and considerations that affect usability and adoption. Our analysis also evaluates the effectiveness of these SLAM-based solutions in real-world scenarios and user satisfaction, providing insights into their practical impact on BVI mobility. The insights derived from this review identify critical gaps and opportunities for future research activities, particularly in addressing the challenges presented by dynamic and complex environments. We explain how SLAM technology offers the potential to improve the ability of visually impaired individuals to navigate effectively. Finally, we present future opportunities and challenges in this domain.

Submitted 16 August, 2024; v1 submitted 9 December, 2022; originally announced December 2022.

Comments: 47 pages, 42 tables, 6 figures

Showing 1–7 of 7 results for author: Darvishy, A