ViewpointDepth: A New Dataset for Monocular Depth Estimation Under Viewpoint Shifts
Authors:
Aurel Pjetri,
Stefano Caprasecca,
Leonardo Taccari,
Matteo Simoncini,
Henrique PiƱeiro Monteagudo,
Wallace Walter,
Douglas Coimbra de Andrade,
Francesco Sambo,
Andrew David Bagdanov
Abstract:
Monocular depth estimation is a critical task for autonomous driving and many other computer vision applications. While significant progress has been made in this field, the effects of viewpoint shifts on depth estimation models remain largely underexplored. This paper introduces a novel dataset and evaluation methodology to quantify the impact of different camera positions and orientations on mon…
▽ More
Monocular depth estimation is a critical task for autonomous driving and many other computer vision applications. While significant progress has been made in this field, the effects of viewpoint shifts on depth estimation models remain largely underexplored. This paper introduces a novel dataset and evaluation methodology to quantify the impact of different camera positions and orientations on monocular depth estimation performance. We propose a ground truth strategy based on homography estimation and object detection, eliminating the need for expensive LIDAR sensors. We collect a diverse dataset of road scenes from multiple viewpoints and use it to assess the robustness of a modern depth estimation model to geometric shifts. After assessing the validity of our strategy on a public dataset, we provide valuable insights into the limitations of current models and highlight the importance of considering viewpoint variations in real-world applications.
△ Less
Submitted 3 February, 2025; v1 submitted 26 September, 2024;
originally announced September 2024.
A neural attention model for speech command recognition
Authors:
Douglas Coimbra de Andrade,
Sabato Leo,
Martin Loesener Da Silva Viana,
Christoph Bernkopf
Abstract:
This paper introduces a convolutional recurrent network with attention for speech command recognition. Attention models are powerful tools to improve performance on natural language, image captioning and speech tasks. The proposed model establishes a new state-of-the-art accuracy of 94.1% on Google Speech Commands dataset V1 and 94.5% on V2 (for the 20-commands recognition task), while still keepi…
▽ More
This paper introduces a convolutional recurrent network with attention for speech command recognition. Attention models are powerful tools to improve performance on natural language, image captioning and speech tasks. The proposed model establishes a new state-of-the-art accuracy of 94.1% on Google Speech Commands dataset V1 and 94.5% on V2 (for the 20-commands recognition task), while still keeping a small footprint of only 202K trainable parameters. Results are compared with previous convolutional implementations on 5 different tasks (20 commands recognition (V1 and V2), 12 commands recognition (V1), 35 word recognition (V1) and left-right (V1)). We show detailed performance results and demonstrate that the proposed attention mechanism not only improves performance but also allows inspecting what regions of the audio were taken into consideration by the network when outputting a given category.
△ Less
Submitted 27 August, 2018;
originally announced August 2018.