-
MonoPIC -- A Monocular Low-Latency Pedestrian Intention Classification Framework for IoT Edges Using ID3 Modelled Decision Trees
Authors:
Sriram Radhakrishna,
Adithya Balasubramanyam
Abstract:
Road accidents involving autonomous vehicles commonly occur in situations where a (pedestrian) obstacle presents itself in the path of the moving vehicle at very sudden time intervals, leaving the robot even lesser time to react to the change in scene. In order to tackle this issue, we propose a novel algorithmic implementation that classifies the intent of a single arbitrarily chosen pedestrian i…
▽ More
Road accidents involving autonomous vehicles commonly occur in situations where a (pedestrian) obstacle presents itself in the path of the moving vehicle at very sudden time intervals, leaving the robot even lesser time to react to the change in scene. In order to tackle this issue, we propose a novel algorithmic implementation that classifies the intent of a single arbitrarily chosen pedestrian in a two dimensional frame into logic states in a procedural manner using quaternions generated from a MediaPipe pose estimation model. This bypasses the need to employ any relatively high latency deep-learning algorithms primarily due to the lack of necessity for depth perception as well as an implicit cap on the computational resources that most IoT edge devices present. The model was able to achieve an average testing accuracy of 83.56% with a reliable variance of 0.0042 while operating with an average latency of 48 milliseconds, demonstrating multiple notable advantages over the current standard of using spatio-temporal convolutional networks for these perceptive tasks.
△ Less
Submitted 3 February, 2024; v1 submitted 31 March, 2023;
originally announced April 2023.
-
Economical Quaternion Extraction from a Human Skeletal Pose Estimate using 2-D Cameras
Authors:
Sriram Radhakrishna,
Adithya Balasubramanyam
Abstract:
In this paper, we present a novel algorithm to extract a quaternion from a two dimensional camera frame for estimating a contained human skeletal pose. The problem of pose estimation is usually tackled through the usage of stereo cameras and intertial measurement units for obtaining depth and euclidean distance for measurement of points in 3D space. However, the usage of these devices comes with a…
▽ More
In this paper, we present a novel algorithm to extract a quaternion from a two dimensional camera frame for estimating a contained human skeletal pose. The problem of pose estimation is usually tackled through the usage of stereo cameras and intertial measurement units for obtaining depth and euclidean distance for measurement of points in 3D space. However, the usage of these devices comes with a high signal processing latency as well as a significant monetary cost. By making use of MediaPipe, a framework for building perception pipelines for human pose estimation, the proposed algorithm extracts a quaternion from a 2-D frame capturing an image of a human object at a sub-fifty millisecond latency while also being capable of deployment at edges with a single camera frame and a generally low computational resource availability, especially for use cases involving last-minute detection and reaction by autonomous robots. The algorithm seeks to bypass the funding barrier and improve accessibility for robotics researchers involved in designing control systems.
△ Less
Submitted 14 September, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.
-
No Reference Stereoscopic Video Quality Assessment Using Joint Motion and Depth Statistics
Authors:
Appina Balasubramanyam,
Jalli Akshith,
Battula Shanmukh Srinivas,
Channappayya S Sumohana
Abstract:
We present a no reference (NR) quality assessment algorithm for assessing the perceptual quality of natural stereoscopic 3D (S3D) videos. This work is inspired by our finding that the joint statistics of the subband coefficients of motion (optical flow or motion vector magnitude) and depth (disparity map) of natural S3D videos possess a unique signature. Specifically, we empirically show that the…
▽ More
We present a no reference (NR) quality assessment algorithm for assessing the perceptual quality of natural stereoscopic 3D (S3D) videos. This work is inspired by our finding that the joint statistics of the subband coefficients of motion (optical flow or motion vector magnitude) and depth (disparity map) of natural S3D videos possess a unique signature. Specifically, we empirically show that the joint statistics of the motion and depth subband coefficients of S3D video frames can be modeled accurately using a Bivariate Generalized Gaussian Distribution (BGGD). We then demonstrate that the parameters of the BGGD model possess the ability to discern quality variations in S3D videos. Therefore, the BGGD model parameters are employed as motion and depth quality features. In addition to these features, we rely on a frame level spatial quality feature that is computed using a robust off the shelf NR image quality assessment (IQA) algorithm. These frame level motion, depth and spatial features are consolidated and used with the corresponding S3D video's difference mean opinion score (DMOS) labels for supervised learning using support vector regression (SVR). The overall quality of an S3D video is computed by averaging the frame level quality predictions of the constituent video frames. The proposed algorithm, dubbed Video QUality Evaluation using MOtion and DEpth Statistics (VQUEMODES) is shown to outperform the state of the art methods when evaluated over the IRCCYN and LFOVIA S3D subjective quality assessment databases.
△ Less
Submitted 15 November, 2017;
originally announced November 2017.