-
ForcePose: A Deep Learning Approach for Force Calculation Based on Action Recognition Using MediaPipe Pose Estimation Combined with Object Detection
Authors:
Nandakishor M,
Vrinda Govind V,
Anuradha Puthalath,
Anzy L,
Swathi P S,
Aswathi R,
Devaprabha A R,
Varsha Raj,
Midhuna Krishnan K,
Akhila Anilkumar T V,
Yamuna P V
Abstract:
Force estimation in human-object interactions is crucial for various fields like ergonomics, physical therapy, and sports science. Traditional methods depend on specialized equipment such as force plates and sensors, which makes accurate assessments both expensive and restricted to laboratory settings. In this paper, we introduce ForcePose, a novel deep learning framework that estimates applied forces by combining human pose estimation with object detection. Our approach leverages MediaPipe for skeletal tracking and SSD MobileNet for object recognition to create a unified representation of human-object interaction. We've developed a specialized neural network that processes both spatial and temporal features to predict force magnitude and direction without needing any physical sensors. After training on our dataset of 850 annotated videos with corresponding force measurements, our model achieves a mean absolute error of 5.83 N in force magnitude and 7.4 degrees in force direction. When compared to existing computer vision approaches, our method performs 27.5% better while still offering real-time performance on standard computing hardware. ForcePose opens up new possibilities for force analysis in diverse real-world scenarios where traditional measurement tools are impractical or intrusive. This paper discusses our methodology, the dataset creation process, evaluation metrics, and potential applications across rehabilitation, ergonomics assessment, and athletic performance analysis.
Submitted 28 March, 2025;
originally announced March 2025.
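The two error figures quoted in the ForcePose abstract above (mean absolute error in force magnitude and in force direction) can be illustrated with a minimal sketch. The abstract does not say how the errors are computed, so everything here is an assumption: forces are treated as 2D vectors in newtons, direction is the planar angle of the vector, and the function name `force_errors` is hypothetical.

```python
import math

def force_errors(predicted, measured):
    """Return (magnitude MAE in N, mean absolute angular error in degrees).

    Both arguments are equal-length sequences of 2D force vectors (fx, fy).
    This is an illustrative metric, not the paper's evaluation code.
    """
    mag_errors = []
    angle_errors = []
    for (px, py), (mx, my) in zip(predicted, measured):
        # Magnitude error: difference of vector lengths.
        mag_errors.append(abs(math.hypot(px, py) - math.hypot(mx, my)))
        # Direction error: angle difference wrapped into [-180, 180).
        diff = math.degrees(math.atan2(py, px) - math.atan2(my, mx))
        diff = (diff + 180.0) % 360.0 - 180.0
        angle_errors.append(abs(diff))
    n = len(mag_errors)
    return sum(mag_errors) / n, sum(angle_errors) / n
```

Wrapping the angle difference keeps a prediction at 179 degrees and a measurement at -179 degrees 2 degrees apart rather than 358.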
-
ML DevOps Adoption in Practice: A Mixed-Method Study of Implementation Patterns and Organizational Benefits
Authors:
Dileepkumar S R,
Juby Mathew
Abstract:
Machine Learning (ML) DevOps, also known as MLOps, has emerged as a critical framework for efficiently operationalizing ML models in various industries. This study investigates the adoption trends, implementation efforts, and benefits of ML DevOps through a combination of literature review and empirical analysis. By surveying 150 professionals across industries and conducting in-depth interviews with 20 practitioners, the study provides insights into the growing adoption of ML DevOps, particularly in sectors like finance and healthcare. The research identifies key challenges, such as fragmented tooling, data management complexities, and skill gaps, which hinder widespread adoption. However, the findings highlight significant benefits, including improved deployment frequency, reduced error rates, enhanced collaboration between data science and DevOps teams, and lower operational costs. Organizations leveraging ML DevOps report accelerated model deployment, increased scalability, and better compliance with industry regulations. The study also explores the technical and cultural efforts required for successful implementation, such as investments in automation tools, real-time monitoring, and upskilling initiatives. The results indicate that while challenges remain, ML DevOps presents a viable path to optimizing ML lifecycle management, ensuring model reliability, and enhancing business value. Future research should focus on standardizing ML DevOps practices, assessing the return on investment across industries, and developing frameworks for seamless integration with traditional DevOps methodologies.
Submitted 8 February, 2025;
originally announced February 2025.
-
SAG-ViT: A Scale-Aware, High-Fidelity Patching Approach with Graph Attention for Vision Transformers
Authors:
Shravan Venkatraman,
Jaskaran Singh Walia,
Joe Dhanith P R
Abstract:
Vision Transformers (ViTs) have redefined image classification by leveraging self-attention to capture complex patterns and long-range dependencies between image patches. However, a key challenge for ViTs is efficiently incorporating multi-scale feature representations, which are inherent in convolutional neural networks (CNNs) through their hierarchical structure. Graph transformers have made strides in addressing this by leveraging graph-based modeling, but they often lose or insufficiently represent spatial hierarchies, especially since redundant or less relevant areas dilute the image's contextual representation. To bridge this gap, we propose SAG-ViT, a Scale-Aware Graph Attention ViT that integrates the multi-scale feature capabilities of CNNs, the representational power of ViTs, and graph-attended patching to enable richer contextual representation. Using EfficientNetV2 as a backbone, the model extracts multi-scale feature maps, dividing them into patches to preserve richer semantic information compared to directly patching the input images. The patches are structured into a graph using spatial and feature similarities, where a Graph Attention Network (GAT) refines the node embeddings. This refined graph representation is then processed by a Transformer encoder, capturing long-range dependencies and complex interactions. We evaluate SAG-ViT on benchmark datasets across various domains, validating its effectiveness in advancing image classification tasks. Our code and weights are available at https://github.com/shravan-18/SAG-ViT.
Submitted 7 January, 2025; v1 submitted 14 November, 2024;
originally announced November 2024.
-
Enhancing Diabetic Retinopathy Detection with CNN-Based Models: A Comparative Study of UNET and Stacked UNET Architectures
Authors:
Ameya Uppina,
S Navaneetha Krishnan,
Talluri Krishna Sai Teja,
Nikhil N Iyer,
Joe Dhanith P R
Abstract:
Diabetic Retinopathy (DR) is a severe complication of diabetes. Damaged or abnormal blood vessels can cause loss of vision. The need for mass screening of a large population of diabetic patients has generated interest in computer-aided, fully automatic diagnosis of DR. Deep learning frameworks, particularly convolutional neural networks (CNNs), have shown great promise in detecting DR by analyzing retinal images. However, several challenges arise in applying deep learning in this domain. High-quality, annotated datasets are scarce, and variations in image quality and class imbalances pose significant hurdles in developing a dependable model. In this paper, we demonstrate the proficiency of two CNN-based models, UNET and Stacked UNET, using the APTOS (Asia Pacific Tele-Ophthalmology Society) dataset. The system achieves an accuracy of 92.81% for the UNET and 93.32% for the Stacked UNET architecture. The architecture classifies the images into five categories ranging from 0 to 4, where 0 is no DR and 4 is proliferative DR.
Submitted 20 January, 2025; v1 submitted 2 November, 2024;
originally announced November 2024.
-
Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention
Authors:
Joe Dhanith P R,
Shravan Venkatraman,
Vigya Sharma,
Santhosh Malarvannan,
Modigari Narendra
Abstract:
Understanding emotions is a fundamental aspect of human communication. Integrating audio and video signals offers a more comprehensive understanding of emotional states compared to traditional methods that rely on a single data source, such as speech or facial expressions. Despite its potential, multimodal emotion recognition faces significant challenges, particularly in synchronization, feature extraction, and fusion of diverse data sources. To address these issues, this paper introduces a novel transformer-based model named Audio-Video Transformer Fusion with Cross Attention (AVT-CA). The AVT-CA model employs a transformer fusion approach to effectively capture and synchronize interlinked features from both audio and video inputs, thereby resolving synchronization problems. Additionally, the Cross Attention mechanism within AVT-CA selectively extracts and emphasizes critical features while discarding irrelevant ones from both modalities, addressing feature extraction and fusion challenges. Extensive experimental analysis conducted on the CMU-MOSEI, RAVDESS and CREMA-D datasets demonstrates the efficacy of the proposed model. The results underscore the importance of AVT-CA in developing precise and reliable multimodal emotion recognition systems for practical applications.
Submitted 19 February, 2025; v1 submitted 26 July, 2024;
originally announced July 2024.
-
Spine Vision X-Ray Image based GUI Planning of Pedicle Screws Using Enhanced YOLOv5 for Vertebrae Segmentation
Authors:
Yashwanth Rao,
Gaurisankar S,
Durga R,
Aparna Purayath,
Vivek Maik,
Manojkumar Lakshmanan,
Mohanasankar Sivaprakasm
Abstract:
In this paper, we propose an innovative Graphical User Interface (GUI) aimed at improving preoperative planning and intra-operative guidance for precise spinal screw placement through vertebrae segmentation. The methodology encompasses both front-end and back-end computations. The front end comprises a GUI that allows surgeons to precisely adjust the placement of screws on X-Ray images, thereby improving the simulation of surgical screw insertion in the patient's spine. On the other hand, the back-end processing involves several steps, including acquiring spinal X-ray images, performing pre-processing techniques to reduce noise, and training a neural network model to achieve real-time segmentation of the vertebrae. The integration of vertebral segmentation in the GUI ensures precise screw placement, reducing complications like nerve injury and ultimately improving surgical outcomes. The Spine-Vision provides a comprehensive solution with innovative features like synchronous AP-LP planning, accurate screw positioning via vertebrae segmentation, effective screw visualization, and dynamic position adjustments. This X-ray image-based GUI workflow emerges as a valuable tool, enhancing precision and safety in spinal screw placement and planning procedures.
Submitted 11 July, 2024;
originally announced July 2024.
-
GUI-based Pedicle Screw Planning on Fluoroscopic Images Utilizing Vertebral Segmentation
Authors:
Vivek Maik,
Aparna Purayath,
Durga R,
Manojkumar Lakshmanan,
Mohanasankar Sivaprakasm
Abstract:
The proposed work establishes a novel Graphical User Interface (GUI) framework, primarily designed for intraoperative pedicle screw planning. The current planning workflow in Image Guided Surgeries primarily relies on pre-operative CT planning. Intraoperative CT planning can be time-consuming and expensive and thus is not a common practice. In situations where efficiency and cost-effectiveness are paramount, planning with the fluoroscopic images already acquired for image registration emerges as the optimal choice. The methodology proposed in this study employs a simulated 3D pedicle screw to calculate its coronal and sagittal projections for pedicle screw planning using anterior-posterior (AP) and lateral (LP) images. The initialization and placement of the pedicle screw are computed by utilizing the bounding box of vertebral segmentation, which is obtained by the application of enhanced YOLOv5. The GUI front end includes functionality that allows surgeons or medical practitioners to efficiently choose, set up, and dynamically maneuver the pedicle screw on AP and LP images. This is based on a novel feature called synchronous planning, which involves correlating pedicle screws from the coronal and sagittal planes. This correlation utilizes projective correspondence to ensure that any movement of the pedicle screw in either the AP or LP image will be reflected in the other image. The proposed GUI framework is a time-efficient and cost-effective tool for synchronizing and planning the movement of pedicle screws during intraoperative surgical procedures.
Submitted 11 July, 2024;
originally announced July 2024.
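The synchronous planning idea in the abstract above (one simulated 3D screw, two linked 2D views) can be sketched under a strong simplification. The abstract speaks of projective correspondence between fluoroscopic views; the code below instead assumes ideal orthographic projections and a hypothetical axis convention (x left-right, y anterior-posterior, z superior-inferior). The names `Screw3D`, `ap_projection`, `lateral_projection`, and `move_in_ap` are illustrative, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]

@dataclass(frozen=True)
class Screw3D:
    """A screw modelled as a 3D line segment from entry point to tip."""
    entry: Point3D
    tip: Point3D

def ap_projection(screw: Screw3D) -> List[Point2D]:
    # Coronal (AP) view: the anterior-posterior axis y is collapsed.
    return [(x, z) for (x, y, z) in (screw.entry, screw.tip)]

def lateral_projection(screw: Screw3D) -> List[Point2D]:
    # Sagittal (lateral) view: the left-right axis x is collapsed.
    return [(y, z) for (x, y, z) in (screw.entry, screw.tip)]

def move_in_ap(screw: Screw3D, dx: float, dz: float) -> Screw3D:
    """Drag the screw in the AP image; the shared 3D model is updated."""
    def shift(p: Point3D) -> Point3D:
        return (p[0] + dx, p[1], p[2] + dz)
    return Screw3D(shift(screw.entry), shift(screw.tip))
```

Because both views are derived from the same 3D segment, a drag in the AP image that changes z shows up in the lateral image as well, while a change in x alone does not. A real fluoroscopic setup would replace the two projection functions with calibrated perspective projections.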
-
Navigating Tabular Data Synthesis Research: Understanding User Needs and Tool Capabilities
Authors:
Maria F. Davila R.,
Sven Groen,
Fabian Panse,
Wolfram Wingerath
Abstract:
In an era of rapidly advancing data-driven applications, there is a growing demand for data in both research and practice. Synthetic data have emerged as an alternative when no real data are available (e.g., due to privacy regulations). Synthesizing tabular data presents unique and complex challenges, especially handling (i) missing values, (ii) dataset imbalance, (iii) diverse column types, and (iv) complex data distributions, as well as preserving (i) column correlations, (ii) temporal dependencies, and (iii) integrity constraints (e.g., functional dependencies) present in the original dataset. While substantial progress has been made recently in the context of generative models, there is no one-size-fits-all solution for tabular data today, and choosing the right tool for a given task is therefore not trivial. In this paper, we survey the state of the art in Tabular Data Synthesis (TDS), examine the needs of users by defining a set of functional and non-functional requirements, and compile the challenges associated with meeting those needs. In addition, we evaluate the reported performance of 36 popular research TDS tools against these requirements and develop a decision guide to help users find suitable TDS tools for their applications. The resulting decision guide also identifies significant research gaps.
Submitted 31 May, 2024;
originally announced May 2024.
-
Revolutionizing Underwater Exploration of Autonomous Underwater Vehicles (AUVs) and Seabed Image Processing Techniques
Authors:
Rajesh Sharma R,
Akey Sungheetha,
Dr Chinnaiyan R
Abstract:
The Earth's oceans are one of the last frontiers in the world, with only a fraction of their depths having been explored. Advancements in technology have led to the development of Autonomous Underwater Vehicles (AUVs) that can operate independently and perform complex tasks underwater. These vehicles have revolutionized underwater exploration, allowing us to study and understand our oceans like never before. In addition to AUVs, image processing techniques have also been developed that can help us to better understand the seabed and its features. In this comprehensive survey, we will explore the latest advancements in AUV technology and seabed image processing techniques. We'll discuss how these advancements are changing the way we explore and understand our oceans, and their potential impact on the future of marine science. Join us on this journey to discover the exciting world of underwater exploration and the technologies that are driving it forward.
Submitted 22 November, 2023;
originally announced February 2024.
-
Detection of Colluded Black-hole and Grey-hole attacks in Cloud Computing
Authors:
Divyasree I R,
Selvamani K,
Riasudheen H
Abstract:
The availability of high-capacity networks, massive storage, hardware virtualization, utility computing, and service-oriented architecture has made cloud computing highly accessible. The extensive usage of cloud resources raises numerous security concerns. Black-hole and Grey-hole attacks are notable attacks on cloud networks because they are easy to launch but difficult to detect. This research work proposes an efficient integrated detection method for individual and collusion attacks in cloud computing. In the individual attack detection phase, the forwarding ratio metric is used to differentiate malicious nodes from normal nodes. In the collusion attack detection phase, the malicious nodes manipulate encounter records to escape detection. To overcome this, fake encounters are examined along with appearance frequency and the number of messages to expose abnormal patterns. The simulation results show that the proposed system detects these attacks with better accuracy.
Submitted 7 September, 2020;
originally announced September 2020.
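The individual-attack phase described in the abstract above relies on a forwarding ratio metric. The abstract does not define it, so the sketch below assumes a common definition (messages forwarded divided by messages received for relay) and a hypothetical threshold of 0.5; the function names are illustrative only.

```python
def forwarding_ratio(received: int, forwarded: int) -> float:
    """Fraction of relayed traffic a node actually passed on."""
    if received == 0:
        # Assumption: a node that was never asked to relay is not suspicious.
        return 1.0
    return forwarded / received

def flag_suspicious(nodes: dict, threshold: float = 0.5) -> list:
    """Return the sorted names of nodes whose ratio is below the threshold.

    `nodes` maps a node name to a (received, forwarded) pair of counts.
    A black-hole node drops everything (ratio near 0); a grey-hole node
    drops selectively (ratio somewhere below a normal node's).
    """
    return sorted(
        name
        for name, (received, forwarded) in nodes.items()
        if forwarding_ratio(received, forwarded) < threshold
    )
```

For example, `flag_suspicious({"a": (100, 95), "b": (100, 0), "c": (100, 40)})` flags `b` and `c`. Colluding nodes that falsify encounter records would not be caught by this check alone, which is why the abstract describes a separate collusion phase.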
-
Asynchronous Wi-Fi Control Interface (AWCI) Using Socket IO Technology
Authors:
Devipriya T K,
Jovita Franci A,
Deepa R,
Godwin Sam Josh
Abstract:
The Internet of Things (IoT) is a system of interrelated computing devices connected to the Internet, each provided with a unique identifier and able to transfer data over a network without requiring human-to-human or human-to-computer interaction. The Raspberry Pi 3, a popular, cheap, small, and powerful computer with built-in Wi-Fi, can be used to make any device smart by connecting it to that device, embedding the required software on the Raspberry Pi 3, and connecting it to the Internet. It is difficult to install a full Linux OS inside a small device like a light switch, so for connecting such devices to Wi-Fi a model known as the Asynchronous Wi-Fi Control Interface (AWCI) was proposed, which is simple Wi-Fi connectivity software for a Debian-compatible Linux OS. The objective of this paper is to build an interactive user interface for Wi-Fi connection on the Raspberry Pi touch display by providing live updates using Socket IO technology. Socket IO enables real-time bidirectional communication between client and server. AWCI is compatible with every platform, browser, or device.
Submitted 6 October, 2018;
originally announced October 2018.
-
Online Reweighted Least Squares Algorithm for Sparse Recovery and Application to Short-Wave Infrared Imaging
Authors:
Subhadip Mukherjee,
Deepak R.,
Huaijin Chen,
Ashok Veeraraghavan,
Chandra Sekhar Seelamantula
Abstract:
We address the problem of sparse recovery in an online setting, where random linear measurements of a sparse signal are revealed sequentially and the objective is to recover the underlying signal. We propose a reweighted least squares (RLS) algorithm to solve the problem of online sparse reconstruction, wherein a system of linear equations is solved using conjugate gradient with the arrival of every new measurement. The proposed online algorithm is useful in a setting where one seeks to design a progressive decoding strategy to reconstruct a sparse signal from linear measurements so that one does not have to wait until all measurements are acquired. Moreover, the proposed algorithm is also useful in applications where it is infeasible to process all the measurements using a batch algorithm, owing to computational and storage constraints. One need not collect a fixed number of measurements a priori; rather, one can keep collecting measurements and stop once the reconstruction is sufficiently accurate. We provide a proof-of-concept by comparing the performance of our algorithm with the RLS-based batch reconstruction strategy, known as iteratively reweighted least squares (IRLS), on natural images. Experiments on a recently proposed focal plane array-based imaging setup show up to 1 dB improvement in output peak signal-to-noise ratio as compared with the total variation-based reconstruction.
Submitted 29 June, 2017;
originally announced June 2017.
-
DWT Based Fingerprint Recognition using Non Minutiae Features
Authors:
Shashi Kumar D. R.,
K. B. Raja,
R. K. Chhootaray,
Sabyasachi Pattanaik
Abstract:
Forensic applications like criminal investigations, terrorist identification and national security issues require a strong fingerprint database and an efficient identification system. In this paper we propose the DWT based Fingerprint Recognition using Non Minutiae (DWTFR) algorithm. The fingerprint image is decomposed into the multi-resolution sub-bands LL, LH, HL and HH by applying a 3-level DWT. The dominant local orientation angle θ and coherence are computed on the LL band only. The Centre Area Features and Edge Parameters are determined at each DWT level by considering all four sub-bands. The test fingerprint is matched against database fingerprints based on the Euclidean distance of all the features. It is observed that the values of FAR, FRR and TSR are improved compared to the existing algorithm.
Submitted 17 June, 2011;
originally announced June 2011.
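The decomposition and matching steps named in the abstract above (a 3-level DWT into LL, LH, HL and HH sub-bands, then Euclidean distance between feature vectors) can be sketched in plain Python. The abstract does not name the wavelet, so a Haar wavelet is assumed here; the sub-band naming convention varies between libraries; and the paper's actual features (orientation angle, coherence, Centre Area Features, Edge Parameters) are not reproduced. All function names are hypothetical.

```python
import math

def haar_dwt2(image):
    """One level of a 2D orthonormal Haar DWT.

    `image` is a list of equal-length rows with even height and width.
    Returns (LL, LH, HL, HH), each half the size in both dimensions.
    """
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, len(image), 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            ll_row.append((a + b + d + e) / 2.0)  # local average
            lh_row.append((a + b - d - e) / 2.0)  # top-vs-bottom difference
            hl_row.append((a - b + d - e) / 2.0)  # left-vs-right difference
            hh_row.append((a - b - d + e) / 2.0)  # diagonal difference
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh

def dwt_levels(image, levels=3):
    """Apply the transform repeatedly to the LL band, keeping every level."""
    bands = []
    current = image
    for _ in range(levels):
        ll, lh, hl, hh = haar_dwt2(current)
        bands.append((ll, lh, hl, hh))
        current = ll
    return bands

def euclidean_distance(u, v):
    """Distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))
```

In a matcher along these lines, a test fingerprint would be assigned to the database entry with the smallest `euclidean_distance`, subject to an acceptance threshold that trades FAR against FRR.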