-
Speechless: Speech Instruction Training Without Speech for Low Resource Languages
Authors:
Alan Dao,
Dinh Bach Vu,
Huy Hoang Ha,
Tuan Le Duc Anh,
Shreyas Gopal,
Yue Heng Yeo,
Warren Keng Hoong Low,
Eng Siong Chng,
Jia Qi Yip
Abstract:
The rapid growth of voice assistants powered by large language models (LLMs) has highlighted a need for speech instruction data to train these systems. Despite the abundance of speech recognition data, there is a notable scarcity of speech instruction data, which is essential for fine-tuning models to understand and execute spoken commands. Generating high-quality synthetic speech requires a good text-to-speech (TTS) model, which may not be available for low-resource languages. Our novel approach addresses this challenge by halting synthesis at the semantic representation level, bypassing the need for TTS. We achieve this by aligning synthetic semantic representations with the pre-trained Whisper encoder, enabling an LLM to be fine-tuned on text instructions while maintaining the ability to understand spoken instructions during inference. This simplified training process is a promising approach to building voice assistants for low-resource languages.
Submitted 22 May, 2025;
originally announced May 2025.
-
Multi class activity classification in videos using Motion History Image generation
Authors:
Senthilkumar Gopal
Abstract:
Human action recognition has been a topic of interest across multiple fields ranging from security to entertainment systems. Tracking motion and identifying the action being performed in real time is necessary for critical security systems. In entertainment, especially gaming, immediate responses to actions and gestures are paramount for the success of the system. We show that the Motion History Image (MHI) is a well-established framework for capturing temporal and activity information in multidimensional detail, enabling various use cases including classification. We utilize MHI to produce sample data to train a classifier and demonstrate its effectiveness for action classification across six different activities in a single multi-action video. We analyze the classifier performance, identify use cases where MHI struggles to generate the appropriate activity image, and discuss mechanisms and future work to overcome those limitations.
Submitted 13 October, 2024;
originally announced October 2024.
-
Long-Distance Gesture Recognition using Dynamic Neural Networks
Authors:
Shubhang Bhatnagar,
Sharath Gopal,
Narendra Ahuja,
Liu Ren
Abstract:
Gestures form an important medium of communication between humans and machines. An overwhelming majority of existing gesture recognition methods are tailored to a scenario where humans and machines are located very close to each other. This short-distance assumption does not hold true for several types of interactions, for example gesture-based interactions with a floor cleaning robot or with a drone. Methods made for short-distance recognition are unable to perform well on long-distance recognition because gestures occupy only a small portion of the input data. Their performance is especially poor in resource-constrained settings, where they are not able to effectively focus their limited compute on the gesturing subject. We propose a novel, accurate and efficient method for the recognition of gestures from longer distances. It uses a dynamic neural network to select features from gesture-containing spatial regions of the input sensor data for further processing. This helps the network focus on features important for gesture recognition while discarding background features early on, thus making it more compute-efficient than other techniques. We demonstrate the performance of our method on the LD-ConGR long-distance dataset, where it outperforms previous state-of-the-art methods on recognition accuracy and compute efficiency.
Submitted 8 August, 2023;
originally announced August 2023.
-
5G NR-LTE Coexistence: Opportunities, Challenges, and Solutions
Authors:
Sneihil Gopal,
David Griffith,
Richard A. Rouil,
Chunmei Liu
Abstract:
5G New Radio (NR) promises to support diverse services such as enhanced mobile broadband (eMBB), ultra-reliable low-latency communication (URLLC), and massive machine-type communication (mMTC). This requires spectrum, most of which is occupied by 4G Long Term Evolution (LTE). Hence, network operators are expected to deploy 5G using the existing LTE infrastructure while migrating to NR. In addition, operators must support legacy LTE devices during the migration, so LTE and NR systems will coexist for the foreseeable future. In this article, we address LTE-NR coexistence starting with a review of both radio access technologies. We then describe the contributions by the 3rd Generation Partnership Project (3GPP) to solving the coexistence issue and catalog the major coexistence scenarios. Lastly, we introduce a novel spectrum sharing scheme that can be applied to the coexistence scenarios under study.
Submitted 11 October, 2022; v1 submitted 8 October, 2022;
originally announced October 2022.