-
Advancing Uto-Aztecan Language Technologies: A Case Study on the Endangered Comanche Language
Authors:
Jesus Alvarez C,
Daua D. Karajeanes,
Ashley Celeste Prado,
John Ruttan,
Ivory Yang,
Sean O'Brien,
Vasu Sharma,
Kevin Zhu
Abstract:
The digital exclusion of endangered languages remains a critical challenge in NLP, limiting both linguistic research and revitalization efforts. This study introduces the first computational investigation of Comanche, an Uto-Aztecan language on the verge of extinction, demonstrating how minimal-cost, community-informed NLP interventions can support language preservation. We present a manually cura…
▽ More
The digital exclusion of endangered languages remains a critical challenge in NLP, limiting both linguistic research and revitalization efforts. This study introduces the first computational investigation of Comanche, an Uto-Aztecan language on the verge of extinction, demonstrating how minimal-cost, community-informed NLP interventions can support language preservation. We present a manually curated dataset of 412 phrases, a synthetic data generation pipeline, and an empirical evaluation of GPT-4o and GPT-4o-mini for language identification. Our experiments reveal that while LLMs struggle with Comanche in zero-shot settings, few-shot prompting significantly improves performance, achieving near-perfect accuracy with just five examples. Our findings highlight the potential of targeted NLP methodologies in low-resource contexts and emphasize that visibility is the first step toward inclusion. By establishing a foundation for Comanche in NLP, we advocate for computational approaches that prioritize accessibility, cultural sensitivity, and community engagement.
△ Less
Submitted 10 May, 2025;
originally announced May 2025.
-
Improving endpoint detection in end-to-end streaming ASR for conversational speech
Authors:
Anandh C,
Karthik Pandia Durai,
Jeena Prakash,
Manickavela Arumugam,
Kadri Hacioglu,
S. Pavankumar Dubagunta,
Andreas Stolcke,
Shankar Venkatesan,
Aravind Ganapathiraju
Abstract:
ASR endpointing (EP) plays a major role in delivering a good user experience in products supporting human or artificial agents in human-human/machine conversations. Transducer-based ASR (T-ASR) is an end-to-end (E2E) ASR modelling technique preferred for streaming. A major limitation of T-ASR is delayed emission of ASR outputs, which could lead to errors or delays in EP. Inaccurate EP will cut the…
▽ More
ASR endpointing (EP) plays a major role in delivering a good user experience in products supporting human or artificial agents in human-human/machine conversations. Transducer-based ASR (T-ASR) is an end-to-end (E2E) ASR modelling technique preferred for streaming. A major limitation of T-ASR is delayed emission of ASR outputs, which could lead to errors or delays in EP. Inaccurate EP will cut the user off while speaking, returning incomplete transcript while delays in EP will increase the perceived latency, degrading the user experience. We propose methods to improve EP by addressing delayed emission along with EP mistakes. To address the delayed emission problem, we introduce an end-of-word token at the end of each word, along with a delay penalty. The EP delay is addressed by obtaining a reliable frame-level speech activity detection using an auxiliary network. We apply the proposed methods on Switchboard conversational speech corpus and evaluate it against a delay penalty method.
△ Less
Submitted 19 May, 2025;
originally announced May 2025.
-
Flow Models for Unbounded and Geometry-Aware Distributional Reinforcement Learning
Authors:
Simo Alami C.,
Rim Kaddah,
Jesse Read,
Marie-Paule Cani
Abstract:
We introduce a new architecture for Distributional Reinforcement Learning (DistRL) that models return distributions using normalizing flows. This approach enables flexible, unbounded support for return distributions, in contrast to categorical approaches like C51 that rely on fixed or bounded representations. It also offers richer modeling capacity to capture multi-modality, skewness, and tail beh…
▽ More
We introduce a new architecture for Distributional Reinforcement Learning (DistRL) that models return distributions using normalizing flows. This approach enables flexible, unbounded support for return distributions, in contrast to categorical approaches like C51 that rely on fixed or bounded representations. It also offers richer modeling capacity to capture multi-modality, skewness, and tail behavior than quantile based approaches. Our method is significantly more parameter-efficient than categorical approaches. Standard metrics used to train existing models like KL divergence or Wasserstein distance either are scale insensitive or have biased sample gradients, especially when return supports do not overlap. To address this, we propose a novel surrogate for the Cramèr distance, that is geometry-aware and computable directly from the return distribution's PDF, avoiding the costly CDF computation. We test our model on the ATARI-5 sub-benchmark and show that our approach outperforms PDF based models while remaining competitive with quantile based methods.
△ Less
Submitted 7 May, 2025;
originally announced May 2025.
-
An information theoretic approach to quantify the stability of feature selection and ranking algorithms
Authors:
Alaiz-Rodriguez,
R.,
Parnell,
A. C
Abstract:
Feature selection is a key step when dealing with high dimensional data. In particular, these techniques simplify the process of knowledge discovery from the data by selecting the most relevant features out of the noisy, redundant and irrelevant features. A problem that arises in many of these practical applications is that the outcome of the feature selection algorithm is not stable. Thus, small…
▽ More
Feature selection is a key step when dealing with high dimensional data. In particular, these techniques simplify the process of knowledge discovery from the data by selecting the most relevant features out of the noisy, redundant and irrelevant features. A problem that arises in many of these practical applications is that the outcome of the feature selection algorithm is not stable. Thus, small variations in the data may yield very different feature rankings. Assessing the stability of these methods becomes an important issue in the previously mentioned situations. We propose an information theoretic approach based on the Jensen Shannon divergence to quantify this robustness. Unlike other stability measures, this metric is suitable for different algorithm outcomes: full ranked lists, feature subsets as well as the lesser studied partial ranked lists. This generalized metric quantifies the difference among a whole set of lists with the same size, following a probabilistic approach and being able to give more importance to the disagreements that appear at the top of the list. Moreover, it possesses desirable properties including correction for change, upper lower bounds and conditions for a deterministic selection. We illustrate the use of this stability metric with data generated in a fully controlled way and compare it with popular metrics including the Spearmans rank correlation and the Kunchevas index on feature ranking and selection outcomes, respectively. Additionally, experimental validation of the proposed approach is carried out on a real-world problem of food quality assessment showing its potential to quantify stability from different perspectives.
△ Less
Submitted 7 February, 2024;
originally announced February 2024.
-
Unsupervised and semi-supervised co-salient object detection via segmentation frequency statistics
Authors:
Souradeep Chakraborty,
Shujon Naha,
Muhammet Bastan,
Amit Kumar K C,
Dimitris Samaras
Abstract:
In this paper, we address the detection of co-occurring salient objects (CoSOD) in an image group using frequency statistics in an unsupervised manner, which further enable us to develop a semi-supervised method. While previous works have mostly focused on fully supervised CoSOD, less attention has been allocated to detecting co-salient objects when limited segmentation annotations are available f…
▽ More
In this paper, we address the detection of co-occurring salient objects (CoSOD) in an image group using frequency statistics in an unsupervised manner, which further enable us to develop a semi-supervised method. While previous works have mostly focused on fully supervised CoSOD, less attention has been allocated to detecting co-salient objects when limited segmentation annotations are available for training. Our simple yet effective unsupervised method US-CoSOD combines the object co-occurrence frequency statistics of unsupervised single-image semantic segmentations with salient foreground detections using self-supervised feature learning. For the first time, we show that a large unlabeled dataset e.g. ImageNet-1k can be effectively leveraged to significantly improve unsupervised CoSOD performance. Our unsupervised model is a great pre-training initialization for our semi-supervised model SS-CoSOD, especially when very limited labeled data is available for training. To avoid propagating erroneous signals from predictions on unlabeled data, we propose a confidence estimation module to guide our semi-supervised training. Extensive experiments on three CoSOD benchmark datasets show that both of our unsupervised and semi-supervised models outperform the corresponding state-of-the-art models by a significant margin (e.g., on the Cosal2015 dataset, our US-CoSOD model has an 8.8% F-measure gain over a SOTA unsupervised co-segmentation model and our SS-CoSOD model has an 11.81% F-measure gain over a SOTA semi-supervised CoSOD model).
△ Less
Submitted 11 November, 2023;
originally announced November 2023.
-
Transferable Deep Metric Learning for Clustering
Authors:
Simo Alami. C,
Rim Kaddah,
Jesse Read
Abstract:
Clustering in high dimension spaces is a difficult task; the usual distance metrics may no longer be appropriate under the curse of dimensionality. Indeed, the choice of the metric is crucial, and it is highly dependent on the dataset characteristics. However a single metric could be used to correctly perform clustering on multiple datasets of different domains. We propose to do so, providing a fr…
▽ More
Clustering in high dimension spaces is a difficult task; the usual distance metrics may no longer be appropriate under the curse of dimensionality. Indeed, the choice of the metric is crucial, and it is highly dependent on the dataset characteristics. However a single metric could be used to correctly perform clustering on multiple datasets of different domains. We propose to do so, providing a framework for learning a transferable metric. We show that we can learn a metric on a labelled dataset, then apply it to cluster a different dataset, using an embedding space that characterises a desired clustering in the generic sense. We learn and test such metrics on several datasets of variable complexity (synthetic, MNIST, SVHN, omniglot) and achieve results competitive with the state-of-the-art while using only a small number of labelled training datasets and shallow networks.
△ Less
Submitted 13 February, 2023;
originally announced February 2023.
-
On Utilizing Relationships for Transferable Few-Shot Fine-Grained Object Detection
Authors:
Ambar Pal,
Arnau Ramisa,
Amit Kumar K C,
René Vidal
Abstract:
State-of-the-art object detectors are fast and accurate, but they require a large amount of well annotated training data to obtain good performance. However, obtaining a large amount of training annotations specific to a particular task, i.e., fine-grained annotations, is costly in practice. In contrast, obtaining common-sense relationships from text, e.g., "a table-lamp is a lamp that sits on top…
▽ More
State-of-the-art object detectors are fast and accurate, but they require a large amount of well annotated training data to obtain good performance. However, obtaining a large amount of training annotations specific to a particular task, i.e., fine-grained annotations, is costly in practice. In contrast, obtaining common-sense relationships from text, e.g., "a table-lamp is a lamp that sits on top of a table", is much easier. Additionally, common-sense relationships like "on-top-of" are easy to annotate in a task-agnostic fashion. In this paper, we propose a probabilistic model that uses such relational knowledge to transform an off-the-shelf detector of coarse object categories (e.g., "table", "lamp") into a detector of fine-grained categories (e.g., "table-lamp"). We demonstrate that our method, RelDetect, achieves performance competitive to finetuning based state-of-the-art object detector baselines when an extremely low amount of fine-grained annotations is available ($0.2\%$ of entire dataset). We also demonstrate that RelDetect is able to utilize the inherent transferability of relationship information to obtain a better performance ($+5$ mAP points) than the above baselines on an unseen dataset (zero-shot transfer). In summary, we demonstrate the power of using relationships for object detection on datasets where fine-grained object categories can be linked to coarse-grained categories via suitable relationships.
△ Less
Submitted 1 December, 2022;
originally announced December 2022.
-
Conv-NILM-Net, a causal and multi-appliance model for energy source separation
Authors:
Simo Alami C.,
Jérémie Decock,
Rim Kaddah,
Jesse Read
Abstract:
Non-Intrusive Load Monitoring (NILM) seeks to save energy by estimating individual appliance power usage from a single aggregate measurement. Deep neural networks have become increasingly popular in attempting to solve NILM problems. However most used models are used for Load Identification rather than online Source Separation. Among source separation models, most use a single-task learning approa…
▽ More
Non-Intrusive Load Monitoring (NILM) seeks to save energy by estimating individual appliance power usage from a single aggregate measurement. Deep neural networks have become increasingly popular in attempting to solve NILM problems. However most used models are used for Load Identification rather than online Source Separation. Among source separation models, most use a single-task learning approach in which a neural network is trained exclusively for each appliance. This strategy is computationally expensive and ignores the fact that multiple appliances can be active simultaneously and dependencies between them. The rest of models are not causal, which is important for real-time application. Inspired by Convtas-Net, a model for speech separation, we propose Conv-NILM-net, a fully convolutional framework for end-to-end NILM. Conv-NILM-net is a causal model for multi appliance source separation. Our model is tested on two real datasets REDD and UK-DALE and clearly outperforms the state of the art while keeping a significantly smaller size than the competing models.
△ Less
Submitted 13 February, 2023; v1 submitted 3 August, 2022;
originally announced August 2022.
-
Real Time Object Detection System with YOLO and CNN Models: A Review
Authors:
Viswanatha V,
Chandana R K,
Ramachandra A. C.
Abstract:
The field of artificial intelligence is built on object detection techniques. YOU ONLY LOOK ONCE (YOLO) algorithm and it's more evolved versions are briefly described in this research survey. This survey is all about YOLO and convolution neural networks (CNN)in the direction of real time object detection.YOLO does generalized object representation more effectively without precision losses than oth…
▽ More
The field of artificial intelligence is built on object detection techniques. YOU ONLY LOOK ONCE (YOLO) algorithm and it's more evolved versions are briefly described in this research survey. This survey is all about YOLO and convolution neural networks (CNN)in the direction of real time object detection.YOLO does generalized object representation more effectively without precision losses than other object detection models.CNN architecture models have the ability to eliminate highlights and identify objects in any given image. When implemented appropriately, CNN models can address issues like deformity diagnosis, creating educational or instructive application, etc. This article reached atnumber of observations and perspective findings through the analysis.Also it provides support for the focused visual information and feature extraction in the financial and other industries, highlights the method of target detection and feature selection, and briefly describe the development process of YOLO algorithm.
△ Less
Submitted 23 July, 2022;
originally announced August 2022.
-
Implementation Of Tiny Machine Learning Models On Arduino 33 BLE For Gesture And Speech Recognition
Authors:
Viswanatha V,
Ramachandra A. C,
Raghavendra Prasanna,
Prem Chowdary Kakarla,
Viveka Simha PJ,
Nishant Mohan
Abstract:
In this article gesture recognition and speech recognition applications are implemented on embedded systems with Tiny Machine Learning (TinyML). It features 3-axis accelerometer, 3-axis gyroscope and 3-axis magnetometer. The gesture recognition,provides an innovative approach nonverbal communication. It has wide applications in human-computer interaction and sign language. Here in the implementati…
▽ More
In this article gesture recognition and speech recognition applications are implemented on embedded systems with Tiny Machine Learning (TinyML). It features 3-axis accelerometer, 3-axis gyroscope and 3-axis magnetometer. The gesture recognition,provides an innovative approach nonverbal communication. It has wide applications in human-computer interaction and sign language. Here in the implementation of hand gesture recognition, TinyML model is trained and deployed from EdgeImpulse framework for hand gesture recognition and based on the hand movements, Arduino Nano 33 BLE device having 6-axis IMU can find out the direction of movement of hand. The Speech is a mode of communication. Speech recognition is a way by which the statements or commands of human speech is understood by the computer which reacts accordingly. The main aim of speech recognition is to achieve communication between man and machine. Here in the implementation of speech recognition, TinyML model is trained and deployed from EdgeImpulse framework for speech recognition and based on the keywords pronounced by human, Arduino Nano 33 BLE device having built-in microphone can make an RGB LED glow like red, green or blue based on keyword pronounced. The results of each application are obtained and listed in the results section and given the analysis upon the results.
△ Less
Submitted 23 July, 2022;
originally announced July 2022.
-
CAMEO: Curiosity Augmented Metropolis for Exploratory Optimal Policies
Authors:
Simo Alami. C,
Fernando Llorente,
Rim Kaddah,
Luca Martino,
Jesse Read
Abstract:
Reinforcement Learning has drawn huge interest as a tool for solving optimal control problems. Solving a given problem (task or environment) involves converging towards an optimal policy. However, there might exist multiple optimal policies that can dramatically differ in their behaviour; for example, some may be faster than the others but at the expense of greater risk. We consider and study a di…
▽ More
Reinforcement Learning has drawn huge interest as a tool for solving optimal control problems. Solving a given problem (task or environment) involves converging towards an optimal policy. However, there might exist multiple optimal policies that can dramatically differ in their behaviour; for example, some may be faster than the others but at the expense of greater risk. We consider and study a distribution of optimal policies. We design a curiosity-augmented Metropolis algorithm (CAMEO), such that we can sample optimal policies, and such that these policies effectively adopt diverse behaviours, since this implies greater coverage of the different possible optimal policies. In experimental simulations we show that CAMEO indeed obtains policies that all solve classic control problems, and even in the challenging case of environments that provide sparse rewards. We further show that the different policies we sample present different risk profiles, corresponding to interesting practical applications in interpretability, and represents a first step towards learning the distribution of optimal policies itself.
△ Less
Submitted 15 February, 2023; v1 submitted 19 May, 2022;
originally announced May 2022.
-
Collation of Feasible Solutions for Domain Based Problems: An Analysis of Sentiments Based on Codeathon Activity
Authors:
Rajeshwari K,
Preetha S,
Anitha C,
Lakshmi Shree K,
Pronoy Roy
Abstract:
Codeathon activity is a practical approach for enduring the principles of Software Engineering and Object Oriented Modelling. Real world domain problem's solution was accomplished through team work. Analysing the problem and designing a feasible solution through a one day activity was achieved through virtual connection. There are three different sections in a semester, 13 teams were framed and as…
▽ More
Codeathon activity is a practical approach for enduring the principles of Software Engineering and Object Oriented Modelling. Real world domain problem's solution was accomplished through team work. Analysing the problem and designing a feasible solution through a one day activity was achieved through virtual connection. There are three different sections in a semester, 13 teams were framed and assigned one problem statement. Individual team were supposed to prototype a solution which was further used to build one feasible solution. The feedback from students showed different sentiments associated with day long activity. Vivid emotions and expressions of students were analysed.
△ Less
Submitted 23 August, 2021;
originally announced August 2021.
-
Codeathon Activity: A Design Prototype for Real World Problems
Authors:
Preetha S,
Rajeshwari K,
Anitha C,
Kausthub Narayan
Abstract:
Activity-based learning helps students to learn through participation. A virtual codeathon activity, as part of this learning scheme, was conducted for 180 undergraduate students to focus on analysis and design of solutions to crucial real-world problems in the existing Covid-19 pandemic situation. In this paper, an analysis is made to know the problem solving skills of students given a single pro…
▽ More
Activity-based learning helps students to learn through participation. A virtual codeathon activity, as part of this learning scheme, was conducted for 180 undergraduate students to focus on analysis and design of solutions to crucial real-world problems in the existing Covid-19 pandemic situation. In this paper, an analysis is made to know the problem solving skills of students given a single problem statement. Evaluators can further collate these multiple solutions into one optimal solution. This Codeathon activity impacts their practical approach towards the analysis and design.
△ Less
Submitted 22 July, 2021;
originally announced July 2021.
-
Recognition of Oracle Bone Inscriptions by using Two Deep Learning Models
Authors:
Yoshiyuki Fujikawa,
Hengyi Li,
Xuebin Yue,
Aravinda C V,
Amar Prabhu G,
Lin Meng
Abstract:
Oracle bone inscriptions (OBIs) contain some of the oldest characters in the world and were used in China about 3000 years ago. As an ancient form of literature, OBIs store a lot of information that can help us understand the world history, character evaluations, and more. However, as OBIs were found only discovered about 120 years ago, few studies have described them, and the aging process has ma…
▽ More
Oracle bone inscriptions (OBIs) contain some of the oldest characters in the world and were used in China about 3000 years ago. As an ancient form of literature, OBIs store a lot of information that can help us understand the world history, character evaluations, and more. However, as OBIs were found only discovered about 120 years ago, few studies have described them, and the aging process has made the inscriptions less legible. Hence, automatic character detection and recognition has become an important issue. This paper aims to design a online OBI recognition system for helping preservation and organization the cultural heritage. We evaluated two deep learning models for OBI recognition, and have designed an API that can be accessed online for OBI recognition. In the first stage, you only look once (YOLO) is applied for detecting and recognizing OBIs. However, not all of the OBIs can be detected correctly by YOLO, so we next utilize MobileNet to recognize the undetected OBIs by manually cropping the undetected OBI in the image. MobileNet is used for this second stage of recognition as our evaluation of ten state-of-the-art models showed that it is the best network for OBI recognition due to its superior performance in terms of accuracy, loss and time consumption. We installed our system on an application programming interface (API) and opened it for OBI detection and recognition.
△ Less
Submitted 4 May, 2021; v1 submitted 3 May, 2021;
originally announced May 2021.
-
Iterative hypothesis testing for multi-object tracking in presence of features with variable reliability
Authors:
Amit Kumar K. C.,
Damien Delannay,
Christophe De Vleeschouwer
Abstract:
This paper assumes prior detections of multiple targets at each time instant, and uses a graph-based approach to connect those detections across time, based on their position and appearance estimates. In contrast to most earlier works in the field, our framework has been designed to exploit the appearance features, even when they are only sporadically available, or affected by a non-stationary noi…
▽ More
This paper assumes prior detections of multiple targets at each time instant, and uses a graph-based approach to connect those detections across time, based on their position and appearance estimates. In contrast to most earlier works in the field, our framework has been designed to exploit the appearance features, even when they are only sporadically available, or affected by a non-stationary noise, along the sequence of detections. This is done by implementing an iterative hypothesis testing strategy to progressively aggregate the detections into short trajectories, named tracklets. Specifically, each iteration considers a node, named key-node, and investigates how to link this key-node with other nodes in its neighborhood, under the assumption that the target appearance is defined by the key-node appearance estimate. This is done through shortest path computation in a temporal neighborhood of the key-node. The approach is conservative in that it only aggregates the shortest paths that are sufficiently better compared to alternative paths. It is also multi-scale in that the size of the investigated neighborhood is increased proportionally to the number of detections already aggregated into the key-node. The multi-scale nature of the process and the progressive relaxation of its conservativeness makes it both computationally efficient and effective.
Experimental validations are performed extensively on a toy example, a 15 minutes long multi-view basketball dataset, and other monocular pedestrian datasets.
△ Less
Submitted 1 September, 2015;
originally announced September 2015.
-
Discriminative and Efficient Label Propagation on Complementary Graphs for Multi-Object Tracking
Authors:
Amit Kumar K. C.,
Laurent Jacques,
Christophe De Vleeschouwer
Abstract:
Given a set of detections, detected at each time instant independently, we investigate how to associate them across time. This is done by propagating labels on a set of graphs, each graph capturing how either the spatio-temporal or the appearance cues promote the assignment of identical or distinct labels to a pair of detections. The graph construction is motivated by a locally linear embedding of…
▽ More
Given a set of detections, detected at each time instant independently, we investigate how to associate them across time. This is done by propagating labels on a set of graphs, each graph capturing how either the spatio-temporal or the appearance cues promote the assignment of identical or distinct labels to a pair of detections. The graph construction is motivated by a locally linear embedding of the detection features. Interestingly, the neighborhood of a node in appearance graph is defined to include all the nodes for which the appearance feature is available (even if they are temporally distant). This gives our framework the uncommon ability to exploit the appearance features that are available only sporadically. Once the graphs have been defined, multi-object tracking is formulated as the problem of finding a label assignment that is consistent with the constraints captured each graph, which results into a difference of convex (DC) program. We propose to decompose the global objective function into node-wise sub-problems. This not only allows a computationally efficient solution, but also supports an incremental and scalable construction of the graph, thereby making the framework applicable to large graphs and practical tracking scenarios. Moreover, it opens the possibility of parallel implementation.
△ Less
Submitted 1 December, 2015; v1 submitted 5 April, 2015;
originally announced April 2015.
-
Constructions of hamiltonian graphs with bounded degree and diameter O (log n)
Authors:
Aleksandar Ili\' c,
Dragan Stevanovi\' c
Abstract:
Token ring topology has been frequently used in the design of distributed loop computer networks and one measure of its performance is the diameter. We propose an algorithm for constructing hamiltonian graphs with $n$ vertices and maximum degree $Δ$ and diameter $O (\log n)$, where $n$ is an arbitrary number. The number of edges is asymptotically bounded by…
▽ More
Token ring topology has been frequently used in the design of distributed loop computer networks and one measure of its performance is the diameter. We propose an algorithm for constructing hamiltonian graphs with $n$ vertices and maximum degree $Δ$ and diameter $O (\log n)$, where $n$ is an arbitrary number. The number of edges is asymptotically bounded by $(2 - \frac{1}{Δ- 1} - \frac{(Δ- 2)^2}{(Δ- 1)^3}) n$. In particular, we construct a family of hamiltonian graphs with diameter at most $2 \lfloor \log_2 n \rfloor$, maximum degree 3 and at most $1+11n/8$ edges.
△ Less
Submitted 16 April, 2011;
originally announced April 2011.