-
A Comprehensive Survey of Mamba Architectures for Medical Image Analysis: Classification, Segmentation, Restoration and Beyond
Authors:
Shubhi Bansal,
Sreeharish A,
Madhava Prasath J,
Manikandan S,
Sreekanth Madisetty,
Mohammad Zia Ur Rehman,
Chandravardhan Singh Raghaw,
Gaurav Duggal,
Nagendra Kumar
Abstract:
Mamba, a special case of the State Space Model, is gaining popularity as an alternative to template-based deep learning approaches in medical image analysis. While transformers are powerful architectures, they have drawbacks, including quadratic computational complexity and an inability to address long-range dependencies efficiently. This limitation affects the analysis of large and complex datasets in medical imaging, where there are many spatial and temporal relationships. In contrast, Mamba offers benefits that make it well-suited for medical image analysis. It has linear time complexity, which is a significant improvement over transformers. Mamba processes longer sequences without attention mechanisms, enabling faster inference and requiring less memory. Mamba also demonstrates strong performance in merging multimodal data, improving diagnosis accuracy and patient outcomes. The organization of this paper allows readers to appreciate the capabilities of Mamba in medical imaging step by step. We begin by defining core concepts of SSMs and models, including S4, S5, and S6, followed by an exploration of Mamba architectures such as pure Mamba, U-Net variants, and hybrid models with convolutional neural networks, transformers, and Graph Neural Networks. We also cover Mamba optimizations, techniques and adaptations, scanning, datasets, applications, experimental results, and conclude with its challenges and future directions in medical imaging. This review aims to demonstrate the transformative potential of Mamba in overcoming existing barriers within medical imaging while paving the way for innovative advancements in the field. A comprehensive list of Mamba architectures applied in the medical field, reviewed in this work, is available on GitHub.
Submitted 3 October, 2024;
originally announced October 2024.
-
Malayalam Sign Language Identification using Finetuned YOLOv8 and Computer Vision Techniques
Authors:
Abhinand K.,
Abhiram B. Nair,
Dhananjay C.,
Hanan Hamza,
Mohammed Fawaz J.,
Rahma Fahim K.,
Anoop V. S
Abstract:
Technological advancements and innovations are improving our daily lives in every possible way, but a large section of society is deprived of these benefits because of physical disabilities. For the real benefits to reach all of society, these talented and gifted people should also be able to use such innovations without hurdles. Many applications developed these days address these challenges, but localized communities and other constrained linguistic groups may find it difficult to use them. Malayalam, a Dravidian language spoken in the Indian state of Kerala, is one of the twenty-two scheduled languages of India. Recent years have witnessed a surge in the development of systems and tools in Malayalam, addressing the needs of Kerala, but many of them are not empathetically designed to cater to the needs of hearing-impaired people. One of the major challenges is the limited or nonexistent availability of sign language data for the Malayalam language, and sufficient efforts have not been made in this direction. In this connection, this paper proposes an approach for sign language identification for the Malayalam language using advanced deep learning and computer vision techniques. We start by developing a labeled dataset for Malayalam letters, and for the identification we use advanced deep learning techniques such as YOLOv8 and computer vision. Experimental results show that the identification accuracy is comparable to that of other sign language identification systems, and other researchers in sign language identification can use the model as a baseline to develop advanced models.
Submitted 8 May, 2024;
originally announced May 2024.
-
Fine Tuning LLM for Enterprise: Practical Guidelines and Recommendations
Authors:
Mathav Raj J,
Kushala VM,
Harikrishna Warrier,
Yogesh Gupta
Abstract:
There is a compelling need among enterprises to fine-tune LLMs (Large Language Models) to get them trained on proprietary domain knowledge. The challenge is to imbue the LLMs with domain-specific knowledge using the most optimal resources and cost and in the best possible time. Many enterprises rely on RAG (Retrieval Augmented Generation), which does not need LLMs to be fine-tuned, but they are limited by the quality of vector databases and their retrieval capabilities rather than the intrinsic capabilities of the LLMs themselves. In our current work we focus on fine-tuning LLaMA, an open-source LLM, using proprietary documents and code from an enterprise repository, and use the fine-tuned models to evaluate the quality of responses. As part of this work, we aim to guide beginners on how to start fine-tuning an LLM for documentation and code by making educated guesses about the size of GPU required and the options that are available for formatting the data. We also propose pre-processing recipes for both documentation and code to prepare datasets in different formats. The proposed methods of data preparation for document datasets are forming paragraph chunks, forming question and answer pairs, and forming keyword and paragraph chunk pairs. For code datasets we propose forming summary and function pairs. Further, we qualitatively evaluate the results of the models for domain-specific queries. Finally, we also propose practical guidelines and recommendations for fine-tuning LLMs.
Submitted 23 March, 2024;
originally announced April 2024.
-
SPRING-INX: A Multilingual Indian Language Speech Corpus by SPRING Lab, IIT Madras
Authors:
Nithya R,
Malavika S,
Jordan F,
Arjun Gangwar,
Metilda N J,
S Umesh,
Rithik Sarab,
Akhilesh Kumar Dubey,
Govind Divakaran,
Samudra Vijaya K,
Suryakanth V Gangashetty
Abstract:
India is home to a multitude of languages, of which 22 are recognised by the Indian Constitution as official. Building speech-based applications for the Indian population is a difficult problem owing to limited data and the number of languages and accents to accommodate. To encourage the language technology community to build speech-based applications in Indian languages, we are open-sourcing the SPRING-INX data, which has about 2000 hours of legally sourced and manually transcribed speech data for ASR system building in Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi and Tamil. This endeavor is by SPRING Lab, Indian Institute of Technology Madras, and is part of the National Language Translation Mission (NLTM), funded by the Indian Ministry of Electronics and Information Technology (MeitY), Government of India. We describe the data collection and data cleaning process along with the data statistics in this paper.
Submitted 24 October, 2023; v1 submitted 23 October, 2023;
originally announced October 2023.
-
Efficient Concept Drift Handling for Batch Android Malware Detection Models
Authors:
Molina-Coronado B.,
Mori U.,
Mendiburu A.,
Miguel-Alonso J
Abstract:
The rapidly evolving nature of Android apps poses a significant challenge to static batch machine learning algorithms employed in malware detection systems, as they quickly become obsolete. Despite this challenge, the existing literature pays limited attention to addressing this issue, with many advanced Android malware detection approaches, such as Drebin, DroidDet and MaMaDroid, relying on static models. In this work, we show how retraining techniques are able to maintain detector capabilities over time. Particularly, we analyze the effect of two aspects in the efficiency and performance of the detectors: 1) the frequency with which the models are retrained, and 2) the data used for retraining. In the first experiment, we compare periodic retraining with a more advanced concept drift detection method that triggers retraining only when necessary. In the second experiment, we analyze sampling methods to reduce the amount of data used to retrain models. Specifically, we compare fixed sized windows of recent data and state-of-the-art active learning methods that select those apps that help keep the training dataset small but diverse. Our experiments show that concept drift detection and sample selection mechanisms result in very efficient retraining strategies which can be successfully used to maintain the performance of the static Android malware state-of-the-art detectors in changing environments.
Submitted 18 September, 2023;
originally announced September 2023.
-
Compressing Vision Transformers for Low-Resource Visual Learning
Authors:
Eric Youn,
Sai Mitheran J,
Sanjana Prabhu,
Siyuan Chen
Abstract:
Vision transformer (ViT) and its variants have swept through visual learning leaderboards and offer state-of-the-art accuracy in tasks such as image classification, object detection, and semantic segmentation by attending to different parts of the visual input and capturing long-range spatial dependencies. However, these models are large and computation-heavy. For instance, the recently proposed ViT-B model has 86M parameters making it impractical for deployment on resource-constrained devices. As a result, their deployment on mobile and edge scenarios is limited. In our work, we aim to take a step toward bringing vision transformers to the edge by utilizing popular model compression techniques such as distillation, pruning, and quantization.
Our chosen application environment is an unmanned aerial vehicle (UAV) that is battery-powered and memory-constrained, carrying a single-board computer on the scale of an NVIDIA Jetson Nano with 4GB of RAM. On the other hand, the UAV requires high accuracy close to that of state-of-the-art ViTs to ensure safe object avoidance in autonomous navigation, or correct localization of humans in search-and-rescue. Inference latency should also be minimized given the application requirements. Hence, our target is to enable rapid inference of a vision transformer on an NVIDIA Jetson Nano (4GB) with minimal accuracy loss. This allows us to deploy ViTs on resource-constrained devices, opening up new possibilities in surveillance, environmental monitoring, etc. Our implementation is made available at https://github.com/chensy7/efficient-vit.
Submitted 5 September, 2023;
originally announced September 2023.
-
Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages
Authors:
Gowtham Ramesh,
Sumanth Doddapaneni,
Aravinth Bheemaraj,
Mayank Jobanputra,
Raghavan AK,
Ajitesh Sharma,
Sujit Sahoo,
Harshita Diddee,
Mahalakshmi J,
Divyanshu Kakwani,
Navneet Kumar,
Aswin Pradeep,
Srihari Nagaraj,
Kumar Deepak,
Vivek Raghavan,
Anoop Kunchukuttan,
Pratyush Kumar,
Mitesh Shantadevi Khapra
Abstract:
We present Samanantar, the largest publicly available parallel corpora collection for Indic languages. The collection contains a total of 49.7 million sentence pairs between English and 11 Indic languages (from two language families). Specifically, we compile 12.4 million sentence pairs from existing, publicly available parallel corpora, and additionally mine 37.4 million sentence pairs from the web, resulting in a 4x increase. We mine the parallel sentences from the web by combining many corpora, tools, and methods: (a) web-crawled monolingual corpora, (b) document OCR for extracting sentences from scanned documents, (c) multilingual representation models for aligning sentences, and (d) approximate nearest neighbor search for searching in a large collection of sentences. Human evaluation of samples from the newly mined corpora validates the high quality of the parallel sentences across 11 languages. Further, we extract 83.4 million sentence pairs between all 55 Indic language pairs from the English-centric parallel corpus using English as the pivot language. We trained multilingual NMT models spanning all these languages on Samanantar, which outperform existing models and baselines on publicly available benchmarks, such as FLORES, establishing the utility of Samanantar. Our data and models are available publicly at https://ai4bharat.iitm.ac.in/samanantar and we hope they will help advance research in NMT and multilingual NLP for Indic languages.
Submitted 12 June, 2023; v1 submitted 12 April, 2021;
originally announced April 2021.
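The pivoting step mentioned in the Samanantar abstract (deriving Indic-to-Indic pairs from English-centric corpora) can be illustrated roughly as follows. This is only a sketch under stated assumptions: it joins two corpora on exact matches of the English sentence, which may differ from the procedure the authors actually used, and the function name `pivot_pairs` and the placeholder sentences are hypothetical. It also checks that 11 languages yield the 55 unordered language pairs quoted in the abstract.

```python
from collections import defaultdict
from math import comb


def pivot_pairs(en_to_x, en_to_y):
    """Join two English-centric corpora on their shared English side.

    Each corpus is a list of (english_sentence, target_sentence) tuples.
    Assumption: sentences are matched by exact English string equality.
    """
    by_english = defaultdict(list)
    for en, x in en_to_x:
        by_english[en].append(x)
    # Emit an (x, y) pair whenever both corpora contain the same English sentence.
    return [(x, y) for en, y in en_to_y for x in by_english.get(en, [])]


# Hypothetical placeholder data standing in for real sentences.
en_hi = [("Hello.", "hi_hello"), ("Thank you.", "hi_thanks")]
en_ta = [("Hello.", "ta_hello"), ("Good night.", "ta_night")]

hi_ta = pivot_pairs(en_hi, en_ta)

# Number of unordered pairs among 11 languages.
num_language_pairs = comb(11, 2)
```

Only the sentence "Hello." appears on the English side of both toy corpora, so it is the only one that produces a pivoted pair.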
-
Investigation of Speaker-adaptation methods in Transformer based ASR
Authors:
Vishwas M. Shetty,
Metilda Sagaya Mary N J,
S. Umesh
Abstract:
End-to-end models are fast replacing the conventional hybrid models in automatic speech recognition. Transformer, a sequence-to-sequence model, based on self-attention popularly used in machine translation tasks, has given promising results when used for automatic speech recognition. This paper explores different ways of incorporating speaker information at the encoder input while training a transformer-based model to improve its speech recognition performance. We present speaker information in the form of speaker embeddings for each of the speakers. We experiment using two types of speaker embeddings: x-vectors and novel s-vectors proposed in our previous work. We report results on two datasets a) NPTEL lecture database and b) Librispeech 500-hour split. NPTEL is an open-source e-learning portal providing lectures from top Indian universities. We obtain improvements in the word error rate over the baseline through our approach of integrating speaker embeddings into the model.
Submitted 17 November, 2021; v1 submitted 7 August, 2020;
originally announced August 2020.
-
Boundary-type Sets of Strong Product of Directed Graphs
Authors:
Prasanth G. Narasimha-Shenoi,
Bijo S Anand,
Mary Shalet T J
Abstract:
Let $D=(V,E)$ be a strongly connected digraph and let $u,v\in V(D)$. The maximum distance $md(u,v)$ is defined as $md(u,v)=\max\{\overrightarrow{d}(u,v), \overrightarrow{d}(v,u)\}$, where $\overrightarrow{d}(u,v)$ denotes the length of a shortest directed $u-v$ path in $D$. This is a metric. The boundary, contour, eccentric and peripheral sets of a strong digraph $D$ with respect to this metric have been defined, and these metrically defined sets of a large strong digraph $D$ have been investigated in terms of the factors in its prime factor decomposition with respect to the Cartesian product. In this paper, we investigate these boundary-type sets of a strong digraph $D$ in terms of the factors in its prime factor decomposition with respect to the strong product.
Submitted 9 November, 2019;
originally announced November 2019.
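The maximum distance defined in the abstract above can be computed on a small example as follows. This is a minimal sketch, not the authors' code: it assumes an unweighted digraph stored as an adjacency dictionary and uses breadth-first search for the directed distances; the names `directed_distances` and `maximum_distance` are illustrative.

```python
from collections import deque


def directed_distances(adj, source):
    """Shortest directed path lengths from source, following arc directions (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def maximum_distance(adj, u, v):
    """md(u, v) = max(d(u, v), d(v, u)); assumes the digraph is strongly connected."""
    d_uv = directed_distances(adj, u)[v]
    d_vu = directed_distances(adj, v)[u]
    return max(d_uv, d_vu)


# Directed 4-cycle a -> b -> c -> d -> a (strongly connected).
cycle = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["a"]}

md_ab = maximum_distance(cycle, "a", "b")  # d(a,b) = 1, d(b,a) = 3
md_ac = maximum_distance(cycle, "a", "c")  # d(a,c) = 2, d(c,a) = 2
```

In the directed 4-cycle, going from a to b takes one arc but returning takes three, so the maximum distance between a and b is 3, while for a and c both directions take two arcs. Taking the maximum of the two directed distances is what makes the measure symmetric.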
-
INFER: INtermediate representations for FuturE pRediction
Authors:
Shashank Srikanth,
Junaid Ahmed Ansari,
Karnik Ram R,
Sarthak Sharma,
Krishna Murthy J.,
Madhava Krishna K
Abstract:
In urban driving scenarios, forecasting future trajectories of surrounding vehicles is of paramount importance. While several approaches for the problem have been proposed, the best-performing ones tend to require extremely detailed input representations (e.g., image sequences). However, such methods do not generalize to datasets they have not been trained on. We propose intermediate representations that are particularly well-suited for future prediction. As opposed to using texture (color) information, we rely on semantics and train an autoregressive model to accurately predict future trajectories of traffic participants (vehicles) (see fig. above). We demonstrate that using semantics provides a significant boost over techniques that operate over raw pixel intensities/disparities. Unlike state-of-the-art approaches, our representations and models generalize to completely different datasets, collected across several cities, and also across countries where people drive on opposite sides of the road (left-handed vs right-handed driving). Additionally, we demonstrate an application of our approach in multi-object tracking (data association). To foster further research in transferable representations and ensure reproducibility, we release all our code and data.
Submitted 25 March, 2019;
originally announced March 2019.
-
Directed graphs and its Boundary Vertices
Authors:
Manoj Changat,
Prasanth G. Narasimha-Shenoi,
Mary Shallet T. J,
Ram Kumar
Abstract:
Suppose that $D=(V,E)$ is a strongly connected digraph. Let $u,v\in V(D)$. The maximum distance $md(u,v)$ is defined as $md(u,v)=\max\{\overrightarrow{d}(u,v), \overrightarrow{d}(v,u)\}$, where $\overrightarrow{d}(u,v)$ denotes the length of a shortest directed $u-v$ path in $D$. This is a metric. The boundary, contour, eccentric and peripheral sets of a strong digraph $D$ are defined with respect to this metric. The main aim of this paper is to identify these metrically defined sets of a large strong digraph $D$ in terms of its prime factor decomposition with respect to the Cartesian product.
Submitted 10 September, 2016;
originally announced September 2016.
-
Activity Modeling in Smart Home using High Utility Pattern Mining over Data Streams
Authors:
Menaka Gandhi. J,
K. S. Gayathri
Abstract:
Smart home technology is a good choice for people who care about security, comfort and power saving. It requires technologies that recognize the Activities of Daily Living (ADLs) of the residents at home and detect abnormal behavior in an individual's patterns. Data mining techniques such as Frequent Pattern Mining (FPM) and High Utility Pattern (HUP) mining have been used to find those activity patterns in the collected sensor data. However, applying these techniques to activity recognition from a temporal sensor data stream is a highly complex and challenging task. A new approach is therefore proposed for activity recognition from sensor data streams, which is achieved by constructing a Frequent Pattern Stream tree (FPS-tree). FPS is a sliding-window-based approach for discovering recent activity patterns over time from data streams. The proposed work aims at identifying the frequent patterns of the user from the sensor data streams, which are later modeled for activity recognition. The proposed FPM algorithm uses a data structure called Linked Sensor Data Stream (LSDS) to store the sensor data stream information, which improves the efficiency of the frequent pattern mining algorithm in both space and time. The experimental results show the efficiency of the proposed algorithm, and this FPM is further extended with HUP to detect high power consumption by residents of the smart home.
Submitted 25 June, 2013;
originally announced June 2013.