-
GroMo: Plant Growth Modeling with Multiview Images
Authors:
Ruchi Bhatt,
Shreya Bansal,
Amanpreet Chander,
Rupinder Kaur,
Malya Singh,
Mohan Kankanhalli,
Abdulmotaleb El Saddik,
Mukesh Kumar Saini
Abstract:
Understanding plant growth dynamics is essential for applications in agriculture and plant phenotyping. We present the Growth Modelling (GroMo) challenge, which is designed for two primary tasks: (1) plant age prediction and (2) leaf count estimation, both essential for crop monitoring and precision agriculture. For this challenge, we introduce GroMo25, a dataset with images of four crops: radish, okra, wheat, and mustard. Each crop consists of multiple plants (p1, p2, ..., pn) captured over different days (d1, d2, ..., dm) and categorized into five levels (L1, L2, L3, L4, L5). Each plant is captured from 24 different angles with a 15-degree gap between images. Participants are required to perform both tasks for all four crops with these multiview images. We propose a Multiview Vision Transformer (MVVT) model for the GroMo challenge and evaluate its crop-wise performance on GroMo25. MVVT reports an average MAE of 7.74 for age prediction and an MAE of 5.52 for leaf count. The GroMo Challenge aims to advance plant phenotyping research by encouraging innovative solutions for tracking and predicting plant growth. The GitHub repository is publicly available at https://github.com/mriglab/GroMo-Plant-Growth-Modeling-with-Multiview-Images.
Submitted 6 June, 2025; v1 submitted 9 March, 2025;
originally announced March 2025.
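The capture geometry and the reported metric in the GroMo abstract can be illustrated with a minimal sketch. It assumes the 24 views start at 0 degrees (the abstract gives only the count and the 15-degree gap), and the function names `view_angles` and `mean_absolute_error` are hypothetical, not taken from the GroMo repository.

```python
def view_angles(num_views=24, gap_degrees=15):
    """Return the capture angle of each view, assuming the first view is at 0 degrees."""
    return [i * gap_degrees for i in range(num_views)]


def mean_absolute_error(predicted, actual):
    """Mean absolute error (MAE) between two equally long sequences of numbers."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - a) for p, a in zip(predicted, actual))
    return total / len(predicted)


if __name__ == "__main__":
    angles = view_angles()
    # 24 views spaced 15 degrees apart cover a full 360-degree rotation.
    print(len(angles), angles[0], angles[-1])
    # Illustrative (made-up) age predictions in days against true ages.
    print(mean_absolute_error([10, 20, 30], [12, 18, 33]))
```

The same MAE function applies to both tasks, since plant age and leaf count are both scalar regression targets.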
-
Characterizing Continual Learning Scenarios and Strategies for Audio Analysis
Authors:
Ruchi Bhatt,
Pratibha Kumari,
Dwarikanath Mahapatra,
Abdulmotaleb El Saddik,
Mukesh Saini
Abstract:
Audio analysis is useful in many application scenarios. State-of-the-art audio analysis approaches assume that the data distribution is the same at training and deployment time. However, due to various real-life challenges, the data distribution may drift, or new classes may appear later. Thus, a model trained only once might not perform adequately. Continual learning (CL) approaches are devised to handle such changes in data distribution. There have been a few attempts to use CL approaches for audio analysis, yet a systematic evaluation framework is lacking. In this paper, we create a comprehensive CL dataset and characterize CL approaches for audio-based monitoring tasks. We investigate the following CL and non-CL approaches: EWC, LwF, SI, GEM, A-GEM, GDumb, Replay, Naive, Cumulative, and Joint training. The study should benefit researchers and practitioners in audio analysis who are developing adaptive models. We observed that Replay achieved better results than the other methods on the DCASE challenge data: an accuracy of 70.12% for the domain incremental scenario and 96.98% for the class incremental scenario.
Submitted 26 July, 2024; v1 submitted 29 June, 2024;
originally announced July 2024.
-
GEE! Grammar Error Explanation with Large Language Models
Authors:
Yixiao Song,
Kalpesh Krishna,
Rajesh Bhatt,
Kevin Gimpel,
Mohit Iyyer
Abstract:
Grammatical error correction tools are effective at correcting grammatical errors in users' input sentences but do not provide users with natural language explanations about their errors. Such explanations are essential for helping users learn the language by gaining a deeper understanding of its grammatical rules (DeKeyser, 2003; Ellis et al., 2006). To address this gap, we propose the task of grammar error explanation, where a system needs to provide one-sentence explanations for each grammatical error in a pair of erroneous and corrected sentences. We analyze the capability of GPT-4 in grammar error explanation, and find that it only produces explanations for 60.2% of the errors using one-shot prompting. To improve upon this performance, we develop a two-step pipeline that leverages fine-tuned and prompted large language models to perform structured atomic token edit extraction, followed by prompting GPT-4 to generate explanations. We evaluate our pipeline on German and Chinese grammar error correction data sampled from language learners with a wide range of proficiency levels. Human evaluation reveals that our pipeline produces 93.9% and 98.0% correct explanations for German and Chinese data, respectively. To encourage further research in this area, we will open-source our data and code.
Submitted 15 November, 2023;
originally announced November 2023.
-
Explainable Artificial Intelligence in Retinal Imaging for the detection of Systemic Diseases
Authors:
Ayushi Raj Bhatt,
Rajkumar Vaghashiya,
Meghna Kulkarni,
Dr Prakash Kamaraj
Abstract:
This work explores Explainable Artificial Intelligence (AI) in the form of an interpretable and semiautomatic approach to stage grading of ocular pathologies such as diabetic retinopathy, hypertensive retinopathy, and other retinopathies against the backdrop of major systemic diseases. The experimental study aims to evaluate an explainable staged grading process without using deep Convolutional Neural Networks (CNNs) directly. Many current CNN-based deep neural networks used for diagnosing retinal disorders might have appreciable performance but fail to pinpoint the basis driving their decisions. To improve the transparency of these decisions, we propose a clinician-in-the-loop assisted intelligent workflow that performs a retinal vascular assessment on the fundus images to derive quantifiable and descriptive parameters. The metadata of the retinal vessel parameters serve as hyper-parameters for better interpretation and explainability of decisions. The semiautomatic methodology aims at a federated approach to AI in healthcare applications, with more inputs and interpretations from clinicians. The baseline process in the machine learning pipeline uses image processing techniques for optic disc detection, vessel segmentation, and arteriole/venule identification.
Submitted 14 December, 2022;
originally announced December 2022.
-
SLING: Sino Linguistic Evaluation of Large Language Models
Authors:
Yixiao Song,
Kalpesh Krishna,
Rajesh Bhatt,
Mohit Iyyer
Abstract:
To understand what kinds of linguistic knowledge are encoded by pretrained Chinese language models (LMs), we introduce the benchmark of Sino LINGuistics (SLING), which consists of 38K minimal sentence pairs in Mandarin Chinese grouped into 9 high-level linguistic phenomena. Each pair demonstrates the acceptability contrast of a specific syntactic or semantic phenomenon (e.g., The keys are lost vs. The keys is lost), and an LM should assign lower perplexity to the acceptable sentence. In contrast to the CLiMP dataset (Xiang et al., 2021), which also contains Chinese minimal pairs and was created by translating the vocabulary of the English BLiMP dataset, the minimal pairs in SLING are derived primarily by applying syntactic and lexical transformations to naturally-occurring, linguist-annotated sentences from the Chinese Treebank 9.0, thus addressing severe issues in CLiMP's data generation process. We test 18 publicly available pretrained monolingual (e.g., BERT-base-zh, CPM) and multi-lingual (e.g., mT5, XLM) language models on SLING. Our experiments show that the average accuracy for LMs is far below human performance (69.7% vs. 97.1%), while BERT-base-zh achieves the highest accuracy (84.8%) of all tested LMs, even much larger ones. Additionally, we find that most LMs have a strong gender and number (singular/plural) bias, and they perform better on local phenomena than hierarchical ones.
Submitted 20 October, 2022;
originally announced October 2022.
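The minimal-pair criterion described in the SLING abstract (an LM should assign lower perplexity to the acceptable sentence) can be sketched as a simple accuracy computation. This is an illustrative sketch only: the perplexity values are made up, the function name `minimal_pair_accuracy` is hypothetical, and counting ties as errors is an assumption rather than something the abstract states.

```python
def minimal_pair_accuracy(perplexity_pairs):
    """Fraction of pairs in which the acceptable sentence gets strictly lower perplexity.

    perplexity_pairs: iterable of (ppl_acceptable, ppl_unacceptable) tuples.
    Ties are counted as incorrect (an assumption made for this sketch).
    """
    pairs = list(perplexity_pairs)
    if not pairs:
        raise ValueError("at least one minimal pair is required")
    correct = sum(1 for ppl_ok, ppl_bad in pairs if ppl_ok < ppl_bad)
    return correct / len(pairs)


if __name__ == "__main__":
    # Made-up perplexities for four minimal pairs.
    example_pairs = [(12.0, 30.5), (8.1, 7.9), (15.0, 22.0), (9.0, 9.0)]
    print(minimal_pair_accuracy(example_pairs))
```

In practice the perplexities would come from scoring each sentence of a pair with the language model under test.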
-
Similar Cases Recommendation using Legal Knowledge Graphs
Authors:
Jaspreet Singh Dhani,
Ruchika Bhatt,
Balaji Ganesan,
Parikshet Sirohi,
Vasudha Bhatnagar
Abstract:
A legal knowledge graph constructed from court cases, judgments, laws and other legal documents can enable a number of applications like question answering, document similarity, and search. While the use of knowledge graphs for distant supervision in NLP tasks is well researched, using knowledge graphs for applications like case similarity presents challenges. In this work, we describe our solution for predicting similar cases in Indian court judgements. We present our results and also discuss the impact of large language models on this task.
Submitted 2 March, 2024; v1 submitted 10 July, 2021;
originally announced July 2021.
-
Pho(SC)-CTC -- A Hybrid Approach Towards Zero-shot Word Image Recognition
Authors:
Ravi Bhatt,
Anuj Rai,
Narayanan C. Krishnan,
Sukalpa Chanda
Abstract:
Annotating words in a historical document image archive for word image recognition demands time and skilled human resources (such as historians and paleographers). In a real-life scenario, obtaining sample images for all possible words is also not feasible. However, zero-shot learning methods could aptly be used to recognize unseen/out-of-lexicon words in such historical document images. Building on Pho(SC)Net, the previous state-of-the-art method for zero-shot word recognition, we propose a hybrid model (Pho(SC)-CTC) that takes advantage of the rich features learned by Pho(SC)Net, followed by a connectionist temporal classification (CTC) framework that performs the final classification. Encouraging results were obtained on two publicly available historical document datasets and one synthetic handwritten dataset, which supports the efficacy of Pho(SC)-CTC and Pho(SC)Net.
Submitted 21 December, 2022; v1 submitted 31 May, 2021;
originally announced May 2021.
-
LANNS: A Web-Scale Approximate Nearest Neighbor Lookup System
Authors:
Ishita Doshi,
Dhritiman Das,
Ashish Bhutani,
Rajeev Kumar,
Rushi Bhatt,
Niranjan Balasubramanian
Abstract:
Nearest neighbor search (NNS) has a wide range of applications in information retrieval, computer vision, machine learning, databases, and other areas. The existing state-of-the-art algorithm for nearest neighbor search, Hierarchical Navigable Small World Networks (HNSW), is unable to scale to large datasets of 100M records in high dimensions. In this paper, we propose LANNS, an end-to-end platform for Approximate Nearest Neighbor Search that scales to web-scale datasets. Library for Large Scale Approximate Nearest Neighbor Search (LANNS) is deployed in multiple production systems for identifying topK (100 ≤ topK ≤ 200) approximate nearest neighbors with a latency of a few milliseconds per query and a high throughput of 2.5k Queries Per Second (QPS) on a single node, on large (~180M data points), high dimensional (50-2048 dimensional) datasets.
Submitted 19 October, 2020;
originally announced October 2020.
-
Efficient Hierarchical Clustering for Classification and Anomaly Detection
Authors:
Ishita Doshi,
Sreekalyan Sajjalla,
Jayesh Choudhari,
Rushi Bhatt,
Anirban Dasgupta
Abstract:
We address the problem of large-scale real-time classification of content posted on social networks, along with the need to rapidly identify novel spam types. Obtaining manual labels for user-generated content through editorial labeling and taxonomy development lags behind the rate at which new content types need to be classified. We propose a class of hierarchical clustering algorithms that can be used both for efficient and scalable real-time multiclass classification and for detecting new anomalies in user-generated content. Our methods have low query time and linear space usage, and come with theoretical guarantees with respect to a specific hierarchical clustering cost function (Dasgupta, 2016). We compare our solutions against a range of classification techniques and demonstrate excellent empirical performance.
Submitted 25 August, 2020;
originally announced August 2020.
-
Personalizing Smartwatch Based Activity Recognition Using Transfer Learning
Authors:
Karanpreet Singh,
Rajen Bhatt
Abstract:
Smartwatches are increasingly being used to recognize human daily life activities. These devices may employ different kinds of machine learning (ML) solutions. One such ML model is the Gradient Boosting Machine (GBM), which has shown excellent performance in the literature. The GBM can be trained on an available data set before it is deployed on any device. However, this data set may not represent every kind of human behavior in real life. For example, an ML model that detects running activity may give different results for older and younger persons because of differences in their activity patterns. This may result in a decrease in the accuracy of activity recognition. Therefore, a transfer learning based method is proposed in which user-specific performance can be improved significantly through on-device calibration of the GBM, tuning only its parameters without retraining its estimators. Results show that this method can significantly improve the user-based accuracy for activity recognition.
Submitted 3 September, 2019;
originally announced September 2019.
-
Competence building framework requirements for information technology for educational management
Authors:
Rakesh Mohan Bhatt
Abstract:
Progressive efforts have been evolving continuously for the betterment of the services of Information Technology for Educational Management (ITEM). These services require data-intensive and communication-intensive applications. Due to the massive growth of information, managing these services becomes difficult. Here the role of the Information and Communication Technology (ICT) infrastructure, particularly the data centre with its communication components, becomes important in facilitating these services. The present paper discusses the related issues, such as competent staff, appropriate ICT infrastructure, and ICT acceptance level, required for an ITEM competence building framework, considering the earlier approach to core competences for ITEM. In this connection, it is also necessary to consider the procurement of standard and appropriate ICT facilities. This will help in the integration of these facilities for future expansion. It will also make it possible to create and foresee the impact of pairing management with the information, technology, and education components, individually and combined. These efforts will establish a strong coupling between the ITEM activities and resource management for effective implementation of the framework.
Submitted 12 January, 2016;
originally announced October 2016.
-
Real-Time Bid Optimization for Group-Buying Ads
Authors:
Raju Balakrishnan,
Rushi P Bhatt
Abstract:
Group-buying ads seeking a minimum number of customers before the deal expiry are increasingly used by daily-deal providers. Unlike traditional web ads, the advertiser's profits for group-buying ads depend on the time to expiry and the additional customers needed to satisfy the minimum group size. Since both these quantities are time-dependent, the optimal bid amounts to maximize profits change with every impression. Consequently, traditional static bidding strategies are far from optimal. Instead, bid values need to be optimized in real time to maximize expected bidder profits. This online optimization of deal profits is made possible by the advent of ad exchanges offering real-time (spot) bidding. To this end, we propose a real-time bidding strategy for group-buying deals based on the online optimization of bid values. We derive the expected bidder profit of deals as a function of the bid amounts, and dynamically vary bids to maximize profits. Further, to satisfy the time constraints of online bidding, we present methods of minimizing computation timings. Subsequently, we derive real-time ad selection, admissibility, and real-time bidding of traditional ads as special cases of the proposed method. We evaluate the proposed bidding, selection, and admission strategies on a multi-million click stream of 935 ads. The proposed real-time bidding, selection, and admissibility show significant profit increases over the existing strategies. Further, the experiments illustrate the robustness of the bidding and acceptable computation timings.
Submitted 3 June, 2012;
originally announced June 2012.
-
Adjacency Matrix Based Energy Efficient Scheduling using S-MAC Protocol in Wireless Sensor Networks
Authors:
Shweta Singh,
Ravindara Bhatt
Abstract:
Communication is the main purpose of any network, whether it is a Wireless Sensor Network, Ad-Hoc Network, Mobile Network, Wired Network, Local Area Network, Metropolitan Area Network, Wireless Area Network, etc.; hence it must be energy efficient. The main parameters for energy efficient communication are maximizing network lifetime, saving energy at the different nodes, sending packets with minimum time delay, higher throughput, etc. This paper focuses mainly on energy efficient communication with the help of an adjacency matrix in Wireless Sensor Networks. Energy efficient scheduling can be achieved by putting idle nodes into sleep mode, so that energy at the idle nodes can be saved. The proposed model first forms the adjacency matrix, and the controller node broadcasts information about the total number of existing nodes, with their depths, to the other nodes in the same cluster. Once every node has received the information about the other nodes in the same cluster, the nodes communicate based on the shortest depths and schedule the idle nodes into sleep mode for a specific time threshold, so that energy at the idle nodes can be saved.
Submitted 4 April, 2012;
originally announced April 2012.
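The abstract above mentions an adjacency matrix and node depths relative to a controller node. A minimal sketch of how such depths could be derived is given below. It assumes depth means hop count from the controller and uses breadth-first search. The abstract does not specify the paper's actual procedure, and the function name `depths_from_controller` is hypothetical.

```python
from collections import deque


def depths_from_controller(adjacency, controller=0):
    """Hop-count depth of every node from the controller, via breadth-first search.

    adjacency: square 0/1 matrix (list of lists); adjacency[i][j] == 1 means
    nodes i and j can communicate directly. Unreachable nodes get depth None.
    """
    num_nodes = len(adjacency)
    depths = [None] * num_nodes
    depths[controller] = 0
    queue = deque([controller])
    while queue:
        node = queue.popleft()
        for neighbor in range(num_nodes):
            if adjacency[node][neighbor] and depths[neighbor] is None:
                depths[neighbor] = depths[node] + 1
                queue.append(neighbor)
    return depths


if __name__ == "__main__":
    # Illustrative 4-node cluster: links 0-1, 1-2, and 0-3.
    cluster = [
        [0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 0],
        [1, 0, 0, 0],
    ]
    print(depths_from_controller(cluster))
```

A scheduler could then use these depths to decide which idle nodes to put into sleep mode, but that policy is not modeled here.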