Showing 1–30 of 30 results for author: Jana, A

Searching in archive cs.
  1. arXiv:2505.21689  [pdf, other]

    cs.CL cs.AI cs.LG

    LLMPR: A Novel LLM-Driven Transfer Learning based Petition Ranking Model

    Authors: Avijit Gayen, Somyajit Chakraborty, Mainak Sen, Soham Paul, Angshuman Jana

    Abstract: The persistent accumulation of unresolved legal cases, especially within the Indian judiciary, significantly hampers the timely delivery of justice. Manual methods of prioritizing petitions are often prone to inefficiencies and subjective biases, further exacerbating delays. To address this issue, we propose LLMPR (Large Language Model-based Petition Ranking), an automated framework that utilizes t…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 28 pages, 5 figures, journal paper, submitted to AI and Law

  2. arXiv:2504.17056  [pdf]

    cs.CY

    Evaluating energy inefficiency in energy-poor households in India: A frontier analysis approach

    Authors: Vallary Gupta, Ahana Sarkar, Chirag Deb, Arnab Jana

    Abstract: Energy-poor households often compromise their thermal comfort and refrain from operating mechanical cooling devices to avoid high electricity bills. This is compounded by certain behavioral practices like retention of older, less efficient appliances, resulting in missed energy savings. Thus, the need to enhance efficiency becomes critical in these households. However, due to a lack of comprehensi…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: 42 pages, 7 figures, 5 tables. Arnab Jana led and supervised the study. Vallary Gupta analyzed the dataset, executed the SFA model, prepared graphics and wrote the manuscript. Dr. Ahana Sarkar coordinated the data collection, interpretation of model results and design of policy implications. Dr. Chirag Deb provided technical support. All authors reviewed and approved the final manuscript.

  3. arXiv:2504.16276  [pdf, other]

    cs.LG cs.AI cs.CV cs.SD

    An Automated Pipeline for Few-Shot Bird Call Classification: A Case Study with the Tooth-Billed Pigeon

    Authors: Abhishek Jana, Moeumu Uili, James Atherton, Mark O'Brien, Joe Wood, Leandra Brickson

    Abstract: This paper presents an automated one-shot bird call classification pipeline designed for rare species absent from large publicly available classifiers like BirdNET and Perch. While these models excel at detecting common birds with abundant training data, they lack options for species with only 1-3 known recordings, a critical limitation for conservationists monitoring the last remaining individuals…

    Submitted 2 May, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

    Comments: 16 pages, 5 figures, 4 tables

  4. arXiv:2503.02032  [pdf, other]

    cs.CL cs.AI cs.CV

    Comparative Analysis of OpenAI GPT-4o and DeepSeek R1 for Scientific Text Categorization Using Prompt Engineering

    Authors: Aniruddha Maiti, Samuel Adewumi, Temesgen Alemayehu Tikure, Zichun Wang, Niladri Sengupta, Anastasiia Sukhanova, Ananya Jana

    Abstract: This study examines how large language models categorize sentences from scientific papers using prompt engineering. We use two advanced web-based models, GPT-4o (by OpenAI) and DeepSeek R1, to classify sentences into predefined relationship categories. DeepSeek R1 has been tested on benchmark datasets in its technical report. However, its performance in scientific text categorization remains unexp…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: Accepted to ASEE North Central Section 2025

  5. arXiv:2502.20508  [pdf, other]

    cs.CL cs.AI

    TripCraft: A Benchmark for Spatio-Temporally Fine Grained Travel Planning

    Authors: Soumyabrata Chaudhuri, Pranav Purkar, Ritwik Raghav, Shubhojit Mallick, Manish Gupta, Abhik Jana, Shreya Ghosh

    Abstract: Recent advancements in probing Large Language Models (LLMs) have explored their latent potential as personalized travel planning agents, yet existing benchmarks remain limited in real-world applicability. Existing datasets, such as TravelPlanner and TravelPlanner+, suffer from semi-synthetic data reliance, spatial inconsistencies, and a lack of key travel constraints, making them inadequate for pr…

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: 27 pages, 18 Tables and 6 Figures

  6. arXiv:2502.19515  [pdf, other]

    cs.CV

    Evaluating the Suitability of Different Intraoral Scan Resolutions for Deep Learning-Based Tooth Segmentation

    Authors: Daron Weekley, Jace Duckworth, Anastasiia Sukhanova, Ananya Jana

    Abstract: Intraoral scans are widely used in digital dentistry for tasks such as dental restoration, treatment planning, and orthodontic procedures. These scans contain detailed topological information, but manual annotation of these scans remains a time-consuming task. Deep learning-based methods have been developed to automate tasks such as tooth segmentation. A typical intraoral scan contains over 200,00…

    Submitted 26 February, 2025; originally announced February 2025.

    Comments: accepted to 2025 ASEE North Central Section Annual Conference

  7. arXiv:2411.04557  [pdf, other]

    cs.CL cs.LG

    Pruning Literals for Highly Efficient Explainability at Word Level

    Authors: Rohan Kumar Yadav, Bimal Bhattarai, Abhik Jana, Lei Jiao, Seid Muhie Yimam

    Abstract: Designing an explainable model is now crucial for Natural Language Processing (NLP), since most state-of-the-art machine learning models provide only a limited explanation for their predictions. In the spectrum of explainable models, the Tsetlin Machine (TM) is promising because of its capability of providing word-level explanations using propositional logic. However, concern rises over the elaborated…

    Submitted 7 November, 2024; originally announced November 2024.

    Comments: 8 pages, 3 figures

    Journal ref: 2024 International Symposium on the Tsetlin Machine (ISTM)

  8. arXiv:2410.01400  [pdf, other]

    cs.CL

    CrowdCounter: A benchmark type-specific multi-target counterspeech dataset

    Authors: Punyajoy Saha, Abhilash Datta, Abhik Jana, Animesh Mukherjee

    Abstract: Counterspeech presents a viable alternative to banning or suspending users for hate speech while upholding freedom of expression. However, writing effective counterspeech is challenging for moderators/users. Hence, developing suggestion tools for writing counterspeech is the need of the hour. One critical challenge in developing such a tool is the lack of quality and diversity of the responses in…

    Submitted 2 October, 2024; originally announced October 2024.

    Comments: 19 pages, 1 figure, 14 tables, Code available https://github.com/hate-alert/CrowdCounter

  9. arXiv:2403.14938  [pdf, ps, other]

    cs.CL

    On Zero-Shot Counterspeech Generation by LLMs

    Authors: Punyajoy Saha, Aalok Agrawal, Abhik Jana, Chris Biemann, Animesh Mukherjee

    Abstract: With the emergence of numerous Large Language Models (LLMs), the usage of such models in various Natural Language Processing (NLP) applications is increasing extensively. Counterspeech generation is one such key task where efforts are made to develop generative models by fine-tuning LLMs with hatespeech-counterspeech pairs, but none of these attempts explores the intrinsic properties of large lan…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 12 pages, 7 tables, accepted at LREC-COLING 2024

  10. arXiv:2305.00244  [pdf, other]

    cs.CV cs.LG

    A Critical Analysis of the Limitation of Deep Learning based 3D Dental Mesh Segmentation Methods in Segmenting Partial Scans

    Authors: Ananya Jana, Aniruddha Maiti, Dimitris N. Metaxas

    Abstract: Tooth segmentation from intraoral scans is a crucial part of digital dentistry. Many Deep Learning based tooth segmentation algorithms have been developed for this task. In most cases, high accuracy has been achieved, although most of the available tooth segmentation techniques make an implicit restrictive assumption of a full jaw model and report accuracy based on full jaw models. Medi…

    Submitted 29 April, 2023; originally announced May 2023.

    Comments: accepted to IEEE EMBC 2023

  11. arXiv:2302.12039  [pdf, other]

    cs.CL cs.AI

    Natural Language Processing in the Legal Domain

    Authors: Daniel Martin Katz, Dirk Hartung, Lauritz Gerlach, Abhik Jana, Michael J. Bommarito II

    Abstract: In this paper, we summarize the current state of the field of NLP & Law with a specific focus on recent technical and substantive developments. To support our analysis, we construct and analyze a nearly complete corpus of more than six hundred NLP & Law related papers published over the past decade. Our analysis highlights several major trends. Namely, we document an increasing number of papers wr…

    Submitted 23 February, 2023; originally announced February 2023.

    Comments: 13 pages, 7 figures, 2 tables, online source and data

  12. arXiv:2301.10531  [pdf, other]

    cs.CV cs.AI

    3D Tooth Mesh Segmentation with Simplified Mesh Cell Representation

    Authors: Ananya Jana, Hrebesh Molly Subhash, Dimitris N. Metaxas

    Abstract: Manual tooth segmentation of 3D tooth meshes is tedious and there are variations among dentists. Several deep learning based methods have been proposed to perform automatic tooth mesh segmentation. Many of the proposed tooth mesh segmentation algorithms summarize the mesh cell as the cell center or barycenter, the normal at barycenter…

    Submitted 25 January, 2023; originally announced January 2023.

    Comments: accepted at IEEE ISBI 2023 International Symposium on Biomedical Imaging

  13. arXiv:2209.08132  [pdf, other]

    cs.CV

    Automatic Tooth Segmentation from 3D Dental Model using Deep Learning: A Quantitative Analysis of what can be learnt from a Single 3D Dental Model

    Authors: Ananya Jana, Hrebesh Molly Subhash, Dimitris Metaxas

    Abstract: 3D tooth segmentation is an important task for digital orthodontics. Several Deep Learning methods have been proposed for automatic tooth segmentation from 3D dental models or intraoral scans. These methods require annotated 3D intraoral scans. Manually annotating 3D intraoral scans is a laborious task. One approach is to devise self-supervision methods to reduce the manual labeling effort. Compar…

    Submitted 16 September, 2022; originally announced September 2022.

    Comments: accepted to SIPAIM 2022

  14. arXiv:2110.00976  [pdf, other]

    cs.CL

    LexGLUE: A Benchmark Dataset for Legal Language Understanding in English

    Authors: Ilias Chalkidis, Abhik Jana, Dirk Hartung, Michael Bommarito, Ion Androutsopoulos, Daniel Martin Katz, Nikolaos Aletras

    Abstract: Laws and their interpretations, legal arguments and agreements are typically expressed in writing, leading to the production of vast corpora of legal text. Their analysis, which is at the center of legal practice, becomes increasingly elaborate as these collections grow in size. Natural language understanding (NLU) technologies can be a valuable tool to support legal practitioners in these endeav…

    Submitted 8 November, 2022; v1 submitted 3 October, 2021; originally announced October 2021.

    Comments: 9 pages, long paper at ACL 2022 proceedings. LexGLUE benchmark is available at: https://huggingface.co/datasets/lex_glue. Code is available at: https://github.com/coastalcph/lex-glue. Update TFIDF-SVM scores in the last version

  15. arXiv:2109.05087  [pdf, other]

    cs.LG cs.AI

    Global and Local Interpretation of black-box Machine Learning models to determine prognostic factors from early COVID-19 data

    Authors: Ananya Jana, Carlos D. Minacapelli, Vinod Rustgi, Dimitris Metaxas

    Abstract: The COVID-19 coronavirus has claimed 4.1 million lives, as of July 24, 2021. A variety of machine learning models have been applied to related data to predict important factors such as the severity of the disease and infection rate, and to discover important prognostic factors. Often the usefulness of the findings from the use of these techniques is reduced due to lack of method interpretability. Some r…

    Submitted 10 September, 2021; originally announced September 2021.

    Comments: accepted by SIPAIM 2021, code repository: https://github.com/ananyajana/interpretablecovid19

  16. arXiv:2103.03761  [pdf, other]

    eess.IV cs.CV

    Liver Fibrosis and NAS scoring from CT images using self-supervised learning and texture encoding

    Authors: Ananya Jana, Hui Qu, Carlos D. Minacapelli, Carolyn Catalano, Vinod Rustgi, Dimitris Metaxas

    Abstract: Non-alcoholic fatty liver disease (NAFLD) is one of the most common causes of chronic liver diseases (CLD) which can progress to liver cancer. The severity and treatment of NAFLD are determined by NAFLD Activity Scores (NAS) and liver fibrosis stage, which are usually obtained from liver biopsy. However, biopsy is invasive in nature and involves risk of procedural complications. Current methods to p…

    Submitted 15 March, 2021; v1 submitted 5 March, 2021; originally announced March 2021.

    Comments: 5 pages, 2 figures, accepted at ISBI 2021, code at this URL: https://github.com/ananyajana/fibrosis_code

  17. arXiv:2009.10687  [pdf, other]

    eess.IV cs.CV

    Deep Learning based NAS Score and Fibrosis Stage Prediction from CT and Pathology Data

    Authors: Ananya Jana, Hui Qu, Puru Rattan, Carlos D. Minacapelli, Vinod Rustgi, Dimitris Metaxas

    Abstract: Non-Alcoholic Fatty Liver Disease (NAFLD) is becoming increasingly prevalent in the world population. Without diagnosis at the right time, NAFLD can lead to non-alcoholic steatohepatitis (NASH) and subsequent liver damage. The diagnosis and treatment of NAFLD depend on the NAFLD activity score (NAS) and the liver fibrosis stage, which are usually evaluated from liver biopsies by pathologists. In t…

    Submitted 22 September, 2020; originally announced September 2020.

    Comments: 6 pages, 3 figures. Accepted in IEEE BIBE 2020

  18. Neural Fuzzy Extractors: A Secure Way to Use Artificial Neural Networks for Biometric User Authentication

    Authors: Abhishek Jana, Bipin Paudel, Md Kamruzzaman Sarker, Monireh Ebrahimi, Pascal Hitzler, George T Amariucai

    Abstract: Powered by new advances in sensor development and artificial intelligence, the decreasing cost of computation, and the pervasiveness of handheld computation devices, biometric user authentication (and identification) is rapidly becoming ubiquitous. Modern approaches to biometric authentication, based on sophisticated machine learning techniques, cannot avoid storing either trained-classifier detai…

    Submitted 18 December, 2023; v1 submitted 18 March, 2020; originally announced March 2020.

    Comments: 8 pages, 5 figures

    Journal ref: Proceedings on Privacy Enhancing Technologies, 2022, volume 4, pages 86-104

  19. arXiv:2002.11506  [pdf, other]

    cs.CL

    Using Distributional Thesaurus Embedding for Co-hyponymy Detection

    Authors: Abhik Jana, Nikhil Reddy Varimalla, Pawan Goyal

    Abstract: Discriminating lexical relations among distributionally similar words has always been a challenge for the natural language processing (NLP) community. In this paper, we investigate whether the network embedding of a distributional thesaurus can be effectively utilized to detect co-hyponymy relations. By extensive experiments over three benchmark datasets, we show that the vector representation obtained…

    Submitted 24 February, 2020; originally announced February 2020.

    Comments: Accepted in LREC 2020. arXiv admin note: text overlap with arXiv:1802.04609

  20. arXiv:1909.09774  [pdf]

    cs.CY

    LULC classification methodology based on simple Convolutional Neural Network to map complex urban forms at finer scale: Evidence from Mumbai

    Authors: Deepank Verma, Arnab Jana

    Abstract: The satellite imagery classification task is fundamental to spatial knowledge discovery. Several image classification methods are used to create standardized Land use and Land cover (LULC) maps, which facilitate research on spatial and ecological processes and human activities. Local Climate Zones (LCZ) classification maps are an example of standardized maps which have been widely used to demarcat…

    Submitted 1 May, 2020; v1 submitted 21 September, 2019; originally announced September 2019.

    Comments: 28 pages, 9 figures

  21. arXiv:1909.00160  [pdf, other]

    cs.CL cs.AI cs.LG

    Incorporating Domain Knowledge into Medical NLI using Knowledge Graphs

    Authors: Soumya Sharma, Bishal Santra, Abhik Jana, T. Y. S. S. Santosh, Niloy Ganguly, Pawan Goyal

    Abstract: Recently, biomedical versions of embeddings obtained from language models such as BioELMo have shown state-of-the-art results for the textual inference task in the medical domain. In this paper, we explore how to incorporate structured domain knowledge, available in the form of a knowledge graph (UMLS), for the Medical NLI task. Specifically, we experiment with fusing embeddings obtained from knowl…

    Submitted 31 August, 2019; originally announced September 2019.

    Comments: EMNLP 2019 accepted short paper

  22. arXiv:1906.03007  [pdf, ps, other]

    cs.CL

    On the Compositionality Prediction of Noun Phrases using Poincaré Embeddings

    Authors: Abhik Jana, Dmitry Puzyrev, Alexander Panchenko, Pawan Goyal, Chris Biemann, Animesh Mukherjee

    Abstract: The compositionality degree of multiword expressions indicates to what extent the meaning of a phrase can be derived from the meaning of its constituents and their grammatical relations. Prediction of (non)-compositionality is a task that has been frequently addressed with distributional semantic models. We introduce a novel technique to blend hierarchical information with distributional informati…

    Submitted 7 June, 2019; originally announced June 2019.

    Comments: Accepted in ACL 2019 [Long Paper]

  23. arXiv:1812.05936  [pdf, other]

    cs.CL

    Detecting Reliable Novel Word Senses: A Network-Centric Approach

    Authors: Abhik Jana, Animesh Mukherjee, Pawan Goyal

    Abstract: In this era of Big Data, due to the expeditious exchange of information on the web, words are being used to denote newer meanings, causing linguistic shift. With the recent availability of large amounts of digitized texts, an automated analysis of the evolution of language has become possible. Our study mainly focuses on improving the detection of new word senses. This paper presents a unique proposal…

    Submitted 14 December, 2018; originally announced December 2018.

  24. arXiv:1806.04092  [pdf, other]

    cs.CL

    WikiRef: Wikilinks as a route to recommending appropriate references for scientific Wikipedia pages

    Authors: Abhik Jana, Pranjal Kanojiya, Pawan Goyal, Animesh Mukherjee

    Abstract: The exponential increase in the usage of Wikipedia as a key source of scientific knowledge among researchers is making it absolutely necessary to metamorphose this knowledge repository into an integral and self-contained source of information for direct utilization. Unfortunately, the references which support the content of each Wikipedia entity page are far from complete. Why are the referen…

    Submitted 15 June, 2018; v1 submitted 11 June, 2018; originally announced June 2018.

  25. arXiv:1802.06196  [pdf, other]

    cs.CL

    Can Network Embedding of Distributional Thesaurus be Combined with Word Vectors for Better Representation?

    Authors: Abhik Jana, Pawan Goyal

    Abstract: Distributed representations of words learned from text have proved to be successful in various natural language processing tasks in recent times. While some methods represent words as vectors computed from text using a predictive model (Word2vec) or a dense count-based model (GloVe), others attempt to represent these in a distributional thesaurus network structure where the neighborhood of a word is a…

    Submitted 17 February, 2018; originally announced February 2018.

  26. arXiv:1802.04609  [pdf, other]

    cs.CL

    Network Features Based Co-hyponymy Detection

    Authors: Abhik Jana, Pawan Goyal

    Abstract: Distinguishing lexical relations has been a long-term pursuit in the natural language processing (NLP) domain. Recently, in order to detect lexical relations like hypernymy, meronymy, co-hyponymy etc., distributional semantic models are being used extensively in some form or the other. Even though a lot of effort has been made to detect the hypernymy relation, the problem of co-hyponymy detection ha…

    Submitted 13 February, 2018; originally announced February 2018.

  27. arXiv:1710.05246  [pdf]

    q-bio.NC cs.DC

    Shared High Value Research Resources: The CamCAN Human Lifespan Neuroimaging Dataset Processed on the Open Science Grid

    Authors: Don Krieger, Paul Shepard, Ben Zusman, Anirban Jana, David O. Okonkwo

    Abstract: The CamCAN Lifespan Neuroimaging Dataset, Cambridge (UK) Centre for Ageing and Neuroscience, was acquired and processed beginning in December, 2016. The referee consensus solver deployed to the Open Science Grid was used for this task. The dataset includes demographic and screening measures, a high-resolution MRI scan of the brain, and whole-head magnetoencephalographic (MEG) recordings during eye…

    Submitted 8 December, 2017; v1 submitted 14 October, 2017; originally announced October 2017.

    Comments: 8 pages, 7 figures; Proceedings of the 2017 IEEE International Conference on Bioinformatics and Biomedicine; Keynote to The International Workshop on High Throughput Computing in Bioinformatics and Biomedicine using the Open Science Grid

  28. arXiv:1705.03264  [pdf, other]

    cs.IR

    WikiM: Metapaths based Wikification of Scientific Abstracts

    Authors: Abhik Jana, Sruthi Mooriyath, Animesh Mukherjee, Pawan Goyal

    Abstract: In order to disseminate the exponential extent of knowledge being produced in the form of scientific publications, it would be best to design mechanisms that connect it with an already existing rich repository of concepts: Wikipedia. Not only does it make scientific reading simple and easy (by connecting the involved concepts used in the scientific articles to their Wikipedia explanations) but…

    Submitted 9 May, 2017; originally announced May 2017.

  29. arXiv:1608.05368  [pdf, ps, other]

    cs.PL cs.LO

    Scaling Bounded Model Checking By Transforming Programs With Arrays

    Authors: Anushri Jana, Uday P. Khedker, Advaita Datar, R Venkatesh, C Niyas

    Abstract: Bounded Model Checking is one of the most successful techniques for finding bugs in programs. However, for programs with loops iterating over large-sized arrays, bounded model checkers often exceed the limit of resources available to them. We present a transformation that enables bounded model checkers to verify a certain class of array properties. Our technique transforms an array-manipulating progra…

    Submitted 17 August, 2016; originally announced August 2016.

    Comments: Pre-proceedings paper presented at the 26th International Symposium on Logic-Based Program Synthesis and Transformation (LOPSTR 2016), Edinburgh, Scotland UK, 6-8 September 2016 (arXiv:1608.02534)

    Report number: LOPSTR/2016/23

  30. arXiv:1606.06974  [pdf, ps, other]

    cs.LO

    Scaling Bounded Model Checking By Transforming Programs With Arrays

    Authors: Anushri Jana, Uday P. Khedker, Advaita Datar, R Venkatesh, C Niyas

    Abstract: Bounded Model Checking is one of the most successful techniques for finding bugs in programs. However, model checkers are resource hungry and are often unable to verify programs with loops iterating over large arrays. We present a transformation that enables bounded model checkers to verify a certain class of array properties. Our technique transforms an array-manipulating (ANSI-C) program to an array-…

    Submitted 7 March, 2017; v1 submitted 22 June, 2016; originally announced June 2016.