
Showing 1–50 of 178 results for author: Sharma, D

Searching in archive cs.
  1. arXiv:2505.08910  [pdf, ps, other]

    cs.CV cs.CL

    Behind Maya: Building a Multilingual Vision Language Model

    Authors: Nahid Alam, Karthik Reddy Kanjula, Surya Guthikonda, Timothy Chung, Bala Krishna S Vegesna, Abhipsha Das, Anthony Susevski, Ryan Sze-Yin Chan, S M Iftekhar Uddin, Shayekh Bin Islam, Roshan Santhosh, Snegha A, Drishti Sharma, Chen Liu, Isha Chaturvedi, Genta Indra Winata, Ashvanth. S, Snehanshu Mukherjee, Alham Fikri Aji

    Abstract: In recent times, we have seen a rapid development of large Vision-Language Models (VLMs). They have shown impressive results on academic benchmarks, primarily in widely spoken languages but lack performance on low-resource languages and varied cultural contexts. To address these limitations, we introduce Maya, an open-source Multilingual VLM. Our contributions are: 1) a multilingual image-text pre…

    Submitted 15 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

    Comments: Accepted at VLMs4ALL CVPR 2025 Workshop; corrected workshop name spelling

  2. arXiv:2505.06151  [pdf, ps, other]

    cs.CL

    Estimating Quality in Therapeutic Conversations: A Multi-Dimensional Natural Language Processing Framework

    Authors: Alice Rueda, Argyrios Perivolaris, Niloy Roy, Dylan Weston, Sarmed Shaya, Zachary Cote, Martin Ivanov, Bazen G. Teferra, Yuqi Wu, Sirisha Rambhatla, Divya Sharma, Andrew Greenshaw, Rakesh Jetly, Yanbo Zhang, Bo Cao, Reza Samavi, Sridhar Krishnan, Venkat Bhat

    Abstract: Engagement between client and therapist is a critical determinant of therapeutic success. We propose a multi-dimensional natural language processing (NLP) framework that objectively classifies engagement quality in counseling sessions based on textual transcripts. Using 253 motivational interviewing transcripts (150 high-quality, 103 low-quality), we extracted 42 features across four domains: conv…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: 12 pages, 4 figures, 7 tables

  3. arXiv:2505.01482  [pdf, other]

    cs.AI

    Understanding LLM Scientific Reasoning through Promptings and Model's Explanation on the Answers

    Authors: Alice Rueda, Mohammed S. Hassan, Argyrios Perivolaris, Bazen G. Teferra, Reza Samavi, Sirisha Rambhatla, Yuqi Wu, Yanbo Zhang, Bo Cao, Divya Sharma, Sridhar Krishnan, Venkat Bhat

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding, reasoning, and problem-solving across various domains. However, their ability to perform complex, multi-step reasoning tasks, essential for applications in science, medicine, and law, remains an area of active investigation. This paper examines the reasoning capabilities of contemporary LLMs, ana…

    Submitted 2 May, 2025; originally announced May 2025.

  4. arXiv:2504.13863  [pdf]

    cs.HC

    Utsarjan: A smartphone App for providing kidney care and real-time assistance to children with nephrotic syndrome

    Authors: Snigdha Tiwari, Sahil Sharma, Arvind Bagga, Aditi Sinha, Deepak Sharma

    Abstract: Background: Telemedicine has the potential to provide secure and cost-effective healthcare at the touch of a button. Nephrotic syndrome is a chronic childhood illness involving frequent relapses and demands long/complex treatment. Hence, developing a remote means of doctor-patient interface will ensure the provision of quality healthcare to patients. Methods: The Utsarjan mobile App framework was bu…

    Submitted 26 March, 2025; originally announced April 2025.

    Comments: 16 pages, 3 figures

  5. arXiv:2504.11952  [pdf, other]

    cs.CL cs.AI cs.LG

    Robust and Fine-Grained Detection of AI Generated Texts

    Authors: Ram Mohan Rao Kadiyala, Siddartha Pullakhandam, Kanwal Mehreen, Drishti Sharma, Siddhant Gupta, Jebish Purbey, Ashay Srivastava, Subhasya TippaReddy, Arvind Reddy Bobbili, Suraj Telugara Chandrashekhar, Modabbir Adeeb, Srinadh Vura, Hamza Farooq

    Abstract: An ideal detection system for machine generated content is supposed to work well on any generator as many more advanced LLMs come into existence day by day. Existing systems often struggle with accurately identifying AI-generated content over shorter texts. Further, not all texts might be entirely authored by a human or LLM, hence we focused more on partial cases, i.e., human-LLM co-authored texts.…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: ACL 2025 Feb ARR Submission

  6. arXiv:2504.09753  [pdf, other]

    cs.CL cs.AI

    Improving Multilingual Capabilities with Cultural and Local Knowledge in Large Language Models While Enhancing Native Performance

    Authors: Ram Mohan Rao Kadiyala, Siddartha Pullakhandam, Siddhant Gupta, Drishti Sharma, Jebish Purbey, Kanwal Mehreen, Muhammad Arham, Hamza Farooq

    Abstract: Large Language Models (LLMs) have shown remarkable capabilities, but their development has primarily focused on English and other high-resource languages, leaving many languages underserved. We present our latest Hindi-English bilingual LLM Mantra-14B with ~3% average improvement in benchmark scores over both languages, outperforming models twice its size. Using a curated dataset compos…

    Submitted 13 April, 2025; originally announced April 2025.

    Comments: ARR Feb 2025 submission

  7. arXiv:2504.08403  [pdf, other]

    cs.NI

    Optimizing Collaborative UAV Networks for Data Efficiency in IoT Ecosystems

    Authors: Priyavrat Dev Sharma, Ibrahim Sorkhoh, Muthucumaru Maheswaran

    Abstract: Advances in the Internet of Things are revolutionizing data acquisition, enhancing artificial intelligence and quality of service. Unmanned Aerial Vehicles (UAVs) provide an efficient data-gathering solution across varied environments. This paper addresses challenges in integrating UAVs for large scale data operations, including mobility, multi-hop paths, and optimized multi-source information tra…

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: 7 pages, 6 figures. Accepted for presentation at the IEEE ICC Workshop 2025 in Montreal, Canada

  8. arXiv:2504.07072  [pdf, other]

    cs.CL cs.CV

    Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation

    Authors: Israfel Salazar, Manuel Fernández Burda, Shayekh Bin Islam, Arshia Soltani Moakhar, Shivalika Singh, Fabian Farestam, Angelika Romanou, Danylo Boiko, Dipika Khullar, Mike Zhang, Dominik Krzemiński, Jekaterina Novikova, Luísa Shimabucoro, Joseph Marvin Imperial, Rishabh Maheshwary, Sharad Duwal, Alfonso Amayuelas, Swati Rajwal, Jebish Purbey, Ahmed Ruby, Nicholas Popovič, Marek Suppa, Azmine Toushik Wasi, Ram Mohan Rao Kadiyala, Olga Tsymboi, et al. (20 additional authors not shown)

    Abstract: The evaluation of vision-language models (VLMs) has mainly relied on English-language benchmarks, leaving significant gaps in both multilingual and multicultural coverage. While multilingual benchmarks have expanded, both in size and languages, many rely on translations of English datasets, failing to capture cultural nuances. In this work, we propose Kaleidoscope, as the most comprehensive exam b…

    Submitted 29 April, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

    Comments: v2: corrected the author list

  9. arXiv:2504.06622  [pdf, other]

    quant-ph cs.LG

    Quantum neural networks facilitating quantum state classification

    Authors: Diksha Sharma, Vivek Balasaheb Sabale, Thirumalai M., Atul Kumar

    Abstract: The classification of quantum states into distinct classes poses a significant challenge. In this study, we address this problem using quantum neural networks in combination with a problem-inspired circuit and customised as well as predefined ansätze. To facilitate the resource-efficient quantum state classification, we construct the dataset of quantum states using the proposed problem-inspired cir…

    Submitted 9 April, 2025; originally announced April 2025.

  10. arXiv:2503.11972  [pdf, other]

    cs.DC

    MoDM: Efficient Serving for Image Generation via Mixture-of-Diffusion Models

    Authors: Yuchen Xia, Divyam Sharma, Yichao Yuan, Souvik Kundu, Nishil Talati

    Abstract: Diffusion-based text-to-image generation models trade latency for quality: small models are fast but generate lower-quality images, while large models produce better images but are slow. We present MoDM, a novel caching-based serving system for diffusion models that dynamically balances latency and quality through a mixture of diffusion models. Unlike prior approaches that rely on model-specific…

    Submitted 14 March, 2025; originally announced March 2025.

  11. arXiv:2503.11807  [pdf, other]

    cs.CV cs.AI cs.LG

    Mitigating Bad Ground Truth in Supervised Machine Learning based Crop Classification: A Multi-Level Framework with Sentinel-2 Images

    Authors: Sanayya A, Amoolya Shetty, Abhijeet Sharma, Venkatesh Ravichandran, Masthan Wali Gosuvarapalli, Sarthak Jain, Priyamvada Nanjundiah, Ujjal Kr Dutta, Divya Sharma

    Abstract: In agricultural management, precise Ground Truth (GT) data is crucial for accurate Machine Learning (ML) based crop classification. Yet, issues like crop mislabeling and incorrect land identification are common. We propose a multi-level GT cleaning framework while utilizing multi-temporal Sentinel-2 data to address these issues. Specifically, this framework utilizes generating embeddings for farml…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: Accepted in IEEE India Geoscience and Remote Sensing Symposium (InGARSS) 2024

  12. arXiv:2503.04184  [pdf]

    cs.NI cs.AI cs.CL

    Large-Scale AI in Telecom: Charting the Roadmap for Innovation, Scalability, and Enhanced Digital Experiences

    Authors: Adnan Shahid, Adrian Kliks, Ahmed Al-Tahmeesschi, Ahmed Elbakary, Alexandros Nikou, Ali Maatouk, Ali Mokh, Amirreza Kazemi, Antonio De Domenico, Athanasios Karapantelakis, Bo Cheng, Bo Yang, Bohao Wang, Carlo Fischione, Chao Zhang, Chaouki Ben Issaid, Chau Yuen, Chenghui Peng, Chongwen Huang, Christina Chaccour, Christo Kurisummoottil Thomas, Dheeraj Sharma, Dimitris Kalogiros, Dusit Niyato, Eli De Poorter, et al. (110 additional authors not shown)

    Abstract: This white paper discusses the role of large-scale AI in the telecommunications industry, with a specific focus on the potential of generative AI to revolutionize network functions and user experiences, especially in the context of 6G systems. It highlights the development and deployment of Large Telecom Models (LTMs), which are tailored AI models designed to address the complex challenges faced b…

    Submitted 6 March, 2025; originally announced March 2025.

  13. arXiv:2503.03184  [pdf, other]

    stat.ML cs.GT cs.LG

    PAC Learning with Improvements

    Authors: Idan Attias, Avrim Blum, Keziah Naggita, Donya Saless, Dravyansh Sharma, Matthew Walter

    Abstract: One of the most basic lower bounds in machine learning is that in nearly any nontrivial setting, it takes $\textit{at least}$ $1/ε$ samples to learn to error $ε$ (and more, if the classifier being learned is complex). However, suppose that data points are agents who have the ability to improve by a small amount if doing so will allow them to receive a (desired) positive classification. In that cas…

    Submitted 5 March, 2025; originally announced March 2025.

    Comments: 40 pages, 13 figures

  14. arXiv:2502.16255  [pdf]

    eess.SP cs.AI cs.LG

    rECGnition_v2.0: Self-Attentive Canonical Fusion of ECG and Patient Data using deep learning for effective Cardiac Diagnostics

    Authors: Shreya Srivastava, Durgesh Kumar, Ram Jiwari, Sandeep Seth, Deepak Sharma

    Abstract: The variability in ECG readings influenced by individual patient characteristics has posed a considerable challenge to adopting automated ECG analysis in clinical settings. A novel feature fusion technique termed SACC (Self Attentive Canonical Correlation) was proposed to address this. This technique is combined with DPN (Dual Pathway Network) and depth-wise separable convolution to create a robus…

    Submitted 22 February, 2025; originally announced February 2025.

  15. arXiv:2502.14360  [pdf]

    cs.CV

    Weed Detection using Convolutional Neural Network

    Authors: Santosh Kumar Tripathi, Shivendra Pratap Singh, Devansh Sharma, Harshavardhan U Patekar

    Abstract: In this paper we use convolutional neural networks (CNNs) for weed detection in agricultural land. We specifically investigate the application of two CNN layer types, Conv2d and dilated Conv2d, for weed detection in crop fields. The suggested method extracts features from the input photos using pre-trained models, which are subsequently adjusted for weed detection. The findings of the experiment,…

    Submitted 20 February, 2025; originally announced February 2025.

  16. arXiv:2502.14234  [pdf, other]

    cond-mat.mtrl-sci cs.LG

    OBELiX: A Curated Dataset of Crystal Structures and Experimentally Measured Ionic Conductivities for Lithium Solid-State Electrolytes

    Authors: Félix Therrien, Jamal Abou Haibeh, Divya Sharma, Rhiannon Hendley, Alex Hernández-García, Sun Sun, Alain Tchagang, Jiang Su, Samuel Huberman, Yoshua Bengio, Hongyu Guo, Homin Shin

    Abstract: Solid-state electrolyte batteries are expected to replace liquid electrolyte lithium-ion batteries in the near future thanks to their higher theoretical energy density and improved safety. However, their adoption is currently hindered by their lower effective ionic conductivity, a quantity that governs charge and discharge rates. Identifying highly ion-conductive materials using conventional theor…

    Submitted 19 February, 2025; originally announced February 2025.

    Comments: 8 pages, 3 figures and 2 tables

  17. arXiv:2502.12937  [pdf, other]

    cs.LG

    Tuning Algorithmic and Architectural Hyperparameters in Graph-Based Semi-Supervised Learning with Provable Guarantees

    Authors: Ally Yalei Du, Eric Huang, Dravyansh Sharma

    Abstract: Graph-based semi-supervised learning is a powerful paradigm in machine learning for modeling and exploiting the underlying graph structure that captures the relationship between labeled and unlabeled data. A large number of classical as well as modern deep learning based algorithms have been proposed for this problem, often having tunable hyperparameters. We initiate a formal study of tuning algor…

    Submitted 18 February, 2025; originally announced February 2025.

    Comments: 31 pages (11 pages main body), 2 figures

  18. arXiv:2501.13734  [pdf, other]

    cs.LG

    Sample complexity of data-driven tuning of model hyperparameters in neural networks with structured parameter-dependent dual function

    Authors: Maria-Florina Balcan, Anh Tuan Nguyen, Dravyansh Sharma

    Abstract: Modern machine learning algorithms, especially deep learning based techniques, typically involve careful hyperparameter tuning to achieve the best performance. Despite the surge of intense interest in practical techniques like Bayesian optimization and random search based approaches to automating this laborious and compute intensive task, the fundamental learning theoretic complexity of tuning hyp…

    Submitted 29 April, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

    Comments: 57 pages, 4 figures

  19. arXiv:2501.02926  [pdf, other]

    cs.LG

    Offline-to-online hyperparameter transfer for stochastic bandits

    Authors: Dravyansh Sharma, Arun Sai Suggala

    Abstract: Classic algorithms for stochastic bandits typically use hyperparameters that govern their critical properties such as the trade-off between exploration and exploitation. Tuning these hyperparameters is a problem of great practical significance. However, this is a challenging problem and in certain cases is information theoretically impossible. To address this challenge, we consider a practically r…

    Submitted 6 January, 2025; originally announced January 2025.

    Comments: AAAI 2025

  20. arXiv:2412.12552  [pdf, other]

    cs.CV cs.AI

    SAModified: A Foundation Model-Based Zero-Shot Approach for Refining Noisy Land-Use Land-Cover Maps

    Authors: Sparsh Pekhale, Rakshith Sathish, Sathisha Basavaraju, Divya Sharma

    Abstract: Land-use and land cover (LULC) analysis is critical in remote sensing, with wide-ranging applications across diverse fields such as agriculture, utilities, and urban planning. However, automating LULC map generation using machine learning is rendered challenging due to noisy labels. Typically, the ground truths (e.g. ESRI LULC, MapBioMass) have noisy labels that hamper the model's ability to learn…

    Submitted 17 December, 2024; originally announced December 2024.

  21. arXiv:2412.11836  [pdf]

    cs.CV

    UnMA-CapSumT: Unified and Multi-Head Attention-driven Caption Summarization Transformer

    Authors: Dhruv Sharma, Chhavi Dhiman, Dinesh Kumar

    Abstract: Image captioning is the generation of natural language descriptions of images, which has gained immense popularity in the recent past. With this, different deep-learning techniques have been devised for the development of factual and stylized image captioning models. Previous models focused more on the generation of factual and stylized captions separately providing more than one caption for a single…

    Submitted 16 December, 2024; originally announced December 2024.

  22. arXiv:2412.07112  [pdf, other]

    cs.CV cs.CL

    Maya: An Instruction Finetuned Multilingual Multimodal Model

    Authors: Nahid Alam, Karthik Reddy Kanjula, Surya Guthikonda, Timothy Chung, Bala Krishna S Vegesna, Abhipsha Das, Anthony Susevski, Ryan Sze-Yin Chan, S M Iftekhar Uddin, Shayekh Bin Islam, Roshan Santhosh, Snegha A, Drishti Sharma, Chen Liu, Isha Chaturvedi, Genta Indra Winata, Ashvanth. S, Snehanshu Mukherjee, Alham Fikri Aji

    Abstract: The rapid development of large Vision-Language Models (VLMs) has led to impressive results on academic benchmarks, primarily in widely spoken languages. However, significant gaps remain in the ability of current VLMs to handle low-resource languages and varied cultural contexts, largely due to a lack of high-quality, diverse, and safety-vetted data. Consequently, these models often struggle to und…

    Submitted 9 December, 2024; originally announced December 2024.

  23. arXiv:2412.06009  [pdf, other]

    cs.CL cs.IR cs.LG

    1-800-SHARED-TASKS at RegNLP: Lexical Reranking of Semantic Retrieval (LeSeR) for Regulatory Question Answering

    Authors: Jebish Purbey, Drishti Sharma, Siddhant Gupta, Khawaja Murad, Siddartha Pullakhandam, Ram Mohan Rao Kadiyala

    Abstract: This paper presents the system description of our entry for the COLING 2025 RegNLP RIRAG (Regulatory Information Retrieval and Answer Generation) challenge, focusing on leveraging advanced information retrieval and answer generation techniques in regulatory domains. We experimented with a combination of embedding models, including Stella, BGE, CDE, and Mpnet, and leveraged fine-tuning and rerankin…

    Submitted 8 December, 2024; originally announced December 2024.

    Comments: 5 pages, Accepted to RegNLP @ COLING 2025

  24. arXiv:2412.04351  [pdf, ps, other]

    cs.CL cs.AI

    BhashaVerse : Translation Ecosystem for Indian Subcontinent Languages

    Authors: Vandan Mujadia, Dipti Misra Sharma

    Abstract: This paper focuses on developing translation models and related applications for 36 Indian languages, including Assamese, Awadhi, Bengali, Bhojpuri, Braj, Bodo, Dogri, English, Konkani, Gondi, Gujarati, Hindi, Hinglish, Ho, Kannada, Kangri, Kashmiri (Arabic and Devanagari), Khasi, Mizo, Magahi, Maithili, Malayalam, Marathi, Manipuri (Bengali and Meitei), Nepali, Oriya, Punjabi, Sanskrit, Santali,…

    Submitted 2 January, 2025; v1 submitted 5 December, 2024; originally announced December 2024.

  25. arXiv:2412.00549  [pdf, other]

    cs.CL cs.CE cs.LG q-fin.CP

    SeQwen at the Financial Misinformation Detection Challenge Task: Sequential Learning for Claim Verification and Explanation Generation in Financial Domains

    Authors: Jebish Purbey, Siddhant Gupta, Nikhil Manali, Siddartha Pullakhandam, Drishti Sharma, Ashay Srivastava, Ram Mohan Rao Kadiyala

    Abstract: This paper presents the system description of our entry for the COLING 2025 FMD challenge, focusing on misinformation detection in financial domains. We experimented with a combination of large language models, including Qwen, Mistral, and Gemma-2, and leveraged pre-processing and sequential learning for not only identifying fraudulent financial content but also generating coherent, and concise ex…

    Submitted 30 November, 2024; originally announced December 2024.

    Comments: 6 pages, 9 figures, Submitted to FinNLP-FNP-LLMFinLegal @ COLING 2025

  26. arXiv:2411.19799  [pdf, other]

    cs.CL

    INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge

    Authors: Angelika Romanou, Negar Foroutan, Anna Sotnikova, Zeming Chen, Sree Harsha Nelaturu, Shivalika Singh, Rishabh Maheshwary, Micol Altomare, Mohamed A. Haggag, Snegha A, Alfonso Amayuelas, Azril Hafizi Amirudin, Viraat Aryabumi, Danylo Boiko, Michael Chang, Jenny Chim, Gal Cohen, Aditya Kumar Dalmia, Abraham Diress, Sharad Duwal, Daniil Dzenhaliou, Daniel Fernando Erazo Florez, Fabian Farestam, Joseph Marvin Imperial, Shayekh Bin Islam, et al. (34 additional authors not shown)

    Abstract: The performance differential of large language models (LLMs) between languages hinders their effective deployment in many regions, inhibiting the potential economic and societal value of generative AI tools in many communities. However, the development of functional LLMs in many languages (i.e., multilingual LLMs) is bottlenecked by the lack of high-quality evaluation resources in languages other th…

    Submitted 29 November, 2024; originally announced November 2024.

  27. arXiv:2411.06850  [pdf, other]

    cs.CL cs.AI cs.LG

    1-800-SHARED-TASKS @ NLU of Devanagari Script Languages: Detection of Language, Hate Speech, and Targets using LLMs

    Authors: Jebish Purbey, Siddartha Pullakhandam, Kanwal Mehreen, Muhammad Arham, Drishti Sharma, Ashay Srivastava, Ram Mohan Rao Kadiyala

    Abstract: This paper presents a detailed system description of our entry for the CHiPSAL 2025 shared task, focusing on language detection, hate speech identification, and target detection in Devanagari script languages. We experimented with a combination of large language models and their ensembles, including MuRIL, IndicBERT, and Gemma-2, and leveraged unique techniques like focal loss to address challenge…

    Submitted 11 November, 2024; originally announced November 2024.

    Comments: 13 pages, Submitted to CHIPSAL workshop @ COLING 2025

  28. arXiv:2411.02854  [pdf, other]

    cs.AR cs.LG cs.NE

    SpiDR: A Reconfigurable Digital Compute-in-Memory Spiking Neural Network Accelerator for Event-based Perception

    Authors: Deepika Sharma, Shubham Negi, Trishit Dutta, Amogh Agrawal, Kaushik Roy

    Abstract: Spiking Neural Networks (SNNs), with their inherent recurrence, offer an efficient method for processing the asynchronous temporal data generated by Dynamic Vision Sensors (DVS), making them well-suited for event-based vision applications. However, existing SNN accelerators suffer from limitations in adaptability to diverse neuron models, bit precisions and network sizes, inefficient membrane pote…

    Submitted 5 November, 2024; originally announced November 2024.

    Comments: 9 pages, 17 figures

  29. arXiv:2410.19973  [pdf, other]

    eess.IV cs.CV

    Multi-Class Abnormality Classification Task in Video Capsule Endoscopy

    Authors: Dev Rishi Verma, Vibhor Saxena, Dhruv Sharma, Arpan Gupta

    Abstract: In this work for Capsule Vision Challenge 2024, we addressed the challenge of multiclass anomaly classification in video capsule endoscopy (VCE) [1] with a variety of deep learning models, ranging from custom CNNs to advanced transformer architectures. The purpose is to correctly classify diverse gastrointestinal disorders, which is critical for increasing diagnostic efficiency in clinical settings…

    Submitted 3 December, 2024; v1 submitted 25 October, 2024; originally announced October 2024.

    Comments: Submission for Video Capsule Endoscopy Challenge

  30. arXiv:2410.18985  [pdf]

    eess.SP cs.AI cs.LG

    rECGnition_v1.0: Arrhythmia detection using cardiologist-inspired multi-modal architecture incorporating demographic attributes in ECG

    Authors: Shreya Srivastava, Durgesh Kumar, Jatin Bedi, Sandeep Seth, Deepak Sharma

    Abstract: A substantial amount of variability in ECG manifested due to patient characteristics hinders the adoption of automated analysis algorithms in clinical practice. None of the ECG annotators developed to date considers the characteristics of the patients in a multi-modal architecture. We employed the XGBoost model to analyze the UCI Arrhythmia dataset, linking patient characteristics to ECG morpholo…

    Submitted 9 October, 2024; originally announced October 2024.

  31. arXiv:2410.15522  [pdf, other]

    cs.CL cs.AI cs.LG

    M-RewardBench: Evaluating Reward Models in Multilingual Settings

    Authors: Srishti Gureja, Lester James V. Miranda, Shayekh Bin Islam, Rishabh Maheshwary, Drishti Sharma, Gusti Winata, Nathan Lambert, Sebastian Ruder, Sara Hooker, Marzieh Fadaee

    Abstract: Reward models (RMs) have driven the state-of-the-art performance of LLMs today by enabling the integration of human feedback into the language modeling process. However, RMs are primarily trained and evaluated in English, and their capabilities in multilingual settings remain largely understudied. In this work, we conduct a systematic evaluation of several reward models in multilingual settings. W…

    Submitted 28 October, 2024; v1 submitted 20 October, 2024; originally announced October 2024.

    Comments: 16 pages, 6 figures, 10 tables. Website: https://m-rewardbench.github.io/. Updated results with latest models. Added more author information

  32. arXiv:2410.10848  [pdf, other]

    cs.CL cs.AI

    Crafting Narrative Closures: Zero-Shot Learning with SSM Mamba for Short Story Ending Generation

    Authors: Divyam Sharma, Divya Santhanam

    Abstract: Writing stories is an engaging yet challenging endeavor. Often, authors encounter moments of creative block, where the path forward in their narrative becomes obscured. This paper is designed to address such moments by providing an innovative solution: A tool that completes stories based on given prompts. By inputting a short story prompt, users can receive a conclusion to their story, articulated…

    Submitted 4 October, 2024; originally announced October 2024.

    Comments: 9 pages

  33. arXiv:2410.10739  [pdf, other]

    cs.CL

    Balancing Continuous Pre-Training and Instruction Fine-Tuning: Optimizing Instruction-Following in LLMs

    Authors: Ishan Jindal, Chandana Badrinath, Pranjal Bharti, Lakkidi Vinay, Sachin Dev Sharma

    Abstract: Large Language Models (LLMs) for public use require continuous pre-training to remain up-to-date with the latest data. The models also need to be fine-tuned with specific instructions to maintain their ability to follow instructions accurately. Typically, LLMs are released in two versions: the Base LLM, pre-trained on diverse data, and the instruction-refined LLM, additionally trained with specifi…

    Submitted 14 October, 2024; originally announced October 2024.

  34. arXiv:2410.08560  [pdf, other]

    cs.RO

    Enhanced Robot Planning and Perception through Environment Prediction

    Authors: Vishnu Dutt Sharma

    Abstract: Mobile robots rely on maps to navigate through an environment. In the absence of any map, the robots must build the map online from partial observations as they move in the environment. Traditional methods build a map using only direct observations. In contrast, humans identify patterns in the observed environment and make informed guesses about what to expect ahead. Modeling these patterns explic…

    Submitted 11 October, 2024; originally announced October 2024.

    Comments: 289 pages, 81 figures, 16 tables; Dissertation submitted to UMD to fulfill PhD requirement

  35. arXiv:2410.03954  [pdf, other]

    cs.LG cs.AI

    SDA-GRIN for Adaptive Spatial-Temporal Multivariate Time Series Imputation

    Authors: Amir Eskandari, Aman Anand, Drishti Sharma, Farhana Zulkernine

    Abstract: In various applications, the multivariate time series often suffers from missing data. This issue can significantly disrupt systems that rely on the data. Spatial and temporal dependencies can be leveraged to impute the missing samples. Existing imputation methods often ignore dynamic changes in spatial dependencies. We propose a Spatial Dynamic Aware Graph Recurrent Imputation Network (SDA-GRIN)…

    Submitted 5 May, 2025; v1 submitted 4 October, 2024; originally announced October 2024.

  36. arXiv:2410.03066  [pdf, other]

    cs.RO

    Hybrid Classical/RL Local Planner for Ground Robot Navigation

    Authors: Vishnu D. Sharma, Jeongran Lee, Matthew Andrews, Ilija Hadžić

    Abstract: Local planning is an optimization process within a mobile robot navigation stack that searches for the best velocity vector, given the robot and environment state. Depending on how the optimization criteria and constraints are defined, some planners may be better than others in specific situations. We consider two conceptually different planners. The first planner explores the velocity space in re…

    Submitted 3 October, 2024; originally announced October 2024.

  37. arXiv:2409.04367  [pdf, other]

    cs.LG cs.AI stat.ML

    Algorithm Configuration for Structured Pfaffian Settings

    Authors: Maria-Florina Balcan, Anh Tuan Nguyen, Dravyansh Sharma

    Abstract: Data-driven algorithm design automatically adapts algorithms to specific application domains, achieving better performance. In the context of parameterized algorithms, this approach involves tuning the algorithm's hyperparameters using problem instances drawn from the problem distribution of the target application domain. This can be achieved by maximizing empirical utilities that measure the algo…

    Submitted 12 November, 2024; v1 submitted 6 September, 2024; originally announced September 2024.

  38. arXiv:2409.03129  [pdf, other]

    cs.GT cs.LG

    Subsidy design for better social outcomes

    Authors: Maria-Florina Balcan, Matteo Pozzi, Dravyansh Sharma

    Abstract: Overcoming the impact of selfish behavior of rational players in multiagent systems is a fundamental problem in game theory. Without any intervention from a central agent, strategic users take actions in order to maximize their personal utility, which can lead to extremely inefficient overall system performance, often indicated by a high Price of Anarchy. Recent work (Lin et al. 2021) investigated…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 30 pages, 3 figures, 5 tables

  39. arXiv:2408.01877  [pdf, other]

    cs.RO cs.CV

    Improving Zero-Shot ObjectNav with Generative Communication

    Authors: Vishnu Sashank Dorbala, Vishnu Dutt Sharma, Pratap Tokekar, Dinesh Manocha

    Abstract: We propose a new method for improving zero-shot ObjectNav that aims to utilize potentially available environmental percepts for navigational assistance. Our approach takes into account that the ground agent may have limited and sometimes obstructed view. Our formulation encourages Generative Communication (GC) between an assistive overhead agent with a global view containing the target object and…

    Submitted 1 October, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

  40. arXiv:2407.19779  [pdf]

    cs.CL

    Synthesizing Scientific Summaries: An Extractive and Abstractive Approach

    Authors: Grishma Sharma, Aditi Paretkar, Deepak Sharma

    Abstract: The availability of a vast array of research papers in any area of study, necessitates the need of automated summarisation systems that can present the key research conducted and their corresponding findings. Scientific paper summarisation is a challenging task for various reasons including token length limits in modern transformer models and corresponding memory and compute requirements for long…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: The paper consists of 10 pages, 5 figures, and 4 tables

  41. arXiv:2407.18496  [pdf, other]

    cs.CL cs.LG

    Towards More Accurate Prediction of Human Empathy and Emotion in Text and Multi-turn Conversations by Combining Advanced NLP, Transformers-based Networks, and Linguistic Methodologies

    Authors: Manisha Singh, Divy Sharma, Alonso Ma, Nora Goldfine

    Abstract: Based on the WASSA 2022 Shared Task on Empathy Detection and Emotion Classification, we predict the level of empathic concern and personal distress displayed in essays. For the first stage of this project we implemented a Feed-Forward Neural Network using sentence-level embeddings as features. We experimented with four different embedding models for generating the inputs to the neural network. The…

    Submitted 26 July, 2024; originally announced July 2024.

  42. arXiv:2407.18471  [pdf, other]

    cs.CL cs.IR cs.LG

    Constructing the CORD-19 Vaccine Dataset

    Authors: Manisha Singh, Divy Sharma, Alonso Ma, Bridget Tyree, Margaret Mitchell

    Abstract: We introduce new dataset 'CORD-19-Vaccination' to cater to scientists specifically looking into COVID-19 vaccine-related research. This dataset is extracted from CORD-19 dataset [Wang et al., 2020] and augmented with new columns for language detail, author demography, keywords, and topic per paper. Facebook's fastText model is used to identify languages [Joulin et al., 2016]. To establish author d…

    Submitted 25 July, 2024; originally announced July 2024.

  43. arXiv:2407.03172  [pdf, other]

    cs.CV cs.AI stat.AP

    IMC 2024 Methods & Solutions Review

    Authors: Shyam Gupta, Dhanisha Sharma, Songling Huang

    Abstract: For the past three years, Kaggle has been hosting the Image Matching Challenge, which focuses on solving a 3D image reconstruction problem using a collection of 2D images. Each year, this competition fosters the development of innovative and effective methodologies by its participants. In this paper, we introduce an advanced ensemble technique that we developed, achieving a score of 0.153449 on th…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 8 pages, 9 figures

  44. arXiv:2407.02598  [pdf, other]

    cs.CV cs.AI

    AutoSplat: Constrained Gaussian Splatting for Autonomous Driving Scene Reconstruction

    Authors: Mustafa Khan, Hamidreza Fazlali, Dhruv Sharma, Tongtong Cao, Dongfeng Bai, Yuan Ren, Bingbing Liu

    Abstract: Realistic scene reconstruction and view synthesis are essential for advancing autonomous driving systems by simulating safety-critical scenarios. 3D Gaussian Splatting excels in real-time rendering and static scene reconstructions but struggles with modeling driving scenarios due to complex backgrounds, dynamic objects, and sparse views. We propose AutoSplat, a framework employing Gaussian splatti…

    Submitted 3 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  45. Harnessing Quantum Support Vector Machines for Cross-Domain Classification of Quantum States

    Authors: Diksha Sharma, Vivek Balasaheb Sabale, Parvinder Singh, Atul Kumar

    Abstract: In the present study, we use cross-domain classification using quantum machine learning for quantum advantages to readdress the entanglement versus separability paradigm. The inherent structure of quantum states and its relation to a particular class of quantum states are used to intuitively classify testing states from domains different from training states, called \textit{cross-domain classifica…

    Submitted 22 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

  46. arXiv:2406.16625  [pdf, other]

    cs.RO

    GATSBI: An Online GTSP-Based Algorithm for Targeted Surface Bridge Inspection and Defect Detection

    Authors: Harnaik Dhami, Charith Reddy, Vishnu Dutt Sharma, Troi Williams, Pratap Tokekar

    Abstract: We study the problem of visual surface inspection of infrastructure for defects using an Unmanned Aerial Vehicle (UAV). We do not assume that the geometric model of the infrastructure is known beforehand. Our planner, termed GATSBI, plans a path in a receding horizon fashion to inspect all points on the surface of the infrastructure. The input to GATSBI consists of a 3D occupancy map created onlin…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 10 pages, 12 figures, 2 tables. Submitted to IEEE TAES. arXiv admin note: text overlap with arXiv:2012.04803

  47. arXiv:2406.15958  [pdf, other]

    eess.IV cs.CV cs.LG stat.ML

    Bone Fracture Classification using Transfer Learning

    Authors: Shyam Gupta, Dhanisha Sharma

    Abstract: The manual examination of X-ray images for fractures is a time-consuming process that is prone to human error. In this work, we introduce a robust yet simple training loop for the classification of fractures, which significantly outperforms existing methods. Our method achieves superior performance in less than ten epochs and utilizes the latest dataset to deliver the best-performing model for thi…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Code is publicly available at https://github.com/shyamgupta196/Bone-Fracture-Classification

  48. arXiv:2406.05199  [pdf, other]

    eess.AS cs.SD

    XANE: eXplainable Acoustic Neural Embeddings

    Authors: Sri Harsha Dumpala, Dushyant Sharma, Chandramouli Shama Sastri, Stanislav Kruchinin, James Fosburgh, Patrick A. Naylor

    Abstract: We present a novel method for extracting neural embeddings that model the background acoustics of a speech signal. The extracted embeddings are used to estimate specific parameters related to the background acoustic properties of the signal in a non-intrusive manner, which allows the embeddings to be explainable in terms of those parameters. We illustrate the value of these embeddings by performin…

    Submitted 7 June, 2024; originally announced June 2024.

  49. arXiv:2405.15911  [pdf, other]

    cs.LG

    Learning accurate and interpretable decision trees

    Authors: Maria-Florina Balcan, Dravyansh Sharma

    Abstract: Decision trees are a popular tool in machine learning and yield easy-to-understand models. Several techniques have been proposed in the literature for learning a decision tree classifier, with different techniques working well for data from different domains. In this work, we develop approaches to design decision tree learning algorithms given repeated access to data from the same domain. We propo…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 26 pages, UAI 2024

  50. arXiv:2405.05469  [pdf, other]

    cs.CR

    PLLM-CS: Pre-trained Large Language Model (LLM) for Cyber Threat Detection in Satellite Networks

    Authors: Mohammed Hassanin, Marwa Keshk, Sara Salim, Majid Alsubaie, Dharmendra Sharma

    Abstract: Satellite networks are vital in facilitating communication services for various critical infrastructures. These networks can seamlessly integrate with a diverse array of systems. However, some of these systems are vulnerable due to the absence of effective intrusion detection systems, which can be attributed to limited research and the high costs associated with deploying, fine-tuning, monitoring,…

    Submitted 8 May, 2024; originally announced May 2024.