
Showing 1–50 of 89 results for author: Modi, A

Searching in archive cs.
  1. arXiv:2509.13666  [pdf, ps, other]

    cs.RO cs.AI

    DREAM: Domain-aware Reasoning for Efficient Autonomous Underwater Monitoring

    Authors: Zhenqi Wu, Abhinav Modi, Angelos Mavrogiannis, Kaustubh Joshi, Nikhil Chopra, Yiannis Aloimonos, Nare Karapetyan, Ioannis Rekleitis, Xiaomin Lin

    Abstract: The ocean is warming and acidifying, increasing the risk of mass mortality events for temperature-sensitive shellfish such as oysters. This motivates the development of long-term monitoring systems. However, human labor is costly and long-duration underwater work is highly hazardous, thus favoring robotic solutions as a safer and more efficient option. To enable underwater robots to make real-time…

    Submitted 16 September, 2025; originally announced September 2025.

    Comments: submitted to ICRA 2026

  2. arXiv:2507.08036  [pdf]

    cs.CL cs.CV

    Barriers in Integrating Medical Visual Question Answering into Radiology Workflows: A Scoping Review and Clinicians' Insights

    Authors: Deepali Mishra, Chaklam Silpasuwanchai, Ashutosh Modi, Madhumita Sushil, Sorayouth Chumnanvej

    Abstract: Medical Visual Question Answering (MedVQA) is a promising tool to assist radiologists by automating medical image interpretation through question answering. Despite advances in models and datasets, MedVQA's integration into clinical workflows remains limited. This study systematically reviews 68 publications (2018-2024) and surveys 50 clinicians from India and Thailand to examine MedVQA's practica…

    Submitted 14 July, 2025; v1 submitted 9 July, 2025; originally announced July 2025.

    Comments: 29 pages, 5 figures (1 in supplementary), 3 tables (1 in main text, 2 in supplementary). Scoping review and clinician survey

  3. arXiv:2507.06261  [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3410 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 16 October, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  4. arXiv:2506.10341  [pdf, ps, other]

    cs.LG cs.CL

    Provably Learning from Language Feedback

    Authors: Wanqiao Xu, Allen Nie, Ruijie Zheng, Aditya Modi, Adith Swaminathan, Ching-An Cheng

    Abstract: Interactively learning from observation and language feedback is an increasingly studied area driven by the emergence of large language model (LLM) agents. While impressive empirical demonstrations have been shown, so far a principled framing of these decision problems remains lacking. In this paper, we formalize the Learning from Language Feedback (LLF) problem, assert sufficient assumptions to e…

    Submitted 12 June, 2025; originally announced June 2025.

  5. arXiv:2506.08504  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    CoMuMDR: Code-mixed Multi-modal Multi-domain corpus for Discourse paRsing in conversations

    Authors: Divyaksh Shukla, Ritesh Baviskar, Dwijesh Gohil, Aniket Tiwari, Atul Shree, Ashutosh Modi

    Abstract: Discourse parsing is an important task useful for NLU applications such as summarization, machine comprehension, and emotion recognition. The current discourse parsing datasets based on conversations consist of written English dialogues restricted to a single domain. In this resource paper, we introduce CoMuMDR: Code-mixed Multi-modal Multi-domain corpus for Discourse paRsing in conversations. Th…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: Accepted at ACL Findings 2025 (16 pages: 5 pages main content + 3 pages references + 8 pages appendix)

  6. arXiv:2506.08488  [pdf, ps, other]

    cs.CL cs.AI cs.CY

    EtiCor++: Towards Understanding Etiquettical Bias in LLMs

    Authors: Ashutosh Dwivedi, Siddhant Shivdutt Singh, Ashutosh Modi

    Abstract: In recent years, researchers have started analyzing the cultural sensitivity of LLMs. In this respect, etiquettes have been an active area of research. Etiquettes are region-specific and are an essential part of the culture of a region; hence, it is imperative to make LLMs sensitive to etiquettes. However, there is a shortage of resources for evaluating LLMs for their understanding and bias with re…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: Accepted at ACL Findings 2025, 22 pages (9 pages main content + 4 pages references + 9 pages appendix)

  7. arXiv:2506.07621  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    LoRMA: Low-Rank Multiplicative Adaptation for LLMs

    Authors: Harsh Bihany, Shubham Patel, Ashutosh Modi

    Abstract: Large Language Models have shown remarkable capabilities in the NLP domain. Their effectiveness can mainly be attributed to their ability to adapt to an array of downstream tasks. However, generally, full fine-tuning is a computationally expensive job. To mitigate this, many techniques have been developed that prime efficiency, a prominent one being Low-Rank Adaptation (LoRA). However, LoRA and it…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: Accepted at ACL Findings 2025; 21 pages (9 main paper + 5 pages references + 7 pages appendix)

  8. arXiv:2505.24477  [pdf, ps, other]

    cs.CY cs.AI cs.LG

    Evaluating Gemini in an arena for learning

    Authors: LearnLM Team, Abhinit Modi, Aditya Srikanth Veerubhotla, Aliya Rysbek, Andrea Huber, Ankit Anand, Avishkar Bhoopchand, Brett Wiltshire, Daniel Gillick, Daniel Kasenberg, Eleni Sgouritsa, Gal Elidan, Hengrui Liu, Holger Winnemoeller, Irina Jurenka, James Cohan, Jennifer She, Julia Wilkowski, Kaiz Alarakyia, Kevin R. McKee, Komal Singh, Lisa Wang, Markus Kunesch, Miruna Pîslar, Niv Efron, et al. (12 additional authors not shown)

    Abstract: Artificial intelligence (AI) is poised to transform education, but the research community lacks a robust, general benchmark to evaluate AI models for learning. To assess state-of-the-art support for educational use cases, we ran an "arena for learning" where educators and pedagogy experts conduct blind, head-to-head, multi-turn comparisons of leading AI models. In particular, N = 189 educators d…

    Submitted 30 May, 2025; originally announced May 2025.

  9. arXiv:2504.10077  [pdf, other]

    cs.CL cs.AI cs.LG

    Towards Quantifying Commonsense Reasoning with Mechanistic Insights

    Authors: Abhinav Joshi, Areeb Ahmad, Divyaksh Shukla, Ashutosh Modi

    Abstract: Commonsense reasoning deals with the implicit knowledge that is well understood by humans and typically acquired via interactions with the world. In recent times, commonsense reasoning and understanding of various LLMs have been evaluated using text-based tasks. In this work, we argue that a proxy of this understanding can be maintained as a graphical structure that can further help to perform a r…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted at NAACL 2025; 28 pages (9 pages + 7 pages references + 12 pages appendix)

  10. arXiv:2501.17586  [pdf, other]

    cs.CV cs.LG

    Boosting Weak Positives for Text Based Person Search

    Authors: Akshay Modi, Ashhar Aziz, Nilanjana Chatterjee, A V Subramanyam

    Abstract: Large vision-language models have revolutionized cross-modal object retrieval, but text-based person search (TBPS) remains a challenging task due to limited data and the fine-grained nature of the task. Existing methods primarily focus on aligning image-text pairs into a common representation space, often disregarding the fact that real-world positive image-text pairs share a varied degree of similari…

    Submitted 30 January, 2025; v1 submitted 29 January, 2025; originally announced January 2025.

  11. arXiv:2412.16429  [pdf, ps, other]

    cs.CY cs.AI cs.LG

    LearnLM: Improving Gemini for Learning

    Authors: LearnLM Team, Abhinit Modi, Aditya Srikanth Veerubhotla, Aliya Rysbek, Andrea Huber, Brett Wiltshire, Brian Veprek, Daniel Gillick, Daniel Kasenberg, Derek Ahmed, Irina Jurenka, James Cohan, Jennifer She, Julia Wilkowski, Kaiz Alarakyia, Kevin R. McKee, Lisa Wang, Markus Kunesch, Mike Schaekermann, Miruna Pîslar, Nikhil Joshi, Parsa Mahmoudieh, Paul Jhun, Sara Wiltberger, Shakir Mohamed, et al. (21 additional authors not shown)

    Abstract: Today's generative AI systems are tuned to present information by default, rather than engage users in service of learning as a human tutor would. To address the wide range of potential education use cases for these systems, we reframe the challenge of injecting pedagogical behavior as one of pedagogical instruction following, where training and evaluation examples include system-level in…

    Submitted 22 August, 2025; v1 submitted 20 December, 2024; originally announced December 2024.

  12. arXiv:2411.19500  [pdf, other]

    cs.CL cs.AI cs.LG

    COLD: Causal reasOning in cLosed Daily activities

    Authors: Abhinav Joshi, Areeb Ahmad, Ashutosh Modi

    Abstract: Large Language Models (LLMs) have shown state-of-the-art performance in a variety of tasks, including arithmetic and reasoning; however, to gauge the intellectual capabilities of LLMs, causal reasoning has become a reliable proxy for validating a general understanding of the mechanics and intricacies of the world similar to humans. Previous works in natural language processing (NLP) have either fo…

    Submitted 29 November, 2024; originally announced November 2024.

    Comments: Paper accepted at NeurIPS 2024; Total 37 Pages

  13. arXiv:2411.15477  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Towards Robust Evaluation of Unlearning in LLMs via Data Transformations

    Authors: Abhinav Joshi, Shaswati Saha, Divyaksh Shukla, Sriram Vema, Harsh Jhamtani, Manas Gaur, Ashutosh Modi

    Abstract: Large Language Models (LLMs) have proven to be a great success in a wide range of applications ranging from regular NLP-based use cases to AI agents. LLMs have been trained on a vast corpus of texts from various sources; despite the best efforts during the data pre-processing stage while training the LLMs, they may pick up some undesirable information such as personally identifiable information (PII).…

    Submitted 23 November, 2024; originally announced November 2024.

    Comments: Accepted at EMNLP 2024 Findings; 21 pages (5 page main content + references + appendix)

  14. Overview of the First Shared Task on Clinical Text Generation: RRG24 and "Discharge Me!"

    Authors: Justin Xu, Zhihong Chen, Andrew Johnston, Louis Blankemeier, Maya Varma, Jason Hom, William J. Collins, Ankit Modi, Robert Lloyd, Benjamin Hopkins, Curtis Langlotz, Jean-Benoit Delbrouck

    Abstract: Recent developments in natural language generation have tremendous implications for healthcare. For instance, state-of-the-art systems could automate the generation of sections in clinical reports to alleviate physician workload and streamline hospital documentation. To explore these applications, we present a shared task consisting of two subtasks: (1) Radiology Report Generation (RRG24) and (2)…

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: ACL Proceedings. BioNLP workshop

    Journal ref: Proceedings of the 23rd Workshop on Biomedical Natural Language Processing (2024) 85-98

  15. arXiv:2408.07753  [pdf, other]

    cs.LG

    How to Solve Contextual Goal-Oriented Problems with Offline Datasets?

    Authors: Ying Fan, Jingling Li, Adith Swaminathan, Aditya Modi, Ching-An Cheng

    Abstract: We present a novel method, Contextual goal-Oriented Data Augmentation (CODA), which uses commonly available unlabeled trajectories and context-goal pairs to solve Contextual Goal-Oriented (CGO) problems. By carefully constructing an action-augmented MDP that is equivalent to the original MDP, CODA creates a fully labeled transition dataset under training contexts without additional approximation e…

    Submitted 29 April, 2025; v1 submitted 14 August, 2024; originally announced August 2024.

    Comments: NeurIPS 2024

  16. arXiv:2407.05887  [pdf, other]

    cs.CL cs.AI cs.LG

    Generation and De-Identification of Indian Clinical Discharge Summaries using LLMs

    Authors: Sanjeet Singh, Shreya Gupta, Niralee Gupta, Naimish Sharma, Lokesh Srivastava, Vibhu Agarwal, Ashutosh Modi

    Abstract: The consequences of a healthcare data breach can be devastating for the patients, providers, and payers. The average financial impact of a data breach in recent months has been estimated to be close to USD 10 million. This is especially significant for healthcare organizations in India that are managing rapid digitization while still establishing data governance procedures that align with the lett…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted at BioNLP Workshop at ACL 2024; 21 pages (9 pages main content)

  17. arXiv:2407.05404  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    iSign: A Benchmark for Indian Sign Language Processing

    Authors: Abhinav Joshi, Romit Mohanty, Mounika Kanakanti, Andesha Mangla, Sudeep Choudhary, Monali Barbate, Ashutosh Modi

    Abstract: Indian Sign Language has limited resources for developing machine learning and data-driven approaches for automated language processing. Though text/audio-based language processing techniques have shown colossal research interest and tremendous improvements in the last few years, sign languages still lag behind due to a lack of resources. To bridge this gap, in this work, we propose…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted at ACL 2024 Findings. 18 Pages (9 Pages + References + Appendix)

  18. arXiv:2407.05399  [pdf, other]

    cs.CL cs.AI cs.LG

    IL-TUR: Benchmark for Indian Legal Text Understanding and Reasoning

    Authors: Abhinav Joshi, Shounak Paul, Akshat Sharma, Pawan Goyal, Saptarshi Ghosh, Ashutosh Modi

    Abstract: Legal systems worldwide are inundated with exponential growth in cases and documents. There is an imminent need to develop NLP and ML techniques for automatically processing and understanding legal documents to streamline the legal system. However, evaluating and comparing various NLP models designed specifically for the legal domain is challenging. This paper addresses this challenge by proposing…

    Submitted 26 November, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted at ACL 2024 Main Conference; 40 Pages (9 Pages + References + Appendix)

  19. arXiv:2406.07860  [pdf, other]

    cs.CL cs.AI cs.LG

    BookSQL: A Large Scale Text-to-SQL Dataset for Accounting Domain

    Authors: Rahul Kumar, Amar Raja Dibbu, Shrutendra Harsola, Vignesh Subrahmaniam, Ashutosh Modi

    Abstract: Several large-scale datasets (e.g., WikiSQL, Spider) for developing natural language interfaces to databases have recently been proposed. These datasets cover a wide breadth of domains but fall short on some essential domains, such as finance and accounting. Given that accounting databases are used worldwide, particularly by non-technical people, there is an imminent need to develop models that co…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted at NAACL 2024; 20 Pages (main + appendix)

  20. arXiv:2406.05828  [pdf, other]

    cs.CV cs.AI eess.IV

    Multi-Stain Multi-Level Convolutional Network for Multi-Tissue Breast Cancer Image Segmentation

    Authors: Akash Modi, Sumit Kumar Jha, Purnendu Mishra, Rajiv Kumar, Kiran Aatre, Gursewak Singh, Shubham Mathur

    Abstract: Digital pathology and microscopy image analysis are widely employed in the segmentation of digitally scanned IHC slides, primarily to identify cancer and pinpoint regions of interest (ROI) indicative of tumor presence. However, current ROI segmentation models are either stain-specific or suffer from the issues of stain and scanner variance due to different staining protocols or modalities across m…

    Submitted 9 June, 2024; originally announced June 2024.

  21. arXiv:2404.04525  [pdf, other]

    cs.CL cs.AI cs.LG

    IITK at SemEval-2024 Task 10: Who is the speaker? Improving Emotion Recognition and Flip Reasoning in Conversations via Speaker Embeddings

    Authors: Shubham Patel, Divyaksh Shukla, Ashutosh Modi

    Abstract: This paper presents our approach for the SemEval-2024 Task 10: Emotion Discovery and Reasoning its Flip in Conversations. For the Emotion Recognition in Conversations (ERC) task, we utilize a masked-memory network along with speaker participation. We propose a transformer-based speaker-centric model for the Emotion Flip Reasoning (EFR) task. We also introduce Probable Trigger Zone, a region of the…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Accepted at SemEval 2024, NAACL 2024; 10 Pages

  22. arXiv:2404.04520  [pdf, other]

    cs.CL cs.AI cs.LG

    IITK at SemEval-2024 Task 4: Hierarchical Embeddings for Detection of Persuasion Techniques in Memes

    Authors: Shreenaga Chikoti, Shrey Mehta, Ashutosh Modi

    Abstract: Memes are one of the most popular types of content used in an online disinformation campaign. They are primarily effective on social media platforms since they can easily reach many users. Memes in a disinformation campaign achieve their goal of influencing the users through several rhetorical and psychological techniques, such as causal oversimplification, name-calling, and smear. The SemEval 202…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Accepted at SemEval 2024, NAACL 2024; 9 pages

  23. arXiv:2404.04513  [pdf, other]

    cs.CL cs.AI cs.LG

    IITK at SemEval-2024 Task 1: Contrastive Learning and Autoencoders for Semantic Textual Relatedness in Multilingual Texts

    Authors: Udvas Basak, Rajarshi Dutta, Shivam Pandey, Ashutosh Modi

    Abstract: This paper describes our system developed for the SemEval-2024 Task 1: Semantic Textual Relatedness. The challenge is focused on automatically detecting the degree of relatedness between pairs of sentences for 14 languages, including both high- and low-resource Asian and African languages. Our team participated in two subtasks consisting of Track A: supervised and Track B: unsupervised. This paper f…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Accepted at SemEval 2024, NAACL 2024; 6 pages

  24. arXiv:2404.04510  [pdf, other]

    cs.CL cs.AI cs.LG

    IITK at SemEval-2024 Task 2: Exploring the Capabilities of LLMs for Safe Biomedical Natural Language Inference for Clinical Trials

    Authors: Shreyasi Mandal, Ashutosh Modi

    Abstract: Large Language models (LLMs) have demonstrated state-of-the-art performance in various natural language processing (NLP) tasks across multiple domains, yet they are prone to shortcut learning and factual inconsistencies. This research investigates LLMs' robustness, consistency, and faithful reasoning when performing Natural Language Inference (NLI) on breast cancer Clinical Trial Reports (CTRs) in…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Accepted at SemEval 2024, NAACL 2024; 8 Pages

  25. arXiv:2403.15412  [pdf, other]

    cs.CY cs.AI cs.CL

    Towards Measuring and Modeling "Culture" in LLMs: A Survey

    Authors: Muhammad Farid Adilazuarda, Sagnik Mukherjee, Pradhyumna Lavania, Siddhant Singh, Alham Fikri Aji, Jacki O'Neill, Ashutosh Modi, Monojit Choudhury

    Abstract: We present a survey of more than 90 recent papers that aim to study cultural representation and inclusion in large language models (LLMs). We observe that none of the studies explicitly define "culture", which is a complex, multifaceted concept; instead, they probe the models on some specially designed datasets which represent certain aspects of "culture". We call these aspects the proxies of cultu…

    Submitted 4 September, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  26. arXiv:2311.17578  [pdf]

    cs.CR

    Data Driven Approaches to Cybersecurity Governance for Board Decision-Making -- A Systematic Review

    Authors: Anita Modi, Ievgeniia Kuzminykh, Bogdan Ghita

    Abstract: Cybersecurity governance influences the quality of strategic decision-making to ensure cyber risks are managed effectively. Boards of Directors are the decision-makers held accountable for managing this risk; however, they lack adequate and efficient information necessary for making such decisions. In addition to the myriad of challenges they face, they are often insufficiently versed in the techn…

    Submitted 29 November, 2023; originally announced November 2023.

  27. arXiv:2310.18974  [pdf, other]

    cs.CL cs.AI cs.LG

    EtiCor: Corpus for Analyzing LLMs for Etiquettes

    Authors: Ashutosh Dwivedi, Pradhyumna Lavania, Ashutosh Modi

    Abstract: Etiquettes are an essential ingredient of day-to-day interactions among people. Moreover, etiquettes are region-specific, and etiquettes in one region might contradict those in other regions. In this paper, we propose EtiCor, an Etiquettes Corpus, having texts about social norms from five different regions across the globe. The corpus provides a test bed for evaluating LLMs for knowledge and under…

    Submitted 29 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP 2023, Main Conference

  28. arXiv:2307.05440  [pdf, other]

    cs.CL cs.AI cs.LG

    ISLTranslate: Dataset for Translating Indian Sign Language

    Authors: Abhinav Joshi, Susmit Agrawal, Ashutosh Modi

    Abstract: Sign languages are the primary means of communication for many hard-of-hearing people worldwide. Recently, to bridge the communication gap between the hard-of-hearing community and the rest of the population, several sign language translation datasets have been proposed to enable the development of statistical sign language translation systems. However, there is a dearth of sign language resources…

    Submitted 11 July, 2023; originally announced July 2023.

    Comments: Accepted at ACL 2023 Findings, 8 Pages

  29. arXiv:2307.05260  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    U-CREAT: Unsupervised Case Retrieval using Events extrAcTion

    Authors: Abhinav Joshi, Akshat Sharma, Sai Kiran Tanikella, Ashutosh Modi

    Abstract: The task of Prior Case Retrieval (PCR) in the legal domain is about automatically citing relevant (based on facts and precedence) prior legal cases in a given query case. To further promote research in PCR, in this paper, we propose a new large benchmark (in English) for the PCR task: IL-PCR (Indian Legal Prior Case Retrieval) corpus. Given the complex nature of case relevance and the long size of…

    Submitted 11 July, 2023; originally announced July 2023.

    Comments: Accepted at ACL 2023, 15 pages (12 main + 3 Appendix)

  30. arXiv:2307.03906  [pdf, other]

    cs.CL cs.AI cs.LG cs.MA

    ScriptWorld: Text Based Environment For Learning Procedural Knowledge

    Authors: Abhinav Joshi, Areeb Ahmad, Umang Pandey, Ashutosh Modi

    Abstract: Text-based games provide a framework for developing natural language understanding and commonsense knowledge about the world in reinforcement learning based agents. Existing text-based environments often rely on fictional situations and characters to create a gaming framework and are far from real-world scenarios. In this paper, we introduce ScriptWorld: a text-based environment for teaching agent…

    Submitted 8 July, 2023; originally announced July 2023.

    Comments: Accepted at IJCAI 2023, 26 Pages (7 main + 19 for appendix)

  31. arXiv:2305.08358  [pdf, other]

    cs.CR cs.DC cs.LG

    Quadratic Functional Encryption for Secure Training in Vertical Federated Learning

    Authors: Shuangyi Chen, Anuja Modi, Shweta Agrawal, Ashish Khisti

    Abstract: Vertical federated learning (VFL) enables the collaborative training of machine learning (ML) models in settings where the data is distributed amongst multiple parties who wish to protect the privacy of their individual data. Notably, in VFL, the labels are available to a single party and the complete feature set is formed only when data from all parties is combined. Recently, Xu et al. proposed a…

    Submitted 19 June, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

    Comments: Accepted to ISIT 2023

  32. arXiv:2304.09548  [pdf, other]

    cs.CL cs.AI cs.LG

    SemEval 2023 Task 6: LegalEval - Understanding Legal Texts

    Authors: Ashutosh Modi, Prathamesh Kalamkar, Saurabh Karn, Aman Tiwari, Abhinav Joshi, Sai Kiran Tanikella, Shouvik Kumar Guha, Sachin Malhan, Vivek Raghavan

    Abstract: In populous countries, pending legal cases have been growing exponentially. There is a need for developing NLP-based techniques for processing and automatically understanding legal documents. To promote research in the area of Legal NLP, we organized the shared task LegalEval - Understanding Legal Texts at SemEval 2023. The LegalEval task has three sub-tasks: Task-A (Rhetorical Roles Labeling) is about…

    Submitted 1 May, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

    Comments: 13 Pages (9 Pages + References), Accepted at SemEval 2023 at ACL 2023

  33. arXiv:2302.01061  [pdf]

    cs.AI

    MLOps with enhanced performance control and observability

    Authors: Indradumna Banerjee, Dinesh Ghanta, Girish Nautiyal, Pradeep Sanchana, Prateek Katageri, Atin Modi

    Abstract: The explosion of data and its ever-increasing complexity in the last few years has made MLOps systems more prone to failure, and new tools need to be embedded in such systems to avoid such failure. In this demo, we will introduce crucial tools in the observability module of an MLOps system that target difficult issues like data drift and model version control for optimum model selection. We believ…

    Submitted 2 February, 2023; originally announced February 2023.

    Comments: Second International Conference on AI-ML Systems

  34. arXiv:2211.03742  [pdf, other]

    cs.CL cs.AI cs.LG

    Multi-Task Learning Framework for Extracting Emotion Cause Span and Entailment in Conversations

    Authors: Ashwani Bhat, Ashutosh Modi

    Abstract: Predicting emotions expressed in text is a well-studied problem in the NLP community. Recently there has been active research in extracting the cause of an emotion expressed in text. Most of the previous work has done causal emotion entailment in documents. In this work, we propose neural models to extract emotion cause span and entailment in conversations. For learning such models, we use RECCON…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: 19 Pages, Accepted at Workshop on Transfer Learning for Natural Language Processing, NeurIPS 2022

  35. arXiv:2211.03587  [pdf, other]

    cs.CV cs.AI cs.LG

    Generalized Product-of-Experts for Learning Multimodal Representations in Noisy Environments

    Authors: Abhinav Joshi, Naman Gupta, Jinang Shah, Binod Bhattarai, Ashutosh Modi, Danail Stoyanov

    Abstract: A real-world application or setting involves interaction between different modalities (e.g., video, speech, text). In order to process the multimodal information automatically and use it for an end application, Multimodal Representation Learning (MRL) has emerged as an active area of research in recent times. MRL involves learning reliable and robust representations of information from heterogeneo…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: 11 Pages, Accepted at ICMI 2022 Oral

  36. BabyNet: A Lightweight Network for Infant Reaching Action Recognition in Unconstrained Environments to Support Future Pediatric Rehabilitation Applications

    Authors: Amel Dechemi, Vikarn Bhakri, Ipsita Sahin, Arjun Modi, Julya Mestas, Pamodya Peiris, Dannya Enriquez Barrundia, Elena Kokkoni, Konstantinos Karydis

    Abstract: Action recognition is an important component to improve autonomy of physical rehabilitation devices, such as wearable robotic exoskeletons. Existing human action recognition algorithms focus on adult applications rather than pediatric ones. In this paper, we introduce BabyNet, a light-weight (in terms of trainable parameters) network structure to recognize infant reaching action from off-body stat…

    Submitted 12 October, 2022; v1 submitted 9 August, 2022; originally announced August 2022.

    Comments: Accepted to RO-MAN 2021

  37. arXiv:2206.10770  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    On the Statistical Efficiency of Reward-Free Exploration in Non-Linear RL

    Authors: Jinglin Chen, Aditya Modi, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal

    Abstract: We study reward-free reinforcement learning (RL) under general non-linear function approximation, and establish sample efficiency and hardness results under various standard structural assumptions. On the positive side, we propose the RFOLIVE (Reward-Free OLIVE) algorithm for sample-efficient reward-free exploration under minimal structural assumptions, which covers the previously studied settings…

    Submitted 22 October, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

  38. arXiv:2205.02455  [pdf, other]

    cs.CL cs.AI cs.LG

    COGMEN: COntextualized GNN based Multimodal Emotion recognitioN

    Authors: Abhinav Joshi, Ashwani Bhat, Ayush Jain, Atin Vikram Singh, Ashutosh Modi

    Abstract: Emotions are an inherent part of human interactions, and consequently, it is imperative to develop AI systems that understand and recognize human emotions. During a conversation involving various people, a person's emotions are influenced by the other speaker's utterances and their own emotional state over the utterances. In this paper, we propose COntextualized Graph Neural Network based Multimod…

    Submitted 5 May, 2022; originally announced May 2022.

    Comments: 17 pages (9 main + 8 appendix). Accepted at NAACL 2022

  39. arXiv:2204.00806  [pdf, other]

    cs.CL cs.AI cs.LG

    HLDC: Hindi Legal Documents Corpus

    Authors: Arnav Kapoor, Mudit Dhawan, Anmol Goel, T. H. Arjun, Akshala Bhatnagar, Vibhu Agrawal, Amul Agrawal, Arnab Bhattacharya, Ponnurangam Kumaraguru, Ashutosh Modi

    Abstract: Many populous countries, including India, are burdened with a considerable backlog of legal cases. Development of automated systems that could process legal documents and augment legal practitioners can mitigate this. However, there is a dearth of high-quality corpora that are needed to develop such data-driven systems. The problem gets even more pronounced in the case of low-resource languages such…

    Submitted 24 May, 2024; v1 submitted 2 April, 2022; originally announced April 2022.

    Comments: 16 pages. Accepted at Findings of ACL 2022

  40. arXiv:2201.13125  [pdf, other]

    cs.CL cs.AI cs.LG

    Corpus for Automatic Structuring of Legal Documents

    Authors: Prathamesh Kalamkar, Aman Tiwari, Astha Agarwal, Saurabh Karn, Smita Gupta, Vivek Raghavan, Ashutosh Modi

    Abstract: In populous countries, pending legal cases have been growing exponentially. There is a need for developing techniques for processing and organizing legal documents. In this paper, we introduce a new corpus for structuring legal documents. In particular, we introduce a corpus of legal judgment documents in English that are segmented into topical and coherent parts. Each of these parts is annotated…

    Submitted 19 September, 2022; v1 submitted 31 January, 2022; originally announced January 2022.

    Comments: Accepted at LREC 2022, 10 Pages (8 page main paper + 2 page references)

  41. arXiv:2201.01387  [pdf, other]

    eess.SY cs.AI cs.LG stat.ME

    Joint Learning-Based Stabilization of Multiple Unknown Linear Systems

    Authors: Mohamad Kazem Shirani Faradonbeh, Aditya Modi

    Abstract: Learning-based control of linear systems has received considerable attention recently. In popular settings, the true dynamical models are unknown to the decision-maker and need to be interactively learned by applying control inputs to the systems. Unlike the mature literature on efficient reinforcement learning policies for adaptive control of a single system, results on joint learning of multiple syste…

    Submitted 1 January, 2022; originally announced January 2022.

  42. arXiv:2112.10955  [pdf, other]

    stat.ML cs.LG eess.SY math.DS

    Joint Learning of Linear Time-Invariant Dynamical Systems

    Authors: Aditya Modi, Mohamad Kazem Shirani Faradonbeh, Ambuj Tewari, George Michailidis

    Abstract: Linear time-invariant systems are very popular models in system theory and applications. A fundamental problem in system identification that remains rather unaddressed in the extant literature is to leverage commonalities amongst related linear systems to estimate their transition matrices more accurately. To address this problem, the current paper investigates methods for jointly estimating the trans…

    Submitted 2 January, 2024; v1 submitted 20 December, 2021; originally announced December 2021.

  43. arXiv:2112.01938  [pdf, other]

    cs.CL cs.AI cs.LG

    Shapes of Emotions: Multimodal Emotion Recognition in Conversations via Emotion Shifts

    Authors: Harsh Agarwal, Keshav Bansal, Abhinav Joshi, Ashutosh Modi

    Abstract: Emotion Recognition in Conversations (ERC) is an important and active research area. Recent work has shown the benefits of using multiple modalities (e.g., text, audio, and video) for the ERC task. In a conversation, participants tend to maintain a particular emotional state unless some stimulus evokes a change. There is a continuous ebb and flow of emotions in a conversation. Inspired by this obse…

    Submitted 7 November, 2022; v1 submitted 3 December, 2021; originally announced December 2021.

    Comments: 13 pages, Accepted at Workshop on Performance and Interpretability Evaluations of Multimodal, Multipurpose, Massive-Scale Models, COLING 2022

  44. arXiv:2112.01836  [pdf, other]

    cs.CL cs.AI cs.LG

    Semantic Segmentation of Legal Documents via Rhetorical Roles

    Authors: Vijit Malik, Rishabh Sanjay, Shouvik Kumar Guha, Angshuman Hazarika, Shubham Nigam, Arnab Bhattacharya, Ashutosh Modi

    Abstract: Legal documents are unstructured, use legal jargon, and have considerable length, making them difficult to process automatically via conventional text processing techniques. A legal document processing system would benefit substantially if the documents could be segmented into coherent information units. This paper proposes a new corpus of legal documents annotated (with the help of legal experts)…

    Submitted 7 November, 2022; v1 submitted 3 December, 2021; originally announced December 2021.

    Comments: 19 pages, Accepted at Natural Legal Language Processing Workshop, EMNLP 2022

  45. arXiv:2109.07763  [pdf, other]

    eess.SP cs.IT

    Design and Evaluation of Reconfigurable Intelligent Surfaces in Real-World Environment

    Authors: Georgios C. Trichopoulos, Panagiotis Theofanopoulos, Bharath Kashyap, Aditya Shekhawat, Anuj Modi, Tawfik Osman, Sanjay Kumar, Anand Sengar, Arkajyoti Chang, Ahmed Alkhateeb

    Abstract: Reconfigurable intelligent surfaces (RISs) promise coverage and data-rate gains for wireless communication systems in 5G and beyond. Prior work has mainly focused on analyzing the performance of these surfaces using computer simulations or lab-level prototypes. To draw accurate insights about the actual performance of these systems, this paper develops an RIS proof-of-concept prototype and…

    Submitted 16 September, 2021; originally announced September 2021.

    Comments: Submitted to IEEE Open Journal of the Communications Society, 29 pages, 20 figures

  46. arXiv:2107.12135  [pdf, other]

    cs.CL cs.AI

    Fine-Grained Emotion Prediction by Modeling Emotion Definitions

    Authors: Gargi Singh, Dhanajit Brahma, Piyush Rai, Ashutosh Modi

    Abstract: In this paper, we propose a new framework for fine-grained emotion prediction in text through emotion definition modeling. Our approach involves a multi-task learning framework that models definitions of emotions as an auxiliary task while being trained on the primary task of emotion prediction. We model definitions using masked language modeling and class definition prediction tasks. Our mode…

    Submitted 26 July, 2021; originally announced July 2021.

    Comments: 8 pages, accepted at ACII 2021 as an oral presentation

  47. arXiv:2107.08408  [pdf, other]

    cs.CL cs.AI cs.MA cs.RO

    Pre-trained Language Models as Prior Knowledge for Playing Text-based Games

    Authors: Ishika Singh, Gargi Singh, Ashutosh Modi

    Abstract: Recently, text world games have been proposed to enable artificial agents to understand and reason about real-world scenarios. These text-based games are challenging for artificial agents, as they require an understanding of and interaction using natural language in a partially observable environment. Agents observe the environment via textual descriptions designed to be challenging enough for even…

    Submitted 23 December, 2021; v1 submitted 18 July, 2021; originally announced July 2021.

    Comments: 40 Pages (8 Pages main content + 1 Page references + 31 Pages Appendix). Some new results added

  48. arXiv:2107.05202  [pdf, other]

    cs.CV

    Delta Sampling R-BERT for limited data and low-light action recognition

    Authors: Sanchit Hira, Ritwik Das, Abhinav Modi, Daniil Pakhomov

    Abstract: We present an approach to perform supervised action recognition in the dark. In this work, we report our results on the ARID dataset. Most previous works only evaluate performance on large, well-illuminated datasets like Kinetics and HMDB51. We demonstrate that our work is able to achieve a very low error rate while being trained on a much smaller dataset of dark videos. We also explore a variety…

    Submitted 12 July, 2021; originally announced July 2021.

  49. KEA: Tuning an Exabyte-Scale Data Infrastructure

    Authors: Yiwen Zhu, Subru Krishnan, Konstantinos Karanasos, Isha Tarte, Conor Power, Abhishek Modi, Manoj Kumar, Deli Zhang, Kartheek Muthyala, Nick Jurgens, Sarvesh Sakalanaga, Sudhir Darbha, Minu Iyer, Ankita Agarwal, Carlo Curino

    Abstract: Microsoft's internal big-data infrastructure is one of the largest in the world -- with over 300k machines running billions of tasks from over 0.6M daily jobs. Operating this infrastructure is a costly and complex endeavor, and efficiency is paramount. In fact, for over 15 years, a dedicated engineering team has tuned almost every aspect of this infrastructure, achieving state-of-the-art efficienc…

    Submitted 21 June, 2021; originally announced June 2021.

  50. arXiv:2105.13562  [pdf, other]

    cs.CL cs.AI

    ILDC for CJPE: Indian Legal Documents Corpus for Court Judgment Prediction and Explanation

    Authors: Vijit Malik, Rishabh Sanjay, Shubham Kumar Nigam, Kripa Ghosh, Shouvik Kumar Guha, Arnab Bhattacharya, Ashutosh Modi

    Abstract: An automated system that could assist a judge in predicting the outcome of a case would help expedite the judicial process. For such a system to be practically useful, predictions by the system should be explainable. To promote research in developing such a system, we introduce ILDC (Indian Legal Documents Corpus). ILDC is a large corpus of 35k Indian Supreme Court cases annotated with original co…

    Submitted 31 May, 2021; v1 submitted 27 May, 2021; originally announced May 2021.

    Comments: Accepted at ACL 2021, 17 Pages (9 Pages main paper, 4 pages references, 4 pages appendix)