Showing 1–13 of 13 results for author: Ghosal, T

Searching in archive cs.
  1. arXiv:2505.17592

    astro-ph.IM cs.LG

    AstroMLab 4: Benchmark-Topping Performance in Astronomy Q&A with a 70B-Parameter Domain-Specialized Reasoning Model

    Authors: Tijmen de Haan, Yuan-Sen Ting, Tirthankar Ghosal, Tuan Dung Nguyen, Alberto Accomazzi, Emily Herron, Vanessa Lama, Rui Pan, Azton Wells, Nesar Ramachandra

    Abstract: General-purpose large language models, despite their broad capabilities, often struggle with specialized domain knowledge, a limitation particularly pronounced in more accessible, lower-parameter versions. This gap hinders their deployment as effective agents in demanding fields such as astronomy. Building on our prior work with AstroSage-8B, this study introduces AstroSage-70B, a significantly la…

    Submitted 23 May, 2025; originally announced May 2025.

  2. arXiv:2504.12976

    cs.CL

    Sparks of Science: Hypothesis Generation Using Structured Paper Data

    Authors: Charles O'Neill, Tirthankar Ghosal, Roberta Răileanu, Mike Walmsley, Thang Bui, Kevin Schawinski, Ioana Ciucă

    Abstract: Generating novel and creative scientific hypotheses is a cornerstone in achieving Artificial General Intelligence. Large language and reasoning models have the potential to aid in the systematic creation, selection, and validation of scientifically informed hypotheses. However, current foundation models often struggle to produce scientific ideas that are both novel and feasible. One reason is the…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: 9 pages, 2 figures. Comments welcome

  3. arXiv:2504.05496

    cs.CL

    A Survey on Hypothesis Generation for Scientific Discovery in the Era of Large Language Models

    Authors: Atilla Kaan Alkan, Shashwat Sourav, Maja Jablonska, Simone Astarita, Rishabh Chakrabarty, Nikhil Garuda, Pranav Khetarpal, Maciej Pióro, Dimitrios Tanoglidis, Kartheik G. Iyer, Mugdha S. Polimera, Michael J. Smith, Tirthankar Ghosal, Marc Huertas-Company, Sandor Kruk, Kevin Schawinski, Ioana Ciucă

    Abstract: Hypothesis generation is a fundamental step in scientific discovery, yet it is increasingly challenged by information overload and disciplinary fragmentation. Recent advances in Large Language Models (LLMs) have sparked growing interest in their potential to enhance and automate this process. This paper presents a comprehensive survey of hypothesis generation with LLMs by (i) reviewing existing me…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: 9 pages (+2 pages of references), 2 figures

    MSC Class: 68T50

  4. arXiv:2410.09770

    cs.CL cs.AI cs.DL cs.LG

    'Quis custodiet ipsos custodes?' Who will watch the watchmen? On Detecting AI-generated peer-reviews

    Authors: Sandeep Kumar, Mohit Sahu, Vardhan Gacche, Tirthankar Ghosal, Asif Ekbal

    Abstract: The integrity of the peer-review process is vital for maintaining scientific rigor and trust within the academic community. With the steady increase in the usage of large language models (LLMs) like ChatGPT in academic writing, there is a growing concern that AI-generated texts could compromise scientific publishing, including peer-reviews. Previous works have focused on generic AI-generated text…

    Submitted 13 October, 2024; originally announced October 2024.

    Comments: EMNLP Main, 17 pages, 5 figures, 9 tables

  5. arXiv:2409.19750

    astro-ph.IM cs.CL

    AstroMLab 2: AstroLLaMA-2-70B Model and Benchmarking Specialised LLMs for Astronomy

    Authors: Rui Pan, Tuan Dung Nguyen, Hardik Arora, Alberto Accomazzi, Tirthankar Ghosal, Yuan-Sen Ting

    Abstract: Continual pretraining of large language models on domain-specific data has been proposed to enhance performance on downstream tasks. In astronomy, the previous absence of astronomy-focused benchmarks has hindered objective evaluation of these specialized LLM models. Leveraging a recent initiative to curate high-quality astronomical MCQs, this study aims to quantitatively assess specialized LLMs in…

    Submitted 29 September, 2024; originally announced September 2024.

    Comments: 10 pages, 1 figure, 1 table, accepted to AI4S: The 5th Workshop on Artificial Intelligence and Machine Learning for Scientific Applications at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC24). Models will be released at https://huggingface.co/AstroMLab. AstroMLab homepage: https://astromlab.org/

  6. arXiv:2409.06185

    cs.CL cs.AI cs.CY cs.HC cs.LG

    Can Large Language Models Unlock Novel Scientific Research Ideas?

    Authors: Sandeep Kumar, Tirthankar Ghosal, Vinayak Goyal, Asif Ekbal

    Abstract: "An idea is nothing more nor less than a new combination of old elements" (Young, J.W.). The widespread adoption of Large Language Models (LLMs) and publicly available ChatGPT have marked a significant turning point in the integration of Artificial Intelligence (AI) into people's everyday lives. This study explores the capability of LLMs in generating novel research ideas based on information from…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: 24 pages, 12 figures, 6 tables

  7. arXiv:2408.01556

    astro-ph.IM cs.DL cs.IR

    pathfinder: A Semantic Framework for Literature Review and Knowledge Discovery in Astronomy

    Authors: Kartheik G. Iyer, Mikaeel Yunus, Charles O'Neill, Christine Ye, Alina Hyk, Kiera McCormick, Ioana Ciuca, John F. Wu, Alberto Accomazzi, Simone Astarita, Rishabh Chakrabarty, Jesse Cranney, Anjalie Field, Tirthankar Ghosal, Michele Ginolfi, Marc Huertas-Company, Maja Jablonska, Sandor Kruk, Huiling Liu, Gabriel Marchidan, Rohit Mistry, J. P. Naiman, J. E. G. Peek, Mugdha Polimera, Sergio J. Rodriguez , et al. (5 additional authors not shown)

    Abstract: The exponential growth of astronomical literature poses significant challenges for researchers navigating and synthesizing general insights or even domain-specific knowledge. We present Pathfinder, a machine learning framework designed to enable literature review and knowledge discovery in astronomy, focusing on semantic searching with natural language instead of syntactic searches with keywords.…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 25 pages, 9 figures, submitted to AAS journals. Comments are welcome, and the tools mentioned are available online at https://pfdr.app

  8. arXiv:2407.11194

    astro-ph.IM astro-ph.EP astro-ph.GA astro-ph.SR cs.AI cs.CL

    AstroMLab 1: Who Wins Astronomy Jeopardy!?

    Authors: Yuan-Sen Ting, Tuan Dung Nguyen, Tirthankar Ghosal, Rui Pan, Hardik Arora, Zechang Sun, Tijmen de Haan, Nesar Ramachandra, Azton Wells, Sandeep Madireddy, Alberto Accomazzi

    Abstract: We present a comprehensive evaluation of proprietary and open-weights large language models using the first astronomy-specific benchmarking dataset. This dataset comprises 4,425 multiple-choice questions curated from the Annual Review of Astronomy and Astrophysics, covering a broad range of astrophysical topics. Our analysis examines model performance across various astronomical subfields and asse…

    Submitted 8 November, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 45 pages, 12 figures, 7 tables. Published in Astronomy & Computing. AstroMLab homepage: https://astromlab.org/

  9. arXiv:2310.18685

    cs.CL

    When Reviewers Lock Horn: Finding Disagreement in Scientific Peer Reviews

    Authors: Sandeep Kumar, Tirthankar Ghosal, Asif Ekbal

    Abstract: To this date, the efficacy of the scientific publishing enterprise fundamentally rests on the strength of the peer review process. The journal editor or the conference chair primarily relies on the expert reviewers' assessment, identify points of agreement and disagreement and try to reach a consensus to make a fair and informed decision on whether to accept or reject a paper. However, with the es…

    Submitted 28 October, 2023; originally announced October 2023.

    Comments: 12 pages, 5 figures, EMNLP 2023 short

  10. arXiv:2202.02646

    cs.CL

    RerrFact: Reduced Evidence Retrieval Representations for Scientific Claim Verification

    Authors: Ashish Rana, Deepanshu Khanna, Tirthankar Ghosal, Muskaan Singh, Harpreet Singh, Prashant Singh Rana

    Abstract: Exponential growth in digital information outlets and the race to publish has made scientific misinformation more prevalent than ever. However, the task to fact-verify a given scientific claim is not straightforward even for researchers. Scientific claim verification requires in-depth knowledge and great labor from domain experts to substantiate supporting and refuting evidence from credible scien…

    Submitted 18 April, 2022; v1 submitted 5 February, 2022; originally announced February 2022.

    Comments: Accepted in the AAAI-22 Workshop on Scientific Document Understanding at the Thirty-Sixth AAAI Conference on Artificial Intelligence (SDU@AAAI-22)

  11. Testing the Generalization of Neural Language Models for COVID-19 Misinformation Detection

    Authors: Jan Philip Wahle, Nischal Ashok, Terry Ruas, Norman Meuschke, Tirthankar Ghosal, Bela Gipp

    Abstract: A drastic rise in potentially life-threatening misinformation has been a by-product of the COVID-19 pandemic. Computational support to identify false information within the massive body of data on the topic is crucial to prevent harm. Researchers proposed many methods for flagging online misinformation related to COVID-19. However, these methods predominantly target specific content types (e.g., n…

    Submitted 10 November, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

    Journal ref: iConference 2022

  12. arXiv:1802.06950

    cs.CL

    TAP-DLND 1.0 : A Corpus for Document Level Novelty Detection

    Authors: Tirthankar Ghosal, Amitra Salam, Swati Tiwari, Asif Ekbal, Pushpak Bhattacharyya

    Abstract: Detecting novelty of an entire document is an Artificial Intelligence (AI) frontier problem that has widespread NLP applications, such as extractive document summarization, tracking development of news events, predicting impact of scholarly articles, etc. Important though the problem is, we are unaware of any benchmark document level data that correctly addresses the evaluation of automatic novelt…

    Submitted 19 February, 2018; originally announced February 2018.

    Comments: Accepted for publication in Language Resources and Evaluation Conference (LREC) 2018

  13. arXiv:1802.01403

    cs.DL

    An AI aid to the editors. Exploring the possibility of an AI assisted article classification system

    Authors: Tirthankar Ghosal, Rajeev Verma, Asif Ekbal, Sriparna Saha, Pushpak Bhattacharyya

    Abstract: This work is a preliminary exploratory study of how we could progress a step towards an AI assisted article classification system in academia. The proposed system aims to aid the journal editors in their decisions by pinpointing the potential weaknesses or strengths of a submitted manuscript. From a large collection of articles and corresponding author-editor interactions we explore the possible…

    Submitted 16 February, 2018; v1 submitted 5 February, 2018; originally announced February 2018.