Showing 1–15 of 15 results for author: Leaman, R

Searching in archive cs.
  1. arXiv:2506.04303

    q-bio.GN cs.AI cs.LG

    Knowledge-guided Contextual Gene Set Analysis Using Large Language Models

    Authors: Zhizheng Wang, Chi-Ping Day, Chih-Hsuan Wei, Qiao Jin, Robert Leaman, Yifan Yang, Shubo Tian, Aodong Qiu, Yin Fang, Qingqing Zhu, Xinghua Lu, Zhiyong Lu

    Abstract: Gene set analysis (GSA) is a foundational approach for interpreting genomic data of diseases by linking genes to biological processes. However, conventional GSA methods overlook clinical context of the analyses, often generating long lists of enriched pathways with redundant, nonspecific, or irrelevant results. Interpreting these requires extensive, ad-hoc manual effort, reducing both reliability…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 56 pages, 9 figures, 1 table

  2. arXiv:2501.14079

    cs.CL

    Enhancing Biomedical Relation Extraction with Directionality

    Authors: Po-Ting Lai, Chih-Hsuan Wei, Shubo Tian, Robert Leaman, Zhiyong Lu

    Abstract: Biological relation networks contain rich information for understanding the biological mechanisms behind the relationship of entities such as genes, proteins, diseases, and chemicals. The vast growth of biomedical literature poses significant challenges updating the network knowledge. The recent Biomedical Relation Extraction Dataset (BioRED) provides valuable manual annotations, facilitating the…

    Submitted 23 January, 2025; originally announced January 2025.

  3. arXiv:2411.14487

    cs.CL cs.AI cs.CY

    Ensuring Safety and Trust: Analyzing the Risks of Large Language Models in Medicine

    Authors: Yifan Yang, Qiao Jin, Robert Leaman, Xiaoyu Liu, Guangzhi Xiong, Maame Sarfo-Gyamfi, Changlin Gong, Santiago Ferrière-Steinert, W. John Wilbur, Xiaojun Li, Jiaxin Yuan, Bang An, Kelvin S. Castro, Francisco Erramuspe Álvarez, Matías Stockle, Aidong Zhang, Furong Huang, Zhiyong Lu

    Abstract: The remarkable capabilities of Large Language Models (LLMs) make them increasingly compelling for adoption in real-world healthcare applications. However, the risks associated with using LLMs in medical applications have not been systematically characterized. We propose using five key principles for safe and trustworthy medical AI: Truthfulness, Resilience, Fairness, Robustness, and Privacy, along…

    Submitted 20 November, 2024; originally announced November 2024.

  4. arXiv:2410.18856

    cs.AI cs.CL

    Demystifying Large Language Models for Medicine: A Primer

    Authors: Qiao Jin, Nicholas Wan, Robert Leaman, Shubo Tian, Zhizheng Wang, Yifan Yang, Zifeng Wang, Guangzhi Xiong, Po-Ting Lai, Qingqing Zhu, Benjamin Hou, Maame Sarfo-Gyamfi, Gongbo Zhang, Aidan Gilson, Balu Bhasuran, Zhe He, Aidong Zhang, Jimeng Sun, Chunhua Weng, Ronald M. Summers, Qingyu Chen, Yifan Peng, Zhiyong Lu

    Abstract: Large language models (LLMs) represent a transformative class of AI tools capable of revolutionizing various aspects of healthcare by generating human-like responses across diverse contexts and adapting to novel tasks following human instructions. Their potential application spans a broad range of medical tasks, such as clinical documentation, matching patients to clinical trials, and answering me…

    Submitted 19 November, 2024; v1 submitted 24 October, 2024; originally announced October 2024.

    Comments: Under review

  5. arXiv:2404.14209

    cs.CL

    EnzChemRED, a rich enzyme chemistry relation extraction dataset

    Authors: Po-Ting Lai, Elisabeth Coudert, Lucila Aimo, Kristian Axelsen, Lionel Breuza, Edouard de Castro, Marc Feuermann, Anne Morgat, Lucille Pourcel, Ivo Pedruzzi, Sylvain Poux, Nicole Redaschi, Catherine Rivoire, Anastasia Sveshnikova, Chih-Hsuan Wei, Robert Leaman, Ling Luo, Zhiyong Lu, Alan Bridge

    Abstract: Expert curation is essential to capture knowledge of enzyme functions from the scientific literature in FAIR open knowledgebases but cannot keep pace with the rate of new discoveries and new publications. In this work we present EnzChemRED, for Enzyme Chemistry Relation Extraction Dataset, a new training and benchmarking dataset to support the development of Natural Language Processing (NLP) metho…

    Submitted 22 April, 2024; originally announced April 2024.

  6. arXiv:2401.11048

    cs.CL q-bio.QM

    PubTator 3.0: an AI-powered Literature Resource for Unlocking Biomedical Knowledge

    Authors: Chih-Hsuan Wei, Alexis Allot, Po-Ting Lai, Robert Leaman, Shubo Tian, Ling Luo, Qiao Jin, Zhizheng Wang, Qingyu Chen, Zhiyong Lu

    Abstract: PubTator 3.0 (https://www.ncbi.nlm.nih.gov/research/pubtator3/) is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches for key concepts like proteins, genetic variants, diseases, and chemicals. It currently provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text arti…

    Submitted 19 January, 2024; originally announced January 2024.

  7. PubMed and Beyond: Biomedical Literature Search in the Age of Artificial Intelligence

    Authors: Qiao Jin, Robert Leaman, Zhiyong Lu

    Abstract: Biomedical research yields a wealth of information, much of which is only accessible through the literature. Consequently, literature search is an essential tool for building on prior knowledge in clinical and biomedical research. Although recent improvements in artificial intelligence have expanded functionality beyond keyword-based search, these advances may be unfamiliar to clinicians and resea…

    Submitted 21 September, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

    Comments: 27 pages, 6 figures, 36 tools

    Journal ref: eBioMedicine, 2024

  8. AIONER: All-in-one scheme-based biomedical named entity recognition using deep learning

    Authors: Ling Luo, Chih-Hsuan Wei, Po-Ting Lai, Robert Leaman, Qingyu Chen, Zhiyong Lu

    Abstract: Biomedical named entity recognition (BioNER) seeks to automatically recognize biomedical entities in natural language text, serving as a necessary foundation for downstream text mining tasks and applications such as information extraction and question answering. Manually labeling training data for the BioNER task is costly, however, due to the significant domain expertise required for accurate ann…

    Submitted 15 May, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

    Comments: Accepted by Bioinformatics

  9. arXiv:2209.13428

    cs.DL cs.IR

    LitCovid in 2022: an information resource for the COVID-19 literature

    Authors: Qingyu Chen, Alexis Allot, Robert Leaman, Chih-Hsuan Wei, Elaheh Aghaarabi, John J. Guerrerio, Lilly Xu, Zhiyong Lu

    Abstract: LitCovid (https://www.ncbi.nlm.nih.gov/research/coronavirus/), first launched in February 2020, is a first-of-its-kind literature hub for tracking up-to-date published research on COVID-19. The number of articles in LitCovid has increased from 55,000 to ~300,000 over the past two and half years, with a consistent growth rate of ~10,000 articles per month. In addition to the rapid literature growth…

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: 9 pages

  10. arXiv:2209.08124

    cs.LG

    Comprehensively identifying Long Covid articles with human-in-the-loop machine learning

    Authors: Robert Leaman, Rezarta Islamaj, Alexis Allot, Qingyu Chen, W. John Wilbur, Zhiyong Lu

    Abstract: A significant percentage of COVID-19 survivors experience ongoing multisystemic symptoms that often affect daily living, a condition known as Long Covid or post-acute-sequelae of SARS-CoV-2 infection. However, identifying scientific articles relevant to Long Covid is challenging since there is no standardized or consensus terminology. We developed an iterative human-in-the-loop machine learning fr…

    Submitted 28 October, 2022; v1 submitted 16 September, 2022; originally announced September 2022.

  11. arXiv:2204.09781

    cs.DL cs.CL cs.IR cs.LG

    Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations

    Authors: Qingyu Chen, Alexis Allot, Robert Leaman, Rezarta Islamaj Doğan, Jingcheng Du, Li Fang, Kai Wang, Shuo Xu, Yuefu Zhang, Parsa Bagherzadeh, Sabine Bergler, Aakash Bhatnagar, Nidhir Bhavsar, Yung-Chun Chang, Sheng-Jie Lin, Wentai Tang, Hongtong Zhang, Ilija Tavchioski, Senja Pollak, Shubo Tian, Jinfeng Zhang, Yulia Otmakhova, Antonio Jimeno Yepes, Hang Dong, Honghan Wu, et al. (14 additional authors not shown)

    Abstract: The COVID-19 pandemic has been severely impacting global society since December 2019. Massive research has been undertaken to understand the characteristics of the virus and design vaccines and drugs. The related findings have been reported in biomedical literature at a rate of about 10,000 articles on COVID-19 per month. Such rapid growth significantly challenges manual curation and interpretatio…

    Submitted 3 June, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

  12. Artificial Intelligence (AI) in Action: Addressing the COVID-19 Pandemic with Natural Language Processing (NLP)

    Authors: Qingyu Chen, Robert Leaman, Alexis Allot, Ling Luo, Chih-Hsuan Wei, Shankai Yan, Zhiyong Lu

    Abstract: The COVID-19 pandemic has had a significant impact on society, both because of the serious health effects of COVID-19 and because of public health measures implemented to slow its spread. Many of these difficulties are fundamentally information needs; attempts to address these needs have caused an information overload for both researchers and the public. Natural language processing (NLP), the bran…

    Submitted 5 September, 2021; v1 submitted 9 October, 2020; originally announced October 2020.

    Comments: 51 pages, 3 figures and 2 tables; published at the Annual Review of Biomedical Data Science

    Journal ref: Annual Review of Biomedical Data Science 4 (2021)

  13. arXiv:2010.14588

    cs.DL cs.CL

    A Comprehensive Dictionary and Term Variation Analysis for COVID-19 and SARS-CoV-2

    Authors: Robert Leaman, Zhiyong Lu

    Abstract: The number of unique terms in the scientific literature used to refer to either SARS-CoV-2 or COVID-19 is remarkably large and has continued to increase rapidly despite well-established standardized terms. This high degree of term variation makes high recall identification of these important entities difficult. In this manuscript we present an extensive dictionary of terms used in the literature t…

    Submitted 27 October, 2020; originally announced October 2020.

    Comments: Accepted EMNLP NLP-COVID Workshop

  14. arXiv:2008.03397

    cs.DL cs.DB cs.IR cs.LG

    Navigating the landscape of COVID-19 research through literature analysis: A bird's eye view

    Authors: Lana Yeganova, Rezarta Islamaj, Qingyu Chen, Robert Leaman, Alexis Allot, Chin-Hsuan Wei, Donald C. Comeau, Won Kim, Yifan Peng, W. John Wilbur, Zhiyong Lu

    Abstract: Timely access to accurate scientific literature in the battle with the ongoing COVID-19 pandemic is critical. This unprecedented public health risk has motivated research towards understanding the disease in general, identifying drugs to treat the disease, developing potential vaccines, etc. This has given rise to a rapidly growing body of literature that doubles in number of publications every 20…

    Submitted 11 September, 2020; v1 submitted 7 August, 2020; originally announced August 2020.

    Comments: 10 pages, 8 Figures, Submitted to KDD 2020 Health Day

    Journal ref: KDD 2020 Health Day: AI for COVID, August 23-27, 2020, Virtual Conference, CA, US

  15. arXiv:1909.10416

    cs.CL

    Biomedical Mention Disambiguation using a Deep Learning Approach

    Authors: Chih-Hsuan Wei, Kyubum Lee, Robert Leaman, Zhiyong Lu

    Abstract: Automatically locating named entities in natural language text - named entity recognition - is an important task in the biomedical domain. Many named entity mentions are ambiguous between several bioconcept types, however, causing text spans to be annotated as more than one type when simultaneously recognizing multiple entity types. The straightforward solution is a rule-based approach applying a…

    Submitted 23 September, 2019; originally announced September 2019.