Showing 1–13 of 13 results for author: Ramsundar, B

Searching in archive q-bio.
  1. arXiv:2507.07060  [pdf, ps, other]

    q-bio.QM cs.AI cs.CL cs.LG q-bio.BM q-bio.MN

    DeepRetro: Retrosynthetic Pathway Discovery using Iterative LLM Reasoning

    Authors: Shreyas Vinaya Sathyanarayana, Sharanabasava D. Hiremath, Rahil Shah, Rishikesh Panda, Rahul Jana, Riya Singh, Rida Irfan, Ashwin Murali, Bharath Ramsundar

    Abstract: The synthesis of complex natural products remains one of the grand challenges of organic chemistry. We present DeepRetro, a major advancement in computational retrosynthesis that enables the discovery of viable synthetic routes for complex molecules typically considered beyond the reach of existing retrosynthetic methods. DeepRetro is a novel, open-source framework that tightly integrates large la…

    Submitted 19 August, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 64 pages

  2. arXiv:2412.13519  [pdf, other]

    cs.LG q-bio.BM

    Open-Source Protein Language Models for Function Prediction and Protein Design

    Authors: Shivasankaran Vanaja Pandi, Bharath Ramsundar

    Abstract: Protein language models (PLMs) have shown promise in improving the understanding of protein sequences, contributing to advances in areas such as function prediction and protein engineering. However, training these models from scratch requires significant computational resources, limiting their accessibility. To address this, we integrate a PLM into DeepChem, an open-source framework for computatio…

    Submitted 18 December, 2024; originally announced December 2024.

    Comments: To be published in 4th Annual AAAI workshop on AI to Accelerate Science and Engineering

  3. arXiv:2412.08658  [pdf, other]

    cond-mat.soft cond-mat.mtrl-sci q-bio.BM q-bio.MN

    Open-source Polymer Generative Pipeline

    Authors: Debasish Mohanty, V Shreyas, Akshaya Palai, Bharath Ramsundar

    Abstract: Polymers play a crucial role in the development of engineering materials, with applications ranging from mechanical to biomedical fields. However, the limited polymerization processes constrain the variety of organic building blocks that can be experimentally tested. We propose an open-source computational generative pipeline that integrates neural-network-based discriminators, generators, and que…

    Submitted 29 November, 2024; originally announced December 2024.

  4. arXiv:2411.11513  [pdf, ps, other]

    q-bio.QM cs.LG

    A Modular Open Source Framework for Genomic Variant Calling

    Authors: Ankita Vaishnobi Bisoi, Shreyas V, Jose Siguenza, Bharath Ramsundar

    Abstract: Variant calling is a fundamental task in genomic research, essential for detecting genetic variations such as single nucleotide polymorphisms (SNPs) and insertions or deletions (indels). This paper presents an enhancement to DeepChem, a widely used open-source drug discovery framework, through the integration of DeepVariant. In particular, we introduce a variant calling pipeline that leverages Dee…

    Submitted 28 July, 2025; v1 submitted 18 November, 2024; originally announced November 2024.

  5. arXiv:2409.08163  [pdf, other]

    cs.LG cs.CV q-bio.QM

    Open Source Infrastructure for Automatic Cell Segmentation

    Authors: Aaron Rock Menezes, Bharath Ramsundar

    Abstract: Automated cell segmentation is crucial for various biological and medical applications, facilitating tasks like cell counting, morphology analysis, and drug discovery. However, manual segmentation is time-consuming and prone to subjectivity, necessitating robust automated methods. This paper presents open-source infrastructure, utilizing the UNet model, a deep-learning architecture noted for its e…

    Submitted 12 September, 2024; originally announced September 2024.

  6. arXiv:2408.06261  [pdf, other]

    cs.LG cs.AI q-bio.BM

    Open-Source Molecular Processing Pipeline for Generating Molecules

    Authors: V Shreyas, Jose Siguenza, Karan Bania, Bharath Ramsundar

    Abstract: Generative models for molecules have shown considerable promise for use in computational chemistry, but remain difficult to use for non-experts. For this reason, we introduce open-source infrastructure for easily building generative molecular models into the widely used DeepChem [Ramsundar et al., 2019] library with the aim of creating a robust and reusable molecular generation pipeline. In partic…

    Submitted 28 November, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

    Comments: Presented at the Molecular Machine Learning Conference 2024 (MoML 2024), BayLearn 2024 and the Machine Learning and Physical Sciences (ML4PS) Workshop at NeurIPS 2024

  7. arXiv:2209.01712  [pdf, other]

    cs.LG cs.AI q-bio.BM

    ChemBERTa-2: Towards Chemical Foundation Models

    Authors: Walid Ahmad, Elana Simon, Seyone Chithrananda, Gabriel Grand, Bharath Ramsundar

    Abstract: Large pretrained models such as GPT-3 have had tremendous impact on modern natural language processing by leveraging self-supervised learning to learn salient representations that can be used to readily finetune on a wide variety of downstream tasks. We investigate the possibility of transferring such advances to molecular machine learning by building a chemical foundation model, ChemBERTa-2, usin…

    Submitted 4 September, 2022; originally announced September 2022.

    Comments: ELLIS Machine Learning for Molecule Discovery Workshop

    ACM Class: I.2.7; I.2.1; J.2; J.3

  8. arXiv:2203.04698  [pdf, other]

    cs.LG q-bio.QM

    Score-Based Generative Models for Molecule Generation

    Authors: Dwaraknath Gnaneshwar, Bharath Ramsundar, Dhairya Gandhi, Rachel Kurchin, Venkatasubramanian Viswanathan

    Abstract: Recent advances in generative models have made exploring design spaces easier for de novo molecule generation. However, popular generative models like GANs and normalizing flows face challenges such as training instabilities due to adversarial training and architectural constraints, respectively. Score-based generative models sidestep these challenges by modelling the gradient of the log probabili…

    Submitted 7 March, 2022; originally announced March 2022.

  9. Identification and Development of Therapeutics for COVID-19

    Authors: Halie M. Rando, Nils Wellhausen, Soumita Ghosh, Alexandra J. Lee, Anna Ada Dattoli, Fengling Hu, James Brian Byrd, Diane N. Rafizadeh, Ronan Lordan, Yanjun Qi, Yuchen Sun, Christian Brueffer, Jeffrey M. Field, Marouen Ben Guebila, Nafisa M. Jadavji, Ashwin N. Skelly, Bharath Ramsundar, Jinhui Wang, Rishi Raj Goel, YoSon Park, the COVID-19 Review Consortium, Simina M. Boca, Anthony Gitter, Casey S. Greene

    Abstract: After emerging in China in late 2019, the novel Severe acute respiratory syndrome-like coronavirus 2 (SARS-CoV-2) spread worldwide and as of early 2021, continues to significantly impact most countries. Only a small number of coronaviruses are known to infect humans, and only two are associated with the severe outcomes associated with SARS-CoV-2: Severe acute respiratory syndrome-related coronavir…

    Submitted 10 September, 2021; v1 submitted 3 March, 2021; originally announced March 2021.

  10. arXiv:2010.09885  [pdf, other]

    cs.LG cs.CL physics.chem-ph q-bio.BM

    ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction

    Authors: Seyone Chithrananda, Gabriel Grand, Bharath Ramsundar

    Abstract: GNNs and chemical fingerprints are the predominant approaches to representing molecules for property prediction. However, in NLP, transformers have become the de-facto standard for representation learning thanks to their strong downstream task transfer. In parallel, the software ecosystem around transformers is maturing rapidly, with libraries like HuggingFace and BertViz enabling streamlined trai…

    Submitted 23 October, 2020; v1 submitted 19 October, 2020; originally announced October 2020.

    Comments: Submitted to NeurIPS 2020 ML for Molecules Workshop

    ACM Class: I.2.7; I.2.1; J.2; J.3

  11. arXiv:1911.05211  [pdf, other]

    q-bio.QM cs.LG stat.ML

    AMPL: A Data-Driven Modeling Pipeline for Drug Discovery

    Authors: Amanda J. Minnich, Kevin McLoughlin, Margaret Tse, Jason Deng, Andrew Weber, Neha Murad, Benjamin D. Madej, Bharath Ramsundar, Tom Rush, Stacie Calad-Thomson, Jim Brase, Jonathan E. Allen

    Abstract: One of the key requirements for incorporating machine learning into the drug discovery process is complete reproducibility and traceability of the model building and evaluation process. With this in mind, we have developed an end-to-end modular and extensible software pipeline for building and sharing machine learning models that predict key pharma-relevant parameters. The ATOM Modeling PipeLine,…

    Submitted 13 November, 2019; v1 submitted 12 November, 2019; originally announced November 2019.

  12. arXiv:1706.01643  [pdf]

    cs.LG q-bio.QM stat.ML

    Retrosynthetic reaction prediction using neural sequence-to-sequence models

    Authors: Bowen Liu, Bharath Ramsundar, Prasad Kawthekar, Jade Shi, Joseph Gomes, Quang Luu Nguyen, Stephen Ho, Jack Sloane, Paul Wender, Vijay Pande

    Abstract: We describe a fully data driven model that learns to perform a retrosynthetic reaction prediction task, which is treated as a sequence-to-sequence mapping problem. The end-to-end trained model has an encoder-decoder architecture that consists of two recurrent neural networks, which has previously shown great success in solving other sequence-to-sequence prediction tasks such as machine translation…

    Submitted 6 June, 2017; originally announced June 2017.

  13. arXiv:1405.1444  [pdf, other]

    q-bio.BM stat.AP stat.ML

    Understanding Protein Dynamics with L1-Regularized Reversible Hidden Markov Models

    Authors: Robert T. McGibbon, Bharath Ramsundar, Mohammad M. Sultan, Gert Kiss, Vijay S. Pande

    Abstract: We present a machine learning framework for modeling protein dynamics. Our approach uses L1-regularized, reversible hidden Markov models to understand large protein datasets generated via molecular dynamics simulations. Our model is motivated by three design principles: (1) the requirement of massive scalability; (2) the need to adhere to relevant physical law; and (3) the necessity of providing a…

    Submitted 6 May, 2014; originally announced May 2014.

    Journal ref: Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014