Showing 1–27 of 27 results for author: Ramsundar, B

  1. arXiv:2412.13519  [pdf, other]

    cs.LG q-bio.BM

    Open-Source Protein Language Models for Function Prediction and Protein Design

    Authors: Shivasankaran Vanaja Pandi, Bharath Ramsundar

    Abstract: Protein language models (PLMs) have shown promise in improving the understanding of protein sequences, contributing to advances in areas such as function prediction and protein engineering. However, training these models from scratch requires significant computational resources, limiting their accessibility. To address this, we integrate a PLM into DeepChem, an open-source framework for computatio…

    Submitted 18 December, 2024; originally announced December 2024.

    Comments: To be published in 4th Annual AAAI workshop on AI to Accelerate Science and Engineering

  2. arXiv:2411.19882  [pdf, other]

    cs.LG

    Open source Differentiable ODE Solving Infrastructure

    Authors: Rakshit Kr. Singh, Aaron Rock Menezes, Rida Irfan, Bharath Ramsundar

    Abstract: Ordinary Differential Equations (ODEs) are widely used in physics, chemistry, and biology to model dynamic systems, including reaction kinetics, population dynamics, and biological processes. In this work, we integrate GPU-accelerated ODE solvers into the open-source DeepChem framework, making these tools easily accessible. These solvers support multiple numerical methods and are fully differentia…

    Submitted 29 November, 2024; originally announced November 2024.

  3. arXiv:2411.11513  [pdf, other]

    q-bio.QM cs.LG

    A Modular Open Source Framework for Genomic Variant Calling

    Authors: Ankita Vaishnobi Bisoi, Bharath Ramsundar

    Abstract: Variant calling is a fundamental task in genomic research, essential for detecting genetic variations such as single nucleotide polymorphisms (SNPs) and insertions or deletions (indels). This paper presents an enhancement to DeepChem, a widely used open-source drug discovery framework, through the integration of DeepVariant. In particular, we introduce a variant calling pipeline that leverages Dee…

    Submitted 18 November, 2024; originally announced November 2024.

  4. arXiv:2409.08163  [pdf, other]

    cs.LG cs.CV q-bio.QM

    Open Source Infrastructure for Automatic Cell Segmentation

    Authors: Aaron Rock Menezes, Bharath Ramsundar

    Abstract: Automated cell segmentation is crucial for various biological and medical applications, facilitating tasks like cell counting, morphology analysis, and drug discovery. However, manual segmentation is time-consuming and prone to subjectivity, necessitating robust automated methods. This paper presents open-source infrastructure, utilizing the UNet model, a deep-learning architecture noted for its e…

    Submitted 12 September, 2024; originally announced September 2024.

  5. arXiv:2408.06261  [pdf, other]

    cs.LG cs.AI q-bio.BM

    Open-Source Molecular Processing Pipeline for Generating Molecules

    Authors: V Shreyas, Jose Siguenza, Karan Bania, Bharath Ramsundar

    Abstract: Generative models for molecules have shown considerable promise for use in computational chemistry, but remain difficult to use for non-experts. For this reason, we introduce open-source infrastructure for easily building generative molecular models into the widely used DeepChem [Ramsundar et al., 2019] library with the aim of creating a robust and reusable molecular generation pipeline. In partic…

    Submitted 28 November, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

    Comments: Presented at the Molecular Machine Learning Conference 2024 (MoML 2024), BayLearn 2024 and the Machine Learning and Physical Sciences (ML4PS) Workshop at NeurIPS 2024

  6. arXiv:2407.06209  [pdf, other]

    cs.LG

    Self-supervised Pretraining for Partial Differential Equations

    Authors: Varun Madhavan, Amal S Sebastian, Bharath Ramsundar, Venkatasubramanian Viswanathan

    Abstract: In this work, we describe a novel approach to building a neural PDE solver leveraging recent advances in transformer based neural network architectures. Our model can provide solutions for different values of PDE parameters without any need for retraining the network. The training is carried out in a self-supervised manner, similar to pretraining approaches applied in language and vision tasks. We…

    Submitted 3 July, 2024; originally announced July 2024.

  7. arXiv:2401.10287  [pdf, other]

    cs.LG physics.chem-ph

    Open-Source Fermionic Neural Networks with Ionic Charge Initialization

    Authors: Shai Pranesh, Shang Zhu, Venkat Viswanathan, Bharath Ramsundar

    Abstract: Finding accurate solutions to the electronic Schrödinger equation plays an important role in discovering important molecular and material energies and characteristics. Consequently, solving systems with large numbers of electrons has become increasingly important. Variational Monte Carlo (VMC) methods, especially those approximated through deep neural networks, are promising in this regard. In thi…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted at 3rd Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE)

  8. arXiv:2310.03047  [pdf, other]

    physics.chem-ph cs.LG

    Differentiable Modeling and Optimization of Battery Electrolyte Mixtures Using Geometric Deep Learning

    Authors: Shang Zhu, Bharath Ramsundar, Emil Annevelink, Hongyi Lin, Adarsh Dave, Pin-Wen Guan, Kevin Gering, Venkatasubramanian Viswanathan

    Abstract: Electrolytes play a critical role in designing next-generation battery systems, by allowing efficient ion transfer, preventing charge transfer, and stabilizing electrode-electrolyte interfaces. In this work, we develop a differentiable geometric deep learning (GDL) model for chemical mixtures, DiffMix, which is applied in guiding robotic experimentation and optimization towards fast-charging batte…

    Submitted 1 November, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

  9. arXiv:2309.15985  [pdf, other]

    cs.LG

    Open Source Infrastructure for Differentiable Density Functional Theory

    Authors: Advika Vidhyadhiraja, Arun Pa Thiagarajan, Shang Zhu, Venkat Viswanathan, Bharath Ramsundar

    Abstract: Learning exchange correlation functionals, used in quantum chemistry calculations, from data has become increasingly important in recent years, but training such a functional requires sophisticated software infrastructure. For this reason, we build open source infrastructure to train neural exchange correlation functionals. We aim to standardize the processing pipeline by adapting state-of-the-art…

    Submitted 27 September, 2023; originally announced September 2023.

  10. arXiv:2209.01712  [pdf, other]

    cs.LG cs.AI q-bio.BM

    ChemBERTa-2: Towards Chemical Foundation Models

    Authors: Walid Ahmad, Elana Simon, Seyone Chithrananda, Gabriel Grand, Bharath Ramsundar

    Abstract: Large pretrained models such as GPT-3 have had tremendous impact on modern natural language processing by leveraging self-supervised learning to learn salient representations that can be used to readily finetune on a wide variety of downstream tasks. We investigate the possibility of transferring such advances to molecular machine learning by building a chemical foundation model, ChemBERTa-2, usin…

    Submitted 4 September, 2022; originally announced September 2022.

    Comments: ELLIS Machine Learning for Molecule Discovery Workshop

    ACM Class: I.2.7; I.2.1; J.2; J.3

  11. arXiv:2203.04698  [pdf, other]

    cs.LG q-bio.QM

    Score-Based Generative Models for Molecule Generation

    Authors: Dwaraknath Gnaneshwar, Bharath Ramsundar, Dhairya Gandhi, Rachel Kurchin, Venkatasubramanian Viswanathan

    Abstract: Recent advances in generative models have made exploring design spaces easier for de novo molecule generation. However, popular generative models like GANs and normalizing flows face challenges such as training instabilities due to adversarial training and architectural constraints, respectively. Score-based generative models sidestep these challenges by modelling the gradient of the log probabili…

    Submitted 7 March, 2022; originally announced March 2022.

  12. arXiv:2201.12419  [pdf, other]

    physics.chem-ph cs.LG

    FastFlows: Flow-Based Models for Molecular Graph Generation

    Authors: Nathan C. Frey, Vijay Gadepally, Bharath Ramsundar

    Abstract: We propose a framework using normalizing-flow based models, SELF-Referencing Embedded Strings, and multi-objective optimization that efficiently generates small molecules. With an initial training set of only 100 small molecules, FastFlows generates thousands of chemically valid molecules in seconds. Because of the efficient sampling, substructure filters can be applied as desired to eliminate com…

    Submitted 28 January, 2022; originally announced January 2022.

    Comments: 7 pages, 4 figures, ELLIS Machine Learning for Molecule Discovery Workshop 2021

  13. arXiv:2112.04977  [pdf, other]

    cs.LG cond-mat.mtrl-sci physics.chem-ph

    Bringing Atomistic Deep Learning to Prime Time

    Authors: Nathan C. Frey, Siddharth Samsi, Bharath Ramsundar, Connor W. Coley, Vijay Gadepally

    Abstract: Artificial intelligence has not yet revolutionized the design of materials and molecules. In this perspective, we identify four barriers preventing the integration of atomistic deep learning, molecular science, and high-performance computing. We outline focused research efforts to address the opportunities presented by these challenges.

    Submitted 9 December, 2021; originally announced December 2021.

    Comments: 6 pages, 1 figure, NeurIPS 2021 AI for Science workshop

  14. arXiv:2109.07573  [pdf, other]

    cs.LG physics.chem-ph

    Differentiable Physics: A Position Piece

    Authors: Bharath Ramsundar, Dilip Krishnamurthy, Venkatasubramanian Viswanathan

    Abstract: Differentiable physics provides a new approach for modeling and understanding the physical systems by pairing the new technology of differentiable programming with classical numerical methods for physical simulation. We survey the rapidly growing literature of differentiable physics techniques and highlight methods for parameter estimation, learning representations, solving differential equations,…

    Submitted 14 September, 2021; originally announced September 2021.

    Comments: 12 pages, 1 figure

  15. arXiv:2011.04426  [pdf, other]

    cond-mat.mtrl-sci cs.LG

    AutoMat: Accelerated Computational Electrochemical systems Discovery

    Authors: Emil Annevelink, Rachel Kurchin, Eric Muckley, Lance Kavalsky, Vinay I. Hegde, Valentin Sulzer, Shang Zhu, Jiankun Pu, David Farina, Matthew Johnson, Dhairya Gandhi, Adarsh Dave, Hongyi Lin, Alan Edelman, Bharath Ramsundar, James Saal, Christopher Rackauckas, Viral Shah, Bryce Meredig, Venkatasubramanian Viswanathan

    Abstract: Large-scale electrification is vital to addressing the climate crisis, but several scientific and technological challenges remain to fully electrify both the chemical industry and transportation. In both of these areas, new electrochemical materials will be critical, but their development currently relies heavily on human-time-intensive experimental trial and error and computationally expensive fi…

    Submitted 13 May, 2022; v1 submitted 3 November, 2020; originally announced November 2020.

    Comments: v1-3:4 pages, 1 figure, accepted to NeurIPS Climate Change and AI Workshop 2020, updating acknowledgements and citations v4: substantially updated content and author list, accepted to MRS Bulletin

  16. arXiv:2010.09885  [pdf, other]

    cs.LG cs.CL physics.chem-ph q-bio.BM

    ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction

    Authors: Seyone Chithrananda, Gabriel Grand, Bharath Ramsundar

    Abstract: GNNs and chemical fingerprints are the predominant approaches to representing molecules for property prediction. However, in NLP, transformers have become the de-facto standard for representation learning thanks to their strong downstream task transfer. In parallel, the software ecosystem around transformers is maturing rapidly, with libraries like HuggingFace and BertViz enabling streamlined trai…

    Submitted 23 October, 2020; v1 submitted 19 October, 2020; originally announced October 2020.

    Comments: Submitted to NeurIPS 2020 ML for Molecules Workshop

    ACM Class: I.2.7; I.2.1; J.2; J.3

  17. arXiv:1911.05211  [pdf, other]

    q-bio.QM cs.LG stat.ML

    AMPL: A Data-Driven Modeling Pipeline for Drug Discovery

    Authors: Amanda J. Minnich, Kevin McLoughlin, Margaret Tse, Jason Deng, Andrew Weber, Neha Murad, Benjamin D. Madej, Bharath Ramsundar, Tom Rush, Stacie Calad-Thomson, Jim Brase, Jonathan E. Allen

    Abstract: One of the key requirements for incorporating machine learning into the drug discovery process is complete reproducibility and traceability of the model building and evaluation process. With this in mind, we have developed an end-to-end modular and extensible software pipeline for building and sharing machine learning models that predict key pharma-relevant parameters. The ATOM Modeling PipeLine,…

    Submitted 13 November, 2019; v1 submitted 12 November, 2019; originally announced November 2019.

  18. arXiv:1907.01489  [pdf, other]

    cs.CR

    Secure Computation in Decentralized Data Markets

    Authors: Fattaneh Bayatbabolghani, Bharath Ramsundar

    Abstract: Decentralized data markets gather data from many contributors to create a joint data cooperative governed by market stakeholders. The ability to perform secure computation on decentralized data markets would allow for useful insights to be gained while respecting the privacy of data contributors. In this paper, we design secure protocols for such computation by utilizing secure multi-party computa…

    Submitted 2 July, 2019; originally announced July 2019.

    Comments: 13 pages, 2 figures

  19. arXiv:1806.00139  [pdf, other]

    cs.CR

    Tokenized Data Markets

    Authors: Bharath Ramsundar, Roger Chen, Alok Vasudev, Rob Robbins, Artur Gorokh

    Abstract: We formalize the construction of decentralized data markets by introducing the mathematical construction of tokenized data structures, a new form of incentivized data structure. These structures both specialize and extend past work on token curated registries and distributed data structures. They provide a unified model for reasoning about complex data structures assembled by multiple agents with…

    Submitted 31 May, 2018; originally announced June 2018.

  20. arXiv:1803.04465  [pdf, other]

    cs.LG

    PotentialNet for Molecular Property Prediction

    Authors: Evan N. Feinberg, Debnil Sur, Zhenqin Wu, Brooke E. Husic, Huanghao Mai, Yang Li, Saisai Sun, Jianyi Yang, Bharath Ramsundar, Vijay S. Pande

    Abstract: The arc of drug discovery entails a multiparameter optimization problem spanning vast length scales. The key parameters range from solubility (angstroms) to protein-ligand binding (nanometers) to in vivo toxicity (meters). Through feature learning---instead of feature engineering---deep neural networks promise to outperform both traditional physics-based and knowledge-based machine learning model…

    Submitted 22 October, 2018; v1 submitted 12 March, 2018; originally announced March 2018.

    Comments: 13 pages, 5 figures, 8 tables

  21. arXiv:1706.01643  [pdf]

    cs.LG q-bio.QM stat.ML

    Retrosynthetic reaction prediction using neural sequence-to-sequence models

    Authors: Bowen Liu, Bharath Ramsundar, Prasad Kawthekar, Jade Shi, Joseph Gomes, Quang Luu Nguyen, Stephen Ho, Jack Sloane, Paul Wender, Vijay Pande

    Abstract: We describe a fully data driven model that learns to perform a retrosynthetic reaction prediction task, which is treated as a sequence-to-sequence mapping problem. The end-to-end trained model has an encoder-decoder architecture that consists of two recurrent neural networks, which has previously shown great success in solving other sequence-to-sequence prediction tasks such as machine translation…

    Submitted 6 June, 2017; originally announced June 2017.

  22. arXiv:1703.10603  [pdf, other]

    cs.LG physics.chem-ph stat.ML

    Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity

    Authors: Joseph Gomes, Bharath Ramsundar, Evan N. Feinberg, Vijay S. Pande

    Abstract: Empirical scoring functions based on either molecular force fields or cheminformatics descriptors are widely used, in conjunction with molecular docking, during the early stages of drug discovery to predict potency and binding affinity of a drug-like molecule to a given target. These models require expert-level knowledge of physical chemistry and biology to be encoded as hand-tuned parameters or f…

    Submitted 30 March, 2017; originally announced March 2017.

  23. arXiv:1703.00564  [pdf, other]

    cs.LG physics.chem-ph stat.ML

    MoleculeNet: A Benchmark for Molecular Machine Learning

    Authors: Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, Vijay Pande

    Abstract: Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are be…

    Submitted 25 October, 2018; v1 submitted 1 March, 2017; originally announced March 2017.

  24. arXiv:1611.03199  [pdf]

    cs.LG stat.ML

    Low Data Drug Discovery with One-shot Learning

    Authors: Han Altae-Tran, Bharath Ramsundar, Aneesh S. Pappu, Vijay Pande

    Abstract: Recent advances in machine learning have made significant contributions to drug discovery. Deep neural networks in particular have been demonstrated to provide significant boosts in predictive power when inferring the properties and activities of small-molecule compounds. However, the applicability of these techniques has been limited by the requirement for large amounts of training data. In this…

    Submitted 10 November, 2016; originally announced November 2016.

  25. arXiv:1610.01642  [pdf]

    stat.ML cs.LG

    Learning Protein Dynamics with Metastable Switching Systems

    Authors: Bharath Ramsundar, Vijay S. Pande

    Abstract: We introduce a machine learning approach for extracting fine-grained representations of protein evolution from molecular dynamics datasets. Metastable switching linear dynamical systems extend standard switching models with a physically-inspired stability constraint. This constraint enables the learning of nuanced representations of protein dynamics that closely match physical reality. We derive a…

    Submitted 5 October, 2016; originally announced October 2016.

  26. arXiv:1502.02072  [pdf, other]

    stat.ML cs.LG cs.NE

    Massively Multitask Networks for Drug Discovery

    Authors: Bharath Ramsundar, Steven Kearnes, Patrick Riley, Dale Webster, David Konerding, Vijay Pande

    Abstract: Massively multitask neural architectures provide a learning framework for drug discovery that synthesizes information from many distinct biological sources. To train these architectures at scale, we gather large amounts of data from public sources to create a dataset of nearly 40 million measurements across more than 200 biological targets. We investigate several aspects of the multitask framework…

    Submitted 6 February, 2015; originally announced February 2015.

    Comments: Preliminary work. Under review by the International Conference on Machine Learning (ICML)

  27. arXiv:1305.1704  [pdf, other]

    stat.ML cs.AI

    The Extended Parameter Filter

    Authors: Yusuf Erol, Lei Li, Bharath Ramsundar, Stuart J. Russell

    Abstract: The parameters of temporal models, such as dynamic Bayesian networks, may be modelled in a Bayesian context as static or atemporal variables that influence transition probabilities at every time step. Particle filters fail for models that include such variables, while methods that use Gibbs sampling of parameter variables may incur a per-sample cost that grows linearly with the length of the obser…

    Submitted 7 May, 2013; originally announced May 2013.

    Report number: UCB/EECS-2013-48

    Journal ref: ICML 2013