Protein Language Model-Powered 3D Ligand Binding Site Prediction from Protein Sequence

Zhang, Shuo; Xie, Lei

Quantitative Biology > Quantitative Methods

arXiv:2312.03016 (q-bio)

[Submitted on 5 Dec 2023]

Abstract: Predicting the ligand binding sites of proteins is a fundamental task for understanding protein function and for screening potential drugs. Most existing methods require experimentally determined protein holo-structures as input. However, such structures may be unavailable for novel or less-studied proteins. To address this limitation, we propose LaMPSite, which takes only protein sequences and ligand molecular graphs as input to predict ligand binding sites. The protein sequences are used to retrieve residue-level embeddings and contact maps from the pre-trained ESM-2 protein language model. The ligand molecular graphs are fed into a graph neural network to compute atom-level embeddings. We then compute and update a protein-ligand interaction embedding from the protein residue-level embeddings and the ligand atom-level embeddings, subject to the geometric constraints in the inferred protein contact map and the ligand distance map. A final pooling over the protein-ligand interaction embedding indicates which residues belong to the binding site. Without any 3D coordinate information of proteins, our model achieves binding site prediction performance competitive with baseline methods that require 3D protein structures. Given that fewer than 50% of proteins currently have reliable structure information, LaMPSite will provide new opportunities for drug discovery.
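
The abstract describes the pipeline only at a high level, so the following is a minimal sketch of its data flow on toy inputs, not LaMPSite's published architecture. Several assumptions are made loudly here: hand-written vectors stand in for the ESM-2 residue embeddings and the GNN atom embeddings; the protein contact map is given as a matrix of contact probabilities; the interaction embedding is assumed to be an elementwise product of a residue embedding and an atom embedding; the geometric update is assumed to be a single weighted-averaging step in which residue-atom pairs borrow evidence from pairs that are close on both the protein side (contact map) and the ligand side (an assumed exp(-distance) proximity weight); and pooling is assumed to be a channel sum, a max over ligand atoms, and a sigmoid. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]


def interaction_embedding(res_emb: List[Vec], atom_emb: List[Vec]) -> List[List[Vec]]:
    """z[i][j] = elementwise product of residue i and ligand atom j embeddings (assumed form)."""
    return [[[a * b for a, b in zip(r, l)] for l in atom_emb] for r in res_emb]


def geometric_update(z: List[List[Vec]], contact: Mat, lig_dist: Mat) -> List[List[Vec]]:
    """One hypothetical update round: add to z[i][j] the weighted mean of all z[k][l],
    with weight contact[i][k] * exp(-lig_dist[j][l])."""
    n, m, d = len(z), len(z[0]), len(z[0][0])
    lig_w = [[math.exp(-dist) for dist in row] for row in lig_dist]  # assumed proximity kernel
    updated = []
    for i in range(n):
        row = []
        for j in range(m):
            acc, total = [0.0] * d, 0.0
            for k in range(n):
                for l in range(m):
                    w = contact[i][k] * lig_w[j][l]
                    if w == 0.0:
                        continue  # pairs not in contact contribute nothing
                    total += w
                    for c in range(d):
                        acc[c] += w * z[k][l][c]
            mean = [a / total for a in acc] if total > 0 else [0.0] * d
            row.append([z[i][j][c] + mean[c] for c in range(d)])
        updated.append(row)
    return updated


def pool_residue_scores(z: List[List[Vec]]) -> List[float]:
    """Sum channels, max-pool over ligand atoms, then squash to a per-residue probability."""
    return [1.0 / (1.0 + math.exp(-max(sum(v) for v in atoms))) for atoms in z]


def predict_binding_site(res_emb: List[Vec], atom_emb: List[Vec], contact: Mat,
                         lig_dist: Mat, threshold: float = 0.5) -> Tuple[List[float], List[int]]:
    """Run the toy pipeline and return per-residue probabilities and predicted site residues."""
    z = interaction_embedding(res_emb, atom_emb)
    z = geometric_update(z, contact, lig_dist)
    probs = pool_residue_scores(z)
    return probs, [i for i, p in enumerate(probs) if p >= threshold]


# Toy inputs (hypothetical): 3 residues, 2 ligand atoms, 2-dimensional embeddings.
res_emb = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
atom_emb = [[1.0, 0.0], [0.8, 0.2]]
contact = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # residues 0 and 1 are in contact
lig_dist = [[0.0, 1.5], [1.5, 0.0]]                             # ligand distance map

probs, site = predict_binding_site(res_emb, atom_emb, contact, lig_dist)
print(site)  # [0, 1]
```

Multiplying the contact probability by a ligand proximity weight is just one simple way to encode the "geometric constraints" the abstract mentions: a residue-atom pair is reinforced only by pairs that are near it on both the protein and the ligand side, so residue 2, which contacts only itself and aligns poorly with the ligand, stays below the threshold.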

Comments:	Accepted by the AI for Science (AI4Science) Workshop and the New Frontiers of AI for Drug Discovery and Development (AI4D3) Workshop at NeurIPS 2023
Subjects:	Quantitative Methods (q-bio.QM); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2312.03016 [q-bio.QM]
	(or arXiv:2312.03016v1 [q-bio.QM] for this version)
	https://doi.org/10.48550/arXiv.2312.03016

Submission history

From: Shuo Zhang
[v1] Tue, 5 Dec 2023 01:47:38 UTC (1,743 KB)