A Systematic Study of Joint Representation Learning on Protein Sequences and Structures

Zhang, Zuobai; Wang, Chuanrui; Xu, Minghao; Chenthamarakshan, Vijil; Lozano, Aurélie; Das, Payel; Tang, Jian

arXiv:2303.06275 (q-bio)

[Submitted on 11 Mar 2023 (v1), last revised 18 Oct 2023 (this version, v2)]

Abstract: Learning effective protein representations is critical for a variety of tasks in biology, such as predicting protein functions. Recent sequence representation learning methods based on Protein Language Models (PLMs) excel in sequence-based tasks, but their direct adaptation to tasks involving protein structures remains a challenge. In contrast, structure-based methods, which leverage 3D structural information with graph neural networks and geometric pre-training, show potential in function prediction tasks but still suffer from the limited number of available structures. To bridge this gap, our study undertakes a comprehensive exploration of joint protein representation learning by integrating a state-of-the-art PLM (ESM-2) with distinct structure encoders (GVP, GearNet, CDConv). We introduce three representation fusion strategies and explore different pre-training techniques. Our method achieves significant improvements over existing sequence- and structure-based methods, setting a new state of the art for function annotation. This study underscores several important design choices for fusing protein sequence and structure information. Our implementation is available at this https URL.

Subjects:	Quantitative Methods (q-bio.QM); Machine Learning (cs.LG)
Cite as:	arXiv:2303.06275 [q-bio.QM]
	(or arXiv:2303.06275v2 [q-bio.QM] for this version)
	https://doi.org/10.48550/arXiv.2303.06275

Submission history

From: Zuobai Zhang
[v1] Sat, 11 Mar 2023 01:24:10 UTC (734 KB)
[v2] Wed, 18 Oct 2023 16:11:11 UTC (7,564 KB)
