MolTRES: Improving Chemical Language Representation Learning for Molecular Property Prediction

Park, Jun-Hyung; Kim, Yeachan; Lee, Mingyu; Park, Hyuntae; Lee, SangKeun


arXiv:2408.01426 (physics)

[Submitted on 9 Jul 2024]


Authors: Jun-Hyung Park, Yeachan Kim, Mingyu Lee, Hyuntae Park, SangKeun Lee (Korea University)


Abstract: Chemical representation learning has gained increasing interest due to the limited availability of supervised data in fields such as drug and materials design. This interest particularly extends to chemical language representation learning, which involves pre-training Transformers on SMILES sequences, i.e., textual descriptors of molecules. Despite its success in molecular property prediction, current practices often lead to overfitting and limited scalability due to early convergence. In this paper, we introduce a novel chemical language representation learning framework, called MolTRES, to address these issues. MolTRES incorporates generator-discriminator training, allowing the model to learn from more challenging examples that require structural understanding. In addition, we enrich molecular representations by integrating external materials embeddings, thereby transferring knowledge from scientific literature. Experimental results show that our model outperforms existing state-of-the-art models on popular molecular property prediction tasks.
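In ELECTRA-style pre-training, generator-discriminator training is commonly framed as replaced-token detection. A generator fills in masked positions, and a discriminator then predicts, for every token, whether it was replaced. The sketch below shows only how such training examples could be built from a SMILES string. It is a hypothetical illustration, not the MolTRES implementation: the abstract does not specify the masking strategy or the generator architecture. The simplified tokenizer regex, the uniform-sampling stand-in for the generator, and the names `tokenize_smiles` and `make_rtd_example` are all assumptions.

```python
import random
import re

# Simplified SMILES tokenizer (illustrative assumption): bracket atoms,
# two-letter halogens, organic-subset atoms, bonds/branches, ring closures.
SMILES_TOKEN_RE = re.compile(
    r"(\[[^\]]+\]|Br|Cl|@@|[BCNOPSFIbcnops]|[=#\-+()/\\.~:@?*$]|%\d{2}|\d)"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens, rejecting uncovered characters."""
    tokens = SMILES_TOKEN_RE.findall(smiles)
    # Every character must belong to some token, otherwise the regex skipped it.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable SMILES: {smiles!r}")
    return tokens


def make_rtd_example(
    tokens: list[str], vocab: list[str], mask_prob: float, rng: random.Random
) -> tuple[list[str], list[int]]:
    """Build (corrupted tokens, labels); label 1 marks a replaced token."""
    corrupted = list(tokens)
    labels = [0] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            # Hypothetical generator: uniform sampling from the vocabulary.
            # In ELECTRA-style training this would be a small masked LM.
            proposal = rng.choice(vocab)
            corrupted[i] = proposal
            # A sampled token identical to the original counts as "original" (0).
            labels[i] = int(proposal != tok)
    return corrupted, labels


if __name__ == "__main__":
    rng = random.Random(0)
    toks = tokenize_smiles("CC(=O)O")  # acetic acid
    corrupted, labels = make_rtd_example(
        toks, vocab=["C", "N", "O", "(", ")", "="], mask_prob=0.15, rng=rng
    )
    print(toks, corrupted, labels, sep="\n")
```

The discriminator is then trained to predict `labels` from `corrupted`. Because it receives a loss signal at every position rather than only at masked ones, this objective is often credited with more sample-efficient pre-training than plain masked language modeling.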

Comments:	12 pages, 5 figures, submitted to EMNLP 2024 main track
Subjects:	Chemical Physics (physics.chem-ph); Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)
ACM classes:	I.2.7
Cite as:	arXiv:2408.01426 [physics.chem-ph]
	(or arXiv:2408.01426v1 [physics.chem-ph] for this version)
	https://doi.org/10.48550/arXiv.2408.01426

Submission history

From: Jun-Hyung Park
[v1] Tue, 9 Jul 2024 01:14:28 UTC (596 KB)
