UniMoT: Unified Molecule-Text Language Model with Discrete Token Representation

Guo, Shuhan; Bian, Yatao; Wang, Ruibing; Yin, Nan; Wang, Zhen; Yao, Quanming

arXiv:2408.00863 (cs)

[Submitted on 1 Aug 2024 (v1), last revised 21 Jun 2025 (this version, v2)]

Abstract: The remarkable success of Large Language Models (LLMs) across diverse tasks has driven the research community to extend their capabilities to molecular applications. However, most molecular LLMs employ adapter-based architectures that do not treat molecule and text modalities equally and lack a supervision signal for the molecule modality. To address these issues, we introduce UniMoT, a Unified Molecule-Text LLM adopting a tokenizer-based architecture that expands the LLM's vocabulary with molecule tokens. Specifically, we introduce a Vector Quantization-driven tokenizer that incorporates a Q-Former to bridge the modality gap between molecule and text. This tokenizer transforms molecules into sequences of molecule tokens with causal dependency, encapsulating high-level molecular and textual information. Equipped with this tokenizer, UniMoT can unify molecule and text modalities under a shared token representation and an autoregressive training paradigm, enabling it to interpret molecules as a foreign language and generate them as text. Following a four-stage training scheme, UniMoT emerges as a multi-modal generalist capable of performing both molecule-to-text and text-to-molecule tasks. Extensive experiments demonstrate that UniMoT achieves state-of-the-art performance across a wide range of molecule comprehension and generation tasks.
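
To make the tokenizer idea in the abstract concrete, the following is a minimal sketch of vector quantization followed by vocabulary expansion. It is not UniMoT's implementation: the codebook values, the text vocabulary size, and the function names quantize and to_vocab_ids are hypothetical, and the learned parts the abstract mentions (the Q-Former and the training of the codebook) are left out. It only shows how continuous vectors can be mapped to discrete codes that sit after the text tokens in a shared vocabulary.

```python
from typing import List, Sequence


def quantize(vectors: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each vector to the index of its nearest codebook entry (squared L2)."""
    def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sqdist(v, codebook[k]))
        for v in vectors
    ]


def to_vocab_ids(codes: Sequence[int], text_vocab_size: int) -> List[int]:
    """Shift codebook indices so molecule tokens follow the text tokens."""
    return [text_vocab_size + c for c in codes]


# Toy values, chosen only for illustration (assumptions, not from the paper).
TEXT_VOCAB_SIZE = 32000
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Stand-in for the continuous vectors a molecule encoder would produce.
query_outputs = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.1]]

codes = quantize(query_outputs, codebook)
mol_token_ids = to_vocab_ids(codes, TEXT_VOCAB_SIZE)
print(codes)          # [1, 2, 0]
print(mol_token_ids)  # [32001, 32002, 32000]
```

Under these assumptions, the resulting ids could be interleaved with ordinary text token ids and trained with the same next-token objective, which is the sense in which a shared token representation treats both modalities alike.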

Comments:	IJCAI 2025
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2408.00863 [cs.CL]
	(or arXiv:2408.00863v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2408.00863

Submission history

From: Quanming Yao
[v1] Thu, 1 Aug 2024 18:31:31 UTC (350 KB)
[v2] Sat, 21 Jun 2025 08:13:11 UTC (362 KB)
