Data Metabolism: An Efficient Data Design Schema For Vision Language Model

Zhang, Jingyuan; Zhang, Hongzhi; Haonan, Zhou; Sun, Chenxi; ji, Xingguang; Wang, Jiakang; Kong, Fanheng; Liu, Yahui; Wang, Qi; Zhang, Fuzheng

arXiv:2504.12316 (cs)

[Submitted on 10 Apr 2025]

Abstract: Data curation plays a crucial role in training powerful Visual Language Models (VLMs). In this work, we introduce the concept of Data Metabolism and present our data-centric framework for building VLMs throughout the development lifecycle. Starting from a standard model architecture, we discuss and provide insights into two crucial development steps, data curation and iteration, which form a closed-loop system that continuously improves model performance. We present a detailed codebook on how to process existing massive datasets and build a user-specific data flywheel. As a demonstration, we release a VLM named Capybara-VL, which excels in typical multimodal tasks (e.g., visual question answering, scientific reasoning, and text-rich tasks). Despite its relatively compact size, Capybara-VL surpasses several open-source models that are up to 10 times larger. Moreover, it achieves results on par with those of several leading proprietary models. These results highlight the power of our data-centric framework and the potential of training smaller and more efficient VLMs.

Comments:	To be presented at ICLR 2025, First Workshop on Open Science for Foundation Models
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2504.12316 [cs.CL]
	(or arXiv:2504.12316v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2504.12316

Submission history

From: Jingyuan Zhang
[v1] Thu, 10 Apr 2025 07:20:54 UTC (8,494 KB)
