VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation

Zou, Bocheng; Cai, Mu; Zhang, Jianrui; Lee, Yong Jae

arXiv:2407.10972 (cs)

[Submitted on 15 Jul 2024 (v1), last revised 29 Aug 2024 (this version, v2)]

Abstract: In the realm of vision models, the primary mode of representation is using pixels to rasterize the visual world. Yet this is not always the best or the only way to represent visual content, especially for designers and artists who depict the world using geometric primitives such as polygons. Vector graphics (VG), on the other hand, offer a textual representation of visual content, which can be more concise and powerful for content like cartoons, sketches, and scientific figures. Recent studies have shown promising results on processing vector graphics with capable Large Language Models (LLMs). However, such works focus solely on qualitative results, understanding, or a specific type of vector graphics. We propose VGBench, a comprehensive benchmark for LLMs on handling vector graphics through diverse aspects, including (a) both visual understanding and generation, (b) evaluation of various vector graphics formats, (c) diverse question types, (d) a wide range of prompting techniques, (e) multiple LLMs, and (f) comparison with VLMs on rasterized representations. Evaluating on our collected 4279 understanding and 5845 generation samples, we find that LLMs show strong capability on both aspects while exhibiting less desirable performance on low-level formats (SVG). Both the data and the evaluation pipeline will be open-sourced at this https URL.

Comments:	Project Page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2407.10972 [cs.CV]
	(or arXiv:2407.10972v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2407.10972

Submission history

From: Bocheng Zou [view email]
[v1] Mon, 15 Jul 2024 17:59:55 UTC (1,130 KB)
[v2] Thu, 29 Aug 2024 17:55:52 UTC (1,125 KB)
