Global and Local Entailment Learning for Natural World Imagery

Sastry, Srikumar; Dhakal, Aayush; Xing, Eric; Khanal, Subash; Jacobs, Nathan


arXiv:2506.21476 (cs)

[Submitted on 26 Jun 2025]

Abstract: Learning the hierarchical structure of data in vision-language models is a significant challenge. Previous works have attempted to address this challenge by employing entailment learning. However, these approaches fail to model the transitive nature of entailment explicitly, which establishes the relationship between order and semantics within a representation space. In this work, we introduce Radial Cross-Modal Embeddings (RCME), a framework that enables the explicit modeling of transitivity-enforced entailment. Our proposed framework optimizes for the partial order of concepts within vision-language models. By leveraging our framework, we develop a hierarchical vision-language foundation model capable of representing the hierarchy in the Tree of Life. Our experiments on hierarchical species classification and hierarchical retrieval tasks demonstrate the enhanced performance of our models compared to the existing state-of-the-art models. Our code and models are open-sourced at this https URL.

Comments:	Accepted at ICCV 2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2506.21476 [cs.CV]
	(or arXiv:2506.21476v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2506.21476

Submission history

From: Srikumar Sastry
[v1] Thu, 26 Jun 2025 17:05:06 UTC (3,804 KB)
