-
What Do You See in Common? Learning Hierarchical Prototypes over Tree-of-Life to Discover Evolutionary Traits
Authors:
Harish Babu Manogaran,
M. Maruf,
Arka Daw,
Kazi Sajeed Mehrab,
Caleb Patrick Charpentier,
Josef C. Uyeda,
Wasila Dahdul,
Matthew J Thompson,
Elizabeth G Campolongo,
Kaiya L Provost,
Wei-Lun Chao,
Tanya Berger-Wolf,
Paula M. Mabee,
Hilmar Lapp,
Anuj Karpatne
Abstract:
A grand challenge in biology is to discover evolutionary traits - features of organisms common to a group of species with a shared ancestor in the tree of life (also referred to as phylogenetic tree). With the growing availability of image repositories in biology, there is a tremendous opportunity to discover evolutionary traits directly from images in the form of a hierarchy of prototypes. However, current prototype-based methods are mostly designed to operate over a flat structure of classes and face several challenges in discovering hierarchical prototypes, including the issue of learning over-specific prototypes at internal nodes. To overcome these challenges, we introduce the framework of Hierarchy aligned Commonality through Prototypical Networks (HComP-Net). The key novelties in HComP-Net include a novel over-specificity loss to avoid learning over-specific prototypes, a novel discriminative loss to ensure prototypes at an internal node are absent in the contrasting set of species with different ancestry, and a novel masking module to allow for the exclusion of over-specific prototypes at higher levels of the tree without hampering classification performance. We empirically show that HComP-Net learns prototypes that are accurate, semantically consistent, and generalizable to unseen species in comparison to baselines.
Submitted 15 June, 2025; v1 submitted 3 September, 2024;
originally announced September 2024.
-
VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images
Authors:
M. Maruf,
Arka Daw,
Kazi Sajeed Mehrab,
Harish Babu Manogaran,
Abhilash Neog,
Medha Sawhney,
Mridul Khurana,
James P. Balhoff,
Yasin Bakis,
Bahadir Altintas,
Matthew J. Thompson,
Elizabeth G. Campolongo,
Josef C. Uyeda,
Hilmar Lapp,
Henry L. Bart,
Paula M. Mabee,
Yu Su,
Wei-Lun Chao,
Charles Stewart,
Tanya Berger-Wolf,
Wasila Dahdul,
Anuj Karpatne
Abstract:
Images are increasingly becoming the currency for documenting biodiversity on the planet, providing novel opportunities for accelerating scientific discoveries in the field of organismal biology, especially with the advent of large vision-language models (VLMs). We ask if pre-trained VLMs can aid scientists in answering a range of biologically relevant questions without any additional fine-tuning. In this paper, we evaluate the effectiveness of 12 state-of-the-art (SOTA) VLMs in the field of organismal biology using a novel dataset, VLM4Bio, consisting of 469K question-answer pairs involving 30K images from three groups of organisms: fishes, birds, and butterflies, covering five biologically relevant tasks. We also explore the effects of applying prompting techniques and tests for reasoning hallucination on the performance of VLMs, shedding new light on the capabilities of current SOTA VLMs in answering biologically relevant questions using images. The code and datasets for running all the analyses reported in this paper can be found at https://github.com/sammarfy/VLM4Bio.
Submitted 28 August, 2024;
originally announced August 2024.
-
Hierarchical Conditioning of Diffusion Models Using Tree-of-Life for Studying Species Evolution
Authors:
Mridul Khurana,
Arka Daw,
M. Maruf,
Josef C. Uyeda,
Wasila Dahdul,
Caleb Charpentier,
Yasin Bakış,
Henry L. Bart Jr.,
Paula M. Mabee,
Hilmar Lapp,
James P. Balhoff,
Wei-Lun Chao,
Charles Stewart,
Tanya Berger-Wolf,
Anuj Karpatne
Abstract:
A central problem in biology is to understand how organisms evolve and adapt to their environment by acquiring variations in the observable characteristics or traits of species across the tree of life. With the growing availability of large-scale image repositories in biology and recent advances in generative modeling, there is an opportunity to accelerate the discovery of evolutionary traits automatically from images. Toward this goal, we introduce Phylo-Diffusion, a novel framework for conditioning diffusion models with phylogenetic knowledge represented in the form of HIERarchical Embeddings (HIER-Embeds). We also propose two new experiments for perturbing the embedding space of Phylo-Diffusion: trait masking and trait swapping, inspired by counterpart experiments of gene knockout and gene editing/swapping. Our work represents a novel methodological advance in generative modeling to structure the embedding space of diffusion models using tree-based knowledge. Our work also opens a new chapter of research in evolutionary biology by using generative models to visualize evolutionary changes directly from images. We empirically demonstrate the usefulness of Phylo-Diffusion in capturing meaningful trait variations for fishes and birds, revealing novel insights about the biological mechanisms of their evolution.
Submitted 31 July, 2024;
originally announced August 2024.
-
Discovering Novel Biological Traits From Images Using Phylogeny-Guided Neural Networks
Authors:
Mohannad Elhamod,
Mridul Khurana,
Harish Babu Manogaran,
Josef C. Uyeda,
Meghan A. Balk,
Wasila Dahdul,
Yasin Bakış,
Henry L. Bart Jr.,
Paula M. Mabee,
Hilmar Lapp,
James P. Balhoff,
Caleb Charpentier,
David Carlyn,
Wei-Lun Chao,
Charles V. Stewart,
Daniel I. Rubenstein,
Tanya Berger-Wolf,
Anuj Karpatne
Abstract:
Discovering evolutionary traits that are heritable across species on the tree of life (also referred to as a phylogenetic tree) is of great interest to biologists to understand how organisms diversify and evolve. However, the measurement of traits is often a subjective and labor-intensive process, making trait discovery a highly label-scarce problem. We present a novel approach for discovering evolutionary traits directly from images without relying on trait labels. Our proposed approach, Phylo-NN, encodes the image of an organism into a sequence of quantized feature vectors -- or codes -- where different segments of the sequence capture evolutionary signals at varying ancestry levels in the phylogeny. We demonstrate the effectiveness of our approach in producing biologically meaningful results in a number of downstream tasks including species image generation and species-to-species image translation, using fish species as a target example.
Submitted 5 June, 2023;
originally announced June 2023.