Abundance-Aware Set Transformer for Microbiome Sample Embedding

Yoo, Hyunwoo; Rosen, Gail

Abstract: Representing microbiome samples as inputs to LLMs is essential for downstream tasks such as phenotype prediction and environmental classification. Prior studies have explored embedding-based representations of microbiome samples, but most rely on simple averaging over sequence embeddings and so overlook the biological importance of taxa abundance. In this work, we propose an abundance-aware variant of the Set Transformer that constructs fixed-size sample-level embeddings by weighting sequence embeddings according to their relative abundance. Without modifying the model architecture, we replicate each embedding vector in proportion to its abundance and then apply self-attention-based aggregation. Our method outperforms average pooling and unweighted Set Transformers on real-world microbiome classification tasks, achieving perfect performance in some cases. These results demonstrate the utility of abundance-aware aggregation for robust and biologically informed microbiome representation. To the best of our knowledge, this is one of the first approaches to integrate sequence-level abundance into Transformer-based sample embeddings.
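The replication step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `total_copies` budget, and the single-query softmax pooling (a simple stand-in for the Set Transformer's attention-based aggregation) are all assumptions made for illustration.

```python
import math


def replicate_by_abundance(embeddings, abundances, total_copies=100):
    """Repeat each embedding in proportion to its relative abundance.

    `total_copies` is a hypothetical budget for the expanded set size.
    Every sequence with nonzero abundance keeps at least one copy, so the
    expanded set may be slightly larger or smaller than the budget.
    """
    total = sum(abundances)
    expanded = []
    for vec, abundance in zip(embeddings, abundances):
        if abundance <= 0:
            continue  # absent sequences contribute nothing
        copies = max(1, round(total_copies * abundance / total))
        expanded.extend([vec] * copies)
    return expanded


def attention_pool(vectors, query):
    """Pool a set of vectors into one vector with single-query softmax attention.

    Assumption: a lone query vector replaces the learned multi-head pooling
    a Set Transformer would use.
    """
    dim = len(query)
    scores = [
        sum(q * v for q, v in zip(query, vec)) / math.sqrt(dim)
        for vec in vectors
    ]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    norm = sum(weights)
    return [
        sum(w * vec[i] for w, vec in zip(weights, vectors)) / norm
        for i in range(dim)
    ]


# Toy sample: two sequence embeddings with read counts 9 and 1.
embeddings = [[1.0, 0.0], [0.0, 1.0]]
abundances = [9, 1]
expanded = replicate_by_abundance(embeddings, abundances, total_copies=10)
sample_embedding = attention_pool(expanded, query=[0.0, 0.0])
```

With an all-zero query the attention weights are uniform, so the pooled vector reduces to the abundance-weighted mean of the original embeddings. A nonzero query shifts weight toward the vectors it scores highly, while the replicated copies still bias the result toward abundant sequences.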

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2508.11075 [cs.LG]
	(or arXiv:2508.11075v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2508.11075
