Concept Probing: Where to Find Human-Defined Concepts (Extended Version)

Ribeiro, Manuel de Sousa; Leote, Afonso; Leite, João

arXiv:2507.18681 (cs)

[Submitted on 24 Jul 2025]

Abstract: Concept probing has recently gained popularity as a way for humans to peek into what is encoded within artificial neural networks. In concept probing, additional classifiers are trained to map the internal representations of a model into human-defined concepts of interest. However, the performance of these probes is highly dependent on the internal representations they probe from, making identifying the appropriate layer to probe an essential task. In this paper, we propose a method to automatically identify which layer's representations in a neural network model should be considered when probing for a given human-defined concept of interest, based on how informative and regular the representations are with respect to the concept. We validate our findings through an exhaustive empirical analysis over different neural network models and datasets.
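
The basic probing setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify how informativeness and regularity are measured, so this example uses held-out probe accuracy as a simple stand-in for choosing a layer. The activations are synthetic, and all names (`make_toy_activations`, `train_probe`, `layer1` to `layer3`) are hypothetical.

```python
import numpy as np


def train_probe(X, y, lr=0.1, steps=500):
    """Fit a logistic-regression probe with plain gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted concept probability
        grad = p - y                             # gradient of the log-loss w.r.t. the logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def probe_accuracy(X, y, w, b):
    """Fraction of samples whose concept label the probe predicts correctly."""
    preds = ((X @ w + b) > 0).astype(int)
    return float((preds == y).mean())


def make_toy_activations(n=400, dim=8, seed=0):
    """Synthetic stand-in for per-layer activations (assumption, not real model data).

    Each hypothetical layer is Gaussian noise, with the binary concept
    encoded along one dimension at a layer-specific strength.
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    layers = {}
    for name, strength in [("layer1", 0.0), ("layer2", 0.5), ("layer3", 3.0)]:
        acts = rng.normal(size=(n, dim))
        acts[:, 0] += strength * (2 * y - 1)  # shift by +/- strength depending on the concept
        layers[name] = acts
    return layers, y


def probe_all_layers(layers, y, train_frac=0.75):
    """Train one probe per layer and score it on held-out samples."""
    n_train = int(len(y) * train_frac)
    scores = {}
    for name, acts in layers.items():
        w, b = train_probe(acts[:n_train], y[:n_train])
        scores[name] = probe_accuracy(acts[n_train:], y[n_train:], w, b)
    return scores


layers, y = make_toy_activations()
scores = probe_all_layers(layers, y)
best_layer = max(scores, key=scores.get)  # stand-in selection rule: highest held-out accuracy
print(scores, best_layer)
```

With real models, `layers` would instead hold activations recorded at each layer for a labelled set of inputs. The selection rule on the last lines is the part the paper replaces with its own criterion.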

Comments:	Extended version of the paper published in Proceedings of the International Conference on Neurosymbolic Learning and Reasoning (NeSy 2025)
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
Cite as:	arXiv:2507.18681 [cs.LG]
	(or arXiv:2507.18681v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2507.18681

Submission history

From: Manuel de Sousa Ribeiro
[v1] Thu, 24 Jul 2025 16:30:10 UTC (1,118 KB)
