NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals
Authors:
Jaden Fiotto-Kaufman,
Alexander R. Loftus,
Eric Todd,
Jannik Brinkmann,
Koyena Pal,
Dmitrii Troitskii,
Michael Ripa,
Adam Belfki,
Can Rager,
Caden Juang,
Aaron Mueller,
Samuel Marks,
Arnab Sen Sharma,
Francesca Lucchetti,
Nikhil Prakash,
Carla Brodley,
Arjun Guha,
Jonathan Bell,
Byron C. Wallace,
David Bau
Abstract:
We introduce NNsight and NDIF, technologies that work in tandem to enable scientific study of the representations and computations learned by very large neural networks. NNsight is an open-source system that extends PyTorch to introduce deferred remote execution. The National Deep Inference Fabric (NDIF) is a scalable inference service that executes NNsight requests, allowing users to share GPU resources and pretrained models. These technologies are enabled by the Intervention Graph, an architecture developed to decouple experimental design from model runtime. Together, this framework provides transparent and efficient access to the internals of deep neural networks such as very large language models (LLMs) without imposing the cost or complexity of hosting customized models individually. We conduct a quantitative survey of the machine learning literature that reveals a growing gap in the study of the internals of large-scale AI. We demonstrate the design and use of our framework to address this gap by enabling a range of research methods on huge models. Finally, we conduct benchmarks to compare performance with previous approaches.
Code, documentation, and tutorials are available at https://nnsight.net/.
Submitted 1 April, 2025; v1 submitted 18 July, 2024;
originally announced July 2024.
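The NNsight abstract says the Intervention Graph decouples experimental design from model runtime. The toy Python sketch below illustrates only that general idea. An experiment is first recorded as an ordinary object that says which layer outputs to read and which to edit. A separate `execute` function, which stands in for a service such as NDIF, later runs the model with those instructions applied. Every name here (`InterventionGraph`, `execute`, the two-layer toy model) is hypothetical and is not NNsight's API. This listing does not describe the real Intervention Graph format.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical toy illustration of deferred execution; not NNsight's API.
Vector = List[float]
Layer = Callable[[Vector], Vector]

@dataclass
class InterventionGraph:
    """Record of an experiment, built before any model runs."""
    saves: List[int] = field(default_factory=list)  # layer outputs to return
    edits: Dict[int, Callable[[Vector], Vector]] = field(default_factory=dict)  # layer -> edit

    def save(self, layer: int) -> None:
        self.saves.append(layer)

    def edit(self, layer: int, fn: Callable[[Vector], Vector]) -> None:
        self.edits[layer] = fn

def execute(layers: List[Layer], x: Vector, graph: InterventionGraph) -> Tuple[Vector, Dict[int, Vector]]:
    """Run the model once, applying recorded edits and collecting saved values."""
    saved: Dict[int, Vector] = {}
    for i, layer in enumerate(layers):
        x = layer(x)
        if i in graph.edits:   # an edit replaces this layer's output
            x = graph.edits[i](x)
        if i in graph.saves:   # saves see the (possibly edited) output
            saved[i] = list(x)
    return x, saved

# Experiment design: no model is touched while the graph is built.
graph = InterventionGraph()
graph.save(0)
graph.edit(1, lambda v: [0.0] + v[1:])  # zero the first unit after layer 1

# Runtime: a two-layer toy "model" executes the recorded graph.
layers: List[Layer] = [lambda v: [a * 2 for a in v], lambda v: [a + 1 for a in v]]
output, saved = execute(layers, [1.0, 2.0], graph)
print(saved[0], output)  # [2.0, 4.0] [0.0, 5.0]
```

The point of the separation is that the experiment exists as an object independent of the process that holds the model. How NNsight actually represents and transmits such requests to NDIF is not covered by this listing.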
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Authors:
Wes Gurnee,
Neel Nanda,
Matthew Pauly,
Katherine Harvey,
Dmitrii Troitskii,
Dimitris Bertsimas
Abstract:
Despite rapid adoption and deployment of large language models (LLMs), the internal computations of these models remain opaque and poorly understood. In this work, we seek to understand how high-level human-interpretable features are represented within the internal neuron activations of LLMs. We train $k$-sparse linear classifiers (probes) on these internal activations to predict the presence of features in the input; by varying the value of $k$ we study the sparsity of learned representations and how this varies with model scale. With $k=1$, we localize individual neurons which are highly relevant for a particular feature, and perform a number of case studies to illustrate general properties of LLMs. In particular, we show that early layers make use of sparse combinations of neurons to represent many features in superposition, that middle layers have seemingly dedicated neurons to represent higher-level contextual features, and that increasing scale causes representational sparsity to increase on average, but there are multiple types of scaling dynamics. In all, we probe for over 100 unique features comprising 10 different categories in 7 different models spanning 70 million to 6.9 billion parameters.
Submitted 2 June, 2023; v1 submitted 2 May, 2023;
originally announced May 2023.
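The sparse-probing abstract trains linear classifiers that may use only $k$ neurons. The following is a minimal pure-Python sketch of one way to build such a probe. It assumes the $k$ neurons are chosen by ranking the absolute difference between class-conditional mean activations, and that a logistic-regression probe is then fitted on them by gradient descent. The paper's own selection procedures may differ. The activation matrix is made-up illustrative data, not taken from the paper. Setting `k=1` corresponds to the single-neuron localization the abstract describes.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: neurons are ranked by |mean(class 1) - mean(class 0)|;
# the paper's own k-selection methods may differ.

def top_k_by_mean_difference(X: Sequence[Sequence[float]], y: Sequence[int], k: int) -> List[int]:
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        mu_pos = sum(r[j] for r in pos) / len(pos)
        mu_neg = sum(r[j] for r in neg) / len(neg)
        scores.append(abs(mu_pos - mu_neg))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, epochs=500, lr=0.5) -> Tuple[List[float], float]:
    # Batch gradient descent on the mean logistic (cross-entropy) loss.
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for row, label in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - label
            gw = [g + err * xi for g, xi in zip(gw, row)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def sparse_probe(X, y, k):
    # Restrict the probe to k neurons, then fit a linear classifier on them.
    support = top_k_by_mean_difference(X, y, k)
    w, b = fit_logistic([[row[j] for j in support] for row in X], y)
    return support, w, b

def predict(row, support, w, b) -> int:
    return int(sum(wi * row[j] for wi, j in zip(w, support)) + b > 0)

# Made-up activations: 8 inputs x 4 neurons; neuron 2 tracks the feature.
X = [[0.1, 0.3, 1.0, -0.2], [0.2, -0.1, 0.9, 0.1], [-0.1, 0.2, 1.1, 0.0], [0.0, 0.1, 0.8, -0.1],
     [0.1, 0.2, -1.0, 0.1], [0.2, 0.0, -0.9, -0.1], [-0.1, 0.3, -1.1, 0.0], [0.0, 0.1, -0.8, 0.2]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
support, w, b = sparse_probe(X, y, k=1)
print(support)  # [2]
```

Increasing `k` lets the probe combine several neurons. This is the knob the abstract varies to study how sparse, or how distributed in superposition, a feature's representation is.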