-
The Discovery Engine: A Framework for AI-Driven Synthesis and Navigation of Scientific Knowledge Landscapes
Authors:
Vladimir Baulin,
Austin Cook,
Daniel Friedman,
Janna Lumiruusu,
Andrew Pashea,
Shagor Rahman,
Benedikt Waldeck
Abstract:
The prevailing model for disseminating scientific knowledge relies on individual publications dispersed across numerous journals and archives. This legacy system is ill suited to the recent exponential proliferation of publications, contributing to insurmountable information overload, issues surrounding reproducibility and retractions. We introduce the Discovery Engine, a framework to address thes…
▽ More
The prevailing model for disseminating scientific knowledge relies on individual publications dispersed across numerous journals and archives. This legacy system is ill suited to the recent exponential proliferation of publications, contributing to insurmountable information overload, issues surrounding reproducibility and retractions. We introduce the Discovery Engine, a framework to address these challenges by transforming an array of disconnected literature into a unified, computationally tractable representation of a scientific domain. Central to our approach is the LLM-driven distillation of publications into structured "knowledge artifacts," instances of a universal conceptual schema, complete with verifiable links to source evidence. These artifacts are then encoded into a high-dimensional Conceptual Tensor. This tensor serves as the primary, compressed representation of the synthesized field, where its labeled modes index scientific components (concepts, methods, parameters, relations) and its entries quantify their interdependencies. The Discovery Engine allows dynamic "unrolling" of this tensor into human-interpretable views, such as explicit knowledge graphs (the CNM graph) or semantic vector spaces, for targeted exploration. Crucially, AI agents operate directly on the graph using abstract mathematical and learned operations to navigate the knowledge landscape, identify non-obvious connections, pinpoint gaps, and assist researchers in generating novel knowledge artifacts (hypotheses, designs). By converting literature into a structured tensor and enabling agent-based interaction with this compact representation, the Discovery Engine offers a new paradigm for AI-augmented scientific inquiry and accelerated discovery.
△ Less
Submitted 23 May, 2025;
originally announced May 2025.
-
AAD-LLM: Neural Attention-Driven Auditory Scene Understanding
Authors:
Xilin Jiang,
Sukru Samet Dindar,
Vishal Choudhari,
Stephan Bickel,
Ashesh Mehta,
Guy M McKhann,
Daniel Friedman,
Adeen Flinker,
Nima Mesgarani
Abstract:
Auditory foundation models, including auditory large language models (LLMs), process all sound inputs equally, independent of listener perception. However, human auditory perception is inherently selective: listeners focus on specific speakers while ignoring others in complex auditory scenes. Existing models do not incorporate this selectivity, limiting their ability to generate perception-aligned…
▽ More
Auditory foundation models, including auditory large language models (LLMs), process all sound inputs equally, independent of listener perception. However, human auditory perception is inherently selective: listeners focus on specific speakers while ignoring others in complex auditory scenes. Existing models do not incorporate this selectivity, limiting their ability to generate perception-aligned responses. To address this, we introduce Intention-Informed Auditory Scene Understanding (II-ASU) and present Auditory Attention-Driven LLM (AAD-LLM), a prototype system that integrates brain signals to infer listener attention. AAD-LLM extends an auditory LLM by incorporating intracranial electroencephalography (iEEG) recordings to decode which speaker a listener is attending to and refine responses accordingly. The model first predicts the attended speaker from neural activity, then conditions response generation on this inferred attentional state. We evaluate AAD-LLM on speaker description, speech transcription and extraction, and question answering in multitalker scenarios, with both objective and subjective ratings showing improved alignment with listener intention. By taking a first step toward intention-aware auditory AI, this work explores a new paradigm where listener perception informs machine listening, paving the way for future listener-centered auditory systems. Demo and code available: https://aad-llm.github.io.
△ Less
Submitted 10 June, 2025; v1 submitted 23 February, 2025;
originally announced February 2025.
-
Continual Memorization of Factoids in Language Models
Authors:
Howard Chen,
Jiayi Geng,
Adithya Bhaskar,
Dan Friedman,
Danqi Chen
Abstract:
As new knowledge rapidly accumulates, language models (LMs) with pretrained knowledge quickly become obsolete. A common approach to updating LMs is fine-tuning them directly on new knowledge. However, recent studies have shown that fine-tuning for memorization may be ineffective in storing knowledge or may exacerbate hallucinations. In this work, we introduce a setting we call continual memorizati…
▽ More
As new knowledge rapidly accumulates, language models (LMs) with pretrained knowledge quickly become obsolete. A common approach to updating LMs is fine-tuning them directly on new knowledge. However, recent studies have shown that fine-tuning for memorization may be ineffective in storing knowledge or may exacerbate hallucinations. In this work, we introduce a setting we call continual memorization, where a model must memorize and retain a set of factoids through multiple stages of fine-tuning on subsequent datasets. We characterized the forgetting patterns through extensive experiments and show that LMs widely suffer from forgetting, especially when needing to memorize factoids in the second stage. We posit that forgetting can be alleviated by modifying training dynamics: (1) protecting the memorization process when learning factoids or (2) reducing interference from subsequent training stages. Intriguingly, we find that mixing randomly generated word sequences or generic data sampled from pretraining corpora at different training stages effectively mitigates forgetting REMIX: Random and Generic Data Mixing). REMIX can recover performance from severe forgetting, outperforming replay methods and other continual learning baselines. We analyze how REMIX influences the learning process and find that robust memorization follows a distinct pattern: the model stores factoids in earlier layers than usual and diversifies the layers that retain them, which results in easier recall and manipulate of the learned factoids.
△ Less
Submitted 27 February, 2025; v1 submitted 11 November, 2024;
originally announced November 2024.
-
When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1
Authors:
R. Thomas McCoy,
Shunyu Yao,
Dan Friedman,
Mathew D. Hardy,
Thomas L. Griffiths
Abstract:
In "Embers of Autoregression" (McCoy et al., 2023), we showed that several large language models (LLMs) have some important limitations that are attributable to their origins in next-word prediction. Here we investigate whether these issues persist with o1, a new system from OpenAI that differs from previous LLMs in that it is optimized for reasoning. We find that o1 substantially outperforms prev…
▽ More
In "Embers of Autoregression" (McCoy et al., 2023), we showed that several large language models (LLMs) have some important limitations that are attributable to their origins in next-word prediction. Here we investigate whether these issues persist with o1, a new system from OpenAI that differs from previous LLMs in that it is optimized for reasoning. We find that o1 substantially outperforms previous LLMs in many cases, with particularly large improvements on rare variants of common tasks (e.g., forming acronyms from the second letter of each word in a list, rather than the first letter). Despite these quantitative improvements, however, o1 still displays the same qualitative trends that we observed in previous systems. Specifically, o1 -- like previous LLMs -- is sensitive to the probability of examples and tasks, performing better and requiring fewer "thinking tokens" in high-probability settings than in low-probability ones. These results show that optimizing a language model for reasoning can mitigate but might not fully overcome the language model's probability sensitivity.
△ Less
Submitted 3 October, 2024; v1 submitted 2 October, 2024;
originally announced October 2024.
-
Enhancing Population-based Search with Active Inference
Authors:
Nassim Dehouche,
Daniel Friedman
Abstract:
The Active Inference framework models perception and action as a unified process, where agents use probabilistic models to predict and actively minimize sensory discrepancies. In complement and contrast, traditional population-based metaheuristics rely on reactive environmental interactions without anticipatory adaptation. This paper proposes the integration of Active Inference into these metaheur…
▽ More
The Active Inference framework models perception and action as a unified process, where agents use probabilistic models to predict and actively minimize sensory discrepancies. In complement and contrast, traditional population-based metaheuristics rely on reactive environmental interactions without anticipatory adaptation. This paper proposes the integration of Active Inference into these metaheuristics to enhance performance through anticipatory environmental adaptation. We demonstrate this approach specifically with Ant Colony Optimization (ACO) on the Travelling Salesman Problem (TSP). Experimental results indicate that Active Inference can yield some improved solutions with only a marginal increase in computational cost, with interesting patterns of performance that relate to number and topology of nodes in the graph. Further work will characterize where and when different types of Active Inference augmentation of population metaheuristics may be efficacious.
△ Less
Submitted 18 August, 2024;
originally announced August 2024.
-
Representing Rule-based Chatbots with Transformers
Authors:
Dan Friedman,
Abhishek Panigrahi,
Danqi Chen
Abstract:
What kind of internal mechanisms might Transformers use to conduct fluid, natural-sounding conversations? Prior work has illustrated by construction how Transformers can solve various synthetic tasks, such as sorting a list or recognizing formal languages, but it remains unclear how to extend this approach to a conversational setting. In this work, we propose using ELIZA, a classic rule-based chat…
▽ More
What kind of internal mechanisms might Transformers use to conduct fluid, natural-sounding conversations? Prior work has illustrated by construction how Transformers can solve various synthetic tasks, such as sorting a list or recognizing formal languages, but it remains unclear how to extend this approach to a conversational setting. In this work, we propose using ELIZA, a classic rule-based chatbot, as a setting for formal, mechanistic analysis of Transformer-based chatbots. ELIZA allows us to formally model key aspects of conversation, including local pattern matching and long-term dialogue state tracking. We first present a theoretical construction of a Transformer that implements the ELIZA chatbot. Building on prior constructions, particularly those for simulating finite-state automata, we show how simpler mechanisms can be composed and extended to produce more sophisticated behavior. Next, we conduct a set of empirical analyses of Transformers trained on synthetically generated ELIZA conversations. Our analysis illustrates the kinds of mechanisms these models tend to prefer--for example, models favor an induction head mechanism over a more precise, position-based copying mechanism; and using intermediate generations to simulate recurrent data structures, akin to an implicit scratchpad or Chain-of-Thought. Overall, by drawing an explicit connection between neural chatbots and interpretable, symbolic mechanisms, our results provide a new framework for the mechanistic analysis of conversational agents.
△ Less
Submitted 12 February, 2025; v1 submitted 15 July, 2024;
originally announced July 2024.
-
Finding Transformer Circuits with Edge Pruning
Authors:
Adithya Bhaskar,
Alexander Wettig,
Dan Friedman,
Danqi Chen
Abstract:
The path to interpreting a language model often proceeds via analysis of circuits -- sparse computational subgraphs of the model that capture specific aspects of its behavior. Recent work has automated the task of discovering circuits. Yet, these methods have practical limitations, as they rely either on inefficient search algorithms or inaccurate approximations. In this paper, we frame automated…
▽ More
The path to interpreting a language model often proceeds via analysis of circuits -- sparse computational subgraphs of the model that capture specific aspects of its behavior. Recent work has automated the task of discovering circuits. Yet, these methods have practical limitations, as they rely either on inefficient search algorithms or inaccurate approximations. In this paper, we frame automated circuit discovery as an optimization problem and propose *Edge Pruning* as an effective and scalable solution. Edge Pruning leverages gradient-based pruning techniques, but instead of removing neurons or components, it prunes the \emph{edges} between components. Our method finds circuits in GPT-2 that use less than half the number of edges compared to circuits found by previous methods while being equally faithful to the full model predictions on standard circuit-finding tasks. Edge Pruning is efficient even with as many as 100K examples, outperforming previous methods in speed and producing substantially better circuits. It also perfectly recovers the ground-truth circuits in two models compiled with Tracr. Thanks to its efficiency, we scale Edge Pruning to CodeLlama-13B, a model over 100x the scale that prior methods operate on. We use this setting for a case study comparing the mechanisms behind instruction prompting and in-context learning. We find two circuits with more than 99.96% sparsity that match the performance of the full model and reveal that the mechanisms in the two settings overlap substantially. Our case study shows that Edge Pruning is a practical and scalable tool for interpretability and sheds light on behaviors that only emerge in large models.
△ Less
Submitted 2 April, 2025; v1 submitted 24 June, 2024;
originally announced June 2024.
-
The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models
Authors:
Adithya Bhaskar,
Dan Friedman,
Danqi Chen
Abstract:
Prior work has found that pretrained language models (LMs) fine-tuned with different random seeds can achieve similar in-domain performance but generalize differently on tests of syntactic generalization. In this work, we show that, even within a single model, we can find multiple subnetworks that perform similarly in-domain, but generalize vastly differently. To better understand these phenomena,…
▽ More
Prior work has found that pretrained language models (LMs) fine-tuned with different random seeds can achieve similar in-domain performance but generalize differently on tests of syntactic generalization. In this work, we show that, even within a single model, we can find multiple subnetworks that perform similarly in-domain, but generalize vastly differently. To better understand these phenomena, we investigate if they can be understood in terms of "competing subnetworks": the model initially represents a variety of distinct algorithms, corresponding to different subnetworks, and generalization occurs when it ultimately converges to one. This explanation has been used to account for generalization in simple algorithmic tasks ("grokking"). Instead of finding competing subnetworks, we find that all subnetworks -- whether they generalize or not -- share a set of attention heads, which we refer to as the heuristic core. Further analysis suggests that these attention heads emerge early in training and compute shallow, non-generalizing features. The model learns to generalize by incorporating additional attention heads, which depend on the outputs of the "heuristic" heads to compute higher-level features. Overall, our results offer a more detailed picture of the mechanisms for syntactic generalization in pretrained LMs.
△ Less
Submitted 5 June, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
Interpretability Illusions in the Generalization of Simplified Models
Authors:
Dan Friedman,
Andrew Lampinen,
Lucas Dixon,
Danqi Chen,
Asma Ghandeharioun
Abstract:
A common method to study deep learning systems is to use simplified model representations--for example, using singular value decomposition to visualize the model's hidden states in a lower dimensional space. This approach assumes that the results of these simplifications are faithful to the original model. Here, we illustrate an important caveat to this assumption: even if the simplified represent…
▽ More
A common method to study deep learning systems is to use simplified model representations--for example, using singular value decomposition to visualize the model's hidden states in a lower dimensional space. This approach assumes that the results of these simplifications are faithful to the original model. Here, we illustrate an important caveat to this assumption: even if the simplified representations can accurately approximate the full model on the training set, they may fail to accurately capture the model's behavior out of distribution. We illustrate this by training Transformer models on controlled datasets with systematic generalization splits, including the Dyck balanced-parenthesis languages and a code completion task. We simplify these models using tools like dimensionality reduction and clustering, and then explicitly test how these simplified proxies match the behavior of the original model. We find consistent generalization gaps: cases in which the simplified proxies are more faithful to the original model on the in-distribution evaluations and less faithful on various tests of systematic generalization. This includes cases where the original model generalizes systematically but the simplified proxies fail, and cases where the simplified proxies generalize better. Together, our results raise questions about the extent to which mechanistic interpretations derived using tools like SVD can reliably predict what a model will do in novel situations.
△ Less
Submitted 5 June, 2024; v1 submitted 6 December, 2023;
originally announced December 2023.
-
The Temporal Structure of Language Processing in the Human Brain Corresponds to The Layered Hierarchy of Deep Language Models
Authors:
Ariel Goldstein,
Eric Ham,
Mariano Schain,
Samuel Nastase,
Zaid Zada,
Avigail Dabush,
Bobbi Aubrey,
Harshvardhan Gazula,
Amir Feder,
Werner K Doyle,
Sasha Devore,
Patricia Dugan,
Daniel Friedman,
Roi Reichart,
Michael Brenner,
Avinatan Hassidim,
Orrin Devinsky,
Adeen Flinker,
Omer Levy,
Uri Hasson
Abstract:
Deep Language Models (DLMs) provide a novel computational paradigm for understanding the mechanisms of natural language processing in the human brain. Unlike traditional psycholinguistic models, DLMs use layered sequences of continuous numerical vectors to represent words and context, allowing a plethora of emerging applications such as human-like text generation. In this paper we show evidence th…
▽ More
Deep Language Models (DLMs) provide a novel computational paradigm for understanding the mechanisms of natural language processing in the human brain. Unlike traditional psycholinguistic models, DLMs use layered sequences of continuous numerical vectors to represent words and context, allowing a plethora of emerging applications such as human-like text generation. In this paper we show evidence that the layered hierarchy of DLMs may be used to model the temporal dynamics of language comprehension in the brain by demonstrating a strong correlation between DLM layer depth and the time at which layers are most predictive of the human brain. Our ability to temporally resolve individual layers benefits from our use of electrocorticography (ECoG) data, which has a much higher temporal resolution than noninvasive methods like fMRI. Using ECoG, we record neural activity from participants listening to a 30-minute narrative while also feeding the same narrative to a high-performing DLM (GPT2-XL). We then extract contextual embeddings from the different layers of the DLM and use linear encoding models to predict neural activity. We first focus on the Inferior Frontal Gyrus (IFG, or Broca's area) and then extend our model to track the increasing temporal receptive window along the linguistic processing hierarchy from auditory to syntactic and semantic areas. Our results reveal a connection between human language processing and DLMs, with the DLM's layer-by-layer accumulation of contextual information mirroring the timing of neural activity in high-order language areas.
△ Less
Submitted 10 October, 2023;
originally announced October 2023.
-
Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve
Authors:
R. Thomas McCoy,
Shunyu Yao,
Dan Friedman,
Matthew Hardy,
Thomas L. Griffiths
Abstract:
The widespread adoption of large language models (LLMs) makes it important to recognize their strengths and limitations. We argue that in order to develop a holistic understanding of these systems we need to consider the problem that they were trained to solve: next-word prediction over Internet text. By recognizing the pressures that this task exerts we can make predictions about the strategies t…
▽ More
The widespread adoption of large language models (LLMs) makes it important to recognize their strengths and limitations. We argue that in order to develop a holistic understanding of these systems we need to consider the problem that they were trained to solve: next-word prediction over Internet text. By recognizing the pressures that this task exerts we can make predictions about the strategies that LLMs will adopt, allowing us to reason about when they will succeed or fail. This approach - which we call the teleological approach - leads us to identify three factors that we hypothesize will influence LLM accuracy: the probability of the task to be performed, the probability of the target output, and the probability of the provided input. We predict that LLMs will achieve higher accuracy when these probabilities are high than when they are low - even in deterministic settings where probability should not matter. To test our predictions, we evaluate two LLMs (GPT-3.5 and GPT-4) on eleven tasks, and we find robust evidence that LLMs are influenced by probability in the ways that we have hypothesized. In many cases, the experiments reveal surprising failure modes. For instance, GPT-4's accuracy at decoding a simple cipher is 51% when the output is a high-probability word sequence but only 13% when it is low-probability. These results show that AI practitioners should be careful about using LLMs in low-probability situations. More broadly, we conclude that we should not evaluate LLMs as if they are humans but should instead treat them as a distinct type of system - one that has been shaped by its own particular set of pressures.
△ Less
Submitted 24 September, 2023;
originally announced September 2023.
-
Learning Transformer Programs
Authors:
Dan Friedman,
Alexander Wettig,
Danqi Chen
Abstract:
Recent research in mechanistic interpretability has attempted to reverse-engineer Transformer models by carefully inspecting network weights and activations. However, these approaches require considerable manual effort and still fall short of providing complete, faithful descriptions of the underlying algorithms. In this work, we introduce a procedure for training Transformers that are mechanistic…
▽ More
Recent research in mechanistic interpretability has attempted to reverse-engineer Transformer models by carefully inspecting network weights and activations. However, these approaches require considerable manual effort and still fall short of providing complete, faithful descriptions of the underlying algorithms. In this work, we introduce a procedure for training Transformers that are mechanistically interpretable by design. We build on RASP [Weiss et al., 2021], a programming language that can be compiled into Transformer weights. Instead of compiling human-written programs into Transformers, we design a modified Transformer that can be trained using gradient-based optimization and then automatically converted into a discrete, human-readable program. We refer to these models as Transformer Programs. To validate our approach, we learn Transformer Programs for a variety of problems, including an in-context learning task, a suite of algorithmic problems (e.g. sorting, recognizing Dyck languages), and NLP tasks including named entity recognition and text classification. The Transformer Programs can automatically find reasonable solutions, performing on par with standard Transformers of comparable size; and, more importantly, they are easy to interpret. To demonstrate these advantages, we convert Transformers into Python programs and use off-the-shelf code analysis tools to debug model errors and identify the "circuits" used to solve different sub-problems. We hope that Transformer Programs open a new path toward the goal of intrinsically interpretable machine learning.
△ Less
Submitted 30 October, 2023; v1 submitted 1 June, 2023;
originally announced June 2023.
-
Measuring Inductive Biases of In-Context Learning with Underspecified Demonstrations
Authors:
Chenglei Si,
Dan Friedman,
Nitish Joshi,
Shi Feng,
Danqi Chen,
He He
Abstract:
In-context learning (ICL) is an important paradigm for adapting large language models (LLMs) to new tasks, but the generalization behavior of ICL remains poorly understood. We investigate the inductive biases of ICL from the perspective of feature bias: which feature ICL is more likely to use given a set of underspecified demonstrations in which two features are equally predictive of the labels. F…
▽ More
In-context learning (ICL) is an important paradigm for adapting large language models (LLMs) to new tasks, but the generalization behavior of ICL remains poorly understood. We investigate the inductive biases of ICL from the perspective of feature bias: which feature ICL is more likely to use given a set of underspecified demonstrations in which two features are equally predictive of the labels. First, we characterize the feature biases of GPT-3 models by constructing underspecified demonstrations from a range of NLP datasets and feature combinations. We find that LLMs exhibit clear feature biases - for example, demonstrating a strong bias to predict labels according to sentiment rather than shallow lexical features, like punctuation. Second, we evaluate the effect of different interventions that are designed to impose an inductive bias in favor of a particular feature, such as adding a natural language instruction or using semantically relevant label words. We find that, while many interventions can influence the learner to prefer a particular feature, it can be difficult to overcome strong prior biases. Overall, our results provide a broader picture of the types of features that ICL may be more likely to exploit and how to impose inductive biases that are better aligned with the intended task.
△ Less
Submitted 22 May, 2023;
originally announced May 2023.
-
Finding Dataset Shortcuts with Grammar Induction
Authors:
Dan Friedman,
Alexander Wettig,
Danqi Chen
Abstract:
Many NLP datasets have been found to contain shortcuts: simple decision rules that achieve surprisingly high accuracy. However, it is difficult to discover shortcuts automatically. Prior work on automatic shortcut detection has focused on enumerating features like unigrams or bigrams, which can find only low-level shortcuts, or relied on post-hoc model interpretability methods like saliency maps,…
▽ More
Many NLP datasets have been found to contain shortcuts: simple decision rules that achieve surprisingly high accuracy. However, it is difficult to discover shortcuts automatically. Prior work on automatic shortcut detection has focused on enumerating features like unigrams or bigrams, which can find only low-level shortcuts, or relied on post-hoc model interpretability methods like saliency maps, which reveal qualitative patterns without a clear statistical interpretation. In this work, we propose to use probabilistic grammars to characterize and discover shortcuts in NLP datasets. Specifically, we use a context-free grammar to model patterns in sentence classification datasets and use a synchronous context-free grammar to model datasets involving sentence pairs. The resulting grammars reveal interesting shortcut features in a number of datasets, including both simple and high-level features, and automatically identify groups of test examples on which conventional classifiers fail. Finally, we show that the features we discover can be used to generate diagnostic contrast examples and incorporated into standard robust optimization methods to improve worst-group accuracy.
△ Less
Submitted 20 October, 2022;
originally announced October 2022.
-
The Vendi Score: A Diversity Evaluation Metric for Machine Learning
Authors:
Dan Friedman,
Adji Bousso Dieng
Abstract:
Diversity is an important criterion for many areas of machine learning (ML), including generative modeling and dataset curation. However, existing metrics for measuring diversity are often domain-specific and limited in flexibility. In this paper, we address the diversity evaluation problem by proposing the Vendi Score, which connects and extends ideas from ecology and quantum statistical mechanic…
▽ More
Diversity is an important criterion for many areas of machine learning (ML), including generative modeling and dataset curation. However, existing metrics for measuring diversity are often domain-specific and limited in flexibility. In this paper, we address the diversity evaluation problem by proposing the Vendi Score, which connects and extends ideas from ecology and quantum statistical mechanics to ML. The Vendi Score is defined as the exponential of the Shannon entropy of the eigenvalues of a similarity matrix. This matrix is induced by a user-defined similarity function applied to the sample to be evaluated for diversity. In taking a similarity function as input, the Vendi Score enables its user to specify any desired form of diversity. Importantly, unlike many existing metrics in ML, the Vendi Score does not require a reference dataset or distribution over samples or labels, it is therefore general and applicable to any generative model, decoding algorithm, and dataset from any domain where similarity can be defined. We showcase the Vendi Score on molecular generative modeling where we found it addresses shortcomings of the current diversity metric of choice in that domain. We also applied the Vendi Score to generative models of images and decoding algorithms of text where we found it confirms known results about diversity in those domains. Furthermore, we used the Vendi Score to measure mode collapse, a known shortcoming of generative adversarial networks (GANs). In particular, the Vendi Score revealed that even GANs that capture all the modes of a labeled dataset can be less diverse than the original dataset. Finally, the interpretability of the Vendi Score allowed us to diagnose several benchmark ML datasets for diversity, opening the door for diversity-informed data augmentation.
△ Less
Submitted 2 July, 2023; v1 submitted 5 October, 2022;
originally announced October 2022.
-
Artifact-Tolerant Clustering-Guided Contrastive Embedding Learning for Ophthalmic Images
Authors:
Min Shi,
Anagha Lokhande,
Mojtaba S. Fazli,
Vishal Sharma,
Yu Tian,
Yan Luo,
Louis R. Pasquale,
Tobias Elze,
Michael V. Boland,
Nazlee Zebardast,
David S. Friedman,
Lucy Q. Shen,
Mengyu Wang
Abstract:
Ophthalmic images and derivatives such as the retinal nerve fiber layer (RNFL) thickness map are crucial for detecting and monitoring ophthalmic diseases (e.g., glaucoma). For computer-aided diagnosis of eye diseases, the key technique is to automatically extract meaningful features from ophthalmic images that can reveal the biomarkers (e.g., RNFL thinning patterns) linked to functional vision los…
▽ More
Ophthalmic images and derivatives such as the retinal nerve fiber layer (RNFL) thickness map are crucial for detecting and monitoring ophthalmic diseases (e.g., glaucoma). For computer-aided diagnosis of eye diseases, the key technique is to automatically extract meaningful features from ophthalmic images that can reveal the biomarkers (e.g., RNFL thinning patterns) linked to functional vision loss. However, representation learning from ophthalmic images that links structural retinal damage with human vision loss is non-trivial mostly due to large anatomical variations between patients. The task becomes even more challenging in the presence of image artifacts, which are common due to issues with image acquisition and automated segmentation. In this paper, we propose an artifact-tolerant unsupervised learning framework termed EyeLearn for learning representations of ophthalmic images. EyeLearn has an artifact correction module to learn representations that can best predict artifact-free ophthalmic images. In addition, EyeLearn adopts a clustering-guided contrastive learning strategy to explicitly capture the intra- and inter-image affinities. During training, images are dynamically organized in clusters to form contrastive samples in which images in the same or different clusters are encouraged to learn similar or dissimilar representations, respectively. To evaluate EyeLearn, we use the learned representations for visual field prediction and glaucoma detection using a real-world ophthalmic image dataset of glaucoma patients. Extensive experiments and comparisons with state-of-the-art methods verified the effectiveness of EyeLearn for learning optimal feature representations from ophthalmic images.
△ Less
Submitted 1 September, 2022;
originally announced September 2022.
-
From Users to (Sense)Makers: On the Pivotal Role of Stigmergic Social Annotation in the Quest for Collective Sensemaking
Authors:
Ronen Tamari,
Daniel Friedman,
William Fischer,
Lauren Hebert,
Dafna Shahaf
Abstract:
The web has become a dominant epistemic environment, influencing people's beliefs at a global scale. However, online epistemic environments are increasingly polluted, impairing societies' ability to coordinate effectively in the face of global crises. We argue that centralized platforms are a main source of epistemic pollution, and that healthier environments require redesigning how we collectivel…
▽ More
The web has become a dominant epistemic environment, influencing people's beliefs at a global scale. However, online epistemic environments are increasingly polluted, impairing societies' ability to coordinate effectively in the face of global crises. We argue that centralized platforms are a main source of epistemic pollution, and that healthier environments require redesigning how we collectively govern attention. Inspired by decentralization and open source software movements, we propose Open Source Attention, a socio-technical framework for "freeing" human attention from control by platforms, through a decentralized eco-system for creating, storing and querying stigmergic markers; the digital traces of human attention.
△ Less
Submitted 4 August, 2022; v1 submitted 12 May, 2022;
originally announced May 2022.
-
Single-dataset Experts for Multi-dataset Question Answering
Authors:
Dan Friedman,
Ben Dodge,
Danqi Chen
Abstract:
Many datasets have been created for training reading comprehension models, and a natural question is whether we can combine them to build models that (1) perform better on all of the training datasets and (2) generalize and transfer better to new datasets. Prior work has addressed this goal by training one network simultaneously on multiple datasets, which works well on average but is prone to ove…
▽ More
Many datasets have been created for training reading comprehension models, and a natural question is whether we can combine them to build models that (1) perform better on all of the training datasets and (2) generalize and transfer better to new datasets. Prior work has addressed this goal by training one network simultaneously on multiple datasets, which works well on average but is prone to over- or under-fitting different sub-distributions and might transfer worse compared to source models with more overlap with the target dataset. Our approach is to model multi-dataset question answering with a collection of single-dataset experts, by training a collection of lightweight, dataset-specific adapter modules (Houlsby et al., 2019) that share an underlying Transformer model. We find that these Multi-Adapter Dataset Experts (MADE) outperform all our baselines in terms of in-distribution accuracy, and simple methods based on parameter-averaging lead to better zero-shot generalization and few-shot transfer performance, offering a strong and versatile starting point for building new reading comprehension systems.
△ Less
Submitted 28 September, 2021;
originally announced September 2021.
-
Factual Probing Is [MASK]: Learning vs. Learning to Recall
Authors:
Zexuan Zhong,
Dan Friedman,
Danqi Chen
Abstract:
Petroni et al. (2019) demonstrated that it is possible to retrieve world facts from a pre-trained language model by expressing them as cloze-style prompts and interpret the model's prediction accuracy as a lower bound on the amount of factual information it encodes. Subsequent work has attempted to tighten the estimate by searching for better prompts, using a disjoint set of facts as training data…
▽ More
Petroni et al. (2019) demonstrated that it is possible to retrieve world facts from a pre-trained language model by expressing them as cloze-style prompts and interpret the model's prediction accuracy as a lower bound on the amount of factual information it encodes. Subsequent work has attempted to tighten the estimate by searching for better prompts, using a disjoint set of facts as training data. In this work, we make two complementary contributions to better understand these factual probing techniques. First, we propose OptiPrompt, a novel and efficient method which directly optimizes in continuous embedding space. We find this simple method is able to predict an additional 6.4% of facts in the LAMA benchmark. Second, we raise a more important question: Can we really interpret these probing results as a lower bound? Is it possible that these prompt-search methods learn from the training data too? We find, somewhat surprisingly, that the training data used by these methods contains certain regularities of the underlying fact distribution, and all the existing prompt methods, including ours, are able to exploit them for better fact prediction. We conduct a set of control experiments to disentangle "learning" from "learning to recall", providing a more detailed picture of what different prompts can reveal about pre-trained language models.
△ Less
Submitted 14 December, 2021; v1 submitted 12 April, 2021;
originally announced April 2021.
-
ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks
Authors:
Michihiro Yasunaga,
Jungo Kasai,
Rui Zhang,
Alexander R. Fabbri,
Irene Li,
Dan Friedman,
Dragomir R. Radev
Abstract:
Scientific article summarization is challenging: large, annotated corpora are not available, and the summary should ideally include the article's impacts on research community. This paper provides novel solutions to these two challenges. We 1) develop and release the first large-scale manually-annotated corpus for scientific papers (on computational linguistics) by enabling faster annotation, and…
▽ More
Scientific article summarization is challenging: large, annotated corpora are not available, and the summary should ideally include the article's impacts on research community. This paper provides novel solutions to these two challenges. We 1) develop and release the first large-scale manually-annotated corpus for scientific papers (on computational linguistics) by enabling faster annotation, and 2) propose summarization methods that integrate the authors' original highlights (abstract) and the article's actual impacts on the community (citations), to create comprehensive, hybrid summaries. We conduct experiments to demonstrate the efficacy of our corpus in training data-driven models for scientific paper summarization and the advantage of our hybrid summaries over abstracts and traditional citation-based summaries. Our large annotated corpus and hybrid methods provide a new framework for scientific paper summarization research.
△ Less
Submitted 15 September, 2019; v1 submitted 4 September, 2019;
originally announced September 2019.
-
Syntax-aware Neural Semantic Role Labeling with Supertags
Authors:
Jungo Kasai,
Dan Friedman,
Robert Frank,
Dragomir Radev,
Owen Rambow
Abstract:
We introduce a new syntax-aware model for dependency-based semantic role labeling that outperforms syntax-agnostic models for English and Spanish. We use a BiLSTM to tag the text with supertags extracted from dependency parses, and we feed these supertags, along with words and parts of speech, into a deep highway BiLSTM for semantic role labeling. Our model combines the strengths of earlier models…
▽ More
We introduce a new syntax-aware model for dependency-based semantic role labeling that outperforms syntax-agnostic models for English and Spanish. We use a BiLSTM to tag the text with supertags extracted from dependency parses, and we feed these supertags, along with words and parts of speech, into a deep highway BiLSTM for semantic role labeling. Our model combines the strengths of earlier models that performed SRL on the basis of a full dependency parse with more recent models that use no syntactic information at all. Our local and non-ensemble model achieves state-of-the-art performance on the CoNLL 09 English and Spanish datasets. SRL models benefit from syntactic information, and we show that supertagging is a simple, powerful, and robust way to incorporate syntax into a neural SRL system.
△ Less
Submitted 3 April, 2019; v1 submitted 12 March, 2019;
originally announced March 2019.
-
Recurrent Neural Networks for P300-based BCI
Authors:
Ori Tal,
Doron Friedman
Abstract:
P300-based spellers are one of the main methods for EEG-based brain-computer interface, and the detection of the P300 target event with high accuracy is an important prerequisite. The rapid serial visual presentation (RSVP) protocol is of high interest because it can be used by patients who have lost control over their eyes. In this study we wish to explore the suitability of recurrent neural netw…
▽ More
P300-based spellers are one of the main methods for EEG-based brain-computer interface, and the detection of the P300 target event with high accuracy is an important prerequisite. The rapid serial visual presentation (RSVP) protocol is of high interest because it can be used by patients who have lost control over their eyes. In this study we wish to explore the suitability of recurrent neural networks (RNNs) as a machine learning method for identifying the P300 signal in RSVP data. We systematically compare RNN with alternative methods such as linear discriminant analysis (LDA) and convolutional neural network (CNN). Our results indicate that LDA performs as well as the neural network models or better on single subject data, but a network combining CNN and RNN has advantages when transferring learning among subejcts, and is significantly more resilient to temporal noise than other methods.
△ Less
Submitted 30 January, 2019;
originally announced January 2019.
-
Deep density networks and uncertainty in recommender systems
Authors:
Yoel Zeldes,
Stavros Theodorakis,
Efrat Solodnik,
Aviv Rotman,
Gil Chamiel,
Dan Friedman
Abstract:
Building robust online content recommendation systems requires learning complex interactions between user preferences and content features. The field has evolved rapidly in recent years from traditional multi-arm bandit and collaborative filtering techniques, with new methods employing Deep Learning models to capture non-linearities. Despite progress, the dynamic nature of online recommendations s…
▽ More
Building robust online content recommendation systems requires learning complex interactions between user preferences and content features. The field has evolved rapidly in recent years from traditional multi-arm bandit and collaborative filtering techniques, with new methods employing Deep Learning models to capture non-linearities. Despite progress, the dynamic nature of online recommendations still poses great challenges, such as finding the delicate balance between exploration and exploitation. In this paper we show how uncertainty estimations can be incorporated by employing them in an optimistic exploitation/exploration strategy for more efficient exploration of new recommendations. We provide a novel hybrid deep neural network model, Deep Density Networks (DDN), which integrates content-based deep learning models with a collaborative scheme that is able to robustly model and estimate uncertainty. Finally, we present online and offline results after incorporating DNN into a real world content recommendation system that serves billions of recommendations per day, and show the benefit of using DDN in practice.
△ Less
Submitted 6 May, 2018; v1 submitted 7 November, 2017;
originally announced November 2017.
-
A Framework for Extending microKanren with Constraints
Authors:
Jason Hemann,
Daniel P. Friedman
Abstract:
We present a framework for building CLP languages with symbolic constraints based on microKanren, a domain-specific logic language shallowly embedded in Racket. We rely on Racket's macro system to generate a constraint solver and other components of the microKanren embedding. The framework itself and the constraints' implementations amounts to just over 100 lines of code. Our framework is both a t…
▽ More
We present a framework for building CLP languages with symbolic constraints based on microKanren, a domain-specific logic language shallowly embedded in Racket. We rely on Racket's macro system to generate a constraint solver and other components of the microKanren embedding. The framework itself and the constraints' implementations amounts to just over 100 lines of code. Our framework is both a teachable implementation for CLP as well as a test-bed and prototyping tool for symbolic constraint systems.
△ Less
Submitted 3 January, 2017;
originally announced January 2017.
-
A computer program for simulating time travel and a possible 'solution' for the grandfather paradox
Authors:
Doron Friedman
Abstract:
While the possibility of time travel in physics is still debated, the explosive growth of virtual-reality simulations opens up new possibilities to rigorously explore such time travel and its consequences in the digital domain. Here we provide a computational model of time travel and a computer program that allows exploring digital time travel. In order to explain our method we formalize a simplif…
▽ More
While the possibility of time travel in physics is still debated, the explosive growth of virtual-reality simulations opens up new possibilities to rigorously explore such time travel and its consequences in the digital domain. Here we provide a computational model of time travel and a computer program that allows exploring digital time travel. In order to explain our method we formalize a simplified version of the famous grandfather paradox, show how the system can allow the participant to go back in time, try to kill their ancestors before they were born, and experience the consequences. The system has even come up with scenarios that can be considered consistent "solutions" of the grandfather paradox. We discuss the conditions for digital time travel, which indicate that it has a large number of practical applications.
△ Less
Submitted 26 September, 2016;
originally announced September 2016.
-
Efficient Route Tracing from a Single Source
Authors:
Benoit Donnet Philippe Raoult Timur Friedman
Abstract:
Traceroute is a networking tool that allows one to discover the path that packets take from a source machine, through the network, to a destination machine. It is widely used as an engineering tool, and also as a scientific tool, such as for discovery of the network topology at the IP level. In prior work, authors on this technical report have shown how to improve the efficiency of route tracing…
▽ More
Traceroute is a networking tool that allows one to discover the path that packets take from a source machine, through the network, to a destination machine. It is widely used as an engineering tool, and also as a scientific tool, such as for discovery of the network topology at the IP level. In prior work, authors on this technical report have shown how to improve the efficiency of route tracing from multiple cooperating monitors. However, it is not unusual for a route tracing monitor to operate in isolation. Somewhat different strategies are required for this case, and this report is the first systematic study of those requirements. Standard traceroute is inefficient when used repeatedly towards multiple destinations, as it repeatedly probes the same interfaces close to the source. Others have recognized this inefficiency and have proposed tracing backwards from the destinations and stopping probing upon encounter with a previously-seen interface. One of this technical report's contributions is to quantify for the first time the efficiency of this approach. Another contribution is to describe the effect of non-responding destinations on this efficiency. Since a large portion of destination machines do not reply to probe packets, backwards probing from the destination is often infeasible. We propose an algorithm to tackle non-responding destinations, and we find that our algorithm can strongly decrease probing redundancy at the cost of a small reduction in node and link discovery.
△ Less
Submitted 29 May, 2006;
originally announced May 2006.