Showing 1–26 of 26 results for author: Friedman, D

Searching in archive cs.

  1. arXiv:2505.17500  [pdf, ps, other]

    cond-mat.soft cs.AI

    The Discovery Engine: A Framework for AI-Driven Synthesis and Navigation of Scientific Knowledge Landscapes

    Authors: Vladimir Baulin, Austin Cook, Daniel Friedman, Janna Lumiruusu, Andrew Pashea, Shagor Rahman, Benedikt Waldeck

    Abstract: The prevailing model for disseminating scientific knowledge relies on individual publications dispersed across numerous journals and archives. This legacy system is ill suited to the recent exponential proliferation of publications, contributing to insurmountable information overload, issues surrounding reproducibility and retractions. We introduce the Discovery Engine, a framework to address thes…

    Submitted 23 May, 2025; originally announced May 2025.

  2. arXiv:2502.16794  [pdf, ps, other]

    cs.SD cs.AI cs.CL cs.HC eess.AS

    AAD-LLM: Neural Attention-Driven Auditory Scene Understanding

    Authors: Xilin Jiang, Sukru Samet Dindar, Vishal Choudhari, Stephan Bickel, Ashesh Mehta, Guy M McKhann, Daniel Friedman, Adeen Flinker, Nima Mesgarani

    Abstract: Auditory foundation models, including auditory large language models (LLMs), process all sound inputs equally, independent of listener perception. However, human auditory perception is inherently selective: listeners focus on specific speakers while ignoring others in complex auditory scenes. Existing models do not incorporate this selectivity, limiting their ability to generate perception-aligned…

    Submitted 10 June, 2025; v1 submitted 23 February, 2025; originally announced February 2025.

    Comments: Accepted by ACL 2025 Main Conference

  3. arXiv:2411.07175  [pdf, other]

    cs.CL

    Continual Memorization of Factoids in Language Models

    Authors: Howard Chen, Jiayi Geng, Adithya Bhaskar, Dan Friedman, Danqi Chen

    Abstract: As new knowledge rapidly accumulates, language models (LMs) with pretrained knowledge quickly become obsolete. A common approach to updating LMs is fine-tuning them directly on new knowledge. However, recent studies have shown that fine-tuning for memorization may be ineffective in storing knowledge or may exacerbate hallucinations. In this work, we introduce a setting we call continual memorizati…

    Submitted 27 February, 2025; v1 submitted 11 November, 2024; originally announced November 2024.

  4. arXiv:2410.01792  [pdf, other]

    cs.CL cs.AI

    When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1

    Authors: R. Thomas McCoy, Shunyu Yao, Dan Friedman, Mathew D. Hardy, Thomas L. Griffiths

    Abstract: In "Embers of Autoregression" (McCoy et al., 2023), we showed that several large language models (LLMs) have some important limitations that are attributable to their origins in next-word prediction. Here we investigate whether these issues persist with o1, a new system from OpenAI that differs from previous LLMs in that it is optimized for reasoning. We find that o1 substantially outperforms prev…

    Submitted 3 October, 2024; v1 submitted 2 October, 2024; originally announced October 2024.

    Comments: 6 pages; updated to fix typo in Fig 4 caption

  5. arXiv:2408.09548  [pdf, other]

    cs.NE

    Enhancing Population-based Search with Active Inference

    Authors: Nassim Dehouche, Daniel Friedman

    Abstract: The Active Inference framework models perception and action as a unified process, where agents use probabilistic models to predict and actively minimize sensory discrepancies. In complement and contrast, traditional population-based metaheuristics rely on reactive environmental interactions without anticipatory adaptation. This paper proposes the integration of Active Inference into these metaheur…

    Submitted 18 August, 2024; originally announced August 2024.

  6. arXiv:2407.10949  [pdf, other]

    cs.CL cs.AI cs.LG

    Representing Rule-based Chatbots with Transformers

    Authors: Dan Friedman, Abhishek Panigrahi, Danqi Chen

    Abstract: What kind of internal mechanisms might Transformers use to conduct fluid, natural-sounding conversations? Prior work has illustrated by construction how Transformers can solve various synthetic tasks, such as sorting a list or recognizing formal languages, but it remains unclear how to extend this approach to a conversational setting. In this work, we propose using ELIZA, a classic rule-based chat…

    Submitted 12 February, 2025; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: NAACL 2025. Code and data are available at https://github.com/princeton-nlp/ELIZA-Transformer

  7. arXiv:2406.16778  [pdf, other]

    cs.CL

    Finding Transformer Circuits with Edge Pruning

    Authors: Adithya Bhaskar, Alexander Wettig, Dan Friedman, Danqi Chen

    Abstract: The path to interpreting a language model often proceeds via analysis of circuits -- sparse computational subgraphs of the model that capture specific aspects of its behavior. Recent work has automated the task of discovering circuits. Yet, these methods have practical limitations, as they rely either on inefficient search algorithms or inaccurate approximations. In this paper, we frame automated…

    Submitted 2 April, 2025; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: NeurIPS 2024 (Spotlight), code available at https://github.com/princeton-nlp/Edge-Pruning

  8. arXiv:2403.03942  [pdf, other]

    cs.CL cs.LG

    The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models

    Authors: Adithya Bhaskar, Dan Friedman, Danqi Chen

    Abstract: Prior work has found that pretrained language models (LMs) fine-tuned with different random seeds can achieve similar in-domain performance but generalize differently on tests of syntactic generalization. In this work, we show that, even within a single model, we can find multiple subnetworks that perform similarly in-domain, but generalize vastly differently. To better understand these phenomena,…

    Submitted 5 June, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

    Comments: Accepted to ACL 2024

  9. arXiv:2312.03656  [pdf, other]

    cs.LG cs.CL

    Interpretability Illusions in the Generalization of Simplified Models

    Authors: Dan Friedman, Andrew Lampinen, Lucas Dixon, Danqi Chen, Asma Ghandeharioun

    Abstract: A common method to study deep learning systems is to use simplified model representations--for example, using singular value decomposition to visualize the model's hidden states in a lower dimensional space. This approach assumes that the results of these simplifications are faithful to the original model. Here, we illustrate an important caveat to this assumption: even if the simplified represent…

    Submitted 5 June, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: ICML 2024

  10. arXiv:2310.07106  [pdf, other]

    cs.CL cs.AI cs.LG q-bio.NC

    The Temporal Structure of Language Processing in the Human Brain Corresponds to The Layered Hierarchy of Deep Language Models

    Authors: Ariel Goldstein, Eric Ham, Mariano Schain, Samuel Nastase, Zaid Zada, Avigail Dabush, Bobbi Aubrey, Harshvardhan Gazula, Amir Feder, Werner K Doyle, Sasha Devore, Patricia Dugan, Daniel Friedman, Roi Reichart, Michael Brenner, Avinatan Hassidim, Orrin Devinsky, Adeen Flinker, Omer Levy, Uri Hasson

    Abstract: Deep Language Models (DLMs) provide a novel computational paradigm for understanding the mechanisms of natural language processing in the human brain. Unlike traditional psycholinguistic models, DLMs use layered sequences of continuous numerical vectors to represent words and context, allowing a plethora of emerging applications such as human-like text generation. In this paper we show evidence th…

    Submitted 10 October, 2023; originally announced October 2023.

  11. arXiv:2309.13638  [pdf, other]

    cs.CL cs.AI

    Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve

    Authors: R. Thomas McCoy, Shunyu Yao, Dan Friedman, Matthew Hardy, Thomas L. Griffiths

    Abstract: The widespread adoption of large language models (LLMs) makes it important to recognize their strengths and limitations. We argue that in order to develop a holistic understanding of these systems we need to consider the problem that they were trained to solve: next-word prediction over Internet text. By recognizing the pressures that this task exerts we can make predictions about the strategies t…

    Submitted 24 September, 2023; originally announced September 2023.

    Comments: 50 pages plus 11 pages of references and 23 pages of appendices

  12. arXiv:2306.01128  [pdf, other]

    cs.LG cs.CL

    Learning Transformer Programs

    Authors: Dan Friedman, Alexander Wettig, Danqi Chen

    Abstract: Recent research in mechanistic interpretability has attempted to reverse-engineer Transformer models by carefully inspecting network weights and activations. However, these approaches require considerable manual effort and still fall short of providing complete, faithful descriptions of the underlying algorithms. In this work, we introduce a procedure for training Transformers that are mechanistic…

    Submitted 30 October, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 (oral). Our code is available at https://github.com/princeton-nlp/TransformerPrograms

  13. arXiv:2305.13299  [pdf, other]

    cs.CL cs.AI cs.LG

    Measuring Inductive Biases of In-Context Learning with Underspecified Demonstrations

    Authors: Chenglei Si, Dan Friedman, Nitish Joshi, Shi Feng, Danqi Chen, He He

    Abstract: In-context learning (ICL) is an important paradigm for adapting large language models (LLMs) to new tasks, but the generalization behavior of ICL remains poorly understood. We investigate the inductive biases of ICL from the perspective of feature bias: which feature ICL is more likely to use given a set of underspecified demonstrations in which two features are equally predictive of the labels. F…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  14. arXiv:2210.11560  [pdf, other]

    cs.CL

    Finding Dataset Shortcuts with Grammar Induction

    Authors: Dan Friedman, Alexander Wettig, Danqi Chen

    Abstract: Many NLP datasets have been found to contain shortcuts: simple decision rules that achieve surprisingly high accuracy. However, it is difficult to discover shortcuts automatically. Prior work on automatic shortcut detection has focused on enumerating features like unigrams or bigrams, which can find only low-level shortcuts, or relied on post-hoc model interpretability methods like saliency maps,…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022. Our code is publicly available at https://github.com/princeton-nlp/ShortcutGrammar

  15. arXiv:2210.02410  [pdf, other]

    cs.LG cond-mat.mtrl-sci stat.ML

    The Vendi Score: A Diversity Evaluation Metric for Machine Learning

    Authors: Dan Friedman, Adji Bousso Dieng

    Abstract: Diversity is an important criterion for many areas of machine learning (ML), including generative modeling and dataset curation. However, existing metrics for measuring diversity are often domain-specific and limited in flexibility. In this paper, we address the diversity evaluation problem by proposing the Vendi Score, which connects and extends ideas from ecology and quantum statistical mechanic…

    Submitted 2 July, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: The Vendi Score is available as a pip package at https://github.com/vertaix/Vendi-Score

  16. arXiv:2209.00773  [pdf, other]

    cs.CV cs.AI

    Artifact-Tolerant Clustering-Guided Contrastive Embedding Learning for Ophthalmic Images

    Authors: Min Shi, Anagha Lokhande, Mojtaba S. Fazli, Vishal Sharma, Yu Tian, Yan Luo, Louis R. Pasquale, Tobias Elze, Michael V. Boland, Nazlee Zebardast, David S. Friedman, Lucy Q. Shen, Mengyu Wang

    Abstract: Ophthalmic images and derivatives such as the retinal nerve fiber layer (RNFL) thickness map are crucial for detecting and monitoring ophthalmic diseases (e.g., glaucoma). For computer-aided diagnosis of eye diseases, the key technique is to automatically extract meaningful features from ophthalmic images that can reveal the biomarkers (e.g., RNFL thinning patterns) linked to functional vision los…

    Submitted 1 September, 2022; originally announced September 2022.

    Comments: 10 pages

  17. From Users to (Sense)Makers: On the Pivotal Role of Stigmergic Social Annotation in the Quest for Collective Sensemaking

    Authors: Ronen Tamari, Daniel Friedman, William Fischer, Lauren Hebert, Dafna Shahaf

    Abstract: The web has become a dominant epistemic environment, influencing people's beliefs at a global scale. However, online epistemic environments are increasingly polluted, impairing societies' ability to coordinate effectively in the face of global crises. We argue that centralized platforms are a main source of epistemic pollution, and that healthier environments require redesigning how we collectivel…

    Submitted 4 August, 2022; v1 submitted 12 May, 2022; originally announced May 2022.

    Comments: Blue-sky ideas track of the 33rd ACM Conference on Hypertext and Social Media, Barcelona, 2022 (updated references)

  18. arXiv:2109.13880  [pdf, other]

    cs.CL

    Single-dataset Experts for Multi-dataset Question Answering

    Authors: Dan Friedman, Ben Dodge, Danqi Chen

    Abstract: Many datasets have been created for training reading comprehension models, and a natural question is whether we can combine them to build models that (1) perform better on all of the training datasets and (2) generalize and transfer better to new datasets. Prior work has addressed this goal by training one network simultaneously on multiple datasets, which works well on average but is prone to ove…

    Submitted 28 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021. The code is available at https://github.com/princeton-nlp/MADE

  19. arXiv:2104.05240  [pdf, other]

    cs.CL

    Factual Probing Is [MASK]: Learning vs. Learning to Recall

    Authors: Zexuan Zhong, Dan Friedman, Danqi Chen

    Abstract: Petroni et al. (2019) demonstrated that it is possible to retrieve world facts from a pre-trained language model by expressing them as cloze-style prompts and interpret the model's prediction accuracy as a lower bound on the amount of factual information it encodes. Subsequent work has attempted to tighten the estimate by searching for better prompts, using a disjoint set of facts as training data…

    Submitted 14 December, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: NAACL 2021. The code is publicly available at https://github.com/princeton-nlp/OptiPrompt

  20. arXiv:1909.01716  [pdf, other]

    cs.CL cs.IR cs.LG

    ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks

    Authors: Michihiro Yasunaga, Jungo Kasai, Rui Zhang, Alexander R. Fabbri, Irene Li, Dan Friedman, Dragomir R. Radev

    Abstract: Scientific article summarization is challenging: large, annotated corpora are not available, and the summary should ideally include the article's impacts on research community. This paper provides novel solutions to these two challenges. We 1) develop and release the first large-scale manually-annotated corpus for scientific papers (on computational linguistics) by enabling faster annotation, and…

    Submitted 15 September, 2019; v1 submitted 4 September, 2019; originally announced September 2019.

    Comments: AAAI 2019

  21. arXiv:1903.05260  [pdf, other]

    cs.CL

    Syntax-aware Neural Semantic Role Labeling with Supertags

    Authors: Jungo Kasai, Dan Friedman, Robert Frank, Dragomir Radev, Owen Rambow

    Abstract: We introduce a new syntax-aware model for dependency-based semantic role labeling that outperforms syntax-agnostic models for English and Spanish. We use a BiLSTM to tag the text with supertags extracted from dependency parses, and we feed these supertags, along with words and parts of speech, into a deep highway BiLSTM for semantic role labeling. Our model combines the strengths of earlier models…

    Submitted 3 April, 2019; v1 submitted 12 March, 2019; originally announced March 2019.

    Comments: NAACL 2019, Added Spanish ELMo results

  22. arXiv:1901.10798  [pdf, other]

    cs.NE

    Recurrent Neural Networks for P300-based BCI

    Authors: Ori Tal, Doron Friedman

    Abstract: P300-based spellers are one of the main methods for EEG-based brain-computer interface, and the detection of the P300 target event with high accuracy is an important prerequisite. The rapid serial visual presentation (RSVP) protocol is of high interest because it can be used by patients who have lost control over their eyes. In this study we wish to explore the suitability of recurrent neural netw…

    Submitted 30 January, 2019; originally announced January 2019.

    Journal ref: Int'l BCI Conf Graz, Austria, 2017

  23. arXiv:1711.02487  [pdf, other]

    cs.IR cs.LG

    Deep density networks and uncertainty in recommender systems

    Authors: Yoel Zeldes, Stavros Theodorakis, Efrat Solodnik, Aviv Rotman, Gil Chamiel, Dan Friedman

    Abstract: Building robust online content recommendation systems requires learning complex interactions between user preferences and content features. The field has evolved rapidly in recent years from traditional multi-arm bandit and collaborative filtering techniques, with new methods employing Deep Learning models to capture non-linearities. Despite progress, the dynamic nature of online recommendations s…

    Submitted 6 May, 2018; v1 submitted 7 November, 2017; originally announced November 2017.

  24. A Framework for Extending microKanren with Constraints

    Authors: Jason Hemann, Daniel P. Friedman

    Abstract: We present a framework for building CLP languages with symbolic constraints based on microKanren, a domain-specific logic language shallowly embedded in Racket. We rely on Racket's macro system to generate a constraint solver and other components of the microKanren embedding. The framework itself and the constraints' implementations amounts to just over 100 lines of code. Our framework is both a t…

    Submitted 3 January, 2017; originally announced January 2017.

    Comments: In Proceedings WLP'15/'16/WFLP'16, arXiv:1701.00148

    ACM Class: D.1.6 Logic Programming; D.3.2 Constraint and Logic Languages; D.3.3 Constraints

    Journal ref: EPTCS 234, 2017, pp. 135-149

  25. arXiv:1609.08470  [pdf]

    cs.AI

    A computer program for simulating time travel and a possible 'solution' for the grandfather paradox

    Authors: Doron Friedman

    Abstract: While the possibility of time travel in physics is still debated, the explosive growth of virtual-reality simulations opens up new possibilities to rigorously explore such time travel and its consequences in the digital domain. Here we provide a computational model of time travel and a computer program that allows exploring digital time travel. In order to explain our method we formalize a simplif…

    Submitted 26 September, 2016; originally announced September 2016.

  26. arXiv:cs/0605133  [pdf, ps, other]

    cs.NI

    Efficient Route Tracing from a Single Source

    Authors: Benoit Donnet, Philippe Raoult, Timur Friedman

    Abstract: Traceroute is a networking tool that allows one to discover the path that packets take from a source machine, through the network, to a destination machine. It is widely used as an engineering tool, and also as a scientific tool, such as for discovery of the network topology at the IP level. In prior work, authors on this technical report have shown how to improve the efficiency of route tracing…

    Submitted 29 May, 2006; originally announced May 2006.