Showing 1–42 of 42 results for author: Wattenberg, M

  1. arXiv:2505.04741  [pdf, other]

    cs.LG cs.AI cs.CL

    When Bad Data Leads to Good Models

    Authors: Kenneth Li, Yida Chen, Fernanda Viégas, Martin Wattenberg

    Abstract: In large language model (LLM) pretraining, data quality is believed to determine model quality. In this paper, we re-examine the notion of "quality" from the perspective of pre- and post-training co-design. Specifically, we explore the possibility that pre-training on more toxic data can lead to better control in post-training, ultimately decreasing a model's output toxicity. First, we use a toy e…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: ICML 2025

  2. arXiv:2504.14379  [pdf, other]

    cs.AI cs.LG

    The Geometry of Self-Verification in a Task-Specific Reasoning Model

    Authors: Andrew Lee, Lihao Sun, Chris Wendler, Fernanda Viégas, Martin Wattenberg

    Abstract: How do reasoning models verify their own answers? We study this question by training a model using DeepSeek R1's recipe on the CountDown task. We leverage the fact that preference tuning leads to mode collapse, yielding a model that always produces highly structured chain-of-thought sequences. With this setup, we do top-down and bottom-up analyses to reverse-engineer how the model verifies its out…

    Submitted 11 May, 2025; v1 submitted 19 April, 2025; originally announced April 2025.

  3. arXiv:2503.21073  [pdf, other]

    cs.CL cs.LG

    Shared Global and Local Geometry of Language Model Embeddings

    Authors: Andrew Lee, Melanie Weber, Fernanda Viégas, Martin Wattenberg

    Abstract: Researchers have recently suggested that models share common representations. In our work, we find that token embeddings of language models exhibit common geometric structure. First, we find "global" similarities: token embeddings often share similar relative orientations. Next, we characterize local geometry in two ways: (1) by using Locally Linear Embeddings, and (2) by defining a simple measu…

    Submitted 23 April, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

  4. arXiv:2502.12892  [pdf, other]

    cs.CV

    Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models

    Authors: Thomas Fel, Ekdeep Singh Lubana, Jacob S. Prince, Matthew Kowal, Victor Boutin, Isabel Papadimitriou, Binxu Wang, Martin Wattenberg, Demba Ba, Talia Konkle

    Abstract: Sparse Autoencoders (SAEs) have emerged as a powerful framework for machine learning interpretability, enabling the unsupervised decomposition of model representations into a dictionary of abstract, human-interpretable concepts. However, we reveal a fundamental limitation: existing SAEs exhibit severe instability, as identical models trained on similar datasets can produce sharply different dictio…

    Submitted 23 May, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

    Journal ref: Proceedings of the 42nd International Conference on Machine Learning (ICML), 2025

  5. arXiv:2501.16496  [pdf, other]

    cs.LG

    Open Problems in Mechanistic Interpretability

    Authors: Lee Sharkey, Bilal Chughtai, Joshua Batson, Jack Lindsey, Jeff Wu, Lucius Bushnaq, Nicholas Goldowsky-Dill, Stefan Heimersheim, Alejandro Ortega, Joseph Bloom, Stella Biderman, Adria Garriga-Alonso, Arthur Conmy, Neel Nanda, Jessica Rumbelow, Martin Wattenberg, Nandi Schoots, Joseph Miller, Eric J. Michaud, Stephen Casper, Max Tegmark, William Saunders, David Bau, Eric Todd, Atticus Geiger, et al. (4 additional authors not shown)

    Abstract: Mechanistic interpretability aims to understand the computational mechanisms underlying neural networks' capabilities in order to accomplish concrete scientific and engineering goals. Progress in this field thus promises to provide greater assurance over AI system behavior and shed light on exciting scientific questions about the nature of intelligence. Despite recent progress toward these goals,…

    Submitted 27 January, 2025; originally announced January 2025.

  6. arXiv:2501.00070  [pdf, other]

    cs.CL cs.AI cs.LG

    ICLR: In-Context Learning of Representations

    Authors: Core Francisco Park, Andrew Lee, Ekdeep Singh Lubana, Yongyi Yang, Maya Okawa, Kento Nishi, Martin Wattenberg, Hidenori Tanaka

    Abstract: Recent work has demonstrated that semantics specified by pretraining data influence how representations of different concepts are organized in a large language model (LLM). However, given the open-ended nature of LLMs, e.g., their ability to in-context learn, we can ask whether models alter these pretraining semantics to adopt alternative, context-specified ones. Specifically, if we provide in-con…

    Submitted 2 May, 2025; v1 submitted 29 December, 2024; originally announced January 2025.

    Comments: ICLR 2025

    Journal ref: International Conference on Learning Representations, 2025

  7. arXiv:2407.14662  [pdf, ps, other]

    cs.AI cs.LG

    Relational Composition in Neural Networks: A Survey and Call to Action

    Authors: Martin Wattenberg, Fernanda B. Viégas

    Abstract: Many neural nets appear to represent data as linear combinations of "feature vectors." Algorithms for discovering these vectors have seen impressive recent success. However, we argue that this success is incomplete without an understanding of relational composition: how (or whether) neural nets combine feature vectors to represent more complicated relationships. To facilitate research in this area…

    Submitted 19 July, 2024; originally announced July 2024.

  8. arXiv:2406.11978  [pdf, other]

    cs.CL cs.AI cs.LG

    Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner

    Authors: Kenneth Li, Yiming Wang, Fernanda Viégas, Martin Wattenberg

    Abstract: We present an approach called Dialogue Action Tokens (DAT) that adapts language model agents to plan goal-directed dialogues. The core idea is to treat each utterance as an action, thereby converting dialogues into games where existing approaches such as reinforcement learning can be applied. Specifically, we freeze a pretrained language model and train a small planner model that predicts a contin…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/likenneth/dialogue_action_token

  9. arXiv:2406.07882  [pdf, other]

    cs.CL cs.AI cs.HC

    Designing a Dashboard for Transparency and Control of Conversational AI

    Authors: Yida Chen, Aoyu Wu, Trevor DePodesta, Catherine Yeh, Kenneth Li, Nicholas Castillo Marin, Oam Patel, Jan Riecke, Shivam Raval, Olivia Seow, Martin Wattenberg, Fernanda Viégas

    Abstract: Conversational LLMs function as black box systems, leaving users guessing about why they see the output they do. This lack of transparency is potentially problematic, especially given concerns around bias and truthfulness. To address this issue, we present an end-to-end prototype, connecting interpretability techniques with user experience design, that seeks to make chatbots more transparent. We beg…

    Submitted 14 October, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Project page: https://bit.ly/talktuner-project-page, 38 pages, 23 figures

  10. arXiv:2402.14688  [pdf, other]

    cs.LG

    Q-Probe: A Lightweight Approach to Reward Maximization for Language Models

    Authors: Kenneth Li, Samy Jelassi, Hugh Zhang, Sham Kakade, Martin Wattenberg, David Brandfonbrener

    Abstract: We present an approach called Q-probing to adapt a pre-trained language model to maximize a task-specific reward function. At a high level, Q-probing sits between heavier approaches such as finetuning and lighter approaches such as few shot prompting, but can also be combined with either. The idea is to learn a simple linear function on a model's embedding space that can be used to reweight candid…

    Submitted 2 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  11. arXiv:2402.10962  [pdf, other]

    cs.CL cs.AI cs.LG

    Measuring and Controlling Instruction (In)Stability in Language Model Dialogs

    Authors: Kenneth Li, Tianle Liu, Naomi Bashkansky, David Bau, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg

    Abstract: System-prompting is a standard tool for customizing language-model chatbots, enabling them to follow a specific instruction. An implicit assumption in the use of system prompts is that they will be stable, so the chatbot will continue to generate text according to the stipulated instructions for the duration of a conversation. We propose a quantitative benchmark to test this assumption, evaluating…

    Submitted 25 July, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: COLM 2024; Code and data: https://github.com/likenneth/persona_drift

  12. arXiv:2401.01967  [pdf, other]

    cs.CL cs.AI

    A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity

    Authors: Andrew Lee, Xiaoyan Bai, Itamar Pres, Martin Wattenberg, Jonathan K. Kummerfeld, Rada Mihalcea

    Abstract: While alignment algorithms are now commonly used to tune pre-trained language models towards a user's preferences, we lack explanations for the underlying mechanisms in which models become "aligned", thus making it difficult to explain phenomena like jailbreaks. In this work we study a popular algorithm, direct preference optimization (DPO), and the mechanisms by which it reduces toxicity. Namel…

    Submitted 3 January, 2024; originally announced January 2024.

  13. arXiv:2311.00710  [pdf, other]

    cs.HC cs.AI

    Interactive AI Alignment: Specification, Process, and Evaluation Alignment

    Authors: Michael Terry, Chinmay Kulkarni, Martin Wattenberg, Lucas Dixon, Meredith Ringel Morris

    Abstract: Modern AI enables a high-level, declarative form of interaction: Users describe the intended outcome they wish an AI to produce, but do not actually create the outcome themselves. In contrast, in traditional user interfaces, users invoke specific operations to create the desired outcome. This paper revisits the basic input-output interaction cycle in light of this declarative style of interaction,…

    Submitted 16 September, 2024; v1 submitted 23 October, 2023; originally announced November 2023.

  14. ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing

    Authors: Ian Arawjo, Chelse Swoopes, Priyan Vaithilingam, Martin Wattenberg, Elena Glassman

    Abstract: Evaluating outputs of large language models (LLMs) is challenging, requiring making -- and making sense of -- many responses. Yet tools that go beyond basic prompting tend to require knowledge of programming APIs, focus on narrow domains, or are closed-source. We present ChainForge, an open-source visual toolkit for prompt engineering and on-demand hypothesis testing of text generation LLMs. Chain…

    Submitted 3 May, 2024; v1 submitted 16 September, 2023; originally announced September 2023.

    Comments: 18 pages, 7 figures, published at CHI 2024

    ACM Class: H.5.2; I.2

  15. arXiv:2309.00941  [pdf, other]

    cs.LG

    Emergent Linear Representations in World Models of Self-Supervised Sequence Models

    Authors: Neel Nanda, Andrew Lee, Martin Wattenberg

    Abstract: How do sequence models represent their decision-making process? Prior work suggests that an Othello-playing neural network learned nonlinear models of the board state (Li et al., 2023). In this work, we provide evidence of a closely related linear representation of the board. In particular, we show that probing for "my colour" vs. "opponent's colour" may be a simple yet powerful way to interpret the…

    Submitted 7 September, 2023; v1 submitted 2 September, 2023; originally announced September 2023.

  16. arXiv:2308.09124  [pdf, other]

    cs.CL

    Linearity of Relation Decoding in Transformer Language Models

    Authors: Evan Hernandez, Arnab Sen Sharma, Tal Haklay, Kevin Meng, Martin Wattenberg, Jacob Andreas, Yonatan Belinkov, David Bau

    Abstract: Much of the knowledge encoded in transformer language models (LMs) may be expressed in terms of relations: relations between words and their synonyms, entities and their attributes, etc. We show that, for a subset of relations, this computation is well-approximated by a single linear transformation on the subject representation. Linear relation representations may be obtained by constructing a fir…

    Submitted 15 February, 2024; v1 submitted 17 August, 2023; originally announced August 2023.

  17. arXiv:2306.05720  [pdf, other]

    cs.CV cs.AI cs.LG

    Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model

    Authors: Yida Chen, Fernanda Viégas, Martin Wattenberg

    Abstract: Latent diffusion models (LDMs) exhibit an impressive ability to produce realistic images, yet the inner workings of these models remain mysterious. Even when trained purely on images without explicit depth information, they typically output coherent pictures of 3D scenes. In this work, we investigate a basic interpretability question: does an LDM create and use an internal representation of simple…

    Submitted 4 November, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: A short version of this paper is accepted in the NeurIPS 2023 Workshop on Diffusion Models: https://nips.cc/virtual/2023/74894

  18. arXiv:2306.03341  [pdf, other]

    cs.LG cs.AI cs.CL

    Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

    Authors: Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg

    Abstract: We introduce Inference-Time Intervention (ITI), a technique designed to enhance the "truthfulness" of large language models (LLMs). ITI operates by shifting model activations during inference, following a set of directions across a limited number of attention heads. This intervention significantly improves the performance of LLaMA models on the TruthfulQA benchmark. On an instruction-finetuned LLa…

    Submitted 26 June, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 spotlight; code: https://github.com/likenneth/honest_llama

  19. arXiv:2305.03210  [pdf, other]

    cs.HC cs.CL cs.CV cs.LG

    AttentionViz: A Global View of Transformer Attention

    Authors: Catherine Yeh, Yida Chen, Aoyu Wu, Cynthia Chen, Fernanda Viégas, Martin Wattenberg

    Abstract: Transformer models are revolutionizing machine learning, but their inner workings remain mysterious. In this work, we present a new visualization technique designed to help researchers understand the self-attention mechanism in transformers that allows these models to learn rich, contextual relationships between elements of a sequence. The main idea behind our method is to visualize a joint embedd…

    Submitted 9 August, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: 11 pages, 13 figures

  20. arXiv:2305.02469  [pdf, other]

    cs.HC cs.AI cs.LG

    The System Model and the User Model: Exploring AI Dashboard Design

    Authors: Fernanda Viégas, Martin Wattenberg

    Abstract: This is a speculative essay on interface design and artificial intelligence. Recently there has been a surge of attention to chatbots based on large language models, including widely reported unsavory interactions. We contend that part of the problem is that text is not all you need: sophisticated AI systems should have dashboards, just like all other complicated devices. Assuming the hypothesis t…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: 10 pages, 2 figures

  21. Investigating How Practitioners Use Human-AI Guidelines: A Case Study on the People + AI Guidebook

    Authors: Nur Yildirim, Mahima Pushkarna, Nitesh Goyal, Martin Wattenberg, Fernanda Viegas

    Abstract: Artificial intelligence (AI) presents new challenges for the user experience (UX) of products and services. Recently, practitioner-facing resources and design guidelines have become available to ease some of these challenges. However, little research has investigated if and how these guidelines are used, and how they impact practice. In this paper, we investigated how industry practitioners use th…

    Submitted 20 April, 2023; v1 submitted 28 January, 2023; originally announced January 2023.

    Journal ref: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems

  22. arXiv:2210.13382  [pdf, other]

    cs.LG cs.AI cs.CL

    Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task

    Authors: Kenneth Li, Aspen K. Hopkins, David Bau, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg

    Abstract: Language models show a surprising range of capabilities, but the source of their apparent competence is unclear. Do these networks just memorize a collection of surface statistics, or do they rely on internal representations of the process that generates the sequences they see? We investigate this question by applying a variant of the GPT model to the task of predicting legal moves in a simple boa…

    Submitted 26 June, 2024; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: ICLR 2023 oral (notable-top-5%): https://openreview.net/forum?id=DeG07_TcZvT ; code: https://github.com/likenneth/othello_world

  23. arXiv:2209.10652  [pdf]

    cs.LG

    Toy Models of Superposition

    Authors: Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, Tom Henighan, Shauna Kravec, Zac Hatfield-Dodds, Robert Lasenby, Dawn Drain, Carol Chen, Roger Grosse, Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, Christopher Olah

    Abstract: Neural networks often pack many unrelated concepts into a single neuron - a puzzling phenomenon known as 'polysemanticity' which makes interpretability much more challenging. This paper provides a toy model where polysemanticity can be fully understood, arising as a result of models storing additional sparse features in "superposition." We demonstrate the existence of a phase change, a surprising…

    Submitted 21 September, 2022; originally announced September 2022.

    Comments: Also available at https://transformer-circuits.pub/2022/toy_model/index.html

  24. arXiv:2202.07399  [pdf, other]

    gr-qc astro-ph.HE cs.AI

    Interpreting a Machine Learning Model for Detecting Gravitational Waves

    Authors: Mohammadtaher Safarzadeh, Asad Khan, E. A. Huerta, Martin Wattenberg

    Abstract: We describe a case study of translational research, applying interpretability techniques developed for computer vision to machine learning models used to search for and find gravitational waves. The models we study are trained to detect black hole merger events in non-Gaussian and non-stationary advanced Laser Interferometer Gravitational-wave Observatory (LIGO) data. We produced visualizations of…

    Submitted 15 February, 2022; originally announced February 2022.

    Comments: 19 pages, to be submitted, comments are welcome. Movies based on this work can be accessed via: https://www.youtube.com/watch?v=SXFGMOtJwn0 https://www.youtube.com/watch?v=itVCj9gpmAs

  25. arXiv:2104.07143  [pdf, other]

    cs.CL cs.LG

    An Interpretability Illusion for BERT

    Authors: Tolga Bolukbasi, Adam Pearce, Ann Yuan, Andy Coenen, Emily Reif, Fernanda Viégas, Martin Wattenberg

    Abstract: We describe an "interpretability illusion" that arises when analyzing the BERT model. Activations of individual neurons in the network may spuriously appear to encode a single, simple concept, when in fact they are encoding something far more complex. The same effect holds for linear combinations of activations. We trace the source of this illusion to geometric properties of BERT's embedding space…

    Submitted 14 April, 2021; originally announced April 2021.

  26. arXiv:2012.00874  [pdf, other]

    cs.CY

    "A cold, technical decision-maker": Can AI provide explainability, negotiability, and humanity?

    Authors: Allison Woodruff, Yasmin Asare Anderson, Katherine Jameson Armstrong, Marina Gkiza, Jay Jennings, Christopher Moessner, Fernanda Viegas, Martin Wattenberg, Lynette Webb, Fabian Wrede, Patrick Gage Kelley

    Abstract: Algorithmic systems are increasingly deployed to make decisions in many areas of people's lives. The shift from human to algorithmic decision-making has been accompanied by concern about potentially opaque decisions that are not aligned with social values, as well as proposed remedies such as explainability. We present results of a qualitative study of algorithmic decision-making, comprised of fiv…

    Submitted 1 December, 2020; originally announced December 2020.

    Comments: 23 pages, 1 appendix, 4 tables

    ACM Class: K.4; K.3.2; I.2

  27. The What-If Tool: Interactive Probing of Machine Learning Models

    Authors: James Wexler, Mahima Pushkarna, Tolga Bolukbasi, Martin Wattenberg, Fernanda Viegas, Jimbo Wilson

    Abstract: A key challenge in developing and deploying Machine Learning (ML) systems is understanding their performance across a wide range of inputs. To address this challenge, we created the What-If Tool, an open-source application that allows practitioners to probe, visualize, and analyze ML systems, with minimal coding. The What-If Tool lets practitioners test performance in hypothetical situations, anal…

    Submitted 3 October, 2019; v1 submitted 9 July, 2019; originally announced July 2019.

    Comments: IEEE VIS (VAST) 2019

    ACM Class: H.5.2

  28. arXiv:1906.02715  [pdf, other]

    cs.LG cs.CL stat.ML

    Visualizing and Measuring the Geometry of BERT

    Authors: Andy Coenen, Emily Reif, Ann Yuan, Been Kim, Adam Pearce, Fernanda Viégas, Martin Wattenberg

    Abstract: Transformer architectures show significant promise for natural language processing. Given that a single pretrained model can be fine-tuned to perform well on many different tasks, these networks appear to extract generally useful linguistic features. A natural question is how such networks represent this information internally. This paper describes qualitative and quantitative investigations of on…

    Submitted 28 October, 2019; v1 submitted 6 June, 2019; originally announced June 2019.

    Comments: 8 pages, 5 figures

  29. arXiv:1903.01069  [pdf, other]

    cs.LG stat.ML

    Neural Networks Trained on Natural Scenes Exhibit Gestalt Closure

    Authors: Been Kim, Emily Reif, Martin Wattenberg, Samy Bengio, Michael C. Mozer

    Abstract: The Gestalt laws of perceptual organization, which describe how visual elements in an image are grouped and interpreted, have traditionally been thought of as innate despite their ecological validity. We use deep-learning methods to investigate whether natural scene statistics might be sufficient to derive the Gestalt laws. We examine the law of closure, which asserts that human visual perception…

    Submitted 29 June, 2020; v1 submitted 3 March, 2019; originally announced March 2019.

  30. arXiv:1902.02960  [pdf]

    cs.HC cs.CY

    Human-Centered Tools for Coping with Imperfect Algorithms during Medical Decision-Making

    Authors: Carrie J. Cai, Emily Reif, Narayan Hegde, Jason Hipp, Been Kim, Daniel Smilkov, Martin Wattenberg, Fernanda Viegas, Greg S. Corrado, Martin C. Stumpe, Michael Terry

    Abstract: Machine learning (ML) is increasingly being used in image retrieval systems for medical decision making. One application of ML is to retrieve visually similar medical images from past patients (e.g. tissue from biopsies) to reference when making a medical decision with a new patient. However, no algorithm can perfectly capture an expert's ideal notion of similarity for every case: an image that is…

    Submitted 8 February, 2019; originally announced February 2019.

  31. arXiv:1901.05350  [pdf, other]

    cs.LG

    TensorFlow.js: Machine Learning for the Web and Beyond

    Authors: Daniel Smilkov, Nikhil Thorat, Yannick Assogba, Ann Yuan, Nick Kreeger, Ping Yu, Kangyi Zhang, Shanqing Cai, Eric Nielsen, David Soergel, Stan Bileschi, Michael Terry, Charles Nicholson, Sandeep N. Gupta, Sarah Sirajuddin, D. Sculley, Rajat Monga, Greg Corrado, Fernanda B. Viégas, Martin Wattenberg

    Abstract: TensorFlow.js is a library for building and executing machine learning algorithms in JavaScript. TensorFlow.js models run in a web browser and in the Node.js environment. The library is part of the TensorFlow ecosystem, providing a set of APIs that are compatible with those in Python, allowing models to be ported between the Python and JavaScript ecosystems. TensorFlow.js has empowered a new set o…

    Submitted 27 February, 2019; v1 submitted 16 January, 2019; originally announced January 2019.

    Comments: 10 pages, expanded performance section, fixed page breaks in code listings

  32. arXiv:1809.01587  [pdf, other]

    cs.HC cs.AI cs.LG stat.ML

    GAN Lab: Understanding Complex Deep Generative Models using Interactive Visual Experimentation

    Authors: Minsuk Kahng, Nikhil Thorat, Duen Horng Chau, Fernanda Viégas, Martin Wattenberg

    Abstract: Recent success in deep learning has generated immense interest among practitioners and students, inspiring many to learn about this new technology. While visual and interactive approaches have been successfully developed to help people more easily learn deep learning, most existing tools focus on simpler models. In this work, we present GAN Lab, the first interactive visualization tool designed fo…

    Submitted 5 September, 2018; originally announced September 2018.

    Comments: This paper will be published in the IEEE Transactions on Visualization and Computer Graphics, 25(1), January 2019, and presented at IEEE VAST 2018

  33. arXiv:1801.02774  [pdf, other]

    cs.CV

    Adversarial Spheres

    Authors: Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S. Schoenholz, Maithra Raghu, Martin Wattenberg, Ian Goodfellow

    Abstract: State of the art computer vision models have been shown to be vulnerable to small adversarial perturbations of the input. In other words, most images in the data distribution are both correctly classified by the model and are very close to a visually similar misclassified image. Despite substantial research interest, the cause of the phenomenon is still poorly understood and remains unsolved. We h…

    Submitted 10 September, 2018; v1 submitted 8 January, 2018; originally announced January 2018.

    MSC Class: 68T45

    ACM Class: I.2.6

  34. arXiv:1711.11279  [pdf, other]

    stat.ML

    Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)

    Authors: Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, Rory Sayres

    Abstract: The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts. To address these challenges, we introduce Concept Activation Vectors (CAVs), which provide an interpretation of a neural net's internal state in terms of human-f…

    Submitted 7 June, 2018; v1 submitted 30 November, 2017; originally announced November 2017.

    Journal ref: ICML 2018

  35. arXiv:1708.03788  [pdf, other]

    cs.LG cs.HC stat.ML

    Direct-Manipulation Visualization of Deep Networks

    Authors: Daniel Smilkov, Shan Carter, D. Sculley, Fernanda B. Viégas, Martin Wattenberg

    Abstract: The recent successes of deep learning have led to a wave of interest from non-experts. Gaining an understanding of this technology, however, is difficult. While the theory is important, it is also helpful for novices to develop an intuitive feel for the effect of different hyperparameters and structural variations. We describe TensorFlow Playground, an interactive, open sourced visualization that…

    Submitted 12 August, 2017; originally announced August 2017.

  36. arXiv:1706.03825  [pdf, other]

    cs.LG cs.CV stat.ML

    SmoothGrad: removing noise by adding noise

    Authors: Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, Martin Wattenberg

    Abstract: Explaining the output of a deep network remains a challenge. In the case of an image classifier, one type of explanation is to identify pixels that strongly influence the final decision. A starting point for this strategy is the gradient of the class score function with respect to the input image. This gradient can be interpreted as a sensitivity map, and there are several techniques that elaborat…

    Submitted 12 June, 2017; originally announced June 2017.

    Comments: 10 pages

  37. arXiv:1611.05469  [pdf, other]

    stat.ML cs.HC

    Embedding Projector: Interactive Visualization and Interpretation of Embeddings

    Authors: Daniel Smilkov, Nikhil Thorat, Charles Nicholson, Emily Reif, Fernanda B. Viégas, Martin Wattenberg

    Abstract: Embeddings are ubiquitous in machine learning, appearing in recommender systems, NLP, and many other applications. Researchers and developers often need to explore the properties of a specific embedding, and one way to analyze embeddings is to visualize them. We present the Embedding Projector, a tool for interactive visualization and interpretation of embeddings.

    Submitted 16 November, 2016; originally announced November 2016.

    Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

  38. arXiv:1611.04558  [pdf, other]

    cs.CL cs.AI

    Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

    Authors: Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, Jeffrey Dean

    Abstract: We propose a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no change in the model architecture from our base system but instead introduces an artificial token at the beginning of the input sentence to specify the required target language. The rest of the model, which includes encoder, decoder and attention, rem…

    Submitted 21 August, 2017; v1 submitted 14 November, 2016; originally announced November 2016.

  39. arXiv:1603.04467  [pdf, other]

    cs.DC cs.LG

    TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

    Authors: Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mane, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, et al. (15 additional authors not shown)

    Abstract: TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational de…

    Submitted 16 March, 2016; v1 submitted 14 March, 2016; originally announced March 2016.

    Comments: Version 2 updates only the metadata, to correct the formatting of Martín Abadi's name

  40. Semiclassical Geometry of 4D Reduced Supersymmetric Yang-Mills Integrals

    Authors: Zdzislaw Burda, Bengt Petersson, Marc Wattenberg

    Abstract: We investigate semiclassical properties of space-time geometry of the low energy limit of reduced four dimensional supersymmetric Yang-Mills integrals using Monte-Carlo simulations. The limit is obtained by a one-loop approximation of the original Yang-Mills integrals leading to an effective model of branched polymers. We numerically determine the behaviour of the gyration radius, the two-point…

    Submitted 13 April, 2005; v1 submitted 3 March, 2005; originally announced March 2005.

    Comments: 14 pages, 5 figures, corrected version v3

    Journal ref: JHEP 0503 (2005) 058

  41. arXiv:hep-th/0308194  [pdf, ps, other]

    hep-th

    From 4D Reduced SYM Integrals to Branched-Polymers

    Authors: Zdzislaw Burda, Bengt Petersson, Marc Wattenberg

    Abstract: We derive analytically one-loop corrections to the effective Polyakov-line operator in the branched-polymer approximation of the reduced four-dimensional supersymmetric Yang-Mills integrals.

    Submitted 29 August, 2003; v1 submitted 28 August, 2003; originally announced August 2003.

    Comments: to be published in Acta Physica Polonica

    Journal ref: Acta Phys.Polon. B34 (2003) 4765-4776

  42. arXiv:cond-mat/0207459  [pdf, ps, other]

    cond-mat hep-lat hep-th

    Exotic trees

    Authors: Z. Burda, J. Erdmann, B. Petersson, M. Wattenberg

    Abstract: We discuss the scaling properties of free branched polymers. The scaling behaviour of the model is classified by the Hausdorff dimensions for the internal geometry: d_L and d_H, and for the external one: D_L and D_H. The dimensions d_H and D_H characterize the behaviour for long distances while d_L and D_L for short distances. We show that the internal Hausdorff dimension is d_L=2 for generic an…

    Submitted 18 July, 2002; originally announced July 2002.

    Comments: 33 pages, 6 eps figures

    Report number: BI-TP 2002/15, TPJU - 13/2002

    Journal ref: Phys.Rev. E67 (2003) 026105