Showing 1–26 of 26 results for author: Scheidegger, C

  1. arXiv:2404.08069  [pdf, other]

    cs.LG

    Persistent Classification: A New Approach to Stability of Data and Adversarial Examples

    Authors: Brian Bell, Michael Geyer, David Glickenstein, Keaton Hamm, Carlos Scheidegger, Amanda Fernandez, Juston Moore

    Abstract: There are a number of hypotheses underlying the existence of adversarial examples for classification problems. These include the high-dimensionality of the data, high codimension in the ambient space of the data manifolds of interest, and that the structure of machine learning models may encourage classifiers to develop decision boundaries close to data points. This article proposes a new framewor…

    Submitted 11 April, 2024; originally announced April 2024.

  2. arXiv:2209.07616  [pdf, other]

    cs.SI

    Reducing Access Disparities in Networks using Edge Augmentation

    Authors: Ashkan Bashardoust, Sorelle A. Friedler, Carlos E. Scheidegger, Blair D. Sullivan, Suresh Venkatasubramanian

    Abstract: In social networks, a node's position is a form of social capital. Better-positioned members not only benefit from (faster) access to diverse information, but innately have more potential influence on information spread. Structural biases often arise from network formation, and can lead to significant disparities in information access based on position. Further, processes such as link recomme…

    Submitted 15 September, 2022; originally announced September 2022.

  3. arXiv:2208.00109  [pdf, other]

    cs.HC

    Traveler: Navigating Task Parallel Traces for Performance Analysis

    Authors: Sayef Azad Sakin, Alex Bigelow, R. Tohid, Connor Scully-Allison, Carlos Scheidegger, Steven R. Brandt, Christopher Taylor, Kevin A. Huck, Hartmut Kaiser, Katherine E. Isaacs

    Abstract: Understanding the behavior of software in execution is a key step in identifying and fixing performance issues. This is especially important in high performance computing contexts where even minor performance tweaks can translate into large savings in terms of computational resource use. To aid performance analysis, developers may collect an execution trace - a chronological log of program activit…

    Submitted 3 September, 2022; v1 submitted 29 July, 2022; originally announced August 2022.

    Comments: IEEE VIS 2022

  4. arXiv:2111.01744  [pdf, other]

    cs.HC cs.LG

    UnProjection: Leveraging Inverse-Projections for Visual Analytics of High-Dimensional Data

    Authors: Mateus Espadoto, Gabriel Appleby, Ashley Suh, Dylan Cashman, Mingwei Li, Carlos Scheidegger, Erik W Anderson, Remco Chang, Alexandru C Telea

    Abstract: Projection techniques are often used to visualize high-dimensional data, allowing users to better understand the overall structure of multi-dimensional spaces on a 2D screen. Although many such methods exist, comparably little work has been done on generalizable methods of inverse-projection -- the process of mapping the projected points, or more generally, the projection space back to the origina…

    Submitted 2 November, 2021; originally announced November 2021.

  5. arXiv:2110.09431  [pdf, other]

    cs.HC cs.CV cs.LG

    Comparing Deep Neural Nets with UMAP Tour

    Authors: Mingwei Li, Carlos Scheidegger

    Abstract: Neural networks should be interpretable to humans. In particular, there is a growing interest in concepts learned in a layer and similarity between layers. In this work, a tool, UMAP Tour, is built to visually inspect and compare internal behavior of real-world neural network models using well-aligned, instance-level representations. The method used in the visualization also implies a new similari…

    Submitted 18 October, 2021; originally announced October 2021.

  6. arXiv:2109.00197  [pdf, other]

    cs.HC

    STFT-LDA: An Algorithm to Facilitate the Visual Analysis of Building Seismic Responses

    Authors: Zhenge Zhao, Danilo Motta, Matthew Berger, Joshua A. Levine, Ismail B. Kuzucu, Robert B. Fleischman, Afonso Paiva, Carlos Scheidegger

    Abstract: Civil engineers use numerical simulations of a building's responses to seismic forces to understand the nature of building failures, the limitations of building codes, and how to determine the latter to prevent the former. Such simulations generate large ensembles of multivariate, multiattribute time series. Comprehensive understanding of this data requires techniques that support the multivariate…

    Submitted 1 September, 2021; originally announced September 2021.

    Comments: 16 pages, 10 figures

  7. arXiv:2108.03738  [pdf]

    cs.HC

    Human-in-the-loop Extraction of Interpretable Concepts in Deep Learning Models

    Authors: Zhenge Zhao, Panpan Xu, Carlos Scheidegger, Liu Ren

    Abstract: The interpretation of deep neural networks (DNNs) has become a key topic as more and more people apply them to solve various problems and make critical decisions. Concept-based explanations have recently become a popular approach for post-hoc interpretation of DNNs. However, identifying human-understandable visual concepts that affect model decisions is a challenging task that is not easily addr…

    Submitted 8 August, 2021; originally announced August 2021.

    Comments: 11 pages, 10 figures

  8. arXiv:2010.12611  [pdf, other]

    cs.SI

    Information access representations and social capital in networks

    Authors: Ashkan Bashardoust, Hannah C. Beilinson, Sorelle A. Friedler, Jiajie Ma, Jade Rousseau, Carlos E. Scheidegger, Blair D. Sullivan, Nasanbayar Ulzii-Orshikh, Suresh Venkatasubramanian

    Abstract: Social network position confers power and social capital. In the setting of online social networks that have massive reach, creating mathematical representations of social capital is an important step towards understanding how network position can differentially confer advantage to different groups and how network position can itself be a source of advantage. In this paper, we use well established…

    Submitted 16 October, 2023; v1 submitted 23 October, 2020; originally announced October 2020.

  9. arXiv:2002.11097  [pdf, other]

    cs.AI cs.LG stat.ML

    Problems with Shapley-value-based explanations as feature importance measures

    Authors: I. Elizabeth Kumar, Suresh Venkatasubramanian, Carlos Scheidegger, Sorelle Friedler

    Abstract: Game-theoretic formulations of feature importance have become popular as a way to "explain" machine learning models. These methods define a cooperative game between the features of a model and distribute influence among these input elements using some form of the game's unique Shapley values. Justification for these methods rests on two pillars: their desirable mathematical properties, and their a…

    Submitted 30 June, 2020; v1 submitted 25 February, 2020; originally announced February 2020.

    Comments: Accepted to ICML 2020

  10. arXiv:1907.02872  [pdf, other]

    cs.HC cs.SE

    Anteater: Interactive Visualization of Program Execution Values in Context

    Authors: Rebecca Faust, Katherine Isaacs, William Z. Bernstein, Michael Sharp, Carlos Scheidegger

    Abstract: Debugging is famously one of the hardest parts of programming. In this paper, we tackle the question: what does a debugging environment look like when we take interactive visualization as a central design principle? We introduce Anteater, an interactive visualization system for tracing and exploring the execution of Python programs. Existing systems often have visualization components built on top of…

    Submitted 26 February, 2024; v1 submitted 5 July, 2019; originally announced July 2019.

    Comments: 31 pages, 9 figures, 3 tables

  11. arXiv:1906.08652  [pdf, other]

    cs.LG stat.ML

    Disentangling Influence: Using Disentangled Representations to Audit Model Predictions

    Authors: Charles T. Marx, Richard Lanas Phillips, Sorelle A. Friedler, Carlos Scheidegger, Suresh Venkatasubramanian

    Abstract: Motivated by the need to audit complex and black box models, there has been extensive research on quantifying how data features influence model predictions. Feature influence can be direct (a direct influence on model outcomes) and indirect (model outcomes are influenced via proxy features). Feature influence can also be expressed in aggregate over the training or test data or locally with respect…

    Submitted 20 June, 2019; originally announced June 2019.

  12. arXiv:1903.02047  [pdf, other]

    cs.SI physics.soc-ph

    Gaps in Information Access in Social Networks

    Authors: Benjamin Fish, Ashkan Bashardoust, danah boyd, Sorelle A. Friedler, Carlos Scheidegger, Suresh Venkatasubramanian

    Abstract: The study of influence maximization in social networks has largely ignored disparate effects these algorithms might have on the individuals contained in the social network. Individuals may place a high value on receiving information, e.g. job openings or advertisements for loans. While well-connected individuals at the center of the network are likely to receive the information that is being distr…

    Submitted 5 March, 2019; originally announced March 2019.

    Comments: Accepted at The Web Conference 2019

  13. arXiv:1902.03501  [pdf, other]

    cs.LG cs.HC stat.ML

    Assessing the Local Interpretability of Machine Learning Models

    Authors: Dylan Slack, Sorelle A. Friedler, Carlos Scheidegger, Chitradeep Dutta Roy

    Abstract: The increasing adoption of machine learning tools has led to calls for accountability via model interpretability. But what does it mean for a machine learning model to be interpretable by humans, and how can this be assessed? We focus on two definitions of interpretability that have been introduced in the machine learning literature: simulatability (a user's ability to run a model on a given input…

    Submitted 2 August, 2019; v1 submitted 9 February, 2019; originally announced February 2019.

  14. arXiv:1901.09565  [pdf, other]

    cs.LG stat.ML

    Fairness in representation: quantifying stereotyping as a representational harm

    Authors: Mohsen Abbasi, Sorelle A. Friedler, Carlos Scheidegger, Suresh Venkatasubramanian

    Abstract: While harms of allocation have been increasingly studied as part of the subfield of algorithmic fairness, harms of representation have received considerably less attention. In this paper, we formalize two notions of stereotyping and show how they manifest in later allocative harms within the machine learning pipeline. We also propose mitigation strategies and demonstrate their effectiveness on syn…

    Submitted 28 January, 2019; originally announced January 2019.

    Comments: 9 pages, 6 figures, SIAM International Conference on Data Mining

  15. arXiv:1808.08983  [pdf, other]

    cs.DB

    NeuralCubes: Deep Representations for Visual Data Exploration

    Authors: Zhe Wang, Dylan Cashman, Mingwei Li, Jixian Li, Matthew Berger, Joshua A. Levine, Remco Chang, Carlos Scheidegger

    Abstract: Visual exploration of large multidimensional datasets has seen tremendous progress in recent years, allowing users to express rich data queries that produce informative visual summaries, all in real time. Techniques based on data cubes are some of the most promising approaches. However, these techniques usually require a large memory footprint for large datasets. To tackle this problem, we present…

    Submitted 10 July, 2019; v1 submitted 27 August, 2018; originally announced August 2018.

  16. arXiv:1806.08460  [pdf, other]

    cs.CG cs.GR

    Homology-Preserving Dimensionality Reduction via Manifold Landmarking and Tearing

    Authors: Lin Yan, Yaodong Zhao, Paul Rosen, Carlos Scheidegger, Bei Wang

    Abstract: Dimensionality reduction is an integral part of data visualization. It is a process that obtains a structure preserving low-dimensional representation of the high-dimensional data. Two common criteria can be used to achieve a dimensionality reduction: distance preservation and topology preservation. Inspired by recent work in topological data analysis, we are on the quest for a dimensionality redu…

    Submitted 21 June, 2018; originally announced June 2018.

  17. arXiv:1802.04422  [pdf, other]

    stat.ML cs.CY cs.LG

    A comparative study of fairness-enhancing interventions in machine learning

    Authors: Sorelle A. Friedler, Carlos Scheidegger, Suresh Venkatasubramanian, Sonam Choudhary, Evan P. Hamilton, Derek Roth

    Abstract: Computers are increasingly used to make decisions that have significant impact in people's lives. Often, these predictions can affect different population subgroups disproportionately. As a result, the issue of fairness has received much recent interest, and a number of fairness-enhanced classifiers and predictors have appeared in the literature. This paper seeks to study the following questions:…

    Submitted 12 February, 2018; originally announced February 2018.

  18. Persistent Homology Guided Force-Directed Graph Layouts

    Authors: Ashley Suh, Mustafa Hajij, Bei Wang, Carlos Scheidegger, Paul Rosen

    Abstract: Graphs are commonly used to encode relationships among entities, yet their abstractness makes them difficult to analyze. Node-link diagrams are popular for drawing graphs, and force-directed layouts provide a flexible method for node arrangements that use local relationships in an attempt to reveal the global shape of the graph. However, clutter and overlap of unrelated structures can lead to conf…

    Submitted 4 October, 2019; v1 submitted 15 December, 2017; originally announced December 2017.

    Journal ref: IEEE Transactions on Visualization and Computer Graphics, vol. 26, no. 1, pp. 697-707, Jan. 2020

  19. DimReader: Axis lines that explain non-linear projections

    Authors: Rebecca Faust, David Glickenstein, Carlos Scheidegger

    Abstract: Non-linear dimensionality reduction (NDR) methods such as LLE and t-SNE are popular with visualization researchers and experienced data analysts, but present serious problems of interpretation. In this paper, we present DimReader, a technique that recovers readable axes from such techniques. DimReader is based on analyzing infinitesimal perturbations of the dataset with respect to variables of int…

    Submitted 30 July, 2018; v1 submitted 3 October, 2017; originally announced October 2017.

    Comments: 13 Pages, 12 Figures

    Journal ref: IEEE Transactions on Visualization and Computer Graphics 25.1 (2018): 481-490

  20. arXiv:1707.06683  [pdf, other]

    cs.GR cs.CG

    Visual Detection of Structural Changes in Time-Varying Graphs Using Persistent Homology

    Authors: Mustafa Hajij, Bei Wang, Carlos Scheidegger, Paul Rosen

    Abstract: Topological data analysis is an emerging area in exploratory data analysis and data mining. Its main tool, persistent homology, has become a popular technique to study the structure of complex, high-dimensional data. In this paper, we propose a novel method using persistent homology to quantify structural changes in time-varying graphs. Specifically, we transform each instance of the time-varying…

    Submitted 2 October, 2017; v1 submitted 20 July, 2017; originally announced July 2017.

  21. arXiv:1706.09847  [pdf, other]

    cs.CY stat.ML

    Runaway Feedback Loops in Predictive Policing

    Authors: Danielle Ensign, Sorelle A. Friedler, Scott Neville, Carlos Scheidegger, Suresh Venkatasubramanian

    Abstract: Predictive policing systems are increasingly used to determine how to allocate police across a city in order to best prevent crime. Discovered crime data (e.g., arrest counts) are used to help update the model, and the process is repeated. Such systems have been empirically shown to be susceptible to runaway feedback loops, where police are repeatedly sent back to the same neighborhoods regardless…

    Submitted 21 December, 2017; v1 submitted 29 June, 2017; originally announced June 2017.

    Comments: Extended version accepted to the 1st Conference on Fairness, Accountability and Transparency, 2018. Adds further treatment of reported as well as discovered incidents

  22. arXiv:1609.07236  [pdf, other]

    cs.CY stat.ML

    On the (im)possibility of fairness

    Authors: Sorelle A. Friedler, Carlos Scheidegger, Suresh Venkatasubramanian

    Abstract: What does it mean for an algorithm to be fair? Different papers use different notions of algorithmic fairness, and although these appear internally consistent, they also seem mutually incompatible. We present a mathematical setting in which the distinctions in previous papers can be made formal. In addition to characterizing the spaces of inputs (the "observed" space) and outputs (the "decision" s…

    Submitted 23 September, 2016; originally announced September 2016.

  23. arXiv:1602.07043  [pdf, other]

    stat.ML cs.LG

    Auditing Black-box Models for Indirect Influence

    Authors: Philip Adler, Casey Falk, Sorelle A. Friedler, Gabriel Rybeck, Carlos Scheidegger, Brandon Smith, Suresh Venkatasubramanian

    Abstract: Data-trained predictive models see widespread use, but for the most part they are used as black boxes which output a prediction or score. It is therefore hard to acquire a deeper understanding of model behavior, and in particular how different features influence the model prediction. This is important when interpreting the behavior of complex models, or asserting that certain problematic attribute…

    Submitted 30 November, 2016; v1 submitted 22 February, 2016; originally announced February 2016.

    Comments: Final version of paper that appears in the IEEE International Conference on Data Mining (ICDM), 2016

  24. arXiv:1503.00582  [pdf, other]

    cs.HC

    Towards Understanding Enjoyment and Flow in Information Visualization

    Authors: Bahador Saket, Carlos Scheidegger, Stephen Kobourov

    Abstract: Traditionally, evaluation studies in information visualization have measured effectiveness by assessing performance time and accuracy. More recently, there has been a concerted effort to understand aspects beyond time and errors. In this paper we study enjoyment, which, while arguably not the primary goal of visualization, has been shown to impact performance and memorability. Different models of…

    Submitted 2 March, 2015; originally announced March 2015.

  25. arXiv:1412.3756  [pdf, other]

    stat.ML cs.CY

    Certifying and removing disparate impact

    Authors: Michael Feldman, Sorelle Friedler, John Moeller, Carlos Scheidegger, Suresh Venkatasubramanian

    Abstract: What does it mean for an algorithm to be biased? In U.S. law, unintentional bias is encoded via disparate impact, which occurs when a selection process has widely different outcomes for different groups, even as it appears to be neutral. This legal determination hinges on a definition of a protected class (ethnicity, gender, religious practice) and an explicit description of the process. When th…

    Submitted 15 July, 2015; v1 submitted 11 December, 2014; originally announced December 2014.

    Comments: Extended version of paper accepted at 2015 ACM SIGKDD Conference on Knowledge Discovery and Data Mining

  26. arXiv:1208.5801  [pdf, other]

    cs.LG

    Vector Field k-Means: Clustering Trajectories by Fitting Multiple Vector Fields

    Authors: Nivan Ferreira, James T. Klosowski, Carlos Scheidegger, Claudio Silva

    Abstract: Scientists study trajectory data to understand trends in movement patterns, such as human mobility for traffic analysis and urban planning. There is a pressing need for scalable and efficient techniques for analyzing this data and discovering the underlying patterns. In this paper, we introduce a novel technique which we call vector-field $k$-means. The central idea of our approach is to use vec…

    Submitted 31 August, 2012; v1 submitted 28 August, 2012; originally announced August 2012.

    Comments: 30 pages, 15 figures