Showing 1–16 of 16 results for author: G'Sell, M

  1. arXiv:2405.00752

    cs.DL

    Clustering Running Titles to Understand the Printing of Early Modern Books

    Authors: Nikolai Vogler, Kartik Goyal, Samuel V. Lemley, D. J. Schuldt, Christopher N. Warren, Max G'Sell, Taylor Berg-Kirkpatrick

    Abstract: We propose a novel computational approach to automatically analyze the physical process behind printing of early modern letterpress books via clustering the running titles found at the top of their pages. Specifically, we design and compare custom neural and feature-based kernels for computing pairwise visual similarity of a scanned document's running titles and cluster the titles in order to trac…

    Submitted 22 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted at ICDAR 2024; updated Acknowledgments in v2

  2. arXiv:2306.07998

    cs.CV cs.AI

    Contrastive Attention Networks for Attribution of Early Modern Print

    Authors: Nikolai Vogler, Kartik Goyal, Kishore PV Reddy, Elizaveta Pertseva, Samuel V. Lemley, Christopher N. Warren, Max G'Sell, Taylor Berg-Kirkpatrick

    Abstract: In this paper, we develop machine learning techniques to identify unknown printers in early modern (c. 1500–1800) English printed books. Specifically, we focus on matching uniquely damaged character type-imprints in anonymously printed books to works with known printers in order to provide evidence of their origins. Until now, this work has been limited to manual investigations by analytical bibl…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: Proceedings of AAAI 2023

  3. arXiv:2106.07623

    stat.AP stat.ME

    Inference with generalizable classifier predictions

    Authors: Ciaran Evans, Zara Y. Weinberg, Manojkumar A. Puthenveedu, Max G'Sell

    Abstract: This paper addresses the problem of making statistical inference about a population that can only be identified through classifier predictions. The problem is motivated by scientific studies in which human labels of a population are replaced by a classifier. For downstream analysis of the population based on classifier predictions to be sound, the predictions must generalize equally across experim…

    Submitted 14 June, 2021; originally announced June 2021.

    Comments: 26 pages, 9 figures

  4. arXiv:2009.08592

    stat.ME stat.ML

    Sequential changepoint detection in classification data under label shift

    Authors: Ciaran Evans, Max G'Sell

    Abstract: Classifier predictions often rely on the assumption that new observations come from the same distribution as training data. When the underlying distribution changes, so does the optimal classification rule, and performance may degrade. We consider the problem of detecting such a change in distribution in sequentially-observed, unlabeled classification data. We focus on label shift changes to the d…

    Submitted 31 August, 2021; v1 submitted 17 September, 2020; originally announced September 2020.

    Comments: 25 pages, 3 figures, 4 tables

  5. arXiv:2005.01646

    cs.LG cs.CL

    A Probabilistic Generative Model for Typographical Analysis of Early Modern Printing

    Authors: Kartik Goyal, Chris Dyer, Christopher Warren, Max G'Sell, Taylor Berg-Kirkpatrick

    Abstract: We propose a deep and interpretable probabilistic generative model to analyze glyph shapes in printed Early Modern documents. We focus on clustering extracted glyph images into underlying templates in the presence of multiple confounding sources of variance. Our approach introduces a neural editor model that first generates well-understood printing phenomena like spatial perturbations from templat…

    Submitted 4 May, 2020; originally announced May 2020.

    Comments: To appear at ACL 2020

  6. arXiv:2003.13808

    stat.ME cs.CY

    Fairness Evaluation in Presence of Biased Noisy Labels

    Authors: Riccardo Fogliato, Max G'Sell, Alexandra Chouldechova

    Abstract: Risk assessment tools are widely used around the country to inform decision making within the criminal justice system. Recently, considerable attention has been devoted to the question of whether such tools may suffer from racial bias. In this type of assessment, a fundamental issue is that the training and evaluation of the model are based on a variable (arrest) that may represent a noisy version…

    Submitted 30 March, 2020; originally announced March 2020.

    Comments: Accepted at International Conference on Artificial Intelligence and Statistics (AISTATS), 2020

  7. arXiv:1812.03644

    stat.ME

    Post-Selection Inference for Changepoint Detection Algorithms with Application to Copy Number Variation Data

    Authors: Sangwon Hyun, Kevin Lin, Max G'Sell, Ryan J. Tibshirani

    Abstract: Changepoint detection methods are used in many areas of science and engineering, e.g., in the analysis of copy number variation data, to detect abnormalities in copy numbers along the genome. Despite the broad array of available tools, methodology for quantifying our uncertainty in the strength (or presence) of given changepoints, post-detection, is lacking. Post-selection inference offers a fram…

    Submitted 10 December, 2018; originally announced December 2018.

  8. arXiv:1801.03635

    stat.ME

    Sharp instruments for classifying compliers and generalizing causal effects

    Authors: Edward H. Kennedy, Sivaraman Balakrishnan, Max G'Sell

    Abstract: It is well-known that, without restricting treatment effect heterogeneity, instrumental variable (IV) methods only identify "local" effects among compliers, i.e., those subjects who take treatment only when encouraged by the IV. Local effects are controversial since they seem to only apply to an unidentified subgroup; this has led many to denounce these effects as having little policy relevance. H…

    Submitted 30 May, 2019; v1 submitted 11 January, 2018; originally announced January 2018.

  9. arXiv:1707.00046

    stat.AP cs.CY stat.ML

    Fairer and more accurate, but for whom?

    Authors: Alexandra Chouldechova, Max G'Sell

    Abstract: Complex statistical machine learning models are increasingly being used or considered for use in high-stakes decision-making pipelines in domains such as financial services, health care, criminal justice and human services. These models are often investigated as possible improvements over more classical tools such as regression models or human judgement. While the modeling approach may be new, the…

    Submitted 30 June, 2017; originally announced July 2017.

    Comments: Presented as a poster at the 2017 Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML 2017)

  10. arXiv:1611.05401

    math.ST

    Bootstrapping and Sample Splitting For High-Dimensional, Assumption-Free Inference

    Authors: Alessandro Rinaldo, Larry Wasserman, Max G'Sell, Jing Lei

    Abstract: Several new methods have been proposed for performing valid inference after model selection. An older method is sample splitting: use part of the data for model selection and part for inference. In this paper we revisit sample splitting combined with the bootstrap (or the Normal approximation). We show that this leads to a simple, assumption-free approach to inference and we establish results on…

    Submitted 2 April, 2018; v1 submitted 16 November, 2016; originally announced November 2016.

    MSC Class: 62G05

  11. arXiv:1606.03552

    stat.ME

    Exact Post-Selection Inference for Changepoint Detection and Other Generalized Lasso Problems

    Authors: Sangwon Hyun, Max G'Sell, Ryan J. Tibshirani

    Abstract: We study tools for inference conditioned on model selection events that are defined by the generalized lasso regularization path. The generalized lasso estimate is given by the solution of a penalized least squares regression problem, where the penalty is the l1 norm of a matrix D times the coefficient vector. The generalized lasso path collects these estimates for a range of penalty parameter (λ)…

    Submitted 11 June, 2016; originally announced June 2016.

  12. arXiv:1604.04173

    stat.ME math.ST stat.ML

    Distribution-Free Predictive Inference For Regression

    Authors: Jing Lei, Max G'Sell, Alessandro Rinaldo, Ryan J. Tibshirani, Larry Wasserman

    Abstract: We develop a general framework for distribution-free predictive inference in regression, using conformal inference. The proposed methodology allows for the construction of a prediction band for the response variable using any estimator of the regression function. The resulting prediction band preserves the consistency properties of the original estimator under standard assumptions, while guarantee…

    Submitted 8 March, 2017; v1 submitted 14 April, 2016; originally announced April 2016.

    Comments: 50 pages, 7 figures, 3 tables

  13. arXiv:1309.5352

    math.ST stat.ME

    Sequential Selection Procedures and False Discovery Rate Control

    Authors: Max Grazier G'Sell, Stefan Wager, Alexandra Chouldechova, Robert Tibshirani

    Abstract: We consider a multiple hypothesis testing setting where the hypotheses are ordered and one is only permitted to reject an initial contiguous block, $H_1,\dots,H_k$, of hypotheses. A rejection rule in this setting amounts to a procedure for choosing the stopping point $k$. This setting is inspired by the sequential nature of many model selection problems, where choosing a stopping point or a model is e…

    Submitted 23 March, 2015; v1 submitted 20 September, 2013; originally announced September 2013.

    Comments: 31 pages, 14 figures. Accepted to the Journal of the Royal Statistical Society: Series B

  14. arXiv:1308.2329

    stat.ME

    Sensitivity Analysis for Inference with Partially Identifiable Covariance Matrices

    Authors: Max Grazier G'Sell, Shai S. Shen-Orr, Robert Tibshirani

    Abstract: In some multivariate problems with missing data, pairs of variables exist that are never observed together. For example, some modern biological tools can produce data of this form. As a result of this structure, the covariance matrix is only partially identifiable, and point estimation requires that identifying assumptions be made. These assumptions can introduce an unknown and potentially large b…

    Submitted 10 August, 2013; originally announced August 2013.

    Comments: 19 pages, 8 figures. Submitted to Computational Statistics

  15. arXiv:1307.4765

    math.ST stat.ME

    Adaptive testing for the graphical lasso

    Authors: Max Grazier G'Sell, Jonathan Taylor, Robert Tibshirani

    Abstract: We consider tests of significance in the setting of the graphical lasso for inverse covariance matrix estimation. We propose a simple test statistic based on a subsequence of the knots in the graphical lasso path. We show that this statistic has an exponential asymptotic null distribution, under the null hypothesis that the model contains the true connected components. Though the null distributi…

    Submitted 22 July, 2013; v1 submitted 17 July, 2013; originally announced July 2013.

    Comments: 33 pages, 8 figures. Submitted to Annals of Statistics

    MSC Class: 62F12; 62H15

  16. arXiv:1302.2303

    stat.ME

    False Variable Selection Rates in Regression

    Authors: Max Grazier G'Sell, Trevor Hastie, Robert Tibshirani

    Abstract: There has been recent interest in extending the ideas of False Discovery Rates (FDR) to variable selection in regression settings. Traditionally the FDR in these settings has been defined in terms of the coefficients of the full regression model. Recent papers have struggled with controlling this quantity when the predictors are correlated. This paper shows that this full model definition of FDR s…

    Submitted 10 February, 2013; originally announced February 2013.

    Comments: 14 figures, 21 pages. Submitted to Annals of Applied Statistics