Showing 1–11 of 11 results for author: Fogliato, R

  1. arXiv:2502.17427  [pdf, other]

    stat.ME cs.LG math.ST stat.ML

    Stronger Neyman Regret Guarantees for Adaptive Experimental Design

    Authors: Georgy Noarov, Riccardo Fogliato, Martin Bertran, Aaron Roth

    Abstract: We study the design of adaptive, sequential experiments for unbiased average treatment effect (ATE) estimation in the design-based potential outcomes setting. Our goal is to develop adaptive designs offering sublinear Neyman regret, meaning their efficiency must approach that of the hindsight-optimal nonadaptive design. Recent work [Dai et al., 2023] introduced ClipOGD, the first method achieving…

    Submitted 24 February, 2025; originally announced February 2025.

  2. arXiv:2410.05222  [pdf, other]

    cs.LG cs.CL cs.CV stat.AP

    Precise Model Benchmarking with Only a Few Observations

    Authors: Riccardo Fogliato, Pratik Patil, Nil-Jana Akpinar, Mathew Monfort

    Abstract: How can we precisely estimate a large language model's (LLM) accuracy on questions belonging to a specific topic within a larger question-answering dataset? The standard direct estimator, which averages the model's accuracy on the questions in each subgroup, may exhibit high variance for subgroups (topics) with small sample sizes. Synthetic regression modeling, which leverages the model's accuracy…

    Submitted 7 October, 2024; originally announced October 2024.

    Comments: To appear at EMNLP 2024

  3. arXiv:2406.07320  [pdf, other]

    cs.CV stat.AP

    A Framework for Efficient Model Evaluation through Stratification, Sampling, and Estimation

    Authors: Riccardo Fogliato, Pratik Patil, Mathew Monfort, Pietro Perona

    Abstract: Model performance evaluation is a critical and expensive task in machine learning and computer vision. Without clear guidelines, practitioners often estimate model accuracy using a one-time completely random selection of the data. However, by employing tailored sampling and estimation strategies, one can obtain more precise estimates and reduce annotation costs. In this paper, we propose a statist…

    Submitted 18 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: To appear at ECCV 2024

  4. arXiv:2404.04689  [pdf, other]

    stat.ML cs.CL cs.LG

    Multicalibration for Confidence Scoring in LLMs

    Authors: Gianluca Detommaso, Martin Bertran, Riccardo Fogliato, Aaron Roth

    Abstract: This paper proposes the use of "multicalibration" to yield interpretable and reliable confidence scores for outputs generated by large language models (LLMs). Multicalibration asks for calibration not just marginally, but simultaneously across various intersecting groupings of the data. We show how to form groupings for prompt/completion pairs that are correlated with the probability of correctnes…

    Submitted 6 April, 2024; originally announced April 2024.

  5. arXiv:2310.07935  [pdf, other]

    stat.ME stat.AP

    Estimating the Likelihood of Arrest from Police Records in Presence of Unreported Crimes

    Authors: Riccardo Fogliato, Arun Kumar Kuchibhotla, Zachary Lipton, Daniel Nagin, Alice Xiang, Alexandra Chouldechova

    Abstract: Many important policy decisions concerning policing hinge on our understanding of how likely various criminal offenses are to result in arrests. Since many crimes are never reported to law enforcement, estimates based on police records alone must be adjusted to account for the likelihood that each crime would have been reported to the police. In this paper, we present a methodological framework fo…

    Submitted 11 October, 2023; originally announced October 2023.

  6. arXiv:2306.01198  [pdf, other]

    stat.ME cs.CV stat.ML

    Confidence Intervals for Error Rates in 1:1 Matching Tasks: Critical Statistical Analysis and Recommendations

    Authors: Riccardo Fogliato, Pratik Patil, Pietro Perona

    Abstract: Matching algorithms are commonly used to predict matches between items in a collection. For example, in 1:1 face verification, a matching algorithm predicts whether two face images depict the same person. Accurately assessing the uncertainty of the error rates of such algorithms can be challenging when data are dependent and error rates are low, two aspects that have been often overlooked in the l…

    Submitted 26 April, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

  7. Racial Disparities in the Enforcement of Marijuana Violations in the US

    Authors: Bradley Butcher, Chris Robinson, Miri Zilka, Riccardo Fogliato, Carolyn Ashurst, Adrian Weller

    Abstract: Racial disparities in US drug arrest rates have been observed for decades, but their causes and policy implications are still contested. Some have argued that the disparities largely reflect differences in drug use between racial groups, while others have hypothesized that discriminatory enforcement policies and police practices play a significant role. In this work, we analyze racial disparities…

    Submitted 1 June, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

    Comments: AAAI/ACM Conference on AI, Ethics, and Society 2022

  8. arXiv:2106.11188  [pdf, other]

    stat.ME stat.CO

    maars: Tidy Inference under the 'Models as Approximations' Framework in R

    Authors: Riccardo Fogliato, Shamindra Shrotriya, Arun Kumar Kuchibhotla

    Abstract: Linear regression using ordinary least squares (OLS) is a critical part of every statistician's toolkit. In R, this is elegantly implemented via lm() and its related functions. However, the statistical inference output from this suite of functions is based on the assumption that the model is well specified. This assumption is often unrealistic and at best satisfied approximately. In the statistics…

    Submitted 21 June, 2021; originally announced June 2021.

    Comments: The first two authors contributed equally to this work and are ordered alphabetically

  9. arXiv:2105.04953  [pdf, other]

    stat.AP

    On the Validity of Arrest as a Proxy for Offense: Race and the Likelihood of Arrest for Violent Crimes

    Authors: Riccardo Fogliato, Alice Xiang, Zachary Lipton, Daniel Nagin, Alexandra Chouldechova

    Abstract: The risk of re-offense is considered in decision-making at many stages of the criminal justice system, from pre-trial, to sentencing, to parole. To aid decision makers in their assessments, institutions increasingly rely on algorithmic risk assessment instruments (RAIs). These tools assess the likelihood that an individual will be arrested for a new criminal offense within some time window followi…

    Submitted 11 May, 2021; originally announced May 2021.

    Comments: Accepted at AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2021

  10. arXiv:2003.13808  [pdf, other]

    stat.ME cs.CY

    Fairness Evaluation in Presence of Biased Noisy Labels

    Authors: Riccardo Fogliato, Max G'Sell, Alexandra Chouldechova

    Abstract: Risk assessment tools are widely used around the country to inform decision making within the criminal justice system. Recently, considerable attention has been devoted to the question of whether such tools may suffer from racial bias. In this type of assessment, a fundamental issue is that the training and evaluation of the model is based on a variable (arrest) that may represent a noisy version…

    Submitted 30 March, 2020; originally announced March 2020.

    Comments: Accepted at International Conference on Artificial Intelligence and Statistics (AISTATS), 2020

  11. arXiv:2002.01328  [pdf, other]

    stat.AP

    TRAP: A Predictive Framework for Trail Running Assessment of Performance

    Authors: Riccardo Fogliato, Natalia L. Oliveira, Ronald Yurko

    Abstract: Trail running is an endurance sport in which athletes face severe physical challenges. Due to the growing number of participants, the organization of limited staff, equipment, and medical support in these races now plays a key role. Monitoring runner's performance is a difficult task that requires knowledge of the terrain and of the runner's ability. In the past, choices were solely based on the o…

    Submitted 12 July, 2020; v1 submitted 4 February, 2020; originally announced February 2020.