Showing 1–6 of 6 results for author: Chew, R

  1. arXiv:2503.21528

    stat.ML cs.CR cs.LG

    Bayesian Pseudo Posterior Mechanism for Differentially Private Machine Learning

    Authors: Robert Chew, Matthew R. Williams, Elan A. Segarra, Alexander J. Preiss, Amanda Konet, Terrance D. Savitsky

    Abstract: Differential privacy (DP) is becoming increasingly important for deployed machine learning applications because it provides strong guarantees for protecting the privacy of individuals whose data is used to train models. However, DP mechanisms commonly used in machine learning tend to struggle on many real world distributions, including highly imbalanced or small labeled training sets. In this work…

    Submitted 27 March, 2025; originally announced March 2025.

  2. arXiv:2502.02121

    cs.LG stat.ML

    BILBO: BILevel Bayesian Optimization

    Authors: Ruth Wan Theng Chew, Quoc Phong Nguyen, Bryan Kian Hsiang Low

    Abstract: Bilevel optimization is characterized by a two-level optimization structure, where the upper-level problem is constrained by optimal lower-level solutions, and such structures are prevalent in real-world problems. The constraint by optimal lower-level solutions poses significant challenges, especially in noisy, constrained, and derivative-free settings, as repeating lower-level optimizations is sa…

    Submitted 28 May, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

  3. arXiv:2501.06826

    stat.ME cs.CL

    Correcting Annotator Bias in Training Data: Population-Aligned Instance Replication (PAIR)

    Authors: Stephanie Eckman, Bolei Ma, Christoph Kern, Rob Chew, Barbara Plank, Frauke Kreuter

    Abstract: Models trained on crowdsourced labels may not reflect broader population views, because those who work as annotators do not represent the population. We propose Population-Aligned Instance Replication (PAIR), a method to address bias caused by non-representative annotator pools. Using a simulation study of offensive language and hate speech, we create two types of annotators with different labelin…

    Submitted 7 March, 2025; v1 submitted 12 January, 2025; originally announced January 2025.

  4. arXiv:2311.14212

    stat.ML cs.CL cs.LG stat.ME

    Annotation Sensitivity: Training Data Collection Methods Affect Model Performance

    Authors: Christoph Kern, Stephanie Eckman, Jacob Beck, Rob Chew, Bolei Ma, Frauke Kreuter

    Abstract: When training data are collected from human annotators, the design of the annotation instrument, the instructions given to annotators, the characteristics of the annotators, and their interactions can impact training data. This study demonstrates that design choices made when creating an annotation instrument also impact the models trained on the resulting annotations. We introduce the term annota…

    Submitted 22 January, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023 Findings: https://aclanthology.org/2023.findings-emnlp.992/

  5. arXiv:2306.14924

    cs.CL cs.AI cs.LG stat.AP

    LLM-Assisted Content Analysis: Using Large Language Models to Support Deductive Coding

    Authors: Robert Chew, John Bollenbacher, Michael Wenger, Jessica Speer, Annice Kim

    Abstract: Deductive coding is a widely used qualitative research method for determining the prevalence of themes across documents. While useful, deductive coding is often burdensome and time consuming since it requires researchers to read, interpret, and reliably categorize a large body of unstructured text documents. Large language models (LLMs), like ChatGPT, are a class of quickly evolving AI tools that…

    Submitted 23 June, 2023; originally announced June 2023.

  6. arXiv:1812.06591

    stat.ML cs.LG

    SMART: An Open Source Data Labeling Platform for Supervised Learning

    Authors: Rob Chew, Michael Wenger, Caroline Kery, Jason Nance, Keith Richards, Emily Hadley, Peter Baumgartner

    Abstract: SMART is an open source web application designed to help data scientists and research teams efficiently build labeled training data sets for supervised machine learning tasks. SMART provides users with an intuitive interface for creating labeled data sets, supports active learning to help reduce the required amount of labeled data, and incorporates inter-rater reliability statistics to provide ins…

    Submitted 11 December, 2018; originally announced December 2018.

    Comments: 5 pages, 1 figure

    Journal ref: The Journal of Machine Learning Research, 20(1), 2999-3003 (2019)