Showing 1–21 of 21 results for author: Thain, N

  1. arXiv:2503.03654

    cs.CL cs.AI cs.LG

    Improving Neutral Point of View Text Generation through Parameter-Efficient Reinforcement Learning and a Small-Scale High-Quality Dataset

    Authors: Jessica Hoffmann, Christiane Ahlheim, Zac Yu, Aria Walfrand, Jarvis Jin, Marie Tano, Ahmad Beirami, Erin van Liemt, Nithum Thain, Hakim Sidahmed, Lucas Dixon

    Abstract: This paper describes the construction of a dataset and the evaluation of training methods to improve generative large language models' (LLMs) ability to answer queries on sensitive topics with a Neutral Point of View (NPOV), i.e., to provide significantly more informative, diverse and impartial answers. The dataset, the SHQ-NPOV dataset, comprises 300 high-quality, human-written quadruplets: a que…

    Submitted 5 March, 2025; originally announced March 2025.

  2. arXiv:2403.08904

    cs.CL

    Detecting Hallucination and Coverage Errors in Retrieval Augmented Generation for Controversial Topics

    Authors: Tyler A. Chang, Katrin Tomanek, Jessica Hoffmann, Nithum Thain, Erin van Liemt, Kathleen Meier-Hellstern, Lucas Dixon

    Abstract: We explore a strategy to handle controversial topics in LLM-based chatbots based on Wikipedia's Neutral Point of View (NPOV) principle: acknowledge the absence of a single true answer and surface multiple perspectives. We frame this as retrieval augmented generation, where perspectives are retrieved from a knowledge base and the LLM is tasked with generating a fluent and faithful response from the…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: Accepted at LREC-COLING 2024

  3. arXiv:2403.08295

    cs.CL cs.AI

    Gemma: Open Models Based on Gemini Research and Technology

    Authors: Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, et al. (83 additional authors not shown)

    Abstract: This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Ge…

    Submitted 16 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  4. arXiv:2403.04894

    cs.CL cs.AI

    ConstitutionalExperts: Training a Mixture of Principle-based Prompts

    Authors: Savvas Petridis, Ben Wedin, Ann Yuan, James Wexler, Nithum Thain

    Abstract: Large language models (LLMs) are highly capable at a variety of tasks given the right prompt, but writing one is still a difficult and tedious process. In this work, we introduce ConstitutionalExperts, a method for learning a prompt consisting of constitutional principles (i.e. rules), given a training dataset. Unlike prior methods that optimize the prompt as a single entity, our method incrementa…

    Submitted 7 March, 2024; originally announced March 2024.

  5. arXiv:2305.13535

    cs.CL cs.LG

    Improving Classifier Robustness through Active Generation of Pairwise Counterfactuals

    Authors: Ananth Balashankar, Xuezhi Wang, Yao Qin, Ben Packer, Nithum Thain, Jilin Chen, Ed H. Chi, Alex Beutel

    Abstract: Counterfactual Data Augmentation (CDA) is a commonly used technique for improving robustness in natural language classifiers. However, one fundamental challenge is how to discover meaningful counterfactuals and efficiently label them, with minimal human labeling cost. Most existing methods either completely rely on human-annotated labels, an expensive process which limits the scale of counterfactu…

    Submitted 22 May, 2023; originally announced May 2023.

  6. arXiv:2302.06598

    cs.CL

    Gradient-Based Automated Iterative Recovery for Parameter-Efficient Tuning

    Authors: Maximilian Mozes, Tolga Bolukbasi, Ann Yuan, Frederick Liu, Nithum Thain, Lucas Dixon

    Abstract: Pretrained large language models (LLMs) are able to solve a wide variety of tasks through transfer learning. Various explainability methods have been developed to investigate their decision making process. TracIn (Pruthi et al., 2020) is one such gradient-based method which explains model inferences based on the influence of training examples. In this paper, we explore the use of TracIn to improve…

    Submitted 13 February, 2023; originally announced February 2023.

    Comments: Pre-print

  7. arXiv:2302.06541

    cs.CL

    Towards Agile Text Classifiers for Everyone

    Authors: Maximilian Mozes, Jessica Hoffmann, Katrin Tomanek, Muhamed Kouate, Nithum Thain, Ann Yuan, Tolga Bolukbasi, Lucas Dixon

    Abstract: Text-based safety classifiers are widely used for content moderation and increasingly to tune generative language model behavior - a topic of growing concern for the safety of digital assistants and chatbots. However, different policies require different classifiers, and safety policies themselves improve from iteration and adaptation. This paper introduces and evaluates methods for agile text cla…

    Submitted 21 October, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

    Comments: Findings of EMNLP 2023

  8. arXiv:2207.07411

    cs.LG stat.ML

    Plex: Towards Reliability using Pretrained Large Model Extensions

    Authors: Dustin Tran, Jeremiah Liu, Michael W. Dusenberry, Du Phan, Mark Collier, Jie Ren, Kehang Han, Zi Wang, Zelda Mariet, Huiyi Hu, Neil Band, Tim G. J. Rudner, Karan Singhal, Zachary Nado, Joost van Amersfoort, Andreas Kirsch, Rodolphe Jenatton, Nithum Thain, Honglin Yuan, Kelly Buchanan, Kevin Murphy, D. Sculley, Yarin Gal, Zoubin Ghahramani, Jasper Snoek, et al. (1 additional author not shown)

    Abstract: A recent trend in artificial intelligence is the use of pretrained models for language and vision tasks, which have achieved extraordinary performance but also puzzling failures. Probing these models' abilities in diverse ways is therefore critical to the field. In this paper, we explore the reliability of models, where we define a reliable model as one that not only achieves strong predictive per…

    Submitted 15 July, 2022; originally announced July 2022.

    Comments: Code available at https://goo.gle/plex-code

  9. arXiv:2101.04526

    cs.LG cs.CY cs.IR

    Measuring Recommender System Effects with Simulated Users

    Authors: Sirui Yao, Yoni Halpern, Nithum Thain, Xuezhi Wang, Kang Lee, Flavien Prost, Ed H. Chi, Jilin Chen, Alex Beutel

    Abstract: Imagine a food recommender system -- how would we check if it is causing and fostering unhealthy eating habits or merely reflecting users' interests? How much of a user's experience over time with a recommender is caused by the recommender system's choices and biases, and how much is based on the user's preferences and biases? Popularity bias and filter bubbles are two of the most well-stud…

    Submitted 12 January, 2021; originally announced January 2021.

    Comments: Presented at Second Workshop on Fairness, Accountability, Transparency, Ethics and Society on the Web (FATES 2020) with the title "Beyond Next Step Bias: Trajectory Simulation for Understanding Recommender System Behavior"

  10. arXiv:2010.07410

    cs.CL cs.SI

    Six Attributes of Unhealthy Conversation

    Authors: Ilan Price, Jordan Gifford-Moore, Jory Fleming, Saul Musker, Maayan Roichman, Guillaume Sylvain, Nithum Thain, Lucas Dixon, Jeffrey Sorensen

    Abstract: We present a new dataset of approximately 44000 comments labeled by crowdworkers. Each comment is labelled as either 'healthy' or 'unhealthy', in addition to binary labels for the presence of six potentially 'unhealthy' sub-attributes: (1) hostile; (2) antagonistic, insulting, provocative or trolling; (3) dismissive; (4) condescending or patronising; (5) sarcastic; and/or (6) an unfair generalisat…

    Submitted 14 October, 2020; originally announced October 2020.

    Comments: Appearing in the 4th Workshop on Online Abuse and Harms (2020)

  11. arXiv:2006.13114

    cs.LG stat.ML

    Fairness without Demographics through Adversarially Reweighted Learning

    Authors: Preethi Lahoti, Alex Beutel, Jilin Chen, Kang Lee, Flavien Prost, Nithum Thain, Xuezhi Wang, Ed H. Chi

    Abstract: Much of the previous machine learning (ML) fairness literature assumes that protected features such as race and sex are present in the dataset, and relies upon them to mitigate fairness concerns. However, in practice factors like privacy and regulation often preclude the collection of protected features, or their use for training or inference, severely limiting the applicability of traditional fai…

    Submitted 3 November, 2020; v1 submitted 23 June, 2020; originally announced June 2020.

    Comments: To appear at 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada

  12. arXiv:2006.00998

    cs.CL

    Toxicity Detection: Does Context Really Matter?

    Authors: John Pavlopoulos, Jeffrey Sorensen, Lucas Dixon, Nithum Thain, Ion Androutsopoulos

    Abstract: Moderation is crucial to promoting healthy on-line discussions. Although several 'toxicity' detection datasets and models have been published, most of them ignore the context of the posts, implicitly assuming that comments may be judged independently. We investigate this assumption by focusing on two questions: (a) does context affect the human judgement, and (b) does conditioning on context improv…

    Submitted 1 June, 2020; originally announced June 2020.

  13. arXiv:2004.05476

    cs.CL cs.CY cs.IR cs.LG

    Classifying Constructive Comments

    Authors: Varada Kolhatkar, Nithum Thain, Jeffrey Sorensen, Lucas Dixon, Maite Taboada

    Abstract: We introduce the Constructive Comments Corpus (C3), comprised of 12,000 annotated news comments, intended to help build new tools for online communities to improve the quality of their discussions. We define constructive comments as high-quality comments that make a contribution to the conversation. We explain the crowd worker annotation scheme and define a taxonomy of sub-characteristics of const…

    Submitted 4 August, 2020; v1 submitted 11 April, 2020; originally announced April 2020.

  14. arXiv:1911.01916

    cs.LG stat.ML

    Practical Compositional Fairness: Understanding Fairness in Multi-Component Recommender Systems

    Authors: Xuezhi Wang, Nithum Thain, Anu Sinha, Flavien Prost, Ed H. Chi, Jilin Chen, Alex Beutel

    Abstract: How can we build recommender systems to take into account fairness? Real-world recommender systems are often composed of multiple models, built by multiple teams. However, most research on fairness focuses on improving fairness in a single model. Further, recent research on classification fairness has shown that combining multiple "fair" classifiers can still result in an "unfair" classification s…

    Submitted 25 January, 2021; v1 submitted 5 November, 2019; originally announced November 2019.

    Comments: WSDM 2021

  15. arXiv:1908.02810

    cs.LG cs.CL stat.ML

    Debiasing Embeddings for Reduced Gender Bias in Text Classification

    Authors: Flavien Prost, Nithum Thain, Tolga Bolukbasi

    Abstract: Bolukbasi et al. (2016) demonstrated that pretrained word embeddings can inherit gender bias from the data they were trained on. We investigate how this bias affects downstream classification tasks, using the case study of occupation classification (De-Arteaga et al., 2019). We show that traditional techniques for debiasing embeddings can actually worsen the bias of the downstream classifier by pr…

    Submitted 7 August, 2019; originally announced August 2019.

  16. arXiv:1903.04561

    cs.LG cs.CL stat.ML

    Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification

    Authors: Daniel Borkan, Lucas Dixon, Jeffrey Sorensen, Nithum Thain, Lucy Vasserman

    Abstract: Unintended bias in Machine Learning can manifest as systemic differences in performance for different demographic groups, potentially compounding existing challenges to fairness in society at large. In this paper, we introduce a suite of threshold-agnostic metrics that provide a nuanced view of this unintended bias, by considering the various ways that a classifier's score distribution can vary ac…

    Submitted 8 May, 2019; v1 submitted 11 March, 2019; originally announced March 2019.

    Comments: Updated to fix typo in Equation 4

  17. arXiv:1903.02088

    stat.ML cs.LG

    Limitations of Pinned AUC for Measuring Unintended Bias

    Authors: Daniel Borkan, Lucas Dixon, John Li, Jeffrey Sorensen, Nithum Thain, Lucy Vasserman

    Abstract: This report examines the Pinned AUC metric introduced and highlights some of its limitations. Pinned AUC provides a threshold-agnostic measure of unintended bias in a classification model, inspired by the ROC-AUC metric. However, as we highlight in this report, there are ways that the metric can obscure different kinds of unintended biases when the underlying class distributions on which bias is b…

    Submitted 5 March, 2019; originally announced March 2019.

  18. arXiv:1810.13181

    cs.CL

    WikiConv: A Corpus of the Complete Conversational History of a Large Online Collaborative Community

    Authors: Yiqing Hua, Cristian Danescu-Niculescu-Mizil, Dario Taraborelli, Nithum Thain, Jeffery Sorensen, Lucas Dixon

    Abstract: We present a corpus that encompasses the complete history of conversations between contributors to Wikipedia, one of the largest online collaborative communities. By recording the intermediate states of conversations (including not only comments and replies, but also their modifications, deletions and restorations), this data offers an unprecedented view of online conversation. This level of deta…

    Submitted 31 October, 2018; originally announced October 2018.

    Journal ref: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

  19. arXiv:1805.05345

    cs.CL cs.AI cs.CY cs.HC physics.soc-ph

    Conversations Gone Awry: Detecting Early Signs of Conversational Failure

    Authors: Justine Zhang, Jonathan P. Chang, Cristian Danescu-Niculescu-Mizil, Lucas Dixon, Yiqing Hua, Nithum Thain, Dario Taraborelli

    Abstract: One of the main challenges online social systems face is the prevalence of antisocial behavior, such as harassment and personal attacks. In this work, we introduce the task of predicting from the very start of a conversation whether it will get out of hand. As opposed to detecting undesirable behavior after the fact, this task aims to enable early, actionable prediction at a time when the conversa…

    Submitted 14 May, 2018; originally announced May 2018.

    Comments: To appear in the Proceedings of ACL 2018, 15 pages, 1 figure. Data, quiz, code and additional information at http://www.cs.cornell.edu/~cristian/Conversations_gone_awry.html

  20. arXiv:1610.08914

    cs.CL

    Ex Machina: Personal Attacks Seen at Scale

    Authors: Ellery Wulczyn, Nithum Thain, Lucas Dixon

    Abstract: The damage personal attacks cause to online discourse motivates many platforms to try to curb the phenomenon. However, understanding the prevalence and impact of personal attacks in online platforms at scale remains surprisingly difficult. The contribution of this paper is to develop and illustrate a method that combines crowdsourcing and machine learning to analyze personal attacks at scale. We s…

    Submitted 25 February, 2017; v1 submitted 27 October, 2016; originally announced October 2016.

  21. arXiv:1202.4134

    cs.GT cs.DS

    On the Implications of Lookahead Search in Game Playing

    Authors: Vahab Mirrokni, Nithum Thain, Adrian Vetta

    Abstract: Lookahead search is perhaps the most natural and widely used game playing strategy. Given the practical importance of the method, the aim of this paper is to provide a theoretical performance examination of lookahead search in a wide variety of applications. To determine a strategy play using lookahead search, each agent predicts multiple levels of possible re-actions to her move (via the use o…

    Submitted 19 February, 2012; originally announced February 2012.