Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation

Authors: Adam Fisch, Joshua Maynez, R. Alex Hofer, Bhuwan Dhingra, Amir Globerson, William W. Cohen

Abstract: Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data. PPI achieves this by combining small amounts of human-labeled data with larger amounts of data labeled by a reasonably accurate -- but potentially biased -- automatic system, in a way that results in tighter confidence intervals for certain parameters of interest (e.g., the mean performance of a language model). In this paper, we propose a method called Stratified Prediction-Powered Inference (StratPPI), in which we show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies. Without making any assumptions on the underlying automatic labeling system or data distribution, we derive an algorithm for computing provably valid confidence intervals for population parameters (such as averages) that is based on stratified sampling. In particular, we show both theoretically and empirically that, with appropriate choices of stratification and sample allocation, our approach can provide substantially tighter confidence intervals than unstratified approaches. Specifically, StratPPI is expected to improve in cases where the performance of the autorater varies across different conditional distributions of the target data.

Submitted 3 December, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

arXiv:2004.03658 [pdf, other]

Faithful Embeddings for Knowledge Base Queries

Authors: Haitian Sun, Andrew O. Arnold, Tania Bedrax-Weiss, Fernando Pereira, William W. Cohen

Abstract: The deductive closure of an ideal knowledge base (KB) contains exactly the logical queries that the KB can answer. However, in practice KBs are both incomplete and over-specified, failing to answer some queries that have real-world answers. Query embedding (QE) techniques have been recently proposed where KB entities and KB queries are represented jointly in an embedding space, supporting relaxation and generalization in KB inference. However, experiments in this paper show that QE systems may disagree with deductive reasoning on answers that do not require generalization or relaxation. We address this problem with a novel QE method that is more faithful to deductive reasoning, and show that this leads to better performance on complex queries to incomplete KBs. Finally we show that inserting this new QE module into a neural question-answering system leads to substantial improvements over the state-of-the-art.

Submitted 28 January, 2021; v1 submitted 7 April, 2020; originally announced April 2020.

Comments: Published at 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada

arXiv:2002.06115 [pdf, other]

Scalable Neural Methods for Reasoning With a Symbolic Knowledge Base

Authors: William W. Cohen, Haitian Sun, R. Alex Hofer, Matthew Siegler

Abstract: We describe a novel way of representing a symbolic knowledge base (KB) called a sparse-matrix reified KB. This representation enables neural modules that are fully differentiable, faithful to the original semantics of the KB, expressive enough to model multi-hop inferences, and scalable enough to use with realistically large KBs. The sparse-matrix reified KB can be distributed across multiple GPUs, can scale to tens of millions of entities and facts, and is orders of magnitude faster than naive sparse-matrix implementations. The reified KB enables very simple end-to-end architectures to obtain competitive performance on several benchmarks representing two families of tasks: KB completion, and learning semantic parsers from denotations.

Submitted 14 February, 2020; originally announced February 2020.

Comments: Also published in ICLR 2020: https://openreview.net/forum?id=BJlguT4YPr&noteId=BJlguT4YPr

arXiv:1912.06074 [pdf, other]

Game Design for Eliciting Distinguishable Behavior

Authors: Fan Yang, Liu Leqi, Yifan Wu, Zachary C. Lipton, Pradeep Ravikumar, William W. Cohen, Tom Mitchell

Abstract: The ability to infer latent psychological traits from human behavior is key to developing personalized human-interacting machine learning systems. Approaches to infer such traits range from surveys to manually-constructed experiments and games. However, these traditional games are limited because they are typically designed based on heuristics. In this paper, we formulate the task of designing behavior diagnostic games that elicit distinguishable behavior as a mutual information maximization problem, which can be solved by optimizing a variational lower bound. Our framework is instantiated by using prospect theory to model varying player traits, and Markov Decision Processes to parameterize the games. We validate our approach empirically, showing that our designed games can successfully distinguish among players with different traits, outperforming manually-designed ones by a large margin.

Submitted 12 December, 2019; originally announced December 2019.

Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019)

arXiv:1911.06111 [pdf, other]

Instance-based Transfer Learning for Multilingual Deep Retrieval

Authors: Andrew O. Arnold, William W. Cohen

Abstract: We focus on the problem of search in the multilingual setting. Examining the problems of next-sentence prediction and inverse cloze, we show that at large scale, instance-based transfer learning is surprisingly effective in the multilingual setting, leading to positive transfer on all of the 35 target languages and two tasks tested. We analyze this improvement and argue that the most natural explanation, namely direct vocabulary overlap between languages, only partially explains the performance gains: in fact, we demonstrate target-language improvement can occur after adding data from an auxiliary language even with no vocabulary in common with the target. This surprising result is due to the effect of transitive vocabulary overlaps between pairs of auxiliary and target languages.

Submitted 15 April, 2021; v1 submitted 8 November, 2019; originally announced November 2019.

Journal ref: The Web Conference Workshop on Multilingual Search, 2021

arXiv:1905.10417 [pdf, other]

Differentiable Representations For Multihop Inference Rules

Authors: William W. Cohen, Haitian Sun, R. Alex Hofer, Matthew Siegler

Abstract: We present efficient differentiable implementations of second-order multi-hop reasoning using a large symbolic knowledge base (KB). We introduce a new operation which can be used to compositionally construct second-order multi-hop templates in a neural model, and evaluate a number of alternative implementations, with different time and memory trade-offs. These techniques scale to KBs with millions of entities and tens of millions of triples, and lead to simple models with competitive performance on several learning tasks requiring multi-hop reasoning.

Submitted 24 May, 2019; originally announced May 2019.

arXiv:1806.05662 [pdf, other]

GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations

Authors: Zhilin Yang, Jake Zhao, Bhuwan Dhingra, Kaiming He, William W. Cohen, Ruslan Salakhutdinov, Yann LeCun

Abstract: Modern deep transfer learning approaches have mainly focused on learning generic feature vectors from one task that are transferable to other tasks, such as word embeddings in language and pretrained convolutional features in vision. However, these approaches usually transfer unary features and largely ignore more structured graphical representations. This work explores the possibility of learning generic latent relational graphs that capture dependencies between pairs of data units (e.g., words or pixels) from large-scale unlabeled data and transferring the graphs to downstream tasks. Our proposed transfer learning framework improves performance on various tasks including question answering, natural language inference, sentiment analysis, and image classification. We also show that the learned graphs are generic enough to be transferred to different embeddings on which the graphs have not been trained (including GloVe embeddings, ELMo embeddings, and task-specific RNN hidden units), or embedding-free units such as image pixels.

Submitted 2 July, 2018; v1 submitted 14 June, 2018; originally announced June 2018.

arXiv:1804.09238 [pdf, other]

Semi-Supervised Learning with Declaratively Specified Entropy Constraints

Authors: Haitian Sun, William W. Cohen, Lidong Bing

Abstract: We propose a technique for declaratively specifying strategies for semi-supervised learning (SSL). The proposed method can be used to specify ensembles of semi-supervised learners, as well as agreement constraints and entropic regularization constraints between these learners, and can be used to model both well-known heuristics such as co-training and novel domain-specific heuristics. In addition to representing individual SSL heuristics, we show that multiple heuristics can also be automatically combined using Bayesian optimization methods. We show consistent improvements on a suite of well-studied SSL benchmarks, including a new state-of-the-art result on a difficult relation extraction task.

Submitted 18 May, 2018; v1 submitted 24 April, 2018; originally announced April 2018.

arXiv:1703.01557 [pdf, other]

Using Graphs of Classifiers to Impose Declarative Constraints on Semi-supervised Learning

Authors: Lidong Bing, William W. Cohen, Bhuwan Dhingra

Abstract: We propose a general approach to modeling semi-supervised learning (SSL) algorithms. Specifically, we present a declarative language for modeling both traditional supervised classification tasks and many SSL heuristics, including both well-known heuristics such as co-training and novel domain-specific heuristics. In addition to representing individual SSL heuristics, we show that multiple heuristics can be automatically combined using Bayesian optimization methods. We experiment with two classes of tasks, link-based text classification and relation extraction. We show modest improvements on well-studied link-based classification benchmarks, and state-of-the-art results on relation-extraction tasks for two realistic domains.

Submitted 23 March, 2017; v1 submitted 4 March, 2017; originally announced March 2017.

Comments: 8 pages, 3 figures

arXiv:1602.04393 [pdf, other]

Semantic Scan: Detecting Subtle, Spatially Localized Events in Text Streams

Authors: Abhinav Maurya, Kenton Murray, Yandong Liu, Chris Dyer, William W. Cohen, Daniel B. Neill

Abstract: Early detection and precise characterization of emerging topics in text streams can be highly useful in applications such as timely and targeted public health interventions and discovering evolving regional business trends. Many methods have been proposed for detecting emerging events in text streams using topic modeling. However, these methods have numerous shortcomings that make them unsuitable for rapid detection of locally emerging events on massive text streams. In this paper, we describe Semantic Scan (SS), which has been developed specifically to overcome these shortcomings in detecting new spatially compact events in text streams. Semantic Scan integrates novel contrastive topic modeling with online document assignment and principled likelihood ratio-based spatial scanning to identify emerging events with unexpected patterns of keywords hidden in text streams. This enables more timely and accurate detection and characterization of anomalous, spatially localized emerging events. Semantic Scan does not require manual intervention or labeled training data, and is robust to noise in real-world text data since it identifies anomalous text patterns that occur in a cluster of new documents rather than an anomaly in a single new document.
We compare Semantic Scan to alternative state-of-the-art methods such as Topics over Time, Online LDA, and Labeled LDA on two real-world tasks: (i) a disease surveillance task monitoring free-text Emergency Department chief complaints in Allegheny County, and (ii) an emerging business trend detection task based on Yelp reviews. On both tasks, we find that Semantic Scan provides significantly better event detection and characterization accuracy than competing approaches, while providing up to an order of magnitude speedup.

Submitted 13 February, 2016; originally announced February 2016.

Comments: 10 pages, 4 figures, KDD 2016 submission

Showing 1–10 of 10 results for author: Cohen, W W