-
Evolution of Symbiosis in the Game of Life: Three Characteristics of Successful Symbiotes
Authors:
Peter D. Turney
Abstract:
In past work, we developed a computational model of the evolution of symbiotic entities (Model-S), based on Conway's Game of Life. In this article, we examine three trends that biologists have observed in the evolution of symbiotes. (1) Management: If one partner is able to control the symbiotic relation, this control can reduce conflict; thus, evolutionary selection favours symbiotes that have a manager. (2) Mutualism: Although partners in a symbiote often have conflicting needs, evolutionary selection favours symbiotes in which partners are better off together inside the symbiote than they would be as individuals outside of the symbiote. (3) Interaction: Repeated interaction among partners in symbiosis tends to promote increasing fitness due to evolutionary selection. We have added new components to Model-S that allow us to observe these three trends in runs of Model-S. The new components are analogous to the practice of staining cells in biology research, to reveal patterns that are not usually visible. When we measure the fitness of a symbiote by making it compete with other symbiotes, we find that fitter symbiotes have significantly more management, mutualism, and interaction than less fit symbiotes. These results confirm the trends observed in nature by biologists. Model-S allows biologists to study these evolutionary trends and other characteristics of symbiosis in ways that are not tractable with living organisms.
Submitted 26 September, 2022; v1 submitted 2 April, 2021;
originally announced April 2021.
-
Measuring Behavioural Similarity of Cellular Automata
Authors:
Peter D. Turney
Abstract:
Conway's Game of Life is the best-known cellular automaton. It is a classic model of emergence and self-organization, it is Turing-complete, and it can simulate a universal constructor. The Game of Life belongs to the set of semi-totalistic cellular automata, a family with 262,144 members. Many of these automata may deserve as much attention as the Game of Life, if not more. The challenge we address here is to provide a structure for organizing this large family, to make it easier to find interesting automata, and to understand the relations between automata. Packard and Wolfram (1985) divided the family into four classes, based on the observed behaviours of the rules. Eppstein (2010) proposed an alternative four-class system, based on the forms of the rules. Instead of a class-based organization, we propose a continuous high-dimensional vector space, where each automaton is represented by a point in the space. The distance between two automata in this space corresponds to the differences in their behavioural characteristics. Nearest neighbours in the space have similar behaviours. This space should make it easier for researchers to see the structure of the family of semi-totalistic rules and to find the hidden gems in the family.
Submitted 17 December, 2020; v1 submitted 16 October, 2020;
originally announced October 2020.
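The abstract above describes placing each semi-totalistic automaton at a point in a continuous vector space, so that nearest neighbours have similar behaviour. The Python sketch below shows only the nearest-neighbour lookup step. It rests on assumptions that are not from the paper: the feature vectors are invented placeholders, and Euclidean distance (`math.dist`) stands in for whatever metric the paper actually uses.

```python
import math

# Hypothetical behavioural feature vectors, keyed by rule in B/S notation.
# The numbers are invented placeholders, not measurements from the paper.
BEHAVIOUR_SPACE = {
    "B3/S23": [0.90, 0.20, 0.50],   # Game of Life
    "B36/S23": [0.85, 0.25, 0.50],  # HighLife
    "B2/S": [0.10, 0.90, 0.00],     # Seeds
}


def nearest_neighbours(target, space, k=1):
    """Return the k rules whose vectors are closest to `target` (Euclidean)."""
    others = [rule for rule in space if rule != target]
    others.sort(key=lambda rule: math.dist(space[rule], space[target]))
    return others[:k]


print(nearest_neighbours("B3/S23", BEHAVIOUR_SPACE))  # ['B36/S23']
```

With real behavioural features, the same lookup would let a researcher start from a known rule and browse its neighbours.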
-
Evolution of Autopoiesis and Multicellularity in the Game of Life
Authors:
Peter D. Turney
Abstract:
Recently we introduced a model of symbiosis, Model-S, based on the evolution of seed patterns in Conway's Game of Life. In the model, the fitness of a seed pattern is measured by one-on-one competitions in the Immigration Game, a two-player variation of the Game of Life. Our previous article showed that Model-S can serve as a highly abstract, simplified model of biological life: (1) The initial seed pattern is analogous to a genome. (2) The changes as the game runs are analogous to the development of the phenome. (3) Tournament selection in Model-S is analogous to natural selection in biology. (4) The Immigration Game in Model-S is analogous to competition in biology. (5) The first three layers in Model-S are analogous to biological reproduction. (6) The fusion of seed patterns in Model-S is analogous to symbiosis. The current article takes this analogy two steps further: (7) Autopoietic structures in the Game of Life (still lifes, oscillators, and spaceships -- collectively known as ashes) are analogous to cells in biology. (8) The seed patterns in the Game of Life give rise to multiple, diverse, cooperating autopoietic structures, analogous to multicellular biological life. We use the apgsearch software (Ash Pattern Generator Search), developed by Adam Goucher for the study of ashes, to analyze autopoiesis and multicellularity in Model-S. We find that the fitness of evolved seed patterns in Model-S is highly correlated with the diversity and quantity of multicellular autopoietic structures.
Submitted 11 January, 2021; v1 submitted 23 September, 2020;
originally announced September 2020.
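The three kinds of autopoietic structure named above (still lifes, oscillators, and spaceships) can be told apart by running a pattern and checking when its shape recurs. The Python sketch below is a simplified, hypothetical classifier for a single isolated pattern under Life's B3/S23 rule on an unbounded grid. It is not apgsearch, and it skips much of what a real census tool must handle, such as splitting an ash into separate objects.

```python
from collections import Counter


def step(cells):
    """One Game of Life (B3/S23) step on an unbounded set of live (row, col) cells."""
    counts = Counter()
    for r, c in cells:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr or dc:
                    counts[(r + dr, c + dc)] += 1
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in cells)}


def normalise(cells):
    """Translate a pattern so its bounding box starts at (0, 0); return shape and offset."""
    r0 = min(r for r, _ in cells)
    c0 = min(c for _, c in cells)
    return frozenset((r - r0, c - c0) for r, c in cells), (r0, c0)


def classify(cells, max_period=30):
    """Return (kind, period) for a pattern, or ('unclassified', None)."""
    shape0, origin0 = normalise(cells)
    current = set(cells)
    for p in range(1, max_period + 1):
        current = step(current)
        if not current:
            return ("dies", p)
        shape, origin = normalise(current)
        if shape == shape0:
            if origin != origin0:
                return ("spaceship", p)  # same shape, moved elsewhere
            return ("still life", 1) if p == 1 else ("oscillator", p)
    return ("unclassified", None)


print(classify({(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}))  # ('spaceship', 4), a glider
```

The shape is compared only under translation. This works for the block, the blinker, and the glider, but `max_period` is an arbitrary cutoff.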
-
Conditions for Open-Ended Evolution in Immigration Games
Authors:
Peter D. Turney
Abstract:
The Immigration Game (invented by Don Woods in 1971) extends the solitaire Game of Life (invented by John Conway in 1970) to enable two-player competition. The Immigration Game can be used in a model of evolution by natural selection, where fitness is measured with competitions. The rules for the Game of Life belong to the family of semitotalistic rules, a family with 262,144 members. Woods' method for converting the Game of Life into a two-player game generalizes to 8,192 members of the family of semitotalistic rules. In this paper, we call the original Immigration Game the Life Immigration Game and we call the 8,192 generalizations Immigration Games (including the Life Immigration Game). The question we examine here is, what are the conditions for one of the 8,192 Immigration Games to be suitable for modeling open-ended evolution? Our focus here is specifically on conditions for the rules, as opposed to conditions for other aspects of the model of evolution. In previous work, it was conjectured that Turing-completeness of the rules for the Game of Life may have been necessary for the success of evolution using the Life Immigration Game. Here we present evidence that Turing-completeness is a sufficient condition on the rules of Immigration Games, but not a necessary condition. The evidence suggests that a necessary and sufficient condition on the rules of Immigration Games, for open-ended evolution, is that the rules should allow growth.
Submitted 6 April, 2020;
originally announced April 2020.
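The Immigration Game adds colour to Life. Each live cell belongs to one of two players, and a newborn cell takes the colour held by the majority of its live neighbours. The sketch also shows where the figure of 262,144 comes from: a semi-totalistic rule chooses a birth set and a survival set from the neighbour counts 0 to 8, giving 2^9 × 2^9 rules. This Python sketch makes several assumptions that are not from the paper: cells are encoded as 0 (dead), 1 (red), or 2 (blue), the grid wraps around at the edges, and the rule is Life's B3/S23.

```python
from collections import Counter

BIRTH = {3}        # Life: a dead cell with exactly 3 live neighbours is born
SURVIVE = {2, 3}   # Life: a live cell with 2 or 3 live neighbours survives


def count_rules():
    """A semi-totalistic rule picks a birth subset and a survival subset of {0..8}."""
    return 2 ** 9 * 2 ** 9


def step(grid):
    """One Immigration Game step on a toroidal grid of 0 (dead), 1, or 2."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            colours = Counter()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    v = grid[(r + dr) % rows][(c + dc) % cols]
                    if v:
                        colours[v] += 1
            n = sum(colours.values())
            if grid[r][c] and n in SURVIVE:
                new[r][c] = grid[r][c]  # survivors keep their colour
            elif not grid[r][c] and n in BIRTH:
                # With exactly 3 parents and 2 colours, the majority is unique.
                new[r][c] = colours.most_common(1)[0][0]
    return new
```

For example, a horizontal blinker coloured blue-red-blue turns into a vertical blinker coloured blue-red-blue. The centre cell survives as red, and each newborn cell inherits blue from its 2-to-1 majority.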
-
Symbiosis Promotes Fitness Improvements in the Game of Life
Authors:
Peter D. Turney
Abstract:
We present a computational simulation of evolving entities that includes symbiosis with shifting levels of selection. Evolution by natural selection shifts from the level of the original entities to the level of the new symbiotic entity. In the simulation, the fitness of an entity is measured by a series of one-on-one competitions in the Immigration Game, a two-player variation of Conway's Game of Life. Mutation, reproduction, and symbiosis are implemented as operations that are external to the Immigration Game. Because these operations are external to the game, we are able to freely manipulate the operations and observe the effects of the manipulations. The simulation is composed of four layers, each layer building on the previous layer. The first layer implements a simple form of asexual reproduction, the second layer introduces a more sophisticated form of asexual reproduction, the third layer adds sexual reproduction, and the fourth layer adds symbiosis. The experiments show that a small amount of symbiosis, added to the other layers, significantly increases the fitness of the population. We suggest that the model may provide new insights into symbiosis in biological and cultural evolution.
Submitted 16 June, 2020; v1 submitted 19 August, 2019;
originally announced August 2019.
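The abstract above measures fitness by a series of one-on-one competitions, and the Model-S papers pair this with tournament selection. The Python sketch below combines the two in a generic form. Everything specific here is an assumption for illustration: `compete` is a hypothetical stand-in for an Immigration Game match, fitness is the fraction of wins against the rest of the population, and Model-S's actual scoring and tournament details may differ.

```python
import random


def fitness(entity, opponents, compete):
    """Fraction of one-on-one competitions that `entity` wins."""
    wins = sum(1 for opponent in opponents if compete(entity, opponent))
    return wins / len(opponents)


def tournament_select(population, tournament_size, compete, rng):
    """Sample `tournament_size` members and return the one with the highest
    fitness, measured against every other member of the population."""
    indices = rng.sample(range(len(population)), tournament_size)

    def score(i):
        others = [p for j, p in enumerate(population) if j != i]
        return fitness(population[i], others, compete)

    return population[max(indices, key=score)]


# Hypothetical stand-in for an Immigration Game match: the larger number wins.
def toy_compete(a, b):
    return a > b


rng = random.Random(0)
print(tournament_select([1, 5, 3, 9, 2], 5, toy_compete, rng))  # 9
```

Because mutation, reproduction, and symbiosis sit outside the game, they can be swapped in around a selection loop like this one without touching `compete`.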
-
The Natural Selection of Words: Finding the Features of Fitness
Authors:
Peter D. Turney,
Saif M. Mohammad
Abstract:
We introduce a dataset for studying the evolution of words, constructed from WordNet and the Google Books Ngram Corpus. The dataset tracks the evolution of 4,000 synonym sets (synsets), containing 9,000 English words, from 1800 AD to 2000 AD. We present a supervised learning algorithm that is able to predict the future leader of a synset: the word in the synset that will have the highest frequency. The algorithm uses features based on a word's length, the characters in the word, and the historical frequencies of the word. It can predict change of leadership (including the identity of the new leader) fifty years in the future, with an F-score considerably above random guessing. Analysis of the learned models provides insight into the causes of change in the leader of a synset. The algorithm confirms observations linguists have made, such as the trend to replace the -ise suffix with -ize, the rivalry between the -ity and -ness suffixes, and the struggle between economy (shorter words are easier to remember and to write) and clarity (longer words are more distinctive and less likely to be confused with one another). The results indicate that integration of the Google Books Ngram Corpus with WordNet has significant potential for improving our understanding of how language evolves.
Submitted 19 August, 2019;
originally announced August 2019.
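The abstract above defines the leader of a synset as its most frequent word and mentions features based on word length and characters, such as the -ise/-ize and -ity/-ness suffixes. The Python sketch below shows the leader definition and a hypothetical handful of such features. The frequency counts in the example are invented, not Google Books data, and the historical-frequency features and the learning algorithm itself are not shown.

```python
def leader(freqs):
    """The leader of a synset is its most frequent word; `freqs` maps word -> count."""
    return max(freqs, key=freqs.get)


def leadership_changed(freqs_then, freqs_later):
    """True if the synset's leader differs between two periods."""
    return leader(freqs_then) != leader(freqs_later)


def word_features(word):
    """Hypothetical length and suffix features in the spirit of the abstract."""
    return {
        "length": len(word),
        "ends_ise": word.endswith("ise"),
        "ends_ize": word.endswith("ize"),
        "ends_ity": word.endswith("ity"),
        "ends_ness": word.endswith("ness"),
    }


# Invented counts for one synset at two points in time.
then = {"realise": 50, "realize": 30}
later = {"realise": 40, "realize": 90}
print(leader(then), leader(later), leadership_changed(then, later))  # realise realize True
```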
-
Conditions for Major Transitions in Biological and Cultural Evolution
Authors:
Peter D. Turney
Abstract:
Evolution by natural selection can be seen as an algorithm for generating creative solutions to difficult problems. More precisely, evolution by natural selection is a class of algorithms that share a set of properties. The question we address here is, what are the conditions that define this class of algorithms? There is a standard answer to this question: Briefly, the conditions are variation, heredity, and selection. We agree that these three conditions are sufficient for a limited type of evolution, but they are not sufficient for open-ended evolution. By open-ended evolution, we mean evolution that generates a continuous stream of creative solutions, without stagnating. We propose a set of conditions for open-ended evolution. The new conditions build on the standard conditions by adding fission, fusion, and cooperation. We test the proposed conditions by applying them to major transitions in the evolution of life and culture. We find that the proposed conditions are able to account for the major transitions.
Submitted 20 June, 2018;
originally announced June 2018.
-
Leveraging Term Banks for Answering Complex Questions: A Case for Sparse Vectors
Authors:
Peter D. Turney
Abstract:
While open-domain question answering (QA) systems have proven effective for answering simple questions, they struggle with more complex questions. Our goal is to answer more complex questions reliably, without incurring a significant cost in knowledge resource construction to support the QA. One readily available knowledge resource is a term bank, enumerating the key concepts in a domain. We have developed an unsupervised learning approach that leverages a term bank to guide a QA system, by representing the terminological knowledge with thousands of specialized vector spaces. In experiments with complex science questions, we show that this approach significantly outperforms several state-of-the-art QA systems, demonstrating that significant leverage can be gained from continuous vector representations of domain terminology.
Submitted 11 April, 2017;
originally announced April 2017.
-
Semantic Composition and Decomposition: From Recognition to Generation
Authors:
Peter D. Turney
Abstract:
Semantic composition is the task of understanding the meaning of text by composing the meanings of the individual words in the text. Semantic decomposition is the task of understanding the meaning of an individual word by decomposing it into various aspects (factors, constituents, components) that are latent in the meaning of the word. We take a distributional approach to semantics, in which a word is represented by a context vector. Much recent work has considered the problem of recognizing compositions and decompositions, but we tackle the more difficult generation problem. For simplicity, we focus on noun-modifier bigrams and noun unigrams. A test for semantic composition is, given context vectors for the noun and modifier in a noun-modifier bigram ("red salmon"), generate a noun unigram that is synonymous with the given bigram ("sockeye"). A test for semantic decomposition is, given a context vector for a noun unigram ("snifter"), generate a noun-modifier bigram that is synonymous with the given unigram ("brandy glass"). With a vocabulary of about 73,000 unigrams from WordNet, there are 73,000 candidate unigram compositions for a bigram and 5,300,000,000 (73,000 squared) candidate bigram decompositions for a unigram. We generate ranked lists of potential solutions in two passes. A fast unsupervised learning algorithm generates an initial list of candidates and then a slower supervised learning algorithm refines the list. We evaluate the candidate solutions by comparing them to WordNet synonym sets. For decomposition (unigram to bigram), the top 100 most highly ranked bigrams include a WordNet synonym of the given unigram 50.7% of the time. For composition (bigram to unigram), the top 100 most highly ranked unigrams include a WordNet synonym of the given bigram 77.8% of the time.
Submitted 30 May, 2014;
originally announced May 2014.
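The evaluation above counts a test item as solved when the top 100 ranked candidates include a WordNet synonym. The Python sketch below implements that hit-at-k measure and checks the size of the decomposition search space. The names `hit_at_k` and `hit_rate` are illustrative, and the ranked candidate lists in the example are invented.

```python
def hit_at_k(ranked, synonyms, k=100):
    """True if any of the top-k ranked candidates is in the gold synonym set."""
    return any(candidate in synonyms for candidate in ranked[:k])


def hit_rate(results, k=100):
    """Fraction of test items whose top-k list contains a gold synonym.
    `results` is a list of (ranked_candidates, gold_synonyms) pairs."""
    return sum(hit_at_k(ranked, gold, k) for ranked, gold in results) / len(results)


# Size of the bigram search space: 73,000 squared, roughly 5.3 billion.
print(73_000 ** 2)  # 5329000000

# Invented rankings for two composition items.
results = [(["fish", "sockeye", "trout"], {"sockeye"}),
           (["goblet", "tumbler"], {"snifter"})]
print(hit_rate(results, k=2))  # 0.5
```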
-
Experiments with Three Approaches to Recognizing Lexical Entailment
Authors:
Peter D. Turney,
Saif M. Mohammad
Abstract:
Inference in natural language often involves recognizing lexical entailment (RLE); that is, identifying whether one word entails another. For example, "buy" entails "own". Two general strategies for RLE have been proposed: One strategy is to manually construct an asymmetric similarity measure for context vectors (directional similarity) and another is to treat RLE as a problem of learning to recognize semantic relations using supervised machine learning techniques (relation classification). In this paper, we experiment with two recent state-of-the-art representatives of the two general strategies. The first approach is an asymmetric similarity measure (an instance of the directional similarity strategy), designed to capture the degree to which the contexts of a word, a, form a subset of the contexts of another word, b. The second approach (an instance of the relation classification strategy) represents a word pair, a:b, with a feature vector that is the concatenation of the context vectors of a and b, and then applies supervised learning to a training set of labeled feature vectors. Additionally, we introduce a third approach that is a new instance of the relation classification strategy. The third approach represents a word pair, a:b, with a feature vector in which the features are the differences in the similarities of a and b to a set of reference words. All three approaches use vector space models (VSMs) of semantics, based on word-context matrices. We perform an extensive evaluation of the three approaches using three different datasets. The proposed new approach (similarity differences) performs significantly better than the other two approaches on some datasets and there is no dataset for which it is significantly worse. Our results suggest it is beneficial to make connections between the research in lexical entailment and the research in semantic relation classification.
Submitted 31 January, 2014;
originally announced January 2014.
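The second and third approaches above both turn a word pair a:b into a feature vector for a supervised classifier. The Python sketch below builds both kinds of vector: the concatenation of the context vectors, and the similarity differences against a set of reference words. Cosine is assumed as the similarity measure, the 2-dimensional vectors are invented toys, and the classifier that would consume these features is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def concatenation_features(a, b, vectors):
    """Second approach: concatenate the context vectors of a and b."""
    return list(vectors[a]) + list(vectors[b])


def similarity_difference_features(a, b, reference_words, vectors):
    """Third approach: for each reference word r, sim(a, r) - sim(b, r)."""
    return [cosine(vectors[a], vectors[r]) - cosine(vectors[b], vectors[r])
            for r in reference_words]


# Invented 2-dimensional context vectors, for illustration only.
VECTORS = {"buy": [1, 0], "own": [0, 1], "pay": [1, 0], "have": [0, 1]}
print(similarity_difference_features("buy", "own", ["pay", "have"], VECTORS))  # [1.0, -1.0]
```

One design difference is visible here. The concatenated vector grows with the context dimensionality, while the similarity-difference vector has one feature per reference word.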
-
Distributional semantics beyond words: Supervised learning of analogy and paraphrase
Authors:
Peter D. Turney
Abstract:
There have been several efforts to extend distributional semantics beyond individual words, to measure the similarity of word pairs, phrases, and sentences (briefly, tuples; ordered sets of words, contiguous or noncontiguous). One way to extend beyond words is to compare two tuples using a function that combines pairwise similarities between the component words in the tuples. A strength of this approach is that it works with both relational similarity (analogy) and compositional similarity (paraphrase). However, past work required hand-coding the combination function for different tasks. The main contribution of this paper is that combination functions are generated by supervised learning. We achieve state-of-the-art results in measuring relational similarity between word pairs (SAT analogies and SemEval 2012 Task 2) and measuring compositional similarity between noun-modifier phrases and unigrams (multiple-choice paraphrase questions).
Submitted 18 October, 2013;
originally announced October 2013.
-
Domain and Function: A Dual-Space Model of Semantic Relations and Compositions
Authors:
Peter D. Turney
Abstract:
Given appropriate representations of the semantic relations between carpenter and wood and between mason and stone (for example, vectors in a vector space model), a suitable algorithm should be able to recognize that these relations are highly similar (carpenter is to wood as mason is to stone; the relations are analogous). Likewise, with representations of dog, house, and kennel, an algorithm should be able to recognize that the semantic composition of dog and house, dog house, is highly similar to kennel (dog house and kennel are synonymous). It seems that these two tasks, recognizing relations and compositions, are closely connected. However, up to now, the best models for relations are significantly different from the best models for compositions. In this paper, we introduce a dual-space model that unifies these two tasks. This model matches the performance of the best previous models for relations and compositions. The dual-space model consists of a space for measuring domain similarity and a space for measuring function similarity. Carpenter and wood share the same domain, the domain of carpentry. Mason and stone share the same domain, the domain of masonry. Carpenter and mason share the same function, the function of artisans. Wood and stone share the same function, the function of materials. In the composition dog house, kennel has some domain overlap with both dog and house (the domains of pets and buildings). The function of kennel is similar to the function of house (the function of shelters). By combining domain and function similarities in various ways, we can model relations, compositions, and other aspects of semantics.
Submitted 16 September, 2013;
originally announced September 2013.
-
Computing Lexical Contrast
Authors:
Saif M. Mohammad,
Bonnie J. Dorr,
Graeme Hirst,
Peter D. Turney
Abstract:
Knowing the degree of semantic contrast between words has widespread application in natural language processing, including machine translation, information retrieval, and dialogue systems. Manually-created lexicons focus on opposites, such as hot and cold. Opposites are of many kinds such as antipodals, complementaries, and gradable. However, existing lexicons often do not classify opposites into the different kinds. They also do not explicitly list word pairs that are not opposites but yet have some degree of contrast in meaning, such as warm and cold or tropical and freezing. We propose an automatic method to identify contrasting word pairs that is based on the hypothesis that if a pair of words, A and B, are contrasting, then there is a pair of opposites, C and D, such that A and C are strongly related and B and D are strongly related. (For example, there exists the pair of opposites hot and cold such that tropical is related to hot, and freezing is related to cold.) We will call this the contrast hypothesis. We begin with a large crowdsourcing experiment to determine the amount of human agreement on the concept of oppositeness and its different kinds. In the process, we flesh out key features of different kinds of opposites. We then present an automatic and empirical measure of lexical contrast that relies on the contrast hypothesis, corpus statistics, and the structure of a Roget-like thesaurus. We show that the proposed measure of lexical contrast obtains high precision and large coverage, outperforming existing methods.
Submitted 28 August, 2013;
originally announced August 2013.
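The contrast hypothesis stated above translates almost directly into code. The Python sketch below makes some simplifying assumptions. Relatedness is a hypothetical toy function built from a hand-made list of pairs, whereas the paper draws on corpus statistics and a Roget-like thesaurus. The abstract also speaks of degrees of contrast, but this sketch returns only True or False.

```python
def make_related(pairs):
    """Hypothetical relatedness test from a hand-made list of related pairs.
    Treated as symmetric, and every word counts as related to itself."""
    linked = set(pairs) | {(b, a) for a, b in pairs}
    return lambda x, y: x == y or (x, y) in linked


def is_contrasting(a, b, opposites, related):
    """Contrast hypothesis: a and b contrast if some opposite pair (c, d)
    has a related to one member and b related to the other."""
    for c, d in opposites:
        for x, y in ((c, d), (d, c)):
            if related(a, x) and related(b, y):
                return True
    return False


OPPOSITES = [("hot", "cold")]
related = make_related([("tropical", "hot"), ("freezing", "cold"), ("warm", "hot")])
print(is_contrasting("tropical", "freezing", OPPOSITES, related))  # True
print(is_contrasting("tropical", "hot", OPPOSITES, related))       # False
```

Because each word counts as related to itself, warm and cold also come out as contrasting through the opposite pair hot and cold. This matches the abstract's example of a contrasting pair that is not a pair of opposites.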
-
Crowdsourcing a Word-Emotion Association Lexicon
Authors:
Saif M. Mohammad,
Peter D. Turney
Abstract:
Even though considerable attention has been given to the polarity of words (positive and negative) and the creation of large polarity lexicons, research in emotion analysis has had to rely on limited and small emotion lexicons. In this paper we show how the combined strength and wisdom of the crowds can be used to generate a large, high-quality, word-emotion and word-polarity association lexicon quickly and inexpensively. We enumerate the challenges in emotion annotation in a crowdsourcing scenario and propose solutions to address them. Most notably, in addition to questions about emotions associated with terms, we show how the inclusion of a word choice question can discourage malicious data entry, help identify instances where the annotator may not be familiar with the target term (allowing us to reject such annotations), and help obtain annotations at sense level (rather than at word level). We conducted experiments on how to formulate the emotion-annotation questions, and show that asking if a term is associated with an emotion leads to markedly higher inter-annotator agreement than that obtained by asking if a term evokes an emotion.
Submitted 28 August, 2013;
originally announced August 2013.
-
Analogy perception applied to seven tests of word comprehension
Authors:
Peter D. Turney
Abstract:
It has been argued that analogy is the core of cognition. In AI research, algorithms for analogy are often limited by the need for hand-coded high-level representations as input. An alternative approach is to use high-level perception, in which high-level representations are automatically generated from raw data. Analogy perception is the process of recognizing analogies using high-level perception. We present PairClass, an algorithm for analogy perception that recognizes lexical proportional analogies using representations that are automatically generated from a large corpus of raw textual data. A proportional analogy is an analogy of the form A:B::C:D, meaning "A is to B as C is to D". A lexical proportional analogy is a proportional analogy with words, such as carpenter:wood::mason:stone. PairClass represents the semantic relations between two words using a high-dimensional feature vector, in which the elements are based on frequencies of patterns in the corpus. PairClass recognizes analogies by applying standard supervised machine learning techniques to the feature vectors. We show how seven different tests of word comprehension can be framed as problems of analogy perception and we then apply PairClass to the seven resulting sets of analogy perception problems. We achieve competitive results on all seven tests. This is the first time a uniform approach has handled such a range of tests of word comprehension.
Submitted 22 July, 2011;
originally announced July 2011.
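PairClass represents the relation between two words as a vector of pattern frequencies drawn from a corpus. The Python sketch below illustrates only that representation, in a simplified form chosen for illustration. A pattern is the run of words between A and B (for example "X works with Y"), `max_gap` is an arbitrary limit, and pairs are compared with cosine similarity. PairClass itself uses a large corpus and applies supervised learning to the vectors, and neither is reproduced here.

```python
import math
from collections import Counter


def pair_patterns(a, b, sentences, max_gap=3):
    """Count patterns 'X <words between> Y' for occurrences of a followed by b
    with at most `max_gap` intervening words."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            if word != a:
                continue
            for j in range(i + 1, min(i + 2 + max_gap, len(words))):
                if words[j] == b:
                    counts[" ".join(["X", *words[i + 1:j], "Y"])] += 1
    return counts


def cosine(p, q):
    """Cosine similarity between two sparse pattern-count vectors."""
    dot = sum(p[k] * q[k] for k in p)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


corpus = ["the carpenter works with wood", "a mason works with stone",
          "the carpenter cuts the wood"]
print(cosine(pair_patterns("carpenter", "wood", corpus),
             pair_patterns("mason", "stone", corpus)))  # about 0.707
```

The shared pattern "X works with Y" is what makes carpenter:wood resemble mason:stone, which is the intuition behind carpenter:wood::mason:stone.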
-
From Frequency to Meaning: Vector Space Models of Semantics
Authors:
Peter D. Turney,
Patrick Pantel
Abstract:
Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
Submitted 4 March, 2010;
originally announced March 2010.
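Of the three matrix types the survey organizes VSMs around, the term-document matrix is the simplest to show. The Python sketch below builds one from raw counts, with rows as terms and columns as documents, and compares documents by the cosine of their columns. It is a toy illustration with invented documents. Practical systems typically reweight the counts (for example with tf-idf) before comparing.

```python
import math


def term_document_matrix(docs):
    """Rows are terms, columns are documents, cells are raw term counts."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({word for words in tokenised for word in words})
    matrix = [[words.count(term) for words in tokenised] for term in vocab]
    return vocab, matrix


def document_cosine(matrix, i, j):
    """Cosine similarity between document columns i and j."""
    col_i = [row[i] for row in matrix]
    col_j = [row[j] for row in matrix]
    dot = sum(a * b for a, b in zip(col_i, col_j))
    norm = math.sqrt(sum(a * a for a in col_i)) * math.sqrt(sum(b * b for b in col_j))
    return dot / norm if norm else 0.0


docs = ["cats chase mice", "dogs chase cats", "stocks fell"]
vocab, matrix = term_document_matrix(docs)
print(vocab)  # ['cats', 'chase', 'dogs', 'fell', 'mice', 'stocks']
print(round(document_cosine(matrix, 0, 1), 3), document_cosine(matrix, 0, 2))  # 0.667 0.0
```

A word-context matrix follows the same pattern with contexts in place of documents. A pair-pattern matrix puts word pairs in the rows and patterns in the columns.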
-
The Latent Relation Mapping Engine: Algorithm and Experiments
Authors:
Peter D. Turney
Abstract:
Many AI researchers and cognitive scientists have argued that analogy is the core of cognition. The most influential work on computational modeling of analogy-making is Structure Mapping Theory (SMT) and its implementation in the Structure Mapping Engine (SME). A limitation of SME is the requirement for complex hand-coded representations. We introduce the Latent Relation Mapping Engine (LRME), which combines ideas from SME and Latent Relational Analysis (LRA) in order to remove the requirement for hand-coded representations. LRME builds analogical mappings between lists of words, using a large corpus of raw text to automatically discover the semantic relations among the words. We evaluate LRME on a set of twenty analogical mapping problems, ten based on scientific analogies and ten based on common metaphors. LRME achieves human-level performance on the twenty problems. We compare LRME with a variety of alternative approaches and find that they are not able to reach the same level of performance.
Submitted 23 December, 2008;
originally announced December 2008.
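To make the idea of an analogical mapping between word lists concrete, here is a rough sketch of the search step described in the LRME abstract above. We assume that a mapping is scored by summing, over every pair of source terms, the relational similarity between the source pair and the pair it maps to. In LRME, these similarities are derived from a corpus. Here they come from a small hypothetical table (`TOY_SIM`), and `best_mapping` is an illustrative name. Exhaustive search over permutations grows factorially with list length, so this is practical only for short lists.

```python
from itertools import combinations, permutations

def best_mapping(source, target, rel_sim):
    """Exhaustively search bijections source -> target.

    A mapping M is scored as the sum, over every pair (a, b) of source terms,
    of rel_sim(a, b, M[a], M[b]): how similar relation a:b is to M[a]:M[b].
    """
    best, best_score = None, float("-inf")
    for perm in permutations(target):
        mapping = dict(zip(source, perm))
        score = sum(rel_sim(a, b, mapping[a], mapping[b])
                    for a, b in combinations(source, 2))
        if score > best_score:
            best, best_score = mapping, score
    return best, best_score

# Hypothetical relational similarities (a corpus-based measure in LRME).
TOY_SIM = {
    ("sun", "planet", "nucleus", "electron"): 1.0,
    ("sun", "mass", "nucleus", "charge"): 1.0,
    ("planet", "mass", "electron", "charge"): 1.0,
}

def toy_rel_sim(a, b, c, d):
    return TOY_SIM.get((a, b, c, d), 0.0)

mapping, score = best_mapping(["sun", "planet", "mass"],
                              ["nucleus", "electron", "charge"], toy_rel_sim)
```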
-
A Uniform Approach to Analogies, Synonyms, Antonyms, and Associations
Authors:
Peter D. Turney
Abstract:
Recognizing analogies, synonyms, antonyms, and associations appears to involve four distinct tasks, requiring distinct NLP algorithms. In the past, the four tasks have been treated independently, using a wide variety of algorithms. These four semantic classes, however, are a tiny sample of the full range of semantic phenomena, and we cannot afford to create ad hoc algorithms for each semantic phenomenon; we need to seek a unified approach. We propose to subsume a broad range of phenomena under analogies. To limit the scope of this paper, we restrict our attention to the subsumption of synonyms, antonyms, and associations. We introduce a supervised corpus-based machine learning algorithm for classifying analogous word pairs, and we show that it can solve multiple-choice SAT analogy questions, TOEFL synonym questions, ESL synonym-antonym questions, and similar-associated-both questions from cognitive psychology.
Submitted 31 August, 2008;
originally announced September 2008.
-
Empirical Evaluation of Four Tensor Decomposition Algorithms
Authors:
Peter D. Turney
Abstract:
Higher-order tensor decompositions are analogous to the familiar Singular Value Decomposition (SVD), but they transcend the limitations of matrices (second-order tensors). SVD is a powerful tool that has achieved impressive results in information retrieval, collaborative filtering, computational linguistics, computational vision, and other fields. However, SVD is limited to two-dimensional arrays of data (two modes), and many potential applications have three or more modes, which require higher-order tensor decompositions. This paper evaluates four algorithms for higher-order tensor decomposition: Higher-Order Singular Value Decomposition (HO-SVD), Higher-Order Orthogonal Iteration (HOOI), Slice Projection (SP), and Multislice Projection (MP). We measure the time (elapsed run time), space (RAM and disk space requirements), and fit (tensor reconstruction accuracy) of the four algorithms, under a variety of conditions. We find that standard implementations of HO-SVD and HOOI do not scale up to larger tensors, due to increasing RAM requirements. We recommend HOOI for tensors that are small enough for the available RAM and MP for larger tensors.
Submitted 13 November, 2007;
originally announced November 2007.
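For readers unfamiliar with the first algorithm in the comparison above, here is a minimal NumPy sketch of a truncated HO-SVD. Each factor matrix consists of the leading left singular vectors of one mode unfolding, and the core is obtained by projecting the tensor onto those factors. HOOI, SP, and MP are not shown. The relative error computed here is one common way to quantify reconstruction accuracy and may not match the paper's exact definition of fit. All function names are illustrative.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: a matrix whose rows are indexed by `mode`."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_dot(tensor, matrix, mode):
    """n-mode product: multiply every mode-`mode` fiber by `matrix`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def hosvd(tensor, ranks):
    """Truncated HO-SVD: factors from SVDs of each unfolding, then the core."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_dot(core, u.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Multiply the core back out by each factor matrix."""
    tensor = core
    for mode, u in enumerate(factors):
        tensor = mode_dot(tensor, u, mode)
    return tensor

def relative_error(original, approx):
    """Frobenius norm of the residual divided by that of the original."""
    return np.linalg.norm(original - approx) / np.linalg.norm(original)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 6))
core, factors = hosvd(X, (2, 3, 3))
err = relative_error(X, reconstruct(core, factors))
```

Each full SVD of an unfolding needs the whole unfolded matrix in memory. This is consistent with the abstract's observation that standard HO-SVD and HOOI implementations run into RAM limits on larger tensors.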
-
Similarity of Semantic Relations
Authors:
Peter D. Turney
Abstract:
There are at least two kinds of similarity. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason:stone is analogous to the pair carpenter:wood. This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, and information retrieval. Recently the Vector Space Model (VSM) of information retrieval has been adapted to measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus, (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data, and (3) automatically generated synonyms are used to explore variations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying semantic relations, LRA achieves similar gains over the VSM.
Submitted 25 August, 2006;
originally announced August 2006.
-
Self-Replication and Self-Assembly for Manufacturing
Authors:
Robert Ewaschuk,
Peter D. Turney
Abstract:
It has been argued that a central objective of nanotechnology is to make products inexpensively, and that self-replication is an effective approach to very low-cost manufacturing. The research presented here is intended to be a step towards this vision. We describe a computational simulation of nanoscale machines floating in a virtual liquid. The machines can bond together to form strands (chains) that self-replicate and self-assemble into user-specified meshes. There are four types of machines and the sequence of machine types in a strand determines the shape of the mesh they will build. A strand may be in an unfolded state, in which the bonds are straight, or in a folded state, in which the bond angles depend on the types of machines. By choosing the sequence of machine types in a strand, the user can specify a variety of polygonal shapes. A simulation typically begins with an initial unfolded seed strand in a soup of unbonded machines. The seed strand replicates by bonding with free machines in the soup. The child strands fold into the encoded polygonal shape, and then the polygons drift together and bond to form a mesh. We demonstrate that a variety of polygonal meshes can be manufactured in the simulation, by simply changing the sequence of machine types in the seed.
Submitted 27 July, 2006;
originally announced July 2006.
-
Expressing Implicit Semantic Relations without Supervision
Authors:
Peter D. Turney
Abstract:
We present an unsupervised learning algorithm that mines large text corpora for patterns that express implicit semantic relations. For a given input word pair X:Y with some unspecified semantic relations, the corresponding output list of patterns <P1,...,Pm> is ranked according to how well each pattern Pi expresses the relations between X and Y. For example, given X=ostrich and Y=bird, the two highest ranking output patterns are "X is the largest Y" and "Y such as the X". The output patterns are intended to be useful for finding further pairs with the same relations, to support the construction of lexicons, ontologies, and semantic networks. The patterns are sorted by pertinence, where the pertinence of a pattern Pi for a word pair X:Y is the expected relational similarity between the given pair and typical pairs for Pi. The algorithm is empirically evaluated on two tasks, solving multiple-choice SAT word analogy questions and classifying semantic relations in noun-modifier pairs. On both tasks, the algorithm achieves state-of-the-art results, performing significantly better than several alternative pattern ranking algorithms, based on tf-idf.
Submitted 27 July, 2006;
originally announced July 2006.
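The pertinence score in the abstract above can be read as an expectation. We assume pertinence(X:Y, P) = Σ_i p(X_i:Y_i | P) · sim_r(X:Y, X_i:Y_i), with p(X_i:Y_i | P) estimated from how often each pair occurs with pattern P. The sketch below implements that reading. The frequencies in `FREQ` and the similarities in `SIM` are invented for illustration; in the paper, both come from a large corpus.

```python
def pertinence(target, pattern, freq, rel_sim):
    """Expected relational similarity between `target` and the pairs seen with `pattern`.

    p(pair | pattern) is estimated from `freq`, keyed by (pair, pattern).
    """
    pairs = [pair for (pair, pat) in freq if pat == pattern]
    total = sum(freq[(pair, pattern)] for pair in pairs)
    if total == 0:
        return 0.0
    return sum(freq[(pair, pattern)] / total * rel_sim(target, pair) for pair in pairs)

# Hypothetical corpus frequencies of (word pair, pattern).
FREQ = {
    (("ostrich", "bird"), "X is the largest Y"): 2,
    (("whale", "mammal"), "X is the largest Y"): 2,
    (("ostrich", "bird"), "X and Y"): 1,
    (("salt", "pepper"), "X and Y"): 9,
}
# Hypothetical symmetric relational similarities between pairs.
SIM = {
    frozenset({("ostrich", "bird"), ("whale", "mammal")}): 0.8,
    frozenset({("ostrich", "bird"), ("salt", "pepper")}): 0.1,
}

def toy_rel_sim(a, b):
    return 1.0 if a == b else SIM.get(frozenset({a, b}), 0.0)

target = ("ostrich", "bird")
patterns = {pat for (_, pat) in FREQ}
ranked = sorted(patterns, key=lambda p: pertinence(target, p, FREQ, toy_rel_sim),
                reverse=True)
```

Ranking by raw frequency would favour "X and Y" in this toy data (10 occurrences against 4). Pertinence instead prefers the pattern whose typical pairs resemble ostrich:bird (0.9 against 0.19).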
-
Corpus-based Learning of Analogies and Semantic Relations
Authors:
Peter D. Turney,
Michael L. Littman
Abstract:
We present an algorithm for learning from unlabeled text, based on the Vector Space Model (VSM) of information retrieval, that can solve verbal analogy questions of the kind found in the SAT college entrance exam. A verbal analogy has the form A:B::C:D, meaning "A is to B as C is to D"; for example, mason:stone::carpenter:wood. SAT analogy questions provide a word pair, A:B, and the problem is to select the most analogous word pair, C:D, from a set of five choices. The VSM algorithm correctly answers 47% of a collection of 374 college-level analogy questions (random guessing would yield 20% correct; the average college-bound senior high school student answers about 57% correctly). We motivate this research by applying it to a difficult problem in natural language processing, determining semantic relations in noun-modifier pairs. The problem is to classify a noun-modifier pair, such as "laser printer", according to the semantic relation between the noun (printer) and the modifier (laser). We use a supervised nearest-neighbour algorithm that assigns a class to a given noun-modifier pair by finding the most analogous noun-modifier pair in the training data. With 30 classes of semantic relations, on a collection of 600 labeled noun-modifier pairs, the learning algorithm attains an F value of 26.5% (random guessing: 3.3%). With 5 classes of semantic relations, the F value is 43.2% (random: 20%). The performance is state-of-the-art for both verbal analogies and noun-modifier relations.
Submitted 23 August, 2005;
originally announced August 2005.
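In the VSM approach described above, answering an SAT question comes down to picking the choice whose relation vector has the highest cosine with the stem's vector. The sketch below shows only that selection step. It assumes each word pair has already been turned into a vector of pattern frequencies. The four-dimensional vectors are hypothetical, and the paper's actual patterns and frequency weighting are not reproduced.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def answer_analogy(stem_vector, choice_vectors):
    """Return the index of the choice C:D whose vector is closest to the stem A:B."""
    scores = [cosine(stem_vector, c) for c in choice_vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Hypothetical pattern-frequency vectors, one dimension per joining pattern.
stem = [3, 0, 5, 1]          # mason:stone
choices = [
    [0, 4, 0, 1],
    [2, 0, 4, 1],            # carpenter:wood
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 3],
]
best, scores = answer_analogy(stem, choices)
```

The noun-modifier experiment uses the same similarity inside a nearest-neighbour rule. A test pair takes the class label of the training pair with the highest cosine.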
-
Measuring Semantic Similarity by Latent Relational Analysis
Authors:
Peter D. Turney
Abstract:
This paper introduces Latent Relational Analysis (LRA), a method for measuring semantic similarity. LRA measures similarity in the semantic relations between two pairs of words. When two pairs have a high degree of relational similarity, they are analogous. For example, the pair cat:meow is analogous to the pair dog:bark. There is evidence from cognitive science that relational similarity is fundamental to many cognitive and linguistic tasks (e.g., analogical reasoning). In the Vector Space Model (VSM) approach to measuring relational similarity, the similarity between two pairs is calculated by the cosine of the angle between the vectors that represent the two pairs. The elements in the vectors are based on the frequencies of manually constructed patterns in a large corpus. LRA extends the VSM approach in three ways: (1) patterns are derived automatically from the corpus, (2) Singular Value Decomposition is used to smooth the frequency data, and (3) synonyms are used to reformulate word pairs. This paper describes the LRA algorithm and experimentally compares LRA to VSM on two tasks, answering college-level multiple-choice word analogy questions and classifying semantic relations in noun-modifier expressions. LRA achieves state-of-the-art results, reaching human-level performance on the analogy questions and significantly exceeding VSM performance on both tasks.
Submitted 10 August, 2005;
originally announced August 2005.
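The second LRA extension listed above, smoothing the pair-by-pattern frequencies with SVD, can be sketched with NumPy as follows. The sketch compares pairs using the rows of U_k Σ_k. It omits any weighting of the raw frequencies, as well as the synonym-based reformulation step. The 4 × 5 matrix is hypothetical.

```python
import numpy as np

def smooth_pairs(matrix, k):
    """Rows of U_k Sigma_k from a truncated SVD of the pair-by-pattern matrix.

    Cosines between these rows equal cosines between rows of the rank-k
    reconstruction U_k Sigma_k V_k^T, because V_k has orthonormal columns.
    """
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]

def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    norm_u, norm_v = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (norm_u * norm_v)) if norm_u and norm_v else 0.0

# Hypothetical pair-by-pattern frequency matrix: 4 word pairs, 5 patterns.
M = np.array([
    [3.0, 0.0, 5.0, 1.0, 0.0],
    [2.0, 0.0, 4.0, 1.0, 0.0],
    [0.0, 4.0, 0.0, 1.0, 2.0],
    [0.0, 3.0, 1.0, 0.0, 2.0],
])
S = smooth_pairs(M, k=2)
```

With k equal to the number of rows, the cosines are unchanged from the raw matrix. Smaller k keeps only the dominant shared structure, which is the smoothing effect the abstract refers to.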
-
Combining Independent Modules in Lexical Multiple-Choice Problems
Authors:
Peter D. Turney,
Michael L. Littman,
Jeffrey Bigham,
Victor Shnayder
Abstract:
Existing statistical approaches to natural language problems are very coarse approximations to the true complexity of language processing. As such, no single technique will be best for all problem instances. Many researchers are examining ensemble methods that combine the output of multiple modules to create more accurate solutions. This paper examines three merging rules for combining probability distributions: the familiar mixture rule, the logarithmic rule, and a novel product rule. These rules were applied with state-of-the-art results to two problems used to assess human mastery of lexical semantics -- synonym questions and analogy questions. All three merging rules result in ensembles that are more accurate than any of their component modules. The differences among the three rules are not statistically significant, but it is suggestive that the popular mixture rule is not the best rule for either of the two problems.
Submitted 10 January, 2005;
originally announced January 2005.
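Two of the three merging rules named above have standard forms. The mixture rule is a weighted arithmetic mean of the modules' distributions. The logarithmic rule is a weighted geometric mean, renormalized to sum to 1. The sketch below implements those standard forms. The paper's novel product rule is not reproduced, because this abstract does not give its formula. The module outputs and weights are hypothetical.

```python
import math

def mixture_rule(dists, weights):
    """Weighted arithmetic mean of the modules' distributions."""
    total_w = sum(weights)
    n = len(dists[0])
    return [sum(w * d[c] for d, w in zip(dists, weights)) / total_w for c in range(n)]

def logarithmic_rule(dists, weights):
    """Weighted geometric mean of the distributions, renormalized to sum to 1."""
    n = len(dists[0])
    scores = [math.prod(d[c] ** w for d, w in zip(dists, weights)) for c in range(n)]
    z = sum(scores)
    return [s / z for s in scores]

# Two hypothetical modules scoring the same three answer choices.
module_a = [0.6, 0.3, 0.1]
module_b = [0.2, 0.7, 0.1]
mix = mixture_rule([module_a, module_b], [0.5, 0.5])
geo = logarithmic_rule([module_a, module_b], [0.5, 0.5])
```

The design difference matters when modules disagree. Under the logarithmic rule, a single module that gives a choice zero probability vetoes it. The mixture rule only dilutes it. If every choice is vetoed, this sketch raises ZeroDivisionError.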
-
Human-Level Performance on Word Analogy Questions by Latent Relational Analysis
Authors:
Peter D. Turney
Abstract:
This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, machine translation, and information retrieval. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason/stone is analogous to the pair carpenter/wood. Past work on semantic similarity measures has mainly been concerned with attributional similarity. Recently the Vector Space Model (VSM) of information retrieval has been adapted to the task of measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus (they are not predefined), (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data (it is also used this way in Latent Semantic Analysis), and (3) automatically generated synonyms are used to explore reformulations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying noun-modifier relations, LRA achieves similar gains over the VSM, while using a smaller corpus.
Submitted 6 December, 2004;
originally announced December 2004.
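The first LRA extension above derives patterns from the corpus rather than predefining them. A minimal sketch of one way to do this, under the assumption that patterns come from corpus phrases joining X and Y by replacing any subset of the intervening words with a wildcard. The phrase "mason cuts the stone" is hypothetical. Details such as maximum phrase length, word order variants, and pattern filtering are omitted.

```python
from itertools import product

def wildcard_patterns(intervening_words):
    """All patterns formed by replacing any subset of the words between X and Y with '*'."""
    patterns = []
    for mask in product([False, True], repeat=len(intervening_words)):
        words = ["*" if wild else word for word, wild in zip(intervening_words, mask)]
        patterns.append(" ".join(["X", *words, "Y"]))
    return patterns

# Hypothetical corpus phrase "mason cuts the stone": X = mason, Y = stone.
pats = wildcard_patterns(["cuts", "the"])
```

A phrase with n intervening words yields 2^n patterns, so a practical system needs some limit on phrase length and on the number of patterns kept. This sketch applies none.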
-
Word Sense Disambiguation by Web Mining for Word Co-occurrence Probabilities
Authors:
Peter D. Turney
Abstract:
This paper describes the National Research Council (NRC) Word Sense Disambiguation (WSD) system, as applied to the English Lexical Sample (ELS) task in Senseval-3. The NRC system approaches WSD as a classical supervised machine learning problem, using familiar tools such as the Weka machine learning software and Brill's rule-based part-of-speech tagger. Head words are represented as feature vectors with several hundred features. Approximately half of the features are syntactic and the other half are semantic. The main novelty in the system is the method for generating the semantic features, based on word co-occurrence probabilities. The probabilities are estimated using the Waterloo MultiText System with a corpus of about one terabyte of unlabeled text, collected by a web crawler.
Submitted 29 July, 2004;
originally announced July 2004.
-
Combining Independent Modules to Solve Multiple-choice Synonym and Analogy Problems
Authors:
Peter D. Turney,
Michael L. Littman,
Jeffrey Bigham,
Victor Shnayder
Abstract:
Existing statistical approaches to natural language problems are very coarse approximations to the true complexity of language processing. As such, no single technique will be best for all problem instances. Many researchers are examining ensemble methods that combine the output of successful, separately developed modules to create more accurate solutions. This paper examines three merging rules for combining probability distributions: the well known mixture rule, the logarithmic rule, and a novel product rule. These rules were applied with state-of-the-art results to two problems commonly used to assess human mastery of lexical semantics -- synonym questions and analogy questions. All three merging rules result in ensembles that are more accurate than any of their component modules. The differences among the three rules are not statistically significant, but it is suggestive that the popular mixture rule is not the best rule for either of the two problems.
Submitted 19 September, 2003;
originally announced September 2003.
-
Measuring Praise and Criticism: Inference of Semantic Orientation from Association
Authors:
Peter D. Turney,
Michael L. Littman
Abstract:
The evaluative character of a word is called its semantic orientation. Positive semantic orientation indicates praise (e.g., "honest", "intrepid") and negative semantic orientation indicates criticism (e.g., "disturbing", "superfluous"). Semantic orientation varies in both direction (positive or negative) and degree (mild to strong). An automated system for measuring semantic orientation would have application in text classification, text filtering, tracking opinions in online discussions, analysis of survey responses, and automated chat systems (chatbots). This paper introduces a method for inferring the semantic orientation of a word from its statistical association with a set of positive and negative paradigm words. Two instances of this approach are evaluated, based on two different statistical measures of word association: pointwise mutual information (PMI) and latent semantic analysis (LSA). The method is experimentally tested with 3,596 words (including adjectives, adverbs, nouns, and verbs) that have been manually labeled positive (1,614 words) and negative (1,982 words). The method attains an accuracy of 82.8% on the full test set, but the accuracy rises above 95% when the algorithm is allowed to abstain from classifying mild words.
Submitted 19 September, 2003;
originally announced September 2003.
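The PMI instance of the method above scores a word by its association with positive paradigm words minus its association with negative ones. The sign of the score gives the direction, and its magnitude gives the strength. The sketch below assumes plain count-based PMI. The paradigm lists are illustrative, and the exact sets used in the paper may differ. `count`, `co_count`, and the counts are hypothetical stand-ins for corpus lookups. The 0.01 smoothing constant is our assumption to avoid log(0). The LSA variant is not shown.

```python
import math

# Illustrative paradigm words; the exact sets used in the paper may differ.
POSITIVE = ["good", "nice", "excellent", "positive", "fortunate", "correct", "superior"]
NEGATIVE = ["bad", "nasty", "poor", "negative", "unfortunate", "wrong", "inferior"]

def pmi(joint, count_a, count_b, total):
    """Pointwise mutual information, log2 of p(a, b) / (p(a) p(b)), from counts."""
    return math.log2(joint * total / (count_a * count_b))

def so_pmi(word, count, co_count, total, smoothing=0.01):
    """Sum of PMI with positive paradigm words minus sum with negative ones."""
    def assoc(paradigm):
        return pmi(co_count(word, paradigm) + smoothing, count(word), count(paradigm), total)
    return sum(assoc(p) for p in POSITIVE) - sum(assoc(n) for n in NEGATIVE)

def classify(score, threshold=0.0):
    """Return the direction, or None (abstain) for mild words with |score| <= threshold."""
    if abs(score) <= threshold:
        return None
    return "positive" if score > 0 else "negative"

# Hypothetical counts: every word occurs 100 times in a 1,000,000-token corpus.
CO = {**{("honest", p): 20 for p in POSITIVE}, **{("honest", n): 2 for n in NEGATIVE}}
score = so_pmi("honest", lambda w: 100, lambda a, b: CO.get((a, b), 0), 1_000_000)
```

Raising `threshold` trades coverage for accuracy. This corresponds to the abstract's observation that accuracy rises when the algorithm may abstain on mild words.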
-
Coherent Keyphrase Extraction via Web Mining
Authors:
Peter D. Turney
Abstract:
Keyphrases are useful for a variety of purposes, including summarizing, indexing, labeling, categorizing, clustering, highlighting, browsing, and searching. The task of automatic keyphrase extraction is to select keyphrases from within the text of a given document. Automatic keyphrase extraction makes it feasible to generate keyphrases for the huge number of documents that do not have manually assigned keyphrases. A limitation of previous keyphrase extraction algorithms is that the selected keyphrases are occasionally incoherent. That is, the majority of the output keyphrases may fit together well, but there may be a minority that appear to be outliers, with no clear semantic relation to the majority or to each other. This paper presents enhancements to the Kea keyphrase extraction algorithm that are designed to increase the coherence of the extracted keyphrases. The approach is to use the degree of statistical association among candidate keyphrases as evidence that they may be semantically related. The statistical association is measured using web mining. Experiments demonstrate that the enhancements improve the quality of the extracted keyphrases. Furthermore, the enhancements are not domain-specific: the algorithm generalizes well when it is trained on one domain (computer science documents) and tested on another (physics documents).
Submitted 20 August, 2003;
originally announced August 2003.
-
Learning Analogies and Semantic Relations
Authors:
Peter D. Turney,
Michael L. Littman
Abstract:
We present an algorithm for learning from unlabeled text, based on the Vector Space Model (VSM) of information retrieval, that can solve verbal analogy questions of the kind found in the Scholastic Aptitude Test (SAT). A verbal analogy has the form A:B::C:D, meaning "A is to B as C is to D"; for example, mason:stone::carpenter:wood. SAT analogy questions provide a word pair, A:B, and the problem is to select the most analogous word pair, C:D, from a set of five choices. The VSM algorithm correctly answers 47% of a collection of 374 college-level analogy questions (random guessing would yield 20% correct). We motivate this research by relating it to work in cognitive science and linguistics, and by applying it to a difficult problem in natural language processing, determining semantic relations in noun-modifier pairs. The problem is to classify a noun-modifier pair, such as "laser printer", according to the semantic relation between the noun (printer) and the modifier (laser). We use a supervised nearest-neighbour algorithm that assigns a class to a given noun-modifier pair by finding the most analogous noun-modifier pair in the training data. With 30 classes of semantic relations, on a collection of 600 labeled noun-modifier pairs, the learning algorithm attains an F value of 26.5% (random guessing: 3.3%). With 5 classes of semantic relations, the F value is 43.2% (random: 20%). The performance is state-of-the-art for these challenging problems.
Submitted 24 July, 2003;
originally announced July 2003.
-
Increasing Evolvability Considered as a Large-Scale Trend in Evolution
Authors:
Peter D. Turney
Abstract:
Evolvability is the capacity to evolve. This paper introduces a simple computational model of evolvability and demonstrates that, under certain conditions, evolvability can increase indefinitely, even when there is no direct selection for evolvability. The model shows that increasing evolvability implies an accelerating evolutionary pace. It is suggested that the conditions for indefinitely increasing evolvability are satisfied in biological and cultural evolution. We claim that increasing evolvability is a large-scale trend in evolution. This hypothesis leads to testable predictions about biological and cultural evolution.
Submitted 12 December, 2002;
originally announced December 2002.
-
Robust Classification with Context-Sensitive Features
Authors:
Peter D. Turney
Abstract:
This paper addresses the problem of classifying observations when features are context-sensitive, especially when the testing set involves a context that is different from the training set. The paper begins with a precise definition of the problem, then general strategies are presented for enhancing the performance of classification algorithms on this type of problem. These strategies are tested on three domains. The first domain is the diagnosis of gas turbine engines. The problem is to diagnose a faulty engine in one context, such as warm weather, when the fault has previously been seen only in another context, such as cold weather. The second domain is speech recognition. The context is given by the identity of the speaker. The problem is to recognize words spoken by a new speaker, not represented in the training set. The third domain is medical prognosis. The problem is to predict whether a patient with hepatitis will live or die. The context is the age of the patient. For all three domains, exploiting context results in substantially more accurate classification.
Submitted 12 December, 2002;
originally announced December 2002.
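One commonly discussed strategy for the gas turbine example above is contextual normalization. Each feature is rescaled using the mean and spread observed in its own context, so that readings from different contexts become comparable. The minimal sketch below assumes that a few reference observations are available from every context, including the one the test data come from. The single-feature temperature readings and the function names are hypothetical. This is not presented as the paper's exact procedure.

```python
from collections import defaultdict
from statistics import mean, pstdev

def fit_context_stats(rows, contexts):
    """Mean and standard deviation of each feature, computed separately per context."""
    grouped = defaultdict(list)
    for row, ctx in zip(rows, contexts):
        grouped[ctx].append(row)
    stats = {}
    for ctx, group in grouped.items():
        columns = list(zip(*group))
        # Fall back to 1.0 when a feature has zero spread, to avoid dividing by zero.
        stats[ctx] = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return stats

def normalize(row, ctx, stats):
    """Express each feature as a z-score relative to its own context."""
    return [(x - m) / s for x, (m, s) in zip(row, stats[ctx])]

# Hypothetical reference readings from healthy engines in two weather contexts.
reference = [[100.0], [102.0], [98.0], [80.0], [82.0], [78.0]]
weather = ["warm", "warm", "warm", "cold", "cold", "cold"]
stats = fit_context_stats(reference, weather)
warm_fault = normalize([110.0], "warm", stats)
cold_fault = normalize([90.0], "cold", stats)
```

After normalization, the warm-weather fault and the cold-weather fault map to the same value (about 6.1 standard deviations above the healthy mean). This is the kind of invariance that could let a classifier trained in one context transfer to another.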
-
Data Engineering for the Analysis of Semiconductor Manufacturing Data
Authors:
Peter D. Turney
Abstract:
We have analyzed manufacturing data from several different semiconductor manufacturing plants, using decision tree induction software called Q-YIELD. The software generates rules for predicting when a given product should be rejected. The rules are intended to help the process engineers improve the yield of the product, by helping them to discover the causes of rejection. Experience with Q-YIELD has taught us the importance of data engineering -- preprocessing the data to enable or facilitate decision tree induction. This paper discusses some of the data engineering problems we have encountered with semiconductor manufacturing data. The paper deals with two broad classes of problems: engineering the features in a feature vector representation and engineering the definition of the target concept (the classes). Manufacturing process data present special problems for feature engineering, since the data have multiple levels of granularity (detail, resolution). Engineering the target concept is important, due to our focus on understanding the past, as opposed to the more common focus in machine learning on predicting the future.
Submitted 12 December, 2002;
originally announced December 2002.
-
Low Size-Complexity Inductive Logic Programming: The East-West Challenge Considered as a Problem in Cost-Sensitive Classification
Authors:
Peter D. Turney
Abstract:
The Inductive Logic Programming community has considered proof-complexity and model-complexity, but, until recently, size-complexity has received little attention. Recently a challenge was issued "to the international computing community" to discover low size-complexity Prolog programs for classifying trains. The challenge was based on a problem first proposed by Ryszard Michalski, 20 years ago. We interpreted the challenge as a problem in cost-sensitive classification and we applied a recently developed cost-sensitive classifier to the competition. Our algorithm was relatively successful (we won a prize). This paper presents our algorithm and analyzes the results of the competition.
Submitted 12 December, 2002;
originally announced December 2002.
-
The Identification of Context-Sensitive Features: A Formal Definition of Context for Concept Learning
Authors:
Peter D. Turney
Abstract:
A large body of research in machine learning is concerned with supervised learning from examples. The examples are typically represented as vectors in a multi-dimensional feature space (also known as attribute-value descriptions). A teacher partitions a set of training examples into a finite number of classes. The task of the learning algorithm is to induce a concept from the training examples. In this paper, we formally distinguish three types of features: primary, contextual, and irrelevant features. We also formally define what it means for one feature to be context-sensitive to another feature. Context-sensitive features complicate the task of the learner and potentially impair the learner's performance. Our formal definitions make it possible for a learner to automatically identify context-sensitive features. After context-sensitive features have been identified, there are several strategies that the learner can employ for managing the features; however, a discussion of these strategies is outside of the scope of this paper. The formal definitions presented here correct a flaw in previously proposed definitions. We discuss the relationship between our work and a formal definition of relevance.
Submitted 12 December, 2002;
originally announced December 2002.
-
The Management of Context-Sensitive Features: A Review of Strategies
Authors:
Peter D. Turney
Abstract:
In this paper, we review five heuristic strategies for handling context-sensitive features in supervised machine learning from examples. We discuss two methods for recovering lost (implicit) contextual information. We mention some evidence that hybrid strategies can have a synergetic effect. We then show how the work of several machine learning researchers fits into this framework. While we do not claim that these strategies exhaust the possibilities, it appears that the framework includes all of the techniques that can be found in the published literature on context-sensitive learning.
Submitted 12 December, 2002;
originally announced December 2002.
-
Myths and Legends of the Baldwin Effect
Authors:
Peter D. Turney
Abstract:
This position paper argues that the Baldwin effect is widely misunderstood by the evolutionary computation community. The misunderstandings appear to fall into two general categories. Firstly, it is commonly believed that the Baldwin effect is concerned with the synergy that results when there is an evolving population of learning individuals. This is only half of the story. The full story is more complicated and more interesting. The Baldwin effect is concerned with the costs and benefits of lifetime learning by individuals in an evolving population. Several researchers have focussed exclusively on the benefits, but there is much to be gained from attention to the costs. This paper explains the two sides of the story and enumerates ten of the costs and benefits of lifetime learning by individuals in an evolving population. Secondly, there is a cluster of misunderstandings about the relationship between the Baldwin effect and Lamarckian inheritance of acquired characteristics. The Baldwin effect is not Lamarckian. A Lamarckian algorithm is not better for most evolutionary computing problems than a Baldwinian algorithm. Finally, Lamarckian inheritance is not a better model of memetic (cultural) evolution than the Baldwin effect.
Submitted 11 December, 2002;
originally announced December 2002.
-
Exploiting Context When Learning to Classify
Authors:
Peter D. Turney
Abstract:
This paper addresses the problem of classifying observations when features are context-sensitive, specifically when the testing set involves a context that is different from the training set. The paper begins with a precise definition of the problem and then presents general strategies for enhancing the performance of classification algorithms on this type of problem. These strategies are tested on two domains. The first domain is the diagnosis of gas turbine engines. The problem is to diagnose a faulty engine in one context, such as warm weather, when the fault has previously been seen only in another context, such as cold weather. The second domain is speech recognition. The problem is to recognize words spoken by a new speaker, not represented in the training set. For both domains, exploiting context results in substantially more accurate classification.
Submitted 12 December, 2002;
originally announced December 2002.
-
Types of Cost in Inductive Concept Learning
Authors:
Peter D. Turney
Abstract:
Inductive concept learning is the task of learning to assign cases to a discrete set of classes. In real-world applications of concept learning, there are many different types of cost involved. The majority of the machine learning literature ignores all types of cost (unless accuracy is interpreted as a type of cost measure). A few papers have investigated the cost of misclassification errors. Very few papers have examined the many other types of cost. In this paper, we attempt to create a taxonomy of the different types of cost that are involved in inductive concept learning. This taxonomy may help to organize the literature on cost-sensitive learning. We hope that it will inspire researchers to investigate all types of cost in inductive concept learning in more depth.
Submitted 11 December, 2002;
originally announced December 2002.
-
Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL
Authors:
Peter D. Turney
Abstract:
This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of words. PMI-IR is empirically evaluated using 80 synonym test questions from the Test of English as a Foreign Language (TOEFL) and 50 synonym test questions from a collection of tests for students of English as a Second Language (ESL). On both tests, the algorithm obtains a score of 74%. PMI-IR is contrasted with Latent Semantic Analysis (LSA), which achieves a score of 64% on the same 80 TOEFL questions. The paper discusses potential applications of the new unsupervised learning algorithm and some implications of the results for LSA and LSI (Latent Semantic Indexing).
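The PMI-based choice among synonym candidates can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `hits` callable stands in for a search-engine hit count, `n_docs` for the size of the index, and the words and counts are made up. Because hits(problem) is identical for every candidate, ranking candidates by full PMI gives the same answer as ranking by hits(problem AND choice) / hits(choice).

```python
import math
from typing import Callable, Sequence

def pmi(hits_ab: int, hits_a: int, hits_b: int, n_docs: int) -> float:
    """Estimate PMI = log2(p(a, b) / (p(a) * p(b))) from hit counts."""
    if hits_ab == 0 or hits_a == 0 or hits_b == 0:
        return float("-inf")  # no co-occurrence evidence at all
    return math.log2(hits_ab * n_docs / (hits_a * hits_b))

def choose_synonym(problem: str, choices: Sequence[str],
                   hits: Callable[[str], int], n_docs: int) -> str:
    """Return the candidate whose PMI with the problem word is highest."""
    return max(choices,
               key=lambda c: pmi(hits(f"{problem} AND {c}"), hits(problem), hits(c), n_docs))

# Hypothetical hit counts for illustration only (not real search-engine data).
COUNTS = {
    "levied": 1_000, "imposed": 50_000, "believed": 80_000,
    "requested": 60_000, "correlated": 20_000,
    "levied AND imposed": 400, "levied AND believed": 30,
    "levied AND requested": 40, "levied AND correlated": 5,
}
answer = choose_synonym("levied", ["imposed", "believed", "requested", "correlated"],
                        lambda q: COUNTS.get(q, 0), n_docs=10_000_000)
print(answer)  # imposed
```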
Submitted 11 December, 2002;
originally announced December 2002.
-
Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews
Authors:
Peter D. Turney
Abstract:
This paper presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. A phrase has a positive semantic orientation when it has good associations (e.g., "subtle nuances") and a negative semantic orientation when it has bad associations (e.g., "very cavalier"). In this paper, the semantic orientation of a phrase is calculated as the mutual information between the given phrase and the word "excellent" minus the mutual information between the given phrase and the word "poor". A review is classified as recommended if the average semantic orientation of its phrases is positive. The algorithm achieves an average accuracy of 74% when evaluated on 410 reviews from Epinions, sampled from four different domains (reviews of automobiles, banks, movies, and travel destinations). The accuracy ranges from 84% for automobile reviews to 66% for movie reviews.
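The semantic-orientation rule described above can be sketched with hit counts. In this sketch the co-occurrence counts are assumed to come from proximity (NEAR-style) queries, the 0.01 smoothing constant is an assumption, and phrase extraction (finding the adjective and adverb phrases) is omitted; all inputs are hypothetical. Writing each mutual-information term as a PMI estimate from counts, hits(phrase) and the corpus size cancel in the difference.

```python
import math
from statistics import mean
from typing import Iterable, Tuple

SMOOTH = 0.01  # assumed additive smoothing so zero counts do not cause division by zero

def semantic_orientation(near_excellent: float, near_poor: float,
                         hits_excellent: float, hits_poor: float) -> float:
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor").

    Expanding both PMI terms, hits(phrase) and the corpus size cancel, leaving
    log2(hits(phrase NEAR excellent) * hits(poor) / (hits(phrase NEAR poor) * hits(excellent))).
    """
    return math.log2(((near_excellent + SMOOTH) * (hits_poor + SMOOTH)) /
                     ((near_poor + SMOOTH) * (hits_excellent + SMOOTH)))

def classify_review(phrase_counts: Iterable[Tuple[float, float]],
                    hits_excellent: float, hits_poor: float) -> str:
    """Recommended iff the average SO of the review's extracted phrases is positive."""
    avg = mean(semantic_orientation(e, p, hits_excellent, hits_poor) for e, p in phrase_counts)
    return "recommended" if avg > 0 else "not recommended"
```

For example, with equal counts for the two reference words, a phrase seen near "excellent" 80 times and near "poor" 10 times gets SO = log2(80.01 / 10.01), about 3.0.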
Submitted 11 December, 2002;
originally announced December 2002.
-
Contextual Normalization Applied to Aircraft Gas Turbine Engine Diagnosis
Authors:
Peter D. Turney,
Michael Halasz
Abstract:
Diagnosing faults in aircraft gas turbine engines is a complex problem. It involves several tasks, including rapid and accurate interpretation of patterns in engine sensor data. We have investigated contextual normalization for the development of a software tool to help engine repair technicians with interpretation of sensor data. Contextual normalization is a new strategy for employing machine learning. It handles variation in data that is due to contextual factors, rather than the health of the engine. It does this by normalizing the data in a context-sensitive manner. This learning strategy was developed and tested using 242 observations of an aircraft gas turbine engine in a test cell, where each observation consists of roughly 12,000 numbers, gathered over a 12-second interval. There were eight classes of observations: seven deliberately implanted classes of faults and a healthy class. We compared two approaches to implementing our learning strategy: linear regression and instance-based learning. We have three main results. (1) For the given problem, instance-based learning works better than linear regression. (2) For this problem, contextual normalization works better than other common forms of normalization. (3) The algorithms described here can be the basis for a useful software tool for assisting technicians with the interpretation of sensor data.
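A rough sketch of the idea behind contextual normalization: learn from healthy observations what a sensor reading is expected to be in a given context, then express each new reading relative to that expectation. This is not the paper's exact formulation; it assumes a single hypothetical context variable (ambient temperature), uses the linear-regression option mentioned above, and treats the spread of healthy readings as constant across contexts. All readings are invented.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple

Model = Tuple[float, float, float]  # (intercept, slope, residual std)

def fit_context_model(contexts: Sequence[float], values: Sequence[float]) -> Model:
    """Fit value ~ intercept + slope * context by least squares on healthy observations."""
    cx, cy = mean(contexts), mean(values)
    sxx = sum((c - cx) ** 2 for c in contexts)
    slope = sum((c - cx) * (v - cy) for c, v in zip(contexts, values)) / sxx
    intercept = cy - slope * cx
    residuals = [v - (intercept + slope * c) for c, v in zip(contexts, values)]
    return intercept, slope, pstdev(residuals)

def contextual_normalize(value: float, context: float, model: Model) -> float:
    """Standard deviations between a reading and what a healthy engine shows in this context."""
    intercept, slope, sd = model
    return (value - (intercept + slope * context)) / sd

# Hypothetical healthy readings: one sensor versus ambient temperature.
healthy_temps = [0.0, 10.0, 20.0, 30.0]
healthy_readings = [101.0, 119.0, 141.0, 159.0]
model = fit_context_model(healthy_temps, healthy_readings)
```

With this model, a raw reading of 130 at 15 degrees normalizes to about 0 (typical for a healthy engine in that context), while 140 at the same temperature lies more than ten residual standard deviations away, even though 140 would be unremarkable at 20 degrees.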
Submitted 11 December, 2002;
originally announced December 2002.
-
Theoretical Analyses of Cross-Validation Error and Voting in Instance-Based Learning
Authors:
Peter D. Turney
Abstract:
This paper begins with a general theory of error in cross-validation testing of algorithms for supervised learning from examples. It is assumed that the examples are described by attribute-value pairs, where the values are symbolic. Cross-validation requires a set of training examples and a set of testing examples. The value of the attribute that is to be predicted is known to the learner in the training set, but unknown in the testing set. The theory demonstrates that cross-validation error has two components: error on the training set (inaccuracy) and sensitivity to noise (instability). This general theory is then applied to voting in instance-based learning. Given an example in the testing set, a typical instance-based learning algorithm predicts the designated attribute by voting among the k nearest neighbors (the k most similar examples) to the testing example in the training set. Voting is intended to increase the stability (resistance to noise) of instance-based learning, but a theoretical analysis shows that there are circumstances in which voting can be destabilizing. The theory suggests ways to minimize cross-validation error, by ensuring that voting is stable and does not adversely affect accuracy.
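The k-nearest-neighbour vote described above can be sketched for symbolic attributes as follows. The distance measure (a count of mismatched attribute values) and the toy training set are assumptions for illustration; the paper's theoretical analysis does not depend on this code. Ties are broken deterministically: the sort is stable, and `Counter.most_common` lists equal counts in first-encountered order.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Example = Tuple[Tuple[str, ...], Hashable]  # (symbolic attribute values, class)

def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of attributes whose symbolic values differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_vote(train: List[Example], query: Tuple[str, ...], k: int = 3) -> Hashable:
    """Predict the class of `query` by majority vote among its k nearest training examples."""
    neighbours = sorted(train, key=lambda ex: hamming(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical training set with three symbolic attributes.
TRAIN: List[Example] = [
    (("red", "round", "large"), "apple"),
    (("green", "round", "large"), "apple"),
    (("red", "round", "small"), "cherry"),
    (("red", "round", "small"), "cherry"),
    (("yellow", "long", "large"), "banana"),
]
```

Raising k pulls in more distant examples; whether that damps noise or lets a distant majority outvote the closest match is exactly the stability question the abstract raises.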
Submitted 11 December, 2002;
originally announced December 2002.
-
A Theory of Cross-Validation Error
Authors:
Peter D. Turney
Abstract:
This paper presents a theory of error in cross-validation testing of algorithms for predicting real-valued attributes. The theory justifies the claim that predicting real-valued attributes requires balancing the conflicting demands of simplicity and accuracy. Furthermore, the theory indicates precisely how these conflicting demands must be balanced, in order to minimize cross-validation error. A general theory is presented and then developed in detail for linear regression and instance-based learning.
Submitted 11 December, 2002;
originally announced December 2002.
-
Technical Note: Bias and the Quantification of Stability
Authors:
Peter D. Turney
Abstract:
Research on bias in machine learning algorithms has generally been concerned with the impact of bias on predictive accuracy. We believe that there are other factors that should also play a role in the evaluation of bias. One such factor is the stability of the algorithm; in other words, the repeatability of the results. If we obtain two sets of data from the same phenomenon, with the same underlying probability distribution, then we would like our learning algorithm to induce approximately the same concepts from both sets of data. This paper introduces a method for quantifying stability, based on a measure of the agreement between concepts. We also discuss the relationships among stability, predictive accuracy, and bias.
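The abstract does not spell out its agreement measure, so the sketch below uses a hypothetical stand-in: the fraction of instances on which two induced concepts assign the same class. To estimate stability, one would induce one concept from each of two samples of the same distribution and compare them this way. The two threshold concepts are invented for illustration.

```python
from typing import Callable, Hashable, Sequence, TypeVar

X = TypeVar("X")

def agreement(concept_a: Callable[[X], Hashable],
              concept_b: Callable[[X], Hashable],
              instances: Sequence[X]) -> float:
    """Fraction of instances on which two concepts assign the same class (hypothetical measure)."""
    if not instances:
        raise ValueError("need at least one instance")
    return sum(concept_a(x) == concept_b(x) for x in instances) / len(instances)

# Two threshold concepts, imagined as induced from two samples of the same phenomenon.
def concept_1(x: float) -> bool:
    return x > 0.50

def concept_2(x: float) -> bool:
    return x > 0.55

grid = [i / 100 for i in range(100)]
print(agreement(concept_1, concept_2, grid))  # 0.95
```

The two concepts disagree only on the five grid points in (0.50, 0.55], so the agreement is 0.95; a perfectly stable learner would score 1.0. A refined measure might also correct for agreement expected by chance.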
Submitted 11 December, 2002;
originally announced December 2002.
-
How to Shift Bias: Lessons from the Baldwin Effect
Authors:
Peter D. Turney
Abstract:
An inductive learning algorithm takes a set of data as input and generates a hypothesis as output. A set of data is typically consistent with an infinite number of hypotheses; therefore, there must be factors other than the data that determine the output of the learning algorithm. In machine learning, these other factors are called the bias of the learner. Classical learning algorithms have a fixed bias, implicit in their design. Recently developed learning algorithms dynamically adjust their bias as they search for a hypothesis. Algorithms that shift bias in this manner are not as well understood as classical algorithms. In this paper, we show that the Baldwin effect has implications for the design and analysis of bias shifting algorithms. The Baldwin effect was proposed in 1896, to explain how phenomena that might appear to require Lamarckian evolution (inheritance of acquired characteristics) can arise from purely Darwinian evolution. Hinton and Nowlan presented a computational model of the Baldwin effect in 1987. We explore a variation on their model, which we constructed explicitly to illustrate the lessons that the Baldwin effect has for research in bias shifting algorithms. The main lesson is that a good strategy for shifting bias in a learning algorithm appears to be to begin with a weak bias and gradually shift to a strong bias.
Submitted 10 December, 2002;
originally announced December 2002.
-
A Simple Model of Unbounded Evolutionary Versatility as a Largest-Scale Trend in Organismal Evolution
Authors:
Peter D. Turney
Abstract:
The idea that there are any large-scale trends in the evolution of biological organisms is highly controversial. It is commonly believed, for example, that there is a large-scale trend in evolution towards increasing complexity, but empirical and theoretical arguments undermine this belief. Natural selection results in organisms that are well adapted to their local environments, but it is not clear how local adaptation can produce a global trend. In this paper, I present a simple computational model, in which local adaptation to a randomly changing environment results in a global trend towards increasing evolutionary versatility. In this model, for evolutionary versatility to increase without bound, the environment must be highly dynamic. The model also shows that unbounded evolutionary versatility implies an accelerating evolutionary pace. I believe that unbounded increase in evolutionary versatility is a large-scale trend in evolution. I discuss some of the testable predictions about organismal evolution that are suggested by the model.
Submitted 10 December, 2002;
originally announced December 2002.
-
Learning Algorithms for Keyphrase Extraction
Authors:
Peter D. Turney
Abstract:
Many academic journals ask their authors to provide a list of about five to fifteen keywords, to appear on the first page of each article. Since these key words are often phrases of two or more words, we prefer to call them keyphrases. There is a wide variety of tasks for which keyphrases are useful, as we discuss in this paper. We approach the problem of automatically extracting keyphrases from text as a supervised learning task. We treat a document as a set of phrases, which the learning algorithm must learn to classify as positive or negative examples of keyphrases. Our first set of experiments applies the C4.5 decision tree induction algorithm to this learning task. We evaluate the performance of nine different configurations of C4.5. The second set of experiments applies the GenEx algorithm to the task. We developed the GenEx algorithm specifically for automatically extracting keyphrases from text. The experimental results support the claim that a custom-designed algorithm (GenEx), incorporating specialized procedural domain knowledge, can generate better keyphrases than a general-purpose algorithm (C4.5). Subjective human evaluation of the keyphrases generated by Extractor, a software implementation of GenEx, suggests that about 80% of the keyphrases are acceptable to human readers. This level of performance should be satisfactory for a wide variety of applications.
Submitted 10 December, 2002;
originally announced December 2002.
-
Answering Subcognitive Turing Test Questions: A Reply to French
Authors:
Peter D. Turney
Abstract:
Robert French has argued that a disembodied computer is incapable of passing a Turing Test that includes subcognitive questions. Subcognitive questions are designed to probe the network of cultural and perceptual associations that humans naturally develop as we live, embodied and embedded in the world. In this paper, I show how it is possible for a disembodied computer to answer subcognitive questions appropriately, contrary to French's claim. My approach to answering subcognitive questions is to use statistical information extracted from a very large collection of text. In particular, I show how it is possible to answer a sample of subcognitive questions taken from French, by issuing queries to a search engine that indexes about 350 million Web pages. This simple algorithm may shed light on the nature of human (sub-) cognition, but the scope of this paper is limited to demonstrating that French is mistaken: a disembodied computer can answer subcognitive questions.
Submitted 9 December, 2002;
originally announced December 2002.