Search | arXiv e-print repository

arXiv:1302.1733 [pdf, ps, other]

Feature Selection for Microarray Gene Expression Data using Simulated Annealing guided by the Multivariate Joint Entropy

Authors: Fernando González, Lluís A. Belanche

Abstract: In this work a new way to calculate the multivariate joint entropy is presented. This measure is the basis for a fast information-theoretic based evaluation of gene relevance in a Microarray Gene Expression data context. Its low complexity is based on the reuse of previous computations to calculate current feature relevance. The mu-TAFS algorithm --named as such to differentiate it from previous T… ▽ More In this work a new way to calculate the multivariate joint entropy is presented. This measure is the basis for a fast information-theoretic based evaluation of gene relevance in a Microarray Gene Expression data context. Its low complexity is based on the reuse of previous computations to calculate current feature relevance. The mu-TAFS algorithm --named as such to differentiate it from previous TAFS algorithms-- implements a simulated annealing technique specially designed for feature subset selection. The algorithm is applied to the maximization of gene subset relevance in several public-domain microarray data sets. The experimental results show a notoriously high classification performance and low size subsets formed by biologically meaningful genes. △ Less

Submitted 7 February, 2013; originally announced February 2013.

Comments: 12 pages, 6 Tables, 2 figures

arXiv:1212.2791 [pdf, other]

Understanding (dis)similarity measures

Authors: Lluís A. Belanche

Abstract: Intuitively, the concept of similarity is the notion to measure an inexact matching between two entities of the same reference set. The notions of similarity and its close relative dissimilarity are widely used in many fields of Artificial Intelligence. Yet they have many different and often partial definitions or properties, usually restricted to one field of application and thus incompatible wit… ▽ More Intuitively, the concept of similarity is the notion to measure an inexact matching between two entities of the same reference set. The notions of similarity and its close relative dissimilarity are widely used in many fields of Artificial Intelligence. Yet they have many different and often partial definitions or properties, usually restricted to one field of application and thus incompatible with other uses. This paper contributes to the design and understanding of similarity and dissimilarity measures for Artificial Intelligence. A formal dual definition for each concept is proposed, joined with a set of fundamental properties. The behavior of the properties under several transformations is studied and revealed as an important matter to bear in mind. We also develop several practical examples that work out the proposed approach. △ Less

Submitted 12 December, 2012; originally announced December 2012.

Comments: 10 pages, 2 figures

Report number: LSI-12-16-R ACM Class: I.5.3

arXiv:1101.2320 [pdf, ps, other]

Review and Evaluation of Feature Selection Algorithms in Synthetic Problems

Authors: L. A. Belanche, F. F. González

Abstract: The main purpose of Feature Subset Selection is to find a reduced subset of attributes from a data set described by a feature set. The task of a feature selection algorithm (FSA) is to provide with a computational solution motivated by a certain definition of relevance or by a reliable evaluation measure. In this paper several fundamental algorithms are studied to assess their performance in a con… ▽ More The main purpose of Feature Subset Selection is to find a reduced subset of attributes from a data set described by a feature set. The task of a feature selection algorithm (FSA) is to provide with a computational solution motivated by a certain definition of relevance or by a reliable evaluation measure. In this paper several fundamental algorithms are studied to assess their performance in a controlled experimental scenario. A measure to evaluate FSAs is devised that computes the degree of matching between the output given by a FSA and the known optimal solutions. An extensive experimental study on synthetic problems is carried out to assess the behaviour of the algorithms in terms of solution accuracy and size as a function of the relevance, irrelevance, redundancy and size of the data samples. The controlled experimental conditions facilitate the derivation of better-supported and meaningful conclusions. △ Less

Submitted 12 January, 2011; originally announced January 2011.

Comments: 13 pages, 3 figures

Showing 1–3 of 3 results for author: Belanche, L A