-
Performance analysis of unsupervised feature selection methods
Authors:
A. Nisthana Parveen,
H. Hannah Inbarani,
E. N. Sathishkumar
Abstract:
Feature selection (FS) is the process of selecting the most informative features. In some cases, many redundant or irrelevant features can overpower the main features for classification. Feature selection can remedy this problem and thereby improve prediction accuracy and reduce the computational overhead of classification algorithms. The main aim of feature selection is to determine a minimal feature subset from a problem domain while retaining a suitably high accuracy in representing the original features. In this paper, Principal Component Analysis (PCA), Rough PCA, the Unsupervised Quick Reduct (USQR) algorithm and the Empirical Distribution Ranking (EDR) approach are applied to discover the discriminative features most suitable for classification. The efficiency of the approaches is evaluated using standard classification metrics.
Submitted 6 June, 2013;
originally announced June 2013.
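The abstract above does not say how PCA is turned into a feature selector, so the following is only a minimal sketch of one common heuristic, not the authors' procedure: rank the original features by their squared loadings on the leading principal components, weighted by explained variance. The function name `pca_feature_ranking` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def pca_feature_ranking(X, n_components):
    """Rank original features by variance-weighted squared PCA loadings.

    Hypothetical helper: one of several possible PCA-based rankings.
    """
    Xc = X - X.mean(axis=0)                      # center each feature
    # Rows of Vt are the principal axes of the centered data.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_var = s ** 2 / (X.shape[0] - 1)    # variance along each axis
    k = n_components
    # Score of feature j: sum over the top-k axes of variance * loading^2.
    scores = (explained_var[:k, None] * Vt[:k] ** 2).sum(axis=0)
    return np.argsort(scores)[::-1], scores

# Synthetic data: three independent features with very different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([5.0, 0.1, 1.0])
ranking, scores = pca_feature_ranking(X, n_components=2)
```

Without standardization this ranking is dominated by raw feature variance, so in practice the features would usually be scaled first; that choice is left open here.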
-
An Analysis of Gene Expression Data using Penalized Fuzzy C-Means Approach
Authors:
P. K. Nizar Banu,
H. Hannah Inbarani
Abstract:
With the rapid advances of microarray technologies, large amounts of high-dimensional gene expression data are being generated, posing significant computational challenges. A first step towards addressing this challenge is the use of clustering techniques, which is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. A robust gene expression clustering approach to minimize undesirable clustering is proposed. In this paper, the Penalized Fuzzy C-Means (PFCM) clustering algorithm is described and compared with the most representative off-line clustering techniques: K-Means clustering, Rough K-Means clustering and Fuzzy C-Means clustering. These techniques are implemented and tested on a brain tumor gene expression dataset. Analysis of the performance of the proposed approach is presented through qualitative validation experiments. From the experimental results, it can be observed that the Penalized Fuzzy C-Means algorithm shows much higher usability than the other projected clustering algorithms used in our comparison study. Significant and promising clustering results are presented using the brain tumor gene expression dataset. Thus, patterns seen in genome-wide expression experiments can be interpreted as indications of the status of cellular processes. In these clustering results, we find that the Penalized Fuzzy C-Means algorithm provides useful information as an aid to diagnosis in oncology.
Submitted 8 January, 2013;
originally announced February 2013.
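For orientation, here is a minimal sketch of standard Fuzzy C-Means, one of the baselines named in the abstract above. It is not the penalized variant (PFCM), whose penalty term the abstract does not specify. The function name, default parameters and toy data are assumptions for illustration only.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard (unpenalized) Fuzzy C-Means; returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, each row normalized to sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        # Centers are membership-weighted means of the samples.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Squared sample-to-center distances, floored to avoid division by zero.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Toy data: two well-separated blobs standing in for expression profiles.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
centers, U = fuzzy_c_means(X, n_clusters=2)
labels = U.argmax(axis=1)   # hard assignment taken from the soft memberships
```

Each row of `U` holds the degree to which one sample belongs to every cluster, which is what distinguishes the fuzzy family from K-Means.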
-
Fuzzy Soft Set Based Classification for Gene Expression Data
Authors:
N. Kalaiselvi,
H. Hannah Inbarani
Abstract:
Classification is one of the major issues in data mining research. Classification problems in the medical area often classify medical datasets based on the result of a medical diagnosis or the description of medical treatment by the medical practitioner. This research work discusses the classification of gene expression data for three different cancers (breast cancer, lung cancer and leukemia) with two classes: cancerous stage and non-cancerous stage. We apply a fuzzy soft set similarity based classifier to improve the accuracy of predicting the stages among cancer genes, and the informative genes are selected using entropy filtering.
Submitted 8 January, 2013;
originally announced January 2013.
-
Soft Set Based Feature Selection Approach for Lung Cancer Images
Authors:
G. Jothi,
H. Hannah Inbarani
Abstract:
Lung cancer is the deadliest type of cancer for both men and women. Feature selection plays a vital role in cancer classification. This paper investigates the feature selection process in Computed Tomographic (CT) lung cancer images using soft set theory. We propose a new soft set based unsupervised feature selection algorithm. Nineteen features are extracted from the segmented lung images using the gray level co-occurrence matrix (GLCM) and the gray level difference matrix (GLDM). In this paper, an efficient Unsupervised Soft Set based Quick Reduct (SSUSQR) algorithm is presented. This method is used to select features from the data set and is compared with existing rough set based unsupervised feature selection methods. K-Means and Self Organizing Map (SOM) clustering algorithms are then used to cluster the data. The performance of the feature selection algorithms is evaluated based on the performance of the clustering techniques. The results show that the proposed method effectively removes redundant features.
Submitted 21 December, 2012;
originally announced December 2012.
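To make the GLCM step mentioned in the abstract above concrete, here is a minimal sketch of how a gray level co-occurrence matrix and two texture features derived from it can be computed. The abstract does not list which nineteen features were used, so `contrast` and `energy` are only illustrative choices, and the function names and the 4x4 test image are assumptions.

```python
import numpy as np

def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalized gray level co-occurrence matrix for one pixel offset (dx, dy)."""
    img = np.asarray(image)
    rows, cols = img.shape
    counts = np.zeros((levels, levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                # Count the pair (gray level here, gray level at the offset).
                counts[img[r, c], img[r2, c2]] += 1
    if symmetric:
        counts = counts + counts.T   # count each pair in both directions
    return counts / counts.sum()

def contrast(p):
    """Sum of (i - j)^2 * p[i, j]: large when neighbors differ strongly."""
    i, j = np.indices(p.shape)
    return float(((i - j) ** 2 * p).sum())

def energy(p):
    """Angular second moment; some texts define energy as its square root."""
    return float((p ** 2).sum())

# Small 4-level image with a horizontal neighbor offset.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 2, 2, 2],
                  [2, 2, 3, 3]])
p = glcm(image, levels=4)
features = {"contrast": contrast(p), "energy": energy(p)}
```

Real pipelines typically compute such matrices for several offsets and angles and pool the resulting features before any selection step.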
-
Fuzzy soft rough K-Means clustering approach for gene expression data
Authors:
K. Dhanalakshmi,
H. Hannah Inbarani
Abstract:
Clustering is one of the most widely used data mining techniques for medical diagnosis and can be considered the most important unsupervised learning technique. Most clustering methods group data based on distance, and a few cluster data based on similarity. Clustering algorithms classify gene expression data into clusters so that functionally related genes are grouped together in an efficient manner. The groupings are constructed such that the degree of relationship is strong among members of the same cluster and weak among members of different clusters. In this work, we focus on a similarity relationship among genes with similar expression patterns so that a consequential and simple analytical decision can be made from the proposed Fuzzy Soft Rough K-Means algorithm. The algorithm is developed based on fuzzy soft sets and rough sets. The proposed work is compared with benchmark algorithms such as K-Means and Rough K-Means, and the efficiency of the proposed algorithm is illustrated using cluster validity measures such as the DB index and the Xie-Beni index.
Submitted 21 December, 2012;
originally announced December 2012.
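The two validity measures named in the abstract above can be sketched as follows, using their usual textbook definitions (Euclidean distance, fuzzifier `m` for Xie-Beni). Lower values indicate more compact, better separated clusters for both. The function names and the four-point example are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin (DB) index for a hard partition; lower is better."""
    ids = np.unique(labels)
    centers = np.array([X[labels == k].mean(axis=0) for k in ids])
    # Scatter of a cluster: mean distance of its members to its centroid.
    scatter = np.array([
        np.linalg.norm(X[labels == k] - centers[i], axis=1).mean()
        for i, k in enumerate(ids)
    ])
    sep = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    np.fill_diagonal(sep, np.inf)   # a cluster is never compared with itself
    ratios = (scatter[:, None] + scatter[None, :]) / sep
    # Worst-case similarity per cluster, averaged over clusters.
    return float(ratios.max(axis=1).mean())

def xie_beni(X, centers, U, m=2.0):
    """Xie-Beni index for memberships U of shape (n_samples, n_clusters)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    compactness = ((U ** m) * d2).sum()
    c2 = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(c2, np.inf)    # ignore zero distance of a center to itself
    return float(compactness / (X.shape[0] * c2.min()))

# Four points forming two obvious groups, scored under a good and a bad partition.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
good = np.array([0, 0, 1, 1])
bad = np.array([0, 1, 0, 1])
db_good = davies_bouldin(X, good)
db_bad = davies_bouldin(X, bad)
centers_good = np.array([X[good == k].mean(axis=0) for k in (0, 1)])
xb_good = xie_beni(X, centers_good, np.eye(2)[good])   # crisp memberships
```

Here the natural partition scores far lower than the mixed one, which is the behavior a comparison between clustering algorithms relies on.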