-
On the Predictive Properties of Binary Link Functions
Authors:
Necla Gunduz,
Ernest Fokoue
Abstract:
This paper provides a theoretical and computational justification of the long-held claim that the probit and logit link functions, often used in binary classification, are similar. Despite widespread recognition of the strong similarities between these two link functions, very few (if any) researchers have carried out a formal study aimed at firmly establishing and characterizing all aspects of their similarities and differences. This paper proposes a definition of both structural and predictive equivalence of link-function-based binary regression models, and explores the various ways in which such models are either similar or dissimilar. From a predictive analytics perspective, it turns out that not only are probit and logit perfectly predictively concordant, but other link functions such as the cauchit and complementary log-log also enjoy a very high percentage of predictive equivalence. Throughout this paper, simulated and real-life examples demonstrate all the equivalence results that we prove theoretically.
Submitted 16 February, 2015;
originally announced February 2015.
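The predictive concordance of probit and logit described above can be illustrated with a minimal sketch. It assumes (a simplification of ours, not the paper's formal definition) that each inverse link is applied to the same linear predictor η and that the predicted label is 1 when the probability exceeds 0.5; in practice each link is fitted separately, so the coefficients differ. Under this assumption, every link whose CDF is symmetric about 0 (logit, probit, cauchit) yields the label given by the sign of η. The complementary log-log link is asymmetric and crosses 0.5 at η = ln(ln 2) ≈ -0.367, so it disagrees with the others on a narrow band. The function names are hypothetical.

```python
import math

def logit_inverse(eta):
    # Logistic CDF: inverse of the logit link.
    return 1.0 / (1.0 + math.exp(-eta))

def probit_inverse(eta):
    # Standard normal CDF Phi(eta): inverse of the probit link.
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def cauchit_inverse(eta):
    # Standard Cauchy CDF: inverse of the cauchit link.
    return 0.5 + math.atan(eta) / math.pi

def cloglog_inverse(eta):
    # Inverse of the complementary log-log link; not symmetric about 0.
    return 1.0 - math.exp(-math.exp(eta))

def classify(prob, threshold=0.5):
    # Hypothetical decision rule: label 1 when the probability exceeds the threshold.
    return 1 if prob > threshold else 0

def agreement(inv_a, inv_b, etas):
    # Fraction of linear-predictor values on which two links assign the same label.
    same = sum(classify(inv_a(e)) == classify(inv_b(e)) for e in etas)
    return same / len(etas)

etas = [i / 100.0 for i in range(-500, 501)]  # grid of eta values in [-5, 5]

logit_vs_probit = agreement(logit_inverse, probit_inverse, etas)
logit_vs_cauchit = agreement(logit_inverse, cauchit_inverse, etas)
logit_vs_cloglog = agreement(logit_inverse, cloglog_inverse, etas)

# Classical scaling approximation: Phi(eta) is close to logistic(1.702 * eta).
max_gap = max(abs(probit_inverse(e) - logit_inverse(1.702 * e)) for e in etas)

print(logit_vs_probit, logit_vs_cauchit, logit_vs_cloglog, max_gap)
```

The symmetric links agree on every grid point, while the complementary log-log link disagrees only for η between about -0.367 and 0. The last quantity checks the well-known approximation of the probit curve by a logistic curve with slope 1.702, whose maximum error stays below 0.01.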
-
An Information-Theoretic Alternative to the Cronbach's Alpha Coefficient of Item Reliability
Authors:
Ernest Fokoue,
Necla Gunduz
Abstract:
We propose an information-theoretic alternative to the popular Cronbach alpha coefficient of reliability. Particularly suitable for contexts in which instruments are scored on a strictly nonnumeric scale, our proposed index is based on functions of the entropy of the distributions defined on the sample space of responses. Our reliability index tracks the Cronbach alpha coefficient uniformly while offering several other advantages, which are discussed in detail in this paper.
Submitted 16 January, 2015;
originally announced January 2015.
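The abstract above does not give the formula for the proposed entropy-based index, so it is not reproduced here. As a hedged sketch of the two ingredients it names, the block below computes the standard Cronbach's alpha, α = k/(k-1) · (1 - Σ item variances / variance of total score), and the Shannon entropy of the empirical distribution of categorical responses. The helper names and the toy inputs are hypothetical illustrations, not the authors' code or data.

```python
import math
from collections import Counter

def sample_variance(values):
    # Unbiased sample variance (n - 1 denominator).
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def cronbach_alpha(scores):
    # scores: one list per respondent, each holding k numeric item scores.
    k = len(scores[0])
    item_vars = [sample_variance([row[j] for row in scores]) for j in range(k)]
    total_var = sample_variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)

def response_entropy(responses):
    # Shannon entropy (bits) of the empirical distribution of categorical
    # responses; works directly on nonnumeric labels.
    counts = Counter(responses)
    n = len(responses)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical toy data: three respondents answering three items identically.
consistent_scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(consistent_scores)

# Hypothetical nonnumeric responses to a single item.
item_responses = ["agree", "disagree", "agree", "disagree"]
entropy_bits = response_entropy(item_responses)

print(alpha, entropy_bits)
```

Perfectly consistent items give α = 1, and two equally frequent categories give an entropy of 1 bit. Note that the entropy needs no numeric coding of the responses, which is the setting the abstract highlights, whereas Cronbach's alpha requires numeric scores.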
-
Pattern Discovery in Students' Evaluations of Professors: A Statistical Data Mining Approach
Authors:
Necla Gunduz,
Ernest Fokoue
Abstract:
The evaluation of instructors by their students has been practiced at most universities for many decades, and there has always been great interest in many aspects of these evaluations. Are students mature and knowledgeable enough to provide useful and dependable feedback for improving their instructors' teaching skills and abilities? Does the difficulty level of a course have a strong relationship with the ratings students give an instructor? In this paper, we attempt to answer questions such as these using state-of-the-art statistical data mining techniques such as support vector machines, classification and regression trees, boosting, random forest, factor analysis, k-means clustering, and hierarchical clustering. We explore various aspects of the data from both the supervised and the unsupervised learning perspectives. The data set analyzed in this paper was collected from a university in Turkey. Applying our techniques to these data reveals some very interesting patterns in the evaluations, such as the strong association between students' seriousness and dedication (measured by attendance) and the kinds of scores they tend to assign to their instructors.
Submitted 9 January, 2015;
originally announced January 2015.
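Among the unsupervised techniques listed above is k-means clustering. The block below is a minimal pure-Python sketch of Lloyd's algorithm, the standard iteration for k-means. Initializing the centers with the first k points is a simplifying assumption (common implementations use random or k-means++ initialization), and the two-dimensional points are hypothetical toy values, not the Turkish evaluation data analyzed in the paper.

```python
def squared_distance(a, b):
    # Squared Euclidean distance between two points of equal dimension.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iterations=100):
    # Lloyd's algorithm with a deterministic (assumed) initialization:
    # the first k points serve as the starting centers.
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centers[c] = [sum(coord) / len(members) for coord in zip(*members)]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers

# Hypothetical toy points forming two well-separated groups.
toy_points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (0.9, 1.1), (5.1, 4.9), (4.8, 5.2)]
toy_labels, toy_centers = kmeans(toy_points, k=2)
print(toy_labels, toy_centers)
```

On these points the algorithm converges after two passes, grouping the three points near (1, 1) together and the three near (5, 5) together.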
-
Robust Classification of High Dimension Low Sample Size Data
Authors:
Necla Gunduz,
Ernest Fokoue
Abstract:
The robustification of pattern recognition techniques has been the subject of intense research in recent years. Despite the multiplicity of papers on the subject, very few articles have deeply explored robust classification in the high dimension low sample size context. In this work, we explore and compare the predictive performance of robust classification techniques, with a special focus on robust discriminant analysis and robust PCA, applied to a wide variety of large $p$ small $n$ data sets. We also explore the performance of random forest, comparing and contrasting single-model methods with ensemble methods in this context. Our work reveals that random forest, although not inherently designed to be robust to outliers, substantially outperforms the existing techniques specifically designed to achieve robustness. Indeed, random forest emerges as the best predictive performer on both real-life and simulated data.
Submitted 3 January, 2015;
originally announced January 2015.
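One reason the large $p$ small $n$ setting above is hard for discriminant analysis is that classical linear discriminant analysis inverts an estimated covariance matrix, and with n observations in p > n dimensions the sample covariance has rank at most n - 1, so it is singular. The sketch below checks this numerically with NumPy on random Gaussian data; the sizes n = 10 and p = 50 and the random seed are arbitrary illustrative choices, not settings taken from the paper.

```python
import numpy as np

# Hypothetical sizes chosen so that p is much larger than n.
n, p = 10, 50
rng = np.random.default_rng(0)
X = rng.standard_normal((n, p))  # n observations, p variables

# p x p sample covariance; rowvar=False treats columns as variables.
S = np.cov(X, rowvar=False)

# Centering removes one degree of freedom, so rank(S) <= n - 1 < p.
rank_S = np.linalg.matrix_rank(S)
print(S.shape, rank_S)
```

Because `rank_S` is far below p, the matrix `S` has no inverse, which is why methods that rely on the inverse covariance cannot be applied directly in this regime without regularization or dimension reduction.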