
Showing 1–4 of 4 results for author: Vinh, N X

  1. arXiv:1606.05596  [pdf, other]

    stat.ML cs.LG

    Ground Truth Bias in External Cluster Validity Indices

    Authors: Yang Lei, James C. Bezdek, Simone Romano, Nguyen Xuan Vinh, Jeffrey Chan, James Bailey

    Abstract: It has been noticed that some external CVIs exhibit a preferential bias towards a larger or smaller number of clusters which is monotonic (directly or inversely) in the number of clusters in candidate partitions. This type of bias is caused by the functional form of the CVI model. For example, the popular Rand index (RI) exhibits a monotone increasing (NCinc) bias, while the Jaccard Index (JI) ind…

    Submitted 17 June, 2016; originally announced June 2016.

  2. arXiv:1512.01286  [pdf, other]

    stat.ML

    Adjusting for Chance Clustering Comparison Measures

    Authors: Simone Romano, Nguyen Xuan Vinh, James Bailey, Karin Verspoor

    Abstract: Adjusted for chance measures are widely used to compare partitions/clusterings of the same data set. In particular, the Adjusted Rand Index (ARI) based on pair-counting, and the Adjusted Mutual Information (AMI) based on Shannon information theory are very popular in the clustering community. Nonetheless it is an open problem as to what are the best application scenarios for each measure and guide…

    Submitted 3 December, 2015; originally announced December 2015.

  3. arXiv:1510.07786  [pdf, other]

    stat.ML

    A Framework to Adjust Dependency Measure Estimates for Chance

    Authors: Simone Romano, Nguyen Xuan Vinh, James Bailey, Karin Verspoor

    Abstract: Estimating the strength of dependency between two variables is fundamental for exploratory analysis and many other applications in data mining. For example: non-linear dependencies between two continuous variables can be explored with the Maximal Information Coefficient (MIC); and categorical variables that are dependent to the target class are selected using Gini gain in random forests. Nonethele…

    Submitted 20 January, 2016; v1 submitted 27 October, 2015; originally announced October 2015.

    Comments: In Proceedings of the 2016 SIAM International Conference on Data Mining

  4. arXiv:1109.5796  [pdf]

    stat.AP cs.CE q-bio.GN q-bio.QM

    Genetic Testing for Complex Diseases: a Simulation Study Perspective

    Authors: Nguyen Xuan Vinh

    Abstract: It is widely recognized nowadays that complex diseases are caused by, amongst the others, multiple genetic factors. The recent advent of genome-wide association study (GWA) has triggered a wave of research aimed at discovering genetic factors underlying common complex diseases. While the number of reported susceptible genetic variants is increasing steadily, the application of such findings into d…

    Submitted 29 September, 2011; v1 submitted 27 September, 2011; originally announced September 2011.

    Comments: 5 pages, technical report