Showing 1–3 of 3 results for author: Tang, T M

Searching in archive cs.
  1. arXiv:2506.08928  [pdf, ps, other]

    cs.LG stat.ME stat.ML

    Local MDI+: Local Feature Importances for Tree-Based Models

    Authors: Zhongyuan Liang, Zachary T. Rewolinski, Abhineet Agarwal, Tiffany M. Tang, Bin Yu

    Abstract: Tree-based ensembles such as random forests remain the go-to for tabular data over deep learning models due to their prediction performance and computational efficiency. These advantages have led to their widespread deployment in high-stakes domains, where interpretability is essential for ensuring trustworthy predictions. This has motivated the development of popular local (i.e. sample-specific)…

    Submitted 10 June, 2025; originally announced June 2025.

  2. arXiv:2506.04553  [pdf, ps, other]

    cs.LG stat.AP stat.CO stat.ML

    Unsupervised Machine Learning for Scientific Discovery: Workflow and Best Practices

    Authors: Andersen Chang, Tiffany M. Tang, Tarek M. Zikry, Genevera I. Allen

    Abstract: Unsupervised machine learning is widely used to mine large, unlabeled datasets to make data-driven discoveries in critical domains such as climate science, biomedicine, astronomy, chemistry, and more. However, despite its widespread utilization, there is a lack of standardization in unsupervised learning workflows for making reliable and reproducible scientific discoveries. In this paper, we prese…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 23 pages, 4 figures, 12 additional pages of citations

  3. arXiv:2307.01932  [pdf, other]

    stat.ME cs.AI cs.LG stat.ML

    Integrating Random Forests and Generalized Linear Models for Improved Accuracy and Interpretability

    Authors: Abhineet Agarwal, Ana M. Kenney, Yan Shuo Tan, Tiffany M. Tang, Bin Yu

    Abstract: Random forests (RFs) are among the most popular supervised learning algorithms due to their nonlinear flexibility and ease-of-use. However, as black box models, they can only be interpreted via algorithmically-defined feature importance methods, such as Mean Decrease in Impurity (MDI), which have been observed to be highly unstable and have ambiguous scientific meaning. Furthermore, they can perfo…

    Submitted 23 May, 2025; v1 submitted 4 July, 2023; originally announced July 2023.