Showing 1–5 of 5 results for author: Lucena, B

  1. arXiv:2210.16247  [pdf, other]

    cs.LG cs.AI stat.ME stat.ML

    Nonparametric Probabilistic Regression with Coarse Learners

    Authors: Brian Lucena

    Abstract: Probabilistic Regression refers to predicting a full probability density function for the target conditional on the features. We present a nonparametric approach to this problem which combines base classifiers (typically gradient boosted forests) trained on different coarsenings of the target value. By combining such classifiers and averaging the resulting densities, we are able to compute precise…

    Submitted 28 October, 2022; originally announced October 2022.

  2. arXiv:2206.07122  [pdf, other]

    stat.ML cs.CV cs.IT cs.LG

    Loss Functions for Classification using Structured Entropy

    Authors: Brian Lucena

    Abstract: Cross-entropy loss is the standard metric used to train classification models in deep learning and gradient boosting. It is well-known that this loss function fails to account for similarities between the different values of the target. We propose a generalization of entropy called structured entropy which uses a random partition to incorporate the structure of the target variable in a manne…

    Submitted 14 June, 2022; originally announced June 2022.

  3. arXiv:2007.04446  [pdf, other]

    stat.ML cs.LG stat.AP stat.CO

    StructureBoost: Efficient Gradient Boosting for Structured Categorical Variables

    Authors: Brian Lucena

    Abstract: Gradient boosting methods based on Structured Categorical Decision Trees (SCDT) have been demonstrated to outperform numerical and one-hot encodings on problems where the categorical variable has a known underlying structure. However, the enumeration procedure in the SCDT is infeasible except for categorical variables with low or moderate cardinality. We propose and implement two methods to overco…

    Submitted 8 July, 2020; originally announced July 2020.

  4. arXiv:2004.07383  [pdf, other]

    stat.ML cs.AI cs.LG stat.AP

    Exploiting Categorical Structure Using Tree-Based Methods

    Authors: Brian Lucena

    Abstract: Standard methods of using categorical variables as predictors either endow them with an ordinal structure or assume they have no structure at all. However, categorical variables often possess structure that is more complicated than a linear ordering can capture. We develop a mathematical framework for representing the structure of categorical variables and show how to generalize decision trees to…

    Submitted 15 April, 2020; originally announced April 2020.

    Comments: To appear in AISTATS 2020 Proceedings

  5. arXiv:1809.07751  [pdf, other]

    stat.ML cs.AI cs.LG math.PR

    Spline-Based Probability Calibration

    Authors: Brian Lucena

    Abstract: In many classification problems it is desirable to output well-calibrated probabilities on the different classes. We propose a robust, non-parametric method of calibrating probabilities called SplineCalib that utilizes smoothing splines to determine a calibration function. We demonstrate how applying certain transformations as part of the calibration process can improve performance on problems in…

    Submitted 20 September, 2018; originally announced September 2018.