doi 10.1016/j.neucom.2024.128975

Scalable Kernel Logistic Regression with Nyström Approximation: Theoretical Analysis and Application to Discrete Choice Modelling

Authors: José Ángel Martín-Baos, Ricardo García-Ródenas, Luis Rodriguez-Benitez, Michel Bierlaire

Abstract: The application of kernel-based Machine Learning (ML) techniques to discrete choice modelling using large datasets often faces challenges due to memory requirements and the considerable number of parameters involved in these models. This complexity hampers the efficient training of large-scale models. This paper addresses these problems of scalability by introducing the Nyström approximation for Kernel Logistic Regression (KLR) on large datasets. The study begins by presenting a theoretical analysis in which: i) the set of KLR solutions is characterised, ii) an upper bound to the solution of KLR with Nyström approximation is provided, and finally iii) a specialisation of the optimisation algorithms to Nyström KLR is described. After this, the Nyström KLR is computationally validated. Four landmark selection methods are tested, including basic uniform sampling, a k-means sampling strategy, and two non-uniform methods grounded in leverage scores. The performance of these strategies is evaluated using large-scale transport mode choice datasets and is compared with traditional methods such as Multinomial Logit (MNL) and contemporary ML techniques. The study also assesses the efficiency of various optimisation techniques for the proposed Nyström KLR model. The performance of gradient descent, Momentum, Adam, and L-BFGS-B optimisation methods is examined on these datasets. Among these strategies, the k-means Nyström KLR approach emerges as a successful solution for applying KLR to large datasets, particularly when combined with the L-BFGS-B and Adam optimisation methods. The results highlight the ability of this strategy to handle datasets exceeding 200,000 observations while maintaining robust performance.

Submitted 2 December, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

Comments: 34 pages, 5 figures

Journal ref: J.A. Martín-Baos, R. García-Ródenas, L. Rodriguez-Benitez, M. Bierlaire (2025). Scalable kernel logistic regression with Nyström approximation: Theoretical analysis and application to discrete choice modelling. Neurocomputing 617

arXiv:2009.06383 [pdf, other]

doi 10.1007/s11222-022-10182-3

Robust discrete choice models with t-distributed kernel errors

Authors: Rico Krueger, Michel Bierlaire, Thomas Gasos, Prateek Bansal

Abstract: Outliers in discrete choice response data may result from misclassification and misreporting of the response variable and from choice behaviour that is inconsistent with modelling assumptions (e.g. random utility maximisation). In the presence of outliers, standard discrete choice models produce biased estimates and suffer from compromised predictive accuracy. Robust statistical models are less sensitive to outliers than standard non-robust models. This paper analyses two robust alternatives to the multinomial probit (MNP) model. The two models are robit models whose kernel error distributions are heavy-tailed t-distributions to moderate the influence of outliers. The first model is the multinomial robit (MNR) model, in which a generic degrees of freedom parameter controls the heavy-tailedness of the kernel error distribution. The second model, the generalised multinomial robit (Gen-MNR) model, is more flexible than MNR, as it allows for distinct heavy-tailedness in each dimension of the kernel error distribution. For both models, we derive Gibbs samplers for posterior inference. In a simulation study, we illustrate the excellent finite sample properties of the proposed Bayes estimators and show that MNR and Gen-MNR produce more accurate estimates if the choice data contain outliers through the lens of the non-robust MNP model. In a case study on transport mode choice behaviour, MNR and Gen-MNR outperform MNP by substantial margins in terms of in-sample fit and out-of-sample predictive accuracy. The case study also highlights differences in elasticity estimates across models.

Submitted 5 December, 2022; v1 submitted 14 September, 2020; originally announced September 2020.

Journal ref: Statistics and Computing, 33 (2), 2023

arXiv:1906.03855 [pdf, other]

Bayesian Automatic Relevance Determination for Utility Function Specification in Discrete Choice Models

Authors: Filipe Rodrigues, Nicola Ortelli, Michel Bierlaire, Francisco Pereira

Abstract: Specifying utility functions is a key step towards applying the discrete choice framework for understanding the behaviour processes that govern user choices. However, identifying the utility function specifications that best model and explain the observed choices can be a very challenging and time-consuming task. This paper seeks to help modellers by leveraging the Bayesian framework and the concept of automatic relevance determination (ARD), in order to automatically determine an optimal utility function specification from an exponentially large set of possible specifications in a purely data-driven manner. Based on recent advances in approximate Bayesian inference, a doubly stochastic variational inference is developed, which allows the proposed DCM-ARD model to scale to very large and high-dimensional datasets. Using semi-artificial choice data, the proposed approach is shown to very accurately recover the true utility function specifications that govern the observed choices. Moreover, when applied to real choice data, DCM-ARD is shown to be able to discover high quality specifications that can outperform previous ones from the literature according to multiple criteria, thereby demonstrating its practical applicability.

Submitted 10 June, 2019; originally announced June 2019.

Comments: 21 pages, 2 figures, 11 tables

arXiv:1905.00419 [pdf, other]

Variational Bayesian Inference for Mixed Logit Models with Unobserved Inter- and Intra-Individual Heterogeneity

Authors: Rico Krueger, Prateek Bansal, Michel Bierlaire, Ricardo A. Daziano, Taha H. Rashidi

Abstract: Variational Bayes (VB), a method originating from machine learning, enables fast and scalable estimation of complex probabilistic models. Thus far, applications of VB in discrete choice analysis have been limited to mixed logit models with unobserved inter-individual taste heterogeneity. However, such a model formulation may be too restrictive in panel data settings, since tastes may vary both between individuals as well as across choice tasks encountered by the same individual. In this paper, we derive a VB method for posterior inference in mixed logit models with unobserved inter- and intra-individual heterogeneity. In a simulation study, we benchmark the performance of the proposed VB method against maximum simulated likelihood (MSL) and Markov chain Monte Carlo (MCMC) methods in terms of parameter recovery, predictive accuracy and computational efficiency. The simulation study shows that VB can be a fast, scalable and accurate alternative to MSL and MCMC estimation, especially in applications in which fast predictions are paramount. VB is observed to be between 2.8 and 17.7 times faster than the two competing methods, while affording comparable or superior accuracy. Besides, the simulation study demonstrates that a parallelised implementation of the MSL estimator with analytical gradients is a viable alternative to MCMC in terms of both estimation accuracy and computational efficiency, as the MSL estimator is observed to be between 0.9 and 2.1 times faster than MCMC.

Submitted 16 January, 2020; v1 submitted 1 May, 2019; originally announced May 2019.

arXiv:1904.07688 [pdf, other]

Pólygamma Data Augmentation to address Non-conjugacy in the Bayesian Estimation of Mixed Multinomial Logit Models

Authors: Prateek Bansal, Rico Krueger, Michel Bierlaire, Ricardo A. Daziano, Taha H. Rashidi

Abstract: The standard Gibbs sampler of Mixed Multinomial Logit (MMNL) models involves sampling from conditional densities of utility parameters using the Metropolis-Hastings (MH) algorithm due to the unavailability of a conjugate prior for the logit kernel. To address this non-conjugacy concern, we propose the application of the Pólygamma data augmentation (PG-DA) technique for the MMNL estimation. The posterior estimates of the augmented and the default Gibbs sampler are similar for the two-alternative scenario (binary choice), but we encounter empirical identification issues in the case of more alternatives ($J \geq 3$).

Submitted 13 April, 2019; originally announced April 2019.

Comments: arXiv admin note: text overlap with arXiv:1904.03647

arXiv:1904.03647 [pdf, other]

doi 10.1016/j.trb.2019.12.001

Bayesian Estimation of Mixed Multinomial Logit Models: Advances and Simulation-Based Evaluations

Authors: Prateek Bansal, Rico Krueger, Michel Bierlaire, Ricardo A. Daziano, Taha H. Rashidi

Abstract: Variational Bayes (VB) methods have emerged as a fast and computationally-efficient alternative to Markov chain Monte Carlo (MCMC) methods for scalable Bayesian estimation of mixed multinomial logit (MMNL) models. It has been established that VB is substantially faster than MCMC at practically no compromises in predictive accuracy. In this paper, we address two critical gaps concerning the usage and understanding of VB for MMNL. First, extant VB methods are limited to utility specifications involving only individual-specific taste parameters. Second, the finite-sample properties of VB estimators and the relative performance of VB, MCMC and maximum simulated likelihood estimation (MSLE) are not known. To address the former, this study extends several VB methods for MMNL to admit utility specifications including both fixed and random utility parameters. To address the latter, we conduct an extensive simulation-based evaluation to benchmark the extended VB methods against MCMC and MSLE in terms of estimation times, parameter recovery and predictive accuracy. The results suggest that all VB variants with the exception of the ones relying on an alternative variational lower bound constructed with the help of the modified Jensen's inequality perform as well as MCMC and MSLE at prediction and parameter recovery. In particular, VB with nonconjugate variational message passing and the delta-method (VB-NCVMP-Delta) is up to 16 times faster than MCMC and MSLE. Thus, VB-NCVMP-Delta can be an attractive alternative to MCMC and MSLE for fast, scalable and accurate estimation of MMNL models.

Submitted 12 December, 2019; v1 submitted 7 April, 2019; originally announced April 2019.

Journal ref: Transportation Research Part B: Methodological, Volume 131, January 2020, Pages 124-142

Showing 1–6 of 6 results for author: Bierlaire, M