-
TabFlex: Scaling Tabular Learning to Millions with Linear Attention
Authors:
Yuchen Zeng,
Tuan Dinh,
Wonjun Kang,
Andreas C Mueller
Abstract:
Leveraging the in-context learning (ICL) capability of Large Language Models (LLMs) for tabular classification has gained significant attention for its training-free adaptability across diverse datasets. Recent advancements, like TabPFN, excel in small-scale tabular datasets but struggle to scale for large and complex datasets. Our work enhances the efficiency and scalability of TabPFN for larger…
▽ More
Leveraging the in-context learning (ICL) capability of Large Language Models (LLMs) for tabular classification has gained significant attention for its training-free adaptability across diverse datasets. Recent advancements, like TabPFN, excel in small-scale tabular datasets but struggle to scale for large and complex datasets. Our work enhances the efficiency and scalability of TabPFN for larger datasets by incorporating linear attention mechanisms as a scalable alternative to complexity-quadratic self-attention. Our model, TabFlex, efficiently handles tabular datasets with thousands of features and hundreds of classes, scaling seamlessly to millions of samples. For instance, TabFlex processes the poker-hand dataset with over a million samples in just 5 seconds. Our extensive evaluations demonstrate that TabFlex can achieve over a 2x speedup compared to TabPFN and a 1.5x speedup over XGBoost, outperforming 25 tested baselines in terms of efficiency across a diverse range of datasets. Furthermore, TabFlex remains highly effective on large-scale datasets, delivering strong performance with significantly reduced computational costs, especially when combined with data-efficient techniques such as dimensionality reduction and data sampling.
△ Less
Submitted 5 June, 2025;
originally announced June 2025.
-
Importance of Tuning Hyperparameters of Machine Learning Algorithms
Authors:
Hilde J. P. Weerts,
Andreas C. Mueller,
Joaquin Vanschoren
Abstract:
The performance of many machine learning algorithms depends on their hyperparameter settings. The goal of this study is to determine whether it is important to tune a hyperparameter or whether it can be safely set to a default value. We present a methodology to determine the importance of tuning a hyperparameter based on a non-inferiority test and tuning risk: the performance loss that is incurred…
▽ More
The performance of many machine learning algorithms depends on their hyperparameter settings. The goal of this study is to determine whether it is important to tune a hyperparameter or whether it can be safely set to a default value. We present a methodology to determine the importance of tuning a hyperparameter based on a non-inferiority test and tuning risk: the performance loss that is incurred when a hyperparameter is not tuned, but set to a default value. Because our methods require the notion of a default parameter, we present a simple procedure that can be used to determine reasonable default parameters. We apply our methods in a benchmark study using 59 datasets from OpenML. Our results show that leaving particular hyperparameters at their default value is non-inferior to tuning these hyperparameters. In some cases, leaving the hyperparameter at its default value even outperforms tuning it using a search procedure with a limited number of iterations.
△ Less
Submitted 15 July, 2020;
originally announced July 2020.
-
Learning a Loopy Model For Semantic Segmentation Exactly
Authors:
Andreas Christian Mueller,
Sven Behnke
Abstract:
Learning structured models using maximum margin techniques has become an indispensable tool for com- puter vision researchers, as many computer vision applications can be cast naturally as an image labeling problem. Pixel-based or superpixel-based conditional random fields are particularly popular examples. Typ- ically, neighborhood graphs, which contain a large number of cycles, are used. As exac…
▽ More
Learning structured models using maximum margin techniques has become an indispensable tool for com- puter vision researchers, as many computer vision applications can be cast naturally as an image labeling problem. Pixel-based or superpixel-based conditional random fields are particularly popular examples. Typ- ically, neighborhood graphs, which contain a large number of cycles, are used. As exact inference in loopy graphs is NP-hard in general, learning these models without approximations is usually deemed infeasible. In this work we show that, despite the theoretical hardness, it is possible to learn loopy models exactly in practical applications. To this end, we analyze the use of multiple approximate inference techniques together with cutting plane training of structural SVMs. We show that our proposed method yields exact solutions with an optimality guarantees in a computer vision application, for little additional computational cost. We also propose a dynamic caching scheme to accelerate training further, yielding runtimes that are comparable with approximate methods. We hope that this insight can lead to a reconsideration of the tractability of loopy models in computer vision.
△ Less
Submitted 16 September, 2013;
originally announced September 2013.