Showing 1–7 of 7 results for author: Bej, S

Searching in archive cs.
  1. arXiv:2501.13786

    cs.LG

    Handling Missing Data in Downstream Tasks With Distribution-Preserving Guarantees

    Authors: Rahul Bordoloi, Clémence Réda, Saptarshi Bej, Olaf Wolkenhauer

    Abstract: Missing feature values are a significant hurdle for downstream machine-learning tasks such as classification. However, imputation methods for classification might be time-consuming for high-dimensional data, and offer few theoretical guarantees on the preservation of the data distribution and imputation quality, especially for not-missing-at-random mechanisms. First, we propose an imputation appro…

    Submitted 14 May, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

  2. arXiv:2409.17684

    cs.LG cs.AI

    Preserving logical and functional dependencies in synthetic tabular data

    Authors: Chaithra Umesh, Kristian Schultz, Manjunath Mahendra, Saptarshi Bej, Olaf Wolkenhauer

    Abstract: Dependencies among attributes are a common aspect of tabular data. However, whether existing tabular data generation algorithms preserve these dependencies while generating synthetic data is yet to be explored. In addition to the existing notion of functional dependencies, we introduce the notion of logical dependencies among the attributes in this article. Moreover, we provide a measure to quanti…

    Submitted 26 September, 2024; originally announced September 2024.

    Comments: Submitted to Pattern Recognition Journal

  3. arXiv:2407.09789

    cs.LG

    Convex space learning for tabular synthetic data generation

    Authors: Manjunath Mahendra, Chaithra Umesh, Saptarshi Bej, Kristian Schultz, Olaf Wolkenhauer

    Abstract: Generating synthetic samples from the convex space of the minority class is a popular oversampling approach for imbalanced classification problems. Recently, deep-learning approaches have been successfully applied to modeling the convex space of minority samples. Beyond oversampling, learning the convex space of neighborhoods in training data has not been used to generate entire tabular datasets.…

    Submitted 20 February, 2025; v1 submitted 13 July, 2024; originally announced July 2024.

    Comments: 30 pages, 10 figures, submitted to Neurocomputing journal

  4. Multivariate Functional Linear Discriminant Analysis for the Classification of Short Time Series with Missing Data

    Authors: Rahul Bordoloi, Clémence Réda, Orell Trautmann, Saptarshi Bej, Olaf Wolkenhauer

    Abstract: Functional linear discriminant analysis (FLDA) is a powerful tool that extends LDA-mediated multiclass classification and dimension reduction to univariate time-series functions. However, in the age of large multivariate and incomplete data, statistical dependencies between features must be estimated in a computationally tractable way, while also dealing with missing data. There is a need for a co…

    Submitted 20 February, 2024; originally announced February 2024.

    MSC Class: 62R10 (Primary); 62R07 (Secondary)

  5. arXiv:2206.09812

    cs.LG

    ConvGeN: Convex space learning improves deep-generative oversampling for tabular imbalanced classification on smaller datasets

    Authors: Kristian Schultz, Saptarshi Bej, Waldemar Hahn, Markus Wolfien, Prashant Srivastava, Olaf Wolkenhauer

    Abstract: Data is commonly stored in tabular format. Several fields of research are prone to small imbalanced tabular data. Supervised Machine Learning on such data is often difficult due to class imbalance. Synthetic data generation, i.e., oversampling, is a common remedy used to improve classifier performance. State-of-the-art linear interpolation approaches, such as LoRAS and ProWRAS, can be used to gener…

    Submitted 13 July, 2022; v1 submitted 20 June, 2022; originally announced June 2022.

  6. arXiv:2107.07349

    cs.LG cs.AI

    A multi-schematic classifier-independent oversampling approach for imbalanced datasets

    Authors: Saptarshi Bej, Kristian Schultz, Prashant Srivastava, Markus Wolfien, Olaf Wolkenhauer

    Abstract: Over 85 oversampling algorithms, mostly extensions of the SMOTE algorithm, have been built over the past two decades, to solve the problem of imbalanced datasets. However, it has been evident from previous studies that different oversampling algorithms have different degrees of efficiency with different classifiers. With numerous algorithms available, it is difficult to decide on an oversampling a…

    Submitted 15 July, 2021; originally announced July 2021.

    Comments: 12 tables, 6 figures

  7. arXiv:1908.08346

    cs.LG stat.ML

    LoRAS: An oversampling approach for imbalanced datasets

    Authors: Saptarshi Bej, Narek Davtyan, Markus Wolfien, Mariam Nassar, Olaf Wolkenhauer

    Abstract: The Synthetic Minority Oversampling TEchnique (SMOTE) is widely used for the analysis of imbalanced datasets. It is known that SMOTE frequently over-generalizes the minority class, leading to misclassifications for the majority class, and affecting the overall balance of the model. In this article, we present an approach that overcomes this limitation of SMOTE, employing Localized Random Affine…

    Submitted 15 August, 2020; v1 submitted 22 August, 2019; originally announced August 2019.

    Comments: 2 figures, Supplementary data