Showing 1–11 of 11 results for author: Riegler, M

Searching in archive stat.
  1. arXiv:2501.10540  [pdf, other]

    stat.ML cs.LG

    DPERC: Direct Parameter Estimation for Mixed Data

    Authors: Tuan L. Vo, Quan Huu Do, Uyen Dang, Thu Nguyen, Pål Halvorsen, Michael A. Riegler, Binh T. Nguyen

    Abstract: The covariance matrix is foundational to numerous statistical and machine-learning applications, such as Principal Component Analysis and correlation heatmaps. However, missing values within datasets present a formidable obstacle to accurately estimating this matrix. While imputation methods offer one avenue for addressing this challenge, they often entail a trade-off between computational efficie…

    Submitted 17 January, 2025; originally announced January 2025.

  2. arXiv:2412.11164  [pdf, other]

    cs.LG stat.AP

    Missing data imputation for noisy time-series data and applications in healthcare

    Authors: Lien P. Le, Xuan-Hien Nguyen Thi, Thu Nguyen, Michael A. Riegler, Pål Halvorsen, Binh T. Nguyen

    Abstract: Healthcare time series data is vital for monitoring patient activity but often contains noise and missing values due to various reasons such as sensor errors or data interruptions. Imputation, i.e., filling in the missing values, is a common way to deal with this issue. In this study, we compare imputation methods, including Multiple Imputation with Random Forest (MICE-RF) and advanced deep learni…

    Submitted 15 December, 2024; originally announced December 2024.

  3. arXiv:2311.16877  [pdf, other]

    cs.LG stat.ML

    Imputation using training labels and classification via label imputation

    Authors: Thu Nguyen, Tuan L. Vo, Pål Halvorsen, Michael A. Riegler

    Abstract: Missing data is a common problem in practical data science settings. Various imputation methods have been developed to deal with missing data. However, even though the labels are available in the training data in many situations, the common practice of imputation usually only relies on the input and ignores the label. We propose Classification Based on MissForest Imputation (CBMI), a classificatio…

    Submitted 29 January, 2025; v1 submitted 28 November, 2023; originally announced November 2023.

  4. arXiv:2305.06044  [pdf, other]

    cs.LG stat.ML

    Correlation visualization under missing values: a comparison between imputation and direct parameter estimation methods

    Authors: Nhat-Hao Pham, Khanh-Linh Vo, Mai Anh Vu, Thu Nguyen, Michael A. Riegler, Pål Halvorsen, Binh T. Nguyen

    Abstract: Correlation matrix visualization is essential for understanding the relationships between variables in a dataset, but missing data can pose a significant challenge in estimating correlation coefficients. In this paper, we compare the effects of various missing data methods on the correlation plot, focusing on two common missing patterns: random and monotone. We aim to provide practical strategies…

    Submitted 5 September, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

  5. arXiv:2302.00911  [pdf, other]

    stat.ML cs.LG

    Conditional expectation with regularization for missing data imputation

    Authors: Mai Anh Vu, Thu Nguyen, Tu T. Do, Nhan Phan, Nitesh V. Chawla, Pål Halvorsen, Michael A. Riegler, Binh T. Nguyen

    Abstract: Missing data frequently occurs in datasets across various domains, such as medicine, sports, and finance. In many cases, to enable proper and reliable analyses of such data, the missing values are often imputed, and it is necessary that the method used has a low root mean square error (RMSE) between the imputed and the true values. In addition, for some critical applications, it is also often a re…

    Submitted 11 September, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

  6. arXiv:2210.05165  [pdf, ps, other]

    stat.ML cs.LG

    Combining datasets to increase the number of samples and improve model fitting

    Authors: Thu Nguyen, Rabindra Khadka, Nhan Phan, Anis Yazidi, Pål Halvorsen, Michael A. Riegler

    Abstract: For many use cases, combining information from different datasets can be of interest to improve a machine learning model's performance, especially when the number of samples from at least one of the datasets is small. However, a potential challenge in such cases is that the features from these datasets are not identical, even though there are some commonly shared features among the datasets. To ta…

    Submitted 16 May, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

  7. arXiv:2205.15150  [pdf, ps, other]

    cs.LG stat.ML

    Principal Component Analysis based frameworks for efficient missing data imputation algorithms

    Authors: Thu Nguyen, Hoang Thien Ly, Michael Alexander Riegler, Pål Halvorsen, Hugo L. Hammer

    Abstract: Missing data is a commonly occurring problem in practice. Many imputation methods have been developed to fill in the missing entries. However, not all of them can scale to high-dimensional data, especially the multiple imputation techniques. Meanwhile, data nowadays tends to be high-dimensional. Therefore, in this work, we propose Principal Component Analysis Imputation (PCAI), a simple but v…

    Submitted 19 March, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

  8. arXiv:2005.03912  [pdf, other]

    cs.LG cs.MM stat.ML

    An Extensive Study on Cross-Dataset Bias and Evaluation Metrics Interpretation for Machine Learning applied to Gastrointestinal Tract Abnormality Classification

    Authors: Vajira Thambawita, Debesh Jha, Hugo Lewi Hammer, Håvard D. Johansen, Dag Johansen, Pål Halvorsen, Michael A. Riegler

    Abstract: Precise and efficient automated identification of Gastrointestinal (GI) tract diseases can help doctors treat more patients and improve the rate of disease detection and identification. Currently, automatic analysis of diseases in the GI tract is a hot topic in both computer science and medical-related journals. Nevertheless, the evaluation of such an automatic analysis is often incomplete or simp…

    Submitted 8 May, 2020; originally announced May 2020.

    Comments: 30 pages, 12 figures, 8 tables, Accepted for ACM Transactions on Computing for Healthcare

  9. arXiv:2004.12588  [pdf, other]

    stat.ME stat.CO stat.ML

    Efficient Quantile Tracking Using an Oracle

    Authors: Hugo L. Hammer, Anis Yazidi, Michael A. Riegler, Håvard Rue

    Abstract: For incremental quantile estimators, the step size and possibly other tuning parameters must be carefully set. However, little attention has been given to how to set these values in an online manner. In this article, we suggest two novel procedures that address this issue. The core part of the procedures is to estimate the current tracking mean squared error (MSE). The MSE is decomposed in trackin…

    Submitted 27 April, 2020; originally announced April 2020.

  10. arXiv:1910.13327  [pdf, other]

    cs.LG cs.CV eess.IV stat.ML

    Machine Learning-Based Analysis of Sperm Videos and Participant Data for Male Fertility Prediction

    Authors: Steven A. Hicks, Jorunn M. Andersen, Oliwia Witczak, Vajira Thambawita, Pål Halvorsen, Hugo L. Hammer, Trine B. Haugen, Michael A. Riegler

    Abstract: Methods for automatic analysis of clinical data are usually targeted towards a specific modality and do not make use of all relevant data available. In the field of male human reproduction, clinical and biological data are not used to their fullest potential. Manual evaluation of a semen sample using a microscope is time-consuming and requires extensive training. Furthermore, the validity of manual…

    Submitted 29 October, 2019; originally announced October 2019.

    Comments: Preprint, accepted by Nature Scientific Reports for publication 24.10.2019

  11. arXiv:1810.13278  [pdf, other]

    cs.LG stat.ML

    The Medico-Task 2018: Disease Detection in the Gastrointestinal Tract using Global Features and Deep Learning

    Authors: Vajira Thambawita, Debesh Jha, Michael Riegler, Pål Halvorsen, Hugo Lewi Hammer, Håvard D. Johansen, Dag Johansen

    Abstract: In this paper, we present our approach for the 2018 Medico Task classifying diseases in the gastrointestinal tract. We have proposed a system based on global features and deep neural networks. The best approach combines two neural networks, and the reproducible experimental results signify the efficiency of the proposed model with an accuracy rate of 95.80%, a precision of 95.87%, and an F1-score…

    Submitted 31 October, 2018; originally announced October 2018.

    Comments: 2 pages + 1 page for references, 1 figure, Conference paper

    Journal ref: MediaEval 2018