-
Model-agnostic Mitigation Strategies of Data Imbalance for Regression
Authors:
Jelke Wibbeke,
Sebastian Rohjans,
Andreas Rauh
Abstract:
Data imbalance persists as a pervasive challenge in regression tasks, introducing bias in model performance and undermining predictive reliability. This is particularly detrimental in applications aimed at predicting rare events that fall outside the domain of the bulk of the training data. In this study, we review the current state of the art in sampling-based methods and cost-sensitive learning. Additionally, we propose novel approaches to mitigate model bias. To better assess the importance of data, we introduce the density-distance and density-ratio relevance functions, which integrate the empirical frequency of data with domain-specific preferences and offer enhanced interpretability for end users. Furthermore, we present advanced mitigation techniques (cSMOGN and crbSMOGN), which build upon and improve existing sampling methods. In a comprehensive quantitative evaluation, we benchmark state-of-the-art methods on 10 synthetic and 42 real-world datasets, using neural networks, XGBoost trees and Random Forest models. Our analysis reveals that while most strategies improve performance on rare samples, they often degrade it on frequent ones. We demonstrate that constructing an ensemble of two models, one trained with imbalance mitigation and another without, can significantly reduce these negative effects. The key findings underscore the superior performance of our novel crbSMOGN sampling technique with the density-ratio relevance function for neural networks, outperforming state-of-the-art methods.
Submitted 2 June, 2025;
originally announced June 2025.
-
EmoconLite: Bridging the Gap Between Emotiv and Play for Children With Severe Disabilities
Authors:
Javad Rahimipour Anaraki,
Chelsea Anne Rauh,
Jason Leung,
Tom Chau
Abstract:
Brain-computer interfaces (BCIs) allow users to control computer applications by modulating their brain activity. Since BCIs rely solely on brain activity, they have enormous potential as an alternative access method for engaging children with severe disabilities and/or medical complexities in therapeutic recreation and leisure. In particular, one commercially available BCI platform is the Emotiv EPOC headset, which is a portable and affordable electroencephalography (EEG) device. Combined with the EmotivBCI software, the Emotiv system can generate a model to discern between different mental tasks based on the user's EEG signals in real time. While the Emotiv system shows promise for use by the pediatric population in the setting of a BCI clinic, it lacks integrated support that allows users to directly control computer applications using the generated classification output. To achieve this, users would have to write their own program, which can be challenging for those who are not technologically inclined. To address this gap, we developed a freely available and user-friendly BCI software application called EmoconLite. Using the classification output from EmotivBCI, EmoconLite allows users to play YouTube video clips and a variety of video games from multiple platforms, ultimately creating an end-to-end solution for users. Through its deployment in the Holland Bloorview Kids Rehabilitation Hospital's BCI clinic, EmoconLite is bridging the gap between research and clinical practice, providing children with access to BCI technology and supporting BCI-enabled play.
Submitted 25 May, 2021; v1 submitted 7 January, 2021;
originally announced January 2021.
-
Flint Water Crisis: Data-Driven Risk Assessment Via Residential Water Testing
Authors:
Jacob Abernethy,
Cyrus Anderson,
Chengyu Dai,
Arya Farahi,
Linh Nguyen,
Adam Rauh,
Eric Schwartz,
Wenbo Shen,
Guangsha Shi,
Jonathan Stroud,
Xinyu Tan,
Jared Webb,
Sheng Yang
Abstract:
Recovery from the Flint Water Crisis has been hindered by uncertainty in both the water testing process and the causes of contamination. In this work, we develop an ensemble of predictive models to assess the risk of lead contamination in individual homes and neighborhoods. To train these models, we utilize a wide range of data sources, including voluntary residential water tests, historical records, and city infrastructure data. Additionally, we use our models to identify the most prominent factors that contribute to a high risk of lead contamination. In this analysis, we find that lead service lines are not the only factor that is predictive of the risk of lead contamination of water. These results could be used to guide the long-term recovery efforts in Flint, minimize the immediate damages, and improve resource-allocation decisions for similar water infrastructure crises.
Submitted 30 September, 2016;
originally announced October 2016.