A Similarity-Based Oversampling Method for Multi-label Imbalanced Text Data
Authors:
Ismail Hakki Karaman,
Gulser Koksal,
Levent Eriskin,
Salih Salihoglu
Abstract:
In real-world applications, as data availability increases, obtaining labeled data for machine learning (ML) projects remains challenging due to the high costs and intensive efforts required for data annotation. Many ML projects, particularly those focused on multi-label classification, also grapple with data imbalance issues, where certain classes may lack sufficient data to train effective class…
▽ More
In real-world applications, as data availability increases, obtaining labeled data for machine learning (ML) projects remains challenging due to the high costs and intensive efforts required for data annotation. Many ML projects, particularly those focused on multi-label classification, also grapple with data imbalance issues, where certain classes may lack sufficient data to train effective classifiers. This study introduces and examines a novel oversampling method for multi-label text classification, designed to address performance challenges associated with data imbalance. The proposed method identifies potential new samples from unlabeled data by leveraging similarity measures between instances. By iteratively searching the unlabeled dataset, the method locates instances similar to those in underrepresented classes and evaluates their contribution to classifier performance enhancement. Instances that demonstrate performance improvement are then added to the labeled dataset. Experimental results indicate that the proposed approach effectively enhances classifier performance post-oversampling.
△ Less
Submitted 21 December, 2024; v1 submitted 1 November, 2024;
originally announced November 2024.
Enhancing Next Destination Prediction: A Novel Long Short-Term Memory Neural Network Approach Using Real-World Airline Data
Authors:
Salih Salihoglu,
Gulser Koksal,
Orhan Abar
Abstract:
In the modern transportation industry, accurate prediction of travelers' next destinations brings multiple benefits to companies, such as customer satisfaction and targeted marketing. This study focuses on developing a precise model that captures the sequential patterns and dependencies in travel data, enabling accurate predictions of individual travelers' future destinations. To achieve this, a n…
▽ More
In the modern transportation industry, accurate prediction of travelers' next destinations brings multiple benefits to companies, such as customer satisfaction and targeted marketing. This study focuses on developing a precise model that captures the sequential patterns and dependencies in travel data, enabling accurate predictions of individual travelers' future destinations. To achieve this, a novel model architecture with a sliding window approach based on Long Short-Term Memory (LSTM) is proposed for destination prediction in the transportation industry. The experimental results highlight satisfactory performance and high scores achieved by the proposed model across different data sizes and performance metrics. This research contributes to advancing destination prediction methods, empowering companies to deliver personalized recommendations and optimize customer experiences in the dynamic travel landscape.
△ Less
Submitted 16 September, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.