An Oversampling Technique for Classifying Imbalanced Datasets
Advances in Business and Management Forecasting
ISBN: 978-1-78743-070-9, eISBN: 978-1-78743-069-3
Publication date: 26 October 2017
Abstract
We propose an oversampling technique that increases the true positive rate (sensitivity) when classifying imbalanced datasets (i.e., those in which one value of the target variable occurs with low frequency) and hence improves overall performance measures such as balanced accuracy, G-mean, and the area under the receiver operating characteristic (ROC) curve (AUC). The method applies the Synthetic Minority Oversampling Technique (SMOTE) to only a selected portion of the dataset rather than to the entire dataset. We demonstrate the effectiveness of our oversampling method with four real and simulated datasets generated from three models.
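As a rough illustration of the idea in the abstract, the sketch below generates SMOTE-style synthetic minority points by interpolating only within a chosen subset of the minority class, and then computes sensitivity, balanced accuracy, and G-mean from a set of predictions. The abstract does not say how the portion of the data is selected, so the `selected` index list here is a hypothetical input supplied by the caller, and the function names are illustrative rather than taken from the chapter. It uses only the Python standard library and assumes numeric feature tuples and at least one positive and one negative label.

```python
import math
import random


def smote_on_subset(minority, selected, n_new, k=3, seed=0):
    """Create n_new synthetic points by SMOTE-style interpolation,
    using only the minority rows whose indices are in `selected`.
    How `selected` is chosen is an assumption left to the caller."""
    rng = random.Random(seed)
    pool = [minority[i] for i in selected]
    if len(pool) < 2:
        raise ValueError("need at least two selected minority points")
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(pool))
        base = pool[base_idx]
        # Rank the other selected points by Euclidean distance to the base point.
        others = [j for j in range(len(pool)) if j != base_idx]
        others.sort(key=lambda j: math.dist(base, pool[j]))
        neighbour = pool[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (true positive rate, true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_accuracy(sens, spec):
    return (sens + spec) / 2


def g_mean(sens, spec):
    return math.sqrt(sens * spec)


# Toy minority class; index 3 is deliberately left out of the selection.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0)]
selected = [0, 1, 2]  # hypothetical selection, not the chapter's rule
new_points = smote_on_subset(minority, selected, n_new=4, k=2)

# Toy predictions: sensitivity 0.5, specificity 0.75.
sens, spec = sensitivity_specificity([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
print(len(new_points), sens, spec, balanced_accuracy(sens, spec))
```

Because every synthetic point is a convex combination of two selected points, none of the new points can be pulled toward the excluded observation at (5.0, 5.0), which is the practical effect of restricting SMOTE to a portion of the data.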
Citation
Nguyen, S., Quinn, J. and Olinsky, A. (2017), "An Oversampling Technique for Classifying Imbalanced Datasets", Advances in Business and Management Forecasting (Advances in Business and Management Forecasting, Vol. 12), Emerald Publishing Limited, Leeds, pp. 63-80. https://doi.org/10.1108/S1477-407020170000012004
Publisher
Emerald Publishing Limited
Copyright © 2018 Emerald Publishing Limited