Addressing Class Imbalance in Soil Movement Predictions

EGUsphere

Kala Venkata Uday, Varun Dutt, Praveen Kumar, & P Priyanka

2023-08-10

Landslides threaten human life and infrastructure, causing fatalities and economic losses. Monitoring stations provide data for predicting soil movement, which is crucial to mitigating this threat. However, accurately predicting soil movement from monitoring data is challenging because the data are complex and inherently class-imbalanced. This study combines machine learning (ML) models with oversampling techniques to address the class imbalance and build a robust soil movement prediction system. The dataset, comprising two years (2019–2021) of monitoring data from a landslide in Uttarakhand, was split 70:30 into training and testing sets. To tackle the class imbalance, several oversampling techniques were applied to the dataset: Synthetic Minority Oversampling Technique (SMOTE), K-Means SMOTE, Borderline SMOTE, Support Vector Machine SMOTE, and Adaptive Synthetic Sampling (ADASYN). Several ML models, namely Random Forest (RF), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Adaptive Boosting (AdaBoost), Category Boosting (CatBoost), Long Short-Term Memory (LSTM), Multilayer Perceptron (MLP), and dynamic ensemble models, were trained and compared for soil movement prediction. Among these, the dynamic ensemble model with K-Means SMOTE performed best in testing, with accuracy, precision, and recall of 99.68 % each and an F1-score of 0.9968. The RF model with K-Means SMOTE was second best, with accuracy, precision, and recall of 99.64 % each and an F1-score of 0.9964. These results show that ML models combined with class imbalance techniques have the potential to significantly improve soil movement predictions in landslide-prone areas.
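To illustrate the oversampling idea behind the techniques listed above, the following is a minimal sketch of basic SMOTE in plain Python. It is not the study's implementation: the toy two-feature data, the function names `smote` and `balance`, the neighbour count `k=5`, and the choice to oversample only the training part of a stratified 70:30 split are all assumptions made for illustration. The abstract does not state how the split and the oversampling were ordered.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Basic SMOTE: create n_new synthetic points, each interpolated between
    a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Return new lists in which the minority class matches the majority size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Hypothetical toy data (not the Uttarakhand dataset):
# label 1 = "movement" (rare), label 0 = "no movement" (common).
data_rng = random.Random(42)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
minority = [(data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)) for _ in range(20)]

# Stratified 70:30 split: 70 % of each class goes to training.
X_train = majority[:140] + minority[:14]
y_train = [0] * 140 + [1] * 14
X_test = majority[140:] + minority[14:]
y_test = [0] * 60 + [1] * 6

# Assumption: only the training split is oversampled; the test split
# keeps its original imbalance so the evaluation reflects real conditions.
X_bal, y_bal = balance(X_train, y_train)
print("minority samples in training:", y_train.count(1), "->", y_bal.count(1))
```

The variants named in the abstract differ mainly in where synthetic points are placed. K-Means SMOTE, for example, first clusters the data and oversamples within selected clusters, while Borderline SMOTE focuses on minority samples near the class boundary. In practice a maintained library implementation would normally be preferred over a hand-written sketch like this one.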