Data Imbalance in Landslide Susceptibility Zonation: A Case Study of Mandakini River Basin, Uttarakhand, India

IGARSS 2020 - 2020 IEEE International Geoscience and Remote Sensing Symposium

Dericks Praise Shukla & Sharad Kumar Gupta

2020-09-26

Machine learning methods require a large amount of training data. However, landslides do not occur everywhere, and the number of landslide occurrences in an area is limited. This leads to a small number of landslide samples and a larger number of non-landslide samples. This underrepresentation and the severe skew in class distribution create an imbalance for learning algorithms, which become biased towards the majority class and perform poorly on the minority class. We used two algorithms, EasyEnsemble and BalanceCascade, to reduce the imbalance in the data. The balanced data are then used with a support vector machine (SVM) to generate landslide susceptibility zonation maps. The results show that SVM trained on balanced data yields major improvements in the susceptibility maps compared with SVM trained on imbalanced data.
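
As a rough illustration of the balancing idea, the sketch below shows only the sampling step of an EasyEnsemble-style approach: all minority (landslide) samples are kept, and each subset pairs them with an equally sized random draw from the majority (non-landslide) class. It is a minimal sketch and not the authors' implementation. The function name `balanced_subsets`, the toy sample counts, the number of subsets, and the seed are all assumptions made for illustration.

```python
import random


def balanced_subsets(majority, minority, n_subsets, seed=0):
    """Return n_subsets pairs (majority_draw, minority_copy).

    Each majority_draw is sampled without replacement from `majority`
    and has the same size as `minority`, so every subset is balanced.
    """
    if len(minority) > len(majority):
        raise ValueError("minority class is larger than majority class")
    rng = random.Random(seed)  # fixed seed so the draws are repeatable
    subsets = []
    for _ in range(n_subsets):
        drawn = rng.sample(majority, len(minority))
        subsets.append((drawn, list(minority)))
    return subsets


# Hypothetical toy data: ids of 1000 non-landslide and 50 landslide cells.
non_landslide = list(range(1000))
landslide = list(range(1000, 1050))

subsets = balanced_subsets(non_landslide, landslide, n_subsets=4)
for drawn, kept in subsets:
    print(len(drawn), len(kept))
```

In a full pipeline, one classifier would presumably be trained on each balanced subset and the predictions combined. BalanceCascade differs in that it also removes majority samples the current ensemble already classifies correctly before drawing the next subset, a step not shown here.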