Using Support Vector Regression (SVR) on GIS and map data, the study aims to create a landslide susceptibility model for the province of Laguna.
The flowchart in the next figure describes the methodology through which the study plans to accomplish its goals.
The study aims to acquire topographic, soil, vegetation, and hazard map data from the National Mapping and Resource Information Authority (NAMRIA) and PhilGIS. Data from OpenStreetMap, along with Google Earth Pro imagery and tools, will also be used. All of the data to be acquired are publicly accessible.
The GIS and map data will be consolidated, prepared, and studied. The scope of the data is defined by the geographic location of the province of Laguna and will also include all adjacent provinces with significant geographical impact.
The data will focus on, but will not be limited to, the hazard factors that contribute most strongly to the occurrence of landslides, namely: topographic factors, earthquakes, ground rupture, ground shaking, liquefaction, flooding, and rainfall.
The GIS data will be cleaned and sanity-checked to establish their basic properties, features, and characteristics. The maps will be analysed and enhanced to explore the relationships between the different characteristics of the data they present. This part of the process aims to detect and expose unexpected features in the collected data, since the datasets and sources are expected to contain anomalies and unwanted artifacts. Once anomalies are found, the affected values will be reformatted and recoded. Preparation may include, but is not limited to, grouping, smoothing, and subsetting.
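The cleaning steps above can be illustrated on a single gridded layer with NumPy. This is a minimal sketch under stated assumptions, not the study's procedure: the no-data sentinel `-9999`, the valid slope range, the 3×3 mean filter used for smoothing, and the slope class edges used for grouping are all hypothetical choices, and the helper names `clean_layer`, `smooth_3x3`, and `group_slope` are illustrative.

```python
import numpy as np

NODATA = -9999.0  # hypothetical no-data sentinel; real rasters declare their own


def clean_layer(layer, valid_min, valid_max):
    """Mark no-data and physically implausible values as NaN."""
    cleaned = np.asarray(layer, dtype=float).copy()
    cleaned[cleaned == NODATA] = np.nan
    # Values outside the plausible range are treated as artifacts.
    cleaned[(cleaned < valid_min) | (cleaned > valid_max)] = np.nan
    return cleaned


def smooth_3x3(layer):
    """Replace each cell with the mean of its 3x3 neighbourhood, ignoring NaNs."""
    rows, cols = layer.shape
    padded = np.pad(layer, 1, constant_values=np.nan)
    # Stack the nine shifted views of the padded array, then average them.
    windows = np.stack([padded[i:i + rows, j:j + cols]
                        for i in range(3) for j in range(3)])
    return np.nanmean(windows, axis=0)


def group_slope(slope_deg, edges=(15.0, 30.0, 45.0)):
    """Recode slope angles into ordinal classes 0..len(edges); NaN becomes -1."""
    classes = np.digitize(np.nan_to_num(slope_deg, nan=-1.0), edges)
    classes[np.isnan(slope_deg)] = -1
    return classes


# Toy 3x3 slope layer (degrees) with one no-data cell and one impossible value.
raw = np.array([[10.0, 12.0, -9999.0],
                [11.0, 500.0, 13.0],
                [12.0, 14.0, 15.0]])
slope = clean_layer(raw, valid_min=0.0, valid_max=90.0)
smoothed = smooth_3x3(slope)
classes = group_slope(slope)
```

Subsetting would typically be a boolean mask or slice over the same arrays, for example keeping only cells inside the study area.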
In this part of the study, important data that cannot be used numerically in their original form will also be transformed. Topographic and map images will be converted into scalar numeric values, and each map image will be divided into a grid matrix whose cells are referenced by longitude and latitude.
The grid resolution will be increased progressively, from a 100 by 100 grid matrix up to a maximum of a 500 by 500 grid matrix in increments of 100, producing five (5) datasets that will each be run independently through the prediction model.
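One way to turn a map image into scalar values is to look up each pixel's colour in the map legend and then attach cell-centre coordinates. The sketch below assumes a north-up image with a known bounding box and a hypothetical three-colour hazard legend; the colours, class codes, helper names, and the coordinates in the example are placeholders, not actual NAMRIA or PhilGIS values or the real extent of Laguna.

```python
import numpy as np

# Hypothetical legend of a hazard map image: RGB colour -> class code.
LEGEND = {(255, 255, 0): 1, (255, 165, 0): 2, (255, 0, 0): 3}


def image_to_classes(rgb, legend=LEGEND, unknown=0):
    """Convert a (rows, cols, 3) RGB image into a 2-D array of class codes."""
    out = np.full(rgb.shape[:2], unknown, dtype=int)
    for colour, value in legend.items():
        # A pixel matches only if all three channels equal the legend colour.
        mask = np.all(rgb == np.array(colour), axis=-1)
        out[mask] = value
    return out


def cell_centers(bounds, shape):
    """Longitude and latitude of each cell centre for a north-up grid.

    bounds = (lon_min, lat_min, lon_max, lat_max); row 0 is the northern edge.
    """
    lon_min, lat_min, lon_max, lat_max = bounds
    rows, cols = shape
    lons = lon_min + (lon_max - lon_min) / cols * (np.arange(cols) + 0.5)
    lats = lat_max - (lat_max - lat_min) / rows * (np.arange(rows) + 0.5)
    return np.meshgrid(lons, lats)  # two arrays, each of shape (rows, cols)


# Toy 2x2 image: yellow, red / unlisted black, orange.
img = np.array([[[255, 255, 0], [255, 0, 0]],
                [[0, 0, 0], [255, 165, 0]]], dtype=np.uint8)
classes = image_to_classes(img)
lon, lat = cell_centers((121.0, 14.0, 121.2, 14.2), classes.shape)
```

Exact colour matching assumes a clean, unsmoothed image; scanned or anti-aliased maps would need a nearest-colour match instead.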
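The five grid resolutions could be derived from one finer source raster by block averaging. The following sketch assumes the source raster is at least 500 pixels on each side and that NaN marks missing cells; `block_mean` is a hypothetical helper, and averaging is only one possible aggregation (categorical layers would need, for example, a majority rule instead).

```python
import numpy as np

GRID_SIZES = range(100, 501, 100)  # 100, 200, 300, 400, 500


def block_mean(raster, n):
    """Aggregate a 2-D raster into an n x n grid by averaging source pixels.

    Block edges come from integer division, so blocks may differ by one pixel
    when the raster size is not a multiple of n. NaN pixels are ignored.
    """
    rows, cols = raster.shape
    if n > min(rows, cols):
        raise ValueError("target grid is finer than the source raster")
    r0 = (np.arange(n) * rows) // n  # first source row of each target row
    c0 = (np.arange(n) * cols) // n  # first source column of each target column
    valid = ~np.isnan(raster)
    values = np.where(valid, raster, 0.0)
    # Sum values and count valid pixels within each block.
    sums = np.add.reduceat(np.add.reduceat(values, r0, axis=0), c0, axis=1)
    counts = np.add.reduceat(
        np.add.reduceat(valid.astype(float), r0, axis=0), c0, axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts  # NaN where a block has no valid pixels


# Hypothetical 1000 x 1000 source raster standing in for one factor layer.
source = np.random.default_rng(0).random((1000, 1000))
datasets = {n: block_mean(source, n) for n in GRID_SIZES}
```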
After preparation, the data will be exported as Comma-Separated Values (CSV) files for easier processing and for importing into GeoPandas, an open source project for working with geospatial data in Python.
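A minimal sketch of the export step writes one row per grid cell with its longitude, latitude, and factor values, using only the standard `csv` module. The column names, the helper `grid_to_csv`, and the example values are hypothetical. The resulting file could then be read with `pandas.read_csv` and turned into a GeoPandas `GeoDataFrame` using `geopandas.points_from_xy` (assuming WGS 84 coordinates, `crs="EPSG:4326"`).

```python
import csv
import io

import numpy as np


def grid_to_csv(stream, lon, lat, layers):
    """Write one CSV row per grid cell: longitude, latitude, then each layer value."""
    names = list(layers)
    writer = csv.writer(stream)
    writer.writerow(["longitude", "latitude", *names])
    for idx in np.ndindex(lon.shape):
        writer.writerow([float(lon[idx]), float(lat[idx]),
                         *(float(layers[name][idx]) for name in names)])


# Toy 2 x 2 grid with two hypothetical factor layers.
lon, lat = np.meshgrid([121.05, 121.15], [14.15, 14.05])
layers = {"slope_deg": np.array([[10.0, 32.0], [5.0, 47.0]]),
          "rain_mm": np.array([[120.0, 130.0], [110.0, 140.0]])}
buffer = io.StringIO()
grid_to_csv(buffer, lon, lat, layers)
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
```

To write a real file, pass `open("grid_100.csv", "w", newline="")` as `stream`; the file name is a placeholder, and `newline=""` follows the `csv` module's recommendation.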
Using Scikit-learn, a Python machine learning library, the data will be modeled with Support Vector Regression (SVR). SVR extends the method of Support Vector Classification to solve regression problems. The study will use the radial basis function (RBF) kernel to train the model.
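A minimal sketch of the modelling step with Scikit-learn's `SVR` follows. The feature matrix is synthetic and stands in for the per-cell factor table; the three features, the target formula, and the hyperparameters `C`, `epsilon`, and `gamma` are placeholder assumptions. Feature standardisation is included because an RBF-kernel SVR is sensitive to feature scale.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the per-cell table: three hypothetical factors scaled
# to [0, 1] and a noisy susceptibility index clipped to [0, 1].
rng = np.random.default_rng(42)
X = rng.random((500, 3))
y = np.clip(0.6 * X[:, 0] + 0.3 * X[:, 1] - 0.2 * X[:, 2]
            + 0.05 * rng.standard_normal(500), 0.0, 1.0)

# Standardise features, then fit an RBF-kernel SVR (placeholder hyperparameters).
model = make_pipeline(StandardScaler(),
                      SVR(kernel="rbf", C=1.0, epsilon=0.05, gamma="scale"))
model.fit(X, y)
predictions = model.predict(X[:5])
```

In practice the hyperparameters would likely be tuned, for example with `GridSearchCV`, separately for each of the five grid resolutions.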
The model will be evaluated using Scikit-learn's estimator score method. To quantify the quality of the model's predictions, each estimator provides a score method with a default evaluation criterion for the problem it is designed to solve; for a regressor such as SVR, this criterion is the coefficient of determination (R²), which will serve as the measure of how well the model predicts landslide susceptibility.
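The sketch below shows that, for `SVR`, `score` returns R² and agrees with `sklearn.metrics.r2_score`. It uses synthetic data and a held-out test split; the 25% split size is an assumption. Scoring on the training data instead would overstate how well the model generalises.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic data standing in for one prepared grid dataset.
rng = np.random.default_rng(0)
X = rng.random((400, 3))
y = X @ np.array([0.5, 0.3, -0.2]) + 0.05 * rng.standard_normal(400)

# Hold out a quarter of the cells so the score reflects unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = SVR(kernel="rbf").fit(X_train, y_train)

# For regressors, score() is the coefficient of determination R^2.
score = model.score(X_test, y_test)
manual = r2_score(y_test, model.predict(X_test))
```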
The output from the model will be mapped using Shapely, a BSD-licensed Python package for manipulation and analysis of planar geometric objects.
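A sketch of the mapping step turns a grid of predicted susceptibility values back into cell polygons with Shapely's `box`. It assumes a north-up grid over a known bounding box; the bounds and values in the example are illustrative only, and `cells_to_polygons` is a hypothetical helper. The resulting pairs could, for instance, be placed in a GeoPandas `GeoDataFrame` for plotting or export.

```python
from shapely.geometry import box


def cells_to_polygons(bounds, predictions):
    """Pair each grid cell's polygon with its predicted value.

    bounds = (lon_min, lat_min, lon_max, lat_max); predictions is a list of rows,
    with row 0 along the northern edge.
    """
    lon_min, lat_min, lon_max, lat_max = bounds
    n_rows, n_cols = len(predictions), len(predictions[0])
    dlon = (lon_max - lon_min) / n_cols
    dlat = (lat_max - lat_min) / n_rows
    features = []
    for i, row in enumerate(predictions):
        top = lat_max - i * dlat
        for j, value in enumerate(row):
            left = lon_min + j * dlon
            # box(minx, miny, maxx, maxy) builds an axis-aligned rectangle.
            features.append((box(left, top - dlat, left + dlon, top), value))
    return features


# Toy 2 x 2 prediction grid over an illustrative bounding box.
features = cells_to_polygons((121.0, 14.0, 121.2, 14.2), [[0.1, 0.8], [0.4, 0.6]])
```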