In this study, a landslide susceptibility workflow was built using the Analytical Hierarchy Process (AHP) and Frequency Ratio (FR), and was then used to create a Support Vector Regression (SVR) model based on the features with the heaviest weights. The model was then integrated into an interactive web application that allows rainfall data to be incorporated. The web application also provides access to the regression model.
This study aimed to develop a computational workflow for landslide susceptibility mapping of the province of Laguna. The data needed for the workflow were acquired and processed to generate a landslide susceptibility map. Landslide controlling factors were selected based on expert opinion and the literature. The weight of each controlling factor was calculated using AHP, and the frequency ratios of the factors were also calculated. The frequency ratios were then combined with the controlling factor weights to calculate the integrated weighted index used to create the landslide susceptibility map.
Data Acquisition: To map the landslide susceptibility of the province of Laguna, data were acquired from local and international, government and non-government organizations. (1) Slope data and (2) land cover data were acquired from the Philippines’ National Mapping and Resource Information Authority (NAMRIA) as shapefiles, through a formal data request. (3) Elevation data (ASTER GDEM, as a GeoTIFF file) were acquired from PhilGIS and from NASA’s Earth Observing System (EOS) through the Landviewer web application, which automatically accesses the United States Geological Survey (USGS) and Shuttle Radar Topography Mission (SRTM) databases. (5) Near-infrared and red bands were also acquired from EOS Landviewer. Vector maps of the Philippines’ (6) waterways and (7) water basins were acquired from Geofabrik, a company that provides free vector shape data derived from OpenStreetMap. (8) Lithology data were acquired from the Bureau of Soils and Water Management (BSWM), the PhilSoil project, and downloadable data in PhilGIS; (9) landslide-related hazard ratings (liquefaction, earthquake-induced, and rain-induced) were acquired from the Philippine Institute of Volcanology and Seismology (PHIVOLCS); and (10) rainfall and weather data were acquired from the Philippine Atmospheric, Geophysical and Astronomical Services Administration (PAGASA).
Selecting the Landslide Controlling Factors: In this study, slope, aspect, elevation, ground integrity, distance from faults, distance from waterways, land use–land cover (LULC), normalized difference vegetation index (NDVI), and Terrain Ruggedness Index (TRI) were selected as the static landslide controlling factors, and rainfall was selected as a dynamic landslide controlling factor. Slope is one of the most widely recognized topographic landslide controlling factors and significantly affects the probability of a landslide (Ayalew and Yamagishi, 2005; Chalkias et al., 2016). When the slope gradient exceeds 15 degrees, the likelihood of a landslide increases (Lee and Min, 2001). Aspect refers to the direction the slope faces; it is related to the amount of water the soil can hold, the ruggedness of the terrain, and the vegetation, and therefore indirectly affects landslide development [10]. Elevation is the measure of land surface height and is one of the key factors determining the gravitational potential energy of terrain [11]. Multiple studies have considered elevation one of the most important landslide controlling factors. Ground integrity can be derived from lithology and soil characteristics. Lithology is directly related to slope stability: as the slope gets steeper, lithological attributes largely determine whether a landslide will be triggered (Guo et al., 2015). The ability of the soil to hold water and resist breaking also plays a key role in the likelihood of landslides, and the cohesion factor determines whether the soil is strong enough to withstand a significant amount of stress at a specific slope. Land use and land cover also directly affect the likelihood of landslides [12]. Landslide susceptibility increases during tectonic movement, so distance to faults is also a landslide controlling factor.
Earthquake-triggered landslides are normally found in areas near a fault, so the distance of a slope from geological tectonic zones is important to consider in slope analysis (Fan et al., 2018). Distance to waterways and basins is also a factor to consider in slope analysis and landslide prediction [13], because water moving along a steep slope contributes to erosion and terrain movement. Vegetation indirectly affects landslides by preventing soil erosion. NDVI measures vegetation coverage from observations in the visible red and near-infrared regions.
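NDVI is conventionally computed per pixel as (NIR − Red) / (NIR + Red), giving values between −1 and 1, with dense vegetation toward the upper end. The sketch below applies that standard formula to two band arrays; the toy reflectance values are made up for illustration and are not the Landsat 7 bands used in this study, and marking zero-denominator pixels as NaN is a choice of this sketch.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).

    Pixels where NIR + Red == 0 are returned as NaN to avoid division by zero.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    out = np.full_like(total, np.nan)
    # Divide only where the denominator is non-zero; other cells keep NaN
    np.divide(nir - red, total, out=out, where=total != 0)
    return out

# Toy reflectance values (hypothetical), not the study's Landsat bands
nir_band = np.array([[0.50, 0.30], [0.20, 0.0]])
red_band = np.array([[0.10, 0.30], [0.40, 0.0]])
result = ndvi(nir_band, red_band)
print(result)
```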
Another factor that supports the relationship between slope and elevation is terrain ruggedness. “Terrain Ruggedness Index(TRI) is a measurement developed by Riley, et al. (1999) to express the amount of elevation difference between adjacent cells of a digital elevation grid. The process essentially calculates the difference in elevation values from a center cell and the eight cells immediately surrounding it. Then it squares each of the eight elevation difference values to make them all positive and averages the squares. The terrain ruggedness index is then derived by taking the square root of this average, and corresponds to average elevation change between any point on a grid and its surrounding area.” - https://www.edenextdata.com/?q=content/terrainruggedness-index-tri
One dynamic landslide controlling factor that changes continuously, can be collected daily, and has predicted values throughout the year is rainfall. The amount of rain an area receives over a period of time greatly affects the rate of erosion and the ground integrity, most significantly in areas with steep slopes. Thus, including rainfall among the landslide controlling factors is essential.
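The TRI procedure quoted above can be sketched on a NumPy elevation grid as follows. This follows the quoted description literally (root of the mean of the eight squared differences); other GIS tools implement variants of TRI, so values from this sketch may differ from the QGIS output used in the study. Leaving edge cells as NaN is an assumption of the sketch.

```python
import numpy as np

def terrain_ruggedness_index(dem):
    """TRI per the quoted description: square the elevation differences between
    each cell and its eight neighbours, average them, then take the square root.
    Edge cells (fewer than eight neighbours) are left as NaN."""
    dem = np.asarray(dem, dtype=float)
    rows, cols = dem.shape
    tri = np.full((rows, cols), np.nan)
    center = dem[1:-1, 1:-1]
    sq_sum = np.zeros_like(center)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the centre cell itself
            # Shifted view aligned so each element is one neighbour of `center`
            neighbour = dem[1 + dr:rows - 1 + dr, 1 + dc:cols - 1 + dc]
            sq_sum += (neighbour - center) ** 2
    tri[1:-1, 1:-1] = np.sqrt(sq_sum / 8.0)
    return tri

# Toy 3x3 DEM (hypothetical): a 2 m bump surrounded by flat ground
dem = np.array([[10, 10, 10],
                [10, 12, 10],
                [10, 10, 10]])
print(terrain_ruggedness_index(dem))
```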
Analytical Hierarchy Process (AHP): The pairwise comparison matrix used in this study is based on the matrix created by Yi, Zhang et al. (2019) for landslide susceptibility mapping of the Jiuzhaigou region of Sichuan Province, China. The matrix was then reviewed in consultation with geologists from the Mines and Geosciences Bureau of the Philippines. With 10 factors, the consistency ratio of the matrix is 0.02967, which satisfies the requirement that it be below 0.1.
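As a minimal sketch of how AHP weights and the consistency ratio are commonly derived: the weights are the normalized principal eigenvector of the pairwise matrix, CI = (λmax − n)/(n − 1), and CR = CI/RI. The random index table below is Saaty's commonly cited one; that the study used these exact RI values is an assumption. The 3×3 matrix is a toy, perfectly consistent example, not the study's 10×10 matrix.

```python
import numpy as np

# Saaty's random consistency index (RI) for matrix orders 1..10
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def ahp_weights(matrix):
    """Return (weights, lambda_max, consistency_ratio) for a pairwise matrix."""
    a = np.asarray(matrix, dtype=float)
    n = a.shape[0]
    eigvals, eigvecs = np.linalg.eig(a)
    k = np.argmax(eigvals.real)            # principal (largest) eigenvalue
    lambda_max = eigvals[k].real
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                        # normalise weights to sum to 1
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return w, lambda_max, cr

# Toy consistent matrix (hypothetical): A is 3x as important as B, 5x as C
toy = [[1, 3, 5],
       [1 / 3, 1, 5 / 3],
       [1 / 5, 3 / 5, 1]]
weights, lam, cr = ahp_weights(toy)
print(weights, lam, cr)
```

A matrix is usually accepted when CR < 0.1, which is the same threshold the study applies to its 10-factor matrix.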
Acquiring the Landslide Inventory: Two techniques can be used to derive a landslide inventory: (1) the NASA Drip-Slip algorithm using Earth Engine, and (2) practical identification of landslides using Google Earth Pro, as proposed by M. Mihir and B. Malamud, in which areas are scanned for the coloration and characteristics of landslides or land movement. Drip-Slip is landslide identification and extreme precipitation monitoring software that detects spectral changes in Landsat imagery that may indicate landslides; Figure 5 shows the interface provided by Earth Engine for implementing the Drip-Slip algorithm. Figure 4 shows the sources of the landslide inventory that were processed in QGIS. This study focused on landslide areas during 2015 to 2016, because most of the data acquired from government offices fall within that date range. With the help of QGIS, a total of 319 suspected landslide areas were identified and mapped.
Points of Interest and Feature Extraction: The slope data of the study area were provided by NAMRIA as shapefiles, which use polygons to map the slope, as seen in Figures 7 and 8. Using QGIS, the centroids of these polygons were extracted as the points of interest, totaling almost half a million data points (445,000), as seen in Figure 15. The points contain slope data and coordinates that were used to extract the other landslide controlling factors from all the collected data formats.
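The two operations behind this step, reducing each slope polygon to a point and reading raster values at that point, can be sketched in plain Python. This is an illustration of the geometry, not the QGIS tools the study used: the centroid uses the standard shoelace (area-weighted) formula, and the raster lookup assumes a north-up grid described by a GDAL-style 6-element geotransform with no rotation terms. Both function names are hypothetical.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon [(x, y), ...] (shoelace formula).
    The ring may be open or closed (first vertex repeated at the end)."""
    if ring[0] == ring[-1]:
        ring = ring[:-1]
    area2 = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0          # twice the signed triangle area
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # Cx = sum / (6A) and area2 = 2A, so divide by 3 * area2
    return cx / (3.0 * area2), cy / (3.0 * area2)

def sample_raster(grid, geotransform, x, y):
    """Value of a north-up raster at (x, y), or None if outside the grid.
    geotransform = (x_origin, pixel_width, 0, y_origin, 0, pixel_height<0)."""
    x_origin, pixel_w, _, y_origin, _, pixel_h = geotransform
    col = int((x - x_origin) // pixel_w)
    row = int((y - y_origin) // pixel_h)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None

# Toy example (hypothetical values)
slope_polygon = [(0, 0), (2, 0), (2, 2), (0, 2)]
px, py = polygon_centroid(slope_polygon)
elevation = [[1, 2], [3, 4]]
print((px, py), sample_raster(elevation, (0, 1, 0, 2, 0, -1), px - 0.5, py - 0.5))
```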
Sources of Landslide Inventory
Potential landslide areas vectorized and buffered in QGIS
Centroid points derived from the slope layer, used to merge all data layers and for feature extraction
Slope layer acquired from NAMRIA, processed in QGIS
Result of aspect processing in QGIS using the DEM raster layer
Elevation layer (Landsat 7 ASTER DEM) imported into QGIS
Merging of lithology layers from PhilSoil, PhilGIS, and Geofabrik using QGIS
Waterway and water basin layers from Geofabrik merged using QGIS
Terrain Ruggedness Index computed from the Landsat DEM raster using QGIS
Combined Visible Red and Near-Infrared bands from Landsat 7 data layers
Feature Cleaning and Transformation: In this study, the points of interest (PoI) and their features were exported to comma-separated values (CSV) files and imported into a Jupyter notebook. Jupyter Notebook is an open-source web application for creating and sharing documents that contain live code, equations, visualizations, and narrative text. Using the notebook allowed the data to be imported, transformed, and presented in steps that can be rearranged, refined, and refactored. The collected data were not initially ready for the computational workflow: features had to be mapped to numerical values and cleaned.
The aspect and slope values had to be transformed and standardized, and empty values and extreme outliers were removed. Some features were also normalized to improve the results of data processing. The next figures show the normalized values extracted from the data layers that were labeled and processed in QGIS and Jupyter Notebook. Starting from 440,000 points, derived by extracting the centroid of each polygon in the slope data layer as shown in Figure 16, the final feature count was 235,786 data points. These data points were then merged with the landslide inventory to create the final data set.
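A minimal pandas sketch of the cleaning steps described above: mapping categorical labels to numbers, dropping empty values, removing extreme outliers, and normalizing. The exact transformations used in the study are not specified, so the z-score outlier rule (|z| > 3), min-max normalization to [0, 1], and the column names and label mapping are all assumptions for illustration.

```python
import pandas as pd

def clean_features(df, numeric_cols, categorical_maps, z_max=3.0):
    """Map categories to numbers, drop empties, remove outliers, min-max scale.

    numeric_cols: list of numeric feature columns (hypothetical names)
    categorical_maps: {column: {label: numeric code}}; unmapped labels become NaN
    """
    out = df.copy()
    for col, mapping in categorical_maps.items():
        out[col] = out[col].map(mapping)
    feature_cols = list(numeric_cols) + list(categorical_maps)
    out = out.dropna(subset=feature_cols)
    # Remove rows where any numeric feature lies more than z_max std devs from the mean
    num = out[numeric_cols]
    z = (num - num.mean()) / num.std(ddof=0).replace(0, 1)
    out = out[(z.abs() <= z_max).all(axis=1)].copy()
    # Min-max normalise numeric features to [0, 1] (constant columns become 0)
    col_min = out[numeric_cols].min()
    col_range = (out[numeric_cols].max() - col_min).replace(0, 1)
    out[numeric_cols] = (out[numeric_cols] - col_min) / col_range
    return out

# Toy data (hypothetical): one missing slope, one unmapped land cover label
raw = pd.DataFrame({
    "slope": [5.0, 10.0, 15.0, None, 20.0],
    "lulc": ["forest", "built-up", "forest", "forest", "grassland"],
})
cleaned = clean_features(raw, ["slope"], {"lulc": {"forest": 1, "built-up": 2}})
print(cleaned)
```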
Frequency Ratio: The frequency ratios calculated in this study are presented in the next figures. The values were calculated using Python scripts in a Jupyter notebook. The frequency table shows the contribution of each landslide controlling factor to what may be a potential landslide area.
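Under the standard definition, the frequency ratio of a factor class is the share of landslide points falling in that class divided by the share of all points falling in that class; values above 1 indicate a class associated with landslides. The sketch below assumes each point carries a binary `landslide` column (1 if it falls inside an inventory polygon) and a categorical class column; both column names are hypothetical.

```python
import pandas as pd

def frequency_ratio(df, factor_col, landslide_col="landslide"):
    """FR per class = (% of landslide points in class) / (% of all points in class)."""
    total_points = len(df)
    total_slides = df[landslide_col].sum()
    grouped = df.groupby(factor_col)[landslide_col].agg(["count", "sum"])
    pct_slides = grouped["sum"] / total_slides
    pct_points = grouped["count"] / total_points
    return (pct_slides / pct_points).rename("FR")

# Toy data (hypothetical): 5 points, 3 of them inside landslide areas
points = pd.DataFrame({
    "slope_class": ["low", "low", "low", "high", "high"],
    "landslide": [0, 0, 1, 1, 1],
})
fr = frequency_ratio(points, "slope_class")
print(fr)
```

Continuous factors such as slope or rainfall would first be binned into classes, for example with `pd.cut`, before computing FR.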
Integrated Weighted Index: In this study, the integrated weighted index formula was used to calculate the susceptibility of each potential landslide point. By acquiring the mean, which is and scale of remote sensing images, the effectivity of the tools used, and the expertise of the interpreter involved (Malamud et al., 2004). Improving the landslide inventory would greatly improve the performance of the model.
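A common form of the integrated weighted index is a weighted sum over factors, IWI = Σⱼ wⱼ · FRⱼ, where wⱼ is the AHP weight of factor j and FRⱼ is the frequency ratio of the class the point falls in. The sketch below assumes that weighted-sum form; the factor names, weights, and FR tables are hypothetical toy values, not the study's.

```python
import pandas as pd

def integrated_weighted_index(df, weights, fr_tables):
    """IWI per point: sum over factors of AHP weight x FR of the point's class.

    weights: {factor column: AHP weight}
    fr_tables: {factor column: {class label: frequency ratio}}
    Points whose class has no FR entry get NaN.
    """
    iwi = pd.Series(0.0, index=df.index)
    for factor, weight in weights.items():
        fr_values = df[factor].map(fr_tables[factor])
        iwi += weight * fr_values
    return iwi

# Toy inputs (hypothetical)
pts = pd.DataFrame({"slope_class": ["low", "high"], "lulc": ["forest", "built-up"]})
w = {"slope_class": 0.6, "lulc": 0.4}
fr_tables = {"slope_class": {"low": 0.5, "high": 1.5},
             "lulc": {"forest": 0.8, "built-up": 1.2}}
iwi = integrated_weighted_index(pts, w, fr_tables)
print(iwi)
```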
Creating the Susceptibility Map with Rainfall Data: To generate a susceptibility map in the Jupyter notebook, matplotlib, a plotting library for Python, was used. The points of interest and their integrated weighted index values were plotted on a 2D plane within the boundaries (latitude, longitude): minimum (13.965200, 120.882250) and maximum (14.565927, 121.677325).
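A minimal matplotlib sketch of this plotting step, using the study-area bounds stated above with longitude on the x-axis and latitude on the y-axis. The colormap, marker size, and the randomly generated points are illustrative assumptions; the real map would use the 235,786 processed points and their integrated weighted index.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

# Study-area bounds from the text, as (latitude, longitude)
LAT_MIN, LON_MIN = 13.965200, 120.882250
LAT_MAX, LON_MAX = 14.565927, 121.677325

def plot_susceptibility(lat, lon, iwi, title="Landslide susceptibility (IWI)"):
    """Scatter points coloured by their index value within the study bounds."""
    fig, ax = plt.subplots(figsize=(8, 6))
    sc = ax.scatter(lon, lat, c=iwi, s=1, cmap="RdYlGn_r")  # red = higher index
    ax.set_xlim(LON_MIN, LON_MAX)
    ax.set_ylim(LAT_MIN, LAT_MAX)
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.set_title(title)
    fig.colorbar(sc, ax=ax, label="Integrated weighted index")
    return fig, ax

# Synthetic points (hypothetical), standing in for the processed data set
rng = np.random.default_rng(0)
lat = rng.uniform(LAT_MIN, LAT_MAX, 500)
lon = rng.uniform(LON_MIN, LON_MAX, 500)
iwi = rng.uniform(0.0, 2.0, 500)
fig, ax = plot_susceptibility(lat, lon, iwi)
```

Calling `fig.savefig(...)` with a file name of choice would write the map to disk.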
In this study, the susceptibility map based on the static landslide controlling factors was generated first, so that the results with and without rainfall data could be compared to show how rainfall affects the landslide susceptibility of an area. In the figures below, the left map is the base map created from the integrated weighted index of the static landslide controlling factors, while the right map shows the result when rainfall data are integrated.
One of the goals of this study is to create a regression model that can easily calculate the susceptibility rating of a given area from the landslide controlling factors used in the integrated weighted index workflow, and that can apply rainfall data as a dynamic landslide controlling factor. The study also sought to determine whether susceptibility can be derived from only half of the landslide controlling factors by training the support vector regression model on the highest-weighted factors.
Flowchart of the creation of the landslide susceptibility model and application of the Support Vector Regression Model
The landslide controlling factors with the highest weights were used to train the machine learning model, namely Slope, Rainfall, Fault Distance, TRI, and Ground Integrity. The sorted list of landslide controlling factor weights is presented in Table VIII. Using AHP and FR together provided a complementary approach to building an integrated weighted index for landslide susceptibility, which in turn paved the way for the successful application of the Support Vector Regression algorithm.
The Support Vector Regression implementation was taken from the scikit-learn SVM module (sklearn.svm). Scikit-learn is a free machine learning library for the Python programming language. To reproduce the results of the computational workflow, two layers of regressors were developed: the first layer handles the static landslide controlling factors, and the second layer integrates the rainfall data with the result of the first regressor.
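The two-layer arrangement can be sketched with scikit-learn's `SVR`. In this sketch, layer 1 maps the four static factors to the base index, and layer 2 maps [layer-1 prediction, rainfall] to the index with rainfall. Training layer 2 on layer-1 predictions (rather than on the true base index), the `TwoLayerSVR` class name, and the synthetic targets are all assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVR

class TwoLayerSVR:
    """Layer 1: static factors -> base susceptibility.
    Layer 2: [layer-1 prediction, rainfall] -> susceptibility with rainfall."""

    def __init__(self, **svr_params):
        self.base_model = SVR(**svr_params)
        self.rain_model = SVR(**svr_params)

    def fit(self, X_static, rainfall, y_base, y_with_rain):
        self.base_model.fit(X_static, y_base)
        base_pred = self.base_model.predict(X_static)
        self.rain_model.fit(np.column_stack([base_pred, rainfall]), y_with_rain)
        return self

    def predict(self, X_static, rainfall):
        base_pred = self.base_model.predict(X_static)
        with_rain = self.rain_model.predict(np.column_stack([base_pred, rainfall]))
        return base_pred, with_rain

# Synthetic, already-normalised data (hypothetical), not the study's data set
rng = np.random.default_rng(0)
X_static = rng.uniform(0, 1, size=(400, 4))  # slope, ground integrity, fault distance, TRI
rainfall = rng.uniform(0, 1, size=400)
y_base = X_static @ np.array([0.4, 0.3, 0.2, 0.1])
y_with_rain = y_base + 0.3 * rainfall

model = TwoLayerSVR(kernel="rbf", C=10.0, epsilon=0.01)
model.fit(X_static, rainfall, y_base, y_with_rain)
base_pred, rain_pred = model.predict(X_static, rainfall)
```

Feeding layer 2 the layer-1 output means the rainfall regressor only needs the base score and a rainfall value at prediction time, which matches how the web application supplies rainfall on top of the static model.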
To train the regression models, the same data set used in the computational workflow was split into two parts, 80% for training and 20% for testing, using train_test_split and StratifiedShuffleSplit from the sklearn.model_selection module. The regression models were built and refined using GridSearchCV over the kernel parameters, and were trained iteratively to obtain the highest possible performance.
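Because `StratifiedShuffleSplit` needs class labels, one common way to use it with a continuous target is to stratify on quantile bins of the target. The sketch below does that and then runs `GridSearchCV` over SVR kernel parameters. The number of bins, the parameter grid, 5-fold CV, R² scoring, and the synthetic data are assumptions, not the study's settings.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.svm import SVR

def stratified_split(X, y, n_bins=5, test_size=0.2, seed=42):
    """80/20 split stratified on quantile bins of a continuous target."""
    edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])  # inner bin edges
    strata = np.digitize(y, edges)                               # bin label per sample
    sss = StratifiedShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(sss.split(X, strata))
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Hypothetical search space over SVR kernel parameters
PARAM_GRID = {
    "kernel": ["rbf", "linear"],
    "C": [1, 10],
    "gamma": ["scale", 1.0],
    "epsilon": [0.01, 0.1],
}

def tune_svr(X_train, y_train):
    search = GridSearchCV(SVR(), PARAM_GRID, cv=5, scoring="r2")
    search.fit(X_train, y_train)
    return search

# Synthetic data (hypothetical)
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(500, 4))
y = X @ np.array([0.4, 0.3, 0.2, 0.1]) + rng.normal(0, 0.01, size=500)
X_train, X_test, y_train, y_test = stratified_split(X, y)
search = tune_svr(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```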
Performance of the Support Vector Regression Model: The base hazard map generated with the regressor trained on the selected static landslide controlling factors (Slope, Ground Integrity, Fault Distance, and Terrain Ruggedness Index) is shown in the figures below.
As in the computational workflow, to classify the landslide susceptibility of the points of interest, the mean value of the results was set as the basis for Moderate susceptibility, and the standard deviation of the results was used as the distance between adjacent classes. Compared with the map generated by the computational workflow shown in Figure 24, the resulting maps are strikingly similar: both mark the areas with high slope and elevation as highly susceptible. The same similarity was observed after rainfall data were integrated.
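One reading of this classification rule is that class centres are spaced one standard deviation apart around the mean (mean − 2σ, mean − σ, mean, mean + σ, mean + 2σ), so class boundaries fall at mean ± 0.5σ and mean ± 1.5σ. The sketch below implements that interpretation; the five class labels and the exact boundary placement are assumptions, since the text only fixes the mean as Moderate and σ as the spacing.

```python
import numpy as np

# Hypothetical five-class scheme centred on Moderate
LABELS = ["Very Low", "Low", "Moderate", "High", "Very High"]

def classify_susceptibility(values):
    """Classify scores using the mean as 'Moderate' and std-dev class spacing."""
    v = np.asarray(values, dtype=float)
    mean, std = v.mean(), v.std()
    # Class centres one std apart; boundaries halfway between adjacent centres
    edges = mean + std * np.array([-1.5, -0.5, 0.5, 1.5])
    idx = np.digitize(v, edges)
    return [LABELS[i] for i in idx], edges

# Toy scores (hypothetical): mean 5, population std sqrt(10)
labels, edges = classify_susceptibility(range(11))
print(list(zip(range(11), labels)))
```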
The base regressor achieved a score of 88.9 percent and the rainfall regressor 89.2 percent, both using the RBF kernel, which is a promising result. This means that more than 80 percent of the landslide susceptibility results can be predicted by providing only the top five landslide controlling factors, and this could be further improved by generating a more accurate landslide inventory.
To visualize landslide susceptibility efficiently, a web application was developed. The web application serves as a tool for dynamically supplying rainfall data to the existing model created from the static controlling factors. A grid is overlaid on a map of the study area, and each grid cell can be toggled to increase or decrease the amount of rainfall. The rainfall value of each point is then updated and used to recalculate the susceptibility. The web application also provides an interface for using the support vector regression models created in this study. Using React.js for the front end and Python with a Flask RESTful API on the back end, the web application was successfully developed to interface with both the integrated weighted index and the support vector regression models. The back end reuses the Python scripts developed for the computational workflow, as well as the data set used to create the maps of the computational workflow.
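The grid-based rainfall update can be sketched in NumPy: assign each point to a grid cell by its coordinates, overwrite point rainfall with any value set on that cell, and recompute the index by adding a weighted rainfall frequency-ratio term to the static index. All function names, the grid size, the rainfall bins and FR values, and the additive recomputation are hypothetical; the text does not specify how the back end performs this step.

```python
import numpy as np

def cell_index(lat, lon, bounds, n_rows, n_cols):
    """Grid (row, col) of each point; bounds = (lat_min, lon_min, lat_max, lon_max)."""
    lat_min, lon_min, lat_max, lon_max = bounds
    row = ((np.asarray(lat) - lat_min) / (lat_max - lat_min) * n_rows).astype(int)
    col = ((np.asarray(lon) - lon_min) / (lon_max - lon_min) * n_cols).astype(int)
    return np.clip(row, 0, n_rows - 1), np.clip(col, 0, n_cols - 1)

def apply_cell_rainfall(rainfall, row, col, cell_rainfall):
    """Use the cell's rainfall where the user set one (non-NaN), else keep the point's."""
    updated = cell_rainfall[row, col]
    return np.where(np.isnan(updated), rainfall, updated)

def recompute_susceptibility(base_iwi, rainfall, rain_bins, rain_fr, rain_weight):
    """Add the weighted rainfall FR term (looked up by rainfall class) to the static IWI."""
    rain_class = np.digitize(rainfall, rain_bins)
    return base_iwi + rain_weight * np.asarray(rain_fr)[rain_class]

# Toy example (hypothetical): 2x2 grid over the study area, user sets one cell to 80 mm
bounds = (13.965200, 120.882250, 14.565927, 121.677325)
lat, lon = np.array([14.0, 14.5]), np.array([121.0, 121.6])
row, col = cell_index(lat, lon, bounds, 2, 2)
cell_rainfall = np.full((2, 2), np.nan)
cell_rainfall[1, 1] = 80.0
rain = apply_cell_rainfall(np.array([10.0, 10.0]), row, col, cell_rainfall)
new_iwi = recompute_susceptibility(np.array([0.5, 0.5]), rain, [50.0], [0.5, 2.0], 0.2)
print(rain, new_iwi)
```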
Functionalities:
1. Display the current landslide susceptibility of the province of Laguna over a satellite map image of the province of Laguna
2. Provide an interactive weather/rainfall interface to change the rainfall value of the susceptibility model
3. Provide a form to input the 5 landslide controlling factors: Slope, Ground Integrity, Terrain Ruggedness Index, Distance from Faults, and Rainfall that will be used by the Support Vector Regression Models at the server side of the application to predict the landslide susceptibility
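The server side of the third functionality could look roughly like the Flask sketch below: a POST endpoint that validates the five controlling factors and passes them, in a fixed order, to a prediction function. This uses plain Flask routes rather than Flask-RESTful resources, and the `/predict` route, JSON field names, and `predict_fn` stand-in are hypothetical; in the real application, `predict_fn` would call the trained SVR models.

```python
from flask import Flask, jsonify, request

# Hypothetical JSON field names for the five controlling factors, in model order
FACTORS = ["slope", "ground_integrity", "tri", "fault_distance", "rainfall"]

def create_app(predict_fn):
    """Build a Flask app whose /predict endpoint wraps predict_fn(features)."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True) or {}
        missing = [f for f in FACTORS if f not in payload]
        if missing:
            return jsonify({"error": f"missing factors: {missing}"}), 400
        try:
            features = [float(payload[f]) for f in FACTORS]
        except (TypeError, ValueError):
            return jsonify({"error": "all factors must be numeric"}), 400
        return jsonify({"susceptibility": float(predict_fn(features))})

    return app

# Stand-in model (hypothetical) so the endpoint can be exercised without SVR files
app = create_app(lambda features: sum(features))
```

Locally, `app.run()` would serve the endpoint; the React front end would then POST the form values as JSON.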
The web application can be deployed remotely or locally. The build and deployment instructions and resources, including the regression models and data sets needed, can be freely downloaded from https://github.com/rccapunpon/laguna-landslidesvr.