Machine Learning
What is machine learning?
Machine learning is the field of study that gives computers the ability to learn from data (e.g., discovering patterns and relations in the given data) and make predictions. It is arguably the fastest-growing area of study across all fields of science and technology. It is probably also the only 'common language' that connects researchers and scientists from all scientific disciplines. A geophysicist and a medical researcher might not be able to communicate with each other about brain cancer or seismic migration, but chances are that they are both familiar with convolutional neural networks and TensorFlow (or PyTorch). It is difficult to find one area of research that has not been influenced by machine learning. In fact, machine learning has transformed so many areas that it has permeated many aspects of our daily life. Anyone with a smartphone can tangibly experience the technological advances, such as those in computer vision and natural language processing, brought about by machine learning.
The fundamental reason why machine learning has impacted so many areas is that many of the tasks we human beings deal with are essentially pattern recognition. Think about how a doctor recognizes tumors in MRI images. He or she must have seen many such images (and their corresponding labels) to develop the ability to recognize the defining patterns of tumors. Based on the presence or absence of such image patterns, he or she can then make a diagnosis. Similarly, a seismic interpreter learns to recognize faults and geological layers in a seismic image based on patterns such as discontinuities. Another good example in geophysics is the way geophysicists make structural interpretations based on a gravity or magnetic map. These geophysicists are trained, through their education and many years of working experience, to tell the existence of faults from particular patterns in the data maps.
It turns out that machine learning is very good at learning patterns. That is why machine learning has found so many successful applications across a vast number of areas of study.
Machine learning works in much the same way that small children learn to recognize different objects in their environment. Kids learn to recognize an object, such as a car, by simply looking at many examples of that object. After a period of learning, they know how to tell whether an object is a car or not. This is also, in principle, how kids learn to speak. Similarly, to teach a computer to recognize, say, a cat, humans need to provide many examples of cat images (together with their labels) to the computer. After being exposed to enough examples, the computer will learn to recognize cats. In fact, this is exactly what Google Brain did back in 2012. For those who are interested, a news article in The New York Times from 2012 covers this story.
Machine learning in Geoscience
Machine learning has a long history in geoscience. Many review articles can be found online. Below are three that I think could be a good starting point for anyone who is interested in how machine learning has been used in geoscience.
- A recent review article 'Machine learning for data-driven discovery in solid Earth geoscience' published in Science. This article focuses on the applications of modern machine learning to solid Earth problems.
- A more recent review article '70 years of machine learning in geoscience in review' posted on arXiv. This review has a bias towards geophysics but aims to strike a balance with other geoscience sub-disciplines such as geochemistry, geostatistics and geology; it excludes remote sensing.
- For those who are interested in remote sensing, the 2016 article 'Machine learning in geosciences and remote sensing' provides a good summary.
Examples of my work on machine learning
My work on machine learning dates back to 2011, when I started working on the discrete-valued inverse problem. I have developed a new inversion method that combines the classical Tikhonov regularized inversion formalism with fuzzy c-means clustering, an unsupervised machine learning technique widely used in image analysis. I have also successfully extended this idea to the joint inversion of multiple geophysical data sets and to magnetization vector inversion.
Below I give a few examples of my research work that involves machine learning.
Solving geophysical discrete-valued inverse problem using fuzzy c-means clustering
Assuming that the physical property under investigation, say, seismic P-wave velocity or density, can only take a few discrete values in an area of study, how can we incorporate such a constraint into inversion so that the recovered model only shows these prescribed discrete values? It is well known that, because of the smoothness regularization used in most Tikhonov regularized inversions, the inverted physical property values typically vary smoothly across the volume of interest. A further consequence of the smooth nature of the inverted models from geologically unconstrained inversion is that the inverted physical property values exhibit reduced contrasts and less variability than the true values, which makes geologic interpretation based on physical property measurements on rock samples difficult and only possible in a relative sense. It is, therefore, critically important that, whenever possible, we constrain our inversions such that the inverted values are consistent with measured values.
To solve this problem, I have incorporated the objective function of fuzzy c-means clustering into the classical Tikhonov regularized inversion formalism and developed a new (and slightly more complicated) objective function. This new objective function encourages the inverted values to cluster around a user-specified number of cluster centers at user-specified values, which are determined from the prior physical property data. For a complete mathematical treatment and explanation of this method, please refer to Sun and Li (2015). Hereafter, I will refer to this new inversion method as clustering inversion. Below I show a few figures that summarize the outcomes of this new method.
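To make the idea concrete, the combined objective can be written schematically as follows. The symbols here are generic and simplified, not the exact notation of Sun and Li (2015):

$$
\Phi(\mathbf{m},\mathbf{u},\mathbf{v}) \;=\; \|\mathbf{W}_d(\mathbf{d}^{\mathrm{obs}} - F[\mathbf{m}])\|^2 \;+\; \alpha\,\|\mathbf{W}_m(\mathbf{m}-\mathbf{m}_{\mathrm{ref}})\|^2 \;+\; \lambda \sum_{j=1}^{M}\sum_{k=1}^{C} u_{jk}^{\,q}\,(m_j - v_k)^2,
\qquad \sum_{k=1}^{C} u_{jk}=1,
$$

where the first two terms are the familiar data misfit and Tikhonov regularization, and the last term is the fuzzy c-means clustering objective: $u_{jk}$ is the fuzzy membership of model cell $j$ in cluster $k$, $v_k$ are the cluster centers (which can be tied to the prescribed discrete physical property values), $q > 1$ is the fuzzification parameter, and $\lambda$ controls how strongly the inverted values are pulled toward the clusters.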
As a proof of concept, we have constructed a synthetic density contrast model in Figure 1(a), simulated its surface and borehole gravity, and performed several different gravity inversions. We first ran a regularized smooth inversion without bound constraints. Figure 1(c) shows the inverted density model and Figure 1(d) the corresponding histogram. It is obvious that the distribution of the inverted values is not consistent with the distribution of the true density values in Figure 1(b). We then carried out a smooth inversion with bound constraints. The inverted model in Figure 1(e) does show significant improvement in identifying the locations, general shape, and boundaries of the anomalous bodies. However, if we look at the distribution of the inverted density values in Figure 1(f), they show a more or less uniform distribution once we exclude all the values at 0 (i.e., the zero background). Finally, we performed a clustering inversion with the constraint that the density contrast values can only be either 0 or 0.4 g/cc. The inverted model is shown in Figure 1(g). This model clearly shows the existence of two geologic units with relative density values of 0 and 0.4 g/cc, and it can now be directly interpreted in terms of lithologies. The histogram in Figure 1(h) clearly shows the presence of two rock types with densities concentrated around 0 and 0.4 g/cc.
Figure 1: (a) True density model, (b) the distribution of the true density values, and (c) inverted density model without bound constraints. (d) Distribution of inverted density values without bound constraints and (e) inverted density model with bound constraints 0 g/cc ≤ ρ ≤ 0.4 g/cc, where ρ represents inverted density values. (f) Histogram of inverted density values with bound constraints, (g) density model obtained from the clustering inversion, and (h) histogram of inverted density values from multidomain clustering inversion. Bound constraints were also applied. The white boxes in models (c, e, and g) indicate the boundaries of the true density anomalies. All three inverted density models reproduce the observed gravity to its noise level. (Sun and Li, 2015)
I have also applied this method to a set of crosswell seismic data which were collected by the USGS in the Forest Service East (FSE) well field near Mirror Lake, New Hampshire. The objective was to relate P-wave velocity to hydraulic conductivity and to understand how groundwater flows in fractured bedrock.
Figure 2: (a) Hydraulic conductivities for well 9 based on single-borehole hydraulic tests. The dashed line corresponds to a hydraulic conductivity of 10⁻⁶ m/s and serves as the threshold between high and low hydraulic conductivities. (b) Velocity model obtained from clustering inversion of seismic traveltimes with velocity values constrained to be 4870, 5000, 5150, 5280, and 5450 m/s. Black triangles in well 9 and circles in well 8 mark the locations of transmitters and receivers, respectively. (c) Hydraulic conductivities for well 8. (Sun and Li, 2015)
Figure 2(b) shows the inverted velocity model from the clustering inversion. Based on measurements of velocity on rock samples, we have constrained the velocity values to be 4870, 5000, 5150, 5280, and 5450 m/s. Figures 2(a) and 2(c) show the hydraulic conductivities in wells 9 and 8, respectively. The black bars delimit the intervals over which hydraulic tests were carried out. The dashed line corresponds to a hydraulic conductivity of 10⁻⁶ m/s, which is considered the demarcation between low and high hydraulic conductivities. Following Ellefsen et al. (2002), we describe velocities less than the average velocity, 5200 m/s, as low and velocities greater than or equal to 5200 m/s as high. According to this definition, the orange and red regions in Figure 2(b) have high velocities and the rest have low velocities.
Ellefsen et al. (2002) find that the hydraulic conductivities are mostly low irrespective of velocity values, but that high hydraulic conductivities are much more likely to occur in a low-velocity zone than in a high-velocity zone. Comparing the velocity model in Figure 2(b) with the hydraulic values in Figure 2(a) and (c), we note that the two zones with high hydraulic conductivities, 40–45 m in well 9 and 70–76 m in well 8, are associated with low velocity values. This observation is consistent with the findings of Ellefsen et al. (2002).
Crosswell hydraulic tests described by Hsieh and Shapiro (1996) indicate the existence of a hydraulic connection in the 70–75-m depth interval between wells 9 and 8. Considering that high hydraulic conductivities are more likely to be associated with low velocities, the blue region at the bottom of the velocity model in Figure 2(b) could be interpreted as a hydraulically conductive fracture that serves as a channel connecting these two wells. This same finding is reported by Ellefsen et al. (2002) who interpret the tomographic velocity model from a probabilistic point of view. We, therefore, conclude that the clustering inversion has produced a directly interpretable result that is consistent with the known hydrogeology.
Joint inversion of geophysical and petrophysical data using generalized fuzzy clustering
I have extended this idea to the joint inversion of multiple geophysical data sets constrained by a priori petrophysical relationships. Figure 3 shows an example of the a priori petrophysical data (i.e., the blue dots) obtained from rock property measurements on 82 rock samples from our study area near the town of Boden in Sweden. We can clearly see two linear trends among the blue dots, as indicated by the purple and red lines in Figure 3; the former corresponds to the bedrock and the latter to the gabbro intrusions. Gabbro intrusions can host copper, nickel and platinum group element mineralization and are therefore the target of interest in this area. Ground gravity and airborne magnetic data were collected, together with the petrophysical data in Figure 3, in order to better image the gabbro intrusions. The question now becomes: how do we incorporate the two observed linear trends into a joint inversion? This is a very challenging problem because of, again, the multimodality problem that arises in joint inversion.
Figure 3: The blue dots summarize the density and susceptibility values obtained from physical property measurements on 82 rock samples. The physical property values show two linear trends, as indicated by the purple and red solid lines, corresponding respectively to the host rock and the gabbro intrusion. The linear trends can also be approximated by the two dotted ellipses.
To deal with this problem, I have investigated the use of Gustafson–Kessel (GK) clustering and fuzzy c-regression models (FCRM), and developed a framework that allows multiple geophysical data sets to be jointly inverted while being constrained by a priori petrophysical data such as those in Figure 3. More technical details can be found in Sun and Li (2017). Here I just briefly summarize my results.
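For readers unfamiliar with FCRM, below is a minimal, self-contained Python sketch of fuzzy c-regression on synthetic two-trend data loosely resembling Figure 3. The synthetic data, variable names, and parameter choices are illustrative only; this is not the data or the implementation of Sun and Li (2017).

```python
# Minimal fuzzy c-regression models (FCRM) sketch on synthetic two-trend data.
# Everything here (data, parameters, variable names) is illustrative only.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "petrophysical" pairs scattered around two linear trends,
# loosely mimicking a host-rock trend and an intrusion trend.
x_host = rng.uniform(0.0, 1.0, 60)
x_intr = rng.uniform(0.0, 1.0, 40)
x = np.concatenate([x_host, x_intr])
y = np.concatenate([0.2 * x_host + 0.05, 1.0 * x_intr + 0.30])
y += rng.normal(0.0, 0.02, x.size)

c, q, eps = 2, 2.0, 1e-12                    # number of clusters, fuzzifier, numerical floor
A = np.column_stack([x, np.ones_like(x)])    # design matrix for y = slope*x + intercept
U = rng.random((x.size, c))
U /= U.sum(axis=1, keepdims=True)            # random initial memberships, rows sum to 1

for _ in range(100):
    # 1) Weighted least-squares fit of one regression line per cluster.
    lines = []
    for k in range(c):
        w = U[:, k] ** q
        lines.append(np.linalg.solve(A.T @ (A * w[:, None]), A.T @ (w * y)))
    lines = np.array(lines)                  # shape (c, 2): (slope, intercept) per cluster

    # 2) Update memberships from the squared residuals to each regression line.
    R = (y[:, None] - A @ lines.T) ** 2 + eps
    U = 1.0 / ((R[:, :, None] / R[:, None, :]) ** (1.0 / (q - 1.0))).sum(axis=2)

print("fitted (slope, intercept) per cluster:")
print(lines)
```

In the joint inversion itself, a clustering term of this kind (with residuals measured relative to the petrophysical trend lines) supplements the data misfit and regularization terms, in the same spirit as the clustering inversion described above; GK clustering plays a similar role but uses cluster-specific covariance (Mahalanobis) distances, so it can also capture elongated, line-like clusters.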
Figure 4: (a) A cross-section of the density model recovered from joint clustering inversion based on FCRM. (b) A cross-section of the susceptibility model obtained from joint clustering inversion based on FCRM. (c) Crossplot of jointly inverted density and susceptibility values based on FCRM clustering. The red lines summarize the linear trends in the a priori petrophysical data in Figure 3. (Sun and Li, 2017)
Figure 5: (a) A cross-section of the density model recovered from joint GK clustering inversion. (b) A cross-section of the susceptibility model obtained from joint GK clustering inversion. (c) Crossplot of jointly inverted density and susceptibility values based on GK clustering. The red lines mark the linear trends in the a priori petrophysical data in Figure 3. (Sun and Li, 2017)
Figures 4 and 5 summarize the joint inversion results based on clustering constraints. The red lines in Figures 4(c) and 5(c) indicate the two linear trends in the petrophysical data in Figure 3. We observe that the jointly inverted density and susceptibility values based on FCRM and GK clustering successfully reproduce the two linear trends. The jointly inverted density and susceptibility models in Figures 4 and 5 are consistent not only with the measured gravity and magnetic data (not shown here) but also with the petrophysical data in Figure 3. We therefore believe that the density and susceptibility models in Figures 4 and 5 are better representations of the gabbro intrusions than separately inverted models with the petrophysical constraints.
Thanks to the fuzzy clustering techniques, we are now able to incorporate multimodal petrophysical data into geophysical inversions!
Predicting magnetization directions using convolutional neural networks (CNN)
Magnetic data have been widely used for understanding basin structures, mineral deposit systems, the formation history of various geological systems, and many other problems. Proper interpretation of magnetic data requires accurate knowledge of the total magnetization directions of the source bodies in an area of study. Existing approaches for estimating magnetization directions involve either unstable data processing steps or computationally intensive processes such as 3D inversions. My student Felicia Nurindrawati and I have developed a new method for automatically predicting the magnetization direction of a magnetic source body using convolutional neural networks (CNNs). CNNs have achieved great success in many other applications, such as computer vision and seismic image interpretation, but had not previously been applied to predicting magnetization directions.
Figure 6: A schematic of how the magnetization directions are predicted in our work. The input is a magnetic data map, which is fed into two separate predictive models (CNN1 and CNN2). The output consists of two predicted angle categories, one for the magnetization inclination and the other for the magnetization declination.
We have applied our CNNs to a set of field data from Black Hill Norite in Australia. Figure 7(a) shows the regional magnetic data over the area. In our study, we have focused on the anomalies in the northeast, as indicated by the black box. Figure 7(b) shows the isolated magnetic anomalies for which the magnetization direction of the magnetic source body is predicted.
Figure 7: (a) The regional magnetic data map of the norite intrusion area. The boxed area indicates the area of our study. (b) The Black Hill Site data map with the inferred shape of the source body indicated by the dotted lines.
We have investigated many different CNN architectures and compared their performances using a synthetic case. In the end, we have determined the optimal CNN architectures for predicting the inclination and declination, as shown in Figure 8(a) and (b), respectively.
Figure 8: Optimal CNN architectures used to predict magnetization inclination (a) and declination (b). Each convolutional layer is represented as a cube, with its dimensions indicated under it. For example, (29x29x32) indicates that the convolutional layer accepts an input data map of size 29x29 and has 32 filters. FC stands for the fully connected layer, which, in our study, has 128 neurons. The output layer for inclination prediction (a) has 18 neurons because we have divided the whole range of inclination, [-90, 90], into 18 classes. Similarly, the output layer for declination prediction (b) has 36 neurons because we have divided the whole range of declination, [-180, 180], into 36 classes.
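For illustration, here is a minimal Keras sketch of classifiers with the layer widths quoted in Figure 8 (a 29x29 input map, 32-filter convolutional layers, a 128-neuron fully connected layer, and 18 or 36 output classes). The kernel sizes, pooling, activations, and training settings are placeholders chosen to make the example runnable, not the exact configuration of our study.

```python
# Minimal Keras sketch of the two angle-classification CNNs. Layer widths follow
# Figure 8; kernel sizes, pooling, and training settings are placeholder choices.
import tensorflow as tf

def build_direction_cnn(n_classes: int) -> tf.keras.Model:
    """Map a 29x29 single-channel magnetic data patch to angle-bin probabilities."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(29, 29, 1)),                   # 29x29 magnetic data map
        tf.keras.layers.Conv2D(32, 3, activation="relu"),    # 32 filters (3x3 kernels assumed)
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),       # the 128-neuron FC layer
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

# Two separate networks, as in Figure 6: 18 inclination bins covering [-90, 90]
# and 36 declination bins covering [-180, 180], i.e. 10-degree classes.
cnn_inclination = build_direction_cnn(18)
cnn_declination = build_direction_cnn(36)
for model in (cnn_inclination, cnn_declination):
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
```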
We have compared our predictions with those of other authors who have studied the same magnetic anomalies. In Figure 9, our prediction of the magnetization direction is indicated by the pink box. The center of the box corresponds to an inclination of 25 degrees and a declination of -135 degrees. Our predictions are consistent with previous studies.
Figure 9: Comparison of our estimates of the magnetization direction with different authors for the Black Hill site. The red box represents our prediction, while the blue symbols are predictions made by different authors, which are represented in the legend (Rajagopalan, 1993; Phillips, 2006; Foss & McKenzie, 2011; Coleman, 2014).