Precisely measuring seismic arrival times is a labor-intensive task but is critical for both earthquake monitoring and subsurface imaging. Recently published deep learning models have demonstrated superior performance compared to traditional automatic approaches for picking arrival times. Although existing deep learning models have shown promising results, further advancements are necessary because their performance is not yet satisfactory, especially when applied to new regions and station networks. Increasing model size has led to improved performance in other machine learning applications. We investigated whether enlarging deep learning models can improve performance on accepted benchmarks. We trained three models of varying sizes, small (1X), medium (4X), and large (16X), using globally distributed local and regional earthquake signals and background noise waveforms from a benchmark dataset, the Stanford Earthquake Dataset (STEAD). Our results indicate that the largest model (PickerXL) outperforms both the smaller models and the SeisBench implementation of the PhaseNet model, which has the same number of parameters as our small model. The PickerXL model’s enhanced capacity to extract complex patterns from seismograms contributes to its superior arrival-picking performance compared with the smaller models.
A comparison of absolute arrival time errors between ground truth and model predictions of the large model (PickerXL) using clean signals from the test set. (a) and (b) are the absolute arrival time errors for P and S waves computed using the large model of this study. (c) and (d) are the absolute arrival time errors computed using the SeisBench implementation of the PhaseNet model. Abs. Med. represents absolute median arrival time error (s). Abs. Avg. stands for absolute average arrival time error (s). STD means standard deviation. We consider model predictions with absolute arrival time errors larger than 0.5 s as outliers. (Chai et al., SRL, 2025)
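As a hedged illustration of the picking workflow, the sketch below applies the SeisBench PhaseNet baseline (with its publicly available STEAD-pretrained weights) to a placeholder three-component stream. The PickerXL models described above are not distributed through SeisBench, and the synthetic noise stream stands in for real waveforms.

```python
# Hedged sketch: picking P and S arrivals with the SeisBench PhaseNet baseline.
import numpy as np
import obspy
import seisbench.models as sbm

# Load PhaseNet with weights pretrained on the STEAD benchmark dataset
model = sbm.PhaseNet.from_pretrained("stead")

# Placeholder three-component stream: 60 s of noise at 100 Hz (replace with real data)
rng = np.random.default_rng(0)
stream = obspy.Stream([
    obspy.Trace(rng.normal(size=6000),
                header={"network": "XX", "station": "TEST", "channel": ch,
                        "sampling_rate": 100.0})
    for ch in ["HHZ", "HHN", "HHE"]
])

# Continuous P/S/noise probability traces
annotations = model.annotate(stream)
print(annotations)

# Threshold-based picks; newer SeisBench versions wrap picks in an output object
output = model.classify(stream)
picks = getattr(output, "picks", output)
for pick in picks:
    print(pick.phase, pick.peak_time, pick.peak_value)
```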
Our research has demonstrated the successful application of transfer learning to a neural network for seismic data whose spatial and temporal scales differ by orders of magnitude from the data used to train the original model. To achieve this, we developed a novel workflow that combines deep learning and double-difference seismic imaging techniques. Our new workflow has several advantages over traditional methods, including the ability to generate a more comprehensive seismic catalog and a larger number of phase picks than could be obtained through human analysis alone. By leveraging transfer learning, we were able to adapt a previously published deep-learning model using only 3,500 seismograms; training such a model from scratch usually requires hundreds of thousands of seismograms. This approach has significant implications for seismic imaging and earthquake monitoring, as it can greatly enhance our ability to process large volumes of seismic data quickly and accurately. By automating many of the tasks traditionally performed by human analysts, our method can improve the efficiency and effectiveness of seismic imaging and analysis and enable more rapid and accurate responses to seismic events.
A comparison of original (c, f, i) and updated (a, b, d, e, g, h) microseismic event locations associated with stimulations in May 2018 (a–c), June 2018 (d–f), and December 2018 (g–i). TL stands for transfer learning. Seismic events in panels (a), (d), and (g) show locations updated with TL‐derived phase picks using tomoDD. Panels (b), (e), and (h) show locations updated with manual picks using tomoDD. Fractures are clearly visible with TL‐derived phase picks. (Chai et al., GRL, 2020)
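The sketch below illustrates the transfer-learning step under stated assumptions: a pretrained SeisBench PhaseNet model is fine-tuned on a small batch of labeled windows, with random tensors standing in for the roughly 3,500 site-specific seismograms. It is not the exact training configuration used in the study.

```python
# Minimal transfer-learning sketch: fine-tune a pretrained picker on a small dataset
# instead of training from scratch. Placeholder data only.
import torch
import seisbench.models as sbm

model = sbm.PhaseNet.from_pretrained("stead")

# Optionally freeze most of the network and adapt only the final layers
for param in list(model.parameters())[:-4]:
    param.requires_grad = False

optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)

# Placeholder batch: 8 windows of 3 components x 3001 samples, plus P/S/noise masks
waveforms = torch.randn(8, 3, 3001)
labels = torch.softmax(torch.randn(8, 3, 3001), dim=1)

def picker_loss(pred, target, eps=1e-6):
    # Cross-entropy between predicted and target probability masks
    return -(target * torch.log(pred + eps)).sum(dim=1).mean()

model.train()
for epoch in range(5):
    optimizer.zero_grad()
    loss = picker_loss(model(waveforms), labels)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss = {loss.item():.4f}")
```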
We explored machine learning approaches to improve the efficiency of surface-wave data quality control. We evaluated five machine learning models - logistic regression, support vector machines, K-nearest neighbors, random forests (RF), and artificial neural networks (ANN) - using nearly 400,000 human-labeled waveforms. Our results indicate that the ANN and RF models outperformed the other algorithms, achieving a test accuracy of 92%. To validate our findings, we further evaluated the models using seismic events from geographic regions not included in the original training data. These tests demonstrated that the ANN and RF models accurately identify surface-wave data quality, achieving high levels of agreement with human analyst labels while requiring only 0.4% of the time needed by human analysts.
A comparison of performance and runtime of different machine learning algorithms (Chai et al., SRL, 2022).
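A minimal sketch of the model comparison is shown below, using scikit-learn with random placeholder features and labels in place of the engineered features from the roughly 400,000 labeled surface-wave records.

```python
# Sketch: compare the five classifiers on placeholder features and accept/reject labels.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))          # placeholder feature matrix
y = rng.integers(0, 2, size=2000)        # placeholder quality labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "Random forest": RandomForestClassifier(n_estimators=200),
    "ANN (MLP)": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500),
}

for name, clf in models.items():
    pipe = make_pipeline(StandardScaler(), clf)
    pipe.fit(X_train, y_train)
    print(f"{name}: test accuracy {pipe.score(X_test, y_test):.3f}")
```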
We developed a novel workflow that utilizes machine learning models to estimate the operation states (OFF, transition, and ON) and power levels of a nuclear reactor using seismic and acoustic data. The workflow employs two machine learning models to achieve the task. We achieved an overall accuracy of 98% for determining whether the reactor is OFF or ON. However, we observed that the accuracies for the transition state and power levels were lower, with a minimum accuracy of 66%. Despite these limitations, our findings highlight the potential of seismic and acoustic data as a valuable resource for obtaining information about the state and power levels of a nuclear reactor.
A workflow that estimates the operational states and power levels of a nuclear reactor with seismic and acoustic data (Chai et al., SRL, 2022).
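The sketch below illustrates one plausible form of the two-model workflow, assuming random-forest models and synthetic placeholder features and labels; it is not the exact models or features used in the study.

```python
# Hedged sketch of a two-model workflow: one classifier for operational state
# (OFF / transition / ON), a second model for power level applied only when ON.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(1500, 30))                        # spectral features per time window
state = rng.choice(["OFF", "transition", "ON"], 1500)  # placeholder state labels
power = rng.uniform(0, 100, 1500)                      # placeholder power level (%) labels

state_model = RandomForestClassifier(n_estimators=300).fit(X, state)
on_mask = state == "ON"
power_model = RandomForestRegressor(n_estimators=300).fit(X[on_mask], power[on_mask])

X_new = rng.normal(size=(5, 30))
for i, s in enumerate(state_model.predict(X_new)):
    level = power_model.predict(X_new[i : i + 1])[0] if s == "ON" else None
    print(f"window {i}: state={s}, power={level}")
```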
Seismic data recorded at industrial sites contain valuable information on anthropogenic activities. With advances in machine learning and computing power, new opportunities have emerged to explore the seismic wavefield in these complex environments. We applied two unsupervised machine learning algorithms to analyze continuous seismic data collected from an industrial facility in Texas, United States. The Uniform Manifold Approximation and Projection for Dimension Reduction algorithm was used to reduce the dimensionality of the data and generate 2D embeddings. Then, the Hierarchical Density-Based Spatial Clustering of Applications with Noise method was employed to automatically group these embeddings into distinct signal clusters. Our analysis of over 1400 hr (around 59 days) of continuous seismic data revealed five and seven signal clusters at two separate stations. At both stations, we identified clusters associated with background noise and vehicle traffic, with the latter’s temporal patterns aligning closely with the facility’s work schedule. Furthermore, the algorithms detected signal clusters from unknown sources, underscoring the ability of unsupervised machine learning to uncover previously unrecognized patterns. Our analysis demonstrates the effectiveness of unsupervised approaches in examining continuous seismic data without requiring prior knowledge or pre-existing labels.
One-day-long continuous vertical-component seismic data labeled using unsupervised machine learning algorithms. Each color represents a distinct signal cluster. (Chai et al., TSR, 2025)
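A minimal sketch of the UMAP plus HDBSCAN pipeline is shown below; the random feature matrix is a placeholder for features extracted from windows of continuous seismic data, and the parameter values are illustrative assumptions.

```python
# Sketch of the unsupervised pipeline: features -> UMAP 2-D embedding -> HDBSCAN clusters.
import numpy as np
import umap
import hdbscan

rng = np.random.default_rng(0)
features = rng.normal(size=(5000, 128))   # one row of features per time window

# Reduce to 2-D embeddings
embedding = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2).fit_transform(features)

# Group embeddings into signal clusters; label -1 marks unclustered noise points
labels = hdbscan.HDBSCAN(min_cluster_size=50).fit_predict(embedding)

print("clusters found:", sorted(set(labels) - {-1}))
```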
The EarthScope USArray Transportable Array (TA) has been a game-changer for seismic data collection in the United States, providing an unprecedented amount of high-quality seismic records that have enabled significant advances in seismic imaging. As the western U.S. TA data collection phase came to an end, our research focus shifted to developing advanced techniques to further improve the quality and resolution of seismic imaging. One technique we developed is a receiver function smoothing/interpolation approach that reduces scattering effects on receiver function waveforms, while also equalizing the lateral sampling distances between receiver functions and surface wave dispersion. By simultaneously inverting the interpolated receiver functions with Rayleigh wave group velocities and gravity observations, we were able to extract detailed information about the Earth's subsurface structure. Additionally, we employed clustering analysis to regionalize the 1D shear velocity profiles and identify correlations between the crust and upper mantle structures with geological provinces. These results offer new insights into the complex processes that shape the Earth's interior.
A comparison of stacked (traditional, left) and interpolated (right) receiver functions at a time slice of 3.8 s (Chai et al., GRL, 2015).
Depth slices of the shear-wave velocity model (Chai et al., GRL, 2015).
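The sketch below illustrates the regionalization idea with k-means clustering of synthetic 1D shear-velocity profiles; the clustering algorithm, profile values, and number of clusters are assumptions for illustration rather than the study's exact configuration.

```python
# Sketch of the regionalization step: cluster 1-D shear-velocity profiles so that
# grid points with similar crust/upper-mantle structure group together.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_nodes, n_depths = 800, 60
# Placeholder profiles: velocity (km/s) versus depth at each grid node
profiles = 3.0 + 1.5 * rng.random((n_nodes, 1)) + 0.1 * rng.normal(size=(n_nodes, n_depths))

# Group the velocity profiles; the number of clusters is a user choice
kmeans = KMeans(n_clusters=6, n_init=10, random_state=0).fit(profiles)
region_labels = kmeans.labels_            # one regional label per grid node
print(np.bincount(region_labels))
```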
Our research involved jointly inverting smoothed/interpolated P-wave receiver functions, surface-wave dispersions, and gravity observations for the eastern region of the United States. The resulting velocity model is highly detailed and broadly consistent with previously published results. In our analysis, we compared seismicity and subsurface structure and found that while earthquakes often occur near areas with seismic velocity variations, this is not always the case. We also observed that not all velocity changes are associated with seismic activity, highlighting the complex and nuanced relationship between subsurface structure and seismicity.
Crustal thickness map of (e) the Eastern United States and (a–d and f–i) example velocity profiles at eight locations. (Chai et al., G-cubed, 2022).
Slow-speed shallow structures, such as ice and sediments, can create large-amplitude reverberations in P-wave receiver function waveforms that obscure important information about deeper subsurface variations. To address this issue, we have developed a novel approach that leverages wavefield continuation and decomposition techniques to derive subsurface signals by removing these shallow reverberations. Our method uses the better-constrained shallow structure to back-propagate the teleseismic wavefield and isolate features in the subsurface waveforms that provide additional information about the velocity just beneath the shallow structure. By applying this technique, we can obtain subsurface signals that tightly constrain seismic speed changes at depth. We have applied our approach to several stations in Antarctica, using Markov chain Monte Carlo simulations to estimate speed variations beneath these stations. The resulting subsurface signals offer valuable insights into the complex processes that shape the Earth's subsurface in this region. Our technique is also applicable to sediment-covered regions and has the potential to significantly improve our ability to image subsurface structure and constrain seismic velocity variations in a wide range of geologic environments.
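As a hedged illustration of the sampling step, the sketch below runs a one-parameter Metropolis-Hastings (Markov chain Monte Carlo) simulation against a synthetic placeholder forward model; the real application inverts subsurface receiver-function signals for velocity variations.

```python
# Minimal Metropolis-Hastings sketch for estimating a velocity-like parameter.
import numpy as np

rng = np.random.default_rng(0)

def forward(velocity):
    # Placeholder forward model: predicted subsurface signal as a function of velocity
    t = np.linspace(0, 10, 200)
    return np.exp(-t) * np.sin(velocity * t)

v_true = 3.5
observed = forward(v_true) + 0.05 * rng.normal(size=200)
sigma = 0.05

def log_likelihood(v):
    residual = observed - forward(v)
    return -0.5 * np.sum((residual / sigma) ** 2)

samples, v_current = [], 3.0
ll_current = log_likelihood(v_current)
for _ in range(20000):
    v_prop = v_current + 0.05 * rng.normal()       # random-walk proposal
    ll_prop = log_likelihood(v_prop)
    if np.log(rng.random()) < ll_prop - ll_current:
        v_current, ll_current = v_prop, ll_prop    # accept the proposal
    samples.append(v_current)

samples = np.array(samples[5000:])                  # discard burn-in
print(f"posterior mean {samples.mean():.3f} +/- {samples.std():.3f}")
```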
Detecting and locating remote seismic events can be challenging due to limited station coverage. Surface waves are often recorded for these events, and cross-correlation techniques can be used to detect and locate them. We developed a method based on surface wave cross-correlations to detect and locate remote earthquakes. By comparing the similarity of candidate waveforms with a set of known earthquake templates, we were able to detect remote events. The locations of the detected earthquakes were obtained by searching a grid around the template events and maximizing the cross-correlation values with corresponding time shifts. We also computed the magnitudes of the detected events from the cross-correlation values. Our analysis revealed that seismicity is low in the oceanic ridge region.
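The sketch below illustrates the detection step with a normalized cross-correlation of a synthetic template against synthetic continuous data; the template, threshold, and data are placeholder assumptions, and the grid-search location and magnitude steps are omitted.

```python
# Sketch: slide an earthquake template along continuous data and flag detections
# where the normalized cross-correlation exceeds a threshold.
import numpy as np

rng = np.random.default_rng(0)
template = np.sin(2 * np.pi * 0.05 * np.arange(400)) * np.hanning(400)  # known-event template
continuous = 0.1 * rng.normal(size=20000)
continuous[12000:12400] += 0.8 * template                                # hidden "event"

def normalized_cc(data, tmpl):
    n = len(tmpl)
    cc = np.empty(len(data) - n + 1)
    tmpl = (tmpl - tmpl.mean()) / (tmpl.std() * n)
    for i in range(len(cc)):
        window = data[i : i + n]
        cc[i] = np.sum(tmpl * (window - window.mean())) / (window.std() + 1e-12)
    return cc

cc = normalized_cc(continuous, template)
detections = np.where(cc > 0.6)[0]
print("detection indices:", detections)
```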
We used a surface-wave-based double-difference algorithm to relocate approximately 60 moderate-earthquake epicenters, analyzed temporal changes in seismicity, and modeled Coulomb stress changes resulting from the first main earthquake. Our results suggest that the complex spatial distribution of aftershocks is not solely due to location errors but rather reflects the evolution of many short fault segments during the strain release of the aftershock sequence. The observed aftershock decay is typical of intraplate oceanic earthquakes, which produce relatively few aftershocks. Furthermore, the predicted Coulomb stress changes are generally consistent with most of the moderate-magnitude aftershocks.
Plots showing changes in Coulomb stress due to the first main earthquake (Chai et al., Tectonophysics, 2019).
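For background (a standard definition rather than an excerpt from the study), the Coulomb failure stress change on a receiver fault is commonly written as

\Delta \mathrm{CFS} = \Delta\tau + \mu' \, \Delta\sigma_n,

where \Delta\tau is the shear stress change resolved in the slip direction, \Delta\sigma_n is the normal stress change (positive for unclamping), and \mu' is the effective friction coefficient; positive \Delta \mathrm{CFS} moves a fault closer to failure.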
Geolocation of emergent seismic signals is challenging at close distances. We used three-component data from a seismic network and a targeted experiment at a research nuclear reactor to locate seismic sources. Using known events collected during the targeted experiment, we were able to infer source locations from the seismic amplitudes and polarization characteristics of the data. Although the resolution of the source location is not perfect, the seismic amplitudes and polarization analysis offer useful constraints. For the known events, the source region inferred with our analysis includes the true source locations. Synthetic tests, which show results similar to those from the field data, indicate that the limited resolution is largely due to limited data coverage and measurement uncertainties. We also identified the source of an unknown event through spectral cross-correlation between its signals and those of the known events. Our findings were confirmed by operational staff at the facility. When the propagation medium properties (i.e., seismic velocity and quality factor for attenuation) are known, our analysis can be applied to continuous data from a seismic array to infer both source amplitude and location. If the medium properties are not known, a targeted experiment can be conducted to estimate them.
A comparison of the inferred source region using amplitude measurements from all stations against the true location on a satellite image for an operational event. The bar diagram centered at each seismic station shows the distribution of inferred back azimuths from polarization measurements. A longer bar indicates that a larger number of inferred back azimuths fall in the direction of the bar. (Chai et al., BSSA, 2023)
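The sketch below illustrates the polarization part of the analysis: the covariance matrix of synthetic three-component data is eigen-decomposed, and the horizontal orientation of the dominant eigenvector gives a back-azimuth estimate (with the usual 180-degree ambiguity). The synthetic traces and noise level are assumptions.

```python
# Sketch: polarization-based back-azimuth estimate from three-component data.
import numpy as np

rng = np.random.default_rng(0)
true_baz = 60.0                                    # degrees clockwise from north
signal = np.sin(2 * np.pi * 5 * np.linspace(0, 2, 400))
north = np.cos(np.radians(true_baz)) * signal + 0.05 * rng.normal(size=400)
east = np.sin(np.radians(true_baz)) * signal + 0.05 * rng.normal(size=400)
vert = 0.3 * signal + 0.05 * rng.normal(size=400)

data = np.vstack([north, east, vert])
cov = np.cov(data)                                 # 3x3 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)
principal = eigvecs[:, -1]                         # eigenvector of the largest eigenvalue

baz = np.degrees(np.arctan2(principal[1], principal[0])) % 360.0
# Rectilinear polarization leaves a 180-degree ambiguity in the azimuth
print(f"estimated back azimuth: {baz:.1f} deg (or {(baz + 180) % 360:.1f} deg)")
```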
Our approach involves computing the stress field using finite element modeling, which takes into account both gravitational and far-field tectonic forces. To determine the orientation and magnitude of the tectonic force, we find the stress field that best fits the stress observations. We have found that the gravitational contribution to the horizontal stress field is comparable in magnitude to the tectonic contribution for the upper 5 km.
A comparison of magnitudes of deviatoric stress (left column) and its gravitational (middle column) and tectonic contributions (right column) at 5 km depth using the preferred stress model. The color scale changes for each image. The deviatoric stress tensor D is the sum of the gravitational contribution G and the tectonic contribution T. The first subscript of a tensor component is the orientation of the surface that the component acts on, and the second subscript is the direction of the component. Subscript 1 represents north, 2 east, and 3 down. (Chai et al., JGR, 2021)
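The sketch below restates the decomposition in the caption's notation, D = G + T, with illustrative (not study-derived) tensor values and a common scalar measure of deviatoric stress magnitude.

```python
# Illustrative decomposition: deviatoric stress D = gravitational G + tectonic T
# (indices 1 = north, 2 = east, 3 = down). Values are placeholders, not study results.
import numpy as np

G = np.array([[ 5.0,  1.0,  0.5],
              [ 1.0, -2.0,  0.2],
              [ 0.5,  0.2, -3.0]])   # gravitational contribution (MPa), trace-free
T = np.array([[ 3.0, -0.5,  0.0],
              [-0.5,  1.0,  0.0],
              [ 0.0,  0.0, -4.0]])   # tectonic contribution (MPa), trace-free

D = G + T                             # deviatoric stress tensor

# Scalar magnitude (square root of the second invariant) often used for mapping
magnitude = np.sqrt(0.5 * np.sum(D * D))
print(D)
print(f"deviatoric stress magnitude: {magnitude:.2f} MPa")
```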
As seismic data and models become more complex and voluminous, the demand for scientific visualization tools has increased. To meet this need, we have developed several interactive visualization tools that use modern internet browser interfaces to display data and models side by side. These tools provide dynamic representations of data and models, with interactive inputs such as clicks, slider bars, and buttons that give users a more accurate and intuitive understanding of the data and models.
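As a hedged illustration of this style of browser-based interactivity, the sketch below uses Bokeh (one possible choice, not necessarily the library behind the actual tools) to let a slider select which depth slice of a synthetic velocity model is displayed.

```python
# Sketch: a slider-driven browser view of depth slices from a synthetic velocity model.
# Run with: bokeh serve this_script.py
import numpy as np
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, Slider
from bokeh.plotting import curdoc, figure

depths = np.arange(0, 100, 10)                      # depth slices (km)
nx, ny = 50, 50
model = np.random.default_rng(0).normal(3.5, 0.3, size=(len(depths), ny, nx))

source = ColumnDataSource(data={"image": [model[0]]})
fig = figure(title=f"Shear velocity at {depths[0]} km", x_range=(0, nx), y_range=(0, ny))
fig.image(image="image", x=0, y=0, dw=nx, dh=ny, palette="Viridis256", source=source)

slider = Slider(start=0, end=len(depths) - 1, value=0, step=1, title="Depth slice index")

def update(attr, old, new):
    idx = int(new)
    source.data = {"image": [model[idx]]}           # swap in the selected depth slice
    fig.title.text = f"Shear velocity at {depths[idx]} km"

slider.on_change("value", update)
curdoc().add_root(column(slider, fig))
```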