Abstract
Acoustic emissions (AEs) are bursts of elastic waves generated by ruptures in laboratory rock mechanics experiments that mirror typical seismograms recorded in natural earthquakes, albeit at much higher frequencies. Traditionally, AE events were manually sorted and picked—a time-consuming and daunting task. Recently, automatic methods based on machine learning (ML) or template matching have been applied to detect AE events. In order to accurately and quickly analyze a large quantity of raw AE waveforms, the current study explores the direct application of ML tools designed for regular earthquake waveforms to the AE detection and picking process. We investigated applications of a deep-learning-based detector EQTransformer (EQT) that was trained on global earthquake data to laboratory AE datasets without retraining. Two AE datasets were collected from laboratory deformation experiments during the syn-deformational phase transformation from olivine to spinel in Mg2GeO4. We compared EQT’s performance on AEs to its published performance on natural earthquakes, as well as to a neural network (NN) designed for AE detection and picking called MultiNet. When applied to dataset D2540, EQT detected all 3901 previously identified events in the dataset with a mean P-pick error of < 1 sampling point, in addition to 2521 previously undetected events. For dataset D1247, EQT also detected all 550 known events with a mean error of < 1 sampling point, as well as 22 new events. In both cases, EQT performed within the standards advertised for EQT on earthquake data and with similar precision to MultiNet. Our results indicate that the EQT model pre-trained using global seismic data can be directly applied to accurately pick AE events in laboratory settings, with robust performance across multiple recording channels.

1 Introduction
Acoustic emissions (AEs) are mechanical waves corresponding to sudden irreversible changes within a sample material, serving as a measure of rapid cracking (creation or growth) at in situ conditions (Lei and Ma 2014; Lockner and Byerlee 1992). Unlike the more well-known ultrasonic testing that uses active waveform pulses to interrogate a sample’s elastic properties, AE testing passively “listens” to acoustic signals generated by small ruptures in the sample during deformation. Therefore, an AE waveform reflects the convolution of both its source mechanism and subsequent path effect. AEs in rock samples are studied as a model of natural earthquakes; it is well established that earthquakes in the crust and AE events in stressed rocks show mechanical and statistical similarities in a wide range of aspects (Lei and Ma 2014). The cumulative number of AEs in a sample denotes total internal damage, while tracking AE paths and energy peaks can be used to recreate fracture nucleation and growth patterns, which are essential for understanding laboratory earthquake precursors and prediction (Bolton et al. 2020; Goebel et al. 2024).
Numerous AE experiments have been conducted to study fractures facilitated by mineral phase transformations at a wide range of pressures (P) and temperatures (T). This makes AE testing a prime analytical technique for evaluating different hypotheses of physical mechanisms of deep earthquakes—seismicity below ~ 70 km depth, where rocks generally deform in a ductile fashion (Ohuchi et al. 2017, 2022; Shi et al. 2022, 2024; Wang et al. 2017). Following the collection of these internal waves generated by small ruptures, source location algorithms utilize the first arrivals identified from the signals to determine AE origin locations with time (Lockner 1993). In such experiments, relocated AEs can be combined with post-mortem microstructure analysis (Schubnel et al. 2013) or X-ray microtomography to link AEs to specific faults (Wang et al. 2017; Officer and Secco 2020) and even track nanoseismic event propagation along faults analogous to those proposed to form at depths of several hundred kilometers (Wang et al. 2017; Officer et al. 2022).
AEs are detected outside the sample by piezoelectric transducers, which convert changes in strain or force to an electric voltage. AE data are typically recorded in two ways, both of which present issues. One is to record continuously, whether events occur or not. While continuous recording ensures all potential events are captured, it may result in a very large dataset without many true events and even run the risk of filling up the entire disk space of the recording instrument. This size can quickly become debilitatingly large for analysts manually picking P-waves, and researchers are often forced to rely on a second recording method, event-triggered recording (Schubnel et al. 2013). In this method, a predetermined amplitude threshold is set and only the events whose amplitudes exceed the threshold are recorded. While this eliminates excess data, relying on thresholding may fail to capture all the events, particularly the smallest events. Additionally, the signals can be affected by electrical noise, which can overwhelm the triggered data with spurious signals. Without careful manual intervention, this will cause the total number of AE events to be misestimated and reduce information about fault nucleation on the smallest scale.
In the past decades, multiple techniques have been introduced to automatically detect earthquake signals recorded in natural settings. These include the matched filter technique, also known as template matching, which utilizes waveforms of known events as templates to scan through additional events with waveform cross-correlation techniques (Gibbons and Ringdal 2006; Lei et al. 2022; Peng and Zhao 2009). More recently, machine learning (ML) techniques trained on millions of labeled waveforms have quickly gained popularity in seismology due to their improved consistency across channels and stations, robustness to noise, and ability to handle waveform variations (Kong et al. 2019; Münchmeyer et al. 2022). As seismic datasets have grown larger and more complex (Arrowsmith et al. 2022), these advantages have made ML particularly appealing to seismologists. ML methods can likewise be applied to detect and pick arrival times of AEs in laboratory settings, which often contain analogous challenges (Guo et al. 2021; Lei et al. 2022; Tanaka et al. 2021; Wang et al. 2024). MultiNet (Li et al. 2022) is one such specialized neural network (NN) that has demonstrated promising results in AE detection and picking. However, unlike natural earthquakes where millions of cataloged waveforms and picks are currently available (Mousavi et al. 2019), labeled AE data are still limited with only a few recent exceptions (Lei 2024). While the collection of AEs allows researchers to retrieve in situ measurements that would otherwise be impossible to obtain, the quantity of data and the labor required to retrieve accurate spatial and temporal resolution can be a significant drawback. At the length scale of these experiments, small errors in P-wave arrival time can result in mm-scale errors in event relocation. Given that high-pressure generation necessitates restricted sample size, this can render relocation errors of the same order of magnitude as the sample length (Officer et al. 2022).
In this study, we directly apply an existing deep NN called EQTransformer (EQT) (Mousavi et al. 2020), which was trained on the global earthquake dataset STEAD (Mousavi et al. 2019), to detect and pick AEs generated during two different high P–T experiments investigating deep-focus earthquakes (DFEQs). Our motivation for selecting EQT as opposed to a specialized ML model such as MultiNet is twofold. By successfully repurposing EQT’s pre-trained model, we are able to immediately apply EQT without relying on labeled AE datasets, allowing us to take advantage of the millions of already labeled seismic waveforms. Also, EQT is open-source and community-maintained, making it a more sustainable and practical choice for future laboratory earthquake research. Our objective is therefore to evaluate EQT’s performance on a completely new dataset without retraining. We compare our detected results with both manual picks and MultiNet. In the next sections, we detail the experimental setup, EQT’s application, parameter tuning, evaluation metrics, and comparative analyses. We present detailed results and discuss EQT’s scalability and future potential as an efficient tool to evaluate AEs and other events in earthquake-related studies.
2 Data and methods
The two DFEQ experimental analogs on which we applied EQT returned both event-triggered data and continuous data from piezoelectric transducers attached to the rear of six orthogonally oriented anvils in the D-DIA (Y. Wang et al. 2003) and DDIA-30 (Shen and Wang 2014) multi-anvil deformation apparatuses (Fig. 1a). These were used to generate two datasets, labeled D1247 (Schubnel et al. 2013; Wang et al. 2017) and D2540, respectively, collected during experiments of syn-deformational phase transformation from olivine to spinel in Mg2GeO4 in high P–T environments to investigate DFEQs. A more detailed description of the experimental techniques and motivations used for the datasets can be found in previous publications (Officer et al. 2022; Schubnel et al. 2013; Wang et al. 2017). For both experiments, event-triggered data were collected at 50 MHz for all six transducers. The signal was amplified between 40 and 60 dB, and the AEs were registered using a trigger and hit count unit (THC) (Wang et al. 2017). As continuous data pass through the unit, recordings are automatically discarded unless AE events register enough signal to surpass a predetermined voltage threshold. Once an event was registered, an 81.9–163.8 µs trace (comprising 4096–8192 sampling points) was extracted to capture the triggering signal.
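The threshold-triggered recording described above can be illustrated with a minimal sketch. It is not the THC unit's actual logic: the function name, the pre-trigger length, the rule that samples inside a window cannot re-trigger, and the toy trace are all assumptions made for illustration.

```python
from typing import List, Tuple


def triggered_windows(signal: List[float], threshold: float,
                      window: int, pretrigger: int = 0) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of fixed-length windows that are cut
    whenever |signal| exceeds `threshold`. Samples inside a window that has
    already been cut cannot trigger a second window (an assumption)."""
    windows = []
    i, n = 0, len(signal)
    while i < n:
        if abs(signal[i]) > threshold:
            start = max(0, i - pretrigger)
            end = min(n, start + window)
            windows.append((start, end))
            i = end  # skip past the recorded window
        else:
            i += 1
    return windows


# Hypothetical toy trace: a small event followed by a large one.
trace = [0.0] * 20
trace[5] = 0.4    # small event, below a threshold of 1.0
trace[12] = 2.0   # large event, above a threshold of 1.0

# Only the large event is recorded; the small one is discarded.
print(triggered_windows(trace, threshold=1.0, window=6, pretrigger=2))
```

With the threshold at 1.0 the small event never reaches the recorder, which is the limitation of event-triggered data that the continuous recordings are meant to address.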
Fig. 1 Experimental setup. (a) Diagram of the DDIA-30 experimental setup used to generate the AE data in D2540. The sample pressure cell is centered between the six anvils. The short arrows indicate the compression of the cell by the anvils. (b) Periodic background signal for a 0.3-s segment of continuous data from D2540 compared to an acoustic emission event (in the dashed box). The signal in the box is magnified 250 × to show the unprocessed event, the absolute value of the signal after applying a 1-MHz high-pass filter, and the spectrogram in (c), (d), and (e), respectively
Fig. 3 EQT output. EQT waveforms generated from an event detected in (a) D1247 on the six event-triggered channels, and (b) D2540 on the four continuous channels. In both (a) and (b), each trace is 60 s long at 100 Hz, or 6000 sampling points. Each event is detected on all available channels and displays a detection probability above 0.6 and a P-wave probability above 0.3. Because (a) displays triggered data, the tail of the preceding appended event can be seen on each of the six channels’ traces
Meanwhile, waveform data on channels 3–6 were continuously recorded at 10 MHz in tandem with the event-triggered data. This allows for the detection of non-impulsive and long-period signals that do not cross the THC threshold (Ohuchi et al. 2022). Additionally, continuous data are recorded at a relatively low gain of 30 dB compared to 40 dB in event-triggered data. This reduces the sinusoidal background in the data—a product of large electrical noise resulting from the acoustic equipment. A benefit of this lower gain is a decrease in the saturation of large events as compared to the event-triggered data. Continuous recording also allows for potential correlation between events that were only triggered on some channels, which we used to analyze EQT’s detection of new events (see the following sections).
The first dataset, D2540, consisted of 3901 triggered AE events with manually picked P-wave arrivals, as well as a full continuous dataset. EQT was applied to the continuous waveforms and then compared to EQT’s advertised success rate from its intended application on global seismicity, published in Mousavi et al. (2020). This allowed us to quantify the precision of EQT’s AE P-picks compared to earthquake P-picks. The second dataset, D1247, consisted of 550 manually picked triggered AE events, and was recently analyzed using a purpose-built AE detector and picker, MultiNet (Li et al. 2022). Comparing our results to MultiNet allowed us to quantify the ability of EQT to detect and pick AE P-waves compared to an independent NN developed specifically for AE detection. In experiment D2540, EQT was run on the continuous data because it contained a wider variety of waveforms to quantify the model’s performance. For D1247, EQT was run on the event-triggered files so a direct comparison to MultiNet could be made. A schematic depicting this process is shown in Fig. 2.
Because EQT was trained on genuine earthquake data, both datasets were manipulated to mimic seismic data from regular earthquakes. This included adding placeholder metadata to the miniSEED event files, such as network, station, location, and channel information. Additionally, we altered the time series’ sampling rates. For both D2540 and D1247, EQT read the data as 100 Hz, allowing us to directly compare the datasets to one another and to EQT’s advertised performance on earthquake data. To reduce subsequent computation time, we downsampled the 50-MHz event-triggered D1247 data to match the continuous 10 MHz D2540 data. Before downsampling, a low-pass filter with a cutoff frequency of 4.5 MHz was applied to the dataset to prevent aliasing. We then applied a scaling correction to the time and distance units for both datasets in EQT, taking advantage of the common 10^6 scaling factor between “labquakes” (µs, mm) and earthquakes (s, km). Manually selected P-wave arrival times were picked using the event-triggered data, totaling 3901 and 550 for D2540 and D1247, respectively. These were treated as the true arrival times in order to evaluate EQT’s performance. For the triggered data on each of the six channels in D1247, we appended the 550 individual traces to form a single 45,056 µs waveform with 550 confirmed events, 550 times the length of each 81.9 µs event-triggered file.
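The trace-length and downsampling figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

TRIGGERED_RATE_MHZ = 50      # event-triggered sampling rate (samples per microsecond)
CONTINUOUS_RATE_MHZ = 10     # continuous sampling rate
SAMPLES_PER_TRACE = 4096     # shortest triggered trace
N_EVENTS = 550               # triggered traces appended per channel in D1247
LOWPASS_CUTOFF_MHZ = Fraction(9, 2)  # 4.5 MHz anti-aliasing cutoff

# Duration of one triggered trace and of the appended waveform, in microseconds.
trace_us = Fraction(SAMPLES_PER_TRACE, TRIGGERED_RATE_MHZ)
appended_us = trace_us * N_EVENTS

# Downsampling 50 MHz -> 10 MHz keeps every 5th sample; the new Nyquist
# frequency is 5 MHz, so a 4.5 MHz cutoff sits below it.
decimation = TRIGGERED_RATE_MHZ // CONTINUOUS_RATE_MHZ
nyquist_after_mhz = Fraction(CONTINUOUS_RATE_MHZ, 2)

print(float(trace_us))                          # trace length in microseconds
print(float(appended_us))                       # appended waveform length in microseconds
print(decimation, LOWPASS_CUTOFF_MHZ < nyquist_after_mhz)
```

The 81.9 µs quoted in the text is the rounded value of 4096 samples at 50 MHz (81.92 µs), which is why 550 traces give exactly 45,056 µs.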
In addition to re-packaging the AEs as standard seismic waveforms, we also tuned EQT’s parameters to optimize performance for AE data. The variable parameters utilized were detection threshold, P-pick threshold, batch size, and percent overlap. The targeted performance was to eliminate false negatives, but we further evaluated EQT’s performance using the following assessments: mean error (µ), mean absolute error (MAE), standard deviation (σ), and F1-score. Mean error, mean absolute error, and standard deviation characterize the spread of error between P-wave arrival times that were picked by both EQT and analysts. These are the measures used to compare EQT’s performance on D2540 to its advertised performance on earthquakes. The F1-score, on the other hand, evaluates the model’s robustness using the harmonic mean of precision and recall:
$$F1 = \frac{2P_T}{2P_T + P_F + N_F},$$
where P_T is the number of true positives, P_F is the number of false positives, and N_F is the number of false negatives. This score is used to compare EQT’s performance on D1247 to MultiNet’s performance on the same dataset. All four metrics are used to compare this study’s results to each other.
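As a minimal sketch, the F1-score can be computed directly from the three counts and checked against the harmonic mean of precision and recall. The function names and the example counts below are hypothetical and are not taken from either dataset.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN)."""
    denom = 2 * true_pos + false_pos + false_neg
    if denom == 0:
        raise ValueError("F1 is undefined when all counts are zero")
    return 2 * true_pos / denom


def f1_from_precision_recall(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Same quantity written as the harmonic mean of precision and recall."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
tp, fp, fn = 90, 6, 4
print(f1_score(tp, fp, fn))
print(f1_from_precision_recall(tp, fp, fn))

# With zero false negatives the score is still below 1 if false positives exist.
print(f1_score(90, 6, 0))
```

The last line illustrates that detecting every known event (no false negatives) does not by itself give a score of 1; false positives still lower it.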
EQT also detected and picked new events that were previously missed by either the event-triggered threshold or analyst. To characterize the new detections as true AE events, we utilized EQT’s uncertainty estimation. By using the dropout technique (Gal and Ghahramani 2016)—a form of stochastic regularization—between each layer of its NN, EQT can make probabilistic statements related to its parameters. For each event, EQT outputs detection and picking uncertainties (Fig. 3). These results were compared between events and across channels. Because EQT demonstrated strong cross-channel consistency, to confirm each of the new events found in both datasets, we required the event to be detected on three of the four continuous channels in D2540, or five of the six event-triggered channels in D1247. New events were only considered true if they were detected on the minimum number of channels and their detection and picking uncertainties were at least equivalent to the minimum uncertainties in the surrounding known events.
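The channel-count part of this confirmation rule can be sketched as follows. The grouping tolerance, the clustering rule (picks are grouped relative to the first pick in a cluster), the channel labels, and the pick times are assumptions for illustration. The uncertainty criterion described above is not included.

```python
from typing import Dict, List


def confirm_events(picks_by_channel: Dict[str, List[float]],
                   min_channels: int, tolerance: float) -> List[float]:
    """Group picks from all channels that fall within `tolerance` of the
    first pick in a group, and keep groups seen on at least `min_channels`
    distinct channels. Returns the mean pick time of each kept group."""
    tagged = sorted((t, ch) for ch, times in picks_by_channel.items() for t in times)
    clusters: List[List[tuple]] = []
    for t, ch in tagged:
        if clusters and t - clusters[-1][0][0] <= tolerance:
            clusters[-1].append((t, ch))
        else:
            clusters.append([(t, ch)])
    confirmed = []
    for cluster in clusters:
        channels = {ch for _, ch in cluster}
        if len(channels) >= min_channels:
            confirmed.append(sum(t for t, _ in cluster) / len(cluster))
    return confirmed


# Hypothetical P-pick times (microseconds) on four continuous channels.
picks = {
    "ch3": [10.0, 50.0],
    "ch4": [10.1, 50.2, 80.0],
    "ch5": [9.9],
    "ch6": [10.05, 50.1],
}

# Require three of four channels, as for the continuous D2540 data.
print(confirm_events(picks, min_channels=3, tolerance=0.5))
```

In this toy case the pick at 80.0 µs appears on a single channel and is rejected, while the other two groups meet the three-channel requirement.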
An additional EQT parameter is S-pick probability. A crucial difference between our AEs and typical earthquake waveforms is the S-wave. In earthquake waveform data, the S-waves are typically larger than the P-waves. However, in the AEs collected in this study, the S-waves are much weaker. Unlike typical seismometers that have three components, piezoelectric transducers only have one component, oriented to measure vibrations perpendicular to the transducer surface. Most of the S-wave energy is polarized perpendicular to this; the remaining contribution of S-waves polarized parallel to this direction is very small. Because reverberations inside the crystals caused by the elastic waves tend to contaminate the later signals, including S-phase detection thresholds in our model significantly increased the number of false negatives in our results and lowered our other performance metrics. Given the disproportionately small contribution and large contamination of S-waves, we therefore elected to only compare P-picks and event detections in our subsequent analyses.
3 Results
For experiment D2540, we employed the now deprecated EqT_model2.h5—corresponding to the updated Original Model—which was designed to be robust to false negatives (Mousavi et al. 2020). A detection threshold of 0.6 and a P-pick threshold of 0.3 were used, along with a batch size of 500 and 30% overlap. These thresholds were chosen to eliminate false negatives; the run was considered successful if EQT detected all the manually picked events in each experiment. When applied to the four continuous channels, EQT identified 6422 events (Fig. 4a) that were detected on at least three out of four channels, including all original 3901 events. This is a 64.62% increase in the number of events originally found using the triggered data. One such newly identified event is shown as an example in Fig. 5, where it can be seen ~ 15 µs before a much larger event that was captured in the event-triggered dataset. The newly identified event was not recorded by the THC unit because it was too small to cross the trigger threshold, but EQT detected it on three of the four channels independently of the larger previously detected event.
Fig. 4 EQT D2540 performance. (a) Euler diagram of events detected by EQT (blue), and by both EQT and analysts (purple). No events were detected solely by analysts, which would correspond to a false negative. (b) Frequency and density distribution of arrival time differences for the 3901 D2540 P-waves picked by analysts and EQT
Fig. 5 Undetected event. (a) EQT output of one of the 2521 previously undetected events. The blue line represents the P-wave of the new event, while the purple line represents the P-wave of the event-triggered event. (Note: these colors correspond to the Euler diagram in Fig. 4a.) (b) A segment of the continuous recording system trace corresponding to this missed event for Channels 1, 2, and 4. The arrow on the right indicates progressively worse signal-to-noise ratios
To test the accuracy of the EQT P-picks, we calculated the arrival time differences between the 3901 P-waves that were picked both manually using the event-triggered data and by EQT. This distribution is shown in Fig. 4b. Treating the manual picks as the true arrival times, the P-pick mean difference of these events is − 0.7630 ns, which is much less than the 20 ns sampling interval and can be considered effectively zero. In addition, the distribution is nearly symmetric about the mean. The mean absolute error is 0.05848 µs (< ± 6 sampling points when corrected to 100 Hz and scaled by 10^6), with a standard deviation of 0.07491 µs (< ± 8 sampling points). The small absolute size of the mean and normal distribution of the errors signifies that EQT is not biased toward early or late picking. The F1-score is 0.9770.
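A minimal sketch of how the three pick-error statistics can be computed from paired picks is given below. The sign convention (automatic minus manual), the use of the population standard deviation, the function name, and the example pick times are all assumptions; the text does not state which conventions were used.

```python
from statistics import mean, pstdev
from typing import Dict, List


def pick_error_stats(auto_picks: List[float], manual_picks: List[float]) -> Dict[str, float]:
    """Mean error, mean absolute error, and standard deviation of the
    differences (automatic minus manual) between paired P-picks."""
    if len(auto_picks) != len(manual_picks):
        raise ValueError("pick lists must be paired one-to-one")
    errors = [a - m for a, m in zip(auto_picks, manual_picks)]
    return {
        "mean_error": mean(errors),
        "mae": mean([abs(e) for e in errors]),
        "std": pstdev(errors),  # population standard deviation (assumed)
    }


# Hypothetical paired pick times in microseconds.
manual = [10.00, 20.00, 30.00, 40.00]
auto = [10.05, 19.95, 30.10, 39.90]

stats = pick_error_stats(auto, manual)
print(stats)
```

In this toy case the early and late picks cancel, so the mean error is near zero while the MAE and standard deviation remain finite, which is the pattern interpreted above as an absence of bias.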
In order to contextualize our results, we compared EQT’s performance on D2540 to its performance published in Mousavi et al. (2020) on the Mw 6.7 Tottori earthquake on October 6, 2000. Using detection, P-pick, and S-pick probability thresholds of 0.5, 0.3, and 0.3, respectively, Mousavi et al. (2020) found that EQT detected 401,566 P- and S-wave arrival times, compared to 279,104 arrival times picked manually by Japan Meteorological Agency (JMA) analysts. This is 143.9% of the number of arrival times picked manually. Of the ~ 42,000 arrival times that were picked from the same stations by both EQT and JMA analysts, Mousavi et al. (2020) showed that the mean error distribution in P-pick time was 0.01 s, the MAE was 0.06 s (6 sampling points), and the standard deviation was 0.08 s (8 sampling points). Their results are directly comparable to ours because of the common scaling factor between AE and typical seismic data.
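The conversion from laboratory pick errors to sampling points can be written out explicitly. The sketch below reflects one reading of the scaling described in the text: microseconds are scaled by 10^6 to earthquake-equivalent seconds and then divided by the 0.01 s sampling interval that EQT assumes at 100 Hz. The function and constant names are illustrative.

```python
SCALE_FACTOR = 10**6   # labquake (µs) to earthquake (s) scaling
EQT_RATE_HZ = 100      # sampling rate at which EQT reads the data


def lab_error_to_points(error_us: float) -> float:
    """Convert a lab-scale error in microseconds to EQT sampling points."""
    seconds_equivalent = error_us * 1e-6 * SCALE_FACTOR  # µs -> s, then scaled
    return seconds_equivalent * EQT_RATE_HZ


# D2540 values reported in the text.
d2540_mae_points = lab_error_to_points(0.05848)
d2540_std_points = lab_error_to_points(0.07491)

# Values published for earthquake data (already in seconds).
published_mae_points = 0.06 * EQT_RATE_HZ
published_std_points = 0.08 * EQT_RATE_HZ

print(round(d2540_mae_points, 3), round(published_mae_points, 3))
print(round(d2540_std_points, 3), round(published_std_points, 3))
```

Under this reading, the D2540 MAE and standard deviation fall just below the published 6 and 8 sampling points, respectively.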
For experiment D1247, because the dataset is generated using similar experimental conditions, the model and threshold parameters that were tuned according to the D2540 data were used. An important distinction between the two datasets, however, is that EQT was run only on the event-triggered data for D1247. This was done to replicate the conditions used to evaluate MultiNet, which was solely applied to the triggered data (Li et al. 2022). When applied to the 550 triggered traces on each of the six channels, EQT detected and picked 572 events and P-waves on at least five out of six channels, including all 550 original events. This is a 4% increase in the number of P-waves picked manually using only the event-triggered files, which requires that undetected events happened to be captured within the same trace as an event registered using the THC system. This result from EQT illuminates one of the primary benefits of high-sampling continuous recording. Given the low likelihood of multiple events captured within the same trace, it is not unreasonable to assume that it could detect many more small events if applied to continuous data, as it did when applied to the D2540 dataset.
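The detection gains quoted for both datasets, and for the Tottori comparison, follow from simple count arithmetic. The helper below is an illustrative sketch using only counts stated in the text.

```python
from typing import Tuple


def detection_gain(known: int, detected: int) -> Tuple[int, float, float]:
    """Return (new events, percent increase over known, detected as percent of known)."""
    new = detected - known
    return new, 100 * new / known, 100 * detected / known


# D2540: 3901 known events, 6422 detected by EQT on continuous data.
d2540_new, d2540_increase, d2540_ratio = detection_gain(3901, 6422)

# D1247: 550 known events, 572 detected by EQT on event-triggered data.
d1247_new, d1247_increase, d1247_ratio = detection_gain(550, 572)

# Tottori arrival times: 279,104 manual picks, 401,566 EQT picks.
_, _, tottori_ratio = detection_gain(279104, 401566)

print(d2540_new, round(d2540_increase, 2), round(d2540_ratio, 1))
print(d1247_new, round(d1247_increase, 2), round(d1247_ratio, 1))
print(round(tottori_ratio, 1))
```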
Using the 550 common events, we calculated the arrival time differences between manual and automatic picks (Fig. 6c). Once again, we assume the manual analyst picks are the true arrival times. This results in a mean error of 0.006297 µs, a mean absolute error of 0.09367 µs (~ ± 9 sampling points when corrected to 100 Hz and scaled by 10^6), and a standard deviation of 0.1071 µs (< ± 11 sampling points). The corresponding F1-score is 0.9905. Because this mean error is again very small compared to the other measures, we can continue to infer that there is no bias towards picking early or late.
Fig. 6 EQT D1247 performance. (a) Frequency and density distribution of arrival time differences for the 500 P-waves picked by both MultiNet and EQT. (b) The relationship between arrival time differences for EQT when compared to MultiNet and human analysts for the 500 events picked by each of the three methods. (c) Frequency and density distribution of arrival time differences for the 550 P-waves picked by both analysts and EQT
To mirror the comparison of the D2540 EQT results to the EQT results when applied to earthquake data published in Mousavi et al. (2020), the D1247 EQT results were compared to MultiNet’s performance on the same D1247 dataset from Li et al. (2022). Using a convolutional NN to identify AE events and a fully convolutional NN to pick P-wave arrivals, Li et al. (2022) determined the optimal output threshold and image length to be 0.001 and 256, respectively. With these parameters, MultiNet detected 548 events in 500 of the event-triggered waveforms, which translates to 109.6% of the original events picked manually (Li et al. 2022). It is important to note that 50 triggered waveforms were removed from the dataset for training purposes. Therefore, MultiNet identified all 500 of the manually picked events on which it was run, as well as 48 new events (Li et al. 2022). This resulted in a calculated F1-score of 0.9930.
Because both EQT and MultiNet were applied to the same dataset, it is possible to directly compare their respective arrival time picks (Fig. 6a). Using the 500 common events between the two methods, we calculated the difference in EQT and MultiNet pick times for each event. The result is a mean time difference of − 0.0795 µs, mean absolute error of 0.04734 µs (< ± 5 sampling points), and standard deviation of 0.03112 µs (~ ± 3 sampling points). Instead of considering one method the true value and analyzing the other method’s “error”, here we compare these measures to the calculated arrival time differences between manual and EQT picks defined above (µ = 0.006297 µs, MAE = 0.09367 µs, σ = 0.1071 µs). We found that the time difference still follows a normal distribution, but now the mean difference is an order of magnitude larger than when EQT was compared to manual picks on both datasets (Fig. 6a, c).
Additionally, the measures of spread in the data—mean absolute error and standard deviation—are approximately half as large as the corresponding values in the manual difference calculations, which is represented by the short, wide density distribution in Fig. 6b. This indicates that EQT is generally in better agreement with MultiNet than with the analysts’ P-picks. While this could be due to the similarities of the two deep-learning-based methods, it may also indicate a randomness associated with analyst picks. However, measures of spread in both cases are within the same order of magnitude, demonstrating that the agreement between the two models is still similar to that between EQT and analysts.
4 Discussion
By comparing EQT’s results on continuous data in experiment D2540 to EQT’s performance in Mousavi et al. (2020), we analyzed EQT’s ability to detect and pick AEs without retraining. We also measured EQT’s results on event-triggered data in experiment D1247 against MultiNet’s results on the same dataset in Li et al. (2022) to compare EQT’s performance to an NN purpose-built for AEs. When applied to D2540 and measured with mean, mean absolute error, and standard deviation, EQT performed within one sampling point (20 ns) of its published arrival time differences on traditional seismic data. Despite the large difference in dataset size, EQT also achieved a similar increase in the percentage of events detected in this dataset: 164.6% in this study and 143.9% in Mousavi et al. (2020). When applied to D1247 event-triggered data and measured with F1-score, EQT matched MultiNet in capturing all the original known events. However, MultiNet was able to identify over twice as many new events as EQT (Li et al. 2022), which explains why MultiNet’s F1-score was 0.9930 compared to EQT’s 0.9905 (Table 1). Evaluating the events that analysts and both models detected, the differences between EQT and MultiNet arrival times are notably more tightly constrained than the differences between EQT and analyst picks.
Directly comparing the results of D2540 and D1247 in this study highlights differences between the two experiments’ data and provides insight into EQT’s performance. Potentially the greatest difference is the relative number of previously undetected true positives: in D2540, EQT detected 2521 newly identified events (a 64.62% increase), while in D1247 EQT detected 22 (a 4% increase). From these initial findings, the number of true positives appears to positively correlate with the number of data points. However, it also depends on the origin of the events; while D1247 records at 50 MHz compared to D2540’s 10 MHz, D2540 preserves the waveforms between triggered events with continuous recording. This introduces an inherent limitation when applying detection methods to event-triggered recordings. A significant number of the previously unidentified events in D2540 occurred between triggered traces and were missed because they were not large enough to trigger the THC unit. Consequently, when applied to only event-triggered data in D1247, EQT can only detect new events that were missed because they were too small to cross the threshold but were by chance present in the trace of a larger event. No matter how sensitive the model is, it cannot detect events that were not recorded. As expected, because D2540 contains both types of unidentified events, EQT detected more previously missed events in this dataset. Given ongoing advances in continuous, high-sampling, multi-channel waveform recording, EQT’s utility will likely increase significantly. Continuous, high-frequency recordings would enable EQT to fully leverage its strengths, efficiently capturing both small and large AE events. Future applications of EQT will benefit from these technological advances.
In both datasets, EQT detected all the previously known events. According to the means, MAEs, and standard deviations of these known events, EQT performed better on D2540 than on D1247 (Table 1). However, evaluating the two performances using the F1-score results in the opposite conclusion. Our chosen analytical scores provide slightly different contexts to our results. F1-score measures how robust the model is by accounting for false negatives, true positives, and false positives. Of the true positives that were detected by both analysts and EQT, mean, MAE, and standard deviation measure the dispersion of the correlated events.
The mean decreasing with the increasing number of events in the dataset further confirms that EQT shows no bias towards picking early or late compared to the manual picks. The larger sample size in D2540 decreases the impact of each random arrival time error, lowering the overall mean. This also agrees with our assumption that the better agreement between EQT and MultiNet compared to the agreement between EQT and the analysts in D1247 is due to manual—not model—randomness. The MAE and standard deviation correlations are less pronounced, but still slightly better for the D2540 dataset than D1247. A potential explanation is the trace sizes. When a shorter window around an event is used, it is more difficult to differentiate events from noise (Mousavi et al. 2020). Longer windows result in more variable probabilities, which can effectively strengthen EQT’s ability to accurately discern the P-wave arrival, potentially explaining this small discrepancy between the two results (Zhu et al. 2019). Extending the window fed into EQT does not improve this, because D1247 uses appended 81.9 µs triggered traces, which artificially shorten the windows.
The F1-scores display the opposite trend: D1247 results appear slightly more robust than D2540. This is likely because D1247 has less non-event data, leaving EQT with fewer waveforms to potentially mistake for false positives. Combining these two methods of evaluation, F1-score could be used to argue that event-triggered data are the easier data type for applying EQT directly without retraining; however, when provided with continuous datasets—which include more variance and non-triggered events—EQT can be more precise and potentially more useful for a wide array of datasets.
As ML continues to become more prevalent in the field of seismology, it is important to analyze models’ performances beyond their initial proof-of-concept applications. Currently, the growth of the field is moving so rapidly that evaluation techniques are struggling to keep up (Arrowsmith et al. 2022; Woollam et al. 2022). As noted in Mousavi and Beroza (2022), researchers often choose analytical techniques that highlight the strengths of their models. Our study demonstrates the complicated nature of evaluation by utilizing analytical criteria from two different studies, which can both be used individually to emphasize one model and dataset pairing over the other. Future clarity within analytical techniques will allow researchers to better understand the strengths and weaknesses of a package, which will in turn allow researchers to better infer conclusions about their data using a model’s performance. For instance, in this study, EQT was able to detect AEs at a similar rate to its own published application on global seismic events, despite not being retrained on AEs. This could serve as evidence in support of the scalability of earthquake mechanics from field to laboratory scales, also suggesting that the micromechanics of DFEQs can be replicated in a lab setting—an area of active research (Goebel et al. 2017, 2024; Officer and Secco 2020; Shi et al. 2024).
Specifically in the field of laboratory DFEQ analogs, applying ML algorithms like EQT to AEs will help researchers significantly reduce the human effort required for accurate phase picking in these datasets, leading to further examination of the fracturing process. When evaluating the utility of EQT in laboratory data processing, it is important to also consider its effectiveness compared to established automatic methods, such as template matching. The primary advantage of ML is its ability to generalize across a variety of waveform types without relying on previously identified similar waveforms (Mousavi and Beroza 2023). Our results in D2540 exemplify this, as EQT detected many new events between triggered events that more traditional template matching may have overlooked due to lower correlation coefficients to previously identified events. Additionally, traditional automated methods occasionally struggle to identify low-amplitude or noisy waveforms across all channels (Peng and Zhao 2009). EQT’s robust performance detecting events across all channels significantly improves event reliability, highlighting a distinct practical advantage of utilizing ML. Nevertheless, traditional automated methods remain highly effective, especially when waveforms are expected to be uniform across time and channels (Lei et al. 2022). As with conventional seismology, EQT is a valuable complementary analytical tool, particularly for initial applications for immediate results. Although not as accurate as MultiNet, EQT still performed well according to all evaluation metrics. By not requiring a purpose-built model—or even retraining—EQT offers a fast, simple solution for researchers interested in analyzing seismic-adjacent time series data. 
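For readers unfamiliar with template matching, the following minimal sketch shows the idea of detection by normalized cross-correlation. It is not the implementation of Lei et al. (2022) or of any study cited here: the function names, the toy waveforms, and the 0.8 threshold are assumptions made for illustration.

```python
import math


def normalized_cross_correlation(template, trace):
    """Slide `template` along `trace` and return the normalized correlation
    coefficient (between -1 and 1) at every offset."""
    m = len(template)
    t_mean = sum(template) / m
    t_dev = [x - t_mean for x in template]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    cc = []
    for start in range(len(trace) - m + 1):
        window = trace[start:start + m]
        w_mean = sum(window) / m
        w_dev = [x - w_mean for x in window]
        w_norm = math.sqrt(sum(d * d for d in w_dev))
        if t_norm == 0.0 or w_norm == 0.0:
            cc.append(0.0)  # flat window or flat template: correlation undefined
        else:
            cc.append(sum(a * b for a, b in zip(t_dev, w_dev)) / (t_norm * w_norm))
    return cc


def detect(template, trace, threshold=0.8):
    """Return the offsets where the correlation meets the (assumed) threshold."""
    cc = normalized_cross_correlation(template, trace)
    return [i for i, c in enumerate(cc) if c >= threshold]


# Toy data: a low-amplitude copy of the template at offset 10 and a
# differently shaped pulse at offset 30.
template = [0.0, 1.0, -1.0, 0.5, -0.5, 0.0]
trace = [0.0] * 50
trace[10:16] = [0.1 * x for x in template]
trace[30:36] = [0.0, 1.0, 1.0, 1.0, 1.0, 0.0]

print(detect(template, trace))
```

The scaled copy is recovered because the coefficient is insensitive to amplitude, whereas the pulse with a different shape correlates poorly and is skipped. This is the behavior described above: events that do not resemble a previously identified waveform can fall below the correlation threshold.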
While retraining EQT or using a phase picking model like MultiNet designed for AEs will likely always result in better performance, this study characterizes EQT’s basic model as an efficient first option beyond template matching for researchers without an abundance of labeled training data.
Availability of data and materials
The data analyzed during this study are included in this published article and its supplementary information files. The acoustic emission detection and phase picking were performed using EQTransformer (Mousavi et al. 2020), publicly available under the MIT license at https://github.com/smousavi05/EQTransformer.
Abbreviations
- AE: Acoustic emission
- P: Pressure
- T: Temperature
- ML: Machine learning
- NN: Neural network
- EQT: EQTransformer
- DFEQ: Deep-focus earthquake
- THC: Trigger and hit count
References
- Arrowsmith SJ, Trugman DT, MacCarthy J, Bergen KJ, Lumley D, Magnani MB (2022) Big Data Seismology. Rev Geophys 60(2):e2021RG000769. https://doi.org/10.1029/2021RG000769
- Bolton DC, Shreedharan S, Rivière J, Marone C (2020) Acoustic energy release during the laboratory seismic cycle: insights on laboratory earthquake precursors and prediction. J Geophys Res Solid Earth. https://doi.org/10.1029/2019JB018975
- Gal Y, Ghahramani Z (2016) Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In: Balcan MF, Weinberger KQ (eds) Proceedings of Machine Learning Research 48, New York, 2016
- Gibbons SJ, Ringdal F (2006) The detection of low magnitude seismic events using array-based waveform correlation. Geophys J Int 165(1):149–166. https://doi.org/10.1111/j.1365-246X.2006.02865.x
- Goebel THW, Kwiatek G, Becker TW, Brodsky EE, Dresen G (2017) What allows seismic events to grow big?: Insights from b-value and fault roughness analysis in laboratory stick-slip experiments. Geology 45(9):815–818. https://doi.org/10.1130/G39147.1
- Goebel THW, Schuster V, Kwiatek G, Pandey K, Dresen G (2024) A laboratory perspective on accelerating preparatory processes before earthquakes and implications for foreshock detectability. Nat Commun 15(1):5588. https://doi.org/10.1038/s41467-024-49959-7
- Guo C, Zhu T, Gao Y, Wu S, Sun J (2021) AEnet: automatic picking of P-wave first arrivals using deep learning. IEEE Trans Geosci Remote Sens 59(6):5293–5303. https://doi.org/10.1109/TGRS.2020.3010541
- Kong Q, Trugman DT, Ross ZE, Bianco MJ, Meade BJ, Gerstoft P (2019) Machine learning in seismology: turning data into insights. Seismol Res Lett 90(1):3–14. https://doi.org/10.1785/0220180259
- Lei X (2024) Fluid-driven fault nucleation, rupture processes, and permeability evolution in oshima granite—preliminary results and acoustic emission datasets. Geohaz Mech 2(3):164–180. https://doi.org/10.1016/j.ghm.2024.04.003
- Lei X, Ma S (2014) Laboratory acoustic emission study for earthquake generation process. Earthq Sci 27(6):627–646. https://doi.org/10.1007/s11589-014-0103-y
- Lei X, Ohuchi T, Kitamura M, Li X, Li Q (2022) An effective method for laboratory acoustic emission detection and location using template matching. J Rock Mech Geotech Eng 14(5):1642–1651. https://doi.org/10.1016/j.jrmge.2022.03.010
- Li Z, Zhu L, Officer T, Shi F, Yu T, Wang Y (2022) A machine-learning-based method of detecting and picking the first P-wave arrivals of acoustic emission events in laboratory experiments. Geophys J Int 230(3):1818–1823. https://doi.org/10.1093/gji/ggac148
- Lockner D (1993) The role of acoustic emission in the study of rock fracture. Int J Rock Mech Min Sci 30(7):883–899. https://doi.org/10.1016/0148-9062(93)90041-B
- Lockner DA, Byerlee JD (1992) Fault growth and acoustic emissions in confined granite. Appl Mech Rev 45(3):S165-173. https://doi.org/10.1115/1.3121387
- Mousavi SM, Beroza GC (2022) Deep-learning seismology. Science 377(6607):eabm4470. https://doi.org/10.1126/science.abm4470
- Mousavi SM, Beroza GC (2023) Machine learning in earthquake seismology. Annu Rev Earth Planet Sci 51:105–129. https://doi.org/10.1146/annurev-earth-071822
- Mousavi SM, Sheng Y, Zhu W, Beroza GC (2019) STanford EArthquake Dataset (STEAD): a global data set of seismic signals for AI. IEEE Access 7:179464–179476. https://doi.org/10.1109/ACCESS.2019.2947848
- Mousavi SM, Ellsworth WL, Zhu W, Chuang LY, Beroza GC (2020) Earthquake transformer—an attentive deep-learning model for simultaneous earthquake detection and phase picking. Nat Commun 11(1):3952. https://doi.org/10.1038/s41467-020-17591-w
- Münchmeyer J, Woollam J, Rietbrock A, Tilmann F, Lange D, Bornstein T, Diehl T, Giunchi C, Haslinger F, Jozinović D, Michelini A, Saul J, Soto H (2022) Which Picker Fits My Data? A Quantitative Evaluation of Deep Learning Based Seismic Pickers. J Geophys Res Solid Earth 127(1):e2021JB023499. https://doi.org/10.1029/2021JB023499
- Officer T, Secco RA (2020) Detection of high P,T transformational faulting in Fe2SiO4 via in-situ acoustic emission: relevance to deep-focus earthquakes. Phys Earth Planet Inter 300:106429. https://doi.org/10.1016/j.pepi.2020.106429
- Officer T, Zhu L, Li Z, Yu T, Edey DR, Wang Y (2022) Application of the double-difference relocation method to acoustic emission events in high-pressure deformation experiments. Phys Chem Miner 49(8):29. https://doi.org/10.1007/s00269-022-01203-8
- Ohuchi T, Lei X, Ohfuji H, Higo Y, Tange Y, Sakai T, Fujino K, Irifune T (2017) Intermediate-depth earthquakes linked to localized heating in dunite and harzburgite. Nat Geosci 10:771–776. https://doi.org/10.1038/NGEO3011
- Ohuchi T, Higo Y, Tange Y, Sakai T, Matsuda K, Irifune T (2022) In situ X-ray and acoustic observations of deep seismic faulting upon phase transitions in olivine. Nat Commun 13:5213. https://doi.org/10.1038/s41467-022-32923-8
- Peng Z, Zhao P (2009) Migration of early aftershocks following the 2004 Parkfield earthquake. Nat Geosci 2(12):877–881. https://doi.org/10.1038/ngeo697
- Schubnel A, Brunet F, Hilairet N, Gasc J, Wang Y, Green HW (2013) Deep-Focus Earthquake analogs recorded at high pressure and temperature in the laboratory. Science 341:1377–1380. https://doi.org/10.1126/science.1241764
- Shen G, Wang Y (2014) High-pressure apparatus integrated with synchrotron radiation. In: Henderson GS, Neuville DR, Downs RT (eds) Spectroscopic Methods in Mineralogy and Materials Science. Reviews in Mineralogy and Geochemistry, vol 78(1). Mineralogical Society of America, Chantilly, pp 745–777. https://doi.org/10.2138/rmg.2014.78.18
- Shi F, Wang Y, Wen J, Yu T, Zhu L, Huang T, Wang K (2022) Metamorphism-facilitated faulting in deforming orthopyroxene: implications for global intermediate-depth seismicity. Proc Natl Acad Sci USA 119(11):e2112386119. https://doi.org/10.1073/pnas.2112386119
- Shi F, Wang Y, Officer T, Yao D, Yu T, Zhu L, Wen J, Zhang J, Peng Z (2024) Transformational faulting in Mn2GeO4 from olivine to wadsleyite structure: implications for physical mechanism of deep-focus earthquakes. Tectonophysics 889:230467. https://doi.org/10.1016/j.tecto.2024.230467
- Tanaka R, Naoi M, Chen Y, Yamamoto K, Imakita K, Tsutsumi N, Shimoda A, Hiramatsu D, Kawakata H, Ishida T, Fukuyama E, Tanaka H, Arima Y, Kitamura S, Hyodo D (2021) Preparatory acoustic emission activity of hydraulic fracture in granite with various viscous fluids revealed by deep learning technique. Geophys J Int 226(1):493–510. https://doi.org/10.1093/gji/ggab096
- Wang Y, Durham WB, Getting IC, Weidner DJ (2003) The deformation-DIA: a new apparatus for high temperature triaxial deformation to pressures up to 15 GPa. Rev Sci Instrum 74(6):3002–3011. https://doi.org/10.1063/1.1570948
- Wang Y, Zhu L, Shi F, Schubnel A, Hilairet N, Yu T, Rivers M, Gasc J, Addad A, Deldicque D, Li Z, Brunet F (2017) A laboratory nanoseismological study on deep-focus earthquake micromechanics. Sci Adv 3(7):e1601896. https://doi.org/10.1126/sciadv.1601896
- Wang X, Yue Q, Liu X (2024) Reliable arrival time picking of acoustic emission using ensemble machine learning models. Mech Syst Signal Process 215:111442. https://doi.org/10.1016/j.ymssp.2024.111442
- Woollam J, Van der Heiden V, Rietbrock A, Schurr B, Tilmann F, Dushi E (2022) Machine learning event detection workflows in practice: a case study from the 2019 Durrës aftershock sequence. arXiv e-prints. arXiv:2205.12033
- Zhu L, Peng Z, McClellan J, Li C, Yao D, Li Z, Fang L (2019) Deep learning for seismic phase detection and picking in the aftershock zone of 2008 Mw7.9 Wenchuan Earthquake. Phys Earth Planet Inter 293:106261. https://doi.org/10.1016/j.pepi.2019.05.004
Acknowledgements
The experimental data were collected at GeoSoilEnviroCARS (The University of Chicago, Sector 13), Advanced Photon Source, Argonne National Laboratory. GeoSoilEnviroCARS is supported by the National Science Foundation–Earth Sciences via SEES: Synchrotron Earth and Environmental Science (EAR–2223273). The experiment used resources of the Advanced Photon Source, a US Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Argonne National Laboratory under Contract No. DE-AC02-06CH11357. We would like to thank the reviewers and the editor for their valuable comments. We also thank Amanda Jackson and Cheryl Sheehan for their support and inspiring discussions, and the IRIS Undergraduate Research Internships in Seismology (URISE) Program for fostering the collaborations that gave rise to this project.
Funding
J.S. was supported by the IRIS Undergraduate Research Internships in Seismology (URISE) Program. This research was supported by a number of NSF grants: EAR-1661489 and 1925920 (to YW), EAR-1661519 (to LZ) and EAR-1925965 (to ZP).
Author information
J.S. and Z.P. designed the methodology. J.S. implemented the study and conducted the analyses. Q.Z. and L.Y.C. assisted with the application of EQTransformer. T.O. and Y.W. provided the experimental datasets and helped with their interpretation. L.Z. provided the MultiNet results and helped with their interpretation. J.S. wrote the manuscript with direction from Z.P. All authors reviewed the manuscript and contributed to the discussion and presentation of the results.
Ethics declarations
Not applicable.
The authors declare no competing interests.
Additional information
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.