1 Introduction

Acoustic emissions (AEs) are mechanical waves corresponding to sudden irreversible changes within a sample material, serving as a measure of rapid cracking (creation or growth) at in situ conditions (Lei and Ma 2014; Lockner and Byerlee 1992). Unlike the more well-known ultrasonic testing that uses active waveform pulses to interrogate a sample’s elastic properties, AE testing passively “listens” to acoustic signals generated by small ruptures in the sample during deformation. Therefore, an AE waveform reflects the convolution of both its source mechanism and subsequent path effect. AEs in rock samples are studied as a model of natural earthquakes; it is well established that earthquakes in the crust and AE events in stressed rocks show mechanical and statistical similarities in a wide range of aspects (Lei and Ma 2014). The cumulative number of AEs in a sample denotes total internal damage, while tracking AE paths and energy peaks can be used to recreate fracture nucleation and growth patterns, which are essential for understanding laboratory earthquake precursors and prediction (Bolton et al. 2020; Goebel et al. 2024).

Numerous AE experiments have been conducted to study fractures facilitated by mineral phase transformations at a wide range of pressures (P) and temperatures (T). This makes AE testing a prime analytical technique for evaluating different hypotheses of physical mechanisms of deep earthquakes—seismicity below ~ 70 km depth, where rocks generally deform in a ductile fashion (Ohuchi et al. 2017, 2022; Shi et al. 2022, 2024; Wang et al. 2017). Following the collection of these internal waves generated by small ruptures, source location algorithms utilize the first arrivals identified from the signals to determine AE origin locations through time (Lockner 1993). In such experiments, relocated AEs can be combined with post-mortem microstructure analysis (Schubnel et al. 2013) or X-ray microtomography to link AEs to specific faults (Wang et al. 2017; Officer and Secco 2020) and even track nanoseismic event propagation along faults analogous to those proposed to form at depths of several hundred kilometers (Wang et al. 2017; Officer et al. 2022).

AEs are detected outside the sample by piezoelectric transducers, which convert changes in strain or force to an electric voltage. AE data are typically recorded in two ways, both of which present issues. One is to record continuously, whether events occur or not. While continuous recording ensures all potential events are captured, it may result in a very large dataset without many true events and even run the risk of filling up the entire disk space of the recording instrument. This size can quickly become prohibitively large for analysts manually picking P-waves, and researchers are often forced to rely on a second recording method, event-triggered recording (Schubnel et al. 2013). In this method, a predetermined amplitude threshold is set and only the events whose amplitudes exceed the threshold are recorded. While this eliminates excess data, relying on thresholding may fail to capture all the events, particularly the smallest events. Additionally, the signals can be affected by electrical noise, which can overwhelm the triggered data with spurious signals. Without careful manual intervention, this will cause the total number of AE events to be misestimated and reduce information about fault nucleation on the smallest scale.
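The event-triggered mode described above can be sketched as a simple amplitude-threshold detector. This is an illustrative sketch only: the function name, the threshold value, and the 4096-point window are placeholders, and real trigger-and-hit-count hardware additionally applies pre-trigger buffers and dead-times.

```python
import numpy as np

def triggered_windows(signal, threshold, n_points=4096):
    """Extract fixed-length windows whenever the amplitude crosses a
    trigger threshold. Illustrative only: real trigger-and-hit-count
    units also apply pre-trigger buffers and dead-times."""
    events = []
    i = 0
    while i < len(signal):
        if abs(signal[i]) >= threshold:
            start = max(0, i - n_points // 4)   # keep some pre-trigger samples
            events.append(signal[start:start + n_points])
            i = start + n_points                # skip past this window
        else:
            i += 1
    return events

# Synthetic trace: low-level noise plus one large spike. A small event at
# one tenth of the threshold is silently lost -- the failure mode that
# motivates continuous recording.
rng = np.random.default_rng(0)
trace = 0.01 * rng.standard_normal(20_000)
trace[10_000] += 1.0   # large, triggering event
trace[2_000] += 0.05   # small event, below threshold, never recorded
events = triggered_windows(trace, threshold=0.5)
```

Running the sketch extracts exactly one window around the large spike, while the small event leaves no record at all.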

In the past decades, multiple techniques have been introduced to automatically detect earthquake signals recorded in natural settings. These include the matched filter technique—also known as template matching—which utilizes waveforms of known events as templates to scan through additional events with waveform cross-correlation techniques (Gibbons and Ringdal 2006; Lei et al. 2022; Peng and Zhao 2009). More recently, machine learning (ML) techniques trained on millions of labeled waveforms have quickly gained popularity in seismology due to their improved consistency across channels and stations, robustness to noise, and ability to handle waveform variations (Kong et al. 2019; Münchmeyer et al. 2022). As seismic datasets have grown larger and more complex (Arrowsmith et al. 2022), these advantages have made ML particularly appealing to seismologists. ML methods can likewise be applied to detect and pick arrival times of AEs in laboratory settings, which often contain analogous challenges (Guo et al. 2021; Lei et al. 2022; Tanaka et al. 2021; Wang et al. 2024). MultiNet (Li et al. 2022) is one such specialized neural network (NN) that has demonstrated promising results in AE detection and picking. However, unlike natural earthquakes where millions of cataloged waveforms and picks are currently available (Mousavi et al. 2019), labeled AE data are still limited with only a few recent exceptions (Lei 2024). While the collection of AEs allows researchers to retrieve in situ measurements that would otherwise be impossible to obtain, the quantity of data and the labor required to retrieve accurate spatial and temporal resolution can be a significant drawback. At the length scale of these experiments, small errors in P-wave arrival time can result in mm-scale errors in event relocation. Given that high-pressure generation necessitates restricted sample size, this can produce relocation errors of the same order of magnitude as the sample length (Officer et al. 2022).
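For comparison with the ML approaches discussed here, the matched filter technique reduces to a sliding normalized cross-correlation against a known template. The sketch below uses synthetic data; the template, trace, and 0.95 threshold are illustrative and not drawn from the cited studies.

```python
import numpy as np

def matched_filter(trace, template, cc_threshold=0.8):
    """Slide a known event template along a trace and flag lags where
    the normalized cross-correlation (Pearson r) exceeds a threshold.
    Simplified sketch of template matching."""
    n = len(template)
    t = (template - template.mean()) / (template.std() * n)
    detections = []
    for lag in range(len(trace) - n + 1):
        w = trace[lag:lag + n]
        s = w.std()
        if s == 0:
            continue                      # flat window: correlation undefined
        cc = np.sum(t * (w - w.mean())) / s
        if cc >= cc_threshold:
            detections.append((lag, cc))
    return detections

# A scaled copy of the template buried in an otherwise quiet trace is
# recovered at the correct lag with correlation ~1.
template = np.sin(np.linspace(0, 6 * np.pi, 60))
trace = np.zeros(500)
trace[100:160] += 0.3 * template
hits = matched_filter(trace, template, cc_threshold=0.95)
best_lag, best_cc = max(hits, key=lambda h: h[1])
```

Because the correlation is normalized, the detector is insensitive to amplitude scaling, but it can only find events resembling a previously identified template, which is the limitation ML pickers aim to overcome.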

In this study, we directly apply an existing deep NN called EQTransformer (EQT) (Mousavi et al. 2020), which was trained on the global earthquake dataset STEAD (Mousavi et al. 2019), to detect and pick AEs generated during two different high P–T experiments investigating deep-focus earthquakes (DFEQs). Our motivation for selecting EQT as opposed to a specialized ML model such as MultiNet is twofold. By successfully repurposing EQT’s pre-trained model, we are able to immediately apply EQT without relying on labeled AE datasets, allowing us to take advantage of the millions of already labeled seismic waveforms. Also, EQT is open-source and community-maintained, making it a more sustainable and practical choice for future laboratory earthquake research. Our objective is therefore to evaluate EQT’s performance on a completely new dataset without retraining. We compare our detected results with both manual picks and MultiNet. In the next sections, we detail the experimental setup, EQT’s application, parameter tuning, evaluation metrics, and comparative analyses. We present detailed results and discuss EQT’s scalability and future potential as an efficient tool to evaluate AEs and other events in earthquake-related studies.

2 Data and methods

The two DFEQ experimental analogs on which we applied EQT returned both event-triggered data and continuous data from piezoelectric transducers attached to the rear of six orthogonally oriented anvils in the D-DIA (Y. Wang et al. 2003) and DDIA-30 (Shen and Wang 2014) multi-anvil deformation apparatuses (Fig. 1a). These were used to generate two datasets—labeled as D1247 (Schubnel et al. 2013; Wang et al. 2017) and D2540, respectively—collected during experiments of syn-deformational phase transformation from olivine to spinel in Mg2GeO4 in high P–T environments to investigate DFEQs. A more detailed description of the experimental techniques and motivations used for the datasets can be found in previous publications (Officer et al. 2022; Schubnel et al. 2013; Wang et al. 2017). For both experiments, event-triggered data were collected at 50 MHz for all six transducers. The signal was amplified between 40 and 60 dB, and AEs were registered using a trigger and hit count unit (THC) (Wang et al. 2017). As continuous data pass through the unit, recordings are automatically discarded unless AE events register enough signal to surpass a predetermined voltage threshold. Once an event is registered, an 81.9–163.8 µs trace (comprising 4096–8192 sampling points) was extracted to capture the triggering signal.

Fig. 1

Experimental setup. a Diagram of the DDIA-30 experimental setup used to generate the AE data in D2540. The sample pressure cell is centered between the six anvils. The short arrows indicate the compression of the cell by the anvils. b Periodic background signal for a 0.3-s segment of continuous data from D2540 compared to an acoustic emission event (in the dashed box). The signal in the box is magnified 250 × to show the unprocessed event, the absolute value of the signal after applying a 1-MHz high-pass filter, and the spectrogram in c, d, and e, respectively


Fig. 2

Flowchart. Diagram detailing the steps taken to detect and pick AE events using EQT for D2540 and D1247. The shaded, dashed box represents the work done in this study. The results from Li et al. (2022) and Mousavi et al. (2020) are used to evaluate EQT’s performance in this study


Fig. 3

EQT Output. EQT waveforms generated from an event detected in a D1247 on the six event-triggered channels, and b D2540 on the four continuous channels. In both a and b each trace is 60 s long at 100 Hz, or 6000 sampling points. Each event is detected on all available channels and displays a detection probability above 0.6 and a P-wave probability above 0.3. Because a displays triggered data, the tail of the preceding appended event can be seen on each of the six channels’ traces


Meanwhile, waveform data on channels 3–6 were continuously recorded at 10 MHz in tandem with the event-triggered data. This allows for the detection of non-impulsive and long-period signals that do not cross the THC threshold (Ohuchi et al. 2022). Additionally, continuous data are recorded at a relatively low gain of 30 dB compared to 40 dB in event-triggered data. This reduces the sinusoidal background in the data—a product of large electrical noise resulting from the acoustic equipment. A benefit of this lower gain is a decrease in the saturation of large events as compared to the event-triggered data. Continuous recording also allows for potential correlation between events that were only triggered on some channels, which we used to analyze EQT’s detection of new events (see the following sections).

The first dataset, D2540, consisted of 3901 triggered AE events with manually picked P-wave arrivals, as well as a full continuous dataset. EQT was applied to the continuous waveforms and then compared to EQT’s advertised success rate from its intended application on global seismicity, published in Mousavi et al. (2020). This allowed us to quantify the precision of EQT’s AE P-picks compared to earthquake P-picks. The second dataset, D1247, consisted of 550 manually picked triggered AE events, and was recently analyzed using a purpose-built AE detector and picker, MultiNet (Li et al. 2022). Comparing our results to MultiNet allowed us to quantify the ability of EQT to detect and pick AE P-waves compared to an independent NN developed specifically for AE detection. In experiment D2540, EQT was run on the continuous data because it contained a wider variety of waveforms to quantify the model’s performance. For D1247, EQT was run on the event-triggered files so a direct comparison to MultiNet could be made. A schematic depicting this process is shown in Fig. 2.

Because EQT was trained on genuine earthquake data, both datasets were manipulated to mimic seismic data from regular earthquakes. This included adding placeholder metadata to the miniSEED event files, such as network, station, location, and channel information. Additionally, we altered the time series’ sampling rates. For both D2540 and D1247, EQT read the data as 100 Hz, allowing us to directly compare the datasets to one another and to EQT’s advertised performance on earthquake data. To reduce subsequent computation time, we downsampled the 50-MHz event-triggered D1247 data to match the continuous 10 MHz D2540 data. Before downsampling, a low-pass filter with a cutoff frequency of 4.5 MHz was applied to the dataset to prevent aliasing. We then applied a scaling correction to the time and distance units for both datasets in EQT, taking advantage of the common 10⁶ scaling factor between “labquakes” (µs, mm) and earthquakes (s, km). Manually selected P-wave arrival times were picked using the event-triggered data, totaling 3901 and 550 for D2540 and D1247, respectively. These were treated as the true arrival times in order to evaluate EQT’s performance. For the triggered data on each of the six channels in D1247, we appended the 550 individual traces to form a single 45,056 µs waveform with 550 confirmed events, 550 times the length of each 81.9 µs event-triggered file.
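The anti-alias and downsampling step can be illustrated in plain NumPy. The windowed-sinc FIR below is an assumed stand-in for the actual filter implementation, with its cutoff chosen to mirror the 4.5 MHz cutoff applied before reducing the 50 MHz data to 10 MHz (a factor of 5); the miniSEED metadata relabeling is not shown.

```python
import numpy as np

def downsample(x, factor, numtaps=101):
    """Windowed-sinc anti-alias lowpass followed by decimation.
    The cutoff is 0.9 of the new Nyquist frequency, mirroring the
    4.5 MHz cutoff used before reducing 50 MHz data to 10 MHz
    (factor 5). Illustrative filter design, not the production one."""
    cutoff = 0.9 / factor                        # fraction of original Nyquist
    n = np.arange(numtaps) - (numtaps - 1) / 2
    h = cutoff * np.sinc(cutoff * n) * np.hamming(numtaps)
    h /= h.sum()                                 # unity gain at DC
    return np.convolve(x, h, mode="same")[::factor]

# A 1.25 MHz test tone sampled at 50 MHz survives decimation to 10 MHz
# essentially unchanged, because it lies well below the 4.5 MHz cutoff.
fs = 50e6
t = np.arange(1000) / fs
tone = np.sin(2 * np.pi * 1.25e6 * t)
out = downsample(tone, 5)
```

Signals above the cutoff would instead be attenuated before decimation, preventing them from aliasing into the retained band.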

In addition to re-packaging the AEs as standard seismic waveforms, we also tuned EQT’s parameters to optimize performance for AE data. The variable parameters utilized were detection threshold, P-pick threshold, batch size, and percent overlap. Parameters were tuned with the primary goal of eliminating false negatives, and we further evaluated EQT’s performance using the following assessments: mean error (µ), mean absolute error (MAE), standard deviation (σ), and F1-score. Mean error, mean absolute error, and standard deviation characterize the spread of error between P-wave arrival times that were picked by both EQT and analysts. These are the measures used to compare EQT’s performance on D2540 to its advertised performance on earthquakes. F1-score, on the other hand, evaluates the model’s robustness using the harmonic mean of precision and recall:

$$F1 = \frac{2P_T}{2P_T + P_F + N_F},$$

where $P_T$ is the number of true positives, $P_F$ is the number of false positives, and $N_F$ is the number of false negatives. This score is used to compare EQT’s performance on D1247 to MultiNet’s performance on the same dataset. All four metrics are used to compare this study’s results to each other.
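The computation itself is a one-liner; the counts in the example below are hypothetical and chosen only to illustrate the formula.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall:
    F1 = 2*TP / (2*TP + FP + FN)."""
    return 2 * true_pos / (2 * true_pos + false_pos + false_neg)

# Hypothetical counts: 90 correct detections, 10 spurious, 10 missed.
score = f1_score(90, 10, 10)   # 180 / 200 = 0.9
```

A perfect detector (no false positives or negatives) scores exactly 1, and the score falls as either error type grows.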

EQT also detected and picked new events that were previously missed by either the event-triggered threshold or analyst. To characterize the new detections as true AE events, we utilized EQT’s uncertainty estimation. By using the dropout technique (Gal and Ghahramani 2016)—a form of stochastic regularization—between each layer of its NN, EQT can make probabilistic statements related to its parameters. For each event, EQT outputs detection and picking uncertainties (Fig. 3). These results were compared between events and across channels. Because EQT demonstrated strong cross-channel consistency, to confirm each of the new events found in both datasets, we required the event to be detected on three of the four continuous channels in D2540, or five of the six event-triggered channels in D1247. New events were only considered true if they were detected on the minimum number of channels and their detection and picking uncertainties were at least equivalent to the minimum uncertainties in the surrounding known events.
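The cross-channel confirmation rule can be expressed as a coincidence filter over per-channel pick times. In the sketch below, only the minimum-channel counts (three of four for D2540, five of six for D1247) come from this study; the ±5 µs tolerance and the pick times are illustrative assumptions.

```python
def confirm_events(channel_picks, min_channels, tol=5.0):
    """Keep candidate events whose picks appear on at least
    `min_channels` distinct channels within a +/-tol coincidence
    window (times in microseconds). Simplified sketch of the
    3-of-4 (D2540) / 5-of-6 (D1247) confirmation rule."""
    flat = sorted((t, ch) for ch, times in channel_picks.items() for t in times)
    confirmed, used = [], set()
    for i, (t0, _) in enumerate(flat):
        if i in used:
            continue
        group = [j for j, (t, _) in enumerate(flat) if abs(t - t0) <= tol]
        if len({flat[j][1] for j in group}) >= min_channels:
            confirmed.append(t0)
        used.update(group)
    return confirmed

# Four continuous channels: one event is seen on all four channels,
# another on only two, so only the first survives the 3-of-4 rule.
picks = {1: [100.0, 500.0], 2: [101.0], 3: [99.5, 500.2], 4: [100.2]}
events = confirm_events(picks, min_channels=3)
```

Lowering `min_channels` trades reliability for sensitivity, which is why the uncertainty criterion on the surrounding known events was applied in addition to the channel count.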

An additional EQT parameter is S-pick probability. A crucial difference between our AEs and typical earthquake waveforms is the S-wave. In earthquake waveform data, the S-waves are typically larger than the P-waves. However, in the AEs collected in this study, the S-waves are much weaker. Unlike typical seismometers that have three components, piezoelectric transducers only have one component, oriented to measure vibrations perpendicular to the transducer surface. Most of the S-wave energy is polarized perpendicular to this; the remaining contribution of S-waves polarized parallel to this direction is very small. Because reverberations inside the crystals caused by the elastic waves tend to contaminate the later signals, including S-phase detection thresholds in our model significantly increased the number of false negatives in our results and lowered our other performance metrics. Given the disproportionately small contribution and large contamination of S-waves, we therefore elected to only compare P-picks and event detections in our subsequent analyses.

3 Results

For experiment D2540, we employed the now deprecated EqT_model2.h5—corresponding to the updated Original Model—which was designed to be robust to false negatives (Mousavi et al. 2020). A detection threshold of 0.6 and a P-pick threshold of 0.3 were used, along with a batch size of 500 and 30% overlap. These thresholds were chosen to eliminate false negatives; the run was considered successful if EQT detected all the manually picked events in each experiment. When applied to the four continuous channels, EQT identified 6422 events (Fig. 4a) that were detected on at least three out of four channels, including all original 3901 events. This is a 64.62% increase in the number of events originally found using the triggered data. One such newly identified event is shown as an example in Fig. 5, where it can be seen ~ 15 µs before a much larger event that was captured in the event-triggered dataset. The newly identified event was not recorded by the THC unit because it was too small to cross the trigger threshold, but EQT detected it on three of the four channels independently of the larger previously detected event.

Fig. 4

EQT D2540 Performance. a Euler diagram of events detected by EQT (blue), and by both EQT and analysts (purple). No events were detected solely by analysts, which would correspond to a false negative. b Frequency and density distribution of arrival time differences for the 3901 D2540 P-waves picked by analysts and EQT


Fig. 5

Undetected Event. a EQT output of one of the 2521 previously undetected events. The blue line represents the P-wave of the new event, while the purple line represents the P-wave of the event-triggered event. (Note: these colors correspond to the Euler diagram in Fig. 4a.) b A segment of the continuous recording system trace corresponding to this missed event for Channels 1, 2, and 4. The arrow on the right indicates progressively worse signal-to-noise ratios


To test the accuracy of the EQT P-picks, we calculated the arrival time differences between the 3901 P-waves that were picked both manually using the event-triggered data and by EQT. This distribution is shown in Fig. 4b. Treating the manual picks as the true arrival times, the P-pick mean difference of these events is − 0.7630 ns, which is much less than the 20 ns sampling interval and can be considered effectively zero. In addition, the distribution is nearly symmetric about the mean. The mean absolute error is 0.05848 µs (< ± 6 sampling points when corrected to 100 Hz and scaled by 10⁶), with a standard deviation of 0.07491 µs (< ± 8 sampling points). The small absolute size of the mean and the normal distribution of the errors signify that EQT is not biased toward early or late picking. The F1-score is 0.9770.
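The three dispersion measures used throughout this section reduce to a few lines of NumPy; the pick times in the example are synthetic and not taken from the dataset.

```python
import numpy as np

def pick_error_stats(t_manual, t_auto):
    """Mean error, mean absolute error, and standard deviation of
    automatic-minus-manual arrival-time differences, in the input
    units (microseconds for the AE data)."""
    d = np.asarray(t_auto, dtype=float) - np.asarray(t_manual, dtype=float)
    return d.mean(), np.abs(d).mean(), d.std()

# Synthetic picks scattered symmetrically about the truth give a
# near-zero mean (no early/late bias) but nonzero MAE and sigma.
mu, mae, sigma = pick_error_stats([10.0, 20.0, 30.0, 40.0],
                                  [10.1, 19.9, 30.2, 39.8])
```

A near-zero mean with nonzero MAE and standard deviation is exactly the signature reported above: random scatter without systematic bias.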

Table 1 Summary of dataset evaluations


In order to contextualize our results, we compared EQT’s performance on D2540 to its performance published in Mousavi et al. (2020) on the Mw 6.7 Tottori earthquake on October 6, 2000. Using detection, P-pick, and S-pick probability thresholds of 0.5, 0.3, and 0.3, respectively, Mousavi et al. (2020) found that EQT detected 401,566 P- and S-wave arrival times, compared to 279,104 arrival times picked manually by Japan Meteorological Agency (JMA) analysts. This is 143.9% of the number of arrival times picked manually. Of the ~ 42,000 arrival times that were picked from the same stations by both EQT and JMA analysts, Mousavi et al. (2020) showed that the mean error in P-pick time was 0.01 s, the MAE was 0.06 s (6 sampling points), and the standard deviation was 0.08 s (8 sampling points). Their results are directly comparable to ours because of the common scaling factor between AE and typical seismic data.

For experiment D1247, because the dataset was generated under similar experimental conditions, the model and threshold parameters that were tuned on the D2540 data were used. An important distinction between the two datasets, however, is that EQT was run only on the event-triggered data for D1247. This was done to replicate the conditions used to evaluate MultiNet, which was solely applied to the triggered data (Li et al. 2022). When applied to the 550 triggered traces on each of the six channels, EQT detected and picked 572 events and P-waves on at least five out of six channels, including all 550 original events. This is a 4.0% increase in the number of P-waves picked manually using only the event-triggered files, which requires that undetected events happened to be captured within the same trace as an event registered by the THC system. This result from EQT illuminates one of the primary benefits of high-sampling continuous recording. Given the low likelihood of multiple events being captured within the same trace, it is not unreasonable to assume that EQT could detect many more small events if applied to continuous data, as it did when applied to the D2540 dataset.

Using the 550 common events, we calculated the arrival time differences between manual and automatic picks (Fig. 6c). Once again, we assume the manual analyst picks are the true arrival times. This results in a mean error of 0.006297 µs, a mean absolute error of 0.09367 µs (~ ± 9 sampling points when corrected to 100 Hz and scaled by 10⁶), and a standard deviation of 0.1071 µs (< ± 11 sampling points). The corresponding F1-score is 0.9905. Because this mean error is again very small compared to the other measures, we can continue to infer that there is no bias towards picking early or late.

Fig. 6

EQT D1247 Performance. a Frequency and density distribution of arrival time differences for the 500 P-waves picked by both MultiNet and EQT. b The relationship between arrival time differences for EQT when compared to MultiNet and human analysts for the 500 events picked by each of the three methods. c Frequency and density distribution of arrival time differences for the 550 P-waves picked by both analysts and EQT


To mirror the comparison of the D2540 EQT results to the EQT results when applied to earthquake data published in Mousavi et al. (2020), the D1247 EQT results were compared to MultiNet’s performance on the same D1247 dataset from Li et al. (2022). Using a convolutional NN to identify AE events and a fully convolutional NN to pick P-wave arrivals, Li et al. (2022) determined the optimal output threshold and image length to be 0.001 and 256, respectively. With these parameters, MultiNet detected 548 events in 500 of the event-triggered waveforms, which translates to 109.6% of the original events picked manually (Li et al. 2022). It is important to note that 50 triggered waveforms were removed from the dataset for training purposes. Therefore, MultiNet identified all 500 of the manually picked events on which it was run, as well as 48 new events (Li et al. 2022). This resulted in a calculated F1-score of 0.9930.

Because both EQT and MultiNet were applied to the same dataset, it is possible to directly compare their respective arrival time picks (Fig. 6a). Using the 500 common events between the two methods, we calculated the difference in EQT and MultiNet pick times for each event. The result is a mean time difference of − 0.0795 µs, mean absolute error of 0.04734 µs (< ± 5 sampling points), and standard deviation of 0.03112 µs (~ ± 3 sampling points). Instead of considering one method the true value and analyzing the other method’s “error”, here we compare these measures to the calculated arrival time differences between manual and EQT picks defined above (µ = 0.006297 µs, MAE = 0.09367 µs, σ = 0.1071 µs). We found that the time difference still follows a normal distribution, but now the mean difference is an order of magnitude larger than when EQT was compared to manual picks on both datasets (Fig. 6a, c).

Additionally, the measures of spread in the data—mean absolute error and standard deviation—are approximately half as large as the corresponding values in the manual difference calculations, which is represented by the short, wide density distribution in Fig. 6b. This indicates that EQT is generally in better agreement with MultiNet than with the analysts’ P-picks. While this could be due to the similarities of the two deep-learning-based methods, it may also indicate a randomness associated with analyst picks. However, measures of spread in both cases are within the same order of magnitude, demonstrating that the agreement between the two models is still similar to that between EQT and analysts.

4 Discussion

By comparing EQT’s results on continuous data in experiment D2540 to EQT’s performance in Mousavi et al. (2020), we analyzed EQT’s ability to detect and pick AEs without retraining. We also measured EQT’s results on event-triggered data in experiment D1247 against MultiNet’s results on the same dataset in Li et al. (2022) to compare EQT’s performance to an NN purpose-built for AEs. When applied to D2540 and measured with mean, mean absolute error, and standard deviation, EQT performed within one sampling point (20 ns) of its published arrival time differences on traditional seismic data. Despite the large difference in dataset size, EQT also achieved a similar increase in the percentage of events detected in this dataset: 164.6% in this study and 143.9% in Mousavi et al. (2020). When applied to D1247 event-triggered data and measured with F1-score, EQT matched MultiNet in capturing all the original known events. However, MultiNet was able to identify over twice as many new events as EQT (Li et al. 2022), which explains why MultiNet’s F1-score was 0.9930 compared to EQT’s 0.9905 (Table 1). Evaluating the events that analysts and both models detected, the differences between EQT and MultiNet arrival times are notably more tightly constrained than the differences between EQT and analyst picks.

Directly comparing the results of D2540 and D1247 in this study highlights differences between the two experiments’ data and provides insight into EQT’s performance. Potentially the greatest difference is the relative number of previously undetected true positives: in D2540, EQT detected 2521 newly identified events (a 64.62% increase), while in D1247 EQT detected 22 (a 4.0% increase). From these initial findings, the number of new true positives appears to correlate positively with the number of data points. However, it also depends on the origin of the events; while D1247 records at 50 MHz compared to D2540’s 10 MHz, D2540 preserves the waveforms between triggered events with continuous recording. This introduces an inherent limitation when applying detection methods to event-triggered recordings. A significant number of the previously unidentified events in D2540 occurred between triggered traces and were missed because they were not large enough to trigger the THC unit. Consequently, when applied only to event-triggered data in D1247, EQT can detect only those new events that were too small to cross the threshold but were by chance present in the trace of a larger event. No matter how sensitive the model is, it cannot detect events that were never recorded. As expected, because D2540 contains both types of unidentified events, EQT detected more previously missed events in this dataset. Given ongoing advances in continuous, high-sampling, multi-channel waveform recording, EQT’s utility will likely increase significantly: continuous, high-frequency recordings would enable it to fully leverage its strengths, efficiently capturing both small and large AE events.

In both datasets, EQT detected all the previously known events. According to the means, MAEs, and standard deviations of these known events, EQT performed better on D2540 than on D1247 (Table 1). However, evaluating the two performances using the F1-score results in the opposite conclusion. Our chosen analytical scores provide slightly different contexts to our results. F1-score measures how robust the model is by accounting for false negatives, true positives, and false positives. Of the true positives that were detected by both analysts and EQT, mean, MAE, and standard deviation measure the dispersion of the correlated events.

The decrease in the mean error with the increasing number of events in the dataset further confirms that EQT shows no bias towards picking early or late compared to the manual picks. The larger sample size in D2540 decreases the impact of each random arrival time error, lowering the overall mean. This also agrees with our assumption that the better agreement between EQT and MultiNet, compared to the agreement between EQT and the analysts in D1247, is due to manual—not model—randomness. The MAE and standard deviation correlations are less pronounced, but still slightly better for the D2540 dataset than D1247. A potential explanation is the trace length. When a shorter window around an event is used, it is more difficult to differentiate events from noise (Mousavi et al. 2020). Longer windows result in more variable probabilities, which can effectively strengthen EQT’s ability to accurately discern the P-wave arrival, potentially explaining this small discrepancy between the two results (Zhu et al. 2019). Extending the window fed into EQT cannot recover this advantage for D1247, because its input consists of appended 81.9 µs triggered traces, which artificially shorten the effective windows.

The F1-scores display the opposite trend: D1247 results appear slightly more robust than D2540. This is likely because D1247 has less non-event data, leaving EQT with fewer waveforms to potentially mistake for false positives. Combining these two methods of evaluation, F1-score could be used to argue that event-triggered data are the easier data type for applying EQT directly without retraining; however, when provided with continuous datasets—which include more variance and non-triggered events—EQT can be more precise and potentially more useful for a wide array of datasets.

As ML continues to become more prevalent in the field of seismology, it is important to analyze models’ performances beyond their initial proof-of-concept applications. Currently, the growth of the field is moving so rapidly that evaluation techniques are struggling to keep up (Arrowsmith et al. 2022; Woollam et al. 2022). As noted in Mousavi and Beroza (2022), researchers often choose analytical techniques that highlight the strengths of their models. Our study demonstrates the complicated nature of evaluation by utilizing analytical criteria from two different studies, which can both be used individually to emphasize one model and dataset pairing over the other. Future clarity within analytical techniques will allow researchers to better understand the strengths and weaknesses of a package, which will in turn allow researchers to better infer conclusions about their data using a model’s performance. For instance, in this study, EQT was able to detect AEs at a similar rate to its own published application on global seismic events, despite not being retrained on AEs. This could serve as evidence in support of the scalability of earthquake mechanics from field to laboratory scales, also suggesting that the micromechanics of DFEQs can be replicated in a lab setting—an area of active research (Goebel et al. 2017, 2024; Officer and Secco 2020; Shi et al. 2024).

Specifically in the field of laboratory DFEQ analogs, applying ML algorithms like EQT to AEs will help researchers significantly reduce the human effort required for accurate phase picking in these datasets, leading to further examination of the fracturing process. When evaluating the utility of EQT in laboratory data processing, it is important to also consider its effectiveness compared to established automatic methods, such as template matching. The primary advantage of ML is its ability to generalize across a variety of waveform types without relying on previously identified similar waveforms (Mousavi and Beroza 2023). Our results in D2540 exemplify this, as EQT detected many new events between triggered events that more traditional template matching may have overlooked due to low correlation coefficients with previously identified events. Additionally, traditional automated methods occasionally struggle to identify low-amplitude or noisy waveforms across all channels (Peng and Zhao 2009). EQT’s robust performance detecting events across all channels significantly improves event reliability, highlighting a distinct practical advantage of utilizing ML. Nevertheless, traditional automated methods remain highly effective, especially when waveforms are expected to be uniform across time and channels (Lei et al. 2022). As with conventional seismology, EQT is a valuable complementary analytical tool, particularly for initial applications where immediate results are needed. Although not as accurate as MultiNet, EQT still performed well according to all evaluation metrics. By not requiring a purpose-built model—or even retraining—EQT offers a fast, simple solution for researchers interested in analyzing seismic-adjacent time series data.
While retraining EQT or using a phase picking model like MultiNet designed for AEs will likely always result in better performance, this study characterizes EQT’s basic model as an efficient first option beyond template matching for researchers without an abundance of labeled training data.