Machine learning for earthquakes: detecting events in highly noisy time sequences


Impending rupture in man-made structures, natural landslides or earthquake faults is sometimes preceded by slow preparatory strain and increased acoustic emission. These can be hidden, low-amplitude signals within a noisy time series that are difficult to detect with traditional methods. While the paucity of data on natural faults prevents a systematic study of the preparatory strain events, laboratory experiments provide an opportunity to deepen our exploration of the dynamics of rupture onset in a controlled and well-monitored environment.
We propose to train a machine learning algorithm with time sequences of experimental data to detect and forecast high-frequency dynamic rupture events and slow strain events.
Current frontier research on rock mechanics in the laboratory consists in monitoring Acoustic Emissions (AEs) and strain events at high frequency (10 MHz) for extended time periods. This is necessary to understand both the very short-lived dynamic rupture instability events and the long-lived, slow deformation instability events during the preparatory or nucleation phase (Figure 1). Impending rupture in man-made structures, natural landslides or earthquake faults is sometimes preceded by slow preparatory strain. However, the paucity of data makes it impossible to assess how pervasive such preparatory strain is, and the absence of a clear causal relation to instability prevents its use for reliable probabilistic forecasting. Laboratory experiments performed on rocks provide an opportunity to deepen our exploration of the dynamics of rupture onset in a controlled and well-monitored environment.
Continuous high-frequency acquisition generates very large data sets for each experiment. Occasional instability events need to be detected and harvested to allow an efficient analysis. It can be quite tedious to detect events by hand, or even with the use of classic automated algorithms such as STA (Short Time Average) threshold detection. It is particularly challenging when the signal is drowned within a high noise level, and this is often the case in high frequency monitoring, or in seismic/geodetic data from natural events (Figure 2).
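As an illustration of the classic automated approach mentioned above, the following is a minimal sketch of a short-time-average threshold detector. It assumes the common variant in which the short-time average (STA) is normalised by a long-time average (LTA) of the signal energy; the function names, window lengths, threshold and synthetic trace are all hypothetical choices made for the example, not values from the project.

```python
import numpy as np


def sta_lta_ratio(signal, n_sta, n_lta):
    """Ratio of short-term to long-term average signal energy.

    Both averages are trailing windows ending at the same sample, so the
    ratio at index i only uses samples up to and including i.
    """
    energy = np.asarray(signal, dtype=float) ** 2
    csum = np.concatenate(([0.0], np.cumsum(energy)))
    ratio = np.zeros(energy.size)
    idx = np.arange(n_lta, energy.size + 1)  # exclusive end of each window
    sta = (csum[idx] - csum[idx - n_sta]) / n_sta
    lta = (csum[idx] - csum[idx - n_lta]) / n_lta
    ratio[idx - 1] = sta / np.maximum(lta, 1e-12)
    return ratio


def detect_onsets(ratio, threshold):
    """Sample indices where the ratio rises above the threshold."""
    above = ratio > threshold
    rising = above[1:] & ~above[:-1]
    return np.flatnonzero(rising) + 1


# Synthetic example (assumed values): unit-variance noise with one
# oscillatory burst of 200 samples starting at sample 12000.
rng = np.random.default_rng(0)
trace = rng.normal(0.0, 1.0, 20000)
burst = np.arange(200)
trace[12000:12200] += 5.0 * np.sin(2 * np.pi * burst / 20)

ratio = sta_lta_ratio(trace, n_sta=50, n_lta=2000)
onsets = detect_onsets(ratio, threshold=4.0)
print("first trigger at sample", onsets[0])
```

A detector of this kind works when the event energy clearly exceeds the background within the short window. When the signal amplitude approaches the noise level, the ratio no longer separates events from fluctuations, which is the situation that motivates a learned detector.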
The potential of Machine Learning in rupture and instability detection will be investigated in a situation where large data sets are easily generated (laboratory experiments). However, we aim to upscale and adapt effective algorithms to data available in earthquake-prone regions, where an acceleration in seismic moment rate and geodetic motion has recently been observed before large-magnitude mega-thrust earthquakes.


Image Captions

Figure 1. Time sequence representing the evolution of a laboratory fault with stick-slip and slow slip events during a two-hour long experiment. (a) Stress and friction with fast stick-slip (red labels), slow slip (green) and mixed rupture (blue) events. The saw-tooth pattern is typical of the alternation of loading and rupture on earthquake faults. (b) Detail of the imposed loading history, with hold times (no motion) and intervals of load increase at different rates (green curve).

Figure 2. Clip of 4.5 s extracted from a long time series of strain gage recording on a rock sample undergoing rupture. (a) Original raw signal with high-noise level (black curve); (b) low-pass filter (blue, red) revealing the rupture event with the associated stress drop. Because the filter cannot be applied to the entire length of the signal, automated detection and harvesting of short time clips should be implemented on the original raw signal using ML. (From Simon Guerin-Marthe, PhD 2019).


We propose to train machine learning algorithms (Recurrent Neural Networks, or RNNs, and Random Forests) with time sequences of experimental data in which the timing of a number of hand-picked events is known. The trained algorithms will then be blind-tested on new time sequences from similar experiments. The initial aim is the detection of high-frequency dynamic rupture events; subsequently, a similar detection method will be tested for slow strain events. Furthermore, we will investigate whether a statistically significant forward correlation exists between slow strain events and an upcoming fast rupture (premonitory strain), and whether a backward correlation exists (after-slip relaxation strain). Finally, time sequences of natural events in the vicinity of earthquake faults will be analysed (borehole strain-meters, broad-band seismic stations, GPS data).
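One possible way to test for a forward correlation between slow strain events and a following fast rupture is a Monte Carlo comparison against random catalogues. The sketch below is only an illustration under stated assumptions: the null hypothesis (slow events uniformly distributed in time), the function names, the 60 s window and the synthetic event times are all hypothetical, and a real analysis would need a null model that respects the loading history.

```python
import numpy as np


def fraction_preceded(fast_times, slow_times, window):
    """Fraction of fast ruptures with at least one slow event in the
    preceding `window` seconds."""
    slow = np.sort(np.asarray(slow_times, dtype=float))
    fast = np.asarray(fast_times, dtype=float)
    # Count slow events s with t - window <= s < t for each fast rupture t.
    lo = np.searchsorted(slow, fast - window, side="left")
    hi = np.searchsorted(slow, fast, side="left")
    return float(np.mean(hi > lo))


def premonitory_p_value(fast_times, slow_times, window, duration,
                        n_surrogates=2000, seed=0):
    """One-sided Monte Carlo p-value against uniformly random slow-event times."""
    rng = np.random.default_rng(seed)
    observed = fraction_preceded(fast_times, slow_times, window)
    count = 0
    for _ in range(n_surrogates):
        surrogate = rng.uniform(0.0, duration, size=len(slow_times))
        if fraction_preceded(fast_times, surrogate, window) >= observed:
            count += 1
    # The +1 terms keep the estimated p-value strictly positive.
    return observed, (count + 1) / (n_surrogates + 1)


# Synthetic catalogue (assumed values): a two-hour record with a fast
# rupture every 600 s, each preceded by a slow event 20 s earlier.
fast = np.arange(600.0, 7200.0, 600.0)
slow = fast - 20.0
observed, p_value = premonitory_p_value(fast, slow, window=60.0, duration=7200.0)
print(observed, p_value)
```

The same functions can be applied with the roles of the two catalogues and the sign of the time lag reversed to look for a backward correlation (after-slip relaxation).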
Recurrent Neural Networks (RNNs; see Pattanayak, 2017; Goyal et al., 2018; Galea & Capelo, 2018) are efficient at detecting events. They are well suited to the task because, unlike many other types of neural network, they can analyse inputs of any length as well as continuous data streams. This feature makes RNNs a natural choice for a monitoring or alerting system. RNNs can detect particular events in a long data stream even when such events are mixed with other signals, and can easily be extended to classify different event types.
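The ability to handle continuous streams comes from the hidden state that a recurrent network carries from one sample to the next. The following minimal sketch, written with NumPy only, illustrates this with a single-layer recurrent cell whose weights are random and untrained. The class name, layer size and weights are hypothetical, and the scores it produces are meaningless as detections; in the project a trained network built with Keras or TensorFlow would take this role.

```python
import numpy as np


class TinyRNN:
    """Single-layer Elman-type recurrent cell with a sigmoid read-out.

    Weights are random and untrained: the class only demonstrates how the
    hidden state lets a record be processed in pieces of any length.
    """

    def __init__(self, n_hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.normal(0.0, 0.5, size=n_hidden)               # input -> hidden
        self.w_rec = rng.normal(0.0, 0.3, size=(n_hidden, n_hidden))  # hidden -> hidden
        self.w_out = rng.normal(0.0, 0.5, size=n_hidden)              # hidden -> score
        self.state = np.zeros(n_hidden)

    def reset(self):
        """Forget everything seen so far."""
        self.state = np.zeros_like(self.state)

    def process(self, chunk):
        """Consume a chunk of samples and return one score in (0, 1) per sample.

        The hidden state is kept between calls, so consecutive chunks are
        treated as one uninterrupted stream.
        """
        scores = np.empty(len(chunk))
        for i, x in enumerate(chunk):
            self.state = np.tanh(self.w_in * x + self.w_rec @ self.state)
            scores[i] = 1.0 / (1.0 + np.exp(-(self.w_out @ self.state)))
        return scores


rng = np.random.default_rng(1)
stream = rng.normal(size=1000)

rnn = TinyRNN()
whole = rnn.process(stream)  # the full record in one call

rnn.reset()
pieces = [stream[:137], stream[137:600], stream[600:]]  # arbitrary chunk sizes
chunked = np.concatenate([rnn.process(p) for p in pieces])

print(np.allclose(whole, chunked))
```

Because the state persists across calls, the chunked and whole-record outputs agree, which is what allows such a network to run on data as it is acquired rather than on fixed-length clips.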
Random Forest algorithms, on the other hand, do not rely on neural networks but evaluate statistical properties within a sliding time window, which are then used to define nodes on a decision tree. This approach yielded promising preliminary results when applied to rupture data (Hulbert et al., 2019).
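The sliding-window idea can be sketched as follows, assuming scikit-learn's RandomForestClassifier as the implementation. This is not the method of Hulbert et al. (2019): the four window statistics, the window length, the labelling rule and the synthetic traces are hypothetical choices made only to show the workflow of training on one labelled record and blind-testing on another.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

WIDTH, STEP, BURST_LEN = 200, 100, 200  # assumed window and event lengths (samples)


def window_features(signal, width=WIDTH, step=STEP):
    """Statistics of consecutive overlapping windows and their start indices."""
    starts = np.arange(0, len(signal) - width + 1, step)
    feats = []
    for s in starts:
        w = signal[s:s + width]
        centred = w - w.mean()
        var = centred.var()
        kurt = (centred ** 4).mean() / max(var ** 2, 1e-12)
        feats.append([var, kurt, w.max() - w.min(), np.abs(np.diff(w)).mean()])
    return np.array(feats), starts


def synthetic_trace(burst_starts, n=40000, seed=0):
    """Unit-variance noise plus oscillatory bursts at known positions."""
    rng = np.random.default_rng(seed)
    trace = rng.normal(0.0, 1.0, n)
    t = np.arange(BURST_LEN)
    for b in burst_starts:
        trace[b:b + BURST_LEN] += 3.0 * np.sin(2 * np.pi * t / 20)
    return trace


def label_windows(starts, burst_starts, width=WIDTH):
    """Label a window 1 if at least half of it overlaps a known burst."""
    labels = np.zeros(len(starts), dtype=int)
    for b in burst_starts:
        overlap = np.minimum(starts + width, b + BURST_LEN) - np.maximum(starts, b)
        labels[overlap >= width // 2] = 1
    return labels


# Training record with hand-picked (here: synthetic) event times.
train_bursts = list(range(1000, 40000, 2000))
x_train, s_train = window_features(synthetic_trace(train_bursts, seed=0))
y_train = label_windows(s_train, train_bursts)

# Independent record used only for blind testing.
test_bursts = list(range(500, 40000, 2300))
x_test, s_test = window_features(synthetic_trace(test_bursts, seed=1))
y_test = label_windows(s_test, test_bursts)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(x_train, y_train)
pred = forest.predict(x_test)

accuracy = float(np.mean(pred == y_test))
recall = float(np.mean(pred[y_test == 1] == 1))
print(accuracy, recall)
```

Recall is reported alongside accuracy because event windows are rare: a classifier that never triggers would still score a high accuracy on such a record.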
Python is currently the most widely used language for machine learning development and is the environment suggested for this project. One advantage of Python is the large number of ready-made modules for artificial intelligence, which greatly reduce development time. In particular, we suggest using Keras or TensorFlow for this project.

Project Timeline

Year 1

The initial months of the project will be mainly devoted to an in-depth study of the state of the art on earthquakes and machine learning techniques.
The student will conduct bibliographic research and receive training in programming tools (Python, machine learning, data analysis).
The student will also become acquainted with protocols and techniques of rock mechanics laboratory experiments.
By the middle to the end of the year, a first series of short-duration (minutes to hours) laboratory experiments will be conducted on experimental earthquake faults.
High-frequency acoustic emission data will be collected from a network of sensors (strain gages, piezoceramic sensors and displacement sensors).
Preliminary algorithms for automatic detection and classification of known events will be designed using recurrent neural networks and other machine learning methods.
The algorithms will be trained, tuned and tested using time series from both an existing database and newly realised experiments.

Year 2

The second year will be devoted to the development of longer duration (hours to days) laboratory experiments and fully developed machine-learning algorithms.
To this end, the most accurate and efficient type of algorithms will be selected, and expanded into a full analytic machine-learning suite for lab data.
The detection of known events in the time series will be augmented by tests on the detection of unknown events. What appears as noise in the data actually contains some information on the evolution of the fault. Machine learning has made it possible to identify unknown tremors and increased seismic activity in time series previously classified as noise.
The identification of unknown events will then be tested as a possible probabilistic forecasting tool for earthquake rupture.
The student and the supervisory team will start to organise the results obtained into publishable form, with the aim of disseminating them through articles in scientific journals, conferences, seminars and web pages.

Year 3

The work started in the second year will be pursued, with the additional testing of the developed analytical tools on data from natural faults (seismological broad-band data, geodesy, strain-meters). The use of the algorithms in probabilistic forecasting of seismic or non-seismic instability events will be tested. The accuracy will be cast in terms of the probability of forecasting successfully within a given time, space and magnitude window, versus over-prediction (cry wolf effect). Time should be allowed to finalise the publication and dissemination of the main results, and to write up the PhD manuscript.
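The trade-off between successful forecasts and over-prediction can be quantified with two simple scores. The sketch below is an assumption-laden illustration that considers the time window only (the space and magnitude windows mentioned above are omitted); the function name, the 20 s window and the alarm and event times are hypothetical.

```python
def score_forecasts(alarm_times, event_times, window):
    """Hit rate and false-alarm ratio for time-window forecasts.

    An event counts as forecast if some alarm was issued at most `window`
    time units before it. An alarm counts as false if no event follows it
    within `window`.
    """
    def matches(alarm, event):
        return 0 < event - alarm <= window

    hits = sum(any(matches(a, e) for a in alarm_times) for e in event_times)
    false_alarms = sum(not any(matches(a, e) for e in event_times)
                       for a in alarm_times)

    hit_rate = hits / len(event_times) if event_times else float("nan")
    false_alarm_ratio = (false_alarms / len(alarm_times)
                         if alarm_times else float("nan"))
    return hit_rate, false_alarm_ratio


# Hypothetical catalogue: three events and four alarms, times in seconds.
events = [100.0, 250.0, 400.0]
alarms = [90.0, 200.0, 300.0, 390.0]
hit_rate, false_alarm_ratio = score_forecasts(alarms, events, window=20.0)
print(hit_rate, false_alarm_ratio)
```

In this example two of the three events are preceded by an alarm within the window, and two of the four alarms are followed by no event, so a forecasting tool would be judged on both numbers together rather than on the hit rate alone.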

Year 3.5

Time should be allowed to finalise and confirm the tests, to publish and disseminate the main results, and to write up the PhD manuscript.

Skills

– Programming in Python
– Machine learning applications
– Data analysis and signal processing
– Laboratory experiments on rock mechanics
– Earthquake hazard, statistics and probabilistic forecasting

References & further reading

– Guerin-Marthe, S., Nielsen, S., Bird, R., Giani, S. & Di Toro, G. Earthquake Nucleation Size: Evidence of Loading Rate Dependence in Laboratory Faults. Journal of Geophysical Research: Solid Earth 124(1): 698-708. 2019
– Hulbert, C. et al. Similarity of fast and slow earthquakes illuminated by machine learning. Nature Geoscience 12: 69-74. 2019
– Pattanayak, S. Pro Deep Learning with TensorFlow: A Mathematical Approach to Advanced Artificial Intelligence in Python. Apress, 2017.
– Goyal, P., Pandey, S. & Jain, K. Deep Learning for Natural Language Processing: Creating Neural Networks with Python. Apress, 2018.
– Galea, A. & Capelo, L. Applied Deep Learning with Python: Use scikit-learn, TensorFlow, and Keras to create intelligent systems and machine learning solutions. Packt Publishing, 2018.

Further Information

Contact info: Stefan Nielsen can be reached by email at or by phone at 191 3344308.
Stefan Giani can be reached by email at stefano or by phone at 191 3342397.
