Research Article: GEOPHYSICS

Crowdsourcing triggers rapid, reliable earthquake locations


Science Advances  03 Apr 2019:
Vol. 5, no. 4, eaau9824
DOI: 10.1126/sciadv.aau9824

Abstract

In many cases, it takes several minutes after an earthquake to publish online a seismic location with confidence. Via monitoring for specific types of increased website, app, or Twitter usage, crowdsourced detection of seismic activity can be used to “seed” the search in the seismic data for an earthquake and reduce the risk of false detections, thereby accelerating the publication of locations for felt earthquakes. We demonstrate that this low-cost approach can work at the global scale to produce reliable and rapid results. The system was retroactively tested on a set of real crowdsourced detections of earthquakes made during 2016 and 2017, with 50% of successful locations found within 103 s, 76 s faster than GEOFON and 271 s faster than the European-Mediterranean Seismological Centre’s publication times, and 90% of successful locations found within 54 km of the final accepted epicenter.

INTRODUCTION

Rapid public earthquake information is essential for both the public and authorities and can contribute to a more efficient earthquake response (1). Until now, the majority of efforts to tackle this challenge have been oriented toward the implementation of dense seismological networks with fast and robust data communication. In the best cases, early warning systems have been deployed to rapidly locate earthquakes and estimate their magnitude using the closest seismic stations and potentially warn the population before the seismic waves reach their location (2–4). However, these high-performance systems require high investment and maintenance, and therefore, they are only implemented in a few regions of the world. For many earthquakes worldwide, even a location within several tens of seconds of an event would be a useful acceleration of current practice.

Here, we present a new approach to boost seismic network performance, which we call CsLoc (Crowdseeded Seismic Location). Since 2008 (5), the EMSC (European-Mediterranean Seismological Centre) has been developing systems for the detection of felt earthquakes via crowdsourcing. While other crowdsourced approaches in seismology have focused on using accelerometers in smartphones (6–9) or dedicated sensors that are maintained by the public (6, 10), our approach is a side effect of the public’s search for information or their online reactions. Presently, peaks in the rate of connections to the EMSC websites, LastQuake app launches, or tweets with certain keywords rapidly alert the EMSC that an earthquake is happening (11–13), with 85% of these crowdsourced detections having a known seismic cause (see table S1). Multiple detections can sometimes occur because of the multiple detection methods and also since countries are monitored individually to increase signal-to-noise ratio. Following a detection, the barycenter of its activity is computed by applying a clustering algorithm to individual users’ geolocations, which has been found to have an accuracy to within hundreds of kilometers of the earthquake’s epicenter (see the Supplementary Materials). The barycenter represents a center of the public response and so is not necessarily expected to coincide with the epicenter; for instance, the earthquake could be offshore or in an uninhabited region. Hence, it indicates an earthquake’s region but does not always provide an accurate estimation for its location. Nevertheless, this barycenter is the seed for the seismic location algorithm.

The CsLoc procedure is illustrated in Fig. 1. Upon notification of a crowdsourced detection, CsLoc searches for phases from stations within 2000 km of the crowdsourced barycenter and close in time to the crowdsourced detection time. A “phase” or “arrival time” is the measured time that a seismic wavefront arrives at a seismic station. A simple phase association procedure is then used. The chosen phases are located in a window centered on a regression to the ak135 seismic propagation model (Fig. 2) (14). These phases are then analyzed by the location software iLoc (15). If a location is found, then the result must also satisfy publication criteria defined in terms of standard seismological parameters. Until the criteria are met, the process runs iteratively at 15-s intervals; each iteration may add newly received arrival times and starts from the solution obtained in the previous iteration (see Fig. 2A). For this study, P-wave arrival times have been provided by the GEOFON program from more than 800 stations affiliated to 73 seismic networks distributed worldwide from the International Federation of Digital Seismograph Networks (FDSN) (16). In real time, data are received via the httpmsgbus (HMB) protocol (17), which is essential for the system’s response time. The transmission of data from GEOFON to the EMSC has been measured to take less than 1 s. Details of the entire procedure are available in the Supplementary Materials.

Fig. 1 CsLoc fuses crowdsourced and seismic detection of earthquakes.

Crowdsourced detections are quick but do not yield the physical properties of an event, and some detections are not related to seismic events. Seismic networks need strong quality criteria for the automatic publication of seismic events to avoid false detections. The fusion of the two sources of data improves the reliability of crowdsourced detections and reduces the response time of a seismic network for the rapid location of felt events.

Fig. 2 The CsLoc procedure.

(A) Flowchart of the CsLoc association and location process. (B) First iteration of a typical CsLoc analysis. Fast-arriving Pn-phases up to 10° from the initial crowdsourced location are considered in the association process. (C) Phases within three times the median absolute deviation (MAD) are used for the iLoc location analysis. (D and E) The location process typically obtains a stable solution in less than 10 iterations. By the 10th iteration, more phases have arrived and the arrival times are highly aligned on the predicted Pn travel-time curve. Consequently, many more stations contribute to the location (note that the EMSC-published epicenter is hidden by the CsLoc epicenter).

RESULTS

To validate the CsLoc system, we analyzed 1536 earthquakes seen by crowdsourced detections recorded between 1 January 2016 and 31 December 2017 at the EMSC (Fig. 3). The testing was retroactive but made to be as realistic as possible, with each iteration in the analysis gaining access to only those phases that would have been available according to the creation time of each phase arrival. Including duplicated detections of events and 390 crowdsourced detections that are unexplained seismically or false detections, we tested 2590 crowdsourced detections in total. CsLoc was demonstrated to be a high-reliability system for the detection of felt earthquakes: Within the full 2590 detections, only 4 false detections led to a location, giving a false-positive rate of only 0.15%. Hence, CsLoc can be used to rapidly validate that a crowdsourced detection has a seismic origin, which is valuable knowledge independent of the reliability of the location found (the converse is not true; lack of confirmation does not imply that a detection is erroneous).

Fig. 3 Testing of CsLoc on crowdsourced detections from 2016 and 2017.

(A) Results of CsLoc analyses overlaid on a density plot of the number of GEOFON seismic stations within 1000 km of each position. Successful locations are related to local network density: Almost all nonlocalized events are out of the network. (B) Results broken down by crowdsourced detection source. Note that some earthquakes were detected by multiple systems. Success rates are similar for each source of event detection. (C) Histogram of separation of first publishable CsLoc result for each earthquake with respect to the final EMSC-published epicenter.

As the seismic network was not homogeneously distributed, successful locations occurred predominantly in well-instrumented regions (Fig. 3A). Of the 1536 earthquakes, CsLoc located 735 earthquakes well enough to satisfy our publication criteria; 97% of these were located less than 100 km from the published EMSC location and 89% were located less than 50 km from it. Successful location was not found to depend on earthquake magnitude (fig. S5). For the rest, 231 events were poorly located and 520 had insufficient data. Although it had been considered a possibility, CsLoc did not locate any earthquakes in this study that were not already recorded in the EMSC databases. It did, however, detect events that were not officially published by GEOFON while using the same seismic stations (discussed later).

For the 735 CsLoc-located earthquakes in our test dataset, 50% of locations were found within 103 s from the origin time and 90% were found within 3 min. The speed of the CsLoc system depends largely on two factors: the seismic network coverage and the density of internet-enabled users in the felt area. Closer stations have shorter seismic wave travel times, making the phases available quickly. Similarly, dense populations generally lead to faster crowdsourced detections (independently of earthquake magnitude). When both criteria are fulfilled, the location can be computed in some cases in less than a minute. The delay of the CsLoc system for the selection of phases and analysis by the iLoc software was less than 5.3 s in 90% of cases. As shown in Fig. 4A, the system was actually often constrained by the time taken to receive enough phases to produce a publishable result.

Fig. 4 Latency of CsLoc during testing.

(A) Breakdown of the analysis delays for the 735 earthquakes located by CsLoc using the earliest publishable location. Analysis is largely limited by the time required to collect sufficient phases. (B) Violin plot of the minimum publication delays for CsLoc, GEOFON, and EMSC from an analysis of the set of 429 earthquakes detected by both CsLoc and GEOFON within 10 min of the origin time.

For comparison with a purely seismic system, the current system of publishing earthquake locations at GEOFON relies on strict criteria for the automatic analysis to guarantee high-quality solutions. To avoid false locations, 30 associated phases are necessary to produce an automatic published solution. CsLoc located more events than GEOFON within the first 10 min following each event (735 versus 406). CsLoc located more small or moderate events, usually because these events did not have the 30 phases necessary for GEOFON to publish them. GEOFON also located 426 additional earthquakes more than 10 min after each event; these were mainly large-magnitude events at teleseismic scale (a long distance from the network) that were not detected by CsLoc (fig. S6). CsLoc is not currently optimized to detect such events as it only runs for a few minutes after each event and because its publication criteria are optimized for earthquakes with seismic stations relatively nearby (see the Supplementary Materials).

Figure 4B compares the temporal performance of the CsLoc, GEOFON, and EMSC location catalogs for the subset of felt events detected by all of the systems. The medians of the three distributions show that CsLoc accelerates the publication time by 76 s with respect to GEOFON and by 271 s with respect to the EMSC’s systems (more details in table S4). A study of the internal results of GEOFON showed that many of the published earthquakes were registered much earlier than their publication time (as well as some that were never published); thus, it appears that much of the benefit of CsLoc comes from the low false-positive rate obtained from using crowdsourced detections, which allows more aggressive publication criteria to be used while still obtaining reliable results.

DISCUSSION

In summary, fusion of crowdsourced detections and seismic data yields timely, accurate earthquake locations for precisely those earthquakes that are felt by and that most affect the public. It accelerates the publication of locations by over a minute compared to seismic analysis alone, and it can confirm a seismic event more reliably than purely crowdsourced detections (with 97% accuracy compared to 85%). This can be used to raise situational awareness and engage with eyewitnesses. CsLoc is also multiscale, locating events independently of magnitude as long as there is regional station coverage. CsLoc does not compete with the rapidity of early warning systems, but advantageously, CsLoc works on a global scale and is a low-cost approach, since it takes advantage of existing seismic networks and the crowdsourced detection methods require relatively low investment [compared to the installation of early warning detector networks (2)].

This fusion of different data sources (crowdsourced and instrumental) unveils a new path in citizen science, where public engagement becomes a tool for improving scientific results. Our intention is to start using CsLoc to provide preliminary earthquake locations in real time for EMSC’s crowdsourced detections to more rapidly inform the public. It is hoped that its global coverage can be improved by enlarged collaboration with other seismic networks.

MATERIALS AND METHODS

Crowdsourced detection of earthquakes

In this study, three systems of crowdsourced earthquake detections were used: the detection of increased traffic on the www.emsc-csem.org or m.emsc-csem.org website, the detection of increased launches of the EMSC’s LastQuake app, and the detection of peaks in the rate of tweets with less than seven words and containing “earthquake” in 1 of 59 languages [Twitter Earthquake Detection (TED)]. Each peak was associated either automatically or manually to a published earthquake record with an 85% success rate. Table S1 shows the detections and success rates for each system during the test period of 2016–2017.

The EMSC’s detection systems have been developed over several years to their current state (5, 12, 18, 19). The TED system is a collaboration between the EMSC and P. Earle and M. Guy at the United States Geological Survey (USGS). During 2016–2017, detections were made at the USGS (13) by their Twitter monitoring process and then forwarded automatically to the EMSC for further analysis, association with EMSC earthquakes, and integration with the rest of the EMSC systems.

To increase the signal-to-noise ratio, both the website and app systems monitored each country separately, and they only counted users who were new within the previous half-hour. In addition, the website system filtered visitors using their referring uniform resource locator (url) (only visitors arriving from search engines or directly were considered) and a blacklist of internet protocol (IP) addresses (for filtering out internet robots or seismological institutes). Detections were made when the rate of arrivals per minute exceeded a multiple of the average rate per minute for the previous half-hour by a certain threshold; the parameters varied by system (12, 13). Figure S1 shows the distributions of the detection delays (for those detections that could be associated with EMSC earthquakes), and table S2 presents some summary percentiles.
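The thresholding rule described above can be sketched as follows. This is a minimal illustration, not the EMSC implementation: it assumes one count of new users per minute, and the multiplier `factor` and the floor `min_count` are hypothetical values, since the actual parameters varied by system (12, 13).

```python
from statistics import mean


def detect_peak(counts_per_minute, factor=5.0, min_count=10, window=30):
    """Return the index of the first minute that looks like a peak, or None.

    A minute is flagged when its count exceeds `factor` times the mean of
    the previous `window` minutes and is at least `min_count`.
    `factor` and `min_count` are illustrative assumptions only.
    """
    for i in range(window, len(counts_per_minute)):
        baseline = mean(counts_per_minute[i - window:i])
        current = counts_per_minute[i]
        if current >= min_count and current > factor * baseline:
            return i
    return None
```

The half-hour baseline is what makes the rule adaptive: a country with routinely high traffic needs a proportionally larger surge to trigger a detection.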

Each system geolocated its users in a different way. The website system used a database [NetAcuity by digital element (20)] that associated the visitor’s IP address with a physical location. The accuracy of this method varied by country; in Europe and the United States, it could be accurate to the street level, but in many other countries, it was often accurate only to the city level or less. For users accessing the website via their mobiles, the accuracy was particularly bad since the IP addresses showed only the location where the mobile network connected to the internet, which might be far from the user’s true location.

The LastQuake app asked its users to enable access to their mobile phone’s location, which could be sourced from Global Positioning System readings or the location of the local cell masts. Hence, the positions could be accurate to less than a kilometer. At the time of writing, more than 80% of users allowed LastQuake to use their location.

The Twitter detection system had to perform an analysis of the user-written location string found in the profile of the author of each tweet. A process called “geocoding” attempted to extract a location from the natural language text (i.e., a city, town, or village name) (21). Naturally, there were many missing or humorous location entries in users’ profiles, but nevertheless, useful information could be extracted.

Once a detection was made, the set of users within 2 min of the trigger time (and within the same country as the detection for the app and website peaks) was collected and geolocated. A hierarchical bottom-up clustering algorithm was applied to this dataset using linkage based on the unweighted pair group method with arithmetic mean (UPGMA) and a Euclidean distance metric between latitude and longitude coordinates. This procedure eliminated outliers and found the largest cluster of activity. The barycenter (the average coordinate of this cluster) was taken to be the location of the crowdsourced detection. For the crowdsourced detections during 2016–2017, fig. S1 shows the distribution of the locations with respect to the EMSC-published epicenters (for those detections that could be associated with EMSC earthquakes) and table S2 presents some summary percentiles.
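As a rough illustration of this step, the sketch below performs bottom-up clustering with UPGMA (average) linkage on latitude and longitude pairs and returns the mean coordinate of the largest cluster. The stopping distance `cutoff_deg` is a hypothetical parameter; the text does not state how the cluster tree was cut. The naive search is slow for large inputs and is meant only to show the logic.

```python
from itertools import combinations
from math import hypot


def average_linkage(a, b, pts):
    """UPGMA distance: mean Euclidean distance over all cross-cluster pairs."""
    total = sum(
        hypot(pts[i][0] - pts[j][0], pts[i][1] - pts[j][1])
        for i in a
        for j in b
    )
    return total / (len(a) * len(b))


def crowd_barycenter(pts, cutoff_deg=2.0):
    """Cluster non-empty (lat, lon) points; return (lat, lon, size) of the largest cluster.

    `cutoff_deg` (degrees) is an assumed merge limit, not a published value.
    """
    clusters = [[i] for i in range(len(pts))]
    while len(clusters) > 1:
        # Find the closest pair of clusters under average linkage.
        (x, y), dist = min(
            (((p, q), average_linkage(clusters[p], clusters[q], pts))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > cutoff_deg:
            break  # remaining clusters are too far apart to merge
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because x < y
    largest = max(clusters, key=len)
    lat = sum(pts[i][0] for i in largest) / len(largest)
    lon = sum(pts[i][1] for i in largest) / len(largest)
    return lat, lon, len(largest)
```

Isolated users far from the main group never merge into the largest cluster, which is how outliers are excluded from the barycenter.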

A single earthquake could have multiple crowdsourced detections from each of the systems, as well as from neighboring countries within the same system. However, in general, each system complemented the others, as different regions of the world showed preference for different channels of information (see fig. S2).

CsLoc procedure

Upon notification of a crowdsourced detection, CsLoc searched for phases from stations within 1000 to 2000 km of the crowdsourced barycenter (regional scale), with the maximum radius varying to ensure that at least seven stations were present. The search was limited to between 210 s before and 120 s after the crowdsourced detection time. This was simply a Structured Query Language request performed on the phase data, which were stored in a database table.
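The selection of stations and picks can be sketched as below. This is one possible reading of the rule, under the assumption that the radius starts at 1000 km and grows only as far as needed (up to 2000 km) to include seven stations; the function names are hypothetical, and the original system performed the selection as a database query.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))


def select_search_radius(barycenter, stations, r_min=1000.0, r_max=2000.0,
                         min_stations=7):
    """Smallest radius in [r_min, r_max] holding at least `min_stations` stations.

    Returns r_max when even the largest radius cannot reach the target.
    """
    dists = sorted(
        haversine_km(barycenter[0], barycenter[1], s[0], s[1]) for s in stations
    )
    if len(dists) < min_stations:
        return r_max
    return min(max(dists[min_stations - 1], r_min), r_max)


def in_time_window(pick_time, trigger_time, before=210.0, after=120.0):
    """True if a pick lies between 210 s before and 120 s after the trigger."""
    return trigger_time - before <= pick_time <= trigger_time + after
```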

A phase is the earliest time that a particular type of seismic wave is measured as arriving at a seismic station. The P waves arrive the earliest and so have the cleanest signal for analysis. A phase is equivalently known as an “arrival time” or a “pick.”

For this study, P-wave arrival times were provided by GEOFON from more than 800 stations affiliated to 73 FDSN seismic networks distributed worldwide (16). In real time, data are received via the HMB protocol (17), which is essential for the system’s response time. We note that the analysis of P-wave arrival times from the waveform data was found to take approximately 30 s for GEOFON, and the transmission of data from GEOFON to EMSC has been measured to take less than 1 s.

Once the phases were collected, the phase association procedure was applied to filter away outlying events (see below). The chosen phases, along with the current epicenter estimate (which is initially the crowdsourced location), were given to the iLoc locator for analysis.

The association and location process ran iteratively at 15-s intervals; each iteration might add newly received arrival times and started from the solution obtained in the previous iteration. The iteration process stopped when the seismic location satisfied the publication criteria.
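The iteration logic can be summarized in a short sketch. All four callables (`fetch_phases`, `associate`, `locate`, `publishable`) are hypothetical placeholders for the steps described in this section, and `max_iterations` is an assumed limit; the text states only that the process runs for a few minutes after each event.

```python
def run_csloc(seed, fetch_phases, associate, locate, publishable,
              max_iterations=20, interval_s=15, wait=None):
    """Iterate association and location until a publishable solution appears.

    seed          initial epicenter estimate (the crowdsourced barycenter)
    fetch_phases  callable(elapsed_s) -> picks available at that time
    associate     callable(picks, estimate) -> selected picks
    locate        callable(picks, estimate) -> solution or None
    publishable   callable(solution, estimate) -> bool
    wait          optional callable(seconds), e.g. time.sleep in real time
    Returns (solution, iteration), or (None, max_iterations) on failure.
    """
    estimate = seed
    for iteration in range(1, max_iterations + 1):
        picks = fetch_phases(iteration * interval_s)
        chosen = associate(picks, estimate)
        solution = locate(chosen, estimate)
        if solution is not None:
            if publishable(solution, estimate):
                return solution, iteration
            # The next iteration starts from this iteration's solution.
            estimate = solution
        if wait is not None:
            wait(interval_s)
    return None, max_iterations
```

Passing `wait=None` allows a retroactive replay, as in the testing described here, where each iteration sees only the picks that would have been available at that elapsed time.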

CsLoc phase association procedure

The phase association algorithm was applied to determine which phases may belong to the earthquake. The assumption was made that the collected P-phases were all first arriving Pn-phases propagating at 8.04 km/s following the ak135 model (14). Since the crowdsourced detection could not estimate the depth, it was also assumed that the event was shallow. Starting from the previous iteration’s calculated epicenter or from the initial crowdsourced barycenter, a wavefront interval between 210 and 15 s before the trigger time was used to select phases for analysis (red lines in Fig. 2 graphs). A robust linear regression with a fixed Pn velocity of 8.04 km/s was performed on these phases (sloping blue line in Fig. 2 graphs), relative to the starting epicenter estimate. The final chosen phases were those within the bounds of three times the median absolute deviation (MAD) of the fitted ak135 Pn line (sloping dashed black lines in Fig. 2 graphs). This algorithm naturally becomes more selective as a more accurate solution is used as the starting location and has proven effective at rejecting the phases of aftershocks as seen in Fig. 2C for the 10th iteration of the CsLoc analysis. The seismic inversion was then computed with the selected phases by the location software iLoc developed for the International Seismological Centre (ISC) (15).
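A minimal sketch of the association step follows. With the slope fixed at the Pn velocity, the regression reduces to estimating a single intercept (the apparent origin time); the sketch uses the median as the robust estimator, which is an assumption, since the text does not name the estimator. The pre-selection by wavefront interval (210 to 15 s before the trigger time) is omitted for brevity.

```python
from statistics import median

PN_VELOCITY_KM_S = 8.04  # fixed Pn velocity stated in the text


def associate_phases(picks, k=3.0):
    """Select picks consistent with a single Pn travel-time line.

    picks: list of (distance_km, arrival_time_s) relative to the current
    epicenter estimate. Returns (origin_time, mad, kept_picks).
    """
    # Remove the predicted Pn travel time; aligned picks share one origin time.
    reduced = [t - d / PN_VELOCITY_KM_S for d, t in picks]
    origin_time = median(reduced)  # assumed robust intercept estimate
    mad = median(abs(r - origin_time) for r in reduced)
    kept = [
        pick for pick, r in zip(picks, reduced)
        if abs(r - origin_time) <= k * mad
    ]
    return origin_time, mad, kept
```

Because the reduced times depend on the distances to the current epicenter estimate, a better starting location tightens the alignment, shrinks the MAD, and makes the selection stricter in later iterations.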

iLoc

The iLoc location algorithm is based on the state-of-the-art location algorithm developed for the ISC. The ISC locator (15) has been operational since 2011, and the ISC Bulletin has been produced with the new locator for data from January 2009 onward. The relocation of the entire ISC Bulletin with the ISC locator is expected to be finished by the end of 2019. In addition to the features of the ISC locator, iLoc provides further functionality to support the needs of national seismological networks. iLoc uses most ak135 phases (including depth phases) (14) in the location with elevation, ellipticity (22, 23), and depth-phase bounce point corrections (24). It is fully integrated with the Regional Seismic Travel Times [RSTT (25)] three-dimensional global upper mantle and crust velocity model, and by default, it obtains travel-time predictions for crustal and mantle phases from RSTT. It can use any three-dimensional crustal and upper mantle model compliant with the RSTT parameterization, and it also supports the use of local, one-dimensional velocity models.

One of the major strengths of the locator is that it accounts for correlated travel-time prediction errors due to unmodeled velocity structures along similar ray paths that allow it to obtain reliable locations and uncertainty estimates even with unfavorable station distributions. It obtains the initial hypocenter guess from the neighborhood algorithm search (26, 27). Once close to the global minimum, it switches to an iterative linearized inversion using an a priori estimate of data covariance matrix whose off-diagonal elements represent the correlation structure in the dataset (28). Another important feature is that it attempts free-depth solution, only if there is depth resolution either from local networks or from teleseismic depth phases, thus avoiding the pitfall of depth–origin time trade-off. If there is no depth resolution in the data, iLoc fixes the depth to the most probable value based on historical seismicity. The iLoc locator, together with the latest RSTT three-dimensional velocity model (25), is available for download from the IRIS software depository (https://seiscode.iris.washington.edu/projects/iloc).

CsLoc publication criteria

The publication criteria were applied to the results of each iteration in an analysis to estimate whether the result was reliable. During an analysis, the actual effect of the criteria was often to force the algorithm to wait until more phases had been collected before a location could be considered reliable or to cause an extra iteration to refine the result.

The following criteria were calculated with respect to the new solution’s epicenter and using only the stations that provided phases that contributed to the solution:

(1) Minimum distance to the nearest station <306 km.

(2) Largest gap in azimuthal angles between stations <240°.

(3) Largest gap in azimuthal angles between next nearest neighboring stations <300°.

(4) Separation between solution and crowdsourced location <500 km.

(5) Separation between solution and starting location for this iteration <200 km.

(6) MAD (see Fig. 2) from the ak135 fit of phases selected for iLoc but recalculated using the iLoc solution’s epicenter <4.4.
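The six criteria above can be expressed as a single check, sketched below. The function and argument names are hypothetical, and criterion (3) is interpreted here as the secondary azimuthal gap (the largest gap spanned when any one station is skipped), which is an assumption about the intended definition.

```python
def azimuthal_gaps(azimuths_deg):
    """Return (primary_gap, secondary_gap) in degrees for station azimuths."""
    az = sorted(a % 360.0 for a in azimuths_deg)
    n = len(az)
    if n < 2:
        return 360.0, 360.0
    # Gaps between consecutive azimuths, including the wrap-around gap.
    gaps = [az[i + 1] - az[i] for i in range(n - 1)] + [az[0] + 360.0 - az[-1]]
    primary = max(gaps)
    # Secondary gap: two adjacent gaps merged, as if one station were removed.
    secondary = max(gaps[i] + gaps[(i + 1) % n] for i in range(n))
    return primary, secondary


def is_publishable(nearest_station_km, azimuths_deg, sep_from_crowd_km,
                   sep_from_start_km, mad):
    """Apply criteria (1) to (6) with the thresholds listed above."""
    gap, secondary_gap = azimuthal_gaps(azimuths_deg)
    return (
        nearest_station_km < 306.0      # (1) nearest contributing station
        and gap < 240.0                 # (2) largest azimuthal gap
        and secondary_gap < 300.0       # (3) gap across next nearest neighbors
        and sep_from_crowd_km < 500.0   # (4) distance to crowdsourced location
        and sep_from_start_km < 200.0   # (5) distance to iteration start
        and mad < 4.4                   # (6) MAD of the ak135 Pn fit
    )
```

For example, a solution whose contributing stations all lie within a narrow arc of azimuths fails criterion (2) regardless of how well the picks fit, which forces the process to wait for more stations.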

These parameters were optimized empirically using the test dataset, trading off between obtaining the most reliable results and obtaining the fastest results. Figure S3 attempts to show the effect of applying the criteria to the located results for the 10th iteration of the analyses. Each parameter eliminates a different set of outliers in the results to improve the quality of the published locations.

Test dataset

To validate the CsLoc system, we analyzed 2590 crowdsourced detections recorded between 1 January 2016 and 31 December 2017 at the EMSC, of which 2200 were associated to 1536 distinct earthquakes with published locations (table S1). Some duplicate detections had occurred because of multiple detection methods (fig. S2) and also since countries had been monitored individually to increase the signal-to-noise ratio (table S1). The full dataset of crowdsourced detections and the results of testing the CsLoc system can be found at (29).

SUPPLEMENTARY MATERIALS

Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/5/4/eaau9824/DC1

Fig. S1. Analysis of the crowdsourced detections during 2016–2017 that could be associated with EMSC-published epicenters, considering each detection method individually.

Fig. S2. A Venn diagram of the earthquakes detected by each crowdsourced system during 2016–2017.

Fig. S3. Scatter graphs showing all obtained locations for the 10th iteration of the CsLoc analyses for all 2200 detections that were associated with an EMSC epicenter.

Fig. S4. Summary of the test dataset and its results starting from the 2590 crowdsourced detections; the transition from “seismic” to “distinct earthquakes” corresponds to the deduplication of detections from the multiple crowdsourced detection methods.

Fig. S5. An analysis of the 735 earthquakes located by CsLoc with respect to earthquake magnitude.

Fig. S6. The earthquakes located by CsLoc and GEOFON by earthquake magnitude; CsLoc had a wider spectrum of magnitudes, locating a larger number of events of magnitude lower than M5 with respect to GEOFON in the first 10 min.

Table S1. Summary statistics for crowdsourced detections at the EMSC during 2016–2017.

Table S2. Summary statistics for the earthquakes detected by each crowdsourced detection.

Table S3. Summary of the 735 earthquakes located by CsLoc that met the publication criteria.

Table S4. Statistics for the 429 earthquakes located by both GEOFON and CsLoc within 10 min of the origin time.

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.

REFERENCES AND NOTES

Acknowledgments: Twitter earthquake detections were provided by P. Earle and M. Guy from USGS as part of a long-running collaboration with the EMSC. P-phase arrival times were obtained from the GEOFON program of the GFZ German Research Centre for Geosciences. We thank network operators for making data available in real time and M. Landes, E. Matrullo, L. Fallou, F. Roussel, and R. Madariaga for feedback and advice during both the research and the composition of this article. We also thank S. Hough and an anonymous reviewer for improvements to the article. Funding: Parts of the work presented represent the results of the SERA project. The SERA project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement no. 730900. The present article reflects only the authors’ view, and the European Commission is not responsible for any use that may be made of the information it contains. Author contributions: R.J.S. and A.F. developed the CsLoc implementation and optimized its parameters. They performed analyses of the system on historical data and developed the publication criteria. R.J.S. primarily implemented the software. They are also the primary authors of the article and its figures. Figure 1 was created by www.comscious.es in collaboration with the authors. I.B. developed the phase association algorithm and its code. R.B. formulated the overarching research goals, led and supervised the project, and acquired funding. A.D. worked on the HMB messaging bus and on some preliminary investigation of the concept. A.S., A.H., and J.S. provided phase and seismic detection from the GEOFON program both historically and in real time using the HMB messaging bus. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. P-phase arrival times used in this publication, based on automatic detections using real-time waveforms from 73 seismic networks, are available at http://doi.org/10.5880/GFZ.2.4.2018.002. The crowdsourced detections used for testing CsLoc and the resulting dataset from the tests are available at http://doi.org/10.5880/fidgeo.2018.068. Other data are available upon request to the corresponding author.