Research Article | DISASTER MANAGEMENT

Rapid assessment of disaster damage using social media activity


Science Advances  11 Mar 2016:
Vol. 2, no. 3, e1500779
DOI: 10.1126/sciadv.1500779
  • Fig. 1 Example of the spatiotemporal evolution of Twitter activity across keywords.

    (A) Geographical and topical variation of normalized activity (the number of daily messages divided by the number of local users active on the topic during the observation period). The horizontal axis is an offset (in hours) with respect to the time of hurricane landfall (00:00 UTC on 30 October 2012). Activity on hurricane-related words like “sandy” increases and reaches its peak on the day of landfall and then gradually falls off. Qualitatively similar trends are observed everywhere, with distance to the path of the hurricane affecting the strength of the response (compare magnitudes of activity peaks between New York, Chicago, and Miami). Different temporal patterns are exhibited by different keywords: “gas”-related discussion peaks with a delay corresponding to posthurricane fuel shortages, and activity on “storm” has a secondary spike attributable to the November “Nor’easter” storm. (B) Summary of activities by topic and location. Color corresponds to the level of normalized activity (blue, low; red, high). In columns, places are ranked according to their proximity to the path of the hurricane (closest on the left; farthest on the right). In rows, words are ranked according to the average activity on the topic. Evolution of the event brings disaster-related words to the top of the agenda, with the northeast showing the highest level of activity.

  • Fig. 2 Characteristic features of Twitter activity across locations (labeled by color according to hurricane proximity; blue, farther from the disaster; red, closer to the disaster).

    In all panels, the primary plot shows results for messages with the keyword “sandy,” and the inset shows results for the keyword “weather,” to contrast behaviors between event-related and neutral words. (A) A primary feature is the sharp decline in normalized activity as the distance between a location and the path of the hurricane increases. After the distance exceeds 1200 to 1500 km, its effect on the strength of response disappears. This trend may be caused by a combination of factors, with direct observation of disaster effects and perception of risk both increasing the tweet activity of the East Coast cities. Anxiety, anticipation, and risk perception evidently contribute to the magnitude of response because many of the communities falling into the decreasing trend were not directly hit or were affected only marginally, whereas New Orleans, for example, shows a significant tweeting level that reflects its historical experience with damaging hurricanes like Katrina. (B) The retweet rate is inversely related to activity, with affected areas producing more original content. (C) The popularity of the content created in the disaster area is also higher and therefore increases with activity as well. None of the features discussed above are present for neutral words (see the insets in all panels).

  • Fig. 3 Predictive capacity of Hurricane Sandy’s digital traces.

    The horizontal axis is an offset (in hours) with respect to the time of hurricane landfall (00:00 UTC on 30 October 2012). (A) The number of messages as a function of time (labeled on the secondary y axis on the right) and the number of “active” (with at least one message posted) ZCTAs (labeled on the primary y axis on the left). (B) Evolution of the rank correlation coefficients between the normalized per-capita activity (number of original messages divided by the population of a corresponding ZCTA) and per-capita damage (composed of FEMA individual assistance grants and Sandy-related insurance claims). In addition, the dashed trend shows Kendall rank correlations between average sentiment and per-capita damage. The correlation increases from the prelandfall stage to the postlandfall stage of the hurricane, with a drop on the day of hurricane landfall. We conclude that the postdisaster stage, or persistent activity on the topic in the immediate aftermath of an event, is a good predictor of damage inflicted locally. The strength of the average sentiment of tweets does not seem to be a good predictor, at least at this level of spatial granularity (ZCTA resolution).

  • Fig. 4 Spatial distributions and mutual correlations between Hurricane Sandy damage, Twitter activity, and average sentiment of tweets.

    Correlations between per-capita Twitter activity and damage are illustrated at the ZCTA level for New Jersey (A) and at the county level for New Jersey and New York (B). The difference in geographic coverage is dictated by the quality of data: no insurance data are available for New York at the ZCTA level. Spatial distributions show that both variables reach their highest levels along the coast and in densely populated metropolitan areas around New York City. Normalized activity and damage both follow a quasi log-normal distribution [see the histograms along the axes of the scatter plot in (A)]. A moderately strong positive correlation between postlandfall activity and damage is observed, especially for fine-resolution analysis [see inset tables in the scatter plots in (A) and (B) for exact statistics and P values]. Sentiment-versus-damage (S-D) analysis is underpowered at the ZCTA level (τ = −0.031, P = 0.29), but county-level analysis shows that negative sentiment correlates with damage (τ = −0.28, P = 0.018).

  • Fig. 5 Distribution of activity-damage correlations (Pearson correlation coefficients) across all disasters considered in the study.

    In terms of damage, disasters appear to group according to their type, with cost increasing from tornado storms, to floods, and eventually to hurricanes. The correlation between activity and damage is very strong for small-scale (low-cost) disasters; it then weakens and remains, on average, at the same level across moderate-cost to high-cost events.

  • Table 1 Activity-damage correlation (Kendall τ, Spearman ρ, and Pearson ρ) for additional events.

    Disasters are sorted in order of increasing strength of the Pearson correlation coefficient. All disasters demonstrate moderate to strong levels of statistically significant correlations (P < 0.05) [with the exception of Alaska floods (DR-4122)].

    Event ID  Type        Kendall τ  P              Spearman ρ  P              Pearson ρ  P
    DR4116    Floods      0.15       9.04 × 10^−5   0.21        1.87 × 10^−4   0.18       9.71 × 10^−4
    DR4117    Tornadoes   0.17       0.05           0.26        0.05           0.24       0.06
    DR4176    Tornadoes   0.18       8.92 × 10^−3   0.28        6.68 × 10^−3   0.27       9.60 × 10^−3
    Sandy     Hurricane   0.16       3.30 × 10^−13  0.24        5.04 × 10^−13  0.30       5.99 × 10^−20
    DR4145    Floods      0.33       3.54 × 10^−8   0.47        2.42 × 10^−8   0.45       1.08 × 10^−7
    DR4177    Floods      0.36       4.44 × 10^−4   0.52        2.33 × 10^−4   0.45       1.53 × 10^−3
    DR4175    Tornadoes   0.34       0.02           0.46        0.03           0.46       0.03
    DR4195    Floods      0.32       1.28 × 10^−8   0.47        3.35 × 10^−9   0.46       6.32 × 10^−9
    DR4174    Tornadoes   0.56       5.24 × 10^−3   0.69        6.07 × 10^−3   0.68       6.93 × 10^−3
    DR4157    Tornadoes   0.51       9.70 × 10^−4   0.71        2.38 × 10^−4   0.72       1.71 × 10^−4
    DR4168    Mudslide    0.44       0.04           0.59        0.03           0.86       1.84 × 10^−4
    DR4193    Earthquake  0.74       3.80 × 10^−5   0.90        7.50 × 10^−7   0.88       3.92 × 10^−6
    DR4122    Floods      1.00                      1.00                       1.00
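
The three coefficients reported in Table 1 can be illustrated with a small sketch. It is not the authors' code: the article does not say which software produced the statistics, the zone records below are invented purely for illustration, and the helper names (`pearson`, `average_ranks`, `spearman`, `kendall_tau_b`) are hypothetical. The sketch also assumes the tau-b variant of Kendall's coefficient, which corrects for ties; the table does not state which variant was used. Activity and damage are both normalized by population, following the per-capita definitions in the Fig. 3 caption.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall tau-b (assumed variant), counted over all pairs in O(n^2)."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: drops out of both factors
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied_in_x = concordant + discordant + only_y_tied
    untied_in_y = concordant + discordant + only_x_tied
    return (concordant - discordant) / sqrt(untied_in_x * untied_in_y)


# Invented records (NOT data from the article): message count, population,
# and damage in US dollars for five hypothetical areas.
zones = [
    {"messages": 40, "population": 8000, "damage_usd": 120000.0},
    {"messages": 15, "population": 5000, "damage_usd": 20000.0},
    {"messages": 90, "population": 12000, "damage_usd": 600000.0},
    {"messages": 5, "population": 4000, "damage_usd": 30000.0},
    {"messages": 60, "population": 15000, "damage_usd": 90000.0},
]

# Per-capita normalization of both variables.
activity = [z["messages"] / z["population"] for z in zones]
damage = [z["damage_usd"] / z["population"] for z in zones]

print("Kendall tau-b:", round(kendall_tau_b(activity, damage), 3))
print("Spearman rho: ", round(spearman(activity, damage), 3))
print("Pearson:      ", round(pearson(activity, damage), 3))
```

The two rank-based coefficients depend only on the ordering of areas, so they are unaffected by the heavy right tail that the Fig. 4 caption describes for activity and damage, whereas the Pearson coefficient is sensitive to it.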

Supplementary Materials

  • Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/2/3/e1500779/DC1

    Table S1. List of keywords included in the analysis, with their corresponding message counts.

    Table S2. Ranking of the keywords included in the analysis according to the strength of the correlation between distance and activity for East Coast cities.

    Table S3. Activity-damage correlations across keywords in order of decreasing strength.

    Table S4. Effect of normalization variable choice on the strength of activity-damage relationship (ZCTA resolution).

    Table S5. County-level estimates of damage: Modeling (Hazus-MH) and ex-post data on insurance and FEMA individual assistance grants.

    Table S6. Strength of activity-damage correlations for different damage estimates.

    Table S7. Predictive power of sentiment, analyzed at different spatial resolutions and normalized by either area Census population or local Twitter user count (“Twitter population”).

    Table S8. List of the disasters considered in the study, with a description of the damage data available for analysis.

    Table S9. Effect of the activity threshold filter on the strength of the relationship between Twitter activity and damage.

    Table S10. Mutual correlations between sentiment metrics at the level of individual messages.

    Table S11. Top-ranking words by frequency of occurrence in positive and negative messages.

    Table S12. Sentiment as a predictor of damage: Comparison between metrics.

    Fig. S1. Normalized local activity on the topic as a function of distance to the hurricane path.

    Fig. S2. Originality of content, expressed through the fraction of retweets in the stream of messages.

    Fig. S3. Global popularity of local content.

    Fig. S4. Comparison of predictive capacity of activity and sentiment.

    Fig. S5. Comparison of activity-damage correlation strength for different precision levels of geo-location.

    Fig. S6. Average sentiment trends over time: Comparison between sentiment metrics.
