Research Article | DISASTER MANAGEMENT

# Rapid assessment of disaster damage using social media activity

Vol. 2, no. 3, e1500779

## Abstract

Could social media data aid in disaster response and damage assessment? Countries face both an increasing frequency and an increasing intensity of natural disasters resulting from climate change. During such events, citizens turn to social media platforms for disaster-related communication and information. Social media improves situational awareness, facilitates dissemination of emergency information, enables early warning systems, and helps coordinate relief efforts. In addition, the spatiotemporal distribution of disaster-related messages helps with the real-time monitoring and assessment of the disaster itself. We present a multiscale analysis of Twitter activity before, during, and after Hurricane Sandy. We examine the online response of 50 metropolitan areas of the United States and find a strong relationship between proximity to Sandy’s path and hurricane-related social media activity. We show that real and perceived threats, together with physical disaster effects, are directly observable through the intensity and composition of Twitter’s message stream. We demonstrate that per-capita Twitter activity strongly correlates with the per-capita economic damage inflicted by the hurricane. We verify our findings for a wide range of disasters and suggest that massive online social networks can be used for rapid assessment of damage caused by a large-scale disaster.

Keywords
• Social Media
• Social Networks
• Disaster Management

## INTRODUCTION

Natural disasters are costly. They are costly in terms of property, political stability, and lives lost (1–3). Unfortunately, as a result of climate change, natural disasters, such as hurricanes, floods, and tornadoes, are also likely to become more common, more intense, and subsequently more costly in the future (4–7). Developing rapid response tools that are designed to aid in adapting to these forthcoming changes is critical (8).

As society faces this need, the use of social media on platforms like Facebook and Twitter is on the rise. Unlike traditional media, these platforms enable data collection on an unprecedented scale, documenting public reaction to events unfolding in both virtual and physical worlds. This makes social media platforms attractive large-scale laboratories for social science research (9–11). Opportunities provided by social media are used in various domains, including the economic (12), political (13–16), and social (14, 17–21) sciences, as well as in public health (22, 23).

Because of the potential of social media, the use of massive online social networks in disaster management has attracted significant public and research interest (24–26). In particular, the microblogging platform Twitter has been especially useful during emergency events (27–29). Twitter allows its users to share short 140-character messages and to follow public messages from any other registered user. Such openness leads to a network topology characterized by a large number of accounts followed by an average user, placing Twitter somewhere in between a purely social network and a purely informational network (30). The information network properties of Twitter facilitate and accelerate the global spread of information; its social network properties ease access to geographically and personally relevant information, and the message length limit encourages informative exchange. These factors combine to make Twitter especially well suited for a fast-paced emergency environment.

Existing research on the use of Twitter in an emergency context is manifold. Researchers study platform-specific features (retweets and private messages) of emergency information diffusion (31, 32), the role of the service in gathering and disseminating news (33, 34), its contribution to situational awareness (35, 36), and the adoption of social media by formal respondents to serve public demand for crisis-related information (37, 38). Another branch focuses on the practical aspects of classifying disaster messages, detecting events, and identifying messages from crisis regions (39–43). Others use Twitter’s network properties to devise sensor techniques for early awareness (44), to gauge the dynamics of societal response (45, 46), and to crowdsource relief efforts (47).

More recently, researchers have begun using social media platforms to derive information about disaster events themselves. For instance, the number of photographs uploaded to Flickr was shown to correlate strongly with physical variables that characterize natural disasters (atmospheric pressure during Hurricane Sandy) (48). Although it is unclear what causes the link (external information, network effects, or direct observer effects), the correlation suggests that digital traces of a disaster can help measure its strength or impact. On the basis of a similar concept, other studies verify the link between the spatiotemporal distribution of tweets and the physical extent of floods (49) and the link between the prevalence of disaster-related tweets and the distribution of Hurricane Sandy damage predicted from modeling (50).

Here, we present a hierarchical multiscale analysis of disaster-related Twitter activity. We start at the national level and progressively use a finer spatial resolution of counties and zip code tabulation areas (ZCTAs). First, we examine how geographical and sociocultural differences across the United States manifest through Twitter activity during a large-scale natural disaster (that is, Hurricane Sandy). We investigate the response of cities to the hurricane and identify general features of disaster-related behavior at the community level. Second, we study the distribution of geo-located messages at the state level within the two most affected states (New Jersey and New York) and, for the first time, analyze the relationship between Twitter activity and the ex-post assessment of damage inflicted by the hurricane. We verify the external validity of our findings across 12 other disaster events.

## DISCUSSION

We found that Twitter activity during a large-scale natural disaster—in this instance Hurricane Sandy—is related to the proximity of the region to the path of the hurricane. Activity drops as the distance from the hurricane increases; after a distance of approximately 1200 to 1500 km, the influence of proximity disappears. High-level analysis of the composition of the message stream reveals additional findings. Geo-enriched data (with location of tweets inferred from users’ profiles) show that the areas close to the disaster generate more original content, characterized by a lower fraction of retweets. This extends the previous understanding of retweeting behavior in crisis (31, 32) and confirms other studies (41). Finally, we find that messages from disaster regions generate more interest globally, with a higher normalized count of retweet sources.
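The proximity measure discussed above can be illustrated with a short sketch. It assumes the hurricane path is available as a list of (latitude, longitude) track points and approximates the distance to the path by the great-circle distance to the nearest track point; the function names and the track representation are illustrative assumptions, not the procedure used in the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_path_km(lat, lon, track):
    """Distance from a location to the nearest track point (hypothetical helper).

    This is an approximation: with sparse track points the true distance
    to the path between two points can be smaller.
    """
    if not track:
        raise ValueError("track must contain at least one point")
    return min(haversine_km(lat, lon, t_lat, t_lon) for t_lat, t_lon in track)
```

Plotting normalized activity per city against `distance_to_path_km` would then give a curve of the kind summarized in fig. S1.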

In the first study of its kind, based on the actual ex-post damage assessments, we demonstrated that the per-capita number of Twitter messages corresponds directly to disaster-inflicted monetary damage. The correlation is especially pronounced for persistent postdisaster activity and is weakest at the peak of the disaster. We established that per-capita activity and per-capita damage both have an approximately log-normal distribution and that the Pearson correlation coefficient between the two can reach 0.6 for a carefully selected observation period in the aftermath of the landfall. This makes social media a viable platform for preliminary rapid damage assessment in the chaotic time immediately after a disaster. Our results suggest that, during a disaster, officials should pay attention to normalized activity levels, rates of original content creation, and rates of content rebroadcast to identify the hardest hit areas in real time. Immediately after a disaster, they should focus on persistence in activity levels to assess which areas are likely to need the most assistance.
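A minimal sketch of the activity-damage comparison follows. It assumes each area (ZCTA or county) is a record with hypothetical keys `tweets`, `damage_usd`, and `population`, and it correlates the base-10 logarithms of the per-capita values. The log transform is an assumption motivated by the approximately log-normal distributions noted above; the text does not state the exact transform used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def activity_damage_correlation(areas):
    """Correlate log per-capita activity with log per-capita damage.

    Areas with zero tweets, zero damage, or zero population are skipped,
    because the logarithm is undefined for them (an assumption of this sketch).
    """
    usable = [a for a in areas
              if a["tweets"] > 0 and a["damage_usd"] > 0 and a["population"] > 0]
    activity = [math.log10(a["tweets"] / a["population"]) for a in usable]
    damage = [math.log10(a["damage_usd"] / a["population"]) for a in usable]
    return pearson(activity, damage)
```

Restricting the tweet counts to a chosen observation window before calling `activity_damage_correlation` would allow the strength of the relationship to be compared across periods.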

We tested the sensitivity of our technique to variations in normalization strategies (Census population estimates versus Twitter user count), the volume and quality of underlying geo-coded data (natively geo-coded versus geo-enriched), and the methodology of damage assessment (multihazard modeling versus ex-post assessment). We also minimized potential intervening effects of media coverage by excluding tweets from media accounts, together with the associated retweets, and by filtering all messages using several activity thresholds (see table S9). Our results hold qualitatively in every case; the relationship is strongest when we use native Twitter data, reliable Census population estimates, and comprehensive ex-post damage estimates.

The role of proximity as the primary factor explaining activity suggests that individuals realistically assess danger based on personal experiences (66) and that their level of interest is moderated accordingly. The cutoff in the activity-distance relationship is on the same order of magnitude as the footprint of a large atmospheric system, indicating that once people feel safe where they are, the level of engagement is uniform and most likely depends on the intensity of media coverage. Activity within the zone of the disaster sharply rises with proximity to its epicenter, possibly attributable to a combination of factors, including heightened anxiety, sense of direct relevance, and observation of the associated effects (wind, precipitation, and physical damage). Our findings echo other studies, such as the study on the correlation of the number of Flickr photos tagged “#sandy” with atmospheric pressure over New Jersey, emphasizing that online activity increases with the intensity of the event (48). However, what is striking with all of the different factors that motivate people to tweet is that a simple normalized measure of this activity—per-capita number of messages—serves as an efficient assessment tool for measuring the physical damage caused by the disaster.

The method for the assessment of the damage distribution proposed here offers a range of advantages that complement traditional alternatives (modeling, postdisaster surveying, and collection of data from multiple institutions): fine spatial resolution, speed, low cost, and simplicity. For instance, damage forecasts issued by the FEMA Modeling Task Force rely on sophisticated multihazard modeling. Although these forecasts are timely (generated before or immediately after a disaster), their verification with aerial imagery and physical site inspections is resource- and time-consuming. Social media damage assessment provides an additional low-cost tool in the arsenal of authorities to expedite the allocation of relief funds. Fine spatial resolution and speed of preliminary assessment can also be used to inform stochastic optimization algorithms for the joint assessment and repair of complex infrastructures, like power systems (61). In the long term, the technique can be used to check the integrity of the damage assessment process itself, especially in light of protracted settlement time frames and allegations of irregularities that recently prompted a blanket review of all insurance claims by FEMA (67). In addition, for disasters that affect multiple jurisdictions (states), the method mitigates the issue of local differences in assessment practices.

The correlation that we observed is not uniformly definitive in its strength for all events, and care should be taken in the attempt to devise practical applications. Moreover, an indirect and potentially nonstationary relationship between social media signals and real-world phenomena, compounded by changing social norms in the use of particular online platforms, calls for caution in developing predictive tools based on Big Data analysis (68). However, we believe that the method can be fine-tuned and strengthened by combination with traditional approaches like multihazard modeling. More robust estimates of damage through other data sources—for instance, the inclusion of municipal losses and nonmonetary indicators such as statistics on power losses and emergency shelters (65)—may reinforce the relationship. Composite metrics that combine per-capita activity with other properties of Twitter’s message stream [for example, fraction of disaster-related tweets (50) and sentiment (provided that activity is high and the volume of data is sufficient for sentiment to be predictive)] may prove to be even more sensitive to damage. Moreover, data from other social media, such as Facebook, Instagram, and Flickr, may be included to complement Twitter activity.

Finally, with continued monitoring of social media over time, we can potentially devise disaster-specific predictive models once a sufficient number of events of similar nature are available for the analysis against social media data. More broadly, our study suggests that the distribution of per-capita online activity on a specific topic has the potential to describe and quantify other natural, economic, or cultural phenomena.

## MATERIALS AND METHODS

The raw data for Hurricane Sandy comprise two distinct sets of messages. We obtained the data sets through the analytics company Topsy Labs. The first set consists of messages with the hashtag “#sandy” posted between 15 October and 12 November 2012. The data include the text of the messages and a range of additional information, such as message identifiers, user identifiers, follower counts, retweet statuses, self-reported or automatically detected location, time stamps, and sentiment scores. The second data set has a similar structure and was collected within the same time frame; however, instead of a hashtag, it includes all messages that contain one or more instances of specific keywords that are considered to be relevant to the event and its consequences (“sandy,” “hurricane,” “storm,” “superstorm,” “flooding,” “blackout,” “gas,” “power,” “weather,” “climate,” etc.; see table S1 for the full list). In total, for Hurricane Sandy, we have 52.55 million messages from 13.75 million unique users.
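The keyword selection described above can be sketched as follows. The list contains only the keywords quoted in the text (table S1 holds the full list), and the matching rule (case-insensitive, whole words, optional leading `#`) is an assumption of this sketch; how the data provider actually matched keywords is not stated.

```python
import re

# Keywords quoted in the text; the full list is in table S1.
KEYWORDS = ["sandy", "hurricane", "storm", "superstorm", "flooding",
            "blackout", "gas", "power", "weather", "climate"]

# Whole-word, case-insensitive match. The optional '#' lets hashtags
# such as "#sandy" count as the keyword "sandy".
_PATTERN = re.compile(
    r"(?<!\w)#?(" + "|".join(re.escape(k) for k in KEYWORDS) + r")(?!\w)",
    re.IGNORECASE,
)


def matched_keywords(text):
    """Return the sorted, lowercased set of keywords found in a message."""
    return sorted({m.group(1).lower() for m in _PATTERN.finditer(text)})


def is_relevant(text):
    """A message is kept if it contains at least one keyword."""
    return bool(matched_keywords(text))
```

Whole-word matching keeps "gasoline" from counting as "gas"; a substring rule would be more inclusive but noisier.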

Data for the additional disasters were obtained in two ways. For the disasters that occurred during 2013, the data were purchased from Gnip, a Twitter subsidiary data reseller. For each disaster, we used the geographic boundary of the affected region and collected all messages that contained a preselected set of keywords (“storm,” “rain,” “flood,” “wind,” “tornado,” “mudslide,” “landslide,” “quake,” “fema”). Data for the events from 2014 are extracted from continuously collected geo-tagged tweets from the United States via Twitter’s Streaming Application Programming Interface (API).

Data sets obtained from data providers (Topsy and Gnip) are subsets of the full historical data (“high fidelity”). The Streaming API offers almost complete coverage because only about 1 to 1.5% of all messages are geo-enabled and more than 90% of natively geo-coded messages are captured when a geographic boundary is used in a request (69).

### Tweets location data

Spatial analysis relies on the location information embedded in a message or otherwise inferred. Only a small fraction of messages [in our Hurricane Sandy data, about 1.2% for the hashtag data set and 1.5% for the keywords data set (775,000 messages in total)] are geo-tagged by Twitter. Moreover, if the message is a retweet, it carries no geographic information of its own but rather contains details of the source. For this reason, retweet studies usually rely on historic geo-enabled messages by a user to infer the location of other messages from the same user (32).

To expand the data available for spatial analysis and to enable analysis of the stream composition (fraction of retweets), we performed geo-enrichment of raw data. We parsed location strings from user profiles and assigned coordinates when a profile-listed location returned a match against the U.S. Census Bureau Topologically Integrated Geographic Encoding and Referencing (TIGER) database. In Hurricane Sandy data, this results in the geo-enriched set of 9.7 million tweets from 2.2 million unique accounts.
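The geo-enrichment step can be illustrated with a simplified sketch. The `GAZETTEER` dictionary is a hypothetical stand-in for a lookup table built from TIGER place names, with approximate illustrative coordinates, and the "City, ST" parsing rule is an assumption; the actual matching procedure is not detailed in the text.

```python
import re

# Hypothetical gazetteer: (place name, state abbreviation) -> (lat, lon).
# A real table would be built from the Census Bureau TIGER database.
GAZETTEER = {
    ("hoboken", "nj"): (40.744, -74.032),
    ("new york", "ny"): (40.713, -74.006),
}


def _normalize(part):
    """Lowercase, drop non-letters, and collapse runs of whitespace."""
    letters_only = re.sub(r"[^a-z\s]", "", part.lower())
    return re.sub(r"\s+", " ", letters_only).strip()


def geocode_profile(location):
    """Return (lat, lon) for a 'City, ST' profile string, or None if unmatched."""
    if not location:
        return None
    parts = [_normalize(p) for p in location.split(",")]
    if len(parts) != 2 or not all(parts):
        return None
    return GAZETTEER.get((parts[0], parts[1]))
```

Profiles that return `None` (free-text or ambiguous locations) are simply left out of the geo-enriched set in this sketch.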

National-scale analysis used geo-enriched data and would otherwise be impossible for many keywords because of the lack of data in some places. For damage analysis at the state level, we used both natively geo-coded messages (as a geographically more reliable set) and geo-enriched data to test the effects of enrichment on the strength of correlation.

### Filtering

Two potential issues may arise when collecting and analyzing geo-coded messages.

The first issue is an artificial clustering of messages at arbitrary virtual locations, which may happen with external applications like Foursquare. Foursquare check-in “@Frankenstorm Apocalypse” created several clusters of messages in Lower Manhattan, East River, and Bayside areas of New York City. Three of these clusters significantly skewed the local message count of corresponding ZCTAs.

The second issue is also associated with stationary clusters of messages—those produced by institutional accounts that issue or distribute weather forecasts and emergency warnings. Such accounts often operate automatically and publish frequent updates at regular intervals. In the course of the data collection period, they may produce tens or hundreds of messages, similarly inflating the local count of messages.

We rectified both issues by implementing filtering that detects clusters of colocated messages. We checked each cluster individually to ensure that it fell into one of the two categories mentioned above and discarded all corresponding messages if that was the case.
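One way to flag candidate clusters for the manual check described above is sketched below. Messages are assumed to be dictionaries with hypothetical keys `lat`, `lon`, and `user`; the `min_size` threshold and the coordinate rounding `precision` are illustrative parameters, not values taken from the study.

```python
from collections import Counter, defaultdict


def colocated_clusters(messages, min_size=20, precision=5):
    """Group messages by rounded coordinates; return groups of at least min_size."""
    groups = defaultdict(list)
    for msg in messages:
        key = (round(msg["lat"], precision), round(msg["lon"], precision))
        groups[key].append(msg)
    return {key: msgs for key, msgs in groups.items() if len(msgs) >= min_size}


def describe_cluster(msgs):
    """Summary to support manual review: one dominant account suggests an
    automated feed, many accounts at one point suggest a virtual check-in."""
    users = Counter(m["user"] for m in msgs)
    top_user, top_count = users.most_common(1)[0]
    return {
        "size": len(msgs),
        "distinct_users": len(users),
        "top_user": top_user,
        "top_user_share": top_count / len(msgs),
    }


def drop_clusters(messages, rejected_keys, precision=5):
    """Remove all messages located at coordinates rejected after review."""
    return [m for m in messages
            if (round(m["lat"], precision), round(m["lon"], precision)) not in rejected_keys]
```

The sketch only proposes candidates; the decision to discard a cluster stays with the reviewer, who passes the rejected coordinate keys to `drop_clusters`.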

### Sentiment

The objective of sentiment analysis is to assign a measure of emotion or mood expressed in the text and to classify the text accordingly as positive, negative, or neutral. Sentiment in Twitter has been studied and demonstrated to reflect temporal (19, 70) and geographical (71) mood variations. In the context of natural disasters, we previously observed (44) that sentiment is sensitive to large-scale disasters. Here, we aimed to investigate further whether the signal carried by sentiment was indicative of damage.

Sentiment analysis usually relies on a lexicon of words that are classified as positive or negative and analyzes the text for the frequency of the occurrence of such words. An output could be the rate of positive/negative terms or (if the lexicon assigns the strength to each word on a certain scale) an absolute total or word-count normalized score.
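A lexicon-based scorer of the kind just described can be sketched in a few lines. The `LEXICON` below is a tiny hypothetical example with weights on a −5 to +5 scale; it is not the lexicon of any tool named in this paper, and the tokenization is deliberately naive.

```python
import re

# Hypothetical weighted lexicon (word -> strength); illustrative only.
LEXICON = {
    "safe": 2, "thankful": 3, "love": 3,
    "scared": -2, "terrible": -3, "destroyed": -4,
}


def sentiment_scores(text):
    """Return (absolute total, word-count normalized score, polarity).

    polarity is +1, -1, or 0 for positive, negative, or neutral text.
    """
    words = re.findall(r"[a-z']+", text.lower())
    total = sum(LEXICON.get(w, 0) for w in words)
    relative = total / len(words) if words else 0.0
    polarity = (total > 0) - (total < 0)
    return total, relative, polarity
```

The normalized score divides by message length, so a short message with one strong term scores higher in magnitude than a long message containing the same term.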

Our raw data obtained from the data provider Topsy have sentiment scores assigned to every message. The algorithm of classification is proprietary, and the lexicon is unavailable for open access. Three versions of the score are provided: total absolute score, word-count normalized score (relative score), and trinary classification (+1, −1, and 0). From the distribution of scores in the data set, we concluded that the method was lexicon-based, with the weights of dictionary words falling within the range from −5 to +5. Because Topsy’s algorithm is proprietary, we verified it with two alternative methods: Linguistic Inquiry and Word Count (LIWC), which is a frequency-based tool with unweighted lexicon that is widely used in psychological research (72), and SentiStrength (73), which uses a weighted lexicon and takes into account certain features that are prevalent in short messages (emoticons, standard abbreviations, slang, “booster,” and negation lexical constructs).

We find that, at the level of individual messages, all metrics are correlated: strongly correlated for Topsy versus LIWC, moderately correlated for LIWC versus SentiStrength, and somewhat weakly correlated for Topsy versus SentiStrength (see table S10 for correlation coefficients). The temporal trends in the average sentiment of messages aggregated hourly for all three metrics closely follow each other (see fig. S6), suggesting that all classification techniques are comparable and robust, especially in aggregate analysis. We also separately analyzed the frequencies of the words that were most prevalent among positive and negative messages (see table S11). Our selection keywords, such as “hurricane,” “power,” and “storm,” are featured equally in both groups. However, apart from these terms, the rest of the top-ranking positive words are clearly positive emotion terms; for the negative group, they are negative emotions, profanities, and event-related words (“emergency”).

Given that all metrics perform adequately, we use native Topsy sentiment because it returns the highest statistically significant sentiment-damage correlation (see table S12). We used the relative score, which reflects both the polarity and the strength of emotions in a principled manner, taking into account the length of a message. The mean sentiment score was calculated for all messages within a particular area of interest (ZCTA or county), and the distribution of sentiment was analyzed against the distribution of damage.

## SUPPLEMENTARY MATERIALS

Table S1. List of keywords included in the analysis, with their corresponding message counts.

Table S2. Ranking of the keywords included in the analysis according to the strength of the correlation between distance and activity for East Coast cities.

Table S3. Activity-damage correlations across keywords in order of decreasing strength.

Table S4. Effect of normalization variable choice on the strength of activity-damage relationship (ZCTA resolution).

Table S5. County-level estimates of damage: Modeling (Hazus-MH) and ex-post data on insurance and FEMA individual assistance grants.

Table S6. Strength of activity-damage correlations for different damage estimates.

Table S7. Predictive power of sentiment, analyzed at different spatial resolutions and normalized by either area Census population or local Twitter user count (“Twitter population”).

Table S8. List of the disasters considered in the study, with a description of the damage data available for analysis.

Table S9. Effect of the activity threshold filter on the strength of the relationship between Twitter activity and damage.

Table S10. Mutual correlations between sentiment metrics at the level of individual messages.

Table S11. Top-ranking words by frequency of occurrence in positive and negative messages.

Table S12. Sentiment as a predictor of damage: Comparison between metrics.

Fig. S1. Normalized local activity on the topic as a function of distance to the hurricane path.

Fig. S2. Originality of content, expressed through the fraction of retweets in the stream of messages.

Fig. S3. Global popularity of local content.

Fig. S4. Comparison of predictive capacity of activity and sentiment.

Fig. S5. Comparison of activity-damage correlation strength for different precision levels of geo-location.

Fig. S6. Average sentiment trends over time: Comparison between sentiment metrics.

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.

## REFERENCES AND NOTES

Acknowledgments: We thank C. O’Dea (NJ Spotlight; www.njspotlight.com) for helpful suggestions and advice on obtaining the data for Hurricane Sandy damage, as well as FEMA, the New Jersey State Department of Banking and Insurance, and the New York State Department of Financial Services for providing the data. We thank the anonymous reviewers for their valuable suggestions. Funding: Y.K., H.C., P.V.H., and M.C. were supported by the Australian Government as represented by the Department of Broadband, Communications, and Digital Economy and the Australian Research Council through the ICT Centre of Excellence program. E.M. received support from the Spanish Ministry of Science and Technology (grant FIS2013-47532-C3-3-P). N.O. was supported by the National Science Foundation (grants DGE0707423 and 1424091). M.C. received support from the Army Research Laboratory (cooperative agreement numbers W911NF-09-2-0053 and W911NF-11-1-0363), National Science Foundation (grant 0905645), and DARPA/Lockheed Martin Guard Dog Programme (grant PO 4100149822). Author contributions: The authors contributed equally to the design of the study, collection and analysis of the data, and preparation of the manuscript. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors. The data for this study are publicly available from the DRYAD repository via http://dx.doi.org/10.5061/dryad.15fv2.