Review | SOCIAL SCIENCES

The science of contemporary street protest: New efforts in the United States


Science Advances 23 Oct 2019:
Vol. 5, no. 10, eaaw5461
DOI: 10.1126/sciadv.aaw5461

Abstract

Since the inauguration of Donald Trump, there has been substantial and ongoing protest against the Administration. Street demonstrations are some of the most visible forms of opposition to the Administration and its policies. This article reviews the two most central methods for studying street protest on a large scale: building comprehensive event databases and conducting field surveys of participants at demonstrations. After discussing the broader development of these methods, this article provides a detailed assessment of recent and ongoing projects studying the current wave of contention. Recommendations are offered to meet major challenges, including making data publicly available in near real time, increasing the validity and reliability of event data, expanding the scope of crowd surveys, and integrating ongoing projects in a meaningful way by building new research infrastructure.

Since Donald Trump was elected president of the United States on 8 November 2016, the United States has seen an outpouring of protest. Millions of Americans have marched and rallied in a number of massive, multilocation protests, including the Women’s March (2017, 2018, and 2019), the March for Science (2017, 2018, and 2019), the March for Our Lives (2018), Families Belong Together (2018), the National Student Walkout (2018), and the Global Climate Strikes (2019). Simultaneous activism in multiple locations has been a component of protest at least since the globalization movement (1) and has been a consistent characteristic of this “major cycle of contention” against the Trump Administration since it took office in January 2017 (2). At the same time, although in far fewer numbers, supporters of the president and demonstrators protesting issues unrelated to the current administration have rallied and marched as well (3).

In a time of sharp political polarization, protest is a notable way that citizens attempt to communicate their views on key issues. Protest is partly a response to citizens’ concerns that they are not being represented well by governmental institutions. As a result, it is important to understand the nature of, and messages conveyed by, protests. Who is protesting? How often? What messages are they trying to send? How do protests connect (if at all) to other political activities? Just as there is a longstanding science to measuring public opinion, so too there is a continually developing science to study protest. Unlike public opinion surveys, which convey whether people support an idea or candidate, protest helps to signal the strength of opinion on a topic and to clarify more precisely what citizens care about (4, 5). This paper provides an overview of this area of research and then discusses two central methods of studying street protest that have been notably active in recent years.

Prior research has made extensive contributions to understanding the ways that protest has influenced government and politics. Scholars have demonstrated the social value of protest in a wide range of arenas, such as struggles by minority groups for procedural rights and substantive justice (6, 7), the fight for democracy against authoritarian governments (8, 9), and transnational efforts to end war and militarism (10, 11), as well as reactionary movements attempting to block progressive change (12, 13). Studies in this area have illuminated the functions and dysfunctions of political systems on topics that include the role of mainstream media institutions (14), the evolving place of new media in grassroots mobilization (15, 16), the coevolution between political parties and social movements (17), the role of protests in shaping people’s biographies (18, 19), the construction of transnational advocacy networks (20, 21), and the impact of protest on public policy outcomes (22).

The expansion of street protests during the current political moment coincides with new collaborative efforts to study protest scientifically on a larger scale. Among the more prominent of these efforts are internet-assisted projects to count and categorize sprawling protest events, as well as coordinated surveys at protests distributed across space and time, including the use of computerized technologies to assist in data collection. These research trajectories present novel opportunities to investigate the linkages between protests and varied local contexts, the real-time diffusion of protest tactics across space, the connections between different social movements active during the same period, and other topics that were harder to investigate when data were more limited. In light of these opportunities, this article reviews past and ongoing scholarship on event counting and crowd surveys of street protests with an eye toward understanding the future directions and best practices for this area of research.

It is important to highlight that neither our focus on street protests nor our attention to event counting and crowd surveys is meant to imply that these topics and methods are the only approaches to studying protest. On the contrary, the study of protest and social movements is a vibrant, interdisciplinary field that embraces a great diversity of methodological approaches. For example, there is a rich literature that examines the relationship between the internet and social activism (16, 23, 24) that is not examined in our discussion of street protest. However, given the centrality of street protests to social movements, and of these methods to past and present scholarship, this article aims to build on nascent projects that coordinate work on street protest.

This article proceeds by detailing two efforts that document as many protest events reported in the United States as possible and share those data with the general public and researchers, in close to real time: Count Love and the Crowd Counting Consortium (CCC). Next, we discuss projects that have surveyed protesters in the streets at the rallies themselves. Although all these efforts build upon the study of street protest before the 2016 election, this article focuses specifically on research being conducted since 2016. We conclude with a discussion of how these new efforts lead to new and substantive methodological insights.

EVENT COUNTING

Protest event data have been central to the study of social movements and related areas of political science and sociology since at least the 1960s. Social movement scholars, primarily in sociology, have focused much of their efforts on media accounts of protest events to understand social movement dynamics, including the origins, workings, and consequences of protest (5, 25, 26). Political scientists, in contrast, have collected protest event data as part of larger efforts to measure and forecast political conflict (27).

Motivated by the efforts of social historians, scholars began compiling extensive “event catalogs” of crowds, demonstrations, strikes, riots, and related phenomena (28). As a methodological and theoretical innovation, the practice of building event databases to document the occurrence and characteristics of protests has enabled scholars to examine a wide range of questions. These methods are central to scholarship using event data, as well as to research on contentious politics more broadly.

Event databases provide standardized coding of basic features of events including the who, what, when, where, and why of events (29). With these data in hand, scholars can trace the rise and fall of movements, shifts in goals or tactics, and the geographic patterning of protest. Event data have also been used to describe, explain, and forecast political violence, from urban rioting to the outbreak of civil war. Typically, researchers combine event data with other temporal and spatial data to answer questions about the links between collective action and political and social institutions.
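A coded protest event can be sketched as a small record with standardized who, what, when, where, and why fields. Counting such records by month or by state then yields the simplest versions of the time trends and geographic patterns described above. This is an illustrative sketch only. The field names and example values are hypothetical and do not reproduce the schema of any particular database.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ProtestEvent:
    """One coded event (hypothetical field names)."""
    actor: str   # who: organizing group or participants
    action: str  # what: e.g., "march", "rally", "vigil"
    day: date    # when
    state: str   # where, coarsened here to a state label
    claim: str   # why: the issue or grievance


def events_by_month(events):
    """Count events per (year, month) to trace the rise and fall of activity."""
    return Counter((e.day.year, e.day.month) for e in events)


def events_by_state(events):
    """Count events per state to describe the geographic patterning of protest."""
    return Counter(e.state for e in events)
```

In practice, researchers would join these counts to other temporal or spatial data, such as population or election results, before modeling.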

The earliest event databases were compiled to document strike activity starting in the late 19th century (30). These methods were adopted much more widely beginning in the 1960s, coinciding with increased scholarly interest in the politics of protest. Early and influential projects include Tilly’s research on contentious politics in Europe [e.g., (31)], out of which he developed his larger theoretical arguments with colleagues about the connection between social movements, democratization, and the nation-state (28, 32). Efforts to explain the onset and significance of the 1960s urban riots in the United States also spurred the early development of these methods (33, 34). In the 1970s, Charles Perrow launched comparative projects on the movements of the 1960s that would help to consolidate the resource mobilization and political process approaches to social movements (6, 35).

In the 1980s and 1990s, scholars began more extensive efforts to build cross-national and cross-movement event data projects (36) and to automate the process (37). Led by McAdam and colleagues, the Dynamics of Collective Action (DoCA) Project extended the hand-coding tradition to develop data on all protest and collective action in the United States reported in the New York Times from 1960 to 1995 (38). While many prior studies relied on indices and focused on specific movements, the DoCA study involved full-text reading of newspapers over the entire period for all events. Meanwhile, scholars generated protest event databases for numerous cases outside the United States, such as the four-nation study of new social movements by Kriesi and his collaborators (36) and Ron Francisco’s Protest and Coercion project, which studies events in 28 European countries from 1980 to 1995 (39).

More recently, Kriesi and colleagues have been collecting protest event data from media reports on 30 European countries before, during, and after the Great Recession (40). Salehyan and colleagues collected data on social protest in Africa and Latin America from 1990 to 2016 (41). These datasets produce event data on particular world regions or on particular types of regimes (for details, see www.eui.eu/Projects/POLCON). Similarly, Clark and Regan have released the Mass Mobilization Project, which involves hand coding of numerous news sources and covers protest events involving at least 50 observed participants in 162 countries (excluding the United States) between 1990 and 2014 (42). The Mass Mobilization in Autocracies data project collects data on reported protests in autocratic countries (27), and the Nonviolent and Violent Campaigns and Outcomes Dataset (version 3) includes hand-coded data on contentious events in 21 countries from 1991 to 2012 (43). No datasets exist that provide consistent coverage of all protest events in the United States after 1995, although some have begun developing data on specific types of protests, such as black protest events (44).

Several major studies began to investigate the methodological biases associated with newspapers by comparing media coverage with official permit records for events (33, 34, 45). This work showed significant coverage biases associated with event size and proximity to media sources. Other potential biases in the descriptions of events have also been identified. For instance, data collection efforts that rely only on English-language news sources risk underreporting events covered mainly in other languages.
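The permit-record comparison described above can be sketched as computing, for each size class, the share of permitted events that also appear in newspaper coverage. This sketch assumes that permits have already been matched to news reports, which is the hard part in practice. The size-class breakpoints and data layout are hypothetical.

```python
from collections import defaultdict


def coverage_by_size(permits, covered_ids, bins=(100, 1000, 10000)):
    """Share of permitted events that also appear in newspaper coverage, by size class.

    permits: list of (permit_id, expected_size).
    covered_ids: set of permit ids already matched to at least one news report.
    Size class 0 holds the smallest events; each breakpoint in `bins` starts a new class.
    """
    counts = defaultdict(lambda: [0, 0])  # size class -> [covered, total]
    for permit_id, size in permits:
        size_class = sum(size >= b for b in bins)
        counts[size_class][1] += 1
        if permit_id in covered_ids:
            counts[size_class][0] += 1
    return {c: covered / total for c, (covered, total) in sorted(counts.items())}
```

A coverage rate that rises with size class is the pattern the studies above describe as size-related coverage bias.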

A second tradition, based primarily in political science, uses media data to map a broader set of political events. One early project was McClelland’s WEIS (World Event/Interaction Survey) Project (46), which coded a diverse set of political, diplomatic, and military actions to study conflict among states and other actors, such as nongovernmental organizations, between 1966 and 1978, as reported in the New York Times. Scholars such as Schrodt began to use natural language–processing tools to code larger text corpora for information on interstate conflict events more rapidly (37). One prominent pioneering example of this tradition was the Kansas Event Data System (KEDS), which parsed Reuters news summaries for political conflict data. In this style of data collection, computerized natural language methods parse the text using actor and verb dictionaries to produce automated event counts. For example, KEDS was used to produce a 12-year time series of the Arab-Israeli conflict, while other systems were used for updates of the World Handbook of Political Indicators associated with Jenkins (47, 48). Contemporary work in this tradition includes the Global Database of Events, Language, and Tone (GDELT), which produces counts of daily events based on scraping a large media corpus using a proprietary system. Both the earlier KEDS project and the more recent GDELT rely on Schrodt’s CAMEO (Conflict and Mediation Event Observations) coding scheme, which contains more than 200 different event types (49). The Integrated Crisis Early Warning System similarly uses automated machine coding to collect data on different event types (50). Because these datasets are not specific to studying protests, their taxonomies and accompanying details lack sufficient granularity for most protest research.
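The actor/verb dictionary approach can be sketched as follows. A sentence is coded only when a known verb phrase is found with a known actor before it (the source) and after it (the target). This is a toy illustration. The dictionaries and category labels below are hypothetical placeholders, not CAMEO codes, and production systems such as KEDS rely on much larger, curated dictionaries and real parsing.

```python
# Hypothetical dictionaries: actor name -> actor code, verb phrase -> event category.
ACTORS = {
    "students": "CIV",
    "police": "GOV-POL",
    "government": "GOV",
}
VERBS = {
    "marched against": "PROTEST",
    "rallied against": "PROTEST",
    "arrested": "COERCE",
    "met with": "CONSULT",
}


def code_sentence(sentence):
    """Return (source, event, target) for an 'ACTOR VERB ACTOR' sentence, else None."""
    text = sentence.lower()
    for verb, event in VERBS.items():
        idx = text.find(verb)
        if idx == -1:
            continue
        before, after = text[:idx], text[idx + len(verb):]
        source = next((code for name, code in ACTORS.items() if name in before), None)
        target = next((code for name, code in ACTORS.items() if name in after), None)
        if source and target:
            return (source, event, target)
    return None
```

A sentence such as "The stock market rallied on Friday." matches no verb phrase and returns `None`. This illustrates why coding by phrase is safer than coding by a bare keyword like "rally".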

Despite advances in machine learning over the decades, automated collection of political event data has not achieved a high degree of reliability or moved beyond English-language sources. While some large-scale automated efforts exist, such as the KEDS descendant PETRARCH, many existing systems have adopted a hybrid technique. For example, the Machine-Learning Protest Event Data System involves two rounds of human coding, followed by machine learning and forecasting (51). Notably, whereas dictionary-based systems rely on humans to identify the verbs and nouns associated with events, machine-learning systems infer protest event characteristics from their similarity to a training set of human-coded articles. Currently, all studies that rely on automated machine coding use English-language sources because of the need to construct dictionaries and create machine-readable text using a consistent linguistic pattern.
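The idea of inferring labels from similarity to human-coded articles can be illustrated with a k-nearest-neighbor classifier over bag-of-words vectors. This is a deliberately simple stand-in. The systems cited above use their own, more elaborate models, and the labels and example texts here are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Token counts for a lowercased text (punctuation ignored)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def predict_label(article, coded_articles, k=3):
    """Majority label among the k human-coded (text, label) pairs most similar to `article`."""
    query = bag_of_words(article)
    ranked = sorted(coded_articles,
                    key=lambda item: cosine(query, bag_of_words(item[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

The quality of such a classifier depends heavily on the size and coverage of the human-coded training set. This is one reason hybrid systems keep humans in the loop.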

More recently, scholars have begun using digital trace data from social media to estimate offline protest activity. Steinert-Threlkeld built a list of protest events and a network of protest participants using event data from the Integrated Crisis Early Warning System and individual communications data from Twitter during the Arab Spring protests in 2010 and 2011 to study differences in influence between well-connected actors and those at the edges of a social network [as discussed in (52)]. Similarly, Alanyali and colleagues (53) used tags on Flickr images to estimate global protest trends in 2013. This approach, while sharing some of the previously documented coverage challenges for event counting, provides an orthogonal technique for estimating attendee counts and for capturing the heterogeneity of individual participants. Its event coverage is also limited, however, because protesters use a variety of public and private social media platforms, and which ones they use varies partly with local context.

Last, Beyerlein and colleagues have developed an innovative strategy for documenting protest events that has important implications for efforts to build event databases (52). They use hypernetwork sampling, conducting detailed surveys with respondents who have attended a protest in the previous 6 months. Because it builds on nationally representative survey data, this approach can provide a more representative profile of protest than can be obtained from media reports, which are subject to standard sources of coverage bias. Beyerlein and colleagues find substantial differences in issues and event characteristics between their nationally representative profile and the events reported in the Chicago Tribune, Los Angeles Times, New York Times, and Washington Post during the period covered by their survey reports (52). Similarly, scholars could compare hypernetwork data on protest events with the more comprehensive searches or crowdsourcing methods that we discuss below to identify potential sources of selection bias.
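Hypernetwork samples reach events through their attendees, so an event with n attendees is roughly n times as likely to be reported by a randomly sampled respondent as an event with one attendee. A common correction weights each reported event by 1/n when estimating the share of events with some characteristic. The source does not describe the weighting used by Beyerlein and colleagues, so the sketch below is an assumption about standard practice, with hypothetical data.

```python
def weighted_share(reported_events, has_feature):
    """Estimate the share of events with a feature, correcting for size-biased sampling.

    reported_events: list of (attendance, attributes) pairs, one per respondent report.
    Each report is weighted by 1/attendance, so large events do not dominate the estimate.
    """
    total = 0.0
    hits = 0.0
    for size, attrs in reported_events:
        weight = 1.0 / size
        total += weight
        if has_feature(attrs):
            hits += weight
    return hits / total
```

Without the weights, one report of a 1,000-person march counts the same as one report of a 10-person vigil. As a result, an unweighted profile overstates how typical large events are.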

Given this long history, projects have varied on key dimensions, including the kinds of events being documented (e.g., peaceful demonstrations or riots). However, these studies share common features, including reliance on one or a small number of sources, standardized search processes for events, systematic coding schemes for reports, and teams of researchers. As we document below, contemporary event databases build on key features of this tradition. Electronic access to numerous sources allows for far greater coverage, as well as the ability to keep data relatively up to date and accessible to researchers and the public alike. At the same time, this opportunity presents new challenges that scholars are beginning to grapple with, including the difficulty of assessing source bias and the duplication of coverage when multiple sources report the same event.

Two contemporary event-counting projects

Tracking protest events in real time is fundamentally a discovery and coding problem. It resembles the data collection components of past efforts to study protest by aggregating data from third-party sources (51, 54). Unique to today’s environment are the sheer number of sources and the limited window for discovery and review: Thousands of sources produce reports of variable reliability every day, and online information is far more transient than print media. Researchers must archive each report and extract information such as where, when, and why a protest took place, as well as how many people attended, before that content is moved behind a paywall, deleted, or otherwise made unavailable.

Current event-coding projects adopt a hybrid approach, in which researchers use automated machine-coding techniques to capture the incidence of events and hand coding to ensure accuracy in coding the different dimensions of those events. Here, we discuss two projects that have combined these methods since Trump’s inauguration: Count Love and the CCC. These projects, which collaborate with one another, have aggregated protest data from news articles, social media posts, advocacy groups’ announcements, and attendee submissions and shared them in near real time. This section details the data collection processes that the two projects use to track protest events as they occur, followed by a discussion of challenges and suggestions for best practices related to source reliability, coding accuracy and reliability, and incomplete coverage of protest events and reports.

Count Love’s data collection process

To find reports and aggregate information about where, when, why, and how many people participated in a protest, Count Love maintains a list of local newspapers, radio stations, and television stations, including URLs to their home pages, as well as their news and metro subsections. Count Love initially compiled its list of sources by combining reports from the CCC’s Women’s March records with Wikipedia listings returned for “[state] newspapers” searches. As of 11 November 2018, Count Love’s source list contained 2816 URLs for organizations spanning all 50 states, and it continues to update its list with new sources found by cross-referencing protests announced on organizing sites with confirming news reports. On a nightly basis, Count Love’s web crawler programmatically visits each news source and downloads a copy of the article text for any link that contains the words “march,” “protest,” “demonstration,” or “rally.”
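The keyword step of the nightly crawl can be sketched with Python's standard-library HTML parser. The sketch collects links on a source's page and keeps those whose URL or anchor text contains one of the four keywords. This is not Count Love's crawler. Fetching, politeness delays, and article download are omitted, and the function and class names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

KEYWORDS = ("march", "protest", "demonstration", "rally")


class LinkCollector(HTMLParser):
    """Collect (absolute_url, anchor_text) pairs for every <a href=...> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            url = urljoin(self.base_url, self._href)
            self.links.append((url, "".join(self._text).strip()))
            self._href = None


def candidate_links(html, base_url):
    """Links whose URL or anchor text contains a protest keyword (substring match)."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return [url for url, text in parser.links
            if any(k in (url + " " + text).lower() for k in KEYWORDS)]
```

Substring matching is deliberately over-inclusive. For example, it also matches "March" the month and stock-market "rallies", which is why the downstream filtering and human review steps are needed.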

Not every news article that contains the word “protest,” “march,” “rally,” or “demonstration” describes a protest event. For example, some articles use the word “rally” to describe a rally in the stock market. To automate parts of the review process, after downloading articles, Count Love uses custom natural language–processing tools to flag irrelevant articles; group similar articles together; identify duplicate syndicated articles; annotate text that may describe a date, location, attendee count, or reason for protest; and populate those details with best guesses for human review. The natural language–processing and machine-learning tools used to automate these annotation and prediction steps are based on similarity hashing, word vectors trained on a global corpus, and long short-term memory recurrent neural networks [for details, see (55–57)].
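Similarity hashing can be illustrated with SimHash, a standard technique in which near-identical texts receive fingerprints that differ in only a few bits. Syndicated copies can then be flagged by a small Hamming distance. The source does not specify Count Love's hashing scheme, so this is a generic sketch. The 64-bit width, MD5 token hashing, and distance threshold are assumptions.

```python
import hashlib
import re


def simhash(text, bits=64):
    """SimHash fingerprint over lowercased word tokens."""
    weights = [0] * bits
    for token in re.findall(r"[a-z0-9']+", text.lower()):
        h = int.from_bytes(hashlib.md5(token.encode()).digest()[:8], "big")
        for i in range(bits):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def likely_syndicated_copy(text_a, text_b, max_distance=3):
    """Flag two articles as probable copies when their fingerprints nearly match."""
    return hamming(simhash(text_a), simhash(text_b)) <= max_distance
```

Fingerprints are computed once per article and compared cheaply. This matters when hundreds of new articles must be grouped every night.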

After the machine-annotation-and-prediction processes are completed, a researcher hand-reviews the final results, refining annotations as necessary and coding actual protest events. Researchers at Count Love track all article references to each documented protest and deduplicate new events by checking them against all previously documented events that occurred in geographic proximity on the same day. Given resource constraints, in most cases, only one researcher reviews each article. Count Love started tracking references to past protest events in news articles on 12 February 2017 and references to future protest events on 17 November 2017. At the end of each nightly review, Count Love publishes an updated list of protests online as a comma-separated values (CSV) file and as a set of searchable maps and charts on countlove.org.
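The same-day, nearby-location check can be sketched as a candidate filter that surfaces possible duplicates for a human reviewer rather than merging them automatically. The 25 km radius and the record layout are hypothetical choices. Great-circle distance is computed with the haversine formula.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class GeocodedEvent:
    event_id: str
    day: date
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers (mean Earth radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def possible_duplicates(new, existing, radius_km=25.0):
    """Ids of already-documented events on the same day within radius_km of `new`."""
    return [e.event_id for e in existing
            if e.day == new.day
            and haversine_km(new.lat, new.lon, e.lat, e.lon) <= radius_km]
```

Returning candidates instead of auto-merging reflects the text's emphasis on human review. Two marches in neighboring towns on the same day may be distinct events.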

Figure 1 summarizes the data collected from 20 January 2017 to 18 December 2018 as a population-based cartogram, where states are sized according to their population and colored on the basis of the number of protest participants per capita.

Fig. 1 Cartogram depicting protest activity by state.

States sized according to their population. Each state is colored based on the number of protest participants per capita; darker color indicates higher number of protesters per resident. Washington, DC, is an outlier: It has 1898 attendees per thousand residents. Based on data collected by Count Love from 20 January 2017 to 18 December 2018.
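The per-capita shading in Fig. 1 reduces to dividing each state's summed attendance by its population and assigning the result to a color class. The sketch below assumes hypothetical state labels, population figures, and class breakpoints. It does not reproduce Count Love's actual values.

```python
from bisect import bisect_right


def per_thousand(attendees, population):
    """Protest attendees per 1,000 residents for each state with a known population."""
    return {state: 1000 * attendees.get(state, 0) / pop
            for state, pop in population.items()}


def color_class(rate, breaks=(5, 10, 25, 50, 100)):
    """Map a rate to a color class index (0 = lightest) using hypothetical breakpoints."""
    return bisect_right(breaks, rate)
```

A rate above 1,000 per thousand, as reported for Washington, DC, means recorded attendance exceeds the resident population. This is possible because attendance is summed over many events and presumably includes visitors from other states.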

CCC’s data collection process

The CCC started somewhat improvisationally when two volunteer codirectors began recording data on crowd sizes at different Women’s Marches on 21 January 2017. Using a combination of news coverage and reports on social media, they crowdsourced a spreadsheet listing event locations, as well as low and high reported estimates of crowd sizes. The CCC has continued since the first Women’s March, relying on crowdsourced, publicly verifiable data on public protests to build a record of all protests taking place in the United States. The CCC’s codirectors collate and maintain the data, which they, along with numerous volunteers and several part-time paid research assistants, collect and update daily. The consortium approach allows the CCC to keep its data relatively current: Data for each month are reliably available within a few months of the events taking place. Although it started as a public-interest project meant to provide a reliable, impartial source of information regarding the occurrence and magnitude of political crowds in the United States, over time, data from the CCC have been used by scholars to analyze both the occurrence and effects of protests in the United States (58).

The events of interest for the CCC include protests, demonstrations, marches, rallies, sit-ins, strikes, vigils, and walkouts with at least one participant reported. The data exclude fundraisers, hearings, pep rallies, press conferences, regular meetings, and town halls. The listings come from search engines, social media (Facebook event pages and Twitter), organization websites (especially for major, multilocation events like the Women’s March or the March for Our Lives), and online news sites. Initially, Count Love shared its raw URLs with the CCC, whose researchers enter data into a live-updated spreadsheet as quickly as possible. Since 1 February 2019, Count Love has shared results that have also been subjected to human review. The CCC’s use of human coders to enter data provides a reliable means of deduplicating events, a common problem in machine-coded or automated data collection (e.g., GDELT), particularly when multiple sources report on the same event (59, 60). Notably, the spreadsheets are available for public viewing while data entry is taking place, and members of the public can anonymously submit records that have not yet been included in the tally, provided they include a publicly verifiable URL confirming the event.
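The CCC applies its scope rules by hand, but writing them as a checklist makes the criteria explicit. In the sketch below, a record is accepted only if its event type is in scope, it does not report zero participants, and it cites a web source. The function name is hypothetical. Treating an unknown participant count (`None`) as acceptable is our reading of "at least one participant reported", not a documented CCC rule.

```python
# Event types in scope, per the CCC description. Excluded types such as "pep rally",
# "town hall", or "press conference" fail because they are not in this set.
IN_SCOPE = {"protest", "demonstration", "march", "rally",
            "sit-in", "strike", "vigil", "walkout"}


def accept_record(event_type, participants, source_urls):
    """Checklist version of the inclusion rules for one submitted record.

    participants: reported count, or None when no crowd estimate is available.
    source_urls: publicly verifiable links confirming the event.
    """
    if event_type.strip().lower() not in IN_SCOPE:
        return False
    if participants is not None and participants < 1:
        return False
    return any(url.startswith(("http://", "https://")) for url in source_urls)
```

Exact matching on the type label is what separates "rally" (in scope) from "pep rally" (excluded).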

For each event, the CCC data list the date, town or city, location within that town or city, crowd size, organizing group, and contested issue. CCC notes if there were arrests, injuries, or property damage and includes a link to the source or sources of information. CCC participants have generally been able to find crowd estimates for about 70% of the events reported. It is worth noting that 95 to 99% of all protests are peaceful and arrest-free. The CCC codes each event as pro-Trump, anti-Trump, or neither.

For crowd size, CCC includes low and high counts if more than one estimate is available. For smaller protests where no crowd estimate is available, CCC counts attendees in a picture or video (if a picture or video is available on the publicly listed site). CCC lists events that have taken place and events that are planned for the near future but includes crowd size only after the fact, not the expected turnout.
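The low/high convention can be sketched as collapsing all reported estimates for one event into a range. A single estimate becomes a degenerate range, and an event with no estimate is left without a count rather than given a guessed value. The function name is hypothetical.

```python
def crowd_range(estimates):
    """Collapse reported crowd estimates for one event into (low, high).

    Returns None when no estimate is available, (n, n) for a single estimate,
    and (min, max) when several sources report different figures.
    """
    values = [n for n in estimates if n is not None]
    if not values:
        return None
    return (min(values), max(values))
```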

Some methods of crowd counting involve much more sophisticated visual and/or scientific methods than the ones the CCC uses to derive its participation counts. Smaller crowds (of up to 300) are easier to verify with headcounts or other direct-counting techniques. Larger crowds are often estimated using grid density procedures and various ways of visualizing crowds, such as aerial photography (61). When reports of such estimates are publicly available, the CCC includes them in the tally with the reference. In addition, drawing on multiple sources and allowing viewers to contest or revise existing crowd size estimates helps improve the accuracy of reported counts.
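Grid density procedures divide the occupied area into cells, assign each cell an estimated density (people per square meter) from photographs, and sum area times density. Giving each cell a low and a high density produces a range that parallels the CCC's low/high estimates. The cell areas and densities in this sketch are hypothetical inputs. Real estimates depend on careful reading of aerial or ground imagery.

```python
def grid_density_range(cells):
    """Low/high crowd estimate from cells of (area_m2, low_density, high_density)."""
    low = sum(area * lo for area, lo, _ in cells)
    high = sum(area * hi for area, _, hi in cells)
    return low, high
```

Because the estimate scales linearly with density, uncertainty about how tightly people are packed usually dominates uncertainty about the occupied area.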

While organizers may have incentives to overreport the number of participants who attend their events, state and local officials may have incentives to underreport it. Therefore, the CCC records low and high estimates of participants. These estimates often (but not always) correspond to official and organizers’ estimates, respectively. Discrepancies between the low and high participation counts can themselves be informative: Day and colleagues argued that major ambiguities or discrepancies in estimated numbers of protesters can reveal particularly controversial or contested political space (62). Many smaller events have only one crowd size estimate.
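One simple way to act on the point that discrepancies are informative is to flag events whose high estimate is a large multiple of the low estimate, then examine them as potentially contested. The ratio threshold below is an arbitrary, hypothetical choice. Events with a single estimate have a ratio of 1 and are never flagged.

```python
def flag_contested(events, ratio_threshold=3.0):
    """Ids of events whose high estimate is at least ratio_threshold times the low one.

    events: iterable of (event_id, low, high); events without a positive low are skipped.
    """
    return [eid for eid, low, high in events
            if low and high / low >= ratio_threshold]
```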

Once all reported data have been entered for a month, the codirectors review that month’s data as a whole. Total event, participant, arrest, and injury tallies are produced for each month, and the event listings are finalized and made publicly available at crowdcounting.org. The CCC publishes routine updates on protest and crowd trends in the Washington Post. Figure 2 reports the CCC’s monthly estimates from January 2017 to October 2018.
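The monthly tallies can be sketched as sums over event records grouped by month, keeping low and high participant totals separate. The dictionary keys are hypothetical. Events without a crowd estimate add nothing to the participant totals, so those totals are lower bounds on actual attendance.

```python
from collections import defaultdict


def monthly_tallies(events):
    """Sum events, low/high participants, arrests, and injuries per month.

    events: iterable of dicts with a 'month' key ('YYYY-MM') and optional
    'low', 'high', 'arrests', 'injuries' values (missing or None counts as 0).
    """
    totals = defaultdict(lambda: {"events": 0, "low": 0, "high": 0,
                                  "arrests": 0, "injuries": 0})
    for e in events:
        t = totals[e["month"]]
        t["events"] += 1
        for key in ("low", "high", "arrests", "injuries"):
            t[key] += e.get(key) or 0
    return dict(totals)
```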

Fig. 2 Number of protest attendees by month from January 2017 to October 2018 based on data collected by the CCC.

Light and dark bars depict the low and high attendee estimates, respectively, highlighting discrepancies in reporting.

The multisourced approach used by the CCC has several benefits. First, the CCC is able to avoid problems of underreporting that often occur in protest event data that rely on only one or two newspapers as sources (63–66). By relying on social media, as well as on internet searches and web crawls, the CCC is able to locate and list many events that newspapers do not print, particularly events with few attendees and events in remote and rural communities (29). Second, relying on participant-generated event listings allows the research team to check the veracity of (and often validate) claimed crowd sizes and other event-related information. Third, allowing live viewing of the monthly spreadsheets-in-progress increases the efficiency of the coding process by allowing (and encouraging) volunteers to submit records that have not yet been entered. It also allows CCC participants to verify and validate data entries, particularly crowd counts. For instance, viewers who send in additional information (such as a newspaper article updated several days after an event) can help revise the data to reflect more accurate counts.

From 21 January to 31 December 2017, the CCC counted 8730 protest events with between 5,906,031 and 9,051,870 observed participants. The largest event during this time period was the Women’s March on 21 January 2017. The CCC estimates that from 3,267,134 to 5,246,670 people participated in this event. Other events with more than 1 million attendees included the second Women’s March (January 2018), the National Student Walkout (14 March 2018), the March for Our Lives (24 March 2018), and Pride events in both 2017 and 2018. Some of these large events have drawn focused scholarly interest (67, 68). To show how these events have been distributed over time and space, Figs. 3 and 4 map attendance at the 2017 and 2018 Women’s Marches, respectively, using CCC estimates (the two maps intentionally use different colors to distinguish the years).

Fig. 3 Map of Women’s March events occurring in the continental United States in 2017 from the CCC data, with circles sized according to the high estimate of attendees.

Protests had an average of 8300 attendees, with the largest occurring in Washington, DC (up to 1 million attendees).

Fig. 4 Map of Women’s March events occurring in the continental United States in 2018 from the CCC data, with circles sized according to the high estimate of attendees.

Protests had an average of 6900 attendees, with the largest occurring in Los Angeles, CA (up to 600,000 attendees).

Strengths, limitations, and challenges for event counting

The event-counting methods described above inform the public, activists, policy-makers, and researchers about the scale and nature of contemporary protests across the country, and they create historical records that otherwise may not be easy to reconstruct in the future. However, these event-counting methods also have several reliability, coding, and discovery limitations and challenges, including (i) resolving discrepancies in reported data, such as crowd size, for the same event reported by multiple sources; (ii) evaluating the reliability and bias of each source; (iii) requiring manual review of what can be hundreds of potential protest reports every day; (iv) accurately and consistently coding events in near real time; and (v) having an incomplete list of sources and an incomplete list of reports from known sources.

Multisource discrepancies. Crowd size reports can exhibit high variance and low precision when multiple sources report the same protest event. For example, journalists commonly cite crowd counts in the “dozens” or “hundreds” of attendees or offer no estimate. For events with no official police count or reporter estimate, other participant counts—such as the number of people arrested at a protest—often are not representative of the actual number of attendees. For other sources that offer precise crowd counts, those counts can exhibit a consistent upward or downward bias relative to other sources that report about the same events. The high variance and low precision of crowd counts across multiple sources highlight the reliability challenges that researchers face when recording crowd size estimates.
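Recording vague journalistic counts consistently requires an explicit convention for phrases such as "dozens" or "hundreds". The sketch below extracts an explicit count when one is attached to a crowd noun, skips numbers that describe arrests, and otherwise maps vague phrases to ranges. The numeric ranges and the noun list are hypothetical conventions, not ones used by Count Love or the CCC. Multiword phrases are checked first so that "hundreds of thousands" is not read as "hundreds".

```python
import re

# Hypothetical (low, high) conventions; longer phrases come first so they match first.
VAGUE_COUNTS = {
    "hundreds of thousands": (200_000, 999_999),
    "tens of thousands": (20_000, 99_999),
    "thousands": (2_000, 9_999),
    "hundreds": (200, 999),
    "dozens": (24, 99),
}
NUMBER = re.compile(
    r"\b(\d{1,3}(?:,\d{3})+|\d+)\s+"
    r"(?:people|protesters|demonstrators|marchers|attendees|participants)\b"
)


def parse_crowd_count(sentence):
    """Return a (low, high) attendance range from one sentence, or None."""
    text = sentence.lower()
    for m in NUMBER.finditer(text):
        if "arrest" in text[m.end():m.end() + 20]:
            continue  # an arrest count is not an attendance count
        n = int(m.group(1).replace(",", ""))
        return (n, n)
    for phrase, rng in VAGUE_COUNTS.items():
        if re.search(rf"\b{phrase}\b", text):
            return rng
    return None
```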

Reliability and bias problems affect not only the reported crowd sizes but also the reported reasons for protest. Single-source reports may omit relevant details, such as racial injustice as a motivating factor in police brutality protests, while multisource reports may offer differing descriptions. Because active news sources number at least in the thousands, and Twitter and Facebook users in the United States number in the hundreds of millions, evaluating the reliability and bias of every source for every protest report in real time is nearly impossible. These sources introduce reliability challenges, especially given increased attention to the potential for fabricated content to spread online. To help mitigate these risks, each event is associated with one or more sources, which enables both attribution and ongoing evaluation of each source's reliability.

Volume of reports. The number of sources reporting protest events in real time creates its own set of challenges. Given the transience of internet content, at a minimum, researchers must find and archive every potential protest report before it becomes inaccessible. For example, of the first 1000 articles that Count Love crawled in 2017, 15.5% could no longer be freely retrieved as of 31 March 2019 because the page had been removed, archived, or moved behind a paywall. Given both the transience of internet data and the volume of data generated each day, reviewing articles in a timely manner poses a resource challenge. Between 12 February 2017 and 11 November 2018, Count Love reviewed a total of 62,592 news articles, averaging 97 articles each day, with a minimum of four articles on 23 November 2017 (Thanksgiving in the United States) and a maximum of 1130 articles during the March for Our Lives protests on 24 March 2018. To handle this volume of review with two researchers, Count Love assigns only one researcher per article unless that article is ambiguous, and nightly article reviews require an average of 1 hour per person. CCC likewise assigns discovered events to single researchers. The volume of reports, combined with the time-sensitive nature of reviewing contemporary events, yields data that are timely and potentially representative but necessarily incomplete.

Real-time efforts to document protest events sacrifice some reliability because the reasons people protest, and the ways they do so, keep evolving. Researchers must make judgment calls when reviewing ambiguous reports. For example, should researchers count public disturbances inside town halls as protests, or Pride events advocating for LGBTQ rights? Should vigils after a mass shooting count? Does it matter if the shooting was racially or religiously motivated? If a protest features counter-protestors, should the counter-protest count as its own event? Does a single person protesting count as a protest? Protests deemed “new” in some attribute, such as those raised by the questions above, may not fit neatly into the taxonomies in the existing literature. Yet, when researchers add such an event to their aggregate data, they are modifying these taxonomies in real time. Their aggregate data thus offer a contemporary and evolving definition of “protest” that has not yet been subject to peer review, and these differences can undermine accuracy and consistency when drawing historical comparisons.

Intercoder reliability. The two projects have both worked to address intercoder reliability given resource constraints. Count Love aims to maintain reliability by holding constant the two researchers who have coded the entire dataset. Coding occurs on a nightly basis, generally in the same physical location at the same time. Machine-learning logic suggests likely codes for each article based on previous examples, and researchers finalize decisions manually through a software interface that presents a list of existing codes and requires manual confirmation to add new codes. Researchers at Count Love jointly adjudicate how to code new or ambiguous events in real time, publish nightly data about when codes were first used in their dataset, and have developed tools to allow others to fully search their dataset using the same codes.
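The rule that new codes require manual confirmation can be illustrated with a minimal sketch. The `CodeBook` class below is hypothetical, not Count Love's software, and it leaves out the machine-learning suggestion step entirely; it shows only the controlled-vocabulary rule and the recording of when each code was first used.

```python
from datetime import date


class CodeBook:
    """Hypothetical controlled vocabulary for protest codes (illustration only)."""

    def __init__(self):
        # code -> date on which it was first applied; this is the kind of
        # record that could be published as "when codes were first used"
        self.first_used = {}

    def apply(self, code, on, confirm_new=False):
        """Apply a code; codes never seen before require explicit confirmation."""
        if code not in self.first_used:
            if not confirm_new:
                raise ValueError(f"unknown code {code!r}; confirm to add it")
            self.first_used[code] = on
        return code


book = CodeBook()
book.apply("immigration", date(2018, 6, 30), confirm_new=True)
book.apply("immigration", date(2018, 7, 1))  # existing code: no confirmation needed
try:
    # a misspelling is rejected instead of silently becoming a new code
    book.apply("imigration", date(2018, 7, 1))
except ValueError as err:
    print(err)  # unknown code 'imigration'; confirm to add it
```

Requiring confirmation keeps typos and near-synonyms from fragmenting the code list, which matters when the same two coders must stay consistent over years.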

The CCC project, in contrast, relies on an exceedingly straightforward coding protocol, which asks coders to enter general information regarding an event’s basic characteristics. The greatest variability arises in deciding whether an event constitutes a protest and should therefore be included (e.g., a town hall meeting or a vigil memorializing a historical event). When in doubt, CCC coders enter a case and then flag it for further consideration. The codirectors act as tiebreakers, deciding whether to include or exclude questionable cases.

The CCC manages other coding decisions among those who volunteer as consortium coders in several ways. First, one codirector gives all research assistants a standardized training on the coding protocol. Frequently asked questions and coding criteria are shared with all volunteers. Second, when the initial coding of a month’s worth of data is completed, one or both codirectors clean the spreadsheet and resolve any questionable coding decisions through consensus. Because all source materials are posted in the spreadsheet, all researchers can validate or verify the coding decisions as well. While these practices cannot guarantee intercoder reliability, they represent efforts toward that goal, and they allow other researchers who use these data to conduct their own evaluation and validation.

Coverage of events. Real-time counting methods also face several accuracy challenges related to coverage completeness. First, given the transient nature of internet content detailed above, a discovery and archival problem exists with respect to finding and storing all relevant articles from known sources. Second, given that there are, at a minimum, several thousand local news sources in the United States and hundreds of millions of users on various social media platforms, finding every relevant report from every source is practically intractable. Third, even with a complete list of sources, counting events based on reports requires that every event is covered by at least one source. Fourth, coverage is not sufficient for event discovery; the keywords that a researcher uses to search will bias the set of articles and results found. For example, searching for variations of the words “protest,” “demonstration,” “rally,” and “march” favors specific forms of protest, and the addition of terms such as “petition” or “strike” could expand the types and numbers of events found. These coverage challenges introduce bias into the counts and types of protest events identified.
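The effect of the keyword list on what gets found can be made concrete with a small sketch. The four headlines are invented for illustration, and matching word prefixes with a regular expression is a simplifying assumption; a real pipeline would search full article text and handle inflections and false positives (such as the month “March”) more carefully.

```python
import re

# Invented headlines, used only to illustrate keyword-driven discovery.
HEADLINES = [
    "Hundreds rally outside city hall over immigration policy",
    "Teachers strike for a third day over pay",
    "Residents deliver petition opposing pipeline",
    "Students march to the state capitol",
]

BASE_TERMS = ["protest", "demonstration", "rally", "march"]
EXPANDED_TERMS = BASE_TERMS + ["petition", "strike"]


def matches(text, terms):
    """True if any term appears at the start of a word ("protest" also hits "protesters")."""
    pattern = r"\b(?:" + "|".join(map(re.escape, terms)) + r")"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def discovered(terms):
    return [h for h in HEADLINES if matches(h, terms)]


print(len(discovered(BASE_TERMS)), len(discovered(EXPANDED_TERMS)))  # 2 4
```

The same corpus yields twice as many events once two terms are added, which is the selection effect the paragraph describes.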

Increased automation has enabled both event-counting projects to aggregate, cross-reference, and disseminate information about protest activity relatively quickly. However, automation also requires decisions about the data-aggregation process, and those decisions merit additional scrutiny. For example, implementing an automated crawler requires decisions about what websites to crawl (news sites, social media, organizer websites, etc.), how extensively to crawl (home pages, the metro subsection of a newspaper, etc.), and what content to review (articles, videos, images, image captions, etc.). Alternatively, when searching existing data indices, such as social media sites or search engines, the results are constrained to their publicly accessible content and are filtered by proprietary search algorithms that prioritize content that is subjectively evaluated to be relevant to both the query and the searcher (a phenomenon referred to as the “filter bubble”) (69). These constraints on accessing content may introduce a bias in results or underrepresentation of geographically distant events.

Beyond data aggregation, automation of research processes can further affect data coding. For example, decisions about data structure may affect the ability to disambiguate duplicate events (e.g., by storing longitude and latitude in addition to location name). Machine-learning tools can help in making initial guesses about content, but such models are not sufficient on their own. For example, Count Love’s neural network correctly identifies that an article describes a protest event 89.9% of the time yet only correctly categorizes the protest 78% of the time. Automation enables the event-counting projects to collect and share event data on an ongoing basis, but these processes are still highly dependent on human oversight and review.
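Storing coordinates alongside location names is what makes a spatial duplicate check possible. The sketch below flags two reports as candidate duplicates when they fall on the same day within a distance threshold. The `EventRecord` type, the 1 km threshold, and the approximate coordinates are assumptions for illustration, and flagged pairs would still go to a human reviewer.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class EventRecord:
    location_name: str
    lat: float  # degrees
    lon: float  # degrees
    day: date


def haversine_km(a, b):
    """Great-circle distance between two events, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def candidate_duplicates(a, b, max_km=1.0):
    """Same day and within max_km of each other: flag for human review."""
    return a.day == b.day and haversine_km(a, b) <= max_km


day = date(2018, 6, 30)
common = EventRecord("Boston Common", 42.355, -71.066, day)
state_house = EventRecord("Massachusetts State House", 42.358, -71.064, day)
spfld_il = EventRecord("Springfield", 39.80, -89.65, day)
spfld_ma = EventRecord("Springfield", 42.10, -72.59, day)

print(candidate_duplicates(common, state_house))  # True: different names, about 0.4 km apart
print(candidate_duplicates(spfld_il, spfld_ma))   # False: same name, different states
```

Matching on location names alone would get both pairs wrong.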

Overall, the data on present-day protest activity may not be directly comparable to historic datasets because of differences in both the information collected and how it is collected, but these datasets provide a scaffold for comparing across protest periods. By documenting the data collection process of current event-counting projects, it will be possible for future researchers to either collect comparable data or have sufficient context to compare collected data. Although we have noted that links and sources are not permanent, it might be possible to build backward into previous years using remaining web links and social media. Although such a search would be incomplete, it could broadly indicate the level, size, and location of protests under a different administration or enable some level of comparison to other time periods or places.

Recommended best practices going forward

The limitations and challenges for event counting in near real time fall largely into five groups: (i) communicating and resolving discrepancies in reported data, (ii) evaluating the reliability and bias of each source, (iii) reviewing all reports with limited resources, (iv) accurately and consistently coding events in near real time, and (v) discovering events based on an incomplete list of sources and an incomplete list of articles from those sources.

Researchers could address the first four problems at a later point in time if all the raw data for protest reports remain available. For example, for events with multiple reporting sources, Count Love currently cites the most precise, minimum crowd count reported: “a dozen” maps to 10 attendees, “hundreds” maps to 100 attendees, etc. However, by preserving the original raw data, future researchers could return to the original text and recode events or filter sources using different criteria.
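The crowd-count rule, and the value of keeping raw phrases, can be sketched as follows. Only the mappings for “a dozen” and “hundreds” come from the text; the “thousands” entry is a hypothetical extension of the same convention. Treating exact figures as more precise than vague phrases, then taking the minimum within the most precise tier, is one reading of “most precise, minimum,” not necessarily Count Love's exact logic.

```python
# Mappings for "a dozen" and "hundreds" follow the text; "thousands" is a
# hypothetical extension of the same round-down convention.
VAGUE_COUNTS = {"a dozen": 10, "hundreds": 100, "thousands": 1000}


def parse_count(phrase):
    """Return (count, is_exact) for a reported crowd size, or None if unparseable."""
    phrase = phrase.strip().lower()
    digits = phrase.replace(",", "")
    if digits.isdigit():
        return int(digits), True
    if phrase in VAGUE_COUNTS:
        return VAGUE_COUNTS[phrase], False
    return None


def cited_count(raw_reports):
    """Prefer exact figures over vague phrases, then take the minimum of that tier."""
    parsed = [p for p in map(parse_count, raw_reports) if p is not None]
    if not parsed:
        return None
    exact = [count for count, is_exact in parsed if is_exact]
    tier = exact if exact else [count for count, _ in parsed]
    return min(tier)


# Raw phrases are kept, so changing VAGUE_COUNTS and re-running recodes every event.
print(cited_count(["hundreds", "350", "about 400"]))  # 350
print(cited_count(["a dozen", "hundreds"]))           # 10
```

Because the function consumes the original phrases rather than stored numbers, a future researcher could swap in a different mapping (for example, midpoints instead of minimums) without recollecting anything.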

As to the question of what original data to archive, at a minimum, reconstructing the real-time set of events and understanding coverage problems require archiving the list of sources used; the date that each source was added; and the date, URL, and text content of protest reports. In addition to preserving raw data, saving annotations and the original text for those annotations can improve the efficiency of future research efforts to understand (and potentially remap) coding judgments made in real time. For events with multiple sources, saving every reference to a particular protest can help future researchers evaluate the qualitative and quantitative bias of each source. In the aggregate, these data archival suggestions improve replicability.
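One way to lay out that minimum archive is sketched below. The class and field names are hypothetical; the point is that each report keeps its archived text, each annotation keeps the passage it was based on, each event keeps every report that mentions it, and each source keeps the date it was added, so the crawl list as of any past day can be reconstructed.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Source:
    name: str
    added_on: date  # date the source joined the crawl list


@dataclass
class Annotation:
    code: str
    quoted_text: str  # the exact passage that justified this code


@dataclass
class Report:
    source: Source
    published_on: date
    url: str
    text: str  # archived full text; the URL may later break or go behind a paywall
    annotations: list[Annotation] = field(default_factory=list)


@dataclass
class ArchivedEvent:
    event_id: str
    reports: list[Report] = field(default_factory=list)  # every report of this event

    def source_names(self):
        return sorted({r.source.name for r in self.reports})


def sources_known_by(sources, day):
    """Names of sources that were being crawled as of `day`."""
    return [s.name for s in sources if s.added_on <= day]


gazette = Source("Example Gazette", date(2017, 3, 1))
wire = Source("Example Wire", date(2018, 1, 15))

event = ArchivedEvent("example-2018-06-30")
event.reports.append(Report(
    gazette, date(2018, 6, 30), "https://example.com/a",
    "About 200 people rallied downtown against family separation.",
    [Annotation("immigration", "rallied downtown against family separation")],
))
event.reports.append(Report(wire, date(2018, 7, 1), "https://example.com/b",
                            "Hundreds gathered downtown on Saturday."))

print(event.source_names())                                   # ['Example Gazette', 'Example Wire']
print(sources_known_by([gazette, wire], date(2017, 12, 31)))  # ['Example Gazette']
```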

No amount of data archival effort, however, solves the problem of incomplete coverage. While comprehensively discovering reports to document protests may be practically infeasible, it is feasible to estimate event coverage by sampling “missed” events. Using the Families Belong Together nationally coordinated protests as an example, MoveOn.org announced 751 protests at https://act.moveon.org/event/families-belong-together/search/ in the United States. Count Love initially found 461 references to these protests from its crawled articles, 408 of which were past event references (as opposed to future event announcements). An additional 169 protests were found after cross-referencing with the national list and searching for relevant news articles, and the remaining 121 protests remain unconfirmed. If every announced event, in fact, occurred, this validation exercise estimates Count Love’s coverage rate of protest events at between 54% (events discovered before validation) and 77% (total events discovered after validation). If some of the announced events did not occur (which is likely the case), Count Love’s actual coverage rates would be higher. Repeating this error estimate with other announced protests provides one method to measure how well real-time counting efforts capture protest activity.
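The arithmetic behind the coverage estimate is simple enough to write out. The sketch below uses the Families Belong Together figures from the paragraph: 408 events found before validation and 169 more found during it, against 751 announced events. The function name is hypothetical.

```python
def coverage_bounds(announced, found_before, found_during_validation):
    """Coverage rates assuming every announced event actually took place.

    If some announced events never happened, the true denominator is
    smaller, so both rates are lower bounds on actual coverage.
    """
    before = found_before / announced
    after = (found_before + found_during_validation) / announced
    return before, after


low, high = coverage_bounds(announced=751, found_before=408, found_during_validation=169)
print(f"{low:.0%} to {high:.0%}")  # 54% to 77%
```

Repeating the calculation for other nationally announced days of action would give a series of coverage estimates rather than a single point.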

The need to implement these best practices points to the potential value of greater collaboration and institutionalization for event counting. Multi-scholar collaboration has begun in the United States but could benefit from expansion of efforts and resources to support these efforts. International collaboration would be particularly helpful, given that many contemporary events occur in multiple countries concurrently (e.g., the Women’s March of 2017 had over 600 events in the United States but was closer to 1000 events including those around the world). Moreover, such collaboration could allow for standardization of event classification procedures, which could yield more comparable data produced from non-English language sources.

CROWD SURVEYS

From the standpoint of event counting, each person attending a protest or demonstration is counted equally. This perspective, like that of the casual observer, treats protests as consisting of undifferentiated, homogeneous masses. However, there is generally a great deal of heterogeneity among the participants at protests, who are often assembled by diverse coalitions of organizations and interests (70, 71). Crowd surveys make it possible to understand this heterogeneity in terms of participants’ demographics, attitudes, political engagement, connections to social movement organizations, and more (11, 17, 72–80). Moreover, research that has combined this methodology with surveys of nonparticipants has been used to address questions regarding how the people who mobilize are similar to and different from the general population and why people participate in protest [(11), pp. 86 to 93; (17), pp. 115 to 128; (81), pp. 61 to 62, 109 to 110, and 123]. Further, this research documents how those who protest are connected, the development of political identities, and the relationship of protest to other political activities (11, 17, 81–84). As we discuss in the following sections, one of the main limitations of crowd surveys is that they generally only cover a small sample of events, as opposed to the census approach used by event counting. Another limitation of this approach is that it may sometimes be difficult to obtain samples of nonparticipants that appropriately match with crowd surveys, which is necessary for drawing certain types of inferences about participation.

Protest surveys typically rely on in-person interviews, respondent-completed surveys, mail-return questionnaires, or some combination of these approaches at one major protest event (or, in a limited number of cases, a handful of related protest events). While the first protest surveys were conducted in the mid-1960s (73, 77), the use of this method has expanded substantially since the early 2000s in tandem with increased reliance on protest as a tactic to attract the attention of decision-makers, the media, and the general public (85). As we have already noted, since President Trump took office, large-scale protests, as well as crowd surveys of them, have become even more common in the United States.

The complex environment of a protest leads researchers to focus their attention on several considerations that are not common in many other types of surveys. First, it is impossible to establish a sampling frame based on the population, as the investigator does not have a list of all people participating in an event; who participates in a protest is not known until the day of the event; and no census of participants exists. Working without this information, the investigator must find a way to elicit a random sample in the field during the event. Second, crowd conditions may affect the ability of the investigator to draw a sample. The ease or difficulty of sampling depends on whether the crowd is stationary or moving, whether it is sparse or dense, and the level of confrontation by participants. Stationary, sparse crowds that are peaceful and not engaged in confrontational tactics (such as civil disobedience, or more violent tactics, like throwing items at the police) tend to be more conducive to research. In general, the presence of police, counter-protesters, or violence by demonstrators are all likely to make it more difficult to collect a sample. Third and last, weather is an important factor. Weather conditions, such as rain, snow, or high temperatures, may interfere with the data collection process and the crowd’s willingness to participate in a survey.

The most common approach to crowd surveys has been for researchers to either administer surveys in the field or disseminate a survey to be mailed back [for an overview, see (85), Table 1]. Sampling involves investigators entering the protest site from varying locations throughout the crowd, approaching every nth participant (usually every fifth participant) at the protest and asking them to participate in the survey (72, 74, 81). In some cases, investigators adopt a slight variation on this approach. For example, Heaney and Rojas instructed surveyors to first select an “anchor” (not sampled) and then count five participants from the anchor before making an invitation, with respondents clustered in sets of three (17).
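Systematic selection can be sketched with a list standing in for the order in which participants pass a surveyor. The random starting offset is a standard systematic-sampling convention added here as an assumption (the text specifies only “every nth participant”), and the function name is hypothetical. Variants such as the anchor procedure of Heaney and Rojas change the counting rule but keep the same principle: the rule, not the surveyor's judgment, determines who is approached.

```python
import random


def systematic_sample(participants, n=5, rng=None):
    """Invite every nth participant, starting from a random offset in [0, n)."""
    rng = rng or random.Random()
    start = rng.randrange(n)
    return participants[start::n]


crowd = [f"participant_{i}" for i in range(100)]
invited = systematic_sample(crowd, n=5, rng=random.Random(7))
print(len(invited))  # 20 invitations from 100 passers-by
```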

Table 1 Demographic data collected by surveying a sampling of attendees (N = 1936) at protests taking place in Washington, DC, associated with the Resistance.

Data were collected by Fisher and published in (81).


Walgrave and Verhulst introduced an additional layer into the selection process by having fieldwork supervisors select rows in a moving crowd before identifying the particular participant to be sampled (85). Field experiments conducted by these authors demonstrate that systematic counting procedures are more reliable than nonsystematic sampling procedures (wherein surveyors are more likely to sample approachable peers than typical participants). To date, however, there is no research that assigns a higher reliability to any one counting procedure over another. Samples conducted by competing research teams at the same event showed that similar (although not identical) sampling procedures may yield nearly identical results. For example, Fisher (83, 86) and Heaney (75) obtained samples with approximately equivalent composition when conducting surveys at the same anti-Trump protests in Washington, DC.

Response rates to protest surveys generally compare favorably to high-quality national opinion surveys, such as the General Social Survey (61.3% in 2016; available at http://gss.norc.org/Documents/other/Response%20rates.pdf) and the American National Election Studies (50% in 2016) (87), although response rates vary tremendously, depending, in part, on the methods used. In a meta-analysis of protest surveys conducted at 51 demonstrations held in seven European nations, Walgrave and colleagues examined the covariates of response rates [(88); see also (85) and Table 1]. The authors found that questionnaires distributed in the field had a somewhat higher response rate (90%) than requests for face-to-face interviews (87%). Gender was a factor in accepting interviews. In the words of Walgrave and colleagues, “Demonstrators prefer to talk to female interlocutors instead of male interlocutors” [(88), p. 92]. The nature of the demonstration mattered, with lower response rates observed when demonstrations were chaotic. Response rates were substantially lower (36%) when potential respondents were asked to return surveys by mail. Older persons had lower response rates for both face-to-face interviews and postal-return surveys.

Other factors may influence survey nonresponse rates. A study by Rüdig documented that longer questionnaires tend to garner lower response rates (89). Heaney (75) observed that response rates varied with ideology, with individuals at conservative protest events agreeing to take a six-page questionnaire at lower rates (49 to 60%) than individuals at liberal protests (68 to 85%). This difference may be attributable to a conservative ideology that casts suspicion on academics and on scientific research more broadly (67).

Higher response rates are generally preferred because surveys with higher response rates are more likely to represent the population attending the event well. Nonetheless, the most important question is how representative the respondents are of the overall protesting population. If those who respond to the survey differ systematically from those in the population, then selection bias may be a problem. In their meta-analysis, Walgrave and colleagues determined that age, education, and motivation were associated with differences in response rates across demonstrations (88). However, in comparing the results of surveys conducted in the field with follow-up surveys conducted via the internet, Fisher found only limited differences in nonresponse across her two waves of follow-up surveys [for details, see (81)]. Specifically, she found that more educated respondents and those with previous protest experience had higher response rates in her first wave of follow-up surveys.

In their 2015 book, Heaney and Rojas reported response rates as a function of race and gender (17). They noted that black participants consistently had lower response rates than other racial groups, ranging from roughly 1 to 15% lower. Women generally had higher response rates than men, ranging from parity to 10% higher. These types of differences may be attributable to a variety of factors, such as variations in the willingness to make an uncompensated contribution to a collective good, level of trust in the investigator, or the race/gender of the surveyor. To address response biases, Heaney and Rojas applied survey weights in their regression models (17). However, they found that using these weights did not affect the substantive conclusions of their analysis.
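A common way to build such weights is inverse response-rate weighting, sketched below. This is not necessarily the exact procedure Heaney and Rojas used. It assumes surveyors record the observed group of everyone they approach, including those who decline, and the group labels and counts are invented. Each respondent in a group is weighted by the inverse of that group's response rate, rescaled so the weights sum to the number of respondents.

```python
def nonresponse_weights(approached, responded):
    """Per-group weights proportional to 1 / response rate, scaled to sum to N respondents."""
    raw = {g: approached[g] / responded[g] for g in responded}  # 1 / response rate
    n_resp = sum(responded.values())
    weighted_n = sum(raw[g] * responded[g] for g in responded)
    return {g: raw[g] * n_resp / weighted_n for g in raw}


# Invented counts: both groups were approached equally often,
# but group B declined more often.
approached = {"group A": 100, "group B": 100}
responded = {"group A": 80, "group B": 60}
weights = nonresponse_weights(approached, responded)
print({g: round(w, 3) for g, w in weights.items()})  # {'group A': 0.875, 'group B': 1.167}
```

After weighting, each group contributes 70 weighted respondents, restoring the even split among those approached while keeping the total at 140.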

Overall, the extant literature on protest surveys indicates that these instruments can be reliable and valid tools for assessing the composition of protests. Adopting methods of random sampling (such as approaching every fifth demonstrator) is essential to prevent surveyors from introducing selection biases into the data. Multiple approaches to randomization are acceptable and may be adapted to variations in the survey conditions. When biases are identified, they may be corrected using survey weights. As we discuss in more detail in the best practices section below, more research is needed to determine the sources of response bias and the most effective ways to avoid or correct for it.

Contemporary crowd survey projects

Studies using crowd surveys have expanded what is known about social movements, protest, activism, political parties, and related topics [see, in particular, (11, 81, 84, 90, 91)]. In one of the first multinational crowd surveys, Fisher and colleagues presented data collected from surveys of participants at five globalization protests held in Canada, The Netherlands, and the United States in 2000 to 2002 (72). The study shed light on the role of organizations in an era when the internet was becoming a central tool for coordinating activism and protest [(15); for an overview, see (16)]. The authors found that organizations were critical to mobilizing nonlocal participants, most of whom learned about protests through the internet. Although protests generally drew participants from within the nation of the protest, the internet enabled the participation of “rooted cosmopolitans” who worked on global issues within their own national contexts [for more details on rooted cosmopolitans, see (21)].

In furthering the multinational approach to crowd surveys, Walgrave and Rucht (11) conducted a study of 11 anti-war demonstrations held in eight countries on 15 February 2003. Their study design held a number of key variables constant, including the issue (stopping the war in Iraq), the targets (the United States and the United Kingdom), the stage of the social movement being studied (pre-war opposition), the tactics (peaceful protest), and the key slogans (e.g., “the world says no to war”). This design enabled the authors to analyze the relationships among protests and numerous characteristics of the people who participated, as well as the countries in which the events took place. One of the main findings of the study was that the national context was critical to shaping the nature of protest, particularly depending on whether the nation where the event happened was actively involved in war. Protesters in bellicose countries, for example, were more likely to oppose their own governments, while those in countries outside the war coalition were more likely to be influenced by their own leaders’ positions on the war. The study demonstrated the feasibility of conducting surveys in multiple nations at related events on the same day.

Following on the success of these studies, teams of scholars became more ambitious in their research designs by incorporating variation in space, time, and issues within a common study framework. A model study across issues, nations, and time was conducted by Walgrave and Rucht to examine how digital media technologies allow activists to maintain engagements with multiple social movement communities (11). A collaborative effort of European scholars has aimed to institutionalize this agenda through a project funded by the European Science Foundation titled “Caught in the Act of Protest: Contextualizing Contention” [for an overview, see (92, 93)]. Articles from this project have analyzed a range of topics. For example, Saunders and colleagues compared participants at protests around the issue of climate change in December 2009 and around May Day events in May 2010 in multiple European cities to understand differing levels of participation in protest (84). Other articles have looked at lone protesters (94), unaffiliated protesters who attend events without links to organizations (91), and protest diffusion (90).

Alongside crowd surveys, the “Caught in the Act of Protest” study collects systematic observations at the level of the event (93). This component of the research builds on studies that have conducted systematic field observations of demonstrations or other forms of collective action, typically by teams of researchers using common protocols (95, 96). The method has significant advantages in terms of the range of characteristics of events that can be collected, as well as providing contextual measures for comparison across events. Rather than solely relying on media reports, researchers can develop protocols and intentionally document theoretically relevant features of protest events.

The United States does not have a systematic collaborative effort that is comparable to the Caught in the Act of Protest project. Still, building on this agenda, Heaney and Rojas added a time component to the crowd survey approach by following the anti-war movement in the United States after 9/11 over two waves (17). In both waves of data collection, the authors collected data in numerous cities simultaneously. By following the movement over time, Heaney and Rojas were able to track changes in the partisan identities of participants, thus enabling them to understand the coevolution of a political party and a social movement. With this approach, the authors found that participants who identified as Democrats withdrew from the anti-war movement as the Democratic Party reaped electoral success from 2006 to 2008.

The election of Donald Trump in 2016 presented new opportunities and challenges for scholars conducting crowd surveys in the United States. Protests became larger and more widely distributed than they had been in recent years (1). Further, the topical scope of protests diversified; rather than focusing primarily on topics such as globalization or war, protest expanded to myriad issues, such as women’s rights, scientific freedom/independence, climate change, gun control, racial justice, and immigration (83). This expansion not only created more opportunities for scholars to conduct crowd surveys but also demanded greater resources to do so.

In her book American Resistance, Fisher examined seven of the largest protests in Washington, DC, associated with opposition to President Trump: the 2017 Women’s March, the March for Science, the People’s Climate March, the March for Racial Justice, the 2018 Women’s March, the March for Our Lives, and Families Belong Together (81). Her results, reported in Table 1, show that the Resistance was disproportionately female (at least 54%), highly educated (with more than 70% holding a bachelor’s degree), majority white (more than 62%), and had an average adult age of 38 to 49 years. Further, she found that the Resistance is almost entirely left-leaning in its political ideology (more than 85%). Resistance participants were motivated to march by a wide range of issues, with women’s rights, environmental protection, racial justice, immigration, and police brutality being among the more common motivations (83). She also found that participants did not limit their activism to marching in the streets, as more than half of the respondents had previously contacted an elected official and more than 40% had attended a town hall meeting (81).

A similar study conducted by Heaney collected crowd surveys at 10 events in Washington, DC, during 2017 (75). Seven of these events were ideologically liberal (or pro-Resistance), while three of them were ideologically conservative (or pro-Trump), thus enabling the comparison of events and their participants on the basis of ideology. He found that pro-Resistance events were significantly more female than were the pro-Trump events and that the pro-Resistance events were significantly more partisan than were the pro-Trump events, although he found no significant differences in the racial backgrounds or prior experiences of participants on the basis of ideology. He observed a considerable contingent of participants at the Women’s March (15%) and the March for Racial Justice (8%) that volunteered “intersectional” issues (concerned with more than one category of social marginalization) as one of the principal reasons for their involvement.

As Heaney’s study remains in progress, we report results from his data on the political attitudes of conservative versus liberal protesters at 16 protest events in Washington, DC (5 conservative events and 11 liberal events) during 2017 and 2018 in Table 2. Questions were asked on a five-point Likert scale, with “strongly agree” taking the value of 5, “somewhat agree” taking the value of 4, “neither agree nor disagree” taking the value of 3, “somewhat disagree” taking the value of 2, and “strongly disagree” taking the value of 1. Average response values by group are posted in the table for 10 questions. The results reveal both similarities and differences between the two types of events.
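The scoring scheme translates directly into code. In this sketch the answers are invented and the function name is hypothetical; the optional per-respondent weight mirrors the weighting noted under Table 2, and with all weights equal to 1 it reduces to a plain mean.

```python
LIKERT = {
    "strongly agree": 5,
    "somewhat agree": 4,
    "neither agree nor disagree": 3,
    "somewhat disagree": 2,
    "strongly disagree": 1,
}


def group_means(responses):
    """responses: iterable of (group, answer, weight). Returns the weighted mean score per group."""
    totals, weight_sums = {}, {}
    for group, answer, weight in responses:
        score = LIKERT[answer.lower()]
        totals[group] = totals.get(group, 0.0) + weight * score
        weight_sums[group] = weight_sums.get(group, 0.0) + weight
    return {g: totals[g] / weight_sums[g] for g in totals}


answers = [
    ("liberal", "Strongly agree", 1.0),
    ("liberal", "Somewhat agree", 1.0),
    ("conservative", "Somewhat disagree", 1.0),
    ("conservative", "Neither agree nor disagree", 1.0),
]
print(group_means(answers))  # {'liberal': 4.5, 'conservative': 2.5}
```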

Table 2 Political attitudes by conservative versus liberal protesters in the United States.

Surveys collected by Heaney at 16 protest events in Washington, DC (N = 3222) are shown. Data are weighted to account for response rate differences based on race/ethnicity and sex/gender.


Conservative protesters were somewhat more likely than liberal protesters to see the American political system as effective, rating it slightly more effective at solving public problems, and more likely than liberals to say that elections are a valuable mechanism of accountability. Conservatives and liberals were equally likely to say that they value conversing with people who hold different partisan loyalties, but liberals were considerably more likely to prefer a greater role for third parties in American democracy. Conservatives were more likely to see the importance of civility at protests and less likely to acknowledge the potential influence of property damage and violence at demonstrations than were liberal protesters. Liberals and conservatives had roughly equivalent views of their personal political efficacy. However, race and gender were the issues (of the 10 considered here) that most divided the two groups. Liberal protesters registered strong agreement that African Americans and women are more likely to be mistreated in the workplace than are whites and men, respectively. Conservatives, by contrast, were more skeptical of these claims, leaning somewhat toward disagreement with these statements about race- and gender-based inequalities.

At the time of this writing, there were numerous works in progress that drew on crowd surveys in the United States during the Trump era to produce their primary data. Reuning and colleagues were researching protest politics outside the 2016 presidential nominating conventions (97). Heaney was exploring the commitment to principles of intersectionality by protesters during the Trump era, including surveys in multiple cities on the same day (98). Fisher and Jasny (99) were analyzing data collected from large-scale marches that targeted the Trump Administration and its policies in Washington, DC, to understand who persists in protest, turning out to participate again and again. These and other studies continued to yield insights into contemporary politics using the crowd survey method.

Recommended best practices going forward

The increasing frequency and widening geographic scope of protests raise significant challenges for conducting crowd surveys. Previous studies, such as those by Walgrave and Rucht (11), Heaney and Rojas (17), and the many coming out of the Caught in the Act of Protest project [for an overview, see (93)], have demonstrated the feasibility of conducting surveys across issues, space, and time. European scholars have put into place an infrastructure to standardize research and sustain this area of investigation. However, scholars outside of Europe have not followed suit. This lack of infrastructure poses a challenge in an era when protests may take place in hundreds of cities on the same day. The next step for research in this area is to develop a methodology for conducting crowd surveys across a range of protest sites during such days of action in a way that collects data from a more representative sample of events. While European scholars are significantly ahead of their U.S. counterparts on this goal, their research could also benefit from expanding the geographic representativeness of their surveys.

Moving forward, best practices will require forming teams of scholars that are geographically dispersed in a way that corresponds with the distribution of the events under investigation. While previous studies have concentrated on conducting surveys in different regions and in major cities, the datasets would be more representative if data were collected in multiple locations simultaneously in a way that represents smaller cities, suburbs, and rural areas.

Consider an event projected to take place in 300 cities simultaneously in the United States or Europe. Suppose that the target areas were stratified into 12 regions or countries. If a survey were conducted in three types of locations in each region (either one city, one suburb, and one rural site, or one capital, one college town, and one urban area with neither a capital nor a university), the survey would need to go into the field in 12 × 3 = 36 locations, or 12% of the 300 events. Such a task would likely require a minimum of 12 to 36 scholars working together, each coordinating research teams to collect survey data at events in their region. Even more resources and institutionalization would be required to conduct crowd surveys at a genuine random sample of events.
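The stratified design in this thought experiment can be sketched in a few lines. This is a minimal illustration under stated assumptions: the event records, their field names (`id`, `region`, `site_type`), and the `select_sites` helper are hypothetical, every (region, site type) stratum is assumed to contain at least one event, and one event is drawn at random per stratum. That yields a stratified selection, not the genuine random sample of all events mentioned above.

```python
import random
from collections import defaultdict

SITE_TYPES = ("city", "suburb", "rural")  # the three location types from the text

def select_sites(events, seed=None):
    """Pick one event at random from each (region, site_type) stratum.

    events: list of dicts with hypothetical keys 'id', 'region', 'site_type'.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for event in events:
        strata[(event["region"], event["site_type"])].append(event)
    # Sort strata keys so the selection is reproducible for a given seed.
    return [rng.choice(strata[key]) for key in sorted(strata)]

# Synthetic example: 300 events spread over 12 regions and 3 site types.
events = [
    {"id": i, "region": i % 12, "site_type": SITE_TYPES[(i // 12) % 3]}
    for i in range(300)
]
sites = select_sites(events, seed=1)
print(len(sites), f"{len(sites) / len(events):.0%}")  # 36 12%
```

Fixing the seed lets a coordinating team publish exactly which events were selected, so regional teams can plan fieldwork in advance.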

Beyond collaboration among multiple scholars, scaling up the administration of surveys would also require standardization of the instrument, the sampling, and the practices for entering and coding survey data. Previous studies have placed a premium on administering pen-and-paper surveys. The advantage of this practice is that pen-and-paper surveys are familiar to most potential respondents, thus minimizing selection biases that result from lack of familiarity with the survey technology. However, entering and coding data collected with this method is time-consuming and cumbersome, and that burden would become prohibitive in a scaled-up study that collects data at 30 or more sites.

In contrast, Fisher’s recent work has used electronic tablets to administer surveys and tabulate data (81). This approach is more practical than paper surveys as the number of locations scales up. Tablets are relatively inexpensive and could be purchased and widely distributed, helping to ensure that data are collected and entered in the same way across field sites. A disadvantage of tablets is that they require greater cost and technological synchronization at the outset of the project. Moreover, older respondents and individuals with lower levels of education may have less facility with tablets than with paper surveys. This flaw is not dispositive, however, as programmers could work to make tablet-based surveys more user-friendly.
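One way tablets support standardization is that every site can load the same instrument definition and code answers at the moment of entry. The sketch below is a hypothetical illustration, not the software used in Fisher’s work: the `Question` class, the `INSTRUMENT` tuple, the question identifiers (loosely paraphrasing Table 2 items), and `code_response` are all assumptions made for this example.

```python
from dataclasses import dataclass

# Five-point response options, listed in coding order (1 through 5).
LIKERT_OPTIONS = (
    "strongly disagree",
    "somewhat disagree",
    "neither agree nor disagree",
    "somewhat agree",
    "strongly agree",
)

@dataclass(frozen=True)
class Question:
    qid: str                  # stable identifier shared across all field sites
    options: tuple[str, ...]  # allowed answer labels; position + 1 is the code

# Hypothetical shared instrument: every tablet loads this same definition.
INSTRUMENT = (
    Question("elections_accountability", LIKERT_OPTIONS),
    Question("civility_important", LIKERT_OPTIONS),
)

def code_response(site_id, answers, instrument=INSTRUMENT):
    """Turn one respondent's raw answers into a coded row tagged with the site.

    Skipped questions become None. An unrecognized label raises ValueError,
    so entry problems surface in the field rather than during later cleaning.
    """
    row = {"site_id": site_id}
    for q in instrument:
        label = answers.get(q.qid)
        if label is None:
            row[q.qid] = None
        elif label in q.options:
            row[q.qid] = q.options.index(label) + 1
        else:
            raise ValueError(f"{q.qid}: unexpected answer {label!r}")
    return row

print(code_response("DC-01", {"elections_accountability": "somewhat agree"}))
# {'site_id': 'DC-01', 'elections_accountability': 4, 'civility_important': None}
```

Because coding happens against one shared definition, rows from different cities can be pooled without a separate reconciliation step, which is the data-entry burden the paper-based approach faces.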

Participating in protests and demonstrations is an important form of political participation throughout the world. If scholars are to understand the meaning of these events for politics, greater collective effort is needed to scale up and standardize the use of crowd surveys. Just as election studies have been centralized around national efforts, such as the American National Election Study, routine crowd surveys would prove more instructive if produced through centralized collaboration. Some previous studies, such as the Caught in the Act of Protest project, have shown the feasibility of this type of coordination, but more substantial efforts are needed in light of the vastly expanding protests of the current era.

CONCLUSION

Protest event data and crowd surveys represent central efforts to answer fundamental questions about protests and social movements. This research benefits not only scholars but also the wider public in many ways. First, these methods, especially when compared with data on the characteristics and attitudes of nonparticipants, allow us to address important descriptive questions: Who protests, and what are their major motivations or goals? After all, we know that most people do not attend protests regularly, if at all. As such, it is important to understand the characteristics of people who do demonstrate, as well as when, where, and how much protest has occurred. Answering these questions is important for developing an accurate picture of protest and how it varies over time and by place. Second, these methods, when coupled with appropriate comparative analysis, allow us to answer broader theoretical questions. For example, what effects do these protests have on political outcomes, such as voting? In addition, what is the relationship between protests and other forms of engagement, as well as specific political phenomena, such as political parties, legislation, and policy decisions?

Event counting and crowd surveys are useful because they each yield rich, large-scale evidence regarding the factors that bear directly on these questions. We learn about protesters’ various identity characteristics (e.g., gender, race, ethnicity, age, education, political ideology, and values), the geographic locations and scope of the protests, the issue areas of concern, the organizations and movements orchestrating the demonstrations, how these protests connect to other political institutions, and other relevant details. By combining crowd survey and protest event data with other kinds of data, we can develop more powerful historical, political, and sociological analyses of social and political change. This work is foundational to creating a science of protest that is comparable to the science of public opinion.

While the research tradition on studying street protests presents significant benefits, it also faces notable limitations that must be kept in mind moving forward. For example, both event counting and crowd surveys face the challenges of limiting bias and ensuring that their data are representative of the protest events taking place and the protest participants involved, since the nature of these events makes the characteristics of participants difficult to observe. Moreover, they provide little insight into the internal structures and dynamics of advocacy organizations or the scope of online activism [for example, see (100)], both of which are essential to supporting the street protest that is the focus of this paper. Studies relying on event counting and crowd surveys would benefit from supplementing their analysis with elite interviews, ethnography, and other methods that provide a more intricate, although perhaps less generalizable, portrait of the state of protest and activism.

In the middle of a period of heightened protest, collecting and analyzing high-quality data on protest and making it publicly available have special significance. The first two and a half years of the Trump presidency witnessed a surge of social mobilization, with most of it focused on challenging President Trump, his statements, and his administration’s policies (101). Millions of Americans took to the streets, the capitols, the sidewalks, and the parks, often to express dissent with political elites and support for an alternative vision of politics in America. As such, this moment provides an opportunity to understand the groundswell of civic participation and activism, as well as its public impact, while simultaneously offering scholars the chance to further hone and deepen the tools available for such research, especially event counting and protest surveys. These research innovations are needed to advance our knowledge of protest and social change. Whether we see continued escalation of protest or demobilization in the coming years, rigorous and ongoing research on this wave of contention will be central to understanding protest mobilization and its broader consequences for generations to come.

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.

REFERENCES AND NOTES

Acknowledgments: Funding: This collaboration was supported, in part, by grant no. 19010090 from the Ford Foundation Civic Engagement and Government Program (www.fordfoundation.org/work/challenging-inequality/civic-engagement-and-government/). The CCC was supported by Humility & Conviction in Public Life, a project of the University of Connecticut’s Humanities Institute, and the Carnegie Corporation of New York, through the Sié Chéou-Kang Center for International Security and Diplomacy at the Josef Korbel School of International Studies at the University of Denver. D.R.F.’s research was supported, in part, by funding from the Launch Program at the College of Behavioral and Social Sciences of the University of Maryland. M.T.H.’s research was supported by the University of Michigan (especially the Institute for Research on Women and Gender, the National Center for Institutional Diversity, the Undergraduate Research Opportunity Program, and the Organizational Studies Program) and the National Institute for Civil Discourse. Author contributions: D.R.F. took the lead in organizing this collaboration and securing the funding to initiate the collaboration. All other authors contributed equally to the research, writing, and revision of this article. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the materials cited here. Additional data related to this paper may be requested from the authors.