Table 1. Mean and SD of monthly macro-averaged F1 scores.

Country | Platform | Task 1: Within-month train/test* | Task 2: Train on t − 1, test on t | Task 3: Train on t − 1, test on new users in t | Task 4: Within-month cross- | Task 5: Within-month cross-

*Training data are all tweets from a 50% random sample of troll users combined with independent random samples from each of our two control groups. Test data use all tweets by the other 50% of troll users and a stratified random sample of 50% of tweets by nontroll users.
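The user-level split described in this note can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_troll_users`, the `(user_id, text)` tuple layout, and the seed are assumptions made for the example, and the sampling of control-group tweets is left out because its details are not given here.

```python
import random

def split_troll_users(troll_tweets, seed=0):
    """Split troll tweets into train/test by user, not by tweet.

    troll_tweets: list of (user_id, text) tuples (hypothetical layout).
    Returns (train, test), where each user's tweets fall entirely on one side.
    """
    rng = random.Random(seed)
    # Sort first so the shuffle is reproducible for a given seed.
    users = sorted({user_id for user_id, _ in troll_tweets})
    rng.shuffle(users)
    # Half of the users (rounded down) go to training; the rest go to test.
    train_users = set(users[: len(users) // 2])
    train = [t for t in troll_tweets if t[0] in train_users]
    test = [t for t in troll_tweets if t[0] not in train_users]
    return train, test

tweets = [("a", "x1"), ("a", "x2"), ("b", "y1"), ("c", "z1"), ("d", "w1")]
train, test = split_troll_users(tweets)
```

Splitting on users rather than tweets keeps any single account from appearing in both sets, so a classifier cannot score well merely by recognizing an account it has already seen.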

†Because this test includes the same troll accounts in both train and test sets, we exclude features related to account creation date.

‡We calculate mean and SD in F1 over months in which there are at least 1000 troll tweets or 500 troll Reddit posts in the test month.
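As a rough sketch of the summary statistic in this note, the code below computes a macro-averaged F1 per month and then the mean and SD over months that meet the volume threshold. The names `macro_f1` and `summarize_months`, the dictionary keys, and the use of the sample SD (rather than the population SD) are assumptions for illustration; the source does not specify them.

```python
from statistics import mean, stdev

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return mean(scores)

def summarize_months(months, min_troll_items=1000):
    """Mean and sample SD of macro F1 over months with enough troll items.

    months: list of dicts with keys 'n_troll', 'y_true', 'y_pred'
    (a hypothetical layout). Use min_troll_items=500 for Reddit posts.
    """
    kept = [macro_f1(m["y_true"], m["y_pred"])
            for m in months if m["n_troll"] >= min_troll_items]
    if not kept:
        return None, None
    sd = stdev(kept) if len(kept) > 1 else 0.0
    return mean(kept), sd

months = [
    {"n_troll": 1500, "y_true": [1, 1, 0, 0], "y_pred": [1, 1, 0, 0]},
    {"n_troll": 2000, "y_true": [1, 1, 0, 0], "y_pred": [1, 0, 0, 0]},
    {"n_troll": 10,   "y_true": [1, 0],       "y_pred": [0, 1]},  # below threshold
]
avg_f1, sd_f1 = summarize_months(months)
```

Macro averaging weights the troll and nontroll classes equally regardless of how many items each contains, which matters when one class is much rarer than the other.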

§Not applicable. There was only one official data release for the Chinese campaign on Twitter and the Russian campaign on Reddit as of 1 December 2019.

║Not applicable. Cross-platform data are available only for the Russian campaign.