Monthly mean of macro-averaged F1 scores for detecting Russian troll tweets with varying predictor sets. User-timing features were removed for task 3. Task 5 is excluded because it is based on a reduced set of features and is therefore not comparable to the other tasks.
| Experiment | Model (1): only content | Model (2): (1) + meta-content | Model (3): (2) + content timing | Model (4): (3) + user timing | Model (5): (4) + network features |
|---|---|---|---|---|---|
| Within-month train/test (task 1) | 0.76 | 0.81 | 0.82 | 0.85 | 0.84 |
| Train on t − 1, test on t (task 2) | 0.74 | 0.82 | 0.82 | NA | 0.85 |
| Train on t − 1, test on new users in t (task 3) | 0.66 | 0.75 | 0.75 | 0.81 | 0.82 |
| Within-month cross-release (task 4) | 0.66 | 0.70 | 0.70 | 0.74 | 0.75 |
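The metric reported in the table, a monthly mean of macro-averaged F1, can be made concrete with a minimal sketch. It assumes that each month has its own test set with binary labels (1 = troll, 0 = control). The function names, month keys, and toy labels are hypothetical illustrations, not the study's code or data. Macro-averaging takes the unweighted mean of the per-class F1 scores, and the monthly mean then averages those scores across months.

```python
from statistics import mean


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return mean(scores)


def monthly_mean_macro_f1(results):
    """results: hypothetical dict mapping month -> (y_true, y_pred) for that month's test set."""
    return mean(macro_f1(y_true, y_pred) for y_true, y_pred in results.values())


# Toy example with made-up labels (1 = troll, 0 = control)
results = {
    "month_a": ([1, 1, 0, 0], [1, 0, 0, 0]),  # macro F1 = 11/15
    "month_b": ([1, 0, 1, 0], [1, 0, 1, 0]),  # macro F1 = 1.0
}
print(monthly_mean_macro_f1(results))  # mean of the two monthly scores = 13/15
```

Because macro-averaging weights each class equally, a model cannot score well simply by predicting the majority class. In practice, scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same per-month quantity.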