The CooRTweet package is an R tool for detecting and
analyzing coordinated behavior across social media platforms. Named
after Twitter, a quintessential site for coordinated message
amplification through its features like hashtags and trending topics,
CooRTweet is applicable to any social media platform, enabling analysis
on mono-platform, multi-platform, and cross-platform datasets. Besides
being platform-independent, it is also content-independent, supporting a
wide range of content types (including hashtags, URLs, images, and any
other objects of interest to the researcher). The package allows for
flexible thresholds to identify coordination while also accounting for
the uncoordinated network in which the coordination is contextualized.
CooRTweet is one of the first software tools for coordination detection
to have undergone rigorous validation. With its output, researchers can
effectively explore networks of coordinated activity.
You can install the CooRTweet package from CRAN or GitHub:
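A minimal sketch of the CRAN route, using only the standard `install.packages()` mechanism. The GitHub repository path is not given in this text, so the development-version install is left out rather than guessed.

```r
# Install the released version from CRAN only if it is not already available.
if (!requireNamespace("CooRTweet", quietly = TRUE)) {
  install.packages("CooRTweet")
}
```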
The package produces igraph objects for further analysis and visualization, and provides the simulate_data function to generate synthetic coordinated networks.

The input dataset should include the following columns:
object_id: A unique identifier for the shared content (data type: character).
account_id: The user account identifier (data type: character).
content_id: The unique ID of the post (data type: character).
timestamp_share: The timestamp when the content was shared (data type: integer, UNIX format).

Example:
library(CooRTweet)
head(russian_coord_tweets)
#> object_id account_id
#> 1 85d2d12251a735ce05255061f7f231e2 0fb4232d1b7b37069c13ee17579bd10e
#> 2 89864519cd34cabd6f5a801b6857fea6 7d0e462c10d52c4ec1db5af953bf9b26
#> 3 6f04c951961f5cf8c05df6f284fc7c17 2e1140330c02584ffabaf1da362f8e10
#> 4 025b1b3dc82df1cc6c6b766e9c651251 38b5eea36ec86e4a9aac60980ebf6526
#> 5 025b1b3dc82df1cc6c6b766e9c651251 536783d1fd886a85ab697f299f153d3b
#> 6 280d86b602da34926d3797b94d0a7e15 285ddd6b5c9b35ea56257071d4cb6b4d
#> content_id timestamp_share
#> 1 114e22f1b2b648528277b76f0b0224e7 1623881091
#> 2 60e1901aa670f6c05a2d7f7e74cadaf3 1623879483
#> 3 85d2d12251a735ce05255061f7f231e2 1623865591
#> 4 321677c8cae729398e86c5c044b3fbba 1623864621
#> 5 707b5dc150f7405711f992341d0dd32f 1623864188
#> 6 c2592fc64962853976bb75d183a4301b 1623656233
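To make the column requirements concrete, here is a small sketch that builds a toy input table in base R and checks it against the four required columns. The data and the `posts` name are hypothetical and exist only for illustration; the check is not part of the package.

```r
# Hypothetical toy data; all identifiers below are made up for illustration.
posts <- data.frame(
  object_id       = c("url_a", "url_a", "url_b"),
  account_id      = c("acc_1", "acc_2", "acc_1"),
  content_id      = c("post_1", "post_2", "post_3"),
  timestamp_share = as.integer(c(1623881091, 1623881120, 1623865591)),
  stringsAsFactors = FALSE
)

# The four columns named in the text above.
required <- c("object_id", "account_id", "content_id", "timestamp_share")

# Every required column must be present.
missing_cols <- setdiff(required, names(posts))
stopifnot(length(missing_cols) == 0)

# The three identifier columns should be character; the timestamp numeric (UNIX seconds).
id_cols_ok <- all(vapply(posts[required[1:3]], is.character, logical(1)))
ts_ok <- is.numeric(posts$timestamp_share)
```

Running a check like this before detection catches mistyped column names and factor or integer identifiers early.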
Use the detect_groups() function to find groups of accounts coordinating within a specified time window.
result <- detect_groups(
x = russian_coord_tweets,
min_participation = 2,
time_window = 60
)
head(result)
#> object_id content_id
#> <char> <char>
#> 1: 4e04165c28ea7dd3cf4c8512c3f490d7 2944a8d714bba22c120aad58fca851d8
#> 2: 4e04165c28ea7dd3cf4c8512c3f490d7 7b92b077328ca532e5d1c4781484ce35
#> 3: 4e04165c28ea7dd3cf4c8512c3f490d7 7b92b077328ca532e5d1c4781484ce35
#> 4: 4e04165c28ea7dd3cf4c8512c3f490d7 e9a4787f1d970dbe4c3868a783ba1535
#> 5: 4e04165c28ea7dd3cf4c8512c3f490d7 1283205f0e23fda216c4de27bec4df80
#> 6: bc46f3cae46cd00726c4c1992145ae20 54cecb686722d539dd556a9c6e8d72e0
#> content_id_y time_delta account_id
#> <char> <num> <char>
#> 1: 354126c2d9e2e69676c9dbdcc167d3d7 36 d9fe8e4d34b6dcfba8cbcf1be4b28717
#> 2: 354126c2d9e2e69676c9dbdcc167d3d7 59 94b2413eb4e850246c07ba1bd55625c2
#> 3: 2944a8d714bba22c120aad58fca851d8 23 94b2413eb4e850246c07ba1bd55625c2
#> 4: c5b0dcb930f979202600a59bf64db452 48 7289281c087ccc0342d96604243d0069
#> 5: e9a4787f1d970dbe4c3868a783ba1535 41 6c051ab25467ae690fb24cd2c6c3ad99
#> 6: cf2b35e31413d2da92940bc571c2c6a2 5 f442b084eb6be4c7f66dffa386c01e4b
#> account_id_y
#> <char>
#> 1: 0fb4232d1b7b37069c13ee17579bd10e
#> 2: 0fb4232d1b7b37069c13ee17579bd10e
#> 3: d9fe8e4d34b6dcfba8cbcf1be4b28717
#> 4: 47a750359d66201ddefe7f7efbfed0b9
#> 5: 7289281c087ccc0342d96604243d0069
#> 6: 3dced626839250a9c9bf41a381234214
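The idea behind the output above, pairs of accounts that shared the same object within the time window, can be sketched in base R. This is a conceptual illustration on made-up data, not the package's implementation; it assumes `time_window` is measured in seconds and ignores `min_participation`.

```r
# Hypothetical toy posts: three accounts share url_a, two share url_b.
posts <- data.frame(
  object_id       = c("url_a", "url_a", "url_a", "url_b", "url_b"),
  account_id      = c("acc_1", "acc_2", "acc_3", "acc_1", "acc_2"),
  content_id      = c("p1", "p2", "p3", "p4", "p5"),
  timestamp_share = c(100, 130, 500, 1000, 1020),
  stringsAsFactors = FALSE
)
time_window <- 60  # assumed to be in seconds

# Self-join on object_id: every post paired with every post of the same object.
pairs <- merge(posts, posts, by = "object_id", suffixes = c("", "_y"))

# Drop pairs within the same account and keep each unordered pair once.
pairs <- pairs[pairs$account_id != pairs$account_id_y, ]
pairs <- pairs[pairs$content_id > pairs$content_id_y, ]

# Keep only pairs shared within the time window.
pairs$time_delta <- abs(pairs$timestamp_share - pairs$timestamp_share_y)
pairs <- pairs[pairs$time_delta <= time_window, ]
```

In this toy data, acc_3 shares url_a long after the others, so it forms no pair; only the acc_1 and acc_2 shares survive the filter.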
Convert detected groups into a coordination network using generate_coordinated_network().
graph <- generate_coordinated_network(
result,
edge_weight = 0.5
)
graph
#> IGRAPH 889e6ff UNW- 2110 3721 --
#> + attr: name (v/c), weight (e/n), avg_time_delta (e/n), n_content_id
#> | (e/n), n_content_id_y (e/n), edge_symmetry_score (e/n),
#> | weight_threshold (e/n)
#> + edges from 889e6ff (vertex names):
#> [1] 0fb4232d1b7b37069c13ee17579bd10e--d9fe8e4d34b6dcfba8cbcf1be4b28717
#> [2] 0fb4232d1b7b37069c13ee17579bd10e--94b2413eb4e850246c07ba1bd55625c2
#> [3] 94b2413eb4e850246c07ba1bd55625c2--d9fe8e4d34b6dcfba8cbcf1be4b28717
#> [4] 47a750359d66201ddefe7f7efbfed0b9--7289281c087ccc0342d96604243d0069
#> [5] 6c051ab25467ae690fb24cd2c6c3ad99--7289281c087ccc0342d96604243d0069
#> [6] 3dced626839250a9c9bf41a381234214--f442b084eb6be4c7f66dffa386c01e4b
#> + ... omitted several edges
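The edge attributes `weight` and `weight_threshold` in the output suggest that edges are weighted by how often two accounts co-share and then compared against a threshold. The sketch below illustrates that idea in base R under two assumptions that are not confirmed by this text: that `edge_weight = 0.5` is read as a quantile of the edge weight distribution, and that edges strictly above the threshold are flagged. The data is hypothetical.

```r
# Hypothetical coordinated pairs: one row per co-share between two accounts.
pairs <- data.frame(
  account_id   = c("acc_2", "acc_2", "acc_2", "acc_3", "acc_4"),
  account_id_y = c("acc_1", "acc_1", "acc_1", "acc_1", "acc_3"),
  stringsAsFactors = FALSE
)

# Edge weight = number of co-shares per account pair.
edges <- aggregate(n ~ account_id + account_id_y,
                   data = cbind(pairs, n = 1L), FUN = sum)
names(edges)[names(edges) == "n"] <- "weight"

# Assumption: the threshold is the 0.5 quantile (median) of the edge weights.
threshold <- unname(quantile(edges$weight, probs = 0.5))

# Assumption: an edge is flagged when its weight is strictly above the threshold.
edges$above_threshold <- edges$weight > threshold
```

Here the acc_1 and acc_2 pair co-shares three times while the other pairs co-share once, so only that edge exceeds the median.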
To analyze multiple types of content (e.g., URLs, hashtags), run detect_groups() separately for each type and combine the results.
We provide an anonymized sample from the authentic dataset by Righetti et al. (2022) that showcases coordinated behavior during the German federal elections in 2021.
# Example datasets for different content types
head(german_elections)
#> # A data frame: 6 × 7
#> account_id post_id url_id hashtag_id domain_id phash_id timestamp
#> * <chr> <int> <int> <int> <int> <int> <dbl>
#> 1 fb_12670 129235 23678 NA 3498 NA 1629589836
#> 2 fb_5966 84441 NA NA 6756 NA 1629589069
#> 3 fb_5966 84443 29871 NA 5534 NA 1629589050
#> 4 fb_5966 84445 NA NA NA 9280 1629589022
#> 5 fb_5966 84446 30435 NA 5639 NA 1629589009
#> 6 fb_9045 104337 13609 NA 5804 NA 1629588823
First we prepare shared URLs:
# URLs
urls_data <- prep_data(german_elections,
object_id = "url_id",
account_id = "account_id",
content_id = "post_id",
timestamp_share = "timestamp")
urls_data <- unique(urls_data,
by = c("object_id", "account_id", "content_id", "timestamp_share"))
urls_data <- urls_data[!is.na(object_id)]
urls_data$object_id <- paste0("url_", urls_data$object_id)
Next, we prepare images. We used the pHash algorithm to uniquely identify images. The algorithm is implemented in the OpenImageR package (Mouselimis 2023).
# images (pHash)
img_data <- prep_data(german_elections,
object_id = "phash_id",
account_id = "account_id",
content_id = "post_id",
timestamp_share = "timestamp")
img_data <- unique(img_data,
by = c("object_id", "account_id", "content_id", "timestamp_share"))
img_data <- img_data[!is.na(object_id)]
img_data$object_id <- paste0("hash_", img_data$object_id)
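Both preparation steps prepend a type-specific prefix (`url_`, `hash_`) to `object_id`. A plausible reason, stated here as an assumption, is that the integer IDs of different content types can overlap, so an unprefixed URL ID and image ID with the same number would be treated as the same object once the results are combined. The toy IDs below are hypothetical.

```r
# Hypothetical integer IDs: 101 appears both as a URL ID and as an image hash ID.
url_ids <- c(101, 102)
img_ids <- c(101, 205)

# Without prefixes, the two content types collide on 101.
collisions_raw <- intersect(url_ids, img_ids)

# With prefixes, the IDs become distinct character strings.
url_obj <- paste0("url_", url_ids)
img_obj <- paste0("hash_", img_ids)
collisions_prefixed <- intersect(url_obj, img_obj)
```

After prefixing, `collisions_prefixed` is empty, so the two subsets can be stacked without merging unrelated objects.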
Next, we perform the first step of coordination detection on each subset of the data with the detect_groups function:
# Detect coordinated groups for URLs and images --------------------
result_urls <- detect_groups(urls_data, time_window = 30,
min_participation = 2)
result_images <- detect_groups(img_data, time_window = 30,
min_participation = 2)
Then we can simply stack both resulting data.tables:
# Combine results --------------------
library(data.table)
combined_results <- rbindlist(
list(result_urls, result_images),
use.names = TRUE,
fill = TRUE
)
Now we can let the network analysis run with the default settings to find accounts that show coordinated behavior in terms of image and URL sharing:
# Generate the coordinated multi-modal network --------------------
graph <- generate_coordinated_network(combined_results, edge_weight = 0.5)
graph
#> IGRAPH d99b317 UNW- 671 1732 --
#> + attr: name (v/c), weight (e/n), avg_time_delta (e/n), n_content_id
#> | (e/n), n_content_id_y (e/n), edge_symmetry_score (e/n),
#> | weight_threshold (e/n)
#> + edges from d99b317 (vertex names):
#> [1] fb_5761 --fb_7103 fb_12065--fb_4039 fb_11199--fb_3297 fb_11202--fb_18649
#> [5] fb_2258 --fb_4039 fb_21069--fb_9754 fb_11202--fb_21069 fb_11202--fb_4039
#> [9] fb_11202--fb_9754 fb_21069--fb_18649 fb_18649--fb_4039 fb_14401--fb_8707
#> [13] fb_18649--fb_9754 fb_4039 --fb_7548 fb_12670--fb_17402 fb_4039 --fb_8027
#> [17] fb_17402--fb_8027 fb_4039 --fb_8900 fb_8027 --fb_8900 fb_17402--fb_8900
#> [21] fb_17326--fb_7548 fb_17326--fb_3732 fb_3732 --fb_7548 fb_17326--fb_20736
#> + ... omitted several edges
The CooRTweet package includes several additional functions and features that enable refined exploration of coordinated networks, as detailed in the package documentation.
The CooRTweet package enables researchers to study
coordinated behaviors with a high degree of flexibility and precision.
Its generalized architecture makes it adaptable to various contexts and
datasets, empowering social media research and analysis.