Introduction to CooRTweet

Overview

The CooRTweet package is an R tool for detecting and analyzing coordinated behavior across social media platforms. Although it is named after Twitter, a quintessential site for coordinated message amplification through features such as hashtags and trending topics, CooRTweet can be applied to any social media platform and supports mono-platform, multi-platform, and cross-platform datasets. It is also content-independent: it supports a wide range of content types, including hashtags, URLs, images, and any other object of interest to the researcher. The package offers flexible thresholds for identifying coordination while accounting for the uncoordinated network in which the coordination is embedded. CooRTweet is one of the first software tools for coordination detection to have undergone rigorous validation. Researchers can use its output to explore networks of coordinated activity.

Installation

You can install the CooRTweet package from CRAN or GitHub:

# Install from CRAN
install.packages("CooRTweet")

# Or install the development version from GitHub
devtools::install_github("username/CooRTweet") # Replace with actual GitHub repository

Key Features

Flexible Data Handling: Works with mono-modal, multi-modal, and cross-platform datasets, and any type of object.
Customizable Thresholds: Set time intervals and repetition/edge-weight thresholds to detect coordinated activities.
Graph-Based Analysis: Outputs coordination networks as igraph objects for further analysis and visualization.
Included Data: Comes with example datasets for learning (Righetti et al. 2022) and a simulate_data function for generating synthetic coordinated networks.

Getting Started

Input Data Format

The input dataset should include the following columns:

object_id: A unique identifier for the shared content (data type: character).
account_id: The user account identifier (data type: character).
content_id: The unique ID of the post (data type: character).
timestamp_share: The timestamp when the content was shared (data type: integer, UNIX time in seconds).

Example:

library(CooRTweet)
head(russian_coord_tweets)
#>                          object_id                       account_id
#> 1 85d2d12251a735ce05255061f7f231e2 0fb4232d1b7b37069c13ee17579bd10e
#> 2 89864519cd34cabd6f5a801b6857fea6 7d0e462c10d52c4ec1db5af953bf9b26
#> 3 6f04c951961f5cf8c05df6f284fc7c17 2e1140330c02584ffabaf1da362f8e10
#> 4 025b1b3dc82df1cc6c6b766e9c651251 38b5eea36ec86e4a9aac60980ebf6526
#> 5 025b1b3dc82df1cc6c6b766e9c651251 536783d1fd886a85ab697f299f153d3b
#> 6 280d86b602da34926d3797b94d0a7e15 285ddd6b5c9b35ea56257071d4cb6b4d
#>                         content_id timestamp_share
#> 1 114e22f1b2b648528277b76f0b0224e7      1623881091
#> 2 60e1901aa670f6c05a2d7f7e74cadaf3      1623879483
#> 3 85d2d12251a735ce05255061f7f231e2      1623865591
#> 4 321677c8cae729398e86c5c044b3fbba      1623864621
#> 5 707b5dc150f7405711f992341d0dd32f      1623864188
#> 6 c2592fc64962853976bb75d183a4301b      1623656233
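
If your raw export uses other column names or stores times as date-time strings, it needs reshaping into the four columns listed above before detection. The following is a minimal base-R sketch of that step. The raw column names (user, post, url, posted_at) and the toy values are hypothetical and serve only to illustrate the target format.

```r
# Hypothetical raw export; column names and values are made up for illustration.
raw <- data.frame(
  user      = c("acc_1", "acc_2"),
  post      = c(101, 102),
  url       = c("https://example.org/a", "https://example.org/a"),
  posted_at = c("2021-06-16 22:04:51", "2021-06-16 22:05:20"),
  stringsAsFactors = FALSE
)

# Rename to the four required columns and coerce to the expected types.
input <- data.frame(
  object_id       = as.character(raw$url),
  account_id      = as.character(raw$user),
  content_id      = as.character(raw$post),
  # Parse as UTC, then convert to whole seconds since 1970-01-01.
  timestamp_share = as.integer(as.POSIXct(raw$posted_at, tz = "UTC")),
  stringsAsFactors = FALSE
)

str(input)
```

Parsing with an explicit time zone avoids shifts that depend on the machine's locale. The package's own prep_data() function, used later in this document, handles the column renaming for you.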

Detect Coordinated Groups

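Use the detect_groups() function to find groups of accounts that share the same object within a specified time window. Here, time_window is given in seconds, and min_participation sets the minimum number of actions an account must have to be included in the analysis.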

result <- detect_groups(
  x = russian_coord_tweets,
  min_participation = 2,
  time_window = 60
)
head(result)
#>                           object_id                       content_id
#>                              <char>                           <char>
#> 1: 4e04165c28ea7dd3cf4c8512c3f490d7 2944a8d714bba22c120aad58fca851d8
#> 2: 4e04165c28ea7dd3cf4c8512c3f490d7 7b92b077328ca532e5d1c4781484ce35
#> 3: 4e04165c28ea7dd3cf4c8512c3f490d7 7b92b077328ca532e5d1c4781484ce35
#> 4: 4e04165c28ea7dd3cf4c8512c3f490d7 e9a4787f1d970dbe4c3868a783ba1535
#> 5: 4e04165c28ea7dd3cf4c8512c3f490d7 1283205f0e23fda216c4de27bec4df80
#> 6: bc46f3cae46cd00726c4c1992145ae20 54cecb686722d539dd556a9c6e8d72e0
#>                        content_id_y time_delta                       account_id
#>                              <char>      <num>                           <char>
#> 1: 354126c2d9e2e69676c9dbdcc167d3d7         36 d9fe8e4d34b6dcfba8cbcf1be4b28717
#> 2: 354126c2d9e2e69676c9dbdcc167d3d7         59 94b2413eb4e850246c07ba1bd55625c2
#> 3: 2944a8d714bba22c120aad58fca851d8         23 94b2413eb4e850246c07ba1bd55625c2
#> 4: c5b0dcb930f979202600a59bf64db452         48 7289281c087ccc0342d96604243d0069
#> 5: e9a4787f1d970dbe4c3868a783ba1535         41 6c051ab25467ae690fb24cd2c6c3ad99
#> 6: cf2b35e31413d2da92940bc571c2c6a2          5 f442b084eb6be4c7f66dffa386c01e4b
#>                        account_id_y
#>                              <char>
#> 1: 0fb4232d1b7b37069c13ee17579bd10e
#> 2: 0fb4232d1b7b37069c13ee17579bd10e
#> 3: d9fe8e4d34b6dcfba8cbcf1be4b28717
#> 4: 47a750359d66201ddefe7f7efbfed0b9
#> 5: 7289281c087ccc0342d96604243d0069
#> 6: 3dced626839250a9c9bf41a381234214
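
Each row of the output pairs two posts (content_id, content_id_y) by two accounts that shared the same object, together with the time between them (time_delta). To build intuition for what such a row means, here is a simplified base-R sketch of pairwise matching within a time window. It is an illustration on made-up data, not the package's implementation, and it ignores min_participation.

```r
# Toy data: three accounts share object "a"; only two do so within 60 seconds.
toy <- data.frame(
  object_id       = c("a", "a", "a", "b"),
  account_id      = c("u1", "u2", "u3", "u1"),
  content_id      = c("p1", "p2", "p3", "p4"),
  timestamp_share = c(100L, 130L, 500L, 900L),
  stringsAsFactors = FALSE
)

time_window <- 60  # seconds

# Self-join on object_id to enumerate every pair of posts sharing an object.
pairs <- merge(toy, toy, by = "object_id", suffixes = c("", "_y"))

# Keep each unordered pair once and drop pairs from the same account.
pairs <- pairs[pairs$content_id < pairs$content_id_y &
                 pairs$account_id != pairs$account_id_y, ]

# Time between the two shares; keep pairs that fall inside the window.
pairs$time_delta <- abs(pairs$timestamp_share - pairs$timestamp_share_y)
co_shares <- pairs[pairs$time_delta <= time_window, ]

co_shares[, c("object_id", "account_id", "account_id_y", "time_delta")]
```

Only the u1/u2 pair survives, because the third share of object "a" arrives several minutes later. A full self-join like this grows quadratically with the number of shares per object, so it is suitable for illustration only.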

Generate Coordination Networks

Convert detected groups into a coordination network using generate_coordinated_network().

graph <- generate_coordinated_network(
  result,
  edge_weight = 0.5
)
graph
#> IGRAPH 889e6ff UNW- 2110 3721 -- 
#> + attr: name (v/c), weight (e/n), avg_time_delta (e/n), n_content_id
#> | (e/n), n_content_id_y (e/n), edge_symmetry_score (e/n),
#> | weight_threshold (e/n)
#> + edges from 889e6ff (vertex names):
#> [1] 0fb4232d1b7b37069c13ee17579bd10e--d9fe8e4d34b6dcfba8cbcf1be4b28717
#> [2] 0fb4232d1b7b37069c13ee17579bd10e--94b2413eb4e850246c07ba1bd55625c2
#> [3] 94b2413eb4e850246c07ba1bd55625c2--d9fe8e4d34b6dcfba8cbcf1be4b28717
#> [4] 47a750359d66201ddefe7f7efbfed0b9--7289281c087ccc0342d96604243d0069
#> [5] 6c051ab25467ae690fb24cd2c6c3ad99--7289281c087ccc0342d96604243d0069
#> [6] 3dced626839250a9c9bf41a381234214--f442b084eb6be4c7f66dffa386c01e4b
#> + ... omitted several edges
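
The weight_threshold edge attribute in the output above relates to the edge_weight argument. Assuming, as we read the package documentation, that edge_weight is interpreted as a quantile of the edge-weight distribution (so 0.5 corresponds to the median), the flagging logic can be illustrated in base R on made-up weights. Whether the package applies a strict or non-strict comparison is not confirmed here; the sketch uses a strict one.

```r
# Made-up edge weights (e.g., number of co-shares per account pair).
weights <- c(1, 1, 2, 3, 5, 8)

# Assumption: edge_weight is a quantile of the weight distribution.
edge_weight <- 0.5
threshold <- unname(quantile(weights, probs = edge_weight))

# Flag edges whose weight exceeds the threshold (1 = above, 0 = not above).
weight_threshold <- as.integer(weights > threshold)

data.frame(weight = weights, weight_threshold = weight_threshold)
```

Because the threshold is relative to the observed distribution, the same edge_weight value selects different absolute weights in different datasets.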

Advanced Usage

Multi-Modal and Multi-Platform Analysis

To analyze multiple types of content (e.g., URLs, hashtags), run detect_groups() separately for each type and combine the results.

We provide an anonymized sample from the authentic dataset by Righetti et al. (2022) that showcases coordinated behavior during the German federal elections in 2021.

# Example datasets for different content types
head(german_elections)
#> # A data frame: 6 × 7
#>   account_id post_id url_id hashtag_id domain_id phash_id  timestamp
#> * <chr>        <int>  <int>      <int>     <int>    <int>      <dbl>
#> 1 fb_12670    129235  23678         NA      3498       NA 1629589836
#> 2 fb_5966      84441     NA         NA      6756       NA 1629589069
#> 3 fb_5966      84443  29871         NA      5534       NA 1629589050
#> 4 fb_5966      84445     NA         NA        NA     9280 1629589022
#> 5 fb_5966      84446  30435         NA      5639       NA 1629589009
#> 6 fb_9045     104337  13609         NA      5804       NA 1629588823

First, we prepare the shared URLs:

# URLs
urls_data <- prep_data(german_elections,
                       object_id = "url_id",
                       account_id = "account_id",
                       content_id = "post_id",
                       timestamp_share = "timestamp")

urls_data <- unique(urls_data,
                    by = c("object_id", "account_id", "content_id", "timestamp_share"))

urls_data <- urls_data[!is.na(object_id)]

urls_data$object_id <- paste0("url_", urls_data$object_id)

Next, we prepare images. We used the pHash algorithm to uniquely identify images. The algorithm is implemented in the OpenImageR package (Mouselimis 2023).

# images (pHash)
img_data <- prep_data(german_elections,
                      object_id = "phash_id",
                      account_id = "account_id",
                      content_id = "post_id",
                      timestamp_share = "timestamp")

img_data <- unique(img_data,
                   by = c("object_id", "account_id", "content_id", "timestamp_share"))

img_data <- img_data[!is.na(object_id)]

img_data$object_id <- paste0("hash_", img_data$object_id)
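
The "url_" and "hash_" prefixes matter once the two subsets are stacked. The IDs in german_elections are integers per content type, so presumably a URL and an image could carry the same number and would then be treated as one object. The base-R sketch below shows this collision on made-up values; the data frames are hypothetical stand-ins, not package data.

```r
# Hypothetical stand-ins: URL 1 and image 1 are unrelated objects.
urls <- data.frame(object_id = c(1L, 2L), account_id = c("fb_1", "fb_2"))
imgs <- data.frame(object_id = c(1L, 3L), account_id = c("fb_3", "fb_4"))

# Without prefixes, stacking merges URL 1 and image 1 into one object ID.
stacked_raw <- rbind(urls, imgs)
collides_without_prefix <- anyDuplicated(stacked_raw$object_id) > 0

# With prefixes, IDs stay unique across content types.
urls$object_id <- paste0("url_", urls$object_id)
imgs$object_id <- paste0("hash_", imgs$object_id)
stacked <- rbind(urls, imgs)
collides_with_prefix <- anyDuplicated(stacked$object_id) > 0

c(without_prefix = collides_without_prefix, with_prefix = collides_with_prefix)
```

Removing rows with missing object IDs before adding the prefix, as done above, also avoids turning NA into a literal string such as "url_NA".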

Next, we perform the first step of coordination detection on each subset of the data with the detect_groups function:

# Detect coordinated groups for URLs and images  --------------------
result_urls <- detect_groups(urls_data, time_window = 30,
                             min_participation = 2)

result_images <- detect_groups(img_data, time_window = 30,
                               min_participation = 2)

Then we can simply stack both resulting data.tables:

# Combine results  --------------------
library(data.table)

combined_results <- rbindlist( 
    list(result_urls, result_images),
    use.names = TRUE,
    fill = TRUE
)

Now we can run the network analysis with the default settings to find accounts that coordinate in sharing images and URLs:

# Generate the coordinated multi-modal network  --------------------
graph <- generate_coordinated_network(combined_results, edge_weight = 0.5)
graph
#> IGRAPH d99b317 UNW- 671 1732 -- 
#> + attr: name (v/c), weight (e/n), avg_time_delta (e/n), n_content_id
#> | (e/n), n_content_id_y (e/n), edge_symmetry_score (e/n),
#> | weight_threshold (e/n)
#> + edges from d99b317 (vertex names):
#>  [1] fb_5761 --fb_7103  fb_12065--fb_4039  fb_11199--fb_3297  fb_11202--fb_18649
#>  [5] fb_2258 --fb_4039  fb_21069--fb_9754  fb_11202--fb_21069 fb_11202--fb_4039 
#>  [9] fb_11202--fb_9754  fb_21069--fb_18649 fb_18649--fb_4039  fb_14401--fb_8707 
#> [13] fb_18649--fb_9754  fb_4039 --fb_7548  fb_12670--fb_17402 fb_4039 --fb_8027 
#> [17] fb_17402--fb_8027  fb_4039 --fb_8900  fb_8027 --fb_8900  fb_17402--fb_8900 
#> [21] fb_17326--fb_7548  fb_17326--fb_3732  fb_3732 --fb_7548  fb_17326--fb_20736
#> + ... omitted several edges

Visualization

Visualize the coordination network using igraph:

library(igraph)
plot(
    graph,
    layout = layout_with_fr,  # Fruchterman-Reingold; replaces the deprecated layout.fruchterman.reingold
    edge.width = 0.5,
    edge.curved = 0.3,
    vertex.size = 3,
    vertex.frame.color = "grey",
    vertex.frame.width = 0.1,
    vertex.label = NA
)
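
Beyond plotting, it is often useful to split the network into connected components and inspect the largest one. The sketch below uses standard igraph functions on a small made-up edge list so that it runs on its own. The same calls should apply to the graph object produced above, though that is not demonstrated here.

```r
library(igraph)

# Toy edge list standing in for a coordination network (values are made up).
edges <- data.frame(
  from   = c("a", "b", "a", "d"),
  to     = c("b", "c", "c", "e"),
  weight = c(3, 2, 4, 1)
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Connected components: each one is a candidate cluster of coordinating accounts.
comp <- components(g)
comp$no     # number of components
comp$csize  # size of each component

# Keep only the largest component for closer inspection.
largest <- which.max(comp$csize)
g_largest <- induced_subgraph(g, V(g)[comp$membership == largest])

# Rank accounts in that component by number of coordination ties.
sort(degree(g_largest), decreasing = TRUE)
```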

Additional Features

The CooRTweet package includes several additional functions and features that enable refined exploration of coordinated networks, as detailed in the package documentation.

Conclusion

The CooRTweet package enables researchers to study coordinated behaviors with a high degree of flexibility and precision. Its generalized architecture makes it adaptable to various contexts and datasets, empowering social media research and analysis.

References

Kulichkina, Aytalina, Nicola Righetti, and Annie Waldherr. 2024. “Protest and Repression on Social Media: Pro-Navalny and Pro-Government Mobilization Dynamics and Coordination Patterns on Russian Twitter.” New Media & Society. https://doi.org/10.1177/14614448241254126.

Mouselimis, Lampros. 2023. OpenImageR: An Image Processing Toolkit. https://CRAN.R-project.org/package=OpenImageR.

Righetti, Nicola, Fabio Giglietto, Azade Esther Kakavand, Aytalina Kulichkina, Giada Marino, Massimo Terenzi, et al. 2022. “Political Advertisement and Coordinated Behavior on Social Media in the Lead-up to the 2021 German Federal Elections.” Dusseldorf: Media Authority of North Rhine-Westphalia.