---
title: "Linking census tract-level income data to electoral outcomes with ineAtlas"
output: rmarkdown::html_vignette
author: "Pablo Garcia Guzman"
date: "`r Sys.Date()`"
vignette: >
  %\VignetteIndexEntry{Linking census tract-level income data to electoral outcomes with ineAtlas}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 10,
  fig.height = 7,
  dpi = 300,
  message = FALSE,
  warning = FALSE
)
```

## Introduction

This vignette shows how to combine data from the Household Income Distribution Atlas of the Spanish Statistical Office (INE), accessed through `ineAtlas`, with other administrative data sources. As an example, we analyze the relationship between income levels and voting patterns across census tracts in the Madrid region, using results from the 2021 regional election.

## Required packages

```{r packages}
library(ggplot2)
library(dplyr)
library(tidyr)
library(ineAtlas)
library(extrafont)
library(data.table)
library(ggtext)
library(stringr)
```

## Getting the data

We'll combine two data sources:

1. Income data at the census tract level
2. Electoral results from Madrid's 2021 regional elections

First, let's get the election data:

```{r elections-data}
# Electoral data
elections_raw <- data.table::fread(
    "https://datos.comunidad.madrid/catalogo/dataset/08aac4de-ca28-4f9c-b45d-ef8457c4b5d2/resource/5e8cf4ad-b9f4-4ffd-a026-c27433e7815f/download/datos_electorales_elecciones_autonomicas_comunidad_de_madrid_2021.csv",
    sep = ";",
    encoding = "Latin-1"
) %>% as_tibble()
```
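
A note on identifier columns: when a column contains only digits, `fread()` parses it as an integer by default, which drops any leading zeros. The toy input below uses made-up rows (not taken from the real file) to illustrate this, along with `keepLeadingZeros = TRUE` as one possible way to keep such codes as text. Whether the real file stores its codes with leading zeros is an assumption here; this vignette re-pads the codes during data processing instead.

```{r toy-leading-zeros}
library(data.table)

# Made-up rows that mimic zero-padded codes; not taken from the real file
toy_lines <- c(
    "cod_muni;distrito;seccion",
    "005;01;001",
    "079;12;045"
)

# Default behaviour: digit-only columns are parsed as integers
toy_int <- fread(text = toy_lines, sep = ";", header = TRUE)
class(toy_int$cod_muni)

# keepLeadingZeros = TRUE keeps such columns as character
toy_chr <- fread(text = toy_lines, sep = ";", header = TRUE,
                 keepLeadingZeros = TRUE)
toy_chr$cod_muni
```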

Now, let's get income data for Madrid's census tracts using `ineAtlas`:

```{r income-data}
# Get tract-level income data from ineAtlas
income_data <- get_atlas("income", "tract") %>%
    # Keep the Madrid region (municipality codes starting with province code 28)
    # and the year of the election
    filter(substr(mun_code, 1, 2) == "28", year == 2021)
```

## Data processing

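The election file identifies each census tract through separate municipality, district and section codes, whereas the income data uses a single `tract_code`. We therefore build a 10-digit code (province `28`, followed by the 3-digit municipality, 2-digit district and 3-digit section codes), compute each party's vote share, and merge the result with the income data: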

```{r data-processing}
# Process election data to census tract level
elections_proc <- elections_raw %>%
    select(PP, `P.S.O.E.`, distrito, seccion, cod_muni, votos_electores) %>%
    rename(
        pp = PP,
        psoe = `P.S.O.E.`,
        total_votes = votos_electores
    ) %>%
    mutate(
        # Left-pad the codes with zeros so each one has a fixed width
        cod_muni = str_pad(as.character(cod_muni), width = 3, pad = "0"),
        distrito = str_pad(as.character(distrito), width = 2, pad = "0"),
        seccion = str_pad(as.character(seccion), width = 3, pad = "0"),
        # 10-digit tract code: province (28) + municipality + district + section
        tract_code = paste0("28", cod_muni, distrito, seccion),
        share_pp = pp / total_votes,
        share_psoe = psoe / total_votes
    )

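# Merge with income data; election rows without a matching tract get NA income
plot_data <- elections_proc %>%
    left_join(
        income_data,
        by = "tract_code"
    ) %>%
    # Unweighted percentile rank of net income per capita across rows
    mutate(income_percentile = percent_rank(net_income_pc)) %>%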
    select(income_percentile, share_pp, share_psoe, total_votes) %>%
    pivot_longer(
        cols = c(share_pp, share_psoe),
        names_to = "party",
        values_to = "vote_share"
    )
```
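
Because the merge relies on exact string matches, it can be worth checking the constructed codes before plotting. The sketch below uses made-up toy tables (the `toy_*` names and all values are hypothetical) to show three quick checks: that every code has 10 characters, which election rows find no income record, and whether any code appears more than once. It is one possible set of diagnostics, not part of the original analysis.

```{r toy-join-checks}
library(dplyr)
library(stringr)

# Made-up election rows; codes are integers, as they might be after import
toy_elections <- tibble(
    cod_muni = c(5L, 79L, 79L),
    distrito = c(1L, 12L, 12L),
    seccion  = c(1L, 45L, 46L)
) %>%
    mutate(
        tract_code = paste0(
            "28",
            str_pad(as.character(cod_muni), width = 3, pad = "0"),
            str_pad(as.character(distrito), width = 2, pad = "0"),
            str_pad(as.character(seccion), width = 3, pad = "0")
        )
    )

# Made-up income rows; the third election row has no counterpart here
toy_income <- tibble(
    tract_code = c("2800501001", "2807912045"),
    net_income_pc = c(12000, 21000)
)

# Check 1: every code should be 10 characters long
all(nchar(toy_elections$tract_code) == 10)

# Check 2: election rows that find no income record
toy_unmatched <- anti_join(toy_elections, toy_income, by = "tract_code")
toy_unmatched

# Check 3: codes that appear more than once (these would repeat in the plot)
toy_duplicates <- toy_elections %>%
    count(tract_code) %>%
    filter(n > 1)
toy_duplicates
```

Applied to the real data, the same checks would use `elections_proc` and `income_data` in place of the toy tables.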

## Visualization

We'll create a visualization showing how voting patterns vary with income levels across Madrid's census tracts. In particular, we'll focus on the vote shares of Spain's two main nationwide parties, the Popular Party (PP) and the Socialist Party (PSOE):

```{r plot, fig.width=10, fig.height=7}
ggplot(plot_data,
       aes(x = income_percentile * 100,
           y = vote_share * 100,
           color = party)) +
    # Point size is proportional to the number of votes in each row
    geom_point(aes(size = total_votes / 200), alpha = 0.2) +
    scale_size_identity() +
    geom_smooth(method = "loess",
                se = FALSE,
                linewidth = 1.5) +
    scale_color_manual(
        values = c("share_pp" = "#0066CC", "share_psoe" = "#E31C1C"),
        labels = c("share_pp" = "PP", "share_psoe" = "PSOE"),
        name = NULL
    ) +
    scale_y_continuous(labels = function(x) paste0(x, "%")) +
    labs(
        title = "Voting patterns and income at the census-tract level (Madrid, 2021)",
        x = "Net income per capita percentile",
        y = "Vote share",
        caption = "@pablogguz_ | The chart shows the percent of total votes by party within each census tract in the Madrid 2021 regional election.
        Source: Regional Government of Madrid, Spanish Statistical Office and author's calculations."
    ) +
    theme_minimal() +
    theme(
        text = element_text(family = "Open Sans", size = 16),
        plot.title = element_text(size = 18, margin = margin(b = 20)),
        plot.caption = element_textbox_simple(
            size = 12, 
            color = "grey40", 
            margin = margin(t = 20),
            hjust = 0  # This left-justifies the caption
        ),
        legend.position = "top",        
        legend.justification = "left",
        panel.grid.minor = element_blank(),
        panel.grid.major.x = element_blank()
    )
```
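
As a numeric complement to the chart, the relationship can also be summarised by income group. The sketch below is a minimal illustration on made-up data (all values and the `toy_*` names are hypothetical): it shows what `percent_rank()` returns, groups tracts into income quartiles with `ntile()`, and computes a vote-weighted mean share per quartile. Weighting by `total_votes` is an assumption of this sketch; the chart above conveys votes through point size only.

```{r toy-quartiles}
library(dplyr)

# Made-up tract-level data, for illustration only
toy_tracts <- tibble(
    net_income_pc = c(9000, 11000, 13000, 15000, 18000, 22000, 27000, 35000),
    share_pp      = c(0.25, 0.30, 0.33, 0.38, 0.45, 0.52, 0.60, 0.68),
    total_votes   = c(600, 800, 700, 900, 750, 650, 500, 400)
) %>%
    mutate(
        # Rank rescaled to [0, 1]: 0 for the poorest tract, 1 for the richest
        income_percentile = percent_rank(net_income_pc),
        # Four equally sized income groups
        income_quartile = ntile(net_income_pc, 4)
    )

# Vote-weighted mean share by income quartile
toy_summary <- toy_tracts %>%
    group_by(income_quartile) %>%
    summarise(
        tracts = n(),
        mean_share_pp = weighted.mean(share_pp, w = total_votes)
    )
toy_summary
```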