Labelled data is a type of dataset where variables or their values contain additional metadata. These datasets, common in tools like SPSS or Stata, allow for better understanding and documentation of variables.
The {haven} package in R is specifically designed for working with such datasets. It enhances metadata handling by allowing you to embed variable and value labels directly into your data.
In this vignette, we’ll explore how labelled data works, leveraging key {haven} package functionalities to extract, inspect, and manipulate labels.
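To make the idea concrete before turning to real form data, the sketch below builds a single labelled vector by hand with haven::labelled(). The vector name `happiness` and its contents are invented for illustration and are not part of the form data used later.

```r
library(haven)

# Hypothetical responses coded as integers (illustration only)
happiness <- labelled(
  c(1L, -1L, -1L),
  labels = c("Very happy!" = 1L, "Very unhappy!" = -1L),
  label  = "How happy are we with Nettskjema?"
)

# The stored values are still plain integers ...
storage_type <- typeof(happiness)

# ... while the metadata travels along as attributes
value_labels   <- attr(happiness, "labels")
variable_label <- attr(happiness, "label")

print(happiness)
```

The value labels describe what each code means, and the variable label describes the question as a whole.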
library(nettskjemar)
# Replace this with your form ID
formid <- 123823
data <- ns_get_data(formid)
data
#> formid $submission_id
#> 1 123823 27685292
#> $created freetext radio
#> 1 2023-06-01T20:57:15+02:00 some text 1
#> checkbox.questionnaires checkbox.events
#> 1 1 1
#> checkbox.logs dropdown radio_matrix.grants
#> 1 0 4 1
#> radio_matrix.lecture radio_matrix.email
#> 1 2 2
#> checkbox_matrix.1.IT
#> 1 1
#> checkbox_matrix.1.colleague
#> 1 1
#> checkbox_matrix.1.admin
#> 1 0
#> checkbox_matrix.1.union
#> 1 0
#> checkbox_matrix.1.internet
#> 1 0
#> checkbox_matrix.2.IT
#> 1 0
#> checkbox_matrix.2.colleague
#> 1 0
#> checkbox_matrix.2.admin
#> 1 1
#> checkbox_matrix.2.union
#> 1 0
#> checkbox_matrix.2.internet date time
#> 1 0 2023-06-01 12:00
#> datetime number_decimal
#> 1 2023-06-12T13:33 4.5
#> number_integer slider attachment_1
#> 1 77 3 sølvi.png
#> attachment_2 $answer_time_ms
#> 1 74630
#> [ reached 'max' / getOption("max.print") -- omitted 2 rows ]
Here we have our example data, displayed as a standard data.frame. There is nothing particularly special about it. To continue, we also need to download the form codebook, as we need this to add the labels.
cb <- ns_get_codebook(formid)
cb
#> element_no element_type element_code
#> 1 1 HEADING <NA>
#> 2 2 TEXT <NA>
#> 3 3 IMAGE <NA>
#> 4 4 PAGE_BREAK <NA>
#> 5 5 QUESTION freetext
#> element_text
#> 1 Let talk about Nettskjema! This is a subheading!
#> 2 <NA>
#> 3 <NA>
#> 4 <NA>
#> 5 This is a question about something super important, where the user can input free text.
#> element_desc
#> 1 <NA>
#> 2 <p>This is some text in the form, not a question but a descriptive text.</p>\n\n<p> </p>\n
#> 3 <NA>
#> 4 <NA>
#> 5 <p>With a description field giving details on what should be answered.</p>\n\n<p> </p>\n
#> subelement_seq answer_text answer_code
#> 1 NA <NA> <NA>
#> 2 NA <NA> <NA>
#> 3 NA <NA> <NA>
#> 4 NA <NA> <NA>
#> 5 NA <NA> <NA>
#> answer_seq
#> 1 NA
#> 2 NA
#> 3 NA
#> 4 NA
#> 5 NA
#> [ reached 'max' / getOption("max.print") -- omitted 34 rows ]
To add the labels, we use the ns_add_labels() function, passing both the unlabelled data and the codebook.
lab_data <- data |>
ns_add_labels(cb)
lab_data
#> formid $submission_id
#> 1 123823 27685292
#> $created freetext radio
#> 1 2023-06-01T20:57:15+02:00 some text 1
#> checkbox.questionnaires checkbox.events
#> 1 1 1
#> checkbox.logs dropdown radio_matrix.grants
#> 1 0 4 1
#> radio_matrix.lecture radio_matrix.email
#> 1 2 2
#> checkbox_matrix.1.IT
#> 1 1
#> checkbox_matrix.1.colleague
#> 1 1
#> checkbox_matrix.1.admin
#> 1 0
#> checkbox_matrix.1.union
#> 1 0
#> checkbox_matrix.1.internet
#> 1 0
#> checkbox_matrix.2.IT
#> 1 0
#> checkbox_matrix.2.colleague
#> 1 0
#> checkbox_matrix.2.admin
#> 1 1
#> checkbox_matrix.2.union
#> 1 0
#> checkbox_matrix.2.internet date time
#> 1 0 2023-06-01 12:00
#> datetime number_decimal
#> 1 2023-06-12T13:33 4.5
#> number_integer slider attachment_1
#> 1 77 3 sølvi.png
#> attachment_2 $answer_time_ms
#> 1 74630
#> [ reached 'max' / getOption("max.print") -- omitted 2 rows ]
On the surface, the labelled data looks exactly like the original, with no added extras. Inspecting both objects with the str() function, however, exposes what lies beneath the surface.
str(data)
#> 'data.frame': 3 obs. of 31 variables:
#> $ formid : num 123823 123823 123823
#> $ $submission_id : int 27685292 27685302 27685319
#> $ $created : chr "2023-06-01T20:57:15+02:00" "2023-06-01T20:58:33+02:00" "2023-06-01T20:59:50+02:00"
#> $ freetext : chr "some text" "another answer" ""
#> $ radio : int 1 -1 -1
#> $ checkbox.questionnaires : int 1 0 1
#> $ checkbox.events : int 1 0 1
#> $ checkbox.logs : int 0 1 1
#> $ dropdown : int 4 9 4
#> $ radio_matrix.grants : int 1 3 1
#> $ radio_matrix.lecture : int 2 3 1
#> $ radio_matrix.email : int 2 1 1
#> $ checkbox_matrix.1.IT : int 1 0 0
#> $ checkbox_matrix.1.colleague: int 1 0 1
#> $ checkbox_matrix.1.admin : int 0 0 0
#> $ checkbox_matrix.1.union : int 0 0 0
#> $ checkbox_matrix.1.internet : int 0 1 1
#> $ checkbox_matrix.2.IT : int 0 0 1
#> $ checkbox_matrix.2.colleague: int 0 0 1
#> $ checkbox_matrix.2.admin : int 1 1 1
#> $ checkbox_matrix.2.union : int 0 1 1
#> $ checkbox_matrix.2.internet : int 0 0 0
#> $ date : chr "2023-06-01" "2023-02-07" "2022-09-28"
#> $ time : chr "12:00" "14:45" "05:11"
#> $ datetime : chr "2023-06-12T13:33" "2024-02-15T08:55" "2022-03-03T07:29"
#> $ number_decimal : chr "4.5" "2.2" "10"
#> $ number_integer : int 77 45 98
#> $ slider : int 3 1 9
#> $ attachment_1 : chr "sølvi.png" "" ""
#> $ attachment_2 : chr "" "marius.jpeg" ""
#> $ $answer_time_ms : int 74630 71313 70230
str(lab_data)
#> Classes 'ns-data' and 'data.frame': 3 obs. of 31 variables:
#> $ formid : num 123823 123823 123823
#> $ $submission_id : int 27685292 27685302 27685319
#> $ $created : chr "2023-06-01T20:57:15+02:00" "2023-06-01T20:58:33+02:00" "2023-06-01T20:59:50+02:00"
#> $ freetext : 'character' chr "some text" "another answer" ""
#> ..- attr(*, "label")= chr "This is a question about something super important, where the user can input free text."
#> ..- attr(*, "ns_type")= chr "QUESTION"
#> $ radio : int+lbl [1:3] 1, -1, -1
#> ..@ labels : Named int 1 -1
#> .. ..- attr(*, "names")= chr [1:2] "Very happy!" "Very unhappy!"
#> ..@ label : chr "How happy are we with Nettskjema?"
#> ..@ ns_type: chr "RADIO"
#> $ checkbox.questionnaires : int+lbl [1:3] 1, 0, 1
#> ..@ labels : Named chr "questionnaires"
#> .. ..- attr(*, "names")= chr "Questionnaires"
#> ..@ label : chr "What do we use it for?:: Questionnaires"
#> ..@ ns_type: chr "CHECKBOX"
#> $ checkbox.events : int+lbl [1:3] 1, 0, 1
#> ..@ labels : Named chr "events"
#> .. ..- attr(*, "names")= chr "Event sign-ups"
#> ..@ label : chr "What do we use it for?:: Event sign-ups"
#> ..@ ns_type: chr "CHECKBOX"
#> $ checkbox.logs : int+lbl [1:3] 0, 1, 1
#> ..@ labels : Named chr "logs"
#> .. ..- attr(*, "names")= chr "Data logging"
#> ..@ label : chr "What do we use it for?:: Data logging"
#> ..@ ns_type: chr "CHECKBOX"
#> $ dropdown : int+lbl [1:3] 4, 9, 4
#> ..@ labels : Named int 4 9
#> .. ..- attr(*, "names")= chr [1:2] "UiO" "OsloMet"
#> ..@ label : chr "Who is responsible with Nettskjema?"
#> ..@ ns_type: chr "SELECT"
#> $ radio_matrix.grants : int+lbl [1:3] 1, 3, 1
#> ..@ labels : Named int 1 2 3
#> .. ..- attr(*, "names")= chr [1:3] "yes" "no" "not applicable"
#> ..@ label : chr "In the last month I have: written some grant applications"
#> ..@ ns_type: chr "MATRIX_RADIO"
#> $ radio_matrix.lecture : int+lbl [1:3] 2, 3, 1
#> ..@ labels : Named int 1 2 3
#> .. ..- attr(*, "names")= chr [1:3] "yes" "no" "not applicable"
#> ..@ label : chr "In the last month I have: held a lecture"
#> ..@ ns_type: chr "MATRIX_RADIO"
#> $ radio_matrix.email : int+lbl [1:3] 2, 1, 1
#> ..@ labels : Named int 1 2 3
#> .. ..- attr(*, "names")= chr [1:3] "yes" "no" "not applicable"
#> ..@ label : chr "In the last month I have: sent some e-mails"
#> ..@ ns_type: chr "MATRIX_RADIO"
#> $ checkbox_matrix.1.IT : int+lbl [1:3] 1, 0, 0
#> ..@ labels : Named chr "IT"
#> .. ..- attr(*, "names")= chr "IT"
#> ..@ label : chr "In the last year, I have :: sought help from :: IT"
#> ..@ ns_type: chr "MATRIX_CHECKBOX"
#> $ checkbox_matrix.1.colleague: int+lbl [1:3] 1, 0, 1
#> ..@ labels : Named chr "colleague"
#> .. ..- attr(*, "names")= chr "A colleague"
#> ..@ label : chr "In the last year, I have :: sought help from :: A colleague"
#> ..@ ns_type: chr "MATRIX_CHECKBOX"
#> $ checkbox_matrix.1.admin : int+lbl [1:3] 0, 0, 0
#> ..@ labels : Named chr "admin"
#> .. ..- attr(*, "names")= chr "Administration"
#> ..@ label : chr "In the last year, I have :: sought help from :: Administration"
#> ..@ ns_type: chr "MATRIX_CHECKBOX"
#> $ checkbox_matrix.1.union : int+lbl [1:3] 0, 0, 0
#> ..@ labels : Named chr "union"
#> .. ..- attr(*, "names")= chr "Union"
#> ..@ label : chr "In the last year, I have :: sought help from :: Union"
#> ..@ ns_type: chr "MATRIX_CHECKBOX"
#> $ checkbox_matrix.1.internet : int+lbl [1:3] 0, 1, 1
#> ..@ labels : Named chr "internet"
#> .. ..- attr(*, "names")= chr "Internet"
#> ..@ label : chr "In the last year, I have :: sought help from :: Internet"
#> ..@ ns_type: chr "MATRIX_CHECKBOX"
#> $ checkbox_matrix.2.IT : int+lbl [1:3] 0, 0, 1
#> ..@ labels : Named chr "IT"
#> .. ..- attr(*, "names")= chr "IT"
#> ..@ label : chr "In the last year, I have :: received e-mails from :: IT"
#> ..@ ns_type: chr "MATRIX_CHECKBOX"
#> $ checkbox_matrix.2.colleague: int+lbl [1:3] 0, 0, 1
#> ..@ labels : Named chr "colleague"
#> .. ..- attr(*, "names")= chr "A colleague"
#> ..@ label : chr "In the last year, I have :: received e-mails from :: A colleague"
#> ..@ ns_type: chr "MATRIX_CHECKBOX"
#> $ checkbox_matrix.2.admin : int+lbl [1:3] 1, 1, 1
#> ..@ labels : Named chr "admin"
#> .. ..- attr(*, "names")= chr "Administration"
#> ..@ label : chr "In the last year, I have :: received e-mails from :: Administration"
#> ..@ ns_type: chr "MATRIX_CHECKBOX"
#> $ checkbox_matrix.2.union : int+lbl [1:3] 0, 1, 1
#> ..@ labels : Named chr "union"
#> .. ..- attr(*, "names")= chr "Union"
#> ..@ label : chr "In the last year, I have :: received e-mails from :: Union"
#> ..@ ns_type: chr "MATRIX_CHECKBOX"
#> $ checkbox_matrix.2.internet : int+lbl [1:3] 0, 0, 0
#> ..@ labels : Named chr "internet"
#> .. ..- attr(*, "names")= chr "Internet"
#> ..@ label : chr "In the last year, I have :: received e-mails from :: Internet"
#> ..@ ns_type: chr "MATRIX_CHECKBOX"
#> $ date : 'character' chr "2023-06-01" "2023-02-07" "2022-09-28"
#> ..- attr(*, "label")= chr "Choose a random date"
#> ..- attr(*, "ns_type")= chr "DATE"
#> $ time : 'character' chr "12:00" "14:45" "05:11"
#> ..- attr(*, "label")= chr "now choose a random time!"
#> ..- attr(*, "ns_type")= chr "DATE"
#> $ datetime : 'character' chr "2023-06-12T13:33" "2024-02-15T08:55" "2022-03-03T07:29"
#> ..- attr(*, "label")= chr "Lastly choose a date AND time!"
#> ..- attr(*, "ns_type")= chr "DATE"
#> $ number_decimal : 'numeric' chr "4.5" "2.2" "10"
#> ..- attr(*, "label")= chr "Pick a number between 0 and 10!"
#> ..- attr(*, "ns_type")= chr "NUMBER"
#> $ number_integer : int 77 45 98
#> ..- attr(*, "label")= chr "Choose an integer between 0 and 100"
#> ..- attr(*, "ns_type")= chr "NUMBER"
#> $ slider : int 3 1 9
#> ..- attr(*, "label")= chr "Choose a point on the slider!"
#> ..- attr(*, "ns_type")= chr "LINEAR_SCALE"
#> $ attachment_1 : 'character' chr "sølvi.png" "" ""
#> ..- attr(*, "label")= chr "Upload a fun image!"
#> ..- attr(*, "ns_type")= chr "ATTACHMENT"
#> $ attachment_2 : 'character' chr "" "marius.jpeg" ""
#> ..- attr(*, "label")= chr "This is an attachment2"
#> ..- attr(*, "ns_type")= chr "ATTACHMENT"
#> $ $answer_time_ms : int 74630 71313 70230
You can see that lab_data carries many label attributes that are not present in the data object. These labels come from the codebook and provide important context about what each variable actually represents. These hidden features of the data are unlocked when working with functions from the {haven} package.
Notice how the metadata (variable and value labels) are now embedded in the dataset.
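As a rough illustration of what "embedded" means here, the sketch below attaches labels from a miniature codebook to one column and then reads them back as ordinary R attributes. The objects `cb_small` and `raw` and the helper `label_column()` are hypothetical; the column names are borrowed from the codebook output above, the answer codes are assumed to be integers, and this is not how ns_add_labels() is actually implemented.

```r
library(haven)

# Hypothetical miniature codebook, reusing column names from the real one
cb_small <- data.frame(
  element_code = c("radio", "radio"),
  element_text = "How happy are we with Nettskjema?",
  answer_code  = c(1L, -1L),
  answer_text  = c("Very happy!", "Very unhappy!"),
  stringsAsFactors = FALSE
)

# Hypothetical unlabelled answers
raw <- data.frame(radio = c(1L, -1L, -1L))

# Illustrative helper (NOT the ns_add_labels() implementation):
# attach value labels and a variable label to one column
label_column <- function(x, cb_rows) {
  labelled(
    x,
    labels = setNames(cb_rows$answer_code, cb_rows$answer_text),
    label  = cb_rows$element_text[1]
  )
}

radio_rows <- cb_small[cb_small$element_code == "radio", ]
raw$radio  <- label_column(raw$radio, radio_rows)

# The labels are stored as ordinary attributes on the column
attr(raw$radio, "label")
attr(raw$radio, "labels")
```

Because the labels live on the column itself, they follow the data when it is saved, subset, or passed to other functions that preserve attributes.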
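Once your data has been labelled, you can inspect and manipulate the labels with ease. The var_label() and val_labels() functions used here are provided by the {labelled} package, which builds on the labelled vector class from {haven}. Use var_label() to extract variable-level labels and val_labels() to extract value-level labels.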
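A minimal sketch of extracting labels follows. It uses a small hand-built data frame named `example` as a stand-in for lab_data so that it runs without access to Nettskjema; the object and its labels are assumptions made for illustration.

```r
library(labelled)

# Hypothetical stand-in for lab_data (illustration only)
example <- data.frame(
  freetext = c("some text", "another answer", ""),
  stringsAsFactors = FALSE
)
example$radio <- haven::labelled(
  c(1L, -1L, -1L),
  labels = c("Very happy!" = 1L, "Very unhappy!" = -1L),
  label  = "How happy are we with Nettskjema?"
)
var_label(example$freetext) <- "A free text question"

# Variable-level label of a single column
radio_label <- var_label(example$radio)

# Value-level labels of a single column
radio_values <- val_labels(example$radio)

# Variable labels for every column at once, as a named list
all_labels <- var_label(example)

radio_label
radio_values
all_labels
```

Calling var_label() on the whole data frame is a convenient way to get a quick overview of what each column contains.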
If you need to modify labels, we suggest you do this directly in the Nettskjema codebook setup. However, if you are working on a form that is no longer available in Nettskjema, and you have downloaded and saved both the data and the codebook (or the labelled data), labels can be modified with the var_label() and val_labels() replacement functions from {labelled}:
library(labelled) # provides var_label() and val_labels()
lab_data$freetext
#> [1] "some text" "another answer"
#> [3] ""
#> attr(,"label")
#> [1] "This is a question about something super important, where the user can input free text."
#> attr(,"ns_type")
#> [1] "QUESTION"
#> attr(,"class")
#> [1] "character"
# Update variable-level label for 'freetext'
var_label(lab_data$freetext) <- "Important freetext comment"
lab_data$radio
#> <labelled<integer>[3]>: How happy are we with Nettskjema?
#> [1] 1 -1 -1
#>
#> Labels:
#> value label
#> 1 Very happy!
#> -1 Very unhappy!
# Update value labels for 'radio'
val_labels(lab_data$radio) <- c(Unhappy = -1, Happy = 1)
# Check updated labels
var_label(lab_data$freetext)
#> [1] "Important freetext comment"
val_labels(lab_data$radio)
#> Unhappy Happy
#> -1 1
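Beyond editing labels, a common next step is to turn labelled codes into a factor for analysis, or to strip the value labels entirely. The sketch below shows both with haven::as_factor() and haven::zap_labels() on a hypothetical vector named `radio` that mirrors the relabelled column above.

```r
library(haven)

# Hypothetical labelled vector mirroring the relabelled 'radio' column
radio <- labelled(
  c(1L, -1L, -1L),
  labels = c(Unhappy = -1L, Happy = 1L),
  label  = "How happy are we with Nettskjema?"
)

# Replace the codes with their labels, giving an ordinary factor
radio_factor <- as_factor(radio)

# Or drop the value labels and keep only the underlying integer codes
radio_plain <- zap_labels(radio)

radio_factor
radio_plain
```

Which of the two is appropriate depends on the analysis: factors suit categorical summaries and plots, while plain codes suit numeric computation.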
Some key benefits include the following:
1. Enhanced Documentation: Embedding metadata directly in the dataset improves clarity.
2. Consistency: Reduces ambiguity when working across different teams or systems.
3. Compatibility: Facilitates interoperability with SPSS, Stata, and other statistical software.
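The compatibility point can be sketched with a round trip through an SPSS file using haven::write_sav() and haven::read_sav(). The data frame `survey` is invented for illustration, and the file is written to a temporary location.

```r
library(haven)

# Hypothetical labelled data (illustration only)
survey <- data.frame(id = 1:3)
survey$radio <- labelled(
  c(1, -1, -1),
  labels = c("Very happy!" = 1, "Very unhappy!" = -1),
  label  = "How happy are we with Nettskjema?"
)

# Write to a temporary SPSS file and read it back
path <- tempfile(fileext = ".sav")
write_sav(survey, path)
roundtrip <- read_sav(path)

# Variable and value labels are still attached after the round trip
attr(roundtrip$radio, "label")
attr(roundtrip$radio, "labels")
```

The same labels would be visible as variable and value labels when the file is opened in SPSS.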
For more information, check out the labelled package documentation.