tidyenigh
tidyenigh is an R package that ships analysis-ready data from the Encuesta Nacional de Ingresos y Gastos de los Hogares (ENIGH), Mexico's national household income and expenditure survey, in a consistent, tidy and reproducible fashion.
The package includes data from the 2016, 2018, 2020 and 2022 surveys. By analysis-ready we mean that, because the data is lazy-loaded, every data set in the survey is available as soon as the package is attached. Our analysis-ready standard includes:
- Variable labels provided by the official documentation.
- Factor levels and labels for categorical variables.
- Proper data types for each variable.
- Original documentation for each data set, available through R’s help system.
The package also includes the original metadata for each data set, as required by INEGI’s licence.
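To make the standard above concrete, here is a minimal base-R sketch of what a labelled, typed table looks like. The data frame `toy` and its values are hypothetical stand-ins, not ENIGH records, and the sketch assumes that variable labels are stored in the `label` attribute, as is the convention in the labelled package.

```r
# Hypothetical toy data standing in for a survey table (not real ENIGH records)
toy <- data.frame(
  sexo = factor(c("Hombre", "Mujer", "Mujer"), levels = c("Hombre", "Mujer")),
  edad = c(34L, 51L, 8L)
)

# Assumption: variable labels live in the "label" attribute
attr(toy$sexo, "label") <- "Sexo"
attr(toy$edad, "label") <- "Edad"

# Collect the variable label of every column
var_labels <- vapply(toy, function(x) attr(x, "label"), character(1))

var_labels        # variable labels
levels(toy$sexo)  # factor levels of a categorical variable
class(toy$edad)   # data type of a numeric variable
```

If the same convention holds for the shipped data, the same calls should work on a real table once the package is attached, for example `attr(poblacion2022$sexo, "label")`, and `?poblacion2022` should open its documentation.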
Usage
Lazy Loading
Data is lazy-loaded, so you can use it as soon as you load the package. gt::gt() is a great way to display labelled data, because it shows the variable labels included in the package.
library(tidyenigh)
pop_gt <- poblacion2022 |>
  dplyr::select(sexo, edad, nivelaprob, entidad) |>
  head() |>
  gt::gt()
pop_gt
Sexo | Edad | Nivel de instrucción aprobado | Entidad federativa |
---|---|---|---|
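Because every data set is lazy-loaded, it can help to see which objects are available before picking one. The sketch below uses base R's `data(package = ...)` to list them. The helper `list_datasets()` is a hypothetical convenience function, not part of tidyenigh, and the year filter assumes that data set names end in the survey year, as `poblacion2022` and `gastoshogar2016` suggest.

```r
# Hypothetical helper: list the data sets a package exposes.
# Returns character(0) if the package is not installed.
list_datasets <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(character(0))
  }
  unname(data(package = pkg)$results[, "Item"])
}

# All data sets shipped by tidyenigh
enigh_sets <- list_datasets("tidyenigh")

# Keep one survey round, assuming names end in the survey year
grep("2022$", enigh_sets, value = TRUE)
```

The helper works for any installed package, so `list_datasets("datasets")` is a quick way to check it against R's built-in data.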
Variable labels
The data includes both variable labels and value labels.
gastoshogar2016 |>
  labelled::generate_dictionary(details = FALSE)
pos | variable | label |
---|---|---|
Value labels and correct data types
Categorical variables were transformed into factors with the correct levels and labels.
poblacion2022 |>
  as_survey() |>
  gtsummary::tbl_svysummary(
    include = c(edad, diabetes, nivelaprob),
    by = sexo
  )
Characteristic | Hombre, N = 61,805,677¹ | Mujer, N = 67,193,361¹ |
---|---|---|
¹ Median (IQR); n (%)
Installation
You can install the latest version of tidyenigh from GitHub with:
# install.packages("devtools")
devtools::install_github("estebandegetau/tidyenigh")