step_indicate_na() creates a specification of a recipe step that
appends binary indicator columns to the data set, one per selected
variable, marking which observations are missing.
step_indicate_na(
  recipe,
  ...,
  role = "predictor",
  trained = FALSE,
  columns = NULL,
  prefix = "na_ind",
  skip = FALSE,
  id = rand_id("indicate_na")
)

# S3 method for step_indicate_na
tidy(x, ...)
recipe: A recipe object. The step will be added to the sequence of operations for this recipe.
...: One or more selector functions to choose which variables are affected by the step. See selections() for more details.
role: For model terms created by this step, what analysis role should they be assigned? By default, the function assumes that the new NA indicator columns created from the original variables will be used as predictors in a model.
trained: A logical for whether the selectors in ... have been resolved by prep().
columns: A character vector of variable names that will be populated (eventually) by the selectors supplied in ....
prefix: A character string that will be the prefix to the resulting new variables. Defaults to "na_ind".
skip: A logical. Should the step be skipped when the recipe is baked by bake()?
id: A character string that is unique to this step to identify it.
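To make the effect of the selected columns and the prefix argument concrete, the base R sketch below mimics what the step appends. It does not call recipes. The toy data frame, the helper name add_na_indicators, and the assumption that each new column is named prefix_variable and holds 0/1 values are illustrative, not taken from the package source.

```r
# Hypothetical helper: a base R approximation of the step's output,
# not the recipes implementation.
add_na_indicators <- function(data, columns, prefix = "na_ind") {
  # One 0/1 integer column per selected variable: 1 where the value is NA.
  indicators <- lapply(data[columns], function(x) as.integer(is.na(x)))
  # Assumed naming scheme: <prefix>_<variable>.
  names(indicators) <- paste(prefix, columns, sep = "_")
  cbind(data, as.data.frame(indicators))
}

# Toy data, invented for illustration only.
toy <- data.frame(
  Income = c(100, NA, 80),
  Debt   = c(NA, 0, 5)
)

toy_flagged <- add_na_indicators(toy, c("Income", "Debt"))
print(toy_flagged)
```

The original columns are left untouched, so the indicators can be used alongside a later imputation step without losing the information about which values were missing.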
An updated version of
recipe with the new step added to the
sequence of existing steps (if any). For the
tidy method, a tibble with
terms (the selectors or variables selected) and
library(recipes)
library(modeldata)
data("credit_data")

## missing data per column
purrr::map_dbl(credit_data, function(x) mean(is.na(x)))
#>       Status    Seniority         Home         Time          Age      Marital
#> 0.0000000000 0.0000000000 0.0013471037 0.0000000000 0.0000000000 0.0002245173
#>      Records          Job     Expenses       Income       Assets         Debt
#> 0.0000000000 0.0004490346 0.0000000000 0.0855410867 0.0105523125 0.0040413112
#>       Amount        Price
#> 0.0000000000 0.0000000000

## split the rows into a training set and a test set
set.seed(342)
in_training <- sample(1:nrow(credit_data), 2000)

credit_tr <- credit_data[ in_training, ]
credit_te <- credit_data[-in_training, ]

rec <- recipe(Price ~ ., data = credit_tr)

## add NA indicator columns for the three variables with the most missing values
impute_rec <- rec %>%
  step_indicate_na(Income, Assets, Debt)

imp_models <- prep(impute_rec, training = credit_tr)

imputed_te <- bake(imp_models, new_data = credit_te, everything())
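As a smaller companion to the example above, the following sketch uses a non-default prefix and calls the tidy method on the prepped recipe. The toy data frame and the prefix "missing" are made up for illustration, and bake(new_data = NULL) assumes a reasonably recent version of recipes.

```r
library(recipes)

# Hypothetical toy data, used only for illustration.
toy <- data.frame(
  y  = c(1, 2, 3, 4),
  x1 = c(1, NA, 3, NA),
  x2 = c(NA, 2, 3, 4)
)

toy_rec <- recipe(y ~ ., data = toy) %>%
  step_indicate_na(x1, x2, prefix = "missing")

toy_prepped <- prep(toy_rec, training = toy)

# new_data = NULL returns the processed training set.
toy_baked <- bake(toy_prepped, new_data = NULL)
names(toy_baked)

# Summarise the first (and only) step of the prepped recipe.
tidy(toy_prepped, number = 1)
```

Inspecting names(toy_baked) is a quick way to confirm which indicator columns were added before passing the data to a model.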