step_unknown creates a specification of a recipe
step that will assign missing values in a factor to the level "unknown".

    step_unknown(
      recipe,
      ...,
      role = NA,
      trained = FALSE,
      new_level = "unknown",
      objects = NULL,
      skip = FALSE,
      id = rand_id("unknown")
    )

    # S3 method for step_unknown
    tidy(x, ...)

| Argument | Description |
|---|---|
| recipe | A recipe object. The step will be added to the sequence of operations for this recipe. |
| ... | One or more selector functions to choose which variables are affected by the step. These variables should be character or factor types. See |
| role | Not used by this step since no new variables are created. |
| trained | A logical to indicate if the quantities for preprocessing have been estimated. |
| new_level | A single character value that will be assigned to new factor levels. |
| objects | A list of objects that contain the information on factor levels that will be determined by |
| skip | A logical. Should the step be skipped when the recipe is baked by |
| id | A character string that is unique to this step to identify it. |
| x | A |
An updated version of recipe with the new step
added to the sequence of existing steps (if any). For the
tidy method, a tibble with columns terms (the
columns that will be affected) and value (the factor
level that is used for the new value).
The selected variables are adjusted to have a new
level (given by new_level) that is placed in the last
position.
Note that if the original columns are character, they will be converted to factors by this step.
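
As a rough illustration of the two points above, the sketch below runs the step on a small made-up data frame. The data frame `toy`, its column `colour`, and the level name "missing" are assumptions chosen for this example only; they do not come from the package.

```r
library(recipes)

# Hypothetical toy data: a character column with one missing value
toy <- data.frame(
  colour = c("red", NA, "blue", "red"),
  stringsAsFactors = FALSE
)

rec <- recipe(~ colour, data = toy) %>%
  step_unknown(colour, new_level = "missing") %>%
  prep()

out <- juice(rec)

# The column should now be a factor with no missing values,
# and the new level should sit in the last position
is.factor(out$colour)
anyNA(out$colour)
levels(out$colour)
```
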
If new_level is already in the data given to prep, an error
is thrown.
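
A minimal sketch of that failure mode, assuming a made-up factor that already contains the default level "unknown". The data frame `toy` and the strings returned by the `tryCatch()` handlers are illustrative assumptions, not package output.

```r
library(recipes)

# Hypothetical toy data: "unknown" is already one of the factor levels
toy <- data.frame(colour = factor(c("red", "unknown", NA)))

result <- tryCatch(
  {
    recipe(~ colour, data = toy) %>%
      step_unknown(colour) %>%   # new_level defaults to "unknown"
      prep()
    "no error"
  },
  error = function(e) "error raised"
)

result
```

Choosing a `new_level` that cannot collide with an existing level, as the examples on this page do with "unknown diet", avoids the error.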
step_factor2string(), step_string2factor(),
dummy_names(), step_regex(), step_count(),
step_ordinalscore(), step_unorder(), step_other(), step_novel()

    library(recipes)
    library(modeldata)
    data(okc)

    # Replace missing values in each column with its own new level
    rec <-
      recipe(~ diet + location, data = okc) %>%
      step_unknown(diet, new_level = "unknown diet") %>%
      step_unknown(location, new_level = "unknown location") %>%
      prep()

    # Cross-tabulate the processed column against the original one
    table(juice(rec)$diet, okc$diet, useNA = "always") %>%
      as.data.frame() %>%
      dplyr::filter(Freq > 0)
    #>                   Var1                Var2  Freq
    #> 1             anything            anything  6174
    #> 2                halal               halal    11
    #> 3               kosher              kosher    11
    #> 4      mostly anything     mostly anything 16562
    #> 5         mostly halal        mostly halal    48
    #> 6        mostly kosher       mostly kosher    86
    #> 7         mostly other        mostly other  1004
    #> 8         mostly vegan        mostly vegan   335
    #> 9    mostly vegetarian   mostly vegetarian  3438
    #> 10               other               other   331
    #> 11   strictly anything   strictly anything  5107
    #> 12      strictly halal      strictly halal    18
    #> 13     strictly kosher     strictly kosher    18
    #> 14      strictly other      strictly other   450
    #> 15      strictly vegan      strictly vegan   227
    #> 16 strictly vegetarian strictly vegetarian   874
    #> 17               vegan               vegan   136
    #> 18          vegetarian          vegetarian   665
    #> 19        unknown diet                <NA> 24360

    # Summarize the first step
    tidy(rec, number = 1)
    #> # A tibble: 1 x 3
    #>   terms value        id
    #>   <chr> <chr>        <chr>
    #> 1 diet  unknown diet unknown_wvoo0