step_string2factor will convert one or more character
vectors to factors (ordered or unordered).
step_string2factor(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  levels = NULL,
  ordered = FALSE,
  skip = FALSE,
  id = rand_id("string2factor")
)

# S3 method for step_string2factor
tidy(x, ...)
recipe: A recipe object. The step will be added to the sequence of operations for this recipe.
...: One or more selector functions to choose which variables will be converted to factors. See selections() for more details.
role: Not used by this step since no new variables are created.
trained: A logical to indicate if the quantities for preprocessing have been estimated.
levels: An optional specification of the levels to be used for the new factor. If left as NULL (the default), the sorted unique values found in the training data when the recipe is prepped with prep() are used.
ordered: A single logical value; should the factor(s) be ordered?
skip: A logical. Should the step be skipped when the recipe is baked by bake()?
id: A character string that is unique to this step to identify it.
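A minimal sketch of the levels and ordered arguments, assuming a small made-up data frame (the columns quality and rating and their values are hypothetical). A single levels vector gives every selected column the same level set, in the order supplied rather than alphabetically. Passing strings_as_factors = FALSE to prep() keeps the raw columns as character, so this step is the one that converts them.

```r
library(recipes)

# Hypothetical training data: two character columns on the same ordinal scale
df <- data.frame(
  quality = c("low", "high", "mid", "low"),
  rating  = c("mid", "mid", "high", "low"),
  stringsAsFactors = FALSE
)

rec <- recipe(~ quality + rating, data = df) %>%
  # One shared level set, in a meaningful (not alphabetical) order
  step_string2factor(quality, rating,
                     levels = c("low", "mid", "high"),
                     ordered = TRUE)

# Keep raw strings as character so the step itself does the conversion
prepped <- prep(rec, training = df, strings_as_factors = FALSE)
out <- bake(prepped, new_data = NULL)

# Both columns are now ordered factors sharing the same levels
levels(out$quality)
levels(out$rating)
```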
An updated version of recipe with the new step added to the sequence of existing steps (if any). For the tidy method, a tibble with columns terms (the selectors or variables selected), ordered, and id.
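To check the return values in practice, here is a short sketch on a hypothetical one-column data frame (the column color is made up). Adding the step returns an updated recipe, and tidy() on the prepped recipe gives one row per selected column. The assertions only rely on the terms, ordered, and id columns being present.

```r
library(recipes)

# Hypothetical one-column training set
df <- data.frame(color = c("red", "blue", "red"), stringsAsFactors = FALSE)

# Adding the step returns an updated recipe object
rec <- recipe(~ color, data = df) %>%
  step_string2factor(color)
class(rec)

prepped <- prep(rec, training = df, strings_as_factors = FALSE)

# One row per selected column: its term, the ordered flag, and the step id
td <- tidy(prepped, number = 1)
td
```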
If levels is given, step_string2factor will convert all variables affected by this step to have the same levels.

Also, note that prep() has an option, strings_as_factors, that defaults to TRUE. Set it to FALSE so that raw character data are passed to step_string2factor. The step also accepts columns that are already factors, but it leaves them as-is.
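The rules above can be mimicked in base R. The helper string2factor_sketch() below is hypothetical and is not the recipes implementation. It assumes three behaviors: when levels is NULL, each character column gets the sorted unique values seen in training; a supplied levels vector is shared by every selected column; and columns that are already factors pass through unchanged.

```r
# Hypothetical helper; NOT part of recipes. Mirrors the documented rules only.
string2factor_sketch <- function(train, new_data, cols,
                                 levels = NULL, ordered = FALSE) {
  for (col in cols) {
    x <- new_data[[col]]
    # Existing factors are left as-is
    if (is.factor(x)) next
    # Shared levels if supplied, otherwise sorted unique training values
    lvls <- if (is.null(levels)) sort(unique(train[[col]])) else levels
    new_data[[col]] <- factor(x, levels = lvls, ordered = ordered)
  }
  new_data
}

train <- data.frame(
  size  = c("small", "large", "medium"),
  shape = c("round", "square", "round"),
  stringsAsFactors = FALSE
)
train$grade <- factor(c("b", "a", "b"))  # already a factor

# Default: per-column levels learned from the training data
res1 <- string2factor_sketch(train, train, c("size", "shape", "grade"))
levels(res1$size)  # "large" "medium" "small"

# Shared levels: both columns get the same level set, so in this sketch
# values outside it ("round", "square") become NA
res2 <- string2factor_sketch(train, train, c("size", "shape"),
                             levels = c("small", "medium", "large"))
```

Passing explicit levels is mainly useful when several columns share one scale. Any value outside that scale is silently turned into NA, as the shape column shows.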
library(recipes)
library(modeldata)
data(okc)

rec <- recipe(~ diet + location, data = okc)

make_factor <- rec %>%
  step_string2factor(diet)
make_factor <- prep(make_factor,
                    training = okc,
                    strings_as_factors = FALSE)

# note that `diet` is a factor
bake(make_factor, new_data = NULL) %>% head
#> # A tibble: 6 x 2
#>   diet              location
#>   <fct>             <chr>
#> 1 strictly anything south san francisco
#> 2 mostly other      oakland
#> 3 anything          san francisco
#> 4 vegetarian        berkeley
#> 5 NA                san francisco
#> 6 mostly anything   san francisco

okc %>% head
#> # A tibble: 6 x 6
#>     age diet              height location            date       Class
#>   <int> <chr>              <int> <chr>               <date>     <fct>
#> 1    22 strictly anything     75 south san francisco 2012-06-28 other
#> 2    35 mostly other          70 oakland             2012-06-29 other
#> 3    38 anything              68 san francisco       2012-06-27 other
#> 4    23 vegetarian            71 berkeley            2012-06-28 other
#> 5    29 NA                    66 san francisco       2012-06-27 other
#> 6    29 mostly anything       67 san francisco       2012-06-29 stem

tidy(make_factor, number = 1)
#> # A tibble: 1 x 3
#>   terms ordered id
#>   <chr> <lgl>   <chr>
#> 1 diet  FALSE   string2factor_a3AY0