step_string2factor will convert one or more character vectors to factors (ordered or unordered).
step_string2factor(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  levels = NULL,
  ordered = FALSE,
  skip = FALSE,
  id = rand_id("string2factor")
)

# S3 method for step_string2factor
tidy(x, ...)
Argument | Description |
---|---|
recipe | A recipe object. The step will be added to the sequence of operations for this recipe. |
... | One or more selector functions to choose which variables will be converted to factors. See selections() for more details. |
role | Not used by this step since no new variables are created. |
trained | A logical to indicate if the quantities for preprocessing have been estimated. |
levels | An optional specification of the levels to be used for the new factor. If left NULL, the sorted unique values in the data are used. |
ordered | A single logical value; should the factor(s) be ordered? |
skip | A logical. Should the step be skipped when the recipe is baked by bake()? |
id | A character string that is unique to this step to identify it. |
x | A step_string2factor object. |
An updated version of recipe with the new step added to the sequence of existing steps (if any). For the tidy method, a tibble with columns terms (the selectors or variables selected) and ordered.
If levels is given, step_string2factor will convert all variables affected by this step to have the same levels.
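As a minimal sketch of the shared-levels behavior, the snippet below applies one levels vector to two character columns. The data frame toy and its columns a and b are hypothetical and are not part of the original documentation; the sketch assumes the recipes package is installed.

```r
library(recipes)

# Hypothetical toy data with two character columns (an assumption for illustration)
toy <- data.frame(
  a = c("low", "high", "low"),
  b = c("mid", "mid", "high"),
  stringsAsFactors = FALSE
)

# One `levels` vector is supplied, so both selected columns receive it
rec <- recipe(~ a + b, data = toy) %>%
  step_string2factor(a, b, levels = c("low", "mid", "high"), ordered = TRUE)

# Keep the columns as raw character data until the step runs
prepped <- prep(rec, training = toy, strings_as_factors = FALSE)
baked <- bake(prepped, new_data = NULL)

levels(baked$a)
levels(baked$b)
is.ordered(baked$a)
```

Both columns should report the same three levels in the order given, even though neither column contains all three values on its own.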
Also, note that prep has an option strings_as_factors that defaults to TRUE. Set it to FALSE so that the raw character data reaches step_string2factor unconverted. The step also accepts existing factors, but leaves them as-is.
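The following sketch illustrates both points: a character column is converted when strings_as_factors = FALSE, while a column that is already a factor passes through unchanged. The data frame toy and its columns size and grade are hypothetical names chosen for illustration, and the sketch assumes the recipes package is installed.

```r
library(recipes)

# Hypothetical toy data: one character column and one existing factor
toy <- data.frame(
  size = c("small", "large", "small"),
  grade = factor(c("b", "a", "b"), levels = c("b", "a")),
  stringsAsFactors = FALSE
)

rec <- recipe(~ size + grade, data = toy) %>%
  step_string2factor(size, grade)

# Leave character data unconverted so the step does the conversion itself
prepped <- prep(rec, training = toy, strings_as_factors = FALSE)
baked <- bake(prepped, new_data = NULL)

class(toy$size)       # character before the step
is.factor(baked$size) # converted by the step
levels(baked$grade)   # the existing factor keeps its original level order
```

Because grade was already a factor, its non-alphabetical level order is expected to survive the step, whereas size gets levels derived from its sorted unique values.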
library(recipes)
library(modeldata)
data(okc)

rec <- recipe(~ diet + location, data = okc)

make_factor <- rec %>%
  step_string2factor(diet)
make_factor <- prep(make_factor,
                    training = okc,
                    strings_as_factors = FALSE)

# note that `diet` is a factor
bake(make_factor, new_data = NULL) %>% head
#> # A tibble: 6 x 2
#>   diet              location
#>   <fct>             <chr>
#> 1 strictly anything south san francisco
#> 2 mostly other      oakland
#> 3 anything          san francisco
#> 4 vegetarian        berkeley
#> 5 NA                san francisco
#> 6 mostly anything   san francisco

# the original data, where `diet` is still a character column
okc %>% head
#> # A tibble: 6 x 6
#>     age diet              height location            date       Class
#>   <int> <chr>              <int> <chr>               <date>     <fct>
#> 1    22 strictly anything     75 south san francisco 2012-06-28 other
#> 2    35 mostly other          70 oakland             2012-06-29 other
#> 3    38 anything              68 san francisco       2012-06-27 other
#> 4    23 vegetarian            71 berkeley            2012-06-28 other
#> 5    29 NA                    66 san francisco       2012-06-27 other
#> 6    29 mostly anything       67 san francisco       2012-06-29 stem

# summarize the step with the tidy method
tidy(make_factor, number = 1)
#> # A tibble: 1 x 3
#>   terms ordered id
#>   <chr> <lgl>   <chr>
#> 1 diet  FALSE   string2factor_a3AY0