step_num2factor will convert one or more numeric vectors to factors (ordered or unordered). This can be useful when categories are encoded as integers.
    step_num2factor(
      recipe,
      ...,
      role = NA,
      transform = function(x) x,
      trained = FALSE,
      levels,
      ordered = FALSE,
      skip = FALSE,
      id = rand_id("num2factor")
    )

    # S3 method for step_num2factor
    tidy(x, ...)
Argument | Description |
---|---|
recipe | A recipe object. The step will be added to the sequence of operations for this recipe. |
... | One or more selector functions to choose which variables will be converted to factors. See selections() for more details. |
role | Not used by this step since no new variables are created. |
transform | A function taking a single argument x that modifies the numeric values before the levels are assigned. Its output should be an integer giving the position in levels of the value to assign. |
trained | A logical to indicate if the quantities for preprocessing have been estimated. |
levels | A character vector of values that will be used as the levels. These are the numeric data converted to character and ordered. This is modified once prep() is executed. |
ordered | A single logical value; should the factor(s) be ordered? |
skip | A logical. Should the step be skipped when the recipe is baked by bake()? |
id | A character string that is unique to this step to identify it. |
x | A step_num2factor object. |
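The interaction between transform, levels, and ordered can be sketched in base R. This is only an illustration of the idea, not the code recipes runs internally: num_to_factor is a hypothetical helper, and turning out-of-range positions into NA is an assumption made for this sketch.

```r
# Hypothetical helper illustrating the idea; NOT the recipes implementation.
num_to_factor <- function(x, levels, transform = function(x) x, ordered = FALSE) {
  # Apply the user-supplied transform, then coerce to integer positions.
  idx <- as.integer(transform(x))
  # Assumption for this sketch: positions outside 1..length(levels) become NA.
  idx[!is.na(idx) & (idx < 1L | idx > length(levels))] <- NA_integer_
  # Look up the label at each position and build the factor.
  factor(levels[idx], levels = levels, ordered = ordered)
}

codes <- c(0, 1, 3, 2, 0)
amnt <- c("nothin", "meh", "some", "copious")

# The codes start at 0, so shift them by one to get valid positions in amnt.
res <- num_to_factor(codes, levels = amnt, transform = function(x) x + 1)
print(res)
```

The shift by one matters because R indexes from 1; without it, a code of 0 would not correspond to any entry of levels.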
An updated version of recipe with the new step added to the sequence of existing steps (if any). For the tidy method, a tibble with columns terms (the selectors or variables selected) and ordered.
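A minimal sketch of calling the tidy method on a recipe that contains this step. The data frame toy and its columns are made up for illustration, and the exact set of columns returned may vary with the installed version of recipes, so no output is shown.

```r
library(recipes)

# Toy data (hypothetical) with an integer-coded category in grp.
toy <- data.frame(y = c(1.2, 3.4, 2.2, 0.7), grp = c(1L, 2L, 3L, 1L))

rec <- recipe(y ~ grp, data = toy) %>%
  step_num2factor(grp, levels = c("a", "b", "c"), ordered = TRUE)

# number = 1 selects the first (and only) step in the recipe.
step_info <- tidy(rec, number = 1)
print(step_info)
```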
    library(recipes)
    library(dplyr)
    library(modeldata)
    data(attrition)

    attrition %>%
      group_by(StockOptionLevel) %>%
      count()
    #> # A tibble: 4 x 2
    #> # Groups:   StockOptionLevel [4]
    #>   StockOptionLevel     n
    #>              <int> <int>
    #> 1                0   631
    #> 2                1   596
    #> 3                2   158
    #> 4                3    85

    amnt <- c("nothin", "meh", "some", "copious")

    # The codes start at 0, so add 1 to turn them into positions in amnt
    rec <-
      recipe(Attrition ~ StockOptionLevel, data = attrition) %>%
      step_num2factor(
        StockOptionLevel,
        transform = function(x) x + 1,
        levels = amnt
      )

    encoded <- rec %>% prep() %>% bake(new_data = NULL)

    table(encoded$StockOptionLevel, attrition$StockOptionLevel)
    #>
    #>             0   1   2   3
    #>   nothin  631   0   0   0
    #>   meh       0 596   0   0
    #>   some      0   0 158   0
    #>   copious   0   0   0  85

    # an example for binning
    binner <- function(x) {
      x <- cut(x, breaks = 1000 * c(0, 5, 10, 20), include.lowest = TRUE)
      # now return the group number
      as.numeric(x)
    }

    inc <- c("low", "med", "high")

    rec <-
      recipe(Attrition ~ MonthlyIncome, data = attrition) %>%
      step_num2factor(
        MonthlyIncome,
        transform = binner,
        levels = inc,
        ordered = TRUE
      ) %>%
      prep()

    encoded <- bake(rec, new_data = NULL)

    table(encoded$MonthlyIncome, binner(attrition$MonthlyIncome))
    #>
    #>          1   2   3
    #>   low  749   0   0
    #>   med    0 440   0
    #>   high   0   0 281

    # What happens when a value is out of range?
    ceo <- attrition %>% slice(1) %>% mutate(MonthlyIncome = 10^10)

    bake(rec, ceo)
    #> # A tibble: 1 x 2
    #>   MonthlyIncome Attrition
    #>   <ord>         <fct>
    #> 1 NA            Yes
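The NA in the last example can be traced in base R without any recipe. The sketch below reuses binner and inc from the example above; the four income values are made up for illustration, and the final factor() call is only an approximation of what the step does, not its actual implementation.

```r
binner <- function(x) {
  x <- cut(x, breaks = 1000 * c(0, 5, 10, 20), include.lowest = TRUE)
  # return the group number
  as.numeric(x)
}

inc <- c("low", "med", "high")

# Made-up incomes: one per bin, plus one far above the last break.
bins <- binner(c(2500, 7500, 15000, 10^10))
print(bins)

# cut() has no interval for 10^10, so its bin is NA, and indexing
# inc with NA yields a missing label.
out <- factor(inc[bins], levels = inc, ordered = TRUE)
print(out)
```

If out-of-range values should land in the top group instead, one option is to extend the last break in binner (for example to Inf) so that every value falls in some interval.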