step_ordinalscore creates a specification of a recipe step that will convert ordinal factor variables into numeric scores.

    step_ordinalscore(
      recipe,
      ...,
      role = NA,
      trained = FALSE,
      columns = NULL,
      convert = as.numeric,
      skip = FALSE,
      id = rand_id("ordinalscore")
    )

    # S3 method for step_ordinalscore
    tidy(x, ...)

Argument | Description |
---|---|
recipe | A recipe object. The step will be added to the sequence of operations for this recipe. |
... | One or more selector functions to choose which variables are affected by the step. See |
role | Not used by this step since no new variables are created. |
trained | A logical to indicate if the quantities for preprocessing have been estimated. |
columns | A character string of variables that will be converted. This is NULL until the step is trained by prep(). |
convert | A function that takes an ordinal factor vector as an input and outputs a single numeric variable. |
skip | A logical. Should the step be skipped when the recipe is baked by bake()? |
id | A character string that is unique to this step to identify it. |
x | A step_ordinalscore object. |
An updated version of recipe with the new step added to the sequence of existing steps (if any). For the tidy method, a tibble with columns terms (the columns that will be affected) and id (the identifier of the step).
Dummy variables from ordered factors with C levels will create polynomial basis functions with C-1 terms. As an alternative, this step can be used to translate the ordered levels into a single numeric vector of values that represent (subjective) scores. By default, the translation uses a linear scale (1, 2, 3, ..., C), but custom score functions can also be used (see the example below).
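
The difference between the two encodings can be sketched with base R alone, without calling any recipes functions. The factor sizes and the function size_score below are hypothetical names made up for this illustration, and the scores 1, 3, 7 are an assumed spacing. The sketch assumes R's default contrast settings, under which ordered factors are coded with contr.poly.

```r
# Hypothetical ordered factor with C = 3 levels (illustration only).
size_lvls <- c("small", "medium", "large")
sizes <- factor(c("small", "large", "medium"),
                levels = size_lvls, ordered = TRUE)

# Polynomial contrasts for C levels have C - 1 columns
# (a linear and a quadratic term when C = 3).
poly_basis <- contr.poly(nlevels(sizes))
n_poly_terms <- ncol(poly_basis)

# The default conversion, as.numeric, returns each value's position
# in the level ordering: a linear scale 1, 2, ..., C.
linear_scores <- as.numeric(sizes)

# A custom score function (assumed spacing) maps each level to a
# chosen value by indexing with the integer level codes.
size_score <- function(x) {
  level_values <- c(1, 3, 7)
  level_values[as.numeric(x)]
}
custom_scores <- size_score(sizes)

print(n_poly_terms)   # 2
print(linear_scores)  # 1 3 2
print(custom_scores)  # 1 7 3
```

A function with the same shape as size_score, taking an ordered factor and returning one numeric vector, is what the convert argument expects.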

    library(recipes)

    fail_lvls <- c("meh", "annoying", "really_bad")

    ord_data <-
      data.frame(item = c("paperclip", "twitter", "airbag"),
                 fail_severity = factor(fail_lvls,
                                        levels = fail_lvls,
                                        ordered = TRUE))

    # Default encoding of an ordered factor: polynomial contrasts
    model.matrix(~fail_severity, data = ord_data)
    #>   (Intercept) fail_severity.L fail_severity.Q
    #> 1           1   -7.071068e-01       0.4082483
    #> 2           1   -7.850462e-17      -0.8164966
    #> 3           1    7.071068e-01       0.4082483
    #> attr(,"assign")
    #> [1] 0 1 1
    #> attr(,"contrasts")
    #> attr(,"contrasts")$fail_severity
    #> [1] "contr.poly"
    #>

    # Linear scores (1, 2, 3) from the default convert = as.numeric
    linear_values <- recipe(~ item + fail_severity, data = ord_data) %>%
      step_dummy(item) %>%
      step_ordinalscore(fail_severity)

    linear_values <- prep(linear_values, training = ord_data)

    bake(linear_values, new_data = NULL, everything())
    #> # A tibble: 3 x 3
    #>   fail_severity item_paperclip item_twitter
    #>           <dbl>          <dbl>        <dbl>
    #> 1             1              1            0
    #> 2             2              0            1
    #> 3             3              0            0

    # Custom score function: map the ordered levels to 1, 3, 7
    custom <- function(x) {
      new_values <- c(1, 3, 7)
      new_values[as.numeric(x)]
    }

    nonlin_scores <- recipe(~ item + fail_severity, data = ord_data) %>%
      step_dummy(item) %>%
      step_ordinalscore(fail_severity, convert = custom)

    tidy(nonlin_scores, number = 2)
    #> # A tibble: 1 x 2
    #>   terms         id
    #>   <chr>         <chr>
    #> 1 fail_severity ordinalscore_qIB0v

    nonlin_scores <- prep(nonlin_scores, training = ord_data)

    bake(nonlin_scores, new_data = NULL, everything())
    #> # A tibble: 3 x 3
    #>   fail_severity item_paperclip item_twitter
    #>           <dbl>          <dbl>        <dbl>
    #> 1             1              1            0
    #> 2             3              0            1
    #> 3             7              0            0

    tidy(nonlin_scores, number = 2)
    #> # A tibble: 1 x 2
    #>   terms         id
    #>   <chr>         <chr>
    #> 1 fail_severity ordinalscore_qIB0v