update_role() alters an existing role in the recipe or assigns an initial role to variables that do not yet have a declared role.

add_role() adds an additional role to variables that already have a role in the recipe. It does not overwrite old roles, as a single variable can have multiple roles.

remove_role() eliminates a single existing role in the recipe.
add_role(recipe, ..., new_role = "predictor", new_type = NULL)

update_role(recipe, ..., new_role = "predictor", old_role = NULL)

remove_role(recipe, ..., old_role)
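Argument | Description |
---|---|
recipe | An existing recipe. |
... | One or more selector functions to choose which variables are being assigned a role. See selections() for more details. |
new_role | A character string for a single role. |
new_type | A character string for the specific type that the variable should be identified as. If left as NULL, the type is automatically identified as the first type you see for that variable in summary(recipe). |
old_role | A character string for the specific role to update for the variables selected by the ... argument. |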
An updated recipe object.
update_role() should be used when a variable doesn't currently have a role in the recipe, or to replace an old_role with a new_role. add_role() only adds additional roles to variables that already have roles and will throw an error when the current role is missing (i.e. NA).
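The difference between the two functions on role-less columns can be sketched as follows. This is a minimal illustration rather than an excerpt from the package: it assumes the recipes and modeldata packages are installed, and the object names (rec, rec_roles, add_role_error) are hypothetical.

```r
library(recipes)
library(modeldata)
data(biomass)

# Without a formula, every column starts with a missing (NA) role,
# so update_role() is used to assign the initial roles.
rec <- recipe(biomass) %>%
  update_role(HHV, new_role = "outcome") %>%
  update_role(carbon, hydrogen, oxygen, nitrogen, sulfur,
              new_role = "predictor") %>%
  update_role(sample, dataset, new_role = "id variable")

rec_roles <- summary(rec)
rec_roles

# add_role() on a column whose role is still NA is expected to fail;
# capture the error message instead of stopping the script.
add_role_error <- tryCatch(
  {
    add_role(recipe(biomass), carbon, new_role = "something")
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
add_role_error
```

If the call behaves as described above, add_role_error holds the error text and rec_roles contains no missing roles.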
When using add_role(), if a variable is selected that already has the new_role, a warning is emitted and that variable is skipped so no duplicate roles are added.
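A short sketch of the skipping behavior, assuming the recipes and modeldata packages are installed. The role name "something" and the object names are illustrative only, and the exact wording of the warning may differ between package versions, so it is captured rather than shown.

```r
library(recipes)
library(modeldata)
data(biomass)

# `carbon` already carries the extra role "something".
rec <- recipe(HHV ~ ., data = biomass) %>%
  add_role(carbon, new_role = "something")

# Requesting the same role for `carbon` and `sulfur`: `carbon` should be
# skipped with a warning, while `sulfur` gains the role.
caught <- character(0)
rec_again <- withCallingHandlers(
  add_role(rec, carbon, sulfur, new_role = "something"),
  warning = function(w) {
    caught <<- c(caught, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

roles <- summary(rec_again)
roles[roles$variable %in% c("carbon", "sulfur"), ]
caught
```

Each of carbon and sulfur should appear exactly once with the role "something", alongside its original "predictor" row.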
Adding or updating roles is a useful way to group certain variables that don't fall in the standard "predictor" bucket. You can perform a step on all of the variables that have a custom role with the selector has_role().
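As a hedged illustration of selecting by custom role, the sketch below centers only the columns tagged with a made-up role. It assumes the recipes and modeldata packages are installed; the role name "major element" and the choice of step_center() are arbitrary picks for the example.

```r
library(recipes)
library(modeldata)
data(biomass)

rec <- recipe(HHV ~ ., data = biomass) %>%
  update_role(sample, new_role = "id variable") %>%
  update_role(dataset, new_role = "splitting variable") %>%
  # Tag two predictors with an additional, custom role ...
  add_role(carbon, hydrogen, new_role = "major element") %>%
  # ... and apply a step only to the columns carrying that role.
  step_center(has_role("major element"))

prepped <- prep(rec, training = biomass)
centered <- bake(prepped, new_data = biomass)

# Tagged columns are centered; untagged predictors are left as they were.
colMeans(centered[, c("carbon", "hydrogen", "oxygen")])
```

Because carbon and hydrogen keep their "predictor" role as well, they remain available to any step or model that selects predictors.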
library(recipes)
library(modeldata)
data(biomass)

# Using the formula method, roles are created for any outcomes and predictors:
recipe(HHV ~ ., data = biomass) %>%
  summary()
#> # A tibble: 8 x 4
#>   variable type    role      source
#>   <chr>    <chr>   <chr>     <chr>
#> 1 sample   nominal predictor original
#> 2 dataset  nominal predictor original
#> 3 carbon   numeric predictor original
#> 4 hydrogen numeric predictor original
#> 5 oxygen   numeric predictor original
#> 6 nitrogen numeric predictor original
#> 7 sulfur   numeric predictor original
#> 8 HHV      numeric outcome   original

# However `sample` and `dataset` aren't predictors. Since they already have
# roles, `update_role()` can be used to make changes:
recipe(HHV ~ ., data = biomass) %>%
  update_role(sample, new_role = "id variable") %>%
  update_role(dataset, new_role = "splitting variable") %>%
  summary()
#> # A tibble: 8 x 4
#>   variable type    role               source
#>   <chr>    <chr>   <chr>              <chr>
#> 1 sample   nominal id variable        original
#> 2 dataset  nominal splitting variable original
#> 3 carbon   numeric predictor          original
#> 4 hydrogen numeric predictor          original
#> 5 oxygen   numeric predictor          original
#> 6 nitrogen numeric predictor          original
#> 7 sulfur   numeric predictor          original
#> 8 HHV      numeric outcome            original

# `update_role()` cannot set a role to NA; use `remove_role()` for that:
if (FALSE) {
  recipe(HHV ~ ., data = biomass) %>%
    update_role(sample, new_role = NA_character_)
}

# ------------------------------------------------------------------------------

# Variables can have more than one role. `add_role()` can be used
# if the column already has at least one role:
recipe(HHV ~ ., data = biomass) %>%
  add_role(carbon, sulfur, new_role = "something") %>%
  summary()
#> # A tibble: 10 x 4
#>    variable type    role      source
#>    <chr>    <chr>   <chr>     <chr>
#>  1 sample   nominal predictor original
#>  2 dataset  nominal predictor original
#>  3 carbon   numeric predictor original
#>  4 carbon   numeric something original
#>  5 hydrogen numeric predictor original
#>  6 oxygen   numeric predictor original
#>  7 nitrogen numeric predictor original
#>  8 sulfur   numeric predictor original
#>  9 sulfur   numeric something original
#> 10 HHV      numeric outcome   original

# `update_role()` has an argument called `old_role` that is required to
# unambiguously update a role when the column currently has multiple roles.
recipe(HHV ~ ., data = biomass) %>%
  add_role(carbon, new_role = "something") %>%
  update_role(carbon, new_role = "something else", old_role = "something") %>%
  summary()
#> # A tibble: 9 x 4
#>   variable type    role           source
#>   <chr>    <chr>   <chr>          <chr>
#> 1 sample   nominal predictor      original
#> 2 dataset  nominal predictor      original
#> 3 carbon   numeric predictor      original
#> 4 carbon   numeric something else original
#> 5 hydrogen numeric predictor      original
#> 6 oxygen   numeric predictor      original
#> 7 nitrogen numeric predictor      original
#> 8 sulfur   numeric predictor      original
#> 9 HHV      numeric outcome        original

# `carbon` has two roles at the end, so the last `update_role()` fails since
# `old_role` was not given.
if (FALSE) {
  recipe(HHV ~ ., data = biomass) %>%
    add_role(carbon, sulfur, new_role = "something") %>%
    update_role(carbon, new_role = "something else")
}

# ------------------------------------------------------------------------------

# To remove a role, `remove_role()` can be used to remove a single role.
recipe(HHV ~ ., data = biomass) %>%
  add_role(carbon, new_role = "something") %>%
  remove_role(carbon, old_role = "something") %>%
  summary()
#> # A tibble: 8 x 4
#>   variable type    role      source
#>   <chr>    <chr>   <chr>     <chr>
#> 1 sample   nominal predictor original
#> 2 dataset  nominal predictor original
#> 3 carbon   numeric predictor original
#> 4 hydrogen numeric predictor original
#> 5 oxygen   numeric predictor original
#> 6 nitrogen numeric predictor original
#> 7 sulfur   numeric predictor original
#> 8 HHV      numeric outcome   original

# To remove all roles, call `remove_role()` multiple times to reset to `NA`
recipe(HHV ~ ., data = biomass) %>%
  add_role(carbon, new_role = "something") %>%
  remove_role(carbon, old_role = "something") %>%
  remove_role(carbon, old_role = "predictor") %>%
  summary()
#> # A tibble: 8 x 4
#>   variable type    role      source
#>   <chr>    <chr>   <chr>     <chr>
#> 1 sample   nominal predictor original
#> 2 dataset  nominal predictor original
#> 3 carbon   numeric NA        original
#> 4 hydrogen numeric predictor original
#> 5 oxygen   numeric predictor original
#> 6 nitrogen numeric predictor original
#> 7 sulfur   numeric predictor original
#> 8 HHV      numeric outcome   original

# ------------------------------------------------------------------------------

# If the formula method is not used, all columns have a missing role:
recipe(biomass) %>%
  summary()
#> # A tibble: 8 x 4
#>   variable type    role  source
#>   <chr>    <chr>   <lgl> <chr>
#> 1 sample   nominal NA    original
#> 2 dataset  nominal NA    original
#> 3 carbon   numeric NA    original
#> 4 hydrogen numeric NA    original
#> 5 oxygen   numeric NA    original
#> 6 nitrogen numeric NA    original
#> 7 sulfur   numeric NA    original
#> 8 HHV      numeric NA    original