
has_role(), all_predictors(), and all_outcomes() can be used to select variables in a formula that have certain roles.

Similarly, has_type(), all_numeric(), and all_nominal() are used to select columns based on their data type. Nominal variables include both character and factor.
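As a rough illustration of type-based selection, the sketch below uses a small made-up data frame (`toy` and its columns are hypothetical, not part of the package). It assumes a numeric outcome, so all_numeric() picks up the outcome as well as the numeric predictor, while all_nominal() picks up the factor column.

```r
library(recipes)

# Hypothetical toy data: one numeric predictor, one factor, numeric outcome
toy <- data.frame(
  y   = c(1.2, 3.4, 2.2, 5.1),
  x1  = c(10, 20, 30, 40),
  grp = factor(c("a", "b", "a", "b"))
)

rec_type <- recipe(y ~ ., data = toy) %>%
  # all_numeric() selects on type only, so the outcome y is centered too
  step_center(all_numeric()) %>%
  # all_nominal() selects the factor column grp
  step_dummy(all_nominal())

baked_type <- rec_type %>%
  prep() %>%
  bake(new_data = NULL)

names(baked_type)
```

Because the type selectors ignore roles, they can transform outcome columns unintentionally; that is worth checking whenever they are used on their own.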

In most cases, the selectors all_numeric_predictors() and all_nominal_predictors(), which select on role and type, will be the right approach for users.
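A minimal sketch of the combined role-and-type selector follows. The data frame `toy` is invented for illustration; the point is only that all_numeric_predictors() leaves the numeric outcome and the factor predictor untouched.

```r
library(recipes)

# Hypothetical toy data (not from the package)
toy <- data.frame(
  y   = c(1.2, 3.4, 2.2, 5.1),
  x1  = c(10, 20, 30, 40),
  x2  = c(0.5, 0.1, 0.9, 0.3),
  grp = factor(c("a", "b", "a", "b"))
)

rec_pred <- recipe(y ~ ., data = toy) %>%
  # Selects columns that are both numeric and predictors: x1 and x2
  step_center(all_numeric_predictors())

centered <- rec_pred %>%
  prep() %>%
  bake(new_data = NULL)

centered
```

Here x1 and x2 are centered around zero, while y keeps its original values and grp remains a factor.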

See selections for more details.

current_info() is an internal function.

All of these functions have limited utility outside of column selection in step functions.

Usage

has_role(match = "predictor")

all_predictors()

all_numeric_predictors()

all_nominal_predictors()

all_outcomes()

has_type(match = "numeric")

all_numeric()

all_nominal()

current_info()

Arguments

match

A single character string for the query. Exact matching is used (i.e. regular expressions won't work).
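To make the exact-matching rule concrete, here is a small sketch that assumes a hypothetical data frame `toy_id` with a column assigned the custom role "id variable". The full role string has to be supplied; per the description above, a fragment such as "id" would not be treated as a pattern.

```r
library(recipes)

# Hypothetical toy data with an identifier column
toy_id <- data.frame(
  id = c("s1", "s2", "s3"),
  x  = c(1, 2, 3),
  y  = c(2.0, 4.1, 6.2)
)

rec_role <- recipe(y ~ ., data = toy_id) %>%
  update_role(id, new_role = "id variable") %>%
  # The match string must equal the role exactly
  step_rm(has_role(match = "id variable"))

baked_role <- rec_role %>%
  prep() %>%
  bake(new_data = NULL)

names(baked_role)
```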

Value

Selector functions return an integer vector.

current_info() returns an environment with objects vars and data.

Examples

library(recipes)

data(biomass, package = "modeldata")

rec <- recipe(biomass) %>%
  update_role(
    carbon, hydrogen, oxygen, nitrogen, sulfur,
    new_role = "predictor"
  ) %>%
  update_role(HHV, new_role = "outcome") %>%
  update_role(sample, new_role = "id variable") %>%
  update_role(dataset, new_role = "splitting indicator")

recipe_info <- summary(rec)
recipe_info
#> # A tibble: 8 × 4
#>   variable type    role                source  
#>   <chr>    <chr>   <chr>               <chr>   
#> 1 sample   nominal id variable         original
#> 2 dataset  nominal splitting indicator original
#> 3 carbon   numeric predictor           original
#> 4 hydrogen numeric predictor           original
#> 5 oxygen   numeric predictor           original
#> 6 nitrogen numeric predictor           original
#> 7 sulfur   numeric predictor           original
#> 8 HHV      numeric outcome             original

# Centering on all predictors except carbon
rec %>%
  step_center(all_predictors(), -carbon) %>%
  prep(training = biomass) %>%
  bake(new_data = NULL)
#> # A tibble: 536 × 8
#>    sample            dataset carbon hydrogen oxygen nitrogen  sulfur   HHV
#>    <fct>             <fct>    <dbl>    <dbl>  <dbl>    <dbl>   <dbl> <dbl>
#>  1 Akhrot Shell      Traini…   49.8   0.181   4.37   -0.667  -0.234   20.0
#>  2 Alabama Oak Wood… Traini…   49.5   0.241   2.73   -0.877  -0.234   19.2
#>  3 Alder             Traini…   47.8   0.341   7.68   -0.967  -0.214   18.3
#>  4 Alfalfa           Traini…   45.1  -0.489  -2.97    2.22   -0.0736  18.2
#>  5 Alfalfa Seed Str… Traini…   46.8  -0.0586  2.15   -0.0772 -0.214   18.4
#>  6 Alfalfa Stalks    Traini…   45.4   0.291   1.63    0.963  -0.134   18.5
#>  7 Alfalfa Stems     Traini…   47.2   0.531  -0.383   1.60   -0.0336  18.7
#>  8 Alfalfa Straw     Traini…   45.7   0.241   1.13    0.623  -0.0336  18.3
#>  9 Almond            Traini…   48.8   0.0414  2.33   -0.277  -0.234   18.6
#> 10 Almond Hull       Traini…   47.1   0.441   1.43    0.123  -0.134   18.9
#> # … with 526 more rows