step_range creates a specification of a recipe step that will normalize numeric data to be within a pre-defined range of values.

    step_range(
      recipe,
      ...,
      role = NA,
      trained = FALSE,
      min = 0,
      max = 1,
      ranges = NULL,
      skip = FALSE,
      id = rand_id("range")
    )

    # S3 method for step_range
    tidy(x, ...)

Argument | Description |
---|---|
recipe | A recipe object. The step will be added to the sequence of operations for this recipe. |
... | One or more selector functions to choose which variables will be scaled. See |
role | Not used by this step since no new variables are created. |
trained | A logical to indicate if the quantities for preprocessing have been estimated. |
min | A single numeric value for the smallest value in the range. |
max | A single numeric value for the largest value in the range. |
ranges | A character vector of variables that will be normalized. Note that this is ignored until the values are determined by |
skip | A logical. Should the step be skipped when the recipe is baked by |
id | A character string that is unique to this step to identify it. |
x | A |
An updated version of recipe with the new step added to the sequence of existing steps (if any). For the tidy method, a tibble with columns terms (the selectors or variables selected), min, and max.
When a new data point is outside of the ranges seen in the training set, the new values are truncated at min or max.
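The rescaling and truncation described above can be sketched in base R. This is an illustration only, assuming the usual min-max formula; rescale_to_range() and the toy vectors are hypothetical and are not part of recipes.

```r
# Hypothetical helper (not the recipes implementation): rescale `x` using the
# minimum and maximum seen in the training data, then truncate to [min, max].
rescale_to_range <- function(x, train_min, train_max, min = 0, max = 1) {
  scaled <- (x - train_min) / (train_max - train_min) * (max - min) + min
  # Values that fall outside the training range are clipped to the bounds.
  pmin(pmax(scaled, min), max)
}

# Toy data, made up for illustration.
train <- c(10, 20, 30)
new_values <- c(5, 15, 30, 40)

rescaled <- rescale_to_range(new_values, min(train), max(train))
print(rescaled)
```

Here 5 and 40 lie outside the training range of 10 to 30, so they end up at the lower and upper bound respectively instead of below 0 or above 1.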

    library(recipes)
    library(modeldata)
    data(biomass)

    # Split the data into training and test sets
    biomass_tr <- biomass[biomass$dataset == "Training", ]
    biomass_te <- biomass[biomass$dataset == "Testing", ]

    rec <- recipe(
      HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
      data = biomass_tr
    )

    # Rescale carbon and hydrogen to the default range [0, 1]
    ranged_trans <- rec %>%
      step_range(carbon, hydrogen)

    # Estimate the training-set minimum and maximum of each variable
    ranged_obj <- prep(ranged_trans, training = biomass_tr)

    # Apply the trained step to the test set
    transformed_te <- bake(ranged_obj, biomass_te)

    # Original test values
    biomass_te[1:10, names(transformed_te)]
    #>    carbon hydrogen oxygen nitrogen sulfur    HHV
    #> 15  46.35     5.67  47.20     0.30   0.22 18.275
    #> 20  43.25     5.50  48.06     2.85   0.34 17.560
    #> 26  42.70     5.50  49.10     2.40   0.30 17.173
    #> 31  46.40     6.10  37.30     1.80   0.50 18.851
    #> 36  48.76     6.32  42.77     0.20   0.00 20.547
    #> 41  44.30     5.50  41.70     0.70   0.20 18.467
    #> 46  38.94     5.23  54.13     1.19   0.51 15.095
    #> 51  42.10     4.66  33.80     0.95   0.20 16.240
    #> 55  29.20     4.40  31.10     0.14   4.90 11.147
    #> 65  27.80     3.77  23.69     4.63   1.05 10.750

    # Rescaled test values
    transformed_te
    #> # A tibble: 80 x 6
    #>    carbon hydrogen oxygen nitrogen sulfur   HHV
    #>     <dbl>    <dbl>  <dbl>    <dbl>  <dbl> <dbl>
    #>  1  0.384    0.490   47.2     0.3    0.22  18.3
    #>  2  0.347    0.475   48.1     2.85   0.34  17.6
    #>  3  0.340    0.475   49.1     2.4    0.3   17.2
    #>  4  0.385    0.527   37.3     1.8    0.5   18.9
    #>  5  0.414    0.546   42.8     0.2    0     20.5
    #>  6  0.360    0.475   41.7     0.7    0.2   18.5
    #>  7  0.295    0.451   54.1     1.19   0.51  15.1
    #>  8  0.333    0.402   33.8     0.95   0.2   16.2
    #>  9  0.177    0.379   31.1     0.14   4.9   11.1
    #> 10  0.160    0.325   23.7     4.63   1.05  10.8
    #> # … with 70 more rows

    # Before prep(), the minimum and maximum have not been estimated
    tidy(ranged_trans, number = 1)
    #> # A tibble: 2 x 4
    #>   terms      min   max id
    #>   <chr>    <dbl> <dbl> <chr>
    #> 1 carbon      NA    NA range_gI6r3
    #> 2 hydrogen    NA    NA range_gI6r3

    # After prep(), tidy() reports the training-set minimum and maximum
    tidy(ranged_obj, number = 1)
    #> # A tibble: 2 x 4
    #>   terms      min   max id
    #>   <chr>    <dbl> <dbl> <chr>
    #> 1 carbon   14.6   97.2 range_gI6r3
    #> 2 hydrogen  0.03  11.6 range_gI6r3

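As a rough cross-check of the printed output, the rescaled carbon column can be approximated by hand in base R. This sketch assumes the usual min-max formula and uses the rounded training minimum and maximum shown by tidy() (14.6 and 97.2), so it is an approximation and not the package's own computation; the variable names are illustrative.

```r
# First ten carbon values of the test set, copied from the printed output.
carbon_te <- c(46.35, 43.25, 42.70, 46.40, 48.76,
               44.30, 38.94, 42.10, 29.20, 27.80)

# Training-set minimum and maximum for carbon, as rounded in the tidy() output.
carbon_min <- 14.6
carbon_max <- 97.2

# Min-max rescaling to the default target range [0, 1].
carbon_scaled <- (carbon_te - carbon_min) / (carbon_max - carbon_min)

# Values of the carbon column in the printed transformed_te tibble.
carbon_printed <- c(0.384, 0.347, 0.340, 0.385, 0.414,
                    0.360, 0.295, 0.333, 0.177, 0.160)

print(round(carbon_scaled, 3))
print(isTRUE(all.equal(round(carbon_scaled, 3), carbon_printed)))
```

The hydrogen column is left out here because its printed minimum and maximum are rounded too coarsely to reproduce the rescaled values to three decimals.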