check_class() creates a specification of a recipe check that verifies whether a variable is of a designated class.

check_class(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  class_nm = NULL,
  allow_additional = FALSE,
  skip = FALSE,
  class_list = NULL,
  id = rand_id("class")
)

Arguments

recipe

A recipe object. The check will be added to the sequence of operations for this recipe.

...

One or more selector functions to choose variables for this check. See selections() for more details.

role

Not used by this check since no new variables are created.

trained

A logical for whether the selectors in ... have been resolved by prep().

class_nm

A character vector that will be used in inherits() to check the class. If NULL, the classes will be learned in prep(). Can contain more than one class.

allow_additional

If TRUE, a variable is allowed to have classes in addition to the one(s) being checked.

skip

A logical. Should the check be skipped when the recipe is baked by bake.recipe()? While all operations are baked when prep.recipe() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.

class_list

A named list of column classes. This is NULL until computed by prep.recipe().

id

A character string that is unique to this check to identify it.

Value

An updated version of recipe with the new check added to the sequence of any existing operations.

Details

This function can check the classes of the variables in two ways. When the class_nm argument is provided, it checks whether all the specified variables are of the given class. If this argument is NULL, the check learns the classes of each specified variable in prep(). Either way, bake() fails if the variables are not of the requested class. If a variable has multiple classes in prep(), all of them are checked. Note that the strings_as_factors argument of prep() defaults to TRUE, so if the training set contains character variables, the check will break bake() when strings_as_factors is TRUE.
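The matching rule described above can be sketched in base R. This is only an illustration under stated assumptions: classes_match() is a hypothetical helper, not a recipes function, and it is not taken from the package's implementation. It assumes that every requested class must be present and that any other class is rejected unless allow_additional = TRUE.

```r
# Hypothetical helper, not part of recipes: sketches the matching rule only.
classes_match <- function(x, class_nm, allow_additional = FALSE) {
  actual <- class(x)
  # Every requested class must be present on the variable.
  has_all_requested <- all(class_nm %in% actual)
  # Unless allow_additional = TRUE, no other classes may be present.
  has_no_extras <- all(actual %in% class_nm)
  has_all_requested && (allow_additional || has_no_extras)
}

now <- Sys.time()  # has the classes "POSIXct" and "POSIXt"

# "POSIXct" counts as an extra class when only "POSIXt" is requested.
strict  <- classes_match(now, "POSIXt")
lenient <- classes_match(now, "POSIXt", allow_additional = TRUE)
both    <- classes_match(now, c("POSIXct", "POSIXt"))

# A character column and its factor-converted counterpart have different
# classes, so a class learned from one does not match the other.
chr_vs_fct <- classes_match(factor(c("a", "b")), class(c("a", "b")))
```

Under these assumptions, strict and chr_vs_fct are FALSE while lenient and both are TRUE. The last case mirrors why strings_as_factors = TRUE can make the check fail for character variables.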

When you tidy() this check, a tibble with columns terms (the selectors or variables selected) and value (the type) is returned.
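As a minimal sketch of the tidy() behaviour described above, the following learns the classes of two columns and then inspects the check. The data frame df and its columns x and z are made up for illustration, and the exact printed output may differ between recipes versions.

```r
library(recipes)

# Toy data with one integer and one double column (illustrative only).
df <- data.frame(x = 1:3, z = c(0.5, 1, 1.5))

rec <- recipe(~ ., data = df) %>%
  check_class(x, z) %>%   # class_nm = NULL, so classes are learned in prep()
  prep(training = df)

# One row per checked variable; `terms` names it and `value` holds its class.
class_tbl <- tidy(rec, number = 1)
class_tbl
```

The number argument selects which operation in the recipe to tidy; here the check is the first and only one.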

Examples

library(recipes)
library(dplyr)
library(modeldata)
data(okc)

# Learn the classes on the train set
train <- okc[1:1000, ]
test  <- okc[1001:2000, ]
recipe(train, age ~ . ) %>%
  check_class(everything()) %>%
  prep(train, strings_as_factors = FALSE) %>%
  bake(test)
#> # A tibble: 1,000 × 6
#>    diet                height location      date       Class   age
#>    <chr>                <int> <chr>         <date>     <fct> <int>
#>  1 NA                      66 san francisco 2012-06-27 other    34
#>  2 strictly anything       74 berkeley      2012-06-26 stem     23
#>  3 mostly anything         67 san francisco 2012-06-29 other    23
#>  4 anything                72 burlingame    2012-06-29 other    45
#>  5 strictly vegetarian     71 oakland       2012-06-29 stem     35
#>  6 mostly anything         66 san francisco 2011-10-17 other    20
#>  7 NA                      66 san francisco 2012-04-10 stem     39
#>  8 anything                67 san francisco 2012-06-29 other    39
#>  9 mostly anything         67 oakland       2012-06-29 other    35
#> 10 mostly anything         64 san francisco 2012-06-26 other    33
#> # … with 990 more rows

# Manual specification
recipe(train, age ~ .) %>%
  check_class(age, class_nm = "integer") %>%
  check_class(diet, location, class_nm = "character") %>%
  check_class(date, class_nm = "Date") %>%
  prep(train, strings_as_factors = FALSE) %>%
  bake(test)
#> # A tibble: 1,000 × 6
#>    diet                height location      date       Class   age
#>    <chr>                <int> <chr>         <date>     <fct> <int>
#>  1 NA                      66 san francisco 2012-06-27 other    34
#>  2 strictly anything       74 berkeley      2012-06-26 stem     23
#>  3 mostly anything         67 san francisco 2012-06-29 other    23
#>  4 anything                72 burlingame    2012-06-29 other    45
#>  5 strictly vegetarian     71 oakland       2012-06-29 stem     35
#>  6 mostly anything         66 san francisco 2011-10-17 other    20
#>  7 NA                      66 san francisco 2012-04-10 stem     39
#>  8 anything                67 san francisco 2012-06-29 other    39
#>  9 mostly anything         67 oakland       2012-06-29 other    35
#> 10 mostly anything         64 san francisco 2012-06-26 other    33
#> # … with 990 more rows

# By default only the classes that are specified
#   are allowed.
x_df <- tibble(time = c(Sys.time() - 60, Sys.time()))
x_df$time %>% class()
#> [1] "POSIXct" "POSIXt" 
# Not run: this pipeline errors because the column also has class "POSIXct".
if (FALSE) {
recipe(x_df) %>%
  check_class(time, class_nm = "POSIXt") %>%
  prep(x_df) %>%
  bake(x_df)
}

# Use allow_additional = TRUE if additional classes are acceptable
recipe(x_df) %>%
  check_class(time, class_nm = "POSIXt", allow_additional = TRUE) %>%
  prep(x_df) %>%
  bake(x_df)
#> # A tibble: 2 × 1
#>   time               
#>   <dttm>             
#> 1 2021-09-27 20:25:46
#> 2 2021-09-27 20:26:46
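
The failing case from the examples (requesting only "POSIXt" for a column that also has class "POSIXct") can be observed without aborting a script by wrapping the pipeline in tryCatch(). This is a sketch rather than part of the original examples. It uses a plain data frame instead of a tibble, and the wording of the error message is not shown because it depends on the recipes version.

```r
library(recipes)

x_df <- data.frame(time = c(Sys.time() - 60, Sys.time()))

# Capture the error instead of stopping, so the failure can be inspected.
result <- tryCatch(
  recipe(x_df) %>%
    check_class(time, class_nm = "POSIXt") %>%
    prep(x_df) %>%
    bake(x_df),
  error = function(e) e
)

# Report the message only if the pipeline did signal an error.
if (inherits(result, "error")) {
  message("check_class failed: ", conditionMessage(result))
}
```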