check_class() creates a specification of a recipe check that verifies whether a variable is of a designated class.

Usage

check_class(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  class_nm = NULL,
  allow_additional = FALSE,
  skip = FALSE,
  class_list = NULL,
  id = rand_id("class")
)

Arguments

recipe

A recipe object. The check will be added to the sequence of operations for this recipe.

...

One or more selector functions to choose variables for this check. See selections() for more details.

role

Not used by this check since no new variables are created.

trained

A logical for whether the selectors in ... have been resolved by prep().

class_nm

A character vector that will be used in inherits() to check the class. If NULL, the classes will be learned in prep(). Can contain more than one class.

allow_additional

If TRUE, a variable may have classes in addition to the one(s) being checked.

skip

A logical. Should the check be skipped when the recipe is baked by bake()? While all operations are baked when prep() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.

class_list

A named list of column classes. This is NULL until computed by prep().

id

A character string that is unique to this check to identify it.

Value

An updated version of recipe with the new check added to the sequence of any existing operations.

Details

This function can check the classes of the variables in two ways. When the class_nm argument is provided, it checks whether all the selected variables are of the given class. If this argument is NULL, the check learns the classes of each selected variable in prep(). Either way, bake() fails if the variables are not of the requested class. If a variable has multiple classes in prep(), all the classes are checked. Note that in prep() the argument strings_as_factors defaults to TRUE. If the training set contains character variables, the check will break bake() when strings_as_factors is TRUE, because those columns are converted to factors and no longer match the learned class.
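The matching rule described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: check_class_sketch() is a hypothetical helper that compares class(x) with the requested classes and, unless allow_additional is TRUE, rejects any extra classes.

```r
# Hypothetical helper for illustration only; it is not part of recipes.
# Returns TRUE when `x` would pass the check, FALSE otherwise.
check_class_sketch <- function(x, class_nm, allow_additional = FALSE) {
  actual <- class(x)
  # Requested classes that the variable does not have
  missing_cls <- setdiff(class_nm, actual)
  # Classes the variable has beyond the requested ones
  extra_cls <- setdiff(actual, class_nm)
  length(missing_cls) == 0 && (allow_additional || length(extra_cls) == 0)
}

now <- Sys.time() # has classes "POSIXct" and "POSIXt"

check_class_sketch(now, "POSIXt")
#> [1] FALSE
check_class_sketch(now, "POSIXt", allow_additional = TRUE)
#> [1] TRUE
check_class_sketch(now, c("POSIXct", "POSIXt"))
#> [1] TRUE
```

The real check raises an error instead of returning FALSE. The sketch returns a logical so that the rule is easy to inspect.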

Tidying

When you tidy() this check, a tibble with columns terms (the selectors or variables selected) and value (the type) is returned.
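As a minimal sketch, tidying can be tried on the built-in iris data, which is used here only to keep the example self-contained. The object names rec, prepped, and learned are illustrative, and the output is not shown because it may differ between package versions.

```r
library(recipes)

# Add the check without class_nm so the classes are learned in prep()
rec <- recipe(Species ~ ., data = iris) %>%
  check_class(everything())

# Before prep(): the selectors have not been resolved yet
tidy(rec, number = 1)

# After prep(): one row per checked variable
prepped <- prep(rec, training = iris)
learned <- tidy(prepped, number = 1)
learned
```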

Case weights

The underlying operation does not allow for case weights.

Examples

library(recipes)
library(dplyr)
data(Sacramento, package = "modeldata")

# Learn the classes on the train set
train <- Sacramento[1:500, ]
test <- Sacramento[501:nrow(Sacramento), ]
recipe(train, sqft ~ .) %>%
  check_class(everything()) %>%
  prep(train, strings_as_factors = FALSE) %>%
  bake(test)
#> # A tibble: 432 × 9
#>    city           zip     beds baths type   price latitude longitude  sqft
#>    <fct>          <fct>  <int> <dbl> <fct>  <int>    <dbl>     <dbl> <int>
#>  1 SACRAMENTO     z95834     4   2   Resi… 328578     38.6     -122.  1659
#>  2 ELK_GROVE      z95757     3   3   Resi… 331000     38.4     -121.  2442
#>  3 RANCHO_CORDOVA z95742     4   3   Resi… 331500     38.6     -121.  2590
#>  4 SACRAMENTO     z95833     4   2   Resi… 340000     38.6     -122.  2155
#>  5 SACRAMENTO     z95838     3   2   Resi… 344755     38.7     -121.  1673
#>  6 SACRAMENTO     z95828     3   2   Resi… 345746     38.5     -121.  1810
#>  7 ELK_GROVE      z95757     4   2   Resi… 351000     38.4     -121.  2789
#>  8 GALT           z95632     4   2   Resi… 353767     38.3     -121.  1606
#>  9 GALT           z95632     5   3.5 Resi… 355000     38.3     -121.  3499
#> 10 SACRAMENTO     z95835     4   2   Resi… 356035     38.7     -122.  2166
#> # ℹ 422 more rows

# Manual specification
recipe(train, sqft ~ .) %>%
  check_class(sqft, class_nm = "integer") %>%
  check_class(city, zip, type, class_nm = "factor") %>%
  check_class(latitude, longitude, class_nm = "numeric") %>%
  prep(train, strings_as_factors = FALSE) %>%
  bake(test)
#> # A tibble: 432 × 9
#>    city           zip     beds baths type   price latitude longitude  sqft
#>    <fct>          <fct>  <int> <dbl> <fct>  <int>    <dbl>     <dbl> <int>
#>  1 SACRAMENTO     z95834     4   2   Resi… 328578     38.6     -122.  1659
#>  2 ELK_GROVE      z95757     3   3   Resi… 331000     38.4     -121.  2442
#>  3 RANCHO_CORDOVA z95742     4   3   Resi… 331500     38.6     -121.  2590
#>  4 SACRAMENTO     z95833     4   2   Resi… 340000     38.6     -122.  2155
#>  5 SACRAMENTO     z95838     3   2   Resi… 344755     38.7     -121.  1673
#>  6 SACRAMENTO     z95828     3   2   Resi… 345746     38.5     -121.  1810
#>  7 ELK_GROVE      z95757     4   2   Resi… 351000     38.4     -121.  2789
#>  8 GALT           z95632     4   2   Resi… 353767     38.3     -121.  1606
#>  9 GALT           z95632     5   3.5 Resi… 355000     38.3     -121.  3499
#> 10 SACRAMENTO     z95835     4   2   Resi… 356035     38.7     -122.  2166
#> # ℹ 422 more rows

# By default only the classes that are specified
#   are allowed.
x_df <- tibble(time = c(Sys.time() - 60, Sys.time()))
x_df$time %>% class()
#> [1] "POSIXct" "POSIXt" 
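# Not run: this is expected to fail because time also has
#   class "POSIXct", which is not listed in class_nm.
if (FALSE) {
recipe(x_df) %>%
  check_class(time, class_nm = "POSIXt") %>%
  prep(x_df) %>%
  bake(x_df)
}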

# Use allow_additional = TRUE to permit classes beyond those in class_nm
recipe(x_df) %>%
  check_class(time, class_nm = "POSIXt", allow_additional = TRUE) %>%
  prep(x_df) %>%
  bake(x_df)
#> # A tibble: 2 × 1
#>   time               
#>   <dttm>             
#> 1 2024-07-05 14:28:24
#> 2 2024-07-05 14:29:24