Documentation for tidy methods for all steps has been added when missing and improved to describe the return value more accurately. (#936)
When errors are thrown about wrongly typed input to steps, the offending variables and their types are now listed. (#1217)
All warnings and errors have been updated to use the cli package for increased clarity and consistency. (#1237)
CRAN release: 2023-08-25
- Minor speed-up and reduced memory consumption for spline steps that rely on
CRAN release: 2023-08-10
keep_original_cols argument has been added to
step_window(). The default for each step is set to preserve past behavior. This change should mean that every step that produces new columns has the
Fixed bugs where
step_regex() didn’t work with empty selections. All steps now leave data unmodified when given empty selections. (#1142)
step_spline_nonnegative() now correctly return a zero-row tibble when used with an empty selection. (#1133)
step_string2factor() now throw an informative error if needed non-standard role columns are missing during
CRAN release: 2023-04-25
Steps with tunable arguments now have those arguments listed in the documentation.
All steps that add new columns will now informatively error if a name collision occurs. (#983)
CRAN release: 2023-02-20
step_percentile() to determine different ways of handling values outside the range of the training data.
Print methods have been updated to use the cli package for formatting. (#426)
Print methods no longer error for untrained recipes with long selections. (#1083)
generics::tune_args() are now registered unconditionally (tidymodels/workflows#192).
CRAN release: 2023-01-11
check_type() got a new types argument for more precise checking of column types.
recipes_extension_check() has been added. This developer-focused function checks that steps have all the required S3 methods.
CRAN release: 2022-11-09
Variable types have been made more granular.
"nominal" has been split into
"numeric" has been split into
all_datetime(), in addition to the existing
all_nominal(). All selectors come with a
step_range() has gained a clipping argument that, when set to FALSE, no longer clips the data to be between
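A minimal sketch of the clipping argument, assuming a recipes release from November 2022 or later, R 4.1 or later (for the native pipe), and a hypothetical one-predictor data frame. A new value outside the training range is capped at the upper bound by default, but extrapolates when clipping = FALSE:

```r
library(recipes)

# Toy training data: x spans 1..10, so step_range() learns min = 1 and max = 10
train <- data.frame(y = 1:10, x = as.numeric(1:10))
# A new observation that lies outside the training range
new_obs <- data.frame(y = 11L, x = 20)

# clipping = TRUE is the default
clipped <- recipe(y ~ x, data = train) |>
  step_range(x, min = 0, max = 1) |>
  prep()

# Opt out of clipping
unclipped <- recipe(y ~ x, data = train) |>
  step_range(x, min = 0, max = 1, clipping = FALSE) |>
  prep()

bake(clipped, new_data = new_obs)$x    # capped at the upper bound of 1
bake(unclipped, new_data = new_obs)$x  # (20 - 1) / (10 - 1), about 2.11
```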
CRAN release: 2022-10-15
CRAN release: 2022-07-07
CRAN release: 2022-07-01
Added support for case weights in the following steps
recipes now checks that all columns in the
recipe() are also present in the
bake(). An exception is made for columns with roles of either
"case_weights", which are typically not required at bake() time. The new update_role_requirements() function can be used to adjust whether or not columns of a particular role are required at bake() time if you need to opt out of this check (#1011).
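A minimal sketch of opting out of the column check, assuming a recipes release from July 2022 or later and R 4.1 or later; the data frame and its label column are hypothetical. A column given the "id" role is required at bake() time until update_role_requirements() relaxes that:

```r
library(recipes)

# Toy data with an identifier column that is not used for modeling
df <- data.frame(y = c(1, 2, 3), x = c(4, 5, 6), label = c("a", "b", "c"))

rec <- recipe(y ~ ., data = df) |>
  update_role(label, new_role = "id") |>
  step_center(x)

# New data that lacks the "id" column
new_df <- data.frame(y = c(4, 5), x = c(7, 8))

# By default, bake() errors because the "id" column is missing
strict <- prep(rec, training = df)
res <- tryCatch(bake(strict, new_data = new_df), error = function(e) e)

# Columns with the "id" role are no longer required at bake() time
relaxed <- rec |>
  update_role_requirements("id", bake = FALSE) |>
  prep(training = df)
baked <- bake(relaxed, new_data = new_df)
```

Setting the requirement on the unprepped recipe, as here, keeps the choice visible in the recipe definition.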
step_downsample() in recipes as they are now available in the themis package.
step_naomit() now actually have their defaults for
TRUE, as was stated in release 0.1.13. (#934)
CRAN release: 2022-02-18
step_dummy_extract() creates multiple variables from a character variable by extracting elements using regular expressions and counting those elements.
All recipe steps now officially support empty selections to be more aligned with dplyr and other packages that use tidyselect (#603, #531). For example, if a previous step removed all of the columns needed for a later step, the recipe does not fail when it is estimated (with the exception of step_mutate()). The documentation in ?selections has been updated with advice for writing selectors when filtering steps are used. (#813)
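A minimal sketch of an empty selection, assuming a recipes release from February 2022 or later and R 4.1 or later. The built-in mtcars data has no nominal columns, so the selector below matches nothing and the step leaves the data unchanged instead of failing:

```r
library(recipes)

# all_nominal_predictors() selects no columns in mtcars
rec <- recipe(mpg ~ ., data = mtcars) |>
  step_dummy(all_nominal_predictors()) |>
  prep()

baked <- bake(rec, new_data = NULL)
dim(baked)  # same number of rows and columns as mtcars
```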
Improved the efficiency of computations for the Box-Cox transformation (#820).
fastICA() using a specific set of random numbers so that initialization is reproducible.
step_kpca*() now directly use the kernlab package. Recipe objects from previous versions will error when applied to new data.
CRAN release: 2021-09-27
The deprecation for
step_downsample() has been escalated from a deprecation warning to a deprecation error; these functions are available in the themis package.
Deprecation for old versions of imputation steps (such as step_bagimpute()) has been escalated from a soft deprecation to a regular deprecation; these imputation steps have new names like
step_kpca() was un-deprecated and gained the
The deprecation of the
step_nzv() was escalated to a deprecation error.
Imputation steps have been fixed for new data that are all NA, and a warning is now generated for recipes created under previous versions that cannot be imputed with this fix (#719).
A bug was fixed where imputed values via bagged trees would have the wrong levels.
The computations for the Yeo-Johnson transformation were made more efficient (#782).
recipes_eval_select(), which is a developer tool that is useful for creating new recipe steps. It powers the tidyselect semantics that are specific to recipes and supports the modern tidyselect API introduced in tidyselect 1.0.0. Additionally, the older terms_select() has been deprecated in favor of this new helper (#739).
When only the terms attributes are desired from model.frame, use the first row of data to improve speed and memory use (#726).
Reorganize documentation for all recipe step
CRAN release: 2021-04-16
Integer variables used in step_profile() are now kept as integers (and not doubles).
keep_original_cols argument to several steps:
Performance improvements for
prep() step no longer evaluates the basis functions on the training set and the
bake() step only evaluates the basis functions once for each unique input value (#574).
The neighbors parameter’s default range for step_isomap() was changed to be 20-80.
The deprecation for
step_downsample() has been escalated from a soft deprecation to a regular deprecation; these functions are available in the themis package.
Re-licensed package from GPL-2 to MIT. See consent from copyright holders here.
CRAN release: 2020-11-11
The full tidyselect DSL is now allowed inside recipes step_*() functions. This includes the operators
! and the new where() function. Additionally, the restriction preventing user-defined selectors from being used has been lifted (#572).
If steps that drop/add variables are skipped when baking the test set, the resulting column ordering of the baked test set will now be relative to the original recipe specification rather than relative to the baked training set. This is often more intuitive.
More infrastructure work to make parallel processing on Windows less buggy with PSOCK clusters
FALSE when an unprepped recipe is used.
CRAN release: 2020-10-17
prep() gained an option to print a summary of which columns were added and/or removed during execution.
To reduce confusion between
juice(), the latter is superseded in favor of using bake(object, new_data = NULL). The new_data argument now has no default, so a NULL value must be explicitly used in order to emulate the results of
juice() will remain in the package (and be used internally), but most communication and training will use bake(object, new_data = NULL). (#543)
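A minimal sketch of the recommended replacement for juice(), assuming a recipes release from October 2020 or later and R 4.1 or later. Passing new_data = NULL returns the processed training set, and leaving new_data out entirely is an error:

```r
library(recipes)

rec <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_numeric_predictors()) |>
  prep()

# Preferred: ask explicitly for the processed training set
train_processed <- bake(rec, new_data = NULL)
# Superseded, but still available
juiced <- juice(rec)

# new_data has no default, so omitting it stops with an error
res <- tryCatch(bake(rec), error = function(e) e)
```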
Tim Zhou added a step to use linear models for imputation (#555)
CRAN release: 2020-06-23
step_pls() was changed so that it uses the Bioconductor mixOmics package. Objects created with previous versions of recipes can still use bake(). With the current version, categorical outcomes can be used, but multivariate models are no longer supported. Also, the new method allows for sparse results.
Avoided partial matching on seq() arguments in internal functions.
Improved error messaging, for example when a user tries to prep() a tuneable recipe.
step_downsample() are soft deprecated in recipes as they are now available in the themis package. They will be removed in the next version.
NA values so that variables with zero variance plus are removed.
tune package can now use recipes with check operations (but also requires
step_pca() now has an option for returning the variance statistics for each component.
CRAN release: 2020-05-01
- Some S3 methods were not being registered previously. This caused issues in R 4.0.
CRAN release: 2020-04-30
recipes does not directly depend on dials, but it has several S3 methods for generics in dials. Version 0.0.5 of dials added stricter validation for these methods, so changes were required for
step_cut() enables you to create a factor from a numeric variable based on provided breaks (contributed by Edwin Thoen).
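A minimal sketch of step_cut(), assuming a recipes release from April 2020 or later, R 4.1 or later, and hypothetical toy data; the single break value of 5 is arbitrary. The numeric column is replaced by a factor of intervals:

```r
library(recipes)

train <- data.frame(y = 1:10, x = as.numeric(1:10))

rec <- recipe(y ~ x, data = train) |>
  step_cut(x, breaks = 5) |>  # cut x at the value 5
  prep()

binned <- bake(rec, new_data = NULL)
levels(binned$x)  # interval labels for the bins
```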
CRAN release: 2020-03-18
yj_transform() to avoid conflicts.
CRAN release: 2020-01-07
CRAN release: 2019-12-18
The imputation steps do not change the data type being imputed now. Previously, if the data were integer, the data would be changed to numeric (for some step types). The change is breaking since the underlying data of imputed values are now saved as a list instead of a vector (for some step types).
The data sets were moved to the new
When using a selector that returns no columns,
bake() will now return a tibble with as many rows as the original template data or the new_data, respectively. This is more consistent with how selectors work in dplyr (#411).
Code was added to explicitly register
recipes is loaded. This is required because of changes occurring in R 4.0.
check_class() checks if a variable is of the designated class. Class is either learned from the train set or provided in the check. (contributed by Edwin Thoen)
CRAN release: 2019-09-15
Release driven by changes in tidyr (v 1.0.0).
wdth argument has been renamed to
bake if variable contains values that were not observed in the train set (contributed by Edwin Thoen)
When no outcomes are in the recipe, using bake(object, new_data, all_outcomes()) will return a tibble with zero rows and zero columns (instead of failing). (#298). This will also occur when the selectors select no columns.
step_downsample() will replace the
step_upsample() will replace it with
ratio still works (for now) but issues a deprecation message.
step_discretize() has arguments moved out of options too; the main arguments are now
min_unique. Again, deprecation messages are issued with the old argument structure.
Methods were added for a future generic called tunable(). This outlines which parameters in a step can/could be tuned.
CRAN release: 2019-07-02
Release driven by changes in
Since 2018, a warning has been issued when the wrong argument was used in bake(recipe, newdata). The deprecation period is over and new_data is officially required.
step_other() did not collapse any levels, it would still add an “other” level to the factor. This would lump new factor levels into “other” when data were baked (as step_novel() does). This no longer occurs since it was inconsistent with ?step_other, which said that “If no pooling is done the data are unmodified”.
step_other is greater than one then it specifies the minimum sample size before the levels of the factor are collapsed into the “other” category. #289
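A minimal sketch of a count-based threshold for step_other(), assuming a recipes release from July 2019 or later, R 4.1 or later, and hypothetical toy data. Levels "c" and "d" occur once each, below the minimum count of 3, so they are pooled:

```r
library(recipes)

# "a" and "b" occur five times each; "c" and "d" occur once each
train <- data.frame(
  y = 1:12,
  x = factor(c(rep("a", 5), rep("b", 5), "c", "d"))
)

rec <- recipe(y ~ x, data = train) |>
  step_other(x, threshold = 3) |>  # threshold > 1 is a minimum count, not a proportion
  prep()

pooled <- bake(rec, new_data = NULL)
table(pooled$x)
```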
Due to changes by CRAN, step_nnmf() only works on versions of R >= 3.6.0 due to dependency issues.
CRAN release: 2019-03-21
Small release driven by changes in sample() in the current r-devel.
A new vignette discussing roles has been added.
To provide infrastructure for finalizing varying parameters, an update() method for recipe steps has been added. This allows users to alter information in steps that have not yet been trained.
step_interact will no longer fail if an interaction contains a column that has been previously filtered from the data. A warning is issued when this happens and no interaction terms will be created.
step_corr was made more fault tolerant for cases where the data contain a zero-variance column or columns with missing values.
Set the embedded environment to NULL in prep.step_dummy to reduce the file size of serialized recipe class objects when using
step_dummy now returns the original variable and the levels of the future dummy variables.
- Updating the role of new columns generated by a recipe step no longer also updates NA roles of existing columns (#296).
CRAN release: 2018-11-19
Several argument names were changed to be consistent with other
dials) and the general tidyverse naming conventions.
step_knnimpute was changed to
step_isomap had the number of neighbors promoted to a main argument called
nbagg out of the options and into a main argument
step_ns has degrees of freedom promoted to a main argument with name
degree promoted to a main argument.
juice and other functions has
new_data. For this version only, using newdata will only result in a warning.
- Several steps had
prep and a few steps had
All steps gain an id field that will be used in the future to reference other steps.
prep is now defaulted to
verbose = TRUE, the approximate size of the data set is printed. #207
step_integer converts data to ordered integers similar to LabelEncoder (#123 and #185).
step_geodist can be used to calculate the distance between geocodes and a single reference location.
step_nnmf computes the non-negative matrix factorization for data.
prepper was moved to
- A number of packages were moved from “Imports” to “Suggests” to reduce the install footprint. A function was added to prompt the user to install the needed packages when the relevant steps are invoked.
step_string2factor will now accept factors and leave them as-is.
step_knnimpute now excludes missing data in the variable to be imputed from the nearest-neighbor calculation. Previously, this could result in some missing data not being imputed (i.e., returning another missing value).
step_dummy now produces a warning (instead of failing) when non-factor columns are selected. Only factor columns are used; no conversion is done for character data. issue #186
dummy_names gained a separator argument. issue #183
seed arguments for more control over randomness.
broom is no longer used to get the tidy generic. These are now contained in the
- When a recipe is prepared, a running list of all columns is created and the last known use of each column is kept. This is to avoid bugs when a step that is skipped removes columns. issue #239
CRAN release: 2018-06-16
bake if variable range in new data is outside the range that was learned from the train set (contributed by Edwin Thoen)
step_lag can lag variables in the data set (contributed by Alex Hayes).
step_naomit removes rows with missing data for specific columns (contributed by Alex Hayes).
step_rollimpute can be used to impute data in a sequence or series by estimating their values within a moving window.
step_pls can conduct supervised feature extraction for predictors.
signed argument (contributed by Edwin Thoen).
The internal functions
printer have been exported to enable other packages to contain steps.
When training new steps after some steps have been previously trained, the retain = TRUE option should be set on previous invocations of
- It can now compute the entire set of dummy variables per factor predictor using the one_hot = TRUE option. Thanks to Davis Vaughan.
contrast option was removed. The step uses the global option for contrasts.
- The step also produces missing indicator variables when the original factor has a missing value.
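A minimal sketch contrasting the default encoding of step_dummy() with the one_hot = TRUE option, assuming R 4.1 or later and the built-in iris data (Species has three levels). Under the default treatment contrasts of a standard R session there is one column fewer than the number of levels; with one_hot = TRUE there is one column per level:

```r
library(recipes)

# Default contrasts: one fewer indicator column than there are levels
contrast_rec <- recipe(Sepal.Length ~ Species, data = iris) |>
  step_dummy(Species) |>
  prep()

# one_hot = TRUE: one indicator column per level
one_hot_rec <- recipe(Sepal.Length ~ Species, data = iris) |>
  step_dummy(Species, one_hot = TRUE) |>
  prep()

names(bake(contrast_rec, new_data = NULL))
names(bake(one_hot_rec, new_data = NULL))
```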
step_other will now convert novel levels of the factor to the “other” level.
step_bin2factor now has an option to choose how the values are translated to the levels (contributed by Michael Levy).
juice can now export basic data frames.
okc data were updated with two additional columns.
CRAN release: 2018-01-11
Edwin Thoen suggested adding validation checks for certain data characteristics. This fed into the existing notion of expanding recipes beyond steps (see the non-step steps project). A new set of operations, called checks, can now be used. These should throw an informative error when the check conditions are not met and return the existing data otherwise.
Steps now have a skip option that will not apply preprocessing when bake is used. See the article on skipping steps for more information.
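A minimal sketch of the skip option, assuming R 4.1 or later and the built-in mtcars data. A common use is transforming the outcome only for training: the step is applied to the training set retained by prep(), but bake() on new data leaves the column alone:

```r
library(recipes)

rec <- recipe(mpg ~ ., data = mtcars) |>
  step_log(mpg, skip = TRUE) |>  # log-transform the outcome for training only
  prep()

# The training set retained by prep() has the log applied
train_out <- bake(rec, new_data = NULL)
# bake() on new data skips the step, so mpg is unchanged
new_out <- bake(rec, new_data = mtcars)
```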
check_missing will validate that none of the specified variables contain missing data.
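A minimal sketch of check_missing(), assuming R 4.1 or later and hypothetical toy data. Complete data passes through unchanged, while a missing value in a checked column stops bake() with an error:

```r
library(recipes)

train <- data.frame(y = c(1, 2, 3), x = c(4, 5, 6))

rec <- recipe(y ~ x, data = train) |>
  check_missing(all_predictors()) |>
  prep()

# Complete data passes the check
ok <- bake(rec, new_data = data.frame(y = 4, x = 7))

# A missing value in a checked column raises an error
res <- tryCatch(
  bake(rec, new_data = data.frame(y = c(4, 5), x = c(NA, 7))),
  error = function(e) e
)
```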
detect_step can be used to check if a recipe contains a particular preprocessing operation.
step_num2factor can be used to convert numeric data (especially integers) to factors.
step_novel adds a new factor level to nominal variables that will be used when new data contain a level that did not exist when the recipe was prepared.
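A minimal sketch of step_novel(), assuming R 4.1 or later and hypothetical toy data. The level "z" never appears in training, so at bake() time it is mapped to the step's default new level, "new":

```r
library(recipes)

train <- data.frame(y = 1:4, x = factor(c("a", "b", "a", "b")))
# "z" was never seen during training
test <- data.frame(y = 5:6, x = factor(c("a", "z")))

rec <- recipe(y ~ x, data = train) |>
  step_novel(x) |>  # default new_level = "new"
  prep()

baked <- bake(rec, new_data = test)
as.character(baked$x)
```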
step_profile can be used to generate design matrix grids for prediction profile plots of additive models where one variable is varied over a grid and all of the others are fixed at a single value.
step_upsample can be used to change the number of rows in the data based on the frequency distributions of a factor variable in the training set. By default, this operation is only applied to the training set; bake ignores this operation.
step_naomit drops rows when specified columns contain NA, similar to
step_lag allows for the creation of lagged predictor columns.
CRAN release: 2017-11-20
- The default selectors for
bake was changed from
prep is now defaulted to
A bug in step_dummy was fixed that makes sure that the correct binary variables are generated regardless of the levels or values of the incoming factor. Also, step_dummy now requires factor inputs.
step_dummy also has a new default naming function that works better for factors. However, there is an extra argument (ordinal) now to the functions that can be passed to
step_interact now allows for selectors (e.g. starts_with("prefix")) to be used in the interaction formula.
dplyr::one_of was added to the list of selectors.
step_bs adds B-spline basis functions.
step_unorder converts ordered factors to unordered factors.
step_count counts the number of instances that a pattern exists in a string.
step_factor2string can be used to move between encodings.
step_lowerimpute is for numeric data where the values cannot be measured below a specific value. For these cases, random uniform values are used for the truncated values.
- A step to remove simple zero-variance variables was added (
- A series of tidy methods were added for recipes and many (but not all) steps.
bake.recipe, the argument newdata is now without a default.
juice can now save the final processed data set in sparse format. Note that, as the steps are processed, a non-sparse data frame is used to store the results.
- A formula method was added for recipes to get a formula with the outcome(s) and predictors based on the trained recipe.
- Two of the main functions changed names.
step_lincomb removes variables involved in linear combinations to resolve them.
- A step for converting binary variables to factors (
step_regex applies a regular expression to a character or factor vector to create dummy variables.
- The class system for recipe objects was changed so that pipes can be used to create the recipe with a formula.
role argument in favor of a general set of selectors. If no selector is used, all the predictors are returned.
- Two steps for simple imputation using the mean or mode were added.
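A minimal sketch of the tidy methods, assuming a recent recipes release, R 4.1 or later, and the built-in mtcars data. On a recipe, tidy() returns one row per step; on a trained step it returns the estimated quantities, here a mean and a standard deviation for each of the two normalized columns:

```r
library(recipes)

rec <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(disp, hp)

# For the whole recipe: one row per step
tidy(rec)

# For a trained step: the estimated statistics
prepped <- prep(rec)
norm_stats <- tidy(prepped, number = 1)
norm_stats
```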