Changelog
Source:NEWS.md
recipes (development version)
- Example for step_novel() now better illustrates how it works. (@Edgar-Zamora, #1248)
- recipe(), prep(), and bake() now work with sparse tibbles. (#1364, #1366)
- recipe(), prep(), and bake() now work with sparse matrices. (#1364, #1368, #1369)
- prep.recipe(..., strings_as_factors = TRUE) now only converts string variables that have role “predictor” or “outcome”. (@dajmcdon, #1358, #1376)
- All steps and checks now require arguments trained, skip, role, and id at all times.
- step_dummy() gained sparse argument. When set to TRUE, step_dummy() will produce sparse vectors. (#1392)
recipes 1.1.0
CRAN release: 2024-07-04
Improvements
- Improved error message for misspelled argument in step functions. (#1318)
- recipe() can now take data.frames with list-columns or sf data.frames as input to data. (#1283)
- recipe() will now show a better error when columns are misspelled in the formula. (#1283)
- add_role() now errors if a column would simultaneously have roles "outcome" and "predictor". (#935)
- prep() will now error if the ptype of the data doesn’t match the one that was used to define the recipe. (#793)
- Added more documentation in ?selections about how tidyselect::everything() works in recipes. (#1259)
- New extract_fit_time() method has been added that returns the time it took to train the recipe. (#1071)
- step_spline_b(), step_spline_convex(), step_spline_monotone(), and step_spline_nonnegative() now throw informative errors if the degree, deg_free, and complete_set arguments cause an error. (#1170)
- step_mutate() gained .pkgs argument to specify what packages need to be loaded for the step to work. (#1282)
- step_interact() now gives a better error if terms isn’t a formula. (#1299)
- The prefix argument of step_dummy_multi_choice() is now properly documented. (#1298)
- Significant speedup in step_dummy() when applied to many columns. (#1305)
- step_dummy() now gives an informative error on attempt to generate too many columns to fit in memory. (#828)
- step_dummy() and step_unknown() now throw more informative warnings for unseen levels. (#450)
- step_dummy() now throws more informative warnings for NA values. (#450)
- step_date() now accepts "mday" as a possible feature. (@Edgar-Zamora, #1211)
Bug Fixes
- NA levels in factors aren’t dropped when passed to recipe(). (#1291)
- recipe() no longer crashes when given a long formula expression. (#1283)
- Fixed bug in step_ns() and step_bs() where the knots field in the options argument wasn’t correctly used. (#1297)
- Bug fixed in step_interact() where long formulas were used. (#1231, #1289)
- Fixed documentation mistake where the default value of the keep_original_cols argument was wrong. (#1314)
Developer
- Developer helper function recipes_ptype() has been added, returning expected input data for prep() and bake() for a given recipe object. (#1329)
- Developer helper function recipes_ptype_validate() has been added, to validate new data is compatible with recipe ptype. (#793)
- Developer helper functions recipes_names_predictors() and recipes_names_outcomes() have been added to aid variable selection in steps. (#1026)
recipes 1.0.10
CRAN release: 2024-02-18
Bug Fixes
- Fixed bug where step_log() breaks legacy recipe objects by indexing names(object) in bake(). (@stufield, #1284)
recipes 1.0.9
CRAN release: 2023-12-13
Improvements
- Minor speed-up and reduced memory consumption for step_pca() in the bake() stage by reducing unused multiplications. (@jkennel, #1265)
- Document that update_role(), add_role() and remove_role() are applied before steps and checks. (#778)
- Documentation for tidy methods for all steps has been added when missing and improved to describe the return value more accurately. (#936)
- step_dummy() will now error if passed character columns instead of loudly ignoring them. Only applicable when setting strings_as_factors = FALSE. (#1233)
- It is now documented that step_spline_b() can be made periodic. (#1223)
- prep() now correctly throws a warning when the training argument is set when prepping a prepped recipe, telling the user that it will be ignored. (#1244)
- When errors are thrown about wrongly typed input to steps, the offending variables and their types are now listed. (#1217)
- All warnings and errors have been updated to use the cli package for increased clarity and consistency. (#1237)
- Added warnings when step_scale(), step_normalize(), step_center() or step_range() result in NaN columns. (@mastoffel, #1221)
Bug Fixes
- Fixed bug in step_factor2string() that occurred if strings_as_factors = TRUE is set in prep(). (#317)
- Fixed bug where tidy.step_cut() always returned zero row tibbles for trained recipes. (#1229)
recipes 1.0.8
CRAN release: 2023-08-25
Improvements
- Minor speed-up and reduced memory consumption for spline steps that rely on spline2_apply. (#1200)
Bug Fixes
- Fixed bugs where spline steps (step_ns(), step_bs(), step_spline_b(), step_spline_convex(), step_spline_monotone(), step_spline_natural(), step_spline_nonnegative()) would error if baked with 1 row. (#1191)
recipes 1.0.7
CRAN release: 2023-08-10
New Steps
- step_classdist_shrunken(), a regularized version of step_classdist(), was added. (#1185)
Improvements
- step_bs() and step_ns() have gained keep_original_cols argument. (#1164)
- The keep_original_cols argument has been added to step_classdist(), step_count(), step_depth(), step_geodist(), step_indicate_na(), step_interact(), step_lag(), step_poly(), step_regex(), step_window(). The default for each step is set to preserve past behavior. This change should mean that every step that produces new columns has the keep_original_cols argument. (#1167)
Bug Fixes
- Fixed bugs where step_classdist(), step_count(), step_depth(), step_geodist(), step_interact(), step_nnmf_sparse(), and step_regex() didn’t work with empty selection. All steps now leave data unmodified when having empty selections. (#1142)
- step_classdist(), step_count() and step_depth() no longer return a column with all NAs with empty selections. (#1142)
- step_regex() no longer returns a column with all 0s with empty selections. (#1142)
- The tidy() methods for step_geodist(), step_nnmf_sparse(), and step_sample() now correctly return zero-row tibbles when used with empty selections. (#1144)
- step_poly_bernstein(), step_profile(), step_spline_b(), step_spline_convex(), step_spline_monotone(), step_spline_natural(), and step_spline_nonnegative() now correctly return a zero row tibble when used with empty selection. (#1133)
- Fixed bug where the tidy() method for step_sample() didn’t return an id column. (#1144)
- check_class(), check_missing(), check_new_values(), check_range(), step_naomit(), step_poly_bernstein(), step_spline_b(), step_spline_convex(), step_spline_monotone(), step_spline_natural(), step_spline_nonnegative(), and step_string2factor() now throw an informative error if needed non-standard role columns are missing during bake(). (#1145)
Breaking Changes
- step_window() now throws an error instead of silently overwriting if names argument overlaps with existing columns. (#1172)
- step_regex() and step_count() will now informatively error if name collision occurs. (#1169)
Developer
- Added developer function remove_original_cols() to help remove original columns that are no longer needed. (#1149)
- Added developer function recipes_remove_cols() to provide standardized way to remove columns by column names. (#1155)
recipes 1.0.6
CRAN release: 2023-04-25
Improvements
- Steps with tunable arguments now have those arguments listed in the documentation.
- All steps that add new columns will now informatively error if name collision occurs. (#983)
Bug Fixes
- Fixed bug in step_spline_b(), step_spline_convex(), step_spline_monotone(), and step_spline_nonnegative() where you weren’t able to tune the degree argument.
- step_range() now correctly performs clipping on recipes created before 1.0.3. (#1097)
Breaking Changes
- The tidy() method for step_impute_mean(), step_impute_median(), and step_impute_mode() now returns the imputed value with the column name value instead of model. This is in line with the output of step_impute_lower(). (#826)
recipes 1.0.5
CRAN release: 2023-02-20
- Added outside argument to step_percentile() to determine different ways of handling values outside the range of the training data.
- step_range() is now backwards compatible with respect to the clipping argument that was added in 1.0.3, and old saved recipes can now be baked. (#1090)
- Updated print methods to use the cli package for formatting. (#426)
- Print methods no longer error for untrained recipes with long selections. (#1083)
- The recipe, step, and check methods for generics::tune_args() are now registered unconditionally (tidymodels/workflows#192).
- Added a conditionMessage() method for recipes_errors to consistently point out which step errors occurred in when reporting errors. (#1080)
recipes 1.0.4
CRAN release: 2023-01-11
- Added missing tidy method for step_intercept() and step_lag(). (#730)
- Errors in prep() and bake() will now indicate which step caused the error. (#420)
- Developer focused check_type() got a new types argument for more precise checking of column types.
- recipes_extension_check() has been added. This developer focused function checks that steps have all the required S3 methods.
- recipe() now errors more informatively when data is missing. (#1042)
recipes 1.0.3
CRAN release: 2022-11-09
- step_dummy() no longer returns integer columns as there are a number of contrast methods that return fractional values. (#1053)
- Fixed a 0-length recycling bug in step_dummy_extract() exposed by the development version of purrr. (#1052)
- Types of variables have been made granular. "nominal" has been split into "ordered" and "unordered" and "numeric" has been split into "double" and "integer". (#993)
- New selectors: all_double(), all_ordered(), all_unordered(), all_date() and all_datetime(), in addition to the existing all_numeric() and all_nominal(). All selectors come with a *_predictors() variant. (#993)
- Developer focused .get_data_types() generic has been added to designate types of columns. Exported for use in extension packages that deal with types not supported in recipes directly. (#993)
- The step_date() function now defaults to using the clock package to format day-of-week and month labels. (#1048)
- step_range() has gained an argument clipping that, when set to FALSE, no longer clips the data to be between min and max.
- Added documentation regarding developer functions in ?developer_functions. (#1163)
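The granular selectors introduced in this release can be used like any other selector. The following is a minimal sketch, assuming a recent recipes installation and the built-in iris data; the object names are illustrative only.

```r
library(recipes)

# iris has four double predictors and a factor outcome (Species)
rec <- recipe(Species ~ ., data = iris) %>%
  step_center(all_double_predictors())

baked <- bake(prep(rec), new_data = NULL)

# Every double predictor should now be centered around zero
num_cols <- setdiff(names(baked), "Species")
colMeans(baked[num_cols])
```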
recipes 1.0.2
CRAN release: 2022-10-15
- A new set of basis functions were added: step_spline_b(), step_spline_convex(), step_spline_monotone(), step_spline_natural(), step_spline_nonnegative(), and step_poly_bernstein().
- step_date(), step_dummy(), step_dummy_extract(), step_holiday(), step_ordinalscore(), and step_regex() now return integer results when appropriate. (#766)
- The default for the strict argument in step_integer() has been changed from FALSE to TRUE. The function will thus return integers, rather than whole-number numerics, by default. (#766)
- The default for the value argument in step_intercept() has been changed from 1 to 1L. (#766)
recipes 1.0.1
CRAN release: 2022-07-07
- Fixed bug where step_holiday() didn’t work if the variable didn’t have any missing values. (#1019)
recipes 1.0.0
CRAN release: 2022-07-01
Improvements and Other Changes
- Added support for case weights in the following steps
- A number of developer focused functions to deal with case weights are added: are_weights_used(), get_case_weights(), averages(), medians(), variances(), correlations(), covariances(), and pca_wts()
- recipes now checks that all columns in the data supplied to recipe() are also present in the new_data supplied to bake(). An exception is made for columns with roles of either "outcome" or "case_weights", which are typically not required at bake() time. The new update_role_requirements() function can be used to adjust whether or not columns of a particular role are required at bake() time if you need to opt out of this check (#1011).
- The summary() method for recipe objects now contains an extra column to indicate which columns are required when bake() is used.
New Steps
- step_time() has been added that extracts time features such as hour, minute, or second. (#968)
Bug Fixes
- Fixed bug in which functions step_hyperbolic() uses. (#932)
- step_dummy_multi_choice() now respects factor-levels of the selected variables when creating dummies. (#916)
- step_dummy() now works correctly with recipes trained on version 0.1.17 or earlier. (#921)
- Fixed a bug where setting fresh = TRUE in prep() wouldn’t result in re-prepping the recipe. (#492)
- Bug was fixed in step_holiday() which used to error when it was applied to variable with missing values. (#743)
- A bug was fixed in step_normalize() which used to error if 1 variable was selected. (#963)
Improvements and Other Changes
- Finally removed step_upsample() and step_downsample() in recipes as they are now available in the themis package.
- discretize() and step_discretize() now can return factor levels similar to cut(). (#674)
- step_naomit() now actually had its default for skip changed to TRUE as was stated in release 0.1.13. (#934)
- step_dummy() has been made more robust to non-standard column names. (#879)
- step_pls() now allows you to use multiple outcomes if they are numeric. (#651)
- step_normalize() and step_scale() ignore columns with zero variance, generate a warning and suggest to use step_zv(). (#920)
- Printing for step_impute_knn() now shows variables that were imputed instead of variables used for imputing. (#837)
- step_discretize() and discretize() will automatically remove missing values if keep_na = TRUE, removing the need to specify keep_na = TRUE and na.rm = TRUE. (#982)
- prep() and bake() now check and error if output of bake.bake_*() isn’t a tibble.
- step_date() now has a locale argument that can be used to control how the month and dow features are returned. (#1000)
recipes 0.2.0
CRAN release: 2022-02-18
New Steps
- step_nnmf_sparse() uses a different implementation of non-negative matrix factorization that is much faster and enables regularized estimation. (#790)
- step_dummy_extract() creates multiple variables from a character variable by extracting elements using regular expressions and counting those elements.
- step_filter_missing() can filter columns based on proportion of missingness. (#270)
- step_percentile() replaces the value of a variable with its percentile from the training set. (#765)
Improvements and Other Changes
- All recipe steps now officially support empty selections to be more aligned with dplyr and other packages that use tidyselect (#603, #531). For example, if a previous step removed all of the columns needed for a later step, the recipe does not fail when it is estimated (with the exception of step_mutate()). The documentation in ?selections has been updated with advice for writing selectors when filtering steps are used. (#813)
- Fixed bug in step_harmonic() printing and changed defaults to role = "predictor" and keep_original_cols = FALSE (#822).
- Improved the efficiency of computations for the Box-Cox transformation (#820).
- When a feature extraction step (e.g., step_pca(), step_ica(), etc.) has zero components specified, the tidy() method now lists the selected columns in the terms column.
- Deprecation has started for step_nnmf() in favor of step_nnmf_sparse(). (#790)
- Steps now have a dedicated subsection detailing what happens when tidy() is applied. (#876)
- step_ica() now runs fastICA() using a specific set of random numbers so that initialization is reproducible.
- tidy.recipe() now returns a zero row tibble instead of an error when applied to an empty recipe. (#867)
- step_zv() now has a group argument. The same filter is applied but looks for zero-variance within 1 or more columns that define groups. (#711)
- detect_step() is no longer restricted to steps created in recipes (#869).
- New extract_parameter_set_dials() and extract_parameter_dials() methods to extract parameter sets and single parameters from recipe objects.
- step_other() now allows for setting threshold = 0 which will result in no othering. (#904)
Breaking Changes
- step_ica() now indirectly uses the fastICA package since that package has increased their R version requirement. Recipe objects from previous versions will error when applied to new data. (#823)
- step_kpca*() now directly use the kernlab package. Recipe objects from previous versions will error when applied to new data.
- bake() will now error if new_data doesn’t contain all the required columns. (#491)
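The last breaking change can be illustrated with a short sketch, assuming a recent recipes installation and the built-in mtcars data. The names rec, prepped, and incomplete are illustrative, and the exact error message depends on the installed version.

```r
library(recipes)

rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_center(disp)
prepped <- prep(rec)

# Drop a predictor column that the recipe was trained with
incomplete <- mtcars[, setdiff(names(mtcars), "disp")]

# bake() is expected to signal an error instead of returning partial output
result <- tryCatch(
  bake(prepped, new_data = incomplete),
  error = function(e) e
)
inherits(result, "error")
```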
Developer
- The print methods have been internally changed to use print_step() instead of printer(). This is done for a smoother transition to use cli in the next version. (#871)
recipes 0.1.17
CRAN release: 2021-09-27
New Steps
- Added new step_harmonic() (#702).
- Added a new step called step_dummy_multi_choice(), which takes multiple nominal variables and produces shared dummy variables. (#716)
Deprecation News
- The deprecation for step_upsample() and step_downsample() has been escalated from a deprecation warning to a deprecation error; these functions are available in the themis package.
- Escalate deprecation for old versions of imputation steps (such as step_bagimpute()) from a soft deprecation to a regular deprecation; these imputation steps have new names like step_impute_bag() (#753).
- step_kpca() was un-deprecated and gained the keep_original_cols argument.
- The deprecation of the preserve argument to step_pls() and step_dummy() was escalated from a soft deprecation to regular deprecation.
- The deprecation of the options argument to step_nzv() was escalated to a deprecation error.
Bug Fixes
- Fix imputation steps for new data that is all NA, and generate a warning for recipes created under previous versions that cannot be imputed with this fix (#719).
- A bug was fixed where imputed values via bagged trees would have the wrong levels.
Improvements and Other Changes
- The computations for the Yeo-Johnson transformation were made more efficient (#782).
- New recipes_eval_select() which is a developer tool that is useful for creating new recipes steps. It powers the tidyselect semantics that are specific to recipes and supports the modern tidyselect API introduced in tidyselect 1.0.0. Additionally, the older terms_select() has been deprecated in favor of this new helper (#739).
- Speed-up/simplification to step_spatialsign()
- When only the terms attributes are desired from model.frame use the first row of data to improve speed and memory use (#726).
- Use Haversine formula for latitude-longitude pairs in step_geodist() (#725).
- Reorganize documentation for all recipe step tidy methods (#701).
- Generate warning when user attempts a Box-Cox transformation of non-positive data (@LiamBlake, #713).
- step_logit() gained an offset argument for cases where the input is either zero or one (#784)
- The tidy() methods for objects from check_new_values(), check_class() and step_nnmf() are now exported.
recipes 0.1.16
CRAN release: 2021-04-16
New Steps
- Added a new step called step_indicate_na(), which will create and append additional binary columns to the data set to indicate which observations are missing (#623).
- Added new step_select() (#199).
Bug Fixes
- The threshold argument of step_pca() is now tunable() (#534).
- Integer variables used in step_profile() are now kept as integers (and not doubles).
- Preserve multiple roles in last_term_info so bake() can correctly respond to has_roles. (#632)
- The tidy() method for step_nnmf() was rewritten since it was not great (#665), and step_nnmf() now no longer fully loads underlying packages (#685).
Improvements and Other Changes
- Two new selectors that combine role and data type were added: all_numeric_predictors() and all_nominal_predictors(). (#620)
- Changed the names of all imputation steps, for example, from step_knnimpute() or step_medianimpute() (old) to step_impute_knn() or step_impute_median() (new) (#614).
- Added keep_original_cols argument to several steps:
  - step_pca(), step_ica(), step_nnmf(), step_kpca_rbf(), step_kpca_poly(), step_pls(), step_isomap() which all default to FALSE (#635).
  - step_ratio(), step_holiday(), step_date() which all default to TRUE to maintain original behavior, as well as step_dummy() which defaults to FALSE (#645).
- Added allow_rename argument to recipes_eval_select() (#646).
- Performance improvements for step_bs() and step_ns(). The prep() step no longer evaluates the basis functions on the training set and the bake() step only evaluates the basis functions once for each unique input value (#574)
- The neighbors parameter’s default range for step_isomap() was changed to be 20-80.
- The deprecation for step_upsample() and step_downsample() has been escalated from a soft deprecation to a regular deprecation; these functions are available in the themis package.
- Re-licensed package from GPL-2 to MIT. See consent from copyright holders here.
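The two combined selectors added in this release can be sketched as follows, assuming a recent recipes installation and the built-in iris data. The recipe itself is a made-up illustration, not taken from the package documentation.

```r
library(recipes)

# Sepal.Length is the outcome; Species is the only nominal predictor
rec <- recipe(Sepal.Length ~ ., data = iris) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_dummy(all_nominal_predictors())

baked <- bake(prep(rec), new_data = NULL)

# The outcome is left untouched; Species is replaced by dummy columns
names(baked)
```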
recipes 0.1.15
CRAN release: 2020-11-11
- The full tidyselect DSL is now allowed inside recipes step_*() functions. This includes the operators &, |, - and ! and the new where() function. Additionally, the restriction preventing user defined selectors from being used has been lifted (#572).
- If steps that drop/add variables are skipped when baking the test set, the resulting column ordering of the baked test set will now be relative to the original recipe specification rather than relative to the baked training set. This is often more intuitive.
- More infrastructure work to make parallel processing on Windows less buggy with PSOCK clusters
- fully_trained() now returns FALSE when an unprepped recipe is used.
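As a rough illustration of the tidyselect operators mentioned in this release, the sketch below combines two recipes selectors with & and !. It assumes a recent recipes installation and the built-in mtcars data; the object names are illustrative.

```r
library(recipes)

# Center every numeric column that is not an outcome
rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_center(all_numeric() & !all_outcomes())

baked <- bake(prep(rec), new_data = NULL)

# Predictors are centered; the outcome mpg keeps its original values
mean(baked$disp)
```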
recipes 0.1.14
CRAN release: 2020-10-17
- prep() gained an option to print a summary of which columns were added and/or removed during execution.
- To reduce confusion between bake() and juice(), the latter is superseded in favor of using bake(object, new_data = NULL). The new_data argument now has no default, so a NULL value must be explicitly used in order to emulate the results of juice(). juice() will remain in the package (and used internally) but most communication and training will use bake(object, new_data = NULL). (#543)
- Tim Zhou added a step to use linear models for imputation (#555)
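The relationship between juice() and bake(object, new_data = NULL) described above can be sketched like this, assuming a recent recipes installation and the built-in mtcars data. The object names are illustrative.

```r
library(recipes)

rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_predictors())
prepped <- prep(rec, training = mtcars)

# Superseded way of retrieving the processed training set
train_old <- juice(prepped)

# Recommended replacement: new_data must be given explicitly as NULL
train_new <- bake(prepped, new_data = NULL)

all.equal(as.data.frame(train_old), as.data.frame(train_new))
```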
recipes 0.1.13
CRAN release: 2020-06-23
Breaking Changes
- step_filter(), step_slice(), step_sample(), and step_naomit() had their defaults for skip changed to TRUE. In the vast majority of applications, these steps should not be applied to the test or assessment sets.
- tidyr version 1.0.0 or later is now required.
Other Changes
- step_pls() was changed so that it uses the Bioconductor mixOmics package. Objects created with previous versions of recipes can still use juice() and bake(). With the current version, categorical outcomes can now be used, but multivariate models are no longer supported. Also, the new method allows for sparse results.
- As suggested by @StefanBRas, step_ica() now defaults to the C engine (#518)
- Avoided partial matching on seq() arguments in internal functions.
- Improved error messaging, for example when a user tries to prep() a tuneable recipe.
- step_upsample() and step_downsample() are soft deprecated in recipes as they are now available in the themis package. They will be removed in the next version.
- step_zv() now handles NA values so that variables with zero variance plus missing values are removed.
- The selectors all_of() and any_of() can now be used in step selections (#477).
- The tune package can now use recipes with check operations (but also requires tune >= 0.1.0.9000).
- The tidy method for step_pca() now has an option for returning the variance statistics for each component.
recipes 0.1.12
CRAN release: 2020-05-01
- Some S3 methods were not being registered previously. This caused issues in R 4.0.
recipes 0.1.11
CRAN release: 2020-04-30
Other Changes
- While recipes does not directly depend on dials, it has several S3 methods for generics in dials. Version 0.0.5 of dials added stricter validation for these methods, so changes were required for recipes.
New Operations
- step_cut() enables you to create a factor from a numeric based on provided breaks (contributed by Edwin Thoen)
recipes 0.1.10
CRAN release: 2020-03-18
Breaking Changes
- Renamed yj_trans() to yj_transform() to avoid conflicts.
Other Changes
- Added flexible naming options for new columns created by step_depth() and step_classdist() (#262).
- Small changes for base R’s stringsAsFactors change.
recipes 0.1.9
CRAN release: 2020-01-07
- Delayed S3 method registration for tune::tunable() methods that live in recipes will now work correctly on R >=4.0.0 (#439, tidymodels/tune#146).
- step_relevel() added.
recipes 0.1.8
CRAN release: 2019-12-18
Breaking Changes
- The imputation steps do not change the data type being imputed now. Previously, if the data were integer, the data would be changed to numeric (for some step types). The change is breaking since the underlying data of imputed values are now saved as a list instead of a vector (for some step types).
- The data sets were moved to the new modeldata package.
- step_num2factor() was rewritten due to a bug that ignored the user-supplied levels (#425). The transform argument is now required to be a function and levels must now be supplied.
Other Changes
- Using a minus in the formula to recipe() is no longer allowed (it didn’t remove variables anyway). step_rm() or update_role() can be used instead.
- When using a selector that returns no columns, juice() and bake() will now return a tibble with as many rows as the original template data or the new_data respectively. This is more consistent with how selectors work in dplyr (#411).
- Code was added to explicitly register tunable methods when recipes is loaded. This is required because of changes occurring in R 4.0.
- check_class() checks if a variable is of the designated class. Class is either learned from the train set or provided in the check. (contributed by Edwin Thoen)
- step_normalize() and step_scale() gained a factor argument with values of 1 or 2 that can scale the standard deviations used to transform the data. (#380)
- bake() now produces a tibble with columns in the same order as juice() (#365)
recipes 0.1.7
CRAN release: 2019-09-15
Release driven by changes in tidyr (v 1.0.0).
Breaking Changes
- format_selector()’s wdth argument has been renamed to width (#250).
New Operations
- step_mutate_at(), step_rename(), and step_rename_at() were added.
Other Changes
- The use of varying() will be deprecated in favor of an upcoming function tune(). No changes are needed in this version, but subsequent versions will work with tune().
- format_ch_vec() and format_selector() are now exported (#250).
- check_new_values breaks bake if variable contains values that were not observed in the train set (contributed by Edwin Thoen)
- When no outcomes are in the recipe, using juice(object, all_outcomes()) and bake(object, new_data, all_outcomes()) will return a tibble with zero rows and zero columns (instead of failing). (#298). This will also occur when the selectors select no columns.
- As alternatives to step_kpca(), two separate steps were added called step_kpca_rbf() and step_kpca_poly(). The use of step_kpca() will print a deprecation message that it will be going away.
- step_nzv() and step_poly() had arguments promoted out of their options slot. options can be used in the short term but is deprecated.
- step_downsample() will replace the ratio argument with under_ratio and step_upsample() will replace it with over_ratio. ratio still works (for now) but issues a deprecation message.
- step_discretize() has arguments moved out of options too; the main arguments are now num_breaks (instead of cuts) and min_unique. Again, deprecation messages are issued with the old argument structure.
- Models using the dimRed package (step_kpca(), step_isomap(), and step_nnmf()) would silently fail if the projection method failed. An error is issued now.
- Methods were added for a future generic called tunable(). This outlines which parameters in a step can/could be tuned.
recipes 0.1.6
CRAN release: 2019-07-02
Release driven by changes in rlang.
Breaking Changes
- Since 2018, a warning has been issued when the wrong argument was used in bake(recipe, newdata). The deprecation period is over and new_data is officially required.
- Previously, if step_other() did not collapse any levels, it would still add an “other” level to the factor. This would lump new factor levels into “other” when data were baked (as step_novel() does). This no longer occurs since it was inconsistent with ?step_other, which said that “If no pooling is done the data are unmodified”.
New Operations
- step_normalize() centers and scales the data (if you are, like Max, too lazy to use two separate steps).
- step_unknown() will convert missing data in categorical columns to “unknown” and update factor levels.
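The equivalence between step_normalize() and the two separate steps can be checked with a small sketch, assuming a recent recipes installation and the built-in mtcars data. The object names are illustrative.

```r
library(recipes)

one_step <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_predictors())

two_steps <- recipe(mpg ~ ., data = mtcars) %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors())

a <- as.data.frame(bake(prep(one_step), new_data = NULL))
b <- as.data.frame(bake(prep(two_steps), new_data = NULL))

# Both recipes should give the same values up to floating point tolerance
all.equal(a, b)
```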
Other Changes
- If threshold argument of step_other is greater than one then it specifies the minimum sample size before the levels of the factor are collapsed into the “other” category. #289
- step_knnimpute() can now pass two options to the underlying knn code, including the number of threads (#323).
- Due to changes by CRAN, step_nnmf() only works on versions of R >= 3.6.0 due to dependency issues.
- step_dummy() and step_other() are now tolerant to cases where that step’s selectors do not capture any columns. In this case, no modifications to the data are made. (#290, #348)
- step_dummy() can now retain the original columns that are used to make the dummy variables. (#328)
- step_other()’s print method only reports the variables with collapsed levels (as opposed to any column that was tested to see if it needed collapsing). (#338)
- step_pca(), step_kpca(), step_ica(), step_nnmf(), step_pls(), and step_isomap() now accept zero components. In this case, the original data are returned.
recipes 0.1.5
CRAN release: 2019-03-21
Small release driven by changes in sample() in the current r-devel.
Other Changes
- A new vignette discussing roles has been added.
- To provide infrastructure for finalizing varying parameters, an update() method for recipe steps has been added. This allows users to alter information in steps that have not yet been trained.
- step_interact will no longer fail if an interaction contains a column that has been previously filtered from the data. A warning is issued when this happens and no interaction terms will be created.
- step_corr was made more fault tolerant for cases where the data contain a zero-variance column or columns with missing values.
- Set the embedded environment to NULL in prep.step_dummy to reduce the file size of serialized recipe class objects when using saveRDS.
Breaking Changes
- The tidy method for step_dummy now returns the original variable and the levels of the future dummy variables.
Bug Fixes
- Updating the role of new columns generated by a recipe step no longer also updates NA roles of existing columns (#296).
recipes 0.1.4
CRAN release: 2018-11-19
Breaking Changes
- Several argument names were changed to be consistent with other tidymodels packages (e.g. dials) and the general tidyverse naming conventions.
  - K in step_knnimpute was changed to neighbors. step_isomap had the number of neighbors promoted to a main argument called neighbors.
  - step_pca, step_pls, step_kpca, step_ica now use num_comp instead of num. step_isomap uses num_terms instead of num.
  - step_bagimpute moved nbagg out of the options and into a main argument trees.
  - step_bs and step_ns had degrees of freedom promoted to a main argument with name deg_free. Also, step_bs had degree promoted to a main argument.
  - step_BoxCox and step_YeoJohnson had nunique changed to num_unique.
  - bake, juice and other functions had newdata changed to new_data. For this version only, using newdata will only result in a warning.
  - Several steps had na.rm changed to na_rm.
  - prep and a few steps had stringsAsFactors changed to strings_as_factors.
- add_role() can now only add new additional roles. To alter existing roles, use update_role(). This change also allows for the possibility of having multiple roles/types for one variable. #221
- All steps gain an id field that will be used in the future to reference other steps.
- The retain option to prep is now defaulted to TRUE. If verbose = TRUE, the approximate size of the data set is printed. #207
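The split between update_role() and add_role() described above can be sketched as follows, assuming a recent recipes installation and the built-in mtcars data. The role names "id" and "group" are arbitrary examples.

```r
library(recipes)

rec <- recipe(mpg ~ ., data = mtcars) %>%
  update_role(cyl, new_role = "id") %>%   # replaces the existing "predictor" role
  add_role(cyl, new_role = "group")       # adds a second role for the same column

info <- summary(rec)

# cyl should now be listed once per role
info[info$variable == "cyl", c("variable", "role")]
```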
New Operations
- step_integer converts data to ordered integers similar to LabelEncoder #123 and #185
- step_geodist can be used to calculate the distance between geocodes and a single reference location.
- step_arrange, step_filter, step_mutate, step_sample, and step_slice implement their dplyr analogs.
- step_nnmf computes the non-negative matrix factorization for data.
Other Changes
- The `rsample` function `prepper` was moved to `recipes` (issue).
- A number of packages were moved from “Imports” to “Suggests” to reduce the install footprint. A function was added to prompt the user to install the needed packages when the relevant steps are invoked.
- `step_string2factor` will now accept factors and leave them as-is.
- `step_knnimpute` now excludes missing data in the variable to be imputed from the nearest-neighbor calculation. Including them would have resulted in some missing data not being imputed (i.e. returning another missing value).
- `step_dummy` now produces a warning (instead of failing) when non-factor columns are selected. Only factor columns are used; no conversion is done for character data. issue #186
- `dummy_names` gained a separator argument. issue #183
- `step_downsample` and `step_upsample` now have `seed` arguments for more control over randomness.
- `broom` is no longer used to get the `tidy` generic. These are now contained in the `generics` package.
- When a recipe is prepared, a running list of all columns is created and the last known use of each column is kept. This is to avoid bugs when a step that is skipped removes columns. issue #239
recipes 0.1.3
CRAN release: 2018-06-16
New Operations
- `check_range` breaks `bake` if variable range in new data is outside the range that was learned from the train set (contributed by Edwin Thoen).
- `step_lag` can lag variables in the data set (contributed by Alex Hayes).
- `step_naomit` removes rows with missing data for specific columns (contributed by Alex Hayes).
- `step_rollimpute` can be used to impute data in a sequence or series by estimating their values within a moving window.
- `step_pls` can conduct supervised feature extraction for predictors.
Other Changes
- `step_log` gained an `offset` argument.
- `step_log` gained a `signed` argument (contributed by Edwin Thoen).
- The internal functions `sel2char` and `printer` have been exported to enable other packages to contain steps.
- When training new steps after some steps have been previously trained, the `retain = TRUE` option should be set on previous invocations of `prep`.
- For `step_dummy`:
  - It can now compute the entire set of dummy variables per factor predictor using the `one_hot = TRUE` option. Thanks to Davis Vaughan.
  - The `contrast` option was removed. The step uses the global option for contrasts.
  - The step also produces missing indicator variables when the original factor has a missing value.
- `step_other` will now convert novel levels of the factor to the “other” level.
- `step_bin2factor` now has an option to choose how the values are translated to the levels (contributed by Michael Levy).
- `bake` and `juice` can now export basic data frames.
- The `okc` data were updated with two additional columns.
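The two new `step_log` arguments can be sketched as follows. This is a small example on made-up data, assuming a current `recipes` installation: `offset` is added to the values before the log is taken, and `signed = TRUE` applies the log to the absolute value while keeping the sign (values with absolute value below 1 become 0).

```r
library(recipes)

# Toy data: `x` contains a zero, `y` contains negative values
dat <- data.frame(x = c(0, 1, 9), y = c(-100, 0.5, 100))

rec <- recipe(~ x + y, data = dat) %>%
  # log10(x + 1), so the zero does not produce -Inf
  step_log(x, offset = 1, base = 10) %>%
  # sign(y) * log10(abs(y)), with 0 where abs(y) < 1
  step_log(y, signed = TRUE, base = 10)

out <- bake(prep(rec, training = dat), new_data = dat)
out
```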
recipes 0.1.2
CRAN release: 2018-01-11
General Changes
- Edwin Thoen suggested adding validation checks for certain data characteristics. This fed into the existing notion of expanding `recipes` beyond steps (see the non-step steps project). A new set of operations, called `checks`, can now be used. These should throw an informative error when the check conditions are not met and return the existing data otherwise.
- Steps now have a `skip` option that will not apply preprocessing when `bake` is used. See the article on skipping steps for more information.
New Operations
- `check_missing` will validate that none of the specified variables contain missing data.
- `detect_step` can be used to check if a recipe contains a particular preprocessing operation.
- `step_num2factor` can be used to convert numeric data (especially integers) to factors.
- `step_novel` adds a new factor level to nominal variables that will be used when new data contain a level that did not exist when the recipe was prepared.
- `step_profile` can be used to generate design matrix grids for prediction profile plots of additive models where one variable is varied over a grid and all of the others are fixed at a single value.
- `step_downsample` and `step_upsample` can be used to change the number of rows in the data based on the frequency distributions of a factor variable in the training set. By default, this operation is only applied to the training set; `bake` ignores this operation.
- `step_naomit` drops rows when specified columns contain `NA`, similar to `tidyr::drop_na`.
- `step_lag` allows for the creation of lagged predictor columns.
recipes 0.1.1
CRAN release: 2017-11-20
- The default selector for `bake` was changed from `all_predictors()` to `everything()`.
- The `verbose` option for `prep` now defaults to `FALSE`.
- A bug in `step_dummy` was fixed that makes sure that the correct binary variables are generated despite the levels or values of the incoming factor. Also, `step_dummy` now requires factor inputs.
- `step_dummy` also has a new default naming function that works better for factors. However, there is an extra argument (`ordinal`) now to the functions that can be passed to `step_dummy`.
- `step_interact` now allows for selectors (e.g. `all_predictors()` or `starts_with("prefix")`) to be used in the interaction formula.
- `step_YeoJohnson` gained an `na.rm` option.
- `dplyr::one_of` was added to the list of selectors.
- `step_bs` adds B-spline basis functions.
- `step_unorder` converts ordered factors to unordered factors.
- `step_count` counts the number of instances that a pattern exists in a string.
- `step_string2factor` and `step_factor2string` can be used to move between encodings.
- `step_lowerimpute` is for numeric data where the values cannot be measured below a specific value. For these cases, random uniform values are used for the truncated values.
- A step to remove simple zero-variance variables was added (`step_zv`).
- A series of `tidy` methods were added for recipes and many (but not all) steps.
- In `bake.recipe`, the argument `newdata` is now without a default.
- `bake` and `juice` can now save the final processed data set in sparse format. Note that, as the steps are processed, a non-sparse data frame is used to store the results.
- A formula method was added for recipes to get a formula with the outcome(s) and predictors based on the trained recipe.
recipes 0.0.1.9003
- Two of the main functions changed names. `learn` has become `prepare` and `process` has become `bake`.
recipes 0.0.1.9002
recipes 0.0.1.9001
- The class system for `recipe` objects was changed so that pipes can be used to create the recipe with a formula.
- `process.recipe` lost the `role` argument in favor of a general set of selectors. If no selector is used, all the predictors are returned.
- Two steps for simple imputation using the mean or mode were added.