This family of functions allows using AMR-specific data types such as <sir> and <mic> inside tidymodels pipelines.
Usage
all_sir()
all_sir_predictors()
all_mic()
all_mic_predictors()
all_disk()
all_disk_predictors()
step_mic_log2(recipe, ..., role = NA, trained = FALSE, columns = NULL,
  skip = FALSE, id = recipes::rand_id("mic_log2"))
step_sir_numeric(recipe, ..., role = NA, trained = FALSE, columns = NULL,
  skip = FALSE, id = recipes::rand_id("sir_numeric"))
Details
You can read more in our online AMR with tidymodels introduction.
Tidyselect helpers include:
all_sir() and all_sir_predictors() to select <sir> columns
all_mic() and all_mic_predictors() to select <mic> columns
all_disk() and all_disk_predictors() to select <disk> columns
Pre-processing pipeline steps include:
step_sir_numeric() to convert SIR columns to numeric (via as.numeric()), to be used with all_sir_predictors(): "S" = 1, "I"/"SDD" = 2, "R" = 3. All other values are rendered NA. Keep this in mind for further processing, especially if the model does not allow for NA values.
step_mic_log2() to convert MIC columns to numeric (via as.numeric()) and apply a log2 transform, to be used with all_mic_predictors()
These steps integrate with recipes::recipe() and work like standard preprocessing steps. They are useful for preparing data for modelling, especially with classification models.
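To make the two conversions concrete, the sketch below applies the same operations by hand to small vectors, outside of any recipe. It assumes the AMR package is installed and follows the description above (as.numeric(), followed by log2() for MIC values). It is not taken from the internal code of the steps, and the object names are illustrative only.

```r
library(AMR)

# Illustrative vectors (not taken from the package data sets)
sir_values <- as.sir(c("S", "I", "R"))
mic_values <- as.mic(c("0.5", "2", "8"))

# SIR to numeric, as described for step_sir_numeric(): S = 1, I = 2, R = 3
sir_numeric <- as.numeric(sir_values)
sir_numeric
#> [1] 1 2 3

# MIC to numeric followed by log2, as described for step_mic_log2()
mic_log2 <- log2(as.numeric(mic_values))
mic_log2
#> [1] -1  1  3
```

Because values other than S, I/SDD and R become NA, it may be worth checking the baked data with anyNA() before fitting a model that does not accept missing values.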
Examples
if (require("tidymodels")) {
  # The below approach formed the basis for this paper: DOI 10.3389/fmicb.2025.1582703
  # Presence of ESBL genes was predicted based on raw MIC values.

  # example data set in the AMR package
  esbl_isolates

  # Prepare a binary outcome and convert to ordered factor
  data <- esbl_isolates %>%
    mutate(esbl = factor(esbl, levels = c(FALSE, TRUE), ordered = TRUE))

  # Split into training and testing sets
  split <- initial_split(data)
  training_data <- training(split)
  testing_data <- testing(split)

  # Create and prep a recipe with MIC log2 transformation
  mic_recipe <- recipe(esbl ~ ., data = training_data) %>%
    # Optionally remove non-predictive variables
    remove_role(genus, old_role = "predictor") %>%
    # Apply the log2 transformation to all MIC predictors
    step_mic_log2(all_mic_predictors()) %>%
    # And apply the preparation steps
    prep()

  # View prepped recipe
  mic_recipe

  # Apply the recipe to training and testing data
  out_training <- bake(mic_recipe, new_data = NULL)
  out_testing <- bake(mic_recipe, new_data = testing_data)

  # Fit a logistic regression model
  fitted <- logistic_reg(mode = "classification") %>%
    set_engine("glm") %>%
    fit(esbl ~ ., data = out_training)

  # Generate predictions on the test set
  predictions <- predict(fitted, out_testing) %>%
    bind_cols(out_testing)

  # Evaluate predictions using standard classification metrics
  our_metrics <- metric_set(
    accuracy,
    recall,
    precision,
    sensitivity,
    specificity,
    ppv,
    npv
  )
  metrics <- our_metrics(predictions, truth = esbl, estimate = .pred_class)

  # Show performance
  metrics
}
#> Loading required package: tidymodels
#> ── Attaching packages ────────────────────────────────────── tidymodels 1.5.0 ──
#> ✔ broom 1.0.12 ✔ rsample 1.3.2
#> ✔ dials 1.4.3 ✔ tailor 0.1.0
#> ✔ infer 1.1.0 ✔ tidyr 1.3.2
#> ✔ modeldata 1.5.1 ✔ tune 2.1.0
#> ✔ parsnip 1.5.0 ✔ workflows 1.3.0
#> ✔ purrr 1.2.2 ✔ workflowsets 1.1.1
#> ✔ recipes 1.3.2 ✔ yardstick 1.4.0
#> ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
#> ✖ purrr::discard() masks scales::discard()
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
#> ✖ recipes::step() masks stats::step()
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> # A tibble: 7 × 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 accuracy binary 0.912
#> 2 recall binary 0.902
#> 3 precision binary 0.917
#> 4 sensitivity binary 0.902
#> 5 specificity binary 0.922
#> 6 ppv binary 0.917
#> 7 npv binary 0.908