
This family of functions allows using AMR-specific data types such as <sir> and <mic> inside tidymodels pipelines.

Usage

all_sir()

all_sir_predictors()

all_mic()

all_mic_predictors()

all_disk()

all_disk_predictors()

step_mic_log2(recipe, ..., role = NA, trained = FALSE, columns = NULL,
  skip = FALSE, id = recipes::rand_id("mic_log2"))

step_sir_numeric(recipe, ..., role = NA, trained = FALSE, columns = NULL,
  skip = FALSE, id = recipes::rand_id("sir_numeric"))

Details

You can read more in our online introduction, "AMR with tidymodels".

Tidyselect helpers include:

  • all_sir() and all_sir_predictors() to select <sir> columns

  • all_mic() and all_mic_predictors() to select <mic> columns

  • all_disk() and all_disk_predictors() to select <disk> columns

Pre-processing pipeline steps include:

  • step_sir_numeric() to convert SIR columns to numeric (via as.numeric()), to be used with all_sir_predictors(): "S" = 1, "I"/"SDD" = 2, "R" = 3. All other values become NA. Account for this in later steps, especially if the model does not accept NA values.

  • step_mic_log2() to convert MIC columns to numeric (via as.numeric()) and apply a log2 transform, to be used with all_mic_predictors()

These steps integrate with recipes::recipe() and work like standard preprocessing steps. They are useful for preparing data for modelling, especially with classification models.
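As a rough illustration of the arithmetic these two steps describe, the base R sketch below reproduces the documented mapping by hand. It does not call the package: sir_to_numeric() is a hypothetical helper written only for this illustration, and the input vectors are made-up values rather than real isolate data.

```r
# Hypothetical helper (not part of the AMR package): mimics the documented
# mapping "S" = 1, "I"/"SDD" = 2, "R" = 3, with every other value set to NA.
sir_to_numeric <- function(x) {
  lookup <- c(S = 1, I = 2, SDD = 2, R = 3)
  # Values without a match in the lookup table (and NA itself) yield NA
  unname(lookup[as.character(x)])
}

# Made-up SIR interpretations, including an unmapped value ("NI") and an NA
sir_values <- c("S", "I", "SDD", "R", "NI", NA)
sir_numeric <- sir_to_numeric(sir_values)
sir_numeric
# 1 2 2 3 NA NA

# Made-up MIC values, already numeric, on a twofold dilution scale
mic_values <- c(0.5, 2, 8)
mic_log2 <- log2(mic_values)
mic_log2
# -1 1 3
```

On the log2 scale each twofold dilution step is one unit apart, which is a common reason to transform MIC values before modelling.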

Examples

if (require("tidymodels")) {
  # The below approach formed the basis for this paper: DOI 10.3389/fmicb.2025.1582703
  # Presence of ESBL genes was predicted based on raw MIC values.


  # example data set in the AMR package
  esbl_isolates

  # Prepare a binary outcome and convert to ordered factor
  data <- esbl_isolates %>%
    mutate(esbl = factor(esbl, levels = c(FALSE, TRUE), ordered = TRUE))

  # Split into training and testing sets
  split <- initial_split(data)
  training_data <- training(split)
  testing_data <- testing(split)

  # Create and prep a recipe with MIC log2 transformation
  mic_recipe <- recipe(esbl ~ ., data = training_data) %>%
    # Optionally remove non-predictive variables
    remove_role(genus, old_role = "predictor") %>%
    # Apply the log2 transformation to all MIC predictors
    step_mic_log2(all_mic_predictors()) %>%
    # And apply the preparation steps
    prep()

  # View prepped recipe
  mic_recipe

  # Apply the recipe to training and testing data
  out_training <- bake(mic_recipe, new_data = NULL)
  out_testing <- bake(mic_recipe, new_data = testing_data)

  # Fit a logistic regression model
  fitted <- logistic_reg(mode = "classification") %>%
    set_engine("glm") %>%
    fit(esbl ~ ., data = out_training)

  # Generate predictions on the test set
  predictions <- predict(fitted, out_testing) %>%
    bind_cols(out_testing)

  # Evaluate predictions using standard classification metrics
  # Note: by default yardstick treats the first factor level (here FALSE) as the event
  our_metrics <- metric_set(
    accuracy,
    recall,
    precision,
    sensitivity,
    specificity,
    ppv,
    npv
  )
  metrics <- our_metrics(predictions, truth = esbl, estimate = .pred_class)

  # Show performance
  metrics
}
#> Loading required package: tidymodels
#> ── Attaching packages ────────────────────────────────────── tidymodels 1.5.0 ──
#>  broom        1.0.12      rsample      1.3.2 
#>  dials        1.4.3       tailor       0.1.0 
#>  infer        1.1.0       tidyr        1.3.2 
#>  modeldata    1.5.1       tune         2.1.0 
#>  parsnip      1.5.0       workflows    1.3.0 
#>  purrr        1.2.2       workflowsets 1.1.1 
#>  recipes      1.3.2       yardstick    1.4.0 
#> ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
#>  purrr::discard() masks scales::discard()
#>  dplyr::filter()  masks stats::filter()
#>  dplyr::lag()     masks stats::lag()
#>  recipes::step()  masks stats::step()
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> # A tibble: 7 × 3
#>   .metric     .estimator .estimate
#>   <chr>       <chr>          <dbl>
#> 1 accuracy    binary         0.912
#> 2 recall      binary         0.902
#> 3 precision   binary         0.917
#> 4 sensitivity binary         0.902
#> 5 specificity binary         0.922
#> 6 ppv         binary         0.917
#> 7 npv         binary         0.908