Change output of pred_wflw column #189

spsanderson (Maintainer) · 2023-12-12T01:50:16Z

I think it would be a good idea to change the output of the pred_wflw column. Instead of holding only the predicted values for the testing set, it would hold a full tibble with the actual values, the training predictions, and the testing predictions. Something like:

.data_type	.value
actual	23.4
actual	13.5
...	...
training	12.5
training	13.7
...	15.6
testing	19.5

Thoughts?

mayer79 · 2023-12-13T17:46:15Z

Or provide an argument reshape = c("wide", "long")?

spsanderson (Maintainer, Author) · 2023-12-13T18:23:19Z

I could do that too, probably as an extractor. Currently no parameters flow through to the function that puts the pred_wflw column together, and the column is returned as a list, so maybe an extractor function with a parameter would work. Thoughts?
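
To make the idea concrete, here is a minimal sketch of what such an extractor could look like. The function name `extract_preds_sketch()` and its `.shape` argument are hypothetical and not part of tidyAML; the toy tibble only stands in for one element of the pred_wflw list column.

```r
library(dplyr)
library(tidyr)

# Hypothetical extractor: the name and the `.shape` argument are illustrative
# assumptions, not an existing tidyAML function.
extract_preds_sketch <- function(.pred_tbl, .shape = c("long", "wide")) {
  .shape <- match.arg(.shape)

  # Long is the stored format, so return it untouched
  if (.shape == "long") return(.pred_tbl)

  # Wide: one column per .data_type. A positional row index is needed so
  # pivot_wider() has an id column; shorter groups are padded with NA.
  .pred_tbl |>
    group_by(.data_type) |>
    mutate(.row = row_number()) |>
    ungroup() |>
    pivot_wider(names_from = .data_type, values_from = .value)
}

# Toy stand-in for one element of the pred_wflw list column
toy_preds <- tibble(
  .data_type = c("actual", "actual", "actual", "training", "training", "testing"),
  .value     = c(23.4, 13.5, 18.1, 12.5, 13.7, 19.5)
)

long_tbl <- extract_preds_sketch(toy_preds)
wide_tbl <- extract_preds_sketch(toy_preds, "wide")
```

Note that `.row` in the wide form is only a position within each group. It does not line up a training or testing prediction with the actual value of the same observation, so a real implementation would probably need to carry an observation id through.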

spsanderson (Maintainer, Author) · 2023-12-13T19:33:19Z

This gives the tibbles. I can then write a new function that reshapes them, if desired, upon extraction, since they will be returned as a list object in the result tibble from the regression/classification functions:

Function:

#' Internals Safely Make Predictions on a Fitted Workflow from Model Spec tibble
#'
#' @family Internals
#'
#' @author Steven P. Sanderson II, MPH
#'
#' @description Safely Make predictions on a fitted workflow from a model spec tibble.
#'
#' @details Create predictions on a fitted `parsnip` model from a `workflow` object.
#'
#' @param .model_tbl The model table that is generated from a function like
#' `fast_regression_parsnip_spec_tbl()`, must have a class of "tidyaml_mod_spec_tbl".
#' This is meant to be used after the function `internal_make_fitted_wflw()` has been
#' run and the tibble has been saved.
#' @param .splits_obj The splits object from the auto_ml function. It is internal
#' to the `auto_ml_` function.
#'
#' @examples
#' library(recipes, quietly = TRUE)
#' library(dplyr, quietly = TRUE)
#'
#' mod_spec_tbl <- fast_regression_parsnip_spec_tbl(
#'   .parsnip_eng = c("lm","glm","gee"),
#'   .parsnip_fns = "linear_reg"
#' )
#'
#' rec_obj <- recipe(mpg ~ ., data = mtcars)
#' splits_obj <- create_splits(mtcars, "initial_split")
#'
#' mod_tbl <- mod_spec_tbl |>
#'   mutate(wflw = full_internal_make_wflw(mod_spec_tbl, rec_obj))
#'
#' mod_fitted_tbl <- mod_tbl |>
#'   mutate(fitted_wflw = internal_make_fitted_wflw(mod_tbl, splits_obj))
#'
#' internal_make_wflw_predictions(mod_fitted_tbl, splits_obj)
#'
#' @return
#' A list of tibbles, one per model, each with the columns `.data_type`
#' (actual, training, testing) and `.value`.
#'
#' @name internal_make_wflw_predictions
NULL

#' @export
#' @rdname internal_make_wflw_predictions

# Safely make predictions on fitted workflow
internal_make_wflw_predictions <- function(.model_tbl, .splits_obj){
  
  # Tidyeval ----
  model_tbl <- .model_tbl
  splits_obj <- .splits_obj
  col_nms <- colnames(model_tbl)
  
  # Checks ----
  if (!inherits(model_tbl, "tidyaml_mod_spec_tbl")){
    rlang::abort(
      message = "'.model_tbl' must inherit a class of 'tidyaml_mod_spec_tbl'",
      use_cli_format = TRUE
    )
  }
  
  if (!"fitted_wflw" %in% col_nms){
    rlang::abort(
      message = "Missing the column 'fitted_wflw'",
      use_cli_format = TRUE
    )
  }
  
  if (!".model_id" %in% col_nms){
    rlang::abort(
      message = "Missing the column '.model_id'",
      use_cli_format = TRUE
    )
  }
  
  # Manipulation
  # Make a group split object list
  model_factor_tbl <- model_tbl |>
    dplyr::mutate(.model_id = forcats::as_factor(.model_id))
  
  models_list <- model_factor_tbl |>
    dplyr::group_split(.model_id)
  
  # Make the predictions on the fitted workflow object using purrr imap
  wflw_preds_list <- models_list |>
    purrr::imap(
      .f = function(obj, id){
        
        # Pull the fitted workflow column by name (its presence is checked
        # above) rather than by position, then pluck the workflow itself
        fitted_wflw <- obj |> dplyr::pull(fitted_wflw) |> purrr::pluck(1)
        
        # Create a safe stats::predict
        safe_stats_predict <- purrr::safely(
          stats::predict,
          otherwise = NULL,
          quiet = TRUE
        )
        
        # Return the predictions
        ret <- safe_stats_predict(
          fitted_wflw,
          new_data = rsample::testing(splits_obj$splits)
        )
        
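        if (!is.null(ret$error)) message(stringr::str_glue("{ret$error}"))
        
        # Get testing predictions; if predict() failed there is no result to
        # reshape, so return NULL for this model instead of erroring
        test_res <- ret |> purrr::pluck("result")
        if (is.null(test_res)) return(NULL)
        test_res <- test_res |>
          dplyr::mutate(.data_type = "testing") |>
          dplyr::select(.data_type, .pred) |>
          purrr::set_names(c(".data_type", ".value"))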
        
        # Get training predictions
        train_res <- fitted_wflw |> 
          broom::augment(new_data = rsample::training(splits_obj$splits)) |>
          dplyr::mutate(.data_type = "training") |>
          dplyr::select(.data_type, .pred) |>
          purrr::set_names(c(".data_type", ".value"))
        
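        # Get actual outcome values. `rec_obj` is not an argument of this
        # function, so recover the recipe from the fitted workflow instead of
        # relying on an object in the calling environment.
        rec_obj <- workflows::extract_preprocessor(fitted_wflw)
        pred_var <- rec_obj$term_info |>
          dplyr::filter(role == "outcome") |>
          dplyr::pull(variable)
        actual_res <- dplyr::as_tibble(rec_obj$template[[pred_var]]) |>
          dplyr::mutate(.data_type = "actual") |>
          dplyr::select(.data_type, value) |>
          purrr::set_names(c(".data_type", ".value"))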
        
        res <- base::rbind(actual_res, train_res, test_res)
        return(res)
      }
    )
  
  return(wflw_preds_list)
}

Example Output:

> internal_make_wflw_predictions(mod_fitted_tbl, splits_obj)
[[1]]
# A tibble: 64 × 2
   .data_type .value
   <chr>       <dbl>
 1 actual       21  
 2 actual       21  
 3 actual       22.8
 4 actual       21.4
 5 actual       18.7
 6 actual       18.1
 7 actual       14.3
 8 actual       24.4
 9 actual       22.8
10 actual       19.2
# ℹ 54 more rows
# ℹ Use `print(n = ...)` to see more rows

[[2]]
# A tibble: 64 × 2
   .data_type .value
   <chr>       <dbl>
 1 actual       21  
 2 actual       21  
 3 actual       22.8
 4 actual       21.4
 5 actual       18.7
 6 actual       18.1
 7 actual       14.3
 8 actual       24.4
 9 actual       22.8
10 actual       19.2
# ℹ 54 more rows
# ℹ Use `print(n = ...)` to see more rows

[[3]]
# A tibble: 64 × 2
   .data_type .value
   <chr>       <dbl>
 1 actual       21  
 2 actual       21  
 3 actual       22.8
 4 actual       21.4
 5 actual       18.7
 6 actual       18.1
 7 actual       14.3
 8 actual       24.4
 9 actual       22.8
10 actual       19.2
# ℹ 54 more rows
# ℹ Use `print(n = ...)` to see more rows
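
The printed head of each tibble only shows the actual rows, so a quick count per `.data_type` is a handy way to confirm that the training and testing rows are there too. A minimal sketch, using a toy list as a stand-in for the real return value (the values and the name `toy_pred_list` are made up for illustration):

```r
library(dplyr)

# Toy stand-in for the list returned by internal_make_wflw_predictions();
# the numbers are illustrative, not real model output.
toy_pred_list <- list(
  tibble(
    .data_type = c("actual", "actual", "training", "testing"),
    .value     = c(21, 22.8, 20.4, 23.1)
  ),
  tibble(
    .data_type = c("actual", "actual", "training", "testing"),
    .value     = c(21, 22.8, 20.9, 22.5)
  )
)

# Stack the per-model tibbles, keeping the list position as .model_id,
# then count the rows of each data type per model
type_counts <- toy_pred_list |>
  bind_rows(.id = ".model_id") |>
  count(.model_id, .data_type)

print(type_counts)
```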

spsanderson (Maintainer, Author) · 2023-12-18T14:11:58Z

This is now done
