| Title: | Pipeline for Explainable Machine Learning with Functional Data |
|---|---|
| Description: | Implements the Variable importance Explainable Elastic Shape Analysis (VEESA) pipeline for explainable machine learning with functional data inputs. Converts training and testing data functional inputs to elastic shape analysis principal components that account for vertical and/or horizontal variability. Computes feature importance to identify important principal components and visualizes the variability captured by functional principal components. See Goode et al. (2025) <doi:10.48550/arXiv.2501.07602> for technical details about the methodology. |
| Authors: | Katherine Goode [cre, aut], J. Derek Tucker [aut], Sandia National Laboratories [cph, fnd] |
| Maintainer: | Katherine Goode <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.6 |
| Built: | 2025-02-17 06:02:44 UTC |
| Source: | https://github.com/cran/veesa |
The function 'prep_training_data' does not center the warping functions, which leads to issues when visualizing joint and horizontal principal component directions. This function aligns the principal directions to improve their interpretability. Currently, alignment is only implemented for jfPCA.
align_pcdirs(train_obj)
| Argument | Description |
|---|---|
| train_obj | Output object from 'prep_training_data' (jfpca only) |
List with the same structure as the output of 'prep_training_data', but the principal directions are replaced with the aligned versions, and gamI is included in the fpca_res object.
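A minimal usage sketch, assuming veesa_train is the output of 'prep_training_data' run with fpca_method = "jfpca" (as in the examples for the other functions on this page):

# Align the jfPCA principal directions of a prepared training object
veesa_train_aligned <- align_pcdirs(train_obj = veesa_train)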
The function 'prep_training_data' does not center the warping functions. For visualizing the aligned and warping functions, it can be easier to look at centered versions. This function centers the warping functions and corresponding aligned functions.
center_warping_funs(train_obj)
| Argument | Description |
|---|---|
| train_obj | Output object from 'prep_training_data' |
Object with the same structure as 'train_obj' but qn, fn, and gam have been replaced by centered versions
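A minimal usage sketch, assuming veesa_train is the output of 'prep_training_data' (as in the examples for the other functions on this page):

# Center the warping functions and the corresponding aligned functions
veesa_train_centered <- center_warping_funs(train_obj = veesa_train)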
Function for computing permutation feature importance (PFI) for a given model and dataset (training or testing)
compute_pfi(x, y, f, K, metric, eps = 1e-15)
| Argument | Description |
|---|---|
| x | Dataset with n observations and p variables (training or testing) |
| y | Response variable (or matrix) associated with x |
| f | Model to explain |
| K | Number of repetitions to perform for PFI |
| metric | Metric used to compute PFI (choose from "accuracy", "logloss", and "nmse") |
| eps | Log loss is undefined for p = 0 or p = 1, so predicted probabilities are adjusted to be at least eps away from 0 and 1 |
List containing:
pfi: Vector of PFI values (averaged over replicates)
pfi_single_reps: Matrix containing the feature importance values from each replicate (rows associated with replicates; columns associated with the features in x)
# Load packages
library(dplyr)
library(tidyr)
library(randomForest)

# Select a subset of functions from shifted peaks data
sub_ids <-
  shifted_peaks$data |>
  select(data, group, id) |>
  distinct() |>
  group_by(data, group) |>
  slice(1:4) |>
  ungroup()

# Create a smaller version of shifted data
shifted_peaks_sub <-
  shifted_peaks$data |>
  filter(id %in% sub_ids$id)

# Extract times
shifted_peaks_times <- unique(shifted_peaks_sub$t)

# Convert training data to matrix
shifted_peaks_train_matrix <-
  shifted_peaks_sub |>
  filter(data == "Training") |>
  select(-t) |>
  mutate(index = paste0("t", index)) |>
  pivot_wider(names_from = index, values_from = y) |>
  select(-data, -id, -group) |>
  as.matrix() |>
  t()

# Obtain veesa pipeline training data
veesa_train <- prep_training_data(
  f = shifted_peaks_train_matrix,
  time = shifted_peaks_times,
  fpca_method = "jfpca"
)

# Obtain response variable values
model_output <-
  shifted_peaks_sub |>
  filter(data == "Training") |>
  select(id, group) |>
  distinct()

# Prepare data for model
model_data <-
  veesa_train$fpca_res$coef |>
  data.frame() |>
  mutate(group = factor(model_output$group))

# Train model
set.seed(20210301)
rf <- randomForest(
  formula = group ~ .,
  data = model_data
)

# Compute feature importance values
pfi <- compute_pfi(
  x = model_data |> select(-group),
  y = model_data$group,
  f = rf,
  K = 1,
  metric = "accuracy"
)
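The returned list can then be inspected, for example (an illustrative continuation of the code above, using the documented components of the output):

# Average PFI value per feature, averaged over replicates
pfi$pfi

# Feature importance values from each individual replicate
pfi$pfi_single_reps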
Function for plotting the functional PC directions
plot_pc_directions(
  fpcs,
  fdasrvf,
  fpca_method,
  times = NULL,
  digits = 0,
  alpha = 1,
  nrow = 1,
  linesizes = NULL,
  linetype = TRUE,
  freey = FALSE
)
| Argument | Description |
|---|---|
| fpcs | Vector of numbers identifying the PCs to include in the plot |
| fdasrvf | Object output from jointFPCA, horizFPCA, or vertFPCA |
| fpca_method | Character string specifying the type of elastic fPCA method to use ('jfpca', 'hfpca', or 'vfpca') |
| times | Optional vector of times (if not included, times will be represented on the interval from 0 to 1) |
| digits | Number of digits to print in the title for the proportion of variability explained by a PC |
| alpha | Vector of alpha values associated with lines in the plot (length must match the number of lines in the plot) |
| nrow | Number of rows to use when creating a grid of plots |
| linesizes | Vector of line widths associated with lines in the plot (length must match the number of lines in the plot) |
| linetype | Vector of line types (e.g., "solid" or "dashed") associated with lines in the plot (length must match the number of lines in the plot) |
| freey | Indicator for whether the y-axis should be freed across facets |
ggplot2 plot of specified principal component directions
# Load packages
library(dplyr)
library(tidyr)

# Select a subset of functions from shifted peaks data
sub_ids <-
  shifted_peaks$data |>
  select(data, group, id) |>
  distinct() |>
  group_by(data, group) |>
  slice(1:4) |>
  ungroup()

# Create a smaller version of shifted data
shifted_peaks_sub <-
  shifted_peaks$data |>
  filter(id %in% sub_ids$id)

# Extract times
shifted_peaks_times <- unique(shifted_peaks_sub$t)

# Convert training data to matrix
shifted_peaks_train_matrix <-
  shifted_peaks_sub |>
  filter(data == "Training") |>
  select(-t) |>
  mutate(index = paste0("t", index)) |>
  pivot_wider(names_from = index, values_from = y) |>
  select(-data, -id, -group) |>
  as.matrix() |>
  t()

# Obtain veesa pipeline training data
veesa_train <- prep_training_data(
  f = shifted_peaks_train_matrix,
  time = shifted_peaks_times,
  fpca_method = "jfpca"
)

# Plot principal directions of PC1
plot_pc_directions(
  fpcs = 1,
  fdasrvf = veesa_train$fpca_res,
  fpca_method = "jfpca",
  times = -shifted_peaks_times,
  linesizes = rep(0.75, 5),
  alpha = 0.9
)
Applies steps 2 and 3 of the VEESA pipeline (alignment and elastic fPCA: jfpca, hfpca, or vfpca) to the testing data, based on the training data prepared using 'prep_training_data'.
prep_testing_data(f, time, train_prep, optim_method = "DP")
| Argument | Description |
|---|---|
| f | Matrix (size M x N) of test data with N functions and M samples |
| time | Vector of size M describing the sample points |
| train_prep | Object returned from applying 'prep_training_data' to the training data |
| optim_method | Method used for optimization when computing the Karcher mean. Options are "DP", "DPo", and "RBFGS" |
List containing (varies slightly based on fpca method used):
time: vector of times when functions are observed (length of M)
f0: original test data functions - matrix (M x N) of N functions with M samples
fn: aligned test data functions - similar structure to f0
q0: original test data SRSFs - similar structure to f0
qn: aligned test data SRSFs - similar structure to f0
mqn: training data SRSF mean (test data functions are aligned to this function)
gam: test data warping functions - similar structure to f0
coef: test data principal component coefficients
psi: test data warping function SRVFs - similar structure to f0 (jfpca and hfpca only)
nu: test data shooting functions - similar structure to f0 (jfpca and hfpca only)
g: test data combination of aligned and shooting functions (jfpca only)
# Load packages
library(dplyr)
library(tidyr)

# Select a subset of functions from shifted peaks data
sub_ids <-
  shifted_peaks$data |>
  select(data, group, id) |>
  distinct() |>
  group_by(data, group) |>
  slice(1:4) |>
  ungroup()

# Create a smaller version of shifted data
shifted_peaks_sub <-
  shifted_peaks$data |>
  filter(id %in% sub_ids$id)

# Extract times
shifted_peaks_times <- unique(shifted_peaks_sub$t)

# Convert training data to matrix
shifted_peaks_train_matrix <-
  shifted_peaks_sub |>
  filter(data == "Training") |>
  select(-t) |>
  mutate(index = paste0("t", index)) |>
  pivot_wider(names_from = index, values_from = y) |>
  select(-data, -id, -group) |>
  as.matrix() |>
  t()

# Obtain veesa pipeline training data
veesa_train <- prep_training_data(
  f = shifted_peaks_train_matrix,
  time = shifted_peaks_times,
  fpca_method = "jfpca"
)

# Convert testing data to matrix
shifted_peaks_test_matrix <-
  shifted_peaks_sub |>
  filter(data == "Testing") |>
  select(-t) |>
  mutate(index = paste0("t", index)) |>
  pivot_wider(names_from = index, values_from = y) |>
  select(-data, -id, -group) |>
  as.matrix() |>
  t()

# Obtain veesa pipeline testing data
veesa_test <- prep_testing_data(
  f = shifted_peaks_test_matrix,
  time = shifted_peaks_times,
  train_prep = veesa_train,
  optim_method = "DP"
)
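The test-set principal component coefficients returned in coef are computed with respect to the training-data preparation, so they can serve as model features in step 4 of the pipeline. An illustrative continuation (the object name test_features is arbitrary):

# Test-set principal component coefficients as a data frame of model features
test_features <- data.frame(veesa_test$coef)
dim(test_features)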
Applies steps 2 and 3 of the VEESA pipeline (alignment and elastic fPCA) to the training data in preparation for inputting the data to the model in step 4.
prep_training_data(
  f,
  time,
  fpca_method,
  lambda = 0,
  penalty_method = c("roughness", "geodesic", "norm"),
  centroid_type = c("mean", "median"),
  center_warpings = TRUE,
  parallel = FALSE,
  cores = -1,
  optim_method = c("DP", "DPo", "DP2", "RBFGS"),
  max_iter = 20L,
  id = NULL,
  C = NULL,
  ci = c(-2, -1, 0, 1, 2)
)
| Argument | Description |
|---|---|
| f | Matrix (size M x N) of training data with N functions and M samples |
| time | Vector of size M corresponding to the M sample points |
| fpca_method | Character string specifying the type of elastic fPCA method to use. Options are 'jfpca', 'hfpca', or 'vfpca' |
| lambda | Numeric value specifying the elasticity. Default is 0 |
| penalty_method | String specifying the penalty term used in the formulation of the cost function to minimize for alignment. Choices are "roughness", which uses the norm of the second derivative; "geodesic", which uses the geodesic distance to the identity; and "norm", which uses the Euclidean distance to the identity. Default is "roughness" |
| centroid_type | String specifying the type of centroid to align to. Options are "mean" or "median". Default is "mean" |
| center_warpings | Boolean specifying whether to center the estimated warping functions. Default is TRUE |
| parallel | Boolean specifying whether to run calculations in parallel. Default is FALSE |
| cores | Integer specifying the number of cores to use in parallel. Default is -1, which uses all cores |
| optim_method | Method used for optimization when computing the Karcher mean. Options are "DP", "DPo", and "RBFGS" |
| max_iter | Integer specifying the maximum number of iterations. Default is 20L |
| id | Integration point for f0. Default is the midpoint |
| C | Balance value. Default is NULL |
| ci | Geodesic standard deviations to be computed. Default is c(-2, -1, 0, 1, 2) |
List with three objects:
alignment: output from fdasrvf::time_warping
fpca_type: type of elastic FPCA method applied
fpca_res: output from fdasrvf::jointFPCA, fdasrvf::horizFPCA, or fdasrvf::vertFPCA (dependent on fpca_type)
# Load packages
library(dplyr)
library(tidyr)

# Select a subset of functions from shifted peaks data
sub_ids <-
  shifted_peaks$data |>
  select(data, group, id) |>
  distinct() |>
  group_by(data, group) |>
  slice(1:4) |>
  ungroup()

# Create a smaller version of shifted data
shifted_peaks_sub <-
  shifted_peaks$data |>
  filter(id %in% sub_ids$id)

# Extract times
shifted_peaks_times <- unique(shifted_peaks_sub$t)

# Convert training data to matrix
shifted_peaks_train_matrix <-
  shifted_peaks_sub |>
  filter(data == "Training") |>
  select(-t) |>
  mutate(index = paste0("t", index)) |>
  pivot_wider(names_from = index, values_from = y) |>
  select(-data, -id, -group) |>
  as.matrix() |>
  t()

# Obtain veesa pipeline training data
veesa_train <- prep_training_data(
  f = shifted_peaks_train_matrix,
  time = shifted_peaks_times,
  fpca_method = "jfpca"
)
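An illustrative look at the returned object (continuing from the code above and following the documented list structure):

# The three documented components: alignment, fpca_type, and fpca_res
names(veesa_train)

# Type of elastic fPCA that was applied
veesa_train$fpca_type

# Principal component coefficients (used as model features in step 4 of the pipeline)
dim(veesa_train$fpca_res$coef)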
A simulated dataset generated for examples in the veesa pipeline manuscript. For the code used to prepare this dataset, see https://github.com/sandialabs/veesa/inst/data-shifted-peaks.md.
shifted_peaks
A list.
The objects in the list are:
| Object | Description |
|---|---|
| data | Data frame containing the simulated data |
| params | The parameters used to generate the data |
| true_means | The true functional means of the shifted peaks groups |
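A quick, illustrative way to inspect the dataset:

# Top-level structure of the shifted_peaks list
str(shifted_peaks, max.level = 1)

# First few rows of the simulated data
head(shifted_peaks$data)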
Function for simulating a set of functional data based on a deterministic function with covariates that affect the shape of the functions
simulate_functions(M, N, seed)
| Argument | Description |
|---|---|
| M | Number of functions |
| N | Number of samples per function |
| seed | Seed for reproducibility |
The functions are generated using the following equation:

f(t) = x_1 * exp(-((t - 0.3)^2) / 0.005) + x_2 * exp(-((t - (0.7 + x_3))^2) / 0.005)

where the covariates are generated as follows (a small numerical illustration appears after this list):

x_1 generated from Unif(0.1, 1)
x_2 generated from Unif(0.1, 0.5)
x_3 generated from Unif(-0.1, 0.1)
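As a small numerical illustration of the generating equation (the covariate values below are arbitrary choices within the documented ranges, not values produced by simulate_functions):

# Evaluate the documented generating equation for one hypothetical function
t  <- seq(0, 1, length.out = 75)
x1 <- 0.8   # within Unif(0.1, 1)
x2 <- 0.3   # within Unif(0.1, 0.5)
x3 <- 0.05  # within Unif(-0.1, 0.1)
f_t <- x1 * exp(-((t - 0.3)^2) / 0.005) + x2 * exp(-((t - (0.7 + x3))^2) / 0.005)

# The curve has a peak of height x1 near t = 0.3 and a peak of height x2 near t = 0.7 + x3
plot(t, f_t, type = "l")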
Data frame with the following columns (where f is the function):
t: "time" associated with a sample from the function, where t is in [0, 1]
y: f(t) for the particular observation
x1: covariate 1 for function f (constant across time)
x2: covariate 2 for function f (constant across time)
x3: covariate 3 for function f (constant across time)
# Simulate data
sim_data <- simulate_functions(M = 100, N = 75, seed = 20211130)