--- title: "Resampling Simulation Analysis" author: "Jennifer Moore, Bill Pine" date: '' output: pdf_document: default word_document: default html_document: default --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` Here we present the steps and R code for resampling our data through simulations to test for the effect of our treatment (rocks) on live oyster counts. For these analyses, we use the R packages \textit{MASS} for running the generalized linear models, and \textit{ggeffects} for predicting the data for our plots. ## Step 1 Read in data. For this example, we will use data from period 1 from this study (Winter 2018-2019). ```{r, eval = FALSE} library(MASS) #glm.nb function library(ggeffects) #run ggpredict() library(dplyr) #function needed for creating the resampled datasets # Read in the data data1 <- read.csv("Supp_Data.csv", header= T) ``` ## Step 2 Simulate 'nsim' datasets by sampling with replacement. In this example, we sample separately within each site/locality combination for each of our two treatment groups (i.e., restored reefs vs. unrestored (wild) reefs). We sample with replacement within each category 'n' times, with 'n' being the sample size for that group. Therefore, our final data set is the same length as our original data set. The output resampled dataset is saved as sim_data. Next, we run run the model testing whether or not there is an effect of treatment on live oyster counts. In this example, we are using a negative binomial model (see Moore et al. 2020). After each model is run, we pull our the coefficient for the treatment term (rocksBin) and save it in rocks. In addition, we use ggpredict() to estimate the live oyster counts for restored and unrestored (wild) reefs and save the results in mod. We loop through this to end up with 'nsim' number of datasets (5000 in this case). ```{r, eval = FALSE} rock_sims <- function(data1){ rocks <- NA mod <- list() nsim = 5000 n = nrow(data1) for(i in 1:nsim){ sim_data = data1 %>% group_by(rocksBin, site, locality) %>% sample_frac(size = 1, replace = T) %>% ungroup() m <- glm.nb(count_live ~ rocksBin + offset(log(tran_length)), data = sim_data) #run model using the resampled data set rocks[i] <- coef(m)['rocksBinY'] #pull out the beta coefficient for the treatment term mod[[i]] <- ggpredict(m, terms = c('rocksBin', 'tran_length[1]'), data = sim_data) #predict live oyster counts for each treatment group } return(list(rocks, mod)) } out <-rock_sims(data1) #run the function on our data ``` ## Step 3 Examine the results. The output is saved under 'out'. Therefore, out[[1]] is the beta coefficients, and out[[2]] is the predicted live oyster counts. ```{r, eval = FALSE} #look at beta coefficients rocks <- out[[1]] #pull out the beta coefficients from 'out' rocks #display all values mean(rocks) #mean beta coefficient #how many beta coefficients are less than 0 beta_crit <- function(x){sum(x<0)} rocks_eff <- beta_crit(rocks) #beta <0 shows that restored reefs have lower live oyster counts than restored reefs #find the ratio of live oyster counts between unrestored (wild) and restored sites ratio <- NA #initiate the variable for(i in 1:nsim){ ratio[i] <- out[[2]][[i]]$predicted[1]/out[[2]][[i]]$predicted[2] #pull outs the predicted live oyster counts for restored and unrestored sites } mean(ratio) #mean ratio between unrestored and restored sites #how many ratios are greater than 1 ratio_crit <- function(x){sum(x>1)} ratio_summ <- ratio_crit(ratio) #ratio greater than 1 means unrestored (wild) reefs have more #live oysters than restored sites ``` ## Step 4 Create plots of the results. ```{r, eval = FALSE} library(ggplot2) #boxplot of the beta coefficients p1 <- ggplot(data = data.frame(rocks)) + geom_boxplot(aes(x = factor(1), y = rocks)) + coord_flip() + labs(x = "", y = "Rocks Beta Coefficient", size = 2) + scale_x_discrete(labels = c("Period 1")) + ylim(-1.5,1) + theme_bw(base_size = 14) p1 #boxplot of the ratio between unrestored and restored reefs p2 <- ggplot(data = data.frame(ratio)) + geom_boxplot(aes(x = factor(1), y = ratio)) + coord_flip() + labs(x = "", y = "Ratio No Rocks/Rocks", size = 2) + scale_x_discrete(labels = c("Period 1")) + ylim(-1,3) + theme_bw(base_size = 14) p2 ```