---
title: "Light Cue Manuscript Statistics Overview"
author: "Brandon Güell"
date: "11/11/2017"
output:
  html_document:
    toc: yes
    toc_float: yes
  pdf_document:
    toc: yes
---

> The statistical analysis shown here are from the Güell & Warkentin light cue manuscript Analysis are shows in chronological order as reported in the manuscript. I show details on importing, manipulating, summarizing, analyzing, and visualizing the data. See manuscript for details on background, research questions, hypothesis, methods, etc.

# Hatching Timing Experiment Data Analysis

## Data importing

First, we can load the data straight from my computer as follows...
```{r}
#library(dplyr)
#library(readxl)
#photopattern <- read_excel("/Volumes/EggScience 1/Warkentin Lab/2015 Data/Hatching Phototaxis Data/Diel Hatching Pattern Data.xlsx")
#photopattern
```

Or, we can also access this dataset from my Github repository as follows:
```{r}
library(curl)
f <- curl("https://raw.githubusercontent.com/bguell/Light-Cue-Manuscript-Final/master/Diel%20Hatching%20Pattern%20Data.csv")
photopattern <- read.csv(f, header = TRUE, sep = ",", stringsAsFactors = TRUE)
head(photopattern)
```

**EXLUCDE clutch 238**
```{r}
library(dplyr)
photopattern= filter(photopattern, clutch!=238) # exclude clude 238
photopattern %>% group_by(clutch) %>% summarise(n()) #check how many clutches and embryos/clutch
```

## GLMM model generations

I created the following models using a GLMM where **for the full, most complete model**, the proportion data is the response variable and treatment, age (time), and their interaction are the predictor variables. I also added clutch as a random effect in these models. The models will be run using a **binomial family distribution**, and all follow a **logit link function** of correction. I also added the variable "initial # eggs" as the weights to ensure proper binomial distribution with our proportion response data. I will call this full complex model "rana".

```{r}
library(lme4)

#full model with treatment, time, and interaction
rana=glmer(percentage ~ treatment*age + (1 | clutch), family = "binomial", data = photopattern, weights = intial...eggs)
rana
summary(rana)
```

We can then make a series of more simple, nested models as follows:

```{r}
#treatment and time
rana2=glmer(percentage ~ treatment + age + (1 | clutch), family = "binomial", data = photopattern, weights = intial...eggs)
summary(rana2)

#just treatment
rana3=glmer(percentage ~ treatment + (1 | clutch), family = "binomial", data = photopattern, weights = intial...eggs)
summary(rana3)

#just time
rana4=glmer(percentage ~ age + (1 | clutch), family = "binomial", data = photopattern, weights = intial...eggs)
summary(rana4)
```

*************************

## Model comparison using **Likelihood Ratio Tests (LRT)**

> Note: The logic of the likelihood ratio test is to compare the likelihood of two models with each other, i.e., a model that includes the factor that we are interested in versus a reduced, nested model with that factor excluded. To get our p-values for each predictor variable we will compare models using the anova() function. We always want to put the reduced model first and then the full model after, making sure to use a Chisquared test.

First, we can use an information theoretic approach to compare our models. This helps us decide which model fits the data the best: model with **lowest AIC**
```{r}
library(AICcmodavg)
print(aictab(list(rana, rana2, rana3, rana4), c("rana", "rana2", "rana3", "rana4")), LL = FALSE)
```

We can see that indeed "rana", the full complex model with interactions, is the best model for our data. Next, we can test to see how much more variance is explained by the random effects than the fixed effects alone...
```{r}
library(MuMIn)
r.squaredGLMM(rana)
```

Now, we can use the "rana" model as our base model to obtain our p-values using the LRTs...

### **EFFECT OF AGE**
```{r}
anova(rana3,rana, test = "Chisq")
```
* *X^2* == 10317
* *p-value* == 2.2e-16 ***

### **EFFECT OF TREATMENT**
```{r}
anova(rana4,rana, test = "Chisq")
```
* *X^2* == 272.56
* *p-value* == 2.2e-16 ***

### **EFFECT OF TREATMENT*AGE INTERACTION**
```{r}
anova(rana2,rana, test = "Chisq")
```
* *X^2* == 92.442
* *p-value* == 2.2e-16  ***


### Model summary alt.
Alternatively, we could use the Anova() function to get p-values using a Wald chi-square tests for each of our main effects...
```{r}
library(car) # load car package for Anova function on model
Anova(rana) #Analysis of Deviance Table (Type II Wald chisquare tests)
```
Regardless, we can see that the results are extremely similar... Thus we will **report results from the LRT** in our manuscript.


*********************
*********************
*********************


### **POST HOC** OF TREATMENTS DIFFERENT TO EACH OTHER

```{r}
library(multcomp)

summary(glht(rana, linfct=mcp(treatment="Tukey")))
```

* light-dark **p-value** == 0.236 **NS**
* photo-dark **p-value** == <0.001 ***
* photo-light **p-value** == <0.001 ***

> Interpretation: This post hoc test shows that the photo and light treatment were significantly different, and photo and dark were different


*********************************************
*********************************************
*********************************************

*********************************************
*********************************************
*********************************************


### **Visualization of Hatching Timing Experiment**

> In order visualize this data, we will plot a proportion hatched curve for each treatment. Data points will be mean proportion hatched at each time point in a clutch (with standard error bars between clutches). 

In order to do this, we will need to "wrangle" our data a bit first:
```{r}
head(photopattern)
library(dplyr) #allows for data manipulation
k= photopattern %>%
  group_by(treatment, time, age) #group percentage/proportion hatched data by age colum to summarize by data!
k

k= k %>% summarise(sample_size= n(), mean= mean(percentage), SE= sd(percentage)/sqrt(length(percentage))) # sample size, mean, and SE of percentage by time and treatment!
k # new dataset with these variables we created
```

### Final plot with SE ribbons (shaded error areas)
```{r}
library(ggplot2)
library(scales)

legend_title="Light \ntreatment"

g1=ggplot(k, aes(x=age, y=mean)) + #add the axis
  geom_line(aes(color=treatment), size=1.5) + #make lines different colors... these are lines drawn b/t  the means, then make lines thicker... size="must be outside of aes"
  geom_ribbon(aes(ymin=mean-SE, ymax=mean+SE, group=treatment, fill=treatment), alpha=.5) + # make SE ribbon limits, then make 3 ribbon groups by treatment, the nfill them by color by treatment, alpha is see through-ness
  labs(y="Mean proportion hatched", x="Age (days)") + #creat labels... 
  theme_classic(base_size=21) +
  theme(panel.background = element_rect(colour = "black", size=1), legend.position="right", axis.text.x = element_text(color="black"), axis.text.y = element_text(color="black")) +
  scale_x_continuous(breaks= pretty_breaks(n=6)) + # allows for making "pretty" breaks on x axis... only showing 4 and 5 values...
  scale_y_continuous(breaks= pretty_breaks(n=6)) + # allows for making "pretty" breaks on y axis... shows 5 breaks
  scale_color_manual(legend_title, values = c("black", "darkorange", "darkgreen")) +
  scale_fill_manual(guide = FALSE, values = c("black", "darkorange", "darkgreen")) + # remove title for shading colors...
  geom_rect(aes(xmin = -Inf, xmax = 4.25, 
            ymin = -Inf, ymax = Inf), alpha = 0.009) +
  geom_rect(aes(xmin = 4.75, xmax = 5.25, 
            ymin = -Inf, ymax = Inf), alpha = 0.009) +
  geom_rect(aes(xmin = 5.75, xmax = 6.25, 
            ymin = -Inf, ymax = Inf), alpha = 0.009) +
  geom_rect(aes(xmin = 6.75, xmax = 7.25, 
            ymin = -Inf, ymax = Inf), alpha = 0.009)
g1
```

### 3 panel figure
```{r}
library(ggplot2)
library(scales)
library(dplyr)

levels(photopattern$treatment)
levels(photopattern$treatment) <- c("", "Constant dark", "Constant light", "12L:12D\nphotoperiod")
g=ggplot(photopattern, aes(x=age, y=percentage)) + #add the axis
  geom_boxplot(aes(group=age, fill=treatment)) + #make lines different colors... these are lines drawn b/t  the means, then make lines thicker... size="must be outside of aes"
  labs(y="Proportion hatched", x="Age (days)") + #creat labels... 
  theme_classic(base_size=20) +
  theme(panel.background = element_rect(colour = "black", size=1), legend.position="right", axis.text.x = element_text(color="black"), axis.text.y = element_text(color="black")) +
  scale_x_continuous(breaks= pretty_breaks(n=6)) + # allows for making "pretty" breaks on x axis... only showing 4 and 5 values...
  scale_y_continuous(breaks= pretty_breaks(n=6)) + # allows for making "pretty" breaks on y axis... shows 5 breaks
  scale_color_manual(legend_title, values = c("gray40", "darkorange", "darkgreen")) +
  scale_fill_manual(guide = FALSE, values = c("gray40", "darkorange", "darkgreen")) +
  facet_grid(treatment~.) # remove title for shading colors...
g
```


*********************************************
*********************************************
*********************************************


## **General Hatching Process**

*ONSET*
```{r}
library(dplyr)
onset = photopattern %>%
  group_by(clutch, treatment) %>%
  filter(percentage != 0) %>% # get first hatching
  filter(percentage == min(percentage) & age == min(age)) %>% #get only first hatching per clutch and teratment
  dplyr::select(clutch, treatment, age, time, percentage) %>%
  arrange(clutch, treatment)
onset


onset_summary = onset %>%
  group_by(treatment) %>%
  summarise(mean_onset_age = mean(age), SE= sd(age)/sqrt(length(age)), sample_size=n())
onset_summary

ggplot(data=onset, aes(x=treatment, y=age)) +geom_boxplot()
```

> The onset of hatching for embryos in the dark, light, and photoperiod treatments was 5.52, 5.46, and 5.29 days respectively.

Stats:
```{r}
m=lm(data=onset, age~treatment)
summary(m)

library(car)
Anova(m)
aov(m)
summary(aov(m))

#post hoc
TukeyHSD(aov(m))
```
* F value == 0.708;  df= 2,30
* overall treatment **p-value** == 0.501 **NS**

* light-dark **p-value** == 0.9619829 **NS**
* photo-dark **p-value** == 0.4992539 **NS**
* photo-light **p-value** == 0.6619119 **NS**

Check for normality
```{r}
plot(fitted(m), residuals(m))
hist(residuals(m))
qqnorm(residuals(m))
shapiro.test(m$residuals)
```
* shapiro test **p-value** == 0.01511

** NOT normally dist**

non-parametric tests
```{r}
library(FSA)
kwt=kruskal.test(data=onset, age~as.factor(treatment))
kwt

dunnTest(data=onset, age~as.factor(treatment), method = "bh")
```
* X^2 == 2.3325
* overall treatment **p-value** == 0.3115 **NS**

* light-dark **p-value** == 0.4450934 **NS**
* photo-dark **p-value** == 0.3801031 **NS**
* photo-light **p-value** == 0.6676400 **NS**

> interpretation: onset of hatching is not different between treatments


*FIFTY PERCENT*
```{r}
library(dplyr)
fiftypercent = photopattern %>%
  group_by(clutch, treatment) %>%
  filter(percentage >= .50) %>% # get first hatching
  dplyr::select(clutch, treatment, age, time, percentage) %>%
  arrange(clutch, treatment) %>%
  filter(percentage == min(percentage) & age == min(age)) #get only first hatching per clutch and teratment
fiftypercent


fiftypercent_summary = fiftypercent %>%
  group_by(treatment) %>%
  summarise(mean_fiftypercent_age = mean(age), SE= sd(age)/sqrt(length(age)), sample_size=n())
fiftypercent_summary

```

>  Embryos in the dark, light, and photoperiod treatments reached 50% hatching completion at  5.72, 5.87, and 5.73 days respectively.

Stats:
```{r}
m=lm(data=fiftypercent, age~treatment)
summary(m)

library(car)
Anova(m)
aov(m)
summary(aov(m))

#post hoc
TukeyHSD(aov(m))
```
* F value == 0.922;  df= 2,30
* overall treatment **p-value** == 0.409 **NS**

* light-dark **p-value** == 0.4436152 **NS**
* photo-dark **p-value** == 0.9916588 **NS**
* photo-light **p-value** == 0.5157742 **NS**

Check for normality
```{r}
plot(fitted(m), residuals(m))
hist(residuals(m))
qqnorm(residuals(m))
shapiro.test(m$residuals)
```

* shapiro test **p-value** == 0.04747

** NOT normally dist**

non-parametric tests
```{r}
library(FSA)
kwt=kruskal.test(data=fiftypercent, age~as.factor(treatment))
kwt

dunnTest(data=fiftypercent, age~as.factor(treatment), method = "bh")
```
* X^2 == 0.93053
* overall treatment **p-value** == 0.628 **NS**

* light-dark **p-value** == 0.8756973 **NS**
* photo-dark **p-value** == 0.6791092 **NS**
* photo-light **p-value** == 1.0000000 **NS**

> interpretation: embryos all reach 50% at the same age


*COMPLETION*
```{r}
library(dplyr)
completion = photopattern %>%
  group_by(clutch, treatment) %>%
  filter(percentage == 1) %>% # get first hatching
  filter(percentage == min(percentage) & age == min(age)) %>% #get only first hatching per clutch and teratment
  dplyr::select(clutch, treatment, age, time, percentage) %>%
  arrange(clutch, treatment)
completion


completion_summary = completion %>%
  group_by(treatment) %>%
  summarise(mean_completion_age = mean(age), SE= sd(age)/sqrt(length(age)), sample_size=n())
completion_summary

ggplot(data=completion, aes(x=treatment, y=age)) +geom_boxplot()
```

>  Embryos in the dark, light, and photoperiod treatments reached hatching completion at  6.06, 6.47, and 5.91 days respectively.

Stats:
```{r}
m=lm(data=completion, age~treatment)
summary(m)

library(car)
Anova(m)
aov(m)
summary(aov(m))

#post hoc
TukeyHSD(aov(m))
```
* F value == 5.6635;  df= 2,30
* overall treatment **p-value** == 0.008191 ***

* light-dark **p-value** == 0.0609168 **NS**
* photo-dark **p-value** == 0.6573500 **NS**
* photo-light **p-value** == 0.0077221 ***

Check for normality
```{r}
plot(fitted(m), residuals(m))
hist(residuals(m))
qqnorm(residuals(m))
shapiro.test(m$residuals)
```
* shapiro test **p-value** == 0.03655

** NOT normally dist**

non-parametric tests
```{r}
library(FSA)
kwt=kruskal.test(data=completion, age~as.factor(treatment))
kwt

dunnTest(data=completion, age~as.factor(treatment), method = "bh")
```
* X^2 == 11.133
* overall treatment **p-value** == 0.003824 **NS**

* light-dark **p-value** == 0.055745483 **NS**
* photo-dark **p-value** == 0.224474574 **NS**
* photo-light **p-value** == 0.002914246 ***

> interpretation: embryos in photoperiod reach hatching completion before those in light treatment but not differently than those in dark.


*********************************************
*********************************************
*********************************************


## **PEAK HATCHING**

Take data from time points 34 and 38 before and after 36 when lights when off to see peak H after onset of darkness...
```{r}
library(dplyr)
peak = photopattern %>%
  filter(time == 34 | time == 38)
peak
```

Now find the min and max prop hatched at time point 38 and 44
```{r}
minmax_peak= peak %>%
  group_by(clutch, treatment) %>%
  summarise(min=min(percentage), max=max(percentage)) %>%
  mutate(diff = max - min)
minmax_peak
```

### Stats:
```{r}
m=lm(data=minmax_peak, diff~treatment)
summary(m)

library(car)
Anova(m)
aov(m)
summary(aov(m))

#post hoc
TukeyHSD(aov(m))
```
* F value == 20.596;  df= 2,30
* overall treatment **p-value** == 1.86e-05 ***

* light-dark **p-value** == 0.7229111 **NS**
* photo-dark **p-value** == 0.0000054
* photo-light **p-value** == 0.0000467

> interpretation: Here we see that treatment has an effect on the diff b/t proportion hatched before and after darkness; specifically, the photo treatment was effected very strongly in proportion hatched compared to other treatments by the onset of darkness

Check for normality
```{r}
plot(fitted(m), residuals(m))
hist(residuals(m))
qqnorm(residuals(m))
shapiro.test(m$residuals)
```
* shapiro test **p-value** == 0.05205

***NORMALLy DIST** so stats are good!

### Visualization
```{r}
mpeak = minmax_peak %>%
  group_by(treatment) %>%
  summarise(mean_diff= mean(diff), SE= sd(diff)/sqrt(length(diff)), sample_size=n())
mpeak
```

```{r}
library(ggplot2)
legend_title="Light \ntreatment" #make legend title here

g = ggplot(data=mpeak, aes(x= treatment, y= mean_diff)) + 
  geom_point(aes(color= treatment), size= 3) + 
  geom_errorbar(aes(ymin=mean_diff-SE, ymax=mean_diff+SE, color = treatment), width= .4, size= 1) +
  labs(y="Proportion hatched\nbetween 5.67-5.83 days", x="Light treatment") +
  scale_y_continuous(limits=c(0,0.85), oob = rescale_none, breaks= pretty_breaks(n=4)) +
  scale_x_discrete(limits= c("dark", "light", "photoperiod")) + # put the x axis stuff in order that I want.
  scale_color_manual(legend_title,values = c("black", "darkorange", "darkgreen")) +
  geom_text(aes(y = c(0.23, 0.3, 0.8), label = c("a", "a", "b")), position = position_dodge(width = .8), color= "black", vjust = 0, size = 7) +
  theme_classic(base_size = 21) +
  theme(panel.background = element_rect(colour = "black", size=1), legend.position="none", axis.text.x = element_text(color="black"), axis.text.y = element_text(color="black"))

g
```


********************************
********************************
********************************

********************************
********************************
********************************

## **SYNCHRONY WITHIN clutches** Analysis

> here lets find the max proportion hatched (max difference between any two time points) between any 2 time points...

```{r}
synchrony = photopattern %>%
  group_by(treatment) %>%
  filter(time == 36 | time == 38)
synchrony

synch_diff= synchrony %>%
  group_by(treatment, clutch) %>%
  summarise(mean= mean(percentage), min=min(percentage), max=max(percentage)) %>%
  mutate(diff = max - min)
synch_diff

synch_sum = synch_diff %>%
  group_by(treatment) %>%
  summarise(mean_diff= mean(diff), SE= sd(diff)/sqrt(length(diff)), sample_size=n())
synch_sum
```

* dark: 40-->42  ---------- 0.29870130
* light: 30-->32 ---------- 0.140259740
* photo: 36-->38 ---------- 0.52987013

> now lets just get that data

```{r}
synch = photopattern %>%
  group_by(treatment) %>%
  filter(time == 40 & treatment == "dark" | time == 42 & treatment == "dark" | time == 30 & treatment == "light" | time == 32 & treatment == "light" | time == 36 & treatment == "photoperiod" | time == 38 & treatment == "photoperiod")
synch

synch_diff= synch %>%
  group_by(treatment, clutch) %>%
  summarise(mean= mean(percentage), min=min(percentage), max=max(percentage)) %>%
  mutate(diff = max - min)
synch_diff

synch_sum = synch_diff %>%
  group_by(treatment) %>%
  summarise(mean_diff= mean(diff), SE= sd(diff)/sqrt(length(diff)), sample_size=n())
synch_sum
```

### Stats:
```{r}
m=lm(data=synch_diff, diff~treatment)
summary(m)

library(car)
Anova(m)
aov(m)
summary(aov(m))

#post hoc
TukeyHSD(aov(m))
```
* *F value* == 5.6252; df== 2,30
* overall treatment **p-value** == 0.008422 **

* light-dark **p-value** == 0.3762132 **NS**
* photo-dark **p-value** == 0.1349544 **NS**
* photo-light **p-value** == 0.0062725 ***

> interpretation: Here we see that treatment has an effect on the max proportion hatched in any given 2 hour time period. The photo treatment was more synchronous than the light treatment, but not than the dark treatment

Check for normality
```{r}
plot(fitted(m), residuals(m))
hist(residuals(m))
qqnorm(residuals(m))
shapiro.test(m$residuals)
```
* shapiro test **p-value** == 0.6777

***NORMALLy DIST** so stats are good!


### Visualization

```{r}
library(ggplot2)
library(scales)

legend_title="Light \ntreatment" #make legend title here

g = ggplot(data=synch_sum, aes(x= treatment, y= mean_diff)) + 
  geom_point(aes(color= treatment), size= 3) + 
  geom_errorbar(aes(ymin=mean_diff-SE, ymax=mean_diff+SE, color = treatment), width= .4, size= 1) +
  labs(y="Max proportion hatched\nin any 2 hour period", x="Light treatment") +
  scale_y_continuous(limits=c(0,0.8),oob = rescale_none, breaks= pretty_breaks(n=4)) +
  scale_x_discrete(limits= c("dark", "light", "photoperiod")) + # put the x axis stuff in order that I want.
  scale_color_manual(legend_title,values = c("black", "darkorange", "darkgreen")) +
  geom_text(aes(y = c(0.42, 0.21, 0.64), label = c("ab", "a", "b")), position = position_dodge(width = .8), color= "black", vjust = 0, size = 7) +
  theme_classic(base_size = 21) +
  theme(panel.background = element_rect(colour = "black", size=1), legend.position = "none", axis.text.x = element_text(color="black"), axis.text.y = element_text(color="black"))
g
```


********************************
********************************
********************************
********************************
********************************
********************************


********************************
********************************
********************************
********************************
********************************
********************************


## **SYNCHRONY ACROSS CLUTCHES** TRIAL
```{r}
library(dplyr)
df = photopattern %>%
  group_by(clutch, treatment) %>%
  mutate(twohour = percentage - lag(percentage, n= 2,  default = 0), maximum=max(twohour)) %>% #this creates new column with differences of lagged percentages by clutch for four hour periods
  filter(twohour == maximum) %>% # include only the maximum values
  arrange(clutch) %>%
  dplyr::select(clutch, treatment, age, percentage,twohour, maximum) %>%
  filter(age==min(age)) # leave initial ages with max prop hatching in 4 hours
df
```

###Visualization
```{r}
library(ggplot2)
library(scales)

legend_title="Treatment" #make legend title here

g= ggplot(data=df, aes(x=age, fill=treatment)) +
  geom_histogram(position= "dodge")+
  labs(y="Number of clutches", x="Age (days)") +
  scale_y_continuous(limits=c(0,11), expand = c(0, 0), oob = rescale_none, breaks= pretty_breaks(n=5)) + # expand thing forces it to start at 0
  scale_x_continuous(limits=c(5,6.25), breaks= pretty_breaks(n=4)) +
  scale_fill_manual(legend_title,values = c("black", "darkorange", "darkgreen")) +
  theme_classic(base_size = 21) +
  theme(panel.background = element_rect(colour = "black", size=1), axis.text.x = element_text(color="black"), axis.text.y = element_text(color="black")) # this creates border around whole thing
g
```


*******
******
******

###Stats: Kurtosis value for dist. of ages of maximum hatching in 4h period for each treatment
```{r}
library(e1071)
library(moments)
dark_synch= df %>%
  filter(treatment == "dark")
dark_synch

light_synch= df %>%
  filter(treatment == "light")
light_synch

photo_synch= df %>%
  filter(treatment == "photoperiod")
photo_synch

kurtosis(dark_synch$age)
kurtosis(light_synch$age)
kurtosis(photo_synch$age)
```
* dark == 3.003346
* light == 3.232073
* photoperiod == 9.1

###Levene Test for Homogeneity of Variance
```{r}
dark_light = df %>%
  filter(treatment == "light" | treatment == "dark")
dark_light

photo_dark = df %>%
  filter(treatment == "photoperiod" | treatment == "dark")
photo_dark

photo_light = df %>%
  filter(treatment == "photoperiod" | treatment == "light")
photo_light

#overall:
leveneTest(data=df, age~treatment)
#dark vs light
leveneTest(data=dark_light, age~treatment)
#photo vs dark
leveneTest(data=photo_dark, age~treatment)
#photo vs light
leveneTest(data=photo_light, age~treatment)
```
* *F value* == 4.6012; df== 2,30
* overall treatment **p-value** == 0.01808 **

* light-dark *F value* == 0.093; df== 1,20 
 **p-value** == 0.7635 **NS**

* photo-dark *F value* == 7.0203; df== 1,20 
 **p-value** == 0.01538 **

* photo-light *F value* == 14.76; df== 1,20 
 **p-value** == 0.001018 **NS**


********************************
********************************
********************************
********************************
********************************
********************************


********************************
********************************
********************************
********************************
********************************
********************************
********************************


## **EARLY HATCHING in DARK treatment** 
```{r}
head(photopattern)
early = photopattern %>%
  group_by(treatment) %>%
  filter(age == 5.75)
early

test1 = early %>%
  group_by(treatment) %>%
  summarise(mean= mean(percentage), SE= sd(percentage)/sqrt(length(percentage)), sample_size=n())
test1
```

Stats
```{r}
m=lm(data=early, percentage~treatment)
summary(m)

library(car)
Anova(m)
aov(m)
summary(aov(m))

#post hoc
TukeyHSD(aov(m))
```
* F value == 0.0019; df= 2,30
* overall treatment **p-value** == 0.9981 **NS**

* light-dark **p-value** == 0.9979927 **NS**
* photo-dark **p-value** == 0.9998360 **NS**
* photo-light **p-value** == 0.9989753 **NS**

> interpretation: **proportion** hatched after at 5.75 days old; same in all treatments.... dark does not elicit early hatching...

Check for normality
```{r}
plot(fitted(m), residuals(m))
hist(residuals(m))
qqnorm(residuals(m))
shapiro.test(m$residuals)
```
* shapiro test **p-value** == 0.01537

***NOT normally distributed** so stats are NOT good....

```{r}
library(FSA)
kwt=kruskal.test(data=early, percentage~as.factor(treatment))
kwt

dunnTest(data=early, percentage~as.factor(treatment), method = "bh")
```
* X^2 == 0.096347
* overall treatment **p-value** == 0.953 **NS**

* light-dark **p-value** == 1.0000000 **NS**
* photo-dark **p-value** == 0.8766644 **NS**
* photo-light **p-value** == 1.0000000 **NS**

> interpretation: prop hatched at 5.75 days old not different at all.... darkness does not induce early hatching


********************************
********************************
********************************

********************************
********************************
********************************


## **HATCHING after day 6 (LATE hatching in LIGHT treatment)** 

```{r}
head(photopattern)
delay = photopattern %>%
  group_by(treatment) %>%
  filter(time == 42) %>%
  mutate(prop_hatched=1-percentage)
delay

test = delay %>%
  group_by(treatment) %>%
  summarise(mean= mean(prop_hatched), SE= sd(prop_hatched)/sqrt(length(prop_hatched)), sample_size=n())
test
```

Stats
```{r}
m=lm(data=delay, prop_hatched~treatment)
summary(m)

library(car)
Anova(m)
aov(m)
summary(aov(m))

#post hoc
TukeyHSD(aov(m))
```
* F value == 6.9731; df==2,30
* overall treatment **p-value** == 0.003258 **

* light-dark **p-value** == 0.0291367 *
* photo-dark **p-value** == 0.6589382 **NS**
* photo-light **p-value** == 0.0033054 **

> interpretation: **proportion** hatched after 6 days old; more embryos hatched after 6 d in light than in other treatments

Check for normality
```{r}
plot(fitted(m), residuals(m))
hist(residuals(m))
qqnorm(residuals(m))
shapiro.test(m$residuals)
```
* shapiro test **p-value** == 0.001358

***NOT normally distributed** so stats are NOT good....

```{r}
library(FSA)
kwt=kruskal.test(data=delay, prop_hatched~as.factor(treatment))
kwt

dunnTest(data=delay, prop_hatched~as.factor(treatment), method = "bh")
```
* X^2 == 11.476
* overall treatment **p-value** == 0.003222 **

* light-dark **p-value** == 0.027008223 *
* photo-dark **p-value** == 0.359018578 **NS**
* photo-light **p-value** == 0.003084079 **

> interpretation: **number of embryos** hatched after 6 days old; more embryos hatched after 6 d in light than in other treatments

### Visualization

```{r}
legend_title="Light \ntreatment" #make legend title here

g = ggplot(data=test, aes(x= treatment, y= mean)) + 
  geom_point(aes(color= treatment), size= 3) + 
  geom_errorbar(aes(ymin=mean-SE, ymax=mean+SE, color = treatment), width= .4, size= 1) +
  labs(y="Proportion hatched\nafter age 6.0 days", x="Light treatment") +
  scale_y_continuous(limits=c(0,0.6),oob = rescale_none, breaks= pretty_breaks(n=4)) +
  scale_x_discrete(limits= c("dark", "light", "photoperiod")) + # put the x axis stuff in order that I want.
  scale_color_manual(legend_title,values = c("black", "darkorange", "darkgreen")) +
  geom_text(aes(y = c(.17, .45, .07), label = c("a", "b", "a")), position = position_dodge(width = .8), color= "black", vjust = 0, size = 7) +
  theme_classic(base_size = 21) +
  theme(panel.background = element_rect(colour = "black", size=1), legend.position = "none", axis.text.x = element_text(color="black"), axis.text.y = element_text(color="black"))
g
```

*************************************************************************************
*************************************************************************************
*************************************************************************************
*************************************************************************************
*************************************************************************************
*************************************************************************************
*************************************************************************************


*************************************************************************************
*************************************************************************************
*************************************************************************************
*************************************************************************************
*************************************************************************************
*************************************************************************************


# **Hatching Orientation Experiment Data Analysis**

## Data importing

First, we can load this straight from my computer as follows...
```{r}
#library(readxl)
#phototaxis <- read_excel("/Volumes/EggScience 1/Warkentin Lab/2015 Data/Hatching Phototaxis Data/Hatching Orientation Data.xlsx", sheet = "phototaxis")

#str(phototaxis) #structure of the data; column names, etc.
#head(phototaxis) #preview the data
```

Or, anyone can access this dataset from my personal Github repository as follows:
```{r}
library(curl)
f <- curl("https://raw.githubusercontent.com/bguell/Light-Cue-Manuscript-Final/master/Hatching%20Orientation%20Data.csv")
phototaxis <- read.csv(f, header = TRUE, sep = ",", stringsAsFactors = TRUE)
head(phototaxis)
```

*******

Proportion hatch to light Batch analysis
```{r}
library(readxl)
batch <- read_excel("~/Desktop/batch analysis.xlsx")
```

```{r}
library(dplyr)
library(ggplot2)
library(scales)
library(lme4)
library(car)
library(multcomp)

batch

#the models:
binom1=glmer(data=batch, proportion ~ experiment + (1|batch), family="binomial", weights = total)
summary(binom1)
Anova(binom1)
#post hoc
summary(glht(binom1, linfct=mcp(experiment="Tukey")))

# make a second, simpler model without our variable of interest:
binom2=glm(data=batch, proportion ~ batch, family="binomial", weights = total)
summary(binom2)

binom3=glm(data=batch, proportion ~ experiment, family="binomial", weights = total)
summary(binom3)
Anova(binom3)


anova(binom1, binom2, test = "Chisq")
anova(binom1, binom3, test = "Chisq")
```


```{r}
g=ggplot(batch, aes(x=experiment, y=proportion)) + #add the axis
  geom_boxplot(aes(group=experiment, fill=experiment)) +
  scale_y_continuous(limit= c(.4,1), breaks= pretty_breaks(n=6)) + # allows for making "pretty" breaks on y axis... shows 5 breaks
  labs(y="Proportion embryos\nhatched towards light", x="Experiment") + #creat labels... 
  theme_classic(base_size=20) +
  theme(panel.background = element_rect(colour = "black", size=1), legend.position="right", axis.text.x = element_text(color="black"), axis.text.y = element_text(color="black")) +
    scale_fill_manual(guide = FALSE, values = c("gray40", "darkorange")) 
g
```


*****

## 4d Hatching Complications Analysis

### Open water whole clutch vs natural surface area extremes 25% and 50%

```{r}
prop.test(x=182, n=189, p=.025)
```
* **X^2** == 6783.2
* **p-value**  < 2.2e-16

```{r}
prop.test(x=182, n=189, p=.050)
```
* **X^2** == 3297.3
* **p-value**  < 2.2e-16


> Question: Should this test be one sided to test if its GREATER than the .034 proportion....???? It doesnt change the significant since they are so different, just a formality of the tests we are running... I'll show both below:

### Jelly whole clutch vs open water individual cup
Two Sided:
```{r}
prop.test(x=6,n=21,p=.032)
```
* **X^2** == 35.834
* **p-value** == 2.149e-09


One Sided:
```{r}
prop.test(x=6,n=21,p=.032, alternative = "greater")
```
* **X^2** == 33.21
* **p-value** == 4.136e-09

*****

## Phototaxis Analysis

> Note: Here, for all tests, we are testing the null hypothesis that the proportions of hatching direction was random (i.e. 50% to each side)

Control experiment
```{r}
prop.test(9,20)
```
* **X^2** == 0.05
* **p-value** == 0.8231

Iteration 1: First set of trials:
```{r}
prop.test(30,45)
```
* **X^2** == 4.3556
* **p-value** == 0.03689

Iteration 2: Second set of trials:
```{r}
prop.test(29,40)
```
* **X^2** == 7.225
* **p-value** == 0.00719

Comparison of two trials:
```{r}
prop.test(x=c(29,40), n=c(30,45))
```
* **X^2** == 0.61141
* **p-value** == 0.4343

Since the two sets are data are not significantly different from each other, I've pooled the data and run the test on the whole combined set...

Pooled analysis:
```{r}
prop.test(59,85)
```
* **X^2** == 12.047
* **p-value** == 0.0005187

*****

## GLMMs

> I created these following models using a GLMM where the "hatching direction" data is the response variable and the side of insertion data as the predictor variable. I also added clutch as a random effect in these models. The models will be run using a **binomial family distribution**, and all follow a **logit link function** of correction. Again we will obtain a p-value using LRTs.

```{r}
#some data adjustment so model can run properly
phototaxis$Hatching.direction = as.factor(phototaxis$Hatching.direction)
phototaxis$test.number = as.character(phototaxis$test.number)

#the models:
p1=glmer(Hatching.direction ~  Position.Entered + (1|clutch), data=phototaxis, family="binomial")
summary(p1)

# make a second, simpler model without our variable of interest:
p2=glmer(Hatching.direction ~ (1|clutch), data=phototaxis, family="binomial")
summary(p2)
```

### Model comparison using **Likelihood Ratio Tests (LRT)**

### GLM EFFECT OF INSERTION SIDE ON HATCHING DIRECTION
```{r}
anova(p2, p1)
```

* **p-value** == 0.4068 **NS**

Here we can clearly see that there was no significant effect of position entered on hatching direction

Again, we can use the Anova() function for Wald chi-square tests with one function...
```{r}
library(car)
Anova(p1)
```
* **p-value** == 0.4159 **NS**

And we see the same exact results. Thus we will **report results from the LRT** in our manuscript.


*************************
*************************
*************************


### Visualization of Hatching Orientation Experiment

First a simple barplot
```{r}
library(dplyr) #allows for data manipulation
l= phototaxis %>% 
  group_by(experiment, Hatching.direction) %>% #group hatching direction  data by experiment and hatching direction colum to summarize by data!
  summarise(count= n()) # sample size of experiment and hatching direction!
l
```

```{r}
legend_title="Hatching \ndirection" #make legend title here

g = ggplot(data=l, aes(x= experiment, y= count, fill=Hatching.direction)) + #use whole data of yes #1 hatched
  geom_bar(stat="identity", position= "dodge", width = .8) + #group it by cue type with color
  labs(y="Number of embryos", x="Experiment") +
  scale_y_continuous(limits=c(0,62),oob = rescale_none, breaks= pretty_breaks(n=4)) +
  scale_fill_manual(legend_title,values = c("black", "gray")) +
  geom_text(aes(y = c(13, 11, 28, 61), label = c("a", "a", "b", "c")), position = position_dodge(width = .8), color= "black", vjust = 0, size = 8) +
  geom_text(aes(y= c(1, 1, 1, 1), label= l$count), position = position_dodge(width = .8), color= "white", vjust = 0, size = 4) + 
  theme_gray(base_size = 22)
g
```