---
title: "Tiger Prey Equation"
author: "Nilanjan Chatterjee"
date: "15th March, 2023"
output:
  pdf_document: default
  html_notebook: default
  html_document:
    df_print: paged
---

The  `lmtest` and `sandwich` package is used for testing heteroskedasticity in the equation and transform variables to remove heteroskedasticity

```{r, warning=FALSE, message=FALSE}
library(lmtest)
library(sandwich)
library(dplyr)
```

## Load the data and check 
```{r}
density_dat <-read.csv("D:/Papers/tiger density India/Submission/Tiger_prey_density_data.csv",header=TRUE)

```

`log.chital_biomass..chital_biomass_prop`, `log.sambar_biomass..sambar_biomass_prop` these stands for individual species density*proportion of species biomass where,   

$$ sp_1 biomass proportion = sp_1 density.sp_1 wt/ sp_1 density.sp_1 wt + sp_2 density.sp_2 wt +...+ sp_n density.sp_n wt $$
## Data preparation

We calculated individual *species biomass* columns using the log-transformation. We substituted the `NA` values with 0 as $log(0)$ would give the output of `NA`.   
Later, we multiplied the *log_species_biomass* column with the *species_biomass_prop* to derive the final columns to be used in the regression equation. 
```{r}
density_dat <-density_dat |>
  mutate(log_chital_bio = ifelse(!is.finite(log10(chital_biomass)),0, log10(chital_biomass)),
         log_sambar_bio = ifelse(!is.finite(log10(sambar_biomass)),0, log10(sambar_biomass)),
         log_langur_bio = ifelse(!is.finite(log10(langur_biomass)),0, log10(langur_biomass)),
         log_barkingdeer_bio = ifelse(!is.finite(log10(barking_deer_biomass)),0, log10(barking_deer_biomass)),
         log_wildpig_bio = ifelse(!is.finite(log10(wild_pig_biomass)),0, log10(wild_pig_biomass)),
         log_nilgai_bio = ifelse(!is.finite(log10(nilgai_biomass)),0, log10(nilgai_biomass)),
         log_gaur_bio = ifelse(!is.finite(log10(gaur_biomass)),0, log10(gaur_biomass)))|>
  mutate(chital_bio_prop = log_chital_bio*chital_biomass_prop,
         sambar_bio_prop = log_sambar_bio*sambar_biomass_prop,
         langur_bio_prop = log_langur_bio*langur_biomass_prop,
         gaur_bio_prop = log_gaur_bio*gaur_biomass_prop,
         nilgai_bio_prop = log_nilgai_bio*nilgai_biomass_prop,
         wildpig_bio_prop = log_wildpig_bio*wild_pig_biomass_prop,
         barkingdeer_bio_prop = log_barkingdeer_bio*barking_deer_biomass_prop)

head(density_dat)
```


## Regression

Ordinary linear regression equation for species-wise prey-predator equation

```{r}
aleqt1 <-lm(log.tiger_density. ~ chital_bio_prop + sambar_bio_prop + 
                                 langur_bio_prop + gaur_bio_prop +  nilgai_bio_prop + 
                                 wildpig_bio_prop + barkingdeer_bio_prop, data = density_dat)

```

### Checking for heteroskedasticity

Conducting Breusch-Pagan test for heteroskedasticity and estimating the heteroskedasticity-consistent covariane matrix.
Comapring the standard error of the equation with the robust standard errors reflect which variable has the major issue with variance. There are multiple options for the covariance matrix, this HC4m was recommended by Cribari-Neto & Da Silva 2011.
```{r}
summary(aleqt1)
coeftest(aleqt1, vcov = vcovHC(aleqt1, "HC4m"))
bptest(aleqt1)
```

### Plotting

Plotting the residuals vs Fitted and the modulus of the residuals for each of the explanatory variable. 
We detected a decrease in the residuals of the chital variable as the value increases. This was supposed to be a source of heteroskedascity and we did transformation to remove that
```{r}
attach(density_dat)
par(mfrow=c(3,2))
par(mar=c(4,4,1,1))
plot(aleqt1$fitted.values, aleqt1$residuals, xlab="fitted", ylab="residuals", pch=19)
plot(chital_bio_prop, abs(aleqt1$residuals), xlab="Chital", ylab="residuals", pch=19)
plot(sambar_bio_prop, abs(aleqt1$residuals), xlab="Sambar", ylab="residuals", pch=19)
plot(wildpig_bio_prop, abs(aleqt1$residuals), xlab="Wild-pig", ylab="residuals", pch=19)
plot(nilgai_bio_prop, abs(aleqt1$residuals), xlab="Nilgai", ylab="residuals", pch=19)
plot(gaur_bio_prop, abs(aleqt1$residuals), xlab="Gaur", ylab="residuals", pch=19)
```

## Transformation

We did one transformation for converting the heteroskedastic model into a homoskedastic one    
 $$ \sigma_i^2 = \sigma^2/((x_i+1) $$

We multiplied the previous equation with $$\sqrt(x_i+1)$$ for the transformation

### First transformation
We used a square-root transformation first to remove the heteroskedasticity from the variables.

```{r}
x1 <-sqrt(chital_bio_prop+1)
y11 <-log.tiger_density.*x1
x2 <- chital_bio_prop*x1
x3 <- sambar_bio_prop*x1
x4 <- wildpig_bio_prop*x1
x5 <- nilgai_bio_prop*x1
x6 <- gaur_bio_prop*x1
x7 <- langur_bio_prop*x1
x8 <- barkingdeer_bio_prop*x1
```

### New linear regression equation with the transformed variable
```{r}
new_eqn <-lm (y11 ~x1 +x2+x3+x4+x5+x6-1)
summary(new_eqn)
bptest(new_eqn)
```

### Plotting

The residual vs fitted plot and the species wise residuals looks like the following. Here the residuals have significantly less heteroskedasticity compared to the the earlier equation.

```{r}
par(mfrow=c(3,2))
par(mar=c(4,4,1,1))
plot(new_eqn$fitted.values, new_eqn$residuals, xlab="fitted", ylab="residuals", pch=19)
plot(x2, abs(new_eqn$residuals), xlab="Chital", ylab="residuals", pch=19)
plot(x3, abs(new_eqn$residuals), xlab="Sambar", ylab="residuals", pch=19)
plot(x4, abs(new_eqn$residuals), xlab="Wild-pig", ylab="residuals", pch=19)
plot(x5, abs(new_eqn$residuals), xlab="Nilgai", ylab="residuals", pch=19)
plot(x6, abs(new_eqn$residuals), xlab="Gaur", ylab="residuals", pch=19)
```