---
title: "Heteroscedasticity"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Heteroscedasticity}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, echo=FALSE, message=FALSE}
library(olsrr)
library(ggplot2)
library(gridExtra)
library(nortest)
library(goftest)
```

# Introduction

One of the assumptions made about residuals/errors in OLS regression is that the errors have the
same but unknown variance. This is known as constant variance or homoscedasticity. When this 
assumption is violated, the problem is known as heteroscedasticity.

##### Consequences of Heteroscedasticity

- The OLS estimators and regression predictions based on them remains unbiased and consistent.
- The OLS estimators are no longer the BLUE (Best Linear Unbiased Estimators) because they are no longer efficient, so the regression predictions will be inefficient too.
- Because of the inconsistency of the covariance matrix of the estimated regression coefficients, the tests of hypotheses, (t-test, F-test) are no longer valid.

**olsrr** provides the following 4 tests for detecting heteroscedasticity:

- Bartlett Test
- Breusch Pagan Test
- Score Test
- F Test

## Bartlett Test

Bartlett's test is used to test if variances across samples is equal. It is sensitive to 
departures from normality. The Levene test is an alternative test that is less sensitive 
to departures from normality.

You can perform the test using 2 continuous variables, one continuous and one grouping variable,
a formula or a linear model. 

#### Use grouping variable
```{r bartlett1}
ols_test_bartlett(hsb, 'read', group_var = 'female')
```

#### Using variables
```{r bartlett2}
ols_test_bartlett(hsb, 'read', 'write')
```


## Breusch Pagan Test

Breusch Pagan Test was introduced by Trevor Breusch and Adrian Pagan in 1979. It is used to
test for heteroskedasticity in a linear regression model and assumes that the error terms are 
normally distributed. It tests whether the variance of the errors from a regression is dependent 
on the values of the independent variables. It is a $\chi^{2}$ test.

You can perform the test using the fitted values of the model, the predictors in the model and a
subset of the independent variables. It includes options to perform multiple tests and p value adjustments. 
The options for p value adjustments include Bonferroni, Sidak and Holm's method.

#### Use fitted values of the model

```{r bp1}
model <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)
ols_test_breusch_pagan(model)
```

#### Use independent variables of the model

```{r bp2}
model <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)
ols_test_breusch_pagan(model, rhs = TRUE)
```

#### Use independent variables of the model and perform multiple tests

```{r bp3}
model <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)
ols_test_breusch_pagan(model, rhs = TRUE, multiple = TRUE)
```

#### Bonferroni p value Adjustment

```{r bp4}
model <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)
ols_test_breusch_pagan(model, rhs = TRUE, multiple = TRUE, p.adj = 'bonferroni')
```

#### Sidak p value Adjustment

```{r bp5}
model <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)
ols_test_breusch_pagan(model, rhs = TRUE, multiple = TRUE, p.adj = 'sidak')
```

#### Holm's p value Adjustment

```{r bp6}
model <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)
ols_test_breusch_pagan(model, rhs = TRUE, multiple = TRUE, p.adj = 'holm')
```

## Score Test

Test for heteroskedasticity under the assumption that the errors are independent and identically distributed (i.i.d.). You can perform the test using the fitted values of the model, the predictors in the model and a
subset of the independent variables.

#### Use fitted values of the model

```{r score1}
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
ols_test_score(model)
```

#### Use independent variables of the model

```{r score2}
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
ols_test_score(model, rhs = TRUE)
```

#### Specify variables

```{r score3}
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
ols_test_score(model, vars = c('disp', 'hp'))
```

## F Test

F Test for heteroskedasticity under the assumption that the errors are independent and identically distributed (i.i.d.). You can perform the test using the fitted values of the model, the predictors in the model and a
subset of the independent variables.

#### Use fitted values of the model

```{r ftest1}
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
ols_test_f(model)
```

#### Use independent variables of the model

```{r ftest2}
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
ols_test_f(model, rhs = TRUE)
```

#### Specify variables

```{r ftest3}
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
ols_test_f(model, vars = c('disp', 'hp'))
```