-
Notifications
You must be signed in to change notification settings - Fork 1
/
02.Regression.Rmd
50 lines (46 loc) · 1.29 KB
/
02.Regression.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
---
title: "Regression in R"
output:
html_document: default
html_notebook: default
---
# Introduction
Regression is one kind of supervised learning to predict the target based on the input features which the target value is continuous, such as stock price, sales figures, GDP, etc.
```{r}
library(mlbench)
```
### Load Dataset
We will use Boston house dataset.
```{r}
data("BostonHousing")
str(BostonHousing)
summary(BostonHousing)
apply(BostonHousing, 2, var)
```
## Build a linear regression model
The objective of linear regression is to find the weights (coefficients) that minimise the residual sum of square (RSS) of the linear equation.
$$ y = f(W, X) = w_{0} + w_{1}X_{1} + ... + w_{p}X_{p} $$
```{r}
model <- lm(medv ~ ., data=BostonHousing)
summary(model)
```
### Performance evaluation - RMSE
```{r}
rmse <- sqrt(mean(model$residuals ^ 2))
rmse
```
### Performance evaluation - R-squared
R-squared interprets the percentage the model explains the target variance. The closer to 1, the better fit.
```{r}
SS_res <- sum((model$residuals) ^ 2)
SS_tot <- sum((BostonHousing$medv - mean(BostonHousing$medv)) ^ 2)
R_squared <- 1 - SS_res / SS_tot
R_squared
```
### Check the residual against the fitted value
```{r}
plot(model$fitted.values, model$residuals, col="red")
```
```{r}
qqnorm(model$residuals)
```