

# Models and normal distributions {#SamplingDistributions}


```{r, child = if (knitr::is_html_output()) {'./introductions/20-Tools-DistributionsAndModels-HTML.Rmd'} else {'./introductions/20-Tools-DistributionsAndModels-LaTeX.Rmd'}}
```


<!-- Define colours as appropriate -->
```{r, child = if (knitr::is_html_output()) {'./children/coloursHTML.Rmd'} else {'./children/coloursLaTeX.Rmd'}}
```


## Introduction {#DistributionsModelsIntro}

As seen in Chap.\ \@ref(SamplingVariation), many different samples could be drawn from a population, and the value of the statistic varies from sample to sample.
The challenge of research is that only one of these countless possible samples is observed.
The distribution of possible values of the statistic that could be observed from all possible samples is a *sampling distribution*.


::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
Remember: studying a sample leads to the following observations:
\vspace{-2ex}

* Every sample is likely to be different.
* We observe just one of the many possible samples.
* Every sample is likely to yield a different value for the statistic.
* We observe just one of the many possible values for the statistic.
\vspace{-2ex}

Since many values for the statistic are possible, the possible values of the statistic vary (called *sampling variation*) and have a *distribution* (called a *sampling distribution*).
:::


As seen in Chap.\ \@ref(SamplingVariation), sampling distributions often have a *normal distribution* (or bell-shaped distribution).\index{Distributions}\index{Normal distribution}
That is, the normal model is often used to describe the *sampling distribution*.\index{Sampling distribution}
We now study normal distributions, as they appear in many places in research.


## Normal distributions: examples {#DistributionsExample}
\index{Normal distribution!examples for data}

In Chap.\ \@ref(SamplingVariation), we saw that the proportion of odd spins in $15$\ spins of a roulette wheel could vary; similarly, the mean spin from $15$\ spins could vary (Fig.\ \@ref(fig:RouletteWheelHistPropMean)).
In both cases, these sampling distributions had a rough *normal distribution* shape.
This is true for larger numbers of spins also (Figs.\ \@ref(fig:RouletteWheelHist) and\ \@ref(fig:RouletteWheelHistx)).


```{r RouletteWheelHistPropMean, results='hide', fig.width=8.5, fig.height=2.5, fig.cap="Sampling distributions for the proportion of odd spins (left), and the mean of the numbers after $15$ roulette wheel spins (right) are approximate normal distributions. The solid lines are theoretical normal distributions.", fig.align="center", out.width="90%"}
p <- 18/37
spins <- c(15, 25, 100, 200)

se <- sqrt( p * (1 - p) / spins )

num.sims <- 5000

par( mfrow = c(1, 2))

### Spin the wheel spins[1] * num.sims times,
### and grab each sim set from there

xNorm <- seq(0, 1, 
             length = 100)

propOdd <- function(x){
  sum( (x%%2 != 0 ) ) / length(x)
}

set.seed(37945000)
spinNumbersAll <- sample( 0:36, 
                          spins[1] * num.sims, 
                          replace = TRUE)
spinNumbers <- array( spinNumbersAll, 
                      dim = c(spins[1], num.sims) )
sampleP <- apply( spinNumbers, 
                  MARGIN = 2, 
                  FUN = propOdd )

break.list <- seq(0, 1, 
                  by = 1/15) + 1/30 

out <- hist( sampleP,
             breaks = break.list,
             xlim = c(0, 1),
             axes = FALSE,
             col = plot.colour,
             xlab = "Sample proportions", 
             ylab = "",
             main = paste("Proportion odd spins, from\n", spins[1], " spins of the wheel") )
yNorm <- dnorm(xNorm, 
               mean = p, 
               sd = se[1]) 
lines( xNorm,
       (yNorm) / max(yNorm) * max(out$counts),
       col = "black",
       lwd = 2)
axis(side = 1)

points(x = 18/37,
       y = 0,
       pch = 19,
       cex = 0.8)
mtext(text = expression(italic(p)),
      side = 1,
      line = 0,
      at = 18/37,
      padj = 0.5,
      cex = 0.8)


### 

mu <- sum( 0:36 )/ 37
sigma <- sqrt( sum( ( (0:36) - mu )^2 )/37 )

spins <- c(15, 50, 100, 250)

se <- sigma / sqrt( spins )


num.sims <- 5000

### Spin the wheel spins[1] * num.sims times,
### and grab each sim set from there
set.seed(37389457)

xNorm <- seq(0, 37, 
             length = 100)


spinNumbersAll <- sample( 0:36, 
                          spins[1] * num.sims, 
                          replace = TRUE)
spinNumbers <- array( spinNumbersAll, 
                      dim = c(spins[1], num.sims) )
sampleMeans <- colMeans(spinNumbers)
break.list <- seq(0, 37, 
                  by = 1)

out <- hist( sampleMeans,
             breaks = break.list,
             xlim = c(5, 30),
             axes = FALSE,
             col = plot.colour,
             xlab = "Sample means", 
             ylab = "",
             main = paste("Mean number, from\n", spins[1], " spins of the wheel") )
yNorm <- dnorm(xNorm, 
               mean = mu, 
               sd = se[1]) 
lines( xNorm,
       (yNorm) / max(yNorm) * max(out$counts),
       col = "black",
       lwd = 2)
axis(side = 1)

points(x = 18,
       y = 0,
       pch = 19,
       cex = 0.8)
mtext(text = expression(mu),
      side = 1,
      line = 0,
      at = 18,
      padj = 0.5,
      cex = 0.8)
```


The *histograms* in Fig.\ \@ref(fig:RouletteWheelHistPropMean) are based on results from a limited number of simulations.
The solid lines shown in Fig.\ \@ref(fig:RouletteWheelHistPropMean) are actual *normal distributions*, and represent how the histogram might appear theoretically if we used an infinite number of simulations.
The normal distributions are *models* for what might occur in the *population*, so normal distributions are also called *normal models*.
Since the models represent *populations*, the mean of the model is denoted\ $\mu$ and the standard deviation is denoted\ $\sigma$.
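
The link between the simulated histogram and the normal model can be checked with a short simulation.
The following is a minimal sketch in R, not the code used to produce the figure: the seed and the object names are arbitrary choices, and the model standard deviation is taken to be $\sqrt{p(1 - p)/n}$ for $n = 15$ spins.

```r
# A sketch only: the seed and object names are arbitrary choices
set.seed(2024)
num_sims <- 5000   # number of simulated samples
n_spins  <- 15     # spins of the wheel in each sample

# One column per simulated sample of 15 spins (pockets numbered 0 to 36)
spins <- matrix(sample(0:36, n_spins * num_sims, replace = TRUE),
                nrow = n_spins)

# Sample proportion of odd spins in each simulated sample
prop_odd <- colMeans(spins %% 2 != 0)

# Mean and standard deviation of the normal model
p        <- 18 / 37
model_sd <- sqrt(p * (1 - p) / n_spins)

# Compare the simulated values with the model values
c(simulated = mean(prop_odd), model = p)
c(simulated = sd(prop_odd),   model = model_sd)
```

With thousands of simulated samples, the simulated mean and standard deviation should be close to (but not exactly the same as) the model values, just as the histogram is close to (but not exactly the same as) the solid line.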

A *model* is a theoretical or ideal concept.
A model skeleton isn't $100$%\ accurate and certainly not exactly like *your* skeleton; nonetheless, it suitably approximates reality.
Probably none of us has a skeleton *exactly* like the model, but the model is still useful and helpful.
Likewise, a distribution may not have *exactly* a normal shape, but the model is still useful and helpful.
The model is a way of describing a *theoretical* distribution in the population.
A model is a simple (but not overly simple) approximation to reality.
<!-- ; it does not represent any particular sample of data. -->

The histograms of the data in Fig.\ \@ref(fig:RouletteWheelHistPropMean) are not *exactly* normal distributions, but are very close to normal distributions, and certainly close enough for most purposes.
Many, but not all, sampling distributions have approximate normal distributions.

Sampling distributions represent theoretical distributions of sample *statistics*, not the distribution of sample *data*.
When the sampling distribution is a normal distribution, the mean of the distribution is called the *sampling mean* and the standard deviation is called the *standard error*.
<!-- (These values may be *guided* by sample values; e.g., suggesting a mean Leadbeater's possum weight of $1000$\gs based on Fig.\ \@ref(fig:HistogramDBPPossums) (right panel) would be silly.) -->

Apart from modelling theoretical sampling distributions, normal distributions are also useful for some quantitative variables, when the distribution of the data in the *population* can be approximately modelled by a normal distribution.


```{example NormalExamples, name="Normal distributions of data"}
Some quantitative variables have approximate normal distributions.
Figure\ \@ref(fig:HistogramDBPPossums) (left panel) shows the diastolic blood pressure of $398$\ Americans [@data:Willems1997:CHD; @data:Schorling1997:smoking].
Figure\ \@ref(fig:HistogramDBPPossums) (right panel) shows the weight of $83$\ male Leadbeater's possums [@data:Williams2022:Possums].
```


```{r HistogramDBPPossums, fig.align="center", fig.height = 3, fig.width=8.5, out.width='100%', fig.cap="Two normal distributions. Left: diastolic blood pressure of\ $398$ Americans. Right: the weight of\ $83$ male Leadbeater's possums.  The solid lines are the approximate model for the variable in the population."}
par( mfrow = c(1, 2))
data(Diabetes)
out <- hist(Diabetes$DBPfirst,
     col = plot.colour,
     las = 1,
     breaks = seq(40, 130, by = 10),
     ylim = c(0, 120),
     xlim = c(40, 130),
     xlab = "Diastolic blood pressure (in mm Hg)",
     ylab = "Number of people",
     main = "Diastolic blood pressure of Americans")
x <- seq(40, 130, 
         length = 200)
y <- dnorm(x,
           mean = mean(Diabetes$DBPfirst, na.rm = TRUE),
           sd = sd(Diabetes$DBPfirst, na.rm = TRUE) ) 
y <- max(out$counts) * y / max(y)
lines( y ~ x,
       col = "black",
       lwd = 2)

#########

data(Possums)
out <- hist( Possums$Wgt[Possums$Sex=="Male"],
      col = plot.colour,
      las = 1,
      xlab = "Weight (in g)",
      ylab = "Number of possums",
      xlim = c(100, 170),
      ylim = c(0, 35),
      main = "Weight of male Leadbeater's possums")
x <- seq(100, 170, 
         length = 200)
y <- dnorm(x,
           mean = mean(Possums$Wgt[Possums$Sex=="Male"], na.rm = TRUE),
           sd = sd(Possums$Wgt[Possums$Sex=="Male"], na.rm = TRUE) ) 
y <- max(out$counts) * y / max(y)
lines( y ~ x,
       col = "black",
       lwd = 2)
```


<!-- The histogram of the proportion of odd spins in Fig.\ \@ref(fig:RouletteWheelHistPropMean) (left panel) is from one of the countless possible samples of odd spins in $15$ spins. -->
<!-- The histogram of the mean of a set of spins in Fig.\ \@ref(fig:RouletteWheelHistPropMean) (right panel) is from one of the countless possible samples of sets of $15$ spins. -->
<!-- The normal distributions represent the unknown population sampling distributions that could reasonably have produced the histograms of the sample statistics.  -->

<!-- The histograms in Fig.\ \@ref(fig:HistogramDBPPossums) are from one of the countless possible samples of Americans (left panel) or Leadbeater's possums (right panel). -->
<!-- The normal distributions represent the unknown population distributions that could reasonably have produced the sample histograms.  -->


## Normal distributions and the 68--95--99.7 rule {#NormalDistribution}

Normal distributions are bell-shaped, and symmetric about the mean.
Half the values are greater than the mean, and half the values are less than the mean.
The total probability represented by a normal distribution is one (or\ $100$%).
For example, every sample will produce a sample proportion between\ $0$ and\ $1$ and so is represented somewhere in Fig.\ \@ref(fig:RouletteWheelHistPropMean) (left panel).
Similarly, every American has a diastolic blood pressure and so is represented somewhere in Fig.\ \@ref(fig:HistogramDBPPossums) (left panel); every male Leadbeater's possum has a weight and so is represented somewhere in Fig.\ \@ref(fig:HistogramDBPPossums) (right panel).

In theory, no upper or lower limits exist for a variable modelled using a normal distribution.
In practice, this is rarely true, but this seldom presents a problem.
Consider the normal distributions in Fig.\ \@ref(fig:HistogramDBPPossums), for example.
The normal distribution shown for the diastolic blood pressure (left panel) has no lower or upper limit in theory, but all practical values of diastolic blood pressure are captured by that part of the normal distribution shown.
The normal distribution implies almost no-one has a diastolic blood pressure below\ $40\mms$\ Hg or above\ $130\mms$\ Hg. 

One of the most important properties of normal distributions is the *68--95--99.7 rule* (sometimes called the *empirical rule*).


`r if (knitr::is_html_output()) '<!--'`
:::{.definition #EmpiricalRule name="The $68$--$95$--$99.7$ rule"}
`r if (knitr::is_html_output()) '-->'`
`r if (knitr::is_latex_output()) '<!--'`
:::{.definition #EmpiricalRule name="The 68--95--99.7 rule"}
`r if (knitr::is_latex_output()) '-->'`
For any quantity modelled by a normal distribution:\index{68@$68$--$95$--$99.7$ rule}

* *approximately*\ $68$% of values lie within\ $1$ standard deviation of the mean;
* *approximately*\ $95$% of values lie within\ $2$ standard deviations of the mean; and
* *approximately*\ $99.7$% of values lie within\ $3$ standard deviations of the mean.

These properties are true for *all* normal distributions, whatever the quantity, whatever the value of the mean, and whatever the value of the standard deviation (Fig.\ \@ref(fig:EmpiricalRuleDiagram)).
:::
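
The percentages in the rule can be checked numerically.
The sketch below uses the base R function `pnorm()`, which gives the area under a normal distribution to the left of a value; the mean of\ $50$ and standard deviation of\ $10$ in the second part are arbitrary values, chosen only to show that the areas do not depend on the mean or the standard deviation.

```r
k <- 1:3   # number of standard deviations from the mean

# Standard normal distribution (mean 0, standard deviation 1)
areas <- pnorm(k) - pnorm(-k)
round(100 * areas, 1)   # 68.3 95.4 99.7

# Any other normal distribution: arbitrary illustrative values
mu    <- 50
sigma <- 10
areas_other <- pnorm(mu + k * sigma, mean = mu, sd = sigma) -
               pnorm(mu - k * sigma, mean = mu, sd = sigma)

all.equal(areas, areas_other)   # TRUE
```

The computed areas are slightly different to\ $68$% and\ $95$%, which is why the rule says *approximately*.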


```{r EmpiricalRuleDiagram, fig.width=6, fig.height=2.1, out.width='100%', fig.align="center", fig.cap="The $68$--$95$--$99.7$ rule."}
par(mfrow = c(1, 3))

out <- plotNormal(mu = 0, 
                  sd = 1, 
                  xlim.hi =  3.75,
                  xlim.lo = -3.75,
                  main = "68% of observations within\none std dev of the mean",
                  xlab = "Number of std deviations\n from the mean")
shadeNormal(out$x,
            out$y,
            lo = -1,
            hi = 1,
            col = plot.colour)

#

out <- plotNormal(mu = 0, 
                  sd = 1, 
                  xlim.hi =  3.75,
                  xlim.lo = -3.75,
                  main = "95% of observations within\ntwo std devs of the mean",
                  xlab = "Number of std deviations\n from the mean")
shadeNormal(out$x,
            out$y,
            lo = -2,
            hi = 2,
            col = plot.colour)

#

out <- plotNormal(mu = 0, 
                  sd = 1, 
                  xlim.hi =  3.75,
                  xlim.lo = -3.75,
                  main = "99.7% of observations within\n three std devs of the mean",
                  xlab = "Number of std deviations\n from the mean")
shadeNormal(out$x,
            out$y,
            lo = -3,
            hi = 3,
            col = plot.colour)
```


:::{.example #HeightsFemales name="Heights of females"}
Suppose the heights of Australian adult females can be *modelled* with a normal distribution having a mean of $\mu = 162\cms$ and a standard deviation of $\sigma = 7\cms$.
<!-- (based on the -->
<!-- r if (knitr::is_latex_output()) { -->
<!--    'Australian Health Survey, 2011--2012).' -->
<!-- } else { -->
<!--    '[Australian Health Survey](https://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/4364.0.55.0012011-12?OpenDocument)).' -->
<!-- } -->
Using the $68$--$95$--$99.7$ rule, approximately\ $68$% of Australian women will be between $162 - 7 = 155\cms$ and $162 + 7 = 169\cms$ tall using this model.
Similarly, approximately\ $95$% of Australian women will be between $162 - (2\times 7) = 148\cms$ and $162 + (2\times 7) = 176\cms$ tall using this model.
:::


These regions under the normal curve represent probabilities; they are often called *areas*, and are sometimes expressed as percentages.


## Standardising ($z$-scores) {#zScores}
\index{z@$z$-score|(}

<!-- Since many statistics have a normal distribution (under certain circumstances), the $68$--$95$--$99.7$ rule can be used to understand the distribution of sample statistics. -->

Since the $68$--$95$--$99.7$ rule (Def.\ \@ref(def:EmpiricalRule)) applies for all normal distributions, the percentages in the rule only depend on how many standard deviations\ ($\sigma$) a value\ ($x$) is from the mean\ ($\mu$).
This information can be used to learn more about how values are distributed in a normal distribution.

For example, suppose heights of Australian adult females can be modelled with a normal distribution having a mean of $\mu = 162\cms$, and a standard deviation of $\sigma = 7\cms$ (Example\ \@ref(exm:HeightsFemales)).
Using this model, the proportion of Australian adult women *taller* than\ $169\cms$ can be determined.

From a picture (Fig.\ \@ref(fig:HtsExer1), left panel), $162 + 7 = 169\cms$ is one standard deviation *above* the mean.
Since\ $68$% of values are within one standard deviation of the mean,\ $32$% are outside that range (some shorter; some taller).
Because normal distributions are symmetric, half of this\ $32$% is in each tail.
Hence,\ $16$% are taller than one standard deviation above the mean, so the answer is about\ $16$%.
(Another\ $16$% are shorter than one standard deviation *below* the mean, or less than $162 - 7 = 155\cms$ in height.)

Again, the percentages only depend on how many standard deviations\ ($\sigma$) the value\ ($x$) is from the mean\ ($\mu$), and not the actual values of\ $\mu$ and\ $\sigma$.
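
The reasoning above can be sketched in R, and compared with the area computed from the model using the base R function `pnorm()`.
This is an illustration only; the object names are arbitrary.

```r
mu    <- 162   # model mean (in cm)
sigma <- 7     # model standard deviation (in cm)

# Using the rule: 68% within one standard deviation, and the rest
# split equally between the two tails (by symmetry)
rule_estimate <- (1 - 0.68) / 2   # 0.16

# Using the model directly: area to the right of 169 cm
model_area <- pnorm(169, mean = mu, sd = sigma, lower.tail = FALSE)

# By symmetry, the same area lies to the left of 155 cm
model_area_left <- pnorm(155, mean = mu, sd = sigma)

c(rule = rule_estimate, model = model_area)
```

The two values are close (both about\ $16$%), as the rule is only approximate.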


```{r HtsExer1, fig.cap="Left: what proportion of Australian adult females are taller than $169\\cms$? Right: what proportion of Australian adult females are shorter than $148\\cms$?", fig.align="center", fig.width=7.5, fig.height=3, out.width='90%'}

HT.mn <- 162
HT.sd <- 7

plus1  <- HT.mn + HT.sd
minus1 <- HT.mn - HT.sd
plus2  <- HT.mn + (2 * HT.sd)
minus2 <- HT.mn - (2 * HT.sd)
plus3  <- HT.mn + (3 * HT.sd)
minus3 <- HT.mn - (3 * HT.sd)
plus4  <- HT.mn + (4 * HT.sd)
minus4 <- HT.mn - (4 * HT.sd)

gap <- 0.5
shrinkDotted <- 3

####################

par( mfrow = c(1, 2),
     mar = c(5, 0.5, 5, 0.5))

out <- plotNormal(mu = HT.mn, 
                  sd = HT.sd, 
                  ylim = c(0, 0.075),
                  xlab = "Height (in cm)")
shadeNormal(out$x,
            out$y,
            lo = 0,
            hi = minus1,
            col = plot.colour)
shadeNormal(out$x,
            out$y,
            lo = plus1,
            hi = 400,
            col = plot.colour)

abline( v = c(minus1, plus1),
        col = "grey")

arrows(x0 = minus1 + gap,
       y0 = 0.06,
       x1 = plus1 - gap,
       y1 = 0.06,
       code = 3, # Arrow both ends
       length = 0.10,
       angle = 15)
text(x = HT.mn, 
     y = 0.06, 
     labels = "Area: 68%", 
     cex = 0.9,
     pos = 3)

arrows(x0 = plus1 + gap,
       y0 = 0.06,
       x1 = plus3 - shrinkDotted,
       y1 = 0.06,
       code = 1,
       length = 0.10,
       angle = 15)
lines( x = c(plus3 - shrinkDotted, plus3 + gap),
       y = c(0.06, 0.06),
       lty = 2)
text(x = plus2, 
     y = 0.06, 
     labels = "Area: 16%", 
     cex = 0.9,
     pos = 3)

arrows(x0 = minus1 - gap,
       y0 = 0.06,
       x1 = minus3 + shrinkDotted,
       y1 = 0.06,
       code = 1,
       length = 0.10,
       angle = 15)
lines( x = c(minus3 + shrinkDotted, minus3 - gap),
       y = c(0.06, 0.06),
       lty = 2)
text(x = minus2, 
     y = 0.06, 
     labels = "Area: 16%", 
     cex = 0.9,
     pos = 3)


###########################

out <- plotNormal(mu = HT.mn, 
                  sd = HT.sd, 
                  ylim = c(0, 0.075),
                  xlab = "Height (in cm)")
shadeNormal(out$x,
            out$y,
            lo = 0,
            hi = minus2,
            col = plot.colour)
shadeNormal(out$x,
            out$y,
            lo = plus2,
            hi = 200,
            col = plot.colour)


abline( v = c(minus2, plus2),
        col = "grey")

arrows(x0 = minus2 + gap,
       y0 = 0.06,
       x1 = plus2 - gap,
       y1 = 0.06,
       code = 3, # Arrow both ends
       length = 0.10,
       angle = 15)
text(x = HT.mn, 
     y = 0.06, 
     labels = "Area: 95%", 
     cex = 0.9,
     pos = 3)

arrows(x0 = plus2 + gap,
       y0 = 0.06,
       x1 = plus3 - shrinkDotted,
       y1 = 0.06,
       code = 1,
       length = 0.10,
       angle = 15)
lines( x = c(plus3 - shrinkDotted, plus3 + gap),
       y = c(0.06, 0.06),
       lty = 2)
text(x = mean( c(plus2, plus3) ) + 2.5, 
     y = 0.06, 
     labels = "Area: 2.5%", 
     cex = 0.9,
     pos = 3)

arrows(x0 = minus2 - gap,
       y0 = 0.06,
       x1 = minus3 + shrinkDotted,
       y1 = 0.06,
       code = 1,
       length = 0.10,
       angle = 15)
lines( x = c(minus3 + shrinkDotted, minus3 - gap),
       y = c(0.06, 0.06),
       lty = 2)
text(x = mean(c(minus2, minus3)) - 2.5, 
     y = 0.06, 
     labels = "Area: 2.5%", 
     cex = 0.9,
     pos = 3)

```

`r if (knitr::is_html_output()) '<!--'`
::: {.example #HeightsExer2 name="The $68$--$95$--$99.7$ rule"}
`r if (knitr::is_html_output()) '-->'`
`r if (knitr::is_latex_output()) '<!--'`
::: {.example #HeightsExer2 name="The 68--95--99.7 rule"}
`r if (knitr::is_latex_output()) '-->'`
Consider again the heights of Australian adult females.
Using this model, what proportion are *shorter* than\ $148\cms$?

Again, drawing a picture is helpful (Fig.\ \@ref(fig:HtsExer1), right panel).
Since $162 - (2\times 7) = 148$, $148\cms$ is two standard deviations *below* the mean.
Since $95$% of values are within two standard deviations of the mean,\ $5$% are outside that range (half smaller, half larger; see Fig.\ \@ref(fig:HtsExer1), right panel), so that\ $2.5$% are *shorter* than\ $148\cms$.
(Another\ $2.5$% are *taller* than $162 + 14 = 176\cms$.)
:::


Again, the percentages only depend on how many standard deviations\ ($\sigma$) the value\ ($x$) is from the mean\ ($\mu$).
The number of standard deviations that an observation is from the mean is called a *$z$-score*.
A $z$-score is computed using
$$
   z = \frac{ x - \mu}{\sigma},
$$
where\ $\sigma$ is the standard deviation quantifying the variation in the $x$-values.
Converting values to $z$-scores is called *standardising*.


`r if (knitr::is_html_output()) '<!--'`
::: {.definition #zScore name="$z$-score"}
`r if (knitr::is_html_output()) '-->'`
`r if (knitr::is_latex_output()) '<!--'`
::: {.definition #zScore name="z-score"}
`r if (knitr::is_latex_output()) '-->'`
A *$z$-score* measures how many standard deviations a value\ $x$ is from the mean.
In symbols:
\begin{equation}
   z = \frac{x - \mu}{\sigma},
   (\#eq:zscores)
\end{equation}
where\ $\mu$ is the mean of the distribution, and\ $\sigma$ is the standard deviation of the distribution (measuring the variation in the $x$-values).
:::
 
 
The $z$-score is also called the *standardised value* or *standard score*.
Note that:

* $z$-scores are negative for observations *below* the mean.
* $z$-scores are positive for observations *above* the mean.
* $z$-scores have no units (that is, not measured in kg, or cm, etc.).


`r if (knitr::is_html_output()) '<!--'`
::: {.example #HeightsExer3 name="$z$-scores"}
`r if (knitr::is_html_output()) '-->'`
`r if (knitr::is_latex_output()) '<!--'`
::: {.example #HeightsExer3 name="z-scores"}
`r if (knitr::is_latex_output()) '-->'`
Consider the model for the heights of Australian adult females again.
From earlier, the $z$-score for a height of\ $169\cms$ is
$$
   z = \frac{x-\mu}{\sigma} = \frac{169 - 162}{7} = 1,
$$
one standard deviation *above* the mean.
Similarly, the $z$-score for a height of\ $148\cms$ is
$$
   z = \frac{x-\mu}{\sigma} = \frac{148 - 162}{7} = -2,
$$
two standard deviations *below* the mean.
:::


`r if (knitr::is_html_output()) '<!--'`
::: {.example #EmpiricalRuleZ  name="The $68$--$95$--$99.7$ rule"}
`r if (knitr::is_html_output()) '-->'`
`r if (knitr::is_latex_output()) '<!--'`
::: {.example #EmpiricalRuleZ  name="The 68--95--99.7 rule"}
`r if (knitr::is_latex_output()) '-->'`
Consider the model for the heights of Australian adult females: a normal distribution, mean $\mu = 162\cms$, standard deviation $\sigma = 7\cms$ (Fig.\ \@ref(fig:HtsEmpirical)).
Using this model:

* A height of $162\cms$ is zero standard deviations from the mean: $z = 0$.
* $155\cms$ is one standard deviation *below* the mean: $z = -1$.
* $169\cms$ is one standard deviation *above* the mean: $z = 1$.
* $148\cms$ and $176\cms$ correspond to $z = -2$ and $z = 2$ respectively.
* $141\cms$ and $183\cms$ correspond to $z = -3$ and $z = 3$ respectively.
:::
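
The $z$-scores in these examples can be computed directly from the definition.
The sketch below defines a small helper function for this purpose; the name `z_score()` is an illustrative choice, and is not a function supplied by R.

```r
# Hypothetical helper: the number of standard deviations x is from the mean
z_score <- function(x, mu, sigma) {
  (x - mu) / sigma
}

heights <- c(141, 148, 155, 162, 169, 176, 183)   # heights in cm

z_score(heights, mu = 162, sigma = 7)   # -3 -2 -1  0  1  2  3
```

The heights are measured in centimetres, but the resulting $z$-scores have no units.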

```{r HtsEmpirical, fig.cap="The $68$--$95$--$99.7$ rule and the heights of Australian adult females.", fig.align="center", fig.width=7.15, fig.height=2.75, out.width='75%'}

#par( mar = c() )

out <- plotNormal(HT.mn,
                  HT.sd,
                  main = "Heights of Australian adult females",
                  xlab = "Heights (in cm)")


mtext(expression( "("*italic(z)==0*")"),
      side = 1,
      line = 2,
      cex = 0.9,
      at = HT.mn)

mtext(expression( "("*italic(z)==1*")"),
      side = 1,
      line = 2,
      cex = 0.9,
      at = HT.mn + HT.sd)
mtext(expression( "("*italic(z)==2*")"),
      side = 1,
      line = 2,
      cex = 0.9,
      at = HT.mn + HT.sd*2)
mtext(expression( "("*italic(z)==3*")"),
      side = 1,
      line = 2,
      cex = 0.9,
      at = HT.mn + HT.sd*3)

mtext(expression( "("*italic(z)==-1*")"),
      side = 1,
      line = 2,
      cex = 0.9,
      at = HT.mn - HT.sd)
mtext(expression( "("*italic(z)==-2*")"),
      side = 1,
      line = 2,
      cex = 0.9,
      at = HT.mn - HT.sd*2)
mtext(expression( "("*italic(z)==-3*")"),
      side = 1,
      line = 2,
      cex = 0.9,
      at = HT.mn - HT.sd*3)
```


## Approximating areas (percentages) using the $68$--$95$--$99.7$ rule {#ApproxProbs}
\index{Normal distribution!approximating percentages}

As seen above, the $68$--$95$--$99.7$ rule can be used to approximate percentages under normal distributions.
The rule can even be used for values that do not exactly align with\ $1$,\ $2$ or\ $3$ standard deviations from the mean.

Suppose again that heights of Australian adult females can be modelled with a normal distribution with a mean of $\mu = 162\cms$, and a standard deviation of $\sigma = 7\cms$ (Fig.\ \@ref(fig:HtsEmpirical)).
To find the proportion of women *shorter* than $145\cms$, first draw the situation (Fig.\ \@ref(fig:HtsExer3)).
Proceeding as before, we ask 'How many standard deviations from the mean is\ $145\cms$?'
Using Equation\ \@ref(eq:zscores), $145\cms$ corresponds to a $z$-score of
\begin{equation}
   z = \frac{145 - 162}{7} = -2.4285...
   (\#eq:zscore214)
\end{equation}
which is about\ $2.43$ standard deviations *below* the mean.


```{r HtsExer3, fig.cap="What proportion of Australian adult females are shorter than\ $145\\cms$?", fig.align="center", fig.width=7.0, fig.height=3.00, out.width='85%'}

HtInterest <- 145

par( mar = c(5.1, 4.1, 1.1, 2.1) )

out <- plotNormal(HT.mn, 
                  HT.sd, 
                  ylim = c(0, 0.085),
                  xlab = "Heights (in cm)")
shadeNormal(out$x,
            out$y,
            lo = 120,
            hi = 148,
            col = plot.colour)

mtext(expression( "("*italic(z)==0*")"),
      side = 1,
      line = 2,
      cex = 0.8,
      at = HT.mn)

mtext(expression( "("*italic(z)==1*")"),
      side = 1,
      line = 2,
      cex = 0.8,
      at = plus1)
mtext(expression( "("*italic(z)==2*")"),
      side = 1,
      line = 2,
      cex = 0.8,
      at = plus2)
mtext(expression( "("*italic(z)==3*")"),
      side = 1,
      line = 2,
      cex = 0.8,
      at = plus3)

mtext(expression( "("*italic(z)==-1*")"),
      side = 1,
      line = 2,
      cex = 0.8,
      at = minus1)
mtext(expression( "("*italic(z)==-2*")"),
      side = 1,
      line = 2,
      cex = 0.8,
      at = minus2)
mtext(expression( "("*italic(z)==-3*")"),
      side = 1,
      line = 2,
      cex = 0.8,
      at = minus3)

zInterest <- (HtInterest - HT.mn) / HT.sd

lines( x = c(HtInterest, HtInterest),
       y = c(0, max(out$y) * 0.7),
       col = "black")
text(x = HtInterest, 
     y = max(out$y) * 0.7,
     pos = 3,
     cex = 0.9,
     labels = "145 cm")

text(x = HT.mn, 
     y = 0.070,
     pos = 3,
     cex = 0.9,
     labels = "Area: 95%")
text(x = mean(c(minus2, minus3)) - 2, 
     y = 0.070,
     pos = 3,
     cex = 0.9,
     labels = "Area: 2.5%")
text(x = mean(c(plus2, plus3)) + 2, 
     y = 0.070,
     pos = 3,
     cex = 0.9,
     labels = "Area: 2.5%")

abline(v = c(minus2, plus2),
        col = "grey")


shrinkDotted <- 3
minus4 <- HT.mn - (4 * HT.sd)
plus4  <- HT.mn + (4 * HT.sd)

# Arrows for areas
arrows( x0 = minus3 + shrinkDotted,
        x1 = minus2 - gap,
        y0 = 0.070,
        y1 = 0.070,
        length = 0.10,
        angle = 15,
        lwd = 1)
lines( x = c(minus4, minus3 + shrinkDotted),
       y = c(0.070, 0.070),
       lwd = 1,
       lty = 2)
arrows( x0 = plus3 - shrinkDotted,
        x1 = plus2 + gap,
        y0 = 0.070,
        y1 = 0.070,
        length = 0.10,
        angle = 15,
        lwd = 1)
lines( x = c(plus3 - shrinkDotted, plus4),
       y = c(0.070, 0.070),
       lwd = 1,
       lty = 2)

arrows( x0 = minus2 + gap,
        x1 = plus2 - gap,
        y0 = 0.070,
        y1 = 0.070,
        code = 3,
        length = 0.10,
        angle = 15,
        lwd = 1)


# Arrow to small area sought
arrows(x0 = 138,
       x1 = 145,
       y0 = 0.018,
       y1 = 0.018,
       angle = 15,
       lwd = 1,
       length = 0.1)
lines( x = c(135, 138),
       y = c(0.018, 0.018),
       lwd = 1,
       lty = 2)
text(x = 140,
     y = 0.018,
     pos = 3,
     cex = 0.9,
     labels = "Area smaller")
text(x = 140,
     y = 0.018,
     pos = 1,
     cex = 0.9,
     labels = "than 2.5%")


```


What percentage of observations are less than this $z$-score?
This case is not covered by the $68$--$95$--$99.7$ rule, though the rule can be used to make *rough estimates*.

About\ $2.5$% of observations are less than\ $2$ standard deviations below the mean; that is, about\ $2.5$% of women are shorter than\ $148\cms$.
So the percentage of females shorter than\ $145\cms$ (that is, even shorter than\ $148\cms$ and so further into the tail of the distribution) will be *smaller* than\ $2.5$%.
While the rule does not give the percentage exactly, it does show that the percentage is smaller than\ $2.5$%.
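
This argument relies on the area to the left of a value getting smaller as the value moves further into the lower tail.
A minimal sketch in R (using the base R function `pnorm()` for the area to the left of a $z$-score; the object names are arbitrary) shows the idea:

```r
mu    <- 162
sigma <- 7

z_145 <- (145 - mu) / sigma   # about -2.43
z_148 <- (148 - mu) / sigma   # exactly -2

# 145 cm is further into the lower tail than 148 cm...
z_145 < z_148                 # TRUE

# ...so the area to the left of 145 cm is the smaller of the two
pnorm(z_145) < pnorm(z_148)   # TRUE
```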

Percentages found this way are very approximate, but often sufficient.
However, more accurate percentages are found using tables compiled for this very purpose
`r if ( knitr::is_html_output()) { 
   '(Appendix\\ \\@ref(ZTablesOnline)).'
} else {
   '(Appendices\\ \\@ref(ZTablesNEG) and \\@ref(ZTablesPOS)).'
   }`
We now learn how to use these tables.


## Exact areas (percentages) using tables {#ExactAreasUsingTables}
\index{Normal distribution!using tables}

Areas under normal distributions can be found using online tables or hard-copy tables.
The online tables are easier to use,
`r if (knitr::is_latex_output()) {
   'but only the hard-copy tables are explained in this book (see the online version of this book for the online tables, and instructions for using the online tables).'
} else {
   'but only the online tables are explained in this online book (see the hard-copy version for the hard-copy tables, and instructions for using the hard-copy tables).'
}`
The tables 
`r if (knitr::is_latex_output()) {
   '(Appendices\\ \\@ref(ZTablesNEG) and\\ \\@ref(ZTablesPOS))' 
} else {
   '(Appendix\\ \\@ref(ZTablesOnline))'
}`
work with $z$-scores to two decimal places, so consider the $z$-score from Sect.\ \@ref(ApproxProbs) as $z = -2.43$.


```{r, child = if (knitr::is_latex_output()) './Tables/Ztables-Using-Hardcopy.Rmd'}
```

```{r, child = if (knitr::is_html_output()) './Tables/Ztables-Using-Online.Rmd'}
```


::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
The tables always give the area to the *left* of the $z$-score.
:::

Both the hard-copy and the online tables give an answer of\ $0.75$%: using this model, about\ $0.75$% of Australian adult females are shorter than\ $145\cms$.
This is consistent with the rough answer using the $68$--$95$--$99.7$ rule: a value less than\ $2.5$%.
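
Statistical software can be used in place of the tables.
As a sketch, the base R function `pnorm()` gives the area to the *left* of a $z$-score, just as the tables do:

```r
z <- -2.43   # the z-score, rounded to two decimal places as for the tables

area_left <- pnorm(z)       # area to the left of z, as a proportion
round(100 * area_left, 2)   # 0.75 (as a percentage)
```

Using the unrounded $z$-score instead gives a slightly different answer, since the tables only work with $z$-scores to two decimal places.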


<iframe src="https://learningapps.org/watch?v=ppievv9gc22" style="border:0px;width:100%;height:800px" allowfullscreen="true" webkitallowfullscreen="true" mozallowfullscreen="true"></iframe>


## Examples using $z$-scores {#ZScoreForestry}

The general approach to computing probabilities from normal distributions is:

* *Draw a diagram*, and mark on the value(s) of interest.
* *Shade* the required region of interest.
* *Compute* the $z$-score(s) using Equation\ \@ref(eq:zscores).
* *Use* the tables in `r if ( knitr::is_html_output()) { 'Appendix\\ \\@ref(ZTablesOnline)'} else {'Appendices\\ \\@ref(ZTablesNEG) and\\ \\@ref(ZTablesPOS)'}` to compute corresponding areas (percentages).
* *Deduce* the answer.

Using this approach, more complicated questions can be answered.


::: {.example #NormalTrees name="Normal distributions"}
Mechanized forest harvesting systems were simulated by @data:Aedo1997:softwood, and the diameters of a specific type of tree were modelled using:

* a normal distribution; with
* a mean of $\mu = 8.8$ inches; and
* a standard deviation of $\sigma = 2.7$ inches.

Using this model, what is the probability that a randomly-chosen tree has a diameter *greater* than\ $5$\ inches?

Following the steps identified earlier:

* *Draw* a normal curve, and mark on\ $5$\ inches (Fig.\ \@ref(fig:ZDBH1), left panel).
* *Shade* the region 'greater than\ $5$\ inches' (Fig.\ \@ref(fig:ZDBH1), centre panel).
* *Compute* the $z$-score using Eq.\ \@ref(eq:zscores):
  $\displaystyle z = (5 - 8.8)/2.7 = -1.41$ to two decimal places.
* *Use* tables:
  The probability of a tree diameter *less* than $5$\ inches is\ $0.0793$.
  (Remember: the tables always give area *less* than the value of\ $z$.)
* *Deduce* the answer (Fig.\ \@ref(fig:ZDBH1), right panel):
  since the *total* area under the normal distribution is one (or\ $100$%), the probability of a tree diameter  *greater* than\ $5$\ inches is $1 - 0.0793 = 0.9207$, or about\ $92$%.

A randomly-chosen tree has a probability of\ $92$% of having a diameter *greater* than\ $5$\ inches.
:::
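
The steps in this example can also be sketched in R, assuming the same normal model.
The $z$-score is rounded to two decimal places to mimic the tables; the object names (such as `diam_mean`) are illustrative only:

```r
# Normal model for tree diameters (in inches)
diam_mean <- 8.8
diam_sd <- 2.7

# Compute the z-score for 5 inches, rounded as for the tables
z_five <- round((5 - diam_mean) / diam_sd, 2)

# The tables (and pnorm()) give the area to the LEFT of the z-score
area_below <- pnorm(z_five)

# Total area is one, so the area to the RIGHT is the complement
area_above <- 1 - area_below

c(z = z_five, below = area_below, above = area_above)
```

Without rounding the $z$-score, `pnorm(5, mean = diam_mean, sd = diam_sd, lower.tail = FALSE)` gives a very similar answer directly; small differences in the third or fourth decimal place are due to the rounding only.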


```{r ZDBH1, fig.cap="What proportion of tree diameters are greater than\ $5$ inches?", fig.align="center", fig.width=7.0, fig.height=1.75, out.width='100%'}
DBH.mn <- 8.8
DBH.sd <- 2.7
DBH.x <- 5


par(mfrow = c(1, 3),
    mar = c(3, 0.25, 4, 0.25))

z <- seq( -3.5, 3.5, 
          length = 250)
zy <- dnorm( z, 
             mean = 0, 
	           sd = 1)

mu <- DBH.mn
sigma <- DBH.sd
x <- z * sigma + mu

out <- plotNormal(mu,
                  sigma,
                  xlab = "Tree diameters (in inches)",
                  main = "Draw",
                  ylim = c(0, 0.16),
                  round.dec = 1)
segments(x0 = DBH.x,
         x1 = DBH.x,
         y0 = 0,
         y1 = max(out$y) * 0.75,
        lwd = 2)
text(x = DBH.x,
     y = max(out$y) * 0.75,
     pos = 3,
     labels = "5 inches")

####

out <- plotNormal(mu,
                  sigma,
                  xlab = "Tree diameters (in inches)",
                  main = "Shade",
                  ylim = c(0, 0.16),
                  round.dec = 1)

shadeNormal(out$x,
            out$y,
            col = plot.colour,
            lo = DBH.x,
            hi = 20)
abline(v = DBH.x,
       lwd = 2)

text(x = 14, 
     y = max(out$y) * 0.7,
     pos = 3,
     labels = "Area of\ninterest")
arrows(x0 = 14,
       y0 = max(out$y) * 0.7,
       x1 = 10,
       y1 = max(out$y) * 0.2,
       lwd = 2,
       angle = 15,
       length = 0.1)


####

out <- plotNormal(mu,
                  sigma,
                  xlab = "Tree diameters (in inches)",
                  main = "Compute the answer",
                  round.dec = 1,
                  ylim = c(0, 0.16),
                  showZ = FALSE)

shadeNormal(out$x,
            out$y,
            col = plot.colour,
            lo = 0,
            hi = DBH.x)
abline(v = DBH.x,
       lwd = 2)

text(x = 14, 
     y = max(out$y) * 0.7,
     pos = 3,
     labels = "Area of\ninterest")
arrows(x0 = 14,
       y0 = max(out$y) * 0.7,
       x1 = 10,
       y1 = max(out$y) * 0.15,
       lwd = 2,
       angle = 15,
       length = 0.1)

text(x = 2, 
     y = max(out$y) * 0.7,
     pos = 3,
     labels = "Area from\ntables")
arrows(x0 = 2,
       y0 = max(out$y) * 0.7,
       x1 = 4,
       y1 = max(out$y) * 0.15,
       lwd = 2,
       angle = 15,
       length = 0.1)

```


::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
Our normal-distribution tables *always* provide area to the *left* of the $z$-score.
Drawing a picture of the situation is important: it helps visualise getting the area requested from the area the tables provide.
Remember: the *total* area under the normal distribution is one (or\ $100$%).
:::


::: {.example #NormalTreesDiagrams name="Normal distributions"}
These scenarios can be displayed on a diagram as shown in Fig.\ \@ref(fig:MatchDiagrams) (recall $\mu = 8.8$ inches):

1. Tree diameters between\ $3$\ and\ $5$\ inches: Diagram\ A.
2. Tree diameters greater than\ $11$\ inches: Diagram\ B.
3. Tree diameters *between*\ $5$\ and\ $11$\ inches: Diagram\ C.
4. Tree diameters less than\ $11$\ inches: Diagram\ D.
:::


<!-- ```{r MatchDiagrams, fig.cap="Scenarios with their diagrams",  fig.align="center", out.width="85%", fig.height=4.00, fig.width=8}-->
```{r MatchDiagrams, fig.cap="Scenarios with their corresponding diagrams.",  fig.align="center", out.width="95%", fig.height=3.25, fig.width=7}
par( mfrow = c(2, 2))
 
par( mar = c(4.5, 1, 1.5, 2) + 0.1)

out <- plotNormal(mu,
                  sigma,
                  main = "Diagram A",
                  xlab = "Tree diameters (inches)")
shadeNormal(out$x,
            out$y,
            col = plot.colour,
            lo = 3,
            hi = 5)


out <- plotNormal(mu,
                  sigma,
                  main = "Diagram B",
                  xlab = "Tree diameters (inches)")
shadeNormal(out$x,
            out$y,
            col = plot.colour,
            lo = 11,
            hi = 20)


out <- plotNormal(mu,
                  sigma,
                  main = "Diagram C",
                  xlab = "Tree diameters (inches)")
shadeNormal(out$x,
            out$y,
            col = plot.colour,
            lo = 5,
            hi = 11)


out <- plotNormal(mu,
                  sigma,
                  main = "Diagram D",
                  xlab = "Tree diameters (inches)")
shadeNormal(out$x,
            out$y,
            col = plot.colour,
            lo = 0,
            hi = 11)
```


::: {.example #NormalTrees2 name="Normal distributions"}
Using the model for tree diameters in Example\ \@ref(exm:NormalTrees), what is the probability that a tree has a diameter *between*\ $5$\ and\ $11$\ inches?


First, *draw* the situation, and *shade* 'between $5$\ and $11$\ inches' (Fig.\ \@ref(fig:MatchDiagrams), Diagram\ C).
Then, *compute* the $z$-scores for *both* tree diameters:

* \makebox[17mm][l]{For $5$ inches:} $\quad  z = (5 - 8.8)/2.7 = -1.41$ (i.e., below the mean).
* \makebox[17mm][l]{For $11$ inches:} $\quad z = (11 - 8.8)/2.7 = 0.81$ (i.e., above the mean).

`r if (knitr::is_latex_output()) {
   'The tables in Appendices\\ \\@ref(ZTablesNEG) and\\ \\@ref(ZTablesPOS)' 
} else {
   'The tables in Appendix\\ \\@ref(ZTablesOnline)'
}`
can then be used to find the area to the *left* of $z = -1.41$, which is\ $0.0793$.
The table can also be used to find the area to the *left* of $z = 0.81$, which is\ $0.7910$.
However, neither of these provides the area *between* $z = -1.41$ and $z = 0.81$.

Looking carefully at the areas from the tables and the area sought, the required area is the *difference* between the two areas from the tables (Fig.\ \@ref(fig:ZDBH3)):
`r if (knitr::is_latex_output()) {
   '$0.7910 - 0.0793 = 0.7117$.'
} else {
   '$0.7910 - 0.0793 = 0.7117$ (see the animation below).'
}`
The probability that a tree has a diameter between\ $5$ and\ $11$\ inches is about\ $0.7117$, or about\ $71$%.
:::
\index{z@$z$-score|)}
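
The subtraction of the two 'area to the left' values can be sketched in R as follows.
This assumes the same model for tree diameters, and rounds both $z$-scores to two decimal places as the tables require; the object names are illustrative:

```r
# Normal model for tree diameters (in inches)
diam_mean <- 8.8
diam_sd <- 2.7

# z-scores for both boundaries, rounded as for the tables
bounds <- c(lower = 5, upper = 11)
z_bounds <- round((bounds - diam_mean) / diam_sd, 2)

# Area to the left of each z-score
areas_left <- pnorm(z_bounds)

# Area between = (area left of upper) - (area left of lower)
area_between <- unname(areas_left["upper"] - areas_left["lower"])

z_bounds
areas_left
area_between   # about 0.71
```

The fourth decimal place may differ slightly from the hand calculation, because the tables round each area before subtracting.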


```{r ZDBH3, fig.cap="What proportion of tree diameters are between $5$\ and $11$\ inches? The hatched area is the area to the left of $z = -1.41$, and the shaded area is the area to the left of $z = 0.81$; neither gives us the area we seek directly.", fig.align="center", fig.width=9.5, fig.height = 2.75,out.width='100%'}

par( mfrow = c(1, 2))

out0 <- plotNormal(mu,
                   sigma,
                   cex.axis = 0.85,
                   main = expression(atop(The~area~to~the~bold(left)~of,
                                          italic(z)==-1.41*","~according~to~the~tables)),
                   xlab = "Tree diameters (inches)")
shadeNormal(out0$x,
            out0$y,
            col = "white",
            lo = 0,
            angle = 45,
            density = 18,
            hi = 5)

text(x = 3.25,
     y = 0.65 * max(out0$y),
     pos = 3,
     labels = expression(Area*":"~0.0793) )
arrows(x0 = 3,
       y0 = 0.7 * max(out0$y),
       x1 = 4,
       y1 = 0.12 * max(out0$y),
       angle = 15,
       lwd = 2,
       length = 0.1)

###################

out <- plotNormal(mu,
                  sigma,
                  showZ = FALSE,
                  cex.axis = 0.85,
                   main = expression(atop(The~area~to~the~bold(left)~of,
                                          italic(z)==0.81*","~according~to~the~tables)),
                  xlab = "Tree diameters (inches)")
shadeNormal(out0$x,
            out0$y,
            col = "white",
            shadeCol = "black",
            lo = 0,
            angle = 45,
            density = 18,
            hi = 5)
shadeNormal(out$x,
            out$y,
            col = blueTransparent,
            lo = 0,
            hi = 11)

text(x = 5,
     y = 0.75 * max(out$y),
     pos = 3,
     labels = expression(italic(x) == 5))
arrows( x0 = 5,
        y0 = 0.75 * max(out$y),
        x1 = 5,
        y1 = dnorm(5, mean = 8.8, sd = 2.7),
        angle = 15,
        lwd = 2,
        length = 0.1)
lines( x = c(5, 5),
       y = c(0, dnorm(5, mean = 8.8, sd = 2.7) ),
       col = grey(0.35) )

text(x = 14,
     y = 0.75 * max(out$y),
     pos = 3,
     labels = expression(Area*":"~0.791) )
arrows(x0 = 14,
       y0 = 0.75 * max(out$y),
       x1 = 10,
       y1 = 0.1 * max(out$y),
       angle = 15,
       lwd = 2,
       length = 0.1)
```

```{r animation.hook="gifski", dev=if (knitr::is_latex_output()){"pdf"}else{"png"}}
RT.mn <- 8.8
RT.sd <- 2.7
  
lower <- 5
upper <- 11
  
if (knitr::is_html_output()){
  for (i in (1:4)){
    if ( i == 1 ){
      out <- plotNormal(RT.mn, 
                sd = RT.sd, 
                xlab = "Tree diameter (inches)",
                round.dec = 1,
                main = "Between 5 and 11 inches")	
      shadeNormal(out$x,
                  out$y,
                  col = "azure2",
                  lo = lower,
                  hi = upper)
      
    }  
    if ( i == 3 ){
      out <- plotNormal(RT.mn, 
                sd = RT.sd, 
                xlab = "Tree diameter (inches)",
                round.dec = 1,
                main = "Table: Less than 5 inches: 0.0793")	
      shadeNormal(out$x,
                  out$y,
                  col = "azure2",
                  lo = 0,
                  hi = upper)
      shadeNormal(out$x,
                  out$y,
                  col = "blue",
                  lo = 0,
                  hi = lower)
    }  
    if ( i == 2 ){

      out <- plotNormal(RT.mn, 
                sd = RT.sd, 
                xlab = "Tree diameter (inches)",
                round.dec = 1,
                main = "Table: Less than 11 inches: 0.7910")	
      shadeNormal(out$x,
                  out$y,
                  col = "blue",
                  lo = 0,
                  hi = upper)
    }  
    if ( i == 4 ){
      out <- plotNormal(RT.mn, 
                sd = RT.sd, 
                xlab = "Tree diameter (inches)",
                round.dec = 1,
                main = "Between 5 and 11 inches: 0.7117")	
      shadeNormal(out$x,
                  out$y,
                  col = "azure2",
                  lo = lower,
                  hi = upper)
    }  
  }
}
```


<iframe src="https://learningapps.org/watch?v=p4jq6ujuj22" style="border:0px;width:100%;height:900px" allowfullscreen="true" webkitallowfullscreen="true" mozallowfullscreen="true"></iframe>


## Unstandardising: working backwards {#Unstandardising}
\index{Normal distribution!using tables backwards}\index{Unstandardising formula}

Using the model for tree diameters in Example\ \@ref(exm:NormalTrees) again, different types of questions can be asked too.
Suppose we needed to identify the diameters of the *smallest*\ $3$% of trees.

This is a different type of problem than before; previously, the *tree diameter* was known, so a $z$-score could be computed, and hence a probability (Fig.\ \@ref(fig:WorkingWithZ)).
However, here the *probability* is known, and a tree diameter is sought.
That is, working 'backwards' is necessary (Fig.\ \@ref(fig:WorkingWithZ)), so the $z$-tables need to be used 'backwards' too.


```{r WorkingWithZ, fig.cap="Working with $z$-scores. In the tables, the areas (probabilities) are in the body of the table, and the $z$-scores are in the margins of the table.", fig.align="center", out.width='90%', fig.height=2.25, fig.width=8.25}
par( mar = c(0.5, 0.5, 0.5, 0.5))

openplotmat()

boxY <- 0.075
boxX <- 0.120

pos <- diagram::coordinates(3)
pos[1, 1] <- pos[1, 1] - 0.0
pos[3, 1] <- pos[3, 1] + 0.0

pos[, 2] <- 0.725

text(0.5, 0.90, 
     label = expression(bold(The~usual~way~to~work~with~italic(z)*"-"*scores)), 
     font = 2)

straightarrow(from = pos[1,], 
            to = pos[2,])
straightarrow(from = pos[2,], 
            to = pos[3,])


textrect( pos[1, ], 
          lab = expression(Value~of~italic(x)~bold(known)), 
          box.col = GroupColour,
          lcol = GroupColour,
          shadow.size = 0,
          radx = boxX,
          rady = boxY)
textrect( pos[2, ], 
          lab = expression(Value~of~italic(z)~bold(computed)), 
          box.col = GroupColour,
          lcol = GroupColour,
          shadow.size = 0,
          radx = boxX,
          rady = boxY)
textrect( pos[3, ], 
          lab = expression(Area~from~bold(tables)), 
          box.col = GroupColour,
          lcol = GroupColour,
          shadow.size = 0,
          radx = boxX,
          rady = boxY)


###

pos <- diagram::coordinates(3)
pos[1, 1] <- pos[1, 1] - 0.0
pos[3, 1] <- pos[3, 1] + 0.0

pos[, 2] <- 0.1

text(0.5, 0.28, 
     expression(bold(Working~backwards~with~italic(z)*"-"*scores)), 
     font = 2)

straightarrow(from = pos[2, ], 
            to = pos[1, ])
straightarrow(from = pos[3, ], 
            to = pos[2, ])

textrect( pos[1, ], 
          lab = expression(Value~of~italic(x)~bold(computed)), 
          box.col = GroupColour,
          lcol = GroupColour,
          shadow.size = 0,
          radx = boxX,
          rady = boxY)
textrect( pos[2, ], 
          lab = expression(Value~of~italic(z)~from~bold(tables)), 
          box.col = GroupColour,
          lcol = GroupColour,
          shadow.size = 0,
          radx = boxX,
          rady = boxY)
textrect( pos[3, ], 
          lab = expression(Area~bold(known)),
          box.col = GroupColour,
          lcol = GroupColour,
          shadow.size = 0,
          radx = boxX,
          rady = boxY)

```


Drawing a rough diagram of the situation again is very helpful (Fig.\ \@ref(fig:DBHBackwards)).
We can only mark the approximate location of the required score, but this is sufficient.
Then, tables must be used to determine the corresponding $z$-score.
Since the required value will be smaller than the mean, the $z$-score will be negative (to the *left* of the mean).


```{r DBHBackwards, fig.cap="Tree diameters: the smallest\\ $3$\\% is shaded. The approximate location of the required $z$-score is drawn.", fig.align="center", out.width='65%', fig.width=6.5, fig.height=3.0}

z <- -1.88
zguess <- -1.8
xguess <- zguess * sigma + mu

out <- plotNormal(mu,
                  sigma,
                  main = "Smallest 3% of trees",
                  xlab = "Tree diameters (in inches)")
shadeNormal(out$x,
            out$y,
            col = plot.colour,
            hi = xguess,
            lo = 0)

text(xguess, 
     max(out$y) * 0.8,
     expression(italic(z)~near~here), 
     pos = 3, 
     cex = 1)
lines(x = c(xguess, xguess),
      y = c(0, max(out$y) * 0.775),
      col = grey(0.2),
      lwd = 2)


arrows(x0 = 1.9, 
       y0 = max(out$y) * 0.5, 
       x1 = 2.0, 
       y1 = max(out$y) * 0.1, 
       angle = 15, 
       length = 0.15, 
       lwd = 2) 
text(x = 1.9, 
     y = max(out$y) * 0.425, 
     labels = "Approx.\narea: 3%", 
     pos = 2)
```


As before (Sect.\ \@ref(ExactAreasUsingTables)), online tables or hard-copy tables can be used (and again the online tables are easier to use).
`r if (knitr::is_latex_output()) {
   'Only the hard-copy tables are explained in this book (see the online version for the online tables, and instructions for their use).'
} else {
   'Only the online tables are explained in this online book (see the hard-copy version for the hard-copy tables, and instructions for their use).'
}`

```{r, child = if (knitr::is_latex_output()) './Tables/Ztables-Using-Hardcopy-Tables-Backwards.Rmd'} 
```
```{r, child = if (knitr::is_html_output()) './Tables/Ztables-Using-Online-Tables-Backwards.Rmd'}
```


::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
Our tables always give the area to the *left* of the $z$-score.
:::


Using either the hard-copy or online tables, the appropriate $z$-score is about\ $1.88$ standard deviations *below* the mean; that is, $z = -1.88$ (Fig.\ \@ref(fig:DBHBackwards)).
The $z$-score can be converted to an observation value $x$ using the *unstandardising* formula^[This is found by re-arranging Equation\ \@ref(eq:zscores).]:
$$
	x = \mu + z\sigma.
$$
Using this unstandardising formula:
\begin{align*}
	x &= \mu + (z\times\sigma) \\
		&= 8.8 + (-1.88 \times 2.7) = 3.724;
\end{align*}
that is, about $3$% of trees have diameters less than about\ $3.72$ inches.


::: {.definition #UnstandardisingFormula name="Unstandardising formula"}
When the $z$-score is known, the corresponding value of the observation\ $x$ is
\begin{equation}
	x = \mu + z\sigma.
  (\#eq:UnstandardisingFormula)
\end{equation}
This is called the *unstandardising formula*.
:::
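
Working 'backwards' can be sketched in R using the base function `qnorm()`, which takes an area to the *left* and returns the corresponding $z$-score (the reverse of `pnorm()`).
The helper `unstandardise()` below is a hypothetical function written for this illustration only, and simply applies the unstandardising formula:

```r
# Hypothetical helper: the unstandardising formula x = mu + z * sigma
unstandardise <- function(z, mu, sigma) {
  mu + z * sigma
}

# Normal model for tree diameters (in inches)
diam_mean <- 8.8
diam_sd <- 2.7

# z-score with 3% of the area to its left, rounded as for the tables
z_smallest <- round(qnorm(0.03), 2)

# Convert the z-score back to a tree diameter
x_smallest <- unstandardise(z_smallest, mu = diam_mean, sigma = diam_sd)

c(z = z_smallest, diameter = x_smallest)
```

Using the unrounded $z$-score from `qnorm(0.03)` changes the diameter only slightly.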


::: {.example #LargestPC name="Normal distributions backwards"}
Using the model for tree diameters in Example\ \@ref(exm:NormalTrees) again, suppose now the diameters of the *largest*\ $25$% of trees need to be identified.

The situation can be drawn (Fig.\ \@ref(fig:DBHBackwards2)).
Since an area is given, we need to work 'backwards', so the $z$-tables need to be used 'backwards' too.
The *largest*\ $25$% implies large trees, so the required diameter is larger than the mean (and so corresponds to a positive $z$-score).

The tables work with the area to the *left* of the value of interest, which is\ $75$% (Fig.\ \@ref(fig:DBHBackwards2)).
Using either the hard-copy or online tables, the appropriate $z$-value is $z = 0.674$.
Then, the $z$-score can be converted to an observation value\ $x$ using the *unstandardising* formula:
\begin{align*}
	x &= \mu + (z\times\sigma) \\
		&= 8.8 + (0.674 \times 2.7) = 10.620.
\end{align*}
That is, about\ $25$% of trees have diameters larger than about\ $10.6$\ inches.
:::
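
The idea that the largest\ $25$% corresponds to the smallest\ $75$% can be checked with a short R sketch.
It assumes the same model for tree diameters; `qnorm()` is a base R function, while the object names are illustrative:

```r
# Normal model for tree diameters (in inches)
diam_mean <- 8.8
diam_sd <- 2.7

# The tables use the area to the LEFT: 1 - 0.25 = 0.75
z_from_left <- qnorm(0.75)

# Equivalent: ask directly for an area of 0.25 to the RIGHT
z_from_right <- qnorm(0.25, lower.tail = FALSE)

# Unstandardise: x = mu + z * sigma
x_largest <- diam_mean + z_from_left * diam_sd

c(z_left = z_from_left, z_right = z_from_right, diameter = x_largest)
```

Both calls return the same positive $z$-score, and the resulting diameter is about\ $10.6$\ inches.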


```{r DBHBackwards2, fig.cap="Tree diameters: the largest $25$\\% is the same as the smallest $75$\\%.", fig.align="center", fig.width=6.5, out.width='60%', fig.height=2.75}

out <- plotNormal(mu,
                  sigma,
                  cex.axis = 0.85,
                  main = "Largest 25% of trees",
                  xlab = "Tree diameters (inches)")
shadeNormal(out$x,
            out$y,
            col = plot.colour,
            lo = 0,
            hi = 10.62)
# shadeNormal(out$x,
#             out$y,
#             col = plot.colour2,
#             lo = 10.62,
#             hi = 20)

arrows(x0 = 16, 
       y0 = max(out$y) * 0.45, 
       x1 = 12.5, 
       y1 = max(out$y) * 0.1, 
       angle = 15, 
       length = 0.15, 
       lwd = 2) # Note: Locations in terms of z-scores
text(16.5, max(out$y) * 0.45, 
     "Largest 25%", 
     cex = 0.9,
     pos = 3)


arrows(x0 = 12, 
       x1 = 10.62,
       y0 = max(out$y) * 0.925,
       y1 = max(out$y) * 0.925,
       length = 0.15,
       angle = 15)
text(12,
     max(out$y) * 0.925,
     expression(italic(z)~is~near~here), 
     pos = 4, 
     cex = 0.9)
abline(v = 10.6,
       col = "grey")


arrows(5, max(out$y) * 0.7,
       7, max(out$y) * 0.1, 
       angle = 15, 
       length = 0.15, 
       lwd = 2) # Note: Locations in terms of z-scores
text(5, max(out$y) * 0.7, 
     "Smallest 75%", 
     cex = 0.9,
     pos = 2)
```


<iframe src="https://learningapps.org/watch?v=poo3x05hn22" style="border:0px;width:100%;height:600px" allowfullscreen="true" webkitallowfullscreen="true" mozallowfullscreen="true"></iframe>


## Example: methane production

@huhtanen2016effects modelled the retention time of food in sheep using a normal distribution, with the mean retention time as $\mu = 42.5\hs$, and the standard deviation as $\sigma = 3.68\hs$.
We can draw this normal distribution (Fig.\ \@ref(fig:RetentionTime)), and then apply the $68$--$95$--$99.7$ rule:

* about\ $68$% of retention times are between\ $38.82$ and\ $46.18\hs$;
* about\ $95$% of retention times are between\ $35.14$ and\ $49.86\hs$; and
* about\ $99.7$% of retention times are between\ $31.46$ and\ $53.54\hs$.
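
These three intervals are simply the mean plus or minus one, two and three standard deviations.
A minimal R sketch of the arithmetic is shown below (the object names are illustrative); the last column uses `pnorm()` to show the areas that the $68$--$95$--$99.7$ rule approximates:

```r
# Normal model for retention times (in hours)
ret_mean <- 42.5
ret_sd <- 3.68

# Number of standard deviations either side of the mean
k <- 1:3

intervals <- data.frame(
  sds      = k,
  lower    = ret_mean - k * ret_sd,
  upper    = ret_mean + k * ret_sd,
  coverage = pnorm(k) - pnorm(-k)   # area within k standard deviations
)

intervals
```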


```{r RetentionTime, fig.cap="Retention times of food in sheep.", fig.align="center", fig.width=6.5, fig.height=2.75, out.width='65%'}

out <- plotNormal(42.5,
                  3.68,
                  xlab = "Retention times (in hours)",
                  main = "Retention times of food in sheep",
                  round.dec = 2)
```


::: {.example #Methane1 name="Working with the normal distribution"}
Using this model, what proportion of sheep have a retention time *less than* $40\hs$?
:::


A retention time of\ $40\hs$ corresponds to a $z$-score of (Fig.\ \@ref(fig:RetentionPlots), top left panel):
$$
   z = \frac{40 - 42.5}{3.68} = -0.68.
$$
This is a *negative* number, since $40\hs$ is *below* the mean.
Using the tables in
`r if (knitr::is_latex_output()) {
   'Appendices\\ \\@ref(ZTablesNEG) and\\ \\@ref(ZTablesPOS)'
} else {
   'Appendix\\ \\@ref(ZTablesOnline)'
}`
(that give the *area to the left* of the $z$-score), the area to the left of $z = -0.68$ is\ $0.2483$, or about\ $24.8$%.
About\ $24.8$% of sheep have a retention time *less* than\ $40\hs$.


::: {.example #Methane2 name="Working with the normal distribution"}
What proportion of sheep have a retention time *greater than*\ $48\hs$ (two days)?
:::

A retention time of\ $48\hs$ corresponds to a $z$-score of\ $1.49$.
Using the normal distribution tables, the area to the *left* of this $z$-score is\ $0.9319$, so the area to the *right* of this $z$-score is\ $0.0681$ (Fig.\ \@ref(fig:RetentionPlots), top right panel).


::: {.example #Methane3 name="Working with the normal distribution"}
What proportion of sheep have a retention time *between*\ $40$ and\ $48\hs$?
:::


```{r RetentionPlots, fig.cap="Plots for retention times.",  fig.align="center", out.width="80%", fig.height=4.25}
par( mfrow = c(2, 2))

par( mar = c(5, 2, 1.5, 2) + 0.1)

mu <- 42.5
sigma <- 3.68

out <- plotNormal(mu,
                  sigma,
                  main = "Less than 40 hours",
                  xlab = "Retention times (hours)")
shadeNormal(out$x,
            out$y,
            col = plot.colour,
            lo = 25,
            hi = 40)

###

out <- plotNormal(mu,
                  sigma,
                  main = "Greater than 48 hours",
                  xlab = "Retention times (hours)")
shadeNormal(out$x,
            out$y,
            col = plot.colour,
            lo = 48,
            hi = 60)

###


out <- plotNormal(mu,
                  sigma,
                  ylim = c(0, 0.150),
                  main = "Between 40 and 48 hours",
                  xlab = "Retention times (hours)")

shadeNormal(out$x,
            out$y,
            col = plot.colour,
            lo = 25,
            hi = 48)
shadeNormal(out$x,
            out$y,
            col = NA,
            shadeCol = grey(0.3),
            density = 10,
            angle = 45,
            lo = 25,
            hi = 40)
arrows(x0 = 34,
       x1 = 40,
       y0 = 0.11,
       y1 = 0.11,
       length = 0.10,
       angle = 15,
       lwd = 1)
arrows(x0 = 34,
       x1 = 48,
       y0 = 0.130,
       y1 = 0.130,
       length = 0.10,
       angle = 15,
       lwd = 1)
text(x = 35,
     y = 0.11,
     pos = 1,
     cex = 0.8,
     labels = expression(Less~than~40))

arrows(x0 = 40,
       x1 = 48,
       y0 = 0.12,
       y1 = 0.12,
       code = 3,
       length = 0.10,
       angle = 15,
       lwd = 1)
lines( x = c(25, 40),
       y = c(0.11, 0.11),
       lty = 2)
lines( x = c(25, 40),
       y = c(0.130, 0.130),
       lty = 2)
abline( v = c(40, 48),
        col = "grey")
text(x = 35,
     y = 0.12,
     pos = 3,
     cex = 0.8,
     labels = expression(Less~than~48))


### 

 
out <- plotNormal(mu,
                  sigma,
                  main = "Smallest 35%",
                  xlab = "Retention times (hours)")
shadeNormal(out$x,
            out$y,
            col = plot.colour,
            lo = 25,
            hi = 41)
```


A retention time of\ $40\hs$ corresponds to $z = -0.68$ and, using the normal distribution tables, the area to the *left* of $z = -0.68$ is\ $0.2483$ (Fig.\ \@ref(fig:RetentionPlots), bottom left panel; hatched area).
But this is not the area that we seek.
From earlier, the area to the *left* of $z = 1.49$ is\ $0.9319$ (Fig.\ \@ref(fig:RetentionPlots), bottom left panel; shaded region).
But this is not the area we seek either.
From the two areas that we know, we *can* find the area that we seek (Fig.\ \@ref(fig:RetentionPlots), bottom left panel):

* $48\hs$ corresponds to $z = 1.49$; the area to the *left* of this $z$-score is\ $0.9319$.
* $40\hs$ corresponds to $z = -0.68$; the area to the *left* of this $z$-score is\ $0.2483$.
* The *difference* between these two *areas* is sought: $0.9319 - 0.2483 = 0.6836$.

So the proportion is about\ $0.684$ (or\ $68.4$%).
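
The three proportions for the retention times (less than\ $40\hs$, greater than\ $48\hs$, and between the two) can be reproduced with a short R sketch.
It assumes the model given above, and rounds the $z$-scores to two decimal places to match the tables; the object names are illustrative:

```r
# Normal model for retention times (in hours)
ret_mean <- 42.5
ret_sd <- 3.68

# z-scores for 40 and 48 hours, rounded as for the tables
z_40 <- round((40 - ret_mean) / ret_sd, 2)
z_48 <- round((48 - ret_mean) / ret_sd, 2)

p_less_40 <- pnorm(z_40)               # area to the left of z_40
p_more_48 <- 1 - pnorm(z_48)           # area to the right of z_48
p_between <- pnorm(z_48) - pnorm(z_40) # difference of two left areas

round(c(less_than_40 = p_less_40,
        more_than_48 = p_more_48,
        between      = p_between), 3)
```

Because the three regions cover the whole distribution without overlap, the three proportions add to one; this is a useful check on the working.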


::: {.example #Methane4 name="Working with the normal distribution"}
Consider the\ $35$% of sheep with the *shortest* retention times.
What are these retention times?
:::

The time we seek must be *smaller* than the mean if it defines the *shortest*\ $35$% of retention times.
We don't know *exactly* where to draw the retention time that this corresponds to on the diagram; it's just somewhere to the left of the mean (Fig.\ \@ref(fig:RetentionPlots), bottom right panel).

This time, *we know the area to the left*, but we do not know the value (or $z$-score).
This is a 'backwards problem', and we need to find the $z$-score 'backwards' (Sect.\ \@ref(Unstandardising)).
From the hard-copy tables, a $z$-score of $z = -0.39$ has an area to the left of\ $0.3483$, which is as close as we can get.
(The online tables are more precise: $z = -0.385$.)

We know the $z$-score, so the retention value is found using the unstandardising formula: 
$$
  x = \mu + (z \times \sigma) 
    =  42.5 + (-0.385\times 3.68) = 41.0832.
$$
The retention time is about\ $41.1\hs$.
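
Using the hard-copy tables 'backwards' amounts to searching the body of the table for the area closest to\ $0.35$.
The R sketch below imitates that search over $z$-scores given to two decimal places, and compares the result with the more precise value from `qnorm()`; the grid of $z$-scores and the object names are assumptions made for this illustration:

```r
# Normal model for retention times (in hours)
ret_mean <- 42.5
ret_sd <- 3.68
target_area <- 0.35

# Imitate the hard-copy tables: negative z-scores to two decimal places
z_grid <- round(seq(-3.49, 0, by = 0.01), 2)
table_areas <- pnorm(z_grid)

# Find the table entry whose area is closest to the target
z_table <- z_grid[which.min(abs(table_areas - target_area))]

# More precise z-score, as the online tables provide
z_exact <- qnorm(target_area)

# Unstandardise both z-scores: x = mu + z * sigma
x_table <- ret_mean + z_table * ret_sd
x_exact <- ret_mean + z_exact * ret_sd

round(c(z_table = z_table, z_exact = z_exact,
        x_table = x_table, x_exact = x_exact), 3)
```

Either $z$-score leads to a retention time of about\ $41.1\hs$.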


## Chapter summary {#DistributionModelsSummary}

A *model* is a way of describing the theoretical distribution of some quantitative variable.
One common model is a *normal model* or *normal distribution*, which is a bell-shaped distribution with a theoretical mean\ $\mu$ and a theoretical standard deviation\ $\sigma$.
Probabilities can be computed from normal distributions using *$z$-scores*, the $68$--$95$--$99.7$ rule and tables.


## Quick revision questions {#DistributionModelsQuickReview}

::: {.webex-check .webex-box}
Consider again the model for tree diameters in Example\ \@ref(exm:NormalTrees) [@data:Aedo1997:softwood]: a normal distribution with $\mu = 8.8$ inches, and $\sigma = 2.7$ inches.

Are the following statements *true* or *false*?

1. A tree diameter of $10.2$\ inches corresponds to a $z$-score of $(10.2 - 8.8)/2.7 = 0.519$. \tightlist  
`r if( knitr::is_html_output() ) {torf( answer=TRUE )}`
2. The probability that a tree has a diameter *less* than\ $10.2$\ inches is about\ $0.70$.
`r if( knitr::is_html_output() ) {torf( answer=TRUE )}`
3. The probability that a tree has a diameter *greater* than\ $10.2$\ inches is about\ $0.70$.
`r if( knitr::is_html_output() ) {torf( answer=FALSE )}`
4. A tree diameter of $6$\ inches corresponds to a $z$-score of $1.04$.
`r if( knitr::is_html_output() ) {torf( answer=FALSE )}`
5. The probability that a tree has a diameter *less* than $6$\ inches is $0.15$.
`r if( knitr::is_html_output() ) {torf( answer=TRUE )}`
6. The probability that a tree has a diameter *greater* than $6$\ inches is $0.85$.
`r if( knitr::is_html_output() ) {torf( answer=TRUE )}`
:::


## Exercises {#SamplingDistributionsExercises}

[Answers to odd-numbered exercises] are given at the end of the book. 

`r if( knitr::is_latex_output() ) "\\captionsetup{font=small}"`

::: {.exercise #Statements}
Are the following statements *true* or *false*?

1. The unstandardising formula can be used to compute probabilities. \tightlist
   `r if( knitr::is_html_output() ) {
	 mcq( c("True", answer = "False"))}`
2. About\ $68$% of observations are within two standard deviations of the mean.  
   `r if( knitr::is_html_output() ) {
	 mcq( c("True", answer = "False"))}`
3. Positive $z$-scores correspond to values larger than the mean.  
   `r if( knitr::is_html_output() ) {
	 mcq( c(answer = "True", "False"))}`
4. A $z$-score tells us how many standard deviations a value is away from the mean.  
   `r if( knitr::is_html_output() ) {
	 mcq( c(answer = "True", "False"))}`
:::


::: {.exercise #StatementsB}
Are the following statements *true* or *false*?

1. A $z$-score larger than\ $4$ is impossible.  
   `r if( knitr::is_html_output() ) {
	 mcq( c("True", answer = "False"))}`
2. A $z$-score of zero is located at the mean value.  
   `r if( knitr::is_html_output() ) {
	 mcq( c(answer = "True", "False"))}`
3. About\ $5$% of observations are more than two standard deviations below the mean.  
   `r if( knitr::is_html_output() ) {
	 mcq( c("True", answer = "False"))}`
4. A $z$-score of zero means a calculation error has been made.  
   `r if( knitr::is_html_output() ) {
	 mcq( c("True", answer = "False"))}`
:::


::: {.exercise #SamplingDistributionsIQForwards}
IQ scores are
`r if (knitr::is_latex_output()) {
   'designed to have'
} else {
   '[designed to have](https://en.wikipedia.org/wiki/IQ_classification)'
}`
a mean of\ $100$ and a standard deviation of\ $15$.
Match the diagram in Fig.\ \@ref(fig:IQMatchDiagramsForwards) with the meaning.

:::::: {.cols data-latex=""}

:::: {.col data-latex="{0.4\textwidth}"}
1. IQs greater than\ $110$.
2. IQs between\ $90$ and\ $115$.

::::

:::: {.col data-latex="{0.05\textwidth}"}
\ 
<!-- an empty Div (with a white space), serving as
a column separator -->
::::

:::: {.col data-latex="{0.5\textwidth}"}

3. IQs less than\ $110$.
4. IQs greater than\ $85$.
::::
::::::
:::


```{r IQMatchDiagramsForwards, fig.cap="Match the diagram with the description.", fig.align="center", out.width="100%", fig.height=1.50, fig.width=6.5}
par( mfrow = c(1, 4),
     mar = c(4, 1, 2, 1) + 0.1)

mu <- 100 
sigma <- 15

# A
out <- plotNormal(mu,
                  sigma,
                  las = 2,
                  main = "Diagram A",
                  xlab = "IQ scores",
                  round.dec = 0)
shadeNormal(out$x,
            out$y,
            col = plot.colour,
            lo = 90,
            hi = 115)

# B

out <- plotNormal(mu,
                  sigma,
                  las = 2,
                  main = "Diagram B",
                  xlab = "IQ scores",
                  round.dec = 0)
shadeNormal(out$x,
            out$y,
            col = plot.colour,
            lo = 0,
            hi = 110)

# C

out <- plotNormal(mu,
                  sigma,
                  las = 2,
                  main = "Diagram C",
                  xlab = "IQ scores", 
                  round.dec = 0)
shadeNormal(out$x,
            out$y,
            col = plot.colour,
            lo = 110,
            hi = 200)

# D

out <- plotNormal(mu,
                  sigma,
                  las = 2,
                  main = "Diagram D",
                  xlab = "IQ scores",
                  round.dec = 0)
shadeNormal(out$x,
            out$y,
            col = plot.colour,
            lo = 85,
            hi = 200)
```


::: {.exercise #SamplingDistributionsIQBackwards}
IQ scores are
`r if (knitr::is_latex_output()) {
   'designed to have'
} else {
   '[designed to have](https://en.wikipedia.org/wiki/IQ_classification)'
}`
a mean of\ $100$ and a standard deviation of\ $15$.
Match the diagram in Fig.\ \@ref(fig:IQMatchDiagramsBackwards) with the meaning.

:::::: {.cols data-latex=""}

:::: {.col data-latex="{0.4\textwidth}"}
1. The *largest*\ $25$% of IQ scores.
2. The *smallest*\ $10$% of IQ scores.

::::

:::: {.col data-latex="{0.05\textwidth}"}
\ 
<!-- an empty Div (with a white space), serving as
a column separator -->
::::

:::: {.col data-latex="{0.5\textwidth}"}

3. The *largest*\ $70$% of IQ scores.
4. The *smallest*\ $60$% of IQ scores.
::::
::::::

:::


```{r IQMatchDiagramsBackwards, fig.cap="Match the diagram with the description.", fig.align="center", out.width="100%", fig.height=1.5, fig.width=6.5}
par( mfrow = c(1, 4),
     mar = c(4, 1, 2, 1) + 0.1)

mu <- 100
sigma <- 15

# A: shade above the 75th percentile
out <- plotNormal(mu,
                  sigma,
                  las = 2,
                  main = "Diagram A",
                  xlab = "IQ scores",
                  round.dec = 0)
shadeNormal(out$x,
            out$y,
            col = plot.colour,
            lo = qnorm(0.75, mean = mu, sd = sigma),
            hi = 200)

# B: shade above the 30th percentile
out <- plotNormal(mu,
                  sigma,
                  las = 2,
                  main = "Diagram B",
                  xlab = "IQ scores",
                  round.dec = 0)
shadeNormal(out$x,
            out$y,
            col = plot.colour,
            lo = qnorm(0.30, mean = mu, sd = sigma),
            hi = 200)

# C: shade below the 10th percentile
out <- plotNormal(mu,
                  sigma,
                  las = 2,
                  main = "Diagram C",
                  xlab = "IQ scores",
                  round.dec = 0)
shadeNormal(out$x,
            out$y,
            col = plot.colour,
            lo = 0,
            hi = qnorm(0.10, mean = mu, sd = sigma))

# D: shade below the 60th percentile
out <- plotNormal(mu,
                  sigma,
                  las = 2,
                  main = "Diagram D",
                  xlab = "IQ scores",
                  round.dec = 0)
shadeNormal(out$x,
            out$y,
            col = plot.colour,
            lo = 0,
            hi = qnorm(0.60, mean = mu, sd = sigma))
```


::: {.exercise #SamplingDistributionsEmpiricalA}
The $68$--$95$--$99.7$ rule states that *approximately*\ $68$% of observations are within one standard deviation of the mean.
Use the tables in 
`r if ( knitr::is_html_output()) { 
   'Appendix\\ \\@ref(ZTablesOnline)'
} else {
   'Appendices\\ \\@ref(ZTablesNEG) and \\@ref(ZTablesPOS)'
   }`
to compute a more precise value for the percentage of observations within one standard deviation of the mean.
Comment.
:::


::: {.exercise #SamplingDistributionsEmpiricalB}
The $68$--$95$--$99.7$ rule states that *approximately*\ $95$% of observations are within two standard deviations of the mean.
Use the tables in
`r if ( knitr::is_html_output()) { 
   'Appendix\\ \\@ref(ZTablesOnline)'
} else {
   'Appendices\\ \\@ref(ZTablesNEG) and \\@ref(ZTablesPOS)'
   }`
to compute a more precise value for the percentage of observations within two standard deviations of the mean.
Comment.
:::
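
Software gives the same areas as the tables.
As a minimal sketch (assuming base R's `pnorm()`, which returns the area to the left of a $z$-score), the third part of the $68$--$95$--$99.7$ rule, that *approximately*\ $99.7$% of observations are within three standard deviations of the mean, can be checked as follows.
The helper name `area_within()` is hypothetical, and is used only for illustration.

```r
# Area within k standard deviations of the mean, for any normal distribution.
# area_within() is a hypothetical helper, not part of R or of this book's code.
area_within <- function(k) {
  pnorm(k) - pnorm(-k)   # area to the left of z = k, minus area to the left of z = -k
}

# Percentage of observations within three standard deviations of the mean
round(100 * area_within(3), 2)   # 99.73
```

The same subtraction of two left-hand areas is what the tables require: look up the area to the left of each $z$-score, then find the difference.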


::: {.exercise #SamplingDistributionsTrees}
Consider again the study by @data:Aedo1997:softwood (Example\ \@ref(exm:NormalTrees)), who studied the diameter of trees in certain forests.
The tree diameters can be modelled as having a normal distribution, with a mean of $\mu = 8.8$ inches, and a standard deviation of $\sigma = 2.7$ inches.
Using this model, answer these questions.

1. What is the probability that a tree will have a diameter *less than*\ $8$\ inches?
1. What is the probability that a tree will have a diameter *greater than*\ $9$\ inches?
1. What is the probability that a tree will have a diameter *between*\ $7$ and\ $10$\ inches?
1. The largest\ $15$% of trees have what diameters?
1. The smallest\ $25$% of trees have what diameters?
:::


::: {.exercise #CornSeeds}
@pasha2016effect simulated methods for coating corn seeds (with fertilizer and crop protection chemicals, etc.).
The seed diameter was modelled with a normal distribution, with a mean of\ $7.5\mms$ and a standard deviation of\ $0.225\mms$.
Using this model, answer these questions.

1. What is the probability that a seed has a diameter of more than $8\mms$?\tightlist  
  `r if( knitr::is_html_output() ) {
	longmcq( c(
	   "About 2.22",
	   answer = "About 1.3%",
	   "About 98.7%"))}`
2. What is the probability that a seed has a diameter less than $7.1\mms$?  
  `r if( knitr::is_html_output() ) {
	longmcq( c(
	   "About 96.3%",
	   answer = "About 3.8%",
	   "About -1.78"))}`
3. What is the probability that a seed has a diameter between $7.5$ and $8\mms$?  
  `r if( knitr::is_html_output() ) {
	longmcq( c(
	   "About 0.89",
	   answer = "About 48.7%",
	   "About 2.22",
	   "About 50%",
	   "About 98.7%"))}`
4. What is the diameter of the smallest $30$% of seeds?  
  `r if( knitr::is_html_output() ) {
	longmcq( c(
	   "Smaller than about 7.62mm",
	   "Larger than about 7.38mm",
	   "About -0.524",
	   answer = "Smaller than about 7.38mm"))}`
5. What is the diameter of the largest $90$% of the seeds?  
  `r if( knitr::is_html_output() ) {
	longmcq( c(
	   "Less than about 7.79mm",
	   "Larger than about 7.79mm",
	   "Less than about 7.21mm",
	   answer = "Larger than about 7.21mm",
	   "About -1.28"))}`
:::


::: {.exercise #SamplingDistributionsGestationLength}
@snowden2018causal studied factors influencing preterm births.
The gestation length of healthy babies was modelled with a normal distribution, having a mean of\ $40$\ weeks, and a standard deviation of\ $1.64$\ weeks.
Using this model, answer these questions.

1. What proportion of births are *longer* than $39$\ weeks (that is, nine months)?
1. In Australia, 
`r if (knitr::is_latex_output()) {
   'a premature birth is defined as a birth occurring before $37$ weeks.'
} else {
   '[a premature birth is defined as a birth occurring before $37$ weeks](https://www.pregnancybirthbaby.org.au/premature-baby).'
}`
   What proportion of births are expected to be premature?
1. According to
`r if (knitr::is_latex_output()) {
   '*Health Direct*,'
} else {
   '[*Health Direct*](https://www.pregnancybirthbaby.org.au/premature-baby),'
}`
   'Babies born between\ $32$ and\ $37$ weeks may need care in a special care nursery'.
   What proportion of healthy births would be expected to be born between\ $32$ and\ $37$ weeks gestation? 
1. How long is the gestation length for the *longest*\ $5$% of pregnancies?
1. How long is the gestation length for the *shortest*\ $10$% of pregnancies?
:::


::: {.exercise #SamplingDistributionsBridgesTrucks}
A study proposing a new method for evaluating bridge loads [@obrien2018probabilistic] used a simulation to compare the new method with an existing method.
For the simulation, the researchers modelled the gross vehicle mass (GVM) of trucks as having a normal distribution, with a mean of\ $13$\ tonnes and a standard deviation of\ $1.3$\ tonnes.

The Isuzu F-Series trucks are rated as having a GVM between\ $10.7$ and\ $26.0$ tonnes (depending on the configuration).

1. What is the $z$-score for the lower limit of\ $10.7$ tonnes? 
1. What is the $z$-score for the upper limit of\ $26.0$ tonnes? 
1. What does a negative $z$-score mean?
:::


::: {.exercise #SamplingDistributionsIQs}
IQ scores are
`r if (knitr::is_latex_output()) {
   'designed to have'
} else {
   '[designed to have](https://en.wikipedia.org/wiki/IQ_classification)'
}`
a mean of\ $100$ and a standard deviation of\ $15$.
`r if (knitr::is_latex_output()) {
   'Mensa'
} else {
   '[Mensa](https://www.mensa.org/)'
}`
is a society for people with a high IQ; specifically, for people who have 'attained a score within the upper two percent of the general population' (Mensa webpage: https://www.mensa.org/).
What IQ score is needed to join Mensa?
:::


::: {.exercise #SamplingDistributionsIQsMilitary}
IQ scores are
`r if (knitr::is_latex_output()) {
   'designed to have'
} else {
   '[designed to have](https://en.wikipedia.org/wiki/IQ_classification)'
}`
a mean of\ $100$ and a standard deviation of\ $15$.
@data:Zagorsky2016:Blondes reports that the US Military must 'reject all military recruits whose IQ is in the bottom\ $10$% of the population' (@data:Zagorsky2016:Blondes, p.\ 403).
What IQ scores lead to rejection from the US military?
:::


::: {.exercise #SamplingDistributionsChargingEVs}
A study of the impact of charging electric vehicles (EVs) on electricity demands [@affonso2018probabilistic] modelled the *time* at which people began charging their EVs at home.
Based on a survey [@us20112009], they modelled the time at which EVs began charging with a normal distribution, having a mean of\ $5$:$30$pm and a standard deviation of\ $2.28\hs$.
For this model:

1. What is the probability that an EV will begin charging after\ $9$pm?
1. What is the probability that an EV will begin charging before\ $5$pm?
1. What is the probability that an EV will begin charging between\ $5$pm and\ $6$pm?
1. $30$% of the EVs begin charging after what time?
1. The earliest\ $15$% of charging begins when?
  
*Hint:* This question is easier if you convert times into 'minutes after\ $5$:$30$pm'.
:::
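
The conversion suggested in the hint can be sketched in R.
This is only an illustration under stated assumptions: clock times are written as decimal hours on a $24$-hour clock (so\ $5$:$30$pm is\ $17.5$), the helper name `minutes_after_mean()` is hypothetical, and the example uses\ $7$pm, a time that is not asked about in the exercise.

```r
# Model from the exercise: mean of 5:30pm, standard deviation of 2.28 hours
mean_clock <- 17.5         # 5:30pm as decimal hours after midnight
sd_minutes <- 2.28 * 60    # standard deviation converted to minutes (136.8)

# Hypothetical helper: convert a clock time (decimal hours) to minutes after 5:30pm
minutes_after_mean <- function(clock_hours) {
  (clock_hours - mean_clock) * 60
}

# Illustration with 7pm, which is 19 in decimal hours
x <- minutes_after_mean(19)   # 90 minutes after 5:30pm
z <- (x - 0) / sd_minutes     # the mean is 0 on the 'minutes after 5:30pm' scale
round(z, 2)                   # 0.66
```

On this scale the mean is zero, so each $z$-score is simply the number of minutes after\ $5$:$30$pm divided by the standard deviation in minutes; times before\ $5$:$30$pm give negative values.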

`r if( knitr::is_latex_output() ) "\\captionsetup{font=normalsize}"`


<!-- QUICK REVIEW ANSWERS -->
`r if (knitr::is_html_output()) '<!--'`
::: {.EOCanswerBox .EOCanswer data-latex="{iconmonstr-check-mark-14-240.png}"}
**Answers to *Quick Revision* questions:**
**1.** True.
**2.** True.
**3.** False: $1 - 0.70 = 0.30$.
**4.** False: $z = -1.04$.
**5.** True.
**6.** True.
:::
`r if (knitr::is_html_output()) '-->'`