The test statistic, the standard error, and the p-value (z-statistic).
- Christos Nikolaou
- Apr 21
- 5 min read
Updated: Apr 22
If the town we live in has a population of close to 5 million people, we can safely treat our sample of 5 million people as the whole population of our town. We want to make a graph that shows the probability of meeting people of different heights as we walk down the road. For example, we want to know the probability that someone is between 120 and 130 cm tall, or between 150 and 160 cm.
This graph will look like a bell, known as the normal or Gaussian probability function. It is called a function because it takes a different value for each height. Its value, shown on the y-axis, is the probability of meeting people of a given height; the x-axis shows the heights. We call it Gaussian after the mathematician Carl Friedrich Gauss, who studied it. We also call it normal mainly because many natural phenomena, like people's heights, follow it. And this brings us to the term distribution, which is a simpler way of describing such a function. So, instead of saying the function of the probability relative to the heights, we say the distribution of the heights within a population.
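If you want to see this idea in numbers, here is a minimal sketch in Python (using SciPy, which is my choice, not something from this article) that computes the probability of meeting someone between 150 and 160 cm, assuming heights are normal with the mean of 170 cm and the standard deviation of 7 cm used later in this post:

```python
# A minimal sketch, assuming heights ~ Normal(mean = 170 cm, SD = 7 cm),
# the values this post uses later.
from scipy.stats import norm

mean_height = 170  # cm
sd_height = 7      # cm

# Probability that a randomly met person is between 150 and 160 cm tall:
# the area under the bell curve between those two heights.
p_150_160 = norm.cdf(160, mean_height, sd_height) - norm.cdf(150, mean_height, sd_height)
print(f"P(150 cm <= height <= 160 cm) = {p_150_160:.4f}")  # roughly 0.07
```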

Some defining properties of normal distributions are the following:
1. Their line of symmetry, which is the line that cuts them in half, goes through the mean of the distribution. This makes sense because it says that the probability of meeting a person whose height is close to the population's mean is higher than meeting a person whose height is far away from the population's mean. In our example, the mean height of the population is 170 cm.
2. 95% of all measurements are within plus or minus 1.96 standard deviations of the mean. The other 5% lie further away, and we consider those extreme values.
You may remember from high school that a way of measuring the spread within a distribution is the standard deviation. The standard deviation is a measure of how far away the measurements typically are from the mean. You can look this up later. What is important for now is to remember that 95% of the measurements are within plus or minus 1.96 standard deviations of the mean. All other measurements can be considered extreme or very rare.
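If you are curious, here is a quick check of that 1.96 rule on the standard normal distribution (again using SciPy as a convenience; the exact coverage comes out at about 95%):

```python
# A quick check of the "95% within plus or minus 1.96 SD" rule,
# using the standard normal distribution (mean 0, SD 1).
from scipy.stats import norm

coverage = norm.cdf(1.96) - norm.cdf(-1.96)
print(f"Fraction of measurements within ±1.96 SD: {coverage:.4f}")  # about 0.95
```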
So, we know that the variable height on the x-axis follows a normal distribution. This distribution helps us find the probability that the height of the next person we meet falls within a specific interval. However, in statistics, we usually don't work with measurements of individuals; we work with mean values of a sample of individuals. So, it is more essential for us to know the probability of the mean height of a sample of people. For example, we would like to know the probability that, if we randomly select 30 people from our town, their mean height will be between 160 and 170 cm.

To find this, we would need to take many random samples of 30 individuals and plot the relative frequency of their means to create another curve. This curve will give us the probability that, if we randomly select 30 people from our town, their mean height will be between, say, 160 and 170 cm, or within any other interval we choose. And how would such a curve look? You may have guessed it: it will be a normal distribution. In other words, the means of all possible samples of size 30 from our town will be distributed normally (Figure 3).
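As a small sketch of this idea (assuming, as later in the post, that heights are normal with mean 170 cm and standard deviation 7 cm), the simulation below draws many samples of 30 people and estimates the probability that a sample's mean height falls between 160 and 170 cm:

```python
# A simulation sketch, assuming the town's heights ~ Normal(170 cm, 7 cm).
# We repeatedly draw samples of 30 people and record each sample's mean.
import numpy as np

rng = np.random.default_rng(0)
pop_mean, pop_sd, n = 170, 7, 30

# 100,000 samples of 30 people each; one mean per sample.
sample_means = rng.normal(pop_mean, pop_sd, size=(100_000, n)).mean(axis=1)

# Estimated probability that a random sample of 30 people
# has a mean height between 160 and 170 cm.
p = np.mean((sample_means >= 160) & (sample_means <= 170))
print(f"P(160 <= sample mean <= 170) ≈ {p:.3f}")  # close to 0.5
# A histogram of sample_means would show another bell, much narrower
# than the bell of individual heights.
```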

Now, a very interesting fact here is that the means of many samples of a large enough size will always follow a normal distribution, regardless of how the variable we are measuring is distributed. For example, even if the height of the people did not follow the normal distribution, the mean height of many samples of people would. This is part of what we call the central limit theorem in statistics. Another part says that the mean of the sample means will be the same as the mean of the whole population of individuals. So, in our example, the line of symmetry of the distribution of the sample means will go through 170 cm. If you think about it, that makes sense: taking a sample of 30 people whose mean height is close to the population's mean should be more probable than taking a sample whose mean height is very different from the population's mean.
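Here is a short, hypothetical demonstration of that claim. The individual values below come from a skewed exponential distribution, nothing like a bell, yet the means of many samples of size 30 still centre on the population mean and pile up into a bell shape:

```python
# A sketch of the central limit theorem with a clearly non-normal variable:
# an assumed skewed exponential distribution with mean 10 (not heights).
import numpy as np

rng = np.random.default_rng(1)

population = rng.exponential(scale=10, size=1_000_000)      # skewed, not bell-shaped
sample_means = rng.choice(population, size=(50_000, 30)).mean(axis=1)

print(f"Population mean:      {population.mean():.2f}")     # about 10
print(f"Mean of sample means: {sample_means.mean():.2f}")    # nearly the same
# A histogram of sample_means would look like a normal (bell-shaped) curve,
# even though the individual values are strongly skewed.
```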
So, we now have a graph that can tell us the probability of selecting a sample of size 30 whose mean height falls within a specific interval. We call this graph the distribution of the sample means, and we know that its mean will be the same as the mean of the individual height measurements in our population. Like every distribution, it has a standard deviation, which we call the standard error. To make things easier for us, we will assume that the standard error is the same as the standard deviation of the whole population. In reality, the standard error is the standard deviation of the population divided by the square root of the sample size, which is 30 in our case. But never mind that.
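For completeness, here is that exact formula in a couple of lines, using the 7 cm standard deviation introduced just below and our sample size of 30:

```python
# Standard error = population standard deviation / sqrt(sample size).
import math

pop_sd = 7   # cm, the value used in the example below
n = 30       # sample size

standard_error = pop_sd / math.sqrt(n)
print(f"Standard error = {standard_error:.2f} cm")  # about 1.28 cm
# The article simplifies things by treating the standard error as 7 cm instead.
```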
Let's now say that we randomly select a sample of 30 people from our population whose mean height is 180 cm. Also, let's say that the standard deviation of heights in the population is 7 cm and that the standard deviation of the sample means (or standard error) is also 7 cm.
We have said that the distribution of the sample means will have the same mean as the distribution of the heights of the individuals. So, the mean of the sample means will be 170 cm. Also, we have said that the sample means follow a normal distribution, which means that 95% of the samples will have a mean within plus or minus 1.96 standard errors from the mean.
Our sample's mean is 10 cm away from the population mean, and the standard error is 7 cm. Ten divided by seven is approximately 1.43, so our sample mean is about 1.43 standard errors away from the population mean.
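Written out as a tiny calculation (using the simplified standard error of 7 cm from above):

```python
# How many standard errors the sample mean is from the population mean.
sample_mean = 180   # cm, mean height of our sample of 30 people
pop_mean = 170      # cm, mean height of the whole population
standard_error = 7  # cm, the simplification made above

test_statistic = (sample_mean - pop_mean) / standard_error
print(f"Test statistic = {test_statistic:.2f}")  # about 1.43
```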
Now, this is an aha moment! Have you heard of the term "test statistic"? Well, the number 1.43 is our test statistic.
We just carried out a statistical test. But we have not finished yet.
So, the test statistic is how many standard errors our sample mean is away from the population mean. We want to know this because we can consider our sample an extreme value if its mean is more than 1.96 standard errors away from the population mean. We know that we have a 5% (0.05) probability of selecting a sample with a mean regarded as an extreme value and a 95% (0.95) probability of getting a sample with a mean not considered an extreme value.
Our test statistic is 1.43. What is the probability that we get a test statistic equal to or higher than 1.43 or lower than -1.43? In other words, what is the probability that we get a test statistic as or more extreme than 1.43 or -1.43?
This probability is called the significance probability or the p-value!
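As a final sketch, here is one way to compute that two-sided probability, treating our 1.43 as a z-statistic on the standard normal distribution (the answer comes out to roughly 0.15):

```python
# Two-sided p-value for a test statistic of 1.43: the probability of a
# result at least as extreme, in either direction, under the normal curve.
from scipy.stats import norm

test_statistic = 1.43
p_value = 2 * norm.sf(abs(test_statistic))  # sf(x) = 1 - cdf(x)
print(f"p-value = {p_value:.3f}")  # about 0.15
```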