Hypothesis testing
- A hypothesis is an assumption or claim about an entire population, made using sample statistics after an analysis has been performed on a sample.
- Hypothesis testing helps you statistically verify whether a claim is likely to be true for the whole population.
- Thus, hypothesis testing is a method or procedure that tests the statistical validity of a claim.
- Components involved in Hypothesis testing:
- Null hypothesis ($H_0$): It states that there is no change or no difference in the situation and assumes that the status quo is true. It always contains an equality sign ($=$, $\leq$ or $\geq$) and represents the common belief about the population.
- Alternative hypothesis ($H_1$): It is the claim that opposes the null hypothesis. It challenges the status quo and may or may not be supported by the data. It never contains an equality sign; it uses $\neq$, $<$ or $>$.
- Example: Jeep, a well-known car maker, claims that its car ‘Compass’ gives a mileage of at least 17 km/litre.
- The null hypothesis for this case would be: $H_0: \mu \geq 17$
- The alternative hypothesis is: $H_1: \mu < 17$
Steps of Hypothesis testing
- Step 1: Defining the hypothesis
- State the null ($H_0$) and alternative ($H_1$) hypotheses for the problem.
- Step 2: Identify the associated distribution
- Condition 1: $n > 30$, which means that the sample size should be greater than 30 observations.
- Condition 2: $\sigma$ is known, i.e. the population standard deviation is known.
- If both of the above conditions are satisfied, you use the normal distribution (Z-test); otherwise, you use the T-test.
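The selection rule in Step 2 can be sketched as a small helper. This is only an illustration of the two conditions as stated in these notes; the function name `choose_test` and its parameters are hypothetical.

```python
def choose_test(n: int, sigma_known: bool) -> str:
    """Pick the test from the two conditions in Step 2 (illustrative helper)."""
    large_sample = n > 30  # Condition 1: more than 30 observations
    # Condition 2: the population standard deviation is known
    if large_sample and sigma_known:
        return "Z-test"
    return "T-test"


print(choose_test(165, True))   # both conditions hold
print(choose_test(20, True))    # sample too small
print(choose_test(165, False))  # sigma unknown
```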
- Step 3: Determine the test statistic.
- The test statistic is a value calculated from the sample data, which is then compared with the tabulated critical values.
- The test statistic for a normal distribution or Z-test is defined as $Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$.
- Here,
- $\bar{x}$ is the sample mean,
- $\mu$ is the population mean,
- $\sigma$ is the population standard deviation,
- $n$ is the sample size.
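As a minimal sketch, the usual Z-statistic, (sample mean − population mean) / (σ / √n), can be computed as below. The function name `z_statistic` and the input numbers are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def z_statistic(sample_mean: float, population_mean: float,
                sigma: float, n: int) -> float:
    """Z = (x_bar - mu) / (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)  # standard deviation of the sample mean
    return (sample_mean - population_mean) / standard_error


# Hypothetical inputs: not taken from any example in these notes.
z = z_statistic(sample_mean=52.0, population_mean=50.0, sigma=10.0, n=100)
print(z)
```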
- Example: Google claims that its internet browser ‘Chrome’ is the best in the industry as it has an optimum boot time of only 250 ms, with a standard deviation of 9 ms. Sam, a tech geek, wanted to test Google’s claim. So, he randomly collected Chrome boot time data from 165 devices and got a sample mean of 247 ms.
- Step 1: Define the hypothesis: $H_0: \mu = 250$ and $H_1: \mu \neq 250$
- Step 2: Conditions
- Condition 1: $n > 30$. Here $n = 165$, which is greater than 30.
- Condition 2: $\sigma$ is known. Here $\sigma = 9$ ms.
- As both conditions are satisfied, the sampling distribution can be treated as normal and a Z-test can be performed.
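- Step 3: Test statistic: $Z = \frac{247 - 250}{9 / \sqrt{165}} \approx -4.28$
- Suppose we test our hypothesis at the 95% confidence level. For a 95% confidence level, the critical values are $Z_c = \pm 1.96$. The test statistic we calculated is about -4.3.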
- The region between -1.96 and +1.96 is called the acceptance region, and the region outside it is called the critical region.
- If the calculated Z-statistic is in the acceptance region, you fail to reject the null hypothesis. If it lies outside the acceptance region, i.e. in the critical region, you reject the null hypothesis. Here, a test statistic of about -4.3 lies in the critical region, so the null hypothesis is rejected.
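The Chrome example can be reproduced with a short sketch using only the Python standard library. It assumes a two-tailed test (consistent with the acceptance region between -1.96 and +1.96 described above); the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Values from the Chrome example.
mu_0 = 250.0   # claimed mean boot time (ms)
sigma = 9.0    # population standard deviation (ms)
n = 165        # sample size
x_bar = 247.0  # sample mean (ms)

# Test statistic: Z = (x_bar - mu_0) / (sigma / sqrt(n))
z = (x_bar - mu_0) / (sigma / math.sqrt(n))

# Two-tailed critical value at the 95% confidence level (assumption: two-tailed).
confidence = 0.95
alpha = 1 - confidence
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

decision = "reject H0" if abs(z) > z_crit else "fail to reject H0"
print(f"Z = {z:.2f}, critical values = +/-{z_crit:.2f}, decision: {decision}")
```

`NormalDist().inv_cdf` returns the standard normal quantile, which replaces a lookup in a printed Z-table.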
- The critical region depends upon the nature of the alternative hypothesis. An alternative hypothesis can be of two types:
- Non-directional
When you test a non-directional hypothesis, you need to define the critical region on both sides, as you need to check whether the sample mean lies to the left or right of the assumed population mean.
This kind of test is called a two-tailed test because you have to check both tails of the sampling distribution.
- Directional
A directional hypothesis states that the population mean lies in a particular direction from the assumed mean.
- Lower-tailed test
The critical region lies on the left tail of the sampling distribution. Thus, the hypothesis test is called a one-tailed test, and more specifically, a lower-tailed test.
- Upper-tailed test
The critical region lies on the right tail of the sampling distribution. Thus, the hypothesis test is called a one-tailed test, and more specifically, an upper-tailed test.
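To illustrate how the critical region changes with the type of alternative hypothesis, here is a hedged sketch. The helper `in_critical_region` and its `tail` labels are hypothetical names, and the Z value of -1.8 is a made-up input used only to show that the same statistic can lead to different decisions.

```python
from statistics import NormalDist


def in_critical_region(z: float, alpha: float, tail: str) -> bool:
    """Return True if z falls in the critical region for the given tail type."""
    std_normal = NormalDist()
    if tail == "two":
        # alpha is split equally between both tails
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if tail == "lower":
        # all of alpha sits in the left tail
        return z < std_normal.inv_cdf(alpha)
    if tail == "upper":
        # all of alpha sits in the right tail
        return z > std_normal.inv_cdf(1 - alpha)
    raise ValueError("tail must be 'two', 'lower' or 'upper'")


# Hypothetical test statistic, evaluated under each type of test.
for tail in ("two", "lower", "upper"):
    print(tail, in_critical_region(-1.8, alpha=0.05, tail=tail))
```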
Errors in Hypothesis testing
Hypothesis testing has two types of errors:
- Type 1 error
- Type 2 error
| Decision | $H_0$ True | $H_1$ True |
|---|---|---|
| Reject $H_0$ | Type 1 error | None |
| Fail to reject $H_0$ | None | Type 2 error |
Type 1 error
- This is where a true null hypothesis is rejected.
- $H_0$ is true, but it is rejected.
- It is also called the consumer’s loss because it results in the consumer discarding a perfectly good material or service, costing the organisation time as well as money.
- The probability of a Type 1 error is equal to $\alpha$, the level of significance.
Type 2 error
- This is where a false null hypothesis is accepted, i.e. not rejected.
- $H_0$ is false, but it is not rejected.
- It is also called the producer’s gain, as it results in a faulty batch produced by them being accepted by a customer.
- The probability of a Type 2 error is denoted by $\beta$, where $(1 - \beta)$ is called the power of the statistical test.
- The power of a statistical test decreases as the probability of making a Type 2 error increases: Power $= 1 - \beta$.
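The relationship between the two error probabilities and the power can be sketched numerically for a two-tailed Z-test. This is an illustration under stated assumptions: it reuses the Chrome figures (250 ms, 9 ms, 165 devices) and assumes, hypothetically, that the true mean is 247 ms; the function name `type2_error_two_tailed` is also hypothetical.

```python
import math
from statistics import NormalDist


def type2_error_two_tailed(mu_0: float, mu_true: float, sigma: float,
                           n: int, alpha: float) -> float:
    """Probability of failing to reject H0: mu = mu_0 when the true mean is mu_true."""
    se = sigma / math.sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Acceptance region for the sample mean under H0.
    lower = mu_0 - z_crit * se
    upper = mu_0 + z_crit * se
    # Distribution of the sample mean if the true mean is mu_true.
    sampling = NormalDist(mu=mu_true, sigma=se)
    return sampling.cdf(upper) - sampling.cdf(lower)


alpha = 0.05  # probability of a Type 1 error (level of significance)
# mu_true = 247.0 is an assumed value, not a fact from the example.
beta = type2_error_two_tailed(mu_0=250.0, mu_true=247.0, sigma=9.0, n=165, alpha=alpha)
power = 1 - beta
print(f"alpha = {alpha}, beta = {beta:.4f}, power = {power:.4f}")
```

When `mu_true` equals `mu_0`, the null hypothesis is actually true, and the rejection probability `1 - beta` reduces to `alpha`, which ties the two error types together.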