Explain and Execute Statistical Design and Analysis of Two Variable Hypothesis
Dr. Nancy Agens, Head,
Technical Operations, Statswork
I. INTRODUCTION
In this blog, I will explain how statistical analysis is applied to two independent samples. In practice, the means of two populations are usually compared with the t-test, because the t-test reduces the data to a single t-value, which is then compared with a critical value to reach the final conclusion. Now, let us understand the theoretical background of the t-test for two variables.
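Suppose X1 and X2 are two independent random variables, and let
\[ X_{11}, X_{12}, \dots, X_{1n_1} \quad \text{and} \quad X_{21}, X_{22}, \dots, X_{2n_2} \]
be samples of size n1 and n2 from populations with means µ1, µ2 and variances σ1², σ2² respectively. If the sample sizes are large enough, the sample means approximately follow a normal distribution, i.e.
\[ \bar{X}_1 \sim N\!\left(\mu_1, \frac{\sigma_1^2}{n_1}\right) \quad \text{and} \quad \bar{X}_2 \sim N\!\left(\mu_2, \frac{\sigma_2^2}{n_2}\right) \]
In addition, since the two sample means are independent and normally distributed, their difference is also normally distributed. It is given by
\[ \bar{X}_1 - \bar{X}_2 \sim N\!\left(\mu_1 - \mu_2, \; \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right) \]
The null hypothesis H0: µ1 = µ2 states that there is no statistically significant difference between the means. Under H0, with the unknown variances replaced by the sample variances \(s_1^2\) and \(s_2^2\), the test statistic becomes
\[ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]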
Copyright © 2020 Statswork. All rights reserved
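If the two populations have the same variance, the test statistic reduces to
\[ t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]
where \(\bar{X}_i\), \(s_i^2\) and \(n_i\) are the mean, variance and size of sample i, and \(s_p^2\) is the pooled sample variance. Under H0 this statistic follows a t-distribution with n1 + n2 − 2 degrees of freedom.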
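The two forms of the test statistic described above (unequal and equal variances) can be sketched in R as follows. This is a minimal illustration, not the author's code: the helper names `t_unequal` and `t_pooled` and the small vectors `x1` and `x2` are hypothetical, made up only for the example.

```r
# Hypothetical helper: t statistic without assuming equal variances
t_unequal <- function(x1, x2) {
  se <- sqrt(var(x1) / length(x1) + var(x2) / length(x2))
  (mean(x1) - mean(x2)) / se
}

# Hypothetical helper: t statistic with a pooled variance (equal variances)
t_pooled <- function(x1, x2) {
  n1 <- length(x1)
  n2 <- length(x2)
  sp2 <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)
  (mean(x1) - mean(x2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
}

# Made-up illustrative data
x1 <- c(12, 15, 11, 18, 14)
x2 <- c(10, 9, 13, 8, 11)

t_unequal(x1, x2)
t_pooled(x1, x2)
```

When the two samples have the same size, as here, both helpers return the same t-value; the two tests then differ only in their degrees of freedom.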
Once the t-value is calculated, the next step is to compare it with the critical value at the α level of significance. If the absolute value of the calculated t-value is greater than the critical value, we reject the null hypothesis and conclude that there is a significant difference between the population means (Cressie & Whitford, 1986).
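As a rough sketch of this decision rule, the comparison for a two-sided test can be written in R. The function name `reject_null` is hypothetical; `qt` is the standard R quantile function for the t-distribution.

```r
# Hypothetical helper: two-sided decision rule for a t statistic
reject_null <- function(t_value, df, alpha = 0.05) {
  critical <- qt(1 - alpha / 2, df)  # upper critical value of the t-distribution
  abs(t_value) > critical            # TRUE means "reject H0"
}

reject_null(1.9578, df = 58)  # t-value from the campaign example in this blog
reject_null(2.5, df = 58)     # a made-up, larger t-value for contrast
```

With 58 degrees of freedom the critical value is roughly 2.00, so the first call returns FALSE and the second returns TRUE.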
Imagine a marketing company has recently launched two campaigns to advertise its product. The head of the company wants to know whether the two campaigns are equally effective. In such a case, statistical hypothesis testing is the essential method for drawing a valid inference. Before performing any statistical hypothesis test, the main tasks are to understand the problem statement, to frame the hypotheses of interest, to find a suitable test statistic, and finally to make a proper decision based on the results.
This blog will elaborate on each of these steps using the advertising example mentioned above.
II. UNDERSTANDING THE PROBLEM STATEMENT
The primary task in any statistical data analysis is to find out what the problem is and how the data are measured. In our example, the manager wishes to assess the effectiveness of the two campaigns. For this, he/she has to consider all the information related to each campaign and find out whether it results in a profit or a loss. A standard way to test whether the two campaigns are equally effective is to perform a statistical test comparing their means.
III. CONSTRUCTION OF TEST HYPOTHESES
Once you understand the problem at hand, the next step is to frame appropriate hypotheses to test for statistical significance; we call them the null hypothesis and the alternative hypothesis (Flandin & Friston, 2019). The null hypothesis is the claim or belief we hold about the problem and is denoted by H0. For our example, the null hypothesis is that there is no statistically significant difference between the mean incomes from the two campaigns.
H0: μ1 = μ2
or
H0: μ1 − μ2 = 0
The alternative hypothesis (H1) is simply the contrary of the null hypothesis. That is, there is a significant difference between the means of the two campaigns.
H1: μ1 ≠ μ2
or
H1: μ1 − μ2 ≠ 0
IV. FINDING A SUITABLE TEST STATISTIC
To find a suitable test statistic, we need to know the distribution of the data. I will illustrate with simulated data for two campaigns using R software.
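set.seed(123)                     # fix the seed so the simulation is reproducible
camp1 <- rt(30, 29) * 50 + 210    # campaign 1: 30 scaled and shifted t(29) draws
camp2 <- rt(30, 29) * 48 + 170    # campaign 2: 30 scaled and shifted t(29) draws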
Fig. 1 Histogram-Normal Distribution
As the above graph shows, the data are approximately normally distributed. However, since the sample size is only 30 per campaign and the population variances are unknown, we should use the t-distribution for testing this problem. From the simulated data, the mean for the first campaign is $210.2226 with standard deviation $60.0008, and the mean for the second campaign is $182.8537 with standard deviation $47.56557.
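The summary figures quoted above can be obtained along the following lines. This is a sketch that repeats the simulation so it runs on its own; the data frame name `summary_stats` is an assumption, and the Shapiro-Wilk check (`shapiro.test`) is an optional addition that the original analysis does not use.

```r
set.seed(123)
camp1 <- rt(30, 29) * 50 + 210
camp2 <- rt(30, 29) * 48 + 170

# Sample size, mean and standard deviation per campaign
summary_stats <- data.frame(
  campaign = c("camp1", "camp2"),
  n        = c(length(camp1), length(camp2)),
  mean     = c(mean(camp1), mean(camp2)),
  sd       = c(sd(camp1), sd(camp2))
)
print(summary_stats)

# Optional normality check: a large p-value gives no evidence against normality
shapiro.test(camp1)$p.value
shapiro.test(camp2)$p.value
```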
V. CALCULATION OF TEST STATISTIC
Once you have all the necessary values for the calculation, the next step is to plug them into the formula for the test statistic mentioned earlier. Here, I will illustrate using R.
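# Difference between the two sample means
Difference <- mean(camp1) - mean(camp2)
# Pooled standard deviation (plain average of the variances, valid because both samples have size 30)
Std.dev <- sqrt((sd(camp1)^2 + sd(camp2)^2) / 2)
# Standard error of the difference in means
Std.err <- Std.dev * (1 / length(camp1) + 1 / length(camp2))^0.5
# Test statistic
t.value <- Difference / Std.err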
The difference of the means is $27.3689 and the t-value is 1.9578.
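As a check on the arithmetic, the same calculation can be traced by hand from the summary values reported in this blog (means $210.2226 and $182.8537, standard deviations $60.0008 and $47.56557, 30 observations per campaign). The intermediate values below are rounded, so the last digit may differ slightly from the R output.

```latex
\[
s_p = \sqrt{\frac{60.0008^2 + 47.56557^2}{2}}
    \approx \sqrt{\frac{3600.10 + 2262.48}{2}}
    \approx 54.14
\]
\[
\mathrm{SE} = s_p \sqrt{\frac{1}{30} + \frac{1}{30}}
            \approx 54.14 \times 0.2582
            \approx 13.98
\]
\[
t = \frac{210.2226 - 182.8537}{\mathrm{SE}}
  \approx \frac{27.3689}{13.98}
  \approx 1.958
\]
```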
VI. CONCLUSION OF THE PROBLEM
As a final step, we compare the calculated t-value with the critical value. To find the critical value, we need to fix the significance level α. Usually it is set to 5% (0.05), which means we tolerate a 5% probability of rejecting the null hypothesis when it is actually true. The next step is to check whether the alternative hypothesis is one-sided or two-sided. If you are concerned with which campaign has the higher or lower mean, the test is one-sided. In our case, however, the null hypothesis states that the means of the campaigns are equal and the alternative is that they differ, so the test is two-sided. An important note is that in a two-sided test the critical region is split in half: the 5% is divided equally between the two tails of the distribution, 2.5% in each.
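To make the one-sided versus two-sided distinction concrete, here is a small sketch of how the critical values could be computed in R for this example (30 observations per campaign, so 58 degrees of freedom). The variable names are illustrative.

```r
alpha <- 0.05
df <- 30 + 30 - 2  # n1 + n2 - 2 degrees of freedom

# One-sided test: the whole 5% sits in one tail
one_sided <- qt(1 - alpha, df)

# Two-sided test: 2.5% in each tail, so the cut-off moves further out
two_sided <- qt(1 - alpha / 2, df)

c(one_sided = one_sided, two_sided = two_sided)
```

The two-sided critical value (about 2.00) is larger than the one-sided one (about 1.67), which is why the same t-value can be significant in a one-sided test but not in a two-sided test.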
The rejection region is determined by the critical values of the t-distribution. If the calculated t-value lies outside the interval between the lower and upper critical values, we reject the null hypothesis; otherwise we fail to reject it. Equivalently, we reject the null hypothesis when the 95% confidence interval for the difference in means does not contain 0. In R, the function t.test performs the calculation, and the resulting p-value is compared with 0.05 to reach the conclusion.
Res <- t.test(camp1, camp2, paired = FALSE, var.equal = TRUE)
Res
Two Sample t-test
data: camp1 and camp2
t = 1.9578, df = 58, p-value = 0.05507
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.613592 55.351410
sample estimates:
mean of x mean of y
210.2226 182.8537
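The pieces of this output can also be pulled out of the result object and cross-checked against a manual calculation. The sketch below repeats the simulation so it runs on its own; `res`, `t_manual` and `p_manual` are illustrative names, while `statistic`, `p.value` and `conf.int` are standard components of the object returned by `t.test`.

```r
set.seed(123)
camp1 <- rt(30, 29) * 50 + 210
camp2 <- rt(30, 29) * 48 + 170

res <- t.test(camp1, camp2, paired = FALSE, var.equal = TRUE)

# Manual pooled-variance t statistic and two-sided p-value
n1 <- length(camp1)
n2 <- length(camp2)
sp <- sqrt(((n1 - 1) * var(camp1) + (n2 - 1) * var(camp2)) / (n1 + n2 - 2))
t_manual <- (mean(camp1) - mean(camp2)) / (sp * sqrt(1 / n1 + 1 / n2))
p_manual <- 2 * pt(-abs(t_manual), df = n1 + n2 - 2)

# Components of the t.test result
res$statistic  # t-value
res$p.value    # two-sided p-value
res$conf.int   # 95% confidence interval for the difference in means

all.equal(unname(res$statistic), t_manual)
all.equal(res$p.value, p_manual)
```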
From the results, the t-value (or test statistic) is 1.9578, as we got previously, and the p-value is 0.05507. Since the p-value is greater than 0.05, we fail to reject the null hypothesis and conclude that there is no statistically significant difference between the mean amounts from the two campaigns.
To sum up, this blog has explained the procedure used to test two variables and how to draw a valid inference from the results. I hope it helps you understand and analyze similar data.
REFERENCES
[1] Cressie, N. A. C., & Whitford, H. J. (1986). How to use the two sample t‐test. Biometrical Journal, 28(2), 131–148. Retrieved from https://onlinelibrary.wiley.com/doi/abs/10.1002/bimj.4710280202
[2] Flandin, G., & Friston, K. J. (2019). Analysis of family‐wise error rates in statistical parametric mapping using random field theory. Human Brain Mapping, 40(7), 2052–2054. Retrieved from https://onlinelibrary.wiley.com/doi/abs/10.1002/hbm.23839