Review of Basic Statistical Concepts

Review of MGT 2110




Descriptive Statistics
Probability distribution
Estimation (Confidence interval)
Inference (Hypothesis testing)
Descriptive Statistics
• Numerical measures
o Mean, Median, Mode
o Variance and standard deviation
o Percentiles
o Quartiles and Interquartile Range
o Frequency distribution (use the FREQUENCY array function)
• Graphical Presentations
o Histogram
o Scatter Diagram (for two columns of data)
Probability Distribution
Random Variable (RV): A numerical description of the outcome of an experiment.
Discrete RV: A random variable that can take a countable set of values. For instance, if an
experiment consists of inspecting 10 laptops produced by a manufacturer, then a random
variable X can be defined as the number of defective laptops in the lot. The possible values
for X are the integers from 0 to 10.
Continuous RV: A random variable that can take an uncountable range of values. For
instance, if an experiment consists of measuring the amount of toothpaste in a 6 oz. tube, then
a random variable X can be defined as the amount of toothpaste in a tube. The possible values
for X could be any value between 5.8 oz. and 6.2 oz. The values within the range are not
countable.
Probability Distribution: A description of how the probabilities are distributed over the
values the random variable can assume. Probability distribution for a discrete RV is called a
discrete probability distribution. Probability distribution for a continuous RV is called a
continuous probability distribution.
Continuous probability distribution:
Normal Probability Distribution: A continuous probability distribution. The normal
distribution is a symmetrical distribution with a mean, μ, and a standard deviation, σ.
Example
A department store has determined that its customers charge an average of $500 per month,
with a standard deviation of $80. Assume the amounts of charges are normally distributed.
a. What percentage of customers charge less than $340 per month?
b. What percentage of customers charge more than $380 per month?
c. What percentage of customers charge between $644 and $700 per month?
d. What is the least dollar amount among the top 10% of customer charges?
e. What are the minimum and maximum of the middle 95% of customer charges?
Four Excel functions for answering the above questions
To find probabilities using the normal distribution:
=NORM.S.DIST(z,1)
z = (X - μ)/σ must first be calculated before using this function.
Returns the cumulative probability for z
=NORM.DIST(X,μ,σ,1)
Returns the cumulative probability for X
To find the value of X, given a normal probability:
=NORM.S.INV(probability)
Returns the Normal table value of z
Then, X may be computed using X = μ + zσ
=NORM.INV(probability,μ,σ)
Returns the value of X for the given cumulative probability
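As a cross-check outside Excel, the department-store example can be sketched in Python with the standard library's statistics.NormalDist, whose cdf and inv_cdf methods play roughly the roles of NORM.DIST and NORM.INV. The variable names are illustrative and are not part of the original notes.

```python
from statistics import NormalDist

# Monthly charges: mean $500, standard deviation $80 (from the example).
charges = NormalDist(mu=500, sigma=80)

# a. P(X < 340): cumulative probability, like NORM.DIST(340,500,80,1)
p_less_340 = charges.cdf(340)

# b. P(X > 380): complement of the cumulative probability
p_more_380 = 1 - charges.cdf(380)

# c. P(644 < X < 700): difference of two cumulative probabilities
p_between = charges.cdf(700) - charges.cdf(644)

# d. Least amount in the top 10%: the 90th percentile, like NORM.INV(0.90,500,80)
top10_cutoff = charges.inv_cdf(0.90)

# e. Middle 95%: cut 2.5% from each tail
low95 = charges.inv_cdf(0.025)
high95 = charges.inv_cdf(0.975)

print(f"a. {p_less_340:.4f}")
print(f"b. {p_more_380:.4f}")
print(f"c. {p_between:.4f}")
print(f"d. {top10_cutoff:.2f}")
print(f"e. {low95:.2f} to {high95:.2f}")
```

Parts a to c use cumulative probabilities; parts d and e go the other way, from a probability back to a dollar amount.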
Estimation (Confidence Interval)
Confidence Interval for population mean (μ)
Assume a simple random sample of size n
Point Estimation:
                      Sample Statistic    Population Parameter
Size                  n                   N
Mean                  x̄                   μ
Standard deviation    S                   σ
Confidence Interval = x̄ ± SE, where SE = Sampling Error:
$$SE = t_{\alpha/2} \cdot \frac{S}{\sqrt{n}}$$
(Always use t; use Z only if σ is known)
Then, the confidence interval for μ is
$$\bar{x} \pm t_{\alpha/2} \cdot \frac{S}{\sqrt{n}}$$
Two methods for calculating confidence interval
Method A – Using the Excel T.INV.2T function
Step 1
Find the t-table value using the Excel function
=T.INV.2T(α,df)
α = 1 – Confidence level
df = degrees of freedom = n - 1
Step 2
Determine the sampling error (SE)
SE = t_{α/2} · S/√n
Step 3
Calculate the lower and upper limits of the
confidence interval
LL = x̄ – SE
UL = x̄ + SE
Method B – Using Excel Data Analysis command
Step 1
Run the Descriptive Statistics command from the
Data Analysis command with Confidence
Level for mean checked.
The output includes the sampling
error as the last item of the output
table, labeled Confidence Level.
Step 2
Calculate the lower and upper limits of the
confidence interval
LL = x̄ – SE
UL = x̄ + SE
Example 1
A sample of 100 cans of coffee showed an average weight of 13 ounces with a standard
deviation of 0.8 ounces. Develop and interpret a 98% confidence interval for the mean weight
of coffee in the cans.
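Example 1 can be checked with a short Python sketch. The Python standard library has no Student's t functions, so the hypothetical helpers t_pdf, t_cdf and t_inv_2t below approximate them with Simpson's-rule integration and bisection. They stand in for Excel's T.INV.2T and are an illustration under those assumptions, not a reference implementation.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x): integrate the density from 0 to |x| with Simpson's rule."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_inv_2t(alpha, df):
    """Two-tailed critical value t_{alpha/2} by bisection (mimics T.INV.2T)."""
    lo, hi = 0.0, 1000.0  # assumed search range for the critical value
    target = 1 - alpha / 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Example 1: n = 100 cans, mean 13 oz, standard deviation 0.8 oz, 98% confidence
n, x_bar, s, conf_level = 100, 13.0, 0.8, 0.98
alpha = 1 - conf_level
t_crit = t_inv_2t(alpha, n - 1)      # Step 1: t-table value with df = n - 1
se = t_crit * s / math.sqrt(n)       # Step 2: sampling error
lower, upper = x_bar - se, x_bar + se  # Step 3: LL and UL

print(f"t = {t_crit:.4f}, SE = {se:.4f}, CI = ({lower:.3f}, {upper:.3f})")
```

Under these assumptions the interval comes out at roughly 12.81 to 13.19 ounces, which would be read as: we are 98% confident that the mean weight of coffee in all cans lies in that range.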
Example 2
For the Net Income as a % of equity, develop and interpret a 97% confidence interval for the
mean.
Confidence Interval for population proportion (p)
Assume a simple random sample of size n
Point Estimation:
              Sample Statistic    Population Parameter
Size          n                   N
Proportion    p̄                   p
Confidence Interval for p = p̄ ± SE
Estimating Sampling Error (SE):
$$SE = z_{\alpha/2} \cdot \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$$
Then, the confidence interval for p is
$$\bar{p} \pm z_{\alpha/2} \cdot \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$$
Step 1
Find the z-table value using the Excel function
=NORM.S.INV(1-α/2)
Step 2
Determine the standard error estimate
$$\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$$
Step 3
Determine the sampling error (SE)
$$SE = z_{\alpha/2} \cdot \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$$
Step 4
Calculate the lower and upper limits of the
confidence interval
LL = p̄ – SE
UL = p̄ + SE
Example
In a poll 600 voters were asked whether they were in favor of eliminating plastic bags in
grocery stores. 390 of the voters were in favor and 210 of the voters were opposed. Develop
a 92% confidence interval estimate for the proportion of all the voters who are opposed to the
proposal.
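The four steps for a proportion can be sketched in Python for this poll, using statistics.NormalDist in place of the Excel z-table lookup. The variable names are illustrative; note that the question asks about voters who are opposed, so the sample proportion is 210/600.

```python
import math
from statistics import NormalDist

n = 600            # voters polled
opposed = 210      # voters opposed to the proposal
conf_level = 0.92

p_bar = opposed / n                            # sample proportion opposed
alpha = 1 - conf_level

# Step 1: z-table value, like NORM.S.INV(1-alpha/2)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Step 2: standard error estimate
std_err = math.sqrt(p_bar * (1 - p_bar) / n)

# Step 3: sampling error (SE)
se = z_crit * std_err

# Step 4: lower and upper limits
lower, upper = p_bar - se, p_bar + se

print(f"p_bar = {p_bar:.2f}, z = {z_crit:.4f}, SE = {se:.4f}")
print(f"92% CI for p: ({lower:.4f}, {upper:.4f})")
```

The interval is roughly 0.316 to 0.384, so about 32% to 38% of all voters are estimated to oppose the proposal at the 92% confidence level.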
Inference (Hypothesis Testing)
Step 1: Set up the null and the alternative hypotheses.
Three types of hypotheses
Type          For population mean μ    For population proportion p
Two-tailed    H0: μ = a                H0: p = p0
              Ha: μ ≠ a                H1: p ≠ p0
One-tailed    H0: μ ≤ a                H0: p ≤ p0
              Ha: μ > a                H1: p > p0
One-tailed    H0: μ ≥ a                H0: p ≥ p0
              Ha: μ < a                H1: p < p0
Step 2: Decision rule for testing the hypotheses
Possible results of a Hypothesis Test
                  H0 is true          H0 is false
H0 is accepted    Correct decision    Type II error
H0 is rejected    Type I error        Correct decision
Decision rule: Reject H0 if the probability of type I error <= α, where
α = level of significance, i.e. the maximum tolerable value for the
probability of type I error up to which H0 can be rejected.
Note: Probability of type II error = β
Step 3: Compute the p-value and reject H0 if p-value <= α.
Case 1: For hypotheses about μ, use the t-distribution for the p-value
p-value = T.DIST.2T(abs(t),df) for a two-tailed test
p-value = T.DIST.RT(abs(t),df) for a one-tailed test
where
$$t = \frac{\bar{x} - a}{S/\sqrt{n}}$$
and df = degrees of freedom = n-1. (In the legacy function TDIST(abs(t),df,k), k = number of tails, 1 or 2.)
Case 2: For hypotheses about p, use the z-distribution for the p-value
p-value = 1 - NORMSDIST(abs(z)) for one-tailed tests
p-value = 2*(1 - NORMSDIST(abs(z))) for two-tailed tests
where
$$z = \frac{\bar{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$$
Example 1:
A sample of 81 account balances of a credit company showed an average balance of $1,200
with a standard deviation of $126. Determine if the mean of all account balances is
significantly different from $1,150. Use a .05 level of significance.
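Example 1 is a two-tailed test about μ (H0: μ = 1150, Ha: μ ≠ 1150), so Case 1 applies. The sketch below computes the t statistic from the formula and then the p-value. Since the Python standard library has no t-distribution, the hypothetical helper t_dist_rt approximates Excel's T.DIST.RT with Simpson's rule; treat it as an illustration rather than an exact replacement.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_dist_rt(t, df, steps=2000):
    """Right-tail probability P(T > t) for t >= 0 (mimics T.DIST.RT)."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Sample of 81 balances: mean $1,200, standard deviation $126; test value $1,150
n, x_bar, s, a, alpha = 81, 1200.0, 126.0, 1150.0, 0.05

t_stat = (x_bar - a) / (s / math.sqrt(n))   # t from the formula: 50 / 14
df = n - 1
p_value = 2 * t_dist_rt(abs(t_stat), df)    # two-tailed, like T.DIST.2T
reject_h0 = p_value <= alpha

print(f"t = {t_stat:.4f}, df = {df}, p-value = {p_value:.5f}, reject H0: {reject_h0}")
```

The p-value is well below .05, so H0 is rejected: the sample suggests the mean of all account balances differs significantly from $1,150.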
Example 2:
It is assumed that at least half the membership of a national trade union is female. A random
sample of 400 members showed 168 women. Does the sample show that the proportion of
women among the membership is less than 50%? Use a .05 level of significance for this
hypothesis test.
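Example 2 is a one-tailed test about p (H0: p ≥ 0.50, H1: p < 0.50), so Case 2 applies. A minimal Python sketch, with illustrative variable names and statistics.NormalDist standing in for NORMSDIST:

```python
import math
from statistics import NormalDist

n = 400        # members sampled
women = 168    # women in the sample
p0 = 0.50      # hypothesized proportion
alpha = 0.05

p_bar = women / n                                      # sample proportion
z_stat = (p_bar - p0) / math.sqrt(p0 * (1 - p0) / n)   # z from the formula

# One-tailed p-value, like 1 - NORMSDIST(abs(z)).
# Using abs(z) is appropriate here because p_bar falls on the H1 side (below p0).
p_value = 1 - NormalDist().cdf(abs(z_stat))
reject_h0 = p_value <= alpha

print(f"p_bar = {p_bar:.2f}, z = {z_stat:.2f}, p-value = {p_value:.6f}, reject H0: {reject_h0}")
```

Because the p-value is far below .05, H0 is rejected and the sample supports the claim that fewer than 50% of the members are women.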
Example 3:
It is normally assumed that the net income as % equity for the companies in the population is
no more than 13%. However, test whether the sample data shows that the net income as %
equity for the companies in the population is now greater than 13%. Use a .01 level of
significance.
When to use .INV and .DIST functions
Use .INV to find table values for confidence intervals only
Use .DIST to find p-values for hypothesis testing only
Using .INV functions for Confidence Interval
If σ is known: Table value z_{α/2} = NORM.S.INV(cell containing the value of 1-α/2)
If σ is unknown: Table value t_{α/2} = T.INV.2T(α,df)
Using .DIST functions for Hypothesis testing
If σ is known:
Step 1: Find Z using formula (don’t use functions like NORM….)
Step 2: p-value for 2-tailed test: (1-NORM.S.DIST(ABS(Z-calculated),1))*2
p-value for 1-tailed test: (1-NORM.S.DIST(ABS(Z-calculated),1))
If σ is unknown:
Step 1: Find t using formula (don’t use functions like T.INV or T.DIST...)
Step 2: p-value for 2-tailed test: T.DIST.2T(ABS(t-calculated),df)
p-value for 1-tailed test: T.DIST.RT(ABS(t-calculated),df)