Inferential Statistics

+
Inferential Statistics
Jin Guo
+
Inferential Statistics
• Definition: the branch of statistics concerned with drawing conclusions about a population from a sample.
• Sample: representative, typically random
• Main functions:
  Estimating Population Parameters
  Testing statistically based hypotheses
+
Estimating Population Parameters
• Estimating parameters related to central tendency (mean), variability (standard deviation), and proportion (P).
• Example: Estimating a Population Mean
  The means of an infinite number of random samples form an approximately normal distribution (the sampling distribution of the mean).
  Mean: equals the parameter we are trying to estimate (the mean of the population) when the estimator is unbiased.
  Standard deviation: the standard error of the mean.
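The idea on this slide can be checked with a small simulation. The sketch below is illustrative only: the population mean (50), standard deviation (10), sample size (25), and number of samples (2,000) are assumed values that do not come from the slides, the population is assumed to be normal, and a finite number of samples stands in for the "infinite number".

```python
import random
import statistics


def simulate_sample_means(pop_mean, pop_sd, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.gauss(pop_mean, pop_sd) for _ in range(n))
        for _ in range(num_samples)
    ]


# Assumed (hypothetical) population and sampling settings.
POP_MEAN, POP_SD, N, NUM_SAMPLES = 50.0, 10.0, 25, 2000

sample_means = simulate_sample_means(POP_MEAN, POP_SD, N, NUM_SAMPLES)
center = statistics.fmean(sample_means)   # should be close to POP_MEAN
spread = statistics.stdev(sample_means)   # should be close to the standard error
standard_error = POP_SD / N ** 0.5        # sigma / sqrt(n) = 2.0

print(f"mean of sample means: {center:.2f}")
print(f"sd of sample means:   {spread:.2f} (theory: {standard_error:.2f})")
```

The center of the simulated means should sit near the population mean, and their spread near sigma / sqrt(n), which is the standard error of the mean.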
+
Estimating Population Parameters

Point and Interval Estimate

Point estimate: use a single value of a statistic to estimate the
population parameter.

Interval estimate: defined by two numbers, between which a
population parameter is said to lie.

A confidence interval gives an estimated range of values which
is likely to include an unknown population parameter, the
estimated range being calculated from a given set of sample
data.
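As a rough illustration of the difference between a point estimate and an interval estimate, the sketch below computes both for a mean. It assumes a large-sample normal approximation (for a sample this small a t-based interval would usually be preferred), and the data values and the function name `mean_confidence_interval` are hypothetical.

```python
import statistics
from statistics import NormalDist


def mean_confidence_interval(data, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    n = len(data)
    point_estimate = statistics.fmean(data)
    standard_error = statistics.stdev(data) / n ** 0.5
    # Two-sided critical value, about 1.96 for 95% confidence.
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_crit * standard_error
    return point_estimate, (point_estimate - margin, point_estimate + margin)


# Hypothetical sample of 10 measurements (not from the slides).
scores = [72, 85, 78, 90, 66, 81, 77, 88, 74, 79]
point, (low, high) = mean_confidence_interval(scores)
print(f"point estimate: {point:.1f}")
print(f"95% interval:   ({low:.1f}, {high:.1f})")
```

The point estimate is a single number, while the interval is the pair of numbers around it whose width depends on the sample variability, the sample size, and the chosen confidence level.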
+
Testing hypotheses
• Example:
  Assume there are 1,000,000 online students in this course. I claim that 80 percent of them are very satisfied with today’s class.
+
Testing hypotheses

Statistical hypotheses

Null hypothesis: sample observations result purely from chance.

Alternative hypothesis

Outcome: reject the null hypothesis or fail to reject the null
hypothesis

Decision Errors

Type I error: rejecting a null hypothesis that is true. The probability
of committing this error is called the significance level.

Type II error: failing to reject a null hypothesis that is false. The
probability of not committing this error is called the power of the
test.
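The two error probabilities can be made concrete with a short calculation. This is a sketch under assumptions: it uses a two-sided one-sample z-test for a proportion with the normal approximation, a null value of 0.80 as in the running example, and a sample size (100) and "true" proportion (0.70) chosen purely for illustration. The function name is hypothetical.

```python
from statistics import NormalDist


def two_sided_rejection_probability(p_null, p_true, n, alpha=0.05):
    """Probability that a two-sided z-test for a proportion rejects p_null
    when the true population proportion is p_true (normal approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se_null = (p_null * (1 - p_null) / n) ** 0.5
    lower = p_null - z_crit * se_null  # reject if the sample proportion is below this
    upper = p_null + z_crit * se_null  # ... or above this
    # Sampling distribution of the sample proportion under the true value.
    sampling = NormalDist(p_true, (p_true * (1 - p_true) / n) ** 0.5)
    return sampling.cdf(lower) + (1 - sampling.cdf(upper))


# Null hypothesis true: the rejection probability is the Type I error rate.
type_i_rate = two_sided_rejection_probability(0.80, 0.80, n=100)
# Null hypothesis false (assumed true proportion 0.70): this is the power.
power = two_sided_rejection_probability(0.80, 0.70, n=100)
type_ii_rate = 1 - power

print(f"Type I error rate: {type_i_rate:.3f}")
print(f"Power:             {power:.3f}")
print(f"Type II error rate: {type_ii_rate:.3f}")
```

When the null hypothesis is true the rejection probability matches the significance level by construction; when it is false, the rejection probability is the power and its complement is the Type II error rate.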
+
Testing hypotheses
• Decision Rules:

P-value: the probability of observing a test statistic at least as
extreme as the observed statistic S, assuming the null hypothesis is true. If the
P-value is less than the significance level, we reject the null
hypothesis.

Region of acceptance: it is defined so that the chance of
making a Type I error is equal to the significance level. If
the test statistic falls within the region of acceptance, the
null hypothesis is not rejected.
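Both decision rules can be applied to the same statistic and should lead to the same decision. The sketch below assumes a two-sided test whose statistic follows a standard normal distribution; the observed values 0.5 and -2.3 and the function name `decide_two_sided` are hypothetical.

```python
from statistics import NormalDist


def decide_two_sided(z, alpha=0.05):
    """Apply both decision rules to a z statistic in a two-sided test."""
    std_normal = NormalDist()
    # Rule 1: P-value = probability of a statistic at least as extreme as z.
    p_value = 2 * std_normal.cdf(-abs(z))
    reject_by_p_value = p_value < alpha
    # Rule 2: region of acceptance = [-z_crit, +z_crit], sized so that the
    # chance of a Type I error equals alpha.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    reject_by_region = not (-z_crit <= z <= z_crit)
    return p_value, (-z_crit, z_crit), reject_by_p_value, reject_by_region


# Hypothetical observed statistics.
for z in (0.5, -2.3):
    p_value, region, by_p, by_region = decide_two_sided(z)
    print(f"z={z:+.2f}  P-value={p_value:.4f}  "
          f"reject (P-value rule)={by_p}  reject (region rule)={by_region}")
```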
+
Testing hypotheses

One-Tailed Test


A test of a statistical hypothesis, where the region of rejection is on
only one side of the sampling distribution. For example, suppose the
null hypothesis states that the mean is less than or equal to 10. The
alternative hypothesis would be that the mean is greater than 10. The
region of rejection would consist of a range of numbers located on the
right side of the sampling distribution; that is, a set of numbers greater
than 10.

Two-Tailed Test

A test of a statistical hypothesis, where the region of rejection is on
both sides of the sampling distribution. For example, suppose the null
hypothesis states that the mean is equal to 10. The alternative
hypothesis would be that the mean is less than 10 or greater than 10.
The region of rejection would consist of a range of numbers located
on both sides of the sampling distribution; that is, the region of rejection
would consist partly of numbers that were less than 10 and partly of
numbers that were greater than 10.
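To see where the regions of rejection begin for the "mean of 10" examples, the sketch below computes cutoffs on the scale of the sample mean. It assumes a z-based test with a known standard error of 0.5 and a 0.05 significance level; neither value is given in the slides, and `rejection_cutoffs` is a hypothetical helper.

```python
from statistics import NormalDist


def rejection_cutoffs(null_mean, standard_error, alpha=0.05, tails="two"):
    """Return (lower, upper) cutoffs for the sample mean.

    None on one side means there is no rejection region on that side.
    """
    std_normal = NormalDist()
    if tails == "right":
        upper = null_mean + std_normal.inv_cdf(1 - alpha) * standard_error
        return None, upper
    if tails == "two":
        half_width = std_normal.inv_cdf(1 - alpha / 2) * standard_error
        return null_mean - half_width, null_mean + half_width
    raise ValueError("tails must be 'right' or 'two'")


NULL_MEAN = 10.0
STANDARD_ERROR = 0.5  # assumed value, not given in the slides

one_tailed = rejection_cutoffs(NULL_MEAN, STANDARD_ERROR, tails="right")
two_tailed = rejection_cutoffs(NULL_MEAN, STANDARD_ERROR, tails="two")
print("one-tailed: reject if sample mean >", round(one_tailed[1], 3))
print("two-tailed: reject if sample mean <", round(two_tailed[0], 3),
      "or >", round(two_tailed[1], 3))
```

Under these assumptions the cutoffs sit some distance away from 10 rather than at 10 itself, and the two-tailed test splits the significance level between the two sides, so its upper cutoff is farther out than the one-tailed cutoff.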
+
Testing hypotheses
• Procedure:

State the hypotheses: a null hypothesis and an
alternative hypothesis, which must be mutually exclusive.

Formulate an analysis plan: specify the significance level and test
method. The test method includes a test statistic (mean score,
proportion, difference between means, difference between
proportions, z-score, t-score, chi-square, etc.) and a sampling
distribution.

Analyze sample data: calculate the test statistic and P-value.

Interpret the results: compare the P-value to the significance
level, and reject the null hypothesis when the P-value is less
than the significance level.
+
Testing hypotheses
• Test methods:
  One-sample tests: a sample is compared to the population specified by a hypothesis.
  Two-sample tests: two samples are compared, typically experimental and control samples from a scientifically controlled experiment.
  Paired tests: two samples are compared where members are paired between samples, so the differences between paired members become the sample.
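The paired-test idea, where the differences become the sample, can be sketched as follows. The before/after scores are hypothetical, and the function only computes the t statistic and degrees of freedom; the P-value would still have to come from a t table or a statistics library.

```python
import statistics


def paired_t_statistic(before, after):
    """t statistic for paired data: the per-pair differences are the sample."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    mean_diff = statistics.fmean(differences)
    se_diff = statistics.stdev(differences) / n ** 0.5
    # Returns the differences, the t statistic, and the degrees of freedom.
    return differences, mean_diff / se_diff, n - 1


# Hypothetical scores for the same six students before and after a lesson.
before = [70, 65, 80, 75, 60, 72]
after = [74, 70, 79, 81, 66, 75]
differences, t_stat, dof = paired_t_statistic(before, after)
print("differences:", differences)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

Once the differences are formed, the calculation is the same as a one-sample test of whether the mean difference is zero.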
+
Testing hypotheses
• Test methods:

Z-tests: comparing means under stringent conditions regarding
normality and a known standard deviation.

T-tests: comparing means under relaxed conditions (less is
assumed).

F-tests (analysis of variance, ANOVA): comparing two variances. They
are commonly used when deciding whether groupings of data
by category are meaningful.

Chi-squared tests use the same calculations and the same
probability distribution for different applications: chi-squared
tests for variance, chi-squared tests of independence, chi-squared
goodness of fit tests.
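The shared calculation behind the chi-squared tests is the sum of (observed - expected)^2 / expected over all categories. The sketch below shows it for a goodness of fit setting; the die-roll counts are hypothetical and the function name is an assumption for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 120 rolls of a die, tested against a fair-die expectation.
observed = [18, 22, 16, 25, 19, 20]
expected = [sum(observed) / len(observed)] * len(observed)  # 20 per face
chi_square = chi_square_statistic(observed, expected)
dof = len(observed) - 1

print(f"chi-square = {chi_square:.2f} with {dof} degrees of freedom")
```

A chi-square table gives a critical value of about 11.07 for 5 degrees of freedom at the 0.05 level, so a statistic this small would not lead to rejecting the fair-die hypothesis.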
+
Testing hypotheses
Purpose                             Test Method
Means                               one-sample t-test
Difference between means            two-sample t-test
Proportions                         one-sample z-test
Difference between proportions      two-proportion z-test
Regression slope                    linear regression t-test
Difference between matched pairs    matched-pairs t-test
Difference between variances        two-sample F-test
Goodness of fit                     chi-square goodness of fit test
Homogeneity                         chi-square test for homogeneity
Independence                        chi-square test for independence
Reference: http://en.wikipedia.org/wiki/Category:Statistical_tests
+
Back to our example
• Assume there are 1,000,000 online students in this course. I claim that 80 percent of them are very satisfied with today’s class.
• To test this claim, I survey 100 students through email, using simple random sampling. Among the sampled students, 73 percent say they are very satisfied. Based on these findings, can we reject the hypothesis that 80% of the students are very satisfied? Use a 0.05 level of significance.
+
Solution
• State the null hypothesis and an alternative hypothesis.
  • Null hypothesis: P = 0.80
  • Alternative hypothesis: P ≠ 0.80
• Formulate an analysis plan:
  • Significance level: 0.05.
  • Test method: one-sample z-test (for testing proportions).
+
Solution
• Conditions for the test method:
  The sampling method is simple random sampling.
  Each sample point can result in just two possible outcomes. We call one of these outcomes a success and the other a failure.
  The sample includes at least 10 successes and 10 failures. (Some texts say that 5 successes and 5 failures are enough.)
  The population size is at least 10 times as big as the sample size.
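The two numeric conditions can be checked directly for the survey example. This is a minimal sketch: the function name and the `min_count` parameter are hypothetical, and the first two conditions (simple random sampling, two possible outcomes) are properties of the study design that cannot be verified from the counts.

```python
def z_test_conditions_met(successes, failures, population_size, min_count=10):
    """Check the count-based conditions for a one-sample z-test of a proportion."""
    n = successes + failures
    enough_outcomes = successes >= min_count and failures >= min_count
    large_population = population_size >= 10 * n
    return enough_outcomes and large_population


# Survey example: 73% of 100 students very satisfied, 1,000,000 students overall.
conditions_ok = z_test_conditions_met(successes=73, failures=27,
                                      population_size=1_000_000)
print("conditions met:", conditions_ok)
```

Passing `min_count=5` applies the looser rule that some texts use.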
+
Z-test
• Test statistic

A z-score (standard score): indicates how many standard deviations an
element is from the mean.

It can be calculated from the formula: z = (X - μ) / σ
where z is the z-score, X is the value of the element, μ is the population mean, and
σ is the standard deviation.

Interpret z-scores: a z-score is the normal random variable of a standard normal
distribution.

A z-score equal to 0 represents an element equal to the mean.

A z-score less than 0 represents an element less than the mean.

A z-score greater than 0 represents an element greater than the mean.

A z-score equal to 1 represents an element that is 1 standard deviation greater
than the mean; a z-score equal to 2, 2 standard deviations greater than the mean;
etc.
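The formula and its interpretation can be tried out with a few values. The population mean of 100 and standard deviation of 15 below are hypothetical numbers chosen only for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Hypothetical population: mean 100, standard deviation 15.
for x in (100, 85, 115, 130):
    print(f"X = {x}: z = {z_score(x, 100, 15):+.1f}")
```

An element equal to the mean gives z = 0, elements below the mean give negative z-scores, and elements one or two standard deviations above the mean give z = 1 or z = 2.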
+
Solution
• Analyze sample data:
  Using sample data, we calculate the standard deviation of the sampling distribution (σ) and compute the z-score test statistic (z).
  σ = sqrt[ P * (1 - P) / n ] = sqrt[ (0.8 * 0.2) / 100 ] = 0.04
  z = (p - P) / σ = (0.73 - 0.80) / 0.04 = -1.75
  where P is the hypothesized value of the population proportion in the null hypothesis, p is the sample proportion, and n is the sample size.
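The arithmetic on this slide can be reproduced in a few lines. The function name `proportion_z_statistic` is a hypothetical helper; the inputs are the values from the example (p = 0.73, P = 0.80, n = 100).

```python
import math


def proportion_z_statistic(sample_proportion, null_proportion, n):
    """Standard deviation of the sampling distribution and the z statistic."""
    sigma = math.sqrt(null_proportion * (1 - null_proportion) / n)
    z = (sample_proportion - null_proportion) / sigma
    return sigma, z


sigma, z = proportion_z_statistic(sample_proportion=0.73,
                                  null_proportion=0.80, n=100)
print(f"sigma = {sigma:.2f}, z = {z:.2f}")
```

Note that sigma is computed from the hypothesized proportion P rather than the sample proportion p, because the test assumes the null hypothesis is true.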
+
Solution
• Analyze sample data:
  P-value: because this is a two-tailed test, the P-value is the probability that the z-score is less than -1.75 or greater than 1.75. We use the Normal Distribution Calculator to find P(z < -1.75) = 0.04 and P(z > 1.75) = 0.04. Thus, the P-value = 0.04 + 0.04 = 0.08.
• Interpret results:
  Since the P-value (0.08) is greater than the significance level (0.05), we cannot reject the null hypothesis.
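In place of an online calculator, the same P-value can be obtained from Python's standard library. This is a sketch using `statistics.NormalDist`; the function name `two_tailed_p_value` is hypothetical, and the inputs are the z statistic (-1.75) and significance level (0.05) from the example.

```python
from statistics import NormalDist


def two_tailed_p_value(z):
    """Probability of a standard normal value at least as extreme as z."""
    return 2 * NormalDist().cdf(-abs(z))


Z_STATISTIC = -1.75
ALPHA = 0.05

p_value = two_tailed_p_value(Z_STATISTIC)
reject_null = p_value < ALPHA
print(f"P-value = {p_value:.4f}, reject null hypothesis: {reject_null}")
```

The result is about 0.08, matching the slide, so the null hypothesis is not rejected at the 0.05 level.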
+
More about inferential statistics

http://www.socialresearchmethods.net/kb/statinf.php

http://en.wikipedia.org/wiki/Inferential_statistics

http://en.wikipedia.org/wiki/Statistical_hypothesis_testing

http://stattrek.com/

Statistics textbooks
+
Questions?