Chapter 6-7-8 Sampling Distributions and Hypothesis Testing

When we have a frequency distribution, or histogram, we can determine probabilities. Look at the M&M example.

What is one of the most common shapes of frequency distributions?

The normal distribution.

Again, all normal distributions are characterized by the mean and the standard deviation. There are an infinite number of normal distributions.

But some are very special to us, like the Standardized Normal Distribution.

– ALL normal distributions can be standardized.

– All scores are put in terms of standard deviation units from the mean.

– So, we know the proportions, and hence the probabilities, associated with scores that fall in a normal distribution. We just did that in Chapter 5.

100% of our observations appear in the normal distribution.

Proportions and probabilities are the same.

What proportion of scores fall above a z-score of 1?

What is the probability that a randomly chosen z-score will be 1 or higher?

What is the probability that a randomly chosen z-score will fall between 0 and .5?

There is a .05 probability (or a 5% chance) of a z-score being this high or higher (in the upper tail, that cutoff is z = 1.645).

More

We can also look at specific scores (X), convert them into z-scores, and find the probability of getting a score that high or higher, a score lower than that, and so on.

– Given sigma = 100 and the mean = 500, what is the probability of getting a 600 or higher?

– 1) Convert to z; (600-500)/100 = 1.

– 2) What proportion of the distribution falls at or above a z-score of 1?
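The z-score questions above can be checked with a few lines of Python. This is only a sketch using the standard library's `NormalDist` in place of the z table; the variable names are made up for illustration.

```python
from statistics import NormalDist

# The standardized normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Proportion of scores at or above z = 1 (area in the upper tail).
p_above_1 = 1.0 - std_normal.cdf(1.0)

# Probability that a z-score falls between 0 and .5.
p_0_to_half = std_normal.cdf(0.5) - std_normal.cdf(0.0)

# Raw-score example from the slide: sigma = 100, mean = 500, X = 600.
mu, sigma, x = 500.0, 100.0, 600.0
z = (x - mu) / sigma                      # (600 - 500) / 100 = 1
p_600_or_higher = 1.0 - std_normal.cdf(z)

print(f"P(z >= 1)       = {p_above_1:.4f}")    # about .1587
print(f"P(0 <= z <= .5) = {p_0_to_half:.4f}")  # about .1915
print(f"P(X >= 600)     = {p_600_or_higher:.4f}")
```

So a score of 600 or higher has roughly a 16% chance, the same answer the z table gives for z = 1.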

The past

What we have been doing is descriptive statistics.

We have come up with distributions, measures of central tendency and measures of variability, all of which describe a population or a sample.

We can use these, as we have found out, to find the probability of a score, or range of scores, etc.

But statistics, z-scores, probabilities, etc., can be used for more interesting purposes.

The future

Inferential statistics – Estimate population parameters from a sample, or determine if two samples are different

– Hypothesis testing – Is the population parameter equal to some specific value?

– Ex. This class (random sample) takes a study skills course: Seating, classroom tips, study habits

– G.P.A.: Is the G.P.A. of this class now different from that of MSU students generally (the population)?

Well, let’s think about this.

Of course, if we were to randomly sample 50 MSU students and get their mean GPA, it would be a little different than the actual population mean GPA.

There will always be a little error. The sample mean will probably not equal the population mean until all of the members of the population are in our sample.

The quantification of this discrepancy is called sampling error: the discrepancy, or amount of error, between a sample statistic and its corresponding population parameter.

Also, we can take numerous samples. For example, the next day I can get the GPAs of 40 different students. The mean GPA for this sample will also be a little different than the true population mean. ALSO, this second sample will have a mean that is slightly different from our first sample mean.

– In fact, we could take a huge number of samples, and get a huge number of sample means.

So, how do we use a given sample to estimate the population if every sample will be a little different?
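To see the problem concretely, here is a small simulation sketch. Everything in it is an assumption made for illustration: the population is 10,000 made-up GPAs centered near 2.74 with a standard deviation of about 0.5, not real MSU data.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# ASSUMPTION: a hypothetical population of 10,000 GPAs, roughly normal
# around 2.74 with SD 0.5, clipped to the 0.0-4.0 GPA scale.
population = [min(4.0, max(0.0, rng.gauss(2.74, 0.5))) for _ in range(10_000)]
population_mean = mean(population)

# Draw a few random samples of 50 students and compare each sample mean
# with the population mean. The difference is the sampling error.
sample_size = 50
sample_means = [mean(rng.sample(population, sample_size)) for _ in range(4)]
sampling_errors = [m - population_mean for m in sample_means]

print(f"Population mean: {population_mean:.2f}")
for i, (m, err) in enumerate(zip(sample_means, sampling_errors), start=1):
    print(f"Sample {i}: mean = {m:.2f}, sampling error = {err:+.2f}")
```

Each sample mean lands near the population mean but not exactly on it, and no two samples agree exactly.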

Sampling Distribution

To answer this we have to create a sampling distribution of a statistic (mean, median).

In particular, we will use a sampling distribution of sample means:

– This is the collection of sample means for all the possible random samples of a particular size (n) that could be obtained from a population.

OR

– The distribution of a statistic (the mean) over repeated sampling from a specified population.

Sampling distribution of sample means (the most common case):

G.P.A.: Say the MSU population mean is 2.74. The sampling distribution is the distribution of the means of an infinite number of random samples.

We have been looking at distributions of SCORES; now we are going to look at distributions of all possible SAMPLE MEANS.

We are dealing with a particular type of sampling distribution = a distribution of statistics (e.g., the mean) obtained by selecting all the possible samples of a specific size from a population.

DRAW SAMPLING DISTRIBUTION OF MEANS: N = 50

Distribution of means if we sample 50 students and assume the population mean is 2.74:

Sample 1: 2.77

Sample 2: 2.91

Sample 3: 2.55

Sample 4: 3.77

NOTE: This is similar to what we were doing with z-scores. We were looking at where a z-score falls in a distribution of scores. Now we are looking at where a sample statistic (in this case the mean) falls in a distribution of sample means.

If it is close to the middle of the distribution, we retain the null hypothesis (no difference).

If it is far from the middle, the sample is unlikely under the null hypothesis, so we reject the null hypothesis.

Sampling Error: Variability of a statistic from sample to sample. Due to chance.

Standard Error: The standard deviation of a sampling distribution. For the sampling distribution of sample means it equals sigma / sqrt(n), where sigma is the population standard deviation.

As usual, n = sample size, which should be taken into account when calculating standard deviations.

Obviously, the larger the sample, the closer the sample means will be to the population mean (i.e., less error). So, we have to take sample size into account.

Law of large numbers = the larger the sample size, the more probable it is that the sample mean will be close to the population mean.

When n = 1, se = sd

As n increases, the standard error should decrease. The equation takes this into account.
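A quick sketch of the standard error formula, sigma / sqrt(n), shows how it shrinks as n grows. The function name `standard_error` is made up here, and sigma = 100 is borrowed from the earlier z-score example purely for illustration.

```python
from math import sqrt

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: population SD divided by sqrt(n)."""
    if n < 1:
        raise ValueError("sample size n must be at least 1")
    return sigma / sqrt(n)

sigma = 100.0  # illustrative population standard deviation
for n in (1, 4, 25, 100):
    print(f"n = {n:>3}: standard error = {standard_error(sigma, n):.1f}")
```

With n = 1 the standard error equals the standard deviation (100), and it drops to 50, 20, and 10 as n goes to 4, 25, and 100. Note that quadrupling the sample size only cuts the error in half.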

There is a great mathematical theorem that tells us the general properties of our sampling distribution as our samples get larger and larger.

Central Limit Theorem:

From the book: For any population with a mean (mu) and a standard deviation (sigma), the distribution of sample means for sample size n will have a mean of mu and a standard deviation of sigma/sqrt(n), and will approach a normal distribution as n approaches infinity.

– So what is this saying?

As n increases, each sample's mean and standard deviation get closer to those of the population, so the sample means cluster more tightly around mu.

– With a sample size of 30+, the distribution of sample means is practically normal.

– So, we have a clue about the mean of the sampling distribution, the standard deviation, and its shape (normal). What can we do with this information?

This allows us to know the distribution of sample means for any population, regardless of the mean and SD, and even if the population distribution is not normal.
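The claim that this works even for a non-normal population can be checked with a simulation sketch. The population here is an assumption chosen for illustration: a strongly right-skewed exponential distribution with mu = 1 and sigma = 1. The sample size of 30 follows the rule of thumb above.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed so the run is repeatable

# ASSUMPTION: a skewed, clearly non-normal population (exponential),
# which has mean mu = 1 and standard deviation sigma = 1.
mu, sigma = 1.0, 1.0
n = 30               # size of each sample
num_samples = 5_000  # stand-in for "all possible samples"

# Build an approximate sampling distribution of sample means.
sample_means = [
    mean([rng.expovariate(1.0 / mu) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = mean(sample_means)
sd_of_means = stdev(sample_means)
predicted_se = sigma / sqrt(n)

# Share of sample means within one standard error of mu;
# for a normal distribution this would be about 68%.
within_one_se = sum(abs(m - mu) <= predicted_se for m in sample_means) / num_samples

print(f"Mean of sample means: {mean_of_means:.3f} (theorem predicts {mu})")
print(f"SD of sample means:   {sd_of_means:.3f} (theorem predicts {predicted_se:.3f})")
print(f"Within one SE of mu:  {within_one_se:.1%}")
```

The mean of the sample means should sit very close to mu and their standard deviation close to sigma/sqrt(n), even though the individual scores are far from normal.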

Back to our example:

MSU Mean: 2.53

Class Mean: 3.02

There may be no relationship between this class (the intervention) and G.P.A.

Goal:

Determine whether this difference is due to chance (sampling error)

We can use probabilities to determine how likely or unlikely it is that this difference is due to chance.

If this class is different, then we can classify it as a different population with different population parameters (higher mean)

A statistical test will answer this question for us:

HYPOTHESIS TESTING!

A hypothesis test = a statistical procedure that uses sample data to evaluate hypotheses about a population parameter.

General steps.

– 1) generate a hypothesis about the population mean.

– 2) So, we hypothesize that our sample mean will be close to this guess regarding the population mean.

– 3) Obtain a sample and sample mean

– 4) Compare the sample and population means.
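The four steps above can be sketched with the class example. The two means (2.53 and 3.02) come from the slides, but the population standard deviation and the class size are NOT given, so the values below (sigma = 0.60, n = 25) are assumptions made only so the arithmetic can run.

```python
from math import sqrt
from statistics import NormalDist

# Steps 1-2: hypothesize the population mean (from the slides).
population_mean = 2.53   # MSU mean G.P.A.

# Step 3: obtain a sample and its mean (from the slides).
sample_mean = 3.02       # class mean G.P.A.

# ASSUMPTIONS (not in the slides): population SD and sample size.
assumed_sigma = 0.60
assumed_n = 25

# Step 4: compare the sample mean with the population mean, measured
# in standard error units, just like a z-score for a single score.
standard_error = assumed_sigma / sqrt(assumed_n)
z = (sample_mean - population_mean) / standard_error

# Probability of a sample mean at least this far from 2.53, in either
# direction, if the class really came from the MSU population.
p_two_tailed = 2.0 * (1.0 - NormalDist().cdf(abs(z)))

print(f"standard error = {standard_error:.2f}")
print(f"z = {z:.2f}")
print(f"two-tailed probability = {p_two_tailed:.5f}")
```

Under these assumed values the class mean sits about 4 standard errors above the MSU mean, which would be very unlikely from sampling error alone. With a different sigma or n the conclusion could change.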

1) Set up Null Hypothesis:

The null hypothesis always says the opposite of that in which we are interested:

– We can never prove that something is true; we can only show that it is false (or, in practice, very unlikely).

In other words:

– There is no difference between our groups or:

– If we are only interested in whether our group is better:

The null hypothesis would say our group is equal to or worse than the other.

– We are usually working to reject the null hypothesis

– Note: Assuming the null is true, we create our sampling distribution. In this case the sampling distribution of means.

– Ho: mu (class) = 2.53

2. Set up the “Alternative hypothesis” (What we want to find)

H1: mu (class) is not equal to 2.53

We do this before we collect our data. The mean could be higher or lower; maybe our class hurts people's G.P.A.

3. Set a criterion level for our decision:

How far away does the mean have to be for us to reasonably doubt that this sample came from the same population?

When are we going to say this sample is the same as the population (just sampling error), and when are we going to say this sample is different from the population?

Significance level (alpha) – A predetermined probability that represents a sample result so rare or unusual that it casts doubt on the accuracy of Ho.

– The probability with which we are willing to reject Ho when it is correct.

Rejection region: the set of outcomes from an experiment that will lead to a rejection of Ho.

Typically:

– Choose alpha = .05 (5%)
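For a two-sided alternative like the one above (the mean could be higher or lower), alpha = .05 is split between the two tails. The sketch below finds the cutoff z-scores that mark the rejection region; the function name `in_rejection_region` is made up for illustration.

```python
from statistics import NormalDist

alpha = 0.05

# Two-tailed test: put alpha / 2 in each tail of the standard normal.
z_critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96

def in_rejection_region(z: float, z_crit: float = z_critical) -> bool:
    """True if z is far enough from 0 (either direction) to reject Ho."""
    return abs(z) >= z_crit

print(f"Reject Ho when z <= -{z_critical:.2f} or z >= +{z_critical:.2f}")
for z in (1.0, -2.5, 2.5):
    decision = "reject Ho" if in_rejection_region(z) else "retain Ho"
    print(f"z = {z:+.1f}: {decision}")
```

A sample mean that falls within 1.96 standard errors of the hypothesized mean is treated as ordinary sampling error; anything farther out lands in the rejection region.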
