Chapter 19: Two-Sample Problems

STAT 1450

19.0 Two-Sample Problems

Connecting Chapter 18 to Our Current Knowledge of Statistics

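Population Parameter | Point Estimate | Confidence Interval | Test Statistic
μ (σ known) | $\bar{x}$ | $\bar{x} \pm z^* \dfrac{\sigma}{\sqrt{n}}$ | $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
μ (σ unknown) | $\bar{x}$ | $\bar{x} \pm t^* \dfrac{s}{\sqrt{n}}$ | $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$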

▸ Remember that these formulas are only valid when appropriate simple conditions apply!


▸ Matched pairs were covered at the end of Chapter 18.

A common situation requiring matched pairs is when before-and-after measurements are taken on individual subjects.

▸ Example: Prices for a random sample of tickets to a 2008 Katy Perry concert were compared with the ticket prices (for the same seats) to her 2013 concert.

The data could be consolidated into one column of differences in ticket prices.

A test of significance or a confidence interval would then be carried out on that single column of differences, that is, on "1 sample of data."

19.1 The Two-Sample Problem

The Two-Sample Problems

▸ Two-sample problems require us to compare:

 the response to two treatments

- or -

 the characteristics of two populations.

▸ We have a separate sample from each treatment or population.


The Two-Sample Problem

▸ Example: Suppose a random sample of ticket prices for concerts by the Rolling Stones was obtained. For comparison purposes, another random sample of Coldplay ticket prices was obtained. Note that these are not necessarily the same seats or even the same venues.

▸ Question: Are these samples more likely to be independent or dependent?

a) Independent b) Dependent c) Not sure



Two-Sample Problems

▸ The end of Chapter 18 described inference procedures for the mean difference in two measurements on one group of subjects (e.g., pulse rates for 12 students before-and-after listening to music).

▸ Given our answer above, and the likelihood that the two samples differ in sample size, variance, etc., Chapter 19 focuses on the difference in means for 2 different groups.

Population Parameter | Point Estimate | Confidence Interval | Test Statistic
$\mu_1 - \mu_2$ | $\bar{x}_1 - \bar{x}_2$ | (to be completed in 19.3) | (to be completed in 19.3)

19.2 Comparing Two Population Means

Sampling Distribution of Two Sample Means

▸ Recall that for a single sample mean $\bar{x}$:

 When the standard deviation of a statistic is estimated from data, the result is called the standard error of the statistic.

 The standard error of $\bar{x}$ is $s/\sqrt{n}$.

Inference in the two-sample problem will require the standard error of the difference of two sample means, $\bar{x}_1 - \bar{x}_2$.


Sampling Distribution of Two Sample Means

▸ The following table stems from the above comment on standard error and statistical theory.

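Variable | Parameter | Point Estimate | Population Standard Deviation | Standard Error
$x_1$ | $\mu_1$ | $\bar{x}_1$ | $\sigma_1$ | $\dfrac{s_1}{\sqrt{n_1}}$
$x_2$ | $\mu_2$ | $\bar{x}_2$ | $\sigma_2$ | $\dfrac{s_2}{\sqrt{n_2}}$
Diff = $x_1 - x_2$ | $\mu_1 - \mu_2$ | $\bar{x}_1 - \bar{x}_2$ | $\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$ | $\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$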
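The standard error of the difference in the last row of the table can be sketched in a few lines of Python. This is only an illustration: the function name `standard_error_diff` and the sample values passed to it are hypothetical and do not come from the lecture.

```python
import math

def standard_error_diff(s1, n1, s2, n2):
    """Standard error of (x-bar_1 - x-bar_2) for two independent samples:
    sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Hypothetical illustrative values (not lecture data):
# s1 = 3 with n1 = 9, and s2 = 4 with n2 = 16, so each term equals 1.
se = standard_error_diff(s1=3.0, n1=9, s2=4.0, n2=16)
print(round(se, 4))  # 1.4142, i.e. sqrt(1 + 1)
```

Note that the two variances are added, not the two standard errors; the square root is taken only after the sum.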


Example: SSHA Scores

▸ The Survey of Study Habits and Attitudes (SSHA) is a psychological test designed to measure various academic behaviors (motivation, study habits, attitudes, etc.) of college students. Scores on the SSHA range from 0 to 200. Random samples of 17 women (**the outlier from the original data set was removed**) and 20 men yielded the following summary statistics.

▸ Is there a difference in SSHA performance based upon gender?



▸ Summary statistics for the two groups are below:

Group | Sample Mean | Sample Standard Deviation | Sample Size
Women** | 139.588 | 20.363 | 17
Men | 122.5 | 32.132 | 20

 There is a difference between these two groups.

The women's sample mean is 17.088 points higher than the men's (139.588 − 122.5).







 Yet, the standard deviations are larger than this sample difference, and the sample sizes are about the same.








 Is this difference significant enough to conclude that $\mu_{women}$ is larger than $\mu_{men}$? Let’s learn more!

19.3 Two-Sample t Procedures

The Two-sample t Procedures: Derived

▸ Now that we have a point estimate and a formula for the standard error, we can conduct statistical inference for the difference in two population means.

Chapter | Parameter of Interest | Point Estimate | Standard Error | Confidence Interval
18 | $\mu$ (σ unknown; 1-sample) | $\bar{x}$ | $\dfrac{s}{\sqrt{n}}$ | $\bar{x} \pm t^* \dfrac{s}{\sqrt{n}}$
19 | $\mu_1 - \mu_2$ (σ1, σ2 unknown; 2-samples) | $\bar{x}_1 - \bar{x}_2$ | $\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$ | $(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

General form of the confidence interval: pt. estimate ± t*(standard error)



Chapter | Parameter of Interest | Point Estimate | Standard Error | Test Statistic
18 | $\mu$ (σ unknown; 1-sample) | $\bar{x}$ | $\dfrac{s}{\sqrt{n}}$ | $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
19 | $\mu_1 - \mu_2$ (σ1, σ2 unknown; 2-samples) | $\bar{x}_1 - \bar{x}_2$ | $\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$ | $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

General form of the test statistic: (pt. estimate − $\mu_0$) / (standard error)

Note: $H_0$ for our purposes will be that $\mu_1 = \mu_2$, which is equivalent to there being a mean difference of ‘0.’


The Two-sample t Procedures

▸ Now we can complete the table from earlier:

Population Parameter | Point Estimate | Confidence Interval | Test Statistic
$\mu_1 - \mu_2$ | $\bar{x}_1 - \bar{x}_2$ | $(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$ | $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

t* is the critical value for confidence level C for the t distribution with df = smaller of ($n_1 - 1$) and ($n_2 - 1$).

Find P-values from the t distribution with df = smaller of ($n_1 - 1$) and ($n_2 - 1$).


The Two-sample t Procedures: Confidence Intervals

▸ Draw an SRS of size $n_1$ from a large Normal population with unknown mean $\mu_1$, and draw an independent SRS of size $n_2$ from another large Normal population with unknown mean $\mu_2$. A level C confidence interval for $\mu_1 - \mu_2$ is given by

$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

▸ Here t* is the critical value for confidence level C for the t distribution with degrees of freedom from either Option 1 (computer generated) or Option 2 (the smaller of $n_1 - 1$ and $n_2 - 1$).


The Two-sample t Procedures: Significance Tests

▸ To test the hypothesis $H_0: \mu_1 = \mu_2$, calculate the two-sample t statistic

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

▸ Find P-values from the t distribution with df = smaller of ($n_1 - 1$) and ($n_2 - 1$).
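As a rough sketch of the calculation just described, the Python function below returns the two-sample t statistic together with the conservative degrees of freedom (the smaller of $n_1 - 1$ and $n_2 - 1$). The function name `two_sample_t` is an assumption made for illustration; the numbers are the SSHA summary statistics given earlier in this chapter.

```python
import math

def two_sample_t(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t statistic for H0: mu1 = mu2, plus the conservative df."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # standard error of the difference
    t = (mean1 - mean2) / se                  # (pt. estimate - 0) / standard error
    df = min(n1 - 1, n2 - 1)                  # smaller of (n1 - 1) and (n2 - 1)
    return t, df

# SSHA summary statistics: women (group 1) and men (group 2).
t, df = two_sample_t(139.588, 20.363, 17, 122.5, 32.132, 20)
print(f"t = {t:.2f}, df = {df}")  # t = 1.96, df = 16
```

Software (Option 1 above) typically reports a larger, non-integer df than this conservative choice, so its P-values may be somewhat smaller.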


Conditions for Inference Comparing Two-Sample Means and Robustness of t Procedures

▸ The general structure of our necessary conditions is an extension of the one-sample cases.

 Simple Random Samples:

 Do we have 2 simple random samples?

 Population : Sample Ratio:

 The samples must be independent and come from two large populations of interest.

 Large enough sample:

Both populations will be assumed to be Normally distributed, and

 when the sum of the sample sizes is less than 15, t procedures can be used if the data are close to Normal (roughly symmetric, single peak, no outliers). If there is clear skewness or there are outliers, do not use t.

 when the sum of the sample sizes is between 15 and 40, t procedures can be used except in the presence of outliers or strong skewness.

 when the sum of the sample sizes is at least 40, the t procedures can be used even for clearly skewed distributions.




▸ Note: In practice it is enough that the two distributions have similar shape with no strong outliers. The two-sample t procedures are even more robust against non-Normality than the one-sample procedures.
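The sample-size guidelines above can be written as a small decision helper. This is a sketch under stated assumptions: the function name, the `skew` labels ("none", "clear", "strong"), and the choice to treat a sum of exactly 40 as the large-sample case are all hypothetical, and judging skewness or outliers still requires looking at the data.

```python
def t_procedures_reasonable(n1, n2, skew="none", outliers=False):
    """Apply the lecture's rule of thumb based on n1 + n2.

    skew: "none", "clear", or "strong" (hypothetical labels).
    outliers: True if either sample shows outliers.
    """
    total = n1 + n2
    if total < 15:
        # Small samples: data must look close to Normal.
        return skew == "none" and not outliers
    if total < 40:
        # Moderate samples: avoid outliers or strong skewness.
        return skew != "strong" and not outliers
    # Sum of at least 40: the guideline allows even clearly skewed data
    # and states no further restriction.
    return True

# SSHA example: 17 women + 20 men, outlier removed, no skewness.
print(t_procedures_reasonable(17, 20, skew="none", outliers=False))  # True
```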



Poll: SSHA Scores

▸ Suppose we have a goal of measuring the mean difference in SSHA scores between women and men. Which seems more plausible?

a. $\mu_{Women} - \mu_{Men} = 0$ (There is no difference.)

b. $\mu_{Women} - \mu_{Men} \neq 0$ (There is some difference.)



▸ The summary statistics for the SSHA scores for random samples of men and women are below. Use this information to construct a 90% confidence interval for the mean difference.

Group | Sample Mean | Sample Standard Deviation | Sample Size
Women | 139.588 | 20.363 | 17
Men | 122.5 | 32.132 | 20

Example: 90% CI for SSHA Scores

Steps for Success: Constructing Confidence Intervals for $\mu_1 - \mu_2$.

1. Confirm that the 3 key conditions are satisfied (SRS?, N:n?, t-distribution?).

2. Identify the 3 key components of the confidence interval (means, s.d.s, $n_1$, $n_2$).

3. Select t*.

4. Construct the confidence interval.

5. *Interpret* the interval.

1. Conditions.

 Do we have two simple random samples?

Yes. It was stated.

 Large enough population : sample ratio?

Yes. $N_W > 20 \times 17 = 340$ and $N_M > 20 \times 20 = 400$.

 Large enough sample?

Yes. $n_W + n_M = 37 < 40$, but the outlier has been removed. No skewness.

2. Components.

$\bar{x}_w = 139.588$, $\bar{x}_m = 122.5$, $s_w = 20.363$, $s_m = 32.132$, $n_w = 17$, $n_m = 20$

3. Select t*.

df = min{($n_w - 1$), ($n_m - 1$)} = 16, so t*(90%, 16) = 1.746

4. Interval.

$(139.588 - 122.5) \pm 1.746 \sqrt{\dfrac{20.363^2}{17} + \dfrac{32.132^2}{20}} = 17.088 \pm 15.222$, that is, 1.866 to 32.31

5. Interpret.

We are 90% confident that the mean women’s SSHA score is between 1.866 and 32.31 points higher than men’s.
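The interval arithmetic in step 4 can be checked with a short Python sketch. The function name `two_sample_t_interval` is hypothetical, and the critical value t* = 1.746 is passed in by hand from the t table (df = 16, 90% confidence) instead of being computed.

```python
import math

def two_sample_t_interval(mean1, s1, n1, mean2, s2, n2, t_star):
    """(x-bar_1 - x-bar_2) +/- t* sqrt(s1^2/n1 + s2^2/n2)."""
    diff = mean1 - mean2
    margin = t_star * math.sqrt(s1**2 / n1 + s2**2 / n2)
    return diff - margin, diff + margin

# SSHA summary statistics, with t* = 1.746 taken from the t table.
low, high = two_sample_t_interval(139.588, 20.363, 17,
                                  122.5, 32.132, 20, t_star=1.746)
print(f"{low:.3f} to {high:.3f}")
```

The printed endpoints should agree with 1.866 and 32.31 up to rounding of the intermediate margin of error.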



▸ Let’s continue with this example by now conducting a test of significance for the mean difference in SSHA by gender at α = 0.10.

Does our decision align with the results from the earlier poll?

Group | Sample Mean | Sample Standard Deviation | Sample Size
Women | 139.588 | 20.363 | 17
Men | 122.5 | 32.132 | 20



State: Is there a difference in the mean SSHA scores between men and women?

(i.e., $\mu_{Diff} \neq 0$, $\mu_{Women} - \mu_{Men} \neq 0$, $\mu_{Women} \neq \mu_{Men}$)

Plan: a) Identify the parameter.

$\mu_{Diff} = \mu_{Women} - \mu_{Men}$.

b) List all given information from the data collected.

$\bar{x}_w = 139.588$, $s_w = 20.363$, $n_w = 17$
$\bar{x}_m = 122.5$, $s_m = 32.132$, $n_m = 20$

c) State the null ($H_0$) and alternative ($H_a$) hypotheses.

$H_0: \mu_{Diff} = 0$
$H_a: \mu_{Diff} \neq 0$

d) Specify the level of significance.

α = .10

e) Determine the type of test.

Left-tailed, Right-tailed, or Two-tailed? Two-tailed, because $H_a$ is $\mu_{Diff} \neq 0$.

f) Sketch the region(s) of “extremely unlikely” test statistics.




Solve: a) Check the conditions for the test you plan to use.

 Two Simple Random Samples?

Yes. Both were stated to be random samples.

 Large enough population : sample ratios?

Yes. Both populations are arbitrarily large, much greater than the required $N_W > 20 \times 17 = 340$ and $N_M > 20 \times 20 = 400$.

 Large enough samples?

Yes. $n_W + n_M = 37 < 40$, but the outlier has been removed. No skewness.


Solve: b) Calculate the test statistic.

$t = \dfrac{\bar{x}_w - \bar{x}_m}{\sqrt{\dfrac{s_w^2}{n_w} + \dfrac{s_m^2}{n_m}}} = \dfrac{139.588 - 122.5}{\sqrt{\dfrac{20.363^2}{17} + \dfrac{32.132^2}{20}}} = \dfrac{17.088}{8.719} = 1.96$

c) Determine (or approximate) the P-value.

t = 1.96 with df = 17 − 1 = 16

 1.746 < 1.96 < 2.12

 .05 < P-value < .10



Conclude: a) Make a decision about the null hypothesis (Reject $H_0$ or Fail to reject $H_0$).

Because the approximate P-value is smaller than 0.10, we reject the null hypothesis.

b) Interpret the decision in the context of the original claim.

There is enough evidence (at α = .10) that there is a difference in the mean SSHA score between men and women.
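The table lookup above only brackets the P-value. As a hedged cross-check, the sketch below approximates the two-sided P-value directly by numerically integrating the t density with Simpson's rule, using only the Python standard library. The function names and the number of integration steps are assumptions made for this illustration; statistical software would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t, df, steps=2000):
    """Two-sided P-value: 1 - P(-|t| < T < |t|), via Simpson's rule on [0, |t|].
    `steps` must be even."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # the density is symmetric about 0
    return 1 - central_area

# Test statistic and conservative df from the SSHA example.
p = two_sided_p_value(1.96, df=16)
print(0.05 < p < 0.10)  # True, matching the bracket from the t table
```

Since the P-value falls below α = .10, the sketch leads to the same decision to reject $H_0$.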



