Section 3.1

advertisement

Lab#__________ Name___________________

Section________ Instructor________________

Homework #3 - Solutions

Many of these problems (**)are not assigned but the solutions have been included for your benefit.

Section 3.1

**3.3 This is an observational study. No action was taken by the experimenter that would affect the outcome. The explanatory variable is cell phone use. The response variable is the occurrence of brain cancer.

3.4

Observational. The researcher applied no treatment that might affect the outcome. The explanatory variable is satisfaction with congress. The response variable is support for term limits.

**3.7 This is an experiment. The treatment is the exercise. The response variable is the change in metabolic rate before the exercise and 12 hours later. The explanatory variable is exercise.

Section 3.2

**3.11 The units are the individuals who were called. One factor is what information is offered. Treatments are (1) giving name, (2) identifying university, (3) both of these. The second factor is offering to send a copy of the results. The treatments are either offering or not offering. The response is whether the interview was completed.

**3.14 a) Randomly assign 20 subjects to group 1 and twenty subjects to group2. Give the warmed treatment to group 1 and the not warmed treatment to group 2. Then observe the infections.

An alternative design, and one that would allow you to test whether it was the warm fluids or the warm air or the combination of the two caused a reduction of infections after surgery, would be to divide the patients into 4 groups, 10 subjects each.

b) Number the subjects from 01 to 40. Divide digits into groups of two on the random number chart. Omit groups that are over 40. The first 20 pairs will be group 1. The rest will be in group 2.

3.16

a) This would confound the effect of operating team. If one team is better than another or gets harder (or easier) patients, this would be confounded with the warming effect.

b) This is a double-blind design. This prevents the operating team from treating one type of patient different from another. It also prevents different treatment by the doctors who follow the patients after the surgery and who, presumably, assess their improvement.

**3.19 “Significantly higher” means that these returns are higher than would be expected due to change. “Not significantly different from zero” means that, although the sample may not be exactly zero, the difference from zero is within the range that can occur simply due to chance.

**3.32

Subject Excess Block – (a) Treat – (b)

Williams 22 1 C

Festinger 24

Hernandez 25

Moses 25

Santiago 27

Kendall

Mann

Smith

28

28

29

2

2

2

1

1

1

2

D

A

B

A

C

B

D

Brunk

Obrack

30

30

Rodriguez 30

Loren 32

Jackson

Stall

Brown

Dixon

33

33

34

34

Birnbaum 35

Tran 35

3

3

3

3

4

4

4

4

5

5

D

C

A

B

D

A

C

B

C

A

Nevesky 39

Wilansky 42

5

5

B

D

For part b, use table B line 112. Find the first occurrences of 1, 2, 3, 4, while ignoring the other digits. Also omit repeats of digits you have already used. 596 3 6 8880 4 04634 7 1. This produces the sequence 3, 4, 1, and the last digit must be 2.

Lab#__________ Name___________________

Section________ Instructor________________

Therefore the third subject will get treatment A, the fourth subject will get treatment B, etc. For block 2, continue on the random number chart, 1 97 19 3 5 2 . The second block will use 1, 3, 2, 4, etc.

**3.34 a) This is a blocked design, with ages being the blocks.

b) In a diagram, show runners are split into blocks by ages. Then each age is randomly assigned into two groups with one group receiving Vitamin C and the other group receiving the placebo. Then compare the groups for infections.

c) It is unlikely that a difference this large could simply occur by chance.

Section 3.3

**3.38 a) The adults in the country.

b) All the wood sent by the supplier.

c) All households in the U.S.

**3.46 a) Line 120 starts off with 35. We consider our 200 addresses to be 5 lists of 40 each. We would use address 35, 75,

115, 155, 195.

b)In the previous example, each address in the group of 40 has the some chance of being chosen, depending on the value from Table B. It is not a SRS because not all combinations have the same probability. This system will never choose the first five addresses for example.

3.48

Use an alphabetized list of male faculty and an alphabetized list of female faculty. Assign them to the numbers 0001 to

2000 for males and 001 to 500 for females. Use the numbers from table B. Divide them into groups of 4 for the male faculty and groups of 3 for the female faculty continuing where we left off for the males. For the males: 1387, 0529, 0908, 1369,

0815. For the females: 072, 256, 027, 306, 341.

**3.52 The second period had a higher rate of no answers. This may be due to people being away on vacation. Those on vacation would tend to be wealthier than those who could not afford to go away.

Section 3.4

Why do we take samples?

Sometimes it is impossible to examine the whole population. We usually take some ‘samples’ from the population in order to draw conclusions about the wider population.

What makes a ‘good’ sample?

We select a sample without preference (bias) so that we make a ‘good’ sample.

How does taking a bigger (increasing the sample size, n) sample help?

A bigger sample can help to reduce the variability of statistics.

Why is randomization so important in statistical inference?

Randomization guarantees that the results of analyzing our data are subject to the laws of probability.

Define population parameter and sample statistic and explain how we use each of them in statistics.

Population parameter describes the characteristics of the real population, which is usually unknown. The sample statistic describes the sample and does not involve the population parameter. We use appropriate sample statistics to estimate the population parameters.

3.60

52% is a parameter; 43 is a statistic

**3.62 a) High bias (low) and high variability (several large values). b) Low bias and low variability c) Low bias and high variability d) High bias (high) and low variability

3.64

The sample mean will have the same mean as the population, but lower variability. a) The means of samples of size 25 range from about $20,000 to $70,000. The range of samples of size 100 range from about $30,000 to $50,000. This shows that larger sample size produces estimates with lower variability. b) The mean of the population is between $35,000 and $40,000.

**3.65 We assume that the proportion of real estate owners in each state is roughly the same. We also note that with a sample of size 2000, we do not need to correct for finite population size because we are sampling such a small fraction of the population. a) The variability of the estimate depends on the size of the sample, not the population.

Lab#__________ Name___________________

Section________ Instructor________________ b) Yes, if we take a fixed proportion of the states population, we will have a much larger sample size for California than Wyoming. The estimate for California will be much less variable for California than for Wyoming.

**3.70 a) histogram of the 50 sample proportions b) histogram of samples, center of this histogram should be smaller than the center of the histogram in part a

**3.76 a) Sample survey b) Experiment. The treatment would be classroom or online course c) Observational

**3.85 a) One possible population is all full-time undergraduates in the fall term. b) A stratified sample might include 125 students from each year. c) Mailed questionnaires might have a high nonresponse rate. Telephone interviews would not reach those students without phones, and there might be problems in reaching students at home.

**3.91 Women chose for themselves whether or not to respond. If many of the letters described unfeeling treatment by men, this may suggest that women with this experience are more likely to write in. The 72% figure is likely higher than in the overall population.

Section 4.1 (p.285)

**1. Exercise 4.1

Clearly, your answer will depend on many factors: how you toss the coin, your consistency in the tossing process, etc. In any case, your result will certainly be a value between 0and 1.

**2. Using the Probability applet (found under Student Resources

Statistical Applets on the IPS homepage, http://bcs.whfreeman.com/ips4e which can also be found through the Stat30X Homepage

Applets

Introduction to the

Practice of Statistics, Applets ), simulate data like you just saw in your first problem. Use your estimated probability of heads from 4.1(above) in the box for Probability of heads = ____ and Toss _25__ times , but you must do this twice to get 50 tosses. Don’t click Reset. Is 50 enough tosses to be close to your probability of heads?

Whether 50 is enough tosses also “depends”. What do you mean by “close”? How certain do you want to be about any claim or assumption you make about the P(Heads)? These are 2 important questions among several worth answering.

3. Exercise 4.8

(it uses the same applet as the problem above)

(a) Results: For n = 20, Proportion Agrees = 0.40, for n = 80, proportion = 0.48, for n = 320, proportion = 0.73.

(b) Results: For n = 20 women x 10 times: 0.80, 0.75, 0.65, 0.85, 0.60, 0.65, 0.65, 0.60, 0.80, 0.85, 0.65 Results: For n =

320 x 10 times: 0.74, 0.75, 0.72, 0.73, 0.72, 0.75, 0.74, 0.73, 0.75, 0.72 Clearly, there is less variability if n = 320.

Section 4.2

**4. Exercise 4.10

(a) The sample space would be a larger group that can be either male or female. Examples would include all adults in the U.S., college students, wild animals, etc.

(b) The sample space would be {6,7,…,20}

(c) The sample space would be the interval [2.5,6], not just whole numbers.

(d) This depends a bit on the way it is measured. If we are talking about living humans, heart rate is probably in the range 40-200 and would be a whole number.. In the hospital, they usually count the number of beats in 10 seconds and multiply by 6, so heart rates would always be a multiple of 6. The theoretical range would be positive whole numbers.

**5 . Exercise 4.18 Write out the probability distribution before answering the questions.

Prob (Acre is Forest) = 0.35, Prob (Acre is not Forest) = 0.65,

Prob (Acre is Pasture) = 0.03, Prob (Acre is not pasture) = 0.97

Forest Pasture Other

0.35 0.03 0.62

(a) Prob(Not Forested) = 1 – Prob (Is forest) = 1

0.35 = 0.65

(b) Prob (Forest or Pasture) = Prob (Forest) + Prob (Pasture) = 0.35 + 0.03 = 0.38 (Assuming that the forest cannot be

which is a reasonable assumption

(c) Prob (Other than forest or pasture) = 1 – Prob (Either forest or pasture) = 1

0.38 = .62

Section 4.3

**6. Exercise 4.40 discrete random variables usually count something, continuous random variables usually measure something

Lab#__________ Name___________________

Section________ Instructor________________

(b) Discrete, since only integers are allowed

(c) Continuous, since fractions are allowed --- you are measuring the amount of oxygen

(d) Depends on how it is measured. Theoretically, the rate could contain fractions and be continuous; however, the way it is usually measured, it would be a whole number (integer) and be discrete.

**7. Exercise 4.46

a) {Y>1} or (Y>=2}. Probability is 1 – 0.25 = 0.75.

b) This is the same as Y = 3 or Y = 4. The probability is 0.17 + 0.15 = 0.32.

c) 1 – Prob (Y = 2) = 1

0.32 = 0.68

8. Exercise 4.50

a) 0.6(0.6)0.4= 0.144

b) SSS,SSO,SOS,OSS,SOO,OSO,OOS,OOO. Prob(SSS) = (0.6)(0.6)(0.6) = 0.216, Prob(SSO) = Prob(SOS) =

Prob(OSS) = (0.6)(0.6)(0.4) = 0.144, Prob(SOO) = Prob(OSO) = Prob(OOS) = (0.6)(0.4)(0.4) = 0.096, Prob(OOO) =

(0.4)(0.4)(0.4) = 0.064

X 0 1 2 3

P(X) 0.216 0.432 0.288 0.064

c) Prob (X = 0) = 0.216, Prob (X = 1) = 0.432, Prob (X = 2) = 0.288, Prob (X = 3) = 0.064

d) A majority corresponds to X >= 2 which is the same as X being 2 or 3

Probability = 0.288 + 0.064 = 0.352

9. Exercise 4.52

a) For the uniform density, the probability of an interval is proportional to its length. The interval (0.27,1) is 73% of the total. You just find the area of the box with height 1

base * ht = (1

0.27)*1 = 0.73

b) Zero. For explanation: 4.51 X ~ Uniform(0,1) a) P(X

0.5) = 0.5, b) P(X < 0.5) = 0.5, c) Since X is a CONTINUOUS random variable, there is NO probability at exact points.

P(X

0.5) = P(X < 0.5) + P(X = 0.5) = 0.5 + 0 = 0.5

c) This is the same as (a)., because there is no density to the right of 1. P(0.27

X

1.27) = P(0.27

X

1) + P(X > 1)

d) These are disjoint intervals, so we can add the probability of each one. Each is length 0.1, so the total probability is

0.2 P(0.1

X

0.2 OR 0.8

X

0.9) = P(0.1

X

0.2) + P(0.8

X

0.9) = 0.1 + 0.1 = 0.2.

e) The probability of being inside the given interval is 0.5, so the probability of not being in the interval is 1–0.5 = 0.5.

More about normals:

**10 . Because P( z < 0.44) = 0.67, sixty-seven percent of all z values are less than 0.44, and 0.44 is the 67 th percentile of the standard normal distribution. Determine the value of each of the following percentiles for the standard normal distribution.

(If the cumulative area that you must look for does not appear in the z tables, use the closest entry.)

NOTE: z  is the z -score with an area of

to the right of it! See Z , t ,

2 , and F Concept Lab (Ch 8---black book). We will be using this notation later in the course, so I want you to become familiar with it. a. the 91 st percentile of Z = z

.09

= 1.34, P(Z < z

0.09

) = 0.91 or

By definition, z

0.09

is z -score with 9% of the curve above it, or P(Z > z

0.09

) = 0.09. We need to have < (or

) to use the Z

Table, however, so we use P(Z < z

0.09

) = 1

P(Z > z

0.09

) = 0.91 and look up 0.91 in the body of the table (the areas). b. the 77 th percentile of Z = z

.23

= 0.74

c. the 50 th percentile of Z = z

.50

= 0 d. the 9 th percentile of Z = z

.91

= -1.34 e. What is the relationship between the 70 th and 30 th z percentile?

The 70 th percentile has 30% above it and the 30 th percentile has 30% below it, so they have the same absolute value but the sign of the 70 th percentile is + and that of the 30 th percentile is

.

There multiple members of the Normal Family. Five are displayed in the graph at the right. a. Define the notation Z~ N( 0, 1 2 ):

Z is the random variable for which we are defining a distribution.

~ means “is distributed as,” and must be followed by some probability distribution . (remember SHAPE, CENTER,

SPREAD define a distribution)

N means the random variable is normally distributed, the shape is a bell-shaped curve.

0 means Z has mean 0, so the center is 0.

Lab#__________ Name___________________

Section________ Instructor________________

1 2 means Z has a standard deviation of 1, so the spread is 1. b. Which of the normals at the right has the largest mean?

The curve N(1, 1) has the largest mean (1), which can also be seen from the graph by noting its center of symmetry is most shifted to the right. c. Which of the normals at the right has the largest variance?

N(0, 4) = N(0, 2 2 ) evidently does, and this can also be seen from the graph by noting it is the distribution which is most

“spread out” about its mean. (from this notation we see its variance is 4)

12 . Consider the population of all one-gallon cans of dusty rose paint manufactured by a particular paint company. Suppose that a normal distribution with mean

= 5 ml and standard deviation

= 0.2 ml is a reasonable model for the distribution of the variable x = amount of red dye in the paint mixture. Use the normal distribution model to calculate the following probabilities.

NOTE: X ~ N(

= 5,

2 = 0.2

2 ) a. P ( X

5.4) = P(Z

(5.4-5)/0.2)=P(Z

2)=0.9772

b. P (4.6 < X < 5.2) = P((4.6-5)/0.2

Z

(5.2-5)/0.2)=P(-2

Z

1)=0.8413-0.0228=0.8185 c. P ( X > 4.5) = P(Z > (4.5-5)/0.2 )=P(Z>-2.5)=1-0.0062=0.9938

13 . A gasoline tank for a certain car is designed to hold 15 gallons of gas. Suppose that the variable x = actual capacity of a randomly selected tank has a distribution that is well approximated by a normal curve with mean 15.0 gal and standard deviation 0.1 gal. NOTE: X ~ N(

= 15,

2 = 0.1

2 ) a. What is the probability that a randomly selected tank will hold at most 14.8 gal?

P(X

14.8) = P(Z

(14.8

15)/0.1) = P(Z <

2) = 0.0228

b. What is the probability that a randomly selected tank will hold between 14.7 and 15.1 gal?

P(14.7

X

15.1) = P((14.7

15)/0.1

Z

(15.1

15)/0.1) = P(

3 < Z < 1) = 0.8413

0.0013 = 0.84

c. If two such tanks are independently selected, what is the probaility that both hold at most 15 gal?

If we have independent events, then we know that the probability is not altered by the other event(it says the same).

So, P(X

1

15 and X

2

15) = P(X

1

15)* P(X

2

15) = 0.5

2 = 0.25

**14 . The Wall Street Journal (Feb. 15, 1972) reported that General Electric was being sued in Texas for sex discrimination because of a minimum height requirement of 5 ft 7 in. The suit claimed that this restriction eliminated more than 94% of adult females from consideration. Let x represent the height of a randomly selected adult woman. Suppose that the probability distribution of x is approximately normal with mean 66 in. and standard deviation 2 in. NOTE: X ~ N(

= 66,

2 = 2 2 ) a. Is the claim that 94% of all women are shorter than 5 ft 7 in. correct? Find P ( X < 67 ).

P(X<67)=P( Z< (67-66)/2 ) =P(Z<0.5)=0.6915 Therefore, the claim is not correct. b. What proportion of adult women would be excluded from employment due to the height restriction?

Since P(X<67) = 0.6915 from part (a), only 69.15% of the adult women would be excluded from employment.

**15 Exercise 4.55

(a) Prob(fraction>1/2) = Prob(Z>(0.5-0.3)/0.023) = Pr(Z>8.7) = 0.

(b) Prob(fraction<0.25) = Prob(Z<-2.17) = 0.0150

(c) Prob(0.25<fraction<0.35) = Prob(-2.17<Z<2.17) = 0.9700

**16.Exercise 4.56

(a) Prob(fraction>0.16) = Prob(Z>(0.16 – 0.15)/0.0092) = Prob(Z>1.09) = 0.1379

(b) Prob(0.14<fraction<0.16) = Prob(-1.09<Z<1.09) = 0.7242

Section 4.4

**17. Exercise 4.60

The mean is 1.55 = 0x0.07 + 1x0.09 + 2x0.34 + 3x0.32 + 4x0.18 = 2.45

**18. Exercise 4.64

(a) Independent: weather conditions that far apart should be independent.

(b) Not independent: Weather patterns tend to persist for several days; today’s weather tells us something about tomorrow’s.

(c) Not independent: The tow locations are very close together and would likely have similar weather conditions .

19.Exercise 4.67

(a) The total mean 11 + 20 = 31 seconds.

(b) No: Changing the standard deviations does not affect the means.

Lab#__________ Name___________________

Section________ Instructor________________

(c) No: The total mean does not depend on dependence or independence of the two variables.

20. Exercise 4.82

(a)

Z

= 0.5*

X

+ 0.5*

Y

=

X

;

Z

=

(0.5

2

X

2 + 0.5

2

Y

2 ) = 0.5



X

2 =

X

/

2 = 0.707

X

=

X

/

 n where n =2

(b) If we consider X

1

+ X

2

to be X in part (a) and X

3

+ X

4

to be Y, we can see that the mean of Z is the same as the mean of X. The standard deviation will be 0.707 times the standard deviation of the result in (a), or about 0.49 the original standard deviation.

Z

=

X

/

4, the standard deviation of the average of 4 observations.

NOTE: the mean of averaged of any number of observations ( n ) is always the mean of the original population IF you take random samples. The standard deviation, however, will be 1/

 n the standard deviation of the original population, where n is the number in each sample.

Section 4.5

**4.97 Proving independence

The % Unemployment = %labor force NOT working. For “No High School”, the value is 1 – (11,139/12,073). Because the

4 percentages are not all the same, the unemployment rate is not independent of education.

No HS: 7.74%, HS, No college: 4.66%, Less than Bachelor’s: 4.07%, College grad: 2.74%

**4.99 Conditional probability

Probability (Graduate|Employed) = 36,259/114,490 = 0.3167. It is the % of Employed (Sum of Labor column) who graduated from college. Probability(Employed|Graduate) = 36,259/47,371 = 0.7654. It is the % of College graduates who are employed.

**4.110 Read about probability trees in the textbook.

Choosing surgery, there are only two ways to survive: with serious complications (B

1

, P(B

1

) = 0.10), but with 73% chance of then surviving for 5 years or without complications (B

2

, P(B

2

) = 0.85) with 76% chance of surviving 5 years. The product of surviving surgery times the conditional probability surviving 5 years with that particular surgery outcome is the probability of surviving surgery AND that particular outcome. P(A and B

1

) = 0.10(0.73), P(A and B

2

) = 0.85(0.76), so the total probability of A is P(A) = 0.10(0.73) + 0.85(0.76) = 0.719. Surgery is slightly better than medical management .

Download