Short Answer Questions that could appear on exams:
1. What is the population?
2. What is a sample?
3. List three problems with using the number 868/1523 (obtained from a Gallup poll) as the probability that an adult, out of all adults, bought a lottery ticket last year.
4. If the sample size in the Gallup poll went from 1523 to 6523, will the percentage that said they bought a lottery ticket most likely go up, most likely go down, or can you not tell?
5. If you take two samples of the same size from the same population, will the percentage that bought a lottery ticket be the same?
6. Which is likely to be closer: the percentages in two samples of size 5 from the same population, or the percentages in two samples of size 500 from the same population?
7. In a discrete probability model all the probabilities add up to what number?
8. In a continuous probability model what adds up to 1?
9. Give 3 ways of determining probability.
10. If I give you a coin, can you find exactly the probability it will land heads?
11. Suppose I give you a bent coin. How can you estimate the probability it will land heads?
12. Chance behavior has what property in the short run?
13. Chance behavior has what property in the long run?
14. When observing, do people tend to see the long run?
15. When observing, do people tend to give equal importance to all outcomes?
16. When observing, which outcomes do people tend to give more importance to?
17. Suppose airline A has three times as many flights out of a city as airline B. Which will have a higher percent of delayed flights: most likely A, most likely B, or you have no idea?
18. What is the notation for the population mean?
19. In a continuous distribution the mean is the area under what curve?
20. The area under $x\,p(x)$ gives what value in the continuous case?
21. The area under $p(x)$ gives what value in the continuous case?
22. The area under $|x-\mu|^2\,p(x)$ gives what value in the continuous case?
23. What is the meaning of $|x-\mu|$?
24. Variance is similar to $\frac{\sum |x-\mu|}{N}$, which tells what in everyday terms?
25. Is variance an average?
26. Variance is the average of what?
27. The variance and standard deviation measure what?
28. The mean measures what?
29. What is the notation for population variance?
30. What is the notation for population standard deviation?
31. Describe what the standard deviation is in words.
32. What is the area under the z curve?
33. What is the mean of the z curve?
34. What is the standard deviation of the z curve?
35. What is the formula for the z curve?
36. Describe how far a standard deviation is on the z curve.
37. On the z curve how much of the data is within 1 standard deviation of the mean?
38. On the z curve how much of the data is within 2 standard deviations of the mean?
39. On the z curve how much of the data is within 3 standard deviations of the mean?
40. For any probability distribution how much of the data is within 1 standard deviation of the mean?
41. For any probability distribution how much of the data is within 2 standard deviations of the mean?
42. For any probability distribution how much of the data is within 3 standard deviations of the mean?
43. If a set of data is normal with a mean of 40 and a standard deviation of 8, what shape will the data have if each piece of data has 40 subtracted and the result is then divided by 8?
44. If a set of data is normal with a mean of 40 and a standard deviation of 8, what will the mean of the data be if each piece of data has 40 subtracted and the result is then divided by 8?
45. If a set of data is normal with a mean of 40 and a standard deviation of 8, what will the standard deviation of the data be if each piece of data has 40 subtracted and the result is then divided by 8?
46. What is a parameter?
47. What is a statistic?
48. Most often what is calculated, a parameter or a statistic?
49. What is the notation for the sample mean?
50. What is the notation for the sample standard deviation?
51.
What is the notation for the sample variance?
52. If the sample is random, what is the best guess for $\mu$?
53. If the sample is random, what is the best guess for $\sigma$?
54. The Law of Large Numbers says that for what kind of samples $\bar{x}$ is more likely to be closer to $\mu$?
55. If you flip a fair coin and record the percentage of heads, you will get close to 50% for what two reasons?
56. If you flip a fair coin 10 times and get close to 50%, it will be mostly due to what?
57. If you flip a fair coin 1000 times and get close to 50%, it will be mostly due to what?
58. If you were to get all samples of the same size from a population with mean $\mu$, the mean of all these sample means would be what?
59. If you were to get all samples with replacement of size n from a population with standard deviation $\sigma$, the standard deviation of all these sample means would be what?
60. For large samples describe the difference between sampling with and without replacement.
61. If the original data is normal, what about the shape of all sample means from samples of the same size?
62. If the original data is not normal, what happens to the shape of all sample means from samples of size n as n goes up? What is the name of this theorem?
63. Consider data sets A:{25,26,26,25,24} and B:{15,25,38,22,40}. If you know one set of data is 5 individuals and the other is 5 averages, which is more likely to be the 5 averages? This is because the ___________ __________ of averages is ___________.
64. Explain why it makes sense that averages tend to have a smaller standard deviation than individuals.
65. Explain the difference between $\sigma$, $\sigma_{\bar{x}}$, and $s$.
66. Which two of the three should be close, and the other one is what compared to those two: $\sigma$, $\sigma_{\bar{x}}$, and $s$?
67. What does the z score tell us in terms of standard deviation?
68. We came up with the formula for P(A | B) by taking a sports team and making a fraction for P(W | H), and the top of the fraction represented what?
69.
We came up with the formula for P(A | B) by taking a sports team and making a fraction for P(W | H), and the bottom of the fraction represented what?
70. How can you turn “how many” into “probability”?
71. True or False: P(A and B) and P(A | B) are just two ways to write the same thing.
72. True or False: P(A | B) is the same as P(B | A).
73. Explain what P(A | B) = P(A) means in everyday terms.
74. If P(A | B) = P(A) as well as P(B | A) = P(B), we say that A and B have what property?
75. If X and Y are independent then P(X | Y) = what?
76. To figure out how many ways a multi-step process can be done you do what?
77. How many ways can n things be arranged in a row?
78. When finding out how many ways to pick 6 numbers from 42 numbers in a lottery in which order does not matter, we first (incorrectly) came up with what? We realized that each outcome was being counted how many times? So we divided by this number and came up with the correct answer of what?
79. The formula for the probability of getting exactly k successes in n trials, $\binom{n}{k} p^k q^{n-k}$, is a combination of what two main ideas?
80. When using the normal to approximate the binomial, the probability of exactly 6 successes is approximately the area from _____ to ____, and this is why we add or subtract 0.5 when using the normal approximation for the binomial.
81. In the binomial setting what does $n-k$ represent?
82. In the binomial setting what does $\binom{n}{k}$ represent?
83. In the binomial setting what does $nq$ represent?
84. Is it human nature to tend to pay more attention to anecdotes or all the data?
85. Which is more important to pay attention to, anecdotes or all the data?
86. Give an example of how data beat anecdotes.
87. What is a lurking variable?
88. Give an example of a lurking variable.
89. Is the mean sensitive to outliers?
90. Is the standard deviation sensitive to outliers?
91. Is the median sensitive to outliers?
92. Are the quartiles sensitive to outliers?
93.
Suppose you have data only summarized in different numerical ranges. How can you estimate the mean and standard deviation?
94. Why does the following graph make it look like drivers under 25 are the worst?
[Figure: bar graph titled “Accidents for drivers up to 50 years of age”. Vertical axis: accidents, 0 to 1000. Horizontal axis: age group (under 25, 25-29, 30-34, 35-39, 40-44, 45-50).]
95. Give two problems with the following graph.
[Figure: bar graph titled “Price for a set list of groceries”. Vertical axis: total cost, 103 to 109. Horizontal axis: store (Albertson's, City Market, Walmart, Safeway). Bar labels: 108.22, 107.66, 106.51, 103.22.]
96. Why do we do statistical graphs?
97. Let’s compare the percent of children abused in Idaho and Virginia. In Idaho it’s 22.6% and in Virginia it’s only 5.9%. Does this mean it is safer for children in Virginia? Explain.
98. How is it that in 1998 North Dakota, which was 45th in spending per pupil, had a much higher SAT average (by almost 200 points) than New Jersey, which was 2nd in spending per pupil?
99. Suppose in a big city it is found that in all fatal car accidents 25% of drivers were under the influence of alcohol and 75% were not. It seems that it is better to be drunk. Explain why this is not the case.
100. Are statistical conclusions about populations based on samples ever 100% sure?
101. A good graph will show that many people in Florida most likely voted for whom by mistake in 2000?
102. Explain why Colorado is probably doing better than Alabama in education despite the fact that Alabama has a higher SAT average than Colorado.
103. Our scatter plot of states with percent taking SAT and SAT average would probably show what if we colored the southern states’ dots a different color 70 years ago?
104. Our scatter plot of states with percent taking SAT and SAT average would probably show what if we colored the southern states’ dots a different color today?
105. What is the notation for the sample linear correlation coefficient?
106. What is the notation for the population correlation coefficient?
107. What does the least squares line minimize?
108. What happens if you switch x and y when finding correlation?
109. What happens if you switch x and y when finding the regression line?
110. What happens to r if the units of measurement on x and/or y are changed?
111. Why does r not change if the units of measurement on x and/or y are changed?
112. If you are 1.43 standard deviations taller than the mean when measured in inches, how many standard deviations above the mean will you be when measured in centimeters?
113. The linear correlation coefficient is always between what two numbers?
114. If there is a negative relationship, then r will be negative in part because bigger than average x’s will correspond to __________ than average y’s, making $\left(\frac{x-\bar{x}}{s_x}\right)\left(\frac{y-\bar{y}}{s_y}\right)$ the product of a _______ and a _______, which is ________.
115. Does r measure the strength of the relationship between x and y?
116. The only kind of relationship r measures is what?
117. Name three other relationships besides linear.
118. Is r sensitive to outliers?
119. Is the regression line sensitive to outliers?
120. A change in one standard deviation of x results in a change of ____ standard deviations in y.
121. What is the meaning of $r^2$?
122. If a scatter plot does not show a linear pattern can you still find the line of best fit?
123. If a scatter plot does not show a linear pattern should you still find the line of best fit?
124. If r is close to 1 or -1, is that enough of a reason to find the line of best fit?
125. Are predictions for y based on an x far beyond the range of x’s you have data for reliable?
126. Predicting a y based on an x far beyond the range of x’s you have data for is called what?
127. Which scatter plot shows a stronger relationship?
128. Which scatter plot will have a higher value of r?
129. If there is a strong correlation between x and y, does that mean that changing x will most likely bring about a change in y?
130.
Give an example in which there is a strong association between x and y, but there is no cause and effect.
131. There is a strong relationship between elementary kids’ grades and involvement in soccer. Explain how this could be true even if there is no cause and effect.
132. Give an example in which there is a fairly strong linear correlation between x and y but there is another variable contributing to the differences in y besides x. Name the two variables.
133. People are often interested in how one variable affects another. Give an example in which there are many variables involved and it is basically impossible to do so.
134. Do you think that people with an agenda will still try to show x affects y even if the setting is too complex with many variables interacting?
135. What was a possible lurking variable that would explain the high correlation between smoking and lung cancer without smoking actually causing lung cancer?
136. What is some really good evidence that there is not some gene that causes both lung cancer and nicotine addiction?
137. There is a strong correlation between education and wealth. Give a possible lurking variable that could explain this without education having a cause and effect on wealth.
138. If a person is motivated they are likely to become wealthy and also become educated. Do you think that motivation explains all the association between education and wealth, so in fact there is no cause and effect?
139. If a person is motivated they are likely to become wealthy and also educated. Do you think that motivation explains part of the association between education and wealth, so in fact the cause and effect still exists, but it is not as strong as many might think?
140. Give an example in which a lurking variable makes a cause and effect look weaker than it actually is.
141. Which scatter plot will have more scatter, or will they be about the same?
A) SAT math vs SAT verbal for individual students, B) SAT math vs SAT verbal for state averages.
142. If you try to predict an individual student’s SAT verbal from their SAT math using the regression line for state averages instead of individuals, will the prediction be too high, too low, or about right?
143. If you try to predict an individual student’s SAT verbal from their SAT math using the regression line for state averages instead of individuals, will the prediction be more reliable, less reliable, or have about the right amount of reliability?
144. With categorical data, what takes the place of a scatter plot?
145. What is Simpson’s Paradox?
146. Give an example of Simpson’s Paradox.
147. Which is always possible, an experiment or an observational study?
148. If done correctly, which controls lurking variables, an experiment or an observational study?
149. The investigators control which subjects get what treatments in which one, an experiment or an observational study?
150. Give an example in which an experiment changed the conclusions of an observational study.
151. Were the observational studies wrong that said women at menopause who had hormone replacement therapy had fewer heart attacks?
152. What was the lurking variable that in observational studies made it appear that hormone replacement at menopause made women have fewer heart attacks?
153. Give three lurking variables that may explain why drinking wine appears to be better than beer or hard liquor in observational studies.
154. How could it be proven that wine is better than beer or hard liquor when it comes to health?
155. Suppose a large florist is deciding whether or not to accept a shipment of roses. The florist asks a recently hired employee to go into the truck where the shipment is and get a sample of 10 roses. What do you think this employee will do? Would you be surprised if after accepting the shipment the florist is not happy with the overall quality of the roses?
156.
Give three biases with Mall Sampling.
157. True or false: Getting a good sample is usually pretty easy to do.
158. The reasons people get bad samples can be classified into what two categories?
159. Are volunteer response samples good?
160. Give an example of a volunteer response sample.
161. The AFA (American Family Association) has online polls. Usually these polls will have what kind of bias?
162. The AFA (American Family Association) got upset when an online poll about same sex marriage showed 2-1 in support of it. What happened?
163. The ultimate way to sample is to get what kind of sample?
164. How often are SRS’s possible?
165. Is it hard to get a bad sample?
166. Is it hard to get a good sample?
167. Give an example of undercoverage.
168. Give an example of nonresponse.
169. What is the problem with undercoverage and nonresponse?
170. Suppose a large city is deciding whether or not to use tax money to build a new stadium for its NFL football team. A newspaper is curious what the residents think and so they send out a mail questionnaire to 10,000 addresses picked at random. Do you think all 10,000 questionnaires will be returned? Do you think that even half will be returned? Do you think people who would like a new stadium and those who would not will have the same rate of mailing the questionnaires back? What sort of bias do you think will result if the newspaper relies only on the returned questionnaires? Should they put a story in their paper telling the residents what they think about the potential new stadium?
171. Does the wording of a question have much effect on the answers?
172. Give an example in which the wording of a question could make quite a difference.
173. Give an example of a sensitive question that could fail to give accurate results.
174. Give an example of a question in which people are forgetful and the results may not be accurate.
175. Give an example of a question asked by the wrong person that would make the results worthless.
176.
Give an example of a question that begs a certain answer and hence the results can’t be trusted.
177. We wish to perform an experiment to see whether an online version of a Stat course is better than an in-class version. We have data from two teachers. Teacher A teaches an online class and the average grade point for the students in this class is 2.94. Teacher B teaches a regular class and the average grade point in this class was 2.33. So we conclude the online version is better. What are three distinct problems with this experiment?
178. What is a control group?
179. What are the three principles of experimental design?
180. What does statistically significant mean?
181. What is a placebo?
182. What is the purpose of a placebo?
183. What is a double-blind experiment?
184. What is the purpose of a double-blind experiment?
185. Statistical significance depends on what two things?
186. Give an example of how lack of realism can cause problems in an experiment.
187. In a matched-pairs experiment, if each person gets both treatments, why is it still important to divide the people up at random?
188. What is the advantage of a block design?
189. In a CI as the confidence level goes up, what happens to the margin of error?
190. In a CI as the sample size goes up, what happens to the margin of error?
191. In a CI if the standard deviation gets higher, what happens to the margin of error?
192. All things being equal, do we prefer the margin of error to be big or small?
193. Which hypothesis {Ho or Ha} are we trying to prove in a HT?
194. If we have better evidence for Ha than for Ho, does that mean that Ho will probably be rejected?
195. If Ho is true, what is the probability that you will reject it by mistake?
196. If Ho is not true, what is the probability that you mistakenly fail to reject it?
197. What is the total area of the rejection region in a HT?
198. If you mistakenly reject Ho, what type of error is it?
199. If you mistakenly don’t reject Ho, what type of error is it?
200. What is the chance of making a type I error?
201. What is the chance of making a type II error?
202. What is the notation for the significance level?
203. What is the notation for the total area of the rejection region?
204. Generally speaking, which type of error is more important to keep small?
205. At the beginning we always assume what about Ho?
206. The pictures we draw in a HT show how ___________ would be distributed assuming __________.
207. The edge(s) of the rejection region(s) are called what?
208. Are the critical value(s) found by a table or calculation in this class?
209. The standardized number of the statistic(s) related to the parameter(s) in Ho is called what?
210. Is the test statistic found by a table or calculation?
211. What is the p-value in everyday terms?
212. In a right-hand tail test the p-value is the area to the ________ of the test statistic. This is because this area represents the chance of getting _________________ evidence against Ho, assuming Ho is _______.
213. In order to reject Ho, the p-value must be what compared to the significance level?
214. Which casts more doubt on Ho, a small p-value or a large p-value?
215. Are the conditions usually met exactly when doing CIs or HTs?
216. Is it rare to see any problems when doing CIs or HTs?
217. To be a good statistician, what should you do about not meeting conditions once the data is collected?
218. To be a good statistician, what should you do about any problems when doing HTs or CIs?
219. Do CIs and HTs remedy basic flaws in the data?
220. Give an example where a SRS is called for and not met, but this probably does not cause any bad problems.
221. Give an example where a SRS is called for and not met, and this causes the results to be useless.
222. Give an example where there was a high statistical significance of something occurring, but it was not what people thought at first.
223.
In the gastric freezing example, we were pretty sure patients were getting better. At first doctors thought it was ______________, but later experiments showed it was probably just because of the _________ effect. The problem was that at first the gastric freezing experiment was not _________.
224. Do outliers have much effect on the HTs and CIs we do in the class?
225. If you have an outlier that is found to be an incorrect piece of data and can’t be corrected, the best thing to do is what?
226. If you have an outlier that is found to be a real piece of data, should you remove it?
227. Does the margin of error in a CI fix nonresponse?
228. Does the margin of error in a CI fix undercoverage?
229. Does the margin of error in a CI fix biased data?
230. If your sample is not a random sample, can you be 95% sure that the CI has the correct answer for the parameter?
231. There is only one thing the margin of error in a CI covers. What is that?
232. What does the p-value mean in cases where the sample you use for the HT has problems with it?
233. What are three things that affect how small we want the p-value or the significance level to be?
234. Should we always use the 5% significance level?
235. If you have a small sample size, what will happen to the p-value if the same behavior is seen with a larger sample?
236. If you have a small sample size and the p-value is too high, should you just give up on rejecting Ho?
237. Does practically significant mean the same as statistically significant?
238. A small difference that nobody would care about in the real world, but that we are really sure about, is _____________ significant, but not _____________ significant.
239. When doing HTs is it best to first look at the data you collect before deciding on Ho and Ha?
240. Is it a good idea to do many different HTs to search for things that are true?
241. Why is it not a good idea to do many different HTs to search for things that are true?
242.
Is it a good idea to repeat the same HTs with different sets of data?
243. Why is it a good idea to repeat the same HTs with different sets of data?
244. If $\mu = 40$ and $\sigma = 8$ and the data is normal, what will be the mean of sample means of size 16?
245. If $\mu = 40$ and $\sigma = 8$ and the data is normal, what will be the standard deviation of sample means of size 16?
246. If $\mu = 40$ and $\sigma = 8$ and the data is normal, what will be the shape of sample means of size 16?
247. Gosset came up with the t distributions by trying to make what product have a high quality?
248. Suppose you have a large sample and use z in place of t. Will the difference be that noticeable?
249. Suppose you have a small sample and use z in place of t. Will the difference be that noticeable?
250. What is the area under a t curve?
251. What distribution is a t with $\infty$ degrees of freedom?
252. How often will we be able to exactly meet the conditions for CIs and HTs to be mathematically precise?
253. Generally speaking, is there more concern with doing HTs and CIs with small sample sizes or large sample sizes?
254. Name one problem with doing HTs and CIs with large sample sizes.
255. When using the z or t, why do we not really care about the normality of the data for large sample sizes?
256. If your degrees of freedom are not in the table what should you do?
257. If you reject an Ho assuming fewer degrees of freedom than you actually have, will you be able to reject Ho with the correct degrees of freedom?
258. If you reject an Ho assuming more degrees of freedom than you actually have, will you be able to reject Ho with the correct degrees of freedom?
259. If you give a 95% CI assuming fewer degrees of freedom than you actually have, you should be ____________ than 95% sure you have the correct answer in the CI.
260. If $\mu = 40$ and $s = 8$ based on a random sample of size 16 and the data is normal, what will be the mean of sample means of size 16?
261.
If $\mu = 40$ and $s = 8$ based on a random sample of size 16 and the data is normal, what will be the best estimate for the standard deviation of sample means of size 16?
262. If $\mu = 40$ and $s = 8$ based on a random sample of size 16 and the data is normal, what will be the shape of sample means of size 16?
263. Suppose you assume $\mu = 40$ and $s = 8$ and $n = 16$ and $\bar{x} = 42$ and you are trying to prove $\mu > 40$.
   a) Would it be better or worse if $s = 9$?
   b) Would it be better or worse if $n = 17$?
   c) Would it be better or worse if $\bar{x} = 43$?
264. In each case do you think that conditions are OK to do a HT or CI with the given data:
   a) You are comparing two means and your sample sizes are 5 and 8. The samples are random. There are no outliers but the shapes of the sample data are quite different.
   b) You are comparing two means and your sample sizes are 50 and 80. The samples are random. There are no outliers but the shapes of the sample data are quite different.
   c) You are comparing two means and your sample sizes are 5 and 8. The samples are random. There are no outliers and the shapes of the sample data are very close.
   d) You are comparing two means and your sample sizes are 50 and 80. The samples are random. There are no outliers and the shapes of the sample data are very close.
   e) You are comparing two means and your sample sizes are 5 and 8. The samples are random. There are two minor outliers and the shapes of the sample data are very close.
   f) You are comparing two means and your sample sizes are 50 and 80. The samples are random. There are two minor outliers and the shapes of the sample data are very close.
   g) You are comparing two means and your sample sizes are 5 and 8. The samples are random. There are two minor outliers and the shapes of the sample data are quite different.
   h) You are comparing two means and your sample sizes are 50 and 80. The samples are random. There are two minor outliers and the shapes of the sample data are quite different.
   i) You are studying a mean and have a sample of size 10. The sample data is symmetric with no outliers and the data was collected at random.
   j) You are studying a mean and have a sample of size 10. The sample data is not symmetric and there are no outliers and the data was collected at random.
   k) You are studying a mean and have a sample of size 10. The sample data is symmetric with a minor outlier and the data was collected at random.
   l) You are studying a mean and have a sample of size 100. The sample data is symmetric with no outliers and the data was collected at random.
   m) You are studying a mean and have a sample of size 100. The sample data is not symmetric with no outliers and the data was collected at random.
   n) You are studying a mean and have a sample of size 100. The sample data is symmetric with a minor outlier and the data was collected at random.
   o) You are studying the mean heights of all adult men and have a sample of size 1200. The sample data is all major league baseball players and it is symmetric with no outliers.
265. In each case the sample is not a SRS; do you think it will be OK to do a HT or CI with the given data?
   a) You are studying the mean number of gallons of milk sold per day by a store and your sample is 60 days all in a row.
   b) You are studying the mean number of gallons of milk sold per day by a store and your sample is 30 days starting with one day and picking every 7th day after that.
   c) You are studying the mean number of gallons of milk sold per day by a store and your sample is 30 days starting with one day and picking every 12th day after that.
   d) You are studying the mean drying time of paint on 2x4’s sold by a home improvement store and your sample is 40 boards all from the same shipment and the wood is pretty much the same from shipment to shipment.
   e) You are studying the mean drying time of paint on 2x4’s sold by a home improvement store and your sample is 40 boards all from the same shipment and the wood tends to vary quite a bit from shipment to shipment.
   f) You are studying the mean drying time of paint on 2x4’s sold by a home improvement store and your sample is 40 boards in which you choose 10 shipments spaced out over several months and then choose 4 boards from each shipment (one off the top, two from the middle, and one off the bottom).
   g) You are studying the mean drying time of paint on 2x4’s sold by a home improvement store and your sample is 240 boards all from the same shipment and the wood tends to vary quite a bit from shipment to shipment.
   h) You are studying the percents of cats that prefer two different types of cat food and your sample is 42 cats that were basically all the cats of all the people you knew really well who would participate.
   i) You are studying the percents of people that prefer two different types of beer and your sample is 42 prisoners in county jail.
   j) You are studying the difference in average weights of boy 4th graders and girl 4th graders and your samples are all 52 4th grade boys from a school in Mississippi and all 32 4th grade girls from a school in Colorado.
   k) You are studying the difference in average weights of boy 4th graders and girl 4th graders and your samples are all 52 4th grade boys from a school in Mississippi and all 32 4th grade girls from the same school.
266. For HTs and CIs for comparing means from two independent samples with small sample sizes, you want the samples to have similar __________ with no __________.
267. How can you get a good idea about the shape of a distribution?
268. How can you get a good idea if there are outliers?
269. For HTs and CIs for comparing means from two independent samples, if you knew the population standard deviations what distribution would you use?
270.
When comparing two means, we use what arithmetic operation to compare them?
271. When subtracting means from two independent samples {X and Y} of sizes $n_X$ and $n_Y$ with variances $\sigma_X^2$ and $\sigma_Y^2$, the variance is ______________. The standard deviation is _____________. The best estimate for $\sigma_X^2$ is ______ and the best estimate for $\sigma_Y^2$ is ________, so the best estimate for the standard deviation of $\bar{x}-\bar{y}$ is _______________.
272. Suppose $\mu_X = 40$ and $\mu_Y = 40$ and $\sigma_X = 8$ and $\sigma_Y = 9$ and $n_X = 14$ and $n_Y = 12$ and X and Y are normal. What will be the shape of the distribution of $\bar{x}-\bar{y}$?
273. Suppose $\mu_X = 40$ and $\mu_Y = 40$ and $\sigma_X = 8$ and $\sigma_Y = 9$ and $n_X = 14$ and $n_Y = 12$ and X and Y are normal. What will be the mean of the distribution of $\bar{x}-\bar{y}$?
274. Suppose $\mu_X = 40$ and $\mu_Y = 40$ and $\sigma_X = 8$ and $\sigma_Y = 9$ and $n_X = 14$ and $n_Y = 12$ and X and Y are normal. What will be the standard deviation of the distribution of $\bar{x}-\bar{y}$?
275. Suppose $\mu_X = 40$ and $\mu_Y = 40$ and $s_X = 8$ and $s_Y = 9$ and $n_X = 14$ and $n_Y = 12$ and X and Y are normal. What will be the approximate shape of the distribution of $\bar{x}-\bar{y}$?
276. Suppose $\mu_X = 40$ and $\mu_Y = 40$ and $s_X = 8$ and $s_Y = 9$ and $n_X = 14$ and $n_Y = 12$ and X and Y are normal. What will be the mean of the distribution of $\bar{x}-\bar{y}$?
277. Suppose $\mu_X = 40$ and $\mu_Y = 40$ and $s_X = 8$ and $s_Y = 9$ and $n_X = 14$ and $n_Y = 12$ and X and Y are normal. What will be the best guess for the standard deviation of the distribution of $\bar{x}-\bar{y}$?
278. What is the notation for the sample proportion?
279. What is the notation for the population proportion?
280. If the data is random what is the best guess for p?
281. How do you go from how many successes (Binomial) to proportion of successes?
282. If you divide the number of successes by n, the mean gets divided by ____ and the variance gets divided by ______.
283. The mean of the binomial is $np$, which divided by $n$ is ______, so the mean of p’ is _______.
284. The variance of the binomial is $npq$, which divided by $n^2$ is _______, so the standard deviation of p’ is __________.
285.
If a population is normal, then dividing by n will give it what shape?
286. The binomial is approximately normal when $np$ and $nq$ exceed _______, so $p'$ is also approximately normal under the same conditions.
287. To figure the sample size, n, needed for a CI for a proportion, it is safe to take p and q to be ______; this makes n as large as possible, and if n is larger than needed then the margin of error will be even _________ than what was asked for.
288. To figure the sample size, n, needed for a CI for a proportion, if you have a reasonable value for $p'$ and use it, then your CI may have a margin of error a little too big, but your sample size will be ________, making collecting the data easier.
289. Suppose p = .40. What is the approximate shape of the distribution of $p'$ for samples of size 200?
290. Suppose p = .40. What is the mean of the distribution of $p'$ for samples of size 200?
291. Suppose p = .40. What is the standard deviation of the distribution of $p'$ for samples of size 200?
292. For a HT for p we use $\sqrt{\frac{pq}{n}}$, because we assume Ho is _____ and so have a value for p. For a CI for p we use $\sqrt{\frac{p'q'}{n}}$ because we estimate p by ___ and q by _____.
293. When comparing two proportions, we use what arithmetic operation to compare them?
294. When subtracting proportions from two independent samples {X with sample size $n_X$ and proportion $p_X$, and Y with sample size $n_Y$ and proportion $p_Y$}, the standard deviations from X and Y are _______ and ______, the variances are ______ and ______; when subtracting, the variance is _______________ and the standard deviation is _________________.
295. The formula for the standard deviation of the difference of two proportions is $\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$. If we must estimate the p's with sample numbers without the assumption that the p's are equal, then the formula becomes what?
296. The formula for the standard deviation of the difference of two proportions is $\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$. If we estimate the p's with sample numbers with the assumption that the p's are equal, then the formula becomes what?
297. Suppose $p_X = .40$ and $p_Y = .40$ and $n_X = 140$ and $n_Y = 120$, what will be the mean of $p'_X - p'_Y$?
298. Suppose $p_X = .40$ and $p_Y = .40$ and $n_X = 140$ and $n_Y = 120$, what will be the standard deviation of $p'_X - p'_Y$?
299. Suppose $p_X = .40$ and $p_Y = .40$ and $n_X = 140$ and $n_Y = 120$, what will be the approximate shape of $p'_X - p'_Y$?
300. Are CIs and HTs about variances (or standard deviations) considered risky compared to CIs and HTs about means?
301. Is the z distribution symmetric?
302. Are the t distributions symmetric?
303. Are the $\chi^2$ distributions symmetric?
304. Are the F distributions symmetric?
305. What is the area under a $\chi^2$ curve?
306. What is the area under an F curve?
307. Suppose $\sigma^2 = 12$ and the population is normal, what is the shape of all the $\frac{df(s^2)}{\sigma^2}$ for samples of size 25?
308. When comparing variances, what arithmetic operation is used?
309. Suppose X and Y are independent normal populations. What is the shape of $\frac{s_X^2}{s_Y^2}$ where the sample sizes are of size 10 for X and size 8 for Y?
310. With the O and E stuff, why is the rejection region always to the right? It is _________ because if Ho is wrong then O and E will _______, making $\frac{(O - E)^2}{E}$ which is to the right.
311. The E's in the O and E stuff are found assuming what?
312. With a Test for Independence, why are the E's = (row total)(column total)/(grand total)? For the E that is for Row 2 and Column 3, E should be (grand total)(P(_________________)) = n(P(____)P(____)) because we assume that Ho is true, which is that the rows and columns are _____________. The best estimate for P(R2) = ______ and for P(C3) = __________, making the estimate (R2 total)(C3 total)/n.
313. With the O and E stuff, we want all the E's to be at least what to get good results?
314. What are the 4 assumptions for ANOVA?
315. If you do a good job of collecting data from different sources, the data will vary for only what two reasons?
316.
In ANOVA, to reject Ho: “all means equal” you hope the variance due to _________ is high and the variance due to ____________ is low.
317. Variance due to factor is a weighted ____________ of the sample ___________.
318. Variance due to error is a weighted ____________ of the sample ____________.
319. Suppose you have three normal populations with equal variances and you find $\bar{x}_1 = 5$, $s_1 = 11$, $\bar{x}_2 = 7$, $s_2 = 12$, $\bar{x}_3 = 9$, and $s_3 = 13$. Would you have better evidence for a difference in population means if $\bar{x}_3 = 10$ instead?
320. Suppose you have three normal populations with equal variances and you find $\bar{x}_1 = 5$, $s_1 = 11$, $\bar{x}_2 = 7$, $s_2 = 12$, $\bar{x}_3 = 9$, and $s_3 = 13$. Would you have better evidence for a difference in population means if $\bar{x}_3 = 8$ instead?
321. Suppose you have three normal populations with equal variances and you find $\bar{x}_1 = 5$, $s_1 = 11$, $\bar{x}_2 = 7$, $s_2 = 12$, $\bar{x}_3 = 9$, and $s_3 = 13$. Would you have better evidence for a difference in population means if $s_3 = 10$ instead?
322. Suppose you have three normal populations with equal variances and you find $\bar{x}_1 = 5$, $s_1 = 11$, $\bar{x}_2 = 7$, $s_2 = 12$, $\bar{x}_3 = 9$, and $s_3 = 13$. Would you have better evidence for a difference in population means if $s_3 = 15$ instead?
323. What four things should you check in addition to graphing a scatter plot before calculating the least-squares line?
324. Can you still do all the calculations for CIs and HTs if the data is bad?
325. Should you do all the calculations for CIs and HTs if the data is bad?
326. Give two advantages of Non-parametric statistics.
327. Give a disadvantage of Non-parametric statistics.
328. If you want to do a HT about the mean but the sample size is small and there is an outlier, you might instead do a HT about the __________ and use what Non-parametric test?
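The numeric questions above on differences of means and proportions (274, 291, and 298) all rest on the same rule: variances of independent quantities add, and dividing a count by n divides its variance by n squared. The following is a minimal Python sketch for checking your arithmetic on those questions, assuming independent samples. The function names are illustrative only and are not part of the course materials.

```python
import math


def sd_diff_means(sd_x, sd_y, n_x, n_y):
    """Standard deviation of (x-bar minus y-bar) for two independent samples.

    Variances add when subtracting independent quantities, and each
    sample mean has variance sd**2 / n.
    """
    return math.sqrt(sd_x ** 2 / n_x + sd_y ** 2 / n_y)


def sd_proportion(p, n):
    """Standard deviation of the sample proportion p' for samples of size n."""
    q = 1 - p
    return math.sqrt(p * q / n)


def sd_diff_proportions(p_x, n_x, p_y, n_y):
    """Standard deviation of (p'_X minus p'_Y) for two independent samples."""
    return math.sqrt(p_x * (1 - p_x) / n_x + p_y * (1 - p_y) / n_y)


# Values taken from questions 274, 291, and 298.
print(round(sd_diff_means(8, 9, 14, 12), 4))
print(round(sd_proportion(0.40, 200), 4))
print(round(sd_diff_proportions(0.40, 140, 0.40, 120), 4))
```

When only the sample standard deviations are known (as in question 277), the same first function gives the best guess by passing $s_X$ and $s_Y$ in place of $\sigma_X$ and $\sigma_Y$.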