A Catechism of YMS Strayhorn

A Catechism
for
The Practice of Statistics, Second Edition
This is the 6th version of this catechism, revised as of 2/2/2007
The excellent textbook was written by Daniel Yates, David Moore, and Daren Starnes,
and is copyright 2003, W.H. Freeman and Company.
This catechism was written by Joseph Strayhorn, in 2006, and is not copyrighted.
Strayhorn’s email address is joestrayhorn@juno.com
YMS Chapter 1: Exploring Data
Q1. The science of data is known as ____.
A1. Statistics
Q2. Most raw data sets can be organized into rows and columns. Each row represents
some object or person that is studied, and each column represents some characteristic
about that thing that is measured. Our textbook calls those objects and characteristics
what two things respectively?
A2. Individuals and variables
Q3. What are the two main classes of variable types?
A3. Categorical and quantitative
Q4. A description, depiction, or equation telling what values a variable takes on and how
often it takes on these values is called the ___ of the variable.
A4. Distribution
Q5. Before studying the relationships among variables, it's usually good to begin by
examining what?
A5. Each variable by itself
Q6. Before getting numerical summaries of the data, your textbook advises exploring the
data with what?
A6. Graphs
Q7. What two types of graphs are usually most appropriate for categorical data?
A7. Bar charts and pie charts
Q8. If several percentages do not represent portions of the same whole, then what type of
graph is inappropriate?
A8. A pie chart
Q9. When you are asked to describe a distribution after looking at a graph, the general
tactic is to look for an overall pattern and also for striking deviations from that pattern.
When describing the overall pattern, what three features should you mention?
A9. Center, shape, and spread
Q10. When you are asked to describe a distribution, the general tactic is to look for an
overall pattern and also for striking deviations from that pattern. What are the striking
deviations called?
A10. Outliers
Q11. Someone wants to display the center, shape, and spread of a data set with a picture.
But the person also wants to communicate, through the same graph, the individual raw
data values that were collected in the study. There are too many different values that the
variable takes on to make a dot plot feasible. What type of graph should the person
choose?
A11. A stem plot
Q12. Instead of a dot plot or a stem plot, a ____ is the most common graph of the
distribution of one quantitative variable.
A12. Histogram
Q13. What does your textbook depict as a minimum number for either the number of
stems in a stem plot, or the number of classes in a histogram?
A13. Five
Q14. If the right and left sides of a histogram are approximately mirror images of each
other, we call the distribution what?
A14. Symmetric
Q15. If there's a big hump on the left side of a histogram and a long tail extending far out
to the right, do we say that the distribution is skewed right or skewed left?
A15. Skewed right
Q16. If you look at people's incomes, defining income so that zero is the smallest
possible value, and your sample includes mainly middle income people but also a few
extremely high income people, will the distribution be skewed right or skewed left?
A16. Skewed right
Q17. Mary gets a test report saying that 79% of the test takers fell at or below the score
that she made. The name of the type of score she got is what?
A17. Percentile
Q18. A relative cumulative frequency graph is often called what?
A18. Ogive.
Q19. In a relative cumulative frequency graph, or ogive, the horizontal axis is for the
values of the variable you are looking at. For any given value on the horizontal axis,
what does the value on the vertical axis stand for?
A19. The fraction of observations less than or equal to that value
Q20. If you are given a relative cumulative frequency graph, and someone asks you to
find the center of the distribution, how do you do it?
A20. Find the value on the x-axis that has a 50% or .5 value on the y-axis.
Q21. On a time plot, what axis does time go on?
A21. The horizontal axis
Q22. On a time plot, an overall upward or downward slope is called what?
A22. A trend
Q23. On a time plot, what do you call the shorter-term variations that occur regularly,
repeating themselves in a cyclic fashion?
A23. Seasonal variation
Q24. 1/n times the summation of the x(i), where n is the number of cases and x(i) is the
value of the ith case, is known as what?
A24. The mean
Q25. The number in a distribution such that half the observations are smaller and the
other half are larger is called what?
A25. The median
Q26. If there is no middle value in a data set because you have an even number of cases,
how do you find the median then?
A26. You find the mean of the two center observations.
Q27. Between the mean and median, which of these is pulled farther in the direction of
extreme values or outliers?
A27. The mean.
Q28. If a distribution is highly skewed to the right, which value will be lower: the mean,
or the median?
A28. The median.
Q29. From which statistic, the mean or the median, can you recover the total value of all
the cases in your data set, if you know how many cases there are?
A29. The mean
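A small sketch of Q25 through Q29 in Python. The income figures are made up for illustration (they are an assumption, not data from the textbook). With one extreme value on the right, the mean is pulled up while the median stays put, and only the mean lets you recover the total.

```python
from statistics import mean, median

# Hypothetical incomes in thousands of dollars, with one extreme value on the right.
incomes = [30, 35, 40, 45, 50, 400]

income_mean = mean(incomes)      # pulled toward the extreme value
income_median = median(incomes)  # even n: mean of the two center values, 40 and 45

# The total can be recovered from the mean and n, but not from the median.
recovered_total = income_mean * len(incomes)

print(income_mean, income_median, recovered_total)
```

In this right-skewed example the median (42.5) is lower than the mean (100), which matches A28.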
Q30. What's the definition of the range of a distribution?
A30. The difference between the largest and smallest value
Q31. What's the chief problem with using the range as a measure of the spread of a
distribution?
A31. It's too sensitive to outliers, and it depends on only two values in the data set.
Q32. What do you call the median of the subset of observations whose position in the
ordered list is to the left of the overall median?
A32. The first quartile
Q33. What's the definition of the interquartile range?
A33. The third quartile minus the first quartile.
Q34. What's the rule of thumb for defining outliers in terms of the interquartile range?
A34. An outlier falls more than 1.5 times the interquartile range above the third quartile
or below the first quartile.
Q35. What five numbers are in the so-called five number summary?
A35. The minimum, the first quartile, the median, the third quartile, and the maximum.
Q36. What type of graph gives a picture of the five number summary?
A36. The box plot
Q37. What's the difference between a regular box plot and a modified box plot?
A37. In a regular box plot, the whiskers go out to the maximum and minimum. In a
modified box plot, the whiskers go out to the largest and smallest data points that are not
outliers. The outliers are plotted as isolated points on a modified box plot.
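The quartile definition of Q32 and the 1.5 times IQR rule of Q34 can be sketched in a few lines of Python. The function names and the data are hypothetical, chosen only for illustration. Note that statistical software often uses slightly different quartile conventions, so results may differ a little from this median-of-halves approach.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]          # observations to the left of the overall median
    upper = data[half + n % 2:]  # observations to the right of the overall median
    return data[0], median(lower), median(data), median(upper), data[-1]

def outliers(values):
    """Flag values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # made-up data
print(five_number_summary(scores))  # (1, 3, 5.5, 8, 30)
print(outliers(scores))             # [30]
```

On a modified box plot of these scores, the upper whisker would stop at 9 and the value 30 would be plotted as an isolated point.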
Q38. If you take the deviation of each observation from the mean of the whole set,
square those deviations, add those squares, and divide by one less than the number of
observations, what do you call the resulting number?
A38. The variance
Q39. What is the relationship between the variance and the standard deviation?
A39. The standard deviation is the square root of the variance.
Q40. How is the standard deviation like the interquartile range?
A40. Both of them are measures of spread of the distribution.
Q41. When you average the squared deviations from the mean to find the variance of a
sample, what should you divide by: the n of cases, or the "degrees of freedom"?
A41. The degrees of freedom
Q42. Under what conditions will a standard deviation equal zero?
A42. When all the observations have the same value.
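The recipe in Q38 through Q42 translates directly into code. This is a minimal sketch with made-up numbers; the function names are hypothetical, and Python's own statistics.variance is used only as a cross-check.

```python
import math
from statistics import variance

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

def sample_sd(values):
    """The standard deviation is the square root of the variance."""
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 6]  # mean 4; squared deviations 4, 0, 0, 4
print(sample_variance(data))   # 8 / 3, dividing by the degrees of freedom (n - 1 = 3)
print(variance(data))          # the library version agrees
print(sample_sd([5, 5, 5]))    # 0.0 when all observations are equal
```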
Q43. Between the interquartile range and the standard deviation, which is more resistant
to the effects of the outliers?
A43. The interquartile range
Q44. How do you choose between the five number summary on the one hand, and the
mean and standard deviation, on the other hand, as ways of describing a distribution?
A44. The mean and standard deviation are good for reasonably symmetric distributions
that are free of outliers. Otherwise the five number summary is usually better.
Q45. If you add the same number to each observation, how does that affect the center
and the spread of the distribution?
A45. The number that you add is added to the measures of center, such as the mean and
median. But measures of spread, such as the interquartile range and standard deviation,
are not affected.
Q46. If you multiply each observation by the same number, how does that affect
measures of center and spread?
A46. Both the measures of center (median and mean) and the measures of spread
(standard deviation and interquartile range) are multiplied by the same number. (The
variance, which is also a measure of spread, is multiplied by the square of the number
each observation is multiplied by.)
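The answers to Q45 and Q46 can be checked numerically. The data set below is made up, and the constants 10 and 3 are arbitrary choices for the sketch.

```python
from statistics import mean, median, stdev, variance

data = [2, 4, 4, 6, 9]             # made-up observations
shifted = [x + 10 for x in data]   # add the same number to each observation
scaled = [x * 3 for x in data]     # multiply each observation by the same number

# Adding a constant moves the center but leaves the spread alone.
print(mean(shifted) - mean(data), median(shifted) - median(data))  # both 10
print(stdev(shifted), stdev(data))                                 # equal

# Multiplying by a constant multiplies center and spread by that constant,
# and multiplies the variance by its square.
print(mean(scaled) / mean(data), stdev(scaled) / stdev(data))      # both 3 (up to rounding)
print(variance(scaled) / variance(data))                           # 9 (up to rounding)
```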
Q47. What are three graphical methods of comparing distributions?
A47. Side by side bar graphs, back-to-back stem plots, and side-by-side box plots.
YMS Chapter 2: The Normal Distribution
Q1. The scales of density curves are adjusted so that the total area under each curve is
what?
A1. One
Q2. The area under a density curve between two x-axis values represents what?
A2. The proportion of all observations that fall between those values.
Q3. Do measures of center and spread apply to a density curve as well as to sets of
observations?
A3. Yes
Q4. How do you define the median of the density curve?
A4. The point with half the area under the curve to its left and the remaining half of the
area to its right.
Q5. The quartiles of a density curve divide the area into what?
A5. Four equal parts.
Q6. What is the relationship between the mean and the median of a symmetric density
curve?
A6. They are equal.
Q7. Which is pulled the farther toward the tail of a skewed distribution: the median, or
the mean?
A7. The mean
Q8. In conventional notation, what are the meanings of x-bar and s, as contrasted to mu
and sigma?
A8. The first two refer to the mean and standard deviation, respectively, of a set of
observations, a sample. The second two refer to the mean and standard deviation,
respectively, of a density curve or idealized distribution, or the population distribution.
Q9. What three features describe the overall shape of a normal curve?
A9. Normal curves are symmetric, single peaked, and bell shaped.
Q10. Is there only one normal curve, or is there an infinite number of normal curves?
A10. An infinite number.
Q11. For any given mean and standard deviation, is there only one normal curve, or an
infinite number of normal curves?
A11. Only one.
Q12. How can you visually find the points one standard deviation from the mean of a
normal curve?
A12. Those points are the inflection points of the curve. That is, the curve changes from
falling more and more steeply to falling less and less steeply, or vice versa. (Optional
answer for calculus lovers: they are points where the second derivative of the curve
equals zero.)
Q13. The distributions of test scores, of measures of characteristics of living things, and
of summary statistics for chance outcomes repeated many times, often (but not always!)
follow what type of distribution?
A13. The normal distribution
Q14. What three percentages do you have to remember when you are stating the
“empirical rule”?
A14. 68%, and 95%, and 99.7%.
Q15. Are the three percentages for 1, 2, and 3 standard deviations exact, or
easier-to-remember rounded approximations?
A15. Approximations.
Q16. What do the three percentages in the empirical rule apply to? in other words, what is
the meaning of this rule?
A16. The three numbers tell the percent of observations falling within the region plus or
minus 1, 2, or 3 standard deviations from the mean, respectively, in a normal curve. (Note
that the percents refer to the percent of observations encompassed by the interval from
that number of standard deviations below the mean to that number above the mean.)
Q17. True or false: If Mary scores one standard deviation above the mean on a normally
distributed test, then approximately 68% of the test takers scored as close to the mean of
the test as, or closer to the mean than, Mary did.
A17. True
Q18. True or false: If Mary scores one standard deviation above the mean on a
normally distributed test, her score is in the 68th percentile.
A18. False
Q19. True or false: if Mary scores one standard deviation above the mean on a normally
distributed test, half of 68% or 34% are above the mean but at or below Mary’s score.
An additional 50% are below the mean. Thus Mary equals or surpasses 50% plus 34% of
the test takers, and is at the 84th percentile.
A19. True
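The empirical-rule percentages of Q14 through Q19 can be reproduced with the standard normal curve. This sketch uses NormalDist from Python's standard library in place of a printed z table; the variable names are just illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # the standard normal curve: mean 0, sd 1

# Proportion of a normal distribution within 1, 2, and 3 sd's of the mean.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, proportion in within.items():
    print(k, round(proportion, 4))  # 0.6827, 0.9545, 0.9973

# Mary, one sd above the mean: 50% below the mean plus half of about 68%.
mary_percentile = std_normal.cdf(1)
print(round(mary_percentile, 4))    # 0.8413, about the 84th percentile
```

The printed values show why 68%, 95%, and 99.7% are rounded approximations rather than exact figures.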
Q20. What does the notation N(100,15) mean?
A20. It denotes a normal distribution with mean 100 and standard deviation 15.
Q21. True or false: the standard score for any observation tells how many standard
deviations that score is from the mean.
A21. True
Q22. What two operations do we do, to standardize a score?
A22. Subtract the mean and divide by the standard deviation.
Q23. A standard score is often called by what other term?
A23. The z-score.
Q24. What does the sign of a standard score correspond to?
A24. If the z-score is positive, it’s above the mean, and if negative, below the mean.
Q25. Are there an infinite number of standard normal curves, each with its own
equation describing it, or just one standard normal curve, with just one equation
describing it?
A25. There is just one standard normal curve, with only one equation describing it.
Q26. In a table of areas under the standard normal curve, what does the table entry for
each z score represent?
A26. The area under the curve to the left of z, or in other words, the proportion of cases
with values less than z.
Q27. What steps do you follow to use the z table to solve the following “problem
prototype”: given N(mu, sigma), please find the proportion of observations less than x?
A27. First we standardize x (by subtracting mu and dividing by sigma). Then we look at
the z table to find the proportion of the distribution less than the z score we’ve obtained.
Q28. What steps do you follow with a z table if you want to know what proportion of the
scores are between two values?
A28. Look up the proportion less than the first, and less than the second, and find the
difference between the two proportions.
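The two z-table procedures in Q27 and Q28 can be sketched as follows. Here NormalDist().cdf stands in for the printed z table, the function names are hypothetical, and N(100, 15) is borrowed from Q20 purely as an example.

```python
from statistics import NormalDist

def proportion_below(x, mu, sigma):
    """Standardize x, then look up the area to the left of the z-score."""
    z = (x - mu) / sigma
    return NormalDist().cdf(z)

def proportion_between(a, b, mu, sigma):
    """Look up both proportions and take the difference."""
    return proportion_below(b, mu, sigma) - proportion_below(a, mu, sigma)

print(round(proportion_below(115, 100, 15), 4))        # z = 1, about 0.8413
print(round(proportion_between(85, 115, 100, 15), 4))  # z from -1 to 1, about 0.6827
```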
Q29. What two pictures do YMS recommend drawing when solving problems where you
are given a normal curve and asked for proportions of the observations?
A29. They recommend drawing unstandardized and standardized normal curves, and
shading in the areas that are asked for.
Q30. What do the authors recommend (as a word to the wise for future test-takers) as the
last step of problems giving a normal distribution and asking for proportions of
observations?
A30. They recommend stating the conclusion in the context of the problem. Thus rather
than just saying, the answer is 49%, you would say, “About 49% of boys have cholesterol
levels between 170 and 240 mg/dl.”
Q31. What steps do you go through when you want to find a value given a proportion of
a normal distribution, using the z-table?
A31. You look for the proportion in the body of the table, and you find at the margin the
z-score that corresponds to it. Then you “unstandardize” the z-score.
Q32. What operations do you do, in what order, to “unstandardize” a z-score, or turn the
z-score into a raw score?
A32. You multiply the z score by the standard deviation, and then you add it to the mean.
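Going backward, from a proportion to a raw score as in Q31 and Q32, might look like this. NormalDist().inv_cdf plays the role of reading the z table from the body out to the margin; the function name and the N(100, 15) example are assumptions made for the sketch.

```python
from statistics import NormalDist

def value_for_proportion(p, mu, sigma):
    """Find the raw score with proportion p of a normal distribution below it."""
    z = NormalDist().inv_cdf(p)  # the z-score that has area p to its left
    return mu + z * sigma        # unstandardize: multiply by sigma, then add mu

# The score with 90% of an N(100, 15) distribution below it.
print(round(value_for_proportion(0.90, 100, 15), 2))  # about 119.22 (z is about 1.28)
```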
Q33. Suppose you have a data set, and you want to see if it is approximately normally
distributed. What’s the first thing to do, before doing calculations?
A33. Make a frequency histogram or stemplot, and see if the curve looks bell-shaped and
symmetric.
Q34. What’s a way of checking a data set for normality, using the empirical rule?
A34. Find the mean and sd of the data set, and count (or get a computer to count) the
proportions of observations that are within 1, 2, and 3 standard deviations of the mean.
See if these proportions correspond, roughly, to .68, .95, and .997.
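Getting a computer to do the counting in A34 could look like the sketch below. The data are simulated from a normal distribution with a fixed seed (an assumption for illustration; a real check would use your own data set), and the function name is hypothetical.

```python
import random
from statistics import mean, stdev

def fraction_within(values, k):
    """Fraction of observations within k standard deviations of the mean."""
    m, s = mean(values), stdev(values)
    return sum(abs(x - m) <= k * s for x in values) / len(values)

rng = random.Random(42)  # fixed seed so the run is repeatable
data = [rng.gauss(100, 15) for _ in range(5000)]  # simulated, roughly normal data

for k in (1, 2, 3):
    # Compare these with .68, .95, and .997
    print(k, round(fraction_within(data, k), 3))
```

Rough agreement with the three percentages is consistent with normality; a clear mismatch suggests the data are not normally distributed.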
Q35. True or false: The point of making a normal probability plot is to see whether a set
of numbers is normally distributed.
A35. True
Q36. If you were to take any set of numbers, and plot the numbers on the x-axis, and their
z-scores on the y-axis, you would get what shape for your graph? (Hint 1: y values are
(1/sd)*x - mean/sd, or of the form y=mx+b.) (Hint 2: Standardizing a score involves a
linear transformation.)
A36. A line.
Q37. On a normal probability plot for a set of observations, what goes on the x axis?
A37. The values of the observations themselves.
Q38. True or false: On a normal probability plot, what goes on the y-axis for each x value
is the z score that would be associated with the percentile for that value, assuming a
normal distribution (and using midpoints of intervals in finding percentiles)?
A38. True.
Q39. What conclusion do you come to if a normal probability plot is not linear?
A39. That the data are not normally distributed.
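The coordinates of a normal probability plot, as described in Q37 and Q38, can be computed with a short sketch like the one below. The percentile convention (i - 0.5)/n is one common reading of "midpoints of intervals" and is an assumption here; software packages use several slightly different conventions. The function name and data are hypothetical.

```python
from statistics import NormalDist

def normal_probability_points(values):
    """Pair each sorted observation (x) with the z-score of its percentile (y)."""
    data = sorted(values)
    n = len(data)
    std_normal = NormalDist()
    # Assumed convention: the i-th smallest value (i = 1..n) sits at the
    # midpoint of its interval, so its percentile is (i - 0.5) / n.
    return [(x, std_normal.inv_cdf((i - 0.5) / n))
            for i, x in enumerate(data, start=1)]

points = normal_probability_points([12, 7, 9, 15, 10])  # made-up data
for x, z in points:
    print(x, round(z, 2))
```

Plotting these pairs and checking whether they fall close to a straight line is the visual test the questions above describe.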
YMS Chapter 3: Examining Relationships
Q1. Suppose that a researcher wants to study the effect of people’s ever having taken the
drug "ecstasy" upon the people’s memory scores when tested. Which of these is the
response variable and which is the explanatory variable?
A1. Ingestion of ecstasy is the explanatory variable and memory test scores are the
response variable.
Q2. How do the terms "dependent variable" and "independent variable" correspond to the
terms “response variable” and “explanatory variable”?
A2. Dependent corresponds to response, and independent corresponds to explanatory.
Q3. Is it proper to use the terms, response variable and explanatory variable, if the
explanatory variable does not actually cause the response variable?
A3. Yes. No implication of causation is contained in the terms explanatory and response
(or independent and dependent).
Q4. What's the order of tasks involved in examining relationships between two
variables?
A4. First plot the data, then use numerical summaries. Look for the overall pattern and
deviations from that pattern, and when the overall pattern is regular, use a mathematical
model to describe it.
Q5. Suppose that someone has math scores for the children in one classroom, and English
scores for a second set of children in another classroom. The person asks you about
making a scatterplot for these data. What would you reply?
A5. It doesn't make sense to use a scatterplot in this situation, because a scatterplot
involves graphing two variables measured upon the same individuals.
Q6. True or false: in a scatterplot, each point represents one individual; the x-coordinate
of the point represents the value of one variable and the y-coordinate represents the value
of another variable measured on that same individual.
A6. True
Q7. If there is an explanatory variable, which axis should it be graphed on?
A7. The x-axis
Q8. When describing a scatterplot, what three aspects of the pattern should you refer to?
A8. The form, the direction, and the strength of the relationship.
Q9. True or false: in describing the form of a scatterplot, it's important to say whether the
graph appears to be linear or not.
A9. True
Q10. In describing the form of a scatterplot, what term do you use if the values tend to
fall into two or more groups that are separated from one another by gaps?
A10. You say that there are clusters.
Q11. In describing the direction of a scatterplot, when there is a positive or negative
slope, we say that the variables are positively or negatively what?
A11. Positively or negatively associated.
Q12. When any given x value on a scatterplot has very widely varying y values
associated with it: the more widely varying the y values, the less _____ is the relationship
between the two variables.
A12. strong
Q13. When you are drawing a scatterplot, what symbols should you use in showing the
axes if the origin of the graph is not at zero?
A13. You use a symbol that looks like two slashes to indicate a break in the scale.
Q14. What are about 3 other guidelines on how to draw scatterplots properly?
A14. Make the intervals uniform. Label both axes. Choose a scale that makes your
graph big enough.
Q15. Suppose that you want your scatterplot to reflect the influence of a particular
categorical variable, in addition to the relationship of the two quantitative variables that
are plotted. For example, suppose you want to graph the relation between entertainment
violence and real-life violence for males and females on the same graph, in such a way
that displays the relationship separately for males and females. What should you do?
A15. Use a different symbol on the scatterplot for the points designating males than for
those designating females.
Q16. A common problem in constructing a scatterplot occurs when two or more
individuals have exactly the same values for each of the two variables. What should you
do in that case?
A16. Use a different plotting symbol to call attention to those points.
Q17. Which is a better method for judging the strength of a linear relationship: simply to
look at a graph, or to use a calculated numerical statistic that summarizes the strength of
the linear relationship? Why do you think your chosen method is better?
A17. A calculated statistic works better, because our eyes can be deceived by the different
scaling methods used in graphs.
Q18. What is the summary statistic that measures the strength of a linear relationship?
A18. The correlation coefficient.
Q19. We’ve used the Greek letter mu to represent a population mean; x-bar to represent a
sample mean; the Greek letter sigma to represent the population standard deviation, and s to
represent the sample standard deviation. What letter does our book use to designate what it
calls the correlation?
A19. r
Q20. Given that the letter r, for the correlation coefficient, is in our own alphabet and not
the Greek alphabet, do you think it refers to a sample statistic or a population parameter?
A20. sample statistic
Q21. Would you guess that there is some other Greek letter that refers to the population
value of the correlation coefficient?
A21. Yes. (It’s the letter rho, which looks pretty much like a p!)
Q22. When you look at the formula for the sample correlation coefficient that your text
gives, you see (xi-xbar)/sx and (yi-ybar)/sy. Can you give a simpler name to these
expressions?
A22. These are the standard scores, or z-scores, for the ith individual. The first factor is
the z-score for the x variable and the second is the z-score for the y variable.
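As a sketch of the formula discussed in Q22, the sample correlation can be computed as the sum of the products of the paired z-scores, divided by n - 1. The function name and the data are made up for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sum of products of paired z-scores, divided by n - 1."""
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    products = [((x - xbar) / sx) * ((y - ybar) / sy) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

xs = [1, 2, 3, 4, 5]  # made-up data
ys = [2, 1, 4, 3, 5]
print(round(correlation(xs, ys), 4))  # 0.8
print(round(correlation(ys, xs), 4))  # 0.8 again: swapping the roles changes nothing
```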
Q23. What is the meaning of a positive and negative sign associated with the correlation
coefficient?
A23. A positive sign means there’s a positive association between the variables; in other
words, higher values of one are associated with higher values of the other. A negative
sign means there’s a negative association; that is, higher values of one variable are
associated with lower values of the other.
Q24. Suppose one person calculates the correlation of IQ score of some individuals with
number of boxing matches fought, testing the hypothesis that boxing (the explanatory
variable) affects IQ (the response variable). A second person, using the same data set,
also calculates the correlation of the number of fights with IQ score, only this person
thinks of IQ as the explanatory variable and number of fights as the response variable.
Do they get the same correlation, or different ones?
A24. The same correlation. The correlation coefficient is not affected by which variable
is considered explanatory and which is considered response.
Q25. Suppose someone codes race as follows: 0=Caucasian, 1=African American,
2=Asian, 3=Hispanic, 4=American Indian, 5=Other. Then someone calculates a
correlation between race and a reading test score for a sample of kids. Do you have a
problem with this? If so, what’s your problem?
A25. The problem is that the correlation coefficient is to be used with quantitative
variables, not categorical variables like this. The obtained correlation would be
meaningless, and an artifact of the arbitrary coding system.
Q26. Melinda computes a correlation between the height of mothers and their daughters.
Lunk is looking at the computations, and says, “You blew it! You have the height of
mothers measured in centimeters, and the height of the daughters measured in inches!”
Please tell whether Melinda needs to do anything to fix her correlation coefficient, and if
so, what?
A26. Melinda did not blow it, because the correlation coefficient comes out the same no
matter what units are used. (This is because a transformation from one unit to another
(which involves multiplying each number in the data set by the same number) multiplies
both the mean and the standard deviation of the data set by the same number (as was
learned in chapter 1). The z-score, which is (xi-xbar)/sx comes out the same, because
each of the three numbers that make up the z-score is multiplied by the same factor, and
that factor cancels out. Since the z-scores are not affected by changes of units, the
correlation coefficient is also not affected.)
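Melinda's situation can be checked numerically. In this sketch the heights are invented, the helper function is hypothetical, and 2.54 cm per inch is the conversion factor.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sum of products of paired z-scores, divided by n - 1."""
    xbar, ybar, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum((x - xbar) / sx * (y - ybar) / sy
               for x, y in zip(xs, ys)) / (len(xs) - 1)

mothers_cm = [160, 165, 170, 175, 180]   # made-up heights in centimeters
daughters_in = [63, 66, 65, 69, 70]      # made-up heights in inches
mothers_in = [cm / 2.54 for cm in mothers_cm]

r_mixed_units = correlation(mothers_cm, daughters_in)
r_same_units = correlation(mothers_in, daughters_in)
print(round(r_mixed_units, 6), round(r_same_units, 6))  # the two agree
```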
Q27. What range of values is possible for the correlation coefficient?
A27. -1 to +1.
Q28. What sort of correlation coefficient do you find when two variables have a very
strong linear relationship, and when as the first gets greater, the second gets smaller?
A28. A correlation close to –1.
Q29. Suppose the data points are two variables collected for all the days of 2005. For
each of those days, imagine that we know (variable 1) the number of words your
instructor for this course spoke in that day, and (variable 2) the peak barometric pressure
for that day in Caracas, Venezuela. About what would you guess the correlation between
these two variables to be? Why?
A29. You’d guess a correlation of about 0, since there is no reason to expect that these
two variables would rise and fall in association with each other.
Q30. Suppose there are two variables which, when graphed in a scatterplot, form an
almost perfect u-shaped parabola. Would the strong relationship between these variables
imply a high correlation coefficient (meaning one close to 1)? Why or why not?
A30. No, because the correlation coefficient measures the strength of linear relationships
only, not curvilinear relationships. A u-shaped curve isn’t a straight line!
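A quick numerical illustration of A30, using a perfect parabola with x values symmetric about zero. The data and helper function are assumptions made for this sketch.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sum of products of paired z-scores, divided by n - 1."""
    xbar, ybar, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum((x - xbar) / sx * (y - ybar) / sy
               for x, y in zip(xs, ys)) / (len(xs) - 1)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # y is completely determined by x, but not linearly

r_parabola = correlation(xs, ys)
print(round(abs(r_parabola), 6))  # 0.0: no linear relationship at all
```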
Q31. Does the correlation coefficient resemble the median and interquartile range in
being fairly resistant to outliers, or resemble the mean and standard deviation in being
heavily influenced by outliers?
A31. Like the mean and sd, the correlation coefficient can be greatly influenced by
outliers.
Q32. Someone practices guessing correlation coefficients from scatterplots using an
“applet” on the internet. Why should the person not get too confident of his or her
guessing power given scatterplots of real-life data?
A32. Because the scales of the variables are not necessarily the same as they were on the
applet, and scales can throw off “eyeball” estimates.
Q33. In attempting to give a more complete description of a set of data involving two
variables, someone wants to give measures of center and spread as well as
the correlation coefficient. Assuming the person has made a good decision to use the
correlation coefficient, what measures of center and spread would be most consistent
with the correlation coefficient: the mean and sd or the median and IQR?
A33. The mean and sd, because the formula for the correlation uses the mean and sd.
Q34. The women in a corporation think that they are being discriminated against in their
salaries. A management spokesman says to them, “Look at this plot. The first data point
is the average salary for men who have worked here 1 year, put into an ordered pair with
the salary for women who have worked here one year. The second ordered pair is the
average salary for men and women with two years’ experience, and so forth. The
correlation between men’s salaries and women’s salaries is .95! That’s almost a perfect
correlation! You women have nothing to complain about!” Is this argument valid? Why
or why not?
A34. It’s not valid. The correlation coefficient measures the predictability of one score
from another, not the equality of the two scores. Adding the same value to all values of
either x or y does not change the correlation; the same holds for subtracting a constant, or
for multiplying or dividing by a positive constant. So the salaries of women could be half
those of the comparable salaries of men,
or $10,000 less than the comparable salaries of men, and you could still get a high
correlation.
Q35. Suppose that you have a data set with a correlation fairly close to 0. All the numbers
for both variables are between 0 and 10. There are about 10 individuals in the data set.
Then suppose that one more individual gets added, an outlier at the point (100, 100).
What do you think the correlation coefficient will become? (Try it out with your
calculator or minitab if you want, or mimic this situation on an “applet.”)
A35. The correlation will become close to 1, because this one outlier has such a strong
effect.
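Here is one way to try the experiment in Q35 without a calculator. The nine starting points form a 3-by-3 grid (an arbitrary choice for the sketch, picked because its correlation is zero), and the helper function is hypothetical.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sum of products of paired z-scores, divided by n - 1."""
    xbar, ybar, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum((x - xbar) / sx * (y - ybar) / sy
               for x, y in zip(xs, ys)) / (len(xs) - 1)

# Nine made-up points on a grid, all between 0 and 10, with no linear association.
xs = [x for x in (2, 5, 8) for y in (2, 5, 8)]
ys = [y for x in (2, 5, 8) for y in (2, 5, 8)]
r_before = correlation(xs, ys)

# Add the single outlier at (100, 100).
r_after = correlation(xs + [100], ys + [100])

print(round(abs(r_before), 3), round(r_after, 3))  # 0.0 and 0.993
```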
Q36. True or false: In a regression line, like a correlation coefficient, you get the same
numbers (slopes and intercepts) no matter which variable is considered the explanatory
variable and which is considered the response variable.
A36. False. The change in y per unit change in x, for example, is not the same as the
change in x per unit change in y.
Q37. Please explain, for a least squares regression line: the sum of the squares of what are
being minimized?
A37. The squares of the errors for each data point, where the errors are the vertical
distances from the data point to the regression line. (The word residuals is also correct.)
Q38. Please explain why the distance from the data point to the regression line
corresponds to the idea of an “error.”
A38. The regression line gives predicted values of y (called y-hat) for each x. There is
also an actual observed value y for each x, for each data point. The difference between
the actual and the predicted value is the “error” in prediction that is made by using the
regression line to predict the response variable.
Q39. What’s the formula for the slope of a regression line, in terms of the correlation
between the two variables and their standard deviations? (Call the slope b, the correlation
r, and the two sd’s sx and sy.)
A39. b=r (sy/sx).
Q40. Every least-squares regression line passes through what point?
A40. It passes through xbar, ybar, the ordered pair formed by the means of both
variables.
Q41. Once you know the slope of a regression line, how would you find the y-intercept,
knowing the means of the x values and the y values? (call the intercept a, and the means
for x and y xbar and ybar.)
A41. a=ybar - b*xbar. You get this by solving the equation ybar=b*xbar + a for a.
That equation holds because ybar is always the predicted value of y
for xbar.
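The two formulas from Q39 and Q41 can be put together in a short sketch. The data are made up and the function names are hypothetical; the last line checks the claim in A40 that the line passes through (xbar, ybar).

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sum of products of paired z-scores, divided by n - 1."""
    xbar, ybar, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum((x - xbar) / sx * (y - ybar) / sy
               for x, y in zip(xs, ys)) / (len(xs) - 1)

def regression_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    b = correlation(xs, ys) * stdev(ys) / stdev(xs)  # b = r * (sy / sx)
    a = mean(ys) - b * mean(xs)                      # a = ybar - b * xbar
    return a, b

xs = [1, 2, 3, 4, 5]  # made-up data
ys = [2, 1, 4, 3, 5]
a, b = regression_line(xs, ys)
print(round(a, 4), round(b, 4))  # 0.6 0.8
print(round(a + b * mean(xs), 4), mean(ys))  # the prediction at xbar equals ybar
```

To predict y for a new x, as in Q42, substitute it into a + b*x.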
Q42. When you have a regression equation delivered by the computer software output,
and someone asks you for the predicted value of y given a certain x value, what do you
do?
A42. Just substitute the value of x into the equation and solve for the predicted y value.
Q43. Suppose that someone measures height as a function of weight for a bunch of
human adults, and gets a regression equation predicting height as a function of weight.
Why is the y-intercept of the equation not as meaningful or important as the slope, or as
the equation as a whole?
A43. Because the y-intercept corresponds to the height of someone with weight 0. But the
weight of 0 is far outside the range of weights measured in the study and thus the height
predicted will be an extrapolation. Secondly, the weight of 0 is one seldom found in
human beings, (at least those who have already been born and aren’t dead yet)!
Q44. Suppose you have a regression equation output from a computer and you are asked
to plot the line by hand. How would you do it?
A44. Just pick two values of x, and calculate the yhat values for each, and connect those
two dots. It helps if you pick points that are close to the bottom and top ends of the range.
(One easy point is the y-intercept.)
Q45. When, in the context of regression, people speak about the SST (sum of squares
total), what do they mean by that?
A45. The sum of the squared deviations of each y value from the mean of the y values.
Q46. When in the context of regression people speak of the SSE or sum of squares for
error, what do they mean by that?
A46. They mean the sum of the squares of the deviations of the actual y values from the
predicted y values. (These deviations are also called residuals.)
Q47. Your book doesn’t define very explicitly in this chapter what the sum of squares for
regression is. Do you think it would be reasonable to think of that as the sum of all the
squared deviations of the predicted y values (the y-hats) from the mean of the y values?
Particularly if a trustworthy source hinted that it was? (P.S. you can calculate the SSReg
in your head, easily, for the 3-point data set of example 3.11 on page 160: it comes out to
32. The SST comes out to 38, and the SSE to 6.)
A47. Yes.
Q48. The book speaks of the sum of squares for the regression as the SST-SSE, or the
sum of squares total minus the sum of squares for error. Can we infer from this that the
total sum of squares, SST, can be partitioned into the SSReg (sum of squares for
regression) and the SSE, (sum of squares for error), and that SST=SSReg+SSE? (P.S. I
use the notation SSReg so as not to confuse sum of squares for regression with sum of
squares for residuals.)
A48. Yes.
Q49. The square of the correlation coefficient, or r-squared, a.k.a. the coefficient of
determination, means what in terms of the fraction of the total sum of squares? Please
answer in symbols and words.
A49. r-squared = the SSReg/SST or (SST-SSE)/SST. The r-square is the fraction of the
total sum of squares that is accounted for by the regression of y on x.
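The partition SST=SSReg+SSE from A48, and the two ways of writing r-squared in A49, can be verified on any data set. The sketch below uses a small made-up data set (an assumption for illustration, not example 3.11 from the book) and the slope formula b = Sxy/Sxx, which is algebraically the same as r*(sy/sx).

```python
# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Least-squares slope and intercept.
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b = sxy / sxx
a = ybar - b * xbar
yhat = [a + b * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)                 # total
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))    # error (residuals)
ssreg = sum((yh - ybar) ** 2 for yh in yhat)            # regression

r_squared = ssreg / sst
print(sst, sse, ssreg, r_squared)
```

Up to floating-point rounding, sst equals ssreg + sse, and ssreg/sst equals (sst - sse)/sst.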
Q50. One person studies IQ as a function of number of boxing matches participated in,
and another uses the same data set to study boxing matches participated in as a function
of IQ. (That is, matches is the explanatory variable in the first study and IQ is the
explanatory variable in the second.) Do they both get the same value for the slope of the
regression line? Can you explain in words the reason for this answer? (You may assume a
negative relationship between the two variables in constructing language for your
answer.)
A50. No. The first slope tells how many points IQ changes per additional boxing match,
and the second slope tells how many fewer boxing matches someone has for each
additional IQ point.
Q51. The slope of the regression line b is equal to r*(sy/sx). Along the regression line, a
change in 1 standard deviation in x results in a change of how many standard deviations
of y? (Hint: the slope is the change in y over the change in x. So the change in y equals
the slope times the change in x. So if the change in x is sx, we get r*(sy/sx)*sx, which
equals...)
A51. r standard deviations of y, or r*sy.
Q52. True or false: the slope of the regression line tells you how many unstandardized
units the predicted value of y changes for each unstandardized unit change in x.
A52. True.
Q53. True or false: the correlation coefficient tells you how many standard deviations the
predicted y changes for each standard deviation change in x.
A53. True.
Q54. True or false: If both of two variables y and x are standardized, (so that the standard
deviation of both is 1) then the slope of the regression line and the correlation coefficient
are equal.
A54. True.
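A52 through A54 can be illustrated by fitting the same data twice, once in raw units and once after standardizing both variables. This is a sketch with made-up numbers; the helper names (mean, sd, slope, correlation) are just illustrative.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def sd(v):
    m = mean(v)
    return sqrt(sum((vi - m) ** 2 for vi in v) / (len(v) - 1))

def slope(x, y):
    mx, my = mean(x), mean(y)
    return (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))

def correlation(x, y):
    # Rearranged from b = r*(sy/sx).
    return slope(x, y) * sd(x) / sd(y)

# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Standardize: subtract the mean, divide by the standard deviation.
zx = [(xi - mean(x)) / sd(x) for xi in x]
zy = [(yi - mean(y)) / sd(y) for yi in y]

r = correlation(x, y)
b_raw = slope(x, y)     # y units per x unit
b_std = slope(zx, zy)   # sd's of y per sd of x
print(r, b_raw, b_std)
```

The raw slope (about 0.6 here) depends on the units, while the slope for the standardized variables agrees with r.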
Q55. What is another name for y – yhat, or the deviation of the observed y value from the
predicted value, or the error in prediction for a given value, or the vertical distance
between any data point and the regression line?
A55. The residual.
Q56. True or false: when we speak of a “least squares” regression line, we mean that we
choose the line so as to minimize the sum of the squares of the residuals.
A56. True.
Q57. Someone draws a graph of residuals (on the y axis) versus the values of the
explanatory variable. This graph is called what?
A57. A residual plot.
Q58. Someone draws a residual plot, and all the values are positive. Someone says to that
person, “There must be some mistake.” Why did the person say that?
A58. Because the mean of the least-squares residuals is always zero; thus if you have
positive values you have to have at least one negative value.
Q59. If the linear regression equation fits the data well, what do you see on the residual
plot?
A59. A uniform scatter of points, without a clear pattern, and with no unusual individual
observations.
Q60. What do you call a data point that has a big effect on the slope or intercept of the
regression line?
A60. An influential point.
Q61. Does an influential point necessarily have a large residual? (Hint: the influential
point can pull the line close to it.)
A61. No.
YMS Chapter 4: More on Two-Variable Data
Q1. In the example at the beginning of this chapter, a plot of the log of brain weight as a
function of the log of body weight provides a “better fit” for the observed data than a
simple plot of brain weight as a function of body weight. What is meant by better fit?
(This is also the answer to the question, what are we trying to do (at least in this course)
when we transform data?)
A1. A better fit means that the data are more linear – a linear model is more successful in
describing the relationship, the correlation coefficient is higher, the sum of squares of
residuals for a regression line is lower.
Q2. Can you figure out why we would want to transform data so as to get a more linear
relationship?
A2. Because then we can use the methods of correlation and regression that were studied
in the previous chapter and will be studied more in future chapters, which are very
powerful methods of statistical analysis.
Q3. True or false: if we have a curvilinear function, and we want to straighten it out to
make a linear function, we can’t do that by multiplying or dividing by constants or adding
or subtracting constants (i.e. by using linear transformations).
A3. True. Linear transformations don’t straighten curves.
Q4. What are the transformations that are most commonly used, other than linear
transformation?
A4. Positive and negative powers, and logarithms.
Q5. What is the definition of a monotonic function?
A5. It’s one where as x increases, y always increases (a monotonic increasing function)
or as x increases, y always decreases (a monotonic decreasing function). It produces a
graph that doesn’t go up and dip down, but consistently has either a positive or negative
slope.
Q6. Is it kosher to speak of a function as being, for example, monotonic increasing over
part of the domain of x, and monotonic decreasing over another part? If so, can you give
an example?
A6. Yes. An example is y=x^2, which is monotonic decreasing for negative values of x,
and monotonic increasing for positive values of x.
Q7. True or false: There are often two steps in transformation. The second is to apply a
power or logarithmic function that simplifies the data. The first is to use a linear
transformation, such as adding a constant, that makes the values all positive, so that the
function applied in the second step will be defined and monotonic increasing.
A7. True.
Q8. How is the ladder of power functions useful?
A8. When we are trying to straighten out curved data sets, we can go in one direction or
the other along the ladder, seeing how straight the line becomes, rather than just
randomly picking different functions.
Q9. Linear growth is to adding a fixed amount per unit time as exponential growth is to
______ by a fixed amount per unit time.
A9. Multiplying.
Q10. If the number of a certain type of bacteria doubles every two hours, is that linear
growth or exponential growth?
A10. Exponential.
Q11. Increasing everyone’s salary by a certain percentage is to ______ growth as
increasing everyone’s salary by the same dollar amount is to _______ growth.
A11. Exponential, linear.
Q12. Suppose we have a function y=ab^x, where a and b are constants and x is the
explanatory or independent variable, and y is the response or dependent variable. Is this
an example of an exponential function, or a power function?
A12. An exponential function. (This is a function like y=2^x.)
Q13. Suppose we have a function y=ax^b, where a and b are constants and x is the
explanatory variable and y is the response variable. Is this an example of an exponential
function, or a power function?
A13. A power function. (This is a function like y=x^2.)
Q14. If y is an exponential function of x, plotting what function of y versus x should
result in a linear graph?
A14. The log of y versus x.
Q15. Suppose you do a regression of the log (base 10) of y versus x, and you get a nice
linear scatterplot and a high coefficient of determination (r^2).
Now you can use this linear relationship for prediction. Suppose someone (like a test
maker) asks you what the predicted value is of y (not log y) for a given value of x. How
would you find it?
A15. You’d just use your equation to find the predicted value of log y. Then you take the
antilog (or 10 to that number) to get the predicted value of y. In other words, you
“untransform” the value back to the original scale.
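The two steps in A15 (predict log y, then take the antilog) might look like this in Python. The data here are made up to follow y = 5 * 2^x exactly, so the fit on the log scale is perfect; real data would scatter around the line.

```python
from math import log10

# Hypothetical data following y = 5 * 2**x exactly.
x = [0, 1, 2, 3, 4, 5]
y = [5 * 2 ** xi for xi in x]
log_y = [log10(yi) for yi in y]

# Least-squares regression of log y on x.
n = len(x)
xbar = sum(x) / n
lbar = sum(log_y) / n
b = (sum((xi - xbar) * (li - lbar) for xi, li in zip(x, log_y))
     / sum((xi - xbar) ** 2 for xi in x))
a = lbar - b * xbar

def predict_y(x_new):
    log_y_hat = a + b * x_new   # prediction on the log scale
    return 10 ** log_y_hat      # antilog: back to the original scale

print(predict_y(6))
```

Here predict_y(6) comes out very close to 320, which is 5 * 2^6. Note also that 10^b recovers the growth factor 2 and 10^a recovers the starting value 5.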
Q16. If a variable grows exponentially, its logarithm grows how?
A16. Linearly.
Q17. To make an exponential function linear, we use the log transformation just with the
response variable y. To make a power function linear, we use the log transformation with
what?
A17. Both the explanatory and the response variable.
Q18. If you start with the power function y=ax^p, and take the log of both sides, what
result do you end up with?
A18. log y=log a +p log x.
Q19. Suppose you have a data set, and its scatterplot is curved. Then you take the log of
both explanatory and response variables, and plot them, and you get a line. What do you
infer from this?
A19. That the original variables were related according to a power function (or power
law).
Q20. When you plot the log of y vs. the log of x, do you give any meaningful
interpretation to the slope of the line that you get? If so, what is it?
A20. According to the equation log y =log a + p log x, the slope of the line is the power
to which x is raised in the original power function.
Q21. Jane gets a regression coefficient (i.e. a slope) of 3.617 when regressing log y vs.
log x. She says, “Now I know that x and y are related in a power function, and y= a
constant times x to the 3.617 power.” Do you think this conclusion should be tempered or
qualified? If so, how?
A21. Yes. The slope you obtain in any given experiment is an estimate of the population
value of the slope, and not an exact rendering of it. Another sample is very likely to give
a different slope, and it could possibly even lead to a different conclusion about the form
of the functional relationship!
Q22. Suppose you plot the log y vs. the log x and you get a good line, with intercept 2
and slope 3. So log y=2+3log x. Now you are asked to find the equation for y in terms of
x, without logs in it. How do you do this?
A22. You just take the antilog of both sides. You get y= 10^(2+3 log x), or
y=10^2*(10^log x)^3, or y=100*x^3. That is, y=100 times x cubed.
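The reasoning in A20 and A22 can be run in the other direction as a check: start from data that obey y=100*x^3, regress log y on log x, and see whether the slope 3 and intercept 2 come back. A sketch, with made-up x values:

```python
from math import log10

# Hypothetical data following the power law y = 100 * x**3 exactly.
x = [1, 2, 3, 4, 5]
y = [100 * xi ** 3 for xi in x]
log_x = [log10(xi) for xi in x]
log_y = [log10(yi) for yi in y]

# Least-squares regression of log y on log x.
n = len(x)
mx = sum(log_x) / n
my = sum(log_y) / n
p = (sum((u - mx) * (v - my) for u, v in zip(log_x, log_y))
     / sum((u - mx) ** 2 for u in log_x))   # slope = the power
log_a = my - p * mx                          # intercept = log a
a = 10 ** log_a                              # antilog of the intercept

def power_model(x_new):
    # The model on the original scale: y = a * x**p.
    return a * x_new ** p

print(p, log_a, a)
```

Up to rounding, p is 3, log_a is 2, and a is 100, matching y=100*x^3.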
Q23. Suppose I find that in the range of 3 to 7 milligrams of Ritalin given to a group of
children, their math scores rise in linear fashion with increasing dose. A parent looks at
the regression equation and says, “By my calculations, all it would take would be 400
milligrams of Ritalin for my child to get a 5 on the AP Statistics exam, while he’s still
in 3rd grade.” What do we call this type of reasoning (which often leads to incorrect
conclusions)?
A23. Extrapolation, which is using the regression equation to make predictions for values
of the explanatory variable that we have no experience with.
Q24. I notice a linear relationship between shoe size and basketball prowess, and propose
to help my daughter’s fifth grade basketball team win by dressing them all in size 15
Converse All-Stars. Someone says, “But in your study, there’s another variable, namely
height, that was not among your explanatory or response variables and yet may influence
the interpretation of the relationship between shoe size and basketball skill.” What type of
variable is height, in this situation?
A24. A lurking variable.
Q25. In the example you just read, a lurking variable enhanced the apparent association
between two variables. Can lurking variables also mask or attenuate the apparent
association between two variables? If so, can you think of an example?
A25. Yes. The relation of overcrowding and lack of indoor toilets on page 227 of YMS is
an example. Here’s another: suppose a researcher finds that the correlation between
violent entertainment viewed and aggressive behavior in real life is low. Then someone
reanalyzes the data and finds that when you make separate scatterplots for males and
females, you get a strong relationship. Males are much more aggressive than females, and
when you throw the data together on one scatterplot you get more of a horizontal line
than a sloped one. Thus the lurking variable of gender obscured the real association that
was there.
Q26. What’s one way of discovering lurking variables that someone may not have
thought about?
A26. Plot the data as a function of time. You may be able to see relationships that can be
linked to events that happened at a certain time, or conditions that changed at a certain
time.
Q27. One researcher studies the murder rate of every state as a function of the average
literacy level of the teenagers in that state. The correlation is reported to be very strong
(and close to –1). Another researcher studies aggression in individuals in a high school,
as a function of their reading scores. The correlation, though in the same direction, shows
a much weaker relationship. Can you explain why?
A27. Correlations with averaged data are generally considerably higher than correlations
obtained with individuals. This is because some of the random variation in each variable
gets reduced when you sum or average over many individuals. (Later on in the course,
you’ll find out that the variance of the mean of a set of observations for a population is
equal to 1/n times the variance of the individual observations, where n is the number of
observations.) The less “noisy” both variables are, the more highly they tend to correlate
with each other.
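A small simulation can make A27 concrete. Everything in this sketch is an assumption made for illustration: 50 groups of 40 individuals, a shared group-level signal, and a lot of individual noise. The relationship here is positive rather than negative, but the point about averaging is the same.

```python
import random
from math import sqrt

def pearson_r(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)

rng = random.Random(0)   # fixed seed so the run is repeatable

xs, ys = [], []              # individual-level values
mean_xs, mean_ys = [], []    # group-level averages
for _ in range(50):                      # 50 hypothetical groups ("states")
    level = rng.uniform(0, 10)           # signal shared by the whole group
    gx = [level + rng.gauss(0, 5) for _ in range(40)]   # noisy individuals
    gy = [level + rng.gauss(0, 5) for _ in range(40)]
    xs += gx
    ys += gy
    mean_xs.append(sum(gx) / 40)
    mean_ys.append(sum(gy) / 40)

r_individuals = pearson_r(xs, ys)
r_averages = pearson_r(mean_xs, mean_ys)
print(r_individuals, r_averages)
```

Averaging over 40 individuals cuts the noise variance by a factor of 40 while leaving the shared signal intact, so the correlation between the averages comes out much stronger than the correlation between individuals.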
Q28. When two variables X and Y are found to correlate with each other, of course two
possible explanations for this association are 1) that X causes Y, and 2) (one not
diagrammed on page 232) that Y causes X. Please name the other two possible
explanations that are good to keep in mind when interpreting findings of associations.
A28. Common response (z causes both x and y) and confounding (z, which is associated
with x, may cause y).
Q29. Suppose a researcher studies the effects of a way of teaching children not to be
violent. The researcher gives the instruction to all the children in Mrs. Harmony’s
classroom, and uses the kids in Mr. Gutsly’s classroom as a comparison group. But then
the researcher realizes that Mrs. Harmony has a very different personality and
interpersonal style than Mr. Gutsly: she tries to promote kindness and good will, whereas
Mr. Gutsly is mainly interested in promoting competitiveness and not being wimps. What
would we say about the variables of teacher personality and interpersonal style in this
study?
A29. That they are CONFOUNDED with the intervention. Thus the effects of these
teacher variables can’t be distinguished from the effects of the intervention the study is
meant to test.
Q30. Someone finds that the degree of physical fitness in youth (as measured by heart
rate recovery from exercise) is correlated with the number of ankle injuries the person has
had. But before concluding that we should hurt the ankles of youth in order to make them
more fit, a COMMON RESPONSE explanation for the association comes to mind. Can
you posit this common response explanation?
A30. That both fitness and ankle injuries are associated with more running or more
athletic activity – both are responses to this basic causal variable.
Q31. Even when causation is present, is there usually one and only one contributing
cause for a given effect, at least in the types of phenomena people study with statistics?
A31. No.
Q32. Someone says, “Lots of kids play “shooter” video games for hundreds of hours, and
never do anything violent. Therefore these games can’t cause violence.” What does the
principle, as stated in your text, that “Even when direct causation is present, it is rarely a
complete explanation of an association between two variables” have to do with this
reasoning?
A32. Another way of stating this principle is that one phenomenon does not have to be a
necessary and sufficient condition for a second, in order to be causally related. Therefore
one or several instances of non-association do not disprove a causal relationship.
Q33. What is the strongest type of evidence for causal relations?
A33. Well-designed experiments that are meant to control for all lurking variables.
(These usually entail randomly assigning individuals to different conditions.)
Q34. What’s the problem with doing a well-designed experiment, for example, to see
what the effects of child abuse are?
A34. We will never find it ethical to randomly assign children to conditions of child
abuse versus nonabuse.
Q35. Is it possible to come to valid causal inferences without doing experiments that
randomly assign people to various conditions? Can you give an example of such?
A35. Although your text says that “the only fully compelling method” of establishing
causality is an experiment, we can and do come to valid causal inferences without
randomly assigning people to conditions. The example of smoking and lung cancer is one
where the evidence for causation is “overwhelming” despite no study in which people
were randomly assigned to smoke or not smoke over many years.
Q36. A two-way table describes the relation between two of what kind of variables?
A36. Categorical.
Q37. When you look at a two-way table that looks like this
                     Approval of president’s performance
Party affiliation    Yes     No      Total
Democrat              25     100     125
Republican           125       5     130
Total                150     105     255
What is the row variable, and what is the column variable?
A37. The party affiliation is the row variable and the approval of the president’s
performance is the column variable.
Q38. If we look at the totals for the rows in the table above, we get how many
Democrats and how many Republicans are in the sample. Similarly, the column totals tell
us how many approvers and disapprovers are in our sample. These give us the
distribution for each variable separately, in our sample. These distributions are called
what?
A38. Marginal distributions. (Because they’re in the right and bottom margins of the
table.)
Q39. The above table gives the results in counts. Especially when the marginal
distributions are not equal (for example, if the sample should contain twice as many
Republicans as Democrats) we should convert the count data to what kind of data?
A39. To per cents (or fractions).
Q40. True or false: When describing the relationship between two quantitative variables,
the scatterplot and the correlation coefficient are usually the graph and numeric measure
of choice; but in describing the relation between two categorical variables, no single
graph or numeric measure summarizes the strength of the association. We usually pick
and choose among bar charts and pie charts and the reporting of various per cents.
A40. True.
Q41. Someone looks at a sample of 500 men and 100 women. 250 men oppose the war,
whereas 80 women oppose the war. The researcher says, “Lots more men than women
oppose the war. Therefore the idea that women in this area are more pacifist is
incorrect.” What’s the problem with this reasoning, and what should the researcher do?
A41. He should not just use the counts, but find the per cents. 50% of men, but 80% of
women, in this sample opposed the war. So in this region it looks like the women are
more anti-war than the men.
Q42. Suppose you have three age groups, and you have data on how many individuals
got educated to each of 4 different levels. Suppose you calculate, just for one of the age
groups, the per cent of people in that age group who attained each level. This distribution
of per cents for one age group is called what?
A42. A conditional distribution.
Q43. Do the per cents for a conditional distribution add to 100 for each of the different
groups for which you calculate them?
A43. Yes.
Q44. Do the per cents for conditional distributions equal the per cents for marginal
distributions?
A44. No, not necessarily.
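The answers to Q41 through Q44 can be checked with the war-opinion sample from Q41. In this sketch the "do not oppose" counts are simply the remainders (500-250 and 100-80), and the helper name percents is illustrative.

```python
# Counts from the sample in Q41: 500 men (250 oppose), 100 women (80 oppose).
counts = {
    "men":   {"oppose": 250, "do not oppose": 250},
    "women": {"oppose": 80,  "do not oppose": 20},
}

def percents(row):
    total = sum(row.values())
    return {k: 100 * v / total for k, v in row.items()}

# Conditional distribution of opinion within each sex.
conditional = {sex: percents(row) for sex, row in counts.items()}

# Marginal distribution of opinion, ignoring sex.
opinion_totals = {
    opinion: sum(row[opinion] for row in counts.values())
    for opinion in ["oppose", "do not oppose"]
}
marginal = percents(opinion_totals)

print(conditional)
print(marginal)
```

Each conditional distribution adds to 100 (50 and 50 for men, 80 and 20 for women), and neither matches the marginal distribution (55 and 45), which is pulled toward the men's per cents because there are five times as many men in the sample.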
Q45. There were two AP Statistics teachers. 40% of the 40 students in the first teacher’s
classes got 5’s, and 25% of the 40 students in the second teacher’s classes got 5’s. People
assumed that the first teacher is better. However, someone then studied the results based
on whether or not the students scored above or below a certain cutoff on the SAT, before
going into AP Statistics. The first teacher had 80% of students above this cutoff and 20%
below. The second teacher had 20% above and 80% below. The first teacher had 50% of
the “aboves” get 5’s, and none of the “belows.” The second teacher had 75% of the
“aboves” get 5’s, and 12.5% of the “belows.” Now which teacher appears to be better,
and why?
A45. The second teacher, because a higher fraction of that teacher’s students got 5’s
both among those above the cutoff and among those below it.
Q46. The situation above is whose paradox?
A46. Simpson’s.
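The arithmetic behind Q45 is worth doing once by hand or by machine. The sketch below converts the per cents in Q45 to student counts (80% of 40 is 32, 20% of 40 is 8) and recomputes each teacher's overall rate of 5's; the data layout and function name are illustrative.

```python
# Each entry: (number of students, fraction who got a 5), from Q45.
teachers = {
    "first":  {"above": (32, 0.50), "below": (8, 0.0)},
    "second": {"above": (8, 0.75),  "below": (32, 0.125)},
}

def overall_rate(groups):
    # Weighted by group size: total 5's divided by total students.
    fives = sum(n * rate for n, rate in groups.values())
    students = sum(n for n, _ in groups.values())
    return fives / students

overall = {name: overall_rate(g) for name, g in teachers.items()}
print(overall)
```

The overall rates come out to 40% and 25%, favoring the first teacher, even though the second teacher has the higher rate within each SAT group. The reversal comes from the weights: most of the first teacher's students are in the high-scoring group.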
Q47. True or false: In Simpson’s paradox, there is a lurking variable, which predisposes
the results against one of the two groups; controlling for the effects of that lurking
variable by looking separately at the subsets formed by the categories of it reveals results
in the opposite direction from those obtained when ignoring the lurking variable.
A47. True.
Q48. If a lurking variable can actually reverse the direction of results, do you think it is
also possible that a lurking variable could result in lack of an observed association when
in fact there is a causal influence?
A48. Yes.
Q49. Does the fact that lurking variables can obscure influences that are actually present
imply that: not only does correlation not imply causation, but lack of correlation does not
rule out causation?
A49. Yes.
YMS Chapter 5: Producing Data
Q1. The difference between an observational study and an experiment is that in the first,
the explanatory variable is observed and measured, whereas in an experiment, the
explanatory variable is ____.
A1. Deliberately imposed, manipulated.
Q2. When there is a jobs program for welfare recipients, and you simply observe that
those who voluntarily take part in the program do better than those who don’t, what’s the
problem with inferring that the program causes better results?
A2. The effects of the program are confounded with the characteristics that lead people to
seek the program, for example motivation and values.
Q3. The entire group of individuals we want information about is called the _____.
A3. Population
Q4. The subset of the population we actually examine in order to gather information is
called the ______.
A4. Sample.
Q5. Studying the whole population by attempting to contact every individual is called
conducting a ______.
A5. Census.
Q6. Studying a population by taking a subset of it in order to generalize to the whole
population is called _____.
A6. Sampling.
Q7. The method used for selecting the sample from the population is called the ____ of
sampling.
A7. Design.
Q8. If a radio station invites anyone who wants to call and give an opinion on a question,
the set of people thus obtained is called a _____ response sample.
A8. Voluntary.
Q9. If the researcher enrolls a group of people in the study on the basis of how easy it is
to contact them and get them to enroll, that method of sampling is called ______
sampling.
A9. Convenience.
Q10. The systematic error introduced when the sample is very different from the
population is called ____.
A10. Bias.
Q11. If a conservative radio commentator polls his listeners, and a liberal commentator
polls her listeners, both polls are likely to be biased as methods of ascertaining the
sentiment of the country, because _______.
A11. It’s very likely that these samples differ highly from the country as a whole.
Q12. An SRS, or simple random sample, is a subset of n individuals from a population,
chosen in such a way that ____.
A12. Every subset of n individuals has an equal chance of being chosen for the sample.
Q13. True or false: if every individual in the population has an equal chance of being
included in the sample, the sample is a simple random sample.
A13. False. You need not only this condition, but also that every subset of the population
of size n is equally likely to be chosen.
Q14. Suppose I take the numbers 1, 2, 3, and 4, and write them on identical pieces of
paper, put them into a hat and mix them thoroughly, and draw out two numbers. Is this a
simple random sample of the 4 numbers?
A14. Yes.
Q15. Suppose I take the numbers 1, 2, 3, and 4. First I take the numbers 1 and 2 and put
them into a hat, and choose one of them. Then I take the numbers 3 and 4 and put them
into a hat and choose one of them. For each of the numbers 1, 2, 3, and 4, what is the
probability that this number will end up in the sample?
A15. .5
Q16. Is it possible that the subset {1,2} would be chosen for our sample using the
sampling method just mentioned (that is, pick randomly from 1 and 2, then pick
randomly from 3 and 4)?
A16. No.
Q17. So the sampling method just mentioned is one where each individual has equal
probability of being chosen, but each subset is not equally likely to be chosen; thus the
sample obtained is, or is not, a simple random sample?
A17. Is not.
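With only four numbers, the possible samples in Q14 through Q17 can be listed outright. This sketch enumerates the equally likely outcomes of each method; the variable and function names are illustrative.

```python
from itertools import combinations, product
from fractions import Fraction

population = [1, 2, 3, 4]

# Hat method of Q14: every 2-element subset is equally likely.
srs_samples = [frozenset(c) for c in combinations(population, 2)]

# Method of Q15: one pick from {1, 2}, then one pick from {3, 4}.
two_hat_samples = [frozenset(p) for p in product([1, 2], [3, 4])]

def inclusion_probability(samples, individual):
    # Valid because the listed samples are equally likely under each method.
    hits = sum(1 for s in samples if individual in s)
    return Fraction(hits, len(samples))

srs_probs = [inclusion_probability(srs_samples, i) for i in population]
two_hat_probs = [inclusion_probability(two_hat_samples, i) for i in population]

print(srs_probs, two_hat_probs)
print(frozenset({1, 2}) in srs_samples, frozenset({1, 2}) in two_hat_samples)
```

Both methods give every number a 1/2 chance of inclusion, but the first method has 6 possible samples and the second only 4. The subset {1,2} can come out of the single hat and can never come out of the two-hat method.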
Q18. In a table of random digits, each triple of digits is equally likely to be any of the
____ possibilities from _____ to ______.
A18. 1000, 000, 999
Q19. The two rhyming words (with different ways of spelling the second syllable) that
summarize the process of using a table of random digits to select a simple random sample
are ___ and ____.
A19. Label and table.
Q20. There are 7 members in a class. Please describe how you would use a table of
random digits to select a simple random sample of 3 of them.
A20. Assign each of them a single digit label. Enter the random number table at any
point, and look at the digits in order. If the digit isn’t one of the labels you assigned, or if
it belongs to someone already chosen, ignore it and go to the next. Otherwise, put that
individual in the sample. Keep going until you have put 3 individuals in the sample.
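The "label and table" procedure translates directly into a loop. In this sketch a seeded random number generator stands in for the printed table of random digits, and the labels 1 through 7 and the function name select_srs are assumptions for illustration. Repeated digits are skipped, since nobody can be chosen twice.

```python
import random

def select_srs(labels, sample_size, digit_stream):
    """Read single digits in order; keep those that are unused labels."""
    chosen = []
    for d in digit_stream:
        if d in labels and d not in chosen:
            chosen.append(d)
        if len(chosen) == sample_size:
            break
    return chosen

labels = [1, 2, 3, 4, 5, 6, 7]    # one single-digit label per class member
rng = random.Random(2007)         # stands in for a printed random digit table
digits = (rng.randrange(10) for _ in range(1000))

sample = select_srs(labels, 3, digits)
print(sample)
```

For a hand-checkable case, the made-up digit sequence 0, 9, 4, 4, 8, 1, 7, 2 gives the sample 4, 1, 7: the 0, 9, and 8 are not labels, and the second 4 is a repeat.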
Q21. A sample chosen by chance is called a ____ sample.
A21. Probability
Q22. Suppose there is a class, and someone wants to choose a random sample of it. But
the researcher wants to make sure that both males and females are adequately sampled.
So the researcher takes the names of the girls, and draws a simple random sample of
them, and then does the same with the boys’ names. The total sample thus obtained is not
a simple random sample, but a _____ random sample.
A22. Stratified
Q23. Suppose a researcher wants to collect a random sample of high school students in
the U.S. The researcher first takes a simple random sample of counties in the country,
then takes a simple random sample of high schools within each county, and then a simple
random sample of students within each high school. This sampling method is called
____.
A23. Multistage sampling.
Q24. The above method of sampling high school students leaves out homeschoolers. The
general term for such a problem in sampling is ____.
A24. Undercoverage.
Q25. When you get a survey in the mail and immediately toss it in the trash, the source of
bias this introduces into the survey is called _____.
A25. Nonresponse.
Q26. If you were asked what is the “essential principle of statistical sampling,” would
you say that it’s to have a simple random sample, a probability sample, a stratified
sample, or a multistage sample?
A26. A probability sample, because the most essential factor is that the sample be chosen
by chance.
Q27. During recent decades, society has become less and less tolerant of any sexual
activity between therapists and their clients. Surveys of the incidence of such behavior
are now almost impossible to obtain, because therapists would avoid trusting a researcher
with a confession of behavior that would lead to severe penalties. This introduces bias
into any survey that is called ______ bias.
A27. Response
Q28. One survey question asks, “Do you believe that children should be legally protected
from exposure to violent models on TV that can lead them to commit acts of violence?”
And a separate question asks, “Do you believe that government should limit the free
expression of ideas by censoring television?” The major difference in results these
questions would yield would be referred to as ______ effects.
A28. Wording.
Q29. Which would give more accurate results in a poll: a probability sample of 1000
people, or a voluntary response sample of 100,000 people?
A29. The probability sample of 1000.
Q30. A study in which we actually do something to people, animals, or objects in order to
learn about the response is called an _____.
A30. Experiment.
Q31. The individuals on which an experiment is done are called the experimental whats?
A31. Units.
Q32. When the experimental units are human beings, according to our book they are
called ____, (although the preferred term among psychological researchers these days is
“participants.”)
A32. Subjects.
Q33. The thing that is done to the subjects (or participants) (for example giving them a
drug or teaching them to read) is called a ____.
A33. Treatment.
Q34. Suppose that in an experiment, learning of math facts is your response variable.
You are studying two explanatory variables, and varying them systematically in your
study: amount of practice, and the frequency of recurrence of any one math fact in a
practice session. These two explanatory variables are called the two _____ in the
experiment.
A34. Factors.
Q35. In an experiment on math facts, one is studying the frequency of recurrence of any
one math fact in practice sessions: does, for example, 7+8 occur every 3 problems, every
20 problems, or every 90 problems? If the experiment is set up like this, within the factor
called “frequency of recurrence” there are three different degrees of that factor, three
different specific values of that factor, which in the jargon is called three ____ of that
factor.
A35. Levels.
Q36. A pill that is made of inactive material, which is used so that subjects can have
information withheld about which treatment group they are in, is an example of a _____.
A36. Placebo.
Q37. If you want evidence for causation, and if you want to study the interactions of
factors, and you are able to do either an observational study or an experiment, you
should, all other things equal, choose the ______.
A37. Experiment.
Q38. When people get better from an inactive treatment, that is called the ____ effect.
A38. Placebo.
Q39. A group of individuals who receive an inactive treatment, so that the effects of a
possibly active treatment can be contrasted with those of inactive treatment, is called a
____ group.
A39. Control.
Q40. A researcher tries to make two treatment groups equal on every variable other than
the treatment of interest. The researcher does this by fashioning two groups that are very
similar on several variables relevant to outcome. What is the problem with this method?
A40. There are too many lurking variables – the experimenter may not measure all of
them, and some of them may not become apparent until after the experiment. Some of
them may not be measurable at all.
Q41. What’s the “gold standard” method of assuring the equivalence of two treatment
groups?
A41. To randomly assign subjects to treatment groups.
Q42. Suppose you first assemble pairs of subjects that are very similar on the
preintervention measure of the response variable. Then, you randomly assign one
member of each pair to the experimental group or the control group. This method
combines random assignment with ______.
A42. Matching.
Q43. A researcher is studying the effect of two methods of teaching reading. Instead of
matching the subjects on their reading level and then randomly assigning one of each pair
to the two groups, the researcher ignores the initial reading level for purposes of
assignment to groups, and instead picks a simple random sample of the whole set of
subjects to be in each group. Is this an acceptable method of assignment to groups?
A43. Yes. Understanding why this is true is central to the logic of experimental design.
Q44. True or false. When subjects are assigned at random to two groups that get different
treatments, and the groups differ on the response variable, it must be true that the treatment
accounts for the difference between the groups.
A44. False. It is also possible that the “play of chance in the random assignment”
accounts for the difference in the groups. (However, the likelihood of this alternative
explanation can be quantified, and when it is small enough, the other explanation is
favored.)
Q45. Suppose that both treatments studied in an experiment in fact have no causal
influence upon the response variable. Under what conditions are we more likely to see
big differences between the two groups, due to the vagaries of random assignment: with a
very small sample size, or a very big sample size?
A45. A very small sample size.
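The point of A45 can be checked with a short simulation. What follows is a Python sketch, not something from the textbook. The score distribution (normal, mean 100, sd 15), the group sizes of 5 and 200, and the function name are all assumptions chosen for illustration. Every score comes from the same distribution, so any difference between the groups is due only to the random assignment.

```python
import random
import statistics

def mean_abs_group_difference(n_per_group, repetitions=500, seed=0):
    """Average absolute difference between two group means when the
    treatments have no effect (all scores share one distribution)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(repetitions):
        # Hypothetical scores: normal, mean 100, sd 15 (an assumption).
        scores = [rng.gauss(100, 15) for _ in range(2 * n_per_group)]
        rng.shuffle(scores)  # random assignment to the two groups
        group_a = scores[:n_per_group]
        group_b = scores[n_per_group:]
        diffs.append(abs(statistics.mean(group_a) - statistics.mean(group_b)))
    return statistics.mean(diffs)

small = mean_abs_group_difference(5)    # 5 subjects per group
large = mean_abs_group_difference(200)  # 200 subjects per group
print(small, large)
```

The typical chance difference printed for the small groups should be several times the one printed for the large groups.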
Q46. A difference between groups that is so large (and with so many subjects) that it
would “rarely” (i.e. to whatever criterion of rarity we specify) occur by chance is called a
_____ ______ effect.
A46. Statistically significant.
Q47. The three central principles of experimental design are _____ (which is making
comparisons between groups), _______ (a method of assigning individuals to groups),
and _______ (which has to do with how many individuals you have in your groups).
A47. Control, random assignment, and replication. (The word replication here refers to
repeating the observation on more subjects within a given experiment. The word is also
used, in a different sense, to refer to repeating the experiment.)
Q48. Suppose that we want to study the effect of a new curriculum and an old one, on
reading skills. We also want to study the effects of whether the curriculum is delivered in
person or over the phone. We randomly assign subjects to the curriculum, but we can’t
randomly assign them to in person or over the phone, because certain people live too far
away to get the training in person. So subjects are allocated to the curriculum at random,
but not to the delivery method. We would say that the experimental design here is not
_____ ________.
A48. Completely randomized.
Q49. In a “double-blind” experiment, what two sets of people are “blind” to which group
the subject is in?
A49. The subjects themselves, and the research staff who have contact with them.
Q50. What’s a problem in making inferences from experiments that is often less of a
problem in observational studies?
A50. The problem of “lack of realism,” in other words, the problem that the conditions in
the study do not match those to which we wish to generalize.
Q51. How do you do random assignment in a matched pairs design?
A51. First choose pairs that are as similar as possible, then randomly choose one subject
from each pair to receive the first treatment; the other subject receives the second treatment.
Q52. In a certain type of matched pair design where each subject serves as his or her own
control, and each “pair” consists of only one individual, what is randomly assigned?
A52. Whether the subject gets treatment 1 first or treatment 2 first.
Q53. Suppose that we want to compare two methods of tutoring in reading that children
receive after school. We know that the teacher the student has is also an important
influence on the outcome variable, which is reading skill. Please describe how we would
use a block design to control the effect of teacher when studying the method of teaching
reading.
A53. We randomly assign the students of the first teacher to the two groups, then do the
same for the students of the second teacher, and so forth, rather than using a simple
random sample of all students.
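The block design in A53 might be sketched in Python as follows. This is an illustration only: the roster, the student names, the labels "method 1" and "method 2", and the function name block_randomize are all invented for the example. The key feature is that the shuffling happens separately within each teacher's class.

```python
import random

def block_randomize(students_by_teacher, seed=None):
    """Randomly assign students to two tutoring methods, one block
    (one teacher's class) at a time."""
    rng = random.Random(seed)
    assignment = {}
    for teacher, students in students_by_teacher.items():
        shuffled = list(students)
        rng.shuffle(shuffled)  # random order within this block only
        half = len(shuffled) // 2
        for student in shuffled[:half]:
            assignment[student] = "method 1"
        for student in shuffled[half:]:
            assignment[student] = "method 2"
    return assignment

# Hypothetical roster, invented for illustration.
roster = {
    "Teacher A": ["Ann", "Bob", "Cal", "Dee"],
    "Teacher B": ["Eve", "Fay", "Gus", "Hal", "Ida", "Jon"],
}
assignment = block_randomize(roster, seed=1)
for teacher, students in roster.items():
    print(teacher, [(s, assignment[s]) for s in students])
```

Because each class is split in half, neither tutoring method can end up with mostly one teacher's students, which could happen by chance with a simple random sample of all students.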
Q54. True or false: If we want to make separate conclusions about males and females in a
study, it’s a good idea to block on gender when making our assignment to groups.
A54. True.
Q55. Making a model that accurately reflects the experiment under consideration and
imitating chance behavior based on that model is called doing a _____.
A55. Simulation.
Q56. What are the 5 steps of doing simulations?
A56. State problem, state assumptions, assign digits to represent outcomes, simulate
repetitions, state conclusions.
Q57. Someone wants to simulate a situation where there’s a 3/10 chance that a child
will be involved in bullying. The person assigns the digits 0 to 3 for involved in bullying,
and the rest of the digits to noninvolved in bullying. Do you have a problem with this? If
so, what’s your problem?
A57. Yes, there is a problem. There are 4 digits from 0 to 3 inclusive, and 6 other digits, so the
person would be simulating a 40% probability situation rather than a 30% probability.
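The counting in A57 can be verified in a few lines of Python. This is a sketch for illustration; the variable names are assumptions. It compares the flawed digit assignment (0 to 3) with one that does give 30% (0 to 2).

```python
# Digits 0-9 are equally likely in a table of random digits.
digits = list(range(10))

flawed_scheme = [d for d in digits if d <= 3]     # digits 0, 1, 2, 3
corrected_scheme = [d for d in digits if d <= 2]  # digits 0, 1, 2

p_flawed = len(flawed_scheme) / len(digits)
p_corrected = len(corrected_scheme) / len(digits)
print(p_flawed, p_corrected)  # 0.4 0.3
```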
Q58. Please use your calculator to generate 4 random integers in the range from 0 to 99.
Please tell what you entered on your calculator to get these, and what 4 integers you got.
A58. On the TI-83 or 84, you do math>prb>5:randInt( and enter 0, 99, 4). On the TI-89, you do
catalog, F3, and then scroll down to randInt and hit enter. Then you insert 0, 99, 4 in the
parentheses. You’ll get different sets of numbers each time, unless something very
unlikely happens!
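For readers without a TI calculator, a rough Python equivalent of the command in A58 is sketched below. It is offered as an assumption about what a comparable command looks like, not as part of the textbook. Python's random.randint includes both endpoints, so the range 0 to 99 is covered.

```python
import random

# Four random integers from 0 to 99 inclusive.
draws = [random.randint(0, 99) for _ in range(4)]
print(draws)  # the numbers differ from run to run
```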
YMS Chapter 6 Probability
Q1. The branch of mathematics that deals with the pattern of chance outcomes is ____.
A1. Probability
Q2. The big idea of the study of probability is that chance behavior is unpredictable in the
_____ but has a regular and predictable pattern in the _____.
A2. short run, long run
Q3. An illustration of the “big idea” mentioned in Q2 is that while it is unpredictable
whether a single coin toss will come out heads, the ________ is almost always very close
to .5.
A3. Fraction of heads in a very large number of tosses
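The "big idea" in Q2 and Q3 can be watched in action with a simulated coin. The following Python sketch is an illustration under assumptions: the numbers of tosses, the seed, and the function name are arbitrary choices.

```python
import random

def fraction_of_heads(n_tosses, seed=0):
    """Toss a simulated fair coin n_tosses times; return the fraction of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

for n in (10, 1000, 100000):
    print(n, fraction_of_heads(n))

long_run = fraction_of_heads(100000)
```

With only 10 tosses the fraction can be far from .5; with 100,000 tosses it should be very close to .5.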
Q4. What is the difference between a changing, or variable phenomenon that is “random”
and one that is not?
A4. A random phenomenon is uncertain with respect to individual outcomes, but
nonetheless there is a regular distribution of outcomes in a large number of repetitions.
Q5. The ____ of any outcome of a random phenomenon is the proportion of times the
outcome would occur in a very long series of repetitions, i.e. long-term relative
frequency.
A5. Probability.
Q6. When there are independent trials, that means that the outcome of one trial _______.
A6. Does not influence the outcome of another.
Q7. The set of all possible outcomes of a random phenomenon is called the ______.
A7. Sample space.
Q8. An event is defined as a subset of ____.
A8. The sample space.
Q9. When we make a mathematical description of a random phenomenon by describing a
sample space and a way of assigning probabilities to events, we are constructing a ____.
A9. Probability model.
Q10. Jane has 2 shirts and 3 pairs of pants. If we want to picture the 6 ways she can dress
in these garments, we can draw a diagram with a bifurcation point at the left of the page,
with two lines going out to two points called “red shirt” and “brown shirt.” From each of
these, you then draw 3 lines, saying “blue pants,” “green pants,” and “black pants.” This
sort of picture is called a _____.
A10. Tree diagram.
Q11. Jane has 2 shirts and 3 pairs of pants. The “Cartesian Product” of these two sets
produces 6 possible combinations. This illustrates what our book calls the _____
principle, which says that if you can do one task in a ways, and another in b ways, you
can do both together in _____ ways.
A11. multiplication, ab
Q12. Please give an example of sampling with and without replacement.
A12. As one of many possible examples: in sampling without replacement, you draw first
one, then another card from a deck without putting the first card back. In sampling with
replacement, you draw one card from the deck, note its identity, replace it, shuffle them,
draw again, and note the identity of the second draw.
Q13. The probability of any event A has to satisfy the inequality x<= P(A) <=y. What
are x and y?
A13. 0 and 1.
Q14. If you sum the probabilities for each member of the sample space, you always come
to a grand total of ____.
A14. 1.
Q15. If the probability that A will occur is P(A), the probability that A will not occur is
____.
A15. 1-P(A).
Q16. What notation do we use to represent the probability that A will not occur, if P(A) is
the probability that A will occur?
A16. P(A^c)
Q17. What does it mean when we call two events disjoint or mutually exclusive?
A17. That if one happens, the other can’t also happen.
Q18. If two events are mutually exclusive, and P(A) is the probability of A, and P(B) is
the probability of B, what is the probability of (A and B) (a.k.a. A intersect B)?
A18. 0
Q19. If two events are mutually exclusive, and P(A) is the probability of A, and P(B) is
the probability of B, what is the probability of (A or B) (a.k.a. A union B)?
A19. P(A) + P(B).
Q20. Suppose a random event has k equally likely outcomes. What’s the probability of
any one of these outcomes?
A20. 1/k.
Q21. When there are k equally likely outcomes, and a certain number of outcomes are
counted as being in event A, the P(A) = (number of outcomes in A)/k. Please give an
example of this.
A21. As one possible example: when a deck of cards has 52 cards, each of which is
equally likely to be drawn, and 13 of these are called hearts, then P(drawing a heart) =
number of hearts/number of cards, or 13/52, or ¼.
Q22. In a finite sample space, with outcomes that are not equally likely, the probability of
any event is the sum of the probabilities of the outcomes making up the event. Someone’s
statistics instructor asks, “Please give an example of this.” The person says, “Suppose the
probability that a randomly selected person in a certain community will own a dog is .3, a
cat is .2, and another pet is .1. Then the probability that the person will own a pet is (.3
+.2 +.1).” Do you have a problem with this example? If so, what’s your problem?
A22. One big problem is: what about the people who own two or more different types of
pets? If the only people who owned cats or other pets were the dog owners, the
probability of pet ownership would be .3 instead of .6. The addition rule only applies to
disjoint events, and these categories are not disjoint.
Q23. When two events are independent, the probability that both will occur is what?
A23. The product of their individual probabilities.
Q24. How would you use the multiplication rule for independent events, as well as the
complement rule, to calculate the probability that in two coin tosses, you would get at
least 1 tail? Please explain your reasoning as you compute an answer.
A24. Your strategy is to realize that (at least one is tails) is the complement of the event
(both are heads). So you use the multiplication rule to compute the probability of two
heads (1/2 * 1/2 = 1/4) and then compute the complement of that (1 - 1/4) to get 3/4.
Q25. Suppose you have a lab test that gets a “false positive” 1 in 100 times. (That means
that the P is 1/100 that the test will say you have the disease, when you don’t.) How
would you use the multiplication rule as well as the complement rule to compute the
probability that out of 100 tests where the correct answer is “negative,” you will get at
least one false positive? Please explain your reasoning as you compute an answer.
A25. The logic for this is exactly the same as for Q24. You realize that the event (at least
one false positive in 100 tests) is the complement of (no false positives in 100 tests). The
probability of not getting a false positive on one test is .99, and each test is assumed
independent of the others, so .99^100 (or about .37) is the probability that all the tests will
be free of false positive results. The complement of this, which is the probability that at
least one is a false positive, is about .63.
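The arithmetic in A24 and A25 can be reproduced in Python. This is a sketch; the variable names are assumptions made for the example.

```python
# A25: at least one false positive in 100 independent tests.
p_false_positive = 1 / 100
p_clean_test = 1 - p_false_positive    # .99, by the complement rule
p_all_clean = p_clean_test ** 100      # multiplication rule, 100 independent tests
p_at_least_one = 1 - p_all_clean       # complement rule again
print(round(p_all_clean, 2), round(p_at_least_one, 2))  # 0.37 0.63

# A24: at least one tail in two coin tosses.
p_at_least_one_tail = 1 - 0.5 * 0.5
print(p_at_least_one_tail)  # 0.75
```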
Q26. What is the general addition rule for unions of two events that may or may not be
disjoint?
A26. P(A or B) = P(A) +P(B) –P(A and B).
Q27. The probability that someone plays the guitar, in a certain community, is .4, and the
probability that the person plays the harmonica is .2. The probability that the person
plays both guitar and harmonica is .1. What’s the probability that the person plays either
guitar or harmonica?
A27. .4+.2-.1=.5
Q28. In a certain community, .7 of people take physics and .6 of people take statistics.
The fraction of people taking one or the other is .85. What’s the probability that a
randomly drawn person will take both courses?
A28. P(A or B) = P(A) + P(B) - P(A and B); therefore .85=.7+.6 –P(A and B), from
which P(A and B) =.45.
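A27 and A28 use the same rule solved for different unknowns. A Python sketch follows. The function names are assumptions, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction

def p_union(p_a, p_b, p_both):
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_both

def p_intersection(p_a, p_b, p_either):
    """Same rule rearranged: P(A and B) = P(A) + P(B) - P(A or B)."""
    return p_a + p_b - p_either

# A27: guitar .4, harmonica .2, both .1
guitar_or_harmonica = p_union(Fraction(4, 10), Fraction(2, 10), Fraction(1, 10))
# A28: physics .7, statistics .6, one or the other .85
physics_and_statistics = p_intersection(Fraction(7, 10), Fraction(6, 10), Fraction(85, 100))

print(float(guitar_or_harmonica), float(physics_and_statistics))  # 0.5 0.45
```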
Q29. What does the symbol P(B|A) mean?
A29. The probability of B given A, or the probability that B will happen given that A has
happened.
Q30. What is the general multiplication rule for any two events?
A30. P(A and B) = P(A)P(B|A)
Q31. Why does this rule simplify to the multiplication rule for independent events, when
the two events are independent?
A31. Because the definition of independence is that P(B|A) =P(B), another way of saying
that A doesn’t influence the occurrence of B. Thus when the two events are independent,
we can substitute P(B) for P(B|A) in the general multiplication rule, and get P(A and B)=
P(A)P(B).
Q32. The general multiplication rule for any two events really follows from the definition
of conditional probability, and is a rearrangement of the defining formula. What is that
defining formula for conditional probability?
A32. P(B|A) = P(A and B) /P(A)
Q33. True or false: P(A and B and C) = P(A)P(B|A)P(C|A and B)?
A33. True.
Q34. True or false: the fact that P(A and B and C)=P(A)P(B|A)P(C|A and B) is the basis
of tree diagrams.
A34. True.
Q35. If there’s a .05 chance that a male high school athlete will play in college, and a
.017 chance that a college athlete will play professionally, what’s the chance that a male
high school athlete will both play in college and play professionally?
A35. .05 * .017, or .00085.
Q36. If there’s a .95 chance that a male high school athlete will not play in college, and a
.0001 chance that someone not playing in college will play professionally, what’s the
chance that someone will both not play in college and play professionally?
A36. .95 * .0001, or .000095.
Q37. If the chance is .00085 that a male high school athlete will play in college and play
professionally, and the chance is .000095 that an athlete will not play in college and play
professionally, what’s the chance that a male high school athlete will play professionally?
A37. .00085+.000095, or .000945, which is a little less than one in a thousand.
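Q35 through Q37 walk along the two branches of a tree diagram and then add the results. The same steps in Python, as a sketch with variable names that are assumptions:

```python
p_college = 0.05                  # plays in college
p_pro_given_college = 0.017       # plays professionally, given college play
p_no_college = 0.95               # does not play in college
p_pro_given_no_college = 0.0001   # plays professionally, given no college play

# Multiply along each branch (general multiplication rule).
branch_college = p_college * p_pro_given_college
branch_no_college = p_no_college * p_pro_given_no_college

# The two branches are disjoint, so add them.
p_pro = branch_college + branch_no_college
print(branch_college, branch_no_college, p_pro)
```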
Q38. For a tree diagram: the probability of reaching the end of any complete branch is the
product of what?
A38. The probabilities written on its segments (provided that those probabilities are
written so as to mean the conditional probability of going down this branch given that
you have reached the previous bifurcation!)
Q39. When you look at Bayes’s rule, (page 375) does the numerator of the fraction,
P(B|A)P(A) equal the probability of A and B?
A39. Yes.
Q40. When you look at Bayes’s rule, (page 375) does the denominator of the fraction
reduce to the probability of B and A^c, plus the probability of B and A, which reduces to
the probability of B?
A40. Yes.
Q41. Is a simpler form of Bayes’s rule the following: P(A|B) = P(B|A)P(A)/P(B) ?
A41. Yes.
Q42. Does this simpler form of Bayes’s rule say that if you want to “reverse the direction
of a conditional probability,” i.e. to find P(A|B) when you know P(B|A), you multiply the
P(B|A) by the ratio of the P(A) to the P(B)?
A42. Yes.
Q43. P(A|B)=P(B|A)P(A)/P(B) implies that if you know the probability of a positive test
given having a disease, you can calculate the probability of having the disease given a
positive test, provided that you also know what two other quantities?
A43. The probability (or rate) of the disease, and the probability (or rate) of positive tests.
So for example if the probability of a positive test given the disease was .99, but the rate
of positive tests was 1 in 100, but the rate of the disease was 1 in 1000, the probability of
the disease given a positive test would be only .99*.001/.01, or .099. This illustrates that
the probability of having the disease given a positive test does not necessarily equal the
probability of having a positive test given the disease!
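The example in A43 can be computed directly from the simpler form of Bayes's rule. A Python sketch, with function and variable names that are assumptions:

```python
def reverse_conditional(p_b_given_a, p_a, p_b):
    """Simpler form of Bayes's rule: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

p_positive_given_disease = 0.99
p_disease = 0.001    # rate of the disease, 1 in 1000
p_positive = 0.01    # rate of positive tests, 1 in 100

p_disease_given_positive = reverse_conditional(
    p_positive_given_disease, p_disease, p_positive
)
print(round(p_disease_given_positive, 3))  # 0.099
```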
Q44. What should someone do to organize the information for decision analysis if one is
given a set of complex information about branching alternatives and outcomes given
those alternatives?
A44. Use a tree diagram.
YMS Chapter 7 Random Variables
Q1. A random variable is a variable whose value is a ________ of a random
phenomenon.
A1. Numerical outcome
Q2. A random variable with a countable number of possible values is a _____ random
variable.
A2. Discrete.
Q3. What is a probability distribution of a discrete random variable?
A3. A list of the values the variable can take on, and the probability for each value.
Q4. For the probability distribution of a discrete random variable, every probability is
between ___ and ___, and the sum of all the probabilities is equal to ___.
A4. 0 and 1, 1
Q5. In a probability histogram, what quantity do the horizontal and vertical axes
represent, respectively?
A5. The horizontal axis represents the possible values the random variable can take on,
and the vertical axis represents the probability of that value.
Q6. A continuous random variable can take on how many values for a certain interval in
its domain?
A6. An infinite number
Q7. A continuous random variable’s probability distribution is described by a graph
called the ___.
A7. Density curve, or probability density curve. (This is the graph of the probability
density function, or pdf.)
Q8. Events, for continuous random variables, are described by the random variable’s
taking on a value within a certain interval. The probability of that event is represented by
what aspect of the density curve?
A8. The area under the curve, between the two points that bound the interval, or the area
under the curve, over the values (on the x-axis) that make up the event.
Q9. Suppose you have a continuous random variable X. What is the probability that
X=10?
A9. Zero. Continuous probability distributions assign probability 0 to every individual
outcome.
Q10. In a continuous probability distribution, what is the relationship between the
probability that X<10 and the probability that X<=10?
A10. The two are equal, because the probability that X=10 is 0.
Q11. True or false: the normal distribution is an example of a continuous probability
distribution.
A11. True.
Q12. The mean of a discrete random variable is the sum of the products of all the possible
values and the __________.
A12. Probabilities of those values.
Q13. Suppose there are two possible outcomes for a certain random variable, 0 and 100.
The probability of getting 0 is .99 and the probability of getting 100 is .01. What is the
mean of the random variable?
A13. 1.
Q14. The mean of a random variable is often called the e_____ v_____ of the variable.
A14. Expected value
Q15. Someone is offered a gambling game where there is a .25 chance of her losing
$100, and a .75 chance of her winning $60. If she plays many times, what would her
average winnings be?
A15. $20. (Because this number is positive, it is unlike any gambling games anyone is
likely to be offered.)
Q16. Someone is invited to send in a contest entry in which the chances are 1 in 50
million of winning a million dollars, and one in a million of winning a thousand dollars.
How do the expected earnings (or the mean earnings) from this contest compare with the
price of a first-class stamp?
A16. The expected earnings are 1/50million * 1million + 1/1million*1000 + the rest of
the probability*0. This comes out to one fiftieth of a dollar plus one thousandth of a
dollar, or 2.1 cents. At the time of this writing, a first class stamp costs 39 cents. So the
cost is 36.9 cents more than the expected earnings.
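The expected values in A13, A15, and A16 all come from the same sum of value times probability. A Python sketch, using exact fractions; the function name and the list-of-pairs layout are assumptions made for the example:

```python
from fractions import Fraction

def expected_value(distribution):
    """Mean of a discrete random variable: sum of value * probability."""
    return sum(value * prob for value, prob in distribution)

# A13: outcomes 0 and 100
two_outcomes = [(0, Fraction(99, 100)), (100, Fraction(1, 100))]
# A15: lose $100 with probability .25, win $60 with probability .75
game = [(-100, Fraction(1, 4)), (60, Fraction(3, 4))]
# A16: amounts in dollars; the remaining probability pays 0 and adds nothing
contest = [(1_000_000, Fraction(1, 50_000_000)), (1000, Fraction(1, 1_000_000))]

ev_two_outcomes = expected_value(two_outcomes)        # 1
ev_game = expected_value(game)                        # 20 dollars
ev_contest_cents = expected_value(contest) * 100      # 21/10, i.e. 2.1 cents
shortfall_cents = 39 - ev_contest_cents               # stamp cost minus expected earnings
print(ev_two_outcomes, ev_game, ev_contest_cents, shortfall_cents)
```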
Q17. The mean of symmetric continuous probability distributions lies at the ____ of the
curves.
A17. Center
Q18. The variance of a discrete random variable is the sum of the products of the squared
deviation of each possible value from the mean of the distribution and the _____ for that
value.
A18. Probability
Q19. Suppose there is a distribution with possible values 0, 1, and 2, each with
probability 1/3. What is the variance, i.e. the sigma-squared, of this distribution? (This is
also known as the variance of the population.)
A19. 1*1/3 +0*1/3 +1*1/3, or 2/3.
Q20. Think back to the definition of the variance of a sample. Suppose you had a sample
consisting of 0, 1, and 2, with mean 1. Is the variance of this sample the same as the
variance of the population?
A20. No. The variance of the sample is the sum of the squared deviations over n-1. So the
variance of the sample would be (1+0+1)/(3-1) or 1, rather than 2/3.
Q21. Please take a few seconds to enter 0, 1, and 2 in a list on your calculator. (On the
TI-83 or 84, stat>edit.) Then please compute 1 variable stats on these. (Stat>calc>1varstats,
listname). Look at sx and sigma x results. What are they, and why do they make sense vis
a vis the different definitions of population and sample variance and standard deviation?
A21. They are 1 and .81650, respectively, and they do make sense because they are the
square roots of 1 and 2/3, respectively.
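The calculator results in A19 through A21 can be cross-checked in Python. This sketch uses the standard statistics module, where pvariance and pstdev divide by n (population) and variance and stdev divide by n-1 (sample). The variable names are assumptions.

```python
import statistics

data = [0, 1, 2]

population_variance = statistics.pvariance(data)  # divides by n: 2/3
sample_variance = statistics.variance(data)       # divides by n-1: 1
sigma_x = statistics.pstdev(data)                 # square root of 2/3
s_x = statistics.stdev(data)                      # square root of 1

print(population_variance, sample_variance)
print(round(sigma_x, 5), s_x)
```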
Q22. What is the law of large numbers, in your own words?
A22. One way of putting it is that as the sample size approaches infinity, the sample
mean approaches the population mean. Another is that you can make the sample mean
get as close as you want to the population mean by getting a large enough sample.
Q23. True or false: If by chance, you flip a coin and get 10 heads in a row, the law of
large numbers tells us that if we flip many more times, we will get just a tiny bit under
50% heads in the remaining tosses, to compensate for the first 10 heads and make the
long-range probability equal 50%.
A23. False. The definition of independent trials implies that the coin “doesn’t remember”
the first 10 flips and the subsequent results are not influenced by the initial ones.
Q24. The law of large numbers says that you can get a sample mean as close as you want
to a population mean by using a large enough sample. (Or: the probability that the
difference between x-bar and mu is less than a given constant c approaches 1 as the
sample size approaches infinity.) True or false: Given a certain desired accuracy, i.e. a
certain maximum difference between the sample mean and the population mean, you will
need a larger sample the greater the variance of the population.
A24. True. For example, with very small variability in the population, the sample mean
gives a close approximation to the population mean with a much smaller sample size than
when there is great variability in the population.
Q25. True or false: the mean of a linear function of a random variable is that same linear
function of the mean of the random variable. In other words, the mean of a + bX is a
+b*the mean of X.
A25. True.
Q26. The mean of the sum of two random variables equals what?
A26. The sum of the means of the two variables.
Q27. If the mean amount that Linda makes at her lemonade stand per day is $10 and the
mean amount her brother Tom makes is $9, what’s the average of their total daily
receipts?
A27. $19.
Q28. Suppose someone tells you that the standard deviation of single scores on the SAT
is 100 points. Suppose that there are two people who take the SAT independently of one
another. How would you find the sd of the sum of their scores?
A28. You’d first square the standard deviations to get the variances of their individual scores –
10000 apiece. Then you add those variances, to get the total variance – 20000. Then you
take the square root of that to get the sd of the sum, 100*the square root of 2, or 141.42.
Q29. If the standard deviations of the SAT math and critical reading are both about 100,
is the standard deviation of the sum of these two scores for an individual more or less
than 141.42? Why?
A29. More, because the two scores are positively correlated with each other. The variance of the
sum is the sum of the variances plus twice the correlation between the variables times the
product of their standard deviations.
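The rule stated in A29 covers A28 as the special case where the correlation is 0. A Python sketch follows. The function name is an assumption, and the correlation of .7 is a made-up value used only to show the direction of the effect, not a real figure for the SAT.

```python
import math

def sd_of_sum(sd_x, sd_y, correlation=0.0):
    """sd of X + Y: square root of
    var(X) + var(Y) + 2 * correlation * sd(X) * sd(Y)."""
    variance = sd_x ** 2 + sd_y ** 2 + 2 * correlation * sd_x * sd_y
    return math.sqrt(variance)

independent = sd_of_sum(100, 100)                  # A28: correlation 0
correlated = sd_of_sum(100, 100, correlation=0.7)  # hypothetical correlation
print(round(independent, 2), round(correlated, 2))
```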
Q30. Can you give an intuitive explanation for why the variance of the sum of two
random variables is increased, the more highly they are correlated with each other?
A30. The variance is increased the more extreme values you have. If the two variables
are independent, then a high value of one variable tends to be balanced by a lower value
of the other. For example, if you get a 6 on rolling one die, you’ll on the average get a
sum of 9.5 once you add in the value of the other die – not a sum of 12. The same thing
goes in the other direction: low values of one variable are on the average, balanced by
higher values of the other. But when the values are highly correlated, then high values
of one variable predict high values of the other, and you get sums that are higher, and low
values of one variable predict low values of the other, and you get sums that are lower.
So with sums that are higher and lower, the variance of the sum is greater with correlated
variables.
Q31. A linear combination of two independent normally distributed random variables is
distributed how?
A31. Normally.
YMS Chapter 8 The Binomial and Geometric Distributions
Q1. Suppose someone looks at the numbers of 1’s, 2’s, 3’s, 4’s, 5’s, and 6’s that result
from 600 die rolls. Is this situation an example of the “binomial setting”?
A1. Almost, but not quite. For there to be a binomial setting, you have to have each
observation fall into one of only two categories, rather than the 6 categories described here.
However, if you defined a 1 as a “success” and anything else as a “failure,” then you
would have a binomial setting, and you could then do the same thing separately with 2, 3,
4, 5, and 6.
Q2. What are the four requirements for the binomial setting?
A2. 1. Two categories 2. Fixed n of observations 3. Independence 4. p of success same
for all observations.
Q3. The distribution of the number of successes out of n trials (with probability of
success p on each trial) is the ______ _______.
A3. binomial distribution
Q4. If someone has 51 socks in a drawer, with 1/3 red and 2/3 black, and the person grabs
a handful of 5 of them, and counts the number of black, will the results of such a trial
follow the binomial distribution? Why or why not?
A4. Not quite, because grabbing a handful of 5 is equivalent to sampling without
replacement. The probability of a black’s being included in the handful is altered some
depending on what other socks are also in the handful. If you picked a sock one at a time,
replaced the sock and mixed them thoroughly, and then picked again, the binomial
distribution would apply.
Q5. Suppose you roll a die 1000 times and count “1” as success and “anything else” as
failure. Is this an example of the binomial setting, and does the count have the binomial
distribution? Why or why not?
A5. Yes and yes, because the conditions of 2 categories, fixed n of observations,
independence, and p of success constant all hold.
Q6. In Chapter 1, the word distribution was defined as what values the variable takes and
how often it takes these values. Let’s say you roll a die 1000 times and count the number
of 1’s. The count of successes comes out to 165. Someone asks you, “What does this
have to do with a distribution? We just got one number from this experiment. How’s
anybody going to plot a histogram or any other representation of a distribution with this?”
What would be your answer?
A6. The number of 1’s that you would get in such an experiment is a random variable. If
you did the experiment many times, you would get a distribution that could be plotted
with a histogram, and which would take the approximate shape of the theoretically
derived binomial distribution for this situation.
Q7. When we say a certain random variable has a B(100, .7) distribution, what do we
mean?
A7. That there is a binomial distribution with 100 observations and probability of success
on each observation .7.
Q8. If there is a discrete random variable (such as a binomial), and you want to find the
probability of any given value of X, what function do you use – the cumulative
distribution function or the probability distribution function? (cdf or pdf?)
A8. pdf
Q9. Suppose you want to know the probability that a binomial random variable B(100,
.7) takes on a value less than or equal to 60. One way would be to use the binomial pdf
for values 0, 1, .. 60 and then add them all up. A much less laborious way would be to do
what?
A9. Use the binomial cdf function. On the TI-83 or TI-84 the command would be
binomcdf(100, .7, 60).
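What binomcdf does in A9 can be imitated in Python by adding up the binomial pdf, exactly as the laborious method in Q9 describes. This is a sketch: binom_pdf and binom_cdf are hypothetical helper names, not calculator or library functions.

```python
from math import comb

def binom_pdf(k, n, p):
    """P(X = k) for a B(n, p) random variable."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k): the sum of the pdf over the values 0, 1, ..., k."""
    return sum(binom_pdf(i, n, p) for i in range(k + 1))

p_at_most_60 = binom_cdf(60, 100, 0.7)
print(p_at_most_60)
```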
Q10. Suppose you roll a die six times, and you want the probability of getting exactly 3
1’s. What would be the appropriate expression of the binomial formula that would give
the answer to this?
A10. (6 choose 3) * (1/6)^3 * (5/6)^3.
Q11. Can you please explain why the binomial probability formula is as it is, using this
example of rolling a die six times? Please give an explanation for why each of the three
factors is what it is.
A11. You have 6 independent rolls. The probability of getting 1 on the first three and
something other than 1 on the next three is (1/6)^3 * (5/6)^3 because of the multiplication
rule for independent events. The number of different ways you can get three ones is the
number of ways you can select 3 rolls to be ones out of the 6 different rolls: i.e. roll 1, 2,
and 3, roll 1, 2, and 4, etc. This is the number of combinations of six things taken 3 at a
time, or 6 choose 3. Because each of these ways of getting 3 1’s is mutually exclusive of
the others, you can use the addition rule to add the probability of each of the 6 choose 3
ways of getting the outcome of 3 1’s, and since these probabilities are all the same, a
shorter way is to multiply the probability of any one of them times 6 choose 3.
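The reasoning in A10 and A11 can be checked by brute force, since six die rolls have only 6^6 = 46,656 equally likely sequences. A Python sketch, with variable names that are assumptions:

```python
from fractions import Fraction
from itertools import product
from math import comb

# Binomial formula: (6 choose 3) * (1/6)^3 * (5/6)^3
by_formula = comb(6, 3) * Fraction(1, 6) ** 3 * Fraction(5, 6) ** 3

# Brute force: go through every sequence of six rolls and count
# those that contain exactly three 1's.
favorable = sum(
    1 for rolls in product(range(1, 7), repeat=6) if rolls.count(1) == 3
)
by_counting = Fraction(favorable, 6 ** 6)

print(favorable, by_formula, float(by_formula))
```

The count of favorable sequences is 20 * 125: the 20 choices of which rolls are 1's, times the 5 * 5 * 5 ways to fill the other three rolls.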
Q12. Can you please explain why, to obtain the binomial coefficient, you use the number
of combinations rather than the number of permutations, in calculating n choose k?
A12. In our die roll example, using the number of permutations would for example count
the event of getting 1’s on rolls number 1, 3, and 5 as a different event from getting 1’s on
rolls number 5, 3, and 1. Since these events are the same, order does not make a
difference in enumerating subsets, and therefore you want combinations rather than
permutations.
Q13. In chapter 7 we learned that means and variances are additive when you want to
know the mean and variance of sums of independent random variables. How are these
facts crucial in figuring out the formulas for the mean and variance of a binomial random
variable?
A13. In the case of both the mean and the variance, we consider a certain random
variable to be the outcome of any one trial, giving a success numerical value 1 and failure
value 0. We calculate the mean and variance of this variable from the defining formulas.
Then we define another random variable to be the sum of n of these random variables,
which is the number of successes in n trials. We use the additive properties of the mean
and the variance that we learned in chapter 7 to move us from an expression for the mean
and variance of any one trial, to the mean and variance of the sum of n trials.
Q14. What are the formulas for the mean, variance, and standard deviation of a binomial
random variable in terms of n and p, and, if you want, q (or 1-p).
A14. The mean mu=np. The variance sigma squared = npq. The standard deviation is
the square root of npq or (np(1-p))^.5.
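The formulas in A14 can be compared against a direct computation from the binomial probabilities. This is only a sketch, using the die example (n = 6, p = 1/6) as an assumed test case.

```python
from math import comb, sqrt

n, p = 6, 1 / 6
q = 1 - p

# Binomial probabilities P(X = k) for k = 0..n
pmf = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

# Mean and variance from the defining formulas
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(mean, n * p)                        # should agree with np
print(variance, n * p * q)                # should agree with npq
print(sqrt(variance), sqrt(n * p * q))    # standard deviation
```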
Q15. When n is “large,” the binomial distribution with n trials and success probability p
can be approximated by what?
A15. The normal distribution with mean np and standard deviation (npq)^.5.
Q16. As a rule of thumb, the normal distribution may be used as an approximation to the
binomial when both np and nq (expected successes, expected failures) equal or exceed
what number?
A16. 10
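As a rough illustration of A15 and A16, the sketch below compares an exact binomial probability with its normal approximation. The values n = 100, p = 0.3, and the cutoff 35 are hypothetical, chosen so that np and nq both exceed 10.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.3      # hypothetical values: np = 30 and nq = 70
q = 1 - p

# Exact binomial probability P(X <= 35)
exact = sum(comb(n, k) * p**k * q**(n - k) for k in range(36))

# Normal approximation with mean np and standard deviation sqrt(npq)
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * q)).cdf(35)

print(round(exact, 3), round(approx, 3))
```

The two numbers are close but not identical; the approximation improves as n grows.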
Q17. Please describe how to have your calculator simulate a binomial experiment. What
are the keys that you press?
A17. For the TI-83 or 84, you press math>prb>randBin( and then enter 1, p, and n in
parentheses. You press the sto> key to store the results, and 2nd L1 to get the results
stored in L1; then you hit the colon which is done by alpha and then the decimal point
key; then you do the sum function which is under list>math>. Then in parentheses you
enter L1. So the command is randBin(1, p, n)->L1:sum(L1). What this does is to
generate n numbers that are either 0 or 1, with a p probability of being 1, and adds them
to get the number of successes in the n trials, and displays that number. Each time you
press the enter key, this will be repeated. So you take your number 2 pencil and write
down each of these numbers, and then you count how many of them had the result you
were looking for, and you express that result as a fraction of the number of trials you did.
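For readers without a TI calculator, the same simulation can be sketched in Python. This is an analogue of the calculator procedure, not the textbook's method; the function name simulate_successes, the seed, and the number of repetitions are all assumptions made for illustration.

```python
import random

def simulate_successes(n, p, rng):
    """Generate n trials that are 1 with probability p, and return their sum."""
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(0)     # fixed seed so the run is repeatable

# Repeat the 6-roll experiment many times (the "press enter again" step)
results = [simulate_successes(6, 1 / 6, rng) for _ in range(10_000)]

# Fraction of repetitions that produced exactly three 1's
frac_three_ones = results.count(3) / len(results)
print(frac_three_ones)
```

The fraction printed should land near the exact binomial probability of three 1's in six rolls, which is about .054.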
Q18. For a binomial setting, the number of trials is fixed, and the random variable is the
number of successes in those trials. For a geometric setting, the random variable is the
number of ____ necessary to achieve the first _____.
A18. trials, success
Q19. True or false: in the geometric setting, as in the binomial setting, you have 1. two
categories, 2. with the same probability for each observation, and 3. independent
observations.
A19. True.
Q20. In a geometric setting, with probability of success p, what is the probability that the
first success will occur on the nth trial?
A20. P(X=n) = (1-p)^(n-1) * p
Q21. True or false: the probabilities that the first success occurs on the first, second,
third, etc. trial in a geometric setting, when arranged in order, form a geometric series
where p is the first term and each successive term is (1-p) (or q) times the previous one?
A21. True
Q22. True or false: if you apply the formula a/(1-r) for the sum of the terms of an infinite
geometric series, where a is the first term and r is the ratio of each term to the previous
one, for the geometric setting p is the first term and (1-p) is the ratio, so the sum becomes
p/(1-(1-p)) or 1. Thus even though there are infinitely many possibilities for the outcome
of the experiment in the geometric setting, the probabilities of each outcome sum to 1.
A22. True
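A numerical check of A21 and A22, as a sketch only: it uses p = 1/6 as an assumed example and sums the first 200 terms, since an infinite sum cannot be computed directly.

```python
p = 1 / 6
q = 1 - p

# P(X = n) = q^(n-1) * p for n = 1..200
probs = [q ** (n - 1) * p for n in range(1, 201)]

# Each term is q times the previous one (geometric series with ratio q)
ratios_ok = all(abs(probs[i + 1] / probs[i] - q) < 1e-12
                for i in range(len(probs) - 1))

partial_sum = sum(probs)        # approaches 1 as more terms are added
closed_form = p / (1 - q)       # a/(1-r) with a = p and r = q

print(ratios_ok, partial_sum, closed_form)
```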
Q23. If your chances of rolling a 1 on a die roll are one in 6, what is the expected or
average number of times that you would have to roll the die before getting a 1?
A23. 6 times.
Q24. If your chance of getting a success at anything in the geometric setting is p, what is
the average or expected number of trials that you would have to conduct before getting a
success?
A24. 1/p trials.
Q25. What is the variance in the geometric random variable?
A25. q/p^2, or (1-p)/p^2
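The mean 1/p and variance q/p^2 can be checked numerically. The sketch below assumes p = 1/6 and truncates the infinite sums at 2000 terms, beyond which the remaining probability is negligible.

```python
p = 1 / 6
q = 1 - p

trials = range(1, 2001)
pmf = [q ** (n - 1) * p for n in trials]

# Mean and variance from the defining formulas (truncated sums)
mean = sum(n * pn for n, pn in zip(trials, pmf))
variance = sum((n - mean) ** 2 * pn for n, pn in zip(trials, pmf))

print(mean, 1 / p)            # compare with 1/p
print(variance, q / p ** 2)   # compare with q/p^2
```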
Q26. In the geometric setting, if q=1-p, what is the probability that it takes more than n
trials to see the first success?
A26. P(X>n) = q^n.
Q27. On page 470 of YMS there is a derivation of the formula for the probability that it
takes more than n trials to see the first success. Can you think of a really simple way to
arrive at the same formula?
A27. If, and only if, the first n trials are failures, it will take more than n trials to get the
first success. The probability of the first n trials being failures, using the multiplication
rule for independent events, is q^n.
Q28. For a geometric distribution, would you say that it is approximately true that 34% of
the observations would fall between the mean and 1 standard deviation above the mean,
and 34% would fall between the mean and 1 standard deviation below the mean? Why or
why not?
A28. No, because the geometric distribution is always strongly skewed to the right, and
its shape doesn’t resemble the normal distribution (for which the above statement is true).
Q29. Suppose that some experts estimate that the probability of a major nuclear war in
any given year is 1%. You think that you will live another 65 years. You are wondering
what the chance is that you will and will not see a nuclear war. Please fit some of the
concepts of this chapter to this situation, and calculate the probability.
A29. This is the geometric setting, where “success” is defined as nuclear war, and
“failure” is defined as no nuclear war! What you are being asked is the probability that it
will take more than 65 trials to see a “success.” You use the formula P(X>n)=q^n, to get
the P(X>65)=.99^65. The probability of no nuclear war comes out to .52. So would that
be comforting, or what?
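The calculation in A29 takes one line in Python; the variable names are illustrative.

```python
q = 0.99        # probability of no nuclear war in a given year
years = 65

p_no_war = q ** years      # P(X > 65) = q^65
p_war = 1 - p_no_war       # probability of at least one war in 65 years

print(round(p_no_war, 2), round(p_war, 2))
```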
YMS Chapter 9: Sampling Distributions
Q1. A parameter (which begins with p) is a number describing a _____; a statistic (which
begins with s) is a number describing a ______.
A1. population, sample
Q2. What symbols are used in our book’s notation to represent a population mean,
sample mean, sample proportion, and population proportion, respectively?
A2. μ (mu), x-bar, p-hat, and p. (Note that the letter P will also be used later to indicate
the “P-value” for a statistical test. Our text will use a capital P for that, or the phrase
P-value. But many other writings will use small p for P-value.)
Q3. Suppose you were to take a large number of samples (all the same size) from a
population, compute the mean of each, and plot a histogram of the sample means that you
obtain. This histogram would approximate the shape of the ______ ________ of the
mean.
A3. Sampling distribution.
Q4. The sampling distribution for a proportion or mean changes as the number in the
sample increases: the mean of that sampling distribution (increases, stays the same,
decreases) and the variance of the sampling distribution (increases, stays the same,
decreases).
A4. stays the same, decreases
Q5. If the mean of a sampling distribution is the true value of the parameter being
estimated, we refer to the statistic used to estimate the parameter as being _____.
A5. unbiased
Q6. True or false: if a statistic is unbiased, the value of the statistic computed from the
sample equals the population parameter.
A6. False. Samples vary. It’s only the mean of the statistic over all possible samples that
equals the population parameter for an unbiased statistic.
Q7. True or false: the variability (and thus the accuracy) of a statistic is very sensitive to
the size of the population from which the samples are drawn.
A7. False. The sample size is much more important than the population size.
Q8. An organization wants to sample with equal accuracy from each state of the USA.
Would it make more sense to sample 2000 from each state, or 1% of each state?
A8. 2000 from each state, because the absolute sample size determines the accuracy, and
you don’t need a greater sample with a higher population.
Q9. To review from Chapter 8, on the binomial distribution: what are the mean and
standard deviation of a binomially distributed variable X, where p is the (population)
probability of success, q is the probability of failure, and n is the size of the sample?
A9. The mean is np, and the standard deviation is the sqrt(npq).
Q10. What are the mean and sd of the sample proportion, which is X/n where X is binomially
distributed (but X/n is not binomially distributed)?
A10. The mean is p, and the standard deviation is sqrt(pq/n).
Q11. If you want a standard deviation for a sample proportion that is half as big as some
other one, you have to get a sample that is how many times bigger?
A11. 4 times bigger.
Q12. If the sample is a substantial fraction of the population, then the assumption of
independence that leads to the binomial distribution is violated. How many times bigger
should the population be than the sample, so that we don’t worry about this?
A12. At least 10 times bigger.
Q13. True or false: The standard deviation of the sampling distribution of a proportion is
only approximately sqrt(pq/n); this approximation is most accurate when np>=10 and
nq>=10.
A13. False. The sd of the sampling distribution of a proportion is always exactly
sqrt(pq/n). But that distribution is approximately NORMAL when np and nq are >=10.
Q14. If you know the population proportion, how do you use the normal approximation
to figure out the probability that the proportion obtained from a random sample of size n
will be between two given values?
A14. You use p and sqrt(pq/n) as the mean and sd, and with these compute a z score for
the upper and lower bounds of the interval you’re interested in. Then you use the normal
table, or a calculator, to get the proportion of the normal curve between those two z
scores. This is the probability that the sample proportion will fall between those values.
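The procedure in A14 can be sketched in Python with the standard library's NormalDist in place of the normal table. The values p = 0.5, n = 100, and the bounds .45 and .55 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.5, 100              # hypothetical population proportion and sample size
q = 1 - p
sd = sqrt(p * q / n)         # sd of the sample proportion

low, high = 0.45, 0.55       # interval of interest for p-hat

# z scores for the lower and upper bounds
z_low = (low - p) / sd
z_high = (high - p) / sd

# Area under the standard normal curve between the two z scores
std_normal = NormalDist()
prob = std_normal.cdf(z_high) - std_normal.cdf(z_low)
print(round(z_low, 2), round(z_high, 2), round(prob, 4))
```

Here both np and nq equal 50, so the normal approximation is reasonable.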
Q15. How do the sampling distributions of means compare with the distributions of
individual observations? They are less _____ and more _____.
A15. Variable, normal
Q16. Suppose you have a population with mean mu and sd sigma. What are the mean
and sd of the sampling distribution for means with sample size n?
A16. The mean of the sampling distribution is mu and the sd is sigma/sqrt(n).
Q17. Under what conditions will the sampling distribution of the mean have an exact
normal distribution, no matter what the sample size is?
A17. When the population is normally distributed.
Q18. What does the central limit theorem tell us?
A18. That as the sample size gets larger, the sampling distribution of the mean
approaches the normal, regardless of the distribution of the population from which the
observations are drawn.
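A small simulation can make the central limit theorem concrete. The sketch below is an illustration under assumptions: it draws from an exponential population (strongly right-skewed, with mean 1 and sd 1), uses samples of size 50, and fixes a seed so the run is repeatable.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(1)
n = 50                  # size of each sample
num_samples = 2000      # how many samples to draw

# Each sample mean comes from n draws of a skewed (exponential) population
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

# The sample means should center on mu = 1 with sd near sigma/sqrt(n)
print(round(mean(sample_means), 3), round(stdev(sample_means), 3))
print(round(1 / sqrt(n), 3))
```

A histogram of sample_means would look roughly bell-shaped even though the individual observations are far from normal.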
Q19. True or false: suppose that income in a large country is not normally distributed, but
is very skewed. The central limit theorem tells us that if we were to collect several very
large samples and compute the mean income for each sample, those means would be
approximately normally distributed, even though the incomes in the population are not
normally distributed.
A19. True.
Q20. Why do you think the central limit theorem is so “central” to statistics?
A20. Because it enables us to use normal probability calculations to answer questions
about sample means even when population distributions are not normal. Those questions
include the big idea of confidence intervals: how likely is the right answer to be between
these two bounds. Thus the central limit theorem helps us say, “There’s x probability that
the true mean of the population is between a and b.”
YMS Chapter 10: Introduction to Inference
Section 1
Q1. Statistical inference consists in drawing conclusions about a ____ from data in a
____.
A1. population, sample
Q2. If the standard deviation of a population is sigma, what is the sd of the sampling
distribution for the sample mean (this is often called the standard error of the mean) with
sample size n?
A2. sigma/sqrt(n)
Q3. Suppose we know that the sd of the sample mean (a.k.a. standard error of the mean)
is 4.5. This implies that if we were to draw many samples from the population, about
95% of these sample means would fall within what interval?
A3. The population mean plus or minus 9.
Q4. True or false: we should imagine the sample mean as being at the center of a
bell-shaped curve, with 2 standard deviations of the sample mean (a.k.a. standard errors) on
either side of this point encompassing 95% of the other sample means. (Assume the
sample is an SRS and the sample means are normally distributed.)
A4. False. We should imagine the population mean as being at the center of that bell
curve. We visualize the sample mean as falling within 2 standard errors of the population
mean 95% of the time.
Q5. True or false: The reasoning we use in making confidence intervals around a sample
mean is as follows: if the sample mean is normally distributed, then 95% of the time,
x-bar will be within 2 standard deviations of the sample mean (standard errors) of the population mean,
mu. Whenever x-bar is within 2 standard errors of mu, mu is within 2 standard errors of
x-bar. So if we make an interval + or – 2 standard errors around x-bar, that interval will
encompass mu for 95% of the sample means we obtain.
A5. True.
Q6. A confidence interval has two parts: 1) the interval itself (usually expressed as an
estimate plus or minus a margin of error) and 2) ______________.
A6. The confidence level, e.g. .95.
Q7. Someone says, “I read that the 95% confidence interval for a certain group’s score on
a certain test was 115 to 128. That means that 95% of all the members of the group score
in that range.” Is this an accurate interpretation? If not, please give a better one.
A7. Not accurate. The confidence interval stated means that we are 95% confident that
the population mean lies within the stated interval. And 95% confident means that 95%
of the intervals obtained the way we got this one would encompass the population mean.
Q8. In order to construct a confidence interval for a mean, what two conditions need to
be met?
A8. That the data come from a SRS of the population, and that the sampling distribution
of the x-bar is approximately normal.
Q9. A first person says, “I want a 90% confidence interval. So I’ll look in the normal
table for the z-score with 95% of the area to the left of it.” A second person says, “You
mean 90% of the area, don’t you?” What is the correct way to look in the table?
A9. The first person got it right. The region around the population mean that subsumes
90% of the sample means is that with 5% above that region and 5% below that region. So
you want z for .95 or the negative of the z for .05.
Q10. What are the “tail areas” you look for, for confidence intervals of .90, .95, and .99,
respectively?
A10. .05, .025, and .005.
Q11. If C is the confidence level, what is the expression for the area to the right of the
interval subsuming fraction C of the distribution for sample means?
A11. (1-C)/2.
Q12. What does the symbol z* stand for?
A12. The z-score with (1-C)/2 of the area lying to the right of it. Or: the number of
standard deviations above and below the mean that bound the C level confidence interval.
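Instead of a normal table, z* can be found with Python's standard library. This is a sketch; the function name z_star is made up for illustration.

```python
from statistics import NormalDist

def z_star(C):
    """z-score with (1 - C)/2 of the standard normal area to its right."""
    tail = (1 - C) / 2
    return NormalDist().inv_cdf(1 - tail)

for C in (0.90, 0.95, 0.99):
    print(C, round(z_star(C), 3))
```

For a 90% interval the lookup is at .95 of the area to the left, matching the first person's reasoning in Q9.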
Q13. True or false: The values mu - z* sigma/sqrt(n) and mu + z* sigma/sqrt(n)
represent the lower and upper bounds for the confidence interval for the mean.
A13. False. The confidence interval is centered around x-bar, not around mu, because we
don’t know mu. (If we did, we wouldn’t need to make a confidence interval.) The values
listed above are the bounds within which there is a probability C that any observed
sample mean will fall. Whenever the sample mean falls between these values, then we
can say that the population mean falls between x-bar –z* sigma/sqrt(n) and x-bar + z*
sigma/sqrt(n).
Q14. If my wife’s age falls in the interval of my age plus or minus 5 years, then my age
must fall within the interval of my wife’s age plus or minus 5 years. Is this true, and is
this sort of reasoning central to the reasoning about confidence intervals?
A14. Yes and yes.
Q15. True or false: The way in which the statement in the previous question has its
analogy in the reasoning about confidence intervals is: any time the sample mean falls
within the interval of mu plus or minus the margin of error, then the population mean
must fall within the interval of x-bar plus or minus the same margin of error.
A15. True.
Q15. True or false: the values x-bar – z* sigma/sqrt(n) and x-bar + z* sigma/sqrt(n) form
the lower and upper bounds for the confidence interval for the mean (assuming the
assumptions are met).
A15. True.
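A confidence interval of this form can be computed as follows. The numbers (x-bar = 121.5, sigma = 20, n = 36, C = .95) are hypothetical and are not taken from the textbook.

```python
from math import sqrt
from statistics import NormalDist

x_bar = 121.5     # hypothetical sample mean
sigma = 20.0      # population sd, assumed known
n = 36            # sample size
C = 0.95          # confidence level

z_star = NormalDist().inv_cdf(1 - (1 - C) / 2)
margin = z_star * sigma / sqrt(n)      # margin of error

lower, upper = x_bar - margin, x_bar + margin
print(round(lower, 2), round(upper, 2))
```

The interval is centered on x-bar, not on mu, which is the point of Q13.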
Q16. Example 10.5 on page 546 is worthy of careful study. What are the 4 steps that were
exemplified in using confidence intervals?
A16. 1. Identify the population of interest and the parameter to be estimated. 2. Choose
the appropriate procedure, and verify that the conditions for using it are met. 3. Carry out
the procedure. CI = estimate + or – margin of error. 4. Interpret the results in the context
of the problem.
Q17. Please tell whether the margin of error (which is half the width of the confidence
interval), or the width of the confidence interval itself, gets bigger or smaller under each
of the following circumstances: a. the population standard deviation gets smaller, b. the
level of confidence C gets bigger (e.g. a move from a 90% confidence interval to a 99%
confidence interval) c. the sample size gets bigger, and d. the population size gets bigger?
A17. a. smaller, b. bigger, c. smaller, d. no effect
Q18. Is it preferable in research for a 95% confidence interval to have its upper and lower
bounds closer together, or farther apart?
A18. Closer together, because this represents a more accurate estimate of whatever
you’re trying to estimate.
Q19. Suppose you are a researcher planning a study, and you are deciding how many
subjects to enroll. You want a certain margin of error m. You know what level of
confidence you want, and you know (or estimate) the sigma for the population. How do
you figure out the sample size?
A19. Set m>= z* sigma/sqrt(n) and solve that inequality for n. As usual, you use as z*
the z score that has (1-C)/2 area to the right of that score.
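Solving the inequality in A19 for n gives n >= (z* sigma / m)^2, rounded up to a whole subject. A sketch in Python follows; the function name required_n and the example values are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist

def required_n(m, sigma, C):
    """Smallest n with z* sigma/sqrt(n) <= m, i.e. n >= (z* sigma / m)^2."""
    z_star = NormalDist().inv_cdf(1 - (1 - C) / 2)
    return ceil((z_star * sigma / m) ** 2)

# Hypothetical study: margin of error 3, sigma 15, 95% confidence
print(required_n(m=3, sigma=15, C=0.95))
```

Always round up, since rounding down would give a margin of error slightly larger than m.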
Q20. Some of the problems in the use of confidence intervals can be surmounted by
getting a large enough sample size – with this, the distribution of sample means can be
considered normal even if the population isn’t normal. Also, with a large enough sample
size, the sample standard deviation is close to the population standard deviation. What’s
the main problem that can’t be overcome with a large sample size?
A20. That the data should be a SRS of a population, and that the margin of error in a
confidence interval covers only random sampling errors. If you are dealing with a
convenience sample (as in the vast majority of research studies) then you can go ahead
and compute a confidence interval, but the results are generalizable only to the population
of which this particular sample is representative, and sometimes it’s impossible to say
what that population is.
Chapter 10, Section 2
Q21. True or false: The basic reasoning for significance testing is: an outcome that would
happen rarely if a claim were true is good evidence that the claim is not true.
A21. True. (Note: there are problems with this basic presumption.)
Q22. In doing statistical tests, the first step is to identify what you want to make
conclusions about. Are you always wanting to make conclusions about sample statistics,
or population parameters? Or is it sometimes one or sometimes another?
A22. Always the population parameter.
Q23. What does a “null hypothesis” typically state?
A23. That there is no difference between two parameters, or no effect, or no change (or
that a parameter is equal to a certain value).
Q24. A significance test works by assessing how likely the ______ _____ would be if
the ____ _____ were true.
A24. observed outcome, null hypothesis
Q25. True or false: the p-value is the probability of getting exactly the results we
observed, presuming the null hypothesis to be true.
A25. False. The probability of getting exactly the results we obtained is almost always
very small (or even theoretically 0 when dealing with continuous distributions). The p
value is the probability of getting results as extreme as, or more extreme than, the actually
observed results; “extreme” means “far from what we would expect if the null hypothesis
were true.”
Q26. We are more likely to reject the null hypothesis of “no difference” or “no effect,”
and infer that there is a difference or an effect, when the P-value is large, or small?
A26. Small
Q27. The null hypothesis has to do with a population parameter; in analyzing your
sample data you calculate a ______ that estimates that population parameter.
A27. Statistic (the phrase, “sample statistic” is correct but redundant.)
Q28. When a drug company researcher is hoping to find evidence that a drug is better
than placebo, is the researcher wishing to reject, or fail to reject, the null hypothesis?
A28. Reject. The null hypothesis would be that drug and placebo are equal in effect.
Q29. Suppose someone is testing a drug versus placebo. If the researcher is interested
only in the alternative that the drug is better than placebo, then the alternative hypothesis
is _____-sided, but if the researcher counts both harmful effects (drug worse than placebo) and
beneficial effects (drug better than placebo) as rejections of the null hypothesis, then the
alternative hypothesis is ____-sided.
A29. one, two
Q30. What is the meaning of the significance level, or alpha?
A30. It’s a threshold level for the p-value that we consider decisive, with which the
obtained p-value is compared.
Q31. Do we reject the null hypothesis when the p value is less than alpha, or greater than
alpha?
A31. Less than alpha.
Q32. Are we more likely to reject the null hypothesis with a larger alpha, or a smaller
alpha, all other things equal?
A32. With a larger alpha. The larger alpha is, the more likely the p-value we obtain is less
than that cutoff. Thus smaller p-values, and larger alphas, dispose to rejection of the null.
Q33. If a test is statistically significant at the .05 level, what does that mean?
A33. That the p-value obtained is less than or equal to .05.
Q34. Someone finishes writing up a statistical test by saying, “In conclusion, p=.021.”
What step of the “inference toolbox” are they leaving out, that should come after what
they said?
A34. Interpreting the results in the context of the problem. So they should say something
like, “Therefore we reject the hypothesis that drug and placebo are equal; our study gives
evidence that our drug is more effective than placebo.”
Q35. When we are testing the hypothesis that a population mean is equal to a certain
hypothesized value, in the unlikely situation where we know the population standard
deviation, what is our test statistic?
A35. The standardized sample mean, which is the sample mean minus the hypothesized
population mean, divided by the standard deviation of the sample means. This is also
known as the one-sample z statistic.
Q36. What distribution does the one-sample z statistic, a.k.a. the standardized sample
mean, have when the null hypothesis is true?
A36. The standard normal distribution.
Q37. True or false: for a one-sided test (or a one-sided alternative hypothesis), results
extreme in one direction are counted as evidence against the null hypothesis; for a
two-sided test (or a two-sided alternative hypothesis), results extreme in either direction are
counted as evidence against the null hypothesis.
A37. True.
Q38. Please explain why the two-sided p-value is double that of the one-sided p-value.
A38. The p-value is the probability of getting results as deviant as, or more deviant than,
the results obtained. For a two sided test, we add the probability of getting results deviant
in both directions to get the total p value; for symmetrical distributions, those two
probabilities are equal, thus amounting to twice the value for any single direction.
Q39. How do you compute the one sample z statistic?
A39. Sample mean minus hypothesized population mean, over the sd of the sampling
distribution of the mean (that sd is sigma/sqrt(n) ).
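The one-sample z statistic and its p-values can be sketched in Python. The data values (x-bar = 0.3, hypothesized mean 0, sigma = 1, n = 50) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

x_bar = 0.3      # hypothetical sample mean
mu0 = 0.0        # hypothesized population mean (null hypothesis)
sigma = 1.0      # population sd, assumed known
n = 50

# Sample mean minus hypothesized mean, over the sd of the sample mean
z = (x_bar - mu0) / (sigma / sqrt(n))

std_normal = NormalDist()
p_one_sided = 1 - std_normal.cdf(z)               # alternative: mean > mu0
p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))    # alternative: mean != mu0

print(round(z, 3), round(p_one_sided, 4), round(p_two_sided, 4))
```

Because the normal curve is symmetric, the two-sided p-value is exactly twice the one-sided value, as A38 explains.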
Q40. True or false: What is meant by doing “tests with fixed significance level” for a
one-sample z test is that you become aware of what the cutoff (or critical) values are for z
for the alpha you’ve picked. If the z your data yield is more extreme than the z for the
alpha you’ve picked, the test is significant at the specified level of alpha. This method is
most useful for those who don’t have access to calculators or computers that will give a
p-value directly.
A40. True.
Q41. True or false: If you obtained a 95% confidence interval for a mean that ranged
from 10 to 30, then a null hypothesis that the mean was equal to any value outside that
range would be rejected and a null hypothesis of a mean within that range would not be
rejected, at the .05 level, using a two-sided test.
A41. True.
Chapter 10, Sections 3 and 4
Q42. Suppose you thought your research would overturn a conclusion that many people
had held for a long time. Would you tend to choose a smaller alpha (thus necessitating a
larger sample size to reject the null hypothesis) or a larger alpha?
A42. A smaller alpha. You know that people will be skeptical of your conclusion, so
you'd be better off getting a larger sample size that enables you to say, "This result is
significant at the .001 level" rather than just the .05 level.
Q43. Suppose you had limited subjects with whom to work, and you were looking for
evidence of toxicity from a chemical. The consequences of declaring that the chemical is
safe when it isn't are very bad. The consequences of declaring the chemical dangerous
when it isn't are primarily that more studies would be done than you have the resources to
do. Given these consequences, would you tend to set alpha higher, or lower?
A43. Higher. You want to give yourself as good a chance as possible to reject the null,
under these conditions.
Q44. True or false: If you report the p-value itself, rather than saying that p<.05, you in a
sense let the readers of your journal set their own alpha, i.e. make their own decision as to
whether they want to reject the null hypothesis given the p-value you report.
A44. True.
Q45. True or false: P-values slightly over .05 should not be considered statistically
significant.
A45. False. Where we draw the cutoff depends on the stakes of the various outcomes of
the decision we are trying to make.
Q46. Suppose we test a drug with a very large number of subjects. We find that on a
60-point rating scale, the drug group has a mean depression score rating that is 2 points
lower than the placebo group. The p-value is .03. Someone is likely to say that the
difference is ____ significant but not ____ significant.
A46. Statistically, practically
Q47. A researcher designs a study, gathers data, punches the data into the computer, runs
a significance test, and interprets the result based on the significance test. What important
step is being left out? Please give one reason why this step is important.
A47. Inspecting the data from graphs. This is important because 1) outliers can have big
effects on significance tests, 2) you want to see how closely the data follow a normal
distribution, especially with small samples; and 3) you want to see if an effect is visible
in the plots; if not, perhaps the effect is not practically significant.
Q48. Suppose you are trying to find the effect of lighting levels on worker productivity.
But your design has confounded lighting level with the effect of workers being observed
and monitored; the effect of monitoring gives a false impression that lighting makes a
difference. This situation would be referred to as the ____ effect.
A48. Hawthorne
Q49. If there are bad design problems, is it likely, or unlikely, that sophisticated
inferential statistical analysis techniques can get around these problems to produce valid
inferences?
A49. Unlikely.
Q50. Should an ethical statistician attempt to produce inferential statistics on a data set
that did not involve random sampling or random assignment?
A50. Definitely yes. Much data which was not produced by random methods can
nonetheless be treated as if it were, to produce very useful results.
Q51. Someone is looking for drugs or dietary factors that could prevent Parkinson's
Disease. The researcher gives a questionnaire to a random sample of people with the
disease and of people without it. The researcher looks at the mean intake of each of 100
substances for people with and without Parkinson's, and tests each one to see if there is a
significant difference in intake between the two groups. The researcher comes out with
about 5 substances that are significantly different between the two groups, at the .05
level. The researcher recommends that people adjust their intakes accordingly. Do you
have a problem with this? If so, what?
A51. Yes -- this is the multiple comparison problem. Out of 100 hypothesis tests you
would expect about 5 to come out significant by chance alone. It could be that the effects
are real, but it is very possible that they have happened by chance alone.
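The arithmetic behind A51 can be sketched as follows. It assumes, for illustration only, that all 100 null hypotheses are true and that the 100 tests are independent; the second assumption is a simplification.

```python
alpha = 0.05
tests = 100

# Expected number of "significant" results if every null hypothesis is true
expected_false_positives = tests * alpha

# Probability that at least one test comes out significant by chance alone
# (assumes the tests are independent)
p_at_least_one = 1 - (1 - alpha) ** tests

print(expected_false_positives, round(p_at_least_one, 3))
```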
Q52. Our textbook agrees with your instructor, that (confidence intervals, tests of
significance) are underutilized, whereas _______(same options) are perhaps overutilized.
A52. confidence intervals, significance tests
Q53. Our book points out that "many tests run at once will probably produce some
significant results by chance alone." What does your logical reasoning tell you about
many tests run sequentially, each testing one association, in a separate study? (The
answer to this isn't specifically discussed in our book, but tell what you think.)
A53. Logically, the same problem holds. And if people choose to publish the significant
findings and relegate the non-significant ones to the "file drawer," there is a distortion of
the truth just as there can be from the multiple comparison problem.
Q54. What is acceptance sampling?
A54. Sampling a subset of a product and using statistical methods to decide whether to
accept or throw out the whole batch on the basis of the sample.
Q55. When we are thinking of statistical inference as making a decision, as in acceptance
sampling, what is different about the status of the null hypothesis and the alternative
hypothesis from null hypothesis significance testing?
A55. There's no special status allocated to the null hypothesis; there is a null hypothesis
and an alternative hypothesis that we are trying to decide between.
Q56. Suppose the null hypothesis is that a drug has no effect. If this is true and our
analysis yields the decision that the drug is effective, that is a _____ error; if the drug has
an effect but our analysis concludes that there is no effect, that is a ______ error.
A56. Type 1, Type 2
Q57. If we set alpha at .01, what is the probability of a type 1 error given that the null
hypothesis is true?
A57. The probability is .01.
Q58. Somebody does a statistical test and rejects the null with p less than the preset alpha
of .05. Someone else says, "There is a 5% chance that your result is a type 1 error." Is
this correct?
A58. Despite the sentence in our text on page 596 that reads, "The significance level of
any fixed level test is the probability of a Type 1 error," the answer to this is no. The 5%
probability is a conditional probability: the probability of rejecting the null GIVEN that
the null is true. The person who made the statement above is mistakenly thinking that
"Given that the null is rejected, there is a 5% chance that the null is rejected falsely."
To understand better why such reasoning is wrong: suppose we are testing whether
running really fast raises the heart rate of previously resting college students. We set an
alpha of .05 when comparing the resting heart rates and the immediate post-running heart
rates. When we find a huge effect of our intervention, what is the chance that it's a type 1
error? Approximately zero, because it's not an error at all.
On the other hand, if we are testing the effects of a drug with absolutely no effect,
versus placebo, then if we rejected the null, the chance that we had made a type 1 error
would be 100%; before doing the test, our probability of making a type 1 error would be
5%.
Q59. Suppose someone says, "I'm testing whether a certain mean equals 0 -- that's my
null hypothesis. I'm setting alpha at .05. I have a sample size of 50. The population
standard deviation, we're assuming, is known to be 1. Can you please calculate a
probability of a type 2 error for me? (Or the "power" of my test, which is 1 minus the
probability of a type 2 error, i.e. the probability of rejecting a false null?)"
Can you calculate this, or is there something missing, and if something's missing,
what is it?
A59. The probability of rejecting a false null depends upon "how false" the null is. If the
true mean of the population the researcher is drawing from is really 100, the probability
of rejecting the false null is very close to 1. If the population mean is really 0.000001, the
probability of rejecting the false null is very close to 0. So what's missing is a
hypothesized mean that represents "how false" the null is.
Q60. Suppose you are given a hypothesized true mean, a null hypothesis mean, an alpha,
a sample size, and a population sd, and the right to assume a normal distribution of the
mean you obtain. How do you calculate the probability of a type 2 error?
A60. You figure out the region where the null hypothesis is not rejected, which is, for a
.05 alpha and a two-sided test, within 1.96 standard deviations of the sample mean
(sigma/sqrt(n)) on either side of the null hypothesized mean. Then you use the normal
cumulative distribution function for the distribution N(mu, sigma/sqrt(n)), where mu is the
hypothesized true mean and sigma is the population sd, to find the probability that the
sample mean will fall within the interval you've calculated.
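Here is a minimal sketch of that calculation, using the numbers from Q59 (null mean 0, alpha .05, n = 50, sigma = 1). The true mean of 0.5 is an assumed value supplied only to make the example concrete, and the function name is made up for this illustration. The spread used throughout is the standard deviation of the sample mean, sigma/sqrt(n).

```python
from math import sqrt
from statistics import NormalDist

def type2_error_two_sided(mu0, mu_true, sigma, n, alpha=0.05):
    """Probability of failing to reject H0: mu = mu0 when the mean is really mu_true."""
    se = sigma / sqrt(n)                           # sd of the sample mean
    z_star = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = .05
    lower = mu0 - z_star * se                      # H0 is not rejected when the
    upper = mu0 + z_star * se                      # sample mean lands in [lower, upper]
    sampling_dist = NormalDist(mu_true, se)        # distribution of x-bar under mu_true
    return sampling_dist.cdf(upper) - sampling_dist.cdf(lower)

beta = type2_error_two_sided(mu0=0, mu_true=0.5, sigma=1, n=50)  # mu_true is assumed
power = 1 - beta
print(beta, power)
```

Changing mu_true to 100 or to 0.000001 reproduces the two extremes described in A59.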
Q61. The probability that a fixed level alpha significance test will reject the null when a
particular alternative value of the parameter is true is called the _____ of the test against
that alternative.
A61. Power
Q62. The power of a test is what function of the probability of a type 2 error?
A62. 1 minus prob of type 2 error.
Q63. True or false: the p-value tells what would happen if we tested many samples, when
the null hypothesis is true; the power tells what would happen if we tested many samples,
when a particular alternative hypothesis is true.
A63. True.
Q64. When the alternative hypothesis parameter is very close to that of the null hypothesis,
the power tends to be (lower than, higher than, the same as) the power when the
alternative hypothesis parameter is very far from that of the null hypothesis.
A64. Lower than.
Q65. What happens to power as you increase sample size, all other things equal?
A65. It increases.
Q67. What happens to power as you increase the diversity of the population you are
looking at, i.e. increase the population standard deviation of whatever parameter you are
estimating?
A67. Power goes down.
Q68. What happens to power as you increase the precision and accuracy of your
measuring devices?
A68. Power goes up, because the more accurately the phenomenon is measured, the
smaller the standard deviation of the measurements becomes.
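The answers to Q64 through Q68 can be checked numerically. The sketch below assumes a two-sided z test with known sigma; all the means, standard deviations, and sample sizes are hypothetical values picked for illustration, and the function name is invented here.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sided_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of a two-sided z test of H0: mu = mu0 against the alternative mu_true."""
    se = sigma / sqrt(n)
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    dist = NormalDist(mu_true, se)
    prob_not_rejected = dist.cdf(mu0 + z_star * se) - dist.cdf(mu0 - z_star * se)
    return 1 - prob_not_rejected

base = power_two_sided_z(0, 0.3, 1, 50)                # reference case
closer_alternative = power_two_sided_z(0, 0.1, 1, 50)  # alternative nearer the null
bigger_sample = power_two_sided_z(0, 0.3, 1, 200)      # larger n
bigger_sd = power_two_sided_z(0, 0.3, 2, 50)           # more variable population

print(closer_alternative, base, bigger_sample, bigger_sd)
```

Reducing measurement error acts like shrinking sigma, so it moves power in the opposite direction from the bigger_sd case.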
Q69. If too many tests of a hypothesis have insufficient power, then what happens: (a
false null tends to become believed to be true, or a true null tends to become believed to
be false)?
A69. The false null tends to become accepted as true. This is because study after study
fails to reject the null.
YMS Chapter 11: Inference for Distributions (T-Tests of Means)
Q1. What two conditions does our text list for inference about means when the
population standard deviation is not known?
A1. That the data are a SRS from the population of interest, and that the observations
from the population have a normal distribution.
Q2. The sample standard deviation, divided by the sqrt of the n of observations in the
sample, is called the s_____ e_____ of the sample mean.
A2. Standard error
Q3. When the standard deviation of any statistic is estimated from the data, the result is
called the ____ ____ of that statistic. (Thus you can have these that apply not just to the
sample mean.)
A3. standard error
Q4. Does it make sense to speak of the standard deviation of the population mean? If not,
why not?
A4. A population parameter is a single number, not a random variable. As such, it doesn't
have a variance or standard deviation. Sigma/sqrt(n) gives the standard deviation of the
sample mean, and s/sqrt(n) gives a less accurate estimate of the standard deviation of the
sample mean.
Q5. The z-statistic is the (x-bar-mu)/(sigma/sqrt(n)). What is the t-statistic?
A5. t=(x-bar-mu)/(s/sqrt(n)) where s is the sample standard deviation.
Q6. There is just one standard normal distribution. Is there just one t-distribution?
A6. No; there is a t-distribution for each number of degrees of freedom of the statistic,
where the degrees of freedom in dealing with means is n-1.
Q7. What is the general shape of the t-distribution?
A7. Bell-shaped, similar to the normal.
Q8. As the degrees of freedom increase, the shape of the t-distribution more and more
closely approximates what?
A8. The standard normal distribution.
Q9. Can you please explain the reason for the way the shape of the t-distribution differs
from that of the normal when the degrees of freedom are low?
A9. The distribution is more spread out, less peaked, with less probability in the center
and more in the tails -- in other words, it has more variation. This is because estimating
sigma by s rather than knowing sigma for sure adds more variation to the statistic.
Q10. What's the expression for the level C confidence interval for the population mean
(mu), using the t distribution to estimate when the population standard deviation is
unknown?
A10. The confidence interval is x-bar +- t* (s/sqrt(n)) where t* is the upper (1-C)/2 critical
value for the t distribution with n-1 degrees of freedom. (And s is the sample standard
deviation, and n is the sample size.)
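As a small illustration of that interval, the sketch below uses a made-up sample of 10 measurements. The critical value 2.262 is the table value for a 95% interval with 9 degrees of freedom (upper .025 critical value); it is typed in by hand because Python's standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical data, invented for this example.
sample = [12.1, 11.4, 13.0, 12.6, 11.9, 12.3, 12.8, 11.7, 12.2, 12.0]

n = len(sample)
x_bar = mean(sample)
s = stdev(sample)               # sample standard deviation (divides by n-1)
standard_error = s / sqrt(n)    # estimated sd of the sample mean

t_star = 2.262                  # table value: t(9), 95% confidence
margin = t_star * standard_error
ci = (x_bar - margin, x_bar + margin)

print(x_bar, standard_error, ci)
```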
Q11. How do you construct the t statistic and use it to test a hypothesis about a
hypothesized population mean mu-zero?
A11. The t statistic is (x-bar - mu-zero)/(s/sqrt(n)). Once you compute that statistic, you
look in a table for the area in the t(n-1) distribution that is >= that value, <= that value, or
twice the area >= the absolute value of that value, depending on whether your alternative
hypothesis is that the mean is greater than mu-zero, less than mu-zero, or not equal to
mu-zero, respectively.
Q12. Suppose you make from the subjects who have signed up for your study a set of
pairs who are as similar as you can make them. Then you randomly assign one from each
pair to get a treatment, and one to get placebo or no-treatment. What sort of design is
this?
A12. A matched pairs design.
Q13. If in a matched pairs design, you simply measure the post-treatment outcome for
each subject and want to know if the treated subjects do better than the untreated subjects,
how does our text suggest conducting that test?
A13. By creating a variable which is the difference between treated and untreated for
each pair, and then using the one-sample t procedure to either get a confidence interval
for the mean difference or to do a hypothesis test on the mean difference.
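A sketch of the matched pairs procedure, with invented outcome scores for 8 pairs. The null hypothesis assumed here is that the mean difference is 0; the P-value would then come from the t(7) table or from software.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical post-treatment outcomes; position i in each list is one matched pair.
treated = [14, 18, 11, 16, 15, 19, 13, 17]
untreated = [12, 17, 11, 13, 14, 16, 12, 15]

# The new variable: one difference per pair.
differences = [t - u for t, u in zip(treated, untreated)]

n = len(differences)
d_bar = mean(differences)
s_d = stdev(differences)

# One-sample t statistic on the differences, testing a mean difference of 0.
t_stat = (d_bar - 0) / (s_d / sqrt(n))
df = n - 1

print(differences, d_bar, t_stat, df)
```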
Q14. When you are using a matched pairs design as suggested, what population parameter
are you estimating?
A14. The mean difference between pairs for all matched pairs in the population.
Q15. What is a "robust" procedure?
A15. One where the accuracy of a confidence interval or significance test is not very
greatly affected by violation of the assumptions.
Q16. The t procedures are very robust against (nonnormality of the population, outliers)
but not very robust against (nonnormality of the population, outliers).
A16. nonnormality of the population, outliers
Q17. The main reason why the t procedures are robust against the nonnormality of the
population is what theorem?
A17. The central limit theorem, i.e. that sample means become more nearly normally
distributed as the sample size gets larger, even when the population does not have a
normal distribution.
Q18. The rules of thumb for sample size and t procedures are that with sample size less
than ____, you can use t if the data are close to normal and there are no outliers; with
sample size ___ to ____, you can use t unless there are outliers or strong skewness, and
with sample size over ____, you can use t procedures even for clearly skewed
distributions.
A18. 15, 15 to 40, 40.
Q19. When calculating the power of a t-test, (as when calculating the power of a z-test),
what two steps do you go through?
A19. First, write the rule for rejecting H0 in terms of x-bar. (This is done by writing the
formula for t, plugging in the critical t value, sample size, and null hypothesis
mean, and solving for x-bar.) Second, you compute the probability of rejecting H0 given
that the alternative is true. (This is done by standardizing x-bar relative to the alternative,
assuming that sigma is equal to s, and using the normal distribution to compute the
probability that this mean will fall in the rejection region computed by the first step.)
Q20. In power calculations, and in explaining them, there's a tricky distinction between
the "alternative hypothesis" for the test, and the "alternative" that is assumed for the sake
of power calculations. Please clear up any confusion on this by giving an example. (The
example on page 640 will do fine.)
A20. Suppose you are hoping that the listening score of 20 subjects will improve by 2
points, and that your test can detect such an improvement. Your null hypothesis would be
a mean improvement score of zero, tested against an Ha, "alternative hypothesis," that the
mean is greater than zero (or, if a two-sided test is desired, that the mean is not equal to
zero). But the "alternative" that you use in your power calculations is that the mean
change is really 2 points. So the "alternative hypothesis" for the test is that the mean is
greater than 0, and the "alternative" for the power calculation is that the mean is equal to
2.
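The two steps from A19 can be sketched with the numbers in A20: 20 subjects, a null mean of 0, and a power-calculation alternative of 2. Two values below are assumptions that do not appear above: the guess of 3 for the standard deviation, and the one-sided .05 critical value 1.729 for t(19), which is read from a t table.

```python
from math import sqrt
from statistics import NormalDist

n = 20
mu0 = 0.0            # null hypothesis mean
mu_alt = 2.0         # the "alternative" used for the power calculation
sigma_guess = 3.0    # ASSUMED value standing in for s
t_star = 1.729       # table value: upper .05 critical value of t(19)

se = sigma_guess / sqrt(n)

# Step 1: the rule for rejecting H0, written in terms of x-bar.
x_bar_cutoff = mu0 + t_star * se

# Step 2: the probability that x-bar exceeds the cutoff when the mean is really mu_alt,
# using the normal distribution with sigma taken equal to the guess.
power = 1 - NormalDist(mu_alt, se).cdf(x_bar_cutoff)

print(x_bar_cutoff, power)
```

Note that mu0 enters only in step 1 and mu_alt only in step 2, which is the distinction A20 is drawing.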
Q21. One researcher randomly samples 2 groups from a population, and gives training to
one and not the other. The researcher uses a t procedure to compare the test scores of the
two groups. Another researcher samples a group from the population, and gives a test to
the group 2 times, once before training and once after. The researcher uses a t procedure
to compare the results after training with those before training. How are these two situations
different, and what different statistical procedures should they result in?
A21. In the first case, the samples are independent of one another, and in the second, they
are not. So in the first case, you use a 2 sample t to study the difference in the means. In
the second case, you create a new variable, the postscore-prescore, and use a 1-sample t
to study the mean of the differences.
Q22. Has it always been an assumption or condition, for every inference procedure we
have studied, (as contrasted to the exploratory analysis procedures we studied) that the
sample can be considered a SRS of the population of interest?
A22. Yes. (This is despite the fact that the overwhelming majority of inferential statistics
reported in research are NOT based on SRS's of populations of interest!)
Q23. Suppose someone were to draw many pairs of samples from two populations, and
compute the difference between the sample means for each pair. What would the mean of
this difference approach as the number of samples drawn approached infinity?
A23. The difference in population means.
Q24. The fact that the mean of the difference in sample means approaches the difference
in population means as the number of samples gets larger is a long way of saying that the
difference in sample means is an ____ estimator of the difference in population means.
A24. unbiased
Q25. True or false: just as the difference in sample means estimates the difference in
population means, the difference in sample standard deviations estimates the population
standard deviation of the difference between two means.
A25. This is triply false! First, what you would combine would be variances, not
standard deviations. Second, to find the variance of the difference between two random
variables you add the variances; you don't subtract them. Third, the sample variances
would have to be divided by n to estimate the variance of the sample mean.
Q26. True or false: the variance of the difference between two sample means is
estimated by s1^2/n1 + s2^2/n2, where s1 and s2 are the sample standard deviations (and
thus s1^2 and s2^2 are the sample variances) and where n1 and n2 are the sample sizes.
A26. True.
Q27. When the standard deviations of the two populations you are sampling from are
different, why does the difference of the means of two independent samples not exactly
follow the t-distribution?
A27. Because there are two population standard deviations replaced by the sample
standard deviations in the formula, not just one.
Q28. When the sample sizes for the two samples are different, how many degrees of
freedom do you use for t procedures? Please mention two options.
A28. One option is to use n-1, where n is the smaller of the two samples. The other (and
the one almost always used in research) is to let the computer program calculate a
noninteger degrees of freedom according to a more complicated formula to be presented
later, which does not need to be memorized!
Q29. Can you explain in words how you form a confidence interval for the difference
between two means?
A29. You take the difference between the sample means, and add and subtract the margin
of error, which is the t* multiplied by the estimated standard deviation of the difference.
The t* is the upper (1-C)/2 critical value. The estimated standard deviation of the
difference is the square root of the sum of the estimated variances of each group mean.
The estimated variance of a group mean is the sample variance over the n for that group.
Q30. Can you explain in words how you do a hypothesis test of the null hypothesis that
two group means are equal?
A30. You compute the t statistic by putting the difference in sample means in the
numerator, and the estimated standard deviation of the difference in the denominator.
You get the P-value that corresponds to that t.
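The pieces described in A26 through A30 fit together as in the sketch below. The summary statistics are invented. The noninteger degrees of freedom are computed with the Welch-Satterthwaite approximation, which is the formula software commonly uses; treating it as the "more complicated formula" the text has in mind is an assumption on my part.

```python
from math import sqrt

def two_sample_t(x_bar1, s1, n1, x_bar2, s2, n2):
    """Two-sample t statistic (unequal variances) with two choices of df."""
    v1 = s1 ** 2 / n1                   # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2                   # estimated variance of the second sample mean
    se_diff = sqrt(v1 + v2)             # variances add, then take the square root
    t_stat = (x_bar1 - x_bar2) / se_diff
    df_conservative = min(n1, n2) - 1   # option 1: smaller n, minus 1
    # Option 2: Welch-Satterthwaite approximation (usually not an integer).
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, se_diff, df_conservative, df_welch

# Hypothetical summary statistics for two independent samples.
t_stat, se_diff, df_conservative, df_welch = two_sample_t(10.0, 2.0, 16, 8.0, 3.0, 9)
print(t_stat, se_diff, df_conservative, df_welch)
```

For a confidence interval, the same se_diff is multiplied by t* and added to and subtracted from the difference in sample means.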
Q31. The guidelines for sample sizes with one-sample t procedures are also applicable to
two-sample t tests, if you substitute for the phrase "sample size" the phrase ____.
A31. Sum of the sample sizes.
Q32. For using t procedures with means of independent samples, an excellent
approximation is achieved by using the t distribution with a not-necessarily-integer
degrees of freedom computed by the formula on p. 659. This approximation is quite
accurate when the sample size of both samples is what?
A32. Five or larger.
Q33. When software gives you a choice between assuming, or not assuming, equal
variances for the two populations whose means you are comparing with a t procedure,
which choice should you generally make?
A33. Not to assume equal variances, because this assumption is very difficult to check.
YMS Chapter 12: Inference for Proportions
Q1. The statistic that estimates (in an unbiased way) the population proportion is ____.
A1. The sample proportion.
Q2. What is the standard deviation of the sample proportion (provided the population is
at least 10 times as big as the sample)?
A2. sqrt(pq/n) where p is the population proportion, q is 1-p, and n is the sample size.
Q3. If np and nq are at least 10, then we can treat the distribution of p-hat as
approximately what?
A3. Normal.
Q4. Just for review: if asked for the standard deviation of the population proportion, what
would you say?
A4. Population parameters do not have standard deviations, because they are not random
variables. The population proportion is a single number. For example, if a certain
proportion of the population of the U.S. is left-handed, there is no variance in that
number.
Q5. Just for review: what does it mean to "standardize" any statistic?
A5. To subtract the mean and divide by the standard deviation.
Q6. When you standardize p-hat, what do you subtract and divide by?
A6. You subtract the population proportion and divide by the population standard
deviation, which is sqrt(pq/n).
Q7. What familiar ploy do we use to cope with not knowing the population proportion, in
creating a standardized p-hat?
A7. We use the familiar ploy of substituting the sample statistic for the population
parameter. So we use the standard error, sqrt(phat*qhat/n), instead of the standard
deviation of the proportion.
Q8. What is the expression for a confidence interval around the sample proportion?
A8. The confidence interval is phat +- z*sqrt(phat*qhat/n). This fits the format of
estimate+- z* SEestimate for any normally distributed estimator.
Q9. Just for review: what would you say if asked to do a hypothesis test about whether a
sample proportion was equal to a certain number?
A9. Hypothesis tests, and statistical inference in general, have to do with what population
parameters are and not what sample statistics are.
Q10. How do you do a hypothesis test that a population proportion is equal to a certain
number?
A10. You form a z statistic with phat - p0 in the numerator, and sqrt(p0*q0/n) in the
denominator. You then get a p value associated with that z statistic, from the standard
normal table or the normal cdf function of a calculator. (Note that when doing a
hypothesis test, you use the hypothesized proportion, p0, for the estimate of the standard
deviation; when doing a confidence interval, you use the sample proportion, phat.)
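The note at the end of A10 is easy to see in code: the test uses p0 in the standard deviation, and the interval uses phat. The counts (116 successes in 200 trials) and the null value of .5 are made up for this sketch, and the test is assumed to be two-sided.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical data and null value.
successes, n = 116, 200
p0 = 0.5
p_hat = successes / n

# Hypothesis test: the standard deviation uses the hypothesized proportion p0.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided P-value

# Confidence interval: the standard error uses the sample proportion p_hat.
z_star = NormalDist().inv_cdf(0.975)           # 95% confidence
se = sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - z_star * se, p_hat + z_star * se)

print(z, p_value, ci)
```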
Q11. Given a certain margin of error, for example +- .03, how do you determine how big
a sample size you need to get a margin of error that small or smaller?
A11. You set the expression for the margin of error less than or equal to the desired
margin m, namely z* sqrt(p*q*/n) <= m, and solve for n; z* is the upper (1-C)/2 critical
value of z, p* is the estimated proportion of successes, and q* is 1-p*.
Q12. How do you solve for sample size needed in a study of proportion, without knowing
the proportion you are looking for in the first place? Please mention two strategies.
A12. If you can make an informed guess, you can do that. If your guess is a range, use
both ends of the range. If you have no information, p*=.5 yields the biggest margin of
error and thus the biggest and most conservative estimate of the necessary sample size.
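Solving z* sqrt(p*q*/n) <= m for n gives n >= (z*/m)^2 p*q*, rounded up to a whole number. The sketch below applies this to the +- .03 margin from Q11; the 95% confidence level and the alternative guess of .3 are assumptions added for illustration, and the function name is invented.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_margin(margin, p_guess=0.5, confidence=0.95):
    """Smallest n with z* * sqrt(p*q*/n) <= margin, for the guessed proportion."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))

conservative_n = sample_size_for_margin(0.03)           # p* = .5, the worst case
informed_n = sample_size_for_margin(0.03, p_guess=0.3)  # assumed informed guess of .3

print(conservative_n, informed_n)
```

With p* = .5 the required sample is 1068, which is larger than the requirement for any other guess.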
Q13. If the guessed proportion is anywhere between .3 and .7, you should as a rule of
thumb use what as the p* or estimated proportion of successes, for your calculations on
determining the necessary sample size?
A13. .5.
Q14. Suppose we give a cancer drug to one group and not to another group, and look at
the difference in proportions of people who survive for five years in these two conditions.
What parameter are we trying to estimate, and what statistic do we use to estimate it?
A14. The parameter is the difference between the population proportions of survivors for
the two conditions. The statistic is the difference in sample proportions.
Q15. How do you find the standard deviation of the difference in sample proportions?
A15. The variance of the difference is the sum of the variances of the individual
proportions. So the sd of the difference is sqrt (p1q1/n1 + p2q2/n2). And because you
don't know the p's and q's for this expression, you substitute the sample statistics for the
population parameters (the usual ploy).
Q16. What is the expression for the confidence interval for the difference of two
proportions?
A16. It's the estimate +- the margin of error, or:
the difference in sample proportions, +- z*(sqrt(p1q1/n1 + p2q2/n2)).
Q17. What are the conditions for the confidence interval for a difference in proportions?
A17. Population 10 or more times the sample; all 4 np and nq quantities are 5 or more.
Q18. In doing a hypothesis test for the difference of two proportions, we compute a z
statistic. What, in general terms (that is without going into the specific formula) is in the
numerator and the denominator of this statistic?
A18. The numerator is the difference in sample proportions. The denominator is an
estimate of the standard deviation of the difference of sample proportions (a.k.a. the
standard error of the difference).
Q19. In the special case where the null hypothesis is that the two proportions are equal, in
other words the difference between them is 0, what do we do differently because of this?
A19. In this case, under the null hypothesis, there is only one population p rather than a
p1 and a p2. So we can substitute p for p1 and p2 in the expression we used for the
standard deviation of the difference, and it simplifies some. To estimate that p, we use the
pooled p-hat, which is the total successes over the total trials for both samples combined.
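The contrast between the interval (A15 and A16) and the test of equal proportions (A19) is sketched below. The survival counts are invented: 60 of 100 survivors with the drug and 45 of 100 without. The test is assumed to be two-sided.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical counts: survivors and group sizes for the two conditions.
x1, n1 = 60, 100
x2, n2 = 45, 100
p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2

# Confidence interval: separate sample proportions in the standard error.
se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_star = NormalDist().inv_cdf(0.975)   # 95% confidence
ci = (diff - z_star * se_unpooled, diff + z_star * se_unpooled)

# Test of H0: p1 = p2. Under H0 there is one p, estimated by the pooled p-hat.
p_pooled = (x1 + x2) / (n1 + n2)       # total successes over total trials
se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = diff / se_pooled
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(ci, p_pooled, z, p_value)
```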
YMS Chapter 13: Inference for Tables: Chi-Square Procedures
Q1. Suppose that you are dealing with a situation where there are several possible
outcomes, not just 2 (success and failure). You are interested in seeing whether the
proportion of outcomes falling into each of a certain set of categories is consistent with a
certain hypothesized population distribution. What is the name of the test you use?
A1. The chi-square test for goodness of fit.
Q2. Suppose that your hypothesized population distribution for the percent of objects that
are certain colors is 20% black, 50% white, and 30% green. Suppose you draw a sample
of 200, to test this hypothesis. What are the "expected" values that you use when you do
the chi-square goodness of fit test?
A2. 40, 100, and 60.
Q3. Suppose you are doing a goodness of fit test for the distribution hypothesized in Q2.
What is the formula that you use for chi-square?
A3. Chi-square = Summation of (observed-expected)^2/expected. Or
summation of (O-E)^2/E.
Q4. In testing the hypothesis mentioned in Q2, suppose your observed counts are 45, 90,
and 65. What does chi-square equal for this goodness of fit test? Please write a numerical
expression without bothering to calculate the result.
A4. chi-square = (45-40)^2/40 + (90-100)^2/100 + (65-60)^2/60
Q5. Is there just one chi-square distribution, or a family of distributions, with one
distribution for each number of degrees of freedom?
A5. A family, with one distribution for each number of degrees of freedom.
Q6. How do you find the number of degrees of freedom for a chi-square goodness of fit
test? For example, how many degrees of freedom would there be if you were looking at
the proportion of blacks, whites, and greens as in Q2?
A6. The degrees of freedom is one less than the number of categories in the distribution;
for example, when there are blacks, whites, and greens, the number of degrees of freedom
is 3-1=2.
Q7. When you look up in a table or a calculator the P-value associated with a certain
chi-square, what is that the probability of?
A7. The probability of obtaining results as extreme as, or more extreme than, the ones
you obtained, if the hypothesized distribution is true. (Extreme means deviant from what
is expected.)
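The example running through Q2 to Q7 can be worked in a few lines, using the hypothesized percents from Q2 and the observed counts from Q4. One caution about the last step: the formula exp(-x/2) for the right-tail area is a shortcut that holds only for 2 degrees of freedom, which happens to be the case here. For other degrees of freedom you would use a table, a calculator, or statistical software.

```python
from math import exp

hypothesized = [0.20, 0.50, 0.30]   # black, white, green (from Q2)
observed = [45, 90, 65]             # observed counts (from Q4)

n = sum(observed)
expected = [p * n for p in hypothesized]

# Sum of (O - E)^2 / E over the categories.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Right-tail area. This closed form is valid ONLY when df == 2.
p_value = exp(-chi_square / 2)

print(expected, chi_square, df, p_value)
```

The chi-square comes out to about 2.04, with a P-value of roughly .36, so these observed counts are quite consistent with the hypothesized distribution.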
Q8. Is the chi-square distribution symmetrical? If not, in which direction is it skewed?
A8. Skewed to the right.
Q9. When you are doing a chi-square test for goodness of fit, what are the hypothesis H0
and the alternative hypothesis Ha?
A9. The H0 is that the population percents are equal to the set of hypothesized percents.
The Ha is that the population percents do not equal that set of hypothesized percents.
Q10. What are the rule of thumb conditions for the use of the chi-square goodness of fit
test?
A10. All individual expected counts are at least 1 and no more than 20% of the expected
counts are less than 5.
Q11. What's an efficient way of calculating a chi-square on a TI 83 or 84 graphing
calculator? (If you have a TI-89, please tell me how to do the same thing on it.)
A11. Enter the observed counts as a list, say L1, and the expected counts as another list,
say L2. Then define L3 as (L1-L2)^2/L2. On the TI-83 or 84 you do this by putting the
cursor on L3, and then typing in the expression. Then when you hit the enter key, you see
the (observed-expected)^2/expected values in L3. Then you sum the values in L3. You
can do this by hitting (second) List >math>5. sum, and then L3.
Another way of getting the sum of the values in a list is by hitting stat >calc>1.1-var
stats, and then entering L3. Then you look in the output for the summation of x, in the
second line.
Q12. If a chi-square goodness of fit test yields a significant result, what should you
inspect before you interpret the results?
A12. You see which observed counts deviated the most from the expected ones -- in other
words, you see which cells contributed the most to the chi-square that was calculated.
You take these observations into account when interpreting your results.
Q13. Two-way tables describe relationships between two (categorical or continuous?)
variables.
A13. Categorical
Q14. When there are multiple comparisons that can be made, what two steps are often
carried out?
A14. First an overall test for evidence of any differences among the parameters being
compared, and then a follow-up analysis to decide which parameters differ and to
estimate how large the differences are.
Q15. When doing a chi-square test to compare several proportions, the first step is to set
up the table with the numbers in it being (proportions of success and number of trials, or
counts of the number of cases falling into each category).
A15. Counts of the number of cases falling into each category.
Q16. When there are two categorical variables being displayed in an r by c table (with r
rows and c columns), each of the r x c possible categories into which the observations
may fall is called a _____ of the table.
A16. cell
Q17. When we are comparing the proportion of successes for three treatment conditions,
what null hypothesis would we use?
A17. That the proportion of successes is the same among all three conditions, i.e. that
p1=p2=p3.
Q18. When comparing the proportion of successes for three treatment conditions, what
would be the alternative hypothesis?
A18. That not all the proportions are equal.
Q19. In testing H0 via chi-square with a two-way table, we compare the observed counts
with the expected counts. Evidence against H0 consists of observed and expected counts
that are (far from each other, close to each other).
A19. Far from each other.
Q20. How do you compute the expected count in a certain cell of a two-way table?
A20. The expected count is the (row total * column total)/table total.
Q21. The calculation of the expected value for a cell of a two-way table assumes that the
row and column variables are (disjoint, independent).
A21. Independent
Q22. Please explain why the formula for expected cell counts is what it is.
A22. The row total/table total is the probability that a case will fall into a certain row.
The column total is the total number of cases that fall into a certain column. When there
are n trials and a probability p of success on each trial, the expected number of successes is np, if we
assume independent trials. Thus if there are a certain number of cases in the column and
there is a probability row total/table total that each of those cases will fall into the row,
the expected number to fall into both the row and the column is (row total/table total) *
column total, or row total * column total/table total. This assumes that the probability of
a case's being in a certain row is independent of what column it's in.
Q23. When you want to test the statistical significance of the deviation of observed from
expected counts, in a two-way table, using chi-square, how do you compute the
chi-square statistic?
A23. chi-square is the summation of the (observed count - expected count)^2/expected
count. The summation is over all r * c cells of the table.
Q24. Large values of chi-square are evidence for, or against H0? Why?
A24. Against. This is because chi-square will be bigger, the bigger are the deviations of
observed counts from those that would be expected under H0.
Q25. How many degrees of freedom do you have in a chi-square test with an r * c
two-way table?
A25. (r-1)(c-1)
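Here is a sketch that puts Q20 through Q25 together for a made-up 2 by 3 table of counts. The expected counts use row total * column total/table total, the statistic sums over all cells, and the degrees of freedom are (r-1)(c-1). As a convenience, the table was chosen to have 2 degrees of freedom so that the right-tail area can be computed as exp(-x/2); that shortcut does not apply to other degrees of freedom.

```python
from math import exp

# Hypothetical observed counts: 2 rows (groups) by 3 columns (outcomes).
observed = [[30, 20, 10],
            [20, 30, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)

# Expected count for each cell, assuming rows and columns are independent.
expected = [[r * c / table_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Right-tail area; this closed form is valid ONLY because df == 2 here.
p_value = exp(-chi_square / 2)

print(expected, chi_square, df, p_value)
```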
Q26. True or false: when doing chi-square tests, the p-value is always the area under the
distribution curve that is to the right of the observed chi-square, and never the area to the
left.
A26. True. For the chi-square distribution, the farther you go to the right, the more you
have deviated from the null hypothesis. The value most consistent with the null
hypothesis is 0, which is the left end of the domain for the function. To get the
probability of results as deviant as, or more deviant than, the obtained results, you look at
the probability under the curve to the right of the obtained results. (This includes the
probability exactly at the obtained results, but since chi-square is a continuous distribution,
the distinction between "above" and "at or above" is not meaningful.)
Q27. What cell counts are required for doing a chi-square test for homogeneity of
populations?
A27. The same as for tests of goodness of fit: all expected counts are 1 or greater, and no
more than 20% of the expected counts are less than 5.
Q28. In the special case of a two-by-two table (r=2 and c=2), how many cell counts need
to be 5 or greater in order to do a chi-square?
A28. All four of them.
Q29. How many degrees of freedom would be used for a 3 by 2 table?
A29. (3-1)*(2-1)=2
Q30. Just for review: what are the four steps for the "inference toolbox" that you use for
doing chi-square tests as well as any other hypothesis tests?
A30. 1) Identify populations of interest; state hypotheses in words and symbols. 2)
Choose the appropriate inference procedure and verify the conditions for its use. 3) Carry
out the procedure. 4) Interpret your results in the context of the problem.
Q31. After having done an overall test rejecting the hypothesis that all the proportions are
equal, what should be done?
A31. A follow-up analysis that asks which cells most contribute to the deviation from
expectations under the null hypothesis. You can do this informally by observation; there
are more formal methods that do significance tests and confidence intervals for the
individual proportions.
Q32. True or false: the chi-square tests the hypothesis that "the row and column variables
are not related to each other," even when it is difficult to conceive of the groups defined
by the rows and columns as different populations, i.e. when you are dealing with the
relation of some variables in one population.
A32. True.
Q33. True or false: for a chi-square test of association/independence of variables, you
compute the expected counts just as in the other situations: the row total * column total/
table total.
A33. True.
Q34. True or false: converting table entries to percents is not necessary for the
computation of chi-square, but it does help to shed light on the association among the
variables.
A34. True.
Q35. For a chi-square test of association/independence of variables, what is the null
hypothesis?
A35. That the variables are independent, or that there is no association between them.
Q36. True or false: the distinction between tests of homogeneity of populations and tests
of association/independence is that in the first, there is a sample from each of two or
more populations, and in the second, there is a single sample from a single population.
A36. True. (However, distinguishing whether there is one or more than one population
involved in a study can be a debatable procedure. If you collect a sample of people, some
of whom are wealthy and some of whom are poor, can you argue that you have sampled
some individuals from the population of poor people and some from the population of
rich people? Or have you drawn from one population of people, who simply differ in one
variable? Fortunately, the chi-square test is done in the same way regardless of the
outcome of such a debate.)
Q37. When there is a two-by-two table, and you wish to compare two proportions, how
will a two-sided z test for equality of proportions and a chi-square test compare with
respect to the p values that result?
A37.The same p values will result.
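Note: a minimal Python sketch of why the p values agree, using made-up counts. The chi-square statistic for a two-by-two table equals the square of the pooled two-proportion z statistic. The p values are computed with math.erfc, which is one convenient way to get normal tail areas without a statistics package.

```python
import math

# Hypothetical counts: successes and sample sizes in two groups.
x1, n1 = 150, 500
x2, n2 = 50, 500

p1, p2 = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)

# Two-sided z test for equality of proportions (pooled standard error).
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value_z = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area

# Chi-square test on the same two-by-two table.
observed = [[x1, n1 - x1], [x2, n2 - x2]]
row_totals = [n1, n2]
col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
total = n1 + n2
chi_square = 0.0
for i in range(2):
    for j in range(2):
        expected = row_totals[i] * col_totals[j] / total
        chi_square += (observed[i][j] - expected) ** 2 / expected

# With 1 degree of freedom, the chi-square upper tail area equals
# the two-sided normal tail area at sqrt(chi_square).
p_value_chi = math.erfc(math.sqrt(chi_square / 2))

print(z ** 2, chi_square)
print(p_value_z, p_value_chi)
```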
Q38.If there is a two-by-two table and you wish to compare two proportions, which test
is usually recommended, between a z test and a chi-square, and why?
A38.The z test has the advantages that it is related to a confidence interval for the
difference in proportions, plus you can do a one-sided test if desired.
YMS Chapter 14: Inference for Regression
Q1. To review: we use least-squares regression to study the relation between a couple of
variables, both of which are (quantitative, categorical).
A1. Quantitative
Q2. Before doing regressions to study the relationship between two quantitative
variables, we should explore the data by examining a _______.
A2. Scatterplot.
Q3. The statistic that describes the strength of a linear relationship, that is the same
whichever variable is thought of as the explanatory variable, and which has a familiar
relationship to the percent of variance in one variable explained by the other, is the
______ ______.
A3. correlation coefficient (or just, the correlation)
Q4. The formulas for generating a slope and intercept to form a regression line are set up
so as to minimize what quantity?
A4. The sum of the squared residuals or errors.
Q5. What is a residual?
A5. A residual is the vertical distance between the data point and the regression line, or
y - y-hat.
Q5. The r-squared, which is part of the regression output, tells us how much of what is
accounted for by what?
A5. How much of the variation in the y variable is accounted for by the linear
relationship with x.
Q6. Suppose we draw lots of samples and compute a regression line for each sample.
The slope and intercept of each sample line estimate a true value. Thus the slope and
intercept we obtain from our sample are _____ that estimate population ______.
A6. Statistics, parameters
Q7. One of the conditions for regression inference is that for any fixed value of x, the
response variable y varies according to a _____ distribution.
A7. Normal
Q8. Another assumption for regression inference is that for any fixed value of x, the
repeated responses y are ____ of each other.
A8. Independent.
Q9. Another assumption for regression inference is that the means of the sets of y-values
for each x value have what relationship to the x values?
A9. That the means of the y's for each x are a linear function of x:
mean for y's = alpha + beta * x
Q10. Another assumption for regression inference is that what measure of dispersion is
equal for each value of x?
A10. The standard deviation of the y's for the various x values.
Q11. True or false: the slope and intercept we obtain from the least squares regression for
our sample are unbiased estimators, respectively, of the slope and intercept of the true line
connecting the population means for each of the x's.
A11. True.
Q12. What is the unbiased estimator for the standard deviation of the y values around the
regression line (in other words, the standard deviation of the y values around the means
of each of those values for each x)?
A12. The statistic called s, which is the standard error, or the standard deviation of the
residuals.
Q13. What's the formula for the standard error s obtained from a regression?
A13. The square root of (the sum of squared residuals/(n-2)). Or
s = sqrt((summation of (y - y-hat)^2)/(n-2))
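Note: a minimal Python sketch of the least-squares fit, the residuals, and the standard error s. The five data points are made up for illustration.

```python
import math

# Hypothetical sample of (x, y) pairs.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope b and intercept a.
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b = sxy / sxx
a = y_bar - b * x_bar

# Residual = observed y minus predicted y (y - y-hat).
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# s = sqrt(sum of squared residuals / (n - 2)).
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print(a, b, s)
```

The residuals from a least-squares line always sum to zero (up to rounding), which makes a handy check on the arithmetic.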
Q14. The parameter we are usually most interested in estimating from regression output
is the (slope, y-intercept) of the line.
A14. slope
Q15. How do we put a confidence interval around the slope of a regression line?
A15. Confidence interval = b +/- t*(SEb)
where b is the obtained slope, t* is the upper (1-C)/2 critical value for t with n-2 degrees
of freedom, and SEb is the standard error of the slope.
Q16. When you find the confidence interval for the slope of a regression line, one of the
variables in the equation is t*. Please explain how you would find t* when your sample
size is 40 and your desired level of confidence is 95%.
A16. You would use n-2, or 38, degrees of freedom. You could use an inverse T function
on your calculator to find the t score that corresponds to an upper tail area of .025, which
is (1-C)/2 (or, equivalently, the t that corresponds to a cumulative area of .975). Or you
can look up the result in a table of critical values for t. (If the table gives values for 30
and 40 but not 38, you can interpolate, i.e. get the value that is .8 of the way between the
value for 30 and the value for 40.)
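Note: the table lookup for 38 degrees of freedom can be sketched in Python as follows. The two critical values are the usual t-table entries for an upper tail area of .025; the variable names are only illustrative.

```python
# Critical values from a standard t table (upper tail area .025).
t_df30 = 2.042
t_df40 = 2.021

df = 38  # n - 2 for a sample of size 40

# Go (38 - 30) / (40 - 30) = 0.8 of the way from the df=30 value to the df=40 value.
fraction = (df - 30) / (40 - 30)
t_star = t_df30 + fraction * (t_df40 - t_df30)

print(round(t_star, 4))
```

An inverse T function on a calculator gives a value very close to this one, so either route is adequate for a confidence interval.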
Q17. The standard error of the slope is a function of the s. But it's not really necessary to
memorize the formula for this, because _____.
A17. Regressions are seldom done by hand, and the SEb is part of the regression output
of the calculator or computer program.
Q18. The most commonly tested hypothesis about regressions is that Beta, the
"Population slope," is 0. Can you put this hypothesis in some other phrasings?
A18. That the straight line dependence on x is of no value in predicting y. Or that the
population correlation between x and y is 0. Or that there is no true linear relationship
between x and y in the population.
Q19. If you form the ratio of the slope obtained in your sample to the standard error of
that slope, what is the sampling distribution of that statistic?
A19. It's distributed according to the t distribution, with n-2 degrees of freedom.
Q20. Regression output usually gives a two-sided p value for the hypothesis test that the
population slope is 0. How do you obtain a one-sided p-value for the same hypothesis?
A20. Divide the two-sided p-value by two, provided the sample slope lies in the direction
specified by the alternative hypothesis.
Q21. Someone asks, "The regression equation we obtained from analyzing our data is
y=2 + 10x. What's the 95% confidence interval for the y when x is 8?" The other person
says, "That depends upon what question you want the confidence interval to answer."
What does the second person mean by this?
A21. The two possible questions are: 1) "What is the 95% confidence interval for the
MEAN of the y values when x=8," and 2) "What is the 95% confidence interval for a
single y value when x=8?" In other words, what do we want to predict: the average value
for a group, or a score for a single individual?
Q22. When we are getting a confidence interval for the predicted mean of y's for a given
x value, our confidence interval is (narrower, wider) than when we get a "prediction
interval" for a single individual?
A22. The confidence interval is narrower for predicting the mean than predicting an
individual.
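Note: a minimal Python sketch comparing the two margins of error. It uses the usual standard-error formulas for simple linear regression; all the summary numbers are made up for illustration.

```python
import math

# Hypothetical summary values from a fitted regression.
n = 40          # sample size
s = 2.0         # standard error about the line
x_bar = 5.0     # mean of the x values
sxx = 90.0      # sum of squared deviations of the x values from x_bar
x_star = 8.0    # x value at which we want an interval
t_star = 2.024  # approximate t critical value for 95% confidence, 38 df

# Standard error for estimating the MEAN response at x_star.
se_mean = s * math.sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)

# Standard error for predicting a SINGLE new response at x_star.
# The extra "1 +" reflects the variation of individuals about the mean.
se_individual = s * math.sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / sxx)

margin_mean = t_star * se_mean
margin_individual = t_star * se_individual

print(margin_mean, margin_individual)
```

Because of the extra term, the prediction interval never shrinks below about t* times s, no matter how large the sample is.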
Q23. Suppose that some researchers are testing how long a thermometer takes to come to
equilibrium, given the temperature difference between starting and ending temperature.
They regress the time to equilibrium on the temperature difference. Then the researchers
want to produce a table for a clinician. The table will contain numbers so as to help the
clinician predict the time to wait for a certain individual who is using the thermometer,
such that there is a 95% chance that this individual will not start trying to raise the
temperature before the temperature has equilibrated. Please explain whether the
researchers should use a confidence interval or a prediction interval in constructing their
table.
A23. The researchers are interested in predicting an individual response, not the mean of
a set of responses. The prediction interval is the correct interval for this purpose.
Q24. Please list 4 conditions for inference in regression.
A24. Independent observations, true relationship linear, standard deviation of the
response about the true line is the same everywhere, and the response varies normally
about the true line.
Q25. Is there an additional condition placed upon the construction of a prediction interval
via regression?
A25. The prediction interval assumes normality of the individual observations, not just
the approximate normality of statistics like the slope and intercept.
Q26. Suppose that in a residual plot, the values are close to 0 when x is low, but the
residuals get bigger and bigger in absolute value as the x values get greater. What
condition of regression is violated in this circumstance?
A26. The condition that the standard deviation of the response around the true line is the
same everywhere.
Q27. Someone examines a residual plot and a scatterplot and observes a curvilinear
pattern. What condition of regression is being violated, and what should the researcher
consider doing in order to correct this?
A27. The condition violated is that the true relationship is linear. The researcher should
consider transforming one or more of the variables.
YMS Chapter 15: Analysis of Variance
Q1. Inference about the spread of populations and inference about several means by
one-way analysis of variance are very different goals. But they will both involve the use of
what distribution?
A1. The F-distribution.
Q2. What's one of the big problems with trying to infer whether one population's spread
is greater than another by the F test?
A2. That the test is extremely sensitive to departures from normality, and this situation
does not improve as the sample size gets larger. Thus it's difficult to tell whether a
significant F-value implies unequal population spreads, or nonnormal populations.
Q3. What's a problem with even using the standard deviation as a measure of spread, with
asymmetric distributions?
A3. When distributions are skewed, with unequally spread tails, no single measure of
spread does a good job of describing the spread.
Q4. At one time, it was common to test equality of standard deviations as a preliminary to
performing a pooled two-sample t test. What's now regarded as better practice?
A4. To check the distributions graphically, and to use the version of the two-sample t-test
that does not require equal standard deviations.
Q5. How do you do a significance test of the hypothesis that two population standard
deviations are equal?
A5. You compute the ratio of the larger sample variance (standard deviation squared, not
the standard deviation itself) to the smaller sample variance. This is the F statistic, with
n1-1 and n2-1 degrees of freedom. Then you compare the value of F with critical values for
rejection of the null hypothesis at a given significance level. Or, you use a calculator or
statistical software to arrive at a P-value for this F statistic and these two degrees of
freedom.
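Note: a short Python sketch of forming the F statistic and its degrees of freedom. The standard deviations and sample sizes are made up; getting the P-value still requires a table, calculator, or statistical software.

```python
# Hypothetical sample standard deviations and sample sizes.
s1, n1 = 4.0, 21
s2, n2 = 6.0, 16

var1, var2 = s1 ** 2, s2 ** 2

# Put the larger variance in the numerator so that F is at least 1,
# and keep each degrees of freedom with its own sample.
if var1 >= var2:
    f_stat, df_num, df_den = var1 / var2, n1 - 1, n2 - 1
else:
    f_stat, df_num, df_den = var2 / var1, n2 - 1, n1 - 1

print(f_stat, df_num, df_den)
```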
Q6. Suppose you want a two sided p value once you have computed a one-sided p-value.
What do you do?
A6. Double the one-sided value to get the two-sided p-value.
Q7. If you want to carry out a significance test for the hypothesis that two population
spreads are equal, how do you do it on a TI 83 or 89 calculator?
A7. You use stat>tests>D:2-SampFTest. Then you fill in the blanks for the two standard
deviations and sample sizes, and the calculator does the rest for you. Notice that you
enter the standard deviations, not the variances; the calculator does the squaring for you
in calculating F.
Q8. Now that you've learned how to compare two standard deviations with the F test, is it
recommended that you go out and start using this knowledge?
A8. No, because the sensitivity to normality limits the usefulness of the procedure.
However, understanding the F test is going to help you understand ANOVA, and thus
you haven't wasted your time!
Q9. What is the main objective of one-way analysis of variance?
A9. To compare several group means.
Q10. If we are comparing, say, 10 different treatments, why do we prefer analysis of
variance to using a bunch of t-tests to compare the treatments 2 at a time?
A10. Because in this situation, for example, there are (10*9)/2=45 different pairwise
comparisons, and the results get overwhelming. Also, you run into the multiple
comparisons problem: the more comparisons you make, the more likely you are to get some
significant results by chance alone.
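Note: a small Python sketch of the counting argument. The second calculation assumes, purely for illustration, that the 45 tests are independent and each uses a .05 significance level; real pairwise tests on the same data are not independent, so treat the result as a rough indication only.

```python
import math

treatments = 10
alpha = 0.05

# Number of ways to choose 2 treatments out of 10.
pairs = math.comb(treatments, 2)

# ASSUMPTION: independent tests. Chance that at least one of them comes out
# significant when every null hypothesis is actually true.
chance_of_false_alarm = 1 - (1 - alpha) ** pairs

print(pairs, chance_of_false_alarm)
```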
Q11. What's the usual two-step procedure in doing multiple comparisons?
A11. First do an overall test to see if there is good evidence of any differences among the
parameters. Then do a detailed follow-up to decide which parameters differ and to
estimate how large the differences are.
Q12. In doing an anova with several groups, what are the null hypothesis and the
alternative hypothesis?
A12. The null hypothesis is that all the population means are equal, and the alternative
hypothesis is that not all the means are equal.
Q13. When you look at computer output regarding a one-way anova, where do you look
to see if the null hypothesis was rejected?
A13. You look at the far right column of the output at the p-value. (This is low when the
F-statistic is high, relative to the degrees of freedom.)
Q14. What is the main idea of analysis of variance?
A14. You look at the variation among sample means, compared with the variation within
each sample. The variance within the samples provides an indicator of how much the
means should differ by the randomness of sampling; if the means differ by more than this
chance expectation, you reject the null hypothesis.
Q15. The F statistic for one-way anova is the ratio of what to what?
A15. Variation among the sample means/variation among the individuals in the same
sample.
Q16. When doing an F test, how do you get the numerator and denominator degrees of
freedom?
A16. The numerator degrees of freedom is one less than the number of groups. The
denominator degrees of freedom is the total number of subjects in the study, minus the
number of groups.
Q17. When you are doing a one-way anova, you are thinking about how many different
populations?
A17. One population for each of the groups.
Q18. You assume that each group is derived in what manner from its population, when
doing a one-way anova?
A18. Each group is a SRS of its population.
Q19. What other assumptions or conditions are there for one-way anova, other than that
each group is a SRS of its population?
A19. That each population has a normal distribution, and that all the populations have the
same standard deviation.
Q20. With respect to robustness to violations of the assumption of normality, does anova
more resemble the t-test, which is rather robust, or the F-test for differences in population
standard deviations, which is not very robust?
A20. The T-test, which is rather robust.
Q21. What's a rule of thumb that helps you decide when the violation of the assumption
of equal variances of the populations invalidates the one-way anova?
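A21. The results of the anova F test are approximately correct when the largest sample
standard deviation is no more than twice as big as the smallest one. A larger ratio calls
the procedure into question.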
Q22. The numerator of the ratio used in one-way anova is called the mean square for
groups (elsewhere a.k.a. the mean square "between"). How is this calculated?
A22. You take each of the group means, subtract the overall mean from it, and square
each of those deviations; you multiply each squared deviation by the number in that
group; then you sum these terms, then divide the sum by one less than the number of
groups. In other words,
MSG = (summation of ni*(x-bari - xbar)^2)/(I-1) where ni is the number of subjects in the ith
group, x-bari is the mean of the ith group, xbar is the overall mean, and I is the number of
groups.
Q23. The denominator of the ratio used in one-way anova is called the mean square for
error (elsewhere a.k.a. the "mean square within"). How is this calculated?
A23. You take the variance of each group and multiply it by one less than the number in
that group -- this is the "sum of squares" for the group, i.e. the sum of squared deviations
of the individual observations from their group means. (Alternatively, you can just
compute the sum of the squared deviations from the group means, without converting to a
variance and back again to a sum of squares!) You add up all those squared deviations,
and divide by the total number of subjects in the study minus the number of groups. In
symbols,
MSE = (summation of (ni-1)*si^2)/(N-I) where ni is the number of subjects in group i, si is
the standard deviation of group i, N is the total subjects in the study, and I is the number
of groups.
Q24. In the anova table: the mean square for groups is the ratio of what two quantities?
A24. The sum of squares for groups over the degrees of freedom for groups.
Q25. The mean square error is the ratio of what two quantities?
A25. The sum of squares error (meaning the sum of the squared deviations of each
observation from its group mean) over the degrees of freedom for error (which is the
number of observations minus the number of groups.)
Q26. The F ratio presented in the Anova table is the ratio of what two quantities?
A26. The mean squares for groups over the mean squares for error.
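Note: the calculations in Q22 through Q26 can be sketched in Python as follows. The three groups of observations are made up for illustration.

```python
# Hypothetical data: three groups of observations.
groups = [
    [4, 5, 6],
    [6, 7, 8],
    [8, 9, 10, 9],
]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


total_n = sum(len(g) for g in groups)   # N, total number of observations
num_groups = len(groups)                # I, number of groups
overall_mean = sum(sum(g) for g in groups) / total_n

# Sum of squares for groups: ni * (group mean - overall mean)^2, summed over groups.
ssg = sum(len(g) * (mean(g) - overall_mean) ** 2 for g in groups)

# Sum of squares for error: squared deviations of each observation from its group mean.
sse = sum((x - mean(g)) ** 2 for g in groups for x in g)

msg = ssg / (num_groups - 1)        # mean square for groups
mse = sse / (total_n - num_groups)  # mean square for error
f_stat = msg / mse                  # F ratio reported in the anova table
pooled_sd = mse ** 0.5              # sp, the pooled standard deviation

print(ssg, sse, f_stat, pooled_sd)
```

The numerator and denominator degrees of freedom here are 2 and 7, matching the rule in A16.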
Q27. The square root of the MSE (mean squares for error) is called what?
A27. The pooled standard deviation sp, which estimates the common standard deviation
of each group.
Q28. How do you form a confidence interval for the means found in an analysis of
variance?
A28. You take each sample mean, and add to and subtract from it, its margin of error.
The margin of error is the critical value of t for the desired interval, multiplied by the
pooled standard deviation, divided by the square root of the number in that group.
Q29. If you have a bunch of group means, and you know the sample size for each group,
and you want to calculate the overall mean from them, how do you do that?
A29. Multiply each group mean by the number in that group, to get the sum of the
observations in that group; add up all those sums to get the overall sum; add up the n's for
each group to get the overall n; divide the overall sum by the overall n.
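Note: the same steps in a short Python sketch, with made-up group means and sizes.

```python
# Hypothetical group means and group sizes.
group_means = [5.0, 7.0, 9.0]
group_sizes = [3, 3, 4]

# Mean times n recovers the sum of the observations in each group.
group_sums = [m * n for m, n in zip(group_means, group_sizes)]

overall_n = sum(group_sizes)
overall_mean = sum(group_sums) / overall_n

print(overall_mean)
```

Simply averaging the three group means would give 7.0 rather than 7.2, because it ignores the unequal group sizes.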
YMS Chapter 16: Multiple Regression
Q1. Please explain both in words and symbols what the multiple regression model is, or
the population regression equation.
A1. There is a response variable that is a linear function of not just one explanatory
variable, but more than one. The prediction equation is
mu_y = beta0 + beta1*x1 + ... + betap*xp
where mu_y is the mean y value for a given subpopulation defined by a particular
combination of values of the several explanatory variables, p is the number of predictor
variables (statisticians like the letter p, don't they? Proportion, p-value, and now number
of predictor variables!)
beta0 is the y-intercept of this equation,
and each of the other terms involves a certain coefficient multiplied by the value of the
variable that coefficient applies to.
Q2. What are some other assumptions of the regression model?
A2. That in each subpopulation, determined by a certain combination of explanatory
variable values, y is normally distributed with a mean given by the population regression
equation. Also, the model assumes that the standard deviation of the response variable is
the same in all the subpopulations.
Q3. Please give an example of predicting something from more than one explanatory
variable, in such a way that a multiple regression equation could be used.
A3. The example our text uses is one in which grade point average for computer science
students in college after 3 semesters is predicted by math, science, and English average
grades in high school. We can think of an almost infinite number of other examples: life
span as a function of number of cigarettes smoked, years lived by grandparents, and body
weight; popularity as a function of physical attractiveness, conversation skill,
friendliness, and self-control; wealth as a function of parents' wealth, educational
attainment, and stated desire for high income; etc.
Q4. According to the notation for naming observations our text uses, xij means what?
A4. The value of the jth explanatory variable for the ith subject or case.
Q5. True or false: the multiple regression model requires the explanatory variables to be
normally distributed.
A5. False. Only the population y's are assumed to be normally distributed around their
means.
Q6. True or false: The first step in doing multiple regression analysis is to use graphs and
tables to do exploratory analysis with all the variables individually.
A6. True.
Q7. True or false: when doing a multiple regression analysis, the second step is to use
scatterplots and correlations to examine the bivariate relations between pairs of variables.
A7. True.
Q8. What does the equation data=fit+residual mean?
A8. In the multiple regression equation, the "data" is the observed data point; the "fit" is
the y-hat, or predicted y-value, or predicted mean of the y-values for that combination of
x-values; the residual is the difference between the actual y value and the predicted y
value.
Q9. True or false: in the multiple regression model, the residuals are independent,
normally distributed, and with mean 0 and sd sigma.
A9. True.
Q10. What are the parameters of the multiple regression model?
A10. Beta0, Beta1, Beta2... BetaP and sigma, the standard deviation of each y value
about its mean (or the standard deviation of the residuals).
Q11. The b coefficients in the multiple regression equation obtained from our sample
data are chosen so as to minimize what?
A11. The sum of the squares of the residuals.
Q12. The s that our computer output reports for multiple regressions, that is the estimator
of the sigma of each y value for each combination of x values, is computed how?
A12. s is the square root of (the sum of squared residuals/(n-p-1)) where n is the number of
subjects, p is the number of explanatory variables, and n-p-1 is the degrees of freedom for
this quantity.
Q13. How do you get a confidence interval for a certain regression coefficient?
A13. The confidence interval is that coefficient plus or minus t*(standard error of that
coefficient).
Q14. How do you do a significance test for the null hypothesis that a given regression
coefficient is equal to 0?
A14. Your test statistic is t=b/standard error of b, with n-p-1 degrees of freedom.
Q15. In multiple regression, as in bivariate regression, we can ask software to produce
confidence intervals or prediction intervals. What is meant by a 95% interval of each of
these types?
A15. A 95% confidence interval means that we have 95% confidence that the interval we
produce encloses the mean of y for the given combination of x's. A 95% prediction
interval means that we have 95% confidence that an individual observation of y with the
given combination of x's would fall into the given interval.
Q16. What are the two sources of variation for the y values analyzed in a multiple
regression model?
A16. The y values are different for two reasons: first, they have different x-values.
Second, even for a given set of x-values, the y values vary around the mean y value -- in
other words, the residuals are not zero. So there is variation due to the regression and
variation due to the residuals or to error.
Q17. The total sum of squares of the y values, that is the sum of the squared deviations of
the y values from the overall y mean, can be partitioned into what two components?
A17. The two components are the sum of squares for the model, i.e. the sum of squared
deviations of the predicted y's from the overall y mean, and the sum of squares for error
or residuals, i.e. the sum of squared deviations of the actual y's from the predicted y's. In
other words, SS total = SS model + SS error.
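Note: a minimal Python sketch that checks the partition numerically. For brevity it uses a regression with a single explanatory variable (p = 1) and made-up data; the same identity holds with several explanatory variables.

```python
# Hypothetical sample of (x, y) pairs.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares fit.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
b0 = y_bar - b1 * x_bar
predicted = [b0 + b1 * x for x in xs]

# The three sums of squares.
ss_total = sum((y - y_bar) ** 2 for y in ys)
ss_model = sum((y_hat - y_bar) ** 2 for y_hat in predicted)
ss_error = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, predicted))

# Proportion of the variation in y explained by the model.
r_squared = ss_model / ss_total

print(ss_total, ss_model, ss_error, r_squared)
```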
Q18. The analysis of variance that is done for a multiple regression tests what null
hypothesis?
A18. That all our population regression coefficients are zero.
Q19. In general, the mean squares of something is equal to what divided by what?
A19. The sum of squares divided by the degrees of freedom.
Q20. The F ratio presented in the analysis of variance for multiple regression is the ratio
of what to what?
A20. The ratio MSM/MSE, or the mean squares for the model divided by the mean
squares for error.
Q21. The degrees of freedom for the model is (p, or n-p-1)?
A21. p (the number of explanatory variables). The total degrees of freedom is n-1, and that
minus p gives n-p-1 for the degrees of freedom for error.
Q22. When there is only one predictor variable, the t-test for the regression coefficient b
and the F test in the analysis of variance always yield the same p value, because they are
testing the same hypothesis. In this case, the F bears what relation to t?
A22. F is the square of t.
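Note: a minimal Python check of this relation on made-up data with one predictor. It uses the usual formula for the standard error of the slope, s divided by the square root of the sum of squared deviations of x.

```python
import math

# Hypothetical sample of (x, y) pairs.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)

b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
a = y_bar - b * x_bar
predicted = [a + b * x for x in xs]

ss_model = sum((y_hat - y_bar) ** 2 for y_hat in predicted)
ss_error = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, predicted))

# Anova F: mean square for the model (1 df) over mean square for error (n - 2 df).
ms_model = ss_model / 1
ms_error = ss_error / (n - 2)
f_stat = ms_model / ms_error

# t statistic for the slope: b divided by its standard error.
se_b = math.sqrt(ms_error) / math.sqrt(sxx)
t_stat = b / se_b

print(t_stat ** 2, f_stat)
```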
Q23. The R^2 statistic for multiple regression is the ratio of what to what?
A23. The sum of squares for the model to the sum of squares total.
Q24. The R^2 statistic for multiple regression has what meaning?
A24. The proportion of the total variation in y that is explained by the explanatory
variables.
Q25. The square root of the R^2 statistic for multiple regression is called the multiple
correlation coefficient, and is interpretable as the correlation between what and what?
A25. The y-hats that are predicted by the multiple regression equation, and the actual y
values. In other words, the correlation between the actual and predicted values of y.
Q26. When, in regression output, you find that a regression coefficient is significantly
different from zero, what should you check to help you interpret the meaning of that
coefficient?
A26. You should look at the sign of the coefficient and figure out whether the
relationship between the explanatory variable and response variable is an increasing or
decreasing function.
Q27. When examining multiple regression output, why is it a good idea to verify the
degrees of freedom?
A27. To check for errors in data entry or in specifying the model.
Q28. In the regression output, what is the meaning of the quantity that some statistical
packages call the Root MSE and others call s?
A28. The root mean square error or the s is the estimate of sigma, the standard deviation
of the y values about their individual means. This is estimated by the square root of the
MSE given in the Anova table; the MSE is the SSE/DFE or sum of squares error divided
by the degrees of freedom for error (and that is n-p-1).
Q29. The parameter estimates of the regression output, or the coefficients (as they are
called by other packages) can be put together to form what?
A29. The fitted regression equation, the Y-hat=b0 + b1x1 ... prediction equation.
Q30. The t statistics for each parameter estimate are formed by dividing what by what?
A30. The value of each coefficient (or intercept) by its standard error.
Q31. Are the p values in the regression output two-sided or one-sided?
A31. Two-sided. The null hypothesis is that the coefficient is 0 and the alternative is that
it's not 0.
Q32. True or false: the significance tests for the coefficients test, not whether the single
variables can predict the response variable, but whether those variables improve upon the
prediction achieved by all the other variables in the model. Thus you can have a variable
that correlates well with the response variable by itself, but which does not have a
significant p-value in the regression equation, because it does not add to what other
explanatory variables can do.
A32. True.
Q33. It is good practice to make several residual plots when doing a multiple regression.
What several plots should be done?
A33. Plot the residuals as a function of each of the explanatory variables. You can also
make a normal probability plot to determine the normality of the residuals.
Q34. Suppose that after examining the regression output with three explanatory variables
in the model, you eliminate one of the explanatory variables and run a multiple regression
with two explanatory variables. As a rule, the intercept, coefficients, t's, and p values for
the remaining variables will (remain the same as they were before, change)?
A34. Change. The values you find are thus contingent upon what other variables are in
the model.
Q35. True or false: it is possible, using multiple regression, to test a certain set of
variables, asking whether that set of variables improves significantly upon the prediction
made by another set of variables.
A35. True.
YMS Chapter 17: Logistic Regression
Q1. When our response variable is dichotomous, or has only two values (like live or die)
and there is either one or several explanatory variables (that can be categorical or
quantitative) such as for example, severity of condition on admission to the hospital,
degree of training of the doctor, and rating of cleanliness of the ward, then the method of
_____ ______ is often an appropriate method of analysis.
A1. Logistic regression
Q2. Why can't you use ordinary least-squares regression with the situation when the
response variable is dichotomous?
A2. Because the least-squares regression model we've studied so far carries the condition
that the y values are normally distributed around the model fit. You can't have a normal
distribution with only two possible values.
Q3. Suppose there are 200 violent people among a set of 1000. Thus the proportion of
violent people is .20. What are the odds of being a violent person in this set?
A3. Odds=p/(1-p) = .20/(1-.20)=.20/.80=.25. OR: odds = ratio of violent to nonviolent, =
200/800=.25.
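Note: both routes to the odds, sketched in Python. The function names are only illustrative.

```python
def odds_from_proportion(p):
    """Convert a proportion p (with 0 <= p < 1) to odds, p / (1 - p)."""
    return p / (1 - p)


def odds_from_counts(events, non_events):
    """Odds as the ratio of the count with the trait to the count without it."""
    return events / non_events


odds_a = odds_from_proportion(200 / 1000)
odds_b = odds_from_counts(200, 800)

print(odds_a, odds_b)
```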
Q4. Suppose that in our group of 1000, 150 of 500 males are violent and 50 of 500
females are violent. What method did we learn in chapter 12 for comparing these
proportions via a confidence interval?
A4. We would get a confidence interval for the difference between the two proportions,
which are .30 and .10. We would use the normal approximation to the binomial
distribution to do this. We would use the fact that the variance of a difference between
two independent random variables is the sum of the variances of the variables, to
compute the standard error of the difference in proportions by taking the square root of
the sum of the pq/n for men and women. We'd multiply that standard error of the
difference by the critical z, or z*, for whatever level of confidence we wanted, and add
and subtract that margin of error to and from the obtained difference in proportions.
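Note: the steps just described, sketched in Python for a 95% interval (z* = 1.96), using the counts from the question.

```python
import math

# Counts from the question: violent / total, by gender.
x1, n1 = 150, 500   # males
x2, n2 = 50, 500    # females

p1, p2 = x1 / n1, x2 / n2
difference = p1 - p2

# Variance of a difference of independent quantities = sum of the variances,
# so the standard error is the square root of the sum of the pq/n terms.
se_difference = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

z_star = 1.96  # critical z for 95% confidence
margin = z_star * se_difference

lower, upper = difference - margin, difference + margin
print(lower, upper)
```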
Q5. If instead of this method, we use logistic regression, how would we turn gender into
an "indicator variable"?
A5. By letting, for example, female=0 and male=1.
Q6. If we have 150 out of 500 males violent and 50 out of 500 females violent, how
would we transform these proportions to odds, if we are doing logistic regression?
A6. For males, the odds of violent to nonviolent are 150/350=.43 (equivalently, .30/.70).
For females, the odds are 50/450=.11 (equivalently, .10/.90).
Q7. In simple linear regression, we modeled the mean of the response variable for a
given value of x by mu=B0+B1x. Why don't we do something similar with proportions,
and model p= B0+B1x?
A7. Because with extreme values of x, we get values of p that are inconsistent with the
constraint that p has to be between 0 and 1.
Q8. So what model do we use for logistic regression?
A8. We transform the odds by taking the natural log of it, and let that equal B0+B1x.
Thus
ln(p/(1-p)) = B0+B1x
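Note: a minimal Python sketch of the log-odds transformation and its inverse. The coefficients are made up for illustration. The point is that solving the model for p always gives a value between 0 and 1, even for extreme x.

```python
import math


def logit(p):
    """Natural log of the odds, ln(p / (1 - p)), for 0 < p < 1."""
    return math.log(p / (1 - p))


def inverse_logit(log_odds):
    """Solve ln(p / (1 - p)) = log_odds for p."""
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical coefficients, for illustration only.
b0, b1 = -2.0, 0.5

# Even for extreme x values, the implied proportion stays between 0 and 1.
fitted = {x: inverse_logit(b0 + b1 * x) for x in (-20, 0, 20)}
print(fitted)
```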
Q9. If we have 150 out of 500 males violent and 50 out of 500 females violent, how
would we compute the logistic regression equation?
A9. The odds for men are .43 and the odds for women are .11. The natural logs, or ln's,
respectively, are -.844 and -2.21. So with x=0 for women, the ln(p/(1-p)) becomes= B0.
So B0= -2.21. For men, -.844 = -2.21 + B1*1, so B1=2.21-.844=1.366.
Thus the logistic regression equation is:
ln(p/(1-p))=-2.21+1.366x
Q10. True or false: in the logistic regression example that we just gave, the slope in the
logistic regression model is the difference between the natural logs of the odds for men
and women, that is the log odds for men minus the log odds for women.
A10. True.
Q11. Given that the slope, 1.366, is the difference in the natural logs of the odds, the
antilog of that number corresponds to what relation between the odds?
A11. The odds ratio, which is the odds of violence for men divided by that for women.
Since the ln (a/b)=ln(a)-ln(b), the antilog of ln(a)-ln(b)=a/b. Thus the antilog of 1.366 (or
e to the 1.366 power) is equal to the ratio of the odds for men to the odds for women.
This equals 3.92. To check, the ratio of .43 to .11 is approximately the same.
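Note: the same calculation in Python, carried out without rounding the odds first. Because the odds are not rounded to .43 and .11, the results differ slightly from the hand-computed figures above (the slope comes out near 1.35 and the odds ratio near 3.86).

```python
import math

# Counts from the example: violent / nonviolent, by gender.
odds_male = 150 / 350
odds_female = 50 / 450

# With x = 0 for women and x = 1 for men:
b0 = math.log(odds_female)                        # log odds for women
b1 = math.log(odds_male) - math.log(odds_female)  # difference in log odds

# The antilog of the slope is the odds ratio.
odds_ratio = math.exp(b1)

print(b0, b1, odds_ratio)
```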
Q12. More generally, the odds ratio is the ratio of the odds for explanatory variable equal
to _____ to the odds for explanatory variable equal to ______.
A12. x+1, x
Q13. How do we form a confidence interval for the slope in the logistic regression
model?
A13. It's the obtained slope plus or minus z* times SEb1, where z* is the critical value of
z in the normal distribution, and SEb1 is the standard error of the regression coefficient.
(The computer prints out the standard error of the regression coefficient.)
Q14. A confidence interval for the odds ratio is obtained by getting the antilogs of what?
A14. The antilogs of the boundaries for the confidence interval for the slope.
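A minimal sketch of both intervals for the men/women example follows. The catechism says the computer supplies SEb1. As an assumption not taken from the text, this sketch uses the standard large-sample formula for the standard error of a log odds ratio from a 2x2 table, the square root of the sum of the reciprocals of the four cell counts. The function name slope_ci is illustrative.

```python
import math

def slope_ci(b1, se_b1, z_star=1.96):
    """Confidence interval for the slope: b1 +/- z* times SEb1."""
    half_width = z_star * se_b1
    return b1 - half_width, b1 + half_width

# 2x2 table from the example: violent / not violent
men_yes, men_no = 150, 350
women_yes, women_no = 50, 450

b1 = math.log((men_yes / men_no) / (women_yes / women_no))

# Assumption: large-sample SE of a log odds ratio from a 2x2 table.
se_b1 = math.sqrt(1 / men_yes + 1 / men_no + 1 / women_yes + 1 / women_no)

slope_low, slope_high = slope_ci(b1, se_b1)                    # 95% CI for the slope
or_low, or_high = math.exp(slope_low), math.exp(slope_high)    # antilogs of the endpoints

print(f"slope: {b1:.3f}, SE: {se_b1:.3f}")
print(f"95% CI for slope:      ({slope_low:.3f}, {slope_high:.3f})")
print(f"95% CI for odds ratio: ({or_low:.2f}, {or_high:.2f})")
```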
Q15. To do a hypothesis test of the null hypothesis that the slope in the logistic regression
is 0, what statistic do you use?
A15. The square of the ratio of the slope to its standard error, (b1/SEb1)^2. When the null
hypothesis is true, this statistic approximately follows the chi-square distribution with 1
degree of freedom.
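The test statistic can be sketched as below. The slope 1.366 is from the example above; the standard error 0.18 is a hypothetical value standing in for what the computer would print. The p-value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper tail can be computed with the complementary error function.

```python
import math

def wald_chi_square(b1, se_b1):
    """Square of the ratio of the slope to its standard error."""
    return (b1 / se_b1) ** 2

def chi_square_1df_p_value(stat):
    """Upper-tail probability for chi-square with 1 df: P(|Z| > sqrt(stat))."""
    return math.erfc(math.sqrt(stat / 2))

b1 = 1.366      # slope from the example
se_b1 = 0.18    # hypothetical standard error, for illustration only

stat = wald_chi_square(b1, se_b1)
print(f"chi-square = {stat:.2f}, p-value = {chi_square_1df_p_value(stat):.2e}")
```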
Q16. How can the hypothesis that the slope is 0 be translated to hypotheses about the
odds ratio?
A16. If the slope in the logistic regression model is 0, then the odds ratio is e^0, or 1. In
other words, the explanatory variable does not alter the odds for the response variable.
Q17. We would reject the null hypothesis at the .05 level if the 95% confidence interval
for the odds ratio did not include what value?
A17. One.
Q18. Suppose that instead of the simple case where you have one explanatory variable
that takes on only two values, you have one explanatory variable that takes on, say, five
or six values. Then your raw data are proportions, one for each value of the explanatory
variable. How do you fit the logistic regression model to these data?
A18. You transform the proportions to ln(odds), or ln(p/(1-p)) as before. Then you do a
least-squares regression with the ordered pairs (x, ln(p/(1-p))). The result is an equation
of the form
ln(p/(1-p)) = B0+B1x
as before.
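The procedure described in A18 can be sketched as follows. The data are hypothetical, made up only to show the steps, and the function name fit_line is illustrative. Note that statistical software typically fits logistic regression by maximum likelihood, so its coefficients may differ somewhat from this least-squares fit on the transformed proportions.

```python
import math

# Hypothetical data: (x, number of successes, number of trials).
data = [(1, 5, 50), (2, 10, 50), (3, 20, 50), (4, 30, 50), (5, 40, 50)]

def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Step 1: transform each proportion to ln(odds).
xs = [x for x, _, _ in data]
ln_odds = [math.log((k / n) / (1 - k / n)) for _, k, n in data]

# Step 2: least-squares regression of ln(odds) on x.
b0, b1 = fit_line(xs, ln_odds)
print(f"ln(p/(1-p)) = {b0:.3f} + {b1:.3f}x")
```

The transformation fails if any proportion is exactly 0 or 1, since the ln(odds) is undefined there.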
Q19. When doing this logistic regression, as in doing all analyses, we should examine the
data via graphs. What graph should we do?
A19. We should do a scatterplot of ln(p/(1-p)) versus the explanatory variable x. (Note:
in the pesticide example in the text, the explanatory variable is also transformed to get a
better linear relationship.)
Q20. When we do a logistic regression with more than one explanatory variable, the
computer output that is labeled "chi-square for covariates" gives a chi-square statistic and
a p-value that is testing what null hypothesis?
A20. That all the regression coefficients in the model are equal to zero.
Q21. True or false: as in multiple linear regression, the regression coefficients obtained
from a multiple logistic regression are meant to measure how much each variable adds to
the prediction beyond what the other variables contribute, and not to measure how much
that variable by itself predicts the response variable.
A21. True.
Q22. True or false: as in multiple linear regression, when you are doing multiple logistic
regression, the results you get when only one or two of three explanatory variables are
included in the model can sometimes differ from what you would have predicted from
looking at the results when all three variables are included in the model.
A22. True.