Statistical Inference in Education

Introduction to

Statistical Inference

EDUC 502

November 28, 2005




Illuminating article:

Daniel, L.G. (1998). Statistical significance testing: A historical overview of misuse and misinterpretation with implications for editorial policies of educational journals. Research in the Schools, 5(2), 23-32. Available online: http://www.personal.psu.edu/users/d/m/dmr/sigtest/3mspdf.pdf




“Probably few methodological issues have generated as much controversy among sociobehavioral scientists as the use of [statistical significance] tests” (Pedhazur & Schmelkin, 1991, p. 198).



“The test of significance does not provide the information concerning psychological phenomena characteristically attributed to it…a great deal of mischief has been associated with its use” (Bakan, 1966, p. 423).




Huberty (1987) asserted, “There is nothing wrong with statistical tests themselves! When used as guides and indicators, as opposed to means of arriving at definitive answers, they are okay” (p. 7).



Main problem: “The ingenuous assumption that a statistically significant result is necessarily a noteworthy result” (Daniel, 1997, p. 106).




Another problem: It is “common practice to drop the word ‘statistical’ and instead speak of ‘significant differences,’ ‘significant correlations,’ and the like” (Pedhazur & Schmelkin, 1991, p. 202).



Schafer (1993) noted, “I hope that most researchers understand that significant (statistically) and important are two different things. Surely the term significant was ill-chosen” (p. 387).




In order to better understand this controversy, we will explore some of the mathematics behind statistical inference.



We will follow the outline provided by:

Moore, D.S. (1997). Statistics: Concepts and controversies (4th ed.). New York: W.H. Freeman.




Inference simply means drawing conclusions from data, as we have discussed up to this point.



The phrase “statistical inference” is reserved for occasions when probability concepts are used to help in drawing conclusions.



Probability can account for chance variation, which allows us to correct our judgment of what is happening in certain situations.






Scenario: Suppose a multiple choice test is used to compare the performance of students receiving teaching method A with that of students receiving teaching method B. Twenty students were assigned at random to teaching method A and another 20 to teaching method B. At the end of the experiment, 12 of the students in group A received Fs on the test while only 8 in group B received Fs.

Question: Can we conclude that teaching method B better prevents students from receiving Fs?




Answer: Not necessarily. A difference this size could likely be due to chance variation alone. We could do a probability calculation to compute the probability of avoiding an F just by guessing and then compare.



While there is a numerical difference between the number of Fs in the two groups, that difference might vanish if the experiment were repeated a number of times.




Drawing conclusions in mathematics: Start with a hypothesis and then use a logical argument to prove that the conclusion follows.



Example: If a quadrilateral is a rectangle, then its diagonals are congruent. (This can be proven through an a priori logical argument – not by just examining a bunch of rectangles and measuring their diagonals to see if they are congruent).






Drawing conclusions in social science is almost the opposite of mathematics: You need to start with a number of observations and draw conclusions from them (inductive reasoning).

Important implication: Social science research studies do NOT produce proofs. They only produce evidence that something may or may not be the case. That is, you can never prove that teaching method A is better than method B, but you can systematically gather evidence to help you make decisions about how to teach.




“Statistical inference uses probability to say how strong an inductive argument is” (Moore, 1997, p. 459).



In the teaching method A vs. teaching method B scenario, a probability calculation could help us see that the argument in favor of teaching method B is not very strong. We could likely get different results if the experiment were replicated a number of times.
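
The probability calculation mentioned above can be sketched as follows. This is one possible calculation, not necessarily the one the course has in mind. It assumes that the teaching method makes no difference, so the same 20 students (12 + 8) would have received Fs no matter which group they landed in, and it asks how often random assignment alone would put 12 or more of them in group A. The function name prob_at_least is made up for this sketch.

```python
from math import comb

def prob_at_least(k_min, group_size=20, total_fs=20, total_students=40):
    """Chance that group A holds k_min or more of the Fs when students are
    split into groups purely at random (a hypergeometric calculation)."""
    non_fs = total_students - total_fs
    all_splits = comb(total_students, group_size)
    favorable = sum(
        comb(total_fs, k) * comb(non_fs, group_size - k)
        for k in range(k_min, min(group_size, total_fs) + 1)
    )
    return favorable / all_splits

# Assumption: the method has no effect, so 20 Fs occur either way.
p_one_sided = prob_at_least(12)
print(f"P(12 or more of the 20 Fs fall in group A by chance) = {p_one_sided:.3f}")
```

Under these assumptions the probability comes out to roughly 0.17, so a 12 vs. 8 split is not unusual when only chance is at work.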




Note: The probability calculations required for statistical inference depend upon probability samples or randomized comparative experiments.



Very few educational research studies have this sort of luxury, with a few notable exceptions.

For example:



National Assessment of Educational Progress (NAEP)



Trends in International Mathematics and Science Study (TIMSS)

Some Essential Terminology





“A parameter is a number that describes the population. For example, the proportion of the population having some characteristic of interest is a parameter we call p. In a statistical inference problem, population parameters are fixed numbers, but we do not know their values” (Moore, 1997, p. 460).

Example: The actual proportion of 3rd graders in the U.S. who can read is a population parameter. We can only estimate it by drawing random samples from the population. We will probably never know it exactly.






“A statistic is a number that describes the sample data. For example, the proportion of the sample having some characteristic of interest is a statistic that we call p-hat. Statistics change from sample to sample. We use the observed statistics to get information about the unknown parameters” (Moore, 1997, p. 460).

Example: We could draw a random sample out of all the 3rd graders in the U.S. and administer a literacy test. The proportion of the sample that could read would be a statistic used to estimate the population parameter.
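
The difference between a fixed parameter and a statistic that changes from sample to sample can be seen in a small simulation. The numbers here are invented for illustration: TRUE_P (the population proportion of readers) and SAMPLE_SIZE are hypothetical values, not real literacy data. In practice we would never get to see TRUE_P.

```python
import random

random.seed(502)  # fixed seed so the run is repeatable

TRUE_P = 0.80       # hypothetical parameter, made up for this sketch
SAMPLE_SIZE = 500   # hypothetical number of 3rd graders per sample

def sample_proportion(true_p, n):
    """Draw n students at random and return p-hat, the share who can read."""
    readers = sum(random.random() < true_p for _ in range(n))
    return readers / n

# The parameter stays put; the statistic moves from sample to sample.
p_hats = [sample_proportion(TRUE_P, SAMPLE_SIZE) for _ in range(5)]
for i, p_hat in enumerate(p_hats, start=1):
    print(f"sample {i}: p-hat = {p_hat:.3f} (parameter p = {TRUE_P})")
```

Each run of sample_proportion gives a slightly different p-hat, all scattered around the single fixed value of p.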

Confidence Intervals



Scenario: “The NAEP survey includes a short test of quantitative skills, covering mainly basic arithmetic and the ability to apply it to realistic problems. Scores on the test range from 0 to 500. For example, a person who scores 233 can add the amounts of two checks appearing on a bank deposit slip; someone scoring 325 can determine the price of a meal from a menu; a person scoring 375 can transform a price in cents per ounce into dollars per pound” (Moore, 1997b, p. 207).




Scenario (contd.): “In a recent year, 840 men 21 to 25 years of age were in the NAEP sample. Their mean quantitative score was 272 (statistic). These 840 men are a simple random sample from the population of all young men. On the basis of this sample, what can we say about the mean score in the population of all 9.5 million young men of these ages (parameter)?” (Moore, 1997b, p. 207).








Because the statistic was 272, you might guess the actual population parameter is around 272.

Statistical inference question related to confidence intervals: “How would the sample mean (statistic) vary if we took many samples of 840 young men from this same population?” (Moore, 1997b, p. 207).

This seems like an impossible question to answer on the face of it, but some statistical facts help us out.




Useful fact #1: The sampling distribution of the sample mean is approximately normal when the sample is large, as it is here!



Useful fact #2: The mean of the sampling distribution is equal to the mean of the population.



Useful fact #3: The 68-95-99.7 rule for normal distributions.



Useful fact #4: From long experience, we calculate the standard deviation of the sampling distribution to be 2.1.
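
One way a value like 2.1 can arise: the standard deviation of the sampling distribution of the mean equals the population standard deviation divided by the square root of the sample size. The slide does not give the population standard deviation, so the value SIGMA = 60 below is an assumption, chosen because it reproduces 2.1 for samples of 840.

```python
from math import sqrt

SIGMA = 60.0  # ASSUMED population standard deviation (not stated above)
N = 840       # sample size from the NAEP scenario

# Standard deviation of the sample mean = sigma / sqrt(n)
standard_error = SIGMA / sqrt(N)
print(f"standard deviation of the sampling distribution = {standard_error:.2f}")
```

Rounded to one decimal place, this gives 2.1. Note that larger samples shrink this number, which is why large samples give more precise estimates.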








Putting the facts together: The 68-95-99.7 rule says that about 95% of the means will be within two standard deviations of the population mean.

In our case, 95% of the sample means will be within 4.2 points of the population mean.

In 95% of all samples taken, the actual population mean is within 4.2 points of the sample mean.

This means that in 95% of all samples the actual population mean lies between (sample mean) - 4.2 and (sample mean) + 4.2.






Bottom line: If we choose very many samples, 95% of the intervals defined by (sample mean) plus or minus (4.2) will capture the actual population mean.
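
The bottom line can be checked with a simulation. This is a sketch under stated assumptions, not real NAEP data: POP_MEAN and POP_SD are hypothetical values, and individual scores are modeled as normally distributed. The code draws many samples of 840, builds the interval (sample mean) plus or minus 4.2 for each, and counts how often the interval captures the population mean.

```python
import random
from statistics import fmean

random.seed(2005)  # fixed seed so the run is repeatable

POP_MEAN = 270.0  # hypothetical "true" mean; unknown in a real study
POP_SD = 60.0     # ASSUMED population standard deviation
N = 840           # sample size from the NAEP scenario
MARGIN = 4.2      # two standard deviations of the sample mean
TRIALS = 1000     # number of repeated samples

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    sample_mean = fmean(sample)
    # Does this sample's interval capture the population mean?
    if sample_mean - MARGIN <= POP_MEAN <= sample_mean + MARGIN:
        hits += 1

coverage = hits / TRIALS
print(f"{coverage:.1%} of the {TRIALS} intervals captured the population mean")
```

The share of intervals that capture the mean should land close to 95%. Any single interval either contains the population mean or it does not; the 95% describes the method, not one interval.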

Back to the NAEP scenario: Recall that our sample mean was 272. This means we can say that we are 95% confident that the actual population mean for the NAEP lies between:

272 - 4.2 = 267.8 and 272 + 4.2 = 276.2.




“Be sure you understand the grounds for our confidence. There are only two possibilities:



1. The interval between 267.8 and 276.2 contains the true population mean.



2. Our simple random sample was one of the few samples for which the sample mean is not within 4.2 points of the true population mean. Only 5% of all samples give such inaccurate results” (Moore, 1997, p. 210).




“We cannot know whether our sample is one of the 95% for which the interval catches the actual population mean, or one of the unlucky 5%.



The statement that we are 95% confident that the actual population mean lies between 267.8 and 276.2 is shorthand for saying, ‘We got these numbers by a method that gives correct results 95% of the time.’” (Moore, 1997b, p. 210).

Homework Exercise 1



“The report of a sample survey of 1500 adults says, ‘With 95% confidence, between 27% and 33% of American adults believe that drugs are the most serious problem facing our nation’s public schools.’ Explain to someone who knows no statistics what the phrase ‘ninety-five percent confidence’ means in this report” (Moore, 1997, p. 468).




“A student reads that a 95% confidence interval for the mean NAEP quantitative score for men of ages 21 to 25 is 267.8 to 276.2. Asked to explain the meaning of this interval, the student says, ‘ninety-five percent of all young men have scores between 267.8 and 276.2.’ Is this student right? Justify your answer” (Moore, 1997b, p. 217).

Hypothesis Tests



“The other major type of formal inference is the test of significance. The purpose of a statistical test is to assess the evidence provided by the data against some claim about a parameter. A test says, ‘If we took many samples and the claim were true, we would rarely get a result like this.’ Observing a result that would rarely occur if a claim were true is evidence that the claim is not true. Replace the word ‘rarely’ by a probability and you have a numerical measure of our confidence in the evidence that the data give us” (Moore, 1997, p. 483).






Generic Example: Suppose we want to compare a new teaching method (A) against another one (B). We might start by guessing that teaching method A will work better.

We would then state a null and an alternative hypothesis.

Null: Mean posttest scores for the two groups will be identical.

Alternative: Mean posttest scores for group A will be greater than those for group B.




If we believe in teaching method A, we hope to gather evidence against the null hypothesis and in support of the alternative.



If we gather enough evidence (enough and significance being defined in probabilistic terms), we can reject the null hypothesis. Note, however, that this does not prove the alternative hypothesis. All that any social science study can do is to gather evidence.
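
To see what "enough evidence in probabilistic terms" can look like, here is a sketch of one kind of significance test, a permutation test. It is only one of several tests that could be used here, and the posttest scores are invented for illustration. The idea: if the null hypothesis were true, the group labels would not matter, so we reshuffle the labels many times and see how often chance alone produces a difference in means at least as large as the one observed.

```python
import random
from statistics import fmean

random.seed(28)  # fixed seed so the run is repeatable

# HYPOTHETICAL posttest scores, invented for illustration only.
group_a = [78, 85, 90, 72, 88, 95, 81, 84, 79, 91]
group_b = [70, 75, 82, 68, 74, 80, 77, 71, 69, 83]

observed_diff = fmean(group_a) - fmean(group_b)

def permutation_p_value(a, b, reshuffles=10_000):
    """One-sided p-value: the share of random regroupings whose difference
    in means (A minus B) is at least as large as the observed difference."""
    observed = fmean(a) - fmean(b)
    pooled = a + b  # new list; the original groups are left unchanged
    at_least_as_large = 0
    for _ in range(reshuffles):
        random.shuffle(pooled)
        diff = fmean(pooled[:len(a)]) - fmean(pooled[len(a):])
        if diff >= observed:
            at_least_as_large += 1
    return at_least_as_large / reshuffles

p_value = permutation_p_value(group_a, group_b)
print(f"observed difference in means = {observed_diff:.1f}")
print(f"one-sided p-value = {p_value:.4f}")
```

A small p-value says that a difference this large would rarely occur if the null hypothesis were true. That is evidence against the null hypothesis, not a proof that method A is better.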




Suppose you read in an educational research report that students’ posttest scores after receiving teaching method A were significantly higher than those of students who received teaching method B. Does this prove that teaching method A is more effective than teaching method B? Why or why not?
