Descriptive Statistics and the Normal Distribution HPHE 3150

Dr. Ayers

Introduction Review

• Terminology

• Reliability

• Validity

• Objectivity

• Formative vs. summative evaluation

• Norm- vs. criterion-referenced standards

Scales of Measurement

• Nominal: name or classify

   • Major, gender, year in college

• Ordinal: order or rank

   • Sports rankings

• Continuous

   • Interval: equal units, arbitrary zero

      • Temperature, SAT/ACT score

   • Ratio: equal units, absolute zero (total absence of the characteristic)

      • Height, weight

Summation Notation

• Σ is read as "the sum of"

• X is an observed score

• N = the number of observations

• Complete operations inside ( ) first

• Then exponents, then * and /, then + and -

Operations Orders

65

26

-5

4 2 -3

Summation Notation Practice:

Mastery Item 3.2

Scores:

3, 1, 2, 2, 4, 5, 1, 4, 3, 5

Determine:

$\sum X = 30$

$(\sum X)^2 = 900$

$\sum X^2 = 110$
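The three quantities in Mastery Item 3.2 are easy to confuse, so a short check in Python may help. This is only a sketch; the variable names are illustrative and do not come from the slides.

```python
# Scores from Mastery Item 3.2
scores = [3, 1, 2, 2, 4, 5, 1, 4, 3, 5]

sum_x = sum(scores)                           # sum of X
sum_x_squared = sum_x ** 2                    # (sum of X) squared: add first, then square
sum_of_squares = sum(x ** 2 for x in scores)  # sum of X squared: square each score, then add

print(sum_x, sum_x_squared, sum_of_squares)   # 30 900 110
```

The order of operations is the whole point: squaring after summing gives 900, while squaring before summing gives 110.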

Percentile

The percent of observations that fall at or below a given point

Range from 0% to 100%

Allows normative performance comparisons

If I am at the 90th percentile, what percent of people scored higher than I did?

Test Score Frequency Distribution

Figure 3.1

(p.42 explanation)

Score    Frequency    Percent    Valid Percent    Cumulative Percent
41           1          1.5          1.5                 1.5
43           3          4.6          4.6                 6.2
44           3          4.6          4.6                10.8
45           5          7.7          7.7                18.5
46           5          7.7          7.7                26.2
47           7         10.8         10.8                36.9
48          11         16.9         16.9                53.8
49           8         12.3         12.3                66.2
50           7         10.8         10.8                76.9
51           6          9.2          9.2                86.2
52           3          4.6          4.6                90.8
53           3          4.6          4.6                95.4
54           2          3.1          3.1                98.5
55           1          1.5          1.5               100.0
Total       65        100.0        100.0
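The percent and cumulative percent columns of Figure 3.1 follow directly from the frequencies. The sketch below rebuilds them from the score and frequency values in the figure; the dictionary layout and variable names are illustrative choices, not part of the course materials.

```python
from itertools import accumulate

# Score -> frequency, as listed in Figure 3.1 (N = 65)
freq = {41: 1, 43: 3, 44: 3, 45: 5, 46: 5, 47: 7, 48: 11,
        49: 8, 50: 7, 51: 6, 52: 3, 53: 3, 54: 2, 55: 1}

n = sum(freq.values())                        # total number of observations
cum_counts = list(accumulate(freq.values()))  # running total of frequencies

rows = []
for (score, f), cum in zip(freq.items(), cum_counts):
    percent = round(100 * f / n, 1)           # share of all scores at this value
    cum_percent = round(100 * cum / n, 1)     # share of scores at or below this value
    rows.append((score, f, percent, cum_percent))

for row in rows:
    print("{:>5} {:>9} {:>7.1f} {:>10.1f}".format(*row))
```

The cumulative percent is the percentile for that score: a score of 52, for example, sits at about the 91st percentile.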

Central Tendency

Where do the scores tend to center?

• Mean: sum of scores / number of scores

• Median (P50): exact middle of the ordered scores

• Mode: most frequent score

Raw scores: 2, 7, 5, 5, 1

Rank order: 1, 2, 5, 5, 7

• Mean: 4 (20/5)

• Median (P50): 5

• Mode: 5
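The same three measures can be checked with Python's standard `statistics` module. This is a minimal sketch using the raw scores from the slide.

```python
import statistics

raw_scores = [2, 7, 5, 5, 1]

rank_order = sorted(raw_scores)           # [1, 2, 5, 5, 7]
mean = statistics.mean(raw_scores)        # sum of scores / number of scores
median = statistics.median(raw_scores)    # middle value of the ordered scores
mode = statistics.mode(raw_scores)        # most frequent score

print(rank_order, mean, median, mode)     # [1, 2, 5, 5, 7] 4 5 5
```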

Distribution Shapes

Figure 3.2

So what? Outliers pull the mean toward the tail.

Direction of the tail = sign of the skew (+/-)

[Figure: Distribution of Initial CRF, with normal density superimposed. x-axis: CRF at Initial Examination (METs), 5 to 19. Mean = 11.7, SD = 2.0. Based on 15,242 maximal GXT. Kampert, MSSE, Suppl. 2004, p. S135]

[Figure: Histogram of Skinfold Data; x-axis from 10 to 80, y-axis from 0 to 60]

Three Symmetrical Curves

Figure 3.3

The difference here is the variability:

• Fully normal

• More heterogeneous

• More homogeneous

Descriptive Statistics I

What is the most important thing you learned today?

What do you feel most confident explaining to a classmate?

Descriptive Statistics I

REVIEW

Measurement scales

Nominal, Ordinal, Continuous (interval, ratio)

Summation Notation:

Scores: 3, 4, 5, 5, 8. Determine:

$\sum X = 25$

$(\sum X)^2 = 625$

$\sum X^2 = 9 + 16 + 25 + 25 + 64 = 139$

Percentiles: so what?

Measures of central tendency

3, 4, 5, 5, 8

Mean (?), median (?), mode (?)

Distribution shapes

Variability

Range

High score minus low score (least reliable measure; uses only 2 scores)

Variance ($S^2$): inferential stats

Spread of scores based on the squared deviation of each score from mean

Most stable measure of variability

Total variance = True variance + Error variance

Standard Deviation (S): descriptive stats

Square root of the variance: $S = \sqrt{S^2}$

Most commonly used measure of variability

Variance

(Table 3.2)

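The didactic formula

$S^2 = \frac{\sum (X - M)^2}{n - 1}$

$\sum (X - M)^2 = 4 + 1 + 0 + 1 + 4 = 10$, so $S^2 = \frac{10}{5 - 1} = \frac{10}{4} = 2.5$

The calculating formula

$S^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}$

$S^2 = \frac{55 - \frac{225}{5}}{4} = \frac{55 - 45}{4} = \frac{10}{4} = 2.5$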
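Both variance formulas should give the same answer, and a quick check makes that concrete. The sketch below assumes the Table 3.2 scores are 1 through 5. That is an inference from the worked numbers (n = 5, sum of X = 15, sum of X squared = 55, squared deviations 4, 1, 0, 1, 4), not something stated on the slide.

```python
import statistics

scores = [1, 2, 3, 4, 5]   # ASSUMED data; chosen because it reproduces the sums on the slide
n = len(scores)
m = sum(scores) / n        # mean (M)

# Didactic formula: sum of squared deviations from the mean, divided by n - 1
var_didactic = sum((x - m) ** 2 for x in scores) / (n - 1)

# Calculating formula: uses only the sum of X and the sum of X squared
sum_x = sum(scores)
sum_x2 = sum(x ** 2 for x in scores)
var_calculating = (sum_x2 - sum_x ** 2 / n) / (n - 1)

print(var_didactic, var_calculating, statistics.variance(scores))  # 2.5 2.5 2.5
```

`statistics.variance` also divides by n - 1, so it agrees with both hand formulas.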

Standard Deviation

The square root of the variance: $S = \sqrt{S^2}$

Nearly 100% of the scores in a normal distribution fall within the mean ± 3 standard deviations

M ± S

100 ± 10

The Normal Distribution

M ± 1s = 68.26% of observations

M ± 2s = 95.44% of observations

M ± 3s = 99.74% of observations
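These percentages can be reproduced from the standard normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `percent_within` is illustrative. The computed values differ from the slide in the second decimal place, most likely because the slide doubles rounded table entries (34.13, 47.72, 49.87).

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)   # standard normal curve: M = 0, s = 1

def percent_within(k):
    """Percent of observations within k standard deviations of the mean."""
    return 100 * (std_normal.cdf(k) - std_normal.cdf(-k))

for k in (1, 2, 3):
    print(k, round(percent_within(k), 2))
# 1 68.27
# 2 95.45
# 3 99.73
```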

Calculating Standard Deviation

$S = \sqrt{\frac{\sum (X - M)^2}{N}}$

Raw scores (X)    X − M    (X − M)²
3                  −1        1
7                   3        9
4                   0        0
5                   1        1
1                  −3        9
Sum: 20             0       20

Mean: 4

$S = \sqrt{\frac{20}{5}} = \sqrt{4} = 2$
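The table can be reproduced step by step in Python. Note that this slide divides by N, whereas the variance formulas earlier in the deck divide by n - 1. The sketch follows this slide and uses `statistics.pstdev`, the divide-by-N version, as a cross-check.

```python
import math
import statistics

raw_scores = [3, 7, 4, 5, 1]
n = len(raw_scores)
mean = sum(raw_scores) / n                     # 20 / 5 = 4

deviations = [x - mean for x in raw_scores]    # X - M for each score
squared = [d ** 2 for d in deviations]         # (X - M) squared for each score

s = math.sqrt(sum(squared) / n)                # sqrt(20 / 5) = sqrt(4) = 2

print(sum(deviations), sum(squared), s)        # 0.0 20.0 2.0
print(statistics.pstdev(raw_scores))           # divide-by-N standard deviation, also 2.0
```

Dividing by n - 1 instead (`statistics.stdev`) would give a slightly larger value, about 2.24, for the same scores.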

Coefficient of Variation (V)

V = S / M

Relative variability around the mean; use it to compare the homogeneity of two data sets measured in different units (ht, hr, running speed, etc.)

Helps more fully describe data sets that share a common standard deviation (S) but have different means (M)

Lower V = the mean accounts for most of the variability in scores

V of .1 to .2 = homogeneous; V > .5 = heterogeneous
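A short example shows why V is useful when units differ. The two data sets below are hypothetical values made up for illustration; they do not come from the course. The sketch uses the divide-by-N standard deviation to match the standard deviation slide.

```python
import statistics

def coefficient_of_variation(scores):
    """V = S / M, using the divide-by-N standard deviation."""
    return statistics.pstdev(scores) / statistics.mean(scores)

# HYPOTHETICAL data in different units
heights_cm = [160, 165, 170, 175, 180]
heart_rates_bpm = [50, 60, 70, 80, 90]

v_height = coefficient_of_variation(heights_cm)     # about 0.04
v_hr = coefficient_of_variation(heart_rates_bpm)    # about 0.20

print(round(v_height, 2), round(v_hr, 2))
```

Because V has no units, the two values can be compared directly: the heights are relatively more homogeneous than the heart rates.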

Descriptive Statistics II

• What is the “muddiest” thing you learned today?

Descriptive Statistics II

REVIEW

Variability

Range

Variance: spread of scores based on the squared deviation of each score from the mean; most stable measure

Standard deviation: most commonly used measure

Coefficient of variation

Relative variability around the mean (homogeneity of scores)

Helps more fully describe relative variability of different data sets

50 ± 10

What does this tell you?

Standard Scores

Z or T

Set of observations standardized around a given M and standard deviation

$Z = \frac{X - M}{S}$, where M and S are the mean and standard deviation of the scores in the group

• Converting scores to Z scores expresses a score’s distance from its own mean in sd units

• Use of standard scores: determine composite scores from different measures (e.g., basketball: shooting, dribbling); weight?

• Z-score

M=0, s=1

Standard Scores

T-score

T = 50 + 10 * (Z)

M=50, s=10

Percentile p = 50 + Z (%ile)

$Z = \frac{X - M}{S}$

$T = 50 + 10 \left( \frac{X - M}{S} \right)$

p = 50 + Z (%ile)

$X = M + Z(S)$

Conversion to Standard Scores

Raw score (X)    X − M    Z
3                 −1      −.5
7                  3      1.5
4                  0       0
5                  1       .5
1                 −3     −1.5

Mean: 4

St. Dev: 2

$Z = \frac{X - M}{S}$
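The conversion can be written out in a few lines. The sketch below takes the mean (4) and standard deviation (2) from the slide and also applies T = 50 + 10 * (Z) from the T-score slide; the variable names are illustrative.

```python
raw_scores = [3, 7, 4, 5, 1]
mean = 4       # from the slide
st_dev = 2     # from the slide

z_scores = [(x - mean) / st_dev for x in raw_scores]   # Z = (X - M) / S
t_scores = [50 + 10 * z for z in z_scores]             # T = 50 + 10 * Z

for x, z, t in zip(raw_scores, z_scores, t_scores):
    print(x, z, t)
# 3 -0.5 45.0
# 7 1.5 65.0
# 4 0.0 50.0
# 5 0.5 55.0
# 1 -1.5 35.0
```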

SO WHAT?

You have a Z score but what do you do with it? What does it tell you?

Allows the comparison of scores using different scales to compare “apples to apples”

Normal distribution of scores

Figure 3.6


Descriptive Statistics II

REVIEW

Standard Scores

• Converting scores to Z scores expresses a score’s distance from its own mean in sd units

• Value?

Coefficient of variation

Relative variability around the mean (homogeneity of scores)

Helps more fully describe relative variability of different data sets

100 ± 20

What does this tell you?

Between what values do 95% of the scores in this data set fall?

Normal-curve Areas

Table 3.4

• Z scores are on the left and across the top

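Z = 1.64: 1.6 on the left, .04 across the top = 44.95

Since 1.64 is positive, add 44.95 to 50 (the mean) to get the 95th percentile

Values in the body of the table are the percentage of scores between the mean and a given standard deviation distance

• Half of the scores fall below the mean, so add the table value to 50 if Z is positive and subtract it from 50 if Z is negative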

• The "reference point" is the mean

+Z=better than the mean

• -Z=worse than the mean
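The table lookup can be mimicked in code, which is handy for checking practice answers. The sketch below uses `statistics.NormalDist` in place of Table 3.4; the function names are illustrative and not part of the course materials.

```python
from statistics import NormalDist

std_normal = NormalDist()   # standard normal curve: mean 0, standard deviation 1

def area_mean_to_z(z):
    """Percent of scores between the mean and z (the body of the table)."""
    return 100 * abs(std_normal.cdf(z) - 0.5)

def percentile(z):
    """Add the table value to 50 for a positive z; subtract it for a negative z."""
    area = area_mean_to_z(z)
    return 50 + area if z >= 0 else 50 - area

print(round(area_mean_to_z(1.64), 2))   # 44.95
print(round(percentile(1.64), 2))       # 94.95, i.e. about the 95th percentile
print(round(percentile(-1.64), 2))      # 5.05
```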

p. 51

Area of normal curve between 1 and 1.5 std dev above the mean

Figure 3.7

Normal curve practice

• Z score Z = (X − M) / S

• T score T = 50 + 10 * (Z)

• Percentile P = 50 + Z percentile (+Z: add to 50; −Z: subtract from 50)

• Raw score X = M + Z * (S)

Hints

Draw a picture

• What is the z score?

• Can the z table help?

Assume M=700, S=100

Percentile: 64

T score: 53.7, 43

z score: .37, –1.23

17, 68, 68, .57

Raw score: 737, 618, 835
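The practice conversions can be checked with a few helper functions. The sketch below assumes M = 700 and S = 100 as stated on the slide and works through only the raw score 737; the function names are illustrative. The percentile comes from the normal curve's cumulative area, which is equivalent to adding the table value to 50 (or subtracting it for a negative z).

```python
from statistics import NormalDist

M, S = 700, 100   # mean and standard deviation given on the slide

def z_score(x):
    return (x - M) / S                  # Z = (X - M) / S

def t_score(z):
    return 50 + 10 * z                  # T = 50 + 10 * Z

def percentile(z):
    return 100 * NormalDist().cdf(z)    # percent of scores at or below z

def raw_score(z):
    return M + z * S                    # X = M + Z * S

z = z_score(737)
print(round(z, 2), round(t_score(z), 1), round(percentile(z)))   # 0.37 53.7 64
print(round(raw_score(0.37)))                                    # 737
```

The same four functions cover every direction of the exercise: start from whichever value a row provides, convert it to z, then convert z to the remaining columns.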

Descriptive Statistics III

Explain one thing that you learned today to a classmate

• What is the “muddiest” thing you learned today?
