STA201 Intermediate Statistics Formula Sheet

Luc Hens

Vesalius College

25 August 2015

Scientific notation, resetting the calculator

Calculators and statistical software often use scientific notation to display numbers. The letter E (or e) stands for exponent of 10:

2.3E+04 $= 2.3 \times 10^{4} = 2.3 \times 10\,000 = 23\,000$

2.3E-04 $= 2.3 \times 10^{-4} = 2.3 \times \frac{1}{10\,000} = 0.00023$
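Statistical software reads and writes the same notation. A minimal Python sketch (the variable names are illustrative, not part of the course material):

```python
# Python understands E-notation when converting text to numbers.
big = float("2.3E+04")    # 2.3 x 10^4
small = float("2.3E-04")  # 2.3 x 10^-4

print(big)
print(small)

# Formatting an ordinary number in E-notation with one decimal.
print(f"{23000:.1E}")
```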

Sometimes the TI-84 returns an error message. If nothing else works, reset the memory: Mem → Reset → Reset RAM. Resetting erases the lists you stored.

Descriptive statistics

$$\text{density} = \frac{\text{relative frequency (\%)}}{\text{width of class interval}}$$

$$\text{average of a list of numbers} = \frac{\text{sum of numbers in list}}{\text{how many numbers there are in the list}}$$

deviation from average = measurement − average

The standard deviation (SD) is the quadratic average (root-mean-square) of the deviations from the average:

$$\text{SD} = \sqrt{\frac{\text{sum of (deviations)}^2}{\text{number of measurements}}}$$

Shortcut formula for SD of a 0-1 list (p. 298):

$$\text{SD of a 0-1 list} = \sqrt{\text{fraction of ones} \times \text{fraction of zeroes}}$$

standard score (p. 79) = how many SDs a value is above (+) or below (−) the average:

$$\text{standard score of value} = \frac{\text{value} - \text{average}}{\text{SD}}$$
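The descriptive formulas above can be sketched in a few lines of Python. The function names and the two small lists are made up for illustration; the SD here divides by the number of measurements, as in the formula above.

```python
import math

def average(numbers):
    """Sum of the numbers divided by how many there are."""
    return sum(numbers) / len(numbers)

def sd(numbers):
    """Root-mean-square of the deviations from the average."""
    ave = average(numbers)
    deviations = [x - ave for x in numbers]
    return math.sqrt(sum(d ** 2 for d in deviations) / len(numbers))

def sd_zero_one(numbers):
    """Shortcut for a list containing only 0s and 1s."""
    fraction_of_ones = sum(1 for x in numbers if x == 1) / len(numbers)
    fraction_of_zeroes = 1 - fraction_of_ones
    return math.sqrt(fraction_of_ones * fraction_of_zeroes)

def standard_score(value, numbers):
    """How many SDs the value lies above (+) or below (-) the average."""
    return (value - average(numbers)) / sd(numbers)

data = [1, 3, 4, 5, 7]  # hypothetical measurements
box = [0, 0, 0, 1, 1]   # hypothetical 0-1 list

print(average(data), sd(data), standard_score(7, data))
print(sd(box), sd_zero_one(box))  # the shortcut agrees with the full formula
```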

Page numbers refer to Freedman, D., Pisani, R., & Purves, R. (2007). Statistics (4th edition). New York: Norton. STA201 Intermediate Statistics is equivalent to STA301 Statistics for Business and Economics.

Descriptive statistics with the TI-84:

Enter data in a list (e.g., L1): STAT → EDIT

STAT → CALC → 1-Var Stats L1

x̄ = ave, Sx = SD⁺, σx = SD, n = no. of measurements, Q1 = 25th percentile, Med = median, Q3 = 75th percentile
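For readers working in software rather than on the calculator, Python's standard `statistics` module offers rough counterparts to the 1-Var Stats output. This is a sketch with a made-up list. Quartiles are left out on purpose, because software packages use differing percentile conventions that may not match the calculator.

```python
import statistics

data = [1, 3, 4, 5, 7]  # hypothetical list, standing in for L1

ave = statistics.mean(data)       # corresponds to x-bar
sd_plus = statistics.stdev(data)  # corresponds to Sx (divides by n - 1)
sd = statistics.pstdev(data)      # corresponds to sigma-x (divides by n)
n = len(data)                     # number of measurements
med = statistics.median(data)     # corresponds to Med

print(ave, sd_plus, sd, n, med)
```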

Sampling distributions

E(sum of draws) = (number of draws) × (average of box)

$$\text{SE(sum of draws)} = \sqrt{\text{number of draws}} \times (\text{SD of the box})$$

When the sample is reasonably large:

$$\text{SE(sample percentage)} \approx \frac{\text{SD of sample}}{\sqrt{\text{sample size}}} \times 100\%$$

(compute SD of the sample using the shortcut formula for 0-1 lists)

$$\text{SE(sample average)} \approx \frac{\text{SD of sample}}{\sqrt{\text{sample size}}}$$
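The three standard-error formulas above translate directly into code. The sketch below uses hypothetical function names and made-up inputs; the sample percentage is passed as a fraction (0.5 for 50%) so that the 0-1 shortcut can be applied.

```python
import math

def se_sum(number_of_draws, sd_of_box):
    """SE for the sum of draws from a box."""
    return math.sqrt(number_of_draws) * sd_of_box

def se_sample_percentage(fraction_of_ones, sample_size):
    """SE for a sample percentage, in percentage points."""
    sd_of_sample = math.sqrt(fraction_of_ones * (1 - fraction_of_ones))
    return sd_of_sample / math.sqrt(sample_size) * 100

def se_sample_average(sd_of_sample, sample_size):
    """SE for a sample average."""
    return sd_of_sample / math.sqrt(sample_size)

print(se_sum(100, 2))                  # 100 draws from a box with SD 2
print(se_sample_percentage(0.5, 400))  # 50% ones in a sample of 400
print(se_sample_average(10, 100))      # sample SD 10, sample of 100
```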

Confidence intervals

(only valid when the sample is large and random)

95% confidence interval for a parameter (general formula): point estimator ± 2 · (SE for estimator)

95% confidence interval for a population percentage:

$$\text{sample percentage} \pm 2 \cdot \frac{\text{SD of sample}}{\sqrt{\text{sample size}}} \times 100\%$$

(compute SD of the sample using the shortcut formula for 0-1 lists)

TI-84: STAT → TESTS → 1-PropZInt; x = number of times the event occurs = sample percentage × sample size. To get percentages, multiply the boundaries of the obtained confidence interval by 100%.

95% confidence interval for a population average:

$$\text{sample average} \pm 2 \cdot \frac{\text{SD of sample}}{\sqrt{\text{sample size}}}$$

TI-84: STAT → TESTS → ZInterval (x̄ = sample ave, σ = SD, n = sample size)
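As an illustration of the two interval formulas above, here is a minimal Python sketch. The function names and the input numbers are assumptions chosen for the example, and the multiplier 2 is the rounded value used on this sheet.

```python
import math

def ci_percentage(fraction_of_ones, sample_size):
    """Approximate 95% interval for a population percentage (in %)."""
    sd_of_sample = math.sqrt(fraction_of_ones * (1 - fraction_of_ones))
    se = sd_of_sample / math.sqrt(sample_size) * 100
    percentage = fraction_of_ones * 100
    return percentage - 2 * se, percentage + 2 * se

def ci_average(sample_average, sd_of_sample, sample_size):
    """Approximate 95% interval for a population average."""
    se = sd_of_sample / math.sqrt(sample_size)
    return sample_average - 2 * se, sample_average + 2 * se

print(ci_percentage(0.5, 400))  # 50% ones in a sample of 400
print(ci_average(50, 10, 100))  # average 50, SD 10, sample of 100
```

Calculator interval functions typically use a multiplier of about 1.96 rather than 2, so their boundaries can be slightly narrower than the ones computed here.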

Hypothesis tests for an average or a percentage

The test statistic says how many SEs away an observed value is from its expected value, where the expected value is calculated using the null hypothesis:

$$\text{test statistic} = \frac{\text{estimator} - \text{hypothetical value}}{\text{SE for estimator}}$$

Large random sample: z-test

For large random samples, the test statistic approximately follows the standard normal curve; in that case the test statistic is written as z. To compute the observed significance level (P-value) use the normal curve.

Area under standard normal curve with the TI-84:

DISTR → normalcdf(lower boundary, upper boundary)

e.g., area under standard normal curve to right of 1:

DISTR → normalcdf(1, 10^99)

z-test with TI-84: STAT → TESTS → Z-Test

µ0 = value of ave in null hypothesis (= 50 on p. 477), x̄ = sample ave, σ = SD, n = sample size. Select the appropriate alternative hypothesis: average of box is different from 50 (≠ µ0), less than 50 (< µ0), more than 50 (> µ0).
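The z-test steps above can be mirrored in software. The sketch below uses `NormalDist` from Python's standard library in place of normalcdf; the function names and the sample numbers are hypothetical.

```python
from statistics import NormalDist

def z_statistic(estimate, hypothetical_value, se):
    """How many SEs the estimate lies from its hypothetical value."""
    return (estimate - hypothetical_value) / se

def area_right(z):
    """Area under the standard normal curve to the right of z."""
    return 1 - NormalDist().cdf(z)

def area_left(z):
    """Area under the standard normal curve to the left of z."""
    return NormalDist().cdf(z)

# Counterpart of normalcdf(1, 10^99): area to the right of 1.
print(area_right(1))

# Hypothetical test: sample average 48, null value 50, SE 1.
z = z_statistic(48, 50, 1)
print(z, area_left(z))               # one-sided alternative "less than 50"
print(z, 2 * area_right(abs(z)))     # two-sided alternative "different from 50"
```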

Small random sample: t-test

For small random samples:

– use SD⁺ instead of SD to compute the SE of the estimator:

$$\text{SD}^{+} = \sqrt{\frac{\text{sample size}}{\text{sample size} - 1}} \times \text{SD}$$

– when the histogram of the population doesn’t look too different from the normal curve, the test statistic approximately follows Student’s curve; in that case the test statistic is written as t. To compute the p-value for a test on one average, use Student’s curve with degrees of freedom = sample size − 1

Area under Student’s curve with the TI-84:

DISTR → tcdf(lower boundary, upper boundary, degrees of freedom)

e.g., area under Student’s curve with 4 degrees of freedom to right of 2.2:

DISTR → tcdf(2.2, 10^99, 4)

t-test with TI-84: STAT → TESTS → T-Test; as in Z-Test, with Sx = SD⁺; SD⁺ is Sx obtained in STAT → CALC → 1-Var Stats L1.
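Python's standard library has no ready-made counterpart of tcdf, so the sketch below approximates the area under Student's curve by numerical integration (Simpson's rule) of the curve's density. This is one possible approach under that assumption, not the calculator's own method; the function names are hypothetical, and a statistics package would normally be used instead.

```python
import math

def sd_plus(sd, sample_size):
    """Convert SD to SD+ for a small sample."""
    return math.sqrt(sample_size / (sample_size - 1)) * sd

def student_density(x, df):
    """Height of Student's curve with df degrees of freedom at x."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def area_right_student(t, df, steps=2000):
    """Area to the right of t: 0.5 minus the area between 0 and t.

    The area between 0 and t is approximated with Simpson's rule,
    which requires an even number of steps.
    """
    h = t / steps
    total = student_density(0, df) + student_density(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * student_density(i * h, df)
    return 0.5 - total * h / 3

# Counterpart of tcdf(2.2, 10^99, 4).
print(area_right_student(2.2, 4))

# SD+ for a hypothetical sample of 5 measurements with SD 2.
print(sd_plus(2.0, 5))
```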

Hypothesis test for a difference (pp. 503–506)

$$\text{test statistic} = \frac{\text{observed difference} - \text{hypothetical value of difference}}{\text{SE for difference}}$$

SE for the difference of two independent quantities (p. 502):

$$\text{SE for difference} = \sqrt{(\text{SE for first quantity})^2 + (\text{SE for second quantity})^2}$$
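A short Python sketch of the two formulas above, with hypothetical function names and made-up standard errors for two independent samples:

```python
import math

def se_difference(se_first, se_second):
    """SE for the difference of two independent quantities."""
    return math.sqrt(se_first ** 2 + se_second ** 2)

def difference_test_statistic(observed_difference, hypothetical_difference, se):
    """How many SEs the observed difference lies from its hypothetical value."""
    return (observed_difference - hypothetical_difference) / se

se = se_difference(3, 4)  # hypothetical SEs of the two sample averages
print(se)
print(difference_test_statistic(10, 0, se))  # null hypothesis: no difference
```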

Correlation, Regression

Correlation coefficient (p. 132). Convert each variable to standard units. The average of the products gives the correlation coefficient:

r = average of [(x in standard units) × (y in standard units)]

Simple regression: line of best fit (p. 204):

$$\text{slope} = r \times \frac{\text{SD of } y}{\text{SD of } x}$$

y-intercept = (ave of y) − slope × (ave of x)
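The correlation and regression formulas above can be checked on a small data set. The sketch below follows the formulas literally (standard units, then the average of the products); the function names and the five data points are made up for illustration.

```python
import math

def average(numbers):
    return sum(numbers) / len(numbers)

def sd(numbers):
    """SD as root-mean-square of deviations (divides by n)."""
    ave = average(numbers)
    return math.sqrt(sum((x - ave) ** 2 for x in numbers) / len(numbers))

def correlation(xs, ys):
    """Average of the products of x and y in standard units."""
    ave_x, ave_y, sd_x, sd_y = average(xs), average(ys), sd(xs), sd(ys)
    products = [((x - ave_x) / sd_x) * ((y - ave_y) / sd_y) for x, y in zip(xs, ys)]
    return average(products)

def regression_line(xs, ys):
    """Slope and intercept of the line of best fit."""
    slope = correlation(xs, ys) * sd(ys) / sd(xs)
    intercept = average(ys) - slope * average(xs)
    return slope, intercept

xs = [1, 2, 3, 4, 5]  # hypothetical x values
ys = [2, 4, 5, 4, 5]  # hypothetical y values

r = correlation(xs, ys)
slope, intercept = regression_line(xs, ys)
print(r, slope, intercept)
```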

Correlation and simple regression with the TI-84:

Enter data in two lists (e.g., x in L1, y in L2): STAT → EDIT

CATALOG → DiagnosticOn → ENTER → ENTER

STAT → CALC → LinReg(ax+b) L1, L2 (x list first, y list second; a = slope, b = intercept)

confidence interval for slope: STAT → TESTS → LinRegTInt

test for H0 that slope = 0: STAT → TESTS → LinRegTTest

Multiple regression:

Statistical software provides a table with the estimated coefficients and SEs, and the test statistics and p-values for the null hypothesis (against a two-sided alternative) that the coefficients of the population regression function are equal to zero.

Chi-square test (pp. 525–506)

$$\text{test statistic} = \chi^2 = \text{sum of } \frac{(\text{observed frequency} - \text{hypothetical frequency})^2}{\text{hypothetical frequency}}$$

The p-value is approximately equal to the area to the right of the observed value for the χ² test statistic, under the χ²-curve with the appropriate number of degrees of freedom. When the model is fully specified (no parameters to estimate):

degrees of freedom = (number of terms in χ²) − 1

Area under χ²-curve with the TI-84:

DISTR → χ²cdf(lower boundary, upper boundary, degrees of freedom)

e.g., area under χ²-curve with 5 degrees of freedom to the right of 14.2:

DISTR → χ²cdf(14.2, 10^99, 5)
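The chi-square calculation above can be sketched in Python. The standard library has no counterpart of χ²cdf, so the area is computed here from a series expansion of the regularized incomplete gamma function; this is an assumption about how one might do it without a statistics package, not the calculator's method. The function names are hypothetical, and the six counts are illustrative values chosen so that the statistic equals the 14.2 used in the example above.

```python
import math

def chi_square_statistic(observed, hypothetical):
    """Sum of (observed - hypothetical)^2 / hypothetical over all terms."""
    return sum((o - h) ** 2 / h for o, h in zip(observed, hypothetical))

def area_right_chi_square(x, df, terms=200):
    """Area under the chi-square curve with df degrees of freedom, right of x.

    Uses the series P(a, z) = e^(-z) z^a * sum of z^n / Gamma(a + n + 1)
    with a = df / 2 and z = x / 2, and returns 1 - P(a, z).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = math.exp(-z + a * math.log(z)) / math.gamma(a + 1)
    total = term
    for n in range(1, terms):
        term *= z / (a + n)
        total += term
    return 1 - total

observed = [4, 6, 17, 16, 8, 9]  # illustrative counts for six categories
hypothetical = [10] * 6          # equal frequencies under the null hypothesis

chi2 = chi_square_statistic(observed, hypothetical)
df = len(observed) - 1           # fully specified model: terms minus 1
print(chi2, df, area_right_chi_square(chi2, df))
```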
