Math 10 - Statistics Winter 2021 Summary of Material
Frequency Tables and Relative Frequency Tables
Relative Frequency = $\frac{\text{frequency of outcome}}{\text{number of observations}}$
Lower Class Limit = lowest data value allowed in a class
Upper Class Limit = highest data value allowed in a class
Lower Class Boundary = lower class limit – 0.5
Upper Class Boundary = upper class limit + 0.5
Class Width is the difference between consecutive lower class limits.
To calculate the class width:
Class Width = $\frac{\text{largest data value} - \text{smallest data value}}{\text{desired number of classes}}$; if the result is an integer, add 1; otherwise, round up to the next integer.
To calculate the class midpoint:
Class midpoint = $\frac{\text{lower class limit} + \text{upper class limit}}{2}$
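The class-width and frequency-table rules above can be sketched in Python. This is a minimal illustration that assumes integer-valued data (so the class boundaries are the limits ∓ 0.5) and computes each midpoint as the average of a class's lower and upper limits; the function names `class_width` and `frequency_table` and the sample data are hypothetical.

```python
import math

def class_width(data, num_classes):
    """(largest - smallest) / classes; add 1 if that is an integer, else round up."""
    q = (max(data) - min(data)) / num_classes
    return int(q) + 1 if q.is_integer() else math.ceil(q)

def frequency_table(data, num_classes):
    """Rows of (lower limit, upper limit, lower boundary, upper boundary,
    midpoint, frequency, relative frequency). Assumes integer data."""
    width = class_width(data, num_classes)
    rows = []
    lower = min(data)
    for _ in range(num_classes):
        upper = lower + width - 1                      # highest value allowed in the class
        freq = sum(lower <= x <= upper for x in data)  # count values in [lower, upper]
        rows.append((lower, upper, lower - 0.5, upper + 0.5,
                     (lower + upper) / 2, freq, freq / len(data)))
        lower += width                                 # next class starts one width later
    return rows

data = [2, 3, 5, 7, 8, 8, 10, 12, 15, 20]
for row in frequency_table(data, 4):
    print(row)
```

With four classes the quotient is 18 / 4 = 4.5, so the width rounds up to 5 and the classes are 2-6, 7-11, 12-16 and 17-21.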
Frequency Distributions can be: Symmetric, Uniform, Skewed left, Skewed right, Unimodal, Bimodal
Sample Data: [ n = sample size ]
Sample Mean: $\bar{x} = \frac{\sum x}{n}$ (round to one more decimal place than the data)
Sample Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ or $s^2 = \frac{1}{n - 1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]$
Standard Deviation: $s = \sqrt{s^2}$
Population Data: [ N = population size ]
Population Mean: $\mu = \frac{\sum x}{N}$ (round to one more decimal place than the data)
Population Variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Standard Deviation: $\sigma = \sqrt{\sigma^2}$
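A short Python check of the sample and population formulas above on a small hypothetical data set. It compares the definitional and shortcut forms of $s^2$ and cross-checks them against the standard library's `statistics.variance` (which divides by n − 1) and `statistics.pvariance` (which divides by N); the helper function names are hypothetical.

```python
import math
import statistics

def sample_variance(xs):
    """s^2 = sum((x - xbar)^2) / (n - 1)"""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """s^2 = [sum(x^2) - (sum x)^2 / n] / (n - 1)"""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

def population_variance(xs):
    """sigma^2 = sum((x - mu)^2) / N"""
    N = len(xs)
    mu = sum(xs) / N
    return sum((x - mu) ** 2 for x in xs) / N

data = [2, 4, 4, 4, 5, 5, 7, 9]
s2 = sample_variance(data)
print(s2, sample_variance_shortcut(data), statistics.variance(data))
print(math.sqrt(s2))                     # sample standard deviation s
sigma2 = population_variance(data)
print(sigma2, math.sqrt(sigma2))         # population variance and sigma
```

Treated as a sample, these eight values give $s^2 = 32/7 \approx 4.571$; treated as a whole population, $\sigma^2 = 32/8 = 4$ and $\sigma = 2$.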
Lauri Papay March 12, 2021
Measures of position
5-Number Summary: (smallest data value, Q1, Q2, Q3, largest data value)
pth Percentile: the data value for which p% of the values lie to its left and (100 − p)% lie to its right.
Q1 = 25th Percentile,
Q2 = 50th Percentile = median,
Q3 = 75th Percentile
To find the pth percentile in a sorted list of n elements:
1) Multiply n · p (with p in decimal form).
2) If n · p is an integer i,
then the pth percentile is the average of the ith and (i + 1)st elements in the list.
If n · p is not an integer,
then round it up to the next integer; that number is the position of the pth percentile.
Data Range: largest data value − smallest data value
Interquartile Range (IQR): Q3 – Q1
Lower Outlier Boundary = Q1 – (1.5)IQR
Upper Outlier Boundary = Q3 + (1.5)IQR
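The percentile rule, 5-number summary, and outlier boundaries above, sketched in Python. As an assumption made only for this sketch, `p` is passed as a whole-number percent (25 for Q1) so that the "is n · p an integer?" test uses exact integer arithmetic instead of floating point; the function names and data are hypothetical. Calculators and statistics packages use several different percentile conventions, so their quartiles can differ slightly from this rule.

```python
def percentile(sorted_data, p):
    """p-th percentile (p a whole-number percent, 0 < p < 100) by the n*p rule."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    n = len(sorted_data)
    scaled = n * p                  # n * p, scaled by 100 to stay in integers
    if scaled % 100 == 0:           # n * p is an integer i
        i = scaled // 100
        # average of the i-th and (i+1)-st values (1-based positions)
        return (sorted_data[i - 1] + sorted_data[i]) / 2
    position = scaled // 100 + 1    # round n * p up to the next integer
    return sorted_data[position - 1]

def five_number_summary(data):
    s = sorted(data)
    return (s[0], percentile(s, 25), percentile(s, 50), percentile(s, 75), s[-1])

def outliers(data):
    """Values outside Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]

data = [7, 1, 4, 20, 5, 3, 8, 6, 4, 7]
print(five_number_summary(data))  # (1, 4, 5.5, 7, 20)
print(outliers(data))             # [20]
```

Here IQR = 7 − 4 = 3, so the outlier boundaries are −0.5 and 11.5, and only 20 falls outside them.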
Empirical Rule for Symmetric, Mound-Shaped Distributions:
Empirical guidelines for a symmetric distribution with mean $\mu$ and standard deviation $\sigma$:
Approximately 68% of the data lies within $\mu \pm \sigma$
Approximately 95% of the data lies within $\mu \pm 2\sigma$
Almost all of the data lies within $\mu \pm 3\sigma$
Z-Score: $z = \frac{x - \mu}{\sigma}$ = the number of standard deviations the data value $x$ is from the mean.
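The z-score formula and the empirical-rule percentages can be checked against exact normal-curve areas with Python's `statistics.NormalDist` (Python 3.8+). The rule's 68/95/99.7 figures are rounded versions of these areas and apply only to roughly bell-shaped data; the function name `z_score` is hypothetical.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

print(z_score(130, 100, 15))   # 2.0: 130 is two standard deviations above 100

std_normal = NormalDist(mu=0, sigma=1)
for k in (1, 2, 3):
    share = std_normal.cdf(k) - std_normal.cdf(-k)   # area within mu +/- k*sigma
    print(f"within {k} sd: {share:.4f}")
```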
Correlation and Regression:
x = Explanatory Variable
y = Response Variable
To measure the strength and direction of a linear relationship in a set of (x, y) data, we calculate the correlation coefficient.
Correlation Coefficient: $r = \frac{1}{n - 1}\sum\left(\frac{x - \bar{x}}{s_x}\right)\left(\frac{y - \bar{y}}{s_y}\right)$
If ๐‘Ÿ is closer to 1, it defines a positive linear relationship
If ๐‘Ÿ is closer to 0, there is no linear relationship
If ๐‘Ÿ is closer to -1, it defines a negative linear relationship
ฬ‚ = ๐‘0 + ๐‘1 ๐‘ฅ ;
Least squares linear regression line: ๐‘ฆ
๐‘1 = ๐‘Ÿ
๐‘ ๐‘ฆ
๐‘ ๐‘ฅ
; ๐‘0 = ๐‘ฆฬ… + ๐‘1 ๐‘ฅฬ…
Coefficient of determination = ๐‘Ÿ 2 = fractional amount of the total variation in y that can be
explained by using the linear model ๐‘ฆฬ‚ = ๐‘Ž + ๐‘๐‘ฅ.
Total deviation in y: ๐‘ฆ − ๐‘ฆฬ…; Explained Variation: ๐‘ฆฬ‚ − ๐‘ฆฬ…; Unexplained Variation: ๐‘ฆ − ๐‘ฆฬ‚
Residual: ๐‘ฆ − ๐‘ฆฬ‚ for each (x, y) pair
Influential Point: any point that strongly affects the least squares regression line
Interpolation: The use of the least squares regression line to predict the output of a data point
within the range of known data points
Extrapolation: The use of the least squares regression line to predict the output of a data point
that is outside of the range of known data points
Correlation versus causation: even when two variables are highly correlated, changing the value of one of them will not necessarily cause a change in the other.
Residual plot. When a residual plot has no apparent pattern, a linear model is appropriate.
Probabilities:
Sample space = S = all possible outcomes of an experiment.
If S contains equally likely outcomes, and A is a subset of S, then $P(A) = \frac{n(A)}{n(S)}$
Event A = a subset of the sample space.
Event $A^c$ = the subset of elements in the sample space that are not in A (the complement of A).
P(A) = probability that an outcome will be in A.
$P(A^c)$ = probability that the outcome of an experiment will not be in event A.
$P(A) + P(A^c) = 1$
Conditional Probability: "the probability of A given B," denoted $P(A \mid B)$
$P(A \mid B) = \frac{P(A \text{ AND } B)}{P(B)}$, $P(B) \neq 0$ (probability of A occurring given that B has occurred)
Multiplication Rule: $P(A \mid B)\,P(B) = P(A \text{ AND } B)$
A and B are independent if $P(A \mid B) = P(A)$, which is equivalent to $P(A \text{ AND } B) = P(A) \cdot P(B)$
A and B are mutually exclusive if $P(A \text{ AND } B) = 0$
Addition Rules:
$P(A \text{ OR } B) = P(A) + P(B) - P(A \text{ AND } B)$ [for all events A and B]
$P(A \text{ OR } B) = P(A) + P(B)$ [if A and B are mutually exclusive events]
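The probability rules above can be verified on a small, equally likely sample space using exact fractions. The example (one roll of a fair die, with hypothetical events A = "even", B = "greater than 3", and C = "rolled a 1") is an illustration, not part of the original notes. Note that in Python `A | B` is set union (OR) and `A & B` is intersection (AND), not conditional probability.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}      # sample space: one roll of a fair die
A = {2, 4, 6}               # event A: roll is even
B = {4, 5, 6}               # event B: roll is greater than 3
C = {1}                     # event C: roll is 1 (no overlap with A)

def P(event):
    """P(E) = n(E) / n(S) for equally likely outcomes."""
    return Fraction(len(event & S), len(S))

p_a_given_b = P(A & B) / P(B)                   # P(A | B) = P(A AND B) / P(B)
print(p_a_given_b)                              # 2/3
print(P(A) + P(S - A))                          # complement rule: 1
print(P(A | B) == P(A) + P(B) - P(A & B))       # general addition rule: True
print(P(A & B) == P(A) * P(B))                  # independent? False
print(P(A & C) == 0, P(A | C) == P(A) + P(C))   # mutually exclusive: True True
```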
RANDOM VARIABLES
Discrete Random Variable Properties:
Properties of a probability distribution f(x) that describes the distribution of a discrete random variable X:
1) $0 \le f(x) \le 1$ for all x
2) $\sum f(x) = 1$
Expected Value = $E[X] = \mu = \sum x f(x)$ [ f(x) = probability that X = x ]
Variance = $\mathrm{Var}(X) = \sigma^2 = \sum (x - \mu)^2 f(x)$
Standard Deviation $\sigma = \sqrt{\mathrm{Var}(X)}$
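Expected value, variance, and standard deviation of a discrete random variable, computed straight from the formulas above. The distribution (X = number of heads in two fair coin tosses) is a hypothetical example.

```python
import math

f = {0: 0.25, 1: 0.50, 2: 0.25}    # f(x) = P(X = x)

# Check the two properties of a discrete probability distribution
assert all(0 <= px <= 1 for px in f.values())
assert math.isclose(sum(f.values()), 1)

mu = sum(x * px for x, px in f.items())                  # E[X]
var = sum((x - mu) ** 2 * px for x, px in f.items())     # Var(X)
sigma = math.sqrt(var)
print(mu, var, round(sigma, 4))   # 1.0 0.5 0.7071
```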
Binomial Random Variables
A binomial trial is an activity for which
1) There are only 2 possible outcomes labeled Success and Failure.
2) The probabilities of Success = p, and Failure = 1-p, remain constant.
3) The trials are independent.
Binomial Random Variable Properties:
A binomial random variable X is one that is assigned the number of successes from n binomial
trials. X takes on the values 0, 1, …, n which are all possible numbers of success for n trials.
๐‘ƒ[ ๐‘ฅ ] = Probability that there are exactly ๐‘ฅ successes out of ๐‘› trials
๐‘ƒ[ ๐‘ฅ] = ๐‘›๐‘ช๐‘ฅ ๐‘ ๐‘ฅ ๐‘ž ๐‘›−๐‘ฅ
for any ๐‘ฅ = 0, 1, …, n
p = P[success] , 1-p = P[ failure]
n = number of trials, p = P[success] , (1-p) = P[ failure]
Expected Value = E[X] = ๏ญ = np,
Variance = Var(X) = ๏ณ2 = np(1-p)
Standard Deviation ๏ณ = √๐‘›๐‘(1 − ๐‘)
๐‘ƒ[ ๐‘Ž ≤ ๐‘ฅ ≤ ๐‘] = Probability that ๐‘ฅ is between a and b inclusive.
๐‘ƒ[ ๐‘Ž < ๐‘ฅ < ๐‘] = Probability ๐‘ฅ is between a and b, not including a or b.
On the TI-84 Plus Silver Edition (and possibly other similar models):
binompdf(n, p, x) = probability of exactly x successes;
binomcdf(n, p, x) = probability of at most x successes
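Python stand-ins for the binomial calculator commands, written from the formula $P[x] = {}_nC_x\,p^x q^{n-x}$ with `math.comb` (Python 3.8+). The function names mirror the calculator commands only for readability; they are hypothetical helpers, not the calculator's implementation.

```python
from math import comb

def binompdf(n, p, x):
    """Probability of exactly x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomcdf(n, p, x):
    """Probability of at most x successes (0 when x < 0)."""
    return sum(binompdf(n, p, k) for k in range(x + 1))

def binom_between(n, p, a, b):
    """P[a <= X <= b], inclusive of both ends."""
    return binomcdf(n, p, b) - binomcdf(n, p, a - 1)

n, p = 10, 0.5
print(binompdf(n, p, 5))        # 0.24609375  (= 252/1024)
print(binomcdf(n, p, 2))        # 0.0546875   (= 56/1024)
mean = sum(k * binompdf(n, p, k) for k in range(n + 1))
print(mean, n * p)              # both equal np = 5
```

An exclusive probability $P[a < x < b]$ would be `binom_between(n, p, a + 1, b - 1)`.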
Continuous Random Variable Properties:
Properties of a probability distribution f(x) that describes the distribution of a continuous random variable X:
1) The total area under the curve f(x) is 1
2) $P[a \le x \le b]$ = the total area under the curve between a and b
3) $f(x) \ge 0$ for all x
Population Distribution: x - Distribution
Standard Normal Distribution:
A bell-shaped density distribution with mean $\mu = 0$ and standard deviation $\sigma = 1$.
Empirical guidelines for a symmetric, bell-shaped distribution state that approximately:
68% of the data lies within $\mu \pm \sigma$
95% of the data lies within $\mu \pm 2\sigma$
99.7% of the data lies within $\mu \pm 3\sigma$
Probabilities for a Standard Normal Population Distribution
$P(a \le x \le b)$ can be interpreted in two ways:
1) It is the proportion of data between the values a and b.
2) It is the probability that a randomly chosen data value will be between the values a and b.
Standardized variable z for a population distribution: $z = \frac{x - \mu}{\sigma}$
$P(a \le x \le b) = P\left(\frac{a - \mu}{\sigma} \le Z \le \frac{b - \mu}{\sigma}\right)$
On the TI-84 Plus Silver Edition (and possibly other similar models):
Let a and b represent data values from a distribution.
If working with z-scores:
normalcdf(za, zb, 0, 1) = the proportion of data between z-scores za and zb
If working with data values:
normalcdf(a, b, μ, σ) = the proportion of data between the data values a and b, where the mean is μ and the standard deviation is σ.
To find the z-score for which p% of the data lies to the left, use invNorm(p, 0, 1), entering p as a decimal.
To find the data value for which p% of the data lies to the left, use invNorm(p, μ, σ), entering p as a decimal.
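Standard-library equivalents of the calculator's normal-distribution commands, using `statistics.NormalDist.cdf` and `NormalDist.inv_cdf` (Python 3.8+). The names `normalcdf` and `invnorm` are hypothetical helpers chosen to echo the calculator commands; as on the calculator, the area passed to `invnorm` is a decimal (0.90, not 90).

```python
from statistics import NormalDist

def normalcdf(a, b, mu=0.0, sigma=1.0):
    """Proportion of a normal(mu, sigma) population between a and b."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(b) - dist.cdf(a)

def invnorm(area, mu=0.0, sigma=1.0):
    """Value with the given area (as a decimal) to its left."""
    return NormalDist(mu, sigma).inv_cdf(area)

# Data values vs. z-scores give the same proportion: 85..115 with mu=100, sigma=15
print(round(normalcdf(85, 115, 100, 15), 4), round(normalcdf(-1, 1), 4))
print(round(invnorm(0.975), 4))          # z with 97.5% of the area to its left
print(round(invnorm(0.90, 100, 15), 2))  # data value at the 90th percentile
```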
SAMPLING DISTRIBUTIONS
Sampling Distribution for the mean $\bar{x}$:
Let $x = \{x_1, x_2, \ldots, x_n\}$ be a simple random sample of the population X.
If the population X is not normally distributed, then n > 30 is required.
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$; $\mu_{\bar{x}} = \mu$
Probabilities given a sampling distribution for the mean (σ known):
Standard deviation of the sampling distribution: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ [n = sample size]
Standardized variable z for the sampling distribution: $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
$P(a \le \bar{x} \le b) = P\left(\frac{a - \mu}{\sigma/\sqrt{n}} \le Z \le \frac{b - \mu}{\sigma/\sqrt{n}}\right)$
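A sketch of the σ-known calculation for $P(a \le \bar{x} \le b)$: standardize each endpoint with $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ and take the difference of standard normal areas. The function name and the numbers in the example are hypothetical.

```python
import math
from statistics import NormalDist

def prob_sample_mean_between(a, b, mu, sigma, n):
    """P(a <= xbar <= b) when the population sigma is known."""
    se = sigma / math.sqrt(n)          # sigma_xbar, standard error of the mean
    z = NormalDist()                   # standard normal: mean 0, sd 1
    return z.cdf((b - mu) / se) - z.cdf((a - mu) / se)

# Population mean 100, sigma 15, samples of size 25: sigma_xbar = 15 / 5 = 3
print(round(prob_sample_mean_between(97, 103, 100, 15, 25), 4))   # 0.6827
```

Because 97 and 103 are exactly one standard error from μ, the result matches the 68% guideline.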
Probabilities given a sampling distribution for the mean (σ unknown):
Standard deviation of the sampling distribution: $s_{\bar{x}} = \frac{s}{\sqrt{n}}$ [n = sample size]
Standardized variable t for the sampling distribution: $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ [Use Student's t distribution, degrees of freedom (d.f.) = n − 1]
$P(a \le \bar{x} \le b) = P\left(\frac{a - \mu}{s/\sqrt{n}} \le t \le \frac{b - \mu}{s/\sqrt{n}}\right)$
Sampling Distribution for the proportion p:
p = population proportion = proportion of a population having some characteristic
n = size of sample
x = number in the sample that have the characteristic
$\hat{p} = \frac{x}{n}$ is the sample proportion
If $np \ge 10$ and $n(1 - p) \ge 10$, then $\hat{p}$ can be approximated by a normal random variable.
Probabilities for a sampling distribution of a proportion p:
$\mu_{\hat{p}} = p$; $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$; $z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}}$
$P(a \le \hat{p} \le b) = P\left(\frac{a - \mu_{\hat{p}}}{\sigma_{\hat{p}}} \le Z \le \frac{b - \mu_{\hat{p}}}{\sigma_{\hat{p}}}\right)$
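The same approach for a sample proportion, with a guard for the $np \ge 10$ and $n(1 - p) \ge 10$ condition stated above. The function name and example values are hypothetical.

```python
import math
from statistics import NormalDist

def prob_phat_between(a, b, p, n):
    """P(a <= p-hat <= b) using the normal approximation."""
    if n * p < 10 or n * (1 - p) < 10:
        raise ValueError("normal approximation needs np >= 10 and n(1-p) >= 10")
    mu = p                                  # mu_p-hat = p
    sd = math.sqrt(p * (1 - p) / n)         # sigma_p-hat
    z = NormalDist()
    return z.cdf((b - mu) / sd) - z.cdf((a - mu) / sd)

# p = 0.5, n = 100: sigma_p-hat = 0.05, so 0.45..0.55 is within one sigma_p-hat of p
print(round(prob_phat_between(0.45, 0.55, 0.5, 100), 4))   # 0.6827
```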
CONFIDENCE INTERVALS
Assumptions for Confidence Intervals for a mean μ
1. We have a simple random sample.
2. The sample size is large (n > 30), or the population is approximately normal.
Confidence Intervals for ๏ญ (๏ณ ๐ค๐ง๐จ๐ฐ๐ง): [ NOTE zc = ๐’›๐œถ ]
๐Ÿ
If ๐‘ฅฬ… is the point estimator for ๐œ‡, c is the confidence level, and n = sample size, then:
Margin of Error ๐ธ = ๐‘ง๐‘ โˆ™
Resulting in the C % confidence interval:
๐œŽ
√๐‘›
( ฬ…๐‘ฅ − ๐ธ , ๐‘ฅฬ… + ๐ธ )
Confidence Intervals for ๏ญ (๏ณ ๐ฎ๐ง๐ค๐ง๐จ๐ฐ๐ง): [ NOTE tc = ๐’•๐œถ ]
๐Ÿ
If ๐‘ฅฬ… is the point estimator for ๐œ‡, c is the confidence level, and n = sample size, then:
๐‘ 
Margin of Error ๐ธ = ๐‘ก๐‘ โˆ™ ๐‘›
√
[Use Student’s t Distribution d.f. = n -1]
( ฬ…๐‘ฅ − ๐ธ , ๐‘ฅฬ… + ๐ธ )
Resulting in the C % confidence interval:
Assumptions for Confidence Intervals for a proportion p
1. We have a simple random sample.
2. The population is at least 20 times as large as the sample.
3. The items in the population are divided into two categories.
4. The sample must contain at least 10 individuals in each category.
Confidence Intervals for a proportion p: [ NOTE $z_c = z_{\alpha/2}$ ]
If $\hat{p} = \frac{x}{n}$, then $E = z_c\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$ (c = level of confidence),
resulting in the c% confidence interval: $(\hat{p} - E,\ \hat{p} + E)$
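A sketch of the proportion interval above. It checks only the "at least 10 individuals in each category" assumption; random sampling and the population-size condition cannot be checked from the counts. The function name and the example counts are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_interval(x, n, confidence=0.95):
    """c% confidence interval for a population proportion p."""
    if x < 10 or n - x < 10:
        raise ValueError("need at least 10 individuals in each category")
    p_hat = x / n
    z_c = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    E = z_c * math.sqrt(p_hat * (1 - p_hat) / n)           # margin of error
    return p_hat - E, p_hat + E

low, high = proportion_interval(60, 100)
print(round(low, 3), round(high, 3))   # 0.504 0.696
```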
Assumptions for Confidence Intervals for a standard deviation σ
1. We have a simple random sample.
2. The population must have a normal distribution.
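Confidence Intervals for a standard deviation σ:
$\frac{(n - 1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}}$
where $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ are the chi-square critical values with n − 1 degrees of freedom that have areas α/2 and 1 − α/2 to their right.
Resulting in confidence interval: $\left(\sqrt{\frac{(n - 1)s^2}{\chi^2_{\alpha/2}}},\ \sqrt{\frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}}}\right)$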
HYPOTHESIS TESTING
If H1 contains the inequality symbol <, a left-tail test is performed
If H1 contains the inequality symbol >, a right-tail test is performed
If H1 contains the inequality symbol ≠, a two-tail test is performed
HYPOTHESIS TESTS FOR A MEAN
Assumptions for Hypothesis test of a mean μ
1. We have a simple random sample.
2. The sample size is large (n > 30), or the population is approximately normal.
HYPOTHESIS TEST (P-value Method): (Given significance level α)
1) State Null Hypothesis denoted H0 (usually status quo)
State Alternative Hypothesis denoted H1 (proposed change)
2) Calculate the test statistic for $\bar{x}$ (where μ is the value stated in H0):
$z_{\bar{x}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$; or $t_{\bar{x}} = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ (d.f. = n − 1)
3) Calculate the p-value using either the standard normal distribution or Student's t distribution.
4) Reject or fail to reject the null hypothesis based on the significance level α: if p-value $\le \alpha$, reject H0; if p-value $> \alpha$, fail to reject H0.
5) Interpret your conclusion
Summarize your decision within the context of the problem.
HYPOTHESIS TEST (Critical value Method): (Given significance level α)
1) State Null Hypothesis denoted H0 (usually status quo)
State Alternative Hypothesis denoted H1 (proposed change)
2) Calculate the test statistic for $\bar{x}$ (where μ is the value stated in H0):
$z_{\bar{x}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$; or $t_{\bar{x}} = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ (d.f. = n − 1)
3) Find the critical value associated with significance level α:
$-z_\alpha$ for a left-tail test, $z_\alpha$ for a right-tail test, $-z_{\alpha/2}$ and $z_{\alpha/2}$ for a two-tail test
or
$-t_\alpha$ for a left-tail test, $t_\alpha$ for a right-tail test, $-t_{\alpha/2}$ and $t_{\alpha/2}$ for a two-tail test
4) Reject the null hypothesis if the test statistic falls beyond the critical value(s), in the rejection region; otherwise, fail to reject it.
5) Interpret your conclusion
Summarize your decision within the context of the problem.
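Both procedures above, sketched for the z version of the test (σ known), using `statistics.NormalDist` for the p-value and the critical values. The t version follows the same steps with Student's t distribution, which would need a library such as SciPy. The function name, the `tail` argument, and the example numbers are hypothetical; for the same α the two methods reach the same decision.

```python
import math
from statistics import NormalDist

Z = NormalDist()   # standard normal distribution

def z_test_mean(xbar, mu0, sigma, n, alpha, tail):
    """Return (z, p_value, reject_by_p, reject_by_cv); tail is 'left', 'right' or 'two'."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))           # test statistic
    if tail == "left":
        p_value = Z.cdf(z)
        reject_cv = z <= -Z.inv_cdf(1 - alpha)          # compare with -z_alpha
    elif tail == "right":
        p_value = 1 - Z.cdf(z)
        reject_cv = z >= Z.inv_cdf(1 - alpha)           # compare with z_alpha
    elif tail == "two":
        p_value = 2 * (1 - Z.cdf(abs(z)))
        reject_cv = abs(z) >= Z.inv_cdf(1 - alpha / 2)  # compare with +/- z_{alpha/2}
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return z, p_value, p_value <= alpha, reject_cv

# H0: mu = 100, H1: mu != 100, sigma = 15, n = 36, sample mean 105, alpha = 0.05
z, p, reject_p, reject_cv = z_test_mean(105, 100, 15, 36, 0.05, "two")
print(z, round(p, 4), reject_p, reject_cv)   # 2.0 0.0455 True True
```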
HYPOTHESIS TESTS FOR A PROPORTION
Assumptions for testing a proportion p
1. We have a simple random sample.
2. The population is at least 20 times as large as the sample.
3. The items in the population are divided into two categories.
4. The sample must contain at least 10 individuals in each category.
HYPOTHESIS TEST (P-value Method): (Given significance level α)
1) State Null Hypothesis denoted H0 (usually status quo)
State Alternative Hypothesis denoted H1 (proposed change)
2) Calculate the test statistic for $\hat{p}$:
$z_{\hat{p}} = \frac{\hat{p} - p}{\sqrt{pq/n}}$, where p is the proportion stated in H0 and $q = 1 - p$
3) Calculate the p-value using the standard normal distribution.
4) Reject or fail to reject the null hypothesis based on the significance level α.
If p-value $\le \alpha$, reject the null hypothesis.
If p-value $> \alpha$, fail to reject the null hypothesis.
5) Interpret your conclusion
Summarize your decision within the context of the problem.
HYPOTHESIS TEST (Critical value Method): (Given significance level α)
1) State Null Hypothesis denoted H0 (usually status quo)
State Alternative Hypothesis denoted H1 (proposed change)
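2) Calculate the test statistic for $\hat{p}$:
$z_{\hat{p}} = \frac{\hat{p} - p}{\sqrt{pq/n}}$, where p is the proportion stated in H0 and $q = 1 - p$
3) Find the critical value associated with significance level α:
$-z_\alpha$ for a left-tail test, $z_\alpha$ for a right-tail test, or $-z_{\alpha/2}$ and $z_{\alpha/2}$ for a two-tail test
4) Reject the null hypothesis if the test statistic falls beyond the critical value(s), in the rejection region; otherwise, fail to reject it.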
5) Interpret your conclusion
Summarize your decision within the context of the problem.
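A sketch of the one-proportion z test using the p-value method; for the critical-value method, compare `z` with $z_\alpha$ = `NormalDist().inv_cdf(1 - alpha)` (about 1.645 for α = 0.05) on a right-tail test. The function name, the `tail` argument, and the example counts are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_proportion(x, n, p0, alpha, tail):
    """One-proportion z test; p0 is the proportion stated in H0 and q0 = 1 - p0."""
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)   # test statistic
    Z = NormalDist()
    if tail == "left":
        p_value = Z.cdf(z)
    elif tail == "right":
        p_value = 1 - Z.cdf(z)
    elif tail == "two":
        p_value = 2 * (1 - Z.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return z, p_value, p_value <= alpha

# H0: p = 0.5, H1: p > 0.5; 60 successes in 100 trials; alpha = 0.05
z, p_value, reject = z_test_proportion(60, 100, 0.5, 0.05, "right")
print(round(z, 4), round(p_value, 4), reject)   # 2.0 0.0228 True
```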