Student Review Project

Unit 9: Statistics
By: Jamie Fu and Neha Surapaneni
Central Tendency
❖ Mean/average (x̄): sum of n numbers divided by n
❖ Median: The middle number of n numbers that are written in
numerical order (If n is even, take the average of the middle two
numbers)
❖ Mode: The number or numbers that occur most in a set of n
numbers
❖ Range: The difference between the largest and smallest numbers
of a set of n numbers
❖ Standard Deviation (σ): Describes the typical difference between a
data value and the mean
σ = √[((x1-x̄)^2 + (x2-x̄)^2 + … + (xN-x̄)^2)/N]
N = number of values
x̄ = mean
x1 … xN = each of the values of the set
Sample Problem
You are competing in a hockey tournament. The
winning scores for the first 10 games are:
11, 12, 13, 13, 14, 15, 15, 15, 15, 17
Find the mean, median, mode, range, and standard deviation.
Mean: (11+12+13+…+17)/10 = 14
Median: (14+15)/2 = 14.5
Mode: 15
Range: 17-11 = 6
Std. Dev.: σ = √[((11-14)^2 + (12-14)^2 + … + (17-14)^2)/10] = √2.8 ≈ 1.7
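The hockey sample problem can be checked with a short Python sketch. It assumes the population form of the standard deviation (dividing by N), which is what the worked answer uses; the variable names are illustrative only.

```python
import math
import statistics

# Winning scores from the sample problem.
scores = [11, 12, 13, 13, 14, 15, 15, 15, 15, 17]

mean = statistics.mean(scores)
median = statistics.median(scores)      # averages the two middle values when n is even
mode = statistics.mode(scores)          # most frequent value
data_range = max(scores) - min(scores)

# Population standard deviation: summed squared deviations divided by N.
std_dev = math.sqrt(sum((x - mean) ** 2 for x in scores) / len(scores))

print(mean, median, mode, data_range, round(std_dev, 1))
```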
Scatter plots
❖ X-axis is independent variable
❖ Y-axis is the dependent variable
❖ Correlation coefficient (r): How strong the linear
relationship is between the two variables; r ranges
from -1 to 1
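As a rough illustration of the correlation coefficient, the sketch below computes Pearson's r by hand. The data set (hours studied and quiz score) is hypothetical and is not taken from the slides.

```python
import math

def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical data: hours studied (independent) and quiz score (dependent).
hours = [1, 2, 3, 4, 5]
quiz_scores = [2, 4, 5, 4, 5]

r = correlation_coefficient(hours, quiz_scores)
print(round(r, 2))
```

A value of r near 1 or -1 suggests a strong linear relationship, and a value near 0 suggests a weak one.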
Box and Whisker plots
❖ Lower quartile = median of the values between
the lowest value and the median
❖ Upper quartile = median of the values between
the median and the highest value
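The quartile definitions above can be sketched in Python using the hockey scores from the earlier sample problem. This assumes the "median of each half" method, leaving the middle value out of both halves when n is odd; other quartile conventions exist and can give slightly different answers.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, lower quartile, median, upper quartile, max)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # Skip the middle value when n is odd so it belongs to neither half.
    upper_half = data[half + 1:] if n % 2 else data[half:]
    return min(data), median(lower_half), median(data), median(upper_half), max(data)

scores = [11, 12, 13, 13, 14, 15, 15, 15, 15, 17]
summary = five_number_summary(scores)
print(summary)
```

These five values are the ones a box and whisker plot draws: the box spans the quartiles and the whiskers reach the lowest and highest values.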
Histograms
❖ The bars should be touching
❖ Bars should be the same width
❖ Bar width should represent a range of quantitative values,
not categories
❖ The height indicates frequency
Sample Problem: Box & Whisker vs. Histogram
The annual incomes for 6 professions are shown
below:
Farming: 19,630 Sales: 28,920 Architecture: 56,330
Professional Athlete: 2,476,590 Legal: 69,030 Teaching: 39,130
Which graph will better represent the data (histogram or box and whisker)?
Box and whisker, because of the large outlier (the professional athlete)
Which graph will better represent the data when the outlier is
removed?
Histogram, because the remaining data values are closer together
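To see why the athlete's income matters so much, the sketch below compares the mean and median of the six incomes with and without that value. It is only an illustration of how an outlier stretches the data; it does not draw either graph.

```python
from statistics import mean, median

# Annual incomes from the sample problem.
incomes = {
    "Farming": 19630,
    "Sales": 28920,
    "Architecture": 56330,
    "Professional Athlete": 2476590,
    "Legal": 69030,
    "Teaching": 39130,
}

all_values = list(incomes.values())
without_outlier = [v for job, v in incomes.items() if job != "Professional Athlete"]

print("With outlier:   ", round(mean(all_values)), median(all_values))
print("Without outlier:", round(mean(without_outlier)), median(without_outlier))
```

With the outlier the mean is roughly ten times the median, so equal-width histogram bars would leave most of the plot empty; without it the two measures are close.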
Frequency Distribution
❖ Lower Class Limit: lowest value in a class
❖ Upper Class Limit: highest value in a class
❖ Class Mark: midpoint of a class
❖ Formula: (lower class limit + upper class limit)/2
❖ Class Width: the difference between the lower limits of two
consecutive classes, rounded up to the next highest whole number
❖ Formula: (high-low)/number of classes you want
❖ Class Boundaries: the endpoints of the bars in a histogram
Frequency tables
❖ Use frequency distribution to organize data
❖ Used in creating histograms

Class Limits (Lower-Upper) | Class Boundaries | Tally | Frequency | Class Midpoint
1-8 | 0.5-8.5 | 11111 11111 1111 | 14 | 4.5
9-16 | 8.5-16.5 | 11111 11111 11111 11111 1 | 21 | 12.5
17-24 | 16.5-24.5 | 11111 11111 11 | 12 | 20.5
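The class limits, boundaries, and midpoints in the table can be generated from the class width formula. The sketch below assumes whole-number data running from a low of 1 to a high of 24 (these endpoints are an assumption read off the class limits) and reads "next highest whole number" as always moving up, even when the division comes out exact. The function name is hypothetical.

```python
import math

def build_classes(low, high, num_classes):
    """Build class limits, boundaries, and marks for whole-number data."""
    # Always move up to the next whole number (assumed reading of the rule).
    width = math.floor((high - low) / num_classes) + 1
    classes = []
    for i in range(num_classes):
        lower = low + i * width
        upper = lower + width - 1
        classes.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),  # halfway to the neighbouring class
            "mark": (lower + upper) / 2,
        })
    return width, classes

width, classes = build_classes(low=1, high=24, num_classes=3)
for c in classes:
    print(c["limits"], c["boundaries"], c["mark"])
```

Always rounding up is a design choice here: if the width were left at an exact quotient, the highest data value could fall outside the last class.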
Sample Frequency Tables
❖ Sample mean: x̄ = [∑(x✖f)]/n
❖ Sample Standard Deviation: √[∑((x-x̄)^2✖f)/(n-1)]
❖ Variance: sample std. dev. without the square
root
x = class mark
f = frequency
n = # of entries
Sample Frequency Tables
Class Limits | Marks x (class midpoint) | Frequency f | x✖f | (x-x̄)^2 | (x-x̄)^2✖f
1-10 | 5.5 | 34 | 187 | 112.36 | 3820.24
11-20 | 15.5 | 18 | 279 | .36 | 6.48
21-30 | 25.5 | 17 | 433.5 | 88.36 | 1502.12
31+ | 35.5 | 11 | 390.5 | 376.36 | 4139.96
Totals | | ∑=80 | ∑=1290 | | ∑=9468.8
Mean: 1290/80 ≈ 16.1
Variance: 9468.8/(80-1) ≈ 119.9
Standard Deviation: √119.9 ≈ 10.95
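The frequency-table calculation can be reproduced with a short sketch. It assumes the sample form of the variance (dividing by n-1), which matches the 119.9 in the worked table. Because the code keeps the unrounded mean of 16.125 instead of 16.1, its intermediate sums differ slightly from the table.

```python
import math

# Class marks (midpoints) and frequencies from the worked table.
marks = [5.5, 15.5, 25.5, 35.5]
frequencies = [34, 18, 17, 11]

n = sum(frequencies)                                          # number of entries
grouped_mean = sum(x * f for x, f in zip(marks, frequencies)) / n

# Sum of squared deviations from the mean, weighted by frequency.
sum_sq = sum((x - grouped_mean) ** 2 * f for x, f in zip(marks, frequencies))
variance = sum_sq / (n - 1)                                   # sample variance
std_dev = math.sqrt(variance)

print(n, grouped_mean, round(variance, 1), round(std_dev, 2))
```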
Bell Curves
❖ Normal curve: A normal distribution is a frequency
distribution, typically of a large number of data values,
whose graph is a symmetric, bell-shaped curve
❖ The middle value is the mean, which is 0
standard deviations away from itself
❖ Probability: When randomly choosing a
value from a normally distributed set, the probability of
choosing a number within 1 S.D. of the mean is about 68%
Choosing a Graph:
Use...when...
❖ Scatter plot: you have a set of points and want to find a correlation
❖ Box & Whisker: the data has outliers
❖ Histogram: the numbers are close to each other and
you want to show the frequency of values in certain
ranges
❖ Bell Curve: you want to show where the majority of the data
is, or find the probability of a certain value being picked
out of the set
Data Classification
Population: Group of people or objects you want info about
Sample: Subset of population
Types of Samples
❖ Self-selected sample: Members volunteer to be in the sample
❖ Systematic sample: A rule is used to select members
❖ Convenience sample: Easy-to-reach members
❖ Random sample: Each member has equal chance of selection
Biased and Unbiased Samples
❖ Biased sample: Over- or under-represents parts of the population
❖ Unbiased sample: Representative of the population you want info on
Margin of Error
❖ Gives a limit on how much the responses of your sample
would differ from the responses of the whole population
❖ Formula: ±(1/√N), where N is the number of people in the sample
❖ If a percent p of the sample responds a certain way, the
percent of the population that would respond the same way
is likely to be between p-(1/√N) and p+(1/√N)
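A minimal sketch of the margin of error formula follows. The survey numbers (400 people, 60% answering a certain way) are hypothetical and are used only to show the arithmetic.

```python
import math

def margin_of_error(sample_size):
    """Margin of error ±1/√N, returned as a proportion."""
    return 1 / math.sqrt(sample_size)

def likely_population_range(p, sample_size):
    """Interval p - 1/√N to p + 1/√N for a sample proportion p."""
    moe = margin_of_error(sample_size)
    return p - moe, p + moe

# Hypothetical survey: 400 people sampled, 60% respond a certain way.
moe = margin_of_error(400)
low, high = likely_population_range(0.60, 400)
print(f"±{moe:.0%}, between {low:.0%} and {high:.0%}")
```

Because N sits under a square root, quadrupling the sample size only halves the margin of error.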
Calculating Probability
❖ What is the probability that a number (x) will be ≥, ≤, <, or >
x̄+nσ? (≤ and < give the same answer, as do ≥ and >)
❖ Example: P(x ≥ x̄+σ)
❖ The percent between x̄+σ and x̄+2σ is 13.5%
❖ The percent between x̄+2σ and x̄+3σ is 2.35%
❖ The percent beyond x̄+3σ is .15%
❖ Therefore, P(x ≥ x̄+σ) is 13.5+2.35+.15 = 16%
(Figure: normal curve with areas of 34%, 13.5%, 2.35%, and .15% marked to the right of the mean)
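The band percentages on the bell curve can be checked against Python's `statistics.NormalDist` (available in Python 3.8 and later). This is a sketch for comparison only: the slide figures are rounded, so the computed areas match them only approximately.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_between(low_sd, high_sd):
    """Fraction of a normal distribution between two distances (in SDs) from the mean."""
    return standard_normal.cdf(high_sd) - standard_normal.cdf(low_sd)

band_0_to_1 = area_between(0, 1)         # about 34%
band_1_to_2 = area_between(1, 2)         # about 13.5%
band_2_to_3 = area_between(2, 3)         # about 2.35%
beyond_3 = 1 - standard_normal.cdf(3)    # about .15%

# P(x >= mean + 1 SD) is everything to the right of +1 SD.
p_at_least_one_sd = band_1_to_2 + band_2_to_3 + beyond_3
print(f"{p_at_least_one_sd:.1%}")
```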
Calculating Probability
Example: Normal distribution has a mean of 27 and a
standard deviation of 5
What is P(22<x<32)?
22 is 27 (x̄) - 5 (σ) and 32 is 27 + 5, so each is
one standard deviation away from the mean
Therefore, the probability is the percentage
between x̄-σ and x̄+σ, which is 68%
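The same example can be set up directly with the given mean and standard deviation. The helper name `probability_between` is hypothetical; `NormalDist.cdf` gives the area to the left of a value.

```python
from statistics import NormalDist

def probability_between(mean, std_dev, low, high):
    """P(low < x < high) for a normal distribution."""
    dist = NormalDist(mu=mean, sigma=std_dev)
    return dist.cdf(high) - dist.cdf(low)

# Example from the slide: mean 27, standard deviation 5.
p = probability_between(27, 5, 22, 32)
print(f"{p:.0%}")
```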
Calculating Probability using z-score
❖ The standard normal distribution is the normal distribution with mean 0 and
standard deviation 1
❖ The formula (x-x̄)/σ can be used to transform x-values from a normal
distribution with mean x̄ and standard deviation σ into z-values having a
standard normal distribution with mean 0 and standard deviation 1
❖ The z-value for a particular x-value is called the z-score for the x-value
and is the number of standard deviations the x-value lies above or below
the mean
Calculating Probability using z-score
A normal distribution has a mean of 75 and a standard deviation of 10
Use the standard normal table (find the z-score first) to find P(x≤70)
Answer: Use (x-x̄)/σ to find the z-score: (70-75)/10 = -0.5
The z-score is -0.5; choose row -0 and column .5 to form -0.5
The table entry where that row and column meet is
.3085, which is 30.85% ≈ 31%
Therefore, P(x≤70) ≈ 31%
When using a z-score and the problem asks for
P(x≥n), take 1 MINUS the table value for that z-score
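The z-score example can be reproduced in a few lines. In this sketch `NormalDist().cdf` stands in for the standard normal table lookup; that substitution is an assumption of the example, not the method used on the slide.

```python
from statistics import NormalDist

def z_score(x, mean, std_dev):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

z = z_score(70, mean=75, std_dev=10)

# The cdf of the standard normal distribution plays the role of the table entry.
p_at_most_70 = NormalDist().cdf(z)
# For a "greater than or equal" question, subtract the table value from 1.
p_at_least_70 = 1 - p_at_most_70

print(z, round(p_at_most_70, 4), round(p_at_least_70, 4))
```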
Standard Error of the Mean
❖ Standard Error (σ_x̄) measures how well a sample
mean estimates the true mean of a population
❖ Formula: σ_x̄ = σ/√N
Q: The mean weight of 36 boys on the wrestling team is 136.4 lbs and the standard
deviation is 4.1 lbs. What is the standard error of the mean?
A: 4.1÷√36 = 4.1÷6 ≈ .68 lbs
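The wrestling-team answer can be checked with a one-line function. Note that the result carries the same units as the data (pounds here) and is not a percentage.

```python
import math

def standard_error(std_dev, sample_size):
    """Standard error of the mean: standard deviation divided by √N."""
    return std_dev / math.sqrt(sample_size)

se = standard_error(4.1, 36)
print(round(se, 2), "lbs")
```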
Confidence Interval
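❖ Lower Confidence Limit: x̄ - zc✖σ_x̄
❖ Upper Confidence Limit: x̄ + zc✖σ_x̄
❖ Formula for the confidence interval around the population
mean (μ): x̄ - zc✖σ_x̄ ≤ μ ≤ x̄ + zc✖σ_x̄
❖ The critical confidence value (zc) is found using a table of
critical values (for example, .6745 for 50% and 2.58 for 99%)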
Confidence Interval Example
Q: In a sample of 100 families, a TV reporter found
that kids watch an avg. of 4.6 hrs of TV a day. The σ
is 1.4 hrs.
a) Find the standard error of the mean: 1.4÷√100 = 1.4÷10 = .14 hrs
b) Find the 50% and 99% confidence intervals for the mean
number of hours all kids in Texas watch TV.
50%: 4.6±(.6745)(.14), so 4.51 ≤ μ ≤ 4.69
99%: 4.6±(2.58)(.14), so 4.24 ≤ μ ≤ 4.96
c) Which interval has a wider range? 99%
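The TV example can be worked through in code as a check. The critical values .6745 and 2.58 are the ones used in the example; the function name and layout are illustrative assumptions.

```python
import math

def confidence_interval(sample_mean, std_dev, sample_size, z_c):
    """Return (lower limit, upper limit) for the population mean."""
    std_error = std_dev / math.sqrt(sample_size)
    margin = z_c * std_error
    return sample_mean - margin, sample_mean + margin

# 100 families, mean 4.6 hrs, standard deviation 1.4 hrs.
ci_50 = confidence_interval(4.6, 1.4, 100, z_c=0.6745)
ci_99 = confidence_interval(4.6, 1.4, 100, z_c=2.58)

print([round(v, 2) for v in ci_50])
print([round(v, 2) for v in ci_99])
```

A higher confidence level needs a larger critical value, which is why the 99% interval is wider than the 50% one. If no table is at hand, `statistics.NormalDist().inv_cdf(0.995)` gives roughly the same 99% critical value.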
Common Mistakes
❖ Finding the proper class limits: sometimes the formula
doesn’t work
❖ Remember: when using a z-score and the probability
asks for ≥ a number, take 1 MINUS the table value for that z-score
❖ Deciding the formulas to use in real life situations
❖ Deciding on the classification of the data(which sample
to use)
Terminology
x̄ = mean
σ = standard deviation
f = frequency
x = midpoint
σ_x̄ = standard error of the mean
zc = critical confidence value
N = number of participants, number of entries, etc.
∑ = summation, sum of all the numbers
z = z-score, the number of standard deviations a value lies above or below the mean
t = number of standard deviations away from the mean
Using a Graphing Calculator
Correlation Coefficient: Stat, 1, 2nd, 0, D,
diagnostics on, enter, enter. stat, calc, 4 (It is r)
Central Tendency: Stat, 1, stat, calc
Get numbers in order: stat, 1, 2nd, stat, ops, 1, 2nd,
1, stat, 1.
Line of best fit: stat, 1, 2nd, y=, plot on, stat, calc, 4
Statistics Formulas
Margin of Error: ±1/√N
Class Width: (high - low) / # of classes
Class Midpoint: (upper limit + lower limit) / 2
Mean of Frequency Distribution: x̄ = [∑(x✖f)]/n
Sample standard deviation: √[∑((x-x̄)^2✖f)/(n-1)]
Z-score: (x-x̄)/σ
Standard Error of the Mean: σ_x̄ = σ/√N
Confidence Interval: x̄ - zc✖σ_x̄ ≤ μ ≤ x̄ + zc✖σ_x̄