STATS NOTESs09

STATS NOTES
What questions do you have concerning this chapter?
Do you have any suggestions for improving the presentation of
this chapter?
Chapter 1
Objectives:
1. The student will be able to define statistics
2. The student will be able to identify disciplines that use
statistics.
3. The student will be able to classify statistics into four
areas.
4. The student will be able to differentiate a population from a
sample.
5. The student will be able to classify different types of data.
6. The student will be able to understand how statistics can be
misused.
Statistics is the science of collecting, organizing, presenting,
analyzing, and interpreting data for the purpose of assisting in
making a more effective decision. Pg 4
1/09/10
Undressing the Terror Threat WSJ W3
Applications of statistics: accounting, economics, finance,
management, marketing, sports, leisure, and households Pg 4-5
“An NBA MBA” WSJ W5
Are Statheads the NBA's Secret Weapon? WSJ W10
“Wal-Mart Seeks new Flexibility In Worker Shifts” WSJ A1
Positive versus normative decision making.
11/3/06
3/10/10
1/03/07
BRANCHES OF STATISTICS
1. DESCRIPTIVE STATISTICS - TRANSFORMS DATA INTO INFORMATION
Excel
select: Tools/Data analysis/Descriptive Statistics
Minitab
Select: Stat/Display Descriptive Stat
2. PROBABILITY - UNCERTAINTY
3. INFERENCE - DRAWS CONCLUSIONS ABOUT THE POPULATION PARAMETERS
AFTER EXAMINING ONLY A SAMPLE (OR PORTION) OF THE DATA.
Sample - A portion, or part, of the population of interest
Sample size and representation - random sample
Numerical attributes of samples are called
statistics and are variables (they change from sample to sample).
Population - A collection of all possible individuals,
objects, or measurements of interest. Pg 7
Numerical attributes of populations are called
parameters and are constants.
4. SPECIAL TOPICS IN STATISTICS
TYPES OF DATA
Quantitative Data (units of measurement) Pg 8-9
1. CONTINUOUS
2. DISCRETE
Qualitative Data (Non-numeric)
Levels of measurement Pg 10-13
Qualitative data
1. Nominal level - categories that do not show order
2. Ordinal level - categories that show order
Quantitative data
3. Interval level - the distance between values is a constant
size.
4. Ratio level - meaningful zero (absence of the
characteristic), and the ratio between two numbers is
meaningful
Mutually exclusive - An individual, object, or
measurement is included in only one category.
Exhaustive - Each individual, object, or measurement
must appear in at least one category.
Pounds, minutes, # of baskets, wrong or right, shirt sizes,
miles per gallon, centigrade, vanilla ice cream or other
flavors, profits, net worth, counties in WI, thumbs up or down.
“It’s a Crime What Some People Do with Statistics” WSJ 8/30/00
How do you compute the percent of people falsely sentenced to
death?
How do you compute the percent of people falsely executed?
Chapter 2
Objectives:
1. The student will be able to construct and interpret frequency
distributions and histograms.
2. The student will be able to construct and interpret an ogive.
Frequency Distribution - Pg 22
Grouped data showing all possible outcomes and the number of
each outcome. Summarizes data by forming categories of values
and indicating the number of occurrences in each.
Features of a good frequency distribution
1. Class intervals are mutually exclusive(do not overlap)
2. Class intervals are of equal width(except for open ended
intervals)
3. Between 5 and 15 classes are normally used.
4. The number of data values falling in each class is
indicated.
5. Try to avoid open-ended classes
Construction of frequency distributions Pg 28-30
1. Establish the number of classes
n = number of observations
k = number of classes
$2^k \geq n$ Rule of Thumb
2. Determine width of each class.
width = (Range of data set)/k = (highest value - lowest value)/k
3. Determine the class boundaries
4. Count number of observations in each class
5. Present results
Step #1 - Determine the number of classes
$2^k \geq n$ (n = 125 parts)
$2^7 = 128 \geq 125$
7 classes
Step #2 - Determine the class width
(Maximum - Minimum)/k
$(127 - 70)/7 = 8.14$
The class width should be made a whole number. I rounded up to
nine. Seven classes that are nine units wide will cover 63
units, while the data span only 57 units (127 - 70). It is best
if you distribute this excess of 6 units between the first and
last classes. Therefore, 3 additional units should be in the
first class and 3 additional units should be in the last class.
Step #3- Determine the class boundaries
Start with the lowest value in the data set (70). Subtract
the excess for the first class determined in step #2.
PSI to Break    # Parts   Rel. Freq.(%)   C.F.   R.C.F.%
67 to 76            2          1.6           2      1.6
76 to 85           22         17.6          24     19.2
85 to 94           47         37.6          71     56.8
94 to 103          29         23.2         100     80.0
103 to 112         17         13.6         117     93.6
112 to 121          5          4.0         122     97.6
121 to 130          3          2.4         125    100.0
Totals            125        100.0
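The construction steps can be sketched in Python. This is only an illustration under stated assumptions: the raw PSI readings are not given in these notes, so the class counts are copied from the table rather than tallied from data, and the helper name `number_of_classes` is made up for this example.

```python
import math

def number_of_classes(n):
    """Smallest k with 2**k >= n (the 2^k >= n rule of thumb)."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k

n = 125                      # number of observations (table total)
lowest, highest = 70, 127    # lowest and highest values in the data set

k = number_of_classes(n)                   # step 1
width = math.ceil((highest - lowest) / k)  # step 2: 8.14 rounded up
excess = k * width - (highest - lowest)    # units left over after covering the range
start = lowest - excess // 2               # step 3: put half the excess in the first class
boundaries = [start + i * width for i in range(k + 1)]

# Step 4: counts per class, copied from the table (assumption: no raw data here).
counts = [2, 22, 47, 29, 17, 5, 3]

# Step 5: relative, cumulative, and relative cumulative frequencies.
rows = []
running = 0
for i, f in enumerate(counts):
    running += f
    rel = round(100 * f / n, 1)
    cum_rel = round(100 * running / n, 1)
    rows.append((boundaries[i], boundaries[i + 1], f, rel, running, cum_rel))

for lo_b, hi_b, f, rel, cum, cum_rel in rows:
    print(f"{lo_b:>3} to {hi_b:<3} {f:>4} {rel:>6.1f} {cum:>4} {cum_rel:>6.1f}")
```

With real data, step 4 would count the observations that fall between each pair of boundaries instead of using a fixed list.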
frequency distributions for discrete vs nominal data
cumulative and relative frequency distributions
Histograms - Quality Control
Excel - Select/Data Analysis/Histogram
Misuses of Statistics Pg 14
1. Per-item basis: per-capita, per-share, per-household, or
per-transaction. Chapters 3 and 12
2. Adjustment for inflation - GNP Chapter 18 & 17
3. Induced bias in the process of inference - Management surveys
to determine place of Christmas party - Chapters 7-12,14,&15
4. Inappropriate comparisons of groups - Chapters 7-12,14,&15
self-selection - Company asks for volunteers for an exercise program
hidden differences - Wage discrimination
- Average score on basic skills test for countries - Chapter 12 and 18
5. Scale on graphs and charts - Chapter 2 & 12
6. Inaccurate interpretation of statistics - All chapters
Chapter 3
Objectives:
1. The student will be able to compute and interpret the
arithmetic mean, median, mode, and weighted mean.
2. The student will be able to explain the advantages, and
disadvantages of each measure of central tendency listed
above.
3. The student will be able to identify the position of the
arithmetic mean, median, and mode for both a symmetrical
distribution and a skewed distribution.
4. The student will be able to compute and interpret measures of
dispersion.
5. The student will be able to explain the advantages and
disadvantages of each measure of dispersion.
6. The student will be able to compute and interpret population
and sample standard deviation.
7. The student will be able to explain Chebyshev’s theorem, or
Normal rule.
SUMMATION NOTATION
X = 1,2,3,4,5
Y = -2,-1,0,1,2
$\sum x = 15$
$\sum x^2 = 55$
$(\sum x)^2 = 225$
$\sum xy = 10$
$\sum x \sum y = 0$
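As a quick check, the same sums can be computed in Python. This is a small sketch using the X and Y values listed above; the variable names are chosen only for this illustration.

```python
x = [1, 2, 3, 4, 5]
y = [-2, -1, 0, 1, 2]

sum_x = sum(x)                               # sum of x
sum_x_squared = sum(v ** 2 for v in x)       # square each value, then add
square_of_sum_x = sum(x) ** 2                # add first, then square
sum_xy = sum(a * b for a, b in zip(x, y))    # multiply pairs, then add
sum_x_times_sum_y = sum(x) * sum(y)          # add each list, then multiply

print(sum_x, sum_x_squared, square_of_sum_x, sum_xy, sum_x_times_sum_y)
```

The order of operations is the point: squaring before summing and summing before squaring give different results, as do the sum of products and the product of sums.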
DESCRIPTIVE STATISTICS - Excel-tools/data analysis/descriptive
stats
Minitab - Stats-Descriptive Statistics
MEASURES OF CENTRAL TENDENCY
1. Mean
$\bar{x} = \frac{\sum x}{n}$ (sample mean)
DATA = 2,2,3,5,7,7,8,8,9,10,104
$\bar{x} = 165/11 = 15$ computers sold per day
The mean is the center of the data in the sense that it balances
the deviations above and below it: the sum of the deviations
above the mean equals the sum of the deviations below the mean.
Properties of the mean page 59.
Interpret average page 60 #8,#10
The mean balances deviations above and deviations below.
$\sum (x - \bar{x}) = 0$
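The balancing property can be checked on the computer sales data from these notes. This is a minimal sketch; the variable names are illustrative only.

```python
data = [2, 2, 3, 5, 7, 7, 8, 8, 9, 10, 104]  # computers sold per day

mean = sum(data) / len(data)
deviations = [value - mean for value in data]

# Total distance above the mean and total distance below the mean.
above = sum(d for d in deviations if d > 0)
below = -sum(d for d in deviations if d < 0)

print(mean, above, below, sum(deviations))
```

One large value (104) sits 89 units above the mean, and the ten smaller values together sit 89 units below it, which is why the mean is pulled so far from the typical day.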
SAMPLE VS POPULATION
MEAN OF A FREQUENCY DISTRIBUTION Pg 84
Which of the following is the most appropriate use of the
mean (average)?
Use an average of the last 12 months to predict next month's fuel
bill in Wisconsin.
Use an average of the last 12 months to pay your annual fuel
bill.
Use an average of the last 12 months to determine what yearly sales
would be without seasonal fluctuations. Ch 19 Page 670 text
Applications
1. Wisconsin Power and Light energy averaging
2. Moving average and seasonal analysis #9 page 624 Ch 16
3. The cost of advertising in the Super Bowl is $3 million
per 30-second spot, or $100,000 per second.
1A. WEIGHTED MEAN Pg 61
$\bar{X}_w = \frac{\sum wx}{\sum w}$ (weighted mean)
           EX 1   EX 2   EX 3   EX 4   HW
WEIGHTS     .2     .2     .2     .3    .1
SCORES      85     78     80     84    80
$\bar{X} = 81.8$
           MIDTERM   FINAL
WEIGHTS       1        2
SCORES       80       90
$\bar{X} = 260/3 = 86.67$
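The weighted mean examples can be reproduced with a short Python sketch. The function name `weighted_mean` is made up for this illustration, and the pairing of each weight with each score is an assumption inferred from the stated result of 81.8.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Four exams plus homework (assumed pairing of weights and scores).
scores = [85, 78, 80, 84, 80]
weights = [0.2, 0.2, 0.2, 0.3, 0.1]
course_grade = weighted_mean(scores, weights)

# Midterm counts once, final counts twice.
midterm_final = weighted_mean([80, 90], [1, 2])

print(round(course_grade, 1), round(midterm_final, 2))
```

Dividing by the sum of the weights means the weights do not have to add up to 1, which is why the 1 and 2 weights work without rescaling.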
Baseball average versus slugging percentage
1/06/10
In the NBA, 3 Is Cheaper Than 2 WSJ B14
1B. Geometric mean
Formula 3-4 page 69
$GM = \sqrt[n]{(1 + R_1)(1 + R_2) \cdots (1 + R_n)} - 1$
n = number of changes in the data
R = fractional change from one period to the next period. For example, if year
1 is 100 and it increases in a year to 110, the fractional change is 10/100 or .1.
You make a two-year investment of $1
End of year 1 worth $2 (100% increase)
End of year 2 worth $1 (50% decrease)
[Determine the arithmetic mean and the geometric mean of the
percentage change in your investment.]
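One way to work the two-year investment question is sketched below in Python, applying Formula 3-4 to the fractional changes of +1.00 and -0.50. The variable names are illustrative only.

```python
import math

changes = [1.00, -0.50]   # +100% in year 1, -50% in year 2
n = len(changes)

# Arithmetic mean of the fractional changes.
arithmetic_mean = sum(changes) / n

# Geometric mean: n-th root of the product of (1 + R), minus 1.
growth_factor = math.prod(1 + r for r in changes)
geometric_mean = growth_factor ** (1 / n) - 1

print(arithmetic_mean, geometric_mean)
```

The arithmetic mean suggests a 25% gain per year, yet the investment ends where it began; the geometric mean of 0% matches what actually happened to the dollar.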
Formula 3-5 page 70
$\text{Average percent increase over time} = \sqrt[n]{\frac{\text{Value at the end of period}}{\text{Value at the beginning of period}}} - 1$
[You invested $10,000 and after 10 years your investment grew to
$30,000. Determine the average rate of return per year.]
$\sqrt[n]{(1 + r_1)(1 + r_2) \cdots (1 + r_n)} - 1$ [3-4]
r1 = the fractional change in period 1. For example, if your
profits grew by 2 hundredths in period 1, r1 = .02
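For the $10,000 investment question, Formula 3-5 can be evaluated directly. The sketch below is one way to do it in Python; the function name `average_rate` is hypothetical.

```python
def average_rate(begin_value, end_value, periods):
    """n-th root of (ending value / beginning value), minus 1."""
    return (end_value / begin_value) ** (1 / periods) - 1

rate = average_rate(10_000, 30_000, 10)
print(f"{rate:.4f}")          # average fractional return per year

# Check: compounding at this rate for 10 years recovers the ending value.
final_value = 10_000 * (1 + rate) ** 10
print(round(final_value, 2))
```

The result is a little under 12% per year, well below the 20% per year that dividing the 200% total gain by 10 would suggest.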
#28 text page 70
2. Median is the midpoint of the values after they have been
ordered from the smallest to the largest. The center is now
measured by the value that divides the data set in half, half
the values higher and half lower than the median value. Pg 63
POSITION OF THE MEDIAN VALUE = (n + 1)/2
Half the data must be below the median and half must be
above it. If the number of values is odd, and 50 or less, (n + 1)/2 will
indicate the position that divides the data set in half. If the
data set is large, n/2 indicates the approximate position of the median.
Median = 7 computers sold
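The position rule can be checked on the computer sales data. This is a minimal Python sketch; `statistics.median` from the standard library is used only to confirm the hand calculation.

```python
import statistics

data = [2, 2, 3, 5, 7, 7, 8, 8, 9, 10, 104]  # computers sold per day

ordered = sorted(data)            # the values must be ordered first
n = len(ordered)
position = (n + 1) / 2            # 1-based position of the median
median_by_position = ordered[int(position) - 1]

print(position, median_by_position, statistics.median(data))
```

Unlike the mean of 15, the median is not affected by the single day with 104 sales.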
Properties of the median page 64
9/15/08 “New Evidence on Taxes and Income” WSJ
2/2/07 “Is $34.06 Per Hour ‘Underpaid’?” WSJ A19
EXCEL - Use sort function under data menu
3. MODE - VALUE IN THE DATA COLLECTION THAT OCCURS MOST OFTEN.
1. USED MORE OFTEN WITH DISCRETE DATA AND ESPECIALLY USEFUL
FOR NOMINAL DATA.
Skewed frequency distributions Pg 114
[Would you prefer my grading to be balanced, skewed right, or
skewed left?]
MEASURES OF DISPERSION Pg 73-78
1. RANGE - HIGHEST - LOWEST VALUE
2. Mean Deviation = $\frac{\sum |x - \bar{x}|}{n}$ = 16.18
Excel function key statistical AVEDEV
3. Standard Deviation of a population = $\sqrt{\frac{\sum (x - \bar{x})^2}{N}}$ = 28.26
The standard deviation is the average or typical distance of the data values from their mean.
Sample vs Population - Excel automatically assumes the data is
from a sample, if you want Excel to compute a standard deviation
for a population you must use the function key. Go to the
function key statistical – stdevp
Standard deviation for a sample = $\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Computer sales   Mean   Mean Dev     Stan Dev
each day                |x - mean|   (x - mean)^2
      2           15       13           169
      2           15       13           169
      3           15       12           144
      5           15       10           100
      7           15        8            64
      7           15        8            64
      8           15        7            49
      8           15        7            49
      9           15        6            36
     10           15        5            25
    104           15       89          7921
Totals                    178          8790
Mean Dev = 178/11 = 16.18182
Variance = 8790/11 = 799.0909
St Dev = 28.2682
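The dispersion measures for the computer sales data can be reproduced with Python's standard `statistics` module. This is a sketch for checking the hand calculations; the variable names are illustrative.

```python
import statistics

data = [2, 2, 3, 5, 7, 7, 8, 8, 9, 10, 104]  # computers sold per day
n = len(data)
mean = statistics.mean(data)

data_range = max(data) - min(data)                       # highest - lowest
mean_deviation = sum(abs(x - mean) for x in data) / n    # average absolute distance
population_variance = statistics.pvariance(data)         # divides by N
population_sd = statistics.pstdev(data)                  # square root of the above
sample_sd = statistics.stdev(data)                       # divides by n - 1

print(data_range)
print(round(mean_deviation, 5))
print(round(population_variance, 4))
print(round(population_sd, 4))
print(round(sample_sd, 4))
```

The sample standard deviation comes out larger than the population value because the same sum of squared deviations is divided by n - 1 instead of N.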
Chebyshev’s Theorem Pg 81
Empirical or Normal Rule pg 82
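Chebyshev's theorem says that for any data set, at least $1 - 1/k^2$ of the values lie within k standard deviations of the mean (for k > 1). The sketch below compares that minimum with the actual proportion in the computer sales data; the choice of k = 2 and the function name `chebyshev_bound` are assumptions made for illustration.

```python
import statistics

data = [2, 2, 3, 5, 7, 7, 8, 8, 9, 10, 104]  # computers sold per day
mean = statistics.mean(data)
sd = statistics.pstdev(data)                  # population standard deviation

def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations (k > 1)."""
    return 1 - 1 / k ** 2

k = 2
bound = chebyshev_bound(k)
within = sum(1 for x in data if abs(x - mean) <= k * sd) / len(data)

print(bound, round(within, 3))
```

The theorem guarantees only a minimum, so the actual proportion can be much higher, as it is here even though the data are badly skewed.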
STANDARD DEVIATION AS A MEASURE OF RISK
VARIANCE
STEEL MILL VS APPLE ORCHARD
Quality Control
[You are deciding between two production methods for
producing ball bearings. Both will produce ball bearings
with a mean of 1 cm. Production method #1 has Sx = .01 and method #2
has Sx = .001. Which production method would you choose? Explain.]
Excel select: Tools/Data analysis/Descriptive Statistics
Chapter 4
Objectives:
1. The student will be able to construct and interpret a stem
and leaf display.
2. The student will be able to compute and interpret quartiles,
deciles, and percentiles.
3. The student will be able to compute and interpret skewness.
Location of median, quartiles, or percentiles
Lp = (n + 1)P/100
Pg 107
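The location formula can be tried on the computer sales data from Chapter 3. In this Python sketch the function names are hypothetical, and interpolating between neighboring values when the location is not a whole number is an assumed convention, since these notes do not spell out that case.

```python
def percentile_location(n, p):
    """Lp = (n + 1) * P / 100, a 1-based position in the ordered data."""
    return (n + 1) * p / 100

def percentile_value(data, p):
    """Value at location Lp, interpolating between neighbors if needed."""
    ordered = sorted(data)
    loc = percentile_location(len(ordered), p)
    whole = int(loc)          # position of the lower neighbor
    frac = loc - whole        # how far to move toward the upper neighbor
    if whole < 1:
        return ordered[0]
    if whole >= len(ordered):
        return ordered[-1]
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + frac * (upper - lower)

data = [2, 2, 3, 5, 7, 7, 8, 8, 9, 10, 104]

for p in (25, 50, 75):
    print(p, percentile_location(len(data), p), percentile_value(data, p))
```

With 11 values the locations for the quartiles and the median fall on whole positions (3, 6, and 9), so no interpolation is needed for this data set.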
Pearson's Coefficient of Skewness = $sk = \frac{3(\bar{x} - \text{median})}{s}$ Pearson 4-2
Page 113-116
Software skewness comes with descriptive statistics in data analysis. 4-3
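Pearson's coefficient can be computed for the computer sales data from Chapter 3. This Python sketch assumes s is the sample standard deviation; the variable names are illustrative only.

```python
import statistics

data = [2, 2, 3, 5, 7, 7, 8, 8, 9, 10, 104]  # computers sold per day

mean = statistics.mean(data)
median = statistics.median(data)
s = statistics.stdev(data)        # sample standard deviation (n - 1)

sk = 3 * (mean - median) / s
print(round(sk, 2))
```

A positive value indicates the mean lies above the median, which is the pattern of a distribution skewed to the right by a few large values.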
1/19/06 “Is Inequality Over Wages Worsening?” WSJ A2
3/2/06 “Rich Get Richer, But Not as Fast As you Think” WSJ A2
quintiles
References:
“Census history and Census Politics: What We Can Expect in 1990"
Research and Opinion, Urban research center UW-Milwaukee 1989
“Price of Tickets rises 8.9 percent” Wisconsin State Journal
4/6/94
“How fair are our Taxes” WSJ 1/10/96
“Unemployment duration and labor market tightness” Chicago Fed
Letter March 96
“Brewers at the bottom of the barrel in salaries” Wisconsin
State Journal 11/14/96
“Life is a Gamble” WSJ
“It’s a Crime What Some People Do With Statistics” WSJ 8/30/00
Mayday at 41,000 Feet–Watch Those Units! American Educator
Winter 2003/04
“Is Inequality Over Wages Worsening?” WSJ A2 1/19/06
“Rich Get Richer, But Not as Fast As you Think” WSJ A2 3/2/06
quintiles