Ch2 - YSU

Chapter 2 – Descriptive Statistics: Tabular and Graphical Presentations
Chapter Outline
 Summarize Qualitative Data
  • Frequency Distribution
  • Bar Charts and Pie Charts
 Summarize Quantitative Data
  • Frequency Distribution
  • Histogram
  • Cumulative Distributions
 Crosstabulations
 Scatter Diagrams
A Note
 An important aspect of statistics is presenting the data in an informative way that reveals any patterns in the data (no pattern is a pattern!).
 Different types of data require different summarization methods and statistical analyses.
Summarize Qualitative Data
 Check out the following data. What pattern can you detect from the
raw data?
Table 2.1 Data from a sample of 50 individual responses to the question 'Which network's evening news do you prefer to watch?'
NBC  CBS  NBC  ABC  FOX  NBC  NBC  CNN  NBC  CBS
CBS  FOX  NBC  CNN  ABC  FOX  NBC  CBS  FOX  ABC
CNN  FOX  CBS  CBS  CNN  NBC  NBC  CBS  FOX  ABC
CBS  NBC  FOX  NBC  FOX  NBC  CNN  NBC  CBS  CBS
ABC  NBC  CNN  FOX  CBS  FOX  CBS  ABC  NBC  CNN
Summarize Qualitative Data
Frequency Distribution
 The raw data in Table 2.1 do not reveal any meaningful information (such as a pattern) directly. For qualitative data, we can summarize and present the raw data with a frequency distribution.
 A frequency distribution is a tabular summary of data showing the number (frequency) of items in each nonoverlapping class.
• Please refer to the Excel demonstration (Chapter 2) on how to construct the frequency distribution for the data in Table 2.1.
• The outcome is shown on the next slide.
Frequency Distribution for Data in Table 2.1
Network   Frequency
ABC        6
CBS       12
CNN        7
FOX       10
NBC       15
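As a rough alternative to the Excel demonstration, the tally can be sketched in Python. This is only an illustration, assuming the 50 responses are typed in the order they appear in Table 2.1; the variable names are made up for this example.

```python
from collections import Counter

# The 50 responses from Table 2.1, in the order listed.
responses = """
NBC CBS NBC ABC FOX NBC NBC CNN NBC CBS
CBS FOX NBC CNN ABC FOX NBC CBS FOX ABC
CNN FOX CBS CBS CNN NBC NBC CBS FOX ABC
CBS NBC FOX NBC FOX NBC CNN NBC CBS CBS
ABC NBC CNN FOX CBS FOX CBS ABC NBC CNN
""".split()

# Counter tallies how many times each network (class) appears.
frequency = Counter(responses)

# Print one row per class, in alphabetical order.
for network in sorted(frequency):
    print(f"{network:<8}{frequency[network]:>3}")
```

Each response falls into exactly one network, so the classes are nonoverlapping and the frequencies add up to the sample size.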
Relative Frequency
 To obtain the relative frequency, simply divide the frequency of each class by the total number of observations (n). For the data in Table 2.1, n equals 50. The percent frequency is the relative frequency multiplied by 100.
Network   Frequency   Relative Frequency   Percent Frequency
ABC        6          0.12                 12
CBS       12          0.24                 24
CNN        7          0.14                 14
FOX       10          0.20                 20
NBC       15          0.30                 30
For example, the relative frequency for NBC is 15/50 = 0.30.
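The division by n can be sketched in a few lines of Python. This is a minimal illustration that starts from the class frequencies for Table 2.1; the dictionary names are assumptions of this example, not part of the course material.

```python
# Frequencies taken from the frequency distribution for Table 2.1.
frequency = {"ABC": 6, "CBS": 12, "CNN": 7, "FOX": 10, "NBC": 15}

n = sum(frequency.values())  # total number of observations

# Relative frequency = class frequency / n; percent frequency = 100 * frequency / n.
relative = {network: count / n for network, count in frequency.items()}
percent = {network: 100 * count / n for network, count in frequency.items()}

for network in frequency:
    print(f"{network:<6}{frequency[network]:>4}{relative[network]:>8.2f}{percent[network]:>6.0f}")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on the arithmetic.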
Bar Charts and Pie Charts
 A frequency distribution is often presented in a graph (a bar chart or a pie chart) to communicate information visually.
• Please refer to the Excel demonstration (Chapter 2) on how to create a bar chart and a pie chart for the frequency distribution on the previous slide.
[Figure: bar chart and pie chart of the percent frequency distribution. Pie slices: ABC 12%, CBS 24%, CNN 14%, FOX 20%, NBC 30%.]
 Both charts indicate that the most popular network evening news is on NBC.
Summarize Quantitative Data
 Check out the following data. Can you quickly decide how many
classes there should be in the construction of a frequency distribution?
Table 2.2 Average monthly sales volume ($1000) of a sample of 50 Starbucks stores in New York City in 2012
 95   77   97   99   89  108  120   78   79   88
 67   97   97   79   93   99  103  106   82   93
 93   97   95   61  109   77   88  100  109   90
 86   89   97   93   88   93  105   87   82   98
119  104   93  104  101  118  105   82   73  101
Summarize Quantitative Data
Frequency Distribution
 Unlike the qualitative data in Table 2.1, the quantitative data in Table 2.2 do not directly suggest a set of classes.
 Apply the following procedure to construct a frequency distribution for quantitative data.
• Determine the number of nonoverlapping classes;
• Determine the class width;
• Determine the class limits;
• Count the number of items in each class.
Summarize Quantitative Data
Frequency Distribution
 Step one – Determine the number of nonoverlapping classes
• As a guideline, you can use the '2 to the power of k' rule: find the smallest integer k such that $2^k \ge n$, where n is the sample size. Applying the rule to the data in Table 2.2, we find k = 6, since $2^5 = 32 < 50$ but $2^6 = 64 \ge 50$ (n = 50). Thus, we set the number of classes to 6. (Note that this is only a suggestion, not an absolute rule.)
• In practice, the number of classes is usually between 5 and 20.
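The '2 to the power of k' rule can be sketched as a short loop. The function name below is hypothetical, chosen only for this illustration.

```python
def classes_by_power_rule(n):
    """Return the smallest integer k such that 2**k >= n."""
    k = 0
    while 2 ** k < n:
        k += 1
    return k


# For the Starbucks sample, n = 50.
print(classes_by_power_rule(50))  # 6, because 2**5 = 32 < 50 <= 64 = 2**6
```

Since the rule is only a guideline, the result is a starting point that can be adjusted to suit the data.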
Summarize Quantitative Data
Frequency Distribution
 Step two – Determine the class width
• Use equal class widths to avoid misinterpretation.
• Approximately, $\text{class width} = \dfrac{\text{largest value} - \text{smallest value}}{\text{number of classes}}$
• For the data in Table 2.2, class width = (120 - 61)/6 ≈ 9.83. We can round it up to 10, which is a much more convenient value to work with for class width.
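The class width arithmetic can be checked with a small sketch. It assumes that "rounding up" means taking the next whole number, which happens to give the convenient value 10 here; for other data sets a different convenient value might be preferred.

```python
import math

largest, smallest = 120, 61  # extremes of the data in Table 2.2
num_classes = 6              # from step one

# Approximate class width = (largest value - smallest value) / number of classes.
approx_width = (largest - smallest) / num_classes

# Assumption of this sketch: round up to the next whole number.
class_width = math.ceil(approx_width)

print(round(approx_width, 2), class_width)  # 9.83 10
```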
Summarize Quantitative Data
Frequency Distribution
 Step three – Determine the class limits
• Class limits should be set so that each data point belongs to one and only one class, and no data point is left out.
• Like the class width, class limits can use values that are convenient to work with.
- In our example, the smallest value is 61 and the class width is set as 10, so the lowest class can be set as 61-70. Because the data are whole numbers, this class covers 70 - 61 + 1 = 10 values, which matches the class width.
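One way to generate all six pairs of class limits is sketched below. It assumes whole-number data, so that each class spans `class_width` consecutive integers; the variable names are illustrative only.

```python
smallest = 61     # smallest value in Table 2.2
class_width = 10  # from step two
num_classes = 6   # from step one

# For whole-number data, upper limit = lower limit + class_width - 1,
# so consecutive classes neither overlap nor leave gaps.
class_limits = [
    (smallest + i * class_width, smallest + (i + 1) * class_width - 1)
    for i in range(num_classes)
]

print(class_limits)
```

The last class ends at 120, the largest value in the data, so no data point is left out.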
Summarize Quantitative Data
Frequency Distribution
 Step four – Count the number of items in each class
• For the data in Table 2.2, the frequency distribution is constructed as follows:
Sales Volume ($1000)   Frequency
61-70                   2
71-80                   6
81-90                  11
91-100                 17
101-110                11
111-120                 3
Total                  50
• Please refer to the Excel demonstration (Chapter 2) on how to construct the frequency distribution for the data in Table 2.2.
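The counting step can also be sketched in Python instead of Excel. This illustration assumes the 50 values are entered in the order they appear in Table 2.2 and uses the class limits 61-70 through 111-120; the names are made up for the example.

```python
# The 50 sales volumes ($1000) from Table 2.2, in the order listed.
sales = [int(v) for v in """
95 77 97 99 89 108 120 78 79 88
67 97 97 79 93 99 103 106 82 93
93 97 95 61 109 77 88 100 109 90
86 89 97 93 88 93 105 87 82 98
119 104 93 104 101 118 105 82 73 101
""".split()]

class_limits = [(61, 70), (71, 80), (81, 90), (91, 100), (101, 110), (111, 120)]

# Count the values that fall between each lower and upper limit (inclusive).
frequency = {
    f"{lo}-{hi}": sum(lo <= v <= hi for v in sales)
    for lo, hi in class_limits
}

for label, count in frequency.items():
    print(f"{label:<10}{count:>3}")
print(f"{'Total':<10}{sum(frequency.values()):>3}")
```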
Relative Frequency
Example: Monthly Sales Volume of 50 Starbucks Stores
Sales Volume ($1000)   Frequency   Relative Frequency   Percent Frequency
61-70                   2          0.04                  4
71-80                   6          0.12                 12
81-90                  11          0.22                 22
91-100                 17          0.34                 34
101-110                11          0.22                 22
111-120                 3          0.06                  6
For example, the relative frequency for the class 111-120 is 3/50 = 0.06.
Interpretation of Frequency Distribution
The frequency distribution of monthly sales volume of 50 Starbucks stores in NYC reveals that
 39 stores (78% of the sample) had average monthly sales in 2012 between $81,000 and $110,000.
 4% of the sample stores had average monthly sales of no more than $70,000.
 6% of the sample stores had average monthly sales of $111,000 or more.
Histogram
 Like a bar chart, a histogram is a graphical presentation of a frequency distribution.
 The height of the rectangle (bar) drawn above each class interval corresponds to that class's frequency or relative frequency.
 Unlike a bar chart, a histogram has no gap between rectangles of adjacent classes.
• Please refer to the Excel demonstration (Chapter 2) on how to create a histogram for the frequency distribution of sales volume of Starbucks stores.
Histogram
Monthly Sales Volume of 50 Starbucks Stores in NYC
[Figure: histogram titled "Average Monthly Sales Volume of a Sample of 50 Starbucks Stores in NYC in 2012". Horizontal axis: Sales Volume ($1000), with classes 61-70, 71-80, 81-90, 91-100, 101-110, 111-120. Vertical axis: Frequency. Bar heights: 2, 6, 11, 17, 11, 3.]
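For readers without Excel at hand, the shape of this histogram can be approximated in plain text. The sketch below is only a rough stand-in: it draws horizontal bars of `#` characters rather than adjacent vertical rectangles, and the function name is hypothetical.

```python
classes = ["61-70", "71-80", "81-90", "91-100", "101-110", "111-120"]
frequency = [2, 6, 11, 17, 11, 3]


def text_histogram(labels, counts, symbol="#"):
    """Return one line per class: the label, a bar of `count` symbols, and the count."""
    width = max(len(label) for label in labels)
    return [
        f"{label:>{width}} | {symbol * count} {count}"
        for label, count in zip(labels, counts)
    ]


lines = text_histogram(classes, frequency)
print("\n".join(lines))
```

The bar lengths rise to a peak at the 91-100 class and fall away on both sides.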
Histogram
 Skewness – the lack of symmetry.
 Symmetric distribution, such as height or weight of human
population.
[Figure: histogram of a symmetric distribution. Vertical axis: Relative Frequency, 0 to .35.]
Histogram
 Negative Skewness – a longer tail to the left.
 An example: exam scores
[Figure: histogram of a negatively skewed distribution. Vertical axis: Relative Frequency, 0 to .35.]
Histogram
 Positive Skewness – a longer tail to the right.
 An example: home values
[Figure: histogram of a positively skewed distribution. Vertical axis: Relative Frequency, 0 to .35.]
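The slides describe skewness only through the shape of the histogram. As an illustration, the sign of one common numerical measure, the moment coefficient of skewness, lines up with the three cases above. Both the choice of this measure and the tiny data sets below are assumptions made for this sketch; they do not come from the chapter.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5, using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# HYPOTHETICAL data sets, invented only to show the sign of the measure.
symmetric = [1, 2, 3, 4, 5]
left_tail = [1, 7, 8, 9, 10]   # one unusually low value: longer tail to the left
right_tail = [1, 2, 3, 4, 10]  # one unusually high value: longer tail to the right

for name, data in [("symmetric", symmetric), ("left tail", left_tail), ("right tail", right_tail)]:
    print(f"{name:<11}{skewness(data):+.2f}")
```

A value near zero suggests symmetry, a negative value a longer left tail, and a positive value a longer right tail.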
Cumulative Distributions
 Cumulative frequency distribution – shows the number of items with values less than or equal to the upper limit of each class.
 Cumulative relative frequency distribution – shows the proportion (percentage) of items with values less than or equal to the upper limit of each class.
Cumulative Distributions
Monthly sales volume of 50 Starbucks stores
Sales Volume ($1000)   Cumulative Frequency   Cumulative Relative Frequency
≤ 70                    2                     0.04
≤ 80                    8                     0.16
≤ 90                   19                     0.38
≤ 100                  36                     0.72
≤ 110                  47                     0.94
≤ 120                  50                     1.00
For example, the cumulative frequency for ≤ 90 is 2 + 6 + 11 = 19, and the cumulative relative frequency is 19/50 = 0.38.
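The running totals can be sketched with Python's `itertools.accumulate`, starting from the class frequencies of the Starbucks example. The variable names are illustrative only.

```python
from itertools import accumulate

upper_limits = [70, 80, 90, 100, 110, 120]  # upper class limits ($1000)
frequency = [2, 6, 11, 17, 11, 3]           # class frequencies

n = sum(frequency)

# Running total of the frequencies up to and including each class.
cumulative = list(accumulate(frequency))

# Divide each running total by n to get the cumulative relative frequency.
cumulative_relative = [c / n for c in cumulative]

for limit, c, r in zip(upper_limits, cumulative, cumulative_relative):
    print(f"<= {limit:<5}{c:>4}{r:>7.2f}")
```

The last cumulative frequency always equals n, and the last cumulative relative frequency always equals 1.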
Crosstabulations and Scatter Diagrams
 So far, we have studied methods of summarizing the data of one variable at a time.
 In business, it is important to understand the relationships among different variables, for instance, the relationship between sales volume and advertising expenditure.
 Crosstabulations and scatter diagrams are two methods of descriptive statistics that summarize data to reveal the relationship between two variables.
Crosstabulations
 A crosstabulation is a tabular summary of data for two
variables.
 The two variables can be either qualitative or quantitative
or one of each.
 The left and top margin labels show the classes for
the two variables.
Crosstabulations
Example: Finger Lakes Homes
The number of Finger Lakes homes sold for each style and price for the past two years is shown below. Price range is the quantitative variable and home style is the categorical variable.
                           Home Style
Price Range    Colonial   Log   Split   A-Frame   Total
< $200,000        18        6     19       12       55
≥ $200,000        12       14     16        3       45
Total             30       20     35       15      100
Crosstabulations

Example: Finger Lakes Homes
Insights Gained from Preceding Crosstabulation
• The greatest number of homes (19) in the sample
are a split-level style and priced at less than
$200,000.
• Only three homes in the sample are an A-Frame
style and priced at $200,000 or more.
Crosstabulations
Example: Finger Lakes Homes
 The row totals in the right margin (55 homes under $200,000 and 45 homes at $200,000 or more) are the frequency distribution for the price range variable.
 The column totals in the bottom margin (Colonial 30, Log 20, Split 35, A-Frame 15) are the frequency distribution for the home style variable.
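The margins of the Finger Lakes crosstabulation can be recomputed from the eight cell counts. The sketch below is illustrative: the dictionary layout and names are assumptions of this example, and the second row is labeled as $200,000 or more, as the insights slide describes it.

```python
styles = ["Colonial", "Log", "Split", "A-Frame"]

# Cell counts of the crosstabulation, one list per price range,
# in the same order as `styles`.
counts = {
    "< $200,000": [18, 6, 19, 12],
    ">= $200,000": [12, 14, 16, 3],
}

# Row totals: frequency distribution for the price range variable.
row_totals = {price: sum(row) for price, row in counts.items()}

# Column totals: frequency distribution for the home style variable.
column_totals = {
    style: sum(row[i] for row in counts.values())
    for i, style in enumerate(styles)
}

grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```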
Crosstabulations: Simpson’s Paradox
 Data in two or more crosstabulations are often
aggregated to produce a summary crosstabulation.
 We must be careful in drawing conclusions about the
relationship between the two variables in the
aggregated crosstabulation.
 In some cases the conclusions based upon an
aggregated crosstabulation can be completely
reversed if we look at the unaggregated data. The
reversal of conclusions based on aggregate and
unaggregated data is called Simpson’s paradox.
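A small numerical sketch may make the reversal concrete. The counts below are HYPOTHETICAL, invented purely for illustration, and the labels "A", "B", "group 1" and "group 2" are placeholders rather than anything from the chapter.

```python
# HYPOTHETICAL counts, invented for illustration: (successes, trials)
# for two options, A and B, recorded separately in two groups.
data = {
    "group 1": {"A": (8, 10), "B": (70, 100)},
    "group 2": {"A": (30, 100), "B": (2, 10)},
}


def rate(successes, trials):
    """Success rate as a proportion."""
    return successes / trials


# Unaggregated view: compare A and B within each group.
for group, options in data.items():
    a, b = rate(*options["A"]), rate(*options["B"])
    print(f"{group}: A = {a:.2f}, B = {b:.2f}")

# Aggregated view: add the counts across groups, then compare again.
totals = {
    option: tuple(sum(data[g][option][i] for g in data) for i in range(2))
    for option in ("A", "B")
}
agg_a, agg_b = rate(*totals["A"]), rate(*totals["B"])
print(f"aggregated: A = {agg_a:.2f}, B = {agg_b:.2f}")
```

In these made-up numbers A has the higher rate within each group, yet B has the higher rate after aggregation, because most of A's trials sit in the group where rates are low for both options.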
Scatter Diagrams
 A scatter diagram is a graphical presentation of the
relationship between two quantitative variables.
 One variable is shown on the horizontal axis and the other
variable is shown on the vertical axis.
 The general pattern of the plotted points suggests the
overall relationship between the variables.
 A trendline provides a linear approximation of the
relationship.
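Scatter Diagrams
 A Positive Relationship
[Figure: scatter diagram of y against x showing a positive relationship.]
Scatter Diagrams
 A Negative Relationship
[Figure: scatter diagram of y against x showing a negative relationship.]
Scatter Diagrams
 No Relationship
[Figure: scatter diagram of y against x showing no apparent relationship.]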
Scatter Diagrams
 An example
Is there a relationship between gas prices and stock prices?
• For the gas price variable, let us use the U.S. retail gas price;
• For the stock price variable, let us use the price of SPY (ticker symbol) as a proxy for the S&P 500 Index;
• Both variables use weekly data.
The data are shown on the next slide.
Data of U.S. Retail Gas Price and S&P 500 Proxy Price (SPY)
Date           U.S. Retail Gas Price   SPY
Jan 28, 2013   3.296                   151.24
Feb 04, 2013   3.471                   151.8
Feb 11, 2013   3.537                   152.11
Feb 18, 2013   3.69                    151.89
Feb 25, 2013   3.722                   152.11
Mar 04, 2013   3.698                   155.44
Mar 11, 2013   3.644                   155.83
Mar 18, 2013   3.633                   155.6
Mar 25, 2013   3.616                   156.67
Apr 01, 2013   3.572                   155.86
Scatter Diagrams
The relationship between gas prices and stock prices
[Figure: scatter diagram. Horizontal axis: U.S. Retail Gas Price ($/gallon), 3.25 to 3.75. Vertical axis: SPY, 150 to 157.]
Scatter Diagrams
The relationship between gas prices and stock prices
 The plotted points in the previous scatter diagram indicate a positive relationship between the U.S. retail gas price and the value of SPY.
 The relationship is loose: when the gas price is high, the S&P 500 Index tends to be high.
 We need to be cautious in drawing conclusions from a scatter diagram. In this example, there are only 10 data points. Much more data are required to rigorously examine the relationship between gas prices and stock prices.
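A trendline for these 10 points can be sketched with the usual least-squares formulas. The slides do not say how the trendline is fitted, so least squares is an assumption of this example, and the function name is hypothetical.

```python
# Weekly data from the table (Jan 28 to Apr 01, 2013).
gas = [3.296, 3.471, 3.537, 3.69, 3.722, 3.698, 3.644, 3.633, 3.616, 3.572]
spy = [151.24, 151.8, 152.11, 151.89, 152.11, 155.44, 155.83, 155.6, 156.67, 155.86]


def trendline(x, y):
    """Least-squares line y = intercept + slope * x (an assumed fitting method)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = trendline(gas, spy)
print(f"SPY ~ {intercept:.2f} + {slope:.2f} * gas price")
```

A positive slope agrees with the positive relationship read off the scatter diagram, but with only 10 points the fitted line should be treated as a description of this sample, not as evidence of a general relationship.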