MEASURES OF CENTRALITY
Last lecture summary
• Which graphs did we meet?
• scatter plot
• bar chart
• histogram
• pie chart
• How do they work, and what are their advantages and disadvantages?
Random noise
SIZE [ft²]    COST [$]
1 300         88 000
1 400         72 000
1 600         94 000
1 900         86 000
2 100         112 000
2 300         98 000
Histogram
• Now I will collect heights of all of you in this room.
• Use Interactive Histogram Applet:
http://www.shodor.org/interactivate/activities/Histogram/
• interval, bin
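To make "interval" and "bin" concrete, here is a minimal sketch of how a histogram counts values per bin. The heights are made up for illustration (they are not the class data), and the helper name `histogram` and its parameters are hypothetical.

```python
from collections import Counter


def histogram(values, bin_width, start=0):
    """Count how many values fall into each bin [lower, lower + bin_width)."""
    counts = Counter()
    for v in values:
        # Lower edge of the bin (interval) that contains v.
        lower = start + ((v - start) // bin_width) * bin_width
        counts[lower] += 1
    return dict(sorted(counts.items()))


# Hypothetical heights in cm, invented for this example.
heights = [158, 162, 165, 167, 170, 171, 174, 175, 178, 183, 185, 191]

# Print a simple text histogram, one row per 10 cm bin.
for lower, count in histogram(heights, bin_width=10, start=150).items():
    print(f"{lower}-{lower + 10}: {'#' * count}")
```

Changing `bin_width` changes how many bars there are and how tall each one is, which is what the applet lets you explore interactively.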
Histogram – Body fat
• In the Interactive Histogram Applet, choose the "Body fat % in 252 men" dataset.
• Find a reasonable bin size.
• Answer the following question: regardless of bin size, which of these statements are always true?
• Most scores fall around 20%.
• The shape is roughly symmetrical.
• Most scores fall in the middle of the distribution.
• There are more scores between 15 and 25 than between 35 and 50.
• There are more scores between 0 and 10 than between 18 and 24.
• Relatively more men have body fat above 35% or below 5%.
Histogram – Income distribution
• United States Census Bureau – http://www.census.gov
Income    Number of houses
10 000    9401
20 000    14447
30 000    13642
40 000    12388
50 000    11028
Histogram – Income distribution
• This is an example of a positively skewed (right-skewed) distribution.
• This distribution is not symmetrical.
• Most incomes fall on the left side of the distribution.
Bar chart and scatter plot
• Which scatter plot corresponds to this bar chart?
Pie chart to histogram
• Which histogram looks like it came from the same data?
About statistics
• Statistics – the science of collecting, organizing,
summarizing, analyzing, and interpreting data
• Goal – use imperfect information (our data) to infer facts,
make predictions, and make decisions
• Descriptive statistics – summarizing data with numbers
or pictures
• Inferential statistics – drawing conclusions or making decisions
based on data
Choosing a profession
Chemistry    50 000 – 60 000
Geography    40 000 – 55 000
Choosing a profession
• We made an interval estimate.
• But ideally we want one number that describes the entire
dataset. This allows us to quickly summarize all our data.
Choosing a profession
Chemistry
Geography
1. The value at which frequency is highest.
2. The value where frequency is lowest.
3. Value in the middle.
4. The biggest value on the x-axis.
5. Mean
Three big M’s
Chemistry
Geography
• The value at which frequency is highest is called the
mode, i.e. the most common value.
• The value in the middle of the distribution is called the
median.
• The mean is the mean.
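A minimal sketch of the three M's using Python's standard `statistics` module. The salary figures are hypothetical, invented only to show how the three numbers can differ on the same data.

```python
import statistics

# Hypothetical salaries, made up for illustration (one very high value).
salaries = [40000, 45000, 45000, 50000, 52000, 55000, 119000]

mode_value = statistics.mode(salaries)      # most common value
median_value = statistics.median(salaries)  # middle value of the sorted data
mean_value = statistics.mean(salaries)      # sum of values divided by their count

print("mode:  ", mode_value)
print("median:", median_value)
print("mean:  ", mean_value)
```

In this made-up dataset the single very high salary pulls the mean above the median, while the mode stays at the value that occurs most often.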
Quick quiz
• What is the mode in our data?
Mode in negatively skewed distribution
Mode in uniform distribution
Multimodal distribution
Mode in categorical data
More on the mode
True or False?
1. The mode can be used to describe any type of data we have,
whether it’s numerical or categorical.
2. All scores in the dataset affect the mode.
3. If we take a lot of samples from the same population, the
mode will be the same in each sample.
4. There is an equation for the mode.
• Ad 3.
• http://onlinestatbook.com/stat_sim/sampling_dist/
• The mode changes as you change the bin size.
• The mode depends on how you present the data, so we can’t use
the mode to learn something about our population.
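A small sketch of the bin-size effect: the same data can have a different modal bin depending on the bin width. The scores are hypothetical and the helper name `modal_bin` is an assumption made for this example.

```python
from collections import Counter


def modal_bin(values, bin_width):
    """Return the lower edge of the bin that holds the most values."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    lower, _count = counts.most_common(1)[0]
    return lower


# Hypothetical scores, invented for this example.
scores = [1, 2, 3, 4, 6, 6, 7, 10, 11, 12, 13, 14]

for width in (2, 5):
    lower = modal_bin(scores, width)
    print(f"bin width {width}: modal bin is {lower}-{lower + width}")
```

With a bin width of 2 the tallest bar sits at 6-8, but with a bin width of 5 it moves to 10-15, even though the data have not changed.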
Life expectancy data
• Watch the TED talk by Hans Rosling, Gapminder Foundation:
http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html