Chapter 1 Data Presentation
Statistics and Data
Measurement Levels
Summarizing Data
Symmetry and Skewness
Statistics and Data
• Statistics – a collection of techniques used in analyzing data
– also, the numbers produced by the analysis (e.g., an average)
• Data – a collection of measurements made on a number of subjects
• Subjects – the units from which information is drawn
– also called experimental units
Data are usually stored in a row-and-column display called a spreadsheet.
From page 2 of the textbook
Each row represents a subject, and each column holds the measurements of one variable.
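The row-and-column layout can be sketched in a few lines of Python. The records below are hypothetical and are not taken from the textbook; the variable names (`major`, `year_level`, `age`) and the helper `column` are assumptions made only for illustration.

```python
# Hypothetical records for illustration only; not data from the textbook.
# Each dictionary is one row (one subject); each key is one column (one variable).
rows = [
    {"subject": 1, "major": "ACT", "year_level": "Freshman", "age": 18},
    {"subject": 2, "major": "GBS", "year_level": "Sophomore", "age": 20},
    {"subject": 3, "major": "MKT", "year_level": "Freshman", "age": 19},
]

def column(rows, variable):
    """Return every subject's measurement of one variable (one spreadsheet column)."""
    return [row[variable] for row in rows]

print(rows[0])               # one row: all measurements made on subject 1
print(column(rows, "age"))   # one column: the 'age' variable across all subjects
```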
Measurement Levels of Data
Types of Data
• Categorical Data – variables whose values are categories
Nominal – possible values are just names of categories
– no apparent ordering among the possible values
examples: Gender, Major, College
Ordinal – there is an obvious ordering of the possible values
examples: Year level (Freshman, Sophomore, …), Military rank
• Numerical Data – variables that yield numerical data
Interval – differences (intervals) between values are meaningful, but ratios are not
– zero does not mean absence of the variable
examples: Temperature, IQ
60 °F vs 30 °F: there is a 30-degree difference between the two temperatures, but that does not mean 60 °F is twice as warm as 30 °F
Ratio – ratios between values are meaningful (zero means absence of the variable)
examples: Age, Height, Number of classes taken this semester
Ratio data come in two further types:
Discrete: result of a counting process
examples: number of classes being taken, number of students in a class
Continuous: result of a measuring process
examples: height, age, weight, velocity
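The temperature example can be checked with a short calculation. This is a minimal sketch: the function name `fahrenheit_to_celsius` is an assumption, and the only fact used beyond the slides is the standard Fahrenheit-to-Celsius conversion.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (f - 32) * 5 / 9

warm_f, cool_f = 60, 30
warm_c = fahrenheit_to_celsius(warm_f)
cool_c = fahrenheit_to_celsius(cool_f)

# The difference is meaningful in either unit:
# 30 Fahrenheit degrees, about 16.7 Celsius degrees.
print(warm_f - cool_f, round(warm_c - cool_c, 1))

# The ratio is not: it is 2.0 in Fahrenheit but about -14.0 in Celsius,
# because zero on either scale does not mean "no temperature".
print(warm_f / cool_f, round(warm_c / cool_c, 1))
```

Because the ratio changes with the unit of measurement, "twice as warm" has no fixed meaning for interval data.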
Summary of Data type and Levels
Summarizing Data
Summarizing Categorical Data
1. Relative Frequency Table – lists the frequency and relative frequency (proportion of the total) of each category of a categorical variable
2. Bar Chart – a plot of the relative frequency table; the order of the categories is arbitrary
3. Pie Chart – also a plot of the relative frequency table, drawn as slices of a circle
Relative Frequency Table
Major    Frequency    Rel. Freq.
ACT         16          0.29
GBS         28          0.50
MGT          8          0.14
MKT          4          0.07
Total       56          1.00
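The relative frequencies in the table are each frequency divided by the total. A minimal sketch, using the counts from the table; the function name `relative_frequencies` is an assumption:

```python
# Frequencies of each major, taken from the table above.
frequencies = {"ACT": 16, "GBS": 28, "MGT": 8, "MKT": 4}

def relative_frequencies(freqs):
    """Divide each category's frequency by the total number of subjects."""
    total = sum(freqs.values())
    return {category: count / total for category, count in freqs.items()}

rel = relative_frequencies(frequencies)
for major, count in frequencies.items():
    print(f"{major:<6}{count:>4}{rel[major]:>8.2f}")
print(f"{'Total':<6}{sum(frequencies.values()):>4}{sum(rel.values()):>8.2f}")
```

For example, ACT has 16 / 56 ≈ 0.29.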
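Bar Chart of the Relative Frequency Table
[Figure: two bar charts of Major (ACT, GBS, MGT, MKT). One panel, "Using Frequency", has a vertical axis labeled Frequency running from 0 to 30. The other panel, "Using Relative Frequency", is titled "Percent Distribution of Major" and has a vertical axis labeled Rel. Freq. running from 0 to 0.6, with bars labeled 0.29, 0.5, 0.14, and 0.07.]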
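A bar chart can be approximated in plain text to show how bar length tracks frequency. This is only a sketch: the function `text_bar_chart` and its `units_per_mark` parameter are hypothetical, and a plotting library would normally be used instead.

```python
# Frequencies of each major, taken from the relative frequency table.
frequencies = {"ACT": 16, "GBS": 28, "MGT": 8, "MKT": 4}

def text_bar_chart(freqs, units_per_mark=2):
    """Return one line per category; each '#' stands for `units_per_mark` subjects."""
    lines = []
    for category, count in freqs.items():
        bar = "#" * round(count / units_per_mark)
        lines.append(f"{category} | {bar} {count}")
    return lines

for line in text_bar_chart(frequencies):
    print(line)
```

Replacing the frequencies with relative frequencies changes only the scale of the bars, not their relative heights.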
Pie Chart
[Figure: pie chart titled "Percent Distribution of Major", with slices ACT 0.29, GBS 0.5, MGT 0.14, and MKT 0.07.]
Summarizing Numerical Data
Stem and Leaf Plot
Relative Frequency Table and Histogram
The concept is the same as for categorical data, except that the values are first grouped into classes.
Determine the number of classes and the class width.
For example, given MIN, MAX, and the number of classes: width = (MAX - MIN) / (classes - 1)
The class intervals should be mutually exclusive.
The histogram is simply the graphical presentation of the relative frequency table.
Box-and-Whisker Plot
A graphical picture of how the data are distributed across quarters, built from the five values below.
Useful for comparing the distributions of two or more variables.
Minimum
Q1 (first quartile) – the upper boundary of the first quarter
Median – divides the data into lower and upper halves.
Q3 (third quartile) – the upper boundary of the third quarter
Maximum
Dotplot
Similar to the histogram, but used for moderately large data sets.
It can also be used to study outliers in the data.
Stem-and-leaf Display
Summer 2 Quiz Data:
8, 11, 13, 19, 21, 23, 25, 25, 25, 28, 31, 35, 39, 47
Stemplot of Summer 2 Quiz
0 | 8
1 | 1 3 9
2 | 1 3 5 5 5 8
3 | 1 5 9
4 | 7
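The stemplot can be reproduced by splitting each score into a tens digit (the stem) and a ones digit (the leaf). A minimal sketch, assuming non-negative whole-number scores; the function name `stemplot` is an assumption:

```python
quiz = [8, 11, 13, 19, 21, 23, 25, 25, 25, 28, 31, 35, 39, 47]

def stemplot(values):
    """Build stem-and-leaf lines, assuming non-negative whole numbers.

    The stem is the tens part of each value and the leaf is the ones digit.
    """
    leaves = {}
    for value in sorted(values):
        leaves.setdefault(value // 10, []).append(value % 10)
    lines = []
    # Include every stem between the smallest and largest, even if it has no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {leaf_text}".rstrip())
    return lines

for line in stemplot(quiz):
    print(line)
```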
Relative Frequency Table and Histogram
Summer 2 Quiz Data:
8, 11, 13, 19, 21, 23, 25, 25, 25, 28, 31, 35, 39, 47
For example, suppose 4 classes are desired. MIN = 8, MAX = 47.
Class width = (47 - 8) / (4 - 1) = 39 / 3 = 13
Class       Freq.    Rel. Freq.
-5 to 8       1         7%
 8 to 21      4        29%
21 to 34      6        43%
34 to 47      3        21%
Total        14       100%
Note: each interval includes its right endpoint but not its left endpoint.
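The table can be rebuilt from the raw scores. The sketch below follows the slides' width rule and their convention that the first class ends at the minimum and that each class includes its right endpoint only. The function name `frequency_table` is an assumption.

```python
quiz = [8, 11, 13, 19, 21, 23, 25, 25, 25, 28, 31, 35, 39, 47]

def frequency_table(values, n_classes):
    """Return (left, right, frequency, relative frequency) for each class.

    Uses the slides' rule width = (MAX - MIN) / (classes - 1). The first class
    ends at MIN, and each class is the interval (left, right].
    """
    low, high = min(values), max(values)
    width = (high - low) / (n_classes - 1)
    edges = [low + (i - 1) * width for i in range(n_classes + 1)]
    table = []
    for left, right in zip(edges, edges[1:]):
        count = sum(1 for v in values if left < v <= right)
        table.append((left, right, count, count / len(values)))
    return table

for left, right, count, rel in frequency_table(quiz, 4):
    print(f"{left:g} to {right:g}: {count} ({rel:.0%})")
```

With other data the class edges may not be whole numbers, so values that fall very close to an edge deserve a manual check.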
Histogram of the Summer 2 Quiz Data
Boxplot or Box-and-Whisker Plot
Minimum = 8
Q1 (first quartile) =19
Median = 25
Q3 (third quartile) = 31
Maximum = 47
Summer 2 Quiz Data:
8, 11, 13, 19, 21, 23, 25, 25, 25, 28, 31, 35, 39, 47
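The five values above can be reproduced in code. The slides do not say which quartile rule they use; the sketch below assumes Q1 and Q3 are the medians of the lower and upper halves of the sorted data, which matches the values shown. Other quartile conventions can give slightly different answers. The function name `five_number_summary` is an assumption.

```python
from statistics import median

quiz = [8, 11, 13, 19, 21, 23, 25, 25, 25, 28, 31, 35, 39, 47]

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves of
    the sorted data, leaving out the middle value when the count is odd.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return min(data), median(lower), median(data), median(upper), max(data)

print(five_number_summary(quiz))
```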
Symmetry and Skewness
Examining symmetry and skewness reveals the shape of the data distribution.
If the left tail is longer than the right tail, then the data is left-skewed.
If the right tail is longer than the left tail, then the data is right-skewed.
If the left tail is almost the same length as the right tail, then the data is symmetric.
Stem-and-leaf displays, histograms, and boxplots can all be used to examine symmetry and skewness.
The left tail is longer than the right tail, hence the data is left-skewed.
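One rough way to compare tail lengths numerically is to measure how far the minimum and the maximum lie from the median. That measure is an assumption of this sketch, not a definition from the slides. The function names, the `tolerance` parameter, and the three short lists are hypothetical. A plot remains the better tool for judging shape.

```python
from statistics import median

def tail_lengths(values):
    """Return (left, right): distances from the median to the minimum and maximum."""
    middle = median(values)
    return middle - min(values), max(values) - middle

def describe_shape(values, tolerance=0.1):
    """Label data by comparing tail lengths.

    Tails whose lengths differ by at most `tolerance` times the longer tail
    are treated as "almost the same" (a hypothetical cutoff).
    """
    left, right = tail_lengths(values)
    longest = max(left, right)
    if longest == 0 or abs(left - right) <= tolerance * longest:
        return "symmetric"
    return "left-skewed" if left > right else "right-skewed"

# Hypothetical data sets for illustration only.
print(describe_shape([1, 17, 18, 19, 20]))   # long left tail
print(describe_shape([1, 2, 3, 4, 20]))      # long right tail
print(describe_shape([1, 2, 3, 4, 5]))       # equal tails

# Tail lengths of the Summer 2 Quiz data.
quiz = [8, 11, 13, 19, 21, 23, 25, 25, 25, 28, 31, 35, 39, 47]
print(tail_lengths(quiz))
```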