DATA AND STATISTICS

Tuesday August 28
Class 2
Text problems for August 30: Chapter 2 - 2, 6 & 10
Aplia Graded Assignment: “Introduction” due
September 4, 9:00 am
Practice Problems for Chapter 1 & 2 are now
available
Please note that a tutorial for basic math concepts is
available if needed
Slide 1

Introduction Statistics: the language
Slide 2
Data and Data Sets
• Data are the facts and/or numbers collected,
summarized, analyzed, and interpreted.
• The data collected in a particular study are referred
to as the data set.
Slide 3
Elements, Variables, and Observations
• The elements are the entities on which data are
collected.
• A variable is a characteristic of interest for the elements.
• The set of measurements collected for a particular
element is called an observation.
• The total number of data values in a complete data
set is the number of elements multiplied by the
number of variables.
Slide 4
Data, Data Sets, Elements, Variables, and Observations

Company         Stock Exchange   Annual Sales ($M)   Earn/Share ($)
Dataram         NQ                     73.10              0.86
EnergySouth     N                      74.00              1.67
Keystone        N                     365.70              0.86
LandCare        NQ                    111.40              0.33
Psychemedics    N                      17.60              0.13

The company names are the element names; Stock Exchange, Annual Sales, and
Earn/Share are the variables; the body of the table is the data set.
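The counting rule for data values can be checked against the company table. The Python sketch below is only an illustration: the names data_set, n_elements, and variables, and the shortened field names, are chosen here and do not come from the course materials.

```python
# One observation (record) per element; the company name identifies the element.
# Field names are illustrative shorthand for the three variables in the table.
data_set = {
    "Dataram":      {"exchange": "NQ", "annual_sales_musd": 73.10,  "earn_per_share": 0.86},
    "EnergySouth":  {"exchange": "N",  "annual_sales_musd": 74.00,  "earn_per_share": 1.67},
    "Keystone":     {"exchange": "N",  "annual_sales_musd": 365.70, "earn_per_share": 0.86},
    "LandCare":     {"exchange": "NQ", "annual_sales_musd": 111.40, "earn_per_share": 0.33},
    "Psychemedics": {"exchange": "N",  "annual_sales_musd": 17.60,  "earn_per_share": 0.13},
}

n_elements = len(data_set)                       # number of companies
variables = list(next(iter(data_set.values())))  # variable names from one observation
n_values = sum(len(observation) for observation in data_set.values())

# Total data values = number of elements x number of variables.
print(n_elements, len(variables), n_values)  # 5 3 15
```

With 5 elements and 3 variables, the data set holds 15 data values.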
Slide 5
Scales of Measurement
Scales of measurement include:
Nominal
Ordinal
Interval
Ratio
The scale determines the amount of information
contained in the data.
The scale indicates how the data can be summarized
and statistical analyses that are most appropriate.
Slide 6
Scales of Measurement

Nominal
Data are labels or names used to identify an
attribute of the element.
A nonnumeric label or numeric code may be used.
Slide 7
Scales of Measurement

Nominal
Example:
Students of a university are classified by the
school in which they are enrolled using a
nonnumeric label such as Business, Humanities,
Education, and so on.
Alternatively, a numeric code could be used for
the school variable (e.g. 1 denotes Business,
2 denotes Humanities, 3 denotes Education, and
so on).
Slide 8
Scales of Measurement

Ordinal
The data have the properties of nominal data and
the order or rank of the data is meaningful.
A nonnumeric label or numeric code may be used.
Slide 9
Scales of Measurement

Ordinal
Example:
Students of a university are classified by their
class standing using a nonnumeric label such as
Freshman, Sophomore, Junior, or Senior.
Alternatively, a numeric code could be used for
the class standing variable (e.g. 1 denotes
Freshman, 2 denotes Sophomore, and so on).
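The difference between nominal and ordinal codes can be sketched in Python. This is an illustration only: the dictionary and function names are invented here, and the codes 3 and 4 for Junior and Senior are an assumption that simply continues the "and so on" in the slide.

```python
# Nominal: the numeric code is only a label; its numeric order carries no meaning.
school_codes = {1: "Business", 2: "Humanities", 3: "Education"}

# Ordinal: the numeric code is a label AND preserves rank.
# Codes 3 and 4 are assumed to continue the pattern given on the slide.
standing_codes = {1: "Freshman", 2: "Sophomore", 3: "Junior", 4: "Senior"}


def outranks(code_a, code_b):
    """Return True if class standing code_a ranks above code_b."""
    return code_a > code_b


print(school_codes[2])   # Humanities
print(outranks(3, 1))    # True: a Junior ranks above a Freshman
```

Comparing school codes the same way would run without error but would say nothing meaningful, which is the practical difference between the two scales.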
Slide 10
Scales of Measurement

Interval
The data have the properties of ordinal data, and
the interval between observations is expressed in
terms of a fixed unit of measure.
Interval data are always numeric.
Slide 11
Scales of Measurement

Interval
Example:
Melissa has an SAT score of 1205, while Kevin
has an SAT score of 1090. Melissa scored 115
points more than Kevin.
Slide 12
Scales of Measurement

Ratio
The data have all the properties of interval data
and the ratio of two values is meaningful.
Variables such as distance, height, weight, and time
use the ratio scale.
This scale must contain a zero value that indicates
that nothing exists for the variable at the zero point.
Slide 13
Scales of Measurement

Ratio
Example:
Melissa’s college record shows 36 credit hours
earned, while Kevin’s record shows 72 credit
hours earned. Kevin has twice as many credit
hours earned as Melissa.
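The interval and ratio examples reduce to two small calculations, sketched here in Python with variable names chosen for illustration.

```python
# Interval scale: the difference between two values is meaningful.
sat = {"Melissa": 1205, "Kevin": 1090}
sat_difference = sat["Melissa"] - sat["Kevin"]

# Ratio scale: the ratio of two values is meaningful because zero credit
# hours means that no credit hours exist.
credit_hours = {"Melissa": 36, "Kevin": 72}
hours_ratio = credit_hours["Kevin"] / credit_hours["Melissa"]

print(sat_difference)  # 115
print(hours_ratio)     # 2.0
```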
Slide 14
Qualitative and Quantitative Data
Data can be further classified as being qualitative
or quantitative.
The statistical analysis that is appropriate depends
on whether the data for the variable are qualitative
or quantitative.
In general, there are more alternatives for statistical
analysis when the data are quantitative.
Slide 15
Qualitative Data
Labels or names used to identify an attribute of each
element
Often referred to as categorical data
Use either the nominal or ordinal scale of
measurement
Can be either numeric or nonnumeric
Slide 16
Quantitative Data
Quantitative data indicate how many or how much:
discrete, if measuring how many
continuous, if measuring how much
Quantitative data are always numeric.
Ordinary arithmetic operations are meaningful for
quantitative data.
Slide 17
Scales of Measurement
Data
  Qualitative
    Numerical: Nominal, Ordinal
    Non-numerical: Nominal, Ordinal
  Quantitative
    Numerical: Interval, Ratio
Slide 18
Cross-Sectional Data
Cross-sectional data are collected at the same or
approximately the same point in time.
Example: data detailing the number of building
permits issued in June 2007 in each of the counties
of Ohio
Slide 19
Time Series Data
Time series data are collected over several time
periods.
Example: data detailing the number of building
permits issued in Lucas County, Ohio in each of
the last 36 months
Slide 20
Types of Statistical Studies

Statistical Studies
In experimental studies the variable of interest is
first identified. Then one or more other variables
are identified and controlled so that data can be
obtained about how they influence the variable of
interest.
In observational (nonexperimental) studies no
attempt is made to control or influence the
variables of interest.
A survey is a good example of an observational study.
Slide 21
Descriptive Statistics

Descriptive statistics are the tabular, graphical, and
numerical methods used to summarize and present
data.
Slide 22
Example: Hudson Auto Repair
The manager of Hudson Auto would like to have a
better understanding of the cost of parts used in the
engine tune-ups performed in the shop. She examines
50 customer invoices for tune-ups. The costs of parts,
rounded to the nearest dollar, are listed on the next
slide.
Slide 23
Example: Hudson Auto Repair

Sample of Parts Cost ($) for 50 Tune-ups

 91   78   93   57   75   52   99   80   97   62
 71   69   72   89   66   75   79   75   72   76
104   74   62   68   97  105   77   65   80  109
 85   97   88   68   83   68   71   69   67   74
 62   82   98  101   79  105   79   69   62   73
Slide 24
Tabular Summary: Frequency and Percent Frequency

Parts Cost ($)   Frequency   Percent Frequency
50-59                2              4
60-69               13             26
70-79               16             32
80-89                7             14
90-99                7             14
100-109              5             10
Total               50            100

Example calculation for the 50-59 class: (2/50)100 = 4
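The frequency and percent frequency columns can be reproduced from the 50 invoice values. The Python sketch below is one possible way to do the tally; the variable names are illustrative, and the class limits are the ones shown in the table.

```python
# Parts costs ($) for the 50 tune-ups, as listed on the earlier slide.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

# Count the values falling in each class 50-59, 60-69, ..., 100-109.
frequency = {}
for lower in range(50, 110, 10):
    upper = lower + 9
    frequency[f"{lower}-{upper}"] = sum(lower <= c <= upper for c in costs)

# Percent frequency = (frequency / number of observations) * 100.
percent = {label: 100 * f / len(costs) for label, f in frequency.items()}

for label in frequency:
    print(f"{label:>7}  {frequency[label]:>2}  {percent[label]:>3.0f}")
```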
Slide 25
Graphical Summary: Histogram
[Histogram of Tune-up Parts Cost. Vertical axis: Frequency (scale 2 to 18); horizontal axis: Parts Cost ($) in classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109.]
Slide 26
Numerical Descriptive Statistics
• The most common numerical descriptive statistic
is the average (or mean).
• Hudson’s average cost of parts, based on the 50
tune-ups studied, is $79 (found by summing the
50 cost values and then dividing by 50).
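The average can be checked directly from the 50 cost values. A minimal Python sketch, with illustrative variable names:

```python
# Parts costs ($) for the 50 tune-ups, as listed on the earlier slide.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

# Mean = sum of the values divided by the number of values.
mean_cost = sum(costs) / len(costs)

print(sum(costs))        # 3949
print(mean_cost)         # 78.98
print(round(mean_cost))  # 79
```

The exact mean is $78.98, which the slide reports as $79 after rounding to the nearest dollar.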
Slide 27
Statistical Inference
Population - the collection of all the elements of interest
Sample - a subset of the population
Statistical inference - the process of using data obtained from a sample
to make estimates and test hypotheses about the characteristics of a
population
Census - collecting data for a population
Sample survey - collecting data for a sample
Slide 28
Process of Statistical Inference
1. Population consists of all tune-ups. Average cost of parts is unknown.
2. A sample of 50 engine tune-ups is examined.
3. The sample data provide a sample average parts cost of $79 per tune-up.
4. The sample average is used to estimate the population average.
Slide 29
Computers and Statistical Analysis
• Statistical analysis typically involves working with
large amounts of data.
• Computer software is typically used to conduct the
analysis.
• Instructions are provided in chapter appendices for
carrying out many of the statistical procedures
using Minitab and Excel.
Slide 30
Tainted Truth

“If someone is misusing numbers and scaring us with
those numbers to get us to do something, however good
that something is, we have lost the power of numbers.”

WE ALL NEED TO BE CRITICAL.
Slide 31
Reported Information

Eating oat bran is a cheap and easy way to reduce
your cholesterol count. (Quaker Oats)
Actual Study Information

Diet must consist of nothing but oat bran to achieve a
slightly lower cholesterol count.
Slide 32
Reported Information

Only 29% of high school girls are happy with
themselves, compared to 66% of elementary school
girls. (American Association of University Women)
Actual Study Information

Of 3000 high school girls 29% responded “Always
true” to the statement, “I am happy with the way I
am.” Most answered, “Sort of true” and “Sometimes
true.”
Slide 33

Four out of five people in Columbia prefer Wendy’s
over McDonald’s (according to a recent survey).
Credible?
Slide 34
Slide 35
Slide 36

Ethical Guidelines for Statistical Practice
American Statistical Association
www.amstat.org
Slide 37

Association vs Causation
Slide 38
Chapter 2
Descriptive Statistics:
Tabular and Graphical Presentations


Summarizing Qualitative Data
Summarizing Quantitative Data
Slide 39
Summarizing Qualitative Data





Frequency Distribution
Relative Frequency Distribution
Percent Frequency Distribution
Bar Graphs
Pie Charts
Slide 40
Frequency Distribution
A frequency distribution is a tabular summary of
data showing the frequency (or number) of items
in each of several non-overlapping classes.
The objective is to provide insights about the data
that cannot be quickly obtained by looking only at
the original data.
Slide 41
Example: Marada Inn
Guests staying at Marada Inn were asked to rate the
quality of their accommodations as being excellent, above
average, average, below average, or poor. The ratings
provided by a sample of 20 guests are:
Below Average
Above Average
Above Average
Average
Above Average
Average
Above Average
Average
Above Average
Below Average
Poor
Excellent
Above Average
Average
Above Average
Above Average
Below Average
Poor
Above Average
Average
Slide 42
Frequency Distribution

Rating           Frequency
Poor                 2
Below Average        3
Average              5
Above Average        9
Excellent            1
Total               20
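The frequency distribution can be tallied from the 20 guest ratings. The sketch below uses Python's collections.Counter; the variable names and the display order list are illustrative choices.

```python
from collections import Counter

# The 20 guest ratings, in the order listed on the Marada Inn slide.
ratings = [
    "Below Average", "Above Average", "Above Average", "Average",
    "Above Average", "Average", "Above Average", "Average",
    "Above Average", "Below Average", "Poor", "Excellent",
    "Above Average", "Average", "Above Average", "Above Average",
    "Below Average", "Poor", "Above Average", "Average",
]

# Classes in rank order, lowest to highest.
order = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

# Count how many ratings fall in each non-overlapping class.
frequency = Counter(ratings)

for rating in order:
    print(f"{rating:<14}{frequency[rating]:>3}")
print(f"{'Total':<14}{len(ratings):>3}")
```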
Slide 43
Relative Frequency Distribution
The relative frequency of a class is the fraction or
proportion of the total number of data items
belonging to the class.
A relative frequency distribution is a tabular
summary of a set of data showing the relative
frequency for each class.
Slide 44
Percent Frequency Distribution
The percent frequency of a class is the relative
frequency multiplied by 100.
A percent frequency distribution is a tabular
summary of a set of data showing the percent
frequency for each class.
Slide 45
Relative Frequency and Percent Frequency Distributions

Rating           Relative Frequency   Percent Frequency
Poor                   .10                  10
Below Average          .15                  15
Average                .25                  25
Above Average          .45                  45
Excellent              .05                   5
Total                 1.00                 100

Example calculations: 1/20 = .05 (relative frequency for Excellent);
.10(100) = 10 (percent frequency for Poor)
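Both distributions follow from the class frequencies. A short Python sketch, with illustrative variable names and the frequencies taken from the Marada Inn frequency distribution:

```python
# Class frequencies from the Marada Inn frequency distribution.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}
total = sum(frequency.values())

# Relative frequency = class frequency / total number of data items.
relative = {rating: f / total for rating, f in frequency.items()}

# Percent frequency = relative frequency * 100 (computed as 100 * f / total
# to avoid floating-point rounding noise).
percent = {rating: 100 * f / total for rating, f in frequency.items()}

for rating in frequency:
    print(f"{rating:<14}{relative[rating]:>5.2f}{percent[rating]:>5.0f}")
```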
Slide 46
Bar Graph
• A bar graph is a graphical device for depicting
qualitative data.
• On one axis (usually the horizontal axis), we specify
the labels that are used for each of the classes.
• A frequency, relative frequency, or percent frequency
scale can be used for the other axis (usually the
vertical axis).
• Using a bar of fixed width drawn above each class
label, we extend the height appropriately.
• The bars are separated to emphasize the fact that each
class is a separate category.
Slide 47
Bar Graph
[Bar graph of Marada Inn Quality Ratings. Vertical axis: Frequency (scale 1 to 10); horizontal axis: Rating (Poor, Below Average, Average, Above Average, Excellent).]
Slide 48
Pie Chart
• The pie chart is another commonly used graphical device
for presenting relative frequency distributions for
qualitative data.
• First draw a circle; then use the relative frequencies
to subdivide the circle into sectors that correspond
to the relative frequency for each class.
• Since there are 360 degrees in a circle, a class with a
relative frequency of .25 would consume .25(360) = 90
degrees of the circle.
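The sector size for every class follows the same rule as the .25 example. The Python sketch below applies it to the Marada Inn relative frequencies; the variable names are illustrative.

```python
# Relative frequencies from the Marada Inn relative frequency distribution.
relative = {
    "Poor": 0.10,
    "Below Average": 0.15,
    "Average": 0.25,
    "Above Average": 0.45,
    "Excellent": 0.05,
}

# Sector angle in degrees = relative frequency * 360.
angles = {rating: rf * 360 for rating, rf in relative.items()}

for rating, angle in angles.items():
    print(f"{rating:<14}{angle:>4.0f} degrees")

print(angles["Average"])  # 90.0
```

The five sector angles add up to the full 360 degrees of the circle.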
Slide 49
Pie Chart
[Pie chart of Marada Inn Quality Ratings: Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%.]
Slide 50
Example: Marada Inn

Insights Gained from the Preceding Pie Chart
• One-half of the customers surveyed gave Marada
a quality rating of “above average” or “excellent”
(looking at the left side of the pie). This might
please the manager.
• For each customer who gave an “excellent” rating,
there were two customers who gave a “poor”
rating (looking at the top of the pie). This should
displease the manager.
Slide 51
Slide 52
See Example 1 Class 2 data file
Text problems for August 30: Chapter 2 - 2, 6 & 10
Slide 53
Slide 54
Slide 55