Uploaded by Anton Miroshnichenko

Economics & Business Stats CH1

Chapter 1
Data and Statistics
• Why Study Statistics?
• Applications in Business and Economics
• Data and Data Sources
• Descriptive Statistics
• Statistical Inference
• Computers and
• Description of Raw Data

Learning Objectives
On completion of this chapter, students will be able to:
1. Understand why we study statistics and its applications
2. Explain what is meant by descriptive statistics and
inferential statistics
3. Distinguish between a qualitative variable and a
quantitative variable
4. Distinguish between a discrete variable and a
continuous variable
5. Distinguish among the nominal, ordinal, interval, and
ratio levels of measurement
6. Define the terms mutually exclusive and exhaustive
Why Study Statistics?
• Three main reasons why we study statistics
are:
1. Data are everywhere
2. Statistical techniques are used to make many
decisions that affect our lives
3. Whatever your future career, you will make
decisions that involve data
• An understanding of statistical methods helps
in making decisions more effectively
What Is Statistics?
1. Collecting Data (e.g., Survey)
2. Presenting Data (e.g., Charts & Tables)
3. Characterizing Data (e.g., Average)
Why? Data → Analysis → Decision Making
(Source: McClave, Benson, Sincich)
What Is Statistics?
• Statistics is the science of data and involves
collecting, organizing, presenting, analyzing, and
interpreting data to assist in making more
effective decisions.
• Descriptive statistics are the tabular, graphical,
and numerical methods used to summarize and
present data.
• Inferential statistics relate to making inferences
or predictions about a population from
observations and analyses of a sample. This means
that the results of an analysis based on sample data
can be generalized to the larger population that
the sample represents.
Types of Statistics
Descriptive Statistics
• Methods of organizing, summarizing, and presenting data in an informative way
– Frequency distributions
– Chart forms
– Central tendency measures
– Data clustering
Inferential Statistics
• The methods used to determine something about a population, based on a sample
– A population is the entire set of individuals or objects of interest or the measurements obtained from all individuals or objects of interest
– A sample is a portion, or part, of the population of interest.
Applications in Business and Economics
• Accounting
Public accounting firms use statistical sampling procedures when conducting audits for their clients.
• Economics
Economists use statistical information in making forecasts about the future of the economy or some aspect of it.
• Marketing
Electronic point-of-sale scanners at retail checkout counters are used to collect data for a variety of marketing research applications.
• Production
Emphasis on quality makes quality control an important application of statistics in production. A variety of statistical quality control charts are used to monitor the output of a production process.
Applications in Business and Economics (Cont’d)
• Finance
Financial analysts use a variety of statistical information to guide their investment recommendations. For example, financial advisors use price-earnings ratios and dividend yields to guide their investment recommendations.
Summary - Applications in Business and Economics (Cont’d)
• Economics
– Forecasting
– Demographics
• Sports
– Individual & Team Performance
• Engineering
– Construction
– Materials
• Business
– Consumer Preferences
– Financial Trends
Data and Data Sets
• What is Data?
• Data are the facts and figures collected,
summarized, analyzed, and interpreted.
The data collected in a particular study are
referred to as the data set.
Elements, Variables, and Observations
The elements are the entities on which data
are collected.
A variable is a characteristic of interest for
the elements.
The set of measurements collected for a
particular element is called an observation.
Examples of Data, Data Sets, Elements, Variables, and Observations
Company        Stock Exchange   Annual Sales ($M)   Earn/Share ($)
Dataram        NQ                73.10               0.86
EnergySouth    N                 74.00               1.67
Keystone       N                365.70               0.86
LandCare       NQ               111.40               0.33
Psychemedics   N                 17.60               0.13
(Element names: the companies. Variables: Stock Exchange, Annual Sales ($M), Earn/Share ($). Data set: the entire table.)
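The definitions of element, variable, and observation can be made concrete by storing the company data set in code. This is a minimal sketch, assuming the data set is held in plain Python lists and dicts; the names `data_set`, `elements`, `variables`, and `observation` are illustrative and not from the source.

```python
# The company data set from the table: one dict per element (company).
data_set = [
    {"Company": "Dataram", "Stock Exchange": "NQ", "Annual Sales ($M)": 73.10, "Earn/Share ($)": 0.86},
    {"Company": "EnergySouth", "Stock Exchange": "N", "Annual Sales ($M)": 74.00, "Earn/Share ($)": 1.67},
    {"Company": "Keystone", "Stock Exchange": "N", "Annual Sales ($M)": 365.70, "Earn/Share ($)": 0.86},
    {"Company": "LandCare", "Stock Exchange": "NQ", "Annual Sales ($M)": 111.40, "Earn/Share ($)": 0.33},
    {"Company": "Psychemedics", "Stock Exchange": "N", "Annual Sales ($M)": 17.60, "Earn/Share ($)": 0.13},
]

# Elements: the entities on which data are collected (the companies).
elements = [row["Company"] for row in data_set]

# Variables: the characteristics of interest (every column except the element name).
variables = [key for key in data_set[0] if key != "Company"]


def observation(company):
    """Return the set of measurements collected for one element (hypothetical helper)."""
    for row in data_set:
        if row["Company"] == company:
            return {v: row[v] for v in variables}
    raise KeyError(company)


print(len(elements), "elements,", len(variables), "variables")
print(observation("Keystone"))
```

Each call to `observation` returns one row of the table without the element name, which is exactly what the slide calls an observation.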
Data Definitions
(Table 2.2)
Number of Variables and Typical Tasks
Data Set       Variables       Typical Tasks
Univariate     One             Histograms, descriptive statistics, frequency tallies
Bivariate      Two             Scatter plots, correlations, simple regression
Multivariate   More than two   Multiple regression, data mining, econometric modeling
Data Definitions
A Small Multivariate Data Set
[Figure: data table with 8 subjects (rows) and 5 variables (columns)]
Data Definitions
Binary Data
A binary variable has only two values,
1 = presence, 0 = absence of a characteristic of
interest (codes themselves are arbitrary).
For example,
1 = employed, 0 = not employed
1 = married, 0 = not married
1 = male, 0 = female
1 = female, 0 = male
The coding itself has no numerical meaning, so binary
variables are attribute data.
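The 1/0 coding described above can be sketched in a few lines. This is an illustration only: the `responses` list is hypothetical, and `to_binary` is a helper introduced here, not something from the source. It shows that either category may receive the code 1.

```python
# Hypothetical survey responses (illustration only, not from the source).
responses = ["employed", "not employed", "employed", "employed", "not employed"]


def to_binary(values, present):
    """Code 1 where the characteristic of interest is present, 0 otherwise."""
    return [1 if v == present else 0 for v in values]


coded = to_binary(responses, present="employed")        # 1 = employed, 0 = not employed
flipped = to_binary(responses, present="not employed")  # reversed coding, equally valid

print(coded)    # [1, 0, 1, 1, 0]
print(flipped)  # [0, 1, 0, 0, 1]

# Counting the 1s gives how many elements have the characteristic;
# the codes are only labels, so which category is coded 1 is arbitrary.
print(sum(coded), "of", len(coded), "are employed")
```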
Scales of Measurement
Scales of measurement include:
Nominal
Ordinal
Interval
Ratio
The scale determines the amount of
information contained in the data.
The scale indicates the data summarization and
statistical analyses that are most appropriate.
Scales of Measurement
• Nominal
Data are labels or names used to identify
an attribute of the element.
A nonnumeric label or numeric code may
be used.
Scales of Measurement
• Nominal
Example:
Students of a university are classified by the
school in which they are enrolled using a
nonnumeric label such as Business, Humanities,
Education, and so on.
Alternatively, a numeric code could be used for
the school variable (e.g. 1 denotes Business,
2 denotes Humanities, 3 denotes Education, and
so on).
Scales of Measurement
• Ordinal
• The data have the properties of nominal data, and the order or rank of the data is meaningful.
• A nonnumeric label or numeric code may be used.
Scales of Measurement
• Ordinal
Example:
Students of a university are classified by their class standing using a nonnumeric label such as Freshman, Sophomore, Junior, or Senior.
Alternatively, a numeric code could be used for the class standing variable (e.g. 1 denotes Freshman, 2 denotes Sophomore, and so on).
Scales of Measurement
• Interval
The data have the properties of ordinal data, and the interval between observations is expressed in terms of a fixed unit of measure.
Interval data are always numeric.
Example:
Melissa has an SAT score of 1205, while Kevin has an SAT score of 1090. Melissa scored 115 points more than Kevin.
Scales of Measurement
• Ratio
• The data have all the properties of interval
data and the ratio of two values is
meaningful.
• Variables such as distance, height, weight,
and time use the ratio scale.
• This scale must contain a zero value that
indicates that nothing exists for the variable
at the zero point.
Scales of Measurement
• Ratio
• Example:
• Melissa’s college record shows 36 credit
hours earned, while Kevin’s record shows
72 credit hours earned. Kevin has twice
as many credit hours earned as Melissa.
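The four scales can be summarized by which comparisons each one supports, with each scale inheriting the properties of the one before it. The sketch below is a simplified illustration: the `MEANINGFUL` table and `is_meaningful` helper are introduced here and are not from the source. It reuses the SAT (interval) and credit-hour (ratio) examples.

```python
# Comparisons that are meaningful at each scale (simplified summary of the definitions).
MEANINGFUL = {
    "nominal": {"equal/not equal"},
    "ordinal": {"equal/not equal", "order"},
    "interval": {"equal/not equal", "order", "difference"},
    "ratio": {"equal/not equal", "order", "difference", "ratio"},
}


def is_meaningful(scale, operation):
    """Return True if the operation is meaningful for data on the given scale."""
    return operation in MEANINGFUL[scale]


# Interval example: SAT scores, so a difference is meaningful.
melissa_sat, kevin_sat = 1205, 1090
if is_meaningful("interval", "difference"):
    print("SAT difference:", melissa_sat - kevin_sat)

# Ratio example: credit hours, so a ratio is meaningful.
melissa_hours, kevin_hours = 36, 72
if is_meaningful("ratio", "ratio"):
    print("Credit-hour ratio:", kevin_hours / melissa_hours)

# A ratio of two interval-scale values is not meaningful.
print(is_meaningful("interval", "ratio"))
```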
Qualitative and Quantitative Data
Also, data can be classified as being qualitative
or quantitative.
The statistical analysis that is appropriate depends
on whether the data for the variable are qualitative
or quantitative.
In general, there are more alternatives for statistical
analysis when the data are quantitative.
Qualitative Data
• Labels or names used to identify an attribute
of each element; often referred to as
categorical data
• Use either the nominal or ordinal scale of
measurement
• Can be either numeric or nonnumeric
• The appropriate statistical analyses are rather
limited
Qualitative Data
(Doane/Seward)
Data Types
Categorical or Qualitative data.
Values are described by words rather than numbers.
For example,
- Automobile style (e.g., X = full, midsize,
compact, subcompact).
- Mutual fund (e.g., X = load, no-load).
Quantitative Data
Quantitative data indicate how many or how much:
Discrete, if measuring how many
Continuous, if measuring how much
Quantitative data are always numeric.
Ordinary arithmetic operations are meaningful for
quantitative data.
Data Definitions
(Doane/Seward)
Discrete Data
A numerical variable with a countable number of
values that can be represented by an integer (no
fractional values).
For example,
- Number of Medicaid patients (e.g., X = 2).
- Number of takeoffs at O’Hare (e.g., X = 37).
Data Definitions
(Doane/Seward)
Continuous Data
A numerical variable that can have any value within an
interval (e.g., length, weight, time, sales,
price/earnings ratios).
Any continuous interval contains infinitely many
possible values (e.g., 426 < X < 428).
Scales of Measurement
Data
– Qualitative
   – Numerical: Nominal, Ordinal
   – Non-numerical: Nominal, Ordinal
– Quantitative
   – Numerical: Interval, Ratio
Time Series vs. Cross-Sectional Data
Time Series Data
• Values that correspond to specific measurements
taken over a range of time periods
Cross-Sectional Data
• Values collected from a number of subjects
during a single time period
Time Series versus Cross-Sectional Data
• Time Series Data
Each observation in the sample represents a
different equally spaced point in time (e.g.,
years, months, days).
Periodicity may be annual, quarterly, monthly,
weekly, daily, hourly, etc.
We are interested in trends and patterns over
time (e.g., annual growth in consumer debit
card use from 2015 to 2020).
Time Series Plot
Used to graphically display data produced
over time
Shows trends and changes in the data over
time
Time recorded on the horizontal axis
Measurements recorded on the vertical axis
Points connected by straight lines
Time Series Plot Example
• The following data show
the average retail price of
regular gasoline for 8
weeks in 2006.
• Draw a time series plot for
these data.
Date            Average Price
Oct 16, 2006    $2.219
Oct 23, 2006    $2.173
Oct 30, 2006    $2.177
Nov 6, 2006     $2.158
Nov 13, 2006    $2.185
Nov 20, 2006    $2.208
Nov 27, 2006    $2.236
Dec 4, 2006     $2.298
Time Series Plot Example
[Figure: time series plot of average price ($, vertical axis, 2.05 to 2.35) by date (horizontal axis, 10/16 to 12/4)]
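A plotting library such as matplotlib is the usual tool for drawing this chart. To stay dependency-free, the sketch below assumes only the Python standard library: it prints a rough text version of the plot (time runs down the page) and computes the week-to-week changes that the connecting line segments represent. The variable names are illustrative, and all dates are in 2006.

```python
# Weekly average retail price of regular gasoline, 2006 (from the table above).
dates = ["Oct 16", "Oct 23", "Oct 30", "Nov 6", "Nov 13", "Nov 20", "Nov 27", "Dec 4"]
prices = [2.219, 2.173, 2.177, 2.158, 2.185, 2.208, 2.236, 2.298]

# Week-to-week change: what each line segment of the plot shows.
changes = [round(b - a, 3) for a, b in zip(prices, prices[1:])]

# Rough text plot: one '*' per cent above the lowest price (plus one so no bar is empty).
low = min(prices)
for d, p in zip(dates, prices):
    bar = "*" * (1 + round((p - low) / 0.01))
    print(f"{d:>6}  {p:.3f}  {bar}")

print("Lowest week:", dates[prices.index(low)])
print("Changes:", changes)
```

The printed bars dip to their shortest at Nov 6 and then grow steadily, which is the upward trend the plot is meant to reveal.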
Time Series Data
• Time series data are collected over several
time periods.
• Example: data detailing the number of
building permits issued in the municipality of
Mississauga, Ontario, in each of the last 36
months
Cross-Sectional Data
• Cross-sectional data are collected at the
same or approximately the same point in
time.
• Example: data detailing the number of
building permits issued in June 2010 in
each of the municipalities of Ontario
Time Series versus Cross-Sectional Data
Cross-sectional Data
Each observation represents a different individual unit
(e.g., person) at the same point in time
(e.g., monthly VISA balances).
We are interested in
- variation among observations or in
- relationships.
We can combine the two data types to get pooled
cross-sectional and time series data.
Fundamental Elements
1. Experimental unit
• Object upon which we collect data
2. Population
• All items of interest
3. Variable
• Characteristic of an individual experimental unit
4. Sample
• Subset of the units of a population
• P in Population & Parameter
• S in Sample & Statistic
Why Sample?
1. Prohibitive cost of surveying the whole population
2. Destructive nature of some tests
3. Physical impossibility of capturing the population
Parameters and Statistics?
• Statistics are computed from a sample of n items,
chosen from a population of N items.
• Statistics can be used as estimates of parameters
found in the population.
Statistic
Any measurement computed from a sample.
Usually, the statistic is regarded as an estimate of
a population parameter. Sample statistics are
often (but not always) represented by Roman
letters.
Parameter or Statistic?
Parameter
Any measurement that describes an entire
population. Usually, the parameter value is
unknown since we rarely can observe the entire
population. Parameters are often (but not always)
represented by Greek letters.
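The difference between a parameter and a statistic can be illustrated with a simulated population. This is a sketch under stated assumptions: the population of N = 1000 values (1 through 1000) is hypothetical, and the fixed random seed only makes the run repeatable. The parameter μ describes the whole population; the statistic x̄ is computed from a random sample of n = 50 items and serves as its estimate.

```python
import random
import statistics

# Hypothetical population of N = 1000 items (illustration only).
population = list(range(1, 1001))
N = len(population)

# Parameter: a measurement describing the entire population (Greek letter mu).
mu = statistics.mean(population)

# Statistic: a measurement computed from a sample of n items (Roman letters, "x-bar").
random.seed(1)                       # fixed seed so the sketch is repeatable
n = 50
sample = random.sample(population, n)  # n distinct items, drawn without replacement
x_bar = statistics.mean(sample)

print("N =", N, " parameter mu =", mu)
print("n =", n, " statistic x-bar =", x_bar)
```

In practice μ is usually unknown because the whole population cannot be observed; only x̄ would be available.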
Parameters and Statistics?
Situations Where A Sample May Be
Preferred:
Infinite Population
No census is possible if the population is infinite or of
indefinite size (an assembly line can keep producing bolts, a
doctor can keep seeing more patients).
Destructive Testing
The act of sampling may destroy or devalue the item
(measuring battery life, testing auto crashworthiness, or
testing aircraft turbofan engine life).
Parameters and Statistics?
Situations Where A Sample May Be
Preferred:
Timely Results
Sampling may yield more timely results than a census
(checking wheat samples for moisture and protein content,
checking peanut butter for aflatoxin contamination).
Accuracy
Sample estimates can be more accurate than a census.
Instead of spreading limited resources thinly to attempt a
census, our budget of time and money might be better spent
to hire experienced staff, improve training of field interviewers,
and improve data safeguards.
Parameters and Statistics?
Situations Where A Sample May Be
Preferred:
Cost
Even if it is feasible to take a census, the cost, either in time or
money, may exceed our budget.
Sensitive Information
Some kinds of information are better captured by a well-designed sample, rather than attempting a census.
Confidentiality may also be improved in a carefully done
sample.
Parameters and Statistics?
Situations Where A Census May Be
Preferred
Small Population
If the population is small, there is little reason to sample, for
the effort of data collection may be only a small part of the
total cost.
Large Sample Size
If the required sample size approaches the population size,
we might as well go ahead and take a census.
Parameters and Statistics?
Situations Where A Census May Be
Preferred
Database Exists
If the data are on disk, we can examine 100% of the cases.
But auditing or validating data against physical records may
raise the cost.
Legal Requirements
Banks must count all the cash in bank teller drawers at the
end of each business day. The U.S. Congress forbade
sampling in the 2000 decennial population census.
Parameters or Statistics?
Finite or Infinite?
A population is finite if it has a definite size, even if its
size is unknown.
A population is infinite if it is of arbitrarily large size.
Rule of Thumb: A population may be treated as infinite
when N is at least 20 times n (i.e., when N/n ≥ 20),
where N is the population size and n is the sample size.
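The rule of thumb reduces to a single comparison. The function name `treat_as_infinite` is introduced here for illustration; the input check simply rejects sample sizes that make no sense for a population of size N.

```python
def treat_as_infinite(N, n):
    """Rule of thumb: treat the population as infinite when N/n >= 20."""
    if n <= 0 or N < n:
        raise ValueError("need 0 < n <= N")
    return N / n >= 20


print(treat_as_infinite(1000, 50))  # True: 1000/50 = 20
print(treat_as_infinite(500, 50))   # False: 500/50 = 10
```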
Descriptive Statistics
• Descriptive statistics are the tabular,
graphical, and numerical methods used to
summarize and present data.
Example: Hudson Auto Repair
The manager of Hudson Auto
would like to have a better
understanding of the cost
of parts used in the engine
tune-ups performed in the
shop. She examines 50
customer invoices for tune-ups. The costs of parts,
rounded to the nearest dollar, are listed on the next
slide.
Example: Hudson Auto Repair
Sample of Parts Cost ($) for 50 Tune-ups
 91   78   93   57   75   52   99   80   97   62
 71   69   72   89   66   75   79   75   72   76
104   74   62   68   97  105   77   65   80  109
 85   97   88   68   83   68   71   69   67   74
 62   82   98  101   79  105   79   69   62   73
Tabular Summary: Frequency and Percent Frequency
Parts Cost ($)   Frequency   Percent Frequency
50-59                 2              4
60-69                13             26
70-79                16             32
80-89                 7             14
90-99                 7             14
100-109               5             10
Total                50            100
(Percent frequency = (frequency/50)100; e.g., (2/50)100 = 4.)
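The frequency and percent frequency columns can be reproduced from the 50 parts costs. This is a minimal sketch: `frequency_table` is a hypothetical helper, and the classes of width 10 follow the table. The final column prints one `#` per tune-up, giving a rough text version of the histogram.

```python
# Parts costs ($) for the 50 tune-ups in the Hudson Auto Repair sample.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

# Classes of width 10: 50-59, 60-69, ..., 100-109.
classes = [(lo, lo + 9) for lo in range(50, 110, 10)]


def frequency_table(values, classes):
    """Return (class label, frequency, percent frequency) for each class."""
    n = len(values)
    table = []
    for lo, hi in classes:
        freq = sum(1 for v in values if lo <= v <= hi)
        table.append((f"{lo}-{hi}", freq, freq / n * 100))  # percent = (freq/n)100
    return table


for label, freq, pct in frequency_table(costs, classes):
    print(f"{label:>8}  {freq:>3}  {pct:>5.1f}  {'#' * freq}")
```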
Graphical Summary: Histogram
[Figure: histogram "Tune-up Parts Cost", frequency (vertical axis, 0 to 18) for each parts-cost class ($): 50-59, 60-69, 70-79, 80-89, 90-99, 100-109]
Numerical Descriptive Statistics
• The most common numerical descriptive statistic
is the average (or mean).
• Hudson’s average cost of parts, based on the 50
tune-ups studied, is $79 (found by summing the
50 cost values and then dividing by 50).
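The average follows directly from the definition: sum the 50 costs and divide by 50. The sketch below repeats the sample so that it runs on its own; the exact mean is $78.98, which rounds to the $79 quoted on the slide.

```python
# Same 50 parts costs ($) as in the Hudson Auto Repair sample table.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

mean_cost = sum(costs) / len(costs)  # sum of the 50 values divided by 50
print(f"Sum = {sum(costs)}, mean = {mean_cost:.2f}, rounded = ${round(mean_cost)}")
# Sum = 3949, mean = 78.98, rounded = $79
```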
Pareto Diagram
Like a bar graph, but with the categories arranged by
height in descending order from left to right.
[Figure: Pareto diagram for a qualitative variable (Major: Acct., Mgmt., Econ.) drawn with vertical bars of equal width; bar height shows frequency or % (percent is also used); the vertical axis starts at a zero point]
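The ordering rule behind a Pareto diagram is a descending sort by frequency. In this sketch, `pareto_order` is a hypothetical helper, and the counts by major are invented for illustration because the bar heights in the original figure are not recoverable. Python's `sorted` is stable even with `reverse=True`, so tied categories keep their input order.

```python
def pareto_order(counts):
    """Sort categories by frequency, largest first, with percent of total."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(cat, freq, 100 * freq / total) for cat, freq in ordered]


# Hypothetical counts of students by major (illustration only, not from the source).
majors = {"Econ.": 30, "Acct.": 120, "Mgmt.": 75}

for cat, freq, pct in pareto_order(majors):
    print(f"{cat:<6} {freq:>4} {pct:5.1f}%")
```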
Statistical Inference
Population - the set of all elements of interest in a particular study
Sample - a subset of the population
Statistical inference - the process of using data obtained from a sample to make estimates and test hypotheses about the characteristics of a population
Census - collecting data for a population
Sample survey - collecting data for a sample
Process of Statistical Inference
1. Population consists of all tune-ups. Average cost of parts is unknown.
2. A sample of 50 engine tune-ups is examined.
3. The sample data provide a sample average parts cost of $79 per tune-up.
4. The sample average is used to estimate the population average.
Chapter Summary
• Statistics is the science of collecting, organizing,
presenting, analyzing, and interpreting data to
assist in making more effective decisions
• There are two types of statistics – descriptive and
inferential
• There are two types of variables – qualitative and
quantitative
• There are two types of quantitative variables –
discrete and continuous
• There are four levels of measurement – nominal,
ordinal, interval, and ratio
Chapter END!