Uploaded by MadeFresh

Chapter 1 Data and Statistics
Xue Yan (薛艳)
School of Economics
Email:xueyan@jsut.edu.cn Office:2# 214
Overview
◼ What are the applications of statistics?
◼ What is statistics?
◼ Why study statistics?
Application of Statistics
◼ Weather forecasting
◼ Medical treatment, e.g., the relationship between smoking and lung cancer
Applications in Accounting
◼ Public accounting firms use statistical sampling procedures when conducting audits for their clients.
◼ For instance, the audit staff selects a subset of the accounts called a sample. After reviewing the accuracy of the sampled accounts, the auditors draw a conclusion as to whether the accounts receivable amount shown on the client’s balance sheet is acceptable.
◼ Collecting statistical data
  ◼ Census-taking (普查)
  ◼ Sampling (抽樣)
Applications in Finance
◼ Financial analysts use a variety of statistical information to guide their investment recommendations.
◼ For instance, the analysts review a variety of financial data including price/earnings ratios and dividend yields. By comparing the information for an individual stock with information about the stock market averages, a financial analyst can begin to draw a conclusion as to whether an individual stock is overpriced or underpriced.
Applications in Marketing
◼ Electronic point-of-sale (POS) scanners at retail checkout counters are used to collect data for a variety of marketing research applications.
◼ For example, brand managers can review the scanner statistics and the promotional activity statistics to gain a better understanding of the relationship between promotional activities and sales. Such analyses often prove helpful in establishing future marketing strategies for the various products.
◼ Example: 7-11, Wal-Mart, etc.
Applications in Production
◼ A variety of statistical quality control charts are used to monitor the output of a production process.
◼ For example, suppose that a machine fills containers with 12 ounces of a soft drink. Periodically, a production worker selects a sample of containers and computes the average number of ounces in the sample. Properly interpreting the average can help determine when adjustments are necessary to correct a production process.
◼ Example: Yield Rate (良率) in semiconductor chip manufacturing.
Applications in Economics
◼ Economists frequently provide forecasts about the future of the economy by using a variety of statistical information.
◼ For instance, in forecasting inflation rates, economists use statistical information on such indicators as the Producer Price Index, the unemployment rate, and manufacturing capacity utilization.
  Inflation rate = I + a*(Producer Price Index) + b*(unemployment rate) + c*(manufacturing capacity utilization) + e
◼ Often these statistical indicators are entered into computerized forecasting models that predict the inflation rate.
In Today’s Business World You
Cannot Escape From Data
◼ In today’s digital world, ever-increasing amounts of data are gathered, stored, reported on, and available for further study.
◼ You hear the word data everywhere.
◼ Data are facts about the world and are constantly reported as numbers by an ever-increasing number of sources.
Statistics is all around us
◼ Is housing becoming more expensive over time?
◼ Has the unemployment rate fallen over the past year?
◼ Who is the highest-scoring basketball player in the NBA?
◼ Are millennials more likely to rent than the rest?
◼ Who is the highest-paid actress in Hollywood?
◼ What is the average salary of a starting business analyst?
◼ Is the average salary of a fresh engineer higher than that of a fresh economist?
◼ Has the crime rate declined in China in recent years?
The language of Statistics
◼ We use statistics every day without really being mindful of it
  ◼ Average income, age, height …
  ◼ Highest paid (Maximum) athlete
  ◼ Fastest (Maximum) sprinter
  ◼ Lowest (Minimum) unemployment rate of all OECD countries
  ◼ Percent of females studying engineering
  ◼ How consistent (variance) is a stock's performance over the past three months?
  ◼ On average, do men spend more (t-test) on clothes than women?
To Properly Apply Statistics You Should Follow A
Framework To Minimize Possible Errors
In this book we will use DCOVA
◼ Define the data you want to study in order to solve a problem or meet an objective
◼ Collect the data from appropriate sources
◼ Organize the data collected by developing tables
◼ Visualize the data by developing charts
◼ Analyze the data collected to reach conclusions and present results
Definition Of Some Terms
DCOVA
VARIABLE
A characteristic of an item or individual.
DATA
The set of individual values associated with a variable.
STATISTICS
Company        Stock Exchange   Annual Sales   Earnings per Share
Dataram        AMEX              73.10         0.86
Energy South   OTC               74.00         1.67
Keystone       NYSE             365.70         0.86
Land Care      NYSE             111.40         0.33
Psychemedics   AMEX              17.60         0.13
Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.
What is statistics?
◼ A branch of mathematics taking and transforming numbers into useful information for decision makers
◼ Methods for processing & analyzing numbers
◼ Methods for helping reduce the uncertainty inherent in decision making
Statistics is the art and science of
collecting, analyzing, presenting, and
interpreting data.
Types of Statistics
Statistics
The branch of mathematics that transforms data into useful information for decision makers.
◼ Descriptive Statistics: collecting, summarizing, and describing data
◼ Inferential Statistics: drawing conclusions and/or making decisions concerning a population based only on sample data
Descriptive Statistics
◼ Collect data
  ◼ e.g., Survey
◼ Present data
  ◼ e.g., Tables and graphs
◼ Characterize data
  ◼ e.g., Sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Example: Hudson Auto Repair (1)
The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.
Example: Hudson Auto Repair (2)
◼ Sample of Parts Cost for 50 Tune-ups
 91   78   93   57   75   52   99   80   97   62
 71   69   72   89   66   75   79   75   72   76
104   74   62   68   97  105   77   65   80  109
 85   97   88   68   83   68   71   69   67   74
 62   82   98  101   79  105   79   69   62   73
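The sample mean defined earlier (the sum of the values divided by n) can be checked against the 50 parts costs above. This is a minimal sketch; the names `parts_cost` and `sample_mean` are illustrative and not from the slides.

```python
# Parts costs (in dollars) for the 50 tune-ups, copied from the slide above.
parts_cost = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]


def sample_mean(values):
    """Return the sum of the values divided by the number of values."""
    return sum(values) / len(values)


n = len(parts_cost)
mean_cost = sample_mean(parts_cost)
print(f"n = {n}, sample mean = {mean_cost:.2f}")
```

The 50 values sum to 3949, so the sample mean is 78.98, or about $79 per tune-up.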
Tabular Summary: Frequency and Percent Frequency
Parts Cost ($)   Frequency   Percent Frequency
50-59                2            4    (2/50)100
60-69               13           26
70-79               16           32
80-89                7           14
90-99                7           14
100-109              5           10
Total               50          100
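The frequency and percent frequency columns can be reproduced from the 50 parts costs. The sketch below assumes the class limits shown in the table; the function name `frequency_table` is illustrative only.

```python
# Parts costs (in dollars) for the 50 tune-ups in the Hudson Auto example.
parts_cost = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

# Class limits taken from the tabular summary (both ends inclusive).
classes = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 99), (100, 109)]


def frequency_table(values, class_limits):
    """Return (label, frequency, percent frequency) for each class."""
    rows = []
    for low, high in class_limits:
        count = sum(1 for v in values if low <= v <= high)
        percent = 100 * count / len(values)  # e.g. (2/50)100 = 4
        rows.append((f"{low}-{high}", count, percent))
    return rows


table = frequency_table(parts_cost, classes)
for label, count, percent in table:
    print(f"{label:>8} {count:>4} {percent:>6.0f}")
```

Because the classes do not overlap and together cover every observed cost, the frequencies add up to 50 and the percent frequencies to 100.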
Graphical Summary: Histogram
[Figure: histogram titled "Tune-up Parts Cost". Horizontal axis: Parts Cost ($), with classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109. Vertical axis: Frequency, marked from 2 to 18.]
Inferential Statistics
Drawing conclusions about a large group of individuals based on a subset of the large group.
◼ Estimation
  ◼ e.g., Estimate the population mean weight using the sample mean weight
◼ Hypothesis testing
  ◼ e.g., Test the claim that the population mean weight is 120 pounds
Why Study Statistics?
◼ To visualize & summarize business data
  ◼ Descriptive methods used to create charts & tables
◼ To draw conclusions from business data
  ◼ Inferential methods used to reach conclusions about a large group based on data from a smaller group
◼ To make reliable forecasts about business activities
  ◼ Inferential methods utilizing statistical models based on business data
◼ To improve business processes
  ◼ Involves managerial approaches like Six Sigma
References
◼ Larry Wasserman. All of Statistics: A Concise Course in Statistical Inference [M]. Springer, 2010.
◼ Bradley Efron; Trevor Hastie. Computer Age Statistical Inference: Algorithms, Evidence, and Data Science [M]. Cambridge University Press, 2016.
◼ David S. Moore; George P. McCabe; Bruce A. Craig. Introduction to the Practice of Statistics [M]. Macmillan Learning, 2021.
Objectives
In this chapter you learn:
◼ How to define variables
◼ How to collect data
◼ How to identify different ways to collect a sample
◼ How to distinguish the types of survey errors
Classifying Variables By Type
▪ Categorical (qualitative) variables take categories as their values, such as “yes”, “no”, or “blue”, “brown”, “green”.
▪ Numerical (quantitative) variables have values that represent a counted or measured quantity.
  ▪ Discrete variables arise from a counting process
  ▪ Continuous variables arise from a measuring process
Examples of Types of Variables
Question: Do you have a Facebook profile?
  Responses: Yes or No
  Variable Type: Categorical (Qualitative)
Question: How many text messages have you sent in the past three days?
  Responses: ---------------
  Variable Type: Numerical (discrete)
Question: How long did the mobile app update take to download?
  Responses: ---------------
  Variable Type: Numerical (continuous)
Types of Variables
Variables
◼ Categorical (defined categories)
  Examples: Marital Status, Political Party, Eye Color
◼ Numerical
  ◼ Discrete (counted items)
    Examples: Number of Children, Defects per hour
  ◼ Continuous (measured characteristics)
    Examples: Weight, Voltage
Collecting Data Correctly Is A Critical Task
▪ Need to avoid data flawed by biases, ambiguities, or other types of errors.
▪ Results from flawed data will be suspect or in error.
▪ Even the most sophisticated statistical methods are not very useful when the data is flawed.
Developing Operational Definitions Is Crucial
To Avoid Confusion / Errors
◼ An operational definition is a clear and precise statement that provides a common understanding of meaning.
◼ In the absence of an operational definition, miscommunications and errors are likely to occur.
◼ Arriving at operational definition(s) is a key part of the Define step of DCOVA.
Sources of Data
▪ Primary Sources: The data collector is the one using the data for analysis
  ▪ Data from a political survey
  ▪ Data collected from an experiment
  ▪ Observed data
▪ Secondary Sources: The person performing data analysis is not the data collector
  ▪ Analyzing census data
  ▪ Examining data from print journals or data published on the internet
Sources of data fall into five
categories
◼ Data distributed by an organization or an individual
◼ The outcomes of a designed experiment
◼ The responses from a survey
◼ The results of conducting an observational study
◼ Data collected by ongoing business activities
Examples Of Data Distributed
By Organizations or Individuals
◼ Financial data on a company provided by investment services.
◼ Industry or market data from market research firms and trade associations.
◼ Stock prices, weather conditions, and sports statistics in daily newspapers.
Examples of Data From A
Designed Experiment
◼ Consumer testing of different versions of a product to help determine which product should be pursued further.
◼ Material testing to determine which supplier’s material should be used in a product.
◼ Market testing on alternative product promotions to determine which promotion to use more broadly.
Examples of Survey Data
◼ A survey asking people which laundry detergent has the best stain-removing abilities.
◼ Political polls of registered voters during political campaigns.
◼ People being surveyed to determine their satisfaction with a recent product or service experience.
Examples of Data Collected
From Observational Studies
◼ Market researchers utilizing focus groups to elicit unstructured responses to open-ended questions.
◼ Measuring the time it takes for customers to be served in a fast food establishment.
◼ Measuring the volume of traffic through an intersection to determine if some form of advertising at the intersection is justified.
Examples of Data Collected From
Ongoing Business Activities
◼ A bank studies years of financial transactions to help them identify patterns of fraud.
◼ Economists utilize data on searches done via Google to help forecast future economic conditions.
◼ Marketing companies use tracking data to evaluate the effectiveness of a web site.
Data Is Collected From Either A
Population or A Sample
POPULATION
A population consists of all the items or individuals about which you want to draw a conclusion. The population is the “large group”.
SAMPLE
A sample is the portion of a population selected for analysis. The sample is the “small group”.
Population vs. Sample
◼ Population: All the items or individuals about which you want to draw conclusion(s)
◼ Sample: A portion of the population of items or individuals
Collecting Data Via Sampling Is Used
When Selecting A Sample Is
◼ Less time consuming than selecting every item in the population.
◼ Less costly than selecting every item in the population.
◼ Less cumbersome and more practical than analyzing the entire population.
Things To Consider / Deal With
In Potential Sources Of Data
◼ Is the source of data structured or unstructured?
◼ How is electronic data formatted?
◼ How is data encoded?
Structured Data Follows An Organizing
Principle & Unstructured Data Does Not
◼ A stock ticker provides structured data:
  ◼ The stock ticker repeatedly reports a company name, the number of shares last traded, the bid price, and the percent change in the stock price.
◼ Due to their inherent structure, data from tables and forms are structured data.
◼ E-mails from five people concerning stock trades are an example of unstructured data.
  ◼ In these e-mails you cannot count on the information being shared in a specific order or format.
◼ This book deals exclusively with structured data.
All Of The Methods In This Book
Deal With Structured Data
◼ To use the techniques in this book on unstructured data, you need to convert the unstructured data into structured data.
◼ For many of the questions you might want to answer, the starting point can / will be tabular data.
Data Can Be Formatted and / or Encoded
In More Than One Way
◼ Some electronic formats are more readily usable than others.
◼ Different encodings can impact the precision of numerical variables and can also impact data compatibility.
◼ As you identify and choose sources of data, you need to consider / deal with these issues.
Data Cleaning Is Often A Necessary
Activity When Collecting Data
◼ Often find “irregularities” in the data
  ◼ Typographical or data entry errors
  ◼ Values that are impossible or undefined
  ◼ Missing values
  ◼ Outliers
◼ When found, these irregularities should be reviewed / addressed
◼ Both Excel & Minitab can be used to address irregularities
After Collection It Is Often Helpful To
Recode Some Variables
◼ Recoding a variable can either supplement or replace the original variable.
◼ Recoding a categorical variable involves redefining categories.
◼ Recoding a quantitative variable involves changing this variable into a categorical variable.
◼ When recoding, be sure that the new categories are mutually exclusive (categories do not overlap) and collectively exhaustive (categories cover all possible values).
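As an illustration of recoding a quantitative variable into a categorical one, the sketch below maps a cost in whole dollars to one of three labels. The cut points (70 and 100), the labels, and the sample values are hypothetical choices made for this example, not taken from the slides.

```python
def recode_cost(cost):
    """Map a numerical cost (whole dollars) to a category label.

    Hypothetical categories: low (< 70), medium (70 to 99), high (>= 100).
    Each non-negative value falls into exactly one category, so the
    categories are mutually exclusive and collectively exhaustive.
    """
    if cost < 0:
        raise ValueError("cost cannot be negative")
    if cost < 70:
        return "low"
    if cost < 100:
        return "medium"
    return "high"


# Hypothetical values chosen to sit on both sides of each cut point.
costs = [52, 69, 70, 99, 100, 109]
recoded = [recode_cost(c) for c in costs]
print(list(zip(costs, recoded)))
```

Writing the categories as a chain of "less than" checks is one simple way to keep them from overlapping or leaving gaps; the recoded list can then be stored alongside the original variable or replace it.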
A Sampling Process Begins With A
Sampling Frame
◼ The sampling frame is a listing of items that make up the population
◼ Frames are data sources such as population lists, directories, or maps
◼ Inaccurate or biased results can result if a frame excludes certain portions of the population
◼ Using different frames to generate data can lead to dissimilar conclusions
Types of Samples
Samples
◼ Non-Probability Samples
  ◼ Judgment
  ◼ Convenience
◼ Probability Samples
  ◼ Simple Random
  ◼ Stratified
  ◼ Systematic
  ◼ Cluster
Types of Samples:
Nonprobability Sample
◼ In a nonprobability sample, items included are chosen without regard to their probability of occurrence.
  ◼ In convenience sampling, items are selected based only on the fact that they are easy, inexpensive, or convenient to sample.
  ◼ In a judgment sample, you get the opinions of preselected experts in the subject matter.
Types of Samples:
Probability Sample
◼ In a probability sample, items in the sample are chosen on the basis of known probabilities.
◼ Probability samples: Simple Random, Systematic, Stratified, Cluster
Probability Sample:
Simple Random Sample
◼ Every individual or item from the frame has an equal chance of being selected.
◼ Selection may be with replacement (selected individual is returned to frame for possible reselection) or without replacement (selected individual isn’t returned to the frame).
◼ Samples are obtained from a table of random numbers or from computer random number generators.
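A computer random number generator can draw a simple random sample either way. The sketch below uses Python's standard `random` module; the frame of item numbers 1 to 850, the sample size of 5, and the fixed seed are assumptions made only so the example is concrete and repeatable.

```python
import random

# Hypothetical frame: item numbers 1 through 850.
frame = list(range(1, 851))

# A fixed seed is used only to make this sketch repeatable.
rng = random.Random(42)

# Without replacement: a selected item cannot be selected again.
without_replacement = rng.sample(frame, k=5)

# With replacement: a selected item is returned to the frame,
# so the same item may appear more than once.
with_replacement = rng.choices(frame, k=5)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

In both cases every item in the frame has the same chance of being picked on each draw; the two calls differ only in whether repeats are possible.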
Selecting a Simple Random Sample Using A
Random Number Table
Sampling Frame For Population With 850 Items
Item Name   Item #
Bev R.      001
Ulan X.     002
.           .
.           .
.           .
.           .
Joann P.    849
Paul F.     850
Portion Of A Random Number Table
49280 88924 35779 00283 81163 07275
11100 02340 12860 74697 96644 89439
09893 23997 20048 49420 88872 08401
The First 5 Items in a simple random sample
Item # 492
Item # 808
Item # 892 -- does not exist so ignore
Item # 435
Item # 779
Item # 002
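The table lookup above can be sketched in code: read the digits of the random number table three at a time and keep each number that matches an item in the frame. The function name is illustrative, and skipping repeated numbers (sampling without replacement) is an assumption, since the slide does not say how repeats are handled.

```python
# Rows of the random number table, copied from the slide.
table_rows = [
    "49280 88924 35779 00283 81163 07275",
    "11100 02340 12860 74697 96644 89439",
    "09893 23997 20048 49420 88872 08401",
]


def select_from_table(rows, frame_size, sample_size, digits=3):
    """Pick item numbers by reading the table in fixed-width digit groups."""
    # Join all rows into one continuous stream of digits.
    stream = "".join("".join(row.split()) for row in rows)
    selected = []
    for start in range(0, len(stream) - digits + 1, digits):
        item = int(stream[start:start + digits])
        if item < 1 or item > frame_size:
            continue  # no such item in the frame, so ignore it
        if item in selected:
            continue  # assumption: sampling without replacement
        selected.append(item)
        if len(selected) == sample_size:
            break
    return selected


first_five = select_from_table(table_rows, frame_size=850, sample_size=5)
print(first_five)
```

The digit stream begins 492, 808, 892, 435, 779, 002; the value 892 is skipped because the frame has only 850 items, which gives items 492, 808, 435, 779, and 2, as on the slide.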
Probability Sample:
Systematic Sample
◼ Decide on sample size: n
◼ Divide frame of N individuals into groups of k individuals: k = N/n
◼ Randomly select one individual from the 1st group
◼ Select every kth individual thereafter
Example: N = 40, n = 4, k = 10
[Figure: frame of 40 individuals divided into 4 groups of 10, with the first group marked]
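The systematic sampling steps can be sketched as follows, using the slide's values N = 40 and n = 4. The function name, the frame of numbers 1 to 40, and the fixed seed are assumptions for illustration; the sketch also assumes N is a multiple of n.

```python
import random


def systematic_sample(frame, n, rng):
    """Select n items: a random start in the first group, then every kth item."""
    N = len(frame)
    k = N // n                # group size, k = N/n
    start = rng.randrange(k)  # random position within the first group
    return [frame[start + i * k] for i in range(n)]


# Hypothetical frame of N = 40 individuals numbered 1 to 40.
frame = list(range(1, 41))
sample = systematic_sample(frame, n=4, rng=random.Random(0))
print(sample)
```

Only the starting point is random; once it is chosen, the remaining selections are fixed at intervals of k = 10.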
Probability Sample:
Stratified Sample
◼ Divide population into two or more subgroups (called strata) according to some common characteristic
◼ A simple random sample is selected from each subgroup, with sample sizes proportional to strata sizes
◼ Samples from subgroups are combined into one
◼ This is a common technique when sampling population of voters, stratifying across racial or socio-economic lines.
[Figure: population divided into 4 strata]
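Proportional allocation across strata can be sketched as below. The three strata, their sizes (500, 300, 200), and the total sample size of 100 are hypothetical numbers chosen so that the allocation comes out in whole numbers; they are not from the slides.

```python
import random


def stratified_sample(strata, n, rng):
    """Draw a simple random sample from each stratum, sized proportionally."""
    N = sum(len(items) for items in strata.values())
    sample = {}
    for name, items in strata.items():
        # Proportional allocation; with other sizes, rounding may make the
        # stratum samples add up to slightly more or less than n.
        size = round(n * len(items) / N)
        sample[name] = rng.sample(items, size)
    return sample


# Hypothetical population of 1,000 items split into three strata.
strata = {
    "A": list(range(0, 500)),
    "B": list(range(500, 800)),
    "C": list(range(800, 1000)),
}
result = stratified_sample(strata, n=100, rng=random.Random(1))
sizes = {name: len(items) for name, items in result.items()}
print(sizes)
```

Each stratum contributes in proportion to its share of the population (50, 30, and 20 items here), and the per-stratum samples are then combined into one sample.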
Probability Sample:
Cluster Sample
◼ Population is divided into several “clusters,” each representative of the population
◼ A simple random sample of clusters is selected
◼ All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique
◼ A common application of cluster sampling involves election exit polls, where certain election districts are selected and sampled.
[Figure: population divided into 16 clusters, with randomly selected clusters marked for the sample]
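A cluster sample can be sketched in the same style. The 16 clusters follow the figure, but the cluster contents, the choice of 4 clusters, and the seed are hypothetical; the sketch uses all items in each selected cluster.

```python
import random


def cluster_sample(clusters, num_clusters, rng):
    """Select whole clusters at random and keep every item in them."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: clusters[name] for name in chosen}


# Hypothetical population: 16 clusters with 5 items each.
clusters = {c: [f"item-{c}-{i}" for i in range(5)] for c in range(1, 17)}
selected = cluster_sample(clusters, num_clusters=4, rng=random.Random(7))
print(sorted(selected))
```

Here the random selection is applied to clusters rather than to individual items; a second sampling step inside each chosen cluster could replace the "keep every item" choice.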
Probability Sample:
Comparing Sampling Methods
◼ Simple random sample and Systematic sample
  ◼ Simple to use
  ◼ May not be a good representation of the population’s underlying characteristics
◼ Stratified sample
  ◼ Ensures representation of individuals across the entire population
◼ Cluster sample
  ◼ More cost effective
  ◼ Less efficient (need larger sample to acquire the same level of precision)
Evaluating Survey Worthiness
◼ What is the purpose of the survey?
◼ Is the survey based on a probability sample?
◼ Coverage error – appropriate frame?
◼ Nonresponse error – follow up
◼ Measurement error – good questions elicit good responses
◼ Sampling error – always exists
Exercise
Suppose that 10,000 customers in a retailer’s
customer database are categorized by three
customer types: 3,500 prospective buyers, 4,500
first time buyers, and 2,000 repeat (loyal) buyers.
A sample of 1,000 customers is needed.
a. What type of sampling should you do? Why?
b. Explain how you would carry out the sampling
according to the method stated in (a).
c. Why is the sampling in (a) not simple random
sampling?
Types of Survey Errors
◼ Coverage error or selection bias
  ◼ Exists if some groups are excluded from the frame and have no chance of being selected
◼ Nonresponse error or bias
  ◼ People who do not respond may be different from those who do respond
◼ Sampling error
  ◼ Variation from sample to sample will always exist
◼ Measurement error
  ◼ Due to weaknesses in question design and / or respondent error
Types of Survey Errors
(continued)
◼ Coverage error: Excluded from frame
◼ Nonresponse error: Follow up on nonresponses
◼ Sampling error: Random differences from sample to sample
◼ Measurement error: Bad or leading question
Chapter Summary
In this chapter we have discussed:
◼ The types of variables used in statistics
◼ How to collect data
◼ The different ways to collect a sample
◼ The types of survey errors