Statistics for Business Administration
(TOAE302)
Theory of Economics Statistics
(TOAE301)
Nguyen Thu Hang
nguyenthuhang.cs2@ftu.edu.vn
Assessment



Attendance: 10%
Mid-term test: 30%
Final exam: 60%
Course outline








Chapter 1: Introduction to Statistics
Chapter 2: Summarizing Data
Chapter 3: Numerical Descriptive Techniques
Chapter 4: Inferences Based on a Single Sample:
Confidence Intervals and Tests of Hypothesis
Chapter 5: Inferences Based on Two Samples:
Confidence Intervals and Tests of Hypothesis
Chapter 6: ANOVA Analysis
Chapter 7: Regression Analysis
Chapter 8: Time series analysis
Textbooks
Gerald Keller (2018), Statistics for Management and Economics, Cengage Learning
James T. McClave, P. George Benson, Terry Sincich (2018), Statistics for Business and Economics, Pearson Education
Levine, Stephan, Krehbiel & Berenson, Statistics for Managers Using Microsoft Excel, 8e © 2017 Pearson Prentice-Hall, Inc.
Chapter 1
Introduction to Statistics (6 hours)
Chapter outline
In this chapter you learn:
1. Statistics Definition and Objectives
2. Statistical Concepts
3. Types of Data and Variable Measurements
4. Statistical Analysis Process
5. Sources of Data
6. Questionnaire Design
Business Statistics Marks

A student enrolled in a business program is attending the first class of the required statistics course. The student is somewhat apprehensive because he believes the myth that the course is difficult. To alleviate his anxiety, the student asks the professor about last year’s marks. The professor obliges and provides a list of the final marks, which is composed of term work plus the final exam. What information can the student obtain from the list?
Case: Pepsi’s Agreement
1. What Is Statistics?
1. Collecting data (e.g., survey)
2. Presenting data (e.g., charts and tables)
3. Characterizing data (e.g., average)
These three activities make up data analysis. Why? Decision making.
1. What is statistics?



A branch of mathematics that takes numbers and transforms them into useful information for decision makers. Statistics is a way to get information from data.
Methods for processing and analyzing numbers
Methods for helping reduce the uncertainty inherent in decision making
1. What Is Statistics?
Statistics is the science of data. It involves collecting, classifying, summarizing, organizing, analyzing, and interpreting numerical information.
Application Areas
Economics: forecasting, demographics
Sports: individual and team performance
Engineering: construction, materials
Business: consumer preferences, financial trends
Objectives of Statistics
Decision Makers Use Statistics To:





Present and describe business data and information
properly
Draw conclusions about large groups of individuals or
items, using information collected from subsets of the
individuals or items.
Make reliable forecasts about a business activity
Improve business/production processes
Improve product quality
Statistics: Two Processes
A. Describing sets of data
B. Drawing conclusions (making estimates, decisions, predictions, etc.) about sets of data based on sampling
Types of Statistics
Statistics: the branch of mathematics that transforms data into useful information for decision makers.
Descriptive statistics: collecting, summarizing, and describing data
Inferential statistics: drawing conclusions and/or making decisions concerning a population based only on sample data
Descriptive Statistics
Collect data (e.g., survey)
Present data (e.g., tables and graphs)
Characterize data (e.g., sample mean $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$)
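As a minimal sketch of the "characterize data" step, the sample mean is the sum of the observations divided by their count. The marks below are hypothetical values invented for illustration; they are not data from the course.

```python
from statistics import mean

# Hypothetical final marks for a small sample of students (illustrative only).
marks = [72, 85, 64, 90, 78]

# Sample mean computed directly from its definition: sum of X_i divided by n.
n = len(marks)
sample_mean = sum(marks) / n

# The standard library function gives the same value.
library_mean = mean(marks)
print(sample_mean, library_mean)
```

Writing the sum and the division out by hand mirrors the formula; in practice `statistics.mean` does the same job.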
Descriptive Statistics
Descriptive statistics utilizes numerical and graphical methods to explore data, i.e., to look for patterns in a data set, to summarize the information revealed in a data set, and to present the information in a convenient form.
Inferential Statistics
Drawing conclusions about a large group of individuals based on a subset of the large group.
Estimation: e.g., estimate the population mean weight using the sample mean weight
Hypothesis testing: e.g., test the claim that the population mean weight is 120 pounds
Inferential Statistics





Inferential statistics utilizes sample data to make estimates, decisions, predictions, or other generalizations about a larger set of data.
Example: Inferential statistics
2. Statistical Concepts
Experimental unit: object upon which we collect data
Population: the totality of objects under consideration
Variable: characteristic of an individual experimental unit
Measurement: the process we use to assign numbers to variables of individual population units
Sample: subset of the units of a population that is selected for analysis
• P in Population & Parameter
• S in Sample & Statistic
Measurement





Numerical representations are often not readily available for some variables, so the process of measurement plays an important supporting role in statistical studies.
Measurement is the process we use to assign numbers to variables of individual population units.
Measure the preference for a food product by asking a consumer to rate the product’s taste on a scale from 1 to 10.
Measure workforce age by simply asking each worker, “How old are you?”
Measure gender by assigning 0 and 1 to female and male, respectively.
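A small sketch of measurement as "assigning numbers to variables" might look like the following. The responses and the helper name `measure` are hypothetical; only the 0/1 gender coding and the 1 to 10 taste scale come from the slide.

```python
# Hypothetical raw survey responses (illustrative only).
responses = [
    {"gender": "female", "age": 34, "taste_rating": 8},
    {"gender": "male", "age": 51, "taste_rating": 5},
]

# Coding rule from the slide: female -> 0, male -> 1.
GENDER_CODES = {"female": 0, "male": 1}

def measure(response):
    """Turn one raw response into numbers, checking the 1-10 rating scale."""
    rating = response["taste_rating"]
    if not 1 <= rating <= 10:
        raise ValueError(f"taste rating {rating} is outside the 1-10 scale")
    return {
        "gender_code": GENDER_CODES[response["gender"]],
        "age": response["age"],
        "taste_rating": rating,
    }

measured = [measure(r) for r in responses]
print(measured)
```

Note that the gender codes are labels only: the numbers 0 and 1 carry no quantity, even though they are stored as integers.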
2. Statistical Concepts



Data: facts or information that is relevant or appropriate to a decision maker
Parameter: a summary measure (e.g., mean) that is computed to describe a characteristic of the population
Statistic: a summary measure (e.g., mean) that is computed to describe a characteristic of the sample
Population vs. Sample
Population: measures used to describe the population are called parameters
Sample: measures computed from sample data are called statistics
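The distinction can be sketched with a simulated population. Everything here is assumed for illustration: the population of 10,000 ages is randomly generated, and the sample size of 200 is an arbitrary choice.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 10,000 viewers (simulated, not real data).
population = [random.randint(18, 90) for _ in range(10_000)]

# Parameter: computed from every unit in the population.
population_mean = mean(population)

# Statistic: computed from a sample of 200 units drawn without replacement.
sample = random.sample(population, 200)
sample_mean = mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

In real studies the parameter is usually unknown, which is why the statistic is used to estimate it; the simulation only makes both visible side by side.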
Example





According to a report in the Washington Post (Sep.
5, 2014), the average age of viewers of television
programs broadcast on CBS, NBC, and ABC is 54
years. Suppose a rival network (e.g., FOX) executive
hypothesizes that the average age of FOX viewers
is less than 54. To test her hypothesis, she samples
200 FOX viewers and determines the age of each.
a. Describe the population.
b. Describe the variable of interest.
c. Describe the sample.
d. Describe the inference.
2. Statistical Concepts

Measure of Reliability
• Statement (usually qualified) about the degree of uncertainty associated with a statistical inference
Four Elements of Descriptive Statistical Problems
1. The population or sample of interest
2. One or more variables (characteristics of the population or sample units) that are to be investigated
3. Tables, graphs, or numerical summary tools
4. Identification of patterns in the data
Five Elements of Inferential Statistical Problems
1. The population of interest
2. One or more variables (characteristics of the population units) that are to be investigated
3. The sample of population units
4. The inference about the population based on information contained in the sample
5. A measure of reliability for the inference
Example
“The actual preference for Pepsi is between 51% and 61%.” This interval represents a measure of reliability for the inference.
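The slide does not say how the interval was obtained. One common way to produce an interval like this is a normal-approximation confidence interval for a proportion, sketched below. The survey numbers (224 of 400 respondents preferring Pepsi) and the 95% confidence level are assumptions chosen so that the result lands near the interval quoted above.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical survey: 224 of 400 respondents prefer Pepsi (assumed numbers).
low, high = proportion_interval(224 / 400, 400)
print(f"{low:.0%} to {high:.0%}")
```

With these assumed inputs the bounds round to 51% and 61%. A larger sample would narrow the interval, which is the sense in which the interval measures reliability.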
Process (optional)
A process is a series of actions or operations that
transforms inputs to outputs. A process produces
or generates output over time.
Process
A process whose operations or actions are
unknown or unspecified is called a black box.
Any set of output (object or numbers) produced by
a process is called a sample.
Example

A particular fast-food restaurant chain has 6,289 outlets with
drive-through windows. To attract more customers to its
drive-through services, the company is considering offering
a 50% discount to customers who wait more than a
specified number of minutes to receive their order. To help
determine what the time limit should be, the company
decided to estimate the average waiting time at a particular
drive-through window in Dallas, Texas. For 7 consecutive
days, the worker taking customers’ orders recorded the time
that every order was placed. The worker who handed the
order to the customer recorded the time of delivery. In both
cases, workers used synchronized digital clocks that
reported the time to the nearest second. At the end of the 7-day period, 2,109 orders had been timed.
Example (cont)





a. Describe the process of interest at the Dallas
restaurant.
b. Describe the variable of interest.
c. Describe the sample.
d. Describe the inference of interest.
e. Describe how the reliability of the inference could be
measured.
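As a sketch of how the recorded clock times could be turned into the quantity being estimated, the waiting time for each order is the delivery time minus the order time. The three records below are invented for illustration, and the code assumes both times fall on the same day.

```python
from datetime import datetime
from statistics import mean

# Hypothetical (order placed, order delivered) clock readings, to the second.
records = [
    ("12:01:05", "12:04:35"),
    ("12:02:40", "12:05:10"),
    ("12:06:15", "12:10:15"),
]

def waiting_seconds(placed, delivered):
    """Waiting time for one order in seconds (assumes same-day times)."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(delivered, fmt) - datetime.strptime(placed, fmt)
    return delta.total_seconds()

waits = [waiting_seconds(p, d) for p, d in records]
average_wait = mean(waits)  # sample estimate of the average waiting time
print(waits, average_wait)
```

The same calculation applied to all 2,109 timed orders would give the sample average used to estimate the average waiting time at the window.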
3. Types of Data and variable
measurements
Quantitative data are measurements that are
recorded on a naturally occurring numerical scale.
Qualitative data are measurements that cannot
be measured on a natural numerical scale; they
can only be classified into one of a group of
categories.
3. Types of Data
Types of data: quantitative data and qualitative data
Quantitative Data
Measured on a numeric scale.
Number of defective items in a lot.
Salaries of CEOs of oil companies.
Ages of employees at a company.
Qualitative Data
Classified into categories.
College major of each student in a class.
Gender of each employee at a company.
Method of payment (cash, check, credit card).
Example

Chemical and manufacturing plants sometimes
discharge toxic-waste materials such as DDT
into nearby rivers and streams. These toxins
can adversely affect the plants and animals
inhabiting the river and the riverbank. The U.S.
Army Corps of Engineers conducted a study of
fish in the Tennessee River (in Alabama) and its
three tributary creeks: Flint Creek, Limestone
Creek, and Spring Creek. A total of 144 fish
were captured, and the following variables were
measured for each: (continued on next slide)
Example (cont)






1. River/creek where each fish was captured
2. Species (channel catfish, largemouth bass,
or smallmouth buffalo fish)
3. Length (centimeters)
4. Weight (grams)
5. DDT concentration (parts per million)
These data are saved in the DDT file. Classify
each of the five variables measured as
quantitative or qualitative.
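One way to think about the classification is to look at what kind of value each variable takes for a single fish. The record below is hypothetical (the values are invented, and the field names are not those of the DDT file), and the rule in `data_type` is a simplification: it only works when categories are stored as text rather than as numeric codes.

```python
# Hypothetical record for one captured fish (values invented for illustration).
fish = {
    "location": "Flint Creek",
    "species": "channel catfish",
    "length_cm": 42.5,
    "weight_g": 732.0,
    "ddt_ppm": 10.0,
}

def data_type(value):
    """Label a value as quantitative if numeric, otherwise qualitative."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return "quantitative" if is_number else "qualitative"

classification = {name: data_type(value) for name, value in fish.items()}
for name, kind in classification.items():
    print(f"{name}: {kind}")
```

Location and species come out as qualitative because they name categories; length, weight, and DDT concentration come out as quantitative because they are measured on a numerical scale.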
Types of Variables

Categorical (qualitative) variables have values that
can only be placed into categories, such as “yes” and
“no.”

Numerical (quantitative) variables have values that
represent quantities.
Types of Variables
Data are either categorical or numerical; numerical data are either discrete or continuous.
Categorical (defined categories). Examples: marital status, political party, eye color
Discrete (counted items). Examples: number of children, defects per hour
Continuous (measured characteristics). Examples: weight, voltage
Levels of Measurement

A nominal scale classifies data into distinct categories in
which no ranking is implied.
Categorical Variable | Categories
Personal Computer Ownership | Yes / No
Type of Stocks Owned | Growth / Value / Other
Internet Provider | Microsoft Network / AOL / Other
Levels of Measurement

An ordinal scale classifies data into distinct categories
in which ranking is implied
Categorical Variable | Ordered Categories
Student class designation | Freshman, Sophomore, Junior, Senior
Product satisfaction | Satisfied, Neutral, Unsatisfied
Faculty rank | Professor, Associate Professor, Assistant Professor, Instructor
Standard & Poor’s bond ratings | AAA, AA, A, BBB, BB, B, CCC, CC, C, DDD, DD, D
Student grades | A, B, C, D, F
Levels of Measurement

An interval scale is an ordered scale in which the
difference between measurements is a meaningful
quantity but the measurements do not have a true zero
point.

A ratio scale is an ordered scale in which the difference
between the measurements is a meaningful quantity
and the measurements have a true zero point.
Interval and Ratio Scales
Difference between interval and
ordinal scales


The critical difference between them is that the intervals
or differences between values of interval data are
consistent and meaningful (which is why this type of
data is called interval).
For example, the difference between marks of 85 and
80 is the same five-mark difference that exists between
75 and 70—that is, we can calculate the difference and
interpret the results.
Difference between interval and
ordinal scales

Because the codes representing ordinal
data are arbitrarily assigned except for the
order, we cannot calculate and interpret
differences.

Using a 1-2-3-4-5 coding system to represent poor, fair, good, very good, and excellent, we note that the difference between excellent and very good is identical to the difference between good and fair. With a 6-18-23-45-88 coding, the difference between excellent and very good is 43, and the difference between good and fair is 5.
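The arithmetic can be checked with a short sketch. It reads the second coding as 6, 18, 23, 45, 88, which is the reading that matches the stated differences of 43 and 5; the helper name `gap` is introduced here for illustration.

```python
labels = ["poor", "fair", "good", "very good", "excellent"]

# Two codings that preserve the same order of the categories.
coding_a = dict(zip(labels, [1, 2, 3, 4, 5]))
coding_b = dict(zip(labels, [6, 18, 23, 45, 88]))

def gap(coding, higher, lower):
    """Difference between the codes of two ordered categories."""
    return coding[higher] - coding[lower]

for name, coding in [("1-2-3-4-5", coding_a), ("6-18-23-45-88", coding_b)]:
    print(name,
          gap(coding, "excellent", "very good"),
          gap(coding, "good", "fair"))

# The ranking is identical under both codings; only the gaps change.
same_order = sorted(labels, key=coding_a.get) == sorted(labels, key=coding_b.get)
print(same_order)
```

Both codings rank the categories identically, yet the gaps differ, so differences between ordinal codes say nothing about the categories themselves.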
4. Statistical Analysis Process







Identify research goals
Identify variables of interest and measuring
methods
Data collection
Data summarization
Data analysis
Forecasting
Decision making
The role of statistics in business analytics
Source: From The American
Statistician by George Benson.
Discussion
Monitoring product quality. The Wallace Company of Houston is a
distributor of pipes, valves, and fittings to the refining, chemical, and
petrochemical industries. The company was a recent winner of the
Malcolm Baldrige National Quality Award. One of the steps the company
takes to monitor the quality of its distribution process is to send out a
survey twice a year to a subset of its current customers, asking the
customers to rate the speed of deliveries, the accuracy of invoices, and
the quality of the packaging of the products they have received from
Wallace.
a. Describe the process studied.
b. Describe the variables of interest.
c. Describe the sample.
d. Describe the inferences of interest.
e. What are some of the factors that are likely to affect the reliability of the
inferences?

Questions
What are some of the factors that are likely to
lead to a selection bias problem in:
- A survey of customers’ satisfaction towards
digital banking service?
- A survey of customers’ satisfaction towards
bancassurance service?

5. Sources of Data
1. Data from a published source
2. Data from a designed experiment
3. Data from an observational study
5. Sources of Data
Primary sources: the data collector is the one using the data for analysis
- Data from a political survey
- Data collected from an experiment
- Observed data
Secondary sources: the person performing data analysis is not the data collector
- Analyzing census data
- Examining data from print journals or data published on the internet
5. Sources of Data
Published source: book, journal, newspaper, Web site (https://www.wider.unu.edu/data, https://data.worldbank.org/)
Designed experiment: researcher exerts strict control over the units
Survey: a group of people is surveyed and their responses are recorded
Observational study: units are observed in their natural setting and variables of interest are recorded
Designed Experiment

A designed experiment is a data-collection
method where the researcher exerts full control
over the characteristics of the experimental
units sampled. These experiments typically
involve a group of experimental units that are
assigned the treatment and an untreated (or
control) group.
Observational Study

An observational study is a data-collection
method where the experimental units sampled
are observed in their natural setting. No attempt
is made to control the characteristics of the
experimental units sampled. (Examples include
opinion polls and surveys.)
Samples
A representative sample exhibits characteristics
typical of those possessed by the population of
interest.
A simple random sample of n experimental units is
a sample selected from the population in such a
way that every different sample of size n has an
equal chance of selection.
Example

Suppose you wish to assess the feasibility of
building a new high school. As part of your
study, you would like to gauge the opinions of
people living close to the proposed building site.
The neighborhood adjacent to the site has 711
homes. Use a random number generator to
select a simple random sample of 20
households from the neighborhood to
participate in the study.
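A minimal sketch of the selection, assuming the 711 homes have been numbered 1 to 711 (the numbering scheme and the seed are assumptions, not part of the example):

```python
import random

N_HOMES = 711      # homes in the neighborhood (from the example)
SAMPLE_SIZE = 20   # households to select (from the example)

# Number the homes 1..711; how numbers map to addresses is up to the researcher.
home_ids = range(1, N_HOMES + 1)

rng = random.Random(2024)  # arbitrary seed, used only to make the draw repeatable
selected = sorted(rng.sample(home_ids, SAMPLE_SIZE))
print(selected)
```

`random.sample` draws without replacement, so no home is picked twice and every set of 20 homes has the same chance of being chosen, which is what the definition of a simple random sample requires.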
Importance of Selection
How a sample is selected from a population is of
vital importance in statistical inference because
the probability of an observed sample will be
used to infer the characteristics of the sampled
population.
Measurement error

Measurement error refers to inaccuracies in the values of the data collected. In surveys, the error may be due to ambiguous or leading questions and the interviewer’s effect on the respondent.
Nonrandom Sample Errors
Selection bias results when a subset of the
experimental units in the population is excluded so
that these units have no chance of being selected
for the sample.
Nonresponse bias results when the researchers
conducting a survey or study are unable to obtain
data on all experimental units selected for the
sample.
Measurement error refers to inaccuracies in the
values of the data recorded. In surveys, the error
may be due to ambiguous or leading questions and
the interviewer’s effect on the respondent.
Example




How do consumers feel about using the Internet
for online shopping? To find out, United Parcel
Service (UPS) commissioned a nationwide
survey of 5,118 U.S. adults who had conducted
at least two online transactions in 2015. One
finding from the study is that 74% of online
shoppers have used a smartphone to do their
shopping.
a. Identify the data-collection method.
b. Identify the target population.
c. Are the sample data representative of the
population?
Questionnaire Design
Questionnaires

The validity of the results depends on the quality
of these instruments.


Good questionnaires are difficult to construct; bad
questionnaires are difficult to analyze.
Difficult to design for several reasons:



Each question must provide a valid and reliable
measure.
The questions must clearly communicate the research
intention to the survey respondent.
The questions must be assembled into a logical, clear
instrument that flows naturally and will keep the
respondent sufficiently interested to continue to
cooperate.
Quality aims in survey research
Goal is to collect information that is:
- Valid: measures the quantity or concept that is supposed to be measured
- Reliable: measures the quantity or concept in a consistent or reproducible manner
- Unbiased: measures the quantity or concept in a way that does not systematically under- or overestimate the true value
- Discriminating: can distinguish adequately between respondents for whom the underlying level of the quantity or concept is different
Steps to design a
questionnaire:
Step 1: Write out the primary and secondary aims
of your study.
Step 2: Write out concepts/information to be
collected that relates to these aims.
Step 3: Review the current literature to identify
already validated questionnaires that measure
your specific area of interest.
Step 4: Compose a draft of your questionnaire.
Step 5: Revise the draft.
Step 6: Assemble the final questionnaire.
Step 1: Define the aims of the
study


Write out the problem and primary and
secondary aims using one sentence per aim.
Formulate a plan for the statistical analysis of
each aim.
Make sure to define the target population in
your aim(s).
Step 2: Define the variables to be collected

Write a detailed list of the information to be collected and the concepts to be measured in the study. Are you trying to identify:
- Attitudes
- Needs
- Behavior
- Demographics
- Some combination of these concepts
Translate these concepts into variables that can be measured.
Define the role of each variable in the statistical analysis:
Step 3: Review the literature

Review current literature to identify related
surveys and data collection instruments that
have measured concepts similar to those
related to your study’s aims.
Step 4: Compose a draft



Determine the mode of survey administration: face-to-face interviews, telephone interviews, self-completed questionnaires, computer-assisted approaches.
Format the draft as if it were the final version with
appropriate white space to get an accurate
estimate as to its length – longer questionnaires
reduce the response rate.
Make sure questions flow naturally from one to
another.
Compose a draft



Question: How many cups of coffee or tea do
you drink in a day?
Principle: Ask for an answer in only one
dimension.
Solution: Separate the question into two –


(1) How many cups of coffee do you drink during a
typical day?
(2) How many cups of tea do you drink during a
typical day?
Compose a draft

Question: What brand of computer do you own?
(A) IBM PC
(B) Apple
Principle: Avoid hidden assumptions. Make sure to accommodate all possible answers.
Solution:
(1) Make each response a separate dichotomous item
- Do you own an IBM PC? (Circle: Yes or No)
- Do you own an Apple computer? (Circle: Yes or No)
(2) Add necessary response categories and allow for multiple responses.
- What brand of computer do you own? (Circle all that apply)
  Do not own computer
  IBM PC
  Apple
  Other
Compose a draft

Question: Have you had pain in the last week?
[ ] Never [ ] Seldom [ ] Often [ ] Very often
Principle: Make sure question and answer options match.
Solution: Reword either question or answer to match.
How often have you had pain in the last week?
[ ] Never [ ] Seldom [ ] Often [ ] Very often
Compose a draft



Question: Are you against drug abuse? (Circle:
Yes or No)
Principle: Write questions that will produce
variability in the responses.
Solution: Eliminate the question.
Compose a draft



Question: Which one of the following do you think increases a person’s chance of having a heart attack the most? (Check one.)
[ ] Smoking [ ] Being overweight [ ] Stress
Principle: Encourage the respondent to consider each possible response to avoid the uncertainty of whether a missing item may represent either an answer that does not apply or an overlooked item.
Solution: Which of the following increases the chance of having a heart attack?
Smoking: [ ] Yes [ ] No [ ] Don’t know
Being overweight: [ ] Yes [ ] No [ ] Don’t know
Stress: [ ] Yes [ ] No [ ] Don’t know
Compose a draft

Question:





(1) Do you currently have a life insurance policy?
(Circle: Yes or No)
If no, go to question 3.
(2) How much is your annual life insurance premium?
Principle: Avoid branching as much as possible
to avoid confusing respondents.
Solution: If possible, write as one question.

How much did you spend last year for life insurance?
(Write 0 if none).
Step 5: Revise


Shorten the set of questions for the study. If a
question does not address one of your aims,
discard it.
Refine the questions included and their wording
by testing them with a variety of respondents.


Ensure the flow is natural.
Verify that terms and concepts are familiar and easy
to understand for your target audience.
Step 6: Assemble the final
questionnaire

Decide whether you will format the questionnaire yourself or use computer-based programs for assistance:
- SurveyMonkey.com
- Google Forms
At the top, clearly state:
- The purpose of the study
- How the data will be used
- Instructions on how to fill out the questionnaire
- Your policy on confidentiality
Assemble the final
questionnaire


Group questions concerning major subject
areas together and introduce them by heading
or short descriptive statements.
Order and format questions to ensure unbiased
and balanced results.
Assemble the final
questionnaire


Include white space to make answers clear and
to help increase response rate.
Space response scales widely enough so that it
is easy to circle or check the correct answer
without the mark accidentally including the
answer above or below.


Open-ended questions: the space for the response
should be big enough to allow respondents with large
handwriting to write comfortably in the space.
Closed-ended questions: line up answers vertically
and precede them with boxes or brackets to check, or
by numbers to circle, rather than open blanks.
Non-responders


Understanding the characteristics of those who
did not respond to the survey is important to
quantify what, if any, bias exists in the results.
To quantify the characteristics of the non-responders to postal surveys, Moser and Kalton suggest tracking the length of time it takes for surveys to be returned. Those who take the longest to return the survey are most like the non-responders. This result may be situation-dependent.
Conclusions

You need plenty of time!




Design your questionnaire from research hypotheses
that have been carefully studied and thought out.
Discussing the research problem with colleagues and subject matter experts is critical to developing good questions.
Review, revise and test the questions on an iterative
basis.
Examine the questionnaire as a whole for flow and
presentation.

End of Chapter 1