Lecture Notes 01 to 03-Spring 19

MIS 301 Statistical Analysis for Business
Chris O’Byrne
Spring 2019, Sections 1 & 6
Lecture Notes for Chapters 1 – 3
Course Text: Anderson, D.R., Sweeney, D.J. and T.A. Williams, Modern Business Statistics with
Microsoft Excel, 5th edition, Cengage Learning, 2015.
Suggested Problems:
Read Chapters 1 – 3.
Chapter 1 exercises: 3, 7, 10, 13
Chapter 2 exercises: 2, 8, 11, 12, 15, 23, 29, 49
Chapter 3 exercises: 4, 5, 6, 8, 21, 27, 31, 40, 47, 52, 56, 58, 61
Use Excel for problems with a CD file.
Answers to all of the homework problems are available in the solutions manual.
Section 1 – Monday, January 28, 2019
Section 6 – Friday, January 25, 2019
Copyright © 2017, Chris O’Byrne, MIS Department, San Diego State University. Any use or reproduction of these
notes without his consent is prohibited by law.
Chapter 1: Data and Statistics
Terms:
Statistics: The art and science of collecting, analyzing, presenting, and interpreting
data.
Data: The facts and figures that are collected, analyzed, and summarized for
presentation and interpretation.
Elements: The entities on which data are collected.
Variable: An attribute or characteristic of an entity.
Observation: The set of measurements for a particular element.
Example: Financial Data on Selected Companies
Company    Exchange  Ticker  Market Cap   P/E Ratio  Current  52-week  52-week
                             (millions)              Price    high     low
Qualcomm   NASDAQ    QCOM        56,767      30.75    34.75    44.99    32.08
GE         NYSE      GE         371,101      21.21    34.99    37.75    31.42
IBM        NYSE      IBM        127,936      15.86    79.30    99.10    71.85
Gateway    NYSE      GTW          1,392       0.00     3.75     6.92     2.78
Microsoft  NASDAQ    MSFT       271,081      24.36    25.09    30.20    23.82
Source: money.cnn.com, accessed on 7/10/05
What is the element or entity in the above data set?
What are the attributes for these entities?
How many observations are in the above data set?
Scale of Measurement:
Nominal: Data are labels or names.
Ordinal: The data have the properties of nominal data and can be ordered in some
meaningful way.
Interval: The data have the properties of ordinal data and equal distances on the
scale represent equal distances being measured.
Ratio: The data have the properties of interval data and contain an absolute zero
value resulting in meaningful ratios.
Qualitative Data vs. Quantitative Data
Qualitative data use labels or names to describe an attribute of an element whereas
Quantitative data indicate how much or how many.
Discrete vs. Continuous
Cross-Sectional Data vs. Time Series Data
Cross-sectional data are collected at approximately the same point in time. Time
series data are collected over a series of time periods.
Statistical Studies
Experimental: One or more independent variables (IVs) are controlled to measure
their influence on a dependent variable (DV).
Observational: No attempt is made to control the IVs.
Statistical Inference
Population: The set of all elements of interest in a particular study.
Sample: A subset of the population.
Ethical Guidelines for Statistical Practice
(unethical behavior can take a variety of forms)
• Improper sampling
• Inappropriate analysis of the data
• Development of misleading graphs
• Use of inappropriate summary statistics
• Biased interpretation of the statistical results
• Multiple tests until a desired result is obtained
Chapter 2: Descriptive Statistics: Tabular and Graphical Methods
Read Chapter 2
Review the Following:
2.1 Summarizing Qualitative Data
    Frequency Distribution
    Relative and Percent Frequency Distributions (Table 2.3)
    Bar Graphs and Pie Charts (Figures 2.3 & 2.4)
2.2 Summarizing Quantitative Data
    Frequency Distribution (Table 2.6)
    Relative and Percent Frequency Distributions (Table 2.7)
    Histogram (Figures 2.7 & 2.8)
    Cumulative Distributions (Table 2.8)
    Ogive (Figure 2.15)
2.3 Stem-and-Leaf Display
2.4 Crosstabulations and Scatter Diagrams (Table 2.11 and Figure 2.23)
Chapter 3: Descriptive Statistics: Numerical Measures
Terms:
Sample Statistics: Numerical measures computed from a sample (e.g., the sample
mean, $\bar{x}$, and the sample standard deviation, $s$).
Population Parameters: Numerical measures computed from a population (e.g., the
population mean, $\mu$, and the population standard deviation, $\sigma$).
Point Estimate: The sample statistic used to estimate the corresponding population
parameter (e.g., the sample mean, $\bar{x}$, is often used as a point estimate of the
population mean, $\mu$).
3.1 Measures of Location
Mean
Population Mean: $\mu = \frac{\sum x_i}{N}$
Sample Mean: $\bar{x} = \frac{\sum x_i}{n}$
Median – The middle value when the data are sorted in ascending order.
To compute the median, sort data in ascending order:
(a) If n is odd, the median is the middle value.
(b) If n is even, the median is the average of the two middle values.
Mode – The value that occurs with the greatest frequency.
Excel Functions:
Mean      =AVERAGE(range)
Median    =MEDIAN(range)
Mode      =MODE(range)
Percentile: The pth percentile is a value such that at least p percent of the
observations are less than or equal to this value and at least (100 − p) percent of
the observations are greater than or equal to this value.
Calculating the pth Percentile
Step 1. Sort data in ascending order.
Step 2. Compute index i: $i = \left( \frac{p}{100} \right) n$
Step 3. (a) If i is not an integer, round up. The next integer greater than i
denotes the position of the pth percentile.
(b) If i is an integer, the pth percentile is the average of the values in
positions i and i + 1.
Quartiles
Q1 = first quartile, or 25th percentile
Q2 = second quartile, or 50th percentile (median)
Q3 = third quartile, or 75th percentile
Calculate the 35th percentile - 2 4 7 9 13 17 21 23 29 40 43
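The three-step index method can be sketched in Python as a way to check hand calculations. This is only an illustrative sketch: the function name percentile_index_method is made up for these notes, and it assumes p is a whole number strictly between 0 and 100.

```python
def percentile_index_method(data, p):
    """pth percentile using the index rule i = (p/100) * n.

    Assumes p is a whole number with 0 < p < 100 (an assumption of this sketch).
    """
    values = sorted(data)                      # Step 1: sort in ascending order
    n = len(values)
    # Step 2: i = (p/100) * n, kept exact by splitting p*n into whole part and remainder
    whole, remainder = divmod(p * n, 100)
    if remainder:                              # Step 3(a): i is not an integer, so round up
        return values[whole]                   # 1-based position whole+1 is 0-based index whole
    # Step 3(b): i is an integer, so average the values in positions i and i+1
    return (values[whole - 1] + values[whole]) / 2


data = [2, 4, 7, 9, 13, 17, 21, 23, 29, 40, 43]
p35 = percentile_index_method(data, 35)        # i = 3.85, rounds up to position 4
print(p35)                                     # prints 9
```

With n = 11 and p = 35 the index is i = 3.85, so the 35th percentile is the value in position 4 of the sorted data.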
Excel Functions:
Percentile    =PERCENTILE(array, k) where k is the percentile from 0 to 1
Quartile      =QUARTILE(array, quart) where quart is the quartile
Note: Excel assumes that you are working with continuous data and uses a slightly different formula to compute
percentiles (for an explanation see p. 95 in the textbook). When working with large datasets, this difference is
negligible.
Exact Values for Percentiles & Quartiles:
$L_P = \frac{P}{100}(n + 1)$
If $L_P = 10.4$, the Pth percentile = 10th value + .4(11th value − 10th value),
i.e., the value that lies .4 (40 percent) of the way from the 10th value to the 11th value.
Calculate the 35th percentile and 3rd Quartile - 2 4 7 9 13 17 21 23 29 40 43
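The location formula L_P = (P/100)(n + 1) with interpolation can be sketched the same way. The function name percentile_location_method is hypothetical, and clamping to the smallest or largest value when the location falls outside the data is an assumption of this sketch, not something stated in the notes.

```python
def percentile_location_method(data, p):
    """Pth percentile using L_P = (P/100)(n + 1) with linear interpolation."""
    values = sorted(data)
    n = len(values)
    location = p * (n + 1) / 100        # L_P, a 1-based position that may be fractional
    lower = int(location)               # whole part: position of the lower neighbor
    fraction = location - lower         # decimal part: how far toward the next value
    if lower < 1:                       # assumption: clamp below the first value
        return values[0]
    if lower >= n:                      # assumption: clamp above the last value
        return values[-1]
    lower_value = values[lower - 1]
    upper_value = values[lower]
    return lower_value + fraction * (upper_value - lower_value)


data = [2, 4, 7, 9, 13, 17, 21, 23, 29, 40, 43]
p35 = percentile_location_method(data, 35)   # L = 4.2: 9 + .2(13 - 9), about 9.8
q3 = percentile_location_method(data, 75)    # L = 9.0: the 9th value, 29
print(round(p35, 4), q3)
```

For these 11 values the 35th percentile sits at location 4.2, two tenths of the way from the 4th value (9) to the 5th value (13), and the third quartile sits exactly at the 9th value.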
Example. The following is a list of countries and their Gross National Income
(GNI) per capita as published by the World Bank, 7/16/01. The data are rounded
and presented in thousands (1,000s).
Australia: 20
Canada: 20
Ireland: 23
Italy: 19
Portugal: 11
Singapore: 24
Spain: 15
Sweden: 27
Switzerland: 37
United States: 34
Calculate the Mean, Median, Mode,
Q1 , Q2 , Q3 and the 40th percentile
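As a quick check on the measures of location for the GNI data, here is a minimal sketch using Python's standard statistics module (Python is used here only for illustration; the course itself uses Excel). Quartiles and percentiles are left out because their values depend on which of the two percentile methods is applied.

```python
import statistics

# Per capita GNI in thousands, copied from the example above
gni = {
    "Australia": 20, "Canada": 20, "Ireland": 23, "Italy": 19, "Portugal": 11,
    "Singapore": 24, "Spain": 15, "Sweden": 27, "Switzerland": 37, "United States": 34,
}
values = list(gni.values())

mean = statistics.mean(values)       # sum of the values divided by n
median = statistics.median(values)   # n is even, so the two middle values are averaged
mode = statistics.mode(values)       # the most frequent value

print(mean, median, mode)            # prints 23 21.5 20
```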
3.2 Measures of Variability
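Range = largest value − smallest value
Interquartile Range: IQR = Q3 − Q1
Variance
Population Variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
Sample Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Standard Deviation
Population Standard Deviation: $\sigma = \sqrt{\sigma^2}$
Sample Standard Deviation: $s = \sqrt{s^2}$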
Coefficient of Variation – a descriptive statistic that indicates how large the
standard deviation is relative to the mean.
$\left( \frac{s}{\bar{x}} \times 100 \right)\%$
Using the following sample: 3 8 11 15 20 25 28 29 31 35
Find the mean, median, quartiles, range, IQR, 60th percentile and the coefficient of
variation.
Compute the coefficient of variation for the following stocks:
Stock      Mean   Variance
IBM        121    36
Citibank   14     18
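Because the stock table gives the variance rather than the standard deviation, the square root has to be taken first. A small sketch of that calculation (the helper name coefficient_of_variation is illustrative only):

```python
import math

# (mean, variance) for each stock, copied from the table above
stocks = {"IBM": (121, 36), "Citibank": (14, 18)}


def coefficient_of_variation(mean, variance):
    """CV as a percentage: (standard deviation / mean) * 100."""
    std_dev = math.sqrt(variance)     # the table reports variance, so take the root first
    return std_dev / mean * 100


cv = {name: coefficient_of_variation(m, v) for name, (m, v) in stocks.items()}
for name, value in cv.items():
    print(f"{name}: {value:.2f}%")
```

Citibank has the smaller variance but the larger coefficient of variation, since its standard deviation is much larger relative to its mean.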
Excel Functions:
Range        =MAX(range)-MIN(range)
IQR          =QUARTILE(array, 3) - QUARTILE(array, 1)
Variance     =VAR(range)
Std. Dev.    =STDEV(range)
Note: To compute the population variance and the population standard deviation, use =VARP(range) and
=STDEVP(range) respectively.
Example. The following is a list of countries and their Gross National Income
(GNI) per capita as published by the World Bank, 7/16/01. The data are rounded
and presented in thousands (1,000s).
Australia: 20
Canada: 20
Ireland: 23
Italy: 19
Portugal: 11
Singapore: 24
Spain: 15
Sweden: 27
Switzerland: 37
United States: 34
Calculate the Range, Variance, Standard Deviation, IQR, and Coefficient of
Variation
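The variability measures for the GNI data can be checked with a short sketch using Python's statistics module, where variance and stdev use the sample (n − 1) formulas and pvariance uses the population formula. The IQR is omitted because its value depends on which percentile method is used.

```python
import statistics

# Per capita GNI in thousands, copied from the example above
values = [20, 20, 23, 19, 11, 24, 15, 27, 37, 34]

data_range = max(values) - min(values)            # largest minus smallest
sample_variance = statistics.variance(values)     # divides by n - 1, like =VAR(range)
sample_std_dev = statistics.stdev(values)         # like =STDEV(range)
cv_percent = sample_std_dev / statistics.mean(values) * 100

population_variance = statistics.pvariance(values)  # divides by N, like =VARP(range)

print(data_range, sample_variance, sample_std_dev, round(cv_percent, 2))
```

The squared deviations from the mean of 23 sum to 576, so the sample variance is 576 / 9 = 64 and the sample standard deviation is 8.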
Review “Using Excel’s Descriptive Statistics Tool” on pp.106 – 108.
Data File: Salary.xls
Tools → Data Analysis → Descriptive Statistics
Output
3.3 Measures of Distribution Shape, Relative Location, and Detecting
Outliers
Skewness = (negative, positive)
Excel Function: =SKEW(range)
Mean vs Median:
Relative Location
The z-score, often referred to as the standardized value, is the number of
standard deviations an observation $x_i$ lies from the mean $\bar{x}$.
z-Score: $z_i = \frac{x_i - \bar{x}}{s}$
Country          Per capita GNI (1,000s)   z-Score
Australia        20
Canada           20
Ireland          23
Italy            19
Portugal         11
Singapore        24
Spain            15
Sweden           27
Switzerland      37
United States    34
Example:
Test   Mean   Std Dev.
SAT    970    280
ACT    24     3.2
Student   Score
Sally     1330
Alice     30
Who did better on their respective exam?
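One way to compare the two students is to convert each score to a z-score. The sketch below assumes Sally's 1330 is an SAT score and Alice's 30 is an ACT score, which the score ranges suggest but the notes do not state explicitly.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / std_dev


# Assumption: Sally took the SAT and Alice took the ACT
sally_z = z_score(1330, mean=970, std_dev=280)
alice_z = z_score(30, mean=24, std_dev=3.2)

print(round(sally_z, 3), round(alice_z, 3))   # prints 1.286 1.875
```

The student with the larger z-score scored further above the mean of her own exam, measured in standard deviations.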
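Chebyshev’s Theorem: At least $(1 - 1/k^2)$ of the data values must lie within k
standard deviations of the mean, where k > 1.
- The result is a lower bound (“at least”) and holds for any data set, regardless of
the shape of the distribution.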
Example: Mean=28, standard deviation=5, what is the minimum proportion of
data points that fall within 12 units of the mean?
Example: If the mean is 45 and the standard deviation is 8, what is the minimum
proportion of data points that will fall between 35 and 55?
Example: If the mean is 163 and the standard deviation is 25, what is the minimum
proportion of data points that will fall between 100 and 226?
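The three Chebyshev examples above all follow the same pattern: convert the distance from the mean into k standard deviations, then apply 1 − 1/k². A minimal sketch (the function name and its use of the shorter side for an interval that is not symmetric about the mean are assumptions of this sketch):

```python
def chebyshev_min_proportion(mean, std_dev, lower, upper):
    """Minimum proportion of data between lower and upper by Chebyshev's theorem."""
    # k = distance from the mean in standard deviations (shorter side if not symmetric)
    k = min(mean - lower, upper - mean) / std_dev
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


ex1 = chebyshev_min_proportion(28, 5, 28 - 12, 28 + 12)   # k = 2.4
ex2 = chebyshev_min_proportion(45, 8, 35, 55)             # k = 1.25
ex3 = chebyshev_min_proportion(163, 25, 100, 226)         # k = 2.52

for value in (ex1, ex2, ex3):
    print(f"at least {value:.2%}")
```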
Empirical Rule: (normal dist.)
$\mu \pm 1\sigma \approx 68\%$
$\mu \pm 2\sigma \approx 95\%$
$\mu \pm 3\sigma \approx 99.7\%$
Rule of thumb for Identifying Outliers:
3.4 Exploratory Data Analysis
Five Number Summary
Stem Plot
Box Plot
8 15 19 22 22 25 30 33 34 38 40 44 45 48 53 81
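A five-number summary for the data above can be sketched as follows. Two assumptions are made here that the notes do not spell out: quartiles are computed with the index method from Section 3.1, and box plot limits are placed 1.5 × IQR beyond the quartiles, with values outside the limits flagged as outliers.

```python
def percentile(data, p):
    """Index-method percentile for a whole-number p with 0 < p < 100."""
    values = sorted(data)
    whole, remainder = divmod(p * len(values), 100)
    if remainder:                                   # i not an integer: round up
        return values[whole]
    return (values[whole - 1] + values[whole]) / 2  # i an integer: average i and i+1


data = [8, 15, 19, 22, 22, 25, 30, 33, 34, 38, 40, 44, 45, 48, 53, 81]

q1, median, q3 = percentile(data, 25), percentile(data, 50), percentile(data, 75)
five_number_summary = (min(data), q1, median, q3, max(data))

iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr        # assumed box plot limits (1.5 x IQR rule)
upper_limit = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_limit or x > upper_limit]

print(five_number_summary)          # prints (8, 22.0, 33.5, 44.5, 81)
print(lower_limit, upper_limit, outliers)
```

Under these assumptions the IQR is 22.5, the upper limit is 78.25, and the value 81 is the only observation flagged as an outlier.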
3.5 Measures of Association Between Two Variables
Covariance
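Sample Covariance: $s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
Population Covariance: $\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$
Pearson Product Moment Correlation Coefficient
Sample Correlation Coefficient: $r_{xy} = \frac{s_{xy}}{s_x s_y}$
Population Correlation Coefficient: $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$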
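The sample covariance and sample correlation coefficient from the Midterm 1 formula sheet can be sketched in a few lines. The x and y values below are hypothetical numbers made up purely for illustration; they do not come from the course data.

```python
import math


def sample_covariance(xs, ys):
    """s_xy = sum((x_i - x_bar)(y_i - y_bar)) / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """r_xy = s_xy / (s_x * s_y)."""
    s_x = math.sqrt(sample_covariance(xs, xs))   # covariance of x with itself is s_x squared
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


# Hypothetical illustration data (not from the notes)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

s_xy = sample_covariance(x, y)
r_xy = sample_correlation(x, y)
print(s_xy, round(r_xy, 4))
```

A positive covariance and a correlation near +1 indicate that y tends to increase as x increases; the correlation always falls between −1 and +1.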
3.6 Weighted Mean
Weighted Mean: $\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$
Example: A student in Dr. Reinig’s section of IDS 301 finishes the semester with the following
scores:
Assignments: 65, 95, 100 (worth 10%)
Midterms: 58, 76, 68 (high is worth 30%, 2nd is worth 25%, 3rd is thrown out)
Final: 80 (worth 35%)
What is this student’s weighted mean for the course?
Assume that the student in the above example has not yet taken the final. What is his weighted
mean for the course prior to the final and what does he need to score on the final exam to earn a
72 for the course?
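The course-grade example can be set up as a weighted mean in code. This sketch assumes the three assignment scores are averaged into a single component worth 10% in total; the notes do not say how the assignments are combined, so treat that as an assumption.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


assignments = [65, 95, 100]
midterms = sorted([58, 76, 68], reverse=True)       # highest first; the third is dropped

# Assumption: assignments are averaged into one component worth 10%
assignment_avg = sum(assignments) / len(assignments)
scores = [assignment_avg, midterms[0], midterms[1]]
weights = [0.10, 0.30, 0.25]
final_score, final_weight = 80, 0.35

course_grade = weighted_mean(scores + [final_score], weights + [final_weight])

# Before the final, only 65% of the weight has been earned, so divide by 0.65
grade_before_final = weighted_mean(scores, weights)

# Score needed on the final for a course grade of 72
target = 72
points_so_far = sum(w * x for x, w in zip(scores, weights))
needed_on_final = (target - points_so_far) / final_weight

print(round(course_grade, 2), round(grade_before_final, 2), round(needed_on_final, 2))
```

Dividing by the sum of the weights is what lets the same function handle both the complete set of weights (summing to 1) and the partial set before the final (summing to 0.65).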
Example: Find the weighted average.
x       5    25    90
f(x)    10   4     1
GPA
Class   Units   Grade
IDS     3       A
MGMT    3       B
BIO     5       B
PE      1       D
ACCT    4       F
MKT     2       C
Cars per household in a Community
Cars   Households
0      48
1      51
2      34
3      9
4      5
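For the x / f(x) table and the cars-per-household table above, the weights are frequencies, so the weighted mean is the sum of value × frequency divided by the total frequency. A brief sketch (the function name is illustrative; the GPA table is left out because it would require assuming a grade-point scale):

```python
def frequency_weighted_mean(values, frequencies):
    """Weighted mean in which each value is weighted by how often it occurs."""
    total = sum(frequencies)
    return sum(x * f for x, f in zip(values, frequencies)) / total


x_mean = frequency_weighted_mean([5, 25, 90], [10, 4, 1])
cars_mean = frequency_weighted_mean([0, 1, 2, 3, 4], [48, 51, 34, 9, 5])

print(x_mean, round(cars_mean, 4))   # prints 16.0 1.1293
```

For the cars table the numerator is 0(48) + 1(51) + 2(34) + 3(9) + 4(5) = 166 cars spread over 147 households.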
Geometric Mean (typically used with financial data)
2 questions: 1) value at end of period
             2) average rate of return
Initially invested $100
Year   Return    Growth Factor
1      -22.1%    0.779
2      28.7%     1.287
3      10.9%     1.109
4      4.9%      1.049
5      15.8%     1.158
6      5.5%      1.055
7      -37.0%    0.63
8      26.5%     1.265
9      15.1%     1.151
10     2.1%      1.021
Value at end of period:
$100*[.779*1.287*1.109*1.049*1.158*1.055*.630*1.265*1.151*1.021] = $133.4493
Product of growth factors: $133.4493 / $100 (initial investment) = 1.334493
Mean Growth Rate ➔ $\bar{x}_g = \sqrt[10]{1.334493} = 1.029275$ ➔ 2.9275% annual growth rate
Avg rate from just adding up and dividing (wrong): 5.040%
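The $100 example can be reproduced with a short sketch that converts each return to a growth factor, multiplies the factors to get the ending value, and takes the nth root to get the mean growth factor. It also computes the simple average of the returns to show why that figure overstates the growth rate.

```python
import math

returns_pct = [-22.1, 28.7, 10.9, 4.9, 15.8, 5.5, -37.0, 26.5, 15.1, 2.1]
initial_investment = 100

growth_factors = [1 + r / 100 for r in returns_pct]    # e.g. -22.1% becomes 0.779
total_growth = math.prod(growth_factors)               # product of the ten factors
end_value = initial_investment * total_growth

n = len(growth_factors)
geometric_mean = total_growth ** (1 / n)               # nth root of the product
avg_annual_rate_pct = (geometric_mean - 1) * 100

arithmetic_avg_pct = sum(returns_pct) / n              # adding up and dividing (misleading)

print(round(end_value, 2), round(avg_annual_rate_pct, 4), round(arithmetic_avg_pct, 2))
```

Growing $100 at the geometric mean rate for 10 years reproduces the ending value, while growing it at the arithmetic average of 5.04% per year would give a larger amount than was actually earned.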
If you invested $1500 for 5 years and received the following returns, what will be the value of
your investment and the average rate of return after 5 years?
Year   Return    Growth Factor
1      15.0%
2      32.1%
3      -11.2%
4      4.9%
5      -9.0%
Practice Problems:
1. Chapter 3, Q. 62: Tax Penalties on Payroll Taxes
 820    270    450   1010    890
 700   1350    350    300   1200
 390    730   2040    230    640
 350    420    270    370    620
The above data represent the tax penalties assessed on a sample of 20 companies for not
properly paying payroll taxes.
A. What is the range of these data?
B. What is the IQR?
C. What are the mean and median?
D. What are the first and third quartiles?
E. What is the 60th percentile?
F. What is the 35th percentile?
G. What is the coefficient of variation? (Sample Standard Deviation = 455.91)
2. The average weight of a 26-year-old male is 171 pounds with a standard deviation of 18
pounds. Assume the weights are normally distributed.
A. What is the probability that you pick someone and they weigh more than 189 pounds?
B. What is the probability that you pick someone and they weigh less than 135 pounds?
C. What is the probability that you pick someone and they weigh more than 153 pounds?
D. What is the probability that you pick someone and they weigh less than 225 pounds?
E. What is the probability that you pick someone and they weigh more than 225 pounds?
F. What is the probability that you pick someone and they weigh between 135 pounds and
189 pounds?
3. The average amount of money in tips a food server makes at this particular restaurant is
$91 with a standard deviation of $16. What is the minimum percentage of food servers
that make between $50 and $132?
4. The average cell phone bill is $78 with a standard deviation of $10. What is the
minimum percentage of cell phone bills that will be between $54 and $102?
5. What is the weighted average of the following salaries?
Category     # of Workers   Salary
CEO          1              $100,000
President    1              $75,000
CFO          1              $65,000
Mechanics    15             $40,000
Sales        20             $35,000
Associates   10             $45,000
6. The following table shows the prices I paid to go to baseball games this season and the
number of tickets I bought at each price. What is the average price I paid this season?
Price of Ticket   # of Tickets
$1                8
$3                21
$8                17
$15               15
$40               8
7. In one of your classes the syllabus states that HW is worth 10%, Quizzes 20%, Tests 40%,
and the final 30%. Your grades are 80 on the HW, 75 on the Quizzes, and 68 on the Tests.
What is your current grade in the class? What do you need to get on the final to get a 75
in the class?
8. The possible returns for your portfolio and the probabilities that they occur are as follows:
Return   Probability
-15%     .20
9%       .15
11%      .30
18%      .25
25%      .10
What is the expected return for the portfolio?
9. If you invested $1000 for 4 years and received the following returns, what will be the
value of your investment and the average rate of return after 4 years?
Year   Return    Growth Factor
1      25.0%
2      -50.0%
3      30.0%
4      10.0%
Answers to Practice Problems:
1. A. 1810 B. 522.5 C. 670 & 535 D. 350 & 872.5 E. 676 F. 377 G. 68.05%
2. A. 16% B. 2.5% C. 84% D. 99.85% E. .15% F. 81.5%
3. k = 2.5625 standard deviations; 84.771%
4. k = 2.4 standard deviations; 82.6389%
5. $41,458.33
6. $10.90
7. before the final 71.71, needed on the final 82.67
8. 8.65%
9.
Year   Return    Growth Factor
1      25.0%     1.25
2      -50.0%    0.5
3      30.0%     1.3
4      10.0%     1.1
Initial investment: $1,000
Product of growth factors: 0.894
Value after 4 years: $893.75
Mean growth factor (geometric mean): 0.972308
Average rate of return: -2.769%
Avg rate from just adding up and dividing (wrong): 3.750%
The following will be included in your formula sheet for Midterm 1:
Formulas from Chapter 3: Descriptive Statistics: Numerical Measures
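Sample Mean: $\bar{x} = \frac{\sum x_i}{n}$
Sample Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Sample Standard Deviation: $s = \sqrt{s^2}$
Coefficient of Variation: $\left( \frac{s}{\bar{x}} \times 100 \right)\%$
z-score: $z_i = \frac{x_i - \bar{x}}{s}$
Sample Covariance: $s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
Correlation Coefficient: $r_{xy} = \frac{s_{xy}}{s_x s_y}$
Weighted Mean: $\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$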