Uploaded by Tuure Pasto

Managerial Statistics - Introduction

Lecture 1: Course Introduction – Descriptive Statistics
B6014 Managerial Stats. – Prof. Jing Dong
Managerial Statistics
Professor Jing Dong
Today:
• Course Outline
• Recap: descriptive statistics material
– mean, median, standard deviation
• Normal Approximation
Course Introduction – Descriptive Statistics
Lecture 1 / #2
Introductions
Professor: Jing Dong
Office: Uris 413
Tel: (212) 854-9154
Email: jing.dong@gsb.columbia.edu
Office hours: Thur 5:00pm-7:00pm or by appointment
TAs: Gowtham Tangirala, Zhe Liu, Sharon Huang, Yue Hu, Pengyu Qian,
Aishwarya Sharma, Raghav Seth
(see Canvas for names/emails/office hrs)
• Course Website: Canvas
Why are we here?
• All: Become an informed consumer of statistical information
• Most: Learn basic statistical analysis for other courses and your work
– Core: Strategy Formulation; Corporate Finance; Marketing; Managerial Economics;
Operations Management; Business Analytics
– Elective: Capital Markets and Investments; Marketing Research; Applied Regression
Analysis; Business Analytics II ...
– Examples: estimate product reliability, evaluate the risk and reward of a portfolio,
test for bias in analysts’ recommendations, predict sales on the basis of a product’s
characteristics, ...
• Many: Develop a foundation for learning further statistical methods
Course outline
Regression
Confidence intervals
Sampling and sampling errors
Normal distribution
Random variables, Probability, Exp.Value
Descriptive statistics and Summary measures
Course Organization
• Lectures Mon/Wed/Fri (only first 4 Fri: 9/7, 9/14, 9/21, 9/28)
– Cluster A: 4:00pm - 5:30pm in Uris 326
– Cluster B: 2:15pm - 3:45pm in Uris 326
– Cluster E: 9:00am - 10:30am in Uris 326
– Cluster F: 10:45 am - 12:15pm in Uris 326
• Review sessions: Wednesdays 5:45-7:15pm, Uris 142
– not mandatory (shared across all 8 clusters)
– reviews material already covered in class; goes over predetermined set of practice
problems; reviews Excel concepts as needed
• Midterm on Friday Sep 21 (in class); midterm exam review session (shared across all 8
clusters): Sep 18, 5:45 - 7:15pm in Uris 332
• Final exam on Friday Oct 19; final exam review session (shared across all 8 clusters): Oct 15,
9:00-10:00am in Uris 142
Class Contribution
• Professionalism in class
– Present, on time, prepared and engaged
– no laptops / phones / tablets
• For many sessions, you have a few questions to prepare or answer on Canvas.
• PollEverywhere will be used during the Lecture and your responses will be used for taking
attendance.
• Questions and comments are strongly encouraged! It is OK to say “I’m lost”.
Reference Material
• Lecture slides are the primary source of information.
They will be provided in class and posted on Canvas.
• Course Reader
– Course Notes: secondary source for the material we will cover
– Practice problems & Answers
– Cases
• Supplementary textbooks (not required):
– Levine, Stephan, Krehbiel, & Berenson, Statistics for Managers, 6th Ed.,
Prentice-Hall, 2010
• Resources
– Weekly review sessions
– Me: feel free to stop by my office hours or make an appointment.
– Teaching Assistants: office hours and contact information on Canvas
– Tutor: available through Student Affairs
– Learning team
Assignments/Grading
• Midterm & Final exam: closed book
(three sheets of notes, double-sided allowed)
• Four hand-in assignments:
– 4 cases with learning team
First one due on Monday 9/10
• Weekly homework: problems in the course reader
– Not graded; answers also in the course reader
• Grading: based on the maximum of the following two weighting schemes.
                         Scheme 1   Scheme 2
Class participation         5%         5%
Hand-in assignments        20%        20%
Midterm                    30%         0%
Final                      45%        75%
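The "maximum of two weighting schemes" rule can be sketched in a few lines of Python. The weights come from the slide; the student scores, the 0-100 scale, and the function names are hypothetical and only illustrate the arithmetic.

```python
# Weights for the two schemes listed on the slide.
SCHEME_1 = {"participation": 0.05, "hand_in": 0.20, "midterm": 0.30, "final": 0.45}
SCHEME_2 = {"participation": 0.05, "hand_in": 0.20, "midterm": 0.00, "final": 0.75}


def weighted_score(scores, weights):
    """Weighted average of component scores (assumed to be on a 0-100 scale)."""
    return sum(weights[name] * scores[name] for name in weights)


def course_score(scores):
    """Return the higher of the two weighted scores."""
    return max(weighted_score(scores, SCHEME_1), weighted_score(scores, SCHEME_2))


# Hypothetical student with a weak midterm and a strong final.
example = {"participation": 90, "hand_in": 85, "midterm": 60, "final": 88}
print("Scheme 1:", round(weighted_score(example, SCHEME_1), 2))
print("Scheme 2:", round(weighted_score(example, SCHEME_2), 2))
print("Course score:", round(course_score(example), 2))
```

In this made-up case the second scheme is higher, because it ignores the midterm and puts that weight on the final.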
Course outline
Main goal: introduce and study how basic statistical
tools are used in managerial decision making
We will cover the following key concepts of statistical analysis and inference:
• Descriptive statistics: summarize data, observe patterns, extract vital information
• Probability: systematic framework for dealing with uncertainty, basis for statistical inference
• Sampling: sample data as a guide for statistical inference
• Estimation: point & interval estimators, construction & interpretation of confidence
intervals
• Regression: construction of predictive models based on statistical data
Remainder of lecture
• Quick recap of Descriptive Statistics material from pre-term videos
– mean, median
– standard deviation
– Normal Approximation
• Who is the best baseball player of all time?
• What is happening to the economic health of America’s middle class?
Graphic from CNN Money
The per capita income in the United States climbed from $7,787 in 1980 to $26,487 in 2010.
Which printer has better quality?
• Data: for each printer sold last year, the file documents the number of quality problems
that were reported during the warranty period. The total number of printers sold by each
company:
– Brand X: 57,334
– Brand Y: 994,773
• Mean number of quality problems per printer
– Brand X: 3.49
– Brand Y: 2.64
• Median number of quality problems per printer:
– Brand X: 1
– Brand Y: 2
• Standard deviation of the number of quality problems per printer
– Brand X: 4.07
– Brand Y: 2.15
Recap: measures of central tendency
For a dataset $X_1, X_2, \ldots, X_n$, we defined several notions of an average:
• mean or arithmetic average: $\bar{X} = \frac{X_1 + \cdots + X_n}{n}$
• median: “midpoint” or “middle value” of dataset
• mode: most frequently occurring value (mostly for categorical data)
• weighted average: $w_1 X_1 + \cdots + w_n X_n$, where we often require
$w_1 + w_2 + \cdots + w_n = 1$.
Remarks:
• Appropriate notion of average depends on the context
• The mean is more sensitive to outliers than the median
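A small Python sketch of the four averages, using the standard `statistics` module. The data are hypothetical (made-up counts of quality problems with one extreme value) and are chosen only to show how an outlier pulls the mean much more than the median.

```python
import statistics

# Hypothetical data: quality problems for seven printers; the last value is an outlier.
problems = [0, 1, 1, 2, 2, 2, 20]

mean = statistics.mean(problems)      # 28 / 7 = 4
median = statistics.median(problems)  # middle value of the sorted data = 2
mode = statistics.mode(problems)      # most frequent value = 2

# Drop the outlier and recompute: the mean moves a lot, the median only slightly.
trimmed = problems[:-1]
mean_trimmed = statistics.mean(trimmed)
median_trimmed = statistics.median(trimmed)

# Weighted average with weights that sum to 1 (values and weights are made up).
values = [10.0, 20.0, 30.0]
weights = [0.5, 0.3, 0.2]
weighted_avg = sum(w * x for w, x in zip(weights, values))

print("with outlier:    mean", mean, "median", median, "mode", mode)
print("without outlier: mean", round(mean_trimmed, 2), "median", median_trimmed)
print("weighted average:", round(weighted_avg, 2))
```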
• Who is the best baseball player of all time?
According to Steve Moyer, president of Baseball Info Solutions, the three most
valuable statistics (other than age) for evaluating any player who is not a pitcher
would be
– On-base percentage (OBP): Measures the proportion of the time that a player reaches
base successfully, including walks (which are not counted in the batting average).
– Slugging percentage (SLG): Measures power hitting by calculating the total bases
reached per at bat. A single counts as 1, a double is 2, a triple is 3, and a home run is
4. Thus, a batter who hit a single and a triple in five at bats would have a slugging
percentage of (1 + 3)/5, or .800.
– At bats (AB)
• Derek Jeter: OBP 0.377; SLG 0.440; batting average: 0.310
• Babe Ruth: OBP 0.474; SLG 0.690; batting average: 0.342
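The slugging percentage definition above translates directly into code. This is a minimal sketch; the function name and argument layout are illustrative choices, and the example reproduces the slide's single-plus-triple case.

```python
def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """Total bases per at bat: single = 1, double = 2, triple = 3, home run = 4."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


# The slide's example: one single and one triple in five at bats.
slg = slugging_percentage(singles=1, doubles=0, triples=1, home_runs=0, at_bats=5)
print(f"{slg:.3f}")  # (1 + 3) / 5 = 0.800
```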
• How is the economic health of the American middle class?
According to Alan Krueger (Professor of Economics and Public Affairs at
Princeton), we should examine
– changes in the median wage (adjusted for inflation)
– changes to wages at the 25th and 75th percentiles (adjusted for inflation)
Source: Congressional Budget Office
Recap: measure of data dispersion
• Standard deviation is the most basic and widely used measure of variability in statistical
analysis
The variance $\sigma^2$ is the avg. squared distance to mean:
$$\sigma^2 = \frac{(X_1 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n} = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
and the standard deviation $\sigma$ is
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2}$$
• Variance expressed in the original units squared; Square root recovers the original units
• Excel: STDEVP; “P” is mnemonic for “population”
Variance & Standard Deviation – version 2
When we estimate the variance and standard deviation using a data sample we tweak the
formulae as follows:
The sample variance $s^2$ divides the sum of squared distances to the mean by $n - 1$ instead of $n$:
$$s^2 = \frac{(X_1 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1} = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
and the sample standard deviation $s$ is
$$s = \sqrt{s^2} = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2}$$
• Excel: STDEV computes the sample standard deviation, i.e., divides by n − 1
Which one to use?
We will not worry about the distinction between the two in this course. You can use either one.
• The population standard deviation σ:
– To calculate σ, we first find the variance and then take the square root.
– The variance is the average squared distance to the mean
– Excel: STDEVP; “P” stands for “population”
• The sample standard deviation s:
– Divides by n − 1 instead of n.
– This is often used when estimating the stdev of a population from a random sample
– Excel: STDEV
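For readers working outside Excel, Python's standard `statistics` module offers the same pair: `pstdev` divides by n (like STDEVP) and `stdev` divides by n − 1 (like STDEV). The dataset below is hypothetical and chosen so the arithmetic is easy to check by hand.

```python
import statistics

# Hypothetical data with mean 5 and sum of squared deviations 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = statistics.pstdev(data)  # population version: sqrt(32 / 8) = 2.0
s = statistics.stdev(data)       # sample version: sqrt(32 / 7), slightly larger

print("population stdev:", sigma)
print("sample stdev:    ", round(s, 4))
```

The two values differ only through the divisor, so the gap shrinks as n grows.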
Recap of mean and standard deviation calculation
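Data: $X_1, \ldots, X_n$
Sample size: $n$
Table columns: $X_i$, $X_i - \bar{X}$, $(X_i - \bar{X})^2$ (rows: …)
Sample mean (Excel function AVERAGE):
$$\bar{X} = \frac{X_1 + \cdots + X_n}{n} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
Sample variance:
$$s^2 = \frac{(X_1 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}$$
Sample standard deviation (Excel function STDEV):
$$s = \sqrt{\frac{(X_1 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}}$$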
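The same calculation can be carried out column by column, as one would in a spreadsheet. This is a sketch with a hypothetical five-point sample; it builds the deviation and squared-deviation columns and then applies the n − 1 formula.

```python
import math

# Hypothetical sample.
data = [3, 5, 7, 9, 11]
n = len(data)

x_bar = sum(data) / n                       # sample mean
deviations = [x - x_bar for x in data]      # column: X_i - X_bar
squared = [d ** 2 for d in deviations]      # column: (X_i - X_bar)^2

sample_variance = sum(squared) / (n - 1)    # divide by n - 1
sample_stdev = math.sqrt(sample_variance)

# Print the three columns of the table, then the summary measures.
print(f"{'X_i':>5} {'X_i - mean':>12} {'squared':>10}")
for x, d, sq in zip(data, deviations, squared):
    print(f"{x:>5} {d:>12.2f} {sq:>10.2f}")
print("mean:", x_bar)
print("sample variance:", sample_variance)
print("sample stdev:", round(sample_stdev, 4))
```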
Should I worry about my health?
My fictitious blood test result:
– My score: 134
– The average for females in my age group: 122
– The standard deviation for females in my age group: 18
• Chebyshev’s rule: True for any dataset
– At least 75% of the data lies within ± 2 standard deviations from the mean
– At least 88.9% of the data lies within ± 3 standard deviations from the mean
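A quick way to reason about the blood test is to express the score in standard deviations from the mean and compare it with Chebyshev's bound, which guarantees at least 1 − 1/k² of any dataset within k standard deviations (for k > 1). The sketch below uses the numbers from the slide; the function names are illustrative.

```python
def z_score(x, mean, stdev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / stdev


def chebyshev_lower_bound(k):
    """Minimum fraction of any dataset within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


z = z_score(134, mean=122, stdev=18)
print("score in standard deviations:", round(z, 2))
print("within 2 stdevs: at least", round(100 * chebyshev_lower_bound(2), 1), "%")
print("within 3 stdevs: at least", round(100 * chebyshev_lower_bound(3), 1), "%")
```

A score of 134 is less than one standard deviation above the mean, so it is well inside the range where most of the data must lie.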
Normal approximation
• In practice, data may often have a bell-shaped histogram.
• If data is approximately normal,
– 68.3% of observations lie within ± 1 standard deviation from the mean
– 95.4% of observations lie within ± 2 standard deviations from the mean
– 99.7% of observations lie within ± 3 standard deviations from the mean
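These percentages can be reproduced from the standard normal distribution. A minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8+); the helper name `normal_coverage` is an illustrative choice.

```python
from statistics import NormalDist


def normal_coverage(k):
    """Fraction of a normal distribution within k standard deviations of its mean."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within ±{k} stdev: {100 * normal_coverage(k):.1f}%")
```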
How do the two events compare?
• The S&P 500 index dropped by 4.1% on Feb 5, 2018
• The average December temperature in Central Park in 2015 was 50.8 °F
Daily return of S&P 500
• Mean daily return (=AVERAGE) = 0.04%
• Standard deviation of daily returns (=STDEV) = 1.10%
• Out of 7,716 days, 94.9% are within ± 2 standard deviations from the mean
Central Park temperature
• Mean = 36 °F
• Standard deviation = 4.3 °F
• 95.1% of the observations are within ±2 standard deviations from the mean
How do the two events compare?
• The S&P 500 index dropped by 4.1% on Feb 5, 2018. This is an event 3.76 standard deviations
below the mean. (normal-approximation tail probability: 0.0001)
• The average December temperature in Central Park in 2015 was 50.8 °F. This is an event 3.44
standard deviations above the mean. (normal-approximation tail probability: 0.0003)
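The comparison can be checked with the means and standard deviations reported on the earlier slides. The sketch below computes each event's distance from the mean in standard deviations, then the one-sided tail probability under a normal approximation. Reading the parenthetical values as one-sided normal tails is an assumption, though it matches the numbers shown.

```python
from statistics import NormalDist


def z_score(x, mean, stdev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / stdev


standard_normal = NormalDist()

# S&P 500 daily return on Feb 5, 2018 (in percent): mean 0.04, stdev 1.10.
sp_z = z_score(-4.1, mean=0.04, stdev=1.10)
sp_tail = standard_normal.cdf(sp_z)         # chance of a drop at least this large

# Central Park average December temperature in 2015 (°F): mean 36, stdev 4.3.
temp_z = z_score(50.8, mean=36, stdev=4.3)
temp_tail = standard_normal.cdf(-temp_z)    # chance of a value at least this high

print(f"S&P 500:     z = {sp_z:.2f}, tail probability = {sp_tail:.4f}")
print(f"Temperature: z = {temp_z:.2f}, tail probability = {temp_tail:.4f}")
```

Putting both events on the same scale is what makes a stock return and a temperature comparable.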
How do the two events compare?
• The monthly return of the Shanghai Stock Exchange in March 2016 was 11.75%.
• The imprisonment rate in New Jersey dropped by 6.5% in 2015.
Swiss central bank lets franc rise – 01/15/15
• The move on Jan 15 was 32 standard deviations above the mean
• . . . it is silly to think like that – the distribution changed
• Later on, we will be able to verify that the difference between the new mean and the old
one is statistically significant
Summary
Quick recap of Descriptive Statistics material from pre-term videos
• mean, median
• standard deviation
• Normal approximation
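If data is approximately normal, we can characterize:
– 68.3% of data lie within 1 stdev from the mean
– 86.6% of data lie within 1.5 stdevs from the mean
– 95.4% of data lie within 2 stdevs from the mean
– 99.7% of data lie within 3 stdevs from the mean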