MGT 201
Midterm study guide
I.
Descriptive statistics
How to quantitatively describe a data set {x1, …, xn}.
1. Measures of central tendency: where is the data located?
a. Mean/average: 𝑥̅ = (x1+ …+ xn)/n. Excel: =AVERAGE(…)
b. Median: middle number when data is sorted in increasing order (or average of two
middle numbers). Less influenced by extreme values. Excel: =MEDIAN(…)
c. Mode: most frequent observation. Excel: =MODE.SNGL(…)
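Outside Excel, the three measures can be computed with Python's standard `statistics` module; the data set below is made up for illustration:

```python
from statistics import mean, median, mode

data = [2, 3, 3, 5, 7, 9, 13]   # hypothetical data set

print(mean(data))    # (2 + 3 + 3 + 5 + 7 + 9 + 13)/7 = 6
print(median(data))  # 4th of the 7 sorted values = 5
print(mode(data))    # most frequent value = 3
```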
2. Measures of spread/variability/dispersion: measure how far from the mean the data
falls
a. Mean absolute deviation (average of the |xi – x̄|’s)
b. Variance = mean square deviation (average of the (xi – x̄)²’s). Excel: =VAR.P(…)
c. Standard deviation = square root of the variance (brings it back to the same unit as
the data). Used as a unit of distance from the mean (e.g., a data point is “far” from the
mean if it is 3 standard deviations above or below the mean). Excel: =STDEV.P(…)
d. Coefficient of variation = ratio of standard deviation to mean (gives a reference
point to understand size of standard deviation in context)
e. Range: gap between lowest and highest observation. Excel: = MAX(…) – MIN(…)
f. IQR: gap between 1st and 3rd quartiles. Measures the spread of the middle 50% of
the data. To find quartiles: find the median; if it’s a data point, exclude it. The
median divides the data set into two parts. Find the median of each part: those are
the quartiles (3rd quartile is median of upper half, 1st quartile is median of lower
half). Excel: QUARTILE.EXC(…) (may give a slightly different answer than method
described here)
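All of these spread measures are short one-liners in Python; `statistics.quantiles` with `method='exclusive'` matches QUARTILE.EXC and, on this (hypothetical) data set, also matches the median-split method described above:

```python
from statistics import pvariance, pstdev, quantiles

data = [4, 7, 7, 8, 10, 12, 15]   # hypothetical data set, already sorted
n = len(data)
m = sum(data) / n                              # mean = 9

mad = sum(abs(x - m) for x in data) / n        # mean absolute deviation
var = pvariance(data)                          # population variance = 80/7
sd = pstdev(data)                              # population standard deviation
cv = sd / m                                    # coefficient of variation
rng = max(data) - min(data)                    # range = 15 - 4 = 11

q1, q2, q3 = quantiles(data, n=4, method='exclusive')
iqr = q3 - q1                                  # IQR = 12 - 7 = 5
```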
3. Measures of dependence of two data sets
a. Covariance = average of the (xi – x̄)(yi – ȳ)’s. Measures the direction of the linear
association between x and y (positive when they tend to move together, negative when
they move in opposite directions). Unit = unit of x times unit of y. Value difficult to
interpret on its own. Excel: =COVARIANCE.P(…, …)
b. Correlation coefficient = covariance/ (σx σy). Standardizes the covariance to a value
from –1 to 1. When absolute value of the correlation is close to 1, we can predict
one variable using the other using the equation of the regression line. Excel:
=CORREL(…, …)
c. Do not confuse correlation and causation
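A small sketch of the population covariance and the correlation coefficient, using made-up x and y lists:

```python
x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Population covariance: average of the (xi - mx)(yi - my)'s
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n

# Population standard deviations
sdx = (sum((xi - mx) ** 2 for xi in x) / n) ** 0.5
sdy = (sum((yi - my) ** 2 for yi in y) / n) ** 0.5

# Correlation coefficient: standardized to lie between -1 and 1
r = cov / (sdx * sdy)
```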
II.
Probability of events
1. Sample space = set of all possible outcomes. Has probability 1.
2. Event = subset of the sample space = subset of outcomes
3. Probability of an event A: likelihood that event A happens. If all outcomes are equally
likely, it is the ratio of number of outcomes in A out of total number of possible
outcomes
4. Complement of an event A: happens whenever A does not happen. P(not A) = 1 – P(A)
5. Union of events A and B: when A happens or B happens or both happen. P(A or B) =
P(A) + P(B) – P(A and B)
6. Intersection of events A and B: when A and B happen simultaneously. Events are
mutually exclusive (i.e., disjoint) when their intersection is empty.
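With equally likely outcomes, these rules can be verified by counting. A sketch with two dice (the events are chosen for illustration; `Fraction` keeps the probabilities exact):

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely outcomes of rolling two dice
space = set(product(range(1, 7), repeat=2))

def P(event):
    # Probability = number of outcomes in the event / total number of outcomes
    return Fraction(len(event), len(space))

A = {o for o in space if o[0] + o[1] == 7}   # sum is 7
B = {o for o in space if o[0] == 6}          # first die shows 6

print(P(A))                       # 1/6
print(1 - P(A))                   # complement rule: 5/6
print(P(A) + P(B) - P(A & B))     # union rule: 11/36
print(P(A | B))                   # same answer, counted directly
```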
7. Conditional probability: when we know for sure that an event is true. It’s a reduction
of the sample space to a subset corresponding to what we know for sure is true
(what is “given”). Probability of A given B is the probability that A happens given that
B happens for sure. P(A|B) = P(A and B) / P(B)
8. Total probability rule: used when breaking the probability into two (or more) parts,
by intersecting with or conditioning on another event, makes it easier. P(A) = P(A and B)
+ P(A and not B)
9. Multiplication rule: P(A and B) = P(A|B) * P(B) = P(B|A) * P(A)
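Staying with two dice, a sketch of the conditional probability, total probability, and multiplication rules (events again chosen just for illustration):

```python
from fractions import Fraction
from itertools import product

space = set(product(range(1, 7), repeat=2))

def P(event):
    return Fraction(len(event), len(space))

A = {o for o in space if o[0] + o[1] >= 10}   # sum is at least 10
B = {o for o in space if o[0] == 6}           # first die shows 6
notB = space - B

p_A_given_B = P(A & B) / P(B)   # conditional probability: P(A|B) = 1/2

# Total probability rule: P(A) = P(A and B) + P(A and not B)
assert P(A) == P(A & B) + P(A & notB)

# Multiplication rule: P(A and B) = P(A|B) * P(B)
assert P(A & B) == p_A_given_B * P(B)
```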
10. Independence of events: A and B are independent when knowing that B happens
does not change the probability of A. Events A and B are independent if and only if
P(A and B) = P(A)*P(B). Two other equivalent definitions: P(A) = P(A|B); P(B) = P(B|A).
To show A and B are independent, show that one of these three equalities holds. To show
events are not independent, show that one of these equalities fails. Do not confuse
independence with being mutually exclusive! Mutually exclusive events (with nonzero
probabilities) cannot be independent.
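Independence can also be checked by counting. In the two-dice sample space an event about the first die is independent of an event about the second die, while two events that both involve the same dice usually are not:

```python
from fractions import Fraction
from itertools import product

space = set(product(range(1, 7), repeat=2))

def P(event):
    return Fraction(len(event), len(space))

A = {o for o in space if o[0] % 2 == 0}      # first die is even
B = {o for o in space if o[1] == 3}          # second die shows 3
C = {o for o in space if o[0] + o[1] == 4}   # sum is 4

print(P(A & B) == P(A) * P(B))   # True: A and B are independent
print(P(A & C) == P(A) * P(C))   # False: A and C are not independent
```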
11. Bayes Rule: used when you need to flip the conditioning, i.e., find P(B|A) when you
know P(A|B). Formula:
P(B|A) = P(A|B) P(B) / [P(A|B) P(B) + P(A|not B) P(not B)]
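A numeric sketch of Bayes rule with made-up numbers: suppose 1% of parts are defective (event B), a test flags 95% of defective parts, and falsely flags 10% of good parts. Given a flag (event A), how likely is a defect?

```python
p_B = 0.01             # P(B): prior probability of a defect (assumed)
p_A_given_B = 0.95     # P(A|B): test flags a defective part (assumed)
p_A_given_notB = 0.10  # P(A|not B): false-alarm rate on good parts (assumed)

# Denominator: the total probability rule applied to P(A)
p_A = p_A_given_B * p_B + p_A_given_notB * (1 - p_B)

# Bayes rule flips the conditioning from P(A|B) to P(B|A)
p_B_given_A = p_A_given_B * p_B / p_A
print(round(p_B_given_A, 4))   # 0.0876 -- most flags are still false alarms
```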
III.
Random variables and distributions
1. Generalities
a. A random variable assigns a numerical value to each possible outcome of a
statistical experiment. It must be subject to uncertainty and it must be numerical.
It can be discrete or continuous.
b. Distribution of a discrete random variable: set of possible values xi and their
corresponding probabilities pi.
c. To find the probability of an event on discrete random variable X, add up the
probabilities of all the outcomes corresponding to the event.
d. Expected value of X, E[X]: weighted average of the possible values: E[X] = ∑i pi xi.
Excel: =SUMPRODUCT(…, …)
e. Variance of X, Var(X): weighted average of the squared deviations from the mean:
Var(X) = ∑i pi (xi – E[X])²
f. Standard deviation of X: square root of the variance.
g. Coefficient of variation of X: ratio of standard deviation to expected value.
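A sketch of these four quantities for a made-up discrete distribution (in Excel, E[X] is the SUMPRODUCT of the value and probability columns):

```python
xs = [0, 1, 2, 3]           # possible values (hypothetical)
ps = [0.1, 0.3, 0.4, 0.2]   # their probabilities; must sum to 1

ev = sum(p * x for p, x in zip(ps, xs))                # E[X] = 1.7
var = sum(p * (x - ev) ** 2 for p, x in zip(ps, xs))   # Var(X) = 0.81
sd = var ** 0.5                                        # SD(X) = 0.9
cv = sd / ev                                           # coefficient of variation
```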
2. Some special distributions
a. Discrete uniform distribution: n equally likely outcomes. Each one has probability
1/n.
b. Binomial distribution:
• n independent trials, each one results in either success or failure (define what
is a trial, define what is a success)
• Each trial has the same chance of success, p
• Random variable X counts how many successes there are out of the n trials.
(Make sure you define what X is.)
• Then X has a binomial distribution with parameters (n,p).
• P(X = x) = [n!/(x!(n – x)!)] p^x (1 – p)^(n–x) for x = 0, …, n
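The formula can be evaluated directly in Python, with `math.comb` supplying the binomial coefficient; the n and p below are arbitrary:

```python
from math import comb

n, p = 10, 0.3   # hypothetical parameters

def binom_pmf(x):
    # P(X = x) = [n!/(x!(n-x)!)] p^x (1-p)^(n-x)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_cdf(x):
    # P(X <= x): add up P(X = 0), ..., P(X = x), as BINOM.DIST(..., TRUE) does
    return sum(binom_pmf(k) for k in range(x + 1))

# The probabilities over all x = 0, ..., n add up to 1
total = sum(binom_pmf(x) for x in range(n + 1))
```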
• E[X] = np, Var(X) = np(1 – p)
• P(X = x): Excel: =BINOM.DIST(x, n, p, FALSE) = BINOM.DIST(x, n, p, 0)
• P(X ≤ x): Excel: =BINOM.DIST(x, n, p, TRUE) = BINOM.DIST(x, n, p, 1)
IV.
Combinations of random variables
1. Definition: Z = a X + b Y, where a and b are constants, X and Y are random variables
2. Expected value: E[a X + b Y] = a E[X] + b E[Y]
3. Joint distribution of X and Y: gives the probabilities that simultaneously X = x and Y =
y, for all possible values x of X and y of Y. Marginal distribution of X and of Y can be
found by adding up the joint probabilities over the rows and columns of the joint
distribution table.
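A sketch with a hypothetical 2×2 joint distribution table, stored as a dict from (x, y) pairs to probabilities; summing over one variable gives the marginal distribution of the other:

```python
# Hypothetical joint distribution P(X = x and Y = y)
joint = {
    (0, 0): 0.2, (0, 1): 0.1,
    (1, 0): 0.3, (1, 1): 0.4,
}

px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0) + p   # marginal of X: sum across the y's
    py[y] = py.get(y, 0) + p   # marginal of Y: sum across the x's

# Here P(X = 1) = 0.7 and P(Y = 1) = 0.5, but
# P(X = 1 and Y = 1) = 0.4 != 0.7 * 0.5, so X and Y are not independent.
```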
4. X and Y are independent random variables when P(X = x and Y = y) = P(X=x) * P(Y=y)
for all possible values x of X and y of Y.
5. Covariance of X and Y: weighted average of the (x – E[X])(y – E[Y])’s over all pairs
(x, y), using the joint probabilities as weights.
6. Correlation of X and Y = covariance/ (SD(X) . SD(Y)). Standardizes the covariance to a
value from –1 to 1.
7. Finding the distribution of Z: find all the possible values of Z, and, for each of these
values, the corresponding probability, by adding up the joint probabilities of X and Y
that lead to this value of Z.
8. Variance of a linear combination: Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X,Y)
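Items 5–8 can all be checked on one small example: compute Cov(X, Y) from a hypothetical joint table, then verify the variance formula against the distribution of Z = aX + bY built directly (a = 2 and b = 3 are arbitrary):

```python
# Hypothetical joint distribution P(X = x and Y = y)
joint = {
    (0, 0): 0.2, (0, 1): 0.1,
    (1, 0): 0.3, (1, 1): 0.4,
}
a, b = 2, 3   # arbitrary constants for Z = aX + bY

ex = sum(p * x for (x, y), p in joint.items())               # E[X]
ey = sum(p * y for (x, y), p in joint.items())               # E[Y]
vx = sum(p * (x - ex) ** 2 for (x, y), p in joint.items())   # Var(X)
vy = sum(p * (y - ey) ** 2 for (x, y), p in joint.items())   # Var(Y)
cov = sum(p * (x - ex) * (y - ey) for (x, y), p in joint.items())

# Item 8: Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)
var_z = a ** 2 * vx + b ** 2 * vy + 2 * a * b * cov

# Item 7: distribution of Z, adding joint probabilities per value of Z
dist_z = {}
for (x, y), p in joint.items():
    z = a * x + b * y
    dist_z[z] = dist_z.get(z, 0) + p
ez = sum(p * z for z, p in dist_z.items())
var_z_direct = sum(p * (z - ez) ** 2 for z, p in dist_z.items())
# var_z and var_z_direct agree (both 3.69 here, up to float rounding)
```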