A quick reference for symbols and formulas covered in COGS14:
MEAN OF SAMPLE:
$$\bar{x} = \frac{\sum x_i}{n}$$
• $\bar{x}$: “X bar” Mean (i.e. average) of a sample
• $\sum$: “Capital Sigma” Sum of everything that comes after it
• $x_i$: “X sub i” This stands for each individual value you have in your sample.
For example, when you’re finding the mean of values 3, 4, and 5, you substitute 3
into the $x_i$ spot, then 4, then 5, and then add these together
• n: the number of observations in your sample; for the above example of finding
the mean of 3, 4, 5, n = 3 observations
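As a quick check of the sample mean formula, here is a minimal Python sketch using the example values 3, 4, and 5. The function name `sample_mean` is just an illustrative choice, not something from the course materials.

```python
def sample_mean(values):
    """Mean of a sample: add up every x_i, then divide by n."""
    n = len(values)          # n: number of observations in the sample
    return sum(values) / n   # sum of all x_i, divided by n

print(sample_mean([3, 4, 5]))  # 4.0
```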
MEAN OF POPULATION:
$$\mu = \frac{\sum x_i}{N}$$
• $\mu$: “mu” Mean of a population
• Notice that this equation is very similar to the one for the mean of a sample; the
only difference is that you know you have observed the ENTIRE population
(this is rare in real life)
ESTIMATED POPULATION VARIANCE/VARIANCE OF A SAMPLE:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
• $s^2$: “S squared” the term for the variance of a sample, also known as the
estimated variance of a population
• $\sum$: “Capital Sigma” Sum of everything that comes after it
• $x_i$: “X sub i” This stands for each individual value you have in your sample.
For example, when you’re finding the variance within the sample of values 3, 4,
and 5, you substitute 3 into the $x_i$ spot, subtract the mean from 3, and then square
this value. Repeat this step for as many values of $x_i$ as you have, then add those
results together.
• n-1: Number of observations in your sample minus 1; for the example of
observations equaling 3, 4, and 5, n=3, so n-1=2.
ESTIMATED POPULATION STANDARD DEVIATION/STANDARD
DEVIATION OF A SAMPLE:
$$s = \sqrt{s^2}$$
• s: standard deviation of a sample, also known as estimated standard deviation
for the population
• See above for how to calculate $s^2$, then take the square root of your answer to find
standard deviation
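The steps for the sample variance and sample standard deviation can be sketched in Python as follows. This is only an illustration using the example values 3, 4, and 5; the function names are made up for this sketch.

```python
import math

def sample_variance(values):
    """Estimated population variance: squared deviations summed, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n                                   # sample mean
    squared_deviations = [(x - x_bar) ** 2 for x in values]   # (x_i - x_bar)^2 for each x_i
    return sum(squared_deviations) / (n - 1)

def sample_std(values):
    """Estimated population standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))

print(sample_variance([3, 4, 5]))  # 1.0
print(sample_std([3, 4, 5]))       # 1.0
```

For 3, 4, and 5 the deviations from the mean are -1, 0, and 1, so the squared deviations add up to 2, and dividing by n-1 = 2 gives 1.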
POPULATION VARIANCE:
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$
• Note that this equation is very similar to the equation for estimated population
variance above. The difference is that you divide by “N” in the denominator
to find population variance, which is equal to the total number of members of
your population, whereas you divide by n-1 to find the ESTIMATED
population variance
• $\sigma^2$: “sigma squared” the term used for population variance
• $\sum$: “Capital Sigma” Sum of everything that comes after it
• $x_i$: “X sub i” This stands for each individual value in your population.
For example, when you’re finding the variance of a population made up of the values 3, 4,
and 5, you substitute 3 into the $x_i$ spot, subtract the population mean $\mu$ from 3, and
then square this value. Repeat this step for as many values of $x_i$ as you have, then add
those results together.
• N: Number of members of your population/observations
**This equation will only be used when you can observe the ENTIRE population,
which is commonly not feasible in real life. But you should understand how to find
population variance, and how it is related to and different from ESTIMATED population
variance
POPULATION STANDARD DEVIATION:
$$\sigma = \sqrt{\sigma^2}$$
• $\sigma$: “sigma” the term for population standard deviation
• See above for how to calculate sigma squared, then take the square root of
your answer to find sigma
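To see how dividing by N instead of n-1 changes the answer, here is a small Python sketch. It assumes, purely for illustration, that the values 3, 4, and 5 make up the ENTIRE population; the function names are hypothetical.

```python
import math

def population_variance(values):
    """Population variance: squared deviations from mu, summed, divided by N."""
    N = len(values)                # N: every member of the population
    mu = sum(values) / N           # population mean
    return sum((x - mu) ** 2 for x in values) / N

def population_std(values):
    """Population standard deviation: square root of the population variance."""
    return math.sqrt(population_variance(values))

print(population_variance([3, 4, 5]))  # about 0.667 (2 divided by N = 3)
print(population_std([3, 4, 5]))       # about 0.816
```

The same three values give a variance of 1 when treated as a sample (dividing by n-1 = 2), so the estimated population variance is the larger of the two.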
ENTROPY:
$$H = -\sum f(x_i) \log_2(f(x_i))$$
• H: the symbol to denote entropy
• $\sum$: “Capital Sigma” Sum of everything that comes after it
• $f(x_i)$: Relative frequency of something occurring. For example, you flip a coin
10 times, and 4 times it comes up heads. The relative frequency of heads = 0.4
• For each outcome, figure out the relative frequency, then find $\log_2$ of that
frequency, and then multiply that value times the relative frequency itself. Once
you have done this for each outcome you had, add all your answers together and
take the negative of it to find entropy.
MAXIMUM POSSIBLE ENTROPY:
$$H_{max} = -\log_2(1/k) = \log_2(k)$$
• k: the number of possible outcomes. For example, with a coin toss, there are 2
possible outcomes. With a die roll, there are 6.
RELATIVE ENTROPY:
$$J = \frac{H}{H_{max}}$$
• A value close to 1 indicates maximum possible entropy. A value close to 0
indicates minimum possible entropy.
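The three entropy formulas (H, H max, and J) can be sketched together in Python. This is a minimal illustration using the coin example of 4 heads in 10 flips; the function names are made up, and skipping outcomes with a relative frequency of 0 is an assumption of this sketch (log2 of 0 is undefined), not something stated in the handout.

```python
import math

def entropy(rel_freqs):
    """H: negative sum of f(x_i) * log2(f(x_i)) over all outcomes."""
    # Assumption: outcomes that never occurred (f = 0) contribute nothing.
    return -sum(f * math.log2(f) for f in rel_freqs if f > 0)

def max_entropy(k):
    """H max for k possible outcomes."""
    return math.log2(k)

def relative_entropy(rel_freqs, k):
    """J: observed entropy divided by the maximum possible entropy."""
    return entropy(rel_freqs) / max_entropy(k)

coin = [0.4, 0.6]                    # 4 heads and 6 tails out of 10 flips
print(entropy(coin))                 # about 0.971
print(max_entropy(2))                # 1.0
print(relative_entropy(coin, 2))     # about 0.971, close to 1
```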
EXPECTED VALUE OF A RANDOM VARIABLE:
$$E(X) = \sum P(X = x_i)\, x_i$$
• E(X): Notation for “Expected Value”
• $\sum$: “Capital Sigma” Sum of everything that comes after it
• P: Probability
• $x_i$: “X sub i” This again stands for each possible observed value. For
example, you are trying to find the expected value for a die that has 5 sides
showing “1” and 1 side showing “0”; then 1 and 0 are the values you plug in for
$x_i$. You would first figure out the probability of rolling a 1 (P=5/6) and then
multiply that P times the actual value of 1. Then repeat with the probability of
rolling a 0 (P=1/6) times the value of 0, add these results together, and find your
expected value (E(X)) = 5/6
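The die example above can be checked with a short Python sketch. Representing the distribution as a list of (value, probability) pairs is an assumption of this sketch, and the function name is hypothetical; exact fractions are used so the result prints as 5/6.

```python
from fractions import Fraction

def expected_value(distribution):
    """E(X): each possible value x_i multiplied by its probability, summed."""
    return sum(p * x for x, p in distribution)

# Die with 5 sides showing 1 and 1 side showing 0
die = [(1, Fraction(5, 6)), (0, Fraction(1, 6))]
print(expected_value(die))  # 5/6
```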
VARIANCE OF A RANDOM VARIABLE:
$$Var(X) = \sum P(X = x_i)(x_i - E(X))^2$$
• Var(X): Notation for “Variance of a random variable”
• $\sum$: “Capital Sigma” Sum of everything that comes after it
• E(X): see above, this means “expected value of a random variable.” So to
find the variance of a random variable, you will first need to find the expected
value.
• $x_i$: “X sub i” This stands for each possible observed value. To find the
variance, plug in each possible value for $x_i$ and then subtract the expected value
from this observed value, and square this answer. Then multiply this answer by
the probability of getting that observed value.
For example, assume we roll a fair die and want to know what the variance of the
random variable will be. We find that the Expected Value = 3.5. For each
possible value of the die (1, 2, 3, 4, 5, 6) we will plug each value in for $x_i$,
subtract the expected value of 3.5, square the answer, and then multiply it by the
probability of rolling that value (in this case each number has a 1/6 chance of
being rolled). Calculate this for all 6 numbers, and sum those components
together to find the variance.
STANDARD DEVIATION OF A RANDOM VARIABLE:
$$Std(X) = \sqrt{Var(X)}$$
• Std(X): Standard Deviation of a Random Variable
• Once you compute variance as in the above example, take the square root of it to
get the standard deviation of a random variable
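The fair die example can be worked through in Python as a sketch. As an assumption of this illustration, the distribution is written as (value, probability) pairs with exact fractions; the function names are made up.

```python
import math
from fractions import Fraction

def expected_value(distribution):
    """E(X): each possible value times its probability, summed."""
    return sum(p * x for x, p in distribution)

def variance(distribution):
    """Var(X): probability-weighted squared distance of each value from E(X)."""
    mean = expected_value(distribution)
    return sum(p * (x - mean) ** 2 for x, p in distribution)

fair_die = [(face, Fraction(1, 6)) for face in range(1, 7)]
var = variance(fair_die)

print(expected_value(fair_die))  # 7/2, i.e. 3.5
print(var)                       # 35/12, about 2.917
print(math.sqrt(var))            # Std(X), about 1.708
```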
BINOMIAL DISTRIBUTION:
$$P(k \mid n, p) = \binom{n}{k} p^k (1 - p)^{n-k}$$
EXPANDED TO:
$$P(k \mid n, p) = \frac{n!}{k!(n - k)!} p^k (1 - p)^{n-k}$$
• k: The number of “successful” outcomes. You define what you think a success
is; it could be something like getting heads on a coin flip.
• n: The number of trials. When you are doing a binomial equation, this might be
listed as the number of times you flip the coin, reach into a bag, etc.
• p: The probability of getting a successful outcome. If you are flipping a coin and
have defined success as getting heads, then p = the probability of getting a head
when you flip the coin.
• $P(k \mid n, p)$: “The probability of getting “k” successes, given “n” number of
trials, and “p” probability of success”
• $\binom{n}{k}$: “n choose k” The number of different ways of getting “k” successes out of “n”
number of trials (see below to calculate)
• $\frac{n!}{k!(n - k)!}$: the expansion of “n choose k”. n! means “n factorial”, which
means you take “n” and multiply it by all positive whole numbers smaller than “n”. For
example, to find 4!, you multiply 4x3x2x1
• The rest of the equation is just plugging in values to figure out the correct
probability of getting “k” number of successes across “n” number of trials, given
that you have a “p” probability of getting a success on any given trial
**Define n, k, and p before you start the problem. It might help to write them next to the
binomial equation and then just go back and plug them in where needed.
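As an illustration of plugging values into the expanded binomial equation, here is a minimal Python sketch. The function name and the example (exactly 2 heads in 4 flips of a fair coin) are hypothetical choices for this sketch.

```python
import math

def binomial_probability(k, n, p):
    """P(k | n, p): probability of exactly k successes in n trials."""
    # "n choose k" written out as n! / (k! (n - k)!)
    n_choose_k = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))
    return n_choose_k * p ** k * (1 - p) ** (n - k)

# k = 2 heads, n = 4 flips, p = 0.5 chance of heads on each flip
print(binomial_probability(2, 4, 0.5))  # 0.375
```

Here “4 choose 2” is 6, and each specific sequence of 2 heads and 2 tails has probability 0.0625, so the answer is 6 x 0.0625 = 0.375. In Python 3.8 and later, `math.comb(n, k)` computes “n choose k” directly.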
THE SAMPLING DISTRIBUTION OF THE MEAN:
$$\mu_{\bar{x}} = E(\bar{X}) = E(X) = \mu_x$$
• The sampling distribution of the mean is a crucial concept for statistics. In
short, the sampling distribution of the mean is a
hypothetical distribution that represents what you would get if you took infinitely many
samples of size “n”, took the mean of each of those samples, and then graphed
those means. Some things we know about the sampling distribution of the mean
are:
o For a large enough n (25-100), the sampling distribution of the mean will
be approximately normally distributed
o The mean of the sampling distribution of the mean = mean of the
population
• $\mu_{\bar{x}}$: Mean of the sampling distribution of the mean
• $\mu_x$: Mean of the population
• $E(\bar{X})$: Expected value of the sampling distribution of the mean
• $E(X)$: Expected value of the population
BUT:
$$\sigma_{\bar{x}} = \sqrt{\frac{Var(X)}{n}} = \frac{\sigma_x}{\sqrt{n}}$$
• $\sigma_{\bar{x}}$: Standard deviation of the sampling distribution of the mean
• $Var(X)$: Variance of the population
• n: Number of observations
• $\sigma_x$: Standard deviation of the population
SO, we know that the standard deviation of the sampling distribution of the mean
will always be smaller than the standard deviation of the population by a specific
amount (i.e. population standard deviation divided by the square root of the
number of observations in a sample)
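A short Python sketch shows how much smaller the standard deviation of the sampling distribution of the mean becomes. The numbers (a population standard deviation of 10 and samples of 25 observations) are hypothetical, as is the function name.

```python
import math

def standard_error(population_std, n):
    """Standard deviation of the sampling distribution of the mean."""
    return population_std / math.sqrt(n)

# Hypothetical population with sigma = 10 (so Var(X) = 100), samples of n = 25
print(standard_error(10, 25))   # 2.0
print(math.sqrt(100 / 25))      # 2.0, the same value via sqrt(Var(X) / n)
```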
COHEN’S D:
$$d = \frac{\bar{x} - \mu}{\sigma}$$
• $\bar{x}$: mean of your sample
• $\mu$: mean of the null hypothesis
• $\sigma$: Standard deviation of the null hypothesis
• Cohen’s d is a measure of effect size, or how large of an effect your sample had in
comparison to the null hypothesis
• d = 0.20 (small effect), d = 0.50 (medium effect), d = 0.80 (large effect)
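As a quick illustration of Cohen's d, here is a Python sketch with hypothetical numbers (a sample mean of 105 against a null hypothesis mean of 100 and standard deviation of 10). The function name is made up for this example.

```python
def cohens_d(sample_mean, null_mean, null_std):
    """Effect size: distance between the means in standard deviation units."""
    return (sample_mean - null_mean) / null_std

print(cohens_d(105, 100, 10))  # 0.5, a medium effect
```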
OBSERVED Z-SCORE:
$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$$
EXPANDED TO:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
• $\bar{x}$: mean of your sample
• $\mu$: mean of the null hypothesis
• $\sigma_{\bar{x}}$: standard error of the mean (also known as the standard deviation of the
population divided by the square root of the number of observations)
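The expanded z-score formula can be sketched in Python like this. The numbers (sample mean 105, null mean 100, population standard deviation 10, n = 25) are hypothetical, and the function name is an illustrative choice.

```python
import math

def observed_z(sample_mean, null_mean, population_std, n):
    """Observed z: distance of the sample mean from mu, in standard errors."""
    standard_error = population_std / math.sqrt(n)   # sigma divided by sqrt(n)
    return (sample_mean - null_mean) / standard_error

print(observed_z(105, 100, 10, 25))  # 2.5
```

The standard error here is 10 / 5 = 2, so a 5-point difference between the means is 2.5 standard errors.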
CONFIDENCE INTERVALS (FOR A Z-TEST):
$$\bar{x} \pm (z_{conf})\, \sigma_{\bar{x}}$$
• $\bar{x}$: observed mean of your sample
• $(z_{conf})$: the critical z-scores for your level of confidence. For purposes of this
class, think of these like when you are finding critical z-scores for two-tailed
z-tests. If you have a 95% confidence interval, you will have the same “z conf” as
you would have for a 2-tailed z-test with an alpha level of 0.05.
To find your “z conf”, subtract your level of confidence from 100% (i.e. 100% - 95%
confidence = 5%). Divide this 5% by 2 to get 2.5%, or 0.025; find 0.025 in the “C”
column of the z-table, then find the corresponding z-score in the “A” column.
• $\sigma_{\bar{x}}$: standard error of the mean (also known as the standard deviation of the
population divided by the square root of the number of observations)
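A confidence interval for a z-test can be sketched in Python as below. Instead of the z-table lookup described above, this sketch uses `NormalDist().inv_cdf` from Python's standard `statistics` module (Python 3.8+) to get “z conf”; that substitution, the function names, and the example numbers are all assumptions of this illustration.

```python
import math
from statistics import NormalDist

def z_conf(confidence):
    """Critical z for a two-tailed interval at the given confidence level."""
    alpha = 1 - confidence                       # e.g. 1 - 0.95 = 0.05
    return NormalDist().inv_cdf(1 - alpha / 2)   # leaves alpha / 2 in each tail

def z_confidence_interval(sample_mean, population_std, n, confidence=0.95):
    """Sample mean plus or minus z_conf times the standard error of the mean."""
    margin = z_conf(confidence) * population_std / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

print(z_conf(0.95))                         # about 1.96
print(z_confidence_interval(105, 10, 25))   # roughly (101.08, 108.92)
```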
ONE SAMPLE T-TEST (3 related formulas):
1) $$t = \frac{\bar{x} - \mu}{s_{\bar{x}}}$$
2) $$s_{\bar{x}} = \hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{\hat{\sigma}}{\sqrt{n}}$$
3) $$s = \hat{\sigma} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
• $\bar{x}$: your sample mean
• x: Each individual observation in your sample
• n-1: the number of observations in your sample, minus 1
• $\mu$: the population mean (usually what you are comparing your sample mean to,
to see if there is a difference)
• $s_{\bar{x}}$ / $\hat{\sigma}_{\bar{x}}$: The estimated standard error of the mean. Note this is also represented
as the Greek letter sigma $\sigma$, with a “hat”, so we can call it “sigma hat”; this
indicates it’s an estimate
• $s = \hat{\sigma}$: The estimated standard deviation of the population.
• $\sum$: “Capital Sigma” Sum of everything that comes after it
• To estimate the population standard deviation, we need to find s or $\hat{\sigma}$, which we
find in a similar way to how we always calculate standard deviation. Take each
individual score (x) and subtract the mean ($\bar{x}$). Square that value. Repeat for
each individual score and then add up what you get. Then divide that value by the
number of observations minus 1 (n-1), and finally take the square root of your
answer to find “s” or “$\hat{\sigma}$”
**Note that you will use “n” at least 2 times in the t-score formula: once to find the
estimated standard deviation (formula #3 above) and again when finding the
estimated standard error of the mean (formula #2). You will also need to know n to
find your critical t-score on your t-score chart. Your degrees of freedom (df) is equal
to the number of observations minus 1 for a one-sample t-test (so df=(n-1) for this
test)
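The three t-test formulas chain together, which a short Python sketch can make concrete. The sample values (2, 4, 6, 8, 10) and the comparison mean of 4 are hypothetical, and the function name is made up for this illustration.

```python
import math

def one_sample_t(values, null_mean):
    """Return (t, df) for a one-sample t-test."""
    n = len(values)
    x_bar = sum(values) / n
    # Formula 3: estimated population standard deviation (divide by n - 1)
    s = math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))
    # Formula 2: estimated standard error of the mean
    s_x_bar = s / math.sqrt(n)
    # Formula 1: the t-score itself
    t = (x_bar - null_mean) / s_x_bar
    return t, n - 1   # df = n - 1 for a one-sample t-test

t, df = one_sample_t([2, 4, 6, 8, 10], null_mean=4)
print(t, df)  # t is about 1.414, df is 4
```

Notice that n appears twice inside the function (once as n-1 in formula 3, once under the square root in formula 2), and a third time in the degrees of freedom.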
CONFIDENCE INTERVAL FOR A ONE-SAMPLE T-TEST:
$$\bar{x} \pm t_{conf}(s_{\bar{x}})$$
• $\bar{x}$: Observed mean of your sample
• $t_{conf}$: the critical t-scores for your level of confidence. For purposes of this
class, think of these like when you are finding critical t-scores for two-tailed
t-tests. If you have a 95% confidence interval, you will have the same “t conf”
as you would have for a 2-tailed t-test with an alpha level of 0.05.
To find your “t conf”, subtract your level of confidence from 100% (i.e. 100% - 95%
confidence = 5%, or 0.05). Go to the 2-tailed test side of the t-test table, find the
column for 0.05 and go down to your df to find the correct “t conf”
• $s_{\bar{x}}$: Estimated standard error of the mean (see above to calculate)
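Here is a minimal Python sketch of the t-based confidence interval. The sample values are hypothetical, and “t conf” is passed in by hand as if it had been looked up in a t-table: 2.776 is the two-tailed 0.05 critical value for df = 4. The function name is an illustrative choice.

```python
import math

def t_confidence_interval(values, t_conf):
    """Sample mean plus or minus t_conf times the estimated standard error."""
    n = len(values)
    x_bar = sum(values) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))  # estimated std
    s_x_bar = s / math.sqrt(n)                                      # estimated standard error
    margin = t_conf * s_x_bar
    return x_bar - margin, x_bar + margin

# n = 5 observations, so df = 4; t_conf = 2.776 comes from the table lookup
print(t_confidence_interval([2, 4, 6, 8, 10], t_conf=2.776))  # roughly (2.07, 9.93)
```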