
A quick reference for symbols and formulas covered in COGS14:

MEAN OF SAMPLE:
$$\bar{x} = \frac{\sum x_i}{n}$$
• $\bar{x}$: "X bar". Mean (i.e. average) of a sample.
• $\sum$: "Capital Sigma". Sum of everything that comes after it.
• $x_i$: "X sub i". This stands for each individual value you have in your sample. For example, when you're finding the mean of values 3, 4, and 5, you substitute 3 into the $x_i$ spot, then 4, then 5, and then add these together.
• $n$: The number of observations in your sample; for the above example of finding the mean of 3, 4, 5, n = 3 observations.

MEAN OF POPULATION:
$$\mu = \frac{\sum x_i}{N}$$
• $\mu$: "mu". Mean of a population.
• Notice that this equation is very similar to the one for the mean of a sample. The only difference is that you know you have observed the ENTIRE population (this is rare in real life), so you divide by $N$, the total number of members of the population.

ESTIMATED POPULATION VARIANCE/VARIANCE OF A SAMPLE:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
• $s^2$: "S squared". The term for the variance of a sample, also known as the estimated variance of a population.
• $\sum$: "Capital Sigma". Sum of everything that comes after it.
• $x_i$: "X sub i". This stands for each individual value you have in your sample. For example, when you're finding the variance within the sample of values 3, 4, and 5, you substitute 3 into the $x_i$ spot, subtract the mean from 3, and then square this value. Repeat this step for as many values of $x_i$ as you have, then add those results together.
• $n - 1$: Number of observations in your sample minus 1; for the example of observations equaling 3, 4, and 5, n = 3, so n - 1 = 2.
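The sample mean and sample variance formulas above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `sample_mean` and `sample_variance` are made up for this example, and the data are the values 3, 4, and 5 used in the text.

```python
def sample_mean(values):
    """Sample mean: sum of the observations divided by n."""
    return sum(values) / len(values)


def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sample_mean(values)
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


sample = [3, 4, 5]                    # example values from the text
x_bar = sample_mean(sample)           # (3 + 4 + 5) / 3
s_squared = sample_variance(sample)   # ((-1)^2 + 0^2 + 1^2) / 2

print("mean:", x_bar)
print("sample variance:", s_squared)
```

For these three values the mean is 4 and the sample variance is 1, because the squared deviations (1, 0, 1) sum to 2 and are divided by n - 1 = 2.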

ESTIMATED POPULATION STANDARD DEVIATION/STANDARD DEVIATION OF A SAMPLE:
$$s = \sqrt{s^2}$$
• $s$: Standard deviation of a sample, also known as the estimated standard deviation of the population.
• See above for how to calculate $s^2$, then take the square root of your answer to find the standard deviation.

POPULATION VARIANCE:
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$
• Note that this equation is very similar to the equation for estimated population variance above. The difference is that you divide by $N$, the total number of members of your population, to find the population variance, whereas you divide by n - 1 to find the ESTIMATED population variance.
• $\sigma^2$: "sigma squared". The term used for population variance.
• $\sum$: "Capital Sigma". Sum of everything that comes after it.
• $x_i$: "X sub i". This stands for each individual value you have observed. For example, when you're finding the variance of the values 3, 4, and 5, you substitute 3 into the $x_i$ spot, subtract the population mean $\mu$ from 3, and then square this value. Repeat this step for as many values of $x_i$ as you have, then add those results together.
• $N$: Number of members of your population/observations.
**This equation will only be used when you can observe the ENTIRE population, which is commonly not feasible in real life. But you should understand how to find population variance, and how it is related to and different from ESTIMATED population variance.

POPULATION STANDARD DEVIATION:
$$\sigma = \sqrt{\sigma^2}$$
• $\sigma$: "sigma". The term for population standard deviation.
• See above for how to calculate sigma squared, then take the square root of your answer to find sigma.

ENTROPY:
$$H = -\sum f(x_i) \log_2 f(x_i)$$
• $H$: The symbol to denote entropy.
• $\sum$: "Capital Sigma". Sum of everything that comes after it.
• $f(x_i)$: Relative frequency of something occurring. For example, you flip a coin 10 times, and 4 times it comes up heads.
The relative frequency of heads = 4/10 = 0.4.
For each outcome, figure out the relative frequency, then find $\log_2$ of that frequency, and then multiply that value by the relative frequency itself. Once you have done this for each outcome, add all your answers together and take the negative of the sum to find the entropy.

MAXIMUM POSSIBLE ENTROPY:
$$H_{max} = -\log_2(1/k) = \log_2(k)$$
• $k$: The number of possible outcomes. For example, with a coin toss, there are 2 possible outcomes. With a die roll, there are 6.

RELATIVE ENTROPY:
$$J = \frac{H}{H_{max}}$$
• A value close to 1 indicates maximum possible entropy. A value close to 0 indicates minimum possible entropy.

EXPECTED VALUE OF A RANDOM VARIABLE:
$$E(X) = \sum P(X = x_i)\, x_i$$
• $E(X)$: Notation for "Expected Value".
• $\sum$: "Capital Sigma". Sum of everything that comes after it.
• $P$: Probability.
• $x_i$: "X sub i". This again stands for each possible observed value. For example, you are trying to find the expected value for a die that has 5 sides showing "1" and 1 side showing "0"; then 1 and 0 are the values you plug in for $x_i$. You would first figure out the probability of rolling a 1 (P = 5/6) and then multiply that P by the actual value of 1. Then repeat with the probability of rolling a 0 (P = 1/6) times the value of 0, and add these results together to find your expected value: E(X) = (5/6)(1) + (1/6)(0) = 5/6.

VARIANCE OF A RANDOM VARIABLE:
$$Var(X) = \sum P(X = x_i)\,(x_i - E(X))^2$$
• $Var(X)$: Notation for "Variance of a random variable".
• $\sum$: "Capital Sigma". Sum of everything that comes after it.
• $E(X)$: See above, this means "expected value of a random variable." So to find the variance of a random variable, you will first need to find the expected value.
• $x_i$: "X sub i". This stands for each possible observed value. To find the variance, plug in each possible value for $x_i$, subtract the expected value from this observed value, and square this answer. Then multiply this answer by the probability of getting that observed value.
For example, assume we roll a fair die and want to know what the variance of the random variable will be. We find that the Expected Value = 3.5. For each possible value of the die (1, 2, 3, 4, 5, 6) we will plug that value in for $x_i$, subtract the expected value of 3.5, square the answer, and then multiply it by the probability of rolling that value (in this case each number has a 1/6 chance of being rolled). Calculate this for all 6 numbers, and sum those components together to find the variance.

STANDARD DEVIATION OF A RANDOM VARIABLE:
$$Std(X) = \sqrt{Var(X)}$$
• $Std(X)$: Standard deviation of a random variable.
• Once you compute the variance as in the above example, take the square root of it to get the standard deviation of a random variable.

BINOMIAL DISTRIBUTION:
$$P(k \mid n, p) = \binom{n}{k} p^k (1 - p)^{n-k}$$
EXPANDED TO:
$$P(k \mid n, p) = \frac{n!}{k!\,(n - k)!}\, p^k (1 - p)^{n-k}$$
• $k$: The number of "successful" outcomes. You define what you think a success is; it could be something like getting heads on a coin flip.
• $n$: The number of trials. When you are doing a binomial equation, this might be listed as the number of times you flip the coin, reach into a bag, etc.
• $p$: The probability of getting a successful outcome. If you are flipping a coin and have defined success as getting heads, then p = the probability of getting a head when you flip the coin.
• $P(k \mid n, p)$: "The probability of getting k successes, given n number of trials and p probability of success".
• $\binom{n}{k}$: "n choose k". The term for getting k successes out of n trials (see below to calculate).
• $\frac{n!}{k!\,(n - k)!}$: The expansion of "n choose k". n! means "n factorial", which means you take n and multiply it by all positive whole numbers smaller than n.
For example, to find 4!, you multiply 4 × 3 × 2 × 1 = 24.
• The rest of the equation is just plugging in values to figure out the correct probability of getting k successes across n trials, given that you have a probability p of success on any given trial.
**Define n, k, and p before you start the problem. It might help to write them next to the binomial equation and then just go back and plug them in where needed.

THE SAMPLING DISTRIBUTION OF THE MEAN:
$$\mu_{\bar{x}} = E(\bar{X}) = E(X) = \mu_x$$
• The sampling distribution of the mean is a very helpful and crucial concept for statistics. In short, it is a hypothetical distribution that represents what you would get if you took infinite samples of size n, took the mean of each of those samples, and then graphed those means. Some things we know about the sampling distribution of the mean are:
o For a large enough n (25-100), the sampling distribution of the mean will be normally distributed.
o The mean of the sampling distribution of the mean = the mean of the population.
• $\mu_{\bar{x}}$: Mean of the sampling distribution of the mean.
• $\mu_x$: Mean of the population.
• $E(\bar{X})$: Expected value of the sampling distribution of the mean.
• $E(X)$: Expected value of the population.
• BUT:
$$\sigma_{\bar{x}} = \sqrt{\frac{Var(X)}{n}} = \frac{\sigma_x}{\sqrt{n}}$$
• $\sigma_{\bar{x}}$: Standard deviation of the sampling distribution of the mean.
• $Var(X)$: Variance of the population.
• $n$: Number of observations.
• $\sigma_x$: Standard deviation of the population.
• SO, we know that the standard deviation of the sampling distribution of the mean will always be smaller than the standard deviation of the population by a specific amount (i.e.
population standard deviation divided by the square root of the number of observations in a sample).

COHEN'S D:
$$d = \frac{\bar{x} - \mu}{\sigma}$$
• $\bar{x}$: Mean of your sample.
• $\mu$: Mean of the null hypothesis.
• $\sigma$: Standard deviation of the null hypothesis.
• Cohen's d is a measure of effect size, or how large of an effect your sample had in comparison to the null hypothesis.
• d = 0.20 (small effect), d = 0.50 (medium effect), d = 0.80 (large effect).

OBSERVED Z-SCORE:
$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$$
EXPANDED TO:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
• $\bar{x}$: Mean of your sample.
• $\mu$: Mean of the null hypothesis.
• $\sigma_{\bar{x}}$: Standard error of the mean (also known as the standard deviation of the population divided by the square root of the number of observations).

CONFIDENCE INTERVALS (FOR A Z-TEST):
$$\bar{x} \pm (z_{conf})\,\sigma_{\bar{x}}$$
• $\bar{x}$: Observed mean of your sample.
• $z_{conf}$: The critical z-scores for your level of confidence. For purposes of this class, think of these like when you are finding critical z-scores for two-tailed z-tests. If you have a 95% confidence interval, you will have the same "z conf" as you would have for a 2-tailed z-test with an alpha level of 0.05. To find your "z conf", subtract your level of confidence from 100 (i.e. 100 - 95% confidence = 5). Divide this 5% by 2 = 2.5% or 0.025, find 0.025 in the "C" column of the z-table, then find the corresponding z-score in the "A" column.
• $\sigma_{\bar{x}}$: Standard error of the mean (also known as the standard deviation of the population divided by the square root of the number of observations).

ONE SAMPLE T-TEST (3 related formulas):
1) $$t = \frac{\bar{x} - \mu}{s_{\bar{x}}}$$
2) $$s_{\bar{x}} = \hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{\hat{\sigma}}{\sqrt{n}}$$
3) $$s = \hat{\sigma} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
• $\bar{x}$: Your sample mean.
• $x$: Each individual observation in your sample.
• $n - 1$: The number of observations in your sample, minus 1.
• $\mu$: The population mean (usually what you are comparing your sample mean to, to see if there is a difference).
• $s_{\bar{x}}$ / $\hat{\sigma}_{\bar{x}}$: The estimated standard error of the mean.
Note this is also represented as the Greek letter sigma $\sigma$ with a "hat", so we can call it "sigma hat". The hat indicates it's an estimate.
• $s = \hat{\sigma}$: The estimated standard deviation of the population.
• $\sum$: "Capital Sigma". Sum of everything that comes after it.
• To estimate the population standard deviation, we need to find $s$ or $\hat{\sigma}$, which we find in a similar way to how we always calculate standard deviation. Take each individual score ($x$) and subtract the mean ($\bar{x}$). Square that value. Repeat for each individual score and then add up what you get. Then divide that value by the number of observations minus 1 (n - 1), and finally take the square root of your answer to find "s" or "$\hat{\sigma}$".
**Note that you will use "n" at least 2 times in the t-score formula: once to find the estimated standard deviation (formula #3 above) and again when finding the estimated standard error of the mean (formula #2). You will also need to know n to find your critical t-score on your t-score chart. Your degrees of freedom (df) is equal to the number of observations minus 1 for a one-sample t-test (so df = n - 1 for this test).

CONFIDENCE INTERVAL FOR A ONE-SAMPLE T-TEST:
$$\bar{x} \pm t_{conf}\,(s_{\bar{x}})$$
• $\bar{x}$: Observed mean of your sample.
• $t_{conf}$: The critical t-scores for your level of confidence. For purposes of this class, think of these like when you are finding critical t-scores for two-tailed t-tests. If you have a 95% confidence interval, you will have the same "t conf" as you would have for a 2-tailed t-test with an alpha level of 0.05. To find your "t conf", subtract your level of confidence from 100 (i.e. 100 - 95% confidence = 5). Go to the 2-tailed test side of the t-test table, find the column for 0.05 and go down to your df to find the correct "t conf".
• $s_{\bar{x}}$: Estimated standard error of the mean (see above to calculate).
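The three one-sample t-test formulas and the confidence interval can be chained together as in the Python sketch below. Several things here are assumptions made for illustration and are not part of the handout: the function names `one_sample_t` and `confidence_interval`, the sample values 3, 4, and 5 (borrowed from the earlier examples), and the null-hypothesis mean of 2. The critical value 4.303 is the standard two-tailed t-table entry for alpha = 0.05 with df = 2; in practice you would look up the value for your own df.

```python
import math


def one_sample_t(values, mu):
    """Return (x_bar, s_x_bar, t) for a one-sample t-test against mean mu."""
    n = len(values)
    x_bar = sum(values) / n
    # Formula 3: estimated population standard deviation (divide by n - 1)
    s = math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))
    # Formula 2: estimated standard error of the mean
    s_x_bar = s / math.sqrt(n)
    # Formula 1: observed t-score
    t = (x_bar - mu) / s_x_bar
    return x_bar, s_x_bar, t


def confidence_interval(x_bar, s_x_bar, t_conf):
    """Return (lower, upper) bounds: x_bar plus or minus t_conf * s_x_bar."""
    margin = t_conf * s_x_bar
    return x_bar - margin, x_bar + margin


sample = [3, 4, 5]   # illustrative data (assumption)
mu_null = 2          # hypothetical null-hypothesis mean (assumption)
t_conf = 4.303       # t-table value: two-tailed, alpha = 0.05, df = n - 1 = 2

x_bar, s_x_bar, t = one_sample_t(sample, mu_null)
lower, upper = confidence_interval(x_bar, s_x_bar, t_conf)

print("t =", round(t, 3))
print("95% CI =", (round(lower, 3), round(upper, 3)))
```

With these made-up numbers the observed t-score is about 3.46, which is smaller than the critical value of 4.303, and the 95% confidence interval (roughly 1.52 to 6.48) contains the null-hypothesis mean of 2.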
