Probability Distributions and Parameter Estimation

Some Notes
- Random Variable – takes on different values based on chance
- Discrete – only has certain possible values
- Continuous – can take any value within a range

Binomial Distribution
- Where only two outcomes are possible
- Certain (fixed) number of “trials”
- Trials are independent
- Probability of success is the same on every trial

Example
- On five MC questions with five options each, what is the probability that someone randomly guessing will get exactly three correct?
- What you’re calculating is: (# ways to get 3 correct)/(all possible outcomes), accounting for the “known” probability
- What about three or fewer correct?
- Need to find the number of possible ways each result can occur

Counting Rule for Combinations
- C(n, x) = n! / [x! (n-x)!]
- Tells us the number of possible outcomes given the situation
- Order does not matter

Binomial Formula
- With the Counting Rule for Combinations and probability, we can construct the Binomial Formula
- Pr(x) = {n! / [x! (n-x)!]} * p^x * q^(n-x), where q = 1 - p
- These are located on the Binomial Table
- Mean: n*p (gives average number of successes)
- Standard Deviation: √(n*p*q)

Poisson Distribution
- Based on a countable number of “successes”
- Use Poisson when
  - We know the average number of successes
  - Probability of success is consistent
  - Segments are independent
  - We can divide segments into smaller pieces

Poisson Probability Distribution
- Mean: λt (be careful how you use this…)
- Pr(x) = [(λt)^x * e^(-λt)] / x!
- Standard Deviation: √(λt)

Example
- Rick Ankiel has hit 10 HR in 58 games. What is the probability that he will hit a HR in the first three innings of tonight’s game?
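The two worked questions above (guessing on five MC questions, and the home-run question) can be sketched in a few lines of Python. This is only an illustration using the standard library. The function names are made up for this sketch, and the home-run part rests on assumptions the notes do not state: a nine-inning game, home runs equally likely in any inning, and “hit a HR” read as “at least one HR”.

```python
import math


def binomial_pmf(x, n, p):
    """Pr(exactly x successes in n independent trials, success probability p)."""
    ways = math.comb(n, x)  # Counting Rule for Combinations: n! / [x! (n-x)!]
    return ways * p**x * (1 - p) ** (n - x)


def binomial_cdf(x, n, p):
    """Pr(x or fewer successes): add the probabilities for 0, 1, ..., x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))


def poisson_pmf(x, lam_t):
    """Pr(exactly x successes) when the expected count over the segment is lam_t."""
    return lam_t**x * math.exp(-lam_t) / math.factorial(x)


# Multiple-choice example: n = 5 questions, p = 1/5 for a random guess.
n, p = 5, 1 / 5
print("Pr(exactly 3 correct)  =", round(binomial_pmf(3, n, p), 5))
print("Pr(3 or fewer correct) =", round(binomial_cdf(3, n, p), 5))
print("Mean =", n * p, " Std dev =", round(math.sqrt(n * p * (1 - p)), 4))

# Home-run example. ASSUMPTIONS (not in the notes): a game lasts 9 innings,
# home runs are equally likely in any inning, and "hit a HR" means at least one.
lam = 10 / 58  # average HR per game
t = 3 / 9      # first three innings, as a fraction of a game
hr_in_first_three = 1 - poisson_pmf(0, lam * t)
print("Pr(at least one HR in first 3 innings) =", round(hr_in_first_three, 4))
```

For the guessing example this gives 0.0512 for exactly three correct and 0.99328 for three or fewer. Under the stated assumptions the home-run probability comes out at about 0.056. Note how λ has to be rescaled to the segment of interest (a third of a game) before it goes into the formula, which is the “be careful how you use this” point.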
Normal Distribution
- The most often used/desired distribution of them all
- Easiest to work with
- Most other distributions converge towards normal
- Looking for a range of possible values
  - Pr(X = x) = 0, no matter what x is
  - True for all continuous distributions
- Density Function…

Properties of Normal
- Has a single peak
- Symmetric
- Mean = Median = Mode
- Approaches 0, but never reaches it
- Variation shows up in the height and spread of the curve
- All Normals can be “Standardized”
  - The Z-value is the “standardized” version
  - This value can be used with the Z-table
  - But be aware of what you’re calculating and reading from the table

Uniform Distribution
- Density Function…
- Shaped as a rectangle with a and b as its “limits” on the x axis
- Mean: (a+b)/2
- Standard Deviation: √[(b-a)^2 / 12]

Sampling Distributions
- When we consider samples from a population, those samples have a distribution of their own
- We’ll want to know how accurate our sample is as a representative of the population
- Sampling Error = (x-bar) – μ
  - Size will depend on sample selection
  - May be + or –
  - Can be different for each sample
- Sampling distribution: the distribution of all possible values of a statistic for a given sample size, randomly selected from a population
- The average of all possible sample averages will equal the population average
  - The same is true for the sample variance s^2 (with the n-1 divisor); the sample standard deviation s is only approximately unbiased
  - This property is called unbiasedness
- As we increase the size of n, something else occurs
  - As n increases, we should see the values of our statistics (means and standard deviations) grow closer to the population value
  - This is called consistency
  - Usually shown analytically, as the population is unknown
- If the population is ~ N, the sampling dist’n of the sample mean is ~ N
  - Mean = μ
  - Standard Deviation = σ/(√n)
  - We can then convert to a Z-value
  - Equation…

Central Limit Theorem
- This is why the Normal is so wonderful
- As the sample size grows, the sampling distribution of x-bar will become approximately normal, whatever the population distribution
  - Mean of x-bar: μ
  - Standard Deviation of x-bar: σ/√n

Proportions
- Population proportion is defined as π = X/N
- Sample proportion is p = x/n
- Sampling error is p – π
- Mean of the sampling distribution of p: π
- Standard Error: √[π(1-π)/n]
- Works as long as nπ ≥ 5 and n(1 – π) ≥ 5
- We can also do Z-values for this: Z = (p – π)/(std. error)

Point Estimates and Confidence Intervals
- Point estimate
  - Statistic used to estimate a parameter
  - This is likely what you see reported
- Recall that if the sample is large enough, we can assume the sampling distribution to be normal
  - Central Limit Theorem
  - n > 30, typically
- Regardless, we can convert to Z-values and construct confidence intervals
- Confidence Interval: (X-bar) ± Z*(σ/√n)
  - This tells you how “certain” you are that the population value is within that range
  - The percentage is based on the choice of Z
- Error
  - Margin of error = Z*(σ/√n)
  - This happens, but it is measurable
  - It illustrates a tradeoff
    - Lower confidence – lower error
    - Higher confidence – higher error
  - Can also increase sample size to lower error

When σ Is Unknown
- We don’t always know σ (in fact, we rarely do)
- But we can estimate σ (by calculating s)
- This, however, changes our method slightly
- We’ll use the t-distribution
  - Relying on degrees of freedom
  - t-score for mean…
  - t-score for confidence interval…

Sample Size
- Now we have a method
- What else could we do to influence the margin of error?
  - Change the sample size (n)
- Sample Size Requirement (σ known): n = (Z^2 σ^2)/(e^2)
- But again, σ is not always (if ever) known
- Sample Size Requirement (σ unknown): estimate σ using (R / 6), where R is the range

Proportions
- We can also do the same for proportions
- Some formulas…
  - Sample Proportion
  - Standard Error for p
  - Estimate for SE for p
  - Confidence Interval for p
  - Margin of Error
  - Sample Size

Hypothesis Testing
- Now that you know how to calculate some statistics, it’s time to “give you the sword”
- Null Hypothesis (H0) – This is what we are testing
- Alternative Hypothesis (HA) – This includes everything not in the null
- One-sided or two-sided?
  - Two-sided
    - H0 will have “=”
    - Rejection region is on either side of the null region
  - One-sided
    - H0 will have “≥” or “≤”
    - Rejection region is only on one side
- When specifying H0, don’t set up the “straw man”
  - Formally, it takes away from the power of the test
  - Informally, it’s “shady”

What’s the point?
- Statistical method of determining the validity of claims
- Powerful weapon for refuting or supporting these claims
- Must be done properly, or else you lose credibility
- Note: We will never “prove” anything
  - We only find evidence
  - We will either reject or fail to reject H0
  - WE WILL NEVER ACCEPT H0!!! I don’t care what the book says, that is careless and inappropriate

This is done with error
- Type I Error – Rejecting a true H0
  - Its probability is the significance level
  - This is your α
  - This will determine your critical value
- Type II Error – Failing to reject a false H0
  - Its probability is usually denoted by β

To do this, you’ll need your critical value
- Critical Value – cutoff point where you either reject or fail to reject H0
- Calculating the critical value…
  - These will have the subscript “crit”
- The critical value is compared to the test statistic
- Calculating the test statistic…
  - These will have the subscript “stat”

Here’s what you’ll need to do
1. Specify H0 and HA
2. Determine if the test is one- or two-sided
3. Specify the Decision Rule using Zcrit
4. Calculate Zstat
5. Compare the two values
6. Express your decision

Another approach exists
- p-value – Tells you the smallest α level that would allow you to reject H0
  - This does not mean you should use this α
  - That depends on the problem
- Calculating the p-value
  - Find Zstat
  - Find the associated value in the Z-table

When σ is unknown
- Once again, σ is not always known
- In that event, you’ll again use t-statistics
  - Calculation of t-stat
  - Calculation of t-crit
- Other than a change in formulas, the procedure is exactly the same

Proportions
- You can also do this for proportions
- Calculating…
  - Z-stat for proportions
  - Z-crit for proportions
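The six-step procedure and the p-value approach can be sketched as follows for a test on a mean with σ known. This is a minimal illustration, not the course's own formulas (which are elided in the notes). The function name `z_test`, its parameters, and all of the numbers in the example are hypothetical. The standard library's `statistics.NormalDist` stands in for the Z-table.

```python
from math import sqrt
from statistics import NormalDist


def z_test(x_bar, mu_0, sigma, n, alpha=0.05, tail="two-sided"):
    """One-sample Z test on a mean with sigma known (hypothetical helper).

    Returns (z_stat, z_crit, p_value, reject).
    """
    z = NormalDist()  # standard normal, plays the role of the Z-table
    z_stat = (x_bar - mu_0) / (sigma / sqrt(n))

    if tail == "two-sided":    # H0: mu = mu_0
        z_crit = z.inv_cdf(1 - alpha / 2)
        reject = abs(z_stat) > z_crit
        p_value = 2 * (1 - z.cdf(abs(z_stat)))
    elif tail == "upper":      # H0: mu <= mu_0, rejection region on the right
        z_crit = z.inv_cdf(1 - alpha)
        reject = z_stat > z_crit
        p_value = 1 - z.cdf(z_stat)
    elif tail == "lower":      # H0: mu >= mu_0, rejection region on the left
        z_crit = z.inv_cdf(alpha)  # negative cutoff
        reject = z_stat < z_crit
        p_value = z.cdf(z_stat)
    else:
        raise ValueError("tail must be 'two-sided', 'upper', or 'lower'")

    return z_stat, z_crit, p_value, reject


# Made-up numbers, for illustration only:
# H0: mu = 100 vs HA: mu != 100, sigma = 15, n = 36, x-bar = 105, alpha = 0.05
z_stat, z_crit, p_value, reject = z_test(105, 100, 15, 36)
print("Zstat =", round(z_stat, 3), " Zcrit = ±", round(z_crit, 3))
print("p-value =", round(p_value, 4))
print("Decision:", "reject H0" if reject else "fail to reject H0")
```

With these made-up inputs Zstat is 2.0 against a Zcrit of about 1.96, so the decision is to reject H0, and the p-value of about 0.0455 sits just below α = 0.05. The two approaches always agree: Zstat falls in the rejection region exactly when the p-value is below α. The decision is worded as “reject” or “fail to reject”, never “accept”.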