Topic II

Probability Distributions, and
Parameter Estimation
 Some Notes
Random Variable – takes on different values based on chance
Discrete – only has certain possible values
Continuous – any value within a range is possible
 Where only two outcomes are possible
 Certain number of “trials”
 Trials are independent
 Probabilities are consistent
 On five multiple-choice (MC) questions with five options each, what is the probability that someone randomly guessing will get three correct?
 What you’re calculating is:
 (# ways to get 3 correct)/(all possible outcomes)

Accounting for the “known” probability
 What about three or fewer correct?
 Need to find the number of possible ways each outcome can occur
 Counting Rule for Combinations
$C^n_x = \frac{n!}{x!\,(n-x)!}$
Tells us the number of possible outcomes given the situation
Order does not matter
 With Counting Rule for Combinations and probability, we can construct the Binomial Formula
$\Pr(x) = \frac{n!}{x!\,(n-x)!}\, p^x q^{n-x}$
These are located on the Binomial Table
 Mean
n*p
Gives average number of successes
 Standard Deviation
√(n*p*q)
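As a rough illustration, the guessing question above can be worked with the Binomial Formula in a few lines of Python. This is only a sketch: the function name `binomial_pmf` is made up for this example, and it assumes q = 1 – p and that each of the five questions has exactly one correct option (so p = 1/5).

```python
from math import comb, sqrt

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Pr(x) = C(n, x) * p^x * q^(n-x), assuming q = 1 - p."""
    q = 1.0 - p
    return comb(n, x) * p**x * q**(n - x)

n = 5        # five questions ("trials")
p = 1 / 5    # assumed chance of guessing one question correctly

# Probability of exactly three correct
p_exactly_3 = binomial_pmf(3, n, p)

# Probability of three or fewer correct: add up x = 0, 1, 2, 3
p_at_most_3 = sum(binomial_pmf(k, n, p) for k in range(4))

# Mean and standard deviation of the number of successes
mean = n * p
std_dev = sqrt(n * p * (1 - p))

print(f"Pr(x = 3)  = {p_exactly_3:.4f}")
print(f"Pr(x <= 3) = {p_at_most_3:.5f}")
print(f"mean = {mean:.2f}, std dev = {std_dev:.4f}")
```

The values should match what you would read off a Binomial Table for n = 5 and p = 0.20.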
 Based on a countable number of “successes”
 Use Poisson when




We know average number of successes
Probability of success is consistent
Segments are independent
We can divide segments into smaller pieces
 Mean

λt

Be careful how you use this…
 Poisson Probability Distribution
$\Pr(x) = \frac{(\lambda t)^x e^{-\lambda t}}{x!}$
 Standard Deviation
√(λt)
 Rick Ankiel has hit 10 HR in 58 games. What is the probability that he will hit a HR in the first three innings of tonight’s game?
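One possible way to set up the home run question with the Poisson formula is sketched below. It assumes a nine-inning game, so three innings is a segment of t = 3/9 of a game, and it treats "hit a HR" as "at least one HR"; the probability of exactly one is shown as well, since the wording allows either reading. The function name `poisson_pmf` is invented for this example.

```python
from math import exp, factorial, sqrt

def poisson_pmf(x: int, lam_t: float) -> float:
    """Pr(x) = (lambda*t)^x * e^(-lambda*t) / x!"""
    return (lam_t ** x) * exp(-lam_t) / factorial(x)

lam = 10 / 58   # average HR per game, from 10 HR in 58 games
t = 3 / 9       # assumption: three innings of a nine-inning game
lam_t = lam * t # mean number of HR in the three-inning segment

p_none = poisson_pmf(0, lam_t)
p_at_least_one = 1 - p_none          # "hits a HR" read as one or more
p_exactly_one = poisson_pmf(1, lam_t)
std_dev = sqrt(lam_t)

print(f"lambda*t       = {lam_t:.4f}")
print(f"Pr(x >= 1)     = {p_at_least_one:.4f}")
print(f"Pr(x = 1)      = {p_exactly_one:.4f}")
print(f"std deviation  = {std_dev:.4f}")
```

This is where "be careful how you use this" matters: λ is per game, so it has to be rescaled by t before it goes into the formula.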
 The most often used/desired distribution of them all


Easiest to work with
Most other distributions converge towards normal
 Looking for range of possible values
Pr(X = x) = 0, no matter what x is
True for all continuous distributions
 Density Function…
 Properties of Normal





Has a single peak
Symmetric
Mean = Median = Mode
Curve approaches 0 in the tails, but never reaches it
Variation depends on height, spread
 All Normals can be “Standardized”
 The Z-value is the “standardized” version
 This value can be used with the Z-table
 But be aware of what you’re calculating and reading from the table
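A short sketch of standardizing follows, using the usual definition Z = (x – μ)/σ and Python's built-in `statistics.NormalDist` in place of a printed Z-table. The numbers (μ = 100, σ = 15, x = 130) are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

def z_value(x: float, mu: float, sigma: float) -> float:
    """Standardize x: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

# Hypothetical normal population and value of interest
mu, sigma, x = 100.0, 15.0, 130.0
z = z_value(x, mu, sigma)

std_normal = NormalDist()            # mean 0, standard deviation 1
area_below = std_normal.cdf(z)       # Pr(Z <= z), what a cumulative table gives
area_mean_to_z = area_below - 0.5    # what a "mean to z" table gives
area_above = 1 - area_below          # upper-tail probability

print(f"z = {z:.2f}")
print(f"area below z      = {area_below:.4f}")
print(f"area from 0 to z  = {area_mean_to_z:.4f}")
print(f"area above z      = {area_above:.4f}")
```

The three areas show why it matters which kind of table you are reading: the same z gives different numbers depending on whether the table reports cumulative area or area from the mean.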
 Density Function…
 Shaped as a rectangle with a and b as its “limits” on the x axis
 Mean
(a+b)/2
 Standard Deviation
$\sqrt{(b-a)^2/12}$
 When we consider samples from a population, those samples have a distribution of their own
 We’ll want to know how accurate our sample is as a representative of the population
 Sampling Error = (x-bar) – μ
Size will depend on sample selection
May be + or –
Can be different for each sample
 For all possible values of a statistic of a given sample size that has been randomly selected from a population
The average of all possible sample averages will equal the population average
The same holds for sample variances (and approximately for standard deviations)
This property is called unbiasedness
 As we increase the size of n, something else occurs
As n increases, we should see the values of our statistics (means and standard deviations) grow closer to the population value
This is called consistency
Usually shown analytically, as the population is unknown
 If population is ~ N,
Sampling dist’n of sample mean ~ N
Mean = μ
Standard Deviation = σ/(√n)
 We can then convert to Z-value
Equation…
 This is why the Normal is so wonderful
 As the sample size grows, the sampling distribution of the sample mean will become approximately normal, whatever the population’s distribution
 Mean of x-bar
 Standard Deviation of σ/√n
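A small simulation can make this concrete. The sketch below draws repeated samples from a clearly non-normal population (an exponential distribution with μ = 1 and σ = 1, picked purely for illustration) and checks that the sample means centre on μ with spread close to σ/√n. The sample size, number of samples and seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)        # fixed seed so the run is repeatable

n = 40                # size of each sample
num_samples = 2000    # how many samples we draw

# Population: exponential with rate 1, so mu = 1 and sigma = 1 (skewed, not normal)
sample_means = []
for _ in range(num_samples):
    sample = [random.expovariate(1.0) for _ in range(n)]
    sample_means.append(mean(sample))

mean_of_means = mean(sample_means)   # should be close to mu = 1
sd_of_means = stdev(sample_means)    # should be close to sigma / sqrt(n)

print(f"mean of sample means = {mean_of_means:.3f} (mu = 1)")
print(f"sd of sample means   = {sd_of_means:.3f} (sigma/sqrt(n) = {1 / sqrt(n):.3f})")
```

Plotting a histogram of `sample_means` would show the familiar bell shape even though the population itself is heavily skewed.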
 Population proportion is defined as π = X/N
 Sample proportion is p=x/n
 Sampling error is p – π
 Mean of SampDist of p
π
 Standard Error
√{(π(1- π))/n}
 Works as long as
nπ ≥ 5
n(1 – π) ≥ 5
 We can also do Z-values for this
Z = (p – π)/(std. error)
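The proportion formulas above can be tied together in a short sketch. The population proportion, sample size and count used here are hypothetical; the variable name `pi_pop` is used for π to avoid confusion with the constant 3.14159….

```python
from math import sqrt
from statistics import NormalDist

pi_pop = 0.40   # hypothetical population proportion (pi)
n = 100         # sample size
x = 46          # hypothetical number of "successes" in the sample

p = x / n                        # sample proportion
sampling_error = p - pi_pop      # p - pi

# Normal approximation is only reasonable when both checks pass
large_enough = (n * pi_pop >= 5) and (n * (1 - pi_pop) >= 5)

std_error = sqrt(pi_pop * (1 - pi_pop) / n)
z = (p - pi_pop) / std_error     # note the parentheses around (p - pi)

# Chance of seeing a sample proportion at least this large
prob_p_or_more = 1 - NormalDist().cdf(z)

print(f"p = {p:.2f}, sampling error = {sampling_error:+.2f}")
print(f"std error = {std_error:.4f}, z = {z:.4f}")
print(f"Pr(sample proportion >= {p:.2f}) = {prob_p_or_more:.4f}")
```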
 Point estimate
Statistic used to estimate a parameter
This is likely what you see reported
 Recall if the sample is large enough, we can assume the sampling distribution of the mean to be normal


Central Limit Theorem
n > 30, typically
 Regardless, we can convert to Z-values and construct confidence intervals
 Confidence Interval
(X-bar) ± Z*(σ/√n)
This tells you how “certain” you are that the population value is within that range.
 The percentage is based on choice of Z


 Error happens, but it is measurable
Margin of error = Z*(σ/√n)
 This illustrates a tradeoff
Lower confidence – lower error
Higher confidence – higher error
 Can also increase sample size to lower error
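The tradeoff can be seen numerically with a small sketch of the interval (X-bar) ± Z*(σ/√n). The sample mean, σ and n below are hypothetical, and the helper name `z_confidence_interval` is invented for this example; Z is taken from Python's `statistics.NormalDist` rather than a table.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence):
    """Return (low, high, margin) for x_bar +/- Z * sigma / sqrt(n)."""
    # Two-sided interval: put (1 - confidence)/2 in each tail
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin, margin

# Hypothetical sample: mean 50, known sigma 10, n = 100
x_bar, sigma, n = 50.0, 10.0, 100

_, _, m90 = z_confidence_interval(x_bar, sigma, n, 0.90)
low95, high95, m95 = z_confidence_interval(x_bar, sigma, n, 0.95)
_, _, m99 = z_confidence_interval(x_bar, sigma, n, 0.99)

# Same confidence, four times the sample size
_, _, m95_big_n = z_confidence_interval(x_bar, sigma, 4 * n, 0.95)

print(f"95% interval: ({low95:.2f}, {high95:.2f})")
print(f"margins: 90% = {m90:.3f}, 95% = {m95:.3f}, 99% = {m99:.3f}")
print(f"95% margin with n = {4 * n}: {m95_big_n:.3f}")
```

Quadrupling n halves the margin of error, because n enters through √n.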
 We don’t always know σ (in fact, we rarely do)
 But we can estimate σ (calculating s)
 This however changes our method, slightly
 We’ll use the t-distribution

Relying on degrees of freedom
 t-score for mean…
 t-score for confidence interval…
 Now we have a method
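A minimal sketch of the method with σ unknown is given below. It uses the common textbook form (x-bar) ± t*(s/√n) with n – 1 degrees of freedom, which may differ in notation from the formulas on the original slide. The data are made up, and the critical value 2.262 is simply read from a t-table (95% two-sided, 9 degrees of freedom) because Python's standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of n = 10 measurements
data = [12.1, 11.6, 12.4, 11.9, 12.8, 12.0, 11.7, 12.3, 12.5, 11.8]

n = len(data)
df = n - 1            # degrees of freedom
t_crit = 2.262        # from a t-table: 95% two-sided, df = 9 (assumed lookup)

x_bar = mean(data)    # point estimate of mu
s = stdev(data)       # sample standard deviation, our estimate of sigma

margin = t_crit * s / sqrt(n)
low, high = x_bar - margin, x_bar + margin

print(f"x-bar = {x_bar:.3f}, s = {s:.3f}, df = {df}")
print(f"95% interval: ({low:.3f}, {high:.3f})")
```

For the same confidence level the t value is larger than the Z value (2.262 versus about 1.96 here), so the interval is wider; that is the price of estimating σ.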
 What else could we do to influence the margin of error?
 Change the sample size (n)
 Sample Size Requirement (σ known)
$n = \frac{Z^2 \sigma^2}{e^2}$
 But again, σ not always (if ever) known
 Sample Size Requirement (σ unknown)
Estimate σ using (R / 6), where R is the range of the data
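The sample size requirement can be sketched as follows. The values of σ, the range R and the target margin of error e are hypothetical, and the helper name `required_sample_size` is invented for this example. The result is rounded up, since a fractional observation is not possible.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma: float, e: float, confidence: float = 0.95) -> int:
    """n = Z^2 * sigma^2 / e^2, rounded up to the next whole observation."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil((z ** 2) * (sigma ** 2) / (e ** 2))

# Sigma known (hypothetical): sigma = 15, want margin of error e = 2
n_known = required_sample_size(sigma=15, e=2)

# Sigma unknown: estimate it as R / 6 from a hypothetical range of 90
data_range = 90
sigma_est = data_range / 6
n_estimated = required_sample_size(sigma=sigma_est, e=2)

# Halving the margin of error roughly quadruples the required n
n_tighter = required_sample_size(sigma=15, e=1)

print(f"n (sigma known)     = {n_known}")
print(f"n (sigma = R/6)     = {n_estimated}")
print(f"n (e halved to 1)   = {n_tighter}")
```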
 We can also do the same for proportions
 Some formulas…






Sample Proportion
Standard Error for p
Estimate for SE for p
Confidence Interval for p
Margin of Error
Sample Size
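The slide's formulas did not survive, so the sketch below uses the standard textbook versions: p = x/n, estimated standard error √(p(1 – p)/n), interval p ± Z times that standard error, and sample size n = Z²·p(1 – p)/e². Treat these as assumptions rather than the author's exact notation. The counts are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

x, n = 120, 400               # hypothetical: 120 successes in 400 trials
confidence = 0.95
z = NormalDist().inv_cdf(0.5 + confidence / 2)

p = x / n                                 # sample proportion
se_est = sqrt(p * (1 - p) / n)            # estimated standard error for p
margin = z * se_est                       # margin of error
ci_low, ci_high = p - margin, p + margin  # confidence interval for pi

# Sample size needed for a target margin of error e,
# using p as the planning value for the unknown pi
e_target = 0.03
n_required = ceil((z ** 2) * p * (1 - p) / (e_target ** 2))

print(f"p = {p:.3f}, estimated SE = {se_est:.4f}")
print(f"95% interval: ({ci_low:.4f}, {ci_high:.4f}), margin = {margin:.4f}")
print(f"n needed for e = {e_target}: {n_required}")
```

When no earlier estimate of the proportion is available, 0.5 is a common conservative planning value because it maximizes p(1 – p).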
 Now that you know how to calculate some statistics, it’s time to “give you the sword”
 Null Hypothesis (H0) – This is what we are testing
 Alternative Hypothesis (HA) – This includes everything not in the null

One-sided or two sided?
 Two-sided
H0 will have “=”
Rejection region is on either side of the null region
 One-sided



H0 will have “≥” or “≤”
Rejection region is only on one side
When specifying H0, don’t set up the “straw man”
Formally, it takes away from the power of the test
Informally, it’s “shady”
 What’s the point?
Statistical method of determining validity of claims
Powerful weapon for refuting or supporting these claims
Must be done properly, or else you lose credibility
Note: We will never “prove” anything
We only find evidence
 We will either reject or fail to reject H0
 WE WILL NEVER ACCEPT H0!!!
 I don’t care what the book says, that is careless and inappropriate
 This is done with error
Type I Error – Rejecting a true H0
Denoted by significance level
This is your α
This will determine your critical value
Type II Error – Failing to reject a false H0
This is usually denoted by β
 To do this, you’ll need your critical value
 Critical Value – cutoff point where you either reject or fail to reject H0


Calculating the critical value…
These will have the subscript “crit”
 Critical value compared to the test statistic


Calculating the test statistic…
These will have subscript “stat”
 Here’s what you’ll need to do
Specify H0 and HA
Determine if the test is one or two sided
Specify Decision Rule using Zcrit
Calculate Zstat
Compare the two values
Express your decision
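The steps above can be sketched for a two-sided test of a mean with σ known. All numbers are hypothetical, and the test statistic uses the usual form Zstat = (x-bar – μ0)/(σ/√n), which is an assumption since the slide's own formula is not shown.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: specify H0 and HA (hypothetical claim about a mean)
#   H0: mu = 100      HA: mu != 100
mu_0 = 100.0

# Step 2: the "!=" in HA makes this a two-sided test
alpha = 0.05

# Step 3: decision rule using Zcrit (alpha split across both tails)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Step 4: calculate Zstat from hypothetical sample results
x_bar, sigma, n = 106.0, 15.0, 36
z_stat = (x_bar - mu_0) / (sigma / sqrt(n))

# Step 5: compare the two values
reject = abs(z_stat) > z_crit

# Step 6: express the decision (never "accept H0")
decision = "reject H0" if reject else "fail to reject H0"

print(f"Zcrit = +/-{z_crit:.3f}, Zstat = {z_stat:.3f}")
print(f"Decision at alpha = {alpha}: {decision}")
```

For a one-sided test the whole of α would go in a single tail, so Zcrit would come from `inv_cdf(1 - alpha)` and the comparison would drop the absolute value.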
 Another approach exists
 p-value – Tells you the smallest α level that would allow you to reject H0


This does not mean you should use this α
That depends on the problem
 Calculating p-value
Find Zstat
Find associated value in Z-table
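As a sketch, the table lookup can be replaced by the standard normal CDF. The helper name `p_value_from_z` and the Zstat of 2.4 are hypothetical. The one-sided branch assumes Zstat falls on the side HA points to.

```python
from statistics import NormalDist

def p_value_from_z(z_stat: float, two_sided: bool = True) -> float:
    """Tail area beyond Zstat under the standard normal curve."""
    # Area in one tail beyond |Zstat|
    tail = 1 - NormalDist().cdf(abs(z_stat))
    # Two-sided tests count both tails
    return 2 * tail if two_sided else tail

z_stat = 2.4   # hypothetical test statistic

p_two_sided = p_value_from_z(z_stat, two_sided=True)
p_one_sided = p_value_from_z(z_stat, two_sided=False)

print(f"two-sided p-value = {p_two_sided:.4f}")
print(f"one-sided p-value = {p_one_sided:.4f}")
```

With these numbers H0 would be rejected at α = 0.05 but not at α = 0.01 in the two-sided case, which is why α should be chosen from the problem before looking at the p-value.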
 Once again, σ is not always known
 In that event, you’ll again use t-statistics


Calculation of t-stat
Calculation of t-crit
 Other than a change in formulas, the procedure is exactly the same
 You can also do this for proportions
 Calculating…


Z-stat for proportions
Z-crit for proportions
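A sketch of the same procedure for a proportion follows. It assumes the standard error is computed from the hypothesized π0, as in Z = (p – π)/(std. error) with std. error √(π(1 – π)/n). The hypotheses and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical two-sided test:  H0: pi = 0.50   HA: pi != 0.50
pi_0 = 0.50
alpha = 0.05

x, n = 230, 400    # hypothetical sample: 230 successes out of 400
p = x / n          # sample proportion

# Normal approximation check under H0
large_enough = (n * pi_0 >= 5) and (n * (1 - pi_0) >= 5)

# Z-stat for proportions: standard error uses the hypothesized pi_0
std_error = sqrt(pi_0 * (1 - pi_0) / n)
z_stat = (p - pi_0) / std_error

# Z-crit for proportions: same standard normal cutoff as for means
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

decision = "reject H0" if abs(z_stat) > z_crit else "fail to reject H0"

print(f"p = {p:.3f}, Zstat = {z_stat:.3f}, Zcrit = +/-{z_crit:.3f}")
print(f"Decision: {decision}")
```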