Uploaded by Jenessa Hatfield

Statistics midterm study sheet

advertisement
Chapter 5: Discrete Random Variables & Their Probability Distributions
Discrete Random Variable = random variable that assumes countable values
Continuous Random Variable = not countable, can assume any value in 1 or
more intervals ($,weight, time, length)
Probability Distribution of a Discrete Random Variable:
x P(x) x2 x2 (Px)
Two Characteristics of a Probability Distribution:
1. 0≤ P(x)≤1 for each value of x
2.Σ P(x)=1
Mean of discrete random variable: μ=Σ xP(x)
Standard Deviation of a Discrete Random Variable:
4 Conditions of Binomial Experiment:
1. “n” identical trials - repeated under identical conditions
2. 2 outcomes only - “success” or “failure”
3. Probability of the outcomes remains constant
4. Trials are independent (one outcome does not affect another)
Binomial Probability Formula: the probability of exactly x
successes in n trials
q= 1-p = prob of failure
Mean of binomial distribution:
Standard deviation of a binomial distribution:
P = 0.50 = symmetric
P < 0.50 = skewed to the right
P > 0.5 = skewed to the left
Larger SD = x can assume values over a larger range about the
mean
Chapter 6: Continuous Random Variables and the Normal
Distribution
Characteristics of the Probability Distribution of a Continuous
Random Variable:
1. Probability that x assumes a value in any intervals lies in the
range 0-1
2. Total probability of all the (mutually exclusive) intervals within
which x can assume a value is 1.0 p (a ≤ x ≤ b)
The Normal Probability Distribution (Bell-Shaped)
1. Total area under curve is 1.0
2. Curve is symmetric about the mean
3. The 2 tails of the curve extend indefinitely (not beyond (μ ±3σ)
Z values: represents the distance btw the mean and x in terms of
the standard deviation
p (z assumes a single value) = 0
Use table to find values to the LEFT of any z value
1 – P(of area to the left of z) = gives area to the right
If z > 3.49 - area is approx. 1.0
If z < -3.49 -area is approx. 0.0
Standardizing a Normal Distribution:
Convert x value to a z value:
SD and Mean are parameters of a binomial distribution
Z value for an x value greater than the mean is + positive
Z value for an x value smaller than the mean is - negative
Finding an x Value for a Normal Distribution when µ, σ and z are
known: x = µ + zσ
Normal Distribution as an Approximation to Binomial Distribution:
When np > 5 and nq > 5
1. compute µ and σ for the binomial distribution µ = np and σ =
√npq
2. convert the discrete random variable into a continuous random
variable
Continuity Correction Factor: addition of 0.5 and/or
subtraction of 0.5 from the value(s) of x when the normal
distribution is used an approximation to the binomial distribution
SUBTRACT 0.5 from LOWER LIMIT of interval
ADD 0.5 to the UPPER LIMIT of the interval
3. compute the required probability using the normal distribution
Selection error -sampling frame is not representative of pop.
Nonresponse error-people do not respond
Response error -people don’t provide correct answers
Voluntary response error- participation is voluntary
Simple - each sample of same size has probability of being selected
Systematic - randomly select one member from first k (pop. size/intended sample size)
units,
then every kth member is used
Stratified - divide population into strata (when pop. differs in characteristics)
Cluster -geographical clusters, randomly select from x amount of clusters
ELEMENT OR MEMBER - of a sample or population is the specific subject or object
about which the
information is being collected
VARIABLE - a characteristic under study that assumes different values for different
elements
OBSERVATION OR MEASUREMENT - the value of a variable for an element
DATA SET - collection of observations on one or more variables
Relative frequency of a class = f/sum f
Sum of relative frequency is always 1.0
Percentage of a class= (relative frequency) X 100%
Class Width =lower limit of next class – lower limit of current class
Class midpoint = (Upper limit + Lower limit) ÷ 2
Class boundary = the midpoint of the upper limit of one class and the lower
limit of the next class (800.5-1000.5 for 801-1000)
Approximate Class Width = (largest value – smallest value) ÷ #of classes
Cumulative Frequency: Add frequency of a class to frequencies of all preceeding classes
(16+9+25|16+9+1+26)
Cumulative relative frequency = CFC/total observations (16/30, 25/30)
Cumulative Percentage = (Cumulative relative frequency) x 100%
Mean (ungrouped data): μ={x/n , x={x/n
Mean (grouped data – eg. Frequency table): μ {mf/n X=mf/n
Frequency Distribution Table: f m f2 mf m2f (m=midpoint, f=frequency of a class)
Mean is sensitive to outliers
Weighted mean (ungrouped): {xw/{w
Median for ungrouped data: value of the middle term in a ranked data set
Median not influenced by outliers, preferred over mean
Mode: # occurs with highest frequency (can have 0 or many)
Negatively skewed = highest on right, tail to the left - Mean – median – mode (at
the peak)
Positively skewed = highest on left, tail to the right - Mode (at the peak) –
median – mean
Symmetric: identical on both sides of central point
Uniform/Rectangular: Same for each class
The sum of the deviations of the x values from the mean is always 0. Σ (x−μ)=0
Range: Largest value – smallest value (influenced by outliers)
Variance (for Ungrouped Data):
Standard deviation (Ungrouped Data):
Coefficient of variation:
Expresses standard deviation as a % of the mean and compares variability for 2
different data sets that have different units of measure (eg. years, $)
Variance (Grouped Data):
Standard deviation (Grouped Data):
Standard deviation tells how closely the values of a data set are
clustered around the mean (lower value = smaller range, higher
value = larger range)
Values for the variance and SD are never negative (if no
variation, values = 0) and they are sensitive to outliers
Population parameters: a numerical measure (mean, median,
mode, variance and SD) calculated for a population data set
(mean and SD are the parameters) Sample statistics: a summary
measure calculated for a sample data set
Chebyshev’s Theorem: for any number k greater than 1, at least
Of the values for any distribution lie within k standard
deviations from the mean (use mean and standard deviation)
K= distance between the mean and a point/SD (k>1)
Distance between mean and each of the two given points must
be the same
Gives minimum area (in %) under distribution curve b/w two
points
1. Find distance btw mean and 2 given points
2. Divide distance by standard to get “k”
3. Sub k in formula
Empirical rule: only applies to bell-shaped distribution
68% of observations lie within interval (μ ±1σ)
95% of observations lie within interval (μ ±2σ)
99.7% of observations lie within interval (μ ±3σ)
Quartiles: divide ranked data set into 4 equal parts
Rank set in increasing order – In an odd number, the median
stands alone
Q1= value of the middle term among the ranked observations
ranked less than the median
Q2= value of the middle term in a ranked data set (median)
Q3=value of the middle term among the ranked observations
greater than the median
IQR = Q3−Q1 (not affected by outliers)
Lower inner fence = Q1−1.5 IQR (to determine mild outliers)
Upper inner fence = Q3+1.5 IQR
Lower outer fence = Q1−3IQR (outside outer fences= extreme
outliers)
Upper outer fence = Q3+3IQR
The kth percentile:
Always round to the next higher whole number
Percentile rank of Xi=
Give the % of values in a dataset that are less than xi
Chapter 4: Probability
Simple event – an event that includes only one of the final
outcomes of an experiment
Compound event - a collection of more than one outcomes for
an experiment
Two properties of probability
1.Event always lie in the range 0 to 1. 0≤ P(A) ≤1
2.Sum of all probabilities of a simple event is always 1. ΣP(Ei)=1
Classical Probability: applied for an experiment where all
outcomes are equally likely
Classical Probability Rule for a simple event:
classical Probability Rule for a compound event:
Relative frequency as an approximation of probability (needed when outcomes
are not equally likely)
More repetitions lead to closer actual probabilities
Subjective probability: based on subjective judgement, experience, info, belief
Marginal probability; prob. of single event w/o consideration of any other event
(Divides totals for rows or columns by the grand total)
Conditional probability of an event: prob. event will occur given another event
has already occurred
Conditions for independence of events: P(A|B) =P(A) or P(B|A)=P(B) (the
occurrence of 1 event does not affect occurrence of another)
If 2 probabilities are equal, the events are independent
1. Two events are either mutually exclusive or independent
a. Mutually exclusive events are always dependent
b. Independent events are never mutually exclusive
2. Dependent events may or may not be mutually exclusive
Mutually exclusive events cannot occur together
Complementary events: P(A) + P(A) = 1
A = all outcomes from an experiment not in A (if we already know A, we can find
prob of complementary event) [1 - P(A) = P(A)]
2 complementary events = always mutually exclusive
To find joint probability:
Multiplication rule for independent events: P(A and B) = P(A) x P(B)
Multiplication rule for dependent events: P(A and B) = P(A) x P(B|A) or
P(B)x P(A|B)
Joint probability of two mutually exclusive events (cannot occur together):
P(A and B) = 0
Addition rule for mutually nonexclusive events (can occur together):
P(A or B) = P(A) + P(B) – P(A and B)
Addition rule for mutually exclusive events:
P(A or B) = P(A) + P(B)
P(neither (A or B)) = 1 – P(A and B)
n factorial: the product of all the integers from given number to 1 n! = n(n-1)(n2)(n-3) … 0! = 1 (always)
Combinations: gives the number of ways x elements can be selected from n
elements n = total items
x = # of items selected out of n
n is always ≥ x
Counting rule: total outcomes for an experiment = m x n x k
Download