Descriptive Statistics

Concepts of population, sample, random sample and frequency distribution of discrete and
continuous data.

A population is the set of all individuals under study, each of which has a value for the variable of interest.

A sample is a smaller group of individuals selected from the population (at random, in the case of a
random sample) and used to represent the population as a whole.

The frequency distribution of data is the number of individuals within a sample or population for
each value of the associated variable in discrete data, or for each range of values for the associated
variable in continuous data.
Presentation of data: frequency tables and diagrams
Grouped data: mid-interval values, interval width, upper and lower interval boundaries,
frequency histograms.

Mid-interval values are found by averaging the upper and lower interval
boundaries, that is, by halving their sum.

The interval width is simply the distance between the upper and lower
interval boundaries.

Frequency histograms are drawn with interval width proportional to bar
width and frequency as the height.
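
A minimal Python sketch of the two grouped-data quantities above. The class intervals are hypothetical, and the mid-interval value is taken as the midpoint of the two boundaries.

```python
# Hypothetical class intervals, each given as (lower boundary, upper boundary).
intervals = [(0, 10), (10, 20), (20, 40)]


def mid_interval_value(lower, upper):
    """Midpoint of the interval: half the sum of the two boundaries."""
    return (lower + upper) / 2


def interval_width(lower, upper):
    """Distance between the upper and lower boundaries."""
    return upper - lower


midpoints = [mid_interval_value(lo, hi) for lo, hi in intervals]
widths = [interval_width(lo, hi) for lo, hi in intervals]

print(midpoints)  # [5.0, 15.0, 30.0]
print(widths)     # [10, 10, 20]
```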
Median, mode; quartiles, percentiles.
Range; interquartile range; variance, standard deviation.

Mode (discrete data) is the most frequently occurring value in the data set.

Modal class (continuous data) is the most frequently occurring class.

Median is the middle value of an ordered data set.
For an odd number of data values, the median is the middle value.
For an even number of data values, the median is the average of the two middle values.

Percentile is the score below which a certain percentage of the data lies.

Lower quartile (Q1) is the 25th percentile.

Median (Q2) is the 50th percentile.

Upper quartile (Q3) is the 75th percentile.

Range is the difference between the highest and lowest value in the data set.

The interquartile range is Q3−Q1.
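
The measures above can be sketched in Python as follows. The data set is hypothetical, and the quartiles use the "median of each half" convention, which is an assumption: calculators and software may use slightly different quartile rules.

```python
from statistics import median, multimode

data = [2, 3, 3, 5, 7, 8, 9, 11]  # hypothetical data set


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # values below the median position
    upper = ordered[half + n % 2:]  # values above the median position
    return median(lower), median(ordered), median(upper)


q1, q2, q3 = quartiles(data)
modes = multimode(data)              # most frequently occurring value(s)
data_range = max(data) - min(data)   # highest minus lowest value
iqr = q3 - q1                        # interquartile range

print(modes, q1, q2, q3, data_range, iqr)  # [3] 3.0 6.0 8.5 9 5.5
```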

Cumulative frequency is the frequency of all values less than a given value. A table can be drawn as
shown:

The population mean μ is generally unknown, and the sample mean x̄ was previously used as an
unbiased estimate of it. For examination purposes, data will now be treated as the
population. Estimation of the mean and variance of a population from a sample is no longer
required.
In the examination the following formulae should be used:

Mean

Population variance σ² of a finite population of size n is a measure of statistical
dispersion (to what extent the data values deviate from the mean).

Standard deviation
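
The formulae themselves did not survive in this text. As a hedged sketch, the Python below uses the standard population definitions (mean = sum of values divided by n, variance = mean squared deviation from the mean, standard deviation = square root of the variance). The data set is hypothetical, and the formula booklet may state these in a frequency-weighted form.

```python
from math import sqrt

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, treated as the whole population
n = len(data)

mean = sum(data) / n                                # population mean
variance = sum((x - mean) ** 2 for x in data) / n   # population variance (divide by n)
std_dev = sqrt(variance)                            # population standard deviation

print(mean, variance, std_dev)  # 5.0 4.0 2.0
```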

For a normally distributed population:

approximately 68% of the population lies within 1 standard deviation either side of the mean

approximately 95% of the population lies within 2 standard deviations either side of the mean

approximately 99.7% of the population lies within 3 standard deviations either side of the mean
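
These percentages can be checked numerically, assuming the population follows a normal distribution. The sketch uses `NormalDist` from Python's standard `statistics` module; the helper name `within` is introduced here for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def within(k):
    """Proportion of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(within(k), 4))
```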
Probability

The number of trials is the total number of times the “experiment” is repeated.

The outcomes are the different results possible for one trial of the experiment.

Equally likely outcomes are expected to have equal frequencies.

The sample space, U, is the set of all possible outcomes of an experiment.

An event is the occurrence of one particular outcome.

$P(A) = \frac{n(A)}{n(U)}$

P(A) is the probability of an event A occurring in one trial,
n(A) is the number of times event A occurs in the sample space
n(U) is the total number of possible outcomes.
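
As a small illustration of this counting definition, the sketch below uses a fair six-sided die as a hypothetical experiment with equally likely outcomes. The function name `probability` is an assumption made for this example.

```python
from fractions import Fraction

U = {1, 2, 3, 4, 5, 6}             # sample space for one roll of a die
A = {x for x in U if x % 2 == 0}   # event A: the roll is even


def probability(event, sample_space):
    """P(A) = n(A) / n(U), valid when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


p_even = probability(A, U)
print(p_even)  # 1/2
```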
Complementary Events
Two events are described as complementary if they are the only two possible outcomes.
event A: rain, complement of A is A’: not rain
Two complementary events are mutually exclusive.
Since an event must either occur or not occur, the probability of the event either occurring
or not occurring must be 1. This can be stated as follows.
$P(A) + P(A') = 1$
Use this when you need the probability that an event will not happen: $P(A') = 1 - P(A)$.
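
The complement rule is often the quickest route to an "at least one" probability. The sketch below assumes a hypothetical experiment of three fair coin tosses and compares the complement calculation with a direct count.

```python
from fractions import Fraction
from itertools import product

U = list(product("HT", repeat=3))            # all 8 outcomes of three tosses
no_heads = [o for o in U if "H" not in o]    # complement of "at least one head"

p_no_heads = Fraction(len(no_heads), len(U))
p_at_least_one_head = 1 - p_no_heads         # P(A) = 1 - P(A')

# Direct count of the same event, for comparison.
p_direct = Fraction(sum(1 for o in U if "H" in o), len(U))

print(p_at_least_one_head, p_direct)  # 7/8 7/8
```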

Combined Events
Probability when we are interested in more than one outcome
(events involving “and”, “or”, “at least”)
∪ (union) ≡ either
∩ (intersection) ≡ both/and
Given two events, A and B, the probability of at least one of the two events occurring
(either A or B or both) is
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
P(A) includes the part of B in the intersection and P(B) includes the part of A in the
intersection, so $P(A \cap B)$ (both A and B) was counted twice and one copy has to be subtracted.
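
A short Python check of the addition rule, using a fair die and two hypothetical events chosen so that they overlap.

```python
from fractions import Fraction

U = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {2, 4, 6}            # event A: even number
B = {4, 5, 6}            # event B: number greater than 3


def p(event):
    """Probability by counting equally likely outcomes."""
    return Fraction(len(event), len(U))


union_by_counting = p(A | B)               # count the outcomes in A or B directly
union_by_rule = p(A) + p(B) - p(A & B)     # P(A) + P(B) - P(A and B)

print(union_by_counting, union_by_rule)  # 2/3 2/3
```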
It is important to know how to get $P(A \cap B)$.
For mutually exclusive events (A and B cannot occur at the same time), for example:
Turning left and turning right (you can't do both at the same time)
Tossing a coin: Heads and Tails
$A \cap B = \emptyset$, so $P(A \cap B) = 0$ and
$P(A \cup B) = P(A) + P(B)$
Independent and Dependent Events
A bag contains three different kinds of marbles: red, blue and green. You pick a marble twice.
The probability of picking a red one (or any other) the second time depends on whether or not
you put the first marble back.
• Independent Events: the probability that one event occurs in no way affects the probability
of the other event occurring. Example: you put the first marble back.
• Dependent Events: the probability of one event occurring influences the likelihood of the
other event. Example: you don’t put the first marble back.
∎ Conditional Probability:
Given two events, A and B, the conditional probability of event A is the probability that
A will occur given the knowledge that event B has already occurred.
This probability is written as P(A|B) (notation for the probability of A given B).
Probability of the intersection of A and B (both events occur) is:
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐵)𝑃(𝐴|𝐵)
• Independent Events: A does not depend on B nor on B′
$P(A|B) = P(A) = P(A|B')$
$P(A \cap B) = P(A)P(B)$
• Dependent Events: P(A|B) is calculated depending on the event B
$P(A \cap B) = P(B)P(A|B)$
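
To make the distinction concrete, the sketch below computes the probability of drawing two red marbles with and without replacement. The bag contents (3 red, 2 blue, 1 green) are hypothetical; the text does not say how many marbles of each colour there are.

```python
from fractions import Fraction

bag = {"red": 3, "blue": 2, "green": 1}  # hypothetical counts
total = sum(bag.values())

p_first_red = Fraction(bag["red"], total)

# Independent: the first marble is put back, so the second draw is unchanged.
p_second_red_replaced = Fraction(bag["red"], total)
p_both_red_replaced = p_first_red * p_second_red_replaced

# Dependent: the first red marble is kept, so one red and one marble are missing.
p_second_red_not_replaced = Fraction(bag["red"] - 1, total - 1)
p_both_red_not_replaced = p_first_red * p_second_red_not_replaced

print(p_both_red_replaced, p_both_red_not_replaced)  # 1/4 1/5
```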
$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{n(A \cap B)/n(U)}{n(B)/n(U)} = \frac{n(A \cap B)}{n(B)}$
The probability that A will happen given that B has occurred is the number of outcomes in
both A and B out of the number of outcomes available in B.
On the other hand, the probability of both together is
$P(A \cap B) = \frac{n(A \cap B)}{n(U)}$
EXAMPLE:
In a class of 25 students, 14 like pizza and 16 like iced coffee. One student likes neither and 6
students like both. One student is randomly selected from the class. What is the probability that
the student
a. likes pizza?
b. likes pizza and coffee?
c. likes pizza given that she or he likes iced coffee?
n = 25
a. P(pizza) = (who like pizza) / (all students) = $\frac{14}{25}$
b. P(pizza and coffee) = (who like pizza and coffee) / (all students) = $\frac{6}{25}$
c. P(pizza given coffee) = (who like pizza and coffee) / (who like coffee) = $\frac{6}{16}$
Using the formula $P(A|B) = \frac{P(A \cap B)}{P(B)}$:
c. P(pizza given coffee) = $\frac{6/25}{16/25} = \frac{6}{16}$
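
The worked example can be reproduced by counting. In this sketch the students are represented by hypothetical ID numbers, arranged so that the group sizes match the example (14 like pizza, 16 like coffee, 6 like both, 1 likes neither).

```python
from fractions import Fraction

students = set(range(25))     # 25 students, IDs 0..24 (IDs are arbitrary)
pizza = set(range(0, 14))     # 14 students like pizza
coffee = set(range(8, 24))    # 16 students like iced coffee
both = pizza & coffee         # IDs 8..13, i.e. 6 students
neither = students - (pizza | coffee)

p_pizza = Fraction(len(pizza), len(students))
p_both = Fraction(len(both), len(students))
p_coffee = Fraction(len(coffee), len(students))

# Conditional probability two ways: by counting, and by the formula.
p_pizza_given_coffee = Fraction(len(both), len(coffee))
p_by_formula = p_both / p_coffee

print(p_pizza, p_both, p_pizza_given_coffee)  # 14/25 6/25 3/8
```

Note that `Fraction` reduces 6/16 to its lowest terms, 3/8.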
● Use of Venn diagrams, tree diagrams and tables of outcomes to solve problems.
1. Venn diagrams
The probability is found using the principle $P(A) = \frac{n(A)}{n(U)}$.
2. Tree diagrams
A more flexible method for finding probabilities is known as a tree diagram.
This allows one to calculate the probabilities of the occurrence of events, even
where trials are non-identical (where 𝑃(𝐴|𝐴) ≠ 𝑃(𝐴), that is, the probability of A on a
later trial given A on an earlier one differs from P(A)), by multiplying the probabilities
along each branch (the product principle).
3. Tables of outcomes
Two dice combinations (first die, then second die):
11 12 13 14 15 16
21 22 23 24 25 26
31 32 33 34 35 36
41 42 43 44 45 46
51 52 53 54 55 56
61 62 63 64 65 66
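
A table of outcomes like this can be generated and queried in a few lines of Python. The event "the two dice sum to 7" is a hypothetical example chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (first die, second die): the 36 cells of the table.
outcomes = list(product(range(1, 7), repeat=2))

# Hypothetical event: the two dice sum to 7.
sum_is_7 = [pair for pair in outcomes if sum(pair) == 7]
p_sum_is_7 = Fraction(len(sum_is_7), len(outcomes))

print(len(outcomes), len(sum_is_7), p_sum_is_7)  # 36 6 1/6
```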
⧪ Bayes’ Theorem
$P(A \cap B) = P(B)P(A|B)$ and $P(A \cap B) = P(A)P(B|A)$
⟹ $P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)P(B|A)}{P(B)}$ (Bayes’ theorem)
▪ Another form of Bayes’ theorem (Formula booklet)
From the tree diagram:
there are two ways to get A, either after B has happened or after B has not happened:
$P(A) = P(B)P(A|B) + P(B')P(A|B')$
⟹ $P(B|A) = \frac{P(B)P(A|B)}{P(B)P(A|B) + P(B')P(A|B')}$
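
As a check of this form of Bayes’ theorem, the sketch below reuses the class example (25 students, 14 like pizza, 16 like iced coffee, 6 like both), taking B = "likes iced coffee" and A = "likes pizza". The value P(A|B′) = 8/9 is derived from those counts: 8 of the 9 students who do not like coffee like pizza.

```python
from fractions import Fraction

p_B = Fraction(16, 25)            # P(coffee)
p_A_given_B = Fraction(6, 16)     # P(pizza | coffee)
p_not_B = 1 - p_B                 # P(not coffee) = 9/25
p_A_given_not_B = Fraction(8, 9)  # P(pizza | not coffee), derived from the counts

# Two branches of the tree lead to A.
p_A = p_B * p_A_given_B + p_not_B * p_A_given_not_B

# Bayes' theorem: reverse the conditioning.
p_B_given_A = p_B * p_A_given_B / p_A

print(p_A, p_B_given_A)  # 14/25 3/7
```

The result agrees with direct counting: 6 of the 14 pizza-likers also like coffee, and 6/14 = 3/7.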
▪ Extension of Bayes’ Theorem
If there are more options than simply B occurs or B doesn’t occur, for example if there were
three possible outcomes for the first event, $B_1$, $B_2$ and $B_3$, the probability of A occurring is:
$P(A) = P(B_1)P(A|B_1) + P(B_2)P(A|B_2) + P(B_3)P(A|B_3)$
$P(B_i|A) = \frac{P(B_i)P(A|B_i)}{P(B_1)P(A|B_1) + P(B_2)P(A|B_2) + P(B_3)P(A|B_3)}$
Outcomes $B_1$, $B_2$ and $B_3$ must cover all the possible outcomes.
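
The extended formula generalises to any number of first-stage outcomes. The function below is a sketch under that assumption; its name `posteriors` and the three-branch numbers are hypothetical and serve only to illustrate the calculation.

```python
from fractions import Fraction


def posteriors(priors, likelihoods):
    """Return [P(B_i | A)] given priors [P(B_i)] and likelihoods [P(A | B_i)]."""
    if sum(priors) != 1:
        raise ValueError("the outcomes B_i must cover all possibilities")
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(B_i) P(A | B_i)
    p_A = sum(joint)                                      # total probability of A
    return [j / p_A for j in joint]


# Hypothetical three-branch tree.
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 10), Fraction(1, 5), Fraction(3, 10)]

result = posteriors(priors, likelihoods)
print(result)  # [Fraction(5, 17), Fraction(6, 17), Fraction(6, 17)]
```

Using `Fraction` keeps the arithmetic exact, so the check that the priors sum to 1 is not affected by floating-point rounding.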