Uploaded by bahram abedini

Probability Functions

Department of Civil Engineering
Statistics for Construction
Random Variables and Probability Distributions
Prepared by: Bahram Abedinianagerabi
Outline
• What is a random variable?
• What is a distribution?
• Where do ‘commonly-used’ distributions come from?
• What distribution does my data come from?
• Do I have to specify a distribution to analyse my data?
What is a random variable?
• A random variable is a number associated with the outcome of a stochastic process
   • Waiting time for hauling trucks
   • Average number of major floods in Arlington
   • Productivity rate of construction labourers
• In statistics, we want to take observations of random variables and use these to make statements about the underlying stochastic process
   • Are the productivity rates for two different projects the same?
   • What is the probability that placing a slab on grade takes more than 3 days?
• Parametric models provide much power in the analysis of variation (parameter estimation, hypothesis testing, model choice, prediction)
   • Statistical models of the random variables
   • Models of the underlying stochastic process
What is a distribution?
• A distribution characterises the probability (mass) associated with each possible outcome of a stochastic process
• Distributions of discrete data are characterised by probability mass functions (pmf)
$P(X = x)$, with $\sum_x P(X = x) = 1$
• Distributions of continuous data are characterised by probability density functions (pdf)
$f(x)$, with $\int f(x)\,dx = 1$
Expectations and variances
• Suppose we took a large sample from a particular distribution; we might
want to summarise something about what observations look like ‘on
average’ and how much variability there is.
• The expectation of a distribution is the average value of a random variable
over a large number of samples.
$E(X) = \sum_x x\,P(X = x)$ or $E(X) = \int x\,f(x)\,dx$
• The variance of a distribution is the average squared difference between
randomly sampled observations and the expected value.
$\mathrm{Var}(X) = \sum_x \big(x - E(X)\big)^2 P(X = x)$ or $\mathrm{Var}(X) = \int \big(x - E(X)\big)^2 f(x)\,dx$
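The discrete forms of the expectation and variance can be sketched in a few lines of Python. This is only an illustration: the fair six-sided die and the helper names `expectation` and `variance` are assumptions, not part of the slides.

```python
from fractions import Fraction

def expectation(pmf):
    """E(X) = sum over x of x * P(X = x), for a pmf given as {x: P(X = x)}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum over x of (x - E(X))^2 * P(X = x)."""
    mean = expectation(pmf)
    return sum((x - mean) ** 2 * p for x, p in pmf.items())

# Hypothetical example: a fair six-sided die
die = {face: Fraction(1, 6) for face in range(1, 7)}

assert sum(die.values()) == 1   # a valid pmf sums to 1
print(expectation(die))         # 7/2
print(variance(die))            # 35/12
```

Exact fractions are used so that the results can be compared with hand calculations without rounding error.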

Random variable assumptions
• In most cases, we assume that the random variables we observe are independent and identically distributed (i.i.d.): each random variable has the same probability distribution as the others, and all are mutually independent.
• This assumption allows us to make all sorts of statements both about what
we expect to see and how much variation to expect.
• Suppose X, Y and Z are independent and identically distributed random variables and a and b are constants.
$E(X + Y + Z) = E(X) + E(Y) + E(Z) = 3E(X)$
$\mathrm{Var}(X + Y + Z) = \mathrm{Var}(X) + \mathrm{Var}(Y) + \mathrm{Var}(Z) = 3\,\mathrm{Var}(X)$
$E(aX + b) = aE(X) + b$
$\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$
$\mathrm{Var}\left(\frac{1}{n}\sum_i X_i\right) = \frac{1}{n}\mathrm{Var}(X)$
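These identities can be checked exactly by enumeration. The sketch below assumes, purely for illustration, that X, Y and Z are three independent rolls of a fair die; the helper `mean_var` and the constants a = 2, b = 5 are hypothetical.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 6)          # probability of each face of a fair die

def mean_var(values_with_probs):
    """Return (E, Var) for an iterable of (value, probability) pairs."""
    pairs = list(values_with_probs)
    mean = sum(v * q for v, q in pairs)
    var = sum((v - mean) ** 2 * q for v, q in pairs)
    return mean, var

e_x, var_x = mean_var((x, p) for x in faces)

# Enumerate all 6^3 equally likely outcomes of (X, Y, Z)
triples = list(product(faces, repeat=3))
p3 = p ** 3
e_sum, var_sum = mean_var((x + y + z, p3) for x, y, z in triples)
e_avg, var_avg = mean_var((Fraction(x + y + z, 3), p3) for x, y, z in triples)

a, b = 2, 5                 # arbitrary constants
e_lin, var_lin = mean_var((a * x + b, p) for x in faces)

assert e_sum == 3 * e_x and var_sum == 3 * var_x
assert e_lin == a * e_x + b and var_lin == a ** 2 * var_x
assert var_avg == var_x / 3   # variance of the mean of n = 3 draws
```

The additive rule for variances relies on independence; the rule for expectations does not.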

‘Commonly-used’ distributions
• At the core of much statistical theory and methodology lie a series of key
distributions (e.g. Normal, Binomial, Uniform, etc.)
• These distributions are closely related to each other and can be ‘derived’ as
the limit of simple stochastic processes when the random variable can be
counted or measured
• In many settings, more complex distributions are constructed from these ‘simple’ distributions
   • Ratios: e.g. Beta, Cauchy
   • Compound: e.g. Geometric, Beta
   • Mixture models
Bernoulli random variable
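• A Bernoulli random variable has two possible outcomes: 0 or 1.
• A binomial random variable is the sum of independent and identically distributed Bernoulli random variables.
• For example, say I have a coin and, when tossed, the probability that it lands heads is p. The number of tosses x needed to obtain the first head then follows a geometric distribution:
$P(x) = (1 - p)^{x-1}\,p, \quad x = 1, 2, \dots$
$\mu = \frac{1}{p}, \quad \sigma^2 = \frac{1 - p}{p^2}$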
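The formula on this slide, P(x) = (1 − p)^(x − 1) p, is the probability that the first head appears on toss x. A minimal sketch, assuming a fair coin (p = 0.5) and a hypothetical helper `geometric_pmf`, checks the stated mean and variance numerically:

```python
def geometric_pmf(x, p):
    """P(first success occurs on trial x) = (1 - p)^(x - 1) * p, for x = 1, 2, ..."""
    if x < 1:
        return 0.0
    return (1 - p) ** (x - 1) * p

p = 0.5  # hypothetical fair coin

# Truncated sums approximate the infinite series; the tail beyond 200 is negligible here
xs = range(1, 201)
total = sum(geometric_pmf(x, p) for x in xs)
mean = sum(x * geometric_pmf(x, p) for x in xs)
var = sum((x - mean) ** 2 * geometric_pmf(x, p) for x in xs)

print(round(total, 6), round(mean, 6), round(var, 6))
```

For p = 0.5 the printed mean and variance should agree with 1/p = 2 and (1 − p)/p² = 2.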
Example 1
• A division of a construction company has over 200 employees, 48% of whom are male. The company is going to randomly select 10 of these employees to attend a conference on new tunnelling technologies.
A) Let Z equal the number of male employees chosen. Is Z a binomial variable? Why or why not?
Solution:
Yes, approximately. Each trial has two outcomes (male or not); the results of the trials can be treated as independent, since we are sampling less than 10% of the population; there is a fixed number of trials (10); and the probability of success is the same for each trial (48%).
• Technically, since we are sampling without replacement, the selections are not independent and the probability changes slightly as we sample. But the 10% condition says that we can still use a binomial distribution, since we are sampling less than 10% of the population.
• When the sample size is small in comparison to the population, the assumption of independence does not change the results much.
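To see how little the without-replacement effect matters here, the binomial approximation can be compared with the exact hypergeometric probabilities. This is a sketch under an assumption: the slide only says "over 200" employees, so a population of exactly N = 200 with 96 males is a hypothetical choice.

```python
from math import comb

N, K, n = 200, 96, 10   # assumed population size, males (48% of 200), sample size
q = K / N               # 0.48

def binomial_pmf(k, n, q):
    """Probability of k successes in n independent trials with success probability q."""
    return comb(n, k) * q ** k * (1 - q) ** (n - k)

def hypergeometric_pmf(k, N, K, n):
    """Exact probability of k males when sampling n of N employees without replacement."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

for k in range(n + 1):
    print(k, round(binomial_pmf(k, n, q), 4), round(hypergeometric_pmf(k, N, K, n), 4))

# Largest disagreement between the two models over all possible values of Z
max_gap = max(
    abs(binomial_pmf(k, n, q) - hypergeometric_pmf(k, N, K, n))
    for k in range(n + 1)
)
print(max_gap)
```

A larger population would make the two columns agree even more closely.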
Example 2
• Historical data indicate that over the last 100 years there have been 4 major floods on a river.
A) Find the probability of a 10 years flooding.
Example 3
• A construction employee is productive 65% of the time. The employee is observed repeatedly.
A) What is the probability that the first observation at which the employee is not productive is the 7th?
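One way to work this example is with the geometric distribution, where "success" is a non-productive observation. The sketch assumes the observations are independent and that the 65% figure applies to every observation; neither is stated in the slide.

```python
# Assumption: each observation is independent, and the employee is
# productive with probability 0.65 at every observation.
p_productive = 0.65
p_not = 1 - p_productive

# First non-productive observation is the 7th:
# six productive observations, then one non-productive one.
prob = p_productive ** 6 * p_not
print(round(prob, 4))  # 0.0264
```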
Binomial Distribution
• Often, we don’t care about the exact order in which successes occurred.
We might therefore want to ask about the probability of k successes in n
trials. This is given by the binomial distribution.
• For example, the probability of exactly 3 heads in 4 coin tosses = P(HHHT) + P(HHTH) + P(HTHH) + P(THHH)
   • Each order has the same probability, $(1/2)^4$
   • There are 4 choose 3 = 4 orders
• Generally, if the probability of success is q, the probability of k successes in n trials is
$P(k \mid n, q) = \binom{n}{k} q^k (1 - q)^{n-k}$
• The expected number of successes is nq and the variance is nq(1-q).
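The coin example and the mean and variance formulas can be checked with a short sketch. The helper name `binomial_pmf` is an assumption; `math.comb` supplies the "n choose k" term.

```python
from math import comb

def binomial_pmf(k, n, q):
    """P(k | n, q) = C(n, k) * q^k * (1 - q)^(n - k)."""
    return comb(n, k) * q ** k * (1 - q) ** (n - k)

# Exactly 3 heads in 4 tosses of a fair coin: 4 orders, each with probability (1/2)^4
p_three_heads = binomial_pmf(3, 4, 0.5)
print(p_three_heads)  # 0.25

# Mean and variance computed from the pmf, to compare with nq and nq(1 - q)
n, q = 4, 0.5
pmf = [binomial_pmf(k, n, q) for k in range(n + 1)]
mean = sum(k * p for k, p in enumerate(pmf))
var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
print(mean, var)      # 2.0 1.0
```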
Example 4
• Historical data indicates that 3% of concrete slabs fail the strength test. Twenty tests are performed.
A) What is the probability that exactly 17 slabs have enough strength?
B) What is the probability that at least two slabs do not meet the strength
requirement?
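A possible way to work both parts is to count failures with a binomial distribution, n = 20 and q = 0.03. The sketch assumes the 20 tests are independent and each is on a different slab; the helper `binomial_pmf` is hypothetical.

```python
from math import comb

def binomial_pmf(k, n, q):
    """Probability of exactly k failures in n independent tests with failure probability q."""
    return comb(n, k) * q ** k * (1 - q) ** (n - k)

n, q_fail = 20, 0.03

# A) Exactly 17 slabs have enough strength, i.e. exactly 3 slabs fail
p_a = binomial_pmf(3, n, q_fail)

# B) At least two failures = 1 - P(0 failures) - P(1 failure)
p_b = 1 - binomial_pmf(0, n, q_fail) - binomial_pmf(1, n, q_fail)

print(round(p_a, 4), round(p_b, 4))
```

Under these assumptions the answers come out at roughly 0.018 for part A and roughly 0.12 for part B.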
Normal Distribution
• The normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about the mean: data near the mean occur more frequently than data far from the mean.
• In graph form, the normal distribution appears as a bell curve.
Normal Distribution (Cont’d)
• The general formula for the normal distribution is
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
where
• σ is the population standard deviation;
• μ is the population mean;
• x is a value or test statistic;
• e is a mathematical constant of roughly 2.72;
• π is a mathematical constant of roughly 3.14.
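The normal density built from these symbols can be sketched directly and compared with the standard library's `statistics.NormalDist`. The values μ = 10 and σ = 2 are hypothetical, chosen only to have something to evaluate.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 10.0, 2.0   # hypothetical population mean and standard deviation

# Symmetry about the mean: equal density 3 units either side of mu
left, right = normal_pdf(mu - 3, mu, sigma), normal_pdf(mu + 3, mu, sigma)

# Agreement with the standard-library implementation
reference = NormalDist(mu, sigma).pdf(11.0)

# Midpoint-rule integral over mu +/- 8 sigma; a pdf should integrate to about 1
steps = 10_000
lo, hi = mu - 8 * sigma, mu + 8 * sigma
width = (hi - lo) / steps
area = sum(normal_pdf(lo + (i + 0.5) * width, mu, sigma) for i in range(steps)) * width

print(left, right, reference, area)
```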