Uploaded by LOPAMUDRA MOHAPATRA

Session 5 & 6

Introduction to Probability
Dr. Indranil Ghosh, IT & Analytics Area, Institute of Management
Technology, Hyderabad, Telangana, India
A Brief History
Early Generalizations
Axiomatic Development
“But to us, probability is the very
guide of life.”
Bishop Joseph Butler
Chance of Occurrence
Random Experiment
• A random experiment is an experiment in which the outcome is not known
with certainty.
• Predictive analysis mainly deals with random experiments such as:
• Predicting the quarterly revenue of an organization
• Customer churn
• Demand for a product in a future time period, etc.
Fundamentals
Experiment
• An experiment is a process which
produces outcomes. For example, if we
toss a fair coin, we may obtain either a
head or a tail. So, tossing this fair coin is
an experiment which can produce two
outcomes, either a head or a tail.
• Similarly, when we roll a die, six possible
outcomes can arise, that is, any one
of the six numbers 1, 2, 3, 4, 5, 6 turning up
on the upper face of the die.
• An interview to gauge the job satisfaction
levels of the employees in an
organization is also an experiment
because it produces outcomes.
Event
• An event is an outcome, or a set of outcomes, of an
experiment.
• If the experiment is to roll a die, an
event can be defined as obtaining a 6 on
the upper face of the die.
• If the experiment is to toss a fair coin, an
event can be obtaining a tail.
• If an event has a single possible outcome,
it is called a simple (or elementary) event.
• A subset of outcomes corresponding to a
specific event is called an event space.
Union of two sets
Intersection of two sets
Compound Event
• The joint occurrence of two or more
simple events is known as a
compound event.
• In other words, if two or more events
are connected with each other, then
their simultaneous occurrence is
called a compound event. In an
experiment in which two coins are
tossed, the event of obtaining “one
head and one tail” is a compound
event as it consists of two events: (1)
one head occurrence and (2) one tail
occurrence.
Independent and Dependent Events
• Two events are said to be independent events if
the occurrence or non-occurrence of one is not
affected by the occurrence or non-occurrence of the
other.
• For example, when tossing a coin, a tail on the first
toss does not affect the possibility of obtaining a
tail on the second toss. So, the outcomes of the two
tosses are independent events.
• Two or more events are said to be dependent if
the occurrence of one event influences the
occurrence of the other. Dependence indicates a
relationship between two events and implies that
knowledge of one event can be used in assessing
the occurrence of the other event. An example is
actual sales and the expense incurred in advertising.
Mutually Exclusive Events
• Two or more events are said to be
mutually exclusive if the occurrence of
one implies that the other cannot occur.
In other words, two events are mutually
exclusive if the occurrence of one of
them rules out the occurrence of the
other.
• For example, in an unbiased coin tossing
experiment, either a head can occur or a
tail can occur, but the two events head
and tail cannot occur together. Similarly,
when rolling a die, the numbers 3 and 4
cannot both occur on the upper face in one
throw.
Equally Likely Events
• Two or more events are said to be equally
likely if each has an equal chance of
occurrence.
• In other words, two or more events are said
to be equally likely if none of them can be
expected to occur in preference over the
others. For example, in an unbiased coin
tossing experiment, both the outcomes, that
is, head and tail, have an equal chance of
occurrence.
• Similarly, in a die rolling experiment, all
possible outcomes, that is, 1, 2, 3, 4, 5, 6 are
equally likely because none of the outcomes
can occur in preference over the other.
Complementary Events
• The complement of event A is the set
of all the outcomes in a sample space
that are not included in the event A.
This is generally denoted by A′ or $\bar{A}$.
• For example, in a die rolling
experiment, if event A is getting 2,
then the complement A′ is getting 1, 3,
4, 5, or 6 on the upper face of the die.
• Two events are complementary, when
one event occurs if and only if the
other does not.
Sample Space
• The sample space denoted by S is the set of all possible outcomes of
an experiment. For a single die rolling experiment, the sample space
will be {1, 2, 3, 4, 5, 6}. When we roll a pair of dice, sample space or
all possible elementary events are given as:
Possible outcomes for rolling a pair of dice
Counting Rule
• Multi-Step Experiment: If an experiment is defined as a sequence of k
steps, with n1 possible outcomes in the first step, n2 possible
outcomes in the second step, and so on, then the total number of
experimental outcomes is given by (n1) × (n2) ×…× (nk).
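The multi-step counting rule can be checked by brute force on the pair-of-dice experiment described above. This is a minimal sketch; the variable names are illustrative and not taken from the slides.

```python
from itertools import product
from math import prod

# Each step of the experiment is one die with six faces.
die_faces = range(1, 7)
outcomes_per_step = [6, 6]  # n1 = 6, n2 = 6

# Enumerate the full sample space for rolling a pair of dice.
sample_space = list(product(die_faces, die_faces))

# Counting rule: total outcomes = n1 x n2 x ... x nk.
total_by_rule = prod(outcomes_per_step)

print(len(sample_space), total_by_rule)
```

Both numbers agree, so the enumeration and the counting rule describe the same sample space.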
Counting Rules for Combinations
• The second counting method uses the concept of combinations.
Sampling of n items from a population of size N (usually larger)
without replacement provides
$C^N_n = \binom{N}{n} = \frac{N!}{n!\,(N-n)!}$
possible combinations.
Example
• A firm wants to randomly select 3 employees from a total of 10
employees. How many combinations of 3 employees can be selected?
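As a quick check of the employee example, the count can be computed with the standard library and confirmed by listing every group. This is a sketch; the employee labels are placeholders.

```python
from itertools import combinations
from math import comb

N, n = 10, 3  # 10 employees, 3 to be selected

# Closed form: N! / (n! (N - n)!)
n_combinations = comb(N, n)

# Brute-force confirmation: enumerate every unordered group of 3.
employees = [f"E{i}" for i in range(1, N + 1)]  # hypothetical labels
groups = list(combinations(employees, n))

print(n_combinations, len(groups))
```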
Counting Rules for Permutations
• A third rule of counting known as the counting rule for permutations
helps in computing the possible number of experimental outcomes
when n items are to be selected from a set of N items in a particular
order.
• The same n items selected in a different order would be considered a
different experimental outcome.
• The number of permutations of N items taken n at a time is given by
$P^N_n = \frac{N!}{(N-n)!}$
Example
• A quality control inspector selects two parts out of five for inspecting
defects. How many permutations may be selected?
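The inspector example can be worked the same way, this time counting ordered selections. A minimal sketch with placeholder part labels:

```python
from itertools import permutations
from math import perm

N, n = 5, 2  # 5 parts, 2 selected in a particular order

# Closed form: N! / (N - n)!
n_permutations = perm(N, n)

# Brute-force confirmation: (P1, P2) and (P2, P1) count as different outcomes.
parts = ["P1", "P2", "P3", "P4", "P5"]  # hypothetical labels
ordered_pairs = list(permutations(parts, n))

print(n_permutations, len(ordered_pairs))
```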
Classical Definition of Probability
• This is a mathematical approach of assigning probability. If for an
experiment there are N exhaustive, mutually exclusive, and equally
likely cases, and out of these, $n_e$ are favorable to the occurrence of an
event E, then as per the classical approach of probability, the
probability of occurrence of the event E is given by
$P(E) = \frac{n_e}{N}$
Illustrations
• A company employs a total of 400 workers. Out of these, 150
workers are skilled and 250 workers are unskilled. The
probability of randomly selecting a skilled worker is
$P(E) = \frac{150}{400} = 0.375$
• So, the probability of randomly selecting a skilled worker from a
total of 400 workers is 37.5%. The probability of non-occurrence of
an event E is given by
$P(\bar{E}) = 1 - P(E)$
• Probability of not selecting a skilled worker from a total of
400 workers is:
$P(\bar{E}) = 1 - 0.375 = 0.625$
Probability Estimation using Relative
Frequency
• According to frequency estimation, the probability of an event X,
P(X), is given by
$P(X) = \frac{\text{Number of observations in favour of event } X}{\text{Total number of observations}} = \frac{n(X)}{N}$
Examples
A website displays 10 advertisements and the revenue generated by the
website depends on the number of visitors to the site clicking on any of the
advertisements displayed in the website. The data collected by the company
has revealed that out of 2500 visitors, 30 people clicked on 1 advertisement,
15 clicked on 2 advertisements, and 5 clicked on 3 advertisements.
Remaining did not click on any of the advertisements. Calculate
(a) The probability that a visitor to the website will click on an
advertisement.
(b) The probability that the visitor will click on at least two advertisements.
(c) The probability that a visitor will not click on any advertisements.
Solution
(a) Number of visitors clicking an advertisement is 30 + 15 + 5 = 50 and the total
number of visitors is 2500. Thus, the probability that a visitor to the
website will click on an advertisement is
$\frac{50}{2500} = 0.02$
(b) Number of visitors clicking on at least 2 advertisements is 15 + 5 = 20.
Thus, the probability that a visitor will click on at least 2
advertisements is
$\frac{20}{2500} = 0.008$
(c) Probability that a visitor will not click on any advertisement is
$\frac{2450}{2500} = 0.98$
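The three relative-frequency calculations above can be reproduced from the raw click counts. This is a minimal sketch; the dictionary layout and function name are assumptions made for illustration.

```python
from fractions import Fraction

total_visitors = 2500
# Number of advertisements clicked -> number of visitors (from the example).
clicks = {1: 30, 2: 15, 3: 5}
clicks[0] = total_visitors - sum(clicks.values())  # visitors who clicked nothing

def relative_frequency(favourable: int, total: int) -> Fraction:
    """P(X) = n(X) / N."""
    return Fraction(favourable, total)

p_any_click = relative_frequency(
    sum(n for k, n in clicks.items() if k >= 1), total_visitors)
p_at_least_two = relative_frequency(
    sum(n for k, n in clicks.items() if k >= 2), total_visitors)
p_no_click = relative_frequency(clicks[0], total_visitors)

print(float(p_any_click), float(p_at_least_two), float(p_no_click))
```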
Algebra of Events
• Assume that X, Y and Z are three events of a sample space. Then the
following algebraic relationships are valid and are useful while deriving
probabilities of events:
• Commutative rule: $X \cup Y = Y \cup X$ and $X \cap Y = Y \cap X$
• Associative rule: $(X \cup Y) \cup Z = X \cup (Y \cup Z)$ and $(X \cap Y) \cap Z = X \cap (Y \cap Z)$
• Distributive rule: $X \cup (Y \cap Z) = (X \cup Y) \cap (X \cup Z)$
$X \cap (Y \cup Z) = (X \cap Y) \cup (X \cap Z)$
Contd.
• The following rules known as DeMorgan’s Laws on complementary
sets are useful while deriving probabilities:
$(X \cup Y)^C = X^C \cap Y^C$
$(X \cap Y)^C = X^C \cup Y^C$
where $X^C$ and $Y^C$ are the complementary events of X and Y,
respectively
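These identities can be spot-checked with Python sets on the single-die sample space. This is a sketch under illustrative choices: the events X, Y and Z below are arbitrary examples, not taken from the slides.

```python
# Sample space for a single die and three arbitrary example events.
S = set(range(1, 7))
X = {2, 4, 6}   # even number
Y = {4, 5, 6}   # greater than 3
Z = {1, 2}      # less than 3

def complement(event: set) -> set:
    """Complement relative to the sample space S."""
    return S - event

# De Morgan's laws
de_morgan_union = complement(X | Y) == complement(X) & complement(Y)
de_morgan_intersection = complement(X & Y) == complement(X) | complement(Y)

# Distributive rules
distributive_1 = X | (Y & Z) == (X | Y) & (X | Z)
distributive_2 = X & (Y | Z) == (X & Y) | (X & Z)

print(de_morgan_union, de_morgan_intersection, distributive_1, distributive_2)
```

A check on one example is not a proof, but it is a quick way to catch a mis-remembered identity.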
Axioms of Probability
According to axiomatic theory of probability, the probability of an
event E satisfies the following axioms
1. The probability of event E always lies between 0 and 1. That is,
$0 \le P(E) \le 1$.
2. The probability of the universal set S is 1. That is, P(S) = 1.
3. $P(X \cup Y) = P(X) + P(Y)$, where X and Y are two mutually exclusive
events.
The elementary rules of probability are directly deduced from the original
three axioms of probability, using the set theory relationships
1. For any event A, the probability of the complementary event, written $A^C$, is
given by
$P(A) = 1 - P(A^C)$
If P(A) is the probability of observing a fraudulent transaction at an e-commerce portal, then $P(A^C)$ is the probability of observing a genuine
transaction.
2. The probability of an empty or impossible event, $\emptyset$, is zero: $P(\emptyset) = 0$
3. If occurrence of an event A implies that an event B also occurs, so
that the event class A is a subset of event class B, then the probability
of A is less than or equal to the probability of B:
$P(A) \le P(B)$
4. The probability that either event A or event B occurs, or both occur, is given
by
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
5. If A and B are mutually exclusive events, so that $P(A \cap B) = 0$, then
$P(A \cup B) = P(A) + P(B)$
6. If $A_1, A_2, \ldots, A_n$ are n events that form a partition of sample space S,
then their probabilities must add up to 1:
$P(A_1) + P(A_2) + \cdots + P(A_n) = \sum_{i=1}^{n} P(A_i) = 1$
Types of Probability
• Marginal Probability
• Union Probability
• Joint Probability
• Conditional Probability
Union Probability
• Union probability is the second type of probability. If E1 and E2 are
two events, then union probability is denoted by P(E1∪ E2) and is the
probability that event E1 will occur or that event E2 will occur or both
event E1 and event E2 will occur.
Joint Probability
• Let A and B be two events in a sample space. Then the joint
probability of the two events, written as $P(A \cap B)$, is given by
$P(A \cap B) = \frac{\text{Number of observations in } A \cap B}{\text{Total number of observations}}$
Example
• ABRC, a leading marketing research firm in India, wants to collect
information about households with computers and Internet access in
urban Mumbai. After conducting an intensive survey, it was revealed that
60% of the households have computers with Internet access; 70% of the
households have two or more computer sets. Suppose 50% of the
households have computers with Internet connection and two or more
computers. A household with computer is randomly selected.
• 1. What is the probability that the household has computers with Internet
access or two or more computers?
• 2. What is the probability that the household has computers with Internet
access or two or more computers, but not both?
• 3. What is the probability that the household has neither computers with
Internet access nor two or more computers?
Solution
Solution (Contd.)
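The worked solution is not recoverable from the slide text, so the following is only a sketch of how the three questions could be answered with the addition rule. The event names are assumptions: A stands for "computers with Internet access" and B for "two or more computers".

```python
from fractions import Fraction

# Survey figures from the example (A and B are labels chosen for this sketch).
p_a = Fraction(60, 100)      # P(A): computers with Internet access
p_b = Fraction(70, 100)      # P(B): two or more computers
p_a_and_b = Fraction(50, 100)  # P(A and B)

# 1. Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_a_or_b = p_a + p_b - p_a_and_b

# 2. A or B but not both: remove the overlap from the union.
p_exactly_one = p_a_or_b - p_a_and_b

# 3. Neither A nor B: complement of the union (De Morgan).
p_neither = 1 - p_a_or_b

print(float(p_a_or_b), float(p_exactly_one), float(p_neither))
```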
Example
At an e-commerce customer service centre a total of 112 complaints
were received. 78 customers complained about late delivery of the
items and 40 complained about poor product quality.
(a) Calculate the probability that a customer will complain about both
late delivery and product quality.
(b) What is the probability that a complaint is only about poor quality
of the product?
Solution
• Let A = late delivery and B = poor quality of the product. Let n(A)
and n(B) be the number of complaints in favour of A and B. So n(A) = 78
and n(B) = 40. Since the total number of complaints is 112, and every
complaint concerns at least one of the two issues,
$n(A \cap B) = n(A) + n(B) - n(A \cup B) = 118 - 112 = 6$
• Probability of a complaint about both delivery and poor product
quality is
$P(A \cap B) = \frac{n(A \cap B)}{\text{Total number of complaints}} = \frac{6}{112} = 0.0536$
• Probability that the complaint is only about poor quality = 1 - P(A) =
$1 - \frac{78}{112} = 0.3036$
• Marginal probability is simply a probability of an event X, denoted by P(X),
without any conditions
• Independent Events: Two events A and B are independent when occurrence of
one event (say event A) does not affect the probability of occurrence of the other
event (event B). Mathematically, two events A and B are independent when
$P(A \cap B) = P(A) \times P(B)$.
• Conditional Probability: If A and B are events in a sample space, then the
conditional probability of the event B given that the event A has already
occurred, denoted by P(B|A), is defined as
$P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \quad P(A) > 0$
Application of Simple Probability Rules in
Analytics
• Association rule mining is one of the popular algorithms used to solve
problems such as market basket analysis and recommender systems.
• Market basket analysis (MBA) is used frequently by retailers to predict
products a customer is likely to buy together, which further can be
used for designing planogram and product promotions
Association Rule Mining
• Association rule learning (also known as association rule mining) is a
method of finding association between different entities in a
database
• An association rule is a relationship of the form
$X \rightarrow Y$ (that is, X implies Y).
Association rule learning example: binary representation of point-of-sale
data
• In Table , transaction ID is the transaction reference number and apple,
orange, etc. are the different SKUs sold by the store. Binary code is used to
represent whether the SKU was purchased (equal to 1) or not (equal to 0)
during a transaction. The strength of association between two mutually
exclusive subsets can be measured using ‘support’, ‘confidence’, and ‘lift’
• Support between two sets (of products purchased) is calculated using the
joint probability of those events:
$\text{Support} = P(X \cap Y) = \frac{n(X \cap Y)}{N}$
• Where $n(X \cap Y)$ is the number of times both X and Y are purchased together
and N is the total number of transactions
• Confidence is the conditional probability of purchasing product Y given the
product X is purchased. It measures probability of event Y (customer
buying a product Y) given the event X has occurred (the customer has
already purchased product X). That is,
$\text{Confidence} = P(Y \mid X) = \frac{P(X \cap Y)}{P(X)}$
• Lift: The third measure in association rule mining is lift, which is given by
$\text{Lift} = \frac{P(X \cap Y)}{P(X) \times P(Y)}$
Association rules can be generated based on threshold values of support,
confidence and lift. For example, assume that the cut-off for support is 0.25
and confidence is 0.5 (Lift should be more than 1)
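To make the three measures concrete, here is a minimal sketch that computes them for one rule. The transaction list is invented for illustration (the table from the slides is not available), and the function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical point-of-sale data: each set is one transaction.
transactions = [
    {"apple", "orange"},
    {"apple", "banana"},
    {"apple", "orange", "banana"},
    {"orange"},
    {"apple", "orange"},
    {"banana"},
    {"apple", "orange", "grapes"},
    {"grapes"},
]
N = len(transactions)

def prob(items: set) -> Fraction:
    """Fraction of transactions that contain every item in `items`."""
    return Fraction(sum(items <= t for t in transactions), N)

def rule_metrics(x: set, y: set) -> dict:
    """Support, confidence and lift for the rule X -> Y."""
    support = prob(x | y)                  # P(X and Y)
    confidence = support / prob(x)         # P(Y | X)
    lift = support / (prob(x) * prob(y))   # P(X and Y) / (P(X) P(Y))
    return {"support": support, "confidence": confidence, "lift": lift}

metrics = rule_metrics({"apple"}, {"orange"})
print({k: float(v) for k, v in metrics.items()})

# Apply the cut-offs mentioned above: support 0.25, confidence 0.5, lift > 1.
keep_rule = (metrics["support"] >= Fraction(1, 4)
             and metrics["confidence"] >= Fraction(1, 2)
             and metrics["lift"] > 1)
```

With this toy data the rule apple -> orange passes all three thresholds; a lift above 1 means the two items appear together more often than independence would predict.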
Bayes Theorem
• Bayes theorem is one of the most important concepts in analytics since several problems are
solved using Bayesian statistics
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
and
$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$
• Using the two equations, we can show that
$P(B \mid A) = \frac{P(A \mid B)\, P(B)}{P(A)}$
Terminologies used to describe
various components in Bayes Theorem
$P(B \mid A) = \frac{P(A \mid B)\, P(B)}{P(A)}$
1. P(B) is called the prior probability (estimate of the probability without any
additional information).
2. P(B|A) is called the posterior probability (that is, given that the event A has
occurred, what is the probability of occurrence of event B). That is,
post the additional information (or additional evidence) that A has
occurred, what is the estimated probability of occurrence of B.
3. P(A|B) is called the likelihood of observing evidence A if B is true.
4. P(A) is the prior probability of A.
Monty Hall Problem
Monty Hall Problem Using Bayes
Theorem
• Let C1, C2, and C3 be the events that the car is behind door 1, 2, and 3,
respectively. Let D1, D2, and D3 be the events that Monty opens door 1, 2,
and 3, respectively. Prior probabilities of C1, C2, and C3 are
P(C1) = P(C2) = P(C3) = 1/3
• Assume that the player has chosen door 1 and Monty opens door 2 to
reveal a goat. Now we would like to calculate the posterior probability
P(C1|D2), that is, the probability that the car is behind door 1 (door chosen
initially by the player) when Monty has provided the additional information
that the car is not behind door 2
• Using Bayes theorem,
$P(C_1 \mid D_2) = \frac{P(D_2 \mid C_1) \times P(C_1)}{P(D_2)} = \frac{(1/2) \times (1/3)}{(1/2)} = 1/3$
• $P(D_2 \mid C_1) = 1/2$ (if the car is behind door 1, then Monty can open either
door 2 or 3)
• $P(D_2) = (1/2)(1/3) + 0 \times (1/3) + 1 \times (1/3) = 1/2$
Note that $P(C_2 \mid D_2) = 0$.
$P(C_3 \mid D_2) = \frac{P(D_2 \mid C_3) \times P(C_3)}{P(D_2)} = \frac{1 \times (1/3)}{(1/2)} = 2/3$
• $P(D_2 \mid C_3) = 1$ (if the car is behind door 3 and the player has chosen door
1, Monty has to open door 2 with probability 1)
Thus, changing the initial choice will increase the probability of winning
the car.
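The posterior probabilities can be recomputed exactly from the priors and likelihoods stated above. This is a minimal sketch using exact fractions; the dictionary names are illustrative.

```python
from fractions import Fraction

# Priors: the car is equally likely to be behind any of the three doors.
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# Likelihood that Monty opens door 2, given the car's location,
# when the player has chosen door 1.
likelihood_d2 = {1: Fraction(1, 2),  # he may open door 2 or door 3
                 2: Fraction(0),     # he never reveals the car
                 3: Fraction(1)}     # door 2 is his only option

# Total probability of the evidence D2.
p_d2 = sum(prior[c] * likelihood_d2[c] for c in prior)

# Bayes theorem for each possible car location.
posterior = {c: prior[c] * likelihood_d2[c] / p_d2 for c in prior}

print(p_d2, posterior[1], posterior[2], posterior[3])
```

Staying with door 1 wins with probability 1/3, while switching to door 3 wins with probability 2/3.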
Generalization of Bayes Theorem
Example
• Black boxes used in aircraft are manufactured by three companies A, B
and C. 75% are manufactured by A, 15% by B, and 10% by C. The
defect rates of black boxes manufactured by A, B, and C are 4%, 6%,
and 8%, respectively. If a black box tested randomly is found to be
defective, what is the probability that it is manufactured by company
A?
Solution
Probable but not Possible!!!
https://www.pinterest.com/pin/64317100903604229/
Random Variables
• Random variable is a function that
maps every outcome in the sample
space to a real number.
• A function that assigns a real
number to each sample point in
the sample space S.
• Random variable is a robust and
convenient way of representing
the outcome of a random
experiment
Discrete Random Variables
• If the random variable X can assume only a finite or countably infinite set of values, then it is
called a discrete random variable.
• Examples of discrete random variables are:
• Credit rating (usually classified into different categories such as low, medium and high or
using labels such as AAA, AA, A, BBB, etc.).
• Number of orders received at an e-commerce retailer which can be countably infinite.
• Customer churn (the random variables take binary values, 1. Churn and 2. Do not churn).
• Fraud (the random variables take binary values, 1. Fraudulent transaction and 2. Genuine
transaction).
• Any experiment that involves counting (for example, number of returns in a day from
customers of e-commerce portals such as Amazon, Flipkart; number of customers not
accepting job offers from an organization).
Continuous Random Variables
• A random variable X which can take any value from an uncountably infinite set of values, such as an interval of real numbers, is called a continuous
random variable
• Examples of continuous random variables are listed below:
• Market share of a company (which take any value from an infinite set of values between 0
and 100%).
• Percentage of attrition among employees of an organization.
• Time to failure of engineering systems.
• Time taken to complete an order placed at an e-commerce portal.
• Time taken to resolve a customer complaint at call and service centers.
Problem Solving
Problem
• A store receives 3 red, 6 white, and 7 blue shirts. Two shirts are drawn
at random. Determine the probability that:
1. Both the shirts are white
2. Both the shirts are blue
3. One shirt is red and the other is white
4. One shirt is white and the other shirt is blue.
Solution
Solution (Contd.)
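The slide solution is not available in the text; a sketch of one way to answer the four parts with the combination rule follows. The helper names are hypothetical, and the two shirts are assumed to be drawn without replacement.

```python
from fractions import Fraction
from math import comb

shirts = {"red": 3, "white": 6, "blue": 7}
total_pairs = comb(sum(shirts.values()), 2)  # ways to draw 2 of 16 shirts

def p_both(colour: str) -> Fraction:
    """Both shirts have the same given colour."""
    return Fraction(comb(shirts[colour], 2), total_pairs)

def p_one_each(c1: str, c2: str) -> Fraction:
    """One shirt of colour c1 and one of a different colour c2."""
    return Fraction(shirts[c1] * shirts[c2], total_pairs)

p_white_white = p_both("white")
p_blue_blue = p_both("blue")
p_red_white = p_one_each("red", "white")
p_white_blue = p_one_each("white", "blue")

print(p_white_white, p_blue_blue, p_red_white, p_white_blue)
```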
Problem
• The probability that a contractor will not get a plumbing contract is
1/3, and the probability that he will get an electrical contract is 4/9. If
the probability of getting at least 1 contract is 4/5, what is the
probability that he will get both the contracts? Let A and B stand for
the event of getting the plumbing and electrical contracts,
respectively.
Solution
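One way to work this problem is to rearrange the addition rule, P(A ∪ B) = P(A) + P(B) - P(A ∩ B). The sketch below uses the event names A and B defined in the problem statement.

```python
from fractions import Fraction

p_not_a = Fraction(1, 3)   # will NOT get the plumbing contract
p_a = 1 - p_not_a          # complement rule: P(A) = 1 - P(A^C)
p_b = Fraction(4, 9)       # electrical contract
p_a_or_b = Fraction(4, 5)  # at least one contract

# Addition rule rearranged for the intersection.
p_a_and_b = p_a + p_b - p_a_or_b

print(p_a, p_a_and_b)
```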
Problem (Independent Event)
• A candidate is selected for an interview for 3 posts. In the first post,
there are 3 candidates, for the second, there are 4, and for the third,
there are 2. What are the chances of his getting at least 1 post?
Solution
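A sketch of the usual approach, under two assumptions that the problem implies but does not state: every candidate for a post is equally likely to be selected, and the three selections are independent of each other.

```python
from fractions import Fraction
from math import prod

candidates_per_post = [3, 4, 2]

# Probability of NOT being selected for each post, assuming equal chances.
p_miss = [1 - Fraction(1, n) for n in candidates_per_post]

# Independence: multiply the individual miss probabilities.
p_miss_all = prod(p_miss)

# Complement rule: at least one post = 1 - no post at all.
p_at_least_one = 1 - p_miss_all

print(p_miss_all, p_at_least_one)
```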
Problem
• From a well-shuffled pack of 52 cards, a card is drawn at random. Find
the probability that it is an ace or a heart.
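This problem can be checked by enumerating the deck and comparing the count with the addition rule. A minimal sketch; the rank and suit labels are illustrative.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

# Direct count of cards that are an ace or a heart.
favourable = [card for card in deck if card[0] == "A" or card[1] == "hearts"]
p_direct = Fraction(len(favourable), len(deck))

# Addition rule: P(ace) + P(heart) - P(ace of hearts).
p_rule = Fraction(4, 52) + Fraction(13, 52) - Fraction(1, 52)

print(p_direct, p_rule)
```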
Probability Matrices
• A company is interested in understanding consumer behaviour in Raipur, the capital of the
newly formed state of Chhattisgarh. For this purpose, the company has
selected a sample of 300 consumers and asked a simple question, “Do you enjoy
shopping?” Out of 300 respondents, 200 were males and 100 were females. Out of 200
males, 120 responded “Yes,” and out of 100 females, 70 responded “Yes.” A respondent
is selected randomly. Construct a probability matrix and ascertain the probability that:
1. The respondent is a male
2. Enjoys shopping
3. Is a female and enjoys shopping
4. Is a male and does not enjoy shopping
5. Is a female or enjoys shopping
6. Is a male or does not enjoy shopping
7. Is a male or female.
Solution
The probability matrix can be
constructed as shown in the table
below.
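The table itself is not recoverable from the slide text, so here is a sketch of how the probability matrix and the seven answers could be derived from the survey counts. The dictionary layout and the `marginal` helper are assumptions made for illustration.

```python
from fractions import Fraction

# Survey counts from the problem: (gender, enjoys shopping?) -> respondents.
counts = {
    ("male", "yes"): 120, ("male", "no"): 80,
    ("female", "yes"): 70, ("female", "no"): 30,
}
total = sum(counts.values())

# Probability matrix: each cell is a joint probability.
joint = {cell: Fraction(n, total) for cell, n in counts.items()}

def marginal(gender=None, answer=None) -> Fraction:
    """Sum joint probabilities over the cells matching the given filters."""
    return sum((p for (g, a), p in joint.items()
                if (gender is None or g == gender)
                and (answer is None or a == answer)), Fraction(0))

p_male = marginal(gender="male")                       # 1
p_enjoys = marginal(answer="yes")                      # 2
p_female_and_yes = joint[("female", "yes")]            # 3
p_male_and_no = joint[("male", "no")]                  # 4
p_female_or_yes = (marginal(gender="female") + p_enjoys
                   - p_female_and_yes)                 # 5
p_male_or_no = (p_male + marginal(answer="no")
                - p_male_and_no)                       # 6
p_male_or_female = marginal()                          # 7

for name, p in [("male", p_male), ("enjoys", p_enjoys),
                ("female and yes", p_female_and_yes),
                ("male and no", p_male_and_no),
                ("female or yes", p_female_or_yes),
                ("male or no", p_male_or_no),
                ("male or female", p_male_or_female)]:
    print(f"{name}: {p}")
```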
Independent Events
• Delta is a leading marketing
research firm in India. A client of
Delta is interested in the
probable relationship between
telephone and television
purchase of a particular region.
The company prepared a single
question “Do you have a
telephone and/or a television in
your home” and conducted a
survey on 75 persons.
Is Television Purchase Dependent on
Telephone Purchase
Problem
• A market survey was conducted in
four cities to find out the
preference for brand A soap. The
responses are shown below:
(a) What is the probability that a
consumer selected at random,
preferred brand A?
(b) What is the probability that a
consumer preferred brand A and
was from Chennai?
(c) What is the probability that a
consumer preferred brand A,
given that he was from Chennai?
(d) Given that a consumer preferred
brand A, what is the probability
that he was from Mumbai?
Solution
• Let X denote the event that a
consumer selected at random
preferred brand A. Then
Revisiting Bayes Theorem
• The Bayes’ theorem is useful in
revising the original probability
estimates of known outcomes as
we gain additional information
about these outcomes. The prior
probabilities, when changed in
the light of new information, are
called revised or posterior
probabilities.
Proof
Generalization of Bayes Theorem
Problem
• In a bolt factory, machines X, Y,
and Z manufacture 20%,
35%, and 45% of items,
respectively. Of these, 8%,
6%, and 5% of the items from
machines X, Y, and Z, respectively,
are defective. One bolt
is drawn at random from the
product and is found defective.
What is the probability that it is
manufactured by machine Z?
Solution
• Tabulate the prior and posterior
probabilities:
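The table from the slide is missing in the text. As a sketch, the prior, joint and posterior probabilities could be tabulated as follows, reading the defect rates 8%, 6% and 5% as belonging to machines X, Y and Z in that order (an assumption based on the order in which the figures are listed).

```python
from fractions import Fraction

prior = {"X": Fraction(20, 100), "Y": Fraction(35, 100), "Z": Fraction(45, 100)}
defect_rate = {"X": Fraction(8, 100), "Y": Fraction(6, 100), "Z": Fraction(5, 100)}

# Joint probability: P(machine and defective) = P(defective | machine) * P(machine)
joint = {m: prior[m] * defect_rate[m] for m in prior}

# Total probability of drawing a defective bolt.
p_defective = sum(joint.values())

# Posterior: P(machine | defective)
posterior = {m: joint[m] / p_defective for m in prior}

print(f"{'Machine':8}{'Prior':>8}{'P(D|M)':>8}{'Joint':>9}{'Posterior':>11}")
for m in prior:
    print(f"{m:8}{float(prior[m]):>8.2f}{float(defect_rate[m]):>8.2f}"
          f"{float(joint[m]):>9.4f}{float(posterior[m]):>11.4f}")
```

The posterior column sums to 1, which is a useful sanity check on any table of this kind.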
Representation in the form of tree diagram
Problem
• Suppose an item is manufactured
by three machines X, Y, and Z. All
the three machines have equal
capacity and are operated at the
same rate. It is known that the
percentages of defective items
produced by X, Y, and Z are 2, 7,
and 12 per cent, respectively. All
the items produced by X, Y, and Z are
put into one bin. From this bin, one
item is drawn at random and is
found to be defective. What is the
probability that this item was
produced on Y?
Probability Distribution
Random Variables
A random variable is a numerical
description of the outcome of an
experiment.
Why Random Variables?
To Predict
Instances of Discrete Random Variables
Types of Random Variables
Discrete Probability Distributions
• The probability distribution
for a random variable
describes how probabilities
are distributed over the
values of the random
variable.
• We can describe a discrete
probability distribution with
a table, graph, or equation.
Property
• The probability distribution is
defined by a probability
function, denoted by f(x),
which provides the
probability for each value of
the random variable.
• The required conditions for a
discrete probability function
are:
$f(x) \ge 0$
$\sum f(x) = 1$
Examples
[Figure: bar chart of P(X) against X, for X = 1 to 5, with the probability axis running from 0 to 0.4]
Properties
Probability mass function
• For a discrete random variable,
the probability that the random
variable X takes a specific value
xi, P(X = xi), is called the
probability mass function P(xi).
• That is, a probability mass
function is a function that maps
each possible value of the random
variable to a probability
Probability density function
Examples on Random Variables
• From a bag containing 3 red balls
and 2 white balls, a man is to
draw two balls at random
without replacement. He gains
Rs. 20 for each red ball and Rs.
10 for each white one. What is
the expectation of his draw?
Examples (Contd.)
• In a cricket match played to
benefit an ex-player, 10,000
tickets are to be sold at Rs. 500.
The prize is a Rs. 12,000 fridge
by lottery. If a person purchases
two tickets, what is his expected
gain?
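Both expectation examples can be worked as a probability-weighted sum, E(X) = Σ x P(x). The sketch below makes two assumptions for the lottery that the slide does not state: exactly one winning ticket is drawn from all 10,000, and the gain is the prize value minus the amount paid for tickets.

```python
from fractions import Fraction
from math import comb

# Example 1: draw 2 balls without replacement from 3 red and 2 white.
red, white, draws = 3, 2, 2
expected_draw = sum(
    # P(r red balls) * payoff for r red and (draws - r) white balls
    Fraction(comb(red, r) * comb(white, draws - r), comb(red + white, draws))
    * (20 * r + 10 * (draws - r))
    for r in range(draws + 1)
)

# Example 2: lottery with 10,000 tickets at Rs. 500, one Rs. 12,000 prize.
tickets_sold, ticket_price, prize = 10_000, 500, 12_000
tickets_bought = 2
p_win = Fraction(tickets_bought, tickets_sold)
expected_gain = p_win * prize - tickets_bought * ticket_price

print(expected_draw, float(expected_gain))
```

Under these assumptions the draw is worth Rs. 32 on average, while the lottery purchase has a strongly negative expected gain.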
Orthodox Probability Distributions
Binomial Distribution
• A random variable X is said to follow a Binomial distribution when
• Each trial can have only two outcomes, success and failure (such trials are also
known as Bernoulli trials).
• The objective is to find the probability of getting k successes out of n trials.
• The probability of success is p and thus the probability of failure is
(1 - p).
• The probability p is constant and does not change between trials
Possible Applications for the Binomial
Distribution
• A manufacturing plant labels items as either defective or acceptable.
• A firm bidding for contracts will either get a contract or not.
• A marketing research firm receives survey responses of “yes I will
buy” or “no I will not.”
• New job applicants either accept the offer or reject it.
Illustration
Probability Mass Function (PMF) of Binomial
Distribution
Example
Fashion Trends Online (FTO) is an e-commerce company that sells women's apparel. It is observed
that about 10% of their customers return the items purchased by them for many reasons (such as
size, color, and material mismatch). On a particular day, 20 customers purchased items from FTO.
Calculate:
(a) Probability that exactly 5 customers will return the items.
(b) Probability that a maximum of 5 customers will return the items.
(c) Probability that more than 5 customers will return the items.
(d) Average number of customers who are likely to return the items.
(e) The variance and the standard deviation of the number of returns.
Solution
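The slide solution is not recoverable, so this is a sketch of how the five parts could be computed, assuming the number of returns follows a binomial distribution with n = 20 and p = 0.10 and using the standard binomial PMF, P(X = k) = C(n, k) p^k (1 - p)^(n - k). The function name is illustrative.

```python
from math import comb, sqrt

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.10

p_exactly_5 = binomial_pmf(5, n, p)                         # (a)
p_at_most_5 = sum(binomial_pmf(k, n, p) for k in range(6))  # (b)
p_more_than_5 = 1 - p_at_most_5                             # (c)
mean = n * p                                                # (d)
variance = n * p * (1 - p)                                  # (e)
std_dev = sqrt(variance)

print(f"P(X = 5)  = {p_exactly_5:.4f}")
print(f"P(X <= 5) = {p_at_most_5:.4f}")
print(f"P(X > 5)  = {p_more_than_5:.4f}")
print(f"mean = {mean:.2f}, variance = {variance:.2f}, sd = {std_dev:.4f}")
```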
Problem
• Of the 41,636 residents of Tamil Nadu, 20% were born outside Tamil
Nadu. A group of 5 people is to be randomly selected from the state
and the discrete random variable is X, the number of persons in the
group who were born outside Tamil Nadu. Find
1. The probability for exactly 2 persons born outside Tamil Nadu.
2. The probability for at least 3 persons born outside Tamil Nadu.
Time to be Normal!!
Normal Distribution
Continuous Probability Distributions
• A continuous variable is a variable that can assume any value on a
continuum (can assume an uncountable number of values):
• thickness of an item.
• time required to complete a task.
• temperature of a solution.
• height, in inches.
• These can potentially take on any value depending only on the ability
to precisely and accurately measure.
Continuous Probability Distributions Vary By
Shape
• Symmetrical
• Bell-shaped
• Ranges from negative to positive infinity

• Symmetrical
• Also known as Rectangular Distribution
• Every value between the smallest & largest is equally likely

• Right skewed
• Mean > Median
• Ranges from zero to positive infinity
The Normal Distribution
• Bell-shaped.
• Symmetrical.
• Mean, median and mode are equal.
• Location is determined by the mean, μ.
• Spread is determined by the standard deviation, σ.
• The random variable has an infinite theoretical range: $-\infty$ to $+\infty$.
The Normal Distribution Density Function
Gaussian Distribution
Applications
• Stock Market Modelling
• Analyzing Mutual Funds
• Predictive Analytics
• Sampling
The Standardized Normal
• Any normal distribution (with any mean and standard deviation
combination) can be transformed into the standardized normal
distribution (Z).
• To compute normal probabilities, we need to transform X units into Z
units.
• The standardized normal distribution (Z) has a mean of 0 and a
standard deviation of 1.
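The transformation from X units to Z units can be sketched as follows. The mean, standard deviation and X value are hypothetical numbers chosen only for illustration, and `z_score` is an illustrative helper name.

```python
def z_score(x, mu, sigma):
    """Convert a value x from a normal distribution (mu, sigma) into Z units."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Hypothetical example: X is normal with mean 100 and standard deviation 50.
mu, sigma = 100.0, 50.0
x = 106.0

z = z_score(x, mu, sigma)
print(z)  # 0.12
```

A value of X = 106 therefore sits 0.12 standard deviations above the mean, so P(X < 106) equals P(Z < 0.12).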
Translation to the Standardized Normal Distribution
The Standardized Normal Probability Density Function
The Standardized Normal Distribution
Example
Finding Normal Probabilities
Probability as Area Under the Curve
The Standardized Normal Table
The Standardized Normal Table (Contd.)
General Procedure for Finding Normal Probabilities
Finding Normal Probabilities
Finding Normal Probabilities (Contd.)
Solution: Finding P(Z < 0.12)
Finding Normal Upper Tail Probabilities
Finding Normal Upper Tail Probabilities (Contd.)
Finding a Normal Probability Between Two Values
Solution: Finding P(0 < Z < 0.12)
Probabilities in the Lower Tail
Probabilities in the Lower Tail (Contd.)
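The table lookups named in the slide titles above can be cross-checked numerically. This is a minimal sketch that uses Python's `statistics.NormalDist` instead of a printed table; the lower-tail value of -0.12 is an assumed example, since the original figure is not reproduced here.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # standardized normal distribution

p_below = Z.cdf(0.12)                  # P(Z < 0.12), cumulative area to the left
p_above = 1.0 - Z.cdf(0.12)            # P(Z > 0.12), upper tail
p_between = Z.cdf(0.12) - Z.cdf(0.0)   # P(0 < Z < 0.12)
p_lower_tail = Z.cdf(-0.12)            # P(Z < -0.12), assumed lower-tail example

print(round(p_below, 4))       # 0.5478
print(round(p_above, 4))       # 0.4522
print(round(p_between, 4))     # 0.0478
print(round(p_lower_tail, 4))  # 0.4522
```

By symmetry, the lower tail below -0.12 has the same area as the upper tail above 0.12.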
Example
Solution
Evaluating Normality
• Not all continuous distributions are normal.
• It is important to evaluate how well the data set is approximated by a
normal distribution.
• Normally distributed data should approximate the theoretical normal
distribution:
• The normal distribution is bell shaped (symmetrical) where the mean is equal to
the median.
• The empirical rule applies to the normal distribution.
• The interquartile range of a normal distribution is about 1.33 standard deviations (more precisely, 1.35).
Evaluating Normality (Contd.)
Comparing data characteristics to theoretical properties:
• Construct charts or graphs:
• For small- or moderate-sized data sets, construct a stem-and-leaf display or a boxplot to
check for symmetry.
• For large data sets, does the histogram or polygon appear bell-shaped?
• Compute descriptive summary measures:
• Do the mean, median and mode have similar values?
• Is the interquartile range approximately 1.33σ?
• Is the range approximately 6σ?
Evaluating Normality (Contd.)
Comparing data characteristics to theoretical properties:
• Observe the distribution of the data set:
• Do approximately 2/3 of the observations lie within mean ±1 standard deviation?
• Do approximately 80% of the observations lie within mean ±1.28 standard deviations?
• Do approximately 95% of the observations lie within mean ±2 standard deviations?
• Evaluate normal probability plot:
• Is the normal probability plot approximately linear (i.e. a straight line) with positive slope?
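The numerical checks listed above can be collected into one helper. The sketch below is only illustrative: the function name `normality_summary` is hypothetical, and the data are simulated from a normal distribution so that the benchmarks should roughly hold.

```python
import random
import statistics

def normality_summary(data):
    """Compare sample characteristics with normal-distribution benchmarks."""
    xs = sorted(data)
    n = len(xs)
    mean = statistics.fmean(xs)
    sd = statistics.stdev(xs)
    q1, _median, q3 = statistics.quantiles(xs, n=4)

    def share_within(k):
        # Fraction of observations within mean +/- k standard deviations.
        return sum(abs(x - mean) <= k * sd for x in xs) / n

    return {
        "mean": mean,
        "median": statistics.median(xs),
        "iqr_over_sd": (q3 - q1) / sd,             # benchmark: about 1.33
        "range_over_sd": (xs[-1] - xs[0]) / sd,    # benchmark: about 6
        "within_1_sd": share_within(1.0),          # benchmark: about 2/3
        "within_1.28_sd": share_within(1.28),      # benchmark: about 80%
        "within_2_sd": share_within(2.0),          # benchmark: about 95%
    }

# Assumption: simulated data stand in for a real data set.
rng = random.Random(42)
sample = [rng.gauss(50, 10) for _ in range(5000)]
summary = normality_summary(sample)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Large departures from the benchmarks in the comments would suggest that the data are not well approximated by a normal distribution.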
Constructing A Normal Probability Plot
• Normal probability plot:
• Arrange data into ordered array.
• Find corresponding standardized normal quantile values (Z).
• Plot the pairs of points with observed data values (X) on the vertical axis and
the standardized normal quantile values (Z) on the horizontal axis.
• Evaluate the plot for evidence of linearity.
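The construction steps above might be sketched as follows. The plotting position i/(n + 1) used to obtain the normal quantiles is an assumption (other conventions exist), the data values are hypothetical, and the correlation coefficient is used here only as a rough numerical stand-in for judging linearity by eye.

```python
from statistics import NormalDist, fmean

def normal_plot_points(data):
    """Return (Z quantile, observed value) pairs for a normal probability plot."""
    xs = sorted(data)                      # ordered array
    n = len(xs)
    std_normal = NormalDist()
    # Assumed plotting position: the i-th ordered value gets cumulative area i/(n+1).
    zs = [std_normal.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    return list(zip(zs, xs))               # Z on the horizontal axis, X on the vertical

def pearson_r(pairs):
    """Correlation between Z and X; values near 1 suggest an almost straight line."""
    zs, xs = zip(*pairs)
    mz, mx = fmean(zs), fmean(xs)
    sxy = sum((z - mz) * (x - mx) for z, x in pairs)
    szz = sum((z - mz) ** 2 for z in zs)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / (szz * sxx) ** 0.5

data = [4.1, 5.0, 5.6, 6.2, 6.9, 7.3, 8.0, 9.4]  # hypothetical observations
points = normal_plot_points(data)
r = pearson_r(points)
for z, x in points:
    print(f"{z:6.3f}  {x}")
```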
The Normal Probability Plot Interpretation
Evaluating Normality An Example: Mutual Fund Returns
Evaluating Normality An Example: Mutual Fund Returns (Contd.)
Evaluating Normality An Example: Mutual Fund Returns (Contd.)
Evaluating Normality An Example: Mutual Fund Returns (Contd.)
• Conclusions
• The returns are right-skewed.
• The returns have more values concentrated around the mean than expected.
• The range is larger than expected.
• Normal probability plot is not a straight line.
• Overall, this data set greatly differs from the theoretical properties of the
normal distribution.
Introduction to Sampling
Essence of Sampling
Random Sampling
• Shewhart (1931) defines a random sample as a ‘sample drawn under
conditions such that the law of large numbers applies’.
• Random sampling is usually carried out without replacement, that is,
an observation which is selected in the sample is removed from the
population for further consideration
• Random samples can also be created with replacement, that is, an
observation which is selected for inclusion in the sample can again be
considered since it is replaced (not removed) in the population.
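The difference between the two schemes can be seen in a short sketch. The population of 20 unit IDs is hypothetical; `random.sample` draws without replacement and `random.choices` draws with replacement.

```python
import random

rng = random.Random(7)            # fixed seed so the sketch is repeatable
population = list(range(1, 21))   # hypothetical population of 20 unit IDs

# Without replacement: a selected unit is removed from further consideration,
# so no unit can appear twice in the sample.
without_replacement = rng.sample(population, k=5)

# With replacement: a selected unit is put back, so repeats are possible.
with_replacement = rng.choices(population, k=5)

print(without_replacement)
print(with_replacement)
```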
Random Sampling (Example)
Stratified Sampling
• The population can be divided into mutually exclusive groups using
some factor (for example, age, gender, marital status, income,
geographical regions, etc.). The groups thus formed are called
strata.
• It is important that the groups are mutually exclusive and exhaustive
of the population.
Stratified Sampling Examples
a) Amount of time spent by male and female users in sending messages in a
day. Here the strata are male and female users.
b) Efficacy of a drug among different age groups. Age group can be
classified into categories such as 40 or less, between 41 and 60, and
over 60 years of age.
c) Performance of children in school and the parents’ marital status. Here,
marital status can be (a) Single, (b) Married, (c) Divorced. In this case we
assume that the parents’ marital status may influence children’s
academic performance.
d) Television rating points for a program across different geographical
regions of a country. For India, geographical regions could be different
states of the country.
Steps in Stratified Sampling
a) Identify the factor that can be used for creating strata (for example:
factor = Age; Stratum 1: age 40 or less; Stratum 2: age between 41
and 60; and Stratum 3: age more than 60).
b) Calculate the proportion of each stratum in the population (say p1,
p2, and p3 for the three strata identified in step a).
c) Calculate the sample size (say N). The sample sizes for strata 1, 2,
and 3 are p1 × N, p2 × N, and p3 × N,
respectively.
d) Use the random sampling procedure explained in Section 4.4.1 to
generate random samples in each stratum.
e) Combine samples from each stratum to create the final sample.
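A minimal sketch of steps (a) to (e) is given below, with proportional allocation and simple random sampling inside each stratum. The function names, the simulated ages, and the choice to place age 40 in the first stratum are assumptions made for illustration.

```python
import random
from collections import defaultdict

def age_stratum(age):
    """Step (a): assign a unit to a stratum (assumed boundaries)."""
    if age <= 40:
        return "40 or less"
    if age <= 60:
        return "41 to 60"
    return "over 60"

def stratified_sample(units, stratum_of, sample_size, rng):
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    total = len(units)
    sample = []
    for members in strata.values():
        proportion = len(members) / total        # step (b): p1, p2, p3
        k = round(proportion * sample_size)      # step (c): p_i x N, rounded
        sample.extend(rng.sample(members, k))    # step (d): random sample per stratum
    return sample                                # step (e): combined sample

rng = random.Random(1)
ages = [rng.randint(18, 90) for _ in range(1000)]   # hypothetical population
sample = stratified_sample(ages, age_stratum, 100, rng)
print(len(sample))
```

Because each stratum size is rounded, the combined sample can differ from the target size by one unit.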
Cluster Sampling
Cluster Sampling Steps
Bootstrap Aggregating
• Bootstrap Aggregating (known as Bagging) is sampling with replacement
used in machine learning algorithms, especially the random forest
algorithm (Breiman, 1996).
• The size of each sample and the number of samples are determined based
on factors such as population size, target accuracy of the model developed
using bagging, convergence, etc.
• Bagging is frequently used in ensemble methods (in which several models
are developed and the final prediction is usually based on majority
voting).
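The idea can be illustrated with a toy sketch: draw bootstrap samples with replacement, fit one simple model per sample, and combine the predictions by majority vote. The threshold "model" and the data are hypothetical stand-ins chosen to keep the example short; this is not the random forest algorithm itself.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Sample with replacement, same size as the original data."""
    return rng.choices(data, k=len(data))

def fit_threshold(sample):
    """Toy model: the midpoint between the two class means (None if a class is missing)."""
    zeros = [x for x, label in sample if label == 0]
    ones = [x for x, label in sample if label == 1]
    if not zeros or not ones:
        return None
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def bagged_predict(data, x_new, n_models, rng):
    votes = []
    for _ in range(n_models):
        threshold = fit_threshold(bootstrap_sample(data, rng))
        if threshold is not None:
            votes.append(1 if x_new > threshold else 0)
    # Majority voting across the individual models.
    return Counter(votes).most_common(1)[0][0]

# Hypothetical labelled data: small x values are class 0, large ones class 1.
data = [(x, 0) for x in range(1, 11)] + [(x, 1) for x in range(11, 21)]
rng = random.Random(0)
print(bagged_predict(data, 18, n_models=25, rng=rng))  # 1
print(bagged_predict(data, 3, n_models=25, rng=rng))   # 0
```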
Non-Probability Sampling
• Convenience sampling is a non-probability sampling technique in
which the sample units are not selected according to a probability
distribution.
• Voluntary sampling: the data is collected from people who volunteer for such
data collection. There could be bias in the case of voluntary sampling.
Sampling Distribution
Examples
Sampling Distribution
• A sampling distribution is a distribution of all of the possible values of
a sample statistic for a given sample size selected from a population.
• For example, suppose you sample 50 students from your college
and compute their mean GPA. If you obtained many different samples of
size 50, you would compute a different mean for each sample. We are
interested in the distribution of all potential mean GPAs we might
calculate for any sample of 50 students.
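The GPA example can be simulated to make the idea concrete. The population below is hypothetical (5,000 simulated GPAs), and the number of repeated samples is an arbitrary choice; the sketch compares the spread of the sample means with the usual standard error σ/√n.

```python
import random
import statistics

rng = random.Random(2024)

# Hypothetical population: 5,000 GPAs, clipped to the 0.0 to 4.0 scale.
population = [min(4.0, max(0.0, rng.gauss(3.0, 0.5))) for _ in range(5000)]

n = 50  # sample size from the example
sample_means = [
    statistics.fmean(rng.sample(population, n))   # one mean per sample of 50 students
    for _ in range(2000)
]

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)
se_theory = pop_sd / n ** 0.5                  # standard error of the mean
se_observed = statistics.stdev(sample_means)   # spread of the simulated means

print(round(pop_mean, 3), round(statistics.fmean(sample_means), 3))
print(round(se_theory, 4), round(se_observed, 4))
```

The simulated means centre on the population mean, and their standard deviation is close to σ/√n.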
Developing Sampling Distribution
Developing Sampling Distribution (Contd.)
Developing Sampling Distribution (Contd.)
Developing Sampling Distribution (Contd.)
Developing Sampling Distribution (Contd.)
Comparing the Population Distribution to the Sample Means Distribution
Sample Mean Sampling Distribution: Standard Error of the Mean
Sample Mean Sampling Distribution: If the Population is Normal
Z-value for Sampling Distribution of the Mean
Sampling Distribution Properties
Sampling Distribution Properties
Sample Mean Sampling Distribution: If the Population is not Normal
Central Limit Theorem
Sample Mean Sampling Distribution: If the Population is not Normal
How Large is Large Enough?
• For most distributions, n > 30 will give a sampling distribution that is
nearly normal.
• For fairly symmetric distributions, n > 15.
• For a normal population distribution, the sampling distribution of the
mean is always normally distributed.
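A small simulation can illustrate these rules of thumb. The sketch assumes a strongly right-skewed population (exponential with mean 1) and tracks the skewness of the sample means as n grows; the helper names and the number of replications are illustrative choices.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: about 0 for a symmetric (for example, normal) distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def simulate_sample_means(n, replications, rng):
    # Assumed population: exponential with mean 1, which is right skewed.
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(replications)
    ]

rng = random.Random(11)
skew_by_n = {n: skewness(simulate_sample_means(n, 4000, rng)) for n in (2, 15, 30)}
for n, skew in skew_by_n.items():
    print(n, round(skew, 2))
```

The skewness of the sample means shrinks toward 0 as n increases, which is the behaviour the central limit theorem describes.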
Example
Population Proportions
Sampling Distribution of p
Z-Value for Proportions
Example
Central Limit Theorem (CLT)
Alternative Version
Implications
Example
Estimation
To find the true story we need to have confidence
in the work that produced the numbers
Estimation Process
• Estimation is a process used for making inferences about
population parameters based on samples.
• Point Estimate: The point estimate of a population parameter is the
single value (or specific value) calculated from a sample (thus called
a statistic).
• Interval Estimate: Instead of a specific value of the parameter, in an
interval estimate the parameter is said to lie in an interval (say between
points a and b) with a certain probability (or confidence).
• According to the central limit theorem, the sample means for
sufficiently large samples (n >= 30) are approximately normally
distributed, regardless of the shape of the population distribution. For
a normally distributed population, sample means are normally
distributed for any sample size. The z formula for this is:
z = (x̄ − μ) / (σ/√n)
• This formula can be rearranged algebraically for the population mean:
μ = x̄ − z · σ/√n
• The sample mean x̄ can be greater than or less than the population
mean; hence, the formula takes the following form:
μ = x̄ ± z · σ/√n
• Confidence interval for estimating the population mean:
x̄ − z(α/2) · σ/√n ≤ μ ≤ x̄ + z(α/2) · σ/√n
Deeper Insights
• Population mean is located
within the confidence interval
99% Confidence Interval
Problem
• A researcher has taken a random
sample of size 70 from a
population. The sample mean
is 35 and the population standard
deviation is 4.62. Construct a
90% confidence interval to
estimate the population mean.
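One way to work this problem is sketched below, assuming the usual z interval x̄ ± z · σ/√n for a known population standard deviation. The function name `z_interval` is illustrative, and the critical value is computed with `statistics.NormalDist` instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Confidence interval for the mean when the population sd (sigma) is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.645 for 90% confidence
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

lower, upper = z_interval(x_bar=35, sigma=4.62, n=70, confidence=0.90)
print(round(lower, 2), round(upper, 2))  # 34.09 35.91
```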
Sampling from a Finite Population
Problem
• A researcher wants to measure the
income level of employees working
in a company. The total employee
strength of the company is 1200. A
random sample of 50 employees
reveals that the average income of
sampled employees is Rs 15,000.
Historical data reveals that the
standard deviation of the income
of the employees is approximately
Rs 1500. Construct a 99%
confidence interval for estimating
the average income of all the
employees working in this
company.
Solution
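A possible working is sketched below. It assumes the standard finite population correction factor √((N − n)/(N − 1)) is applied to the standard error, and it computes the critical value numerically, so the result may differ slightly from a table-based answer. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_interval_finite(x_bar, sigma, n, N, confidence):
    """z interval for the mean with a finite population correction (assumed form)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 2.576 for 99% confidence
    fpc = sqrt((N - n) / (N - 1))             # finite population correction factor
    margin = z * (sigma / sqrt(n)) * fpc
    return x_bar - margin, x_bar + margin

lower, upper = z_interval_finite(x_bar=15000, sigma=1500, n=50, N=1200, confidence=0.99)
print(round(lower), round(upper))  # 14465 15535
```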
Interval Estimates Using t-Distribution
• We have seen that when the
population standard deviation is
unknown, sample standard deviation
can be used for estimating the
confidence interval for large samples
(n>= 30).
• In a real-life situation, a sample size
less than 30 is not very uncommon. In
the case of small sample size (n < 30),
the z formula discussed earlier is not
applicable. The problem can be solved
by using the t statistic, developed by a
British statistician, William S. Gosset.
• When the population standard deviation is not known and the
sample size is less than 30, the t-distribution is used.
• Assumption for using the t-distribution: The population is normal or
approximately normal.
• Applicable when the population standard deviation is not known.
t-Distribution
• The t-distribution is symmetrical but flatter than the normal
distribution, and there is a different t-distribution for different
sample sizes (or degrees of freedom). As the sample size gets larger,
the shape of the t-distribution becomes approximately equal to the
normal distribution.
• Interval estimate for the mean using the t-distribution:
x̄ + t · s/√n : Upper confidence limit
x̄ − t · s/√n : Lower confidence limit
(value of t depends upon degrees of freedom and α)
Process
t-Distribution
Problems
• In a grocery store, the mean
expenditure per customer
is Rs 2000 with a standard
deviation of Rs 300. If a random
sample of 50 customers
is selected, what is the
probability that the sample
average expenditure
per customer is more
than Rs 2080?
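This problem can be approached through the sampling distribution of the mean. The sketch assumes the sample mean is approximately normal with standard error σ/√n, which is reasonable for n = 50.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2000, 300, 50
x_bar = 2080

standard_error = sigma / sqrt(n)
z = (x_bar - mu) / standard_error
p_more_than = 1 - NormalDist().cdf(z)   # upper-tail area beyond z

print(round(z, 2))            # 1.89
print(round(p_more_than, 4))  # 0.0297
```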
Problems
• By the year 2014–2015, the telephone
instrument industry is estimated to
grow by 106.20 million units as
compared to 1993–1994 when the
total market size was only 3 million
units. Bharti Teletech, BPL Telecom, ITI
(Indian Telephone Industries), Bharti
Systel, Tata Telecom, and Godrej
Telecom are some of the major
players in the market. Bharti Teletech
has a market share of 24%. If 200
purchasers of telephone instruments
are randomly selected, what is the
probability that 55 or more are Bharti
Teletech customers?
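A sketch of one approach follows, using the normal approximation to the sampling distribution of the sample proportion with standard error √(p(1 − p)/n). No continuity correction is applied, which is an assumption; applying one would change the answer slightly.

```python
from math import sqrt
from statistics import NormalDist

p = 0.24          # population proportion (market share)
n = 200           # sample size
p_hat = 55 / n    # sample proportion at the boundary, 0.275

standard_error = sqrt(p * (1 - p) / n)
z = (p_hat - p) / standard_error
prob_55_or_more = 1 - NormalDist().cdf(z)   # upper-tail area, no continuity correction

print(round(z, 2))                # 1.16
print(round(prob_55_or_more, 3))  # 0.123
```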
Problems
• In order to estimate the customer
loyalty for a particular product, a
researcher poses the following
question to a sample of 100
customers: How many years have
you been continuously using this
product? This sample yielded a
mean period of 8 years with a
sample standard deviation of 2
years. Construct a 95% confidence
interval for estimating the
population mean.
Problems
• The personnel department of an
organization wants to apply cost-cutting
measures for improving
efficiency. As the first step, the
personnel department wants to curtail
telephone expenses incurred by
employees. For this, the personnel
department has taken a random
sample of 10 employees and gathered
the following data about telephone
expenses (in thousand rupees) in the
previous year: 10, 12, 24, 23, 11, 14,
15, 34, 16, 23. Construct a 95%
confidence interval to estimate the
average telephone expenses of the
employees in the population.
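Since n = 10 and the population standard deviation is unknown, a t interval is a natural fit. In the sketch below the critical value 2.262 (9 degrees of freedom, 95% confidence, two-tailed) is taken from a standard t table and hard-coded, because the Python standard library has no t-distribution quantile function.

```python
from math import sqrt
from statistics import fmean, stdev

expenses = [10, 12, 24, 23, 11, 14, 15, 34, 16, 23]  # in thousand rupees

n = len(expenses)
x_bar = fmean(expenses)     # sample mean
s = stdev(expenses)         # sample standard deviation (n - 1 in the denominator)

# Assumption: table value of t for n - 1 = 9 degrees of freedom and alpha/2 = 0.025.
t_crit = 2.262

margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(round(x_bar, 1), round(s, 2))      # 18.2 7.6
print(round(lower, 1), round(upper, 1))  # 12.8 23.6
```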
Ten Commandments of Sampling and Estimation
• If the sample size (n) is large enough (n >= 30), the
sampling distribution of the sample mean is
approximately normal regardless of the
population distribution/shape.
• A larger sample automatically reduces the
standard error of the mean.
• The primary endeavor of sampling is to
minimize the difference between the sample and
population means.
• Sampling constructs the entrance towards
inferential statistics.
• For a normal population distribution, the
sampling distribution of the mean is always
normally distributed irrespective of sample
size.
• For constructing a confidence interval, the sample
standard deviation can be utilized if the
population standard deviation is not known
beforehand.
• For smaller sample sizes (n < 30), estimation of the
confidence interval resorts to the t-distribution.
• The t-distribution tends to follow the normal
distribution with higher degrees of freedom.
• Sample size is determined on the basis of
the tolerable margin of error and the desired confidence
level.
• As with the mean, a confidence interval can be
constructed for the population proportion as well.