Dependence and its Measuring

Dr Boyan Dimitrov, Kettering University, Mathematics Department
Abstract. Dependence in the world of uncertainty is a complex concept, and textbooks tend to avoid any discussion of it. In the classical approach, conditional probability is used to derive further rules for operations with probabilities. We use it to establish a concept of dependence proposed about 50 years ago, and to show ways of interpreting and measuring it when two random events A and B are dependent. We then apply it to some examples to illustrate how suitable this approach is for the study of local dependence.
1. Introduction
What I intend to tell you here you cannot find in any contemporary textbook. I am not sure you could find it even in older textbooks on Probability and Statistics. But I read it more than 40 years ago in the Bulgarian textbook on Probability written by the great Bulgarian mathematician Nikola Obreshkov (1963). Over the years I have seen lots of sources of different kinds, and never met it in other textbooks or monographs. There is not a single word about these basics even in the well-known Encyclopedia of Statistical Sciences, published more than 20 years later.
Not long ago I started working on measures of dependence between random variables (r.v.'s). I found that researchers and practitioners develop a variety of approaches to model dependence between r.v.'s (copulas, regressions, correlations, and indexes of various kinds). In these they use specific tools based on knowledge well above the basics, and out of the reach of an ordinary student. And there has never been an attempt to touch the base where Obreshkov felt the things but left them just in the mist. So I decided to start from there, the zero point. How far can I get? I cannot guess. I trust my incentives and throw myself bravely into the challenge. Hopefully, this is not due to my ignorance and lack of information. Forgive me if I rediscover the wheel. I truly hope to find followers among the readers of this article.
To me the things look natural and simple. All the prerequisites are: what the probability of a random event is, when we have dependence, what the conditional probability of a random event is when another event occurs, and several basic rules for calculating probabilities related to pairs of events. Of course, some interpretations of the facts which the probability contains are also helpful.
When necessary, we will repeat here the most needed facts regarding probability rules for two random events. Those who are familiar with more than an introductory course in Probability and Statistics will find well-known things here, and let us not blame the authors of the textbooks they have used for the gaps we fill in. For beginners, let what is written here be a challenge to their wish to get deeper into the essence of the concept of dependence. We would like the examples discussed here to be used for further ideas of applications, and to illustrate the rich opportunities for the use of this approach in practice.
2. Dependent events. Connection between random events
Let A and B be two arbitrary random events. It is well known that A and B are independent only when the probability of their joint occurrence is equal to the product of the probabilities of their individual appearance, i.e. when

P(A ∩ B) = P(A)·P(B).        (1)
The readers familiar with the basics of probability theory know that independence is equivalent to the fact that the conditional probability of one of the events, given that the other event occurred, does not change and remains equal to its original, unconditional probability, i.e. in such cases

P(A | B) = P(A).        (2)
The only inconvenience in equation (2) as a definition of independence is that it requires P(B) > 0, i.e. B has to be a possible event. Otherwise, the conditional probability

P(A | B) = P(A ∩ B) / P(B)        (3)

is not defined, since on the right-hand side of (3) there would be an improper division by zero. At the same time, even if P(B) = 0 (then B is called an impossible, or zero, event), equation (1) is fulfilled. This is due to the fact that the inclusion A ∩ B ⊆ B implies the equality 0 = 0, since 0 ≤ P(A ∩ B) ≤ P(B) = 0, whatever the probability of the event A is. By the way, the identity in equation (1) also holds when P(B) = 1, i.e. when B is a sure event. Then it is true that

P(A ∩ B) = P(A) + P(B) − P(A ∪ B) = P(A) + 1 − 1 = P(A).

Therefore, in (1) the equality sign is guaranteed. This means that a zero event, as well as a sure event, is independent of any other event, including themselves.
The most important fact is that when equality in (1) does not hold, the events A and B are dependent.
Dependence in the world of uncertainty is a complex concept. Too little has been done to explain its essentials, and the textbooks avoid any discussion in this regard. In the classical approach to probability, with a finite number of equally likely elementary outcomes, equation (3) is used to determine the conditional probabilities, and this definition is a base for deriving further rules for operations with probabilities. We now establish what the concept of dependence is, and what the ways of interpreting and measuring it are when A and B are dependent events. In addition, we naturally assume that neither of these events is a zero or a sure event. The terminology used is the one proposed by Obreshkov (1963).
Definition 1. The number

δ(A, B) = P(A ∩ B) − P(A)·P(B)        (4)

is called the connection between the events A and B.
We immediately derive the following properties of the connection between two random events.
δ1) The connection δ(A, B) between two random events equals zero if and only if these events are independent. This includes the cases when one of the events is a zero or a sure event.
δ2) The connection between the events A and B is symmetric, i.e. δ(A, B) = δ(B, A).
δ3) If A₁, A₂, …, A_j, … are mutually exclusive events, then
δ(A₁ ∪ A₂ ∪ …, B) = δ(A₁, B) + δ(A₂, B) + …,
i.e. the function δ(A, B) is additive with respect to either of its arguments. Therefore, it is also a continuous function, as the probabilities in its construction are.
δ4) It is true that δ(A ∪ C, B) = δ(A, B) + δ(C, B) − δ(A ∩ C, B), and most of the properties of the probability function for random events carry over to the connection function.
δ5) The connection between the events A and B̄ (the complement of the event B) is equal in magnitude to the connection between the events A and B, but has the opposite sign, i.e. δ(A, B̄) = −δ(A, B). Indeed, from P(A) = P(A ∩ B) + P(A ∩ B̄) we obtain
δ(A, B̄) = P(A ∩ B̄) − P(A)·P(B̄) = [P(A) − P(A ∩ B)] − P(A)[1 − P(B)] = −P(A ∩ B) + P(A)·P(B) = −δ(A, B).
The connection between the complementary events Ā and B̄ is the same as the one between A and B, i.e. δ(Ā, B̄) = δ(A, B). This property follows immediately after a double application of the complement rule just proven (together with the symmetry δ2)).
δ6) If the occurrence of A implies the occurrence of B, i.e. when A ⊆ B, then δ(A, B) = P(A)[1 − P(B)] = P(A)·P(B̄), and the connection between the events A and B is positive. The two events are then called positively associated.
δ7) When A and B are mutually exclusive, i.e. when A ∩ B = ∅, then δ(A, B) = −P(A)·P(B), and the connection between the events A and B is negative.
δ8) When δ(A, B) > 0, the occurrence of one of the two events increases the probability (that is, the conditional probability) of the occurrence of the other event. The following relation is true:

P(A | B) = P(A) + δ(A, B) / P(B).        (5)

By making use of (3), according to the multiplication rule for probabilities we get the equality P(A ∩ B) = P(A | B)·P(B), and by substituting it in (4) we obtain one more representation of the connection between the two events A and B:

δ(A, B) = [P(A | B) − P(A)]·P(B).        (6)

This equation, solved with respect to P(A | B), gives relation (5). Let us note specifically that δ(A, B) ≠ 0 is a guarantee that P(A)·P(B) > 0.
Property δ8) can be reversed for the cases when δ(A, B) < 0 (then we also have P(A)·P(B) > 0), with the conclusion that if the connection is negative, the occurrence of one of the events decreases the chances for the other one to occur. Equation (5) remains true. It also indicates that knowledge of the connection is very important, and can be used for calculation of the posterior probabilities, similar to what people customarily do when applying the Bayes rule. In our case we do not really need to know any complete system of hypotheses, which is required in order to apply the Bayes rule. It is sufficient to know only the numeric value of the connection δ(A, B) between the two events, and their prior probabilities, in order to evaluate exactly the posterior probability of either of the two events when we know that the other one has occurred. We anticipate most applications of the measures of dependence considered here to be for similar purposes.
We will call the events A and B positively associated when δ(A, B) > 0, and negatively associated when δ(A, B) < 0. The reason for this is relationship (5) (we connect it to the increase or decrease of the conditional probability of the occurrence of one of the events when the other one occurs), as well as the analogy with similar situations concerning random variables.
δ11) The connection between any two events A and B satisfies the inequalities
max{−P(A)·P(B), −[1 − P(A)][1 − P(B)]} ≤ δ(A, B) ≤ min{P(A)[1 − P(B)], [1 − P(A)]·P(B)}.
We call these the Fréchet-Hoeffding inequalities. They also indicate that the value of the connection, as a measure of dependence, is between −¼ and +¼.
Example 1. There are 1000 observations on the stock market, and it is found that in 80 cases there was a significant increase in the oil prices (event A). For the same observations it is registered that there was a significant increase of the income at the Money Market (event B) in 50 cases. A simultaneous significant increase in both investments (event A ∩ B) is observed on 20 occasions. Let us determine the connection between the two events, and see how the information about the occurrence of one of these events can help to make a forecast for the appearance of the other event.
According to the frequency estimation of the probabilities, we have
P(A) = 80/1000 = .08;  P(B) = 50/1000 = .05;  and  P(A ∩ B) = 20/1000 = .02.
In accordance with Definition 1 we get
δ(A, B) = .02 − (.08)(.05) = .016.
Therefore, by equation (5) we find that if it is known that there is a significant increase in the investments in the money market, then the probability to see also a significant increase in the oil price is
P(A | B) = .08 + (.016)/(.05) = .4.
Analogously, if we have the information that there is a significant increase in the oil prices on the market, then the chances to get also significant gains in the money market on the same day will be estimated as follows:
P(B | A) = .05 + (.016)/(.08) = .25.
It is understandable that these numbers can also be obtained if one uses formula (3), when P(A ∩ B) is known. In our case we assume that we know only the numerical value of the connection δ(A, B) and the individual prior probabilities P(A) and P(B). And namely the knowledge of these numbers seems much more natural in real life and in practical use, as well as when one wants to model dependence between random events for other purposes.
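The calculation in Example 1 is easy to script. The following minimal sketch is our own illustration (the function names are not from any referenced source) and simply reproduces the numbers above from the raw frequencies:

```python
def connection(p_a, p_b, p_ab):
    """Connection delta(A, B) = P(A and B) - P(A)P(B), Definition 1."""
    return p_ab - p_a * p_b

def posterior(p_x, p_other, delta):
    """Posterior P(X | other event) = P(X) + delta / P(other event), equation (5)."""
    return p_x + delta / p_other

# Frequencies of Example 1
N, k_a, k_b, k_ab = 1000, 80, 50, 20
p_a, p_b, p_ab = k_a / N, k_b / N, k_ab / N

d = connection(p_a, p_b, p_ab)         # 0.016
print(posterior(p_a, p_b, d))          # P(A | B) = 0.4
print(posterior(p_b, p_a, d))          # P(B | A) = 0.25
```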
Remark 1. For those who know more about the science of uncertainty, for instance what a random variable (r.v.) is and what an expected value (mathematical expectation) is, we note the following: if we introduce the r.v.'s which are the indicators of the considered random events, i.e. I_A = 1 when the event A occurs and I_A = 0 when the complementary event Ā occurs, then E(I_A) = P(A) and
Cov(I_A, I_B) = E(I_A·I_B) − E(I_A)·E(I_B) = E(I_{A∩B}) − P(A)·P(B) = P(A ∩ B) − P(A)·P(B) = δ(A, B).
Therefore, the connection between two random events equals the covariance between their indicators.
Comment: Similar to the covariance between two r.v.'s, the numerical value of the connection δ(A, B) does not speak clearly about the magnitude of this connection between A and B. It is intuitively clear that the deepest connection should be between two coinciding events, i.e. the strongest connection must hold when A = B. In such cases we have P(A) = P(B), and also
δ(A, B) = P(A) − P²(A).
Let us see some numbers. Assume that A = B, and P(A) = P(B) = .05. Then we find δ(A, B) = δ(A, A) = .05 − .0025 = .0475, i.e. the connection of the event A with itself has the very low value .0475. Moreover, the value of the connection varies together with the probability of the event A.
Let us look at another example where P(A) = .3, P(B) = .4, but A may occur with B as well as with B̄, and P(A | B) = .6. Then, according to (6), we obtain δ(A, B) = (.6 − .3)(.4) = .12. The value of this connection is about 2.5 times stronger than the previously considered one, despite the fact that in the first case the occurrence of B guarantees the occurrence of A.
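To see numerically how the connection of an event with itself varies with its probability (as noted above), one can tabulate δ(A, A) = P(A) − P²(A) for several values of P(A). This is only an illustrative sketch of ours:

```python
# delta(A, A) = P(A) - P(A)^2 depends strongly on P(A); it peaks at P(A) = 1/2.
for p in (0.05, 0.1, 0.3, 0.5, 0.9):
    print(p, round(p - p * p, 4))
# 0.05 -> 0.0475, 0.1 -> 0.09, 0.3 -> 0.21, 0.5 -> 0.25, 0.9 -> 0.09
```

The maximum possible value .25 at P(A) = ½ agrees with the bounds of property δ11) and with the corollary proven in Section 3.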
For this reason other measures of the strength of the dependence between two random events are introduced. In this way more opportunities for penetration into the complex concept of dependence are offered.
3. Regression coefficients as measures of dependence between random events
We start with an explanation and interpretation of the probability P(A) and of the conditional probability P(A | B).
There is a general concept of the probability P(A): it is the measure of the chances of the random event A to occur in a single experiment. When this experiment is, by assumption, repeatable many times, P(A) is approximately equal to the proportion of those experiments where the event A occurs, relative to all performed experiments.
Analogous is the interpretation of the conditional probability P(A | B). In repeatable experiments it is approximately equal to the proportion of those experiments where both events A and B occur simultaneously, relative to all counts where the event B occurred. In other words, the conditional probability P(A | B) is the conditional measure of the chances for the event A to occur when it is already known that the other event B has occurred.
When B is a zero event, then P(B) = 0, and the conditional probability by rule (3) cannot be defined, and is usually considered as undefined. It is convenient, and we will accept once and forever, that in such cases P(A | B) = P(A), since A and B are independent events according to the fulfillment of identity (1). We also have P(A | B) = P(A) when the event B is a sure event, i.e. when P(B) = 1.
With this agreement and these interpretations of the conditional probability we introduce the next measure of the dependence of the event A on the event B.
Definition 2. The regression coefficient r_B(A) of the event A with respect to the event B is the difference between the conditional probability for the event A to occur given the event B, and the conditional probability for the event A to occur given the complementary event B̄ of the event B, namely

r_B(A) = P(A | B) − P(A | B̄).        (7)

We immediately notice that, according to our convention above, the regression coefficient r_B(A) is always defined, for any pair of events A and B (zero, sure, or arbitrary random).
Analogously is defined the regression coefficient r_A(B) of the event B with respect to the event A, namely

r_A(B) = P(B | A) − P(B | Ā).        (8)
Establishing the properties of the two regression coefficients will show what else these measures contain in regard to the dependence between the two events A and B. The following statements hold:
(r1) The equality r_B(A) = r_A(B) = 0 takes place if and only if the two events are independent.
Proof. This statement obviously holds when one of the events (assume that this is B) is a zero or a sure event. Hence, let us consider the case 0 < P(B) < 1. Then we also have 0 < P(B̄) < 1.
By representing the two conditional probabilities in (7) according to (3), and carrying out the following chain of equivalent equalities, we get
r_B(A) = P(A ∩ B)/P(B) − P(A ∩ B̄)/P(B̄) = [P(A ∩ B)·P(B̄) − P(A ∩ B̄)·P(B)] / [P(B)·P(B̄)]
= {P(A ∩ B)[1 − P(B)] − P(A ∩ B̄)·P(B)} / [P(B)·P(B̄)] = {P(A ∩ B) − [P(A ∩ B) + P(A ∩ B̄)]·P(B)} / [P(B)·P(B̄)].
The expression in the square brackets in the numerator of the last fraction equals P(A ∩ B) + P(A ∩ B̄) = P(A). After we take into account the definition of the connection δ(A, B) between the two events, from the last fraction we find that r_B(A) and δ(A, B) are related by the identity

r_B(A) = δ(A, B) / {P(B)[1 − P(B)]},  and analogously,  r_A(B) = δ(A, B) / {P(A)[1 − P(A)]}.        (9)

Therefore, r_B(A) = r_A(B) = 0 only when δ(A, B) = 0. According to property δ1) of the connection δ(A, B), this equality to zero is fulfilled only when the events A and B are independent.
In order to avoid continuous references to the extreme situations in the proofs further on, from now on we will consider only the general situation when neither of the events A and B is a zero or a sure event. However, all the statements that follow remain true for those situations too, due to our agreements above.
(r2) The regression coefficients r_B(A) and r_A(B) are numbers with equal signs, and this is the sign of their connection δ(A, B). However, their numerical values are not always equal. For the equality r_B(A) = r_A(B) to be valid, it is necessary and sufficient that P(A)[1 − P(A)] = P(B)[1 − P(B)].
Proof. It follows from (9) that the connection δ(A, B) and the two regression coefficients are also related by the equalities

δ(A, B) = r_B(A)·P(B)[1 − P(B)] = r_A(B)·P(A)[1 − P(A)].        (10)

From these, statement (r2) follows.
(r3) The regression coefficients r_B(A) and r_A(B) are numbers between −1 and 1, i.e. they satisfy the inequalities
−1 ≤ r_B(A) ≤ 1;  −1 ≤ r_A(B) ≤ 1.
(r3.1) The equality r_B(A) = 1 holds only when the random event A coincides (is equivalent) with the event B. Then the equality r_A(B) = 1 is also valid.
(r3.2) The equality r_B(A) = −1 holds only when the random event A coincides (is equivalent) with the event B̄, the complement of the event B. Then the equality r_A(B) = −1 is also valid, and respectively Ā = B.
Proof. The inequalities in property (r3) follow from equations (7) and (8), the very definition of the regression coefficients. These are differences of two probabilities, and each probability is a number between 0 and 1. The extreme values of the differences are pointed out in (r3.1) and (r3.2).
Assume that we have r_B(A) = 1. Then, according to (7), it means that P(A | B) = 1 and P(A | B̄) = 0. But these equalities are equivalent to the identities P(A ∩ B) = P(B) and P(A ∩ B̄) = 0. From these it follows that
P(A) = P(A ∩ B) + P(A ∩ B̄) = P(A ∩ B) = P(B).
Therefore, in this case the event A is equivalent to A ∩ B, and it is equivalent to B. From this fact it follows that the events A and B are equivalent, since the relation of equivalence is transitive.
Conversely, when A is equivalent to the event B (usually written as A = B), then we have P(A | B) = 1 and P(A | B̄) = 0, and according to (7) we get r_B(A) = 1. Property (r3.1) is proven.
Analogous considerations apply in the case r_B(A) = −1. According to (7) it means that P(A | B) = 0 and P(A | B̄) = 1. From these equations, by following the way shown above, we arrive at the conclusion that the events A and B̄ are equivalent. Therefore, property (r3) is proven.
(r4) It is fulfilled that r_B̄(A) = −r_B(A), as well as r_B(Ā) = −r_B(A), and hence also r_B̄(Ā) = r_B(A). The analogous identities hold for r_A(B).
Proof. The first equality is obvious from definition (7) applied to B̄, since the complement of B̄ is the event B. The second equality is a consequence of (7) and of the equalities P(Ā | B) = 1 − P(A | B) and P(Ā | B̄) = 1 − P(A | B̄). In this way we get
r_B(Ā) = P(Ā | B) − P(Ā | B̄) = 1 − P(A | B) − [1 − P(A | B̄)] = −r_B(A).
(r5) For any sequence of mutually exclusive events A₁, A₂, … it is fulfilled that
r_B(A₁ ∪ A₂ ∪ …) = r_B(A₁) + r_B(A₂) + … .
(r6) The regression coefficient possesses the property
r_B(A ∪ C) = r_B(A) + r_B(C) − r_B(A ∩ C).
The two properties (r5) and (r6) are a simple transfer of the respective properties of the conditional probabilities P(A | B) and P(A | B̄), plus some easy algebraic manipulations with the explicit expressions on both sides of the written equations. We omit the details here.
We will dare to interpret the properties (r3) of the regression coefficients in the following way: the closer the numerical value of r_B(A) is to 1, the "denser within each other" are the events A and B, considered as sets of outcomes of the experiment. In a similar way we interpret values of the regression coefficient r_B(A) close to −1: then the "denser within each other" are the events A and B̄ (the complement of the event B), considered as sets of outcomes of the experiment. By the way, we do not forget that when r_B(A) = 1, then also r_A(B) = 1, and simultaneously we have r_B̄(A) = −1 = r_Ā(B).
Remark 2. Students, and some practitioners, frequently mix up the concepts of mutually exclusive events (the fact that A ∩ B = ∅, which means the impossibility of a simultaneous occurrence of the two events A and B) and mutually independent events, expressed by equations (1) or (2). For random events that are neither zero nor sure events, independence requires that A ∩ B ≠ ∅. For this reason the equalities r_B(A) = r_A(B) = δ(A, B) = 0, which are equivalent to independence between A and B, also indicate that A ∩ B ≠ ∅.
For completeness we will consider here some particular cases and specific forms of the connection δ(A, B) and of the regression coefficients r_B(A) and r_A(B), in view of the mutual location of the two random events A and B in the sample space Ω of all possible outcomes of an experiment. These mutual locations, without the case where B is inside A, are shown on the Venn diagrams of Fig. 1.

Fig. 1. Venn diagrams of the mutual locations: a. A is part of B (A ⊆ B); b. A and B are mutually exclusive (A ∩ B = ∅); c. General location (A ∩ B ≠ ∅).

Actually, all of the shown cases are particular, but those on Fig. 1a and Fig. 1b are considered as the specific particular cases.
For the case 1a, when A ⊆ B, we have A ∩ B = A, P(A | B) = P(A)/P(B), and P(A | B̄) = 0. We find accordingly
δ(A, B) = P(A)[1 − P(B)] = P(A)·P(B̄),  r_B(A) = P(A)/P(B),  r_A(B) = P(B̄)/P(Ā).
It is worth noticing that in all cases of A ⊆ B or B ⊆ A, the dependence measures between the two events (the connection, as well as both regression coefficients) are positive. The measures for the case B ⊆ A are symmetric to those for the case 1a, with an exchange of the roles of the events A and B.
For the case pictured on Fig. 1b we find that the following equalities hold:
δ(A, B) = −P(A)·P(B),  r_B(A) = −P(A)/P(B̄),  r_A(B) = −P(B)/P(Ā),
and all of these measures are simultaneously negative.
For the general case 1c one may get positive as well as negative measures of dependence. For example, if P(A) = P(B) = .5 and P(A ∩ B) = .3, then the connection and both regression coefficients are positive; if P(A) = P(B) = .5 and P(A ∩ B) = .1, all these measures are negative. The sign of the dependence could be interpreted as a trend in the dependence toward one of the extreme situations 1a or 1b.
If we combine the properties (r2) and (r3) of the regression coefficients, we obtain the following interesting statement:
Corollary. The numerical value of the connection δ(A, B) between two random events is always a number between −¼ and ¼, i.e. −1/4 ≤ δ(A, B) ≤ 1/4.
Proof. According to equations (10) and property (r3) of the regression coefficients, we always have δ(A, B) = r_B(A)·P(B)[1 − P(B)] ≤ P(B)[1 − P(B)]. The probability p = P(B) is always a number between 0 and 1, and the function g(p) = p(1 − p) reaches its maximum ¼ within p ∈ [0, 1] at p = ½. Substituting these values in the above inequality gives the inequality on the right-hand side of the corollary. The left-hand side inequality is obtained similarly, using the minimal value −1 of the regression coefficient r_B(A) instead of its maximal value.
It is also interesting that the equality signs in the corollary hold not only when the two random events A and B (respectively A and B̄) are equivalent; in addition it is necessary that P(B) = P(B̄) = 1/2. Therefore, the maximal connection is obtained when the two events coincide and their probability equals ½.
Example 1 (continued):
We calculate here the values of the two regression coefficients r_B(A) and r_A(B) according to the data of the example given above. We will use formulas (9). In this way we find:
The regression coefficient of the event A (a significant increase of the oil prices on the market) with respect to the event B (a significant increase in the Money Market return) has the numerical value
r_B(A) = (.016)/[(.05)(.95)] = .3368.
At the same time we have
r_A(B) = (.016)/[(.08)(.92)] = .2174.
One thing we immediately see: the measure of dependence of the event A with respect to the event B, expressed by the numeric value of the regression coefficient r_B(A), is about 1.5 times stronger than the strength of dependence of B with respect to A, shown by the numerical value of the regression coefficient r_A(B). There exists an obvious asymmetry in the dependence between random events.
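The two regression coefficients follow from the connection and the marginal probabilities by formula (9). A short sketch of ours with the Example 1 numbers:

```python
def regression_coefficient(delta, p_condition):
    """r_C(E) = delta(E, C) / [P(C) * (1 - P(C))], equation (9)."""
    return delta / (p_condition * (1.0 - p_condition))

p_a, p_b, delta = 0.08, 0.05, 0.016              # values from Example 1
r_b_of_a = regression_coefficient(delta, p_b)    # r_B(A) ≈ 0.3368
r_a_of_b = regression_coefficient(delta, p_a)    # r_A(B) ≈ 0.2174
print(round(r_b_of_a, 4), round(r_a_of_b, 4), round(r_b_of_a / r_a_of_b, 2))
# 0.3368  0.2174  1.55   (the asymmetry mentioned above)
```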
(r7) Fréchet-Hoeffding inequalities for the regression coefficients between two random events:

max{−P(A)/[1 − P(B)], −[1 − P(A)]/P(B)} ≤ r_B(A) ≤ min{P(A)/P(B), [1 − P(A)]/[1 − P(B)]};

max{−P(B)/[1 − P(A)], −[1 − P(B)]/P(A)} ≤ r_A(B) ≤ min{P(B)/P(A), [1 − P(B)]/[1 − P(A)]}.

The proofs are relatively simple consequences of property δ11) of the connection function. This last property is anticipated to be used in the simulation of dependent random events with a desired value of the regression coefficient and given marginal probabilities P(A) and P(B). The given inequalities show that some restrictions must be satisfied in this respect.
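The inequalities in (r7) can be checked mechanically before one attempts to simulate dependent events with prescribed marginals. A small feasibility-check sketch (our own, with hypothetical target values):

```python
def r_b_of_a_bounds(p_a, p_b):
    """Fréchet-Hoeffding bounds for r_B(A), property (r7); assumes 0 < P(B) < 1."""
    lower = max(-p_a / (1.0 - p_b), -(1.0 - p_a) / p_b)
    upper = min(p_a / p_b, (1.0 - p_a) / (1.0 - p_b))
    return lower, upper

def feasible(r, p_a, p_b):
    lo, hi = r_b_of_a_bounds(p_a, p_b)
    return lo <= r <= hi

print(r_b_of_a_bounds(0.08, 0.05))        # ≈ (-0.0842, 0.9684)
print(feasible(0.3368, 0.08, 0.05))       # True  (the Example 1 value)
print(feasible(-0.5, 0.08, 0.05))         # False (too negative for these marginals)
```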
Remark 3. It took some time to clarify for myself the true reason for the name "regression coefficient". It contains a hint to follow, and it needs some additional knowledge about random variables (r.v.'s), regression modeling, and the concepts of expectation and variance.
Let I_A(ω) and I_B(ω) be the indicator r.v.'s as introduced in Remark 1, where the argument ω is a symbol for an arbitrary outcome of the experiment. Formally, construct the following "regression model", which represents a possible linear relationship

I_B(ω) = α + β·I_A(ω) + ε(ω).        (11)

It allows one to "predict" the value of the indicator I_B(ω) if one knows just the value of the indicator I_A(ω), and admits an error ε(ω) = I_B(ω) − [α + β·I_A(ω)] in this prediction with the following desired properties: ε has zero expectation and minimum variance, i.e. the coefficients α and β are such numbers that

E[I_B(ω) − α − β·I_A(ω)] = 0,  and  Var[I_B(ω) − α − β·I_A(ω)] = min_{a,b} Var[I_B(ω) − a − b·I_A(ω)].        (12)

The first equation gives the relation α = P(B) − β·P(A). When this is substituted in the second equation of (12), it turns into a one-variable optimization problem. By applying the standard mathematical calculations and some algebra, we arrive at the conclusion that the value of β must minimize the expression
β²·P(A)[1 − P(A)] − 2β·[P(A ∩ B) − P(A)·P(B)] + P(B)[1 − P(B)].
Therefore, its value is

β = [P(A ∩ B) − P(A)·P(B)] / [P(A)·P(Ā)] = δ(A, B) / [P(A)·P(Ā)],        (13)

where δ(A, B) is the connection between the two events discussed in the previous section. Hence, the other "optimal" coefficient has the value

α = P(B) − δ(A, B) / P(Ā).        (14)
If one substitutes P(B) = P(A ∩ B) + P(B ∩ Ā) in the numerator of expression (13), and uses formula (3) for the conditional probabilities, then after some regrouping of terms in the numerator and a partitioning of the obtained expression into simple fractions, one will agree that the optimal value of the coefficient β has the following equivalent form of representation:
β = P(B | A) − P(B | Ā) = r_A(B).
Analogous manipulations with expression (14) will show that the coefficient α in equation (11) has the following equivalent representation:
α = P(B | Ā).
If in addition we take into account that the indicators of the complementary events are related by the equation I_Ā(ω) = 1 − I_A(ω), and use it in the main regression model (11), we will see that the same equation can be used for the "optimal prediction" of the values of I_B(ω) when the indicator r.v. I_Ā(ω) is used; namely, we get
I_B(ω) = P(B | Ā) + r_A(B)·I_A(ω) + ε = P(B | A) − r_A(B)·I_Ā(ω) + ε.
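The derivation above can be verified numerically: the least-squares slope of I_B on I_A equals r_A(B), and the intercept equals P(B | Ā). The following sketch (ours, not part of the original exposition) uses the joint distribution of the indicators implied by the Example 1 probabilities:

```python
p_a, p_b, p_ab = 0.08, 0.05, 0.02

# Joint distribution of the pair (I_A, I_B): the four points and their probabilities
points = {(1, 1): p_ab,
          (1, 0): p_a - p_ab,
          (0, 1): p_b - p_ab,
          (0, 0): 1 - p_a - p_b + p_ab}

e_a = sum(p * x for (x, _), p in points.items())            # E(I_A) = P(A)
e_b = sum(p * y for (_, y), p in points.items())            # E(I_B) = P(B)
cov = sum(p * x * y for (x, y), p in points.items()) - e_a * e_b
var_a = e_a * (1 - e_a)                                     # Var(I_A) = P(A)P(not A)

beta = cov / var_a                    # least-squares slope of (11)
alpha = e_b - beta * e_a              # least-squares intercept of (11)

r_a_of_b  = p_ab / p_a - (p_b - p_ab) / (1 - p_a)           # P(B|A) - P(B|not A)
p_b_not_a = (p_b - p_ab) / (1 - p_a)                        # P(B|not A)
print(round(beta, 4), round(r_a_of_b, 4))    # both ≈ 0.2174
print(round(alpha, 4), round(p_b_not_a, 4))  # both ≈ 0.0326
```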
With this Remark 3 we gave an explanation of the genealogy of the regression coefficients. It is possible that in a textbook this measure of dependence would be introduced not as in Definition 2, at the early stage with the probability concepts and rules, but much later, after distributions and expectations for random vectors are introduced. In our opinion, such a delay would lose the opportunity to offer an early discussion of dependence, and the challenge to offer the student an important gate to many small research studies on dependence.
The obtained regression equation gives us the opportunity to explain the meaning of the specific numerical values of the regression coefficient r_A(B). We need to think in terms of outcomes ω of the experiment (or in terms of individuals who carry the feature A, equivalent to a statement that ω favors the event A). Then P(B | Ā) is something like the "net value" of the indicator variable I_B(ω), and r_A(B) is the (positive or negative) contribution of any single outcome ω to the prediction of the value of I_B(ω) via the values of I_A(ω). The greater the value of r_A(B) is, the more the outcomes that favor the event A will contribute to (will also favor) the values of the event B. The asymmetry in this form of dependence of one event on the other can be explained by the capacity of one or the other event. Events with less capacity (a smaller amount of favorable outcomes) will have less influence on events with larger capacity. Therefore, when r_A(B) is less than r_B(A), it can be concluded that the event A is "more powerful" in its influence on B than the power of the event B in its influence on A. We should agree with this and accept it as reflecting what indeed exists in real life. At the same time, by catching the asymmetry with the proposed measures we are convinced of their flexibility and utility.
We guess that it is now possible to use this for the construction of some gradation with respect to the magnitude of the strength of dependence of one of the events on the other one, according to the distance of the regression coefficient from zero (where independence stays). For instance, if the value is within .05 of zero, the event could be classified as "almost independent" of the other; for distances between .05 and .2 from zero, the event may be classified as weakly dependent on the other; if the distance is between .2 and .45, the event could be classified as moderately dependent; from .45 to .8, as dependent on average; and above .8, as strongly dependent. Everybody will understand that this classification is pretty much conditional, and is made up by the author. However, it shows a possibility for the use of these coefficients.
What may be interesting here is that, despite the asymmetry, it is possible, when we fix one of the events, say B, and consider any finite sequence A₁, A₂, …, A_n of given random events, to order these events according to their "magnitude of influence on the event B", which corresponds to the inverse order of the absolute values of their regression coefficients with respect to the event B.
One last thing we would like to discuss here is how to use the known values of the regression coefficients to predict the posterior probabilities, e.g. P(B | A), when the prior (marginal) probabilities P(A) and P(B) are known. Assume that r_A(B) is known. Then equations (9) and (10) allow evaluating r_B(A), and therefore in the reverse case the rules will be symmetric to what we show here for the evaluation of the posterior (conditional) probability P(B | A).
From Definition 2, property δ5), and equations (10) we get the sequence of equivalent presentations

P(B | A) = r_A(B) + P(B | Ā) = r_A(B) + P(B) + δ(Ā, B)/P(Ā) = P(B) + r_A(B)·[1 − P(A)].

If we substitute in this last rule the calculated values r_A(B) = .2174, P(A) = .08 and P(B) = .05, we will get the same value P(B | A) = .05 + .2174·(.92) = .25.
4. Correlation between two random events
It is not known to me why this brilliant mathematician, a doctor of the French Sorbonne, Obreshkov, introduces the correlation between two random events as equal to the geometric average of the two regression coefficients r_B(A) and r_A(B), with the sign of either one. His definition is as follows:
Definition 3. The correlation coefficient between two events A and B is the number

R(A, B) = ±√[r_B(A)·r_A(B)],        (15)

whose sign, plus or minus, is the sign of either of the two regression coefficients.
If we use the representations (9) in equation (15), we immediately obtain an equivalent representation of the correlation coefficient R(A, B) in terms of the connection δ(A, B), namely

R(A, B) = δ(A, B) / √[P(A)·P(Ā)·P(B)·P(B̄)] = [P(A ∩ B) − P(A)·P(B)] / √[P(A)·P(Ā)·P(B)·P(B̄)].        (16)

We do not forget that neither of the events A or B is a zero or a sure event. However, if that happens, then δ(A, B) = r_B(A) = r_A(B) = R(A, B) = 0.
Remark 4. From representation (16) we understand that, in fact, the correlation coefficient R(A, B) between the events A and B is equal to the correlation coefficient ρ(I_A, I_B) between the random variables I_A and I_B (the indicators of the two random events A and B, exactly as these are defined in Remark 1). To see this, one needs the concept of the correlation coefficient between two random variables, which is based on the concepts of a random variable (r.v.) X, of probability distribution, of mathematical expectation E(X) and variance D(X) of a r.v., of two-dimensional distributions, and of the related expected values of functions of random variables. Those who have this knowledge should be familiar with the formula

ρ(X, Y) = Cov(X, Y) / √[D(X)·D(Y)] = [E(XY) − E(X)·E(Y)] / √[D(X)·D(Y)],

which defines the correlation coefficient between two r.v.'s X and Y. When X = I_A and Y = I_B, it is true that
E(XY) = E(I_A·I_B) = P(A ∩ B);  E(I_A) = P(A);  D(I_A) = P(A)[1 − P(A)] = P(A)·P(Ā),
and ultimately it will be obtained that ρ(I_A, I_B) = R(A, B).
It is conceivable to use this approach as the definition of the correlation coefficient between two random events, namely as equal to the correlation between their indicators. But such an approach would require the introduction of all the complex concepts listed above (including integration of functions of r.v.'s and more), and would not be available at the basics of Probability Theory. For this reason, we personally admire the approach proposed by Obreshkov.
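A quick numerical confirmation of Remark 4 and of Definition 3, again with the Example 1 probabilities (a sketch of ours): the geometric-average definition (15) and the indicator-correlation representation (16) give the same number.

```python
import math

p_a, p_b, p_ab = 0.08, 0.05, 0.02
delta = p_ab - p_a * p_b

r_b_of_a = delta / (p_b * (1 - p_b))          # equation (9)
r_a_of_b = delta / (p_a * (1 - p_a))

sign = 1 if delta >= 0 else -1
R_def3 = sign * math.sqrt(r_b_of_a * r_a_of_b)                 # Definition 3
R_eq16 = delta / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))  # equation (16)

print(round(R_def3, 4), round(R_eq16, 4))     # both ≈ 0.2706
```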
Let us now discuss the properties of R(A, B) and see what the knowledge of the correlation coefficient between two random events gives.
R1. It is fulfilled that R(A, B) = 0 if and only if the two events A and B are independent.
Proof. First, let us note that equation (16) seems useless when P(A) = 0 or P(B) = 0 (i.e. when one of the events is a zero event), or when P(A) = 1 or P(B) = 1 (i.e. when one of the events is a sure event). But in such situations A and B are independent. According to our earlier agreements we have δ(A, B) = r_B(A) = r_A(B) = 0, and also R(A, B) = 0. However, if R(A, B) = 0 on other occasions (when P(A) ∈ (0, 1) and P(B) ∈ (0, 1)), then the zero correlation coefficient holds only when also δ(A, B) = 0, i.e. when A and B are independent.
R2. The correlation coefficient R(A, B) is always a number between −1 and +1, i.e. −1 ≤ R(A, B) ≤ 1.
R2.1. The equality R(A, B) = 1 holds if and only if the events A and B are equivalent, i.e. when A = B.
R2.2. The equality R(A, B) = −1 holds if and only if the events A and B̄ are equivalent, i.e. when A = B̄ (then, of course, it also holds that Ā = B).
Proof. The assertions here follow directly from Definition 3 and from properties (r3) of the regression coefficients r_B(A) and r_A(B).
With some more work these properties can be derived from equation (16), if this equality is used as a direct definition of the correlation coefficient between two random events and nothing is known about the regression coefficients. We leave this challenge to the readers.
R3. The correlation coefficient R(A, B) has the same sign as the other measures of dependence between the two random events A and B (and this is the sign of the connection δ(A, B), as well as the sign of the two regression coefficients r_B(A) and r_A(B)). The knowledge of R(A, B) allows calculating the posterior probability of one of the events under the condition that the other one has occurred. For instance, P(B | A) will be determined by the rule

P(B | A) = P(B) + R(A, B)·√[P(Ā)·P(B)·P(B̄) / P(A)].        (17)

Proof. The first part of the statement follows from relationship (16) and from equations (9). To prove (17) we use equation (16) and the presentation
δ(A, B) = [P(B | A) − P(B)]·P(A)
in it. This leads to the relation
R(A, B) = [P(B | A) − P(B)]·P(A) / √[P(A)·P(Ā)·P(B)·P(B̄)].
When this equation is solved with respect to P(B | A), it gives (17).
Once again, equation (17) allows calculating the posterior probability P(B | A) of the random event B when the correlation coefficient R(A, B) and the prior probabilities P(A) and P(B) of both events are known, plus the information that the other event A has occurred. This rule is reminiscent of the Bayes rule for posterior probabilities. However, in our case there is no need for B to be a member of a complete system of events (the so-called "hypotheses", representing a partitioning of the sure event Ω into mutually exclusive particular cases of the form Ω = B₁ + … + B_n). Also, there is no need for the conditional probabilities of the event A under the assumption that either of the hypotheses took place, which are needed in order to be able to apply the Bayes rule

P(B_j | A) = P(A | B_j)·P(B_j) / [P(A | B₁)·P(B₁) + … + P(A | B_n)·P(B_n)]

for the posterior probabilities. For our rule there is no necessity for the event B to be part of a system of hypotheses; it can be any event. It is sufficient to have the prior probabilities P(A) and P(B) available (then the probabilities of their complements P(Ā) and P(B̄) needed in (17) are determined by the well-known relations P(Ā) = 1 − P(A)), to have an available numerical value of the correlation coefficient R(A, B), and to apply rule (17).
Important Note: We should note that in the definition of R(A, B) (by formula (16) and Definition 1, or according to Definition 3 and the equations participating in Definition 2) only probabilities participate. And these probabilities have natural frequency-based statistical estimations, so that there is an easy and natural way to estimate the correlation coefficient R(A, B). This is our reason to believe that the rule proposed here for estimating (evaluating) the posterior probabilities can be turned into a powerful tool for calculating posterior probabilities, with a brilliant use of the statistical information for practical purposes.
In addition, we notice that the net increase or decrease in the posterior probability compared to the prior probability, as expressed by formula (17), is equal to the quantity R(A, B)·√[P(Ā)·P(B)·P(B̄)/P(A)], and depends only on the value of the mutual correlation R(A, B) (positive or negative) and on the prior probabilities of the two events A and B. A comprehensive rule can also be written for the case when it is known that the complement Ā of the event A has occurred: then the quantity R(A, B) must be replaced by −R(A, B), the symbol A must be replaced by Ā, and vice versa, Ā is to be replaced by A. The readers will easily work out the details and get the equation

P(B | Ā) = P(B) − R(A, B)·√[P(A)·P(B)·P(B̄) / P(Ā)].

R4. It is fulfilled that R(Ā, B) = R(A, B̄) = −R(A, B), and R(Ā, B̄) = R(A, B).
Proof. These equations follow from presentation (16) for R(A, B), and from property δ5) of the connection δ(A, B) between two events, applied to the possible combinations of these events and their complements.
Particular cases. The rules found above always work when neither of the random events A or B is a zero or a sure event; otherwise we have R(A, B) = 0. Consider now the mutual allocations of the two events shown on Fig. 1. For each of these particular cases the correlation coefficient takes the following specific values:
1a. When A ⊆ B, then
R(A, B) = √[P(A)·P(B̄) / (P(Ā)·P(B))];
1b. Whenever the two events are mutually exclusive, i.e. A ∩ B = ∅, then
R(A, B) = −√[P(A)·P(B) / (P(Ā)·P(B̄))].
The use of the numerical values of the correlation coefficient is similar to the use of the two regression coefficients. The closer R(A, B) is to zero, the "closer" the two events A and B are to independence. Let us note once again that R(A, B) = 0 if and only if the two events are independent.
For random variables a similar statement is not true. The equality to zero of their mutual correlation coefficient does not mean independence, but only registers an absence of correlation. The two random variables are then called non-correlated.
The closer R(A, B) is to the number 1, the "denser one within the other" are the events A and B, and when R(A, B) = 1 the two events coincide (are equivalent).
The closer R(A, B) is to the number −1, the "denser one within the other" are the events A and B̄, and when R(A, B) = −1 these two events coincide (are equivalent). Equally dense one within the other are then the events Ā and B.
These interpretations seem convenient when conducting research and investigations associated with qualitative (non-numeric) factors and characteristics. Such cases are common in sociology, ecology, jurisprudence, medicine, criminology, design of experiments, and other similar areas.
R5. Fréchet-Hoeffding inequalities for the correlation coefficient:

max{−√[P(A)·P(B) / (P(Ā)·P(B̄))], −√[P(Ā)·P(B̄) / (P(A)·P(B))]} ≤ R(A, B) ≤ min{√[P(A)·P(B̄) / (P(Ā)·P(B))], √[P(Ā)·P(B) / (P(A)·P(B̄))]}.

Its proof is similar to those for the regression coefficients. We omit the details. Just notice that these inequalities are important if one wants to construct (e.g. for simulation purposes) events with given individual probabilities and a desired mutual correlation.
Example 1 (continued): We will calculate the numerical value of the correlation coefficient R(A, B) for the events considered in Example 1 according to its definition, because we have the numerical values of the two regression coefficients r_A(B) and r_B(A) from the previous section. In this way we get
R(A, B) = √[(.3368)(.2174)] = .2706.
Analogously to the work with the regression coefficients, the numeric value of the correlation coefficient could be used for some classification of the degree (strength) of the mutual dependence. The practical implementation will give a clear indication about the rules of such classifications. From our example we see that the correlation coefficient is something in-between the two regression coefficients. To a certain degree it absorbs the imbalance (the asymmetry) between the two regression coefficients, and looks like a well-balanced measure of dependence between the two events, including its magnitude of strength.
Assume that R(A, B) = .2706 is known, as calculated. Then rule (17) allows evaluating P(B | A), as well as P(A | B), the posterior (conditional) probabilities of one event given the information that the other one occurs. If we substitute P(A) = .08 and P(B) = .05, we will get the same value
P(B | A) = .05 + (.2706)·√[(.92)(.05)(.95)/(.08)] = .25
as in Example 1 of Section 2.
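Rule (17) is straightforward to apply once R(A, B) and the two prior probabilities are in hand. A short sketch of ours reproducing the posterior probabilities of the example:

```python
import math

def posterior_from_correlation(p_target, p_given, R):
    """P(target | given) = P(target) + R * sqrt(P(given') P(target) P(target') / P(given)),
    rule (17); the prime denotes the complementary event."""
    return p_target + R * math.sqrt((1 - p_given) * p_target * (1 - p_target) / p_given)

p_a, p_b, R = 0.08, 0.05, 0.2706
print(round(posterior_from_correlation(p_b, p_a, R), 4))   # P(B | A) ≈ 0.25
print(round(posterior_from_correlation(p_a, p_b, R), 4))   # P(A | B) ≈ 0.40
```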
Similar examples could be given in a variety of areas of our life. For instance, one could consider the possible degree of dependence between tornado touchdowns in Kansas (event A) and in Alabama (event B); in sociology, a family with 3 or more children (event A) and an income above the average (event B); in medicine, someone who gets an infarct (event A) and a stroke (event B). More examples, far better and more meaningful, are expected when the revenue of this approach is assessed.
5. Empirical estimation of the measures of dependence between random events
The fact that these measures of the dependence between random events are made of their probabilities makes them very attractive and, at the same time, easy for statistical estimation and practical use.
It is well known that if in N independent experiments an event A occurs k_A times, the statistical estimator of the probability P(A) is the ratio k_A/N. In this way all the probabilities in the definitions of the introduced measures can be statistically estimated and, when replaced in the equations for the measures, they represent the respective statistical estimations of these measures. By the way, the estimators obtained in this approach are also maximum likelihood estimators of these characteristics, since the estimator of the probability P(A) is a maximum likelihood estimator.
Let in N independent experiments (or observations) the random event A occur k_A times, the random event B occur k_B times, and the event A ∩ B occur k_{A∩B} times. Then the statistical estimators of our measures of dependence are respectively as follows:
For the connection between the two events the estimator is given by the formula

δ̂(A, B) = k_{A∩B}/N − (k_A/N)·(k_B/N).

For the two regression coefficients the estimators are

r̂_A(B) = [k_{A∩B}/N − (k_A/N)·(k_B/N)] / [(k_A/N)·(1 − k_A/N)];

r̂_B(A) = [k_{A∩B}/N − (k_A/N)·(k_B/N)] / [(k_B/N)·(1 − k_B/N)],

and the correlation coefficient has the estimator

R̂(A, B) = [k_{A∩B}/N − (k_A/N)·(k_B/N)] / √[(k_A/N)·(1 − k_A/N)·(k_B/N)·(1 − k_B/N)].
According to the rules of statistical estimation, these estimators are all consistent (i.e. for large numbers of observations their values are, with high probability, close to the true values of the estimated parameters). Moreover, the estimator of the connection δ̂(A, B) is also unbiased, since its expectation is the very connection function δ(A, B), i.e. there is no systematic error in this estimate.
As a consequence, the estimators proposed here for the measures of the dependence between any two random events can be used for practical purposes with the reasonable interpretations and explanations shown above in our theoretical discussion and, to a certain extent, in our example.
As we see, the use of the conditional probabilities in the estimation of the regression coefficients is not needed. We personally are excited by the opportunities offered by this approach. Even in the example given here we are using the frequency interpretation of the probabilities, and not any assumed theoretical values.
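All four empirical estimators of this section can be packed into a single routine. The sketch below is ours; it takes the raw counts and returns the estimates, and applied to the counts of Example 1 it reproduces the values obtained earlier.

```python
import math

def dependence_estimates(N, k_a, k_b, k_ab):
    """Empirical estimators of Section 5 from the counts of A, B and A ∩ B in N trials."""
    p_a, p_b, p_ab = k_a / N, k_b / N, k_ab / N
    delta = p_ab - p_a * p_b
    return {
        "delta":  delta,
        "r_A(B)": delta / (p_a * (1 - p_a)),
        "r_B(A)": delta / (p_b * (1 - p_b)),
        "R":      delta / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b)),
    }

print(dependence_estimates(1000, 80, 50, 20))
# {'delta': 0.016, 'r_A(B)': 0.2174, 'r_B(A)': 0.3368, 'R': 0.2706}  (values rounded)
```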
6. Some warnings
First of all, we should note that the introduced measures of dependence between random events are not transitive. It is possible that the random event A is positively associated with the random event B, and this event B is positively associated with a third random event C, but the event A is negatively associated with C. To see this it is sufficient to imagine the events A and B compatible (with a non-empty intersection, as shown on Fig. 1c), and the events B and C also compatible, while A and C are incompatible, mutually exclusive, and therefore with a negative connection. Then the situation mentioned here may be observed. As we have seen, for mutually exclusive events the connection is negative, while for the non-exclusive pairs (A, B) and (B, C) every kind of dependence is possible. However, in these facts we also see a lot of flexibility when studying the dependence not as an integral feature, but as composed of a number of particular details.
7. An illustration of possible applications
As an illustration of what one can do with the measures of dependence between two random events proposed here, we analyze the data from Table 2.4 of the book of Alan Agresti, Categorical Data Analysis (2006).
The following table represents the observed data about the yearly income of people and their job satisfaction.
Table 1: Observed frequencies of Income and Job Satisfaction

Income US $$        Very Dissatisfied   Little Satisfied   Moderately Satisfied   Very Satisfied   Total (marginal)
< 6,000                    20                  24                   80                  82               206
6,000–15,000               22                  38                  104                 125               289
15,000–25,000              13                  28                   81                 113               235
> 25,000                    7                  18                   54                  92               171
Total (marginal)           62                 108                  319                 412               901
When we apply the empirical rules for evaluating the probabilities in each category,

P_{i,j} = n_{i,j}/n,   P_{i,.} = n_{i,.}/n,   P_{.,j} = n_{.,j}/n,

the above table produces the empirical probabilities for a new observation to fall in the respective cell.
Table 2: Empirical estimations of the probabilities P_{i,j}, P_{i,.}, P_{.,j} for each particular case

Income US $$        Very Dissatisfied   Little Satisfied   Moderately Satisfied   Very Satisfied   Total (marginal)
< 6,000                  .02220              .02664               .08879              .09101            .22864
6,000–15,000             .02442              .04217               .11543              .13873            .32075
15,000–25,000            .01443              .03108               .08990              .12542            .26083
> 25,000                 .00776              .01998               .05993              .10211            .18978
Total (marginal)         .06881              .11987               .35405              .45727           1.00000
Applying the rules given by the definitions of the proposed measures of dependence between random events, and using either the empirical probabilities of Table 2 or, alternatively, the rules for the empirical estimation of these measures described in Section 5, we obtain the following Tables 3 to 6.
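For readers who want to redo these computations, the following sketch (ours) produces the connection matrix of Table 3 directly from the counts of Table 1; the other tables are obtained from it in the same spirit.

```python
# delta(A_i, B_j) = P_ij - P_i. * P_.j , with the empirical probabilities of Table 2.
counts = [  # rows: the four income groups; columns: VD, LS, MS, VS
    [20, 24,  80,  82],
    [22, 38, 104, 125],
    [13, 28,  81, 113],
    [ 7, 18,  54,  92],
]
n = sum(sum(row) for row in counts)                     # 901 observations
row_m = [sum(row) / n for row in counts]                # income marginals P_i.
col_m = [sum(col) / n for col in zip(*counts)]          # satisfaction marginals P_.j

delta = [[counts[i][j] / n - row_m[i] * col_m[j] for j in range(4)]
         for i in range(4)]
print(round(delta[0][0], 6))   # 0.006465 ≈ 0.006467, the first entry of Table 3
                               # (small differences come from rounding in Table 2)
```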
Table 3: Empirical estimations of the connection function for each particular category of Income and Job Satisfaction, δ(IncomeGroup_i, Satisfaction_j)

Income US $$           Very Dissatisfied   Little Satisfied   Moderately Satisfied   Very Satisfied
< 6,000                     .006467            −.000768              .007840            −.013541
6,000–15,000                .002349             .003722              .001868            −.007939
15,000–25,000              −.003518            −.000186             −.002447             .006150
> 25,000                   −.005299            −.002769             −.007262             .015330
Total sum in a column        0                   0                    0                   0
An interesting and important feature of this table is that the sum of all entries in a row, as well as the sum of all entries in a particular column, equals zero. This property is in accordance with property δ3), because each such sum represents the total connection of the respective category of the given factor with the union of all categories of the other factor, which equals the sure event.
The numerical values of the connection function do not show the magnitude of the dependence between the categories of the income and the levels of job satisfaction. However, their sign indicates a direction of the possible dependence compared to the "neutral stage" of independence. A positive sign indicates a positive local association between these two variables, and a negative sign indicates a negative association in the locality of these particular categories of the two variables. (In the original color version of the tables, the negative associations are marked in cold blue, and the positive areas of association are highlighted in warm pink.)
The numerical values of the regression coefficients are true measures of the magnitude of the dependence between the two variables. Besides the positive association between the lowest categories of the income and the low levels of satisfaction, we observe a negative association between the low level of income and the highest levels of job satisfaction. However, these magnitudes are small, close to the zone of independence. We also observe the asymmetry of the dependence when comparing the corresponding entries of Tables 4 and 5. For instance, look at the first entries in both tables of the regression coefficients: the one in Table 4 is about 3 times greater than the respective entry in Table 5. It says that the income group is about three times more dependent on the answers about job satisfaction (Table 4) than the job satisfaction answer is on the income group.
Table 4: Empirical estimations of the regression coefficient of each particular level of income with respect to the job satisfaction, r_{Satisfaction_j}(IncomeGroup_i)

Income US $$        Very Dissatisfied   Little Satisfied   Moderately Satisfied   Very Satisfied
< 6,000                  .100933            −.00727               .034281            −.05456
6,000–15,000             .036663             .035276              .00817             −.03199
15,000–25,000           −.054900            −.00176              −.0107               .024782
> 25,000                −.082696            −.02625              −.03175              .061768
Table 5: Empirical estimations of the regression coefficient of each particular level of the job satisfaction with respect to the income, r_{IncomeGroup_i}(Satisfaction_j)

Income US $$        Very Dissatisfied   Little Satisfied   Moderately Satisfied   Very Satisfied
< 6,000                  .036670            −.00435               .044454            −.07677
6,000–15,000             .010783             .017082              .008576            −.03644
15,000–25,000           −.018246            −.00096              −.01269              .0319
> 25,000                −.034460            −.01801              −.04723              .099694

For instance, the number r_{Very Dissatisfied}(< 6,000) = .100933 in Table 4 indicates a positive dependence of the category of the lowest income "< 6,000" on the category "Very Dissatisfied" of the Job Satisfaction variable. The same number with a negative sign, r_{Very Dissatisfied}(income ≥ 6,000) = −.100933, indicates the negative strength of dependence of all the other income categories, higher than "< 6,000", on the category "Very Dissatisfied" of the Job Satisfaction variable. Similarly, the sums of the numbers from several cells in a column of Table 4 (or in a row of Table 5) will indicate the strength of dependence of the union of the categories of the respective factor "Income" on the category of "Job Satisfaction" corresponding to that column (with an analogous switch of the factors' interpretation).
The two regression coefficient matrices allow us to calculate the correlation coefficients between every pair of particular categories of the two factors, according to the rules of Section 4. Table 6 summarizes these calculations. The numbers actually represent the numerical estimations of the respective correlation coefficients. Obviously, each of these numbers gives the local average measure of dependence between the two factors. Unfortunately, the summation of the numbers in a vertical or horizontal line does not have the same or a similar meaning as in the cases of the connection or regression matrices. Also, the sums of the numbers in a row or in a column do not equal zero as above. A graphical presentation of the information given in Tables 3 to 6 is shown on Fig. 2 – Fig. 5 at the end.
Table 6: Empirical estimations of the correlation coefficient between each particular income group and the categories of the job satisfaction, R(IncomeGroup_i, Satisfaction_j)

Income US $$        Very Dissatisfied   Little Satisfied   Moderately Satisfied   Very Satisfied
< 6,000                  .060838            −.005623              .039037            −.064721
6,000–15,000             .019883             .024548              .008371            −.034144
15,000–25,000           −.031649            −.001302             −.011653             .028117
> 25,000                −.053383            −.02174              −.038723             .078472
A prediction of the income group when the job satisfaction, the marginal probabilities, and the connection (or correlation coefficient) matrix are known.
If one knows the category B of the job satisfaction, and has handy the connection function, or either of the other measures of dependence between these categories, plus the marginal unconditional probabilities P(A) and P(B) of the particular groups, then the conditional (posterior) probabilities P(A | B) for the income groups can be re-evaluated by making use of one of the rules (5) or (17), or an equivalent one. The following Table 7 presents these probabilities and, for comparison, the prior probabilities P(A) are given in the last column.
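Continuing the sketch shown before Table 3 (and reusing its `counts`, `row_m`, `col_m` and `delta`), the whole of Table 7 is one application of rule (5):

```python
# P(A_i | B_j) = P(A_i) + delta_ij / P(B_j), rule (5), for every income/satisfaction pair
forecast = [[row_m[i] + delta[i][j] / col_m[j] for j in range(4)] for i in range(4)]

print(round(forecast[0][0], 4))   # ≈ 0.3226: P(income < 6,000 | Very Dissatisfied)
# every column sums to 1, since the income groups partition the sure event
print([round(sum(forecast[i][j] for i in range(4)), 4) for j in range(4)])
```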
Table 7: Forecast of the probabilities P(A_i | B_j) = P(A_i) + δ(A_i, B_j)/P(B_j) of a particular income group, given the categories of the job satisfaction

Income US $$                  Very Dissatisfied   Little Satisfied   Moderately Satisfied   Very Satisfied   Unconditional P(A)
< 6,000                            .322628            .222241              .250784             .199029           .22864
6,000–15,000                       .354890            .351798              .326027             .303387           .32075
15,000–25,000                      .209708            .259281              .253919             .274280           .26083
> 25,000                           .112774            .166681              .169270             .223304           .18978
Total (Σ_i P(A_i | B_k) = 1)      1.00000            1.00000              1.00000             1.00000           1.00000
The red numbers (in the original color version) show the "hot local positions", where the conditional probability increases compared to the prior (unconditional) probability. The blue-colored numbers show the places of a local decrease in the posterior probability, and these are the places where the connection is negative. Now we know that if someone's answer is "Very Dissatisfied", then the highest chance is that it comes from someone who belongs to the income range 6,000–15,000. The chances that such an answer comes from the income group "< 6,000" have increased by approximately .10. If someone's answer is "Very Satisfied", then the lowest chances are that it comes from the income group "< $6,000", and this is totally different (as is the entire ordering of the income classes) from the prior distribution of the income. Also, the sum of the numbers in a column equals 1, since these are all the possible parts of the sure event (S = ∪_i A_i).
Analogously, if one knows the income group A, and has handy the connection function, or either of the other measures of dependence, plus the marginal probabilities P(B_j) of the particular groups, then the conditional (posterior) probabilities P(B_j | A_i) of the job satisfaction groups of answers can be re-evaluated by making use of the same rules respectively. The following Table 8 presents these probabilities and, for comparison, the priors P(B_j) are given in the last row.
Table 8: Forecast of the probabilities P(B_j | A_i) = P(B_j) + δ(A_i, B_j)/P(A_i) of a particular job satisfaction category, given the income group

Income US $$            Very Dissatisfied   Little Satisfied   Moderately Satisfied   Very Satisfied   Total (Σ_k P(B_k | A_i) = 1)
< 6,000                      .097096            .116515              .388340             .398049               1
6,000–15,000                 .076134            .131473              .359875             .432518               1
15,000–25,000                .055323            .119158              .344669             .480850               1
> 25,000                     .040889            .105280              .315787             .538044               1
Unconditional P(B)           .06881             .11987               .35405              .45727                1.00000
Here we would like to notice that similar "categorizations" can be made for any two numeric random variables, and what we see and read in the above tables can be used for studies of the local structure of dependence between random variables.
8. Conclusions
We discussed four measures of dependence between two random events. These measures are equivalent, and exhibit natural properties. The numerical values of the regression coefficients and of the correlation coefficient may serve as indicators of the magnitude of dependence between random events.
These measures provide simple ways to detect independence, coincidence, and degree of dependence.
When either measure of dependence is known, as well as the individual probability of each event, this allows restoration of all the other measures of dependence, and of the joint probability. It also serves for a better prediction of the chance of occurrence of one event, given that the other one occurs.
If applied to the events A = [a ≤ X < b] and B = [c ≤ Y < d], these measures immediately turn into measures of the LOCAL DEPENDENCE between the r.v.'s X and Y, associated with the rectangle [a, b] × [c, d] on the plane. Therefore, the measures proposed and discussed here offer a great tool in the study of local dependence.
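As a final illustration of this last remark, here is a small sketch (ours, on synthetic data) that estimates the four measures for the events A = [a ≤ X < b] and B = [c ≤ Y < d] from a sample of pairs (X, Y):

```python
import math, random

random.seed(0)
xs = [random.gauss(0, 1) for _ in range(5000)]
sample = [(x, 0.6 * x + 0.8 * random.gauss(0, 1)) for x in xs]   # a dependent pair (X, Y)

def local_dependence(sample, a, b, c, d):
    """Empirical delta, regression and correlation coefficients for the events
    A = [a <= X < b] and B = [c <= Y < d] (Section 5 estimators)."""
    N = len(sample)
    k_a  = sum(1 for x, y in sample if a <= x < b)
    k_b  = sum(1 for x, y in sample if c <= y < d)
    k_ab = sum(1 for x, y in sample if a <= x < b and c <= y < d)
    p_a, p_b, p_ab = k_a / N, k_b / N, k_ab / N
    delta = p_ab - p_a * p_b
    return {"delta": delta,
            "r_A(B)": delta / (p_a * (1 - p_a)),
            "r_B(A)": delta / (p_b * (1 - p_b)),
            "R": delta / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))}

print(local_dependence(sample, 0.0, 1.0, 0.0, 1.0))   # local measures on [0,1) x [0,1)
```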
References
[1] A. Agresti (2006). Categorical Data Analysis. John Wiley & Sons, New York.
[2] B. Dimitrov and N. Yanev (1991). Probability and Statistics, A Textbook. Sofia University "Kliment Ohridski", Sofia (second edition 1998, third edition 2007).
[3] N. Obreshkov (1963). Probability Theory. Nauka i Izkustvo, Sofia (in Bulgarian).
[4] Encyclopedia of Statistical Sciences (1981–1988), v. 1 – v. 9. Editors-in-Chief S. Kotz and N. L. Johnson. John Wiley & Sons, New York.
Fig. 2. Surface plot of the connection function between income levels and satisfaction levels, according to the data of Table 3.

Fig. 3. Surface plot of the regression coefficient function r_{Satisfaction_j}(IncomeGroup_i), according to the data of Table 4.

Fig. 4. Surface plot of the regression coefficient function r_{IncomeGroup_i}(Satisfaction_j), according to the data of Table 5.

Fig. 5. Surface plot of the correlation coefficient function R(IncomeGroup_i, Satisfaction_j), according to the data of Table 6.