Markov Chains J. Alfredo Blakeley-Ruiz

The Markov Chain
• Concept developed by Andrey Andreyevich Markov
• A set of states, exactly one of which is occupied at any
moment in time.
• These states are connected by transitions
• Each transition carries the probability of moving from one
state to another in one discrete time step.
• The transition from one state to another state depends
only on the current state, not any states that may have
existed in the past.
– Markov property
Explained Mathematically
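Given a series of states X1, X2, …, Xn:
$P(X_{n+1} = x \mid X_1 = x_1, X_2 = x_2, \dots, X_n = x_n) = P(X_{n+1} = x \mid X_n = x_n)$
As long as both conditional probabilities are well defined, i.e. $P(X_1 = x_1, \dots, X_n = x_n) > 0$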
Markov chains can be represented as a
weighted digraph
http://www.mathcs.emory.edu/~cheung/Courses/558/Syllabus/00/queueing/discrete-Markov.html
Markov chains can also be represented
by a transition matrix
       1     2     3     4
1    0.6   0.2   0.0   0.2
2    0.6   0.4   0.0   0.0
3    0.0   0.3   0.4   0.3
4    0.3   0.0   0.0   0.7
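A minimal sketch of how the matrix above can be used in code. It assumes the rows are the "from" states and the columns the "to" states (each row sums to 1), and it assumes the chain starts in state 1. The helper names `is_row_stochastic` and `step` are made up for this illustration.

```python
# Transition matrix from the slide: P[i][j] is read here as the probability
# of moving from state i+1 to state j+1 in one step (rows sum to 1).
P = [
    [0.6, 0.2, 0.0, 0.2],
    [0.6, 0.4, 0.0, 0.0],
    [0.0, 0.3, 0.4, 0.3],
    [0.3, 0.0, 0.0, 0.7],
]


def is_row_stochastic(matrix, tol=1e-9):
    """Return True if every entry is non-negative and each row sums to 1."""
    return all(min(row) >= 0 and abs(sum(row) - 1.0) < tol for row in matrix)


def step(dist, matrix):
    """Advance a row vector of state probabilities by one time step."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


start = [1.0, 0.0, 0.0, 0.0]   # assumption: the chain starts in state 1
after_one = step(start, P)      # same as row 1 of P
after_two = step(after_one, P)  # distribution two steps later

print(is_row_stochastic(P))
print(after_one)
print(after_two)
```

Because only the current distribution is passed to `step`, the code has the Markov property built in: nothing about earlier states is needed to compute the next one.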
Peter the Great
• Lived 1672-1725
• Tsar: 1682-1725
• Ushered Russia into the modern era
– Changed fashions – beard tax
– Modernized Russian military
– Changed capital to St. Petersburg
– Created a meritocracy
– Founded Academy of Sciences in St. Petersburg
• Imported many foreign experts (Euler)
• Later native Russians began to make their mark
• Nikolai Lobachevsky – non-Euclidean geometry
• Pafnuty Chebyshev – Markov’s advisor
• Some not so good things
– Huge wars of attrition with the Ottomans and the Swedes
– Meritocracy only for the nobles and clergy
– New tax laws turned the serfs into slaves
https://en.wikipedia.org/wiki/Peter_the_Great
Jacob Bernoulli
• Born: Basel, Switzerland
– 1654-1705
• University of Basel
• Theologian, astronomer, and
mathematician
– Supporter of Leibniz in calculus
controversy
– Credited with a huge list of
mathematical contributions
• Discovery of the constant e
• Bernoulli numbers
• Bernoulli’s golden theorem – Law of
large numbers
The law of large numbers
• The relative frequency, hnt, of an event with
probability p = r/t, t = r + s, in nt independent
trials converges in probability to p.
• This is also often called the weak law of large
numbers.
https://en.wikipedia.org/wiki/Law_of_large_numbers
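The convergence described above can be illustrated with a small simulation. This is only a sketch: the probability `p = 0.3`, the trial counts, and the fixed seed are arbitrary choices made for the illustration.

```python
import random


def relative_frequency(p, trials, seed=0):
    """Fraction of successes in `trials` independent trials with success probability p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(trials) if rng.random() < p)
    return hits / trials


p = 0.3  # assumed event probability
freqs = {n: relative_frequency(p, n) for n in (10, 1000, 100000)}

for n, freq in freqs.items():
    print(n, freq, abs(freq - p))
```

With more trials the relative frequency typically lands closer to p, which is the behaviour the weak law describes.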
Adolphe Quetelet
• Born: Ghent, French Republic
– 1796-1874
• Alma Mater: University of Ghent
• Brussels Observatory
– Astronomer
– Statistician
– Mathematician
– Sociologist
https://en.wikipedia.org/wiki/Adolphe_Quetelet
• From a sociological perspective he concluded that,
while free will was real, the law of large numbers
demonstrated that it did not matter.
Pafnuty Chebyshev
• Born in Akatovo, Russian Empire
– 1821-1894
• Alma mater: Moscow University
• Professor: St. Petersburg
University
– Andrei Markov
– Aleksandr Lyapunov
• Chebyshev inequality
– Proves weak law of large numbers
https://en.wikipedia.org/wiki/Pafnuty_Chebyshev
Chebyshev inequality
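• Let $x_1, \dots, x_n$ be a sequence of independent and
identically distributed variables with mean $\mu$ and
standard deviation $\sigma$.
• $\bar{X} = (x_1 + \dots + x_n)/n$
• $\langle \bar{X} \rangle = \mu$
• $\mathrm{Var}(\bar{X}) = \sigma^2/n$
• Chebyshev’s inequality states that for all $\varepsilon > 0$:
$P(|\bar{X} - \mu| \ge \varepsilon) \le \mathrm{Var}(\bar{X})/\varepsilon^2 = \sigma^2/(n\varepsilon^2)$
→ $\lim_{n \to \infty} P(|\bar{X} - \mu| \ge \varepsilon) \le \lim_{n \to \infty} \sigma^2/(n\varepsilon^2) = 0$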
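As a rough numerical check of the bound, the sketch below compares the observed frequency of large deviations of a sample mean with the Chebyshev bound σ²/(nε²). It assumes variables uniform on [0, 1) (mean 1/2, variance 1/12). The values of `n`, `eps`, the number of experiments, and the seed are arbitrary choices for the illustration.

```python
import random


def tail_frequency(n, eps, experiments=2000, seed=0):
    """Observed fraction of experiments in which |sample mean - mu| >= eps."""
    rng = random.Random(seed)
    mu = 0.5  # mean of a uniform variable on [0, 1)
    misses = 0
    for _ in range(experiments):
        mean = sum(rng.random() for _ in range(n)) / n
        if abs(mean - mu) >= eps:
            misses += 1
    return misses / experiments


def chebyshev_bound(n, eps):
    """Upper bound sigma^2 / (n * eps^2) for uniform variables on [0, 1)."""
    variance = 1.0 / 12.0
    return variance / (n * eps ** 2)


n, eps = 50, 0.1
observed = tail_frequency(n, eps)
bound = chebyshev_bound(n, eps)
print(observed, bound)
```

The bound is usually far from tight, but it shrinks like 1/n, and that is all the proof of the weak law needs.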
Pavel Nekrasov
• Lived: 1853-1923
• University of Moscow
• Monarchist and supporter of the
Orthodox Church
• Attempted to use LLN to
prove free will
• Came up with a proof for LLN
using independent variables.
• His 1902 paper on these two
topics inspired Markov to
invent Markov chains.
http://bit-player.org/wp-content/extras/markov/#/33
Andrey Andreyevich Markov
• Born: Ryazan, Russian Empire
– 1856-1922
• Alma Mater: St. Petersburg University
– Advisor: Pafnuty Chebyshev
• Professor: St. Petersburg University
• Notable achievements
– Published more than 120 papers
– Chebyshev inequality – Markov inequality
– Invention of Markov Chain
• Proving that the Law of Large numbers could apply to dependent
random variables
• Markov had a prickly personality
“I note with astonishment that in
the book of A. A. Chuprov, Essays on
the Theory of Statistics, on page 195,
P. A. Nekrasov, whose work in recent
years represents an abuse of
mathematics, is mentioned next to
Chebyshev.”
The Feud
• Nekrasov and Markov were opposites in
every respect
– Moscow vs. St. Petersburg
– Monarchist vs. Anti-Tsarist
– Religious vs. Secularist
• These differences played as large a role in the
later perception of the two mathematicians
as their mathematics did
Nekrasov’s argument for free will
(1902 paper)
• Nekrasov disagreed with Quetelet
• Used the Chebyshev inequality to prove the law of large
numbers for specific independent ε
• Showed: independent variables -> law of large numbers
• Conjectured that independent variables were
necessary for the law of large numbers
• Conjectured voluntary acts can be considered like
independent trials in probability theory.
• Stated that law of large numbers had been shown to
hold true for social behaviors.
• Argued that this was proof of free will
Markov’s counter argument
• Markov did not care about the philosophical
arguments and was only interested in the
mathematics
• Nekrasov and others assumed that the law of
large numbers applied only to independent
events
• Markov invented Markov chains and used
them to prove that the law of large numbers
could apply to dependent events
Markov’s 1906-1907 paper
Ground work
• In his first paper on Markov chains, Markov
considered two states, α and β
• A simple chain was an infinite series x1, x2, …, xk,
xk+1, …
– where k is the current time and xk is the state at time
k.
• For any k, xk+1 was independent of x1, x2, …, xk-1
given that xk is known.
• This chain was also time homogeneous, in that the
distribution of xk+1 given xk was independent of k
Some Variables
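$P_{\alpha,\beta}$ = probability of the event $x_{k+1} = \beta$, given $x_k = \alpha$
$P_\beta^{k+1} = \sum_\alpha P_\alpha^{k} P_{\alpha,\beta}$, where $P_\alpha^{k}$ is the probability that $x_k = \alpha$
$a_i$ = (unconditional) expected value of the variable $x_i$
$A_\gamma^{i} = E[x_{k+i} \mid x_k = \gamma]$ = expected value of $x_{k+i}$,
given $x_k = \gamma$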
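Markov’s Theorem
Theorem: For a chain with a positive matrix, all
numbers $a_{k+i}$ and $A_\gamma^{i}$ have the same limit, from which
they differ by numbers $< \Delta_i$. At the same time
$\Delta_i < C H^i$, where C and H are constants and $0 < H < 1$.
This theorem shows that the expected value of a variable
far along the chain converges geometrically to a single
limit, whatever the starting state. The influence of the
current state on later variables therefore dies out, which
is what allows the law of large numbers to hold for the
dependent variables of a Markov chain as well as for
independent variables.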
“I am concerned only with questions of pure analysis.... I refer to the
question of the applicability of probability theory with indifference.”
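The geometric convergence in the theorem can be seen numerically on a small example. The sketch below is not Markov's own calculation. It assumes a hypothetical positive 2x2 matrix and hypothetical state values (1 for the first state, 0 for the second), and computes the conditional expectation of the state value i steps ahead for each starting state.

```python
# Hypothetical positive transition matrix (every entry > 0); rows sum to 1.
P = [
    [0.9, 0.1],
    [0.4, 0.6],
]
values = [1.0, 0.0]  # hypothetical numeric value attached to each state


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def conditional_expectations(matrix, state_values, i):
    """Expected state value i steps ahead (i >= 1), for each starting state."""
    power = matrix
    for _ in range(i - 1):
        power = mat_mul(power, matrix)
    return [sum(p * v for p, v in zip(row, state_values)) for row in power]


def gap(i):
    """Spread between the conditional expectations of the two starting states."""
    expectations = conditional_expectations(P, values, i)
    return max(expectations) - min(expectations)


for i in (1, 2, 5, 10):
    print(i, conditional_expectations(P, values, i), gap(i))
```

For this particular matrix the gap halves with every step (it equals 0.5 to the power i), so the expectations approach a common limit of 0.8 at a geometric rate, in the spirit of the bound C H^i with H = 0.5.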
Markov’s 1913 Paper
Markov’s experiment on Pushkin’s Eugene Onegin
Hayes, B 2013
Simple Markov Chains can be used to
create a simple weather prediction model
Hayes, B 2013
Simple Markov Chain in Economics
Recession Prediction
(ng = normal growth, mr = mild recession, sr = severe recession)
        ng      mr      sr
ng   0.971   0.029   0
mr   0.145   0.778   0.077
sr   0       0.508   0.492
http://quant-econ.net/jl/finite_markov.html
Social Class Mobility Prediction
                poor   middle class   rich
poor            0.9    0.1            0
middle class    0.4    0.4            0.2
rich            0.1    0.1            0.8
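One question these matrices can answer is where the chain settles in the long run. The sketch below applies simple power iteration to the social class matrix, reading rows as the current class and columns as the next class. The function names, the uniform starting distribution, and the tolerance are assumptions made for this illustration.

```python
# Social class mobility matrix from the slide, in the order poor, middle class, rich.
P = [
    [0.9, 0.1, 0.0],
    [0.4, 0.4, 0.2],
    [0.1, 0.1, 0.8],
]


def step(dist, matrix):
    """Advance a row vector of state probabilities by one time step."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def stationary(matrix, tol=1e-12, max_steps=10000):
    """Iterate the chain from a uniform start until the distribution stops changing."""
    n = len(matrix)
    dist = [1.0 / n] * n
    for _ in range(max_steps):
        nxt = step(dist, matrix)
        if max(abs(x - y) for x, y in zip(nxt, dist)) < tol:
            return nxt
        dist = nxt
    return dist


pi = stationary(P)
print(dict(zip(["poor", "middle class", "rich"], pi)))
```

For this matrix the long-run shares work out to 5/7 poor, 1/7 middle class and 1/7 rich, which can be confirmed by hand by solving π = πP.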
Chemical Kinetics: As a stochastic
process
π‘˜1
• π‘Ž ↔b
π‘˜2
– k1 rate that a transition to b
– k2 rate that b transition to a
• Modeled by a linear chain where each state is a
different number of a and b molecules
– x1,x2,...,xk,xk+1
– If the state xk there are 50 molecules of a and 0
molecule of b what will be the state at xk+1.
Chemical Kinetics Example
• a0 = 5
• b0 = 0
• 20 iterations
• k1 = 1
• k2 = 1
To emphasize randomness
Chemical Kinetics Example
• a0 = 50
• b0 = 0
• 100 iterations
• k1 = 1
• k2 = 1
Chemical Kinetics Example
• a0 = 50
• b0 = 0
• 100 iterations
• k1 = 5
• k2 = 1
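The slides do not say how these examples were simulated, so the following is only one possible sketch. It assumes that in each iteration exactly one molecule converts, and that the direction is chosen with probability proportional to k1·a (for a → b) versus k2·b (for b → a). The function name `simulate` and the fixed seed are made up for the illustration.

```python
import random


def simulate(a0, b0, k1, k2, iterations, seed=0):
    """Return the list of (a, b) molecule counts visited by the chain."""
    rng = random.Random(seed)
    a, b = a0, b0
    history = [(a, b)]
    for _ in range(iterations):
        forward = k1 * a    # propensity of a -> b
        backward = k2 * b   # propensity of b -> a
        total = forward + backward
        if total > 0:
            # Assumption: one conversion per iteration, direction chosen
            # in proportion to the two propensities.
            if rng.random() < forward / total:
                a, b = a - 1, b + 1
            else:
                a, b = a + 1, b - 1
        history.append((a, b))
    return history


# Parameters taken from the three example slides.
small = simulate(a0=5, b0=0, k1=1, k2=1, iterations=20)
balanced = simulate(a0=50, b0=0, k1=1, k2=1, iterations=100)
biased = simulate(a0=50, b0=0, k1=5, k2=1, iterations=100)
print(small[-1], balanced[-1], biased[-1])
```

The next state depends only on the current counts of a and b, so each trajectory is a Markov chain on the possible (a, b) pairs. With only 5 molecules the path looks very noisy, while with 50 molecules the drift toward equilibrium is easier to see.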
Other applications of simple Markov
chains
• Genetic drift
• Google PageRank
algorithm
• Social sciences
• Games
– Snakes and ladders
– Monopoly
Hidden Markov Model
• So far we have discussed observable Markov
Models
• Sometimes the underlying stochastic process
in a system cannot be observed
• Observations resulting from the process can
be used to infer the underlying stochastic
process
• Developed by Leonard E. Baum and coworkers
Hidden Markov Model
• X(t) = state at time t
• X(t) ∈ {x1,…,xn}
– n = # unobservable
states
• Y(t) ∈ {y1,…,yk}
– k = # possible
observations
• $P(X_{1:T}, Y_{1:T}) = P(x_1)\,P(y_1 \mid x_1) \prod_{t=2}^{T} P(x_t \mid x_{t-1})\,P(y_t \mid x_t)$
https://en.wikipedia.org/wiki/Hidden_Markov_model
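The joint probability above translates almost line for line into code. In the sketch below, the state names, observation names, and all probabilities are illustrative assumptions in the style of the weather example on the cited Wikipedia page; none of them come from these slides.

```python
# Illustrative parameters (assumptions, not taken from the slides).
start_prob = {"Rainy": 0.6, "Sunny": 0.4}
trans_prob = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
emit_prob = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}


def joint_probability(states, observations):
    """P(X_1:T, Y_1:T) for one hidden state path and one observation sequence."""
    if len(states) != len(observations) or not states:
        raise ValueError("states and observations must be non-empty and equal in length")
    # First factor: P(x_1) * P(y_1 | x_1)
    prob = start_prob[states[0]] * emit_prob[states[0]][observations[0]]
    # Remaining factors: P(x_t | x_{t-1}) * P(y_t | x_t) for t = 2..T
    for t in range(1, len(states)):
        prob *= trans_prob[states[t - 1]][states[t]]
        prob *= emit_prob[states[t]][observations[t]]
    return prob


joint = joint_probability(["Sunny", "Rainy"], ["walk", "clean"])
print(joint)
```

In a real hidden Markov model the state path is not observed, so this quantity is normally summed or maximized over all possible paths rather than evaluated for a single one.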
Simple Example HMM
• Bob and Alice talk on
the phone every day
• Alice cannot see the
weather.
• Bob only likes to talk
about three activities
• The weather’s Markov
model can be inferred
from Alice’s
observations.
https://en.wikipedia.org/wiki/Hidden_Markov_model
Left to right HMM
• State transitions have the property aij = 0 for j < i
– No transitions are allowed to states whose indices are
lower than the current state
• Transitions can be further restricted: aij = 0 for j > i + Δ
Single Word Speech Recognition
• V words to be identified
– Each word modeled by a
distinct HMM
• k occurrences of each
word spoken by 1 or more
talkers.
• λv = HMM for word v
in the vocabulary
• O = {O1, O2, …, On}
• P(O | λv), 1 ≤ v ≤ V
• v* = argmax over v of P(O | λv)
http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf
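The recognition rule v* = argmax P(O | λv) can be sketched with the forward algorithm, which sums the joint probability over all hidden state paths. The two word models below ("yes" and "no"), their two-state left-to-right structure, and every probability in them are hypothetical toy values. Real systems train each model on the recorded occurrences of the word.

```python
def forward_likelihood(model, observations):
    """P(O | model) for a discrete HMM, computed with the forward algorithm."""
    start, trans, emit = model["start"], model["trans"], model["emit"]
    n = len(start)
    # alpha[j] = P(O_1..O_t, state at time t = j)
    alpha = [start[i] * emit[i][observations[0]] for i in range(n)]
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
            for j in range(n)
        ]
    return sum(alpha)


# Hypothetical left-to-right word models over two acoustic symbols (0 and 1).
# trans[1][0] == 0, so no transition returns to a lower-indexed state.
left_to_right = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "yes": {"start": [1.0, 0.0], "trans": left_to_right,
            "emit": [[0.9, 0.1], [0.1, 0.9]]},
    "no": {"start": [1.0, 0.0], "trans": left_to_right,
           "emit": [[0.1, 0.9], [0.9, 0.1]]},
}

observations = [0, 0, 1, 1]  # hypothetical observation sequence O
scores = {word: forward_likelihood(m, observations) for word, m in models.items()}
best = max(scores, key=scores.get)  # v* = argmax over v of P(O | lambda_v)
print(scores, best)
```

For long observation sequences the forward probabilities become very small, so practical implementations usually work with scaled or log probabilities. That refinement is left out here to keep the sketch short.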
Identifying unknown proteins
• Homologous proteins are proteins that share a
common ancestry (and likely a similar function).
– Orthologs – homologous proteins that diverged through a
speciation event
– Paralogs – homologous proteins that diverged through a
gene duplication (copy) event within a genome
• The sequence of known homologous proteins can
be used to predict the function of unknown
homologous proteins.
Identifying unknown proteins
• Global Alignment
– CD-HIT
– UCLUST
• Local Alignment
– BLAST
• Problem is that
sequence does not
directly determine
function. Structure
does.
http://drive5.com/usearch/manual/uclust_algo.html
Protein Homology Identification
• Protein domains
represent functional
subunits in a protein
• A single protein can
be made up of one or
more domains
– We can use the
domain architecture
to predict remote
homology
Protein Homology Identification
• Pfam and other domain databases identify
domains of highly conserved sequences
• They then create a series of hundreds of
representative sequences in order to create an
HMM.
• This HMM can then be used to determine the
domains found in an unknown protein.
• Function can be inferred from domain
architecture
Example HMMs
Homework
• Given the transition matrix below
– Draw the representative digraph
– Assuming we are in state 1 at t0, what is the
probability that we will be in state 2 at t2
– Calculate the probability matrix for t2
– At which t do the values converge so that the state
at t0 does not matter
       1       2      3
1    0.9     0.15   0.5
2    0.075   0.8    0.25
3    0.025   0.05   0.25
(Each column of this matrix sums to 1.)
References
1. Hayes, B. (2013). First Links in the Markov Chain. American Scientist, 101(2), 92.
2. Schneider, I. (2005). Jakob Bernoulli, Ars Conjectandi (1713). Landmark Writings in Western Mathematics 1640-1940 (pp. 88-104). Amsterdam: Elsevier.
3. Khinchin, A. (1929). Sur la loi des grands nombres. Comptes rendus de l’Académie des Sciences, 189, 477-479.
4. Vucinich, A. (1960). Mathematics in Russian culture. Journal of the History of Ideas, 21(2), 161-179.
5. Seneta, E. (2003). Statistical regularity and free will: L. A. J. Quetelet and P. A. Nekrasov. International Statistical Review, 71, 319-334.
6. Grinstead, C. and Snell, J. (1997). Introduction to Probability. American Mathematical Society. http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/amsbook.mac.pdf
7. Basharin, G., Langville, A., Naumov, V. (2004). The life and work of A. A. Markov. Linear Algebra and its Applications, 386, 3-26.
8. Rabiner, L. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257-286.
9. Baum, L., Petrie, T. (1966). Statistical inference for probabilistic functions of finite state Markov chains. The Annals of Mathematical Statistics, 37(6), 1554-1563.
10. Finn, R., Bateman, A., Clements, J., et al. (2014). Pfam: the protein families database. Nucleic Acids Research, 42(D1), D222-D230.
11. Edgar, R. C. (2010). Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26(19), 2460-2461.
12. Li, W., Jaroszewski, L., Godzik, A. (2001). Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics, 17(3), 282-283.
13. Wheeler, T., Clements, J., Finn, R. (2014). Skylign: a tool for creating informative, interactive logos representing sequence alignments and profile hidden Markov models. BMC Bioinformatics, 15(7).
Questions?