An Analysis of Hamptonese Using Hidden Markov Models
Ethan Le and Mark Stamp
Department of Computer Science
San Jose State University
McNair Scholars Program
Summer Research Project, 2003
Agenda
James Hampton
Purpose
What is Hamptonese?
Hidden Markov Models (HMMs)
HMMs and Hamptonese
Other Approach
Results
Questions
James Hampton
WWII veteran
Janitor in Washington D.C.
Life of solitude
Died in 1964
Hamptonese discovered after his death
“The Throne” also found
“The Throne of the Third Heaven of the Nations’ Millennium General Assembly”
James Hampton alongside his Throne
Pieces of The Throne
Purpose
Analyze Hamptonese with a hidden Markov model to determine whether it was created by simple substitution for English letters (or for the letters of another language)
Consider alternative interpretations of Hamptonese
What is Hamptonese?
What is Hamptonese? (Cont’d)
Unknown origin and content
Spans 167 pages
Could it be:
A cipher system?
Gibberish?
Example of human randomness?
Other?
Hidden Markov Models (HMMs)
Markov process with hidden states
HMMs provide probabilistic information about the underlying state of a model, given a set of observations of the system.
HMMs (Cont’d)
Three solvable problems
1. Find probability of observed sequence
2. Find “optimal” state sequence
3. Train the model to fit observations
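The first problem, finding the probability of an observed sequence, is commonly solved with the forward algorithm. The following is a minimal sketch of that standard technique, not code from this project. The names A (state transition matrix), B (emission matrix), and pi (initial state distribution), as well as the toy numbers, are assumptions chosen for illustration.

```python
def forward_probability(A, B, pi, observations):
    """Return P(observations | model) using the forward algorithm.

    A[i][j] : probability of moving from hidden state i to hidden state j
    B[i][k] : probability of observing symbol k while in hidden state i
    pi[i]   : probability of starting in hidden state i
    """
    n_states = len(pi)
    # alpha[i] = P(observations so far, current hidden state = i)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][obs]
            for j in range(n_states)
        ]
    # Summing over the final hidden state removes the hidden variable.
    return sum(alpha)


# Hypothetical toy model: 2 hidden states, 3 observation symbols.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]]
pi = [0.6, 0.4]

p = forward_probability(A, B, pi, [0, 1, 0, 2])
print(p)
```

For long sequences (such as tens of thousands of observations) the products above underflow, so practical implementations rescale alpha at every step and work with log probabilities.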
HMM examples
Example 1: Music Information Retrieval System (MIR)
Example 2: Deciphering English text (Cave and Neuwirth)
27 symbols (alphabet letters plus space)
Assume 2 hidden states
Train HMM to best fit input data
Results: separation of consonants and vowels
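The English-text experiment can be sketched with Baum-Welch re-estimation, the standard way to train an HMM to fit observations. This is a generic illustration, not the procedure used by Cave and Neuwirth or in this project. The near-uniform random initialization, the seed, the iteration count, the helper names, and the short sample sentence are all assumptions. A sample this short is unlikely to reproduce the consonant/vowel separation; a much longer text is needed for that.

```python
import math
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # 27 symbols: letters plus space


def encode(text):
    """Lowercase the text, turn non-letters into spaces, and map to 0..26."""
    letters_only = "".join(ch if ch in ALPHABET else " " for ch in text.lower())
    collapsed = " ".join(letters_only.split())  # collapse runs of spaces
    return [ALPHABET.index(ch) for ch in collapsed]


def random_stochastic_row(n, rng):
    """Near-uniform probabilities with a small random perturbation."""
    row = [1.0 + 0.1 * rng.random() for _ in range(n)]
    total = sum(row)
    return [x / total for x in row]


def baum_welch(obs, n_states=2, n_symbols=27, n_iter=100, seed=1):
    """Return (A, B, pi, log_likelihoods) after n_iter re-estimation passes."""
    rng = random.Random(seed)
    N, M, T = n_states, n_symbols, len(obs)
    A = [random_stochastic_row(N, rng) for _ in range(N)]
    B = [random_stochastic_row(M, rng) for _ in range(N)]
    pi = random_stochastic_row(N, rng)
    log_likelihoods = []
    for _ in range(n_iter):
        # Forward pass, rescaling each step so alpha[t] sums to 1.
        alpha, scale = [], []
        for t in range(T):
            if t == 0:
                raw = [pi[i] * B[i][obs[0]] for i in range(N)]
            else:
                raw = [sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                       for j in range(N)]
            scale.append(sum(raw))
            alpha.append([x / scale[t] for x in raw])
        # Log-likelihood of the observations under the model before this update.
        log_likelihoods.append(sum(math.log(c) for c in scale))

        # Backward pass using the same scale factors.
        beta = [[1.0] * N for _ in range(T)]
        for t in range(T - 2, -1, -1):
            for i in range(N):
                beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                                 for j in range(N)) / scale[t + 1]

        # gamma[t][i] = P(state i at time t | observations, current model)
        gamma = [[alpha[t][i] * beta[t][i] for i in range(N)] for t in range(T)]

        # Expected transition counts.
        trans = [[0.0] * N for _ in range(N)]
        for t in range(T - 1):
            for i in range(N):
                for j in range(N):
                    trans[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                    * beta[t + 1][j] / scale[t + 1])
        # Expected emission counts.
        emit = [[0.0] * M for _ in range(N)]
        for t in range(T):
            for i in range(N):
                emit[i][obs[t]] += gamma[t][i]

        # Re-estimate the model from the expected counts.
        A = [[x / sum(row) for x in row] for row in trans]
        B = [[x / sum(row) for x in row] for row in emit]
        pi = gamma[0][:]
    return A, B, pi, log_likelihoods


# Hypothetical sample text; a real experiment would use a large corpus.
sample = ("the quick brown fox jumps over the lazy dog while a hidden markov model "
          "reads each letter of this sentence as one observation")
obs = encode(sample)
A, B, pi, log_likelihoods = baum_welch(obs, n_iter=50)
```

After training, each row of B gives one hidden state's probability of emitting each of the 27 symbols, and comparing the two rows letter by letter is how a split such as consonants versus vowels would show up.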
HMM Example (Cont’d) – English Text
Character   Initial             Final
A           0.03685  0.03793    0.0044447  0.1306242
B           0.04007  0.03978    0.0241154  0.0000000
C           0.03362  0.03423    0.0522168  0.0000000
D           0.03777  0.03654    0.0714247  0.0003260
E           0.03409  0.03608    0.0000000  0.2105809
F           0.03685  0.03932    0.0374685  0.0000000
G           0.03593  0.03839    0.0296958  0.0000000
H           0.03961  0.03932    0.0670510  0.0085455
I           0.04007  0.03377    0.0000000  0.1216511
J           0.03501  0.03515    0.0065769  0.0000000
K           0.03685  0.03700    0.0067762  0.0000000
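One way to read the Final values listed above is to assign each letter to the hidden state that gives it the larger probability. The sketch below does this for the letters A through K, using the Final values exactly as listed. The function name and the 0/1 state indices (first and second value of each pair) are assumptions for illustration.

```python
# Final emission probabilities for A-K, copied from the values listed above.
# Each pair holds the two Final values in the order they appear.
FINAL_B = {
    "A": (0.0044447, 0.1306242),
    "B": (0.0241154, 0.0000000),
    "C": (0.0522168, 0.0000000),
    "D": (0.0714247, 0.0003260),
    "E": (0.0000000, 0.2105809),
    "F": (0.0374685, 0.0000000),
    "G": (0.0296958, 0.0000000),
    "H": (0.0670510, 0.0085455),
    "I": (0.0000000, 0.1216511),
    "J": (0.0065769, 0.0000000),
    "K": (0.0067762, 0.0000000),
}


def group_by_dominant_state(emissions):
    """Group symbols by the hidden state with the largest emission probability."""
    groups = {}
    for symbol, probs in emissions.items():
        state = max(range(len(probs)), key=lambda i: probs[i])
        groups.setdefault(state, []).append(symbol)
    return groups


groups = group_by_dominant_state(FINAL_B)
print(groups[0])  # ['B', 'C', 'D', 'F', 'G', 'H', 'J', 'K']
print(groups[1])  # ['A', 'E', 'I']
```

For these eleven letters the grouping matches the consonant/vowel separation reported for the English-text example, even though the model was never told which letters are vowels.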
HMMs and Hamptonese
Transcribed 103 pages
About 30,000 observations
More than 40 distinct symbols
Assume 2 hidden states
3 different reading techniques
Other Approach
Hamptonese vs. 246 different languages
HMM for other languages
Religious references & organizational patterns
Analysis of first 40 pages
Pattern recognition
Results
Hamptonese is not a simple substitution for English letters
Hamptonese is probably not a simple substitution for any other language
First comprehensive study of Hamptonese
Transcription of entire text
Future research suggestions
Acknowledgments…
Dr. Mark Stamp
SJSU McNair Scholars Program
SCCUR
UC Irvine
Questions???
References
W. Chai and B. Vercoe, Folk Music Classification Using Hidden Markov Models, MIT Media Lab, n.d.
M. Stamp, A revealing introduction to Hidden Markov Models, http://www.cs.sjsu.edu/faculty/stamp/Hampton/HMM.pdf
M. Stamp and E. Le, Hamptonese website, http://www.cs.sjsu.edu/faculty/stamp/Hampton/hampton.html
R.L. Cave and L.P. Neuwirth, Hidden Markov Models for English, in Hidden Markov Models for Speech, IDA-CRD, Princeton, NJ, 1980