Who Wrote Shakespeare? Application of Multi-Disciplinary Research to Medicine and Humanity

Albert C.-C. Yang, M.D., PhD
Attending Physician, Department of Psychiatry, Taipei Veterans General Hospital, Taiwan
Assistant Professor, School of Medicine, National Yang-Ming University, Taiwan
accyang@gmail.com

Information Created by Biological Systems
• Neuronal impulses
• Genetic codes
• Human heartbeats

Human Creations
• Earliest record of paintings by humans: Lascaux Cave, France, 20000 BC
• Symbols: Jiahu, China, 6600 BC; Vinča signs, Europe, 4500 BC; Indus script, India, 3500 BC
• Writing systems: Cuneiform script, Sumerians, Iraq, 2600 BC

Challenge
How to effectively categorize information of different origins?
A general principle to analyze information-embedded signals

Patterns
Human Repetitive Genome vs. Chimpanzee Genome

Information Categorization Method
Comparison of human literary texts
Repetitive patterns: words

Frequency and Rank Order Statistics
[Table: ranks of common words (such as THE, I, AND, TO, OF, YOU, A, ME, FOR, THAT) in The Winter's Tale and in Bonduca]
[Figure: rank comparison map. x-axis: Rank (Bonduca); y-axis: Rank (The Winter's Tale); both from 0 to 50; each point is labeled with its word]

Rank Comparison Maps
[Figure: two panels with axes from 0 to 50. Shakespeare vs. Shakespeare: Rank (The Winter's Tale) against Rank (Cymbeline). Shakespeare vs. Fletcher: Rank (The Winter's Tale) against Rank (Bonduca)]

Rank Comparison Maps
[Figure: two panels, Jin Yong (金庸) vs. Gu Long (古龍) and Jin Yong (金庸) vs. Jin Yong (金庸)]
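The frequency and rank order statistics behind a rank comparison map can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the function names `word_ranks` and `rank_pairs`, the tokenizing regular expression, the tie-breaking rule, and the toy strings are all assumptions made for the example.

```python
import re
from collections import Counter


def word_ranks(text, top_n=50):
    """Return {word: rank} for the top_n most frequent words (rank 1 = most frequent)."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Sort by descending count, then alphabetically so ties are deterministic
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: rank for rank, (word, _) in enumerate(ordered[:top_n], start=1)}


def rank_pairs(text_a, text_b, top_n=50):
    """Points of a rank comparison map: (word, rank in A, rank in B) for shared words."""
    ranks_a = word_ranks(text_a, top_n)
    ranks_b = word_ranks(text_b, top_n)
    shared = sorted(set(ranks_a) & set(ranks_b), key=ranks_a.get)
    return [(w, ranks_a[w], ranks_b[w]) for w in shared]


# Toy strings standing in for two plays (illustrative only)
text_a = "the king and the queen and the fool"
text_b = "the fool and the fool and a king"
pairs = rank_pairs(text_a, text_b)
for word, rank_a, rank_b in pairs:
    print(word, rank_a, rank_b)
```

Plotting the second number against the third for each word gives one point of the map. Words that fall near the diagonal are used with similar relative frequency in both texts.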
Novels compared in the rank comparison maps (word rank, 0 to 50, on each axis): 射雕英雄傳 (The Legend of the Condor Heroes), 倚天屠龍記 (The Heaven Sword and Dragon Saber), and 楚留香傳奇 (The Legend of Chu Liuxiang)

Information-Based Similarity Index

$$D(T_1, T_2) = \frac{1}{N_{12}} \sum_{k=1}^{N_{12}} \left| R_1(w_k) - R_2(w_k) \right| F(w_k)$$

[Figure: rank comparison map, Rank (The Winter's Tale) against Rank (Bonduca), axes from 0 to 50]
Physical Review Letters 90:108103 (2003); Physica A 329:473-483 (2003); Journal of Computational Biology 12(8):1103-16 (2005).

Cluster Analysis
Known Authorship Classification

Chinese Authorship Debate
Dream of the Red Chamber

Dream of the Red Chamber (紅樓夢)
• One of China's four great classical novels.
• Written by Cao Xueqin in the middle of the 18th century during the early Qing Dynasty.
• 80 chapters in the original manuscript copies.
• Gao E and Cheng Weiyuan added 40 chapters to complete the novel.

Authorship Debate (紅樓夢)

Rank | Chapters 1-40: Word, Frequency | Chapters 41-80: Word, Frequency | Chapters 81-120: Word, Frequency
1  | 了 6250 | 了 8301 | 了 6946
2  | 不 4505 | 不 5676 | 的 5499
3  | 的 4010 | 的 5539 | 不 5009
4  | 一 3891 | 一 4942 | 來 3944
5  | 道 3683 | 來 4097 | 道 3756
6  | 來 3563 | 人 3892 | 是 3741
7  | 人 3139 | 我 3769 | 人 3644
8  | 我 2843 | 是 3720 | 一 3461
9  | 是 2833 | 道 3683 | 說 3391
10 | 說 2805 | 說 3637 | 我 2743

Who Wrote Shakespeare's Plays?
• Marlowe and Shakespeare were both born in 1564.
• Before Shakespeare's name became widely known, Marlowe had already produced several major works in various genres, including Tamburlaine the Great and Dr. Faustus.
• Marlowe's career ended tragically on 30 May 1593, when he was apparently murdered in a dispute.

The Murder of the Man Who Was Shakespeare (Calvin Hoffman)
• Shakespeare did not visit some of the places that appear vividly in scenes of his plays.
• Shakespeare seems to appear suddenly after Marlowe's death.
• According to Hoffman, Marlowe did not die in 1593 as claimed, but escaped to a secret refuge in Italy, where he spent the rest of his life writing the body of plays generally attributed to Shakespeare.

Who Wrote Shakespeare's Plays?
[Figure: plays grouped under Shakespeare, "?", and Marlowe; labeled plays include Henry VIII, The Two Noble Kinsmen, and Edward III]
Yang AC et al. Physica A 329:473-483 (2003)

Shakespeare versus Fletcher

Critique
"It is like taking all the words and throwing them in the blender" ~ leading Shakespeare scholar

Support
The Calvin & Rose G. Hoffman Marlowe Memorial Trust 2003 Prize ~ leading Shakespeare scholar
Boston Globe, Aug 5, 2003: D1-D4; Cook, Gareth: "Much Ado About Data"

[Figure labels: imitations of Ni Kuang's works (仿倪匡作品) and Ni Kuang's original works (倪匡原作)]

Application to Human Heartbeat

Heart rate dynamics
• Parasympathetic stimulation
• Sympathetic stimulation

Which Heart Rate Pattern is Healthy?
[Figure: four heart rate time series, labeled Heart Failure, Heart Failure, Normal, and Atrial Fibrillation]

Technical Challenges
• How to map a heart rate time series to a symbolic sequence?
  KJLFNHACUARAFVTH TYAERFVVAEVACVAZ CFVDFVZDSDSFVSDF VNTEWOSIXWRXDPOI JRROIRFUFNVIMVMF
• How to define words in heart rate symbolic sequences?
  KJ LFN HACUA RAFVT HTY AER FV VA EVAC VAZ CF VDF VZ D SDSFV SDFV NTEWOSI XW RXDP OIJR RO IRFU FNV IMV MF

Symbolic Mapping
[Figure: interbeat interval (sec), from 0.6 to 1.2, against beat number, from 1 to 14, with one binary symbol per beat-to-beat change]
Symbols: 1 1 0 0 0 1 1 0 0 1 1 0 0
8-bit words: 11000110, 10001100, 00011001

Comparison of Human Heartbeat
• Health vs. Health: D = 0.10
• Health vs. Disease: D = 0.25
Yang AC et al. Physical Review Letters 90:108103 (2003)

Phylogenetic Tree of Human Heartbeat
Yang AC et al. Physical Review Letters 90:108103 (2003)

Clustering of Human Heartbeat Is Associated with β2-AR Gene Polymorphisms
Yang AC et al. PLoS ONE 6(5): e19232 (2011)

Application to Genetic Sequences
Picture obtained from www.genetic-programming.org

Analogy to Natural Languages

ATATTAGGTTTTTACCTACCCAGGAAAAGCCAACC AACCTCGATCTCTTGTAGATCTGTTCTCTAAACGA
ACTTTAAAATCTGTGTAGCTGTCGCTCGGCTGCATG CCTAGTGCACCTACGCAGTATAAACAATAATAAA
TTTTACTGTCGTTGACAAGAAACGAGTAACTCGTCC CTCTTCTGCAGACTGCTTACGGTTTCGTCCGTGT
TGCAGTCGATCATCAGCATACCTAGGTTTCGTCCGG GTGTGACCGAAAGGTAAGATGGAGAGCCTTGTTC
TTGGTGTCAACGAGAAAACACACGTCCAACTCAGT TTGCCTGTCCTTCAGGTTAGAGACGTGCTAGTGCG
TGGCTTCGGGGACTCTGTGGAAGAGGCCCTATCGG AGGCACGTGAACACCTCAAAAATGGCACTTGTGGT

Similarity

ACTTAAGTACCTTATCTATCTACAGATAGAAAAGT TGCTTTTTAGACTTTGTGTCTACTTTTCTCAACTA
AACGAAATTTTTGCTATGGCCGGCATCTTTGATGCT GGAGTCGTAGTGTAATTGAAATTTCATTTGGGTT
GCAACAGTTTGGAAGCAAGTGCTGTGTGTCCTAGT CTAAGGGTTTCGTGTTCCGTCACGAGATTCCATTC
TACAAACGCCTTACTCGAGGTTCCGTCTCGTGTTTG TGTGGAAGCAAAGTTCTGTCTTTGTGGAAACCAG
TAACTGTTCCTAATGGCCTGCAACCGTGTGACACT TGCCGTAGCAAGTGATTCTGAAATTTCTGCAAATG
GCTGTTCTACTATTGCGCAAGCCGTCCGCCGTTATA GCGAGGCCGCTAGCAATGGTTTTAGGGCATGCCG

DNA "Words"
5' TACCCCCACTGTCAACCCAACACAGGCATG…… 3'

Word | Frequency | Rank
CCC | 633 | 1
CCT | 543 | 2
CTA | 526 | 3
AAA | 524 | 4
ACC | 515 | 5
… | … | …

Rank Comparison Maps
[Figure: two panels of rank comparison maps for three-letter DNA words, axes from 0 to 60. Same Species: Rank: Mitochondrial DNA (Human) against Rank: Mitochondrial DNA (Human). Different Species: Rank: Mitochondrial DNA (Human) against Rank: Mitochondrial DNA (Gorilla)]

Human Influenza Virus
Our result is consistent with a previous finding based on the sequence alignment technique (Science 1986; 232: 980).

Genome-wide Sequence Comparison (SARS Coronavirus)
Yang AC et al. Journal of Computational Biology 12(8):1103-16 (2005).

"Mathematics compares the most diverse phenomena and discovers the secret analogies that unite them" - Joseph Fourier

Selected References and Tutorial
• 1. Yang AC, Hseu SS, Yien HW*, Goldberger AL, Peng CK. Linguistic analysis of human heartbeats using frequency and rank order statistics. Physical Review Letters 90:108103 (2003).
• 2. Yang AC, Peng CK, Yien HW, Goldberger AL. Information categorization approach to literary authorship disputes. Physica A 329:473-483 (2003).
• 3. Yang AC, Goldberger AL, Peng CK*. Genomic classification using an information-based similarity index: application to the SARS coronavirus. Journal of Computational Biology 12(8):1103-16 (2005).
• 4. Peng CK, Yang AC, Goldberger AL. Statistical physics approach to categorize biologic signals: from heart rate dynamics to DNA sequences. Chaos 17:015115 (2007).
• 5. Yang AC, Tsai SJ, Hong CJ, Wang C, Chen TJ, Liou YJ, Peng CK. Clustering heart rate dynamics is associated with β-adrenergic receptor polymorphisms: analysis by information-based similarity index. PLoS ONE 6(5): e19232 (2011).

Online Tutorial: http://www.physionet.org/physiotools/ibs/
PhysioNet: NIH Research Resource for Complex Physiologic Signals
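The symbolic mapping and the information-based similarity index from the slides can be sketched together in Python. This is an illustrative sketch under stated assumptions, not the published implementation (see the online tutorial for that). All function names are hypothetical. The rule "1 if the interval increases, otherwise 0" is an assumption chosen to match the binary symbols on the Symbolic Mapping slide. The slides do not define the weight F(w_k), so it is assumed here to be a normalized Shannon-entropy weight, and N12 is taken to be the number of words shared by both sequences.

```python
import math
from collections import Counter


def binary_symbols(intervals):
    """Assumed rule: 1 if an interval is longer than the previous one, else 0."""
    return [1 if b > a else 0 for a, b in zip(intervals, intervals[1:])]


def sliding_words(symbols, m=8):
    """All m-symbol words, moving the window one symbol at a time."""
    return ["".join(map(str, symbols[i:i + m]))
            for i in range(len(symbols) - m + 1)]


def ranks_and_probs(word_list):
    """Rank (1 = most frequent) and relative frequency of each word."""
    counts = Counter(word_list)
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    ranks = {w: r for r, (w, _) in enumerate(ordered, start=1)}
    probs = {w: c / total for w, c in counts.items()}
    return ranks, probs


def similarity_index(words_1, words_2):
    """D = (1/N12) * sum over shared words of |R1 - R2| * F.

    F is ASSUMED to be the Shannon-entropy contribution of each word,
    normalized to sum to 1 over the shared words.
    """
    r1, p1 = ranks_and_probs(words_1)
    r2, p2 = ranks_and_probs(words_2)
    shared = set(r1) & set(r2)
    if not shared:
        raise ValueError("the two sequences share no words")
    weight = {w: -p1[w] * math.log(p1[w]) - p2[w] * math.log(p2[w])
              for w in shared}
    z = sum(weight.values())
    if z == 0:  # every shared word has probability 1 in both sequences
        return 0.0
    return sum(abs(r1[w] - r2[w]) * weight[w] / z for w in shared) / len(shared)


# Binary symbols copied from the Symbolic Mapping slide
slide_symbols = [1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0]
slide_words = sliding_words(slide_symbols, m=8)
print(slide_words[:3])
```

The same `sliding_words` helper can cut a nucleotide string into three-letter words, for example `sliding_words(list("TACCCCCACTG"), m=3)`. Whether the DNA words on the slides overlap in this way is an assumption.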