Basics of Systems Bioinformatics
Tsu-Wang (David) Shen (沈祖望), Ph.D.
Medical Informatics Department, Tzu Chi University, Taiwan
E-mail: tshen@mail.tcu.edu.tw
Web: http://www.biom3.tcu.edu.tw
BioM3
2016/3/18

Agenda
1. About me & data mining
2. Introduction to biological signal processing
3. Basics of biology
4. Basics of signals & systems
5. Applications
6. Future work

Introduce myself
Education
• Ph.D., Biomedical Engineering (University of Wisconsin - Madison, WI, USA)
• M.S., Electrical and Computer Engineering (University of Wisconsin - Madison, WI, USA)
• M.S., Electrical and Computer Engineering (Illinois Institute of Technology, Chicago, IL, USA)
Specialty
• Biomedical engineering, biomedical signal processing, biometric security recognition, neural networks, biomedical statistics, artificial intelligence, biomedical electronics

Introduce myself
Experience
• Head of the Curriculum Section, Tzu Chi University (2005.8.1~2006.7.31)
• Adjunct Assistant Professor, Graduate Institute of Global Logistics Management, National Dong Hwa University (2007)
• Joint-appointment Assistant Professor, Institute of Medical Sciences, Tzu Chi University (2006~)
• Ministry of Education seed instructor for high-tech patent acquisition and defense (2006)
• IEEE Member since 1996
• Member of the Taiwan Epilepsy Society since 2007
• Certified Biomedical Engineer, Republic of China (2007, I1090)
• Computer project assistant, Educational Psychology Department, University of Wisconsin-Madison, WI, USA (1999~2005)
• Full-time Lecturer, Ta Hwa Institute of Technology (1995~1996)

Data mining
• Why mining?
  • We need information from big data warehouses, but there is too much information!
• What is data mining?
  • Data mining is the process of sorting through large amounts of data and picking out relevant information.

Traditional data mining skills

Today – New view of data mining
Systems Bioinformatics: An Engineering Case-Based Approach, by Gil Alterovitz and Marco F. Ramoni (Computational Biology)
This trail-blazing work introduces a quantitative systems approach to bioinformatics research using powerful computational tools drawn from signal processing, circuit analysis, control systems, and communications.
It presents the functionality of biological processes in an engineering context to facilitate the application of technical skills in solving the field's challenges, from the lab bench to data analysis and modeling, and to enable reverse engineering from biology in the development of synthetic biological devices.

Systems Bioinformatics
Today, we will focus on using signal processing and systems skills as the "mining" skills for solving bioinformatics and biology problems.

What are signals?
♦ Signals are everywhere! For example:
  • Signals for detecting the presence of an object
  • Signals for making a medical diagnosis (e.g., electrocardiogram (ECG), electroencephalogram (EEG))
  • The trading volume in the stock market
♦ Signals can be categorized in various ways:
  • continuous-time (analog) signals
  • discrete-time signals
  • digital signals
  • one-dimensional signals
  • two-dimensional signals

What are signals?
• A signal is an abstract element of information, or (more commonly) a flow of information (in one or more dimensions).
• One or more variables -> one-dimensional vs. multidimensional
• Signal vs. noise -> depends on what is needed

Continuous (Analog), Discrete, and Digital signals

What is a system?
A system is defined as an entity that manipulates one or more signals to accomplish a function, thereby yielding new signals.
x[n] -> h[n] -> y[n]

Specific systems
• Communication systems
• Control systems
• Microelectromechanical systems (MEMS)
• Remote sensing
• Biomedical signal systems
• Auditory system
• Bioinformatics systems
• Others …

Biological Signal Processing
• The discipline aimed at understanding and modeling the biological algorithms implemented by living systems using signal processing theory.
  (biology as an endpoint)
• The efforts seeking to use biology as a metaphor to formulate novel signal processing algorithms for engineering systems.
  (signal processing as an endpoint)

History
• Claude Bernard, 1800s – the concept of homeostasis
• Ludwig von Bertalanffy, 1968 – General System Theory: there appear to exist general system laws which apply to any system of a particular type, irrespective of the particular properties of the systems and the elements involved …
• Norbert Wiener, 1900s – system feedback
• Reiner, 1960s – systems work vs. DNA findings; research focused on a reductionist, component-level view.

Electrical vs. Biological Signal Processing
Integrating signal processing with biology

BSP at the cell level – Today's coverage
• Signal detection and estimation
  • DNA sequencing
  • Gene identification
  • Protein hotspot identification
• System identification and analysis
  • Gene regulation systems
  • Protein signal systems

Basics of Biology
DNA -> RNA -> amino acid -> protein -> cells
(Gene)

Introduction of Basics of Biology
• In April 2003, sequencing of all three billion nucleotides in the human genome was declared complete.
• 1631 human genetic diseases are now associated with known DNA sequences.
• The human genome was not the only genome targeted (as of June 2004: 1557 viruses, 165 microbes, and 26 eukaryotes).

DNA and Gene Expression
• DNA is found in the nucleus and mitochondria of eukaryotic cells. This includes animal, plant, and fungal cells.
• DNA in the nucleus contains information from both parents.
• The chemical that directs protein synthesis and serves as a genetic blueprint is DNA, which is found in the nucleus of the cell.
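
DNA
Deoxyribonucleic acid (DNA) is the main chemical component of chromosomes and the material that genes are made of. It is sometimes called the "hereditary particle" because, during reproduction, parents copy part of their own DNA and pass it on to their offspring, thereby transmitting traits.
In prokaryotic cells (which have no nucleus) DNA resides in the cytoplasm, whereas in eukaryotes it resides in the nucleus. Strictly speaking, DNA consists of two single strands wound around each other like vines to form a double helix. Depending on the form of the helix it is classified as A-DNA, B-DNA, or Z-DNA. The double helix discovered by James Watson and Francis Crick is the hydrated B form, which is the most common form in cells.
This nucleic acid polymer is a sequence of linked nucleotides. Each nucleotide consists of one deoxyribose molecule, one phosphate group, and one base. DNA has four different nucleotides, carrying the bases adenine (A), thymine (T), cytosine (C), and guanine (G). In double-helical DNA, the strands are made of complementary nucleotide pairs, and the two strands are held together by hydrogen bonds. Because of the constraints on hydrogen bonding, the bases can pair only as A with T or C with G. The base sequence of one strand therefore determines the base sequence of the other, since the two strands must be complementary. DNA replication follows the same complementary-pairing principle: when the double helix is unwound, each strand serves as a template, and the missing strand is filled in by complementarity.
The start of a strand is called the 5' end and its end is called the 3' end; these numbers refer to the numbering of the carbon atoms in the deoxyribose.

Base pairs
A base pair is the chemical structure that forms the monomers of the nucleic acids DNA and RNA and encodes genetic information. The bases involved are A, G, T, C, and U. Strictly speaking, a base pair is a pair of matching bases (A:T, G:C, or A:U) joined by hydrogen bonds. However, it is often used as a unit of length for DNA and RNA (even though RNA is single-stranded). It is also used interchangeably with "nucleotide", although a nucleotide consists of a five-carbon sugar, a phosphate, and a base.
Base pair is usually abbreviated bp; a thousand base pairs is kbp, or simply kb (for double-stranded nucleic acids; for single-stranded nucleic acids, kb means kilobases).

Base pairs, DNA

DNA and Gene
The human genome has been estimated to contain about 30,000 genes.

DNA -> Gene
Each gene has a particular location in a specific chromosome and contains the "code" for producing one of three forms of RNA (ribosomal RNA (rRNA), messenger RNA (mRNA), and transfer RNA (tRNA)).

Genomes
In biology, the genome of an organism is all of the genetic information contained in its DNA (or RNA, for some viruses). The genome includes genes and non-coding sequences. The term "genome" was first used in 1920 by Hans Winkler, professor of botany at the University of Hamburg, Germany.
More precisely, the genome of an organism is the complete DNA sequence of one set of chromosomes. For example, a diploid somatic cell contains two sets of chromosomes, and the DNA sequence of one of these sets is one genome.

The Central Dogma of biology
The Central Dogma of biology. DNA is copied into RNA (transcription); the RNA is used to make proteins (translation); and the proteins perform functions such as copying the DNA (replication). (www.biologyforengineers.org)

Replication
Replication of DNA by DNA polymerase.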
After the two strands of DNA separate (top), DNA polymerase uses nucleotides to synthesize a new strand complementary to the existing one (bottom). Images from the online tutorial "Biological Information Handling: Essentials for Engineers" (www.biologyforengineers.org).

Replication
During replication, some enzymes check for accuracy.
Error rate: approximately one per billion.

Transcription
Transcription of DNA by RNA polymerase. Note that RNA contains U's instead of T's. Image from the online tutorial "Biological Information Handling: Essentials for Engineers" (www.biologyforengineers.org).

Transcription
mRNA is short for messenger RNA. It carries the information transcribed from DNA that is needed for translation into protein. Messenger RNA (mRNA) is the RNA that carries genetic information from DNA to the ribosome, the site of protein synthesis in the cell. During its short lifetime, mRNA goes through several steps. During transcription, an enzyme called RNA polymerase copies a gene from DNA into mRNA as needed. In prokaryotes, mRNA is not processed further (with rare exceptions), and translation often proceeds while transcription is still under way. In eukaryotes, transcription and translation take place in different parts of the cell: transcription occurs in the nucleus, where DNA is stored, and translation occurs in the cytoplasm, where the ribosomes are located.

Transcription
[Figure: the life cycle of mRNA in a eukaryotic cell.] RNA is produced by transcription; after splicing and the addition of a poly-A tail, it is transported to the cytoplasm and translated at the ribosome.

Transcription vs. replication
• Only a certain stretch of DNA acts as the template, not the whole strand.
• Different enzymes are used.
• Only a single strand is produced.

Translation
In the process of translation, each mRNA codon attracts a tRNA molecule containing a complementary anticodon. (www.biologyforengineers.org)

Anticodon
The anticodon is located on the tRNA. It determines which amino acid the tRNA carries, and it pairs with the codon on the mRNA.

Transcription & Translation

DNA to RNA to Protein
The genetic information is stored in DNA, copied to RNA, and then interpreted from the RNA copy to form a functional protein.
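
The DNA -> RNA -> protein flow described on the slides above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is taken to be the coding strand (so transcription reduces to replacing T with U), the function names `transcribe` and `translate` are made up for this sketch, and `CODON_TABLE` is deliberately partial, holding only the codons the demo needs rather than all 64.

```python
# Minimal sketch of transcription and translation (coding-strand convention).
# CODON_TABLE is intentionally partial: only the codons used in the demo below.
CODON_TABLE = {
    "AUG": "Met",  # start codon
    "GCU": "Ala",
    "UGG": "Trp",
    "AAA": "Lys",
    "UAA": "Stop",
    "UAG": "Stop",
    "UGA": "Stop",
}


def transcribe(coding_dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA three bases at a time until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        amino_acid = CODON_TABLE[codon]  # KeyError for codons missing from the partial table
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


if __name__ == "__main__":
    mrna = transcribe("ATGGCTTGGAAATAA")
    print(mrna)             # AUGGCUUGGAAAUAA
    print(translate(mrna))  # ['Met', 'Ala', 'Trp', 'Lys']
```

Reading in non-overlapping steps of three is the design point here: it mirrors the triplet nature of the codon, which is also what later produces the period-3 signature used for gene identification.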
mRNA <-> rRNA <-> tRNA
Proteins are chains of amino acids. There are 20 kinds of amino acid in the cell, and they can be strung together into proteins of different lengths, shapes, and functions.

Proteins are synthesized according to the genetic code
DNA is the genetic material. Three adjacent bases on the DNA form one codon. When a gene is used to make a protein, the DNA code is first transcribed into RNA; the RNA leaves the nucleus, enters the cytoplasm, and binds to a ribosome, where the protein is synthesized according to the code on the RNA.

DNA to RNA to Protein

How do we know it takes three adjacent bases?
There are twenty amino acids, and each base position can hold one of four bases:
• One base: $4^1 = 4$
• Two bases: $4^2 = 16$
• Three bases: $4^3 = 64$
Three is the smallest number of bases that gives at least 20 combinations.

64 codons and the amino acid for each
http://en.wikipedia.org/
http://www.medigenomix.de/

Complexity at the protein level exceeds complexity at the gene and transcript levels. Individual genes in the genome are transcribed into RNA. In eukaryotes, the RNA may be further processed to remove intervening sequences (RNA splicing), resulting in a mature transcript that encodes a protein. Different proteins may be encoded by differently spliced transcripts (alternative splicing products). Moreover, once proteins are produced, they can be processed (e.g., cleaved by a protein-cutting protease) or modified (e.g., by addition of a sugar or lipid molecule). Proteins may also have non-covalent interactions with other proteins (and/or with other biomolecules such as lipids or nucleotides). Each of these can have tissue-, stage-, and cell-type-specific effects on the abundance, function, and/or stability of proteins produced from a single gene.

Transcription
Furthermore, in eukaryotes, mRNA must go through several processing steps before it is ready for translation:
1. Addition of a 5' cap: a modified guanine is added to the 5' end of the mRNA. The 5' cap is important for recognition by, and attachment to, the appropriate ribosome.
2. Splicing: the pre-mRNA (mRNA that has not yet been modified, or has been only partly modified, is called pre-mRNA or heterogeneous nuclear RNA, hnRNA) is processed to remove the introns, the segments that are not translated. The remaining sequences, which can be translated into protein, are called exons. A single pre-mRNA can usually be spliced in several different ways to produce different mature mRNAs, so that one gene (in different tissues and organs) can, after transcription and translation, give rise to different proteins with different functions. This is called alternative splicing. Most mRNA splicing is carried out by enzymes, but some RNA molecules can catalyze their own splicing, for example ribozymes.
3. Polyadenylation: the enzyme poly(A) polymerase adds a run of adenines (usually several hundred) to the 3' end of the pre-mRNA. (This modification does not occur in prokaryotes.) The poly-A tail is added downstream of a specific signal sequence in the transcript, AAUAAA.

Gel electrophoresis
A method for separating DNA, RNA, or protein by using an electric field to drive the molecules through a three-dimensional matrix. The larger the DNA fragment, the slower it moves through the matrix. DNA isolated on a gel can be recovered and purified away from the matrix.

Polymerase Chain Reaction (PCR)
A method for amplifying a specific DNA fragment in which paired DNA strands are separated (by high temperature) and each is then used as a template for production of a complementary strand by an enzyme (a DNA polymerase).
The development of PCR can be traced back to the discovery of DNA polymerase. That enzyme, however, is easily destroyed by heat, so it is not suited to the repeated high-temperature cycles the reaction requires.
The enzyme used today (Taq polymerase) was isolated in 1976 from a hot-spring bacterium (Thermus aquaticus). Its defining property is that it tolerates high temperatures, which makes it an ideal enzyme for PCR.

Polymerase Chain Reaction (PCR)
Flash demo

Gene clone
Flow chart of the major steps involved in gene clone set production and use.

[Figure: Annotated Genome, Computational Analysis, Experimental Data, and Literature Mining feed into a Unique Set of Target ORFs.]
Bioinformatic approaches to select target genes for a cloning project. For bacterial genomes, target selection primarily draws from genome sequence, where introns are not a consideration and genome-scale projects are feasible. For eukaryotes, researchers commonly use one or more informatics-based methods to identify sub-groups of target genes that share a common feature, such as function, localization, expression, or disease association. As noted, these information sources draw significantly on one another (for example, experimental data feeds into genome annotation).

Basics of Signals and Systems

Signal processing Basics
• Time-domain representation: a signal that changes over time
• Frequency-domain representation: Fourier found that essentially any signal can be expressed as a combination of sine and cosine functions (surprise!)
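
Fourier's observation above can be checked numerically. The sketch below is an illustration only, not part of the original slides: it builds a test signal from one sine and one cosine and uses a naive discrete Fourier transform (pure standard library, O(N^2)) to recover which frequency bins are present. The names `dft` and `dominant_bins`, the signal length N = 64, and the chosen frequencies 4 and 10 are all assumptions made for the example.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a finite sequence."""
    n_samples = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n in range(n_samples))
        for k in range(n_samples)
    ]


def dominant_bins(x, threshold=1e-6):
    """Return the non-negative frequency bins whose normalized magnitude exceeds threshold."""
    spectrum = dft(x)
    half = len(x) // 2
    return [k for k in range(half + 1) if abs(spectrum[k]) / len(x) > threshold]


# Test signal: 4 cycles of a sine plus 10 cycles of a half-amplitude cosine.
N = 64
signal = [
    math.sin(2 * math.pi * 4 * n / N) + 0.5 * math.cos(2 * math.pi * 10 * n / N)
    for n in range(N)
]

if __name__ == "__main__":
    print(dominant_bins(signal))  # [4, 10]
```

A production analysis would use a fast Fourier transform instead; the naive sum is kept here because it matches the definition term by term.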
[Figure: a signal plotted in the time domain and its frequency-domain representation.]

Time Operations
Time scaling
$y(t) = x(at)$
$y[n] = x[kn], \quad k > 0$
If a > 1, y(t) is compressed. If 0 < a < 1, y(t) is expanded.

Time scaling

Reflection
$y(t) = x(-t)$
y(t) represents a reflected version of x(t) about t = 0.

Time shifting
$y(t) = x(t - t_0)$
If t0 > 0, the signal shifts toward the right. If t0 < 0, it shifts toward the left.

Example: Precedence Rule
$y(t) = x(at - b)$
Shift first, then scale:
$v(t) = x(t - b)$
$y(t) = v(at) = x(at - b)$

Time-domain signal, Fourier transform, Z-transform
"Blue-planet people" = "Earthlings": only the point of view differs, so the language differs!

Relationship between Time Properties of a Signal and the Appropriate Fourier Representation
Time property      Periodic                               Non-periodic
Continuous (t)     Fourier Series (FS)                    Fourier Transform (FT)
Discrete [n]       Discrete-time Fourier Series (DTFS)    Discrete-time Fourier Transform (DTFT)

Relations Among Fourier Methods

The various transform domains
Time, Fourier, Z, S (Laplace)

Filters

Advantages of digital filters over analog filters
• Highly immune to noise because of the way they are implemented (software/digital circuits)
• Accuracy dependent only on round-off error, directly determined by the number of bits
• Easy and inexpensive to change a filter's operating characteristics (e.g., cutoff frequency)
• Performance not a function of component aging, temperature variation, or power supply voltage

Signal conversion
[Figure: continuous signal x(t) -> band-limiter and sampler -> sampled signal x(kT) -> digital filter (processor) -> y(kT) -> reconstruction filter -> continuous signal y(t).]

Sampling as multiplication by a train of impulses
[Figure: (a) multiplier combining x(t) and p(t) to give xp(t); (b) the continuous signal x(t); (c) the impulse train p(t) with spacing Ts; (d) the sampled signal xp(t).]

Sampling theorem
• Must sample at a rate at least twice the highest frequency present in the signal (including noise).
• If a signal contains no frequencies higher than fc, the original signal can be completely recovered from samples taken at a rate of at least 2 fc samples/s.
• The sampling frequency fs must be at least twice the highest frequency present in the signal (the Nyquist rate).

LTI system
• What is LTI (linear time-invariant)? Two conditions:
  Linearity: $a\,y_1(n) + b\,y_2(n) = L[a\,x_1(n) + b\,x_2(n)]$
  Time invariance: $y(n, k) = T[x(n - k)] = y(n - k)$

System operation
Time domain (convolution):
$y(n) = \sum_{k} x(k)\,h(n - k) = \sum_{k} h(k)\,x(n - k)$
Frequency domain (multiplication):
$Y(\omega) = X(\omega)\,H(\omega)$

Autocorrelation and PSD
$R_{xx}[m] = E\{x[n]\,x[n + m]\}$
$R_{yy}[m] = \sum_{k} R_{xx}[m - k]\,R_{hh}[k]$
The power spectral density (PSD) of a signal is defined as the Fourier transform of its autocorrelation function.

Example
Digital or analog?

Signal Detection and Estimation

Estimation and detection

Signal detection and estimation
• Estimation: $\hat{x}[n]$
• Detection: hypothesis testing
Five steps for analyzing genomic and proteomic data:
1. Describe and identify the measurement system S.
2. Define the signal of interest x[n]: map the biological space into a numerical space.
3. Formulate the problem (estimation, detection, or analysis).
4. Solve the problem and compute the output signals.
5. Interpret the results in a biological context.

DNA sequencing
The DNA sequencing process:
• DNA sample preparation
• Electrophoresis
• Processing
Processing the electropherogram data to identify the DNA sequence:
• Conditioning the signal and increasing the S/N ratio
• Identifying the underlying DNA sequence
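
The convolution sum and the two LTI conditions from the signals-and-systems slides above can be sketched in plain Python. This is a minimal illustration, not code from the lecture: `convolve` is a hypothetical helper for finite sequences, and the input x = [1, 2, 3] and impulse response h = [1, 1] are arbitrary values picked so the result can be checked by hand.

```python
def convolve(x, h):
    """Full linear convolution y[n] = sum_k x[k] * h[n - k] of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, x_k in enumerate(x):
        for j, h_j in enumerate(h):
            # The term x[k] * h[j] contributes to output index n = k + j.
            y[k + j] += x_k * h_j
    return y


x = [1, 2, 3]   # arbitrary input sequence (assumption for the demo)
h = [1, 1]      # arbitrary impulse response (assumption for the demo)

if __name__ == "__main__":
    y = convolve(x, h)
    print(y)                          # [1.0, 3.0, 5.0, 3.0]

    # Commutativity: sum_k x(k) h(n-k) equals sum_k h(k) x(n-k).
    print(convolve(h, x) == y)        # True

    # Time invariance: delaying the input by one sample delays the output by one sample.
    print(convolve([0] + x, h) == [0.0] + y)  # True

    # Linearity: scaling the input by 2 scales the output by 2.
    print(convolve([2 * v for v in x], h) == [2 * v for v in y])  # True
```

The same routine with h set to `[1 / w] * w` gives a moving-average filter, one simple way to condition a noisy signal before further processing.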

DNA sequencing - electropherogram
http://www.genome.uab.edu/

Model of DNA sequencing
Blurring: for example, diffusion effects and instrument noise

Wiener filter
http://en.wikipedia.org/wiki/Wiener_filter

DNA sequence estimation using LTI filtering
$\sum_{k} h[k]\,R_{\tilde{x}\tilde{x}}[m - k] = R_{x\tilde{x}}[m]$
$H(e^{j\omega}) = \dfrac{S_{x\tilde{x}}(e^{j\omega})}{S_{\tilde{x}\tilde{x}}(e^{j\omega})}$

Homomorphic Blind Deconvolution
• The Wiener filter can lead to significant errors in the presence of diffusion effects.
• A high-pass filter is used to reduce blurring (which is low frequency).

Results
An error rate of 1.06% is better than that reported for fluorescence-based sequencing instruments!

Model based estimation techniques
There are four main distortions introduced by sequencing:
1. Loading artifacts
2. Diffusion effects
3. Fluorescence interference
4. Additive instrument noise
The sequence is recovered by working backward from the model.

Gene identification
Once the DNA sequence has been identified, it needs to be analyzed to identify genes and coding sequences.

DNA signal properties
[Figure: spectrum for the bacterium Aquifex aeolicus.]
• Consider the autocorrelation function of xa[n].
• The non-flat shape of the spectrum reveals correlations at low frequencies, indicating that base pairs that are far apart seem to be correlated.

DNA signal properties
At the thin peak at 2π/3, the increased correlation corresponds to the tendency of nucleotides to be repeated along the DNA sequence with period 3 and is indicative of coding regions. Possible explanations:
• The triplet nature of the codon
• Potentially codon bias (unequal usage of codons)
• The biased usage of nucleotide triples in genomic DNA (triplet bias)
Yin and Yau showed that the period-3 property is not affected by codon bias. The period-3 property of coding regions seems to be generated by unbalanced nucleotide distributions in the three codon positions.
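
The period-3 property described above can be illustrated with a small Python sketch. It maps a DNA string to four binary indicator sequences (one per base, like xa[n] above), evaluates their DFT at bin k = N/3, and compares the power there with the average over the other bins. Everything specific in it is an assumption made for illustration: the function names, the normalization over bins 1..N/2, and especially the two toy sequences, which are synthetic repeats rather than real genomic data.

```python
import cmath
import math


def indicator(seq, base):
    """Binary indicator sequence: 1.0 where seq has the given base, else 0.0."""
    return [1.0 if s == base else 0.0 for s in seq]


def dft_bin(x, k):
    """Single DFT coefficient X[k] of a finite sequence."""
    n_samples = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
               for n in range(n_samples))


def spectral_power(seq, k):
    """Sum over the four bases of |X_base[k]|^2."""
    return sum(abs(dft_bin(indicator(seq, base), k)) ** 2 for base in "ACGT")


def period3_ratio(seq):
    """Power at k = N/3 divided by the mean power over bins 1..N/2.

    Assumes len(seq) is a multiple of 3; the normalization is a choice
    made for this sketch.
    """
    n = len(seq)
    peak = spectral_power(seq, n // 3)
    mean = sum(spectral_power(seq, k) for k in range(1, n // 2 + 1)) / (n // 2)
    return peak / mean


if __name__ == "__main__":
    # Synthetic toy sequences (not real genes): one with period 3, one with period 4.
    print(period3_ratio("ATG" * 20) > 10)      # True: strong peak at N/3
    print(period3_ratio("ATGC" * 15) < 1e-6)   # True: no power at N/3
```

On real sequences the ratio would be computed over a sliding window, so that regions with a strong N/3 peak can be flagged as candidate coding regions.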

DNA signal processing for Gene identification
• Fickett described gene identification as the problem of interpreting nucleotide sequences by computer, in order to provide tentative annotation on the location, structure, and functional class of protein-coding genes, i.e., to predict the amino acid sequence of the protein and so provide insight into its function.
• The premise of all methods is to exploit the period-3 property of coding regions by processing the DNA signal to identify regions with strong period-3 correlation.

DNA signal processing for Gene identification
DNA spectrum:
$S_x[k] = |X_a[k]|^2 + |X_t[k]|^2 + |X_c[k]|^2 + |X_g[k]|^2$
Signal-to-noise ratio:
$P_x(N/3) = \dfrac{S_x[N/3]}{\bar{S}_x}$
where $\bar{S}_x$ denotes the average spectral power.
Tiwari et al. observed that for most coding sequences in a variety of organisms, Px is large, but not in non-coding regions.

Fourier spectra
[Figure: Fourier spectrum of a coding stretch of DNA (left) and of a non-coding stretch of DNA (right).]

Filtering methods
Predicting the five exons of gene F56F11.4 in C. elegans chromosome III with:
• An applied window
• An IIR antinotch filter
• Multistage filters

Protein Hotspots identification
Once coding regions have been identified, the corresponding protein sequence can be determined by mapping the coding region to the amino acid sequence using the genetic code.

Protein Signal Definition
The new physicomathematical approach presented here is called the Resonant Recognition Model (RRM). The RRM is based on the representation of the protein primary structure as a numerical series by assigning to each amino acid a physical parameter value relevant to the protein's biological activity.

Protein Signal Definition
The RRM is a physical and mathematical model which interprets the linear information in a protein sequence using signal analysis methods. It comprises two stages. The first involves the transformation of the amino acid sequence into a numerical sequence.
Each amino acid is represented by the value of its electron-ion interaction potential (EIIP), which describes the average energy states of all valence electrons in that amino acid. In the second stage, the numerical series obtained this way are analyzed by digital signal analysis methods in order to extract information pertinent to the biological function.

EIIP

RESONANT RECOGNITION MODEL (RRM)

Prediction of protein hotspot
Cytochrome C proteins

System identification and Analysis

Signal processing view of the cell

Signal coordination at cell level
System view

Gene expression
• A DNA microarray (also commonly known as a gene chip, DNA chip, or biochip) is a collection of microscopic DNA spots attached to a solid surface, such as a glass, plastic, or silicon chip, forming an array.
• Types of DNA microarray include cDNA microarrays and oligonucleotide microarrays.
• In genetics, complementary DNA (cDNA) is DNA synthesized from a mature mRNA template. cDNA is often used to clone eukaryotic genes in prokaryotes.

Gene expression
One sample from a tumor; one sample from normal tissue.
• White – highly expressed under treatment A
• Gray – no difference
• Dark – highly expressed under treatment B

Changes over time – measuring drug effects over 6 hours.

cDNA microarrays
The procedure begins by attaching the DNA sequences of thousands of genes onto a microscope slide in a pattern of spots, with each spot containing only the DNA sequences of a single gene.

Oligonucleotide microarrays
• In oligonucleotide microarrays (or single-channel microarrays), the probes are designed to match parts of the sequence of known or predicted mRNAs.
• Instead of attaching full-length DNAs, oligonucleotide microarrays make use of short oligonucleotides chosen to be specific to individual genes.

Oligonucleotide microarrays
These microarrays give estimates of the absolute level of gene expression, and therefore the comparison of two conditions requires the use of two separate microarrays.

DNA microarray
$X = \begin{bmatrix} x_0[m] \\ x_1[m] \\ \vdots \\ x_{N-1}[m] \end{bmatrix}$

Principal components analysis
Principal component analysis transforms the original set of variables into a smaller set of linear combinations that account for most of the variance of the original set.
The purpose of principal component analysis is to determine factors (i.e., principal components) that explain as much of the total variation in the data as possible with as few factors as possible.
$PC(1) = w_{11}X_1 + w_{12}X_2 + \dots + w_{1p}X_p$
$\vdots$
$PC(m) = w_{m1}X_1 + w_{m2}X_2 + \dots + w_{mp}X_p$
The principal components are those uncorrelated linear combinations PC(1), PC(2), …, PC(m) whose variances are as large as possible.

Eigenvalues and Eigenvectors
Definition
Let A be an n × n matrix. A scalar λ is called an eigenvalue of A if there exists a nonzero vector x in R^n such that Ax = λx. The vector x is called an eigenvector corresponding to λ.

Computation of Eigenvalues and Eigenvectors
Let A be an n × n matrix with eigenvalue λ and corresponding eigenvector x. Thus Ax = λx. This equation may be rewritten
$Ax - \lambda x = 0$
giving
$(A - \lambda I_n)x = 0$
Solving the equation $|A - \lambda I_n| = 0$ for λ leads to all the eigenvalues of A.
On expanding the determinant $|A - \lambda I_n|$, we get a polynomial in λ. This polynomial is called the characteristic polynomial of A. The equation $|A - \lambda I_n| = 0$ is called the characteristic equation of A.

Example 1
Find the eigenvalues and eigenvectors of the matrix
$A = \begin{bmatrix} -4 & -6 \\ 3 & 5 \end{bmatrix}$
Solution
Let us first derive the characteristic polynomial of A. We get
$A - \lambda I_2 = \begin{bmatrix} -4 & -6 \\ 3 & 5 \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -4 - \lambda & -6 \\ 3 & 5 - \lambda \end{bmatrix}$
$|A - \lambda I_2| = (-4 - \lambda)(5 - \lambda) + 18 = \lambda^2 - \lambda - 2$
We now solve the characteristic equation of A.
$\lambda^2 - \lambda - 2 = 0 \;\Rightarrow\; (\lambda - 2)(\lambda + 1) = 0 \;\Rightarrow\; \lambda = 2 \text{ or } -1$
The eigenvalues of A are 2 and –1.
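
The eigenvalues just found can be checked numerically. The sketch below is only a sanity check, not part of the original example: it takes A = [[-4, -6], [3, 5]], the matrix consistent with the characteristic polynomial and eigenvalues stated above, and solves the 2 × 2 characteristic equation λ² − (trace)λ + det = 0 directly. The helper name `eigenvalues_2x2` is made up for this illustration, and complex eigenvalues are deliberately not handled.

```python
import math


def eigenvalues_2x2(a):
    """Real eigenvalues of a 2x2 matrix from lambda^2 - trace*lambda + det = 0."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    discriminant = trace * trace - 4 * det
    if discriminant < 0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    root = math.sqrt(discriminant)
    return ((trace + root) / 2, (trace - root) / 2)


def char_poly(a, lam):
    """Evaluate det(A - lam*I) for a 2x2 matrix."""
    return (a[0][0] - lam) * (a[1][1] - lam) - a[0][1] * a[1][0]


A = [[-4, -6], [3, 5]]

if __name__ == "__main__":
    print(eigenvalues_2x2(A))                  # (2.0, -1.0)
    print(char_poly(A, 2), char_poly(A, -1))   # 0 0
```

For larger matrices, such as a covariance matrix in principal component analysis, a numerical linear algebra library would be used instead of the closed-form quadratic.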
λ = 2:
$(A - 2I_2)x = 0 \;\Rightarrow\; \begin{bmatrix} -6 & -6 \\ 3 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 0$
This leads to the system of equations
$-6x_1 - 6x_2 = 0$
$3x_1 + 3x_2 = 0$
giving x1 = –x2. The solutions to this system of equations are x1 = –r, x2 = r, where r is a scalar. Thus the eigenvectors of A corresponding to λ = 2 are nonzero vectors of the form
$r \begin{bmatrix} -1 \\ 1 \end{bmatrix}$
λ = –1:
$(A + 1I_2)x = 0 \;\Rightarrow\; \begin{bmatrix} -3 & -6 \\ 3 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 0$
Thus x1 = –2x2. The eigenvectors of A corresponding to λ = –1 are nonzero vectors of the form
$s \begin{bmatrix} -2 \\ 1 \end{bmatrix}$

Singular value decomposition (SVD)
Suppose M is an m-by-n matrix whose entries come from the field K, which is either the field of real numbers or the field of complex numbers. Then there exists a factorization of the form
$M = U \Sigma V^{*}$
• The matrix V thus contains a set of orthonormal "input" or "analysing" basis vector directions for M.
• The matrix U contains a set of orthonormal "output" basis vector directions for M.
• The matrix Σ contains the singular values, which can be thought of as scalar "gain controls" by which each corresponding input is multiplied to give a corresponding output.

SVD example
A non-negative real number σ is a singular value for M if and only if there exist unit-length vectors u in K^m and v in K^n such that
$Mv = \sigma u \quad \text{and} \quad M^{*}u = \sigma v$

Eigengenes from applying SVD

Project CLB2 and CLN3

Apoptosis System Identification

PCA results

Summary
• We tried to link signals, systems, and biology.
• Filtering is necessary for removing artifacts.
• We learned "Signal detection and estimation":
  1. DNA sequencing
  2. Gene identification
  3. Protein hotspot identification
• We learned "System identification and analysis":
  1. Gene regulation systems
  2. Protein signal systems
• We presented a new way of thinking about data mining.

Thanks