RNA sequencing for differential expression genes

Speaker: TZU-CHUN LO
Advisor: YAO-TING HAUNG

Outline
• Molecular Central Dogma
• RNA Sequencing
• Differential Expression Gene
  • Case–Control Study
  • Negative Binomial Distribution
  • Hypothesis Testing
• Rice
  • SNP, QTL, Pathway

Molecular Central Dogma
• The central dogma of molecular biology describes the flow of genetic information within a biological system.
[Figure: illustration labelled Forest, Branches, BBQ]

RNA Sequencing
[Figure: Gene 1 and Gene 2 on the DNA with their exons; mRNA; RNA reads; alignment and spliced alignment of the reads; read counts]
• DEG process: find differentially expressed genes from the read counts of each gene.

Differential Expression Gene
• We want to find the cold-resistant genes in rice.
[Figure: rice genome with Gene 1, Gene 2, Gene 3]
• We should compare two conditions.
  • Room temperature
  • Low temperature
[Figure: read counts under the two conditions. Gene 1: 13, 6; Gene 2: 4, 5; Gene 3: 7, 2]
• Which are the cold-resistant differentially expressed genes?

Strategy for DEG
• Case–control study
  • Two existing groups differing in outcome are identified and compared on the basis of some supposed causal attribute.

  Condition   Case   Control   Comparison
  Gene 1        69        71   69 vs. 71: almost the same?
  Gene 2        86        56   86 vs. 56: possible DEG
  Gene 3        66       111   66 vs. 111: more likely DEG
  Gene 4        80        60   80 vs. 60: how to judge?
  …              …         …

• Each count is just one sample from its condition.
• Question
  • Is the count adequate to represent the gene? → Negative binomial distribution
  • How do we define a gene as differentially expressed? → Hypothesis test

Negative Binomial Distribution
• NB is a count-data distribution that can substitute for the Poisson distribution to model the variance better, because its variance is not forced to equal the mean.
[Figure: model for the read count of gene i (i = 1~n) in sample j (j = 1~m), annotated with a gene abundance parameter, a library size parameter, and a smooth function]
• Library size parameter, computed for the case sample from the three example genes:

$$\text{library size (case)} = \operatorname{median}\left( \frac{69}{\sqrt{69 \times 71}},\ \frac{86}{\sqrt{86 \times 56}},\ \frac{66}{\sqrt{66 \times 111}} \right) = 0.986$$

• The smooth function is more complex, so we skip it here.

FPKM
• An indicator used to represent mRNA expression.
• Fragments Per Kilobase of transcript per Million mapped reads.
$$\mathrm{FPKM} = \frac{\text{reads of gene}}{\text{all mapped reads (millions)} \times \text{exon length of gene (kilobases)}}$$

[Figure: genome with Gene 1 (exons of 8, 10, and 7 bases; 10 reads) and Gene 2 (exons of 8 and 9 bases; 4 reads)]

$$\text{Gene 1} = \frac{10}{\frac{(10 + 4)}{10^6} \times \frac{(8 + 10 + 7)}{10^3}} = 0.029 \times 10^9$$

$$\text{Gene 2} = \frac{4}{\frac{(10 + 4)}{10^6} \times \frac{(8 + 9)}{10^3}} = 0.017 \times 10^9$$

FPKM
• With K the read count of a gene, M the number of all mapped reads, and L the exon length of the gene in bases, FPKM = 10^9 K / (M L), so:

$$\mathrm{Var}[\mathrm{FPKM}] = \left( \frac{10^9}{M \times L} \right)^2 \mathrm{Var}[K]$$

• Before hypothesis testing, we need the FPKM and the variance of the FPKM.

           K (reads)          FPKM
           Case   Control     Case    Control
  Gene 1     69        71     9.34      14.75
  Gene 2     86        56    22.31      15.37
  Gene 3     66       111    40.48      53.98
  …           …         …        …          …

           Var(K)             Var(FPKM)
           Case   Control     Case    Control
  Gene 1     10         6      6         3.6
  Gene 2    170       166    136       132.8
  Gene 3    362       310    120.6     109.3
  …           …         …        …          …

Hypothesis Testing
• Step 1: You find observations or clues that support a novel idea.
• Step 2: Assume the opposing view, the one you want to argue against.
• Step 3: Test it and take a stand, based on the p-value.

T-test
• Use a t-test to compare the log ratio (log fold-change) of a gene's expression between conditions (a) and (b).
• $Y = \mathrm{FPKM}_a / \mathrm{FPKM}_b$ and $\log Y = \log(\mathrm{FPKM}_a / \mathrm{FPKM}_b)$.
• If $\mathrm{FPKM}_a = \mathrm{FPKM}_b$, then $y = 1$; if $y = 1$, then $\log(y) = 0$.
• $H_0: \mu = 0$, $H_1: \mu \neq 0$. Assume that $H_0$ is true.
• The variance of $\log Y$ is approximated from the variances of the two FPKM values, treating the two conditions as independent so that their relative variances add:

$$T = \frac{E[\log Y] - \mu}{\sqrt{\mathrm{Var}[\log Y]}} = \frac{E[\log Y]}{\sqrt{\mathrm{Var}[\log Y]}} \approx \frac{\log\left( \mathrm{FPKM}_a / \mathrm{FPKM}_b \right)}{\sqrt{\dfrac{\mathrm{Var}[\mathrm{FPKM}_a]}{\mathrm{FPKM}_a^2} + \dfrac{\mathrm{Var}[\mathrm{FPKM}_b]}{\mathrm{FPKM}_b^2}}} \Rightarrow p\text{-value}$$

• p-values from the t-test:

  T-test    Gene 1   Gene 2   Gene 3   …
  p-value    0.187    0.039    0.014   …

Result Investigating
• Discuss alpha = 0.05 together with the read counts and p-values.

  If alpha = 0.05   Case   Control   p-value   DEG?
  Gene 1              69        71     0.187   No
  Gene 2              86        56     0.039   Yes
  Gene 3              66       111     0.016   Yes
  Gene 4              80        60     0.045   Yes

• What if alpha = 0.04 or 0.03?
• We don't know which alpha is best, but we can do some subsequent processing.
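The FPKM formula, its variance, and the log-ratio statistic above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the two conditions are treated as independent so that their relative variances add, and the p-value is taken two-sided from a standard normal reference. It is not meant to reproduce the p-values shown in the tables.

```python
import math


def fpkm(reads, total_mapped_reads, exon_length_bases):
    """FPKM = reads / (mapped reads in millions * exon length in kilobases)."""
    millions = total_mapped_reads / 1e6
    kilobases = exon_length_bases / 1e3
    return reads / (millions * kilobases)


def fpkm_variance(var_reads, total_mapped_reads, exon_length_bases):
    """Var[FPKM] = (1e9 / (M * L))**2 * Var[K]."""
    scale = 1e9 / (total_mapped_reads * exon_length_bases)
    return scale ** 2 * var_reads


def log_ratio_test(fpkm_a, var_a, fpkm_b, var_b):
    """Return (T, p-value) for H0: log(FPKM_a / FPKM_b) = 0."""
    log_ratio = math.log(fpkm_a / fpkm_b)
    # Assumption: independent conditions, so the relative variances add.
    std_err = math.sqrt(var_a / fpkm_a ** 2 + var_b / fpkm_b ** 2)
    t = log_ratio / std_err
    # Assumption: two-sided p-value from a standard normal reference.
    p_value = math.erfc(abs(t) / math.sqrt(2))
    return t, p_value


# Toy genome from the FPKM example: 10 + 4 mapped reads in total.
total = 10 + 4
gene1 = fpkm(10, total, 8 + 10 + 7)
gene2 = fpkm(4, total, 8 + 9)
print(f"{gene1 / 1e9:.3f}e9, {gene2 / 1e9:.3f}e9")  # 0.029e9, 0.017e9

# Hypothetical FPKM values and variances, for illustration only.
t, p = log_ratio_test(20.0, 4.0, 10.0, 1.0)
print(f"T = {t:.2f}, p = {p:.2g}")
```

Given the variance of the read counts of a gene in each condition, `fpkm_variance` provides the `var_a` and `var_b` inputs for `log_ratio_test`.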
RNA sequencing for Rice
• Plan
  • Cold-resistant genes
• Samples
  • Japonica (TN67): room temperature (R), low temperature (L)
  • Indica (IR64): room temperature (R), low temperature (L)
• Rice
  • Japonica rice (TN67): grains are broad and short, stickier, and chewy, e.g. Penglai rice.
  • Indica rice (IR64): grains are slender and long, less sticky, and break easily, e.g. Zailai rice.
• Zone
  • TN67: high latitude or high altitude
  • IR64: low latitude or low altitude

Strategy for DEG
• Case–control study
  • Four combinations
    • Different varieties or distinct temperatures
• Four sets of differentially expressed genes
  • The DEGs of the combinations above (A, B, C, D)
• Negative binomial
  • Infer the probability distribution from the samples
• Hypothesis test
  • Decide which genes are the DEGs that we want
• Subsequent processing
  • SNP, QTL, Pathway
[Figure: the four samples at the corners of a square, with the comparisons on its edges]
  A: TN67R vs. IR64R
  B: TN67R vs. TN67L
  C: TN67L vs. IR64L
  D: IR64R vs. IR64L

SNP
• A single-nucleotide polymorphism is a sequence variation that occurs when a single nucleotide differs between members of a biological species.
[Figure: assembly and SNP detection. Case: ATGCCCTCGTAA, TTACTGCGT; Control: ATGCGCTCGAAA, TTACTCCGT]

QTL
• Quantitative traits are phenotypes (characteristics) that vary in degree and can be attributed to polygenic effects (the product of two or more genes).
• Quantitative trait loci (QTLs) are stretches of DNA containing or linked to the genes that underlie a quantitative trait.
• Example: QT (cold), loci: 599~799 (base)
[Figure: QTL on the DNA, with labels "1", "1000", "genes"]
• Cold tolerance (29) & pollen fertility (43)
• QTL length: ~million bases

Pathway
• Pathway is a collection of manually drawn pathway maps representing molecular interaction and reaction networks.
[Figure: Rice; Gene No.2, Gene No.55, Gene No.99; Cold-resistant]

Conclusion
• Review
  • RNA Sequencing
  • Differential Expression Gene
    • Case–Control Study
    • Negative Binomial Distribution
    • Hypothesis Testing
  • Rice
    • SNP
    • QTL
    • Pathway

Variance of negative binomial
• NB is a count-data distribution that can substitute for the Poisson distribution to model the variance better.

Strategy for DEG
• Case–control at the same temperature: A, C
• Case–control within the same variety: B, D
• Let T be the set of all genes.
• $A \cap C = X$
• $A \cap (T - C) = Y$, $(T - A) \cap C = Z$
• $B \cap D = O$
• $B \cap (T - D) = P$, $(T - B) \cap D = Q$
• $\text{result} = \{X, Y, Z, O, P, Q\}$

QTL
• Another class of traits varies continuously, is hard to sort into discrete categories, and is easily influenced by the environment. Examples are human height, weight, high blood pressure, and diabetes; rice plant height, yield, and degree of disease resistance; body fat percentage in mice; milk yield in dairy cows; and egg production in chickens. Such traits are called quantitative traits.
• A quantitative trait is controlled by many genes. Because every one of these genes affects the trait, the effect of each single gene is relatively small.
• The genes that control quantitative traits are called polygenes (minor-effect genes), also known as quantitative trait loci (QTL).
• Rice genome size: 430 Mb
[Figure: QTL]

Negative binomial distribution
• NB is a count-data distribution that can infer an adequate count for a gene from the samples.
[Figure: model for gene i in sample j, with a smooth function]

RNA sequencing
[Figure: Gene 1 and Gene 2 on the DNA with their exons; mRNA; RNA reads; alignment and spliced alignment to the DNA]
• We should align the reads to the regions shown in blue above.

RNA sequencing
• Spliced alignment
  • TopHat

           Condition 1: case        Condition 2: control
  Sample     1     2     3    …       1     2     3    …
  Gene 1    75    69    70    …      73    71    68    …
  Gene 2   101    86    75    …      31    56    49    …
  Gene 3    28    66    45    …     120   111   145    …
  …          …     …     …    …       …     …     …    …

           Reads              Variance
           Case   Control     Case   Control
  Gene 1     69        71       69        71
  Gene 2     86        56       86        56
  Gene 3     66       111       66       111
  …           …         …        …         …
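The set combinations X, Y, Z, O, P, Q defined from the four DEG sets A, B, C, D can be sketched with Python sets. This is a minimal illustration: the function name `combine_deg_sets` and the gene names in the usage example are hypothetical, not results from the rice data.

```python
def combine_deg_sets(all_genes, a, b, c, d):
    """Combine the four DEG sets as X, Y, Z (from A and C) and O, P, Q (from B and D)."""
    t = set(all_genes)
    # Restrict every DEG set to the universe T of all genes.
    a, b, c, d = (set(s) & t for s in (a, b, c, d))
    return {
        "X": a & c,        # DEG in both same-temperature comparisons
        "Y": a & (t - c),  # DEG in A only
        "Z": (t - a) & c,  # DEG in C only
        "O": b & d,        # DEG in both same-variety comparisons
        "P": b & (t - d),  # DEG in B only
        "Q": (t - b) & d,  # DEG in D only
    }


# Hypothetical gene names, for illustration only.
genes = ["g1", "g2", "g3", "g4", "g5"]
result = combine_deg_sets(
    genes,
    a={"g1", "g2"},
    b={"g2", "g3"},
    c={"g2", "g4"},
    d={"g3", "g5"},
)
for name, members in result.items():
    print(name, sorted(members))
```

Keeping the six sets under their slide names makes it easy to look up, for example, the genes that respond to low temperature in only one variety (P or Q).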
