BCB 444/544
Lecture 15
More Review: RNA, Proteins, Promoters, TFs
Next time: Profiles & Hidden Markov Models (HMMs)
#15_Sept26
BCB 444/544 F07 ISU Dobbs #15 - RNA, Proteins, Promoters, TFs 9/26/07

Required Reading (before lecture)
Mon Sept 24 - Lecture 14
Review: Nucleus, Chromosomes, Genes, RNAs, Proteins
Surprise lecture: No assigned reading
Wed Sept 26 - Lecture 15
Profiles & Hidden Markov Models
• Chp 6 - pp 79-84
• Eddy: What is a hidden Markov Model? 2004 Nature Biotechnol 22:1315
  http://www.nature.com/nbt/journal/v22/n10/abs/nbt1004-1315.html
Thurs Sept 27 - Lab 4 & Fri Sept 28 - Lecture 16
Protein Families, Domains, and Motifs
• Chp 7 - pp 85-96

Assignments & Announcements
Wed Sept 26
• Exam 1 - Graded & returned in class
• HW#2 - Graded & returned in class
• Answer KEYs posted on website
• Grades posted on WebCT
• HomeWork #3 - posted online
  Due: Mon Oct 8 by 5 PM
• HW544Extra #1 - posted online
  Due: Task 1.1 - Mon Oct 1 by noon
  Task 1.2 & Task 2 - Mon Oct 8 by 5 PM

BCB 544 - Extra Required Reading
Mon Sept 24
BCB 544 Extra Required Reading Assignment:
• Pollard KS, Salama SR, Lambert N, Lambot MA, Coppens S, Pedersen JS, Katzman S, King B, Onodera C, Siepel A, Kern AD, Dehay C, Igel H, Ares M Jr, Vanderhaeghen P, Haussler D. (2006) An RNA gene expressed during cortical development evolved rapidly in humans. Nature 443: 167-172.
• http://www.nature.com/nature/journal/v443/n7108/abs/nature05113.html
• doi:10.1038/nature05113
• PDF available on class website - under Required Reading Link

Cell & Molecular Biology: the Basics
Slide Credits: Terribilini, 06; & some adapted from Erin Garland

Eukaryotic Cell
• Enclosed & subdivided by membranes
• Several compartments called organelles
• Multiple linear chromosomes in nucleus

Prokaryotic Cell
• Enclosed by membrane & cell wall
• No real organelles
• Single circular chromosome (usually)
• Has nucleoid but no true nucleus
Wrong! DNA is never naked (inside cells)!

Translation: mRNA to protein, by ribosomes
[Figure labels: protein, tRNA, amino acids, ribosome, mRNA]
mRNA = messenger RNA
Codon = 3 nucleotides that encode an amino acid

Genetic Code: Universal (almost!)
[Figure: codon table; labels: Stop Codons, Start Codon]

Mutations
• Nonsense = STOP codon in wrong place!
• Missense = mutation that results in an amino acid change in the protein
• Synonymous = mutation in DNA that does not result in an amino acid change in protein
• Non-synonymous = mutation in DNA that does result in an amino acid change in protein
Question: Can a "synonymous" mutation alter expression of a protein - even though the DNA change is "silent" (because it does not change the encoded amino acid)?
YES! How?
This was the last slide covered in class on Mon 9/24

Extra Credit Questions #2-6:
2. What is the size of the dystrophin gene (in kb)? (Is it still the largest known human protein?)
3. What is the largest protein encoded in the human genome (i.e., longest single polypeptide chain)?
4. What is the largest protein complex for which a structure is known (for any organism)?
5. What is the most abundant protein (naturally occurring) on earth?
6. Which state in the US has the largest number of mobile genetic elements (transposons) in its living (plant and animal) population?
For 1 pt total (0.2 pt each): Answer all questions correctly & submit to terrible@iastate.edu
For 2 pts total: Prepare a PPT slide with all correct answers & submit to ddobbs@iastate.edu before 9 AM on Mon Oct 1
• Choose one option - you can't earn 3 pts!
• Partial credit for incorrect answers? Only if they are truly amusing!

Extra Credit Questions #7 & #8:
Given that each male attending our BCB 444/544 class on a typical day is healthy (let's assume MH=7), and is generating sperm at a rate equal to the average normal rate for reproductively competent males (dSp/dT = ? per minute):
7a. How many rounds of meiosis will occur during our 50 minute class period?
7b. How many total sperm will be produced by our BCB 444/544 class during that class period?
8. How many rounds of meiosis will occur in the reproductively competent females in our class? (assume FH=5)
For 0.6 pts total (0.2 pt each): Answer all questions correctly & submit to terrible@iastate.edu
For 1 pt total: Prepare a PPT slide with all correct answers & submit to ddobbs@iastate.edu before 9 AM on Mon Oct 1
• Choose one option - you can't earn more than 1 pt for this!
• Partial credit for incorrect answers? Only if they are truly amusing!

Protein Function
• Proteins are primary molecules responsible for carrying out cellular functions
  Proteins are the workhorses of the cell
• Most "enzymes" that catalyze chemical reactions are proteins (but some are RNAs!)
• Proteins have complex structures that are critical for their functions
Protein structure for dystrophin: encoded by the largest known gene in humans (but dystrophin is not the largest known protein in humans)

Protein Structure: 4 levels of organization

Key Aspects of Protein Function: Localization & Interactions
Protein localization - function depends on proteins being in the right place at the right time!
Protein interactions - function depends on proteins interacting with the correct partners inside cells!
Both of these are "hot" areas of Bioinformatics research: later, you will use machine learning to "predict" these!

Protein Sequence-Structure-Function
• Amino acid sequence determines protein structure
  • But some proteins need help folding ("chaperones") in vivo
  • Proteins fold to a single "native" structure (under a specific set of conditions)
• Protein structure determines function
  • But level, timing & location of expression are important
  • Interactions with other proteins, DNA, RNA, & small ligands are also very important!!
PROBLEMS:
• We don't know the "folding code" that determines how proteins fold!
• We don't know the "recognition code" that determines how proteins find and bind their correct partners!
These are "hot" areas of Computational Biology research: soon, you will try to predict protein structures & protein binding sites!

Modeling Protein Interaction Networks
Is this an engineering problem?
This is a "hot" area of Systems Biology research: later, we will try out "Retinal Workbench" for analyzing networks involved in retinal development

Modeling Metabolic Pathways?
see MetNet http://metnet.vrac.iastate.edu/MetNet_overview.htm

Genes
Genes in chromatin are not just "beads on a string": they have complex structures that we don't yet fully understand

Eukaryotic gene structure
• Recall: Eukaryotic genes are fragmented, containing introns between functional exons
• In human, on average, genes that encode proteins include ~2000-3000 bp of coding sequence, but can have >10,000 bp between exons!!
• Gene sizes can vary by up to 4 orders of magnitude!

RNA Processing - Splicing
Introns are removed to generate a mature mRNA
[Figure: DNA (one gene) -> transcribed RNA -> introns removed by splicing -> mRNA -> one protein]
Different combinations of exons can be used to make different proteins (alternative splicing)

Gene regulation
• Transcriptional regulation is primarily mediated by proteins that interact with cis-acting DNA elements associated with genes:
  • DNA level (sequence-specific) regulatory signals
    • Promoters, terminators
    • Enhancers, repressors, silencers
  • Chromatin level (global) regulation
    • Heterochromatin (inactive)
    • e.g., X-inactivation in female mammals
• In eukaryotes, genes are often regulated at other levels:
  • Post-transcriptional (RNA transport, splicing, stability)
  • Post-translational (protein localization, folding, stability)

Promoter = DNA sequences required for initiation of transcription; contain TF binding sites, usually "close" to start site
• Transcription factors (TFs) - proteins that regulate transcription
• (In eukaryotes) RNA polymerase binds by recognizing a complex of TFs bound at the promoter
First, TFs must bind TF binding sites (TFBSs) within promoters; then RNA polymerase can bind and initiate transcription of RNA
[Figure labels: ~200 bp; Pre-mRNA]

Enhancers & repressors = DNA sequences that regulate initiation of transcription; contain TF binding sites, can be far from start site!
RNAP = RNA polymerase II
[Figure labels: Enhancer, Repressor, Promoter, Gene; 10-50,000 bp]
Enhancers "enhance" transcription
  Enhancer binding proteins (TFs) interact with RNAP
Repressors or silencers "repress" transcription
  Repressor binding proteins (TFs) block transcription

Transcription factors (TFs) & their binding sites (TFBSs)
• Transcription factors - proteins that either activate or repress transcription, usually by binding DNA (via a DNA binding domain) & interacting with RNA polymerase (via a "trans-activating" domain) to affect the rate of transcription initiation
• Promoters, enhancers, and repressors - all contain binding sites for transcription factors
  • Promoters - usually located close to start site
  • Enhancer/Silencer/Repressor sequences - can be close or very far away: located upstream, downstream or even within the coding sequence of genes!!

"Non-coding" DNA? Many genes encode RNA that is not translated
4 Major Classes of RNA:
1. mRNA = messenger RNA
2. tRNA = transfer RNA
3. rRNA = ribosomal RNA
4. "Other" - Lots of these, diverse structures & functions:
  • "Natural" RNAs:
    • siRNA, miRNA, piRNA, snRNA, snoRNA, …
    • ribozymes
  • Artificial RNAs:
    • RNAi
    • antisense RNA

Web Resources for more information:
• BioTech's Life Science Dictionary
• Online textbooks - NCBI bookshelf

Algorithms & Software for MSA?
#3 (NOT covered on Exam1)
Heuristic Methods - continued
• Progressive alignments (Star Alignment, Clustal)
  • Others: T-Coffee, DbClustal - see text: can be better than Clustal
  • Match closely-related sequences first using a guide tree
• Partial order alignments (POA)
  • Doesn't rely on guide tree; adds sequences in order given
• PRALINE
  • Preprocesses input sequences by building profiles for each
• Iterative methods
  • Idea: optimal solution can be found by repeatedly modifying existing suboptimal solutions (e.g., PRRN)
• Block-based Alignment
  • Multiple re-building attempts to find best alignment (e.g., DIALIGN2 & Match-Box)
• Local alignments
  • Profiles, Blocks, Patterns - more on these soon!

Chp 6 - Profiles & Hidden Markov Models
SECTION II SEQUENCE ALIGNMENT
Xiong: Chp 6 Profiles & HMMs
• √ Position Specific Scoring Matrices (PSSMs)
• √ PSI-BLAST
Thurs & Fri:
• Profiles
• Markov Models & Hidden Markov Models

Chp 7 - Protein Motifs & Domain Prediction
SECTION II SEQUENCE ALIGNMENT
Xiong: Chp 7 Protein Motifs and Domain Prediction
• Identification of Motifs & Domains in Multiple Sequence Alignment
• Motif & Domain Databases Using Regular Expressions
• Motif & Domain Databases Using Statistical Models
• Protein Family Databases
• Motif Discovery in Unaligned Sequences
• Sequence Logos
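
Preview sketch for "Motif & Domain Databases Using Regular Expressions" (listed in the Chp 7 outline above). This is a minimal Python illustration, not taken from the lecture or from Xiong, of how a PROSITE-style pattern might be turned into a regular expression and scanned against a protein sequence. The function names prosite_to_regex and find_motif and the example sequence are hypothetical. The pattern N-{P}-[ST]-{P} is the N-glycosylation site pattern (PROSITE PS00001). Only a subset of the pattern syntax is handled here: single residues, x (any residue), [..] (allowed residues), {..} (forbidden residues), and (n) or (n,m) repeats.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string.

    Assumption: only x, [..], {..}, single residues and (n)/(n,m) repeats
    are supported; anchors such as < and > are not handled.
    """
    pieces = []
    for element in pattern.split("-"):
        # Split each element into its core and an optional repeat, e.g. x(2,4)
        match = re.fullmatch(r"([^()]+)(?:\((\d+(?:,\d+)?)\))?", element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, repeat = match.groups()
        if core == "x":
            core = "."                        # x = any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"    # {P} = any residue except P
        # [ST] and single letters are already valid regex syntax
        pieces.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(pieces)


def find_motif(pattern, sequence):
    """Return (1-based start position, matched text) for every hit.

    A lookahead is used so that overlapping hits are also reported.
    """
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    n_glyc = "N-{P}-[ST]-{P}"            # PROSITE PS00001
    toy_sequence = "MKNGSAANPTLNVTQ"     # made-up sequence for illustration
    print(prosite_to_regex(n_glyc))
    for position, hit in find_motif(n_glyc, toy_sequence):
        print(position, hit)
```

In the toy sequence, the stretch NPTL is skipped because a proline follows the asparagine, which is exactly what the {P} element is meant to exclude. Regular-expression motifs like this are all-or-nothing; the statistical models listed in the same outline (profiles, HMMs) score partial matches instead.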