Introduction to Gene Mining
Part B: How similar are plant and human versions of a gene?

After completing Part B, you will demonstrate how to use NCBI BLASTp and www.Araport.org data to determine whether Arabidopsis thaliana and human muscle protein genes and gene products are homologous.

The Arabidopsis Information Portal is funded by a grant from the National Science Foundation (#DBI-1262414) and co-funded by a grant from the Biotechnology and Biological Sciences Research Council (BB/L027151/1). These lessons were developed during the summer of 2015 as education outreach for the www.Araport.org portal in conjunction with the J. Craig Venter Institute, Rockville, MD, 20850, USA.

Contact information
General information: araport@jcvi.org
Jason Miller, Grant Co-Principal Investigator, JCVI: jmiller@jcvi.org
This lesson was prepared by Andrea Cobb, Ph.D. (adcobb@fcps.edu) with the help of Margot Goldberg (mgoldberg1@pghboe.net).

In Part A, our sample question was: Can we study your muscle disease using a plant model?

We used the NCBI portal to find names of human muscle genes.

We also found the function of the human actin-alpha 1 gene (ACTA1) and asked, “Might plants need that same function?”

We used NCBI BLASTn to search in Arabidopsis thaliana for genes that align to human ACTA1.

We learned that “alignment” is achieved by using an algorithm that maximizes local matches between two sequences.

We learned how to use the BLASTn report scores (Query cover, Ident and E-value) to choose a statistically meaningful alignment.

Gene Discovery Scorecard
In a group of 3-4 students, examine your gene discovery scorecard and then:
• Infer characteristics of genes which were in both A. thaliana and humans.
• Identify characteristics of genes present in humans but not found in plants.

What information so far indicates whether or not plants have animal muscle genes?
What additional information might you need to be certain whether or not plants have animal muscle genes?

Part B: Evaluating homology - How similar are plant and human versions of a gene?

Recipes handed down often change

Which parts of the recipes were conserved (were almost the same) in all generations’ recipes? Which parts were not conserved?

Reasons why a recipe might be changed
• Discuss in groups and report your ideas.

How might you track the passage of a recipe from one generation to the next if you can’t ask the cooks?

How is a gene like a recipe?
• Discuss in groups and report your ideas.

What features of a gene might make it a version of another gene? Record your answers.
https://www.youtube.com/watch?v=gCxrkl2igGY is a song you might remember.

Explore
• What is homology?
• What criteria do scientists use to classify particular genes and their protein products as homologs?

• Homology: a general term describing the relationship between 2 or more genes which share an ancestral gene
• How might recipes be “homologous”?

To use a plant model for my patient’s disease, I need to find a plant homolog to his ACTA1 gene. We found that the Arabidopsis thaliana ACT7 gene is a version, but is it similar enough to be a homolog?

Should we search for homologs using a gene sequence or a protein sequence?

The structure of a eukaryotic gene is complex!
Translation (protein synthesis)
http://nitro.biosci.arizona.edu/courses/EEB600A2003/lectures/lecture24/lecture24.html
The amino acid sequence of the protein is more likely to be conserved than the gene sequence.

A BLASTp using the gene product’s amino acid sequence is likely to find protein homologs.
A BLASTn might find more differences than similarities.

We will use a protein BLAST tool, BLASTp, to find homologous proteins. We first need to find the protein sequence coded by the human ACTA1 gene on the NCBI protein page.
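One reason the amino acid sequence tends to be better conserved than the gene sequence is that several different codons can code for the same amino acid. The short Python sketch below illustrates the idea under stated assumptions: the two DNA strings are made-up examples (not the real ACTA1 coding sequence) that both encode the first six amino acids of the ACTA1 protein, and the function names are our own, not part of any BLAST tool.

```python
from itertools import product

# Standard genetic code, with codons ordered by the bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence into one-letter amino acid codes."""
    usable_length = len(dna) - len(dna) % 3  # ignore an incomplete final codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable_length, 3))


def percent_identity(seq_a, seq_b):
    """Percent of positions that match between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Hypothetical DNA sequences: same protein, different codon choices.
dna_a = "ATGTGTGATGAAGATGAA"
dna_b = "ATGTGCGACGAGGACGAG"

print(translate(dna_a))                                            # MCDEDE
print(translate(dna_b))                                            # MCDEDE
print(round(percent_identity(dna_a, dna_b), 1))                    # 72.2
print(percent_identity(translate(dna_a), translate(dna_b)))        # 100.0
```

The two DNA strings are only about 72% identical, yet their protein products are 100% identical, which is why a protein search can reveal similarity that a nucleotide search partly hides.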
From the ACTA1 protein information page, select FASTA, then copy and paste the amino acid sequence into a Word document. Each amino acid is represented by a particular letter.

>gi|49168518|emb|CAG38754.1| ACTA1 [Homo sapiens]
MCDEDETTALVCDNGSGLVKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQSKRGILTLK
YPIEHGIITNWDDMEKIWHHTFYNELRVAPEEHPTLLTEAPLNPKANREKMTQIMFETFNVPAMYVAIQA
VLSLYASGRTTGIVLDSGDGVTHNVPIYEGYALPHAIMRLDLAGRDLTDYLMKILTERGYSFVTTAEREI
VRDIKEKLCYVALDFENEMATAASSSSLEKSYELPDGQVITIGNERFRCPETLFQPSFIGMESAGIHETT
YNSIMKCDIDIRKDLYANNVMSGGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILAS
LSTFQQMWITKQEYDEAGPSIVHRKCF

Navigate to the BLASTp link on NCBI.

Paste the protein sequence for ACTA1 here. Enter Arabidopsis thaliana for the search database. Select blastp and then click on the BLAST button.

The BLASTp report is similar to the BLASTn report.
Query sequence

“Descriptions” shows 4 actins with the same query coverage, E-value and Ident! There appear to be 4 possible homologous proteins, but which is most similar to the human ACTA1 protein?

There are a number of actin proteins with high Query coverage, very low E-values and high identity. Check them all (for any protein that is listed more than once, check the first listing). Then select “Multiple Alignment” to directly compare those sequences.

Conserved amino acids are shown in red. Which differences can you find quickly? Can you spot a deletion? Where is an amino acid replaced by a chemically similar type? Where is an amino acid replaced by a chemically different type?

Protein sequence homology is analyzed by constructing a Distance tree of results. Check the desired “hits”, then select “Distance tree”.

Query: human ACTA1 protein
Nodes represent a shared ancestral gene.
These proteins are all homologs.

Of the proteins in Arabidopsis thaliana, ACT7 has the highest identity (88%) and lowest E-value (0.0) when compared to human ACTA1.
A gene tree program predicts the presence of ancestral genes between ACT7 and ACTA1. Is that sufficient to confirm protein homology for experimental modeling?

A more restricted alignment between human ACTA1 and the closest 3 Arabidopsis proteins can check that ACT7 is the protein closest to the ancestral gene. Check “Align two or more sequences”, then copy and paste the protein sequences for ACT7, ACT8 and ACT2 into the Subject Sequence box.

Multiple alignment results for human ACTA1 protein and the 3 closest Arabidopsis proteins.

What do the distance tree results indicate?

Do you have enough data to use the Arabidopsis ACT7 gene as a model for the human ACTA1 gene? Discuss and report your ideas.

What criteria from published work indicated that these plant processes and human diseases involved homologous genes or proteins?

Homologous proteins will have:
• Very low E-values for sequence alignment (< 0.00001)
• >25% conserved sequences for >100 aa*
• Protein-protein interactions of one homolog which are similar to protein-protein interactions of the other homolog
• Similar co-expression of genes for each homolog
• Similar Function Gene Ontology (GO terms)
• Conserved sequences and protein domains
* http://jura.wi.mit.edu/bio/education/hsteachers2012/form_blast_intro.pdf

Let’s find homology information and data about the Arabidopsis ACT7 gene in http://www.Araport.org. Use the pull-down menu to access the ThaleMine tool.

Enter information about your gene of interest, in this case, ACT7.

Results show 1 gene, 2 articles and 1 mRNA in the database. We are only interested in studying the gene for now, so we will select the category “Gene” or just select the identifier for the gene from the list at right.

This is the Gene information sheet for the Arabidopsis thaliana ACT7 gene. How did the function listed under Curator Summary compare to your previous prediction?
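The two sequence-based criteria in the “Homologous proteins will have” list (an E-value below 0.00001 and more than 25% conserved sequence over more than 100 amino acids) can be checked with a few lines of Python. This is only a sketch, not part of the Araport or NCBI tools: the function name is our own, and the aligned length of 370 amino acids is an assumed placeholder. The 88% identity and E-value of 0.0 are the ACT7 values from the BLASTp report.

```python
def meets_sequence_criteria(e_value, percent_identity, aligned_length,
                            max_e_value=1e-5, min_identity=25.0, min_length=100):
    """Return True if a BLASTp hit passes both sequence-based homology criteria."""
    return (
        e_value < max_e_value                # very low E-value
        and percent_identity > min_identity  # more than 25% conserved sequence
        and aligned_length > min_length      # over more than 100 amino acids
    )


# ACT7 compared to human ACTA1; aligned_length is an assumed placeholder value.
act7_hit = {"e_value": 0.0, "percent_identity": 88.0, "aligned_length": 370}

print(meets_sequence_criteria(**act7_hit))  # True
```

Passing this check covers only the first two items in the list. The protein-protein interaction, co-expression, GO term and protein domain criteria still have to be examined separately.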
The tabs in the blue bar under Curator Summary take you quickly to the matching section further down the page. Click on the Homology tab.
Links to information about human homologs of ACT7.

Homologous proteins will have:
• Very low E-values for sequence alignment (< 0.00001)
• >25% conserved sequences for >100 aa*
• Protein-protein interactions of one homolog which are similar to protein-protein interactions of the other homolog
• Similar co-expression of genes for each homolog
• Similar Function Gene Ontology (GO terms)
• Conserved protein domains
* http://jura.wi.mit.edu/bio/education/hsteachers2012/form_blast_intro.pdf

Compare the first (human ACTA1) and second (Arabidopsis ACT7) sequences in each alignment. Far more than 25% of the amino acids align in any 100-amino-acid region.

Homologous proteins will have:
• Very low E-values for sequence alignment (< 0.00001)
• >25% conserved sequences for >100 aa*
• Protein-protein interactions of one homolog which are similar to protein-protein interactions of the other homolog
• Similar co-expression of genes for each homolog
• Similar Function Gene Ontology (GO terms)
• Conserved protein domains
* http://jura.wi.mit.edu/bio/education/hsteachers2012/form_blast_intro.pdf

Actin interacts with many proteins
https://www.youtube.com/watch?v=FzcTgrxMzZk

ACT7 and ACTA1 proteins each interact with a variety of other proteins. Because the same protein may have one name in plants and a different name in animals, further investigation is needed to know from this data whether ACTA1 and ACT7 are interacting with identical proteins.
Arabidopsis ACT7 interacts with these proteins
Human ACTA1 interacts with these proteins

Homologous proteins will have:
• Very low E-values for sequence alignment (< 0.00001)
• >25% conserved sequences for >100 aa*
• Protein-protein interactions of one homolog which are similar to protein-protein interactions of the other homolog ??
• Similar co-expression of genes for each homolog
• Similar Function Gene Ontology (GO terms)
• Conserved protein domains
* http://jura.wi.mit.edu/bio/education/hsteachers2012/form_blast_intro.pdf

Co-expression (transcription of 2 or more genes at the same time in the same cell) is required for gene products (proteins) to work together. In the image above, two differently colored fluorescent proteins are co-expressed in Arabidopsis.
http://www.frontiersin.org/files/Articles/96150/fpls-05-00426-HTML/image_m/fpls-05-00426-g001.jpg

What genes are co-expressed (same time, same location) with ACT7 or ACTA1?
Arabidopsis ACT7 is co-expressed with these genes
Human ACTA1 co-expression is shown with purple lines.
Scientists would need to confirm that proteins with different plant and animal names are actually the same protein.

Homologous proteins will have:
• Very low E-values for sequence alignment (< 0.00001)
• >25% conserved sequences for >100 aa*
• Protein-protein interactions of one homolog which are somewhat similar to protein-protein interactions of the other homolog ??
• Some similar co-expression of genes for each homolog ??
• Some similar Function Gene Ontology (GO terms)
• Conserved protein domains
* http://jura.wi.mit.edu/bio/education/hsteachers2012/form_blast_intro.pdf

Gene Ontology provides information about biological process, molecular function and cellular location. Are any ACT7 GO terms similar to human ACTA1 GO terms?
Arabidopsis ACT7
Human ACTA1

Homologous proteins will have:
• Very low E-values for sequence alignment (< 0.00001)
• >25% conserved sequences for >100 aa*
• Protein-protein interactions of one homolog which are somewhat similar to protein-protein interactions of the other homolog ??
• Some similar co-expression of genes for each homolog ??
• Some similar Function Gene Ontology (GO terms)
• Conserved protein domains
* http://jura.wi.mit.edu/bio/education/hsteachers2012/form_blast_intro.pdf

Homologous proteins will have:
• Very low E-values for sequence alignment (< 0.00001)
• >25% conserved sequences for >100 aa*
• Protein-protein interactions of one homolog which are somewhat similar to protein-protein interactions of the other homolog ??
• Some similar co-expression of genes for each homolog ??
• Some similar Function Gene Ontology (GO terms)
• Conserved protein domains
* http://jura.wi.mit.edu/bio/education/hsteachers2012/form_blast_intro.pdf

Members of the Arabidopsis actin family of genes are homologous with each other. Does that mean that the Arabidopsis actins are homologous with human ACTA1?

Arabidopsis actin gene ACT7 plays an essential role in germination and root growth
The Plant Journal, Volume 33, Issue 2, pages 319-328, 16 JAN 2003. DOI: 10.1046/j.1365-313X.2003.01626.x
http://onlinelibrary.wiley.com/doi/10.1046/j.1365-313X.2003.01626.x/full#f2
Figure panels: Wild-type, no ACT7 mutation; Mutant ACT7+
We have an ACT7 mutant with an observable phenotype difference compared to the normal wild type.

Have we found a suitable plant research model for nemaline myopathy? What additional information would you want?
Scientific literature searches for Arabidopsis information are easy to access in http://www.Araport.org apps.
50 years of Arabidopsis research!