Name________________________________Class and Period____________________Date________
Partner(s) name(s)___________________________________________________________________

Introduction to Gene Mining Part A: BLASTn-off!

Part A Learning objectives:
Use the NCBI Gene and BLASTn bioinformatics tools to search for a human gene of interest in a plant model.
Evaluate the significance of your search results to see if human and plant genes are similar.

Engage:

Information and Instructions

Recall models that you have studied in school.

To be a useful model for experimentation, a plant should have certain characteristics. Watch the video and record those characteristics.
https://www.youtube.com/watch?v=foHiKrlY9Qc

What underlying principle of biology enables scientists to study some human diseases in plants?

Our sample question: Do plants have human muscle genes? Compare and contrast human and plant movement. Watch:
http://www.bbc.co.uk/programmes/p00lx6cl
and
https://www.youtube.com/watch?v=eDA8rmUP5ZM

http://www.OMIM.org is an interesting portal about human genetic disorders. Open it and examine “About” and “Entry Statistics” to see what sorts of genetic information OMIM provides. In the OMIM search box, type in any human genetic disease.

Lab Notebook

1. What is the first thing that comes to mind when you think of the word “model”? Answer this before seeing slide 3.
2. List 3 examples of scientific models: Slide 4
3. Why do scientists use models? Slide 5
4. Characteristics of the model plant Arabidopsis thaliana are: Slides 7, 8
5. Why might scientists use plants to study human diseases? Slides 9, 10
6. Compare and contrast human and plant movement. Slide 11
   Human movement:
   Plant movement:
7. How is it possible for a plant to have a version of a human gene?______________________________________________
8. Summarize the types of info you found on OMIM. Slide 12
9. List types of genetic disorders from Entry Statistics. Slide 12
10. What disease name did you enter? What did you find? Slide 13

Information and Instructions

What do I know about muscle genes and proteins involved in movement? Use Wikipedia, Google and other open access sources to find information. Then try a more specific source, such as a science journal, using more specialized search engines and portals.

Scientists have performed numerous experiments to understand which characteristics are shared by all living organisms and which are unique to one domain or kingdom. Historically, these experiments were lengthy, but the analysis was fairly straightforward. Nowadays, scientists can rapidly perform experiments that generate enormous amounts of data, but the data analysis may take weeks or even years. To improve analysis of their data, they may make it accessible in public databases for other scientists and mathematicians to analyze.

Since plants and animals both move, do they use the same types of proteins to move? Do they have the same genes coding for these proteins?

Lab Notebook

11. What are the names of muscle proteins? How many hits did you find? How specific was the information? Slides 14 and 15
12. What is an example of a data set that has increased in size over the past decade? Slide 17 ___________________________
13. What problems do scientists have with “Big Data”? Slide 17
14. Define bioinformatics: Slide 18
15. What advantage would a bioinformatics approach give a geneticist studying a disease? Slide 19
16. A bioinformatics use or question that interests me (Slide 20) is __________________________________________ because ___________________________________________
17. Make your own hypothesis about whether plants and humans have the same skeletal muscle protein genes and then explain your reasoning: (No slide is needed for this question.)
    Plants and humans will ____ will not ____ have the same muscle protein genes because I know that:

Explore the bioinformatics tool BLASTn to find data to test your hypothesis:

Information and Instructions

Which genes or proteins are involved in the biological process (muscle movement)? Go to http://www.ncbi.nlm.nih.gov and use the pull-down menu near the top to select Gene. Then enter: homo sapiens skeletal muscle protein. Click on SEARCH.

Lab Notebook

1. What information does the result provide? Slide 24
2. Record the Name/Gene ID and Description for the top 3 results: Slide 24

   Name/Gene ID | Description
   ____________ | ____________
   ____________ | ____________
   ____________ | ____________

Information and Instructions

Click on the gene name ACTA1 (actin alpha 1) to open its gene page. Use the Summary and the links below to decide whether alpha actin 1 has functions that would be shared by a plant.
https://www.youtube.com/watch?v=FzcTgrxMzZk
https://www.youtube.com/watch?v=VVgXDW_8O4U

Scroll down the gene page until you see Genomic regions, transcripts and products. Find FASTA and click on it. Then click on GenBank. In FASTA, copy the entire ACTA1 gene sequence and paste it into a Word document. Include the top line that begins “>gi……”. Title the document: Gene sequences.

Lab Notebook

3. Info I could use to decide whether a plant version of ACTA1 would have a function similar to the human gene's: Slides 25, 26
4. What does FASTA show? How is FASTA format different from GenBank format? Slides 27, 28
5. Describe the acronym, types, and features of BLASTn. Slides 30, 31

Information and Instructions

One bioinformatics tool used to search for genes in one organism when a gene is known in another organism is BLASTn.
Go to: http://blast.ncbi.nlm.nih.gov/Blast.cgi to learn more about BLASTn.

BLASTn the ACTA1 gene vs. the Arabidopsis thaliana genome.

Lab Notebook

6. Summarize the steps you used with BLASTn to submit a query for a human version of ACTA1 in Arabidopsis thaliana, and give the reason for each step. Slides 33-36

   Step | Reason for doing that step
   1    | ____________
   2    | ____________
   3    | ____________
   4    | ____________

7. A BLASTn report shows the Graphics section below. Explain the significance of the colored blocks (black, blue, green, pink, and red) and how they relate to the tracks. Slide 37
   What does the solid red bar represent? ____________________________________________
   What do the numbers mean?
8. What information is in the Descriptions section? Slide 38
9. On the Alignment at lower left, label what Q, S, 1, 45, 403 and 447 indicate. Tell what the vertical lines indicate and what the spaces indicate. Slide 40

Information and Instructions

When you get an alignment result, should you trust it? How could you decide whether the result is a true version of the gene or just a lucky find?

For additional information, watch the NCBI webinar on BLAST at https://www.youtube.com/watch?v=mvjHYMgJDTQ (through about 8:30 into the video) and answer questions 11-12.

Lab Notebook

10. What is an alignment score? How is it calculated for matching 2 sequences that have gaps (insertions or deletions)? Slide 41
11. What is Query cover? What are its units? Slides 43-45
12. What is the meaning of “Ident”? What are its units? Slide 45
13. A portion of an Alignment section is shown at left. Label the human query sequence, the aligned subject sequence, a matching pair of nucleotides and a pair of nucleotides which does not match. Slide 46
14. What does an Expect (E) value tell you about your alignment result? What value indicates an acceptable non-random alignment? Slides 47-48

Information and Instructions

Use your BLASTn search results to answer questions 16-20.

In the Description section, look at the Query cover for the Actin 7 gene.

Under Alignments, click on the Sequence ID to find more about the aligned Arabidopsis thaliana sequence.

Lab Notebook

15. Which Arabidopsis thaliana GENE is most similar to human ACTA1? Slide 49 _____________________________________________
16. Record the E-value. What does the value indicate about the alignment of this gene to ACTA1? Slide 49
17. The Query cover for the Arabidopsis thaliana ACT7 gene is ________%. Explain which part of the Graphics section illustrates that coverage. Slide 45
18. Are there sections of the alignment that have more consistent matches than others?_________
19. When you compare the alignment with the graphics, what do you notice? Slides 45 and 48
    Explain:
20. From the Graphics section, propose what the areas of poorest alignment (shown as thin black lines) might be in the gene structure:_______________________ Slide 45
21. Use the ACT7 gene information page to record similar functions of Arabidopsis thaliana ACT7 and Homo sapiens ACTA1. Slides 50, 51

Before conducting the BLASTn search, you hypothesized about whether plants might have a muscle gene similar to a human muscle gene.

1. What did the data indicate about your hypothesis?
2. Which data did you use?
3. What additional information would help you determine whether the two genes are similar enough for the plant gene to be used as a research model for a particular disease?
4. Why would it be significant for your study of nemaline myopathy for the two genes to be very similar in sequence and in function?
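Optional: the statistics asked about in questions 10-12 (alignment score, Query cover, Ident) can be worked out by hand or with a few lines of code. The Python sketch below is only an illustration, not NCBI's actual implementation. The function name `alignment_stats`, the short example sequences, and the scoring values (match, mismatch, gap open, gap extend) are all assumptions made for this example. A real BLASTn search uses its own scoring settings and combines every aligned segment of a hit when it reports Query cover.

```python
def alignment_stats(query_aln, subject_aln, query_length,
                    match=1, mismatch=-2, gap_open=-5, gap_extend=-2):
    """Summarize one pairwise alignment given as two equal-length strings.

    A '-' marks a gap (an insertion or deletion). The scoring values are
    illustrative assumptions, not the settings of any particular BLASTn run.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned sequences must have the same length")
    if not query_aln or query_length <= 0:
        raise ValueError("alignment and query length must be non-empty")

    matches = mismatches = gap_columns = 0
    score = 0
    open_gap_in = None  # which sequence currently has an open gap, if any

    for q, s in zip(query_aln, subject_aln):
        if q == "-" or s == "-":
            gap_side = "query" if q == "-" else "subject"
            gap_columns += 1
            if gap_side != open_gap_in:
                score += gap_open      # charged once when a new gap starts
            score += gap_extend        # charged for every gap position
            open_gap_in = gap_side
        else:
            open_gap_in = None
            if q.upper() == s.upper():
                matches += 1
                score += match
            else:
                mismatches += 1
                score += mismatch

    # Query bases that take part in this alignment (gaps do not count).
    aligned_query_bases = sum(1 for q in query_aln if q != "-")

    return {
        "matches": matches,
        "mismatches": mismatches,
        "gap_columns": gap_columns,
        # Ident: matching columns as a percentage of all alignment columns.
        "percent_identity": 100 * matches / len(query_aln),
        # Query cover: percentage of the whole query inside the alignment.
        "query_cover": 100 * aligned_query_bases / query_length,
        "raw_score": score,
    }


if __name__ == "__main__":
    # Made-up 11-column alignment taken from a made-up 20-base query.
    stats = alignment_stats("ATGCGT-ACGT", "ATGAGTTAC-T", query_length=20)
    for name, value in stats.items():
        print(name, value)
```

Try changing the gap values and the example sequences to see how a single insertion or deletion changes the score compared with a single mismatch.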
Extend:

1. Pick one human gene which you think is highly conserved between plants and animals.
2. Follow the procedure you just learned to see if a similar Arabidopsis version exists.
3. Record your info on the scorecard below.
4. Repeat for a gene that you predict is unique to humans.
5. Do this until you have searched for 3 genes you think are conserved between humans and plants and for 3 you think are unique to humans. Keep score!

Gene Prediction Scorecard

Human Gene (name, Gene ID, Gene Function) | Arabidopsis Gene (name, Gene ID, Gene Function) | Predict: Will I find a plant version? Explain prediction: | Evidence about whether there is or is not a plant version of the gene (E values, function, alignments, etc.) | Correct prediction? (+1 for correct prediction)
____________ | ____________ | Yes | ____________ | ____
____________ | ____________ | Yes | ____________ | ____
____________ | ____________ | Yes | ____________ | ____
____________ | ____________ | No | ____________ | ____
____________ | ____________ | No | ____________ | ____
____________ | ____________ | No | ____________ | ____