Functional Genomics, Final 2009

1. In 2008, the Venter team successfully assembled a complete synthetic Mycoplasma genitalium genome. However, they stopped short of transplanting the synthetic genome into a donor cytoplasm to spark the flame of synthetic life.
a) Given that existing technology only allows efficient synthesis of DNA fragments of roughly 5-7 kb, how is it possible to synthesize an entire genome from scratch?
b) Compare and contrast the genomes of 1) wild type M. genitalium, 2) M. genitalium JCVI-1.0, and 3) the theoretical minimal M. genitalium-derived genome.
c) Could a synthetic Mycoplasma genome be “sparked” to life using an E. coli donor cytoplasm? Why or why not? According to Venter, how might this have been an advantage in his research?

2. Assuming RNAi can be delivered successfully into a cell of interest, what is one potential problem with using RNAi to determine the function of a novel gene? How can this potential problem be used to the researchers’ advantage?

3. RNA interference is an excellent technique for targeted genetic studies, especially in organisms that do not readily undergo homologous recombination with exogenous DNA. It also has other advantages, as well as some disadvantages that impede its use as a research tool and as a potential therapy. What are some of these advantages and disadvantages?

4. How can microarrays be used to do comparative analysis of transcripts? Why would you want to do such an experiment?

5. What is the basic RNA-seq methodology, and how does it improve on microarray technology for measuring transcriptome expression levels?

6. In “The complete genome of an individual by massively parallel DNA sequencing,” why did the experimenters find it surprising that Watson was homozygous for a four-base-pair deletion in SGEF? What was the researchers’ explanation for this, and what did the 4-bp deletion ultimately suggest about Watson’s genome and about the use of reference genomes in general?

7. Please explain the molecular technique the researchers in “The complete genome of an individual by massively parallel DNA sequencing” used to validate the SNPs found by 454 sequencing and alignment. What did the technique reveal about accuracy, and how did they explain the low accuracy? (Hints: Table 2 and Fig. 1b.)

8. The research lab you work in has recently discovered an mRNA transcript and the corresponding protein (PrNew) in M. genitalium. After a comparison to the predicted gene sequences generated by OTTO, you find that the new transcript/protein corresponds to a hypothetical gene/protein.
A) What is a hypothetical gene, and how would a program like OTTO predict one?
B) A bioinformatics study based on gene sequence, protein sequence and protein structure yielded high similarity scores to lactate dehydrogenase. You think that lactate dehydrogenase and PrNew may have complementary/compensatory functions. How could you test whether PrNew and lactate dehydrogenase have similar functions in a cell (i.e., whether they interact with similar proteins, undergo similar reactions, etc.)?
C) During a gene essentiality study, both are deemed unessential as single knockouts. Can these data be extrapolated to consider them unessential overall, under any circumstance? Detail an experiment to back up your statement.

9. After graduating from Western, you enter the job market and are hired by a biotechnology firm that has just received a hefty Gates Foundation grant to develop new treatment strategies for influenza. The hope is to be better prepared for the anticipated pandemic when avian influenza subtype H5N1, commonly referred to as “bird flu,” mutates and becomes more transmissible to and lethal in humans.
The H5N1 genome has been sequenced, along with the genomes of less pathogenic influenza strains that commonly infect humans. Your lab has infected tissue samples available. Your research group leader believes that determining the H5N1 protein interaction network and comparing it to other influenza interactomes may help in developing a treatment strategy. How would you go about doing this? After the interaction networks have been determined, how would you go about analyzing and comparing them?

10. You are working with a microorganism and have found a previously undiscovered gene. Curious about its function, you set out to study it. What method can you use to study this gene, given that you don’t know whether it is essential and that this microorganism is known for its high rate of nonhomologous recombination?

11. In the study on the NEXTGEN sequencing of Watson’s genome, a reference genome was used to find SNPs, CNVs and indels. Why was a reference genome necessary? Was the reference genome sufficient? Why or why not?

12. In terms of pharmacogenomics, how will the Human Genome Project help a subset of the population? Which particular subset of the population will benefit from the Human Genome Project data, and why?

13. In “ . . . Cloning of a Mycoplasma genitalium Genome,” assembly of the genome was carried out in two stages: first, in vitro assembly of synthetic cassettes, and second, in vivo assembly by recombination in yeast. These two methods employed different cloning vectors. What initially prompted the use of two different vectors? Please discuss the process of assembly in both cases, noting similarities as well as differences between the two vectors used. Also include the following terms in your discussion:
- NotI restriction sites
- T4 polymerase
- 3’ exonuclease
- Recombination ‘hooks’

14. Explain three ethical issues addressed in the paper “The complete genome of an individual by massively parallel DNA sequencing.” Explain how each of these issues was addressed for Watson’s genome sequence and how this is relevant to sequencing the genomes of the general public.

15. Based on what was presented on transcriptomics by Todd, Mark, and Tom: suppose you were given a microarray correlated with a mutant phenotype (meaning expression levels were measured in a +phenotype and a -phenotype sample, and both were placed on a microarray to measure relative expression levels), and the experiment was done over time. First, how would you try to extract meaningful data from the array? Second, what are the problems with doing this (include problems common to any microarray study), and how would you try to solve them? Third, how would you test or study this information about expression levels to establish cause and effect?

16. In the paper “Gene expression profiling predicts clinical outcome of breast cancer,” the expression profiles of genes associated with cancerous tissue are used to predict disease outcome. Approximately 25,000 human genes are screened using microarray technology. Compare and contrast the benefits of microarrays and RNA sequencing for researching the transcriptome of cancer tissue.

17. Explain what a candidate gene is and how candidate genes can be identified using at least two techniques we have learned in class. Why do candidate genes have such a high failure rate when it comes to discovering adverse drug reactions?

18. Using examples from the presentation, explain why it is important that pharmaceutical companies expand their drug trials to include a wider range of racial backgrounds. How will advances in sequencing the human genome aid these trials, and why might pharmaceutical companies be disinclined to expand their studies?

19. You are working with the CDC on a virus with a very broad host range, including most eukaryotic cells.
You’ve identified a gene in the virus that, when altered, attenuates virulence. Your hypothesis is that the viral product of this gene interacts with host protein(s). In fact, preliminary data show that there is one interacting protein in your host model organism. You perform a bioinformatics analysis and learn that it belongs to a large gene family of unknown function. Further, the gene family, though ubiquitous, shares <20% amino acid similarity. You need to write a grant to get funding.
a. Convince the grant panel that you have a Plan A to find host interacting proteins, with confirmation, and a Plan B.
b. For the second part of the grant, you ask for money to sequence many, many viral genomes that have known hosts (the gene sequences of the hosts are already known). Why? What will you do with the viral sequences? Hint: Why might this particular host protein family have low amino acid similarity?

20. What are the barriers to and benefits of pharmacogenomics, and how are companies working to make personalized healthcare a reality for all (or at least for those who can afford it)?

21. In the network article, the researchers tested and eventually predicted meaningful protein networks by discovering protein-protein interactions within Kaposi’s sarcoma-associated herpesvirus (KSHV) and varicella zoster virus (VZV), and their interactions with the human protein network. What benefit(s) could result from making this network?

22. The authors discussed three types of genetic variation present in Watson’s genome. Choose two of these types of variation for discussion. Touch on the methods of identification and validation. Also include a summary of the analysis, its relevance, and possible shortcomings and/or future possibilities for each type of variation.