Bioinformatics project: the “genehunt”

Bio/CS 251 Bioinformatics final project
Fall 2004
Drs. James and Leinbach
The genehunt
Scientific objective:
You will use bioinformatic approaches to identify, map, and analyze the genes contained in a poorly
characterized chunk of the genome of a dangerous pathogenic fungus, Coccidioides immitis. C.
immitis is a dimorphic fungus that can exist in either filamentous or yeastlike form. This means it can
form cells in long, branching chains (filamentous) or can switch its lifestyle to grow as single, yeastlike
cells. The yeastlike form is the one it takes when it establishes infections in the lungs or other tissues, where it
grows and spreads, causing great damage. More information on C. immitis and photographs of the
organism can be found here:
http://botit.botany.wisc.edu/toms_fungi/jan2002.html
http://www.doctorfungus.org/thefungi/coccidioides.htm
You will be issued a 50,000 bp (50 kb) segment of the recently sequenced genome of C. immitis. This
genome sequencing effort was performed by the Broad Institute at Massachusetts Institute of
Technology (MIT), as part of the Fungal Genome Initiative (FGI):
http://www.broad.mit.edu/annotation/fungi/fgi/
The C. immitis genome sequence is so new that it has not yet been annotated, meaning no one has
systematically identified and mapped all of its genes. Therefore, you will in a very real sense be the
first person to study the particular sequence assigned to you, and in all likelihood the first to
discover the genes in this stretch of “virgin” DNA.
Project format:
Your portal of entry into the C. immitis genome is the Coccidioides immitis Database at
http://www.broad.mit.edu/annotation/fungi/coccidioides_immitis/.
This site is designed in much the same way as the Aspergillus nidulans website that you worked with in
Laboratory 9 (http://www.broad.mit.edu/annotation/fungi/aspergillus/). Refer to this lab exercise for
general information about navigating the C. immitis website, and for discovering genes, etc.
Project objectives:
A. Locate and identify all of the bona fide genes in the 50 kb stretch of C. immitis genomic DNA.
Produce a scale map showing the following information:
1. Gene location and gene direction.
2. Number of exons composing each gene.
3. Identity, or probable identity, of each gene. If the sequence is novel, list it as a novel
hypothetical protein.
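One way to draft the scale map before drawing the final figure is a plain-text track, one line per gene. The sketch below is only an illustration: the `Gene` record, the `draw_map` function, and the example gene names and coordinates are hypothetical and are not taken from the C. immitis database.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str    # hypothetical label, e.g. "geneA"
    start: int   # 1-based start coordinate within the segment
    end: int     # 1-based end coordinate, inclusive
    strand: str  # "+" or "-"
    exons: int   # number of exons


def draw_map(genes, segment_length=50_000, width=50):
    """Return one text line per gene, drawn to scale across `width` columns."""
    bp_per_col = segment_length / width
    lines = []
    for g in sorted(genes, key=lambda gene: gene.start):
        first = int((g.start - 1) / bp_per_col)
        last = max(first, int((g.end - 1) / bp_per_col))
        body = [" "] * width
        for col in range(first, last + 1):
            body[col] = "="
        # Mark the direction of the gene with an arrowhead at its 3' end.
        if g.strand == "+":
            body[last] = ">"
        else:
            body[first] = "<"
        track = "".join(body)
        lines.append(
            f"|{track}| {g.name} {g.start}-{g.end} ({g.strand}), {g.exons} exon(s)"
        )
    return lines


if __name__ == "__main__":
    # Made-up example genes, for illustration only.
    example = [Gene("geneA", 1001, 5000, "+", 3), Gene("geneB", 20001, 30000, "-", 1)]
    print("\n".join(draw_map(example)))
```

With the default settings each column represents 1 kb, so the printed tracks line up with one another and can be transferred directly to a drawn map.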
B. For each gene,
1. Present the predicted amino acid sequence. If the gene is distributed into multiple exons, edit
the files so that they are merged to form a single, full-length protein sequence.
2. Present an alignment of the edited full-length protein sequence with (1) its closest relative (ortholog) in
another species, and (2) an ortholog that has a clear identity and function ascribed to it [(1) and
(2) may be the same alignment, or in the case where the best alignment is to a protein of
unknown function, show a second alignment to a protein of known function].
Make sure that the alignments include the E-value, % identity, and % similarity.
3. Identify any conserved domains, and briefly explain the nature and likely function of each
conserved domain.
4. List the name or names of the protein encoded by the gene, and a brief 2-3 line description
of the protein’s function.
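The exon-merging step in B.1 can also be checked at the nucleotide level: join the exon DNA sequences in order, then translate the spliced coding sequence. The following is a minimal sketch, not the procedure used by the database. It assumes the standard genetic code, that exons are supplied in genomic (plus-strand) order, and that the first exon begins in frame at the start codon; `merge_and_translate` is a hypothetical helper name.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def merge_and_translate(exons, strand="+"):
    """Join exon DNA sequences and translate up to the first stop codon.

    `exons` are given in plus-strand genomic order. For a minus-strand
    gene the joined sequence is reverse-complemented before translation.
    """
    cds = "".join(exons).upper()
    if strand == "-":
        cds = cds.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # "X" marks an ambiguous codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # Toy example: a codon split across two exons (GC | T) is rejoined.
    print(merge_and_translate(["ATGGC", "TTGGTAA"]))
```

Joining at the DNA level handles codons that are split across an exon boundary, which simple concatenation of per-exon peptide fragments can get wrong.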
C. Blastp the C. immitis genome with each gene that you discovered:
1. Determine whether each gene is unique, or whether it is paralogous with other members of a
gene family.
2. Provide the output from each of the paralogy searches, including alignments.
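The alignments requested in B.2 and C.2 are reported with % identity and % similarity. As a rough check on those numbers, both can be recomputed from a pair of aligned sequences. This sketch divides by the full alignment length, gaps included. The similarity grouping in `SIMILAR_GROUPS` is a simplified, hypothetical stand-in; BLAST itself counts "positives" from its scoring matrix, so the similarity value here will not necessarily match BLAST output.

```python
# Hypothetical, simplified groups of chemically similar residues.
# BLAST uses a substitution matrix instead, so treat this as approximate.
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "ST", "NQ"]


def percent_identity_similarity(aligned_a, aligned_b):
    """Return (% identity, % similarity) for two gapped, equal-length strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    length = len(aligned_a)
    if length == 0:
        raise ValueError("alignment is empty")
    identical = 0
    similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns count toward the length only
        if a == b:
            identical += 1
            similar += 1
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    return 100 * identical / length, 100 * similar / length


if __name__ == "__main__":
    identity, similarity = percent_identity_similarity("MKV-LD", "MRVALE")
    print(f"identity {identity:.1f}%, similarity {similarity:.1f}%")
```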
D. Choose one gene family identified in C. above and use it to create a phylogram.
What is the minimum number of gene duplication events leading to the chosen family of genes in
C. immitis?
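The reasoning behind the minimum count can be sketched as follows, assuming that all paralogs in the family descend from a single ancestral gene and that the phylogram is strictly bifurcating: every branch point among the paralogs is one duplication, so a family of n genes requires at least n - 1 duplications. It is a minimum because copies that were duplicated and later lost leave no trace in the tree. The nested-tuple tree format and the gene names are hypothetical.

```python
def count_leaves(tree):
    """Count the genes (leaves) in a nested-tuple tree; leaves are strings."""
    if isinstance(tree, str):
        return 1
    return sum(count_leaves(child) for child in tree)


def count_duplications(tree):
    """Count branch points in a strictly bifurcating tree of paralogs."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    return 1 + count_duplications(left) + count_duplications(right)


if __name__ == "__main__":
    # Made-up family of five paralogs from a single genome.
    family = (("geneA", "geneB"), ("geneC", ("geneD", "geneE")))
    print(count_leaves(family), "genes,", count_duplications(family), "duplications")
```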
E. Using the same gene as in D. above, BLAST GenBank to find orthologous genes from other species.
Determine the approximate time of origin of your chosen gene by establishing the range of organisms
in which it can be found. In other words, is this a gene that appears to be fungal-specific? Present only
in fungi and animals? Universal in eukaryotes but absent from prokaryotes? Or universal to all life?
Depending upon the level of conservation and degree of functional constraint, it may be necessary to
perform iterative blastp searches. In other words, you may need to use the C. immitis gene to
obtain the orthologous worm protein, but then use the worm protein to find the human protein, etc.
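Once the searches are done, the range-of-organisms question above reduces to bookkeeping: record which major groups yielded a convincing ortholog, then see which of the four scenarios fits. The sketch below shows one way to organize that record. The group names, the `classify_range` function, and the wording of the returned labels are all hypothetical, and deciding what counts as a convincing ortholog is still up to you.

```python
PROKARYOTES = {"bacteria", "archaea"}
EUKARYOTES = {"fungi", "animals", "plants", "protists"}


def classify_range(groups_with_orthologs):
    """Map the set of groups containing an ortholog to a distribution label.

    The input should include "fungi", since the query gene is fungal.
    """
    groups = {g.lower() for g in groups_with_orthologs}
    unknown = groups - PROKARYOTES - EUKARYOTES
    if unknown:
        raise ValueError(f"unrecognized groups: {sorted(unknown)}")
    if groups & PROKARYOTES:
        return "found in prokaryotes as well as eukaryotes"
    if groups == {"fungi"}:
        return "fungal-specific"
    if groups == {"fungi", "animals"}:
        return "fungi and animals only"
    if groups == EUKARYOTES:
        return "universal in eukaryotes, absent from prokaryotes"
    return "patchy distribution within eukaryotes"


if __name__ == "__main__":
    print(classify_range({"fungi", "animals"}))
```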
1. Choose representative orthologs spanning the widest range of organisms possible, and present
the alignment of the C. immitis gene with each ortholog.
2. Create a phylogram which includes all of the orthologs.
3. Create a phylogram which includes all of the orthologs + all of the paralogs identified in D. above.
Provide written analysis of this phylogram. For example, is the chosen C. immitis gene more
closely related to its orthologs in other species, or to its paralogs? What does the answer to this
question indicate about the evolution of this gene and gene family?
F. Choose one conserved gene that has an ortholog in the budding yeast, Saccharomyces cerevisiae,
and perform in silico microarray analysis of the yeast ortholog, using the microarrays available at the
Saccharomyces Genome Database: http://www.yeastgenome.org/, as follows:
1. Search for the S. cerevisiae orthologs using the “Search SGD” box at
http://seq.yeastgenome.org/cgi-bin/nph-blast2sgd, and then use the ‘GO annotation’ +
‘Function Junction’ to gather information about its function in budding yeast. This
information will be located on the SGD BASIC INFORMATION page for your chosen yeast
ortholog.
2. Genomic and proteomic analysis of one gene in Saccharomyces cerevisiae.
Your entry point for each of these questions will be the SGD
BASIC INFORMATION page for your chosen gene.
3. Is this gene essential in budding yeast? What is the phenotype of
a null allele?
4. Expression analysis (DNA microarray analysis): in yeast, DNA microarray analysis is like
performing 5000+ Northern blots simultaneously, on a surface no larger than a microscope
slide. This allows one to assay the expression of 5000+ different genes in a single
experiment.
Use the SGD to analyze the expression of your chosen gene. Use the
‘Expression Connection’ to perform these analyses. Include in your
answers the links to each microarray experiment.
How does the expression of the gene vary under the following conditions
that have been assayed for all or nearly all budding yeast genes?
-- expression in response to alpha factor? (treatment of cells with
alpha factor synchronizes them at G1 phase of the cell cycle).
-- expression in response to agents that damage DNA.
-- expression during diauxic shift (what is a diauxic shift?)
-- expression in response to environmental changes
-- expression during the cell cycle
-- expression during sporulation (= meiosis)
5. Protein-protein interaction: does your protein interact with any other proteins in S.
cerevisiae?
-- use the ‘Interaction Database’ on the BASIC INFORMATION page.
For each protein-protein interaction, list the method used to detect
the interaction (two-hybrid, affinity chromatography, synthetic
lethality, etc.)
-- use the ‘Two-Hybrid (Portal Path Calling)’ function. Click to
the ‘Yeast Interaction Search Page’, then click on the gene name
link under “__ entries found for that keyword”.
-- print out the interaction maps, and incorporate them into your final
project report.
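For the expression analysis in F.4, it may help to recall how a microarray measurement is usually summarized. The sketch below assumes a two-channel experiment in which each gene has an experimental and a reference signal, and expresses the change as a log2 ratio. The function names, the example intensities, and the two-fold cutoff used to call a gene induced or repressed are hypothetical choices for illustration, not values defined by SGD.

```python
import math


def log2_ratio(experimental, reference):
    """Return log2(experimental / reference) for two positive intensities."""
    if experimental <= 0 or reference <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(experimental / reference)


def describe(ratio_log2, threshold=1.0):
    """Label a log2 ratio; the default threshold of 1.0 means a two-fold change."""
    if ratio_log2 >= threshold:
        return "induced"
    if ratio_log2 <= -threshold:
        return "repressed"
    return "unchanged"


if __name__ == "__main__":
    # Made-up intensities for one gene under one condition.
    ratio = log2_ratio(400, 100)
    print(ratio, describe(ratio))
```

On this scale a value of 0 means no change, and equal-sized increases and decreases are symmetric around it, which makes the conditions listed in F.4 easier to compare.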