
Genes and Genomes
Bioinformatics Practical
Introduction
The aim of this practical is to give you some experience of handling and interpreting
genomic and experimental data. Hopefully it will increase your understanding of
genome organization and differences and similarities between genomes from different
organisms.
The Streptomyces bacteria are best known for their ability to produce a huge range of
antibiotics and other bioactive compounds. They are members of the Actinobacteria
family of bacteria (actinomycetes). This large family includes several human
pathogens such as Mycobacterium tuberculosis. Discovering the function of genes
and their association with other genes in one actinomycete can help us predict
functions in another actinomycete. The aim of this practical is to give you experience
of using genomic and experimental data to create hypotheses.
We are interested in regulatory mechanisms that allow Streptomyces to survive harsh
treatment, including oxidative stress. Oxidative stress is faced by all organisms that
grow aerobically. SigR is an RNA polymerase sigma factor whose activity is switched
on during oxidative stress, allowing it to switch on the SigR regulon. The SigR
regulon is switched on when unwanted disulphide bonds form between protein
cysteines in the cell. This form of oxidative stress can be induced using a
thiol-specific oxidant called DIAMIDE.
Microarray data indicate that more than 50 genes are induced by the addition of
diamide to Streptomyces cultures. They are all therefore members of the “diamide
stimulon” but are NOT all regulated by SigR. The following 4 genes are members of
this diamide stimulon. All genes in the S. coelicolor genome are given a gene number,
numbered sequentially from one end of the chromosome to the other.
SCO number    Predicted gene function
SCO3889       Disulphide reductase
SCO3669       Protein chaperone
SCO4956       Methionine sulphoxide reductase
SCO1421       RNA polymerase binding protein
Your aim is to answer the following questions.
1. How are the four genes organised in S. coelicolor – are they members of an
operon (the mRNA is polycistronic) or are they likely to be transcribed alone
(the mRNA is monocistronic)?
2. Using a bioinformatics approach, which genes would you predict to be
members of the SigR regulon? How would you use microarrays to try to
PROVE that these genes are true targets of SigR?
3. For each gene, what is the closest-related homologous gene in E. coli? Does E.
coli contain any paralogous version of this gene?
4. Is the organisation of genes conserved in the related actinomycete
Mycobacterium tuberculosis H37Rv (lab strain)?
5. Can you find evidence that the genes are associated with other S. coelicolor
genes, suggesting that they might be involved in the same physiological
process and/or that the protein products might interact?
6. Short essay question: Genome sequencing projects reveal many genes with
no known function. How can microarrays help us to understand the biological
roles of such genes? (500 words)
Detailed Procedure
Q1. How are the four genes organised in S. coelicolor – are they members of an operon
(the mRNA is polycistronic) or are they likely to be transcribed alone (the mRNA is
monocistronic)?
• Visit the ScoDB website. This site allows you to access genes using their
gene number (SCO number) and visualise their organisation in the genome.
• Type the SCO number into the search gene box, search, then click on the
returned gene. You will see the gene in its local context on the genome. Look
to see if the gene is closely linked to another gene, upstream or downstream.
• Fill in Table 1 in the Practical Report. List the genes starting with the first
gene in the operon, through to the last gene in the operon. For this practical,
assume that two genes are members of a single operon if the distance between
them is less than 50 bp and they are expressed in the same orientation.
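The operon rule above can be written down as a short check. The following is only a minimal sketch in Python: the `Gene` record, the function names and all coordinates are hypothetical illustrations, not ScoDB output, and the gap is counted as the number of bases between the end of one gene and the start of the next.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    name: str
    start: int   # leftmost coordinate on the chromosome (bp)
    end: int     # rightmost coordinate on the chromosome (bp)
    strand: str  # "+" or "-"

MAX_GAP_BP = 50  # threshold stated in the practical

def same_operon(a, b, max_gap=MAX_GAP_BP):
    """Apply the practical's rule to two neighbouring genes."""
    left, right = sorted((a, b), key=lambda g: g.start)
    gap = right.start - left.end - 1  # negative if the genes overlap
    return left.strand == right.strand and gap < max_gap

def group_operons(genes):
    """Group genes into predicted operons, listed in transcription order."""
    operons = []
    for gene in sorted(genes, key=lambda g: g.start):
        if operons and same_operon(operons[-1][-1], gene):
            operons[-1].append(gene)
        else:
            operons.append([gene])
    # On the minus strand the first transcribed gene is the rightmost one.
    return [op if op[0].strand == "+" else op[::-1] for op in operons]

# Made-up coordinates, for illustration only.
example_genes = [
    Gene("geneA", 100, 400, "+"),
    Gene("geneB", 420, 900, "+"),    # 19 bp after geneA, same strand
    Gene("geneC", 1200, 1500, "+"),  # 299 bp after geneB
    Gene("geneD", 1520, 1800, "-"),  # close to geneC but opposite strand
]
for operon in group_operons(example_genes):
    print(" -> ".join(g.name for g in operon))
```

In this made-up example only geneA and geneB are grouped; geneC is too far from geneB, and geneD is on the opposite strand.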
Q2.
(a) Which genes would you predict to be members of the SigR regulon?
You will search upstream from all S. coelicolor genes for potential SigR-dependent
promoters using RSAT, the Regulatory Sequence Analysis Tool.
• Click on the RSAT link, then click on “Gene-scale pattern matching –
strings”.
• Choose Streptomyces coelicolor genome, Gene name, and direct strand only.
• Type the sequence of the consensus SigR promoter into the box:
GGAATnnnnnnnnnnnnnnnnnYGTT (17 n’s, where n is any base and Y is a pyrimidine, C or T)
• When the results come back, look at the gene numbers to see if any of them
match the first gene of each transcription unit that you listed in Table 1.
• Describe in your answer which genes/operons are likely to be regulated by
SigR. Note in your answer the approximate location of the promoter with
respect to the first gene in the transcription unit.
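To see what the consensus string means as a search pattern, it can be translated into a regular expression. This is a minimal Python sketch, not a description of how RSAT works internally; the function name and the example sequence are invented for illustration, and the sequence passed in is assumed to end at the base immediately before the start codon.

```python
import re

# GGAAT, then 17 bases of any kind (n), then Y (pyrimidine: C or T), then GTT.
# The lookahead lets overlapping matches be reported as well.
SIGR_CONSENSUS = re.compile(r"(?=(GGAAT[ACGT]{17}[CT]GTT))")

def find_sigr_sites(upstream):
    """Return (position, match) pairs on the given strand only.

    Position is that of the first base of the match relative to the start
    codon, assuming the input ends at the base just before it (-1).
    """
    seq = upstream.upper()
    return [(m.start() - len(seq), m.group(1))
            for m in SIGR_CONSENSUS.finditer(seq)]

# Invented 40 bp upstream region, for illustration only.
example_upstream = "ccgt" + "GGAAT" + "a" * 17 + "CGTT" + "gc" * 5
for position, site in find_sigr_sites(example_upstream):
    print(position, site)
```

A match only tells you that the sequence resembles the consensus; whether the promoter is really used by SigR is what part (b) asks you to test.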
(b) How would you use microarrays to try to PROVE that these genes are true
targets of SigR?
Q3. For each gene, what is the closest-related homologue in E. coli? Does E. coli
contain any paralogous version of this gene?



• Obtain the amino acid sequence of the protein using ScoDB.
• Copy the sequence into memory. Click on BLAST Microbial Genomes and
paste the sequence into the box.
• Choose the genome you are interested in: Escherichia coli K12.
• Choose “Query: protein” and “Database: protein” in the drop-down menus,
which will search your sequence against every predicted E. coli protein.
• Click on View report on the next page to see the results.
• Scroll down to see the alignment of your protein with the top hit in E. coli.
• To get the name of the gene, click on the link to that gene and then scroll down
to the bottom. In the CDS section you are given the gene name (gene = “….”).
• Fill in Table 3 with your results.
Note: For the purposes of this practical, assume that homologous genes give
E-values < 1 x 10^-5. The E-value is the number of hits with at least this score
that the search would be expected to return by chance using a database of this
size. The smaller the E-value, the greater the confidence that the proteins
really are related. In addition, assume that the homologue with the lowest
E-value is an orthologue, whereas a homologue with a higher E-value
(but < 1 x 10^-5) is a paralogue.
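The working rule in the note can be expressed as a few lines of Python. This is a sketch of the practical's simplifying assumption only, not a general method for telling orthologues from paralogues; the hit names and E-values are invented.

```python
E_VALUE_CUTOFF = 1e-5  # homology threshold assumed in this practical

def classify_hits(hits, cutoff=E_VALUE_CUTOFF):
    """hits: list of (gene_name, e_value) pairs from one BLAST search.

    Returns (orthologue, paralogues): the best significant hit, and any
    other significant hits, following the practical's working rule.
    """
    significant = sorted((h for h in hits if h[1] < cutoff),
                         key=lambda h: h[1])
    if not significant:
        return None, []
    orthologue = significant[0][0]
    paralogues = [name for name, _ in significant[1:]]
    return orthologue, paralogues

# Invented hits, for illustration only.
example_hits = [("hitA", 3e-40), ("hitB", 2e-12), ("hitC", 0.5)]
print(classify_hits(example_hits))
```

Here hitC is discarded because its E-value is above the cutoff, hitA is taken as the orthologue, and hitB as a paralogue.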
Q4. Is the organisation of genes conserved in the related actinomycete
Mycobacterium tuberculosis H37Rv (lab strain)?





• Go to the Comprehensive Microbial Resource (CMR) page, copy the SCO number
of the gene you are interested in into the locus search box (top right-hand
corner) and click search.
• Choose the Genome Region Comparison menu item on the left-hand side.
• Using the drop-down menu, ADD the genome you are interested in: M.
tuberculosis H37Rv. Perform the search using standard settings.
• You should see the S. coelicolor and M. tuberculosis genome regions
compared to each other, centred around your input gene.
• Copy and paste the image into your practical report and, below it, discuss the
extent of the conservation. For example, if you look upstream from your input
gene, is the upstream gene homologous to the gene that is upstream from the
M. tuberculosis orthologue? In other words, what is the extent of the synteny
between the genomes at this locus?
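One way to think about the synteny question is as a position-by-position comparison of the two regions. The sketch below assumes you have already read off, by eye, which gene sits at each position relative to your input gene and which genes are homologous; all gene names are placeholders and nothing here comes from CMR itself. It also ignores inversions and insertions, which you should still comment on in your discussion.

```python
def conserved_positions(sco_region, mtb_region, orthologues):
    """Return the relative positions at which gene order is conserved.

    sco_region, mtb_region: dicts mapping a position relative to the input
    gene (..., -1, 0, +1, ...) to the gene found there in each genome.
    orthologues: dict mapping each S. coelicolor gene to its
    M. tuberculosis orthologue, where one exists.
    """
    conserved = []
    for offset, sco_gene in sorted(sco_region.items()):
        mtb_gene = mtb_region.get(offset)
        if mtb_gene is not None and orthologues.get(sco_gene) == mtb_gene:
            conserved.append(offset)
    return conserved

# Placeholder names, for illustration only.
sco_region = {-1: "scoA", 0: "scoB", 1: "scoC"}
mtb_region = {-1: "mtbA", 0: "mtbB", 1: "mtbX"}
orthologues = {"scoA": "mtbA", "scoB": "mtbB", "scoC": "mtbC"}
print(conserved_positions(sco_region, mtb_region, orthologues))
```

In this placeholder example the input gene and its upstream neighbour are conserved, but the downstream neighbour is not.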
Q5. Can you find evidence that the genes are associated with other S. coelicolor
genes, suggesting that they might be involved in the same physiological process
and/or that the protein products might interact?




• Go to the STRING search and type the SCO number of the gene into the
search box.
• The output from the search includes a table that gives you a list of potential
interacting genes, the pieces of evidence that support this, and a score. For the
purposes of this practical, assume that only overall scores above 0.9 are
significant.
• Look at the individual pieces of evidence that suggest an interaction by
clicking on “Neighbourhood” or “Fusion” (ignore the others).
• “Neighbourhood” will give you a visual display of the relative location of the
gene (in red) with the potential associated genes (colour coded). The names
along the left are family names, e.g. the actinobacteria family includes M.
tuberculosis and S. coelicolor (you can see species if you expand the list).
• “Fusion” will show if there are any examples in the database where
homologues of your gene are fused to another gene to give a single
polypeptide (Rosetta Stone – see lecture). Discuss briefly your results for each
gene. Provide an example of a “Rosetta Stone” fusion protein in another
organism, if you find one, noting the organism and the gene name.
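If you copy the STRING table into your notes, the score cutoff can be applied as in the sketch below. The row layout (`partner`, `score`, `neighbourhood`, `fusion`) is an assumption made for this illustration and is not STRING's own export format; the partner names and scores are invented.

```python
SCORE_CUTOFF = 0.9  # overall score threshold assumed in this practical

def significant_partners(rows, cutoff=SCORE_CUTOFF):
    """Keep partners with an overall score above the cutoff.

    For each one, report which of the two evidence types considered in
    this practical (neighbourhood, fusion) contributes a non-zero score.
    """
    kept = []
    for row in rows:
        if row["score"] > cutoff:
            evidence = [channel for channel in ("neighbourhood", "fusion")
                        if row.get(channel, 0) > 0]
            kept.append((row["partner"], row["score"], evidence))
    return sorted(kept, key=lambda r: r[1], reverse=True)

# Invented rows, for illustration only.
example_rows = [
    {"partner": "partner1", "score": 0.95, "neighbourhood": 0.8, "fusion": 0.0},
    {"partner": "partner2", "score": 0.70, "neighbourhood": 0.6, "fusion": 0.0},
    {"partner": "partner3", "score": 0.99, "neighbourhood": 0.0, "fusion": 0.9},
]
for partner, score, evidence in significant_partners(example_rows):
    print(partner, score, evidence)
```

Only partner1 and partner3 pass the cutoff in this example, supported by neighbourhood and fusion evidence respectively.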
Q6. Short essay question: Genome sequencing projects reveal many genes with no
known function. How can microarrays help us to understand the biological roles of such
genes? (500 words)
Write up
You need to complete the Practical Report form. Tables are inserted to make this
easier. You can do most of the tasks in this session but you may need to complete it
afterwards.
Submission
You need to submit a HARD COPY of the practical write up by the deadline stated in
the Course Handbook. Usual rules concerning late submissions apply.
Websites
ScoDB - the Streptomyces coelicolor annotation server
http://streptomyces.org.uk/cgi-bin/sco/dc2.pl?width=900&start=4313753&end=4353753
CMR - Comprehensive Microbial Resource
http://cmr.tigr.org/tigr-scripts/CMR/CmrHomePage.cgi
Microbial Genome Blast
http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi
STRING - Proteins and their Interactions
http://string.embl.de/newstring_cgi/show_input_page.pl?UserId=sbxQlBeY9Gc7&sessionId=LHCbWwBYanZe
RSAT – Regulatory Sequence Analysis Tool
http://rsat.ulb.ac.be/rsat/