Genes and Genomes Bioinformatics Practical

Introduction

The aim of this practical is to give you some experience of handling and interpreting genomic and experimental data. It should also increase your understanding of genome organisation and of the differences and similarities between genomes from different organisms.

The Streptomyces bacteria are best known for their ability to produce a huge range of antibiotics and other bioactive compounds. They are members of the Actinobacteria family of bacteria (actinomycetes). This large family includes several human pathogens such as Mycobacterium tuberculosis. Discovering the function of genes and their association with other genes in one actinomycete can help us predict functions in another actinomycete. In this practical you will use genomic and experimental data to create such hypotheses.

We are interested in regulatory mechanisms that allow Streptomyces to survive harsh treatment, including oxidative stress. Oxidative stress is faced by all organisms that grow aerobically. SigR is an RNA polymerase sigma factor whose activity is switched on during oxidative stress, allowing it to switch on the SigR regulon. The SigR regulon is switched on when unwanted disulphide bonds form between protein cysteines in the cell. This form of oxidative stress can be induced using a thiol-specific oxidant called diamide.

Microarray data indicate that more than 50 genes are induced by the addition of diamide to Streptomyces cultures. They are all therefore members of the "diamide stimulon" but are NOT all regulated by SigR.

All genes in the S. coelicolor genome are given a gene number (SCO number), numbered sequentially from one end of the chromosome to the other. The following four genes are members of the diamide stimulon.

SCO number    Predicted gene function
SCO3889       Disulphide reductase
SCO3669       Protein chaperone
SCO4956       Methionine sulphoxide reductase
SCO1421       RNA polymerase binding protein

Your aim is to answer the following questions.

1. How are the four genes organised in S. coelicolor: are they members of an operon (the mRNA is polycistronic) or are they likely to be transcribed alone (the mRNA is monocistronic)?
2. Using a bioinformatics approach, which genes would you predict to be members of the SigR regulon? How would you use microarrays to try to PROVE that these genes are true targets of SigR?
3. For each gene, what is the closest-related homologous gene in E. coli? Does E. coli contain any paralogous versions of this gene?
4. Is the organisation of genes conserved in the related actinomycete Mycobacterium tuberculosis H37Rv (lab strain)?
5. Can you find evidence that the genes are associated with other S. coelicolor genes, suggesting that they might be involved in the same physiological process and/or that the protein products might interact?
6. Short essay question: Genome sequencing projects reveal many genes with no known function. How can microarrays help us to understand the biological roles of such genes? (500 words)

Detailed Procedure

Q1. How are the four genes organised in S. coelicolor: are they members of an operon (the mRNA is polycistronic) or are they likely to be transcribed alone (the mRNA is monocistronic)?

Visit the ScoDB website. This site allows you to access genes using their gene number (SCO number) and to visualise their organisation in the genome. Type the SCO number into the search gene box, search, then click on the returned gene. You will see the gene in its local context on the genome. Look to see whether the gene is closely linked to another gene, upstream or downstream. Fill in Table 1 in the Practical Report. List the genes starting with the first gene in the operon, through to the last gene in the operon.
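
The operon rule assumed in this practical (neighbouring genes less than 50 bp apart and in the same orientation) can also be applied programmatically once you have read the gene coordinates off ScoDB. The following is only a minimal sketch: the Gene record, the function name and the example genes and coordinates are all hypothetical and are not taken from the S. coelicolor genome.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str    # gene identifier (hypothetical names are used below)
    start: int   # leftmost coordinate in bp
    end: int     # rightmost coordinate in bp
    strand: str  # "+" or "-"


def group_into_operons(genes, max_gap=50):
    """Group genes into predicted operons: neighbours on the same strand
    separated by fewer than max_gap bp are placed in the same operon."""
    ordered = sorted(genes, key=lambda g: g.start)
    operons = []
    for gene in ordered:
        if operons:
            prev = operons[-1][-1]
            gap = gene.start - prev.end - 1  # bases between the two genes
            if gene.strand == prev.strand and gap < max_gap:
                operons[-1].append(gene)
                continue
        operons.append([gene])
    # List each operon from its first transcribed gene to its last:
    # operons on the "-" strand are read from right to left.
    return [op if op[0].strand == "+" else op[::-1] for op in operons]


# Made-up coordinates, for illustration only.
example_genes = [
    Gene("geneA", 100, 400, "+"),
    Gene("geneB", 420, 900, "+"),    # 19 bp after geneA, same strand
    Gene("geneC", 1200, 1500, "+"),  # 299 bp after geneB
    Gene("geneD", 1510, 1800, "-"),  # close to geneC but opposite strand
    Gene("geneE", 1820, 2100, "-"),  # 19 bp after geneD, same strand
]
operon_names = [[g.name for g in op] for op in group_into_operons(example_genes)]
print(operon_names)
# [['geneA', 'geneB'], ['geneC'], ['geneE', 'geneD']]
```

Note that geneE is listed before geneD because genes on the reverse strand are transcribed in the opposite direction, which is the order Table 1 asks for.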

For this practical assume that two genes are members of a single operon if the distance between them is less than 50 bp and they are expressed in the same orientation.

Q2. (a) Which genes would you predict to be members of the SigR regulon?

You will search upstream of all S. coelicolor genes for potential SigR-dependent promoters using RSAT, the Regulatory Sequence Analysis Tool. Click on the RSAT link, then click on "Gene-scale pattern matching - strings". Choose Streptomyces coelicolor genome, Gene name, and direct strand only. Type the sequence of the consensus SigR promoter into the box:

GGAATnnnnnnnnnnnnnnnnnYGTT (17 n's, where n is any base and Y is a pyrimidine)

When the results come back, look at the gene numbers to see whether any of them match the first gene of each transcription unit that you listed in Table 1. Describe in your answer which genes/operons are likely to be regulated by SigR. Note in your answer the approximate location of the promoter with respect to the first gene in the transcription unit.

(b) How would you use microarrays to try to PROVE that these genes are true targets of SigR?

Q3. For each gene, what is the closest-related homologue in E. coli? Does E. coli contain any paralogous versions of this gene?

Obtain the amino acid sequence of the protein using ScoDB. Copy the sequence to the clipboard. Click on BLAST Microbial Genomes and paste the sequence into the box. Choose the genome you are interested in: Escherichia coli K12. Choose "Query: protein" and "Database: protein" in the drop-down menus; this will search your sequence against every predicted E. coli protein. Click on View report on the next page to see the results. Scroll down to see the alignment of your protein with the top hit in E. coli. To get the name of the gene, click on the link to that gene and then scroll down to the bottom. In the CDS section you are given the gene name (gene = "...."). Fill in Table 3 with your results.
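
To see what the pattern search in Q2(a) is doing, the consensus can be expanded into a regular expression and matched against an upstream sequence on the direct strand. This is a rough sketch rather than a description of how RSAT itself works; the function names and the test sequence are invented for illustration.

```python
import re

# IUPAC codes used in the consensus: N = any base, Y = pyrimidine (C or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]", "N": "[ACGT]"}


def consensus_to_regex(consensus):
    """Translate a consensus string with IUPAC codes into a compiled regex."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


# SigR promoter consensus from the practical: GGAAT, 17 unspecified bases, YGTT.
SIGR_CONSENSUS = "GGAAT" + "n" * 17 + "YGTT"
SIGR_REGEX = consensus_to_regex(SIGR_CONSENSUS)


def find_sites(upstream_seq, pattern=SIGR_REGEX):
    """Return (0-based position, matched sequence) for each non-overlapping
    match on the direct strand of upstream_seq."""
    seq = upstream_seq.upper()
    return [(m.start(), m.group()) for m in pattern.finditer(seq)]


# Invented upstream sequence containing one site that fits the consensus.
example_upstream = "ACGT" + "GGAAT" + "ACGTACGTACGTACGTA" + "CGTT" + "GG"
example_sites = find_sites(example_upstream)
print(example_sites)
```

In the example the single match starts at position 4 of the invented sequence. For a real upstream region, the distance from the end of the match to the start codon gives the approximate promoter location asked for in Q2(a).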

Note: For the purposes of this practical assume that homologous genes give E-values < 1 x 10^-5. The E-value is a statistic calculated from the alignment score: it is the number of hits with a score at least this good that the search would be expected to return by chance from a database of this size. The smaller the E-value, the greater the confidence that the proteins really are related. In addition, assume that the homologue with the lowest E-value is an orthologue, whereas a homologue with a higher E-value (but still < 1 x 10^-5) is a paralogue.

Q4. Is the organisation of genes conserved in the related actinomycete Mycobacterium tuberculosis H37Rv (lab strain)?

Go to the Comprehensive Microbial Resource (CMR) page, copy the SCO number of the gene you are interested in into the locus search box (top right-hand corner) and click search. Choose the Genome Region Comparison menu item on the left-hand side. Using the drop-down menu, ADD the genome you are interested in: M. tuberculosis H37Rv. Perform the search using the standard settings. You should see the S. coelicolor and M. tuberculosis genome regions compared with each other, centred on your input gene. Copy and paste the image into your practical report and, below it, discuss the extent of the conservation. For example, if you look upstream from your input gene, is the upstream gene homologous to the gene that is upstream of the M. tuberculosis orthologue? In other words, what is the extent of the synteny between the genomes at this locus?

Q5. Can you find evidence that the genes are associated with other S. coelicolor genes, suggesting that they might be involved in the same physiological process and/or that the protein products might interact?

Go to the STRING search page and type the SCO number of the gene into the search box. The output from the search includes a table that lists potential interacting genes, the pieces of evidence that support each interaction, and a score.
For the purposes of this practical assume that only overall scores above 0.9 are significant. Look at the individual pieces of evidence that suggest an interaction by clicking on "Neighbourhood" or "Fusion" (ignore the others). "Neighbourhood" will give you a visual display of the location of the gene (in red) relative to the potential associated genes (colour coded). The names along the left are family names, e.g. the actinobacteria family includes M. tuberculosis and S. coelicolor (you can see the species if you expand the list). "Fusion" will show whether there are any examples in the database where homologues of your gene are fused to another gene to give a single polypeptide (Rosetta Stone, see lecture). Discuss your results briefly for each gene. If you find one, provide an example of a "Rosetta Stone" fusion protein in another organism, noting the organism and the gene name.

Q6. Short essay question: Genome sequencing projects reveal many genes with no known function. How can microarrays help us to understand the biological roles of such genes? (500 words)

Write up

You need to complete the Practical Report form. Tables are inserted to make this easier. You can do most of the tasks in this session but you may need to complete the report afterwards.

Submission

You need to submit a HARD COPY of the practical write-up by the deadline stated in the Course Handbook. The usual rules concerning late submissions apply.

Websites

ScoDB - the Streptomyces coelicolor annotation server
http://streptomyces.org.uk/cgi-bin/sco/dc2.pl?width=900&start=4313753&end=4353753

CMR - Comprehensive Microbial Resource
http://cmr.tigr.org/tigr-scripts/CMR/CmrHomePage.cgi

Microbial Genome Blast
http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi

STRING - Proteins and their Interactions
http://string.embl.de/newstring_cgi/show_input_page.pl?UserId=sbxQlBeY9Gc7&sessionId=LHCbWwBYanZe

RSAT - Regulatory Sequence Analysis Tool
http://rsat.ulb.ac.be/rsat/
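
Worked illustration for Q3

The working assumptions in the Q3 note (homologues have E-values below 1 x 10^-5, the best hit is treated as the orthologue and the remaining homologues as paralogues) can be written out as a short sketch. The function name and the hit list are hypothetical; real E-values must be read from your own BLAST report.

```python
E_VALUE_CUTOFF = 1e-5  # threshold assumed in this practical


def classify_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Classify BLAST hits from one search under the practical's assumptions.

    hits: list of (gene_name, e_value) tuples.
    Returns (orthologue, paralogues); orthologue is None if no hit passes.
    """
    homologues = sorted((h for h in hits if h[1] < cutoff), key=lambda h: h[1])
    if not homologues:
        return None, []
    orthologue = homologues[0][0]                      # lowest E-value
    paralogues = [name for name, _ in homologues[1:]]  # other hits below cutoff
    return orthologue, paralogues


# Invented hits, for illustration only.
example_hits = [("geneY", 2e-12), ("geneX", 3e-40), ("geneZ", 0.5)]
example_result = classify_hits(example_hits)
print(example_result)
# ('geneX', ['geneY'])
```

Here geneZ is discarded because its E-value is above the cutoff. Bear in mind that "lowest E-value means orthologue" is a simplification adopted for this practical.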