Project Bioinformatics BB051B 2015-2016

This document can also be found on Blackboard and on the course website http://swift.cmbi.ru.nl/teach/LRTOM4/

Mutations in the LRTOMT gene

In the final project of this course you will perform a small bioinformatics research project. So far you have learned how to use SwissProt, UniProt, EMBL, OMIM, PROSITE, PDB, MRS, BLAST, Clustal, Yasara and Ensembl. Now you have to show that you know how to apply these tools in the real world, so use as many of them as possible when answering the bioinformatics question stated below.

You will analyze a number of DNA sequences found in patients who have a specific disease. The main goal is to gather as much information as possible about the sequences, analyze the mutations and write a comprehensive report about them. The report will have the layout of a research article.

Hannie Kremer spoke about one of her projects: deafness caused by mutations in the LRTOMT gene. In her seminar she discussed several LRTOMT mutations. Genetic disorders can have many hundreds of molecular causes, and we are going to analyze a few LRTOMT mutations thoroughly. The goal is to explain at the molecular level whether or not you expect an effect on the phenotype of the patients: do you think they are (partially or completely) deaf, or do they suffer no consequences for their hearing?

Your task

We expect you to write, in teams of two students, a report of at most 9 A4 pages (excluding appendices). Please give the names of the group members to Karen or Celia, or send them to Celia.vanGelder@radboudumc.nl. Write the project report in this Word file (the template you can use is given further on in this file). Do not use two-column text; use the full width of the page. Hand in your report digitally to Celia.vanGelder@radboudumc.nl as a doc(x) file (not a pdf!). The deadline for the report is January 25, 2016, 9:00 hr.

Preparation

First sit down and think about the research project.
What is the meaning of the data? What is the research question that you are going to answer? Where in this project are you going to transfer information? Before starting your research you have to make a short (max 1 A4) project plan outlining the major steps in your approach. Discuss this plan with one of the teachers or assistants. Deadline: Jan 7, 2016 (lunch), but preferably Dec 18, 2015!

The DNA sequences

The DNA sequences that you are going to analyze are:

Sequence 1
AGACATGGGTGGAAAATCACTCCTTTGTCTTTATTAAAGAAACTTAGACCAGACCTGGCAATCAAGGGGCGAGGTACTGG
CCAGGAAGGTGGAGTAGGTTTCAGGCCCTGGGGATTTCAAGTGCAGA

Sequence 2
CCCATGCCCTGCCCGGTGACCCTGGTCACATCCTCACCACCCTGGACCACTGGAGCAGCCGCTGCGAGTACTTGACCCACA
TGGGGCCTGTCAA

Sequence 3
GCTTATTGCCCGAGCCCTGCCCCCTGGGGGTCGCCTTCTTACTGTGGTGCGGGACCCACGCA

Sequence 4
TTCAGGGTCCTGTCCTGTCTAGCCTGGCTTTTGGTTTCCCTCCCCAACAGATCCTAACAACTTTCTTCAACCTGAGTGTCCTC
TATCTTCACGGCAACAGCATCCAGCGC

Sequence 5
CCCGACCCCTCGCTTACAAGCCTTTCGGGAACTTGCCCTATACCCCCGACCCCAGTCACCTTCAGATTTAATTCC
ACAGAAACAGACCCTCCCCTTTAAGGCACCCCCCCCCCCCCCGGCTCCTCCCTCTCAGGCGCCTCTCCTCACAAA
CCTTACCCCCATAGATTCTGCCCTTTC

Sequence 6
GGGCTGCGGATCGAGGAGCAGGCCTTCAGCTACGTGCTCACCCATGCCCTGCCCGGTGACCCTGGTCACATCCTCACCA

Hints for starting up

The main steps in your research will consist of:
1. Finding out the gene and protein to which the mutations belong.
Hint 1: Use BLAT from Ensembl.
Hint 2: If you want, you can translate the sequences to get a hint about whether they are in coding or non-coding regions. You can use the Transeq tool for this (see under Links on the course website).
2. Analyzing the gene with Ensembl (or another genome browser). What are the mutations at the DNA level and where are they located in the gene?
3. Trying to predict the effect of the mutations on the phenotype of the individuals carrying them.
a.
For mutations that influence the protein sequence you can analyze the 3D structure of the protein or, if no 3D structure is available, transfer information from a homologous protein that does have a 3D structure available (we call that protein the template). If you have found one, look at the structure and try to understand what is going on.
b. For mutations that occur in other regions you will have to come up with other solutions, using the bioinformatics knowledge you have gained so far. Think about:
i. Intron-exon structure.
ii. Known variations in this gene. Are your variations new or have they already been discovered?
iii. Regulatory regions in this gene.
iv. Comparison with other organisms, e.g. rodents. This can help you find out whether you can use e.g. the mouse as a model organism to study deafness.
v. Everything else that you find interesting and important to mention.

General hints & remarks for the report

1. General remarks:
o Choose carefully what information is relevant to show, and how you want to show it. Use figures, tables, pictures etc. to illustrate your results. In the Introduction you can, for instance, show schematically the domains in a protein sequence.
o Don’t make your text a “bullet list” of results where the reader has to draw his own conclusions. Take the reader by the hand and lead him through your text as a story.
o Avoid constructs like “We ran BLAST to… and then we did this and then we did that”, etc. Write neutral text: “Searches for homologs were performed with BLAST (version etc.)”.
o Try to be as clear and specific as possible. When talking about a certain amino acid residue or mutation, mention the protein involved, the residue name and the residue number. Do not write “mutation number 1 has no effect”, but rather “the A127P mutation has no effect”.
o Remember to use amino acid numbering related to the protein you are studying. The wild-type (wt) protein numbering can be used as reference.
o Try to make a good overview (e.g. a table) of the mutations. It is impossible to write a clear story without attaching the correct residue numbers to the respective amino acids or nucleotides.

2. Figures and tables:
o should be functional, comprehensive, and nearly self-explanatory.
o should be numbered, and you must refer to them in the main text, otherwise no one will read them.
o should always have a title and a legend explaining what is shown in the figure or table.
o should not contain things that have nothing to do with the goal of the figure.
o should contain labels when relevant.

3. Copyright: You are allowed to copy at most 200 words literally from anywhere, provided you put double quotes around them and give a reference to the source you copied them from.

TEMPLATE FOR REPORT

Title

The report has a good title describing the topic of the study.

Authors

List the authors here, including student numbers.

Abstract

The abstract consists of a few sentences (max 6) in which you summarise the main aspects of the project:
Sentence 1 summarises the research question being solved in this project.
Sentence 2 summarises your approach. Please don’t mention the tools used here (do not say “we used Yasara, BLAST and CLUSTAL to investigate...”) but formulate the approach.
Sentence 3 lists the main results/conclusion. Be specific here: do not use words like “mutation number 1” but use the amino acid name and number to indicate a mutation.
That should normally be enough. If really needed, you can add a fourth sentence summarising some discussion points. Often you will write the abstract after you have written the complete report, because it can easily be distilled from the report text.
1. Important: Do not put literature references, tables or figures in the Abstract.
2. Do not use SwissProt and PDB codes in the abstract!
Describe the protein by its name and species, not by its code in a database (it goes without saying that you use the proper names for the genes and proteins involved. Be precise!).

Introduction (maximally 2.5 A4)

The introduction should explain the question and provide the background information needed for somebody of your own skill level to understand what you have done to answer that question. The things really needed in the introduction are:
1. The molecule and its role in biology: what is it, where does it do what, etc. Describe the genomic environment of the gene, such as intron-exon structure, known variations in the gene and known regulatory regions. Also describe the important functional sites of the protein sequence and structure, such as active site residues, ligands, protein domain structure, known mutations and anything else you find worth mentioning.
2. The mutations investigated in this study: how were they found and what is their effect? You may want to include a table of the mutations, but this can also be part of the Results & Discussion section.
Please use at least one figure to illustrate an aspect of the biological function that you think is relevant to show. Be careful to choose a functional picture, not just a nice colourful image.

Methods (max 1 A4)

The report has a short Methods section in which you describe the methods and tools used. In principle, others should find enough information in the Methods section to repeat your studies. Do NOT put results or discussion in the Methods section. You also do not discuss here WHY you use certain tools, only HOW you use(d) them. The following is an example of a good sentence that you can use: “Searches for homologs in SwissProt (version…) and PDB (has no version, so you cannot list it) were performed with MRS BLAST. The PDB file 1ABC was used for all studies, except blabla”. Put references to the tools and databases in the References section (and not in the Methods text!).
In the case of tools, you can use the website of the tool as reference. Please include the version numbers of the databases and tools.
Use the right tool for the job: use BLAST to search for homologs and CLUSTAL for alignments. So make sure to use CLUSTAL, not BLAST (which is a local alignment tool for fast database searching), to make the crucial alignment in the Results & Discussion section.
When analyzing structure(s) you may want to use the WHAT IF servers (http://swift.cmbi.ru.nl/servers/html/) to:
1. List the sequence of a PDB file (under Administration)
2. Analyze protein-cofactor contacts (under Protein Analysis)
3. Analyze hydrogen bonds

Results & Discussion (maximally 6 A4)

In these sections the results of your study have to be described, including an explanation of the strategy you used, the steps you took, etc. They also include the final results, i.e. the answers to the biological question(s). Analyze all the mutations and describe the results of the analysis. Think carefully about how to report on mutations in the coding region (you will talk about amino acids, structures and protein sequence alignments) and on mutations in the non-coding regions (you will talk about gene sequences and gene properties).
Important: If you are going to transfer information from one sequence to another, indicate the E-value, the percentage identity and the length of the alignment, and make clear whether you trust the quality of the alignment and whether you are allowed to transfer information. Also describe carefully from which sequence you are transferring information and to which sequence you are transferring it.
As a minimum requirement: include a figure showing the crucial protein sequence alignment used for transfer of information! This figure belongs here and not in the Appendix.
When showing an alignment:
1. You should use a fixed-width (non-proportional) font, e.g. Courier, otherwise the amino acids will not line up correctly.
2.
Make sure to have the names of the proteins in the alignment and not names like “sequence 1” etc.
3. Think of ways to highlight important amino acids in your alignment. Use colours, numbers, boxes, labels, arrows etc. Be creative. For example, colour them and/or put a box around them. Also put the residue number of important amino acids in the alignment.
4. Skip useless data. If an alignment was made only to find out whether two sequences are homologs, you should not put it in the report. But if an alignment was made to transfer information from one sequence to another, the alignment is needed.
Be systematic. If you describe the mutations, name and describe them in the same order as you mentioned them in the introduction (if you did). Feel free to add small but clear 3D pictures for the mutations. If possible, combine two mutations in one picture.
When showing structures or parts of structures:
o Don’t make pictures because they are nice, but because they make clear to the reader something that you think is important.
o Remove from pictures everything that does not add information useful for the point you want to make.
o In detailed pictures make sure to display the side chain atoms of the amino acids. In close-up pictures we want to look at atoms, not at ribbons!
o When making pictures of molecules, use YASARA with a white background.
o Remove hydrogens if showing them is not relevant.
o Only label atoms or residues that you discuss in the text.
o Do not zoom in too much and do not zoom out too much; think about what you want to show.
o Use colours only when needed to show something or to help the reader find a residue, atom, interaction, etc., and not because they make the picture nicer.

Conclusion (maximally 1/2 A4)

This is a short section containing the overall conclusion of your research.
You can also discuss things like potential weaknesses in your study, suggestions for follow-up research and anything else you think should be discussed.

Acknowledgements

If you want, you can add Acknowledgements.

References

Here you give a list of references. These are typically:
1. research papers
2. hyperlinks (to tools, databases & webservers)
Make sure that you cite the references in the text of the report. Please choose one method of referencing and use it for all references.
For research articles: please give the full literature reference and not only a PubMed link! If you want, you can also add the URL (e.g. of PubMed) for the paper, but it should be preceded by the full literature reference!
Example of a reference to a research paper:
1. Joosten RP, Joosten K, Cohen SX, Vriend G, Perrakis A. Automatic rebuilding and optimization of crystallographic structures in the Protein Data Bank. Bioinformatics 27:3392-8 (2011).
Example of a reference to a tool or database:
4. http://www.ensembl.org/, Release 69, consulted 16-12-2012
You also have to put a reference with a figure if you borrowed the picture from a paper.
There is more out there than Wikipedia! Use more sources to gather information, otherwise it will cost you points.

Appendices

Feel free to use Appendices, but choose carefully what to put in the main text, what to put in an Appendix, and what to leave out altogether because you can describe it in two sentences instead of showing a large computer output. Appendices contain additional information that is not required to follow the flow of the story you tell in the report. If you use Appendices, make sure to refer to them in the main text, otherwise they will not be read at all.
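
Illustration: translating a sequence in six reading frames

Hint 2 under "Hints for starting up" suggests translating the DNA sequences to get an impression of whether they lie in coding or non-coding regions. The Python sketch below shows, under stated assumptions, what such a six-frame translation does; it is not the Transeq implementation and not a required part of the project. It assumes the standard genetic code, and the function names and the frame labels ("+1" to "-3") are hypothetical choices made for this illustration that need not match the frame numbering of Transeq. Sequence 3 from this document is used as input.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), with codons listed in
# TCAG order: TTT, TTC, TTA, TTG, TCT, ... ; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A<->T, C<->G)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, offset=0):
    """Translate dna codon by codon, starting at position offset (0, 1 or 2).

    Codons containing characters other than A, C, G, T become 'X';
    an incomplete codon at the end is ignored.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Translate three frames of the given strand and three of its reverse complement."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq, offset)
    return frames


# Sequence 3 from this document
sequence_3 = "GCTTATTGCCCGAGCCCTGCCCCCTGGGGGTCGCCTTCTTACTGTGGTGCGGGACCCACGCA"
for label, protein in six_frame_translation(sequence_3).items():
    print(label, protein, "stop codons:", protein.count("*"))
```

A frame without stop codons is only compatible with a coding region; it does not prove that the fragment is coding. Whether and where a fragment is coding should be checked against the gene annotation in Ensembl.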