Laboratory Guide to Bioinformatics
College of Life and Environmental Sciences, Hangzhou Normal University
Xiang Taihe

Experiments in Bioinformatics

Contents
Experiment 1 Analysis of DNA sequence
Experiment 2 Analysis of protein sequence
Experiment 3 Multiple sequence alignments
Experiment 4 Analysis of gene function
Experiment 5 Literature retrieval using PubMed
Experiment 6 Design primers for PCR and draw a plasmid map
Appendix Useful website addresses

Experiment 1 Analysis of DNA sequence

THE PURPOSE
1. Molecular databases: you will learn how to use and interpret the molecular databases that store the wealth of information so useful to the molecular biologist, including how to find and retrieve sequences in public databases and how to read the fields of a database entry.
2. Similarity searching: you will run your own similarity searches of the provided "unknown" sequences against the nucleotide databases with BLAST, the most popular sequence alignment search tool. You have been given an unknown sequence to identify, with no clues as to what it is. The provider wants an unbiased opinion.
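Purpose 2 can also be carried out from a script. The following is a minimal sketch, not part of the original procedure, assuming Python 3 and NCBI's BLAST Common URL API (the endpoint https://blast.ncbi.nlm.nih.gov/Blast.cgi and the CMD, PROGRAM, DATABASE, FILTER and QUERY parameters are taken from that API's documentation, so check them against the current version). It converts a GenBank-style numbered sequence, such as the unknown sequence X1 listed below, into FASTA format and builds, but does not send, a blastn submission URL. The database name "nt" is an assumption standing in for the web form's nucleotide nr collection.

```python
import re
from urllib.parse import urlencode

# Endpoint of NCBI's BLAST Common URL API (assumed; verify against NCBI's docs).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def numbered_to_fasta(name: str, numbered: str, width: int = 60) -> str:
    """Convert a GenBank-style block ("1 gacatactct actactacta ...") to FASTA.

    Position numbers and all whitespace are removed; letters (including
    IUPAC ambiguity codes) are kept and lower-cased.
    """
    seq = re.sub(r"[\d\s]", "", numbered).lower()
    rows = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([f">{name}"] + rows)


def blastn_submit_url(fasta: str, database: str = "nt") -> str:
    """Build (but do not send) a URL that would submit a blastn search.

    FILTER=L requests low-complexity filtering, matching "BLAST filter: ON"
    in the web-form steps; the database name "nt" is an assumption.
    """
    params = {
        "CMD": "Put",         # submit a new search
        "PROGRAM": "blastn",  # nucleotide query vs nucleotide database
        "DATABASE": database,
        "FILTER": "L",
        "QUERY": fasta,
    }
    return BLAST_URL + "?" + urlencode(params)


if __name__ == "__main__":
    x1_start = "1 gacatactct actactacta gccagtaagc tagctaacta actacgtggc tatggccccc"
    fasta = numbered_to_fasta("X1", x1_start)
    print(fasta)
    print(blastn_submit_url(fasta)[:80])
```

Sending the URL (for example with urllib.request.urlopen) should return a request ID (RID), which is then polled with CMD=Get until the results are ready; that polling loop is omitted here. NCBI asks scripted users to space out their requests.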
LIST OF MATERIALS AND TOOLS
cab or SOD gene sequences; GenBank accession number: AY580883.1
Unknown DNA sequences:
>X1
1 gacatactct actactacta gccagtaagc tagctaacta actacgtggc tatggccccc
61 accgtgatgg cctcctcggc cacctccgtg gctccattcc aagggctcaa gtccaccgcc
121 gggctccccg tcagccgccg ctccaccaac tcgggcttcg gcaacgtcag caatggcgga
181 aggatcaagt gcatgcaggt gtggccaatt gagggcatca agaagttcga gaccctatcg
241 tacctgccac cactcaccgt ggaggatctt ttgaagcaga tcgagtacct gcttcgatcc
301 aagtgggtgc cttgcctcga gttcagcaag gttggattcg tctaccgtga gaaccacagg
361 tctcccgggt actacgatgg caggtattgg accatgtgga agctgcccat gttcggctgc
421 accgatgcca cccaggtgct caaggagctc gaggaggcca agaaggccta ccccgatgcc
481 tttgtccgta tcatcggctt cgacaacgtc aggcaggtgc agttgattag cttcatcgcc
541 tacaagcccc caggttgcga ggagtctggt ggcaactaag ctaagatcaa gcatcgcgct
601 ggtggattgc tgcctataat aatagtatgc agctttgttt tgggctatgt tgatgatata
661 tcaatatata atatgctata tatttttatt ttacagtttg gttatgtacc atctcaatgg
721 cctctgctct taacacatat gtaataatct cttccctccc tctccggccg gttttattgt
781 aagagtacta caattatcgt tgggtgagga tatgtgaaaa caaagctccg gctatataca
841 cacaaaaaaa aaaaaaaaaa
Databases: the three main public nucleotide databases (GenBank, EMBL and DDBJ).
Tools: PubMed; Entrez and SRS; BLAST

PROCEDURE
Step 1: Obtaining a sequence of interest
There are many ways to obtain a sequence of interest (the cab or SOD gene in this experiment), for example through www.biosino.org.cn or www.cbi.pku.edu.cn (China).
Search GenBank (US), or EMBL (Europe) and DDBJ (Japan), for sequences of interest.
Search PubMed, a public version of the full MEDLINE database, for topics of interest (US).
Search a variety of sequence and structure databases using the SRS server at EMBL-EBI (UK) or the Entrez server at NCBI (US).
Step 2: Reading database entries (records)
Step 3: Finding similar sequences in the databases with BLAST
1. Go to BLAST at NCBI.
2. Enter the BLAST server web page (www.ncbi.nlm.nih.gov/blast). If for any reason you cannot access the BLAST server directly, you can use a BLAST mirror or link instead, such as the mirror at Peking University (www.cbi.pku.edu.cn) or the link at www.biosino.org.cn.
3. Select the program: blastn. This BLAST program compares a nucleotide query sequence against a nucleotide database.
4. Select the DNA database: nr, the main GenBank nucleotide database (it excludes EST, HTG and some other sequence divisions).
5. Ignore the matrix option; it is not used by blastn.
6. Select the sequence input format: Plain Text. You will be submitting the nucleotide sequence as plain text.
7. Set the following options: Gapped Alignment: ON; BLAST filter: ON; Graphic Output: ON. These are all ON by default.
8. Paste the query sequence into the query box.
9. Click the "Run BLAST" button.
10. Wait while the server processes your query.
11. Examine the output.
Step 4: Reading the BLAST output
Step 5: Understanding BLAST
Copy the second unknown query sequence into the query box and run the same basic BLAST search. Examine the result.

QUESTIONS FOR DISCUSSION
1. There are three main search programs available (BLAST, FASTA and BLITZ). Which program is best for which type of sequence?
2. Explain the result of Step 5.

Experiment 2 Analysis of protein sequence

THE PURPOSE
1. Protein sequence databases: There are two major general (non-specialised) protein databases that you will frequently encounter: PIR and SWISS-PROT. Unlike the three major nucleotide databases, PIR and SWISS-PROT do not mirror (copy) each other's entries. Each has its own advantages and disadvantages, which you should consider before deciding which database to search. You will learn how to use and interpret the entries in these databases.
2. Protein database searching: Protein database searching is the most important search method to master.
It is between two and five times more sensitive than DNA database searching. You will perform similarity searches against a protein database of your choice with the same programs (BLAST or FASTA).

LIST OF MATERIALS AND TOOLS
Protein IDs: CAA32643.1 and CAA00826.1
A human amino acid sequence:
MSTAVLENPGLGRKLSDFGQETSYIEDNCNQNGAISLIFSLKEEVGALAKV
LRLFEENDVNLTHIESRPSRLKKDEYEFFTHLDKRSLPALTNIIKILRHDIG
ATVHELSRDKKKDTVPWFPRTIQELDRFANQILSYGAELDADHPGFKDPV
YRARRKQFADIAYNYRHGQPIPRVEYMEEEKKTWGTVFKTLKSLYKTHA
CYEYNHIFPLLEKYCGFHEDNIPQLEDVSQFLQTCTGFRLRPVAGLLSSRD
FLGGLAFRVFHCTQYIRHGSKPMYTPEPDICHELLGHVPLFSDRSFAQFS
QEIGLASLGAPDEYIEKLATIYWFTVEFGLCKQGDSIKAYGAGLLSSFGEL
QYCLSEKPKLLPLELEKTAIQNYTVTEFQPLYYVAESFNDAKEKVRNFAA
TIPRPFSVRYDPYTQRIEVLDNTQQLKILADSINSEIGILCSALQKIK
Databases: PIR and SWISS-PROT; PDB and PAHdb
Tools: BLAST or FASTA

PROCEDURE
Step 1. Obtaining the sequences of interest and examining the results (not required in this experiment)
Search PubMed, a public version of the full MEDLINE database, for topics of interest (US).
Search a variety of sequence and structure databases using the SRS server at EMBL-EBI (UK) or the Entrez server at NCBI (US).
Search PIR (US) or SWISS-PROT (Switzerland) for sequences of interest.
Step 2. A search exercise
In the exercise below, you will integrate the knowledge you have gained from the previous experiment and from class. You should also see how easy it is to use other databases and related sources of information, now that you understand the molecular databases.
1. Go to a sequence alignment program of your choice, for example the ExPASy BLAST server or the GeneStream FASTA server.
2. Copy the human amino acid sequence (given in the one-letter code).
3. Paste the sequence into the query sequence window and adjust the options as necessary. You won't need to specify advanced options, but you should choose a program and a database. For simplicity, please use the main SWISS-PROT database.
You may wish to try other databases, but you should return to SWISS-PROT when continuing with this exercise.
4. Run the search and identify the protein, using the following settings: Gapped Alignment: ON; BLAST filter: ON; Graphic Output: ON; Matrix: BLOSUM62.
5. Use the link provided to view the SWISS-PROT report. If the link fails for any reason, do a text search of SWISS-PROT instead: go to SWISS-PROT and search for the identifier you found in the BLAST or FASTA search.

Step 3. Answer the following questions and check your answers
Now try to answer all of the questions below. You may need to look at pages linked from the SWISS-PROT report, but you will not need to go further than the first page of any site. Answering all of the questions may take some time, but you will get a feel for what information is available and how to get it. You may even find yourself fascinated by the report and exploring on your own! Write down your answers, then check them against the correct answers that follow.
1. What is the SWISS-PROT name of the entry?
2. What is the SWISS-PROT primary accession number?
3. What is the most common name of the protein?
4. What is the gene called?
5. In which year was the full-length complementary DNA (cDNA) of the human phenylalanine hydroxylase gene cloned and sequenced? In which year was the crystal structure of the catalytic domain determined? Name the first author of each publication.
6. Does the enzyme require a cofactor to function? If so, what?
7. Name the most common disease that results from deficiency of this enzyme.
8. At which cytogenetic locus does the gene reside? (e.g. 13p10.1)
9. What is PAHdb?
10. How many amino acid residues does the protein have?
11. What is the molecular weight of the protein?
12. More tasks (if you can): Look briefly at the entries in GeneCards and MIM (Mendelian Inheritance in Man), obtain the nucleic acid sequence, and locate a FASTA report for the protein sequence. View a three-dimensional (3D) image of the protein that the gene encodes (Hint: PDB stores such files!).

Exercise answers:
1. SWISS-PROT name of the entry: PH4H_HUMAN
2. SWISS-PROT primary accession number: P00439
3. Name of the protein: Phenylalanine-4-hydroxylase
4. Gene name: PAH
5. Full-length cDNA cloned: 1985, Kwok SCM. Crystal structure of the catalytic domain: 1997, Erlandsen H.
6. Cofactor: Yes, a ferrous ion (Fe2+). The reaction also uses tetrahydrobiopterin (BH4) as a cosubstrate.
7. Most common disease caused by deficiency of the enzyme: Phenylketonuria (PKU).
8. Cytogenetic locus: 12q22-q24.2
9. PAHdb: the Phenylalanine Hydroxylase Locus Database, a specialised mutation database that concentrates purely on PAH and includes comprehensive entries on PAH mutations.
10. Number of amino acid residues: 452
11. Molecular weight: 51.862 kDa

QUESTIONS FOR DISCUSSION
1. Protein database searching is between two and five times more sensitive than DNA database searching. Why?

Experiment 3 Multiple sequence alignments

THE PURPOSE
Multiple sequence alignments are a powerful tool for investigating the relationship between structure and function in biomolecules. Such alignments represent the evolutionary history of the group of sequences, and this evolutionary history is a record of successful mutagenesis experiments carried out by nature on a family of macromolecules.
A multiple sequence alignment shows the extent to which specific residues can be changed without destroying the essential structure and function of the macromolecule. At the same time, it can identify which residues must be changed to create a new and different function within a similar structural framework. Multiple sequence alignments are therefore a valuable source of information when investigating the properties, characteristics and functions of macromolecules.
ClustalW is a multiple sequence alignment program, and GeneDoc is a multiple sequence alignment editor. GeneDoc combines alignment editing and alignment analysis capabilities intended to help users refine their alignments. These refinements should aim at an alignment that accurately reflects the evolutionary history of the sequences being aligned. For example, editing can align specific residues so that the alignment reflects structural or biochemical information that could not be incorporated in the initial alignment procedure.
In this experiment you will learn to use ClustalW or GeneDoc to align and edit three sequences obtained from a biotechnology lab.
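GeneDoc's statistical report (Method Two, Step 5) summarizes how similar the aligned sequences are. As a minimal sketch of one such measure, assuming Python 3, the code below computes pairwise percent identity over columns where neither sequence has a gap. This definition is an assumption for illustration: GeneDoc and ClustalW may use different denominators or also report similarity, so their numbers need not match. The demo uses the first 20 residues of Y1, Y2 and Y3, which line up without gaps.

```python
from itertools import combinations

GAPS = "-."  # gap characters assumed for ClustalW/GeneDoc-style alignments


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Columns where either sequence has a gap are skipped, so the
    denominator is the number of ungapped, aligned positions.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x not in GAPS and y not in GAPS]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def identity_matrix(alignment: dict) -> dict:
    """Percent identity for every pair of named sequences in an alignment."""
    return {(p, q): percent_identity(alignment[p], alignment[q])
            for p, q in combinations(alignment, 2)}


if __name__ == "__main__":
    # First 20 residues of Y1-Y3 from this experiment (no gaps needed here).
    demo = {
        "Y1": "MSKRPADIIISAPASKARRR",
        "Y2": "MSKRPADIIISTPASKVRRR",
        "Y3": "MAKRPADIIISTPASKVRRR",
    }
    for (p, q), pid in identity_matrix(demo).items():
        print(f"{p} vs {q}: {pid:.1f}%")
```

For the 20-residue demo this gives Y1 vs Y2 90.0%, Y1 vs Y3 85.0% and Y2 vs Y3 95.0%. To score a full alignment, paste in the gapped rows from the ClustalW or GeneDoc output; all rows must have the same length.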
LIST OF MATERIALS AND TOOLS
The three sequences are as follows:
>Y1
MSKRPADIIISAPASKARRRLNFDSPYVSRAAAPIVRVTKARSWTNRPMNR
KPKMYRMYRSPDVPRGCEGPCKVQSFDAKNDIGHMGKVLCLSDVTRGI
GLTHRVGKRFCVKSLYFVGKIWMDENIKVKNHTNTVLFWIVRDRRPTGT
PNDFQQVFNVYDNEPSTATVKNDQRDRFQVIRRFQATVTGGQYAAKDQA
IIRKFYRVNNYVVYNQEAGKYENHTENALLLYMACTHASNPVYATLKVR
SYFYDSVTN
>Y2
MSKRPADIIISTPASKVRRRLNFDSPYVSRAAAPIVRVTKARAWANRPMNR
KPRMYRMYRSPDVPRGCEGPCKVQSFESRHDIQHIGKVMCVSDVTRGT
GLTHRVGKRFCVKSVYVLGKVWMDENIKTKNHTNSVMFFLVRDRRPVD
KPQDFGEVFNMFDNEPSTATVKNVHRDRYQVLRKWHATVTGGQYASKE
QALVKKFVRVNNYVVYNQQEAGKYENHSENALMLYMACTHASNPVYAT
LKIRIYFYDSVTN
>Y3
MAKRPADIIISTPASKVRRRLNFDSPYGARAVVPIARVTKAKAWTNRPMN
RKPRMYRMYRSPDVPRGCEGPCKVQSFESRHDVSHIGKVMCVSDVTRG
TGLTHRVGKRFCVKSVYVLGKIWMDENIKTKNHTNSVMFFLVRDRRPTG
SPQDFGEVFNMFDNEPSTATVKNMHRDRYQVLRKWHATVTGGTYASKE
QALVRKFVRVNNYVVYNQQEAGKYENHTENALMLYMACTHASNPVYAT
LKIRIYFYDSATN
Tools: ClustalW (Thompson JD, Higgins DG and Gibson TJ, 1994) or GeneDoc (Nicholas KB and Nicholas HB, 1997).

PROCEDURE
Method One: Using ClustalW
Step 1. Go to ClustalW at EMBL-EBI (http://www.ebi.ac.uk/clustalw) or at GenomeNet in Japan (http://clustalw.genome.jp).
Step 2. Paste the three sequences, in FASTA format, into the plain-text input box.
Step 3. Run ClustalW to align the protein sequences, and wait for the output.
Step 4. Interpret the result report.

Method Two: Using GeneDoc
Step 1. Download and extract the GeneDoc files and this guide
Download the files from the internet, e.g. http://www.psc.edu/biomed/genedoc or ftp://ftp.psc.edu/biomed/genedoc/gdsrc262.zip
Step 2. Input (paste) the three sequences
Run GeneDoc and create a new file. Open Edit Sequences List from the Project menu. Copy each of the three sequences in this guide and paste it in with the Input button. Then click the Done button.
Step 3. Change the alignment shading
Change the options in the Auto Shade Sequences and Auto Shading Mode submenus of the Sequence menu to suit different purposes.
Step 4. Edit the multiple sequence alignment manually
Use the toolbar modes, such as Arrange Sequences, Insert Dashes, Delete Dashes, Insert Other and Delete Other, to edit the alignment manually (for details, see Index in the Help menu).
Step 5. Interpret the result reports
GeneDoc creates several reports for an alignment. The statistical report is an important result for evaluating the similarity of the sequences in your alignment. Read the Statistical Report for this experiment from the Reports menu.

QUESTIONS FOR DISCUSSION
1. A computer-generated multiple sequence alignment can usually be improved manually. Try refining the GeneDoc alignment by hand, write down the statistical report for your edited alignment, and compare it with GeneDoc's original report. Then compare the GeneDoc and ClustalW alignments.

Experiment 4 Analysis of gene function

THE PURPOSE
Analysis of gene function is one of the central tasks of bioinformatics. Finding genes and predicting their functions attracts many researchers and much commercial investment. There are two general approaches to gene finding. Homology-based methods use known mRNA sequences as well as gene families and inter-species sequence comparisons. Ab initio methods use various computational methods to detect exons and other sequence signals, such as splice sites, within the sequence being analyzed. In this experiment, you will analyze an unknown DNA sequence and predict its function using both approaches.
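The simplest signal-based step in gene finding is an ORF scan, which is what Step 1 below does with NCBI's ORF Finder. The Python below is a minimal sketch of that idea, assuming the standard genetic code's stop codons (TAA, TAG, TGA) and ATG as the only start codon. It is not the algorithm used by ORF Finder, GENSCAN or RiceHMM: it ignores introns, splice sites and coding statistics, so on genomic rice DNA it will report fragments rather than whole genes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (upper-cased)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list:
    """Find ATG-to-stop ORFs in all six reading frames.

    Returns (strand, frame, start, end) tuples. start/end are 0-based,
    end-exclusive positions on the scanned strand ("-" means positions on
    the reverse complement), and the stop codon is included. min_codons
    counts codons including the stop; its default is an arbitrary choice.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first ATG opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None                   # look for the next ATG
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAAGG", min_codons=3))
```

Here find_orfs("CCATGAAATAAGG", min_codons=3) returns [('+', 2, 2, 11)], a three-codon ORF (ATG AAA TAA) in forward frame 2. For a real contig such as AP003610, a larger minimum such as the default 100 codons filters out many short chance ORFs.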
LIST OF MATERIALS AND TOOLS
Two contigs from the finished sequence of rice chromosome 1, produced by the Rice Genome Research Program (RGP; website: http://rgp.dna.affrc.go.jp) of Japan: accession number AP003610 (clone name P0402A09, 141966 bp, Chr 1) and AP003214 (clone name OSJNBa0083M16, 138711 bp, Chr 1). Their positions on the genetic map, in the direction of the long arm of chromosome 1, are 0 and 10.9 cM respectively. Annotation of the AP003610 sequence has been completed, whereas AP003214 has not yet been annotated.
Databases: SWISS-PROT, dbEST, BLOCKS
Tools: GENSCAN and RiceHMM; ORF Finder, GeneDoc; BLAST
GENSCAN: http://genes.mit.edu/GENSCAN.html
RiceHMM: http://rgp.dna.affrc.go.jp/RiceHMM/index.html
ORF Finder: http://ncbi.nlm.nih.gov/gorf

PROCEDURE
Step 1. Identify ORFs and translate them into protein
Search the AP003610 sequence for ORFs with ORF Finder at NCBI. Translate the DNA sequence of your ORFs into protein with the translation tool at ExPASy (http://www.expasy.org).
Step 2. Find similar sequences in the databases
Step 3. Do a global alignment of your sequence against similar sequences
Although the BLAST searches in Step 2 give local alignments (alignments of the similar regions only), a global alignment (alignment of all regions) may give better insight into your target sequence. Use the pairwise sequence alignment service at Baylor College of Medicine (US) or GeneDoc.
Step 4. Look for gene families
Analyze multiple sequence alignments with the AMAS server (http://barton.ebi.ac.uk/servers/amas_server.html) in the UK, or with GeneDoc. AMAS website: http://www.compbio.dundee.ac.uk
Step 5. Look for specific patterns in your protein using the ExPASy protein databases
Step 6. Determine the putative structure of your protein
Step 7. Obtain information about the function of related proteins
Step 8. Another approach: use computational algorithms to model genes and make predictions
Many such programs are available.
You can use the GENSCAN server (USA) and the RiceHMM server at RGP (Japan), which is designed particularly for rice sequences such as the one in this experiment.
Step 9. Check your predictions against RGP's annotation of AP003610
Step 10. Annotate the AP003214 sequence for RGP
RGP produces more sequence data than its researchers can annotate promptly. You may annotate the AP003214 sequence (not yet annotated according to RGP's sequencing status on 2001/8/10) for RGP and send your report to RGP or GenBank. Then wait for their reply or for the regular public annotation.

QUESTIONS FOR DISCUSSION
1. What is a predicted gene?

Experiment 5 Literature retrieval using PubMed

THE PURPOSE
PubMed, a service of the National Library of Medicine, is a database that provides access to bibliographic information from MEDLINE and other life sciences literature.
* PubMed is free to anyone with Internet access.
* PubMed provides access to the MEDLINE and OLDMEDLINE databases, plus In Process and Publisher Supplied records.
* PubMed is updated daily.
Purpose: To teach users to search for information effectively using PubMed.
Goals:
1. Conduct a simple search.
2. Conduct an advanced search using the MeSH Browser.

PROCEDURE
1. Accessing PubMed
There are several ways to access PubMed. One is to open http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?holding=dukemlib directly in your web browser.
2. System Features
3. Searching PubMed
Simple Search
Entering Search Terms
In the query window/search box, enter the key concepts to be searched. You can enter one or more terms or phrases at a time. If no connecting words (Boolean operators, such as AND or OR) are entered, the terms are automatically combined using AND. If you need to use connecting words, type them in all capital letters. Once all of the terms are entered, click "Go."
Retrieval
The next screen displays the results of your search. You can change the display format with the drop-down menu beside the "Display" button (for example, to display abstracts, open the drop-down menu and choose "Abstract"). You can also sort your results by author, journal or publication date with the "Sort" drop-down menu.
Refining the Search
To review how the system has translated your search and decide whether the search needs refining, click the "Details" button. The Details screen tells you whether your key terms were searched as MeSH (Medical Subject Headings) terms or as text words. This lets you check that you have retrieved the kind of information you want. It is often more effective to have the database search for key terms as MeSH terms than as simple text words: MeSH terms can be more comprehensive, and using them does not require you to remember all the different synonyms for a concept or idea. (Note: MeSH terms are automatically exploded in PubMed, so the narrower terms beneath them in the MeSH tree are also searched.) Sometimes, however, there is no MeSH term for a concept, and you will need to be creative and think of all the different ways to search for that information.
Advanced Search
MeSH Browser
You can open the "MeSH Browser" with the link on the left side of the screen. In the query window/search box, enter one keyword or phrase, then click "Go." You will either be shown a list of possible matches OR routed to the MeSH term used for that keyword or phrase.
Path 1: If you are shown a list of possible matches, highlight the term that relates to your topic and click "Browse this term." If none of the subject headings accurately describes the concept or idea, search from the Simple Search screen using keywords.
Path 2: If you are routed directly to a MeSH term, you will see a brief definition of the subject heading and a display of the MeSH vocabulary structure, or "tree," in which the term appears. Click "Detailed display." From this screen you can choose subheadings, restrict the search to articles where the MeSH term is a major topic ("majoring" the term), explore related MeSH terms, or choose not to explode the term.
Combining in the MeSH Browser
If you find an appropriate subject heading in the MeSH Browser, click the "Add" button to add it to your search strategy. Repeat this process to add more terms. Once all the necessary concepts are in your search strategy, you are ready to run the search: click "PubMed Search."
Limits
The "Limits" button on the "Features Bar" lets you refine your search further. You can restrict your results by language, age group, search field, gender, human or animal subjects, publication type or publication date. Click "Go" to apply the limits.
History
The "History" button on the "Features Bar" lets you review the search strategies you have used and their results. To look at articles from a previous search, click the hyperlinked number (in the column labeled "Result") for that set. You can also combine searches using this feature.
Clipboard
The "Clipboard" holds a list of citations to print or save, made from the items you have marked and added to it. As you view your search results, check the boxes beside the citations you want, then click "Clip Add." The articles you picked should now have a green number beside them. To look at your complete list of marked citations, click "Clipboard" on the "Features Bar."
You can remove an item from the "Clipboard" by checking the box beside it and clicking "Clip Remove."
Printing
Printing in PubMed is done through your web browser. Before printing, you can reformat the screen as plain text by clicking the "Text" button, which fits more citations on each page. Then choose "Print" from the "File" menu.
Saving
To save citations to a disk, click the "Save" button. When the "Save as" box appears, change the file name to something meaningful and the file extension to .txt, then click "Save."

QUESTIONS FOR DISCUSSION
1. Do a simple search for articles written by Maria B. Grant. Which of the following articles appears in your results for Maria B. Grant?
a. Grant MB, et al. The contribution of adult hematopoietic stem cells to retinal neovascularization. Adv Exp Med Biol. 2003;522:37-45. Review. No abstract available.
b. Grant BM, et al. Use of intraoral cassettes for dental xeroradiography. Oral Surg Oral Med Oral Pathol. 1978 Nov;46(5):717-20.
c. Needham CW. In response to Dr. Maria Lenaz's letter to the editor, "Ethics in managed care". Conn Med. 1998 Feb;62(2):108-9. No abstract available.
2. Which of these searches will retrieve the MOST articles?
a. Vaccine AND vaccination
b. Vaccine OR vaccination
c. Vaccine NOT vaccination

Experiment 6 Design primers for PCR and draw a plasmid map

THE PURPOSE
1. You will learn how to use bioinformatics software to design PCR primers and how to make a useful plasmid map.

LIST OF MATERIALS AND TOOLS
Primer3 software online: http://frodo.wi.mit.edu
Plasmid Processor: http://www.hytti.uku.fi/%7Eoikari/plasmid.html
You can also get the software through the links at http://www.bio-soft.net.

PROCEDURE
Step 1. Download the sequence AY871310 and any other sequence longer than 1000 bp from GenBank, EMBL or DDBJ.
Step 2. Go to the Primer3 website and paste in your sequences. Use the following primer-picking conditions:
1. PCR product size range: 600-750 bp.
2. Optimum primer size: 22 nt.
3. Optimum primer Tm (melting temperature): 58.0 °C.
Step 3. Obtain your primer sequences and interpret the reports.
Step 4. Download Plasmid Processor from http://www.hytti.uku.fi/%7Eoikari/plasmid.html and install it on your computer.
Step 5. Draw a plasmid with the following features:
1. Length: 3000 bp.
2. EcoRI, SalI and BamHI sites at positions 24, 38 and 2300 respectively.
3. Ampicillin resistance (Ap) gene from position 50 to 480.

QUESTIONS FOR DISCUSSION
What makes a primer optimal for PCR?

Appendix Useful website addresses (important bioinformatics websites)

I. The three major international nucleic acid databases and some nucleic acid databases in China
GenBank: http://www.ncbi.nlm.nih.gov/genbank
EMBL: http://www.ebi.ac.uk/embl
DDBJ: http://www.ddbj.nig.ac.jp/index-e.html
Centre of Bioinformatics, Peking University: http://www.cbi.pku.edu.cn
Beijing Genomics Institute (BGI): http://www.genomics.cn/index.php
Bioinformatics Laboratory, Department of Biological Sciences, Tsinghua University: http://www.bioinfo.tsinghua.edu.cn
Bioinformation Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences: http://www.biosino.org.cn

II. Genome databases
Escherichia coli (ECDC database): http://www.uni-giessen.de/~gx1052/ECDC/ecdc.htm
Yeast (CYGD database): http://mips.gsf.de/genre/proj/yeast/index.jsp
Nematode, Caenorhabditis elegans (AceDB database):
http://www.acedb.org
http://elegans.swmed.edu/genome.shtml
http://www.wormbase.org
Fruit fly, Drosophila (FlyBase database): http://flybase.bio.indiana.edu
Mouse (MGD database):
http://www.informatics.jax.org
http://www.ncbi.nlm.nih.gov/genome/guide/mouse
Rat: http://www.ncbi.nlm.nih.gov/genome/guide/rat
Cow: http://locus.jouy.inra.fr/cgi-bin/bovmap/intro2.pl
Sheep:
http://www.sheepgenetics.org.au
http://www.sheepgenomics.com
Chicken: http://www.ri.bbsrc.ac.uk/chickmap/chickbase/manager.html
Zebrafish: http://zfish.uoregon.edu
Human (GDB database):
http://gdbwww.gdb.org
http://www.ncbi.nlm.nih.gov/genome/guide/human
Arabidopsis (TAIR/AtDB database):
http://www.arabidopsis.org
http://www.kazusa.or.jp/kaos
http://www.tigr.org/tdb/e2k1/ath1
Cotton: http://cottondb.org
Beans:
http://beangenes.cws.ndsu.nodak.edu
http://www.nenno.it/Beanref
Maize: http://www.agron.missouri.edu
Rice (RGP database):
http://rgp.dna.affrc.go.jp
http://www.genomics.org.cn
http://compbio.dfci.harvard.edu/tgi
Soybean: http://soybase.agron.iastate.edu
Many species have now had their whole genomes sequenced, and the list is continually updated. You can check NCBI's secondary database of genome projects at http://www.ncbi.nlm.nih.gov/Genomes

III. Protein databases
ExPASy: http://www.expasy.org
MIPS (Munich Information Centre for Protein Sequences): http://www.helmholtz-muenchen.de/en/ibis
PDB:
http://www.rcsb.org/pdb (USA)
http://www.ebi.ac.uk/pdb (Europe)
NRL-3D: http://pir.georgetown.edu/pirwww/dbinfo
HSSP: http://www.cmbi.kun.nl/gv/hssp
SCOP: http://scop.mrc-lmb.cam.ac.uk/scop
CATH: http://www.cathdb.info
TransFac: http://www.gene-regulation.com/pub/databases.html
Protein loop database: http://www.bmm.icnet.uk/loop
Protein structure prediction database: http://www.embl-heidelberg.de/predictprotein/predictprotein.html
PROSITE (database of protein functional sites): http://cn.expasy.org/prosite
DSSP (Definition of Secondary Structure of Proteins): http://www.cmbi.kun.nl/gv/dssp
FSSP (Families of Structurally Similar Proteins): http://ekhidna.biocenter.helsinki.fi/dali

IV. Prediction of DNA and protein structure and function
Finding open reading frames (ORFs) in genomic DNA and cDNA with ORF Finder: http://www.ncbi.nlm.nih.gov/projects/gorf
Predicting the domains of an encoded protein from its amino acid sequence with SMART (Simple Modular Architecture Research Tool): http://smart.embl-heidelberg.de
Predicting protein motifs with ScanProsite: http://www.expasy.org/tools/scanprosite
Predicting protein three-dimensional structure (SWISS-MODEL; NCBI Structure):
http://swissmodel.expasy.org
http://www.ncbi.nlm.nih.gov/structure
Searching GenBank for similar or homologous nucleic acid or protein sequences with blastn or blastp: http://www.ncbi.nlm.nih.gov/blast
Gene prediction in genomic DNA and cDNA with GENSCAN and similar programs:
GENSCAN: http://genes.mit.edu/GENSCAN.html
GeneFinder:
http://genomic.sanger.ac.uk/gf/gf.shtml
http://linux1.softberry.com/berry.phtml
Gene Feature Searches: http://searchlauncher.bcm.tmc.edu/seq-search/gene-search.html
Grail: http://compbio.ornl.gov/Grail-1.3
GrailEXP: http://grail.lsd.ornl.gov/grailexp
GeneMark: http://opal.biology.gatech.edu/GeneMark/eukhmm.cgi
GENEID: http://www1.imim.es/software/geneid
Genlang: http://diana.cslab.ece.ntua.gr
Glimmer: http://www.cbcb.umd.edu/software/glimmer
MZEF: http://www.cshl.org/genefinder
Gene prediction in the model plant rice with RiceHMM and similar programs: http://rgp.dna.affrc.go.jp/RiceHMM/index.html
Predicting the isoelectric point (pI) and molecular weight (Mw) of an encoded protein with Compute pI/Mw: http://www.expasy.org/tools/pi_tool.html
Promoter prediction in genomic DNA:
http://www.fruitfly.org/
http://tools.genome.duke.edu/generegulation/McPromoter

V. Similarity comparison of DNA or protein sequences
BLAST, which comes in several variants; choose one according to the comparison you need: http://www.ncbi.nlm.nih.gov/blast
ClustalW, for comparing multiple DNA or protein sequences:
http://www.ebi.ac.uk/clustalw (Europe)
http://align.genome.jp (Japan)
FASTA: http://www.ebi.ac.uk/fasta
BLITZ: http://www.ebi.ac.uk/searches/blitz_input.html

VI. Molecular phylogenetic trees and phylogenetic analysis
MEGA:
http://www.megasoftware.net/mega.html
http://www.megasoftware.net
http://imeg.psu.edu/
PAUP: http://paup.csit.fsu.edu/index.html
ClustalW: http://www.ebi.ac.uk/clustalw
GCG package: http://www.gcg.com
PHYLIP: http://evolution.genetics.washington.edu/phylip.html
Hennig86: http://www.cladistics.org/education/hennig86.html
GAMBIT: http://genomics.ucla.edu/gambit
Phylogenetic analysis:
http://www.ucmp.berkeley.edu/subway/phylo/phylosoft.html
http://evolution.genetics.washington.edu/phylip.html

VII. Chinese and international literature databases
PubMed: http://www.ncbi.nlm.nih.gov/PubMed
USDA National Agricultural Library: http://www.nal.usda.gov
SCI: http://isiwebofknowledge.com
Providing links to the world's electronic journals: http://www.e-journals.org
CNKI (China Journal Full-text Database): http://www.cnki.net
Chongqing VIP:
http://www.tydata.com
www.cqvip.com
Wanfang Data: http://www.wanfangdata.com.cn
National Science and Technology Library (NSTL):
http://www.nstl.gov.cn
Hangzhou Science and Technology Information Network (its literature databases are free to use from Hangzhou IP addresses): http://www.hznet.com.cn
SRS: http://www.ebi.ac.uk/srs/srsc
Assorted databases, analysis tools and bioinformatics organizations: http://www.unl.edu/stc-95/Restools/biotools
Assorted databases and analysis tools: http://www.ebi.ac.uk/Tools
Bio-soft.net, which provides links to many very useful bioinformatics programs: http://www.bio-soft.net

VIII. Other useful websites
The Eukaryotic Promoter Database:
http://www.epd.isb-sib.ch
http://www.genome.ad.jp/dbget/dbget2.html
GRAMENE Home: http://www.gramene.org/resources/
Plant R gene database: http://www.ncgr.org
Sanger Centre: www.sanger.ac.uk
Animal Genome Size Database: http://www.genomesize.com
ENZYME (Enzyme Data Bank): http://www.expasy.org/enzyme
Electronic PCR:
http://www.ncbi.nlm.nih.gov/unists
http://www.ncbi.nlm.nih.gov/sutils/e-pcr
Free online services for screening sequences for vector contamination:
http://www.embl-ebi.ac.uk/blastall/vectors.html
http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html
Submitting sequences to GenBank:
http://www.ncbi.nlm.nih.gov/BankIt
http://www.ncbi.nlm.nih.gov/WebSub/?tool=genbank
PCR primer design (Primer3, free online):
http://frodo.wi.mit.edu/primer3
http://www.ncbi.nlm.nih.gov/tools/primer-blast
Oligo software:
http://www.mbinsights.com
http://www.oligo.net
Restriction-site analysis of DNA sequences, free online (NEBcutter or Webcutter):
http://tools.neb.com/NEBcutter2/index.php
http://www.firstmarket.com/cutter/cut2.html
SimVector 2.01, plasmid drawing software: http://www.premierbiosoft.com/plasmidmap/plasmidmap.html
Plasmid Processor, plasmid map drawing software (free download):
http://hznugene.spaces.live.com
http://iubio.bio.indiana.edu/soft/molbio/ibmpc/plasmid-processor-readme.html
DMUP, plasmid map drawing software (free download):
http://hznugene.spaces.live.com
http://www.bioinformatics.org/annhyb/dmup.php
PlasMapper 2.0, online plasmid mapping: http://wishart.biology.ualberta.ca/PlasMapper
pDRAW32, plasmid map drawing software: http://www.acaclone.com
Comparative sequence analysis: http://www.bork.embl-heidelberg.de
Chromas, sequencing chromatogram analysis: http://www.technelysium.com.au
Sequencher, sequence assembly: ftp://genecodes.com/pub/SequencherPC.zip
microRNA database:
http://microrna.sanger.ac.uk