Laboratory Manual for Bioinformatics
College of Life and Environmental Sciences, Hangzhou Normal University
Experiments in Bioinformatics
Contents
Experiment 1 Analysis of DNA sequence
Experiment 2 Analysis of protein sequence
Experiment 3 Multiple sequence alignments
Experiment 4 Analysis of gene function
Experiment 5 Literature retrieval using PubMed
Experiment 6 Design primers for PCR and draw a plasmid map
Appendix Useful website addresses
Xiang Taihe
Experiment 1
Analysis of DNA sequence
THE PURPOSE
1. Molecular databases: you will learn how to use and interpret the molecular databases
that store the wealth of information so useful to the molecular biologist, including
how to find and retrieve sequences in public databases and how to read the fields of
database entries.
2. Similarity searching: you will run your own similarity searches of the provided “unknown”
sequences against the nucleotide databases with Blast, the most popular sequence
alignment search tool. You have been given an unknown sequence to identify, with no
clues as to what it is, because the provider wants an unbiased opinion.
LIST OF MATERIALS AND TOOLS
cab or SOD gene sequences;
GenBank accession number: AY580883.1
Unknown DNA sequences:
>X1
1 gacatactct actactacta gccagtaagc tagctaacta actacgtggc tatggccccc
61 accgtgatgg cctcctcggc cacctccgtg gctccattcc aagggctcaa gtccaccgcc
121 gggctccccg tcagccgccg ctccaccaac tcgggcttcg gcaacgtcag caatggcgga
181 aggatcaagt gcatgcaggt gtggccaatt gagggcatca agaagttcga gaccctatcg
241 tacctgccac cactcaccgt ggaggatctt ttgaagcaga tcgagtacct gcttcgatcc
301 aagtgggtgc cttgcctcga gttcagcaag gttggattcg tctaccgtga gaaccacagg
361 tctcccgggt actacgatgg caggtattgg accatgtgga agctgcccat gttcggctgc
421 accgatgcca cccaggtgct caaggagctc gaggaggcca agaaggccta ccccgatgcc
481 tttgtccgta tcatcggctt cgacaacgtc aggcaggtgc agttgattag cttcatcgcc
541 tacaagcccc caggttgcga ggagtctggt ggcaactaag ctaagatcaa gcatcgcgct
601 ggtggattgc tgcctataat aatagtatgc agctttgttt tgggctatgt tgatgatata
661 tcaatatata atatgctata tatttttatt ttacagtttg gttatgtacc atctcaatgg
721 cctctgctct taacacatat gtaataatct cttccctccc tctccggccg gttttattgt
781 aagagtacta caattatcgt tgggtgagga tatgtgaaaa caaagctccg gctatataca
841 cacaaaaaaa aaaaaaaaaa
Databases: the three main public databases (GenBank, EMBL and DDBJ).
Tools: PubMed; Entrez and SRS; Blast
PROCEDURE
Step 1: Obtaining a sequence of interest
There are several ways to obtain a sequence of interest (the cab or SOD gene in this
experiment):

Use www.biosino.org.cn or www.cbi.pku.edu.cn (China).

Search GenBank (or EMBL and DDBJ) for sequences of interest (US).

Search PubMed, a public version of the full Medline, for topics of interest (US).

Search a variety of sequence and structure databases using the SRS server at
EMBL (Germany) or the Entrez server at NCBI (US).
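A record can also be retrieved by program rather than through the web pages. The
sketch below is not part of the original procedure: it assumes the NCBI E-utilities
efetch service and builds the request address for the GenBank accession number listed
in the materials. The helper name build_efetch_url is hypothetical.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities "efetch" service (an assumption of this
# sketch; the manual itself only describes the web interface).
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, rettype="fasta"):
    """Return an efetch URL for one nucleotide record (hypothetical helper).

    rettype="fasta" asks for the plain sequence, rettype="gb" for the full
    GenBank flat-file entry.
    """
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": rettype,
        "retmode": "text",
    }
    return EFETCH_BASE + "?" + urlencode(params)


url = build_efetch_url("AY580883.1")
print(url)

# To download the record itself (needs network access):
# from urllib.request import urlopen
# with urlopen(url) as handle:
#     print(handle.read().decode())
```

Opening the printed address in a browser should show the same sequence that Entrez
displays for this accession number.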
Step 2: Reading database entries (records)
Step 3: Find similar sequence in the databases with Blast
1. Go to Blast at NCBI
2. Enter the Blast Server WWW page (www.ncbi.nlm.nih.gov/blast)
If for any reason you cannot access the Blast server directly, you can use any Blast
server mirror or link, such as the mirror at Peking University (www.cbi.pku.edu.cn)
or the link at www.biosino.org.cn.
3. Select the program: Blastn
This is the Blast program that will compare a nucleotide query sequence against a
nucleotide database.
4. Select the DNA database: nr (which excludes HTG, EST and similar divisions)
This is the main GenBank nucleotide database.
5. Ignore the matrix option.
It is not used by Blastn.
6. Select sequence input format: Plain Text
You will be submitting the nucleotide sequence in plain text.
7. Select the following:
Gapped Alignment: ON;
Blast filter: ON;
Graphic Output: ON.
These are all ON by default.
8. Paste the query sequence into the specified area.
9. Hit the button: Run Blast
10. Wait as your query is processed by the server.
11. Examine the output.
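The unknown sequence in the materials list is printed with position numbers and
blocks of ten bases, so it has to be reduced to bare letters before it is pasted as
plain text. A minimal sketch of that clean-up, using the first two rows of X1 (the
function name to_plain_sequence is hypothetical):

```python
import re


def to_plain_sequence(listing):
    """Drop FASTA header lines, position numbers and whitespace from a listing."""
    lines = [ln for ln in listing.splitlines() if not ln.lstrip().startswith(">")]
    # Keep letters only; digits and spaces are layout, not sequence.
    return re.sub(r"[^A-Za-z]", "", "".join(lines)).lower()


# First two rows of the X1 listing from the materials section.
listing = """>X1
1 gacatactct actactacta gccagtaagc tagctaacta actacgtggc tatggccccc
61 accgtgatgg cctcctcggc cacctccgtg gctccattcc aagggctcaa gtccaccgcc"""

plain = to_plain_sequence(listing)
print(len(plain))      # number of bases kept
print(plain[:20])      # start of the cleaned sequence
```

The length of the cleaned sequence is a quick check that no bases were lost: two
rows of six ten-base blocks should give 120 bases.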
Step 4: Reading the output of Blast
Step 5: Understanding Blast
Copy the second unknown query sequence into the query window and run the same
basic Blast search. Examine the result.
QUESTIONS FOR DISCUSSION
1. Three main search programs are available (Blast, FASTA and BLITZ).
Which program is best suited to which type of sequence?
2. Explain the result of Step 5.
Experiment 2
Analysis of protein sequence
THE PURPOSE
1. Protein sequence databases: There are two major, non-specialised protein databases
that you will frequently encounter: PIR and SWISS-PROT. Unlike the three major
nucleotide databases, PIR and SWISS-PROT do not mirror (copy) each other's entries.
Each one has its advantages and disadvantages, which you should consider before
deciding which database to search. You will learn how to use and understand the
entries in those databases.
2. Protein database searching: Protein database searching is the most important
method to master. It is between two and five times more sensitive than DNA database
searching. You will perform similarity searches against a protein database of your
choice with the same programs (Blast or FASTA).
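One reason protein comparison can pick up relationships that DNA comparison misses
is that different codons encode the same amino acid. The sketch below illustrates
this with two made-up DNA fragments (dna_a and dna_b are hypothetical, not taken
from any database) that both encode the first seven residues of the human sequence
used in this experiment.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order line up with AMINO.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string in frame 1; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Two hypothetical coding fragments that differ mostly at synonymous positions.
dna_a = "ATGTCTACTGCTGTTCTTGAA"
dna_b = "ATGAGCACCGCGGTGCTGGAG"

print("DNA identity:    ", round(identity(dna_a, dna_b), 2))
print("protein A:", translate(dna_a))
print("protein B:", translate(dna_b))
print("protein identity:", identity(translate(dna_a), translate(dna_b)))
```

The two DNA fragments agree at only about 62% of positions, yet they translate to
the same peptide, so a protein-level search would score them as a perfect match.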
LIST OF MATERIALS AND TOOLS
Protein ID: CAA32643.1 and CAA00826.1
A human amino acid sequence:
MSTAVLENPGLGRKLSDFGQETSYIEDNCNQNGAISLIFSLKEEVGALAKV
LRLFEENDVNLTHIESRPSRLKKDEYEFFTHLDKRSLPALTNIIKILRHDIG
ATVHELSRDKKKDTVPWFPRTIQELDRFANQILSYGAELDADHPGFKDPV
YRARRKQFADIAYNYRHGQPIPRVEYMEEEKKTWGTVFKTLKSLYKTHA
CYEYNHIFPLLEKYCGFHEDNIPQLEDVSQFLQTCTGFRLRPVAGLLSSRD
FLGGLAFRVFHCTQYIRHGSKPMYTPEPDICHELLGHVPLFSDRSFAQFS
QEIGLASLGAPDEYIEKLATIYWFTVEFGLCKQGDSIKAYGAGLLSSFGEL
QYCLSEKPKLLPLELEKTAIQNYTVTEFQPLYYVAESFNDAKEKVRNFAA
TIPRPFSVRYDPYTQRIEVLDNTQQLKILADSINSEIGILCSALQKIK
Databases: PIR and SWISS-PROT; PDB and PAHdb
Tools: Blast or FASTA
PROCEDURE
Step 1. Obtain the sequences of interest and examine the results (not
necessary in this experiment)

Search PubMed, a public version of full Medline for topics of interest (US).

Search a variety of sequence and structures databases using the SRS server at
EMBL (Germany) or Entrez server at NCBI (US).

Search PIR (or SWISS-PROT) for sequences of interest (US).
Step 2. A searching exercise
In the exercise given below, you will integrate the knowledge you have gained from
the last experiment and from class. You should also realize how easy it is to use other
databases and related sources of information, particularly now that you have an
understanding of the molecular databases.
1. Go to a sequence alignment program of your choice. You might choose to use:
The ExPASy Blast server.
Or the GeneStream FASTA server.
2. Copy the human amino acid sequence (given in the one letter code).
3. Paste the sequence into the query sequence window and adjust the options as
necessary. You won't need to specify advanced options, but you should choose a
program and database. For simplicity, please use the main SWISS-PROT database.
You may wish to try other databases, but you should return to SWISS-PROT when
continuing with this exercise.
4. Run the search and identify the protein. Select the following:
Gapped Alignment: ON;
Blast filter: ON;
Graphic Output: ON.
Matrix: blosum 62
5. Use the link provided to see the SWISS-PROT report. If the link fails for any
reason, you can do a text search of SWISS-PROT. Go to SWISS-PROT and search by
the identifier you identified after the BLAST or FASTA search.
Step 3. Answer the following questions and check your answers
Now, try to answer all of the questions below. You may need to look at pages that are
linked from the SWISS-PROT report, but you will not need to search further than the
first page of any site. Answering all of the questions may take some time, but you will
get a feel for what is available, and how to get it. You may even find yourself
becoming fascinated by the report, and exploring on your own! Write down the
answers, and see if you got them right by comparing them with the exercise answers
given after the questions.
1. What is the SWISS-PROT name of the entry?
2. What is the SWISS-PROT primary accession number?
3. What is the most common name of the protein?
4. What is the gene called?
5. In which year was the full-length complementary DNA of the human phenylalanine
hydroxylase gene cloned and sequenced? In which year was the crystal structure of the
catalytic domain determined?
Name the first author of each paper.
6. Does the enzyme require a co-factor to function? If so, what?
7. Name the most common disease that arises as a result of deficiency of this enzyme.
8. Which cytogenetic locus does the gene reside at? (e.g. 13p10.1)
9. What is the PAHdb?
10. How many amino acid residues are there in the protein?
11. What is the molecular weight of the protein?
12. More tasks (if you can): Look briefly at entries in GeneCards, MIM (Mendelian
Inheritance in Man), obtain the nucleic acid sequence and locate a FASTA report for
the protein sequence. View a three-dimensional (3D) image of the protein that the
gene codes for (Hint: PDB stores such files!).
Exercise answers:
1. What is the SWISS-PROT name of the entry?
PH4H_HUMAN
2. What is the SWISS-PROT primary accession number?
P00439
3. What is the name of the protein?
Phenylalanine-4-Hydroxylase
4. What is the gene name?
PAH
5. In which year was the full-length complementary DNA of the human phenylalanine
hydroxylase gene cloned and sequenced? In which year was the crystal structure of the
catalytic domain determined?
Name the first author of each paper.
1985, Kwok SCM
1997, Erlandsen H
6. Does the enzyme require a co-factor to function? If so, what?
Yes. A ferrous ion.
7. Name the most common disease that arises as a result of deficiency of this enzyme.
Phenylketonuria (PKU).
8. Which cytogenetic locus does the gene reside at? (e.g. 13p10.1)
12q22-q24.2
9. What is the PAHdb?
It's the Phenylalanine Hydroxylase Locus Database of mutations. A specialised
database that concentrates purely on PAH. It includes comprehensive entries about
PAH mutations.
10. How many amino acid residues are there in the protein?
452
11. What is the molecular weight of the protein?
51.862 kDa
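The residue count and molecular weight in answers 10 and 11 can be checked roughly
by hand. The sketch below assumes commonly tabulated average residue masses; the
values and the function name molecular_weight_kda are assumptions of this example,
and the figure reported by SWISS-PROT or by the Compute pI/Mw tool may differ
slightly. Only the first line of the human sequence is used here; paste in the full
sequence to compare with the answers.

```python
# Average residue masses in daltons (free amino acid minus one water molecule).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free N- and C-termini


def molecular_weight_kda(protein):
    """Approximate average molecular weight of a one-letter protein sequence."""
    protein = "".join(protein.split()).upper()  # tolerate line breaks and spaces
    return (sum(RESIDUE_MASS[aa] for aa in protein) + WATER) / 1000.0


# First line of the human amino acid sequence from the materials list.
first_line = "MSTAVLENPGLGRKLSDFGQETSYIEDNCNQNGAISLIFSLKEEVGALAKV"
print(len(first_line), "residues")
print(round(molecular_weight_kda(first_line), 3), "kDa")
```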
QUESTIONS FOR DISCUSSION
1. Protein database searching is between two and five times more sensitive than DNA
database searching. Why?
Experiment 3
Multiple sequence alignments
THE PURPOSE
Multiple sequence alignments are a powerful tool for investigating the
relationship between structure and function in biomolecules. Such alignments
represent the evolutionary history of the group of sequences. This evolutionary history
is a record of successful mutagenesis experiments carried out by nature on a family of
macromolecules. The multiple sequence alignment demonstrates the extent to which
specific residues may be changed without destroying the essential structure and
function of the macromolecule. At the same time they can identify which residues
must be changed to create a new and different function within a similar structural
framework. Thus multiple sequence alignments are a valuable source of information
that can be brought to bear in many ways while investigating the properties,
characteristics, and functions of macromolecules.
ClustalW is a multiple sequence alignment program and GeneDoc is a multiple sequence
alignment editor. This software provides a combination of alignment editing and
alignment analysis capabilities
intended to help users refine their alignments. These refinements should be directed
toward the goal of creating an alignment that accurately reflects the evolutionary history
of the sequences being aligned. The editing can be directed toward aligning specific
sequence residues to reflect structural or biochemical information that could not be
incorporated in the initial alignment procedure. In this experiment you should learn to
use ClustalW or GeneDoc for aligning and editing three sequences obtained from a
biotechnology lab.
LIST OF MATERIALS AND TOOLS
Three sequences are as follows:
>Y1
MSKRPADIIISAPASKARRRLNFDSPYVSRAAAPIVRVTKARSWTNRPMNR
KPKMYRMYRSPDVPRGCEGPCKVQSFDAKNDIGHMGKVLCLSDVTRGI
GLTHRVGKRFCVKSLYFVGKIWMDENIKVKNHTNTVLFWIVRDRRPTGT
PNDFQQVFNVYDNEPSTATVKNDQRDRFQVIRRFQATVTGGQYAAKDQA
IIRKFYRVNNYVVYNQEAGKYENHTENALLLYMACTHASNPVYATLKVR
SYFYDSVTN
>Y2
MSKRPADIIISTPASKVRRRLNFDSPYVSRAAAPIVRVTKARAWANRPMNR
KPRMYRMYRSPDVPRGCEGPCKVQSFESRHDIQHIGKVMCVSDVTRGT
GLTHRVGKRFCVKSVYVLGKVWMDENIKTKNHTNSVMFFLVRDRRPVD
KPQDFGEVFNMFDNEPSTATVKNVHRDRYQVLRKWHATVTGGQYASKE
QALVKKFVRVNNYVVYNQQEAGKYENHSENALMLYMACTHASNPVYAT
LKIRIYFYDSVTN
>Y3
MAKRPADIIISTPASKVRRRLNFDSPYGARAVVPIARVTKAKAWTNRPMN
RKPRMYRMYRSPDVPRGCEGPCKVQSFESRHDVSHIGKVMCVSDVTRG
TGLTHRVGKRFCVKSVYVLGKIWMDENIKTKNHTNSVMFFLVRDRRPTG
SPQDFGEVFNMFDNEPSTATVKNMHRDRYQVLRKWHATVTGGTYASKE
QALVRKFVRVNNYVVYNQQEAGKYENHTENALMLYMACTHASNPVYAT
LKIRIYFYDSATN
Tools: ClustalW by Thompson JD, Higgins DG and Gibson TJ, 1994
or GeneDoc by Nicholas KB and Nicholas HB, 1997.
PROCEDURE
Method One: Use ClustalW
Step 1. Go to ClustalW at EMBL (http://www.ebi.ac.uk/clustalw) or DDBJ
(http://clustalw.genome.jp)
Step 2. Copy the three sequences and paste them, in FASTA format, into the input box.
Step 3. Run the ClustalW program to align the protein sequences and wait for the output.
Step 4. Understand the result report.
Method Two: Use GeneDoc
Step 1. Download and extract the GeneDoc files and this guide
Download the files from the internet, e.g. http://www.psc.edu/biomed/genedoc or
ftp://ftp.psc.edu/biomed/genedoc/gdsrc262.zip
Step 2. Copy and input (paste) the three sequences

Run the GeneDoc program and create a new file.

Open Edit Sequences List in the Project menu.

Copy each of the three sequences in this guide in turn and press the Input button to
input (paste) it. Then choose the Done button.
Step 3. Change the alignment shading
Change the options in the Auto Shade Sequences and Auto Shading Mode submenus of the
Sequence menu to suit different purposes.
Step 4. Edit the multiple sequence alignment manually
Use the toolbar options, such as Arrange Sequences mode, Insert Dashes mode,
Delete Dashes mode, Insert other mode and Delete other mode, to edit the alignment
manually (for details see the Index option of the Help menu).
Step 5. Understand the result reports
GeneDoc creates several result reports for a given alignment.
The statistical report is an important result for evaluating the similarity within
your alignment.
Read the Statistical Report for this experiment in the Reports menu.
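To get a feel for what a pairwise identity figure in such a report means, the
calculation can be reproduced by hand on a small piece of the alignment. The sketch
below uses one simple definition (identical columns divided by all columns that are
not gap-against-gap); this definition and the function name percent_identity are
assumptions of the example, and GeneDoc's own statistics may be defined differently.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of aligned columns that carry the same residue.

    Both inputs must be rows of the same alignment, with '-' for gaps.
    Columns where both rows have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# First 20 residues of Y1 and Y2 from the materials list (no gaps needed here).
y1_start = "MSKRPADIIISAPASKARRR"
y2_start = "MSKRPADIIISTPASKVRRR"
print(percent_identity(y1_start, y2_start))   # 18 of 20 columns agree

# A gap counts as a mismatch under this definition.
print(percent_identity("AC-T", "ACGT"))
```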
QUESTIONS FOR DISCUSSION
1. Multiple sequence alignments can usually be optimized manually, starting from the
computer-generated result. Try this for the GeneDoc alignment. Write down the
statistical report for your manually edited alignment and compare it with that of
GeneDoc. Compare the GeneDoc and ClustalW alignments.
Experiment 4
Analysis of gene function
THE PURPOSE
Analysis of gene function is one of the most important parts of bioinformatics. Finding
genes and predicting their functions attracts many researchers and much commercial
investment.
There are two general approaches to gene finding. The homology-based methods
include the use of known mRNA sequences as well as gene families and inter-specific
sequence comparisons. The ab initio methods include detection of exons and other
sequence signals, like splice sites, by various computational methods within the
sequence being analyzed.
In this experiment, you will analyze an unknown DNA sequence and predict its
function through the two approaches.
LIST OF MATERIALS AND TOOLS
Two contigs from the finished sequence of rice chromosome 1 by the Rice Genome
Research Program (RGP, website: http://rgp.dna.affrc.go.jp) of Japan: accession
number AP003610 (clone name P0402A09, 141966 bp, Chr 1) and AP003214 (clone
name OSJNBa0083M16, 138711 bp, Chr 1). Their genetic map positions on the
long arm of chromosome 1 are 0 and 10.9 cM respectively. The annotation of the
AP003610 sequence has been completed, whereas that of AP003214 has not.
Databases: SWISS-PROT, dbEST, BLOCKS
Tools: GENSCAN and RiceHMM; ORF Finder, GENEDOC; Blast
GENSCAN: http://genes.mit.edu/GENSCAN.html
RiceHMM: http://rgp.dna.affrc.go.jp/RiceHMM/index.html
ORF Finder: http://ncbi.nlm.nih.gov/gorf
PROCEDURE
Step 1. Identify ORFs and translate into protein

Search the sequence of AP003610 with ORF Finder at NCBI.

Translate the DNA sequence of your ORFs into protein with the translation tool at ExPASy
(http://www.expasy.org).
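The idea behind an ORF search can be sketched in a few lines: read the sequence in
each reading frame, start at an ATG and stop at the first in-frame stop codon. The
example below is a simplified illustration, not the algorithm of ORF Finder itself.
It scans only the three forward frames (a full search would repeat the scan on the
reverse complement), and the function name find_orfs and the toy sequence are
hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    Positions are 0-based and 'end' is exclusive, so dna[start:end] is the ORF
    including its stop codon. min_codons counts the stop codon as well.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ORF
    return orfs


toy = "CCATGGCTTAAGG"                           # made-up test sequence
for start, end, frame in find_orfs(toy):
    print(frame, start, end, toy[start:end])
```

For a real contig such as AP003610, a much larger min_codons value would be needed
to filter out the many short ORFs that occur by chance.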
Step 2. Find similar sequences in the databases
Step 3. Do a global alignment of your sequence vs similar sequences
Although the previous Blast searches provide local alignments (alignments
of the similar regions), a global alignment (an alignment of all regions) may give
better insight into your target sequence. Use the pairwise sequence alignment service at
Baylor College of Medicine (US) or the GENEDOC software.
Step 4. Look for gene families
Analyze multiple sequence alignments at the AMAS server
(http://barton.ebi.ac.uk/servers/amas_server.html) at Oxford University (UK) or with the
GENEDOC software.
AMAS website: http://www.compbio.dundee.ac.uk
Step 5. Enter the ExPASy protein database. Look for the presence of specific
patterns in your protein
Step 6. Determine the putative structure of your protein
Step 7. Obtain information about function of related proteins
Step 8. Another approach: use computational algorithms to model genes and
make predictions
Many such programs are available. You can use the GENSCAN server at Stanford
University (USA) and the RiceHMM server (particularly suited to the rice sequence in
this experiment) at RGP (Japan).
Step 9. Check your predictions with the annotation of AP003610 by RGP
Step 10. Make annotation of AP003214 sequence for RGP
RGP researchers have more sequence data than they can annotate promptly. You
may annotate the AP003214 sequence (not yet annotated according to the RGP sequencing
status of 2001/8/10) and send your report to RGP or GenBank. Then
wait for their reply or for the official public annotation.
QUESTIONS FOR DISCUSSION
1. What is a predicted gene?
Experiment 5
Literature retrieval using PubMed
THE PURPOSE:
PubMed, a database that provides access to bibliographic information from
MEDLINE and other life sciences literature, is a service of the National Library of
Medicine.
﹡PubMed is free to anyone with Internet access.
﹡PubMed provides access to the databases, MEDLINE, OLDMEDLINE, plus In
Process and Publisher Supplied records.
﹡PubMed is updated daily.
Purpose: To teach users to effectively search for information using PubMed.
Goals: ①. Conduct a simple search. ②. Conduct an advanced search using the MeSH Browser.
PROCEDURE
1. Accessing PubMed
There are several ways to access the PubMed system. One direct way
is to open http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?holding=dukemlib in
your web browser.
2. System Features
3. Searching PubMed
Simple Search
Entering Search Terms
In the query window/search box, enter the key concepts to be searched. You can enter one or
more terms or phrases at a time. If no connecting words (Boolean operators, such as AND or OR)
are entered, terms will automatically be combined using AND. If you need to use connecting
words, put them in all capital letters.
Once all of the terms are entered, click on "Go."
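The connecting words behave like set operations on the records indexed under each
term. The sketch below illustrates this with two made-up sets of record numbers;
the terms and the numbers are hypothetical and are not real PubMed identifiers.

```python
# Hypothetical record numbers indexed under two search terms.
rice = {101, 102, 103, 104}
genome = {103, 104, 105}

both = rice & genome        # rice AND genome: records containing both terms
either = rice | genome      # rice OR genome: records containing at least one term
only_rice = rice - genome   # rice NOT genome: records with the first term only

print("AND:", sorted(both))
print("OR: ", sorted(either))
print("NOT:", sorted(only_rice))
```

AND can only narrow a result set and OR can only widen it, which is worth keeping
in mind when a search returns too many or too few citations.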
Retrieval
The next screen displays the results of your search. The format in which
your results display can be changed by clicking on the drop-down menu
beside the "Display" button. (For example: to display abstracts, click on
the drop-down menu and "abstracts.") You can also choose to sort your
results by author, journal, or publication date by clicking on the "Sort"
drop-down menu.
Refining the Search
To review how the system has translated your search and determine if
your search needs to be refined, click on the "Details" button. The
information on the Details screen will tell you if your key terms were
searched as MeSH (Medical Subject Headings) terms or as text words.
The point of doing this is to ensure that you have retrieved the kind of
information you want.
It is often more effective to have the database search for the key terms as
MeSH terms than as simple text words. MeSH terms can be more
comprehensive, and using them does not require you to remember all the
different synonyms for a concept or idea. (Note: MeSH terms are
automatically exploded in PubMed.) Sometimes, however, there is no
MeSH term to describe a concept and it becomes necessary to be
creative and think of all the different ways to search for that piece of
information.
Advanced Search
MeSH Browser
You can access the "MeSH Browser" by clicking on the link on the left
side of your screen.
In the query window/search box, enter one keyword or phrase, then click
on "Go." You will either be presented with a list of possible matches OR
routed to the MeSH term used to describe the keyword or phrase.

Path 1: If you are routed to a list of possible matches, highlight
the term that relates to your topic. Click on "Browse this term."
If none of the subject headings accurately describes the concept
or idea, search from the Simple Search screen using keywords.
Path 2: If you are routed directly to a MeSH term, you will see a
brief definition of the subject heading and a display of the
MeSH vocabulary structure, or "tree," where the word appears.
Click on "Detailed display." From this screen, you will have the option
of choosing subheadings, majoring the MeSH term, exploring related
MeSH terms, and not exploding a term.
Combining in the MeSH Browser
If you find an appropriate subject heading in the MeSH Browser, click
on the "Add" button to add it to your search strategy. Repeat this process
to add additional terms to your strategy.
Once you have all the necessary concepts incorporated into your search
strategy, you are ready to run the search. Click on "PubMed Search."
Limits
The "Limits" button, available from the "Features Bar," allows you to
further refine your search. You can restrict your results to a particular
language, age group, information field, gender, human or animal,
publication type, or publication date.
Click on "Go" to limit your search.
History
The "History" button, located on the "Features Bar," gives you the
ability to review the different strategies that you have used to search for
information and their results. If you are interested in looking at articles
from a previously conducted search, click on the hyperlinked number
(under the column labeled "Result") that corresponds to the set.
You can also combine searches using this feature.
Clipboard
The "Clipboard" creates a list of citations to print or save, from items
that you have marked and added to the "Clipboard."
As you view your search results, place checks in the boxes beside the
citations you want. Click on "Clip Add." The articles you picked should
now have a green number beside them.
To look at your complete list of marked citations, click on "Clipboard"
from the "Features Bar." You can remove an item from the "Clipboard"
by placing a check beside it and clicking on "Clip Remove."
Printing
Printing in PubMed is a function of your Web Browser. Before printing,
however, you can reformat the screen to print simple text by clicking on
the "Text" button. This feature allows you to fit more citations per page.
Under the "File" menu, choose "Print."
Saving
To save citations to a disk, click on the "Save" button. When the "Save
as" box appears, be sure to change the name of the file to something
meaningful and the file extension to .txt
Click on "Save."
QUESTIONS FOR DISCUSSION
1. Do a simple search for articles written by Maria B. Grant. Which of the following articles
is in your results for Maria B. Grant?
a. Grant MB, et al. [See Related Articles]
The contribution of adult hematopoietic stem cells to retinal neovascularization.
Adv Exp Med Biol. 2003; 522: 37-45. Review. No abstract available.
b. Grant BM, et al. [See Related Articles]
Use of intraoral cassettes for dental xeroradiography.
Oral Surg Oral Med Oral Pathol. 1978 Nov;46(5):717-20.
c. Needham CW. [See Related Articles]
In response to Dr. Maria Lenaz's letter to the editor, "Ethics in managed care".
Conn Med. 1998 Feb;62(2):108-9. No abstract available.
2. Which of these searches will retrieve MORE articles?
①. Vaccine AND vaccination
②. Vaccine OR vaccination
③. Vaccine NOT vaccination
Experiment 6
Design primers for PCR and draw a plasmid map
THE PURPOSE
1. You will learn how to use bioinformatics software to design primers for PCR, and how
to make a useful plasmid map.
LIST OF MATERIALS AND TOOLS
Primer3 software on line: http://frodo.wi.mit.edu
Plasmid processor: http://www.hytti.uku.fi/%7Eoikari/plasmid.html
You can also get the software through the links at http://www.bio-soft.net.
PROCEDURE
Step 1. Download the sequence of AY871310 and any sequence longer than 1000 bp from
each of GenBank, EMBL and DDBJ.
Step 2. Go to the Primer3 website. Paste your sequences. The primer picking conditions are
as follows:
1. PCR product size range: 600-750 bp.
2. Optimum primer size: 22 bp.
3. Optimum primer Tm (melting temperature): 58.0 °C.
Step 3. Get your primer sequences and understand the reports.
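When reading the report it helps to check a primer's length, GC content and
approximate Tm yourself. The sketch below uses two textbook rules of thumb (the
Wallace rule for short oligos and a GC-based formula for longer ones). These are
assumptions of this example: Primer3 uses a nearest-neighbor thermodynamic model, so
its Tm values will differ. The 22-base primer is simply cut from the X1 sequence of
Experiment 1 for illustration and is not a Primer3 result.

```python
def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * sum(primer.count(base) for base in "GC") / len(primer)


def basic_tm(primer):
    """Rough melting temperature in degrees Celsius (rule-of-thumb estimate).

    Fewer than 14 bases: Wallace rule, 2*(A+T) + 4*(G+C).
    Otherwise:           64.9 + 41*(G+C - 16.4) / N.
    """
    primer = primer.upper()
    n = len(primer)
    gc = primer.count("G") + primer.count("C")
    at = primer.count("A") + primer.count("T")
    if n < 14:
        return 2.0 * at + 4.0 * gc
    return 64.9 + 41.0 * (gc - 16.4) / n


primer = "ATGGCCCCCACCGTGATGGCCT"   # hypothetical 22-mer taken from X1
print(len(primer), "bases")
print(round(gc_percent(primer), 1), "% GC")
print(round(basic_tm(primer), 1), "C (rough estimate)")
```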
Step 4. Download Plasmid processor from http://www.hytti.uku.fi/%7Eoikari/plasmid.html and
install it on your computer.
Step 5. Draw a plasmid. The information about the plasmid is as follows:
1. The length is 3000 bp.
2. Positions 24, 38 and 2300 carry EcoRⅠ, SalⅠ and BamHⅠ sites
respectively.
3. Positions 50 to 480 carry the ampicillin (Ap) resistance gene.
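Before drawing, it can help to work out where each feature falls on the circle. The
sketch below converts the base positions listed above into angles. The convention
used (position 0 at the top, angles increasing clockwise) and the function name
angle are assumptions of this example and may differ from what Plasmid processor
uses internally.

```python
PLASMID_LENGTH = 3000                                   # bp, from the list above
SITES = {"EcoRI": 24, "SalI": 38, "BamHI": 2300}        # restriction sites
AP_GENE = (50, 480)                                     # ampicillin resistance gene


def angle(position, length=PLASMID_LENGTH):
    """Angle in degrees, clockwise from the top of the circle, for a position."""
    return position * 360 / length


for name, position in SITES.items():
    print(f"{name:6s} position {position:5d} -> {angle(position):6.2f} degrees")

gene_start, gene_end = AP_GENE
print(f"Ap gene arc: {angle(gene_start):.2f} to {angle(gene_end):.2f} degrees "
      f"({gene_end - gene_start} bp)")
```

The EcoRⅠ and SalⅠ sites lie less than two degrees apart, so their labels will need
to be offset on the drawing to stay readable.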
QUESTIONS FOR DISCUSSION
What are optimum primers for PCR?
Appendix
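Useful website addresses
(Some important web addresses for bioinformatics)
1. The three major international nucleic acid databases and some nucleic acid databases in China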
GenBank: http://www.ncbi.nlm.nih.gov/genbank
EMBL: http://www.ebi.ac.uk/embl
DDBJ: http://www.ddbj.nig.ac.jp/index-e.html
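Centre of Bioinformatics, Peking University:
http://www.cbi.pku.edu.cn
Beijing Genomics Institute:
http://www.genomics.cn/index.php
Bioinformatics Laboratory, Department of Biology, Tsinghua University:
http://www.bioinfo.tsinghua.edu.cn
Bioinformation Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences: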
http://www.biosino.org.cn
2. Genome databases
Escherichia coli: ECDC database
http://www.uni-giessen.de/~gx1052/ECDC/ecdc.htm
Yeast: CYGD database
http://mips.gsf.de/genre/proj/yeast/index.jsp
Nematode (Caenorhabditis elegans): AceDB database
http://www.acedb.org
http://elegans.swmed.edu/genome.shtml
http://www.wormbase.org
Fruit fly (Drosophila): FlyBase database
http://flybase.bio.indiana.edu
Mouse: MGD database
http://www.informatics.jax.org
http://www.ncbi.nlm.nih.gov/genome/guide/mouse
Rat
http://www.ncbi.nlm.nih.gov/genome/guide/rat
Cow
http://locus.jouy.inra.fr/cgi-bin/bovmap/intro2.pl
Sheep
http://www.sheepgenetics.org.au
http://www.sheepgenomics.com
Chicken
http://www.ri.bbsrc.ac.uk/chickmap/chickbase/manager.html
Zebrafish
http://zfish.uoregon.edu
Human: GDB database
http://gdbwww.gdb.org
http://www.ncbi.nlm.nih.gov/genome/guide/human
Arabidopsis: TAIR (AtDB) database
http://www.arabidopsis.org
http://www.kazusa.or.jp/kaos
http://www.tigr.org/tdb/e2k1/ath1
Cotton
http://cottondb.org
Beans
http://beangenes.cws.ndsu.nodak.edu
http://www.nenno.it/Beanref
Maize
http://www.agron.missouri.edu
Rice: RGP database
http://rgp.dna.affrc.go.jp
http://www.genomics.org.cn
http://compbio.dfci.harvard.edu/tgi
Soybean
http://soybase.agron.iastate.edu
Whole-genome sequencing has been completed for many species, and the list is updated
continually. See the NCBI genome projects database at:
http://www.ncbi.nlm.nih.gov/Genomes
3. Protein databases
ExPASy:
http://www.expasy.org
MIPS——Munich Information Centre for Protein Sequences:
http://www.helmholtz-muenchen.de/en/ibis
PDB:
http://www.rcsb.org/pdb (US)
http://www.ebi.ac.uk/pdb (Europe)
NRL-3D:
http://pir.georgetown.edu/pirwww/dbinfo
HSSP:
http://www.cmbi.kun.nl/gv/hssp
SCOP
http://scop.mrc-lmb.cam.ac.uk/scop
CATH:
http://www.cathdb.info
TransFac:
http://www.gene-regulation.com/pub/databases.html
Protein loop database:
http://www.bmm.icnet.uk/loop
Protein structure prediction database:
http://www.embl-heidelberg.de/predictprotein/predictprotein.html
Prosite (database of functional sites in protein sequences):
http://cn.expasy.org/prosite
DSSP (Definition of Secondary Structure of Proteins):
http://www.cmbi.kun.nl/gv/dssp
FSSP (Families of Structurally Similar Proteins):
http://ekhidna.biocenter.helsinki.fi/dali
4. Prediction of DNA and protein structure and function
Genomic DNA and cDNA: finding open reading frames (ORFs) with ORF Finder:
http://www.ncbi.nlm.nih.gov/projects/gorf
Predicting protein domains from the encoded amino acid sequence with SMART (Simple Modular
Architecture Research Tool):
http://smart.embl-heidelberg.de
Predicting protein motifs with ScanProsite:
http://www.expasy.org/tools/scanprosite
Predicting protein three-dimensional structure with NRL-3D:
http://swissmodel.expasy.org
http://www.ncbi.nlm.nih.gov/structure
Searching GenBank for similar and homologous nucleic acid or protein sequences with Blastn or
Blastp:
http://www.ncbi.nlm.nih.gov/blast
Gene prediction in genomic DNA and cDNA with Genscan and similar programs:
GENSCAN:
http://genes.mit.edu/GENSCAN.html
GeneFinder:
http://genomic.sanger.ac.uk/gf/gf.shtml
http://linux1.softberry.com/berry.phtml
Gene Feature Searches:
http://searchlauncher.bcm.tmc.edu/seq-search/gene-search.html
Grail:
http://compbio.ornl.gov/Grail-1.3
GrailEXP:
http://grail.lsd.ornl.gov/grailexp
GeneMark:
http://opal.biology.gatech.edu/GeneMark/eukhmm.cgi
GENEID:
http://www1.imim.es/software/geneid
Genlang:
http://diana.cslab.ece.ntua.gr
Glimmer:
http://www.cbcb.umd.edu/software/glimmer
MZEF:
http://www.cshl.org/genefinder
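Gene prediction for the model plant rice with RiceHMM and similar programs:
http://rgp.dna.affrc.go.jp/RiceHMM/index.html
Predicting the isoelectric point (pI) and molecular weight (Mw) of an encoded protein with Compute pI/Mw:
http://www.expasy.org/tools/pi_tool.html
Predicting promoters in genomic DNA with Promoter software: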
http://www.fruitfly.org/
http://tools.genome.duke.edu/generegulation/McPromoter
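5. Comparison of DNA or protein sequence similarity
Blast, available in several variants; choose according to the comparison needed:
http://www.ncbi.nlm.nih.gov/blast
ClustalW, for comparing multiple DNA or protein sequences:
http://www.ebi.ac.uk/clustalw (Europe)
http://align.genome.jp (Japan)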
FASTA:
http://www.ebi.ac.uk/fasta
BLITZ:
http://www.ebi.ac.uk/searches/blitz_input.html
6. Molecular phylogenetic trees and phylogenetic analysis
Mega
http://www.megasoftware.net/mega.html
PAUP:
http://paup.csit.fsu.edu/index.html
ClustalW:
http://www.ebi.ac.uk/clustalw
GCG package:
http://www.gcg.com
PHYLIP:
http://evolution.genetics.washington.edu/phylip.html
MEGA
http://imeg.psu.edu/
http://www.megasoftware.net
Hennig86:
http://www.cladistics.org/education/hennig86.html
GAMBIT:
http://genomics.ucla.edu/gambit
Phylogenetic analysis:
http://www.ucmp.berkeley.edu/subway/phylo/phylosoft.html
http://evolution.genetics.washington.edu/phylip.html
7. Chinese and international literature and search databases
PubMed:
http://www.ncbi.nlm.nih.gov/PubMed
USDA:
http://www.nal.usda.gov
SCI:
http://isiwebofknowledge.com
Providing links to the world's electronic journals:
http://www.e-journals.org
CNKI full-text database of Chinese journals:
http://www.cnki.net
Chongqing VIP:
http://www.tydata.com
www.cqvip.com
Wanfang Data:
http://www.wanfangdata.com.cn
National Science and Technology Library:
http://www.nstl.gov.cn
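Hangzhou Science and Technology Information Network (literature databases are free to use
from Hangzhou IP addresses):
http://www.hznet.com.cn
SRS:
http://www.ebi.ac.uk/srs/srsc
Various databases, analysis tools and bioinformatics organizations:
http://www.unl.edu/stc-95/Restools/biotools
Various databases and analysis tools:
http://www.ebi.ac.uk/Tools
Bio-soft.net, which provides links to many very useful bioinformatics programs:
http://www.bio-soft.net
8. Other useful websites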
The Eukaryotic Promoter Database:
http://www.epd.isb-sib.ch
http://www.genome.ad.jp/dbget/dbget2.html
GRAMENE Home
http://www.gramene.org/resources/
Plant R gene database:
http://www.ncgr.org
Sanger Centre:
www.sanger.ac.uk
Animal Genome Size Database:
http://www.genomesize.com
ENZYME (Enzyme Data Bank):
http://www.expasy.org/enzyme
Electronic PCR:
http://www.ncbi.nlm.nih.gov/unists
http://www.ncbi.nlm.nih.gov/sutils/e-pcr
Free online services for removing vector contamination from sequencing results:
http://www.embl-ebi.ac.uk/blastall/vectors.html
http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html
Submitting sequences to GenBank:
http://www.ncbi.nlm.nih.gov/BankIt
http://www.ncbi.nlm.nih.gov/WebSub/?tool=genbank
PCR primer design with Primer3 (free, online):
http://frodo.wi.mit.edu/primer3
http://www.ncbi.nlm.nih.gov/tools/primer-blast
Oligo software:
http://www.mbinsights.com
http://www.oligo.net
Restriction site analysis of DNA sequences with NEBcutter or Webcutter (free, online):
http://tools.neb.com/NEBcutter2/index.php
http://www.firstmarket.com/cutter/cut2.html
Plasmid drawing software SimVector 2.01:
http://www.premierbiosoft.com/plasmidmap/plasmidmap.html
Plasmid map drawing software Plasmid Processor (free download):
http://hznugene.spaces.live.com
http://iubio.bio.indiana.edu/soft/molbio/ibmpc/plasmid-processor-readme.html
Plasmid map drawing software DMUP (free download):
http://hznugene.spaces.live.com
http://www.bioinformatics.org/annhyb/dmup.php
Online plasmid mapping software PlasMapper 2.0:
http://wishart.biology.ualberta.ca/PlasMapper
Plasmid map drawing software pDRAW32:
http://www.acaclone.com
Comparative sequence analysis:
http://www.bork.embl-heidelberg.de
Sequencing chromatogram analysis with Chromas:
http://www.technelysium.com.au
Sequence assembly with Sequencher:
ftp://genecodes.com/pub/SequencherPC.zip
microRNA database:
http://microrna.sanger.ac.uk