Molecular Evolution

Computer Lab Session
Lei Ting
College of Life Science and Technology, Huazhong Agricultural University
Tel: 87282669
Email: leiting@mail.hzau.edu.cn
November 2012
Steps in the construction of gene phylogenies
Section 1: Obtaining Homologous Sequences
A cartoon of the evolution of the met10 gene on the phylogenetic tree
Li X, Wong WH. PNAS 2005;102:9481-9486
Select a Sequence or Sequences of Interest
Coding regions, introns, promoters, chromosomes, whole genomes… It is better to use sequences that mutate slowly
The sequences should be relatively easy and inexpensive to clone and sequence for the study in question
ss-rRNA for studies of microbial evolution
The sequences should be chosen based on your experimental design
ss-rRNA: microbes with very different growth temperatures, evolution rates that vary greatly between taxa, relatively recent evolution
No type of sequence is perfect for all purposes
Obtaining Sequences of Homologs
Approach 1: Sequencing
[Figure labels: DNA, RNA, or protein; PCR; conserved regions; moderately variable region; protein-coding sequences; degenerate PCR primers]
Obtaining Sequences of Homologs
Approach 2: Database Searching
Nucleotide database
http://www.ncbi.nlm.nih.gov/nuccore/NM_008656.5
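Protein databases
UniProt
http://www.uniprot.org/
Collects two classes of sequences: experimentally derived sequences with detailed annotation (Swiss-Prot) and automatically annotated predicted sequences (TrEMBL)
RCSB PDB
http://www.rcsb.org/
Collects experimentally determined structural information for proteins and other biological macromolecules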
DNA or Protein?
For closely related sequences, translating to protein loses synonymous-mutation information and results in lower-resolution trees.
Beyond a certain taxonomic range, however, nucleotide sequences lose information anyway because of repeated mutations at the same site, indels, codon bias, GC skew, etc. Beyond that range, meaningful nucleotide-based evolutionary analyses are probably not possible, so protein alignments are probably best.
If you just want to know which proteins are more closely related and probably have similar biochemical properties, trees built from amino acid alignments should be better. But if you want to trace the history of the sequence or study selection, molecular clocks, etc., nucleotide sequence alignments are better.
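The loss of synonymous-mutation information can be illustrated with a small sketch. The two toy sequences below are made up for illustration (they are not from the slides); the codon table is the standard genetic code. The DNA sequences differ at two sites, yet their translations are identical, so a protein alignment would show no difference between them.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(dna):
    """Translate a DNA coding sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def count_differences(a, b):
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy sequences: CTG->CTA and AAA->AAG are synonymous changes.
seq_a = "ATGCTGAAAGAATAA"
seq_b = "ATGCTAAAGGAATAA"

dna_diffs = count_differences(seq_a, seq_b)
protein_diffs = count_differences(translate(seq_a), translate(seq_b))
print(dna_diffs, protein_diffs)
```

The nucleotide comparison retains the two substitutions, which is the extra resolution that matters for closely related sequences.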
Retrieving Homologous Sequences from Databases: BLAST
http://blast.ncbi.nlm.nih.gov/
Search a single sequence (the query) against a database
Align two sequences against each other
Interpreting BLAST Results
Query sequence
Interpreting BLAST Results
Red: very good
Green: acceptable
Black: bad
1e-03 = borderline E-value
1e-04 = good E-value
1e-10 = very good E-value
E-values lower than 1e-4 indicate possible homology
E-values higher than 1e-4 require extra evidence to support homology
Hit list
Score (bit score): high bit score = good match
E-value: low E-value = good match
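The E-value guide above can be turned into a small helper for screening a hit list. This is only a sketch: the slide gives three reference points (1e-03, 1e-04, 1e-10), and treating them as range boundaries is an assumption, as are the function names and the "weak" label.

```python
def classify_evalue(evalue):
    """Label an E-value using the slide's reference points as cutoffs.

    Assumption: the three reference values act as upper bounds of ranges.
    """
    if evalue < 0:
        raise ValueError("E-values cannot be negative")
    if evalue <= 1e-10:
        return "very good"
    if evalue <= 1e-4:
        return "good"
    if evalue <= 1e-3:
        return "borderline"
    return "weak"  # hypothetical label for anything above 1e-03


def needs_extra_evidence(evalue):
    """E-values higher than 1e-4 require extra evidence to support homology."""
    return evalue > 1e-4


for e in (1e-50, 1e-6, 5e-4, 0.5):
    print(e, classify_evalue(e), needs_extra_evidence(e))
```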
Interpreting BLAST Results
• Two protein sequences with more than 25% identity (over 100 amino acids) are homologues
• Two DNA sequences with more than 70% identity (over 100 nucleotides) are homologues
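The identity rule of thumb above can be checked on a pair of already aligned sequences. The sketch below makes one assumption that differs from how BLAST reports identity: columns containing a gap ("-") are skipped, whereas BLAST counts them in the alignment length. The function names are illustrative.

```python
def percent_identity(aln_a, aln_b):
    """Return (percent identity, compared columns) for two aligned sequences.

    Assumption: columns with a gap in either sequence are not counted.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0, 0
    return 100.0 * matches / compared, compared


def passes_rule_of_thumb(aln_a, aln_b, is_protein):
    """Apply the >25% (protein) or >70% (DNA) identity rule over >=100 positions."""
    identity, compared = percent_identity(aln_a, aln_b)
    threshold = 25.0 if is_protein else 70.0
    return compared >= 100 and identity > threshold


# Hypothetical toy alignment: 120 nucleotides, every fourth base differs.
dna_a = "ACGT" * 30
dna_b = "ACGA" * 30
print(percent_identity(dna_a, dna_b))
print(passes_rule_of_thumb(dna_a, dna_b, is_protein=False))
```

Short alignments fail the check regardless of identity, which reflects the "over 100" length condition in the rule.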
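Saving BLAST Results
Under Alignments, open the selected homologous sequence and determine the matching region and the sequence orientation
In the Display Settings drop-down menu, choose the FASTA (text) format
Save as a .txt file (multi-FASTA format)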
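A saved multi-FASTA text file can be read back with a few lines of Python. This is a minimal sketch, not a full FASTA parser: the function names and the example records are made up for illustration. The reverse-complement helper is included because a hit on the minus strand has to be flipped before alignment.

```python
def parse_fasta(text):
    """Parse multi-FASTA text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return dna.translate(_COMPLEMENT)[::-1]


# Hypothetical file content with two records.
example = ">seq1 example record\nACG\nTT\n\n>seq2\nGG\n"
for name, seq in parse_fasta(example):
    print(name, seq, reverse_complement(seq))
```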
Using Amino Acid Sequences to Find Homologous Coding Regions
E. coli K-12 ebgC (DNA)
Using Amino Acid Sequences to Find Homologous Coding Regions
E. coli K-12 ebgC (protein)
Extract the CDS of each homologous protein sequence
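Once the CDS coordinates of a homologous protein are known from its nucleotide record, the coding region can be cut out as sketched below. The coordinates are taken to be 1-based and inclusive, as in GenBank feature tables; the function names and toy sequences are illustrative. The sanity check accepts only ATG as a start codon, which is a simplifying assumption (some bacterial genes start with GTG or TTG).

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def extract_cds(nucleotide, start, end, strand=1):
    """Cut out positions start..end (1-based, inclusive) from a nucleotide record.

    For strand == -1 the region is reverse-complemented so that the
    returned sequence reads 5'->3' in the coding direction.
    """
    if not (1 <= start <= end <= len(nucleotide)):
        raise ValueError("coordinates out of range")
    region = nucleotide[start - 1:end].upper()
    if strand == -1:
        region = region.translate(_COMPLEMENT)[::-1]
    return region


def looks_like_complete_cds(cds):
    """Rough check: length divisible by 3, ATG start, stop codon at the end.

    Assumption: only ATG is accepted as a start codon.
    """
    return (
        len(cds) >= 6
        and len(cds) % 3 == 0
        and cds.startswith("ATG")
        and cds[-3:] in STOP_CODONS
    )


# Hypothetical toy records: the same gene on the plus and minus strand.
plus_record = "CCATGAAATAGGG"
minus_record = "CCCTATTTCATGG"
cds_plus = extract_cds(plus_record, 3, 11)
cds_minus = extract_cds(minus_record, 3, 11, strand=-1)
print(cds_plus, cds_minus, looks_like_complete_cds(cds_plus))
```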
Obtaining Homologous Sequences: A Risky Practice
Never appears in the BLAST results. Why?
Appears in the BLAST results: a homologous sequence
Methods for Obtaining Homologous Sequences: Using Homolog Databases
http://www.ncbi.nlm.nih.gov/
Obtaining Homologous Sequences with HomoloGene
Starting with...
• A GENE NAME
• A PROTEIN ACCESSION NUMBER
• A NUCLEOTIDE ACCESSION
• A NUCLEOTIDE SEQUENCE
• A PROTEIN SEQUENCE
For search tips, see Query Tips on the HomoloGene home page
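Obtaining Homologous Sequences with HomoloGene
Enter a gene or protein name, accession number…
In the drop-down menu, you can choose to download the protein, mRNA, or genomic DNA sequences of the homologous genes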
Other Homolog Databases
Protein Clusters
http://www.ncbi.nlm.nih.gov/proteinclusters
Conserved Domain Database
http://www.ncbi.nlm.nih.gov/cdd
Prosite
http://prosite.expasy.org/
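In-Class Exercises
1. Use BLAST to search the protein database for sequences similar to human GAPDH (NP_002037). Based on the E-values and percent identity in the alignment results, identify its homologous protein sequences in chimpanzee, mouse, pig, zebrafish, and nematode, and obtain the coding regions of these sequences.
2. Practice retrieving homologous sequences from HomoloGene.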
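As an optional aid for the exercises, the query protein can also be fetched in FASTA format through the NCBI E-utilities `efetch` endpoint instead of the web page. The sketch below only builds the request URL and does not contact NCBI; the helper name is hypothetical, and using E-utilities here is an addition, not part of the slides.

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_protein_fasta_url(accession):
    """Build an efetch URL that returns a protein record as FASTA text."""
    params = {
        "db": "protein",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


# Human GAPDH accession from exercise 1.
url = build_protein_fasta_url("NP_002037")
print(url)
```

Opening the printed URL in a browser, or passing it to `urllib.request.urlopen`, should return the FASTA record that serves as the BLAST query.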