Molecular Evolution: Computer Lab Practicals
Lei Ting
College of Life Science and Technology, Huazhong Agricultural University
Tel: 87282669  Email: leiting@mail.hzau.edu.cn
November 2012

Steps in the construction of gene phylogenies

Section 1: Obtaining homologous sequences

[Figure: A cartoon of the evolution of the met10 gene on the phylogenetic tree. Li X, Wong WH. PNAS 2005;102:9481-9486.]

Select a Sequence or Sequences of Interest
• Candidates include coding regions, introns, promoters, chromosomes and whole genomes. It is better to use sequences that mutate slowly.
• The sequences should be easy and/or inexpensive to clone and sequence for the study in question; for example, ss-rRNA is widely used for studies of microbial evolution.
• The sequences should be chosen based on your experimental design. ss-rRNA has its own problems: it must be compared across microbes with very different growth temperatures, its evolutionary rate varies greatly between taxa, and it carries little signal for relatively recent evolution.
• No type of sequence is perfect for all purposes.

Obtaining Sequences of Homologs, Approach 1: Sequencing DNA, RNA, or protein
• PCR: primers placed in conserved regions flanking a moderately variable region
• Protein-coding sequences: degenerate PCR primers

Obtaining Sequences of Homologs, Approach 2: Database Searching
• Nucleotide database (NCBI Nucleotide), e.g. http://www.ncbi.nlm.nih.gov/nuccore/NM_008656.5
• Protein databases:
  UniProt (http://www.uniprot.org/) collects two classes of sequences: manually reviewed, richly annotated sequences with experimental support (Swiss-Prot) and automatically annotated, predicted sequences (TrEMBL).
  RCSB PDB (http://www.rcsb.org/) collects experimentally determined structures of proteins and other biological macromolecules.

DNA or Protein?
For closely related sequences, translating to protein loses the information carried by synonymous mutations and results in lower-resolution trees. Beyond a certain taxonomic range, however, the nucleotide sequences lose information anyway, because of repeated mutations at the same site, indels, codon bias, GC skew, etc. Beyond that range you probably cannot perform any meaningful nucleotide-based evolutionary analysis, so protein alignments are probably best. If you just want to know which proteins are more closely related and probably have similar biochemical properties, trees from amino acid alignments should be better.
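But if you want to trace the history of the sequence, study selection, molecular clocks, etc., nucleotide sequence alignments are better.

Retrieving homologous sequences from databases: BLAST
http://blast.ncbi.nlm.nih.gov/
• Align one sequence (the query) against a database
• Align two sequences against each other

Interpreting BLAST results
• Query sequence and hit alignments are colour-coded: red = very good, green = acceptable, black = bad.
• Hit list: Score (bit score): a high bit score means a good match. E-value: a low E-value means a good match.
• E-value guide: 1e-03 = borderline; 1e-04 = good; 1e-10 = very good. E-values lower than 1e-4 indicate possible homology; E-values higher than 1e-4 require extra evidence to support homology.
• Rules of thumb: two protein sequences with more than 25% identity over at least 100 amino acids are generally considered homologous; two DNA sequences with more than 70% identity over at least 100 nucleotides are generally considered homologous.

Saving BLAST results
1. Under Alignments, open the selected homologous sequences and determine the region and strand orientation.
2. From the Display Settings drop-down menu, choose the format FASTA (text).
3. Save the result as a .txt file (multi-FASTA format).

Using amino acid sequences to find homologues of coding regions
Example: E. coli K-12 ebgC, searched once as DNA and once as protein. From the protein search, extract the CDS of each homologous protein sequence.

Obtaining homologous sequences: a risky approach
[Figure annotations: "Never appears in the BLAST results: why?" and "Appears in the BLAST results, so it is a homologous sequence."]

Obtaining homologous sequences: using homology databases
http://www.ncbi.nlm.nih.gov/

Using HomoloGene to obtain homologous sequences
Starting with...
• A gene name
• A protein accession number
• A nucleotide accession
• A nucleotide sequence
• A protein sequence
For search techniques, see Query Tips on the HomoloGene home page.
• Enter a gene or protein name, accession number, …
• From the drop-down menu you can choose to download the protein, mRNA or genomic DNA sequences of the homologous genes.

Other homology databases
• Protein Clusters: http://www.ncbi.nlm.nih.gov/proteinclusters
• Conserved Domain Database: http://www.ncbi.nlm.nih.gov/cdd
• Prosite: http://prosite.expasy.org/

Classroom exercises
1. Use BLAST to search the protein database for sequences similar to human GAPDH (NP_002037). Based on the E-values and identities in the alignment results, identify its homologous protein sequences in chimpanzee, mouse, pig, zebrafish and nematode, and retrieve the coding regions of these sequences.
2. Become familiar with retrieving homologous sequences from HomoloGene.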
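Approach 1 mentions degenerate PCR primers for protein-coding sequences. A degenerate primer is really a pool of oligonucleotides, one per combination of the IUPAC ambiguity codes it contains. The minimal Python sketch below expands such a primer and counts its degeneracy; the helper names and the example primer `GCNTAY` (which would encode Ala-Tyr) are hypothetical, chosen only for illustration.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate primer (hypothetical helper)."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides in the primer pool (hypothetical helper)."""
    n = 1
    for base in primer.upper():
        n *= len(IUPAC[base])
    return n


# Hypothetical degenerate primer: GCN (Ala) + TAY (Tyr)
primer = "GCNTAY"
print(degeneracy(primer))            # 8
print(expand_degenerate(primer)[:3])  # ['GCATAC', 'GCATAT', 'GCCTAC']
```

The degeneracy grows multiplicatively with each ambiguous position, which is why primers are usually placed over conserved stretches rich in amino acids with few codons.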
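The "DNA or Protein?" slide says that translating closely related sequences loses synonymous mutation information. The Python sketch below makes that concrete using the standard genetic code (NCBI translation table 1): two hypothetical coding fragments differ at three nucleotide positions, all at third codon positions, yet translate to the same peptide. It assumes the input contains only A, C, G and T.

```python
# Standard genetic code (translation table 1); codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate complete codons of an ACGT-only sequence; '*' marks stop codons."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def count_differences(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical fragments differing only at synonymous third positions
seq1 = "ATGGCTAAAGGT"
seq2 = "ATGGCCAAGGGC"
print(count_differences(seq1, seq2))                        # 3
print(count_differences(translate(seq1), translate(seq2)))  # 0
```

A tree built from the protein alignment would see these two sequences as identical, while the nucleotide alignment still separates them.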
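The E-value guide on the "Interpreting BLAST results" slide can be written as a small classifier. The Python sketch below encodes only the slide's rough thresholds (1e-10, 1e-4, 1e-3); they are rules of thumb, not official BLAST cut-offs, and the hit list is hypothetical.

```python
def classify_evalue(evalue: float) -> str:
    """Label a BLAST E-value using the slide's rough thresholds."""
    if evalue <= 1e-10:
        return "very good"
    if evalue <= 1e-4:
        return "good"
    if evalue <= 1e-3:
        return "borderline"
    return "weak"


def needs_extra_evidence(evalue: float) -> bool:
    """Per the slide, E-values above 1e-4 need extra support for homology."""
    return evalue > 1e-4


# Hypothetical hit list of (accession, E-value); values are illustrative only
hits = [("hit_A", 3e-45), ("hit_B", 2e-6), ("hit_C", 4e-4), ("hit_D", 0.02)]

# Lowest E-value first, as in the BLAST hit list
for accession, evalue in sorted(hits, key=lambda h: h[1]):
    print(accession, evalue, classify_evalue(evalue), needs_extra_evidence(evalue))
```

Sorting by E-value mirrors how BLAST orders its hit list; the bit score gives the same ranking within a single search.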
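The identity rules of thumb (more than 25% for proteins, more than 70% for DNA, over at least 100 residues) can be checked on a pair of aligned sequences. The Python sketch below assumes identity is computed over gap-free alignment columns; this is one convention among several, and BLAST's own "Identities" figure counts over the full alignment length including gaps. Function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> tuple[float, int]:
    """Return (% identity, number of compared columns), skipping columns with a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not columns:
        return 0.0, 0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns), len(columns)


def likely_homologous(identity: float, length: int, molecule: str) -> bool:
    """Apply the slide's rule of thumb; molecule is 'protein' or 'dna'."""
    thresholds = {"protein": 25.0, "dna": 70.0}
    return length >= 100 and identity > thresholds[molecule]


print(percent_identity("AC-GT", "ACTGA"))  # (75.0, 4)
```

Note that a short alignment never passes, however high its identity; such hits need the E-value or other evidence instead.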
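BLAST results saved as FASTA (text) form a multi-FASTA file: each record starts with a `>` header line followed by one or more sequence lines. The minimal Python parser below keys each record by the first word of its header (usually the accession); it is a sketch that assumes unique identifiers, since a repeated identifier would overwrite the earlier record.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse multi-FASTA text into {identifier: sequence}."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].split()[0] if line[1:].split() else ""
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


example = ">seq1 hypothetical description\nACGT\nAC\n>seq2\nMKV\n"
print(read_fasta(example))  # {'seq1': 'ACGTAC', 'seq2': 'MKV'}
```

To read a saved file, pass its contents, e.g. `read_fasta(open("blast_hits.txt").read())`, where `blast_hits.txt` is a hypothetical file name.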