Bioinfo_primer_03

Christophe Roos - MediCel ltd
christophe.roos@medicel.fi
Mutations change sequences
Function preserves sequences
Similarity is the result of
conservation or convergent
evolution: it has a reason for being
The public biological databases
• EMBL or GenBank or DDBJ for DNA
– emblnew holds the daily updates; it is merged into the main DB 4x/year
• SwissProt or PIR for proteins
– Trembl, tremblnew, remtrembl
• PDB for structures
• Distributed as flat files, yet quite informative and convertible
– Fasta format is a ‘universal’ sequence format: the first line starts with ‘>’
followed by free text. The sequence starts on the second line (usually 50 or 60
characters per line). Use the first line for the name or the Accession
Number (AC)
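The Fasta layout described above can be sketched in a few lines of Python. This is a minimal illustration, not a full parser: the function names `parse_fasta` and `format_fasta` and the accession `X00000` are hypothetical, and the 60-character line width is just one of the two widths mentioned above.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each header (the text after '>') to its concatenated sequence."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()  # free text: name or accession number
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequence may span several lines
    return records


def format_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Write one record: a '>' header line, then the sequence wrapped at `width`."""
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + name] + chunks) + "\n"


# Hypothetical record; the accession number is made up for illustration.
example = """>X00000 example entry
GATCGGAATAG
GACGGATTAG
"""
records = parse_fasta(example.splitlines())
```

Because the header is free text, the sketch keeps it whole as the dictionary key; a real tool would usually split off the accession number first.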
Christophe Roos - 3/6 Sequence databases & comparison
Spring 2002
Database homes
• The European database home is in Hinxton, Cambridge, UK:
European Bioinformatics Institute - EBI
– http://www.ebi.ac.uk
– Access through the Sequence Retrieval System, SRS
• The American database home is in Bethesda, Maryland, near Washington DC:
National Center for Biotechnology Information - NCBI
– http://www.ncbi.nlm.nih.gov
– Access through Entrez
• Both centers exchange their data on a daily basis; however,
there are differences in annotations, consistency, speed and
quality.
• There is also a Japanese database provider, DDBJ.
A look at one entry from EMBL
part 1/3
A look at one entry from EMBL
part 2/3
A look at one entry from EMBL
The feature table of the entry contains several linked
items, such as exon-assembly (mRNA) and coding
sequence (CDS).
There are also cross-references to other databases
part 3/3
A look at one entry from SwissProt
The eyeless gene: a master regulatory gene in eye formation
The effect of the eyeless gene
The eyeless gene is a master regulatory gene in eye formation
[Figure panels: Normal; Absent; Overexpressed in antennae and wings]
• When it is absent, no eyes are formed
• When it is expressed where it should not be, it induces eye formation there
A look at one entry from SwissProt
Part 2: the annotations about the function and location
A look at one entry from SwissProt
Part 3: The feature table and the amino acid sequence
A look at one entry from SwissProt
The eyeless gene (called PAX6 in vertebrates) has homologs in many
species: birds, mammals, reptiles, fish, invertebrates
Sequence comparison – why?
• Function by analogy: If sequences are conserved, their function
is probably also conserved.
• Functional domains: If some parts of the sequences are more
conserved than other parts, there must be an underlying
biological reason for it.
• Establishing relationships/differences in function: By
quantifying sequence relationships, it is possible to
predict the function of novel genes
• Establishing relationships between species
Sequence comparison – how?
• Compare two sequences of similar length
• Compare two sequences of very different length
• Compare several sequences
• Allow gaps or not?
• Scoring: yes-no or good-intermediate-bad
• The best or all above a threshold?
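One common way to answer several of these questions at once, for two sequences with gaps allowed and only the best score kept, is dynamic programming over all global alignments. The sketch below is an illustration only: the slides do not name a method, the function name `global_alignment_score` is hypothetical, and the scores (match +1, mismatch -1, gap -2 per position) are assumed values.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of a and b with a per-position gap penalty."""
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against nothing: i gap positions
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


# Simple yes-no scoring (match or mismatch) with gaps allowed.
score = global_alignment_score("GACGGATTAG", "GATCGGAATAG")
```

Only the best score is returned here; reporting every alignment above a threshold, or comparing several sequences at once, would need a different procedure.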
Sequence comparison – metrics
GA-CGGATTAG
GATCGGAATAG
[Figure labels: gap at position 3, mismatch at position 8, match at all other positions]
• The scoring matrix
• The score for a match
• The penalty for a mismatch
• The penalty for the insertion of a gap (gap-open)
• The penalty for elongating a gap (gap-length)
• Local or global similarities?
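As a rough illustration of how these metrics combine, the sketch below scores the aligned pair shown on this slide. The numeric values (match +1, mismatch -1, gap-open -2, gap-length -1) and the function name `score_alignment` are assumptions chosen for the example, not values from the course. The convention assumed here is that the first position of a gap costs gap-open and each further position costs gap-length.

```python
def score_alignment(top: str, bottom: str, match: int = 1, mismatch: int = -1,
                    gap_open: int = -2, gap_extend: int = -1) -> int:
    """Score two already-aligned sequences of equal length ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    in_gap = False
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            # First gap position pays gap-open, later ones pay gap-length.
            # Simplification: adjacent gaps in either sequence count as one run.
            score += gap_extend if in_gap else gap_open
            in_gap = True
        else:
            score += match if a == b else mismatch
            in_gap = False
    return score


# The alignment from the slide: 9 matches, 1 mismatch, 1 single-position gap.
total = score_alignment("GA-CGGATTAG", "GATCGGAATAG")  # 9*1 - 1 - 2 = 6
```

A full scoring matrix would replace the single match and mismatch values with one score per pair of residues, which is how the good-intermediate-bad scoring of amino acid substitutions is usually expressed.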