Cloning and Sequencing Explorer Series - Bioinformatics - Bio-Rad

Cloning and Sequencing Explorer Series
Bioinformatics
Instructors
Stan Hitomi
Coordinator – Math & Science
Principal – Alamo School
San Ramon Valley Unified School District
Danville, CA
Kirk Brown
Lead Instructor, Edward Teller Education Center
Science Chair, Tracy High School
and Delta College, Tracy, CA
Bio-Rad Curriculum and Training Specialists:
Sherri Andrews, Ph.D.
sherri_andrews@bio-rad.com
Essy Levy, M.Sc.
essy_levy@bio-rad.com
Leigh Brown, M.A.
leigh_brown@bio-rad.com
Bioinformatics
The application of information
technology to molecular biology
Questions
Concerning
your Data
Class Data Set
• Are our sequences high quality?
• Are my sequences similar to GAPDH?
• Are any of my sequences primarily cloning
vector?
Individual Clone Sequences
• Do my individual sequences align to give me a
single long sequence?
• Are there discrepancies between my reads?
• Which GAPDH gene did we clone?
Annotation of Clone Sequence
• What is the intron-exon structure/mRNA
sequence of my clone?
• What is the protein sequence of my clone?
Sequence
Data
Analysis
Tools
Sequence data storage and analysis tools (iFinch
and FinchTV)
Sequence comparison algorithm (NCBI BLAST)
Sequence Assembly (CAP3)
mRNA sequence prediction (BLAST and manual)
Protein sequence prediction (EMBL-EBI
EMBOSS Transeq)
Advanced
Preparation
• Practice with iFinch using the guest account (highly recommended!)
• Activate your iFinch account (2-month subscription)
• Download FinchTV onto lab computers
• Set up project and folder in iFinch
• Upload sequence data
Guest
iFinch
Account
http://classroom1.bio-rad.ifinch.com/Finch
Username: BR_guest
Password: guest
• Example data sets for each stage of
process
• No uploading of data
Your own
iFinch
account
Each account has a unique URL:
http://Platenumber.ifinch.com/Finch
E.g. http://A150936.ifinch.com/Finch
Instructor’s Username: Platenumber e.g. A150936
Instructor’s Password: Platenumber e.g. A150936
Student Username: Platenumber_student e.g.
A150936_student
Student Password: Platenumber e.g. A150936
Once activated, change your passwords!
Active for 2 months.
Download
FinchTV
• www.geospiza.com/finchtv
Make
project &
folder and
upload data
to iFinch:
Demo
Student
Activities
1. Review data quality and view sequence traces
2. Use BLAST for preliminary check on which
GAPDH was cloned
3. Assemble sequences into a contig
4. Verify which GAPDH gene was cloned
5. Predict intron-exon boundaries and generate
mRNA sequence
6. Predict protein sequence
Sequence
Quality
Q20 values
The quality value of a “base call” is:
Q = -10 × log10(P_error)
where P_error is the probability that the base call is incorrect.
Thus if the chance that a base call is incorrect is 1/100, P_error
is 0.01 and the quality value is 20 (Q = 20).
By convention, sequences are rated by the number of base calls that
have quality values of 20 or higher: the Q20 value.
The quality values of a sequence are calculated automatically
by software in iFinch. A common program for this, called “Phred”,
was developed at the University of Washington.
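The relationship between error probability and quality value can be checked with a few lines of Python. This is only a sketch of the formula on this slide: the function names and the example quality values are made up for illustration and are not part of iFinch or Phred.

```python
import math

def phred_quality(p_error):
    """Return Q = -10 * log10(P_error) for an error probability in (0, 1]."""
    if not 0 < p_error <= 1:
        raise ValueError("p_error must be in (0, 1]")
    return -10 * math.log10(p_error)

def error_probability(q):
    """Invert the formula: P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

def q20_count(qualities, threshold=20):
    """Count the base calls whose quality value is at least the threshold."""
    return sum(1 for q in qualities if q >= threshold)

# Hypothetical per-base quality values for a very short read.
example_qualities = [8, 12, 20, 35, 41, 19, 50]

print(phred_quality(0.01))           # a 1-in-100 error chance
print(error_probability(30))         # the error chance behind Q = 30
print(q20_count(example_qualities))  # how many calls reach Q20
```

A read's Q20 value is then simply `q20_count` applied to all of its base-call qualities.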
Sequence
Quality
Q20= 732
Q20= 161
Q20= 238
Screen for
poor quality
sequence,
vector,
GAPDH
family
Class Data
Set
Sort Class
Data into
Folders
Record
Data
Information
Download
sequences
for initial
screen
using
BLAST
• Open Guest iFinch account
– User: BR_guest, Psw: guest
• Click: Folders
• Click: Salvia folder
• Look at data
• Go back to folder report
• Click: Download folder data; save to a new folder
on hard drive
• View FASTA format in MSWord or text editor
• Upload file back to iFinch
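For readers who want to see what the downloaded FASTA file contains without opening a word processor, the format is simple enough to read with a short script. The sketch below assumes plain FASTA text (a ">" header line followed by one or more sequence lines). The record names and sequences are invented examples, not data from the Salvia folder.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its full sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {name: "".join(parts) for name, parts in records.items()}

# Hypothetical two-record FASTA text.
example = """>read_A01 hypothetical example
ATGGCTAAGG
TTAAC
>read_C01 hypothetical example
GGCCTTAA
"""

for name, sequence in parse_fasta(example).items():
    print(name, len(sequence))
```

To read a real file, pass the result of `open(path).read()` to `parse_fasta`.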
BLAST
sequences
for initial
screen
• Click NCBI BLAST on iFinch homepage
• Choose nucleotide search
• Browse for downloaded salvia.fsa file to upload
• Choose “Others (nr etc)”, select “Reference
Genomic sequences”
• Choose “Plants (taxid)”
• Choose “Somewhat similar sequences (blastn)”
• Click BLAST
BLAST
Results
• All 4 sequences were analyzed by BLAST; choose from pull-down menu at top of page
• Mouse over top bar
• Scroll down to list of homologous sequences
– E value represents the number of matches at
least as good as this one that would be
expected by chance in a database of the
same size containing random sequences.
• Scroll down to sequence alignments
– Query: Your sequence
– Subject: Database matching sequence
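To get a feel for why a low E value indicates a meaningful match, the textbook relationship between a BLAST bit score and the E value can be sketched as E = m × n × 2^(-S'), where S' is the bit score, m the query length and n the total database length. The code below is a simplification under that assumption: real BLAST uses effective lengths and other corrections, and the numbers here are invented.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate E value: E = m * n * 2 ** (-bit_score).

    Simplified textbook relation; BLAST itself applies effective-length
    corrections, so reported values will differ somewhat.
    """
    return query_length * database_length * 2.0 ** (-bit_score)

# Hypothetical search: a 900-base query against a 1-billion-base database.
for score in (30, 60, 120):
    print(score, expect_value(score, 900, 1_000_000_000))
```

Each additional bit of score halves the expected number of chance matches, which is why strong hits show E values very close to zero.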
Which
GAPC
Gene?
Break time!
Initial
Screen
Result
• We have cloned a Salvia GAPC gene
• Now we need to put the sequences together to
make a contig (contiguous sequence)
• Run the CAP3 program to assemble the sequence
fragments into a contiguous sequence
[Figure: sequence fragments assembled into a contiguous sequence]
• Then correct any sequence discrepancies
between different reads
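The idea of joining overlapping reads into a contig can be illustrated with a toy example. This is not the CAP3 algorithm: CAP3 also handles reverse-complemented reads, mismatches and quality values, whereas the sketch below only merges two reads whose ends match exactly. The function names and sequences are hypothetical.

```python
def longest_overlap(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0

def merge_reads(left, right, min_overlap=3):
    """Merge two reads on their exact overlap; return None if there is none."""
    size = longest_overlap(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]

# Two made-up reads that share the six bases AAGGTT.
read_1 = "ATGGCTAAGGTT"
read_2 = "AAGGTTCCGA"

print(longest_overlap(read_1, read_2))
print(merge_reads(read_1, read_2))
```

Reads that share no overlap come back as `None`, which mirrors the "single sequences" that an assembler reports when a read cannot be placed in a contig.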
CAP3
Program
(Contig
Assembly
Program)
• On iFinch home page click “sequence assembly”
Assembly
Results
• Your sequence file (your input)
• Single sequences (any seqs that could not be
assembled)
• Contigs (save in FASTA format as “.txt” file)
• Assembly details (save as landscape “.txt” file)
Salvia
Contig
A01
I01
C01
G01
Check for
Discrepancies
• Look through assembly file for sequence
discrepancies
• Open chromatogram files in FinchTV
• Examine actual chromatograms and use
personal judgment to determine which base call
is correct
• Correct FinchTV file and save back to iFinch (not
available in guest account) noting the changes in
the revision history
• If the consensus sequence has changed,
download the folder sequences again as before
and reassemble with the CAP3 program
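Scanning an assembly for discrepancies amounts to looking down each column of the aligned reads for disagreeing base calls. The sketch below assumes the reads have already been aligned to equal length, with "." marking positions a read does not cover. That layout, the function name and the sequences are assumptions for illustration and do not reproduce the CAP3 output format.

```python
def find_discrepancies(aligned_reads):
    """Return (1-based position, {read name: base}) for columns that disagree."""
    lengths = {len(seq) for seq in aligned_reads.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned reads must have the same length")
    discrepancies = []
    for index in range(lengths.pop()):
        # Collect the base calls of the reads that cover this column.
        calls = {
            name: seq[index]
            for name, seq in aligned_reads.items()
            if seq[index] != "."
        }
        if len(set(calls.values())) > 1:
            discrepancies.append((index + 1, calls))
    return discrepancies

# Hypothetical aligned reads; G01 disagrees with the others at one column.
reads = {
    "A01": "ATGGCTAAGG....",
    "C01": "...GCTAAGGTTCC",
    "G01": "ATGGCTTAGG....",
}

for position, calls in find_discrepancies(reads):
    print(position, calls)
```

Each reported position is a place to open the chromatograms in FinchTV and decide which base call the trace actually supports.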
BLAST
search with
contig
Submit contig FASTA file for BLAST search
(same database as before: plant reference
genomic database)
Break time!
Determine
Gene
Structure
Workflow
BLAST
Search
Against
Reference
mRNA
Database
• Blastn search with contig against plant
Reference mRNA sequences database
• Change Algorithm parameters
Reformat
BLAST
results
• Reformat results in plain text format
• Save files to iFinch folder
Save Contig
File in
MSWord
• Delete all paragraph marks using find and
replace command
• Save to hard drive as “.rtf” file with a new name.
• Color contig sequence with exons as determined
from BLAST results
• Put exons together in a first draft of the mRNA
sequence and save to iFinch folder
• Submit draft mRNA sequence to blastn against
plant reference mRNA database
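The manual steps above (removing line breaks, marking exons, joining them into a draft mRNA) can also be sketched in code. The example assumes the exon coordinates have been read off the BLAST alignment as 1-based, inclusive start and end positions on the contig. The contig, the coordinates and the function names are invented for illustration; lowercase marks the made-up introns.

```python
def clean_sequence(text):
    """Drop FASTA header lines and all whitespace, leaving one sequence string."""
    lines = [line for line in text.splitlines() if not line.startswith(">")]
    return "".join("".join(lines).split())

def splice_exons(contig, exons):
    """Join exon segments given as (start, end), 1-based and inclusive."""
    pieces = []
    previous_end = 0
    for start, end in sorted(exons):
        if start <= previous_end or end < start or end > len(contig):
            raise ValueError(f"invalid or overlapping exon: {(start, end)}")
        pieces.append(contig[start - 1:end])
        previous_end = end
    return "".join(pieces)

# Hypothetical contig file contents.
contig_text = """>hypothetical_contig
ATGGCTgtaagtttcagGTTAAC
GGAgtatgcagTGA
"""

contig = clean_sequence(contig_text)
exon_coordinates = [(1, 6), (18, 26), (35, 37)]  # assumed, from BLAST output
draft_mrna = splice_exons(contig, exon_coordinates)
print(draft_mrna)
```

Adjusting a boundary is then a matter of editing one coordinate pair and rerunning, which makes repeated rounds of BLAST checking less error-prone than recoloring text by hand.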
BLAST
search with
derived
mRNA
sequence
• Correct intron-exon boundaries (use Arabidopsis
mRNA as model)
• Resubmit to BLAST
• Repeat if necessary until no indels are evident
and you are satisfied with a final mRNA
sequence
• Save to iFinch folder
Use blastx
to Search
Protein
Database
• blastx translates the nucleic acid query sequence in all six
reading frames and searches a protein database.
Translate
mRNA into
Protein
Sequence
Check
Protein
Sequence
with blastp
Search
• Ensure translation is in correct frame
• Save to iFinch folder
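Checking the reading frame can be illustrated by translating an mRNA in each of the three forward frames with the standard genetic code and comparing the results. This is a sketch only, not the EMBOSS Transeq implementation: it ignores the three reverse-strand frames, and the example sequence is invented.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(sequence, frame=0):
    """Translate from offset `frame` (0, 1 or 2); '*' is stop, 'X' unknown."""
    sequence = sequence.upper().replace("U", "T")
    protein = []
    for i in range(frame, len(sequence) - 2, 3):
        protein.append(CODON_TABLE.get(sequence[i:i + 3], "X"))
    return "".join(protein)

def forward_frames(sequence):
    """Return translations for frames 1 to 3 of the given strand."""
    return {frame + 1: translate(sequence, frame) for frame in range(3)}

# Hypothetical short mRNA.
mrna = "ATGGCTGTTAACGGATGA"
for frame, protein in forward_frames(mrna).items():
    print(frame, protein)
```

The correct frame is usually the one that starts with M and has no stop codon (`*`) before the end; a frame interrupted by stops suggests a frame shift or a misplaced intron-exon boundary.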
Congratulations!
• You have cloned, sequenced and annotated a
novel gene.
• You could now submit this to GenBank.
• Data from additional samples would strengthen
the data; for example, assemble sequences from
the same gene from different student teams
• Download data from iFinch if you wish to keep it
for the long term
Webinars
• Enzyme Kinetics — A Biofuels Case Study
• Real-Time PCR — What You Need To Know
and Why You Should Teach It!
• Proteins — Where DNA Takes on Form and
Function
• From plants to sequence: a six week
college biology lab course
• From singleplex to multiplex: making the
most out of your real-time experiments
explorer.bio-rad.com > Support > Webinars