Basic sequence analyses and submission

BIT150 - Lab 1
Sequence Analysis
Remember:
Door code: 93175
Log in: 150student
Password: $Jorge1
Log on to: Bioinfolab
Introduction
Using Triticum monococcum L. genomic DNA we subcloned a 4,116-bp Hind III fragment into a
pBluescript II SK vector. This vector has M13 forward (F) and reverse (R) sites at both sides of
the Hind III cloning site. Commercial primers M13F and M13R were used to start sequencing
the cloned fragment (chromatograms M13_F and M13_R).
However, this sequence is too long to be completed in a single sequencing reaction, so primer
walking was used to complete the sequence. Using the sequence obtained with the M13 primers,
new primers were designed (F1 and R1) and used to extend the sequence. Then, the new
sequences were used to design primers F2 and R2. Finally, primer F3 was used to close the last
gap (Figure 1).
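[Figure 1. Primer-walking strategy: forward reads M13_F, F1, F2 and F3 and reverse reads M13_R,
R1 and R2 across the cloned fragment, which is flanked by vector on both sides.]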
Sequences are available on the class website (BIT150), and also in the Z: drive/10_Lab1
directory. You can read but not write on the ‘Z:’ drive, but you can read and write on the ‘C:’
drive.
On the C: drive, create a directory named with your last name inside the class directory
(BIT150), and copy the Lab1 directory from Z: to C:. Take a copy with you for the homework (Hwk1).
Objective: Manually prepare a full-length integrated sequence of the T. monococcum
fragment, without vector, without sequencing errors, and without duplicated overlapping
sequences.
Activities:
1. Use Chromas to open and inspect the chromatograms
Chromas is a chromatogram-viewing program that can display chromatograms in both forward
and reverse complement orientation. Like all the software used in this class, Chromas
is pre-installed on the lab computers, and a copy of the freeware version is included on the
course CD that was distributed in the first class. You can start this and the other programs by
clicking on START -> Programs -> Bioinformatics (you can also create shortcuts on the Desktop).
In the Chromas window click on Open.
Chromas will open SCF and Applied Biosystems sequencing files. Here we will use Applied
Biosystems files with the extension .ab1
Click on the .ab1 file you want to open, and then click on Open.
Once the chromatogram is open, browse through the length of the sequence and check the quality
of the sequence. Compare the quality at both the beginning and the end of the sequence, with the
quality at the middle of the sequence.
o The chromatogram files are the files with the extension .ab1.
o Sharp, evenly spaced peaks in the chromatogram indicate a high-quality read.
o Judging sequence quality by visual inspection is highly subjective. However, quality-scoring
software sometimes performs poorly on otherwise good sequence because of sequencing artifacts,
irregular spacing of the peaks, etc. Visual inspection is therefore still considered the most
reliable way to score quality. You can use the examples below as guidelines for your own
decisions:
[Example chromatograms: one high-quality read and two low-quality reads.]
2. Use Chromas to convert the chromatograms into FASTA format and copy them to the
MBCS Add-In Word document in your Lab1 directory.
In the Chromas Chromatogram Editor, click on Edit -> Copy Sequence -> FASTA format.
Open the MBCS Add-In Word Document. A Security Warning should open at the top of the
document. Click the Options button and select Enable this Content. Paste the sequence into
the MBCS Add-In Word document.
The M13_R and the R1 and R2 sequences are in reverse complement orientation to the forward
sequences (they are sequences from the other DNA strand). To put all the sequences in the same
orientation, you need to reverse complement them (e.g. from AGCTT to AAGCT). It might be
easier to first align M13_R, R1 and R2 with each other and then, at the end, reverse complement
the corrected contig.
To reverse complement a sequence, select the sequence in the MBCS Add-In document. Select
Add-Ins -> MBCS1.2 -> Sequence Manipulation -> Antisense DNA/RNA Sequence. (Two
other sequence tools listed here are similar but not the same: Reverse lists the bases
backwards, and Complement lists the complementary bases. Antisense both reverses and
complements the sequence. It is important that you select the correct option.) In the sequence
manipulation window you can choose to copy the new sequence to the clipboard or insert it at the
end of your sequence. Be sure to indicate that the sequence has been reverse complemented by
adding ‘RC’ to its name. The Courier New font is useful for aligning sequences neatly.
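The difference between the three operations can be sketched in a few lines of Python. This is only an illustration of what Reverse, Complement and Antisense do to a sequence; the function names are made up for this example and are not part of the MBCS Add-In.

```python
# Complement table for the four DNA bases plus N (unknown base), in both cases.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse(seq):
    """List the bases backwards (what the Reverse tool does)."""
    return seq[::-1]

def complement(seq):
    """Swap each base for its partner on the other strand (Complement tool)."""
    return seq.translate(COMPLEMENT)

def antisense(seq):
    """Reverse and complement: the other strand read 5' to 3' (Antisense tool)."""
    return complement(seq)[::-1]

if __name__ == "__main__":
    example = "AGCTT"
    print(reverse(example))     # TTCGA
    print(complement(example))  # TCGAA
    print(antisense(example))   # AAGCT
```

Only the last result matches the AGCTT to AAGCT example given above, which is why choosing Reverse or Complement alone gives a wrong sequence.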
An alternative way to reverse complement the sequence is to use Chromas to reverse
complement the chromatogram before exporting the sequence. If you use Chromas to reverse the
sequence, it is doubly important to note its orientation after pasting it into the Word
document, because Chromas does not change the name of the sequence when it is reversed. This
approach is nevertheless useful for viewing the chromatogram of the reversed sequence.
3. Identify restriction sites in your sequence using the web tool NEBcutter by New
England Biolabs.
http://tools.neb.com/NEBcutter2/index.php
Paste your sequence into the NEBcutter window. Select All Commercially available
specificities and click Submit.
To identify sites of a specific enzyme, in the results window under main options, click Custom
Digest. Check the boxes of the specific enzymes you are interested in and click Digest.
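To see what such a search involves, here is a minimal Python sketch that looks for the recognition sequences of three common enzymes in a sequence. It is not how NEBcutter is implemented: it reports where each recognition sequence starts (1-based) rather than the exact cut position, and the toy sequence is invented for the example.

```python
# Recognition sequences of three common enzymes. All three are palindromic,
# so searching one strand finds every site.
ENZYMES = {"HindIII": "AAGCTT", "BamHI": "GGATCC", "EcoRI": "GAATTC"}

def find_sites(seq, site):
    """Return the 1-based start position of every occurrence of site in seq."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + 1)
        start = seq.find(site, start + 1)
    return positions

if __name__ == "__main__":
    toy = "AAGCTTCCGGATCCTTTGGATCCAAGCTT"  # invented sequence, not lab data
    for name, site in ENZYMES.items():
        print(name, find_sites(toy, site))
```

For the toy sequence this finds HindIII sites starting at positions 1 and 24, BamHI sites at 9 and 18, and no EcoRI site.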
To transfer figures from the screen to your Word document: press the Prnt Scrn key on your
keyboard to copy the screen image to the clipboard. Open Start/Programs/Accessories/Paint and
select Edit/Paste (or Ctrl V). Select the cut tool on the left, mark the region you want to
cut, copy it (Ctrl C), and paste it into your Word document.
A custom digest for BamHI on the assembled 4116 bp sequence is shown below.
4. Trim the vector sequence from the M13_F and M13_R sequences using the NCBI's
BLAST-VecScreen (http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html).
Copy the M13_F sequence, paste it into the VecScreen query box, and click on Run VecScreen.
Find the region corresponding to vector.
Identify the Hind III cloning site: AAGCTT, and eliminate the vector sequence, but not the
cloning site. Highlight the cloning site.
Repeat the same process with the M13_R sequence. The other sequences (F1, F2, F3, R1 and
R2) do not have vector because the primers were designed within the cloned segment.
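The trimming step can be sketched in Python as follows. This is a rough illustration, not a replacement for VecScreen: it assumes the raw read starts in the vector and that the first AAGCTT in the read is the cloning site, which you should confirm against the VecScreen result. The function name and the toy read are invented for the example.

```python
HIND3_SITE = "AAGCTT"  # Hind III recognition sequence (the cloning site)

def trim_vector(read, site=HIND3_SITE):
    """Remove everything upstream of the first cloning site, keeping the site.

    Assumes the read begins in the vector, so the vector lies at its 5' end.
    """
    index = read.upper().find(site)
    if index == -1:
        raise ValueError("cloning site not found; inspect the read by eye")
    return read[index:]

if __name__ == "__main__":
    toy_read = "GGCCGATAAGCTTACGTACGT"  # invented: vector + site + insert
    print(trim_vector(toy_read))        # AAGCTTACGTACGT
```

Because the raw M13_R read also begins in the vector, the same function applies to it before it is reverse complemented.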
5. Use BLAST 2 sequences to Align M13_F with F1
http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi (create a bookmark for this site)
Paste the M13_F sequence into the Sequence 1 window and the F1 sequence into the Sequence 2
window. Deselect Filter, and click on Align.
Identify the overlapping sequence between M13_F and F1, and highlight it in both sequences in
your Word document (remember to use the Courier New font).
The two sequences are from the same molecule, and therefore they should be identical. The
differences between them are sequencing errors. Examine the chromatograms to decide which
base is the correct one for each difference observed in the overlapping sequence.
Eliminate the duplicated region generated by the sequence overlaps and create a combined clean
sequence (without vector and without sequencing errors).
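Once the differences in the overlap have been resolved, joining two reads amounts to finding where the end of the first read matches the start of the second and keeping the shared region only once. The Python sketch below shows this under simplifying assumptions: the overlap must match exactly (sequencing errors already corrected, low-quality ends removed), and both reads are in the same orientation. The function names, the min_len threshold and the toy reads are invented for the example.

```python
def longest_overlap(left, right, min_len=20):
    """Length of the longest suffix of left that equals a prefix of right."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0

def merge_reads(left, right, min_len=20):
    """Join two reads, writing the overlapping region only once."""
    size = longest_overlap(left, right, min_len)
    if size == 0:
        raise ValueError("no exact overlap found; check the alignment by eye")
    return left + right[size:]

if __name__ == "__main__":
    # Invented toy reads sharing the 6-base overlap GGTTCA.
    print(merge_reads("ACGTACGGTTCA", "GGTTCAGGAT", min_len=4))
    # ACGTACGGTTCAGGAT
```

With real reads, a BLAST 2 sequences alignment remains the way to locate the overlap, since it tolerates the mismatches that exact matching cannot.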
6. Repeat the process with the other chromatograms until you assemble a complete clean
sequence
Sequence Submission to NCBI
Introduction
An important part of working with genomic or protein sequences is the submission of the final
sequence to the central databases (GenBank for the US). The software tool developed by NCBI
for submitting and updating entries to the GenBank, EMBL, or DDBJ sequence databases is
called Sequin, and is freely available at http://www.ncbi.nlm.nih.gov/Sequin/. For a Sequin
tutorial, go to SEQUIN at NCBI.
Objective: Prepare a sequence submission to GenBank using Sequin.
Activities:
1. Use Sequin to prepare a sequence submission to GenBank
For this assignment you have the genomic DNA sequence (.txt) from barley (Hordeum vulgare
L.), the protein translation (.txt), and the annotated genomic sequence (.doc) for the Acyl Co-A
Synthetase. The files are in the subdirectory named ‘Sequin Acyl Co-A Synthetase’ inside the
Lab1 directory.
Download the files into the directory you created on the C: drive.
1.1. Start Sequin.
1.2. Enter your personal information as submitter. Request that the sequence be released 1 year
from today. Move to the next form.
1.3. Load the ‘proper’ Co-A data file(s) into Sequin and move to the next form.
The FASTA genomic DNA sequence is in the file Co-A_DNA.txt and the protein sequence is in Co-A_Protein.txt (note
that the protein sequence should NOT end with the asterisk that represents the stop codon). For both .txt
files, note that the SeqID after the ‘>’ symbol in the definition line should not contain any spaces.
The final annotation of the sequence is in the Word document called ‘Final annotation.doc’. Sequin will format
and annotate the sequence using automated programs (called macros) to determine exon locations, etc. Check if
the coordinates of the exons are correct using the Tools/Word Count option in your annotated Word document.
1.4. To get the taxonomic information, you can go to NCBI, select the Taxonomy database and
search for Hordeum vulgare. Click on the line Unclassified in the Organism field. Select the
Lineage tab and paste the lineage into the Taxonomic Lineage box.
Eukaryota; Viridiplantae; Streptophyta; Streptophytina; Embryophyta; Tracheophyta; Euphyllophyta;
Spermatophyta; Magnoliophyta; Liliopsida; commelinids; Poales; Poaceae; BEP clade; Pooideae; Triticeae;
Hordeum
1.5. Click on Search->Validate to check correctness of the automated annotation. If a
submission is invalid, you can correct it manually by clicking on the shown error and completing
the requested information.
1.6. Once you have fixed the error, click on Revalidate.
1.7. To save your document, click on File --> Export GenBank (then you will be able to open
this file as a Word document).
The complete and validated GenBank file should be submitted as part of Homework 1.
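The two FASTA requirements mentioned in step 1.3 (the SeqID directly after the ‘>’ symbol, and no trailing asterisk on the protein) can be checked before loading the files into Sequin. The Python sketch below is one possible check, not something Sequin provides; the function name and the example records are invented, and it only looks at a single-record FASTA text.

```python
def check_fasta(text, is_protein=False):
    """Return a list of problems found in a one-record FASTA string."""
    problems = []
    lines = [ln.rstrip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        return ["first line must be a definition line starting with '>'"]
    defline = lines[0]
    # The SeqID is the text right after '>' up to the first space.
    if len(defline) == 1 or defline[1].isspace():
        problems.append("SeqID must follow '>' directly, with no space")
    sequence = "".join(lines[1:])
    if not sequence:
        problems.append("no sequence found after the definition line")
    if is_protein and sequence.endswith("*"):
        problems.append("protein ends with '*'; remove the stop-codon symbol")
    return problems

if __name__ == "__main__":
    # Invented example records, not the contents of the lab files.
    print(check_fasta(">CoA_DNA\nACGTACGT"))
    print(check_fasta("> CoA_DNA\nACGTACGT"))
    print(check_fasta(">CoA_prot\nMKTAYIAK*", is_protein=True))
```

An empty list means neither problem was detected; it does not guarantee that Sequin will accept the file.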