EXPERIMENT SIX: DNA SEQUENCING

The objectives of this experiment are to:
(1) carry out a sequencing reaction using a pCR2.1-amplicon clone.
(2) electrophorese sequencing reactions to obtain the nucleotide sequence of this clone.
(3) assemble and analyze the sequencing data.
(4) use NCBI BLAST to determine the identity of the sequence and uncover sequence similarities with other genes in the databases.
(5) create in silico plasmid maps based on sequence data.

Experimental expectations:

DNA Sequencing

Materials and Equipment Required
1. Sequencing reaction
   PCR thermocycler (ABI 2600) and tube tray
   2 thin-walled PCR tubes
   ice bath
2. Reaction clean-up
   Heat block at 90 & 95°C
   ice bath
   Gel electrophoresis DNA sequencing apparatus

Reagents Required
1. Sequencing reaction
   4 µL Big Dye Terminator buffer
   2 µL Primer (use M13-Forward -20 & -40 and M13-Reverse)
2. Reaction clean-up
   40 µL Ethanol/sodium acetate mix
   125 µL 75% Ethanol
   2 µL Loading buffer

Procedure

PCR Reaction (Cycle Sequencing) – for each primer/template combination to be used
1. Prepare a 10 µL PCR reaction as follows in a thin-walled PCR tube on ice:
   a. 4.0 µL Big Dye Terminator buffer
   b. 2.0 µL Primer (0.8 pmol/µL concentration; 1.6 pmol final amount)
   c. 4.0 µL plasmid to be sequenced or pGEM control (2 controls are prepared per gel)
2. Place in a thermocycler set to the following program:
   a. 96°C for 60 seconds
   b. 96°C for 10 seconds
   c. 50°C for 5 seconds
   d. 60°C for 4 minutes
   Repeat b-d for 24 additional cycles, then hold at 4°C until removed from machine. The reaction may be stored at –20°C until ready to use.

Reaction Clean-up/Precipitation (allow 1.5-2 hours)
1. Add 40 µL of ethanol/sodium acetate mix* to each tube containing PCR products.
2. Close tubes and vortex briefly.
3. Incubate at room temperature for at least 15 minutes (do not exceed 24 hours).
4. Microfuge at 14,000 rpm for 20 minutes (use double carriers to prevent tubes from falling through!).
   Hint: Place tube hinge outward and use carrier tubes to hold the small tube. Look for the pellet on the side of the tube underneath the hinge.
5. Immediately remove the supernatant (if there is a delay, re-centrifuge for 2 minutes).
6. Add 125 µL of 75% ethanol.
7. Close tube and vortex briefly. You may stop here and store the tube at 4°C overnight.
8. Microfuge at 14,000 rpm for 5 minutes with hinge outward.
9. Remove supernatant completely, taking care not to dislodge or discard the pellet (where would you expect the pellet to be?).
10. Dry at room temperature for 45-60 minutes with lid open.
11. Proceed to Resuspension Procedure or store the dried product at –20°C until ready to use.

*Ethanol/sodium acetate mix (40 µL total): 1.5 µL 3 M sodium acetate, pH 4.6; 31.25 µL 95% ethanol; 7.25 µL sterile deionized water

Resuspension
1. Resuspend dried pellet in 2 µL sequencing gel loading buffer. Loading buffer is prepared using a ratio of 5 parts deionized formamide to 1 part 25 mM EDTA, pH 8, with blue dextran (50 mg/mL).
2. Pulse vortex to mix (or mix with micropipette).
3. Heat samples at 95°C for 2 minutes to denature.
4. Immediately place on ice until ready to load gel.

Gel Loading (allow 30 minutes)
Load 2 µL onto gel comb as instructed.

Primers for sequencing reaction
These are standard primers whose binding sites are found in most cloning vectors used in molecular biology. The sequences are provided below:
   M13 Forward (-20)   GTAAAACGACGGCCAG
   M13 Forward (-40)   GTTTTCCCAGTCACGA
   M13 Reverse         CAGGAAACAGCTATGAC

Computer Lab
Sequence Analysis and Database Search

Project Goals:
In this lab, you will be introduced to some of the software and web resources available to the molecular biologist. The software package we will be using today is called Lasergene and is produced by a company called DNASTAR (they have provided us with a free site license!). This package will allow us to examine the raw data generated from the DNA sequencer from each of our sequencing reactions.
By comparing the sequences and examining the electropherograms, we hope to build an accurate consensus sequence that may be used to search for our gene in the database. The major goals for today’s lab are to (1) build an accurate consensus sequence using a technique called multiple sequence alignment; (2) use the sequence to search for our gene in a huge database of sequences held by the National Institutes of Health (NIH) using BLAST (Basic Local Alignment Search Tool). Your instructor will lead you through this project as an introduction to the software and the web resources available (be sure to take notes!). In the next lab, you will be introduced to a new DNA analysis software called DNA Strider and will be responsible for completing a task in class.

At the end of the day, we hope to use these resources to find:
* the complete sequence (sequence data will not cover the entire region) corresponding to the genomic copy as well as the cDNA
* what type of protein our DNA insert encodes and its function in yeast

Sequence Analysis and Database Search

DNA Sequence Alignment:
1) Open Lasergene software from alias found in Apple menu (or open from Mac HD/Applications).
2) Select “Sequencing Project Management” button to open Seqman program (note name of program in top right corner of screen).
3) Open new project and select “Add sequences” box.
   a. Go to BIO375 folder in Mac HD and open folder corresponding to this quarter (e.g., Spring 2003) and open the “sequence files” folder.
   b. Select all files (Apple-a) and click “Add files” button (all should be placed in software window).
   c. Click “Done” and return to Seqman window.
4) Click “Assemble” button to perform multiple sequence alignment (window on right should now have status report of sequences included in “contig”).
   Note: contig is a term that is short for contiguous sequence where several sequences that are aligned can form a longer sequence due to differences in length.
   By comparing the sequences that comprise the contig, you are able to obtain a consensus sequence.

   [Diagram: the consensus sequence (5’ to 3’) shown above the aligned raw sequences]

5) Double click on contig of interest (normally the longest contig containing more than one sequence) to view sequences as shown in the diagram above.
   a. The arrow at the end of each sequence shows the direction of original sequence input (alignment will reverse orientation to fit into agreement with other sequences).
   b. Examine the associated electropherogram by opening the > arrowhead at left of each sequence.
      i. Enlarge view by clicking the magnifying glass symbol on control panel.
      ii. Increase height by sliding the controller to the bottom on the same panel.
   c. Numbers at sequence name show length and start of sequence used in contig (some ends may be removed due to poor quality from sequencer).
6) Determine good vs. bad data:
   a. Examine peak strength and resolution differences in electropherogram.
   b. Lower-case letters, letters other than ATGC and gaps indicate potential problems that require further examination (sequence discrepancy among raw data files).

      A=adenosine    R=G or A    S=G or C    H=A or C or T
      C=cytidine     Y=T or C    W=A or T    V=G or C or A
      T=thymidine    K=G or T    B=G or T or C    N=any
      G=guanosine    M=A or C    D=G or A or T    (-)=gap

   c. As you accumulate more sequence data in a region, fewer errors are identified because of the larger sample size.
7) Alter consensus sequence based upon observation of data in contig so as to remove regions of low confidence (if not able to visually determine sequence, leave symbol of ambiguity).
8) Save results:
   a. Consensus sequence:
      * Go to Contig menu and “Save Consensus” command (save to your zip disk). Save file as “consensus” in FASTA format (.fas) and Lasergene format (.seq).
   b. Contig:
      * Go to File menu and save assembly to zip disk.
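The ambiguity symbols listed in step 6 can be illustrated with a short Python sketch that reads one column of an alignment at a time and reports the symbol covering every base observed there. This is only an illustration of the codes, not a description of how Seqman computes its consensus: the function names are hypothetical, the reads are assumed to be already aligned and padded with "-" to equal length, and any disagreement is reported as an ambiguity rather than resolved by peak quality or majority.

```python
# IUPAC ambiguity symbols, keyed by the set of bases each one stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACGT"): "N",
}


def consensus_symbol(column):
    """Return the symbol that covers every base seen in one alignment column."""
    bases = {b.upper() for b in column if b != "-"}  # gaps carry no base call
    if not bases:
        return "-"  # every read has a gap at this position
    if not bases <= set("ACGT"):
        return "N"  # a read already holds an ambiguous call; treat as unknown
    return IUPAC[frozenset(bases)]


def consensus(reads):
    """Build a consensus from equal-length, pre-aligned reads (hypothetical helper)."""
    return "".join(consensus_symbol(col) for col in zip(*reads))


# Three toy reads that disagree at position 3 (G vs. A) and differ in length.
print(consensus(["ACGT-", "ACAT-", "ACGTA"]))  # ACRTA
```

The R in the output marks the position where the reads disagree between G and A. That is the kind of position to check against the electropherogram before editing the consensus in step 7.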
BLAST search:
1) Open Netscape or Internet Explorer and go to http://www.ncbi.nlm.nih.gov/BLAST/
2) Since our sequence is DNA (not protein), perform the BLAST search at the nucleotide level by following the “Standard nucleotide-nucleotide BLAST [blastn]” link.
3) Go to your saved FASTA format of the consensus sequence and open the file (you may be required to select the application for this file; select Simple Text or a similar program).
4) Highlight and copy the DNA sequence only and paste this sequence into the “search” window in your Netscape/Explorer window for the BLAST search.
5) Click “BLAST!” button to initiate the search.
6) You will be sent to a BLAST search queue - click “Format!” to see the results (there may be a delay before results are presented).
7) Once the search is done, a new window will appear presenting you with all of the “hits” that your query brought up from the non-redundant nucleotide database.
8) View the score (should be over 200) and E-value (likelihood of this match being found randomly - should be VERY small) for each match to determine if you can identify the gene(s) that most closely match your sequence.
9) Examine the alignment for your top matches and follow one or more of the associated links to obtain more information about the subject sequence (“sbjct”).
10) Note in your lab notebook the identity of this sequence.

Complete gene recovery:
1) Open a new window in Netscape and go to http://genome-www.stanford.edu/Saccharomyces/
2) Search the SGD by typing in the name of the gene you have identified by BLAST (three-letter name and number).
3) Once you have found the page associated with your gene, examine this page on your own and write in your notebook the following information:
   a. are there introns in this gene and where are they located within the ORF?
   b. what are the chromosomal coordinates of the gene?
   c. is the gene found on the Watson or Crick strand of the chromosome?
   d. what are the ORFs on either side of the gene (Watson or Crick strand)?
   e. what is the molecular function of this gene product?
   f. what is one of the cellular components?
4) Go to the “Retrieve Sequences” section of this window (upper right side) and open the window containing the “DNA+1kb up/downstream”.
5) Highlight and copy this sequence.
6) Start the program DNA Strider (MacHD/Applications OS9/DNA Strider) and open a new window for DNA (File/New DNA or Apple-n).
7) Paste the sequence and save this file to your zip disk as “gene name+1kb”.
8) Return to your Netscape window and repeat this procedure to make a new file with the “coding sequence”.
9) Save this file as “gene name-coding”.

Manipulation of insert sequence:
1) Open “consensus” file (FASTA format) using Simple Text and copy the sequence as before.
2) Open new window in DNA Strider for “Degenerate DNA” using the File menu and paste the sequence into this window using the Edit menu (or Apple-v).
3) Go to the “<->” menu and select the “5’ to 3’ DNA” option to convert the degenerate DNA into a non-degenerate format - save this file to your disk under the name of “consensus”.
4) Go to the “gene name-coding” file and copy the first 10 nucleotides starting with ATG using the Edit menu (or Apple-c) (why not more than 10?).
5) Return to the “consensus” file, open Find menu/“Find” command (or Apple-f) and paste the nucleotide sequence into this window.
6) Be sure your cursor is at the start of the sequence, then click the “Find” button to search for the start of your gene’s ORF.
   Note: if you are unsuccessful, you may need to search the antiparallel orientation of the “consensus” DNA (<-> menu / Antiparallel). If the sequence is still not found, your consensus sequence may not be complete. If this is the case, proceed to step 9.
7) Once this sequence is found, try to determine the junction between vector and insert for both ends of your DNA by comparing the “consensus” file with the vector sequence (flanking the T/A cloning site) found in your lab manual.
8) Once this is found, open the “gene name+1kb” file and trim it down to match the size of your insert as determined from your consensus file.
9) Alternatively, use the sequences of the forward and reverse primers that were used to amplify your insert to define your insert size in the “gene name+1kb” file.
10) The process of finding your complete insert sequence will be the active-learning portion of this experiment. As you are trying to complete this task, keep in mind that there may be regions that are outside of the coding region in your insert.
11) Once you have determined the complete sequence corresponding to the insert in your plasmid using the “gene name+1kb” file, note the size of the fragment and save this file as “insert-gDNA”. If your amplicon was from a cDNA template, you will have to use this file to transform the “gene name-coding” file to generate the “insert-cDNA” (no intron).
12) Once this file is saved, you may leave the Computer Lab (you may need to return to the computer lab if you were unable to complete the task during class).

Computer Lab
Computer Plasmid Map Project

Project Goals:
The purpose of this project is to expose you to computer software tools that are commonly used for DNA analysis in molecular biology research labs. During this project, you will use a program called DNA Strider to accomplish the following:
1) Create two versions of a recombinant plasmid by ligating the insert sequence into the MCS of the plasmid pCR2.1-TOPO, in silico.
2) Use the sequence above to design a restriction mapping experiment to test the orientation of the insert in the actual plasmid you constructed in the lab.
Project Outline (this is to be done alone if there are enough computer stations)

Ligation in silico:

Preparation of the empty vector sequence:
1) Start the DNA Strider program.
2) Open the pCR2.1 sequence that is found in the BIO375 folder (same folder from which you obtained the raw sequence data).
3) Find the site used in our T/A cloning procedure by searching for the upstream vector sequence using the information provided in the pCR2.1 map. In the “Find” window (Apple-f or go to Find menu), type in the sequence immediately upstream of the T/A site (see lab manual) and click OK.
4) After you find this site, highlight the “TA” sequence that was used as the cloning site for TOPO TA cloning. Capitalize this region to set it apart from the lower-case sequence of the vector by going to the Edit menu and selecting “UPPER CASE” command.
5) Go to <-> menu (also known as Convert menu in older versions of the software) and select “Circularize” command to make the DNA a circular entity rather than a linear piece of DNA (you should see “circular” appear and replace “linear” in the top right corner of your Strider window).
6) Go to File menu and save this file using “Save As” command. Save the file as “empty pCR2.1” on your Mac zip disk.

Preparation of the 2 possible orientations of the insert sequence:
1) Open sequence file named “insert-cDNA” or “insert-gDNA”, depending upon your plasmid.
2) Write in the note section of this file (window under the sequence) the orientation of the ORF (ATG->TAA or TAA->ATG) and save it to your zip disk under the name of “ampliconOri1”.
3) Change the orientation of your insert by selecting the entire sequence (using Edit menu or Apple-a) and going to the <-> menu to select the “Anti-Parallel” command. This will flip the orientation of the DNA (resulting in the presentation of the reverse complementary strand).
4) Alter the note section accordingly and save this file as “ampliconOri2” (using “Save As…” command in File menu).
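What the “Anti-Parallel” command in step 3 does to a sequence can be sketched in a few lines of Python. This is an illustration only: the function name and the 9-nucleotide toy insert are made up for the example and are not part of DNA Strider or of your data.

```python
# Complement table for both upper- and lower-case bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antiparallel(seq):
    """Return the reverse complement: complement every base, then reverse."""
    return seq.translate(COMPLEMENT)[::-1]


# Toy insert whose ORF reads ATG->TAA (orientation 1).
ori1 = "ATGGCTTAA"
ori2 = antiparallel(ori1)

print(ori1)  # ATGGCTTAA
print(ori2)  # TTAAGCCAT
print(antiparallel(ori2) == ori1)  # True: flipping twice restores the original
```

Note that the second orientation has the same length and base composition as the first; only the strand shown, and therefore the position of any asymmetric restriction site, changes.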
Construction of the 2 possible recombinant plasmids from T/A cloning in pCR2.1:
1) Find the T/A cloning site in the file “empty pCR2.1” that was set apart from the rest of the sequence by capitalization (see above) and place your cursor in the cloning site.
2) Go to the “ampliconOri1” file and select all (Apple-a) and copy (Apple-c) the sequence of your amplicon.
3) Return to the pCR2.1 window and paste this sequence into the T/A site of the “empty pCR2.1” file.
4) Make note of the positions of the insert “beginning” and “end” sites and the overall plasmid size. Write this information in your lab notebook and tell the instructor so as to be sure of the proper in silico ligation.
5) Save this new recombinant construct as “pCR2.1+amplicon1” (using “Save As…” command in File menu).
6) Retrieve the original “empty pCR2.1” file and copy and paste the sequence from the “ampliconOri2” file into the T/A site as described in steps 1-4 (if you were successful, the position and length should not change from step 4). Save this file as “pCR2.1+amplicon2”.
7) Now you have two recombinant pCR2.1 plasmids that differ only in the orientation of the inserted amplicon sequence.

Restriction Mapping:

Restriction analysis to determine the orientation of pCR2.1+amplicon
1) Open the “pCR2.1+amplicon1” file and go to Enzyme menu and select the “Restriction Report” command to display the sequence and all of the potential restriction sites found in this DNA sequence.
2) Scan through the report, which is broken into 3 parts: (1) sequence displaying all of the restriction recognition sequences included in this program; (2) site usage list of all enzymes and the frequency of each site within the sequence (note that “-” means there were no sites found); (3) list of restriction enzymes (and recognition sequence) that cut within the sequence, ordered according to frequency.
3) Scroll through the report until you find the third section and examine the enzymes that cut twice in the sequence.
4) Go to your lab notebook and make note of the start and end position of the insert amplicon.
5) Scan through the list of two-cutter enzymes and make a list in your lab notebook of all those enzymes where one of the sites falls within the insert amplicon and the other within the vector sequence (why would this be necessary?).
6) Close the restriction report window and return to the Strider sequence window for “pCR2.1+amplicon1”.
7) Go to Enz menu and scroll down to “Enzyme Chooser…” command to select one of the enzymes on your list of potential twice cutters. Highlight one of your enzymes and click “OK” button (more than one enzyme may be chosen by holding the shift key when you select additional enzymes).
8) Return to Enz menu and select “Digestion..” command to cut the DNA with your selected enzyme.
   Note: selection of enzyme may also be done by holding the option key down while you select the “Digestion..” command. After clicking the “OK” button, digestion will take place using the highlighted enzyme.
9) Repeat this digestion with each of your selected twice-cutter enzymes and compare the products from each reaction with those that would be generated using the “pCR2.1+amplicon2” sequence.
10) Note in your lab notebook which of the possible enzymes would provide you with the best data for determining the orientation of your insert (hint: you will want to use the enzyme that gives you fragments that would be easily distinguishable between the two orientations).
11) For your chosen enzyme (after you have shown your work to the instructor), repeat the digest for each pCR2.1+amplicon orientation using this enzyme as described above.
12) Go to the Enz menu and select the “Graphic Map” command to view your plasmid with the restriction sites (and position) indicated.
13) Print 2 copies of this map for each orientation (one copy for your lab notebook and one copy for the instructor).

Use the sequences generated in this lab to answer the following questions.
This assignment must be turned in to the instructor by the end of the lab period.
1. On the graphic printout, circle the enzyme sites that will be cut, write the expected size of each product generated upon digestion and turn in a copy with your answers to these questions. (4 pts)
2. What is the exact size (bp) of your new plasmids? (2 pts)
3. What is the exact size (bp) of the insert? (2 pts)
4. List the bp positions of the inserted sequence (only the actual insert sequence – not T/A site) in your new plasmids (e.g., gene X is located in position 134bp to 1134bp in construct Y). (2 pts)
5. If you cut both of your plasmids with RsaI in the lab, how many fragments, and of what sizes, would you expect to generate? (3 pts)
6. What size fragment(s) would you expect from an RsaI digest if your ligation were unsuccessful (i.e., there is no ACT1 insert)? (3 pts)
7. What enzyme are you going to use to differentiate between the two possible orientations? Explain the logic behind using this restriction enzyme (as opposed to other enzymes, like EcoRI, etc.) to determine the orientation and the expected results. (4 pts)
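The reasoning behind questions 5 to 7 can be checked on a toy example with the Python sketch below. It pastes an insert into a circular vector in both orientations and lists the fragment sizes from a digest with an enzyme that cuts once in the vector and once, off-center, in the insert. All sequences and function names here are invented for illustration; they are not pCR2.1, not your amplicon, and not how DNA Strider works internally. The only real-world fact assumed is that RsaI recognizes GTAC.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antiparallel(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def ligate(vector, site, insert):
    """Paste the insert into the vector at a 0-based position (in silico ligation)."""
    return vector[:site] + insert + vector[site:]


def circular_fragments(plasmid, recognition):
    """Fragment sizes (bp) from digesting a circular plasmid at every recognition site.

    Assumes a palindromic recognition sequence, so scanning one strand is enough.
    Sizes do not depend on where the enzyme cuts inside the site, because the
    same offset applies at every site.
    """
    seq = plasmid.upper()
    n = len(seq)
    # Append the first few bases so a site spanning the origin is also found.
    wrapped = seq + seq[: len(recognition) - 1]
    sites = []
    start = wrapped.find(recognition)
    while start != -1:
        sites.append(start)
        start = wrapped.find(recognition, start + 1)
    if not sites:
        return []  # the enzyme does not cut; the plasmid stays circular
    sizes = [b - a for a, b in zip(sites, sites[1:])]
    sizes.append(n - sites[-1] + sites[0])  # fragment spanning the origin
    return sorted(sizes, reverse=True)


RSAI = "GTAC"
vector = "GTAC" + "A" * 26          # toy 30 bp vector with one RsaI site
insert = "CCGTAC" + "T" * 14        # toy 20 bp insert with an off-center site

plasmid1 = ligate(vector, 20, insert)
plasmid2 = ligate(vector, 20, antiparallel(insert))

print(circular_fragments(vector, RSAI))    # [30]
print(circular_fragments(plasmid1, RSAI))  # [28, 22]
print(circular_fragments(plasmid2, RSAI))  # [34, 16]
```

Both recombinant plasmids are 50 bp, yet the two orientations give different fragment pairs because the site in the insert sits closer to one end. An enzyme whose insert site lies near the middle would give nearly identical patterns for both orientations, which is why such an enzyme is a poor choice.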