DC Bioinformatics II

Biology 3492
Spring 2008
Laboratory 1-2
------------------------------------------------------------------------------------------------
The Tetrahymena genome database (TGD): getting your gene’s sequence
------------------------------------------------------------------------------------------------
This semester we will be working together to study the membrane skeleton of the ciliated
protozoan, Tetrahymena thermophila. The membrane skeleton is a protein structure just beneath
the plasma membrane and is likely essential for maintaining cell shape and regulating cell
functions, such as exocytosis. As a starting point, we will be using proteomic analysis performed
by Prof. Jerry Honts (Drake University), which has identified about two dozen protein
components of the membrane skeleton (see tables on following pages). We will be his
collaborators, studying these proteins during the course. We will use cell and molecular biology
techniques to demonstrate that the proteins identified by Dr. Honts are really a part of the
membrane skeleton, also called the “epiplasm”.
You will each be assigned one gene to characterize during the course. We will use the
Tetrahymena genome database (www.ciliate.org) to find our genes’ sequences, download them to files
for use throughout the course, and design oligonucleotide primers to amplify and clone the genes
from Tetrahymena genomic DNA. We will also design PCR primers for each gene to perform
RT-PCR to determine whether and when these genes are expressed. In this first week of class, we
will retrieve our gene sequences and design the primers for this work, so each of you will design
primers for your assigned gene for both cloning and expression analysis. We will use the Primer3
program (http://frodo.wi.mit.edu/), which selects optimal primer pairs for PCR. We will also
become familiar with the program Gene Construction Kit, which we will use to analyze and
display our genes’ sequences. A graphic representation of a DNA sequence is an important part
of the design of experiments in molecular biology. One needs to be familiar with the overall
structure/layout/features of the DNA of interest to enable future studies.
Steps:
1. Getting your gene sequence.
Go to TGD and download the sequence of your gene of interest.
--Download your:
1) coding sequence
2) ORF translation
3) at least a 5 kbp region spanning your gene sequence (larger if this does
not include your whole gene)
-- Paste each sequence into a Word file (save your work)
-- Paste your genomic sequence (3) into a GCK file
MAKE SURE TO SAVE YOUR WORK
2. Annotating your gene sequence.
a. Color your coding region Blue
b. Find your introns. Compare your genomic sequence to your coding sequence using the
EMBOSS alignment website (http://www.ebi.ac.uk/emboss/align/).
The introns will be the gaps in the coding sequence.
c. Go to the TGD genome browser and look for available cDNA sequences. Download these
and use them to confirm the predicted introns.
Color predicted introns RED, confirmed introns GREEN.
d. Mark the following restriction sites on your sequence: ApaI, BamHI, BglII, BsrGI, EcoRI,
EcoRV, HincII, HindIII, NotI, PstI, SacI, SalI, ScaI, XbaI, XhoI
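As a cross-check on the sites marked in GCK, the enzyme list in step 2d can also be scanned for with a few lines of Python. This is only a sketch and not part of the lab protocol: the function name `find_sites` and the test sequence are made up for illustration, and the recognition sequences are the standard published ones for these enzymes. All of them are palindromic, so scanning one strand is enough.

```python
import re

# Recognition sequences (5'->3') for the enzymes listed in step 2d.
# HincII is degenerate: Y = C or T, R = A or G.
SITES = {
    "ApaI": "GGGCCC", "BamHI": "GGATCC", "BglII": "AGATCT",
    "BsrGI": "TGTACA", "EcoRI": "GAATTC", "EcoRV": "GATATC",
    "HincII": "GTYRAC", "HindIII": "AAGCTT", "NotI": "GCGGCCGC",
    "PstI": "CTGCAG", "SacI": "GAGCTC", "SalI": "GTCGAC",
    "ScaI": "AGTACT", "XbaI": "TCTAGA", "XhoI": "CTCGAG",
}
IUPAC = {"Y": "[CT]", "R": "[AG]"}


def find_sites(seq):
    """Return {enzyme: [1-based start positions]} for every site found in seq."""
    seq = seq.upper()
    hits = {}
    for enzyme, site in SITES.items():
        # Expand degenerate bases into regex character classes.
        pattern = "".join(IUPAC.get(base, base) for base in site)
        # A lookahead reports overlapping occurrences as well.
        positions = [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]
        if positions:
            hits[enzyme] = positions
    return hits


# Made-up test sequence containing one EcoRI, one BamHI and one HincII site.
example = "AAGAATTCTTGGATCCAAGTTAACTT"
example_hits = find_sites(example)
print(example_hits)
```

Positions are reported as the first base of the recognition sequence, not the cut position, so they may be offset by a few bases from the cut-site markers drawn by GCK.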
3. Designing oligos to clone your gene sequence.
To get started, one first needs to PCR amplify the desired coding sequence and clone it into an
entry vector such as pENTR-D. When using the pENTR-D-TOPO cloning kit, four nucleotides,
CACC, must be added to the 5’ end of the upstream primer, in frame with the ATG start codon.
The primer should also include the six nucleotides upstream of the ATG start codon to ensure
good translation of the mRNA. The added CACC allows directional cloning into the pENTR-D vector.
your favorite gene (yfg): PCR amplify and clone into pENTR-D-TOPO

                    yfg coding sequence
CACC XXX XXX ATG................................................................GATATC
     -6      +1
a. Paste your coding sequence into the Primer3 dialog box. Set the size range of your PCR
product to run from the length of your coding sequence minus 50 bases up to its full length.
b. Adjust parameters as needed to get appropriate oligos. The upstream oligo must start at -6
and have a CACC added to its 5’ end. The downstream oligo must start at the last
base of the last codon and be designed anti-sense to the gene. A GATATC should be
added to its 5’ end.
c. To design primers for expression analysis, locate an intron in your gene if possible and
design the primers to span it. Paste this region of your sequence into the Primer3 dialog
box. Set the size range of your product to be between 170 and 250 bp (not counting the
intron). Select primers.
d. Copy all primer sequences into your Word file and email it to Prof. Chalker
(dchalker@wustl.edu).
MAKE SURE TO SAVE YOUR WORK
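The layout of the two cloning oligos described in step 3 can be checked with a short Python sketch. It is an illustration only: the function `cloning_primers`, its parameters, and the toy sequence are hypothetical, and the fixed `anneal_len` is a simplification, since Primer3 chooses primer lengths from melting temperature and other criteria.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the anti-sense strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def cloning_primers(genomic, atg_index, last_codon_end, anneal_len=20):
    """Build the upstream and downstream cloning oligos.

    genomic        -- sense-strand sequence, 5'->3'
    atg_index      -- 0-based index of the A of the ATG start codon
    last_codon_end -- 0-based index one past the last base of the last codon
    anneal_len     -- number of gene-specific bases (assumed fixed here)
    """
    if atg_index < 6:
        raise ValueError("need at least six bases upstream of the ATG")
    # Upstream oligo: CACC, then gene sequence starting at position -6.
    start = atg_index - 6
    forward = "CACC" + genomic[start:start + anneal_len]
    # Downstream oligo: GATATC, then the anti-sense strand starting at
    # the last base of the last codon.
    tail = genomic[last_codon_end - anneal_len:last_codon_end]
    reverse = "GATATC" + reverse_complement(tail)
    return forward, reverse


# Toy example: 10 bases of upstream sequence, a short ORF, 3 bases downstream.
genomic = "GGTTAAAACA" + "ATGAAACCCGGGTTTTGA" + "AAT"
fwd, rev = cloning_primers(genomic, atg_index=10, last_codon_end=25, anneal_len=9)
print(fwd)
print(rev)
```

Whether `last_codon_end` points before or after the stop codon is left to the caller, since the text only specifies "the last base of the last codon".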
Accessing Bio3492 folder on the NSLC server (to use to turn in computer files):
In ‘Finder’ select ‘GO’: ‘Connect to Server’
Select ‘Local’ folder: Select ‘NSLC Server’: ‘Connect’: log on as ‘biol3492’ password: ‘epiplasm’
Create a folder of your files on your computer’s desktop; before ending, be sure to click and
drag your folder into the Bio3492 folder on the server for future use.
Some GCK function commands you will need:
Make codon table for Tetrahymena
Open file;
Construct: features :edit codon table: new codon table,
Change TAA and TAG stop codons to Gln (glutamine); name ‘Tetrahymena’ ; ‘OK’
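The effect of this codon table change can be seen in a small Python sketch that translates the same reading frame with the standard code and with TAA and TAG reassigned to glutamine. The names `standard`, `tetrahymena`, and `translate`, and the toy ORF, are illustrative and not part of GCK.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
STANDARD_AA = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
standard = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}

# Tetrahymena table: TAA and TAG encode Gln (Q); TGA remains the only stop.
tetrahymena = dict(standard)
tetrahymena["TAA"] = "Q"
tetrahymena["TAG"] = "Q"


def translate(cds, table):
    """Translate cds in frame 1, stopping at the first stop codon."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = table.get(cds[i:i + 3], "X")  # X marks an unrecognized codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy ORF: ATG TAA GCT TAG TGA
orf = "ATGTAAGCTTAGTGA"
print(translate(orf, standard))
print(translate(orf, tetrahymena))
```

With the standard table the toy ORF terminates right after the start codon, which is why a Tetrahymena coding sequence translated with the default table appears full of premature stops.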
Select codon table for Tetrahymena
Open file;
Construct: features :select codon table: Select ‘Tetrahymena’ ; ‘OK’
To show numbered coordinates of the sequence
Open file; Construct: Display: Show Positions
To group sequence for friendly display
Select sequence;
Format:Grouping:(select ‘by threes’ for protein sequence, ‘by tens’ for
others)
To color a block of sequence
Select sequence;
Format:color:(select color)
To designate a region
Select sequence;
Construct: features :Make region
type in name, designate as protein if desired
To change a region’s arrow display to a line or a different arrow
Select region; Format: lines: (select line type)
To mark restriction sites
deselect all sequence; Construct: features :mark sites
in popup window, select enzyme names one at a time and ‘add’ to list, click ‘OK’
To mark a particular position (e.g. to indicate an oligo position)
Select cursor location at site; Construct: features: mark location
To insert a particular sequence into an existing file (e.g. to indicate an oligo position)
Select sequence to be inserted and copy
Place cursor at the site where the sequence should be inserted and paste
Notes: If a region of sequence is highlighted in the target file, the highlighted sequence will be
removed during the paste operation
Sequence can be inverted by ‘special paste’
To space restriction site markers
Select sites in graphic display Format:Sitemarkers:automatic arrangement (or just drag
individually)
To generate list of restriction sites in your sequence
deselect any sequence; Construct: features :list sites
in popup window, select enzyme names one at a time (or ‘add all’) and ‘add’ to list, click ‘OK’
Database and analysis tools links
Tetrahymena Genome Database
www.ciliate.org
Eric Cole’s biology of Tetrahymena website
http://www.stolaf.edu/people/colee/
Sequence analysis tools at EMBL-EBI
http://www.ebi.ac.uk/Tools/sequence.html
EMBOSS pairwise alignments
http://www.ebi.ac.uk/emboss/align/
ClustalW sites for multiple sequence alignments
http://www.ebi.ac.uk/clustalw/
http://clustalw.genome.jp/ [Easier to download trees]
NCBI Blast homepage
http://www.ncbi.nlm.nih.gov/BLAST/
Pfam database of conserved protein motifs
http://pfam.wustl.edu/