Bioinformatics and Genomics Lab - GENI

Bioinformatics and Genomics Lab
Dr. Wood
Seattle Pacific University
Purpose: The goal of this lab is to introduce you to the tools and resources used to extract
information from genome sequence data and to familiarize you with the information resources
that will enable you to apply this information in clinical practice.
Part I: What can we learn from DNA?
Scenario: You are a laboratory assistant who has just taken a job at the Centers for Disease
Control in Atlanta, GA. Your supervisor has just begun work on a disease outbreak occurring in a
small town in Washington state called Coupeville. A large number of high school athletes have
acquired skin infections that are resistant to standard antibiotic treatment. Clinicians on site are
confident that they have identified the causative agent of the disease and are working hard to
find an effective treatment. Given the size of the outbreak, the CDC has decided to begin an
investigation into the source of the outbreak, which requires further molecular characterization.
You have been given two tasks. First, confirm the identity of the responsible bacterium. Second,
determine the cause of the antibiotic resistance. Once this is complete, the laboratory will
genetically profile all isolates to determine whether one or more individual strains are involved in
this outbreak (you will not do this during this lab).
Procedure:
Identify the causative agent using rDNA comparison. Your supervisor has requested that the
laboratory sequence the rDNA of the bacterium isolated from patients in the outbreak. The
sequence has been provided to you below and you must use it to verify the clinical
identification of the agent causing this outbreak.
Find the closest match to your sequence (modified from http://rdp.cme.msu.edu/assigngen/basicinstr.jsp):
Go to the Ribosomal Database Project at http://rdp.cme.msu.edu/index.jsp.
Go to the Sequence Match analysis tool (also called SEQMATCH).
Paste your unknown rDNA sequence into the text box.
Change the options below to:
o Strain: Both
o Source: Isolates
o Size: >1200
o Quality: Good
o Taxonomy: Nomenclatural
o KNN matches: 1
Click on "Submit".
Go to "view selectable matches." Only the closest match will be displayed. Record the genus
and species of the closest relative.
Name of organism:______________________________
What diseases are caused by this type of bacterium?
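For intuition about how a word-matching tool such as Sequence Match can rank database sequences, the sketch below scores two sequences by the fraction of short words (k-mers) they share. The word length of 7 and the scoring rule (shared unique words divided by the smaller of the two unique-word counts) are assumptions modeled on the S_ab score that RDP describes. The function names and the short toy sequences are illustrative only and are not part of the lab data.

```python
def unique_kmers(seq, k=7):
    """Return the set of distinct k-base words in seq (case and whitespace ignored)."""
    seq = "".join(seq.split()).upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sab_score(query, reference, k=7):
    """Shared unique k-mers divided by the smaller unique k-mer count (assumed S_ab rule)."""
    q = unique_kmers(query, k)
    r = unique_kmers(reference, k)
    if not q or not r:
        return 0.0  # a sequence shorter than k has no words to compare
    return len(q & r) / min(len(q), len(r))


# Toy sequences (hypothetical, far shorter than a real rDNA gene).
query = "ACGTACGTGGCTAGCTAGGATCCA"
close = "ACGTACGTGGCTAGCTAGGATCCT"    # differs from query only at the last base
distant = "TTTTGGGGCCCCAAAATTTTGGGG"  # shares no 7-base word with query

for name, ref in [("close", close), ("distant", distant)]:
    print(name, round(sab_score(query, ref), 3))
```

A score near 1 means almost every word in the shorter sequence also occurs in the other, so the database entry with the highest score is reported as the closest match.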
Determine why the organism is resistant to antibiotics. Your supervisor has asked the laboratory to
provide you with the sequence of a particular gene that she believes is involved in the ability of this
pathogen to resist antimicrobic therapy. This sequence is available below. Determine the identity of the
gene encoded by this sequence and investigate the mechanism by which it confers resistance.
a. Identify the unknown gene. Using the BLAST program discussed in lab (see links below), identify
the name of the unknown gene and answer the following questions:
I. From NCBI BLAST (http://www.ncbi.nlm.nih.gov/blast/) select protein blast near the
center of the page.
II. Paste your unknown sequence into the box and select the BLAST button near the
bottom of the screen.
III. Review the information and answer the following questions:
b. Questions:
I. What is the e-value of the match between your protein and its best match? What does
this tell you about these two proteins?
II. Is your protein identical to the best match? If not, how many of the amino acids are
exact matches?
III. What is the name of your unknown protein based on similarity with its best match?
IV. What is the name of the gene that makes your protein?
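The two quantities asked about above can be illustrated with a short sketch. It assumes the commonly quoted relationship E = m * n * 2^(-S'), where S' is the bit score, m the query length, and n the total database length. BLAST itself applies effective-length corrections, so its reported values will differ somewhat. The function names and the example numbers are hypothetical.

```python
def expected_hits(bit_score, query_len, db_len):
    """Approximate e-value: matches this good expected by chance in a search space m * n."""
    return query_len * db_len * 2.0 ** (-bit_score)


def count_identities(aligned_a, aligned_b):
    """Count alignment columns where both sequences carry the same residue (gaps excluded)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")


# Hypothetical numbers: a 600-residue query against a database of 1e9 residues.
for score in (30, 60, 120):
    print(score, expected_hits(score, 600, 1e9))

# Hypothetical aligned fragments; "-" marks a gap.
print(count_identities("MKK-IV", "MKRLIV"))
```

Each additional bit of score halves the e-value, so an e-value very close to zero means a match of that quality is extremely unlikely to arise by chance, which suggests the two proteins are related.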
c. Investigate protein domains. Using the PFAM program below, investigate any protein domains
in your unknown protein to help you determine its function.
I. From the PFAM programs at the Sanger center (http://pfam.sanger.ac.uk/) select the
Sequence Search link.
II. Paste your unknown sequence into the box and hit go.
III. Review the information and answer the following questions.
d. Questions:
I. What is a protein domain?
II. Evaluate the three best scoring domains. List each below along with the e-value of the
match and briefly describe its function.
i.
ii.
iii.
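When a search returns many domain hits, the best scoring ones are those with the smallest e-values. The sketch below shows one way to pick the top three from a list of hits copied by hand from a results page. The domain names and e-values are placeholders, not actual PFAM output, and the function name is hypothetical.

```python
def best_hits(hits, n=3):
    """Return the n hits with the smallest e-values, best first."""
    return sorted(hits, key=lambda hit: hit[1])[:n]


# Placeholder (name, e-value) pairs; replace with the values from your own search.
hits = [
    ("domain_A", 3.2e-40),
    ("domain_B", 1.5e-12),
    ("domain_C", 0.8),
    ("domain_D", 7.0e-25),
]

for name, e_value in best_hits(hits):
    print(name, e_value)
```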
e. Further investigation. You should work on this section at home. You will need to research the
function of the gene you identified using internet or other sources. Feel free to work with your
lab partner or in groups to answer these questions.
I. What class of antibiotics would you expect this pathogen to be resistant to?
II. How does this general type of protein work in normal cells?
III. How does the presence of this protein provide resistance to antibiotics?
IV. What antibiotics might work to treat this disease?
Part II: The human genome and disease. In this part of the laboratory you will investigate the link
between genetic alterations in the human genome and disease. Use the information found at
http://www.ncbi.nlm.nih.gov/disease/ and other sites you locate on the internet to answer the
following questions.
Questions:
1. What gene or genes are mutated in patients with Cystic Fibrosis?
2. How many mutations are associated with CF?
3. Which chromosome contains the gene whose alteration leads to CF?
4. What specific microorganisms are commonly associated with this disease?
5. How is this disease treated?
6. Select one other genetic disease (it does not need to be microbial in nature). Note the
chromosome or chromosomes involved, the gene or genes involved, the specific
mutation or mutations and briefly review the symptoms and treatment for the disease.
Resources:
Your unknown rDNA sequence:
TTTTATGGAGAGTTTGATCCTGGCTCAGGATGAACGCTGGCGGCGTGCCTAATACATGCAAGTCGAGCGAACGGACG
AGAAGCTTGCTTCTCTGATGTTAGCGGCGGACGGGTGAGTAACACGTGGATAACCTACCTATAAGACTGGGATAACT
TCGGGAAACCGGAGCTAATACCGGATAATATTTTGAACCGCATGGTTCAAAAGTGAAAGACGGTCTTGCTGTCACTA
TAGATGGATCCGCGCTGCATTAGCTAGTTGGTAAGGTAACGGCTTACCAAGGCAACGATGCATAGCCGACCTGAGAG
GGTGATCGGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCGCAATGGG
CGAAAGCCTGACGGAGCAACGCCGCGTGAGTGATGAAGGTCTTCGGATCGTAAAACTCTGTTATTAGGGAAGAACAT
ATGTGTAAGTAACTGTGCACATCTTGACGGTACCTAATCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTA
ATACGTAGGTGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGCGCGCGTAGGCGGTTTTTTAAGTCTGATGTGAAA
GCCCACGGCTCAACCGTGGAGGGTCATTGGAAACTGGAAAACTTGAGTGCAGAAGAGGAAAGTGGAATTCCATGTGT
AGCGGTGAAATGCGCAGAGATATGGAGGAACACCAGTGGCGAAGGCGACTTTCTGGTCTGTAACTGACGCTGATGTG
CGAAAGCGTGGGGATCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGTGTTAGGGGG
TTTCCGCCCCTTAGTGCTGCAGCTAACGCATTAAGCACTCCGCCTGGGGAGTACGACCGCAAGGTTGAAACTCAAAG
GAATTGACGGGGACCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAAATCTTG
ACATCCTTTGACAACTCTAGAGATAGAGCCTTCCCCTTCGGGGGACAAAGTGACAGGTGGTGCATGGTTGTCGTCAG
CTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTAAGCTTAGTTGCCATCATTAAGTTGGGCA
CTCTAAGTTGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGATTTGGGC
TACACACGTGCTACAATGGACAATACAAAGGGCAGCGAAACCGCGAGGTCAAGCAAATCCCATAAAGTTGTTCTCAG
TTCGGATTGTAGTCTGCAACTCGACTACATGAAGCTGGAATCGCTAGTAATCGTAGATCAGCATGCTACGGTGAATA
CGTTCCCGGGTCTTGTACACACCGCCCGTCACACCACGAGAGTTTGTAAACCCGAAGCCGGTGGAGTAACCTTTTAG
GAGCTAGCCGTCGAAGGTGGGACAAATGATTGGGGTGAAGTCGTAACAAGGTAGCCGTATCGGAAGGTGCGGCTGGA
TCACCTCCTTTCT
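Both sequences in this section are wrapped across many lines, and stray spaces or position numbers can slip in when copying. The sketch below is one possible way to collapse a wrapped sequence into a single string, check it for unexpected characters, and write it in FASTA format before pasting it into a web form. The function names and the record name unknown_rDNA are hypothetical.

```python
import re


def clean_sequence(raw):
    """Remove whitespace and digits, then upper-case what remains."""
    return re.sub(r"[\s\d]+", "", raw).upper()


def invalid_characters(seq, alphabet):
    """Return the sorted characters in seq that are not in the allowed alphabet."""
    return sorted(set(seq) - set(alphabet))


def to_fasta(name, raw, width=60):
    """Format a sequence as a FASTA record with lines of the given width."""
    seq = clean_sequence(raw)
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines)


# The first 20 bases of the rDNA sequence above, with a stray number and spacing added.
raw = " 1 ttttatggag agtttgatcc\n"
seq = clean_sequence(raw)
print(seq, len(seq))
print(invalid_characters(seq, "ACGT"))  # an empty list means only A, C, G, T remain
print(to_fasta("unknown_rDNA", raw, width=10))
```

For the protein sequence, the same check can be run with the 20-letter amino acid alphabet "ACDEFGHIKLMNPQRSTVWY" in place of "ACGT".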
Sequence of unknown protein provided by your supervisor:
mkkikivpli livvvvgfgi yfyaskdkei nntiwaiedk nfkqvykdss yisksdngev
emmereikiy nslgvkdini qdrkikkvsk nkkrvdaqyk iktnygnidr nvqfnfvked
gmwkldwdhs viipgmqkdq sihienlkse rgkiwdrnnv elantgtaye igivpknvsk
kdywaiakel iieedyikqq mqqawvqddt fvplktvkkm deylsdfakk fhlttnetes
rnyplgkats hllgyvgpin seelkqkeyk gykddavigk kgleklydkk lqhedgyrvt
ivddnsntia htliekkkkd gkdiqltida kvqksiynnm kndygsgtai hpqtgellal
vstpsydvyp fmygmsneey nvltedkkep llnkfqitts pgstqkilta miglnnktld
dktsykidgk wwqkdkswgg ynqtryevvn gniqlqqaie ssdfiffarv alelgskkfe
kgmkklgvge diesdypfyn aqisnknldn eilladsgyg qgeilinpvq ilsiysalen
ngninaphll kdtknkvwkk niiskeninl ltmgmmqvvn kthkediyrs yanligksgt
grqigwfisy dkdnpnmmma invkdvqdkg masynakisg kvydelyeng nkkydide