Medical Problem Solving Anemia

Medical Problem Solving Case Study
Medical Problem Solving: A Genetic Link to Anemia
Part 1: Molecular Diagnosis
Imagine you are a pediatrician at the Clinic for Special Children. A young Mennonite boy, John, comes to see
you. He has diarrhea, headaches, fatigue, a sore mouth and tongue, and tingling of the hands and feet.
His skin is noticeably pale. You suspect John may be suffering from anemia, so you order a complete blood
count (CBC) test. It shows that he has low hemoglobin levels—John has anemia.
Use the OMIM database (Online Mendelian Inheritance in Man:
http://www.ncbi.nlm.nih.gov/sites/entrez?db=omim) to see if you can diagnose John’s condition. OMIM is a
database of human diseases and genes. Search the term “anemia”.
1. How many possible diseases could John have?
• 600+ accessions
Clearly you need more information because this list is HUGE. You request additional blood work.
Examination of a blood smear under the microscope shows larger than normal erythrocytes (red
blood cells) called megaloblasts. You conclude that John is suffering from megaloblastic anemia.
Additional blood work shows normal levels of folate, but low levels of cobalamin (vitamin B12).
Use the OMIM database to see if you can diagnose John’s condition with this additional information. Search
the terms “megaloblastic anemia and cobalamin”.
(Image from http://jcp.bmj.com/content/vol56/issue4/cover.dtl)
OMIM User Tips:
Use Boolean search terms like AND, OR, and NOT to narrow your search.
2. How has your list of possible diseases changed?
You also note that John does not appear to suffer from:
• Homocystinuria (symptoms include dislocation of eye lens, skeletal malformations)
• Methylmalonic acidemia (symptoms include feeding issues, kidney disease, pancreatitis)
Search the OMIM database again to see if you can diagnose John’s condition. This time, limit the search (see
below) and search “megaloblastic anemia and cobalamin not homocystinuria not methylmalonic acidemia”.
3. How many possible diseases are listed now?
7
Next you will limit the search by following the directions in the box below.
OMIM User Tips:
• Limits: by using the LIMITS tab in the top left, you can limit your search to genes where the phenotype (disease)
and genotype (genes, markers or sequence) are known.
• Prefixes: Names in OMIM are preceded by symbols that code for how much is known about the genotypes and
phenotypes.
• Known genes: You may want to limit your search to genes where the phenotype (disease) and genotype
(sequence or molecular basis) are known. Check the box beside (#). Since we’re trying to link phenotype and
genotype, situations where only one is known won’t help us.

4. How many possible diseases are listed now?
You use an antibody test and it rules out intrinsic factor deficiency.
Click on your candidate disorders in the OMIM entry. Each click will open up a detailed entry about the
particular disorder. In the screenshot below, note that the gene locations and gene links are listed. Click on the
gene links for the genes that are designated as possible causes of the disorder. As seen in the second
screenshot, click on Genome under “Table of Contents” and then click on the “NCBI Map Viewer”.
• Four possible genes
Investigate each candidate gene using Map Viewer. Click on the chromosome where the gene is located (in
red). You will then see an image that looks like the screenshot below.
5. Fill in the first three columns of Table 1 below with the gene name, genetic location, and physical
location for each candidate gene.
Tips about location, location, location…
There are at least two ways to describe where a gene is located.
1. Genetic location: based on recombination frequencies and chromosome bands. For example, the AMN gene is
found at 14q32.
• “14” refers to the chromosome.
• “q” refers to the long arm of the chromosome. The short arm is “p”.
• “32” refers to region 3, band 2 of that arm on the stained chromosome, counted outward from the centromere.
2. Physical location: counted in nucleotides from the beginning of the chromosome. The AMN gene is found on
chromosome 14 at 102.45 Mb.
• The gene begins 102.45 million bases from the beginning of the short arm of chromosome 14.
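The two kinds of location in the tips above can also be handled programmatically. The following is a minimal sketch, assuming band names of the simple form used in this case study (chromosome, arm, one region digit, one band digit, optional sub-band) and reading the digits after the arm as region then band. The function names `parse_location` and `megabases_to_bases` are hypothetical and chosen only for illustration.

```python
import re

# Assumed, simplified subset of cytogenetic notation: "14q32", "10p12.1", ...
LOCATION_PATTERN = re.compile(r"(\d{1,2}|X|Y)([pq])(\d)(\d)(?:\.(\d+))?")


def parse_location(location):
    """Split a band name into chromosome, arm, region, band and sub-band."""
    match = LOCATION_PATTERN.fullmatch(location)
    if match is None:
        raise ValueError(f"Unrecognized location: {location!r}")
    chromosome, arm, region, band, sub_band = match.groups()
    return {
        "chromosome": chromosome,
        "arm": "short (p)" if arm == "p" else "long (q)",
        "region": int(region),
        "band": int(band),
        "sub_band": sub_band,  # None when there is no ".x" suffix
    }


def megabases_to_bases(megabases):
    """Convert a physical location given in megabases to a base count."""
    return round(megabases * 1_000_000)


print(parse_location("14q32"))
print(megabases_to_bases(102.45))
```

The same parser can be applied to the other candidate locations in Table 1, such as 11q13 and 10p12.1.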
Table 1: Candidate Gene Information

Candidate gene | Genetic location | Physical location (Mb) | Number of homozygous markers | Size of homozygous region (bp)
GIF            | 11q13            |                        | 3                            | 873,898
CUBN           | 10p12.1          |                        | 3                            | 4,981,811
Now that you have your candidate genes, how are you going to determine which (if any) of these genes is
responsible for John’s condition? You will need to collect more data.
• Use genetic marker information from the microarrays to look for regions of homozygosity.
Part 2: Which gene is it?
You now have several candidate genes. To figure out which (if any)
gene is causing John’s condition, you need to survey John’s genome. At
the Clinic for Special Children, you would request that Dr. Erik
Puffenberger run a “10K microarray” in the lab. This will pinpoint
regions of the genome where John is homozygous, meaning he has two
copies of the same allele.
Why does homozygosity matter? Many genetic disorders are recessive,
meaning an affected individual carries two copies of the “broken” gene.
This gene then goes on to make a defective version of a protein that
does not function normally in the cell. We can predict that the “broken”
gene resides in a region of John’s genome that is homozygous.
Open the Excel file called “Microarray Data Med Prob Example.” The
title row will remain frozen so you can retain it while scrolling through
this vast amount of data. Use the tips below to help understand what
each column of data represents.
A 10K what?
A “10K microarray” (also known as a gene chip) is a way of looking at 10,000 markers simultaneously. The
markers are single nucleotide polymorphisms (SNPs). These single base pair differences are spread
throughout the genome. Analyzing SNPs can help identify genetic problems, including homozygosity.
Scroll through the data until you find the markers around the candidate genes. (Hint: use the physical
locations you recorded in Table 1.) Fill in the last two columns in Table 1 with the number of
homozygous SNP markers and the size of the homozygous region in base pairs. To find the size, take the
two outermost positions (on either side of the homozygous area) and subtract the smaller from the larger.
SNP Excel Tips
• The SNP name is useful for cross referencing databases, but isn’t important for you.
• The chromosome column tells you where the SNP marker is located.
• The position corresponds to the physical map and represents the number of base pairs from the
beginning of the short arm of the chromosome.
• The alleles tell you whether John is homozygous or heterozygous.
• Homozygous markers (AA or BB) are highlighted in yellow. They have two copies of the same
allele.
• Heterozygous markers (AB) are highlighted in green. They have two different alleles.
• The number of homozygous SNP markers in a row is important for finding blocks of homozygosity, which
are probable locations for John’s problem gene.
• The LOD score represents the base-10 logarithm of the odds ratio (likelihood linked/likelihood unlinked).
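The scan for blocks of homozygosity described above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the marker rows below are invented toy data, not values from the Excel file. Rows are assumed to be sorted by chromosome and position, and block size is assumed to be measured from the first to the last homozygous marker in a run. The names `homozygous_blocks` and `summarize` are hypothetical.

```python
from itertools import groupby

HOMOZYGOUS = {"AA", "BB"}


def summarize(chromosome, run):
    """Describe one unbroken run of homozygous marker positions."""
    return {"chromosome": chromosome, "start": run[0], "end": run[-1],
            "markers": len(run), "size_bp": run[-1] - run[0]}


def homozygous_blocks(markers):
    """Find runs of consecutive homozygous markers on each chromosome.

    markers: (chromosome, position_bp, genotype) tuples, assumed to be
    sorted by chromosome and then by position.
    """
    blocks = []
    for chromosome, rows in groupby(markers, key=lambda row: row[0]):
        run = []  # positions in the current unbroken homozygous run
        for _, position, genotype in rows:
            if genotype in HOMOZYGOUS:
                run.append(position)
                continue
            if run:  # a heterozygous marker ends the current run
                blocks.append(summarize(chromosome, run))
            run = []
        if run:  # a run that reaches the last marker on the chromosome
            blocks.append(summarize(chromosome, run))
    return blocks


# Invented toy rows for illustration only.
markers = [
    ("10", 16_000_000, "AA"),
    ("10", 16_800_000, "AB"),
    ("14", 101_000_000, "AB"),
    ("14", 101_500_000, "AA"),
    ("14", 102_000_000, "BB"),
    ("14", 102_900_000, "AA"),
    ("14", 103_400_000, "AB"),
]

blocks = homozygous_blocks(markers)
largest = max(blocks, key=lambda block: block["size_bp"])
print(largest)
```

Comparing the `size_bp` values of the blocks that overlap each candidate gene mirrors what you do by eye in the spreadsheet.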
6. Where do you hypothesize you will find the problem gene? Why? What do you do next? Hint: Think
homozygous recessive.
Part 3: Sequencing and Sequence Analysis
The next step is to sequence the hypothetical problem gene. You should have chosen the AMN gene because
it is in the largest block of homozygosity in any of the candidate regions.
After sequencing John’s DNA, you receive the following electropherogram and sequence (253 bp in length):
CCGCCCCTCGCACCAGGCGCAGCCGTGGATCTGCGCGGCCCTGCTCCAGCCCCGGCCCAGGGGCAGTGCTGTGACCTCTG
TGGTAAGCGCCCCCGCCGGGCCCTGCTTGCTGGGAAGGCCTGGAGGACCAGGTTCGTCCCCCGCCTCAGTTTCCTGCCGG
GCCCGGATCCACGGCGCTGACCCCTGCCCTCCCGCCGCAGGAGCCGTTGTGTTGCTGACCCACGGCCCCGCATTTGACCT
GGAGCGGTACCGG
In order to determine where a potential mutation may be located in John’s AMN sequence, you must compare
it to a reference sequence. This can be done using a sequence comparison tool called BLAST. BLAST takes
your sequence of interest and compares it to more than ten million sequences from different species that are
stored in the public database (NCBI). BLAST will show how your sequence aligns with other sequences. It also
calculates the statistical significance of similarity for the matches it finds.
What is a reference sequence?
A reference sequence is a baseline for comparison. It is a composite of many different genetic sequences that
best approximates the average genome for a species.
BLAST John’s sequence above by going to http://blast.ncbi.nlm.nih.gov/Blast.cgi and clicking on the Nucleotide
BLAST (BLASTn) option. BLAST can work with proteins as well, but of course you are working with DNA. Next
copy and paste the sequence into the white box under “enter query sequence”.
Click on the blue BLAST button at the bottom of the page.
Read “Mining the BLAST data” in the box and look at the screen shot at the bottom of this page. Analyze the
alignment data. The Query is John’s sequence.
Mining the BLAST data
The results are presented in three formats: the Graphic Summary, the Descriptions and the Alignments.
• The Graphic Summary is a visual summary of all the alignments and their quality.
  - The red bar represents the length of John’s sequence (called the “Query”).
  - The thin colored lines represent matches from the database.
  - The line color corresponds to the alignment score (S), which measures the closeness of fit to
    John’s sequence. The higher the score, the better the alignment.
  - Mouse over one of these sequences to see its name, the alignment score (S) and expect value (E).
  - The expect value (E) describes the number of hits one can “expect” to see by chance when
    searching a database of a particular size. As the S score increases, the E value decreases
    exponentially. Therefore, in a good alignment, the E value will approach zero.
• The Descriptions give a summary of all the matches in a table.
• The Alignments show the base-by-base pairing between John’s sequence (“Query”) and the database
  sequence (“Sbjct”).
  - Get there by either clicking on one of the colored lines in the Graphic Summary or by scrolling
    further down the results page.
  - Look for 100% coverage. You set out to sequence a whole gene. Your best match will likely fit the
    whole sequence.
  - Pay close attention to the base pair numbers at the beginning and end of each line of matched
    sequence. They are nucleotide positions, and a jump in the numbering reveals a gap.
• Matches are to the AMN gene that you chose to sequence. That’s good. But it’s showing the results in little
pieces. The best match doesn’t appear because the stringency is set too high.
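The inverse relationship between S and E noted in “Mining the BLAST data” can be written out explicitly. As a sketch, in the Karlin-Altschul statistics on which BLAST's E values are based, the expected number of chance hits with a raw score of at least S is given below. The symbols m, n, K and λ do not appear in this handout and are introduced here only for illustration: m is the query length, n is the total length of the database, and K and λ are constants that depend on the scoring system.

```latex
E = K \, m \, n \, e^{-\lambda S}
```

Because S sits in the exponent, every increase in S shrinks E by a constant factor, which is why strong alignments have E values very close to zero, and why a larger database (larger n) makes any given score less significant.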
7. Are any of the sequences good matches? Do the sequences match the whole gene or only part of it?
Remember that you began this section by sequencing a candidate gene. Therefore, the best match will be the
reference sequence for the gene you sequenced. It will likely cover 100% of John’s sequence.
8. The default is to BLAST with very high stringency—allowing only the tightest matches to appear. Why
might the beginning of John’s sequence be left out?
9. How might lowering the stringency of your BLAST analysis affect the results?
• Allows the big deletion to appear.
To adjust the stringency of the BLAST search, or optimize it, go back to the screen where you entered John’s
sequence by clicking on “Edit and Resubmit” at the top-left of the page. At the bottom of the page, choose
“More dissimilar sequences” and look at how your results change with a less stringent BLAST analysis. Next
run another BLAST analysis at the least stringent condition, using the “Somewhat similar sequences” option.
10. What are the differences between the results from your three BLAST analyses?
11. Is there a problem with John’s AMN gene? If so, what?
Next you will use another nucleotide comparison tool called BLAT. BLAT essentially does the same thing as
BLAST, but compares only human DNA sequences. Go to http://genome.ucsc.edu and click on BLAT at the
top. Copy and paste John’s DNA sequence into the top, white box and hit submit.
The BLAT results will be listed with some important information, as described in the box below.
BLAT data
• The score is the number of matches vs. mismatches. The higher the score, the more matches. If the
score is close in number to the size of the DNA sequence query, this indicates a good match.
• Be sure to check the chromosome number. If you have identified a chromosome with an area of
interest, that should be your first place to start.
• The identity number may be high, but only for a few matches.
• Matching bases in cDNA and genomic sequences are colored blue and capitalized. Light blue bases
mark the boundaries of gaps in either sequence (often these are splice sites, but could also be
deletions).
AMN gene
Click on the details of the sequence that you feel best fits your query. Look at the alignment to help figure out
what is wrong with John’s gene.
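To see how a deletion shows up as a gap in an alignment, here is a minimal sketch using Python's standard `difflib` module in place of a real aligner such as BLAST or BLAT. The two sequences are invented toy data, not John's sequence or the AMN reference, and `find_gaps` is a hypothetical helper name. Coordinates are 0-based, with the end position excluded.

```python
from difflib import SequenceMatcher


def find_gaps(reference, query):
    """Return (ref_start, ref_end, missing_bases) for each stretch of the
    reference that has no counterpart in the query."""
    matcher = SequenceMatcher(None, reference, query, autojunk=False)
    gaps = []
    for tag, ref_start, ref_end, _, _ in matcher.get_opcodes():
        if tag == "delete":  # present in the reference, absent from the query
            gaps.append((ref_start, ref_end, reference[ref_start:ref_end]))
    return gaps


# Invented toy reference; the query is the same sequence with 10 bases removed.
reference = "ATGGCCTTAGGCTACGATCGGAGTACAGGCTTAACCGGTAGC"
query = reference[:12] + reference[22:]

for ref_start, ref_end, missing in find_gaps(reference, query):
    print(f"Reference bases {ref_start}-{ref_end} are missing from the query: {missing}")
```

In a BLAST or BLAT alignment, the same situation appears as a jump in the reference coordinates while the query coordinates continue without interruption.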
12. Does the BLAT search confirm your conclusions made from your BLAST search?
13. What are the advantages/disadvantages of using BLAT?
You have now successfully found the genetic cause of John’s disorder. However, nothing you’ve done yet has
actually helped John feel better. To help John, you must now formulate a treatment strategy. A good place to
start is to look again at the OMIM entries for megaloblastic anemia 1 and the AMN gene and read about the
function of the gene in humans.
Megaloblastic Anemia 1: http://www.ncbi.nlm.nih.gov/omim/261100
AMN gene: http://omim.org/entry/605799
14. Biochemically, what effect does the AMN gene mutation have on John? Which tissues are affected?
15. Brainstorm some possible treatments that could help John.