Autosomal Dominant Diseases:

Familial Hypercholesterolemia:
Understanding the Molecular Biology
Authors: John Sabo & Kevin Messner
This paper will discuss the inherited disease familial hypercholesterolemia. The overall
purpose is to teach the user how to search protein databases using the Biology
Workbench, a suite of bioinformatics tools, to conduct research. This tutorial uses tools
that range in complexity, and it is designed to allow the user to walk through the research
process, step-by-step, in order to get the desired results and achieve understanding of the
molecular biology of this disease.
Familial hypercholesterolemia is a genetic disorder in which cholesterol levels in
the blood are very high. There are several genes which can affect cholesterol levels. The
type we will study is an autosomal recessive genetic disorder. This means that only
people with two recessive alleles of this gene are unable to transport cholesterol to the
liver cells that metabolize it. Cholesterol is carried through the bloodstream in a low-density lipoprotein (LDL, a particle containing protein and lipid). In normal people, the
lipoprotein binds to a specific receptor on the liver cell surface and is then taken up into
the cell by endocytosis.
Some people who have the disorder lack a fully functional lipoprotein, named
Apolipoprotein E, due to a point mutation in the genetic sequence that encodes the
protein. In the point mutant, an arginine amino acid in position 158 is mutated into a
cysteine amino acid. This changes the conformation of the protein, and the cholesterol-carrying lipoprotein cannot enter the cell because it cannot successfully attach itself to the
receptor protein on the liver cell. The excess cholesterol can build up on the inner walls
of the blood vessels, which can lead to blockage in the arteries, causing a heart attack, or
a blood clot in the brain, leading to a stroke. There are also other genetic problems that
cause this disease, such as mutations in the lipoprotein receptor, but in this tutorial we
will deal only with Apolipoprotein E.
PART I:
Opening an Account
If you already have a Biology Workbench account, go ahead and log into the program. If
not, this is what you need to do:
Go to the Biology Workbench homepage:
http://workbench.sdsc.edu
Click on the link that says, “Set up a free account”. Fill out the information requested.
After you submit the necessary information, click on “Register”. Type in your user
name and password and click on “OK”.
You will then be brought to the homepage for the Biology Workbench.
Scroll down to the bottom and select the background color you want. (We recommend
rose because this makes it easier to see the blue and green colors used later when aligning
sequences.)
Note: There are a variety of different layouts for the Biology Workbench. You can toggle
between the different layouts by clicking on the Biology Workbench logo at the top of the
screen – click on it until you find the format that fits the layout of this tutorial.
Starting a New Session:
As you can see, there are a number of different tool domains supplied by the Workbench:
“Session Tools”, “Protein Tools”, “Nucleic Tools”, “Alignment Tools”, and
“Structure Tools (Alpha)”. We will use "Session Tools" and "Protein Tools" in this
experiment. To remain organized, when you enter or leave the Workbench, you should
create a new session for every different topic you research.
1) Click on “Session Tools”. Highlight “Start New Session” and click on “Run”.
2) The following screen will require that you name this session. Call this session
“Hypercholesterolemia”. Then click on “Start New Session”.
Your new “Hypercholesterolemia” session will appear right below your “Default
Session”. You can click back and forth between the default and Hypercholesterolemia
sessions. However, make sure for the remainder of this exercise that the
“Hypercholesterolemia” session is selected.
In this tutorial, you are going to learn how to use the tools that will allow you to
search protein databases and analyze protein sequences imported into the Workbench.
Clicking on the box that says “Protein Tools” at the top of the page will bring you to the
Protein Tools homepage.
The new page says it is “empty”, that is, no protein sequences have been saved here yet.
Importing sequences from any number of sequence databases will change this and the
word “empty” will disappear; in its place will be the list of sequences that were imported.
PART II:
Importing Sequences and Viewing Structures from Protein Databases
In this section, you will learn how to search databases for a protein sequence. As
mentioned above, you will be working with apolipoprotein E to study this form of
hypercholesterolemia.
Notice the scrollable textbox at the top of the page… this box contains a variety
of tools, some of which you will now explore. Highlight “Ndjinn – Multiple Database
Search” and click “Run”.
The next screen lists the different databases that you can search. In the search box at the
top of the page, type: “lipoprotein”. This tells the search engine what to look for. Also,
notice the box to the right of the input box. This simply allows you to decide how many
sequences you want to display. For this exercise, you want to see all of the sequences
that are found, so select “All”.
Scroll down the page. Below the input box, you will see a list of many different
databases, all containing a variety of sequence information. The databases are separated
into two distinct groups: the first group contains sequences from many different
organisms (for example, “GBBCT” contains a large number of sequences from many
different bacteria), whereas the second group contains the entire genome of specific
organisms (for example, “Mthe” contains the entire genome sequence of the bacterium
Methanobacterium thermoautotrophicum).
Click on the box that is next to the PDBFinder Database. This database contains
protein sequences for which a crystal structure has been determined, so that a 3-D picture
of the protein can be visualized. A little later we will look at some of these pictures.
Scroll back up to the top of the screen and click on “Search”.
You will then be sent to a page that contains the results of your search. At the time this
tutorial was written, the search engine found 53 matches for “lipoprotein”. If you get
more than 53 results, do not panic. Inconsistencies in the number of search results can
occur because new sequences are being added to the databases on a daily basis.
From the descriptions of the search results, we need to determine which one is the
wild type sequence for apolipoprotein E. In order to find the correct
sequence, we will need to check the records of the sequences to find out which sequence
is wild type. To do this, highlight all of the sequences in the results list with your mouse.
Once all of the sequences are highlighted, click on the button below that says “Show
Records”. This tool will give a detailed description of each sequence that is highlighted.
If you are using Netscape, a new window will appear; in Internet Explorer the browser
will move to a new page. Scroll down this page until you can locate the wild type
sequence for Apolipoprotein E. Specifically, we want the "LDL receptor-binding
domain" of the protein. If you have correctly identified the wild type protein, you should
have picked the following:
This file (1LPE) is the wild type, or normal, protein sequence for the lipoprotein
found in people that do not have hypercholesterolemia. This is the protein that we want
to work with. We need to do two things: first, look at a model of the protein structure, and
second, download the amino acid sequence of the protein into our Biology Workbench
session. Let's start with the structural model of this protein. Click on the link to "PDB
Structure Explorer" in the upper-right corner of the entry for our protein.
This will bring up a new window with a page from the Protein Databank web site. Click
on the link to "View Structure" on the left side of this page:
You will come to a page with several options for viewing the protein crystal structure.
We will look at a still image of the protein. Scroll down and click on the link that reads
"Ribbons (250 x250)":
This will take you to a ribbon diagram of the protein. In this diagram you can't see
individual atoms or amino acids, but you can see the overall shape of the protein
molecule. Note the "corkscrews" that make up most of the protein chain. These
represent alpha helices, a common kind of protein structure.
So that later on you can compare this picture side-by-side with a picture of the
mutant protein, copy this picture into an application like Word. To do this, RIGHT click
on the picture itself, and then click on the "Copy" command that comes up in the
command box:
Next, open Word, and use the command bar to go to Edit --> Paste (or press "CTRL" and
"v" simultaneously):
The picture should appear on your Word document as shown below. Be sure to label
your picture so you will remember what it is and not confuse it with the next picture.
Now that we've got the picture of the protein, next we want to import the sequence of this
protein from the Protein Databank into the Biology Workbench. Keep the Word
document open so we can look at it again later, and go back to the Biology Workbench
screen with the Ndjinn search results. (Note: If you are using Netscape, you can close the
pop-up Biology Workbench window with the full records. If you are using Internet
Explorer, you will need to hit the "back" button on the browser to get back to the search
results page).
Highlight the correct sequence: “pdbfinder:1lpe – lipoprotein”. Now click on
the button at the bottom that says “Import Sequence(s)”. This will bring the wild type
protein sequence for the lipoprotein (the sequence we just highlighted) into the protein
tools homepage for further investigation.
PART III:
Finding the Mutant Lipoprotein Sequence: Using BLAST
Now that you have the wild type lipoprotein sequence, you can use it to find the
mutant lipoprotein sequence that causes familial hypercholesterolemia. Because we
know that the mutant lipoprotein sequence differs from the wild type sequence by a single
amino acid, we can use the wild type sequence to search a database for sequences that are
extremely similar (or homologous) to it. In order to carry out the homology search, you
are going to use a tool called BLAST.
Scroll down the textbox and look for the tool called “BLASTP – Compare a PS
to a PS DB”. (This is an abbreviation for “Compare a Protein Sequence to a Protein
Sequence Database”.) Select this tool and ensure that your 1LPE lipoprotein sequence
has a checkmark in the box next to it – then click on the “Run” button.
You will be sent to a screen that will give you many options. These options allow
the user to fine-tune the search. For the purposes of this exercise, we do not need to
change them. The important step here is to choose the database you want to use for your
homology search. Scroll down in the textbox until you come to the “PDBFinder”
database. We will use this database again since it is the one that we started with.
Highlight this database.
Now scroll to the bottom and click on “Submit”.
You will be sent to the following screen that contains the results of your BLASTP search:
Now, you will need to find the mutant sequence that is responsible for causing familial
hypercholesterolemia. But, before you do this, look at the numbers on the right-hand side
of the results box. In the “Score (bits)” column, you will see that the
first few sequences have very high values. As a rule of thumb, a Score (bits) value above 200
means that the sequence is highly similar to the sequence that you are comparing it
to. However, you can be more certain of the significance of the match between two
sequences by looking at the “E Value”. This is the number right next to the “Score
(bits)” number.
E Value
The E Value or “Expect” value is the most intuitive, or instinctive, way to rank
the results of a search. The E Value estimates the statistical significance of the search
result by specifying the number of matches with a given score that could be expected to
occur purely by chance in a search of a database of a particular size. For example, an
Expect value of 2.0 would indicate that two matches with that particular score would be
expected to occur purely by chance. The expected value changes with the size of the
database (in a larger database, more chance matches with a given score are expected).
Search results with E values much higher than 0.1 are unlikely to reflect true sequence
relatives, but in some circumstances they are useful. Essentially, the smaller the E
value, the less likely it is that the match arose by chance, and so the more likely it is
that the sequence is truly related to the query sequence used in the BLAST search. An E
value of zero indicates that essentially no matches would be expected by chance; this
would represent a perfect or near perfect match.
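The way the E value depends on the score and on the database size can be sketched in a few lines of Python. This is only a rough illustration, assuming the standard relation E = m × n × 2^(−S), where S is the bit score, m is the query length, and n is the total length of the database. The function name and the numbers below are made up for the example and do not come from the Workbench; BLAST itself applies further corrections to the lengths.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate number of chance matches scoring at least bit_score."""
    # Search space (query length x database length) scaled by 2^(-bit score).
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical numbers: one query searched against two databases of different size.
small_db = expect_value(bit_score=30.0, query_len=144, db_len=1_000_000)
large_db = expect_value(bit_score=30.0, query_len=144, db_len=10_000_000)

print("E value in the smaller database:", small_db)
print("E value in the larger database: ", large_db)
```

With the score held fixed, the larger database gives a proportionally larger E value, and raising the bit score shrinks the E value very quickly.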
Now it is time to decide which sequence is the mutant lipoprotein sequence. The
mutation that causes familial hypercholesterolemia is a point mutation in the protein
sequence that changes the amino acid arginine in the 158th position to the amino acid
cysteine. Find the mutant in these results that matches this description. You can do this
by checking the records like we did a bit ago.
The correct sequence record is shown below:
Notice that this is a mutant that replaces arginine at position 158 with a cysteine.
That is the one we want!!
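A point mutation written in shorthand such as "R158C" (arginine at position 158 replaced by cysteine) can be applied to a sequence with a small Python helper. This is a minimal sketch: the function name, the `first_residue_number` parameter, and the short peptide are invented for illustration, and the peptide is not the apolipoprotein E sequence.

```python
import re


def apply_point_mutation(seq: str, mutation: str, first_residue_number: int = 1) -> str:
    """Apply a substitution such as 'R158C' to a one-letter protein sequence."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    old, number, new = match.group(1), int(match.group(2)), match.group(3)
    # Convert the residue number to a 0-based index into the string.
    index = number - first_residue_number
    if not 0 <= index < len(seq):
        raise ValueError(f"position {number} is outside the sequence")
    if seq[index] != old:
        raise ValueError(f"expected {old} at position {number}, found {seq[index]}")
    return seq[:index] + new + seq[index + 1:]


toy_wild_type = "MKVLRAG"  # made-up peptide, NOT apolipoprotein E
toy_mutant = apply_point_mutation(toy_wild_type, "R5C")
print(toy_wild_type)
print(toy_mutant)
```

The check on the original residue is useful because a database record may hold only a fragment of the protein, in which case the first letter of the record is not residue 1 and `first_residue_number` has to be set accordingly.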
Like before, we want to take a look at this protein structure. Click on the "PDB
Structure Explorer" link at the top right of this record. On the window that comes up
click on the "View Structure" link on the left side. On the next screen click the link that
says "Ribbons (250 x250)." Like before, you can copy the picture that comes up to your
Word document. Right click on the picture, copy, then switch to the Word document you
still have up. Put the new picture beneath the first one. Make sure the cursor is at the
bottom of the Word document, then paste the picture in. Put in a caption for the second
picture; make sure to use the word "mutant" in this label.
Now, without the captions, could you tell the two proteins apart? They look
practically identical -- even though one of them is a mutant, and we know the mutant
protein doesn't function correctly! What's going on? Let's go back to the Biology
Workbench to see if we can figure this out.
If you're using Netscape, you can close the extra Biology Workbench window
with the detailed protein records, and just keep open the window with the list of
"sequences producing significant alignments." If you're using Internet Explorer, you'll just need
to use the "back" button on the browser to get to the sequences screen.
We want to import the sequence for the mutant protein. This is:
(1LE2_LIPOPROTEIN|Apolipoprotein – E2_(LDL_receptor_binding_d…). Once
the sequence is highlighted, click “Import Sequence(s)”.
Now, the mutated sequence that causes the disease hypercholesterolemia is stored
in the Protein Tools homepage along with the wild-type sequence. We will now further
investigate where the mutation occurs in the mutant sequence of the lipoprotein, and try
to figure out why the mutant protein doesn't work.
PART IV:
Aligning the sequences: Using CLUSTALW
In order to compare two protein sequences side-by-side, they must be aligned one
on top of the other. This is the purpose of the CLUSTALW tool. The alignment process
takes place by comparing the two sequences and finding common regions within them.
The Biology Workbench then uses an algorithm to compute the most likely position in
which the two sequences line up. Therefore, alignment is a key step for you to determine
where a mutation is located. The two sequences are aligned one on top of the other and a
color-coding system is used to differentiate highly conserved regions and semi-conserved
regions (in royal blue and green, respectively) from the non-conserved regions. On to the
alignment…
Scroll down in the text box menu and highlight “CLUSTALW – Multiple
Sequence Alignment”. Click on the two lipoprotein sequences, 1LPE and 1LE2. Once
there are checkmarks in the boxes next to the sequences, click on the “Run” button.
Another screen will appear in which the alignment parameters can be altered – we
are going to use the default settings so just click on the “Submit” button. You will now
be taken to a screen that will show you the aligned sequences.
Scroll down until you see the alignment. Notice that the wild type
sequence is located on the bottom row and the mutant sequence is located on the top row.
The letters that you see each represent an amino acid in the protein sequence. Almost all
of the amino acids in both protein sequences match up perfectly… You can see this by
the royal blue color (see consensus key at the top). However, there is a single position in
which the two protein sequences do not match, and are color-coded black instead of royal
blue. This is the site of the mutation. This is where the arginine amino acid was replaced
by a cysteine amino acid in the mutant sequence. So you can easily see here that the
mutation is near the end (or "C-terminus" in biochemistry jargon) of the protein
sequence.
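Once two sequences are aligned, spotting the mutation amounts to walking along the columns and reporting where the letters differ. The Python sketch below shows the idea on two short made-up peptides (not the real 1LPE and 1LE2 sequences); the function name is hypothetical, and the inputs are assumed to be already aligned and of equal length, as they are in the CLUSTALW output.

```python
def find_mismatches(aligned_top: str, aligned_bottom: str):
    """Return (column, top_residue, bottom_residue) for each differing column."""
    if len(aligned_top) != len(aligned_bottom):
        raise ValueError("aligned sequences must have the same length")
    return [
        (column, top, bottom)
        for column, (top, bottom) in enumerate(zip(aligned_top, aligned_bottom), start=1)
        if top != bottom
    ]


toy_mutant = "MKVLCAG"     # top row in the alignment (made-up peptide)
toy_wild_type = "MKVLRAG"  # bottom row in the alignment (made-up peptide)

for column, top, bottom in find_mismatches(toy_mutant, toy_wild_type):
    print(f"column {column}: mutant has {top}, wild type has {bottom}")
```

The column number counts positions in the alignment, which is not necessarily the same as the residue number in the full-length protein.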
PART V:
Predicting the Secondary Structure of a Protein Sequence: Using GOR4
You are now going to take the two protein sequences and predict the secondary
structure of the proteins by using a tool called “GOR4”. This tool will show the
sequence of each protein and color code alpha helices with the color red and beta sheets
with the color blue, according to where they are found in the protein sequence. We will
use this tool to see if the mutated amino acid has caused a conformational change in the
mutant sequence when compared to the wild type sequence.
Go back to the Protein Tools homepage. You can do this by scrolling to the
bottom of the screen and clicking on “Return”.
Make sure that there are check marks in the boxes next to the sequences. Highlight the
choice that says “GOR4 – Predict Secondary Structure of PS”. Once this choice is
highlighted, click on “Run”.
Now you will be brought to a new screen. Just click on “Submit”.
A new screen will appear and your results will be shown:
The top sequence is the wild type (you can tell this by the code “1lpe”), and the
mutant sequence, “1le2”, is found on the bottom. If you look at the legend found
below the sequences, you will see that alpha helices are colored red and beta sheets are
colored blue.
Now, count the number of alpha helices and beta sheets in the wild type protein
sequence and in the mutant protein sequence. If you counted correctly, there are 5 alpha
helices and 1 beta sheet in the wild type. However, there are 5 alpha helices and 2 beta
sheets in the mutant sequence. Some of the helical structure of the wild-type protein has
been changed to beta sheet structure. This tells us that the point mutation in the mutant
protein sequence has caused a subtle conformational change in the protein structure.
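Counting helices and sheets by eye is error-prone, so here is a small Python sketch of the same count. It assumes the prediction has been copied out as a string with one letter per residue, using "h" for helix, "e" for beta sheet (extended strand), and "c" for coil; that encoding, the function name, and the example string are assumptions for illustration, not actual GOR4 output for either protein.

```python
from itertools import groupby


def count_segments(states: str) -> dict:
    """Count the uninterrupted runs of each secondary-structure letter."""
    counts = {}
    for state, _run in groupby(states):
        # Each group is one continuous stretch of the same letter.
        counts[state] = counts.get(state, 0) + 1
    return counts


toy_prediction = "cchhhhhhcceeeecchhhhc"  # made-up prediction string
segments = count_segments(toy_prediction)
print("helices:", segments.get("h", 0))
print("sheets: ", segments.get("e", 0))
```

Running the same count on the wild type and mutant predictions gives two sets of numbers that can be compared directly.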
Next we'll do one last experiment, to see if we can determine why the protein structure
might be affected in the mutant.
PART VI:
Determining the Isoelectric Point of the Proteins
Think about what determines the structure of a protein. Interactions between
the amino acids do. For example, an amino acid with a positive charge, like arginine, might
pair with a negatively charged amino acid, like glutamate, to form an ionic bond in the
protein that helps hold it together.
The "isoelectric point" of a protein is the pH (a number between 0 and 14) at which
the protein has no net charge, so it reflects the balance of charges on the protein. A
higher isoelectric point means there are more positive
charges on the protein (or fewer negative charges), and a lower isoelectric point
means there are fewer positive charges (or more negative ones). In this section we'll use
a tool that predicts the isoelectric points of our proteins to see if there are changes in
charge that might explain the change in structure.
Go back to the Protein Tools homepage. You can do this by scrolling to the
bottom of the screen and clicking on “Return”:
Select the tool choice that says "PI -- Isoelectric Point Determination." Unfortunately,
we can only run this tool on one protein at a time, so uncheck the mutant protein (1LE2)
check box so we calculate the isoelectric point for the wild-type protein (1LPE) first.
Then press the "Run" button:
On the next screen just press the "Submit" button:
On the results page, you will see a table of "pH" and "charge." The isoelectric point is
the point where the protein has a net charge of zero. Scroll down to the bottom to find
the isoelectric point, as shown below:
Then go back to the Protein Tools menu using the "Return" button at the bottom:
Next run the same experiment on the mutant protein. Uncheck the "1LPE" box and
check the "1LE2" box, then run the program as before.
The number you get for the isoelectric point should be different and lower. This means
that the mutant protein has lost some positive charge. This positive charge, on the
arginine amino acid that was replaced by cysteine in the mutant, must be important for
the structure and function of the protein!
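To see why swapping an arginine for a cysteine lowers the isoelectric point, the calculation can be sketched in Python. This is a simplified illustration, not the method used by the Workbench PI tool: the pKa values are approximate textbook figures chosen as assumptions, the function names are hypothetical, and the two peptides are made up rather than taken from 1LPE or 1LE2.

```python
# Approximate side-chain pKa values (assumed textbook figures).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
N_TERM_PKA = 9.0  # free amino end of the chain (assumed)
C_TERM_PKA = 2.0  # free carboxyl end of the chain (assumed)


def net_charge(seq: str, ph: float) -> float:
    """Estimate the net charge of a protein at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))   # N-terminus, positive
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))  # C-terminus, negative
    for residue in seq:
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def isoelectric_point(seq: str, tolerance: float = 1e-4) -> float:
    """Find the pH where the net charge is zero by bisection between 0 and 14."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(seq, mid) > 0:
            low = mid   # still positive, so the zero lies at a higher pH
        else:
            high = mid  # already negative, so the zero lies at a lower pH
    return (low + high) / 2.0


toy_wild_type = "MKVLRAGE"  # made-up peptide with an arginine (R)
toy_mutant = "MKVLCAGE"     # same peptide with R replaced by cysteine (C)

pi_wild_type = isoelectric_point(toy_wild_type)
pi_mutant = isoelectric_point(toy_mutant)
print(f"wild type pI: {pi_wild_type:.2f}")
print(f"mutant pI:    {pi_mutant:.2f}")
```

Because the net charge only ever decreases as the pH rises, bisection is enough to find the zero crossing. Removing a positively charged residue lowers the charge at every pH, which moves that crossing to a lower pH.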
CONCLUSION
Why doesn't the mutant protein work right? We now have some important
information, which we found using bioinformatics tools. The change in charge in the
mutant apolipoprotein can be seen in the different isoelectric point of the mutant. We
also saw using GOR4 that there is a small conformational change in the mutant protein.
Remember that we couldn't see that change very well just looking at a picture of the two
proteins. This conformational change turns out to be responsible for this form of the
disease hypercholesterolemia. The changed shape in the mutant lipoprotein does not
allow it to attach onto liver cells which would normally “eat” the cholesterol the
lipoprotein delivers. So, instead, the cholesterol cannot be used up by these liver cells
and remains in the bloodstream, which causes symptoms and threats such as those
mentioned in the beginning of the tutorial.