Poster Presentation

Identification and Characterization of Lin12/Notch Repeats (LNRs):
A Bioinformatics Approach
Abstract
Lin12 Notch Repeats (LNRs) are Ca2+ binding, cysteine-rich protein
domains. They were first found in a block of three in a transmembrane receptor
protein called Notch. Since then they have also been found in other types of
multidomain proteins such as the Pregnancy-associated Plasma Protein (PAPP)
and Stealth proteins. In these proteins, the LNRs are present in a variety of
different numbers and arrangements.
For this project, we used several bioinformatics tools to identify, align, and
compile information on LNRs from different protein sources. These tools include
BLAST, ClustalW, ExPASy Proteomics Tools, and UniProt. Using them, we compiled
a list of LNRs along with certain physicochemical properties of each, including
the theoretical pI, the molecular weight, the number of acidic and basic
residues, and the extinction coefficient. We also broke down the percentages of
each amino acid and each type of amino acid at each residue position relative to
the cysteines.
Our preliminary results indicate that although all LNRs, regardless of their
origin, are small, acidic sequences, there are important subtle differences in the
details of each LNR sequence that might shed light on their unique biological
function within the larger multidomain protein scaffold. The compilations
presented in this work are useful in comparing different LNRs and deciding which
LNRs would be valuable for further studies.
Fathima F. Jahufar, Framingham High School ’07.
Didem Vardar-Ulu, Chemistry Department
Fig. 2: Websites and Tools
Introduction:
Lin12 Notch Repeats (LNRs) are relatively short protein domains (only
about 35-40 amino acids long) found in a variety of different protein families.
LNRs were first found in a block of three in the Notch protein, a transmembrane
receptor protein. In this protein, LNRs help maintain the receptor in a resting,
metalloprotease-resistant conformation prior to ligand binding (1). LNRs are also
found in other multidomain proteins such as PAPP proteins and Stealth proteins.
PAPP proteins, like Notch, have three LNRs. However, the third LNR is
separated from the second LNR by more than 1000 amino acids (2). LNRs in
PAPP are thought to determine the proteolytic specificity of PAPP, which cleaves
insulin-like growth factor-binding proteins (2). In Stealth, LNRs come in ones or
twos, but are not found in all Stealth proteins (3).
The average natural abundance of cysteine in proteins is about 2.3% (4).
However, most LNRs are ~15-17% cysteine, which makes them very cysteine rich.
They also require Ca2+ to fold properly into their native forms. Most LNRs have
six cysteines, while a few have only four. These cysteines form three (or two)
specific disulfide bridges that help give LNRs their structure. LNRs also contain
several aspartic acids and asparagines that coordinate the binding of Ca2+ ions.
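The cysteine content quoted above can be checked directly from a sequence. The following is a minimal Python sketch, not part of the original analysis. It uses the human Notch1 LNRA sequence listed in Fig. 5, and the function names are illustrative assumptions.

```python
# Minimal sketch: cysteine content of a single LNR sequence.
# The sequence is human Notch1 LNRA as listed in Fig. 5; the helper names
# are illustrative and do not come from any tool used in the poster.

HN1_LNRA = "EEACELPECQEDAGNKVCSLQCNNHACGWDGGDCS"


def cysteine_fraction(seq: str) -> float:
    """Return the fraction of residues in seq that are cysteine."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return seq.count("C") / len(seq)


def max_disulfide_bridges(seq: str) -> int:
    """Upper bound on disulfide bridges: one bridge per pair of cysteines."""
    return seq.upper().count("C") // 2


if __name__ == "__main__":
    fraction = cysteine_fraction(HN1_LNRA)
    bridges = max_disulfide_bridges(HN1_LNRA)
    print(f"{fraction:.1%} cysteine, up to {bridges} disulfide bridges")
```

For this sequence, 6 of 35 residues are cysteine (about 17%), which is consistent with the range quoted above and allows for three disulfide bridges.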
Using bioinformatics to study LNRs involves the use of websites such as
UniProt, BLAST, ClustalW2, and ExPASy Proteomics Tools. UniProt allows
keyword/text searches to identify amino acid sequences from different databases.
It also matches input sequences to sequences within proteins in a database and
provides basic information about these proteins. Protein BLAST (Basic Local
Alignment Search Tool) compares amino acid sequence inputs to those in the
protein database and outputs significant matches. ClustalW2 is an online tool that
aligns multiple amino acid sequences, facilitating one-to-one amino acid
comparisons. Finally, ExPASy (Expert Protein Analysis System) Proteomics Tools
allow information to be gathered and predictions to be made about amino acid
sequences. We first used UniProt and BLAST to identify different LNR
sequences within the protein database and to determine their location within their
corresponding protein sources. Then, we used ClustalW to align these LNRs, after
which we improved these automated alignments manually based on the position of
the cysteines and the Ca2+ coordinating residues that define an LNR. Finally, in
order to better understand and predict the biochemical and biophysical
characteristics of LNRs, we used ExPASy Proteomics Tools to compile a list of
physicochemical properties for each of the identified LNR sequences. The
alignments of the LNRs (each slot numbered) and small sections of the tables
detailing the properties of the LNRs and of each slot in the alignments are
presented here.
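Two of the compiled properties can be reproduced with a few lines of Python. The sketch below is an illustration under stated assumptions, not the procedure used for the poster. It counts charged residues and estimates the extinction coefficient at 280 nm using the per-residue values documented for the ExPASy ProtParam tool (Trp 5500, Tyr 1490, cystine 125), assuming every pair of cysteines forms a disulfide bridge. The function names are hypothetical.

```python
# Minimal sketch of two ProtParam-style calculations.
# Assumption: all cysteines are paired into cystines (disulfide bridges).

# Molar extinction contributions at 280 nm, in M^-1 cm^-1.
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def charged_residue_counts(seq: str) -> tuple[int, int]:
    """Return (# negatively charged D+E, # positively charged R+K)."""
    seq = seq.upper()
    negative = seq.count("D") + seq.count("E")
    positive = seq.count("R") + seq.count("K")
    return negative, positive


def extinction_coefficient(seq: str, all_cys_paired: bool = True) -> int:
    """Estimate the extinction coefficient at 280 nm from composition."""
    seq = seq.upper()
    coeff = seq.count("W") * EXT_TRP + seq.count("Y") * EXT_TYR
    if all_cys_paired:
        # Each disulfide bridge (cystine) uses two cysteines.
        coeff += (seq.count("C") // 2) * EXT_CYSTINE
    return coeff


if __name__ == "__main__":
    hn1_lnra = "EEACELPECQEDAGNKVCSLQCNNHACGWDGGDCS"  # from Fig. 5
    print(charged_residue_counts(hn1_lnra))
    print(extinction_coefficient(hn1_lnra))
```

For human Notch1 LNRA this gives 8 negatively charged residues, 1 positively charged residue, and an extinction coefficient of 5875, matching the corresponding entries in Fig. 5.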
Fig. 3 labels: Slot # 1-50; LNRA, LNRB, and LNRC of Human Notch1, Human Notch2,
Human Notch3, Human Notch4, Mouse Notch1, Fruit Fly Notch, Frog Notch, and
Zebra Fish Notch; Gluc TraB; Green Algae; PAPP A; PAPP A2; PAPP E; Nematode Notch.
Fig. 1 labels: NEC; EGF-Like repeats; LNR A; LNR B; LNR C; LNRs; RAM; Ankyrin
repeats; Transactivation Domain; Transmembrane region; Proteolytic Domain;
Human Stealth: CR1, CR2, LNR A, CR3, LNR B, CR4; Fly Stealth: CR1, CR2, LNR A, CR3.
Residue types: # Basic (H, K, R); # Acidic (D, E); # Polar (S, C, T, N, Q)
Slot # | Total | High. Perc. | %
1 | 9 | 78% L | 100% Hydrophobic
2 | 10 | 40% N | 60% Polar
3 | 12 | 33% F | 67% Hydrophobic
4 | 12 | 25% N | 42% Hydrophobic
5 | 18 | 33% D | 33% Acidic
6 | 19 | 47% P | 68% Hydrophobic
7 | 29 | 31% E | 34% Acidic
8 | 29 | 21% K | 52% Hydrophobic
9 | 29 | 28% N | 59% Polar
10 | 29 | 100% C | 100% Polar
11 | 2 | 50% V, E | 50% Acid/Hydr.
12 | 3 | 33% D, V, T | 33% Hydr/Acid/Polar
13 | 3 | 33% Y, S, L | 33% Hydr/Arom/Polar
14 | 3 | 33% Q, N, R | 67% Polar
15 | 10 | 50% N | 70% Polar
16 | 10 | 60% P | 70% Hydrophobic
17 | 18 | 33% L | 61% Hydrophobic
18 | 28 | 25% Y | 29% Aromatic
19 | 32 | 22% D | 41% Hydrophobic
20 | 32 | 22% Q | 41% Hydrophobic
Name | Accession | Sequence | Residues | # Cys | MW | pI (Theor.) | # neg. | # Pos. | Instability Index | # aliphatic | # Aromatic | # Basic | # Acidic | Total # | Ex. Co. (all half)
hN1 LNRA | P46531 | EEACELPECQEDAGNKVCSLQCNNHACGWDGGDCS | 1447-1481 | 6 | 3715.9 | 3.89 | 8 | 1 | 92.14 | 10 | 1 | 2 | 13 | 35 | 5875
hN1 LNRB | P46531 | LNFNDPWKNCTQSLQCWKYFSDGHCDSQCNSAGCLFDGFDCQ | 1482-1523 | 6 | 4827.2 | 4.28 | 5 | 2 | 53.47 | 7 | 7 | 3 | 13 | 42 | 12865
hN1 LNRC | P46531 | RAEGQCNPLYDQYCKDHFSDGHCDQGCNSAECEWDGLDCA | 1524-1563 | 6 | 4484.7 | 4.12 | 9 | 2 | 20.63 | 9 | 4 | 4 | 14 | 40 | 8855
hN2 LNRA | Q04721 | PATCLSQYCADKARDGVCDEACNSHACQWDGGDC | 1422-1455 | 6 | 3593.8 | 4.17 | 6 | 2 | 52.41 | 10 | 2 | 3 | 9 | 34 | 7365
hN2 LNRB | Q04721 | LTMENPWANCSSPLPCWDYINNQCDELCNTVECLFDNFECQ | 1457-1497 | 6 | 4804.3 | 3.26 | 7 | 0 | 50.62 | 7 | 5 | 0 | 15 | 41 | 12865
hN2 LNRC | Q04721 | GNSKTCKYDKYCADHFKDNHCDQGCNSEECGWDGLDCA | 1498-1535 | 6 | 4261.5 | 4.64 | 8 | 4 | 36.11 | 7 | 4 | 6 | 12 | 38 | 8855
hN3 LNRA | Q9UM47 | EPRCPRAACQAKRGDQRCDRECNSPGCGWDGGDCS | 1384-1418 | 6 | 3785.1 | 6.31 | 6 | 6 | 69.21 | 8 | 1 | 6 | 9 | 35 | 5875
hN3 LNRB | Q9UM47 | LSVGDPWRQCEALQCWRLFNNSRCDPACSSPACLYDNFDCH | 1419-1459 | 6 | 4722.2 | 4.75 | 5 | 3 | 95.14 | 9 | 5 | 4 | 10 | 41 | 12865
hN3 LNRC | Q9UM47 | AGGRERTCNPVYEKYCADHFADGRCDQGCNTEECGWDGLDCA | 1460-1501 | 6 | 4616.9 | 4.35 | 9 | 4 | 34.81 | 12 | 4 | 5 | 12 | 42 | 8855
hN4 LNRA | Q5STG5 | CEGRSGDGACDAGCSGPGGNWDGGDCS | 1180-1206 | 4 | 2490.5 | 3.71 | 5 | 1 | 48.32 | 11 | 1 | 1 | 6 | 27 | 5750
 | Q5STG5 | PGAKGCEGRSGDGACDAGCSGPGGNWDGGDCS | 1175-1206 | 4 | 2900.9 | 4.04 | 5 | 2 | 37.03 | 14 | 1 | 2 | 6 | 32 | 5750
hN4 LNRB | Q5STG5 | LGVPDPWKGCPSHSRCWLLFRDGQCHPQCDSEECLFDGYDCE | 1207-1248 | 6 | 4830.3 | 4.57 | 8 | 3 | 83.37 | 9 | 5 | 5 | 10 | 42 | 12865
hN4 LNRC | Q5STG5 | TPPACTPAYDQYCHDHFHNGHCEKGCNTAECGWDGGDCR | 1249-1287 | 6 | 4297.6 | 5.2 | 6 | 2 | 69.48 | 8 | 4 | 6 | 9 | 39 | 8855
mN1 LNRA | Q01705 | EEACELPECQVDAGNKVCNLQCNNHACGWDGGDCS | 1446-1480 | 6 | 3713 | 3.95 | 7 | 1 | 74.68 | 11 | 1 | 2 | 13 | 35 | 5875
mN1 LNRB | Q01705 | LNFNDPWKNCTQSLQCWKYFSDGHCDSQCNSAGCLFDGFDCQ | 1481-1522 | 6 | 4827.2 | 4.28 | 5 | 2 | 53.47 | 7 | 7 | 3 | 13 | 42 | 12865
mN1 LNRC | Q01705 | LTEGQCNPLYDQYCKDHFSDGHCDQGCNSAECEWDGLDCA | 1523-1562 | 6 | 4471.7 | 3.93 | 9 | 1 | 25.45 | 9 | 4 | 3 | 14 | 40 | 8855
dN LNRA | P07207 | RAMCDKRGCTECQGNGICDSDCNTYACNFDGNDCS | 1479-1513 | 7 | 3771 | 4.17 | 6 | 3 | 60.46 | 7 | 2 | 3 | 11 | 35 | 1865
Fig. 5: Physicochemical characteristics of LNRs. Each LNR sequence is characterized using ExPASy Proteomics Tools. Information such as the
theoretical pI and total number of residues tells us that all LNRs are acidic and are less than 45 amino acids long. This table shows a few of the
characteristics and compiled information for some selected LNR sequences.
Figure labels: Stealth; PAPP: LNR A
Fig. 4 column: # Arom. (F, Y, W)
Fig. 4: Alignment slot statistics (selected slots). Each slot (see Fig. 3) is analyzed for the most abundant
amino acid (Column 3, Highest Percentage) and then for the different types of amino acids (Columns 4-8). Many slots are made up predominantly of a certain type of amino acid. Information for slots 1-20 is shown.
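The per-slot analysis described in this caption can be sketched in Python as follows. This is a minimal illustration, not the procedure used for the poster: the residue-type groupings follow the Fig. 4 column headers, while the function name, the gap character, and the toy alignment are assumptions made up for the example.

```python
from collections import Counter

# Residue-type groupings taken from the Fig. 4 column headers.
RESIDUE_TYPES = {
    "Hydrophobic": set("GAVLIMP"),
    "Aromatic": set("FYW"),
    "Polar": set("SCTNQ"),
    "Acidic": set("DE"),
    "Basic": set("HKR"),
}

GAP = "-"  # assumed gap character in the aligned sequences


def slot_statistics(aligned, slot):
    """Summarize one alignment slot (1-based) across equal-length sequences."""
    residues = [seq[slot - 1] for seq in aligned if seq[slot - 1] != GAP]
    total = len(residues)
    if total == 0:
        return {"total": 0, "top_residues": [], "top_percent": 0.0,
                "type_counts": {}}
    counts = Counter(residues)
    top_count = max(counts.values())
    # Several residues can tie for the highest percentage.
    top_residues = sorted(r for r, c in counts.items() if c == top_count)
    type_counts = {
        name: sum(1 for r in residues if r in members)
        for name, members in RESIDUE_TYPES.items()
    }
    return {
        "total": total,
        "top_residues": top_residues,
        "top_percent": 100.0 * top_count / total,
        "type_counts": type_counts,
    }


if __name__ == "__main__":
    # Made-up toy alignment, for illustration only (not data from Fig. 3).
    toy_alignment = ["LC-", "LCD", "-CE", "FCD"]
    for slot in (1, 2, 3):
        print(slot, slot_statistics(toy_alignment, slot))
```

Gapped positions are excluded from the total, which is why the totals in Fig. 4 differ from slot to slot.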
Figure labels: Gluc TraA; Notch:
Fig. 4 column: # Hydr. (G, A, V, L, I, M, P); slots 1-9: 9, 3, 8, 5, 4, 13, 7, 15, 7
Fig. 3: LNR alignment. All LNRs are aligned based on the position of key
structural amino acids such as the cysteines and the aspartic acids (highlighted in
red and green, respectively). Each “slot” is numbered (on top). This alignment
allows us to see similarities and trends in each “slot”, giving us further clues to
LNR characteristics.
Fig. 1: Domain organization of different classes of proteins that contain
LNRs. In the Notch protein, the LNRs (represented as yellow ovals) are found
in a tandem block of three, while in PAPP the first two tandem LNRs are
separated from the third LNR by ~1000 amino acids. The human Stealth protein
contains two LNRs, while the fly Stealth protein contains only a single LNR.
Acknowledgements
- National Science Foundation Research Experiences for Undergraduates (NSF-REU) in Chemistry and Physics
- Professor Didem Vardar-Ulu, Christina Hao, Sharline Madera, and Ursela
Siddiqui.
Conclusion/Future Work:
Our preliminary results indicate that although all LNRs, regardless of their origin, are small, acidic sequences, there are important subtle differences in
the details of each LNR sequence that might shed light on their unique biological function within the larger multidomain protein scaffold. We have also
found that some slots in the alignment of the LNRs predominantly contain a certain type of amino acid: acidic, basic, hydrophobic, polar, or
aromatic. In the future, this compiled information will be used to decide which LNRs are relevant for further experimental characterization.
Comparing the bioinformatics data with experimental results will give us a clearer understanding of the characteristics of LNRs from such a diverse
variety of protein families.
References:
1. Notch Subunit Heterodimerization and Prevention of Ligand-Independent Proteolytic Activation Depend, Respectively, on a Novel Domain and the LNR
Repeats. Cheryl Sanchez-Irizarry, Andrea C. Carpenter, Andrew P. Weng, Warren S. Pear, Jon C. Aster, and Stephen C. Blacklow. Molecular and Cellular
Biology, Nov. 2004, Vol. 24, No. 21, p. 9265-9273.
2. The Lin12-Notch Repeats of Pregnancy-associated Plasma Protein-A Bind Calcium and Determine its Proteolytic Specificity. Henning B. Boldt, Kasper
Kjaer-Sorensen, Michael T. Overgaard, Kathrin Weyer, Christine B. Poulsen, Lars Sottrup-Jensen, Cheryl A. Conover, Linda C. Giudice, Claus Oxvig.
Journal of Biological Chemistry, Sept. 2004, Vol. 279, No. 37, p. 38525-38531.
3. Stealth Proteins: In Silico Identification of a Novel Protein Family Rendering Bacterial Pathogens Invisible to Host Immune Defense. Peter Sperisen,
Christoph D. Schmid, Philipp Bucher, Olav Zilian. PLoS Comput Biol. 1(6): e63. 2005.
4. “Number of Cysteines Histogram”. UCSC Genome Bioinformatics. Updated 12 Feb. 2004.
<http://genome.ucsc.edu/google/goldenPath/help/pbTracksHelpFiles/pbcCnt.shtml>