QSAR models

Predicting Highly Connected Proteins in PIN
using QSAR
Art Cherkasov
UBC / VGH
artc@interchange.ubc.ca
Apr 14, 2011
THE UNIVERSITY OF BRITISH COLUMBIA
What is Chem(o)informatics ???
Chemical Space: Navigation (Grouping)
hits + GARBAGE  ->  hits / GARBAGE
Target Structure?
  -  LIGAND-BASED METHODS
  +  STRUCTURE-BASED METHODS
       Docking works?  +
       De Novo works?  +
Traditional Drug Design Modes
Predictive QSAR Modeling Workflow* is complex
1. Original Dataset
2. Split into Training, Test and External Validation sets
3. Multiple Training Sets and Multiple Test Sets
4. Combi-QSAR Modeling (with Y-randomization)
5. Activity Prediction
6. Only accept models that have Q² > 0.6, R² > 0.6, etc.
7. Validated Predictive Models with High Internal & External Accuracy
8. External validation Using Applicability Domain
9. Database Screening Using Applicability Domain
10. Experimental Validation
*Tropsha, A. Best Practices for QSAR Model Development, Validation, and Exploitation. Mol. Inf., 2010, 29, 476-488
CHEMBENCH.MML.UNC.EDU
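The acceptance step of the workflow above can be sketched in a few lines. This is a minimal illustration only, assuming a single descriptor and an ordinary least-squares line as the model; the function names, the toy data, and the use of leave-one-out cross-validation for Q² are assumptions made for the example, not the procedure of the cited paper.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with one descriptor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def q2_loo(xs, ys):
    """Leave-one-out cross-validated Q2 on the training set."""
    preds = []
    for k in range(len(xs)):
        a, b = fit_line(xs[:k] + xs[k + 1:], ys[:k] + ys[k + 1:])
        preds.append(a * xs[k] + b)
    return r2(ys, preds)

def y_randomization(xs, ys, n_rounds=20, seed=0):
    """Best Q2 obtained after shuffling activities; should stay low."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        scores.append(q2_loo(xs, shuffled))
    return max(scores)

def accept_model(train_x, train_y, test_x, test_y, threshold=0.6):
    """Accept only if internal Q2 and external R2 exceed the threshold
    and the Y-randomized models do not."""
    a, b = fit_line(train_x, train_y)
    q2 = q2_loo(train_x, train_y)
    r2_ext = r2(test_y, [a * x + b for x in test_x])
    q2_rand = y_randomization(train_x, train_y)
    ok = q2 > threshold and r2_ext > threshold and q2_rand < threshold
    return ok, q2, r2_ext, q2_rand

# Toy data (hypothetical): activity is roughly linear in one descriptor.
train_x = [float(i) for i in range(20)]
train_y = [2.0 * x + 1.0 + (0.5 if i % 2 == 0 else -0.5)
           for i, x in enumerate(train_x)]
test_x = [2.5, 7.5, 12.5, 17.5]
test_y = [2.0 * x + 1.0 for x in test_x]

accepted, q2, r2_ext, q2_rand = accept_model(train_x, train_y, test_x, test_y)
print(accepted, round(q2, 3), round(r2_ext, 3), round(q2_rand, 3))
```

A real workflow would repeat this over many training/test splits and many descriptor/method combinations; the 0.6 threshold is the one quoted on the slide.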
Cheminformatics ???
hits + GARBAGE  ->  hits / GARBAGE
Target Structure?
  -  LIGAND-BASED METHODS: QSAR, FP similarity, Clustering, MolFields, etc
  +  STRUCTURE-BASED METHODS
       Docking works?  +
       De Novo works?  +
Traditional Drug Design Modes
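Fingerprint (FP) similarity, one of the ligand-based methods named on this slide, is commonly scored with the Tanimoto coefficient. The sketch below is a hedged illustration: fingerprints are represented as plain sets of "on" bit positions, and the compound names, bit sets, and the 0.5 cutoff are all hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient |A & B| / |A | B| for two sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        # Convention chosen here: two empty fingerprints score 0.0.
        return 0.0
    return len(a & b) / len(union)

def split_by_similarity(query_fp, library, cutoff=0.5):
    """Group a library into likely 'hits' and the rest by similarity
    to a known active (the query)."""
    hits, rest = [], []
    for name, fp in library.items():
        (hits if tanimoto(query_fp, fp) >= cutoff else rest).append(name)
    return hits, rest

# Hypothetical fingerprints for illustration only.
known_active = {1, 2, 3, 8}
library = {
    "cmpd_A": {1, 2, 3, 9},    # shares 3 of 5 distinct bits -> 0.6
    "cmpd_B": {4, 5, 6},       # shares nothing -> 0.0
    "cmpd_C": {1, 2, 3, 8},    # identical -> 1.0
}
hits, rest = split_by_similarity(known_active, library)
print(hits, rest)
```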
Cheminformatics !!!
hits + GARBAGE  ->  hits / GARBAGE
LIGAND-BASED METHODS + STRUCTURE-BASED METHODS
Docking works?  +
De Novo works?  +
Conventional Drug Design Modes
QSAR ("Quantitative Structure-Activity Relationships"): PubMed Citations
[Chart, vertical axis 0 to 1000. Series: QSAR papers in PubMed; Protein structures in PDB (x100); compounds in CAS (x100k)]
from A. Cherkasov & A. Tropsha, Nature Drug Discovery Reviews, 2011 (in progress)
Specifics of the talk:
1. Chemical Space: Quantification (Modeling) and Navigation (Grouping)
   a. Ligand QSAR: Concept: Consensus Modeling
   b. Ligand QSAR: Examples: BML Model, Antibiotics
2. Peptide QSAR: Example: Antimicrobial Peptides
3. Protein QSAR: Example: "Hubs" in PINs
Principles of QSAR modeling: Compounds, Descriptors, Functions, Activity
[Figure: a column of COMPOUNDS (chemical structures), each encoded as a row of DESCRIPTORS and paired with a numeric ACTIVITY value. Values shown on the slide: 0.613, 0.380, -0.222, 0.708, 1.146, 0.491, 0.301, 0.141, 0.956, 0.256, 0.799, 1.195, 1.005]
Quantitative Structure Activity Relationships
Slide by A. Tropsha, 2010
Principles of QSAR modeling: Compounds, Descriptors, Functions, Activity
[Figure: the same COMPOUNDS and DESCRIPTORS layout, with the numeric column relabeled PROPERTY. Values shown on the slide: 0.613, 0.380, -0.222, 0.708, 1.146, 0.491, 0.301, 0.141, 0.956, 0.256, 0.799, 1.195, 1.005]
Quantitative Structure Property Relationships
Slide by A. Tropsha, 2010
Compounds: Chemical Universe
10^40 to 10^120 compounds with
C, H, O, N, P, S, F, Cl, Br, I, and MW < 500 ??
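Descriptors: “Inductive” etc

\Delta N_j = Q_j + \alpha \sum_{i \neq j}^{N-1} \frac{(\chi_j - \chi_i)(R_j^2 + R_i^2)}{r_{j-i}^2}

s_{\mathrm{MOL}} = \sum_{j \neq i}^{N} \sum_{j \neq i}^{N} \frac{R_j^2 + R_i^2}{r_{j-i}^2}

\eta_{\mathrm{MOL}} = \frac{1}{s_{\mathrm{MOL}}}

\eta_i = \frac{1}{2 \sum_{j \neq i}^{N-1} \frac{R_j^2 + R_i^2}{r_{j-i}^2}}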
Functions: MLR, PLS, kNN, SVM, ANN, Binary
Regression, Decision Tree, RandomForest, PCA,
Hybrid Methods, LDA, etc
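To make one of the listed functions concrete, here is a minimal kNN regression sketch: the activity of a query compound is predicted as the mean activity of its k nearest training compounds in descriptor space. The descriptor vectors, activities, and the choice of Euclidean distance and k = 3 are assumptions made for illustration.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Predict activity as the mean over the k nearest training
    compounds, using Euclidean distance between descriptor vectors."""
    ranked = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical two-descriptor training set with measured activities.
train_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
train_y = [1.0, 2.0, 3.0, 10.0]

print(knn_predict(train_X, train_y, [0.1, 0.1], k=3))
```

In practice descriptors would be scaled before computing distances, since kNN is sensitive to the relative ranges of the variables.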
Activity: Continuous, Binary
Molecular structure gets translated into numbers (descriptors)
f(Descriptors) ~ Activity
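The relation f(Descriptors) ~ Activity can be illustrated with the simplest possible f, a weighted sum of descriptors. The sketch below assumes hypothetical weights and shows the two kinds of endpoint named on the slide: a continuous prediction, and a binary one obtained by passing the same score through a logistic function and thresholding at 0.5.

```python
import math

# Hypothetical fitted coefficients for two descriptors, plus an intercept.
WEIGHTS = [0.5, -0.25]
INTERCEPT = 0.1

def f_continuous(descriptors):
    """Continuous activity: linear function of the descriptor vector."""
    return sum(w * d for w, d in zip(WEIGHTS, descriptors)) + INTERCEPT

def f_binary(descriptors, cutoff=0.5):
    """Binary activity: 1 (active) if the logistic-transformed score
    reaches the cutoff, else 0 (inactive)."""
    probability = 1.0 / (1.0 + math.exp(-f_continuous(descriptors)))
    return 1 if probability >= cutoff else 0

compound = [1.0, 2.0]   # descriptor values for one hypothetical compound
print(f_continuous(compound), f_binary(compound))
```

The methods listed earlier (MLR, PLS, kNN, SVM, ANN, and so on) differ in the form of f and in how its parameters are fitted, not in this overall input/output contract.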
Chemical Space: Activity (Property) Quantification
EXAMPLE: QSAR TOX CONSENSUS MODELING: 2008
Overview of QSAR modeling approaches employed by six cheminformatic groups involved in this study.
Group ID  Modeling Techniques  Descriptor Type                             Applicability Domain Definition
UNC       kNN, SVM             MolconnZ, Dragon                            Euclidean distance threshold between a test compound and compounds in the modeling set
ULP       MLR, SVM, kNN        Fragments (ISIDA), Molecular (CODESSA-Pro)  Euclidean distance threshold between a compound and compounds in the modeling set; bounding box
UI        MLR/OLS              Dragon                                      Leverage approach
UK        PLS                  Dragon                                      Residual standard deviation and leverage within the PLSR model
VCCLAB    ASNN                 E-state indices                             Maximal correlation coefficient of the test molecule to the training set molecules in the space of models
UBC       MLR, ANN, SVM, PLS   IND_I                                       Range of independent variable values in the training set +/- 15%
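Two of the applicability domain (AD) definitions listed above, the Euclidean distance threshold and the descriptor range with a 15% margin, can be sketched in a few lines. The Python below is an illustration only, not the code used by any of the groups. The function names, the use of NumPy, the form of the distance threshold (mean plus `z` times the standard deviation of nearest-neighbour distances in the training set), and the reading of "+/- 15%" as a margin relative to each descriptor's range are all assumptions.

```python
import numpy as np


def distance_domain(train, query, z=0.5):
    """Euclidean-distance AD (hypothetical helper).

    A query compound counts as inside the domain if its distance to the
    nearest training compound does not exceed mean + z * std of the
    training set's own nearest-neighbour distances. The threshold form
    and the default z are assumptions made for illustration.
    """
    train = np.asarray(train, dtype=float)
    query = np.asarray(query, dtype=float)
    # Pairwise distances within the training set; mask self-distances.
    d_train = np.linalg.norm(train[:, None, :] - train[None, :, :], axis=2)
    np.fill_diagonal(d_train, np.inf)
    nn_train = d_train.min(axis=1)
    threshold = nn_train.mean() + z * nn_train.std()
    # Distance from each query compound to its nearest training compound.
    d_query = np.linalg.norm(query[:, None, :] - train[None, :, :], axis=2)
    return d_query.min(axis=1) <= threshold


def range_domain(train, query, margin=0.15):
    """Range-based AD (hypothetical helper).

    Every descriptor of a query compound must fall within the training
    range widened by `margin` times the range width on each side.
    """
    train = np.asarray(train, dtype=float)
    query = np.asarray(query, dtype=float)
    lo, hi = train.min(axis=0), train.max(axis=0)
    pad = margin * (hi - lo)
    return np.all((query >= lo - pad) & (query <= hi + pad), axis=1)


def coverage(inside):
    """Percentage of compounds that fall inside the domain."""
    return 100.0 * np.mean(inside)
```

Both functions take descriptor matrices with one row per compound and return a boolean mask, so the coverage of a validation set is simply the percentage of `True` entries.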
                        Modeling Set (n=644)       Validation Set I (n=339)   Validation Set II (n=110)
Model         Group ID  Q2abs  MAE   Coverage (%)  R2abs  MAE   Coverage (%)  R2abs  MAE   Coverage (%)
kNN-Dragon    UNC       0.92   0.22  100           0.85   0.27  80.2          0.72   0.33  52.7
kNN-MolconnZ  UNC       0.91   0.23  99.8          0.84   0.30  84.3          0.44   0.39  53.6
SVM-Dragon    UNC       0.93   0.21  100           0.81   0.31  80.2          0.83   0.27  52.7
SVM-MolconnZ  UNC       0.89   0.25  100           0.83   0.30  84.3          0.55   0.37  53.6
ISIDA-kNN     ULP       0.77   0.37  100           0.73   0.36  78.5          0.63   0.37  42.7
ISIDA-SVM     ULP       0.95   0.15  100           0.76   0.32  100           0.38   0.50  100
ISIDA-MLR     ULP       0.94   0.20  100           0.81   0.31  95.9          0.65   0.41  51.8
CODESSA-MLR   ULP       0.72   0.42  100           0.71   0.44  100           0.58   0.47  100
OLS           UI        0.86   0.30  92.1          0.77   0.35  97.0          0.59   0.43  98.2
PLS           UK        0.88   0.28  97.7          0.81   0.34  96.1          0.59   0.40  95.5
ASNN          VCCLAB    0.83   0.31  83.9          0.87   0.28  87.4          0.75   0.32  71.8
PLS-IND_I     UBC       0.76   0.39  100           0.74   0.39  99.7          0.45   0.54  100
MLR-IND_I     UBC       0.77   0.39  100           0.75   0.40  99.7          0.46   0.53  100
ANN-IND_I     UBC       0.77   0.39  100           0.76   0.39  99.7          0.46   0.53  100
SVM-IND_I     UBC       0.79   0.31  100           0.79   0.35  99.7          0.53   0.46  100
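The three statistics reported per data set (R2abs or Q2abs, MAE, and coverage) can be computed as in the following sketch. It assumes that R2abs is the usual coefficient of determination, 1 - SS_res / SS_tot, and that R2abs and MAE are evaluated only on compounds inside the model's applicability domain; the slides do not spell out either point. The function name `evaluate` and the dictionary keys are hypothetical.

```python
import numpy as np


def evaluate(y_true, y_pred, inside):
    """Score one model on one data set (hypothetical helper).

    y_true, y_pred: observed and predicted activities, one per compound.
    inside: boolean mask, True where the compound is inside the
            model's applicability domain.
    Assumes at least two compounds with distinct observed values fall
    inside the domain; otherwise the R2 denominator is zero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    inside = np.asarray(inside, dtype=bool)

    # Restrict accuracy statistics to compounds inside the domain.
    y, p = y_true[inside], y_pred[inside]
    mae = float(np.mean(np.abs(y - p)))
    ss_res = float(np.sum((y - p) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot

    # Coverage is the share of all compounds that the domain accepts.
    cov = 100.0 * float(inside.mean())
    return {"r2": r2, "mae": mae, "coverage": cov}
```

Reporting coverage next to accuracy matters when reading the table: a model can reach a high R2abs on Validation Set II while predicting only about half of the compounds.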
                      Modeling Set (n=644)       Validation Set I (n=339)   Validation Set II (n=110)
Model                 Q2abs  MAE   Coverage (%)  R2abs  MAE   Coverage (%)  R2abs  MAE   Coverage (%)
Consensus Model Ia    0.92   0.23  100           0.85   0.29  100           0.67   0.39  100
Consensus Model IIb   0.92   0.22  100           0.87   0.27  100           0.70   0.34  100
Consensus Model IIBc  0.92   0.22  100           0.87   0.27  100           0.70   0.36  100
Consensus Model IIId  0.92   0.22  100           0.86   0.28  99.7          0.70   0.34  98.2
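The slides do not define how the consensus variants differ (the footnotes attached to the model labels are not part of this text). As a generic illustration of the idea, the sketch below forms a consensus prediction by averaging, for each compound, the predictions of those individual models whose applicability domain covers it. The averaging rule, the function name `consensus_predict`, and the array layout are assumptions, not the authors' stated procedure.

```python
import numpy as np


def consensus_predict(predictions, inside):
    """Average individual model predictions per compound (hypothetical helper).

    predictions: array of shape (n_models, n_compounds).
    inside: boolean array of the same shape, True where the compound is
            inside that model's applicability domain.
    Returns the consensus value per compound (NaN if no model covers
    it) and a boolean mask of covered compounds.
    """
    predictions = np.asarray(predictions, dtype=float)
    inside = np.asarray(inside, dtype=bool)

    # Number of models that cover each compound.
    counts = inside.sum(axis=0)
    # Sum of predictions over covering models only.
    totals = np.where(inside, predictions, 0.0).sum(axis=0)

    consensus = np.full(predictions.shape[1], np.nan)
    covered = counts > 0
    consensus[covered] = totals[covered] / counts[covered]
    return consensus, covered
```

Under this rule a compound is predicted as soon as any one model covers it, which would be consistent with the near-100% coverage in the consensus rows.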
QSAR CONSENSUS MODELING: 2010
Geography of collaboration
40 scientists, 15 institutions
Slide by A. Tropsha, 2011
Chemical Space: Navigation(Grouping)
THE UNIVERSITY OF BRITISH COLUMBIA, MEDICINE, INFECTIOUS DISEASES
[Figure: two panels, "Retrieval of Antibiotic Compounds" and "Retrieval of Human Metabolites". Y axis: Retrieval Percentage (0 to 100). X axis: Query number / Query points (0 to 700). Curves: R=0.1, R=0.15, R=0.2, each with a matching Random baseline.]
db50
MERCK Database QSAR annotation as Antibiotics and BML
Lead compound / Metabolite analogue   Merck annotation                   BML score   Antibiotic-likeness (30-10-2 ANN)
Lovastatin / NP-007587                Antilipemic                        0.71        0.56
Olivomycin A / NP-009248              Antineoplastic; cytostatic agent   0.72        0.99
Gentisic acid / NP-001423             Analgesic; anti-inflammatory       0.74        0.66
MERCK Database QSAR annotation as Antibiotics and BML
Lead compound / Metabolite analogue   Merck annotation                   BML score   Antibiotic-likeness (30-10-2 ANN)
Lovastatin / NP-007587                Antilipemic                        0.71        0.56                                CONFIRMED
Olivomycin A / NP-009248              Antineoplastic; cytostatic agent   0.72        0.99                                CONFIRMED
Gentisic acid / NP-001423             Analgesic; anti-inflammatory       0.74        0.66                                CONFIRMED
QSAR model performance for five classification models (Train / Test).
Header text recovered from the slide: Distinguishing Antimicrobials from Drugs; Distinguishing Antimicrobials from Drug-likes; Antimicrobials versus all others; Drugs versus Drug-likes; QSAR model for Bacterial Metabolites.

        Model 1        Model 2        Model 3        Model 4        Model 5
        Train  Test    Train  Test    Train  Test    Train  Test    Train  Test
T_P     327    130     332    140     294    124     270    89      360    139
T_N     631    248     841    342     1490   621     1486   644     792    347
F_P     49     33      7      14      32     20      17     14      39     26
F_N     33     35      30     23      66     41      108    58      48     19
SPEC    0.93   0.88    0.99   0.96    0.98   0.97    0.99   0.98    0.95   0.93
SENS    0.91   0.79    0.92   0.86    0.82   0.75    0.71   0.61    0.88   0.88
ACCUR   0.92   0.85    0.97   0.93    0.95   0.92    0.93   0.91    0.93   0.92
PPV     0.87   0.80    0.98   0.91    0.90   0.86    0.94   0.86    0.90   0.84
NPV     0.95   0.88    0.97   0.94    0.96   0.94    0.93   0.92    0.94   0.95
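The SPEC, SENS, ACCUR, PPV and NPV rows of the table above follow from the T_P, T_N, F_P and F_N counts by the standard confusion-matrix definitions. The sketch below is illustrative only (the function name `classification_metrics` is not from the slides) and uses the training counts of the first model as input.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix statistics from raw counts."""
    return {
        "SPEC": tn / (tn + fp),                    # specificity: true negative rate
        "SENS": tp / (tp + fn),                    # sensitivity: true positive rate
        "ACCUR": (tp + tn) / (tp + tn + fp + fn),  # overall accuracy
        "PPV": tp / (tp + fp),                     # positive predictive value
        "NPV": tn / (tn + fn),                     # negative predictive value
    }


# Training-set counts of the first model in the table.
train = classification_metrics(tp=327, tn=631, fp=49, fn=33)
rounded = {name: round(value, 2) for name, value in train.items()}
print(rounded)
```

Rounded to two decimals, these values agree with the first Train column of the table.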
A number of QSAR models have been developed to separate individual clusters within
a dataset of 958 human therapeutics, 519 antimicrobials, and 1202 drug-like chemicals,
as well as 1102 human, 551 bacterial, 2351 plant, and 825 fungal metabolites.
Separation of various classes of substances in the chemical space
[Figure: chemical-space map with regions labeled General drugs, Bacterial metabolites, Inactive chemicals, and Antibacterials.]
The two acyl hydrazone-based in silico hits with potent
selective inhibitory activity towards MRSA Pyruvate Kinase.
           IC50 (mM)                                                       Growth inhibition (%)
Compound   MRSA PK   Human M1 PK   Human M2 PK   Human R PK   Human L PK   S. aureus   HeLa
IS-63      0.85      450           519           450          38           10          13
[Compound structure drawing: IS-63]
FP search of ZINC db with BML scoring
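A fingerprint (FP) search ranks library compounds by their similarity to a query fingerprint. The slide does not say which similarity measure was used, so the sketch below assumes the common Tanimoto coefficient on sets of "on" bit positions. The compound names and bit sets are hypothetical toy data, and the BML scoring step is not reproduced.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def fp_search(query, library, threshold):
    """Return (name, similarity) pairs at or above threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy fingerprints, not taken from ZINC.
query = {1, 2, 3, 4}
library = {
    "cmpd_a": {1, 2, 3, 4, 5},
    "cmpd_b": {1, 2, 9, 10},
    "cmpd_c": {7, 8},
}
hits = fp_search(query, library, threshold=0.3)
print(hits)
```

In a real screen the fingerprints would come from a cheminformatics toolkit and the hits would then be passed to a second scoring stage.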
The two acyl hydrazone-based in silico hits with potent
selective inhibitory activity towards MRSA Pyruvate Kinase.
           IC50 (mM)                                                       Growth inhibition (%)
Compound   MRSA PK   Human M1 PK   Human M2 PK   Human R PK   Human L PK   S. aureus   HeLa
IS-63      0.85      450           519           450          38           10          13
IS-130     0.091     375           125           350          350          35          0
[Compound structure drawings: IS-63, IS-130]
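One way to quantify the selectivity claimed for these hits is the ratio of each human PK IC50 to the MRSA PK IC50, which does not depend on the concentration unit. The sketch below is illustrative only: the ratio is a common convention rather than a metric stated on the slide, and the function name `selectivity_ratios` is hypothetical. The input values are the IC50 figures listed above.

```python
# IC50 values as listed on the slide (same unit for all entries).
ic50 = {
    "IS-63": {"MRSA PK": 0.85, "Human M1 PK": 450, "Human M2 PK": 519,
              "Human R PK": 450, "Human L PK": 38},
    "IS-130": {"MRSA PK": 0.091, "Human M1 PK": 375, "Human M2 PK": 125,
               "Human R PK": 350, "Human L PK": 350},
}


def selectivity_ratios(values, target="MRSA PK"):
    """Ratio of each off-target IC50 to the target IC50 (higher = more selective)."""
    return {name: v / values[target] for name, v in values.items() if name != target}


for compound, values in ic50.items():
    ratios = selectivity_ratios(values)
    worst = min(ratios, key=ratios.get)  # least selective human isoform
    print(compound, worst, round(ratios[worst], 1))
```

The smallest ratio per compound gives the most conservative estimate of selectivity over the human isoforms.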
[Figure: plot with y-axis from 0 to 1 and x-axis from 0 to 500. Legend: Power law; Weibull analytical; Weibull by plotting experiment]
$f(GO) = a \cdot GO^{b}$
Pareto's inequality law, introduced more than a century ago (Pareto, 1897)
economic, professional, sexual and social networks
airline routing
power lines connections
language networks
internet hyperlinks
protein interactomes
brain organization
metabolic pathways
food and ecological webs
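A power law of the form f(x) = a·x^b becomes a straight line in log-log coordinates, so its parameters can be estimated by ordinary linear regression on the logarithms. The sketch below assumes that form and uses synthetic, noise-free data; the function name `fit_power_law` and the parameter values are hypothetical and are not taken from the slides.

```python
import numpy as np


def fit_power_law(x, y):
    """Least-squares fit of y = a * x**b in log-log space; returns (a, b)."""
    log_x = np.log(np.asarray(x, dtype=float))
    log_y = np.log(np.asarray(y, dtype=float))
    # log y = b * log x + log a, so the slope is b and the intercept is log a.
    b, log_a = np.polyfit(log_x, log_y, 1)
    return float(np.exp(log_a)), float(b)


# Synthetic data generated from known parameters (illustration only).
x = np.arange(1, 501, dtype=float)
y = 3.0 * x ** -1.5

a_hat, b_hat = fit_power_law(x, y)
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
```

For noisy empirical distributions a log-log regression can be biased, so it is best treated as a first look rather than a definitive fit.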
[Figure: Most common distinct molecular scaffolds classified for the studied groups of chemical substances]
[Figure: Most common distinct substituents classified for the studied groups of chemical substances]
Going Bigger -> Peptide QSAR: Antimicrobial Peptides
Bad Bugs Need Drugs: IDSA, March 2006
Antimicrobial Availability Task Force
Widespread prevalence of multidrug-resistant (MDR) bacteria in hospitals
Few drugs in the pipeline; urgent need for R&D
Experts Fear Increase in Drug-resistant Infectious
Here: Globe and Mail, March 2006
MRSA, a treatment-resistant form of bacteria that spreads through direct contact, is called a greater threat to public health than SARS or bird flu. (The Boston Globe, August 21, 2006)
Antimicrobial Peptides (AMP): Modes of Action
Oren and Shai (Biopolymers, 1998)
8-50 amino acids (AA) long, gene-encoded, often contain a leader sequence, part of innate immunity
Factors influencing activity of AMPs:
Usually helical, but can be beta, cyclic, irregular, induced
IKWLKIFL
Factors influencing activity of AMPs:
Hydrophobicity, positive charge, two-phased
IKWLKIFL
BUT: 20^9 (about 5 x 10^11) possible sequence variants for a 9-residue peptide!!!
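The size of the sequence space and the two factors named above (hydrophobicity and positive charge) are easy to compute for a given peptide. The sketch below is a rough illustration, not the descriptor set used in this work: it assumes 9-residue peptides over the 20 natural amino acids, uses the Kyte-Doolittle hydropathy scale as one possible hydrophobicity measure, and counts charge from K/R and D/E side chains only (histidine and the termini are ignored). All function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 natural amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_space(length, alphabet_size=20):
    """Number of distinct sequences of a given length."""
    return alphabet_size ** length


def net_charge(seq):
    """Crude net charge: +1 per K/R, -1 per D/E (His and termini ignored)."""
    return sum(seq.count(aa) for aa in "KR") - sum(seq.count(aa) for aa in "DE")


def mean_hydropathy(seq):
    """Average Kyte-Doolittle hydropathy over the sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


peptide = "IKWLKIFL"  # example sequence shown on the slide
print("9-mer sequence space:", sequence_space(9))
print("net charge:", net_charge(peptide))
print("mean hydropathy:", round(mean_hydropathy(peptide), 4))
```

Even with such cheap descriptors, exhaustively scoring every 9-mer is impractical, which is the motivation for a predictive QSAR model over peptide space.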
Sources of antibiotic peptides
AAGCCGCTGTTGAGGTTCAGAATCAACCAGCAGATACAGAAACATATTTCGGAGCGTGGGGACCCTTGGGTGAGCTGCCACATGAAGCAGCCCCAGGACCTCCCTGGCTCAAGGAGTGACAGCGAGTTTGTCTGAGGTGAG
GGCACAGGCCTGGCGAAGCCTCGTGTGTGGGTGAGACCTGCCCGACCCCAGTGCCTTACCCGAGGAGCTACTGGCCCAGTGGGGGAGGCATTCAGGTGGGCAGAGTCAGGGAGACTCATGAGGCCGTTGAGGCCAGGGGCA
TAGAGCTGGCCAAGGAGCCATGGCTCACTAACGTGTTGTATGGGGCTCCTTCCCTTCAGGTCCAGGCTCCTGCGTGAAGTGATGCTCCTCTTTGCCTTACTCCTAGCCATGGAGCTCCCATTGGTGGCAGCCAGTGCCACC
ATCGCGCTCAGTGTAAGTATCATTCCCTCTCACTGTCCTGGAGAGGACGAGAATTCCACCTGCCAGTGCCTTACCCGAGGAGCTACTGGCCCAGTGGGGGAGGCATTCAGGTGGGCAGAGTCAGGGAGACTCATGAGGCCG
TTGAGGCCAGGGGCATAGAGCTGGCCAAGGAGCCATGGCTCACTAACGTGTTGTATGGGGCTCCTTCCCTTCAGGTCCAGGCTCCTGCGTGAAGTGATGCTCCTCTTTGCCTTACTCCTAGCCATGGAGCTCCCATTGGTG
GCAGCCAGTGCCACCATCGCGCTCAGTGTAAGTATCATTCCCTCTCACTGTCCTGGAGAGGACGAGAATTCCACCTGGAGATTCTGGGCCACTTTGGTTCCCCATGAGCCAAGACGGCACTTCTA ATTTGCATTCCCTACC
GGAGTCCCTGTCTGTAGCCAGCCTGGCTTTCAGCTGGTGCCCAAAGTGACAAATGTATCTGCAATGACAAAGGTACCCTGGAAGGGCTCGCCCTCTGCGGAATTTCAGTTCATGCAGGCCTTGGTGCTTCCACATCTGTCC
AAGGGCCTTTCAAATGTGACTTTTAACTCTGTGGATTGATTTGCCCGGTTGTCACATTCTGAGCAGCCACAACCTACTGCATCCCATGTAGAAGTGGAAGTGACCTGATTTTTTCCTGCTTTTCAAGGCTGTATGTTTACA
TTTGCCTCCAATCATTCCTATGGGAATTCCTTGGGAGTCTAACTTGGAGATTTTGTTTCTTCTGCCTTTGCTCCTGGGGGCTTAATCACTTCTGTGCCTCTGGTTATCTGTGGCACATTTGTATTTGTCATTAGTCAACCG
GAGACTCGGGGTCTGAGTGGAGGGTATGTCCCCCTCCAGTGATGGTTTCTGTTGGCTTCCCAGGGTGAGGATGACTCATGACCACTTGCAAGTGGTTTTTGTGTCTGGGGTTTATGATCACACAGTCATACACGTTCTAAC
TCCAGACTGACTGTTGAGAAAGCCTCTGGGTAAGGGAATTCCTGGGAAACACACTGTTTTCATGCATCCTCTGGAAGATGAGGCCTGAAGTTACCAGGGTCTCTGTTTGCTGATGCTGATGATCCACATTTTCTAGCCCAC
TCTGCTTCTCTGACACCTTTAGTCTTGAGGATCCATGNTCTGTGAAGGAATCCAAGCTCTCATTTCGCACTCACCTTGGCCCTGGCTCTGTCTCCAGGACCTCTTCTACTACAAAATCCTAAAGCTCTGGGAGCTGGGTGT
CAACCTGTGCCCGAGGAAATCATACAGTTACTGTGGACTTTCCAGTTTGCTGTCTTCTAGTATTCCATTGTAGCTCTTGGGTATTTTCCCATCCACCCCAAGATCCAGCTGGAAATCAGTGAACACACTTGATGGGAGTTT
TCCTGCATGTGCTCTGGGCATTGACAGTAGAAGGGTGTTCAGAATGTCTGCTGTGCCCTCATGGAGGAAGAGNGCTCAGTGTACATGCTCTGGGTCAGTAGGTGCCCTTGAGCCCAGCTTTGGGAGCAATGTTGGATGAGT
GAAGGAGGGATCCAGGGCAAAGCAGGCACGACAGAGTGGAGACGGCGCTGCTGGCTCTCAGGGGAATGGGCATGGAGTGGGTAGGAGATCCACCTAAGGAGGCTGGCTGGCTGGACGAGTCAGGAGCCCCTTCCAAGGGTG
GACACTGACAGGCCCCCAGTCTTGGTCTCCTGCATGCCAGAGGTACCAGCCCATCTTTTTTCCTAAACTTGATGACCTAGGGCTAGGGGCATGTTGAA
SWISS-PROT database:
ftp://ftp.ebi.ac.uk/pub/databases/swissprot/release/sprot42.dat
University of Nebraska Medical Center:
http://aps.unmc.edu/AP/main.php
Biochemistry Department, University of Trieste, Italy:
http://www.bbcm.units.it/~tossi/pag5.htm
National Library of Health Sciences (TERKKO), University of Helsinki:
http://oma.terkko.helsinki.fi:8080/~SAPD/login
School of Crystallography, Birkbeck, University of London:
http://www.cryst.bbk.ac.uk/peptaibol/peptaibol_database_1lettercodes.htm
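As a rough illustration of how peptide sequences might be pulled from a Swiss-Prot flat file such as sprot42.dat, the sketch below reads the standard line codes (ID, KW, SQ, and the // record terminator) and keeps short entries whose keywords mention a chosen term. The function names, the "Antibiotic" keyword filter, the 100-residue cutoff, and the two toy records are assumptions made for illustration; they are not taken from the slides.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def iter_swissprot(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (entry_id, keywords, sequence) for each flat-file record."""
    entry_id, keywords, seq_parts, in_seq = "", [], [], False
    for raw in handle:
        line = raw.rstrip("\n")
        if line.startswith("//"):          # record terminator
            if entry_id:
                yield entry_id, " ".join(keywords), "".join(seq_parts)
            entry_id, keywords, seq_parts, in_seq = "", [], [], False
        elif in_seq:                       # residue lines follow the SQ header
            seq_parts.append(line.replace(" ", ""))
        elif line.startswith("ID "):
            entry_id = line.split()[1]
        elif line.startswith("KW "):
            keywords.append(line[5:].strip())
        elif line.startswith("SQ "):
            in_seq = True


def select_peptides(handle: TextIO, keyword: str = "Antibiotic",
                    max_len: int = 100) -> List[Tuple[str, str]]:
    """Keep entries whose keywords contain `keyword` and that are short.

    Both the keyword and the length cutoff are illustrative assumptions.
    """
    return [(eid, seq) for eid, kws, seq in iter_swissprot(handle)
            if keyword.lower() in kws.lower() and len(seq) <= max_len]


# Two toy records (hypothetical, not real database entries).
SAMPLE = """\
ID   TOY1_TEST      STANDARD;      PRT;    12 AA.
KW   Antibiotic; Amphibian defense peptide.
SQ   SEQUENCE   12 AA;  0 MW;  0 CRC64;
     GIGKFLHSAK KF
//
ID   TOY2_TEST      STANDARD;      PRT;     8 AA.
KW   Hydrolase.
SQ   SEQUENCE   8 AA;  0 MW;  0 CRC64;
     MKTAYIAK
//
"""

hits = select_peptides(io.StringIO(SAMPLE))
print(hits)
```

For a real release file, the same function could be called on an opened file handle instead of the in-memory sample.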
Examples of typical gene-coded AMPs:
beta-Defensins, alpha-Defensins,
Cecropins, Cholecystokinins,
Stomoxyns, Gastrins,
Transferrins, Magainins,
Brevinins, Xenopsins,
Dermaseptins, Provicilins,
Cupiennins, Vicilins,
Corticostatins, Apidaecins,
Cathelicidins, Statherins, Histatins,
Bombinins, Maximins,
Dermadistinctins, Maculatins,
Caerins, Aureins,
Citropins, Waglerins
non-TOXIC
non-IMMUNOGENIC
do not cause RESISTANCE
fast and broadly ACTIVE
BIOINFORMATICS APPROACHES TO
MODELING AMPs ALL FAILED!
100,000
2
0
Designed Cheminformatics pipeline for AMPs
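The slide does not spell out the pipeline's steps, so the following is only a minimal sketch of what a first stage could look like: turning each peptide sequence into a few numeric descriptors that a QSAR model can be trained on. The three descriptors chosen here (length, a crude net charge that counts K/R as +1 and D/E as -1 while ignoring histidine and the termini, and the fraction of residues in a hypothetical hydrophobic set) are assumptions for illustration, not the descriptors used in the original work.

```python
from typing import Dict

# Hypothetical hydrophobic residue set; other scales would give other values.
HYDROPHOBIC = set("AILMFVW")


def peptide_descriptors(seq: str) -> Dict[str, float]:
    """Compute simple sequence-level descriptors for one peptide."""
    seq = seq.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    positive = seq.count("K") + seq.count("R")   # basic residues
    negative = seq.count("D") + seq.count("E")   # acidic residues
    hydrophobic = sum(1 for aa in seq if aa in HYDROPHOBIC)
    return {
        "length": n,
        "net_charge": positive - negative,       # crude estimate near neutral pH
        "hydrophobic_fraction": hydrophobic / n,
    }


# Toy peptide, used only to exercise the function.
d = peptide_descriptors("GIGKFLHSAKKF")
print(d)
```

Descriptor vectors of this kind, one per peptide, would then form the input matrix for whatever learning method the pipeline uses.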
Training statistics for AMP QSAR models
GAAGGAGGGATCCAGGGCAAAGCAGGCACGACAGAGTGGAGACGGCGCTGCTGGCTCTCAGGGGAATGGGCATGGAGTGGGTAGGAGATCCACCTAAGGAGGCTGGCTGGCTGGACGAGTCAGGAGCCCCTTCCAAGGGTG
GACACTGACAGGCCCCCAGTCTTGGTCTCCTGCATGCCAGAGGTACCAGCCCATCTTTTTTCCTAAACTTGATGACCTAGGGCTAGGGGCATGTTGAA
Training set   Top % as actives   Accuracy   Specificity   Sensitivity   Positive Predictive Value
A              5%                 0.96       0.98          0.62          0.58
A              10%                0.93       0.94          0.76          0.39
A              25%                0.78       0.78          0.85          0.17
B              5%                 0.94       0.97          0.33          0.30
B              10%                0.88       0.90          0.33          0.12
B              25%                0.77       0.77          0.80          0.12
A+B            5%                 0.95       0.97          0.47          0.47
A+B            10%                0.91       0.92          0.54          0.27
A+B            25%                0.76       0.77          0.66          0.13
10 cross
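The four columns of the table above are standard confusion-matrix statistics. As a minimal sketch of how they relate, the following computes them from counts of true/false positives and negatives. The `Confusion` class, the `metrics` function, and the example counts are illustrative assumptions, not values from the study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from comparing predicted actives with measured actives."""
    tp: int  # predicted active, measured active
    fp: int  # predicted active, measured inactive
    tn: int  # predicted inactive, measured inactive
    fn: int  # predicted inactive, measured active


def metrics(c: Confusion) -> dict:
    """Return accuracy, specificity, sensitivity and positive predictive value."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "specificity": c.tn / (c.tn + c.fp),
        "sensitivity": c.tp / (c.tp + c.fn),
        "ppv": c.tp / (c.tp + c.fp),
    }


# Made-up counts for illustration only (not taken from the slides).
example = Confusion(tp=30, fp=20, tn=930, fn=20)
for name, value in metrics(example).items():
    print(f"{name}: {value:.2f}")
```

When actives are a small share of the set, accuracy and specificity are dominated by the many true negatives, so they can stay high even when sensitivity and PPV are modest.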
100,000 peptides designed using random sequence composition,
with ongoing enrichment for key amino acids
Subjected to QSAR; 20 AMPs synthesized and tested
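One way to picture "random sequence composition with enrichment" is weighted random sampling over the 20 amino acids. The sketch below is an assumption about the general idea, not the procedure actually used: the peptide length, the boosted residues, and the boost factors are all hypothetical, and the library is kept small for speed.

```python
import random

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # one-letter codes, same order as the slide's axis


def enriched_weights(boost: dict) -> list:
    """Start from uniform weights and multiply selected residues by a factor."""
    return [boost.get(aa, 1.0) for aa in AMINO_ACIDS]


def random_peptide(length: int, weights: list, rng: random.Random) -> str:
    """Draw each position independently according to the composition weights."""
    return "".join(rng.choices(AMINO_ACIDS, weights=weights, k=length))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Hypothetical enrichment: which residues are "key", and by how much, is assumed.
weights = enriched_weights({"W": 4.0, "R": 4.0, "K": 3.0})
# Hypothetical length of 9 and a library of 1,000 instead of 100,000.
library = [random_peptide(9, weights, rng) for _ in range(1000)]
print(library[:5])
```

"Ongoing" enrichment would repeat this step, updating the weights after each round of scoring.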
[Figure: amino acid fraction (y-axis) for each amino acid (x-axis: A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, V), plotted for Set A, Set B, Q1, Q2, Q3 and Q4]
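The quantity plotted in this figure, the fraction of each amino acid within a peptide set, can be computed by simple counting. The following is a minimal sketch; the function name is an assumption, and the single-sequence input is the ILRWPWPWRRK sequence quoted elsewhere in this deck, used only as a convenient example.

```python
from collections import Counter

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def amino_acid_fractions(peptides: list) -> dict:
    """Fraction of each of the 20 amino acids over all residues in the set."""
    counts = Counter("".join(peptides))
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


fractions = amino_acid_fractions(["ILRWPWPWRRK"])
for aa, value in fractions.items():
    if value > 0:
        print(f"{aa}: {value:.3f}")
```

Running the same function on each group (Set A, Set B, Q1 to Q4) would give one bar series per group.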
STANDARD USED:
Compound MX-226 (aka MBI-226, omiganan)
ILRWPWPWRRK
- prevention of wounds
- burn
- device-related infections (central venous catheter related infections)
In Phase III-b clinical trials by MIGENIX©
EX VIVO against 12 bacterial strains (µM)
100,000 randomly designed, 20 tested from Q1-Q4 (predicted high-, medium-, and low-actives)
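Drawing test candidates from Q1-Q4 amounts to ranking the library by predicted activity, cutting the ranking into four quartiles, and picking peptides from each. The sketch below illustrates that idea under stated assumptions: the slides do not say how the 20 peptides were distributed across quartiles, so the even split of 5 per quartile, the function names, and the dummy scores are all hypothetical.

```python
import random


def split_into_quartiles(scored: list) -> dict:
    """Rank (name, score) pairs from highest to lowest score and cut into Q1..Q4."""
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    n = len(ranked)
    bounds = [(i * n) // 4 for i in range(5)]
    return {f"Q{i + 1}": ranked[bounds[i]:bounds[i + 1]] for i in range(4)}


def pick_per_quartile(quartiles: dict, per_bin: int, rng: random.Random) -> dict:
    """Randomly choose the same number of candidates from every quartile."""
    return {name: rng.sample(members, per_bin) for name, members in quartiles.items()}


rng = random.Random(1)
# Dummy predicted scores standing in for QSAR output.
scored = [(f"pep{i}", rng.random()) for i in range(100)]
quartiles = split_into_quartiles(scored)
selected = pick_per_quartile(quartiles, per_bin=5, rng=rng)
print({name: [pep for pep, _ in picks] for name, picks in selected.items()})
```

Testing peptides from every quartile, not only the top one, checks whether the predicted ranking separates actives from inactives across the whole range.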
Pseudomonas aeruginosa, Pseudomonas maltophilia,
Staphylococcus aureus, Enterobacter cloacae
PROPERTIES DISTRIBUTIONS AMONG HIGH-, MEDIUM- AND LOW-ACTIVES
[Figure: four panels, each comparing Set A, Set B, Q1, Q2, Q3 and Q4]
- ACTIVITY: median MIC [µM]
- CHARGE: median formal charge
- H2O-phobicity: median hydrophobic fraction
- H2O-phobic moment: median hydrophobic moment
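The three sequence properties named in these panels can each be computed directly from a peptide's one-letter sequence. The sketch below is a simplified illustration, not the descriptors used in the study: it assumes K and R carry +1 and D and E carry -1 (ignoring histidine and the termini), uses a hypothetical hydrophobic residue set, and computes an unnormalized helical hydrophobic moment at 100 degrees per residue with a placeholder binary scale in place of a published hydrophobicity scale.

```python
import math

POSITIVE = set("KR")            # assumed +1 each
NEGATIVE = set("DE")            # assumed -1 each
HYDROPHOBIC = set("AILMFWV")    # hypothetical choice of hydrophobic residues


def formal_charge(seq: str) -> int:
    """Net charge from side chains only (termini and histidine ignored)."""
    return sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)


def hydrophobic_fraction(seq: str) -> float:
    """Share of residues that belong to the assumed hydrophobic set."""
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


def hydrophobic_moment(seq: str, scale: dict, angle_deg: float = 100.0) -> float:
    """Magnitude of the vector sum of per-residue hydrophobicities around a helix."""
    delta = math.radians(angle_deg)
    sin_sum = sum(scale.get(aa, 0.0) * math.sin(i * delta) for i, aa in enumerate(seq))
    cos_sum = sum(scale.get(aa, 0.0) * math.cos(i * delta) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum)


# Placeholder scale: 1.0 for assumed hydrophobic residues, 0.0 otherwise.
TOY_SCALE = {aa: 1.0 for aa in HYDROPHOBIC}

omiganan = "ILRWPWPWRRK"  # sequence quoted in the deck
print(formal_charge(omiganan), hydrophobic_fraction(omiganan))
print(hydrophobic_moment(omiganan, TOY_SCALE))
```

Taking the median of each property over the peptides in a group would give one value per group, as plotted in the panels.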
Untreated
Treated
TOXICITY
Figure: % of red blood cells surviving 24 h after peptide administration at 50 ug/ml
Y-axis: % viable cells (0 to 200)
X-axis: individual peptides (HHC-8, HHC-9, HHC-10, HHC-20, HHC-36, HHC-45, HHC-48, HHC-53, HHC-57, HHC-66, HHC-69, HHC-71, HHC-75 and others)
IN VIVO
Ability of new antimicrobial peptides HHC-10 and HHC-36 to protect mice against S. aureus infections. Bacterial loads in the peritoneal lavage from individual mice after 24 h of infection. Dead animals were assigned the highest CFU count obtained in the experiment. The solid line represents the arithmetic mean for each group.
Plot axes: CFU/ml (log scale) versus Treatment Group (Saline, HHC-10, HHC-36)
GoingEvenBigger -> Protein QSAR
Protein interaction networks
are scale-free networks
Other examples of scale-free networks:
The web of human sexual contacts (Liljeros et al., Nature 411 (2001) 907)
The food network
Neuron connections
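A scale-free network has a heavy-tailed degree distribution: most nodes have few links, while a few nodes (hubs) have very many. The following is a minimal sketch of how such a network can arise through preferential attachment (Barabasi-Albert style growth). It is an illustration only, not the method used in the talk; the function names and the parameters `n`, `m` and `seed` are assumptions chosen for the example.

```python
import random
from collections import Counter


def preferential_attachment(n, m=2, seed=0):
    """Grow an undirected graph with n nodes.

    Each new node links to m distinct existing nodes, chosen with
    probability proportional to their current degree (illustrative model).
    """
    rng = random.Random(seed)
    edges = []
    # Start from a small fully connected core of m + 1 nodes.
    for i in range(m + 1):
        for j in range(i):
            edges.append((i, j))
    # 'stubs' holds each node once per link end, so uniform sampling
    # from it is sampling proportional to degree.
    stubs = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


def degrees(edges):
    """Count the number of links per node."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


edges = preferential_attachment(n=2000, m=2, seed=1)
deg = degrees(edges)
values = sorted(deg.values())
median_degree = values[len(values) // 2]
max_degree = values[-1]
print("median degree:", median_degree, "max degree:", max_degree)
```

In a network grown this way the largest degree is typically many times the median degree, which is the property that makes a small set of hubs stand out.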
MRSA Protein Interaction Network
2D representation of the developed MRSA PIN. Hub proteins are marked in yellow and non-hubs in blue. Conventional antimicrobial targets are marked in red if they are non-hubs and in pink if they are hubs.
MRSA Protein Interaction Network
TASK: how to sample the network with the fewest experiments?
HUBS!
A summary of protein interaction data used in the training and testing of the hub classifiers

Training / Testing set                   E. coli   S. cerevisiae   D. melanogaster   H. sapiens
# of proteins                            2860      5397            6935              6592
# of hubs (10% of total proteins)        286       535             628               620
# of non-hubs (90% of total proteins)    2574      4862            6307              5972
# of protein interactions                13888     37167           19994             19115
minimum # of interactions per hub        20        33              16                13
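The table defines hubs as roughly the top 10% of proteins by number of interactions, with a per-species minimum number of interactions per hub. A minimal sketch of one way to derive such labels from an interaction list follows. The thresholding rule (the lowest degree cutoff whose hub set does not exceed 10% of proteins) is an assumption made for this example, and the function name `label_hubs` and the toy protein names are hypothetical.

```python
from collections import Counter


def label_hubs(interactions, max_fraction=0.10):
    """Return (hubs, cutoff) for an undirected interaction list.

    A protein is called a hub if its number of interactions is >= cutoff,
    where cutoff is the smallest degree for which the hub set contains
    at most max_fraction of all proteins (assumed rule, for illustration).
    """
    deg = Counter()
    for a, b in interactions:
        if a != b:  # ignore self-interactions
            deg[a] += 1
            deg[b] += 1
    limit = max_fraction * len(deg)
    # Try cutoffs from low to high; the first that fits gives the largest hub set.
    for cutoff in sorted(set(deg.values())):
        hubs = {p for p, d in deg.items() if d >= cutoff}
        if len(hubs) <= limit:
            return hubs, cutoff
    return set(), None


# Toy network of 20 proteins: P0 binds everything, P1 binds P2..P10 as well.
toy = [("P0", "P%d" % i) for i in range(1, 20)]
toy += [("P1", "P%d" % i) for i in range(2, 11)]

hubs, cutoff = label_hubs(toy)
print(sorted(hubs), cutoff)
```

Because ties at the cutoff are either all included or all excluded, the hub set can come out slightly below 10% of the proteins.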
Hub protein conservation among species

                                          Subject species
Query species                             E. coli   S. cerevisiae   D. melanogaster   H. sapiens
E. coli
  % of hubs with similar proteins         -         18.18%          15.03%            18.18%
  % of non-hubs with similar proteins     -         8.00%           5.67%             5.75%
  % of conserved hubs                     -         4.20%           1.05%             2.80%
  % of conserved non-hubs                 -         6.72%           5.56%             5.36%
S. cerevisiae
  % of hubs with similar proteins         7.48%     -               34.02%            39.44%
  % of non-hubs with similar proteins     3.78%     -               10.98%            11.74%
  % of conserved hubs                     3.55%     -               6.36%             10.28%
  % of conserved non-hubs                 2.88%     -               10.22%            10.26%
D. melanogaster
  % of hubs with similar proteins         1.27%     12.26%          -                 23.89%
  % of non-hubs with similar proteins     1.93%     9.75%           -                 20.64%
  % of conserved hubs                     1.11%     6.69%           -                 6.69%
  % of conserved non-hubs                 1.43%     7.23%           -                 17.82%
H. sapiens
  % of hubs with similar proteins         2.58%     22.10%          37.90%            -
  % of non-hubs with similar proteins     2.28%     12.34%          24.55%            -
  % of conserved hubs                     1.94%     10.00%          9.35%             -
  % of conserved non-hubs                 1.62%     8.98%           21.78%            -
Index    QSAR descriptors
1        number of residues
2        molecular weight
3-22     fraction of each residue in sequence
23       fraction of polar residues in sequence
24       fraction of hydrophobic residues
25       fraction of charged residues
26       net charge at pH = 7.0
27       average hydrophobicity, Gtrans (kcal/mol)
28       average "hydrophilicity", Gapp (kcal/mol)
29       fraction of surface residues in sequence
30       estimated surface area
31       estimated volume
32-51    fraction of each residue at surface
52       fraction of polar residues at surface
53       fraction of hydrophobic residues at surface
54       fraction of charged residues at surface
55       net surface charge
56       average surface hydrophobicity
57       average surface hydrophilicity
58       ratio of average surface hydrophobicity to average hydrophobicity for sequence
59       ratio of average surface "hydrophilicity" to average hydrophilicity for sequence
60       isoelectric point (elementary charge unit)
61       isoelectric point of surface (elementary charge unit)
62       fraction of random coil residues
63       fraction of α-helix residues
64       fraction of β-sheet residues
65       helix-to-coil ratio for surface - helix-to-coil ratio for sequence
66       average surface polarizability (kcal/mol)
67       average surface MEP (kcal/mol)
68       average surface ionization potential (kcal/mol)
69       average surface electron affinity (kcal/mol)
70       average surface electronegativity (kcal/mol)
71       number of coil stretches > 4 residues in length
72       length of longest contiguous coil stretch
73       fraction of flexible coil residues in sequence
74       fraction of flexible residues at surface
75       average flexibility index for coil residues at the surface
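Several of the sequence-level descriptors in the list (1, 3-22, 23-26) can be computed directly from an amino-acid sequence. The sketch below illustrates the idea only. The residue groupings (`POLAR`, `HYDROPHOBIC`, `CHARGED`) and the simple net-charge rule (Lys/Arg = +1, Asp/Glu = -1, ignoring His and termini) are assumptions for this example, not the definitions used in the talk, and the example sequence is made up.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Illustrative residue classes (assumed; published groupings vary).
POLAR = set("STNQCYH")
HYDROPHOBIC = set("AVLIMFWPG")
CHARGED = set("DEKR")


def sequence_descriptors(seq):
    """Compute a few composition-based descriptors for a protein sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    desc = {"n_residues": n}
    # Fraction of each of the 20 standard residues (descriptors 3-22).
    for aa in AMINO_ACIDS:
        desc["frac_" + aa] = counts[aa] / n
    desc["frac_polar"] = sum(counts[a] for a in POLAR) / n
    desc["frac_hydrophobic"] = sum(counts[a] for a in HYDROPHOBIC) / n
    desc["frac_charged"] = sum(counts[a] for a in CHARGED) / n
    # Crude net charge near pH 7: +1 per K/R, -1 per D/E (assumption).
    desc["net_charge_pH7"] = counts["K"] + counts["R"] - counts["D"] - counts["E"]
    return desc


d = sequence_descriptors("MKKRDAL")  # hypothetical toy sequence
print(d["n_residues"], d["net_charge_pH7"], round(d["frac_charged"], 3))
```

Surface-based and structure-based descriptors (29 onward) need predicted or experimental structural information and are not covered by this sketch.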
Four-fold cross-validation average performance of the hub classifiers

Hub classifier     Set        sensitivity   specificity   accuracy   PPV      NPV
E. coli            Training   86.71%        91.60%        91.11%     53.41%   98.41%
E. coli            Testing    51.40%        88.19%        84.51%     32.59%   94.23%
S. cerevisiae      Training   84.36%        88.99%        88.53%     45.74%   98.10%
S. cerevisiae      Testing    62.99%        86.16%        83.86%     33.37%   95.49%
D. melanogaster    Training   74.95%        87.24%        86.12%     36.90%   97.22%
D. melanogaster    Testing    41.24%        83.86%        80.00%     20.28%   93.48%
H. sapiens         Training   51.77%        91.31%        87.59%     38.21%   94.80%
H. sapiens         Testing    26.61%        88.78%        82.93%     19.76%   92.10%
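The five reported measures are all functions of the confusion matrix. Because hubs make up only about 10% of proteins, PPV stays low even when specificity is high. The sketch below, with function names chosen for this example, computes the measures and checks that the E. coli testing figures (sensitivity 51.40%, specificity 88.19%) are consistent with the reported accuracy, PPV and NPV under the assumption of a 10% hub prevalence.

```python
def classifier_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # hubs correctly found
        "specificity": tn / (tn + fp),   # non-hubs correctly rejected
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),           # predicted hubs that are real hubs
        "npv": tn / (tn + fn),           # predicted non-hubs that are real non-hubs
    }


def metrics_from_rates(sensitivity, specificity, prevalence):
    """Rebuild a normalized confusion matrix from rates and class prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return classifier_metrics(tp, fp, tn, fn)


# E. coli testing values from the table, assuming 10% of proteins are hubs.
m = metrics_from_rates(sensitivity=0.5140, specificity=0.8819, prevalence=0.10)
print({k: round(v, 4) for k, v in m.items()})
```

Under these assumptions the computed accuracy, PPV and NPV come out at about 84.5%, 32.6% and 94.2%, in line with the table.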
MRSA Protein Interaction Network
Bait coverage summary and conserved interactions for MRSA and other PIN datasets.
nr = non-redundant. *Percentages were calculated with respect to the subject species.
TakeHomeMessages {
-> QSAR allows sampling and navigating through Chemical Space as well as modeling complex molecular properties
-> When done properly, QSAR can handle even unconventional systems like peptides
-> QSAR methodology can/should substitute sequence-based ideology (bioinformatics) on many levels
}
THE UNIVERSITY OF BRITISH COLUMBIA
GGAGATTCTGGGCCACTTTGGTTCCCCATGAGCCAAGACGGCACTTCTAATTTGCATTCCCTACCGGAGTCCCTGTCTGTAGCCAGCCTGGCTTTCAGCTGGTGCCCAAAGTGACAAATGTATCTGCAATGACAAAGGTAC
CCTGGAAGGGCTCGCCCTCTGCGGAATTTCAGTTCATGCAGGCCTTGGTGCTTCCACATCTGTCCAAGGGCCTTTCAAATGTGACTTTTAACTCTGTGGATTGATTTGCCCGGTTGTCACATTCT GAGCAGCCACAACCTA
CTGCATCCCATGTAGAAGTGGAAGTGACCTGATTTTTTCCTGCTTTTCAAGGCTGTATGTTTACATTTGCCTCCAATCATTCCTATGGGAATTCCTTGGGAGTCTAACTTGGAGATTTTGTTTCTTCTGCCTTTGCTCCTG
GGGGCTTAATCACTTCTGTGCCTCTGGTTATCTGTGGCACATTTGTATTTGTCATTAGTCAACCGGAGACTCGGGGTCTGAGTGGAGGGTATGTCCCCCTCCAGTGATGGTTTCTGTTGGCTTCCCAGGGTGAGGATGACT
CATGACCACTTGCAAGTGGTTTTTGTGTCTGGGGTTTATGATCACACAGTCATACACGTTCTAACTCCAGACTGACTGTTGAGAAAGCCTCTGGGTAAGGGAATTCCTGGGAAACACACTGTTTTCATGCATCCTCTGGAA
GATGAGGCCTGAAGTTACCAGGGTCTCTGTTTGCTGATGCTGATGATCCACATTTTCTAGCCCACTCTGCTTCTCTGACACCTTTAGTCTTGAGGATCCATGNTCTGTGAAGGAATCCAAGCTCTCATTTCGCACTCACCT
TGGCCCTGGCTCTGTCTCCAGGACCTCTTCTACTACAAAATCCTAAAGCTCTGGGAGCTGGGTGTCAACCTGTGCCCGAGGAAATCATACAGTTACTGTGGACTTTCCAGTTTGCTGTCTTCTAGTATTCCATTGTAGCTC
TTGGGTATTTTCCCATCCACCCCAAGATCCAGCTGGAAATCAGTGAACACACTTGATGGGAGTTTTCCTGCATGTGCTCTGGGCATTGACAGTAGAAGGGTGTTCAGAATGTCTGCTGTGCCCTCATGGAGGAAGAGNGCT
CAGTGTACATGCTCTGGGTCAGTAGGTGCCCTTGAGCCCAGCTTTGGGAGCAATGTTGGATGAGTGAAGGAGGGATCCAGGGCAAAGCAGGCACGACAGAGTGGAGACGGCGCTGCTGGCTCTCAGGGGAATGGGCATGGA
GTGGGTAGGAGATCCACCTAAGGAGGCTGGCTGGCTGGACGAGTCAGGAGCCCCTTCCAAGGGTGGACACTGACAGGCCCCCAGTCTTGGTCTCCTGCATGCCAGAGGTACCAGCCCATCTTTTTTCCTAAACTTGATGAC
CTAGGGCTAGGGGCATGTTGAATCTCAGCCTCGCCCACTGGCGCTGGACTTGGTACACAGGGTGGGGCAAAGTGGGTACTGGATCCTGATCATCCCTATCCCTGGGGTGTGGCTTCTTGCTGCACAGTCAGCTTCTAGTTC
TGTAGCCCCAGCTGCTCCTGCGGTGGAGGGAGCTACACATCAGGCTCTGACCCCCTCCAGGTGGGGCCTTCGCGTGAGGGGAGTCAGCACGCATCAGCAGCTGGGCCCAGGGAGTTGCCCCACTGAGCACTGCGGGCTGAC
CTGCTCCCAACCAGGGAGATGGAGCTTCCCCCTTGAGTCGGGCTGCTGAAGGGGGGTAGGGGATGGAAACAGTGCGTTTGCAGGAGTAAGGGTGCAGTTGGGTCCCTGCGAGAAAATGTCTCAGTTGTGGCAACTGATTGG
TGACCTGGGGGGCGTTTCTGAGCCCACAGTGCTGGCATCAGGACTCAGGTGTGAGGTGCCCCAGACCCTCCCCTTGCCAGTAATTAGCTGATGGCTCGGTGATGCCCAGGGTGAAGGAAGACTTGATTTTGGGAGGGGAGT
TCTCTCGTAATGACACTGAGGATGCCTTCAAGTTGGGCTTCTGGCATGTTCTGCCCTCGCTCCCCTTCTGTAGTCACCTTGGCCCTCGTGTTGCTGAGCTGTGTGTGGGAGCGGGAAGCGCGTCAGTGGGCGGAGGGAGCG
GGAAGCGCGTCAGTGGGCGGAGTATTTGAGAACATTTCACAAGCCGCTGTTGAGGTTCAGAATCAACCAGCAGATACAGAAACATATTTCGGAGCGTGGGGACCCTTGGGTGAGCTGCCACATGAAGCAGCCCCAGGACCT
CCCTGGCTCAAGGAGTGACAGCGAGTTTGTCTGAGGTGAGGGCACAGGCCTGGCGAAGCCTCGTGTGTGGGTGAGACCTGCCCGACCCCAGTGCCTTACCCGAGGAGCTACTGGCCCAGTGGGGGAGGCATTCAGGTGGGC
AGAGTCAGGGAGACTCATGAGGCCGTTGAGGCCAGGGGCATAGAGCTGGCCAAGGAGCCATGGCTCACTAACGTGTTGTATGGGGCTCCTTCCCTTCAGGTCCAGGCTCCTGCGTGAAGTGATGCTCCTCTTTGCCTTACT
CCTAGCCATGGAGCTCCCATTGGTGGCAGCCAGTGCCACCATCGCGCTCAGTGTAAGTATCATTCCCTCTCACTGTCCTGGAGAGGACGAGAATTCCACCTGGAGATTCTGGGCCACTTTGGTTCCCCATGAGCCAAGACG
GCACTTCTAATTTGCATTCCCTACCGGAGTCCCTGTCTGTAGCCAGCCTGGCTTTCAGCTGGTGCCCAAAGTGACAAATGTATCTGCAATGACAAAGGTACCCTGGAAGGGCTCGCCCTCTGCGGAATTTCAGTTCATGCA
GGCCTTGGTGCTTCCACATCTGTCCAAGGGCCTTTCAAATGTGACTTTTAACTCTGTGGATTGATTTGCCCGGTTGTCACATTCTGAGCAGCCACAACCTACTGCATCCCATGTAGAAGTGGAAGTGACCTGATTTTTTCC
TGCTTTTCAAGGCTGTATGTTTACATTTGCCTCCAATCATTCCTATGGGAATTCCTTGGGAGTCTAACTTGGAGATTTTGTTTCTTCTGCCTTTGCTCCTGGGGGCTTAATCACTTCTGTGCCTCTGGTTATCTGTGGCAC
ATTTGTATTTGTCATTAGTCAACCGGAGACTCGGGGTCTGAGTGGAGGGTATGTCCCCCTCCAGTGATGGTTTCTGTTGGCTTCCCAGGGTGAGGATGACTCATGACCACTTGCAAGTGGTTTTTGTGTCTGGGGTTTATG
ATCACACAGTCATACACGTTCTAACTCCAGACTGACTGTTGAGAAAGCCTCTGGGTAAGGGAATTCCTGGGAAACACACTGTTTTCATGCATCCTCTGGAAGATGAGGCCTGAAGTTACCAGGGTCTCTGTTTGCTGATGC
TGATGATCCACATTTTCTAGCCCACTCTGCTTCTCTGACACCTTTAGTCTTGAGGATCCATGNTCTGTGAAGGAATCCAAGCTCTCATTTCGCACTCACCTTGGCCCTGGCTCTGTCTCCAGGACCTCTTCTACTACAAAA
TCCTAAAGCTCTGGGAGCTGGGTGTCAACCTGTGCCCGAGGAAATCATACAGTTACTGTGGACTTTCCAGTTTGCTGTCTTCTAGTATTCCATTGTAGCTCTTGGGTATTTTCCCATCCACCCCAAGATCCAGCTGGAAAT
CAGTGAACACACTTGATGGGAGTTTTCCTGCATGTGCTCTGGGCATTGACAGTAGAAGGGTGTTCAGAATGTCTGCTGTGCCCTCATGGAGGAAGAGNGCTCAGTGTACATGCTCTGGGTCAGTA GGTGCCCTTGAGCCCA
GCTTTGGGAGCAATGTTGGATGAGTGAAGGAGGGATCCAGGGCAAAGCAGGCACGACAGAGTGGAGACGGCGCTGCTGGCTCTCAGGGGAATGGGCATGGAGTGGGTAGGAGATCCACCTAAGGAGGCTGGCTGGCTGGAC
GAGTCAGGAGCCCCTTCCAAGGGTGGACACTGACAGGCCCCCAGTCTTGGTCTCCTGCATGCCAGAGGTACCAGCCCATCTTTTTTCCTAAACTTGATGACCTAGGGCTAGGGGCATGTTGAATCTCAGCCTCGCCCACTG
GCGCTGGACTTGGTACACAGGGTGGGGCAAAGTGGGTACTGGATCCTGATCATCCCTATCCCTGGGGTGTGGCTTCTTGCTGCACAGTCAGCTTCTAGTTCTGTAGCCCCAGCTGCTCCTGCGGTGGAGGGAGCTACACAT
CAGGCTCTGACCCCCTCCAGGTGGGGCCTTCGCGTGAGGGGAGTCAGCACGCATCAGCAGCTGGGCCCAGGGAGTTGCCCCACTGAGCACTGCGGGCTGACCTGCTCCCAACCAGGGAGATGGAGCTTCCCCCTTGAGTCG
GGCTGCTGAAGGGGGGTAGGGGATGGAAACAGTGCGTTTGCAGGAGTAAGGGTGCAGTTGGGTCCCTGCGAGAAAATGTCTCAGTTGTGGCAACTGATTGGTGACCTGGGGGGCGTTTCTGAGCCCACAGTGCTGGCATCA
GGACTCAGGTGTGAGGTGCCCCAGACCCTCCCCTTGCCAGTAATTAGCTGATGGCTCGGTGATGCCCAGGGTGAAGGAAGACTTGATTTTGGGAGGGGAGTTCTCTCGTAATGACACTGAGGATGCCTTCAAGTTGGGCTT
CTGGCATGTTCTGCCCTCGCTCCCCTTCTGTAGTCACCTTGGCCCTCGTGTTGCTGAGCTGTGTGTGGGAGCGGGAAGCGCGTCAGTGGGCGGAGGGAGCGGGAAGCGCGTCAGTGGGCGGAGTATTTGAGAACATTTCAC
AAGCCGCTGTTGAGGTTCAGAATCAACCAGCAGATACAGAAACATATTTCGGAGCGTGGGGACCCTTGGGTGAGCTGCCACATGAAGCAGCCCCAGGACCTCCCTGGCTCAAGGAGTGACAGCGAGTTTGTCTGAGGTGAG
GGCACAGGCCTGGCGAAGCCTCGTGTGTGGGTGAGACCTGCCCGACCCCAGTGCCTTACCCGAGGAGCTACTGGCCCAGTGGGGGAGGCATTCAGGTGGGCAGAGTCAGGGAGACTCATGAGGCCGTTGAGGCCAGGGGCA
TAGAGCTGGCCAAGGAGCCATGGCTCACTAACGTGTTGTATGGGGCTCCTTCCCTTCAGGTCCAGGCTCCTGCGTGAAGTGATGCTCCTCTTTGCCTTACTCCTAGCCATGGAGCTCCCATTGGTGGCAGCCAGTGCCACC
ATCGCGCTCAGTGTAAGTATCATTCCCTCTCACTGTCCTGGAGAGGACGAGAATTCCACCTGCCAGTGCCTTACCCGAGGAGCTACTGGCCCAGTGGGGGAGGCATTCAGGTGGGCAGAGTCAGGGAGACTCATGAGGCCG
TTGAGGCCAGGGGCATAGAGCTGGCCAAGGAGCCATGGCTCACTAACGTGTTGTATGGGGCTCCTTCCCTTCAGGTCCAGGCTCCTGCGTGAAGTGATGCTCCTCTTTGCCTTACTCCTAGCCATGGAGCTCCCATTGGTG
GCAGCCAGTGCCACCATCGCGCTCAGTGTAAGTATCATTCCCTCTCACTGTCCTGGAGAGGACGAGAATTCCACCTGGAGATTCTGGGCCACTTTGGTTCCCCATGAGCCAAGACGGCACTTCTAATTTGCATTCCCTACC
GGAGTCCCTGTCTGTAGCCAGCCTGGCTTTCAGCTGGTGCCCAAAGTGACAAATGTATCTGCAATGACAAAGGTACCCTGGAAGGGCTCGCCCTCTGCGGAATTTCAGTTCATGCAGGCCTTGGTGCTTCCACATCTGTCC
AAGGGCCTTTCAAATGTGACTTTTAACTCTGTGGATTGATTTGCCCGGTTGTCACATTCTGAGCAGCCACAACCTACTGCATCCCATGTAGAAGTGGAAGTGACCTGATTTTTTCCTGCTTTTCAAGGCTGTATGTTTACA
TTTGCCTCCAATCATTCCTATGGGAATTCCTTGGGAGTCTAACTTGGAGATTTTGTTTCTTCTGCCTTTGCTCCTGGGGGCTTAATCACTTCTGTGCCTCTGGTTATCTGTGGCACATTTGTATTTGTCATTAGTCAACCG
GAGACTCGGGGTCTGAGTGGAGGGTATGTCCCCCTCCAGTGATGGTTTCTGTTGGCTTCCCAGGGTGAGGATGACTCATGACCACTTGCAAGTGGTTTTTGTGTCTGGGGTTTATGATCACACAGTCATACACGTTCTAAC
TCCAGACTGACTGTTGAGAAAGCCTCTGGGTAAGGGAATTCCTGGGAAACACACTGTTTTCATGCATCCTCTGGAAGATGAGGCCTGAAGTTACCAGGGTCTCTGTTTGCTGATGCTGATGATCCACATTTTCTAGCCCAC
TCTGCTTCTCTGACACCTTTAGTCTTGAGGATCCATGNTCTGTGAAGGAATCCAAGCTCTCATTTCGCACTCACCTTGGCCCTGGCTCTGTCTCCAGGACCTCTTCTACTACAAAATCCTAAAGCTCTGGGAGCTGGGTGT
CAACCTGTGCCCGAGGAAATCATACAGTTACTGTGGACTTTCCAGTTTGCTGTCTTCTAGTATTCCATTGTAGCTCTTGGGTATTTTCCCATCCACCCCAAGATCCAGCTGGAAATCAGTGAACACACTTGATGGGAGTTT
TCCTGCATGTGCTCTGGGCATTGACAGTAGAAGGGTGTTCAGAATGTCTGCTGTGCCCTCATGGAGGAAGAGNGCTCAGTGTACATGCTCTGGGTCAGTAGGTGCCCTTGAGCCCAGCTTTGGGAGCAATGTTGGATGAGT
GAAGGAGGGATCCAGGGCAAAGCAGGCACGACAGAGTGGAGACGGCGCTGCTGGCTCTCAGGGGAATGGGCATGGAGTGGGTAGGAGATCCACCTAAGGAGGCTGGCTGGCTGGACGAGTCAGGAGCCCCTTCCAAGGGTG
GACACTGACAGGCCCCCAGTCTTGGTCTCCTGCATGCCAGAGGTACCAGCCCATCTTTTTTCCTAAACTTGATGACCTAGGGCTAGGGGCATGTTGAA
Lab:
Michael Hsing
Simon Chan
Nels Thorstein
Chris Fjell
Fuqiang Ban
Melian Huang
Ken Bydler
Osvaldo Santos-Filho
P. Axiero
Evgeny Maksakov
UBC Microbiology:
REW Hancock
K Hilpert
H Jenssen
U.Sask VIDO:
L Babiuk & team
SFU Computer Sciences:
C. Sahinalp
E. Karakoc
F. Hormozdiari
CIHR
V.I.D.O. U.Sask.
Saskatoon, SK
CIHR/MSFHR
Bioinformatics
UBC I.D., Microbiology
Genome Canada,
Genome BC
UBC/VGH Prostate Centre