GENOME-WIDE GENETIC INTERACTION ANALYSIS OF GLAUCOMA
USING EXPERT KNOWLEDGE DERIVED FROM HUMAN PHENOTYPE
NETWORKS
TING HU†, CHRISTIAN DARABOS†, MARIA E. CRICCO, EMILY KONG, JASON H. MOORE
Institute for the Quantitative Biomedical Sciences, Geisel School of Medicine, Dartmouth College
Hanover, NH 03755, U.S.A.
E-mail: jason.h.moore@dartmouth.edu
The large volume of GWAS data poses great computational challenges for analyzing genetic
interactions associated with common human diseases. We propose a computational framework for
characterizing epistatic interactions among large sets of genetic attributes in GWAS data. We build
the human phenotype network (HPN) and focus on the neighborhood of a disease of interest. In this study, we use
the GLAUGEN glaucoma GWAS dataset and apply the HPN as a biological knowledge-based filter
to prioritize genetic variants. Then, we use the statistical epistasis network (SEN) to identify a
significant connected network of pairwise epistatic interactions among the prioritized SNPs. These
interactions highlight the complex genetic basis of glaucoma. Furthermore, we identify key SNPs by
quantifying structural network characteristics. Through functional annotation of these key SNPs
using Biofilter, a software tool that accesses multiple publicly available human genetic data sources, we
find supporting biomedical evidence linking glaucoma to an array of genetic diseases, which serves as a proof of
concept. We conclude by suggesting hypotheses for a better understanding of the disease.
Keywords: GWAS; Epistasis; Gene-gene interaction; Human phenotype network; Statistical epistasis
network; Biofilter; Eye diseases; Glaucoma; SNPs; Pathways.
1. Introduction
With the rapid development of genotyping technologies and exponential increase in computational power, we are able to leverage the wealth of genome-wide association studies (GWAS)
data to test millions of genetic variations (single nucleotide polymorphisms, SNPs) for their
associations with common human diseases.1,2 However, common disorders are usually the
result of the non-linear combined effect of many variations. The study of these complex,
epistatic interactions among multiple genetic attributes is crucial in explaining portions of the
genotype-to-phenotype associations.3,4
Detecting and analyzing complex genetic interactions in GWAS data pose great statistical
and computational challenges. Epistatic effects potentially involve any number of SNPs, from
a couple to hundreds or even thousands. In turn, the uncertainty of size of the interacting SNP
set would require interaction studies to comprehensively explore all possible combinations of
SNPs. A typical GWAS dataset encompasses hundreds of thousands of variations, making
the exhaustive enumeration of all pairwise SNP interactions computationally prohibitive, and the number of combinations grows exponentially with the order of the interactions considered. This inevitably
becomes a computational bottleneck of processing and analyzing GWAS data.
Therefore, genetic interaction studies meant to process modern GWAS data require the
development of advanced informatics methodologies. These novel algorithms must be able to
simultaneously analyze large sets of interactions and identify genetic attributes potentially
involved in higher-order interactions. Network modeling has emerged as a suitable framework
for such purposes,5–7 thanks to its ability to represent a large number of entities, as vertices,
and their relationships, as edges.
† Co-first authors: TH and CD have contributed equally to this work.
In this study, we propose an informatics framework for detecting and analyzing genetic interactions in GWAS data. We use the example of the GLAUGEN study, a GWAS on glaucoma
consisting of over a million attributes. We pre-screen the dataset using a knowledge-based filter by building a human phenotype network (HPN) to prioritize SNPs and reduce the search
space according to their known relationship to the disease/phenotype in question. This reduced SNP list is then used to quantitatively evaluate all pairwise epistatic interactions and
to build a statistical epistasis network (SEN) that characterizes their global interaction structure.
The key SNPs identified by the SEN are functionally annotated and evaluated for their interaction with other disorders. This new framework has the potential to elucidate large parts of
the complex genetic architecture of common human diseases, such as glaucoma, and to identify
key SNPs for further biological validation.
2. Dataset and Methods
2.1. Glaucoma and the GLAUGEN Study
Glaucoma, a neurodegenerative disease, is the primary cause of irreversible blindness, affecting
over 60 million people worldwide. The most common kind of glaucoma, in all populations, is
primary open-angle glaucoma (POAG). Currently the only modifiable risk factor of POAG
is intraocular pressure (IOP), but even lowering that will only slow the process, not stop it.8
However, a substantial amount of POAG has been shown to have a genetic basis. Familial
aggregation of POAG has long been recognized, and studies of it have found multiple linked
loci, leading to the discovery of the glaucoma-causing genes myocilin (MYOC), optineurin (OPTN)
and WD-repeat (tryptophan-aspartic acid repeat) domain 36.9 About 5% of POAG is presently attributed to single-gene or
Mendelian forms of glaucoma. More cases of POAG are caused by the combined epistatic
effects of many genetic risk factors.10
The Glaucoma Gene Environment (GLAUGEN) study seeks to illuminate the origin of the disease, to discover genetic loci associated with POAG, and to identify gene-gene and gene-environment associations.8 GLAUGEN is a case-control GWAS with about 2,000 unrelated cases and over 2,300 controls. To be included in the study, both the cases and the
controls had to be at least 40 years of age and European derived or Hispanic Caucasian.
Subjects were genotyped at 1,048,965 SNPs.8
2.2. Human Phenotype Network (HPN)
The human disease network,11 or the more general human phenotype networks (HPNs),12,13
are mathematical graph models where nodes represent human genetic disorders and edges
link those nodes with shared biology.14 The underlying connections of the HPN contribute to
the understanding of the basis of disorders, which in turn leads to a better understanding of
human diseases.
a. WD: tryptophan-aspartic acid.
Fig. 1. The SNP-based human phenotype network, filtered (edge weight cutoff = 0.002). Vertices are colored
by “modules”13 to increase readability. The vertex sizes are proportional to the number of associated SNPs.
The degree distribution (axes: degree, frequency) is given on both linear and logarithmic scales. [Network image and vertex labels not reproduced.]
In the present work, we rely on the SNP-based HPN, where connected diseases share common SNPs. We use two sources for the phenotype-to-SNPs mapping: the National Human
Genome Research Institute GWAS catalog,15 a manually curated list of all the National Institutes of Health (NIH) funded GWAS, and the NIH’s database of Genotypes and Phenotypes (dbGaP, http://www.ncbi.nlm.nih.gov/gap). To build the HPN, all traits listed in
the GWAS catalog and dbGaP are annotated with their risk-associated SNPs. All the traits
become nodes in our network. Then, traits that have one or more SNPs in common are linked
in the HPN by an edge, the weight of which is proportional to the size of the SNPs’ overlap.
The GWAS catalog and dbGaP report 1,252 traits combined, annotated with 37,681 SNPs in
16,411 loci. The resulting bipartite network is projected in the space of phenotype vertices to
obtain the HPN shown in Fig. 1.
The HPN contains 985 vertices and over 26,000 edges, and encompasses all phenotypes
listed in the GWAS catalog and dbGaP, provided that they are connected to at least one other
trait. The resulting network is extremely dense, with an average degree greater than 500.
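The projection step described above can be illustrated with a minimal Python sketch. It assumes that the phenotype-to-SNP annotation is available as a dictionary and uses the raw number of shared SNPs as the edge weight; the function name `build_hpn`, the toy traits, and the SNP identifiers are hypothetical, and any normalization of the weights used for the network in Fig. 1 is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def build_hpn(trait_to_snps):
    """Project a trait-to-SNP bipartite mapping onto the traits.

    Returns a dict mapping a sorted trait pair to the number of SNPs the
    two traits share (used here as an unnormalized edge weight).
    """
    # Invert the mapping: for every SNP, collect the traits annotated with it.
    snp_to_traits = defaultdict(set)
    for trait, snps in trait_to_snps.items():
        for snp in snps:
            snp_to_traits[snp].add(trait)

    # Every pair of traits sharing a SNP gains one unit of edge weight.
    edges = defaultdict(int)
    for traits in snp_to_traits.values():
        for a, b in combinations(sorted(traits), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Toy annotation with hypothetical SNP identifiers.
toy = {
    "Glaucoma": {"snp1", "snp2", "snp3"},
    "Optic Disk": {"snp2", "snp3"},
    "Eye": {"snp3", "snp4"},
    "Asthma": {"snp5"},
}
hpn = build_hpn(toy)
# Traits without any shared SNP (here "Asthma") never appear in an edge,
# mirroring the rule that only connected traits are kept in the HPN.
connected = {trait for pair in hpn for trait in pair}
```

Iterating over SNPs rather than over all trait pairs keeps the cost proportional to the number of co-annotations, which matters when most trait pairs share nothing.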
2.3. Statistical Epistasis Network (SEN)
The statistical epistasis network (SEN) is designed to study a global aggregation of pairwise
epistatic interactions among a large number of factors.16,17 First, pairwise epistatic interaction
is quantified using the information-theoretic measure called information gain18 for all possible
pairs of genetic attributes. Specifically, the genotypes of two SNPs A and B, and the discrete
phenotypic class C, are regarded as random variables. Mutual information I(A; C) or I(B; C) measures
the information shared between A (or B) and C, calculated as I(A; C) = H(A) + H(C) − H(A, C)
or I(B; C) = H(B) + H(C) − H(B, C), where the entropy H measures the uncertainty of a random
variable or of several random variables jointly. Therefore, mutual information I(A; C) or I(B; C)
describes the reduction in the uncertainty of the phenotypic class C given knowledge of
the genotype of A or B, known as the main effect of A or B on C. Similarly, the joint mutual
information I(A, B; C) measures the total effect of combining A and B on explaining the phenotype C. Subtracting the individual mutual information I(A; C) and I(B; C) from I(A, B; C)
captures the information gained on C by considering A and B together rather than individually. This information gain, IG(A; B; C) = I(A, B; C) − I(A; C) − I(B; C), is a practical measure of the epistatic interaction
effect of SNPs A and B on phenotype C, and has been used as an efficient non-parametric and
model-free statistical quantification of pairwise epistasis in genetic association studies.19–23
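As an illustration of the measure, the following Python sketch computes entropy, mutual information, and information gain from plain lists of genotypes and class labels. It is a minimal sketch under the assumption of plug-in (frequency-based) entropy estimates in bits; the function names and the toy data are hypothetical, and any normalization used to report interaction strengths as percentages is not included.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Shannon entropy (bits) of one variable or of several variables jointly."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum((c / n) * log2(c / n) for c in Counter(joint).values())


def mutual_information(x, c):
    """I(X; C) = H(X) + H(C) - H(X, C)."""
    return entropy(x) + entropy(c) - entropy(x, c)


def information_gain(a, b, c):
    """IG(A; B; C) = I(A, B; C) - I(A; C) - I(B; C)."""
    joint_ab = list(zip(a, b))  # treat the genotype pair as a single variable
    return (mutual_information(joint_ab, c)
            - mutual_information(a, c)
            - mutual_information(b, c))


# Toy XOR-like data: neither SNP is informative alone, but the pair is.
snp_a = [0, 0, 1, 1]
snp_b = [0, 1, 0, 1]
status = [0, 1, 1, 0]
ig = information_gain(snp_a, snp_b, status)  # 1 bit of pure interaction
```

In the toy data both main effects are zero while the pair fully determines the class, so the whole bit of class entropy is attributed to the interaction.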
Second, all pairwise SNP-SNP interactions are quantified and ranked. We build statistical
epistasis networks where vertices are SNPs and edges are weighted pairwise interactions. We
only include pairs of SNPs if their interaction strengths are stronger than a preset threshold.
We analyze the network topological properties at each cutoff value of the threshold, such as the
size of the network (the number of its vertices and the number of its edges), the connectivity of
the network (the size of its largest connected component), and its vertex degree distribution.
Then, a threshold of the pairwise interaction strength is determined systematically by
finding the cutoff at which the topological properties of the real-data network differ the
most from the null distribution obtained by permutation testing. Such a SEN provides a significant
global structure of clustered strong pairwise epistatic interactions associated with a particular
phenotype. It serves as a map for further investigation of network properties and for
prioritization of key SNPs, as discussed in the following section.
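The thresholding and permutation comparison can be sketched as follows. This is a minimal illustration, assuming that the pairwise interaction strengths of the real data and of each permuted dataset have already been computed and stored as dictionaries keyed by SNP pair; the function names and SNP identifiers are hypothetical, and only one topological property (the size of the largest connected component) is compared.

```python
def largest_component_size(edges):
    """Size of the largest connected component of an edge list (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    sizes = {}
    for node in list(parent):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values(), default=0)


def network_at(strengths, cutoff):
    """Edges (SNP pairs) whose interaction strength is above the cutoff."""
    return [pair for pair, s in strengths.items() if s > cutoff]


def component_p_value(observed, permuted, cutoff):
    """Observed largest-component size and the fraction of permuted
    datasets that reach at least that size at the same cutoff."""
    obs = largest_component_size(network_at(observed, cutoff))
    hits = sum(
        largest_component_size(network_at(p, cutoff)) >= obs for p in permuted
    )
    return obs, hits / len(permuted)


# Hypothetical strengths for the real data and four permuted datasets.
observed = {("rsA", "rsB"): 0.0090, ("rsB", "rsC"): 0.0080,
            ("rsC", "rsD"): 0.0075, ("rsE", "rsF"): 0.0020}
permuted = [
    {("rsA", "rsB"): 0.0075, ("rsC", "rsD"): 0.0010},
    {("rsA", "rsE"): 0.0030},
    {("rsA", "rsB"): 0.0080, ("rsB", "rsC"): 0.0078, ("rsC", "rsD"): 0.0072},
    {("rsB", "rsF"): 0.0071},
]
size, p = component_p_value(observed, permuted, cutoff=0.007)
```

Repeating `component_p_value` over a decreasing sequence of cutoffs gives the kind of sweep described above, from which the cutoff with the strongest departure from the permuted networks can be selected.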
2.4. Network Property Analysis of SEN
The assortativity of a network measures the propensities of vertices with similar characteristics
to connect to one another.24,25 In the context of SENs, we are interested in looking into the
main effect assortativity, i.e., whether there exists a correlation of main effects between pairs
of interacting SNPs. Such main effect assortativity is calculated as the Pearson correlation
coefficient r of the main effects at either end of an edge in a SEN,
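\[
r = \frac{M^{-1} \sum_i j_i k_i - \left[ M^{-1} \sum_i \tfrac{1}{2} (j_i + k_i) \right]^2}{M^{-1} \sum_i \tfrac{1}{2} (j_i^2 + k_i^2) - \left[ M^{-1} \sum_i \tfrac{1}{2} (j_i + k_i) \right]^2}, \tag{1}
\]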
where M is the total number of edges, and ji and ki are the main effects of the two vertices (SNPs) at the
ends of the i-th edge, each calculated as the mutual information between the SNP and the phenotypic class,
with i = 1, 2, ..., M. The coefficient r lies between −1 and 1, with r =
1 indicating a perfectly assortative network, r = 0 a non-assortative network, and r = −1 a completely disassortative network.
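A direct transcription of Eq. (1) into Python may help to make the three averaged terms explicit. This is a minimal sketch, assuming the network is given as a list of edges and the main effects as a dictionary keyed by vertex; the function name and the toy values are hypothetical.

```python
def main_effect_assortativity(edges, main_effect):
    """Pearson correlation of main effects at the two ends of each edge, Eq. (1)."""
    m = len(edges)
    js = [main_effect[u] for u, _ in edges]
    ks = [main_effect[v] for _, v in edges]
    # M^-1 * sum_i j_i k_i
    mean_prod = sum(j * k for j, k in zip(js, ks)) / m
    # M^-1 * sum_i (j_i + k_i) / 2
    mean_half_sum = sum(0.5 * (j + k) for j, k in zip(js, ks)) / m
    # M^-1 * sum_i (j_i^2 + k_i^2) / 2
    mean_half_sq = sum(0.5 * (j * j + k * k) for j, k in zip(js, ks)) / m
    return (mean_prod - mean_half_sum ** 2) / (mean_half_sq - mean_half_sum ** 2)


# Hypothetical main effects: two strong SNPs (a, c) and two weak ones (b, d).
effects = {"a": 1.0, "b": 0.0, "c": 1.0, "d": 0.0}
r_disassortative = main_effect_assortativity([("a", "b"), ("c", "d")], effects)
r_assortative = main_effect_assortativity([("a", "c"), ("b", "d")], effects)
```

Because every term is symmetric in ji and ki, the result does not depend on which end of an undirected edge is listed first.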
Phenotype                  #SNPs   Degree
Exfoliation Syndrome           1        1
Coronary Artery Disease      639        2
Cardiovascular Diseases       70        2
Corneal curvature             13        4
Optic Disk                    17        4
Open-Angle Glaucoma            6        5
Glioma                        12        2
Eye                           13        4
Glaucoma                      18        9
Coronary restenosis           56        3
Fig. 2. Glaucoma network neighborhood. (a) Phenotypes, including the number of SNPs associated with
each phenotype and the “strength” of the interaction (degree) with the rest of the sub-network. (b) A graphical
representation of the glaucoma-centered sub-network, in which the node sizes are proportional to the number
of associated SNPs. [Panel (b) image not reproduced.]
At the vertex level, centrality measures the importance of individual vertices in a network.
The most commonly used centrality measure is the node’s degree C_D(v), which is the total
number of edges connected to vertex v. The degree of a SNP in the statistical epistasis network
shows the number of other SNPs with which it is interacting. Betweenness centrality is a more
sophisticated metric that quantifies the number of times a vertex is part of the shortest path
between any pair of vertices,26 represented as $C_B(v) = \sum_{s \neq v \neq t \in V} \sigma_{st}(v) / \sigma_{st}$, where σst is the total
number of shortest paths from vertex s to vertex t and σst(v) is the number of those paths
that pass through vertex v. Closeness centrality is defined as $C_C(v) = 1 / \sum_{s \neq v} d_{vs}$, where dvs is the
distance between vertices v and s.27,28 This metric describes how easily a given vertex can
reach all other vertices in a network. In the context of SENs, centrality measures are used to
identify key SNPs that play an essential role in the global interaction structure.
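The three centrality measures can be computed with breadth-first searches, as in the following minimal Python sketch. It assumes an unweighted, undirected, connected network stored as an adjacency dictionary, and it counts each unordered pair {s, t} once in the betweenness sum, which is one possible reading of the definition above. The function names are hypothetical, and the betweenness accumulation follows the standard Brandes scheme rather than any implementation described in this paper.

```python
from collections import deque


def bfs(adj, source):
    """Distances, shortest-path counts, predecessors, and visit order from source."""
    dist = {source: 0}
    sigma = {source: 1}
    preds = {source: []}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0
                preds[w] = []
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order


def centralities(adj):
    """Degree, closeness (1 / sum of distances), and betweenness of every vertex."""
    degree = {v: len(adj[v]) for v in adj}
    closeness = {}
    betweenness = {v: 0.0 for v in adj}
    for s in adj:
        dist, sigma, preds, order = bfs(adj, s)
        total = sum(dist.values())
        closeness[s] = 1.0 / total if total > 0 else 0.0
        # Back-propagate pair dependencies from the farthest vertices inward.
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                betweenness[w] += delta[w]
    # Each unordered pair {s, t} was visited from both ends, so halve the sums.
    betweenness = {v: b / 2.0 for v, b in betweenness.items()}
    return degree, closeness, betweenness


# Toy star network: hub "h" interacting with three other vertices.
star = {"h": {"a", "b", "c"}, "a": {"h"}, "b": {"h"}, "c": {"h"}}
deg, clo, btw = centralities(star)
```

In the star example the hub lies on the single shortest path between every pair of leaves, so it collects all of the betweenness while the leaves have none.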
3. Results
3.1. Pre-screening of Glaucoma Data Using HPN
To reduce the computational cost (in time and resources) of analyzing a GWAS dataset, the
data must be filtered, selected and/or prioritized prior to processing. Algorithmic filtering
methods fall into two main families: knowledge-driven and data-driven. Knowledge-driven
filters rely on the known biology behind the data, whereas data-driven filters rely solely
on statistical pre-processing of the data, regardless of their type. ReliefF, SURF,
and TuRF29,30 are examples of data-driven statistical methods of preprocessing GWAS data.
For a complete review of the knowledge-based vs. data-driven filtering, refer to Sun et al.31
The method we propose to use here starts with the full HPN described in Section 2.2.
We identify the glaucoma vertex and establish a list of the disorders and phenotypes in its
immediate neighborhood. The 10 members of the “glaucoma neighborhood” are presented in
the table in Fig. 2(a).
Using the sub-HPN, we compile a list of 948 SNPs that have been associated with any
of the phenotypes in the sub-network. We subsequently annotate each of the risk-associated
SNPs with the SNPs in linkage disequilibrium32 with it, which are indirectly associated with an increased risk. The
full list of 4,388 features containing both direct and indirect risk-associated SNPs is used
to filter the full one million SNPs of the GLAUGEN GWAS data. The subset made of the
overlap between our priority-list and the GLAUGEN dataset, composed of 2,380 SNPs, is
used to build the SNP interaction network presented in the following section.
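The filtering steps of this section amount to a sequence of set operations, sketched below in Python. The sketch assumes that the neighborhood, the phenotype-to-SNP annotation, a table of linkage-disequilibrium partners, and the set of genotyped SNPs are already available as in-memory collections; the function name, the argument names, and the toy identifiers are hypothetical. In the study itself, the three resulting sets contain 948, 4,388, and 2,380 SNPs, respectively.

```python
def prioritize_snps(neighborhood, trait_to_snps, ld_proxies, genotyped):
    """Return the (direct, expanded, retained) SNP sets for a phenotype neighborhood."""
    # Step 1: SNPs directly associated with any phenotype in the sub-network.
    direct = set()
    for trait in neighborhood:
        direct |= trait_to_snps.get(trait, set())
    # Step 2: add the SNPs in linkage disequilibrium with each direct SNP.
    expanded = set(direct)
    for snp in direct:
        expanded |= ld_proxies.get(snp, set())
    # Step 3: keep only the SNPs actually present in the GWAS dataset.
    retained = expanded & genotyped
    return direct, expanded, retained


# Toy inputs with hypothetical identifiers.
trait_to_snps = {
    "Glaucoma": {"snp1", "snp2"},
    "Optic Disk": {"snp2", "snp3"},
    "Asthma": {"snp9"},
}
ld_proxies = {"snp1": {"snp1a", "snp1b"}, "snp3": {"snp3a"}}
genotyped = {"snp1", "snp1a", "snp3", "snp3a", "snp7", "snp9"}
direct, expanded, retained = prioritize_snps(
    ["Glaucoma", "Optic Disk"], trait_to_snps, ld_proxies, genotyped
)
```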
3.2. Statistical Epistasis Network of Glaucoma
Using the list of 2,380 SNPs, we evaluate all the pairwise epistatic interactions (>2.8 million)
using the information gain measure. The strongest interacting pair is SNPs rs13123790 and
rs6572380 with a strength of 1.163% association of the disease class. We set the pairwise
interaction strength threshold decreasing from the strongest observed value with a step of
0.0001. At each cutoff, a SEN is built by including pairs of SNPs as edges and two end
vertices if their interaction strength is above the cutoff. Then for each network we evaluate its
topological properties including the number of non-isolated vertices, the number of edges, the
sum of weighted edges, and the size of its largest connected component. A 100-fold permutation
test assesses the significance of these observed network properties. We permute the data 100
times by randomly shuffling the disease class column, and for each permuted dataset all the
pairwise interaction strengths are calculated and networks are built using the same cutoffs as
the real-data networks.
We observe a network at cutoff 0.693% that has a largest connected component including
significantly more vertices (p = 0.05) than the permuted-data networks at the same cutoff.
This largest connected component has 713 vertices and 789 edges (Fig. 3).
The main effect assortativity of this network is -0.053 with a significance of p = 0.04
using a 100-fold edge swapping permutation test where we randomly pick 10 × |E| (|E| is the
total number of edges) pairs of edges and swap their end vertices. This significant negative
assortativity indicates that the main effects of interacting SNPs are negatively correlated, i.e.,
high main-effect vertices tend to interact more with low main-effect vertices.
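The edge-swapping permutation test can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not stated in the text: swaps that would create a self-loop or a duplicate edge are skipped, the orientation of the second edge is randomized before each swap, and the p-value is two-sided. The function names and toy data are hypothetical.

```python
import random


def assortativity(edges, effect):
    """Main effect assortativity of an edge list, following Eq. (1)."""
    m = len(edges)
    prod = sum(effect[u] * effect[v] for u, v in edges) / m
    half_sum = sum(0.5 * (effect[u] + effect[v]) for u, v in edges) / m
    half_sq = sum(0.5 * (effect[u] ** 2 + effect[v] ** 2) for u, v in edges) / m
    return (prod - half_sum ** 2) / (half_sq - half_sum ** 2)


def swap_edges(edges, n_swaps, rng):
    """Degree-preserving randomization by repeatedly swapping end vertices."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:  # assumption: pick one of the two rewirings at random
            c, d = d, c
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        # Assumption: skip swaps creating a self-loop or a duplicate edge.
        if (len(new1) < 2 or len(new2) < 2 or new1 == new2
                or new1 in present or new2 in present):
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def assortativity_p_value(edges, effect, n_perm=100, seed=0):
    """Observed r and the fraction of randomized networks with |r| at least as large."""
    rng = random.Random(seed)
    observed = assortativity(edges, effect)
    null = [
        assortativity(swap_edges(edges, 10 * len(edges), rng), effect)
        for _ in range(n_perm)
    ]
    return observed, sum(abs(r) >= abs(observed) for r in null) / n_perm


# Toy network with hypothetical main effects.
toy_edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1), (1, 4)]
toy_effect = {1: 0.9, 2: 0.1, 3: 0.8, 4: 0.2, 5: 0.7, 6: 0.1}
r_obs, p_val = assortativity_p_value(toy_edges, toy_effect, n_perm=20)
```

Because the swaps preserve every vertex degree, the null networks differ from the observed one only in which SNPs are paired, which isolates the contribution of the pairing pattern to r.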
The degree, betweenness, and closeness centralities of all the vertices in the network are
evaluated, and their distributions are shown in Fig. 4. Most vertices have a degree of
3 or less. However, a minority of vertices have a degree higher than 5, meaning that
they interact with a larger number of other SNPs. In the distribution of betweenness centrality,
the majority of vertices have low betweenness. However, some vertices have a much higher
betweenness than the rest. The closeness centrality follows a normal distribution, indicating
that most vertices have similar access to all other vertices. We sort the vertices by descending
centrality. At the top, we find the key SNPs that are potentially involved in the genetic processes
responsible for glaucoma. We further investigate the clinical and biomedical implications of
these SNPs through Biofilter33 functional annotations. Biofilter provides a single interface
to multiple publicly available online human genetic data sources (https://ritchielab.psu.edu/software/biofilter-download).
4. Discussion
In this study, we proposed a genetic interaction analysis framework that uses the human phenotype
network (HPN) as a knowledge-based data filter and, subsequently, the statistical epistasis network
(SEN) for quantitative epistatic interaction detection and visualization. Our method was successfully applied to the GLAUGEN GWAS data, and we were able to identify a connected network
of interacting SNPs that included significantly more SNPs than expected by chance. Such a
rs270504
rs1367158
rs2470984
rs11730157
rs2015454
rs10502032
rs300493
rs10922273
rs10834309
rs10496091
rs13077378
rs517871
rs499818
rs1017307
rs1007000
rs1525361
rs10085496
rs1880781
rs7743494
rs11122547
rs11599750
rs1029973
rs4842838
rs10895547
rs1111233
rs2146758
rs11123849
rs2024341
rs10014145
rs7669668
rs11090754
rs498872
rs6780569
rs1275419
rs804125
rs11131055
rs322239
rs6020814
rs12592821
rs745557
rs1052131
rs2670321
rs1293151
rs7092929
rs9825379
rs11926571
rs12356233
rs1865640
rs9326439
rs3825569
rs11170631
rs9626074
rs7954465
rs1545552
rs974110
rs4672150
rs6585827
rs4794665
rs10829156
rs10863936
rs7756992
rs944260
rs792747
rs785422
rs2074404
rs9318225
rs16886072
rs10518527
rs4379368
rs7833986
rs7660418
rs655487
rs17356672
rs11807640
rs2478830
rs11127920 rs1229542
rs7873584
rs540937
rs8048899
rs11900940
rs6564651
rs274917
rs1438342
rs1410518
rs10518622
rs2319517
rs943855
rs1137
rs6764363
rs12150284
rs4982323
rs989548
rs3847375
rs13389923
rs1406961
rs2876303
rs212062
rs6435396
rs6929078
rs6480314
rs4072683
rs2376999
rs1733724
rs13012046
rs2385576
rs11241936
rs248471
rs343079
rs17364464
rs1410811
rs925019
rs11223996
rs2347824
rs17291650
rs2853676
rs1399004
rs7314446
rs9391221
rs6563943
rs2390024
rs2794467
rs3797235
rs1407149
rs941567
rs913170
rs11782634
rs10915424
rs1435066
rs10430923
rs7809017
rs8000245
rs7810842
rs2383208
rs6476642
rs4527850
rs4977574
rs7000183
rs979976
rs3774937
rs1847118
rs16907259
rs13150558
rs6036896
rs1549318
rs11062784
rs11861962
rs3129055
rs2547416
rs928952
rs6905922
rs1281317
rs17288067
rs1898162
rs2206734
rs1293470
rs1549250
rs7003283
rs952361 rs13167932
rs12682841
rs9644059 rs964184
rs11859036
rs10483727
rs9317297
rs1467242
rs668204
rs16893890
rs7916697
rs1022141
rs1572072
rs2188857
rs7669856
rs2767000
rs9386234
rs9296824
rs16897288
rs6792584
rs11656696
rs706080
rs1870301
rs6856061
rs6671793
rs10282118
rs4129959
rs1889151
rs6499640
rs7754614
rs919129
rs12446047
rs10496166
rs9328448
rs140174
rs557962
rs12940030
rs6570507
rs9325777
rs10511904
rs10497323
rs9897551
rs1260836
rs463235
rs1444439
rs10504130
rs1888814
rs2717128
rs6670655
rs642773
rs1998038
rs10758892
rs6572380
rs12358475
rs3786800
rs758099
rs7963889
rs12931996
rs11763147
rs13123790
rs2750024
rs1267043
rs6667140
rs756202
rs7283906
rs6507838
rs1363258
rs6765739
rs7027110
rs6981943
rs12651137
rs1541010
rs7665764
rs1412444
rs10491472
rs7741993
rs10514431
rs4648011
rs2495699
rs914981
rs7013518
rs1476882
rs4130328
rs9848261
rs17117825
rs342169
rs8042680
rs6708562
rs1230545
rs4666636
rs1315134
rs9304456
rs1532358
rs2753268
rs7606754
rs6773931
rs1935681
rs565659
rs2967055
rs1333040
rs7698623
rs12579680
rs2305707
rs1053049
rs2839670
rs4687161
rs9359617
rs10754458
rs1412829
rs10502182
rs1968011
rs7742369
rs7332115
rs12583288
rs7488278
rs7404645
rs10514995
rs4405822
rs295208
rs8111071
rs12679004
rs12970134
rs280171
rs745324
rs968008
rs6284
rs2501968
rs1019156
rs10505784
rs17565975
rs9323914
rs788178
rs1965072
rs7856675
rs1192415
rs1076673
rs7651039
rs12338076
rs7183263
rs17550532
rs9358118
rs587847
rs10520520
rs834485 rs831983
rs2367563
rs5751614
rs17553657
rs3134904
rs704744
rs7045593
rs1044581
rs1105778
rs2236691
rs10504592
rs1376486
rs3027322
rs5215
rs1933988
rs6010620
rs4380028 rs167275
rs831571
rs472265
rs17584499
rs1741344
rs7084584
rs6699417
rs3111828
rs7112925
rs1635281
rs2171963
rs10510189
rs7823896
rs297367
rs705384
rs1145796
rs1324034
rs1012997
rs11918414
rs494459
rs1324982
rs10929060
rs10509680
rs7084752
rs6107955
rs7031748
rs779388
rs10932055
rs7029757
rs284495
rs7669522
rs4978434
rs4778818
rs1039524
rs2142219
rs803422
rs7028544
rs1719079
rs1490331
rs2221618
rs798489
rs26438
rs10165941
rs10777084
rs1320288
rs3752591
rs5219
rs747175
rs2274557
rs7249094
rs13201929
rs9563035
rs956627
rs1935881
rs7466269
rs2277912
rs2277142
rs2462140
rs245136
rs9941233
rs10777845
rs738494
rs992014
rs9366399
rs9374274
rs10010325
rs10008860
rs5768690
rs13266634
rs953259
rs6571228
rs4369779
rs9841585
Fig. 3. The SEN of Glaucoma. The network includes 713 SNPs (vertices) and 789 pairwise interactions
(edges). The size of a vertex represents the strength of the main effect of its corresponding SNP, with the
disease association ranging from 0.001% to 1.124%. The width of an edge indicates the strength of its
corresponding interaction, with the disease association ranging from 0.693% to 1.163%.
250
200
120
150
100
frequency
400
frequency
frequency
140
500
300
200
100
80
60
40
50
100
0
0
1
2
3
4
5
degree
(a)
6
7
8
20
0
0.0
0.1
0.2
0.3
betweenness
(b)
0.4
0.5
0.03 0.04 0.05 0.06 0.07 0.08
closeness
(c)
Fig. 4. The distributions of (a) degree, (b) betweenness, and (c) closeness centralities of vertices in the SEN
of Glaucoma.
large connected SNP network indicated a complex genetic basis of glaucoma.
HPN is a computational model that systematically organizes and clusters phenotypes and diseases
based on their shared genetic association factors or related basic functional units. It also
serves as a powerful knowledge-based filter for GWAS data pre-screening and prioritization. Indeed,
when compared to established data-driven filters, including the family of Relief algorithms, the
HPN filtering performed equally well. This was evaluated by quantifying all the pairwise
interactions within the SNP sets produced by the different filters (data not shown).
SEN is able to detect and analyze genetic interactions among a large set of attributes in
genetic association studies. The derived SNP interaction network of glaucoma in this study
had a significant connected structure and, with further biological validation, may serve as a
map for identifying key genetic factors associated with the disease. The significant negative
main effect assortativity of this network indicated that epistatic interactions were likely to
occur between SNPs with dissimilar main effects, that is, between a SNP with a strong main effect
and a SNP with a weak one. This finding suggests that most existing main-effect-based GWAS data
filtering algorithms may overlook potentially important interactions, since some SNPs with low
main effects would not be pre-selected by such filters.
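To make the notion of main effect assortativity concrete, the sketch below computes a Newman-style assortativity coefficient for a scalar vertex attribute, that is, the Pearson correlation of the attribute values at the two ends of every edge. The function name, the toy edges, and the main effect values are hypothetical and serve only as an illustration; they are not taken from the GLAUGEN analysis.

```python
def main_effect_assortativity(edges, main_effect):
    """Pearson correlation of main effects across the two ends of each edge.

    Each undirected edge is counted in both directions so that the
    coefficient does not depend on the order in which endpoints are listed.
    """
    xs, ys = [], []
    for u, v in edges:
        xs += [main_effect[u], main_effect[v]]
        ys += [main_effect[v], main_effect[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys hold the same values, so they share one mean
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var if var > 0 else 0.0


# Hypothetical example: one SNP with a strong main effect interacts
# with three SNPs whose main effects are weak.
toy_edges = [("snp_hub", "snp_a"), ("snp_hub", "snp_b"), ("snp_hub", "snp_c")]
toy_main_effect = {"snp_hub": 1.0, "snp_a": 0.1, "snp_b": 0.1, "snp_c": 0.1}
r = main_effect_assortativity(toy_edges, toy_main_effect)
```

A coefficient close to -1, as in this toy case, means that edges preferentially join strong-effect and weak-effect SNPs. A filter that keeps only SNPs with strong main effects would discard the weak-effect end of every such edge.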
The SNPs with the highest-ranked centralities were mapped using Biofilter33 to their
associated genes and, in turn, the genes to pathways and functional groups. We used the
annotation to identify other complex diseases possibly sharing biology with glaucoma and
to suggest novel hypotheses for undocumented shared genetic interactions.
It is known that glaucoma and type II diabetes (T2D) are associated, but a direct genetic
relationship remains unknown. We found SNPs connecting T2D to glaucoma: rs1333040 (CD =
5, CB = 0.043) and rs1053049 (CD = 5). The genes CDKN2B-AS1 and PPARD, in which these two
SNPs are respectively located, are known to be associated with diabetes. Peng et al. observed
that the CDKN2A/B gene had a 1.17-fold (95% CI: 1.1-1.23) risk estimate for T2D.34 PPARD
has been found to be associated with higher fasting glucose, glycated hemoglobin (HbA1c),
and increased risk of T2D and combined impaired fasting glucose/T2D in a population-based
Chinese Han sample.35 SNP rs9540221 (CD = 5) is intergenic between NFYAP1 and
LGMNP1, of which the former has also been associated with diabetes and obesity. Indeed,
population-based studies suggest that persons with T2D have an increased risk of developing
open-angle glaucoma (OAG): the Beaver Dam Eye Study,36 the Framingham Eye Study,37
and the Los Angeles Latino Eye Study.38 Chopra et al. noted a positive
association between T2D and OAG in a Latino population sample in Los Angeles,
USA.38 The Rotterdam Study39 has also reported a positive association between
diabetes mellitus (DM) and OAG, with a relative risk of 3.11 (95% CI, 1.12-8.66). Laboratory
evidence for a link between T2D and glaucoma also exists. Inner retinal cell
death, determined by the presence of hyaline bodies in the ganglion cell and retinal nerve fiber
layers, has been observed in post-mortem eyes of patients with T2D.40 One hypothesis for a direct
link from T2D to glaucoma is that altered biochemical pathways and vascular changes in
T2D reduce blood flow and impair oxygen diffusion, thus increasing oxidative stress and
endothelial cell injury.41 Excessive glial cell activation may contribute to chronic inflammation,
making the retinal ganglion cells more susceptible to glaucomatous damage.42 Connective
tissue remodeling might also promote increased intraocular pressure (IOP) and greater optic
nerve head mechanical stress, which can result in glaucomatous optic neuropathy.43 Diabetes
is known to cause microvascular damage and may affect the vascular autoregulation of the
optic nerve and retina. Thus a longer duration of T2D would be associated with a higher risk
of OAG.
Obesity, like T2D, is a trait often associated with glaucoma, but the connection remains
elusive. We found associations between obesity and glaucoma in SNPs rs12970134 (CD =
5, CC = 0.062), rs11807640 (CD = 8, CB = 0.516, CC = 0.071), rs9540221 (CD = 5), rs10777845
(CD = 5), and rs4365558 (CD = 5, CC = 0.063), all of which showed significance in our glaucoma
study. SNP rs12970134 is located upstream of gene MC4R, defects of which have been
identified as a cause of autosomal dominant obesity.44 SNP rs11807640 is located downstream
of gene CDK4PS, which has been associated with obesity traits among some postmenopausal
women.45 Both rs9540221, which is intergenic between NFYAP1 and LGMNP1, and rs4365558,
which is intergenic between A4GALT and RPL5P34, are situated between genes that
have been associated with obesity as well. Finally, rs10777845 is upstream of gene RMST,
which has been observed to be associated with severe early-onset obesity.46 Several hypotheses
exist regarding a direct link between glaucoma and obesity. Newman-Casey et al. observed not only
that the frequency of OAG was higher among obese
individuals (3.1%) than among non-obese individuals (2.5%), but also that the relationship between
OAG and obesity depended significantly on gender.47 Obese women had a 6% increased
hazard of developing OAG when compared to non-obese women, while there was no significant
effect in men. Zang and Wynder made similar findings: an OAG
diagnosis was twice as likely in women with a body mass index (BMI) above 27.5, whereas BMI had
no impact in men.48 Some hypothesize that increased orbital pressure as a result of excess fat
tissue may cause a rise in venous pressure and a consequent increase in intraocular pressure
(IOP).47
Our study also shows links between Alzheimer’s disease (AD) and glaucoma. The SNP
rs803422 (CD = 5, CC = 0.060), which we found to affect glaucoma, is also associated with
AD. In addition, the two genes SORCS3 and MTHFD1L, which contain two of the SNPs with
the strongest association, are also known to be associated with AD. Patients with AD have
a significantly increased rate of glaucoma occurrence. In a study in four nursing homes in
Germany, 112 Alzheimer’s patients were taken as a case group. Of those 112, 29 were found to
have glaucoma, a rate of 25.9%, as opposed to a 5.2% rate in the control group.49 However,
in spite of significant research on this topic, no clear clinical or genetic
relationship has yet been found between the two diseases.50
Perhaps unsurprisingly, our research has also connected glaucoma with other
phenotypes relating to the eye, particularly cataracts and myopia. The association between
cataracts and glaucoma can be seen in the genes CDK4PS, NFYAP1, and LGMNP1,
all of which contain SNPs found to be highly associated with glaucoma. Myopia and glaucoma
are both directly related to the SNP rs569688 (CB = 0.415, CC = 0.074). There have been many
recent studies about the connection between glaucoma and myopia, particularly primary
open-angle glaucoma (POAG) and high myopia. Though the mechanism behind the connection has not
yet been found, it is believed that high myopia is another risk factor for POAG, though not as
important as intraocular pressure. It is nevertheless recommended that those with high myopia be
screened for glaucoma more frequently.51
Of the SNPs and genes associated with glaucoma in our study, most of those that are not
directly eye-related are heart-related. Glaucoma is therefore probably connected
to cardiovascular disease, coronary artery disease, and coronary restenosis. SNP rs17034592
(CD = 7, CC = 0.060) and loci LDB2, CDKN2B and TRNAQ46P are all associated with
coronary artery disease. SNPs rs499818 (CD = 5, CC = 0.052), rs1333040 (CD = 5, CC = 0.052),
and loci A4GALT, RPL5P34, and CST7 are all associated with myocardial infarction, and
A4GALT and RPL5P34 are associated with cardiovascular disease. Studies have shown that
cardiovascular disease may play an important role in the development and progression of
glaucoma. However, it is also hypothesized that this relationship may be indirect, arising from
the shared influence of age.52 Additional studies exist about the relationship between
cardiovascular disease and intraocular pressure, the main risk factor for glaucoma. However, a
genetic relationship between these two diseases is yet to be found.
A previously undocumented association found in this study is the link between glaucoma and
colorectal carcinoma. The relevant significant SNPs were rs972869 (CD = 7, CC = 0.052),
located near the EPHB1 gene, and rs300493 (CD = 6, CB = 0.311, CC = 0.066), located in
the gene NAV3, both of which are linked to colorectal carcinoma. Studies have shown that
obesity and T2D are risk factors for colorectal cancer, which could lead to the hypothesis
that colorectal carcinoma and glaucoma may be co-morbidities and are affected by similar
pathways.
In summary, the successful application of our methodology to the GLAUGEN GWAS data
supports the effectiveness of our bioinformatics framework and suggests its potential for
detecting and characterizing gene-gene interactions in GWAS data. In future studies, we expect
to extend the interaction analysis beyond pairs to higher-order interactions. The increased
computational demand would, however, require stricter data pre-filtering. More stringent SNP
filtering can be achieved using an HPN based on SNP overlap to restrict the interactions to the
most significant ones. Other avenues of research include using eye-specific tissue genotype
data to further refine the analysis and examining the drug response of the nodes of interest.
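The need for stricter pre-filtering follows from simple counting: an exhaustive search over interactions of order k among n SNPs requires "n choose k" tests. The sketch below illustrates this growth. The SNP count and test budget are hypothetical round numbers chosen for illustration, not figures from the GLAUGEN analysis, and the function names are our own.

```python
from math import comb


def n_interaction_tests(n_snps, order):
    """Number of distinct SNP combinations of the given interaction order."""
    return comb(n_snps, order)


def max_snps_within_budget(budget, order):
    """Largest number of SNPs whose exhaustive order-k search fits the test budget."""
    n = order
    while comb(n + 1, order) <= budget:
        n += 1
    return n if comb(n, order) <= budget else 0


# Hypothetical filtered set of 1000 SNPs.
pairwise_tests = n_interaction_tests(1000, 2)   # 499,500 pairwise tests
three_way_tests = n_interaction_tests(1000, 3)  # 166,167,000 three-way tests

# Keeping the three-way search within the pairwise budget requires a much smaller SNP set.
snps_allowed = max_snps_within_budget(pairwise_tests, 3)
```

Under these assumptions, a three-way analysis with the same number of tests as the pairwise analysis of 1000 SNPs could include only 145 SNPs, which is why the knowledge-based filter would need to be considerably more selective.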
Acknowledgements
This work was supported by the National Institutes of Health (NIH) grants R01-EY022300,
R01-LM010098, R01-LM009012, R01-AI59694, P20-GM103506, P20-GM103534.
References
1. R. Sachidanandam, D. Weissman, S. C. Schmidt, J. M. Kakol, L. D. Stein et al., Nature
409, 928 (2001).
2. W. Y. S. Wang, B. J. Barratt, D. G. Clayton and J. A. Todd, Nature Reviews Genetics 6, 109
(2005).
3. H. J. Cordell, Human Molecular Genetics 11, 2463 (2002).
4. J. H. Moore, Nature Genetics 37, 13 (2005).
5. N. A. Davis, J. E. Crowe Jr., N. M. Pajewski and B. A. McKinney, Genes and Immunity 11,
630 (2010).
6. T. Hu and J. H. Moore, Network modeling of statistical epistasis, in Biological Knowledge Discovery Handbook: Preprocessing, Mining, and Postprocessing of Biological Data, eds. M. Elloumi
and A. Y. Zomaya (Wiley, 2013) pp. 175–190.
7. A. Pandey, N. A. Davis, B. C. White, N. M. Pajewski, J. Savitz, W. C. Drevets and B. A.
McKinney, Translational Psychiatry 2, p. e154 (2012).
8. J. L. Wiggs, M. A. Hauser, W. Abdrabou, R. R. Allingham, D. L. Budenz, E. Delbono, D. S.
Friedman, J. H. Kang, D. Gaasterland, T. Gaasterland, R. K. Lee, P. R. Lichter, S. Loomis,
Y. Liu, C. McCarty, F. A. Medeiros, S. E. Moroi, L. M. Olson, A. Realini, J. E. Richards, F. W.
Rozsa, J. S. Schuman, K. Singh, J. D. Stein, D. Vollrath, R. N. Weinreb, G. Wollstein, B. L.
Yaspan, S. Yoneyama, D. Zack, K. Zhang, M. Pericak-Vance, L. R. Pasquale and J. L. Haines,
J Glaucoma 22, 517 (Sep 2013).
9. M. Takamoto and M. Araie, Ophthalmology Journal 58, 1 (2014).
10. J. H. Fingert, Eye (Lond) 25, 587 (May 2011).
11. K.-I. Goh, M. E. Cusick, D. Valle, B. Childs, M. Vidal and A.-L. Barabasi, Proceedings of the
National Academy of Sciences 104, 8685 (2007).
12. C. Darabos, K. Desai, R. Cowper-Sallari, M. Giacobini, B. Graham, M. Lupien and J. Moore,
Lecture Notes in Computer Science 7833, 23 (2013).
13. C. Darabos, M. J. White, B. E. Graham, D. N. Leung, S. Williams and J. H. Moore, BioData
Mining 7, p. 1 (Jan 2014).
14. H. Li, Y. Lee, J. L. Chen, E. Rebman, J. Li and Y. A. Lussier, Journal of the American Medical
Informatics Association : JAMIA 19, 295 (January 2012).
15. D. Welter, J. MacArthur, J. Morales, T. Burdett, P. Hall, H. Junkins, A. Klemm, P. Flicek,
T. Manolio, L. Hindorff and H. Parkinson, Nucleic Acids Res 42, D1001 (Jan 2014).
16. T. Hu, N. A. Sinnott-Armstrong, J. W. Kiralis, A. S. Andrew, M. R. Karagas and J. H. Moore,
BMC Bioinformatics 12, p. 364 (2011).
17. T. Hu, Y. Chen, J. W. Kiralis and J. H. Moore, Genetic Epidemiology 37, 283 (2013).
18. T. M. Cover and J. A. Thomas, Elements of Information Theory: Second Edition (Wiley, 2006).
19. A. Jakulin and I. Bratko, Analyzing attribute dependencies, in Proceedings of the 7th European
Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2003),
Lecture Notes in Artificial Intelligence Vol. 2838 (Springer-Verlag, 2003).
20. J. H. Moore, J. C. Gilbert, C.-T. Tsai, F.-T. Chiang, T. Holden, N. Barney and B. C. White,
Journal of Theoretical Biology 241, 252 (2006).
21. D. Anastassiou, Molecular Systems Biology 3, p. 83 (2007).
22. J. H. Moore, N. Barney, C.-T. Tsai, F.-T. Chiang, J. Gui and B. C. White, Human Heredity 63,
120 (2007).
23. B. A. McKinney, J. E. Crowe, J. Guo and D. Tian, PLoS Genetics 5, p. e1000432 (2009).
24. M. E. J. Newman, Networks: An Introduction (Oxford University Press, 2010).
25. M. E. J. Newman, Physical Review Letters 89, p. 208701 (2002).
26. L. C. Freeman, Sociometry 40, 35 (1977).
27. A. Bavelas, Journal of the Acoustical Society of America 22, 725 (1950).
28. G. Sabidussi, Psychometrika 31, 581 (1966).
29. J. H. Moore and B. C. White, Lecture Notes in Computer Science 4447, 166 (2007).
30. C. S. Greene, N. M. Penrod, J. Kiralis and J. H. Moore, BioData Min 2, p. 5 (2009).
31. X. Sun, Q. Lu, S. Mukherjee, P. K. Crane, R. Elston and M. D. Ritchie, Front Genet 5, p. 106
(2014).
32. The International HapMap Consortium, Nature 437, 1299 (Oct 2005).
33. S. A. Pendergrass, A. Frase, J. Wallace, D. Wolfe, N. Katiyar, C. Moore and M. D. Ritchie,
BioData Min 6, p. 25 (2013).
34. F. Peng, D. Hu, C. Gu, X. Li, Y. Li, N. Jia, S. Chu, J. Lin and W. Niu, Gene 531, 435 (Dec
2013).
35. L. Lu, Y. Wu, Q. Qi, C. Liu, W. Gan, J. Zhu, H. Li and X. Lin, PLoS One 7, p. e34895 (2012).
36. B. E. Klein, R. Klein and S. C. Jensen, Ophthalmology 101, 1173 (Jul 1994).
37. P. Mitchell, W. Smith, T. Chey and P. R. Healey, Ophthalmology 104, 712 (Apr 1997).
38. V. Chopra, R. Varma, B. A. Francis, J. Wu, M. Torres and S. P. Azen, Ophthalmology 115, 227
(Feb 2008).
39. S. de Voogd, M. K. Ikram, R. C. W. Wolfs, N. M. Jansonius, J. C. M. Witteman, A. Hofman
and P. T. V. M. de Jong, Ophthalmology 113, 1827 (Oct 2006).
40. J. R. Wolter, Am J Ophthalmol 51, 1123 (May 1961).
41. V. H. Y. Wong, B. V. Bui and A. J. Vingrys, Clin Exp Optom 94, 4 (Jan 2011).
42. M. Nakamura, A. Kanamori and A. Negi, Ophthalmologica 219, 1 (Jan-Feb 2005).
43. V. C. Lima, T. S. Prata, C. G. V. De Moraes, J. Kim, W. Seiple, R. B. Rosen, J. M. Liebmann
and R. Ritch, Br J Ophthalmol 94, 64 (Jan 2010).
44. K. M. Ling Tan, S. Q. D. Ooi, S. G. Ong, C. S. Kwan, R. M. E. Chan, L. K. Seng Poh, J. Mendoza,
C. K. Heng, K. Y. Loke and Y. S. Lee, J Clin Endocrinol Metab 99, E931 (May 2014).
45. P. Coveney and R. Highfield, Frontiers of Complexity: The Search for Order in a Chaotic World
(Faber and Faber, London, 1995).
46. E. Wheeler, N. Huang, E. G. Bochukova, J. M. Keogh, S. Lindsay, S. Garg, E. Henning, H. Blackburn, R. J. F. Loos, N. J. Wareham, S. O’Rahilly, M. E. Hurles, I. Barroso and I. S. Farooqi,
Nat Genet 45, 513 (May 2013).
47. P. A. Newman-Casey, N. Talwar, B. Nan, D. C. Musch and J. D. Stein, Ophthalmology 118,
1318 (Jul 2011).
48. E. A. Zang and E. L. Wynder, Nutr Cancer 21, 247 (1994).
49. A. U. Bayer, F. Ferrari and C. Erb, Eur Neurol 47, 165 (2002).
50. P. Wostyn, K. Audenaert and P. P. De Deyn, Br J Ophthalmol 93, 1557 (Dec 2009).
51. S.-J. Chen, P. Lu, W.-F. Zhang and J.-H. Lu, Int J Ophthalmol 5, 750 (2012).
52. S. S. Hayreh, Survey of Ophthalmology 43, Supplement 1, S27 (1999).