An Attempt to Categorize
the Severity of the Chronic Fatigue Syndrome Disease
using Affective Disorder Pathways
Earl F. Glynn (1), Frank Emmert-Streib (1) and Arcady R. Mushegian (1,2)
(1) Stowers Institute for Medical Research
1000 East 50th Street, Kansas City, MO 64110 USA
816.926.4412
{efg | fes | arm}@Stowers-Institute.org
(2) University of Kansas Medical Center
Kansas City, KS 66160 USA
ABSTRACT
Gene expression and SNP data may help to categorize the disease
state of patients with chronic fatigue syndrome objectively instead
of the current subjective clinical surveys. Analysis of a
comprehensive set of candidate genes, chosen based on a priori
hypotheses about the primary cause of a disease, may yield
successful detection of specific genes and pathways associated
with an illness.
We study the effectiveness of using affective disorder pathways,
defined by Hattori's list of 257 affective disorder genes, to
categorize the severity of the disease state of 227 chronic fatigue
syndrome patients from a clinical study in Wichita. We evaluate
whether the changes in gene expression in affective disorders, or
the information in limited SNP data, are good predictors of the
chronic fatigue syndrome severity category, which was recently
published by Reeves et al.
Keywords
Chronic Fatigue Syndrome, microarray analysis, SNP analysis,
principal component analysis, gene expression, heatmap
1. INTRODUCTION
Chronic Fatigue Syndrome (CFS) is characterized by profound
fatigue, which seriously interferes with daily activities [5]. It is a
debilitating illness with no known cause or effective therapy [1].
CFS is defined by symptoms and disability and has no
confirmatory physical signs or characteristic laboratory
abnormalities. The etiology, risk factors and pathophysiology of
CFS are unknown [10]. The disease is not explained by
conventional medical and psychiatric diagnoses [12].
Estimates of the number affected by CFS in the US vary from
400,000 [12] to 2.2 million [1]. The duration of the disease
normally ranges from 2 to 7 years [5], but can persist as long as
20 years [12]. While a patient is afflicted, the disease is cyclic in
the occurrence and severity of its symptoms [11]. The drain on the economy is
estimated at $9.1 billion/year, or about $20,000 per person [12].
Even worse, many cases of CFS are unrecognized by the medical
community, and persons diagnosed with CFS may not have CFS
[13]. The illness remains an inadequately managed health
problem [12]. There are no standardized criteria for defining
CFS, and the lack of standardized criteria has constrained research
[11]. Since CFS is clinically defined by self-reported symptoms,
finding a reliable clinical test is highly desirable.
A pathologic lesion for CFS is unknown, so a CFS-specific
diseased sample cannot be studied. In a proposed study
of CFS patients in Wichita [14], the Centers for Disease Control
hypothesized that a gene expression profile of peripheral blood
mononuclear cells, a standard sample for profiling
psycho-neuroendocrine-immune processes [6], could be used to develop a
molecular signature of CFS. CDC said the intent of their
proposed microarray study was “to determine the association
between measurements of gene expression and peripheral
neuroendocrine activity...”.
A recent microarray study, which used peripheral blood samples,
sought genes and metabolic pathways that explain
CFS more accurately and reliably than the subjective surveys
currently in use [11]. Whistler used a microarray with
3800 genes to look for exercise-responsive genes in the
peripheral blood of women affected by CFS and identified 21
differentially expressed genes [16]. Other work by Whistler
identified about 100 genes that are differentially expressed
between patients with sudden onset of CFS and those with gradual
onset [15]. This latter study cautioned about the interpretation
of results since the number of subjects was small, all subjects were
women, and the genes profiled represented only a fraction of
those potentially important. (Females were also overrepresented
in the Wichita CFS study: 186 females vs. 41 males.)
Hattori [4] suggested that candidate genes should be chosen based
on a priori hypotheses on the primary cause of the disease being
studied. In 2005 Hattori published a list of 257 well-documented
candidate affective disorder genes for five putative
pathophysiology pathways, all thought possibly to play a part in
CFS:
1. Neurotransmission system
2. Neuroendocrine system
3. Intracellular signaling genes largely shared by 1 and 2
4. Circadian rhythm
5. Genes implicated in pathophysiology of other diseases
   relevant to major affective disorders
In the study of a complex disease, like CFS, where individual
genes may each have only a weak effect, it is sensible to look at a
list of candidate affective disorder genes, which are thought to be
related to CFS, and to ask which of them may be helpful in
categorizing CFS patients (or perhaps rejecting CFS diagnosis)
more effectively than the subjective surveys currently in use. We
attempted to answer this question by looking at microarray and
SNP data from a Wichita CFS study for affective disorder genes
given in Hattori's list [4] to determine if these genes could
categorize the patients as well as the recently published CFS
patient "clusters" [11].
This paper is organized as follows: Section 2 explains the data
and analysis methods used. Section 3 explains the results (with
additional results in the online supplement). Section 4 is a brief
discussion of the microarray analysis, followed by Section 5, the
conclusions.
2. METHODS
2.1 Clinical Data
The Wichita CFS data analyzed in this study [2] were previously
unpublished, and no published paper describes the microarray,
proteomics or SNP data. Some previous papers, e.g., [11],
describe aspects of the clinical data, but no coding manual or data
description exists for the ~225 clinical data fields. In some cases,
it is not clear what the data represent at all. For example, the
“alert flags” were not defined.
In December 2005 Reeves et al. [11] published their severity
clusters of the Wichita CFS patients based on data from two
surveys. Only when this paper was published was the significance
of the "cluster" field in the clinical data known with certainty.
Based on this paper, we assume the "Cluster" field values can be
interpreted as shown in Table 1:
Cluster   Frequency   Description
Worst     30          Most Severe ("lowest SF-36, highest MFI")
Middle    67          Intermediate
Least     67          Least Severe ("scores essentially reflected
                      population norms.")
Table 1. Summary of "cluster" frequencies in clinical data.
Cluster colors will be used in heatmap sidebars.
Many patients in the "least severe" category were part of the 55
non-fatigued controls matched to CFS patients based on sex, race,
age and body mass index.
We assume these categories are correct for the purpose of
analyzing whether Hattori's affective disorder gene list can be
used to differentiate these three severity groups.
2.2 Microarray Data
According to an E-mail from Suzanne Vernon, CDC, the Wichita
CFS microarray slides were from MWG Biotech and used their
proprietary RLS (Resonance Light Scattering) technology. The
two microarray slides contained a total of about 20,000 features.
The blood samples used in the microarray study were collected
after the patients were recumbent for 30 minutes (as opposed to
Whistler’s exercise study [16]).
ArrayVision RLS software, which is sold separately from the
"regular" ArrayVision software, is being phased out by its owner,
Invitrogen, and was not available to re-analyze the microarray
readout.
The data field sARMDens in the microarray expression datasets
was identified as the density value of each microarray spot minus
the background. Since re-analysis of the datasets was not possible
without ArrayVision RLS software, the data were used without
modification.
The "raw" sARMDens values were strongly skewed, with many 0
values and most of the remaining values piled up at the low end,
somewhat like the skew of raw Affymetrix intensity values.
A log2 transformation [actually, log2(1+value), because of the
many 0 values] resulted in a near-normal distribution, except for
the spike in values near 0, as shown in Figure 1.
[Figure 1: "Histogram of log2[Expression]"; x-axis log2[Expression]
(0 to 15), y-axis Frequency (0 to 2000 shown; truncated y-axis
should extend to 10,000).]
Figure 1. Histogram of log2[Expression] for probes
corresponding to affective disorder genes. The color scale will
be used in gene expression heatmaps.
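The log2(1+value) transformation behind Figure 1 can be illustrated with a minimal Python sketch. The input values are hypothetical, not taken from the Wichita datasets, and the sketch assumes the background-subtracted densities are non-negative; the original analysis was done in R.

```python
import math

def log2_plus_one(values):
    """Return log2(1 + v) for each raw density v (assumed non-negative)."""
    return [math.log2(1.0 + v) for v in values]

# Hypothetical raw sARMDens values, including the zeros that motivate the "+1".
raw = [0, 0, 1, 3, 1023, 32767]
transformed = log2_plus_one(raw)
print(transformed)
```

Adding 1 before taking the logarithm maps a raw value of 0 to 0 instead of leaving it undefined, which is why the spike near 0 remains visible in the histogram.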
2.3 Genomic Data
MWG Biotech provided a "master file" of the probes on the
microarray slides, but unfortunately, this file did not match all the
actual probe names in the gene expression datasets. So, the list of
probe names, which consisted of an accession number and a
suffix, needed to be matched with gene names, before the probes
could be connected with Hattori's gene list.
After excluding control probes, the gene expression datasets had
19,700 probes. An R [9] script using the Bioconductor biomaRt
package [3] was used to "connect" microarray probe IDs with
gene IDs. Of the 19,700 probes, 16,321 were connected to gene
IDs. Since some genes had multiple probes, only 12,958 unique
genes were present.
The genes associated with the microarray probes were matched
against the list of 257 genes from Hattori. The results of this
matching are shown in Table 2.
Because a few of the probes were replicates, a total of 380 probes
reduced to 367 unique identifiers, which matched 237 of the 257
genes in Hattori's list.
The expression data for these 380 probes were extracted from all
the gene expression value files from [2].
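The matching step described above (probes to gene IDs, then gene IDs against a candidate list) can be sketched as follows. This is only an illustration in Python: the probe names, the `probe_to_gene` mapping and the three-gene candidate set are hypothetical stand-ins, whereas the original work used an R script with the biomaRt package and Hattori's full list of 257 genes.

```python
def match_probes(probe_to_gene, candidate_genes):
    """Return (matched_probes, matched_genes) for probes whose gene is a candidate."""
    matched_probes = {
        probe: gene
        for probe, gene in probe_to_gene.items()
        if gene in candidate_genes
    }
    matched_genes = set(matched_probes.values())
    return matched_probes, matched_genes

# Hypothetical probe-to-gene mapping; several probes may map to one gene.
probe_to_gene = {
    "probe_A": "NTRK2",
    "probe_B": "NTRK2",
    "probe_C": "GRM1",
    "probe_D": "ACTB",  # not in the candidate list
}
candidates = {"NTRK2", "GRM1", "CRY1"}

probes, genes = match_probes(probe_to_gene, candidates)
missing = candidates - genes  # candidate genes with no probe on the array
print(len(probes), sorted(genes), sorted(missing))
```

Counting probes and genes separately reflects the distinction made in the text between matched probes and the number of candidate genes they cover.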
                                         Hattori's Set   CAMDA '06 Dataset
Group  Description                       Sub    Group    Sub    Group   Probes
1.0    Neurotransmission                        129             119
1.1    Monoaminergic                     42              38             51
1.2    Cholinergic                       11              10             14
1.3    Amino-acid                        44              40             70
1.4    Other or neuromodulator           32              31             36
2.0    Neuroendocrine system                    20              20
2.1    HPA axis                          12              12             16
2.2    Neurotrophic/growth factor        8               8              13
3.0    Intracellular signaling in 1&2           45              42      70
4.0    Circadian rhythm                         30              26
4.1    Clock genes                       15              13             22
4.2    Light/dark cycle                  15              13             19
5.0    Major affective disorders                33              30
5.1    Parkinson’s disease               12              11             20
5.2    Schizophrenia                     21              19             36
TOTAL                                           257             237     367
Table 2. Summary of Hattori’s Affective Disorder Genes
matched to probes in CAMDA ’06 microarray datasets.
Group colors will be used in heatmap sidebars.
2.4 Microarray Analysis
As a starting point, "exploratory data analysis" of the expression
data using heatmaps and principal component analysis was used
to look for patterns and groups in the microarray data.
Heatmaps for the affective disorder genes were created with
various R scripts. Heatmaps were visually explored for patterns
of gene expression that correlated with the patient CFS severity
classification, but this inspection was tedious.
To automate and supplement visual inspection of the heatmaps,
the mean log expression value of each probe was computed for each
of the patient severity categories. The idea is that a gene’s
“signal” for a severity category can be represented by the mean
signal of this gene over all patients within the category. Pairwise
comparisons of these mean values were evaluated statistically. A
Welch two-sample t-test (R function t.test) was used to compute a
p-value for each comparison. Because the spike near 0 in the
histogram (Fig. 1) may violate the assumptions needed for a
valid t-test, a Wilcoxon rank sum test (R's wilcox.test) was also
applied. Only probes with p-values less than 0.05 for both tests
were selected. A multiple test correction was not applied in this
preliminary analysis.
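The two statistics behind this selection rule can be sketched in plain Python. This is an illustrative sketch, not the original R code: the function names are made up here, the Welch p-value is omitted because it needs the t distribution (which R's t.test supplies), and the rank sum p-value uses a normal approximation without tie or continuity correction, so it will differ somewhat from wilcox.test.

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

def rank_sum_u(x, y):
    """Rank sum statistic for x (Mann-Whitney U form), using midranks for ties."""
    pooled = sorted(list(x) + list(y))
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        i = j
    return sum(ranks[v] for v in x) - len(x) * (len(x) + 1) / 2.0

def rank_sum_p_normal(x, y):
    """Two-sided p-value from a normal approximation (no corrections)."""
    n1, n2 = len(x), len(y)
    z = (rank_sum_u(x, y) - n1 * n2 / 2.0) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return math.erfc(abs(z) / math.sqrt(2.0))

def passes_both(p_t, p_w, alpha=0.05):
    """Selection rule from the text: keep a probe only if both tests are below alpha."""
    return p_t < alpha and p_w < alpha

# Hypothetical log2 expression values of one probe in two patient groups.
sick = [10.9, 11.2, 11.0, 10.8, 11.1]
control = [10.3, 10.5, 10.2, 10.4, 10.6]
print(welch_t(sick, control), rank_sum_p_normal(sick, control))
```

Requiring both tests to pass is a conservative choice: a probe is dropped if either the parametric or the rank-based comparison fails to reach the threshold.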
Principal Component Analysis (PCA) using Partek Pro [8] was
used to look for patterns and clustering of the patients by
scatterplots of the principal components of the gene expression
data.
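For readers without access to Partek Pro, the idea of projecting patients onto principal components can be sketched in plain Python. The sketch below is an assumption-laden illustration: it finds only the first component, by power iteration on the sample covariance matrix, whereas the analysis here plotted the first three components; the tiny data matrix is hypothetical.

```python
import math

def center(rows):
    """Subtract each column mean (rows are patients, columns are genes)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, p = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]

def first_component(cov, iterations=200):
    """Leading eigenvector by power iteration, scaled to unit length.

    Assumes cov is not all zeros and the start vector is not orthogonal
    to the leading eigenvector.
    """
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical expression matrix: 4 patients x 2 genes.
data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
centered = center(data)
pc1 = first_component(covariance(centered))
scores = [sum(a * b for a, b in zip(row, pc1)) for row in centered]
print(pc1, scores)
```

The `scores` list gives one coordinate per patient; plotting such coordinates for the leading components is what produces a scatterplot in which patient clusters can be sought.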
2.5 SNP Data
An R script was used to reformat the Excel worksheets of data
into a single matrix of 223 patients by 40 SNPs. This included
information from 10 genes, 6 of which are in Hattori's Group 1,
while 4 of the genes are in Group 2. Gene expression data are
available for 9 of the 10 SNP genes.
This SNP matrix was analyzed using heatmaps created in R and
PCA using Partek Pro.
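The reshaping of per-SNP records into a single patients-by-SNPs matrix can be sketched as below. This is a hypothetical Python illustration only: the record layout, patient IDs, SNP names and genotype codes are invented for the example, and the original reformatting was done with an R script on the Excel worksheets.

```python
def build_matrix(records):
    """Pivot (patient, snp, genotype) records into a patients x SNPs matrix.

    Cells with no record stay None.
    """
    patients = sorted({r[0] for r in records})
    snps = sorted({r[1] for r in records})
    row = {p: i for i, p in enumerate(patients)}
    col = {s: j for j, s in enumerate(snps)}
    matrix = [[None] * len(snps) for _ in patients]
    for patient, snp, genotype in records:
        matrix[row[patient]][col[snp]] = genotype
    return patients, snps, matrix

# Hypothetical long-format records.
records = [
    ("P001", "snp_1", "AA"),
    ("P001", "snp_2", "AG"),
    ("P002", "snp_1", "AG"),
]
patients, snps, matrix = build_matrix(records)
print(patients, snps, matrix)
```

Keeping missing genotypes as explicit empty cells makes it easy to see which patient and SNP combinations lack data before any heatmap or PCA is attempted.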
3. RESULTS
3.1 Microarray Data
Several heatmaps of the gene expression data were studied
looking for gene expression patterns that correlated with the
patient CFS categories. Fig. 2 shows the overall heatmap, but no
obvious patterns were observed. Additional heatmaps may be
found in the online supplement.
[Figure 2: heatmap "Gene Expression -- All Data"; axes: Patients and
Genes. Gene sidebar: Groups 1 to 5. Patient sidebar: 30 Worst (Most
Severe), 67 Intermediate, 67 Least Severe (Controls), 63 Excluded.]
Figure 2. Gene expression heatmap of affective disorder
genes for all 227 Wichita CFS patients.
A variety of heatmaps by patient groups and gene groups were
inspected, but no gene patterns that correlated with patient groups
were observed. One unexpected pattern was observed when the
data were clustered by patients, but the cluster doesn't correlate
with patient category. The significance of this cluster has not
been studied.
Because visual "integration" of color for a single gene across a set
of patients is not an easy task for a human, the comparison was
automated.
The "Worst" and "Intermediate" groups were combined to form a
"Sick" group, which was compared against the "Least (controls)"
group. The mean log expression values for all genes in this
"Sick" vs "Control" comparison are shown graphically in Fig. 3.
[Figure 3: scatterplot "Sick vs Control"; x-axis Control, y-axis Sick
(both 0 to 15); points coded by p-value: p ≥ 0.1, p < 0.1, p < 0.05,
p < 0.01.]
Figure 3. Comparison of mean log2 expression values for
affective disorder genes for Sick vs Control patient groups.
Here “Sick” refers to the combined
“Worst” + “Intermediate” groups.
Fig. 4 shows the heatmap for the resulting 22 genes that have a
statistically significant difference in their mean log expression
values for Sick vs Control. The gene list is given in the online
supplement.
[Figure 4: heatmap "Sick vs Control"; patients from top to bottom,
probes from left to right. Probe labels recovered from the figure
(extraction order, not necessarily column order):
NTRK2.AF410901.2.2, MOG.U64567.5.2, GNAS.AF105253.3,
UBE1.M58028.5.1, NTSR2.NM_012344.1.4, GRM8.AJ236921.1.3,
SERPINA6.J02943.2.1, SRR.AF169974.1.3, OPRM1.U12569.1.4,
NTRK2.AF410900.2.2, DRD2.M30625.1.1, IMPA2.AF157102.3,
QDPR.AB053170.1.1, PDYN.AL034562.1.4, SLC6A9.S70609.1.3,
HCRTR1.AF041243.1.4, GRM4.NM_000841.1.3, CHRNA6.AB079251.1.2,
PIP5K2A.BC018034.3, SLC1A1.AL136231.1.3, GRM1.L76631.1.3,
GNAS.NM_080426.3]
Figure 4. Gene Expression Heatmap for genes with
statistically significant differences in mean log2 expression
between the “Sick” and “Control” groups. The arrow identifies
the GRM1 gene, which is discussed in the text.
Fig. 4 shows gene expression for these 22 genes from left to right
and patients from top to bottom, with “severe” patients at the top
and “control” patients at the bottom. The first column of Fig. 4,
marked by the arrow in the figure, is for the GRM1 gene. Note that
visually the area by the darker grey bars, corresponding to the
CFS “sick,” is slightly brighter than the darker area near the
control group. But this could easily be missed in a visual
inspection by a human. Numerically, the mean log2 expression of
GRM1 is 10.97 for the "sick" group, but only 10.38 for the control
group. The t-test p-value for this comparison is 0.02, while the
Wilcoxon rank sum test p-value is 0.01.
While a total of 37 genes were identified that showed differences
between any two patient severity groups, no single gene
differentiated all three categories. Only three genes, SERPINA6,
NTRK2 and PIP5K2A, were significant in all three comparisons:
Sick vs. Control, Worst vs. Control and
Intermediate vs. Control. See the online supplement for details.
Two genes, ARNTL and CRY1, could be used to differentiate
between the “Excluded” group and the control group, but were
not seen as different in the “sick” group. Perhaps these “clock”
genes reflect a condition for exclusion from a CFS categorization.
PCA shows several interesting subgroups of patients, including
one obvious “outlier”. The most interesting PCA group is a
cluster of 17 "worst" patients shown by the ellipse in Fig. 5.
About half of the CFS “worst” patients are in this cluster, but
intermixed with other patients. The other half of the “worst”
patients are more dispersed in the diagram. See the online
supplement for details.
Note that the dots in the PCA scatterplots correspond to patients
and are colored by Reeves’ CFS severity classification. Purple
dots represent the “worst” CFS severity patients; blue dots
represent “intermediate” and green represent the “least” severe
patients (the controls). Patients excluded from the original study
are shown by the red dots. Partek software allows this figure to be
rotated in any direction interactively to view relationships among
the patients.
Figure 5. Partek ScatterPlot of first three PCA
Components of Gene Expression Data. Ellipse encloses cluster
of 17 of the 30 “Worst” CFS patients (with other patients).
Reeves’ CFS severity classification is shown by dot color.
3.2 SNP Data
No significant grouping patterns were observed in the SNP data
with heatmaps or PCA. Fig. 6 shows how randomly the patients
are dispersed in a scatterplot of the principal components.
Figure 6. Partek scatterplot of first three PCA
components of SNP data.
No clustering of patients was observed.
Reeves’ CFS severity classification is shown by dot color.
4. DISCUSSION
Hattori’s list had only 20 neuroendocrine system genes; it would
be interesting to apply a similar approach to the 1622 genes in
Nicholson’s psycho-neuroendocrine-immune database [6].
None of the 21 CFS "exercise genes" reported by Whistler [16]
were in the list identified here as discriminating CFS patients.
None of the identified genes matched any of the ~100
differentially expressed genes reported in another microarray
study [15]. A comparison of patients suffering from gradual onset
versus sudden onset has not been performed for the Wichita
patients.
Nisenbaum discusses the CFS illness states over time [7]. Since
Reeves suggests CFS is cyclic in occurrence and severity of its
symptoms [11], a microarray study attempting to identify "high"
and "low" states may be useful in identifying genes involved in
the disease.
5. CONCLUSIONS
About three dozen of the affective disorder genes were identified
that differentiated at least one pair of CFS patient severity
categories.
The Hattori affective disorder genes show some utility in
differentiating CFS patient severity categories, but analysis is not
yet complete.
SNP data do not appear to be useful in identifying CFS patients.
6. SUPPLEMENTARY MATERIALS
This web page contains a full color version of this paper and
supplementary information, including all R source code:
http://research.stowers-institute.org/efg/2006/CAMDA/
7. ACKNOWLEDGMENTS
Thanks to Suzanne Vernon, Centers for Disease Control and
Prevention, for helpful E-mail discussions, especially about the
microarray data. Thanks to Christoph Bausch and Chris Seidel,
Stowers Institute, for helpful discussions and feedback. Thanks to
Gaye Hattem, Stowers Institute, for proofreading this document.
8. REFERENCES
[1] Bierl, Cynthia, et al. Regional distribution of fatiguing illnesses in
the United States: a pilot study. Population Health Metrics, 2:1,
2004.
[2] CAMDA 2006 Conference Datasets,
www.camda.duke.edu/camda06/datasets
[3] Durinck, Steffen, et al. BioMart and Bioconductor: a powerful link
between biological databases and microarray data analysis.
Bioinformatics, 21(16):3439-3440, 2005.
[4] Hattori, E, C Liu, H Zhu, and ES Gershon. Genetic tests of biologic
systems in affective disorders. Molecular Psychiatry,
10(8):719-740, 2005.
[5] Jones, James F, et al. Medication by Persons with Chronic Fatigue
Syndrome: Results of a Randomized Telephone Survey in Wichita,
Kansas. Health and Quality of Life Outcomes, 1:74, 2003.
[6] Nicholson, Ainsley C, et al. Exploration of neuroendocrine and
immune gene expression in peripheral blood mononuclear cells.
Molecular Brain Research, 129:193-197, 2004.
[7] Nisenbaum, Rosane, et al. A population-based study of the clinical
course of chronic fatigue. Health and Quality of Life Outcomes,
1:49, 2003.
[8] Partek. www.partek.com. Feb 2006.
[9] R Development Core Team (2005). R: A language and environment
for statistical computing. R Foundation for Statistical Computing,
Vienna, Austria. www.R-project.org
[10] Reeves, William C, et al. Identification of ambiguity in the 1994
chronic fatigue syndrome case definition and recommendations for
resolution. BMC Health Services Research, 3:25, 2003.
[11] Reeves, William C, et al. Chronic fatigue syndrome – a clinically
empirical approach to its definition and study. BMC Medicine,
3:19, 2005.
[12] Reynolds, Kenneth J, et al. The economic impact of chronic fatigue
syndrome. Cost Effectiveness and Resource Allocation, 2:4, 2004.
[13] Solomon, Laura and WC Reeves. Factors Influencing the Diagnosis
of Chronic Fatigue Syndrome. Arch Intern Med, 164:2241-2245,
2004.
[14] US Centers for Disease Control & Prevention, National Center for
Infectious Diseases. Proposal: Clinical Assessment of Subjects with
Chronic Fatigue Syndrome and Other Fatiguing Illnesses in Wichita.
Atlanta, GA. 2002.
ftp.camda.duke.edu/CAMDA06_DATASETS/wichita_clinical_irb_protocol.doc
[15] Whistler, Toni, et al. Integration of gene expression, clinical, and
epidemiologic data to characterize Chronic Fatigue Syndrome.
Journal of Translational Medicine, 1:10, 2003.
[16] Whistler, Toni, et al. Exercise response genes measured in
peripheral blood of women with Chronic Fatigue Syndrome and
matched control subjects. BMC Physiology, 5:5, 2005.