An Attempt to Categorize the Severity of the Chronic Fatigue Syndrome Disease using Affective Disorder Pathways

Earl F. Glynn (1), Frank Emmert-Streib (1) and Arcady R. Mushegian (1, 2)

(1) Stowers Institute for Medical Research, 1000 East 50th Street, Kansas City, MO 64110 USA, 816.926.4412, {efg | fes | arm}@Stowers-Institute.org
(2) University of Kansas Medical Center, Kansas City, KS 66160 USA

ABSTRACT
Gene expression and SNP data may allow the disease state of patients with chronic fatigue syndrome to be categorized objectively, rather than by the subjective clinical surveys currently used. Analysis of a comprehensive set of candidate genes, chosen based on a priori hypotheses about the primary cause of a disease, may successfully detect specific genes and pathways associated with an illness. We study the effectiveness of using affective disorder pathways, defined by Hattori's list of 257 affective disorder genes, to categorize the severity of the disease state of 227 chronic fatigue syndrome patients from a clinical study in Wichita. We evaluate whether changes in the expression of affective disorder genes, or the information in limited SNP data, are good predictors of the chronic fatigue syndrome severity categories recently published by Reeves et al.

Keywords
Chronic Fatigue Syndrome, microarray analysis, SNP analysis, principal component analysis, gene expression, heatmap

1. INTRODUCTION
Chronic Fatigue Syndrome (CFS) is characterized by profound fatigue that seriously interferes with daily activities [5]. It is a debilitating illness with no known cause or effective therapy [1]. CFS is defined by symptoms and disability and has no confirmatory physical signs or characteristic laboratory abnormalities. The etiology, risk factors and pathophysiology of CFS are unknown [10]. The disease is not explained by conventional medical and psychiatric diagnoses [12]. Estimates of the number of people affected by CFS in the US range from 400,000 [12] to 2.2 million [1].
The illness typically lasts 2 to 7 years [5], but can persist for as long as 20 years [12]. During the illness, the occurrence and severity of symptoms are cyclic [11]. The drain on the economy is estimated at $9.1 billion/year, or about $20,000 per person [12]. Moreover, many cases of CFS go unrecognized by the medical community, and some persons diagnosed with CFS may not actually have it [13]. The illness remains an inadequately managed health problem [12]. There are no standardized criteria for defining CFS, and this lack has constrained research [11].

Since CFS is clinically defined by self-reported symptoms, a reliable clinical test is highly desirable. No pathologic lesion for CFS is known, so no CFS-specific diseased tissue sample can be studied. In a proposed study of CFS patients in Wichita [14], the Centers for Disease Control hypothesized that a gene expression profile of peripheral blood mononuclear cells, a standard sample for profiling psychoneuroendocrine-immune processes [6], could be used to develop a molecular signature of CFS. The CDC stated that the intent of its proposed microarray study was "to determine the association between measurements of gene expression and peripheral neuroendocrine activity...".

A recent microarray study using peripheral blood samples sought genes and metabolic pathways that explain CFS more accurately and reliably than the subjective surveys currently in use [11]. Whistler used a microarray with 3800 genes to look for exercise-responsive genes in the peripheral blood of women affected by CFS and identified 21 differentially expressed genes [16]. Other work by Whistler identified about 100 genes that are differentially expressed between patients with sudden and gradual onset of CFS [15].
This latter study cautioned that its results should be interpreted carefully, since the number of subjects was small, all subjects were women, and the genes profiled represented only a fraction of those potentially important. (Females were also overrepresented in the Wichita CFS study: 186 females vs. 41 males.)

Hattori [4] suggested that candidate genes should be chosen based on a priori hypotheses about the primary cause of the disease being studied. In 2005 Hattori published a list of 257 well-documented candidate affective disorder genes for five putative pathophysiology pathways, each of which may play a part in CFS:

1. Neurotransmission system
2. Neuroendocrine system
3. Intracellular signaling genes largely shared by 1 and 2
4. Circadian rhythm
5. Genes implicated in the pathophysiology of other diseases relevant to major affective disorders

In the study of a complex disease like CFS, where individual genes may each have only a weak effect, it is sensible to start from a list of candidate affective disorder genes thought to be related to CFS, and to ask which of them may help categorize CFS patients (or perhaps reject a CFS diagnosis) more effectively than the subjective surveys currently in use. We attempted to answer this question by examining microarray and SNP data from the Wichita CFS study for the affective disorder genes in Hattori's list [4], to determine whether these genes could categorize the patients as well as the recently published CFS patient "clusters" [11].

This paper is organized as follows: Section 2 explains the data and analysis methods used. Section 3 presents the results (with additional results in the online supplement). Section 4 briefly discusses the microarray analysis, followed by the conclusions in Section 5.

2. METHODS

2.1 Clinical Data
The Wichita CFS data analyzed in this study [2] were previously unpublished, and no published paper describes the microarray, proteomics or SNP data. Some previous papers, e.g., [11], describe aspects of the clinical data, but no coding manual or data description exists for the ~225 clinical data fields. In some cases, it is not clear what the data represent at all; for example, the "alert flags" were not defined.

In December 2005 Reeves et al. [11] published their severity clusters of the Wichita CFS patients based on data from two surveys. Only when this paper was published was the significance of the "cluster" field in the clinical data known with certainty. Based on this paper, we assume the "Cluster" field values can be interpreted as shown in Table 1.

Cluster   Frequency   Description
Worst     30          Most Severe ("lowest SF-36, highest MFI")
Middle    67          Intermediate
Least     67          Least Severe ("scores essentially reflected population norms")

Table 1. Summary of "cluster" frequencies in clinical data. Cluster colors will be used in heatmap sidebars.

Many patients in the "least severe" category were part of the 55 non-fatigued controls matched to CFS patients based on sex, race, age and body mass index. We assume these categories are correct for the purpose of analyzing whether Hattori's affective disorder gene list can be used to differentiate these three severity groups.

2.2 Microarray Data
According to an e-mail from Suzanne Vernon, CDC, the Wichita CFS microarray slides were from MWG Biotech and used its proprietary RLS (Resonance Light Scattering) technology. The two microarray slides contained a total of about 20,000 features. The blood samples used in the microarray study were collected after the patients had been recumbent for 30 minutes (as opposed to Whistler's exercise study [16]).

ArrayVision RLS software, which is sold separately from the "regular" ArrayVision software, is being phased out by its owner, Invitrogen, and was not available to re-analyze the microarray readout. The data field sARMDens in the microarray expression datasets was identified as the density value of each microarray spot minus the background. Since the datasets could not be re-analyzed without ArrayVision RLS software, the data were used without modification.

The "raw" sARMDens values were strongly skewed, with most values piled up at the low end, many 0 values and a long right tail, somewhat like raw Affymetrix intensity values. A log2 transformation [actually log2(1 + value), because of the many 0 values] produced a near-normal distribution, except for a spike in values near 0, as shown in Figure 1.

[Figure 1 plot: histogram of log2[Expression]; x-axis log2[Expression] (0 to 15), y-axis Frequency (truncated; the y-axis should extend to 10,000).]
Figure 1. Histogram of log2[Expression] for probes corresponding to affective disorder genes. The color scale will be used in gene expression heatmaps.

2.3 Genomic Data
MWG Biotech provided a "master file" of the probes on the microarray slides, but unfortunately this file did not match all the actual probe names in the gene expression datasets. The probe names, each consisting of an accession number and a suffix, therefore had to be matched with gene names before the probes could be connected with Hattori's gene list.

After excluding control probes, the gene expression datasets had 19,700 probes. An R [9] script using the Bioconductor biomaRt package [3] was used to "connect" microarray probe IDs with gene IDs. Of the 19,700 probes, 16,321 were connected to gene IDs. Since some genes had multiple probes, only 12,958 unique genes were present.

The genes associated with the microarray probes were matched against the list of 257 genes from Hattori. The results of this matching are shown in Table 2. Because a few of the probes were replicates, a total of 380 probes reduced to 367 unique identifiers, which matched 237 of the 257 genes in Hattori's list.
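The probe-to-gene matching step can be sketched as follows. This is a minimal Python illustration, not the authors' R/biomaRt script, and every name in it (`accession_of`, `match_probes`, the toy accessions) is hypothetical. It assumes that each probe ID is an accession number followed by a dot-separated suffix, and that an accession-to-gene lookup table, the kind of result biomaRt returns, is already available. It also shows how replicate probes collapse to fewer unique identifiers, as in the 380-to-367 reduction reported above.

```python
# Hypothetical sketch: connect microarray probe IDs to candidate genes.
# Assumed (not from the paper): probe ID = accession + "." + suffix, and an
# accession -> gene-symbol table was already retrieved (e.g. via biomaRt in R).
from collections import defaultdict


def accession_of(probe_id):
    """Drop the trailing dot-separated suffix, e.g. 'NM_000681.1' -> 'NM_000681'."""
    return probe_id.rsplit(".", 1)[0]


def match_probes(probe_ids, accession_to_gene, candidate_genes):
    """Match probes to candidate genes.

    Returns (matched, unique_ids, gene_to_probes):
      matched        - every matching probe ID, replicates included
      unique_ids     - the distinct matching probe IDs
      gene_to_probes - candidate gene -> set of its distinct probe IDs
    """
    matched = []
    gene_to_probes = defaultdict(set)
    for probe in probe_ids:
        gene = accession_to_gene.get(accession_of(probe))
        if gene in candidate_genes:      # unmapped probes (None) never match
            matched.append(probe)
            gene_to_probes[gene].add(probe)
    return matched, set(matched), dict(gene_to_probes)


if __name__ == "__main__":
    # Toy data only; the accessions are made up for illustration.
    probes = ["ACC001.1", "ACC001.1", "ACC002.1", "ACC003.1", "ACC004.1"]
    acc2gene = {"ACC001": "ADRA1A", "ACC002": "ADRA1A",
                "ACC003": "ADRA2A", "ACC004": "ACTB"}
    candidates = {"ADRA1A", "ADRA2A", "DRD2"}
    matched, unique_ids, genes = match_probes(probes, acc2gene, candidates)
    print(len(matched), len(unique_ids), len(genes))  # 4 probes, 3 IDs, 2 genes
```

Keeping both the full match list and the set of distinct IDs makes it easy to report probe, identifier and gene counts separately, as Table 2 does.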
The expression data for these 380 probes were extracted from all of the gene expression value files in [2].

Group  Description                       Hattori's Set     CAMDA '06 Dataset
                                         Sub    Group      Sub    Group    Probes
1.0    Neurotransmission system                 129               119
1.1      Monoaminergic                   42                38              51
1.2      Cholinergic                     11                10              14
1.3      Amino-acid                      44                40              70
1.4      Other or neuromodulator         32                31              36
2.0    Neuroendocrine system                    20                20
2.1      HPA axis                        12                12              16
2.2      Neurotrophic/growth factor      8                 8               13
3.0    Intracellular signaling in 1 & 2         45                42       70
4.0    Circadian rhythm                         30                26
4.1      Clock genes                     15                13              22
4.2      Light/dark cycle                15                13              19
5.0    Major affective disorders                33                30
5.1      Parkinson's disease             12                11              20
5.2      Schizophrenia                   21                19              36
       TOTAL                                    257               237      367

Table 2. Summary of Hattori's affective disorder genes matched to probes in the CAMDA '06 microarray datasets. "Sub" and "Group" give gene counts per subgroup and per group; "Probes" gives the number of matching probes. Group colors will be used in heatmap sidebars.

2.4 Microarray Analysis
As a starting point, "exploratory data analysis" of the expression data using heatmaps and principal component analysis was used to look for patterns and groups in the microarray data. Heatmaps for the affective disorder genes were created with various R scripts. The heatmaps were visually explored for patterns of gene expression that correlated with the patients' CFS severity classification, but this inspection was tedious.

To automate and supplement visual inspection of the heatmaps, the mean log expression values were computed for each of the patient severity categories. The idea is that a gene's "signal" for a severity category can be represented by the mean signal of that gene across all patients in the category. Pairwise comparisons of these mean values were evaluated statistically. A Welch two-sample t-test (R function t.test) was used to compute a p-value for each comparison. Because the spike in the histogram (Fig. 1) may violate the assumptions needed for a valid t-test, a Wilcoxon rank sum test (R's wilcox.test) was also applied. Only probes with p-values less than 0.05 for both tests were selected.
A multiple test correction was not applied in this preliminary analysis.

Principal Component Analysis (PCA) using Partek Pro [8] was used to look for patterns and clustering of the patients in scatterplots of the principal components of the gene expression data.
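A minimal sketch of the PCA step, assuming a patients-by-probes matrix of log2 expression values. The paper does not describe Partek Pro's implementation (for example, whether it scales each probe), so this only illustrates the generic technique with a hypothetical helper, `pca_scores`. It centers each probe, forms the probe covariance matrix, finds the leading eigenvectors by power iteration with deflation, and projects each patient onto them; the first three score columns are the kind of coordinates plotted in a 3-D patient scatterplot. It is pure Python and scales quadratically with the number of probes, so for real data R's prcomp() (or an equivalent library routine) is the practical choice.

```python
# Generic PCA via power iteration with deflation (illustrative sketch only;
# not Partek Pro's algorithm). Rows are patients, columns are probes.
import math
import random


def pca_scores(rows, n_components=3, n_iter=500, seed=0):
    """Return (scores, eigenvalues, components); component signs are arbitrary."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    x = [[r[j] - means[j] for j in range(p)] for r in rows]   # center each probe
    cov = [[sum(xi[a] * xi[b] for xi in x) / (n - 1) for b in range(p)]
           for a in range(p)]                                  # probe covariance
    rng = random.Random(seed)
    components, eigenvalues = [], []
    for _ in range(n_components):
        v = [rng.uniform(-1.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in v]
        for _ in range(n_iter):                                # power iteration
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm < 1e-12:                                   # no variance left
                break
            v = [t / norm for t in w]
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p))
        components.append(v)
        eigenvalues.append(lam)
        # Deflation: subtract lam * v v^T so the next pass finds the next component.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    scores = [[sum(xi[j] * c[j] for j in range(p)) for c in components] for xi in x]
    return scores, eigenvalues, components
```

Because principal components are defined only up to sign, a rotated or mirrored view of the same scatterplot is an equally valid result.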
2.5 SNP Data
An R script was used to reformat the Excel worksheets of SNP data into a single matrix of 223 patients by 40 SNPs. The SNPs come from 10 genes, 6 of which are in Hattori's Group 1 and 4 in Group 2. Gene expression data are available for 9 of the 10 SNP genes. This SNP matrix was analyzed using heatmaps created in R and PCA using Partek Pro.

3. RESULTS

3.1 Microarray Data
Several heatmaps of the gene expression data were examined for gene expression patterns that correlated with the patients' CFS categories. Fig. 2 shows the overall heatmap, but no obvious patterns were observed. Additional heatmaps may be found in the online supplement.

[Figure 2 plot: heatmap titled "Gene Expression -- All Data"; columns are the affective disorder gene probes, with a sidebar for gene Groups 1 to 5; rows are patients, with a sidebar for Worst (Most Severe, 30), Intermediate (67), Least Severe (Controls, 67) and Excluded (63).]
Figure 2. Gene expression heatmap of affective disorder genes for all 227 Wichita CFS patients.

A variety of heatmaps by patient groups and gene groups were inspected, but no gene patterns that correlated with patient groups were observed. One unexpected pattern was observed when the data were clustered by patients, but the resulting cluster does not correlate with patient category. The significance of this cluster has not been studied.

Because visually "integrating" the color of a single gene across a set of patients is not an easy task for a human, the comparison was automated. The "Worst" and "Intermediate" groups were combined to form a "Sick" group, which was compared against the "Least (controls)" group. Fig. 3 plots the mean log2 expression of each affective disorder gene probe in the "Sick" group against its mean in the "Control" group.

[Figure 3 plot: "Sick vs Control" scatterplot of mean log2 expression; x-axis Control, y-axis Sick (both 0 to 15); point symbols mark p ≥ 0.1, p < 0.1, p < 0.05 and p < 0.01.]
Figure 3. Comparison of mean log2 expression values for affective disorder genes for Sick vs Control patient groups. Here "Sick" refers to the combined "Worst" + "Intermediate" groups.

While a total of 37 genes were identified that showed differences between at least one pair of patient severity groups, no single gene separated all three categories from one another. Only three genes, SERPINA6, NTRK2 and PIP5K2A, were useful in all three comparisons: Sick vs. Control, Worst vs. Control and Intermediate vs. Control. See the online supplement for details.

Fig. 4 shows the heatmap for the 22 probes (representing 20 genes) that have a statistically significant difference in their mean log expression values for Sick vs Control. The gene list is given in the online supplement.

[Figure 4 plot: heatmap titled "Sick vs Control"; columns are the 22 significant probes, for NTRK2 (2 probes), MOG, GNAS (2 probes), UBE1, NTSR2, GRM8, SERPINA6, SRR, OPRM1, DRD2, IMPA2, QDPR, PDYN, SLC6A9, HCRTR1, GRM4, CHRNA6, PIP5K2A, SLC1A1 and GRM1; rows are patients.]
Figure 4. Gene expression heatmap for genes with statistically significant differences in mean log2 expression between the "Sick" and "Control" groups. The arrow identifies the GRM1 gene, which is discussed in the text.

Fig. 4 shows gene expression for these 22 probes from left to right and patients from top to bottom, with "severe" patients at the top and "control" patients at the bottom. The first column of Fig. 4, marked by the arrow, is for the GRM1 gene. Visually, the area beside the darker grey sidebar, corresponding to the CFS "sick" patients, is slightly brighter than the darker area near the control group, but this could easily be missed by visual inspection. Numerically, the mean log2 expression is 10.97 for the "sick" group, but only 10.38 for the control group. The t-test p-value for this comparison is 0.02, while the Wilcoxon rank sum test p-value is 0.01.

Two genes, ARNTL and CRY1, could be used to differentiate between the "Excluded" group and the control group, but were not seen as different in the "sick" group. Perhaps these "clock" genes reflect a condition for exclusion from a CFS categorization.

PCA shows several interesting subgroups of patients, including one obvious "outlier". The most interesting PCA group is a cluster of 17 "worst" patients, shown by the ellipse in Fig. 5. About half of the CFS "worst" patients are in this cluster, but intermixed with other patients. The other half of the "worst" patients are more dispersed in the diagram. See the online supplement for details.

Note that the dots in the PCA scatterplots correspond to patients and are colored by Reeves' CFS severity classification. Purple dots represent the "worst" CFS severity patients; blue dots represent "intermediate" and green dots the "least" severe patients (the controls). Patients excluded from the original study are shown by red dots. Partek software allows this figure to be rotated interactively in any direction to view relationships among the patients.

Figure 5. Partek scatterplot of the first three PCA components of gene expression data. The ellipse encloses a cluster of 17 of the 30 "Worst" CFS patients (with other patients). Reeves' CFS severity classification is shown by dot color.

3.2 SNP Data
No significant grouping patterns were observed in the SNP data with heatmaps or PCA. Fig. 6 shows that the patients are dispersed essentially at random in a scatterplot of the principal components.

Figure 6. Partek scatterplot of the first three PCA components of SNP data. No clustering of patients was observed. Reeves' CFS severity classification is shown by dot color.

4. DISCUSSION
While Hattori's list had only 20 neuroendocrine system genes, a similar approach would be interesting using the 1622 genes in Nicholson's psycho-neuroendocrine-immune database [6].

None of the 21 CFS "exercise genes" reported by Whistler [16] were in the list identified here as discriminating CFS patients. None of the identified genes matched any of the ~100 differentially expressed genes reported in another microarray study [15]. A comparison of patients with gradual versus sudden onset has not been performed for the Wichita patients.

Nisenbaum discusses CFS illness states over time [7]. Since Reeves et al. report that the occurrence and severity of CFS symptoms are cyclic [11], a microarray study attempting to identify "high" and "low" states may be useful in identifying genes involved in the disease.

5. CONCLUSIONS
About three dozen affective disorder genes were identified that differentiated between pairs of CFS patient severity categories. The Hattori affective disorder genes show some utility in differentiating CFS patient severity categories, but the analysis is not yet complete. The limited SNP data do not appear to be useful in identifying CFS patients.

6. SUPPLEMENTARY MATERIALS
A full-color version of this paper and supplementary information, including all R source code, are available at: http://research.stowers-institute.org/efg/2006/CAMDA/

7. ACKNOWLEDGMENTS
Thanks to Suzanne Vernon, Centers for Disease Control and Prevention, for helpful e-mail discussions, especially about the microarray data. Thanks to Christoph Bausch and Chris Seidel, Stowers Institute, for helpful discussions and feedback. Thanks to Gaye Hattem, Stowers Institute, for proofreading this document.

8. REFERENCES
[1] Bierl, Cynthia, et al. Regional distribution of fatiguing illnesses in the United States: a pilot study. Population Health Metrics, 2:1, 2004.
[2] CAMDA 2006 Conference Datasets, www.camda.duke.edu/camda06/datasets
[3] Durinck, Steffen, et al. BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics, 21(16):3439-3440, 2005.
[4] Hattori, E, C Liu, H Zhu, and ES Gershon. Genetic tests of biologic systems in affective disorders. Molecular Psychiatry, 10(8):719-740, 2005.
[5] Jones, James F, et al. Medication by Persons with Chronic Fatigue Syndrome: Results of a Randomized Telephone Survey in Wichita, Kansas. Health and Quality of Life Outcomes, 1:74, 2003.
[6] Nicholson, Ainsley C, et al. Exploration of neuroendocrine and immune gene expression in peripheral blood mononuclear cells. Molecular Brain Research, 129:193-197, 2004.
[7] Nisenbaum, Rosane, et al. A population-based study of the clinical course of chronic fatigue. Health and Quality of Life Outcomes, 1:49, 2003.
[8] Partek. www.partek.com. Feb 2006.
[9] R Development Core Team (2005). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. www.R-project.org
[10] Reeves, William C, et al. Identification of ambiguity in the 1994 chronic fatigue syndrome case definition and recommendations for resolution. BMC Health Services Research, 3:25, 2003.
[11] Reeves, William C, et al. Chronic fatigue syndrome – a clinically empirical approach to its definition and study. BMC Medicine, 3:19, 2005.
[12] Reynolds, Kenneth J, et al. The economic impact of chronic fatigue syndrome. Cost Effectiveness and Resource Allocation, 2:4, 2004.
[13] Solomon, Laura and WC Reeves. Factors Influencing the Diagnosis of Chronic Fatigue Syndrome. Arch Intern Med, 164:2241-2245, 2004.
[14] US Centers for Disease Control & Prevention, National Center for Infectious Diseases. Proposal: Clinical Assessment of Subjects with Chronic Fatigue Syndrome and Other Fatiguing Illnesses in Wichita. Atlanta, GA, 2002. ftp.camda.duke.edu/CAMDA06_DATASETS/wichita_clinical_irb_protocol.doc
[15] Whistler, Toni, et al. Integration of gene expression, clinical, and epidemiologic data to characterize Chronic Fatigue Syndrome. Journal of Translational Medicine, 1:10, 2003.
[16] Whistler, Toni, et al. Exercise response genes measured in peripheral blood of women with Chronic Fatigue Syndrome and matched control subjects. BMC Physiology, 5:5, 2005.