Supplementary Information

The Supplementary Information consists of a txt file containing the Supplementary Materials and Methods, Supplementary References, Supplementary Tables 1 and 2, and the legends for Supplementary Figures 1 and 2, plus 2 JPG files for Supplementary Figures 1 and 2.

Supplementary Materials and Methods

Enrollment and follow-up of study patients

From August 2001 to January 2005, 178 consecutive patients met the eligibility criteria. Of these, 82 patient samples (46.1%) were excluded because of inadequate quantity (n=49) or quality (n=33) of isolated total RNA. Hence, gene expression microarray data from pretreatment biopsy samples of the remaining 96 patients (all treated with CF by one medical oncologist [H.K.]) were used as the expression profiling training set. Between February 2005 and April 2006, 43 patients were treated with either CF or CX (cisplatin plus oral capecitabine, a fluorouracil prodrug considered equivalent to fluorouracil) by another group of medical oncologists in the same institution. CX therapy has been shown to be therapeutically equivalent to the CF regimen for metastatic gastric cancer. Total RNA of adequate quantity and quality was retrieved from 27 (62.8%) of these 43 samples. Tissue procurement and processing were the same for the training and validation samples.

Abdominal spiral computed tomography (CT) scans were performed every 3 cycles of chemotherapy (every 9 weeks), unless more frequent imaging was indicated by symptoms or signs suggesting progressive disease. Radiographic response was evaluated according to World Health Organization (WHO) criteria in patients with measurable disease. Partial response (PR) was defined as a decrease of more than 50% in the sum of the products of the two largest perpendicular diameters of measurable lesions, lasting at least 4 weeks; however, a confirmatory CT scan was not routinely performed 4 weeks after the initial documentation of PR.
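The size criterion for PR is a short calculation on lesion measurements. The Python sketch below is illustrative only: the function names and the lesion measurements are hypothetical, only the "more than 50% decrease in the sum of products" rule stated above is implemented, and the 4-week duration requirement and the other WHO response categories are not modelled.

```python
from typing import Sequence, Tuple

# Each measurable lesion: (largest diameter, largest perpendicular diameter), e.g. in mm.
Lesion = Tuple[float, float]


def sum_of_products(lesions: Sequence[Lesion]) -> float:
    """Sum over measurable lesions of the product of the two perpendicular diameters."""
    return sum(d1 * d2 for d1, d2 in lesions)


def is_partial_response(baseline: Sequence[Lesion], follow_up: Sequence[Lesion]) -> bool:
    """True if the sum of products fell by MORE than 50% relative to baseline.

    Only the size rule is checked; the 4-week duration requirement and the
    other WHO categories (CR, SD, PD) are deliberately not modelled here.
    """
    base = sum_of_products(baseline)
    if base <= 0:
        raise ValueError("baseline must contain measurable disease")
    decrease = (base - sum_of_products(follow_up)) / base
    return decrease > 0.5  # strictly greater than 50%


# Hypothetical measurements for one patient (mm), not study data.
baseline = [(40.0, 30.0), (20.0, 15.0)]    # 1200 + 300 = 1500 mm^2
follow_up = [(25.0, 20.0), (12.0, 10.0)]   # 500 + 120 = 620 mm^2
print(sum_of_products(baseline), sum_of_products(follow_up))
print(is_partial_response(baseline, follow_up))
```

Because the rule is a strict inequality, a decrease of exactly 50% (for example, from 1500 to 750 mm^2) does not qualify as PR in this sketch.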
Overall survival was measured from the initiation of chemotherapy to the time of death. Survival data were censored in May 2008. Overall survival and time to progression were plotted by the Kaplan-Meier method. Log-rank tests were used to assess differences in overall survival and time to progression, and Wald's test was used to assess the statistical significance of the Cox hazard ratio; both were performed using SPSS (version 15.0; SPSS, Inc., Chicago, IL). Logistic regression was performed using SAS (version 9.1.3; SAS Institute Inc., Cary, NC).

Gene ontology analyses

The survival times of patients in the array training set were correlated with mRNA expression levels measured by microarray. The expression of 917 genes was significantly correlated with poor survival in the 96 patients (P < 0.05) (poor-prognosis signature). Highly represented functional categories and canonical pathways among these 917 genes were identified using Ingenuity Pathway Analysis (IPA, version 8.5, www.ingenuity.com). With the default IPA settings, functional categories are ranked by Benjamini-Hochberg P value and canonical pathways by ratio, where the ratio is the number of the 917 genes that fall in a given pathway divided by the total number of genes that make up that pathway.

Array CGH

Genomic DNA and total RNA were extracted from samples using TRI reagent (Invitrogen, Carlsbad, CA) according to the manufacturer's protocol. Genomic DNA was additionally purified using the QIAamp DNA Micro Kit (QIAGEN, Valencia, CA). Array CGH (aCGH) experiments used Agilent 4x44K HD-CGH Microarrays containing 44,000 features (Agilent Technologies, Santa Clara, CA). Tumor genomic DNA (0.5-1 µg) and the same amount of human reference genomic DNA from multiple anonymous female donors (Promega, Madison, WI) were digested with AluI (50 units) and RsaI (50 units) for 2 h at 37°C. Random Primer (5 µl) was then mixed with the digested DNA template.
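The calling rule applied to the aCGH data in this section (an aberration filter of at least 5 probes and a minimum average log ratio equal to the sample's DLRSpread, followed by an amplification cut-off of average log2 ratio > 2.0) can be sketched in Python as follows. This is an illustrative sketch, not CGH Analytics code: the ADM-2 segmentation itself is not reproduced, the `Segment` record and all values are hypothetical, and the DLRSpread estimator used here (standard deviation of consecutive-probe differences divided by √2) is an assumption that may differ from Agilent's implementation.

```python
import statistics
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Segment:
    """Hypothetical record for one aberrant region returned by a segmentation step (e.g. ADM-2)."""
    label: str
    n_probes: int
    mean_log2: float  # average tumor/normal log2 ratio across the region's probes


def dlr_spread(log2_ratios: Sequence[float]) -> float:
    """Rough DLRSpread: spread of log2-ratio differences between consecutive probes.

    ASSUMPTION: sample standard deviation of the differences divided by sqrt(2).
    If probe noise is independent with SD s, consecutive differences have SD s*sqrt(2),
    so the division rescales back to per-probe noise. Agilent's estimator may differ.
    """
    diffs = [b - a for a, b in zip(log2_ratios, log2_ratios[1:])]
    return statistics.stdev(diffs) / 2 ** 0.5


def call_amplifications(
    segments: Sequence[Segment],
    dlrs: float,
    min_probes: int = 5,
    amp_threshold: float = 2.0,
) -> List[Segment]:
    """Apply the aberration filter, then keep regions whose mean log2 ratio exceeds 2.0."""
    passed_filter = [
        s for s in segments
        if s.n_probes >= min_probes and abs(s.mean_log2) >= dlrs
    ]
    return [s for s in passed_filter if s.mean_log2 > amp_threshold]


# Hypothetical probe log2 ratios in genomic order and hypothetical segments.
probes = [0.1, -0.1, 0.2, 0.0, 0.1, -0.2]
dlrs = dlr_spread(probes)
segments = [
    Segment("seg_A", n_probes=12, mean_log2=2.4),  # passes filter, above 2.0: amplification
    Segment("seg_B", n_probes=3, mean_log2=2.8),   # fewer than 5 probes: filtered out
    Segment("seg_C", n_probes=20, mean_log2=1.1),  # gain, but below the amplification cut-off
]
print(round(dlrs, 3), [s.label for s in call_amplifications(segments, dlrs)])
```

Separating the noise-based filter from the fixed 2.0 cut-off mirrors the two thresholds in the text: the first removes regions indistinguishable from probe noise, the second restricts calls to high-level amplifications.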
The reference and sample DNA were labeled using Agilent's Labeling Kit PLUS, which includes 5x buffer, 10x dNTPs, Cy3- and Cy5-dUTP (1.0 mM), and Exo-Klenow fragment. The probe mixture, consisting of Cy3-labeled sample DNA and Cy5-labeled reference DNA (39 µl combined), 5 µl of human Cot-1 DNA (Invitrogen), 11 µl of Agilent 10× blocking agent, and 50 µl of Agilent 2× hybridization buffer, was denatured at 95°C for 3 min and incubated at 37°C for 30 min. The probe was applied to the array using an Agilent microarray hybridization chamber and hybridized for 21 h at 65°C in a rotating oven at 20 rpm. Arrays were washed according to the manufacturer's recommendations, dipped in Agilent's Stabilizing and Drying Solution, and scanned using an Agilent 2565AA DNA microarray scanner. Agilent Scan Control software (version 7.0) and Agilent Feature Extraction software (version 9.5.1) were used for data processing. Array CGH data were analyzed using Agilent CGH Analytics software (version 3.5.14). Aberrations were detected with the ADM-2 algorithm at a threshold of 6.0. The aberration filter required a minimum of 5 probes in a region and a minimum average log ratio equal to each sample's DLRSpread (the spread of the log ratio differences between consecutive probes). For this analysis, aberrations with an average tumor/normal log2 ratio > 2.0 were defined as amplifications.

Real-time RT-PCR

RNA was reverse transcribed using the AffinityScript reverse transcription kit (Stratagene, La Jolla, CA) in a 20 µl reaction containing 0.2 µg of total RNA and oligo(dT) primer, according to the manufacturer's instructions. Each 10 µl Q-RT-PCR reaction contained 5 µl of 2x TaqMan Universal PCR Master Mix (Applied Biosystems, Foster City, CA), 0.5 µl of 40x primer and probe mix (600 nM each of the forward and reverse primers and 100 nM of the specific TaqMan probe; Applied Biosystems), 2 µl of cDNA, and 2.5 µl of H2O.
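The comparative CT calculation applied to these reactions, in which each gene of interest is normalized to GAPDH and the test sample is then compared with the control as 2^(−ΔΔCT), can be illustrated with a short Python sketch. The CT values are hypothetical, not study data, and the function names are illustrative; duplicate wells are averaged before normalization, following the procedure described for this assay.

```python
from statistics import mean
from typing import Sequence


def delta_ct(target_cts: Sequence[float], gapdh_cts: Sequence[float]) -> float:
    """ΔCT for one sample: mean CT of the gene of interest minus mean CT of GAPDH.

    Duplicate wells are averaged before subtracting.
    """
    return mean(target_cts) - mean(gapdh_cts)


def relative_quantity(dct_test: float, dct_control: float) -> float:
    """Fold change of the test sample relative to the control sample: 2 ** (-ΔΔCT)."""
    ddct = dct_test - dct_control
    return 2.0 ** (-ddct)


# Hypothetical duplicate CT values for one gene (e.g. MYC) and GAPDH.
dct_test = delta_ct([24.25, 23.75], [18.0, 18.0])     # 24.0 - 18.0 = 6.0
dct_control = delta_ct([27.0, 27.0], [17.75, 18.25])  # 27.0 - 18.0 = 9.0
print(dct_test, dct_control, relative_quantity(dct_test, dct_control))  # prints: 6.0 9.0 8.0
```

Here the test sample reaches threshold 3 cycles earlier than the control after GAPDH normalization (ΔΔCT = −3), which corresponds to an 8-fold higher relative expression under the usual assumption of near-100% amplification efficiency.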
All real-time qPCR assays were performed in duplicate on an ABI PRISM 7900HT Sequence Detection System (Applied Biosystems) using the following program: 50°C for 2 min, 95°C for 10 min, then 40 cycles of 95°C for 15 s and 60°C for 1 min. Primers and TaqMan probes for MYC, EGFR, FGFR2, and GAPDH were purchased from Applied Biosystems (Assays-on-Demand™). Each gene of interest was amplified on the same plate and under identical conditions as the control gene. Relative quantification of gene expression was calculated with the comparative CT method, where CT is the threshold cycle, i.e., the PCR cycle at which reporter fluorescence crosses a threshold set significantly above the baseline signal. GAPDH was used as the endogenous reference gene for normalization. The CT value of each target mRNA was first normalized to the GAPDH CT value of the same sample, and test samples were then compared with control samples:

$$\Delta C_{T,\text{sample}} = \overline{C}_{T,\text{target}} - \overline{C}_{T,\text{GAPDH}}$$
$$\Delta\Delta C_{T} = \Delta C_{T,\text{test}} - \Delta C_{T,\text{control}}$$

where each mean CT is the average of the duplicate wells. To correlate the RT-PCR results with the DNA microarray data, they were converted to fold changes as $2^{-\Delta\Delta C_{T}}$. For the microarray data, the fold change was taken as the difference between test and control samples in GAPDH (213453_x_at)-subtracted RMA signals. A Pearson correlation coefficient between these 2 fold-change measures was calculated for each gene using SPSS.

Immunohistochemistry

Immunohistochemistry was performed on formalin-fixed, paraffin-embedded tissue slides. The primary antibodies were sc-003 for EGFR and sc-122 for FGFR2 (Santa Cruz Biotechnology Inc., Santa Cruz, CA) and 9E10 for MYC (BD Biosciences, San Jose, CA).
Primary antibodies were diluted 1:100 (overnight incubation at room temperature), 1:50 (30 min), and 1:500 (overnight) for EGFR, FGFR2, and MYC, respectively. Secondary antibodies were diluted 1:100 for FGFR2 and MYC (30 min) and 1:500 for EGFR (30 min). The numbers of samples evaluated were 10, 10, and 17 for EGFR, FGFR2, and MYC, respectively. Immunostaining was scored by staining intensity (0 to 3+) for EGFR and FGFR2 [1, 2] and by the percentage of positive cells (<10%, 10-50%, or >50%) for MYC [3]. Five microscopic fields (100x) were evaluated per sample. Pearson (EGFR and FGFR2) and Spearman (MYC) correlation coefficients between the immunostaining scores and microarray signal values were calculated using SPSS.

Supplementary References

1. Kim MA, Lee HS, Lee HE, Jeon YK, Yang HK, Kim WH, et al. EGFR in gastric carcinomas: prognostic significance of protein overexpression and high gene copy number. Histopathology 2008;52(6):738-746.
2. Ho CL, Sheu LF, Li CY. Immunohistochemical expression of basic fibroblast growth factor, vascular endothelial growth factor, and their receptors in stage IV non-Hodgkin lymphoma. Appl Immunohistochem Mol Morphol 2002;10(4):316-321.
3. Yu Y, Dong W, Li X, Yu E, Zhou X, Li S. Significance of c-Myc and Bcl-2 protein expression in nasopharyngeal carcinoma. Arch Otolaryngol Head Neck Surg 2003;129(12):1322-1326.

Supplementary Tables

Supplementary Table 1. Quality control parameters of microarray data for the patient subsets used in the analyses

Parameter | Training set (N=96), median (IQR) | Validation set (N=27), median (IQR)
% Present call | 58.0 (52.2-62.2) | 58.0 (55.0-62.0)
Scaling factor¹ | 0.7 (0.4-0.9) | 0.6 (0.5-1.0)
3'/5' GAPDH ratio | 1.8 (1.4-2.4) | 1.6 (1.3-1.9)

IQR, interquartile range.
¹ Target signal, 100.

Supplementary Table 2.
Functional categories highly represented among poor-prognosis genes¹ according to Ingenuity Pathway Analysis

Category | B-H² P value | Genes
Protein synthesis | 3.7 × 10⁻⁷ to 3.6 × 10⁻⁴ | AATF, ADAM17, ALOX12, ANAPC5, APOA2, CKAP5, COPS5, CPZ, EGFR, EIF1, EIF2B2, EIF2B4, EIF2S1, EIF3B, EIF3C, EIF3D, EIF3E, EIF3F, EIF3H, EIF3I, EIF4A1, EIF4A3, EIF4B, EIF4EBP1, EIF5, EIF5B, FAF1, GRP, GSPT1, HRSP12, HTRA2, IGF1, KHDRBS1, KRT17, MAN1B1, METAP2, MRPL13, MRPL16, MRPL3, MTIF2, MYC, PAPPA2, PCSK5, PEPD, PITRM1, PMPCA, RAD23A, RPL13, RPL18, RPL24, RPL30, RPL38, RPL5, RPL7, RPL7A, RPL8, RPS2, RPS5, SPHK1, SSB, TIMP2, TIMP3, TSFM, UBE2I, UBE2N, UBR5, XPO1
DNA replication/recombination/repair | 4.4 × 10⁻⁷ to 0.042 | ACTL6A, APEX1, ATF7IP, BRCA2, C10ORF2, C14ORF169, CBX1, CCNB2, CEBPA, COPS5, CTCF, DDB1, DNMT1, DNMT3B, EBNA1BP2, EIF3E, ERCC2, HCFC1, HDAC5, HMGA1, HSF1, IGF1, KAT2A, KIF2C, KPNA2, LIG3, MCM2, MCM4, MEN1, METAP2, MSH2, MYC, NAP1L1, NAP1L4, NBN, NCAPD3, NCL, NEK2, NEK9, PARP1, PARP2, POLD1, POLG2, PRKDC, PRPF19, RAD1, RAD21, RAD23A, RAD23B, RAD51C, RAD51L1, RAD52, RAD54B, RAN, RBBP7, RECQL4, RUVBL2, SET, SMARCA4, SMARCA5, TBP, TREX2, TTF1, UBE2N, UNG
Cancer | 8.8 × 10⁻⁷ to 0.037 | ACLY, ADSL, APOC1, AQP3, BYSL, C16ORF33, CACYBP, CCNB2, CCT4, CCT5, CLDN3, CNTNAP2, COL2A1, CTSL1, DDX39, DSCC1, EGFR, FGFR2, GLRX3, HSPE1, IBSP, IGF1, KPNA2, MCAM, MCM3, MCM4, MCM10, MRPS30, MYC, NEK2, NF1, NLE1, NUFIP1, NUP37, PAICS, POLD1, RAD1, RAD54B, RFC4, RPP40, RUVBL1, SET, SLC7A5, SQLE, TCP1, TEAD4, TIMP2, TIMP3, TPD52L1, TRAP1, TRIP13, UNG, YES1; ACLY, ADSL, APOC1, AQP3, BYSL, C16ORF33, CACYBP, CCNB2, CCT4, CCT5, CDH2, CENPF, CLDN3, CNTNAP2, COL2A1, COLEC12, CTSL1, DDX39, DSCC1, EGFR, FABP6, FGFR2, FSTL3, FZD7, GART, GCDH, GLRX3, HDAC5, HSP90AB1, HSPE1, HUWE1, IBSP, IGF1, IVD, KPNA2, KRT18, MCAM, MCM3, MCM4, MCM10, MRPS30, MSH2, MYC, NEK2, NF1, NLE1, NUFIP1, NUP37, PAICS, PHF16, POLD1, RAD1, RAD54B, RFC4, RPP40, RUVBL1, SET, SLC7A5, SQLE, TCEA1, TCP1, TEAD4, TIMP2, TIMP3, TPD52L1, TRAP1, TRIP13, TUBA1C, TUBD1, UNG, YES1; CDKN1C, MCM10, PARP1, PLK1; CDKN1C, FXN, PARP1

¹ 917 genes correlated with short survival of the 96 patients in the training set.
² Benjamini-Hochberg.

Supplementary Figure Legends

Supplementary Figure 1. Pearson correlation between DNA microarray and real-time RT-PCR data for the 3 genes. P values were 0.004, 0.031, and 0.004 for EGFR, FGFR2, and MYC, respectively.

Supplementary Figure 2. Representative immunohistochemical staining of low and high expressers of EGFR (panels A and B), FGFR2 (panels C and D), and MYC (panels E and F) (magnification, 400x). Pearson correlation coefficients between microarray signal values and immunostaining scores were 0.56 and 0.47 for EGFR and FGFR2, respectively (P = 0.09 and 0.17). The Spearman correlation coefficient for MYC was 0.50 (P = 0.04).
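As a consistency check on the Supplementary Figure 2 legend, the P value of a correlation coefficient r from n paired observations can be obtained from t = r√(n−2)/√(1−r²) with n−2 degrees of freedom. The standard-library-only Python sketch below integrates the t density numerically with Simpson's rule instead of calling a statistics package. It assumes that the SPSS P values were two-sided and, for MYC, that the same t approximation was applied to Spearman's rho; that approximation is common but is not stated in the text.

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), referred to a t distribution with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided P value: 1 - 2 * P(0 <= T <= |t|), with the integral done by Simpson's rule.

    `steps` must be even for Simpson's rule.
    """
    a, b = 0.0, abs(t)
    h = (b - a) / steps
    s = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    central = s * h / 3  # probability mass between 0 and |t|
    return 1.0 - 2.0 * central


# Coefficients and sample sizes reported for the immunohistochemistry comparison.
for gene, r, n in [("EGFR", 0.56, 10), ("FGFR2", 0.47, 10), ("MYC", 0.50, 17)]:
    t = t_statistic(r, n)
    print(gene, round(t, 3), round(two_sided_p(t, n - 2), 2))
```

Under these assumptions the computed P values round to 0.09, 0.17, and 0.04, matching the legend, which also illustrates why moderate coefficients with only 10 samples did not reach significance.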