Supplementary material

Microarray experiment

Microarrays were custom-designed for this study. A collection of genes shown to be overexpressed in RCC tumors compared with normal kidney tissue was identified (Gumz 2007). Each reference sequence was divided into three segments of roughly equal length along the 5’-3’ axis, and probes were designed individually for each segment. Nine thousand four hundred transcripts yielded probes with optimal base composition for all three segments (28,000 probes), 2,000 transcripts yielded optimal probes for two of the three segments, 1,100 transcripts yielded only one optimal probe, and 300 transcripts yielded no optimal probes. Eighteen hundred transcripts yielded three probes with optimal base composition, but 30% of these probes had the potential to cross-hybridize with other sequences. Probes from transcripts with optimal characteristics and with probes available for all three segments were selected and used to generate spotted arrays for the experiments. Probes were synthesized in situ as oligonucleotides (Agilent) in the 44K format. RNA was used as the starting material for oligo(dT)-primed synthesis of cDNA targets, and a spike-in control was added to each sample before cDNA synthesis to provide a consistent positive reference across hybridizations. Single-color, Cy3-labeled cDNA was hybridized to the microarrays. Array production, cDNA synthesis, labeling, and hybridization followed standard Agilent procedures.

An initial analysis was conducted to test the system. Four replicate labeling reactions were performed on a total RNA sample and on RNA amplified from that total RNA sample, using a standard labeling reaction. Signal intensity distributions were compared within and between sample sets. The frequency distribution of signal intensities showed slight bimodality, with consistent differences between total and amplified RNA.
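The probe-design step described above, in which each reference sequence is split into three segments and one probe is sought per segment, can be sketched as follows. This is a minimal illustration, not the design pipeline actually used: the 60-nt probe length, the 40 to 60% GC window standing in for "optimal base composition", and all function names are assumptions, and cross-hybridization screening is not modeled.

```python
from typing import List, Optional

PROBE_LEN = 60                # assumed probe length in nucleotides
GC_MIN, GC_MAX = 0.40, 0.60   # hypothetical "optimal base composition" window


def split_into_segments(seq: str, n: int = 3) -> List[str]:
    """Split seq into n segments of roughly equal length, ordered 5' -> 3'."""
    bounds = [round(i * len(seq) / n) for i in range(n + 1)]
    return [seq[bounds[i]:bounds[i + 1]] for i in range(n)]


def gc_fraction(s: str) -> float:
    """Fraction of G and C bases in s."""
    s = s.upper()
    return (s.count("G") + s.count("C")) / len(s)


def best_probe(segment: str) -> Optional[str]:
    """Return the PROBE_LEN window whose GC fraction lies inside the window and
    is closest to 0.5, or None if no window qualifies (or the segment is too short)."""
    best, best_dev = None, None
    for start in range(len(segment) - PROBE_LEN + 1):
        window = segment[start:start + PROBE_LEN]
        gc = gc_fraction(window)
        if GC_MIN <= gc <= GC_MAX:
            dev = abs(gc - 0.5)
            if best_dev is None or dev < best_dev:
                best, best_dev = window, dev
    return best


def design_probes(seq: str) -> List[Optional[str]]:
    """One candidate probe (or None) per segment: [5' end, middle, 3' end]."""
    return [best_probe(segment) for segment in split_into_segments(seq)]


def n_optimal(seq: str) -> int:
    """Number of segments (0 to 3) that yielded a probe passing the filter."""
    return sum(p is not None for p in design_probes(seq))
```

`n_optimal(seq)` returns 3, 2, 1, or 0, mirroring the way transcripts are grouped in the text; a real design would add melting-temperature and cross-hybridization checks on top of the composition filter.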
The spike-in controls added to each RNA population produced signals with an appropriately linear response, indicating that the target labeling reaction performed normally. Labeled targets generated from the same biological total or amplified RNA were hybridized to separate arrays. Signals from the two arrays were linearly correlated, indicating that technical replicates are highly reproducible. Ninety-six percent of reporters were concordant between the two samples, whereas 4% were present in one sample and absent in the other; however, these discordant reporters had signals near the limit of detection. Total RNA and amplified RNA produced similar results in this test, indicating that the system is not biased by sample type. Reproducibility was similar to that obtained in conventional gene array experiments. This preliminary testing demonstrated that technical variability is well controlled; therefore, the experimental design can focus on biological replicates and on the differences between the methods used to generate amplified RNA. When total RNA from a sample was compared with RNA amplified from that sample using the old amplification method, and the data were analyzed at the whole-transcript level (all three probes), the correlation was good, with 94% of transcripts behaving similarly (either present or absent) in both the amplified and the total RNA samples. Specifically, 248 transcripts (2.15%) were present in total RNA but absent in amplified RNA, and 375 (3.25%) were present in amplified RNA but absent in total RNA. Concordance varied by probe type: the data were most concordant for the 3’ end probe and least concordant for the 5’ end probe. This result is expected because the RNA amplification methods are anchored at the poly(A) tail, which favors probes located at the 3’ end of the transcript.
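The replicate and total-versus-amplified comparisons above reduce to two simple calculations: a correlation between replicate signals and a cross-tabulation of present/absent calls. The sketch below assumes the calls have already been made (the detection algorithm is not specified here) and uses Pearson correlation of log2 signals as one common way to quantify linear agreement; the function names are hypothetical.

```python
import numpy as np


def concordance(present_total, present_amp):
    """Cross-tabulate present/absent calls for the same transcripts in two samples.

    Both arguments are boolean sequences with one entry per transcript
    (True = present). Returns counts and percentages of each outcome."""
    t = np.asarray(present_total, dtype=bool)
    a = np.asarray(present_amp, dtype=bool)
    if t.shape != a.shape:
        raise ValueError("both call sets must cover the same transcripts")
    n = t.size
    total_only = int(np.sum(t & ~a))  # present in total RNA, absent in amplified
    amp_only = int(np.sum(~t & a))    # present in amplified RNA, absent in total
    return {
        "n": n,
        "total_only": total_only,
        "amp_only": amp_only,
        "pct_total_only": 100.0 * total_only / n,
        "pct_amp_only": 100.0 * amp_only / n,
        "pct_concordant": 100.0 * (n - total_only - amp_only) / n,
    }


def replicate_correlation(signal_a, signal_b):
    """Pearson correlation of log2 signals from two replicate arrays
    (signals must be positive)."""
    la = np.log2(np.asarray(signal_a, dtype=float))
    lb = np.log2(np.asarray(signal_b, dtype=float))
    return float(np.corrcoef(la, lb)[0, 1])
```

As an arithmetic check, 248 / 0.0215 and 375 / 0.0325 both come to roughly 11,540, so the two reported percentages are consistent with a common denominator of about that many transcripts; the exact number is not stated in the text. Running `concordance` on calls from each probe type separately gives per-probe summaries.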
Transcripts detected in total RNA but not in amplified RNA may be missing because of truncation during the amplification process; this was the more frequent discordant result (5’ probe, 13.9%; middle probe, 12.2%; 3’ probe, 3.9%). Transcripts detected in amplified RNA but not in total RNA may reflect high levels of amplification that skew detectability, and therefore relative representation; this was the less frequent result (5’ probe, 4.9%; middle probe, 3.3%; 3’ probe, 2.4%). In both cases, the differences were more pronounced for probes located at the 5’ end, which were more likely to be differentially detected. This result indicates that processivity is limiting during the target labeling reaction of the microarray experiment, not during the RNA amplification process. To understand the differences between amplification protocols, RNA was amplified using either the old or the new protocol and compared with the total RNA by principal component analysis (PCA) of whole-transcript data from all three probes (Figure 2, panel C, of the manuscript). The PCA was then repeated using intensity data for each probe type separately: 5’ end, middle, or 3’ end. When the 5’ probe data were analyzed by amplification method, RNA amplified with the new protocol was more closely related to the total RNA from which it was derived than RNA amplified with the old method (data not shown; see the right-hand plots on slides 26 and 28). With 3’ end probe data, the overall distribution of samples appeared very similar to that obtained with whole-transcript data, indicating that probe type 3 (3’ end) is the major contributor to the overall distribution of the data in the whole-transcript PCA. When analyzed by amplification method, data obtained with the new amplification protocol were still more closely related to the source samples.
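The per-probe-type PCA comparison can be sketched as below. It is a minimal illustration under stated assumptions: each probe type contributes a samples × transcripts matrix of log-scale intensities, features are centered but not scaled, the sample labels "total", "new" and "old" are hypothetical, and closeness to the source RNA is summarized as the mean Euclidean distance to the centroid of the total RNA samples in the space of the first two components. The software and normalization used in the original analysis are not described here.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Principal component scores via SVD of the centered data.

    X: samples x features matrix (e.g. log2 intensities for one probe type).
    Features are centered but not scaled (an assumption)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each feature (transcript)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]  # sample coordinates


def distance_to_source(scores, labels, source="total", query="new"):
    """Mean Euclidean distance in PC space from the `query` samples to the
    centroid of the `source` samples (labels are hypothetical)."""
    labels = np.asarray(labels)
    centroid = scores[labels == source].mean(axis=0)
    diffs = scores[labels == query] - centroid
    return float(np.linalg.norm(diffs, axis=1).mean())


def compare_by_probe_type(data, labels):
    """data: dict mapping a probe-type name to a samples x transcripts matrix.

    Returns, for each probe type, the distance of new- and old-protocol
    samples to the total RNA samples."""
    result = {}
    for probe_type, X in data.items():
        scores = pca_scores(X)
        result[probe_type] = {
            "new": distance_to_source(scores, labels, "total", "new"),
            "old": distance_to_source(scores, labels, "total", "old"),
        }
    return result
```

With matrices `X5`, `Xmid` and `X3` for the three probe types, `compare_by_probe_type({"5p": X5, "mid": Xmid, "3p": X3}, labels)` reports, per probe type, how far the new- and old-protocol samples sit from the total RNA; a smaller `new` value corresponds to the new protocol tracking its source more closely.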