Supplementary methods

Choice of genes on the array. The genes on the multi-species cDNA array were chosen from lists of genes with RefSeq identifiers that were previously shown to be expressed in primate livers, according to the GeneCards database (http://www.genecards.org/) or Enard et al. (2002). Genes were not chosen based on their function or known association with human disease. However, as detailed in Gilad et al. (2005), we tried to ensure that the amplified cDNA probes would include no more than a single >100-bp segment with a matching sequence elsewhere in the human genome at an identity cutoff of 85%. This procedure excluded genes that are located within recent segmental duplications in the human genome and are likely to differ in copy number between species. Indeed, of the 757 genes on our array whose physical location in the human genome can be obtained directly from their RefSeq identifier, only one (PARG) is located within a recent (>98% identity) segmental duplication, based on the Segmental Duplication Database (http://humanparalogy.gs.washington.edu/).

We focus on liver because, in addition to the obvious cognitive and linguistic differences between humans and non-human apes, humans are the only primates that regularly consume cooked food, with the earliest unequivocal evidence for controlled use of fire dating to ~400,000 years ago (Jones, 1992). The digestion of cooked food, among other shifts in nutrition, has led to a human diet that differs sharply from that of our close relatives (Wrangham, 1999). Such changes are likely to have been accompanied by molecular adaptations (e.g., Neel et al. 1998), notably in the liver. Since the genes were chosen based on their expression in humans, a concern is that our sample may be biased towards genes that are highly expressed in humans compared with the other species.
In order to avoid this bias, we rotated the leading species in the first PCR amplification, i.e., the species on which each set of primers was first tested. Specifically, approximately 25% of the primers were tested on each species. If a primer pair successfully amplified a unique PCR product of the expected size in the leading species, we obtained the product for this gene from the other three species as well; if not, the primer pair was excluded. The rationale behind this approach is that the first test of the primers is more likely to yield successful amplification if the gene is highly expressed in the species whose cDNA is used as template. Overall, we tested approximately 1400 primer pairs on a leading species, 1056 of which resulted in successful amplification. Of these, amplification was successful in all species for 907 genes, as detailed in Gilad et al. (2005). A recent survey of gene expression in human and chimpanzee liver detected 9390 expressed genes (Khaitovich et al. 2005); our array therefore includes probes for roughly 10% of the genes expressed in the liver (907/9390 ≈ 9.7%). Supplementary Table 1 contains the RefSeq identifiers of these 907 genes along with their multiple-tissue expression patterns based on the Novartis Gene Expression Atlas (http://expression.gnf.org/FAQ.html#abscall), when available.

Samples and hybridizations. We extracted RNA from the liver of five adult males from each of the four species. One advantage of working with liver is that it is one of the most homogeneous tissues with respect to cellular composition (Balashova et al. 1984). This is in contrast to brain tissue, for example, whose cellular composition may differ substantially between samples and, in particular, between species (e.g., Brodal et al. 1983). For humans, healthy tissue samples were obtained from adult male liver resections performed at Yale Hospital (in accordance with Yale University HIC regulations).
Non-human primate samples were collected from adult male chimpanzees, orangutans and rhesus macaques that died of natural causes or were euthanized following a liver-unrelated disease. Sample preparation, hybridizations, and washes were performed on our multi-primate cDNA array platform, as previously described (Gilad et al. 2005).

Analysis. In primate inter-species gene expression comparisons, we are unable to stage the tissues or to minimize environmental differences among samples. Consequently, we chose a reference design and laboratory procedures aimed at minimizing the technical variance. Specifically, we used five individuals from each of the four species and four technical replicates of each comparison (for a total of 80 hybridizations). We performed all four hybridizations with the same RNA sample on the same day, further minimizing the technical variance, and prepared the reference RNA for the entire experiment in advance. The common reference design allows us to estimate components of variation for species, individuals within species, arrays, and so on. The replicate arrays for each individual give precise estimates of the expression for each individual and hence increase our power to detect species and other differences, at the relatively small cost of confounding individual measurements with the technical variance associated with RNA preparation and day of hybridization. While previous studies performed permutation tests to assess whether interspecies divergence was greater than expected from within-species differences, our approach has the advantage of estimating variation in expression levels both within and between species. In this respect, our approach resembles the HKA test (Hudson et al. 1987), which compares nucleotide polymorphism and divergence levels across multiple loci.
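The bookkeeping of this design can be made concrete with a minimal Python sketch that enumerates the hybridizations, assuming, as in the log-ratio definition that follows, that each target sample is labelled red and hybridized against the common human reference in green. The names `SPECIES`, `hybridization_plan` and `measurements_per_gene` are hypothetical and purely illustrative.

```python
from itertools import product

# Species codes used in the text: human, chimpanzee, orangutan, rhesus macaque.
SPECIES = ("h", "c", "o", "r")
N_INDIVIDUALS = 5   # adult males per species
N_REPLICATES = 4    # technical replicate arrays per individual
REFERENCE = "h"     # common human reference RNA (green channel)


def hybridization_plan():
    """One record per array: target sample in red, common reference in green."""
    return [
        {"species": s, "individual": i, "replicate": j,
         "red": f"{s}{i}", "green": REFERENCE}
        for s, i, j in product(SPECIES,
                               range(1, N_INDIVIDUALS + 1),
                               range(1, N_REPLICATES + 1))
    ]


def measurements_per_gene(plan, n_probe_species=len(SPECIES)):
    """Every array carries one probe per species for each gene."""
    return len(plan) * n_probe_species


plan = hybridization_plan()
print(len(plan))                    # 80 hybridizations
print(measurements_per_gene(plan))  # 320 log ratios per gene
```

The 320 log ratios per gene are the 4 species × 5 individuals × 4 arrays × 4 probes used later in the variance-component analysis.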
The microarray experiment was conducted as a multi-level design: nested within each of the four target species $t \in \{h, c, o, r\}$ we have 5 individuals labelled $i = 1, \dots, 5$; within each individual we have 4 technical replicates labelled $j = 1, \dots, 4$; and within each array, for each gene, we have 4 probes $p \in \{h, c, o, r\}$. For a given gene, let $I_{tp}$ be the observed fluorescence intensity in the red or green channel that corresponds to target species $t$ on probe species $p$. We define

$$\mathrm{E}[\log_2 I_{tp}] = \mu_{tp}, \qquad (0.1)$$

where $\mathrm{E}$ denotes expectation. Attenuation caused by sequence mismatch occurs when the target species and the probe species differ, and we assume that it attenuates the intensity by a gene-specific factor $k_{tp}$, so that $I_{tp} = k_{tp} I_{tt}$. Taking expectations of the log, we get

$$\mu_{tp} = \delta_{tp} + \mu_{tt}, \qquad (0.2)$$

where $\delta_{tp} = \log_2 k_{tp}$ (so that $\delta_{tt} = 0$). Each spot on the array measures the differential expression between the target species $t$ and the human reference $h$, giving a log-ratio $M_{tp} = \log_2(R_{tp}/G_{hp})$, where $R_{tp}$ and $G_{hp}$ are the red (Cy5) and green (Cy3) intensities. The expected log ratio for each spot is a linear combination of the difference in RNA expression between the two species, the attenuation caused by sequence mismatch, and a possibly intensity-dependent dye bias term:

$$\mathrm{E}(M_{tp}) = (\mu_{tt} + \delta_{tp} + r) - (\mu_{hh} + \delta_{hp} + g) = \beta_t + (\delta_{tp} - \delta_{hp}) + (r - g), \qquad (0.3)$$

where $\beta_t = \mu_{tt} - \mu_{hh}$ is the true log fold change in expression for species $t$ relative to human, $\delta_{tp} - \delta_{hp}$ is a difference of log attenuation factors, and $r - g$ is the intensity-dependent dye bias. A direct application of the lowess normalization procedure to the log ratios for probes of species $p$ only will generally result in biased estimates of expression levels: it would produce a curve centered at the local mean (across genes) of $\delta_{tp} - \delta_{hp}$, which is not zero in general.
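The bias can be read directly off equation (0.3). The Python sketch below evaluates E(M_tp) for a rhesus target under invented values: the dye bias r − g is taken as a constant rather than intensity dependent, so that a median over genes can stand in for the lowess curve, and the attenuation values, the assumption δ_rh = δ_hr, and the 10% share of differentially expressed genes are arbitrary choices made only for illustration. Only the rhesus and human probe sets are shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes = 1000
dye_bias = 0.3  # r - g; assumed constant here for simplicity

# beta_t: true log2 fold change for target t = rhesus vs. human; ~10% of genes change.
beta = np.where(rng.random(n_genes) < 0.1, rng.normal(0.0, 1.0, n_genes), 0.0)

# delta[(target, probe)]: log2 attenuation, zero on a matching probe and negative
# otherwise. Illustrative values only, with delta_rh = delta_hr assumed.
att = -np.abs(rng.normal(0.6, 0.2, n_genes))
delta = {("r", "r"): 0.0, ("h", "h"): 0.0, ("r", "h"): att, ("h", "r"): att}


def expected_M(t, p):
    """Equation (0.3): E(M_tp) = beta_t + (delta_tp - delta_hp) + (r - g)."""
    return beta + (delta[(t, p)] - delta[("h", p)]) + dye_bias


# Centre of each probe set after removing the true dye bias: a curve fitted to
# one probe species alone would absorb this offset into the normalization.
offsets = {p: float(np.median(expected_M("r", p)) - dye_bias) for p in ("r", "h")}
print(offsets)  # rhesus probes sit above zero, human probes below
```

Rhesus probes are shifted by −δ_hr > 0 and human probes by δ_rh < 0, so normalizing either set alone moves the dye-bias curve away from zero by roughly the typical attenuation.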
Carrying out a similar normalization on all probes of the four species together would (at best) produce a curve centered at the local mean of

$$\tfrac{1}{4}\left[(\delta_{tt} - \delta_{ht}) + (\delta_{th} - \delta_{hh}) + (\delta_{tp'} - \delta_{hp'}) + (\delta_{tp''} - \delta_{hp''})\right] = \tfrac{1}{4}\left[(\delta_{th} - \delta_{ht}) + (\delta_{tp'} - \delta_{hp'}) + (\delta_{tp''} - \delta_{hp''})\right],$$

where $p'$ and $p''$ are the two probe species that are neither human nor the target species $t$. This again is not expected to be zero. However, this expression shows that if we make the reasonable assumption that $\delta_{th} = \delta_{ht}$, a standard procedure applied to the log ratios $M_{tp}$ for $p = t$ and $p = h$ only should lead to an approximately unbiased normalization curve, because the mean of the two corresponding attenuation differences is $\tfrac{1}{2}(\delta_{th} - \delta_{ht}) = 0$. This is what was implemented in Gilad et al. (2005), and it is appropriate if not too many of the $\beta_t$ are non-zero, or if about half are positive and half negative (see Yang et al., 2002, for more discussion). We applied this procedure, and the resulting adjustment was applied to the log ratios for all probes. An example of an array hybridized with rhesus as the target species against the human reference is shown in Figure 1. The red spots are the unnormalized rhesus probes, with the red line a lowess fit through those probes. The black dots are the unnormalized human probes, with the black line a lowess fit through those probes. A lowess fit using both sets of probes gives the blue line, which is what we use to adjust all the probes on the array.

Figure 1: Log-ratios (M) vs. log intensities (A) for a rhesus macaque/human hybridization. The rhesus probes are shown in red, with the red line a lowess curve through these probes. The human probes are shown in black, with the black line a lowess curve through these probes, and the blue line is the lowess curve through both sets of probes used in the normalization.

Linear modeling. The fixed effects in the linear model (equation 0.3) describe the expression levels $\beta_t$ of the target species $t$ relative to human $h$ and the expression attenuation $\delta_{tp}$ caused by the mismatch between target and probe sequences.
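The normalization described above (a curve fitted through the target-species and human probes only, then subtracted from every probe on the array) can be sketched in Python as follows. This is a simplified stand-in, not the code used for the paper: `local_linear_fit` is a hypothetical helper implementing a lowess-style local linear smoother with tricube weights but without lowess's robustness iterations, the span `frac=0.4` is arbitrary, and the simulated array (attenuation offsets of ±0.6 for the rhesus and human probes, made-up offsets for the other two, a linear dye-bias trend, no true expression differences) is invented. A real analysis would more naturally call an existing implementation such as R's `lowess` or `statsmodels.nonparametric.smoothers_lowess.lowess`.

```python
import numpy as np


def local_linear_fit(x, y, x_eval, frac=0.4):
    """Lowess-style smoother (no robustness iterations): at each x0, fit a
    weighted straight line to the nearest frac * n points with tricube weights."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    k = max(3, int(np.ceil(frac * len(x))))
    fitted = np.empty(len(x_eval))
    for i, x0 in enumerate(np.asarray(x_eval, float)):
        dist = np.abs(x - x0)
        idx = np.argpartition(dist, k - 1)[:k]          # k nearest neighbours of x0
        h = dist[idx].max() or 1.0                      # window half-width
        w = np.sqrt((1 - (dist[idx] / h) ** 3) ** 3)    # sqrt of tricube weights
        X = np.column_stack([np.ones(k), x[idx] - x0])
        coef, *_ = np.linalg.lstsq(X * w[:, None], y[idx] * w, rcond=None)
        fitted[i] = coef[0]                             # local intercept = fit at x0
    return fitted


def normalize_array(A, M, probe_species, target, reference="h", frac=0.4):
    """Fit the curve to the target- and reference-species probes only, then
    subtract it from the log ratios of every probe on the array."""
    use = np.isin(probe_species, [target, reference])
    return M - local_linear_fit(A[use], M[use], A, frac=frac)


# Hypothetical rhesus-vs-human array with no true expression differences.
rng = np.random.default_rng(1)
n = 500                                                 # genes (invented)
species = np.repeat(np.array(["h", "c", "o", "r"]), n)
A = rng.uniform(6.0, 14.0, species.size)                # log2 average intensity
dye = 0.5 + 0.1 * (A - 10.0)                            # intensity-dependent r - g
# delta_tp - delta_hp for t = rhesus, assuming delta_rh = delta_hr = -0.6
offset = {"h": -0.6, "r": 0.6, "c": 0.2, "o": 0.1}
M = dye + np.array([offset[s] for s in species]) + rng.normal(0.0, 0.2, species.size)

M_norm = normalize_array(A, M, species, target="r")
for s in ("h", "c", "o", "r"):
    print(s, round(float(M_norm[species == s].mean()), 2))
```

After normalization the pooled rhesus and human probes are centred near zero, while each probe set keeps its own attenuation offset; those differences are left for the fixed effects $\delta_{tp}$ of the linear model rather than removed by the normalization.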
After normalization we checked additivity by examining the residuals for several genes after fitting the fixed effects by ordinary least squares. We stratified the residuals by target species and by probe species and did not find any outstanding deviations from the model. Examples of these plots are shown in Figure 2 for four randomly chosen genes. There are also several levels of error in the experiment that could be included in the model, and we used analysis of variance to investigate their relative contributions. For each gene we have 320 measurements arising from 4 species × 5 individuals × 4 arrays × 4 probes. We first estimated the fixed effects by ordinary least squares. Next we analyzed the residuals by estimating the variance components for a model with random effects for individuals, arrays within individuals, and error, that is

$$r_{tijp} = \mu + \alpha_{ti} + \gamma_{tij} + \varepsilon_{tijp}, \qquad (0.4)$$

where $r_{tijp}$ are the residuals of the measurements after subtracting the fixed effects and $\mu$ is the intercept term. The effects for individuals, $\alpha_{ti}$, are assumed to be uncorrelated with mean zero and variance $\sigma^2_I$, and the effects for arrays within individuals, $\gamma_{tij}$, are assumed to be uncorrelated with mean zero and variance $\sigma^2_A$. Finally, the residual errors $\varepsilon_{tijp}$ are assumed to be uncorrelated with mean zero and variance $\sigma^2_E$. All of these analyses are gene specific, but here we have suppressed the gene labels. The mean squares and variance components were estimated by analysis of variance for each gene.

Figure 2: Examination of the residuals of four random genes on the array. Each row represents a different gene. The first panel in each row plots the residuals for the 320 observations against the fitted values, with the four colors representing the four probe species. The second panel is a boxplot of the residuals stratified by target species, and the third is a normal quantile-quantile plot.
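For one gene and one species, the ANOVA estimates for model (0.4) can be sketched in Python as below. The sketch uses the standard expected mean squares of a balanced two-level nested design, $\mathrm{E}[MS_E] = \sigma^2_E$, $\mathrm{E}[MS_A] = \sigma^2_E + P\sigma^2_A$ and $\mathrm{E}[MS_I] = \sigma^2_E + P\sigma^2_A + JP\sigma^2_I$, with $J = 4$ arrays and $P = 4$ probes. Negative estimates are handled with the Thompson (1962) rule described in the next paragraph, implemented here, as an assumption about the details, by repeatedly pooling adjacent levels until the mean squares no longer increase down the hierarchy. The function names are hypothetical.

```python
import numpy as np


def pooled_components(ss, df, coef):
    """Variance components from nested mean squares with Thompson's rule: if an
    estimate would be negative, set it to zero and pool that level's sums of
    squares with the level below. Inputs run top to bottom; coef[k] multiplies
    sigma_k^2 in E[MS_k] (here [J*P, P, 1])."""
    blocks = [[s, d, [k]] for k, (s, d) in enumerate(zip(ss, df))]
    changed = True
    while changed:
        changed = False
        for b in range(len(blocks) - 1):
            upper, lower = blocks[b], blocks[b + 1]
            if upper[0] / upper[1] < lower[0] / lower[1]:   # negative estimate
                blocks[b:b + 2] = [[upper[0] + lower[0], upper[1] + lower[1],
                                    upper[2] + lower[2]]]
                changed = True
                break
    sigma2 = [0.0] * len(ss)
    for b, (s, d, levels) in enumerate(blocks):
        ms_below = blocks[b + 1][0] / blocks[b + 1][1] if b + 1 < len(blocks) else 0.0
        k = levels[-1]      # only the lowest level of a pooled block is non-zero
        sigma2[k] = (s / d - ms_below) / coef[k]
    return sigma2


def nested_variance_components(res):
    """ANOVA estimates for r_ijp = mu + alpha_i + gamma_ij + eps_ijp (one species,
    one gene); res has shape (individuals, arrays, probes)."""
    res = np.asarray(res, float)
    n_ind, n_arr, n_prb = res.shape
    ind_means = res.mean(axis=(1, 2))
    arr_means = res.mean(axis=2)
    ss = [
        n_arr * n_prb * np.sum((ind_means - res.mean()) ** 2),   # individuals
        n_prb * np.sum((arr_means - ind_means[:, None]) ** 2),    # arrays
        np.sum((res - arr_means[:, :, None]) ** 2),               # error
    ]
    df = [n_ind - 1, n_ind * (n_arr - 1), n_ind * n_arr * (n_prb - 1)]
    s_ind, s_arr, s_err = pooled_components(ss, df, [n_arr * n_prb, n_prb, 1])
    return {"individual": float(s_ind), "array": float(s_arr), "error": float(s_err)}


# Worked check: identical array means within each individual give MS_A = 0 < MS_E,
# so the array level is set to zero and pooled into the error level.
ind_effect = np.arange(5.0)[:, None, None]            # individuals 0..4
probe_noise = np.array([-1.0, 1.0, -1.0, 1.0])        # same pattern on every array
res = ind_effect + probe_noise + np.zeros((5, 4, 4))
est = nested_variance_components(res)
print(est)
```

In the worked check the sums of squares are 160, 0 and 80 on 4, 15 and 60 degrees of freedom, so pooling gives $\hat\sigma^2_A = 0$, $\hat\sigma^2_E = 80/75 \approx 1.067$ and $\hat\sigma^2_I = (40 - 80/75)/16 \approx 2.433$.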
Negative variance components were dealt with as outlined in Thompson (1962): such components were set to zero and the lower-level component was recalculated by pooling the data from the two levels. About 35% of the genes produced estimates of $\hat{\sigma}^2_A = 0$ when calculated this way. We estimated the variance components for each of the species separately and for all species together, and these produced very similar results. Boxplots of the three variance components for all the genes, for each species separately, are shown in Figure 3. The array component ($\hat{\sigma}^2_A$) is substantially smaller than the individual and error components, and we therefore did not keep this term in the model. We could further split off a probe term $\pi_{tip}$ from the residual error term $\varepsilon_{tijp}$, as in the following model:

$$r_{tijp} = \mu + \alpha_{ti} + \gamma_{tij} + \pi_{tip} + \varepsilon_{tijp}. \qquad (0.5)$$

The variance estimates from this model give a similar picture to the previous model, with the error and individual terms dominating and the array and probe terms substantially smaller (data not shown). We concluded that neither the array nor the probe terms were warranted and therefore fitted a model with random effect terms for individual and error only, as outlined in the paper.

Figure 3: Boxplots of log variance components of all the genes for the three random effects from equation 0.4. For each species the array component is significantly smaller than the other two terms; about 35% of the array components are equal to zero and are not shown in the plot.

Inferring ancestral state to determine changes in the human and chimpanzee lineages. We chose genes that showed significant differences between human and chimpanzee using likelihood ratio tests on pairs of species. From these, we restricted ourselves to genes that were not significantly different between orangutan and rhesus; 84 genes satisfied these criteria. We used the mean of the orangutan and rhesus expression as the expression of the common primate ancestor (a; Figure 4).
Changes can occur in any of the three branches connecting the species. The amount of change in expression in each branch is calculated by inferring the relative expression at the most recent common ancestor of human and chimpanzee, denoted by x, following Rossnes et al. (2005). If the expression of the common primate ancestor, a, lies between the expression of human, h, and chimpanzee, c, then the expression at x is taken to be equal to that at a; otherwise, the log expression at x is half way between the expression at a and that of whichever of human and chimpanzee is closer to a. More than two-thirds of the genes have the expression at a between the expression of human and chimpanzee. Once x is inferred, we can calculate the log expression change in the branches connecting the ancestor, human and chimpanzee.

Figure 4: Changes in gene expression can occur in any of the three branches connecting human (h), chimpanzee (c) and their common ancestor (a). The expression at a is calculated as the mean of the orangutan and rhesus expression after selecting genes for which these are not differentially expressed. The amount of change in relative expression in each branch is calculated by inferring the relative expression at x (see Methods). If the expression of the outgroup, a, is between h and c then x = a; otherwise, x is half way between the expression of a and the closer of h and c.

Search for genes involved in cancer. To identify genes associated with human cancer, we performed an automated search (in October 2005) of the Descriptions, OMIM Disorder, and Protein Domains and Families fields of each gene entry in the GeneCards database for the following strings (also as part of a word): cancer, carcinoma, lymphoma, leukemia, malignant, tumor. This search yielded 53 genes.
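The ancestral reconstruction used for the human and chimpanzee lineages (Figure 4) can be sketched in Python as follows. Inputs are assumed to be log expression levels on a common scale (for example, relative to human). The gene-selection step is represented by two hypothetical boolean flags, `sig_hc` and `sig_or`, standing in for the outcomes of the pairwise likelihood ratio tests; the tests themselves and their significance threshold are not reproduced. The text does not say how ties (h and c equally close to a) were resolved; the sketch arbitrarily assigns them to chimpanzee.

```python
def ancestral_branch_changes(h, c, o, r):
    """Log-expression changes on the branches a->x, x->h and x->c (Figure 4).

    a, the common primate ancestor, is the mean of orangutan and rhesus;
    x is the human-chimpanzee ancestor inferred from a, h and c."""
    a = (o + r) / 2.0
    if min(h, c) <= a <= max(h, c):
        x = a                                          # a lies between h and c
    else:
        closer = h if abs(h - a) < abs(c - a) else c   # ties go to chimpanzee
        x = (a + closer) / 2.0                         # half way between a and closer
    return {"a": a, "x": x, "a_to_x": x - a, "x_to_h": h - x, "x_to_c": c - x}


def select_genes(genes):
    """Keep genes that differ between human and chimpanzee but not between
    orangutan and rhesus (hypothetical boolean keys 'sig_hc' and 'sig_or')."""
    return [g for g in genes if g["sig_hc"] and not g["sig_or"]]


# Expression tuples are (h, c, o, r); all values are invented.
genes = [
    {"name": "gene1", "sig_hc": True, "sig_or": False, "expr": (0.0, 1.0, 0.25, 0.75)},
    {"name": "gene2", "sig_hc": True, "sig_or": False, "expr": (0.0, 1.0, 1.25, 1.75)},
    {"name": "gene3", "sig_hc": True, "sig_or": True, "expr": (0.0, 1.0, 0.0, 2.0)},
]
for g in select_genes(genes):
    print(g["name"], ancestral_branch_changes(*g["expr"]))
```

For gene1 the outgroup mean a = 0.5 lies between h = 0 and c = 1, so x = a and the a-to-x branch carries no change. For gene2, a = 1.5 lies outside that range and chimpanzee is closer, so x = 1.25, giving changes of −0.25 (a to x), −1.25 (x to h) and −0.25 (x to c). gene3 is excluded because orangutan and rhesus differ.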
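The GeneCards string search for cancer-associated genes can be sketched as a substring match in Python. The record layout, a dict with the hypothetical keys `description`, `omim_disorder` and `domains_families` standing in for the three GeneCards fields, is an assumption, as is case-insensitive matching; the text specifies only the search strings and that they could occur as part of a word.

```python
import re

CANCER_TERMS = ("cancer", "carcinoma", "lymphoma", "leukemia", "malignant", "tumor")
# Hypothetical keys standing in for the Descriptions, OMIM Disorder, and
# Protein Domains and Families fields of a GeneCards entry.
SEARCH_FIELDS = ("description", "omim_disorder", "domains_families")

# No word boundaries, so "tumorigenesis" matches "tumor" (part of a word).
CANCER_PATTERN = re.compile("|".join(map(re.escape, CANCER_TERMS)), re.IGNORECASE)


def is_cancer_associated(entry):
    """True if any searched field of the entry contains one of the terms."""
    return any(CANCER_PATTERN.search(entry.get(f) or "") for f in SEARCH_FIELDS)


entries = [
    {"symbol": "GENE_A", "description": "Involved in tumorigenesis"},
    {"symbol": "GENE_B", "omim_disorder": "Leukemia, acute myeloid"},
    {"symbol": "GENE_C", "description": "Bile acid transporter"},
]
print([e["symbol"] for e in entries if is_cancer_associated(e)])  # ['GENE_A', 'GENE_B']
```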