Supplementary Material

1. Study selection
2. The "deep sequencing" approach introduces nucleotide biases
3. Analyses for the complete set of biochemical predictors in AAIndex
4. How to reproduce figures and tables
References
1. Study selection
Published studies were identified for a meta-analysis of the fitness effects of
amino acid replacements caused by single transitions or transversions, i.e., the
so-called "singlet" replacements. Studies chosen for inclusion meet a size
threshold of 20 for studies of random mutations, and 10 for studies of beneficial
mutations (which tend to be smaller).
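The two definitions used above can be written as a minimal Python sketch. It assumes DNA-style nucleotide letters (A, C, G, T) and assumes that the size thresholds are inclusive and count singlet replacements. The names `mutation_class`, `MIN_SIZE` and `meets_size_threshold` are illustrative only and are not taken from the scripts that accompany this study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def mutation_class(ref: str, alt: str) -> str:
    """Classify a single-nucleotide change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a single-nucleotide change: {ref}>{alt}")
    # A transition stays within one chemical class (purine or pyrimidine);
    # a transversion switches between the two classes.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Minimum number of singlet replacements per study type, as stated in the text.
# Treating the threshold as inclusive (>=) is an assumption.
MIN_SIZE = {"random": 20, "beneficial": 10}


def meets_size_threshold(study_type: str, n_singlets: int) -> bool:
    """Return True if a study of the given type is large enough to include."""
    return n_singlets >= MIN_SIZE[study_type]
```

For example, a beneficial-mutation study with only 2 singlet replacements would fail the check, while a random-mutation study with exactly 20 would pass under the inclusive reading.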
A random set of mutations is a set designated by a random choice, either by
deliberately engineering a set of randomly assigned mutations (Sanjuan, et al.
2004), by using a procedure such as error-prone PCR, or by exhaustively
synthesizing all possible versions of a sequence. Beneficial mutations are
chosen by selection or screening under some condition.
We restricted our attention to studies that measure fitness, as distinct from
binding or activity. We ignore differences in precisely how fitness is defined
and measured (Chevin 2011), and accept any study that uses a measure of growth,
including the
measurement of intrinsic growth rate in a pure culture, pairwise competitive
growth assays (mutant against wild-type) and growth in mixed culture (as in
some high-throughput studies). Thus, we use the growth-rate results from
Jacquier, et al. (2013) rather than any activity or resistance measurements.
However, upon discovering extreme and idiosyncratic biases in studies that use
deep sequencing to identify and quantify mutants, we rejected these studies as
inappropriate (see next section).
To understand the nature of these criteria, it is helpful to discuss the kinds of
studies that are excluded. There are probably dozens of systematic studies of
singlet replacements that report on other phenotypes (e.g., antibiotic resistance,
biochemical activity) without reporting on fitness (e.g., a dozen cited in
Yampolsky and Stoltzfus 2005).
Studies that measure fitness may be excluded for several reasons, most often
because they do not have enough singlet replacements (e.g., Betancourt 2009;
Bataillon, et al. 2011; McDonald, et al. 2011). For instance, McDonald, et al.
(2011) isolated 100 beneficial mutants, genotyped 20, and found 13 different
mutations, 11 of which were deletions, and only 2 of which were singlet
replacements. Some studies look at the distribution of fitness effects without
determining genotypes (e.g., Barrett, et al. 2006; Kassen and Bataillon 2006), or
they assign fitnesses to lineages with multiple changes, rather than to individual
replacements (Holder and Bull 2001; Rokyta, et al. 2009).
In two cases, a relevant study was included wholly or largely in a later study.
Most of the data reported by MacLean and Buckling (2009) are reported in a later
paper from the same laboratory (MacLean, et al. 2010), which we used instead.
Mutants reported by Rokyta, et al. (2005) recur in Miller, et al. (2011).
Finally, the data from Lind, et al. (2010) are excluded because the authors
themselves report that mutant fitness effects are dominated by effects on
expression and fail to show predictable protein-level effects.
2. The "deep sequencing" approach introduces nucleotide biases
Some recent studies of mutant fitnesses use the methodology of "deep
mutational scanning", in which the fitnesses of thousands of mutants growing in a
mixed culture are measured simultaneously using deep sequencing. Though this
is a promising technology, its application to the measurement of fitness is subject
to extreme nucleotide-level biases. Out of just 13 high-throughput studies that
measure fitness using deep sequencing, we found 2 that exhibit extreme
nucleotide-level biases that are different between the 2 studies. Perhaps other
studies do not have such biases, but we cannot be sure.
In both cases, the authors completely excluded certain pathways on the grounds
that they did not feel confident controlling for effects of mutation bias. Yet, the
remaining pathways still show extreme biases.
The biases are illustrated in the 2 figures below. Each figure is a matrix of
distributions for mutation from one nucleotide to another, where row = from and
column = to, e.g., the upper right distribution is for T→G. Each distribution is a
histogram of fitness quantiles for mutants of that particular type. If there were no
differences in fitness distributions, each histogram would be flat.
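The construction of such a matrix can be sketched as follows. This is an illustrative reconstruction, not the code used to draw the figures: the function name `quantile_histograms`, the input layout (a list of `(from_nt, to_nt, fitness)` tuples) and the default number of bins are all assumptions. Each mutant is assigned a mid-rank quantile over the pooled fitness values, so that tied values (such as many zeros) share one quantile, and the quantiles are then binned separately for each from/to pair.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def quantile_histograms(mutants, n_bins=5):
    """Count mutants per fitness-quantile bin for each (from, to) nucleotide pair.

    mutants: iterable of (from_nt, to_nt, fitness) tuples (hypothetical layout).
    Returns a dict mapping (from_nt, to_nt) to a list of n_bins counts.
    """
    mutants = list(mutants)
    n = len(mutants)
    pooled = sorted(fitness for _, _, fitness in mutants)
    hist = defaultdict(lambda: [0] * n_bins)
    for frm, to, fitness in mutants:
        lo = bisect_left(pooled, fitness)   # number of values strictly below
        hi = bisect_right(pooled, fitness)  # number of values at or below
        quantile = (lo + hi) / (2 * n)      # mid-rank quantile, strictly in (0, 1)
        hist[(frm, to)][int(quantile * n_bins)] += 1
    return dict(hist)
```

If fitness did not depend on mutation type, the counts for each pair would be roughly equal across bins. A pair whose mutants are mostly assigned zero fitness would instead pile up in the lowest bin.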
In the data from Acevedo, et al. (2014), the C→T and G→A transitions are
excluded, and among the remaining pathways, the zero values are assigned
overwhelmingly to T→R transversions or G→Y transversions, where R denotes a
purine and Y a pyrimidine (figure at right).
In the study by Wu, et al. (next page), the G→T and C→A transversions are
excluded, and there are disproportionate numbers of zero values assigned to
C→G and G→C transversions.
3. Analyses for the complete set of biochemical predictors in AAIndex
The purpose of this section is merely to show that the small samples of 25
indices presented in the main text in Fig 2 and Fig 4 do not present a misleading
picture of the entire distribution of biochemical indices in AAIndex. As noted
earlier, we discard over half of the indices in AAIndex because they are not
genuine biochemical properties, but properties of the evolved distribution of
amino acids in natural sequences (e.g., the frequency with which a particular
amino acid is found in helixes). The two figures below use the entire set of
genuine biochemical factors.
The figure at left, which is provided for comparison with Fig 2, shows the power
of a binary predictor (based on the named factor) to predict fitness effects. The
names are evenly distributed horizontally, so halfway down the list is the median,
which corresponds to an AUC of about 0.57. That is, most biochemical predictors
have more power than the ti:tv distinction, with AUC = 0.53 ± 0.03.
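For readers unfamiliar with AUC as a measure of predictive power, the following sketch shows one common way to compute it for a binary predictor. It assumes that the predictor splits replacements into a "conservative" and a "radical" group, and that AUC is the probability that a randomly chosen conservative replacement has higher fitness than a randomly chosen radical one, with ties counted as one half. This formulation and the function name `auc` are assumptions made for illustration; the exact procedure used for the figures is defined by the scripts, not by this sketch.

```python
def auc(conservative, radical):
    """Probability that a conservative replacement is fitter than a radical one.

    conservative, radical: non-empty sequences of fitness values for the two
    groups defined by a binary predictor. Ties contribute one half.
    """
    wins = 0.0
    for c in conservative:
        for r in radical:
            if c > r:
                wins += 1.0
            elif c == r:
                wins += 0.5
    return wins / (len(conservative) * len(radical))
```

Under this formulation, a value of 0.5 means the predictor carries no information about fitness, and a value of 1.0 means every conservative replacement is fitter than every radical one.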
The figure at right, for comparison with Figure 4, shows the support of
biochemical predictors for the idea that transitions are conservative. About 3/5 of
the predictors are above 0.5, and so could be used to rationalize the
conservative transitions hypothesis. The other 2/5 of predictors could be used to
rationalize the opposite idea. The fact that the entire distribution is weakly biased
toward transitions (median AUC = 0.53) is not necessarily evidence of their
conservativeness, because scientists clearly have directed their attention to
developing predictors that are effective in accounting for observed evolutionary
tendencies, which are strongly biased toward transitions.
4. How to reproduce figures and tables
All of the figures and tables are generated by scripts. Please contact Arlin
Stoltzfus if you wish to use these scripts. Currently the scripts and the data are
in a GitHub archive (http://github.com/arlin/qsme). The instructions specific to
this manuscript are in the file "meta-analyses/SN2015/README.md".
References
Acevedo A, Brodsky L, Andino R. 2014. Mutational and fitness landscapes of an
RNA virus revealed through population sequencing. Nature 505:686-690.
Barrett RD, MacLean RC, Bell G. 2006. Mutations of intermediate effect are
responsible for adaptation in evolving Pseudomonas fluorescens populations.
Biol Lett 2:236-238.
Bataillon T, Zhang T, Kassen R. 2011. Cost of adaptation and fitness effects of
beneficial mutations in Pseudomonas fluorescens. Genetics 189:939-949.
Betancourt AJ. 2009. Genomewide patterns of substitution in adaptively evolving
populations of the RNA bacteriophage MS2. Genetics 181:1535-1544.
Chevin LM. 2011. On measuring selection in experimental evolution. Biol Lett
7:210-213.
Holder KK, Bull JJ. 2001. Profiles of adaptation in two similar viruses. Genetics
159:1393-1404.
Jacquier H, Birgy A, Le Nagard H, Mechulam Y, Schmitt E, Glodt J, Bercot B,
Petit E, Poulain J, Barnaud G, et al. 2013. Capturing the mutational landscape of
the beta-lactamase TEM-1. Proc Natl Acad Sci U S A 110:13067-13072.
Kassen R, Bataillon T. 2006. Distribution of fitness effects among beneficial
mutations before selection in experimental populations of bacteria. Nat Genet
38:484-488.
Lind PA, Berg OG, Andersson DI. 2010. Mutational robustness of ribosomal
protein genes. Science 330:825-827.
MacLean RC, Buckling A. 2009. The distribution of fitness effects of beneficial
mutations in Pseudomonas aeruginosa. PLoS Genet 5:e1000406.
MacLean RC, Perron GG, Gardner A. 2010. Diminishing returns from beneficial
mutations and pervasive epistasis shape the fitness landscape for rifampicin
resistance in Pseudomonas aeruginosa. Genetics 186:1345-1354.
McDonald MJ, Cooper TF, Beaumont HJ, Rainey PB. 2011. The distribution of
fitness effects of new beneficial mutations in Pseudomonas fluorescens. Biol Lett
7:98-100.
Miller CR, Joyce P, Wichman HA. 2011. Mutational effects and population
dynamics during viral adaptation challenge current models. Genetics 187:185-202.
Rokyta DR, Abdo Z, Wichman HA. 2009. The genetics of adaptation for eight
microvirid bacteriophages. J Mol Evol 69:229-239.
Rokyta DR, Joyce P, Caudle SB, Wichman HA. 2005. An empirical test of the
mutational landscape model of adaptation using a single-stranded DNA virus.
Nat Genet 37:441-444.
Sanjuan R, Moya A, Elena SF. 2004. The distribution of fitness effects caused by
single-nucleotide substitutions in an RNA virus. Proc Natl Acad Sci U S A
101:8396-8401.
Yampolsky LY, Stoltzfus A. 2005. The exchangeability of amino acids in proteins.
Genetics 170:1459-1472.