
Methods for Gamification of Molecular Engineering by Directed Evolution

I. Introduction

The tremendous promise of synthetic biology, and in particular of molecular engineering by directed evolution, has yet to be realized. Relevant techniques still require expensive reagents, finicky instrumentation, and great expertise in molecular biology protocols. It would be a superlative boon to the field if more people with the curiosity, scientific creativity, and raw above-average intellect normally found in a full-time professional scientist could independently play the game of research without the many years of formal training and the significant financial and infrastructural resources typically required.

Meanwhile, billions of person-hours’ worth of human brainpower are expended every day solving the problems posed by video games. While this phenomenon has been put to good use to solve scientific problems requiring imagination, calculation, intuition, and pattern matching, it has not been deployed to let game players independently experiment on actual physical systems.

We describe methods that turn the physical process of molecular engineering by directed evolution into a game that anyone can play with no more infrastructure than access to the Internet. This will vastly increase the number of people who can contribute to synthetic biology research and will thus vastly accelerate progress in this life-saving, world-changing, mind-blowing field.

II. List of figures

Figure 1: Diagram of the user interface for continuous directed evolution via synthesis and sequencing.
Figure 2: Flowchart showing how the game is played.
Figure 3: Diagram of exemplary “genetic avatars” (also “constructavars” or “molavatars”) compared with the sequences of the constructs for which they act as avatars, or visual representations.
Figure 4: Process flow showing what happens physically and remotely during gameplay.
Figure 5: Representative program written in OODLES, Object Oriented Design Language for Experimental Science.
Figure 6: Representative screenshot of one embodiment of the invention.
Figure 7: List of positions and possible qualities for describing one particular “genetic avatar”, “constructavar”, or “molavatar” as a monkey, based on data from Seelig and Szostak (2007).
Figure 8: Reference: figure 3 from Seelig and Szostak (2007).

III. Description

The invention includes methods that enable a person with little scientific knowledge and few physical or financial resources to perform real-world experiments in order to engineer molecules with desired properties. These methods include, but are not limited to:
1) procedures for interfacing between player turns/actions and what must happen physically, in correspondence with those inputs, at a remote semi-automated facility;
2) software and user interface implementations of game flow that let the player ignore any and all details of the experimental protocol underlying the game;
3) game design features that let a player provide rational and intuitive guidance to the directed evolution of specific biomolecular constructs without any knowledge of their sequence or their biochemical or structural details;
4) game design features that allow a self-regulating economy of physical resources expressed as virtual actions; and
5) software implementations that let experts hack, refine, repeat, and scale all the actions taken by presumably naïve but successful players.

Figure 1 shows a diagram of the minimal elements required of the invention in a generically described main user interface. The game consists of any number of missions that represent desired goals in molecular engineering, such as engineering enzymes that take the carbon atoms in C12H22O11 and add them to a growing lattice of sp3-hybridized carbon atoms, i.e., turning table sugar into diamonds.
The player can be shown such details in some embodiments, but notably, gameplay can proceed usefully toward engineering goals without knowledge of them.

The main user interface has four minimal generic requirements:
1a) the gene pool, a pool of genes that the player can use;
1b) the ability to define new genes or edit existing ones;
1c) elective evolutionary protocol submission, i.e., the ability to choose to subject genes to evolutionary procedures; and
1d) display of results, i.e., information on how specific genes fare when subjected to evolutionary protocols.

In some embodiments, the gene pool can be augmented with genes that other players have shown to have various qualities, and/or by designing genes from scratch and/or editing existing genes. Of note, component 1b only requires the player to perceive and design details of easily visualized, concrete, or familiar objects, as described below for Figure 3. Component 1c does not require the player to know anything about the underlying evolutionary protocols: he knows only that his genes will be subjected to evolutionary pressure of a specific kind, and that afterwards he will see, per component 1d, whether and how much any aspects of the genes changed during the process. Thus 1a-1d form a set of requirements for a useful direct molecular engineering game in which a player can complete the loop of turn cycles described in Figure 2, which we now describe.

Figure 2 is a flowchart that tracks a player’s series of actions in the course of the game, i.e., how the game can be played at a generic level. First, the player checks whether he or she has any genes to use. If no genes are available, genes must be constructed virtually. Genes may be constructed using the technology described below for Figure 3, so that no knowledge of genetics or biochemistry is needed.
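The four minimal components 1a-1d can be sketched as a small data model. This is a hedged illustration, not the invention’s software: the class and method names (`Gene`, `Player`, `define_gene`, `queue`, `run_turn`) are hypothetical, and each “evolution” is stood in for by a plain Python function, whereas in the game it is a physical selection or screen run at a remote facility.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Gene:
    name: str
    sequence: str  # hidden from the player, who only ever sees the gene's avatar


# Hypothetical stand-in: an "evolution" maps submitted genes to the genes
# reported back afterwards. The real selection/screening happens remotely.
Evolution = Callable[[List[Gene]], List[Gene]]


@dataclass
class Player:
    gene_pool: List[Gene] = field(default_factory=list)        # 1a: usable genes
    queued: Dict[str, List[Gene]] = field(default_factory=dict)
    results_pool: List[Gene] = field(default_factory=list)     # 1d: results shown

    def define_gene(self, name: str, sequence: str) -> Gene:   # 1b: define/edit
        gene = Gene(name, sequence)
        self.gene_pool.append(gene)
        return gene

    def queue(self, gene: Gene, evolution_name: str) -> None:  # 1c: submission
        self.queued.setdefault(evolution_name, []).append(gene)

    def run_turn(self, evolutions: Dict[str, Evolution]) -> List[Gene]:
        """Run every queued evolution once and add its output to the results pool."""
        results: List[Gene] = []
        for name, genes in self.queued.items():
            results.extend(evolutions[name](genes))
        self.queued.clear()
        self.results_pool.extend(results)
        return results
```

A toy evolution such as `lambda genes: [g for g in genes if "H" in g.sequence]`, passed as `{"toy": ...}` to `run_turn`, is enough to exercise one full turn of the Figure 2 loop.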
Once genes are available, the player faces a virtual economic choice that corresponds to the actual real-world costs of doing experiments. The player’s virtual resources may be represented by a virtual currency, such as a statistic associated with the player’s “character” (in a role-playing-game-like context) that corresponds to this virtual wealth. In one embodiment, the currency is called “nucleomana”, which evokes “nucleic acid” as well as “mana”, the spiritual or magical power often used in fantasy role-playing games as a reservoir for the ability to cast spells. This aspect of the invention allows straightforward harmonization between virtual and real-world economic resources, which is expected to remain necessary until the underlying technologies are themselves entirely free to use.

For clarity of play, in some embodiments the need to spend “nucleomana” may arise before gene construction, as it is simpler to think of paying for the construction of a gene than to imagine paying for the more complicated, multifarious, and à la carte processes involved in the next step; in practice, this makes little difference. In reality, as noted later, genes are not physically made until the player has committed to experimenting with them: they are not synthesized in the real world at the moment they are synthesized virtually.

During gene construction, players can set the precise degree to which they would like the gene to be randomized during synthesis, with full granular control over each element’s degree of synthetic randomization. Players may even copy and paste genes in the gene pool in order to edit the randomization factors assumed during synthesis. This randomization allows the inputs to an “evolution” to be tuned, so that the degree of randomization yielding the best evolutionary outputs can be determined.
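Per-element randomization can be pictured as choosing, for every position, a probability that the template residue is replaced during synthesis. The sketch below is a simplified model under stated assumptions: it works at the amino-acid level with the 20 standard residues, whereas real doped synthesis acts on nucleotides, and `synthesize_variant` and its `doping` list are hypothetical names.

```python
import random
from typing import List

STANDARD_RESIDUES = "ACDEFGHIKLMNPQRSTVWY"


def synthesize_variant(template: str, doping: List[float], rng: random.Random) -> str:
    """Return one simulated product of a position-by-position doped synthesis.

    doping[i] is the probability that position i is redrawn uniformly from the
    20 standard residues, so the chance of an actual change is doping[i] * 19/20.
    """
    if len(doping) != len(template):
        raise ValueError("one randomization level per position is required")
    if any(not 0.0 <= level <= 1.0 for level in doping):
        raise ValueError("randomization levels must lie in [0, 1]")
    variant = []
    for residue, level in zip(template, doping):
        if rng.random() < level:
            variant.append(rng.choice(STANDARD_RESIDUES))
        else:
            variant.append(residue)
    return "".join(variant)
```

For example, `doping = [0.0] * 4 + [0.5] * 5` applied to a nine-residue template keeps the first four positions fixed while heavily randomizing the rest; comparing libraries built with different doping lists is the kind of tuning the paragraph above describes.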
Further, total virtual resources are depleted in a manner corresponding to actual physical resources. For example, a player may commit genes to an “evolution” but wish to have more physical resources committed to the synthesis and input of one particular gene. Players can therefore place the same gene multiple times into one evolution, with each instance representing actual physical resource use. The design of the economic cycle should, however, take startup costs into account: synthesizing a small amount of one precisely built gene costs more per unit than synthesizing more of it.

Once the player is assured of adequate virtual resources, in the form of “nucleomana”, “hit points”, or whatever form the particular embodiment uses, the player decides how to apportion the available genes among various “evolutions”, a broadly encompassing pseudo-neologism covering everything between input and output in molecular engineering via directed evolution. The term may be used to gloss over underlying complexities of design and implementation with which the player need not be concerned. A PHOSITA (Person Having Ordinary Skill In The Art) will understand “evolutions” as referring to particular selection and screening procedures that produce different physical outcomes for the constructs subjected to them, where some outcomes are considered to indicate greater evolutionary fitness than others.

Once the player has apportioned genes to “evolutions”, he can inspect the results, which form a “results gene pool” from which he can draw to continue the cycle. Importantly, the “results gene pool” may consist of many new variants that were measured as making up a greater portion of the population than they did in the starting materials.
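The startup-cost rule described earlier in this section can be made concrete with a simple linear price: a fixed setup charge per distinct gene plus a charge per copy placed in an evolution. The prices and the names `commit_cost` and `commit` are hypothetical placeholders; a real embodiment would calibrate them against actual synthesis and sequencing costs.

```python
from typing import Dict

SETUP_COST = 10.0    # hypothetical nucleomana charged once per distinct gene
PER_COPY_COST = 1.0  # hypothetical nucleomana charged per copy placed


def commit_cost(copies_per_gene: Dict[str, int]) -> float:
    """Total nucleomana for one evolution, given how many times each gene is placed."""
    total = 0.0
    for gene, copies in copies_per_gene.items():
        if copies < 1:
            raise ValueError(f"gene {gene!r} must be placed at least once")
        total += SETUP_COST + PER_COPY_COST * copies
    return total


def commit(balance: float, copies_per_gene: Dict[str, int]) -> float:
    """Debit the player's balance, refusing commitments the player cannot afford."""
    cost = commit_cost(copies_per_gene)
    if cost > balance:
        raise ValueError(f"needs {cost} nucleomana, only {balance} available")
    return balance - cost
```

With these placeholder prices, placing one gene once costs 11 nucleomana, while placing it four times costs 14, i.e. 3.5 per copy, so the per-unit price falls as more of the same gene is made.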
Because the results gene pool can contain variants that were absent from the starting materials, it is useful, though not strictly necessary, to have the player define the degree to which the specific details of his gene may be randomized, so that unexpected variants can emerge and larger swaths of sequence space can be explored.

Note that the “results gene pool” contains consensus sequences representing the most strongly selected related inputs among those the player provided. Genes in the results pool therefore do not necessarily represent actual individual physical entities that were selected or screened for during the “evolutions”. Rather, results-pool genes are formed by algorithms that look for the optimal residue at each position among related individual genes sequenced after the evolution. A player might ideally select and/or edit these as starting materials to be synthesized for the “next round” of the game. Players can also view individual sequencing results that are not built as consensus estimates, and may select among these.

The results pool may also be viewed as a 3D topological fitness landscape, in which three variables are arrayed along X, Y, and Z. Typically, the variable most important to the fitness ideal toward which a specific mission is driven is chosen as the Z coordinate, with the optimal value defined as the highest Z point. The player can then simply observe where genes sit in this 3D (or higher-dimensional) fitness landscape: genes in the results pool, other genes copied and derived from the same evolutions, or genes that have not actually been synthesized but whose placement on the landscape is algorithmically estimated (with caveats given). This “fitness landscape view”, in which genes can be observed directly and intuitively, is an important aspect of the software and visualization methods of the invention. In one embodiment, the X, Y, and Z values correspond to chosen outputs of specific evolutions.
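The fitness landscape view amounts to placing each gene at a point whose coordinates are three measured outputs, with Z reserved for the mission’s primary objective. The sketch below assumes the measurements are already normalized numbers keyed by hypothetical output names, and it also reports each gene’s Euclidean distance to a predefined optimum; neither the names nor that distance metric come from the source.

```python
import math
from typing import Dict, List, NamedTuple, Tuple


class LandscapePoint(NamedTuple):
    gene: str
    x: float
    y: float
    z: float  # the mission's most important output; higher is better
    distance_to_optimum: float


def landscape(measurements: Dict[str, Dict[str, float]],
              axes: Tuple[str, str, str],
              optimum: Tuple[float, float, float]) -> List[LandscapePoint]:
    """Place each gene on the (x, y, z) landscape, best (highest z) first."""
    x_key, y_key, z_key = axes
    points = []
    for gene, outputs in measurements.items():
        coords = (outputs[x_key], outputs[y_key], outputs[z_key])
        points.append(LandscapePoint(gene, *coords, math.dist(coords, optimum)))
    return sorted(points, key=lambda p: p.z, reverse=True)
```

A renderer would then draw these points in 3D; sorting by Z simply makes “the highest Z point is best” explicit in the data.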
As an example of a fitness landscape view, a mission might have the goal of finding enzymes that fulfill three criteria: cleavage of a target molecule, doing so preferentially at a certain pH, and doing so preferentially in the presence of a specified cofactor. Exploring the data from three corresponding experiments, with predefined optima, as a 3D fitness landscape should help guide the player toward better choices in pursuit of the mission.

Figure 3 illustrates how to create a game that can be played intelligently (as a function of a person’s raw pattern-matching ability) without knowledge of genetics or biochemistry. Rather than directly manipulating sequence data as strings of nucleotides, players manipulate visualizations of objects better suited to human intuition. These objects correspond in fine detail to the actual underlying sequence of the molecules they represent.

Here we represent one particular unmodified scaffold protein, retinoid X receptor alpha (RXR-alpha), as a monkey. Following Seelig and Szostak (2007), we may wish in our game to use this very handy scaffold to evolve novel catalytic activities within two variable loop regions comprising a total of 21 amino acids. We consider a “base monkey” with up to 58 definable features, each of which can take between 2 and 26 values, in order to visually encode all observed variants in the loops and other contingencies that arise as the scaffold itself mutates. We list the monkey’s features and all the values those features might take (Figure 7), from the types of hat it might wear to the color of its fur in different places to the length of its tail, and we show the “gene avatar” for ligase 1, which was “selected for” in Seelig and Szostak (2007) (figure 2 of that paper). Note that this is a simple 2D visualization of the monkey.
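At its core, the avatar is a deterministic lookup from each variable position and residue to a visual change, with “X” rendered as unmodified. The sketch below fills in only the random loop 2 descriptors that Figure 7(a) lists for ligase 1; the table layout and the function name `avatar_features` are hypothetical, and residues without a designed descriptor are flagged rather than drawn.

```python
from typing import Dict, List, Tuple

# (body part, {residue: descriptor}) for random loop 2, in position order.
# Only the ligase 1 residues listed in Figure 7(a) are filled in here.
LOOP2_FEATURES: List[Tuple[str, Dict[str, str]]] = [
    ("distal tail", {"E": "elongated"}),
    ("proximal tail", {"S": "sharpened (contrast increase)"}),
    ("lower abdomen", {"Y": "yellowed"}),
    ("midsection", {"H": "hot, shown on fire"}),
    ("chest", {"K": "kooled, made blue"}),
    ("neck", {"C": "corned, ear of corn added"}),
    ("mouth", {"Q": "quintupled"}),
    ("back", {"D": "donkey riding on it"}),
    ("left ear", {"L": "larger"}),
]


def avatar_features(loop: str) -> List[Tuple[str, str]]:
    """Translate a loop sequence into (body part, visual change) pairs."""
    if len(loop) != len(LOOP2_FEATURES):
        raise ValueError(f"expected {len(LOOP2_FEATURES)} residues, got {len(loop)}")
    features = []
    for residue, (part, descriptors) in zip(loop, LOOP2_FEATURES):
        if residue == "X":
            features.append((part, "unmodified"))
        else:
            features.append((part, descriptors.get(residue, f"no descriptor yet for {residue}")))
    return features
```

`avatar_features("ESYHKCQDL")` yields the ligase 1 changes, starting with an elongated distal tail, while `avatar_features("X" * 9)` describes the unmodified base monkey.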
Embodiments of the game can also involve 3D, 4D, and even higher-“dimensional” inspection of the gene avatar. For instance, in 3D the full body can be inspected from all sides; in 4D, walking and other behaviors can be variable traits; treating sound as a fifth dimension, traits related to noisemaking and speech can vary; and treating psychological, artificial-intelligence-driven, and interactive traits as a sixth dimension, players could inspect these traits by, well, interacting.

One aspect of this method of the invention is the concept of an “overflow buffer”, which is activated when mutations become too numerous or complex for concrete visual representation. The overflow buffer comprises a set of generic image manipulation procedures (filters, distortions, color manipulations, inversions, etc.) that do not need to be defined for a particular scaffold or avatar. For instance, RXR-alpha’s avatar may be a monkey, but extreme directed evolution could mutate its scaffold and variable regions so extensively that the predesigned image variants simply cannot accommodate the changes. At that point, the program calls up the overflow buffer procedures, which allow a large number of sequence changes (e.g., ten kilobytes’ worth) to be encoded. This is an outlandishly large number in the context of engineering even a highly complex protein and treating all of it as variable: almost every amino acid would have to change in order to overflow the overflow buffer.

Note that the example presented is graphically very primitive; a PHOSITA in programming and graphic design will be able to create far more polished graphics based on the principles of the method.

Figure 4 shows a process flow indicating what happens remotely and physically in order to correspond correctly to the actions the player takes within the game.
The process relies heavily on two key technologies, DNA sequencing and DNA synthesis. A copy of Figure 2 is underlaid on Figure 4 to show the correspondence. First, a player must have genes to play with. However, as a matter of real-world economics, no genes are actually synthesized until the player has committed to subjecting them to a set of defined “evolutions”, or selection/screening protocols. At that point, the genes are physically synthesized and then processed into whatever molecular constructs the “evolutions” require. For example, phage display requires the gene to be put into a phage, whereas mRNA display requires the gene to be processed as an mRNA covalently linked to its protein product. The selection/screening protocols are then performed, and deep sequencing is done on the various selected/screened fractions. This sequencing defines the “results pool” from which players can draw in subsequent turns. The “results pool” will typically show sequence consensuses rather than sequences actually found. The cycle then repeats.

Figure 5 shows code examples in OODLES (Object Oriented Design Language for Experimental Science), an automation-directing programming language into which player actions can be compiled. At its lowest level, OODLES directs the actions of automation systems; at its highest level, it provides easily human-parseable descriptions of experimental strategy and protocol. It is useful to store player actions in standardized, representative forms that can later be repeated, optimized, hacked, changed, and reported, and doing so increases the odds in the long run that players will make meaningful scientific contributions while playing. This method therefore increases the overall usefulness of the invention.
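The ordering rule of Figure 4 (nothing is synthesized until the player commits) and the idea of compiling player actions into repeatable steps can be sketched together. This is not OODLES, whose syntax appears only in Figure 5: the step strings, the `DISPLAY_PREP` table, and `compile_turn` are hypothetical, and a real compiler would emit commands for automation systems rather than text.

```python
from typing import Dict, List

# Hypothetical construct-preparation step for each display format named in the text.
DISPLAY_PREP: Dict[str, str] = {
    "phage display": "clone gene into phage",
    "mRNA display": "express as mRNA covalently linked to its protein product",
}


def compile_turn(commitments: Dict[str, List[str]], display: Dict[str, str]) -> List[str]:
    """Turn one turn's commitments (evolution -> gene names) into ordered physical steps.

    Genes that were only designed, never committed, produce no steps at all,
    mirroring the rule that synthesis waits for commitment.
    """
    steps: List[str] = []
    for evolution, genes in commitments.items():
        if not genes:
            continue
        steps.extend(f"synthesize {gene}" for gene in genes)
        steps.append(f"{DISPLAY_PREP[display[evolution]]} ({evolution})")
        steps.append(f"run selection/screening: {evolution}")
        steps.append(f"deep-sequence selected fractions: {evolution}")
    if steps:
        steps.append("update results pool with consensus sequences")
    return steps
```

Because the output is a plain, ordered list, it can be stored, diffed, replayed, or edited by an expert, which is the property the paragraph above attributes to compiling player actions.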
Figure 5 shows code compiled directly from player actions within “Sequence Space Explorer”, the embodiment of the invention closest to the one described herein. It also shows code compiled directly from the actions of someone playing “Temple of Genomic Freedom”, a game that exemplifies another method of the invention: a game that more elite and knowledgeable players can use to perform R&D at a more granular and detailed level than in “Sequence Space Explorer”. Players of “Temple of Genomic Freedom” can thus design missions and control the automated facilities that implement evolutions. Playing “Temple of Genomic Freedom” is analogous to being the “dungeon master” in the traditional role-playing game Dungeons & Dragons, while ordinary players play “Sequence Space Explorer” and other games.

Figure 6 shows a representative screenshot from one embodiment of the invention, a game entitled “Sequence Space Explorer”. The main user interface screen shown here has all the necessary components outlined in Figure 1. A corral of genes is shown in the upper left. New genes can be constructed by simply right-clicking any gene to open an editing box. The evolutions are listed by their technical names (not a necessity) in white in a center column. Inputs are on the left: genes, or “gene avatars”, queued for commitment to the selected “evolutions” they line up with in the middle. Results are shown on the right, including results specific to this player’s genes (she has just started playing, so there are none yet) and, under “All Results”, global results from everyone playing the game. This lets even unsophisticated players form an idea of consensus values for particular aspects of genes undergoing evolutions.
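The consensus sequences shown in the results pool can be approximated by a read-count-weighted plurality vote at each aligned position. This is one simple choice, not necessarily the algorithm the invention uses; it assumes the sequences are already aligned to equal length and grouped as related, and `consensus` is a hypothetical name. The output need not match any single sequenced molecule, which is why results-pool genes may not correspond to physical entities.

```python
from collections import Counter
from typing import Dict, List


def consensus(read_counts: Dict[str, int]) -> str:
    """Per-position plurality residue across aligned sequences, weighted by read count."""
    if not read_counts:
        raise ValueError("no sequences to summarize")
    length = len(next(iter(read_counts)))
    if any(len(seq) != length for seq in read_counts):
        raise ValueError("sequences must be aligned to equal length")
    columns: List[Counter] = [Counter() for _ in range(length)]
    for seq, count in read_counts.items():
        for position, residue in enumerate(seq):
            columns[position][residue] += count
    # Ties go to the residue encountered first (Counter.most_common keeps that order).
    return "".join(column.most_common(1)[0][0] for column in columns)
```

For instance, reads `{"KC": 3, "RD": 2, "ED": 2}` give the consensus `"KD"`, a sequence that was never observed as such.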
For players who wish to look at real details of the underlying molecular biology, the box that opens upon right-clicking lets any player change the avatar of any gene to a crystal structure of the closest homologous entity, with or without energy minimization to match the differences between that homologous entity and the selected gene.

Finally, Figure 7 is the list by which the images in Figure 3 were generated. In Figure 7, the first code set covers only straightforward modifications of the variable loop regions within RXR-alpha. Standard one-letter amino acid codes are used below, in alphabetical order, together with “B” for either N or D, “Z” for either Q or E, “J” for possible unnatural amino acid residues, “O” for deletions, which would be specified in a third code set, and “U” for insertions, which need not be described in this case but would be described in a fourth code set if necessary. Further, “X” indicates an undefined (presumably random) position; all “random loop” residues begin set to “X” in terms of the monkey’s features. Thus all 26 letters of the alphabet are used, so features can be read in A-Z order. Note that the “original monkey”, the starting scaffold with no mutations other than the “X”s, has “unmodified” in each place. In all cases, we attempted to choose descriptors whose first letter matches the amino acid’s one-letter code, as a mnemonic aid for developers and elite players. More ideally than in the example presented here, an embodiment would use feature descriptors that mimetically or visually represent the continuum of chemical similarity between amino acids or other biopolymer residues, so that players could intuit similarities between feature values without knowing anything about the underlying biochemistry.

First, Figure 7(a) lists the 21 random loop positions.
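The 26-letter mnemonic alphabet can be written down as a lookup table and checked for completeness. The dictionary below restates the codes defined above; the helper `residues_for` is a hypothetical convenience for matching codes against standard residues. Be aware that this document’s meanings for J, O, and U differ from IUPAC usage, where J denotes leucine or isoleucine, O pyrrolysine, and U selenocysteine.

```python
import string
from typing import Dict, FrozenSet

STANDARD = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes

EXTRA: Dict[str, str] = {
    "B": "either N or D",
    "J": "possible unnatural amino acid residue",
    "O": "deletion (size given in a separate code set)",
    "U": "insertion (described in a further code set if needed)",
    "X": "undefined, presumably randomized, position",
    "Z": "either Q or E",
}

ALPHABET = "".join(sorted(STANDARD + "".join(EXTRA)))


def residues_for(code: str) -> FrozenSet[str]:
    """Standard residues a code can stand for; empty for J, O and U."""
    if len(code) != 1 or code not in ALPHABET:
        raise ValueError(f"unknown code {code!r}")
    if code in STANDARD:
        return frozenset(code)
    return {"B": frozenset("ND"), "Z": frozenset("QE"), "X": frozenset(STANDARD)}.get(
        code, frozenset()
    )


assert ALPHABET == string.ascii_uppercase  # every letter A-Z is used exactly once
```

The module-level assertion encodes the claim that the scheme uses all 26 letters exactly once.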
Where possible, the Figure 7(a) loop positions are assigned clockwise around the generic monkey. For brevity, only the features relevant to ligase 1 (described in Figures 3 and 8) are presented.

Second, Figure 7(b) encodes changes to the non-variable scaffold in terms of three non-variable regions. Up to 2 changes as far apart as 12 residues must be accounted for in the first region, up to 8 changes as far apart as 7 residues in the second region, and up to 7 changes as far apart as 14 residues in the third region. To avoid an unwieldy number of variable features, these changes are encoded in compressed form: first, one feature encodes the total number of changes across all scaffold regions; second, for each change, a feature encodes the distance in amino acids from the previous change (disregarding intervening variable regions), with as many possible values as the length of the scaffold or the greatest observed distance between mutations; third, for each change, a feature encodes the change itself in the standard 26-letter mnemonic format.

Thus, the total number of features needed to fully specify mutations to the scaffold regions, not counting deletion sizes, is 1 total-number specifier plus X distances plus Y change specifications, where X = Y. Summing the per-region maxima gives 2 + 8 + 7 = 17 changes, so 1 + 17 + 17 = 35 features fully specify all observed scaffold mutations. Adding 21 features for the variable regions and 2 features for deletion sizes gives a total of 58 features, enough to specify all 7 ligase mutants in Seelig and Szostak’s paper. Note, however, that we present only ligase 1 here, which does not require this many features.
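The compressed scaffold encoding and the feature count above can be checked with a short script. Two assumptions are labeled here: changes are given as (scaffold position, new code) pairs with positions counted along the scaffold only (skipping the variable loops), and each change’s distance and code are kept together as a pair, as Figure 7(b) does when it draws both on the same body part. The function names are hypothetical.

```python
from typing import List, Tuple, Union

Feature = Union[int, Tuple[int, str]]


def encode_scaffold(changes: List[Tuple[int, str]]) -> List[Feature]:
    """Encode as [number of changes, (distance, code), (distance, code), ...]."""
    features: List[Feature] = [len(changes)]
    previous = 0  # the first distance is measured from the start of the scaffold
    for position, code in sorted(changes):
        features.append((position - previous, code))
        previous = position
    return features


def decode_scaffold(features: List[Feature]) -> List[Tuple[int, str]]:
    """Invert encode_scaffold, recovering absolute scaffold positions."""
    count, pairs = features[0], features[1:]
    if count != len(pairs):
        raise ValueError("change count does not match the number of encoded changes")
    position, changes = 0, []
    for distance, code in pairs:
        position += distance
        changes.append((position, code))
    return changes


def total_features(loop_positions: int = 21, scaffold_changes: int = 2 + 8 + 7,
                   deletion_features: int = 2) -> int:
    """Loop features + (1 count + one distance and one code per change) + deletion sizes."""
    return loop_positions + (1 + 2 * scaffold_changes) + deletion_features


assert total_features() == 58
```

The final assertion reproduces the arithmetic in the text: 21 + (1 + 17 + 17) + 2 = 58.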
A (small) third code set would be required to represent the size of deletions, i.e., the size of each deletion expressed by an “O”-valued feature in the scaffold regions. Deletions in the smaller variable region are inherently fully described, so no additional descriptors are required for them. The length of scaffold deletions, however, must be described in order to provide a complete specification of mutations to the scaffold regions. In this example, we need only provide as many features as there were deletional “O” values in the scaffold mutation descriptions. Because at most one deletion was observed in each of two scaffold regions, only two features are needed, each able to take as many values as the longest possible, or in this case longest observed, deletion: fifteen amino acids in scaffold region two and thirteen amino acids in scaffold region three. These notes are only theoretical, however, as we present only the gene avatar for ligase 1 in this example, and deletion encodings are not needed to represent ligase 1 relative to the original scaffold. If we were to represent ligases 2, 3, 5, 6, and 7 here, the deletions would need to be represented following the feature-encoding methods presented above.

Note that no provision has been made for insertional mutations in this embodiment, but such provision could be made straightforwardly based on the encoding principles described herein. Note as well that this particular embodiment does not include an “overflow buffer” allowing feature-based visualization of genes more complex (greater scaffold variant distances, more scaffold variants, and more deletions) than the ligase mutants represented here.
However, the value of the method lies in its conceptual applicability to any gene system that requires feature-based visualization, so all that is required to represent greater complexity is to enter more features, and more possible values for those features, into a system for creating and visually representing those variants. Note also that such a system could be implemented with all variants produced manually, but for a PHOSITA in computer programming and video graphics it will be a straightforward matter of programming to automate procedures such as those used, through manual input, to generate the present embodiment.

Finally, Figure 8 is a reference, with acknowledged copyright, from the British journal Nature, showing the key active mutants uncovered in that paper, which correspond to Figure 3’s monkeys in all their multifariousness.

Figure 1: Minimal elements of the invention.
Figure 2: Very basic flowchart of play.
Figure 3: Gene avatars for gamification of molecular engineering by directed evolution (DE). Left: “basic monkey” gene avatar, representing the original RXR-alpha gene with all “X”s in the variable region and no changes to the scaffold. Right: “complex monkey” gene avatar, representing ligase 1 from Seelig and Szostak (2007).
Figure 4: Real-world management actions overlaid on gameplay flow.
Figure 5: OODLES example code.
Figure 6: Example embodiment of the invention as a game called “Sequence Space Explorer”.
Figure 7: Lists by which the images/genetic avatars in Figure 3 were constructed.
7(a) Random loop encodings

Random loop 1
Pos 1, top of monkey’s head: for I, iridescent hair bubble
Pos 2, monkey’s eyes: for L, loving heart eyes
Pos 3, monkey’s left ear: for D, devil-ette inside
Pos 4, monkey’s nose: for D, doubling, for a total of four nostrils
Pos 5, monkey’s mouth: for A, apple stuck inside it
Pos 6, monkey’s right hand, holding something: for Y, yin-yang symbol
Pos 7, monkey’s left hand: for D, distended laterally
Pos 8, monkey’s right foot: for Y, yellowish glow
Pos 9, monkey’s right leg: for K, kinked leg
Pos 10, monkey’s left arm: for Q, quilled (feather appearance)
Pos 11, monkey’s left foot: for T, tripled
Pos 12, monkey’s left leg: for D, doubled

Random loop 2 (note: for convenience, in the current embodiment these changes are applied to the basic monkey first). In ligase 1: ESYHKCQDL
Pos 1, monkey’s distal tail: for E, elongated
Pos 2, monkey’s proximal tail: for S, sharpened (contrast increase)
Pos 3, monkey’s lower abdomen: for Y, yellowed
Pos 4, monkey’s midsection: for H, hot, shown on fire
Pos 5, monkey’s chest region: for K, “kooled”, made blue
Pos 6, monkey’s neck region: for C, corned, an ear of corn is put here
Pos 7, monkey’s mouth: for Q, quintupled
Pos 8, monkey’s back, thing riding on it: for D, a donkey!
Pos 9, monkey’s left ear: for L, larger

7(b) Scaffold change encodings
Pos 1: total number of changes in the scaffold region. Feature: number of short green lines drawn on the outside of the monkey’s arm. For ligase 1: 14.
For each of the 14 changes, the distance from the previous change (or from the beginning of the scaffold) must be encoded onto the body parts listed below, in order. To each of areas 1-14, apply a vortex distortion filter with a 100-pixel radius (using Pixelmator for Mac) and an angle equal to 300° × the distance.
Then paste a small (~50 × 50 pixel) “tattoo” at the center of each vortex area, chosen by the amino acid’s one-letter code.
1. Top of monkey’s head, 1600°, narwhal
2. Monkey’s eyes, 4800°, robot
3. Monkey’s left ear, 900°, robot
4. Monkey’s nose, 300°, queen bee
5. Monkey’s mouth, 2100°, lucky charm
6. Monkey’s right palm area, 2100°, sword
7. Monkey’s left hand, 1200°, yak
8. Right foot, 600°, yak
9. Right leg, 300°, robot
10. Left arm, 2400°, robot
11. Left foot, 1200°, king
12. Left leg, 600°, tiger
13. Proximal tail, 3300°, inkpot
14. Distal tail, 900°, queen bee

Figure 8: Literature basis for gene avatar design. This is a screen capture of figure 3 from “Selection and evolution of enzymes from a partially randomized non-catalytic scaffold”, Seelig B and Szostak J, Nature 448, 828-831 (16 August 2007).
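In Figure 7(b), each scaffold distance is drawn as a vortex whose angle is 300° times the distance, and each change as a tattoo chosen by the amino acid’s one-letter code, so both can in principle be read back off the avatar. The helpers below are a hypothetical sketch of that round trip. As listed, entry 1 (1600°) is not a multiple of 300°, so `distance_from_angle` rejects it; the source does not say which distance was intended there.

```python
DEGREES_PER_RESIDUE = 300  # vortex angle per amino acid of distance (Figure 7(b))


def vortex_angle(distance: int) -> int:
    """Angle of the vortex distortion that encodes a distance in residues."""
    return DEGREES_PER_RESIDUE * distance


def distance_from_angle(angle: int) -> int:
    """Recover the encoded distance, rejecting angles that are not whole steps."""
    distance, remainder = divmod(angle, DEGREES_PER_RESIDUE)
    if remainder:
        raise ValueError(f"{angle} degrees is not a multiple of {DEGREES_PER_RESIDUE}")
    return distance


def code_from_tattoo(tattoo: str) -> str:
    """The tattoo's first letter is taken as the new residue's one-letter code."""
    return tattoo.strip()[0].upper()


# A few (angle, tattoo) entries from the Figure 7(b) list for ligase 1.
for angle, tattoo in [(4800, "robot"), (2100, "sword"), (3300, "inkpot")]:
    print(distance_from_angle(angle), code_from_tattoo(tattoo))  # 16 R, then 7 S, then 11 I
```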
