Summer 2010 CS 350 Final Exam
By Inna Stukova

1. The client has decided that it may be desirable to add additional rows of tiles to the existing structure, with the result being a protein sheet. We currently have a row which describes the edge of the protein sheet and a row that can be built upon. The client wants to be able to input an integer R, which represents the number of rows of length N, and search for and display the largest protein sheet possible given a particular data file.

New Requirements:
- User-inputted R that represents the number of rows for the protein sheet. Thus, the protein sheet will consist of an initial strand and R-1 complement strands of length N.
- As in the initial requirements, each complement strand will begin and end with a Type Single tile and otherwise consist of Type Zero tiles.

Proposal:
The new requirements posed by the user will affect the following epics: User Input, Solver, and Validation. The Creation epic will remain unchanged since there are no new requirements about the tile and bucket structure.

User Input Epic:
As set up initially, the Helix application allows a file name to be passed in as a parameter. This file enables the user to input tile data directly for the Type Double, Type Single, and Type Zero buckets. The user will then be prompted to specify N, the length of the protein sheet, and R, the number of rows for the protein sheet. The user will have the option either to enter R and receive a protein sheet of size NxR as output, or to leave R unspecified and let the program search for and display the largest protein sheet possible.

Solver Epic:
The initial strand of size N is created (solved) in accordance with the initial design. The left-side Type Double tile is the root of a created tree. From left to right, each tile is added to the previous tile until the number of tiles reaches N-1. A Type Double tile is the last in the strand of size N.
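The tree search for the initial strand described above can be sketched in Python as a depth-first search. This is only a sketch under assumptions that this document does not spell out: the `Tile` class, the `bonds` rule (two facing keys bond when they are non-zero and sum to zero, with 0 standing for a block), the use of Type Single tiles for the interior of the initial strand, and the rule that a tile is used at most once per strand are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence


@dataclass(frozen=True)
class Tile:
    """Hypothetical tile: only the left and right keys matter for one strand."""
    name: str
    left: int   # assumption: 0 means a block (no bond on that edge)
    right: int


def bonds(a: Tile, b: Tile) -> bool:
    """Assumed bonding rule: facing keys are non-zero and sum to zero."""
    return a.right != 0 and a.right + b.left == 0


def initial_strands(doubles: Sequence[Tile], singles: Sequence[Tile],
                    n: int) -> Iterator[List[Tile]]:
    """Yield every strand of n tiles: Type Double, interior tiles, Type Double."""
    if n < 2:
        return

    def extend(strand: List[Tile]) -> Iterator[List[Tile]]:
        if len(strand) == n - 1:
            # The last tile must be a Type Double with a block on its right edge.
            for end in doubles:
                if end not in strand and end.right == 0 and bonds(strand[-1], end):
                    yield strand + [end]
            return
        # Otherwise grow the branch with any unused interior tile that bonds.
        for tile in singles:
            if tile not in strand and bonds(strand[-1], tile):
                yield from extend(strand + [tile])

    # Each Type Double with a block on its left edge is the root of one tree.
    for root in doubles:
        if root.left == 0:
            yield from extend([root])


doubles = [Tile("D1", 0, 3), Tile("D2", -5, 0)]
singles = [Tile("S1", -3, 5), Tile("S2", -3, 4)]
for strand in initial_strands(doubles, singles, 3):
    print([t.name for t in strand])  # prints ['D1', 'S1', 'D2']
```

A generator is used so that the caller can stop after the first strand or collect all of them for validation; either choice is compatible with the description above.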
The complement strand is then found according to the initial story cards. The bonding rules remain unchanged. However, after the first complement strand is found, the same process is repeated to solve for the rest of the complement strands until a protein sheet of size NxR is built. If the user does not specify R and chooses to find the largest protein sheet possible given a particular data file, then the process of solving the complement strands will stop only when there are no more possible complement strands to create.

Validation Epic:
The validation stage will check the built protein sheet against the other sheets created. If R is specified and any tiles in the new sheet correspond in edges to the tiles in a previous sheet, the new protein sheet will be disregarded as a valid solution. If the user did not specify R, the validation process will also compare the protein sheets by their number of rows, and the largest protein sheet will then be displayed to the user. Otherwise, the output will be all possible solutions. The exact format of the output is still under consideration.

The following Story and Test cards are added:

S-U1: Prompt the user to enter the number of rows for the desired protein sheet. Allocate the space for integer R. The user will have the option to skip this step by pressing Enter. In this case the solution will be the largest possible protein sheet.
T-U1: Check integer R to make sure that R > 0.
Estimated LOC: 20. Estimated Time: 30 min.

S-S1: Once the first complement strand is built, continue the same process of solving the rest of the complement strands until the number of rows reaches R, if R is specified, or until no more complement strands can be built.
T-S1: If R is specified, check that the number of complement strands is equal to R-1.
Estimated LOC: 50. Estimated Time: 75 min.

S-V1: Check for duplicate protein sheet solutions.
T-V1: Ensure that only duplicates are removed from the list and all other solutions are left in place.
Estimated LOC: 30. Estimated Time: 45 min.

S-V2: If R is not specified, check each protein sheet solution for the number of rows. Output the solution with the largest protein sheet built.
T-V2: Calculate R for each protein sheet solution. Find the max R and compare it to the R of the output for accuracy.
Estimated LOC: 30. Estimated Time: 45 min.

Project Plan Summary

Epics          Initial LOC   Added LOC   Total   Added % of Total
User Input             200          20     220                9.1
Random Input           410           0     410                0
Creation               715           0     715                0
Solver                 495          50     545                9.2
Validation              85          60     145               41.4
Total                 1905         130    2035                6.4

2. The client has been contacted by the FBI and has been asked to modify our code to be used on fragmented DNA sequences of length N. Imagine that instead of an empty list of tiles we are provided with a number of substring fragments that may occur in either of the two rows. The buckets now contain the DNA acid components and an exhaustive list is provided. In some special cases, letters besides A, T, C, and G are present in a sequence. These letters represent ambiguity: of all the molecules sampled, there is more than one kind of nucleotide at that position. The user should receive a report of all possible solutions and, of equal importance, a report of what is NOT possible, in order to find any negative matches within a set of potential DNA sequences and eliminate suspects. False negatives are not acceptable. For example, given a particular input, the substring AATC may not be possible in the existing gaps.

New Requirements:
- Create tiles of types A, T, C, G, and N.
- Create buckets for each tile type.
- Create a lookup list for DNA bonding rules.

Proposal:
The new requirements will eliminate the Random Input epic, as the information to be analyzed will be provided in a user-specified file.
All other epics will be affected in the following manner:

Creation:
Tile creation: a tile is an object of type A, T, C, or G.
Bucket creation: five bucket objects are created, one for each tile type.

User Input Epic:
The user will be prompted to enter the name of the file that will provide a number of substring fragments that occur in either of the two rows. The information provided will look similar to this:
(Row 1) A C G A T
(Row 2) T C

Solver Epic:
Below is the lookup table that will be available for the Solver Epic as well as the Validation Epic.
A = adenine
C = cytosine
G = guanine
T = thymine
R = G A (purine)
Y = T C (pyrimidine)
K = G T (keto)
M = A C (amino)
S = G C (strong bonds)
W = A T (weak bonds)
B = G T C (all but A)
D = G A T (all but C)
H = A C T (all but G)
V = G C A (all but T)
N = A G C T (any)

Following the rule of DNA bonding (A binds with T and C binds with G), the Solver will first analyze the first row. For each tile in the top row it will check the tile in the bottom row. If the bottom tile is correct, the Solver will move on to the next tile in the top row; if the bottom tile is missing, the appropriate tile will be inserted. If the top tile is absent, the Solver will analyze the next tile pair. Once the top row ends, the Solver will move on to the bottom row and pair every bottom tile with the appropriate tile on the top. It will skip all blanks.

Validation Epic:
This process will search for every invalid solution in the following way. If the following sequence is to be validated:
AACT GA
TTGA CT
then the solution will be considered invalid due to the break in the strand.

The following Story and Test cards will be in effect for this project:

S-U1: Prompt the user to enter the name of the file that contains a number of substring fragments that may occur in either of the two rows.
T-U1: Check that the file specified by the user exists.
Estimated LOC: 5. Estimated Time: 10 min.

S-C1: Create a tile object.
The four tile types are A, C, G, and T.
T-C1: Verify that only types A, C, G, and T are accepted by the program.
Estimated LOC: 30. Estimated Time: 45 min.

S-C2: Create a bucket for tile type A. Implement the bucket as an indexed list (vector). Read a line from the file and call the tile builder, passing it the information. Add the returned tile to the bucket.
T-C2: Check that the bucket only contains tiles of type A.
Estimated LOC: 60. Estimated Time: 90 min.

S-C3: Create a bucket for tile type C. Implement the bucket as an indexed list (vector). Read a line from the file and call the tile builder, passing it the information. Add the returned tile to the bucket.
T-C3: Check that the bucket only contains tiles of type C.
Estimated LOC: 60. Estimated Time: 90 min.

S-C4: Create a bucket for tile type G. Implement the bucket as an indexed list (vector). Read a line from the file and call the tile builder, passing it the information. Add the returned tile to the bucket.
T-C4: Check that the bucket only contains tiles of type G.
Estimated LOC: 60. Estimated Time: 90 min.

S-C5: Create a bucket for tile type T. Implement the bucket as an indexed list (vector). Read a line from the file and call the tile builder, passing it the information. Add the returned tile to the bucket.
T-C5: Check that the bucket only contains tiles of type T.
Estimated LOC: 60. Estimated Time: 90 min.

S-S1: Starting with the top row, check each tile for its complement tile on the bottom row and fill the bottom row according to the lookup table provided. Skip the blanks and move to the next tile until the end of the row.
Example:
ACG T AG
↓↓↓ ↓ ↓↓
TGCAA TC
Estimated LOC: 200. Estimated Time: 300 min.

S-S2: Starting with the first tile on the bottom row, check each tile for its complement on the top row and fill in the top row blanks with the appropriate tiles according to the lookup table provided. If there is a blank, skip it.
Example:
ACGTT AG
   ↑
TGCAA TC
Estimated LOC: 200. Estimated Time: 300 min.

S-V1: Check the solution for blanks.
If blanks are found, the solution becomes invalid as there is a break in the DNA strand. If no breaks are found, display the completed DNA strand as a valid solution.
T-V1: Check that only solutions with breaks in the strand are considered invalid.
Estimated LOC: 100. Estimated Time: 150 min.

T-S1 (test card for S-S1): Check the lookup table to ensure that all tiles are matched correctly with their complements.
T-S2 (test card for S-S2): Check the lookup table to ensure that all tiles are matched correctly with their complements.

Project Plan Summary

Epics          Initial LOC   New LOC   New LOC as % of Initial
User Input             200        10                       5
Random Input           410         0                       0
Creation               715       270                      37.8
Solver                 495       400                      80.8
Validation              85       100                     117.6
Total                 1905       780                      40.9

3. The client is interested in adapting the software to move from 4 bonds on each tile to 6 bonds on a cube. Furthermore, these bonds can rotate on a surface (imagine a box with a pencil in one side: you can spin the box around that axis a full 2π radians). This can lead to strange three-dimensional tree-like structures, so just as our 2-d tiles have a North orientation, each cube has a north face and each face has a rotation angle around one of the three Cartesian coordinate axes, i.e., 6 reference angles for each cube structure. Additionally, there will now be 4 buckets: one for each of the three current buckets and one more for 3-block cubes. The 3-block cubes are arranged so that all surfaces with a block share a common vertex (or corner). As in the original project, a size N is entered and all possible 3d solutions of that size are presented.

New Requirements:
- Create an additional Type 3 bucket that will contain 3-block cubes (Type 3).
- Solve and verify all possible 3-d solutions.

Proposal:
To satisfy the new requirements, no changes have to be made to the existing epics. However, additions are necessary for all epics in the following way.

User Input:
The user input file enables the user to input data directly for Type 3, Type 2, Type 1, and Type 0, rather than have it randomly generated.
Essentially, the input file is split into four new files, one for each tile type. In addition, the user will still be prompted to specify N, the length of a strand that forms a solution. The user will also have to specify whether a 2d or 3d solution is desired. If a 2d solution is desired, the program will follow the original algorithm for solving for initial and complement strands. If a 3d solution is requested, the program will follow a new algorithm.

Creation:
In addition to creating 2d tiles in accordance with the initial story cards, a new 3d cube with 3 blocks will be created. The cubes are arranged so that all surfaces with a block share a common vertex. The structure of the cube consists of the following information: ID (char Type T, int seqNum), Name, Orientation. A new bucket will also be created to contain these Type T cubes.

Random Input:
In addition to randomly creating Type 2, Type 1, and Type 0 tiles, a new algorithm will be implemented to randomly create 3-block cubes, Type 3.

Solver:
The Solver will build an initial strand that meets the following requirements: it has the specified length N; it consists of cubes from the Type 3 bucket; the beginning cubes have blocks on the top, front, and left sides; and the end cubes have blocks on the top, front, and right sides. Each interior cube must have a block on top, a left key that complements the right key of the cube on its left, and a right key that complements the left key of the cube on its right. The complement strand will be built in a similar way. Additionally, each cube's top key must complement the bottom key of the corresponding cube from the initial strand. The first and last cubes of the complement strand will have blocks on the left and right sides respectively.

Validation:
Similarly to the initial algorithm, the validation process will check for duplicates.
If there are any cubes in the new solution that correspond in edges to cubes in previous solutions, the new solution will be disregarded.

The following Story and Test cards will be added to the project:

S-U1, T-U1: Prompt the user to specify whether the 2d or 3d solution is needed.
Estimated LOC: 10. Estimated Time: 20 min.

S-R1: If the user chooses to randomly generate the tiles and cubes, build the 3d cube with the following attributes: ID (char Type T, int seqNum), Name (RandT + seqNum), Orientation. Fill Type with T, assign a 0 key to any three sides of the cube that share a common corner, and randomly assign numbers from -11 to 11, excluding 0, to the other sides of the cube. Output the cubes to the cube data file.
T-R1: Check that the Type field is T. Check that only three sides contain a block key. Check that all other sides contain non-zero integers from -11 to 11.
Estimated LOC: 100. Estimated Time: 150 min.

S-C1: Create 3d cubes of Type 3. Each cube has three block keys that share a common vertex. The structure of the cube consists of the following information: ID (char Type T, int seqNum), Name (RandT + seqNum), Orientation. Each cube will look similar to this: 0 0 5 0 4 1
T-C1: Verify that a Type 3 cube is created. Verify that there are 3 block keys.
Estimated LOC: 200. Estimated Time: 300 min.

S-C2: Initialize a Type 3 bucket for the Type 3 cubes. Implement the bucket as an indexed list (vector). Read the cube data input file. Each line contains the input data to build one cube.
Estimated LOC: 150. Estimated Time: 200 min.

S-S1: Initial and complement strands of length N consist of only Type 3 elements. The beginning cubes will have blocks on the top, front, and left sides and the end cubes will have blocks on the top, front, and right sides. Each interior cube must have a block on top, a left key that complements the right key of the cube on its left, and a right key that complements the left key of the cube on its right.
T-S1: Check that each key of the interior cubes is matched correctly with its complement. Check that the beginning cubes have blocks on the top, front, and left sides and the end cubes have blocks on the top, front, and right sides.
Estimated LOC: 300. Estimated Time: 400 min.

T-C2 (test card for S-C2): Verify that the bucket contains only Type 3 elements.

S-S2: The complement strand is built similarly to the initial strand. Each cube's top key must complement the bottom key of the corresponding cube from the initial strand. The first and last cubes of the complement strand will have blocks on the left and right sides respectively.
T-S2: Check that each key of the interior cubes is matched correctly with its complement in the complement strand as well as with its complement cube in the initial strand. Check that the beginning cubes have blocks on the left side and the end cubes have blocks on the right side.
Estimated LOC: 300. Estimated Time: 400 min.

S-V1: If there are any cubes in the new solution that correspond in edges to cubes in previous solutions, the new solution will be disregarded. Output the valid 3d solution to the user.
T-V1: Check that only duplicate solutions are eliminated from the valid solutions.
Estimated LOC: 50. Estimated Time: 75 min.

Project Plan Summary

Epics          Initial LOC   Added LOC   Total   Added % of Total
User Input             200          10     210                4.8
Random Input           410         100     510               19.6
Creation               715         350    1065               32.9
Solver                 495         600    1095               54.8
Validation              85          50     135               37.0
Total                 1905        1110    3015               36.8
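
The Type 3 cube rules from S-R1 and T-R1 (three block keys of 0 on sides that share a common corner, all other sides holding non-zero integers from -11 to 11) can be sketched in Python. The sketch relies on the geometric fact that three faces of a cube share a vertex exactly when no two of them are opposite, so one face is blocked in each opposite pair. The face names, the dictionary layout, and the function names are hypothetical; the proposal does not fix a data format for cubes.

```python
import random
from typing import Dict

# Hypothetical face names, grouped into the three pairs of opposite faces.
OPPOSITE_PAIRS = (("top", "bottom"), ("left", "right"), ("front", "back"))
# Keys allowed on non-block sides: -11..11 excluding 0 (0 marks a block).
NONZERO_KEYS = [k for k in range(-11, 12) if k != 0]


def random_type3_cube(rng: random.Random) -> Dict[str, int]:
    """Build one cube with three block faces that meet at a common corner."""
    cube: Dict[str, int] = {}
    for pair in OPPOSITE_PAIRS:
        blocked = rng.choice(pair)  # block exactly one face of each pair
        for face in pair:
            cube[face] = 0 if face == blocked else rng.choice(NONZERO_KEYS)
    return cube


def is_type3_cube(cube: Dict[str, int]) -> bool:
    """T-R1 style check: exactly 3 blocks, sharing a corner, other keys valid."""
    all_faces = {face for pair in OPPOSITE_PAIRS for face in pair}
    if set(cube) != all_faces:
        return False
    blocked = {face for face, key in cube.items() if key == 0}
    if len(blocked) != 3:
        return False
    # Two opposite block faces can never touch the same corner.
    if any(a in blocked and b in blocked for a, b in OPPOSITE_PAIRS):
        return False
    return all(-11 <= key <= 11 for key in cube.values())


rng = random.Random(2010)  # fixed seed only to make the demo repeatable
cube = random_type3_cube(rng)
print(cube, is_type3_cube(cube))
```

Choosing one blocked face per opposite pair makes the common-corner property hold by construction, so the generator never has to reject and retry a cube.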