Project in Bioinformatics – 236524

Coloring Heuristics for Dotted Graphs

Presented by:
Shai Lubliner 036317048
Talya Gendler 061106597

Contents:
1. Introduction (Biological Background)
2. Modeling the problem
3. Algorithms & their implementation
4. Program design & usage
5. Results
   5.1 Biological data
   5.2 Randomly generated data
6. References
7. Appendix

1. Introduction (Biological Background)

Genotyping

The process of genotyping determines the variants of certain sequences in an individual's DNA, out of a known set of polymorphisms that exist in the population. This allows differentiation between individuals, linkage analysis, prenatal disease detection, etc. Among the most important types of polymorphisms is MLP (Microsatellite Length Polymorphism).

Microsatellite Length Polymorphism

Microsatellites are ubiquitous short tandem-repeat sequences, widely and randomly distributed throughout eukaryotic genomes. They are acutely prone to replication errors that result in expansions and contractions of the number of repeat units (repeat length variability), because of misalignment of the template and daughter strands. The repeated sequence is typically 1-5 bp long and is typically repeated 10-20 times. The MLP is flanked by fixed areas, which enable the design of specific primers.

The genotyping process

Determining an individual's allele consists of designing primers complementary to the flanking regions, amplification by PCR, and electrophoresis. The amount of resources needed for electrophoresis is measured by two factors: tags and lanes. Minimizing both can reduce experiment costs. From here on we focus on reducing the number of lanes, under the assumption that only one tag is used. This can be achieved by multiplexing – using the same lane for more than one MLP. We seek the minimal number of lanes with which one can conduct genotyping in a manner that enables decisive interpretation of the outcome.

2. Modeling the Problem

MLP Representation

An MLP can be represented as an arithmetic progression, also referred to as a dotted interval. Formally:
• For any polymorphic microsatellite p, let F be the length of the flanking area for p and Δ the length of the repeat sequence. Let l and h be the minimum and maximum number of repeats possible in p, respectively.
• The possible lengths of the DNA fragment corresponding to p are:
  S_p = {F + nΔ : l ≤ n ≤ h}
  which is an arithmetic progression.

Two MLPs are distinct iff their arithmetic progressions share no point. When genotyping, two MLPs can be run in the same lane iff they are distinct.

Dotted Interval Graphs (DIGs)

Seeking to exploit the dotted interval representation of MLPs, Aumann et al. [1] present the following model, which is a variation on the well known Interval Graph:
- Each arithmetic progression is represented as a vertex.
- Two vertices have an edge between them if and only if the corresponding progressions are not distinct, i.e., they share at least one point.
- The resulting graph is referred to as a DIG (Dotted Interval Graph).
- The notation DIGd represents a DIG with the integer d being its maximal jump.

Solution lies in coloring

A proper coloring of the graph induces a partition of all MLPs into lanes: vertices that share a color are pairwise non-adjacent, so the corresponding MLPs are pairwise distinct and can be run in the same lane. Unfortunately, coloring DIGs has been proven ([1]) to be NP-complete. Therefore we turn to approximation algorithms and heuristics.

Our goal is to examine and compare a set of approximation and heuristic algorithms, and see which gives the best results when run on DIGs. Mainly, we wish to conclude which algorithm gives the best results for biological data. The primary parameter by which we rate an algorithm's performance is the number of colors assigned to the graph's vertices, which should be minimal. The secondary one is the skew of the color sizes, i.e., how unevenly the vertices are spread among the colors.

3. Algorithms & their implementation

Sequential coloring based algorithms

The following algorithms are all composed of two main steps.
The first one is a vertex ordering algorithm and the second is the sequential coloring algorithm. In our implementation of the Sequential Coloring (SC) algorithm, we insert each vertex into the smallest color available to it (if any), in order to try to minimize the skew of the color sizes, as proposed in [3]. All three algorithms are heuristic. The first two algorithms are elementary; we regard them as an experimental control.

1. Ending First Coloring (EFC / Interval Graph Coloring) – Vertices are ordered such that vertex (dotted interval) a precedes vertex b if a ends before b.
2. Beginning First Coloring (BFC) – Vertices are ordered such that vertex (dotted interval) a precedes vertex b if a begins before b.
3. Smallest Last Coloring (SLC), which is based on Smallest Last Ordering (SLO), discussed in [2]. SLC is known to be a useful algorithm for coloring general graphs. Roth et al. [3] discussed and tested SLC performance on a dotted interval graph representing the biological data taken from the Weber8 set [4]. SLC outperformed the Heuristic Approximate Chordal Coloring algorithm, and was also proved to have returned the optimal result (in this case). We therefore consider this as "the algorithm to beat".

"DIG aware" approximation algorithms

The following two algorithms are presented in Aumann et al. [1], and exploit the characteristics of DIGs.

DIG2 Coloring (DIG2) – This algorithm has been proven to color DIG2 graphs with an approximation ratio of 3/2. Some definitions for a vertex v_i:
b_i - the beginning point of the DI represented by v_i.
V_i - the vertices that are represented by the DIs above the point b_i.
G_i - the subgraph of G induced by the vertices of V_i.

Following is the algorithm, which iterates over the vertices according to BFO (beginning first order):

The "DIG awareness" of this algorithm is epitomized in the first if condition (3-4). Considering the possibility of multiplexing, it first tries to use one of the colors already assigned to overlapping DIs.
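The dotted-interval model of section 2 and the ordering-plus-sequential-coloring scheme described at the start of this section can be sketched roughly as follows. This is a minimal illustration, not the C++ core program: the names `DottedInterval`, `build_dig`, `sequential_coloring` and `smallest_last_order` are hypothetical, intervals are assumed finite, and two vertices are taken to be adjacent when their progressions share a point (so that a proper coloring corresponds to a valid lane assignment).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DottedInterval:
    """Finite arithmetic progression start, start + jump, ... up to end."""
    start: int  # F + l * delta in the notation of section 2
    end: int    # F + h * delta
    jump: int   # delta, the repeat-unit length

    def points(self):
        return set(range(self.start, self.end + 1, self.jump))

    def intersects(self, other):
        # Two MLPs cannot share a lane when their progressions share a point.
        return bool(self.points() & other.points())


def build_dig(intervals):
    """Adjacency sets: i and j are adjacent iff their progressions intersect."""
    adj = {i: set() for i in range(len(intervals))}
    for i in range(len(intervals)):
        for j in range(i + 1, len(intervals)):
            if intervals[i].intersects(intervals[j]):
                adj[i].add(j)
                adj[j].add(i)
    return adj


def sequential_coloring(order, adj):
    """Color vertices in the given order; among the existing colors holding
    no neighbor of v, pick the one with the fewest vertices so far."""
    classes = []   # classes[c] = set of vertices colored c
    color_of = {}
    for v in order:
        feasible = [c for c, members in enumerate(classes)
                    if not (members & adj[v])]
        if feasible:
            c = min(feasible, key=lambda k: len(classes[k]))
        else:
            c = len(classes)
            classes.append(set())
        classes[c].add(v)
        color_of[v] = c
    return color_of


def smallest_last_order(adj):
    """Repeatedly delete a minimum-degree vertex of the remaining graph;
    the coloring order is the reverse of the deletion order."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    removed = []
    while remaining:
        v = min(remaining, key=lambda u: (len(remaining[u]), u))
        for u in remaining.pop(v):
            remaining[u].discard(v)
        removed.append(v)
    return removed[::-1]


# Toy data (made up for illustration only).
dis = [DottedInterval(10, 30, 5), DottedInterval(12, 32, 5),
       DottedInterval(20, 40, 10), DottedInterval(11, 31, 4)]
adj = build_dig(dis)
bfc_order = sorted(range(len(dis)), key=lambda i: dis[i].start)
efc_order = sorted(range(len(dis)), key=lambda i: dis[i].end)
bfc = sequential_coloring(bfc_order, adj)
efc = sequential_coloring(efc_order, adj)
slc = sequential_coloring(smallest_last_order(adj), adj)
```

Only the vertex order differs between the three heuristics; the coloring step is shared. Comparing point sets is quadratic and only suitable for small finite examples.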
DIGd Coloring (DIGd) – This algorithm has been proven to color DIGd graphs with an approximation ratio of (7/8)d + 3/8. Some definitions:
V_i - the vertices (DIs) whose jump = i.
G_i - the subgraph of G induced by the vertices of V_i.

Following is the algorithm:

A weakness of this algorithm is the fact that it assigns a new set of colors in each invocation of a coloring algorithm.

A color reducing heuristic

We suggest a simple algorithm that, given a coloring of a graph, attempts to reduce the number of colors.

The algorithm ReduceCol:
1. sort colors by ascending size → C1,…,Cn
2. for C = C1 to Cn:
   2.1. foreach v ∈ C
          try to relocate v into another color, in descending size order
          if relocation fails, go to 2 for next color
   2.2. if C is empty, remove it from coloring

The notion of this algorithm arose after we received the DIG2 and DIGd algorithms' results (presented later). These results were characterized by a distinct skew of the color sizes. This algorithm never adds a color, so with respect to the number of colors used it can only improve (or leave unchanged) the result attained by any other algorithm. Since this algorithm tries to eliminate colors starting with the smallest, we expect it to reduce the skew of color sizes whenever it receives a skewed coloring to begin with. On the other hand, lowering the number of colors of a balanced coloring may come at the price of enlarging the coloring skew.

Notation issue – We add the ++ sign to an algorithm's name to indicate that ReduceCol is run on its output. For example, DIG2++ means running DIG2 and immediately afterwards running ReduceCol on its output.

4. Program design & usage

Components

Our code consists of the following:
- A C++ core program (DIG.cpp\hpp, DIG_DottedInterval.cpp\hpp, DIG_Vertex.cpp\hpp & DIG_Util.cpp\hpp)
- A C++ random dataset generator (DIGen.cpp\hpp)
- A perl script which serves as a wrapper for the core program (DIG.pl)
- A perl script which allows mass testing of the algorithms' performance (StatGen.pl)

The core program

The core program receives a data file which specifies a set of dotted intervals and uses it to build a DIG. The dotted intervals can be infinite as well as finite, so the program supports the wider notion of DIGs (as presented in [1]). The program implements all of the algorithms discussed in section 3. The coloring algorithms specified by a user request are run on the DIG to produce an output file, specifying the results per algorithm.

The core program design focuses on data safety and correctness, rather than on time and space performance. For biologically relevant dataset sizes, the time performance is satisfactory.

Random dataset generator

This program receives the following parameters: dataset size, maximal jump, random seed, previous random seed.

Dataset size – determines the number of DIs in the dataset produced.
Maximal jump – for each DI we randomly select a jump value from the integer range [1,…,maximal jump].
Random seed – used to initiate the random generation process. For a given nonzero random seed, the generation is deterministic. Hence, given a specific set of parameters (with a nonzero random seed), the program generates a specific dataset. This enables reconstruction of previously generated datasets and allows us to include the option of regression, used in StatGen.pl (explained below).
Previous random seed – used to ensure that two consecutive calls to the generator will not result in identical sets. This is relevant for mass testing as in StatGen.pl.
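The seeded, reproducible generation described above can be sketched as follows. This is an illustrative sketch only, not DIGen.cpp: the function name `generate_dataset` is hypothetical, the value ranges are the ones this section gives for the flanking area size and the number of points, and two details are assumptions of the sketch: that a seed of 0 means "not reproducible", and that the first dot of a DI sits exactly at the flanking length. The "previous random seed" parameter is left out because the text does not say how it is used.

```python
import random


def generate_dataset(size, max_jump, seed):
    """Return `size` finite dotted intervals as (start, end, jump) tuples."""
    # Assumption: seed 0 is treated as "unseeded"; any other seed is deterministic.
    rng = random.Random(seed if seed != 0 else None)
    dataset = []
    for _ in range(size):
        flank = rng.randint(1, 200)      # flanking-area size, uniform
        jump = rng.randint(1, max_jump)  # jump, uniform in [1, max_jump]
        dots = rng.randint(1, 50)        # number of points, uniform
        start = flank                    # assumption: first dot at the flank length
        end = start + (dots - 1) * jump
        dataset.append((start, end, jump))
    return dataset


# The same nonzero seed reproduces the same dataset.
first = generate_dataset(size=20, max_jump=5, seed=42)
again = generate_dataset(size=20, max_jump=5, seed=42)
```

Keeping a private `random.Random` instance, rather than seeding the module-level generator, is what makes a dataset depend only on its own parameters.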
Each DI is characterized by the size of its flanking areas, its jump and the number of dots it contains. Based on random decisions of these three parameters, we can determine the DI's beginning, end and offset. Since we focus on achieving the best results for biological data, which of course consists only of finite DIs, we constructed only finite DIs. We chose the range [1,…,200] for the flanking area size and [1,…,50] for the number of points. These ranges were chosen based on examination of the Weber sets [4]. All random decisions are made to simulate uniform distributions.

StatGen.pl

This program has two operation modes: regression based and non-regression based. In both modes the program requests the core program to run all of the algorithms on each dataset.

In the non-regression based mode, the following parameters are received: number of sets per dataset size (num_sets, default=10), max jump (default=2), output file and new regression file.

Number of sets per dataset size – specifies the number of times a new dataset of size X is generated and fed to the core program. X belongs to {20,50,100,200,300,500,750,1000}.
Max jump – passed on to the random dataset generator.
Output file – contains the results. For each algorithm and dataset size X the following parameters are calculated, as an average over all num_sets datasets:
- average number of colors
- average color size
- minimal color size
- maximal color size
New regression file – contains all information needed to recreate the test case.

In the regression based mode, the script receives a single parameter – the name of a regression file. This enables the recreation of a specific test case. Therefore a standard test case can be used for comparison of future algorithms with the existing ones.

Documentation

Each perl script includes a man page, invoked when the script is called with the '-man' option. This documentation is user-oriented.
We found it superfluous to include extensive developer-oriented documentation, trusting that the code is readable on its own.

Making the programs

We supply a short Makefile which specifies the rules needed to build the core program and the random dataset generator. For reasons of simplicity we did not partition our files into subdirectories. Any such partition calls for changes in the scripts and in the Makefile.

5. Results

We discuss the results we received on two different types of data, the first being biological data and the second randomly generated data.

5.1 Results received on biological data

We ran the program on three different sets, all taken from the Weber sets [4]. Each such set includes multiple MLPs found in the human genome. The Weber sets are not disjoint, and some contain large amounts of data that also appear in other sets. We chose these specific sets because they belong to different groups, and seem to represent most of the data present in all sets.

In order to create the files representing these sets in the format required by the program, we performed some preliminary manual manipulation of the data taken from the web, followed by a simple perl script ("convert.pl", submitted). In doing so, we omitted any MLP with an incomplete or ambiguous definition (for example a missing starting/ending point, or starting/ending points not corresponding to the jump values given).
The results are presented in the following tables:

For set Weber8 (301 vertices, maximal clique size – 23):

Algorithm   Number of colors   Max color size   Min color size
EFC         26                 12               11
EFC++       24                 19               6
BFC         24                 13               11
BFC++       23                 24               1
SLC         23                 14               13
SLC++       23                 25               6
DIG2        24                 21               2
DIG2++      24                 21               3
DIGd        26                 21               1
DIGd++      23                 21               4

For set Weber13 (355 vertices, maximal clique size – 43):

Algorithm   Number of colors   Max color size   Min color size
EFC         47                 8                7
EFC++       46                 13               3
BFC         46                 8                7
BFC++       45                 15               1
SLC         46                 8                7
SLC++       45                 14               3
DIG2        45                 15               1
DIG2++      45                 15               2
DIGd        47                 18               1
DIGd++      46                 15               2

For set Weber53 (230 vertices, maximal clique size – 19):

Algorithm   Number of colors   Max color size   Min color size
EFC         24                 10               9
EFC++       22                 18               3
BFC         21                 11               10
BFC++       21                 21               3
SLC         20                 12               11
SLC++       20                 24               2
DIG2        22                 21               1
DIG2++      21                 21               3
DIGd        24                 22               1
DIGd++      21                 24               3

A point worth elaborating on is the results for the Weber8 set, since Roth et al. [3] also used the SLC algorithm on this set and received a result of 34 colors. They stated that this was in fact an optimal coloring, since the maximal clique size they found was 34. The set we used had 365 MLPs in it, and only 301 were left after omission as explained above. Their set, however, contained 383 MLPs. We can only conclude that there is some difference between the sets used. The best coloring we received used 23 colors, which equals the size of the maximal clique. Since the vertices of a clique must all receive different colors, no coloring can use fewer, so we also achieved an optimal coloring in this case.

Discussion – We recall that we measure the algorithms' performance according to the number of colors assigned and according to the skew of the color sizes. As can be seen from the tables, the SLC++ algorithm is the best regarding the number of colors. However, it does tend to greatly enlarge the coloring skew compared with the results of the SLC algorithm. Considering that the skew is practically important when actually running DNA samples in gel (we do not want overly crowded lanes), it seems that the best algorithm for these cases is SLC.
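The optimality argument made for Weber8 in this section is the standard clique bound, and it can be applied to all three sets using only the clique sizes and the best color counts reported in the tables. Writing ω(G) for the size of a clique found in the graph, χ(G) for the minimal number of colors, and k for the number of colors used by the best coloring found:

```latex
\[
  \omega(G) \;\le\; \chi(G) \;\le\; k
\]
\begin{align*}
\text{Weber8:}  &\quad 23 \le \chi(G) \le 23 \;\Rightarrow\; \chi(G) = 23,\\
\text{Weber13:} &\quad 43 \le \chi(G) \le 45,\\
\text{Weber53:} &\quad 19 \le \chi(G) \le 20.
\end{align*}
```

Read this way, the tables certify optimality only for Weber8. For Weber13 and Weber53 the best colorings found are within two colors and one color of the clique bound, respectively, and may or may not be optimal.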
5.2 Results received on randomly generated data

We used the StatGen.pl script described above for the generation of this data. The dataset sizes are {20,50,100,200,300,500,750,1000}. The parameters used were:
Number of sets per dataset size – 100
Max jump – 5

The raw output appears in section 7 (Appendix), and contains, in addition to the information discussed here, the standard deviation of the number of colors assigned by each algorithm.

The following table contains the average number of colors assigned by each algorithm to each dataset size:

Algorithm   20      50      100     200     300     500     750      1000
EFC         7.26    14.29   25.03   43.80   62.43   96.52   138.29   178.89
EFC++       6.87    13.08   22.57   39.53   56.19   87.26   124.57   162.01
BFC         7.11    13.85   24.29   43.05   61.72   96.00   137.48   179.16
BFC++       6.83    12.98   22.24   38.98   55.11   85.79   123.05   159.46
SLC         6.86    12.98   22.26   39.02   54.79   85.05   121.13   157.49
SLC++       6.82    12.79*  21.84*  38.30   54.01   83.88   119.67   155.59
DIG2        6.88    13.04   22.24   38.53   54.24   83.81   118.37   154.19
DIG2++      6.79*   12.79*  21.97   38.09*  53.65*  83.07*  117.71*  153.25*
DIGd        8.35    15.52   25.92   43.59   59.95   91.69   127.51   164.20
DIGd++      6.87    12.98   22.15   38.64   54.33   84.34   119.73   155.33

The best (lowest) result for each dataset size is marked with *. As can be seen, ReduceCol indeed manages to reduce the number of colors. For the smaller datasets SLC++ and DIG2++ are nearly tied: DIG2++ is best at size 20, the two tie at size 50, and SLC++ is best at size 100, in each case by a small margin. From size 200 upwards, DIG2++ is the best.

Regarding the approximation algorithms presented in [1], it is clear that the DIG2 algorithm's strategy (as mentioned in section 3) provides good results, while the DIGd algorithm's color assignment "promiscuousness" provides poorer results.

The graph that follows shows the color skewness for each algorithm. It can be seen from the graph that our expectation was correct and that ReduceCol reduces the skew of color sizes whenever it receives a skewed coloring to begin with (for example DIG2), and enlarges the coloring skew of balanced colorings.
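To make the effect discussed above concrete, the ReduceCol heuristic from section 3 can be sketched as follows. This is a hedged reading of the pseudocode, not the authors' C++ code: the function name `reduce_col` and the data layout (a list of vertex sets plus adjacency sets) are hypothetical, and the sketch assumes that vertices already relocated out of a color stay relocated when a later relocation from that color fails, and that target colors are ranked by their current sizes.

```python
def reduce_col(classes, adj):
    """classes: list of vertex sets forming a proper coloring.
    adj: dict mapping each vertex to the set of its neighbors.
    Returns a proper coloring with at most as many colors."""
    classes = [set(c) for c in classes]
    # Step 1: visit the colors in ascending order of their initial size.
    for c in sorted(classes, key=len):
        for v in sorted(c):
            # Step 2.1: candidate targets are the other non-empty colors,
            # largest first (sizes as they are at this moment).
            targets = sorted((t for t in classes if t is not c and t),
                             key=len, reverse=True)
            target = next((t for t in targets if not (t & adj[v])), None)
            if target is None:
                break  # relocation failed: go on to the next color
            c.remove(v)
            target.add(v)
    # Step 2.2: colors that were emptied are dropped.
    return [c for c in classes if c]


# Toy example (made up): the path 2 - 0 - 3 - 1 colored with three colors.
adj = {0: {2, 3}, 1: {3}, 2: {0}, 3: {0, 1}}
reduced = reduce_col([{0}, {3}, {1, 2}], adj)
```

Because targets are tried largest first, emptying a color tends to grow colors that are already large, which is consistent with the increase in skew observed for balanced colorings.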
[Figure: color size skew for each algorithm. x-axis: set size; y-axis label: "av col num". One "min" and one "max" series for each of EFC, EFC++, BFC, BFC++, SLC, SLC++, DIG2, DIG2++, DIGd and DIGd++.]

6. References

1. Y. Aumann, M. Lewenstein, O. Melamud, R.Y. Pinter, Z. Yakhini, Dotted interval graphs and high throughput genotyping, Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, Session 4B, pp. 339-348, 2005.
2. D. W. Matula and L. L. Beck, Smallest-Last Ordering and Clustering and Graph Coloring Algorithms, J. of the Association for Computing Machinery, vol. 30, no. 3, pp. 417-427, July 1983.
3. R.M. Roth, P. Webb, Z. Yakhini, Tagging DNA Fragments and Graph Coloring Methods, HP, 1997.
4. Web site http://research.marshfieldclinic.org/genetics/sets/combo.html

7. Appendix

The following is the raw output of StatGen.pl, upon which we base the discussion in section 5.2:

SETS PER SIZE: 100

SET SIZE: 20
EFC:
Mean Number of Colors = 7.26 +/- 1.12
Average Color Size = 2.75
Average Max Color Size = 4.15
Average Min Color Size = 1.46
EFC:++
Mean Number of Colors = 6.87 +/- 1.11
Average Color Size = 2.91
Average Max Color Size = 4.82
Average Min Color Size = 1.36
BFC:
Mean Number of Colors = 7.11 +/- 1.14
Average Color Size = 2.81
Average Max Color Size = 4.09
Average Min Color Size = 1.68
BFC:++
Mean Number of Colors = 6.83 +/- 1.19
Average Color Size = 2.93
Average Max Color Size = 4.75
Average Min Color Size = 1.49
SLC:
Mean Number of Colors = 6.86 +/- 1.16
Average Color Size = 2.92
Average Max Color Size = 3.57
Average Min Color Size = 2.40
SLC:++
Mean Number of Colors = 6.82 +/- 1.15
Average Color Size = 2.93
Average Max Color Size = 4.69
Average Min Color Size = 1.42
DIG2:
Mean Number of Colors = 6.88 +/- 1.18
Average Color Size = 2.91
Average Max Color Size = 5.02
Average Min Color Size = 1.36
DIG2:++
Mean Number of Colors = 6.79 +/- 1.17
Average Color Size = 2.95
Average Max Color Size = 4.81
Average Min Color Size = 1.63
DIGd:
Mean Number of Colors = 8.35 +/- 1.15
Average Color Size = 2.40
Average Max Color Size = 4.25
Average Min Color Size = 1.20
DIGd:++
Mean Number of Colors = 6.87 +/- 1.17
Average Color Size = 2.91
Average Max Color Size = 4.59
Average Min Color Size = 1.60

SET SIZE: 50
EFC:
Mean Number of Colors = 14.29 +/- 1.45
Average Color Size = 3.50
Average Max Color Size = 5.33
Average Min Color Size = 1.58
EFC:++
Mean Number of Colors = 13.08 +/- 1.47
Average Color Size = 3.82
Average Max Color Size = 6.32
Average Min Color Size = 1.67
BFC:
Mean Number of Colors = 13.85 +/- 1.63
Average Color Size = 3.61
Average Max Color Size = 5.18
Average Min Color Size = 1.79
BFC:++
Mean Number of Colors = 12.98 +/- 1.52
Average Color Size = 3.85
Average Max Color Size = 6.20
Average Min Color Size = 1.73
SLC:
Mean Number of Colors = 12.98 +/- 1.52
Average Color Size = 3.85
Average Max Color Size = 4.51
Average Min Color Size = 3.22
SLC:++
Mean Number of Colors = 12.79 +/- 1.51
Average Color Size = 3.91
Average Max Color Size = 6.41
Average Min Color Size = 1.89
DIG2:
Mean Number of Colors = 13.04 +/- 1.63
Average Color Size = 3.83
Average Max Color Size = 7.13
Average Min Color Size = 1.23
DIG2:++
Mean Number of Colors = 12.79 +/- 1.54
Average Color Size = 3.91
Average Max Color Size = 6.69
Average Min Color Size = 1.84
DIGd:
Mean Number of Colors = 15.52 +/- 1.67
Average Color Size = 3.22
Average Max Color Size = 6.18
Average Min Color Size = 1.13
DIGd:++
Mean Number of Colors = 12.98 +/- 1.54
Average Color Size = 3.85
Average Max Color Size = 6.37
Average Min Color Size = 1.79

SET SIZE: 100
EFC:
Mean Number of Colors = 25.03 +/- 1.90
Average Color Size = 4.00
Average Max Color Size = 6.08
Average Min Color Size = 1.77
EFC:++
Mean Number of Colors = 22.57 +/- 1.88
Average Color Size = 4.43
Average Max Color Size = 7.82
Average Min Color Size = 1.63
BFC:
Mean Number of Colors = 24.29 +/- 2.10
Average Color Size = 4.12
Average Max Color Size = 6.03
Average Min Color Size = 1.81
BFC:++
Mean Number of Colors = 22.24 +/- 1.94
Average Color Size = 4.50
Average Max Color Size = 7.63
Average Min Color Size = 1.78
SLC:
Mean Number of Colors = 22.26 +/- 1.93
Average Color Size = 4.49
Average Max Color Size = 5.10
Average Min Color Size = 3.96
SLC:++
Mean Number of Colors = 21.84 +/- 1.98
Average Color Size = 4.58
Average Max Color Size = 8.16
Average Min Color Size = 1.94
DIG2:
Mean Number of Colors = 22.24 +/- 2.11
Average Color Size = 4.50
Average Max Color Size = 8.70
Average Min Color Size = 1.11
DIG2:++
Mean Number of Colors = 21.97 +/- 2.06
Average Color Size = 4.55
Average Max Color Size = 8.29
Average Min Color Size = 1.87
DIGd:
Mean Number of Colors = 25.92 +/- 2.52
Average Color Size = 3.86
Average Max Color Size = 7.42
Average Min Color Size = 1.21
DIGd:++
Mean Number of Colors = 22.15 +/- 2.03
Average Color Size = 4.51
Average Max Color Size = 7.94
Average Min Color Size = 1.88

SET SIZE: 200
EFC:
Mean Number of Colors = 43.80 +/- 2.18
Average Color Size = 4.57
Average Max Color Size = 7.21
Average Min Color Size = 1.93
EFC:++
Mean Number of Colors = 39.53 +/- 2.54
Average Color Size = 5.06
Average Max Color Size = 9.11
Average Min Color Size = 1.88
BFC:
Mean Number of Colors = 43.05 +/- 3.04
Average Color Size = 4.65
Average Max Color Size = 7.09
Average Min Color Size = 1.53
BFC:++
Mean Number of Colors = 38.98 +/- 2.66
Average Color Size = 5.13
Average Max Color Size = 8.94
Average Min Color Size = 1.77
SLC:
Mean Number of Colors = 39.02 +/- 2.59
Average Color Size = 5.13
Average Max Color Size = 5.77
Average Min Color Size = 4.54
SLC:++
Mean Number of Colors = 38.30 +/- 2.71
Average Color Size = 5.22
Average Max Color Size = 10.12
Average Min Color Size = 2.22
DIG2:
Mean Number of Colors = 38.53 +/- 3.19
Average Color Size = 5.19
Average Max Color Size = 10.60
Average Min Color Size = 1.12
DIG2:++
Mean Number of Colors = 38.09 +/- 2.96
Average Color Size = 5.25
Average Max Color Size = 10.14
Average Min Color Size = 1.99
DIGd:
Mean Number of Colors = 43.59 +/- 3.01
Average Color Size = 4.59
Average Max Color Size = 9.10
Average Min Color Size = 1.23
DIGd:++
Mean Number of Colors = 38.64 +/- 2.62
Average Color Size = 5.18
Average Max Color Size = 9.39
Average Min Color Size = 1.86

SET SIZE: 300
EFC:
Mean Number of Colors = 62.43 +/- 3.06
Average Color Size = 4.81
Average Max Color Size = 7.52
Average Min Color Size = 2.12
EFC:++
Mean Number of Colors = 56.19 +/- 3.14
Average Color Size = 5.34
Average Max Color Size = 9.86
Average Min Color Size = 1.90
BFC:
Mean Number of Colors = 61.72 +/- 3.81
Average Color Size = 4.86
Average Max Color Size = 7.44
Average Min Color Size = 1.62
BFC:++
Mean Number of Colors = 55.11 +/- 2.96
Average Color Size = 5.44
Average Max Color Size = 9.79
Average Min Color Size = 1.77
SLC:
Mean Number of Colors = 54.79 +/- 3.07
Average Color Size = 5.48
Average Max Color Size = 6.09
Average Min Color Size = 4.96
SLC:++
Mean Number of Colors = 54.01 +/- 3.11
Average Color Size = 5.55
Average Max Color Size = 11.36
Average Min Color Size = 2.35
DIG2:
Mean Number of Colors = 54.24 +/- 3.63
Average Color Size = 5.53
Average Max Color Size = 11.30
Average Min Color Size = 1.09
DIG2:++
Mean Number of Colors = 53.65 +/- 3.52
Average Color Size = 5.59
Average Max Color Size = 10.69
Average Min Color Size = 1.95
DIGd:
Mean Number of Colors = 59.95 +/- 3.66
Average Color Size = 5.00
Average Max Color Size = 10.00
Average Min Color Size = 1.19
DIGd:++
Mean Number of Colors = 54.33 +/- 3.25
Average Color Size = 5.52
Average Max Color Size = 10.28
Average Min Color Size = 1.86

SET SIZE: 500
EFC:
Mean Number of Colors = 96.52 +/- 3.88
Average Color Size = 5.18
Average Max Color Size = 8.19
Average Min Color Size = 2.32
EFC:++
Mean Number of Colors = 87.26 +/- 3.80
Average Color Size = 5.73
Average Max Color Size = 10.87
Average Min Color Size = 1.89
BFC:
Mean Number of Colors = 96.00 +/- 5.06
Average Color Size = 5.21
Average Max Color Size = 8.22
Average Min Color Size = 1.27
BFC:++
Mean Number of Colors = 85.79 +/- 3.88
Average Color Size = 5.83
Average Max Color Size = 11.08
Average Min Color Size = 1.78
SLC:
Mean Number of Colors = 85.05 +/- 3.98
Average Color Size = 5.88
Average Max Color Size = 6.45
Average Min Color Size = 5.22
SLC:++
Mean Number of Colors = 83.88 +/- 3.72
Average Color Size = 5.96
Average Max Color Size = 12.88
Average Min Color Size = 2.47
DIG2:
Mean Number of Colors = 83.81 +/- 4.49
Average Color Size = 5.97
Average Max Color Size = 12.55
Average Min Color Size = 1.02
DIG2:++
Mean Number of Colors = 83.07 +/- 4.33
Average Color Size = 6.02
Average Max Color Size = 12.14
Average Min Color Size = 1.92
DIGd:
Mean Number of Colors = 91.69 +/- 5.08
Average Color Size = 5.45
Average Max Color Size = 10.76
Average Min Color Size = 1.23
DIGd:++
Mean Number of Colors = 84.34 +/- 4.08
Average Color Size = 5.93
Average Max Color Size = 11.22
Average Min Color Size = 1.87

SET SIZE: 750
EFC:
Mean Number of Colors = 138.29 +/- 4.32
Average Color Size = 5.42
Average Max Color Size = 8.76
Average Min Color Size = 2.49
EFC:++
Mean Number of Colors = 124.57 +/- 4.11
Average Color Size = 6.02
Average Max Color Size = 11.43
Average Min Color Size = 2.06
BFC:
Mean Number of Colors = 137.48 +/- 6.23
Average Color Size = 5.46
Average Max Color Size = 8.74
Average Min Color Size = 1.29
BFC:++
Mean Number of Colors = 123.05 +/- 4.29
Average Color Size = 6.10
Average Max Color Size = 11.66
Average Min Color Size = 1.87
SLC:
Mean Number of Colors = 121.13 +/- 4.69
Average Color Size = 6.19
Average Max Color Size = 6.86
Average Min Color Size = 5.67
SLC:++
Mean Number of Colors = 119.67 +/- 4.55
Average Color Size = 6.27
Average Max Color Size = 14.57
Average Min Color Size = 2.61
DIG2:
Mean Number of Colors = 118.37 +/- 5.03
Average Color Size = 6.34
Average Max Color Size = 13.34
Average Min Color Size = 1.04
DIG2:++
Mean Number of Colors = 117.71 +/- 4.84
Average Color Size = 6.37
Average Max Color Size = 13.38
Average Min Color Size = 2.01
DIGd:
Mean Number of Colors = 127.51 +/- 4.65
Average Color Size = 5.88
Average Max Color Size = 11.58
Average Min Color Size = 1.17
DIGd:++
Mean Number of Colors = 119.73 +/- 4.55
Average Color Size = 6.26
Average Max Color Size = 12.08
Average Min Color Size = 2.03

SET SIZE: 1000
EFC:
Mean Number of Colors = 178.89 +/- 5.21
Average Color Size = 5.59
Average Max Color Size = 9.14
Average Min Color Size = 2.50
EFC:++
Mean Number of Colors = 162.01 +/- 5.47
Average Color Size = 6.17
Average Max Color Size = 12.01
Average Min Color Size = 2.07
BFC:
Mean Number of Colors = 179.16 +/- 7.57
Average Color Size = 5.58
Average Max Color Size = 8.97
Average Min Color Size = 1.13
BFC:++
Mean Number of Colors = 159.46 +/- 5.70
Average Color Size = 6.27
Average Max Color Size = 12.17
Average Min Color Size = 1.75
SLC:
Mean Number of Colors = 157.49 +/- 5.92
Average Color Size = 6.35
Average Max Color Size = 7.05
Average Min Color Size = 5.80
SLC:++
Mean Number of Colors = 155.59 +/- 5.95
Average Color Size = 6.43
Average Max Color Size = 16.02
Average Min Color Size = 2.68
DIG2:
Mean Number of Colors = 154.19 +/- 6.63
Average Color Size = 6.49
Average Max Color Size = 14.00
Average Min Color Size = 1.01
DIG2:++
Mean Number of Colors = 153.25 +/- 6.40
Average Color Size = 6.53
Average Max Color Size = 13.89
Average Min Color Size = 1.94
DIGd:
Mean Number of Colors = 164.20 +/- 6.57
Average Color Size = 6.09
Average Max Color Size = 12.13
Average Min Color Size = 1.15
DIGd:++
Mean Number of Colors = 155.33 +/- 6.09
Average Color Size = 6.44
Average Max Color Size = 12.78
Average Min Color Size = 1.88
