Introductory Description for BestBet 2.0

The Matlab® program above is called BestBet because the name is both descriptive and memorable. The core idea of BestBet follows the reasoning of Expectation Maximization (EM) and its iterative pursuit of the most complete estimate possible of the proportions of organisms. The algorithm works well without additional data, such as a database of constituent organisms together with their associated restriction fragment lengths. When such a database is available, however, the program is even more effective.

To see the core idea, suppose that we look at the data and see that a specific fragment length, x1, has a plurality over all other restriction fragment lengths for the first enzyme. Similarly, suppose that x2 has a plurality for the second enzyme, and that x3 is the most abundant fragment length for the third enzyme. Given this information, which triple of fragment lengths would you pick as most likely to have produced it, that is, to have the largest expectation of creating this data set? The best bet is to assume that some of the samples contain an organism with restriction fragment lengths equal to (x1, x2, x3).

We do not have a good way to estimate the proportion of this organism immediately, so we construct a learning model in the spirit of EM. We make the minimal assumption that at least some small amount of an organism with fragment lengths (x1, x2, x3) exists in some of the samples, and we remove a small amount of that organism from the data. That is, we remove a fixed small percentage of each of the fragment lengths x1, x2, and x3 from each sample in which it occurs.

Detailed results will be reported in the methods paper that supports and dovetails with (Treusch et al.). In the case of samples from the BATS location taken over roughly half a decade, however, BestBet was quite successful at picking out the top three to five organisms from each sample.
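The selection and removal steps described above can be sketched in a few lines. The following is an illustrative Python sketch, not the Matlab program itself, and it rests on several assumptions that the description does not settle: each sample is represented as three dictionaries (one per enzyme) mapping fragment length to relative abundance; the plurality for each enzyme is taken over abundances pooled across all samples; the removed amount `step` is a fixed absolute quantity; the loop repeats until no sample can give up another `step`; and the removed amounts are accumulated as the estimated proportion of each triple. The names `pool`, `pick_triple`, `best_bet`, `step`, and `allowed` are hypothetical. The optional `allowed` argument restricts candidates to database triples and scores each by its weakest pooled peak, which is also an assumption.

```python
from collections import defaultdict

N_ENZYMES = 3   # the text works with three restriction enzymes
EPS = 1e-12     # tolerance for floating-point comparisons


def pool(samples):
    """Sum the abundance of every fragment length over all samples, per enzyme."""
    pooled = [defaultdict(float) for _ in range(N_ENZYMES)]
    for sample in samples:
        for e in range(N_ENZYMES):
            for length, amount in sample[e].items():
                pooled[e][length] += amount
    return pooled


def pick_triple(pooled, allowed=None):
    """Choose the next candidate triple (x1, x2, x3)."""
    if allowed is None:
        if any(not p for p in pooled):
            return None
        # Plurality per enzyme; ties go to the shortest fragment (assumption).
        return tuple(max(sorted(p), key=p.get) for p in pooled)

    # Database mode (assumption): score a triple by its weakest pooled peak.
    def score(triple):
        return min(pooled[e].get(triple[e], 0.0) for e in range(N_ENZYMES))

    candidates = sorted(allowed)
    return max(candidates, key=score) if candidates else None


def best_bet(samples, step=0.01, max_iter=100_000, allowed=None):
    """Return, per sample, a dict mapping each chosen triple to the amount removed."""
    # Work on a copy so the caller's data stays untouched.
    work = [[dict(profile) for profile in sample] for sample in samples]
    estimates = [defaultdict(float) for _ in samples]
    for _ in range(max_iter):
        triple = pick_triple(pool(work), allowed)
        if triple is None:
            break
        removed = False
        for s, sample in enumerate(work):
            # Remove only where all three fragment lengths are still present.
            if all(sample[e].get(triple[e], 0.0) >= step - EPS
                   for e in range(N_ENZYMES)):
                for e in range(N_ENZYMES):
                    sample[e][triple[e]] -= step
                estimates[s][triple] += step
                removed = True
        if not removed:
            break  # the best bet can no longer be removed from any sample
    return [dict(est) for est in estimates]
```

Calling `best_bet(samples)` on a list of samples returns one dictionary per sample. Passing `allowed=` with a set of known triples limits the search to those triples. Because the sketch stops as soon as the current best bet cannot be removed from any sample, it may leave some signal unexplained; a fuller implementation would need a rule for moving on to the next candidate.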
Finally, to apply BestBet to the case of a fairly complete database of organisms and triples of fragment lengths, run the program as before, allowing only triples (x1, x2, x3) from the database to be considered.

1. Arthur Dempster, Nan Laird, and Donald Rubin. "Maximum likelihood from incomplete data via the EM algorithm". Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977.
2. I. D. Dinov. "Expectation Maximization and Mixture Modeling Tutorial". California Digital Library, Statistics Online Computational Resource, Paper EM_MM, http://repositories.cdlib.org/socr/EM_MM, December 9, 2008.
3. Alexander H. Treusch, Kevin L. Vergin, Liam A. Finlay, Michael G. Donatz, Robert M. Burton, Craig A. Carlson, and Stephen J. Giovannoni. "Seasonality and Vertical Structure of an Ocean Gyre". Submitted, January 20, ISME Journal.
