TOPALi v2 is heavily packaged (in the Java sense). Ideally everything would live in a single “TOPALi” package, but given the number of classes it is easier to split the separate areas of the program into subpackages.

topali.analysis
    Contains (non-GUI) analysis code that is only ever run client side (such as parameter estimation).

topali.cluster
    Contains webservices code for responding to requests to a Tomcat server, along with all the cluster-side analysis code for the methods (even those that can run client side too). Subpackages also exist for the actual analysis-related webservices:
        topali.cluster.dss
        topali.cluster.hmm
        topali.cluster.lrt
        topali.cluster.pdm
        topali.cluster.trees

topali.cluster.jobs
    Contains code for (client-side) setup of jobs to be submitted as webservices, including code to submit, monitor and retrieve the results.

topali.cluster.sge
    Contains code specific to running TOPALi analysis methods on a Sun Grid Engine (SGE) enabled cluster, mainly to deal with job status monitoring.

topali.data
    Contains all the data structures that make up a TOPALi project and are written to XML as part of a save operation. Data objects are a mixture of analysis-result objects and GUI save states (positions of components, last used values, etc.).

topali.fileio
    Contains code to handle file input and output, mainly reading/writing the various multiple alignment formats that are supported.

topali.gui
    Contains code relating to the main client interface.

topali.gui.dialog
    Contains code for the various client-side dialog boxes and input frames that appear.

topali.gui.dialog.hmm
    Contains additional code relating to the Run HMM dialog (separated out because its complexity requires so many classes).

topali.gui.dialog.settings
    Contains code for the Analysis|Settings dialog box. This is expected to contain a number of JPanels (likely one per class) and was therefore subpackaged.
topali.gui.nav
    Contains code related to the navigation tree (visible down the left-hand side of the client interface).

topali.gui.results
    Contains code for displaying GUI components related to results that have been obtained from running analysis methods (mainly alignment graphs to date).

topali.gui.tree
    Contains GUI code for dealing with phylogenetic trees.

topali.mod
    Contains code that has been obtained elsewhere and modified for use within TOPALi.

topali.vamsas
    Contains code related to the VAMSAS integration within TOPALi (so far, code to read/write/import/export VAMSAS XML documents).

Webservices and analyses are set up in a generic way. Using DSS as an example, the package will contain:

DSSWebService
    Contains the Axis web service code that extends from topali.cluster.WebService. It is responsible for starting a thread that runs RunDSS.

RunDSS
    Performs initial DSS setup (in DSS’s case, breaking the full analysis into [N] jobs and simulating alignments for each job). The job (or job array) is then submitted to the cluster via DRMAA or qsub. The job startup scripts are located in /WEB-INF/cluster/sge. A DSS job is an array job calling DSSAnalysis on each alignment.

DSSAnalysis
    Performs the actual analysis over an alignment, which is mainly a case of working out windows for that alignment and running the DSS method on each window.

DSS
    This is the class that performs the method.

Other classes may be involved for other web services too. The final common class is a collation class, responsible for checking on the progress of a job type when requested by a client-server call.

CollateDSS
    Monitors the directories in use by the job, looking for completed subtasks and calculating a percentage complete based on the information gained.
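As an illustration of the collation step, the following is a minimal sketch only, not the actual CollateDSS code. It assumes a job writes one marker file per percentage point (p1, p2, p3, ...) into a “percent” directory under its home directory; the class and method names are hypothetical.

```java
import java.io.File;

/** Hypothetical sketch of a collation check; not the actual TOPALi class. */
final class PercentCollationSketch {

    /**
     * Returns the percentage complete for a job, assuming one marker file
     * (p1, p2, p3, ...) is written per percentage point into jobDir/percent.
     */
    static int percentComplete(File jobDir) {
        File[] markers = new File(jobDir, "percent").listFiles();
        if (markers == null) {
            return 0; // the percent directory has not been created yet
        }
        int count = 0;
        for (File f : markers) {
            // Only count files that follow the assumed p<number> naming
            if (f.isFile() && f.getName().matches("p\\d+")) {
                count++;
            }
        }
        return Math.min(count, 100);
    }
}
```

Counting files rather than parsing the highest number keeps the check tolerant of markers that appear out of order.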
It is up to the job type itself to decide how and where to write percentage information that the collation class can monitor, but most tasks create a “percent” directory in the job’s home directory and write one file per percentage point complete, in the form p1, p2, p3, etc.

DSSWebService
    RunDSS
        DSSAnalysis  DSSAnalysis  DSSAnalysis  ... [N] bootstraps
            DSS  DSS  DSS  ... [N] windows
                RunFitch  RunFitch  RunFitch

PDM2Analysis

PDMAnalysis.run()

We run through a loop that moves the window along the (sub-partition of the) alignment that this job is looking at. For each window:

1. Run it through MrBayes
2. Take its results, and rewrite them into a win[n].txt file
3. Once data for a pair of windows is available, calculate a PDM score for them

For 2)

PDM.saveWindowResults() is used. This writes the same file to two locations: the local “working” directory (on the cluster node) and the job’s “run” directory on the head node. As these files are being created, a float[] array holding each probability score is created for the PDM class. After two windows have been run, two of these arrays will exist: set1 (window1) and set2 (window2).

    Window1    set2 created
    Window2    set2 copied to set1, new set2 created
    Window3    set2 copied to set1, new set2 created

This means that for any window pair there is always a set1 and a set2.

The file written to disk takes “pdm.nex.trprobs” from the MrBayes output and rewrites it so each line contains a tree string and its probability score:

    tree_string_1 prob_score_1
    tree_string_2 prob_score_2

For 3)

The PDM class holds the maximum Robinson-Foulds (RF) distance we expect (rfMax), which is calculated as 2n-6, where n is the number of sequences.
(http://www.bio.umontreal.ca/casgrain/en/labo/robinson_foulds.html)
(Robinson DF & Foulds LR (1981) Comparison of phylogenetic trees. Math Biosci 53:131-147)

We run TreeDist on the two window files win[n-1].txt and win[n].txt (where n is the last run windowed position along the alignment). TreeDist compares all trees to calculate an RF distance for each pair.
TreeDist writes its results to a file named “outfile”, which is formatted as follows, e.g.:

    1 4 0    (tree1 in window1 against tree1 in window2)
    1 5 2    (tree1 in window1 against tree2 in window2)
    1 6 1    (tree1 in window1 against tree3 in window2)

This file compares window 1 against window 2 but also window 2 against window 1, so only the first half of it is needed. Using this data, we then compute a summed PDM score.

PDM.doCalculations()

For each line in the file:

    Is it the first line? If so, we can determine the starting index (win2Index) of the 2nd window’s trees by looking at the 2nd column (4 in the example above). We now know that tree “4” is found in set2 at index [i-4], where i is the 2nd-column value.

    Is the 2nd column value equal to 1? If so, then we know we’ve reached the half-way point and can stop (as it’s window2_tree1 against window1_tree1).

    Calculate a PDM score for this line and add it to the current total (see below).

The score for a given line is calculated from the following values, together with the RF distance on that line:

    pK is the probability for win1, tree[n], found in set1[index-1]
    qK is the probability for win2, tree[n], found in set2[index-win2Index]

Probability scores in set1 and set2 are given as:

    set1:   tree1   tree2   tree3   tree4
            0.01    0.02    0.1     0.04

    set2:   tree5   tree6
            0.02    0.03

The TreeDist comparison provides:

    tree1 tree5 rfValue1/5
    tree1 tree6 rfValue1/6
    tree2 tree5 rfValue2/5
    tree2 tree6 rfValue2/6
    tree3 tree5 rfValue3/5
    tree3 tree6 rfValue3/6
    tree4 tree5 rfValue4/5
    tree4 tree6 rfValue4/6
    <and duplicates where 5 against 1, 5 against 2 etc>

For each line we then compute a PDM score, such that the inputs to the first two scores will be:

    pK tree1 (0.01), qK tree5 (0.02) and rfValue1/5
    pK tree1 (0.01), qK tree6 (0.03) and rfValue1/6

This score is summed over all iterations.
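The summation loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual PDM.doCalculations() code: the class name, method name and LineScore interface are hypothetical, and the per-line scoring formula is deliberately left as a pluggable callback because it is not reproduced in this text.

```java
import java.util.List;

/** Hypothetical sketch of the summation loop; not the actual TOPALi PDM class. */
final class PdmSummationSketch {

    /** Placeholder for the per-line score; the real formula is not given here. */
    interface LineScore {
        double score(float pK, float qK, int rf);
    }

    /**
     * Sums a per-line score over the first half of a TreeDist "outfile".
     * Each line is assumed to hold three whitespace-separated integers:
     * tree number, tree number, RF distance.
     */
    static double sum(List<String> outfileLines, float[] set1, float[] set2,
                      LineScore scorer) {
        double total = 0.0;
        int win2Index = -1; // unknown until the first data line is read

        for (String line : outfileLines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 3) {
                continue; // skip blank or malformed lines
            }
            int first = Integer.parseInt(cols[0]);
            int second = Integer.parseInt(cols[1]);
            int rf = Integer.parseInt(cols[2]);

            if (win2Index < 0) {
                // First line: 2nd column is the number of window 2's first tree
                win2Index = second;
            } else if (second == 1) {
                // Half-way point: window 2 trees are now compared to window 1
                break;
            }

            float pK = set1[first - 1];          // probability of the window 1 tree
            float qK = set2[second - win2Index]; // probability of the window 2 tree
            total += scorer.score(pK, qK, rf);
        }
        return total;
    }
}
```

A caller would supply the real scoring function as a lambda, for example `PdmSummationSketch.sum(lines, set1, set2, (p, q, rf) -> ...)`, which keeps the file parsing and index mapping separate from the formula itself.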