SUPPLEMENT

PANET: A GPU-based tool for fast parallel analysis of robustness dynamics and feedforward/feedback loop structures in large-scale biological networks

Hung-Cuong Trinh1, Duc-Hau Le2 and Yung-Keun Kwon1,*

1 School of Electrical Engineering, University of Ulsan, 93 Daehak-ro, Nam-gu, Ulsan 680-749
2 School of Computer Science and Engineering, Water Resources University, 175 TaySon, Dong Da, Hanoi, Vietnam

Hung-Cuong Trinh et al.

Text S1. A brief introduction to OpenCL

In this work, we employed OpenCL (http://www.khronos.org/opencl/), which is designed to run on any available multi-core central processing unit (CPU) or graphics processing unit (GPU). It exploits the computing power of an ordinary computer by running code on the cores of the CPU or on the hundreds to thousands of cores of the GPU. An OpenCL-executable device is divided into one or more compute units (CUs), and each CU is further divided into one or more processing elements (PEs). A CU corresponds to a core of a multi-core CPU or a streaming multiprocessor of a GPU. A PE is a virtual scalar processor, that is, an arithmetic logic unit of a CPU or a scalar processor of a GPU.

An OpenCL application consists of two parts: the device program and the host program. The device program consists of special functions, called kernels, which are written in the OpenCL C programming language. The host program, on the other hand, provides an interface for managing the execution flow on the device. In other words, a kernel is the basic unit of executable code that runs on a GPU or CPU device, and the host program is responsible for sending kernels to devices for execution through command queues. From the logical data-parallel perspective, the host program defines an N-dimensional array of work-items (N = 1, 2 or 3), and each work-item executes the same kernel.
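The following is a minimal serial emulation in Python that makes this data-parallel model concrete. It is not OpenCL code and not PANET's implementation. The helper `enqueue_nd_range` is hypothetical; it is only loosely named after the host-API call clEnqueueNDRangeKernel. It runs the same kernel function once for every work-item index of a 1-D array of work-items. Each work-item uses its own index to select its data, which is the value that get_global_id(0) returns inside an OpenCL C kernel.

```python
from typing import Callable, List


def enqueue_nd_range(kernel: Callable[..., None], global_size: int, *args) -> None:
    """Hypothetical serial stand-in for launching `kernel` over a 1-D index space.

    On an OpenCL device the work-items would run concurrently on PEs; here they
    run one after another. That is only equivalent because the kernel below has
    each work-item write a distinct output element."""
    for global_id in range(global_size):
        kernel(global_id, *args)


def vector_add_kernel(global_id: int, a: List[float], b: List[float],
                      out: List[float]) -> None:
    # Same body for every work-item; only global_id (get_global_id(0) in OpenCL C) differs.
    out[global_id] = a[global_id] + b[global_id]


a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]
out = [0.0] * len(a)
enqueue_nd_range(vector_add_kernel, len(a), a, b, out)
print(out)  # [11.0, 22.0, 33.0]
```

In PANET the "data" selected by the index would be, for example, one initial state or one update-rule sequence rather than one vector element.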
In addition, work-items are grouped into work-groups, and the work-items within a work-group can share local memory and synchronize with one another. From the viewpoint of the OpenCL hardware architecture, work-groups are distributed to CUs, and the work-items of a work-group are executed concurrently on the PEs of the same CU.

OpenCL defines a hierarchy of memory types that differ in functionality, size, and speed. The first type is global memory, which is the largest but has the lowest bandwidth. It can be read and written by both the host and the OpenCL device, and thus allows communication between them. The second type is constant memory, which is the part of global memory that remains constant during the execution of a kernel. The third type is local memory, which is much smaller but much faster than global memory. Each CU has its own local memory, which is shared by the PEs within that CU and can be used to synchronize the work-items of the same work-group. The last type is private memory, which belongs to a single work-item. Variables defined in the private memory of a work-item are not visible to other work-items. To achieve the best possible performance with the available memory bandwidth, the programmer must choose the most appropriate memory type for each piece of data.

Text S2. OpenCL-based parallel computation of robustness

(a) Pseudo-codes for robustness computation in parallel

The following figure shows the pseudo-codes of two important functions, parallel_computing_attractors_for_all_states and parallel_computing_attractors_for_all_rules. Given a Boolean network, they compute the attractors in parallel for all initial states (S) and for every sequence of update rules (F), respectively. In computing attractors, we used an array ATT in which each element ATT[s, f] represents the attractor that a network G(V, A) reaches from the initial state s under the sequence of update rules f.
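The following Python sketch is a rough serial illustration of what ATT[s, f] holds. It rests on the following assumptions:

- A state is packed into an integer whose bit j holds the value of node v(j+1).
- A hypothetical `step` callable stands in for update_states(V, A, f, ·) under one fixed rule sequence f.
- Every attractor is stored as the set of its cycle states.

For illustration, a hypothetical `initial_state_robustness` helper loosely mirrors the γs(v) computation of robustness_initial_state in the pseudo-codes. It flips one node in each initial state and checks whether the reached attractor changes. None of this is PANET's OpenCL kernel, which distributes the trajectories across PEs instead of looping over them.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List

# Assumption: a state is an int whose bit j holds the value of node v(j+1).
State = int
Attractor = FrozenSet[State]


def attractors_for_all_states(step: Callable[[State], State],
                              initial_states: Iterable[State]) -> Dict[State, Attractor]:
    """Map every visited state to the attractor it reaches under `step`.

    `step` stands in for update_states(V, A, f, .) with a fixed rule sequence f.
    An attractor is stored as the frozenset of its cycle states: attractors of a
    deterministic map are disjoint cycles, so set equality identifies them no
    matter at which cycle state a trajectory entered."""
    att: Dict[State, Attractor] = {}
    for s in initial_states:
        if s in att:                    # already resolved by an earlier trajectory
            continue
        traj: List[State] = []
        nth: Dict[State, int] = {}      # position of each state on this trajectory
        while True:
            nth[s] = len(traj)
            traj.append(s)
            nxt = step(s)
            if nxt in nth:              # the trajectory closed a new cycle
                found = frozenset(traj[nth[nxt]:])
                break
            if nxt in att:              # the trajectory merged into a known one
                found = att[nxt]
                break
            s = nxt
        for x in traj:                  # every state on the path shares the attractor
            att[x] = found
    return att


def initial_state_robustness(step: Callable[[State], State], n: int,
                             initial_states: List[State]) -> List[float]:
    """Hypothetical helper: entry j is the fraction of initial states whose
    attractor is unchanged when the value of node v(j+1) is flipped."""
    original = attractors_for_all_states(step, initial_states)
    hits = [0] * n
    for s in initial_states:
        flipped = [s ^ (1 << j) for j in range(n)]   # one perturbed state per node
        perturbed = attractors_for_all_states(step, flipped)
        for j, s2 in enumerate(flipped):
            if perturbed[s2] == original[s]:
                hits[j] += 1
    return [h / len(initial_states) for h in hits]


# Hypothetical 2-node example: v1(t+1) = v2(t), v2(t+1) = v2(t).
def step(s: State) -> State:
    return ((s >> 1) & 1) | (s & 2)


print(initial_state_robustness(step, 2, [0, 1, 2, 3]))  # [1.0, 0.0]
```

In this toy network v2 is frozen and v1 copies it, so the network has two fixed-point attractors. Flipping v1 never changes the attractor reached, whereas flipping v2 always does. Averaging the per-node values gives a network-level robustness of 0.5. Storing attractors as sets rather than as state sequences is a design choice of the sketch. It avoids treating the same cycle as two different attractors when two trajectories enter it at different states.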
The algorithm iteratively computes state transitions until it reaches a state that has already been visited. The dashed blocks in the figure denote kernel code, which is executed in parallel on CPUs or GPUs. In other words, the original NetDS computed the attractors for a number of initial states or update rules serially, whereas PANET computes them in parallel by distributing the tested cases to the PEs of the OpenCL device.

function [ATT] parallel_computing_attractors_for_all_states(V, A, f, S)
// V, A: A set of nodes V = {v1, v2, …, vN} and a set of links A of a network (Here, V[i] represents vi ∈ V.)
// f: A sequence of update rules (Here, f = f1 f2 … fN, and fi represents the update rule with respect to vi ∈ V.)
// S: A collection of initial states considered for the robustness investigation (Here, S[i] represents the ith initial state in S.)
// ATT: The resulting collection of attractors, each of which is represented by a sequence of states.
    ATT[0..2^|V|-1] ← NULL;             // Every element of ATT is initialized to NULL.
    nth[0..2^|V|-1] ← 0;                // Every element of nth is initialized to 0.
    for i ← 1 to |S|                    // for every initial state
        s ← S[i];
        if (ATT[s, f] ≠ NULL) continue; endif
        traj ← NULL; count ← 0;
        while (TRUE)
            count++;
            traj ← traj ⊕ s;            // Here, ⊕ represents the string concatenation operation.
            nth[s] ← count;
            s' ← update_states(V, A, f, s);      // This computes the next state.
            if (nth[s'] ≠ 0)
                if (ATT[s', f] = NULL)
                    att ← traj[nth[s']..count];  // Given a string t = t1 t2 … tT, t[i..j] represents
                                                 // ti ti+1 … tj-1 tj, which is a substring of t.
                else
                    att ← ATT[s', f];
                endif
                for j ← 1 to count
                    ATT[traj[j], f] ← att;
                endfor
                break;
            else
                s ← s';
            endif
        endwhile
    endfor
    return ATT;
end

function [ATT] parallel_computing_attractors_for_all_rules(V, A, F, s)
// V, A: A set of nodes V = {v1, v2, …, vN} and a set of links A of a network (Here, V[i] represents vi ∈ V.)
// F: A collection of sequences of update rules (Here, F[i] represents the ith sequence of update rules in F.)
// s: An initial state considered for the robustness investigation
// ATT: The resulting collection of attractors, each of which is represented by a sequence of states.
    ATT[0..2^|V|-1] ← NULL;             // Every element of ATT is initialized to NULL.
    for i ← 1 to |F|                    // for every sequence of update rules
        nth[0..2^|V|-1] ← 0;            // Every element of nth is initialized to 0.
        traj ← NULL; count ← 0;
        x ← s;                          // Every rule sequence starts from the same initial state s.
        while (TRUE)
            count++;
            traj ← traj ⊕ x;            // ⊕ represents the string concatenation operation.
            nth[x] ← count;
            x' ← update_states(V, A, F[i], x);   // This computes the next state.
            if (nth[x'] ≠ 0)
                att ← traj[nth[x']..count];      // t[i..j] is the substring ti ti+1 … tj of t, as above.
                ATT[s, F[i]] ← att;
                break;
            else
                x ← x';
            endif
        endwhile
    endfor
    return ATT;
end

These functions make it easy to compute the robustness of a node against initial-state and update-rule perturbations (γs(v) and γr(v), respectively). They also give the robustness of a network G against the same two kinds of perturbation (γs(G) and γr(G), respectively), as shown in the following pseudo-codes.

function [γs] robustness_initial_state(V, A, f, S)
// V, A: A set of nodes V = {v1, v2, …, vN} and a set of links A of a network (Here, V[i] represents vi ∈ V.)
// f: A sequence of update rules (Here, f = f1 f2 … fN, and fi represents the update rule with respect to vi ∈ V.)
// S: A collection of initial states considered for the robustness investigation (Here, S[i] represents the ith initial state in S.)
// γs: The resulting robustness against initial-state perturbations
    // Step 1: Examine the original attractors.
    ATT ← parallel_computing_attractors_for_all_states(V, A, f, S);
    // Step 2: Examine the changed attractors by initial-state perturbations.
    R[1..|V|] ← 0;                      // Every element of R is initialized to 0.
    for i ← 1 to |S|
        S'[1..|V|] ← NULL;              // Every element of S' is initialized to NULL.
        for j ← 1 to |V|
            s ← S[i];
            sj ← 1 - sj;                // sj denotes the value of vj in s; the resulting s
                                        // denotes an initial-state perturbation at node vj ∈ V.
            S'[j] ← s;
        endfor
        ATT' ← parallel_computing_attractors_for_all_states(V, A, f, S');
        for j ← 1 to |V|
            if (ATT[S[i], f] = ATT'[S'[j], f]) R[j]++; endif
                                        // Attractors are compared as cycles, regardless of their starting state.
        endfor
    endfor
    // Step 3: Compute the robustness against initial-state perturbations.
    γs ← 0;
    for j ← 1 to |V|
        R[j] ← R[j] / |S|;              // Here, R[j] represents γs(vj).
        γs ← γs + R[j];
    endfor
    γs ← γs / |V|;                      // As a result, γs represents the robustness γs(G) of the given network.
    return γs;
end

function [γr] robustness_update_rule(V, A, f, S)
// V, A: A set of nodes V = {v1, v2, …, vN} and a set of links A of a network (Here, V[i] represents vi ∈ V.)
// f: A sequence of update rules (Here, f = f1 f2 … fN, and fi represents the update rule with respect to vi ∈ V.)
// S: A collection of initial states considered for the robustness investigation (Here, S[i] represents the ith initial state in S.)
// γr: The resulting robustness against update-rule perturbations
    // Step 1: Examine the original attractors.
    ATT ← parallel_computing_attractors_for_all_states(V, A, f, S);
    // Step 2: Examine the changed attractors by update-rule perturbations.
    R[1..|V|] ← 0;                      // Every element of R is initialized to 0.
    for i ← 1 to |S|
        F[1..|V|] ← NULL;               // Every element of F is initialized to NULL.
        for j ← 1 to |V|
            f' ← f;
            if (fj = AND) f'j ← OR; else f'j ← AND; endif
                                        // f' represents an update-rule perturbation at node vj ∈ V.
            F[j] ← f';
        endfor
        ATT' ← parallel_computing_attractors_for_all_rules(V, A, F, S[i]);
        for j ← 1 to |V|
            if (ATT[S[i], f] = ATT'[S[i], F[j]]) R[j]++; endif
                                        // Attractors are compared as cycles, regardless of their starting state.
        endfor
    endfor
    // Step 3: Compute the robustness against update-rule perturbations.
    γr ← 0;
    for j ← 1 to |V|
        R[j] ← R[j] / |S|;              // Here, R[j] represents γr(vj).
        γr ← γr + R[j];
    endfor
    γr ← γr / |V|;                      // As a result, γr represents the robustness γr(G) of the given network.
    return γr;
end

Text S3. OpenCL-based parallel examination of feedback and feed-forward loops

(a) A pseudo-code for efficient FBL search in parallel

The following figure shows the pseudo-code of the ‘searchFBL’ function, which searches for all feedback loops (FBLs) of maximum length L in a given network G(V, A). For each link (vi, vj) ∈ A with i < j, the algorithm uses depth-first search (DFS) to find all the FBLs that involve the link (vi, vj). The dashed block in the figure performs this search. It is kernel code that can be executed in parallel on CPUs or GPUs. In addition, we improved the search speed by avoiding redundant searches (the lines marked * and ** in the pseudo-code). Line (*) starts a search only from links (vi, vj) with i < j. Line (**) abandons any path that reaches a node whose index is smaller than i. Together they ensure that each loop is found only from its lowest-indexed node.

function [FBL] searchFBL(V, A, L)
// V, A: A set of nodes V = {v1, v2, …, vN} and a set of links A of a network (Here, V[i] represents vi ∈ V.)
// L: The maximal FBL length to be examined
// FBL: The resultant set of the feedback loops found
    FBL ← NULL;
    for each (vi, vj) ∈ A with i < j    // To avoid redundant search (*)
        k ← 0; stack[k] ← (vi, vj); visited[(vi, vj)] ← TRUE;
        while (k ≥ 0)
            (va, vb) ← stack[k];
            if (k = L or b < i)         // To avoid redundant search (**)
                k ← k-1; continue;
            endif
            if (b = i)
                FBL ← FBL ∪ {(stack[0], stack[1], …, stack[k])};
                                        // The sequence of links (stack[0], stack[1], …, stack[k])
                                        // represents the feedback loop found.
                k ← k-1; continue;
            endif
            if (∃(vb, vc) ∈ A such that visited[(vb, vc)] ≠ TRUE and (c = i or vc is not on the current path))
                                        // The current path consists of vi and the heads of stack[0..k];
                                        // excluding its nodes keeps loops simple and the search finite.
                visited[(vb, vc)] ← TRUE;
                k ← k+1; stack[k] ← (vb, vc);
            else
                k ← k-1;
                for each (vb, v) ∈ A, visited[(vb, v)] ← FALSE; endfor
            endif
        endwhile
    endfor
    return FBL;
end

(b) A pseudo-code for efficient FFL search in parallel

The following figure shows the pseudo-code of the ‘searchFFL’ function, which searches for all feed-forward loops (FFLs) of maximum length L in a given network G(V, A). The search uses a set of source nodes VS and a set of destination nodes VD.
For each link (vs, vj) ∈ A with vs ∈ VS, the algorithm uses depth-first search to find all the FFLs that involve the link (vs, vj). The dashed block in the figure performs this search. It is kernel code that can be executed in parallel on CPUs or GPUs.

function [FFL] searchFFL(VS, VD, A, L)
// VS, VD: A set of source nodes VS and a set of destination nodes VD of a network (VS ∩ VD = ∅)
// A: A set of links of a network
// L: The maximal FFL length to be examined
// FFL: The resultant set of the feed-forward loops found
    FFL ← NULL;
    for each (vs, vj) ∈ A with vs ∈ VS
        k ← 0; stack[k] ← (vs, vj); visited[(vs, vj)] ← TRUE;
        while (k ≥ 0)
            (va, vb) ← stack[k];
            if (k = L)
                k ← k-1; continue;
            endif
            if (vb ∈ VD)
                FFL ← FFL ∪ {(stack[0], stack[1], …, stack[k])};
                                        // The sequence of links (stack[0], stack[1], …, stack[k])
                                        // represents the simple path found.
                k ← k-1; continue;
            endif
            if (∃(vb, vc) ∈ A such that visited[(vb, vc)] ≠ TRUE and vc is not on the current path)
                                        // The current path consists of vs and the heads of stack[0..k].
                visited[(vb, vc)] ← TRUE;
                k ← k+1; stack[k] ← (vb, vc);
            else
                k ← k-1;
                for each (vb, v) ∈ A, visited[(vb, v)] ← FALSE; endfor
            endif
        endwhile
        visited[(vs, vj)] ← FALSE;      // Release the starting link so that paths from other sources can use it.
    endfor
    return FFL;
end

Text S4. Format of the output files from batch-mode simulation on RBNs

After the batch-mode simulation is completed, two result files are created: “net_based_result.txt” and “node_based_result.txt”. They describe the network-based and node-based results, respectively.

(a) Network-based result

As shown in the figure below, “net_based_result.txt” contains 11 network-based results concerning the robustness and the FFL/FBL structures of the RBNs. Each row describes the result for one RBN.
Column  Name          Description
1       Network ID    The unique identification number of an RBN
2       No.Nodes      The number of nodes of an RBN
3       No.Edges      The number of edges of an RBN
4       sRobustness   The robustness against initial-state perturbation of an RBN
5       rRobustness   The robustness against update-rule perturbation of an RBN
6       NuFBL+        The number of positive FBLs of an RBN
7       NuFBL-        The number of negative FBLs of an RBN
8       NuCoFBL       The number of coherently coupled FBLs of an RBN
9       NuInCoFBL     The number of incoherently coupled FBLs of an RBN
10      NuCoFFL       The number of coherently coupled FFLs of an RBN
11      NuInCoFFL     The number of incoherently coupled FFLs of an RBN
(Column description in “net_based_result.txt”)

(Example of “net_based_result.txt”)

(b) Node-based result

As shown in the figure below, “node_based_result.txt” gives more detailed results than “net_based_result.txt”. It reports the robustness and the FBL structures at the level of individual nodes in the RBNs. Results regarding FFL structures are not included for simplicity, because there can be as many cases as there are pairs of nodes.
Column  Name          Description
1       Network ID    The unique identification number of an RBN
2       No.Nodes      The number of nodes of an RBN
3       No.Edges      The number of edges of an RBN
4       Node ID       The unique identification number of a node
5       sRobustness   The robustness against initial-state perturbation of a node
6       rRobustness   The robustness against update-rule perturbation of a node
7       NuFBL<=L      The number of FBLs of length <= L that involve the node
8       NuFBL=L       The number of FBLs of length = L that involve the node
9       PosNuFBL<=L   The number of positive FBLs of length <= L that involve the node
10      PosNuFBL=L    The number of positive FBLs of length = L that involve the node
11      NegNuFBL<=L   The number of negative FBLs of length <= L that involve the node
12      NegNuFBL=L    The number of negative FBLs of length = L that involve the node
(Column description in “node_based_result.txt”)

(Example of “node_based_result.txt”)

[Figure S1 panels (a) to (d): γr(G) versus the ratio of coherent FFLs; (a) Shuffle I, |V| = 1609 and |A| = 5063; (b) Shuffle I, |V| = 818 and |A| = 1801; (c) Shuffle II, |V| = 1609 and |A| = 5063; (d) Shuffle II, |V| = 818 and |A| = 1801]

Figure S1. Relationship between the ratio of coherent FFLs and update-rule robustness in large-scale Boolean networks generated by shuffling models. (a) Result of Shuffle I-based RBNs of the same size as the HSN. (b) Result of Shuffle I-based RBNs of the same size as the CCSN. (c) Result of Shuffle II-based RBNs of the same size as the HSN. (d) Result of Shuffle II-based RBNs of the same size as the CCSN.
The maximal length of examined FFLs is set to 4 for (a) and (c), and to 6 for (b) and (d). For robustness against update-rule perturbation, |S| is set to 1,024. In (a), (b) and (c), the correlations are not statistically significant (P-values = 0.715, 0.490 and 0.832, respectively). Only the correlation in (d) is statistically significant (slope of the regression line = 0.03297, P-value = 0.015).

[Figure S2 panels (a) to (d): γs(G) versus the ratio of coherent FBLs; (a) |V| = 1609 and |A| = 5063; (b) |V| = 818 and |A| = 1801; (c) |V| = 50 and |A| = 97; (d) |V| = 50 and |A| = 117]

Figure S2. Relationship between the ratio of coherent FBLs and initial-state robustness in Boolean networks generated by the ER model. (a) Result of RBNs of the same size as the HSN. (b) Result of RBNs of the same size as the CCSN. (c) Result of RBNs with |V| = 50 and |A| = 97. (d) Result of RBNs with |V| = 50 and |A| = 117. The maximal length of examined FBLs is set to 6, 8, 50 and 12 in (a) through (d), respectively. For robustness against initial-state perturbation, |S| is set to 1,024. In (a), (b) and (d), the correlations are not statistically significant (P-values = 0.052, 0.384 and 0.080, respectively). In contrast, the correlation in (c) is significantly positive (slope of the regression line = 0.05610, P-value = 0.012).
[Figure S3 panels (a) to (d): γs(G) versus the ratio of coherent FBLs; (a) Shuffle I, |V| = 1609 and |A| = 5063; (b) Shuffle I, |V| = 818 and |A| = 1801; (c) Shuffle II, |V| = 1609 and |A| = 5063; (d) Shuffle II, |V| = 818 and |A| = 1801]

Figure S3. Relationship between the ratio of coherent FBLs and initial-state robustness in Boolean networks generated by shuffling models. (a) Result of Shuffle I-based RBNs of the same size as the HSN. (b) Result of Shuffle I-based RBNs of the same size as the CCSN. (c) Result of Shuffle II-based RBNs of the same size as the HSN. (d) Result of Shuffle II-based RBNs of the same size as the CCSN. The maximal length of examined FBLs is set to 6 for (a) and (c), and to 8 for (b) and (d). For robustness against initial-state perturbation, |S| is set to 1,024. In (a), (b), (c) and (d), the correlations are not statistically significant (P-values = 0.603, 0.356, 0.211 and 0.551, respectively).