Supplemental Materials

I. Sensitivity analysis of dictionary learning model parameters

1.1 Effect of regularization coefficients λ1 and λ2 in Eq. 2

In this work, we tried different combinations of the regularization parameters in the dictionary learning. For each combination, the learned dictionary was used to derive the eight connectome patterns shown in Fig. 6, and the WQCPs were then reconstructed from these eight connectome patterns. By minimizing the reconstruction error, we determined the optimal parameter combination to be λ1 = 0.005 and λ2 = 0.05.

In addition, the table below shows how the values of the regularization coefficients affect the final connectome patterns learned. Each cell in the table gives the average relative difference between the DICCCOL-based temporally dynamic functional connectome patterns obtained with the parameter combination defined by the cell's row and column headers and the connectome patterns obtained with the parameter set (0.005, 0.05).

From the table, it can be seen that within a reasonable range (λ1 from 0.002 to 0.01, λ2 from 0.01 to 0.05) the model results are similar. However, 1) if λ2 is greater than 0.05, the dictionary learning result is likely to degrade to fewer sub-dictionaries and thus cannot faithfully represent the original classes; 2) the model performance becomes more unstable as the choice of parameters varies, and it is hard to estimate how the results would deviate.

            λ2=0.001   0.002    0.005    0.01     0.02     0.05
λ1=0.001    12.07%     10.09%   11.99%   11.80%   5.54%    3.93%
   0.002    12.22%     13.02%   10.96%   10.52%   4.36%    3.74%
   0.005    11.62%     9.99%    6.56%    6.28%    3.74%    N/A
   0.01     10.87%     5.85%    6.68%    4.48%    6.48%    1.96%
   0.02     12.82%     12.22%   10.21%   10.52%   9.95%    8.25%

Supplemental Table 1. Relative differences between the results obtained using the parameter combination (λ1, λ2) = (0.005, 0.05) and the results obtained using the other parameter combinations indicated by the row and column headers.
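The kind of comparison reported in Supplemental Table 1 can be sketched as follows. This is only an illustration under stated assumptions: the exact definition of the "average relative difference" is not given in the text, so the sketch assumes it is the Euclidean norm of the difference divided by the norm of the reference pattern, averaged over matched patterns. The function names and the toy data are hypothetical, and each pattern is assumed to be flattened into a vector and already matched to its counterpart in the reference set.

```python
from math import sqrt


def relative_difference(pattern, reference):
    """Assumed metric: ||pattern - reference|| / ||reference|| (Euclidean norm)."""
    if len(pattern) != len(reference):
        raise ValueError("patterns must have the same length")
    diff = sqrt(sum((p - r) ** 2 for p, r in zip(pattern, reference)))
    norm = sqrt(sum(r ** 2 for r in reference))
    if norm == 0:
        raise ValueError("reference pattern has zero norm")
    return diff / norm


def average_relative_difference(patterns, references):
    """Mean relative difference over pattern pairs matched by position."""
    if not patterns or len(patterns) != len(references):
        raise ValueError("need two non-empty, equally sized pattern sets")
    diffs = [relative_difference(p, r) for p, r in zip(patterns, references)]
    return sum(diffs) / len(diffs)


# Toy data (hypothetical): two patterns per parameter setting.
reference_patterns = [[1.0, 0.0], [0.0, 2.0]]  # stands in for the (0.005, 0.05) result
other_patterns = [[1.0, 0.1], [0.0, 1.8]]      # stands in for another setting
avg_diff = average_relative_difference(other_patterns, reference_patterns)
print(f"average relative difference: {avg_diff:.2%}")
```

In practice one such value would be computed for every (λ1, λ2) pair, with the eight connectome patterns taking the place of the toy vectors.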
1.2 Effect of classification weight (w) parameter in Eq. 3

The weight parameter w controls the balance between the reconstruction error and the distance between the coefficient vectors and the mean vector of each class during classification. In this work, it is set to 0.1 to ensure that the two values are on the same scale. To investigate how the classification results change with w, we repeated the classification using different values of w; the results are summarized in the table below. Comparing the number of WQCPs assigned to each class with the results obtained at w=0.1, we can see that the classification result is not affected much by the value of w. We also examined the connectome patterns derived from each class with different weight values, and the patterns are consistent.

w                                     0.01  0.05  0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1     2     5     10
Number of classification differences  0     0     N/A   0     0     0     1     2     2     2     2     2     4     4     4

Supplemental Table 2. Number of WQCPs classified to a different ("wrong") class when using different values of w, compared with the classification result using w=0.1. A value of 0 means that all WQCPs are classified into the "correct" class, i.e., the classification results obtained with the two weight values match perfectly.

1.3 Effect of sliding time window size on the dynamic functional connectivity

To investigate how the sliding time window size affects the dynamic functional connectivity results, we randomly selected one subject, applied different window sizes ranging from 10 to 50 time points, and compared the corresponding dynamic cumulative functional connectivities, i.e., the average of all the connectivities over the whole period (all time windows), as shown in (a) in the figure below. We also visualized the corresponding standard deviation across time points (windows) in (b)-(e). It can be observed that both the mean strength and the standard deviation of the functional connectivity decrease as the length of the time window increases.
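The window-size comparison described above can be illustrated with a minimal sketch for a single pair of ROI time series. It makes several assumptions that are not stated in this section: Pearson correlation is used as the connectivity measure, consecutive windows are shifted by one time point, and the function names and synthetic signals are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant window: correlation undefined, treated as 0 here
    return cov / sqrt(vx * vy)


def sliding_window_connectivity(x, y, window, step=1):
    """Connectivity of two time series in each sliding window (assumed step of 1)."""
    if len(x) != len(y):
        raise ValueError("time series must have the same length")
    if window < 2 or window > len(x):
        raise ValueError("window must be between 2 and the series length")
    starts = range(0, len(x) - window + 1, step)
    return [pearson(x[s:s + window], y[s:s + window]) for s in starts]


def summarize(x, y, window):
    """Mean and standard deviation of the windowed connectivity over all windows."""
    series = sliding_window_connectivity(x, y, window)
    return mean(series), pstdev(series)


# Synthetic example (hypothetical data): y is a linear function of x,
# so every window has correlation 1 and the spread across windows is 0.
x = [float(t) for t in range(20)]
y = [2.0 * v + 1.0 for v in x]
windows = sliding_window_connectivity(x, y, window=10)
fc_mean, fc_std = summarize(x, y, window=10)
print(len(windows), round(fc_mean, 6), round(fc_std, 6))
```

Running `summarize` for window lengths such as 10, 14, 25 and 50 on real ROI time series would give the per-pair quantities compared in Supplemental Figure 1.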
Supplemental Figure 1. (a) The mean functional connectivity strength over all time points (windows) when the length of the time window is set to 10, 14, 25 and 50 time points. (b)-(e) The standard deviation of the functional connectivity strength over all time points (windows) when the length of the time window is 10, 14, 25 and 50 time points, respectively.

1.4 Parameters used in multi-view co-training process

There are two key parameters in the multi-view clustering method: the number of eigenvectors (k) used for co-training and the N-cut threshold (T) for clustering. The sensitivity of the results to these two parameters in identifying different resting state networks is detailed below.

The number of eigenvectors used (k): we tried different numbers of eigenvectors (k = 15, 20, 25, 30, 35, 40) to perform the multi-view co-training and spectral clustering on multiple dynamic patterns, i.e., multi-pattern clustering. The corresponding clustering results are summarized in the figure below, which shows the dynamic DICCCOL clusters in the first dynamic temporal pattern of the brain, i.e., Fig. 6(a) in our manuscript. The N-cut threshold for spectral clustering is fixed at 0.2 for comparison, and the optimal iteration number during each co-training procedure is selected according to the ECC criterion in Section 3.3.

Supplemental Figure 2. Multi-view co-training and spectral clustering results obtained using different values of k.

From the above figure, it can be seen that if k is set too small, some useful information is removed (e.g., the high functional connection information in the orange circle when k=15), which results in over-training. On the other hand, a large k retains some uncommon information that should be removed and may thus cause under-training (the very small cluster in the red circle when k=40).
When k=25 and k=30, ten clusters were obtained in both cases, and the two results were almost the same (only 6 DICCCOL nodes were assigned to different clusters). When k=35, nine clusters were obtained in total, where the sixth cluster (highlighted by the black square box) corresponds to the fifth and eighth clusters (indicated by the two black arrows) when k=30. We therefore set k=30, which should be an appropriate value for our 358 DICCCOL landmarks.

The N-cut threshold (T): the influence of the N-cut threshold is shown in Table 1 in the main manuscript. This threshold determines whether the current cluster can be further divided. For example, when the threshold is set to 0.05 or 0.1, nine clusters are obtained, whereas when the threshold is set to 0.2, one of the clusters is further divided into two sub-groups, giving ten clusters in total. When the threshold is set to 0.3, 0.4, or 0.5, no clusters are divided further. We therefore chose 0.2 as the threshold, since the result is stable at ten clusters.

II. Analysis of the effect of head motion on results

To identify and correct potential outlier time points caused by head motion, we checked the relative motion at each time point of each subject using the tools provided by the FSL motion correction kit. Outlier time points were identified by the criterion of relative motion > 0.5 mm. On average, 9 of the 200 time points of each subject were identified as outliers. We then performed linear interpolation at those outlier time points to obtain a new time series, and used the new time series for the subsequent functional connectivity strength analysis. An illustration of the functional connectivity strength patterns obtained from the original time series (a) and the new time series (b) of one randomly selected subject is presented in the figure below.
The figure shows no observable differences between the functional connectivity strengths obtained from the original and the interpolated time series. Statistically, the subject-wise average difference in functional connectivity strength before and after the outlier detection and linear interpolation is less than 5%.

Supplemental Figure 3. Visualization of the functional connectivity strength matrix before (a) and after (b) the outlier identification and interpolation. In the figure, the X-axis represents the time points, while the Y-axis represents the indices of the 358 DICCCOL ROIs. Each cell is color-coded by the functional connectivity strength.
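The outlier handling described in this section can be sketched as follows. This is a minimal illustration, not the pipeline actually used: it assumes one relative motion value (in mm) is available per time point, applies the 0.5 mm criterion from the text, and linearly interpolates each flagged point from its nearest non-outlier neighbors. The function names, the toy signal, and the treatment of outliers at the start or end of the series (copying the nearest valid value) are assumptions.

```python
def find_outliers(relative_motion, threshold_mm=0.5):
    """Indices of time points whose relative motion exceeds the threshold."""
    return [i for i, m in enumerate(relative_motion) if m > threshold_mm]


def interpolate_outliers(signal, outliers):
    """Replace outlier samples by linear interpolation between valid neighbors."""
    bad = set(outliers)
    good = [i for i in range(len(signal)) if i not in bad]
    if not good:
        raise ValueError("no valid time points left to interpolate from")
    repaired = list(signal)
    for i in sorted(bad):
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None:
            repaired[i] = signal[right]  # assumption: copy nearest valid value
        elif right is None:
            repaired[i] = signal[left]   # assumption: copy nearest valid value
        else:
            t = (i - left) / (right - left)
            repaired[i] = signal[left] + t * (signal[right] - signal[left])
    return repaired


# Toy example (hypothetical values): time point 2 exceeds the 0.5 mm criterion.
motion = [0.1, 0.2, 0.9, 0.1, 0.3]
signal = [1.0, 2.0, 100.0, 4.0, 5.0]
outlier_idx = find_outliers(motion)
repaired = interpolate_outliers(signal, outlier_idx)
print(outlier_idx, repaired)
```

The same repair would be applied to each ROI time series of a subject before recomputing the functional connectivity strength.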