Supplemental Digital Content 5

Supplemental Digital Content 5: Description of the method used to
estimate confidence intervals for information transmission analysis
The confidence interval of each information transmission measure was estimated using a
non-parametric bootstrap method (Azadpour et al. 2014). For each feature confusion
matrix, many new sample matrices were constructed by randomly resampling the feature
matrix with replacement, and the distribution of the information transmission measures
of the new matrices was used to estimate the 95% confidence interval of the information
transmission of the original feature matrix. The following algorithm
was used to resample (bootstrap) the confusion matrices (Hardin & Shumway 1997). First, the
value of every cell of the confusion matrix was increased by a small number (1) to
ensure that all cells had a value above zero. The matrix was then converted into a list
of N records, where N is the sum of the cells in the matrix.
For each cell in the ith row and jth column, the symbol ‘Cij’ was listed as many
times as the value of the cell. N samples were randomly selected with
replacement from the list, and a new matrix was generated from the set of N random
samples: the value in the ith row and jth column of the new matrix was set to the
number of times ‘Cij’ appeared in the random sample set. A large number (2000) of new
matrices were constructed by resampling the original confusion matrix as explained
above, and information transmission analysis was performed on each matrix. The 95%
confidence intervals were obtained from the distribution of the information transmission
values of the resampled matrices.
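The resampling steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The text does not define the information transmission measure, so the sketch assumes relative information transmitted (mutual information divided by stimulus entropy). It also assumes that N is the sum of the cells after 1 has been added and that the interval is taken from the 2.5th and 97.5th percentiles. The names `relative_info_transmitted` and `bootstrap_ci` are hypothetical.

```python
import numpy as np

def relative_info_transmitted(matrix):
    """Assumed measure: mutual information T(x;y) divided by stimulus entropy H(x)."""
    m = np.asarray(matrix, dtype=float)
    p = m / m.sum()                      # joint probabilities p_ij
    px = p.sum(axis=1, keepdims=True)    # stimulus (row) probabilities
    py = p.sum(axis=0, keepdims=True)    # response (column) probabilities
    nz = p > 0                           # skip empty cells to avoid log(0)
    t = np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz]))
    px_nz = px[px > 0]
    hx = -np.sum(px_nz * np.log2(px_nz))
    return t / hx if hx > 0 else 0.0

def bootstrap_ci(matrix, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for a confusion matrix (assumed variant)."""
    rng = np.random.default_rng(seed)
    m = np.asarray(matrix, dtype=np.int64) + 1   # add 1 so every cell is above zero
    n = int(m.sum())                             # N: number of records (assumed: after adding 1)
    # List of records: the flat index of cell (i, j) plays the role of 'Cij'
    # and is repeated as many times as the value of the cell.
    records = np.repeat(np.arange(m.size), m.ravel())
    values = np.empty(n_boot)
    for b in range(n_boot):
        sample = rng.choice(records, size=n, replace=True)   # N draws with replacement
        counts = np.bincount(sample, minlength=m.size).reshape(m.shape)
        values[b] = relative_info_transmitted(counts)
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(values, [alpha, 1.0 - alpha])
    return lower, upper
```

For example, `bootstrap_ci([[18, 2], [3, 17]])` returns a lower and an upper bound for a two-category feature matrix. Drawing N records with replacement from the list is equivalent to a single multinomial draw with cell probabilities proportional to the cell counts; the explicit list is kept here only to mirror the description.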
A randomization method was also used to obtain the upper limit of the 95%
confidence interval of the chance level, the chance level itself being zero for an
information transmission analysis (Azadpour et al. 2014). Random consonant and vowel
confusion matrices were generated (20000 times), and information transmission
analysis was performed on each of them for each articulation feature. The value of
information transmission below which 97.5% of the random matrices fell was used as
the upper limit for chance information transmission for that feature.
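The randomization procedure might look something like the Python sketch below. The text does not say how the random matrices were generated, so several details here are assumptions: each stimulus is presented a fixed number of times, responses are drawn uniformly at random over all response alternatives, the phoneme matrix is pooled into a feature matrix by category, and the measure is relative information transmitted. All function and parameter names (`collapse_by_feature`, `chance_upper_limit`, `n_per_stimulus`, `labels`) are hypothetical.

```python
import numpy as np

def relative_info_transmitted(matrix):
    """Assumed measure: mutual information T(x;y) divided by stimulus entropy H(x)."""
    m = np.asarray(matrix, dtype=float)
    p = m / m.sum()
    px = p.sum(axis=1, keepdims=True)    # stimulus (row) probabilities
    py = p.sum(axis=0, keepdims=True)    # response (column) probabilities
    nz = p > 0                           # skip empty cells to avoid log(0)
    t = np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz]))
    px_nz = px[px > 0]
    hx = -np.sum(px_nz * np.log2(px_nz))
    return t / hx if hx > 0 else 0.0

def collapse_by_feature(matrix, labels):
    """Pool a phoneme confusion matrix into a feature confusion matrix.

    labels[k] is the feature category of phoneme k (same order for rows and columns).
    """
    labels = np.asarray(labels)
    cats = np.unique(labels)
    onehot = (labels[:, None] == cats[None, :]).astype(float)  # phonemes x categories
    return onehot.T @ np.asarray(matrix, dtype=float) @ onehot

def chance_upper_limit(n_per_stimulus, labels, n_random=20000, seed=0):
    """97.5th percentile of information transmission over random matrices."""
    rng = np.random.default_rng(seed)
    k = len(labels)
    uniform = np.full(k, 1.0 / k)        # assumed: every response equally likely
    values = np.empty(n_random)
    for r in range(n_random):
        # One row per stimulus: n_per_stimulus responses assigned at random.
        random_matrix = rng.multinomial(n_per_stimulus, uniform, size=k)
        feature_matrix = collapse_by_feature(random_matrix, labels)
        values[r] = relative_info_transmitted(feature_matrix)
    return np.quantile(values, 0.975)
```

Under these assumptions, a measured information transmission value for a feature would be treated as above chance when it exceeds `chance_upper_limit(...)` computed with the same number of presentations and the same feature labels as the experiment.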
References:
Azadpour, M., McKay, C. M., Smith, R. L. (2014). Estimating confidence intervals for
information transfer analysis of confusion matrices. J Acoust Soc Am, 135.
Hardin, P. J., Shumway, J. M. (1997). Statistical significance and normalized confusion
matrices. Photogrammetric Engineering and Remote Sensing, 63, 735-740.